Bug #64427
closed
Epic #65815: Improve Indexed search indexer
Indexed search - Issue with tt_news body text when image tag is included
Added by Siva Prasad almost 10 years ago.
Updated about 9 years ago.
Description
Hi all,
Recently, while working with indexed search and tt_news, I found a strange issue. Assume there are a few tt_news records that contain an image tag in the bodytext field. In my indexed search configuration I have added the fields "title,short,bodytext", so the expected result is that all of these fields are indexed without any problems. But when I put an image tag inside the bodytext field of a tt_news record and then try to index it, the title and subheader fields are indexed properly. *The bodytext field, however, is not indexed properly.* I tested this with more than 10 records and saw the same behavior each time. Can anyone check this and give feedback?
Thanks in advance for an early response.
BR
Siva
PIT Solutions Pvt Ltd.
- Status changed from New to Needs Feedback
- Assignee set to Mathias Schreiber
Hi Siva,
maybe there is a misunderstanding.
The indexedsearch does not index fields.
It builds its index from the cached page that an extension generates.
So in order to get the bodytext of your news into the index, you need a cached page to be rendered that actually shows the bodytext.
Hi Mathias,
Thanks for the quick response!
The thing is, in my case all pages and tt_news records are indexed properly, and when I search using the indexed search extension I can see the news records as well. The strange thing is that for some indexed records only the title and subheader fields are displayed, and the bodytext is not shown in the search result, even though the bodytext field contains a valid search string.
Consider this scenario: I searched for the keyword "BMC" in the TYPO3 backend, and the news section showed 74 news records matching my search. With the same string in the FE indexed search, only 63 records were shown; 11 records were missing. When I looked a little deeper, I found that all 11 missing records contain an img tag in the bodytext. I removed the img tag from one record and re-indexed it, and that record then appeared in the search result with its bodytext content. This is why I raised a bug here.
PS: For all 11 missing records, the keyword "BMC" was present only in the bodytext.
Hi Siva,
thanks for clearing that up.
Could you supply the rendered HTML output of the news bodytext, once with the image and once without?
The markup is important, the "content" isn't, so feel free to scramble the content in case it's confidential - as long as the markup incl. comments (!) remains intact.
Hi Mathias,
Below is the HTML content with the image (I removed the original text and replaced it with dummy text).
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum ha</p>
<p><img src="http://www.doamin.com/fileadmin/_migrated/RTE/RTEmagicC_BikeDays_0214_big_01.jpg.jpg" data-htmlarea-file-uid="168" data-htmlarea-file-table="sys_file" data-htmlarea-clickenlarge="1" height="200" width="300" /></p>
<div style="font-size: 14px; "><p> <br /></p></div>
<p> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum ha </p>
<div></div>
<div><p><a href="" target="_blank" class="external-link-new-window" data-htmlarea-external="1"></a></p></div>
<div></div>
I wonder whether the indexer breaks because of the data attributes.
Here's something we would try:
Create a page somewhere in the tree and create an HTML element on that page.
Insert the code that tt_news rendered.
Then try removing the data attributes.
I'm sure we're narrowing this one down :)
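For the "removing the data attributes" step, a small script can produce the stripped variant so that both versions can be pasted into the HTML element and compared. This is only a rough sketch in Python and not part of TYPO3 or tt_news; the function name `strip_data_attributes` is made up for illustration, and the regular expression assumes double-quoted attribute values, as in the markup posted above.

```python
import re

# Assumption: attribute values are double-quoted, as in the posted markup.
# Matches one data-* attribute together with the whitespace in front of it.
DATA_ATTR = re.compile(r'\s+data-[\w-]+="[^"]*"')


def strip_data_attributes(html: str) -> str:
    """Return the markup with all data-* attributes removed (hypothetical helper)."""
    return DATA_ATTR.sub("", html)


sample = (
    '<p><img src="a.jpg" data-htmlarea-file-uid="168" '
    'data-htmlarea-clickenlarge="1" height="200" width="300" /></p>'
)
print(strip_data_attributes(sample))
```

A plain regular expression is enough for a one-off test like this; it is not a general HTML parser and would also touch matching text outside of tags.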
Hi Mathias,
Thanks for the tip. I added a new test page with the same content that I provided in my last comment. Surprisingly, that page was indexed properly both with and without the data attributes.
Now it's becoming more confusing! :P
Hi Mathias,
Any updates on this?
Not really, since I haven't found the time to work on this one yet.
It's still on my agenda, but I can't really give you a dedicated deadline yet.
Hi Mathias,
Thanks for the information.
BR
Siva
- Parent task set to #65815
I'm having a hard time trying to reproduce the issue.
Can you please provide steps to follow to reproduce it?
Thanks
Sorry for the delayed response.
Can you put an HTML image tag (<img src="some_path">) inside the bodytext of any tt_news item and then try to index it? I hope we can reproduce the issue this way.
Thanks & Regards
Siva
Hi Siva,
1. Can you make sure that the whole content of the tt_news record is between the TYPO3SEARCH_ comments, like this:
<!--TYPO3SEARCH_begin-->
Your content (header)
Your content (bodytext)
<!--TYPO3SEARCH_end-->
Indexed search only takes into account the content between these comments.
2. Can you also check in the backend module which content and words are indexed for this page?
3. Are you using the newest TYPO3 version (6.2.14)?
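To check the first point, the rendered page source can be tested outside TYPO3. The following is a minimal sketch in Python under simplifying assumptions: it only looks at plain begin/end pairs and does not reproduce how the indexer itself treats the markers, and the helper names `marked_sections` and `is_inside_markers` are made up for illustration.

```python
import re

# Assumption: only simple <!--TYPO3SEARCH_begin--> ... <!--TYPO3SEARCH_end-->
# pairs are considered; this is not the indexer's own parsing logic.
MARKED = re.compile(
    r"<!--\s*TYPO3SEARCH_begin\s*-->(.*?)<!--\s*TYPO3SEARCH_end\s*-->",
    re.DOTALL,
)


def marked_sections(html: str) -> list[str]:
    """Return every chunk of markup enclosed by a begin/end marker pair."""
    return MARKED.findall(html)


def is_inside_markers(html: str, snippet: str) -> bool:
    """True if the snippet appears inside at least one marked section."""
    return any(snippet in section for section in marked_sections(html))


page = (
    "<html><body><h1>Menu</h1>"
    "<!--TYPO3SEARCH_begin-->"
    '<h2>News title</h2><p>Intro</p><p><img src="some_path" /></p><p>BMC text</p>'
    "<!--TYPO3SEARCH_end-->"
    "<div>Footer</div></body></html>"
)
print(is_inside_markers(page, "BMC text"))
print(is_inside_markers(page, "Footer"))
```

Running this against the saved frontend output of a news record with and without the image would show whether the bodytext actually ends up between the markers in both cases.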
- Status changed from Needs Feedback to Closed
No feedback within the last 90 days => closing this issue.
If you think that this is the wrong decision or experience this issue again, then please write to the mailing list typo3.teams.bugs with issue number and an explanation or open a new ticket and add a relation to this ticket number.