Bug #64427

Epic #65815: Improve Indexed search indexer

Indexed search - Issue with tt_news body text when image tag is included

Added by Siva Prasad over 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Should have
Category: Indexed Search
Target version: -
Start date: 2015-01-22
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 6.2
PHP Version: 5.5
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

Hi all,

Recently, while working with indexed search and tt_news, I found a strange issue. Assume there are a few tt_news records that contain an image tag in the bodytext field, and that my indexed search configuration includes the fields "title,short,bodytext". The expected result is that all of these fields are indexed without any problems. But when I put an image tag inside the bodytext field of tt_news records and try to index those records, the title and subheader fields are indexed properly, *but the bodytext fields are not properly indexed*. I tested this with more than 10 records and saw the same behavior each time. Can anyone check this and give feedback?

Thanks for an early response.

BR
Siva
PIT Solutions Pvt Ltd.

#1

Updated by Mathias Schreiber over 6 years ago

  • Status changed from New to Needs Feedback
  • Assignee set to Mathias Schreiber

Hi Siva,

maybe there is a misunderstanding.
The indexedsearch does not index fields.
It builds its index based on the cached page, that an extension will generate.

So in order to get the bodytext of your news into the index you need to get a cached page rendered, that actually shows the bodytext.
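To illustrate the point that the index is built from the rendered page rather than from database fields, here is a minimal Python sketch. It is not TYPO3's indexer; `TextCollector` and `visible_words` are hypothetical names, and the sample markup is made up. It only shows that the words available for indexing are whatever text ends up in the rendered HTML, and that an img tag by itself contributes no words.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only text nodes arrive here; tag names and attributes do not.
        if data.strip():
            self.chunks.append(data.strip())


def visible_words(rendered_html):
    """Return the lowercase words found in the text of the rendered markup."""
    collector = TextCollector()
    collector.feed(rendered_html)
    collector.close()
    return " ".join(collector.chunks).lower().split()


# Made-up rendered output of a news detail page.
page = '<h1>News title</h1><p>Body with BMC <img src="a.jpg" /> inside</p>'
print(visible_words(page))
```

If a word from the bodytext does not show up in such a word list for the rendered detail page, it cannot be found through the page-based index either.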

#2

Updated by Siva Prasad over 6 years ago

Hi Mathias,

Thanks for the quick response !!

The thing is, in my case all pages/tt_news records are indexed properly, and when I search using the indexed search extension I can see the news records as well. The strange thing is that for some indexed records only the title and subheader fields are displayed; the bodytext is not shown in the search result, even though the bodytext field contains a valid search string.

Consider a scenario like this: I searched for the keyword "BMC" in the TYPO3 backend, and the news section showed 74 news records matching my search. With the same string in the FE indexed search, only 63 records were shown; 11 records were missing. When I looked a little deeper, I found that all 11 missing records contain an img tag in the bodytext. I removed the img tag from one record and re-indexed, and that record then appeared in the search result with its bodytext contents. This is why I raised a bug here.

PS: For all 11 missing records, the keyword "BMC" was only present in the bodytext.

#3

Updated by Mathias Schreiber over 6 years ago

Hi Siva,

thanks for clearing that up.
Is it possible you supply the rendered HTML output of the news bodytext once with the image and once without?
The markup is important, the "content" isn't, so feel free to scramble the content in case it's confidential - as long as the markup incl. comments (!) remains intact.

#4

Updated by Siva Prasad over 6 years ago

Hi Mathias,

See below the HTML content with the image (I removed the original text and replaced it with some dummy text).

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum ha</p>
<p><img src="http://www.doamin.com/fileadmin/_migrated/RTE/RTEmagicC_BikeDays_0214_big_01.jpg.jpg" data-htmlarea-file-uid="168" data-htmlarea-file-table="sys_file" data-htmlarea-clickenlarge="1" height="200" width="300" /></p>
<div style="font-size: 14px; "><p>&nbsp;<br /></p></div>
<p> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum ha&nbsp;</p>
<div></div>
<div><p><a href="" target="_blank" class="external-link-new-window" data-htmlarea-external="1"></a></p></div>
<div></div>

#5

Updated by Mathias Schreiber over 6 years ago

I wonder whether the indexer breaks because of the data attributes.
Here's something we would try:
Create a page somewhere in the tree and create an HTML element on that page.
Insert the code that tt_news rendered.
Then try removing the data attributes.
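For the last step, a rough Python sketch for producing the variant without data attributes is shown here. `strip_data_attributes` is a hypothetical helper, not part of TYPO3, and the regular expression assumes double-quoted attribute values as in the markup posted above; it is meant for preparing test markup by hand, not as a general HTML sanitizer.

```python
import re

# Matches one data-* attribute with a double-quoted value,
# including the whitespace in front of it.
DATA_ATTRIBUTE = re.compile(r'\s+data-[\w-]+="[^"]*"')


def strip_data_attributes(html):
    """Return the markup with all data-* attributes removed."""
    return DATA_ATTRIBUTE.sub("", html)


# Shortened version of the img tag from the reported markup.
sample = (
    '<img src="a.jpg" data-htmlarea-file-uid="168" '
    'data-htmlarea-clickenlarge="1" height="200" />'
)
print(strip_data_attributes(sample))
```

Indexing the page once with the original markup and once with the stripped markup shows whether the data attributes make any difference.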

I'm sure we're narrowing this one down :)

#6

Updated by Siva Prasad over 6 years ago

Hi Mathias,

Thanks for the tip. I added a new test page with the same content that I provided in my last comment. Surprisingly, it was indexed properly both with and without the data attributes.

Now it's becoming more confusing !! :P

#7

Updated by Siva Prasad over 6 years ago

Hi Mathias,

Any updates on this?

#8

Updated by Mathias Schreiber over 6 years ago

Not really, since I haven't found the time to work on that one yet.
It's still on my agenda, but I can't really give you a dedicated deadline yet.

#9

Updated by Siva Prasad over 6 years ago

Hi Mathias,

Thanks for the information.

BR
Siva

#10

Updated by Tymoteusz Motylewski over 6 years ago

  • Parent task set to #65815

#11

Updated by Tymoteusz Motylewski over 6 years ago

I'm having a hard time trying to reproduce the issue.
Can you please provide steps to follow to reproduce it?
Thanks

#12

Updated by Siva Prasad over 6 years ago

Sorry for the delayed response.

Can you put an HTML image tag (<img src="some_path">) inside the bodytext of any tt_news item and then try to index it? I hope we can reproduce the issue this way.

Thanks & Regards
Siva

#13

Updated by Tymoteusz Motylewski about 6 years ago

Hi Siva,
1. Can you make sure that the whole content of the tt_news record is between the TYPO3SEARCH_ comments, like this:

<!--TYPO3SEARCH_begin-->

Your content (header)

Your content (bodytext)

<!--TYPO3SEARCH_end--> 

Indexed search only takes into account the content between these comments.

2. Can you also check in backend module, what content and words are indexed for this page?
3. Are you using the newest TYPO3 version (6.2.14)?
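The marker check from the first point can be done by hand on the rendered page source. Below is a minimal Python sketch under the assumption that everything outside a begin/end pair is ignored; it is not the indexer's actual code, and `indexable_sections` and `is_inside_markers` are hypothetical helper names.

```python
import re

BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"

# Non-greedy match so that several begin/end pairs on one page stay separate.
SECTION = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def indexable_sections(rendered_html):
    """Return the markup found between each begin/end marker pair."""
    return [section.strip() for section in SECTION.findall(rendered_html)]


def is_inside_markers(rendered_html, needle):
    """True if the needle occurs inside at least one marked section."""
    return any(needle in section for section in indexable_sections(rendered_html))


# Made-up page: the bodytext sits outside the markers by mistake.
page = (
    "<div>menu</div>"
    + BEGIN + "<h1>Your content (header)</h1>" + END
    + "<p>Your content (bodytext) BMC</p>"
)
print(indexable_sections(page))
print(is_inside_markers(page, "BMC"))
```

If the bodytext of the affected news records only shows up outside the marked sections, that would explain why the keyword is not found.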

#14

Updated by Riccardo De Contardi almost 6 years ago

  • Status changed from Needs Feedback to Closed

No feedback within the last 90 days => closing this issue.

If you think that this is the wrong decision or experience this issue again, then please write to the mailing list typo3.teams.bugs with issue number and an explanation or open a new ticket and add a relation to this ticket number.
