Bug #7362

nbsp are not detected to split words

Added by Mario Rimann about 3 years ago. Updated over 2 years ago.

Status: Resolved
Start date: 2010-04-19
Priority: Should have
Due date:
Assignee: Ingo Renner
% Done: 100%
Category: Indexing
Target version: 1.6-dkd
TYPO3 Version:
Has patch:
PHP Version:
Tags:
Votes: 0

Description

Text that contains non-breaking spaces or dashes gets concatenated while being indexed. This looks strange at least in the auto suggest functionality. Examples:
"In Olten ist..." -> "oltenist"
"Die Region Olten-Gösgen-Gäu..." -> "oltengosgengau"

Might be related to issue 4646 (not using stemmer for auto suggest stuff).

Replacing the non-breaking spaces within the content elements does solve the problem. But I'd prefer an automated way of handling this.
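
One automated approach would be to normalize separator characters before the text is handed to the indexer. The following is a minimal PHP sketch, not the fix that was committed for this issue. The function name normalizeWordSeparators() is hypothetical, and treating hyphens between letters as word breaks is an assumption based on the second example above.

```php
<?php
/**
 * Hypothetical helper, not part of the extension: replaces characters that
 * should separate words with a plain space before the text is indexed.
 */
function normalizeWordSeparators(string $text): string
{
    // U+00A0 is the non-breaking space; "&nbsp;" covers content that has
    // not been entity-decoded yet.
    $text = str_replace(['&nbsp;', "\u{00A0}"], ' ', $text);

    // Assumption: a hyphen between two letters separates two words.
    $text = preg_replace('/(?<=\p{L})-(?=\p{L})/u', ' ', $text);

    // Collapse runs of whitespace left behind by the replacements.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo normalizeWordSeparators("In\u{00A0}Olten\u{00A0}ist"), "\n";   // In Olten ist
echo normalizeWordSeparators('Die Region Olten-Gösgen-Gäu'), "\n"; // Die Region Olten Gösgen Gäu
```

With the separators normalized, a whitespace-based tokenizer would see "Olten" and "ist" as separate terms instead of "oltenist".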

Associated revisions

Revision 43972
Added by Ingo Renner about 2 years ago

Fixed issue #7362: nbsp are not detected to split words

History

Updated by Ingo Renner about 3 years ago

  • Target version deleted (1.1)

removed v1.1 target

Updated by Ingo Renner almost 3 years ago

Mario Rimann wrote:

> Text that contains non-breaking spaces or dashes gets concatenated while being indexed. This looks strange at least in the auto suggest functionality. Examples:
> "In Olten ist..." -> "oltenist"
> "Die Region Olten-Gösgen-Gäu..." -> "oltengosgengau"

How about a content post processing hook in tx_solr_Typo3PageContentExtractor::getIndexableContent()?

> Might be related to issue 4646 (not using stemmer for auto suggest stuff).

Probably not.

I checked the available filters in Solr, but didn't find any that would fit your case.
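
A post-processing hook of the kind suggested for getIndexableContent() could be shaped roughly as follows. This is only a sketch: the interface IndexableContentPostProcessor, the class NonBreakingSpaceProcessor, and the function applyPostProcessors() are hypothetical names invented for illustration and do not reflect the extension's actual hook API.

```php
<?php
// Hypothetical interface; name and signature are assumptions for this sketch.
interface IndexableContentPostProcessor
{
    public function processContent(string $content): string;
}

// Example processor that turns non-breaking spaces (U+00A0) into plain spaces.
final class NonBreakingSpaceProcessor implements IndexableContentPostProcessor
{
    public function processContent(string $content): string
    {
        return str_replace("\u{00A0}", ' ', $content);
    }
}

/**
 * Stand-in for the end of a content extractor: runs every registered
 * processor over the extracted text, in registration order.
 *
 * @param IndexableContentPostProcessor[] $processors
 */
function applyPostProcessors(string $content, array $processors): string
{
    foreach ($processors as $processor) {
        $content = $processor->processContent($content);
    }
    return $content;
}

echo applyPostProcessors("In\u{00A0}Olten\u{00A0}ist", [new NonBreakingSpaceProcessor()]), "\n";
```

Keeping the replacement in a separate processor would let a site add or remove such clean-up steps without touching the extractor itself.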

Updated by Ingo Renner almost 3 years ago

Ingo Renner wrote:

> How about a content post processing hook in tx_solr_Typo3PageContentExtractor::getIndexableContent()?

BTW: wasn't there a way in TypoScript to do search & replace? I'd rather use that if it's already available. I couldn't find anything during a quick check...

Updated by Ingo Renner over 2 years ago

Mario, can you please check whether this is still valid? I see we're already using html_entity_decode() in tx_solr_Typo3PageContentExtractor::getIndexableContent().
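
When re-testing, it may help to keep in mind that html_entity_decode() turns &nbsp; into the character U+00A0 rather than an ASCII space, so decoding alone only helps if whatever tokenizes the text afterwards treats U+00A0 as a separator. A small sketch with illustrative input and variable names (none of them taken from the extension):

```php
<?php
// Decode with an explicit charset so the result does not depend on PHP defaults.
$decoded = html_entity_decode('In&nbsp;Olten&nbsp;ist', ENT_QUOTES, 'UTF-8');

// The entity becomes U+00A0, not an ASCII space.
$containsNbsp  = strpos($decoded, "\u{00A0}") !== false; // true
$containsSpace = strpos($decoded, ' ') !== false;        // false

// Splitting on ASCII spaces therefore still yields a single token.
$naiveTokens = explode(' ', $decoded);

// Naming U+00A0 explicitly in a Unicode-aware pattern separates the words.
$tokens = preg_split('/[\s\x{00A0}]+/u', $decoded);

var_dump($containsNbsp, $containsSpace, count($naiveTokens));
print_r($tokens); // In, Olten, ist
```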

Updated by Ingo Renner over 2 years ago

  • Category set to Indexing
  • Status changed from New to Resolved
  • Assignee set to Ingo Renner
  • Target version set to 1.6-dkd
  • % Done changed from 0 to 100

Fixed in 1.6-dkd; will be in 1.3.
