Bug #7362
nbsp are not detected to split words
| Status: | Resolved | Start date: | 2010-04-19 |
|---|---|---|---|
| Priority: | Should have | Due date: | |
| Assignee: | Ingo Renner | % Done: | 100% |
| Category: | Indexing | | |
| Target version: | 1.6-dkd | | |
| TYPO3 Version: | | Has patch: | |
| PHP Version: | | Tags: | |
| Votes: | 0 | | |
Description
Text that contains non-breaking spaces or dashes gets concatenated while being indexed. This looks strange at least in the auto suggest functionality. Examples:
"In Olten ist..." -> "oltenist"
"Die Region Olten-Gösgen-Gäu..." -> "oltengosgengau"
Might be related to issue 4646 (not using stemmer for auto suggest stuff).
Replacing the non-breaking spaces within the content elements does solve the problem. But I'd prefer an automated way of handling this.
Associated revisions
Fixed issue #7362: nbsp are not detected to split words
Fixed issue #7362: nbsp are not detected to split words
History
Updated by Ingo Renner almost 3 years ago
Mario Rimann wrote:
> Text that contains non-breaking spaces or dashes gets concatenated while being indexed. This looks strange at least in the auto suggest functionality. Examples: "In Olten ist..." -> "oltenist" "Die Region Olten-Gösgen-Gäu..." -> "oltengosgengau"

How about a content post-processing hook in tx_solr_Typo3PageContentExtractor::getIndexableContent()?

> Might be related to issue 4646 (not using stemmer for auto suggest stuff).

Probably not.
I checked the available filters in Solr, but didn't find any that would fit your case.
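A minimal sketch of what such a post-processing step could do, written in Python for illustration rather than in the extension's PHP. The function name `normalize_separators` and the set of characters it replaces are assumptions made for this example; they do not describe what the eventual fix implemented.

```python
import re

# Characters assumed (for illustration only) to act as word separators:
# U+00A0 no-break space, U+202F narrow no-break space,
# U+2011 non-breaking hyphen, and the ordinary hyphen-minus.
_SEPARATORS = re.compile("[\u00a0\u202f\u2011-]+")


def normalize_separators(text: str) -> str:
    """Replace non-breaking spaces and dashes with plain spaces."""
    replaced = _SEPARATORS.sub(" ", text)
    # Collapse runs of ASCII spaces left behind by the replacement.
    return re.sub(r" {2,}", " ", replaced).strip()


if __name__ == "__main__":
    print(normalize_separators("In Olten\u00a0ist"))
    print(normalize_separators("Die Region Olten\u2011Gösgen\u2011Gäu"))
```

Running the content through a step like this before it reaches the indexer would keep "Olten" and "ist" as separate tokens, at the cost of also splitting hyphenated compounds.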
Updated by Ingo Renner almost 3 years ago
Ingo Renner wrote:
> How about a content post processing hook in tx_solr_Typo3PageContentExtractor::getIndexableContent() ?

By the way, wasn't there a way to do search and replace in TypoScript (TS)? I'd rather use that if it's already available. I couldn't find anything during a quick check.
Updated by Ingo Renner over 2 years ago
Mario, can you please check whether this is still valid? I see we're already using html_entity_decode() in tx_solr_Typo3PageContentExtractor::getIndexableContent().
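Whether entity decoding alone is enough depends on what the tokenizer does with the decoded character. The sketch below uses Python's `html.unescape` as a stand-in for PHP's `html_entity_decode()`; this is an analogy, not the extension's code, and PHP's result also depends on the charset it is given. It shows that `&nbsp;` decodes to U+00A0, which is still distinct from an ordinary space.

```python
import html

decoded = html.unescape("In Olten&nbsp;ist")

# The entity becomes the single character U+00A0 (no-break space).
assert "\u00a0" in decoded

# Splitting on the ASCII space only leaves "Olten" and "ist" joined
# by the no-break space.
ascii_tokens = decoded.split(" ")

# Replacing U+00A0 with a plain space first separates them.
fixed_tokens = decoded.replace("\u00a0", " ").split(" ")
```

If the indexer treats U+00A0 as whitespace the decoded text is already fine; if it does not, an explicit replacement like the last line is still needed.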
Updated by Ingo Renner over 2 years ago
- Category set to Indexing
- Status changed from New to Resolved
- Assignee set to Ingo Renner
- Target version set to 1.6-dkd
- % Done changed from 0 to 100