Task #7469
Story #12706: Redesign search plugin
Indexing and search is broken
| Status: | Needs Feedback | Start date: | 2010-04-26 |
|---|---|---|---|
| Priority: | Should have | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | - | | |
| Target version: | 1.10.0 | | |
| Votes: | 0 | | |
Description
I have two forum categories: "OpenSCADA" and "OpenSCADA EN". Category "OpenSCADA EN" is placed on the "en" part of the main page, and "OpenSCADA" is placed on the "ru" and "ua" parts of the translated pages.
I moved several topics into "OpenSCADA". Since then, every indexing run produces messages like these:
Could not find topic 85
Could not find topic 27
Could not find topic 15
Could not find topic 26
Could not find topic 37
As a result, search works only for "OpenSCADA EN"; not a single message from "OpenSCADA" shows up in the results.
History
Updated by Roman Savochenko about 3 years ago
Roman Savochenko wrote:
As a result, search works only for "OpenSCADA EN"; not a single message from "OpenSCADA" shows up in the results.
I found the problem in your code.
It discards all non-ASCII characters in UTF-8 text, so in my case Cyrillic does not work at all.
I fixed the problem by removing the line "$string = preg_replace('/\W/',' ',$string);" from these functions:
- tx_mmforum_pi4::searchform()
- tx_mmforum_indexing::wordArray()
Search now works correctly.
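The sketch below illustrates the cause, assuming UTF-8 text; the locale is forced to "C" so the result does not depend on the environment. Without the /u modifier, PCRE matches byte by byte, so each byte of a two-byte Cyrillic letter counts as a non-word character and becomes a space. In PHP, adding /u switches PCRE to UTF-8 mode and also makes \w and \W Unicode-aware. This only demonstrates the behaviour; it is not the code shipped in tx_mmforum.

```php
<?php
// Demonstration only: the cleanup line from the post, with and without /u.
// Assumes UTF-8 input; the locale is forced to "C" for a predictable result.
setlocale(LC_CTYPE, 'C');

$subject = 'OpenSCADA Привет';

// Original pattern: without /u, \W is applied to single bytes, so all
// 12 bytes of the six Cyrillic letters are replaced with spaces.
$byteWise = preg_replace('/\W/', ' ', $subject);

// With /u, PHP compiles the pattern in UTF-8 mode with Unicode properties,
// so Cyrillic letters count as word characters and are kept.
$utf8Aware = preg_replace('/\W/u', ' ', $subject);

var_dump($byteWise);   // "OpenSCADA" followed by 13 spaces
var_dump($utf8Aware);  // "OpenSCADA Привет", unchanged
```

With /u the line still replaces punctuation, as originally intended, while leaving Cyrillic intact. One trade-off: on invalid UTF-8 input, preg_replace() returns null in /u mode, so callers may need to handle that case.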
Updated by Martin Helmich about 3 years ago
- Status changed from New to Needs Feedback
- Target version deleted (1.9.0)
Hi there,
Just removing this regular expression replacement is problematic too: without it, all kinds of special characters (such as punctuation) will be indexed as well. The indexer would then be unable to tell that "Hello!" and "Hello" are actually the same word.
The characters matched by the \W expression are affected by the locale setting. We've had a similar problem while setting up the indexing for the TYPO3.net forum. We fixed it by explicitly specifying the charset in the locale, just before the regular expression:
setlocale(LC_ALL, 'de_DE.iso-8859-1');
The config.locale_all directive in the TypoScript setup does not seem to affect the command-line scripts, which is why the locale needs to be set explicitly in the PHP script. Maybe we should implement an option to configure the locale used for indexing separately?
Regards, Martin
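One way to address both concerns without depending on the process locale is sketched below, assuming all forum text is stored as UTF-8. The helper indexWords() is hypothetical and is not tx_mmforum_indexing::wordArray(); it splits on any run of characters that are not Unicode letters or digits, so "Hello!" and "Hello" yield the same word while Cyrillic words survive. Case folding (for example with mb_strtolower()) is left out to avoid requiring the mbstring extension.

```php
<?php
// Hypothetical helper, not part of tx_mmforum: split UTF-8 text into index
// words. Any run of characters that is not a Unicode letter (\p{L}) or
// digit (\p{N}) acts as a separator, so punctuation is dropped.
function indexWords(string $text): array
{
    // /u: treat pattern and subject as UTF-8 instead of single bytes.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8 input; treat that as "no words".
    if ($words === false) {
        return [];
    }

    // Keep each distinct word once and renumber the keys from 0.
    return array_values(array_unique($words));
}

// Prints an array containing "Hello" and "мир".
print_r(indexWords('Hello! Hello, мир.'));
```

Because the character classes are Unicode properties rather than locale tables, the result is the same for the command-line indexer and the search form, whatever locale each process runs with.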
Updated by Roman Savochenko about 3 years ago
The expression '/[*-_!~`#$%&\(\)|\']/' fixes the problem completely for me.
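A caveat worth checking before adopting this pattern: inside a character class, *-_ is a range, not three literal characters. It spans bytes 0x2A to 0x5F, which includes the digits 0-9 and the uppercase letters A-Z. Cyrillic survives because none of its UTF-8 bytes fall in that range or in the other listed characters, but ASCII capitals and digits are replaced. The sketch below shows the effect next to a variant that escapes the hyphen; the variant is an illustration only, not a tested replacement for the tx_mmforum code.

```php
<?php
// The expression from the post. Inside [...], "*-_" is the range 0x2A..0x5F,
// which also covers 0-9, A-Z and characters such as + , . / : ; < = > ? @ [ ] ^
$fromPost = '/[*-_!~`#$%&\(\)|\']/';

// Same character list with the hyphen escaped, so "*", "-" and "_" are
// three literal characters instead of a range.
$literalHyphen = '/[*\-_!~`#$%&\(\)|\']/';

$subject = 'Hello World 2010!';

var_dump(preg_replace($fromPost, ' ', $subject));       // capitals and digits become spaces
var_dump(preg_replace($literalHyphen, ' ', $subject));  // only "!" is replaced
var_dump(preg_replace($fromPost, ' ', 'Привет'));       // Cyrillic bytes are untouched
```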
Updated by Martin Helmich over 2 years ago
- Parent task set to #12706