Bug #15020
closed
mysql like regarding german "umlaute"
Added by Nikolas Hagelstein about 19 years ago.
Updated about 9 years ago.
Description
When performing a LIKE search on index_search.baseword for a word containing a special character such as a German "umlaut", e.g. "ß",
mysql returns
"schloß"
but also
"schloss"
This is because MySQL reduces characters to their "base character" unless the db field is flagged as binary.
A result row containing the base-character word instead of the original search word makes indexed search fail when rendering the preview and highlighting the search word.
A quick and dirty workaround is to set index_search.baseword to binary.
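The effect described above can be reproduced outside the database. The following is only a rough sketch in Python, not MySQL's actual collation tables: `base_form`, `matches_folded` and `matches_binary` are hypothetical helpers that approximate a comparison on "base characters" versus a byte-wise (binary) comparison.

```python
import unicodedata


def base_form(word: str) -> str:
    """Reduce a word to lowercase base characters (hypothetical helper)."""
    # casefold() lowercases and expands "ß" to "ss".
    decomposed = unicodedata.normalize("NFD", word.casefold())
    # Drop combining marks so that e.g. "ö" becomes "o".
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches_folded(a: str, b: str) -> bool:
    """Compare two words on their base characters only."""
    return base_form(a) == base_form(b)


def matches_binary(a: str, b: str) -> bool:
    """Compare two words byte by byte, as a binary column would."""
    return a.encode("utf-8") == b.encode("utf-8")


words = ["schloß", "schloss", "Schloß"]
folded_hits = [w for w in words if matches_folded(w, "schloß")]
binary_hits = [w for w in words if matches_binary(w, "schloß")]
print(folded_hits)
print(binary_hits)
```

With the folded comparison all three spellings count as hits, while the binary comparison only accepts the exact spelling, which is why the binary workaround also switches off case-insensitive matching.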
(issue imported from #M1561)
We cannot change this table field to binary, as this table is needed quite often and changing this field to binary might have serious speed impacts.
Are there some frontend errors because of this problem?
Greets, Sebastian
Are there some frontend errors because of this problem?
Yes, there are:
"a result row containing the base-character word instead of the original search word makes indexed search fail when rendering the preview and highlighting the search word."
cheers,
Nikolas
Hi Michael,
maybe you can have a look at this, as you know indexed search very well.
Greets, Sebastian
Changing the field to a blob requires that the baseword index be removed.
This is definitely a bad idea and I will not change this unless there is another solution.
Forget my last comment, BINARY != BLOB
Yes I think this can be changed now :-)
Michael:
but... changing to binary would disable any "intelligent" search, which is possibly wanted. A better solution would be to improve the "highlight searched word" part.
Cheers,
Nikolas
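One way to picture what "improving the highlight part" could mean is sketched below in Python. This is not the indexed search code (which is PHP) and not a proposed patch. It assumes the highlighter folds both the text and the search word to base characters, remembers which original character each folded character came from, and then wraps the original span. The names `fold` and `highlight` and the `<b>` tags are made up for illustration.

```python
import unicodedata


def fold(ch: str) -> str:
    """Approximate a case- and accent-insensitive collation for one character."""
    decomposed = unicodedata.normalize("NFD", ch.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def highlight(text: str, word: str, open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap every span of `text` that matches `word` after folding."""
    folded_chars = []  # folded text, one entry per folded character
    origin = []        # origin[i] = index in `text` that produced folded_chars[i]
    for i, ch in enumerate(text):
        for f in fold(ch):
            folded_chars.append(f)
            origin.append(i)
    haystack = "".join(folded_chars)
    needle = "".join(fold(ch) for ch in word)
    if not needle:
        return text

    out = []
    pos = 0  # how much of the original text has been copied so far
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle) - 1
        # Map the folded match back to whole characters of the original text.
        o_start, o_end = origin[start], origin[end] + 1
        if o_start >= pos:
            out.append(text[pos:o_start])
            out.append(open_tag + text[o_start:o_end] + close_tag)
            pos = o_end
        start = haystack.find(needle, end + 1)
    out.append(text[pos:])
    return "".join(out)


print(highlight("Das Schloß ist alt", "schloss"))
```

Because the match is mapped back through `origin`, the highlighted span keeps the original spelling ("Schloß") even though the search word was "schloss", so the database could keep its non-binary collation.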
What about changing the LIKEs in tx_indexedsearch::getPhashList() to something like this:
CONVERT LIKE CONVERT
Just tested the mentioned CONVERT syntax; it works well on multiple installations, give it a try :)
Requires MySQL 4.1 or higher.
Regards,
Chris
Setting the collation of the table index_search to utf8_general_ci instead of utf8_unicode_ci should avoid the whole problem outside TYPO3.
The collation of the field baseword must also be utf8_general_ci, not utf8_unicode_ci.
The syntax from note #22214 would break DBAL compatibility. As such -1 for this solution.
I am going to add this information to the documentation. In general, the current solution works correctly for most languages. For example, in Latvian people often omit accents and search with plain Latin letters. Google allows that and finds the correct words. For example, people type "stradat" and expect to find "strādāt". This works correctly and according to expectations. Therefore it is not a bug. If this behavior is not desired, the database should be updated locally.
- Description updated (diff)
- Status changed from Accepted to Needs Feedback
- Target version deleted (0)
- TYPO3 Version set to 7
- Is Regression set to No
is this still viable with the entire utf8 stuff we changed?
Mathias Schreiber wrote:
is this still viable with the entire utf8 stuff we changed?
No idea, haven't tested this for ages (literally).
the search works correctly (finds accented words when searching without accents)
however it doesn't highlight the accented word in the text in that case
So we can close this issue?
Highlighting of words isn't easily possible.
- Status changed from Needs Feedback to Closed
No feedback within the last 90 days => closing this issue.
If you think that this is the wrong decision or experience this issue again, then please write to the mailing list typo3.teams.bugs with issue number and an explanation or open a new ticket and add a relation to this ticket number.