indexed_search and utf transformations
indexed_search version: all (trying to fix it in 2.9.2)
I have a site with two languages:
the default language, which is set to Greek, and English.
The database is set to:
MySQL charset: (utf8)
MySQL connection collation: utf8_general_ci
And the typo3 database is: MyISAM utf8_general_ci
In localconf, forceCharset is set to utf-8.
When I search in Greek (both with and without the L variable set) for a single Greek word, the input field "tx_indexedsearch[sword]" returns and renders the word correctly (in readable Greek), but the search is conducted for a few repeated Greek letters. You can see this in the attached picture. I know it is Greek to you, but the problem is obvious.
When I search for an English word, everything works fine.
Now when I search on the English page (with L=1) for an English word, again everything works fine. But when I search for a Greek word, another strange issue appears. The "search for" string renders the Greek word correctly, but the input field returns the word as its UTF equivalent, which looks something like the following
I tried to fix that by removing the htmlspecialchars() call where the input field is created in PHP, but I don't believe that this is the correct approach.
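The entity form in the input field can be reproduced in plain PHP. This is a sketch under two assumptions: that the browser submitted the Greek word as numeric character references (browsers do this when the page charset cannot represent a character), and that the field value is escaped with a default htmlspecialchars() call. The variable names are illustrative only. The fourth argument of htmlspecialchars(), double_encode, controls whether existing entities are escaped a second time, which is a narrower change than removing the call altogether.

```php
<?php
// Assumed submission: "λόγος" sent from an iso-8859-1 page, where the browser
// replaces each Greek letter with a numeric character reference.
$sword = '&#955;&#972;&#947;&#959;&#962;';

// Default escaping turns every "&" into "&amp;", so the field shows the raw references.
$escaped = htmlspecialchars($sword, ENT_QUOTES, 'ISO-8859-1');

// double_encode = false leaves existing entities intact while still escaping
// quotes, "<", ">" and stray "&" characters.
$kept = htmlspecialchars($sword, ENT_QUOTES, 'ISO-8859-1', false);

echo '<input type="text" name="tx_indexedsearch[sword]" value="' . $escaped . '">', "\n";
echo '<input type="text" name="tx_indexedsearch[sword]" value="' . $kept . '">', "\n";
```

The second field displays the Greek word in the browser, but it will be submitted as references again, so the search code still has to decode them on input.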
The problem is the way the page is rendered. If I use utf-8 as the metaCharset, the problems are solved, but then I don't have a proper localization, and I really don't like that.
Because I am new to the TYPO3 philosophy and my PHP programming skills are not so good, I would appreciate some help with this issue.
(issue imported from #M4303)
Updated by Thanos no-lastname-given over 14 years ago
I delayed posting this bug, and in the meantime I made some progress that I would like to share. I found that utf8_encode() is applied twice to the search word.
At line 414 of pi/class.tx_indexedsearch.php, the function getSearchWords($defOp) takes the query string, which is already UTF-8 encoded, and splits it into words. That is not correct, because the spaces are lost. It also applies utf8_encode() to that string again, and with that the Greek words are lost for good.
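The double encoding can be reproduced outside TYPO3. The sketch below assumes the extra call behaves like PHP's utf8_encode(), i.e. it reads every byte as ISO-8859-1; the helpers latin1ToUtf8() and toUtf8Once() are hypothetical and not part of indexed_search. It shows that re-encoding an already UTF-8 Greek word doubles its byte length and turns each letter into two Latin-1 lookalikes, and how a validity check could skip the second conversion.

```php
<?php
// Hypothetical stand-in for utf8_encode(): treats every input byte as an
// ISO-8859-1 character and re-encodes it as UTF-8.
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i]; // ASCII byte: copied unchanged
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical guard: convert only if the input is not already valid UTF-8.
// preg_match() with the /u modifier returns false for malformed UTF-8.
function toUtf8Once(string $s): string
{
    return preg_match('//u', $s) === 1 ? $s : latin1ToUtf8($s);
}

$greek = "\xCE\xBB\xCF\x8C\xCE\xB3\xCE\xBF\xCF\x82"; // "λόγος" as UTF-8 bytes
$twice = latin1ToUtf8($greek);                      // the second, unwanted encoding

echo strlen($greek), ' -> ', strlen($twice), " bytes\n"; // 10 -> 20 bytes
echo bin2hex(latin1ToUtf8("\xCE\xBB")), "\n";            // c38ec2bb ("Î»")
var_dump(toUtf8Once($greek) === $greek);                 // bool(true)
```

The guard is only a heuristic: a short ISO-8859-7 byte sequence can occasionally be valid UTF-8, and for Greek ISO input the Latin-1 assumption behind utf8_encode() is wrong in the first place.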
I made a primitive fix by using the following code:
$fixString = $GLOBALS['TSFE']->csConvObj->utf8_decode($this->piVars['sword'], $GLOBALS['TSFE']->metaCharset);
// Shorten search-word string to max 200 bytes (does NOT take multibyte charsets into account - but never mind, shortening the string here is only a run-away feature!)
$inSW = substr($fixString,0,200);
The rest of the function's code is unchanged.
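The comment in the snippet notes that substr() counts bytes, not characters. If the string were truncated while still in UTF-8, a 200-byte cut could split a two-byte Greek letter in half. A minimal sketch of a byte-limited cut that backs off to a character boundary follows; truncateUtf8Bytes() is a hypothetical helper and assumes its input is valid UTF-8.

```php
<?php
// Hypothetical helper: cut a valid UTF-8 string to at most $maxBytes bytes
// without splitting a multi-byte character.
function truncateUtf8Bytes(string $s, int $maxBytes): string
{
    if (strlen($s) <= $maxBytes) {
        return $s;
    }
    $cut = $maxBytes;
    // Step back while the first dropped byte is a continuation byte (10xxxxxx),
    // i.e. while the cut would fall inside a character.
    while ($cut > 0 && (ord($s[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($s, 0, $cut);
}

$long  = str_repeat("\xCE\xBB", 150);   // 150 x "λ", 300 bytes
$naive = substr($long, 0, 199);         // ends with half a letter
$safe  = truncateUtf8Bytes($long, 199); // backs off to 198 bytes

var_dump(preg_match('//u', $naive) === 1);                 // bool(false)
var_dump(strlen($safe), preg_match('//u', $safe) === 1);   // int(198) bool(true)
```

After the conversion to a single-byte charset such as ISO-8859-7, a plain substr() is safe again, because every character is one byte.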
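I assume that this bug does not appear with English words because the first 128 characters (the 7-bit ASCII range) are encoded identically in UTF-8 and ISO-8859-1. For Greek this is not the case: the Greek letters lie above that range, so their ISO bytes and their UTF-8 bytes differ.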
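The difference can be made concrete at the byte level. In ISO-8859-7 the Greek letters occupy the upper half of the byte range, and for bytes 0xC0 to 0xFE (except the unassigned 0xD2) the Unicode code point is the byte value plus 0x2D0. The sketch below is a deliberately partial decoder built on that offset; iso88597ToUtf8() and cpToUtf8() are hypothetical helpers, and real code should rely on a complete conversion such as the csConvObj used in the fix. It shows that ASCII bytes pass through unchanged, while the same high byte means a different character when read as ISO-8859-7 or as ISO-8859-1.

```php
<?php
// Hypothetical helper: UTF-8 encoding for code points below U+0800.
function cpToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical, partial ISO-8859-7 decoder: handles ASCII and the Greek block
// 0xC0..0xFE only (0xD2 is unassigned), where code point = byte + 0x2D0.
function iso88597ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];               // ASCII: identical in both charsets
        } elseif ($b >= 0xC0 && $b <= 0xFE && $b !== 0xD2) {
            $out .= cpToUtf8($b + 0x2D0); // e.g. 0xE1 -> U+03B1 (alpha)
        } else {
            throw new InvalidArgumentException(sprintf('byte 0x%02X not handled', $b));
        }
    }
    return $out;
}

$isoGreek = "\xEB\xFC\xE3\xEF\xF2"; // "λόγος" in ISO-8859-7
echo bin2hex(iso88597ToUtf8($isoGreek)), "\n"; // cebbcf8cceb3cebfcf82
echo bin2hex(iso88597ToUtf8("\xE1")), "\n";    // ceb1: 0xE1 read as ISO-8859-7 (alpha)
echo bin2hex(cpToUtf8(0xE1)), "\n";            // c3a1: 0xE1 read as ISO-8859-1 ("á")
```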
Of course this is not the correct approach, but it is a start.
First, the initial utf8_encode() call should be located and prevented.
Second, the other part of the bug should be located. If I search for a Greek word on the English page, the wrong encoding persists. That is because the use of the
where in the English translation the metaCharset is an English ISO charset and the word I am searching for is Greek.
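One possible way to make the search word independent of the page's metaCharset is to normalize it to UTF-8 before it is split into words. The sketch below assumes the word arrives either as UTF-8 or, from the English page, as ASCII with the Greek letters encoded as numeric character references; normalizeSword() is a hypothetical helper, not an indexed_search function, and whether it fits the 2.9.2 code path is untested. It relies on html_entity_decode() with UTF-8 as the target charset.

```php
<?php
// Hypothetical helper: assumes the submitted word is either valid UTF-8 or
// ASCII text in which the browser encoded non-representable letters as
// numeric character references (e.g. "&#955;").
function normalizeSword(string $sword): string
{
    // ASCII is a subset of UTF-8, so both assumed inputs pass this check.
    if (preg_match('//u', $sword) !== 1) {
        throw new InvalidArgumentException('search word is neither UTF-8 nor ASCII');
    }
    // Decode HTML character references (numeric and named); the result is UTF-8.
    return html_entity_decode($sword, ENT_QUOTES, 'UTF-8');
}

$fromEnglishPage = '&#955;&#972;&#947;&#959;&#962;';                 // assumed submission, iso-8859-1 page
$fromGreekPage   = "\xCE\xBB\xCF\x8C\xCE\xB3\xCE\xBF\xCF\x82";     // same word, already UTF-8

var_dump(normalizeSword($fromEnglishPage) === $fromGreekPage); // bool(true)
```

The trade-off is that a user who literally types "&#955;" is also searched as "λ", and the decoded word must be escaped again whenever it is written back into the page.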
Any ideas? …