Bug #16606

indexed_search and utf transformations

Added by Thanos no-lastname-given about 14 years ago. Updated over 7 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2006-09-29
% Done: 0%
TYPO3 Version: 4.1

Description

indexed_search version: all. I am trying to fix it in 2.9.2.

I have a site with two languages:
the default language, which is set to Greek, and English.
The database is set to:
MySQL charset: (utf8)
MySQL connection collation: utf8_general_ci
And the TYPO3 database is: MyISAM, utf8_general_ci

In localconf.php, forceCharset is set to utf-8:
$TYPO3_CONF_VARS['BE']['forceCharset']='utf-8';

When I search in Greek (with or without the L variable set) for a single Greek word, the input field tx_indexedsearch[sword] returns and renders the word correctly (in readable Greek), but the search is conducted for a few repeating Greek letters. You can see this in the attached picture. I know it is Greek to you, but the problem is obvious.
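One plausible explanation for the "few repeating Greek letters" (an assumption, not confirmed in this report): if the UTF-8 bytes of the search word are treated as ISO-8859-7, the Greek ISO charset, every Greek letter becomes two characters, and the UTF-8 lead bytes 0xCE and 0xCF appear again and again as the capitals Ξ and Ο. The Python sketch below reproduces that effect; `misread_as_iso_8859_7` is a hypothetical helper, not part of indexed_search.

```python
# Sketch (assumption): what a UTF-8 Greek word looks like when its bytes
# are interpreted as ISO-8859-7 instead of UTF-8.

def misread_as_iso_8859_7(word: str) -> str:
    """Hypothetical helper: encode `word` as UTF-8, then decode those bytes
    as ISO-8859-7. Bytes undefined in ISO-8859-7 (e.g. 0xAE) become U+FFFD."""
    return word.encode("utf-8").decode("iso-8859-7", errors="replace")


garbled = misread_as_iso_8859_7("αβγ")
# Lowercase alpha..omicron all start with byte 0xCE, which ISO-8859-7 maps to Xi.
print(garbled)  # Ξ±Ξ²Ξ³
```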

When I search for an English word then everything works fine.

Now, when I search on the English page (with L=1) for an English word, again everything works fine. But when I search for a Greek word, another strange issue appears: the "search for" string renders the Greek word correctly, but the input field shows the word in its UTF equivalent, which looks something like the following:

(㥀㥀㥀㥀㥀㥀㥀㥀㥀㥀㥀 )

I tried to fix that by removing the htmlspecialchars() call when the input field is created in PHP, but I don't believe that this is the correct approach.
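A likely mechanism for the English-page symptom (an assumption, not verified against this installation): when a form sits on a page whose charset cannot represent a character, browsers typically submit that character as a decimal numeric character reference such as `&#945;`. Escaping that value again with htmlspecialchars() turns `&` into `&amp;`, so the field shows the codes instead of the letters. The Python sketch below models both steps; `browser_submits` is hypothetical, and `html.escape` stands in for htmlspecialchars().

```python
import html


def browser_submits(word: str, page_charset: str) -> str:
    """Hypothetical model of a browser posting `word` from a page encoded in
    `page_charset`: unencodable characters are sent as &#NNN; references."""
    return word.encode(page_charset, errors="xmlcharrefreplace").decode(page_charset)


submitted = browser_submits("αβγ", "iso-8859-1")
print(submitted)               # &#945;&#946;&#947;

# Escaping the submitted value again, as htmlspecialchars() would, escapes the '&'.
print(html.escape(submitted))  # &amp;#945;&amp;#946;&amp;#947;
```

Under this model, removing htmlspecialchars() only hides the symptom by letting the raw references through to the browser, which also leaves the field open to HTML injection.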

The problem is the way the page is rendered. If I use utf-8 as metaCharset, the problems are solved, but then I don't have a proper localization, and I really don't like that.

So something in the TYPO3 core that handles the UTF conversions is not working correctly. I have also noticed the same problem with Greek text when I submit something from the frontend (for example, in a chat) and when I show a JavaScript alert.

Because I am new to the TYPO3 philosophy and my PHP programming skills are not very good, I would appreciate some help with this issue.

(issue imported from #M4303)


Files

index_search1.gif (6.09 KB), added by Administrator Admin, 2006-09-29 14:31
#1

Updated by Thanos no-lastname-given about 14 years ago

I delayed posting this bug and have made some progress that I would like to share: I found that utf8_encode is applied twice to the search word.

At line 414 of pi/class.tx_indexedsearch.php, the function getSearchWords($defOp) takes the query string, which is already UTF-8 encoded, and then splits it into words. That is not correct because the spaces are lost. It also applies utf8_encode to that string a second time, and in doing so the Greek words are lost.
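To show why a second utf8_encode is harmful, here is a Python model of PHP's utf8_encode(), which converts ISO-8859-1 to UTF-8. Fed bytes that are already UTF-8, it treats each byte as a separate Latin-1 character and re-encodes it, so the result no longer matches the UTF-8 words stored in the index. The function name `utf8_encode_latin1` is hypothetical and only mirrors the PHP function's documented behavior.

```python
def utf8_encode_latin1(data: bytes) -> bytes:
    """Model of PHP's utf8_encode(): treat every byte as an ISO-8859-1
    character and re-encode it as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


once = "α".encode("utf-8")        # b'\xce\xb1', correct UTF-8 for alpha
twice = utf8_encode_latin1(once)  # b'\xc3\x8e\xc2\xb1', double-encoded
print(twice.decode("utf-8"))      # Î± (two Latin-1 characters, not alpha)

# ASCII input is unaffected, which is why English search words still work.
print(utf8_encode_latin1(b"search"))  # b'search'
```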

I made a primitive fix by using the following code:
// Convert the UTF-8 search word back to the page charset (metaCharset)
$fixString = $GLOBALS['TSFE']->csConvObj->utf8_decode($this->piVars['sword'], $GLOBALS['TSFE']->metaCharset);
// Shorten search-word string to max 200 bytes (does NOT take multibyte charsets into account - but never mind, shortening the string here is only a run-away feature!)
$inSW = substr($fixString, 0, 200);

The rest of the function's code is unchanged.
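The comment in the snippet notes that the 200-byte cut ignores multibyte charsets. When the string is UTF-8, each Greek letter takes two bytes, so a cut at a fixed byte offset can land inside a letter and leave an invalid trailing byte. A minimal Python sketch of a boundary-safe cut follows; `truncate_utf8` is a hypothetical helper. In PHP, the mbstring function mb_strcut() serves the same purpose.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 `data` to at most `max_bytes` bytes without splitting a
    multibyte character (hypothetical helper)."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # While data[end] is a continuation byte (10xxxxxx), the cut would fall
    # inside a character, so move the cut back to that character's start.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


word = "αβγ".encode("utf-8")  # 6 bytes, 2 per letter
print(truncate_utf8(word, 3).decode("utf-8"))  # α (a naive cut keeps a stray 0xCE)
```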

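I assume that this bug does not appear with English words because UTF-8 and ISO-8859-1 encode the first 128 characters (code points 0 to 127, i.e. ASCII) identically. Greek letters, however, lie above that range: the Greek ISO charset encodes them differently from UTF-8.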
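The difference can be checked directly. The Python sketch below (the helper `encodings` is hypothetical) prints the bytes of a Latin letter and a Greek letter in the three charsets discussed in this report. The ASCII letter is the same single byte everywhere, while alpha is one byte in ISO-8859-7, two bytes in UTF-8, and not representable at all in ISO-8859-1.

```python
def encodings(ch: str) -> dict:
    """Return the bytes of `ch` in each charset, or None if the charset
    cannot represent it (hypothetical helper)."""
    result = {}
    for charset in ("utf-8", "iso-8859-1", "iso-8859-7"):
        try:
            result[charset] = ch.encode(charset)
        except UnicodeEncodeError:
            result[charset] = None
    return result


print(encodings("a"))  # {'utf-8': b'a', 'iso-8859-1': b'a', 'iso-8859-7': b'a'}
print(encodings("α"))  # {'utf-8': b'\xce\xb1', 'iso-8859-1': None, 'iso-8859-7': b'\xe1'}
```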

Of course, this is not the correct approach, but it is a start.
First, the initial utf8_encode call should be located and prevented.
Second, the other part of the bug should be located. If I search for a Greek word on the English page, the wrong encoding persists. That is because of the call

utf8_encode($inSW, $GLOBALS['TSFE']->metaCharset)
where, on the English translation, metaCharset is an ISO charset for English (such as ISO-8859-1), while the word I am searching for is Greek.
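The round trip described here can be modeled as follows: convert the UTF-8 search word down to the page charset, then back. On a Greek (ISO-8859-7) page the word survives, but on an ISO-8859-1 page the Greek letters have no code points, so they are replaced and cannot be recovered. This Python sketch assumes unmappable characters become `?`; the actual substitution performed by TYPO3's charset conversion may differ, and `round_trip` is a hypothetical name.

```python
def round_trip(word: str, page_charset: str) -> str:
    """Hypothetical model: convert `word` to `page_charset` (unmappable
    characters become '?'), then decode it back to Unicode."""
    return word.encode(page_charset, errors="replace").decode(page_charset)


print(round_trip("αβγ", "iso-8859-7"))  # αβγ (Greek page: the word survives)
print(round_trip("αβγ", "iso-8859-1"))  # ??? (English page: the word is lost)
```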

Any ideas? …

#2

Updated by Markus Klein almost 9 years ago

  • Target version deleted (0)

Is this still valid?

#3

Updated by Chris topher almost 9 years ago

  • Status changed from New to Needs Feedback
  • TYPO3 Version set to 4.1

#4

Updated by Alexander Opitz over 7 years ago

  • Status changed from Needs Feedback to Closed

No response for over 1 year => closed.
