Bug #19400

closed

Indexed search doesn't find letters in Thai language.

Added by Juraj Sulek about 16 years ago. Updated about 11 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2008-10-01
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 4.3
PHP Version:
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

I have a multilingual site with 35 languages. Some of the languages use non-Latin scripts, such as Russian, Chinese, Arabic, Thai, and more. The site is UTF-8 encoded. Indexed search has the same configuration for each language; only the L parameter differs. The search works for all languages except Thai.
I have verified this in the database: the Thai page is indexed, but the search finds 0 records when I search for a word.

Thai uses letters such as these:
(The Thai sample text was not preserved; the tracker shows only question marks in its place.)

Does anyone know why the search doesn't work? As I wrote above, the search works fine for Chinese, Russian, and other languages with special characters; only Thai doesn't work.

(issue imported from #M9461)

Actions #1

Updated by Juraj Sulek about 16 years ago

As I see, it is not possible to copy the letters from the page directly here. Here is a page with Thai letters: http://www.bmeia.gv.at/th/botschaft/bangkok/impressum.html

Actions #2

Updated by Peter Niederlag over 15 years ago

Has this been solved in the meantime?

Actions #3

Updated by Dmitry Dulepov about 14 years ago

This cannot really be solved. We spent a lot of time investigating this and found that every Asian language has its own specifics, so it is not possible to build a common algorithm to recognize characters there. For example, Japanese has three different writing systems, and some of them can be mixed. It is beyond our "open source" powers to implement that algorithm.

Currently, indexed search implements a very simple approach. Each Asian character consists of 2-3 bytes. Suppose the page text is "~ab~cd~efh~ijk" (~ separates characters; each letter stands for one byte) and you need to search for "xa~bc". Indexed search will try this:
abcdefhijk matches xabc?
bcdefhijk matches xabc?
cdefhijk matches xabc?
and so on. This is a standard, simple way to do it, and it often gives good results, but not always (it depends a lot on the language and the text). I do not think we can implement anything more sophisticated than that.
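
The byte-by-byte comparison described above can be sketched in a few lines of Python. This is only an illustration of the sliding-window idea, not the actual indexed_search code; the function name `byte_window_matches` and the sample strings are assumptions made for the example.

```python
def byte_window_matches(text: str, query: str, encoding: str = "utf-8") -> list[int]:
    """Return every byte offset in `text` at which the bytes of `query` occur.

    Hypothetical helper for illustration only; not part of TYPO3.
    """
    haystack = text.encode(encoding)
    needle = query.encode(encoding)
    if not needle:
        return []
    hits = []
    # Slide over the page text one byte at a time, ignoring character boundaries.
    for start in range(len(haystack) - len(needle) + 1):
        if haystack.startswith(needle, start):
            hits.append(start)
    return hits


# "Thai language" written in Thai script (7 characters, 3 bytes each in UTF-8).
page_text = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"
# The last three characters of the text above.
search_word = "\u0e44\u0e17\u0e22"

print(byte_window_matches(page_text, search_word))  # [12]
```

Because the comparison needs no knowledge of word boundaries, it can find a word inside text written without spaces. It cannot tell whether a match is a meaningful word in the language, which is presumably why the results depend so much on the language and the text.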

Actions #4

Updated by Alexander Opitz about 11 years ago

  • Status changed from Accepted to Closed
  • Target version deleted (0)
  • TYPO3 Version set to 4.3
  • Is Regression set to No

As this can't be resolved in indexed_search, you may use Solr search for such languages.

If you think that this is the wrong decision, please write to the mailing list typo3.teams.bugs with the issue number and an explanation.
