Feature #14355

catdoc with default charsets on indexed search

Added by Stefan Kreisberg about 17 years ago. Updated over 13 years ago.

Status:
Closed
Priority:
Should have
Category:
Frontend
Target version:
-
Start date:
2004-10-12
Due date:
% Done:

0%

Estimated time:
PHP Version:
Tags:
Complexity:
Sprint Focus:

Description

By default, catdoc parses .doc files with a code page other than the Western European one, which results in garbled text in the search results. The usual workaround is to compile catdoc with default source and destination charsets. Problem: this is not possible without hands-on access to the server. Solution: pass -s and -d to catdoc when it is executed.

Proposed solution: add TS setup flags for the source and destination charset, e.g.
source=cp1252
dest=8859-1
(catdoc charsets: http://www.45.free.net/~vitus/ice/catdoc/charsets.html), and modify readFileContent in class.indexer.php accordingly, i.e.:

catdoc -s[ts-source] -d[ts-dest]
All search results will then be displayed correctly for the locale.
(issue imported from #M417)
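
The proposal above could be sketched roughly as follows. This is an illustration in Python, not the actual readFileContent code from class.indexer.php. The function name build_catdoc_command and its parameters are hypothetical, and the charset values are example settings only. The sketch relies only on catdoc's -s (source charset) and -d (destination charset) options.

```python
def build_catdoc_command(path, source_charset=None, dest_charset=None, binary="catdoc"):
    """Build the argument list for a catdoc call.

    source_charset and dest_charset stand in for the proposed
    setup flags (hypothetical names); when a flag is unset, the
    option is omitted and catdoc falls back to its compiled-in default.
    """
    argv = [binary]
    if source_charset:
        argv += ["-s", source_charset]  # charset of the .doc file
    if dest_charset:
        argv += ["-d", dest_charset]    # charset of the extracted text
    argv.append(path)
    return argv


# Example values only: Windows Western European in, ISO 8859-1 out.
command = build_catdoc_command("report.doc", "cp1252", "8859-1")
```

Passing the result as a list to a process API (for example subprocess.run) avoids shell quoting problems with file names.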

#1

Updated by Michael Stucki about 16 years ago

Fixed in 3.8.0. The indexer now indexes all external files as UTF-8 and converts the text back right before it is displayed on the frontend.
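
The approach described in this comment (store UTF-8 in the index, convert only at display time) can be illustrated with a minimal Python sketch. It is not the code that shipped in 3.8.0. Both function names and the example charsets are assumptions made for illustration.

```python
def to_index_bytes(extracted: bytes, source_charset: str) -> bytes:
    """Normalize extracted file text to UTF-8 before storing it in the index."""
    return extracted.decode(source_charset).encode("utf-8")


def to_display_bytes(indexed: bytes, page_charset: str) -> bytes:
    """Convert UTF-8 index content to the charset of the rendered page.

    Characters the page charset cannot represent are replaced with '?'.
    """
    return indexed.decode("utf-8").encode(page_charset, errors="replace")


# Example: text extracted as cp1252, shown on an ISO 8859-1 page.
stored = to_index_bytes("Grüße".encode("cp1252"), "cp1252")
shown = to_display_bytes(stored, "iso-8859-1")
```

Keeping a single charset in the index means the display charset can change without reindexing the files.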
