Bug #16986

Word docs are not indexed correctly

Added by Martin Kästner almost 18 years ago. Updated over 16 years ago.

Status: Closed
Priority: Should have
Category: Indexed Search
Target version: -
Start date: 2007-02-13
% Done: 0%

Description

TYPO3 4.0.4
Indexed Search 2.9.3
catdoc 0.91.5

I have configured indexed_search to also index external files. This works well for PDF, SXW, RTF, ODT and PPT files, but not for Word ".doc" files: instead of the text, there are only question marks. It is almost certainly a charset problem.

In a shell, I can get the correct text from the file by using catdoc's -8 option:
catdoc -dutf-8 -8 worddok.doc

Is this a bug in catdoc? Should I use a newer version?
I have attached the file that could not be indexed correctly.

(issue imported from #M4985)


Files

worddok.doc (23.5 KB), Administrator Admin, 2007-02-13 23:06

#1

Updated by Michael Stucki almost 18 years ago

I have catdoc 0.94 here, so maybe try updating first.

#2

Updated by Michael Stucki almost 18 years ago

If your catdoc doesn't work without the "-8" option, you'll need to fix it on the catdoc side; an upgrade should help. With my version 0.94, the "-8" option is not required and the file is parsed correctly.

It's definitely not a TYPO3 bug!
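To check whether a local catdoc build still needs "-8" for a given file, one rough approach is to run both invocations from the report and count the question marks each one produces. This is a minimal sketch under stated assumptions: the helper names count_qmarks and compare_catdoc are hypothetical, catdoc is assumed to be on the PATH, and only the options quoted above (-dutf-8 and -8) are used. It is a heuristic only, since legitimate question marks in the document are counted too.

```shell
#!/bin/sh
# Heuristic sketch: text decoded with the wrong charset tends to come out as
# runs of '?', so comparing '?' counts hints at which invocation works.

# Hypothetical helper: count the '?' characters read from stdin.
count_qmarks() {
  # Delete every character except '?', count what is left, strip padding
  # that some wc implementations add.
  tr -cd '?' | wc -c | tr -d ' '
}

# Hypothetical helper: compare catdoc output for one file with and without -8.
# Only the options quoted in this report are used: -dutf-8 and -8.
compare_catdoc() {
  file=$1
  without_8=$(catdoc -dutf-8 "$file" | count_qmarks)
  with_8=$(catdoc -dutf-8 -8 "$file" | count_qmarks)
  printf "without -8: %s '?' characters\n" "$without_8"
  printf "with -8:    %s '?' characters\n" "$with_8"
}
```

After sourcing the functions, `compare_catdoc worddok.doc` prints both counts. A much higher count without "-8" suggests the installed catdoc is affected, in which case upgrading catdoc (as suggested above) is the fix rather than changing TYPO3.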
