Bug #16729
Converting external files to current charset fails
Status:
Closed
Priority:
Should have
Assignee:
-
Category:
Indexed Search
Target version:
-
Start date:
2006-11-20
Due date:
% Done:
0%
Estimated time:
TYPO3 Version:
4.0
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:
Description
When using the crawler extension, converting the content of an indexed external URL to UTF-8 fails if the external page is encoded in "iso-8859-1" but does not declare its charset in the meta tags.
I solved the problem by adding one line to the function convertHTMLToUtf8 (in the file "class.indexer.php"):
// Find charset:
$charset = $charset ? $charset : $this->getHTMLcharset($content);
$charset = $this->csObj->parse_charset($charset);
// make the indexer convert the page if no charset is given...
if (!$charset) $charset='iso-8859-1';
Of course, this assumes that a page is in iso-8859-1 if no charset is given. But that is more likely to be true than assuming the page is in utf-8 (which is how it works now).
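For illustration only, the fallback idea can be sketched outside of TYPO3 as a standalone snippet. This is not the indexer's actual code: the function names detectHtmlCharset and convertHtmlToUtf8WithFallback are hypothetical, the meta-tag detection is a simplified regular expression, and the sketch assumes a modern PHP with the mbstring extension rather than the PHP version TYPO3 4.0 ran on.

```php
<?php
// Hypothetical helper (not part of TYPO3): return the charset declared in a
// <meta> tag, lowercased, or an empty string if the page declares none.
function detectHtmlCharset(string $html): string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([a-z0-9_\-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return '';
}

// Hypothetical stand-in for the patched logic: use the given charset, else the
// declared one, else assume iso-8859-1, then convert the content to UTF-8.
function convertHtmlToUtf8WithFallback(string $html, string $charset = ''): string
{
    $charset = strtolower($charset !== '' ? $charset : detectHtmlCharset($html));

    // The fallback proposed in this report: no charset given means iso-8859-1.
    if ($charset === '') {
        $charset = 'iso-8859-1';
    }

    // Content that is already UTF-8 needs no conversion.
    if ($charset === 'utf-8') {
        return $html;
    }

    // Requires the mbstring extension (an assumption of this sketch).
    return mb_convert_encoding($html, 'UTF-8', $charset);
}
```

With this sketch, a page without any charset declaration is treated as iso-8859-1 and converted, while a page that declares utf-8 is passed through unchanged. As noted above, the fallback is a guess and will produce wrong characters for undeclared pages that use some other encoding.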
(issue imported from #M4537)