Bug #16729

Converting external files to current charset fails

Added by Christian Buelter about 18 years ago. Updated about 11 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2006-11-20
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 4.0
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

When using the crawler extension, converting the content of an indexed external URL to UTF-8 fails if the external page is encoded in "iso-8859-1" but does not declare its charset in the meta tags.

I solved the problem by adding one line to the function convertHTMLToUtf8 (in the file "class.indexer.php"):

// Find charset:
$charset = $charset ? $charset : $this->getHTMLcharset($content);
$charset = $this->csObj->parse_charset($charset);
// Make the indexer convert the page even if no charset is given:
if (!$charset) {
	$charset = 'iso-8859-1';
}

Of course, this assumes that a page is in iso-8859-1 if no charset is given. But that is more likely to be true than assuming the page is in utf-8 (which is how it works now).
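To illustrate the fallback outside of the indexer class, here is a minimal standalone sketch. It is not the TYPO3 implementation: the function names detectHtmlCharset() and convertHtmlToUtf8(), the $fallback parameter, and the use of mb_convert_encoding() (which requires the mbstring extension) are assumptions made for this example only.

```php
<?php
// Hypothetical standalone helpers, not part of TYPO3's class.indexer.php.

/**
 * Return the charset declared in a <meta> tag (lowercased),
 * or an empty string if the page declares none.
 */
function detectHtmlCharset(string $html): string
{
    // Matches both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([a-zA-Z0-9_\-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return '';
}

/**
 * Convert HTML to UTF-8. If no charset is passed in and none is declared
 * in the page, assume $fallback (iso-8859-1, as proposed in this report).
 */
function convertHtmlToUtf8(string $html, string $charset = '', string $fallback = 'iso-8859-1'): string
{
    $charset = $charset !== '' ? strtolower($charset) : detectHtmlCharset($html);
    if ($charset === '') {
        // The one-line fix: do not silently treat the page as UTF-8.
        $charset = $fallback;
    }
    if ($charset === 'utf-8') {
        return $html; // Already UTF-8, nothing to convert.
    }
    // Note: mbstring rejects charset names it does not know.
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

// Usage: a page with no charset declaration, containing a Latin-1 "ü" (byte 0xFC).
$converted = convertHtmlToUtf8("<html><body>M\xFCnchen</body></html>");
```

With the fallback in place, the single byte 0xFC is re-encoded as the two-byte UTF-8 sequence 0xC3 0xBC. Without it, the byte would be passed through unchanged and end up as invalid UTF-8 in the index.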

(issue imported from #M4537)
