Bug #19254

indexing of records containing HTML leads to concatenated words

Added by Andreas Rieser about 16 years ago. Updated about 6 years ago.

Status: Closed
Priority: Should have
Category: Indexed Search
Target version: -
Start date: 2008-08-26
% Done: 0%

Description

When records are indexed via class.crawler.php, the function indexSingleRecord() simply passes the field content through strip_tags(). That is not sufficient: when a tag sits directly between two words, removing it concatenates them.

In class.indexer.php, the function splitHTMLContent() solves it like this:

// remove tags, but first make sure we don't concatenate words by doing it
$contentArr['body'] = str_replace('<',' <',$contentArr['body']);
$contentArr['body'] = trim(strip_tags($contentArr['body']));
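
The effect can be reproduced in isolation. The following is a minimal sketch, assuming a made-up sample string; the variable names and the final whitespace-collapsing step are illustrative and not part of the proposed patch.

```php
<?php
// Hypothetical sample field content, not taken from the issue.
$html = '<p>First paragraph</p><p>Second paragraph</p>';

// Plain strip_tags() joins the words on either side of the removed tags.
$naive = strip_tags($html);

// Padding every '<' with a space first keeps the words apart.
$padded = trim(strip_tags(str_replace('<', ' <', $html)));

// Optional extra step (an assumption, not part of the proposed patch):
// collapse the runs of whitespace that the padding leaves behind.
$normalized = preg_replace('/\s+/', ' ', $padded);

echo $naive, "\n";      // First paragraphSecond paragraph
echo $normalized, "\n"; // First paragraph Second paragraph
```

Without the padding, "paragraph" and "Second" end up as the single word "paragraphSecond", which is what the indexer would store.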

The same has to be done in indexSingleRecord() too:

$theContent = '';
foreach ($fieldList as $k => $v) {
    if (!$k) {
        $theTitle = $r[$v];
    } else {
        $theContent .= $r[$v] . ' ';
    }
}
// add the following lines to prevent concatenated words
$theTitle = str_replace('<', ' <', $theTitle);
$theContent = str_replace('<', ' <', $theContent);
// Indexing the record as a page (but with parameters set, see ->backend_setFreeIndexUid())
$indexerObj->backend_indexAsTYPO3Page(
    strip_tags($theTitle),
    '',
    '',
    strip_tags($theContent),
    $GLOBALS['LANG']->charSet, // Requires that
    $r[$GLOBALS['TCA'][$cfgRec['table2index']]['ctrl']['tstamp']],
    $r[$GLOBALS['TCA'][$cfgRec['table2index']]['ctrl']['crdate']],
    $r['uid']
);

(issue imported from #M9229)


Files

9229.diff (896 Bytes), Administrator Admin, 2010-03-22 14:42

Actions #1

Updated by Dmitry Dulepov over 14 years ago

Revisions:
- 7147 for 4.3
- 7146 for 4.4

Actions #2

Updated by Benni Mack about 6 years ago

  • Status changed from Resolved to Closed