Bug #84541

closed

Uncaught TYPO3 Exception in indexed_search: duplicate key (hash) error

Added by Sybille Peters about 6 years ago. Updated 3 months ago.

Status: Resolved
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2018-03-27
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 11
PHP Version: 8.1
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

I get an exception with HTTP status code 503 when loading a page. The exception message is:

Oops, an error occurred!
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [55602717, "images\/botschaften", "66411504"]: Duplicate entry '55602717' for key 'PRIMARY'

System

  • TYPO3 version 8.7.17
  • current master

Severity of error

Even if the configuration preset "Live" is selected, an error is still displayed in the frontend on the first load of the page, and the page is not rendered. This doesn't happen for all pages, only for pages where the hash collision occurs. Still, it's pretty ugly.

Reproduce

  1. Activate the extension indexed_search
  2. Add static includes for indexed_search
  3. Add the following words to a new or existing content element: graf gettogethers abfluss erworbener
  4. Load the page in the frontend (you may have to do that twice)

Expected result

Page is displayed

Actual result

Exception message:

Uncaught TYPO3 Exception
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [186135449, "gettogethers", "199699927"]: Duplicate entry '186135449' for key 'PRIMARY' 

Reproduce with test code

You can also reproduce the problem with this simple test script, which includes the original hash function from IndexedSearchUtility and some test strings:

<?php

$str = ['graf', 'gettogethers', 'abfluss', 'erworbener'];

function md5inthash($stringToHash)
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

foreach ($str as $s) {
    print("string=$s hash=" . md5inthash($s) . "\n");
}

The first two strings produce the same hash, and so do the third and fourth (this is what causes the collisions):
string=graf hash=186135449
string=gettogethers hash=186135449
string=abfluss hash=211412923
string=erworbener hash=211412923
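
To check a longer word list for the same problem, the test script can be extended to group words by hash. This is only a sketch: `md5inthash()` is the function quoted above, while `findCollisions()` is a hypothetical helper that is not part of indexed_search.

```php
<?php

// Hash function as quoted in this report: the first 7 hex digits of md5, as an integer.
function md5inthash(string $stringToHash): int
{
    return (int)hexdec(substr(md5($stringToHash), 0, 7));
}

// Hypothetical helper: group distinct words by hash and keep only the
// groups that contain more than one word, i.e. the colliding ones.
function findCollisions(array $words): array
{
    $groups = [];
    foreach (array_unique($words) as $word) {
        $groups[md5inthash($word)][] = $word;
    }
    return array_filter($groups, static fn(array $group): bool => count($group) > 1);
}

// Print every hash that is shared by at least two different words.
foreach (findCollisions(['graf', 'gettogethers', 'abfluss', 'erworbener']) as $hash => $group) {
    print("hash=$hash words=" . implode(', ', $group) . "\n");
}
```

Repeated words are removed first, because the same word indexed twice is not a collision.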

Additional information

If you need to reproduce this several times, you can delete the index for the page: open the "Indexing" module, select the page with the content you inserted earlier, then select "Detailed statistics" and press the "Delete" garbage can.

You can also check the sys_log for similar errors:

 SELECT uid, FROM_UNIXTIME(tstamp), details FROM sys_log WHERE details LIKE '%index_word%' AND error = 2 ORDER BY uid DESC LIMIT 10;

Cause of error

possibly: https://github.com/TYPO3/TYPO3.CMS/blob/8124407655ae73656bf6c21f6bc8841b8e1d2023/typo3/sysext/indexed_search/Classes/Indexer.php#L2134

(the link goes to a specific commit which may no longer reflect the current codebase)

hash collision?
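
A rough estimate supports the collision theory. The hash in the test script keeps only 7 hex digits (28 bits), so there are 2^28 = 268,435,456 possible values. The sketch below applies the standard birthday approximation and assumes the hashes are uniformly distributed, which is an assumption and not something measured here. `collisionProbability()` is a hypothetical helper name.

```php
<?php

// Approximate probability that at least two of $n distinct words share a
// hash when hashes are spread uniformly over 2^$bits values
// (birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))).
function collisionProbability(int $n, int $bits = 28): float
{
    $space = 2 ** $bits;
    return 1.0 - exp(-$n * ($n - 1) / (2 * $space));
}

// Print the estimate for a few index sizes (number of distinct words).
foreach ([1000, 10000, 20000, 50000] as $n) {
    printf("words=%d p=%.3f\n", $n, collisionProbability($n));
}
```

Under that assumption, the chance of at least one collision reaches about 50% at roughly 19,300 distinct words, so a collision on a site with a large index is expected rather than exceptional.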

Affected fields

These are the DB fields that get filled using the current hash algorithm (determined by looking at the DB schema and content, so partly guesswork):

  • index_debug.phash
  • index_fulltext.phash
  • index_grlist.phash
  • index_grlist.phash_x
  • index_grlist.hash_gr_list
  • index_phash.phash
  • index_phash.phash_grouping
  • index_phash.contentHash
  • index_rel.phash
  • index_rel.wid
  • index_section.phash (see https://forge.typo3.org/issues/79802)
  • index_section.phash_t3
  • index_words.wid (see this issue)
  • index_words.metaphone

Currently, however, only index_words.wid seems to be affected by the collisions.


Files

  • exception.png (86.5 KB), Sybille Peters, 2018-08-10 17:49
  • exception2.png (47.1 KB), Sybille Peters, 2018-08-10 17:52
  • disable_fe.png (18.6 KB), Sybille Peters, 2018-11-02 14:30

Related issues: 7 (0 open, 7 closed)

  • Related to TYPO3 Core - Bug #79802: phash not unique (Resolved, 2017-02-14)
  • Related to TYPO3 Core - Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel (Resolved, 2018-12-12)
  • Related to TYPO3 Core - Bug #101249: Prevent exception caused by hash collisions in indexed_search (Resolved, 2023-07-05)
  • Related to TYPO3 Core - Task #102975: Use full md5 hash for `indexed_search` (Closed, Stefan Bürk, 2024-01-29)
  • Has duplicate TYPO3 Core - Bug #88557: Indexed Search: generates identical word ids for different words (Closed, 2019-06-13)
  • Is duplicate of TYPO3 Core - Bug #17619: INSERT-Error on table "index_words" on ORACLE and PostgreSQL (Closed, 2007-09-19)
  • Has duplicate TYPO3 Core - Bug #90977: possible race condition in indexedsearch (Resolved, 2020-04-07)