Bug #84541

Uncaught TYPO3 Exception in indexed_search: duplicate key (hash) error

Added by Sybille Peters over 1 year ago. Updated 5 days ago.

Status: New
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2018-03-27
Due date:
% Done: 0%

TYPO3 Version: 8
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus: On Location Sprint

Description

I get an exception with HTTP status code 503 when loading a page. The exception message is:

Oops, an error occurred!
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [55602717, "images\/botschaften", "66411504"]: Duplicate entry '55602717' for key 'PRIMARY'

System

  • TYPO3 version 8.7.17
  • current master

Severity of error

Even with the configuration preset "Live" selected, the error is displayed in the frontend the first time the page is loaded, and the page is not rendered. This does not happen for all pages, only for pages where a hash collision occurs. Still, it is pretty ugly.

Reproduce

  1. Activate the extension indexed_search
  2. Add static includes for indexed_search
  3. Add the following words to a new or existing content element: graf gettogethers abfluss erworbener
  4. Load the page in the frontend (you may have to do that twice)

Expected result

Page is displayed

Actual result

Exception message:

Uncaught TYPO3 Exception
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [186135449, "gettogethers", "199699927"]: Duplicate entry '186135449' for key 'PRIMARY' 

Reproduce with test code

You can also reproduce the problem with this simple test script, which includes the original hash function from IndexedSearchUtility and some test strings:

<?php

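// Test words that collide pairwise under md5inthash().
$str = ['graf', 'gettogethers', 'abfluss', 'erworbener'];

// Hash function as used in IndexedSearchUtility: keeps only the first
// 7 hex characters (28 bits) of the MD5 digest and converts them to an integer.
function md5inthash($stringToHash)
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

// Print each word together with its integer hash.
foreach ($str as $s) {
    print("string=$s hash=" . md5inthash($s) . "\n");
}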

The first two strings share one hash, and the third and fourth strings share another (which causes the collisions):
string=graf hash=186135449
string=gettogethers hash=186135449
string=abfluss hash=211412923
string=erworbener hash=211412923
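
To check a larger word list for the same problem, the words can be grouped by their hash value. The following is a minimal sketch in Python, not TYPO3 code: md5inthash is a port of the PHP test function above (assuming UTF-8 input), and find_collisions is a hypothetical helper name introduced only for this illustration.

```python
import hashlib
from collections import defaultdict


def md5inthash(string_to_hash: str) -> int:
    """Port of the PHP test function: first 7 hex chars of the MD5 digest as an int."""
    digest = hashlib.md5(string_to_hash.encode("utf-8")).hexdigest()
    return int(digest[:7], 16)


def find_collisions(words):
    """Group distinct words by hash and keep only hashes shared by two or more words."""
    groups = defaultdict(list)
    for word in dict.fromkeys(words):  # drop repeated words, keep input order
        groups[md5inthash(word)].append(word)
    return {h: ws for h, ws in groups.items() if len(ws) > 1}


if __name__ == "__main__":
    sample = ["graf", "gettogethers", "abfluss", "erworbener"]
    for hash_value, colliding in find_collisions(sample).items():
        print(hash_value, colliding)
```

Repeated occurrences of the same word are ignored on purpose, since only different words that map to the same wid are a problem.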

Additional information

If you need to reproduce this several times, you can delete the index for the page: open the "Indexing" module, select the page with the content you inserted earlier, select "Detailed statistics", and click the "Delete" (garbage can) icon.

You can also check the sys_log for similar errors:

 SELECT uid, FROM_UNIXTIME(tstamp), details FROM sys_log WHERE details LIKE '%index_word%' AND error = 2 ORDER BY uid DESC LIMIT 10;

Cause of error

possibly: https://github.com/TYPO3/TYPO3.CMS/blob/8124407655ae73656bf6c21f6bc8841b8e1d2023/typo3/sysext/indexed_search/Classes/Indexer.php#L2134

(the link goes to a specific commit which may no longer reflect the current codebase)

hash collision?
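
The failure mode itself can be shown outside TYPO3. The sketch below is only an illustration in Python with an in-memory SQLite table, not the Indexer code: the table is reduced to the three columns named in the error message, wid is assumed to be the primary key (as the message implies), the metaphone values are placeholders, and insert_word is a hypothetical helper.

```python
import sqlite3

# Simplified stand-in for the index_words table (assumption: wid is the primary key).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_words (wid INTEGER PRIMARY KEY, baseword TEXT, metaphone INTEGER)"
)

# Hash value reported in this issue for both "graf" and "gettogethers".
COLLIDING_WID = 186135449


def insert_word(connection, wid, baseword, metaphone=0):
    """Insert one word; return False if the primary key already exists."""
    try:
        connection.execute(
            "INSERT INTO index_words (wid, baseword, metaphone) VALUES (?, ?, ?)",
            (wid, baseword, metaphone),
        )
        return True
    except sqlite3.IntegrityError:
        # Same situation as the "Duplicate entry ... for key 'PRIMARY'" error.
        return False


first = insert_word(conn, COLLIDING_WID, "graf")
second = insert_word(conn, COLLIDING_WID, "gettogethers")
print("first insert ok:", first, "| second insert ok:", second)
```

The second insert is rejected because a different word already occupies the same key. Catching the error, as done here, only hides the symptom; the second word would still be missing from the index.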

Affected fields

These are the fields in the DB that get filled using the current hash algorithm (determined by looking at the DB schema and content, plus some guessing):

  • index_debug.phash
  • index_fulltext.phash
  • index_grlist.phash
  • index_grlist.phash_x
  • index_grlist.hash_gr_list
  • index_phash.phash
  • index_phash.phash_grouping
  • index_phash.contentHash
  • index_rel.phash
  • index_rel.wid
  • index_section.phash (see https://forge.typo3.org/issues/79802)
  • index_section.phash_t3
  • index_words.wid (see this issue)
  • index_words.metaphone

Currently, however, only index_words.wid seems to be affected by the collisions.

exception.png (86.5 KB) Sybille Peters, 2018-08-10 17:49

exception2.png (47.1 KB) Sybille Peters, 2018-08-10 17:52

disable_fe.png (18.6 KB) Sybille Peters, 2018-11-02 14:30


Related issues

Related to TYPO3 Core - Bug #79802: phash not unique New 2017-02-14
Related to TYPO3 Core - Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel New 2018-12-12
Duplicated by TYPO3 Core - Bug #88557: Indexed Search: generates identical word ids for different words New 2019-06-13

History

#1 Updated by Sybille Peters over 1 year ago

The hash function IndexedSearchUtility::md5inthash() does indeed produce collisions, e.g. for the following words:

  • graf + gettogethers (md5inthash: 186135449)
  • erworbener + abfluss (md5inthash: 211412923)

This happens because only a substring of the MD5 hash is used: the hash is 32 hex characters long, but only the first 7 characters are kept (and then converted to an integer).
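
To get a feeling for how likely such collisions are, the birthday approximation can be applied to a 7-hex-character (28-bit) hash. This is a rough sketch in Python under the assumption that the truncated MD5 values are uniformly distributed; the function names are made up for this example.

```python
import math

HASH_BITS = 7 * 4            # 7 hex characters, 4 bits each
HASH_SPACE = 2 ** HASH_BITS  # 268,435,456 possible values


def collision_probability(n_words, space=HASH_SPACE):
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1) / (2N))."""
    return 1.0 - math.exp(-n_words * (n_words - 1) / (2.0 * space))


def words_for_probability(p, space=HASH_SPACE):
    """Approximate number of distinct words needed to reach collision probability p."""
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000, 100_000):
        print(f"{n:>7} distinct words: p = {collision_probability(n):.4f}")
    print("words for p = 0.5:", words_for_probability(0.5))
```

Under these assumptions the 50% point is reached at roughly 19,300 distinct words, which a site with a moderately large vocabulary can plausibly exceed.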

#2 Updated by Sybille Peters over 1 year ago

  • Description updated (diff)

#3 Updated by Gerrit Code Review over 1 year ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#4 Updated by Gerrit Code Review over 1 year ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#5 Updated by Gerrit Code Review over 1 year ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#6 Updated by Gerrit Code Review over 1 year ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#7 Updated by Gerrit Code Review over 1 year ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#8 Updated by Gerrit Code Review over 1 year ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#9 Updated by Gerrit Code Review over 1 year ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#10 Updated by Gerrit Code Review over 1 year ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#11 Updated by Sybille Peters over 1 year ago

  • Subject changed from Exception with duplicate key error in database for indexed_search to Exception with duplicate key (hash) error in database for indexed_search

#12 Updated by Sybille Peters over 1 year ago

#13 Updated by Sybille Peters over 1 year ago

  • Assignee set to Sybille Peters

#14 Updated by Gerrit Code Review over 1 year ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#15 Updated by Gerrit Code Review over 1 year ago

Patch set 10 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#16 Updated by Gerrit Code Review over 1 year ago

Patch set 11 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#17 Updated by Sybille Peters over 1 year ago

  • Subject changed from Exception with duplicate key (hash) error in database for indexed_search to Uncaught TYPO3 Exception in indexed_search: duplicate key (hash) error

#18 Updated by Sybille Peters over 1 year ago

  • Tags set to exception_handling

#19 Updated by Sybille Peters over 1 year ago

#20 Updated by Sybille Peters over 1 year ago

Still reproducible with current 8.7.19-dev and master (9.4.0-dev).

#21 Updated by Sybille Peters over 1 year ago

  • Description updated (diff)
  • Tags deleted (exception_handling)

#22 Updated by Sybille Peters over 1 year ago

  • Description updated (diff)

#23 Updated by Sybille Peters over 1 year ago

Added more information to description:

  • Affected fields
  • PHP script to reproduce

#24 Updated by Sybille Peters over 1 year ago

  • Assignee deleted (Sybille Peters)

#25 Updated by Gerrit Code Review about 1 year ago

Patch set 12 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#26 Updated by Gerrit Code Review about 1 year ago

Patch set 13 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#27 Updated by Gerrit Code Review about 1 year ago

Patch set 14 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

#28 Updated by Gerrit Code Review about 1 year ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

#29 Updated by Gerrit Code Review about 1 year ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

#30 Updated by Gerrit Code Review about 1 year ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#31 Updated by Gerrit Code Review about 1 year ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#32 Updated by Gerrit Code Review about 1 year ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

#33 Updated by Gerrit Code Review about 1 year ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#34 Updated by Gerrit Code Review about 1 year ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#35 Updated by Sybille Peters about 1 year ago

  • Sprint Focus set to On Location Sprint

#36 Updated by Gerrit Code Review about 1 year ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#37 Updated by Gerrit Code Review about 1 year ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#38 Updated by Gerrit Code Review about 1 year ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#39 Updated by Gerrit Code Review about 1 year ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#40 Updated by Gerrit Code Review about 1 year ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#41 Updated by Sybille Peters about 1 year ago

Workaround:

If you are affected by this bug and waiting for a patch, there is a workaround: you can disable indexing in the frontend.

To do this, go to the Extension module in the backend, select indexed_search, click the cog wheel, and enable the checkbox "basic.disableFrontendIndexing".

Impact

Note: This does not fix the bug, but the effects are less severe because the exceptions are no longer thrown during frontend rendering when a page is first loaded. If you do this, however, you must set up another method for indexing the pages, typically the crawler run via the scheduler, see https://docs.typo3.org/typo3cms/extensions/indexed_search/IndexingConfigurations/CrawlerSetup/Index.html

#42 Updated by Gerrit Code Review about 1 year ago

Patch set 10 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#43 Updated by Gerrit Code Review about 1 year ago

Patch set 11 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#44 Updated by Gerrit Code Review about 1 year ago

Patch set 12 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#45 Updated by Gerrit Code Review about 1 year ago

Patch set 13 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#46 Updated by Sybille Peters about 1 year ago

  • Related to Bug #86491: Duplicate entry for PRIMARY key in cache_treelist added

#47 Updated by Gerrit Code Review about 1 year ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

#48 Updated by Gerrit Code Review about 1 year ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58853

#49 Updated by Gerrit Code Review about 1 year ago

Patch set 14 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#50 Updated by Gerrit Code Review about 1 year ago

Patch set 15 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#51 Updated by Gerrit Code Review about 1 year ago

Patch set 16 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

#52 Updated by Alexander Schnitzler about 1 year ago

  • Related to deleted (Bug #86491: Duplicate entry for PRIMARY key in cache_treelist)

#53 Updated by Reinhard Hiebl about 1 year ago

  • Related to Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel added

#54 Updated by Reinhard Hiebl about 1 year ago

  • Related to Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel added

#55 Updated by Reinhard Hiebl about 1 year ago

  • Related to deleted (Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel)

#56 Updated by Gerrit Code Review 10 months ago

Patch set 17 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/58717

#57 Updated by Gerrit Code Review 10 months ago

Patch set 18 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/58717

#58 Updated by Jonas Eberle 6 months ago

  • Related to Bug #88557: Indexed Search: generates identical word ids for different words added

#59 Updated by Jonas Eberle 6 months ago

  • Related to deleted (Bug #88557: Indexed Search: generates identical word ids for different words)

#60 Updated by Jonas Eberle 6 months ago

  • Duplicated by Bug #88557: Indexed Search: generates identical word ids for different words added

#61 Updated by Sybille Peters about 2 months ago

  • Status changed from Under Review to New

#62 Updated by Sybille Peters about 2 months ago

Patch was abandoned.

#63 Updated by Wolfgang Wagner 10 days ago

Problem still exists in 9.5.11 :-/

#64 Updated by Florian Rival 5 days ago

I can confirm that the problem is still present in TYPO3 9.5.11:

An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' 
with params [108983540, "actors", "116730212"]: Duplicate entry '108983540' for key 'PRIMARY'
