Bug #84541

closed

Uncaught TYPO3 Exception in indexed_search: duplicate key (hash) error

Added by Sybille Peters about 6 years ago. Updated 3 months ago.

Status:
Resolved
Priority:
Should have
Assignee:
-
Category:
Indexed Search
Target version:
-
Start date:
2018-03-27
Due date:
% Done:

100%

Estimated time:
TYPO3 Version:
11
PHP Version:
8.1
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

I get an exception with HTTP status code 503 when loading a page, with the exception message:

Oops, an error occurred!
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [55602717, "images\/botschaften", "66411504"]: Duplicate entry '55602717' for key 'PRIMARY'

System

  • TYPO3 version 8.7.17
  • current master

Severity of error

Even if the configuration preset "Live" is selected, an error is still displayed in the frontend when the page is first loaded, and the page is not rendered. This does not happen for all pages, only for pages where a hash collision occurs. Still, it's pretty ugly.

Reproduce

  1. Activate the extension indexed_search
  2. Add static includes for indexed_search
  3. Add the following words to a new or existing content element: graf gettogethers abfluss erworbener
  4. Load the page in the frontend (you may have to do that twice)

Expected result

Page is displayed

Actual result

Exception message:

Uncaught TYPO3 Exception
An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [186135449, "gettogethers", "199699927"]: Duplicate entry '186135449' for key 'PRIMARY' 

Reproduce with test code

You can also reproduce the problem with this simple test script, which includes the original hash function from IndexedSearchUtility and some test strings:

<?php

$str = ['graf', 'gettogethers', 'abfluss', 'erworbener'];

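// Copy of IndexedSearchUtility::md5inthash(): keeps only the first 7 hex digits of md5() and converts them to an int
function md5inthash($stringToHash)
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}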

foreach ($str as $s) {
    print("string=$s hash=" . md5inthash($s) . "\n");
}

The first and second strings, as well as the third and fourth, produce the same hash (which causes the collisions):
string=graf hash=186135449
string=gettogethers hash=186135449
string=abfluss hash=211412923
string=erworbener hash=211412923

Additional information

If you need to reproduce this several times, you can delete the index for the page: open the "Indexing" module, select the page with the content you inserted earlier, then select "Detailed statistics" and click the "Delete" (trash can) icon.

You can also check the sys_log for similar errors:

 SELECT uid,FROM_UNIXTIME(tstamp),details FROM sys_log WHERE details LIKE '%index_word%' and error = 2 ORDER BY uid DESC LIMIT 10;

Cause of error

possibly: https://github.com/TYPO3/TYPO3.CMS/blob/8124407655ae73656bf6c21f6bc8841b8e1d2023/typo3/sysext/indexed_search/Classes/Indexer.php#L2134

(the link goes to a specific commit which may no longer reflect the current codebase)

hash collision?

Affected fields

These are the fields in the DB that get filled using the current hash algorithm:

(all affected fields, determined by looking at the DB schema and content, and partly by guessing):

  • index_debug.phash
  • index_fulltext.phash
  • index_grlist.phash
  • index_grlist.phash_x
  • index_grlist.hash_gr_list
  • index_phash.phash
  • index_phash.phash_grouping
  • index_phash.contentHash
  • index_rel.phash
  • index_rel.wid
  • index_section.phash (see https://forge.typo3.org/issues/79802)
  • index_section.phash_t3
  • index_words.wid (see this issue)
  • index_words.metaphone

Currently however, only index_words.wid seems to be affected by the collisions.


Files

exception.png (86.5 KB), Sybille Peters, 2018-08-10 17:49
exception2.png (47.1 KB), Sybille Peters, 2018-08-10 17:52
disable_fe.png (18.6 KB), Sybille Peters, 2018-11-02 14:30

Related issues: 7 (0 open, 7 closed)

Related to TYPO3 Core - Bug #79802: phash not unique (Resolved, 2017-02-14)
Related to TYPO3 Core - Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel (Resolved, 2018-12-12)
Related to TYPO3 Core - Bug #101249: Prevent exception caused by hash collisions in indexed_search (Resolved, 2023-07-05)
Related to TYPO3 Core - Task #102975: Use full md5 hash for `indexed_search` (Closed, Stefan Bürk, 2024-01-29)
Has duplicate TYPO3 Core - Bug #88557: Indexed Search: generates identical word ids for different words (Closed, 2019-06-13)
Is duplicate of TYPO3 Core - Bug #17619: INSERT-Error on table "index_words" on ORACLE and PostgreSQL (Closed, 2007-09-19)
Has duplicate TYPO3 Core - Bug #90977: possible race condition in indexedsearch (Resolved, 2020-04-07)

Actions #1

Updated by Sybille Peters about 6 years ago

The hash function IndexedSearchUtility::md5inthash() does indeed have collisions for e.g. the following words:

  • graf + gettogethers (md5inthash: 186135449)
  • erworbener + abfluss (md5inthash: 211412923)

because only a substring of the md5 hash is used. The md5 hash is 32 hex characters long, and only the first 7 characters are used (and then converted to an int).
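To make the truncation concrete, here is a minimal PHP sketch that re-implements md5inthash() as quoted in the test script of this issue and derives its output range. The constant names are hypothetical and not part of IndexedSearchUtility; they only spell out the arithmetic (each hex digit carries 4 bits).

```php
<?php
// Re-implementation of md5inthash() as quoted in this issue:
// only the first 7 of the 32 hex digits of md5() are kept and converted to an int.
function md5inthash(string $stringToHash): int
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

// Hypothetical constants for illustration only.
const MD5_BITS = 32 * 4;                           // 128 bits in a full md5 digest
const MD5INTHASH_BITS = 7 * 4;                     // 28 bits actually kept
const MD5INTHASH_MAX = (1 << MD5INTHASH_BITS) - 1; // largest possible wid: 268435455

foreach (['graf', 'gettogethers', 'abfluss', 'erworbener'] as $word) {
    printf("%-12s md5=%s wid=%d\n", $word, md5($word), md5inthash($word));
}
printf("bits kept: %d of %d, distinct wids: %d\n", MD5INTHASH_BITS, MD5_BITS, MD5INTHASH_MAX + 1);
```

Two words collide as soon as their md5 digests agree in the first 7 hex digits, even though the full digests differ.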

Actions #2

Updated by Sybille Peters about 6 years ago

  • Description updated (diff)
Actions #3

Updated by Gerrit Code Review about 6 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #4

Updated by Gerrit Code Review about 6 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #5

Updated by Gerrit Code Review about 6 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #6

Updated by Gerrit Code Review about 6 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #7

Updated by Gerrit Code Review about 6 years ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #8

Updated by Gerrit Code Review about 6 years ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #9

Updated by Gerrit Code Review about 6 years ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #10

Updated by Gerrit Code Review about 6 years ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #11

Updated by Sybille Peters almost 6 years ago

  • Subject changed from Exception with duplicate key error in database for indexed_search to Exception with duplicate key (hash) error in database for indexed_search
Actions #12

Updated by Sybille Peters almost 6 years ago

Actions #13

Updated by Sybille Peters almost 6 years ago

  • Assignee set to Sybille Peters
Actions #14

Updated by Gerrit Code Review almost 6 years ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #15

Updated by Gerrit Code Review almost 6 years ago

Patch set 10 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #16

Updated by Gerrit Code Review almost 6 years ago

Patch set 11 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #17

Updated by Sybille Peters over 5 years ago

  • Subject changed from Exception with duplicate key (hash) error in database for indexed_search to Uncaught TYPO3 Exception in indexed_search: duplicate key (hash) error
Actions #18

Updated by Sybille Peters over 5 years ago

  • Tags set to exception_handling

Updated by Sybille Peters over 5 years ago

Actions #20

Updated by Sybille Peters over 5 years ago

Still reproducible with current 8.7.19-dev and master (9.4.0-dev).

Actions #21

Updated by Sybille Peters over 5 years ago

  • Description updated (diff)
  • Tags deleted (exception_handling)
Actions #22

Updated by Sybille Peters over 5 years ago

  • Description updated (diff)
Actions #23

Updated by Sybille Peters over 5 years ago

Added more information to description:

  • Affected fields
  • PHP script to reproduce
Actions #24

Updated by Sybille Peters over 5 years ago

  • Assignee deleted (Sybille Peters)
Actions #25

Updated by Gerrit Code Review over 5 years ago

Patch set 12 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #26

Updated by Gerrit Code Review over 5 years ago

Patch set 13 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #27

Updated by Gerrit Code Review over 5 years ago

Patch set 14 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/56470

Actions #28

Updated by Gerrit Code Review over 5 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

Actions #29

Updated by Gerrit Code Review over 5 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

Actions #30

Updated by Gerrit Code Review over 5 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #31

Updated by Gerrit Code Review over 5 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #32

Updated by Gerrit Code Review over 5 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

Actions #33

Updated by Gerrit Code Review over 5 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #34

Updated by Gerrit Code Review over 5 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #35

Updated by Sybille Peters over 5 years ago

  • Sprint Focus set to On Location Sprint
Actions #36

Updated by Gerrit Code Review over 5 years ago

Patch set 5 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #37

Updated by Gerrit Code Review over 5 years ago

Patch set 6 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #38

Updated by Gerrit Code Review over 5 years ago

Patch set 7 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #39

Updated by Gerrit Code Review over 5 years ago

Patch set 8 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #40

Updated by Gerrit Code Review over 5 years ago

Patch set 9 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #41

Updated by Sybille Peters over 5 years ago

Workaround:

If you are affected by this bug and are waiting for a patch, there is a workaround: you can disable indexing in the frontend.

To do this, enable the option "basic.disableFrontendIndexing" in the extension configuration of indexed_search.

Impact

Note: This does not fix the bug, but the effects are less severe, because the exceptions are no longer thrown during frontend rendering when the page is first loaded. However, if you do this, you must activate some other method for indexing the pages, typically the crawler via the scheduler, see https://docs.typo3.org/typo3cms/extensions/indexed_search/IndexingConfigurations/CrawlerSetup/Index.html
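The same setting can presumably also be made in code. A minimal sketch, assuming TYPO3 v9 to v11 (where extension settings live under `EXTENSIONS` and site-specific overrides typically go into typo3conf/AdditionalConfiguration.php) and assuming the option is stored under the flat key `disableFrontendIndexing`, with "basic" being only its category in the Extension Configuration module. Please verify the key in your own installation before relying on it.

```php
<?php
// typo3conf/AdditionalConfiguration.php (assumed location, TYPO3 v9 to v11)
// Assumption: indexed_search reads this flag through the extension configuration.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['indexed_search']['disableFrontendIndexing'] = '1';
```

Setting the checkbox in the backend Extension Configuration module should have the same effect without hard-coding it.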

Actions #42

Updated by Gerrit Code Review over 5 years ago

Patch set 10 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #43

Updated by Gerrit Code Review over 5 years ago

Patch set 11 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #44

Updated by Gerrit Code Review over 5 years ago

Patch set 12 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #45

Updated by Gerrit Code Review over 5 years ago

Patch set 13 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #46

Updated by Sybille Peters over 5 years ago

  • Related to Bug #86491: Duplicate entry for PRIMARY key in cache_treelist added
Actions #47

Updated by Gerrit Code Review over 5 years ago

Patch set 4 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58704

Actions #48

Updated by Gerrit Code Review over 5 years ago

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58853

Actions #49

Updated by Gerrit Code Review over 5 years ago

Patch set 14 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #50

Updated by Gerrit Code Review over 5 years ago

Patch set 15 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #51

Updated by Gerrit Code Review over 5 years ago

Patch set 16 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/58717

Actions #52

Updated by Alexander Schnitzler over 5 years ago

  • Related to deleted (Bug #86491: Duplicate entry for PRIMARY key in cache_treelist)
Actions #53

Updated by Reinhard Hiebl over 5 years ago

  • Related to Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel added
Actions #54

Updated by Reinhard Hiebl over 5 years ago

  • Related to Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel added
Actions #55

Updated by Reinhard Hiebl over 5 years ago

  • Related to deleted (Bug #87138: indexed_search: Duplicate entry for key 'Primary' in index_rel)
Actions #56

Updated by Gerrit Code Review about 5 years ago

Patch set 17 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/58717

Actions #57

Updated by Gerrit Code Review about 5 years ago

Patch set 18 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/58717

Actions #58

Updated by Jonas Eberle almost 5 years ago

  • Related to Bug #88557: Indexed Search: generates identical word ids for different words added
Actions #59

Updated by Jonas Eberle almost 5 years ago

  • Related to deleted (Bug #88557: Indexed Search: generates identical word ids for different words)
Actions #60

Updated by Jonas Eberle almost 5 years ago

  • Has duplicate Bug #88557: Indexed Search: generates identical word ids for different words added
Actions #61

Updated by Sybille Peters over 4 years ago

  • Status changed from Under Review to New
Actions #62

Updated by Sybille Peters over 4 years ago

Patch was abandoned.

Actions #63

Updated by Wolfgang Wagner over 4 years ago

Problem still exists in 9.5.11 :-/

Actions #64

Updated by Florian Rival over 4 years ago

I can confirm the problem is still present in TYPO3 9.5.11.

An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' 
with params [108983540, "actors", "116730212"]: Duplicate entry '108983540' for key 'PRIMARY'
Actions #65

Updated by Arne Bracht over 4 years ago

The bug is also present in 9.5.13, and the workaround with the crawler is not usable anymore: at the moment, the crawler is only available for TYPO3 8.

Actions #66

Updated by Susanne Moog over 4 years ago

  • Sprint Focus deleted (On Location Sprint)
Actions #67

Updated by Sven Burkert over 4 years ago

  • Is duplicate of Bug #17619: INSERT-Error on table "index_words" on ORACLE and PostgreSQL added
Actions #68

Updated by Peter Linzenkirchner about 4 years ago

The bug is also present in TYPO3 9.7.13. It floods the log; the only option seems to be to deactivate indexed_search.

Actions #69

Updated by Michael Sollmann about 4 years ago

I can confirm this for 9.5.13 too. In my opinion it needs to be fixed urgently because it regularly crashes the frontend in production environments.

Actions #70

Updated by Sybille Peters about 4 years ago

@Michael can you check the workaround in https://forge.typo3.org/issues/84541#note-41 and see if this helps for you (not as a solution, as a temporary workaround)

Actions #71

Updated by Riccardo De Contardi about 4 years ago

I am adding the description of #88557 here to keep track of it:

Indexed Search: generates identical word ids for different words

Description

Because of the method used for hashing it, there are cases when a word id (wid) is a duplicate of the hash for a different word.
And the following error occurs because wid is primary key in the index_words table:

Core: Exception handler (WEB): Uncaught TYPO3 Exception: An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [36468906, "nussholz", "76382384"]: Duplicate entry '36468906' for key 'PRIMARY' | Doctrine\DBAL\Exception\UniqueConstraintViolationException thrown in file deploy/releases/177/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php in line 66.

Test case:

\TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility::md5inthash('nam')
and
\TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility::md5inthash('nussholz')

will both return 36468906

Actions #72

Updated by a d almost 4 years ago

Why not simply change the IndexedSearchUtility::md5inthash() function to something which doesn't produce collisions?

This works perfectly - always produces an int32 (signed long) on 32 and 64 bit platforms and works with the existing database int columns:

    public static function md5inthash($stringToHash)
    {
        return current(unpack('l', pack('l', crc32($stringToHash))));
    }
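A minimal sketch comparing the original truncating function with the crc32()-based variant proposed above. The helper name toSignedInt32() is hypothetical; it isolates the pack()/unpack() trick, which reinterprets the unsigned 32-bit crc32() result as a signed 32-bit integer so that it fits a signed INT column. crc32 still has a 32-bit output, so collisions remain possible; the variant only enlarges the value space from 2^28 to 2^32.

```php
<?php
// Original variant: 28 bits taken from md5().
function md5inthash(string $stringToHash): int
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

// Reinterpret the low 32 bits of $value as a signed 32-bit integer:
// pack('l') writes a 32-bit machine-order long, unpack('l') reads it back as signed.
function toSignedInt32(int $value): int
{
    return current(unpack('l', pack('l', $value)));
}

// Variant proposed in note #72: crc32() mapped into the signed INT range.
function crc32SignedHash(string $stringToHash): int
{
    return toSignedInt32(crc32($stringToHash));
}

foreach (['graf', 'gettogethers', 'nam', 'nussholz'] as $word) {
    printf("%-12s md5inthash=%10d crc32SignedHash=%11d\n", $word, md5inthash($word), crc32SignedHash($word));
}
```

For a roughly uniform hash, the expected number of colliding pairs scales like n²/(2 · 2^bits), so going from 28 to 32 bits cuts it by about a factor of 16. That helps, but it does not make the primary key collision-proof.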
Actions #73

Updated by Sybille Peters almost 4 years ago

If only it were that simple ...

I think this boils down to 2 questions:

1. Is there a hash function with a fixed output size (32 bit) that produces no collisions for arbitrary, variable-length input?
2. Is crc32() a good choice for producing fixed-length (32 bit) hashes (to be used for key/value storage or checking duplicates, not for cryptography) with a low probability of collision?

About 1: I think not, because there are far more possible input strings than possible 32-bit values, so some inputs must share an output. It does not even matter much whether the result is stored in a 32 bit field, a 64 bit field or anything else. Using a better hash function, handling the result hash differently or using a larger result datatype may make collisions less likely, but will not completely eliminate them.

With collisions, we mean producing the same output hash for 2 different strings as input.

Correct me, if I am wrong.
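The birthday bound estimates how soon collisions become likely for a given output size. A minimal sketch, assuming the hash behaves like a uniform random function over 2^bits values (an idealization; md5inthash() keeps 28 bits, the crc32 variant 32). The function name collisionProbability() is hypothetical.

```php
<?php
/**
 * Birthday-bound approximation: probability that at least two of $n distinct
 * inputs share a hash value when the hash is uniform over 2^$bits values.
 * p ≈ 1 - exp(-n(n-1) / (2 * 2^bits))
 */
function collisionProbability(int $n, int $bits): float
{
    $space = 2 ** $bits;
    $x = $n * ($n - 1) / (2 * $space);
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-$x);
}

foreach ([28, 32, 64] as $bits) {
    printf("%2d bits, 20000 words: %.6f\n", $bits, collisionProbability(20000, $bits));
}
```

For 20,000 distinct words this gives about 0.53 with 28 bits and about 0.045 with 32 bits; with 28 bits the 50% mark is already reached at roughly 19,300 words, which a vocabulary of a medium-sized site can easily exceed.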

See

About question 2: I really don't know, maybe, see e.g.

Still, your approach is good if it decreases the probability of collisions. From what I've seen, the original function chops off part of the hash and yours does not, so that should be an improvement.

Additionally, it might be a good solution to handle the possibility of a collision occurring gracefully and not throwing an exception.

I think this is possible; you (or anyone else for that matter) are welcome to try.
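One way to handle a collision gracefully, sketched with a hypothetical in-memory array standing in for index_words (wid => baseword) instead of the real database layer: detect that the wid is already taken by a different word, log it, and continue instead of aborting the page. In the real Indexer the equivalent would presumably be checking for an existing row before the INSERT, or catching Doctrine\DBAL\Exception\UniqueConstraintViolationException (the exception class seen in the logs in this issue). The function name and log format are assumptions.

```php
<?php
/**
 * Hypothetical stand-in for inserting into index_words (wid => baseword).
 * Returns true if the word was inserted, false if the wid was already taken.
 */
function insertWordGracefully(array &$indexWords, int $wid, string $baseword, array &$log): bool
{
    if (!isset($indexWords[$wid])) {
        $indexWords[$wid] = $baseword;
        return true;
    }
    if ($indexWords[$wid] !== $baseword) {
        // Hash collision: a different word already owns this wid, so record it instead of throwing.
        $log[] = sprintf('wid %d collision: "%s" vs existing "%s"', $wid, $baseword, $indexWords[$wid]);
    }
    // Either the same word was already indexed, or the colliding word is skipped.
    return false;
}

$table = [];
$log = [];
$first = insertWordGracefully($table, 186135449, 'graf', $log);
$second = insertWordGracefully($table, 186135449, 'gettogethers', $log); // same wid as reported in the description
print_r($log);
```

Skipping the colliding word means it will not be findable on that page, so this trades a hard error for a silently incomplete index; logging the collision keeps that visible.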

More information:

  • Create a patch (in Contribution Guide)
  • you can ask for input about the topic or get help with contribution in the #typo3-cms-coredev channel on Slack
  • I think Oliver Hader had some good ideas the last time I raised the question but I can't really remember what exactly he said. It has been a while.

I will not work on this any further, I abandoned my patch, the solution was flawed.

Actions #74

Updated by Tomas Norre Mikkelsen over 3 years ago

What about adding a timestamp to the hash-generation after the


\TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility::md5inthash('nam' . time())
\TYPO3\CMS\IndexedSearch\Utility\IndexedSearchUtility::md5inthash('nussholz' . time())

Then the likelihood of a collision would be smaller. If we also improve the hash itself, we have two optimizations that decrease the possibility of a collision.

Actions #75

Updated by Sybille Peters over 3 years ago

As I understand it, the hash should be reproducible: it should give consistent results. If you add a timestamp, you get different hashes for the same input. This bloats up the number of entries and, in this implementation, is also a problem because the hash is used to reference between tables (e.g. index_words.wid <=> index_rel.wid).

The hash in indexed_search is used as identifier and is created to have a "short" version of one or more (usually) longer data entries.

Search for md5inthash in indexed_search and look at relation of database tables including wid, phash etc.
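A small sketch of why the hash must be reproducible, using a hypothetical in-memory index keyed by the hashed word: if a timestamp is appended at indexing time, the search term hashed later (with a different timestamp, or none) maps to a different key and the word can no longer be found. indexWord() and lookupWord() are illustrative names, not indexed_search API.

```php
<?php
function md5inthash(string $stringToHash): int
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

// Hypothetical index: wid => baseword, built with an optional salt appended to the word.
function indexWord(array &$index, string $word, string $salt = ''): void
{
    $index[md5inthash($word . $salt)] = $word;
}

// Search-time lookup: hash the term again and look up the resulting wid.
function lookupWord(array $index, string $term, string $salt = ''): ?string
{
    return $index[md5inthash($term . $salt)] ?? null;
}

$stable = [];
indexWord($stable, 'nussholz');

$salted = [];
indexWord($salted, 'nussholz', '1600000000'); // timestamp at indexing time
```

Only a lookup that happens to use the exact same timestamp finds the salted entry, which is why index_rel and the search code could no longer resolve wids.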

Actions #76

Updated by Tomas Norre Mikkelsen over 3 years ago

I see. I didn't think about that.

Actions #77

Updated by David Bruchmann over 3 years ago

I had the same issue, but with another table, `sys_file_processedfile`.
I had to delete the content of that table, and afterwards I created an auto-increment index for the field `uid`.
Afterwards the problem was gone.

As the content of indexed_search can be regenerated by re-indexing, deleting the content here shouldn't be a problem either.
Otherwise the uid and relations of data row 0 or 1 would have to be changed manually, which isn't worth it if the content can easily be recreated, as in these cases.
I never thought about which field to change for indexed_search, or how in detail, though.

Actions #78

Updated by Chris Bro almost 3 years ago

  • TYPO3 Version changed from 8 to 10
  • PHP Version set to 7.2

I have the same problem in a rather large TYPO3 10 installation. indexed_search shows acceptable performance, but I cannot use it because of this problem, which occurs about 10 times per day!
The exact log message is:

Core: Exception handler (WEB): Uncaught TYPO3 Exception: An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [26613867, "\u00fcberarbeiten", "210866314"]: Duplicate entry '26613867' for key 'PRIMARY' | Doctrine\DBAL\Exception\UniqueConstraintViolationException thrown in file /html/typo3/typo3_src-10.4.15/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php in line 59. Requested URL: https://www.xxx...

So this problem is NOT solved in versions bigger than 8.7!

Actions #79

Updated by Sybille Peters almost 3 years ago

@Chris Bro

If this is currently causing problems for you, you might want to either:

a) change the indexing to index by scheduler and not index on rendering the page:

see my comment "Workaround": https://forge.typo3.org/issues/84541#note-41

b) look at alternative search solutions. I can't really recommend anything, perhaps ask the community on Slack. There is a Solr extension by dkd which is quite good and has been in the field for years, but for this you have to set up a Solr server.

c) help to fix this issue or find someone to fix it & fund the fix.

In any case, for large sites indexed_search might not be the best solution. It is great that you get something out of the box that is simple to set up, but it is not the best choice for every scenario.

Actions #80

Updated by Chris Bro almost 3 years ago

Thank you Sybille, I'll try a) like your #41 / if this doesn't work I'll have to do b) with Solr which I wouldn't like...

Actions #81

Updated by Jonas Eberle almost 3 years ago

Since we'll not have an absolutely collision-free hash ever, would it make sense to silence the exception and only log it?

Actions #82

Updated by Sybille Peters almost 3 years ago

@Jonas

Since we'll not have an absolutely collision-free hash ever, would it make sense to silence the exception and only log it?

Probably better than throwing an exception which you can't really do much about. So in this case, I would say yes, but we should probably ask the core team what the best practice is.

Please consider, though, that the currently used hash function does not just hash: it also truncates the hash (to fit into the database field). This will cause more collisions than you would otherwise expect.

see md5inthash() in https://github.com/TYPO3/TYPO3.CMS/blob/master/typo3/sysext/indexed_search/Classes/Utility/IndexedSearchUtility.php

You can write a simple script and feed it some commonly used words to see how many collisions you will get and how many you would get with just md5().
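A minimal version of such a script, as a sketch: it uses a synthetic word list ("word0", "word1", ...) as a stand-in for real vocabulary (replace it with a dictionary file for realistic numbers) and counts how many words hit a hash value already taken by another word, once for md5inthash() and once for the full md5(). countCollisions() is a hypothetical helper.

```php
<?php
function md5inthash(string $stringToHash): int
{
    return hexdec(substr(md5($stringToHash), 0, 7));
}

/**
 * Counts inputs whose hash was already produced by an earlier, different input.
 * @param string[] $words
 */
function countCollisions(array $words, callable $hash): int
{
    $seen = [];
    $collisions = 0;
    foreach (array_unique($words) as $word) {
        $h = $hash($word);
        if (isset($seen[$h])) {
            $collisions++;
        } else {
            $seen[$h] = true;
        }
    }
    return $collisions;
}

// Synthetic word list; replace with real words for realistic numbers.
$words = [];
for ($i = 0; $i < 100000; $i++) {
    $words[] = 'word' . $i;
}

printf("md5inthash collisions: %d\n", countCollisions($words, 'md5inthash'));
printf("full md5 collisions:   %d\n", countCollisions($words, 'md5'));
```

With 100,000 words and 28 bits, the birthday estimate n²/(2 · 2^28) predicts on the order of 20 collisions, while the full 128-bit md5() should show none.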

Also I am not sure what the result of the error will be (besides the exception) - would it still be possible to index the page?

When I first created a patch a while back, I attempted to increase the size of the database field(s), but that had other negative impacts (e.g. performance). I wasn't really sure how best to solve it, so I abandoned the attempt. I didn't feel I could give the problem the necessary attention and expertise.

Actions #83

Updated by Xavier Perseguers over 2 years ago

  • Related to Bug #90977: possible race condition in indexedsearch added
Actions #84

Updated by Florian Schöppe about 2 years ago

Possible Workaround for MySQL

With https://docs.typo3.org/c/typo3/cms-core/10.4/en-us/Changelog/8.4/Breaking-77700-ExtensionIndexed_search_mysqlMergedIntoIndexed_search.html
the functionality of indexed_search_mysql was merged into indexed_search and could be enabled in the extension configuration with the feature flag "useMysqlFulltext" (database update is needed afterwards).

When the feature is activated, the tables index_rel and index_words are no longer used (index_fulltext is used instead). The feature uses different tables and different code to process the search results on the database level, so you may get "other" search results (entries and order) after the switch. The performance of searching and/or indexing may also differ.

For me the search results and performance (4000+ pages) were ok.

Note on indexed_search improvements

I looked into the code of indexed_search and the usage of the customized hashing-function "md5inthash". For building a word index (in index_words) it should not be necessary to use a custom word id (wid column). In my opinion the word id could be any unique id and the md5inthash should be replaced by a database lookup in the search (and related code handling "metaphones").
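A hypothetical sketch of that suggestion: word ids handed out by a dictionary, like an AUTO_INCREMENT column, instead of being derived from md5inthash(), with the search resolving terms by lookup rather than by hashing. The class and method names are illustrative and not part of indexed_search.

```php
<?php
/**
 * Hypothetical dictionary: wids are assigned sequentially, so two different
 * words can never share a wid.
 */
final class WordDictionary
{
    /** @var array<string, int> baseword => wid */
    private array $ids = [];
    private int $nextId = 1;

    // Indexing time: return the existing wid or assign the next free one.
    public function getOrCreateId(string $baseword): int
    {
        if (!isset($this->ids[$baseword])) {
            $this->ids[$baseword] = $this->nextId++;
        }
        return $this->ids[$baseword];
    }

    // Search time: look the term up instead of hashing it; null means never indexed.
    public function lookup(string $term): ?int
    {
        return $this->ids[$term] ?? null;
    }
}

$dictionary = new WordDictionary();
$grafId = $dictionary->getOrCreateId('graf');
$gettogethersId = $dictionary->getOrCreateId('gettogethers');
printf("graf=%d gettogethers=%d\n", $grafId, $gettogethersId); // graf=1 gettogethers=2
```

In the database this would presumably mean an auto-increment wid in index_words plus a unique index on baseword, at the cost of one extra lookup per search term; the metaphone handling would need the same treatment.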

The remaining code that uses "md5inthash" deals with the hashing of page content (phash columns). Collisions are also possible there and could lead to different unwanted behaviors (no indexing because the phash is already present, ...). Fixing that seems to be quite some work.

Actions #85

Updated by Sybille Peters about 2 years ago

@Florian Kuss Thanks for the information. I was not aware of the changelog. Perhaps it would be helpful to add that to the official documentation (if not already included). Also, there is a page "Known problems".

I don't use the extension indexed_search anymore, perhaps someone else would like to follow up on the suggestions.

In any case, please feel free to submit a patch, see contribution guide: https://docs.typo3.org/m/typo3/guide-contributionworkflow/main/en-us/BugfixingAZ/Index.html

Actions #86

Updated by Gerrit Code Review almost 2 years ago

  • Status changed from New to Under Review

Patch set 1 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #87

Updated by Gerrit Code Review almost 2 years ago

Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #88

Updated by Gerrit Code Review almost 2 years ago

Patch set 2 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #89

Updated by Jonas Eberle almost 2 years ago

  • Related to deleted (Bug #90977: possible race condition in indexedsearch)
Actions #90

Updated by Jonas Eberle almost 2 years ago

  • Has duplicate Bug #90977: possible race condition in indexedsearch added
Actions #91

Updated by Gerrit Code Review almost 2 years ago

Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #92

Updated by Gerrit Code Review almost 2 years ago

Patch set 3 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #93

Updated by Gerrit Code Review almost 2 years ago

Patch set 4 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #94

Updated by Gerrit Code Review almost 2 years ago

Patch set 4 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #95

Updated by Gerrit Code Review almost 2 years ago

Patch set 5 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #96

Updated by Gerrit Code Review almost 2 years ago

Patch set 5 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #97

Updated by Gerrit Code Review almost 2 years ago

Patch set 6 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #98

Updated by Gerrit Code Review almost 2 years ago

Patch set 6 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #99

Updated by Gerrit Code Review almost 2 years ago

Patch set 7 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #100

Updated by Gerrit Code Review almost 2 years ago

Patch set 8 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #102

Updated by Gerrit Code Review almost 2 years ago

Patch set 9 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #103

Updated by Gerrit Code Review almost 2 years ago

Patch set 10 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #105

Updated by Kevin Appelt over 1 year ago

  • TYPO3 Version changed from 10 to 11
  • PHP Version changed from 7.2 to 8.1

Just to confirm: This issue is still valid in TYPO3 v11 with PHP 8.

Actions #106

Updated by ondro no-lastname-given over 1 year ago

I can also confirm: the issue still occurs in TYPO3 v11.5.19 with PHP 8.1.

Actions #107

Updated by Dennis Metz about 1 year ago

Can confirm for TYPO3 v11.5.24 on PHP 8.1.

Actions #108

Updated by Gerrit Code Review 10 months ago

Patch set 11 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #110

Updated by Gerrit Code Review 10 months ago

Patch set 12 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #112

Updated by Stefan Bürk 10 months ago

  • Related to Bug #101249: Prevent exception caused by hash collisions in indexed_search added
Actions #113

Updated by Gerrit Code Review 10 months ago

Patch set 13 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #115

Updated by Gerrit Code Review 10 months ago

Patch set 14 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #117

Updated by Gerrit Code Review 10 months ago

Patch set 15 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #119

Updated by Gerrit Code Review 10 months ago

Patch set 16 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #121

Updated by Daniel Hettler 9 months ago

  • % Done changed from 0 to 100
Actions #122

Updated by Daniel Hettler 9 months ago

I think this task can be closed now, as the bug is fixed in all supported TYPO3 versions.

Actions #123

Updated by Sybille Peters 9 months ago

  • Status changed from Under Review to Closed
Actions #124

Updated by Stefan Bürk 3 months ago

  • Related to Task #102975: Use full md5 hash for `indexed_search` added
Actions #125

Updated by Gerrit Code Review 3 months ago

  • Status changed from Closed to Under Review

Patch set 17 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #126

Updated by Gerrit Code Review 3 months ago

Patch set 18 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #127

Updated by Gerrit Code Review 3 months ago

Patch set 19 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #128

Updated by Gerrit Code Review 3 months ago

Patch set 20 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #129

Updated by Gerrit Code Review 3 months ago

Patch set 21 for branch main of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/74688

Actions #130

Updated by Stefan Bürk 3 months ago

  • Status changed from Under Review to Resolved