Bug #84541

Updated by Sybille Peters over 5 years ago

Loading a page throws an exception with HTTP status code 503 and the following exception message:

 <pre> 
 Oops, an error occurred! 
 An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [55602717, "images\/botschaften", "66411504"]: Duplicate entry '55602717' for key 'PRIMARY' 
 </pre> 

 !exception.png! 


 h2. System 

 * TYPO3 version 8.7.17 
 * current master 

 h2. Severity of error 

 Even if the configuration preset "Live" is selected, an error is still displayed in the frontend on the first load of the page, and the page is not rendered. This doesn't happen for all pages, only for pages where the hash collision occurs. Still, it's pretty ugly.

 !exception2.png! 


 h2. Reproduce 


 # Activate the extension indexed_search 
 # Add static includes for indexed_search 
 # Add the following words to a new or existing content element: *graf gettogethers abfluss erworbener*
 # Load the page in the frontend (you may have to do that twice) 


 

 h3. Expected result 

 Page is displayed 

 h3. Actual result 

 Exception message: 

 <pre> 
 Uncaught TYPO3 Exception 
 An exception occurred while executing 'INSERT INTO `index_words` (`wid`, `baseword`, `metaphone`) VALUES (?, ?, ?)' with params [186135449, "gettogethers", "199699927"]: Duplicate entry '186135449' for key 'PRIMARY'  
 </pre> 


 h2. Reproduce with test code 

 You can also reproduce the problem with this simple test script, which includes the original hash function from "IndexedSearchUtility":https://github.com/TYPO3/TYPO3.CMS/blob/master/typo3/sysext/indexed_search/Classes/Utility/IndexedSearchUtility.php and some test strings:

 <pre><code class="php"> 
 <?php 

 $str = ['graf', 'gettogethers', 'abfluss', 'erworbener']; 

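 // Hash function copied from IndexedSearchUtility: keeps only the first
 // 7 hex digits (28 bits) of the MD5 digest and converts them to an integer.
 function md5inthash($stringToHash)
 {
     return hexdec(substr(md5($stringToHash), 0, 7));
 }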

 foreach ($str as $s) { 
     print("string=$s hash=" . md5inthash($s) . "\n"); 
 } 
 </code></pre> 
 The first two strings produce the same hash, as do the third and fourth, which causes the collisions:

 <pre><code class="text"> 
 string=graf hash=186135449 
 string=gettogethers hash=186135449 
 string=abfluss hash=211412923 
 string=erworbener hash=211412923 

 </code></pre> 
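
 To check a larger word list for further colliding pairs, the words can be grouped by their truncated hash. This is only a minimal sketch: @findCollisions()@ is a hypothetical helper that does not exist in indexed_search, and @md5inthash()@ is repeated from the test script so that the snippet runs on its own.

 <pre><code class="php">
 <?php

 // Truncated hash, same as in the test script above.
 function md5inthash($stringToHash)
 {
     return hexdec(substr(md5($stringToHash), 0, 7));
 }

 // Hypothetical helper (not part of TYPO3): groups distinct words by hash
 // and returns only the groups that contain more than one word.
 function findCollisions(array $words)
 {
     $buckets = [];
     foreach (array_unique($words) as $word) {
         $buckets[md5inthash($word)][] = $word;
     }
     return array_filter($buckets, function ($group) {
         return count($group) > 1;
     });
 }

 $words = ['graf', 'gettogethers', 'abfluss', 'erworbener'];
 foreach (findCollisions($words) as $hash => $group) {
     print("hash=$hash words=" . implode(', ', $group) . "\n");
 }
 </code></pre>

 With the four test strings, this should list the two colliding pairs shown in the output above. Repeated occurrences of the same word are ignored, because they do not cause a new INSERT.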


 h2. Additional information 

 If you need to reproduce this several times, you can delete the index for the page: open the "Indexing" module, select the page with the content you inserted earlier, then select "Detailed statistics" and click the "Delete" garbage can icon.


 You can also check the sys_log for similar errors: 

 <pre><code class="sql"> 
  SELECT uid, FROM_UNIXTIME(tstamp), details FROM sys_log WHERE details LIKE '%index_word%' AND error = 2 ORDER BY uid DESC LIMIT 10;
 </code></pre> 


 h2. Cause of error 

 possibly:    https://github.com/TYPO3/TYPO3.CMS/blob/8124407655ae73656bf6c21f6bc8841b8e1d2023/typo3/sysext/indexed_search/Classes/Indexer.php#L2134 

 (the link goes to a specific commit which may no longer reflect the current codebase) 

 hash collision? 
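
 To get a feeling for how likely a collision is, the standard birthday approximation can be applied to the 7 hex digits (28 bits) that md5inthash() keeps. This is a rough estimate that assumes the truncated MD5 values are uniformly distributed; @collisionProbability()@ and @HASH_SPACE@ are names made up for this sketch.

 <pre><code class="php">
 <?php

 // 7 hex digits = 28 bits = 268435456 possible hash values.
 const HASH_SPACE = 16 ** 7;

 // Birthday approximation: probability that at least two of $wordCount
 // distinct words share a hash, assuming uniformly distributed hashes.
 function collisionProbability($wordCount, $hashSpace = HASH_SPACE)
 {
     return 1 - exp(-$wordCount * ($wordCount - 1) / (2 * $hashSpace));
 }

 foreach ([1000, 10000, 20000, 50000] as $n) {
     printf("words=%d p(collision)~%.3f\n", $n, collisionProbability($n));
 }
 </code></pre>

 Under these assumptions, the chance of at least one collision is already around 50% at roughly 20,000 distinct words, so a larger site is quite likely to hit this.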

 h2. Affected fields 

 These are the fields in the DB that appear to be filled using the current hash algorithm (determined by looking at the DB schema and content, and partly by guessing):

 * index_debug.phash 
 * index_fulltext.phash 
 * index_grlist.phash 
 * index_grlist.phash_x 
 * index_grlist.hash_gr_list 
 * index_phash.phash 
 * index_phash.phash_grouping 
 * index_phash.contentHash 
 * index_rel.phash 
 * index_rel.wid 
 * *index_section.phash* (see https://forge.typo3.org/issues/79802) 
 * index_section.phash_t3 
 * *index_words.wid* (see this issue) 
 * *index_words.metaphone* 


 They should be changed from int to char(32), and the hash algorithm changed to simple PHP md5() (without truncation). Currently, however, only index_words.wid seems to be affected by the collisions.
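
 A minimal sketch of the proposed change, assuming the columns are converted to char(32): the full MD5 digest is used as is instead of a truncated integer. @md5fullhash()@ is a hypothetical name, not an existing TYPO3 function.

 <pre><code class="php">
 <?php

 // Hypothetical replacement for md5inthash(): returns the full
 // 32-character hex digest, intended for a char(32) column.
 function md5fullhash($stringToHash)
 {
     return md5($stringToHash);
 }

 foreach (['graf', 'gettogethers', 'abfluss', 'erworbener'] as $s) {
     print("string=$s hash=" . md5fullhash($s) . "\n");
 }
 </code></pre>

 The words that collide under the truncated hash get distinct values here, at the cost of wider keys in the index tables.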
