Bug #44381

indexed_search FE Plugin doesn't show external urls in TYPO3 4.7.7

Added by Alexander Bohndorf over 7 years ago. Updated over 2 years ago.

Status: Closed
Priority: Must have
Category: Indexed Search
Target version:
Start date: 2013-01-08
Due date:
% Done: 100%
TYPO3 Version: 4.7
PHP Version:
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

The FE plugin of indexed_search doesn't show any external URLs that start with http:// or https:// (or any other protocol).
These URLs have been added to the index correctly (via crawler).

The cause is in class.tx_indexedsearch.php, line 1247 ff., in the function checkExistance($row).

In line 1250 every path is checked with is_file(). is_file() always returns FALSE when it is given a URL such as "http://google.de/", so these results are dropped.

To work around this problem you can use this alternative checkExistance() implementation:

    function checkExistance($row) {
        // Always expect that page content exists
        $recordExists = TRUE;
        // External media:
        if ($row['item_type']) {
            if (preg_match('/^https?:\/\//', $row['data_filename'])) {
                // Remote URL: send a HEAD request and accept only HTTP 200
                $ch = curl_init($row['data_filename']);
                curl_setopt($ch, CURLOPT_NOBODY, TRUE);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
                curl_exec($ch);
                $recordExists = (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200);
                curl_close($ch);
            } elseif (!is_file($row['data_filename'])) {
                // Local file: it must exist on disk
                $recordExists = FALSE;
            }
        }
        return $recordExists;
    }

However, this implementation is very slow, because every external URL in the search results is checked with its own HTTP request.
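One way to bound that cost is sketched below. It is an illustration only and is not part of indexed_search or of the extension mentioned in this report: the function names cachedUrlExists() and headRequestSucceeds(), the cache array, and the 2-second timeout are all hypothetical. The idea is to probe each distinct URL at most once per request and to cap how long a single probe may take.

```php
<?php
/**
 * Hypothetical helper (not part of indexed_search): remember the result of
 * an existence check so that each distinct URL is probed at most once.
 * $probe is any callable that takes a URL and returns a truthy/falsy value.
 */
function cachedUrlExists($url, callable $probe, array &$cache) {
    if (!array_key_exists($url, $cache)) {
        $cache[$url] = (bool)$probe($url);
    }
    return $cache[$url];
}

/**
 * Hypothetical probe: HEAD request via cURL with short timeouts.
 * The 2-second default is an assumption, not a recommended value.
 */
function headRequestSucceeds($url, $timeoutSeconds = 2) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // never echo a response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status == 200;
}
```

Inside checkExistance() the call could then look like cachedUrlExists($row['data_filename'], 'headRequestSucceeds', $cache), with $cache kept in a member variable. Every distinct URL still costs one request, so this only softens the slowdown.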

I created an extension sms_indexedsearch_fixexternals to fix this bug. In this extension you can enable or disable the checking of http(s) URLs in the extension configuration.

Pagination_issue.jpg (33.9 KB) Minu Thomas, 2016-04-13 13:20


Related issues

Related to TYPO3 Core - Bug #25699: indexing of external files may be prevented by php's open_basedir restriction (Bug 18520 in core) Closed 2011-04-01
Related to TYPO3 Core - Bug #50095: Indexing of external files and absRefPrefix Closed 2013-07-17
Duplicated by TYPO3 Core - Bug #70458: EXT:indexed_search fails to checkExistance lokal files Closed 2015-10-07

Associated revisions

Revision 95ec4a6e (diff)
Added by Tymoteusz Motylewski over 4 years ago

[BUGFIX] Indexed Search: Display links to external files

Indexed search is rendering links to external files now.
Indexed search will not check if the file exists before displaying
search results. As a side effect this change also improves performance.

This change is affecting only the AbstractPlugin based plugin.
Extbase version do not have this check.

Resolves: #44381
Releases: master
Change-Id: Iae4e5b2f2cc575853f25c674cbb4307bdf3efa17
Reviewed-on: https://review.typo3.org/45142
Reviewed-by: Georg Ringer
Tested-by: Georg Ringer
Reviewed-by: Tymoteusz Motylewski
Tested-by: Tymoteusz Motylewski

History

#1 Updated by Oliver Hader over 7 years ago

  • Target version set to 2222

#2 Updated by Oliver Hader over 7 years ago

  • Project changed from Indexed Search to TYPO3 Core

#3 Updated by Oliver Hader over 7 years ago

  • Category set to Indexed Search

#4 Updated by Oliver Hader over 7 years ago

  • Target version deleted (2222)

#5 Updated by Michael Bakonyi about 6 years ago

If you use config.absRefPrefix = http://bla.com, all file links are prefixed with this scheme and hostname. Combined with the bug above, this means that correctly indexed local files are never shown in the search results either, because they are skipped as well.

So either the final solution should include an additional check whether a file URL really is external, and then decide whether existence is checked via curl or via file_exists.
Or the existence check should be removed completely, or made optional via the extension configuration.

I think the check should be removed. Keeping pages and links up to date is the editors' job, not the job of a search-result process. Nowadays we have the linkvalidator extension, which can be run nightly via the scheduler and can remind editors of broken links.
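The first option, deciding whether a file URL is really external, could be sketched roughly as follows. This is an illustration under assumptions, not code from indexed_search: resolveLocalPath() is a hypothetical helper, and it assumes that the configured absRefPrefix value is passed in as a plain string.

```php
<?php
/**
 * Hypothetical helper (not part of indexed_search): map an indexed file
 * reference back to a local relative path when it only carries the site's
 * own absRefPrefix. Returns NULL for a genuinely external URL.
 */
function resolveLocalPath($dataFilename, $absRefPrefix) {
    // No http(s) scheme: the reference is already a local path.
    if (!preg_match('/^https?:\/\//i', $dataFilename)) {
        return $dataFilename;
    }
    // Normalise the prefix to end in exactly one slash, so that
    // "http://bla.com" does not match "http://bla.com.example.org/...".
    $prefix = rtrim($absRefPrefix, '/') . '/';
    if ($absRefPrefix !== '' && strpos($dataFilename, $prefix) === 0) {
        return substr($dataFilename, strlen($prefix));
    }
    // Any other host: treat as external.
    return NULL;
}
```

A non-NULL result could then be checked with is_file(), while a NULL result would either be checked remotely or simply trusted.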

#6 Updated by Mathias Schreiber over 4 years ago

  • Assignee changed from Dmitry Dulepov to Tymoteusz Motylewski
  • TYPO3 Version set to 4.7
  • Is Regression set to No

@Tymek... thoughts?

#7 Updated by Tymoteusz Motylewski over 4 years ago

In my opinion, the plugin that displays search results should trust that the data in the index is valid. It is the indexer's responsibility to keep the data up to date.
So I'm opting for removal of the check.

By the way, in the Extbase version the check is already gone.

#8 Updated by Gerrit Code Review over 4 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

#9 Updated by Gerrit Code Review over 4 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

#10 Updated by Tymoteusz Motylewski over 4 years ago

  • Target version set to 7.6.1

#11 Updated by Gerrit Code Review over 4 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

#12 Updated by Tymoteusz Motylewski over 4 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100

#13 Updated by Minu Thomas over 4 years ago

Patch set 3 works.
The issue occurs in TYPO3 6.2.19 and earlier versions. There was an issue with the pagination too. Updates were made to the SearchFormController.php file. Now both the search for indexed external URLs and the pagination work properly.

Why has this not been applied to the core yet? When can we expect the update?

#14 Updated by Tymoteusz Motylewski over 4 years ago

Hi Minu,
Thanks for your report.
I believe the fix was applied only to TYPO3 7 as TYPO3 6.2 has entered the support phase where only priority bugfixes are applied.
See the roadmap:
https://typo3.org/typo3-cms/roadmap/

"There was an issue with the pagination too. Updates were made to the SearchFormController.php file."

I'm not sure what you mean by this.

#15 Updated by Minu Thomas over 4 years ago

Hi,
Thanks for your feedback. The situation regarding the updates is clear now.

The total result count and the pagination were displayed incorrectly. This issue was also fixed automatically when the patch was applied.

Please see the attachment (Pagination_issue.jpg).

#16 Updated by Riccardo De Contardi over 2 years ago

  • Status changed from Resolved to Closed
