Bug #44381

closed

indexed_search FE Plugin doesn't show external urls in TYPO3 4.7.7

Added by Alexander Bohndorf over 11 years ago. Updated over 6 years ago.

Status: Closed
Priority: Must have
Category: Indexed Search
Target version:
Start date: 2013-01-08
Due date:
% Done: 100%
Estimated time:
TYPO3 Version: 4.7
PHP Version:
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

The FE plugin of indexed_search doesn't show any external URLs that start with http:// or https:// (or any other protocol).
These URLs were added to the index correctly (via the crawler).

The cause is in class.tx_indexedsearch.php, lines 1247 ff., in the function checkExistance($row).

Every path is checked with is_file() in line 1250. is_file() always returns FALSE for URLs such as "http://google.de/".
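A minimal sketch of the failure mode, for illustration only. PHP's http:// stream wrapper does not support stat(), so is_file() reports FALSE for such a URL regardless of whether the resource exists. The helper name isRemoteUrl() is hypothetical and not part of indexed_search.

```php
<?php
// Hypothetical helper (not part of indexed_search): detects http(s) URLs.
function isRemoteUrl($path) {
    return preg_match('/^https?:\/\//i', $path) === 1;
}

// is_file() relies on stat(), which the http:// wrapper does not provide,
// so the result is FALSE even when the URL is reachable.
var_dump(is_file('http://google.de/'));      // bool(false)
var_dump(isRemoteUrl('http://google.de/'));  // bool(true)
var_dump(isRemoteUrl('fileadmin/doc.pdf'));  // bool(false)
```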

To solve this problem you can use this alternative checkExistance() implementation:

    function checkExistance($row) {
        // Page content is always expected to exist; only external media is checked.
        $recordExists = TRUE;
        if ($row['item_type']) {
            if (preg_match('/^https?:\/\//i', $row['data_filename'])) {
                // Remote URL: send a HEAD request instead of a filesystem check.
                $ch = curl_init($row['data_filename']);
                curl_setopt($ch, CURLOPT_NOBODY, TRUE);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
                curl_setopt($ch, CURLOPT_TIMEOUT, 5);    // Assumed limit; tune as needed
                curl_exec($ch);
                $recordExists = (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200);
                curl_close($ch);
            } elseif (!is_file($row['data_filename'])) {
                // Local file: is_file() already implies file_exists().
                $recordExists = FALSE;
            }
        }
        return $recordExists;
    }

However, this implementation is very slow, because every external URL in the search results is checked with a separate HTTP request.

I created an extension, sms_indexedsearch_fixexternals, to fix this bug. Its extension configuration lets you enable or disable the checking of http(s) URLs.
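The extension's source is not shown in this report, so the following is only a sketch of how such a switch could look. The function name checkExistanceConfigurable(), the $checkExternalUrls flag and the $urlChecker callback are assumptions made for illustration; they are not taken from sms_indexedsearch_fixexternals.

```php
<?php
/**
 * Hypothetical variant of checkExistance() with a switch for remote checks.
 *
 * @param array    $row               Result row with 'item_type' and 'data_filename'
 * @param bool     $checkExternalUrls Assumed extension setting (hypothetical name)
 * @param callable $urlChecker        Callback returning TRUE if a URL is reachable
 * @return bool
 */
function checkExistanceConfigurable($row, $checkExternalUrls, $urlChecker) {
    if (empty($row['item_type'])) {
        return TRUE; // Page content is always expected to exist
    }
    if (preg_match('/^https?:\/\//i', $row['data_filename'])) {
        // With the switch off, trust the index and skip the HTTP request.
        return $checkExternalUrls
            ? (bool) call_user_func($urlChecker, $row['data_filename'])
            : TRUE;
    }
    return is_file($row['data_filename']);
}

// Usage with a stub checker; a real one could send a HEAD request via cURL.
$row = array('item_type' => 'pdf', 'data_filename' => 'https://example.org/a.pdf');
$alwaysMissing = function ($url) { return FALSE; };
var_dump(checkExistanceConfigurable($row, FALSE, $alwaysMissing)); // bool(true)
var_dump(checkExistanceConfigurable($row, TRUE, $alwaysMissing));  // bool(false)
```

Passing the checker as a callback keeps the slow HTTP request out of the decision logic, so the switch can be exercised without network access.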


Files

Pagination_issue.jpg (33.9 KB), Minu Thomas, 2016-04-13 13:20

Related issues: 3 (0 open, 3 closed)

Related to TYPO3 Core - Bug #25699: indexing of external files may be prevented by php's open_basedir restriction (Bug 18520 in core) (Closed, 2011-04-01)

Related to TYPO3 Core - Bug #50095: Indexing of external files and absRefPrefix (Closed, 2013-07-17)

Has duplicate TYPO3 Core - Bug #70458: EXT:indexed_search fails to checkExistance lokal files (Closed, 2015-10-07)

Actions #1

Updated by Oliver Hader about 11 years ago

  • Target version set to 2222
Actions #2

Updated by Oliver Hader about 11 years ago

  • Project changed from 1382 to TYPO3 Core
Actions #3

Updated by Oliver Hader about 11 years ago

  • Category set to Indexed Search
Actions #4

Updated by Oliver Hader about 11 years ago

  • Target version deleted (2222)
Actions #5

Updated by Michael Bakonyi almost 10 years ago

If you use config.absRefPrefix = http://bla.com, all file links are prefixed with this scheme and hostname. Combined with the bug above, this leads to correctly indexed files that never appear in the search results, because these files are skipped as well.

So the final solution should either add a check for whether a file URL really is external, and then decide whether existence is checked via cURL or file_exists();
or the existence check should be removed completely or made optional via the extension configuration.
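
The first option could be sketched roughly as follows. The function name resolveLocalPath() and its parameters are hypothetical, and the sketch assumes that absRefPrefix is a plain scheme-plus-host prefix and that indexed files live below a known document root. It is not the fix that was eventually merged.

```php
<?php
/**
 * Hypothetical helper: map an indexed file URL back to a local path if it
 * only looks external because config.absRefPrefix was prepended.
 *
 * @return string|null Local path, or NULL if the URL is really external
 */
function resolveLocalPath($dataFilename, $absRefPrefix, $documentRoot) {
    if (!preg_match('/^https?:\/\//i', $dataFilename)) {
        return $dataFilename; // Already a local path
    }
    $prefix = rtrim($absRefPrefix, '/') . '/';
    if ($prefix !== '/' && strpos($dataFilename, $prefix) === 0) {
        // Strip the prefix and resolve against the document root.
        return rtrim($documentRoot, '/') . '/' . substr($dataFilename, strlen($prefix));
    }
    return NULL; // Different host: checking it would need an HTTP request
}

echo resolveLocalPath('http://bla.com/fileadmin/a.pdf', 'http://bla.com', '/var/www'), "\n";
// /var/www/fileadmin/a.pdf
var_dump(resolveLocalPath('http://other.org/a.pdf', 'http://bla.com', '/var/www'));
// NULL
```

A caller could then run is_file() on the returned path and fall back to an HTTP check, or to no check at all, when the result is NULL.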

I think the check should be removed, as it is the editors' job to keep pages and links up to date; it is not the job of a search result process. Nowadays we have the linkvalidator extension, which can be run by the scheduler every night and can remind editors of broken links.

Actions #6

Updated by Mathias Schreiber over 8 years ago

  • Assignee changed from Dmitry Dulepov to Tymoteusz Motylewski
  • TYPO3 Version set to 4.7
  • Is Regression set to No

@Tymek... thoughts?

Actions #7

Updated by Tymoteusz Motylewski over 8 years ago

In my opinion, the plugin that displays search results should trust that the data in the index is valid. It is the indexer's responsibility to update the data.
So I'm opting for removal of the check.

Btw, in the Extbase version the check is already gone.

Actions #8

Updated by Gerrit Code Review over 8 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

Actions #9

Updated by Gerrit Code Review over 8 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

Actions #10

Updated by Tymoteusz Motylewski over 8 years ago

  • Target version set to 7.6.1
Actions #11

Updated by Gerrit Code Review over 8 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/45142

Actions #12

Updated by Tymoteusz Motylewski over 8 years ago

  • Status changed from Under Review to Resolved
  • % Done changed from 0 to 100
Actions #13

Updated by Minu Thomas about 8 years ago

Patch set 3 works.
The issue was found in TYPO3 6.2.19 and earlier versions. There was an issue with the pagination too. Updates were made to the SearchFormController.php file. Now both the search for indexed external URLs and the pagination work properly.

Why has it not been merged into the core yet? When can we expect the update?

Actions #14

Updated by Tymoteusz Motylewski about 8 years ago

Hi Minu,
Thanks for your report.
I believe the fix was applied only to TYPO3 7 as TYPO3 6.2 has entered the support phase where only priority bugfixes are applied.
See the roadmap:
https://typo3.org/typo3-cms/roadmap/

"There was an issue with the pagination too. Updates were made to the SearchFormController.php file."

I'm not sure what you mean.

Actions #15

Updated by Minu Thomas about 8 years ago

Hi,
Thanks for your feedback. The situation with the updates is clear now.

The total result count and the pagination were displayed incorrectly. This issue was also fixed automatically when the patch was applied.

Please see the attachment (Pagination_issue.jpg).

Actions #16

Updated by Riccardo De Contardi over 6 years ago

  • Status changed from Resolved to Closed