Bug #50095

closed

Epic #65815: Improve Indexed search indexer

Indexing of external files and absRefPrefix

Added by Harald no-lastname-given almost 11 years ago. Updated over 7 years ago.

Status: Closed
Priority: Must have
Assignee: -
Category: Indexed Search
Target version:
Start date: 2013-07-17
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 6.1
PHP Version:
Tags:
Complexity: hard
Is Regression: No
Sprint Focus:

Description

Hello!

If I use, for example, the following configuration

config.index_enable = 1
config.index_externals = 1

together with

config.absRefPrefix = /

the indexed files are displayed in the backend, but the frontend search does not find them. It only finds the pages on which the files are displayed.

When I use

config.absRefPrefix =

and then run a new indexing, it works!

Thank you very much

Harald


Files

50095-patch1.diff (818 Bytes), Chris Müller, 2013-10-21 15:10

Related issues: 2 (0 open, 2 closed)

Related to TYPO3 Core - Bug #64315: Typo in function name checkExistance (Closed, 2015-01-16)

Related to TYPO3 Core - Bug #44381: indexed_search FE Plugin doesn't show external urls in TYPO3 4.7.7 (Closed, Tymoteusz Motylewski, 2013-01-08)

Actions #1

Updated by Philipp Gampe over 10 years ago

  • Category set to Indexed Search
  • Status changed from New to Needs Feedback

As said in the other bug report, this is not a question-and-answer site. This tracker is for actual bugs, feature requests and tasks which need to be done.

I am not familiar with indexed search; you had better ask in the forum/newsgroup.

Did you try this while setting config.absRefPrefix to the full domain?

Actions #2

Updated by Markus Blaschke over 10 years ago

I can confirm this issue with TYPO3 6.1.3.

There is also no output when using the full domain in config.absRefPrefix.

All files are indexed with /fileadmin/... when using config.absRefPrefix =

If we use the full domain, e.g. config.absRefPrefix = http://www.example.com/, the files are indexed as http://www.example.com/fileadmin/....

Actions #3

Updated by Philipp Gampe over 10 years ago

This might be an issue inside indexed search then. Do you know if it worked before, e.g. in some 4.x version?

Actions #4

Updated by Harald no-lastname-given over 10 years ago

Hello Philipp!

Thanks for the reply!
Unfortunately, I cannot say whether the problem already existed in TYPO3 4.x.
At least a corresponding file is found by the search in TYPO3 4.x. Whether the contents of this file were indexed, I no longer know!

Many Thanks

Harald

Actions #5

Updated by Philipp Gampe over 10 years ago

OK, leaving this open, but someone will need to dig into indexed_search and find the root cause of this behavior.

Actions #6

Updated by Chris Müller over 10 years ago

Today I had the same problem in TYPO3 6.1.5. We are using config.absRefPrefix = / and found that PDF files are not shown in the result list. So I dug into the code: the PDF files are found, but the method "checkExistance()" throws them out of the results.

I attached the patch 50095-patch1.diff, which fixes this issue. I tested it with absRefPrefix = / and baseUrl = http://www.example.org/

Actions #7

Updated by Thirot no-lastname-given about 10 years ago

Unfortunately, I cannot say whether the problem already existed in TYPO3 4.x.
At least a corresponding file is found by the search in TYPO3 4.x. Whether the contents of this file were indexed, I no longer know!

I can see this issue in TYPO3 4.7.17.
The path is saved in the database with the absRefPrefix.
The file is indexed but cannot be used in the search form.
It is also impossible to re-index the file in the backend.
Indexed_search seems to use a relative path without the leading slash, but parse_url() returns [path] => /path.
So what is the correct path to use?
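
The parse_url() behaviour mentioned here can be reproduced in isolation. The following is only a minimal sketch: the URL is a made-up example, and the ltrim() step merely illustrates one way to obtain a path without the leading slash, not what indexed_search actually does.

```php
<?php
// Made-up example URL; not taken from the report.
$href = 'http://www.example.com/fileadmin/user_upload/manual.pdf';

// parse_url() splits the URL into its components.
$parts = parse_url($href);

// The path component keeps its leading slash.
$urlPath = $parts['path'];

// Assumption for illustration: a lookup that expects a path relative
// to the web root would need the leading slash removed.
$relativePath = ltrim($urlPath, '/');

echo $urlPath, "\n";
echo $relativePath, "\n";
```

The two values differ only by the leading slash, which is enough to make a string comparison or a relative file lookup fail.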

Update 2014.01.28
- An absolute absRefPrefix (http://site.com) does not work. Any http:// URL is saved as an external URL.
- Indexed_search uses both the URL and the localPath.
- The first phash calculated by the crawler is based on the localPath and not the URL; for this reason, phashes based on the URL are invalid.
- The meta base of the HTML page is not used in extractHyperLinks() in order to extract absolute URLs?
- But absolute URLs can generate duplicate content for many domains.
- I can't figure out how indexed_search is supposed to work.

I made a patch for myself. This patch saves the path and not the URL in order to be phash compatible.
I didn't test external documents.

 typo3/sysext/indexed_search/class.indexer.php | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/typo3/sysext/indexed_search/class.indexer.php b/typo3/sysext/indexed_search/class.indexer.php
index e3beed0..762754c 100755
--- a/typo3/sysext/indexed_search/class.indexer.php
+++ b/typo3/sysext/indexed_search/class.indexer.php
@@ -731,7 +731,7 @@ class tx_indexedsearch_indexer {
                         if (is_object($crawler))    {
                             $params = array(
                                 'document' => $linkSource,
-                                'alturl' => $linkInfo['href'],
+                                'alturl' => $linkInfo['localPath'],
                                 'conf' => $this->conf
                             );
                             unset($params['conf']['content']);

Actions #8

Updated by Alexander Opitz about 10 years ago

  • Status changed from Needs Feedback to New
  • Is Regression set to No
Actions #9

Updated by Benjamin Robinson over 9 years ago

I can confirm that this issue still exists in 6.2.4

Actions #10

Updated by Markus Klein over 9 years ago

  • Status changed from New to Accepted
  • Priority changed from Should have to Must have
  • Target version set to next-patchlevel
  • Complexity set to hard

Ran into this issue as well and debugged it.

The problem is that the DB stores the href value in the data_filename field.
When the search result is rendered, the SearchFormController::checkExistance() method is run, which checks with !is_file($row['data_filename']).
This is wrong!

The indexer uses Indexer::createLocalPath and 5 submethods to identify the correct local path for the given href value.
This functionality is needed for the check in SearchFormController::checkExistance() as well.

Solutions:
  • copy createLocalPath() code to SearchFormController
  • make it public in the Indexer
  • create new class to hold these path manipulation methods
  • create a new db field to hold the real local path
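
The idea behind these options, resolving the stored href to a local path before checking for the file, might look roughly like the sketch below. It is an assumption-laden illustration: resolveLocalPath() is a hypothetical helper, not part of TYPO3, and it only strips a configured absRefPrefix, whereas the real Indexer::createLocalPath logic is more involved.

```php
<?php
/**
 * Hypothetical helper (not a TYPO3 API): map a stored href back to a
 * path on disk by stripping the configured absRefPrefix.
 * Returns null when the href still looks like an external URL.
 */
function resolveLocalPath(string $href, string $absRefPrefix, string $siteRoot): ?string
{
    // Strip the prefix only if the href actually starts with it.
    if ($absRefPrefix !== '' && strpos($href, $absRefPrefix) === 0) {
        $href = (string) substr($href, strlen($absRefPrefix));
    }
    // Anything that still carries a scheme is treated as external.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href) === 1) {
        return null;
    }
    return rtrim($siteRoot, '/') . '/' . ltrim($href, '/');
}

// Throwaway site root with one file, purely for demonstration.
$siteRoot = sys_get_temp_dir() . '/site_' . uniqid();
mkdir($siteRoot . '/fileadmin', 0777, true);
file_put_contents($siteRoot . '/fileadmin/doc.pdf', 'dummy');

// href as it would be stored with absRefPrefix = /
$stored = '/fileadmin/doc.pdf';

// Checking the stored value directly looks at the filesystem root.
$naive = is_file($stored);

// Resolving first points the check at the file below the site root.
$local = resolveLocalPath($stored, '/', $siteRoot);
$resolved = $local !== null && is_file($local);

var_dump($naive, $resolved);
```

Whether such a helper lives in the controller, the indexer or a separate class is exactly the design question raised in the list above; the sketch takes no position on that.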
Actions #11

Updated by Mathias Schreiber over 9 years ago

  • Target version changed from next-patchlevel to 7.1 (Cleanup)
Actions #12

Updated by Tymoteusz Motylewski about 9 years ago

  • Parent task set to #65815
Actions #13

Updated by Benni Mack almost 9 years ago

  • Target version changed from 7.1 (Cleanup) to 7.4 (Backend)
Actions #14

Updated by Susanne Moog over 8 years ago

  • Target version changed from 7.4 (Backend) to 7.5
Actions #15

Updated by Benni Mack over 8 years ago

  • Target version changed from 7.5 to 8 LTS
Actions #16

Updated by Tymoteusz Motylewski over 8 years ago

FYI, we're not checking for file existence on rendering (so in searchController) any more.

Actions #17

Updated by Markus Klein over 8 years ago

This is still an issue in the upcoming 6.2.16.

Actions #18

Updated by Chris W about 8 years ago

TYPO3 6.2.21
As long as I am logged in as an feuser, I am able to find every PDF which is indexed in public pages. PDF files indexed in secured pages can't be found... Disabling absRefPrefix fixes this for me.

Actions #19

Updated by Jan Kiesewetter over 7 years ago

As the whole function was removed with #44381, the problem only occurs in 6.2.
For 6.2 I created a small extension which XCLASSes the SearchFormController and just returns true, like TYPO3 7.6 or 8, which no longer perform this check.
https://bitbucket.org/t3easy_de/indexed_search_absrefprefix
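
The effect of such an override can be illustrated without TYPO3. The sketch below is not the code of the linked extension: both classes are hypothetical stand-ins, the base method is a simplified guess at the 6.2 behaviour described in this thread, and the XCLASS registration itself is omitted.

```php
<?php
// Stand-in for the 6.2 controller; the real class ships with indexed_search.
class SearchFormControllerStandIn
{
    public function checkExistance(array $row)
    {
        // Simplified assumption: a plain is_file() on the stored value.
        return is_file($row['data_filename']);
    }
}

// Hypothetical override that skips the file check entirely.
class PermissiveSearchFormController extends SearchFormControllerStandIn
{
    public function checkExistance(array $row)
    {
        return true;
    }
}

// A stored value that does not exist as a path on disk.
$row = array('data_filename' => '/fileadmin/missing-' . uniqid() . '.pdf');

$original = new SearchFormControllerStandIn();
$override = new PermissiveSearchFormController();

var_dump($original->checkExistance($row));
var_dump($override->checkExistance($row));
```

The trade-off is that results for files deleted since the last indexing run would no longer be filtered out.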

This issue can be closed.

Actions #20

Updated by Riccardo De Contardi over 7 years ago

  • Status changed from Accepted to Closed

Thank you for your answer and findings, I'll close it.

Regards.

If you think that this is the wrong decision, please reopen it or open a new issue and add a reference to this one. Thank you.
