Bug #50095
Epic #65815 (closed): Improve Indexed search indexer
Indexing of external files and absRefPrefix
Description
Hello!
If I use, for example, the following configuration
config.index_enable = 1
config.index_externals = 1
together with
config.absRefPrefix = /
the indexed files are listed in the backend, but the frontend search does not find them. It finds only the pages on which the files are displayed.
When I use
config.absRefPrefix =
and then run a new indexing, it works!
Thank you very much
Harald
Updated by Philipp Gampe over 11 years ago
- Category set to Indexed Search
- Status changed from New to Needs Feedback
As said in the other bug report, this is not a question-and-answer site. This tracker is for actual bugs, feature requests and tasks which need to be done.
I am not familiar with indexed search; you had better ask in the forum/newsgroup.
Did you try this while setting config.absRefPrefix to the full domain?
Updated by Markus Blaschke over 11 years ago
I can confirm this issue with TYPO3 6.1.3.
There is also no output when using the full domain in config.absRefPrefix.
All files are indexed with /fileadmin/... when using config.absRefPrefix =
If we are using the full domain, e.g. config.absRefPrefix = http://www.example.com/, the files are indexed as http://www.example.com/fileadmin/...
Updated by Philipp Gampe over 11 years ago
Might be an issue inside indexed search then. Do you know if it worked before, e.g. in some 4.x version?
Updated by Harald no-lastname-given over 11 years ago
Hello Philipp!
Thanks for the reply!
Unfortunately, I cannot say whether the problem already existed in TYPO3 4.x.
At least a corresponding file is found in the search in TYPO3 4.x. Whether the contents of this file were indexed, I no longer know!
Many Thanks
Harald
Updated by Philipp Gampe over 11 years ago
OK, leaving this open, but someone will need to dig into indexed_search and find the root cause of this behavior.
Updated by Chris Müller about 11 years ago
- File 50095-patch1.diff added
Today I had the same problem in TYPO3 6.1.5. We are using config.absRefPrefix = / and encountered the problem that PDF files are not shown in the result list. So I dug into the code: the PDF files are found, but the method "checkExistence()" throws them out of the results.
Attached is the patch 50095-patch1.diff, which fixes this issue. I tested it with absRefPrefix = / and baseUrl = http://www.example.org/
Updated by Thirot no-lastname-given almost 11 years ago
Unfortunately, I cannot say whether the problem already existed in TYPO3 4.x.
At least a corresponding file is found in the search in TYPO3 4.x. Whether the contents of this file were indexed, I no longer know!
I can see this issue in TYPO3 4.7.17.
The path is saved in the database with absRefPrefix.
The file is indexed but not usable in the search form.
And it is impossible to re-index the file in the Backend.
indexed_search seems to use a relative path without the leading slash, but parse_url() returns [path] => /path.
So which path is the correct one to use?
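The mismatch between the two path forms can be reproduced with plain PHP. This is only an illustration: the file path and domain are made-up examples, and the final ltrim() is one possible way to make the forms comparable, not what indexed_search actually does.

```php
<?php
// Made-up example hrefs, as they might be generated with different absRefPrefix values.
$hrefWithSlashPrefix  = '/fileadmin/docs/manual.pdf';                        // absRefPrefix = /
$hrefWithDomainPrefix = 'http://www.example.com/fileadmin/docs/manual.pdf';  // absRefPrefix = full domain
$hrefWithoutPrefix    = 'fileadmin/docs/manual.pdf';                         // absRefPrefix empty

// parse_url() keeps the leading slash whenever the input has one.
$pathA = parse_url($hrefWithSlashPrefix, PHP_URL_PATH);
$pathB = parse_url($hrefWithDomainPrefix, PHP_URL_PATH);
$pathC = parse_url($hrefWithoutPrefix, PHP_URL_PATH);

// Assumption for this sketch: dropping the leading slash yields the
// site-relative form that a relative-path lookup would expect.
$normalizedA = ltrim($pathA, '/');
$normalizedB = ltrim($pathB, '/');
$normalizedC = ltrim($pathC, '/');
```

With these inputs, $pathA and $pathB start with a slash while $pathC does not. Only after normalization do all three refer to the same relative path.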
Update 2014.01.28
- An absolute absRefPrefix (http://site.com) is not working. Any http:// URL is saved as an external URL.
- indexed_search uses both the URL and the localPath.
- The first phash calculated by the crawler is based on the localPath and not on the URL; for this reason, phashes based on the URL are invalid.
- Is the base tag of the HTML page not used in extractHyperLinks() to extract absolute URLs?
- But absolute URLs can generate duplicate content across multiple domains.
- I can't figure out how indexed_search is supposed to work.
I made a patch for myself. It saves the path instead of the URL so that it stays compatible with the phash.
I didn't test external documents.
 typo3/sysext/indexed_search/class.indexer.php | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/typo3/sysext/indexed_search/class.indexer.php b/typo3/sysext/indexed_search/class.indexer.php
index e3beed0..762754c 100755
--- a/typo3/sysext/indexed_search/class.indexer.php
+++ b/typo3/sysext/indexed_search/class.indexer.php
@@ -731,7 +731,7 @@ class tx_indexedsearch_indexer {
             if (is_object($crawler)) {
                 $params = array(
                     'document' => $linkSource,
-                    'alturl' => $linkInfo['href'],
+                    'alturl' => $linkInfo['localPath'],
                     'conf' => $this->conf
                 );
                 unset($params['conf']['content']);
Updated by Alexander Opitz almost 11 years ago
- Status changed from Needs Feedback to New
- Is Regression set to No
Updated by Benjamin Robinson over 10 years ago
I can confirm that this issue still exists in 6.2.4
Updated by Markus Klein almost 10 years ago
- Status changed from New to Accepted
- Priority changed from Should have to Must have
- Target version set to next-patchlevel
- Complexity set to hard
Ran into this issue as well and debugged it.
The problem is that the DB stores the href value in the data_filename field.
When rendering the search results, the SearchFormController::checkExistance() method is run, which checks with !is_file($row['data_filename']).
This is wrong!
The indexer uses Indexer::createLocalPath and 5 submethods to identify the correct local path for the given href value.
This functionality is needed for the check in SearchFormController::checkExistance() as well.
Possible solutions:
- copy createLocalPath() code to SearchFormController
- make it public in the Indexer
- create new class to hold these path manipulation methods
- create a new db field to hold the real local path
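The core of the problem described above can be sketched in a few lines of PHP. This is a simplified illustration under stated assumptions, not the logic of Indexer::createLocalPath: both function names are hypothetical, and the sketch only handles the case where the stored href starts with the configured absRefPrefix.

```php
<?php
/**
 * Hypothetical helper (not part of indexed_search): turns a stored href
 * into a path relative to the web root by stripping the absRefPrefix.
 */
function hypotheticalLocalPath(string $href, string $absRefPrefix): string
{
    // Remove the configured prefix if the href starts with it.
    if ($absRefPrefix !== '' && strpos($href, $absRefPrefix) === 0) {
        $href = (string) substr($href, strlen($absRefPrefix));
    }
    // Drop any remaining leading slash so the result is relative.
    return ltrim($href, '/');
}

/**
 * Hypothetical existence check: resolves the href first, then tests the
 * resulting file below the given web root instead of the raw href.
 */
function hypotheticalFileExists(string $href, string $absRefPrefix, string $webRoot): bool
{
    $relativePath = hypotheticalLocalPath($href, $absRefPrefix);
    return is_file(rtrim($webRoot, '/') . '/' . $relativePath);
}
```

Calling is_file() on the raw href resolves it against the filesystem root (for absRefPrefix = /) or treats it as a URL (for a full domain), which is why such a check can fail even though the file exists below the web root.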
Updated by Mathias Schreiber almost 10 years ago
- Target version changed from next-patchlevel to 7.1 (Cleanup)
Updated by Benni Mack over 9 years ago
- Target version changed from 7.1 (Cleanup) to 7.4 (Backend)
Updated by Susanne Moog over 9 years ago
- Target version changed from 7.4 (Backend) to 7.5
Updated by Benni Mack about 9 years ago
- Target version changed from 7.5 to 8 LTS
Updated by Tymoteusz Motylewski almost 9 years ago
FYI, we're not checking for file existence on rendering (so in searchController) any more.
Updated by Markus Klein almost 9 years ago
This is still an issue in the upcoming 6.2.16.
Updated by Chris W over 8 years ago
TYPO3 6.2.21
As long as I am logged in as a feuser, I am able to find every PDF that is indexed on public pages. PDF files indexed on secured pages can't be found... Disabling absRefPrefix fixes this for me.
Updated by Jan Kiesewetter about 8 years ago
As the whole function was removed with #44381, the problem only occurs in 6.2.
For 6.2 I created a small extension that XCLASSes the SearchFormController and simply returns true, like TYPO3 7.6 and 8, which no longer perform this check.
https://bitbucket.org/t3easy_de/indexed_search_absrefprefix
This issue can be closed.
Updated by Riccardo De Contardi about 8 years ago
- Status changed from Accepted to Closed
Thank you for your answer and findings, I'll close it.
Regards.
If you think that this is the wrong decision, please reopen it or open a new issue and add a reference to this one. Thank you.