Bug #59007

open

Epic #65815: Improve Indexed search indexer

Indexed Search cannot index external files if FileStorage is not public

Added by Robert Vock almost 10 years ago. Updated over 1 year ago.

Status: New
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2014-05-21
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 7
PHP Version:
Tags:
Complexity:
Is Regression: No
Sprint Focus:

Description

External files are indexed with the following TypoScript:

config.index_enable = 1
config.index_externals = 1

together with the useCrawlerForExternalFiles option in the extension configuration.

If the sys_file_storage is public, everything works fine. The frontend generates URLs in the form of "fileadmin/download.pdf" which are correctly indexed.

But if the storage is not public, the frontend generates links in the form of "index.php?eID=dumpFile&t=f&f=FILEID&token=HASH", which are not indexed. This happens even when the file could be read on the server side (for example, if the path is a Samba mount).

To reproduce the bug, just uncheck the "is public" checkbox for the default fileadmin FileStorage.
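
To make the difference between the two link forms concrete, here is a minimal sketch in Python. It is not TYPO3 code and not how the indexer is implemented; the function name classify_file_link is hypothetical, and the file id 42 and token value are made up for illustration. It only shows that a public-storage link carries the file path directly, while a dumpFile link carries a file id in the query string that would have to be resolved before the file could be read.

```python
from urllib.parse import urlsplit, parse_qs


def classify_file_link(href: str):
    """Return ("dump_file", file_id) for eID=dumpFile links, else ("path", path).

    Hypothetical helper for illustration only; not part of TYPO3.
    """
    parts = urlsplit(href)
    query = parse_qs(parts.query)
    # Non-public storage: the file is addressed by a query parameter, not by a path.
    if query.get("eID") == ["dumpFile"] and "f" in query:
        return ("dump_file", query["f"][0])
    # Public storage: the link path itself points at the file.
    return ("path", parts.path)


print(classify_file_link("fileadmin/download.pdf"))
# → ('path', 'fileadmin/download.pdf')
print(classify_file_link("index.php?eID=dumpFile&t=f&f=42&token=abc"))
# → ('dump_file', '42')
```

A check that only accepts path-style links would treat the second form as "index.php" and never reach the underlying file, which matches the behaviour described above.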

Actions #1

Updated by Tymoteusz Motylewski about 9 years ago

  • Parent task set to #65815
Actions #2

Updated by Michael OF about 7 years ago

  • TYPO3 Version changed from 6.2 to 7

Hi,

As this issue has now been open for more than two years, and because I'm also missing this feature, could anyone please tell me whether there are any plans or a roadmap to solve it?

Regards,
Michael

Actions #3

Updated by Michael OF about 7 years ago

Remark: useCrawlerForExternalFiles is not needed to reproduce this. Reproduced in 7.6.15.

Actions #4

Updated by Tomas Norre Mikkelsen over 1 year ago

I accept that this might still be an issue; I would need to check that.
But why would you index a non-public file storage? So that the content becomes public through the indexing? I might be missing a use case or context here.

Actions #5

Updated by Robert Vock over 1 year ago

If a file from a non-public file storage is used in a content element, it will be indexed with config.index_externals = 1. If it's used on a page, it should be findable via search.

The configuration does not index ALL the files in the non-public storage, only those that are used on a public page.

Actions #6

Updated by Tomas Norre Mikkelsen over 1 year ago

Thanks for clearing that up.

I can still reproduce this with

TYPO3 11.5.19
PHP 8.0.24

Will see if I can figure out what the problem is.

I think the problem is around this line: https://github.com/TYPO3-CMS/indexed_search/blob/main/Classes/Indexer.php#L563
The allowedAbsPath doesn't allow the path pattern that the link has when the storage is not public.

Perhaps an if clause should be added for private files.
I don't know whether this is by design or not.

Perhaps @Benni Mack or @Georg Ringer can say a word or two about that.
