Feature #16534

Add possibility to start indexing an external site at a specific page

Added by Mario Rimann over 15 years ago. Updated about 8 years ago.

Status: Closed
Priority: Should have
Assignee: -
Category: Indexed Search
Target version: -
Start date: 2006-09-06
Due date:
% Done: 100%
Estimated time:
PHP Version:
Tags:
Complexity: easy
Sprint Focus:

Description

Current behaviour is that the starting URL is used for two purposes:
- determine where crawling starts
- check if the indexed pages are "inside" this URL

To start the crawler at a specific page that is not a directory, an extra setting is needed for the start page, separate from the URL used for the "inside" check.
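
The conflict between the two purposes can be illustrated with a small sketch. It is written in Python rather than the extension's PHP, and it assumes the "inside" check is a plain string-prefix comparison; that assumption, the function name links_in_scope, and the example.org URLs are all hypothetical and not taken from indexed_search.

```python
from urllib.parse import urljoin


def links_in_scope(page_url, hrefs, scope_url):
    """Resolve hrefs against page_url and keep those that start with scope_url.

    The prefix comparison is an assumption made for illustration only.
    """
    resolved = [urljoin(page_url, href) for href in hrefs]
    return [url for url in resolved if url.startswith(scope_url)]


start_url = "http://example.org/docs/index.html"  # hypothetical start page
hrefs = ["page2.html", "/other/x.html"]           # hypothetical links found on it

# One URL for both purposes: no link starts with ".../index.html",
# so nothing beyond the first page would be followed.
single = links_in_scope(start_url, hrefs, scope_url=start_url)

# Separate settings: crawl from the page, test links against the directory.
split = links_in_scope(start_url, hrefs, scope_url="http://example.org/docs/")
```

Under these assumptions, a start URL that names a page rather than a directory rejects every sibling page, which would match the symptom in the old description below. Giving the scope its own value lets sibling pages through.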

Old description:
I noticed some strange behaviour when working with the indexed_search and crawler extensions: some websites (like http://typo3.org/) are indexed including their subpages.

On other domains, however, only the first page is indexed and the links on that page are not followed, even if I configure the crawler to go 3 levels deep.

All the pages that fail are valid HTML or valid XHTML. I tried different scenarios (such as absolute and relative paths in links) without success.

TYPO3 4.0
Indexed search 2.9.0
Crawler 1.1.0
(issue imported from #M4167)


Files

indexed_search_4167_v1.diff (1.5 KB) Administrator Admin, 2006-09-11 20:27
indexed_search_4167_v2.diff (1.4 KB) Administrator Admin, 2009-08-28 10:57
