Add an option to start indexing an external site at a specific page
Current behaviour is that the starting URL is used for two purposes:
- determine where crawling starts
- check if the indexed pages are "inside" this URL
If the crawler needs to start at a specific page that is not a directory, a separate setting for the starting page is needed.
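The effect of sharing one URL for both purposes can be sketched as follows. This is only an illustration, not the indexed_search code: it assumes the "inside" check is a plain string-prefix comparison, and the names `is_inside`, `links_to_follow`, `start_url` and `scope_url` as well as the example.org URLs are hypothetical.

```python
from urllib.parse import urljoin


def is_inside(scope_url: str, candidate_url: str) -> bool:
    # Assumed scope rule: a page is "inside" if its URL starts with scope_url.
    return candidate_url.startswith(scope_url)


def links_to_follow(start_url: str, scope_url: str, hrefs: list[str]) -> list[str]:
    # Resolve each link against the page it was found on,
    # then keep only the links that fall inside the scope.
    resolved = [urljoin(start_url, href) for href in hrefs]
    return [url for url in resolved if is_inside(scope_url, url)]


# Links found on the (hypothetical) starting page.
hrefs = ["about.html", "/docs/news/item1.html", "http://other.example/x.html"]

# One setting for both purposes: the start page is also the scope,
# so no sibling page matches the prefix and nothing is followed.
single = links_to_follow(
    "http://example.org/docs/index.html",
    "http://example.org/docs/index.html",
    hrefs,
)

# Two settings: crawling starts at the page, the scope is the directory.
split = links_to_follow(
    "http://example.org/docs/index.html",
    "http://example.org/docs/",
    hrefs,
)

print(single)
print(split)
```

With a single setting, only the starting page itself would be indexed. With a separate scope, the two links under `/docs/` are kept and the link to the other host is dropped.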
I noticed some strange behaviour when working with the indexed_search and crawler extensions: some websites (like http://typo3.org/) are indexed including their subpages.
On other domains, only the first page is indexed and the links on that page are not followed, even if I configure the crawler to go 3 levels deep.
All the pages that aren't working are valid HTML or valid XHTML. I tried several scenarios (such as absolute and relative paths as links) without success.
Indexed search 2.9.0
(issue imported from #M4167)