Bug #22229
Closed
External URL only indexes first page
Description
When indexing an external URL/website, only the first page is indexed; none of the subpages of the external website are.
The problem is caused by how relative links are handled, as opposed to absolute links (with scheme and host). Today's websites often use relative links:
<a href="some/relative/page.html">....
instead of
<a href="http://www.domain.tld/subsite/some/relative/page.html">
The problem is that the indexExtUrl() method in EXT:indexed_search/class.crawler.php cannot properly convert relative links to absolute ones when dealing with external websites. It only supports this conversion for the TYPO3 website itself. For an external website, the relative link above is therefore converted to
http://typo3-website.tld/some/relative/page.html
This page 1) does not exist and 2) lies outside the authorized target website, so it cannot be indexed, and would not be even if the document existed.
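
To illustrate the expected behaviour, here is a minimal sketch in Python (not the indexed_search code, which is PHP). It assumes the crawler knows the URL of the external page on which the link was found and resolves relative links against that URL instead of against the TYPO3 site. The helper names resolve_link and is_within_site are hypothetical and exist only for this illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href against the URL of the page that contains it.

    Absolute hrefs are returned unchanged; relative ones are resolved
    against page_url by the standard library's urljoin.
    """
    return urljoin(page_url, href)


def is_within_site(base_url: str, candidate_url: str) -> bool:
    """Hypothetical scope check: same host and path below the base URL's path."""
    base = urlparse(base_url)
    candidate = urlparse(candidate_url)
    return candidate.netloc == base.netloc and candidate.path.startswith(base.path)


# Assumed configuration: the external site to index starts at this URL.
external_base = "http://www.domain.tld/subsite/"

# Link found on the first page of the external site.
resolved = resolve_link(external_base, "some/relative/page.html")
print(resolved)

# The wrongly converted URL from the report falls outside the target site.
wrong = "http://typo3-website.tld/some/relative/page.html"
print(is_within_site(external_base, resolved), is_within_site(external_base, wrong))
```

With the link resolved against the external page, the result stays within the authorized target website and can be queued for indexing, whereas the URL built from the TYPO3 host is rejected by the scope check.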
(issue imported from #M13732)
Files