
Bug #22229


External URL only indexes first page

Added by Xavier Perseguers about 14 years ago. Updated over 5 years ago.

Status: Closed
Priority: Should have
Category: -
Target version: -
Start date: 2010-03-03
% Done: 0%

Description

When indexing an external URL/website, the first page is indexed, but none of the external website's subpages are.

The problem concerns relative versus absolute (scheme-qualified) hyperlinks. Today's websites often use relative links such as:

<a href="some/relative/page.html">....

instead of

<a href="http://www.domain.tld/subsite/some/relative/page.html">
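As a minimal sketch (not the indexed_search code, which is PHP), the two cases can be told apart by checking whether the href carries a scheme. The helper name `link_kind` is hypothetical; it relies only on Python's standard `urllib.parse.urlsplit`. Protocol-relative links (`//host/path`) have no scheme but do name a host, so they are reported as a third case.

```python
from urllib.parse import urlsplit


def link_kind(href: str) -> str:
    """Classify an href (hypothetical helper, for illustration only)."""
    parts = urlsplit(href)
    if parts.scheme:
        # e.g. "http://www.domain.tld/subsite/some/relative/page.html"
        return "absolute"
    if parts.netloc:
        # e.g. "//www.domain.tld/page.html": host given, scheme inherited
        return "network-path"
    # e.g. "some/relative/page.html": must be resolved against a base URL
    return "relative"
```

Only the "relative" and "network-path" cases need a base URL to become fetchable, and that base URL is exactly what goes wrong in the report.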

The problem is that the method indexExtUrl() in EXT:indexed_search/class.crawler.php cannot properly convert a relative link to an absolute one when dealing with external websites; it only supports this conversion for the TYPO3 website itself. As a result, the relative link above is converted to

http://typo3-website.tld/some/relative/page.html

This URL 1) points to a page that does not exist and 2) lies outside the authorized target website, so it would not be indexed even if such a document existed.
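The behavior the report asks for is to resolve each relative link against the URL of the external page on which it was found, not against the TYPO3 site's base URL. Below is a minimal Python sketch, assuming standard RFC 3986 resolution via `urllib.parse.urljoin`. The helpers `resolve_link` and `within_target` are hypothetical, and the scope rule (same scheme and host, path under the target's base path) is an assumption for illustration, not indexed_search's actual rule.

```python
from urllib.parse import urljoin, urlsplit


def resolve_link(page_url: str, href: str) -> str:
    # Resolve against the page where the link was found (RFC 3986).
    # Absolute hrefs are returned unchanged by urljoin.
    return urljoin(page_url, href)


def within_target(url: str, base_url: str) -> bool:
    # ASSUMED scope rule: same scheme and host, and a path at or below
    # the base path of the authorized target website.
    u, b = urlsplit(url), urlsplit(base_url)
    if (u.scheme, u.netloc.lower()) != (b.scheme, b.netloc.lower()):
        return False
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    return u.path == b.path or u.path.startswith(base_path)
```

With the page URL `http://www.domain.tld/subsite/`, the link `some/relative/page.html` resolves to `http://www.domain.tld/subsite/some/relative/page.html`, which passes the scope check. Resolving the same link against `http://typo3-website.tld/` yields the out-of-scope URL from the report, which the check rejects.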

(issue imported from #M13732)


Files

13732.diff (2.08 KB), Administrator Admin, 2010-03-08 15:32
13732_v2.diff (2.34 KB), Administrator Admin, 2010-03-08 16:43

Related issues: 2 (0 open, 2 closed)

Related to TYPO3 Core - Bug #22296: IS cannot not index files if absRefPrefix is set and indexExternalURLs is not (Closed, Dmitry Dulepov, 2010-03-18)

Related to TYPO3 Core - Bug #20035: Crawler does not crawl though relative links in an external page (Closed, Jeff Segars, 2009-02-17)

