Bug #14268

closed

Function substUrlsInPlainText in class.t3lib_div.php cannot properly extract a URL that ends with a character other than a space

Added by old_hoang over 19 years ago. Updated over 17 years ago.

Status: Closed
Priority: Should have
Category: Backend API
Target version: -
Start date: 2004-08-12
% Done: 0%
TYPO3 Version: 3.5.0 final

Description

The code $newParts = split('[[:space:]]|\)|\(',$v,2);
in the function cannot properly extract a URL that is followed by a
character other than whitespace,
for example http://www.cantgetlinkproperly.de! or
http://www.cantgetlinkproperly.de<br><br>
The result is a wrong link in the table cache_md5params.

This occurs, for example, if you send a plaintext newsletter with embedded
HTML content objects that contain HTML links.
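The behaviour can be reproduced outside TYPO3 with a minimal sketch. It assumes that $v starts with the URL, and it uses preg_split() instead of split(), because the POSIX-regex split() was removed in PHP 7. The helper name firstUrlPart is hypothetical and not part of t3lib_div.

```php
<?php
// Hypothetical helper: mimics split($pattern, $v, 2) from the report using
// preg_split(), since the POSIX-regex split() no longer exists in PHP 7+.
function firstUrlPart(string $v): string
{
    // Same terminators as the original pattern: whitespace, ")" and "(".
    $newParts = preg_split('/[[:space:]]|\)|\(/', $v, 2);
    return $newParts[0];
}

// Terminated by a space: the URL is cut off correctly.
echo firstUrlPart('http://www.cantgetlinkproperly.de and more'), "\n";
// prints http://www.cantgetlinkproperly.de

// Terminated by "!" or by an HTML tag: the trailing characters stay attached.
echo firstUrlPart('http://www.cantgetlinkproperly.de!'), "\n";
// prints http://www.cantgetlinkproperly.de!
echo firstUrlPart('http://www.cantgetlinkproperly.de<br><br>'), "\n";
// prints http://www.cantgetlinkproperly.de<br><br>
```

The last two calls show the reported problem: neither "!" nor "<" is in the list of terminators, so they end up as part of the extracted URL.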

(issue imported from #M284)


Related issues: 1 (0 open, 1 closed)

Related to TYPO3 Core - Bug #28248: t3lib_div::substUrlsInPlainText didn't recognize URLs at the end of a sentence correctly (Closed, 2011-07-15)

Actions #1

Updated by old_hoang over 19 years ago

Changing

$newParts = split('[[:space:]]|\)|\(',$v,2);

to

$newParts = split("[[:space:]]|\)|\(|<",$v,2);

handles HTML tags at the end of the URL. By the way, why are ( and ) in the regular expression?

Actions #2

Updated by Ingmar Schlecht over 19 years ago

The '(' and ')' are in the regex for the same reason you added '<' to the list: In order to allow for other characters than [:space:] to terminate the URL.

For example, in the following mail the ')' terminates the URL.

Dear User

You have won the car (see http://domain.tld/index.php?id=2&asdf)

Regards,
your sweepstakes team

I hope I have answered your question.

However, I don't really like your fix for the '<' character.
I think the regexp should contain ALL characters that are not allowed in a URL.

After having a look at http://www.rfc-editor.org/rfc/rfc2396.txt, I'd say all of the following characters are possible as URL delimiters and should be checked by the regexp:
"<" | ">" | <"> | "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Consider a URL written like this:
"http://domain.tld/index.php?id=2&asdf"
Or like this:
<http://domain.tld/index.php?id=2&asdf>
Or like this:
(http://domain.tld/index.php?id=2&asdf)

All of those possibilities should be considered, and as the RFC forbids the use of these characters in URLs anyway, it should not be a problem. The only characters I'm not sure about are "[" and "]", because they are often illegally used by TYPO3 in URLs.
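A minimal sketch of this idea, under the same assumptions as the original code ($v starts with the URL): the pattern is built from the characters listed above plus the existing ( and ). It uses preg_split() because split() was removed in PHP 7, and the helper name firstUrlPartRfc2396 is hypothetical.

```php
<?php
// Hypothetical helper, not TYPO3 code: cut $v at the first whitespace
// character or at the first character from the RFC 2396 list in this thread.
function firstUrlPartRfc2396(string $v): string
{
    // < > " { } | \ ^ [ ] ` from the list above, plus ( and ) from the
    // original pattern. "[" and "]" could be dropped from this string if
    // URLs containing square brackets have to survive.
    $delims = '<>"{}|\\^[]`()';
    // preg_quote() escapes the regex metacharacters so that the string can
    // be placed inside a character class.
    $pattern = '/[[:space:]' . preg_quote($delims, '/') . ']/';
    $newParts = preg_split($pattern, $v, 2);
    return $newParts[0];
}

// The three notations from the comment, each reduced to the part after the
// opening delimiter:
echo firstUrlPartRfc2396('http://domain.tld/index.php?id=2&asdf"'), "\n";
echo firstUrlPartRfc2396('http://domain.tld/index.php?id=2&asdf>'), "\n";
echo firstUrlPartRfc2396('http://domain.tld/index.php?id=2&asdf)'), "\n";
// each call prints http://domain.tld/index.php?id=2&asdf
```

Building the character class with preg_quote() avoids hand-escaping the backslash and the brackets, which is easy to get wrong in a quoted PHP string.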

Anyway, if this bug gets fixed, the fix will not go into the 3.6 branch but rather into HEAD/3.7-dev.

Actions #3

Updated by old_hoang over 19 years ago

Hello Ingmar,

But that's what my fix does, so I don't understand the objection. My problem is that the newsletter contains a link built from a typolink object with the wrap <br>|<br>. It is coded in TypoScript with the current page id to give the newsletter reader a link to the page itself (the full newsletter with all images etc. stays on the server, and only a plaintext newsletter with a link to it is sent out). Because of the wrap, the link is not split correctly and is stored incorrectly in the jumpurl table. If you want to add all characters that are not allowed, I'm fine with that.

Greets,

Chi

Actions #4

Updated by old_chihoang about 19 years ago

Hello Ingmar,

Anyway, here is another fix (this should occur in TYPO3 3.7 too):

$newParts = split('[[:space:]\<]|\)|\(',$v,2);

Greets,

Chi

Actions #5

Updated by Michael Stucki about 19 years ago

Hi Ingmar, have you fixed this yet? Is the bug still reproducible? I didn't test it myself...

Actions #6

Updated by Ingmar Schlecht about 19 years ago

No, I didn't fix it and I won't fix it for 3.8.0 because I don't have time for that right now.

Actions #7

Updated by Wolfgang Klinger about 18 years ago

Fixed in CVS.

It's a compromise:
the following characters are now allowed to terminate the URL:

any kind of whitespace (space, tab, ..) and
<>"{}|\^`()'
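The committed pattern is not quoted in this thread, so the following is only a sketch of what the described character set would do, assuming $v starts with the URL. It uses preg_split() because split() was removed in PHP 7. The helper name firstUrlPartCompromise and the tx_ext[uid] parameter are hypothetical.

```php
<?php
// Hypothetical helper, not the committed TYPO3 code: cut $v at the first
// whitespace character or at one of < > " { } | \ ^ ` ( ) '
function firstUrlPartCompromise(string $v): string
{
    $delims = '<>"{}|\\^`()\'';
    $pattern = '/[[:space:]' . preg_quote($delims, '/') . ']/';
    $newParts = preg_split($pattern, $v, 2);
    return $newParts[0];
}

// The HTML tag from the original report now terminates the URL.
echo firstUrlPartCompromise('http://www.cantgetlinkproperly.de<br><br>'), "\n";
// prints http://www.cantgetlinkproperly.de

// "[" and "]" are not in the set, so URLs with square brackets survive.
echo firstUrlPartCompromise("http://domain.tld/index.php?tx_ext[uid]=5'"), "\n";
// prints http://domain.tld/index.php?tx_ext[uid]=5

// "!" is not in the set either, so it stays attached.
echo firstUrlPartCompromise('http://www.cantgetlinkproperly.de!'), "\n";
// prints http://www.cantgetlinkproperly.de!
```

Under this assumed pattern, the "!" example from the description would still be extracted together with the exclamation mark, since "!" is a legal URL character.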
