Bug #22357

closed

cropHTML uses faulty reg exp for HTML entities

Added by Ralf Hettinger over 14 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Should have
Assignee:
Category:
-
Target version:
-
Start date:
2010-03-30
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

The very sweet feature stdWrap.cropHTML uses a faulty regular expression (i.e. search pattern) for treating encoded HTML entities as a single character, which is supposed to avoid cropping in between such entities.

The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably.

The attached patch fixes the problem; the code still needs to be simplified and tested though (ToDo: keep the search pattern in a variable and test it - I'm just short on time right now).
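To illustrate why entities need to be treated as a single character, here is a minimal sketch in Python (not the TYPO3 PHP code). The pattern `ENTITY` and the helper `visible_length` are hypothetical names made up for this example, and the assumed entity syntax is `&name;`, `&#123;` or `&#x1F;`. It shows that cutting by raw string offset can land in the middle of an entity, while a lone semicolon in normal text is not an entity at all.

```python
import re

# Assumption: an entity is "&" + (name | #decimal | #xhex) + ";".
# This is an illustrative pattern, not the one used by cropHTML.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def visible_length(text):
    """Count characters, treating each encoded entity as one character."""
    # Replace every entity with a one-character placeholder, then measure.
    return len(ENTITY.sub("x", text))


sample = "Fish &amp; chips"

# Cutting by raw offset splits the entity "&amp;" in half.
naive_cut = sample[:8]

# The raw string is longer than what a reader would see.
raw_len = len(sample)
shown_len = visible_length(sample)

# A plain semicolon is not part of any entity, so nothing is collapsed.
plain_len = visible_length("amet; consetetur")
```

The point of the sketch is only the distinction between the two cases: `&amp;` has to be kept whole, whereas the semicolon after "amet" needs no special handling.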

(issue imported from #M13972)


Files

0013972_4.4-rev7223.patch (1.33 KB), Administrator Admin, 2010-03-30 20:38
0013972_4.4-rev7370.patch (1.51 KB), Administrator Admin, 2010-04-15 13:45
issue_13972_v3.diff (5.4 KB), Administrator Admin, 2010-04-15 15:01
issue_13972_v4.diff (5.4 KB), Administrator Admin, 2010-04-16 23:32
Actions #1

Updated by Ralf Hettinger over 14 years ago

ToDo II: simplify the pattern itself, since it's unnecessarily redundant.

Actions #2

Updated by Marcus Krause over 14 years ago

The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably.

You imply that entities could contain more than one semicolon. Could you please give at least one example that breaks the expected behaviour ("...limit a string length to a certain number of chars...")?

Actions #3

Updated by Ralf Hettinger over 14 years ago

I didn't want to imply that entities can contain more than one semicolon; I'm saying that cropHTML is cropping at any semicolon.

Example:
You may have the text
"Lorem ipsum dolor sit amet; consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua." (156 chars)

Now let cropHTML = 50 | ... | 1

Expected:
"Lorem ipsum dolor sit amet; consetetur sadipscing..."

Current:
"Lorem ipsum dolor sit..." (since "amet" is followed by a semicolon, which forces the expression to crop)

Unfortunately my guess at the reg exp isn't working either, so please ignore the first patch :( I'll have a look at it today.
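The expected behaviour for this example can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the cropHTML implementation and not the attached patch: it ignores HTML tags entirely, the function name `crop_text` and the pattern `UNIT` are hypothetical, and the three parameters only mimic the "50 | ... | 1" setting (limit, suffix, crop at the last space).

```python
import re

# Assumption: one "unit" is either a whole entity (&name; &#123; &#x1F;)
# or any single character. A lone ";" is just an ordinary character.
UNIT = re.compile(
    r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);|.",
    re.DOTALL,
)


def crop_text(text, limit, suffix="...", word_boundary=True):
    """Crop text to `limit` units, counting each entity as one unit."""
    units = UNIT.findall(text)
    if len(units) <= limit:
        return text  # nothing to crop
    kept = units[:limit]
    if word_boundary:
        # Walk back to the last whitespace so no word is cut in half.
        for i in range(len(kept) - 1, -1, -1):
            if kept[i].isspace():
                kept = kept[:i]
                break
    return "".join(kept) + suffix


text = (
    "Lorem ipsum dolor sit amet; consetetur sadipscing elitr, "
    "sed diam nonumy eirmod tempor invidunt ut labore et dolore "
    "magna aliquyam erat, sed diam voluptua."
)

cropped = crop_text(text, 50)
with_entities = crop_text("a &amp; b &amp; c", 5, word_boundary=False)
```

With these assumptions, `cropped` is the "Expected" string from above, and the semicolon after "amet" has no influence on where the text is cut.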

Actions #4

Updated by Ralf Hettinger over 14 years ago

Hm. After playing around with this a bit I think it would be easiest to split the search pattern up into a bit more readable PHP (at least this statement holds for me and afaik for many others).

After googling a bit, the truncate function from CakePHP (http://cakeforge.org/snippet/detail.php?type=snippet&id=174) looks pretty close to what is wanted here. I'll come up with a suggestion.

Actions #5

Updated by Ralf Hettinger over 14 years ago

0013972_4.4-rev7370.patch uses a very simple pattern matching, which imo should do the trick.

Actions #6

Updated by Jochen Rau over 14 years ago

I can confirm this as a bug. I have altered the unit tests to expose the fault. The patch provided by Ralf resolves the issue. The attached patch contains his patch and my unit tests.

Actions #7

Updated by Jochen Rau over 14 years ago

On the core list we agreed to alter the description of this bug. The sentence

"The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably."

is now

"The search pattern as used in the current preg_match currently always crops after the first semicolon."

Actions #8

Updated by Jochen Rau over 14 years ago

Committed to
trunk (r7423)
TYPO3_4-3 (r7424)

Actions #9

Updated by Benni Mack about 6 years ago

  • Status changed from Resolved to Closed