Bug #22357
Closed
cropHTML uses faulty reg exp for HTML entities
Description
The very sweet feature stdWrap.cropHTML uses a faulty regular expression (i.e. search pattern) for treating encoded HTML entities as a single character, which is supposed to avoid cropping in between such entities.
The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably.
Attached patch fixes the problem; the code still needs to be simplified and tested though (ToDo: keep the search pattern in a variable and test it - I'm just short on time right now).
(issue imported from #M13972)
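To illustrate what "treating an encoded HTML entity as a single character" means, here is a minimal sketch in Python. It is not the pattern used in the TYPO3 core (which is PHP); the names `ENTITY_OR_CHAR`, `split_units` and `crop_units` are made up for this example. The idea is that a semicolon only belongs to an entity when it closes an `&name;` or `&#123;` sequence; a plain semicolon is just one more character.

```python
import re

# Hypothetical pattern, NOT the one from the TYPO3 core:
# either a complete entity (named, decimal or hex) or any single character.
ENTITY_OR_CHAR = re.compile(
    r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);|.",
    re.DOTALL,
)


def split_units(text):
    """Split text into units that each count as one visible character."""
    return ENTITY_OR_CHAR.findall(text)


def crop_units(text, limit):
    """Keep at most `limit` units, never cutting inside an entity."""
    return "".join(split_units(text)[:limit])


# An entity is one unit; a plain semicolon is an ordinary character.
print(split_units("a&amp;b"))         # ['a', '&amp;', 'b']
print(crop_units("x &gt; y; z", 6))   # x &gt; y;
```

With a pattern of this shape, the semicolon after a normal word never ends the match early, because the alternation only consumes a semicolon together with a preceding `&...` sequence.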
Updated by Ralf Hettinger over 14 years ago
ToDo II: simplify the pattern itself, since it's unnecessarily redundant.
Updated by Marcus Krause over 14 years ago
"The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably."
You imply that entities could contain more than one semicolon. Could you please give at least one example that breaks the expected behaviour ("...limit a string length to a certain number of chars...")?
Updated by Ralf Hettinger over 14 years ago
I didn't want to imply that entities can contain more than one semicolon. I'm saying that cropHTML crops at any semicolon.
Example:
You may have the text
"Lorem ipsum dolor sit amet; consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua." (156 chars)
Now let cropHTML = 50 | ... | 1
Expected:
"Lorem ipsum dolor sit amet; consetetur sadipscing..."
Current:
"Lorem ipsum dolor sit..." (since amet is appended with a semicolon, which will force the expression to crop)
Unfortunately my guess at the reg exp isn't working either, so please ignore the first patch :( I'll have a look at it today.
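The expected result in the example above can be reproduced with a small sketch. This is Python and not the TYPO3 implementation; `crop_to_word` is a hypothetical helper that mimics the three settings "50 | ... | 1" (character limit, suffix, crop back to the last full word) on plain text without tags or entities.

```python
TEXT = (
    "Lorem ipsum dolor sit amet; consetetur sadipscing elitr, sed diam "
    "nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam "
    "erat, sed diam voluptua."
)


def crop_to_word(text, limit, suffix="..."):
    """Crop plain text to `limit` characters, then back to the last space.

    Simplified assumption: only the space character separates words, and
    the input contains no HTML tags or entities.
    """
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = head.rfind(" ")
    if cut > 0:
        head = head[:cut]
    return head + suffix


print(crop_to_word(TEXT, 50))
# Lorem ipsum dolor sit amet; consetetur sadipscing...
```

The semicolon after "amet" plays no role here: the cut position depends only on the character limit and the last space before it.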
Updated by Ralf Hettinger over 14 years ago
Hm. After playing around with this a bit I think it would be easiest to split the search pattern up into a bit more readable PHP (at least this statement holds for me and afaik for many others).
After googling a bit, the truncate function from CakePHP (http://cakeforge.org/snippet/detail.php?type=snippet&id=174) looks pretty close to what is wanted here. I'll come up with a suggestion.
Updated by Ralf Hettinger over 14 years ago
0013972_4.4-rev7370.patch uses a very simple pattern matching, which imo should do the trick.
Updated by Jochen Rau over 14 years ago
I can confirm this as a bug. I have altered the unit tests to expose the fault. The patch provided by Ralf resolves the issue. The attached patch contains his patch and my unit tests.
Updated by Jochen Rau over 14 years ago
On the core list we agreed to alter the description of this bug. The sentence
"The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably."
is now
"The search pattern as used in the current preg_match currently always crops after the first semicolon."
Updated by Jochen Rau over 14 years ago
Committed to
trunk (r7423)
TYPO3_4-3 (r7424)