Bug #22357
closed
cropHTML uses faulty reg exp for HTML entities
Added by Ralf Hettinger over 14 years ago.
Updated about 6 years ago.
Description
The very sweet feature stdWrap.cropHTML uses a faulty regular expression (i.e. search pattern) for treating each encoded HTML entity as a single character, which is supposed to prevent cropping in the middle of an entity.
The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably.
The attached patch fixes the problem, though the code still needs to be simplified and tested (ToDo: keep the search pattern in a variable and test it; I'm just short on time right now).
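The report does not quote the faulty pattern itself. As a rough illustration of what "treating an entity as a single character" means, here is a Python sketch with an assumed pattern (not the one from TYPO3 or from the attached patch). It only recognizes complete references of the form `&name;`, `&#123;` or `&#x1F;`, so a bare semicolon in running text stays an ordinary character. The names `ENTITY`, `UNIT` and `units` are hypothetical.

```python
import re

# Assumed, illustrative pattern (not TYPO3's): a named entity such as &amp;,
# a decimal reference such as &#160;, or a hexadecimal one such as &#x2014;.
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

# One "unit" is either a complete entity or any other single character.
UNIT = re.compile(ENTITY.pattern + r"|.", re.S)


def units(text: str) -> list[str]:
    """Split text into units, keeping each complete entity together."""
    return UNIT.findall(text)
```

For example, `units("a&amp;b; c")` yields six units, `['a', '&amp;', 'b', ';', ' ', 'c']`: the entity is kept whole and the semicolon after `b` is just a character. A purely syntactic pattern like this cannot tell a real entity name from a lookalike such as `&T;`; checking against a list of known entity names would be stricter.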
(issue imported from #M13972)
ToDo II: simplify the pattern itself, since it's unnecessarily redundant.
"The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably."
You imply that entities could consist of more than one semicolon. Could you please give at least one example that breaks the expected behaviour ("...limit a string length to a certain number of chars...")?
I didn't want to imply that entities can contain more than one semicolon, I'm saying that cropHTML is cropping at any semicolon.
Example:
You may have the text
"Lorem ipsum dolor sit amet; consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua." (156 chars)
Now set cropHTML = 50 | ... | 1
Expected:
"Lorem ipsum dolor sit amet; consetetur sadipscing..."
Current:
"Lorem ipsum dolor sit..." (since "amet" is followed by a semicolon, which forces the expression to crop there)
Unfortunately my attempt at the regexp isn't working either, so please ignore the first patch :( I'll have a look at it today.
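For the cropHTML = 50 | ... | 1 example above, the expected result can be sketched in Python for plain text. This assumes the third parameter means "keep whole words" (a word that the cut would split is dropped) and uses an illustrative entity pattern; the function `crop` and its parameters are hypothetical and ignore HTML tags entirely. It only shows that a semicolon should not end the count.

```python
import re

# Assumed entity pattern (illustrative only); anything else counts as one character.
UNIT = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);|.", re.S)


def crop(text: str, limit: int, ellipsis: str = "...", keep_whole_words: bool = False) -> str:
    """Crop text to `limit` units (an entity counts as one) and append `ellipsis`."""
    parts = UNIT.findall(text)
    if len(parts) <= limit:
        return text  # nothing to crop
    kept = parts[:limit]
    if keep_whole_words and not parts[limit].isspace():
        # The cut falls inside a word: back up to the last whitespace, if any.
        # If the kept text has no whitespace at all, it is kept as it is.
        for i in range(len(kept) - 1, -1, -1):
            if kept[i].isspace():
                kept = kept[:i]
                break
    return "".join(kept).rstrip() + ellipsis


TEXT = ("Lorem ipsum dolor sit amet; consetetur sadipscing elitr, sed diam nonumy "
        "eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.")
print(crop(TEXT, 50, "...", True))
# prints: Lorem ipsum dolor sit amet; consetetur sadipscing...
```

The first 50 characters end with the space after "sadipscing", and the next character starts "elitr", so the sketch backs up to that space and yields the expected output rather than stopping at "amet;".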
Hm. After playing around with this a bit, I think it would be easiest to split the search pattern into more readable PHP code (at least for me, and AFAIK for many others).
After googling a bit, I found that the truncate function from CakePHP (http://cakeforge.org/snippet/detail.php?type=snippet&id=174 ) looks pretty close to what is wanted here. I'll come up with a suggestion.
0013972_4.4-rev7370.patch uses very simple pattern matching, which IMO should do the trick.
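The patch itself is not reproduced here. Purely to illustrate the general approach discussed above (CakePHP-style truncation: walk the markup, count only visible text with each entity as one character, then close any tags left open), here is a Python sketch. `crop_html`, its regular expressions and its list of void elements are assumptions; it is a regex-based approximation, not a real HTML parser, and it appends the ellipsis before closing the still-open tags.

```python
import re

# Hypothetical tokenizer: a tag, a complete entity, or any other single character.
TOKEN = re.compile(
    r"(?P<tag><[^>]*>)"
    r"|(?P<entity>&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
    r"|(?P<char>.)",
    re.S,
)
TAG_NAME = re.compile(r"<\s*(/)?\s*([A-Za-z][A-Za-z0-9]*)")
# Assumed list of common HTML void elements, which never get a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def crop_html(html: str, limit: int, ellipsis: str = "...") -> str:
    """Keep at most `limit` visible characters (entities count as one), then close open tags."""
    out: list[str] = []
    open_tags: list[str] = []
    count = 0
    for m in TOKEN.finditer(html):
        tok = m.group(0)
        if m.lastgroup == "tag":
            head = TAG_NAME.match(tok)
            if head:  # comments, doctypes etc. are copied through untouched
                tag = head.group(2).lower()
                if head.group(1):  # closing tag: forget the most recent matching opener
                    if tag in open_tags:
                        del open_tags[len(open_tags) - 1 - open_tags[::-1].index(tag)]
                elif tag not in VOID and not tok.endswith("/>"):
                    open_tags.append(tag)
        else:
            if count == limit:  # more visible text follows: crop here
                closers = "".join(f"</{t}>" for t in reversed(open_tags))
                return "".join(out) + ellipsis + closers
            count += 1
        out.append(tok)
    return html  # the visible text already fits
```

For example, `crop_html("<p>Hello <b>world</b> and more</p>", 8)` returns `"<p>Hello <b>wo...</b></p>"`. Word boundaries are not handled here; the "keep whole words" back-off would have to be added on top.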
I can confirm this as a bug. I have altered the unit tests to expose the fault. The patch provided by Ralf resolves the issue. The attached patch contains his patch and my unit tests.
On the core list we agreed to alter the description of this bug. The sentence
"The search pattern as used in the current preg_match currently always crops after the first semicolon and won't recognize entities reliably."
is now
"The search pattern as used in the current preg_match currently always crops after the first semicolon."
Committed to
trunk (r7423)
TYPO3_4-3 (r7424)
- Status changed from Resolved to Closed