Bug #81644
Closed
GeneralUtility::getUrl() socket method doesn't support chunked Content-Encoding
Description
With pageNotFound_handling set to the URL of a page, the page-not-found handler uses GeneralUtility::getUrl() to retrieve the contents of that page. Because it requests the headers too, the socket method is used if useCurl is not set.
After reading the headers, the rest of the stream is simply read in a single operation. If the server has Transfer-Encoding set to chunked, it sends the content in chunks and puts the length of each chunk in hexadecimal before the chunk (plus a zero after the last chunk).
getUrl() fails to process the chunks correctly and the chunk sizes are simply included in the content.
Guzzle does not seem to handle chunked data itself, but in most cases it uses cURL internally, which does. I'll have to test whether v8/master has the same issue when cURL is disabled.
We can use a simple function to decode the chunked data.
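As a rough illustration of what such a decoding function has to do, here is a minimal sketch in Python. It is not TYPO3 code, and the name decode_chunked is hypothetical. It assumes the complete body (everything after the headers) has already been read from the socket, and it ignores chunk extensions and trailers.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into the plain payload."""
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a size line: hex digits, optional ";extension", CRLF.
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing CRLF after chunk size")
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            # The zero-size chunk ends the body; optional trailers are ignored here.
            break
        chunk = body[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        decoded += chunk
        # Skip the chunk data and the CRLF that terminates it.
        pos += size + 2
    return bytes(decoded)


if __name__ == "__main__":
    raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
    print(decode_chunked(raw))
```

The decoder should only be applied when the response headers actually announce chunked transfer encoding; running it on a plain body would misread the content as chunk sizes.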
Updated by Riccardo De Contardi almost 5 years ago
- Category set to Site Handling, Site Sets & Routing
Updated by Benni Mack over 4 years ago
- Status changed from New to Needs Feedback
Hey Jigal,
we avoid using getUrl() in most places now and use Guzzle instead. Are you looking for a way to fetch a page-not-found page that is served with chunked encoding?
Updated by Jigal van Hemert over 4 years ago
Hey Benni,
At the time of submitting the issue I couldn't find any indication that Guzzle handles chunked transfer encoding automatically. cURL does support chunked data and removes the chunk sizes before returning the content.
If the web server is configured to use chunked transfer encoding AND cURL is disabled AND page-not-found handling is set to fetch the contents of a page, THEN the output is broken and displays the chunk sizes.
The workaround is to change one of those three conditions. But it would be nice if chunked data was supported.
IIRC Guzzle will automatically detect if cUrl can be used, so the chances of this problem happening are drastically reduced (most web servers will have cUrl).
Updated by Benni Mack over 4 years ago
Jigal van Hemert wrote:
The workaround is to change one of those three conditions. But it would be nice if chunked data was supported.
IIRC Guzzle will automatically detect if cUrl can be used, so the chances of this problem happening are drastically reduced (most web servers will have cUrl).
Yes, I believe so too. However, I don't have such a setup at hand to build tests around it and make sure we can support this (with Guzzle). How do you suggest we proceed?
Updated by Markus Klein over 4 years ago
We face a similar issue with v10 and the PageContentErrorHandler. The fetched page is returned with the header "Transfer-Encoding: chunked", and this exact response is used to answer the original request. The problem is that the response we send is not actually chunk-encoded, although the header says it is. This yields a failed connection in the browser or, stranger still, several retries by a proxy trying to "get the other chunks", ultimately DoS-attacking the server with useless requests.
I suggest removing the header with https://review.typo3.org/c/Packages/TYPO3.CMS/+/64672
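For illustration only, and not the code from the linked review: a minimal Python sketch of dropping connection-level headers such as Transfer-Encoding before a fetched response is reused to answer another request. The function name strip_hop_by_hop and the plain-dict header representation are assumptions made for this sketch. Filtering the whole hop-by-hop group rather than the single header is also a choice made here.

```python
# Hop-by-hop headers describe one specific connection, so they should not be
# copied onto a new response built from a fetched one.
HOP_BY_HOP = {
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
}


def strip_hop_by_hop(headers: dict) -> dict:
    """Return a copy of headers without connection-level entries (case-insensitive)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP
    }


if __name__ == "__main__":
    fetched = {"Content-Type": "text/html", "Transfer-Encoding": "chunked"}
    print(strip_hop_by_hop(fetched))
```

With the header removed, the server that sends the final response is free to choose its own framing (a Content-Length or its own chunking) for the already decoded body.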
Updated by Markus Klein over 4 years ago
- File monitoring.jpg added
Updated by Markus Klein over 4 years ago
- Related to Bug #91582: Fetching an internal page as 404 content breaks browser output and CDNs added
Updated by Benni Mack over 4 years ago
Hey all,
now that the changes are merged, is this issue resolved for everybody?
Updated by Riccardo De Contardi about 4 years ago
- Status changed from Needs Feedback to Closed
I am closing this issue for now, in agreement with the reporter.
If you think that this is the wrong decision or experience the issue again, please open a new issue with a reference to this one.
Thank you