Bug #85635

Broken <script> tag after XML import

Added by Dmitry no-lastname-given about 1 year ago. Updated 5 months ago.

Status: Needs Feedback
Priority: Should have
Assignee:
Category: Miscellaneous
Start date: 2018-07-24
Due date:
% Done: 0%
TYPO3 Version: 8
PHP Version: 7.0
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

If bodytext contains <script src="/some/script.js"></script>, it is replaced with <script src="/script.js"/> after importing the XML file with the localization manager. This is not valid HTML and breaks the page. The translated file is valid XML containing valid HTML with a closing tag, so the replacement must happen during the import process.

History

#1 Updated by Coders.Care Extension Team about 1 year ago

  • Status changed from New to Needs Feedback

Could you please check your file again to see whether your HTML code is wrapped in <![CDATA[]]>?
This seems to make a difference, at least on our testing system: exactly your HTML code is imported unchanged with CDATA, but as a self-closing tag without it.

#2 Updated by Coders.Care Extension Team about 1 year ago

It seems to be a core bug, since the CatXmlImportManager uses

GeneralUtility::xml2tree

to transform the XML code.

That again uses xmlRecompileFromStructValArray to implode array structures back into XML.

If an element has no value, this method automatically collapses it into a self-closing tag, even for tags that are not allowed to be self-closing in HTML.
If the type is detected as "cdata", the whole value is just added as is, which is why CDATA works.

    /**
     * This implodes an array of XML parts (made with xml_parse_into_struct()) into XML again.
     *
     * @param array $vals An array of XML parts, see xml2tree
     * @return string Re-compiled XML data.
     */
    public static function xmlRecompileFromStructValArray(array $vals)
    {
        $XMLcontent = '';
        foreach ($vals as $val) {
            $type = $val['type'];
            // Open tag:
            if ($type === 'open' || $type === 'complete') {
                $XMLcontent .= '<' . $val['tag'];
                if (isset($val['attributes'])) {
                    foreach ($val['attributes'] as $k => $v) {
                        $XMLcontent .= ' ' . $k . '="' . htmlspecialchars($v) . '"';
                    }
                }
                if ($type === 'complete') {
                    if (isset($val['value'])) {
                        $XMLcontent .= '>' . htmlspecialchars($val['value']) . '</' . $val['tag'] . '>';
                    } else {
                        $XMLcontent .= '/>';
                    }
                } else {
                    $XMLcontent .= '>';
                }
                if ($type === 'open' && isset($val['value'])) {
                    $XMLcontent .= htmlspecialchars($val['value']);
                }
            }
            // Finish tag:
            if ($type === 'close') {
                $XMLcontent .= '</' . $val['tag'] . '>';
            }
            // Cdata
            if ($type === 'cdata') {
                $XMLcontent .= htmlspecialchars($val['value']);
            }
        }
        return $XMLcontent;
    }

To get a better understanding of the types "open", "complete" and "close", see
http://php.net/manual/de/function.xml-parse-into-struct.php
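
The element types can also be inspected without TYPO3. The following is a minimal sketch that uses only PHP's xml extension. The helper name parseToStruct and the wrapping <data> element are made up for illustration and are not part of the core or of l10nmgr.

```php
<?php
/**
 * Hypothetical helper: returns the flat struct array that
 * xml_parse_into_struct() builds for the given XML string.
 */
function parseToStruct(string $xml): array
{
    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    $vals = [];
    xml_parse_into_struct($parser, $xml, $vals);
    return $vals;
}

$samples = [
    // Plain markup: the script element is parsed as an XML element.
    'plain' => '<data><script src="/some/script.js"></script></data>',
    // CDATA: the markup is only character data of the <data> element.
    'cdata' => '<data><![CDATA[<script src="/some/script.js"></script>]]></data>',
];

foreach ($samples as $label => $xml) {
    echo $label . ":\n";
    foreach (parseToStruct($xml) as $val) {
        printf(
            "  %s: type=%s, value %s\n",
            $val['tag'],
            $val['type'],
            isset($val['value']) ? 'set' : 'not set'
        );
    }
}
```

In the plain case the script element should be reported as "complete" without a value, which is exactly the branch that emits '/>' in xmlRecompileFromStructValArray. In the CDATA case the markup ends up as the value of the surrounding element, so the script tag itself is never recompiled.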

#3 Updated by Coders.Care Extension Team about 1 year ago

  • Project changed from Localization Manager (l10nmgr) to TYPO3 Core
  • Description updated (diff)
  • Category changed from Importer to Miscellaneous
  • Assignee deleted (Jo Hasenau)
  • Target version changed from 8.x.x (TYPO3 8 LTS) to next-patchlevel
  • PHP Version set to 7.0

#4 Updated by Coders.Care Extension Team about 1 year ago

Since the method is quite old, I guess this happens with CMS 6 and 7 as well.

#5 Updated by Coders.Care Extension Team about 1 year ago

According to the HTML5 spec, the following elements are "void" and may therefore be written as self-closing:

area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr

Any other tag needs an explicit closing tag to produce valid HTML5.
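
One possible way to respect this list when recompiling is sketched below. This is only an assumption about how a fix could look, not the actual core patch: the constant VOID_ELEMENTS and the function closeEmptyElement are hypothetical names, and the function would replace the hard-coded '/>' in the "complete without value" branch.

```php
<?php
// Hypothetical list of HTML5 void elements, taken from the comment above.
const VOID_ELEMENTS = [
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr',
];

/**
 * Hypothetical helper: returns the string that finishes an element
 * without a value, once '<tag' and its attributes have been written.
 */
function closeEmptyElement(string $tag): string
{
    // Void elements may stay self-closing; everything else gets an
    // explicit closing tag, which is still well-formed XML.
    if (in_array(strtolower($tag), VOID_ELEMENTS, true)) {
        return '/>';
    }
    return '></' . $tag . '>';
}

echo '<script src="/some/script.js"' . closeEmptyElement('script') . "\n";
echo '<br' . closeEmptyElement('br') . "\n";
```

Since xmlRecompileFromStructValArray handles arbitrary XML and not only HTML, a whitelist like this changes the output for every non-void empty element, so it would need to be checked against other callers.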

#6 Updated by Dmitry no-lastname-given about 1 year ago

Coders.Care Extension Team wrote:

Could you please check your file again to see whether your HTML code is wrapped in <![CDATA[]]>?
This seems to make a difference, at least on our testing system: exactly your HTML code is imported unchanged with CDATA, but as a self-closing tag without it.

Hi,
Just checked: no, the block with the script tag is not wrapped in CDATA. Weird: I have 20 translated bodytext fields in the file, but only 4 of them are wrapped in CDATA.

#7 Updated by Coders.Care Extension Team about 1 year ago

  • Assignee set to Jo Hasenau

As a workaround you can check the checkbox "Do not check XML" - this automatically wraps content with CDATA to make sure it does not break the XML parser. Still this is just a workaround for the L10nmgr but the bug in xmlRecompileFromStructValArray should be fixed anyway.

#8 Updated by Dmitry no-lastname-given about 1 year ago

Coders.Care Extension Team wrote:

As a workaround you can check the checkbox "Do not check XML" - this automatically wraps content with CDATA to make sure it does not break the XML parser. Still this is just a workaround for the L10nmgr but the bug in xmlRecompileFromStructValArray should be fixed anyway.

It's already checked. If I don't check it, the export fails.

#9 Updated by Benni Mack 5 months ago

  • Target version changed from next-patchlevel to Candidate for patchlevel
