Feature #67686

Clean way to extract doctype

Added by Thomas Mayer over 4 years ago. Updated 9 months ago.

Status:
Needs Feedback
Priority:
Should have
Category:
Frontend
Start date:
2015-06-22
Due date:
% Done:

0%

Description

TYPO3 CMS allows a user to set a custom doctype string in

config.doctype

instead of using one of the constants:

xhtml_trans for the XHTML 1.0 Transitional doctype.
xhtml_frames for the XHTML 1.0 Frameset doctype.
xhtml_strict for the XHTML 1.0 Strict doctype.
xhtml_basic for the XHTML basic doctype.
xhtml_11 for the XHTML 1.1 doctype.
xhtml+rdfa_10 for the XHTML+RDFa 1.0 doctype.
html5 for the HTML5 doctype.
none for no doctype at all.

When a custom doctype string is used then the user MUST set

config.xhtmlDoctype

which is one of

xhtml_trans for XHTML 1.0 Transitional doctype.
xhtml_frames for XHTML 1.0 Frameset doctype.
xhtml_strict for XHTML 1.0 Strict doctype.
xhtml_basic for XHTML basic doctype.
xhtml_11 for XHTML 1.1 doctype.

Is there a clean way to extract the doctype that is suitable for standards-conformant tag generation (validation)?

Allowing a custom doctype string in

config.doctype

makes it necessary to use another config variable (xhtmlDoctype) for exactly the same thing, even using the same constants. And there's more to come, like
This specification defines version 5.0 of the XHTML syntax, known as "XHTML 5".

mentioned in http://www.w3.org/TR/html5/introduction.html#html-vs-xhtml

I suggest to

  • use a new config variable
    config.customDoctypeString
    . If set, it overwrites the doctype string generated via
    config.doctype
  • deprecate and later remove
    config.xhtmlDoctype
    . It's obsolete then.
  • not allow custom doctype strings any longer in
    config.doctype

For BC, I suggest to

  • look if
    config.doctype
    contains a custom doctype string (which is not in the list of constants). If that's the case, then
  • copy the value of
    config.doctype
    into
    config.customDoctypeString
    .
  • copy the value of
    config.xhtmlDoctype
    into
    config.doctype
    (documentation says it MUST be set when using a custom doctype string)
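
The BC steps above can be sketched as follows. This is only an illustration in Python, not TYPO3 code: the configuration is modelled as a plain dict, migrate_doctype_config is a made-up helper name, and customDoctypeString is the new setting proposed in this issue, which does not exist in the core.

```python
# Keywords config.doctype accepts, as listed in the description above.
KNOWN_DOCTYPES = {
    "xhtml_trans", "xhtml_frames", "xhtml_strict", "xhtml_basic",
    "xhtml_11", "xhtml+rdfa_10", "html5", "none",
}


def migrate_doctype_config(config: dict) -> dict:
    """Return a copy of config with the proposed BC rules applied.

    customDoctypeString is the hypothetical new setting from this proposal.
    """
    migrated = dict(config)
    doctype = migrated.get("doctype", "")
    if doctype and doctype not in KNOWN_DOCTYPES:
        # A custom string: move it to the new setting ...
        migrated["customDoctypeString"] = doctype
        # ... and let the keyword from xhtmlDoctype take its place.
        migrated["doctype"] = migrated.get("xhtmlDoctype", "")
    return migrated
```

A configuration that already uses one of the keywords passes through unchanged, so only the users of custom strings would be touched by the BC layer.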

BC could be removed when the deprecation period is over. I guess that extensions would be more compatible with well-defined, known constants than they are now with custom doctype strings. Still, the full benefit is already available during the deprecation period. Later on, affected users only have to apply in their own configuration what the BC layer does, before it is removed.

A use case would be https://github.com/mblaschke/TYPO3-metaseo/issues/70

Personally, I'd also like to see PHP constants to be used for doctype constants. They could be reused in extensions and be up-to-date when e.g. a doctype changes from html_5 to html5.


Related issues

Related to TYPO3 Core - Feature #40503: XHTML 5 Needs Feedback 2012-09-01

History

#1 Updated by Thomas Mayer over 4 years ago

Another use case, but more in detail: https://github.com/mblaschke/TYPO3-metaseo/issues/68

User complains:

Till Typo3 v6 there was a core functionality xhtmlcleaning, which is deprecated without any substitute. So the extensions must now produce clean code.

The issue is also related to #40503 (XHTML 5)

#2 Updated by Thomas Mayer over 4 years ago

Still, the change would be a breaking change when users modify config.doctype or config.xhtmlDoctype dynamically. Even when the BC workaround is applied.

This could be common when dealing with XHTML and browser support.

#3 Updated by Susanne Moog over 4 years ago

  • Target version changed from 7.4 (Backend) to 7.5

#4 Updated by Mathias Brodala about 4 years ago

  • Status changed from New to Needs Feedback

Not sure what the issue is here. You can set config.doctype to an arbitrary string and TYPO3 will output that as DOCTYPE just fine. Of course, you cannot expect full HTML5 compatible output in this case.

So if this issue is still relevant, please add an example what you set in the configuration, what you expect and what you get instead.

#5 Updated by Thomas Mayer about 4 years ago

Basic example:

set config.doctype to a custom string

<!DOCTYPE CUSTOMHTML>

If the user sets a custom string in config.doctype then config.xhtmlDoctype must be set to a valid XHTML type (note that HTML5 is still missing here as well as XHTML5).

  • There's no way to set HTML5 for rendering now because this is still unknown to config.xhtmlDoctype according to documentation. (XHTML5 would be a new feature, which is also missing).
  • Even if HTML5 could be set in config.xhtmlDoctype, this does not correspond to the config variable name config.xhtmlDoctype (it's not XHTML!).
  • There should be a way to set any doctype config.doctype knows together with a custom doctype string. For the moment this is not the case (e.g. html5).
  • There should be a clean way to extract the doctype for rendering in TYPO3 core and in extensions. For the moment this is not the case and requires heuristics and some cascade of IF-statements
  • From the perspective of a data model, config.doctype is overused for multiple data types and multiple behavioural meanings (output of doctype string itself and the doctype used for rendering)
  • From the perspective of a data model, config.xhtmlDoctype is redundant to the constants config.doctype knows.

If it is still unclear why this is not nice and clean please let me know.

To answer your question if this is relevant:

  • We had a user complaining that he can't set a custom docstring and have our extension still extract a doctype for rendering
  • As soon as XHTML5 is supported by TYPO3, the situation even worsens
  • There should be a future-proof concept instead of extending this once again.

#6 Updated by Benni Mack about 4 years ago

  • Target version deleted (7.5)

#7 Updated by Alexander Opitz almost 4 years ago

  • Category changed from Code Cleanup to 1050
  • Status changed from Needs Feedback to New
  • Target version set to 8.0

#8 Updated by Jigal van Hemert over 3 years ago

Thomas Mayer wrote:

If the user sets a custom string in config.doctype then config.xhtmlDoctype must be set to a valid XHTML type (note that HTML5 is still missing here as well as XHTML5).

The logic in the code is:
1. if config.xhtmlDoctype is empty, copy the value of config.doctype to it
2. if config.xhtmlDoctype is not empty after 1. assign the value to TSFE|xhtmlDoctype
3. if config.xhtmlDoctype is not one of the supported XHTML types, set TSFE|xhtmlDoctype to empty string
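
A minimal sketch of these three steps, written in Python for illustration only (the real logic lives in the TYPO3 core and may differ in detail). The function name and the dict-based config are assumptions, and the set of "supported XHTML types" is assumed to be the keywords that appear in the xhtmlVersion list further down in this comment.

```python
# Assumed set of supported XHTML types (taken from the xhtmlVersion list below).
SUPPORTED_XHTML_TYPES = {
    "xhtml_trans", "xhtml_frames", "xhtml_strict",
    "xhtml_basic", "xhtml_11", "xhtml+rdfa_10",
}


def resolve_doctypes(config: dict) -> tuple:
    """Return (config.xhtmlDoctype, TSFE|xhtmlDoctype) after the three steps."""
    xhtml_doctype = config.get("xhtmlDoctype", "")
    # Step 1: an empty xhtmlDoctype inherits the value of doctype.
    if not xhtml_doctype:
        xhtml_doctype = config.get("doctype", "")
    # Step 2: the (possibly inherited) value is handed to TSFE.
    tsfe_xhtml_doctype = xhtml_doctype
    # Step 3: anything that is not a supported XHTML type is blanked.
    if tsfe_xhtml_doctype not in SUPPORTED_XHTML_TYPES:
        tsfe_xhtml_doctype = ""
    return xhtml_doctype, tsfe_xhtml_doctype
```

Under these assumptions, a custom string or "none" in config.doctype ends up in config.xhtmlDoctype when the latter is empty, while TSFE|xhtmlDoctype is left empty.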

  • There's no way to set HTML5 for rendering now because this is still unknown to config.xhtmlDoctype according to documentation. (XHTML5 would be a new feature, which is also missing).

You can set it to 'HTML5' for rendering (config.xhtmlDoctype); the core does that too. XHTML5 is not present yet (will add support in 8.1).

  • Even if HTML5 could be set in config.xhtmlDoctype, this does not correspond to the config variable name config.xhtmlDoctype (it's not XHTML!).

True. The variable name was chosen at a time when something flexible like HTML5 was not conceived yet. HTML4 was old and XHTML was the future. renderDoctype would probably be better, but it's hard to justify such a breaking change for the sake of a better variable name only.

  • There should be a way to set any doctype config.doctype knows together with a custom doctype string. For the moment this is not the case (e.g. html5).

Seems documentation is missing here; it's possible as described earlier.

  • There should be a clean way to extract the doctype for rendering in TYPO3 core and in extensions. For the moment this is not the case and requires heuristics and some cascade of IF-statements

What is meant by "doctype" here?
The string in the doctype header: config.doctype (either one of the keywords or a custom string)
The doctype keyword used for most rendering purposes: config.xhtmlDoctype
The doctype keyword for XHTML types: TSFE|xhtmlDoctype
The numeric XHTML type: TSFE|xhtmlVersion (xhtml_trans/xhtml_strict/xhtml_frames : 100; xhtml_basic: 105; xhtml_11/xhtml+rdfa_10: 110; otherwise: 0)
Are frames allowed: TSFE|dtdAllowsFrames
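
The numeric mapping for TSFE|xhtmlVersion can be written down as a lookup table. This is an illustrative Python sketch of the values listed above, not the core's implementation; the names XHTML_VERSIONS and xhtml_version are made up.

```python
# Values as listed above for TSFE|xhtmlVersion.
XHTML_VERSIONS = {
    "xhtml_trans": 100,
    "xhtml_strict": 100,
    "xhtml_frames": 100,
    "xhtml_basic": 105,
    "xhtml_11": 110,
    "xhtml+rdfa_10": 110,
}


def xhtml_version(xhtml_doctype: str) -> int:
    """Return the numeric XHTML type, or 0 for anything else."""
    return XHTML_VERSIONS.get(xhtml_doctype, 0)
```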

  • From the perspective of a data model, config.doctype is overused for multiple data types and multiple behavioural meanings (output of doctype string itself and the doctype used for rendering)

It's not a data model, it's configuration so the system knows how to render tags, which attributes are allowed, etcetera.

  • From the perspective of a data model, config.xhtmlDoctype is redundant to the constants config.doctype knows.

It adds to flexibility and simplicity at the same time: no need to set config.xhtmlDoctype if you set config.doctype to one of the most used types. If you need obscure custom doctype headers you can set the render rules in xhtmlDoctype.

  • We had a user complaining that he can't set a custom docstring and have our extension still extract a doctype for rendering

In most cases config.xhtmlDoctype will do the job (despite the confusing name).

  • As soon as XHTML5 is supported by TYPO3, the situation even worsens

Can you explain?

  • There should be a future-proof concept instead of extending this once again.

It's hard to know the rules of future doctypes :-(

I'll add the XHTML5 doctype when development of 8.1 starts (it's feature) and see what needs to be changed to documentation.

#9 Updated by Jigal van Hemert over 3 years ago

  • Assignee set to Jigal van Hemert

#10 Updated by Thomas Mayer over 3 years ago

Jigal van Hemert wrote:

Thomas Mayer wrote:

If the user sets a custom string in config.doctype then config.xhtmlDoctype must be set to a valid XHTML type (note that HTML5 is still missing here as well as XHTML5).

The logic in the code is:
1. if config.xhtmlDoctype is empty, copy the value of config.doctype to it

So we also have to deal with "none" and custom strings in config.xhtmlDoctype. Great ;-)

2. if config.xhtmlDoctype is not empty after 1. assign the value to TSFE|xhtmlDoctype

Then we have "none" and custom strings in TSFE|xhtmlDoctype

3. if config.xhtmlDoctype is not one of the supported XHTML types, set TSFE|xhtmlDoctype to empty string

Being empty, how can TSFE|xhtmlDoctype be used to semantically extract the way content (HTML tags, Meta tags, etc.) should be rendered according to [HTML 4], XHTML 1.x, HTML 5, XHTML 5, ...? Falling back to a default is not an option here, because the document type might be well-defined in the output via a custom string for config.doctype.

I have to admit that as a last resort, there is still the undocumented (mis)use of config.xhtmlDoctype, according to your description of the core's logic.

  • There's no way to set HTML5 for rendering now because this is still unknown to config.xhtmlDoctype according to documentation. (XHTML5 would be a new feature, which is also missing).

You can set it to 'HTML5' for rendering (config.xhtmlDoctype); the core does that too. XHTML5 is not present yet (will add support in 8.1).

I know that I can set this. As described, the use case is an extension which must deal with what a random user has configured, including "none" and custom strings. That said, it's not that easy.

  • Even if HTML5 could be set in config.xhtmlDoctype, this does not correspond to the config variable name config.xhtmlDoctype (it's not XHTML!).

True. The variable name was chosen at a time when something flexible like HTML5 was not conceived yet. HTML4 was old and XHTML was the future. renderDoctype would probably be better, but it's hard to justify such a breaking change for the sake of a better variable name only.

This is a historic consideration. I agree, changing the name just to solve a naming issue does not seem adequate. But it's more than that. It's about using one config variable for one concept, and about not using one config variable for multiple concepts.

I did not suggest replacing config.xhtmlDoctype with e.g. config.renderDoctype. I'd rather suggest using config.xhtmlDoctype for what the variable name indicates. Same for config.doctype. Same for the new variable config.customDoctypeString (as a consequence).

Meaning e.g.

config.doctype should be one of

xhtml for the XHTML doctype.
html5 for the HTML5 doctype.
xhtml5 for the XHTML 5 doctype. // only if not covered by xhtml. Needs to be discussed.
html6 for the HTML 6 doctype.
...

config.xhtmlDoctype should be one of

xhtml_trans for XHTML 1.0 Transitional doctype.
xhtml_frames for XHTML 1.0 Frameset doctype.
xhtml_strict for XHTML 1.0 Strict doctype.
xhtml_basic for XHTML basic doctype.
xhtml_11 for XHTML 1.1 doctype.  // btw, why not xhtml11 or xhtml1.1 or xhtml1_1? (html_5 got changed to html5...)
xhtml_5 for the XHTML 5 doctype.   // same here. Better use xhtml5?
xhtml_6 for the XHTML 6 doctype.
...

config.customDoctypeString should be one of

<empty> for generated doctype string according to config.doctype and config.xhtmlDoctype
<string> for output of a custom doctype string
none for suppression of the output of a doctype string

Maybe the "none" value should better be implemented using a new boolean variable with name config.suppressDoctypeString, defaulting to false.
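
To make the proposed split concrete, here is a rough Python sketch of how the output doctype string could be chosen from the three variables. Everything in it is hypothetical: customDoctypeString and the "xhtml" keyword for config.doctype are only proposals from this comment, the function name is made up, and the table of generated strings is deliberately partial (two standard declarations only).

```python
# Partial table of generated declarations; a real one would cover every keyword.
GENERATED = {
    "html5": "<!DOCTYPE html>",
    "xhtml_strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
}


def doctype_string(config: dict) -> str:
    """Pick the doctype string to output under the proposed scheme."""
    custom = config.get("customDoctypeString", "")
    if custom == "none":
        return ""        # output suppressed
    if custom:
        return custom    # custom string wins over the generated one
    doctype = config.get("doctype", "")
    # For the generic "xhtml" keyword the concrete flavour comes from xhtmlDoctype.
    key = config.get("xhtmlDoctype", "") if doctype == "xhtml" else doctype
    return GENERATED.get(key, "")
```

Note that config.doctype and config.xhtmlDoctype keep their keyword values in every branch, so rendering code can still rely on them when a custom string or "none" is configured.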

Not sure how to deal with the entry "xhtml+rdfa_10" for the XHTML+RDFa 1.0 doctype. There's also RDFa support for HTML4 and HTML5 according to https://www.w3.org/TR/html-rdfa/. So RDFa is a separate concept and requires its own configuration variable, e.g. config.rdfaDoctype with RDFa rendering disabled when empty. In respect of rendering, current configuration does not allow to turn off and on RDFa for HTML4 and HTML5. For "xhtml+rdfa_10", is it only used for the doctype string or also for the rendering? Again, this needs to be documented. And if it can be enabled/disabled for rendering, why not for HTML5?

Not sure if that is pointing to north, but it's not pointing to a black hole. Everything is available, nothing is redundant. Naming is much better as it fits the purpose. It's extendable by using new variables, therefore futureproof. And there's one variable for one concept which is extendable/futureproof in itself (e.g. there's no HTML together with XHTML). Could have been like that for ages. Just that now it introduces a lot of breaking changes that way. Originally, I only suggested to at least introduce config.customDoctypeString. However, now I think this should be done all in one go: once and hopefully for a long period of time, without future breaking changes, tweaks, workarounds, and uncertainty about what the implementation actually does, why it does that, whether it is good that way, how long this will remain the case, and whether it is an officially supported feature or just something that somehow "worked" for the past ~decade.

It should also be made clear in the documentation that config.doctype and config.xhtmlDoctype are not only used for output doctype string generation but also for rendering. Implied, if rendered content does not validate against a specified doctype, then there might be a bug somewhere.

When in doubt, it should also be documented that config.doctype and config.xhtmlDoctype are rather relevant for rendering than for output doctype string generation because doctype string generation can still be changed via config.customDoctypeString, whereas such a way does not exist for the rendering (must be well-defined and semantics must be "known").

  • There should be a way to set any doctype config.doctype knows together with a custom doctype string. For the moment this is not the case (e.g. html5).

Seems documentation is missing here; it's possible as described earlier.

Yes, documentation is missing. With the info you added, html5 could be defined by (mis)using config.xhtmlDoctype instead of using config.doctype which then is (mis)used for the custom string. I doubt this is handled correctly all over the (extension) code, being undocumented for ages.

  • There should be a clean way to extract the doctype for rendering in TYPO3 core and in extensions. For the moment this is not the case and requires heuristics and some cascade of IF-statements

What is meant by "doctype" here?
The string in the doctype header: config.doctype (either one of the keywords or a custom string)
The doctype keyword used for most rendering purposes: config.xhtmlDoctype
The doctype keyword for XHTML types: TSFE|xhtmlDoctype
The numeric XHTML type: TSFE|xhtmlVersion (xhtml_trans/xhtml_strict/xhtml_frames : 100; xhtml_basic: 105; xhtml_11/xhtml+rdfa_10: 110; otherwise: 0)
Are frames allowed: TSFE|dtdAllowsFrames

In extensions (like metaseo, see link in description), I want to determine whether meta tags and html tags should be rendered according to (former HTML 4,) XHTML, HTML 5, XHTML 5, HTML 6, XHTML 6, etc. That said, I want to have a limited set of predefined values with known semantics, making clear that e.g. HTML5 shall be used for rendering. Given "none" or a custom string in config.doctype, I cannot do that (without undocumented misuse of config.xhtmlDoctype). And the user also cannot configure that if "none" or a custom string has to be set according to the user's use case/requirements. If there is no use case for "none" and custom strings, these features could also safely be removed. However, I guess there is a use case. If that happens (rarely), why not use a separate configuration variable for "none" and custom strings?

If "none" and custom strings were in a separate variable, config.doctype could only contain one value out of a set of predefined values. Extension developers could easily "extract" e.g. html5 and know that they have to render (meta) tags according to HTML5.

So what do I mean by doctype here? I mean the document type in respect of W3C standards, e.g. HTML5. I do not mean the XHTML type in the first place. If config.xhtmlDoctype and TSFE|xhtmlVersion contained only values from a limited set of predefined values, I'd be fine with that. But similar to config.doctype, this is not the case (according to the logic in the code you described). So generally speaking, I'd like to have a well-defined set of possible values for config.doctype, config.xhtmlDoctype and TSFE|xhtmlDoctype. That implies that custom strings would not be allowed in any of these config variables. Plus, there should be a distinction between "none" for the doctype string output and a doctype for rendering. If config.doctype is used for rendering, then "none" should not be allowed in config.doctype (unless you want to e.g. suppress rendering at all).

Ideally, in extension code, I could switch/case one config variable for the rendering format. Maybe two variables if XHTML comes into play. Maybe three if framesets come into play (for a limited set of scenarios). Then there would be one variable for one concept (in the sense of a data model). If there were some framework support (a generic set of functions and/or semantics of config values in respect of rendering), I'd even be happier. Otherwise, there would still be an improvement: E.g. config variables for framesets might not be of relevance for every (meta) tag, hence the logic can omit these variables when not needed.

  • From the perspective of a data model, config.doctype is overused for multiple data types and multiple behavioural meanings (output of doctype string itself and the doctype used for rendering)

It's not a data model, it's configuration so the system knows how to render tags, which attributes are allowed, etcetera.

As I pointed out, "none" and custom strings in config.doctype make it difficult for the system to "know" exactly that. That's the core of this issue (as long as config.xhtmlDoctype is not misused).

  • From the perspective of a data model, config.xhtmlDoctype is redundant to the constants config.doctype knows.

It adds to flexibility and simplicity at the same time: no need to set config.xhtmlDoctype if you set config.doctype to one of the most used types. If you need obscure custom doctype headers you can set the render rules in xhtmlDoctype.

This is not documented in https://docs.typo3.org/typo3cms/TyposcriptReference/Setup/Config/Index.html#xhtmldoctype . Additionally, it's really confusing to have such dependencies for no reason. Plus, I have to use if-cascades in extension code to extract e.g. "html5" from multiple sources. I also doubt that this is done everywhere, at least in extensions.

  • We had a user complaining that he can't set a custom docstring and have our extension still extract a doctype for rendering

In most cases config.xhtmlDoctype will do the job (despite the confusing name).

Yes, I agree for most cases (>98%?). But that is not my issue.

  • As soon as XHTML5 is supported by TYPO3, the situation even worsens

Can you explain?

Ok, the situation improves in the sense that the user could put some custom string into config.doctype and have a well-defined value in config.xhtmlDoctype as soon as TYPO3 supports XHTML5. It only worsens because when using XHTML/XHTML5, users tend to set custom strings in config.doctype for some (not always obscure, rather in the spirit of XML specs) reason as stated in https://dev.w3.org/html5/spec-preview/the-xhtml-syntax.html#writing-xhtml-documents :

This is not strictly a violation of the XML specification, but it does contradict the spirit of the XML specification's requirements. This is motivated by a desire for user agents to all handle entities in an interoperable fashion without requiring any network access for handling external subsets.

I'm not sure how browsers and users will make use of XHTML 5 in the future. XHTML 1.x allowed something like (https://www.ibm.com/developerworks/library/x-entities/):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html 
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" 
[
    <!ENTITY test-entity "This <em>is</em> an entity.">
]>

  • There should be a future-proof concept instead of extending this once again.

It's hard to know the rules of future doctypes :-(

Using an extensible/futureproof concept for config variables (one variable per concept), future doctypes should not break the concept any more. Basically this is what has gone wrong in the past in respect to this issue (this is my historic point of view on that).

Do you understand my concern now? It's about

  • Well-defined config variables, meaning there is one variable for one concept. Maybe multiple variables for one concept. But not one variable for multiple concepts, which even exclude each other and needed to be "covered up" with "fixes" and dependencies at some time in the past, resulting in even more workarounds and "fixes". As it turns out, values from config.doctype silently moved to config.xhtmlDoctype in the past, and it's not even documented. Not being documented, one could also argue there is a flaw in the code. How can bug fixes be handled if it is not clear what the code should do (and more importantly: does that make sense or did it just "happen"?)?
  • Cleaner code when processing config variables, meaning to avoid dependencies between variables and avoiding if-cascades to extract the semantics. Readability and testability of code.
  • Performance (a bit at least for avoidance of if-cascades). Maybe map config.doctype to a numeric representation using constants in TSFE to avoid string comparisons, in the spirit of TSFE|xhtmlVersion?
  • Easier configuration. One variable for one concept - everybody gets that.
  • Easier documentation/education/certification/.... It's just confusing users to have dependencies in config variables for no reason.
  • Avoidance of bugs in existing and future code. As said, I doubt that e.g. undocumented usage of config.xhtmlDoctype is implemented all over the place in the way the core uses it. Even if it was documented, I doubt that every developer takes care of the 2% of the users who e.g. use custom strings in config.doctype.
  • Time invested in maintenance of an old workaround for a strange conception, potentially affecting nearly the whole community for ~ a decade. Better get rid of the workaround with a clean concept and invest the saved time for something useful in future decades.
  • Framework support for extension developers. Maintainable API. Thereby avoiding breaking changes in the future via API/abstraction/interfaces. Avoiding redundant code all over the extensions.
  • Clarity of what configuration variables actually mean. In extensions, processed semantics of configuration does not necessarily follow the logic of the core. Actually, it's not really clear which one would be right or wrong.
  • Code which is processing configuration should follow a concept for configuration. Not the other way round - It's still human beings who (mis)configure.
  • Besides a deprecation concept and a numeric representation, configuration can be provided 1:1 e.g. in TSFE (no tweaks like the logic you explained).
  • If there is need for such a logic, it should use its own set of variables (or better: a class with getters, e.g. isHtml5()). Currently, one puts something in (via config) and then has to reverse engineer which sources have to be taken into account and how TYPO3's logic has silently changed the configuration values.
  • To sum it up: One variable for one concept is meant to make things easier for everybody.
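
As a rough illustration of the "numeric representation plus getters" idea from the list above, here is a small Python sketch. It is not an existing TYPO3 API: RenderType, Doctype and the numeric values are made up, and only keywords that appear in this thread are mapped.

```python
from enum import IntEnum


class RenderType(IntEnum):
    """Numeric representation of the rendering rules (hypothetical values)."""
    UNKNOWN = 0
    XHTML = 1
    HTML5 = 2


# Keywords taken from the lists in this thread.
_KEYWORD_TO_TYPE = {
    "xhtml_trans": RenderType.XHTML,
    "xhtml_frames": RenderType.XHTML,
    "xhtml_strict": RenderType.XHTML,
    "xhtml_basic": RenderType.XHTML,
    "xhtml_11": RenderType.XHTML,
    "xhtml+rdfa_10": RenderType.XHTML,
    "html5": RenderType.HTML5,
}


class Doctype:
    """Resolve the keyword once; callers only use the getters."""

    def __init__(self, keyword: str) -> None:
        # One dictionary lookup replaces repeated string comparisons later on.
        self.render_type = _KEYWORD_TO_TYPE.get(keyword, RenderType.UNKNOWN)

    def is_html5(self) -> bool:
        return self.render_type is RenderType.HTML5

    def is_xhtml(self) -> bool:
        return self.render_type is RenderType.XHTML
```

With such a wrapper, extension code would ask Doctype(keyword).is_html5() instead of inspecting several config values itself; a custom string or "none" simply resolves to UNKNOWN.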

I'll add the XHTML5 doctype when development of 8.1 starts (it's feature) and see what needs to be changed to documentation.

This issue does not necessarily have to be done together with XHTML5 support, because XHTML5 support does (hopefully) not introduce a new concept in the sense of this issue, so XHTML5 support is orthogonal to this issue. However, I think that just extending these config variables over and over is exactly what went wrong in the past. So it would simply be a great opportunity to consider this issue together with XHTML5 support, now that you touch it "anyways". Plus, users opting in to the all-new XHTML 5 support would otherwise have to change their config again shortly afterwards (although that should not be a big issue).

Hence, we'd now have an excellent time window to do this issue, because (I guess) many users went from XHTML 1.x to HTML 5 while there should not be a lot of need for custom strings etc. in HTML5. XHTML 5 support in TYPO3 closes that window as soon as users widely go for XHTML 5.

There is also an excellent opportunity to change undocumented behaviour instead of documenting actual strange behaviour in terms of the undocumented possibility to set e.g. html5 in config.xhtmlDoctype. As soon as it's documented, it also should be supported (and not change frequently).

The good thing is: ~98% of all users won't use "none" and custom strings in config.doctype, so they would not necessarily be affected by a breaking change. Hopefully, the other 2% know what they configured, so they should also be capable to follow a breaking change just for custom strings and "none". For affected users, the change itself is as easy as using a new config variable instead. Plus, even if they don't do that, there is a way to reach backwards compatibility (as suggested earlier).

#11 Updated by Jigal van Hemert over 3 years ago

Thomas Mayer wrote:

Jigal van Hemert wrote:

Thomas Mayer wrote:

If the user sets a custom string in config.doctype then config.xhtmlDoctype must be set to a valid XHTML type (note that HTML5 is still missing here as well as XHTML5).

The logic in the code is:
1. if config.xhtmlDoctype is empty, copy the value of config.doctype to it

So we also have to deal with "none" and custom strings in config.xhtmlDoctype. Great ;-)

Please read on before jumping to conclusions!

2. if config.xhtmlDoctype is not empty after 1. assign the value to TSFE|xhtmlDoctype

Then we have "none" and custom strings in TSFE|xhtmlDoctype

3. if config.xhtmlDoctype is not one of the supported XHTML types, set TSFE|xhtmlDoctype to empty string

Being empty, how can TSFE|xhtmlDoctype be used to semantically extract the way content (HTML tags, Meta tags, etc.) should be rendered according to [HTML 4], XHTML 1.x, HTML 5, XHTML 5, ...? Falling back to a default is not an option here, because the document type might be well-defined in the output via a custom string for config.doctype.

Both config.xhtmlDoctype and TSFE|xhtmlDoctype are used to base rendering decisions on. TSFE|xhtmlDoctype is for example useful to check if there is XHTML doctype or not. In some cases you have to check specific features (such as if a target attribute is allowed) and you can use the specific values.

You can set it to 'HTML5' for rendering (config.xhtmlDoctype); the core does that too. XHTML5 is not present yet (will add support in 8.1).

I know that I can set this. As described, the use case is an extension which must deal with what a random user has configured, including "none" and custom strings. That said, it's not that easy.

Although someone could set config.xhtmlDoctype to a custom string it will not help a lot with the rendering. You should limit your checks to the predefined types.

  • Even if HTML5 could be set in config.xhtmlDoctype, this does not correspond to the config variable name config.xhtmlDoctype (it's not XHTML!).

True. The variable name was chosen at a time when something flexible like HTML5 was not conceived yet. HTML4 was old and XHTML was the future. renderDoctype would probably be better, but it's hard to justify such a breaking change for the sake of a better variable name only.

This is a historic consideration. I agree, changing the name just to solve a naming issue does not seem adequate. But it's more than that. It's about using one config variable for one concept, and about not using one config variable for multiple concepts.

I did not suggest replacing config.xhtmlDoctype with e.g. config.renderDoctype. I'd rather suggest using config.xhtmlDoctype for what the variable name indicates. Same for config.doctype. Same for the new variable config.customDoctypeString (as a consequence).

What about a completely different solution: most of the time it's only necessary to know about certain features or behaviour. Would it be a solution if there was an API to find out things like isTargetAttributeAllowed(), hasSelfClosingTags(), and of course a way to find the doctype whose rendering rules are followed. This way the decision logic is centralized and can be extended for additional doctypes.
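
Such an API could look roughly like the following. This is a Python sketch for discussion only, not existing TYPO3 code: the class name is made up, the method names mirror the ones suggested above, and the rule table is intentionally limited to three keywords as an example.

```python
class RenderingRules:
    """Centralised feature lookup for a doctype keyword (hypothetical API)."""

    # keyword -> (self-closing tags, target attribute allowed); partial table.
    _RULES = {
        "html5": (False, True),
        "xhtml_trans": (True, True),
        "xhtml_strict": (True, False),
    }

    def __init__(self, keyword: str) -> None:
        if keyword not in self._RULES:
            raise ValueError(f"no rendering rules known for {keyword!r}")
        # The doctype whose rendering rules are followed.
        self.keyword = keyword
        self._self_closing, self._target_allowed = self._RULES[keyword]

    def has_self_closing_tags(self) -> bool:
        return self._self_closing

    def is_target_attribute_allowed(self) -> bool:
        return self._target_allowed


def line_break(rules: RenderingRules) -> str:
    """Example caller: render a line break according to the rules."""
    return "<br />" if rules.has_self_closing_tags() else "<br>"
```

Callers never compare keyword strings themselves, so adding a new doctype would only mean adding one row to the table.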

Not sure how to deal with the entry "xhtml+rdfa_10" for the XHTML+RDFa 1.0 doctype. There's also RDFa support for HTML4 and HTML5 according to https://www.w3.org/TR/html-rdfa/.

So far the only thing that it does is output the correct doctype header. We could easily add it to the various doctype keywords. The API would then have a function to detect RDFa support.

What is meant by "doctype" here?

In extensions (like metaseo, see link in description), I want to extract if meta tags and html tags should be rendered according to HTML 4, XHTML, HTML 5, XHTML 5, etc.

How about that API to detect the rendering rules and features of the doctype?

I'll also discuss if changing the TS properties makes sense.

#12 Updated by Thomas Mayer over 3 years ago

Jigal van Hemert wrote:

What about a completely different solution: most of the time it's only necessary to know about certain features or behaviour. Would it be a solution if there was an API to find out things like isTargetAttributeAllowed(), hasSelfClosingTags(), and of course a way to find the doctype whose rendering rules are followed. This way the decision logic is centralized and can be extended for additional doctypes.

Framework support would be the next step, consequently (I already suggested that). But it does not solve this issue which is about quirky use of configuration variables (naming!) and unclear meaning according to documentation. This issue does not deal with TYPO3's internal logic in the first place.

Framework support could also replace or enhance a deprecation concept for existing definition of configuration variables.

Framework support could also improve performance by internally caching the (boolean) variables behind the getters, so that a lot of string comparisons, together with a lot of redundant code, can be avoided.

Besides extensions, it could be used all over the core, allowing a lot of cleanup and improving maintainability/testability and maybe even performance.

However, getters like hasSelfClosingTags() don't answer the question of which meta tags are allowed, e.g. for HTML 5. For metaseo, we still need to know which set of meta tags we can provide for HTML 5, according to https://wiki.whatwg.org/wiki/MetaExtensions. Basically, that means we still need to know whether we have to render for HTML 5, so we would still require a function like isHtml5().
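As a rough sketch of how a check like isHtml5() could sit on top of the two existing settings: the Python below first looks at config.doctype and, if that holds a custom string rather than a known keyword, falls back to config.xhtmlDoctype, as the issue description requires users to set it in that case. The function names and the plain dict standing in for the TypoScript config are assumptions made for illustration; this is not TYPO3 code.

```python
from typing import Optional

# Keywords as listed in the issue description.
DOCTYPE_KEYWORDS = {
    "xhtml_trans", "xhtml_frames", "xhtml_strict", "xhtml_basic",
    "xhtml_11", "xhtml+rdfa_10", "html5", "none",
}
XHTML_DOCTYPE_KEYWORDS = {
    "xhtml_trans", "xhtml_frames", "xhtml_strict", "xhtml_basic", "xhtml_11",
}


def effective_doctype(config: dict) -> Optional[str]:
    """Return the keyword whose rendering rules apply, or None if unknown."""
    doctype = config.get("doctype", "").strip()
    if doctype in DOCTYPE_KEYWORDS:
        return doctype
    # config.doctype holds a custom string (or nothing): rely on xhtmlDoctype.
    xhtml_doctype = config.get("xhtmlDoctype", "").strip()
    if xhtml_doctype in XHTML_DOCTYPE_KEYWORDS:
        return xhtml_doctype
    return None


def is_html5(config: dict) -> bool:
    """Hypothetical counterpart of the isHtml5() getter mentioned above."""
    return effective_doctype(config) == "html5"
```

An extension could then choose its set of meta tags based on is_html5(config) without reading either setting directly.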

But yes, framework support goes in exactly the right direction. The next step would be (better) framework support for the rendering itself (e.g. in a lightweight manner, but using DOM-inspired OOP structures instead of string concatenation as the internal data structure). I can imagine such approaches have already been discussed for a long time. But that is another (big) issue, and it might also have some impact on performance and memory usage. In the long run, I also have concepts like extensible framework support for structured data in mind (e.g. https://developers.google.com/structured-data/rich-snippets/products#examples). But that's off-topic for this issue.

Not sure how to deal with the entry "xhtml+rdfa_10" for the XHTML+RDFa 1.0 doctype. There's also RDFa support for HTML4 and HTML5 according to https://www.w3.org/TR/html-rdfa/.

So far the only thing that it does is output the correct doctype header. We could easily add it to the various doctype keywords. The API would then have a function to detect RDFa support.

Yes, API/Framework support could provide that. Extensions could use it as long as the API's feature set allows it.

What is meant by "doctype" here?

In extensions (like metaseo, see link in description), I want to determine whether meta tags and HTML tags should be rendered according to HTML 4, XHTML, HTML 5, XHTML 5, etc.

How about that API to detect the rendering rules and features of the doctype?

Yes, yes. Go for it.

I'll also discuss if changing the TS properties makes sense.

In the end, it should be in the spirit of an extensible CMS to provide such an infrastructure (API/Framework support). From the perspective of an extension developer, I'd be very happy with that because I would not even have to care about original config variables and TYPO3's internal logic. For my use case (extract doctype and its features), that's already 100%. It should also go fine together with #40503 in case you want to use it for the core.

Still, I think the quirky configuration variables need to be cleaned up, together with documentation.

#13 Updated by Λάθε βιώσας over 3 years ago

When I read this issue and looked at my configuration, I had some thoughts.

I am running a TYPO3 installation (v7.6) that has been producing valid XHTML 5 output for quite some time. This is now possible with only a few tricks, using the following configuration:

# XHTML adjustments:
config.xmlprologue = <?xml version="1.0" encoding="utf-8"?>
config.htmlTag_setParams = xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sch="http://schema.org/" xmlns:og="http://ogp.me/ns#" xml:lang="de"

# Definition of the HTML standard:
config.doctype = html5

# Definition of the XHTML standard:
config.xhtmlDoctype = xhtml_11

What is interesting here is that I need

config.doctype = html5
to configure the generation of HTML correctly, and
config.xhtmlDoctype = xhtml_11
only to get the output as correct XML-based HTML.
Except for XHTML 2, which never went into production (although some of its concepts were quite revolutionary, like the universal href attribute or the universal h attribute), nearly all XHTML versions correspond to an HTML generation.
My long-standing idea is to use config.doctype for the HTML generation (4, 5, 6...) and a new config.renderingType (or similar) just for the rendering type (SGML or XHTML, default SGML). An additional config.customHeader could also be used for the different variations of XHTML 1 if still needed (to be defined as generation 4).
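To make the split between generation and rendering type concrete, here is a small Python sketch. Everything in it is hypothetical: config.renderingType does not exist, the accepted values ("html4", "html5", "SGML", "XHTML") are guesses at how the idea could be spelled, and the dict merely stands in for the TypoScript config.

```python
import re
from enum import Enum
from typing import NamedTuple


class Syntax(Enum):
    SGML = "SGML"
    XHTML = "XHTML"


class RenderingMode(NamedTuple):
    generation: int  # HTML generation: 4, 5, 6, ...
    syntax: Syntax   # how that generation is serialized


def parse_mode(config: dict) -> RenderingMode:
    """Read the generation from doctype and the syntax from renderingType."""
    match = re.fullmatch(r"html(\d+)", config.get("doctype", ""))
    if match is None:
        raise ValueError("doctype must look like 'html4' or 'html5'")
    # SGML is the default rendering type, as suggested in the comment above.
    syntax = Syntax(config.get("renderingType", "SGML").upper())
    return RenderingMode(int(match.group(1)), syntax)


def void_element(name: str, mode: RenderingMode) -> str:
    """Serialize an empty element according to the rendering type."""
    return f"<{name} />" if mode.syntax is Syntax.XHTML else f"<{name}>"
```

Under these assumptions, the XHTML 5 setup from the configuration above would be expressed as doctype = html5 plus renderingType = XHTML, with no second doctype keyword involved.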

#14 Updated by Benni Mack over 3 years ago

  • Target version changed from 8.0 to 8.2

#15 Updated by Benni Mack over 3 years ago

  • Target version changed from 8.2 to 8.3

#16 Updated by Benni Mack over 3 years ago

  • Target version changed from 8.3 to 8.4

#17 Updated by Benni Mack about 3 years ago

  • Target version changed from 8.4 to 8.5

#18 Updated by Benni Mack almost 3 years ago

  • Target version changed from 8.5 to 8.6

#19 Updated by Benni Mack almost 3 years ago

  • Target version changed from 8.6 to 8 LTS

#20 Updated by Benni Mack over 2 years ago

  • Target version changed from 8 LTS to Candidate for patchlevel

#21 Updated by Oliver Hader about 2 years ago

  • Category changed from 1050 to Frontend

#22 Updated by Susanne Moog 9 months ago

  • Tracker changed from Task to Feature
  • Status changed from New to Needs Feedback

@Jigal will you continue here?
