Project

General

Profile

Actions

Bug #86797

open

No field value tranformation/URL sanitation when using extbase extension custom field as a route path segment

Added by Michael Hitzler almost 6 years ago. Updated about 3 years ago.

Status:
New
Priority:
Should have
Assignee:
-
Category:
Site Handling, Site Sets & Routing
Target version:
-
Start date:
2018-10-30
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
9
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

When I try to use a custom field for routing in an extbase extension the field value does not get transformed to a proper URL segment (whitespace replace, special chars / replace / remove, lower case, etc.)

In order to reproduce you can use the tx_news extension and instead of using the slug field we use the title field.

In both cases, with the PersistedAliasMapper or the PersistedPatternMapper, the resulting path segment still contains whitespaces, special chars and uppercase letters which leads to invalid URLs.

    aspects:
      news_title:
        type: PersistedAliasMapper
        tableName: 'tx_news_domain_model_news'
        routeFieldName: 'path_segment'

or

        aspects:
            news_title:
                type: PersistedPatternMapper
                tableName: 'tx_news_domain_model_news'
                routeFieldPattern: '^(?P<title>.+)-(?P<uid>\d+)$'
                routeFieldResult: '{title}-{uid}'

Related issues 1 (1 open0 closed)

Related to TYPO3 Core - Bug #86747: Using PersistedPatternMapper has some drawbacksNew2018-10-26

Actions
Actions #1

Updated by Guido Schmechel almost 6 years ago

  • Related to Bug #86747: Using PersistedPatternMapper has some drawbacks added
Actions #2

Updated by Jochen Rieger almost 6 years ago

I would be great to have a config option which would URLize the string handed over, eg. from a common title field.

Actions #3

Updated by Jochen Rieger almost 6 years ago

I guess,

\TYPO3\CMS\Core\DataHandling\SlugHelper->sanitize($string)

could be quite helpful here.

Actions #4

Updated by Guido Schmechel almost 6 years ago

To use (and adapt) the sanitize fucntion is my plan for a first patch. I think the configuration part is a bigger topic and this would be more a topic for v10.

Actions #5

Updated by Jochen Rieger almost 6 years ago

Bigger topic, for sure. But I would definitely see it as a must have for a fully working core-based URL routing. Hence, something vor v9.

I'm not talking about the backend module and corresponding UI-based solution; that might well be a comfort-thing to be added in v10.

But providing integrators with the possibility to add a simple PersistedAliasMapper aspect to use the title of a custom record table – I think, that's mandatory for a comprehensive onboard URL routing.

I'm also digging into \TYPO3\CMS\Core\Routing\Aspect\PersistedAliasMapper to see if one could add the sanitize function somewhere there.

Actions #6

Updated by Guido Schmechel almost 6 years ago

Of course uyou can use a custom field:
:yaml:

routeEnhancers:
Blog:
type: Extbase
limitToPages: [13]
extension: BlogExample
plugin: Pi1
routes:
- { routePath: '/blog/{blogpost}', _controller: 'Blog::detail', _arguments: {'blogpost': 'post'} }
defaultController: 'Blog::detail'
aspects:
blogpost:
type: PersistedPatternMapper
tableName: 'tx_blogexample_domain_model_post'
routeFieldPattern: '^(?P&lt;title&gt;.+)-(?P&lt;uid&gt;\d+)$'
routeFieldResult: '{title}-{uid}'

see https://review.typo3.org/c/58384/72/typo3/sysext/core/Documentation/Changelog/master/Feature-86365-RoutingEnhancersAndAspects.rst

But same sanitize problem here. We need, in my option, an "AbstractAspect" which introduces sanitize like in the SlugHelper.

Actions #7

Updated by Jochen Rieger almost 6 years ago

Thanks for the clarification. I'm using the PersistedAliasMapper already for the custom field. But as you outlined as well, it'll just grab the title string and append it to the URL, eg.

/my-page-path/details/My Custom Title Of A Record

And this is the issue I'm referring to when say it is a must have yaml config option based functionality to be included in T3 v9's route / URL handling.

Actions #8

Updated by Jochen Rieger almost 6 years ago

So, adding a sanitizing option, eg. to

\TYPO3\CMS\Core\Routing\Aspect\PersistedAliasMapper->generate()

is the easy part.

The complex one will be the resolver. As far as I can tell up until now, there is no cache (db table or similar) which would store the sanitized string alongside it's "partner" GET parameters. That currently makes it nearly impossible to match and resolve, when using a non-slug field of a db table.

RealURL used the URL cache tables for that. I guess one would have to add such a caching mechanism, which indeed seems to make the whole feature much more complex. Yet, still something needed badly, I would say.

Actions #9

Updated by Michael Hitzler almost 6 years ago

That's exactly what I am afraid of. How can a resolver get the origial record / title used as path segment without any intermediate field / table? A dash can be a dash or an empty space, replacement of special chars are not unique, etc.

Actions #10

Updated by Oliver Hader almost 6 years ago

Actually PersistedPatternMapper was not meant to be used für arbitrary field values - title in the class comments is misleading here.
As you found out, generate is pretty simple since it mostly happens in PHP and reduces the character set (e.g. 'Hello "World"!' just gets 'hello-world'). But resolve is the interesting case.

The routing functionality in TYPO3 v9 does NOT use a cache for URLs and hopefully NEVER does - because using an URL cache like RealURL did implies that URLs have been generated before they can be resolved.

Assume the title value of record with uid=13 was "Hello World" then the URL:
  • results on /hello-world-13 as URL result
  • if title value gets changed later to "Hey new World" the URL would be /hey-new-world-13
  • the previous URL becomes invalid and results in a 404 result

Approach #1: Ignore title when resolving

  • routeFieldPattern (resolve) and routeFieldResult (generate) are independent from each other
  • using regular expression patterns in order to ignore the title part on resolving
  • thus title could be anything, only a numeric uid is required on resolving
aspects:
    news_title:
        type: PersistedPatternMapper
        tableName: 'tx_news_domain_model_news'
        routeFieldPattern: '^(?:.+)-(?P<uid>\d+)$'
        routeFieldResult: '{title}-{uid}'

routeFieldPattern just needs to ensure title is not captured in matches (not using ?P<title>). The following examples would be possible as well:

        routeFieldPattern: '(?P<uid>\d+)$'
        routeFieldPattern: '^[a-z0-9-]+-(?P<uid>\d+)$'

The sanitize part using SlugHelper::sanitize is missing and could be the patch for this issue.
Downside of this approach is that one gets duplicate content URLs whenever the title changes.

Using these arbitrary URLs somewhere in your rendering output or logic would lead to cache poisoning e.g. https://example.org/some-page/news/free-online-casino-xxx-advertisement-13 being in your side when URLs are not generated anymore but "reused" from e.g. REQUEST_URI.

Approach #2: Use persisted slug

similar to https://stackoverflow.com/questions/52880663/typo3-route-enhancers-converting-routefieldname-to-lowercase

  • path_segment is automatically normalized
  • it can be generated based on the title
  • but does not get updated automatically (since old URL would become invalid then without additional redirects handling)
aspects:
    news_title:
        type: PersistedPatternMapper
        tableName: 'tx_news_domain_model_news'
        routeFieldPattern: '^(?P<path_segment>.+)-(?P<uid>\d+)$'
        routeFieldResult: '{path_segment}-{uid}'

This is related to issue #86740 concerning normalizing slashes in slug fields (for pages that desired, for news probably not).

Documentation

Documentation has to be enhanced and updated accordingly.

Actions #11

Updated by Jochen Rieger almost 6 years ago

Thanks, Oliver, for the great and comprehensive guide through the challenges and actual options.

Using the pattern-approach and including the UID is a nice workaround, which I didn't catch at first glance. Good one! But of course, the maybe-duplicate-URL-topic might be a show stopper in some cases.

The slug-field approach worked very well during my first contact with RouteEnhancers, already. And I like how the slug field is being handled inside the editing forms. Especially, that you need to update it intentionally and manually after eg. a title change.

I still like RealURL's approach to using a cache table for resolving. But I understand, it's a complex topic anyway.

Thanks for, anyway, for the valuable additional information.

Actions #12

Updated by Michael Hitzler almost 6 years ago

Also big thanks to Oliver for clearifing things!

So to put it all together:
What can be patched is using the SlugHelper::sanitize here.

But general recommendation is to use a dedicated slug field for custom extbase extensions, if I get that right.

Actions #13

Updated by Jonas Eberle about 5 years ago

OK, so we will not (yet) make use of a cache (although I disagree with Oliver in that this does not make sense at all - it could be primed by iterating through a table and it has its use cases when dealing with non-editable data). But that's not the point here.

This patch would be only about a feature to allow URLs including a unique identifier which allows on-the-fly resolving.

I suggest a convention like

   routeFieldResult: '{title|sanitized}-{uid}'

This could be used together with

   routeFieldPattern: '-(?P<uid>\d+)$'

to have performance-friendly "simple" SEO URLs that would include the UID (or another unique identifier) of that table record like /some-fancy-title-221

I think there are plenty of use cases for such a solution.

Actions #14

Updated by Jochen Rieger about 5 years ago

@ Jonas:

Jonas, thanks for picking up on the topic. I fully agree, that would be a marvelous solution. Many systems work with an ID in the URL-part while more or less ignoring the "speaking" part.

It would be great to have such a feature in the URL routing options.

Actions #15

Updated by Jonas Eberle about 5 years ago

  • Subject changed from No field value tranformation when using extbase extension custom field as a route path segment to No field value tranformation/URL sanitation when using extbase extension custom field as a route path segment
  • PHP Version deleted (7.2)

As pointed out by Oli and Jochen, the same content would be also accessible via /another-fancy-title-221.

But the negative SEO-impact could be be circumvented with either the "correct" canonical tag (EXT:seo) or maybe even a redirection mechanism.

Actions #16

Updated by Jonas Eberle about 5 years ago

Canonical works out of the box already:

Accessing ..../another-fancy-title-3 yields
<link rel="canonical" href="..../my-fancy-title-3"/>

with the configuration given above :)

Actions #17

Updated by Gerrit Code Review about 5 years ago

  • Status changed from New to Under Review

Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/61044

Actions #18

Updated by Gerrit Code Review about 5 years ago

Patch set 2 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/61044

Actions #19

Updated by Gerrit Code Review about 5 years ago

Patch set 3 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/c/Packages/TYPO3.CMS/+/61044

Actions #20

Updated by Anonymous almost 5 years ago

I have tested the patch. It works fine. But if there is a slash in title it throws an error.
Slashes will not remove in TYPO3\CMS\Core\DataHandling\SlugHelper::sanitize.

Actions #21

Updated by B. Kausch over 4 years ago

Will this patch (working with slashes too!) land in the current LTS in the forseeable future?

Actions #22

Updated by Björn Jacob over 4 years ago

I think no. The patch is abandoned. The author and Benni agreed on putting this one in a separate extension. Furthermore, it was a feature. Features cannot be backported to current LTS versions. Features are master only. v10 is in feature freeze. Earliest version would be v11 though.

Actions #23

Updated by Björn Jacob over 4 years ago

  • Status changed from Under Review to New
Actions #24

Updated by Bernhard Eckl over 3 years ago

Hi,

I tried the patch, but there is a problem with special characters (ä, ö, ü). It gets sanitized, but when resolving the page I get a page not found error.
How can I solve this?

Actions #25

Updated by Markus Hofmann about 3 years ago

Bernhard Eckl wrote in #note-24:

Hi,

I tried the patch, but there is a problem with special characters (ä, ö, ü). It gets sanitized, but when resolving the page I get a page not found error.
How can I solve this?

Hi Bernhard, change your routeFieldPattern to:

routeFieldPattern: '^(.*)-(?P<uid>\d+)$'
Actions

Also available in: Atom PDF