Task #80014
closedRemove extbase ensureCorrectPostOrderingByCategoryTitle test
100%
Description
The test is based on the assumption that a relational database can easily sort across a M:N relation. This is not the case, sorting across M:N relations using a series of joins is an indeterministic operation at best and not supported by any standard compliant database.
The reason for this is that GROUP BY evaluation happens before ORDER BY. Under the assumption that a Post has two categories assigned the row which should be the result of the aggregation can not reliably be determined by the database and a random record based on the joined rows would have to be chosen. In one query this could be the Post joined to the first category, in another the Post that is joined to the second category.
As this is not deterministic this will lead to failures on most databases. MySQL has an operating mode which enables this random selection (which was enabled by default until MySQL 5.7.5) and as such would execute the query without a hard error.
To exemplify this assume the following join result: (before grouping and ordering)
post.uid | post.title | category.title -------------------------------------- 4 | Post 4 | Cat 2 3 | Post 3 | Cat 2 1 | Post 1 | Cat 1 2 | Post 2 | Cat 5 4 | Post 4 | Cat 3 4 | Post 4 | Cat 1
What would the correct reduction be after the grouping step?
First rows?
post.uid | post.title | category.title -------------------------------------- 4 | Post 4 | Cat 2 3 | Post 3 | Cat 2 1 | Post 1 | Cat 1 2 | Post 2 | Cat 5
Last rows?
post.uid | post.title | category.title -------------------------------------- 3 | Post 3 | Cat 2 1 | Post 1 | Cat 1 2 | Post 2 | Cat 5 4 | Post 4 | Cat 1
A random one?
post.uid | post.title | category.title -------------------------------------- 3 | Post 3 | Cat 2 1 | Post 1 | Cat 1 2 | Post 2 | Cat 5 4 | Post 4 | Cat 3
Given the uncertainties in this step the outcome of the order by could be that Post 4 is listed with Cat 1,2 or 3 and all would be "correct" by definition as there is no rule as to how to reduce this.
As this is a limitation of the database engines/the undefined expected outcome I would remove this test entirely.
If this works on a MySQL database that has the partial group by extension activated and the user can live with possibly random results give them the results, otherwise let the query fail with an appropriate message from the database:
MySQL:
TYPO3\CMS\Extbase\Persistence\Generic\Storage\Exception\SqlErrorException: Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'test_ftb66966d.sys_category.title' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
PostgreSQL:
TYPO3\CMS\Extbase\Persistence\Generic\Storage\Exception\SqlErrorException: SQLSTATE[42803]: Grouping error: 7 ERROR: column "sys_category.title" must appear in the GROUP BY clause or be used in an aggregate function
Updated by Gerrit Code Review over 7 years ago
- Status changed from New to Under Review
Patch set 1 for branch master of project Packages/TYPO3.CMS has been pushed to the review server.
It is available at https://review.typo3.org/51844
Updated by Christian Kuhn over 7 years ago
- Sprint Focus set to Stabilization Sprint
Updated by Morton Jonuschat over 7 years ago
- Status changed from Under Review to Resolved
- % Done changed from 0 to 100
Applied in changeset 8fad63d4726a205148ce9aa275f635071e5b1269.
Updated by Markus Klein over 7 years ago
For the record. The issue actually is that the doctrine migration for the MM part is again the reason and is therefore the 3rd regression to the migration patch!
The former distinct was changed to a groupby uid, which leaves the orderby out of scope and causes this problem.
The test itself is therefore correct, but the underlying extbase query parser is the problem.
Updated by Morton Jonuschat over 7 years ago
In this situation DISTINCT and GROUP BY are identical and the behavior after the fix is not a regression but by design of the standard. It only ever worked because ONLY_FULL_GROUP_BY was not enabled by default.
Quoting from https://dev.mysql.com/doc/refman/5.7/en/distinct-optimization.html¶
In most cases, a DISTINCT clause can be considered as a special case of GROUP BY. For example, the following two queries are equivalent:
SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const; SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3;
Quoting from https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html¶
In MySQL 5.7.5 and higher, ONLY_FULL_GROUP_BY also affects handling of queries that use DISTINCT and ORDER BY. Consider the case of a table t with three columns c1, c2, and c3 that contains these rows:
c1 c2 c3 1 2 A 3 4 B 1 2 C
Suppose that we execute the following query, expecting the results to be ordered by c3:
SELECT DISTINCT c1, c2 FROM t ORDER BY c3;
To order the result, duplicates must be eliminated first. But to do so, should we keep the first row or the third? This arbitrary choice influences the retained value of c3, which in turn influences ordering and makes it arbitrary as well. To prevent this problem, a query that has DISTINCT and ORDER BY is rejected as invalid if any ORDER BY expression does not satisfy at least one of these conditions:
- The expression is equal to one in the select list
- All columns referenced by the expression and belonging to the query's selected tables are elements of the select list
Updated by Christian Kuhn over 7 years ago
- Related to Feature #80380: Missing Distinct / GroupBy option in Query added
Updated by Riccardo De Contardi about 7 years ago
- Status changed from Resolved to Closed