Feature #104001: Data Hydration / Relation Resolving - TYPO3 Core - TYPO3 Forge

Feature #104001

open

Epic #103998: Improve handling of custom Content Types

Data Hydration / Relation Resolving

Added by Benni Mack 6 months ago. Updated about 1 month ago.

Status:

Accepted

Priority:

Should have

Assignee:

Category:

Content Rendering

Target version:

Candidate for Major Version

Start date:

2024-06-07

Description

With the new Record objects, we need an API to resolve relations to them that is powerful and useful both in the BE (Page Module / CE Preview) and in the FE. It really needs to be bullet-proof and performant.

----
Lazy-Eager Record fetching in TYPO3 v13 - A document by Benni in June 2024.

When TYPO3 fetches records for Frontend rendering, it usually takes a database row and checks, one by one, for language overlay and version overlay records. Relations then have to be attached and resolved manually.

The typical places where integrators and PHP developers are working on this are:
- DatabaseQueryProcessor
- ContentObjectRenderer->getRecords()
Some other places:
- Extbase does its own magic via DataMapper and LazyObjectStorage

However, we have made some huge steps forward in v13:
- Record API knows which fields are needed for a specific CType in tt_content
- Schema API knows which fields can contain relations to other records.
- Reference Index knows all relations to other records on a given PID.

My proposed solution is so-called "Lazy-Eager" processing.

An example is rendering tt_content records with relations to other tables (tt_content.accordion_items for a list of "accordion item" records, or "sys_file_reference" for tt_content.image).

1. Lazy means: We only fetch the database records for the inline relations (such as "accordion items") when they are accessed for the first time. Because we use the Schema API and Record API, we only need to do this for CTypes that have the field configured in their list of fields (showitem).

2. Eager means: We load ALL of the database records for a given PID at once. If we have 3 tt_content records of type accordion with 5 inline relations each, we load ALL inline relations of that PID on the first "access". We keep the 15 DB result arrays in memory, as 1 DB query returning 15 arrays is much faster than 3 separate DB queries.

3. We then utilize the Reference Index to find out which records we actually need for a given relation of the "first" content element that triggers the access. We also do this in a "bulk" way: We fetch all relations from sys_refindex with an IN query. The results then need to be clustered by field name (while keeping the sort order); the Reference Index is the place for this.
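
The lazy and eager parts can be illustrated with a small, language-neutral sketch. It is written in Python and is not TYPO3 code: `EagerPageLoader`, `LazyRelation`, the `parent` key and the in-memory `rows` list are all hypothetical stand-ins, and the "query" is only simulated by a counter. The Reference Index step is left out here.

```python
from collections import defaultdict


class EagerPageLoader:
    """Loads all inline children of a page in one query, on first access only."""

    def __init__(self, rows):
        # `rows` stands in for the database table of inline child records.
        self._rows = rows
        self._by_pid = {}      # pid -> {parent uid -> [child rows]}
        self.query_count = 0   # number of simulated DB queries

    def _fetch_all_for_pid(self, pid):
        # Simulates ONE query: SELECT * FROM child_table WHERE pid = :pid
        self.query_count += 1
        grouped = defaultdict(list)
        for row in self._rows:
            if row["pid"] == pid:
                grouped[row["parent"]].append(row)
        return grouped

    def children_of(self, pid, parent_uid):
        if pid not in self._by_pid:   # eager: load the whole page once
            self._by_pid[pid] = self._fetch_all_for_pid(pid)
        return self._by_pid[pid].get(parent_uid, [])


class LazyRelation:
    """Proxy for a relation field; resolves only when first iterated."""

    def __init__(self, loader, pid, parent_uid):
        self._loader, self._pid, self._parent_uid = loader, pid, parent_uid
        self._items = None

    def __iter__(self):
        if self._items is None:       # lazy: nothing is fetched until now
            self._items = self._loader.children_of(self._pid, self._parent_uid)
        return iter(self._items)


# 3 accordion elements (uids 1..3) on page 10, each with 5 inline items.
rows = [{"uid": 100 + p * 10 + i, "pid": 10, "parent": p, "title": f"Item {p}.{i}"}
        for p in (1, 2, 3) for i in range(5)]
loader = EagerPageLoader(rows)
accordions = {uid: LazyRelation(loader, 10, uid) for uid in (1, 2, 3)}

queries_before_access = loader.query_count   # nothing has been accessed yet
item_counts = [len(list(accordions[uid])) for uid in (1, 2, 3)]
```

Iterating the first accordion triggers the single simulated query for page 10; the other two accordions are then served from memory.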

In code, Step 3 effectively comes before Step 2: we fetch all relation information before we fetch the full records. This leaves us with the following SQL queries per page (default language, live workspace):

- Routing: Fetch current page
- Routing: Fetch rootline elements
- Content Elements: Fetch all tt_content elements of the current selected page grouped by colPos
- Relations: Query sys_refindex for all used records with an IN query
- Record Query (once per DB table): Fetch all records of a DB table with an IN query

Ideally, we have a maximum of 5 DB queries per page on a non-cached page (+1 query per "inline DB table"). In addition, we have links to other pages that need to be resolved.
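
The last two query types can be sketched against an in-memory SQLite database. This is only an illustration under assumptions: the sys_refindex table is reduced to a handful of columns, the `accordion_item` table and all data are made up, and language and workspace handling is ignored.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sys_refindex (tablename TEXT, recuid INT, field TEXT,
                               sorting INT, ref_table TEXT, ref_uid INT);
    CREATE TABLE accordion_item (uid INT PRIMARY KEY, title TEXT);
    CREATE TABLE sys_file_reference (uid INT PRIMARY KEY, title TEXT);
    INSERT INTO sys_refindex VALUES
        ('tt_content', 1, 'accordion_items', 0, 'accordion_item', 11),
        ('tt_content', 1, 'accordion_items', 1, 'accordion_item', 12),
        ('tt_content', 2, 'image', 0, 'sys_file_reference', 21),
        ('tt_content', 3, 'accordion_items', 0, 'accordion_item', 13);
    INSERT INTO accordion_item VALUES (11, 'A'), (12, 'B'), (13, 'C'), (14, 'unused');
    INSERT INTO sys_file_reference VALUES (21, 'photo');
""")


def placeholders(values):
    """Builds the '?, ?, ?' list for a parameterized IN clause."""
    return ", ".join("?" for _ in values)


content_uids = [1, 2, 3]   # uids of the tt_content rows already fetched for the page
query_count = 0

# Relations: all relations of all content elements with a single IN query.
relations = db.execute(
    "SELECT ref_table, ref_uid FROM sys_refindex "
    f"WHERE tablename = 'tt_content' AND recuid IN ({placeholders(content_uids)}) "
    "ORDER BY recuid, field, sorting",
    content_uids,
).fetchall()
query_count += 1

uids_per_table = defaultdict(list)
for ref_table, ref_uid in relations:
    uids_per_table[ref_table].append(ref_uid)

# Record Query: one further IN query per related table.
records = {}
for table, uids in uids_per_table.items():
    # The table name comes from our own refindex rows; never interpolate user input.
    rows = db.execute(
        f"SELECT uid, title FROM {table} WHERE uid IN ({placeholders(uids)})", uids
    ).fetchall()
    query_count += 1
    records[table] = dict(rows)
```

With two related tables this adds up to three queries for all relations on the page, regardless of how many content elements reference them.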

We have a few places to build and separate responsibility:

All of them need the Context object, as this is the one place that we can modify.

1. RecordFactory [already there] -> creates Record objects out of DB rows
-> In relational fields we need a proxy object to handle the lazy resolving of the related or inline records

2. RelationResolver
-> Find all relations (as objects) from a UID, Field and Table Name
-> Queries the refindex for the relations, and returns the needed UIDs for this in a grouped way
-> has a runtime cache because it should only query the sys_refindex DB table once

3. RecordQueryBuilder
-> High-Level: Give me all records (objects) for a relation field from a record.
-> Does the "eager" part by fetching all records of a PID
-> Uses the RelationResolver to only filter the UIDs needed from the list of all PIDs
-> has a special handling for File (we need a better JOIN query here)
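
How the RelationResolver and the RecordQueryBuilder could work together is sketched below in Python. This is not the proposed PHP API: the method names, the `pid` key on the refindex rows and the in-memory lists standing in for DB tables are assumptions made for illustration only, and the special handling for File is not covered.

```python
from collections import defaultdict


class RelationResolver:
    """Returns related UIDs per (table, uid, field); reads the refindex once per page."""

    def __init__(self, refindex_rows):
        self._refindex_rows = refindex_rows   # stand-in for the sys_refindex table
        self._cache = {}                      # runtime cache: pid -> clustered relations
        self.query_count = 0

    def _load(self, pid):
        self.query_count += 1                 # simulates one bulk IN query
        clustered = defaultdict(list)
        rows = [r for r in self._refindex_rows if r["pid"] == pid]
        for r in sorted(rows, key=lambda r: r["sorting"]):   # keep the sort order
            key = (r["tablename"], r["recuid"], r["field"])
            clustered[key].append((r["ref_table"], r["ref_uid"]))
        return clustered

    def resolve(self, pid, table, uid, field):
        if pid not in self._cache:
            self._cache[pid] = self._load(pid)
        return self._cache[pid].get((table, uid, field), [])


class RecordQueryBuilder:
    """Fetches all candidate records of a PID once, then filters by resolved UIDs."""

    def __init__(self, resolver, records_by_table):
        self._resolver = resolver
        self._records_by_table = records_by_table   # stand-in for the record tables
        self._page_cache = {}                       # (table, pid) -> {uid: row}

    def _records_of_pid(self, table, pid):
        key = (table, pid)
        if key not in self._page_cache:             # the "eager" part
            self._page_cache[key] = {
                row["uid"]: row
                for row in self._records_by_table[table] if row["pid"] == pid
            }
        return self._page_cache[key]

    def records_for(self, pid, table, uid, field):
        result = []
        for ref_table, ref_uid in self._resolver.resolve(pid, table, uid, field):
            row = self._records_of_pid(ref_table, pid).get(ref_uid)
            if row is not None:
                result.append(row)
        return result


def ref(recuid, sorting, ref_uid):
    """Builds one simplified refindex row for the accordion_items field."""
    return {"pid": 10, "tablename": "tt_content", "recuid": recuid,
            "field": "accordion_items", "sorting": sorting,
            "ref_table": "accordion_item", "ref_uid": ref_uid}


refindex = [ref(1, 1, 12), ref(1, 0, 11), ref(2, 0, 13)]
items = [{"uid": u, "pid": 10, "title": f"Item {u}"} for u in (11, 12, 13, 14)]

resolver = RelationResolver(refindex)
builder = RecordQueryBuilder(resolver, {"accordion_item": items})
first = [r["uid"] for r in builder.records_for(10, "tt_content", 1, "accordion_items")]
second = [r["uid"] for r in builder.records_for(10, "tt_content", 2, "accordion_items")]
```

The second call is answered entirely from the two runtime caches; item 14 is loaded with the page but never returned, because no relation points to it.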

In general, we can also evaluate whether we need one layer before this layer that can also deal with "flex" and "json" fields, but that probably belongs inside the RecordFactory.