Bug #95238

open

Metatags Keywords are not indexed by indexed_search

Added by Johannes Regner about 3 years ago. Updated almost 2 years ago.

Status: New
Priority: Should have
Assignee: -
Category: Indexed Search
Start date: 2021-09-16
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 11
PHP Version: 7.4
Tags: indexed_search
Complexity:
Is Regression:
Sprint Focus:

Description

Hi, I had the problem that indexed_search doesn't find any meta keywords in the search results.
I then debugged this part:
https://github.com/TYPO3/typo3/blob/master/typo3/sysext/indexed_search/Classes/Indexer.php#L400-L415

        if ($this->conf['index_metatags']) {
            $meta = [];
            $i = 0;
            while ($this->embracingTags($headPart, 'meta', $dummy, $headPart, $meta[$i])) {
                $i++;
            }

            // @todo The code below stops at first unset tag. Is that correct?
            for ($i = 0; isset($meta[$i]); $i++) {
                // decode HTML entities, meta tag content needs to be encoded later
                $meta[$i] = GeneralUtility::get_tag_attributes($meta[$i], true);
                if (stripos($meta[$i]['name'], 'keywords') !== false) {
                    $contentArr['keywords'] .= ',' . $this->addSpacesToKeywordList($meta[$i]['content']);
                }
                if (stripos($meta[$i]['name'], 'description') !== false) {
                    $contentArr['description'] .= ',' . $meta[$i]['content'];
                }
            }
        }

The problem is that the while loop only finds the hreflang meta tag and then stops.
Maybe you can change this part as follows, using the new MetaTag API. It worked for me :)

        // Requires: use TYPO3\CMS\Core\MetaTag\MetaTagManagerRegistry;
        // get keywords and description metatags
        if ($this->conf['index_metatags']) {
            $registry = GeneralUtility::makeInstance(MetaTagManagerRegistry::class);
            // Get keywords
            $keywords = $registry->getManagerForProperty('keywords')->getProperty('keywords');
            if (!empty($keywords[0]['content'])) {
                $contentArr['keywords'] .= ',' . $this->addSpacesToKeywordList($keywords[0]['content']);
            }
            // Get description
            $pageDescription = $registry->getManagerForProperty('description')->getProperty('description');
            if (!empty($pageDescription[0]['content'])) {
                $contentArr['description'] .= ',' . $pageDescription[0]['content'];
            }
        }

Actions #1

Updated by Christian Hackl about 3 years ago

I have looked around a bit in the Indexer class:
the title comes from "indexedDocTitle" for whatever reason(?), and the rest of the meta tags come from the HTML content. But the HTML content is not rendered yet; it still contains the placeholders, something like "".
E.g. Indexer.php, line 319: $this->conf['content'];

At that point it is already clear that the indexer cannot parse out any meta tags...

Actions #2

Updated by Christian Hackl about 3 years ago

Workaround:
In your own "ext_localconf.php" write:

unset($GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['tslib/class.tslib_fe.php']['contentPostProc-cached']['indexed_search']);

Create a PSR-15 middleware and register it AFTER typo3/cms-frontend/tsfe.
Then, in the middleware's process() method, call something like:

public function process(ServerRequestInterface $request, RequestHandlerInterface $handler): ResponseInterface {
    // ...
    $tsfe = $GLOBALS['TSFE'];
    $TypoScriptFrontendHook = GeneralUtility::makeInstance(\TYPO3\CMS\IndexedSearch\Hook\TypoScriptFrontendHook::class);
    $TypoScriptFrontendHook->indexPageContent([], $tsfe);
    // ...
}

This solution does not take the "no_cache" condition into account.

If you want to respect "no_cache", write something like:

public function process(ServerRequestInterface $request, RequestHandlerInterface $handler): ResponseInterface {
    $response = $handler->handle($request);

    $tsfe = $GLOBALS['TSFE'];
    $TypoScriptFrontendHook = GeneralUtility::makeInstance(\TYPO3\CMS\IndexedSearch\Hook\TypoScriptFrontendHook::class);
    if(!$tsfe->no_cache) {
        $TypoScriptFrontendHook->indexPageContent([], $tsfe);
    }

    return $response;
}
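
The middleware from the workaround above still has to be registered so that it runs after typo3/cms-frontend/tsfe. A minimal sketch of such a registration, placed in Configuration/RequestMiddlewares.php of your own extension, could look like this. The identifier "vendor/my-ext/index-page-content" and the class name are hypothetical placeholders and not part of the original workaround:

```php
<?php
// Configuration/RequestMiddlewares.php (in your own extension)
return [
    'frontend' => [
        // Hypothetical identifier; choose one that matches your extension
        'vendor/my-ext/index-page-content' => [
            // Hypothetical class containing the process() method shown above
            'target' => \Vendor\MyExt\Middleware\IndexPageContent::class,
            // Place the middleware after the one that sets up TSFE
            'after' => [
                'typo3/cms-frontend/tsfe',
            ],
        ],
    ],
];
```

After adding or changing this file, the TYPO3 caches usually need to be flushed before the new middleware order takes effect.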

I just put something together, in case someone needs it:
https://github.com/Hauer-Heinrich/hh_indexed_search

Actions #3

Updated by B. Kausch almost 2 years ago

  • TYPO3 Version changed from 10 to 11

This is clearly a bug. Since switching to the Metatag API, the head content looks like this:

<head>

<meta charset="utf-8">
<!-- 
    This website is powered by TYPO3 - inspiring people to share!
    TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL.
    TYPO3 is copyright 1998-2023 of Kasper Skaarhoj. Extensions are copyright of their respective owners.
    Information and contribution at https://typo3.org/
-->

<!-- ###TITLEdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###METAdfaab768279a90e4d957fa20450f0d20### -->

<!-- ###CSS_LIBSdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###CSS_INCLUDEdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###CSS_INLINEdfaab768279a90e4d957fa20450f0d20### -->

<!-- ###JS_LIBSdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###JS_INCLUDEdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###JS_INLINEdfaab768279a90e4d957fa20450f0d20### -->

<!-- ###HEADERDATAdfaab768279a90e4d957fa20450f0d20### -->
</head>

No meta tags to be found...
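
This observation can be checked outside TYPO3 with a small standalone script. The sketch below is only an illustration, not the indexer's actual parsing code: it uses a shortened copy of the head content above and plain regular expressions (an assumption; the indexer itself uses embracingTags()) to count meta tags and those carrying a name attribute.

```php
<?php
// Shortened copy of the unrendered head content shown above.
$head = <<<'HTML'
<head>
<meta charset="utf-8">
<!-- ###TITLEdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###METAdfaab768279a90e4d957fa20450f0d20### -->
<!-- ###HEADERDATAdfaab768279a90e4d957fa20450f0d20### -->
</head>
HTML;

// Collect every <meta ...> tag. The ###META...### placeholder is an
// HTML comment, so it does not match.
preg_match_all('/<meta\b[^>]*>/i', $head, $matches);
$metaTags = $matches[0];

// Keep only tags with a name attribute (keywords, description, ...).
$namedTags = array_filter(
    $metaTags,
    static function (string $tag): bool {
        return preg_match('/\bname\s*=/i', $tag) === 1;
    }
);

echo count($metaTags) . " meta tag(s) found\n";
echo count($namedTags) . " with a name attribute\n";
```

Only the charset tag is matched, so there is nothing for a keywords or description lookup to find until the placeholders have been replaced.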
