Bug #80899

open

Epic #65815: Improve Indexed search indexer

indexed_search can't extract metadata

Added by Ian Solo over 7 years ago. Updated almost 2 years ago.

Status: Needs Feedback
Priority: Should have
Assignee: -
Category: Indexed Search
Start date: 2017-04-19
Due date:
% Done: 0%
Estimated time:
TYPO3 Version: 8
PHP Version: 7.1
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

While indexing, indexed_search tries to extract metadata in the method

\TYPO3\CMS\IndexedSearch\Indexer::splitHTMLContent

but cannot, because at that point the HTML content still contains markers such as

<!-- ###META79deef79d064c0ac810f34ff70431fb0### -->
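To illustrate the failure mode, here is a minimal, self-contained sketch. It assumes the marker is a placeholder that is replaced with the real head tags only later in rendering. The function extractMetadata() and both HTML strings are hypothetical stand-ins for illustration, not the actual Indexer code or real page output.

```php
<?php
// Hypothetical stand-in for metadata extraction; NOT the actual
// \TYPO3\CMS\IndexedSearch\Indexer::splitHTMLContent implementation.
function extractMetadata(string $html): array
{
    $result = ['title' => '', 'description' => '', 'keywords' => ''];

    // Take the page title from the <title> tag, if one is present.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $result['title'] = trim($m[1]);
    }

    // Take description and keywords from <meta name="..." content="..."> tags.
    foreach (['description', 'keywords'] as $name) {
        $pattern = '/<meta\s+name="' . $name . '"\s+content="([^"]*)"/i';
        if (preg_match($pattern, $html, $m)) {
            $result[$name] = $m[1];
        }
    }

    return $result;
}

// What the indexer sees according to this report: a marker, no real tags.
$withMarker = '<html><head><!-- ###META79deef79d064c0ac810f34ff70431fb0### -->'
    . '</head><body>Text</body></html>';

// Assumed example of the same page after the marker has been replaced.
$rendered = '<html><head><title>Example page</title>'
    . '<meta name="description" content="An example"></head>'
    . '<body>Text</body></html>';

$fromMarker = extractMetadata($withMarker);
$fromRendered = extractMetadata($rendered);
```

With the marker still in place, every field in $fromMarker stays empty, while $fromRendered contains the title and description. Under the stated assumption, this matches the reported behavior: there is nothing to extract at the point where the content is split.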

Actions #1

Updated by Benni Mack over 7 years ago

  • Target version changed from 8 LTS to next-patchlevel
Actions #2

Updated by Benni Mack over 5 years ago

  • Target version changed from next-patchlevel to Candidate for patchlevel
Actions #3

Updated by Markus Mächler over 4 years ago

We worked around this bug using the following XClass:

Indexer.php

<?php
namespace Vendor\YourExt;

class Indexer extends \TYPO3\CMS\IndexedSearch\Indexer 
{
    /**
     * Works around the following bug: https://forge.typo3.org/issues/80899
     *
     * @param string $content
     *
     * @return array|string[]
     */
    public function splitHTMLContent($content)
    {
        $result = parent::splitHTMLContent($content);
        /** @var \TYPO3\CMS\Core\MetaTag\MetaTagManagerRegistry $metaTagManagerRegistry */
        $metaTagManagerRegistry = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\TYPO3\CMS\Core\MetaTag\MetaTagManagerRegistry::class);

        if (empty($result['title']) && $GLOBALS['TSFE'] instanceof \TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController) {
            /** @var \TYPO3\CMS\Frontend\Controller\TypoScriptFrontendController $tsfe */
            $tsfe = $GLOBALS['TSFE'];

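            // Prefer a non-empty SEO title; an empty seo_title must not block the fallback to the page title.
            if (!empty($tsfe->page['seo_title'])) {
                $result['title'] = $tsfe->page['seo_title'];
            } elseif (!empty($tsfe->page['title'])) {
                $result['title'] = $tsfe->page['title'];
            }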
        }

        if (empty($result['keywords'])) {
            $keywordsProperty = $metaTagManagerRegistry->getManagerForProperty('keywords')->getProperty('keywords');

            if (isset($keywordsProperty[0]['content'])) {
                $result['keywords'] = $keywordsProperty[0]['content'];
            }
        }

        if (empty($result['description'])) {
            $descriptionProperty = $metaTagManagerRegistry->getManagerForProperty('description')->getProperty('description');

            if (isset($descriptionProperty[0]['content'])) {
                $result['description'] = $descriptionProperty[0]['content'];
            }
        }

        return $result;
    }
}

ext_localconf.php

$GLOBALS['TYPO3_CONF_VARS']['SYS']['Objects'][\TYPO3\CMS\IndexedSearch\Indexer::class] = [
    'className' => \Vendor\YourExt\Indexer::class,
];
Actions #4

Updated by Riccardo De Contardi almost 4 years ago

  • Parent task set to #65815
Actions #5

Updated by Tomas Norre Mikkelsen about 2 years ago

Which metadata cannot be extracted? From PDFs, or is something else meant here?
Could you please add steps to reproduce?

Actions #6

Updated by Christian Kuhn almost 2 years ago

  • Status changed from New to Needs Feedback