Bug #99527

open

Epic #65815: Improve Indexed search indexer

indexed_search does not properly index XLSX and PPTX files

Added by Xavier Perseguers about 1 year ago. Updated about 1 year ago.

Status: In Progress
Priority: Should have
Category: Indexed Search
Target version: -
Start date: 2023-01-12
Due date:
% Done: 50%
Estimated time:
TYPO3 Version: 10
PHP Version:
Tags:
Complexity:
Is Regression:
Sprint Focus:

Description

While working on a client's installation, I found that indexed_search cannot properly index XLSX and PPTX files.

  • Problem for XLSX: the wrong "unzipped" file is indexed. That file contains a kind of metadata that is essentially only pointers (integers), so the extracted content is wrong and useless.
  • Problem for PPTX: only the content of Slide 1 is extracted; the content of all other slides is missed.
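
To illustrate both problems, here is a minimal sketch in Python. It is not the attached PHP patch and does not reflect how indexed_search is implemented; it only assumes the standard Office Open XML layout, in which an XLSX archive keeps its cell text in xl/sharedStrings.xml (worksheet cells mostly hold integer indices into that table, which is presumably what the "pointers" above are) and a PPTX archive keeps one ppt/slides/slideN.xml part per slide. The function names xlsx_text and pptx_text are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the Office Open XML formats.
SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
DRAWING_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def xlsx_text(data: bytes) -> list[str]:
    """Return the shared strings of an XLSX archive (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []  # workbook without any shared strings
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> is one string; rich text splits it into several <t> runs.
    return [
        "".join(t.text or "" for t in si.iter(SHEET_NS + "t"))
        for si in root.iter(SHEET_NS + "si")
    ]


def pptx_text(data: bytes) -> list[str]:
    """Return one text string per slide, for every slide (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        slides = [
            name for name in archive.namelist()
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
        ]
        # Sort numerically so that slide10.xml comes after slide2.xml.
        slides.sort(key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        texts = []
        for name in slides:
            root = ET.fromstring(archive.read(name))
            # Slide text lives in DrawingML <a:t> elements.
            texts.append(" ".join(t.text or "" for t in root.iter(DRAWING_NS + "t")))
    return texts
```

Both helpers take the raw bytes of the file, for example pptx_text(open("deck.pptx", "rb").read()), where deck.pptx is a placeholder file name. Reading only a worksheet part for XLSX, or only slide1.xml for PPTX, would reproduce the two symptoms described above.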

Files

G9-content-in-pptx-xlsx.patch (3.34 KB), Xavier Perseguers, 2023-01-12 14:24
Actions #1

Updated by Xavier Perseguers about 1 year ago

Suggested patch (applied locally via composer patch)
