Bug #20698

closed

Indexed Search on Windows does not index pdf files using xpdf (pdfinfo and pdftotext)

Added by Robert Wunsch over 15 years ago. Updated about 11 years ago.

Status:
Closed
Priority:
Should have
Assignee:
-
Category:
Indexed Search
Target version:
-
Start date:
2009-06-29
Due date:
% Done:

0%

Estimated time:
TYPO3 Version:
4.2
PHP Version:
Tags:
Complexity:
Is Regression:
No
Sprint Focus:

Description

I am using the TYPO3 4.2.3 WinInstaller (on WinXP) in my test environment, and Windows 2003 Server with IIS6 as the production environment.

I could not get indexed_search to index my PDF files.

After some research in the sources, I found that two problems were responsible:

1) indexed_search uses the file name in URL-encoded form, which makes it impossible to index files whose names contain whitespace (the whitespace is turned into "%20", so "x y.z" becomes "x%20y.z").

2) pdfinfo.exe: when 'exec($command,$output)' is used, the output is expected to be an array with one element per line of output. This seems to work on Linux, but the Windows version writes the whole output into the first element of the array.
As a result, the number of pages of the PDF document cannot be determined and the PDF file is not indexed.
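
A minimal sketch of how both problems could be worked around. This is not the code from the attached patch. The function names (decodeFilePath, normalizeExecOutput, extractPageCount) are hypothetical and do not exist in indexed_search. The sketch assumes that pdfinfo reports the page count on a line starting with "Pages:".

```php
<?php
// Hypothetical helpers for illustration only, not part of indexed_search.

/**
 * Problem 1: turn a URL-encoded path ("x%20y.z") back into a
 * file system path ("x y.z") before handing it to the PDF tools.
 */
function decodeFilePath($urlPath)
{
    // rawurldecode() decodes %XX sequences but leaves "+" untouched.
    return rawurldecode($urlPath);
}

/**
 * Problem 2: make sure every line of tool output is its own array
 * element, even if exec() put the whole output into element 0.
 */
function normalizeExecOutput(array $output)
{
    $lines = array();
    foreach ($output as $chunk) {
        // Split on Windows (\r\n), old Mac (\r) and Unix (\n) line breaks.
        foreach (preg_split('/\r\n|\r|\n/', (string)$chunk) as $line) {
            $lines[] = $line;
        }
    }
    return $lines;
}

/**
 * Read the page count from the pdfinfo output lines.
 * Returns 0 if no "Pages:" line is found.
 */
function extractPageCount(array $lines)
{
    foreach ($lines as $line) {
        if (preg_match('/^Pages:\s*(\d+)/', trim($line), $matches)) {
            return (int)$matches[1];
        }
    }
    return 0;
}
```

With these helpers, the array filled by exec() would be passed through normalizeExecOutput() before the page count is looked up. A path that contains spaces also has to be quoted when it is placed into the command string, for example with PHP's escapeshellarg().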

Steps to reproduce:

1) indexed_search is installed and functional
2) Path to PDF parsers [pdftools] = C:\TYPO3_4.2.3\xpdf (set the path to the pdftools, grant the required file permissions, and disable 'open_basedir')
3) Create a page with a link to a local PDF file
4) Open the page created above in the frontend so that it gets indexed (check in backend->info->indexed_search)

=> The page gets indexed, but the PDF does not

I have attached the patch I wrote for myself.

Only two files are changed:
- 'class.indexer.php'
- 'class.external_parser.php'

For each of these two files, the attachment includes:
- original.php (the original as of TYPO3 4.2.6)
- patched.php (the original with my patch applied)
- diff.txt (only the differences between the two files above)
- report.html (an HTML page showing both files side by side with the changes highlighted)

(issue imported from #M11448)


Files

indexed_search modified.7z (57.2 KB), Administrator Admin, 2009-06-29 19:54
indexed_Search.patch (2.69 KB), Administrator Admin, 2009-06-29 20:16

#1

Updated by Alexander Opitz over 11 years ago

  • Status changed from New to Needs Feedback
  • Target version deleted (0)
  • TYPO3 Version set to 4.2

This issue is very old. Does it still exist in newer versions of TYPO3 CMS (4.5 or 6.1)?

#2

Updated by Alexander Opitz about 11 years ago

  • Status changed from Needs Feedback to Closed
  • Is Regression set to No

No feedback for over 90 days.
