Bug #20698
Indexed Search on Windows does not index PDF files using xpdf (pdfinfo and pdftotext)
Status: Closed, 0% done
Description
I am using the TYPO3 4.2.3 WinInstaller (on Windows XP) as my test environment, and Windows Server 2003 with IIS 6 as my production environment.
I could not get indexed_search to index my PDF files.
After some digging in the source code, I found two problems responsible:
1) indexed_search uses the filename in URL-encoded form, which makes it impossible to index files with whitespace in their names: a file "x y.z" is looked up as "x%20y.z", because each space is encoded as "%20".
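The first problem can be reproduced with a short PHP sketch. It is not the attached patch: it assumes the filename arrives URL-encoded, as in a link's href, and shows that rawurldecode() restores the name that actually exists on disk, while escapeshellarg() keeps the space from splitting the path into two arguments when it is passed to pdfinfo/pdftotext. The file name and the command string are illustrative only.

```php
<?php
// Illustrative only: create a throwaway file whose name contains a space,
// like the "x y.z" example in the report.
$dir  = sys_get_temp_dir() . DIRECTORY_SEPARATOR;
$real = $dir . 'x y.pdf';
touch($real);

// Assumption: the name arrives URL-encoded, as it appears in the link's href.
$fromLink = 'x%20y.pdf';

// Checking the encoded name directly fails: there is no file called "x%20y.pdf".
echo 'encoded: ', file_exists($dir . $fromLink) ? 'found' : 'not found', PHP_EOL;

// rawurldecode() turns "%20" back into a space, restoring the real name.
$decoded = rawurldecode($fromLink);
echo 'decoded: ', file_exists($dir . $decoded) ? 'found' : 'not found', PHP_EOL;

// When the path is handed to pdfinfo/pdftotext, escapeshellarg() quotes it so
// the space does not split it into two command-line arguments.
$cmd = 'pdfinfo ' . escapeshellarg($dir . $decoded);
echo $cmd, PHP_EOL;

unlink($real); // clean up the throwaway file
```

Decoding has to happen before any filesystem check or shell call; quoting is still needed afterwards, because the decoded name now contains a literal space.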
2) pdfinfo.exe: when the output is read with 'exec($command, $output)', each line of output is expected to become a separate element of the $output array. This works on Linux, but with the Windows version the whole output ends up in the first element of the array.
As a result, the number of pages in the PDF document cannot be determined, and the PDF file is not indexed.
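The second problem can be worked around by splitting every element of $output on line breaks before searching for the page count. This is a sketch, not the attached patch: splitExecOutput() and pdfPageCount() are hypothetical helpers, and the Windows behaviour is simulated with a hardcoded string instead of a real pdfinfo.exe call.

```php
<?php
// Hypothetical helper: normalise the $output array filled by exec() so that
// each element is exactly one line, whether the lines arrived as separate
// elements (Linux) or as one element with embedded line breaks (as reported
// for pdfinfo.exe on Windows).
function splitExecOutput(array $output): array
{
    $lines = [];
    foreach ($output as $chunk) {
        // Split on \r\n, \r or \n to cover both Windows and Unix line endings.
        foreach (preg_split('/\r\n|\r|\n/', $chunk) as $line) {
            $lines[] = $line;
        }
    }
    return $lines;
}

// Hypothetical helper: read the page count from pdfinfo's "Pages:" line.
function pdfPageCount(array $output): int
{
    foreach (splitExecOutput($output) as $line) {
        if (preg_match('/^Pages:\s*(\d+)/', $line, $m)) {
            return (int) $m[1];
        }
    }
    return 0; // no "Pages:" line found
}

// Simulated Windows-style output: everything lands in the first element.
$windowsOutput = ["Producer:       Example\r\nPages:          12\r\nEncrypted:      no"];
echo pdfPageCount($windowsOutput), PHP_EOL; // 12
```

In real code the array would come from something like exec('pdfinfo ' . escapeshellarg($file), $output); normalising it first lets the same parsing work however the lines arrive.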
Steps to reproduce:
1) indexed_search is installed and working
2) Set "Path to PDF parsers" [pdftools] = C:\TYPO3_4.2.3\xpdf (set the path to the pdftools, set the correct file permissions, and disable 'open_basedir')
3) Create a page with a link to a local PDF file
4) Open the page created above; the page gets indexed (check in Backend > Info > Indexed search)
=> The page gets indexed, but the PDF does not.
I have attached the patch I wrote for myself.
Only two files are changed:
- 'class.indexer.php'
- 'class.external_parser.php'
For each of these files, the attachment includes:
- original.php (the original as of TYPO3 4.2.6)
- patched.php (the original with my patch applied)
- diff.txt (only the differences between the two files above)
- report.html (an HTML page showing both files side by side, with the changes highlighted)
(issue imported from #M11448)
Files