Languages
Version 1 (Ingo Renner, 2012-04-25 17:31)
h1. Languages

By default, the TYPO3 Solr extension comes with predefined @schema.xml@ files for several languages that Apache Solr supports directly.

When installing the Solr server, you can pass the install script a list of languages to install:
<pre>
sudo ./install-solr.sh german english french
</pre>

If you do not specify any languages, the script sets up an English-only Solr server.

Beware, though: although the install script can install the schemas for different languages, it does not configure Solr cores for them. Setting up a core per language is easy, however. Open @/opt/solr-tomcat/solr/solr.xml@, then copy and adjust the core definition as needed.

Example for German, English and French:
<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
    <cores adminPath="/admin/cores" shareSchema="true">
        <core name="core_en" instanceDir="typo3cores" schema="english/schema.xml" dataDir="data/core_en" />
        <core name="core_fr" instanceDir="typo3cores" schema="french/schema.xml" dataDir="data/core_fr" />
        <core name="core_de" instanceDir="typo3cores" schema="german/schema.xml" dataDir="data/core_de" />
    </cores>
</solr>
</pre>

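If you set up many languages, copying the core definition by hand gets repetitive. The following POSIX shell function is a minimal sketch that prints one @core@ element per language, following the naming pattern of the example above (@core_en@, @english/schema.xml@, @data/core_en@ and so on). The function name @solr_core_lines@ and the language-to-code mapping are assumptions for illustration, not part of the TYPO3 Solr extension. Only the three example languages are mapped, so extend the @case@ statement for any other languages you installed.

```shell
#!/bin/sh
# Hypothetical helper: print one <core> element per language name, using the
# naming pattern of the solr.xml example (core name core_<code>, schema
# <language>/schema.xml, data directory data/core_<code>).
solr_core_lines() {
    for lang in "$@"; do
        # Assumed mapping; only the languages from the example are listed.
        case "$lang" in
            english) code=en ;;
            french)  code=fr ;;
            german)  code=de ;;
            *) echo "unknown language: $lang" >&2; return 1 ;;
        esac
        printf '<core name="core_%s" instanceDir="typo3cores" schema="%s/schema.xml" dataDir="data/core_%s" />\n' \
            "$code" "$lang" "$code"
    done
}

# Example: print the core definitions for the three example languages.
solr_core_lines english french german
```

Paste the printed lines between the opening and closing @cores@ tags in @solr.xml@, then restart Solr (here running in Tomcat) so that the new cores are loaded.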
h2. Languages not provided by Solr

Basically, you can index any language with Solr, and basic searching works without stemming. For better search results, though, you will want a stemmer for your language so that a search also matches different variants of a word (for example "house" and "houses"). Although Solr already ships with stemmers for many languages, it does not provide one for every language out of the box.

Starting with version 3.5, Apache Solr comes with a stemmer that uses "Hunspell":http://hunspell.sourceforge.net and works with dictionaries and affix rules. Fortunately, the Hunspell stemmer can use the dictionaries and rules provided by the OpenOffice project.

The dictionaries and rules are available as "OpenOffice extensions":http://extensions.services.openoffice.org/dictionary and the extensions are simply zip files. Download the dictionary extension you need and unzip it. In the unpacked extension you will find an @.aff@ file and a @.dic@ file with the same base name; these are the two files the Hunspell stemmer needs.

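Finding and copying the matching file pairs can be scripted. The sketch below is a minimal POSIX shell function, assuming the extension has already been unzipped (for example with @unzip dict-sl.oxt -d dict-sl@, where the file name is hypothetical) and that the @.aff@ and @.dic@ files sit at the top level of the unpacked directory. Some extensions keep them in a subdirectory; in that case pass that subdirectory instead. The function name @copy_hunspell_pairs@ and the target directory are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: look for .aff/.dic pairs sharing the same base name at
# the top level of an unpacked dictionary extension, copy each pair into the
# given Solr configuration directory and print the base name.
# Returns a non-zero status if no complete pair was found.
copy_hunspell_pairs() {
    src="$1"
    dest="$2"
    found=0
    for aff in "$src"/*.aff; do
        [ -e "$aff" ] || continue      # the glob matched nothing
        base="${aff%.aff}"
        if [ -f "$base.dic" ]; then    # only copy complete pairs
            cp "$aff" "$base.dic" "$dest"/
            basename "$base"
            found=1
        fi
    done
    [ "$found" -eq 1 ]
}
```

For example, @copy_hunspell_pairs dict-sl /opt/solr-tomcat/solr/typo3cores/conf@ would copy the Slovenian files into the (assumed) shared configuration directory and print the base name to use in the filter definition.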
Example for a Slovenian Hunspell stemmer:
<pre>
<filter class="org.apache.solr.analysis.HunspellStemFilterFactory" dictionary="sl_SI.dic" affix="sl_SI.aff" />
</pre>

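A typo in the file names usually only shows up when Solr loads the core, so it can help to generate the filter element from files that actually exist. This hypothetical @hunspell_filter@ function is a sketch that takes a configuration directory and a dictionary base name, checks that both files are present and prints the element in the form shown above. Which directory Solr resolves the relative @dictionary@ and @affix@ paths against depends on your setup; the core's configuration directory is assumed here.

```shell
#!/bin/sh
# Hypothetical helper: verify that <base>.dic and <base>.aff exist in the
# given configuration directory, then print the matching Hunspell filter line.
hunspell_filter() {
    conf="$1"
    base="$2"
    for ext in dic aff; do
        if [ ! -f "$conf/$base.$ext" ]; then
            echo "missing $conf/$base.$ext" >&2
            return 1
        fi
    done
    printf '<filter class="org.apache.solr.analysis.HunspellStemFilterFactory" dictionary="%s.dic" affix="%s.aff" />\n' \
        "$base" "$base"
}
```

For example, @hunspell_filter /opt/solr-tomcat/solr/typo3cores/conf sl_SI@ prints the Slovenian filter line, ready to be added to the analyzer chain of the appropriate field type in @schema.xml@.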
Using these OpenOffice dictionaries, you get stemming support for many more languages.