Languages
By default the TYPO3 Solr extension comes with predefined schema.xml files for several languages which are directly supported by Apache Solr.
When installing the Solr server, the install script lets you provide a list of languages to install:
sudo ./install-solr.sh german english french
If you do not specify a list of languages, the script sets up an English-only Solr server.
Note that although the install script can install the different languages, it does not configure cores for them. Setting up cores for languages is straightforward, though: open /opt/solr-tomcat/solr/solr.xml, then copy the core definition and adjust it as needed.
Example for German, English, and French:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core_en" instanceDir="typo3cores" schema="english/schema.xml" dataDir="data/core_en" />
    <core name="core_fr" instanceDir="typo3cores" schema="french/schema.xml" dataDir="data/core_fr" />
    <core name="core_de" instanceDir="typo3cores" schema="german/schema.xml" dataDir="data/core_de" />
  </cores>
</solr>
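As a sketch of the copy-and-adjust step, an additional core for one more language might look like the line below. The core name core_es, the spanish/schema.xml path, and the data/core_es directory are assumptions that simply follow the naming pattern of the example above; check which schema directories your installation actually contains before using them.

```xml
<!-- Hypothetical extra core: name, schema path, and dataDir are assumed, not taken from the install script -->
<core name="core_es" instanceDir="typo3cores" schema="spanish/schema.xml" dataDir="data/core_es" />
```

The new element belongs inside the existing cores element, next to the other core definitions. Giving each core its own name and dataDir keeps the indexes of the different languages separate.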
Languages not provided by Solr
In principle you can index any language with Solr. To get better search results, though, you will want a stemmer for your language so that a search can match different variants of a word. Basic searching works without stemming, using the generic language schema provided since EXT:solr version 2.2. Although Solr already comes with stemmers for many languages, it does not provide a stemmer for every language out of the box.
Starting with version 3.5, Apache Solr comes with a stemmer that uses Hunspell. The Hunspell stemmer works with dictionaries and rules to do its job. Fortunately, it can use the dictionaries and rules provided by the OpenOffice project.
The dictionaries and rules are provided as OpenOffice extensions. Download the dictionary extension you need and unzip it; the extensions are plain zip files. In the unpacked extension you will find a .aff file and a .dic file with the same name. These two files are what the Hunspell stemmer needs.
Example for a Slovenian Hunspell stemmer:
<filter class="org.apache.solr.analysis.HunspellStemFilterFactory" dictionary="sl_SL.dic" affix="sl_SL.aff" />
Using these OpenOffice dictionaries, you get stemming support for many languages.
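To show where such a filter sits, here is a minimal sketch of a field type in schema.xml that uses the Hunspell stemmer. The field type name text_sl, the choice of tokenizer and lowercase filter, and the ignoreCase setting are assumptions for illustration, not part of the schemas shipped with the extension. The sketch also assumes that the .dic and .aff files have been copied into the core's conf directory so that Solr can resolve them by file name.

```xml
<!-- Hypothetical field type; the name and the analyzer chain are illustrative assumptions -->
<fieldType name="text_sl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Lowercase tokens before stemming -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Reduce tokens to their stems using the OpenOffice dictionary and affix rules -->
    <filter class="org.apache.solr.analysis.HunspellStemFilterFactory"
            dictionary="sl_SL.dic" affix="sl_SL.aff" ignoreCase="true" />
  </analyzer>
</fieldType>
```

Declaring a single analyzer without a type attribute applies the same chain at index time and at query time, so stemmed query terms line up with the stemmed terms in the index.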