Languages

Version 1 (Ingo Renner, 2012-04-25 17:31)

h1. Languages

By default, the TYPO3 Solr extension ships with predefined schema.xml files for several languages that Apache Solr supports directly.

When installing the Solr server, you can pass the install script a list of languages to set up:
<pre>
sudo ./install-solr.sh german english french
</pre>

If you do not specify any languages, the script sets up an English-only Solr server.

Note that although the install script installs the files for each language, it does not configure cores for them. Setting up the cores is straightforward, though: open @/opt/solr-tomcat/solr/solr.xml@, then copy the existing core definition and adjust it for each language you need.
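
As a rough guide to what needs adjusting, here is a single core definition with its attributes annotated. This is only a sketch: the Italian core is hypothetical and assumes the install script was run with @italian@ in its language list and placed the schema at @italian/schema.xml@. The core name @core_it@ and the data directory are likewise made-up names.
<pre>
<!--
	Hypothetical core for Italian; every value below is an assumption.
	name:        unique name of the core, which also appears in the core's URL
	instanceDir: directory holding the core's configuration
	schema:      language-specific schema file to load for this core
	dataDir:     where this core stores its index; use a different one per core
-->
<core name="core_it" instanceDir="typo3cores" schema="italian/schema.xml" dataDir="data/core_it" />
</pre>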

Example for German, English, and French:
<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
	<cores adminPath="/admin/cores" shareSchema="true">
		<core name="core_en" instanceDir="typo3cores" schema="english/schema.xml" dataDir="data/core_en" />
		<core name="core_fr" instanceDir="typo3cores" schema="french/schema.xml" dataDir="data/core_fr" />
		<core name="core_de" instanceDir="typo3cores" schema="german/schema.xml" dataDir="data/core_de" />
	</cores>
</solr>
</pre>

h2. Languages not provided by Solr

In principle, Solr can index any language, and basic searching works without stemming. For better search results, however, you will want a stemmer for your language so that a search also matches different "variants" of a word. Solr ships with stemmers for many languages, but not for every language.

Starting with version 3.5, Apache Solr includes a stemmer that uses "Hunspell":http://hunspell.sourceforge.net. The Hunspell stemmer relies on dictionaries and rules to do its job, and it can use the ones provided by the OpenOffice project.

The dictionaries and rules are distributed as "OpenOffice extensions":http://extensions.services.openoffice.org/dictionary. Download the dictionary extension you need and unzip it; the extensions are plain zip files. The unpacked extension contains an @.aff@ file and a @.dic@ file with the same base name. These two files are what the Hunspell stemmer needs.

Example for a Slovenian Hunspell stemmer:
<pre>
<filter class="org.apache.solr.analysis.HunspellStemFilterFactory" dictionary="sl_SL.dic" affix="sl_SL.aff" />
</pre>
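
In schema.xml, a filter like this belongs inside the analyzer chain of a text field type. The following is a minimal sketch and is not taken from the schemas shipped with the extension: the field type name @text_sl@ and the choice of tokenizer and lowercase filter are assumptions, and the dictionary files are assumed to sit in the core's conf directory. The file names must match those found in the unpacked extension.
<pre>
<fieldType name="text_sl" class="solr.TextField" positionIncrementGap="100">
	<analyzer>
		<!-- split the text into word tokens -->
		<tokenizer class="solr.StandardTokenizerFactory" />
		<!-- lowercase the tokens before stemming -->
		<filter class="solr.LowerCaseFilterFactory" />
		<!-- reduce tokens to their stems using the dictionary and affix rules -->
		<filter class="solr.HunspellStemFilterFactory" dictionary="sl_SL.dic" affix="sl_SL.aff" />
	</analyzer>
</fieldType>
</pre>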
38 1
39 1
Using these OpenOffice dictionaries you get stemming support for a lot of languages.