Tx solrindex » History » Revision 31
Stefan Sprenger, 2012-09-12 11:50
Add documentation for queue.pages.indexer.frontendDataHelper
tx_solr.index¶
- Table of contents
- tx_solr.index
- enablePageIndexing
- enableIndexingWhileBeUserLoggedIn
- files
- files.allowedTypes
- additionalFields
- fieldProcessingInstructions
- queue
- queue.[indexingConfiguration]
- queue.[indexingConfiguration].additionalWhereClause
- queue.[indexingConfiguration].additionalPageIds
- queue.[indexingConfiguration].table
- queue.[indexingConfiguration].initialization
- queue.[indexingConfiguration].indexer
- queue.[indexingConfiguration].indexingPriority
- queue.[indexingConfiguration].fields
- queue.[indexingConfiguration].attachments.fields
- queue.pages.indexer.authorization.username
- queue.pages.indexer.authorization.password
- queue.pages.indexer.frontendDataHelper.scheme
- queue.pages.indexer.frontendDataHelper.host
- queue.pages.indexer.frontendDataHelper.path
- Indexing Helpers
enablePageIndexing¶
Since: version:1.0
Removed: version:2.0 (enable / disable the Index Queue's pages indexing configuration instead)
Default: 1
Options: 0,1
Type: Boolean
En- / disables frontend page indexing.
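A minimal sketch of switching off frontend page indexing. The first setting applies to 1.x installations. For version:2.0 and later, the second setting shows the replacement named above, which disables the pages indexing configuration of the Index Queue (see queue.[indexingConfiguration]).

```typoscript
// version 1.x: turn off frontend page indexing
plugin.tx_solr.index.enablePageIndexing = 0

// version 2.0 and later: disable the pages Index Queue configuration instead
plugin.tx_solr.index.queue.pages = 0
```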
enableIndexingWhileBeUserLoggedIn¶
Since: version:1.0
Removed: version:2.0
Default: 1
Options: 0,1
Type: Boolean
Controls whether pages are indexed through the frontend while a backend editor is logged in and browsing the website. Set it to 0 to prevent such indexing.
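A minimal sketch for a 1.x installation, assuming you want to keep pages viewed by logged-in editors out of the index:

```typoscript
// version 1.x only: do not index pages while a backend editor is logged in
plugin.tx_solr.index.enableIndexingWhileBeUserLoggedIn = 0
```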
files¶
Since: version:2.0
Default: 1
Options: 0,1
Type: Boolean
En- / disables file indexing.
files.allowedTypes¶
Since: version:2.0
Default: *
Type: String
Defines which file types are allowed to be indexed. Accepts a list of comma-separated file type extensions. Defaults to all file types (*).
Example:
plugin.tx_solr.index.files.allowedTypes = doc, docx, pdf, odt
additionalFields¶
Since: version:1.0
Deprecated: version:2.0
Type: string, cObject (since version:1.1)
See: Dynamic Fields
A mapping of Solr field names to additional string values to be indexed with page documents. Use dynamic fields to index additional data; this way you don't have to modify the schema.xml.
Example:
plugin.tx_solr.index.additionalFields {
  myFirstAdditionalField_stringS = some string

  mySecondAdditionalField_stringS = TEXT
  mySecondAdditionalField_stringS {
    value = some other value that can be constructed using any TypoScript cObject
    case = upper
    // more processing here as needed
  }
}
Since version:1.1 you can use cObjects to generate the value of a field. The only thing to observe is that you must generate strings; other values may work but haven't been tested yet.
Deprecated since version:2.0. Please use the Index Queue indexing configurations instead, as they let you define more precisely which fields are indexed for which types of documents.
fieldProcessingInstructions¶
Since: version:1.2 version:2.0
Options: timestampToIsoDate, uppercase, pathToHierarchy (version:2.5-dkd), pageUidToHierarchy (version:2.5-dkd)
Type: cObject
Assigns processing instructions to Solr fields during indexing (syntax: Solr index field = processing instruction name). It is currently not possible to extend the field processor with your own processing instructions.
Before documents are sent to the Solr server, they are processed by the field processor service. At the moment you can convert a field's value to uppercase, convert a UNIX timestamp to an ISO date, or transform a path into a hierarchy for hierarchical facets (version:2.0 only). Only one processing instruction can be used at a time.
Example:
fieldProcessingInstructions {
  changed = timestampToIsoDate
  created = timestampToIsoDate
  endtime = timestampToIsoDate
}
queue¶
The Index Queue is a powerful feature introduced with version:2.0. It allows you to easily index any table in your TYPO3 installation by defining a mapping of SolrFieldName = DatabaseTableFieldNameOrContentObject. The table must be configured / described in TCA, though. To index other, external data sources, you might want to check out Solr's Data Import Handler (DIH).
The Index Queue comes preconfigured to index pages (enabled by default) and ships with an example configuration for tt_news (provided as a separate TypoScript template).
Since: version:2.0
Default: pages
Type: Array
Defines a set of table indexing configurations. By convention, the name of an indexing configuration is also the name of the table to index. You can give an indexing configuration a different name, though, by explicitly defining the table as a parameter within the indexing configuration. This is useful when indexing records from one table with different configurations, for example with different single view pages / URLs.
Example (indexing configuration for tt_news records):
plugin.tx_solr.index.queue {
  // enables indexing of tt_news records
  tt_news = 1
  tt_news {
    fields {
      abstract = short
      author = author
      description = short
      title = title

      // the special SOLR_CONTENT content object cleans HTML and RTE fields
      content = SOLR_CONTENT
      content {
        field = bodytext
      }

      // the special SOLR_RELATION content object resolves relations
      category_stringM = SOLR_RELATION
      category_stringM {
        localField = category
        multiValue = 1
      }

      // the special SOLR_MULTIVALUE content object indexes multi-value fields
      keywords = SOLR_MULTIVALUE
      keywords {
        field = keywords
      }

      // build the URL through typolink, make sure to use returnLast = url
      url = TEXT
      url {
        typolink.parameter = {$plugin.tt_news.singlePid}
        typolink.additionalParams = &tx_ttnews[tt_news]={field:uid}&L={field:__solr_index_language}
        typolink.additionalParams.insertData = 1
        typolink.returnLast = url
        typolink.useCacheHash = 1
      }

      sortAuthor_stringS = author
      sortTitle_stringS = title
    }
  }
}
queue.[indexingConfiguration]¶
Since: version:2.0
Default: pages
Type: Boolean, Array
An indexing configuration defines several parameters about how to index records of a table. By default the name of the indexing configuration is also the name of the table to index.
You can enable or disable an indexing configuration by setting plugin.tx_solr.index.queue.[indexingConfiguration] = 1 or 0.
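A minimal sketch of toggling indexing configurations. It assumes a tt_news configuration, such as the one shown in the queue example, is already defined:

```typoscript
// disable the default pages indexing configuration
plugin.tx_solr.index.queue.pages = 0

// enable the tt_news indexing configuration (its fields must be defined as well)
plugin.tx_solr.index.queue.tt_news = 1
```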
queue.[indexingConfiguration].additionalWhereClause¶
Since: version:2.0
Type: String
A WHERE clause that is used when initializing the Index Queue, limiting what goes into the Queue. Use this to limit records by page ID or the like.
Example:
// only index standard and mount pages, enabled for search
plugin.tx_solr.index.queue.pages.additionalWhereClause = doktype IN(1, 7) AND no_search = 0
queue.[indexingConfiguration].additionalPageIds¶
Since: version:2.0
Type: String
Defines additional pages to take into account when indexing records. This is especially useful for indexing DAM records, or when your news records are stored outside your site root in a shared folder used by multiple sites.
Additional page IDs can be provided as comma-separated list.
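A minimal sketch for news stored in shared folders outside the site root. The page IDs 45 and 46 are hypothetical:

```typoscript
// hypothetical page IDs of shared storage folders outside the site root
plugin.tx_solr.index.queue.tt_news.additionalPageIds = 45,46
```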
queue.[indexingConfiguration].table¶
Since: version:2.0
Type: String
Sometimes you may want to index records from a table with different configurations, for example to generate different single view URLs for tt_news records depending on their category or storage page ID. In these cases you can use a distinct name for each configuration and define the table explicitly.
plugin.tx_solr.index.queue.generalNews {
  table = tt_news
  fields.url = URL for the general news
  // more field configurations here ...
}

// extends the general news configuration
plugin.tx_solr.index.queue.pressAnnouncements < plugin.tx_solr.index.queue.generalNews
plugin.tx_solr.index.queue.pressAnnouncements {
  fields.url = overwriting URL for the press announcements
  // may overwrite or unset more settings from the general configuration
}

// completely different configuration
plugin.tx_solr.index.queue.productNews {
  table = tt_news
  fields.url = URL for the product news
}
queue.[indexingConfiguration].initialization¶
Since: version:2.0
Type: String
When the Index Queue is initialized through the search backend module, the queue determines which records need to be indexed. Usually the default initializer is sufficient for this purpose, but this option lets you define a class that initializes the Index Queue and adds records to it in special ways.
The extension uses this option to initialize the pages and, more specifically, to resolve Mount Page trees so they can be indexed too, even though they are only virtual pages.
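A minimal sketch of assigning a custom initializer. The table name and the class name are hypothetical. Which base class such an initializer has to extend is not documented here, so check the extension's own initializers before writing one:

```typoscript
plugin.tx_solr.index.queue.myIndexingConfiguration {
  table = tx_myextension_records
  // hypothetical initializer class, presumably loaded through TYPO3's autoloader
  initialization = tx_myextension_indexqueue_initializer_MyInitializer
}
```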
queue.[indexingConfiguration].indexer¶
Since: version:2.0
Type: String, Array
When you configure tables for indexing, a default indexer that comes with the extension is used. It resolves the mapping of Solr fields to database table fields as configured. However, in some cases you may reach the limits of TypoScript. When this happens, you can configure a specialized indexer using this setting.
The indexer class is loaded using TYPO3's auto loading mechanism, so make sure your class is registered properly. The indexer must extend tx_solr_indexqueue_Indexer.
Example, pages use a specialized indexer:
plugin.tx_solr.index.queue.pages {
  indexer = tx_solr_indexqueue_PageIndexer
  indexer {
    // add options for the indexer here
  }
}
Within the indexer configuration you can also define options for the specialized indexer. These are then available within the indexer class in $this->options.
Example, the TypoScript settings are available in PHP:
plugin.tx_solr.index.queue.myIndexingConfiguration {
  indexer = tx_myextension_indexqueue_MyIndexer
  indexer {
    someOption = x
    someOtherOption = y
  }
}
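class tx_myextension_indexqueue_MyIndexer extends tx_solr_indexqueue_Indexer {

    public function index(tx_solr_indexqueue_Item $item) {
        // $this->options holds the settings of the TypoScript "indexer" block,
        // e.g. $this->options['someOption'] is "x" in the example above
        if ($this->options['someOption']) {
            // custom indexing logic here ...
        }

        // hand the item over to the default indexing logic of the parent class
        return parent::index($item);
    }
}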
queue.[indexingConfiguration].indexingPriority¶
Since: version:2.2
Type: Integer
Default: 0
Defines the order in which Index Queue items of different kinds are indexed. Items with a higher priority are indexed first.
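A minimal sketch that makes pages get indexed before news items. The values 10 and 5 are arbitrary. Going by the description above, only their relative order should matter:

```typoscript
// higher priority is indexed first: pages (10) before tt_news (5)
plugin.tx_solr.index.queue.pages.indexingPriority = 10
plugin.tx_solr.index.queue.tt_news.indexingPriority = 5
```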
queue.[indexingConfiguration].fields¶
Since: version:2.0
Type: Array
Mapping of Solr field names on the left side to database table field names or content objects on the right side. You must provide at least the title, content, and url fields. TYPO3 system fields like uid, pid, crdate, tstamp and so on are added automatically by the indexer, depending on the TCA information of the table.
Example:
plugin.tx_solr.index.queue.[indexingConfiguration].fields {
  content = bodytext
  title = title
  url = TEXT
  url {
    typolink.parameter = {$plugin.tx_extensionkey.singlePid}
    typolink.additionalParams = &tx_extensionkey[record]={field:uid}
    typolink.additionalParams.insertData = 1
    typolink.returnLast = url
  }
}
queue.[indexingConfiguration].attachments.fields¶
Since: version:2.5-dkd
Type: String
Comma-separated list of fields that hold files. This setting tells the file indexer in which fields of a record to look for files to index.
Example:
plugin.tx_solr.index.queue.tt_news.attachments.fields = news_files
queue.pages.indexer.authorization.username¶
Since: version:2.0
Type: String
Specifies the username to use when indexing pages that are protected by .htaccess.
queue.pages.indexer.authorization.password¶
Since: version:2.0
Type: String
Specifies the password to use when indexing pages that are protected by .htaccess.
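A minimal sketch that sets both credentials for a site protected by .htaccess. The username and password are hypothetical placeholders:

```typoscript
// hypothetical credentials for a site protected by .htaccess
plugin.tx_solr.index.queue.pages.indexer.authorization {
  username = solrindexer
  password = secret
}
```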
queue.pages.indexer.frontendDataHelper.scheme¶
Since: version:2.0
Type: String
Specifies the scheme to use when indexing pages.
queue.pages.indexer.frontendDataHelper.host¶
Since: version:2.0
Type: String
Specifies the host to use when indexing pages.
queue.pages.indexer.frontendDataHelper.path¶
Since: version:2.0
Type: String
Specifies the path to use when indexing pages.
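A minimal sketch that sets all three frontendDataHelper options together. The values are hypothetical. This sketch assumes they combine into a base URL such as https://www.example.com/ for the page indexer's requests; the exact expected format (for example, whether the scheme includes "://") is not documented here:

```typoscript
// hypothetical values, assumed to combine into https://www.example.com/
plugin.tx_solr.index.queue.pages.indexer.frontendDataHelper {
  scheme = https
  host = www.example.com
  path = /
}
```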
Indexing Helpers¶
To make life even easier the Index Queue provides some indexing helpers. These helpers are content objects that perform cleanup tasks or content transformations.
SOLR_CONTENT¶
Since: version:2.0
Cleans a database field so that it can be used to fill a Solr document's content field. It removes HTML markup, JavaScript, and invalid UTF-8 characters.
The helper supports stdWrap on its configuration root.
Example:
content = SOLR_CONTENT
content {
  field = bodytext
}
Parameters:
value¶
Since: version:2.0
Type: String
Defines the content to clean up directly. With this parameter the value is hard-coded instead of being read from a database field.
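A minimal sketch of the value parameter with a hard-coded string. The indexing configuration name is hypothetical. Going by the description above, the markup in the value is stripped during cleanup:

```typoscript
plugin.tx_solr.index.queue.myIndexingConfiguration.fields {
  // hard-coded content instead of a database field; HTML markup is removed
  content = SOLR_CONTENT
  content.value = <p>Some <b>static</b> text</p>
}
```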
SOLR_MULTIVALUE¶
Since: version:2.0
Turns comma-separated strings into an array to be used in a multi-value field of a Solr document.
The helper supports stdWrap on its configuration root.
Example:
keywords = SOLR_MULTIVALUE
keywords {
  field = tags
  separator = ,
  removeEmptyValues = 1
}
Parameters:
value¶
Since: version:2.0
Type: String
Defines the content to split directly. With this parameter the value is hard-coded instead of being read from a database field.
separator¶
Since: version:2.0
Type: String
Default: ,
The separator by which to split the content.
removeEmptyValues¶
Since: version:2.0
Type: Boolean
Options: 0,1
Default: 1
By default the helper removes empty values from the resulting array. If, for some reason, you want to keep empty values, set this to 0.
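A minimal sketch that combines a custom separator with keeping empty values. The field names colors and colors_stringM are hypothetical:

```typoscript
plugin.tx_solr.index.queue.myIndexingConfiguration.fields {
  // hypothetical field storing values like "red|green||blue"
  colors_stringM = SOLR_MULTIVALUE
  colors_stringM {
    field = colors
    separator = |
    // keep the empty value between "green" and "blue"
    removeEmptyValues = 0
  }
}
```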
SOLR_RELATION¶
Since: version:2.0
Resolves relations between tables.
Example:
category_stringM = SOLR_RELATION
category_stringM {
  localField = category
  multiValue = 1
}
Parameters:
localField¶
Since: version:2.0
Type: String
Required
The field of the current record that holds the relation to the foreign table.
foreignLabelField¶
Since: version:2.0
Type: String
Usually the label field to retrieve from the related records is determined automatically using TCA. With this option you can specify the desired field explicitly.
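A minimal sketch that overrides the TCA label field. The foreign field name description is hypothetical and must exist in the related table:

```typoscript
category_stringM = SOLR_RELATION
category_stringM {
  localField = category
  // hypothetical: index this field of the related records instead of the TCA label
  foreignLabelField = description
  multiValue = 1
}
```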
multiValue¶
Since: version:2.0
Type: Boolean
Options: 0,1
Default: 0
Whether to return related records suitable for a multi-value field. If this is disabled, the related values are concatenated using the glue defined by singleValueGlue.
singleValueGlue¶
Since: version:2.0
Type: String
Default: |, |
When multiValue is not used, the related records need to be concatenated using a glue string; by default this is ", " (comma followed by space). With this option you can specify a custom glue. The custom value must be wrapped in pipe (|) characters so that it can have leading or trailing spaces.
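A minimal sketch of a single-value relation field with a custom glue. It joins the related labels with " / ". The pipes keep the surrounding spaces:

```typoscript
category_stringS = SOLR_RELATION
category_stringS {
  localField = category
  multiValue = 0
  // join related labels with " / " instead of the default ", "
  singleValueGlue = | / |
}
```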
relationTableSortingField¶
Since: version:2.2
Type: String
Field in an MM relation table by which to sort the related records, usually "sorting".
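A minimal sketch that assumes the category relation is stored in an MM table with a "sorting" column:

```typoscript
category_stringM = SOLR_RELATION
category_stringM {
  localField = category
  multiValue = 1
  // order related records by the MM table's sorting column
  relationTableSortingField = sorting
}
```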
Updated by Stefan Sprenger over 8 years ago · 31 revisions