After a quick foray into Sitecore Search, I decided to jot down a more thorough step-by-step of the things needed to make Sitecore Search work from start to finish. All this stuff is probably very well known by Sitecore veterans, but I wanted to get something down for somebody just starting to venture into Sitecore Search.
Note: This post will use Lucene as the Search Provider.
First things first – setting up the index
Before doing any kind of search, you have to make configuration files for the index. We will start with a very basic index, just enough to get search working. For that, you need two basic parts – first, you have to set up what items you are indexing:
<indexConfigurations>
  <MySearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <initializeOnAdd>true</initializeOnAdd>
    <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
    <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
      <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="_id" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="category" storageType="YES" indexType="TOKENIZED" vectorType="YES" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="tags" storageType="YES" indexType="TOKENIZED" vectorType="YES" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
      </fieldNames>
    </fieldMap>
    <fields hint="raw:AddComputedIndexField">
      <field fieldName="customcontent">MyLibrary.CustomContentField, MyLibrary</field>
    </fields>
    <include hint="list:IncludeTemplate">
      <NewsTemplateID>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</NewsTemplateID>
    </include>
    <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
    <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
    <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
    <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
  </MySearchConfiguration>
</indexConfigurations>
Let’s break this down into the different sections:
Starting from the indexConfigurations, we open up a new configuration node. You can name this whatever you want, but note the name of the node, so we can refer to it later.
<MySearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
Next come some options:
<indexAllFields>true</indexAllFields> <initializeOnAdd>true</initializeOnAdd>
Next comes the reference to the analyzer:
<analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
If you look in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file in the App_Config folder, you’ll find that it has default index settings for a bunch of the references that are needed. In your index, you can create your own (if you need your own analyzer) and refer to it here, or you can just refer directly to the one in the default configuration node. The ref attribute points directly to that node.
Next come all the different fields that should be indexed. Even though we have indexAllFields set to true as an option, what this will do is index all the text in a field called _content, so if you want separate fields to refer to and search on, you’ll need to add them here.
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
This is one field you have to have – this assigns a unique ID to the document in the index, so when the item in Sitecore gets updated, it doesn’t add a new document to the index – instead it just updates it. In some older versions, this is not needed.
<field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"> <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" /> </field>
You’ll need to choose the indexType and storageType for each field:
storageType = “YES” or “NO“: pretty straightforward – in the sense that the value of the field is either stored in the index, or not. This is useful for when you don’t want to go back to the database to retrieve the item for values that you want to display.
indexType = “TOKENIZED” or “UN-TOKENIZED” or “NO” or “NO_NORMS”
- TOKENIZED: Any phrases with multiple words will be split up
- UN-TOKENIZED: Phrases will be stored as a whole – the entire value of the field, essentially
- NO_NORMS: Like UN-TOKENIZED, the value is indexed as a whole without being analyzed. In addition, norms are not stored, which means index-time boost factors are not kept for the field.
- NO: The field value won’t be searchable, and the only reason to have this option is if you have storageType = “YES”, so you can retrieve the value.
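As a rough illustration of how the two attributes combine, here is a sketch of a field whose value is stored for display but never searched. The field name summary is hypothetical, and the remaining attributes simply mirror the field definitions used above, so treat it as a starting point rather than a tested config.

```xml
<!-- Hypothetical field "summary": stored so it can be read back from the index,
     but not indexed, so it cannot be searched on -->
<field fieldName="summary"
       storageType="YES"
       indexType="NO"
       vectorType="NO"
       boost="1f"
       type="System.String"
       settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
```

This element would sit inside the fieldNames node, next to the other field entries.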
If you have indexAllFields set to true, you don’t need to specify the fields – however, if you want to refer to the fields directly as members (for custom search result classes), they need to be added.
You can add fields by name, by type, or exclude them by name or by type (in our example, we added by name):
<fieldNames hint="raw:AddFieldByFieldName">
To add by type:
<fieldTypes hint="raw:AddFieldByFieldTypeName">
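For comparison, an entry added by type might look something like the sketch below. It is modeled on the entries in the default configuration file, and the fieldTypeName attribute and the checkbox value are assumptions to verify against your own version of that file.

```xml
<fieldTypes hint="raw:AddFieldByFieldTypeName">
  <!-- Assumed example: applies to every field whose Sitecore field type is "checkbox" -->
  <fieldType fieldTypeName="checkbox"
             storageType="NO"
             indexType="TOKENIZED"
             vectorType="NO"
             boost="1f"
             type="System.Boolean"
             settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
</fieldTypes>
```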
Include fields, or exclude fields:
<include hint="list:IncludeField"> <fieldId>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</fieldId> </include>
OR
<exclude hint="list:ExcludeField"> <fieldId>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</fieldId> </exclude>
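As far as I understand it, the include list only narrows things down when indexAllFields is set to false. With it set to true, the exclude list is the one that has an effect. A minimal sketch of the include variant, reusing the placeholder GUID from above (in a real config it would be the ID of the field you want indexed):

```xml
<!-- Assumption: with indexAllFields off, only the listed fields are indexed -->
<indexAllFields>false</indexAllFields>
<include hint="list:IncludeField">
  <!-- Placeholder GUID; replace with the ID of the field to index -->
  <fieldId>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</fieldId>
</include>
```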
The next section is computed fields. Computed fields are great for when a field value possibly points to another field, or multiple fields, and you have to derive a value based on some specific logic. They’re also useful if you want to have related items of some sort. You can calculate the related documents’ values on the fly and add them as a one-to-one field with the document. I’ll get into this in Part 2.
There are a bunch of fields that get added by Sitecore regardless of your config:
- _content
- _created
- _creator
- _database
- _datasource
- _displayname
- _editor
- _fullpath
- _group
- _indexname
- _language
- _latestversion
- _name
- _parent
- _path
- _template
- _templatename
- _updated
- _version
I’m calling out _latestversion because it will be important when you do the searches. When you have multiple versions of the same item, each version gets indexed as a separate document, so when you search, you have to make sure you get the latest one. This only really matters on the CM server for previewing, because the web database only ever has one version.
<fields hint="raw:AddComputedIndexField">
The next step is to add the templates you want to index. The node name doesn’t really matter – you can name it anything, just include the GUID of the template in the node.
<include hint="list:IncludeTemplate"> <NewsTemplateID>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</NewsTemplateID> </include>
Alternatively, you can choose to include all templates and add a directive to exclude the ones you don’t want to index.
<exclude hint="list:ExcludeTemplate"> <NewsTemplateID>{B179CB04-3ACC-4737-ADA0-B45D7E98C213}</NewsTemplateID> </exclude>
And then, you have to define some other values, such as the field readers, valueformatters, document property mappers, and builder type. You can point them all to the default config node.
<fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/> <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/> <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/> <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
Once you’ve set up what you are indexing, the next step is to define the actual index:
<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
  <indexes hint="list:AddIndex">
    <index id="my_index_name" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
      <param desc="name">$(id)</param>
      <param desc="folder">$(id)</param>
      <!-- This initializes index property store. Id has to be set to the index id -->
      <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
      <configuration ref="contentSearch/indexConfigurations/MySearchConfiguration" />
      <strategies hint="list:AddStrategy">
        <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
      </strategies>
      <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
        <policies hint="list:AddCommitPolicy">
          <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
        </policies>
      </commitPolicyExecutor>
      <locations hint="list:AddCrawler">
        <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
          <Database>master</Database>
          <Root>/sitecore/content/User Content/Site Level/Resources</Root>
        </crawler>
      </locations>
      <enableItemLanguageFallback>false</enableItemLanguageFallback>
      <enableFieldLanguageFallback>false</enableFieldLanguageFallback>
    </index>
  </indexes>
</configuration>
This starts out by naming the index:
<indexes hint="list:AddIndex"> <index id="my_index_name" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
Next come some requisite parameters for the index name and folder:
<param desc="name">$(id)</param> <param desc="folder">$(id)</param>
Next thing to note is the reference provided to what the index should store. Here is where we point the ref attribute to the index configuration we made earlier:
<configuration ref="contentSearch/indexConfigurations/MySearchConfiguration" />
Next is the add strategy section – this section defines how the indexes are updated, for both CM and CD. Essentially, it defines how and when indexes are updated when items are added or changed. For basic indexing, I’ve added rebuildAfterFullPublish, which will rebuild the index on all remote servers after a full publish.
<strategies hint="list:AddStrategy"> <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" /> </strategies>
For all the different index update strategies, go here: http://bit.ly/2h6FDCv
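To give a feel for how the choice can differ per environment, here is a sketch of one common arrangement: syncMaster for an index that crawls the master database and onPublishEndAsync for one that crawls web. This is an assumption about a typical setup, not something this post’s index uses, so check the strategy names against the indexUpdateStrategies node in your default configuration.

```xml
<!-- Index crawling the master database (CM): update as items are saved -->
<strategies hint="list:AddStrategy">
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
</strategies>

<!-- Index crawling the web database (CD): update incrementally after each publish -->
<strategies hint="list:AddStrategy">
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
```

Each strategies node belongs to its own index definition; the two are shown together here only for comparison.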
The next important part is the crawler. We will use the default crawler – there are many implementations of crawlers out there, and if you have a need to index your items in a very specific way, you can inherit from the default crawler and build upon it. In that case, your own crawler is the type you would specify here.
<locations hint="list:AddCrawler"> <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch"> <Database>master</Database> <Root>/sitecore/content/User Content/Site Level/Resources</Root> </crawler> </locations>
You also have to specify the root of the content tree where indexing will start. The crawler will traverse from there.
<Root>/sitecore/content/User Content/Site Level/Resources</Root>
Last but not least, you need to surround both of these with:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/"> <sitecore> <contentSearch>
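Put together, the overall shape of the file would look roughly like the skeleton below. It only uses the node names from the two sections above, with the details replaced by comments and the closing tags added.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <!-- Part 1: what gets indexed -->
      <indexConfigurations>
        <MySearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <!-- options, analyzer, fieldMap, computed fields, templates, default refs -->
        </MySearchConfiguration>
      </indexConfigurations>
      <!-- Part 2: the index itself -->
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="my_index_name" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <!-- params, configuration ref, strategies, commit policy, crawler -->
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```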
Once this is done, you can deploy and you should see your indexes in the /data/indexes folder. You can use SPE (Sitecore PowerShell Extensions) or a program like Luke Index Viewer to check the indexes and the fields being indexed.
Before you deploy to CM and CD, you must make sure that you follow the configuration file setup, as it lists a bunch of indexes that need to be disabled for CD. If they aren’t disabled, errors get thrown and interfere with your custom indexes. Go here for the configuration options: http://bit.ly/2fYtJ8y
In Part 2, we’ll get into the code on how to perform basic searches.