The sections below describe the dtSearch search index options available in Titan CMS Administration.
Stemming extends a search to grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply.
There are two ways to add stemming to your searches:
- Under Configurable Items, check the Stemming box to enable stemming for all of the words in a search request.
- If the Stemming box is unchecked, a user can add stemming to their request selectively by adding a tilde (~) at the end of words that they want stemmed in a search. Example: apply~
Stemming does not slow searches noticeably and is almost always helpful in making sure you find what you want.
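As an illustration of selective stemming, the sketch below appends a tilde to chosen words before a request is sent to the search. The function name `add_stemming` and its parameters are hypothetical helpers for this example only; they are not part of Titan CMS or dtSearch.

```python
def add_stemming(request, words_to_stem):
    """Append ~ to each word in the request that is listed in words_to_stem.

    Hypothetical helper: it only builds the request text, it does not search.
    """
    wanted = {w.lower() for w in words_to_stem}
    marked = []
    for word in request.split():
        # Skip words that already carry the stemming marker.
        if word.lower() in wanted and not word.endswith("~"):
            marked.append(word + "~")
        else:
            marked.append(word)
    return " ".join(marked)


if __name__ == "__main__":
    print(add_stemming("apply for fishing license", ["apply", "fishing"]))
```

With the Stemming box unchecked, only the marked words (apply~ and fishing~ in this request) would be stemmed.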
The stemming rules included with dtSearch are designed to work with the English language. These rules are in the file STEMMING.DAT (in multi-server installs, this will most likely be on the Titan Application server). If you need to implement stemming for a different language, or if you want to modify the English stemming rules that dtSearch uses, you can create a new set of stemming rules to be used in place of STEMMING.DAT. Contact Titan Support for more information about how this can be modified.
Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe.
There are two ways to add phonic searching to your searches:
- Under Configurable Items, check the Phonic Searching box to enable phonic searching for all of the words in a search request.
- If the Phonic Searching box is unchecked, a user can search for a word phonically by putting a number sign (#) in front of the word in their search request. Examples: #smith, #johnson
Enabling phonic searching affects all of the words in a given search request. Phonic searching is somewhat slower than other types of searching and tends to make searches over-inclusive, so it is usually better to use the # symbol to do phonic searches selectively.
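To give a feel for how "sounds like" matching can work, here is a minimal sketch of the classic Soundex algorithm. This is an assumption chosen for illustration: the text above does not say which phonetic algorithm dtSearch uses, and its actual matching may differ. Like the behavior described above, Soundex only matches words that begin with the same letter.

```python
# Soundex digit for each consonant; vowels, y, h and w have no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of a word (illustrative only)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            result += digit
        previous = digit  # a vowel resets the previous code
    return (result + "000")[:4]


def sounds_alike(a, b):
    """True when both words share the same Soundex code."""
    return soundex(a) == soundex(b)


if __name__ == "__main__":
    for name in ("Smith", "Smithe", "Smythe", "Jones"):
        print(name, soundex(name))
```

Under this scheme Smith, Smithe, and Smythe share one code, while Jones does not, which mirrors the example given for phonic searching.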
By default, dtSearch indexes numbers both as text and as numeric values, which is necessary for numeric range searching.
There are two ways to change how the search index handles numbers in the Titan configuration:
- Under Configurable Items, check the Index Numbers box to enable indexing of numerical values as both text and numeric values. If this box is not checked, any word that begins with a digit will not be indexed.
- In the Advanced Configuration XML, add a TextFlag node with the dtsoTfSkipNumericValues property to suppress indexing of numeric values in applications that do not require numeric range searching. This setting can reduce the size of the index by about 20%.
Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors, or for text that has been scanned using optical character recognition (OCR).
There are two ways to add fuzziness to searches:
- Set the Fuzzy Searching value to a number greater than 0. You can adjust the level of fuzziness from 1 to 10. The higher the level of fuzziness, the more differences dtSearch will permit when matching words, and the closer these differences can be to the start of the word.
- If the Fuzzy Searching value is zero, a user can add fuzziness selectively using the % character. The number of % characters added determines the number of differences dtSearch will ignore when searching for a word. The position of the % characters determines how many letters at the start of the word have to match exactly.
ba%nana – The word must begin with ba and have at most one difference between it and banana.
b%%anana – The word must begin with b and have at most two differences between it and banana.
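The % rules above can be sketched in a few lines. The code below reads the exact-match prefix and the allowed number of differences from a term, then compares a candidate word using Levenshtein edit distance. The use of edit distance is an assumption made for this sketch; the text above does not specify how dtSearch counts differences, and the function names are hypothetical.

```python
def parse_fuzzy_term(term):
    """Split 'ba%nana' into (word, exact_prefix_length, max_differences)."""
    max_differences = term.count("%")
    # Letters before the first % must match exactly.
    exact_prefix_length = term.index("%") if max_differences else len(term)
    return term.replace("%", ""), exact_prefix_length, max_differences


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate):
    """True if candidate satisfies the prefix and difference limits of term."""
    word, prefix_length, max_differences = parse_fuzzy_term(term)
    word, candidate = word.lower(), candidate.lower()
    if not candidate.startswith(word[:prefix_length]):
        return False
    return edit_distance(word, candidate) <= max_differences


if __name__ == "__main__":
    print(fuzzy_match("ba%nana", "bannana"))   # one extra letter after "ba"
    print(fuzzy_match("ba%nana", "bonana"))    # differs inside the "ba" prefix
    print(fuzzy_match("b%%anana", "bonanna"))  # two differences after "b"
```

Moving the % characters toward the start of the word shortens the exact prefix, which is why b%%anana accepts more variants than ba%nana.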
When configuring the default fuzziness, we recommend a value of 4 or less. If the value is set too high, searches become over-inclusive and the results are less relevant.
Search String Matching
dtSearch provides three ways to handle search requests: Boolean, All words, and Any words. Titan CMS exposes these as radio options on the Index configuration screen so that you can choose the most appropriate method. Boolean is the default syntax for a dtSearch search request; "All words" and "Any words" are alternatives to it.
Depending on the goal of your site search, we recommend "All words" or "Any words", because these behave more like the syntax used on some internet search engines. They eliminate the need for Boolean connectors and let users enter a search request as a list of words or quoted phrases.
Example: "first class mail" postage
- In an "All words" search, this would be equivalent to the Boolean search:
"first class mail" and postage
- In an "Any words" search, this would be equivalent to the Boolean search:
"first class mail" or postage
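The equivalence above can be sketched as a small conversion from a list of words and quoted phrases to a Boolean request. The function `to_boolean` and its `mode` values are hypothetical names for this illustration; dtSearch performs this interpretation itself when one of the alternative syntaxes is selected. The sketch keeps quoted phrases in quotes so each phrase stays a single term.

```python
import re

# A term is either a quoted phrase or a run of non-space characters.
_TERM = re.compile(r'"[^"]*"|\S+')


def to_boolean(request, mode):
    """Join the terms of a request with 'and' (mode='all') or 'or' (mode='any')."""
    connectors = {"all": " and ", "any": " or "}
    if mode not in connectors:
        raise ValueError("mode must be 'all' or 'any'")
    terms = _TERM.findall(request)
    return connectors[mode].join(terms)


if __name__ == "__main__":
    request = '"first class mail" postage'
    print(to_boolean(request, "all"))
    print(to_boolean(request, "any"))
```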
Max Documents to Return
dtSearch provides a way to limit the maximum number of search results that are returned for a request. Titan CMS exposes a text field on the Index Configuration screen so you can set this limit.
When Max Documents to Return is non-zero, it controls the maximum number of items that can be rendered after a search. The most relevant documents from all matching documents in the index will be included in the search results.
The Max Documents to Return value is used in conjunction with an Advanced Configuration value called AutoStopLimit, which determines the number of items dtSearch is allowed to find before ranking by relevance. For example, if you set Max Documents to Return = 10 and AutoStopLimit = 5000, then the search results will contain the 10 most relevant documents from the first 5000 found. Documents after the first 5000 found will not be considered, because AutoStopLimit = 5000 will force the search to halt after 5000 matching documents are found.
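The interaction between the two settings can be modeled in a few lines. This is a simplified simulation of the behavior described above, not dtSearch code: `top_results` and the `(document, relevance)` tuples are hypothetical, and the matches are assumed to arrive in the order the engine finds them.

```python
from itertools import islice


def top_results(matches, max_documents, auto_stop_limit):
    """Return the most relevant documents among the first matches found.

    matches: iterable of (document, relevance) in the order they are found.
    """
    # The search halts once auto_stop_limit matches have been found.
    considered = list(islice(matches, auto_stop_limit))
    # Only the matches found before the halt are ranked by relevance.
    ranked = sorted(considered, key=lambda match: match[1], reverse=True)
    return ranked[:max_documents]


if __name__ == "__main__":
    found = [("a", 10), ("b", 50), ("c", 30), ("d", 99)]
    print(top_results(found, max_documents=2, auto_stop_limit=3))
    print(top_results(found, max_documents=2, auto_stop_limit=4))
```

In the first call, document "d" is the most relevant match but is never considered, because the search stops after three matches. Raising the limit to four lets it appear at the top of the results.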
The default value configured in Titan is 200. We recommend considering a lower value, such as 20 to 50. This reduces the potential performance impact of rendering results in Titan, which has the added overhead of confirming the security settings of each item.
In addition, consider setting the AutoStopLimit value in the Advanced Configuration XML to a large number such as 20,000. This may seem excessive, but dtSearch retrieves these results extremely quickly, and a high limit helps ensure that the items rendered are the most relevant across all of your pages and files. Depending on how many pages and files make up your site, a value that is too low may exclude some of your content from the results.
Modifications to the Advanced Configuration XML should be made very carefully. Please refer to the help text in Titan before changing this field. For greater detail on the properties exposed through this raw XML configuration, refer to the Advanced Configuration Knowledge Base article.