Many site administrators and marketing professionals want rich, accurate search results for their site content. As a core feature, Titan CMS can integrate dtSearch as the primary search platform for a site. dtSearch is a full-text search engine used both for crawling websites, documents, and databases and for retrieving and presenting search results.
One of the key benefits of dtSearch is its broad range of programmatically configurable settings, which make it possible to fine-tune the tool's indexing and searching behavior for many situations. Titan CMS exposes these settings as XML data managed through the Titan Administration module.
The sections that follow define the advanced settings that can be configured for dtSearch within Titan CMS. In most cases, the naming convention for these attributes follows the naming in the dtSearch API. Note that dtSearch supports other attributes that are not exposed through the configuration in Titan CMS.
To modify these configuration options, work with the Raw dtSearch HTML Config text area. We recommend saving the current value first so that you can restore it if something goes wrong.
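The backup recommendation above can be paired with a quick well-formedness check before the edited XML is pasted back into the text area. The following is a minimal sketch in Python, assuming the text area contents have been copied into a string by hand. The function names and the backup file path are hypothetical and are not part of Titan CMS or dtSearch.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def backup_config(current_xml: str, backup_path: Path) -> None:
    """Save the current text area value so it can be restored later."""
    backup_path.write_text(current_xml, encoding="utf-8")


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

This only confirms that the XML parses. It does not confirm that Titan CMS will accept the element names or values.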
Searching & Indexing Options
These are the basic options supported by dtSearch that apply to both the indexing and the search process. These basic configuration values are defaulted in Titan CMS, but can be overridden by specifying different values through the advanced configuration XML. Examine the sample configuration help text to see where these values should be defined.
| Option | Description |
| --- | --- |
| AlphabetFile | Name of the dtSearch alphabet file to use when parsing text into words. |
| FuzzyChar | Character that enables fuzzy searching for a search term (default: "%"). |
| Hyphens | Controls the treatment of hyphens. See the Hyphen Settings options below. |
| IndexNumbers | If false, any word that begins with a digit will not be indexed. |
| MatchDigitChar | Wildcard character that matches a single digit (default: "="). |
| MaxStoredFieldSize | Maximum size of a single stored field. Stored fields are field data collected during indexing that is returned in search results. |
| MaxWordLength | Words longer than MaxWordLength are truncated when indexing. The default is 32. The maximum value is 128. |
| NoiseWordFile | List of noise words to skip during indexing (default: "noise.dat"). |
| PhonicChar | Character that enables phonic searching for a search term (default: "#"). |
| StemmingChar | Character that enables stemming for a search term (default: "~"). |
| StemmingRulesFile | Stemming rules for stemming searches (default: "stemming.dat"). |
| TextFieldsFile | Name of the file containing rules for extraction of field data from text files based on markers in the text. |
| XmlIgnoreTags | Comma-separated list of tags to ignore when indexing XML. |
| FieldFlags | Flags that control indexing of metadata. See Field Flags options below. |
| IndexingFlags | Flags that control the indexing job. See Indexing Flags options below. |
| TextFlags | Flags that control text-processing options. See Text Flags options below. |
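To make the MatchDigitChar description concrete, the sketch below translates a search pattern that uses the digit wildcard into a regular expression. It only illustrates the meaning "matches a single digit". It is not how dtSearch evaluates patterns, and the helper names are hypothetical.

```python
import re


def digit_wildcard_to_regex(pattern: str, match_digit_char: str = "=") -> "re.Pattern[str]":
    """Turn each wildcard character into a single-digit match; escape everything else."""
    parts = ["[0-9]" if ch == match_digit_char else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern: str, word: str, match_digit_char: str = "=") -> bool:
    """Return True if the whole word fits the pattern."""
    return digit_wildcard_to_regex(pattern, match_digit_char).fullmatch(word) is not None
```

With the default character, the pattern `N===` fits `N123` but not `N12a` or `N1234`, because each `=` stands for exactly one digit.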
Hyphen Settings
These are the options for the Hyphens node. The behavior of each is described below.
| Value | Behavior |
| --- | --- |
| Ignore | Index "first-class" as "firstclass". |
| Hyphen | Index "first-class" as "first-class". |
| Space | Index "first-class" as "first" and "class". |
| All | Index "first-class" all three ways. |
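The four hyphen modes can be sketched as a small function that returns the forms a hyphenated word would be indexed under. This is an illustration of the behavior described in the table, not dtSearch's own tokenizer, and the function name is hypothetical.

```python
from typing import List


def index_forms(word: str, hyphens: str) -> List[str]:
    """Return the indexed forms of a hyphenated word for a given Hyphens value."""
    joined = word.replace("-", "")                  # "firstclass"
    parts = [p for p in word.split("-") if p]       # ["first", "class"]
    forms = {
        "Ignore": [joined],
        "Hyphen": [word],
        "Space": parts,
        "All": [joined, word, *parts],
    }
    return forms[hyphens]
```

For example, `index_forms("first-class", "Space")` yields the two separate words, while `"All"` yields every form, which makes more queries match at the cost of a larger index.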
Field Flags
These are the Field Flags options that can be entered. Each item should be in a <FieldFlag> element inside the <FieldFlags> node.
| Flag | Description |
| --- | --- |
| dtsoFfSkipFilenameField | Do not generate a field named Filename containing the name of the file. |
| dtsoFfSkipDocumentProperties | Do not index or search document summary fields. |
| dtsoFfHtmlShowLinks | Make HTML links searchable. |
| dtsoFfHtmlShowImgSrc | Make HTML IMG src= attribute searchable. |
| dtsoFfHtmlShowComments | Make HTML comments searchable. |
| dtsoFfHtmlShowScripts | Make HTML scripts searchable. |
| dtsoFfHtmlShowStylesheets | Make HTML style sheets searchable. |
| dtsoFfHtmlShowMetatags | Make HTML meta tags searchable and visible, appended to the body of the HTML file. |
| dtsoFfHtmlNoHeaderFields | Suppress automatic generation of the HtmlTitle field for the title and the HtmlH1, HtmlH2, etc. fields for header content in HTML files. |
| dtsoFfOfficeSkipHiddenContent | Skip non-text streams in Office (Word, Excel, PowerPoint) documents. |
| dtsoFfXmlHideFieldNames | Do not index field names in XML files. |
| dtsoFfShowNtfsProperties | Make NTFS file properties searchable. |
| dtsoFfXmlSkipAttributes | Do not index attributes in XML files. |
| dtsoFfSkipFilenameFieldPath | Include only the filename (not the path) in the Filename field generated at the end of each document. |
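The nesting described above, one `<FieldFlag>` element per flag inside a `<FieldFlags>` node, can be generated rather than typed by hand. The sketch below assumes the flag name is the text content of each item element. Where the resulting node belongs in the full configuration is not shown here, so check the sample configuration help text. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_flags_node(container: str, item: str, flags: Iterable[str]) -> str:
    """Build <container><item>flag</item>...</container> and return it as a string."""
    node = ET.Element(container)
    for flag in flags:
        # Assumption: each flag name is the text content of its item element.
        ET.SubElement(node, item).text = flag
    return ET.tostring(node, encoding="unicode")


field_flags_xml = build_flags_node(
    "FieldFlags", "FieldFlag", ["dtsoFfHtmlShowLinks", "dtsoFfHtmlShowMetatags"]
)
```

The same helper can produce the other flag nodes by passing a different container and item name.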
Indexing Flags
These are the Indexing Flags options that can be entered. Each item should be in an <IndexingFlag> element inside the <IndexingFlags> node.
| Flag | Description |
| --- | --- |
| dtsAlwaysAdd | Index every document specified in the IndexJob, even if the document is already in the index with the same modification date and size. |
| dtsIndexCreateCaseSensitive | Create a case-sensitive index. The index will treat words with different capitalization as different words. (apple and Apple would be two different words.) |
| dtsIndexCreateAccentSensitive | Create an accent-sensitive index. |
| dtsIndexCreateRelativePaths | Use relative rather than absolute paths in storing document locations. |
| dtsIndexResumeUpdate | Resume an earlier index update that did not complete. (Version 7 indexes only.) |
| dtsIndexCacheText | Compress and store the text of documents in the index, for use in generating Search Reports and highlighting hits. (Version 7 indexes only.) |
| dtsIndexCacheOriginalFile | Compress and store documents in the index, for use in generating Search Reports and highlighting hits. (Version 7 indexes only.) |
| dtsIndexCacheTextWithoutFields | When text caching is enabled, do not cache any fields that were provided through the data source API (in DocFields). |
| dtsIndexKeepExistingDocIds | Preserve existing document ids following a compression of an index or a merge of two or more indexes (this flag is ignored during merges if the indexes being merged have overlapping ranges of document ids). |
| dtsIndexCreateVersion7 | Create an index using the version 7 index format. Version 7 indexes are created by default in versions after 7.0, so this flag is no longer needed. |
Text Flags
These are the Text Flags options that can be entered. Each item should be in a <TextFlag> element inside the <TextFlags> node.
| Flag | Description |
| --- | --- |
| dtsoTfSkipNumericValues | By default, dtSearch indexes numbers both as text and as numeric values, which is necessary for numeric range searching. Use this flag to suppress indexing of numeric values in applications that do not require numeric range searching. This setting can reduce the size of the index by about 20%. |
| dtsoTfSkipXFirstAndLast | Suppress automatic generation of xfirstword and xlastword. By default, xfirstword is defined to be the first word in each document, and xlastword is defined to be the last word in each document. These words are generated when an index is created, so this flag must be set during indexing to suppress xlastword and xfirstword. |
| dtsoTfRecognizeDates | Automatically recognize dates in text as it is indexed. |
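Because flag names are long and easy to mistype, a fragment can be checked against the names listed in this document before it is saved. The sketch below is one way to do that in Python. It assumes each flag name is the text content of its `<FieldFlag>`, `<IndexingFlag>`, or `<TextFlag>` element. The `<Sample>` wrapper element in the usage line is hypothetical and is not the real Titan CMS root element.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Flag names as listed in the Field Flags, Indexing Flags, and Text Flags tables.
KNOWN_FLAGS = {
    "FieldFlag": {
        "dtsoFfSkipFilenameField", "dtsoFfSkipDocumentProperties",
        "dtsoFfHtmlShowLinks", "dtsoFfHtmlShowImgSrc", "dtsoFfHtmlShowComments",
        "dtsoFfHtmlShowScripts", "dtsoFfHtmlShowStylesheets",
        "dtsoFfHtmlShowMetatags", "dtsoFfHtmlNoHeaderFields",
        "dtsoFfOfficeSkipHiddenContent", "dtsoFfXmlHideFieldNames",
        "dtsoFfShowNtfsProperties", "dtsoFfXmlSkipAttributes",
        "dtsoFfSkipFilenameFieldPath",
    },
    "IndexingFlag": {
        "dtsAlwaysAdd", "dtsIndexCreateCaseSensitive",
        "dtsIndexCreateAccentSensitive", "dtsIndexCreateRelativePaths",
        "dtsIndexResumeUpdate", "dtsIndexCacheText", "dtsIndexCacheOriginalFile",
        "dtsIndexCacheTextWithoutFields", "dtsIndexKeepExistingDocIds",
        "dtsIndexCreateVersion7",
    },
    "TextFlag": {
        "dtsoTfSkipNumericValues", "dtsoTfSkipXFirstAndLast", "dtsoTfRecognizeDates",
    },
}


def unknown_flags(xml_text: str) -> List[Tuple[str, str]]:
    """Return (element name, flag name) pairs whose flag name is not listed above."""
    root = ET.fromstring(xml_text)
    problems = []
    for tag, known in KNOWN_FLAGS.items():
        for element in root.iter(tag):
            name = (element.text or "").strip()
            if name not in known:
                problems.append((tag, name))
    return problems


sample = (
    "<Sample><FieldFlags><FieldFlag>dtsoFfHtmlShowLinks</FieldFlag></FieldFlags>"
    "<TextFlags><TextFlag>dtsoTfRecogniseDates</TextFlag></TextFlags></Sample>"
)
problems = unknown_flags(sample)
```

Here `problems` contains the misspelled `dtsoTfRecogniseDates` entry, while the valid field flag passes.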
Search Settings
These are the basic options supported by dtSearch that apply only to the search process. These basic configuration values are defaulted in Titan CMS, but can be overridden by specifying different values through the advanced configuration XML. Examine the sample configuration help text to see where these values should be defined.
| Option | Description |
| --- | --- |
| TimeoutSeconds | Set to a non-zero value to force the search to halt after a specified time. |
| AutoStopLimit | Make the search stop automatically once this many documents have been found. |
| MaxFilesToRetreive | Limit the maximum size of search results to a specified number of files. |
| SearchStemming | Enable stemming for all words in the search request. |
| SearchAutoTermWeight | Apply automatic term weighting to each term in the request. |
| SearchPositionalScoring | Rank documents higher when hits are closer to the top of the document and when hits are located close to each other within a document. This improves relevancy ranking for "all words" and "any words" searches. |
| SynopsisEnabled | A Boolean value indicating whether or not to produce the synopsis report. |
| SearchPhonic | Enable phonic searching for all words in the search request. |
| SearchFuzziness | If non-zero, the engine will match words that are close to but not identical to a search term. |
| SearchTypeAnyWords | Find any of the words in the search request. |
| SearchTypeAllWords | Find all of the words in the search request. |
| SynopsisHeader | Text to appear at the top of the report. |
| SynopsisFooter | Text to appear after the end of the report. |
| SynopsisNumberContextBlocks | Number of blocks of context to include in the report for each document. |
| SynopsisMaxWordsToRead | Number of words to scan in each document looking for blocks of context to include in the report. |
| SynopsisWordsOfContext | Number of words of context to include around each hit. |
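The three numeric synopsis settings interact: one limits how far into a document the scan goes, one limits how many blocks are kept, and one sets the size of each block. The sketch below illustrates that interaction only. It is not dtSearch's report generator, it assumes "words of context" means that many words on each side of a hit, and all names in it are hypothetical.

```python
from typing import List, Set


def synopsis_blocks(
    words: List[str],
    search_terms: Set[str],
    words_of_context: int,
    number_context_blocks: int,
    max_words_to_read: int,
) -> List[str]:
    """Collect context blocks around hits, honoring the three limits."""
    scanned = words[:max_words_to_read]          # stop scanning after this many words
    blocks: List[str] = []
    for i, word in enumerate(scanned):
        if len(blocks) >= number_context_blocks:  # enough blocks for this document
            break
        if word.lower() in search_terms:
            start = max(0, i - words_of_context)
            end = min(len(scanned), i + words_of_context + 1)
            blocks.append(" ".join(scanned[start:end]))
    return blocks


text = "the quick brown fox jumps over the lazy dog and the fox sleeps".split()
blocks = synopsis_blocks(text, {"fox"}, 1, 2, 20)
```

Lowering the scan limit or the block count trims the result: with a scan limit of 8 words, only the first occurrence of "fox" is reached.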