Managing Full Text Search Indexes

Contents

Introduction
Settings
History and Statistics
Indexable File Formats

Introduction

Full Text Search indexes are the critical foundation of the Full Text Search feature. These indexes are not the same as traditional database table indexes, and are actually stored outside the database in a file share. These indexes do not update in real time when data changes, and must instead be created and updated via Full Text Indexing Jobs.

To manage Full Text Search indexes, click the Full Text Search Management link in the Special Functions section of the Manage page, then click the Manage Full Text Indexes link. Since Full Text Index records are actually subcomponent records, they may also be accessed via the Component Management link. You can limit who can modify the indexes by specifying ownership. For more information on planning and installing the Full Text Search software and infrastructure, refer to the Installation Guide for CATSWeb Full Text Search.

Settings

Full Text Search Index records have a variety of settings that are divided into sections. A setting or section of settings may or may not apply to a particular index, as determined by the Index Type. Non-applicable settings are typically either hidden or made invisible at runtime. The settings are:

  • Accent Sensitive
  • Binary Files
  • Case Sensitive
  • Component Name
  • Data Link
  • Description
  • Excluded Files
  • In Search Form
  • Include Filters
  • Index Name
  • Index Type
  • Physical Path to Files
  • Process As
  • Search Provider
  • SQL Index Restriction
  • Subcomponent Type
  • Virtual Path to Files

Accent Sensitive - Determines whether or not accents on characters are treated as significant during searches. Primarily applies to data in languages such as Spanish, French, etc. which make use of accents.

Binary Files - This setting applies to CATSWeb File Attachment (live and archived) indexes, Files Indexes, and Custom Data Indexes that may process file data via Binary Large Object (BLOB) fields. When the indexing engine processes file data, it analyzes the file format to determine the file type (MS Word document, PDF document, Excel Spreadsheet, etc.). If it is a known indexable type, the file is indexed. If it is an unknown type, it is considered to be a Binary File. This setting determines how such Binary Files are indexed:
  • Do Not Index - The binary file is ignored and not indexed.
  • Filter Text - A filtering algorithm is applied to the binary contents to extract data that appears to be text, which is then indexed. The remaining data is not indexed.
  • Index As Plain Text - All data in the binary file is considered to be text and indexed as-is.
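
The three options can be illustrated with a minimal Python sketch. It is not the CATSWeb Indexing Server's filtering algorithm, which is not documented here. The function name, the rule that a "text-like" run is four or more printable ASCII characters, and the use of Latin-1 for plain-text decoding are all assumptions made for illustration.

```python
import re

# Assumption: a run of 4 or more printable ASCII bytes "appears to be text".
_TEXT_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def text_for_indexing(data: bytes, mode: str) -> str:
    """Return the text that would be indexed for an unrecognized (binary) file."""
    if mode == "Do Not Index":
        # The file contributes nothing to the index.
        return ""
    if mode == "Filter Text":
        # Keep only the text-like runs; everything else is discarded.
        runs = _TEXT_RUN.findall(data)
        return " ".join(run.decode("ascii") for run in runs)
    if mode == "Index As Plain Text":
        # Keep every byte as-is; Latin-1 maps each byte to exactly one character.
        return data.decode("latin-1")
    raise ValueError(f"unknown Binary Files mode: {mode!r}")
```

In this sketch, Filter Text trades completeness for a cleaner index, while Index As Plain Text keeps everything, including byte sequences that are not meaningful words.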


Case Sensitive - Determines whether or not words are indexed in a case-sensitive manner (e.g., "apple" and "Apple" would be considered two different words). For ease of searching, AssurX recommends that you do not enable this setting.

Component Name - A read-only value of "Full Text Indexes", which is the parent component for all Full Text Index subcomponent records.

Data Link - Used only in Custom Data indexes, this specifies the name of the data link that provides the records to be indexed. For more information on creating Custom Data indexes, contact AssurX Engineering Support.

Description - A description of the index.

Excluded Files - Used only in Files and CATSWeb Documentation indexes, this setting may be used to specify a list of file names to be excluded from the index. Multiple file names must be delimited with the pipe ("|") character. Note that in Files and CATSWeb Documentation indexes, Zip files are either included or excluded based on the name of the Zip file, and not based on the individual filenames that may exist inside the Zip file. In CATSWeb Data and Custom Data indexes (where this setting does not apply), each file within a Zip file is evaluated and processed independently.

In Search Form - Determines whether or not an index appears in the list of indexes within an Advanced Search form.

Include Filters - Used only in Files and CATSWeb Documentation indexes, this setting may be used to specify a list of DOS file filters that define which files are included in the index. If none are specified, then all files from the Physical Path to Files location are included. DOS file filters are expressions such as *.doc, *.*, ASpecificFileName.dat, etc. Multiple filters must be delimited with the pipe ("|") character. Note that in Files and CATSWeb Documentation indexes, Zip files are either included or excluded based on the name of the Zip file, and not based on the individual filenames that may exist inside the Zip file. In CATSWeb Data and Custom Data indexes (where this setting does not apply), each file within a Zip file is evaluated and processed independently.
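
The way Include Filters and Excluded Files combine can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, exclusion is assumed to take precedence over inclusion, excluded entries are assumed to be literal file names, and matching is assumed to be case-insensitive as on Windows.

```python
from fnmatch import fnmatchcase

def split_setting(value: str) -> list:
    """Split a pipe-delimited setting into trimmed, non-empty entries."""
    return [part.strip() for part in value.split("|") if part.strip()]

def matches_filter(name: str, pattern: str) -> bool:
    """Match a file name against one DOS-style filter."""
    # In DOS, "*.*" means every file, including names without a dot.
    if pattern == "*.*":
        return True
    # Compare in lower case to mimic case-insensitive Windows file names.
    return fnmatchcase(name.lower(), pattern.lower())

def is_included(name: str, include_filters: str, excluded_files: str) -> bool:
    """Decide whether a file (or a Zip file, by its own name) is indexed."""
    excluded = {entry.lower() for entry in split_setting(excluded_files)}
    if name.lower() in excluded:
        return False  # assumption: exclusion wins over inclusion
    includes = split_setting(include_filters)
    # No include filters means every file in the folder is included.
    return not includes or any(matches_filter(name, f) for f in includes)
```

For example, with Include Filters set to `*.doc|*.pdf` and Excluded Files set to `draft.doc`, the sketch includes `SOP-12.doc` and skips both `draft.doc` and `notes.txt`.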

Index Name - The name of the index as it appears in Advanced Search forms. The name is read-only for standard indexes provided by AssurX. The Index Name is also used as the name of the subdirectory where index files are stored, so it must not contain special characters that are illegal in Windows directory names. The index name cannot be changed once it is established. If renaming an index is necessary, create a new index with the new name and delete the old one.
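
Because the name cannot be changed later, it may be worth checking a proposed Index Name before creating the index. The Python sketch below applies the general Windows directory naming rules. The function name is hypothetical, and CATSWeb may enforce additional or different rules of its own.

```python
import re

# Characters Windows does not allow in directory names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names that Windows reserves, with or without an extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}

def index_name_problems(name: str) -> list:
    """Return a list of reasons the name is unsuitable as a Windows directory name."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if _ILLEGAL.search(name):
        problems.append("name contains a character that is illegal in Windows directory names")
    if name.endswith((" ", ".")):
        problems.append("name ends with a space or a period")
    if name.split(".")[0].strip().upper() in _RESERVED:
        problems.append("name is a reserved Windows device name")
    return problems
```

An empty list means the name passed these checks; for instance, `index_name_problems("SOPs: Quality")` reports the colon as an illegal character.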

Index Type - The type of index. One of the following values:

  • CATSWeb Data - A standard index provided by AssurX that typically maps to a single table or record type in CATSWeb. You may not create new CATSWeb Data indexes.
  • CATSWeb Documentation - A standard index provided by AssurX that maps to CATSWeb documentation files, such as the Help Files index. You may not create new CATSWeb Documentation indexes, but may create new Files indexes, which are equivalent.
  • Custom Data - A creatable custom index that indexes records provided by CATSWeb Data Links. Since CATSWeb Data Links can access data from any source, this allows extensibility of CATSWeb Full Text Search features to any data source. For more information on creating Custom Data indexes and their associated data link implementations, contact AssurX Engineering Support.
  • Custom Search Provider - A creatable "index" that allows searches to be extended to external search engines such as Google, or one that might be embedded in some other enterprise software application. Custom Search Providers are implemented via ActiveX DLLs or Web Services. Since these are not true "indexes" that are maintained by CATSWeb, they don't need to be included in Full Text Indexing Jobs. The actual indexes will be maintained by the search engine that the Custom Search Provider interfaces with.

    Users may select Custom Search Provider "indexes" in Advanced Search forms, along with any other standard or custom indexes, to execute a single search that spans across multiple systems. The results from the Custom Search Provider are combined with the results from other index searches, ordered per the Advanced Search settings, and presented to the user in an integrated Search Results listing.

    For more information on creating Custom Search Provider indexes and their associated Custom Search Provider implementations, contact AssurX Engineering Support.
  • Files - A creatable index that allows files residing outside the CATSWeb database (i.e. not CATSWeb File Attachments) to be indexed. A number of common file formats can be automatically recognized and optimally indexed, while other file types are processed as Binary Files. A typical application for a Files index would be to index and make searchable your company's Standard Operating Procedures (SOPs) that relate to CATSWeb usage.

Physical Path to Files - Used only in Files and CATSWeb Documentation indexes, this setting is used to specify the physical path to the folder/directory where the files exist. The path specification cannot use a UNC path, and must instead use a drive letter. In CATSWeb systems with multiple servers, this will typically be a mapped drive letter, and the path must be valid for all servers (i.e. the same mapping must exist on all of the servers).

The special "{CATSWeb}" token may be used to specify a path relative to the CATSWeb installation directory. For example, AssurX pre-configures the standard Help Files index with a setting of "{CATSWeb}\Doc" to point to the Doc subdirectory of the CATSWeb installation directory, regardless of where the CATSWeb directory physically exists.
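
The token substitution and the drive-letter rule can be sketched in Python as follows. The function name and the installation directory used in the usage note are hypothetical, and the checks only approximate whatever validation CATSWeb itself performs.

```python
import re

TOKEN = "{CATSWeb}"

def resolve_physical_path(setting: str, install_dir: str) -> str:
    """Expand the {CATSWeb} token, then reject paths that lack a drive letter."""
    # Drop a trailing backslash so the expanded path has no doubled separator.
    path = setting.replace(TOKEN, install_dir.rstrip("\\"))
    if path.startswith("\\\\"):
        raise ValueError("UNC paths are not allowed; use a drive letter")
    if not re.match(r"[A-Za-z]:", path):
        raise ValueError("path must begin with a drive letter")
    return path
```

With an assumed installation directory of `C:\CATSWeb`, the setting `{CATSWeb}\Doc` resolves to `C:\CATSWeb\Doc`, while a setting such as `\\server\share\docs` is rejected.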

Process As - Used only in Custom Data indexes, this setting should be left blank when the index processes truly custom data from outside the CATSWeb system. If the custom data is actually a subset of CATSWeb data, a CATSWeb data type may be specified. The data will be indexed as if it were the specified type.

For example, suppose a CATSWeb system contains 10 Subtask forms, but a custom data index is being created to provide targeted searches in only two of the forms. The index's Data Link might use this SQL to select the records: Select * From Subtasks Where Category='Form 1' Or Category='Form 2'. Selecting "Subtasks" as the Process As value will allow CATSWeb to process the data just as it would do for its standard Subtasks index. The Full Text Index setting on Field Definition pages would be respected, URLs pointing to the Subtask would be provided in the Search Results, etc. If the Process As value were instead left blank, CATSWeb would assume an unknown data type and index all fields in the recordset. Links would not be provided in the search results because no URL data is provided in that recordset.

For more information on creating Custom Data indexes, contact AssurX Engineering Support.

Search Provider - This setting applies only to Custom Search Provider indexes and is the name of the ActiveX or Web Services data link that defines the interface to the Custom Search Provider.

SQL Index Restriction - This setting applies to CATSWeb Data indexes and may be used to limit the set of records that are indexed. It is specified by using SQL to define part of the SQL Where clause that is used by CATSWeb Indexing Server when it retrieves the records for indexing. The format is similar to that used in Record Access Restrictions in Employee or Personality records, except that text tokens such as "{My Name}" may not be used.

The field names in the SQL Index Restriction must be the actual table field names from the index's underlying data table (ex: from the "Notes" table if defined for the "Notes" index). To determine the table field names, refer to the field definition listings for Issue, Action, Subtask or Subform forms, or refer to CATSWeb Database Schema Documentation for other index types.

Quick Tip - Useful Fields Present in All CATSWeb Tables

The UDDateCreated (date/time of record creation) and UDDateEdited (date/time of last edit) fields exist and are used consistently in all CATSWeb tables and are good candidates for SQL Index Restrictions. For example, you may wish to only index records that were created during the current and prior calendar year. This SQL Index Restriction will accomplish that, assuming a current year of 2004:

UDDateCreated >= '1/1/2003'
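
Since the restriction shown above contains a literal date, it presumably needs to be revised when the calendar year changes. The short Python sketch below computes the restriction text for any given date. The function name is hypothetical, and the date literal simply follows the format used in the example above.

```python
from datetime import date

def current_and_prior_year_restriction(today: date) -> str:
    """Build a restriction covering the current and prior calendar year."""
    start_year = today.year - 1
    # Same literal format as the documented example: month/day/year.
    return f"UDDateCreated >= '1/1/{start_year}'"
```

Calling it with any date in 2004 yields the same text as the example, which can then be pasted into the SQL Index Restriction setting.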

Subcomponent Type - A read-only value of "Full Text Index", which is the type of subcomponent used for all Full Text Index records.

Virtual Path to Files - Used only in Files and CATSWeb Documentation indexes, this optional setting is used to specify the virtual path to the folder/directory where the files exist. The virtual path is used to provide a navigable link to the file when it appears in search results. The virtual path may either be specified as a path relative to the CATSWeb virtual directory, or via a fully-qualified URL path (ex: http://www.ourcompany.com/ourindexedfiles). If you omit the setting, the indexed files can still appear in search results, but no link to the files will be provided.

The special "{CATSWeb}" token may be used to substitute for the virtual path to CATSWeb. For example, AssurX pre-configures the standard Help Files index with a setting of "{CATSWeb}/Doc" to point to the Doc subdirectory of the CATSWeb virtual directory, regardless of what the CATSWeb virtual directory actually is.
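
How a search-result link might be derived from this setting is sketched below in Python. The function name and the `/catsweb` virtual directory in the usage note are assumptions for illustration, and the actual link format produced by CATSWeb may differ.

```python
from urllib.parse import quote

def build_file_link(virtual_path_setting: str, catsweb_virtual_dir: str, file_name: str):
    """Return a link to an indexed file, or None when no virtual path is set."""
    if not virtual_path_setting:
        # The file can still appear in search results, just without a link.
        return None
    base = virtual_path_setting.replace("{CATSWeb}", catsweb_virtual_dir.rstrip("/"))
    # Percent-encode the file name so spaces and similar characters are URL-safe.
    return base.rstrip("/") + "/" + quote(file_name)
```

With an assumed virtual directory of `/catsweb`, the setting `{CATSWeb}/Doc` and the file `User Guide.pdf` give `/catsweb/Doc/User%20Guide.pdf`. A fully-qualified URL setting is used unchanged as the base.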

History and Statistics

The History and Statistics section provides a variety of read-only historical data regarding the index and the operations performed on it by Full Text Indexing Jobs. This information may be useful for planning and adjusting indexing jobs, and in diagnosing and troubleshooting problems. The following comments may help to improve interpretation (and prevent misinterpretation) of these values:

  • The Items Processed value may be entirely inaccurate following a Compress job, since compression processes the index as a whole, rather than processing individual items.
  • The Items Processed value may not necessarily reflect the total number of records or files processed. For example, an Update job may run on an index associated with a table of 30,000 records. If the indexing server is able to pre-filter this recordset down to only 200 records that have actually changed, which can significantly reduce the time required for processing, Items Processed may instead reflect the filtered value.
  • An Exclusive Update is an ongoing indexing job that needs exclusive access to the index, thereby preventing the index from being used in searches during that time.

Indexable File Formats

When indexing file data from CATSWeb File Attachment (live and archived) indexes, Files indexes, CATSWeb Documentation indexes and Custom Data Indexes that may process file data via Binary Large Object (BLOB) fields, CATSWeb Indexing Server can recognize a number of common file formats. When the file format is recognized, indexing is optimized and the Item column in search results will have the best possible fidelity. When the file format is not recognized, the Binary Files setting determines how (and if) the file is indexed.

File formats that are recognized by CATSWeb Indexing Server include:

  • Adobe Acrobat (PDF), all versions through version 6, except for older files that may use LZW encoding (LZW support has been excluded due to a software patent held by Unisys)
  • Ami Pro
  • ANSI Text
  • Comma-separated Values (CSV; hit highlighted HTML views are not supported)
  • HTML
  • Microsoft Excel (through Excel 2003)
  • Microsoft PowerPoint 97, PowerPoint 2000, PowerPoint XP, PowerPoint 2003
  • Microsoft Rich Text Format (RTF)
  • Microsoft Word for DOS
  • Microsoft Word for Windows (through Word 2003)
  • Microsoft Works
  • Multimate Advantage II
  • Multimate Version 4
  • Unicode
  • WordPerfect (all versions from 5.0 through WordPerfect 2002)
  • WordStar versions 4, 5 and 6
  • XBase (including FoxPro, dBase and other XBase-compatible formats)
  • XML
  • ZIP versions 1.x and 2.0 (each file in a ZIP archive will be indexed and searchable as a separate document, but hit highlighted HTML views are not supported)
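
Format recognition of this kind is commonly done by inspecting the first bytes of a file. The Python sketch below shows the idea for four of the formats listed above, using their well-known leading signatures. It is not the detection logic of CATSWeb Indexing Server, which is not documented here, and the function name and labels are illustrative.

```python
# Leading byte signatures for a few of the listed formats.
SIGNATURES = [
    (b"%PDF-", "Adobe Acrobat (PDF)"),
    (b"PK\x03\x04", "ZIP"),
    (b"{\\rtf", "Microsoft Rich Text Format (RTF)"),
    # OLE2 compound file, the container used by Word, Excel and PowerPoint 97-2003.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "Microsoft Office 97-2003 (OLE2 container)"),
]

def sniff_format(data: bytes):
    """Return a format label for recognized data, or None for a Binary File."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # Unrecognized: handling would fall to the Binary Files setting.
    return None
```

Checking content rather than the file extension means a renamed file is still classified by what it actually contains.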