Advanced Search

Contents

Introduction
Quick Tip - Search Indexes Don't Update Immediately
Quick Start
Advanced Search Parameters
Special Characters in Searches
Noise Words
Searching in Specific Fields
Results Ordering
Output Options
Save Options
Search Results
Hit Highlighting
Exporting Search Results


Introduction

The Advanced Search feature enables full text searches to be performed against a variety of record, file and data types using advanced search criteria. Advanced Searches can even search inside attached files, and into external files, such as CATSWeb Help files. Unlike Queries and Filters, which can only query against one type of record at a time, an Advanced Search can span many different data types. When you enter an Advanced Search, you may use either natural language syntax, which is specified in the Search Request box, or a boolean expression syntax, which is specified in the Boolean Restrictions box. You may also combine these two techniques for more powerful searches.

Simple searches performed from CATSWeb page headers are really simplified forms of Advanced Searches, in which the search terms are used in conjunction with the settings from a saved Advanced Search template. Your search terms are used for the Search Request parameter. If you have been granted permission to do so, you may create your own Advanced Search template by privately saving an Advanced Search, then selecting it as your personal search template on the My CATSWeb page.

Quick Tip - Search Indexes Don't Update Immediately

Unlike the database indexes used by CATSWeb Queries and Filters, the search indexes don't update immediately when CATSWeb data is added or changed. For example, if you enter a new record with the phrase "Quick brown foxes are super fast!", then immediately try to search for those words, the new record won't be found. Search indexes are updated periodically based on the schedule(s) that your CATSWeb administrator puts in place.



Quick Start

Do the following to try your first Advanced Search:

  • Click the Advanced (search) link in the page header to open the Advanced Search form.
  • Choose a Search Type (find any words or find all words).
  • Choose the Indexes to Search. This is a multi-select list. Hold down the <Ctrl> key to select multiple indexes, or hold down the <Alt> key to select a range of indexes.
  • Enter your word or words in the Search Request box.
  • Click Submit.



Advanced Search Parameters

The following parameters are available in an Advanced Search:

  • Boolean Restrictions
  • Columns in Results
  • Fuzziness
  • Indexes to Search
  • Maximum Results
  • Phonic Searching
  • Return Most Recent
  • Search Request
  • Search Type
  • Search Stop Limit
  • Stemming
  • Synonym Searching


Boolean Restrictions - Boolean Restrictions provide the means of searching for information using a structured language that is functionally similar to SQL, instead of (or in addition to) entering natural language searches using the Search Request parameter.

A boolean search request consists of a group of words, phrases, or macros linked by connectors such as AND and OR that indicate the relationship between them. Here are some examples:

Boolean Restriction            Meaning
apple and pear                 Both words must be present.
apple or pear                  Either word can be present.
apple w/5 pear                 apple must occur within 5 words of pear.
apple pre/5 pear               apple must occur 5 or fewer words before pear.
apple not w/5 pear             apple must not occur within 5 words of pear.
apple and not pear             Only apple must be present. If pear is also present, a match will not occur.
Category contains Validation   The field named Category must contain the word Validation.
Form_Name contains Validation  The field with a caption of Form Name must contain the word Validation (spaces in captions are replaced with underscores).
apple w/5 xfirstword           apple must occur in the first 5 words.
apple w/5 xlastword            apple must occur in the last 5 words.


If you use more than one connector, you should use parentheses to indicate precisely what you want to search for. For example, apple and pear or orange could mean (apple and pear) or orange, or it could mean apple and (pear or orange).
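
The difference between the two groupings can be checked with ordinary boolean logic. The following Python sketch is only an illustration (it is not CATSWeb code, and the sample document is made up). It evaluates both readings against a document that contains only the word orange.

```python
# Illustration only: a made-up document reduced to its set of words.
document_words = {"orange"}


def has(word):
    """Return True if the sample document contains the word."""
    return word in document_words


# Reading 1: (apple and pear) or orange
reading_1 = (has("apple") and has("pear")) or has("orange")

# Reading 2: apple and (pear or orange)
reading_2 = has("apple") and (has("pear") or has("orange"))

print(reading_1)  # True: orange alone satisfies the OR
print(reading_2)  # False: apple is required but missing
```

The same document is a match under one grouping and not under the other, which is why explicit parentheses are recommended.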


Words and Phrases

In a boolean search, you do not need to use any special punctuation or commands to search for a phrase. Simply enter the phrase the way it ordinarily appears. You can use a phrase anywhere in a boolean restriction. For example:

apple w/5 fruit salad


Noise Words

Noise words, such as if and the, are ignored in searches. If a phrase contains a noise word, CATSWeb will skip over the noise word when searching for it. For example, a search for statue of liberty would retrieve any item containing the word statue, any intervening word, and the word liberty. Punctuation inside of a search word is treated as a space. Thus, can't would be treated as a phrase consisting of two words: can and t. 1843(c)(8)(ii) would become 1843 c 8 ii (four words).

The use of noise words in searches may produce results that you do not expect. For example, if you search for the phrases the problem and my problem in separate searches, you might expect each search to only return items containing those specific phrases. However, each search would actually return identical results since each is really just a search for problem (the and my are noise words).
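
The effect of punctuation and noise words on a search term can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the search engine's actual code: the NOISE_WORDS set is a hypothetical sample, lower-casing is an assumption, and the sketch simply drops noise words, whereas CATSWeb lets a noise word inside a phrase match any intervening word.

```python
import re

# Hypothetical sample; the real noise word list is set by your administrator.
NOISE_WORDS = {"if", "the", "of", "my"}


def split_words(text):
    """Split text into words, treating punctuation as spaces."""
    return re.findall(r"[a-z0-9]+", text.lower())


def significant_words(phrase):
    """Return the words of a phrase that are not noise words."""
    return [word for word in split_words(phrase) if word not in NOISE_WORDS]


print(split_words("can't"))           # ['can', 't']
print(split_words("1843(c)(8)(ii)"))  # ['1843', 'c', '8', 'ii']

# Both phrases reduce to the single word "problem".
print(significant_words("the problem") == significant_words("my problem"))  # True
```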

CATSWeb system administrators can tailor the noise word list for the entire system, or for individual indexes within the system. Refer to the Installation Guide for CATSWeb Full Text Search for more information.


Searching In Specific Fields

You may use boolean restrictions (or a pair of colons in the Search Request) to restrict the search results based on information inside specific fields. You may specify the field using either its table field name or its caption. If you use the caption, special characters within the caption must be removed (parens, colons, apostrophes, etc.) and spaces in the caption must be replaced with underscore ("_") characters. This means that an actual caption of "Action Title (Summary):" would be stated as "Action_Title_Summary" in searches.
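
The caption rule can be sketched as a small Python helper. This is an illustration, not part of CATSWeb: the function name is hypothetical, and it assumes that every character other than a letter, digit, underscore or space counts as a "special character" (the guide itself only names parentheses, colons and apostrophes).

```python
import re


def caption_to_search_name(caption):
    """Convert a field caption to the form used in searches (assumed rule)."""
    # Remove special characters such as parentheses, colons and apostrophes.
    cleaned = re.sub(r"[^\w\s]", "", caption)
    # Replace each run of spaces with a single underscore.
    return re.sub(r"\s+", "_", cleaned.strip())


print(caption_to_search_name("Action Title (Summary):"))  # Action_Title_Summary
print(caption_to_search_name("Form Name"))                # Form_Name
```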

For example, assume that the Category table field has a caption of "Form Name". You may restrict search results to only a particular form ("Basic Action Form" in this example) using either of the following boolean restrictions:

Category Contains "Basic Action Form"
Form_Name Contains "Basic Action Form"

The Boolean Restriction may also be more elaborate, such as this example:

(Form_Name Contains "Basic Action Form" And Investigation_Notes Contains "resolve immediately") Or (Form_Name Contains "Basic Issue Form" And Problem_Description Contains "critical problem")

Note that the CATSWeb Category field is special, in that the Category field value from the parent Action, Issue or Subtask record is included in the index entries for associated child records. This means that either of the boolean restrictions in the first example above will also limit the Notes, File Attachments, Signatures, etc. to ones for "Basic Action Form" records. However, Subform records have their own Category field, with a separate ParentCategory field. If the intent is to limit records to a particular Issue, Action or Subtask category, including Subform records, a restriction such as this one may be used:

Category Contains "Basic Action Form" Or ParentCategory Contains "Basic Action Form"

For all other fields, a boolean restriction will match that field only in the record types where the field actually exists. For example, if you define a boolean restriction based on the UDUserCreated field, the search will match the actual value in the record, regardless of the parent or child relationship (UDUserCreated is itself somewhat special in that it exists in all CATSWeb record types). And if you define a boolean restriction based on the field InvestigationNotes, the search will only match CATSWeb Action records, since that field is only present in Action records.


AND Connector

Use the AND connector in a boolean restriction to connect two expressions, both of which must be found in any document retrieved. For example:

apple pie and poached pear would retrieve any item that contained both phrases.

(apple or banana) and (pear w/5 grape) would retrieve any document that (1) contained either apple OR banana, AND (2) contained pear within 5 words of grape.


AndAny Connector

AndAny lets you combine a search for required search terms with other terms that are optional. The words before AndAny are required, and the words after AndAny are optional. For example:

(apple and pear) AndAny (grape or banana) would find any document that contains both apple and pear. Occurrences of grape and banana will also be counted as hits, but they are not required.


OR Connector

Use the OR connector in a search request to connect two expressions, at least one of which must be found in any document retrieved. For example, apple pie or poached pear would retrieve any document that contained apple pie, poached pear, or both.


W/N Connector

Use the W/N connector in a search request to specify that one word or phrase must occur within N words of the other. For example, apple w/5 pear would retrieve any document that contained apple within 5 words of pear. Some types of complex expressions using the W/N connector will produce ambiguous results and should not be used. The following are examples of ambiguous boolean restrictions:

(apple and banana) w/10 (pear and grape)
(apple w/10 banana) w/10 (pear and grape)

In general, at least one of the two expressions connected by W/N must be a single word or phrase or a group of words and phrases connected by OR. For example:

(apple and banana) w/10 (pear or grape)
(apple and banana) w/10 orange tree

Two built-in words are provided to designate the beginning or ending of a file or record. They are xfirstword and xlastword. The terms are useful if you want to limit a search to the beginning or end of a file. For example, apple w/10 xlastword would search for apple within 10 words of the end of a document.

The pre/N connector is like W/N, but it also requires that the first expression must occur before the second. For example, (apple or pear) pre/5 banana means that apple or pear must occur before banana and within 5 words of it.
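
The W/N and pre/N connectors can be pictured as tests on word positions. The Python sketch below is an illustration only, with hypothetical function names. It assumes that "within N words" means the positions of the two words differ by at most N, and it handles single words rather than phrases or grouped expressions.

```python
def positions(words, term):
    """Return every index at which term occurs in the word list."""
    return [i for i, word in enumerate(words) if word == term]


def within(words, first, second, n):
    """W/N: some occurrence of first lies within n words of second."""
    return any(
        abs(i - j) <= n
        for i in positions(words, first)
        for j in positions(words, second)
    )


def pre(words, first, second, n):
    """pre/N: first occurs before second and within n words of it."""
    return any(
        0 < j - i <= n
        for i in positions(words, first)
        for j in positions(words, second)
    )


words = "apple tart with a ripe pear".split()

print(within(words, "apple", "pear", 5))  # True: 5 positions apart
print(within(words, "apple", "pear", 4))  # False
print(pre(words, "apple", "pear", 5))     # True: apple comes first
print(pre(words, "pear", "apple", 5))     # False: order is reversed
```

W/N is symmetrical in this sketch, while pre/N depends on the order of the two words.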


NOT and NOT W/N Connector

Use NOT in front of any search expression to reverse its meaning. This allows you to exclude items from a search. For example:

apple sauce and not pear

NOT standing alone can be the start of a search request. For example, not pear would retrieve all documents that did not contain pear. If NOT is not the first connector in a request, you need to use either AND or OR with NOT:

apple or not pear
not (apple w/5 pear)

The NOT W/ ("not within") operator allows you to search for a word or phrase not in association with another word or phrase. For example:

apple not w/20 pear

Unlike the W/ operator, NOT W/ is not symmetrical. That is, apple not w/20 pear is not the same as pear not w/20 apple. In the apple not w/20 pear request, CATSWeb searches for apple and excludes cases where apple is too close to pear. In the pear not w/20 apple case, CATSWeb searches for pear and excludes cases where pear is too close to apple.
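
The asymmetry can be seen in a small Python sketch. It is an illustration under assumptions, not the engine's implementation: the function name is hypothetical, and distance is taken to be the difference between word positions.

```python
def not_within(words, term, other, n):
    """Return positions of term that have no occurrence of other within n words."""
    others = [i for i, word in enumerate(words) if word == other]
    return [
        i
        for i, word in enumerate(words)
        if word == term and all(abs(i - j) > n for j in others)
    ]


# One apple right next to pear, and a second apple far away from it.
words = ["apple", "pear"] + ["filler"] * 5 + ["apple"]

print(not_within(words, "apple", "pear", 2))  # [7]: the distant apple is a hit
print(not_within(words, "pear", "apple", 2))  # []: the only pear is too close to apple
```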



Numeric Range Searching

A numeric range search is a search for any numbers that fall within a range. To add a numeric range component to a boolean restriction, enter the upper and lower bounds of the search separated by the special characters ~~ like this:

Part w/2 20~~22

This would find any item containing Part within 2 words of a number between 20 and 22 (such as 21 CFR Part 11). Note that:

  • A numeric range search includes the upper and lower bounds (so 20 and 22 would be retrieved in the above example).
  • Numeric range searches only work with positive integers.
  • For purposes of numeric range searching, decimal points and commas are treated as spaces and minus signs are ignored. For example, -123,456.78 would be interpreted as: 123 456 78 (three numbers).
  • Numeric range searches work best against documents and file attachments, or when the number and the word(s) occur within the same field.
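
The number handling described in these notes can be sketched in Python. This is an illustration only, with hypothetical function names. It covers how numbers are extracted and compared against an inclusive range, and it leaves out the proximity part of the restriction (the "Part w/2" portion).

```python
import re


def numbers_in(text):
    """Extract positive integers; decimal points, commas and minus signs split or vanish."""
    return [int(token) for token in re.findall(r"\d+", text)]


def has_number_in_range(text, low, high):
    """Return True if any number in the text lies between low and high, inclusive."""
    return any(low <= number <= high for number in numbers_in(text))


print(numbers_in("-123,456.78"))                      # [123, 456, 78]
print(has_number_in_range("21 CFR Part 11", 20, 22))  # True: 21 is in range
print(has_number_in_range("Part 22", 20, 22))         # True: bounds are included
print(has_number_in_range("Part 23", 20, 22))         # False
```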

Columns in Results - Choose the columns that you wish to appear in the search results. Unlike a query or filter where the columns are the actual field values from a record, columns in search results may not be data from the records at all. For example, Relevance is a percentage ranking of the item relative to other items in the search results, based on your search criteria.

When you open a new Advanced Search form, CATSWeb defaults the column selections to the ones that are most useful for the majority of searches. Use the <Ctrl> key to select multiple columns, or use the <Alt> key to select multiple columns in a range.

Fuzziness - Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors, or for text that has been scanned using optical character recognition (OCR).

If you choose a non-zero fuzziness value, fuzzy searching will be enabled for all of the words in the Search Request. Higher values cause more fuzziness to be applied. High fuzziness values are typically not a good idea, as your search will probably find much more information than you desired. You may also specify fuzzy searching for individual words by choosing a value of zero (0) and using the "%" special character as described in the Search Request topic.
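
The per-word form of fuzzy searching (for example ba%nana, where the letters before the first "%" must match exactly and each "%" allows one difference) can be approximated in Python. This is a sketch under assumptions: the guide does not define how a "difference" is counted, so the example substitutes the standard Levenshtein edit distance, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the number of single-character edits between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate):
    """Match candidate against a term such as 'ba%nana' (assumed semantics)."""
    prefix = term.split("%", 1)[0]  # letters that must match exactly
    max_differences = term.count("%")
    word = term.replace("%", "")
    return (candidate.startswith(prefix)
            and edit_distance(word, candidate) <= max_differences)


print(fuzzy_match("ba%nana", "bannana"))  # True: one extra letter
print(fuzzy_match("ba%nana", "bonana"))   # False: does not begin with ba
print(fuzzy_match("b%%anana", "bonana"))  # True: begins with b, one difference
```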


Indexes to Search - Select the indexes that you wish to search within. An index typically corresponds to a collection of records or files, such as a CATSWeb table. Use the <Ctrl> key to select multiple indexes, or use the <Alt> key to select multiple indexes in a range.


Maximum Results - Use this setting to limit the number of items returned in the search results. A value of zero (0) means that there is no limit. The search engine will not stop searching after the limit is reached. Instead, it will complete the search and collect the best matching items among all of the items retrieved. Note that your administrator may have specified a system-wide maximum for this value. If a system-wide maximum exists, your setting will be ignored if it is greater than the maximum. See also: Search Stop Limit

Phonic Searching - Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe. Checking this box will apply phonic searching to all words in your Search Request. You may also add the "#" special character to the beginning of individual words to apply phonic searching to just some words (Ex: #bear).
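
This guide does not name the algorithm behind phonic searching. As a rough illustration of the idea, the Python sketch below uses the classic Soundex code, which is an assumption swapped in for the example: words that share a code begin with the same letter and sound alike.

```python
# Soundex digit for each consonant group (vowels and h, w, y have no digit).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [char for char in word.lower() if char.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for char in letters[1:]:
        if char in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(char, "")
        if code and code != previous:
            digits.append(code)
        previous = code
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("Smith"), soundex("Smithe"), soundex("Smythe"))  # S530 S530 S530
print(soundex("bear") == soundex("bare"))                      # True
```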

Return Most Recent - This setting has no effect unless the search process is terminated early due to a Search Stop Limit threshold being reached. In that case, the setting has the effect of skewing the search toward more recent items, rather than items with the best relevancy. This can in turn change the set of results returned. The setting has no effect on how search results are ordered, since CATSWeb sorts (orders) the search results after the search engine returns them. Leaving the box unchecked is best for most searches.


Search Request - Enter your main natural language search expression here, or enter nothing and use Boolean Restrictions instead. You may use words, quoted phrases, tokens, special characters or expressions. Note that when you perform a simple search from the CATSWeb page header, you are entering this parameter, and may use simple words or any of the techniques described here.

Just as in CATSWeb Queries and Filters, you may enter text tokens to represent values from the session record of the user who executes the search. Terms in your search request may also use the following special characters:

Character Meaning
? A wildcard that replaces a single character (matches any character in that position). Ex: appl? matches apply and apple but not apples.
* A wildcard that replaces any number of characters. Ex: ap*ed matches applied, approved, etc. *cipl* matches principle, participle, etc. appl* matches apple, apples, application, etc.
% Specifies that a Fuzzy search should be performed on that word. Ex: ba%nana matches words that begin with ba and have at most one difference from banana. Entering b%%anana matches words that begin with b and have at most two differences from banana.

The Fuzziness parameter should be set to 0 when performing fuzzy searches on individual words using this character. Non-zero values automatically apply fuzziness to all words in the search.
# Used at the beginning of a word, specifies that a Phonic search should be performed on that word. Ex: #bear will match bear and bare.
~ Used at the end of a word, specifies that stemming should be applied to the word. Ex: fish~ will match fish, fishing, fished, etc.
+ and - Use plus (+) sign to designate words or phrases that are required to exist in matched records. Use the minus (-) sign to designate words or phrases that must not exist in matched records. Ex:

"apple pie" -salad +"ice cream"

finds records that contain the phrases "apple pie" and "ice cream" but do not contain the word salad.
:: Use a pair of colons (::) to specify a search within a particular field in the search request. Ex:

Criticality_Of_Problem::High

finds records that contain "High" in fields with captions of "Criticality Of Problem". Note that special characters within the caption must be removed (parens, colons, apostrophes, etc.) and spaces in the caption must be replaced with underscore characters. You may also use the field's actual table field name, rather than the caption. For example, if the "Criticality Of Problem" field uses the Text1 table field on all forms, the following search request is equivalent:

Text1::High
## Used to specify a Regular Expression. Regular Expression searching provides a way to search for advanced combinations of characters. A regular expression included in a search request must be quoted and must begin with ##. Ex:

Apple and "##200[4-5]"
Apple and "##20[0-9]+"


A regular expression must match a single whole word. For example, you could not search for "apple pie" with the regular expression "##app.*ie".

For more information on Regular Expressions, refer to reference books and Web sites devoted to the subject. The Regular Expression Library at http://regexlib.com is a good starting point.
~~ Used to specify a numeric range. This character pair works in Boolean Restrictions only and cannot be used in the Search Request. Ex:

apple w/5 12~~17

means to find all records containing apple within 5 words of a number between 12 and 17. For more information, see numeric range searching in the Boolean Restrictions topic.
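
The "?" and "*" wildcards and the "##" regular expressions all match against one whole word at a time. The Python sketch below illustrates that behavior with the standard fnmatch and re modules. It is an approximation rather than the search engine's own matcher: the function names are hypothetical, case-insensitive wildcard matching is an assumption, and fnmatch also understands bracket patterns that this guide does not describe.

```python
import re
from fnmatch import fnmatchcase


def wildcard_match(pattern, word):
    """Match one word against a pattern that uses ? and * wildcards."""
    return fnmatchcase(word.lower(), pattern.lower())


def regex_hits(expression, text):
    """Return the words in text that a ## expression matches as whole words."""
    if expression.startswith("##"):
        expression = expression[2:]
    pattern = re.compile(expression)
    return [word for word in text.split() if pattern.fullmatch(word)]


print(wildcard_match("appl?", "apple"))      # True
print(wildcard_match("appl?", "apples"))     # False: ? replaces exactly one character
print(wildcard_match("ap*ed", "approved"))   # True
print(wildcard_match("*cipl*", "principle")) # True

print(regex_hits("##200[4-5]", "Apple sales 2003 2004 2005"))  # ['2004', '2005']
print(regex_hits("##app.*ie", "apple pie"))  # []: no single word matches
```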



Search Type - Specify if the search will match records containing any of the words, or if the search must match all of the words.

Search Stop Limit - This setting causes a search to halt automatically when the specified number of items have been found. This provides a way to limit the resources consumed by searches that retrieve a very large number of items. A value of zero (0) means that there is no limit. Note that your administrator may have specified a system-wide maximum for this value. If a system-wide maximum exists, your setting will be ignored if it is greater than the maximum. See also: Maximum Results
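
The difference between Search Stop Limit and Maximum Results can be shown with a simplified Python model. This is a sketch under assumptions, not the search engine's code: the function and the sample items are made up, and system-wide maximums are left out.

```python
def run_search(items, is_match, stop_limit=0, max_results=0):
    """Simplified model: stop_limit halts the scan, max_results trims the ranked output."""
    found = []
    for item in items:
        if is_match(item):
            found.append(item)
            # Search Stop Limit: stop scanning once enough items are found.
            if stop_limit and len(found) >= stop_limit:
                break
    # Maximum Results: keep only the best matches among everything found.
    best_first = sorted(found, key=lambda item: item["relevance"], reverse=True)
    return best_first[:max_results] if max_results else best_first


items = [
    {"name": "A", "relevance": 10},
    {"name": "B", "relevance": 90},
    {"name": "C", "relevance": 50},
    {"name": "D", "relevance": 70},
]


def matches_everything(item):
    return True


limited_output = run_search(items, matches_everything, max_results=2)
stopped_early = run_search(items, matches_everything, stop_limit=2)

print([item["name"] for item in limited_output])  # ['B', 'D']
print([item["name"] for item in stopped_early])   # ['B', 'A']
```

With Maximum Results the whole collection is still examined, so the two best items are returned. With Search Stop Limit the scan ends after the first two matches, so a better match further along (D in this sample) is never seen.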

Stemming - Stemming extends a search to cover grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply. Checking this box will apply stemming to all words in your Search Request. You may also add the "~" special character to the end of individual words to apply stemming to just some words (Ex: fish~).


Synonym Searching - Synonym searching finds synonyms of a word in a search request. For example, a search for fast would also find quickly. Choose a WordNet library to apply synonym searching to all words in your Search Request, or add the "&" special character to the end of individual words to apply synonym searching to just some words (Ex: fast&).

In a phrase search, synonym searching applies to the phrase, not to individual words in the phrase. For example, a search for "object lesson" with WordNet Thesaurus enabled would find "example", which is a synonym for "object lesson". The individual words in the phrase, object and lesson, are not expanded into their respective synonym lists.



Results Ordering

This section of the Advanced Search form allows the results to be ordered by up to 4 different columns in the Search Results. Choose the desired columns in a top-to-bottom order on the form, and specify ascending or descending order for each column.
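
Ordering by several columns means that the first column decides the order, the second column breaks ties in the first, and so on. The Python sketch below illustrates this with made-up result rows. It is not CATSWeb code, and the function name and column keys are hypothetical.

```python
def order_results(rows, ordering):
    """Sort rows by up to 4 (column, descending) pairs, first pair most significant."""
    if len(ordering) > 4:
        raise ValueError("at most 4 ordering columns are supported")
    ordered = list(rows)
    # Sort by the least significant column first; Python's sort is stable,
    # so each later pass keeps the order of rows that tie on its column.
    for column, descending in reversed(ordering):
        ordered.sort(key=lambda row: row[column], reverse=descending)
    return ordered


rows = [
    {"Index": "Issues", "Relevance": 70, "Item": "Issue #2"},
    {"Index": "Actions", "Relevance": 55, "Item": "Action #9"},
    {"Index": "Issues", "Relevance": 95, "Item": "Issue #7"},
    {"Index": "Actions", "Relevance": 80, "Item": "Action #3"},
]

# Index ascending, then Relevance descending within each index.
ordered = order_results(rows, [("Index", False), ("Relevance", True)])
print([row["Item"] for row in ordered])
# ['Action #3', 'Action #9', 'Issue #7', 'Issue #2']
```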


Output Options

The following output options may be selected:

  • Show Search Parameters - The search results will include a table that lists the search parameters that you entered.
  • Suppress Links In Data - Links will not be included in the search results. This is desirable if the search results are to be loaded into a spreadsheet or other program.
  • Suppress Page Header - The search results will be displayed without the usual CATSWeb page header. This is desirable if the search results are to be loaded into a spreadsheet or other program.


Save Options

An Advanced Search may be optionally saved for later reuse. To save an Advanced Search, select Public or Private and specify a name for the search in the Save As field. Public searches will be available to all users, while Private searches will only be available to you. If the Save As name for the search already exists, the old search will be overwritten with the new search.

If you have recalled a saved Advanced Search, the Delete check box will also be available. Checking this box and submitting the search causes it to be deleted from either the public or private Advanced Search list.


Search Results

Results of the search are presented using the columns specified in Columns in Results. Your system administrator may have limited the columns available, since many of the columns are intended for special applications and are not relevant to end-user searches.

The Item column in the search results will typically include a link back to the live (current) version of the record or file. The following iconic symbols may also be available in the column:

  • "A" icon - If the item is a record that is located in the CATSWeb archive tables, the iconic "A" provides a link directly to the archived record.
  • Down arrow icon - If the item is a CATSWeb File Attachment, the iconic down arrow may be used to download or view the attached file directly, without first having to browse to the CATSWeb File Attachment record.
  • "H" icon - If the item is a file, the iconic "H" button may be used to view an HTML version of the file with search hits highlighted. Not all file types are convertible to HTML, and CATSWeb may return an error for some file types. If this occurs, use the textual item link to open the file using its native viewer (which you would need to have installed on your computer). Formatting information inside most converted file types will likely be lost after converting to HTML.

Search result columns include:

  • Archive ID - Available only if your CATSWeb system includes the Record Archiving option, this is the numeric ID of the record if it is an archived record.
  • Archive URL - Available only if your CATSWeb system includes the Record Archiving option, this is a URL that points to the archived version of the record, if the record in the search results is an archived record. Note that a link is not provided; you should instead use the Item column for most searches.
  • Creation User - If the item is a CATSWeb record, this is the Employee ID of the user that originally created the record.
  • Document ID - The numeric identifier of the item within the Full Text Search index that it was found in. Note that this value is not permanently associated with a particular item and may change whenever the indexes are updated or rebuilt.
  • Document Name - The name of the temporary file used by CATSWeb Indexing Server when the item was indexed.
  • Document Path - The full path to the temporary file that was used by CATSWeb Indexing Server when the item was indexed.
  • Document Size - The size of the item in bytes.
  • Document Type - Technical information intended for special applications. Contact AssurX Engineering Support for more information.
  • Download URL - If the item is a File Attachment, this is the URL that the iconic down arrow uses for direct downloading or viewing. Note that a link is not provided; you should instead use the Item column for most searches.
  • File Name - A generic name for the item, not necessarily a true file name. For example, if the Item value is "Basic Action Form #123", this value may be the more generic "Action #123".
  • Hit Byte Offsets - Technical information intended for special applications. Contact AssurX Engineering Support for more information.
  • Hits - A relative measurement of the number of hits (matches) in an item.
  • Hits by Word - A listing of the search terms that received hits, along with the number of hits for each term.
  • Index - The name of the index that the item was found in.
  • Index Details - Technical information intended for AssurX diagnostics and special applications. Contact AssurX Engineering Support for more information.
  • Index Path - The full path to the index folder that was used by CATSWeb Indexing Server when the item was indexed.
  • Item - A "best-fit" description of the item, as determined by CATSWeb and the search engine during the indexing process. For CATSWeb records, this is typically the category name (a.k.a. "form name") followed by the record number. Child records such as Notes or Links use the parent record name with a parenthetical designation of the child record type. Files, including those attached to CATSWeb File Attachment records, may use a more descriptive document title or name, if the indexing engine was able to determine one.
  • Last Edit User - If the item is a CATSWeb record, this is the Employee ID of the user that last edited the record.
  • Line # - A column that numbers the lines in the search results.
  • Parent ID - If the item is a CATSWeb child record such as a Note, File Attachment or Link, this is the numeric ID of the parent record.
  • Parent Type - If the item is a CATSWeb child record such as a Note, File Attachment or Link, this is the numeric type of the parent record (a CWEB_RECTYPE value, 1=Issue, 2=Action, 5=Subtask). Refer to CATSWeb Database Schema Documentation or CATSWeb API sample code files for more information on the values, or contact AssurX Engineering Support for more information.
  • Phrase Count - Technical information intended for special applications. Contact AssurX Engineering Support for more information.
  • Record ID - If the item is a CATSWeb record, this is the numeric ID of the record. For child records such as a Note, File Attachment or Link, this will only be unique among other similar child records for the same parent.
  • Record ID1 - Technical information intended for special applications, this value is typically the same as Record ID if the item is a CATSWeb record. Contact AssurX Engineering Support for more information.
  • Record ID2 - Technical information intended for special applications. Contact AssurX Engineering Support for more information.
  • Record Prefix - Technical information derived during the indexing process and intended for special applications. Contact AssurX Engineering Support for more information.
  • Record Type - If the item is a CATSWeb record, this is the numeric type of the record (a CWEB_RECTYPE value, 1=Issue, 2=Action, 5=Subtask, etc.). Refer to CATSWeb API sample code files for more information on the values, or contact AssurX Engineering Support for additional information.
  • Relevance (%) - A percentage ranking of the item in terms of relevance to your search. Higher numbers are more relevant.
  • Summary Data - "Best-fit" summary data from the item as determined by CATSWeb and the search engine during the indexing process. This column is only marginally useful for CATSWeb records since there is no concept of "best" fields in CATSWeb, but is more useful for files and documents. In these types of items, the search engine can often extract summary information from embedded titles and properties.
  • URL - A URL pointing to the live (current) version of the item. Note that a link is not provided; you should instead use the Item column for most searches.
  • Word Count - The total number of indexed words in the item as determined during the indexing process. This value will typically not be accurate for CATSWeb records since not all fields are typically indexed, and additional data may be included to make the search process work more efficiently. It is best to use this column only when the search is restricted to files and documents.



Hit Highlighting

Hit Highlighting is available when an item is viewed via links in the search results. When the item is a CATSWeb record, clicking the main Item link opens the record's View page with hit highlighting enabled. When the item is a file, the iconic "H" may be used to view a highlighted HTML version of the file, subject to limitations of the HTML conversion process.

Hit highlighting is not always accurate; it is simply a tool to assist you in quickly locating the information that you searched for. Inaccuracies in hit highlighting can result from a file or record being changed since it was last indexed, and from limitations in the highlighting algorithms themselves. For example, if your search uses wildcards, fuzziness or other means of specifying inexact searches, the matching words and phrases may not always be highlighted, especially in CATSWeb records. Terms and words that contain special characters such as accented characters may also not be highlighted.




Exporting Search Results

Search results may be exported to spreadsheet programs and other software simply by saving the HTML source of the search results. All browsers provide this capability (typically from the File menu), and most spreadsheet programs are capable of importing HTML files. HTML tables in the search results will be converted to tables in the spreadsheet program.

When executing searches specifically for exporting, you may wish to check the Suppress Links In Data and Suppress Page Header output options. See above for descriptions of these options.
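
If you prefer a CSV file to importing the HTML directly, a saved results page can be converted with a short script. The Python sketch below uses only the standard library and is an illustration under assumptions: it expects plain table, tr, th and td markup, the sample HTML is made up rather than taken from a real CATSWeb page, and rows from every table on the page are written one after another.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_tables_to_csv(html):
    """Return the table rows found in an HTML page as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(parser.rows)
    return output.getvalue()


# Made-up sample in place of a saved search results page.
sample = (
    "<table><tr><th>Item</th><th>Relevance (%)</th></tr>"
    "<tr><td>Basic Action Form #123</td><td>87</td></tr></table>"
)
print(html_tables_to_csv(sample))
```

Checking Suppress Links In Data before running the search keeps link markup out of the cells, which makes a conversion like this one simpler.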

