Managing Full Text Search
Introduction
Quick Tip - Search Indexes Don't Update Immediately
Unlike the database indexes used by CATSWeb Queries and Filters, the search indexes don't update immediately when CATSWeb data is added or changed. For example, if you enter a new record with the phrase "Quick brown foxes are super fast!", then immediately try to search for those words, the new record won't be found. Search indexes are updated periodically based on the schedule(s) that your CATSWeb administrator puts in place.
Quick Start
Do the following to try your first Advanced Search:
The following parameters are available in an Advanced Search:
Boolean Restrictions | Phonic Searching | Stemming |
Columns in Results | Return Most Recent | Synonym Searching |
Fuzziness | Search Request | |
Indexes to Search | Search Type | |
Maximum Results | Search Stop Limit |
Boolean Restrictions - Boolean Restrictions provide the means of searching for information using a structured language that is functionally similar to SQL, instead of (or in addition to) entering natural language searches using the Search Request parameter.
A boolean search request consists of a group of words, phrases, or macros linked by connectors such as AND and OR that indicate the relationship between them. Here are some examples:
Boolean Restriction | Meaning |
apple and pear | Both words must be present. |
apple or pear | Either word can be present. |
apple w/5 pear | apple must occur within 5 words of pear. |
apple pre/5 pear | apple must occur 5 or fewer words before pear. |
apple not w/5 pear | apple must not occur within 5 words of pear. |
apple and not pear | apple must be present and pear must not be present. If pear is also present, a match will not occur. |
Category contains Validation | The field named Category must contain the word Validation. |
Form_Name contains Validation | The field with a caption of Form Name must contain the word Validation (spaces in captions are replaced with underscores). |
apple w/5 xfirstword | apple must occur in the first 5 words. |
apple w/5 xlastword | apple must occur in the last 5 words. |
If you use more than one connector, you should use parentheses to indicate precisely what you want to search for. For example, apple and pear or orange could mean (apple and pear) or orange, or it could mean apple and (pear or orange).
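The difference between the two readings can be illustrated with a short Python sketch. It is only an illustration of the grouping, not CATSWeb's search engine; the document content and the contains helper are hypothetical.

```python
def contains(words, term):
    """Return True when the single word `term` appears in the set of words."""
    return term in words

# Hypothetical document that mentions only the word "orange".
doc_words = {"orange"}

# Reading 1: (apple and pear) or orange
reading_1 = (contains(doc_words, "apple") and contains(doc_words, "pear")) \
    or contains(doc_words, "orange")

# Reading 2: apple and (pear or orange)
reading_2 = contains(doc_words, "apple") and \
    (contains(doc_words, "pear") or contains(doc_words, "orange"))

print(reading_1, reading_2)  # True False
```

The same document matches under one grouping and not the other, which is why explicit parentheses are recommended.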
Words and Phrases
In a boolean search, you do not need to use any special punctuation or commands to search for a phrase. Simply enter the phrase the way it ordinarily appears. You can use a phrase anywhere in a boolean restriction. For example:
apple w/5 fruit salad
Noise Words
Noise words, such as if and the, are ignored in searches. If a phrase contains a noise word, CATSWeb will skip over the noise word when searching for it. For example, a search for statue of liberty would retrieve any item containing the word statue, any intervening word, and the word liberty. Punctuation inside of a search word is treated as a space. Thus, can't would be treated as a phrase consisting of two words: can and t. 1843(c)(8)(ii) would become 1843 c 8 ii (four words).
The use of noise words in searches may produce results that you do not expect. For example, if you search for the phrases the problem and my problem in separate searches, you might expect each search to only return items containing those specific phrases. However, each search would actually return identical results since each is really just a search for problem (the and my are noise words).
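The behavior described above can be approximated in a few lines of Python. This is a rough sketch, not the actual indexing logic: the noise word list is a small illustrative sample, lower-casing is an assumption, and the sketch simply drops noise words rather than matching "any intervening word" in their place.

```python
import re

# Illustrative sample only; the real list is configured by the administrator.
NOISE_WORDS = {"if", "the", "of", "my"}

def tokenize(text):
    """Lower-case the text and treat every non-alphanumeric character as a space."""
    return re.sub(r"[^0-9a-z]+", " ", text.lower()).split()

def search_terms(phrase):
    """Tokenize a phrase and drop the noise words (a simplification)."""
    return [word for word in tokenize(phrase) if word not in NOISE_WORDS]

print(tokenize("can't"))            # ['can', 't']
print(tokenize("1843(c)(8)(ii)"))   # ['1843', 'c', '8', 'ii']
print(search_terms("the problem") == search_terms("my problem"))  # True
```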
CATSWeb system administrators can tailor the noise word list for the entire system, or for individual indexes within the system. Refer to the Installation Guide for CATSWeb Full Text Search for more information.
Searching In Specific Fields
You may use boolean restrictions (or a pair of colons in the Search Request) to restrict the search results based on information inside specific fields. You may specify the field using either its table field name or its caption. If you use the caption, special characters within the caption must be removed (parens, colons, apostrophes, etc.) and spaces in the caption must be replaced with underscore ("_") characters. This means that an actual caption of "Action Title (Summary):" would be stated as "Action_Title_Summary" in searches.
For example, assume that the Category table field has a caption of "Form Name". You may restrict search results to only a particular form ("Basic Action Form" in this example) using either of the following boolean restrictions:
Category Contains "Basic Action Form"
Form_Name Contains "Basic Action Form"
The Boolean Restriction may also be more elaborate, such as this example:
(Form_Name Contains "Basic Action Form" And Investigation_Notes Contains "resolve immediately") Or (Form_Name Contains "Basic Issue Form" And Problem_Description Contains "critical problem")
Note that the CATSWeb Category field is special, in that the Category field value from the parent Action, Issue or Subtask record is included in the index entries for associated child records. This means that either of the boolean restrictions in the first example above will also limit the Notes, File Attachments, Signatures, etc. to ones for "Basic Action Form" records. However, Subform records have their own Category field, with a separate ParentCategory field. If the intent is to limit records to a particular Issue, Action or Subtask category, including Subform records, a restriction such as this one may be used:
Category Contains "Basic Action Form" Or ParentCategory Contains "Basic Action Form"
For all other fields, a boolean restriction will match that field only in the record types where the field actually exists. For example, if you define a boolean restriction based on the UDUserCreated field, the search will match the actual value in the record, regardless of the parent or child relationship (UDUserCreated is itself somewhat special in that it exists in all CATSWeb record types). And if you define a boolean restriction based on the field InvestigationNotes, the search will only match CATSWeb Action records, since that field is only present in Action records.
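If you build restrictions in a script before pasting them into the form, the caption rule described in this topic (remove special characters, replace spaces with underscores) might be sketched as follows. The function names are hypothetical, and the sketch assumes plain ASCII captions and collapses repeated spaces, which the guide does not specify.

```python
import re

def caption_to_field(caption):
    """Convert a field caption to the name form used in restrictions."""
    # Drop special characters (parens, colons, apostrophes, etc.).
    cleaned = re.sub(r"[^0-9A-Za-z_ ]", "", caption)
    # Replace spaces with underscores (runs of spaces become one underscore).
    return "_".join(cleaned.split())

def contains_restriction(caption, phrase):
    """Build a 'Field Contains "phrase"' boolean restriction string."""
    return f'{caption_to_field(caption)} Contains "{phrase}"'

print(caption_to_field("Action Title (Summary):"))
# Action_Title_Summary
print(contains_restriction("Form Name", "Basic Action Form"))
# Form_Name Contains "Basic Action Form"
```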
AND Connector
Use the AND connector in a boolean restriction to connect two expressions, both of which must be found in any document retrieved. For example:
apple pie and poached pear would retrieve any item that contained both phrases.
(apple or banana) and (pear w/5 grape) would retrieve any document that (1) contained either apple OR banana, AND (2) contained pear within 5 words of grape.
AndAny Connector
AndAny lets you combine a search for required search terms with other terms that are optional. The words before AndAny are required, and the words after AndAny are optional. For example:
(apple and pear) AndAny (grape or banana) would find any document that contains apple and pear; occurrences of grape and banana in those documents will also be counted as hits.
OR Connector
Use the OR connector in a search request to connect two expressions, at least one of which must be found in any document retrieved. For example, apple pie or poached pear would retrieve any document that contained apple pie, poached pear, or both.
W/N Connector
Use the W/N connector in a search request to specify that one word or phrase must occur within N words of the other. For example, apple w/5 pear would retrieve any document that contained apple within 5 words of pear. Some types of complex expressions using the W/N connector will produce ambiguous results and should not be used. The following are examples of ambiguous boolean restrictions:
(apple and banana) w/10 (pear and grape)
(apple w/10 banana) w/10 (pear and grape)
In general, at least one of the two expressions connected by W/N must be a single word or phrase or a group of words and phrases connected by OR. For example:
(apple and banana) w/10 (pear or grape)
(apple and banana) w/10 orange tree
Two built-in words are provided to designate the beginning or ending of a file or record. They are xfirstword and xlastword. The terms are useful if you want to limit a search to the beginning or end of a file. For example, apple w/10 xlastword would search for apple within 10 words of the end of a document.
The pre/N connector is like W/N, but it also requires that the first expression must occur before the second. For example, (apple or pear) pre/5 banana means that apple or pear must occur before banana and within 5 words of it.
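The difference between W/N and pre/N can be sketched in Python for single-word terms. This is a simplified model, not the search engine's implementation; in particular, measuring distance as the difference between word positions is an assumption.

```python
def positions(words, term):
    """Return the index of every occurrence of `term` in the word list."""
    return [i for i, word in enumerate(words) if word == term]

def within(words, first, second, n):
    """True if `first` occurs within n words of `second`, in either order."""
    return any(abs(i - j) <= n
               for i in positions(words, first)
               for j in positions(words, second))

def pre(words, first, second, n):
    """True if `first` occurs before `second` and within n words of it."""
    return any(0 < j - i <= n
               for i in positions(words, first)
               for j in positions(words, second))

words = "pear tart with baked apple".split()
print(within(words, "apple", "pear", 5))  # True: order does not matter
print(pre(words, "apple", "pear", 5))     # False: apple comes after pear
print(pre(words, "pear", "apple", 5))     # True
```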
NOT and NOT W/N Connector
Use NOT in front of any search expression to reverse its meaning. This allows you to exclude items from a search. For example:
apple sauce and not pear
NOT standing alone can be the start of a search request. For example, not pear would retrieve all documents that did not contain pear. If NOT is not the first connector in a request, you need to use either AND or OR with NOT:
apple or not pear
not (apple w/5 pear)
The NOT W/ ("not within") operator allows you to search for a word or phrase not in association with another word or phrase. For example:
apple not w/20 pear
Unlike the W/ operator, NOT W/ is not symmetrical. That is, apple not w/20 pear is not the same as pear not w/20 apple. In the apple not w/20 pear request, CATSWeb searches for apple and excludes cases where apple is too close to pear. In the pear not w/20 apple case, CATSWeb searches for pear and excludes cases where pear is too close to apple.
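The asymmetry can be seen in a small Python sketch that returns the positions of the target word that survive the exclusion. It is a simplified model for single words, with distance assumed to be the difference between word positions.

```python
def not_within_hits(words, target, other, n):
    """Return positions of `target` that are NOT within n words of `other`."""
    other_positions = [i for i, word in enumerate(words) if word == other]
    return [i for i, word in enumerate(words)
            if word == target
            and all(abs(i - j) > n for j in other_positions)]

words = "apple pear one two three four apple".split()

# apple not w/2 pear: the second apple is far enough from pear to count.
print(not_within_hits(words, "apple", "pear", 2))  # [6]

# pear not w/2 apple: the only pear is too close to the first apple.
print(not_within_hits(words, "pear", "apple", 2))  # []
```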
Numeric Range Searching
A numeric range search is a search for any numbers that fall within a range. To add a numeric range component to a boolean restriction, enter the upper and lower bounds of the search separated by the special characters ~~ like this:
Part w/2 20~~22
This would find any item containing Part within 2 words of a number between 20 and 22 (such as 21 CFR Part 11). Note that:
Columns in Results - Choose the columns that you wish to appear in the search results. Unlike a query or filter where the columns are the actual field values from a record, columns in search results may not be data from the records at all. For example, Relevance is a percentage ranking of the item relative to other items in the search results, based on your search criteria.
When you open a new Advanced Search form, CATSWeb defaults the column selections to the ones that are most useful for the majority of searches. Use the <Ctrl> key to select multiple columns, or use the <Alt> key to select multiple columns in a range.
Fuzziness - Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors, or for text that has been scanned using optical character recognition (OCR).
If you choose a non-zero fuzziness value, fuzzy searching will be enabled for all of the words in the Search Request. Higher values cause more fuzziness to be applied. High fuzziness values are typically not a good idea, as your search will probably find much more information than you desired. You may also specify fuzzy searching for individual words by choosing a value of zero (0) and using the "%" special character as described in the Search Request topic.
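As a rough illustration of per-word fuzzy matching with the "%" character, the Python sketch below treats the letters before the first "%" as a fixed prefix and each "%" as one allowed difference. Counting a "difference" as a single-character insertion, deletion or substitution (Levenshtein distance) is an assumption; the search engine's actual fuzzy algorithm may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def fuzzy_match(term, candidate):
    """Match `candidate` against a term such as 'ba%nana' or 'b%%anana'."""
    prefix = term.split("%", 1)[0]      # letters that must match exactly
    max_differences = term.count("%")   # one allowed difference per %
    word = term.replace("%", "")
    return (candidate.startswith(prefix)
            and edit_distance(word, candidate) <= max_differences)

print(fuzzy_match("ba%nana", "bannana"))    # True: one extra letter
print(fuzzy_match("ba%nana", "panana"))     # False: does not begin with ba
print(fuzzy_match("b%%anana", "bannanna"))  # True: two extra letters
```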
Indexes to Search - Select the indexes that you wish to search within. An index typically corresponds to a collection of records or files, such as a CATSWeb table. Use the <Ctrl> key to select multiple indexes, or use the <Alt> key to select multiple indexes in a range.
Maximum Results - Use this setting to limit the number of items returned in the search results. A value of zero (0) means that there is no limit. The search engine will not stop searching after the limit is reached. Instead, it will complete the search and collect the best matching items among all of the items retrieved. Note that your administrator may have specified a system-wide maximum for this value. If a system-wide maximum exists, your setting will be ignored if it is greater than the maximum. See also: Search Stop Limit
Phonic Searching - Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe. Checking this box will apply phonic searching to all words in your Search Request. You may also add the "#" special character to the beginning of individual words to apply phonic searching to just some words (Ex: #bear).
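To give a feel for "sounds like" matching, the Python sketch below uses the classic Soundex code as a stand-in. This guide does not say which phonetic algorithm the search engine uses, so Soundex here is an assumption chosen only because it keeps the first letter and groups similar-sounding consonants.

```python
# Consonant groups of the classic Soundex algorithm.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit

def soundex(word):
    """Return the four-character Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "hw":   # h and w do not separate equal codes
            previous = digit
    return (code + "000")[:4]

def phonic_match(term, candidate):
    """True when both words share the same Soundex code."""
    return soundex(term) == soundex(candidate)

print(soundex("Smith"), soundex("Smithe"), soundex("Smythe"))  # S530 S530 S530
print(phonic_match("bear", "bare"))                            # True
```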
Return Most Recent - This setting has no effect unless the search process is terminated early due to a Search Stop Limit threshold being reached. In that case, the setting has the effect of skewing the search toward more recent items, rather than items with the best relevancy. This can in turn change the set of results returned. The setting has no effect on how search results are ordered, since CATSWeb sorts (orders) the search results after the search engine returns them. Leaving the box unchecked is best for most searches.
Search Request - Enter your main natural language search expression here, or enter nothing and use Boolean Restrictions instead. You may use words, quoted phrases, tokens, special characters or expressions. Note that when you perform a simple search from the CATSWeb page header, you are entering this parameter, and may use simple words or any of the techniques described here.
Just as in CATSWeb Queries and Filters, you may enter text tokens to represent values from the session record of the user who executes the search. Terms in your search request may also use the following special characters:
Character | Meaning |
? | A wildcard that replaces a single character (matches any character in that position). Ex: appl? matches apply and apple but not apples. |
* | A wildcard that replaces any number of characters. Ex: ap*ed matches applied, approved, etc. *cipl* matches principle, participle, etc. appl* matches apple, apples, application, etc. |
% | Specifies that a Fuzzy search should be performed on that word. Ex: ba%nana matches words that begin with ba and have at most one difference from banana. Entering b%%anana matches words that begin with b and have at most two differences from banana. The Fuzziness parameter should be set to 0 when performing fuzzy searches on individual words using this character. Non-zero values automatically apply fuzziness to all words in the search. |
# | Used at the beginning of a word, specifies that a Phonic search should be performed on that word. Ex: #bear will match bear and bare. |
~ | Used at the end of a word, specifies that stemming should be applied to the word. Ex: fish~ will match fish, fishing, fished, etc. |
+ and - | Use plus (+) sign to designate words or phrases that are required to exist in matched records. Use the minus (-) sign to designate words or phrases that must not exist in matched records. Ex: "apple pie" -salad +"ice cream" finds records that contain the phrases "apple pie" and "ice cream" but do not contain the word salad. |
:: | Use a pair of colons (::) to specify a search within a particular field in the search request. Ex: Criticality_Of_Problem::High finds records that contain "High" in fields with captions of "Criticality Of Problem". Note that special characters within the caption must be removed (parens, colons, apostrophes, etc.) and spaces in the caption must be replaced with underscore characters. You may also use the field's actual table field name, rather than the caption. For example, if the "Criticality Of Problem" field uses the Text1 table field on all forms, the following search request is equivalent: Text1::High |
## | Used to specify a Regular Expression. Regular Expression searching provides a way to search for advanced combinations of characters. A regular expression included in a search request must be quoted and must begin with ##. Ex: Apple and "##200[4-5]"; Apple and "##20[0-9]+". A regular expression must match a single whole word. For example, you could not search for "apple pie" with the regular expression "##app.*ie". For more information on Regular Expressions, refer to reference books and Web sites devoted to the subject. The Regular Expression Library at http://regexlib.com is a good starting point. |
~~ | Used to specify a numeric range in Boolean Restrictions only, not usable in the Search Request. Ex: apple w/5 12~~17 means to find all records containing apple within 5 words of a number between 12 and 17. For more information, see numeric range searching in the Boolean Restrictions topic. |
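The wildcard and regular expression rows in the table can be tried out with Python's standard library. This is only an approximation for experimenting with patterns: fnmatch happens to use the same ? and * wildcards, case-insensitive matching is an assumption, and Python's regular expression dialect may differ from the one used by the search engine.

```python
import fnmatch
import re

def wildcard_match(pattern, word):
    """Match one word against a pattern that uses ? and * wildcards."""
    return fnmatch.fnmatchcase(word.lower(), pattern.lower())

def regex_match(term, word):
    """Match one whole word against a '##'-prefixed regular expression."""
    if not term.startswith("##"):
        raise ValueError("a regular expression term must begin with ##")
    return re.fullmatch(term[2:], word) is not None

print(wildcard_match("appl?", "apple"))     # True
print(wildcard_match("appl?", "apples"))    # False: ? replaces one character
print(wildcard_match("ap*ed", "approved"))  # True
print(regex_match("##200[4-5]", "2004"))    # True

# A regular expression is tested against one word at a time, so
# "##app.*ie" cannot match the two-word phrase "apple pie".
print(any(regex_match("##app.*ie", w) for w in ["apple", "pie"]))  # False
```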
Search Stop Limit - This setting causes a search to halt automatically when the specified number of items have been found. This provides a way to limit the resources consumed by searches that retrieve a very large number of items. A value of zero (0) means that there is no limit. Note that your administrator may have specified a system-wide maximum for this value. If a system-wide maximum exists, your setting will be ignored if it is greater than the maximum. See also: Maximum Results
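The contrast between Search Stop Limit and Maximum Results can be modeled in a short Python sketch. The run_search function, its parameters and the sample data are hypothetical; the sketch only mirrors the behavior described in this guide (the stop limit halts the search early, while the maximum keeps the best matches among everything found).

```python
import heapq

def run_search(items, matches, max_results=0, stop_limit=0):
    """items: iterable of (name, relevance) pairs; matches: predicate on name."""
    found = []
    for name, relevance in items:
        if matches(name):
            found.append((name, relevance))
            # Search Stop Limit: halt as soon as this many items are found.
            if stop_limit and len(found) >= stop_limit:
                break
    # Maximum Results: keep the best-ranked items among everything found.
    if max_results:
        found = heapq.nlargest(max_results, found, key=lambda item: item[1])
    return found

items = [("a", 10), ("b", 90), ("c", 50), ("d", 70)]
match_all = lambda name: True

print(run_search(items, match_all, max_results=2))  # [('b', 90), ('d', 70)]
print(run_search(items, match_all, stop_limit=2))   # [('a', 10), ('b', 90)]
```

With only a stop limit, the first items encountered are returned even if better matches exist later, which is why the two settings can return different sets of results.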
Stemming - Stemming extends a search to cover grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply. Checking this box will apply stemming to all words in your Search Request. You may also add the "~" special character to the end of individual words to apply stemming to just some words (Ex: fish~).
Synonym Searching - Synonym searching finds synonyms of a word in a search request. For example, a search for fast would also find quickly. Choose a WordNet library to apply synonym searching to all words in your Search Request, or add the "&" special character to the end of individual words to apply synonym searching to just some words (Ex: fast&).
In a phrase search, synonym searching applies to the phrase, not to individual words in the phrase. For example, a search for "object lesson" with WordNet Thesaurus enabled would find "example", which is a synonym for "object lesson". The individual words in the phrase, object and lesson, are not expanded into their respective synonym lists.
Results Ordering
This section of the Advanced Search form allows the results to be ordered by up to 4 different columns in the Search Results. Choose the desired columns in a top-to-bottom order on the form, and specify ascending or descending order for each column.
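Ordering by several columns works like a multi-key sort, where the first column chosen has the highest priority. A minimal Python sketch follows, assuming each result row is a dictionary keyed by column name; the column names and rows are hypothetical sample data.

```python
def order_results(rows, ordering):
    """Sort rows (dicts) by up to four (column, descending) pairs."""
    if len(ordering) > 4:
        raise ValueError("at most 4 ordering columns are supported")
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last column first and the
    # first column last yields a correct multi-column ordering.
    for column, descending in reversed(ordering):
        ordered.sort(key=lambda row: row[column], reverse=descending)
    return ordered

rows = [
    {"Item": "B-2", "Relevance": 80},
    {"Item": "A-1", "Relevance": 80},
    {"Item": "C-3", "Relevance": 95},
]

# Relevance descending first, then Item ascending to break ties.
result = order_results(rows, [("Relevance", True), ("Item", False)])
print([row["Item"] for row in result])  # ['C-3', 'A-1', 'B-2']
```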
The following output options may be selected:
An Advanced Search may be optionally saved for later reuse. To save an Advanced Search, select Public or Private and specify a name for the search in the Save As field. Public searches will be available to all users, while Private searches will only be available to you. If the Save As name for the search already exists, the old search will be overwritten with the new search.
If you have recalled a saved Advanced Search, the Delete check box will also be available. Checking this box and submitting the search causes it to be deleted from either the public or private Advanced Search list.
Results of the search are presented using the columns specified in Columns in Results. Your system administrator may have limited the columns available, since many of the columns are intended for special applications and are not relevant to end-user searches.
The Item column in the search results will typically include a link back to the live (current) version of the record or file. The following iconic symbols may also be available in the column:
Search result columns include:
Hit Highlighting is available when an item is viewed via links in the search results. When the item is a CATSWeb record, clicking the main Item link opens the record's View page with hit highlighting enabled. When the item is a file, the iconic H may be used to view a highlighted HTML version of the file, subject to limitations of the HTML conversion process.
Hit highlighting is not always accurate; it is simply a tool to assist you in quickly locating the information that you searched for. Inaccuracies in hit highlighting can result from a file or record being changed since it was last indexed, and from limitations in the highlighting algorithms themselves. For example, if your search uses wildcards, fuzziness or other means of specifying inexact searches, the matching words and phrases may not always be highlighted, especially in CATSWeb records. Terms and words that contain special characters such as accented characters may also not be highlighted.
Exporting Search Results
Search results may be exported to spreadsheet programs and other software simply by saving the HTML source of the search results. All browsers provide this capability (typically from the File menu), and most spreadsheet programs are capable of importing HTML files. HTML tables in the search results will be converted to tables in the spreadsheet program.
When executing searches specifically for exporting, you may wish to check the Suppress Links In Data and Suppress Page Header output options. See above for descriptions of these options.
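If your target program cannot import HTML directly, the saved HTML source can be converted to CSV with a short script. The Python sketch below uses only the standard library; it assumes the results are laid out in ordinary table rows and cells, and the sample markup is hypothetical rather than actual CATSWeb output.

```python
import csv
import io
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def html_tables_to_csv(html):
    """Return the rows of all tables in the HTML text as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(parser.rows)
    return output.getvalue()

# Hypothetical fragment of saved search results.
sample = ("<table><tr><th>Item</th><th>Relevance</th></tr>"
          "<tr><td>Action 12</td><td>87%</td></tr></table>")
print(html_tables_to_csv(sample))
```

Links inside cells are reduced to their visible text, which has a similar effect to the Suppress Links In Data option for the purposes of the exported file.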