Managing Full Text Indexing Jobs

Contents

Introduction
Settings
History and Statistics
Job Scheduling Constraints and Tips

Introduction

Full Text Indexing Jobs are used to create and update Full Text Search Indexes. A job may process any number of indexes, and may be configured either to run automatically on a configurable schedule or to run only when manually launched.

To manage Full Text Indexing Jobs, click the Full Text Search Management link in the Special Functions section of the Manage page, then click the Manage Full Text Indexing Jobs link. Because Full Text Indexing Job records are subcomponent records, they may also be accessed via the Component Management link. You can limit who can modify the Full Text Indexing Jobs by specifying ownership. For more information on planning and installing the Full Text Search software and infrastructure, refer to the Installation Guide for CATSWeb Full Text Search.


Settings

Full Text Indexing Job records have the following settings and special functions:

Active               Hours            Months
Component Name       Indexes          Order within Hour
Days of the Month    Job Frequency    Run Job Now
Days of the Week     Job Name         Subcomponent Type
Description          Job Type

Active - Determines whether or not the job runs automatically on a scheduled basis. The setting is ignored when the Run Job Now link is used to run the job.

Component Name - A read-only value of "Full Text Indexing Jobs", which is the parent component for all Full Text Indexing Job subcomponent records.

Days of the Month - Used only for jobs with a Job Frequency setting of One-time or Monthly by Date, this multi-select list is used to specify the days on which the job will run during the month.

Days of the Week - Used only for jobs with a Job Frequency setting of Periodically, this multi-select list is used to specify the days of the week on which the job will run. 01=Sunday, 02=Monday, etc.

Description - A description of the indexing job.

Hours - This multi-select list applies to all jobs and is used to specify the hours of the day in which the job will run. Hours are specified in 24-hour format: 00=Midnight, 12=Noon, 23=11 PM. If multiple jobs are scheduled for the same hour, the Order within Hour setting determines the order in which they are executed.

Indexes - This multi-select list is used to specify the Full Text Search Indexes that the job will process. At least one index must be specified.

Job Frequency - This setting is used to specify the frequency of the job and how it is scheduled. The setting is ignored when the Run Job Now link is used to run the job. One of the following values:
  • One-time - The job is executed only once at the first scheduled time. Following execution, the Active setting is automatically changed to false (unchecked), thereby preventing it from running again. The Months, Days of the Month and Hours settings determine when the job runs. The Days of the Week setting is ignored.
  • Monthly by Date - The job is executed on a regular monthly schedule as defined by the Months, Days of the Month and Hours settings. The Days of the Week setting is ignored. The job may be configured to run many times during each month, since the applicable scheduling parameters are multi-select lists.
  • Periodically - The job is executed on a regular schedule as defined by the Months, Days of the Week and Hours settings. The Days of the Month setting is ignored. You might also think of this type of job as a 'Weekly by Day' job, with the added benefit of being able to limit its execution based on the month (select all months if by-month limits aren't needed). The job may be configured to run many times during each week, since the applicable scheduling parameters are multi-select lists.
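The three frequency values differ mainly in which day setting is consulted. The following Python sketch illustrates that matching logic; it is not CATSWeb code. The Job class, its field names, and the is_due function are hypothetical, and treating an empty Months selection as matching no month is an assumption based on the advice to select all months rather than none.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Job:
    # Hypothetical field names; the values mirror the settings described above.
    frequency: str  # "One-time", "Monthly by Date", or "Periodically"
    months: set = field(default_factory=set)         # 1=January ... 12=December
    days_of_month: set = field(default_factory=set)  # 1..31
    days_of_week: set = field(default_factory=set)   # 1=Sunday ... 7=Saturday
    hours: set = field(default_factory=set)          # 0=Midnight ... 23=11 PM
    active: bool = True


def is_due(job: Job, when: datetime) -> bool:
    """Return True if the job's schedule matches the hour containing `when`."""
    if not job.active:
        return False
    # Months and Hours apply to every frequency value.
    if when.month not in job.months or when.hour not in job.hours:
        return False
    if job.frequency in ("One-time", "Monthly by Date"):
        # Days of the Week is ignored for these frequencies.
        return when.day in job.days_of_month
    if job.frequency == "Periodically":
        # Days of the Month is ignored. Convert Python's ISO weekday
        # (Monday=1 ... Sunday=7) to the 1=Sunday ... 7=Saturday codes.
        day_code = when.isoweekday() % 7 + 1
        return day_code in job.days_of_week
    return False


# A weekday job that runs at 1 AM in every month.
weekday_job = Job("Periodically", months=set(range(1, 13)),
                  days_of_week={2, 3, 4, 5, 6}, hours={1})
print(is_due(weekday_job, datetime(2024, 1, 1, 1, 15)))  # a Monday
print(is_due(weekday_job, datetime(2024, 1, 7, 1, 15)))  # a Sunday
```

The sketch does not model the One-time behavior of clearing the Active setting after the first run; a caller would set active to False once such a job has executed.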

Job Name - The name of the indexing job as it appears in the listing of jobs. Each job must have a unique name.

Job Type - The type of activity that the job performs on the indexes. One of the following values:

  • Rebuild - This type of job recreates the indexes in their entirety, and is the most expensive form of job in terms of system loading, network traffic and execution time required. During the rebuilding process, searches cannot be conducted using the affected indexes. A rebuild of all indexes is required to initially create the indexes when the Full Text Search feature is first installed. Periodic rebuilds are also desirable since a rebuild is the only way to purge deleted records or files from the indexes. After a rebuild, indexes are automatically compressed, so a follow-on Compress job is not required.
  • Update - Update jobs update existing indexes by adding new files or records to the indexes, and by refreshing the index data to reflect any changes to the underlying files or records. However, update jobs do not purge deleted files or records from the indexes. While indexes are undergoing an update, searches may still be conducted using the indexes. Update jobs are moderately expensive in terms of system loading, network traffic and execution time required.
  • Compress - Compress jobs do not functionally change the index data, but instead compress the index file collection to be as small as possible, thereby minimizing disk space consumption. Compress jobs may be appropriate if many Update jobs have been executed since the last Rebuild job (since "wasted space" can only be created during Update jobs). Compress jobs are not required for proper functioning of the Full Text Search feature, and may not be needed at all if disk space consumption is not an issue, or if Rebuild jobs are used frequently.

Months - This multi-select list applies to all jobs and is used to specify the months in which the job will run (01=January, 02=February, etc.). If the job is intended to run in all months, be sure to select all months, rather than selecting none.

Order within Hour - When multiple jobs are scheduled for the same hour, this setting determines the order in which the jobs are executed. Lower numbers run first.

Run Job Now - Clicking this link queues the job for execution as soon as possible (which may be immediately if no other jobs are ongoing or queued). The job will be executed per the saved job parameters that exist when the job begins. If you have just modified the job parameters, be sure to submit the changes before clicking the link.

Subcomponent Type - A read-only value of "Full Text Indexing Job", which is the type of subcomponent used for all Full Text Indexing Job records.


History and Statistics

The History and Statistics section provides read-only historical data regarding the last execution of the job. This information may be useful for planning and adjusting indexing jobs, and in diagnosing and troubleshooting problems.


Job Scheduling Constraints and Tips

Scheduling an optimal set of indexing jobs means balancing frequent index updates, which keep indexes “fresh”, against infrequent updates, which allow indexes to become “stale” but consume fewer resources. You should also understand how CATSWeb Indexing Server actually processes scheduled indexing jobs, so that you can avoid unexpected results and problems.

Whenever CATSWeb Indexing Server is not running an indexing job, it uses a timer to periodically "awaken" and check the current time. If the current time is in a new hour (relative to the last time it checked), it queries the CATSWeb database(s) to see if any jobs are scheduled to run during the new hour. If there are, it queues each job for execution. All jobs scheduled for the hour are queued before any are executed. Jobs are queued first by database (on multi-database systems), then by the Order within Hour setting if multiple jobs have been scheduled for the same hour within the same database.
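The queue order described above amounts to a two-level sort. The Python sketch below shows one way to express it. The ScheduledJob type and build_queue function are hypothetical, and ordering databases alphabetically by name is an assumption, since the order in which databases are processed is not specified here.

```python
from typing import NamedTuple


class ScheduledJob(NamedTuple):
    database: str           # hypothetical database identifier
    order_within_hour: int  # the Order within Hour setting
    name: str


def build_queue(due_jobs):
    """Order due jobs by database, then by Order within Hour (lower first)."""
    return sorted(due_jobs, key=lambda j: (j.database, j.order_within_hour))


due = [
    ScheduledJob("db_b", 1, "Update B"),
    ScheduledJob("db_a", 2, "Compress A"),
    ScheduledJob("db_a", 1, "Update A"),
]
for job in build_queue(due):
    print(job.database, job.order_within_hour, job.name)
```

Because Python's sort is stable, jobs that share both a database and an Order within Hour value keep their original relative order in this sketch; how the real server breaks such ties is not stated.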

CATSWeb Indexing Server then begins executing the jobs in the order they were queued. CATSWeb Indexing Server does not check for additional scheduled jobs while it is running a job. However, it does check for newly scheduled jobs between job runs. This behavior can lead to unexpected results if jobs are scheduled as in the following example:

  • A Rebuild job for an index on a large table is scheduled to run at 1 AM. Because of the large number of records, the job is expected to take 2.5 hours to process (i.e. expected to complete at 3:30 AM).
  • Other jobs are scheduled to run at 2 AM.

What will actually happen is this:

  • The 1 AM rebuild job will run successfully.
  • The 2 AM job(s) will not run at all, because CATSWeb Indexing Server will be busy processing the rebuild job during the entire 2 AM hour. It will never have an opportunity to look for other jobs during that hour.

This problem is easily avoided by scheduling both jobs to run at 1 AM, then using the Order within Hour setting to establish the order of the jobs (if a particular order is necessary). All jobs will be executed, since all will get queued before any of them execute.
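The skipped-hour behavior can be reproduced with a small simulation. The Python sketch below is a simplified model of the check-and-queue cycle described in this section, not the actual CATSWeb Indexing Server logic. The SimJob type, the simulate function, the 5-minute wake interval, and the 20-minute Update duration are all assumptions made for illustration.

```python
from collections import deque
from typing import NamedTuple


class SimJob(NamedTuple):
    name: str
    hour: int     # scheduled hour, 0..23
    order: int    # Order within Hour
    minutes: int  # assumed run time


def simulate(jobs, wake_interval_min=5, end_min=6 * 60):
    """Return the names of the jobs that ran, in execution order."""
    now = 0                   # minutes since midnight
    last_checked_hour = None  # hour seen at the previous schedule check
    queue = deque()
    ran = []
    while now < end_min:
        hour = now // 60
        if hour != last_checked_hour:
            # New hour: queue every job scheduled for it before running any.
            last_checked_hour = hour
            due = sorted((j for j in jobs if j.hour == hour),
                         key=lambda j: j.order)
            queue.extend(due)
        if queue:
            job = queue.popleft()
            ran.append(job.name)
            now += job.minutes  # no schedule checks happen while a job runs
        else:
            now += wake_interval_min  # idle until the next timer wake-up
    return ran


# Rebuild at 1 AM for 2.5 hours, Update at 2 AM: the 2 AM hour is never seen.
print(simulate([SimJob("Rebuild", 1, 1, 150), SimJob("Update", 2, 1, 20)]))
# Both at 1 AM, ordered with Order within Hour: both are queued and run.
print(simulate([SimJob("Rebuild", 1, 1, 150), SimJob("Update", 1, 2, 20)]))
```

In the first run only the Rebuild job executes, because the next schedule check happens at 3:30 AM and looks only at the 3 AM hour. In the second run both jobs execute, since both are queued at the 1 AM check.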
