A computer provides a better way to store information like the card catalog; indeed, most public libraries today keep their card catalogs on a computer. For each book in the library, a data record is created that contains information gathered from the various card catalogs. For example, the title of the book, the author's name, the physical location of the book, and any other relevant information. A record is generally composed of several fields, with each field used to store a particular piece of information. For example, we might store the author's last name in one field and the first name in a separate field. All the records (one for each book) are collected and stored in a file. The file containing the records is typically called the data file.
Indexes are created so that a particular record in the data file can be located quickly. For example, we could create an author index, a title index, and a subject index. The indexes are typically stored in a separate file called the index file.
An index is a collection of "keys", one key for each record in the data file. A key is a subset of the information stored in a record. When an index is created, the key values are extracted from one or more fields of each record. The value of each key determines its order in the index (i.e., the keys are sorted alphabetically or numerically). Each key has an associated pointer that indicates the location in the data file of the corresponding complete record. To find a particular record, a matching key is quickly located in the index, then the associated pointer is used to locate the complete record.
For example, assume that an author's first and last names are stored in two separate fields of each record. One option would be to define the author index as a simple (single field) key consisting of the last name field. This would cause the author index to order the records by last name. A second option would be to define the author index as a compound (multiple field) key consisting of the last name field followed by the first name field. This would cause any records with the same last name to be further ordered by first name. The first option would require less storage space since the first name would not be stored in the index. However, the second option would provide somewhat faster access times if there were a large number of authors with the same last name. With the addition of the first name field, each key would point closer to the desired record, which would result in less sequential searching.
In some cases, keys may have identical values. For example, different authors may have the same name, and some authors may have written more than one book. Either case would cause the author index to have multiple keys with the same value (i.e., duplicate keys). Locating a record via an index that contains duplicate keys will always require some sequential searching. Since the key itself does not uniquely identify a particular record, additional information must be retrieved from the complete record in order to determine whether or not it is the one desired.
Every record should typically contain at least one piece of information (or field) that is unique. By unique, we mean that no two records could contain the same value for this field. In the case of our card catalog example, a unique field might be a book's ISBN number, or perhaps some arbitrarily chosen serial number. In the case of a billing system, it might be an invoice number. A unique field ensures that each record is distinguishable from all other records. If the keys in an index are composed of values from a unique field, then random accesses using that index would always position directly to the desired record. No sequential searching would be necessary since only one record would contain the unique value. We would call such an index a "unique index".