Information retrieval systems that support retrieval by subject fall into three basic categories:

  • Group 1 indexes works via the actual words used in the document, its title and/or its abstract, and uses ‘uncontrolled’ or ‘natural’ language
  • Group 2 uses ‘controlled’ words to describe a work’s subject
  • Group 3 uses a ‘controlled’ notation (numbers, letters or combinations) to express subjects

In ‘controlled’ (or ‘prescribed’) indexing languages, the precise terms used to describe subjects and the process by which such terms are assigned are managed by a professional member of library staff. The development of online catalogues has made it possible to combine controlled indexing with uncontrolled approaches (i.e. keyword access and full-text searching), allowing users to benefit from the best aspects of both.

A controlled indexing vocabulary based on an authority list is intended to aid indexing and searching because it (Olson & Boll, 2001):

  • Authorises a single term or notation for any one concept
  • Establishes the size or scope of the term
  • Explicitly records its hierarchical and affinitive or associative relations
  • Controls variant spellings
  • Explicitly identifies the multiple concepts expressed by homonyms, by means of adjectives, qualifiers, or phrases and precise terminology
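
The features listed above can be pictured as fields on an authority record. The following Python sketch is purely illustrative: the names AuthorityRecord, AuthorityList and resolve, and the sample terms, are assumptions made for this example and are not taken from Olson & Boll or from any particular library system.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One authorised term plus its control data (hypothetical structure)."""
    preferred: str                               # the single authorised term
    scope_note: str = ""                         # states the scope of the term
    variants: set = field(default_factory=set)   # variant spellings and synonyms
    broader: set = field(default_factory=set)    # hierarchical relations (upwards)
    narrower: set = field(default_factory=set)   # hierarchical relations (downwards)
    related: set = field(default_factory=set)    # associative relations


class AuthorityList:
    """Maps whatever a searcher or indexer types to authorised terms."""

    def __init__(self):
        self.records = {}        # preferred term -> AuthorityRecord
        self._entry_terms = {}   # lower-cased entry term -> set of preferred terms

    def add(self, record):
        self.records[record.preferred] = record
        # "Mercury (planet)" is also reachable from the unqualified "Mercury".
        unqualified = record.preferred.split(" (")[0]
        for entry in {record.preferred, unqualified, *record.variants}:
            self._entry_terms.setdefault(entry.lower(), set()).add(record.preferred)

    def resolve(self, term):
        """Return the sorted authorised terms that an entry term leads to."""
        return sorted(self._entry_terms.get(term.strip().lower(), set()))


# Sample data invented for illustration only.
authority = AuthorityList()
authority.add(AuthorityRecord("Color", scope_note="Colour as a visual property",
                              variants={"Colour"}, narrower={"Pigments"}))
authority.add(AuthorityRecord("Mercury (planet)", broader={"Planets"}))
authority.add(AuthorityRecord("Mercury (element)", variants={"Quicksilver"},
                              broader={"Metals"}))

print(authority.resolve("colour"))    # variant spelling leads to one term
print(authority.resolve("Mercury"))   # homonym leads to both qualified terms
```

In this sketch a variant spelling resolves to a single authorised term, while a homonym resolves to every qualified heading, leaving the searcher to choose between them.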

Use of such a vocabulary helps searchers focus their thoughts when they approach the system with an incomplete understanding of the information they require, and increases the probability that:

  • Both indexer and searcher will express a specific concept in the same way
  • Both indexer and searcher will be led to a desired topic by syndetic features (e.g. “broader term”, “narrower term”, “related term”)
  • The same term will be used by different indexers, thereby ensuring consistency of indexing

The alternative approach of “keyword” and/or full-text access enables users to utilise their own terminology and can provide better recall, particularly when augmented by the use of dictionaries, corpora, stemmers, parsers, etc.
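
As a rough illustration of how a stemmer can improve recall in keyword access, the Python sketch below builds a small inverted index over invented sample texts. The function names and the suffix-stripping rule are assumptions made for this example; the rule is deliberately crude and stands in for a real stemmer.

```python
import re
from collections import defaultdict


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lower-case the text, split it into words and stem each word."""
    return [naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_index(documents):
    """Map each stemmed word to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokens(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every stemmed query word."""
    postings = [index.get(t, set()) for t in tokens(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Sample texts invented for illustration only.
documents = {
    1: "Indexing libraries with controlled vocabularies",
    2: "A library catalogue indexes every title",
    3: "Full-text searching of abstracts",
}
index = build_index(documents)

print(search(index, "indexes"))            # matches "Indexing" and "indexes"
print(search(index, "searches abstract"))  # matches "searching" and "abstracts"
```

Because the rule is so simple, it still treats “library” and “libraries” as different words, which is the kind of gap that dictionaries or a controlled vocabulary are meant to close.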

The high input cost is often mentioned as the main disadvantage of controlled vocabularies. However, such systems can compensate for variation in language or subject term usage when items are indexed for cross-disciplinary collections.