   
SUBJECT DATA IN THE METADATA RECORD
A Report from the ALCTS/CCS/SAC/Subcommittee on Metadata and Subject Analysis
Working Draft, July 1999
SUMMARY OF RECOMMENDATIONS
For subject data in the metadata record, the ALCTS/CCS/SAC/Subcommittee on Metadata and Subject Analysis recommends the following:
1.1 Vocabulary, Semantics, and Syntax (sections 3 and 4)
A combination of keywords and controlled vocabulary should be used to allow users the choice of simple free-text indexing as well as complex controlled vocabulary indexing. (3.1)
Use of multiple vocabularies should be accommodated. For a general vocabulary covering all subjects, the Subcommittee recommends the use of LCSH or Sears with or without modification. (3.2.1)
In order to achieve the desired level of specificity, controlled vocabulary terms assigned to the metadata record could be supplemented and complemented by keywords and other subject-related elements, such as title, abstract, statement of content, etc. (3.2.2.1)
Synonyms should be handled by system design implementation of the controlled vocabulary or thesaurus. If this is not available, an alternative is to include all identified synonyms and related terms, along with the keywords, in the metadata record. (3.2.2.2)
Tools such as online thesaurus display should be developed to provide access to controlled vocabulary structures, showing both hierarchically (broader and narrower) and horizontally related terms. (3.2.2.3)
The metadata record, and the subject element in particular, should be as simple or as complex as desired. Trained catalogers may choose to continue to apply LCSH to the metadata records in the same manner as they do to MARC records. For those not trained in subject cataloging, the Subcommittee recommends a simplified syntax. (3.2.3)
The development and refinement of methods for harmonization of subject terms from different controlled vocabularies should be undertaken, and investigation of the feasibility of developing a general metathesaurus or expanding the medical metathesaurus to include indexing terms covering all subject areas should be encouraged. (3.2.4)
Classification data should be included in the metadata record by those who have the expertise to do so. For those not trained in the use of classification, further development and improvement of mechanisms for automatic assignment of classification data from different schemes and sources should be encouraged. (4.1)
The use of as many existing classification schemes (DDC, LCC, NLM, etc.) as useful and feasible even within a particular implementation should be allowed. Multiple class numbers should be allowed in the same record to bring out different topics and aspects treated provided that they are properly designated and coded. (4.2)
1.2 Application (section 5)
In the Dublin Core metadata record, the Subcommittee recommends the inclusion in the SUBJECT element of both free-text and controlled terms, where appropriate and feasible, in order to achieve optimal recall and precision in retrieval. (5.1)
For the sake of semantic interoperability, the Subcommittee recommends adopting an existing vocabulary or vocabularies with or without modification. (5.1.1)
The adoption or adaptation of Library of Congress Subject Headings or Sears List of Subject Headings (for subject representation on a broader level) as the basis for subject data in the Dublin Core metadata records for a general collection is recommended. (5.1.1.1)
Criteria for choosing specialized vocabularies should be based on subject matter, the intended audience, term specificity, and syntax. (5.1.1.2)
Each implementing agency should establish policies regarding the appropriate level of subject representation for its collection. At the appropriate level, the most specific subject terms provided by the chosen controlled vocabulary should be assigned. (5.1.2)
Within a specified digital collection or project, the application of subject analysis should be consistent; in other words, the same semantics and syntax should be applied throughout. Compatibility with other metadata schemes is also desirable. When a controlled vocabulary is used, the version of the vocabulary should be indicated along with the date on which the subject data are created. (5.1.3)
With regard to syntax, the use of full LCSH subject strings, if feasible (i.e., if time and trained personnel are available), particularly in the OPAC environment, should be encouraged. For the Dublin Core, the Subcommittee endorses the use of other elements (type, coverage) in addition to the SUBJECT element to accommodate different facets related to subject: topic, place, period, language, etc. Deconstructed subject strings should be so designated. (5.1.4)
For classification data, the Subcommittee recommends adopting an existing scheme with or without modification. Criteria for choosing classification schemes should be based on subject domain, the nature and scope of the collection being described, and the user community being served. (5.2.1)
Classification data at the most exhaustive or specific level should be encouraged. (5.2.2)
Classification notation should be included. However, item (non-topical Cutter) numbers are not necessary because classification data are not used as a shelving device in this context. Multiple classification numbers should be allowed, either from various classification schemes or as multiple numbers within the same scheme. In the metadata record, captions (i.e., the text accompanying the class numbers) need not be included. If desired, captions could be built in through systems design. (5.2.3)
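As a rough illustration of recommendations 5.1, 5.1.3, and 5.1.4, the sketch below builds a Dublin Core-like record as a plain Python dictionary. The field names (scheme, version, assigned, deconstructed), the version label, and the sample heading are assumptions made for this example; they are not taken from the report or from any Dublin Core specification.

```python
from datetime import date


def deconstruct(heading):
    """Split a pre-coordinated subject string on the '--' separator."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def controlled_subject(term, scheme, version, assigned, deconstructed=False):
    """Describe one controlled term, recording vocabulary version and date."""
    return {
        "term": term,
        "scheme": scheme,
        "version": version,
        "assigned": assigned.isoformat(),
        "deconstructed": deconstructed,
    }


# Illustrative heading only; not checked against the actual LCSH file.
heading = "Libraries -- Automation -- History"
assigned_on = date(1999, 7, 1)
version = "hypothetical-1999"  # placeholder label, not a real edition name

record = {
    "title": "Example resource",
    "subject": (
        # Free-text keywords carry no scheme designation.
        [{"term": kw, "scheme": None} for kw in ["metadata", "digital libraries"]]
        # The full string is kept intact, as a trained cataloger would assign it.
        + [controlled_subject(heading, "LCSH", version, assigned_on)]
        # Deconstructed parts are flagged so they are not mistaken for full strings.
        + [
            controlled_subject(part, "LCSH", version, assigned_on, deconstructed=True)
            for part in deconstruct(heading)
        ]
    ),
}
```

Keeping keywords and controlled terms in the same list, distinguished only by their scheme designation, is one possible design; an implementation could equally route place and period facets to the coverage element.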
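One way to picture recommendations 4.2 and 5.2.3 is the following minimal sketch, in which each class number is stored with its scheme designation and captions live in a separate system-side table instead of in the record. The class names, the caption table, and the sample notations are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassNumber:
    scheme: str    # designation of the scheme, e.g. "DDC", "LCC", "NLM"
    notation: str  # class notation only; no item (Cutter) number


# Hypothetical caption table standing in for a lookup built into the system.
CAPTIONS = {
    ("DDC", "025.3"): "illustrative caption A",
    ("LCC", "Z693"): "illustrative caption B",
}


def add_class_number(record, scheme, notation):
    """Append a class number, refusing entries without a scheme designation."""
    if not scheme:
        raise ValueError("each class number must name its scheme")
    record.setdefault("classification", []).append(ClassNumber(scheme, notation))


def display(entry):
    """Render a class number, adding a caption only if the system has one."""
    caption = CAPTIONS.get((entry.scheme, entry.notation))
    label = f"{entry.scheme} {entry.notation}"
    return f"{label} ({caption})" if caption else label


record = {}
add_class_number(record, "DDC", "025.3")
add_class_number(record, "DDC", "004.6")  # second number, same scheme
add_class_number(record, "LCC", "Z693")   # number from a different scheme
```

Because captions are resolved at display time, the record itself stays small and a caption table can be updated without touching existing records.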
1.3 Systems Design (section 6)
The development and refinement of the following online system features are highly recommended:
- automatic keyword indexing based on word occurrences in the full-text resources, using natural language processing methods;
- automatic generation of classification data based on the resource itself;
- automatic extraction of subject and classification data from records for similar items;
- availability of online access to controlled vocabularies and classification schemes for creators of metadata records;
- automatic mapping from user input free-text terms to controlled vocabularies and classification data; and
- availability of online tools and assistance, designed particularly for non-catalogers, to derive appropriate subject terms and/or class numbers.
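The mapping feature in the list above, from user-input free-text terms to controlled vocabulary, might be sketched as a simple entry-vocabulary lookup. The table below is a hypothetical stand-in for a real thesaurus, and the function names are invented for this example; an actual system would draw on the full syndetic structure of the chosen vocabulary.

```python
# Hypothetical entry vocabulary: lead-in (synonym) terms -> preferred term.
ENTRY_TERMS = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobiles": "Automobiles",
    "movies": "Motion pictures",
    "films": "Motion pictures",
}


def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(text.lower().split())


def map_terms(user_input):
    """Return (controlled, unmatched) for comma-separated free-text terms."""
    controlled, unmatched = [], []
    for raw in user_input.split(","):
        term = normalize(raw)
        if not term:
            continue
        preferred = ENTRY_TERMS.get(term)
        if preferred is None:
            # Keep the term as a free-text keyword instead of discarding it.
            unmatched.append(term)
        elif preferred not in controlled:
            controlled.append(preferred)
    return controlled, unmatched


controlled, keywords = map_terms("Cars,  movies , autos, jazz")
```

Terms with no match are returned separately so they can still be recorded as keywords, in line with the recommendation to combine free-text and controlled terms.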
For a copy of the complete report, please contact Diane Dates Casey at d-casey@gov.st.edu