Metadata formats

Traditionally, libraries have exchanged metadata in domain-specific formats such as MARC (MAchine Readable Cataloging) and offered end users only a limited choice of text-based download formats. These download formats have seen little standardisation, although some library OPACs support personal bibliographic citation management tools such as EndNote.

More recently, libraries have begun to offer metadata in less proprietary formats (e.g. RDF), often as part of open data initiatives designed to make their metadata more accessible to wider user communities.

MARC formats

To encode their printed and audiovisual material, most libraries still use MARC formats, which are applications of the ISO 2709 exchange format standard. MARC originated at the US Library of Congress in the 1960s. While the majority of library software still uses variations of the MARC format, there is an increasing push to implement new XML-based formats because of their greater flexibility.
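As an illustration of what "an application of ISO 2709" means in practice, the following minimal sketch builds and then re-reads a single record using the ISO 2709 layout: a 24-character leader, a directory of 12-character entries (tag, field length, start offset), and the variable fields themselves. The sample data and the function names are invented for this example, and the sketch assumes ASCII-only content so that character counts equal byte counts.

```python
# Minimal sketch of the ISO 2709 record layout used by MARC.
# Sample data and function names are illustrative assumptions.

FIELD_END = "\x1e"    # field terminator
RECORD_END = "\x1d"   # record terminator
SUBFIELD = "\x1f"     # subfield delimiter


def build_record(fields):
    """Serialise (tag, data) pairs into one ISO 2709 record string."""
    directory, body = "", ""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    base = 24 + len(directory) + 1     # leader + directory + its terminator
    total = base + len(body) + 1       # plus the record terminator
    # Leader: record length (0-4), fixed codes (5-11), base address (12-16),
    # more codes (17-19) and the entry map "4500" (20-23).
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(record):
    """Return (leader, [(tag, data), ...]) from an ISO 2709 record string."""
    leader = record[:24]
    base = int(leader[12:17])
    directory = record[24:base - 1]    # excludes the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field out of the data area, dropping its terminator.
        fields.append((tag, record[base + start:base + start + length - 1]))
    return leader, fields


sample_fields = [
    ("001", "12345"),
    ("245", "10" + SUBFIELD + "aAn example title"),
]
raw = build_record(sample_fields)
leader, parsed = parse_record(raw)
print(leader)
print(parsed == sample_fields)
```

Production code would normally rely on an established MARC library rather than hand-written slicing; the point here is only to show how the leader and directory index the fields.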

UNIMARC

UNIMARC was originally designed as a switching format to enable the wider exchange of bibliographic data. A number of countries have since developed it into a production format. It has also been used by UNESCO for its library products, mainly to help developing countries move to automated library management systems and standard data formats. UNIMARC currently consists of a set of four formats:

  • Bibliographic
  • Authorities
  • Classification
  • Holdings

The current maintenance agency for UNIMARC is the National Library of Portugal.

MARC 21

MARC 21 is the product of the integration of USMARC, UKMARC and CANMARC (Canadian MARC). It is the most extensively used MARC format in the world and a de facto standard. It has been designed to be both a production format and an exchange format. There are five MARC 21 formats: 

  • Bibliographic
  • Authorities
  • Holdings
  • Classification
  • Community information

The current maintenance agency for MARC 21 is the Library of Congress, which offers support documents on the formats, several of them available in translation, including Understanding MARC Bibliographic and Understanding MARC Authority Records. The French translations of the MARC 21 formats are maintained by Library and Archives Canada.

National MARC formats

Many countries have developed national versions of MARC, in order to accommodate local practices. To address this multiplicity of MARC formats, IFLA fostered the development of an international format dedicated to the exchange of bibliographic data among national libraries. UNIMARC was the result.

In recent years many countries have converged on the MARC 21 and UNIMARC formats, and there has been little development of new MARC formats.

The Dublin Core Metadata Initiative (DCMI)

The Dublin Core Metadata Initiative (DCMI) is an organisation dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialised metadata vocabularies.

The initiative began in 1995 with a workshop in Dublin, Ohio, that brought together librarians, digital library researchers, content providers, and text markup experts to improve discovery standards for information resources. The original Dublin Core emerged as a small set of descriptive elements that quickly drew global interest from a wide variety of information providers.

DC metadata element set

The Dublin Core Metadata Element Set is an ISO standard (ISO 15836), well known in the Web and library worlds as a cross-domain standard that defines 15 data elements for resource description. The standard was revised by ANSI/NISO in 2007 (Z39.85-2007) and by ISO in 2009.

The Dublin Core Metadata Element Set – Reference Description has been translated into 24 languages.
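To make the fifteen elements concrete, the sketch below lists them and serialises a small description as XML using Python's standard library. The element names and the namespace URI are those of the Dublin Core Metadata Element Set; the `metadata` wrapper element, the `dc_record` helper and the sample values are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record(values):
    """Build an XML element from a mapping of element name -> list of values.

    The wrapper element name "metadata" is an arbitrary choice for this
    sketch; Dublin Core itself does not prescribe a container.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, items in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for item in items:   # every element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return root


record = dc_record({
    "title": ["An example title"],
    "creator": ["Author, A. N.", "Writer, B."],
    "date": ["2012"],
    "language": ["en"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because all fifteen elements are optional and repeatable, the helper accepts any subset and a list of values for each.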

DCMI library application profile

The concept of application profiles emerged within the Dublin Core Metadata Initiative as a way to declare which elements from which namespaces are used in a particular application or project. Application profiles are defined as schemas which consist of data elements drawn from one or more namespaces, combined together by implementers, and optimised for a particular local application.

The DCMI Library Application Profile proposes a possible application profile that clarifies the use of the Dublin Core Metadata Element Set in libraries and library-related applications and projects.
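The idea of an application profile can be sketched as a small data structure that names the properties an application uses, drawn from more than one namespace, together with local usage rules. The profile below is hypothetical and is not the DCMI Library Application Profile itself; the `required` and `repeatable` rules and the `check` function are assumptions chosen to illustrate the concept.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical profile: properties from two namespaces plus local rules.
PROFILE = {
    DC + "title": {"required": True, "repeatable": False},
    DC + "creator": {"required": False, "repeatable": True},
    DCTERMS + "issued": {"required": True, "repeatable": False},
}


def check(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for prop in record:
        if prop not in profile:
            problems.append(f"property not in profile: {prop}")
    for prop, rule in profile.items():
        count = len(record.get(prop, []))
        if rule["required"] and count == 0:
            problems.append(f"missing required property: {prop}")
        if not rule["repeatable"] and count > 1:
            problems.append(f"property is not repeatable: {prop}")
    return problems


good = {
    DC + "title": ["An example title"],
    DC + "creator": ["Author, A. N.", "Writer, B."],
    DCTERMS + "issued": ["2012"],
}
bad = {
    DC + "title": ["First title", "Second title"],
    DC + "rights": ["All rights reserved"],
}
print(check(good))
print(check(bad))
```

A real profile would usually also record vocabulary encoding schemes and guidance notes for each property; those are omitted here to keep the sketch short.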

XML formats

Many mark-up languages, including XML, are derived from SGML (Standard Generalized Mark-up Language), which was used in the 1980s in professional environments for technical and scientific publishing. Based on the same “grammar”, the different “formats” are linked to record profiles called Document Type Definitions (DTDs).

XML (Extensible Markup Language) is widely used across many different communities and enables more functionality than traditional MARC formats. Because of its flexibility and extensibility, it supports the expression of different data models. XML is accepted as an industry standard and therefore facilitates interoperability across sectors and is generally easier to process than alternative options. XML is also more powerful for the presentation of hierarchical or analytical information and allows good link management between bibliographic (and authority) records and digital resources.

XML formats are used in the library and archives world, as well as in the publishing and book trade industry.

MARCXML

MARCXML is an XML schema for representing MARC 21 records in XML. It is used in many applications at the Library of Congress and in OCLC WorldCat. It was designed to assist the evolution of bibliographic formats towards XML while maintaining compatibility with existing bibliographic data.
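The following sketch shows what a MARC 21 record looks like once expressed in MARCXML and how it can be read with a general-purpose XML parser. The element names (`record`, `leader`, `controlfield`, `datafield`, `subfield`) and the namespace URI are those used by MARCXML; the record content and the `subfields` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
    <subfield code="c">A. N. Author</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in data fields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)
titles = subfields(record, "245", "a")
control_number = record.findtext("marc:controlfield[@tag='001']",
                                 namespaces=MARC_NS)
print(control_number, titles)
```

The numeric tags, indicators and subfield codes of MARC 21 are carried over unchanged as XML attributes, which is what keeps the format compatible with existing data.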

MODS (Metadata Object Description Schema)

MODS was created by the Library of Congress’ Network Development and MARC Standards Office together with other interested experts as a multi-function bibliographic element set schema with particular value for library applications.

As an XML schema, MODS is intended to carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than numeric ones, in some cases regrouping elements from the MARC 21 bibliographic format.

MODS is expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users.
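The difference between numeric MARC tags and the language-based tags of MODS can be illustrated with a small conversion sketch. The three correspondences used (245 $a to `titleInfo/title`, 100 $a to `name/namePart`, 260 $b to `originInfo/publisher`) are a hand-picked subset for illustration and not the full Library of Congress mapping; the `marc_to_mods` function and the input dictionary layout are assumptions of this example.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hand-picked subset of MARC 21 (tag, subfield) -> MODS (wrapper, element).
MARC_TO_MODS = {
    ("245", "a"): ("titleInfo", "title"),
    ("100", "a"): ("name", "namePart"),
    ("260", "b"): ("originInfo", "publisher"),
}


def marc_to_mods(marc_fields):
    """Convert {(tag, code): value} into a MODS element tree."""
    ET.register_namespace("mods", MODS_NS)
    root = ET.Element(f"{{{MODS_NS}}}mods")
    for key, value in marc_fields.items():
        target = MARC_TO_MODS.get(key)
        if target is None:
            continue   # MODS carries a subset; unmapped data is skipped here
        wrapper = ET.SubElement(root, f"{{{MODS_NS}}}{target[0]}")
        ET.SubElement(wrapper, f"{{{MODS_NS}}}{target[1]}").text = value
    return root


mods = marc_to_mods({
    ("100", "a"): "Author, A. N.",
    ("245", "a"): "An example title",
    ("260", "b"): "Example Press",
    ("999", "x"): "local data with no MODS equivalent",
})
mods_xml = ET.tostring(mods, encoding="unicode")
print(mods_xml)
```

Note that the unmapped local field is silently dropped in this sketch, which mirrors the fact that MODS covers only a subset of MARC.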

ONIX (Online Information eXchange)

ONIX is a group of related XML standards for books, serials and publishing rights information.

ONIX for Books was the first of the standards to be widely adopted by the book trade. It was developed by EDItEUR with Book Industry Communication (UK) and the Book Industry Study Group (US), and is currently maintained with the guidance of an International Steering Committee. The ONIX for Books Product Information Message is the de facto international standard for the electronic communication of book trade product information.

ONIX for Books is a comprehensive and sophisticated format, designed to support as much functionality as possible across different environments. The most widely implemented release, ONIX 2.1, was the first truly international descriptive metadata format to be adopted by the book industry. Its successor, ONIX 3.0, has been further enhanced for e-books.

Libraries have long been interested in using publisher information as a basis for catalogue records in order to improve efficiency. Publisher migration from proprietary local formats to ONIX has made this a more realistic proposition by reducing the overhead of maintaining multiple translations to MARC. National bibliographic agencies (NBAs) responsible for maintaining a Cataloguing in Publication (CIP) programme often accept ONIX-formatted files as notification of forthcoming titles from publishers. Details of ONIX to MARC 21 mappings created by OCLC and the Library of Congress can be found on the EDItEUR web site.
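How publisher data might seed a catalogue record can be sketched as follows. The fragment uses ONIX 3.0 reference tag names, simplified and without the namespace declaration that a full ONIX message would carry. The product data is invented, and the three-field mapping in `onix_to_marc` is an illustrative assumption, not the OCLC or Library of Congress mapping.

```python
import xml.etree.ElementTree as ET

# Invented, simplified ONIX 3.0 product fragment (no namespace declared).
ONIX_PRODUCT = """<Product>
  <RecordReference>example.com.0001</RecordReference>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <DescriptiveDetail>
    <TitleDetail>
      <TitleType>01</TitleType>
      <TitleElement>
        <TitleElementLevel>01</TitleElementLevel>
        <TitleText>An example title</TitleText>
      </TitleElement>
    </TitleDetail>
  </DescriptiveDetail>
  <PublishingDetail>
    <Publisher>
      <PublishingRole>01</PublishingRole>
      <PublisherName>Example Press</PublisherName>
    </Publisher>
  </PublishingDetail>
</Product>"""


def onix_to_marc(product):
    """Return a list of (tag, subfield code, value) tuples for a skeleton record."""
    fields = []
    for ident in product.findall("ProductIdentifier"):
        if ident.findtext("ProductIDType") == "15":   # 15 = ISBN-13
            fields.append(("020", "a", ident.findtext("IDValue")))
    title = product.findtext(
        "DescriptiveDetail/TitleDetail/TitleElement/TitleText")
    if title:
        fields.append(("245", "a", title))
    publisher = product.findtext("PublishingDetail/Publisher/PublisherName")
    if publisher:
        fields.append(("260", "b", publisher))
    return fields


marc_fields = onix_to_marc(ET.fromstring(ONIX_PRODUCT))
for tag, code, value in marc_fields:
    print(f"{tag} ${code} {value}")
```

A skeleton of this kind would still need review by a cataloguer, for example to add authorised name headings and subject access.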

Bibliographic framework initiative

BIBFRAME is a new initiative led by the Library of Congress to explore the transition from the long-established MARC 21 format via the creation of a new bibliographic data model and vocabulary optimised for use on the Web. Although BIBFRAME will be designed to cater for library-specific needs, it will also support the needs of the wider information community and offer new opportunities for integration. The initiative will investigate a range of bibliographic data issues including:

  • Description & cataloguing rules
  • Creation via new means of data entry
  • Exchange protocols and methods
  • Accommodation of varying content models

A number of libraries have begun to experiment with BIBFRAME. Although it is not yet in a final, stable form, it has excited considerable interest and debate in the library community. The Library of Congress has created a list of frequently asked questions about BIBFRAME to address many of the common queries.