
   64th IFLA General Conference
   August 16 - August 21, 1998

 


Code Number: 138-161(WS)-E
Division Number:
Professional Group: UBCIM Core Programme
Joint Meeting with: Permanent UNIMARC Committee and Division of Bibliographic Control
Meeting Number: 161.
Simultaneous Interpretation:   No

UNIMARC and Metadata : Dublin Core

Alan Hopkinson
Middlesex University
London, UK
E-mail: a.hopkinson@mdx.ac.uk


Abstract

Metadata means data about data, so UNIMARC itself is a carrier of metadata. UNIMARC was developed for a specific purpose: exchanging records between different automated cataloguing systems. Dublin Core is a set of metadata elements, 15 in all, intended to facilitate the retrieval of electronic resources. This paper discusses why we would want to map between two sets of metadata elements, each with its own syntax, which have in common only that they are metadata and bibliographic. Problems with the mappings are also outlined.


Paper

1. Introduction

Metadata means data about data. The term includes catalogue data. It is used increasingly to refer to any data used to aid the identification, description and location of networked electronic resources.

Why do we need a new term when we as librarians have managed quite well without it for so long? The answer is that other interested groups in this electronic age are entering what was exclusively librarians' territory, and they are having to think up or re-use terminology for their own purposes, which do not necessarily conflict with ours. So librarians have taken on board a new term, metadata, though they did not need it as they already had terminology to cover this concept.

According to the above definition, data in the UNIMARC format will usually if not always be metadata.

2. Why this paper: UNIMARC and metadata

In the past, finding aids were produced by only two groups: librarians, who produced general catalogues, and some related groups of people, often practitioners or researchers in a particular field, who produced lists of journal articles, or indexing services as they were called. Today, other groups are producing 'finding aids'. The largest arena of this production is in connection with web search engines. When you do a search on a web search engine, you do not search the whole of the web directly but rather an index to the web which has been generated by a computer scanning web sites around the internet. This index is generated automatically and leaves a great deal to be desired. Here is metadata at its worst! Yet this is metadata's biggest exposure to the world at large.

Many (librarians) who a few years ago predicted the death of the library profession are now retracting and saying the world at large must realise the importance of indexing data intellectually rather than automatically. The question is: do you have a librarian indexing in place of, or perhaps rather in addition to, an automated indexer, or do you have a librarian helping the end user who wishes to make his search more effective? The latter is going back to the idea of the intermediary, so beloved of information scientists in the 1970s. Today users, people at large, want, indeed demand, to do their own searching, so the intellectual precision has to be at the index generation end rather than with the end user himself or herself.

What is required is for every web page to include some intellectually devised terms so that the computers that generate indexes can pick these up. Additionally they could include author and title information. Basically the information world needs to produce catalogues of web resources in the same way that cataloguers produce catalogues of books. How do cataloguers produce catalogues of books? They use the title-page, a 'device' which has been developed over centuries to represent the definitive aspects of bibliographic material. As soon as we leave the realm of the book and go into other materials, the cataloguers amongst us look for the title page (or title page substitute). Where is the title page of a kit, a film, a gramophone record? Their title pages are often in other media, for example the record label, though in the case of a film, the 'title-page' could be the label or it could be at the start or end of the film itself.

In the case of certain electronic materials we have a similar situation. Is the title page of a CD the label on the CD or is it in electronic form within the CD? With internet materials we have no such luxury of alternative sources; the 'title-page' must be in the electronic page itself. There is a certain amount of structure mandatory for any web-page: the 'syntax' of the page which has to be present to tell the computer system how to process the data to display it on the end-user's screen. Then there are certain features such as the 'title' which appear on the top of each web-page. However, there are also specially defined data elements which can be accessed by web crawlers. Here may be stored more information than what is displayed on the screen that the end user sees. In one way it can be regarded as CIP, Cataloguing in Publication. However, as well as the data being useful for web browsers, they may also be extracted into library catalogues. Computerised catalogues can then include records of electronic resources ideally with as little manual intervention as possible.
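As a rough illustration of how a crawler might pick up such data elements, here is a minimal sketch in Python using only the standard library. The page content, the function name `harvest` and the use of the common `keywords` and `author` names in `meta` tags are assumptions made for the example; they are not taken from any particular crawler.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}  # lower-cased meta name -> list of content values
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name and content is not None:
                self.meta.setdefault(name.lower(), []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def harvest(html_text):
    """Return (title, meta) for one page; 'harvest' is a hypothetical name."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# Hypothetical page: the meta elements are not shown on the user's screen.
PAGE = """<html><head>
<title>UNIMARC and Metadata : Dublin Core</title>
<meta name="keywords" content="UNIMARC, metadata, Dublin Core">
<meta name="author" content="Alan Hopkinson">
</head><body><p>Visible text of the page.</p></body></html>"""

title, meta = harvest(PAGE)
print(title)
print(meta)
```

The point of the sketch is only that the displayed title and the hidden data elements can both be read mechanically, which is what makes automatic extraction into a catalogue conceivable.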

3. Standards for data on web pages

The standards for data on web pages are notoriously free and easy. Standards for indexing are difficult to achieve in any case, particularly if indexing is to be consistent across more than one discrete catalogue; the web is universal, so indexing consistently across it is harder still. The structure or syntax of web pages is also customarily free and easy, though there are certain constraints.

Dublin Core is shorthand for the Dublin Metadata Core Element Set, which was agreed at the OCLC/NCSA Metadata Workshop in March 1995. One of the uses of this set is in the cataloguing of electronic resources, and it is generally held that it should be the standard used on web pages for the 'catalogue record', if indeed there is to be one: 'The Dublin Core is the leading candidate as a lingua franca for resource discovery on the net' [1]. It is worth remembering that Dublin Core is not confined to use in HTML pages. Also noteworthy is that it is intended to be usable by non-cataloguers (e.g. the authors of web pages) as well as by those with experience of formal resource description models (i.e. cataloguers).

Here is an example of a Dublin Core document identification embedded in HTML.

[Sample record not reproduced in this copy.]

In this record I chose to invert the author's name: there is nothing in Dublin Core to tell me to do this. Incidentally, I created this example manually from the IFLA page. Though UKOLN do have a Dublin Core generator DC-dot [2], it cannot make as good a job of it as a cataloguer.
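The original sample record is not available in this copy, so the following is a hypothetical illustration only, not a reconstruction of it. It is a minimal Python sketch that writes Dublin Core elements as HTML `meta` tags, assuming the widely used `DC.element` naming convention. The function name `dc_meta_tags` and the values in `RECORD` (taken from this paper's own heading) are chosen purely for the example.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags.

    Assumes the 'DC.<element>' naming convention; repeated elements
    are written as repeated tags.
    """
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(
                '<meta name="DC.%s" content="%s">'
                % (element, escape(value, quote=True))
            )
    return "\n".join(lines)


# Illustrative values only. The creator is inverted by the person writing
# the record; nothing in Dublin Core itself requires this.
RECORD = {
    "title": "UNIMARC and Metadata : Dublin Core",
    "creator": "Hopkinson, Alan",
    "subject": ["UNIMARC", "Dublin Core", "Metadata"],
    "date": "1998",
}

print(dc_meta_tags(RECORD))
```

Whether the name is entered as 'Hopkinson, Alan' or 'Alan Hopkinson' is left entirely to whoever fills in `RECORD`, which is exactly the kind of freedom that complicates later conversion.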

4. Dublin Core and UNIMARC

To recapitulate, library cataloguing systems need MARC records; if a MARC record could be extracted from a web page containing an electronic document thought to be worth cataloguing, so much the better.
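To make the idea of extraction concrete, here is a minimal sketch in Python of turning a handful of Dublin Core elements into UNIMARC-style fields. The field choices (200 $a, 210 $c and $d, 330 $a, 610 $a, 700 $a and $b, 856 $u) are illustrative assumptions covering only a subset of elements. Indicators and ISO 2709 serialisation are omitted, and all function and variable names are hypothetical. It should not be read as an agreed mapping.

```python
# Assumed simple correspondences: DC element -> (UNIMARC tag, subfield).
SIMPLE_MAP = {
    "description": ("330", "a"),  # summary or abstract
    "subject": ("610", "a"),      # uncontrolled subject terms
    "identifier": ("856", "u"),   # electronic location (URL)
}


def creator_field(name):
    """Build a 700 field (personal name, primary responsibility)."""
    if "," in name:
        # Inverted form: entry element before the comma, the rest after it.
        entry, rest = (part.strip() for part in name.split(",", 1))
        return ("700", [("a", entry), ("b", rest)])
    # Not inverted: the entry element cannot be identified safely,
    # so the whole string goes into $a for a cataloguer to review.
    return ("700", [("a", name.strip())])


def dc_to_unimarc(dc):
    """dc maps element name -> list of values; returns fields sorted by tag."""
    fields = []
    if dc.get("title"):
        # Only the first title is used for the title proper.
        fields.append(("200", [("a", dc["title"][0])]))
    imprint = [("c", v) for v in dc.get("publisher", [])]
    imprint += [("d", v) for v in dc.get("date", [])]
    if imprint:
        fields.append(("210", imprint))
    for element, (tag, code) in SIMPLE_MAP.items():
        for value in dc.get(element, []):
            fields.append((tag, [(code, value)]))
    for name in dc.get("creator", []):
        fields.append(creator_field(name))
    return sorted(fields, key=lambda field: field[0])


def display(fields):
    """Show each field as: tag, then $code + value for each subfield."""
    return "\n".join(
        tag + " " + "".join("$%s%s" % (code, value) for code, value in subs)
        for tag, subs in fields
    )


DC = {
    "title": ["UNIMARC and Metadata : Dublin Core"],
    "creator": ["Hopkinson, Alan"],
    "subject": ["UNIMARC", "Dublin Core"],
    "date": ["1998"],
}
print(display(dc_to_unimarc(DC)))
```

The `creator_field` function shows where the difficulty lies: the conversion can split a name only if the author of the page happened to invert it, and Dublin Core does not require that.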

5. Conclusion

When comparisons are made between different 'formats', it is often not very profitable to compare anything other than like with like. However, we have isolated a reason for investigating the convertibility of Dublin Core to UNIMARC: the feasibility of including automatically produced catalogue records of electronic items in library catalogues. The issues are not complex, though the conversion itself would be. The nub of the matter is that it is difficult to produce a catalogue record from data which has not been prepared with the aim of producing a UNIMARC record. The Dublin Core 'rules' and intentions do not make this any easier.

References

  1. Miller, Eric. Dublin Core metadata. [Dublin: OCLC, 1995?] http://purl.org/metadata/dublin_core

  2. Powell, Andy. DC-dot : a Dublin Core generator. Bath: UKOLN, [1997?]
    http://www.ukoln.ac.uk/metadata/dcdot/

  3. Format for Information Interchange. Geneva: ISO, 1996 (ISO 2709-1996).

  4. Day, Michael. Mapping Dublin Core to UNIMARC : draft. Bath: UKOLN, 1997.
    http://www.ukoln.ac.uk/metadata/interoperability/dc_unimarc.html

  5. Caplan, P. and Guenther, R. Metadata for Internet resources: the Dublin Core data elements set and its mapping to USMARC. Cataloguing and classification quarterly, 22 (3/4), 1996, 48.