Why do we need a new term when we as librarians have managed quite well without it for so long? The answer is that other interested groups in this electronic age are entering what was exclusively librarians' territory, and they are having to think up or re-use terminology for their own purposes, which do not necessarily conflict with ours. So librarians have taken on board a new term, metadata, though they did not need it, as they already had terminology to cover this concept.
According to the above definition, data in the UNIMARC format will usually if not always be metadata.
Many (librarians) who a few years ago predicted the death of the library profession are now retracting and saying the world at large must realise the importance of indexing data intellectually rather than automatically. The question is: do you have a librarian indexing in place of, or perhaps rather in addition to, an automated indexer, or do you have a librarian helping the end user who wishes to make his or her search more effective? The latter is going back to the idea of the intermediary, so beloved of information scientists in the 1970s. Today users, people at large, want, indeed demand, to do their own searching, so the intellectual precision has to be at the index generation end rather than with the end user himself or herself.
What is required is for every web page to include some intellectually devised terms so that the computers that generate indexes can pick these up. Additionally they could include author and title information. Basically the information world needs to produce catalogues of web resources in the same way that cataloguers produce catalogues of books. How do cataloguers produce catalogues of books? They use the title-page, a 'device' which has been developed over centuries to represent the definitive aspects of bibliographic material. As soon as we leave the realm of the book and go into other materials, the cataloguers amongst us look for the title page (or title page substitute). Where is the title page of a kit, a film, a gramophone record? Their title pages are often in other media, for example the record label, though in the case of a film, the 'title-page' could be the label or it could be at the start or end of the film itself.
In the case of certain electronic materials we have a similar situation. Is the title page of a CD the label on the CD or is it in electronic form within the CD? With internet materials we have no such luxury of alternative sources; the 'title-page' must be in the electronic page itself. A certain amount of structure is mandatory for any web page: the 'syntax' of the page, which has to be present to tell the computer system how to process the data for display on the end user's screen. Then there are certain features such as the 'title', which appears at the top of each web page. However, there are also specially defined data elements which can be accessed by web crawlers. Here more information may be stored than is displayed on the screen that the end user sees. In one way it can be regarded as CIP, Cataloguing in Publication. However, as well as being useful for web browsers, the data may also be extracted into library catalogues. Computerised catalogues can then include records of electronic resources, ideally with as little manual intervention as possible.
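To make the idea of data elements that a crawler can read but the end user does not see more concrete, here is a minimal sketch in Python. It uses only the standard library's html.parser module. The sample page, the names MetaExtractor and extract_metadata, and the "DC.title" style of element naming are illustrative assumptions, not taken from any real page or from a particular crawler.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the name/content pairs of <meta> elements."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []  # (name, content) pairs, in the order they appear
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta.append((name, content))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored.
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return (title, list of (name, content)) for one page of HTML."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.meta


# Hypothetical page: all values below are invented for illustration.
SAMPLE_PAGE = """<html><head>
<title>An example page</title>
<meta name="DC.title" content="An example page">
<meta name="DC.creator" content="Example, Author">
<meta name="DC.subject" content="metadata">
</head><body><p>Visible text.</p></body></html>"""

title, meta = extract_metadata(SAMPLE_PAGE)
for name, content in meta:
    print(name, "=", content)
```

The pairs collected in this way are the raw material from which a catalogue record could be built without the text being rekeyed by hand.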
Here is an example of a Dublin Core document identification embedded in HTML.
In this record I chose to invert the author's name: there is nothing in Dublin Core to tell me to do this. Incidentally, I created this example manually from the IFLA page. Though UKOLN do have a Dublin Core generator DC-dot [2], it cannot make as good a job of it as a cataloguer.
Here is a table of comparisons based on the one in that study, with the addition of the recently introduced 856 field, which was mentioned in Brian Holt's paper.
Additionally, a few changes have been made: extra titles such as parallel title have been added, and certain descriptive data elements have been removed, such as 200 $f First statement of responsibility (equated to creator). Data in this subfield are not in indexed form and may simply not be necessary in an electronic medium, as they merely repeat, in another form (as on the document rather than formalised), data already held in an access point field (700).
Dublin Core     UNIMARC

Title           200 $a Title Proper
                200 $e Other Title Information (for subtitle)
                510 $a Parallel title
                517 $a Other Variant Titles (for other titles)

Creator         700 $a Personal Name - Primary Intellectual Responsibility, or if more than one:
                701 $a Personal Name - Alternative Intellectual Responsibility
                710 $a Corporate Body Name - Primary Intellectual Responsibility, or:
                711 $a Corporate Body Name - Alternative Intellectual Responsibility

Subject         610 $a Uncontrolled Subject Terms
                606 Topical Name Used as Subject (for LCSH and MeSH)
                675 UDC
                676 DDC
                680 LCC
                686 Other Classification Systems

Description     330 $a Summary or Abstract

Publisher       210 $c Name of Publisher, Distributor, etc.

Contributors    701 $a Personal Name - Alternative Intellectual Responsibility
                711 $a Corporate Body Name - Alternative Intellectual Responsibility

Date            210 $d Date of Publication, Distribution, etc.

Type            608 Form, Genre or Physical Characteristics Heading

Format          336 $a Type of Computer File (provisional)

Identifier      001 allocated by the system
                010 ISBN
                011 ISSN
                020 (National Bibliography Number)
                856 $a URL

Source          324 Original Version Note

Language        101 Language of the Item

Relation        300 General Note

Coverage        300 General Note

Rights          300 General Note
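The comparison above can also be held as a simple lookup structure. The following Python sketch transcribes the table into a dictionary and adds a reverse lookup. The names DC_TO_UNIMARC, unimarc_fields_for and dc_elements_for are hypothetical, chosen only for this illustration; the field lists are those of the table and nothing more.

```python
# Dublin Core element -> UNIMARC fields, transcribed from the table above.
DC_TO_UNIMARC = {
    "Title": ["200 $a", "200 $e", "510 $a", "517 $a"],
    "Creator": ["700 $a", "701 $a", "710 $a", "711 $a"],
    "Subject": ["610 $a", "606", "675", "676", "680", "686"],
    "Description": ["330 $a"],
    "Publisher": ["210 $c"],
    "Contributors": ["701 $a", "711 $a"],
    "Date": ["210 $d"],
    "Type": ["608"],
    "Format": ["336 $a"],
    "Identifier": ["001", "010", "011", "020", "856 $a"],
    "Source": ["324"],
    "Language": ["101"],
    "Relation": ["300"],
    "Coverage": ["300"],
    "Rights": ["300"],
}


def unimarc_fields_for(element):
    """UNIMARC fields that the table lists for one Dublin Core element."""
    return DC_TO_UNIMARC.get(element, [])


def dc_elements_for(tag):
    """Dublin Core elements that the table maps to a given UNIMARC tag."""
    return [
        element
        for element, fields in DC_TO_UNIMARC.items()
        if any(field.split()[0] == tag for field in fields)
    ]


print(unimarc_fields_for("Publisher"))
print(dc_elements_for("300"))
print(dc_elements_for("701"))
```

The reverse lookup shows where the correspondence is not one-to-one: three Dublin Core elements all fall into the 300 General Note field, and the 701 field serves both Creator and Contributors.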
Michael Day's paper goes into detail, and the full argument may be read there. The main thrust is that UNIMARC records consist of data formulated according to highly controlling cataloguing codes, whereas Dublin Core data elements are less highly specified. The data elements reflect this in that they cover broader categories of data. UNIMARC also has a concept of main entry (not mandatory, but usually present). Dublin Core does not include this concept. Day also refers to a study by Caplan and Guenther relating to US MARC [5]. Many characteristics of US MARC apply to UNIMARC.
In short, data produced according to one set of conventions in one tradition by one category of producer will not usually be easily converted to data following another. What if cataloguers produce data in Dublin Core with a view to its automatically producing a catalogue record in another format? Even this does not seem possible, as Dublin Core does not have any coding to provide the necessary detail for the specification of a record that could be converted to UNIMARC. The short answer is that it may not be possible to have a standard which is suitable for authors and for cataloguers at the same time. In book production one has publishers in between the author and the publication, and even then many publishers would not be able to provide their own CIP record. Library catalogues today do not usually distinguish between personal and corporate authors in their indexes, since the powerful retrieval tools we have make it unnecessary. MARC formats include the potential to do this; indeed the distinction is mandatory in most MARC formats and must be followed there. But it is not there in Dublin Core. One distinction that Dublin Core does make is between Creator and Contributor; this is present in UNIMARC, though not always explicitly, and deeper down it is hidden in the relator codes, which are not mandatory.
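The difficulty with the personal/corporate distinction can be shown in a few lines of Python. This is only a sketch under stated assumptions: the function name convert_creators is hypothetical, the sample names are invented, and the rule that the first Creator takes primary responsibility (700 or 710) and any further ones take alternative responsibility (701 or 711) is one reading of the comparison table, not a rule laid down by either standard.

```python
def convert_creators(creators):
    """Map Dublin Core Creator values to candidate UNIMARC fields.

    Dublin Core does not say whether a Creator is a person or a
    corporate body, so each value comes back with both possible
    fields and a flag asking a cataloguer to decide.
    """
    converted = []
    for position, name in enumerate(creators):
        if position == 0:
            # Assumption: the first Creator has primary responsibility.
            candidates = ["700 $a", "710 $a"]
        else:
            # Assumption: further Creators have alternative responsibility.
            candidates = ["701 $a", "711 $a"]
        converted.append(
            {"value": name, "candidates": candidates, "needs_review": True}
        )
    return converted


# Invented values for illustration only.
records = convert_creators(["Example, Author", "Example Library Association"])
for record in records:
    print(record["value"], "->", " or ".join(record["candidates"]))
```

Every value is returned with two candidate fields, which is the point: the choice between them needs information that the Dublin Core record does not carry.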
If you go to Day's paper, you will see that since he wrote it the only area where compatibility has increased is the new field for URL in UNIMARC, and this is a clerical, not an intellectual, improvement to UNIMARC. Dublin Core has not changed, though it is extensible and work is going on to formulate best practices for extending it.