The paper argues that recognition of these differences is a fundamental requirement for a re-definition of the role of librarians in a newly emerging and rapidly evolving information paradigm.
One of the specific goals of the present paper is to give some indications concerning the inadequacy of such a reaction: the basic aim is to identify some of the points of concern the metadata issue is likely to create for libraries in the near future.
This paper is thus by no means an introduction to the metadata issue; it assumes a basic knowledge of the subject. Such knowledge is easy to obtain on the WWW: starting points such as the "Metadata Resources" area (http://ifla.inist.fr/II/metadata.htm) provided by IFLA or UKOLN's metadata site (http://www.ukoln.ac.uk/metadata/) provide extensive information on all aspects of metadata. Anyone familiar with these sites or with the subject of metadata generally will understand why I have to narrow the focus considerably here: this paper will not attempt to cover all metadata standards and activities but will rather concentrate on one example, perhaps the most prominent one at the moment: the so-called "Dublin Core" (DC) set (for background information see first http://purl.org/metadata/dublin_core).
Nor does this paper attempt a contribution to the relevant standardising processes in the field of DC or of existing or emerging library cataloguing rules and formats (such as ISBD(ER)), nor does it argue in favour of either of these working models: there are contexts more suitable for this (such as the respective mailing lists), and there surely are specialists in both fields who are entitled to make such contributions to a much higher degree than the author of this paper.
What I am concerned with here is rather the question of the possible mutual relationship between the cataloguing and the metadata approaches, with only very timid and tentative attempts to give any answers. It has been maintained that metadata and 'conventional' cataloguing records are complementary to some extent, whereas the main point I would like to make in this contribution is that they are fundamentally different, if not conflicting, working models, and that the working concepts underlying both models differ substantially, too.
There are, after all, a few good reasons - some explicit, others implicit - why the metadata community did not start off proposing MARC amendments but created a completely new frame of attributes. Some of the reasons for this have their roots in the outside view of what librarians are doing: a vital point to reflect on for librarians.
On the other hand, the metadata approach today benefits from the bonus of any fresh start - once this is over, metadata based activities are likely to rediscover some of the problems and pitfalls librarians have been experiencing during the past 30 years: while reinventing wheels may even sometimes be justified (and has been current practice in the field of library automation until now, anyway) there are good reasons to at least avoid errors already made by others.
This contribution is intended to provoke and stimulate discussion: I thus apologise for all the necessary simplifications and analogies I am going to use in this context: they are as wrong as any simplifications and analogies ...
This corresponds perfectly to a similar point made very early in the DC discussion by P. Caplan. In an attempt to answer the question "What is Metadata, Anyway?" she asserted that "Metadata really is nothing more than data about data; a catalog record is metadata; so is a TEI header, or any other form of description. We could call it cataloging, but for some people that term carries excess baggage, like Anglo-American Cataloging Rules and USMARC. So to some extent this is a 'you call it corn, we call it maize' situation, but metadata is a good neutral term that covers all the bases." (CAPLAN 1995) (1)
In another attempt at giving an overview of metadata formats R. Heery still places cataloguing and DC within the same continuous paradigm but indicates a difference in complexity:
A variety of formats have been placed in this table, positioned along a continuum from simple records (Band One) to complex, rich records (Band Four). The variety of record types identified in the bibliographic control process can be placed on this continuum as shown below.
Band One Band Two Band Three Band Four Proprietary Dublin Core MARC ICPSR simple records: NetFirst IAFA TEI FGDC independent headers [...] [...] [...] [...] Publishers' CIP MARC EDI messages CIP forms (HEERY 1996a)
All this seems to indicate that the basic concern of this paper is in fact a non-issue, a mere matter of slightly changing terminology and varying complexity.
A more than slight difference, however, can be perceived in the following definition given by T. Berners-Lee: "Metadata is machine understandable information about web resources or other things." The passage continues: "The phrase 'machine understandable' is key. We are talking here about information which software agents can use in order to make life easier for us, ensure we obey our principles, the law, check that we can trust what we are doing, and make everything work more smoothly and rapidly." (BERNERS-LEE 1998)
This already differs noticeably from the "We could call it cataloging" position: while the overall objectives (reliability and authentication of meta-information) could be claimed for cataloguing activity, too, the context of information usage is different (software agents rather than library users), and the explicit concern for efficiency actually implies that things are intended to "work more smoothly and rapidly" than cataloguing!
The difference becomes even clearer once we take into account another aspect that initially led to the DC initiative and that has recently been recalled by Stu Weibel: "One of the original motivations for the DC workshop series was the notion that authors could supply their own descriptions." (WEIBEL 1998) (2) Not only does the production flow differ, but the originators of meta-information are basically not library cataloguers.
An additional aspect to keep in mind is that another original focus of the DC initiative was "to facilitate resource discovery in a networked environment" (LAGOZE 1997) and thus not primarily resource description. The metadata approach thus fits within the descriptive paradigm of library cataloguing only accidentally.
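To make the idea of author-supplied description more concrete, the following is a minimal sketch of how an author might embed a simple DC description in the head of an HTML page. It assumes the common convention of HTML meta elements with names prefixed "DC."; the function name dc_meta_tags and the sample values are hypothetical and serve illustration only.

```python
from html import escape

def dc_meta_tags(description):
    """Render a simple DC description as HTML <meta> elements.

    `description` maps DC element names to lists of values, because
    every element of the simple DC set is optional and repeatable.
    """
    lines = []
    for element, values in description.items():
        for value in values:
            lines.append(
                '<meta name="DC.%s" content="%s">'
                % (escape(element, quote=True), escape(value, quote=True))
            )
    return "\n".join(lines)

# Hypothetical description supplied by the author of a web page.
page_description = {
    "title": ["Cats & dogs"],
    "creator": ["Doe, Jane"],
    "subject": ["pets", "animals"],
}
meta_block = dc_meta_tags(page_description)
```

The point of the sketch is the production flow: the description is produced together with the resource by its author, without any cataloguing workflow in between.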
In fact, all this comes down to a clearer notion of the explicit and implicit assumptions connected to the term metadata: metadata are intended for a context of usage different from library catalogues, they are typically not created by professional cataloguers, they are intended to be produced more efficiently than cataloguing records, they cover a specific kind of material (electronic resources) and, a point to be made further down, the relation between metadata and the resources referenced differs substantially from the relation between a cataloguing record and a book held by a library.
Even though the results of metadata production, the actual DC records, may be semantically similar to a simplified cataloguing record (and can easily be mapped to a MARC format (3)), the whole context of production and usage of this information is substantially different and driven by the intention to bypass the traditional cataloguing paradigm. Considering the process of metadata creation to be some kind of simplified cataloguing thus probably would be a serious misunderstanding.
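The semantic similarity mentioned above can be sketched as a simple crosswalk from DC elements to MARC tags and subfields. The tag choices below are assumptions made for illustration, loosely modelled on published DC/MARC crosswalks, and should be checked against an authoritative mapping before any real use; the names DC_TO_MARC and dc_to_marc are hypothetical.

```python
# Illustrative crosswalk: DC element -> (MARC tag, subfield code).
# These tag choices are assumptions for this sketch, not an
# authoritative mapping.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
}

def dc_to_marc(dc_record):
    """Return sorted (tag, subfield, value) triples for a simple DC record.

    `dc_record` maps element names to lists of values. Each value
    becomes its own triple; a real converter would also have to merge
    subfields that belong to the same MARC field (e.g. 260 $b and $c).
    """
    fields = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            continue  # elements without a mapping are skipped here
        tag, subfield = target
        for value in values:
            fields.append((tag, subfield, value))
    return sorted(fields)

example = {
    "Title": ["Metadata Architecture"],
    "Creator": ["Berners-Lee, Tim"],
    "Identifier": ["http://www.w3.org/DesignIssues/Metadata.html"],
}
marc_fields = dc_to_marc(example)
```

That such a mapping takes only a few lines illustrates the argument: the element semantics translate easily, while nothing in the table captures the differences in who produces the record, for what purpose, and in which usage context.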
The same is not true for DC and other metadata initiatives: one of their main characteristics, it seems to me, is that they are driven to a very high degree by very specific end-user requirements. This could be seen as a disadvantage, since changes in end-user behaviour and in the context of usage are likely to affect such an approach fundamentally, with the risk of a lack of continuity; however, this characteristic is probably considered a positive aspect today. Whenever DC is introduced, arguments are developed with specific kinds of resources in mind (electronic objects in the WWW environment), they come with specific assumptions regarding the context of usage (enhancing the precision of internet search engines, for example, is one of the recurrent arguments here) and they are often developed with a specific user group in mind: the 'digital tourist' metaphor cherished by the DC community is significant in this sense.
This is true to some extent already for DC semantics. To give just one example, one of the basic assumptions here seems to be the uniqueness of resources, which does not account for the fact that a 'work' (in the 'Functional Requirements' terminology) may have different representations or manifestations, and that copies of these may exist. The result is a 1:1 relation between metadata and physical resources, tailored to the 'flat' information paradigm of the WWW (4). This becomes even more tangible in the context of the corresponding syntax proposals, which are clearly oriented towards a WWW usage environment. (5)
This fundamental difference may perhaps best be illustrated by comparing the respective relations between cataloguing records and books and between metadata and the resources they reference.
In most local library systems bibliographic records are typically complemented by copy records containing the 'pointers' (i.e. shelf-marks) indicating the location of a book. Such a 'pointer' typically relies on the library system's proprietary circulation functionality as a mediating instance and often even requires additional human activity from library staff to provide the user with the desired object, the actual book or document. The basic point is that this context of usage has little or no consequence for the bibliographic record and the cataloguing activity.
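The contrast between a layered description and the 'flat' 1:1 model can be sketched as two data structures. This is a minimal illustration under simplifying assumptions: the class names Work, Manifestation, Copy and FlatRecord, the function flatten and the example.org addresses are all hypothetical, and the layered model is reduced to the three levels named in the text.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Copy:
    location: str  # a shelf-mark or, as here, a network address

@dataclass
class Manifestation:
    form: str
    copies: List[Copy] = field(default_factory=list)

@dataclass
class Work:
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class FlatRecord:
    title: str
    identifier: str  # exactly one resource per record

def flatten(work):
    """Produce one flat record per copy.

    The grouping by work and manifestation is lost: nothing in the
    result states that the records describe the same work.
    """
    return [
        FlatRecord(work.title, item.location)
        for manifestation in work.manifestations
        for item in manifestation.copies
    ]

report = Work("Annual Report", [
    Manifestation("HTML", [
        Copy("http://example.org/report.html"),
        Copy("http://mirror.example.org/report.html"),
    ]),
    Manifestation("PDF", [Copy("http://example.org/report.pdf")]),
])
flat_records = flatten(report)
```

In the flat form the three records share a title but carry no explicit link to one another, which is the loss of structure the paragraph above describes.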
The situation is fundamentally different for metadata, as already pointed out by R. Heery:
Metadata as such are part of a specific technical information infrastructure, and this is true to some extent even for the semantic level, which was originally intended to be context-free: the actual value of a metadata record is determined to a very high degree by the fact that the access pointers contained in the record actually work (this explains the strong concern in the DC-related discussion with the 'broken links' problem and its necessary involvement with URN- or other identifier-related standardising processes), and that these access pointers comply with the technical requirements of the application software used to access the information. Simplifying this aspect considerably, one could say that a metadata record containing an invalid resource pointer is worth almost less than no record at all.
The conclusion of this section thus is that metadata not only belong to a different production paradigm, but that they are also intended to be part of a usage context different from that of cataloguing records, and that they are technically linked to this context to a very high degree. While this may seem to simplify things enormously (enabling direct document access using standardised pointing methods), it paradoxically complicates things at the same time, since the role of metadata records in this information infrastructure depends on rapidly evolving internet standards (to be clear, this point, too, is a mere statement of fact and is not intended as a criticism of the metadata approach).
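The dependence of a record's value on a working pointer can be sketched as a small filter over a set of records. This is an illustration only: the record layout, the function names and the resolves callback are hypothetical, and the callback stands in for whatever real check (an HTTP request, a URN resolver) an application would use.

```python
from urllib.parse import urlparse

def pointer_is_wellformed(identifier):
    """Syntactic check only: a known scheme and a host must be present."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https", "ftp") and bool(parts.netloc)

def usable_records(records, resolves):
    """Keep only records whose access pointer is well-formed and resolves.

    `resolves` is a caller-supplied function taking an identifier and
    returning True or False; it is injected so that the sketch runs
    without network access.
    """
    return [
        record for record in records
        if pointer_is_wellformed(record["identifier"])
        and resolves(record["identifier"])
    ]

# Stub resolver: pretends that only these addresses are still reachable.
live = {"http://www.w3.org/DesignIssues/Metadata.html"}
records = [
    {"title": "Metadata Architecture",
     "identifier": "http://www.w3.org/DesignIssues/Metadata.html"},
    {"title": "A moved page", "identifier": "http://example.org/gone.html"},
    {"title": "A garbled pointer", "identifier": "not a url"},
]
still_usable = usable_records(records, lambda url: url in live)
```

A cataloguing record has no counterpart to this filter: its descriptive value does not depend on whether the shelf-mark in the copy record is still correct.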
It may of course be possible to tentatively combine both information paradigms as in the proposal of using the library OPAC as a gateway to access the metadata repository made by XU (1998). I do not want to discuss this in detail, even though I have my personal doubts regarding its immediate practicability. This is, however, one important direction to investigate for librarians, and some of the recent and current work being done in my institution - Pica - goes in the same direction combining library automation and internet information techniques as we did in our WebDOC project or in our DELTA project.
There are, however, other areas where the metadata community may benefit from specific librarian expertise and experience (or where this is already the case, owing to the presence of many persons representing the library world in this community), and this is probably true for so-called 'qualified DC' to a much higher degree than for 'simple DC'. I am thinking of examples such as the use of repeatable items and the lessons that might be learned from the MARC experience and its subfield architecture, or the use of controlled vocabulary, which may lead to discussions strangely resembling past discussions in the library world about authority forms. There are more areas of this kind, where the necessary reinvention of the metadata wheel may (and already does) avoid problems already identified in earlier contexts.
I would like to end this paper by indicating two areas where substantial and continuous contributions from the library world may be especially useful for the metadata approach. One participant in the meta2 mailing list recently stated:
The second area I am thinking of is closely related to this and concerns the problem of metadata authentication. The recent report on the EC Metadata Workshop in Luxembourg states "that the current take-up of Dublin Core is slow and that there is a lack of critical mass". Among other reasons, one of the problems underlying this fact is the relatively little use that search engines like AltaVista currently make of metadata beyond mere keyword indexing, and the lack of metadata authentication has in turn been suggested as one of the major reasons for that. A message from S. Weibel to meta2 reacts to this problem by stating:
Other formal communities also are positioned to provide trusted resource description... museums, governments, publishers, professional and trade organizations. There is room for abuse in any such system, and there will be (already is) in the metadata realm. This just makes it that much more critical that those with a mission to provide reliable resource description find common conventions (including means for validation) on which we may build the future we envision." (WEIBEL 1998)
The following suggestion has been made in that context:
I am not sure whether this is a promising or desirable path: there may be a way of involving public institutions like libraries in this necessary process of information brokering. While I agree that trusted third parties will be needed in this process, I am not sure whether all of us would be happy to depend entirely on the brokering services of commercial institutions in this vital context of information validation. Even if this idea is anticyclical in the sense of moving against the current wave of deregulation, I do think that this is an important point to consider.
Taking up again the title of this paper: it should have become clear by now that metadata is indeed not a mere buzzword and not at all old wine in new bottles. The approach this term stands for stems from an information paradigm differing from that of library cataloguing activity, and I think that libraries should feel invited to follow its evolution closely and to perceive it not as a possible threat but rather as a chance to re-define their role in the context of newly emerging information paradigms.
Berners-Lee, Tim: Metadata Architecture. Documents, Metadata, and Links. Last edit Date: 1998/02/06 17:06:46. http://www.w3.org/DesignIssues/Metadata.html (= BERNERS-LEE 1998)
Caplan, Priscilla: You Call It Corn, We Call It Syntax-Independent Metadata for Document-Like Objects. In: The Public-Access Computer Systems Review 6, no. 4, 1995. http://info.lib.uh.edu/pr/v6/n4/capl6n4.html (= CAPLAN 1995)
Heery, Rachel: Metadata Formats. December 1996. Deliverable D1.1 - Work Package 1 of Telematics for Libraries project BIBLINK (LB 4034) http://www.ukoln.ac.uk/BIBLINK/wp1/d1.1/ (= HEERY 1996a)
Heery, Rachel: Review of Metadata Formats. In: Program, Vol. 30, No. 4, October 1996, pp. 345-373 (= HEERY 1996b)
Lagoze, Carl: From Static to Dynamic Surrogates. Resource Discovery in the Digital Age. In: D-Lib Magazine, June 1997. http://www.dlib.org/dlib/june97/06lagoze.html (= LAGOZE 1997)
Miller, Paul: Metadata for the Masses. In: Ariadne, 5, Sept. 1996. http://www.ariadne.ac.uk/issue5/metadata-masses/ (= MILLER 1996)
Metadata Workshop, Luxembourg - 1-2 December 1997. Workshop Report. http://hosted.ukoln.ac.uk/ec/metadata-1997/report/
Miller, Paul: An Introduction to the Resource Description Framework. In: D-Lib Magazine, May 1998 (= MILLER 1998)
Olson, Nancy B. (Ed.): Cataloging Internet Resources. A Manual and Practical Guide. Second Edition. http://www.oclc.org/oclc/man/9256cat/toc.htm (= OLSON)
A User Guide for simple Dublin Core. Draft version 4.0 (15/05/1998). http://128.253.70.110/DC5/UserGuide4.html (= USER GUIDE)
Weibel, Stuart: Re: authentication of metadata. meta2@mrrl.lut.ac.uk (23 Jan. 1998) (= WEIBEL 1998)
Weibel, Stuart and Hakala, Juha: DC-5: The Helsinki Metadata Workshop: A Report on the Workshop and Subsequent Developments. Official report of the Helsinki DC Meeting. In: D-Lib Magazine, February 1998. http://www.dlib.org/dlib/february98/02weibel.html (= WEIBEL/HAKALA 1998)
Weinheimer, James: Re: authentication of metadata. meta2@mrrl.lut.ac.uk (23 Jan. 1998) (= WEINHEIMER 1998)
Xu, Amanda: Metadata Conversion and the Library OPAC. In: The Serials Librarian 33 (1-4) (Spring 1998), http://web.mit.edu/waynej/www/xu.htm (= XU 1998)