UDT Occasional Paper # 8

Digital Libraries: Definitions, Issues and Challenges

Gary Cleveland
UDT Core Programme
E-mail:
March, 1998

The idea of easy, finger-tip access to information--what we conceptualize as digital libraries today--began with Vannevar Bush's Memex machine (Bush, 1945) and has continued to evolve with each advance in information technology. With the arrival of computers, the concept centered on large bibliographic databases, the now familiar online retrieval and public access systems that are part of any contemporary library. When computers were connected into large networks forming the Internet, the concept evolved again, and research turned to creating libraries of digital information that could be accessed by anyone from anywhere in the world. Phrases like "virtual library," "electronic library," "library without walls" and, most recently, "digital library," all have been used interchangeably to describe this broad concept.

But what does this phrase mean? What is a digital library? And what are the issues and challenges in creating digital libraries? Moreover, what are the issues involved in creating a coordinated scheme of digital libraries? It has been suggested that digital libraries will only be viable within such a scheme (Chapman and Kenny, 1996). This paper provides a very high-level overview of digital libraries and briefly addresses each of these questions in turn.

1. What is a Digital Library?

There is much confusion surrounding the phrase "digital library," stemming from three factors. First, the library community has used several different phrases over the years to denote this concept--electronic library, virtual library, library without walls--and it never was quite clear what each of these different phrases meant. "Digital library" is simply the most current and most widely accepted term and is now used almost exclusively at conferences, online, and in the literature.

A second factor adding to the confusion is that digital libraries are at the focal point of many different areas of research, and what constitutes a digital library differs depending upon the research community that is describing it (Nurnberg, et al, 1995). For example:
In fact, a digital library is all of these things. These different research approaches will all add to the development of digital libraries. Third, confusion arises from the fact that there are many things on the Internet that people are calling "digital libraries," which--from a librarian's point of view--are not. For example:
A fairly spectacular example of what many people consider to be a digital library today is the World Wide Web. The Web is a gathering of thousands and thousands of documents. Many would call this huge collection a digital library because they can find information, just as they can do banking in a "digital bank" or buy compact discs in a "digital record store." Yet, is the Web a digital library? According to Clifford Lynch, one of the leading scholars in the area of digital library research, it is not. Lynch (1997:52) states:
One sometimes hears the Internet characterized as the world's library for the digital age. This description does not stand up under even casual examination. The Internet--and particularly its collection of multimedia resources known as the World Wide Web--was not designed to support the organized publication and retrieval of information as libraries are. It has evolved into what might be thought of as a chaotic repository for the collective output of the world's digital "printing presses." ... In short, the Net is not a digital library.

Thus, in examining the various examples of what are called digital libraries, it appears that librarians have been confused about what a digital library is, and that the word "library" has been appropriated by many different groups either to describe their areas of research or to signify a simple collection of digital objects.

So what is a working definition of "digital library" that makes sense to librarians? As a starting point, we should assume that digital libraries are libraries with the same purposes, functions, and goals as traditional libraries--collection development and management, subject analysis, index creation, provision of access, reference work, and preservation. A narrow focus on digital formats alone hides the extensive behind-the-scenes work that libraries do to develop and organize collections and to help users find information.

The institutions involved in the American Digital Library Federation came up with a similar notion of "digital library." Their definition also emphasizes the traditional underpinnings of libraries--selection, access, and preservation--as well as the fact that digital libraries will necessarily be constructed to serve particular communities (Waters, 1998):
Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.

With the assumption that digital libraries are libraries first and foremost, we can list some characteristics. These characteristics have been gleaned from various discussions about digital libraries, both online and in print (See Arms, 1995; Graham, 1995a; Chepesuik, 1997; Lynch and Garcia-Molina, 1995):
One thing digital libraries will not be is a single, completely digital system that provides instant access to all information, for all sectors of society, from anywhere in the world. This is simply unrealistic. This concept comes from the early days when people were unaware of the complexities of building digital libraries. Instead, they will most likely be a collection of disparate resources and disparate systems, catering to specific communities and user groups, created for specific purposes. They also will include, perhaps indefinitely, paper-based collections. Further, interoperability across digital libraries--of technical architectures, metadata, and document formats--will likely be possible only within relatively bounded systems developed for those specific purposes and communities.

For librarians, this definition of a digital library, together with these characteristics, is the most logical because it expands and extends the traditional library and preserves the valuable work that librarians do, while integrating new technologies, new processes, and new media.

2. What are the Issues and Challenges in Creating Digital Libraries?

The optimism and hype of the early 1990s have been replaced by a realization that building digital libraries will be a difficult, expensive, and long-term effort (Lynch and Garcia-Molina, 1995). Creating effective digital libraries poses serious challenges. The integration of digital media into traditional collections will not be as straightforward as it was for previous new media (e.g., video and audio tapes), because of the unique nature of digital information--it is less fixed, easily copied, and remotely accessible by multiple users simultaneously. Some of the more serious issues facing the development of digital libraries are outlined below.
2.1 Technical architecture

The first issue is that of the technical architecture that underlies any digital library system. Libraries will need to enhance and upgrade current technical architectures to accommodate digital materials. The architecture will include components such as:
One important thing to point out about technical architectures for digital libraries is that they won't be monolithic systems like the turnkey, single-box OPACs with which librarians are most familiar. Instead, they will be a collection of disparate systems and resources connected through a network, and integrated within one interface, most likely a Web interface or one of its descendants. For example, the resources supported by the architecture could include:
Though these resources may reside on different systems and in different databases, they would appear to the users of a particular community as though they were one single system.

Within a coordinated digital library scheme, some common standards will be needed to allow digital libraries to interoperate and share resources. The problem, however, is that across multiple digital libraries, there is a wide diversity of data structures, search engines, interfaces, controlled vocabularies, document formats, and so on. Because of this diversity, federating all digital libraries nationally or internationally would be an impossible effort. Thus, the first task would be to find sound reasons for federating particular digital libraries into one system. Narrowing the field in such a manner would reduce the technical and political hurdles required to establish common practices. Further, because the futures of both de jure and de facto standards are often uncertain, it is unclear what those standards should be.
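The idea described above, of disparate systems that appear to users as one, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each underlying system is assumed to be wrapped in a small adapter that accepts a query string and returns matching titles. The names `Record`, `in_memory_source`, and `FederatedLibrary`, and the sample collections, are hypothetical stand-ins for real catalogues, full-text databases, or image servers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    source: str  # which underlying system the hit came from
    title: str


# An adapter takes a query string and returns matching titles.
SearchFn = Callable[[str], List[str]]


def in_memory_source(titles: List[str]) -> SearchFn:
    """Build an adapter over a list of titles (hypothetical stand-in for a real system)."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search


class FederatedLibrary:
    """Presents several independent systems as one searchable collection."""

    def __init__(self) -> None:
        self._sources: Dict[str, SearchFn] = {}

    def add_source(self, name: str, search: SearchFn) -> None:
        self._sources[name] = search

    def search(self, query: str) -> List[Record]:
        # Ask every system in turn and merge the answers into one result list.
        results: List[Record] = []
        for name, search in self._sources.items():
            results.extend(Record(name, title) for title in search(query))
        return results


library = FederatedLibrary()
library.add_source("catalogue", in_memory_source(["Maps of Ontario", "Digital Libraries"]))
library.add_source("image_server", in_memory_source(["Ontario street photographs"]))
hits = library.search("ontario")
```

The sketch sidesteps the hard part that the text identifies: it works only because every adapter agrees on one query form and one result form, which is exactly the kind of common practice a bounded federation would have to negotiate.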
2.2 Building digital collections

One of the largest issues in creating digital libraries will be the building of digital collections. Obviously, for any digital library to be viable, it must eventually have a digital collection with the critical mass to make it truly useful. There are essentially three methods of building digital collections: digitizing existing materials in-house, acquiring original digital works, and providing access to external materials by pointing to them.

While the third method may not exactly constitute part of a local collection, it is still a method of increasing the materials available to local users. One of the main issues here is the degree to which libraries will digitize existing materials and acquire original digital works, as opposed to simply pointing to them externally. This is a reprise of the old access versus ownership issue--but in the digital realm--with many of the same concerns such as:
What about digital collection building in a coordinated scheme? There are many reasons why building digital collections is a good candidate for coordinated activity. First, acquiring digital works and doing in-house digitization are expensive, especially to undertake alone. By working together, institutions with common goals can gain greater efficiencies and reduce the overall costs involved in these activities, as was the case with retrospective conversion of bibliographic records. Second, it also reduces the redundancy and waste of acquiring or converting materials more than once. Third, coordinated digital collection building enhances resource sharing and increases the richness of collections to which users have access.
How can the materials that a given institution will process be identified? Decisions about who collects or digitizes which materials could be based on factors such as:
Yet, no matter how a collection is built--from materials digitized in-house, from original digital works, or by pointing to external resources--libraries in a collective must ensure that it is preserved and made available in perpetuity. For example, if the only copies of digital works reside on a particular publisher's server, then what happens if the publisher goes bankrupt? Or if the market value of a particular work approaches zero? What if all or part of a library's digital collection were lost, such as through some catastrophic event? Ensuring long-term preservation and access will require policies and a scheme by which redundant permanent copies are stored at designated institutions. Preservation issues will be discussed further later in the paper.
2.3 Digitization

Recall that one of the primary methods of digital collection building is digitization. What does this term mean exactly? Simply put, it is the conversion of any fixed or analogue media--such as books, journal articles, photos, paintings, microforms--into electronic form through scanning, sampling, or in fact even re-keying. An obvious obstacle to digitization is that it is very expensive. One estimate from the University of Michigan at Ann Arbor, the organization responsible for the JSTOR project, puts the cost of digitizing a single page at US$2 to US$6 (Chepesuik, 1997:48).

How do you go about deciding what parts of a collection to digitize? There are several approaches available, at least theoretically:
These approaches can be used alone or in combination depending upon a particular institution's goals for digitization. Nested within these approaches are several criteria for selecting individual items. These include:
2.4 Metadata

Metadata is another issue central to the development of digital libraries. Metadata is the data that describes the content and attributes of any particular item in a digital library. It is a concept familiar to librarians because it is one of the primary things that librarians do--they create cataloguing records that describe documents. Metadata is important in digital libraries because it is the key to resource discovery and use of any document. Anyone who has used Alta Vista, Excite, or any of the other search engines on the Internet knows that simple full-text searches don't scale in a large network. One can get thousands of hits, but most of them will be irrelevant. While there are formal library standards for metadata, namely AACR, such records are very time-consuming to create and require specially trained personnel. Human cataloguing, though superior, is just too labour-intensive for the already large and rapidly expanding information environment. Thus, simpler schemes for metadata are being proposed as solutions.

While they are still in their infancy, a number of schemes have emerged, the most prominent of which is the Dublin Core, an effort to determine the "core" elements needed to describe materials. The first workshop took place at OCLC headquarters in Dublin, Ohio, hence the name "Dublin Core." The Dublin Core workshops defined a set of fifteen metadata elements--much simpler than those used in traditional library cataloguing. They were designed to be simple enough to be used by authors, but at the same time, descriptive enough to be useful in resource discovery.

The lack of common metadata standards--ideally, defined for use in some specified context--is yet another barrier to information access and use in a digital library, or in a coordinated digital library scheme.

2.5 Naming, identifiers, and persistence

The fifth issue is related to metadata. It is the problem of naming in a digital library.
Names are strings that uniquely identify digital objects and are part of any document's metadata. Names are as important in a digital library as an ISBN number is in a traditional library. They are needed to uniquely identify digital objects for purposes such as:
Any system of naming that is developed must be permanent, lasting indefinitely. This means, among other things, that the name can't be bound up with a specific location. The unique name and its location must be separate. This is very much unlike URLs, the current method for identifying objects on the Internet. URLs combine in one string several items that should be separate. They include the method by which a document is accessed (e.g., HTTP), a machine name and document path (its location), and a document file name which may or may not be unique (e.g., how many index.html files do you have on your Web site?). URLs are very bad names because whenever a file is moved, the document is often lost entirely.

A global scheme of unique identifiers is required, one that has persistence beyond the life of the originating organization and that is not tied to specific locations or processes. These names must remain valid whenever documents are moved from one location to another, or are migrated from one storage medium to another. Three examples of schemes proposed to get around the problem of persistent naming are PURLs, URNs, and Digital Object Identifiers.
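To make the distinction between a name and a location concrete, here is a minimal sketch in Python. It first splits a URL into the separate pieces described above, then shows a toy registry in which a persistent name maps to a current location that can change while the name stays the same. The `NameResolver` class, the `example:` identifiers, and the `example.org` addresses are hypothetical illustrations; they do not reproduce how PURLs, URNs, or Digital Object Identifiers actually work.

```python
import posixpath
from typing import Dict
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the pieces it binds together in one string."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,                 # e.g. "http"
        "machine_name": parts.hostname,                # host where the document lives
        "document_path": posixpath.dirname(parts.path),
        "file_name": posixpath.basename(parts.path),   # may not be unique
    }


class NameResolver:
    """Toy registry mapping location-independent names to current locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        # A name is issued once and never reused for another object.
        if name in self._locations:
            raise ValueError(f"name already issued: {name}")
        self._locations[name] = location

    def move(self, name: str, new_location: str) -> None:
        # The document moves; the name that readers cite does not change.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name: str) -> str:
        return self._locations[name]


pieces = url_parts("http://www.example.org/reports/index.html")

resolver = NameResolver()
resolver.register("example:report-1", "http://old.example.org/reports/index.html")
resolver.move("example:report-1", "http://new.example.org/archive/report-1.html")
current = resolver.resolve("example:report-1")
```

A citation that records `example:report-1` survives the move, whereas a citation that records the old URL does not. The registry itself still has to be maintained by someone, so the sketch shows only the mechanism and leaves open who runs it.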
The issue of persistent naming raises its head in a coordinated scheme, as well. Persistent naming is an organizational problem, rather than an engineering problem. Technically, a system to handle names is possible; however, unique identifiers will persist only if some institution takes responsibility for their management and migration from a current technology to succeeding generations of technologies. Thus, one goal of a coordinated digital library scheme would be to identify an institution or institutions that would take charge of issuing, resolving, and migrating a system of unique names.

2.6 Copyright / rights management

Copyright has been called the "single most vexing barrier to digital library development" (Chepesuik, 1997:49). The current paper-based concept of copyright breaks down in the digital environment because the control of copies is lost. Digital objects are less fixed, easily copied, and remotely accessible by multiple users simultaneously. The problem for libraries is that, unlike private businesses or publishers that own their information, libraries are, for the most part, simply caretakers of information--they don't own the copyright of the material they hold. It is unlikely that libraries will ever be able to freely digitize and provide access to the copyrighted materials in their collections. Instead, they will have to develop rights management: mechanisms that allow them to provide information without violating copyright.

Some rights management functions could include, for example:
2.7 Preservation

Another important issue is preservation--keeping digital information available in perpetuity. In the preservation of digital materials, the real issue is technical obsolescence. Technical obsolescence in the digital age is like the deterioration of paper in the paper age. Libraries in the pre-digital era had to worry about climate control and the de-acidification of books, but the preservation of digital information will mean constantly coming up with new technical solutions.

When considering digital materials, there are three types of "preservation" one can refer to:
What can libraries jointly do in a coordinated scheme? They can:
3. Conclusion

Libraries around the world have been working on this daunting set of challenges for several years now. They have created many digital library initiatives and projects, and have formed various national schemes for jointly exploring key issues. With several years of accumulated experience, the initial enthusiasm surrounding the development of the digital library has been replaced by sober second thought. Librarians have discovered that, with a few exceptions, making a business case for digitization and investments in digital technology is more difficult than first envisioned, especially given the technical and legal constraints that must first be overcome. As with most other technical developments in libraries over the years, we will have to move forward in small, manageable, evolutionary steps, rather than in a rapid revolutionary manner.

Selected Sources
Bush, V. (1945). As we may think. Atlantic Monthly, July, 1945, pp. 101-108.

Chapman, S. and Kenny, A.R. (1996). Digital conversion of research library materials: a case for full informational capture. D-Lib Magazine, October, 1996. URL: http://www.dlib.org/dlib/october96/cornell/10chapman.html

Chepesuik, R. (1997). The future is here: America's libraries go digital. American Libraries, 2(1), 47-49.

Erway, R.L. (1996). Digital initiatives of the Research Libraries Group. D-Lib Magazine, December, 1996. URL: http://www.dlib.org/dlib/december96/rlg/12erway.html

Graham, P.S. (1995a). Requirements for the digital research library. URL: http://aultnis.rutgers.edu/texts/DRC.html

Graham, P.S. (1995b). Long-term intellectual preservation. URL: http://aultnis.rutgers.edu/texts/dps.html

Lesk, M. (1996). Going digital. Scientific American, March, 1996, 58-60. Also available at: URL: http://www.sciam.com/0397issue/0397lesk.html

Lynch, C.A. (1995). The Tulip project: context, history, and perspective. Library Hi Tech, 52(13), 8-24.

Lynch, C.A. (1997). Searching the Internet. Scientific American, March, 1997, 52-56. Also available at: URL: http://www.sciam.com/0397issue/0397lynch.html

Lynch, C.A. (1998). Identifiers and their role in networked information applications. Feliciter, January, 1998, pp. 31-35.

Lynch, C.A. and Garcia-Molina, H. (1995). Interoperability, scaling, and the digital libraries research agenda: a report on the May 18-19, 1995 IITA Digital Libraries Workshop. URL: http://www-diglib.stanford.edu/diglib/pub/reports/iita-dlw/main.html

Masinter, L. (1995). Document management, digital libraries, and the Web. URL: http://www.cernet.edu.cn/HMP/PAPER/243/html/paper.htm

Miller, J.S. (1996). W3C and digital libraries. D-Lib Magazine, November, 1996. URL: http://www.dlib.org/dlib/november96/11miller.html

Nurnberg, P.J., Furuta, R., Leggett, J.J., Marshall, C., and Shipman III, F.M. (1995). Digital libraries: issues and architectures. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries. Austin, Texas, June 11-13, 1995, pp. 147-153.

RLG. (1995). Preserving digital information: the report of the Task Force on Archiving of Digital Information. Commissioned by the Commission on Preservation and Access and the Research Libraries Group. URL: http://www.rlg.org/ArchTF/tfadi.index.htm

Schatz, B. and Chen, H. (1996). Building large-scale digital libraries. Computer, May, 1996. Also available at: URL: http://www.computer.org/pubs/computer/dli/

Shreeves, E. (1997). Is there a future for cooperative collection development in the digital age? Library Trends, 4(3), 373-390.

Steele, C. (1995). The digital library: do's, don'ts and developments. The Electronic Library, 13(5), 435-437.

Stefik, M. (1997). Trusted systems. Scientific American, March, 1997, 78-81. Also available at: URL: http://www.sciam.com/0397issue/0397stefik.html

Waters, D.J. (1998). What are digital libraries? CLIR Issues, July/August. URL: http://www.clir.org/pubs/issues/issues04.HTML

Weibel, S. (1995). Metadata: the foundations of resource description. D-Lib Magazine, July, 1995. URL: http://www.dlib.org/dlib/July95/07weibel.html
Latest Revision: April 6, 1998

Copyright © 1995-2000 International Federation of Library Associations and Institutions
www.ifla.org