This chapter reviews the special problems raised by these new methods of creating and supplying information. Many of the documents rely on storage provided by the physical media discussed in the preceding chapters.

Electronic Publications

Electronic publications cover the rapidly growing area of publications that require a computer to access the information they contain. They can be distributed free of charge or obtained by purchase. They are supplied in two forms: off-line publications and on-line publications. Some electronic publications are not supplied on physical carriers and need to be copied into the library's access system and stored on hard disc stacks, tape streamers or other data storage systems; others are supplied on physical carriers and can be stored on shelves. This chapter will, therefore, look not at the physical carriers, which have been covered in the preceding chapters, but at the specific problems of acquiring, selecting, storing and accessing this group of documents.

Definition and Typology of Electronic Publications

Off-line Publications

An off-line publication is an electronic document that is bibliographically identifiable and is stored in machine-readable form on an electronic storage medium. CD-ROMs, diskettes (floppy discs) and magnetic tapes are examples.

– Off-line monograph, e.g. a CD-ROM encyclopaedia.
– Off-line serial, e.g. a CD-ROM journal.

On-line Publications

An on-line publication (or resource) is an electronic document that is bibliographically identifiable, is stored in machine-readable form on an electronic storage medium and is available on-line. Examples are an electronic journal, a World Wide Web page or an on-line database.

– On-line monograph, e.g. a dictionary on the Web.
– On-line serial, e.g. an electronic journal on the Web.
– On-line resource, e.g. an organisation's home page.

Electronic publications can be original electronic publications, but they can also be the digitised version of a written or printed document. For many collections, most of the electronic publications will be the digitised version of a written or printed document in their possession. Examples include the CD-ROM of the National Library in Prague which contains several manuscripts and other documents, the Saint Sophia Project from Bulgaria, the Radziwill Chronicle, the Sana'a Manuscripts and the Memoria de Iberoamerica.

The producers and publishers of electronic publications can be traditional publishers who expand into new areas of publishing. They can also be newly established content providers who offer only on-line electronic publishing, especially in the case of new publications on the World Wide Web. In addition, some companies specialise in CD-ROM publishing.

Nowadays, most publications are written, edited and formatted using word processors and desktop-publishing software. The printed version of the journal or the monograph is derived from the electronic form.

Distinction between Audiovisual Material and an Electronic Publication

Multimedia publications are now produced which contain a mixture of material, e.g. a biography, a bibliography, stills (photos), animation, video and sound. It sometimes becomes difficult to distinguish between an audiovisual document and a text-related electronic publication. For example, a movie with subtitling is audiovisual, and a CD of Michael Jackson with a video clip consisting of moving images is considered to be an audio CD. A CD-ROM that contains a biography, a bibliography, texts of the songs, some sound, video and photos is considered to be a multimedia CD-ROM publication.

In short, an electronic publication must contain a considerable amount of text before a library will take it on deposit. Some libraries, e.g. Die Deutsche Bibliothek in Frankfurt am Main in Germany, also take audiovisual publications on deposit.

Electronic Documents or Virtual Information

The term Electronic Documents or, as they are sometimes called, Virtual Information, refers to the modern methods of transmitting documents between individuals by electronic means, i.e. without the use of paper. These are primarily text-based documents, the equivalents of letters and memoranda. Many of the actual and potential problems created by electronic documents are similar to those created by electronic publications.

The documents, while stored on a physical carrier somewhere and easily accessible to a small group of people including the author, are, nevertheless, difficult for an archivist to gain access to and preserve. The documents include E-Mail messages and computer files held on personal computers. When electronic documents are stored, it is on the same physical carriers that are used for other types of documents. The main factor that differentiates electronic documents from other documents is the method of transmission.

The first, and major, problem in the preservation of electronic documents is to gain access to them and discover what exists. This can only be done with the active support of the institution and its staff. If the institution has a PC network, the problem of access can be eased.

Since many of the E-Mail messages between staff are likely to be trivial and, perhaps, somewhat embarrassing if read by anyone other than the author and the intended recipients, it is essential to ensure that everyone is aware that the archive will periodically review both formal files and messages held on the central file server to select material that is worth preserving.

Once access is gained, the material can be subject to standard selection criteria and the chosen information copied into the archive's data storage system. The long term preservation of the information can then be part of the archive's strategy for documents in general.

What is involved in acquiring electronic documents and publications?

Selection

Research is being carried out by many archives and libraries into the best methods of giving access to electronic materials in the very long term. Because of the sheer quantity of material being produced, particularly for access via the World Wide Web, selection is essential. Many archives and libraries apply their existing selection criteria for printed materials to electronic materials as well. The content of the document, and not the medium, is the relevant factor for selection. This means that the physical carrier, the hardware and the software used are not relevant to the selection process. Local policy defines the criteria for selection: in Germany, for example, audiovisual material is included in the national bibliography, while in some other countries it is not.

Acquisition and Registration

Off-line publications can often reach the library in the same way as printed publications. Obviously, when the library starts collecting off-line publications, the publishers have to be notified. In the Netherlands, where deposit is done on a voluntary basis, it is important that the publishers are kept informed about the new selection criteria. In France, the law defines which publications are to be submitted.

On-line publications require a new form of co-operation. The publication has to be transmitted from the host system to the library via the network. Selected documents are either ordered, transferred automatically by the publisher or harvested by the library with a harvester application. For on-line documents, acquisition means the physical migration (via the network) of the document from the host system to the depository system. The publisher or producer (or, for archives, the administrator) needs to be involved in this process.

It is necessary to register documents when they are received by the library. This requires the exchange of bibliographic information (pre-publishing information) between the depository library and publisher (for archives this will be between the governmental institution and the archive), preferably before acquisition. The registration of incoming documents should be activated on arrival.
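The registration step described above could be sketched as follows. This is only an illustration under stated assumptions: the names `Register`, `RegistrationEntry`, `announce`, `receive` and `outstanding` are hypothetical and do not come from any existing library system. The publisher's pre-publishing information creates an entry, and the entry is activated when the document arrives.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class RegistrationEntry:
    """Pre-publishing information about one expected document (hypothetical layout)."""
    identifier: str                      # e.g. an ISBN or ISSN supplied by the publisher
    title: str
    publisher: str
    received_on: Optional[date] = None   # stays None until the document arrives

    @property
    def active(self) -> bool:
        return self.received_on is not None


@dataclass
class Register:
    """Register of announced and received documents, keyed by identifier."""
    entries: Dict[str, RegistrationEntry] = field(default_factory=dict)

    def announce(self, identifier: str, title: str, publisher: str) -> RegistrationEntry:
        # Record the bibliographic information exchanged before acquisition.
        entry = RegistrationEntry(identifier, title, publisher)
        self.entries[identifier] = entry
        return entry

    def receive(self, identifier: str, arrived: date) -> RegistrationEntry:
        # Activate the registration when the document arrives.
        entry = self.entries.get(identifier)
        if entry is None:
            raise KeyError(f"no pre-publishing information for {identifier!r}")
        entry.received_on = arrived
        return entry

    def outstanding(self) -> List[RegistrationEntry]:
        # Documents that were announced but have not yet been received.
        return [e for e in self.entries.values() if not e.active]
```

A register of this kind also makes it easy to list announced documents that have not yet arrived, which is one reason for exchanging the information before acquisition.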

Installation

It is necessary to install the electronic publication so that it can be viewed and described by the librarian. For on-line documents, a connection to the host system is required; off-line documents have to be physically installed on a workstation.

Description of the Document

Cataloguing systems for electronic documents are still the subject of much debate. Various groups are discussing how to describe an electronic document. The existing book-based systems such as MARC and its variants do not fully describe these new formats. For example, to be able to view an electronic publication it is also necessary to describe its technical features: for which computer and operating system was the publication made, which formats are used, and so on. Many fields of the technical description will be recorded in coded form.
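To illustrate what a technical description "in coded form" might look like, here is a minimal sketch. All codes and names in it (`Carrier`, `OperatingSystem`, `FileFormat`, the `|`-separated field layout) are invented for this example; they are not taken from MARC or from any cataloguing standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Carrier(Enum):
    # Hypothetical two-letter codes
    CD_ROM = "cd"
    DISKETTE = "fd"
    MAGNETIC_TAPE = "mt"
    ONLINE = "ol"


class OperatingSystem(Enum):
    # Hypothetical three-letter codes
    MS_DOS = "dos"
    WINDOWS = "win"
    UNIX = "unx"


class FileFormat(Enum):
    SGML = "sgml"
    HTML = "html"
    PLAIN_TEXT = "txt"


@dataclass(frozen=True)
class TechnicalDescription:
    carrier: Carrier
    operating_system: OperatingSystem
    formats: Tuple[FileFormat, ...]

    def to_coded_field(self) -> str:
        # Encode the description as one compact, machine-readable field.
        formats = ",".join(f.value for f in self.formats)
        return f"{self.carrier.value}|{self.operating_system.value}|{formats}"

    @classmethod
    def from_coded_field(cls, coded: str) -> "TechnicalDescription":
        # Decode the field again; unknown codes raise ValueError.
        carrier, operating_system, formats = coded.split("|")
        return cls(
            Carrier(carrier),
            OperatingSystem(operating_system),
            tuple(FileFormat(f) for f in formats.split(",") if f),
        )
```

Restricting each field to a closed list of codes means that a later system can select, for instance, every publication that needs a particular operating system, without interpreting free text.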

Metadata

Electronic publications offer an opportunity to automate part of the production of a catalogue. Bibliographic data can be retrieved from the electronic publication itself, e.g. from the table of contents (TOC). A research project of the European Commission, BIBLINK, is studying how data can be exchanged between publisher and library in an automated way. The Dublin Core defines the fields that are necessary to support adequate bibliographic description of a Web page. The Dublin Core has received significant support, particularly from North America, including from some publishers. One threat that may ultimately make it unacceptable is that the Dublin Core contains too many features that require definition at the national level or that carry a large maintenance overhead.
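As an illustration of retrieving bibliographic data from the publication itself, the sketch below reads Dublin Core elements from the `<meta>` tags of a Web page. It assumes the common convention of embedding elements as `<meta name="DC.title" content="...">`; the class and function names are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import Dict, List


class DublinCoreExtractor(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> elements from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.record: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only elements whose name carries the "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            # An element may be repeated, so values are kept in a list.
            self.record.setdefault(element, []).append(content)


def extract_dublin_core(html: str) -> Dict[str, List[str]]:
    parser = DublinCoreExtractor()
    parser.feed(html)
    parser.close()
    return parser.record
```

A record extracted in this way is only as good as the metadata the publisher embedded, so a cataloguer would still need to review it.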

Unique Identification

In the international book trade, the ISBN (International Standard Book Number) and ISSN (International Standard Serial Number) are widely used to identify uniquely a particular version of a monograph or serial publication. ISBN and ISSN are also used for CD-ROMs and for on-line publications such as electronic journals. However, these numbers were not designed for electronic publications, and a proposal was, therefore, made for a Digital Object Identifier (DOI). The DOI was designed by the Association of American Publishers and the Corporation for National Research Initiatives.
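Both numbers end in a check digit, which allows a registration system to catch most typing errors on arrival. The sketch below validates the ten-digit form of the ISBN and the eight-digit ISSN, both of which use a weighted sum modulo 11 with "X" standing for the value 10 in the last position. The function names are chosen for this example only.

```python
def _weighted_check(chars: str, length: int) -> bool:
    """Return True if the weighted digit sum is divisible by 11.

    Weights run from `length` down to 1; an 'X' is allowed only as the
    final character and counts as 10.
    """
    if len(chars) != length:
        return False
    total = 0
    for position, char in enumerate(chars):
        weight = length - position
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == length - 1:
            value = 10
        else:
            return False
        total += weight * value
    return total % 11 == 0


def _normalise(identifier: str) -> str:
    # Hyphens and spaces are only for display and are ignored.
    return identifier.replace("-", "").replace(" ", "")


def is_valid_isbn10(isbn: str) -> bool:
    return _weighted_check(_normalise(isbn), 10)


def is_valid_issn(issn: str) -> bool:
    return _weighted_check(_normalise(issn), 8)
```

For example, `is_valid_isbn10("0-306-40615-2")` and `is_valid_issn("0378-5955")` both return `True`, while changing the final digit of either number makes the check fail. A valid check digit shows only that the number is well formed, not that it has been assigned to the document in hand.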

Authenticity and Integrity

Some electronic publications can easily be changed. What guarantee is there that the bibliographic description defines exactly the version that is stored? And will it still do so after the lapse of several hundred years and the migration to other carriers and formats? This is still a very tricky area. Several methods are being considered, e.g. time stamps, encryption and watermarks. But it must be said that the final solution to this issue has yet to be found.
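One building block that is often combined with time stamps is a cryptographic digest (a "fingerprint") of the stored file. The text above does not name this technique, so the sketch below is an assumption about how a fixity check might look, not a description of any depository's practice. It uses SHA-256 from Python's standard `hashlib` module; the record layout and function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Union


def fingerprint(data: bytes) -> str:
    # SHA-256 digest of the stored bytes, as a hexadecimal string.
    return hashlib.sha256(data).hexdigest()


def make_fixity_record(identifier: str, data: bytes) -> Dict[str, Union[str, int]]:
    # Record the digest, the size and the time at which they were taken.
    return {
        "identifier": identifier,
        "sha256": fingerprint(data),
        "size": len(data),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(record: Dict[str, Union[str, int]], data: bytes) -> bool:
    # True only if the bytes are identical to those that were recorded.
    return len(data) == record["size"] and fingerprint(data) == record["sha256"]
```

Such a check detects any change to the stored bytes, but it does not solve the wider problem: a conversion to a new format deliberately changes the bytes, so a new digest has to be recorded after each conversion, and the link between the old and the new version must be documented separately.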

De-installation

After the bibliographical and technical description has been completed, the electronic publication must be removed from the hard disc of the computer and any on-line session must be closed. This activity generates new information, which should be included in the descriptive record.

Migration, Storage, Conversion and Emulation

Other factors that have to be considered when collecting electronic documents include the following:

– Migration – Migration of the electronic content from the original carrier to the physical storage of the depository system, including migration quality control and duplication for backup (preferably on another medium).

– Storage – The physical storage system will probably use different types of media with different access speeds, e.g. hard disc (very fast), magneto-optical (fast), tape (slow). This requires sophisticated software to monitor the use of documents and to shift documents from tape to discs and vice versa.

– Pathfinder – This is a section of storage that records the physical locations of all the files in a document and makes the file map available to the search engine.

– Conversion and Emulation – Should the document be converted to a new format, or should a system be designed in which the document is stored in its original format? Emulation software enables a document stored in the original format to be viewed using new hardware and software.
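The storage item in the list above mentions software that monitors the use of documents and shifts them between media. A minimal sketch of that idea follows. The tier names come from the text, but the promotion threshold, the rule that new deposits start on the slowest tier, and the class name `TieredStore` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict

# Fastest to slowest, as in the text.
TIERS = ["hard disc", "magneto-optical", "tape"]


class TieredStore:
    """Tracks which tier each document is on and how often it is used."""

    def __init__(self, promote_after: int = 3) -> None:
        self.location: Dict[str, int] = {}   # document id -> index into TIERS
        self.accesses: Counter = Counter()   # uses since the last rebalance
        self.promote_after = promote_after   # assumed threshold

    def deposit(self, doc_id: str) -> None:
        # Assumption: new deposits start on the slowest tier.
        self.location[doc_id] = len(TIERS) - 1

    def access(self, doc_id: str) -> str:
        # Count the use and report the tier the document is served from.
        self.accesses[doc_id] += 1
        return TIERS[self.location[doc_id]]

    def rebalance(self) -> None:
        # Move frequently used documents one tier faster and unused
        # documents one tier slower, then start a new counting period.
        for doc_id, tier in list(self.location.items()):
            count = self.accesses[doc_id]
            if count >= self.promote_after and tier > 0:
                self.location[doc_id] = tier - 1
            elif count == 0 and tier < len(TIERS) - 1:
                self.location[doc_id] = tier + 1
        self.accesses.clear()
```

A real system would also have to copy the files, verify the copies and update the pathfinder's file map; the sketch covers only the decision about where a document should live.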

These techniques are concerned with preservation, and final solutions have not yet been found. The increasing speed of technological innovation, new publishing techniques, the Internet and the present lack of standards are a few examples of the uncertainties within which the manager of a depository system must work. There is no proven solution for these systems. Large vendors have built systems for data-warehousing and data-mining, but some of these lack the structured indexing and large-scale preservation facilities needed by libraries and archives.

Long term availability and access for end users: remote or on site

Indexing

Descriptive information is indexed for use within the search engine of the depository system. This engine can be part of the pathfinder software or can be the OPAC module of a separate, existing library system; this is to be decided locally. Finding the right compromise between the user's indexing requirements and the technical possibilities is very complicated.
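The basic structure behind such a search engine is an inverted index, which maps each term to the documents whose description contains it. The sketch below is a deliberately small illustration: the record layout, the function names and the simple tokenisation rule are assumptions, and a production OPAC would add field-specific indexes, stemming, authority control and ranking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lower-case the text and split it into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    # records maps a document id to its descriptive fields.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, fields in records.items():
        for value in fields.values():
            for term in tokenize(value):
                index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # Return the documents whose description contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The compromise mentioned above shows up even here: indexing every field makes more documents findable but enlarges the index, while the tokenisation rule decides which queries can match at all.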

Access

Access to electronic publications by end users must be clearly defined. At present, most access is "on-site" but, when agreements are made with the owners of the information, remote access may be possible.

As with the deposit for printed publications, electronic deposit collections should be used as "collections of last resort". Libraries can, however, give access when agreements are reached with publishers and authors.

Copyright Issues, Authors and Publishers

It is very important that digital archives and libraries discuss restrictions on access and availability with publishers and authors whenever this is appropriate.

Usage of Standards

There are many relevant standards for electronic publications. The European Commission has launched an initiative, OII (Open Information Interchange), as part of the IMPACT2 programme. The aim of the OII initiative is to promote the awareness and use of standards for the exchange of information in electronic form. The target audience is developers and providers of information products and services, as well as end-users. Standards can be purchased from international standards offices, and many countries have an organisation which translates and distributes the standards. For more information visit the Commission's Web site, where copies of publications on standards can be found:
http://www.echo.lu/oii/activity.html.

For the preservation of electronic publications a variety of standards are relevant. These include standards for hardware, operating systems (Windows, MS-DOS, UNIX), physical carriers (CD-ROM, WORM, DAT, diskettes, magnetic tapes), application programs such as word processors, databases and spreadsheets, and formats such as MARC, SGML and HTML.

Availability of Electronic Publications on the Market

Printed publications like monographs and serials do not remain available on the market permanently. After a relatively short time, a specific edition of a monograph can be difficult to find in a book shop, although it may still be possible to order it from a large distributor or even the publisher. With off-line electronic publications it is exactly the same. Publishers are not interested in keeping publications available once there is no commercial interest in the products. This may be understandable from the market point of view but is still unfortunate. In addition, publishers often do not have a full archive of their own publications. It is very important, therefore, that as soon as possible after the publication date a document should be selected, described and made available (at least for review on site) by a public body such as a national archive or a national library.