The following paper was written by Priscilla Caplan in response to MARBI Discussion Paper No. 54 (Providing Access to Online Information Resources). Please send any comments about this paper (or about the previous paper, Discussion Paper No. 54) to the list from which you received this. It is being posted to the following: USMARC-L, PACS-L, and CNI-DIR. The Network Development and MARC Standards Office plans to use any comments received to continue the work of accommodating online information resources in USMARC. We expect to initiate a proposal and/or discussion paper for the June MARBI meetings.

PROVIDING ACCESS TO ONLINE INFORMATION RESOURCES
A PAPER FOR DISCUSSION

Priscilla Caplan
Harvard University Library
February 14, 1992

History

In May 1991, Discussion Paper 49, "Dictionary of Data Elements for Online Information Resources" was forwarded to MARBI for discussion at the June meetings. The gist of this paper is described in one of its opening paragraphs:

"Many different kinds of electronic information resources, whether they are numeric databases, computer forums, discussion groups, mailing list servers, online public access catalogs, full-text databases, or other varieties of information resources, are available to users over one or more networks such as the Internet, BITNET, etc. While the USMARC format accommodates the communication of information about computer files, the information in the record is description oriented with minimal attention to access (i.e., information to logon, electronic addresses, etc.). It is clear that while descriptive information is necessary, access information is equally crucial."

Discussion Paper 49 went on to itemize a number of data elements pertaining to access to online resources, and to outline a tentative mapping of these elements to the MARC computer files format.
It ended with three examples of electronic information resources, describing an online catalog system (GLADIS), a computer conference (PACS-L), and a BRS database (BIOSIS Previews).

Following MARBI discussion of this paper, a subgroup of the committee was charged with reviewing the data elements and their tagging more closely and drafting a firmer proposal for further discussion. The subgroup, consisting of John Attig, Bill Jones, and Priscilla Caplan, soon found that before much progress could be made in the definition of data elements, a better understanding was required of the types and characteristics of the information resources under consideration. They began an extended discussion of what types of resource might be appropriate for "cataloging" within the MARC formats. By the time of their deadline, however, they had many more questions than answers, resulting in the modification and expansion of Discussion Paper 49 that was issued as Discussion Paper 54 and discussed briefly at the MARBI meetings in January 1992.

Summary

This document, arising from discussion papers 49 and 54 and the work of the MARBI subgroup, proposes a different approach to the description of online information resources. It suggests that such resources fall into at least two categories, "electronic data resources" and "online systems/services". The first category encompasses electronic resources that may or may not be offered online and that can be described relatively easily in the current USMARC "bibliographic" formats. One issue with these materials is how to designate their "location" when the data resource is accessible via a remote system or service rather than a traditional library or archive. The second category, "online systems/services", seems to have more in common with the programs and services described by the provisional community information format than with bibliographic data.

A typology of resources

Entities we might call "electronic data resources" include such things as computer software, documents stored as machine-readable text or images, databases of bibliographic, numeric, or other data, and directories and white pages. Such resources might exist only in electronic form, or might have analogs in print or other formats. Some examples would include:

-- RFC-822 (an Internet specification describing email headers) stored as ASCII text;
-- Xferit, a Macintosh program for file transfer;
-- the bitmapped text of a journal article;
-- BIOSIS Previews, a collection of citations to life sciences literature produced by BIOSIS;
-- Academic Index, a collection of citations to journal literature produced by Information Access Co. (IAC);
-- the union catalog of the Harvard libraries;
-- the UC Berkeley library catalog.

Computer systems or services constitute a second type of entity. These might exist primarily to offer access to data resources (e.g. campus wide information systems) or might be of interest in their own right (e.g. computational resources). Their use may or may not be restricted to certain individuals, or members of some community. One important characteristic of most of these online systems/services is that they are available remotely, via dial-up or network communications facilities. Some examples would include:

-- an ftp (file transfer) site;
-- Princeton's campus wide information system (CWIS);
-- DIALOG, a commercial system offering a variety of databases;
-- HOLLIS, the Harvard Online Library Information System;
-- GLADIS, the UC Berkeley library information system.

There are several points to make about this division. First, many online systems/services offer access to multiple electronic data resources. For example, both the union catalog of the Harvard libraries and the Academic Index are available through the Harvard Online Library Information System (HOLLIS).
An ftp site could easily offer access to both RFC-822 and the shareware program Xferit. Conversely, many data resources are accessible via multiple online systems/services. The Academic Index, for example, can be accessed both through HOLLIS and DIALOG. The program Xferit can be obtained through any number of different ftp sites.

An online system/service can also offer access to other systems/services. HOLLIS, for example, could offer its users the option of selecting "GLADIS" from a menu, the result of which would be an automatic remote login to the UC Berkeley system. GLADIS in turn might offer that user access to Berkeley's online library catalog, as well as to MEDLINE and other information resources available through GLADIS.

It should also be noted that although online systems/services often use or require computer software, the software itself does not constitute the online system/service. Yale's ORBIS and Vanderbilt's ACORN both run on IBM mainframes using similar systems software, and both use applications software marketed by NOTIS Systems, Inc. But ORBIS and ACORN have different access information (hours of operation, Internet addresses, dial-up numbers, etc.), offer access to different data resources, and are clearly distinct systems/services. In fact, the computer software itself is a data resource, which can be held in many locations. Computer systems/services in general are unique.

MARC description of electronic data resources

Clearly, at least some electronic data resources are already accommodated in the USMARC computer files format. This format is defined for use with "information encoded in a manner that allows it to be processed by a computer or related machine, including both data stored in machine-readable form and the programs used to process the data." The format can be used for files containing numeric data, representational (pictorial or graphic) data, text, and/or software.
Databases (collections of machine-readable records) like BIOSIS Previews should fit under this definition. So should text files like the ASCII version of RFC-822, computer software like Xferit, and the pictorial or image file of a journal article.

Although data resources can logically be accommodated in the computer files format, several issues need to be addressed. First, this type of data stretches the traditional focus on publication and description. The data may or may not be formally published, or issued in any definitive form. In many cases while the intellectual content remains stable, the physical representation changes from location to location (e.g. whether the data is on disk or diskette, in ASCII or EBCDIC, etc.).

Second, new types of identifying numbers may be relevant. A subgroup of CNI is working on document identifiers for Internet resources, which should be accommodated in MARC when defined.

Third, new data elements may be required for encoding the location of the resource. A print index has a physical location which probably consists of a holding library and call number. For an index available online through HOLLIS or BRS, the physical location (perhaps a storage device in a computing facility) is irrelevant. The system itself (HOLLIS or BRS) is the information required to locate the item.

In USMARC, the location is encoded in the 852 field, which is formally part of the Holdings format but may be embedded in bibliographic records. This field contains subfields for location (including library, sublocation or collection, etc.), shelving location, identifying numbers/codes, descriptors, and notes. Location itself is defined as the NUC symbol of the organization holding the item or from which it is available. To allow the designation of electronic "locations" such as HOLLIS, GLADIS, BRS, DIALOG, ftp sites, data archives, etc., either the 852 must be extended, or a new field (e.g. 851) defined.

Note that the properties of being online or Internet-accessible do not actually inhere in the data resource, but rather in the systems/services through which access to the data is offered. Therefore extensions to the computer files format to accommodate access information may not be required if information sufficient to identify the "holding" system/service is provided.

MARC description of online systems/services

This category includes (but is not limited to) library systems such as HOLLIS and GLADIS, commercially available systems such as BRS and DIALOG, campus-wide information systems, community-wide information networks like the Cleveland Freenet, academic and commercial ftp sites, and bulletin boards. A good rule of thumb for distinguishing systems/services from data resources is whether the entity has an Internet (TELNET) or dial-up address.

Online systems and services seem to fit poorly into the bibliographic formats. The concepts of authorship, publication, physical description, and series do not apply. On the other hand, owners or sponsors, contact persons, addresses, hours of service, and other access information are important data elements. Many of these data elements are defined in the provisional USMARC format for community information, which was formulated for the description of non-bibliographic resources including "programs, services, organizations, agencies, single and ongoing events, and individuals..." Another point of commonality is that online systems/services, like community agencies and programs but unlike bibliographic entities, tend to be one-of-a-kind.

Relationship between data resources and systems/services

A one-to-many relationship can exist between any given data resource and the systems/services that offer access to it. For example, the Academic Index could be available through both HOLLIS and DIALOG.
Presumably, the "bibliographic" record describing each data resource would contain one location field for each relevant system/service. The location field should not be defined to contain all information relevant to accessing the data resource via that system/service (TELNET address, logon instructions, etc.). Rather, the location field should contain enough information to direct the user to a non-bibliographic record for the system/service. That record in turn would contain all the necessary information for accessing that system, getting help, etc.

Similarly, a one-to-many relationship exists between any given online system/service and the electronic data resources to which it offers access. For example, the HOLLIS system offers access to many electronic data resources, including the Academic Index and the union catalog of the Harvard libraries. The system/service record for HOLLIS should indicate the data resources it contains. The name of each data resource could appear as an access point in the record for HOLLIS, either as a subject heading or as a name added entry. The advantage of this is that if records for both types of entity were contained in the same catalog or directory, then a user searching "Academic Index" could retrieve not only the record describing the Academic Index but also the system/service record for each of the systems offering access to it. Alternatively, these might be listed in a contents note (505).

Users could be expected to find data resources through records for systems/services, and vice versa. For example, a user looking for RFC-822 might make use of broad subject descriptors in system/service records to find ftp sites likely to provide this document. Conversely a user finding a record for the Academic Index in some database might then look up the record for a "holding" system/service to obtain a telnet address and logon instructions.
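
The two-way linkage described in this section can be illustrated with a small sketch. It is not a MARC encoding and proposes no tagging. It only models one "bibliographic" record type and one system/service record type held in the same catalog, with location entries in the former and access points in the latter. The class names, the "telnet" key, and the ".example" addresses are hypothetical placeholders, not real access information.

```python
from dataclasses import dataclass, field


@dataclass
class SystemService:
    """Non-bibliographic record for an online system/service."""
    name: str
    access: dict  # access details; the values used below are placeholders
    offers: list = field(default_factory=list)  # data resource names as access points


@dataclass
class DataResource:
    """'Bibliographic' record for an electronic data resource."""
    title: str
    locations: list = field(default_factory=list)  # names of "holding" systems/services


class Catalog:
    """A single catalog or directory containing both record types."""

    def __init__(self):
        self.resources = {}
        self.systems = {}

    def add(self, record):
        if isinstance(record, SystemService):
            self.systems[record.name] = record
        else:
            self.resources[record.title] = record

    def search(self, term):
        """Return the data resource record for term, plus every
        system/service record that lists term as an access point."""
        hits = []
        if term in self.resources:
            hits.append(self.resources[term])
        hits.extend(s for s in self.systems.values() if term in s.offers)
        return hits

    def access_info(self, title):
        """Follow each location in a data resource record to the
        corresponding system/service record and return its access details."""
        resource = self.resources[title]
        return {
            name: self.systems[name].access
            for name in resource.locations
            if name in self.systems
        }


catalog = Catalog()
catalog.add(SystemService(
    "HOLLIS",
    {"telnet": "hollis.example"},  # hypothetical address
    offers=["Academic Index", "Union catalog of the Harvard libraries"],
))
catalog.add(SystemService(
    "DIALOG",
    {"telnet": "dialog.example"},  # hypothetical address
    offers=["Academic Index"],
))
catalog.add(DataResource("Academic Index", locations=["HOLLIS", "DIALOG"]))

for record in catalog.search("Academic Index"):
    print(record)
print(catalog.access_info("Academic Index"))
```

In this sketch the location entry carries only the name of the system/service, so access details are stored once, in the system/service record, and are reached by a second lookup.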

Questions and issues

This paper offers a framework for discussing electronic data resources and online systems and services. Even if it is basically acceptable, however, many issues must be resolved before a workable mapping to USMARC can take place.

In the case of data resources, the electronic form may be one of many, and there can be many electronic forms. The ASCII text of a document could be printed, for example, and the print then scanned and made available as bitmapped images. Can these be treated as multiple versions, with one bibliographic record and multiple holdings, or do different physical formats constitute different bibliographic entities?

How much data about systems/services should also be carried in location fields in data resource records? It would be inefficient to repeat all access information and instructions in every location field, particularly as that information can be relatively dynamic and require frequent updating. On the other hand, should the user be required to look up two records to access any resource? How do we guarantee that the user has access to both types of records whenever necessary?

Facilities available through LISTSERV software require more thought, as do the descriptions of ftp sites. Is PACS-L best thought of as a data resource with a "location" of LISTSERV at UHUPVM1 or as a system/service, like a bulletin board? Is a named directory at an ftp site part of the location of an item?

It is also possible for the line between data resources and systems/services to be less clear than one would like, for example, when a library information system combines data from a catalog database and a circulation file to display circulation status.

A note on format integration

This document contains several references to the computer files format. After format integration, fields will be valid for use in describing any item to which they apply, regardless of format. This does not affect any of the discussion above.
However, please note that format integration affects only the so-called "bibliographic" formats (books, serials, maps, manuscripts, music, computer files, and visual materials). Non-bibliographic formats such as authorities, holdings, and community information will not be affected.