USING Z39.50 IN AN APPLICATION FOR THE
GOVERNMENT INFORMATION LOCATOR SERVICE (GILS):
A BACKGROUND PAPER

Developed as Part of the Cooperative Research Study:
Expanding Research and Development on the
ANSI/NISO Z39.50 Search and Retrieval Standard

between the
School of Information Studies, Syracuse University
and
The United States Geological Survey

funded by
The Interagency Working Group on Data Management for Global Change

Charles R. McClure
William E. Moen
Co-Principal Investigators

School of Information Studies
4-206 Center for Science and Technology
Syracuse University
Syracuse, NY 13244-4100
Telephone: (315) 443-2911
Fax: (315) 443-5806

May 7, 1994

CONTENTS

1. Introduction
2. ANSI/NISO Z39.50: A Standard for Information Retrieval
3. The Research and Development Project
   3.1. A Profile for GILS
4. Assumptions and Agreements about GILS
   4.1. The GILS System Architecture Model
   4.2. Navigating through GILS
   4.3. Uniform Resource Identifiers in GILS
   4.4. GILS Locator Records
   4.5. GILS Searching
   4.6. GILS Browsing
   4.7. Input Formats for GILS Records
   4.8. The Use of Z39.50 in GILS
5. Z39.50 Specifications for the GILS Application
   5.1. Version
   5.2. GILS Objects
   5.3. Communication Services
   5.4. Z39.50 Facilities and Services
        5.4.1. Search
               5.4.1.1. Attribute Set
               5.4.1.2. Well-known Search
        5.4.2. Retrieval
               5.4.2.1. Schema
               5.4.2.2. Element Set Names
               5.4.2.3. Record Syntaxes
   5.5. Preferred Display Format for Use with SUTRS
   5.6. Diagnostic Messages
6. Data Elements in GILS Locator Records
7. Conclusion
Notes
References
Appendices
   Appendix A -- Project Team Members
   Appendix B -- Definitions

USING Z39.50 IN AN APPLICATION FOR THE
GOVERNMENT INFORMATION LOCATOR SERVICE (GILS)

1. Introduction

This document describes a research effort focused on the use of ANSI/NISO Z39.50, the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection (National Information Standards Organization, 1992), in the proposed Government Information Locator Service (GILS). A primary component of this research has been the development of a GILS Profile.

The GILS is a response to the need for users to be able to identify, locate, and access or acquire publicly available Federal information resources, including electronic information resources. The authoritative document describing the vision and function of GILS is _Government Information Locator Service (GILS)_ (Christian, 1994); that document provides an overview of GILS, its objectives, service requirements, and core requirements.[1] According to Christian (1994), the GILS is a decentralized collection of locators and associated information services that includes information and technology components as well as policy, regulation, and people. The GILS is intended to help the public locate and access public information throughout the U.S. government.

The GILS Profile includes the specifications for Z39.50 in the GILS application operating in the Internet environment.[2] Additionally, the GILS Profile addresses other aspects of GILS conformant servers that are beyond the scope of Z39.50. The GILS Profile provides the specifications for the overall GILS application relating to the GILS Core, which is a subset of all GILS Locator Records, and completely specifies the use of Z39.50 in this application.[3] The GILS Profile will be used by implementors of GILS servers. It will also be used by client developers to understand expected behaviors of GILS servers.

This paper discusses the work by a project team of Z39.50 experts and other participants in developing the GILS Profile.
Components of the team's work included understanding the high-level functional requirements for GILS described in Christian (1994), agreeing upon a model of the GILS system architecture and information flows in the GILS, and delineating the functional requirements that could be addressed by the GILS Profile. This paper is intended to serve as background to the assumptions, choices, and decisions by the project team that resulted in the specifications contained in the GILS Profile.

The work of the project team occurred within the research project, "Expanding Research and Development on the ANSI/NISO Z39.50 Search and Retrieval Standard," coordinated by Syracuse University and the United States Geological Survey, funded by the Interagency Working Group on Data Management for Global Change. The result of this research project is to provide the specifications (i.e., the GILS Profile) for initial GILS implementations that are expected to provide users with information about the location of and ways to access or acquire Federal information resources.

The current research builds upon a previous study, _Identifying and Describing Federal Information Inventory/Locator Systems: Design for Networked-Based Locators_ (McClure, Ryan & Moen, 1992). That study, which was conducted for the Office of Management and Budget, the National Archives and Records Administration, and the General Services Administration, recommended that each Federal agency establish a network-accessible locator that describes its information resources. The study also recommended that agencies use Z39.50 as the appropriate information retrieval protocol to achieve a distributed, standards-based Government Information Locator Service.

2. ANSI/NISO Z39.50: A Standard for Information Retrieval

The information retrieval protocol, Z39.50, provides a common language for clients and servers to select and retrieve records from databases.
The purpose of Z39.50 is to allow one computer operating in a client mode to perform information retrieval queries against another computer acting as an information server; the protocol also provides for the transfer of records or other information from the server to the client. Z39.50 does not prescribe how a particular system will execute the searching and retrieval on databases nor does it prescribe user interface requirements.

The standard is an application-layer protocol within the Open Systems Interconnection (OSI) reference model. The OSI Basic Reference Model (ISO 7498: 1984 Information processing systems--Open Systems Interconnection--Basic reference model) was developed at the international level by the International Organization for Standardization (ISO).

ANSI/NISO Z39.50 is an American National Standard developed and approved by the National Information Standards Organization (NISO) in 1988. NISO balloted and approved a 1992 revision of the standard (also referred to as Version 2). The Z39.50 Implementors Group (ZIG) has been preparing Version 3 of the standard (which contains new enhancements, extensions, etc.) for official balloting through NISO in 1994.[4] Z39.50 is parallel to two OSI international standards: ISO 10162: 1993 Information and documentation--Search and Retrieve Application Service Definition; and ISO 10163-1: 1993 Information and documentation--Search and Retrieve Application Protocol Specification.

Although developed as an OSI application-layer protocol, Z39.50 is currently used by implementors in the Transmission Control Protocol (TCP) environment of the Internet. The success of a Z39.50 interoperability testbed in 1992 showed how the transport service of TCP can successfully support the protocol. Lynch (1994) provides the specification for using Z39.50 over TCP.

3. The Research and Development Project

For the current study, a group comprising experts in Z39.50 implementations, system implementations, and information organization, and representatives of Federal agencies has been working as part of a research project coordinated by Syracuse University and the United States Geological Survey, funded by the Interagency Working Group on Data Management for Global Change. (See Appendix A for names of project team members.)

To advance the development of the GILS, the research project has focused on the use of open systems standards to improve the utility of information searching and retrieval on digital networks. More specifically, the project has as its objectives to:

   o  Expand research and development on the American National Standard for information searching and retrieval (Z39.50) for its application in facilitating public access to Federal information resources and speeding the development of interoperable systems

   o  Build consensus of major stakeholders on the manner in which Z39.50 can be applied in GILS implementations

   o  Develop an application profile for network-based GILS implementations that references Z39.50 and other relevant standards for use in the Internet environment

   o  Support and encourage test implementations of the profile by interested parties to provide evaluations of the profile and for interoperability testing.

To achieve these objectives, the project team focused its primary attention on developing the GILS Profile.

3.1. A Profile for GILS

A profile is "a set of one or more base standards, and where applicable, the identification of chosen classes, subsets, options and parameters of those base standards, necessary for accomplishing a particular function" (International Organization for Standardization/International Electrotechnical Commission, 1992, p. 2). Profiles are also referred to as "functional standards," "implementation agreements," or "specifications."
Since open systems standards often include choices and options, profiles specify the values and parameters of a standard for an application or implementation to increase the likelihood of interoperability and interworking. A profile, then, is a set of implementation agreements that guide implementors in applying one or more standards in a specific and limited context.

The research team broadened this definition for the GILS Profile to include not only the specifications for Z39.50 and other relevant standards in the application but also other aspects of a GILS conformant server that are beyond the scope of these standards. The GILS Profile does provide the specifications for the overall GILS application relating to the GILS Core and completely specifies the use of Z39.50 in this application.

The GILS Profile will facilitate interoperability of independently developed components of the GILS Core. Further, in developing the GILS Profile, the project team was aware of the need to understand and address interoperability issues with the currently installed base of available implementation technology.

This first version of the GILS Profile focuses on the requirements for a GILS server operating in the Internet environment. GILS clients will be able to interconnect with any GILS server, and these clients will behave in a manner that allows interoperability with the GILS server. Clients that support Z39.50 but do not implement the GILS Profile should be able to access GILS records but with less than full GILS functionality. Although the GILS Profile addresses GILS servers only, it is understood that clients have roles in the execution of information retrieval activities.

The GILS Profile addresses many aspects of the GILS (e.g., intersystem interactions and information interchange) but does not specify user interface requirements, the internal structure of databases that contain GILS Locator Records, or search engine functionality.
These aspects are also outside the scope of Z39.50.

The _Government Information Locator Service (GILS)_ (Christian, 1994) (hereafter referred to as the _GILS_ document) provided the research team with high-level requirements for the GILS. Based on those requirements, the research team delineated assumptions about the operation and information flows of the GILS and developed functional requirements for the GILS. This process allowed the research team to identify a subset of Z39.50 and other existing and emerging standards that would support these functional requirements. The following sections of this document detail the research team's assumptions, model, conclusions, and Z39.50 specifications.

4. Assumptions and Agreements about GILS

The _GILS_ document presents an overview of GILS, including its objectives, service requirements, and core requirements.[5] These requirements, however, are often described in general terms rather than in terms of specific functional requirements. The research team proceeded to develop an interpretation and understanding of the high-level requirements presented in the _GILS_ document. As a result, the team delineated the functional requirements that could be addressed by the GILS Profile.

To accomplish this, the research team agreed upon a model of the system architecture that adequately described the GILS operation and information flows. In addition, the research team reached other consensus agreements on the use of Z39.50 and other existing or emerging standards (e.g., USMARC and standards such as the Uniform Resource Identifiers developed for the Internet environment by the Internet Engineering Task Force [IETF]).[6]

4.1. The GILS System Architecture Model

The GILS is understood to be an agency-based, Internet-accessible locator service. "Direct users" (see _GILS_ document, p. 4) will connect to GILS servers via the Internet to find information about a wide range of Federal information resources.
Once connected to a GILS server, users supported by appropriate clients that understand the GILS Profile may navigate through single or multiple servers. GILS servers will support searching (i.e., accept a search query and return a result set or diagnostic messages) and may support browsing (i.e., accept a well-known search query and return a list of Locator Records in brief display format). The use of the national standard for network information retrieval, Z39.50, provides for interoperability between clients and multiple servers.

Agencies will develop and maintain GILS servers. These GILS servers are machine-readable databases that contain Locator Records describing Federal information resources. These decentralized agency-based GILS servers enable ongoing maintenance responsibilities to be carried out by those who understand and manage the information resources. The GILS, then, is a distributed resource consisting of agency-based servers. The GILS Profile does not specify the base technology (e.g., a database management system) that an agency uses to mount its database of Locator Records nor does it specify internal storage structures for Locator Records in the database.

According to McClure, Ryan & Moen (1992, p. 2), a locator is a "machine-readable database that identifies different information resources (e.g., databases, libraries, clearinghouses, print publications, bulletin boards, etc.) and describes the information available in these resources. Usually, the locator does not provide the actual information, but rather points the user to the information sources that do provide the needed information." The _GILS_ document states that "GILS is an information resource that identifies other information resources, describes the information available in those resources, and provides assistance in how to obtain the information" (p. 4).

A GILS server accessed using Z39.50 in the Internet environment acts primarily as a pointer to information resources.
The GILS server, as well as some of the information resources pointed to by GILS Locator Records, may be available electronically through other communications protocols including the common Internet protocols that facilitate electronic information transfer such as remote login (Telnet), File Transfer Protocol (FTP), and electronic mail (SMTP/MIME). The use of these protocols or other communications paths is outside the scope of this project and of the GILS Profile.

The public will use the GILS either directly or through intermediaries (the intermediaries obtain GILS information as direct users themselves or from other intermediaries). The _GILS_ document (p. 4) describes these two classes of users. The concern for the project team, however, was limited to "direct users" accessing the GILS via the Internet using client/server implementations that rely upon Z39.50 as the information retrieval protocol.

GILS servers will support searching (i.e., accept a search query and return a result set or diagnostic messages) and may support browsing (i.e., accept a well-known search query and return a list of Locator Records in brief display format). Although the GILS Profile addresses GILS servers only, it is understood that clients have roles in the execution of these activities (e.g., browsing is also a client function in the sense of how it interprets and presents GILS data).

The server should include in a retrieved record all elements or combinations of elements of the database record for which there is data available and which can be encoded in the requested record syntax (see Sections 5.4.2.2. -- Element Set Names and 5.4.2.3. -- Record Syntaxes).

4.2. Navigating through GILS

Direct users must have prior knowledge of at least one GILS server and its network address, and must be able to access it to enter the GILS.
Upon entry, however, users supported by appropriate clients that understand the GILS Profile may navigate through single or multiple GILS servers by following the links provided in the Locator Records (see Section 4.4.). The semantics of the Locator Records coupled with a client that understands these semantics and building upon the ability of the Z39.50 protocol to provide a uniform interface to multiple autonomously managed servers combine to provide the user with the impression of seamless navigation among these distributed servers. The semantics of the Locator Records facilitate elimination of duplicate records, further fostering the impression of a single system built out of autonomous, distributed servers.

Each GILS server can be represented by a Locator Record in other GILS servers. Some of these servers will include references to all other GILS servers, and these might be regarded as a kind of "directory of directories." However, GILS itself does not assign any hierarchical status to specific servers nor does it specify a "root server." Rather, the structure and content of the GILS Locator Records enable, for example, the aggregation of Locator Records in "directories" that could be offered by one or more Federal agencies or other organizations.

4.3. Uniform Resource Identifiers in GILS

GILS incorporates the use of Uniform Resource Identifiers (URIs) to improve interoperability and navigation in the Internet environment. URIs comprise a set of related standards for encoding resource location and identification information for electronic and other objects. The URI Working Group of the IETF defines and specifies URIs.[5] There are currently three objects within the URI set: the Uniform Resource Locator (URL) (1993); the Uniform Resource Name (URN) (1993); and Uniform Resource Characteristics (URC). The URI Working Group has approved URLs for experimental standardization, and it is expected to approve URNs in 1994. URCs are in the developmental stages.
GILS Locator Records contain fields for URIs. A scenario for the GILS as specifically related to URIs would be:

   A user, via a client, browses or searches a set of GILS servers and is presented with a set of GILS Locator Records, each referring to information resources (including other GILS servers) or related GILS Locator Records. As the user reads through the records, embedded URIs provide the ability for the client to directly access these described resources, related Locator Records, or GILS servers. URIs can serve as a direct reference to related works (e.g., a cross reference to another resource).

By incorporating the use of URIs, the GILS is facilitating interoperability within the wider Internet community while accomplishing its goal of providing improved access to Federal information resources.

4.4. GILS Locator Records

A GILS server contains individual Locator Records; these well-structured Locator Records include a standardized set of data elements (see Section 6 -- Data Elements in GILS Locator Records). The data elements provide summary descriptions of Federal information resources. GILS servers (i.e., machine-readable databases) are themselves Federal information resources and can be described by Locator Records.

Locator Records in a single agency's (e.g., Agency A) GILS server can represent one of the following:

   1) An internal information resource of Agency A. The primary purpose of the GILS server at Agency A is to provide Locator Records describing its own information resources. Agency A's GILS server is an information resource of Agency A, so Agency A's server may contain a Locator Record describing this GILS server.

   2) Any information resource external to Agency A. This includes information resources (including another agency's GILS server) that are described in Locator Records by other agencies participating in GILS or any other information resources Agency A's GILS server providers wish to describe.
The distributed design of the GILS is partly supported by records in case #2. These records may provide specific links between GILS servers.

A Locator Record consists of a number of data elements that identify and describe an information resource. (Core Elements are noted in uppercase letters throughout this document.) Several data elements can be included in Locator Records to facilitate GILS navigation and network-based access to information:

   o  Each retrieved Locator Record contains a LOCAL CONTROL NUMBER generated by the system and guaranteed to be unique on the server from which the Locator Record is retrieved.

   o  Each Locator Record contains a CONTROL IDENTIFIER in the form of a Uniform Resource Identifier (URI). Agency A's server may contain Locator Records with CONTROL IDENTIFIERS that identify Locator Records from other Agencies' servers. This data element allows GILS Locator Records to be replicated on multiple servers for the convenience of GILS users.

   o  Each Locator Record contains an AVAILABILITY element that informs the user how to procure the described information resource. If the information resource is an electronic information system or electronic document, the AVAILABILITY element includes AVAILABLE LINKAGE information in both human- and machine-readable form. The network linkage information may be used to connect to and access the electronic information resource.

Different agencies may create or offer Locator Records describing the same information resource (these may be existing Locator Records that have been replicated and/or modified, or entirely new Locator Records). These multiple records can offer different views of a single resource from the particular perspectives of the agencies creating/modifying a Locator Record. For example, two agencies may wish to highlight different aspects of the content of a specific information resource and to describe it in terms common to an agency's particular user community.
Each agency will assign its own CONTROL IDENTIFIER to the Locator Record it creates or substantially modifies.

An agency (Agency B) may copy another agency's (Agency A) Locator Record. These are considered replicated records. In this case, two things might happen:

   1) If Agency B makes no substantive changes to the replicated Locator Record from Agency A, the CONTROL IDENTIFIER is not changed.

   2) If Agency B makes substantive changes to the replicated Locator Record from Agency A, a new CONTROL IDENTIFIER is assigned by the agency (Agency B) making the change. The CONTROL IDENTIFIER assigned by Agency A is retained in Agency B's new record in the data element ORIGINAL CONTROL IDENTIFIER.

This process of replication and modification may become very complex, and the inclusion of the ORIGINAL CONTROL IDENTIFIER is intended to enable the user to trace the location of the record created by the original source of the information resource.

4.5. GILS Searching

Users will be able to search a GILS server as a means of finding out how to acquire or access the information resource described by one or more Locator Records. GILS servers may support a variety of search strategies including those:

   o  to find known items (e.g., where the user knows the exact TITLE of an information resource described in a Locator Record)

   o  to find resources whose Locator Records contain certain words or phrases

   o  to find resources by topic (e.g., using a controlled vocabulary)

   o  to find resources whose Locator Records meet other criteria (e.g., specific ORIGINATOR agencies).

A user's search specification is received by a GILS server using the Search Facilities of Z39.50. The searchable elements of the Locator Records correspond to Attributes (described in Section 5.4.1.1. -- Attribute Set).
The exact manner by which the user constructs the query is an interface issue and not specified by the GILS Profile, but users supported by appropriate clients that understand the GILS Profile should be able to specify searches with each of the required Attributes listed in Section 5.4.1.1.

As a GILS server completes a search, it produces a result set and makes that available to a client. The GILS server provides the client the contents of selected records from the result set using the Present Service of Z39.50. The GILS server must respond to requests that records be presented in any of three Record Syntaxes (see Section 5.4.2.3. -- Record Syntaxes) mandated by the GILS Profile and one of the four Element Set Names (see Section 5.4.2.2. -- Element Set Names) specified by the GILS Profile. The exact manner in which records are presented to the user is an interface issue and not within the scope of the GILS Profile.

4.6. GILS Browsing

A GILS server may provide a structure for browsing that is composed of a chain of Locator Records traversed through pointers specified in the GILS Core Element CROSS REFERENCE. The CROSS REFERENCE is a repeating element. Each occurrence contains an item pointer in the form of a Uniform Resource Identifier (URI), the title of the item, and a content type to identify it. Each referenced item may be a Locator Record on the same GILS server or on another GILS server.

To provide support for browsing GILS Locator Records, there is a well-known search consisting of specific GILS Attributes and a term of zero length. GILS servers that support browsing of records will create a result set of one or more GILS Locator Records that provide the necessary information to allow clients to offer menu-like displays of GILS Locator Records or other information and information resources. The well-known search allows users to browse a GILS server when they have no other starting point.
If a particular GILS server does not support browsing, the response to the well-known search may be an error message or an empty result set (i.e., this particular server does not contain any records that match the query requirements).

4.7. Input Formats for GILS Records

The GILS Profile does not recommend or prescribe any formats for records input to the software that feeds a particular GILS server database. This is a concern for GILS application developers, those who create the records, and/or those who load the records from other existing systems.

4.8. The Use of Z39.50 in GILS

Z39.50 provides a key part of the foundation for the GILS. This standard enables the interoperability of a variety of systems and hardware platforms in a client/server environment for the purposes of information retrieval. The GILS Profile will include the complete specifications of a subset of Z39.50 for use in the GILS application. The GILS Profile, in addition, will specify necessary characteristics of the GILS application that are outside the scope of Z39.50 including reference to other existing or emerging standards. Separate implementations will have an improved likelihood of interoperability and interworking when they conform to a common profile.

5. Z39.50 Specifications for the GILS Application

Based on the descriptions of the GILS system architecture model outlined above, the project team determined how Z39.50 will support the functional requirements of GILS. The specifications for using Z39.50 are documented in the GILS Profile. The GILS Profile is the authoritative source for Z39.50 specifications for the GILS application and should be referred to for completeness and accuracy of specifications.
The GILS Profile details the required facilities and services available from Z39.50, describes an Attribute Set for searching Locator Records and four Element Sets by which the server presents some or all the elements of the Locator Records, and prescribes three Record Syntaxes to be supported by GILS servers for the transfer of Locator Records.

This section outlines the Z39.50 specifications for the GILS Profile. The terminology and concepts presented in this section are specific to Z39.50. Readers should consult the complete standard (National Information Standards Organization, 1992) for further information and reference. For example, the standard uses the words "origin" and "target," rather than "client" and "server."

5.1. Version

GILS origins (clients) and targets (servers) support Z39.50 Version 2 as specified in Z39.50-1994. GILS requires support of various objects, some of which are not defined in Z39.50-1992. These are listed in 7.2.

5.2. GILS Objects

The following object identifier (OID) is assigned to the Z39.50 standard:

   {iso (1) member-body (2) US (840) ANSI-standard-Z39.50 (10003)}

This OID is abbreviated as: ANSI-standard-Z39.50.

Several object classes are assigned at the level immediately subordinate to ANSI-standard-Z39.50, including:

   o   3 = attribute set definitions
   o   4 = diagnostic definitions
   o   5 = record syntax definitions
   o  13 = database schema definitions
   o  14 = tagSet definitions.

GILS requires support of the following objects:

   o  GILS attribute set:     {ANSI-standard-Z39.50 3 3}
   o  bib1 diagnostic set:    {ANSI-standard-Z39.50 4 1}
   o  USMARC record syntax:   {ANSI-standard-Z39.50 5 10}
   o  SUTRS record syntax:    {ANSI-standard-Z39.50 5 101}
   o  GRS-1 record syntax:    {ANSI-standard-Z39.50 5 105}
   o  GILS schema:            {ANSI-standard-Z39.50 13 2}
   o  tagSet-M:               {ANSI-standard-Z39.50 14 1}
   o  tagSet-G:               {ANSI-standard-Z39.50 14 2}

5.3. Communication Services

Initial implementations of GILS servers will be accessible via the Internet.
Therefore, Z39.50 will be using the transport service of the Transmission Control Protocol (TCP). The specification for use of TCP is found in OIW/SIGLA Document #1, "Using Z39.50-1992 Directly over TCP" (Open Systems Environment Implementors Workshop/Special Interest Group on Library Applications (OIW/SIGLA), 1993). The GILS Profile has not defined the use of other communication services.

5.4. Z39.50 Facilities and Services

GILS Z39.50 origins (clients) and targets (servers) must support the following Facilities and Services for information retrieval for operation in the Internet environment:

FACILITY                                        SERVICE

Init Facility -- allows an origin (client)      Init Service
to propose values for initialization
parameters.

Search Facility -- enables an origin system     Search Service
(client) to query a database at a target
system (server), and to receive information
about the results of query.

Retrieval Facility -- enables the origin        Present Service
(client) to retrieve records according to
position within a result set maintained by
the target (server).

Termination Facility -- allows the origin       Mapped to TCP ABORT or TCP
(client) or target (server) to initiate         CLOSE (see Lynch, 1994 and
abrupt termination or graceful termination      Open Systems Environment
of a connection.                                Implementors Workshop/Special
                                                Interest Group on Library
                                                Applications (OIW/SIGLA),
                                                1993).

No additional services are required for conformance to the GILS Profile. Other Z39.50 services, however, may be provided optionally by targets (servers) and used by origins (clients). Standard Z39.50 Init Service negotiation procedures control the use of all services.

5.4.1. Search

This section describes the components and procedures used by Z39.50 to communicate search queries. The GILS application will support Z39.50 Type 1 queries, which are general purpose Boolean query structures.

5.4.1.1. Attribute Set

The profile specifies a GILS Attribute Set that is a registered object.
The GILS Attribute Set is a superset of the Bib-1 Attribute Set and consists of all Bib-1 Attributes and additional Use Attributes that are defined for GILS elements (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex A: GILS Attribute Set). These newly defined GILS Use Attributes are well-known and correspond semantically to GILS Core Elements. GILS servers must support a limited number of GILS Attributes. The required GILS Attributes follow. (Note: The GILS Use Attribute is listed followed by the GILS Use Attribute Number and the corresponding GILS Core Element.)

o Use Attributes: Local Number (12; Local Control Number); Author-name corporate (1005; Originator); Date/Time Last Modified (1012; Date of Last Modification); Record Source (1019; Record Source); Distributor Name (2001; Distributor Name); Index Terms -- Controlled (2002; Index Terms -- Controlled); Local Subject Index (29; Local Subject Term); Any (1016)
o Structure: Word (2), URx (104), Date (5), Word List (6)
o Relation: Greater than (5), Equal (3)

GILS servers should never return any of the following four diagnostic messages: "Unsupported Use Attribute," "Unsupported Structure Attribute," "Unsupported Position Attribute," or "Unsupported Attribute Type" when a query includes a combination of these GILS Attributes (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex A: GILS Attribute Set, Table 1 for the recognized and supported combinations of the GILS Attributes).

5.4.1.2. Well-known Search

To facilitate browsing of Locator Records, there will be a well-known search sent by the client to the GILS server. The well-known search consists of the GILS Attribute Set Use Attribute: Local Number; Structure Attribute: URx; and a term of zero length.
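As an illustration only, the required attributes and the well-known search can be written down as plain data. The following Python sketch is not part of the GILS Profile: the names AttributeTerm, dotted, and well_known_search are hypothetical, the attribute-type numbering (1 = Use, 2 = Relation, 4 = Structure) is taken from Bib-1, and no attempt is made to encode an actual Z39.50 Type 1 query for transmission. The relation is left unset because the well-known search names only a Use Attribute, a Structure Attribute, and a term.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# OID arcs from section 5.2: {iso(1) member-body(2) US(840) ANSI-standard-Z39.50(10003)}
Z3950_ROOT = (1, 2, 840, 10003)
GILS_ATTRIBUTE_SET = Z3950_ROOT + (3, 3)  # {ANSI-standard-Z39.50 3 3}

# Bib-1 attribute types (the GILS Attribute Set is a superset of Bib-1).
ATTRIBUTE_TYPES = {"Use": 1, "Relation": 2, "Structure": 4}

# Attribute values that section 5.4.1.1 lists as required for GILS servers.
USE_ATTRIBUTES = {
    "Local Number": 12,
    "Local Subject Index": 29,
    "Author-name corporate": 1005,
    "Date/Time Last Modified": 1012,
    "Any": 1016,
    "Record Source": 1019,
    "Distributor Name": 2001,
    "Index Terms -- Controlled": 2002,
}
STRUCTURE_ATTRIBUTES = {"Word": 2, "Date": 5, "Word List": 6, "URx": 104}
RELATION_ATTRIBUTES = {"Equal": 3, "Greater than": 5}


@dataclass(frozen=True)
class AttributeTerm:
    """Hypothetical in-memory form of one attributes-plus-term operand."""
    attribute_set: Tuple[int, ...]
    use: int
    structure: int
    term: str
    relation: Optional[int] = None  # None means no Relation Attribute is sent


def dotted(oid: Tuple[int, ...]) -> str:
    """Render an OID tuple in dotted-decimal form."""
    return ".".join(str(arc) for arc in oid)


def well_known_search() -> AttributeTerm:
    """Use = Local Number, Structure = URx, and a zero-length term."""
    return AttributeTerm(
        attribute_set=GILS_ATTRIBUTE_SET,
        use=USE_ATTRIBUTES["Local Number"],
        structure=STRUCTURE_ATTRIBUTES["URx"],
        term="",
    )
```

A client could build this operand once and reuse it whenever it needs to ask a server for its browsable Locator Records; turning it into a protocol message is left to whatever Z39.50 toolkit the implementor uses.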
GILS servers that support browsing of records will create a result set of one or more GILS Locator Records that provide the necessary information to allow clients to offer menu-like displays of GILS Locator Records or other information and information resources. "Browse" in the GILS context involves only the Search and Present Services of Z39.50. "Browse" is used informally in the GILS Profile; it is not related to, and should not be confused with, the Browse Facility or Scan Service of Z39.50.

5.4.2. Retrieval

This section describes the components and procedures used by Z39.50 to return records in response to a query.

5.4.2.1. Schema

Schemas provide a way to identify the elements that are available from a database record. Each element is defined in a tagSet and is identified by a tagType and a tag value. In addition to describing and/or defining tagSets used in an application, a schema also includes an abstract record structure (ARS). The ARS describes an abstract structure for a database record, in terms of a set of schema elements, as well as describing the hierarchy of a record.

The GILS Profile specifies a GILS Schema (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex D: GILS Schema). The GILS Schema is a registered object. The schema describes and/or defines tagSets used and an abstract record structure for a Locator Record. A schema in Z39.50 can be modified and may evolve over time, and it is reasonable to expect the GILS Schema will evolve.

The GILS Schema uses tagSet-M and tagSet-G elements and defines in the GILS tagSet additional elements as necessary. The GILS Profile specifies tagTypes to identify tagSet-M elements (tagType=1), tagSet-G elements (tagType=2), and the elements defined by the GILS tagSet (tagType=4). Another tagType (tagType=3) is used to identify arbitrary string tags for locally defined elements. The GILS tagSet element numbering begins with number 1.
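The tagType assignments, together with the (x,y)/(z,w) tag path notation discussed in this section, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definition from the GILS Profile: the names TAG_TYPES, format_tag, and format_tag_path are hypothetical, and the rule that only tagType 3 carries string tags while the other tagTypes carry numeric tags is a simplification made for the example.

```python
# tagType assignments stated in section 5.4.2.1.
TAG_TYPES = {
    1: "tagSet-M",
    2: "tagSet-G",
    3: "locally defined string tag",
    4: "GILS tagSet",
}


def format_tag(tag_type, tag_value):
    """Render a single tag as "(tagType,tagValue)"."""
    if tag_type not in TAG_TYPES:
        raise ValueError(f"unknown tagType: {tag_type}")
    if tag_type == 3:
        # Assumption for this sketch: locally defined elements use string tags.
        if not isinstance(tag_value, str):
            raise ValueError("tagType 3 expects a string tag")
    elif not isinstance(tag_value, int):
        # Assumption for this sketch: well-known elements use numeric tags.
        raise ValueError(f"tagType {tag_type} expects a numeric tag")
    return f"({tag_type},{tag_value})"


def format_tag_path(path):
    """Join nested tags, outermost first, with "/" between levels."""
    return "/".join(format_tag(tag_type, tag_value) for tag_type, tag_value in path)


# The nesting of (4,90) under (4,70) renders as (4,70)/(4,90).
example_path = format_tag_path([(4, 70), (4, 90)])
```

Keeping the path as a list of (tagType, tagValue) pairs until display time lets the same structure serve both numeric and string tags.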
Elements can be nested and the tagging notation (i.e., the tag path) will reflect the nesting. The form of the notation is (x,y)/(z,w) where x and z are numbers identifying the tagType for the tag and y and w are tag values. For example, the notation for specifying the element DISTRIBUTOR (4,90) under AVAILABILITY (4,70) would be (4,70)/(4,90). All well-known GILS Schema elements have assigned numeric tags. String-tags (i.e., text) may be used in the GILS Schema to label those elements that are not well-known (i.e., locally defined).

5.4.2.2. Element Set Names

GILS servers will support four Element Set Names. GILS servers will interpret the use of the Element Set Names required by the GILS Profile to identify the following elements from the GILS Schema:

o The primitive element set name "B" contains at least: title, controlIdentifier, originator, and local control number.
o The primitive element set name "G" contains: all B Element Set elements and crossReference.
o The primitive element set name "W" contains: all B Element Set elements and bodyOfDisplay.
o The primitive element set name "F" contains: all elements available in the record.

The element "bodyOfDisplay" in tagSet-G (2,9) may be used by the target to combine one or more of the elements from the abstract record structure into this single element, in a display format. The server should include in a retrieved record all of the elements specified by the element set name for which there is data available in the database record and which can be encoded in the requested record syntax (e.g., some types of locally defined binary data may not be encodable in a USMARC or SUTRS record).

5.4.2.3. Record Syntaxes

Record syntaxes provide for the transfer of database records between a target (server) and an origin (client) in acceptable form for processing or display.
GILS servers are required to support the following three Z39.50 record syntaxes:

o Generic Record Syntax (GRS-1)
o USMARC
o Simple Unstructured Text Record Syntax (SUTRS)

The Generic Record Syntax is a general-purpose format for packaging records of varying complexity with potentially arbitrary data in individual fields. For mainly text records like GILS Locator Records, GRS-1 is simple and efficient.

USMARC is an implementation of ANSI/NISO Z39.2 and is maintained by the Library of Congress. It is a communications format used by many bibliographic systems. These systems are likely to be important users of GILS. The research team defined a mapping of the GILS Core Elements into the USMARC Format for Bibliographic Data (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex B: GILS Core Elements to USMARC Mapping). However, since the data transformation is not fully reversible and requires interpretation, the record source is responsible for encoding the USMARC record(s). The data in GILS Locator Records do not always map clearly into USMARC records, particularly when agencies add their own locally defined fields to the GILS Locator Record. This means that construction of USMARC records is subject to local interpretation. Therefore, GILS Locator Records in USMARC format obtained from other than the original record source should be considered non-definitive. The original source of the GILS Locator Record can be identified by examining the ORIGINAL CONTROL IDENTIFIER field in the record.

Unstructured Text (SUTRS) provides a bare-minimum operating capability. SUTRS records consist of a single text field formatted by the target system (server). GILS targets (servers) will use the Preferred Display Format (see Section 5.5) to format Locator Records for Unstructured Text transmission.

For interchange, GRS records are to be treated as the complete and canonical representation.
SUTRS and USMARC should be viewed as derivative records from the canonical representation and as such are not as complete or precise.

5.5. Preferred Display Format for Use with SUTRS

The GILS Profile recommends a preferred display format for SUTRS records (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex C: Preferred Display Format for GILS Records). For SUTRS records, formatting according to the preferred display format is the responsibility of the server. When the target transfers a GILS record using the SUTRS record syntax, it will encode the GILS record formatted according to the preferred display format, so that the client may present the record directly, without processing. For SUTRS, however, the client should not expect to be able to parse the record to obtain any individual GILS elements. When the client presents a GILS record that the server has transferred using the USMARC or GRS record syntax, it is recommended that the client consider the SUTRS suggested display layout in formatting the received record for presentation to the human end user.

5.6. Diagnostic Messages

The GILS application will use Diagnostic Set Bib-1.

6. Data Elements in GILS Locator Records

The _GILS_ document provides the list of data elements for Locator Records. The document refers to these as the GILS Core Elements (see the _Application Profile for the Government Information Locator Service (GILS)_, Annex E: GILS Core Elements, which contains a list of the elements and their definitions). GILS Locator Records consist of a number of GILS Core Elements that contain information to identify and describe Federal information resources. The research team has examined the Core Elements and has had input into revisions of these Elements, particularly Elements related to the functional requirements for searching, browsing, and navigating the GILS.

7.
Conclusion

This broad outline of the GILS application and the use of Z39.50 in this application is based on the development work of the research project team. During the research project, the team solicited comments from a variety of stakeholders and other interested parties (e.g., the USMARC community, Federal agencies, Z39.50 implementors/vendors, and the records management and archival community). Feedback from these groups and other individuals has informed the development of the GILS Profile.

Now that the draft GILS Profile has been completed, the project team will ensure its wide distribution. We anticipate that a number of organizations, companies, vendors, and individuals will develop implementations based on the GILS Profile. A further step in the GILS implementation is a mechanism by which these prototype implementations of the GILS Profile can undergo interoperability testing. Such testing can provide additional feedback on the utility of the GILS Profile, and if necessary, changes and/or expansions to the GILS Profile can be made.

One major goal of this research project has been to ensure that the GILS Profile is implementable and usable, and that implementations based on the Profile can interoperate and interwork. Achieving this goal will serve the larger goals of the Government Information Locator Service by providing a standards-based, decentralized, network-accessible service through which the public will be able to identify and locate Federal information resources. In addition, the GILS Profile provides the means by which various implementors using a variety of computer platforms (clients and servers) can develop products usable by Federal agencies and the public.

NOTES

1.
The current draft of _Government Information Locator Service (GILS)_ (Christian, 1994), dated May 2, 1994, is available on the Fedworld electronic bulletin board (703-321-8020) or by anonymous FTP (File Transfer Protocol) via the Internet at <130.11.48.107> as /pub/gils.doc (Microsoft Word for Windows format) or /pub/gils.txt (ASCII text format).

2. The _Application Profile for the Government Information Locator Service (GILS)_ is available via anonymous FTP from as /USGS/GILS_PROFILE.ps (Postscript format) or /USGS/GILS_PROFILE.txt (ASCII format).

3. The GILS Profile only addresses the needs of the GILS Core and uses the GILS Core Elements for description used in the GILS Core Locator Records. Throughout this paper, the reader should assume that "GILS" refers to "GILS Core." For further information about the GILS Core, see Christian (1994).

4. Z39.50, Version 2, was approved in 1992. Since that time, the Z39.50 Implementors Group (ZIG), which is a voluntary user group comprising implementors of Z39.50, has continued work to enhance the standard based on needs of information providers. A draft Version 3 is expected to be balloted through the National Information Standards Organization in 1994. The new version of the standard is referred to as Z39.50-1994 and will describe both Version 2 and Version 3 of the standard. Drafts of Version 3 can be retrieved from the Library of Congress's gopher. Connect to MARVEL.LOC.GOV and select #7. Services to Libraries and Publishers, and then select #8. Z39.50.

5. For information on the process by which the objectives, service requirements, and core requirements of GILS were developed, contact Eliot Christian, United States Geological Survey, 802 National Center, Reston, VA 22092; telephone: (703) 648-7245; electronic mail: .

6. USMARC is the implementation in the United States of ANSI Z39.2, the standard for bibliographic information interchange.
See American National Standards Institute (1985) and Library of Congress (1993). The Internet Engineering Task Force (IETF) develops standards for the environment of the Internet. For a description of this standards development process see Crocker (1993).

REFERENCES

American National Standards Institute. (1985). _American national standard Z39.2-1985: Bibliographic information interchange_. New York: American National Standards Institute.

Christian, Eliot. (1994, May 2). _Government information locator service (GILS): Report to the information infrastructure task force_. Available on the Fedworld electronic bulletin board (703-321-8020) or by anonymous FTP (File Transfer Protocol) via the Internet at <130.11.48.107> as /pub/gils.doc (Microsoft Word for Windows format) or /pub/gils.txt (ASCII text format).

Crocker, David. (1993, September). Making standards the IETF way. _StandardView_, 1(1), 48-54.

International Organization for Standardization/International Electrotechnical Commission. (1992). _ISO/IEC TR10000-1 Information technology -- Framework and taxonomy of international standardized profiles -- Part 1: Framework_. Geneva: ISO/IEC Copyright Office.

Library of Congress. (1993). _USMARC format for bibliographic data_. Washington, DC: Library of Congress, Cataloging Distribution Service.

Lynch, Clifford A. (1994, April 30). Using the Z39.50 Information Retrieval Protocol in the Internet Environment [Draft RFC for Z39.50 over TCP/IP].

McClure, Charles R., Ryan, Joe & Moen, William E. (1992). _Identifying and describing federal information inventory/locator systems: Design for networked-based locators_, 2 Vols. Bethesda, MD: National Audio Visual Center [Available from ERIC, document no. ED349031].

National Information Standards Organization. (1992). _ANSI/NISO Z39.50-1992, Information retrieval application service definition and protocol specification for open systems interconnection_. Gaithersburg, MD: NISO Press.
Open Systems Environment Implementors Workshop/Special Interest Group on Library Applications (OIW/SIGLA). (1993). OIW/SIGLA Document #1: Using Z39.50-1992 Directly over TCP.

Uniform Resource Locators (URL): A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network. (1993, October). [Internet Draft]. The latest URL draft is:

Uniform Resource Names. (1993, October). [Internet Draft]. The latest URN draft is:

APPENDIX A
Project Team Members

The research project team consists of experts in Z39.50 and also representatives of Federal agencies. The following lists these members:

Z39.50 Experts

o Kevin Gamiel, Clearinghouse for Network Information Discovery and Retrieval
o Ralph LeVan, OCLC
o Denis Lynch, ESL, Inc.
o Margaret St. Pierre, WAIS, Inc.
o Madeleine Stovel, Research Libraries Group, Inc.

Representatives from Federal Agencies

o Eliot Christian, United States Geological Survey
o Tim Gauslin, United States Geological Survey
o Sue Ruddle, Defense Technical Information Center
o Yesha Yelena, National Institute of Standards and Technology

Representative of the Z39.50 Maintenance Agency

o Ray Denenberg, Z39.50 Maintenance Agency, Library of Congress

APPENDIX B
Definitions

For purposes of this Profile, the following definitions apply.

Association: A communication session between a database user and a database provider.

Client: An initiating application. This application includes the Z39.50 origin.

Electronic Information Resource: Information resources that are maintained in electronic, digital format and may be accessed, searched, or retrieved via electronic networks or other electronic data processing technologies (e.g., CD-ROM).

GILS Core: A subset of all GILS Locator Records which describe information resources maintained by the U.S. Federal government, all of which comply with the defined GILS Core Elements and are mutually accessible through interconnected electronic network facilities without charge to the direct user.

Government Information: Information created, collected, processed, disseminated, or disposed of by or for the Federal government.

Government Information Locator Service (GILS): A decentralized collection of locators and associated information services used by the public either directly or through intermediaries to find public information throughout the U.S. Federal government.

Information: Any communication or representation of knowledge such as facts, data, or opinions in any medium or form, including textual, numerical, graphic, cartographic, narrative, or audiovisual forms.

Information Resource: Includes both government information and information technology.

Interoperability: A condition that exists when the distinctions between information systems are not a barrier to accomplishing a task that spans multiple systems.

Locator: An information resource that identifies other information resources, describes the information available in those resources, and provides assistance in how to obtain the information.

Locator Record: A collection of related data elements describing an information resource, the information available in the resource, and how to obtain the information. Locator Records will be offered by servers to identify information resources, describe the information available in those resources, and provide assistance in how to obtain the information.

Mandatory: An element in a GILS Core Locator Record that must have a value provided by the record source. The GILS Profile does not specify which elements must be present from the perspective of GILS servers.

Origin: The part of a client application that initiates a Z39.50 association and is the source of requests during the association.

Profile: The statement of a function(s) and the environment within which it is used, in terms of a set of one or more standards, and where applicable, identification of chosen classes, subsets, options, and parameters of those standards.
A set of implementor agreements providing guidance in applying a standard interoperably in a specific limited context.

Registered Object: An object that is identified by a name-to-thing relationship in which the name is recorded by a registration authority to ensure that the names can be used unambiguously.

Server: An application that responds to an initiating application (i.e., a client). This application includes the Z39.50 target.

Target: The part of a server application that accepts a Z39.50 association.

Uniform Resource Identifier (URI): A set of related standards for encoding resource location and identification information for electronic and other objects. Examples include Uniform Resource Locators (URLs) and Uniform Resource Names (URNs).

USMARC: An implementation of ANSI/NISO Z39.2, the American National Standard for Bibliographic Information Interchange. The USMARC format documents contain the definitions and content designators for the fields that are to be carried in records structured according to Z39.2. GILS records in USMARC format contain fields defined in USMARC Format for Bibliographic Data. This documentation is published by the Library of Congress.