1.10 Web Underlying Protocols

One of the Web's great strengths is its reliance on standards. The following alphabet soup of standards, drawn from both the formal standards world and the Internet standards world, is part of the technical foundation of the Web.

1.10.1 URI

The URI, or Universal Resource Identifier, is a generic, all-encompassing term that covers all Uniform Resource (UR) specifications. The other UR specifications are, by definition, members of the set of URs described by the URI.

In addition, work is ongoing on a Uniform Resource Citation (URC). A URC will be a set of attribute/value pairs describing an object. Some of the attributes include author, publisher, date, and copyright status.

1.10.2 URL

The Uniform Resource Locator (URL) specification is a way of naming and addressing objects on the Web. Following is the Abstract from the Internet Draft specification of Uniform Resource Locators (URL):

Internet Draft - CERN Uniform Resource Locators (URL)

A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network

Abstract

Many protocols and systems for document search and retrieval are currently in use, and many more protocols or refinements of existing protocols are to be expected in a field whose expansion is explosive.

These systems are aiming to achieve global search and readership of documents across differing computing platforms, and despite a plethora of protocols and data formats. As protocols evolve, gateways can allow global access to remain possible. As data formats evolve, format conversion programs can preserve global access. There is one area, however, in which it is impractical to make conversions, and that is in the names and addresses used to identify objects. This is because names and addresses of objects are passed on in so many ways, from the backs of envelopes to hypertext objects, and may have a long life.

A common feature of almost all the data models of past and proposed systems is something which can be mapped onto a concept of "object" and some kind of name, address, or identifier for that object. One can therefore define a set of name spaces in which these objects can be said to exist.

Practical systems need to access and mix objects that are part of different existing and proposed systems.

This paper discusses the requirements of a universal syntax that can be used to encapsulate a name in any registered name space. This will allow names in different spaces to be treated in a common way, even though names in different spaces have differing characteristics, as do the objects to which they refer.

The universal syntax applies to objects available using existing protocols, and may be extended with technology. It makes a recommendation for a generic syntax, and for specific forms for "Uniform Resource Locators" (URLs) of objects accessible using existing Internet protocols.

The syntax has been in widespread use by World-Wide Web software since 1990.
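The idea of one generic syntax wrapping many name spaces can be illustrated with a short sketch. It uses Python's standard urllib.parse module to split two URLs into the same set of parts, regardless of the access protocol. The FTP URL is the one quoted in the URN discussion of this chapter; the HTTP URL and the helper name describe are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the generic parts shared by all schemes."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # the name space or access protocol
        "host": parts.hostname,      # the machine that holds the object
        "path": parts.path,          # the object's location on that machine
        "fragment": parts.fragment,  # optional position inside the object
    }


if __name__ == "__main__":
    # FTP URL taken from this chapter; the HTTP URL is hypothetical.
    for url in (
        "ftp://ftp.uu.net/published/osys-today/urdrafts.tar.Z",
        "http://www.example.org/docs/intro.html#top",
    ):
        print(describe(url))
```

The same function handles both URLs because only the scheme-specific part differs; the overall shape of the name stays the same.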

1.10.3 URN

The Uniform Resource Name (URN) associated with an item is meant to be a persistent name that, when looked up, returns a list of the current URLs pointing to the item. The URN is assigned by an Internet naming authority, such as the Internet Assigned Numbers Authority (IANA). To take an example from a recent Open Systems Today article:

the URL of a file

ftp://ftp.uu.net/published/osys-today/urdrafts.tar.Z

The URN for the same file could be:

URN:IANA:merit.edu:9283492

In the URN, the IANA indicates that the following organization (merit.edu) has the authority to assign URN IDs and that the subsequent number is the ID assigned by that organization.

Search clients such as WAIS that use URNs would return a list of URLs, one for each site containing the document. Over time, sites may delete or add the document, but a lookup of the URN would always return the most recent list of URLs. URNs are still being debated in committees, but they are expected to emerge as an extremely significant standard.
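Since URNs were still under discussion when this was written, no real resolution service can be shown. The following is only a sketch of the idea: a hypothetical in-memory resolver that maps a persistent name to a changing list of URLs. The class UrnResolver, the function split_urn, and the mirror URL on ftp.example.org are all assumptions introduced for illustration; only the identifier fields and the ftp.uu.net URL come from the example above.

```python
def split_urn(urn):
    """Split a name such as IANA:merit.edu:9283492 into its three fields.

    An optional leading "URN:" label is ignored. The three-field layout is
    the one described in the text, not a ratified syntax.
    """
    if urn.upper().startswith("URN:"):
        urn = urn[4:]
    authority, organization, item_id = urn.split(":", 2)
    return authority, organization, item_id


class UrnResolver:
    """Hypothetical resolver: one persistent name, many current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, urn, url):
        # A site announces that it now holds a copy of the document.
        urls = self._locations.setdefault(urn, [])
        if url not in urls:
            urls.append(url)

    def withdraw(self, urn, url):
        # A site announces that it has deleted its copy.
        urls = self._locations.get(urn, [])
        if url in urls:
            urls.remove(url)

    def resolve(self, urn):
        # Return a copy of the current list of URLs for this name.
        return list(self._locations.get(urn, []))


if __name__ == "__main__":
    resolver = UrnResolver()
    name = "IANA:merit.edu:9283492"
    resolver.register(name, "ftp://ftp.uu.net/published/osys-today/urdrafts.tar.Z")
    resolver.register(name, "ftp://ftp.example.org/mirror/urdrafts.tar.Z")  # hypothetical mirror
    print(split_urn(name))
    print(resolver.resolve(name))
```

The name never changes while the list behind it does, which is the property that makes a URN persistent.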

1.10.4 SGML/HTML

The Standard Generalized Markup Language (SGML) has grown in popularity and use corresponding to the growth of the Web. HTML, a specific SGML application, has fueled the growth and awareness of SGML. SGML is a markup language used primarily to define and mark up the structure of a document [SGML].

Logical document structures are the document's components, such as chapters, sections, headings, and paragraphs. Together they make up the entire document. Instances of these logical items form the content of the document itself. For example, the name <CHAP> might refer to all chapters, a structural item of a document, whereas "Chapter 3 - Web Feet" is a particular chapter.

The entire structure of a document is defined in a Document Type Definition (DTD), which is a kind of metalanguage. DTDs define the structure of documents in a rigorous, formal manner. Once a DTD is defined, authors can write documents that conform to the DTD. In the case of the Web, authors create documents that conform to the HTML DTD. Applications do not necessarily enforce conformance. This often causes confusion, because a document that "worked just fine" in one person's browser may have problems when viewed in another. Indeed, the HTML DTD did not keep pace with HTML features for some time.
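To make the notion of conformance concrete, here is a drastically simplified sketch. It is not an SGML parser and does not read a real DTD. Instead, a small Python dictionary stands in for the DTD's content model, and the standard html.parser module is used only as a convenient tag scanner. The element names book, chap, sect, heading, and para, and the rules about which element may contain which, are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical content model standing in for a DTD: parent -> allowed children.
CONTENT_MODEL = {
    "book": {"chap"},
    "chap": {"heading", "sect", "para"},
    "sect": {"heading", "para"},
    "heading": set(),
    "para": set(),
}


class StructureChecker(HTMLParser):
    """Record every place where the markup breaks the content model."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.errors = []  # human-readable conformance problems

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <CHAP> arrives as "chap".
        if tag not in CONTENT_MODEL:
            self.errors.append(f"unknown element <{tag}>")
        elif not self.stack:
            if tag != "book":
                self.errors.append(f"<{tag}> cannot be the top-level element")
        elif tag not in CONTENT_MODEL.get(self.stack[-1], set()):
            self.errors.append(f"<{tag}> not allowed inside <{self.stack[-1]}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check(document):
    """Return a list of problems; an empty list means the document conforms."""
    checker = StructureChecker()
    checker.feed(document)
    checker.close()
    for tag in checker.stack:
        checker.errors.append(f"<{tag}> was never closed")
    return checker.errors


if __name__ == "__main__":
    good = "<BOOK><CHAP><HEADING>Chapter 3 - Web Feet</HEADING><PARA>Text.</PARA></CHAP></BOOK>"
    bad = "<BOOK><CHAP><PARA><SECT></SECT></PARA></CHAP></BOOK>"
    print(check(good))
    print(check(bad))
```

A browser that skips this kind of check will display both documents in some fashion, which is exactly how nonconforming pages end up looking different from one browser to the next.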

1.10.5 Z39.50

Z39.50 is a protocol for search and retrieval tasks usually associated with the library information retrieval community. A glossary entry(32) explaining Z39.50 states that the Z39.50 protocol is the "name of the national standard developed by the National Information Standards Organization (NISO) that defines an applications level protocol by which one computer can query another computer and transfer resulting records, using a canonical format." This protocol provides the framework for On-line Public Access Catalog (OPAC), which people use to search remote catalogs on the Internet using the commands of their local systems. Projects now in development will provide Z39.50 support for catalogs on the Internet. Search and Retrieval (SR), ISO Draft International Standard 10162/10163, is the international version of Z39.50.(33)

1.10.6 MIME

After years of experiments and non-standard, non-interoperating implementations, multimedia mail has yet to become widespread on the Internet or elsewhere, outside of isolated communities. Multipurpose Internet Mail Extensions (MIME), a standards-track Internet format defined by an Internet Engineering Task Force Working Group, offers a simple standardized way to represent and encode a wide variety of media types, including textual data in non-ASCII character sets, for transmission via Internet mail. MIME extends RFC 822 in a manner that is simple and completely backward-compatible, yet flexible and open to extension. In addition to enhanced functionality for Internet mail, the new mechanism offers the promise of interconnecting X.400 "islands" without the loss of functionality currently found in X.400-to-Internet gateways. This paper describes the general approach and rationale of the new mechanisms for Internet multimedia mail.(34)
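As a rough illustration of what MIME adds on top of RFC 822 headers, the sketch below builds a two-part message with Python's standard email package: a text part in a non-ASCII character set and an image part. The addresses, subject, file name, and placeholder image bytes are assumptions made for the example; the library chooses the transfer encodings itself.

```python
from email.message import EmailMessage


def build_message():
    """Build a small multipart message with text and an image attachment."""
    msg = EmailMessage()
    # Ordinary RFC 822 headers (addresses are hypothetical).
    msg["From"] = "author@example.org"
    msg["To"] = "reader@example.org"
    msg["Subject"] = "Multimedia mail sketch"

    # Text containing non-ASCII characters; the library records the charset.
    msg.set_content("Grüße from the Web")

    # Placeholder bytes standing in for a real GIF image.
    gif_bytes = b"GIF89a" + bytes(10)
    msg.add_attachment(gif_bytes, maintype="image", subtype="gif",
                       filename="logo.gif")
    return msg


if __name__ == "__main__":
    message = build_message()
    print(message["MIME-Version"])
    print(message.get_content_type())
    for part in message.iter_parts():
        print(part.get_content_type())
```

The MIME-Version and Content-Type headers are extra header fields, so mail software that only understands RFC 822 can still carry the message.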

MIME's influence has gone far beyond email. Mosaic and Gopher use MIME content types as a mechanism to communicate various data types. Associating applications with data types enables end users to launch special-purpose viewers for special-purpose data types. This provides an extensible capability to applications that otherwise would remain closed or difficult to modify.
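The association between data types and viewers can be sketched as a simple lookup table. The example below uses Python's standard mimetypes module to guess a content type from a file name and then consults a hypothetical viewer table. The table VIEWERS, the function viewer_for, and the viewer program names are assumptions for illustration, not the configuration of any particular browser.

```python
import mimetypes

# Hypothetical mapping from MIME content type to an external viewer program.
VIEWERS = {
    "image/gif": "xv",
    "application/postscript": "ghostview",
    "video/mpeg": "mpeg_play",
}


def viewer_for(filename):
    """Return the viewer registered for a file's content type, or None."""
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is None:
        return None  # the type could not be determined from the name
    return VIEWERS.get(content_type)


if __name__ == "__main__":
    for name in ("logo.gif", "paper.ps", "notes.unknownext"):
        print(name, "->", viewer_for(name))
```

Supporting a new data type means adding one entry to the table; the application itself does not change.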

1.10.7 HTTP

HTTP, the HyperText Transfer Protocol, is the native network protocol used by the Web.

The following comes from the Internet Draft of the IETF(35) HTTP specification:

Abstract(36)

HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. It is a generic stateless object-oriented protocol, which may be used for many similar tasks such as name servers, and distributed object-oriented systems, by extending the commands, or "methods", used. A feature of HTTP is the negotiation of data representation, allowing systems to be built independently of the development of new advanced representations.
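The negotiation of data representation mentioned in the abstract can be sketched in a few lines. In the example below, a client states in an Accept header which media types it can handle, and the server picks the best representation it has on offer. This is a simplified illustration and not the algorithm from the draft: the function names build_request, parse_accept, matches, and negotiate are assumptions, and the matching rules are reduced to exact types, type/* wildcards, and q values.

```python
def build_request(method, path, accept):
    """Format a minimal request: a method, a path, and an Accept header."""
    return f"{method} {path} HTTP/1.0\r\nAccept: {accept}\r\n\r\n"


def parse_accept(header):
    """Return (media_type, quality) pairs from an Accept header value."""
    prefs = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # a type listed without q is fully acceptable
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs.append((media_type, quality))
    return prefs


def matches(pattern, media_type):
    """True if an Accept pattern such as image/* covers the media type."""
    if pattern == "*/*":
        return True
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return p_type == m_type and p_sub in ("*", m_sub)


def negotiate(accept_header, available):
    """Pick the available media type the client rates highest, or None."""
    prefs = parse_accept(accept_header)
    best, best_quality = None, 0.0
    for media_type in available:
        quality = max(
            (q for pattern, q in prefs if matches(pattern, media_type)),
            default=0.0,
        )
        if quality > best_quality:
            best, best_quality = media_type, quality
    return best


if __name__ == "__main__":
    accept = "text/html;q=0.5, application/postscript"
    print(build_request("GET", "/docs/intro.html", accept))
    print(negotiate(accept, ["text/html", "application/postscript"]))
```

Because each request carries everything the server needs, the server keeps no memory of the client between requests, which is what the abstract means by a stateless protocol.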

One recent development is S-HTTP, the CommerceNet Secure HTTP proposal. According to a draft:(37)

Secure HTTP has been designed to enable incorporation of various cryptographic message format standards into Web clients and servers, including, but not limited to, PKCS-7, PEM, and PGP. S-HTTP supports interoperation among a variety of implementations, and is backward compatible with HTTP. S-HTTP aware clients can talk to S-HTTP oblivious servers and vice-versa, although such transactions obviously would not use S-HTTP security features.

S-HTTP does not require client-side public key certificates (or public keys), supporting a symmetric session key operation mode. This is significant because it means that spontaneous private transactions can occur without requiring individual users to have an established public key. While S-HTTP will be able to take advantage of ubiquitous certification infrastructures, its deployment does not require it.

S-HTTP supports end-to-end secure transactions, in contrast with the existing de-facto HTTP authorization mechanisms which require the client to attempt access and be denied before the security mechanism is employed. Clients may be "primed" to initiate a secure transaction (typically using information supplied in an HTML anchor); this may be used to support encryption of fill-out forms, for example. With S-HTTP, no sensitive data need ever be sent over the network in the clear.








© Prentice-Hall, Inc.
A Simon & Schuster Company
Upper Saddle River, New Jersey 07458
