5 . 4 HTML
HTML, the proverbial coin of the realm for the Web, started out life (at CERN) as a convenient way of creating hypertext documents viewable with World Wide Web browsers. Tim Berners-Lee, father of the Web, at CERN, knew nothing about SGML; however, some other people did(18) and eventually made a case for paying more attention to "proper" SGML. Primarily, this involved the creation and distribution of an SGML DTD for HTML. This means you can take an HTML document, run it through an SGML parser, and determine whether your document is structurally correct. This is important stuff! If you ever hope to use your content in several ways, you must ensure that your documents are well structured.
Much of the Web community, and the vendors responsible for Web browsers, do seem to be willing to keep the SGML flame burning or at least to tolerate it. Vendors who add new features and tags to HTML at least seem willing to give SGML good lip service. Ultimately, however, vendors must be willing to include true SGML parsers as part of their products.
So the question really becomes: if HTML and the Web work, why bother with all this SGML stuff? Mostly, it boils down to two issues. First, a desire to keep your electronic documents functioning and readable in perpetuity. Second, a desire to use and deliver content in a variety of ways.
Electronic documents (digital documents) will never degrade; they will never turn yellow and corrode. However, the software and operating systems necessary to read the documents will become inoperable and render the electronic documents useless. How many of us have had the experience of not being able to read old Apple II diskettes or even older punch cards? The only hope is strict adherence to a well-defined standard. For example, the Federal Aviation Administration (FAA) requires that the schematics for airplanes be archived for at least 20 years, and it mandates that the drawings be kept in the IGES graphics standard.
OK, let's now assume everyone agrees that real HTML that conforms to SGML is the accepted Web language. Now there turns out to be an explosion of HTML versions. Many companies, notably Netscape and Microsoft, introduce new HTML tags to improve the functionality of their browsers. Companies try to distinguish themselves from one another by introducing new tags and browsers that interpret those tags. Netscape is the classic example, introducing the concepts of Frames and inline Plug-ins that allow multi-paned windowed interfaces and multimedia content. It's great for the user but rough on standardization. In a competitive market, this behavior is to be expected; however, documents that use these tags will be truly readable only with one vendor's browser. If that vendor ultimately establishes the de facto standard, then all is well. If not, content will eventually be lost and rendered unreadable.
The only way to ensure uniform adoption of standards is to apply conformance tests to the use of the standard and to browsers. Conformance tests are where the rubber meets the road in the world of standards. It is only by passing a conformance test that a document or browser can truly claim to comply with a standard. This type of branding is also well known in the commercial world for proprietary specifications. A recent example is Microsoft's branding of products as "Windows 95" compliant and allowing only those products to bear the Windows 95 logo. Someday we may see the moniker "HTML Compliant" stamped on products by an authoritative organization; until then, chaos rules.
Using content in multiple ways is done reasonably only when the content itself is "authored" in a highly structured way. This is one of the principal strengths of SGML. Multimedia databases, books for the blind, talking books, printed text, and so on can be created from a single content source if that content is well structured. Remember that using your content in many ways makes your project more profitable and efficient.
Of course, that does not mean that SGML is the only way to structure documents. However, it is a formal, internationally accepted standard. It functions as well as any other structuring specification, so why not use it?
5 . 4 . 1 Steve Tibbett's HTML Cheat Sheet
The collection of HTML flavors is growing. Hopefully, this trend will stop, and all vendors will agree on a "standard" HTML. But for now that's life in the HTML fast lane. What follows is the "HTML Cheat Sheet," created by Steve Tibbett and reproduced here with his kind permission:(19)
HTML Cheat Sheet This page is a quick reference to all the HTML tags that are supported in the most popular World Wide Web browsers. This page requires that your browser support tables, or it will look terrible. I will try to indicate which browsers support which tags, where possible. Please send me comments on missing tags, incorrect documentation, etc... This page created by Steve Tibbett.
The official keeper of the HTML specification is the World Wide Web Organization (W3O). The W3O is an organization jointly sponsored by INRIA (Institut National de Recherche en Informatique et en Automatique, the French National Institute for Research in Computer Science and Control) and MIT. It was founded by Tim Berners-Lee, creator of the Web, when he moved from CERN (European Laboratory for Particle Physics) to MIT. The W3O is shepherding a number of specifications through the standards process.(20) In collaboration with the W3O, an HTML Working Group of the IETF was created in or around May 1994. The W3O is coordinating testing efforts of HTML.
5 . 4 . 2 Link Validation
An SGML parser can check that an HTML document is syntactically valid. It can't check whether the links in the document actually point to valid places, but there are tools to help accomplish this. For example, one tool called "linkcheck", a Perl script, is available at ftp://ftp.math.psu.edu/pub/sibley.
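To make the idea concrete, here is a minimal sketch of what a link checker does, written in Python rather than Perl. It is not the "linkcheck" script mentioned above. The names LinkCollector and find_broken_links are hypothetical, and the actual reachability test is left to a caller-supplied function (is_reachable, also an assumption) because a real tool would have to go out on the network for each URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the HREF value of every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(html_text, base_url, is_reachable):
    """Return the absolute URLs for which is_reachable(url) is false.

    is_reachable is a hypothetical, caller-supplied function; a real
    checker would issue a network request for each URL.
    """
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    # Resolve relative links against the page's own location.
    absolute = [urljoin(base_url, link) for link in collector.links]
    return [url for url in absolute if not is_reachable(url)]
```

Keeping the network test separate from the link extraction means the same extraction code can be run against a whole site, with the slow part swapped in or out as needed.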
Another useful Web management helper is a tool that aids in the relocation of Web pages. If you have a Web site or page that's popular, it becomes a problem when you have to move the page (for whatever reason). It would be nice to be able to tell other sites that refer to your pages about the new location. The reference log file keeps track of where your visitors are coming from. (See Section 1 . 3 Web Maintenance in Chapter 1 World Wide Web for more information about these types of tools.)
Some of the new Web site management products, like Interleaf's Cyberleaf and Adobe's SiteMill, help with this arduous task. Link maintenance is nasty, and sites will unquestionably degrade over time. It is an important issue that must be addressed if you hope to create a Web site that remains current.
5 . 4 . 3 A Gentle Introduction to HTML Syntax
Let's take a brief, very brief look at HTML itself. Rather than going through HTML in an overly simplistic way, let's examine the syntax and principles of HTML. Go to any book store or read the many on-line information resources for the details on HTML.(21) Look here for some syntactic and structural principles.
Keep in mind that HTML is an application of SGML. Because of this, the syntactic conventions are all derived from SGML. For example, the unbelievably baroque syntax for a comment <!-- stuff to be commented out --> derives from the nature of SGML. According to the SGML standard, a comment is defined as:
comment declaration =
    mdo, (markup declaration open)
    ( comment, ( s | comment )* )?,
    mdc (markup declaration close)
comment =
    com, (comment delimiter)
    SGML character*,
    com (comment delimiter)
These "production rules" are used by people who build parsers, the programs that interpret a language.
The strange look of SGML comments derives from the generality possible with SGML. One can redefine almost everything. The trick with comments is that you want to be sure that the parser does not interpret anything in between the start and end comment delimiters. Keep in mind that the HTML document is "parsed"; it is interpreted by a program, the browser. This is very much akin to the batch-language-oriented document processors (see Section 4 . 1 . 2 Language Characteristics in Chapter 4 Form and Function of Document Processors). In effect, the HTML document itself is a program that drives the HTML browser, your Web browser.
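The point about comments can be seen with any parser. The sketch below uses the HTML parser that ships with Python (html.parser), which is not a full SGML parser, so treat it as an illustration only. The class and function names (CommentSplitter, split_comments) are made up for this example. Markup that sits inside a comment is handed back as plain comment text and never reported as a tag.

```python
from html.parser import HTMLParser


class CommentSplitter(HTMLParser):
    """Records comments and start tags separately as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.tags = []

    def handle_comment(self, data):
        # data is everything between the comment delimiters, uninterpreted.
        self.comments.append(data)

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def split_comments(html_text):
    """Return (comments, start_tags) found in html_text."""
    parser = CommentSplitter()
    parser.feed(html_text)
    parser.close()
    return parser.comments, parser.tags
```

Feeding it a paragraph followed by a comment that contains a bold tag yields one comment and one tag (the paragraph); the bold tag inside the comment is ignored as markup.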
Markup tags generally have a start and an end. In between the start and end tags is the content.
Start tags generally consist of a tag name surrounded by angle brackets, like <THIS>, and end tags have the same tag name preceded by a slash and also surrounded by angle brackets, like </THIS>.
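A parser can use this start/end pairing to check that tags are properly nested. The following is a rough sketch in Python, not a substitute for validating against the HTML DTD with a real SGML parser. The names NestingChecker and nesting_problems are hypothetical, and the short list of elements without end tags is an assumption made for the example.

```python
from html.parser import HTMLParser

# Elements assumed to have no end tag in this sketch (not a complete list).
EMPTY_ELEMENTS = {"img", "br", "hr"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and notes end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in EMPTY_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)


def nesting_problems(markup):
    """Return a list of nesting problems; an empty list means none were found."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.problems + ["unclosed <%s>" % t for t in checker.open_tags]
```

Real HTML allows some end tags to be omitted, which the DTD knows about and this sketch does not, so it will complain about some perfectly legal documents.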
Many HTML tags require parameters. These parameters are sometimes interpreted by the browser and sometimes by the server. Let's dissect the tag for a link:
<A HREF="http://www.ability.net">Access Ability</A>
The <A> is the start of an Anchor tag.
The HREF or hypertext reference is an attribute of the Anchor tag.
The value of the HREF is either a URL or a file accessible from the point of view of the server.
The text "Access Ability" is what the browser should display to the user as a link.
Finally, the </A> end tag, like most end tags, consists of a forward slash and the tag name "A" inside angle brackets.
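The same dissection can be done by program. This sketch uses Python's standard html.parser module to break the link into the pieces just described: the start tag with its attribute, the content, and the end tag. The names AnchorDissector and dissect are invented for the example, and note that this particular parser reports tag and attribute names in lowercase.

```python
from html.parser import HTMLParser


class AnchorDissector(HTMLParser):
    """Records each piece of the markup in the order the parser meets it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def dissect(markup):
    """Return the list of start-tag, content, and end-tag events in markup."""
    parser = AnchorDissector()
    parser.feed(markup)
    parser.close()
    return parser.events


for event in dissect('<A HREF="http://www.ability.net">Access Ability</A>'):
    print(event)
```

Run on the link above, it reports three events: the start of an "a" element with an "href" attribute, the text "Access Ability", and the end of the "a" element.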
In general, the syntax of much HTML markup, like SGML, is as follows:
<TagName Attrib1="Value1" Attrib2="Value2" Attrib3="Value3"> content text </TagName>
where the existence of attributes is optional, and the number of attributes is variable.
Sometimes the values for an attribute are fixed, from a list; in these cases, the value does not appear within quotes. For example, the IMG tag, used to define where and how to place an image in a page, has an ALIGN attribute whose values are BOTTOM, MIDDLE and TOP.
<IMG SRC="filename.gif" ALIGN=BOTTOM>
(For the geeks among us, these are the elements of an enumerated list.)
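An enumerated list is easy to check by program. Here is a small sketch in Python that flags an ALIGN value outside the three named above. The table ALLOWED_VALUES and the names EnumChecker and check_enumerated are hypothetical; in practice the allowed values come from the DTD, and vendors have extended the list, so the table below is only as complete as this section's example.

```python
from html.parser import HTMLParser

# Allowed values as given in the text; a real DTD lists them per attribute.
ALLOWED_VALUES = {("img", "align"): {"BOTTOM", "MIDDLE", "TOP"}}


class EnumChecker(HTMLParser):
    """Flags attribute values that are not in the enumerated list."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            allowed = ALLOWED_VALUES.get((tag, name))
            # Compare case-insensitively; attributes without a list are skipped.
            if allowed is not None and (value or "").upper() not in allowed:
                self.errors.append((tag, name, value))


def check_enumerated(markup):
    """Return (tag, attribute, value) triples for values not in the list."""
    checker = EnumChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

The IMG example above passes the check, while a made-up value such as ALIGN=SIDEWAYS is reported.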
© Prentice-Hall, Inc. A Simon & Schuster Company, Upper Saddle River, New Jersey 07458