5 . 4 HTML

HTML, the proverbial coin of the realm for the Web, started out life (at CERN) as a convenient way of creating hypertext documents viewable with World Wide Web browsers. Tim Berners-Lee, father of the Web, at CERN, knew nothing about SGML; however, some other people did(18) and eventually made a case for paying more attention to "proper" SGML. Primarily, this involved the creation and distribution of an SGML DTD for HTML. This means you can take an HTML document, run it through an SGML parser, and determine whether your document is structurally correct. This is important stuff! If you ever hope to use your content in several ways, you must ensure that your documents are well structured.

Much of the Web community and the vendors responsible for Web browsers do seem to be willing to keep the SGML flame burning, or at least to tolerate it. Vendors who add new features and tags to HTML at least seem willing to give SGML good lip service. Ultimately, however, vendors must be willing to include true SGML parsers as part of their products.

So the question really becomes: if HTML and the Web work, why bother with all this SGML stuff? Mostly, it boils down to two issues: first, a desire to keep your electronic documents functioning and readable in perpetuity; second, a desire to use and deliver content in a variety of ways.

Electronic documents (digital documents) will never degrade; they will never turn yellow and corrode. However, the software and operating systems necessary to read the documents will become inoperable and render the electronic documents useless. How many of us have had the experience of not being able to read old Apple II diskettes or even older punch cards? The only hope is strict adherence to a well-defined standard. The Federal Aviation Administration (FAA) requires that the schematics for airplanes be archived for at least 20 years. It mandates that the drawings be kept in the IGES graphics standard.

OK, let's now assume everyone agrees that real HTML that conforms to SGML is the accepted Web language. Now there turns out to be an explosion of HTML versions. Many companies, notably Netscape and Microsoft, introduce new HTML tags to improve the functionality of their browsers. Companies try to distinguish themselves from one another by introducing new tags and browsers that interpret those tags. Netscape is the classic example, introducing the concepts of Frames and inline Plug-ins that allow multi-paned windowed interfaces and multimedia content. It's great for the user but rough on standardization. In a competitive market, this behavior is to be expected; however, documents that use these tags will be truly readable only with one vendor's browser. If that vendor ultimately establishes the de facto standard, then all is well. If not, content will eventually be lost and rendered unreadable.

The only way to ensure uniform adoption of standards is to apply conformance tests to the use of the standard and browsers. Conformance tests are where the rubber meets the road in the world of standards. It is only by passing a conformance test that a document or browser can truly claim to comply with a standard. This type of branding is also well known in the commercial world for proprietary specifications. A recent example is Microsoft's branding of products as "Windows 95" compliant and allowing only those products to bear the Windows 95 logo. Someday we may see the moniker "HTML Compliant" stamped on products by an authoritative organization; until then, chaos rules.

Using content in multiple ways is done reasonably only when the content itself is "authored" in a highly structured way. This is one of the principal strengths of SGML. Multimedia databases, books for the blind, talking books, printed text, and so on can be created from a single content source if that content is well structured. Remember that using your content in many ways makes your project more profitable and efficient.

Of course, that does not mean that SGML is the only way to structure documents. However, it is a formal, internationally accepted standard. It functions as well as any other structuring specification, so why not use it?
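To make the single-source idea a little more concrete, here is a minimal Python sketch. It is only an illustration: the tiny document model (a list of role and text pairs), the role names, and both rendering functions are assumptions invented for this example, not part of SGML or of any particular tool.

```python
import html

# A hypothetical structured document: each item is a (role, text) pair.
DOCUMENT = [
    ("title", "Conformance Testing"),
    ("para", "Conformance tests are where the rubber meets the road."),
    ("emphasis", "Structure first, presentation second."),
]

# Assumed mapping from structural roles to HTML tag names.
HTML_TAGS = {"title": "h1", "para": "p", "emphasis": "em"}


def to_html(doc):
    """Render the structured content as HTML markup."""
    lines = []
    for role, text in doc:
        tag = HTML_TAGS[role]
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def to_plain_text(doc):
    """Render the same content for a plain-text (or spoken) delivery."""
    lines = []
    for role, text in doc:
        if role == "title":
            lines.append(text.upper())
            lines.append("=" * len(text))
        elif role == "emphasis":
            lines.append(f"*{text}*")
        else:
            lines.append(text)
    return "\n".join(lines)


html_version = to_html(DOCUMENT)
text_version = to_plain_text(DOCUMENT)
```

Because the content records what each piece is rather than how it should look, a third delivery format only needs a third rendering function; the content itself stays untouched.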

5 . 4 . 1 Steve Tibbett's HTML Cheat Sheet

The collection of HTML flavors is growing. Hopefully, this trend will stop, and all vendors will agree on a "standard" HTML. But for now that's life in the HTML fast lane. What follows is the "HTML Cheat Sheet," created by Steve Tibbett and reproduced here with his kind permission:(19)
HTML Cheat Sheet
This page is a quick reference to all the HTML tags that are supported in the most popular World Wide Web browsers. This page requires that your browser support tables, or it will look terrible. I will try to indicate which browsers support which tags, where possible. Please send me comments on missing tags, incorrect documentation, etc... This page created by Steve Tibbett.

Tag | Description | Example | Output | Level

<html> | Surrounds the entire HTML document. Browsers don't always require this. | <html> ... </html> | None | 1
<!--> | Inserts a comment into an HTML document. Not displayed. | <!-- Steve Was Here --> | | 1

Header Elements
<head> | Surrounds document header section. | <head> ... </head> | None | 1
<title> | Specifies the document title. Typically displayed in the browser window title bar. | <title> ... </title> | None | 1
<isindex> | Specifies that the current document is a searchable index. The browser will use a mechanism of its choice to let the user start a search. | <isindex> | None | 1
<base> | Specifies the URL of the current document, for relative links. | <base href="basename"> | None | 1
<body> | Contains the body of the page. One per page. | <body> ... </body> | None | 1

Text Elements
<a> | Begins text anchor or hypertext link. | <a href="cheat.html"> Cheat </a> | Cheat | 1
<p> | Begins a new paragraph. Contains a paragraph in HTML 3.0 (i.e., you should include </p>). | One<p>Two | One (paragraph break) Two | 1
<center> | Centers text horizontally. This is a Netscape tag; see <p align=> | <center>Test</center> | Test (centered) | NS
<br> | Inserts a line break. This may or may not be less space than inserted by the <p> tag. | One<br>Two | One (line break) Two | 1
<hr> | Inserts a horizontal line across the browser window. | One<hr>Two | One (horizontal line) Two | 1
<img> | Inserts a graphic image or the alternate text if the browser can't show the graphic. | <img src=badurl alt="Text"> | Text | 1

Logical Text Styles
<h1> | Header. Sizes range from 1 through 6. Use for headers, not just for big text. | <h1>Big</h1> <h6>Small</h6> | Big (level 1 header), Small (level 6 header) | 1
<blockquote> | Block quote. Quoted text from some source. Usually indented. | <blockquote> Now is the time. </blockquote> | Now is the time | 1
<em> | Emphasize text. Most browsers use italics. | <em>Hello</em> | Hello | 1
<strong> | Strong text emphasis. Typically boldface. | <strong>Hello</strong> | Hello | 1
<code> | Code sample; uses a monospaced font. | <code>Hello</code> | Hello | 1
<kbd> | Keyboard key; indicates that a user should press a specific key. | <kbd>Hello</kbd> | Hello | 1
<samp> | Sample program output. | <samp>Hello</samp> | Hello | 1
<var> | Program variable. | <var>Hello</var> | Hello | 1
<dfn> | Definition. | <dfn>Hello</dfn> | Hello | 1
<cite> | Citation. | <cite>Hello</cite> | Hello | 1
<address> | Address; typically a mailing address. | <address>Hello</address> | Hello | 1

Physical Text Styles
<b> | Bold face text. | <b>Hello</b> | Hello | 1
<i> | Italicize the text. | <i>Hello</i> | Hello | 1
<u> | Underline the text. | <u>Hello</u> | Hello | 1
<big> | Makes text big, relative to the current font. | <big>Hello</big> | Hello | 3
<small> | Makes text small, relative to the current font. | <small>Hello</small> | Hello | 3
<sup> | Displays superscript (small, raised) text. | <sup>Hello</sup> | Hello | 3
<sub> | Displays subscript (small, lowered) text. | <sub>Hello</sub> | Hello | 3
<tt> | Use a typewriter-style monospaced font, typically Courier if available. | <tt>Hello</tt> | Hello | 1
<blink> | Makes text flash. Hated by all. See URL about:mozilla if using Netscape. | <blink>Yikes!</blink> | Yikes! | NS

Definition Lists
<dl> | Begin a definition list. A definition list is a list of header/body pairs. The header is left-aligned, the body text is indented and word wrapped. | <dl> <dt>Header <dd>Body <dt>Header <dd>Body </dl> | None | 1
<dt> | Definition term. Left-aligned text; doesn't need to be terminated. | See <dl> | None | 1
<dd> | Definition body. Indented text displayed below the definition term. Doesn't need to be terminated. | See <dl> | None | 1

Other Lists
<ul> | Begin an unordered list. An unordered list is just a list of items with bullets. | <ul> <li>First <li>Second <li>Third </ul> | None | 1
<ol> | Begin an ordered list. An ordered list is a list of items, with a counter of some sort. | <ol> <li>First <li>Second <li>Third </ol> | None | 1
<menu> | Begins an "interactive menu". Most browsers display this the same as an unordered list. | <menu> <li>First <li>Second <li>Third </menu> | None | 1
<dir> | Begins a "directory". Most browsers display this the same as an unordered list. | <dir> <li>First <li>Second <li>Third </dir> | None | 1
<li> | List item. This is an item in an ordered or unordered list. Doesn't need to be terminated. | See <ul> or <ol> | None | 1

Forms
<form> | This tag contains a form. | <form [action=URL] [method=(post|get)]> ... </form> | None | 2
<input> | This tag marks a text box, password box, checkbox, radio button, submit or reset button on a form. The type field can be any of these. | <input name="name" type=text value="default" size=32 maxlength=64> ... </input> | None | 2
<textarea> | This tag marks a rectangular text input area on the form. | <textarea name="name" [rows=1] [cols=1]> Default Text </textarea> | None | 2
<select> | Lets the user select an item from a list. The list items follow this tag prefaced by <option> tags. | <select name="name" size=2 multiple> <option>Cheese <option>Beans </select> | None | 2
<option> | This tag is an option on a <select> menu. | See <select>. | None | 2

Tables
<table> | This tag contains a table. | <table> ... </table> | None | 3
<tr> | Table row. Each row of a table is contained in this tag. | <tr> ... </tr> | None | 3
<td> | Table data. The data in this tag will be contained in one cell on a table. | <td>Data</td> | None | 3
<th> | Table header. Generally like the <td> tag but centers the text. | <th>Header</th> | None | 3

Client Side Image Maps
<map> | Client side image map. | <map name="map"> <area shape=rect coords="0,0,64,64" href="_URL_"> </map> <img usemap="map" src="some.bmp"> | None | 3
<area> | Client side image map area. | See <map> | None | 3

Frames
<frameset> | This tag replaces the <body> tag for pages using frames. | <frameset rows=*,*> <frame src=this.html> <frame src=that.html> </frameset> | None | NS
<frame> | Specifies the source for one of the cells in a frame. | <frame src=this.html> | None | NS
<noframes> | Browsers with frames hide this text; others show it. Used to tell users to get a better browser. | <noframes> Ha ha ha </noframes> | None | NS

Microsoft-Specific Tags
<marquee> | This tag creates an animated piece of text sliding across your browser window. | <marquee>Hello</marquee> | None | MS
<bgsound> | Loads up and plays a sound when the user enters the page. | <bgsound src="start.wav"> | None | MS

Miscellaneous Tags
<embed> | Embeds foreign content in an HTML document. | <embed src=cmx.cmx> | Only visible if you have the Corel CMX Plugin for Netscape installed. | NS

The official keeper of the HTML specification is the World Wide Web Organization (W3O). The W3O is an organization jointly sponsored by INRIA (Institut National de Recherche en Informatique et en Automatique, the French National Institute for Research in Computer Science and Control) and MIT. It was founded by Tim Berners-Lee, creator of the Web, when he moved from CERN (European Laboratory for Particle Physics) to MIT. The W3O is shepherding a number of specifications through the standards process.(20) In collaboration with the W3O, an HTML Working Group of the IETF was created in or around May 1994. The W3O is coordinating testing efforts of HTML.

5 . 4 . 2 Link Validation

An SGML parser can check that an HTML document is syntactically valid. It can't check whether the links in the document actually point to valid places, but there are tools to help accomplish this. For example, one tool called "linkcheck," a Perl script, is available at ftp://ftp.math.psu.edu/pub/sibley.
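The general idea behind such a tool can be sketched in a few lines of Python. This is not how "linkcheck" itself works (its internals are not described here); it is a minimal illustration that uses the standard library's html.parser module to pull out every HREF and then asks a caller-supplied function whether each target exists. The names LinkCollector, find_broken_links, and KNOWN_PAGES are hypothetical, and the set of known pages merely stands in for a real network or file system check.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(html_text, link_exists):
    """Return the links for which link_exists(url) is false."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [url for url in collector.links if not link_exists(url)]


# Stand-in for a real existence check (an assumption for this sketch).
KNOWN_PAGES = {"cheat.html", "index.html"}

page = '<a href="cheat.html">Cheat</a> <A HREF="gone.html">Gone</A>'
broken = find_broken_links(page, lambda url: url in KNOWN_PAGES)
```

Passing the existence check in as a function keeps the parsing separate from the slow and failure-prone part, which is actually contacting each server.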

Another useful Web management helper is a tool that aids in the relocation of Web pages. If you have a Web site or page that's popular, it becomes a problem when you have to move the page (for whatever reason). It would be nice to be able to tell other sites that refer to your pages about the new location. The reference log file keeps track of where your visitors are coming from. (See Section 1 . 3 Web Maintenance in Chapter 1 World Wide Web for more information about these types of tools.)
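As a rough illustration of how such a log could be put to work, the Python sketch below counts which referring pages point at a page that has moved. The log format is an assumption made for this example (one entry per line, written as the referring URL, an arrow, and the local path); real servers differ, so the parsing would need to be adapted. The function name, sample URLs, and paths are all hypothetical.

```python
from collections import Counter


def referrers_for(log_lines, moved_path):
    """Count the referring URLs whose entries point at moved_path."""
    counts = Counter()
    for line in log_lines:
        # Assumed entry format: "<referring URL> -> <local path>"
        referrer, arrow, target = line.strip().partition(" -> ")
        if arrow and target == moved_path:
            counts[referrer] += 1
    return counts


SAMPLE_LOG = [
    "http://example.org/links.html -> /old/page.html",
    "http://example.com/index.html -> /other.html",
    "http://example.org/links.html -> /old/page.html",
    "http://example.net/list.html -> /old/page.html",
]

to_notify = referrers_for(SAMPLE_LOG, "/old/page.html")
```

The resulting counts suggest which site maintainers are worth contacting first about the new location.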

Some of the new Web site management products, like Interleaf's Cyberleaf and Adobe's SiteMill, help with this arduous task. Link maintenance is nasty, and sites will unquestionably degrade over time. It is an important issue that must be addressed if you hope to create a Web site that remains current.

5 . 4 . 3 A Gentle Introduction to HTML Syntax

Let's take a brief, very brief, look at HTML itself. Rather than going through HTML in an overly simplistic way, let's examine the syntax and principles of HTML. Go to any bookstore or read the many on-line information resources for the details on HTML.(21) Look here for some syntactic and structural principles.

Keep in mind that HTML is an application of SGML. Because of this, the syntactic conventions are all derived from SGML. For example, the unbelievably baroque syntax for a comment <!-- stuff to be commented out --> derives from the nature of SGML. According to the SGML standard, a comment is defined as:

comment declaration =
    mdo,                    (markup declaration open)
    ( comment,
      ( s | comment )* )?,
    mdc                     (markup declaration close)

comment =
    com,                    (comment delimiter)
    SGML character*,
    com                     (comment delimiter)

These "production rules" are used by people who build parsers, the programs that interpret a language.
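To see how a parser builder might turn these two productions into code, here is a small Python sketch of a recognizer. It assumes the delimiters of the SGML reference concrete syntax, where mdo is "<!", mdc is ">", and com is "--", and it treats the separator "s" as a space, tab, or line ending. The function names are hypothetical, and the sketch only answers yes or no; a real parser does far more.

```python
MDO, MDC, COM = "<!", ">", "--"   # assumed reference concrete syntax delimiters
SEPARATORS = " \t\r\n"             # the "s" in the production


def parse_comment(text, pos):
    """comment = com, SGML character*, com

    Return the index just past the comment starting at pos, or -1.
    """
    if not text.startswith(COM, pos):
        return -1
    end = text.find(COM, pos + len(COM))
    return -1 if end == -1 else end + len(COM)


def is_comment_declaration(text):
    """comment declaration = mdo, (comment, (s | comment)*)?, mdc"""
    if not text.startswith(MDO):
        return False
    pos = parse_comment(text, len(MDO))
    if pos == -1:
        # The optional group is absent, so only mdc may follow mdo.
        return text[len(MDO):] == MDC
    while pos < len(text):
        if text[pos] in SEPARATORS:
            pos += 1
            continue
        following = parse_comment(text, pos)
        if following == -1:
            break
        pos = following
    return text[pos:] == MDC


ok = is_comment_declaration("<!-- stuff to be commented out -->")
```

One consequence of the grammar shows up immediately: text such as `<!-- a -- b -->` is rejected, because after the first comment closes at the second `--`, only separators or another comment may appear before the closing `>`.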

The strange look of SGML comments derives from the generality possible with SGML. One can redefine almost everything. The trick with comments is that you want to be sure that the parser does not interpret anything in between the start and end comment delimiters. Keep in mind that the HTML document is "parsed"; it is interpreted by a program, the browser. This is very much akin to the batch-language-oriented document processors (see Section 4 . 1 . 2 Language Characteristics in Chapter 4 Form and Function of Document Processors). In effect, the HTML document itself is a program that drives the HTML browser, your Web browser.

Markup tags generally have a start and an end. In between the start and end tags is the content.

Start tags generally consist of a tag name surrounded by angle brackets, like <THIS>, and end tags have the same tag name preceded by a slash and also surrounded by angle brackets, like </THIS>.
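The pairing of start and end tags is easy to check mechanically. The Python sketch below uses the standard library's html.parser module and a simple stack; it is a deliberately strict illustration, not a substitute for validating against the DTD. In particular, HTML allows some end tags to be omitted (after <li> or <dd>, for example), and this sketch would report those as unclosed. The list of empty elements is partial, and the names BalanceChecker and check_balance are hypothetical.

```python
from html.parser import HTMLParser

# Elements that take no end tag (a partial, illustrative list).
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "base", "isindex"}


class BalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # A tag written as <br/> opens and closes itself; nothing to track.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_balance(html_text):
    """Return a list of problems; an empty list means the tags pair up."""
    checker = BalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.open_tags]


problems = check_balance("<b><i>Hello</b></i>")
```

Tag names are reported in lowercase by the parser, so <THIS> and </this> are treated as a matching pair.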

Many HTML tags require parameters. These parameters are sometimes interpreted by the browser and sometimes by the server. Let's dissect the tag for a link:

<A HREF="http://www.ability.net">Access Ability</A>

The <A> is the start of an Anchor tag.

The HREF or hypertext reference is an attribute of the Anchor tag.

The value of the HREF is either a URL or a file accessible from the point of view of the server.

The text "Access Ability" is what the browser should display to the user as a link.

Finally, the end tag </A>, like most end tags, consists of a forward slash followed by the tag name "A", all surrounded by angle brackets.
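The same dissection can be carried out by a program. The following Python sketch, offered only as an illustration, feeds the link markup to the standard library's html.parser module and records the tag name, its attributes, and the content between the start and end tags. The class and function names are hypothetical. Note that this parser reports tag and attribute names in lowercase.

```python
from html.parser import HTMLParser


class AnchorDissector(HTMLParser):
    """Record the tag name, attributes, and content of the first anchor seen."""

    def __init__(self):
        super().__init__()
        self.parts = {}
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self.parts:
            self._in_anchor = True
            self.parts = {"tag": tag, "attributes": dict(attrs), "content": ""}

    def handle_data(self, data):
        if self._in_anchor:
            self.parts["content"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False


def dissect(link_markup):
    """Return a dictionary describing the anchor in link_markup."""
    dissector = AnchorDissector()
    dissector.feed(link_markup)
    dissector.close()
    return dissector.parts


parts = dissect('<A HREF="http://www.ability.net">Access Ability</A>')
```

Here `parts["attributes"]["href"]` holds the URL and `parts["content"]` holds the text the browser displays as the link.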

In general, the syntax of much HTML markup, like SGML, is as follows:

<TagName Attrib1="Value1" Attrib2="Value2" Attrib3="Value3"> content text </TagName>

where the existence of attributes is optional, and the number of attributes is variable.
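Going the other way, from the pieces to the markup, follows the same pattern. The short Python sketch below assembles an element from a tag name, some content, and any number of attributes. The function name render_element is hypothetical, and the sketch always quotes attribute values and escapes special characters, which is a simplifying choice for this example rather than something HTML requires in every case.

```python
from html import escape


def render_element(tag_name, content, **attributes):
    """Build <TagName Attrib="Value" ...>content</TagName> as a string."""
    attribute_text = "".join(
        f' {name}="{escape(str(value), quote=True)}"'
        for name, value in attributes.items()
    )
    return f"<{tag_name}{attribute_text}>{escape(content)}</{tag_name}>"


link = render_element("A", "Access Ability", HREF="http://www.ability.net")
plain = render_element("P", "Just text")
```

With no attributes supplied, the start tag is simply the tag name in angle brackets, which matches the point that attributes are optional.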

Sometimes the values for an attribute are fixed, from a list; in these cases, the value does not appear within quotes. For example, the IMG tag, used to define where and how to place an image in a page, accepts the values BOTTOM, MIDDLE, and TOP for its ALIGN attribute.

<IMG SRC="filename.gif" ALIGN=BOTTOM>

(For the geeks among us, these are the elements of an enumerated list.)
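An enumerated attribute is straightforward to enforce in code. The Python sketch below builds an IMG tag and rejects any ALIGN value outside the three listed in the text. The function name img_tag and the choice of BOTTOM as the default are assumptions made for this illustration.

```python
# The permitted ALIGN values named in the text.
ALIGN_VALUES = ("BOTTOM", "MIDDLE", "TOP")


def img_tag(src, align="BOTTOM"):
    """Return an IMG tag, refusing ALIGN values outside the enumerated list."""
    align = align.upper()
    if align not in ALIGN_VALUES:
        allowed = ", ".join(ALIGN_VALUES)
        raise ValueError(f"ALIGN must be one of {allowed}; got {align!r}")
    # Enumerated values are written without quotes; the file name is quoted.
    return f'<IMG SRC="{src}" ALIGN={align}>'


tag = img_tag("filename.gif", "bottom")
```

An SGML parser working from the DTD performs the same kind of check: a value that is not in the declared list makes the document invalid.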








© Prentice-Hall, Inc.
A Simon & Schuster Company
Upper Saddle River, New Jersey 07458
