Chapter 3: Points of View
"Each thing we see hides something else we want to see." - René Magritte
Whether you are preparing a ten-page pamphlet or a 300-page book, the process of creating and producing an electronic document can be viewed in many different ways. Each software tool presents a particular conceptual model of the publishing process. This philosophical point of view greatly influences the functionality and usability of the software. The better you understand the many points of view, the more effective you will be in choosing and using the available software tools.
Some systems are page-oriented. Others focus on the entire document. Some are WYSIWYG (what you see is what you get). Others are language-oriented, and still others are oriented toward on-line display and interaction. Learning and using the publishing tools is easier if you are aware of the philosophy, or point of view, that a system supports.
One way to understand the value of new technologies is to create a metaphor or catchy phrase for a concept. The iconic user interface of the Macintosh is known as a desktop. The use of printing software and hardware on this metaphorical desktop is known as desktop publishing. This usage brings to mind miniature Gutenberg presses right at your fingertips. Many varieties of desktop metaphors have been created: desktop machining, desktop forgery, desktop prepress, desktop broadcasting, and so on.
The newer technologies of electronic publishing also need new metaphors to cover the issues of document processing, electronic distribution, archival storage, and so on. (1)
The multifaceted world of electronic publishing needs a catchy phrase to describe it. The many points of view used to examine electronic publishing are necessary because there is no single satisfying metaphor.
The term electronic publishing means different things to different people. Many of the standards discussed in this book suggest other possibilities such as hypertext, on-line information browsers, and so on. These applications, as well as databases, CD-ROMs, and other electronic repositories, are all part of the domain of electronic publishing. At their core, electronic documents start as text organized into chunks of information, paragraphs, pages, and so on.
In this chapter, we examine many approaches to looking at electronic documents. The views we examine are (1) Visual and Logical Views, (2) The Design Point of View, (3) Communications Views, (4) The Engineered View, (5) The Database View, (6) Specialized Views, and (7) On-line Views. We examine how the creation of electronic documents is influenced by each point of view.
3 . 1 Visual and Logical Views
"It's a small world, but I wouldn't want to paint it." - Steven Wright
Documents have many components: characters, words, paragraphs, chapter headings, sections, and subsections. We can examine each component in two complementary ways, the visual and the logical.
The logical aspect of a component refers to its semantically meaningful part, such as the fact that a collection of characters is a word that can be checked for spelling or that a chapter is divided into sections. The visual aspect of a document component refers to the size, position, and fonts used to form its physical appearance. The visual components of document elements will be discussed further in Section 3.2 The Design Point of View.
In this section, we examine document components of increasing complexity, starting with the character and progressing through to an entire enterprise. Each document component has a visual aspect and logical aspect; some lean more toward one than the other.
Putting these document components on a scale, from the simplest to the most complex, provides us with a useful frame of reference in which to discuss these issues.
You can manipulate each item on this scale using software tools. Of course, some tools cover several items on the scale. The orientation of a particular tool (the point of view it supports) will probably be centered around one particular item. In the following sections, we go through the scale by examining each document component individually.
3 . 1 . 1 Character
The first level of our document scale is the character and its manipulation. Characters, as logical meaningful entities, have values that are represented in the computer according to well-known and established character codes. Character codes are the fundamental representation of text. ASCII is the best-known and most established character encoding.
Normally you don't have to be concerned about the character code used in your particular system. However, when you want to interchange documents with other systems, the character code may become a problem. In particular, you must pay attention to character codes when interchanging documents with systems in countries that use other codes. Many Asian languages (for example, Japanese) require other character codes that can represent hundreds or even thousands of characters. Localization is the process of taking software written for one system and porting it to another system that uses another language and possibly another character code.
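To make the interchange problem concrete, here is a minimal Python sketch showing how the same text is stored under different character codes. The sample strings and the choice of codecs (ASCII, Shift JIS, and UTF-8) are my own assumptions for illustration; they are not tied to any particular publishing system.

```python
# Illustrative sketch: sample text and codec names are assumptions.
def encode_report(text, codecs=("ascii", "shift_jis", "utf-8")):
    """Return {codec: bytes or None} showing how each character code stores text."""
    report = {}
    for codec in codecs:
        try:
            report[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # This character code has no value for at least one character.
            report[codec] = None
    return report

english = encode_report("Text")
japanese = encode_report("日本語")

for name, result in (("English", english), ("Japanese", japanese)):
    for codec, data in result.items():
        print(name, codec, data)
```

The English sample is stored identically by all three codes, while the Japanese sample cannot be represented in ASCII at all and is stored as different byte sequences by the other two. A receiving system that guesses the wrong code will display the wrong characters.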
Also on the logical (as opposed to visual) side of the discussion is the ability to associate attributes or tags with individual characters. Essentially, tags are names you can associate with characters for whatever purpose you like. For example, the FrameMaker publishing system allows the definition of character tags. Each tag defines a particular font family, size, weight, and other properties, which can be applied to any character. These tagged characters may then be manipulated as a group if necessary.
Named attributes or tags such as these provide a convenient mechanism for manipulating the visual appearance of characters throughout a document. You can also use them for semantic purposes. For example, you could associate the name "placeHolder" with particular characters you wish to use temporarily. You can search for the tag "placeHolder" to locate the particular text. You can even print a report listing all occurrences of the "placeHolder" tag and where they occur in the document, creating an automated list of work to be done.
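The "placeHolder" report can be sketched in a few lines of Python. The Run structure below is hypothetical: it is not FrameMaker's internal format, just a simple way to model runs of characters that carry an optional tag name and a page number.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """A run of characters sharing one (hypothetical) character tag."""
    text: str
    tag: str = ""   # empty string means the run is untagged
    page: int = 1

def tag_report(runs, tag):
    """List (page, text) for every run carrying the given tag."""
    return [(run.page, run.text) for run in runs if run.tag == tag]

runs = [
    Run("See ", page=1),
    Run("Figure XX", tag="placeHolder", page=1),
    Run(" for details. Price: ", page=2),
    Run("TBD", tag="placeHolder", page=2),
]

todo = tag_report(runs, "placeHolder")
for page, text in todo:
    print(f"page {page}: {text}")
```

Because the tag is stored with the characters rather than inferred from their appearance, the same search works no matter how the placeholder text happens to be formatted.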
For the visual side of characters, many font manipulation tools are available that could be considered part of font definition software. If you want to change the appearance of the character T, for example, you would use a font definition tool.
There are many more issues concerning the visual aspects of characters and fonts. Please see Section 3 . 2 . 1 Fonts and Typography later in this chapter for a discussion of these issues.
3 . 1 . 2 Words
The act of writing takes place at the word level of the document component scale. Most of the discussion about writing is in Section 3 . 3 Communications Views, later in this chapter. Spelling checkers and grammatical aid systems are some of the electronic publishing tools that help with writing. The growing popularity of computer-assisted writing aids attests to their growing sophistication.
Another manipulation of words is automatic hyphenation. This is a manipulation of the logical or semantically meaningful aspects of words. Often, publishing systems allow the user to modify some variables to control the precise way automatic hyphenation is performed. For example, these could be variables to control the minimum and maximum number of characters before and after the hyphen. In addition, electronic publishing systems that support several languages must also have hyphenation dictionaries appropriate for each language. Hyphenation algorithms differ among publishing systems.(2) The same document in two systems may not appear exactly the same, even if the fonts and page margins are identical, because the hyphens will break the words at different places. Hyphenation is part of the process of formatting and can hinder efforts to interchange documents with perfect fidelity. It's amazing how complicated these little details can be!
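The effect of hyphenation variables can be shown with a small sketch. It assumes a dictionary-based approach with a tiny, made-up hyphenation dictionary; real systems use much larger dictionaries or pattern-based algorithms, and the parameter names min_before and min_after are my own.

```python
# Hypothetical hyphenation dictionary: permitted break points are marked with "-".
HYPHENATION = {
    "publishing": "pub-lish-ing",
    "document": "doc-u-ment",
}

def break_points(word, min_before=2, min_after=2):
    """Return character offsets at which the word may be hyphenated.

    min_before / min_after are the minimum number of characters that must
    appear before and after the hyphen.
    """
    pattern = HYPHENATION.get(word.lower())
    if pattern is None:
        return []  # unknown word: never hyphenate
    points, position = [], 0
    for syllable in pattern.split("-")[:-1]:
        position += len(syllable)
        if position >= min_before and len(word) - position >= min_after:
            points.append(position)
    return points

print(break_points("document"))                 # loose settings
print(break_points("document", min_before=4))   # stricter settings
print(break_points("publishing", min_after=4))
```

Two systems that share the same dictionary but use different minimums will break the same word at different places, which is one reason identical fonts and margins do not guarantee identical pages.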
3 . 1 . 3 Paragraphs with Tags and Styles
Moving up the complexity scale, we now come to the paragraph. One of the most powerful document processing tools is the ability to attach attributes, tags, or styles to paragraphs. I use the term tags to refer to the logical aspect of paragraphs and the term styles to refer to the visual aspect of paragraphs. When writing, we generally treat the content and appearance of paragraphs uniformly. Individual paragraphs have the same margins and typefaces (they should also contain a coherent idea). Many software products treat the paragraph as an entity that can be manipulated as a unit.
When manipulating a paragraph, it is important to distinguish the logical aspects from the visual. The logical use of a paragraph tag might be to identify all chapter headings. The publishing system may support the intent of a document structure and not allow the creation of a chapter heading in the middle of a table. Identification of the logical structure of a document is one of the major features of formal document standards and is discussed in detail in the Document Standards chapter. (See Section 5 . 3 SGML in Chapter 5 Document Standards for a discussion of document structure.)
Another logical use of tag names is the actual name itself. The name "Body text" conveys the meaning that the body copy in a document will be associated with the tag "Body text." It is important to select meaningful tag names. Cryptic, "cutesy" names obscure the intent of the tag or style. Spend the painful time creating good names that will be meaningful to others in your organization.
The development and use of a consistent set of paragraph tags can be of tremendous value. This task should be done at the start of any significant project. Visual consistency can be achieved by using the same tags in the same places. Just as important, changes can be applied to specific tags or styles in one place and then applied to the entire document. The concept of a style sheet is intended specifically to allow changes in one place to migrate to the rest of the document. Changes made to a style sheet can also be applied to other documents, helping to automate and keep consistent all documents of a particular project or organization. Coherent tag names allow the logical aspects of the document to guide the visual appearance. Style sheets are just starting to appear for Web page authoring. (See Section 1 . 5 Authoring in Chapter 1 World Wide Web for more info on Web style sheets.)
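The way a style sheet lets one change migrate through a document can be sketched as follows. The tag names, fonts, and sizes are invented for illustration, and the dictionary layout is only one plausible way to model a style sheet.

```python
# Hypothetical style sheet: logical tag name -> visual style.
style_sheet = {
    "Chapter heading": {"font": "Helvetica", "size": 18, "weight": "bold"},
    "Body text": {"font": "Times", "size": 10, "weight": "normal"},
}

# The document stores only tags and content, never the visual details.
document = [
    ("Chapter heading", "Points of View"),
    ("Body text", "Documents have many components."),
    ("Body text", "Each component has a visual and a logical aspect."),
]

def render(document, style_sheet):
    """Pair each paragraph with a copy of the style its tag currently maps to."""
    return [(text, dict(style_sheet[tag])) for tag, text in document]

before = render(document, style_sheet)

# One change in one place...
style_sheet["Body text"]["size"] = 11

# ...reaches every paragraph that carries the tag.
after = render(document, style_sheet)
```

Applying the same style_sheet to a second document would give it the same appearance, which is how a project or organization keeps its documents consistent.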
3 . 1 . 4 Page
Unlike the other document components, the page is purely visual and has no meaningful logical aspects. Indeed, we could also have this discussion for a "screen" of information. Pages and screens are convenient, well-understood units of information content. Pages are the physical spaces in which textual content appears. Page sizes can be altered and documents can be reprinted in different sizes and formats for on-line browsing and so on, with no effect on the content. Pages do not have any logical aspects other than their very existence. They represent a canvas upon which the content is painted.
From a visual point of view, the page provides a place for a number of items. Headers, footers, body text, and page numbers are some of these items. They are placed on a page, in a consistent position throughout the document. The positioning of these items is primarily a matter of design; but there are also computational factors. Some of the page-specific items, such as the page numbers, running headers, and running footers, can be computed or extracted from the text. The content of these items can be changed, based on the specifics of the page.
Although a page has a specific size that is rarely changed, paying attention to the size is sometimes crucial. Many systems support specific page sizes implicitly. This implicit assumption can cause a nasty problem if you need to interchange documents with an organization that uses a different standard page size. This might happen when a U.S. organization exchanges documents with an organization based in Europe, because the U.S. standard page size (8.5 x 11 inches) is different from the ISO A4 size (210 x 297 mm, or about 8.27 x 11.69 inches) used in Europe. The document will probably not print correctly unless you adjust for page size. On-line documents formatted for a VGA PC screen or a large-screen workstation encounter visual problems more and more as Web browsers dynamically reformat content.
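The page size adjustment is simple arithmetic. The sketch below assumes uniform scaling with no change to margins, which is only one of several ways to adapt a document; the function name fit_scale is my own.

```python
MM_PER_INCH = 25.4

LETTER = (8.5 * MM_PER_INCH, 11 * MM_PER_INCH)  # U.S. letter: 215.9 x 279.4 mm
A4 = (210.0, 297.0)                             # ISO A4 in mm

def fit_scale(source, target):
    """Largest uniform scale at which a source page fits on the target page."""
    return min(target[0] / source[0], target[1] / source[1])

letter_on_a4 = fit_scale(LETTER, A4)   # limited by width: A4 is narrower
a4_on_letter = fit_scale(A4, LETTER)   # limited by height: letter is shorter

print(f"Letter page on A4 paper: scale to {letter_on_a4:.1%}")
print(f"A4 page on letter paper: scale to {a4_on_letter:.1%}")
```

A letter page is wider but shorter than an A4 page, so neither fits on the other at full size: content is clipped at the side in one direction and at the bottom in the other unless the page is scaled or reformatted.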
The layout and overall design of components such as text, graphics, and illustration are best manipulated in a page layout program. The quintessential example of this type of software is Adobe PageMaker (formerly Aldus). One of the keys to PageMaker's success is that this software speaks the language of designers. It presents the user with a simulation of a pasteboard (an underlying grid for creating the proportions and overall structure of the document), a commonly used graphic design tool.
One distinction that must be applied only to the page is handedness: whether the content of a page is to appear on the right- or left-hand side of the printed document. Margins, columns, headers, footers, and page number positions are sometimes shifted on the page, depending on whether they are to appear on a left- or right-hand page. The more powerful electronic publishing systems provide tools to control handedness of particular parts. One example is the ability to force the start of each document (for example, chapters) on a right-hand page.
Text flow is yet another term that really crosses the boundary from a page to a document. Newspaper articles leave pointers to the connecting text, such as "see Bozos column 5, page 22". These pointers tell the reader where the text is continued. The visual shape of this flow is either rectangular or follows the shape of graphic elements. Page layout or page makeup programs such as QuarkXPress and Adobe PageMaker provide tools that allow text flows to travel automatically around graphic elements.
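Forcing chapters onto right-hand pages can be sketched as a small computation. The sketch assumes the usual convention that odd-numbered pages are right-hand pages; the function names and chapter lengths are invented for illustration.

```python
def is_right_hand(page_number):
    """Assumed convention: odd-numbered pages fall on the right."""
    return page_number % 2 == 1

def chapter_start_pages(chapter_lengths, first_page=1):
    """Return the starting page of each chapter, padding with a blank
    left-hand page whenever a chapter would otherwise start on the left."""
    starts, page = [], first_page
    for length in chapter_lengths:
        if not is_right_hand(page):
            page += 1  # leave a blank left-hand page
        starts.append(page)
        page += length
    return starts

# Three chapters of 10, 7, and 4 pages.
print(chapter_start_pages([10, 7, 4]))
```

The same is_right_hand test is what a system would consult to decide whether a page number or a wider margin belongs on the left or the right edge of a given page.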
Frames are another frequently encountered term with a strong relationship to the page. In a sense, a frame is a subdivision of a page. It is an invisible boundary in which content appears, just like a page. Frames, however, are not physical things; they are areas that can be manipulated while using the publishing system. Text can flow automatically from one particular frame to another. Corel Ventura (formerly called Ventura Publisher) and FrameMaker (now from Adobe) both use this concept. The Netscape 2.0 Web browser has an on-line frames capability that allows for more flexible Web page layouts and interactions.
Last, but not least, Interleaf generalizes many of the aspects of a page in a feature known as a microdocument. Microdocuments are "little" documents, inserts embedded in the pages of other documents, that can independently retain stylistic characteristics. All the styles associated with a particular document can be retained intact with microdocuments, but the microdocument can be no larger than a page.
3 . 1 . 5 Document
The document in its entirety is the next stop in our analysis of document components. From a visual point of view, the document is a physical object with a particular design. From a logical point of view, the document is composed of a certain structure. The visual design and construction of documents(3) is a topic beyond the scope of this book. However, electronic publishing systems can play an essential role in the manipulation of the logical aspects of a document.
The logical structure of a document is an important characteristic of the document. We can use that structure as a framework to evaluate document processing tools. Some questions to ask in determining the suitability of a particular publishing system are:
Can the system automatically generate a table of contents?
Can the system generate lists of various elements such as tables and figures?
What kind of graphics can be integrated easily with the text?
How robust are the indexing capabilities, if any?
Is there good bibliographic and cross-referencing support?
Technical publications, in particular, need robust document-oriented tools. The more automated the tools, the better. It is essential that the publishing system provide support for automatic section numbering, running headers and footers, styles or tags, and change control. In addition, support for global changes (changes to many files that are part of a larger document) is a major time saver.
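Automatic section numbering and table of contents generation both follow from the logical structure. Here is a minimal sketch that assumes each heading is already tagged with its nesting level; the function name and the input format are my own.

```python
def number_sections(headings):
    """headings: list of (level, title), where level 1 is the outermost.
    Returns table-of-contents lines with computed section numbers."""
    counters, toc = [], []
    for level, title in headings:
        counters = counters[:level]      # forget counters for deeper levels
        while len(counters) < level:
            counters.append(0)           # open any levels not yet seen
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        toc.append(f"{number} {title}")
    return toc

headings = [
    (1, "Visual and Logical Views"),
    (2, "Character"),
    (2, "Words"),
    (1, "The Design Point of View"),
    (2, "Fonts and Typography"),
]

toc = number_sections(headings)
print("\n".join(toc))
```

Because the numbers are computed rather than typed, inserting or moving a section renumbers everything after it without any hand editing.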
Several publishing systems present the user with the idea of a book(4) as an organizational tool. Books are made up of collections of files. If a change is made to the book, then the change is actually made to all the files that make up the book. If your publishing projects routinely deal with hundreds of files, this type of support will be an important requirement for any publishing system. An on-line equivalent of a book is a Web site with its web of interconnected pages.
As the sheer size of the document grows, we start to see a significant distinction between WYSIWYG (what you see is what you get) and batch, language-oriented systems. Often you don't want to see extensive, repetitive, massive changes. If you are forced into too many hand manipulations, the publishing system may be unwieldy for the particular publishing application. The higher-end publishing systems try to balance WYSIWYG capabilities with the often awkward and complicated commands of a batch-oriented system. (See Section 4 . 1 Types of Document Processors in Chapter 4 Form and Function of Document Processors for a more thorough discussion of WYSIWYG versus batch document processing.)
3 . 1 . 6 Encyclopedia
When we discuss the multivolume or encyclopedic scale of documents, our focus shifts from document manipulation to the concept of a data repository. Manipulation of large quantities of related material is one of the strengths of batch-oriented document processing systems. Offline automated processing is a virtual requirement for this scale of manipulation.
This level, in our document component scale, also represents the highest point at which a collection of documents is part of a coherent whole. Representative examples of documents at this level are the many manuals of an operating system, the volumes of an encyclopedia, and the maintenance manuals for a jet engine. Interleaf is a good example of a system with capabilities at this level. It uses the concept of a cabinet that contains collections of other documents.
Only when a publishing system supports the manipulation of multiple volumes as a unit is the multivolume category qualitatively different from the previous category. The large volume of data and high capacities required for such manipulations are supported only by the high-end publishing systems.
Again, Interleaf is an example of a publishing system that supports different types of style sheets: one that can be applied to individual documents, and another, called the master style sheet, that is used to modify other style sheets. Master style sheets are an important feature when massive and consistent changes are required. The language-oriented document processing systems such as troff and TeX (see Section 4 . 1 . 2 Language Characteristics in Chapter 4 Form and Function of Document Processors) are also effective at working with massive amounts of material. Automated scripts can be created and documents processed without human intervention. In general, however, skilled technical users must create these scripts, so these systems require a different type of staff than the turnkey, but more expensive, systems do.
3 . 1 . 7 Enterprise
An enterprise (no, I'm not talking about Star Trek), the final level in our document component scale, is discussed here because it relates to the topic of text retrieval. When maintaining or creating a library of documents or other large archival collections of documents, the technical issues are primarily ones of access. Finding information quickly and easily is the primary issue.
The most important area in which to address these issues is that of classification. Classification and searching systems are integral parts of library science. A good classification system enables users to locate the information they desire and aids in the management of the documents. After all, if you can't find the information you need, when you need it, you may as well not have it at all. One area where document processing and searching systems intersect is that of full-text searching.
Full-text searching is the ability to search for any word in an entire collection of documents. The searching is usually accomplished through the use of a document browser. The emphasis in full-text searching is on speed at the expense of space. It is not unusual for the indexes used to locate the text to take up as much space as the text itself. The combination of a good document browser and full-text searching really makes the entire field of electronic books a useful, practical commodity, rather than just an interesting toy.
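The trade of space for speed is easiest to see in an inverted index, a common structure for this kind of searching. The sketch below is a deliberately tiny illustration with made-up documents; it is not how any particular retrieval engine is implemented.

```python
import re
from collections import defaultdict

def words_of(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in words_of(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = words_of(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

documents = {
    "manual": "Replace the engine filter.",
    "guide": "The engine manual covers the filter and the pump.",
    "memo": "Meeting minutes",
}

index = build_index(documents)
print(search(index, "engine filter"))
print(search(index, "pump"))
```

Every distinct word in the collection gets an index entry, which is why the index can grow to rival the text in size. In return, a query looks up a few entries instead of rereading every document.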
Full-text retrieval engines are widely used in the creation of systems that manage large quantities of text. These retrieval engines are becoming quite prevalent in the CD-ROM and Web site industries(5) and are a key technology to enable access to a library full of information. The large capacity of CD-ROMs is an ideal complement to the large space requirements of full-text retrieval systems.
Text retrieval is a complex field that is growing in importance as the world gets interconnected ever more tightly with networks.(6) Internet Starting Points used with Web browsers all have one form or another of a text retrieval engine. The possibility of indexing the Web challenges the computer science of text retrieval.
The increased capacity of low-cost storage devices like CD-ROMs is also a major factor in text retrieval, because entire databases can be put on-line right at your very own PC. (For more information on text retrieval, see Section 8 . 5 . 2 Text Retrieval in Chapter 8 Document Management.)
The enterprise document level is the largest in scope of the seven levels. This level covers an organization's entire collection of documents and the tools used to manage them. Some vendors even offer tools that help manage an enterprise's information resources.
Open Text, a company with a long history of text retrieval software, now offers a Web server that can index an internal Web, an "intranet". In fact, internal enterprise Webs are an increasingly popular use of the Web for project management, status reports, meeting scheduling, meeting minutes, and so on.
In "The Web and its Many Uses," an article in Advanced Systems Magazine (May 1995), Chuck Musciano (chuck.musciano@advanced.com) argues for the use of the Web for a variety of organization-wide functions: e-mail archives (via mail2html), meeting minutes, and reports. Concerning the Web as a front end to SCCS, he says, "From simple things like on-line mail archives and team document collections to fancy tools that track customer queries and project status, the Web has a place at every level of your development organization."
In addition to increasing collaborations within an organization, internal Webs can be used to test out new technologies. As reported in Web Week,(7) AT&T is using its internal Web to shake out digital payment technologies. Primarily geared toward internal purchasing, the trial is also functioning as a testbed for the various types of digital payment technologies.
Another product, AnchorPage, will index your internal Web and allow visitors to search the content. As the scope of your Web grows, finding information becomes even more critical than simply adding information to the Web. Interleaf, a long-time electronic publishing software vendor, offers a high-end product. Its Web publishing product, Cyberleaf, comprehensively addresses not only the composition and Web page creation issues, but also organizational workflow issues.
The Lotus InterNotes Web Publisher converts Notes databases into Web-publishable documents. Notes is probably the preeminent "groupware" product. It enables groups of people to collaborate by placing and updating information in a Notes server. (For more information on groupware, see Section 8 . 3 Groupware in Chapter 8 Document Management.) The contents of the Notes server are a valuable resource for an organization. The InterNotes Web Publisher enables users of Notes to publish their Notes databases on the Web, widening the availability and utility of the database.
That about wraps up our analysis of document components. The Web is forming new information structures, creating a collection of global networked information. The rapidly solidifying collection of information, accessible via networks, may quite realistically form a global library. The technical barriers to such a fantasy are quickly disappearing. Only the legal concerns (which are not minor) of intellectual property rights, copyrights, and patent law remain as murky unknowns. (For a more thorough discussion of the possibilities of networks, see Section 7 . 4 Electronic Distribution in Chapter 7 Applying Standards.)
| © Prentice-Hall, Inc. A Simon & Schuster Company Upper Saddle River, New Jersey 07458 |