3 . 4 The Engineered View
Documents are complex objects. Let's now examine the document as an object composed of a variety of pieces that must be "engineered" together.
Often, the only time all pieces of a project come together is when the final report is due. All the information gathered from a variety of sources must be assembled into a coherent, deliverable product. Most likely, many people contribute to the final report. Their individual idiosyncratic uses of publishing tools must be integrated into a consistent product. Data created by spreadsheets or images from drawing tools are also often included in completed documents. The assembly of all these components brings us to the topic of the compound document.
3 . 4 . 1 Compound Document
The compound document, as its name suggests, is a document composed of many parts. These parts may originate from vastly different systems and exist in many different formats. From a technical standpoint, the integration of these pieces into a coherent whole is a formidable task. Each part must be integrated seamlessly into what appears to be a single consistent document. Even more difficult is the frequent need to go back to the system that originally created the data, such as a spreadsheet, in order to edit that data.
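One way to picture such a document is as a small data structure. The following Python sketch is illustrative only: the names `Part` and `CompoundDocument`, their fields, and the sample file names are hypothetical and do not correspond to any vendor's format. The point it tries to show is that each part carries a record of the application and source file it came from, so an editor can find its way back to the originating system.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One piece of a compound document (hypothetical structure)."""
    media_type: str   # e.g. "text", "spreadsheet", "image"
    content: str      # the rendered form shown inside the document
    source_app: str   # the application that created the data
    source_path: str  # where the original data lives


@dataclass
class CompoundDocument:
    title: str
    parts: list[Part] = field(default_factory=list)

    def add(self, part: Part) -> None:
        self.parts.append(part)

    def render(self) -> str:
        """Present all parts, in order, as one consistent document."""
        return "\n".join(p.content for p in self.parts)

    def sources(self) -> dict[str, list[str]]:
        """Group the original data files by the application that owns them."""
        by_app: dict[str, list[str]] = {}
        for p in self.parts:
            by_app.setdefault(p.source_app, []).append(p.source_path)
        return by_app


report = CompoundDocument("Final Report")
report.add(Part("text", "Sales rose in Q2.", "word processor", "intro.doc"))
report.add(Part("spreadsheet", "Q1: 10, Q2: 14", "spreadsheet", "sales.xls"))
print(report.render())
print(report.sources())
```

Keeping the source information next to the rendered content is what makes later maintenance possible; a part stored only in its rendered form cannot be traced back to the data that produced it.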
Electronically created compound documents resemble information quilts patched together from a variety of information sources. Information created for one purpose in one particular system may end up being used in several other systems, and for purposes other than the one originally intended. Documents created with such information can quickly become impossible to maintain and update.
The original data sources become an integral part of the creation process, and great care must be exercised to maintain those data sources for future versions of the document. Text, graphics, and scanned photos may be assembled for one purpose and later reassembled for another (e.g., a Web site). If proper care is taken of all the various data sources, you can reuse the document content, and that reuse allows an organization to profit from publishing the content again and again.
Before we get into some more detail, let's take a look at the forest before starting a hike through the trees. Many technologies created in the last several years impact compound documents. The concept, however, is simple and elegant. The user should be allowed to read or write a document. Inside the document are all sorts of media types that the user may want to mess around with as part of the editing process.
The world starts getting complicated when vendors, of necessity, address issues concerning the storage and interoperability of these complex compound documents. For example, if a document contains several embedded spreadsheets, it is comforting for the user to know that those spreadsheets will remain updatable. The document itself becomes the focus of a user's attention and becomes the principal vehicle for system-wide data integration. One trend has been to represent the various media types as "objects." Then you can use and reuse the objects and the software that operates on them. A wide variety of object storage mechanisms have appeared with no clear winner on the horizon. Expect confusion to be the norm for several more years, at least.
Two major integration strategies are Microsoft's OLE 2.0 and Apple's OpenDoc. OpenDoc is a collaboration between Apple and IBM and was designed for multi-platform operations. A somewhat dated, but still valuable comparison of OpenDoc to OLE is available from IBM at: http://www.austin.ibm.com/pspinfo/odoc-ole.html.
From the OpenDoc FAQ:
What is OpenDoc? OpenDoc is a multi-platform, component software architecture that enables developers to evolve current applications into component software or to create new component software applications. OpenDoc software will run on Apple Macintosh personal computers, as well as Windows, Windows NT, OS/2, and AIX systems. With software enabled by OpenDoc, users will be able to mix and match software to fit their needs, combining text, graphics, video, spreadsheets, and many other types of data into a single document.
Individual elements, called components, may be edited by "component editors." A component editor is an "independent program that manipulates and displays a particular kind of content."
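The idea of one editor per kind of content can be sketched in a few lines of Python. This is not the OpenDoc API; the registry `EDITORS`, the decorator `component_editor`, and the two toy editors are all hypothetical names invented for illustration. The sketch only shows the dispatch idea: the document hands each component to whichever editor has claimed that kind of content.

```python
from typing import Callable

# Hypothetical registry: content kind -> the editor function that handles it.
EDITORS: dict[str, Callable[[str], str]] = {}


def component_editor(kind: str):
    """Register a function as the editor for one kind of content."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        EDITORS[kind] = func
        return func
    return register


@component_editor("text")
def edit_text(content: str) -> str:
    # A toy "edit": trim surrounding white space.
    return content.strip()


@component_editor("spreadsheet")
def edit_spreadsheet(content: str) -> str:
    # Treat the content as comma-separated integers and append their total.
    values = [int(v) for v in content.split(",")]
    return f"{content},{sum(values)}"


def edit(kind: str, content: str) -> str:
    """Dispatch a component to whichever editor claims its kind."""
    if kind not in EDITORS:
        raise LookupError(f"no component editor for {kind!r}")
    return EDITORS[kind](content)


print(edit("text", "  hello  "))
print(edit("spreadsheet", "1,2,3"))
```

Because editors are looked up by content kind rather than wired into the document, a new kind of content can be supported by registering one more editor, which is the "mix and match" quality the FAQ describes.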
The object representation for OpenDoc is called the System Object Model (SOM) and is from IBM. Again from the OpenDoc FAQ it is a "platform-independent framework for allowing component software to exchange data and instructions. It is a highly efficient dynamic linking mechanism for objects, which supports multiple languages and provides a gateway to distributed object servers."
Another element of the OpenDoc Architecture is Bento, a portable compound document and multimedia storage library and format. Finally there is also "Component Glue," an acknowledgment that Microsoft exists. Component Glue "enables interoperability with Microsoft Corporation's Object Linking and Embedding (OLE) technology for inter-application communication. OpenDoc's significantly simpler API allows developers to program Microsoft OLE much easier via OpenDoc." (See the OpenDoc Web site for more gory details at: http://www.opendoc.apple.com.)
OLE 2.0 from Microsoft is based on yet another object model called the Component Object Model (COM). It is more appropriate to compare COM to CORBA (Common Object Request Broker Architecture) than to OpenDoc. COM and CORBA are not attacking exactly the same problems either, so that comparison is also flawed. In an excellent article, "OLE and COM vs. CORBA" by Michael Foody in the April 1996 issue of UNIX Review, Foody points out that, "In general terms, COM...is used in desktop applications to provide a binary standard for software component interoperability and ORBs are used as the infrastructure to construct larger-scale distributed systems. Of course, Microsoft is working on a distributed version of COM, designed for use in enterprise-class distributed systems, while IBM is busy working with Apple to use SOM as the basis for a desktop component model called OpenDoc."
Both IBM and DEC have had other software projects that address the challenge of compound documents. IBM's MO:DCA (Mixed Object Document Content Architecture) is a combination compound document and object architecture. DEC's CDA (Compound Document Architecture) is a system resembling the philosophical approach of ODA. (For more information on the Office Document Architecture standard, see Section 5 . 5 ODA in Chapter 5 Document Standards.)
As we've just seen, the concepts of compound documents have been around for quite some time. The coming of the Web, however, makes the creation and use of compound documents a commonplace occurrence. With all the advantages the Web has brought, it has also magnified some of the problems of conformance, performance, and standardization. Vendors are trying to differentiate themselves by introducing hot new technologies. Content creators are placed in a bind because the use of these new technologies, although compelling, limits the audience and distribution possibilities. There are no simple answers; just be aware of what's going on so you can make educated choices.
3 . 4 . 2 Active Documents
The various architectural approaches discussed in the previous section permit the creation of new types of document processing. One new type is the active document. A number of publishing systems already tout this capability, but may call it by different names. For example, a pie chart of data from a spreadsheet, included in a document, may update itself when the spreadsheet changes. In another case, a paragraph just rewritten may initiate an electronic mail message to a manager, informing the manager of the change and requesting approval. The document is no longer a passive object; it is doing things. The notion of a document with active components is another step in the direction of a totally integrated information environment.
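The self-updating pie chart can be modeled as a simple publish-and-notify arrangement. The Python sketch below is a minimal illustration under stated assumptions: the classes `Spreadsheet` and `PieChart` and their methods are hypothetical, and no real publishing system or inter-application facility is being reproduced. The chart subscribes to the spreadsheet and recomputes its slices whenever the data changes.

```python
class Spreadsheet:
    """A data source that notifies subscribers when it changes."""

    def __init__(self, values):
        self._values = dict(values)
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        callback(self._values)  # deliver the current data right away

    def update(self, key, value):
        self._values[key] = value
        for callback in self._subscribers:
            callback(self._values)


class PieChart:
    """A document component that recomputes its slices on each change."""

    def __init__(self):
        self.slices = {}

    def refresh(self, values):
        # Express each value as a whole-number percentage of the total.
        # Assumes the total is non-zero.
        total = sum(values.values())
        self.slices = {k: round(100 * v / total) for k, v in values.items()}


sheet = Spreadsheet({"east": 30, "west": 10})
chart = PieChart()
sheet.subscribe(chart.refresh)
print(chart.slices)  # {'east': 75, 'west': 25}

sheet.update("west", 30)
print(chart.slices)  # {'east': 50, 'west': 50}
```

The same arrangement would cover the e-mail example: a subscriber that sends a message instead of redrawing a chart is just another callback attached to the part of the document being watched.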
Several technologies are available for inter-process and inter-application communication. Publishing systems approach the problem of application communications in several ways. Ultimately, the publishing system depends on the services provided by the operating system. Most operating systems provide some mechanism for inter-application communication, and these mechanisms are exploited by some of the publishing systems. For example, on MS-DOS platforms running MS Windows, a facility called OLE (Object Linking and Embedding) is used by MS Word for Windows to include "live" Excel spreadsheets. The Macintosh's System 7 operating system has a "Publish and Subscribe" facility for inter-application communication. Interleaf and FrameMaker on UNIX platforms use RPC (Remote Procedure Calls) to allow an AutoCAD drawing in a document to be linked to the AutoCAD application.
Interleaf's active document technology is one of the more ambitious implementations of the active document approach. Document sections can behave in certain ways and take various actions. For example, a document can be directed to send e-mail to various managers for approval before permission is granted for the public to view the document. In fact, one of Lotus Notes' strengths is that it allows the organization of these types of work flow procedures with various types of documents. (See Section 8 . 3 Groupware in Chapter 8 Document Management for more information on work flow issues.)
This feature could prove invaluable to organizations that require complex configuration management of documents, because documents are just one portion of an engineering effort. For example, the production of an airplane must correspond accurately to the various designs and tests of the airplane. The ability to embed "intelligence" into documents is an interesting approach to the configuration management problem. (For more discussion on this topic, see Section 8 . 2 Configuration Management in Chapter 8 Document Management.)
Here again, the Web provides ample examples of the ability to take older concepts and apply them to newer implementations. Active document technology is perhaps best exemplified on the Web by the emergence of Java. The ability to transmit little programs called applets has taken the Web by storm. The enthusiasm with which the Net has embraced Java is a credit both to Sun's technology and to its ability to market that technology in a Net-friendly manner. Java applets allow authors to wake up their documents. No longer passive reading material, a Java-cized document can shout, sing, and interact with the reader. Active documents have hit the mainstream.
| © Prentice-Hall, Inc. A Simon & Schuster Company Upper Saddle River, New Jersey 07458 |