5.2 General Concepts of Rebuilding a System
The Problem
Correctly and efficiently (re)building the executable version of a system from its individual source files can be a complicated task. In well-organized object-oriented programs, the implementation consists of many interrelated files. Usually each class will be represented by a code file (e.g., one with a ".cc", ".cpp", ".cxx", or similar suffix) and a header file (e.g., one with a ".h" suffix). During development and maintenance of the system it quickly becomes very difficult to remember which parts of the system must be recompiled when a given class has been changed, which compiler options (e.g., optimization level, search paths) are needed to compile a given class, into which library, if any, the compiled object file should be inserted, and which libraries and options are needed when the object files are linked together.
The difficulty of this task is not always evident unless one has had experience with programming and managing the code for a system whose size exceeds that of typical introductory programming classes. The small-scale systems developed for such classes can be handled by a simple, brute-force strategy - recompiling everything all of the time. While this is workable for very small systems, it is horribly inefficient and often unusable for realistically sized systems. Even the modest-sized projects utilized in this book begin to tax the limitations of the brute-force approach.
To manage the rebuilding task it is necessary to understand the concept of dependencies among code units and to understand the steps that are involved in compiling and linking the system together. The compiling and linking steps are driven by the dependencies and by the changes made to the code units. Understanding these concepts and steps is basic to incremental development and to the operation of tools that automate the mechanical steps of the rebuilding task.
The difficulty of (re)building an executable version of a system stems from the many dependencies that exist among the units (files) that make up the system. The notion of one file being dependent on another file can be defined as follows:
Dependent: File A is dependent on file B if it is possible that file A can be invalidated by a change in file B.
Thus, whenever file B changes, file A must be regenerated to ensure its own validity. Exactly how file A is regenerated depends on its nature. The table below describes three basic kinds of dependencies and the tools that are used to regenerate the dependent file.
There are two kinds of dependencies among the files that make up an object-oriented system. In the first, the header file for class X may depend on the header file of class Y (X.h --> Y.h), and in the second, the code file for class X may depend on the header file of class Y (X.cc --> Y.h). Notice that code files do not depend on other code files. The FileChooser class below illustrates these two kinds of dependencies. The definition of the FileChooser class (in FileChooser.h) uses the class File as the return type of the AskUser() method. Thus, the FileChooser.h file must include File.h, which means that FileChooser.h depends on File.h (FileChooser.h --> File.h).
class FileChooser {
  private:
    //...
  public:
    FileChooser(char* path, char* filter);  // search at path with filter
    FileChooser(char* path);                // search at path, no filter
    FileChooser();                          // search at CWD, no filter
    File AskUser();                         // get file via dialog
    ~FileChooser();                         // clean up
};
File FileChooser::AskUser() {
    Directory directory(thePath, theFilter);
    Selector selector(thePath);
    char* nextName = directory.First();
    while (nextName) {
        selector.Add(nextName);
        nextName = directory.Next();
    }
    char* fileChosen = selector.AskUser();
    return File(fileChosen);
}
The AskUser method uses a Directory object and a Selector object. Thus, the FileChooser.cc file depends on the Directory.h file and the Selector.h file. A summary of these dependencies is:
FileChooser.h --> File.h
FileChooser.cc --> FileChooser.h Directory.h Selector.h
FileChooser.o --> FileChooser.cc
FileChooser.o --> FileChooser.cc FileChooser.h Directory.h Selector.h File.h
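The last line of the summary can be derived mechanically from the lines before it by following each dependency until no new files appear. The following is a minimal sketch of that derivation, not the algorithm of any particular tool. The names DependencyMap, DirectDependencies, CollectDependencies, and AllDependencies are hypothetical and chosen for illustration, and only the direct dependencies listed in the summary are entered.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Key: a dependent file. Value: the files it depends on directly.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// The direct dependencies listed in the summary above.
DependencyMap DirectDependencies() {
    return {
        {"FileChooser.h",  {"File.h"}},
        {"FileChooser.cc", {"FileChooser.h", "Directory.h", "Selector.h"}},
        {"FileChooser.o",  {"FileChooser.cc"}},
    };
}

// Adds to `result` every file that `target` depends on,
// directly or through a chain of other files.
void CollectDependencies(const DependencyMap& direct,
                         const std::string& target,
                         std::set<std::string>& result) {
    auto entry = direct.find(target);
    if (entry == direct.end()) {
        return;  // no recorded dependencies for this file
    }
    for (const std::string& file : entry->second) {
        if (result.insert(file).second) {  // follow each file only once
            CollectDependencies(direct, file, result);
        }
    }
}

// Convenience wrapper that returns the complete set for one target.
std::set<std::string> AllDependencies(const DependencyMap& direct,
                                      const std::string& target) {
    std::set<std::string> result;
    CollectDependencies(direct, target, result);
    return result;
}
```

Calling AllDependencies(DirectDependencies(), "FileChooser.o") yields the five files named in the final line of the summary: FileChooser.cc, FileChooser.h, Directory.h, Selector.h, and File.h.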
To use an automated rebuilding tool it is necessary to know:
Each of these elements is presented below.
Steps in Compiling and Linking
An overview of the steps involved in compiling and linking the executable system is shown in the following figure. This overview identifies the relationships among the various tools and file types that are part of the process. The relationships are indicated by arrowed lines, which indicate what type of file is input to (or output by) each tool.
The first step in rebuilding the system is to compile all necessary source-code files into their corresponding object files. The source-code files contain the code written by the developer in the higher-level programming language (in our case C++). The compiler translates this source code into equivalent code in the instruction set of the processor on which the program will execute.
The higher-level programming language is defined to be independent of the operating system and processor. While the source-code files are processor-independent, many source-code files are dependent on a particular operating system because they use services provided only by that operating system. Such source code can be compiled without change on any machine that runs the required operating system, whatever its processor. Object files are processor-dependent because their compiler-generated contents are meaningful only to a single type of processor. The combination of operating system and processor type is referred to as the platform. Some source code is platform-independent, meaning that it can be compiled and executed on "any" platform. Examples of platform-independent code are graphical packages or communication packages that operate on both Unix and Windows95 systems for any processor type.
Compiling the system consists of one or more independent compilation steps. Independent in this sense means that each execution of the compiler is unrelated to any past or future executions of the compiler: no information generated by the compiler in compiling one source file is used in compiling another source file. This independence means that the source-code files may be compiled in any order. Keep in mind, however, that within each compilation the defined-before-use ordering applies - the compiler must see the definition of a class before objects of that class can be created or manipulated.
The second step in rebuilding the system is to link all necessary object files into a single executable file, which the linker accomplishes by weaving together the independently compiled object files into a single, integrated system. In linking the system together, the two most common errors are two elements with the same name and missing required elements. Duplicate names arise when developers use the same name for two different purposes; a missing element may be caused by inaccurate dependency information, leading to a failure to compile a necessary part of the system. The single most important act of the linker is to connect the code generated for the invocation of a method (in one compilation step) with the code generated for the implementation of that method (in a second compilation step). For example, the invocation whose source code form is
MyClass example;
....
example.MyMethod(arg0,...,argn-1,argn);
might get compiled into object code of the form
push argn // put arguments on stack
push argn-1
...
push arg0
call MyClass_MyMethod // execute code of method
where the symbol "MyClass_MyMethod" is a compiler-generated unresolved external reference. When the call instruction is executed, however, the processor needs to know where the code for MyMethod can be found. The linker supplies this information (termed "resolving the external reference") by using entry-point definitions generated by other compilation steps. When the MyClass class is compiled, in an independent compilation step, the source code of the form
MyClass::MyMethod(arg0,...,argn-1, argn)
{...}
might get compiled into object code of the form
MyClass_MyMethod: entry
...
instructions generated for the
source statements of the method
...
where the "entry" directive indicates the entry point (the location of the beginning) of the method MyMethod. When the object file containing the invocation (and the unresolved external reference) and the object file containing the entry point are presented to the linker, the linker is able to use their combined information to resolve the external reference by replacing the unresolved external reference in the call instruction with the location of the entry point for the method's code.
The linker typically is supplied with one or more libraries. A library is simply a collection of object files that have been placed together for convenience in a single file; it also may be called an archive. Commercially available software usually comes packaged in one or more libraries, and different operating systems provide utilities that developers use to build a library from a collection of the object files they have created. The linker will usually first use all of the non-library object files to resolve external references and then use the libraries to resolve any remaining external references.
Each of the compiling and linking steps can produce error messages. The compiler, for example, will report errors in the syntax of the code written by the developer; the linker will report errors if duplicate entry point names are found during linking or if unresolved external references remain after all object files and libraries have been searched. It is important to be able to distinguish between these two types of error messages, because the developer must usually take different actions depending on which type of error occurs. It is also important to be able to distinguish the error messages generated during the rebuilding of the system from the error messages generated during execution.
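The matching of external references to entry points, and the two linker errors just described, can be imitated with a small model. This is only a sketch under simplifying assumptions: a real linker works on addresses inside object code, while this one only matches names. The types ObjectFile and LinkResult, the function Link, and any file names used with them are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for an object file: the entry points it defines
// and the external references that it leaves unresolved.
struct ObjectFile {
    std::string name;
    std::vector<std::string> entryPoints;
    std::vector<std::string> externalReferences;
};

struct LinkResult {
    std::map<std::string, std::string> resolved;  // symbol -> defining object file
    std::set<std::string> duplicates;             // entry points defined more than once
    std::set<std::string> unresolved;             // references with no entry point
};

LinkResult Link(const std::vector<ObjectFile>& objects) {
    LinkResult result;
    std::map<std::string, std::string> definedIn;

    // Pass 1: record every entry point and note duplicate names.
    for (const ObjectFile& object : objects) {
        for (const std::string& symbol : object.entryPoints) {
            if (!definedIn.emplace(symbol, object.name).second) {
                result.duplicates.insert(symbol);
            }
        }
    }

    // Pass 2: match each external reference against the entry points.
    for (const ObjectFile& object : objects) {
        for (const std::string& symbol : object.externalReferences) {
            auto definition = definedIn.find(symbol);
            if (definition == definedIn.end()) {
                result.unresolved.insert(symbol);
            } else {
                result.resolved[symbol] = definition->second;
            }
        }
    }
    return result;
}
```

Given one object file that refers to MyClass_MyMethod and another that defines it, the resolved map records where the call must go. A name that appears in duplicates or unresolved corresponds to one of the two linker error messages.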
How Dependencies Control Compiling and Linking
The developer's automated rebuilding tool uses the dependency information to drive the compiling and linking process. The overall effect of this tool is shown as a high-level, inefficient scheme in the figure below, where all changed source files are recompiled along with any source files that depend on them. Once the recompilations are completed, the linker builds a new executable file. The real tool would, of course, use a more sophisticated and efficient strategy to achieve this same effect.
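The selection step of this high-level scheme can be sketched in a few lines. The code below is an illustration only, not the strategy of a real tool. It assumes that the complete set of files each source-code file depends on is already known; the names SourceDependencies and FilesToRecompile are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>

// Key: a source-code file. Value: every file it depends on
// (assumed here to be known in advance).
using SourceDependencies = std::map<std::string, std::set<std::string>>;

// Returns the source-code files that must be recompiled: those that were
// changed themselves and those that depend on a changed file.
std::set<std::string> FilesToRecompile(const SourceDependencies& dependencies,
                                       const std::set<std::string>& changed) {
    std::set<std::string> recompile;
    for (const auto& entry : dependencies) {
        const std::string& source = entry.first;
        bool mustRecompile = changed.count(source) > 0;
        for (const std::string& file : entry.second) {
            if (changed.count(file) > 0) {
                mustRecompile = true;  // a file it depends on has changed
            }
        }
        if (mustRecompile) {
            recompile.insert(source);
        }
    }
    return recompile;
}
```

After the selected files have been recompiled, the scheme finishes by linking all of the object files into a new executable.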
The automated rebuilding tool can detect changes in a file by comparing time information associated with each file, for the file system maintains the time when a file was created (creation time) and the time when the file was last written to (last modification time). A change is deemed to have occurred in the source file "source" if
executable.CreationTime < source.LastModificationTime
where "executable" is the executable file. This test implies that any source file that has been modified since the creation of the executable must be recompiled.
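The test can be written out as a short sketch. To keep the example self-contained, the times are plain values stored in a hypothetical FileTimes record rather than values read from the file system; a real tool would obtain them from the operating system. The function names HasChanged and ChangedSources are likewise chosen only for illustration.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record of the time information kept for one file.
struct FileTimes {
    std::string name;
    std::time_t creationTime;
    std::time_t lastModificationTime;
};

// The test from the text: the source file has changed if it was written
// to after the executable was created.
bool HasChanged(const FileTimes& executable, const FileTimes& source) {
    return executable.creationTime < source.lastModificationTime;
}

// Applies the test to every source file and collects the names of the
// files that must be recompiled.
std::vector<std::string> ChangedSources(const FileTimes& executable,
                                        const std::vector<FileTimes>& sources) {
    std::vector<std::string> changed;
    for (const FileTimes& source : sources) {
        if (HasChanged(executable, source)) {
            changed.push_back(source.name);
        }
    }
    return changed;
}
```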
The automated rebuilding tool must be informed by the developer of the search paths used by both the compiler and the linker to locate needed files. A search path is simply an ordered list of directories. The compiler (actually the preprocessor of the compiler) has an include file search path that gives the names of directories to search when it is attempting to find an include file. The linker has a library search path that gives the names of directories to search when it is attempting to find a library or archive file. These search paths are specified separately, because the include files and the library files are typically stored in different places in the file system, and they are specified in ways that are entirely dependent on the tool.
By setting flags, choosing options, and defining variables, the developer can communicate parameter information to the compiler and linker. These parameters control the behavior of the compiler and linker, and what code is generated and linked.
Flags and options are parameters that affect the behavior of the compiler and linker. Options are predefined choices governing compiler and linker behavior that are either selected (turned on) or not selected (turned off) by the developer. For example, a compiler typically has an option to select the level of warnings and error messages that it produces. One setting of this option causes only the most extreme error messages to be produced and all others suppressed, while another causes all errors and severe warnings to be produced but other warnings to be suppressed. The linker may have an option that indicates whether the linker does or does not treat duplicate definitions as an error. Flags are parameters that communicate a value other than a selection among predefined choices. For example, the compiler typically has a flag that allows the developer to explicitly state the name of the object file to be produced rather than the default name that the compiler would use.
An example of a linker flag is the name of a library file that should be searched in resolving external references. The nature and syntax of flags and options are compiler- and system-dependent.
By defining variables the developer can control what is called conditional compilation, which means that some detailed code may be included or excluded in the compilation depending on the setting of a preprocessor variable. Two common cases of conditional compilation are monitoring code and platform-dependent code. Monitoring code is code inserted in the system during development for later testing or performance analysis. This monitoring code is included in the compilation so that during debugging and tuning it is part of the executable test system. After debugging and tuning, the monitoring code is excluded from the compilation so that the released executable production system does not have the space and time overhead required by the monitoring code. To build a system that runs on multiple platforms it is often necessary to have two or more different versions of some detailed code, each unique to its particular platform, or platform-dependent. For example, a network-communications service or a window-management action might be used in slightly different ways on a Windows95 system than it is on a Unix system. Through conditional compilation, only the version of the detailed code that is correct for a particular platform is included.
Conditional compilation is achieved by defining and testing preprocessor variables. The preprocessor has a list of variables that during compilation are either defined or undefined. Tests on these variables may be inserted in the source code that cause the preprocessor to include (pass on to the compiler) a section of source code or exclude (not pass on to the compiler) a section of source code. An outline of an example of conditional compilation is shown in the figure below, where two sections of code are surrounded by preprocessor directives. Each #ifdef directive tests whether the named variable (_WINDOWS_95_ or _UNIX_ in the example) is currently defined. If the variable is defined, then the subsequent lines of source code (up to the matching #endif ) are included in the source stream produced by the preprocessor. If the variable is not defined, then the subsequent lines of source code (up to the matching #endif ) are excluded from the source stream.
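A small sketch of both common cases is given here. It is an illustration written for this discussion rather than code from an actual system. The variable MONITORING, the counter squareCalls, and the functions Square and PlatformName are hypothetical; the variables _WINDOWS_95_ and _UNIX_ are the ones named in the text. The final return statement is an addition that lets the sketch compile when neither platform variable is defined.

```cpp
#include <string>

// Number of calls to Square(); it changes only when MONITORING is defined.
int squareCalls = 0;

int Square(int value) {
#ifdef MONITORING
    ++squareCalls;  // monitoring code, left out of the production system
#endif
    return value * value;
}

// Platform-dependent detail: each section is passed on to the compiler
// only when its variable is defined.
std::string PlatformName() {
#ifdef _WINDOWS_95_
    return "Windows95";
#endif
#ifdef _UNIX_
    return "Unix";
#endif
    return "unknown";  // fallback when neither variable is defined
}
```

When none of the variables is defined, Square leaves the counter untouched and PlatformName returns the fallback value; defining a variable changes the result without any change to this source text.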
The variables tested by the preprocessor can be defined in one of two ways. The first is to use a header file containing one or more #define directives and to include it before any use of the variables in conditional compilation tests. The table below shows an example of a header file containing a variable definition and the testing of this variable in another file.
The second way to define a preprocessor variable is as a compiler flag. The syntax and procedure for defining preprocessor variables as compiler flags are dependent on the compiler. The source code that tests the preprocessor variable is independent of how the preprocessor variable is defined.
Software development projects of any size are always implemented in a progressive and incremental manner. It is never the case that all of the code is written before any of it is tested, evaluated, and possibly modified to remove errors or to change parts of the overall design. The many small, progressive tasks that define the incremental strategy for a given system are usually planned in advance. Each step in the incremental development is carefully selected so that it is both testable and minimal. The ability to test each step is necessary to ensure that it is implemented correctly and that it operates correctly with the code already present. There is little point in adding a small bit of code so incomplete that there is no way to test it to determine these properties. At the same time, a step represents the smallest testable incremental addition. If a step is too large (i.e., introduces too much new functionality and code), it becomes difficult to test it as completely as would be advisable.
Incremental development is critical to developing software for the following reasons:
The importance and utility of incremental development are reflected in the broad spectrum of technical, managerial, and organizational effects that flow from its use.
As an example of incremental development, consider part of a graphical editing system. Similar to many common drawing tools, the graphical editing system allows the user to draw, resize, move, and group together a number of basic shapes such as rectangles, circles, and lines. The user may select a color for each shape from a palette of available colors. One incremental development plan for this system is shown in the figure below.
The steps shown in the figure represent only one of many good incremental development plans. Notice that each step focuses on adding to the system a specific capability that can be observed and tested. Also notice that new capabilities may be added to a prototype developed in an earlier step and that the prototypes from two steps may be combined.
Two broad categories of tools are toolkits and integrated development environments (IDEs). There is a wide range of tools in either category available from commercial vendors or from public-domain sources.
The toolkit approach presupposes that the user has available a number of different tools from different vendors for the same task. The user selects among these tools, weighing such factors as the individual's preferences, previous experience and familiarity with the tools, the cost of each tool, and the availability of the tools on a given platform. The user selects one tool for each task that best suits the user's requirements. For example, in a standard programming environment the user might need an editor for composing and revising the source text, a compiler and a system for automating the rebuilding process, and a debugger. The developer might choose to use the editor with which he or she is most familiar and productive, a commercially available compiler and rebuilding tool that produces good error messages and efficient code, and a public-domain debugger that has some novel and needed features not yet included in a production debugger. The advantages of the toolkit approach are:
The disadvantages of the toolkit approach are:
Toolkit users may feel that they bear too much of the burden of tool evaluation and tool integration and that more of this responsibility should be assumed by the tool developers. However, the tool developers and vendors may rightly claim that their interest is in producing a single tool that is the best of its kind, which leaves them no time for the task of integration with other tools, and that there is little or no economic incentive to integrate with other tools, especially if there are a large number of other tools with which the integration may be done.
The integrated development environment (IDE) approach envisions a single, comprehensive system-development facility within which all of the tasks related to the programming, rebuilding, and debugging of a system are conducted. In this approach the user chooses among IDEs, not among individual tools, because the IDE is a single, indivisible utility package. The advantages of the IDE approach are:
The corresponding disadvantages of the IDE approach are:
Users of IDEs may feel that they are held captive to a given IDE because of the high cost that a change to a different IDE entails. Also, the value that a given user or community of users attaches to a particular capability may differ considerably from the value the IDE developer assigns it.
In the following sections, two different tools are described in conjunction with the study of rebuilding software written in C++: the GNU public-domain toolset and Microsoft's Visual C++ IDE. Each of these has a dedicated user community, is a good representative of its approach, and is a worthy object of study.