5.2 General Concepts of Rebuilding a System


The Problem

Correctly and efficiently (re)building the executable version of a system from its individual source files can be a complicated task. In well-organized object-oriented programs, the implementation consists of many interrelated files. Usually each class will be represented by a code file (e.g., one with a ".cc", ".cpp", ".cxx", or similar suffix) and a header file (e.g., one with a ".h" suffix). During development and maintenance of the system it quickly becomes very difficult to remember which parts of the system must be recompiled when a given class has been changed, which compiler options (e.g., optimization level, search paths) are needed to compile a given class, into which library, if any, each compiled object file should be inserted, and which libraries and options are needed when the object files are linked together.

The difficulty of this task is not always evident unless one has had experience with programming and managing the code for a system whose size exceeds that of typical introductory programming classes. The small-scale systems developed for such classes can be handled by a simple, brute-force strategy - recompiling everything all of the time. While this is workable for very small systems, it is horribly inefficient and often unusable for realistically sized systems. Even the modest-sized projects utilized in this book begin to tax the limitations of the brute-force approach.

To manage the rebuilding task it is necessary to understand the concept of dependencies among code units and to understand the steps that are involved in compiling and linking the system together. The compiling and linking steps are driven by the dependencies and changes made to the code units. Understanding these concepts and steps is basic to incremental development and the operation of tools that automate the mechanical steps of the rebuilding task.

 

Dependencies

The difficulty of (re)building an executable version of a system stems from the many dependencies that exist among the units (files) that make up the system. The notion of one file being dependent on another file can be defined as follows:

Dependent:

File A is dependent on file B if it is possible that file A can be invalidated by a change in file B.

Thus, whenever file B changes, file A must be regenerated to ensure its own validity. Exactly how file A is regenerated depends on its nature. The table below describes three basic kinds of dependencies and the tools that are used to regenerate the dependent file.

Examples of Dependencies

    Dependent File   Depends on                               Tool to Regenerate
    --------------   ------------------------------------     ---------------------
    object file      source files (code and header files)     compiler
    library          object files                             librarian or archiver
    executable       object files and libraries               linker


In addition to these, there are other kinds of dependencies, such as a grammar file that might be used to automatically generate a parser for reading the system's input. When the grammar file changes, the parser must be regenerated.

There are two kinds of dependencies among the files that make up an object-oriented system. In the first, the header file for class X may depend on the header file of class Y (X.h --> Y.h), and in the second, the code file for class X may depend on the header file of class Y (X.cc --> Y.h). Notice that code files do not depend on other code files.

The FileChooser class below illustrates these two kinds of dependencies. The definition of the FileChooser class (in FileChooser.h) uses the class File as the return type of the AskUser() method. Thus, the FileChooser.h file must include File.h because FileChooser.h depends on File.h (FileChooser.h --> File.h).

   class FileChooser {
    private:
        //...
    public:
        FileChooser(char* path, char* filter);    // search at path with filter
        FileChooser(char* path);                   // search at path, no filter
        FileChooser();                             // search at CWD, no filter
        File AskUser();                            // get file via dialog
        ~FileChooser();                            // clean up
    };


The implementation of the FileChooser class (in FileChooser.cc) depends, of course, on FileChooser.h. Further dependencies are found in the implementation of the class's methods. The implementation of the AskUser() method is:

   File FileChooser::AskUser() {
       // thePath and theFilter are private data members (elided above)
       Directory directory(thePath, theFilter);
       Selector selector(thePath);

       // add each file name produced by the directory to the selector
       char* nextName = directory.First();
       while (nextName) {
           selector.Add(nextName);
           nextName = directory.Next();
       }

       // ask the user to choose one of the names and return it as a File
       char* fileChosen = selector.AskUser();
       return File(fileChosen);
   }

The AskUser method uses a Directory object and a Selector object. Thus, the FileChooser.cc file depends on the Directory.h file and the Selector.h file. A summary of these dependencies is:

       FileChooser.h  --> File.h
       FileChooser.cc --> FileChooser.h Directory.h Selector.h
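
In the source code itself these dependencies are expressed as #include directives. A minimal sketch of the relevant lines of each file (the exact set of directives is inferred from the discussion above, not taken from a complete listing):

       // FileChooser.h
       #include "File.h"          // File is used as the return type of AskUser()

       // FileChooser.cc
       #include "FileChooser.h"   // the class being implemented
       #include "Directory.h"     // Directory is used in AskUser()
       #include "Selector.h"      // Selector is used in AskUser()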


A final dependency is that the object file (denoted by the .o suffix) created by compiling a code file (denoted by the .cc suffix) depends on that code file. Thus,

       FileChooser.o --> FileChooser.cc


Finally, the dependencies are transitive and cumulative, so that, in total:

       FileChooser.o --> FileChooser.cc FileChooser.h Directory.h 
                          Selector.h File.h


This list of dependencies reflects, for example, that if Directory.h changes (i.e., the Directory class interface changes) then the FileChooser class should be recompiled to ensure that it conforms properly to these changes.

 

Compiling and Linking

To use an automated rebuilding tool it is necessary to know:

  • what steps occur in compiling and linking a system together,
  • how dependencies control the compiling and linking,
  • the role of search paths in locating information needed by the compiler and linker, and
  • the use of flags, options, and variables to communicate parameters to the compiler and linker.

Each of these elements is presented below.

 

Steps in Compiling and Linking

An overview of the steps involved in compiling and linking the executable system is shown in the following figure. This overview identifies the relationships among the various tools and file types that are part of the process. The relationships are drawn as arrowed lines, which indicate what type of file is input to (or output by) each tool.


Compiling and Linking the Executable System

The first step in rebuilding the system is to compile all necessary source-code files into their corresponding object files. The source-code files contain the code written by the developer in the higher-level programming language (in our case C++). The compiler translates this source code into equivalent code in the instruction set of the processor on which the program will execute. The higher-level programming language is defined to be independent of the operating system and processor. While the source-code files are processor-independent, many source-code files are dependent on a particular operating system because they use services provided only by that operating system. Such source code can be compiled without change on any machine that runs the required operating system, even if the machines have different processors. Object files are processor-dependent because their compiler-generated contents are meaningful only to a single type of processor. The combination of operating system and processor type is referred to as the platform. Some source code is platform-independent, meaning that it can be compiled and executed on "any" platform. Examples of platform-independent code are graphical packages or communication packages that operate on both Unix and Windows95 systems for any processor type.

Compiling the system consists of one or more independent compilation steps. Independent in this sense means that each execution of the compiler is unrelated to any past or future executions of the compiler: no information generated by the compiler in compiling one source file is used in compiling another source file. This independence means that the source-code files may be compiled in any order. Keep in mind, however, that within each compilation the defined-before-use ordering applies - the compiler must see the definition of a class before objects of that class can be created or manipulated.
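
As a small illustration of the defined-before-use rule within a single compilation step (a sketch only; it assumes, as AskUser() above suggests, that File has a constructor taking a file name):

    #include "File.h"          // the compiler sees the definition of File here

    File OpenLog(char* name) {
        return File(name);     // OK: class File has already been defined
    }

    // Had the #include directive appeared after OpenLog, the compiler would
    // reject the use of File because the class would not yet be defined.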

The second step in rebuilding the system is to link all necessary object files into a single executable file, which the linker accomplishes by weaving together the independently compiled object files into a single, integrated system. In linking the system together, the two most common errors are duplicate names (two elements with the same name) and missing required elements. Duplicate names arise when developers use the same name for two different purposes; a missing element may be caused by inaccurate dependency information, leading to a failure to compile a necessary part of the system.

The single most important act of the linker is to connect the code generated for the invocation of a method (in one compilation step) with the code generated for the implementation of that method (in a second compilation step). For example, the invocation whose source code form is

    MyClass example;
     ....
     example.MyMethod(arg0,...,argn-1,argn);

might get compiled into object code of the form

    push argn              // put arguments on stack
     push argn-1
     ...
     push arg0
     call MyClass_MyMethod  // execute code of method

where the symbol "MyClass_MyMethod" is a compiler-generated unresolved external reference. When the call instruction is executed, however, the processor needs to know where the code for MyMethod can be found. The linker supplies this information (termed "resolving the external reference") by using entry-point definitions generated by other compilation steps. When the MyClass class is compiled, in an independent compilation step, the source code of the form

    MyClass::MyMethod(arg0,...,argn-1, argn) 
     {...}

might get compiled into object code of the form

    MyClass_MyMethod: entry
        ...
        instructions generated for the
        source statements of the method
        ...

where the "entry" directive indicates the entry point (the location of the begining) of the method MyMethod. When the object file containing the invocation (and the unresolved external reference) and the object file containing the entry point are presented to the linker, the linker is able to use their combined information to resolve the external reference by replacing the unresolved external reference in the call instruction with the location of the entry point for the method's code.

The linker typically is supplied with one or more libraries. A library is simply a collection of object files that have been placed together for convenience in a single file; it also may be called an archive. Commercially available software usually comes packaged in one or more libraries, and different operating systems provide utilities that developers use to build a library from a collection of the object files they have created. The linker will usually first use all of the non-library object files to resolve external references and then use the libraries to resolve any remaining external references.

Each of the compiling and linking steps can produce error messages. The compiler, for example, will report errors in the syntax of the code written by the developer; the linker will report errors if duplicate entry-point names are found during linking or if unresolved external references remain after all object files and libraries have been searched. It is important to be able to distinguish between these two types of error messages, because the developer must usually take different actions depending on which type of error occurs. It is also important to be able to distinguish error messages generated during the rebuilding of the system from error messages generated during execution.

 

How Dependencies Control Compiling and Linking

The developer's automated rebuilding tool uses the dependency information to drive the compiling and linking process. The overall effect of this tool is shown as a high-level, inefficient scheme in the figure below, where all changed source files are recompiled along with any source files that depend on them. Once the recompilations are completed, the linker builds a new executable file. The real tool would, of course, use a more sophisticated and efficient strategy to achieve this same effect.


Using Dependencies to Control Recompilation
for all A in SourceFiles
    {   if (A has changed)
        {   recompile A;
            for all B in SourceFiles
            {   if (B depends on A)
                {   recompile B;
                }
            }
        }
    }
    link object files;

The automated rebuilding tool can detect changes in a file by comparing time information associated with each file: the file system maintains the time when a file was created (creation time) and the time when the file was last written to (last modification time). A change is deemed to have occurred in the source file "source" if

    executable.CreationTime < source.LastModificationTime

where "executable" is the executable file. This test implies that any source file that has been modified since the creation of the executable must be recompiled.

 

Search Paths

The automated rebuilding tool must be informed by the developer of the search paths used by both the compiler and the linker to locate needed files. A search path is simply an ordered list of directories. The compiler (actually the preprocessor of the compiler) has an include-file search path that gives the names of directories to search when it is attempting to find an include file. The linker has a library search path that gives the names of directories to search when it is attempting to find a library or archive file. These search paths are specified separately because the include files and the library files are typically stored in different places in the file system; exactly how each path is specified depends on the tool.
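
For example, when the preprocessor encounters the following directives, it must locate each named header file by consulting the include search path (the comments describe typical behavior; the exact rules are compiler-dependent):

    #include <iostream>        // searched for in the system directories on the
                               // include search path
    #include "Directory.h"     // typically searched for first in the directory
                               // of the including file, then along the include
                               // search path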

Flags, Options, and Variables

By setting flags, choosing options, and defining variables, the developer can communicate parameter information to the compiler and linker. These parameters control the behavior of the compiler and linker and determine what code is generated and linked.

Flags and options are parameters that affect the behavior of the compiler and linker. Options are predefined choices governing compiler and linker behavior that are either selected (turned on) or not selected (turned off) by the developer. For example, a compiler typically has an option to select the level of warnings and error messages that it produces. One setting of this option causes only the most extreme error messages to be produced and all others to be suppressed, while another causes all errors and severe warnings to be produced but other warnings to be suppressed. The linker may have an option that indicates whether or not it treats duplicate definitions as an error. Flags are parameters that communicate a value other than a selection among predefined choices. For example, the compiler typically has a flag that allows the developer to explicitly state the name of the object file to be produced rather than accept the default name that the compiler would use. An example of a linker flag is the name of a library file that should be searched in resolving external references. The nature and syntax of flags and options are compiler- and system-dependent.

By defining variables the developer can control what is called conditional compilation, which means that some detailed code may be included in or excluded from the compilation depending on the setting of a preprocessor variable. Two common cases of conditional compilation are monitoring code and platform-dependent code. Monitoring code is code inserted in the system during development for later testing or performance analysis. This monitoring code is included in the compilation so that during debugging and tuning it is part of the executable test system. After debugging and tuning, the monitoring code is excluded from the compilation so that the released executable production system does not have the space and time overhead required by the monitoring code. To build a system that runs on multiple platforms it is often necessary to have two or more different versions of some detailed code, each unique to its particular platform, or platform-dependent. For example, a network-communications service or a window-management action might be used in slightly different ways on a Windows95 system than on a Unix system. Through conditional compilation, only the version of the detailed code that is correct for a particular platform is included.

Conditional compilation is achieved by defining and testing preprocessor variables. The preprocessor has a list of variables that during compilation are either defined or undefined. Tests on these variables may be inserted in the source code to cause the preprocessor to include (pass on to the compiler) or exclude (not pass on to the compiler) a section of source code. An outline of an example of conditional compilation is shown in the figure below, where two sections of code are surrounded by preprocessor directives. Each #ifdef directive tests whether the named variable (_WINDOWS_95_ and _UNIX_ in the example) is currently defined. If the variable is defined, then the subsequent lines of source code (up to the matching #endif) are included in the source stream produced by the preprocessor. If the variable is not defined, then those lines are excluded from the source stream.


Conditional Compilation
#ifdef _WINDOWS_95_
... code to include for a Windows95 platform
#endif
#ifdef _UNIX_
...code to include for a Unix platform
#endif
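
The same mechanism also handles the monitoring code described earlier. A minimal sketch, assuming a hypothetical preprocessor variable _MONITOR_ and a hypothetical monitoring statement added to the AskUser() method shown earlier:

    #include <iostream>
    #include "FileChooser.h"

    File FileChooser::AskUser() {
    #ifdef _MONITOR_
        // Monitoring code: included only in test builds and excluded from
        // the released production build.
        std::cerr << "FileChooser::AskUser entered" << std::endl;
    #endif
        //... remainder of the method exactly as shown earlier ...
    }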

The variables tested by the preprocessor can be defined in one of two ways. The first is to use a header file containing one or more #define directives and to include it before any use of the variables in conditional compilation tests. The figure below shows an example of a header file containing a variable definition and the testing of this variable in another file.


Defining and Using Variables for Conditional Compilation
// This is the file Platform.h
#ifndef _PLATFORM_H
#define _PLATFORM_H
...
#define _WIN_95_  // use W'95 platform
...
#endif
#include "Platform.h"
...
#ifdef _WIN_95_
... code for Windows95 platform
#endif
#ifdef _UNIX_
...code for Unix platform
#endif

The second way to define a preprocessor variable is as a compiler flag. The syntax and procedure for defining preprocessor variables as compiler flags are compiler-dependent. The source code that tests the preprocessor variable is independent of how the variable is defined.

Incremental Development

Software development projects of any size are always implemented in a progressive and incremental manner. It is never the case that all of the code is written before any of it is tested, evaluated, and possibly modified to remove errors or to change parts of the overall design. The many small, progressive tasks that define the incremental strategy for a given system are usually planned in advance.

Each step in the incremental development is carefully selected so that it is both testable and minimal. The ability to test each step is necessary to ensure that it is implemented correctly and that it operates correctly with the code already present. There is little point in adding a small bit of code so incomplete that there is no way to test it to determine these properties. At the same time, each step should be the smallest testable incremental addition. If a step is too large (i.e., introduces too much new functionality and code), it becomes difficult to test it as completely as would be advisable.


Some experience is usually required to gain proficiency in identifying a good set of incremental steps. However, once learned, the ability to develop systems in testable, minimal steps will yield numerous advantages.

Incremental development is critical to developing software for the following reasons:

  • Easier testing/debugging: at each step there is less new code introduced, lessening the number of possible new bugs and the number of interactions of the new code with the code already in the system. Since less code is introduced, it is easier to discover where mistakes have been made - there are simply fewer places to look. Minimizing the interactions between new and old code is important, because problems with the code from previous steps may only be revealed by later steps. While this detracts somewhat from the ideal of progressive development because backtracking may be needed, it is still enormously better than the alternative of dealing with all of the code (and all of the bugs and interactions) in one huge, and usually unsuccessful, step.

  • Better risk control: a project may present unusual requirements or involve system services not previously used by members of the development team. In this case, there is a certain amount of risk involved in those parts of the system. By identifying those parts of the system, the team can decide where best to tackle the high-risk parts. At times it may be better to get some of the well-understood parts working and in place first; other times the team might decide to master the high-risk parts first to be certain that unknown difficulties in these parts will not arise late in the project, when changes to large parts of the already developed code might have to be made to accommodate the components that involve the high-risk parts.

  • Better team organization: team members have a better understanding of how their work relates to the work of other team members when incremental, collaborative steps are taken. In some cases, there may be several independent steps that can be taken, allowing the team members to work in parallel. As team members have different skills and interests, it is more likely that a given team member can be assigned tasks for which that team member is particularly skilled.

  • Concrete measurability: the set of completed steps (often termed "milestones") can be shown to project managers as concrete evidence of progress. It is far better to have completed half of the steps and be able to demonstrate a partially functional system than to have half of the code written but not have this code integrated or visibly working.

  • Psychological benefits: team members have a better mental attitude toward the project resulting from the satisfaction that comes with the completion of individual steps or tasks. These small intermediate rewards help to sustain the energy and enthusiasm of the team. A genuine sense of progress and direction results from the incremental approach.

The importance and utility of incremental development is reflected in the broad spectrum of technical, managerial, and organizational effects that flow from its use.

As an example of incremental development, consider part of a graphical editing system. Similar to many common drawing tools, the graphical editing system allows the user to draw, resize, move, and group together a number of basic shapes such as rectangles, circles, and lines. The user may select a color for each shape from a palette of available colors. One incremental development plan for this system is shown in the figure below.


An Incremental Development Plan
  1. draw a single rectangle at a fixed location with a fixed color; no moving, resizing or grouping is allowed
  2. draw a single rectangle at a user-selected location with a fixed color; no moving, resizing or grouping is allowed
  3. draw a single instance of each basic shape at user-selected locations each with a fixed color; no moving, resizing or grouping is allowed
  4. draw a single rectangle at a fixed location with a user-selected color; no moving, resizing or grouping is allowed
  5. draw a single rectangle at a user-selected location with a user-selected color; no moving, resizing or grouping is allowed
  6. (combine steps 3 and 5) draw a single instance of each basic shape at user-selected locations each with a user-selected color; no moving, resizing or grouping is allowed
  7. draw any number of each basic shape at user-selected locations each with a user-selected color; no moving, resizing or grouping is allowed
  8. draw a single rectangle at a fixed location with a fixed color; the rectangle can be moved by the user; no resizing or grouping is allowed
  9. draw a single rectangle at a fixed location with a fixed color; the rectangle can be moved and resized by the user; no grouping is allowed
  10. (combine steps 7 and 9) draw any number of each basic shape at user-selected locations each with a user-selected color; each shape can be moved and resized by the user; no grouping is allowed
  11. draw any number of each basic shape at user-selected locations each with a user-selected color; each shape can be moved and resized by the user; basic shapes may be grouped together (but a group may not be grouped with another basic shape or with other groups)
  12. draw any number of each basic shape at user-selected locations each with a user-selected color; each shape can be moved and resized by the user; basic shapes may be grouped together, a group may be grouped with one or more other basic shapes (but a group may not be grouped with other groups)
  13. draw any number of each basic shape at user-selected locations each with a user-selected color; each shape can be moved, resized, and grouped by the user

The steps shown in the figure represent only one of many good incremental development plans. Notice that each step focuses on adding to the system a specific capability that can be observed and tested. Also notice that new capabilities may be added to a prototype developed in an earlier step and that the prototypes from two steps may be combined.

Tools

Two broad categories of tools are toolkits and integrated development environments (IDEs). A wide range of tools in each category is available from commercial vendors or from public-domain sources.

The toolkit approach presupposes that for each task the user has available a number of different tools from different vendors. The user selects among these tools by weighing such factors as individual preference, previous experience and familiarity with the tools, the cost of each tool, and the availability of the tools on a given platform. For each task the user selects the one tool that best suits the user's requirements. For example, in a standard programming environment the user might need an editor for composing and revising the source text, a compiler and a system for automating the rebuilding process, and a debugger.

The developer might choose to use the editor with which he or she is most familiar and productive, a commercially available compiler and rebuilding tool that produce good error messages and efficient code, and a public-domain debugger that has some novel and needed features not yet included in a production debugger. The advantages of the toolkit approach are:

  • New tools can be added as new tasks become part of the development process.
  • New and more effective tools can replace older, less effective ones without requiring any other tools to change.
  • The user's expertise with common tools (e.g., editors) can be leveraged to avoid unnecessary retraining and to exploit the user's already-developed skills.

The disadvantages of the toolkit approach are:

  • The user interfaces of the toolkit as a whole may be inconsistent and difficult to use. Different tools may have different metaphors and conventions that cause the developer to make mistakes and become frustrated as he or she moves among the different tools.
  • Information loss among the tools is more likely. Because the tools were developed without any awareness of one another, they are not able to communicate among themselves. Any such communication must be done by the developer, often at the expense of productivity. For example, when the user is debugging a file and finds a line of code that needs to be changed, the editor is not aware of which file should be edited (although the debugger knows which file it is working on, it cannot communicate this information to the editor). Similarly, once the editor has changed a source file, the editor has no way of communicating this information to the rebuilding tool.

Toolkit users may feel that they bear too much of the burden of tool evaluation and tool integration and that more of this responsibility should be assumed by the tool developers. However, the tool developers and vendors may rightly claim that their interest is in producing a single tool that is the best of its kind, leaving them no time for the task of integration with other tools, and that there is little or no economic incentive to integrate with other tools, especially if there are a large number of other tools with which the integration might be done.

The integrated development environment (IDE) approach envisions a single, comprehensive system-development facility within which all of the tasks related to the programming, rebuilding, and debugging of a system are conducted. In this approach the user chooses among IDEs, not among individual tools, because the IDE is a single, indivisible utility package. The advantages of the IDE approach are:

  • The user interfaces of the IDE are usually consistent and easier to use because the IDE was developed as a single utility and usually has the same metaphors and conventions across all of its functions.

  • Information loss among the pieces of the environment is minimized. Since the IDE is an integrated set of services, it is possible to build in communication among its components that facilitates user activities. Thus, the debugger can communicate the file being debugged to the editor, which can, in turn, inform the rebuilding tool of any changes made, all without any overt user effort.

The corresponding disadvantages of the IDE approach are:

  • The IDE is a closed environment in that it is usually difficult or impossible for the average user to extend the capability of the environment by including a new tool or feature. In some cases this may pose a considerable dilemma for the user. Suppose that a newly available IDE contains one or a small number of very useful features that are not available in the IDE currently being used - it may be extremely difficult to weigh the benefits of the features in the new IDE against the loss of familiarity and productivity with the existing IDE.

  • The user's existing skill in using a tool is completely lost if that tool is not part of the IDE. For example, a user may have to learn a new editor in order to use a particular IDE. The years of experience, fluency, and productivity that the user has with the existing editor are lost in using an IDE that has a different editor. It is not a question of one editor being better than the other, for the two editors may have equivalent capability. It is a question of the time it takes a user to become equally productive with the new tool.

Users of IDEs may feel that they are held captive to a given IDE because of the high cost that a change to a different IDE entails. Also, the value that a given user or community of users attaches to a particular capability may differ considerably from the value the IDE developer assigns to it.

In the following sections, two different tools are described in conjunction with the study of rebuilding software written in C++: the GNU public-domain toolset and Microsoft's Visual C++ IDE. Each of these has a dedicated user community, is a good representative of its approach, and is a worthy object of study.

 



