Compilers
The study of compilers is a core area of computer science, dealing with practical issues in the design and implementation of programming languages. Narrowly defined, a compiler translates a program written in a programming language (such as Pascal, C or Java) into the low-level instructions that a computer can understand. A compiler is itself a large program, subject to interesting design decisions and software engineering problems. As such, many courses in compilers deal as much with their structure as with their function.
More broadly construed, compilers as a discipline is concerned with the study of software tools whose user interface is a language. Such tools include compilers (as defined narrowly above), compiler generators, interpreters, translators, libraries, assemblers, and runtime support systems.
Compiler construction is a microcosm of computer science. It draws together techniques from many other areas of computer science, including artificial intelligence, algorithms, formal languages, software engineering, systems and architecture, all in the production of a complex software artifact. As such, compiler courses feature in the core of many computer science curricula, since they expose students to a broad swathe of computer science material within a focused application domain.
Compiler construction is challenging and fun. Compilers have real impact on how computers are used, since a compiler is usually the principal tool of the computer programmer. Moreover, advances in operating systems and computer hardware continue to present new challenges in compiling, with improvements in compilers producing real results that can often be quantified and measured. Such concrete feedback is usually a source of great satisfaction to the compiler designer.
Computer science is a rapidly expanding field, and many different types of students find they can enjoy and succeed in its diverse sub-fields. Students wishing to succeed in the area of compilers should enjoy the study of programs, programming languages and the practice of programming. In addition, they should be disposed towards the practical aspects of computing, with an interest in hardware developments and software development tools.
Some students naturally take up the study of compilers because of their love of programming and a desire to understand and improve the tools that programmers use. Others find they are interested in the design and implementation of programming languages in their own right.
Many computer science programs require courses in both compilers and programming languages. Programming languages courses tend to focus more on issues of language design, whereas compiler courses emphasize language implementation and often involve the construction of a project compiler for a small (though usually realistic) programming language. Thus, if you enjoy programming projects, and the application of fundamental principles within a concrete domain, you will find compilers both challenging and interesting. You will gain important knowledge and skills in a compilers course even if your personal interests lie in other areas of computer science.
Compiler professionals are always in demand, since hardware companies are constantly improving existing computer systems, as well as designing new ones. These developments require changes in compilers to adapt them to new hardware features. Moreover, developments in compilers often facilitate changes in hardware design. Similarly, software companies continuously strive to improve the tools they use in software development: tools that draw much of their inspiration from the intrinsic nature of compilers as language translators.
More generally, the skills acquired in the study of compilers can be widely applied to any software artifact that presents a user interface based on language. The programming skills developed in a compilers course are also in high demand. Finally, exposure to the diverse topics covered in a compilers course is valued highly by employers.
Most computer science majors are given the opportunity (if not required) to take an introductory course in compilers. Courses in computer architecture, programming languages, and operating systems are also highly relevant. Advanced courses in compilers present an opportunity to explore more advanced topics, such as program analysis and optimization, memory management (e.g., garbage collection), and software tools.
Every field has its own lingo, and by knowing that lingo, you can demonstrate your knowledge and expertise in a job interview, in the workplace, or in an online discussion with potential colleagues. A few of the most important key words in the field of compilers are presented here, to give you a head start on your future!
Assembler
a computer program that converts assembly code into machine code.
Assembly code
a textual (symbolic) representation of machine code.
Compiler
a computer program that takes as input an executable program and produces as output an equivalent executable program. In a traditional compiler, the input language is a programming language and the output language is either assembly code or machine code for some computer system.
Compile-time
the time at which the code for a program is compiled.
Interpreter
a computer program that takes as input an executable program and produces as output the results of executing that program. Interpreters and compilers have much in common, often performing many of the same tasks in their implementation. For example, both must analyze the source code for errors in either syntax or meaning. However, interpreting the code to produce a result is quite different from emitting a translated program that can be executed to produce the results.
JIT compiler
a JIT ("just-in-time") compiler is a compiler that executes at runtime, compiling the code just before it is executed. JIT compilers generate customized code that capitalizes on facts that cannot be known any earlier.
Linker
a computer program that takes as input several separately compiled fragments of code and produces a standalone executable program as output.
Link-time
the time at which a program is linked from its separately compiled fragments.
Machine code
the low-level instructions that a computer system can directly execute, expressed as binary codes.
Object code
the output of a compiler; usually assembly code or machine code.
Code optimization
the process of analyzing code to discover facts from context and using those facts to improve the code. Compilers use several kinds of analysis to support transformations. For example, data flow analysis involves reasoning, at compile-time, about the flow of values in the code at runtime.
Separate compilation
the act of compiling separate code fragments independently of one another.
Source code
the input to a compiler or interpreter; usually a program expressed in a programming language.
Source-to-source translator
a compiler whose source code and object code are expressed in the same language. Some optimizing compilers are structured this way, with the output code being an optimized form of the input code.
Runtime
the time at which a program executes.
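To make the contrast between the Interpreter and Compiler entries concrete, here is a minimal Python sketch. It assumes a hypothetical expression tree of integer literals and `+`, `-`, `*` operators: `interpret` walks the tree and returns a value directly, while `compile_expr` translates the tree into instructions for a made-up stack machine, and `run` executes that translated program. The tree classes, instruction names, and machine are illustrative inventions, not any real system.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression tree: integer literals and binary operators.
@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def interpret(e: Expr) -> int:
    """Interpreter: walk the tree and produce the result directly."""
    if isinstance(e, Num):
        return e.value
    return OPS[e.op](interpret(e.left), interpret(e.right))


def compile_expr(e: Expr) -> list[tuple]:
    """Compiler: emit stack-machine instructions instead of a result."""
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    # Operands first (post-order), then the operator that consumes them.
    return compile_expr(e.left) + compile_expr(e.right) + [("OP", e.op)]


def run(code: list[tuple]) -> int:
    """Execute a translated program on the toy stack machine."""
    stack: list[int] = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()  # pushed last, so popped first
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()
```

Both paths compute 20 for `BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))`, but only the compiler leaves behind a program (`[("PUSH", 2), ("PUSH", 3), ("OP", "+"), ("PUSH", 4), ("OP", "*")]`) that can be run again without re-examining the source tree.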
People are going to expect you to be able to do certain things well because you have studied compilers. These core skills should help you focus your course work and learning as you progress toward your goals. Take the time to practice so you can exceed the expectations the world has for you as a compilers student!
Principles
compiler construction brings together techniques from disparate parts of computer science. Many of these techniques have a rich basis in theory, drawing inspiration from the theoretical foundations of the field of computer science. Exercising these principles by applying them to the problems of compiler construction illuminates the theory in a very practical way. Compilers students will strengthen their theory skills by putting them into practice.
Data structures
compilers manipulate a diverse range of data structures, and implementing a compiler exposes students to the relative strengths and weaknesses of alternative data structures. Hash tables, linked lists, trees and sparse set implementations feature heavily in the construction of compilers.
Algorithms
compilers embody many different algorithms, from greedy heuristic searches to fixed-point algorithms that reason about program behavior, and others such as theorem provers and algebraic simplifiers, pattern matchers for strings and trees, and graph algorithms. Compilers students put these algorithms into practice, implementing them and grappling with issues of representation and efficiency.
Programming
a skill that too few courses in computer science emphasize. A compilers course is often the first place that a student will be asked to work on a large software project. These projects are often structured such that students can work in teams, requiring project management skills and teamwork to succeed.
Software architecture
a large software program such as a compiler demands careful separation of concerns, so that independent components of the compiler can be worked on in isolation. Modularity of code and interfaces between those modules offer a convenient way to break up the problem of compiler construction. A compiler is also a valuable object of study in its own right, as an example of managing program complexity through modular components.
Software engineering
engineering a compiler demands careful application of software engineering skills to design, specify, code and test the various modules that make up a compiler.
To seek support for your college experiences, and to get a head start on your career, use these links to get connected by learning more about organizations in your discipline. By joining and participating in the professional conversation around the country, you can learn beyond the boundaries of your program. Many of these organizations offer scholarships and awards that can also help you to grow and succeed in your field of study!
Student organizations
www.acm.org/upe/
Upsilon Pi Epsilon, the international honor society for the computing sciences. A member of the Association of College Honor Societies, Upsilon Pi Epsilon comprises over 100 chapters located throughout the United States, Europe, and Asia. This website has been designed to allow the thousands of current members to communicate and find society information, and to allow prospective members to learn what Upsilon Pi Epsilon is all about.
Professional organizations
The Association for Computing Machinery
Founded in 1947, ACM is the world's first educational and scientific computing society. Today, its members (over 80,000 computing professionals and students worldwide) and the public turn to ACM for authoritative publications, pioneering conferences, and visionary leadership for the new millennium.
ACM membership provides opportunities for ongoing learning and professional networking. People join ACM for many different reasons; yet they all share a passion for advancing computing technology and promoting a more responsible world in which to use it.
The IEEE Computer Society
The IEEE Computer Society is the world's oldest and largest professional association of people in computing.
www.sigplan.org/
SIGPLAN is a Special Interest Group of ACM that focuses on programming languages. In particular, SIGPLAN explores their implementation and efficient use. Its members are programming language users, developers, implementers, theoreticians, researchers and educators.
Get Informed!
It's a good idea to read more than what your teachers assign, and to branch out beyond the confines of your program. Many college students report that reading current magazines and journals related to their field helped them when it came time to look for a job after graduation. Every field has numerous publications that offer different perspectives and raise important issues. The links below offer a start on your own journey to get informed! Many of these publications offer discounts to students.
Magazines and Trade Journals
www.acm.org/cacm/
Communications of the ACM (CACM). Published monthly, CACM is a source of general information and news on computing.
www.computer.org/computer/
Computer. Computer is the place where computing professionals of all disciplines can share their experience, solve problems, and reach consensus. In Computer, practitioners, managers, and researchers talk to each other in plain language about what works and what doesn't, what resources are available, and what might be next. Computer is the flagship member publication of the IEEE Computer Society.
Research journals and Academic Publications
www.acm.org/pubs/contents/journals/toplas/
ACM Transactions on Programming Languages and Systems (TOPLAS). The purpose of TOPLAS is to present research results on all aspects of the design, definition, implementation, and use of programming languages and programming systems. The scope of TOPLAS includes: programming languages and their semantics; programming systems (systems to assist the programming task, such as compilers, runtime systems, and language environments); storage allocation and garbage collection; languages and methods for writing specifications; testing and verification methods; and algorithms specifically related to the implementation of language processors.
elvis.rowan.edu/sigplan/
ACM SIGPLAN Notices is an informal monthly publication of the Special Interest Group on Programming Languages (SIGPLAN) of ACM. It incorporates several conference proceedings containing refereed papers.
www.acm.org/pubs/contents/proceedings/series/pldi/
Conference on Programming Language Design and Implementation. The premier forum for advances in compilers and programming language implementation.
www.acm.org/pubs/contents/proceedings/series/oopsla/
Conference on Object-Oriented Programming, Systems, Languages, and Applications. The premier forum for advances in compilation and implementation of object-oriented programming languages.
www.acm.org/pubs/contents/proceedings/series/ismm/
International Symposium on Memory Management. A focused forum for advances in memory management techniques for programming languages.
www.acm.org/pubs/contents/proceedings/series/dynamo/
Workshop on Dynamic and Adaptive Compilation and Optimization. A focused forum for advances in dynamic and adaptive compilation.
This is the place to deepen your knowledge of the field. Whether you are a graduating senior, or still deciding if you want to major in Compilers, you'll find here a more detailed overview of the field.
What are Compilers?
Computers must be programmed to do their job. These programs are written in some programming language: a formal language with mathematical properties and well-defined meanings. A program written in a programming language must be translated before it can execute directly on a computer. This translation is performed by a software system called a compiler. A compiler is itself a computer program that takes as input an executable program and produces as output an equivalent executable program. In a traditional compiler, the input language is the programming language and the output language is either assembly code or machine code for some computer system. Part of the translation process also performs syntax analysis to ensure that the input program is valid.
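The syntax-analysis step can be illustrated with a recursive-descent checker, one common way to recognize a context-free grammar (parser generators are another). The small arithmetic grammar in the docstring and the names `Parser` and `is_valid` are assumptions for illustration; a real compiler would also build a syntax tree and report errors more helpfully.

```python
import re
from typing import Optional


def tokenize(src: str) -> list[str]:
    # Integers become one token; every other non-blank character is its own
    # token, so unexpected symbols reach the parser and are rejected there.
    return re.findall(r"\d+|\S", src)


class Parser:
    """Recursive-descent recognizer for a hypothetical arithmetic grammar:

        expr   -> term (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUMBER | "(" expr ")"
    """

    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse(self) -> None:
        self.expr()
        if self.peek() is not None:  # leftover input is a syntax error
            raise SyntaxError(f"unexpected {self.peek()!r} at token {self.pos}")

    def expr(self) -> None:
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self) -> None:
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self) -> None:
        tok = self.peek()
        if tok is not None and tok.isdecimal():
            self.pos += 1
        elif tok == "(":
            self.pos += 1
            self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at token {self.pos}")
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected {tok!r} at token {self.pos}")


def is_valid(src: str) -> bool:
    """Return True if src is a syntactically valid expression."""
    try:
        Parser(src).parse()
        return True
    except SyntaxError:
        return False
```

For instance, `is_valid("2 * (3 + 4)")` returns `True`, while `is_valid("2 * (3 + ")` and `is_valid("2 + + 3")` return `False`.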
Compilers are designed according to two principles:
- The compiler must preserve the meaning of the program being compiled. That is, the output code must faithfully reproduce the meaning of the source-code program.
- The compiler must improve the source code in some way. A traditional compiler improves the source-code program by producing code that can execute directly on some target machine. Other compilers improve their input in different ways, perhaps by converting it into a convenient form for transmission over the Internet, or by performing transformations on the code such that the program will run more quickly or in less memory.
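Both principles can be seen in a small source-to-source transformation. Assuming Python 3.9 or later (for `ast.unparse`), the sketch below uses the standard `ast` module to fold arithmetic on numeric literals: the output is in the same language as the input, computes the same values (preserving meaning), and does less work at runtime (improving the code). `ConstantFolder` and `fold_source` are hypothetical names, and only `+`, `-`, and `*` on numeric literals are folded.

```python
import ast
import operator

# Operators this sketch knows how to evaluate at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def is_numeric_literal(node: ast.AST) -> bool:
    return isinstance(node, ast.Constant) and isinstance(node.value, (int, float))


class ConstantFolder(ast.NodeTransformer):
    """Replace `literal op literal` with its value, working bottom-up."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first, e.g. (1 + 2) * 4
        fn = FOLDABLE.get(type(node.op))
        if fn and is_numeric_literal(node.left) and is_numeric_literal(node.right):
            folded = ast.Constant(fn(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_source(src: str) -> str:
    """Parse Python source, fold constants, and print the result as Python source."""
    tree = ConstantFolder().visit(ast.parse(src))
    return ast.unparse(ast.fix_missing_locations(tree))
```

`fold_source("x = 2 * 3 + y")` yields `x = 6 + y`; the addition involving the variable `y` is left alone because its value is not known at compile time.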
Compilers must also satisfy a number of constraints, depending on the circumstances and goals of their use:
- Speed. Runtime performance of the code generated by the compiler is often of paramount importance, since the value of many computations depends on how quickly they deliver their results. Intensive computational and simulation problems in science and engineering are good examples of applications that demand speed.
- Space. Embedded systems outnumber general-purpose computers many times over. They are constrained to operate with tight bounds on the size of the compiled code and data, because of physical or economic factors in their design, such as the need to restrict power consumption or to ensure low-cost production. Compiling for such systems poses unique challenges.
- Error detection/correction. A compiler must accurately report errors in the input program back to the user. Some compilers work hard to correct errors in the input so as to avoid aborting the compilation.
- Debugging. Optimizing compilers can output code that differs wildly from the source-code program. Debugging optimized code is difficult if the mapping from the object code back to the source code cannot be maintained, so radically transformed code may be harder to debug than unoptimized code. Compilers must therefore be able to generate (potentially slower) code that interacts more cleanly with software development tools such as debuggers and programming environments.
- Compile-time efficiency. A slow compiler becomes a bottleneck in the software development cycle, because programmers want to see the immediate effects of the changes they make to a program. Users sensitive to the speed of the output code may be prepared to accept slow compile times, but a fast compiler that produces the same results as a slower one is always better.
Each of these (sometimes contradictory) constraints leaves room for many different choices in the design and implementation of a compiler. They influence the principles, techniques, data structures, and algorithms used to build each and every compiler, all of which present fascinating and challenging problems for the compiler implementer to solve.
The term "compiler" was coined in the early 1950s by one of the pioneers of computing, Grace Murray Hopper. Translation was then viewed as the "compilation" of a sequence of subprograms selected from a library. Compilation, as we now know it, was then called "automatic programming", and there was widespread skepticism that it could ever be achieved. Today, automatic translation of programming languages is well established, but programming language translators are still called compilers.
Among the first real compilers were the FORTRAN compilers of the late 1950s. They presented the programmer with a problem-oriented, largely machine-independent source language and performed some optimizations to produce efficient machine code, so as to compete with hand-coded assembly language. Their success paved the way for the development of high-level (that is, less machine-dependent) compiled programming languages.
Early compilers such as those for FORTRAN were complex and largely unprincipled constructions: the first FORTRAN compiler took 18 person-years to build! Building a compiler was a complex and costly task, with components and techniques devised "on the fly" as the compiler was being built. Today compiler construction techniques are well understood, so much so that students can complete the construction of a simple compiler in a one-semester course.
Many of the principles and techniques used in compiler construction have their roots in the understanding of mathematical formulations of regular and context-free languages. These mathematical principles were laid down in a very productive period in the 1960s and 1970s, and form the backbone of modern compiler construction. Particular success was found in the application of theoretical ideas to automate the construction of compiler components from high-level specifications of the syntax and semantics of programming languages. Formalization and automation have been less successful in providing a universal framework for compiler construction, so a number of the essential components of a compiler still yield only to ad hoc methods. In this, compiler construction remains as much an art guided by experience as a science driven by theory. Pushing the barrier dividing art from science is the subject of much current research.
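Regular languages are what make scanner generation automatic: a tool reads token definitions written as regular expressions and produces a scanner. The sketch below imitates that idea with Python's `re` module, assuming a hypothetical token specification. Note two simplifications: `re` is a backtracking matcher rather than the finite automaton a tool such as lex would build, and alternatives are tried in order instead of by longest match.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token specification: (name, regular expression), tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),   # whitespace is recognized but discarded
    ("ERROR", r"."),    # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def scan(src: str) -> Iterator[Token]:
    """Yield tokens from src, driven entirely by TOKEN_SPEC."""
    for m in MASTER.finditer(src):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {m.group()!r} at offset {m.start()}")
        yield Token(kind, m.group())
```

`list(scan("x = 42 + y1"))` produces `IDENT`, `OP`, `NUMBER`, `OP`, `IDENT` tokens, and a stray character such as `$` raises `ValueError`.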
Compilers Today
Compiler technology is today being applied in new and exciting settings, which often diverge from traditional notions of compilation. The Java programming language has spawned interest in programming applications across the Internet, and renewed interest in techniques such as "just-in-time" (JIT) compilation. Java applets are transmitted across the Internet in a hardware-independent internal form, called Java bytecodes; these are then interpreted, or compiled, loaded, and executed on the target machine. The performance of the application that uses the applet depends on the time it takes to go from bytecodes on a remote computer system to a complete execution on the target machine. Microsoft's .NET initiative embodies similar technology. JIT compilation also permits the generation of customized code that exploits runtime-dependent information that cannot be known any earlier. If JIT compile times can be kept small and the benefits are large, then the improvements can be significant.
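The idea of exploiting information that cannot be known any earlier can be shown with a toy runtime specializer. Once the exponent `n` becomes known at runtime, `specialize_power` (a hypothetical helper) generates Python source for `x**n` with the multiplications unrolled and compiles it with the built-in `compile` and `exec`. This is only an analogy: Python compiles to bytecode that is still interpreted, whereas a real JIT compiler emits machine code and must weigh its own compile time against the speedup, as the paragraph above notes.

```python
from typing import Callable


def specialize_power(n: int) -> Callable[[float], float]:
    """Generate and compile a version of x**n specialized for one exponent n.

    The exponent is only known at runtime, so the code is built at runtime:
    the loop over n is unrolled into a fixed chain of multiplications.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative exponents")
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power(x):\n    return {body}\n"
    namespace: dict = {}
    exec(compile(source, f"<power_{n}>", "exec"), namespace)
    return namespace["power"]


def generic_power(x: float, n: int) -> float:
    """Unspecialized version: loops over n on every call."""
    result = 1
    for _ in range(n):
        result *= x
    return result
```

`specialize_power(3)` returns a function whose body is `return x * x * x`, so calling it with `2` gives `8` without any loop or exponent test at call time.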
Follow these links to connect with a world of information about the field of Compilers!
To get a better idea of what to expect in class, follow these links to sample course syllabi from colleges around the country.
Harvard: CS153: Principles of Programming Language Compilation
Michigan State: CSE450: Translation of Programming Languages
Princeton: CS320: Compiling Techniques
Purdue: CS502: Compiling and Programming Systems
Rice: Comp412: Compiler Construction
Berkeley: CS164: Programming Languages and Compilers
Berkeley: CS264: Implementation of Programming Languages
Berkeley: CS265: Advanced Programming Language Implementation
Massachusetts at Amherst: CS610: Introduction to Compilers
Massachusetts at Amherst: CS710: Advanced Translator Design
Texas at Austin: CS375: Compilers
U. of Washington: CSE401: Introduction to Compiler Construction
U. of Wisconsin: CS536: Introduction to Programming Languages and Compilers
The Teaching About Programming Languages Project
Check out these sites to get a great head start on your next research project in Compilers!
Resources for Programming Language Research