-
Notifications
You must be signed in to change notification settings - Fork 33
Expand file tree
/
Copy pathwhat-is-a-compiler.scrbl
More file actions
228 lines (169 loc) · 5.92 KB
/
what-is-a-compiler.scrbl
File metadata and controls
228 lines (169 loc) · 5.92 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
#lang scribble/manual
@require[scriblib/footnote]
@define-footnote[footnote make-footnote]
@title[#:tag "Intro"]{What @emph{is} a Compiler?}
A function that maps an @emph{input} string to an @emph{output}
string.
@verbatim{
compiler : String -> String
}
Typically, the @emph{input} and @emph{output} strings are “programs”
@verbatim{
compiler : SourceProgram -> TargetProgram
}
For example, here are some well known compilers:
@verbatim{
gcc, clang : C -> Binary (* a.out, .exe *)
ghc : Haskell -> Binary
javac : Java -> JvmByteCode (* .class *)
scalac : Scala -> JvmByteCode
ocamlc : Ocaml -> OcamlByteCode (* .cmo *)
ocamlopt : Ocaml -> Binary
gwt : Java -> JavaScript (* .js *)
v8 : JavaScript -> Binary
nasm : x86 -> Binary
pdftex : LaTeX -> PDF
pandoc : Markdown -> PDF | Html | Doc
}
Key Requirements on output program:
@itemlist[#:style 'ordered
@item{Has the same @emph{meaning} (“semantics”) as input,}
@item{Is @emph{executable} in relevant @emph{context} (VM,
microprocessor, web browser).}
]
@section{A Bit of History}
Compilers were invented to @link["http://worrydream.com/dbx/"]{avoid writing machine code by hand}.
@image{img/binary-soap-fortran.png}
From Binary to FORTRAN
Richard Hamming – The Art of Doing Science and Engineering, p25:
@para{@italic{In the beginning we programmed in absolute binary… Finally, a
Symbolic Assembly Program was devised – after more years than you are
apt to believe during which most programmers continued their heroic
absolute binary programming. At the time [the assembler] first
appeared I would guess about 1% of the older programmers were
interested in it – using [assembly] was “sissy stuff”, and a real
programmer would not stoop to wasting machine capacity to do the
assembly.}}
John A.N. Lee, Dept of Computer Science, Virginia Polytechnical Institute
@para{@italic{One of von Neumann’s students at Princeton recalled that
graduate students were being used to hand assemble programs into
binary for their early machine. This student took time out to build an
assembler, but when von Neumann found out about it he was very angry,
saying that it was a waste of a valuable scientific computing
instrument to use it to do clerical work.}}
@section[#:tag "What does a Compiler look like?"]{What does a Compiler @emph{look} like?}
@image{img/compiler-pipeline.png}
Compiler Pipeline
An input source program is converted to an executable binary in many stages:
@itemlist[
@item{@bold{Parsed} into a data structure called an @bold{Abstract Syntax Tree}}
@item{@bold{Checked} to make sure code is well-formed (and well-typed)}
@item{@bold{Simplified} into some convenient @bold{Intermediate Representation}}
@item{@bold{Optimized} into (equivalent) but faster program}
@item{@bold{Generated} into assembly x86}
@item{@bold{Linked} against a run-time (usually written in C)}
]
@section{What is CMSC 430?}
@itemlist[
@item{A bridge between two worlds
@itemlist[
@item{@emph{High-level programming languages}: OCaml (CMSC 330)}
@item{@emph{Machine code}: x86/ARM (CMSC 216)}
]}
]
A sequel to both those classes.
@itemlist[
@item{How to write @bold{a compiler} for @tt{MiniRacket -> x86}
@itemlist[#:style 'ordered
@item{Parsing}
@item{Checking & Validation}
@item{Simplification & Normalizing}
@item{Optimization}
@item{Code Generation}
]}
@item{But also, how to write @bold{complex programs}
@itemlist[#:style 'ordered
@item{Design}
@item{Implement}
@item{Test}
@item{@bold{Iterate}}
]
}
]
@section{How to write a Compiler?}
General recipe, applies to any large system
@itemlist[
@item{@emph{gradually, one feature at a time!}}
]
We will
@itemlist[
@item{Step 1 Start with a teeny tiny language,}
@item{Step 2 Build a full compiler for it,}
@item{Step 3 Add a few features,}
@item{Go to Step 2.}
]
(Yes, loops forever, but we will hit Ctrl-C in 15 weeks...)
@section{Mechanics}
See @secref{Syllabus}.
@section{What will @emph{we} do?}
Write @emph{a compiler} for @tt{MiniRacket -> x86}
But the Eiffel Tower wasn’t built in a day … and neither is any serious software.
@image{img/Eiffel.jpg}
So we will write @emph{many} compilers:
@itemlist[
@item{Numbers and increment/decrement}
@item{Local Variables}
@item{Nested Binary Operations}
@item{Booleans, Branches and Dynamic Types}
@item{Functions}
@item{Tuples and Structures}
@item{Lambdas and closures}
@item{Types and Inference}
@item{Garbage Collection}
]
@section{What will @emph{you} learn?}
@bold{Core principles of compiler construction}
@itemlist[
@item{Managing Stacks & Heap}
@item{Type Checking}
@item{Intermediate forms}
@item{Optimization}
]
@bold{Several new languages}
@itemlist[
@item{Racket to write the compiler}
@item{C to write the “run-time”}
@item{x86 compilation target}
]
@bold{@italic{More importantly} how to write a large program}
@itemlist[
@item{How to use types for design}
@item{How to add new features / refactor}
@item{How to test & validate}
]
@section{What do you @emph{need to know}?}
This 430 depends very heavily on CMSC 330.
@itemlist[
@item{Familiarity with Functional Programming and Ocaml}
@item{Datatypes (e.g. Lists, Trees, ADTs)}
@item{Polymorphism}
@item{Recursion}
@item{Higher-order functions (e.g. map, filter, fold)}
]
Also @bold{depends on} CMSC 216
@itemlist[
@item{Experience with some C programming}
@item{Experience with some assembly (x86)}
]
@section{A few words on the medium of instruction}
We will use @link["https://racket-lang.org/"]{Racket} which, for our
purposes is like Ocaml but with nicer syntax.@footnote{To start a good
flamewar just post this anywhere online where Haskell and OCaml folks
can see it.}
Racket has many advanced features beyond what you saw in 330, but we
won't be using them; in the few cases we do, I’ll explain them as we
go.
@make-footnote[]
@;section{Our metalanguage: Racket}
@;section{Our target language: x86}
@;section{Our object language(s): Core Racket}