
Commit 1fd89be

WIP: start of notes on Outlaw.
1 parent 912e2bd commit 1fd89be


2 files changed (+393, -0 lines)


www/notes.scrbl

Lines changed: 1 addition & 0 deletions
@@ -33,4 +33,5 @@ suggestions for improving the material, @bold{please},
 @include-section{notes/mug.scrbl}
 @include-section{notes/mountebank.scrbl}
 @include-section{notes/neerdowell.scrbl}
+@include-section{notes/outlaw.scrbl}
 @;include-section{notes/shakedown.scrbl}

www/notes/outlaw.scrbl

Lines changed: 392 additions & 0 deletions
@@ -0,0 +1,392 @@
#lang scribble/manual

@(require (for-label (except-in racket compile ... struct?) a86))
@(require redex/pict
          racket/runtime-path
          scribble/examples
          "utils.rkt"
          "ev.rkt"
          "../fancyverb.rkt"
          "../utils.rkt")

@(define codeblock-include (make-codeblock-include #'h))

@(define (shellbox . s)
   (parameterize ([current-directory (build-path notes "outlaw")])
     (filebox (emph "shell")
              (fancyverbatim "fish" (apply shell s)))))

@(require (for-syntax "../utils.rkt" racket/base "utils.rkt"))
@(define-syntax (shell-expand stx)
   (syntax-case stx ()
     [(_ s ...)
      (parameterize ([current-directory (build-path notes "outlaw")])
        (begin (apply shell (syntax->datum #'(s ...)))
               #'(void)))]))

@;{ Have to generate a-whole.rkt before listing it below.}
@(shell-expand "racket -t combine.rkt -m a.rkt > a-whole.rkt")

@(ev '(require rackunit a86))
@(ev `(current-directory ,(path->string (build-path notes "outlaw"))))
@(void (ev '(with-output-to-string (thunk (system "make runtime.o")))))
@(void (ev '(current-objs '("runtime.o"))))
@(for-each (λ (f) (ev `(require (file ,f))))
           '(#;"interp.rkt" "compile.rkt" "compile-expr.rkt" "compile-literals.rkt" "compile-datum.rkt" "utils.rkt" "ast.rkt" "parse.rkt" "types.rkt" "unload-bits-asm.rkt"))

@(define this-lang "Outlaw")

@title[#:tag this-lang]{@|this-lang|: self-hosting}

@src-code[this-lang]

@emph{The king is dead, long live the king!}

@table-of-contents[]

@section[#:tag-prefix "outlaw"]{Bootstrapping the compiler}

Take stock for a moment of the various language features we've built
over the course of these notes and assignments: we've built a
high-level language with built-in data types like booleans, integers,
characters, pairs, lists, strings, symbols, vectors, and boxes. Users can
define functions, including recursive functions. Functions are
themselves values and can be constructed anonymously with
@racket[lambda]. We added basic I/O facilities. We added the ability
to overload functions based on the number of arguments received using
@racket[case-lambda], the ability to define variable-arity functions
using rest arguments, and the ability to call functions with arguments
from a list using @racket[apply]. Users can define their own
structure types and use pattern matching to destructure values.
Memory management is done automatically by the run-time system.

It's a pretty full-featured language and there are lots of interesting
programs we could write in our language. One of the programs we could
@emph{almost} write is actually the compiler itself. In this section,
let's bridge the gap between the features of Racket our compiler uses
and those that our compiler implements, and then explore some of the
consequences.

We'll call the resulting language @bold{Outlaw}.

@section[#:tag-prefix "outlaw"]{Features used by the Compiler}

Let's take a moment to consider all of the language features we
@emph{use} in our compiler source code but haven't yet
implemented. Open up the source code for, e.g., @secref{Neerdowell},
and see what you notice:

@itemlist[

@item{Modules: programs are not monolithic; they are broken into
@bold{modules} in separate files like @tt{compile-stdin.rkt},
@tt{parse.rkt}, @tt{compile.rkt}, etc.}

@item{a86: our compiler relies heavily on the @secref{a86} library,
which provides all of the constructors for a86 instructions and
functions like @racket[asm-display] for printing a86 instructions
using NASM syntax.}

@item{Higher-level I/O: at the heart of the front-end of our compiler
is Racket's @racket[read] function, which reads in an
s-expression. We also use functions like @racket[read-line], which reads
in a line of text and returns it as a string.}

@item{Lots and lots of Racket functions: our compiler makes use of
lots of built-in Racket functions that we haven't implemented. These
are things like @racket[length], @racket[map], @racket[foldr],
@racket[filter], etc. Even some of the functions we have implemented
have more featureful counterparts in Racket, which we use. For
example, our @racket[+] primitive takes two arguments, while Racket's
@racket[+] function can take any number of arguments.}

@item{Primitives as functions: the previous item brings up an
important distinction between our language and Racket. For us,
things like @racket[+] are @bold{primitives}. Primitives are
@emph{not} values. You can't return a primitive from a function. You
can't make a list of primitives. This means that even if we had a
@racket[map] function, you couldn't pass @racket[add1] as an argument,
since @racket[add1] is not a value. In Racket, there's really no such
thing as a primitive; things like @racket[add1], @racket[+],
@racket[cons?], etc. are all just functions.}

]
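
To see the last two items concretely, here is a small Racket snippet (an
illustration only, not part of the compiler). The first two expressions
rely on Racket treating @racket[+] as a variable-arity function and
@racket[add1] as an ordinary value. The last one shows the workaround our
language would need, assuming hypothetically that a @racket[map] function
were available: wrap the primitive in a @racket[lambda] so that there is a
function value to pass.

@#reader scribble/comment-reader
(racketblock
;; Racket's + accepts any number of arguments:
(+ 1 2 3)                        ; => 6
;; In Racket, add1 is an ordinary function value:
(map add1 '(1 2 3))              ; => '(2 3 4)
;; In our language add1 is a primitive, not a value, so it would
;; have to be wrapped in a lambda before it could be passed around:
(map (λ (x) (add1 x)) '(1 2 3))  ; => '(2 3 4)
)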

If we want our compiler to be written in the language it implements, we
have to deal with this gap in some way. For each difference between
what we implement and what we use, we basically only have two ways to
proceed:

@itemlist[#:style 'ordered

@item{rewrite our compiler source code to @emph{not} use
that feature, or}

@item{implement it.}
]

Let's take some of these in turn.

@section[#:tag-prefix "outlaw"]{Punting on Modules}

Our compiler currently works by compiling a whole program, which we
assume is given all at once as input to the compiler. The compiler
source code, on the other hand, is sensibly broken into separate
modules.

We @emph{could} think about designing a module system for our
language. We'd have to think about how separate compilation of
modules would work. At a minimum, our compiler would have to deal with
resolving module references made through @racket[require].

While module systems are a fascinating and worthy topic of study, we
don't really have the time to do them justice, so we'll opt to
punt on the module system. Instead, we can rewrite the compiler source
code as a single monolithic source file.

That's not a very good software engineering practice and it will be a
bit of a pain to maintain the complete @this-lang source file. As a
slight improvement, we can write a little utility program that, given a
file containing a module, will recursively follow all @racket[require]d
files and print out a single, @racket[require]-free program that
includes all of the modules that comprise the program.

Let's see an example of the @tt{combine.rkt} utility in action.

Suppose we have a program that consists of the following files:

@codeblock-include["outlaw/a.rkt"]
@codeblock-include["outlaw/b.rkt"]
@codeblock-include["outlaw/c.rkt"]

Then we can combine these files into a single program
as follows:

@shellbox["racket -t combine.rkt -m a.rkt > a-whole.rkt"]

@codeblock-include["outlaw/a-whole.rkt"]
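
For the curious, here is one way such a utility could be written. This is
a hypothetical sketch, not the actual @tt{combine.rkt}. It assumes that
every file begins with a single @tt{#lang racket} line, that local modules
are required by relative string paths resolved against the current
directory, and that library requires (anything that is not a string path)
and @racket[provide] forms can simply be dropped. Dependencies are emitted
before the code that requires them, and a set of visited files keeps each
module from being included twice.

@#reader scribble/comment-reader
(racketblock
(require racket/match racket/list racket/set racket/pretty)
(provide main)

;; String -> (Listof S-Expr)
;; Read every form in file, skipping its first line.
;; Assumption: the first line is exactly "#lang racket".
(define (read-forms file)
  (with-input-from-file file
    (λ ()
      (read-line)
      (let loop ()
        (define x (read))
        (if (eof-object? x)
            '()
            (cons x (loop)))))))

;; String -> (Listof S-Expr)
;; Inline file and, recursively, every module it requires by string
;; path, emitting dependencies first and each file at most once.
(define (combine file)
  (define seen (mutable-set))
  (define (visit f)
    (cond
      [(set-member? seen f) '()]
      [else
       (set-add! seen f)
       (append*
        (for/list ([form (read-forms f)])
          (match form
            ;; replace (require "x.rkt" ...) with the contents of x.rkt;
            ;; non-string (library) specs are dropped
            [(cons 'require specs)
             (append* (map visit (filter string? specs)))]
            ;; provides mean nothing once everything is in one file
            [(cons 'provide _) '()]
            [_ (list form)])))]))
  (visit file))

;; String -> Void
;; Print the combined, require-free program.
;; Assumption: the combined program is also written in #lang racket.
(define (main file)
  (displayln "#lang racket")
  (for-each pretty-write (combine file)))
)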

This gives us a rudimentary way of combining modules into a single
program that can be compiled with our compiler. The idea is that
we construct a single source file for our compiler by running
@tt{combine.rkt} on @tt{compile-stdin.rkt}. The resulting file will
be self-contained and include everything @tt{compile-stdin.rkt}
depends upon.

It's worth recognizing that this isn't a realistic alternative to
having a module system. In particular, combining modules in this way
breaks the usual abstractions provided by modules. For example, it's
common for modules to define their own helper functions or stateful
data that are not exported (via @racket[provide]) outside the module.
This ensures that clients of the module cannot access potentially
sensitive data or operations or mess with invariants maintained by the
module's exports. Our crude combination tool does nothing to enforce
these abstraction barriers.
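
Here is a small, hypothetical illustration. Suppose a module (call it
@tt{gen.rkt}; the name is made up for this example) provides only a
@racket[next-id] function and keeps its counter private. After combining,
the counter is just another top-level definition, so any other code in the
combined file can reset it and break the invariant that identifiers are
unique:

@#reader scribble/comment-reader
(racketblock
;; --- formerly gen.rkt (hypothetical), which provided only next-id ---
(define counter (box 0)) ; meant to be private to gen.rkt
(define (next-id)
  (set-box! counter (add1 (unbox counter)))
  (unbox counter))

;; --- formerly a client module ---
(next-id)             ; => 1
(next-id)             ; => 2
(set-box! counter 0)  ; nothing stops this once the modules are combined
(next-id)             ; => 1 again: ids are no longer unique
)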

That's an OK compromise to make for now. The idea is that
@tt{combine.rkt} doesn't have to work @emph{in general} for combining
programs in a meaning-preserving way. It just needs to work for one
specific program: our compiler.

@section[#:tag-prefix "outlaw"]{Bare-bones a86}

Our compiler makes heavy use of the @secref{a86} library, which provides
all of the constructors for a86 instructions and functions like
@racket[asm-display] for printing a86 instructions using NASM syntax.
That library is part of the @tt{langs} package.

At its core, the library provides structures for representing a86
instructions and some operations that work on instructions. While the
library has a bunch of functionality that provides good, early
error checking when you construct an instruction or a whole a86
program, we really only need its structures and the operations on them.

To make the compiler self-contained, we can build our own bare-bones
version of the a86 library and include it in the compiler.

For example, here's the module that defines an AST for a86 instructions:

@codeblock-include["outlaw/a86/ast.rkt"]

And here's the module that implements the operations needed for
writing out instructions in NASM syntax:

@codeblock-include["outlaw/a86/printer.rkt"]

OK, so now we've made a86 a self-contained part of the compiler.
The code consists of a large AST definition and some functions that
operate on the a86 AST data type. The printer makes use of some Racket
functions we haven't used before, like @racket[system-type] and
@racket[number->string], and also some other high-level I/O functions
like @racket[write-string]. We'll have to deal with these features,
so while we crossed one item off our list (a86), we added a few more,
hopefully smaller, problems to solve.
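
To give a feel for why these functions show up, here is a drastically
simplified, hypothetical AST and printer covering just four instructions.
It is not the code in @tt{outlaw/a86/}: registers are represented as
symbols, operands are limited to registers and integers, and it assumes
(as a guess at why the printer consults @racket[system-type]) that label
names need a leading underscore on macOS.

@#reader scribble/comment-reader
(racketblock
(require racket/match)

;; A tiny, hypothetical subset of the a86 AST
(struct Label (x))      ; x : Symbol
(struct Mov (dst src))  ; dst : Register, src : (U Register Integer)
(struct Add (dst src))
(struct Ret ())

;; Symbol -> String
;; Assumption: macOS object files expect a leading underscore on labels.
(define (label->string s)
  (if (eq? (system-type 'os) 'macosx)
      (string-append "_" (symbol->string s))
      (symbol->string s)))

;; (U Symbol Integer) -> String
(define (arg->string a)
  (if (integer? a) (number->string a) (symbol->string a)))

;; Instruction -> String
(define (instr->string i)
  (match i
    [(Label x) (string-append (label->string x) ":")]
    [(Mov d s) (string-append "\tmov " (arg->string d) ", " (arg->string s))]
    [(Add d s) (string-append "\tadd " (arg->string d) ", " (arg->string s))]
    [(Ret)     "\tret"]))

;; (Listof Instruction) -> Void
;; Write each instruction on its own line to the current output port.
(define (print-instrs is)
  (for-each (λ (i) (write-string (instr->string i)) (newline)) is))
)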

@section[#:tag-prefix "outlaw"]{Racket functions, more I/O, and primitives}

We identified three more gaps between our compiler's implementation
language and its implemented language: lots of Racket functions like
@racket[length], @racket[map], etc.; more I/O functions that operate
at a higher level than our @racket[write-byte] and @racket[read-byte],
such as @racket[write-string], @racket[read], @racket[read-line],
etc.; and finally the issue that primitives are not values.
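
As a taste of how higher-level I/O could be layered on what we already
have, here is a hypothetical, simplified @racket[write-string] (named
@racket[write-string/ascii] to avoid confusion with Racket's). It only
handles ASCII strings, and unlike Racket's @racket[write-string] it
returns void rather than the number of characters written. It assumes our
language provides @racket[string-length], @racket[string-ref],
@racket[char->integer], @racket[=], and @racket[void] in addition to
@racket[write-byte].

@#reader scribble/comment-reader
(racketblock
;; String -> Void
;; Write each character of s as a single byte.
;; Assumption: every character in s is ASCII (code point below 128).
(define (write-string/ascii s)
  (write-chars-from s 0))

;; String Natural -> Void
;; Write the characters of s starting at index i.
(define (write-chars-from s i)
  (if (= i (string-length s))
      (void)
      (begin (write-byte (char->integer (string-ref s i)))
             (write-chars-from s (add1 i)))))
)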

There are many ways we could proceed from here. We could, for
example, spend some time adding new primitives to our compiler
that implement all the missing functionality like @racket[length],
@racket[write-string], and others.

Let's consider adding a @racket[length] primitive. It's not terribly
difficult. We could add a unary operation called @racket['length],
which would emit the following code:

@#reader scribble/comment-reader
(racketblock
;; assume list is in rax
(let ((done (gensym 'done))
      (loop (gensym 'loop)))
  (seq (Mov r8 0)                  ; count = 0
       (Label loop)
       (Cmp rax (imm->bits '()))   ; if empty, done
       (Je done)
       (assert-cons rax)           ; otherwise, should be a cons
       (Xor rax type-cons)         ; untag the pointer
       (Mov rax (Offset rax 0))    ; move cdr into rax
       (Add r8 (imm->bits 1))      ; increment count
       (Jmp loop)                  ; loop
       (Label done)
       (Mov rax r8)))              ; return count
)
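
If the assembly is hard to follow, it may help to compare it with an
accumulator-style Racket function that takes the same steps. This is an
illustrative sketch, not code from the compiler: @racket[count] plays the
role of @racket['r8] and @racket[xs] the role of @racket['rax], and a
non-list raises a @racket[match] error where the assembly would signal an
error through @racket[assert-cons].

@#reader scribble/comment-reader
(racketblock
(require racket/match)

;; List -> Natural
;; Same loop as the assembly above: count starts at 0 (r8),
;; and each iteration follows the cdr (rax) and adds one.
(define (length/loop xs)
  (let loop ([xs xs] [count 0])
    (match xs
      ['() count]                                   ; empty: return count
      [(cons _ rest) (loop rest (add1 count))])))   ; follow cdr, count + 1
)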

We can play around and make sure this assembly code is actually
computing the length of the list in @racket['rax]:

@(void (ev '(current-objs '())))

@#reader scribble/comment-reader
(ex
(require neerdowell/parse
         neerdowell/compile-datum
         neerdowell/compile-ops
         neerdowell/types)
(require a86)

;; Datum -> Natural
;; Computes the length of d in assembly
(define (length/asm d)
  (bits->value
   (asm-interp
    (seq (Global 'entry)
         (Label 'entry)
         (compile-datum d)
         ; assume list is in rax
         (let ((done (gensym 'done))
               (loop (gensym 'loop)))
           (seq (Mov r8 0)                  ; count = 0
                (Label loop)
                (Cmp rax (imm->bits '()))   ; if empty, done
                (Je done)
                (assert-cons rax)           ; otherwise, should be a cons
                (Xor rax type-cons)         ; untag the pointer
                (Mov rax (Offset rax 0))    ; move cdr into rax
                (Add r8 (imm->bits 1))      ; increment count
                (Jmp loop)                  ; loop
                (Label done)
                (Mov rax r8)))              ; return count
         (Ret)
         (Label 'raise_error_align)         ; dummy version, returns -1
         (Mov rax -1)
         (Ret)))))

(length/asm '())
(length/asm '(1 2 3))
(length/asm '(1 2 3 4 5 6))
)

Looks good.

Alternatively, instead of a primitive, we could add a @racket[length]
@emph{function} by creating a static function value and binding it to
the variable @racket[length]. The code for the function would
essentially be the same as the primitive above:

@#reader scribble/comment-reader
(racketblock
(seq (Data)
     (Label 'length_func)      ; the length closure
     (Dq 'length_code)         ; points to the length code
     (Text)
     (Label 'length_code)      ; code for length
     (Cmp r15 1)               ; expects 1 arg
     (Jne 'raise_error_align)
     (Pop rax)                 ; the argument (a list)
     ; ... length code from above
     (Add rsp 8)               ; pop off function
     (Ret))
)

The @racket[compile] function could push the binding for
@racket[length] (and potentially other built-in functions) onto the
stack before executing the instructions of the program, which would be
compiled in an environment that includes @racket['length]. This would
effectively solve the problem for @racket[length].

We'd have to do something similar for @racket[map], @racket[foldr],
@racket[memq], and everything else we need.

The @emph{problem} with this approach is that we will be spending a bunch of
time writing lots and lots of assembly code, an activity we had hoped
to avoid by building a high-level programming language! Even worse,
some of the functions we'd like to add, e.g. @racket[map], will be
much more complicated to write in assembly than @racket[length].

But here's the thing. Consider a Racket definition of @racket[length]:

@#reader scribble/comment-reader
(racketblock
(define (length xs)
  (match xs
    ['() 0]
    [(cons _ xs) (add1 (length xs))]))
)

Note that this definition uses only features of the language we've built.
Instead of writing the assembly code for @racket[length], we could write a
definition in @this-lang and simply compile it to obtain assembly code
that implements a @racket[length] function.

Many of the functions we need in the compiler can be built up this
way. Instead of spending our time writing and debugging assembly
code, which is difficult to do, we can simply write some Racket code.

With this, we will introduce a @bold{standard library}. The idea is that
the standard library, like the run-time system, is a bundle of code that
will accompany every executable; it will provide a set of built-in functions,
and the compiler will be updated to compile programs in the environment of
everything provided by the standard library.
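
For instance, list functions like @racket[map], @racket[foldr], and
@racket[filter] can be written in the same style as @racket[length] above,
using only @racket[define], @racket[match], recursion, and first-class
functions. The definitions below are a hypothetical sketch of what such a
standard library might contain, not its actual contents. They also run
as-is in a @tt{#lang racket} module, where @racket[match] is available.
Since primitives are not values, a caller would pass a @racket[lambda]
such as @racket[(λ (x) (add1 x))] rather than @racket[add1] itself.

@#reader scribble/comment-reader
(racketblock
;; (α -> β) (Listof α) -> (Listof β)
(define (map f xs)
  (match xs
    ['() '()]
    [(cons x xs) (cons (f x) (map f xs))]))

;; (α β -> β) β (Listof α) -> β
(define (foldr f b xs)
  (match xs
    ['() b]
    [(cons x xs) (f x (foldr f b xs))]))

;; (α -> Boolean) (Listof α) -> (Listof α)
(define (filter p? xs)
  (match xs
    ['() '()]
    [(cons x xs)
     (if (p? x)
         (cons x (filter p? xs))
         (filter p? xs))]))
)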

@section[#:tag-prefix "outlaw"]{Building a standard library}

...

@section[#:tag-prefix "outlaw"]{Parsing primitives, revisited}

...

@section[#:tag-prefix "outlaw"]{A few more primitives}

...

@section[#:tag-prefix "outlaw"]{Dealing with I/O}

...

@section[#:tag-prefix "outlaw"]{Putting it all together}

...
