
Commit 54827d1

authored
Merge pull request #32 from dvanhorn/next
Next
2 parents 99c74ed + 9022f49 commit 54827d1

9 files changed

Lines changed: 396 additions & 15 deletions


www/assignments/5.scrbl

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ two booleans are equal.}
105105
106106
@section[#:tag-prefix "a5-" #:style 'unnumbered]{Extending your Parser, yet again!}
107107
108-
@bold{CHANGE:} There have been a couple of ommissions in the grammar
108+
@bold{CHANGE:} There have been a couple of omissions in the grammar
109109
and the code given to you for the parser. The grammar has been
110110
(hopefully) fixed and I have decided to release the code for the
111111
parser, so you shouldn't have to make changes to it. If you have

www/notes/hustle.scrbl

Lines changed: 260 additions & 4 deletions
@@ -7,6 +7,7 @@
77
"hustle/semantics.rkt"
88
"utils.rkt"
99
"ev.rkt"
10+
"../fancyverb.rkt"
1011
"../utils.rkt")
1112

1213
@(define codeblock-include (make-codeblock-include #'h))
@@ -16,20 +17,275 @@
1617

1718
@title[#:tag "Hustle"]{Hustle: heaps and lists}
1819

19-
@;codeblock-include["hustle/ast.rkt"]
20+
21+
@emph{A little and a little, collected together, become a great deal;
22+
the heap in the barn consists of single grains, and drop and drop
23+
makes an inundation.}
24+
25+
@table-of-contents[]
26+
27+
@section{Inductive data}
28+
29+
So far all of the data we have considered can fit in a single machine
30+
word (64 bits). Well, integers can't, but we truncated them and only
31+
consider, by fiat, those integers that fit into a register.
32+
33+
In the @bold{Hustle} language, we will add two @bold{inductively
34+
defined data types}, boxes and pairs, which will require us to relax
35+
this restriction.
36+
37+
Boxes are like unary pairs: they simply hold a value, which can be
38+
projected out. Pairs hold two values, each of which can be projected out.
39+
40+
The new operations include constructors @racket[(box _e)] and
41+
@racket[(cons _e0 _e1)] and projections @racket[(unbox _e)],
42+
@racket[(car _e)], and @racket[(cdr _e)].
43+
44+
@margin-note{Usually boxes are @emph{mutable} data structures, like
45+
OCaml's @tt{ref} type, but we will examine this aspect later. For now,
46+
we treat boxes as immutable data structures.}
47+
48+
These features will operate like their Racket counterparts:
49+
@ex[
50+
(unbox (box 7))
51+
(car (cons 3 4))
52+
(cdr (cons 3 4))
53+
]
54+
55+
56+
We use the following grammar for Hustle:
2057

2158
@centered[(render-language H)]
2259

60+
We can model this as an AST data type:
61+
62+
@codeblock-include["hustle/ast.rkt"]
2363

2464
@section{Meaning of Hustle programs}
2565

66+
The meaning of Hustle programs is just a slight update to Grift
67+
programs: we add a few new primitives.
68+
69+
The update to the semantics is just an extension of the semantics of
70+
primitives:
71+
2672
@(judgment-form-cases #f)
2773

28-
@centered[(render-judgment-form 𝑯-𝒆𝒏𝒗)]
74+
@;centered[(render-judgment-form 𝑯-𝒆𝒏𝒗)]
2975

3076
@centered[(render-metafunction 𝑯-𝒑𝒓𝒊𝒎 #:contract? #t)]
3177

78+
The interpreter similarly has an update to the @racket[interp-prim]
79+
function. The relevant bits of
80+
@link["hustle/interp.rkt"]{@tt{interp.rkt}} are shown:
81+
82+
@#reader scribble/comment-reader
83+
(racketblock
84+
;; Any -> Boolean
85+
(define (prim? x)
86+
(and (symbol? x)
87+
(memq x '(add1 sub1 + - zero?
88+
;; New
89+
box unbox cons car cdr))))
90+
91+
;; Prim [Listof Answer] -> Answer
92+
(define (interp-prim p as)
93+
(match (cons p as)
94+
[(list p (? value?) ... 'err _ ...) 'err]
95+
[(list 'add1 (? integer? i0)) (+ i0 1)]
96+
[(list 'sub1 (? integer? i0)) (- i0 1)]
97+
[(list 'zero? (? integer? i0)) (zero? i0)]
98+
[(list '+ (? integer? i0) (? integer? i1)) (+ i0 i1)]
99+
[(list '- (? integer? i0) (? integer? i1)) (- i0 i1)]
100+
;; New for Hustle
101+
[(list 'box v0) (box v0)]
102+
[(list 'unbox (? box? v0)) (unbox v0)]
103+
[(list 'cons v0 v1) (cons v0 v1)]
104+
[(list 'car (cons v0 v1)) v0]
105+
[(list 'cdr (cons v0 v1)) v1]
106+
[_ 'err]))
107+
)
108+
109+
Inductively defined data is easy to model in the semantics and
110+
interpreter because we can rely on inductively defined data at the
111+
meta-level in math or Racket, respectively.
112+
113+
The real trickiness comes when we want to model such data in an
114+
impoverished setting that doesn't have such things, which of course is
115+
the case in assembly.
116+
117+
The problem is that a value such as @racket[(box _v)] has a value
118+
inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has
119+
@emph{two} values inside it. If each value is represented with 64
120+
bits, it would seem a pair takes @emph{at a minimum} 128 bits to
121+
represent (plus we need some bits to indicate this value is a pair).
122+
What's worse, those @racket[_v0] and @racket[_v1] may themselves be
123+
pairs or boxes. The great power of inductive data is that an
124+
arbitrarily large piece of data can be constructed. But it would seem
125+
impossible to represent each piece of data with a fixed set of bits.
126+
127+
The solution is to @bold{allocate} such data in memory, which can in
128+
principle be arbitrarily large, and use a @bold{pointer} to refer to
129+
the place in memory that contains the data.
130+
131+
@;{ Really deserves a "bit" level interpreter to bring this idea across. }
132+
133+
134+
@;codeblock-include["hustle/interp.rkt"]
135+
136+
@section{A Compiler for Hustle}
137+
138+
The first thing to do is make another distinction in the kinds of values
139+
in our language. Up until now, each value could be represented in a
140+
register. We now call such values @bold{immediate} values.
141+
142+
We introduce a new category of values which are @bold{pointer} values.
143+
We will (for now) have two types of pointer values: boxes and pairs.
144+
145+
So we now have a kind of hierarchy of values:
146+
147+
@verbatim{
148+
- values
149+
+ pointers (non-zero in last 3 bits)
150+
* boxes
151+
* pairs
152+
+ immediates (zero in last three bits)
153+
* integers
154+
* characters
155+
* booleans
156+
* ...
157+
}
158+
159+
We will represent this hierarchy by shifting all the immediates over 3
160+
bits and using the lower 3 bits to tag things as either being
161+
immediate (tagged @code[#:lang "racket"]{#b000}) or a box or pair.
162+
To recover an immediate value, we just shift back to the right 3 bits.
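To make the immediate encoding concrete, here is a small C sketch (C being the language of the run-time system). It follows the simplified description above, assuming integers are the only immediates and are shifted by exactly 3 bits; the compiler's own `imm-shift` may reserve further bits to distinguish integers from characters and booleans. The names `SKETCH_IMM_SHIFT`, `encode_int`, `is_immediate`, and `decode_int` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified scheme from the prose: tag bits #b000, 3-bit shift. */
#define SKETCH_IMM_SHIFT 3
#define TAG_MASK ((int64_t)0x7) /* #b111 */

/* Encode an integer as an immediate: shift left 3, leaving #b000 tag bits. */
static int64_t encode_int(int64_t i) {
  return (int64_t)((uint64_t)i << SKETCH_IMM_SHIFT);
}

/* An immediate is any value whose low 3 bits are #b000. */
static int is_immediate(int64_t v) {
  return (v & TAG_MASK) == 0;
}

/* Decode by shifting back right 3 bits (arithmetic shift keeps the sign
   on gcc/clang; right-shifting negatives is implementation-defined in C). */
static int64_t decode_int(int64_t v) {
  assert(is_immediate(v));
  return v >> SKETCH_IMM_SHIFT;
}
```

Because the tag bits of an encoded integer are zero, adding two encoded integers yields the encoding of their sum (barring overflow), so `+` can operate on encoded values without untagging them first.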
163+
164+
The pointer types will be tagged in the lowest three bits. A box
165+
value is tagged @code[#:lang "racket"]{#b001} and a pair is tagged
166+
@code[#:lang "racket"]{#b010}. The remaining 61 bits will hold a
167+
pointer, i.e. an integer denoting an address in memory.
168+
169+
The idea is that the values contained within a box or pair will be
170+
located in memory at this address. If the pointer is a box pointer,
171+
reading 64 bits from that location in memory will produce the boxed
172+
value. If the pointer is a pair pointer, reading the first 64 bits
173+
from that location in memory will produce one of the values in the pair
174+
and reading the next 64 bits will produce the other. In other words,
175+
constructors allocate and initialize memory. Projections dereference
176+
memory.
177+
178+
The representation of pointers will follow a slightly different scheme
179+
than that used for immediates. Let's first talk a bit about memory
180+
and addresses.
181+
182+
A memory location is represented (of course, it's all we have!) as a
183+
number. The number refers to some address in memory. On an x86
184+
machine, memory is @bold{byte-addressable}, which means each address
185+
refers to a 1-byte (8-bit) segment of memory. If you have an address
186+
and you add 1 to it, you are referring to memory starting 8 bits from the
187+
original address.
188+
189+
We will make a simplifying assumption and always store things in
190+
memory in multiples of 64-bit chunks. So to go from one memory
191+
address to the next @bold{word} of memory, we need to add 8 (1-byte
192+
times 8 = 64 bits) to the address.
193+
194+
What is 8 in binary? @code[#:lang "racket"]{#b1000}
195+
196+
What's nice about this is that if we start from a memory location that
197+
is ``word-aligned,'' i.e. it ends in @code[#:lang "racket"]{#b000},
198+
then every subsequent word address also ends in @code[#:lang "racket"]{#b000}.
199+
200+
What this means is that @emph{every} address we'd like to represent
201+
has @code[#:lang "racket"]{#b000} in its least significant bits. We
202+
can therefore freely use these three bits to tag the type of the
203+
pointer @emph{without needing to shift the address around}. If we
204+
have a box pointer, we can simply zero out the box type tag to obtain
205+
the address of the box's contents. Likewise with pairs.
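The following C sketch shows the same trick on real addresses: a word-aligned address is tagged by OR-ing in the tag, and the address is recovered by masking the tag bits off. The tag values come from the text above; the helper names `tag_ptr`, `tag_of`, and `untag_ptr` are hypothetical. The compiled code erases a tag with `xor`, which is equivalent when the tag is already known; masking with `~TAG_MASK` works for any tag.

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_BOX  ((uint64_t)0x1) /* #b001 */
#define TYPE_PAIR ((uint64_t)0x2) /* #b010 */
#define TAG_MASK  ((uint64_t)0x7) /* #b111 */

/* Tag a word-aligned address. Its low 3 bits are zero, so the tag can be
   OR-ed in directly without shifting the address. */
static uint64_t tag_ptr(const int64_t *addr, uint64_t tag) {
  uint64_t a = (uint64_t)(uintptr_t)addr;
  assert((a & TAG_MASK) == 0); /* word-aligned addresses end in #b000 */
  return a | tag;
}

/* Read the type tag from the low 3 bits. */
static uint64_t tag_of(uint64_t v) {
  return v & TAG_MASK;
}

/* Zero out the tag bits to recover the original address. */
static int64_t *untag_ptr(uint64_t v) {
  return (int64_t *)(uintptr_t)(v & ~TAG_MASK);
}
```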
206+
207+
208+
We use a register, @racket['rdi], to hold the address of the next free
209+
location in memory. To allocate memory, we simply increment
210+
the content of @racket['rdi] by a multiple of 8. To initialize the
211+
memory, we just write into the memory at that location. To construct a
212+
pair or box value, we just tag the unused bits of the address.
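This allocation protocol can be modeled in C by letting a struct field stand in for `rdi`. This is only a sketch: `Heap`, `alloc_box`, and `alloc_pair` are hypothetical names, the values written are assumed to be already-encoded words, and the `limit` bounds check is an addition of this sketch that the compiled code does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_BOX  ((uint64_t)0x1)
#define TYPE_PAIR ((uint64_t)0x2)
#define TAG_MASK  ((uint64_t)0x7)

/* Hypothetical model of the heap state: `next` plays the role of rdi. */
typedef struct {
  int64_t *next;  /* next free word (word-aligned) */
  int64_t *limit; /* one past the last usable word (not checked by compiled code) */
} Heap;

/* (box v): write v, tag the address, bump by one word. */
static uint64_t alloc_box(Heap *h, int64_t v) {
  assert(h->limit - h->next >= 1);
  int64_t *addr = h->next;
  addr[0] = v;                                 /* mov (offset rdi 0) rax */
  h->next += 1;                                /* add rdi 8 */
  return (uint64_t)(uintptr_t)addr | TYPE_BOX; /* mov rax rdi; or rax type-box */
}

/* (cons v0 v1): car in the first word, cdr in the second, bump by two words. */
static uint64_t alloc_pair(Heap *h, int64_t car, int64_t cdr) {
  assert(h->limit - h->next >= 2);
  int64_t *addr = h->next;
  addr[0] = car;                                /* (offset rdi 0) */
  addr[1] = cdr;                                /* (offset rdi 1), i.e. 8 bytes later */
  h->next += 2;                                 /* add rdi 16 */
  return (uint64_t)(uintptr_t)addr | TYPE_PAIR; /* or rax type-pair */
}
```

Each constructor returns a tagged pointer to freshly initialized memory and leaves `next` pointing just past it, so successive allocations never overlap.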
213+
214+
So for example the following creates a box containing the value 7:
215+
216+
@#reader scribble/comment-reader
217+
(racketblock
218+
`((mov rax ,(arithmetic-shift 7 imm-shift))
219+
(mov (offset rdi 0) rax) ; write '7' into address held by rdi
220+
(mov rax rdi) ; copy pointer into return register
221+
(or rax ,type-box) ; tag pointer as a box
222+
(add rdi 8)) ; advance rdi one word
223+
)
224+
225+
If @racket['rax] holds a box value, we can ``unbox'' it by erasing the
226+
box tag, leaving just the address of the box contents, then
227+
dereferencing the memory:
228+
229+
@#reader scribble/comment-reader
230+
(racketblock
231+
`((xor rax ,type-box) ; erase the box tag
232+
(mov rax (offset rax 0))) ; load memory into rax
233+
)
234+
235+
Pairs are similar. Suppose we want to make @racket[(cons 3 4)]:
236+
237+
@#reader scribble/comment-reader
238+
(racketblock
239+
`((mov rax ,(arithmetic-shift 3 imm-shift))
240+
(mov (offset rdi 0) rax) ; write '3' into address held by rdi
241+
(mov rax ,(arithmetic-shift 4 imm-shift))
242+
(mov (offset rdi 1) rax) ; write '4' into word after address held by rdi
243+
(mov rax rdi) ; copy pointer into return register
244+
(or rax ,type-pair) ; tag pointer as a pair
245+
(add rdi 16)) ; advance rdi 2 words
246+
)
247+
248+
If @racket['rax] holds a pair value, we can project out the elements
249+
by erasing the pair tag, leaving just the address of the pair contents,
250+
then dereferencing either the first or second word of memory:
251+
252+
@#reader scribble/comment-reader
253+
(racketblock
254+
`((xor rax ,type-pair) ; erase the pair tag
255+
(mov rax (offset rax 0)) ; load car into rax
256+
(mov rax (offset rax 1))) ; or... load cdr into rax
257+
)
258+
259+
From here, writing the compiler for @racket[box], @racket[unbox],
260+
@racket[cons], @racket[car], and @racket[cdr] is just a matter of
261+
putting together pieces we've already seen such as evaluating multiple
262+
subexpressions and type tag checking before doing projections.
263+
264+
265+
266+
The complete compiler is given below.
267+
268+
@codeblock-include["hustle/compile.rkt"]
269+
270+
@section{A Run-Time for Hustle}
271+
272+
The run-time system for Hustle is more involved for two main reasons:
273+
274+
The first is that the compiler relies on a pointer to free memory
275+
residing in @racket['rdi]. The run-time system will be responsible
276+
for allocating this memory and initializing the @racket['rdi]
277+
register. To allocate memory, it uses @tt{malloc}. It passes the
278+
pointer returned by @tt{malloc} to the @tt{entry} function. The
279+
protocol for calling functions in C says that the first argument will
280+
be passed in the @racket['rdi] register. Since @tt{malloc} produces
281+
16-byte aligned addresses on 64-bit machines, @racket['rdi] is
282+
initialized with an address that ends in @code[#:lang
283+
"racket"]{#b000}, satisfying our assumption about addresses.
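A minimal C sketch of this heap set-up, assuming a hypothetical heap size `HEAP_WORDS` and a hypothetical helper `make_heap`. The C standard guarantees that `malloc` returns memory suitably aligned for any fundamental type, so on x86-64 the low three bits of the result are zero, which is exactly the property the tagging scheme relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_WORDS 10000 /* hypothetical heap size, in 64-bit words */

/* Allocate the heap whose address the run-time hands to the compiled code. */
static int64_t *make_heap(void) {
  int64_t *heap = malloc(HEAP_WORDS * sizeof(int64_t));
  if (heap == NULL) return NULL;
  /* malloc's result is aligned for any fundamental type, so it ends in #b000 */
  assert(((uintptr_t)heap & 0x7) == 0);
  return heap;
}
```

In `main.c` this pointer is passed as `entry(heap)`, arriving in `rdi` per the C calling convention, and is released with `free(heap)` once the result has been printed.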
284+
285+
The second complication comes from printing. Now that values include
286+
inductively defined data, the printer must recursively traverse these
287+
values to print them.
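To illustrate the recursive traversal, here is a C sketch of a printer over the tagged representation. It assumes the simplified encoding used in these notes (3-bit tags, integers as the only immediates, no empty list) and prints boxes as `#&v` and pairs in fully dotted `(a . b)` form, in the style of Racket; the actual printer in `main.c` may format values differently. `Out`, `emit`, and `print_value` are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TYPE_IMM  ((uint64_t)0x0)
#define TYPE_BOX  ((uint64_t)0x1)
#define TYPE_PAIR ((uint64_t)0x2)
#define TAG_MASK  ((uint64_t)0x7)
#define SKETCH_IMM_SHIFT 3 /* simplified: integers are the only immediates */

/* Bounded output buffer; invariant: len < cap and buf is NUL-terminated. */
typedef struct {
  char *buf;
  size_t cap;
  size_t len;
} Out;

/* Append s to the buffer, truncating if it does not fit. */
static void emit(Out *o, const char *s) {
  assert(o->len < o->cap);
  size_t n = strlen(s);
  if (n > o->cap - o->len - 1) n = o->cap - o->len - 1;
  memcpy(o->buf + o->len, s, n);
  o->len += n;
  o->buf[o->len] = '\0';
}

/* Print v by dispatching on its tag and recurring into boxes and pairs. */
static void print_value(Out *o, uint64_t v) {
  const int64_t *cell = (const int64_t *)(uintptr_t)(v & ~TAG_MASK);
  char num[32];
  switch (v & TAG_MASK) {
  case TYPE_IMM:
    /* arithmetic right shift restores the sign (as on gcc/clang) */
    snprintf(num, sizeof num, "%" PRId64, (int64_t)v >> SKETCH_IMM_SHIFT);
    emit(o, num);
    break;
  case TYPE_BOX:
    emit(o, "#&");
    print_value(o, (uint64_t)cell[0]);
    break;
  case TYPE_PAIR:
    emit(o, "(");
    print_value(o, (uint64_t)cell[0]); /* car */
    emit(o, " . ");
    print_value(o, (uint64_t)cell[1]); /* cdr */
    emit(o, ")");
    break;
  default:
    emit(o, "#<unknown>");
  }
}
```

Each pointer case recurs on the words it points to, so the recursion depth follows the nesting depth of the data.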
32288

33-
@codeblock-include["hustle/interp.rkt"]
289+
The complete run-time system is below.
34290

35-
@;codeblock-include["hustle/compile.rkt"]
291+
@filebox-include[fancy-c "hustle/main.c"]

www/notes/hustle/ast.rkt

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
#lang racket
2+
;; type Expr =
3+
;; | Integer
4+
;; | Boolean
5+
;; | Variable
6+
;; | (list Prim1 Expr)
7+
;; | (list Prim2 Expr Expr)
8+
;; | `(if ,Expr ,Expr ,Expr)
9+
;; | `(let ((,Variable ,Expr)) ,Expr)
10+
11+
;; type Prim1 =
12+
;; | 'add1 | 'sub1 | 'zero?
13+
;; | 'box | 'unbox | 'car | 'cdr
14+
;; type Prim2 =
15+
;; | '+ | '- | 'cons
16+
17+
;; type Variable = Symbol (except 'add1 'sub1 'if, etc.)

www/notes/hustle/compile.rkt

Lines changed: 1 addition & 2 deletions
@@ -1,7 +1,7 @@
11
#lang racket
22
(provide (all-defined-out))
33

4-
;; An immediate is anything ending in #b0000
4+
;; An immediate is anything ending in #b000
55
;; All other tags in mask #b111 are pointers
66

77
(define result-shift 3)
@@ -180,7 +180,6 @@
180180
,@assert-integer
181181
(add rax (offset rsp ,(- (add1 (length c))))))))
182182

183-
184183
;; Expr Expr CEnv -> Asm
185184
(define (compile-- e0 e1 c)
186185
(let ((c1 (compile-e e1 c))

www/notes/hustle/interp.rkt

Lines changed: 7 additions & 2 deletions
@@ -3,10 +3,10 @@
33

44
;; type Value =
55
;; ....
6+
;; | (box Value)
67
;; | '()
78
;; | (cons Value Value)
89

9-
1010
;; Expr -> Answer
1111
(define (interp e)
1212
(interp-env e '()))
@@ -45,7 +45,9 @@
4545
;; Any -> Boolean
4646
(define (prim? x)
4747
(and (symbol? x)
48-
(memq x '(add1 sub1 + - zero? cons car cdr))))
48+
(memq x '(add1 sub1 + - zero?
49+
;; New for Hustle
50+
box unbox cons car cdr))))
4951

5052
;; Any -> Boolean
5153
(define (value? x)
@@ -65,6 +67,9 @@
6567
[(list 'zero? (? integer? i0)) (zero? i0)]
6668
[(list '+ (? integer? i0) (? integer? i1)) (+ i0 i1)]
6769
[(list '- (? integer? i0) (? integer? i1)) (- i0 i1)]
70+
;; New for Hustle
71+
[(list 'box v0) (box v0)]
72+
[(list 'unbox (? box? v0)) (unbox v0)]
6873
[(list 'cons v0 v1) (cons v0 v1)]
6974
[(list 'car (cons v0 v1)) v0]
7075
[(list 'cdr (cons v0 v1)) v1]

www/notes/hustle/main.c

Lines changed: 1 addition & 2 deletions
@@ -29,6 +29,7 @@ int main(int argc, char** argv) {
2929
int64_t result = entry(heap);
3030
print_result(result);
3131
printf("\n");
32+
free(heap);
3233
return 0;
3334
}
3435

@@ -95,5 +96,3 @@ void print_pair(int64_t a) {
9596
print_result(cdr);
9697
}
9798
}
98-
99-
