Commit 5bb67e1

Working in literals to Hustle notes.
1 parent 9e2d321 commit 5bb67e1

1 file changed

+188
-17
lines changed

www/notes/hustle.scrbl

Lines changed: 188 additions & 17 deletions
@@ -1,6 +1,6 @@
11
#lang scribble/manual
22

3-
@(require (for-label (except-in racket ... compile) a86))
3+
@(require (for-label (except-in racket ... compile) (except-in a86 exp)))
44
@(require redex/pict
55
racket/runtime-path
66
scribble/examples
@@ -13,7 +13,7 @@
1313

1414
@(define codeblock-include (make-codeblock-include #'h))
1515

16-
@(ev '(require rackunit a86))
16+
@(ev '(require rackunit (except-in a86 exp)))
1717
@(ev `(current-directory ,(path->string (build-path langs "hustle"))))
1818
@(void (ev '(with-output-to-string (thunk (system "make runtime.o")))))
1919
@(for-each (λ (f) (ev `(require (file ,f))))
@@ -70,10 +70,6 @@ The new operations include constructors @racket[(box _e)] and
7070
predicates for identifying boxes and pairs: @racket[(box? _e)] and
7171
@racket[(cons? _e)].
7272

73-
@margin-note{Usually boxes are @emph{mutable} data structures, like
74-
OCaml's @tt{ref} type, but we will examine this aspect later. For now,
75-
we treat boxes as immutable data structures.}
76-
7773
These features will operate like their Racket counterparts:
7874
@ex[
7975
(unbox (box 7))
@@ -85,6 +81,24 @@ These features will operate like their Racket counterparts:
8581
(cons? (box 7))
8682
]
8783

84+
@margin-note{Usually boxes are @emph{mutable} data structures, like
85+
OCaml's @tt{ref} type, but we will examine this aspect later. For now,
86+
we treat boxes as immutable data structures.}
87+
88+
We will also add support for writing pair and box @emph{literals}
89+
using the same @racket[quote] notation that Racket uses.
90+
91+
These literals will operate like their Racket counterparts:
92+
@ex[
93+
(unbox '#&7)
94+
(car '(3 . 4))
95+
(cdr '(3 . 4))
96+
(box? '#&7)
97+
(cons? '(3 . 4))
98+
(box? '(3 . 4))
99+
(cons? '#&7)
100+
]
101+
88102
@section{Empty lists can be all and end all}
89103

90104
While we've introduced pairs, you may wonder what about @emph{lists}?
@@ -105,31 +119,188 @@ We use the following AST data type for @|this-lang|:
105119
@filebox-include-fake[codeblock "hustle/ast.rkt"]{
106120
#lang racket
107121
;; type Expr = ... | (Lit Datum)
108-
;; type Datum = ... | '()
122+
;; type Datum = ... | (cons Datum Datum) | (box Datum) | '()
109123
;; type Op1 = ... | 'box | 'car | 'cdr | 'unbox | 'box? | 'cons?
110124
;; type Op2 = ... | 'cons
111125
}
112126

127+
@section{Parsing}
128+
129+
Mostly the parser updates for @|this-lang| are uninteresting. The
130+
only slight twist is the addition of compound literal datums.
131+
132+
It's worth observing a few things about how @racket[quote] works in
133+
Racket. First, some datums are @emph{self-quoting}, i.e. we can
134+
write them with or without quoting and they mean the same thing:
135+
@ex[
136+
5
137+
'5]
138+
139+
All of the datums considered prior to @|this-lang| have been self-quoting:
140+
booleans, integers, and characters.
141+
142+
Of the new datums, boxes are self-quoting, but pairs and the empty
143+
list are not.
144+
@ex[
145+
#&7
146+
'#&7
147+
(eval:error ())
148+
'()
149+
(eval:error (1 . 2))
150+
'(1 . 2)]
151+
152+
The reason for this is that unquoted list datums would be confused
153+
with expression forms without the @racket[quote], so it's required;
154+
however, for the other datums, there's no possible confusion and the
155+
@racket[quote] is inferred. Note also that once inside a self-quoting
156+
datum, it's unambiguous that we're talking about literal data and not
157+
expressions that need to be evaluated, so you can have empty lists and
158+
pairs:
159+
@ex[
160+
#&()
161+
#&(1 . 2)]
162+
163+
This gives rise to two notions of datums that our parser uses,
164+
with (mutually defined) predicates for each:
165+
166+
@filebox-include-fake[codeblock "hustle/parse.rkt"]{
167+
;; Any -> Boolean
168+
(define (self-quoting-datum? x)
169+
(or (exact-integer? x)
170+
(boolean? x)
171+
(char? x)
172+
(and (box? x) (datum? (unbox x)))))
173+
174+
;; Any -> Boolean
175+
(define (datum? x)
176+
(or (self-quoting-datum? x)
177+
(empty? x)
178+
(and (cons? x) (datum? (car x)) (datum? (cdr x)))))
179+
}
180+
181+
Now when the parser encounters something that is a self-quoting datum,
182+
it can parse it as a @racket[Lit]. But for datums that are quoted, it
183+
will need to recognize the @racket[quote] form, so anything that has
184+
the s-expression shape @racket[(quote d)] will also get parsed as a
185+
@racket[Lit].
186+
187+
Things can get a little confusing here so let's look at some examples:
188+
@ex[
189+
(parse 5)
190+
(parse '5)
191+
]
192+
193+
Here, both examples are really the same. When we write @racket['5],
194+
Racket @racket[read]s it as @racket[5], so this is really the same
195+
example and corresponds to an input program that just contains the
196+
number @racket[5] and we are calling @racket[parse] with an argument
197+
of @racket[5].
198+
199+
If the input program contained a quoted @racket[5], then it would be
200+
@racket['5], which we would represent as an s-expression as
201+
@racket[''5]. Note that this reads as @racket['(quote 5)], i.e. a
202+
two-element list with the symbol @racket['quote] as the first element
203+
and the number @racket[5] as the second. So when writing examples
204+
where the input program itself uses @racket[quote] we will see this
205+
kind of double quotation, and we are calling @racket[parse] with
206+
a two-element list as the argument:
207+
208+
@ex[
209+
(parse ''5)]
210+
211+
This is saying that the input program was @racket['5]. Notice that it
212+
gets parsed the same as @racket[5] by our parser.
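
The double quotation can be checked directly in Racket, independent of the parser. The following is a small sketch (the name `program` is only illustrative) showing that `''5` is a two-element list whose first element is the symbol `quote`:

```racket
#lang racket
;; The s-expression for an input program that consists of '5.
(define program ''5)

;; It is an ordinary two-element list...
(define is-list? (list? program))
;; ...whose first element is the symbol quote
(define head (first program))
;; ...and whose second element is the number 5.
(define datum (second program))

;; It is the same value as building the list by hand.
(define same? (equal? program (list 'quote 5)))
```

By contrast, `'5` and `5` are the same value, so a parser receiving either one cannot tell them apart.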
213+
214+
If we were to parse the empty list, this should be considered a parse
215+
error because it's like writing @racket[()] in Racket; it's not a valid
216+
expression form:
217+
218+
@ex[
219+
(eval:error (parse '()))]
220+
221+
However, if the empty list is quoted, i.e. @racket[''()], then we are
222+
talking about the expression @racket['()], so this gets parsed as
223+
@racket[(Lit '())]:
224+
225+
@ex[
226+
(parse ''())]
227+
228+
It works similarly for pairs:
229+
230+
@ex[
231+
(eval:error (parse '(1 . 2)))
232+
(parse ''(1 . 2))]
233+
234+
While these examples can be a bit confusing at first, implementing
235+
this behavior is pretty simple. If the input is a
236+
@racket[self-quoting-datum?], then we parse it as a @racket[Lit]
237+
containing that datum. If the input is a two-element list of the
238+
form @racket[(list 'quote _d)] and @racket[_d] is a @racket[datum?],
239+
then we parse it as a @racket[Lit] containing @racket[_d].
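
The two cases just described can be sketched as follows. This is a minimal illustration, not the contents of `hustle/parse.rkt`: the function name `parse-literal` and the local `Lit` struct definition are assumptions made so the example is self-contained, and only the literal cases of the parser are covered.

```racket
#lang racket
;; Hypothetical stand-in for the Lit node defined in hustle/ast.rkt.
(struct Lit (d) #:transparent)

;; Any -> Boolean
(define (self-quoting-datum? x)
  (or (exact-integer? x)
      (boolean? x)
      (char? x)
      (and (box? x) (datum? (unbox x)))))

;; Any -> Boolean
(define (datum? x)
  (or (self-quoting-datum? x)
      (empty? x)
      (and (cons? x) (datum? (car x)) (datum? (cdr x)))))

;; S-Expr -> Lit
;; Handles only literals; anything else is reported as a parse error.
(define (parse-literal s)
  (match s
    ;; Self-quoting datums need no quote.
    [(? self-quoting-datum?) (Lit s)]
    ;; Explicitly quoted datums have the shape (quote d).
    [(list 'quote (? datum? d)) (Lit d)]
    [_ (error "parse error: not a literal" s)]))
```

With this sketch, `(parse-literal 5)` and `(parse-literal ''5)` both produce `(Lit 5)`, while `(parse-literal '())` signals an error and `(parse-literal ''())` produces `(Lit '())`.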
240+
241+
Note that @emph{if} the examples are confusing, the parser actually
242+
explains what's going on in Racket. Somewhere down in the code that
243+
implements @racket[read] is something equivalent to what we've done
244+
here in @racket[parse] for handling self-quoting and explicitly quoted
245+
datums. Also note that after the parsing phase, self-quoting and
246+
quoted datums are unified as @racket[Lit]s and we no longer need to be
247+
concerned with any distinctions that existed in the concrete syntax.
248+
249+
The only other changes to the parser are that we've added some new
250+
unary and binary primitive names that the parser now recognizes for
251+
things like @racket[cons], @racket[car], @racket[cons?], etc.
252+
253+
@codeblock-include["hustle/parse.rkt"]
254+
255+
256+
257+
113258
@section{Meaning of @this-lang programs, implicitly}
114259

115-
The interpreter has an update to the @racket[interp-prim]
116-
module:
260+
To extend our interpreter, we can follow the same pattern we've been
261+
following so far. We have new kinds of values such as pairs, boxes,
262+
and the empty list, so we have to think about how to represent them,
263+
but the natural thing to do is to represent them with the
264+
corresponding kind of value from Racket. Just as we represent Hustle
265+
booleans with Racket booleans, Hustle integers with Racket integers,
266+
and so on, we can also represent Hustle pairs with Racket pairs. We
267+
can represent Hustle boxes with Racket boxes. We can represent
268+
Hustle's empty list with Racket's empty list.
269+
270+
Under this choice of representation, there's very little to do in
271+
the interpreter. We only need to update the interpretation of
272+
primitives to account for our new primitives such as @racket[cons],
273+
@racket[car], etc. And how should these primitives be interpreted?
274+
Using their Racket counterparts of course!
117275

118276
@codeblock-include["hustle/interp-prim.rkt"]
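
As a rough sketch of what "using their Racket counterparts" means, the new primitive cases might look like the following. This is not the included file: only the Hustle-specific operations are shown, and returning `'err` for ill-typed arguments is an assumption about how errors are signalled.

```racket
#lang racket
;; Hypothetical sketch: only the new Hustle primitives are handled here.

;; Op1 Value -> Value or 'err (the 'err result is an assumption)
(define (interp-prim1 op v)
  (match (list op v)
    [(list 'box v)          (box v)]
    [(list 'unbox (? box?)) (unbox v)]
    [(list 'car (? pair?))  (car v)]
    [(list 'cdr (? pair?))  (cdr v)]
    [(list 'box? v)         (box? v)]
    [(list 'cons? v)        (cons? v)]
    [_ 'err]))

;; Op2 Value Value -> Value or 'err
(define (interp-prim2 op v1 v2)
  (match (list op v1 v2)
    [(list 'cons v1 v2) (cons v1 v2)]
    [_ 'err]))
```

Each case simply delegates to the Racket operation of the same name, which is why the interpreter says little about how such data is laid out in memory.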
119277

120-
The interpreter doesn't really shed light on how constructing
121-
inductive data works because it simply uses the mechanism of the
122-
defining language to construct it. Inductively defined data is easy
123-
to model in this interpreter because we can rely on the mechanisms
124-
provided for constructing inductively defined data at the meta-level
125-
of Racket.
278+
We can try it out:
279+
280+
@ex[
281+
(interp (parse '(cons 1 2)))
282+
(interp (parse '(car (cons 1 2))))
283+
(interp (parse '(cdr (cons 1 2))))
284+
(interp (parse '(car '(1 . 2))))
285+
(interp (parse '(cdr '(1 . 2))))
286+
(interp (parse '(let ((x (cons 1 2)))
287+
(+ (car x) (cdr x)))))
288+
]
289+
290+
291+
Now while this is a perfectly good specification, this interpreter
292+
doesn't really shed light on how constructing inductive data works
293+
because it simply uses the mechanism of the defining language to
294+
construct it. Inductively defined data is easy to model in this
295+
interpreter because we can rely on the mechanisms provided for
296+
constructing inductively defined data at the meta-level of Racket.
126297

127298
The real trickiness comes when we want to model such data in an
128299
impoverished setting that doesn't have such things, which of course is
129300
the case in assembly.
130301

131-
The problem is that a value such as @racket[(box _v)] has a value
132-
inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has
302+
The main challenge is that a value such as @racket[(box _v)] has a
303+
value inside it. Pairs are even worse: @racket[(cons _v0 _v1)] has
133304
@emph{two} values inside it. If each value is represented with 64
134305
bits, it would seem a pair takes @emph{at a minimum} 128-bits to
135306
represent (plus we need some bits to indicate this value is a pair).
