#lang scribble/manual
@(require (for-label (except-in racket ... compile) a86/printer a86/ast))
@(require redex/pict
racket/runtime-path
scribble/examples
"../fancyverb.rkt"
"utils.rkt"
"ev.rkt"
"../utils.rkt")
@(define codeblock-include (make-codeblock-include #'h))
@(ev '(require rackunit a86))
@(for-each (λ (f) (ev `(require (file ,(path->string (build-path langs "dodger" f))))))
'("main.rkt" "random.rkt" "correct.rkt"))
@title[#:tag "Dodger"]{Dodger: addressing a lack of character}
@src-code["dodger"]
@emph{There are 11 types of values...}
@table-of-contents[]
@section{Characters}
In @secref{Dupe}, we saw how to accommodate disjoint
datatypes, namely integers and booleans. Let's add yet
another: the character type. Conceptually, there's not much
new here (hence we stay in the "D" family of languages);
we're simply adding a third type, which we distinguish from
the other two by using more bits to tag the type.
We'll call it @bold{Dodger}.
To the syntax of expressions, we add character literals.
We will also add the following operations:
@itemlist[
@item{@racket[char?] @tt|{: Any -> Boolean}|: predicate for recognizing character values}
@item{@racket[integer->char] @tt|{: Integer -> Character}|: converts from integers to characters}
@item{@racket[char->integer] @tt|{: Character -> Integer}|: converts from characters to integers}
]
Abstract syntax is modelled with the following datatype definition:
@codeblock-include["dodger/ast.rkt"]
The s-expression parser is defined as follows:
@codeblock-include["dodger/parse.rkt"]
@ex[
(parse #\a)
(parse '(char? #\λ))
(parse '(char->integer #\λ))
(parse '(integer->char 97))]
@section{Characters in Racket}
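Racket has a Character data type for representing individual characters of
text: letters, digits, symbols, and so on. A Racket character can represent
any Unicode scalar value, that is, any of the 1,114,112 Unicode
@link["http://unicode.org/glossary/#code_point"]{code points} except the 2,048
surrogates @tt{#xD800} through @tt{#xDFFF}.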
The most common way to write a character is an octothorpe (@tt{#}), followed by
a backslash, followed by the character itself. So, for example, the
character @tt{a} is written @racket[#\a]. The character @tt{λ} is
written @racket[#\λ]. The character @tt{文} is written @racket[#\文].
A character can be converted to an integer and @emph{vice versa}:
@ex[
(char->integer #\a)
(char->integer #\λ)
(char->integer #\文)
(integer->char 97)
(integer->char 955)
(integer->char 25991)
]
Only integers in the range of valid code points, excluding the surrogate
range @tt{#xD800} through @tt{#xDFFF}, are acceptable to
@racket[integer->char]; any other integer produces an error:
@ex[
(eval:error (integer->char -1))
(eval:error (integer->char 55296))
]
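Concretely, @racket[integer->char] accepts exactly the Unicode scalar values:
code points from 0 to @tt{#x10FFFF}, minus the surrogates @tt{#xD800} through
@tt{#xDFFF}. That is why 55296 (@tt{#xD800}) is rejected even though it is well
below the largest code point. The predicate below is a minimal sketch of that
test in plain Racket; the name @tt{scalar-value?} is our own and is not
provided by Racket.

@codeblock|{
#lang racket
;; scalar-value? : Any -> Boolean
;; Hypothetical helper: is n a Unicode scalar value, i.e., a code point
;; in 0..#x10FFFF that lies outside the surrogate block #xD800-#xDFFF?
(define (scalar-value? n)
  (and (exact-nonnegative-integer? n)
       (or (<= n #xD7FF)
           (<= #xE000 n #x10FFFF))))

(scalar-value? 97)       ; #t, so (integer->char 97) succeeds
(scalar-value? -1)       ; #f, negative
(scalar-value? 55296)    ; #f, #xD800 is a surrogate
(scalar-value? #x110000) ; #f, past the last code point
}|

The interpreter inherits exactly this range, since its @racket[integer->char]
is Racket's own.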
There are a few other ways to write characters (see the
Racket
@link["https://docs.racket-lang.org/reference/reader.html#%28part._parse-character%29"]{
Reference} for the details), but you don't have to worry
much about this since @racket[read] takes care of reading
characters in all their different forms. The run-time
system, described below, takes care of printing them.
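A few of those other forms are named characters such as @racket[#\space] and
@racket[#\newline], and hexadecimal escapes such as @tt{#\u03BB}. This short
sketch, using only standard Racket, shows that they read as ordinary
characters.

@codeblock|{
#lang racket
;; Named characters and \u escapes are alternative spellings;
;; the reader turns each into an ordinary character value.
(char->integer #\space)    ; 32
(char->integer #\newline)  ; 10
(char->integer #\nul)      ; 0
(char=? #\u03BB #\λ)       ; #t, λ is code point #x3BB = 955
(char=? #\u6587 #\文)      ; #t, 文 is code point #x6587 = 25991
}|

The compiler never sees these spellings: by the time @racket[parse] runs,
each one is just a character value.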
@section{Meaning of Dodger programs}
The interpreter is much like that of Dupe, except we have a new base case:
@codeblock-include["dodger/interp.rkt"]
And the interpretation of primitives is extended:
@codeblock-include["dodger/interp-prim.rkt"]
Characters and their operations simply inherit their meaning from Racket.
We can try out some examples:
@ex[
(interp (Lit #\a))
(interp (Lit #\b))
(interp (Prim1 'char? (Lit #\a)))
(interp (Prim1 'char? (Lit #t)))
(interp (Prim1 'char->integer (Lit #\a)))
(interp (Prim1 'integer->char (Prim1 'char->integer (Lit #\a))))
]
Just as in Dupe, type errors result in the interpreter crashing:
@ex[
(eval:error (interp (Prim1 'char->integer (Lit #f))))
]
Also, not every integer corresponds to a character, so when
@racket[integer->char] is given an invalid input, it crashes
(more on this in a minute):
@ex[
(eval:error (interp (Prim1 'integer->char (Lit -1))))
]
@section{Ex uno plures iterum: Out of One, Many... Again}
We have exactly the same problem as in Dupe: we need to
represent different kinds of values within our one
primordial datatype: the 64-bit integer.
We can use the following encoding scheme:
@itemlist[
@item{Integers have @tt{#b0} as the last bit; the other bits describe the integer.}
@item{Characters have @tt{#b01} as the last bits; the other bits describe the character.}
@item{True is @tt{#b011} and False is @tt{#b111}.}
]
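Notice that the encodings are disjoint: integers end in @tt{0}, characters
in @tt{01}, and booleans in @tt{11}, so the low bits alone identify the kind
of value.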
We can write down functions for encoding into and decoding out of bits:
@codeblock-include["dodger/types.rkt"]
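To see the scheme on concrete values, here is a standalone sketch in plain
Racket. It follows the description above but is not the code in
@tt{types.rkt}: the names @tt{encode} and @tt{decode} are hypothetical, and it
uses Racket's unbounded integers rather than 64-bit words. For instance,
@racket[#\a] has code point 97, so its encoding is 97 shifted left two bits
with @tt{#b01} in the low bits: 388 + 1 = 389.

@codeblock|{
#lang racket
;; Hypothetical encoder/decoder following the scheme above:
;;   integers:   value shifted left 1, last bit 0
;;   characters: code point shifted left 2, last bits 01
;;   #t = #b011, #f = #b111
(define (encode v)
  (cond [(exact-integer? v) (arithmetic-shift v 1)]
        [(char? v) (bitwise-ior (arithmetic-shift (char->integer v) 2) #b01)]
        [(eq? v #t) #b011]
        [(eq? v #f) #b111]
        [else (error 'encode "unsupported value: ~s" v)]))

(define (decode b)
  (cond [(= (bitwise-and b #b1) 0) (arithmetic-shift b -1)]
        [(= (bitwise-and b #b11) #b01) (integer->char (arithmetic-shift b -2))]
        [(= b #b011) #t]
        [(= b #b111) #f]
        [else (error 'decode "not a valid encoding: ~s" b)]))

(encode 5)    ; 10
(encode -1)   ; -2
(encode #\a)  ; 389
(encode #\λ)  ; 3821
(decode 389)  ; #\a
}|

That @tt{decode} gives back the original value for every supported input is
exactly what disjoint tags buy us: the low bits alone tell it which case
applies.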
@section{A Compiler for Dodger}
Compilation is pretty easy. The compiler uses the bit-level
representation of values described earlier and uses shifts and bitwise
logical operations to convert between representations. Most of the work
happens in the compilation of primitives:
@codeblock-include["dodger/compile-ops.rkt"]
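At the bit level, each new primitive is a short sequence of shifts and masks.
The sketch below simulates them on encoded values in plain Racket instead of
emitting a86 instructions, so it shows the arithmetic rather than the actual
code in @tt{compile-ops.rkt}; all of the names are hypothetical. The input 389
is the encoding of @racket[#\a], and 194 is the encoding of the integer 97.

@codeblock|{
#lang racket
;; Encoded booleans from the scheme above (hypothetical names).
(define val-true  #b011)
(define val-false #b111)

;; char->integer: drop the 2-bit character tag, then shift left once
;; to make room for the integer tag bit (which is 0).
(define (char->integer/bits b)
  (arithmetic-shift (arithmetic-shift b -2) 1))

;; integer->char: drop the integer tag bit, shift into character
;; position, and set the #b01 tag. No range check is performed.
(define (integer->char/bits b)
  (bitwise-ior (arithmetic-shift (arithmetic-shift b -1) 2) #b01))

;; char?: inspect the low two bits and produce an encoded boolean.
(define (char?/bits b)
  (if (= (bitwise-and b #b11) #b01) val-true val-false))

(char->integer/bits 389) ; 194, the integer 97 encoded
(integer->char/bits 194) ; 389, #\a encoded
(char?/bits 389)         ; 3, i.e. #t
(char?/bits 194)         ; 7, i.e. #f
(char?/bits val-true)    ; 7, booleans end in 11, not 01
}|

Because @tt{integer->char/bits} does no range check, an integer that is not a
valid code point would silently produce a character-tagged value with an
invalid code point.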
In fact, @racket[compile] is identical to its Dupe predecessor,
since all the new work is done in @racket[value->bits] and
@racket[compile-op1]:
@codeblock-include["dodger/compile.rkt"]
We can take a look at a few examples:
@ex[
(compile-e (parse #\a))
(compile-e (parse #\λ))
(compile-e (parse '(char->integer #\λ)))
(compile-e (parse '(integer->char 97)))]
We can run them:
@ex[
(exec (parse #\a))
(exec (parse #\λ))
(exec (parse '(char->integer #\λ)))
(exec (parse '(integer->char 97)))]
@section{A Run-Time for Dodger}
The only interesting aspect of Dodger, really, is that we
need to add run-time support for printing character literals.
We extend the bit-encoding of values following the pattern we've
already seen:
@filebox-include[fancy-c dodger "types.h"]
And update the interface for values in the runtime system:
@filebox-include[fancy-c dodger "values.h"]
@filebox-include[fancy-c dodger "values.c"]
The only other change is that @tt{print_result} is updated to handle
the case of printing characters:
@filebox-include[fancy-c dodger "print.c"]
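Printing a character such as @racket[#\λ] takes two steps: recover the code
point by shifting away the two tag bits, then write that code point to
standard output, which on a UTF-8 terminal (an assumption) means emitting one
to four bytes. The following is a minimal Racket sketch of the second step,
for illustration only; it is not a transcription of @tt{print.c}, and the name
@tt{code-point->utf-8} is hypothetical. It checks itself against Racket's
built-in @racket[string->bytes/utf-8].

@codeblock|{
#lang racket
;; code-point->utf-8 : Natural -> Bytes
;; Hypothetical helper: the UTF-8 encoding of a Unicode scalar value.
(define (code-point->utf-8 cp)
  ;; (bits k n) extracts the n bits of cp starting at bit k.
  (define (bits k n)
    (bitwise-and (arithmetic-shift cp (- k)) (sub1 (arithmetic-shift 1 n))))
  (cond [(< cp #x80) (bytes cp)]                ; 1 byte:  0xxxxxxx
        [(< cp #x800)                            ; 2 bytes: 110xxxxx 10xxxxxx
         (bytes (bitwise-ior #xC0 (bits 6 5))
                (bitwise-ior #x80 (bits 0 6)))]
        [(< cp #x10000)                          ; 3 bytes: 1110xxxx ...
         (bytes (bitwise-ior #xE0 (bits 12 4))
                (bitwise-ior #x80 (bits 6 6))
                (bitwise-ior #x80 (bits 0 6)))]
        [else                                    ; 4 bytes: 11110xxx ...
         (bytes (bitwise-ior #xF0 (bits 18 3))
                (bitwise-ior #x80 (bits 12 6))
                (bitwise-ior #x80 (bits 6 6))
                (bitwise-ior #x80 (bits 0 6)))]))

;; First step: 3821 encodes λ, whose code point sits above 2 tag bits.
(bytes->list (code-point->utf-8 (arithmetic-shift 3821 -2))) ; '(206 187)

;; Sanity check against Racket's own UTF-8 encoder.
(for ([c (list #\a #\λ #\文 (integer->char #x1F600))])
  (unless (bytes=? (code-point->utf-8 (char->integer c))
                   (string->bytes/utf-8 (string c)))
    (error 'code-point->utf-8 "mismatch on ~s" c)))
}|

A full printer also has to handle the characters Racket prints by name, such
as @racket[#\space] and @racket[#\newline], which this sketch ignores.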
@;{FIXME: examples should be creating executable at the command-line, not exec.}