Skip to content

Commit 9509eb5

Browse files
committed
Notes on string and symbol matching in Mug.
1 parent 12429c1 commit 9509eb5

1 file changed

Lines changed: 107 additions & 37 deletions

File tree

www/notes/mug.scrbl

Lines changed: 107 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -230,7 +230,6 @@ Now, while this does allocate string literals statically, using memory
230230
within to the program to store the string, it doesn't alone solve the
231231
problem with string literals being represented uniquely.
232232

233-
234233
@section[#:tag-prefix "mug"]{Static Interning}
235234

236235
We've seen static memory, but we still need to make sure every string
@@ -383,50 +382,36 @@ returned.
383382
Using @racket[literals], we can write a function that compiles all of
384383
the string literals into static data as follows:
385384

385+
@(ev '(require mug/compile-literals))
386+
386387
@#reader scribble/comment-reader
387-
(racketblock
388+
(ex
388389
;; Prog -> Asm
389390
(define (compile-literals p)
390-
(seq (Data)
391-
(compile-literals-data (literals p))
392-
(Text)))
391+
(append-map compile-literal (literals p)))
393392

394-
;; [Listof Symbol] -> Asm
395-
(define (compile-literals-data ss)
396-
(append-map compile-literal-data ss))
393+
;; [Listof Char] -> Asm
394+
(define (compile-string-chars cs)
395+
(match cs
396+
['() (seq)]
397+
[(cons c cs)
398+
(seq (Dd (char->integer c))
399+
(compile-string-chars cs))]))
397400

398401
;; Symbol -> Asm
399-
(define (compile-literal-data s)
400-
(let ((str (symbol->string s)))
401-
(seq (Label (symbol->label s))
402-
(Dq (string-length str))
403-
(map Dq (map char->integer (string->list str))))))
404-
)
405-
406-
@(ev
407-
'(define (compile-literals p)
408-
(seq (Data)
409-
(compile-literals-data (strings p))
410-
(Text))))
411-
412-
@(ev
413-
'(define (compile-literals-data ss)
414-
(append-map compile-literal-data ss)))
415-
416-
@(ev
417-
'(define (compile-literal-data s)
402+
(define (compile-literal s)
418403
(let ((str (symbol->string s)))
419404
(seq (Label (symbol->label s))
420-
(Dq (string-length str))
421-
(map Dq (map char->integer (string->list str)))))))
405+
(Dq (string-length str))
406+
(compile-string-chars (string->list str))
407+
(if (odd? (string-length str))
408+
(seq (Dd 0))
409+
(seq)))))
422410

423-
So now we can reconstruct our example with:
424-
425-
@ex[
426411
(seq (compile-string "Hello!")
427412
(compile-string "Hello!")
428-
(compile-literals-data '(Hello!)))
429-
]
413+
(compile-literal 'Hello!))
414+
)
430415

431416
We've seemingly reached our goal. However, there is a fairly nasty
432417
little bug with our approach. Can you spot it?
@@ -474,7 +459,7 @@ considered @racket[eq?] to each other:
474459
@ex[
475460
(seq (compile-string "Hello!")
476461
(compile-string "Hello!")
477-
(compile-literals-data '(Hello!)))
462+
(compile-literal 'Hello!))
478463
]
479464

480465
We can try it out to confirm some examples.
@@ -603,7 +588,7 @@ only returns a single literal:
603588
(literals (parse '[(begin "Hello!" 'Hello!)]))
604589
]
605590

606-
But actually this is just fine. What happens is that only a signle
591+
But actually this is just fine. What happens is that only a single
607592
chunk of memory is allocated to hold the character data @tt{H},
608593
@tt{e}, @tt{l}, @tt{l}, @tt{o}, @tt{!}, but the @emph{symbol}
609594
@racket['Hello] is represented as a pointer to this data tagged as a
@@ -613,7 +598,7 @@ pointer, but tagged as a string. So this program compiles to:
613598
@ex[
614599
(seq (compile-string "Hello!")
615600
(compile-symbol 'Hello!)
616-
(compile-literals-data '(Hello!)))
601+
(compile-literal 'Hello!))
617602
]
618603

619604
We have now added a symbol data type and have implement static
@@ -869,6 +854,91 @@ We can confirm this works as expected:
869854
With that, we have completed the implementation of symbols and strings
870855
with the proper interning behavior.
871856

857+
858+
@section[#:tag-prefix "mug"]{Matching symbols and strings}
859+
860+
Since we have @racket[match] in our language, we should probably add
861+
the ability to match against strings and symbols.
862+
863+
We can extend the AST definition for patterns:
864+
865+
@filebox-include-fake[codeblock "mug/ast.rkt"]{
866+
;; type Pat = ...
867+
;; | (PSymb Symbol)
868+
;; | (PStr String)
869+
(struct PSymb (s) #:prefab)
870+
(struct PStr (s) #:prefab)
871+
}
872+
873+
Extending the interpreter is straightforward:
874+
875+
@filebox-include-fake[codeblock "mug/interp.rkt"]{
876+
;; Pat Value Env -> [Maybe Env]
877+
(define (interp-match-pat p v r)
878+
(match p
879+
; ...
880+
[(PSymb s) (and (eq? s v) r)]
881+
[(PStr s) (and (string? v) (string=? s v) r)]))
882+
}
883+
884+
Extending the compiler is more involved, but essentially boils down to
885+
doing exactly what the interpreter is doing above:
886+
887+
@filebox-include-fake[codeblock "mug/compile-expr.rkt"]{
888+
;; Pat CEnv Symbol -> (list Asm Asm CEnv)
889+
(define (compile-pattern p cm next)
890+
(match p
891+
; ...
892+
[(PStr s)
893+
(let ((fail (gensym)))
894+
(list (seq (Lea rdi (symbol->data-label (string->symbol s)))
895+
(Mov r8 rax)
896+
(And r8 ptr-mask)
897+
(Cmp r8 type-str)
898+
(Jne fail)
899+
(Xor rax type-str)
900+
(Mov rsi rax)
901+
pad-stack
902+
(Call 'symb_cmp)
903+
unpad-stack
904+
(Cmp rax 0)
905+
(Jne fail))
906+
(seq (Label fail)
907+
(Add rsp (* 8 (length cm)))
908+
(Jmp next))
909+
cm))]
910+
[(PSymb s)
911+
(let ((fail (gensym)))
912+
(list (seq (Lea r9 (Plus (symbol->data-label s) type-symb))
913+
(Cmp rax r9)
914+
(Jne fail))
915+
(seq (Label fail)
916+
(Add rsp (* 8 (length cm)))
917+
(Jmp next))
918+
cm))]))
919+
}
920+
921+
The implementation of string matching uses the @tt{symb_cmp} function
922+
from the run-time system, checking whether it returns @racket[0] to
923+
indicate the strings are the same. (Although the function is
924+
concerned with comparing symbols, symbols and strings are represented
925+
the same, so it works just as well to compare strings.)
926+
927+
We can confirm some examples:
928+
929+
@ex[
930+
(run '(match 'foo
931+
['foo 1]
932+
["foo" 2]))
933+
(run '(match "foo"
934+
['foo 1]
935+
["foo" 2]))
936+
(run '(match (cons '+ (cons 1 (cons 2 '())))
937+
[(cons '+ (cons x (cons y '())))
938+
(+ x y)]))
939+
]
940+
941+
872942
@section[#:tag-prefix "mug"]{Compiling Symbols and Strings}
873943

874944
We can now put the pieces together for the complete compiler.

0 commit comments

Comments
 (0)