@@ -230,7 +230,6 @@ Now, while this does allocate string literals statically, using memory
230230within to the program to store the string, it doesn't alone solve the
231231problem with string literals being represented uniquely.
232232
233-
234233@section[#:tag-prefix "mug " ]{Static Interning}
235234
236235We've seen static memory, but we still need to make sure every string
@@ -383,50 +382,36 @@ returned.
383382Using @racket[literals], we can write a function that compiles all of
384383the string literals into static data as follows:
385384
385+ @(ev '(require mug/compile-literals))
386+
386387@#reader scribble/comment-reader
387- (racketblock
388+ (ex
388389;; Prog -> Asm
389390(define (compile-literals p)
390- (seq (Data)
391- (compile-literals-data (literals p))
392- (Text)))
391+ (append-map compile-literal (literals p)))
393392
394- ;; [Listof Symbol] -> Asm
395- (define (compile-literals-data ss)
396- (append-map compile-literal-data ss))
393+ ;; [Listof Char] -> Asm
394+ (define (compile-string-chars cs)
395+ (match cs
396+ ['() (seq)]
397+ [(cons c cs)
398+ (seq (Dd (char->integer c))
399+ (compile-string-chars cs))]))
397400
398401;; Symbol -> Asm
399- (define (compile-literal-data s)
400- (let ((str (symbol->string s)))
401- (seq (Label (symbol->label s))
402- (Dq (string-length str))
403- (map Dq (map char->integer (string->list str))))))
404- )
405-
406- @(ev
407- '(define (compile-literals p)
408- (seq (Data)
409- (compile-literals-data (strings p))
410- (Text))))
411-
412- @(ev
413- '(define (compile-literals-data ss)
414- (append-map compile-literal-data ss)))
415-
416- @(ev
417- '(define (compile-literal-data s)
402+ (define (compile-literal s)
418403 (let ((str (symbol->string s)))
419404 (seq (Label (symbol->label s))
420- (Dq (string-length str))
421- (map Dq (map char->integer (string->list str)))))))
405+ (Dq (string-length str))
406+ (compile-string-chars (string->list str))
407+ (if (odd? (string-length str))
408+ (seq (Dd 0 ))
409+ (seq)))))
422410
423- So now we can reconstruct our example with:
424-
425- @ex[
426411(seq (compile-string "Hello! " )
427412 (compile-string "Hello! " )
428- (compile-literals-data '( Hello!) ))
429- ]
413+ (compile-literal ' Hello! ))
414+ )
430415
431416We've seemingly reached our goal. However, there is a fairly nasty
432417little bug with our approach. Can you spot it?
@@ -474,7 +459,7 @@ considered @racket[eq?] to each other:
474459@ex[
475460(seq (compile-string "Hello! " )
476461 (compile-string "Hello! " )
477- (compile-literals-data '( Hello!) ))
462+ (compile-literal ' Hello! ))
478463]
479464
480465We can try it out to confirm some examples.
@@ -603,7 +588,7 @@ only returns a single literal:
603588(literals (parse '[(begin "Hello! " 'Hello! )]))
604589]
605590
606- But actually this is just fine. What happens is that only a signle
591+ But actually this is just fine. What happens is that only a single
607592chunk of memory is allocated to hold the character data @tt{H},
608593@tt{e}, @tt{l}, @tt{l}, @tt{o}, @tt{!}, but the @emph{symbol}
609594@racket['Hello ] is represented as a pointer to this data tagged as a
@@ -613,7 +598,7 @@ pointer, but tagged as a string. So this program compiles to:
613598@ex[
614599(seq (compile-string "Hello! " )
615600 (compile-symbol 'Hello! )
616- (compile-literals-data '( Hello!) ))
601+ (compile-literal ' Hello! ))
617602]
618603
619604We have now added a symbol data type and have implement static
@@ -869,6 +854,91 @@ We can confirm this works as expected:
869854With that, we have completed the implementation of symbols and strings
870855with the proper interning behavior.
871856
857+
858+ @section[#:tag-prefix "mug " ]{Matching symbols and strings}
859+
860+ Since we have @racket[match ] in our language, we should probably add
861+ the ability to match against strings and symbols.
862+
863+ We can extend the AST definition for patterns:
864+
865+ @filebox-include-fake[codeblock "mug/ast.rkt " ]{
866+ ;; type Pat = ...
867+ ;; | (PSymb Symbol)
868+ ;; | (PStr String)
869+ (struct PSymb (s) #:prefab )
870+ (struct PStr (s) #:prefab )
871+ }
872+
873+ Extending the interpreter is straightforward:
874+
875+ @filebox-include-fake[codeblock "mug/interp.rkt " ]{
876+ ;; Pat Value Env -> [Maybe Env]
877+ (define (interp-match-pat p v r)
878+ (match p
879+ ; ...
880+ [(PSymb s) (and (eq? s v) r)]
881+ [(PStr s) (and (string? v) (string=? s v) r)]))
882+ }
883+
884+ Extending the compiler is more involved, but essentially boils down to
885+ doing exactly what the interpreter is doing above:
886+
887+ @filebox-include-fake[codeblock "mug/compile-expr.rkt " ]{
888+ ;; Pat CEnv Symbol -> (list Asm Asm CEnv)
889+ (define (compile-pattern p cm next)
890+ (match p
891+ ; ...
892+ [(PStr s)
893+ (let ((fail (gensym)))
894+ (list (seq (Lea rdi (symbol->data-label (string->symbol s)))
895+ (Mov r8 rax)
896+ (And r8 ptr-mask)
897+ (Cmp r8 type-str)
898+ (Jne fail)
899+ (Xor rax type-str)
900+ (Mov rsi rax)
901+ pad-stack
902+ (Call 'symb_cmp )
903+ unpad-stack
904+ (Cmp rax 0 )
905+ (Jne fail))
906+ (seq (Label fail)
907+ (Add rsp (* 8 (length cm)))
908+ (Jmp next))
909+ cm))]
910+ [(PSymb s)
911+ (let ((fail (gensym)))
912+ (list (seq (Lea r9 (Plus (symbol->data-label s) type-symb))
913+ (Cmp rax r9)
914+ (Jne fail))
915+ (seq (Label fail)
916+ (Add rsp (* 8 (length cm)))
917+ (Jmp next))
918+ cm))]))
919+ }
920+
921+ The implementation of string matching uses the @tt{symb_cmp} function
922+ from the run-time system, checking whether it returns @racket[0 ] to
923+ indicate the strings are the same. (Although the function is
924+ concerned with comparing symbols, symbols and strings are represented
925+ the same, so it works just as well to compare strings.)
926+
927+ We can confirm some examples:
928+
929+ @ex[
930+ (run '(match 'foo
931+ ['foo 1 ]
932+ ["foo " 2 ]))
933+ (run '(match "foo "
934+ ['foo 1 ]
935+ ["foo " 2 ]))
936+ (run '(match (cons '+ (cons 1 (cons 2 '() )))
937+ [(cons '+ (cons x (cons y '() )))
938+ (+ x y)]))
939+ ]
940+
941+
872942@section[#:tag-prefix "mug " ]{Compiling Symbols and Strings}
873943
874944We can now put the pieces together for the complete compiler.
0 commit comments