|
| 1 | +#lang scribble/manual |
| 2 | + |
| 3 | +@(require (for-label (except-in racket ...))) |
| 4 | +@(require redex/pict |
| 5 | + racket/runtime-path |
| 6 | + scribble/examples |
| 7 | + "utils.rkt" |
| 8 | + "ev.rkt" |
| 9 | + "../utils.rkt") |
| 10 | + |
| 11 | +@(define codeblock-include (make-codeblock-include #'h)) |
| 12 | + |
| 13 | +@(for-each (λ (f) (ev `(require (file ,(path->string (build-path notes "knock" f)))))) |
| 14 | + '("interp.rkt" "compile.rkt" "ast.rkt" "syntax.rkt" "asm/interp.rkt" "asm/printer.rkt")) |
| 15 | + |
| 16 | +@title[#:tag "Graphviz"]{Using Graphviz/dot to visualize our AST} |
| 17 | + |
| 18 | +@table-of-contents[] |
| 19 | + |
| 20 | +@section[#:tag-prefix "graphviz"]{Visualizing ASTs} |
| 21 | + |
| 22 | +Abstract Syntax Trees (ASTs) are a useful abstraction when dealing with |
| 23 | +programming languages as an object for analysis or manipulation (e.g. |
| 24 | +compilation). At the same time, these structures can quickly become |
| 25 | +too large to reason about just by looking at them. For example, in Knock, |
| 26 | +our AST for @racket[(if (zero? x) (add1 (add1 1)) (sub1 x))] looks |
| 27 | +like the following: |
| 28 | + |
| 29 | +@#reader scribble/comment-reader |
| 30 | +(racketblock |
| 31 | + (if-e |
| 32 | + (prim-e 'zero? (list (var-e 'x))) |
| 33 | + (prim-e 'add1 (list (prim-e 'add1 (list (int-e 1))))) |
| 34 | + (prim-e 'sub1 (list (var-e 'x)))) |
| 35 | +) |
| 36 | + |
| 37 | +Although this has all the information necessary for manipulating our program |
| 38 | +(and more), it's a bit unwieldy to look at. Particularly when debugging, it can |
| 39 | +be useful to see the overall @emph{shape} of the AST. This is especially |
| 40 | +true when speaking about program transformations. |
| 41 | + |
| 42 | +Take, for example, the program transformation from the first Midterm. Applying |
| 43 | +that transformation (updated for Knock) to the above program results in |
| 44 | +the following AST: |
| 45 | + |
| 46 | +@#reader scribble/comment-reader |
| 47 | +(racketblock |
| 48 | + (if-e |
| 49 | + (prim-e 'zero? (list (var-e 'x))) |
| 50 | + (let-e |
| 51 | + (list (binding 'g387 (prim-e 'add1 (list (int-e 1))))) |
| 52 | + (prim-e 'add1 (list (var-e 'g387)))) |
| 53 | + (prim-e 'sub1 (list (var-e 'x))))) |
| 54 | + |
| 55 | +Was the program transformation done correctly? If you study the AST |
| 56 | +carefully, you can determine that it was. However, it would be easier |
| 57 | +if we could, at a glance, answer the question ``Are primitive operations |
| 58 | +only applied to simple (i.e. not nested) expressions?'' |
| 59 | + |
| 60 | +Using diagrams makes answering this question marginally easier: |
| 61 | + |
| 62 | +Before transformation: |
| 63 | + |
| 64 | +@image{img/initial.png} |
| 65 | + |
| 66 | +After transformation: |
| 67 | + |
| 68 | +@image{img/transformed.png} |
| 69 | + |
| 70 | +The diagram above helps us visualize the transformed AST, but we still |
| 71 | +have to study the diagram carefully to know which nodes correspond |
| 72 | +to primitive operations (which are the subject of the transformation). |
| 73 | +This can be remedied easily, by coloring these nodes differently: |
| 74 | + |
| 75 | +Before transformation: |
| 76 | + |
| 77 | +@image{img/initial-v2.png} |
| 78 | + |
| 79 | +After transformation: |
| 80 | + |
| 81 | +@image{img/transformed-v2.png} |
| 82 | + |
| 83 | +These diagrams were made using @tt{dot}, a tool provided by |
| 84 | +@link["https://graphviz.org/"]{Graphviz}, which is a set of software components |
| 85 | +for visualizing graphs. |
| 86 | + |
| 87 | +@section[#:tag-prefix "graphviz"]{Using dot} |
| 88 | + |
| 89 | +Graphviz has many components, but we will focus on @tt{dot}, which is |
| 90 | +the tool for laying out directed graphs. The full manual for @tt{dot} |
| 91 | +can be found on the Graphviz website: |
| 92 | +@link["https://www.graphviz.org/pdf/dotguide.pdf"]{https://www.graphviz.org/pdf/dotguide.pdf}. |
| 93 | + |
| 94 | +Instructions for downloading Graphviz (and therefore @tt{dot}) can be found on |
| 95 | +their website as well: |
| 96 | +@link["https://www.graphviz.org/download/"]{https://www.graphviz.org/download/} |
| 97 | + |
| 98 | +The syntax for @tt{dot} files is fairly straightforward: you first declare the |
| 99 | +type of graph and give it a name. For our purposes the type will always be |
| 100 | +@tt{digraph} (i.e. directed-graph), and the name can be whatever you choose |
| 101 | +(though it will likely not matter much). For example: |
| 102 | + |
| 103 | +@verbatim|{ |
| 104 | +digraph CMSC430 { |
| 105 | + ... |
| 106 | +} |
| 107 | +}| |
| 108 | + |
| 109 | +The ellipsis is where you describe the graph you'd like to visualize. The |
| 110 | +designers of Graphviz provide a grammar describing the language accepted by |
| 111 | +their tools (I wish all system designers provided a grammar!). This can be |
| 112 | +found on the Graphviz website: |
| 113 | +@link["https://graphviz.org/doc/info/lang.html"]{https://graphviz.org/doc/info/lang.html} |
| 114 | + |
| 115 | +Most of the time you will not need to consult the grammar, as most of the |
| 116 | +simple rules are straightforward for those that have programmed in C or Java. |
| 117 | + |
| 118 | +In short, the description of a graph is a list of statements. Statements can |
| 119 | +take many forms, but for this course (and most likely for any uses beyond this |
| 120 | +course), you can basically just use the following three types of statements: |
| 121 | + |
| 122 | + |
| 123 | +@itemlist[ |
| 124 | + |
| 125 | +@item{Node statements} |
| 126 | +@item{Edge statements} |
| 127 | +@item{Attribute statements} |
| 128 | + |
| 129 | +] |
| 130 | + |
| 131 | + |
| 132 | +Node statements are just an ASCII string (representing a Node ID) and an |
| 133 | +optional list of attributes for that node. For example: |
| 134 | + |
| 135 | +@verbatim|{ |
| 136 | +digraph CMSC430 { |
| 137 | + lexer; |
| 138 | + parser [shape=box]; |
| 139 | + code_gen [color=red]; |
| 140 | +} |
| 141 | +}| |
| 142 | + |
| 143 | +Using the @tt{dot} tool on a file with the above as its contents produces the |
| 144 | +following diagram: |
| 145 | + |
| 146 | +@image{img/nodes.png} |
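The text above does not show how @tt{dot} is invoked. One common way is to request PNG output with the @tt{-Tpng} flag and name the output file with @tt{-o}. The following is a minimal sketch of doing that from Racket. It assumes @tt{dot} is installed and on your PATH; the helper names @tt{dot-args} and @tt{render-dot!} and the file names are hypothetical, chosen only for illustration.

```racket
#lang racket

;; dot-args : string string -> (listof string)
;; Command-line arguments asking dot to render `in` as a PNG written to `out`.
;; (Hypothetical helper, not part of the course library.)
(define (dot-args in out)
  (list "-Tpng" in "-o" out))

;; render-dot! : string string -> boolean
;; Runs dot if it can be found on the PATH. Returns #t when dot exits
;; successfully, and #f when dot is missing or reports an error.
(define (render-dot! in out)
  (define dot (find-executable-path "dot"))
  (and dot (apply system* dot (dot-args in out)) #t))
```

For example, @tt{(render-dot! "nodes.dot" "nodes.png")} would render a file named @tt{nodes.dot} (an assumed name) containing the graph description above.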
| 147 | + |
| 148 | +Edge statements connect nodes in our graph. For example: |
| 149 | + |
| 150 | +@verbatim|{ |
| 151 | +digraph CMSC430 { |
| 152 | + lexer -> parser -> code_gen; |
| 153 | + parser [shape=box]; |
| 154 | + code_gen [color=red]; |
| 155 | +} |
| 156 | +}| |
| 157 | + |
| 158 | +This produces the following diagram: |
| 159 | + |
| 160 | +@image{img/edges1.png} |
| 161 | + |
| 162 | +You may wonder if the order matters here. While the @emph{horizontal} order |
| 163 | +matters when specifying the edges in an edge statement, the @emph{vertical} |
| 164 | +order does not matter in this case. The following produces the same diagram: |
| 165 | + |
| 166 | +@verbatim|{ |
| 167 | +digraph CMSC430 { |
| 168 | + parser [shape=box]; |
| 169 | + code_gen [color=red]; |
| 170 | + lexer -> parser -> code_gen; |
| 171 | +} |
| 172 | +}| |
| 173 | + |
| 174 | +Notice that @tt{lexer} does not have its own `declaration'. A declaration |
| 175 | +is unnecessary unless you want to attach attributes to a node (as we do |
| 176 | +with @tt{parser} and @tt{code_gen}). |
| 177 | + |
| 178 | +Edge statements also support an optional list of attributes. The following |
| 179 | +produces a similar diagram except that both edges are colored ``deeppink2'' (for |
| 180 | +the full list of supported colors, see the official documentation): |
| 181 | + |
| 182 | +@verbatim|{ |
| 183 | +digraph CMSC430 { |
| 184 | + lexer -> parser -> code_gen [color=deeppink2]; |
| 185 | + parser [shape=box]; |
| 186 | + code_gen [color=red]; |
| 187 | +} |
| 188 | +}| |
| 189 | + |
| 190 | +Attribute statements describe a set of attributes that apply to all subsequent |
| 191 | +statements (which means that vertical order @emph{does} matter here!). Unless |
| 192 | +overridden by a specific attribute, all statements following an attribute |
| 193 | +statement will `default' to the attributes specified in the statement. |
| 194 | + |
| 195 | +Here we added three attribute statements. Take a minute to study the example |
| 196 | +below and see how each attribute statement affects the output. |
| 197 | + |
| 198 | + |
| 199 | +@verbatim|{ |
| 200 | +digraph CMSC430 { |
| 201 | + edge [color=blue]; |
| 202 | + lexer -> parser; |
| 203 | + edge [color=deeppink2]; |
| 204 | + node [shape=triangle]; |
| 205 | + parser -> optimizer; |
| 206 | + parser [shape=box]; |
| 207 | + code_gen [color=red]; |
| 208 | + optimizer -> code_gen; |
| 209 | +} |
| 210 | +}| |
| 211 | + |
| 212 | +@image{img/edges3.png} |
| 213 | + |
| 214 | + |
| 215 | +@section[#:tag-prefix "graphviz"]{Using graphviz programmatically} |
| 216 | + |
| 217 | +What we've done is write a small Racket library that abstracts away some of the |
| 218 | +details of making @tt{dot} diagrams so that we can automatically generate |
| 219 | +diagrams from our AST. One such detail is that we have to generate unique node |
| 220 | +IDs for each node in our AST (we do this using @tt{gensym}), but then add |
| 221 | +attributes that label our nodes with the relevant information (e.g. that it's |
| 222 | +an @tt{if} node). |
| 223 | + |
| 224 | +Here is an example of a @tt{dot} description made using our library on the program |
| 225 | +@racket[(if (zero? x) 1 2)]: |
| 226 | + |
| 227 | +@verbatim|{ |
| 228 | +digraph prog { |
| 229 | + g850 [ label=" x " ]; |
| 230 | + g849 [ color=red,label=" (zero? ...) " ]; |
| 231 | + g849 -> g850 ; |
| 232 | + g851 [ label=" 1 " ]; |
| 233 | + g852 [ label=" 2 " ]; |
| 234 | + g848 [ label=" if " ]; |
| 235 | + g848 -> g849 ; |
| 236 | + g848 -> g851 ; |
| 237 | + g848 -> g852 ; |
| 238 | +} |
| 239 | +}| |
| 240 | + |
| 241 | +Not super nice to read, but we had a program write it for us! |
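To give a feel for how such a generator can work, here is a minimal sketch that walks a small AST and emits @tt{dot} statements in the style shown above. It is not the course library: the struct definitions are assumptions modeled on the constructors used earlier in these notes (the real ones live in @tt{ast.rkt} and may differ), it only covers four node types, and the function names @tt{node-line}, @tt{edge-line}, @tt{ast->dot-lines}, @tt{children-lines}, and @tt{ast->dot} are hypothetical.

```racket
#lang racket

;; Assumed AST structs for illustration only; the real definitions may differ.
(struct int-e (i) #:transparent)
(struct var-e (v) #:transparent)
(struct prim-e (p es) #:transparent)
(struct if-e (e0 e1 e2) #:transparent)

;; A node statement: an ID followed by its attributes, with the label last.
(define (node-line id label [extra '()])
  (format "  ~a [ ~a ];" id
          (string-join (append extra (list (format "label=\" ~a \"" label)))
                       ",")))

;; An edge statement from a parent ID to a child ID.
(define (edge-line from to)
  (format "  ~a -> ~a ;" from to))

;; Returns two values: the fresh ID of the root of `e`, and the list of
;; dot statements describing the whole subtree rooted at `e`.
(define (ast->dot-lines e)
  (define id (gensym)) ; a unique node ID for every AST node
  (match e
    [(int-e i) (values id (list (node-line id i)))]
    [(var-e v) (values id (list (node-line id v)))]
    [(prim-e p es)
     ;; primitive operations are colored red so they stand out
     (values id (cons (node-line id (format "(~a ...)" p) '("color=red"))
                      (children-lines id es)))]
    [(if-e e0 e1 e2)
     (values id (cons (node-line id "if")
                      (children-lines id (list e0 e1 e2))))]))

;; Statements for each child subtree, each followed by the edge from `parent`.
(define (children-lines parent es)
  (append*
   (for/list ([e (in-list es)])
     (define-values (child-id lines) (ast->dot-lines e))
     (append lines (list (edge-line parent child-id))))))

;; Wraps the statements for `e` in a digraph declaration.
(define (ast->dot e)
  (define-values (root-id lines) (ast->dot-lines e))
  (string-join (append (list "digraph prog {") lines (list "}")) "\n"))
```

Calling @tt{(displayln (ast->dot (if-e (prim-e 'zero? (list (var-e 'x))) (int-e 1) (int-e 2))))} prints a description with the same nodes and edges as the example above, although the generated IDs and the order of the statements will differ.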
| 242 | + |
| 243 | + |
| 244 | +The complete library (three files): |
| 245 | + |
| 246 | +@codeblock-include["knock/dot.rkt"] |
| 247 | +@codeblock-include["knock/render-ast.rkt"] |
| 248 | +@codeblock-include["knock/pretty-printer.rkt"] |