
Commit 7a8dede

feat: add Extra field to store additional AST node metadata (#152)
* feat: add Extra field to store additional AST node metadata

  Collect direct calls and anonymous functions in Go parser and store them in the Extra field of Function, Dependency, Type, and Var.
* feat: add custom extra info
* docs: update uniast docs and version
* use const
* doc: update uniast version
* doc: change FunctionIsCall to IsInvoked
1 parent 46c1f30 commit 7a8dede

7 files changed

Lines changed: 260 additions & 10 deletions


.github/workflows/regression.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -19,6 +19,7 @@ jobs:
 ['id']
 ['Path']
 ['ToolVersion']
+['ASTVersion']
 ['Modules']['a.b/c']['Dependencies']['a.b/c']
 ['Modules']['a.b/c/cmdx']['Dependencies']['a.b/c/cmdx']
 steps:
```

docs/uniast-en.md

Lines changed: 49 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
-# Universal Abstract-Syntax-Tree Specification (v0.1.3)
+# Universal Abstract-Syntax-Tree Specification (v0.1.5)
 
 Universal Abstract-Syntax-Tree is a LLM-friendly, language-agnostic code context data structure established by ABCoder. It represents a unified abstract syntax tree of a repository's code, collecting definitions of language entities (functions, types, constants/variables) and their interdependencies for subsequent AI understanding and coding-workflow development.
 
```
```diff
@@ -370,6 +370,23 @@ Function type AST Node entity, corresponding to [NodeType] as FUNC, including fu
 
 - Vars: Global variables referenced within the current function, including variables and constants
 
+- Extra: Additional information for storing language-specific details or extra metadata
+
+
+- AnonymousFunctions: Anonymous functions defined in the function, each element is the FileLine of the corresponding function
+
+
+- File: The filename where it is located
+
+
+- Line: **Line number of the starting position in the file (starting from 1)**
+
+
+- StartOffset: **Byte offset of the code starting position relative to the file header**
+
+
+- EndOffset: **Byte offset of the code ending position relative to the file header**
+
 
 ###### Dependency
 
```
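For consumers of the JSON output, the Function `Extra` fields documented above can be mirrored with plain Go structs. The sketch below assumes that `Extra` serializes as a JSON object keyed by the documented names and that each `FileLine` carries exactly the four fields listed; the types `FileLine` and `functionNode`, the helper `anonymousFuncs`, and the sample payload are hypothetical and not part of the uniast package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FileLine mirrors the four documented position fields (hypothetical local type).
type FileLine struct {
	File        string `json:"File"`
	Line        int    `json:"Line"`
	StartOffset int    `json:"StartOffset"`
	EndOffset   int    `json:"EndOffset"`
}

// functionNode keeps only the part of a Function node this sketch reads.
type functionNode struct {
	Extra struct {
		AnonymousFunctions []FileLine `json:"AnonymousFunctions"`
	} `json:"Extra"`
}

// anonymousFuncs decodes one Function node and returns the positions of its
// anonymous functions; a node without Extra yields an empty result.
func anonymousFuncs(raw []byte) ([]FileLine, error) {
	var fn functionNode
	if err := json.Unmarshal(raw, &fn); err != nil {
		return nil, err
	}
	return fn.Extra.AnonymousFunctions, nil
}

func main() {
	// Illustrative payload with made-up positions.
	raw := []byte(`{"Extra": {"AnonymousFunctions": [
		{"File": "manager.go", "Line": 42, "StartOffset": 900, "EndOffset": 960}
	]}}`)
	fls, err := anonymousFuncs(raw)
	if err != nil {
		panic(err)
	}
	for _, fl := range fls {
		fmt.Printf("%s:%d (bytes %d-%d)\n", fl.File, fl.Line, fl.StartOffset, fl.EndOffset)
	}
}
```

Decoding into a typed struct rather than a generic map keeps the position fields as integers; a generic `map[string]any` would turn them into `float64` values.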
````diff
@@ -384,7 +401,10 @@ Represents a dependency relationship, containing the dependent node Id, dependen
 "File": "manager.go",
 "Line": 140,
 "StartOffset": 3547,
-"EndOffset": 3564
+"EndOffset": 3564,
+"Extra": {
+"IsInvoked": true
+}
 }
 ```
 
````
```diff
@@ -409,6 +429,12 @@ Represents a dependency relationship, containing the dependent node Id, dependen
 - EndOffset: Offset of the ending position of the dependency point (not the dependent node) token relative to the code file
 
 
+- Extra: Additional information for storing language-specific details or extra metadata
+
+
+- IsInvoked: For function/method dependencies, whether it is invoked or just referenced (not executed).
+
+
 ##### Type
 
 Type definition, [NodeType] is TYPE, including type definitions in specific languages such as structs, enums, interfaces, type aliases, etc.
```
```diff
@@ -490,6 +516,9 @@ Type definition, [NodeType] is TYPE, including type definitions in specific lang
 - Implements: Which interfaces this type implements Identity
 
 
+- Extra: Additional information for storing language-specific details or extra metadata
+
+
 ##### Var
 
 Global variables, including variables and constants, **but must be global**
```
```diff
@@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
 - Groups: Group definitions, such as `const( A=1, B=2, C=3)` in Go, Groups would be `[C=3, B=2]` (assuming A is the variable itself)
 
 
+- Extra: Additional information for storing language-specific details or extra metadata
+
+
+- AnonymousFunctions: Anonymous functions defined in the initialization function of the current variable. Each element is the FileLine of the corresponding function
+
+
+- File: The filename where it is located
+
+
+- Line: **Line number of the starting position in the file (starting from 1)**
+
+
+- StartOffset: **Byte offset of the code starting position relative to the file header**
+
+
+- EndOffset: **Byte offset of the code ending position relative to the file header**
+
+
 ### Graph
 
 The dependency topology graph of all AST Nodes in the repository. Formatted as Identity => Node mapping, where each Node contains dependency relationships with other nodes.
```
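For a global variable, the anonymous functions are the function literals inside its initializer expression, as in `var handler = func(...) {...}`. The sketch below locates them with the standard `go/parser` and `go/ast` packages and reports 1-based line numbers. It is a simplified illustration, not the ABCoder parser; `varAnonFuncLines` and the sample source `demoSrc` are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// demoSrc is a hypothetical input file.
const demoSrc = `package demo

var handler = func(x int) int {
	return x * 2
}

var plain = 42

var wrapped = wrap(func() {})

func wrap(f func()) func() { return f }
`

// varAnonFuncLines maps each package-level var name to the 1-based lines of
// the function literals found in its initializer.
func varAnonFuncLines(src string) (map[string][]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := map[string][]int{}
	for _, decl := range file.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.VAR {
			continue // constants cannot hold function values, so only var blocks matter
		}
		for _, spec := range gd.Specs {
			vs := spec.(*ast.ValueSpec)
			for i, val := range vs.Values {
				if i >= len(vs.Names) {
					break
				}
				name := vs.Names[i].Name
				ast.Inspect(val, func(n ast.Node) bool {
					if lit, ok := n.(*ast.FuncLit); ok {
						out[name] = append(out[name], fset.Position(lit.Pos()).Line)
					}
					return true
				})
			}
		}
	}
	return out, nil
}

func main() {
	lines, err := varAnonFuncLines(demoSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines["handler"], lines["wrapped"], len(lines["plain"])) // [3] [9] 0
}
```

The literal nested inside the call `wrap(func() {})` is found too, because `ast.Inspect` descends into the whole initializer expression.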

docs/uniast-zh.md

Lines changed: 50 additions & 3 deletions
```diff
@@ -1,4 +1,4 @@
-# Universal Abstract-Syntax-Tree Specification (v0.1.3)
+# Universal Abstract-Syntax-Tree Specification (v0.1.5)
 
 Universal Abstract-Syntax-Tree is an LLM-friendly, language-agnostic code-context data structure established by ABCoder, representing a unified abstract syntax tree of a repository's code. It collects the definitions of language entities (functions, types, constants/variables) and their interdependencies, for subsequent AI understanding and coding-workflow development.
 
```
````diff
@@ -371,20 +371,40 @@ Universal Abstract-Syntax-Tree is an ABCoder-built, LLM-friendly, language
 - Vars: Global values referenced within the current function, including variables and constants
 
 
+- Extra: Extra information, used to store language-specific details or additional metadata
+
+
+- AnonymousFunctions: Anonymous functions defined in the function; each element is the FileLine of the corresponding function
+
+
+- File: The name of the file it is in
+
+
+- Line: **Line number of the starting position in the file (starting from 1)**
+
+
+- StartOffset: Starting position of the code, **as a byte offset relative to the file header**
+
+
+- EndOffset: Ending position of the code, **as a byte offset relative to the file header**
+
 ###### Dependency
 
 Represents a dependency relationship, containing the dependent node Id, the location where the dependency occurs, and other information, so that the LLM can identify it accurately
 
 
-```
+```json
 {
 "ModPath": "github.com/cloudwego/localsession",
 "PkgPath": "github.com/cloudwego/localsession",
 "Name": "transmitSessionIdentity",
 "File": "manager.go",
 "Line": 140,
 "StartOffset": 3547,
-"EndOffset": 3564
+"EndOffset": 3564,
+"Extra": {
+"IsInvoked": true
+}
 }
 ```
 
````
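```diff
@@ -409,6 +429,12 @@ Universal Abstract-Syntax-Tree is an ABCoder-built, LLM-friendly, language
 - EndOffset: Offset of the ending position of the dependency point (not the dependent node) token relative to the code file
 
 
+- Extra: Extra information, used to store language-specific details or additional metadata
+
+
+- IsInvoked: For function/method-call dependencies, indicates whether the function is invoked, or only its reference is taken without executing it.
+
+
 ##### Type
 
 Type definition; [NodeType] is TYPE. Includes type definitions in specific languages, such as structs, enums, interfaces, type aliases, etc.
```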
```diff
@@ -490,6 +516,9 @@ Universal Abstract-Syntax-Tree is an ABCoder-built, LLM-friendly, language
 - Implements: The interfaces this type implements (**Identity**)
 
 
+- Extra: Extra information, used to store language-specific details or additional metadata
+
+
 ##### Var
 
 Global values, including variables and constants, **but they must be global**
```
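```diff
@@ -553,6 +582,24 @@ var x = getx(y db.Data) int {
 - Groups: Definitions in the same group, such as `const( A=1, B=2, C=3)` in Go; Groups is `[C=3, B=2]` (assuming A is the variable itself)
 
 
+- Extra: Extra information, used to store language-specific details or additional metadata
+
+
+- AnonymousFunctions: Anonymous functions defined in the initialization function of the current variable. Each element is the FileLine of the corresponding function
+
+
+- File: The name of the file it is in
+
+
+- Line: **Line number of the starting position in the file (starting from 1)**
+
+
+- StartOffset: Starting position of the code, **as a byte offset relative to the file header**
+
+
+- EndOffset: Ending position of the code, **as a byte offset relative to the file header**
+
+
 ### Graph
 
 The dependency topology graph of all AST Nodes in the repository, in the form of an Identity => Node mapping, where each Node contains its dependencies on other nodes. Based on this topology graph, **the context of any node can be retrieved recursively**
```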

go.mod

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,6 +4,7 @@ go 1.23.4
 
 require (
 	github.com/Knetic/govaluate v3.0.1-0.20171022003610-9aa49832a739+incompatible
+	github.com/bytedance/sonic v1.14.1
 	github.com/cloudwego/eino v0.3.52
 	github.com/cloudwego/eino-ext/components/model/ark v0.1.16
 	github.com/cloudwego/eino-ext/components/model/claude v0.1.1
```
```diff
@@ -44,7 +45,6 @@ require (
 	github.com/bahlo/generic-list-go v0.2.0 // indirect
 	github.com/buger/jsonparser v1.1.1 // indirect
 	github.com/bytedance/gopkg v0.1.3 // indirect
-	github.com/bytedance/sonic v1.14.1 // indirect
 	github.com/bytedance/sonic/loader v0.3.0 // indirect
 	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
 	github.com/cloudwego/base64x v0.1.6 // indirect
```

lang/golang/parser/file.go

Lines changed: 60 additions & 3 deletions
```diff
@@ -26,6 +26,11 @@ import (
 	. "github.com/cloudwego/abcoder/lang/uniast"
 )
 
+const (
+	ExtraKey_IsInvoked = "IsInvoked"
+	ExtraKey_AnonymousFunctions = "AnonymousFunctions"
+)
+
 func (p *GoParser) parseFile(ctx *fileContext, f *ast.File) error {
 	cont := true
 	ast.Inspect(f, func(node ast.Node) bool {
```
```diff
@@ -121,7 +126,9 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool
 
 	// collect func value dependencies, in case of var a = func() {...}
 	if val != nil && !isConst {
-		collects := collectInfos{}
+		collects := collectInfos{
+			directCalls: map[FileLine]bool{},
+		}
 		ast.Inspect(*val, func(n ast.Node) bool {
 			return p.parseASTNode(ctx, n, &collects)
 		})
```
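The `parseVar` hunk replaces the zero value `collectInfos{}` with a literal that allocates `directCalls`. That change is required rather than cosmetic: `parseCall` assigns into the map, and in Go assigning to an entry of a nil map panics, whereas reading from a nil map (as the later `collects.directCalls[dep.FileLine]` lookups do) just returns the zero value. A minimal sketch of that behavior, using a hypothetical, stripped-down `collectInfos` keyed by `string` instead of `FileLine`:

```go
package main

import "fmt"

// collectInfos is a hypothetical, stripped-down version of the parser's
// struct, keyed by string instead of FileLine.
type collectInfos struct {
	directCalls map[string]bool
}

// record writes into directCalls the way parseCall does, converting a
// runtime panic into an error so both outcomes can be observed.
func record(c *collectInfos, key string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	c.directCalls[key] = true
	return nil
}

func main() {
	zero := collectInfos{} // directCalls is nil
	fmt.Println(record(&zero, "f"))
	// Reading from a nil map is fine: it returns the zero value.
	fmt.Println(zero.directCalls["f"])

	ready := collectInfos{directCalls: map[string]bool{}}
	fmt.Println(record(&ready, "f"), ready.directCalls["f"])
}
```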
```diff
@@ -137,6 +144,16 @@ func (p *GoParser) parseVar(ctx *fileContext, vspec *ast.ValueSpec, isConst bool
 		for _, dep := range collects.tys {
 			v.Dependencies = InsertDependency(v.Dependencies, dep)
 		}
+		if len(collects.directCalls) > 0 {
+			for i, dep := range v.Dependencies {
+				if collects.directCalls[dep.FileLine] {
+					v.Dependencies[i].SetExtra(ExtraKey_IsInvoked, true)
+				}
+			}
+		}
+		if len(collects.anonymousFunctions) > 0 {
+			v.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
+		}
 	}
 
 	if vspec.Type != nil {
```
```diff
@@ -392,12 +409,19 @@ func (p *GoParser) parseSelector(ctx *fileContext, expr *ast.SelectorExpr, infos
 type collectInfos struct {
 	functionCalls, methodCalls []Dependency
 	tys, globalVars []Dependency
+
+	directCalls map[FileLine]bool
+	anonymousFunctions []FileLine // record anonymous function
 }
 
 func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collectInfos) bool {
 	switch expr := node.(type) {
 	case *ast.SelectorExpr:
 		return p.parseSelector(ctx, expr, collect)
+	case *ast.CallExpr:
+		p.parseCall(ctx, expr, collect)
+	case *ast.FuncLit:
+		collect.anonymousFunctions = append(collect.anonymousFunctions, ctx.FileLine(expr))
 	case *ast.Ident:
 		callName := expr.Name
 		// println("[parseFunc] ast.Ident:", callName)
```
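The two new `parseASTNode` cases can be approximated outside ABCoder with `ast.Inspect`: a `*ast.CallExpr` contributes the position of its callee identifier, and a `*ast.FuncLit` contributes its own position. This sketch uses `token.Position` (a comparable struct) as a stand-in for uniast's `FileLine`; `collected`, `collectFunc` and `demoSrc` are hypothetical. It records every syntactic call, including calls through local variables such as `up(s)`, which presumably never coincide with a recorded dependency's position and so would not tag anything.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// collected is a hypothetical stand-in for collectInfos.
type collected struct {
	directCalls        map[token.Position]bool
	anonymousFunctions []token.Position
}

const demoSrc = `package demo

import "strings"

func run(s string) string {
	up := strings.ToUpper
	defer func() {}()
	return helper(up(s))
}

func helper(s string) string { return s }
`

// collectFunc walks the body of the named function and applies the same two
// rules as the new parseASTNode cases.
func collectFunc(src, name string) (*collected, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	c := &collected{directCalls: map[token.Position]bool{}}
	for _, decl := range file.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Name.Name != name || fd.Body == nil {
			continue
		}
		ast.Inspect(fd.Body, func(n ast.Node) bool {
			switch e := n.(type) {
			case *ast.CallExpr:
				// Record the callee identifier: f(...) -> f, x.M(...) -> M.
				var id *ast.Ident
				switch fun := e.Fun.(type) {
				case *ast.Ident:
					id = fun
				case *ast.SelectorExpr:
					id = fun.Sel
				}
				if id != nil {
					c.directCalls[fset.Position(id.Pos())] = true
				}
			case *ast.FuncLit:
				c.anonymousFunctions = append(c.anonymousFunctions, fset.Position(e.Pos()))
			}
			return true // keep descending so nested calls and literals are visited
		})
	}
	return c, nil
}

func main() {
	c, err := collectFunc(demoSrc, "run")
	if err != nil {
		panic(err)
	}
	// strings.ToUpper is only referenced, not called, so it is not recorded;
	// helper(...) and up(...) are both recorded on line 8.
	for pos := range c.directCalls {
		fmt.Println("direct call at", pos)
	}
	for _, pos := range c.anonymousFunctions {
		fmt.Println("anonymous function at", pos)
	}
}
```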
```diff
@@ -462,6 +486,22 @@ func (p *GoParser) parseASTNode(ctx *fileContext, node ast.Node, collect *collec
 	return true
 }
 
+// parseCall collect direct call info
+func (p *GoParser) parseCall(ctx *fileContext, expr *ast.CallExpr, collect *collectInfos) {
+	var ident *ast.Ident
+
+	switch idt := expr.Fun.(type) {
+	case *ast.Ident:
+		ident = idt
+	case *ast.SelectorExpr:
+		ident = idt.Sel
+	}
+
+	if ident != nil {
+		collect.directCalls[ctx.FileLine(ident)] = true
+	}
+}
+
 // parseFunc parses all function declaration in one file
 func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Function, bool) {
 	// method receiver
```
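`parseCall` recognizes only two callee shapes: a bare identifier (`f(x)`) and a selector (`pkg.F(x)`, `obj.Method()`), in which case the `Sel` identifier is recorded. Other callee expressions, such as a parenthesized callee, an explicit generic instantiation like `Map[int](xs)`, an indexed function value, or an immediately invoked function literal, leave `ident` nil, so no direct call is recorded for them. Type conversions such as `int(x)` are also `*ast.CallExpr` nodes in `go/ast`, so their identifier is recorded as well. The sketch below reproduces only the type switch to show which name, if any, each form yields; `calleeIdent` and `describe` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// calleeIdent reproduces the type switch in parseCall (hypothetical helper).
func calleeIdent(call *ast.CallExpr) *ast.Ident {
	switch fun := call.Fun.(type) {
	case *ast.Ident:
		return fun
	case *ast.SelectorExpr:
		return fun.Sel
	}
	return nil
}

// describe parses one call expression and reports the recorded name, if any.
func describe(src string) string {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "parse error: " + err.Error()
	}
	call, ok := expr.(*ast.CallExpr)
	if !ok {
		return "not a call"
	}
	if id := calleeIdent(call); id != nil {
		return id.Name
	}
	return "<none>"
}

func main() {
	for _, src := range []string{
		"f(x)",         // plain function call
		"pkg.F(x)",     // package-qualified call
		"obj.Method()", // method call
		"int(x)",       // conversion: also a CallExpr
		"(f)(x)",       // parenthesized callee
		"Map[int](xs)", // explicit generic instantiation
		"fs[0]()",      // indexed function value
		"func() {}()",  // immediately invoked literal
	} {
		fmt.Printf("%-14s -> %s\n", src, describe(src))
	}
}
```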
```diff
@@ -511,7 +551,9 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
 	// collect content
 	content := string(ctx.GetRawContent(funcDecl))
 
-	collects := collectInfos{}
+	collects := collectInfos{
+		directCalls: map[FileLine]bool{},
+	}
 	if funcDecl.Body == nil {
 		goto set_func
 	}
```
```diff
@@ -521,7 +563,6 @@ func (p *GoParser) parseFunc(ctx *fileContext, funcDecl *ast.FuncDecl) (*Functio
 	})
 
 set_func:
-
 	if fname == "init" && p.repo.GetFunction(NewIdentity(ctx.module.Name, ctx.pkgPath, fname)) != nil {
 		// according to https://go.dev/ref/spec#Program_initialization_and_execution,
 		// duplicated init() is allowed and never be referenced, thus add a subfix
```
```diff
@@ -544,6 +585,22 @@ set_func:
 		f.Types = InsertDependency(f.Types, t)
 	}
 	f.Signature = string(sig)
+
+	if len(collects.directCalls) > 0 {
+		for i, dep := range f.FunctionCalls {
+			if collects.directCalls[dep.FileLine] {
+				f.FunctionCalls[i].SetExtra(ExtraKey_IsInvoked, true)
+			}
+		}
+		for i, dep := range f.MethodCalls {
+			if collects.directCalls[dep.FileLine] {
+				f.MethodCalls[i].SetExtra(ExtraKey_IsInvoked, true)
+			}
+		}
+	}
+	if len(collects.anonymousFunctions) > 0 {
+		f.SetExtra(ExtraKey_AnonymousFunctions, collects.anonymousFunctions)
+	}
 	return f, false
 }
 
```
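The `set_func` hunk tags a `FunctionCalls` or `MethodCalls` entry as invoked when its `FileLine` was recorded by `parseCall`. This relies on both positions being computed from the same identifier node, which the diff implies but does not show. The loop writes through `f.FunctionCalls[i]` rather than the range variable `dep`, which is a copy of the element. A minimal sketch with hypothetical simplified types; the real `SetExtra` body is not part of this commit view, so the lazy map allocation below is an assumption.

```go
package main

import "fmt"

// FileLine is a hypothetical stand-in for uniast's position type.
type FileLine struct {
	File        string
	Line        int
	StartOffset int
	EndOffset   int
}

// Dependency is a hypothetical simplified dependency with an Extra map.
type Dependency struct {
	Name     string
	FileLine FileLine
	Extra    map[string]any
}

// SetExtra allocates Extra lazily (an assumption about the real method).
func (d *Dependency) SetExtra(key string, val any) {
	if d.Extra == nil {
		d.Extra = map[string]any{}
	}
	d.Extra[key] = val
}

const ExtraKeyIsInvoked = "IsInvoked"

// markInvoked tags each dependency whose position was recorded as a direct call.
// It writes through deps[i]: dep is a copy, so dep.SetExtra would allocate
// Extra on the copy only and leave the slice element unchanged.
func markInvoked(deps []Dependency, directCalls map[FileLine]bool) {
	for i, dep := range deps {
		if directCalls[dep.FileLine] {
			deps[i].SetExtra(ExtraKeyIsInvoked, true)
		}
	}
}

func main() {
	call := FileLine{File: "manager.go", Line: 140, StartOffset: 3547, EndOffset: 3564}
	ref := FileLine{File: "manager.go", Line: 150, StartOffset: 3800, EndOffset: 3817} // made-up position
	deps := []Dependency{
		{Name: "transmitSessionIdentity", FileLine: call},
		{Name: "someCallback", FileLine: ref}, // hypothetical: referenced, not called
	}
	markInvoked(deps, map[FileLine]bool{call: true})
	for _, d := range deps {
		fmt.Println(d.Name, d.Extra[ExtraKeyIsInvoked] == true)
	}
}
```

Because `FileLine` is a struct of comparable fields, it can serve directly as the map key, which is what makes the `directCalls[dep.FileLine]` lookup possible.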