Hi there, I built yutu — a YouTube CLI powered by cobra. Over time, I extended it into an MCP server and an AI agent, all within the same binary. The mcp and agent modes are just cobra subcommands:
> yutu
Available Commands:
  agent       Start agent to automate YouTube workflows
  mcp         Start MCP server
  video       Manipulate YouTube videos
  playlist    Manipulate YouTube playlists
  ...
Three libraries make this work: cobra for the CLI, go-sdk for the MCP server, and adk-go for the agent.
The point of this post: any cobra application can become an MCP server and an AI agent with minimal glue code — and the main friction point is flag/schema duplication between cobra and MCP.
Architecture
The key insight is that all three interfaces share the same domain logic. Only the "input layer" differs:
                   +--- CLI Flags ------ cobra ------+
main.go -> cmd/ ---+                                 +---> pkg/<resource>/
                   +--- MCP Schema ---- go-sdk ------+
                   |                                 |
                   +--- Agent --------- adk-go ------+
                        (reuses MCP tools via in-memory transport)
pkg/<resource>/: Pure domain logic. Each resource (video, channel, playlist, ...) exposes methods like List(), Insert(), Update(), Delete() operating on a struct with functional options.
cmd/<resource>/: Registers both cobra subcommands and MCP tools, calling the same pkg/ methods.
cmd/agent/: The agent connects to the MCP server via an in-memory transport and reuses all registered MCP tools - zero additional wiring per resource.
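To make the layering concrete, here is a minimal sketch of one domain method fed by two input layers. It is not yutu's code: it uses only the standard library, with the stdlib flag package standing in for cobra and plain JSON decoding standing in for the MCP SDK. The helper names fromFlags, fromJSON, and render are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"flag"
	"fmt"
	"io"
)

// Activity stands in for a pkg/<resource> domain struct.
type Activity struct {
	ChannelId  string `json:"channel_id,omitempty"`
	MaxResults int64  `json:"max_results,omitempty"`
}

// List is the single piece of domain logic that every input layer calls.
func (a *Activity) List(w io.Writer) error {
	_, err := fmt.Fprintf(w, "channel=%s max=%d", a.ChannelId, a.MaxResults)
	return err
}

// fromFlags is the CLI-style input layer: argv -> Activity.
// (Hypothetical helper; stdlib flag is a stand-in for cobra/pflag.)
func fromFlags(args []string) (*Activity, error) {
	a := &Activity{}
	fs := flag.NewFlagSet("list", flag.ContinueOnError)
	fs.StringVar(&a.ChannelId, "channelId", "", "channel ID")
	fs.Int64Var(&a.MaxResults, "maxResults", 5, "max results")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return a, nil
}

// fromJSON is the MCP-style input layer: JSON arguments -> Activity.
func fromJSON(raw []byte) (*Activity, error) {
	a := &Activity{MaxResults: 5} // same default as the flag
	if err := json.Unmarshal(raw, a); err != nil {
		return nil, err
	}
	return a, nil
}

// render runs the shared domain method and returns what it wrote.
func render(a *Activity) string {
	var buf bytes.Buffer
	if err := a.List(&buf); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	cli, err := fromFlags([]string{"-channelId", "UC123", "-maxResults", "3"})
	if err != nil {
		panic(err)
	}
	tool, err := fromJSON([]byte(`{"channel_id":"UC123","max_results":3}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(render(cli))
	fmt.Println(render(tool))
}
```

Both calls end up in the same List method; only the code that fills the struct differs.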
Step 1: Domain logic in pkg/
Each resource is a self-contained package with a struct, functional options, and methods:
// pkg/activity/activity.go
type Activity struct {
    ChannelId  string `json:"channel_id,omitempty"`
    MaxResults int64  `json:"max_results,omitempty"`
    // ...
}

func NewActivity(opts ...Option) IActivity[youtube.Activity] { /* ... */ }
func (a *Activity) List(writer io.Writer) error { /* ... */ }
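For readers who have not used functional options, a minimal self-contained sketch of the pattern follows. The option names WithChannelId and WithMaxResults are hypothetical, and this NewActivity returns a plain *Activity rather than the IActivity[youtube.Activity] interface shown above, so treat it as an illustration of the shape and not as yutu's implementation.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Activity mirrors the domain struct from the excerpt above.
type Activity struct {
	ChannelId  string
	MaxResults int64
}

// Option mutates an Activity during construction.
type Option func(*Activity)

// WithChannelId is a hypothetical option that sets the channel to query.
func WithChannelId(id string) Option {
	return func(a *Activity) { a.ChannelId = id }
}

// WithMaxResults is a hypothetical option; non-positive values keep the default.
func WithMaxResults(n int64) Option {
	return func(a *Activity) {
		if n > 0 {
			a.MaxResults = n
		}
	}
}

// NewActivity applies defaults first, then each option in order.
func NewActivity(opts ...Option) *Activity {
	a := &Activity{MaxResults: 5}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// List stands in for the real API call and only reports what it would do.
func (a *Activity) List(w io.Writer) error {
	_, err := fmt.Fprintf(w, "listing up to %d activities for %s\n", a.MaxResults, a.ChannelId)
	return err
}

func main() {
	a := NewActivity(WithChannelId("UC123"), WithMaxResults(10))
	if err := a.List(os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Because List takes an io.Writer, the CLI can pass os.Stdout while an MCP handler passes a buffer.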
Step 2: CLI + MCP in cmd/
Each resource's init() registers both a cobra command and an MCP tool side by side, sharing usage strings:
// cmd/activity/list.go
func init() {
    // MCP tool registration
    mcp.AddTool(cmd.Server, &mcp.Tool{
        Name: "activity-list", InputSchema: listInSchema,
    }, cmd.GenToolHandler("activity-list",
        func(input activity.Activity, writer io.Writer) error {
            return input.List(writer)
        },
    ))

    // Cobra flag registration
    activityCmd.AddCommand(listCmd)
    listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
    listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
    // ...
}
The MCP tool handler is generic - a single GenToolHandler[T] function handles JSON deserialization into the domain struct and writes the result:
// cmd/handler.go
func GenToolHandler[T any](
    toolName string, op func(T, io.Writer) error,
) mcp.ToolHandlerFor[T, any] { /* ... */ }
Step 3: Agent reuses MCP tools
The agent doesn't need to know about individual resources at all. It connects to the same MCP server via an in-memory transport and gets all tools for free:
// cmd/agent/agent.go
clientTransport, serverTransport := mcp.NewInMemoryTransports()
cmd.Server.Connect(ctx, serverTransport, nil)
mcpToolSet, _ := mcptoolset.New(mcptoolset.Config{
    Transport: clientTransport,
})
This means adding a new YouTube resource to the CLI automatically makes it available as an MCP tool and an agent capability, with one registration in cmd/<resource>/.
The agent itself uses a multi-agent architecture (orchestrator + retrieval/modifier/destroyer sub-agents), with each sub-agent receiving a filtered subset of MCP tools:
tool.FilterToolset(mcpToolSet, tool.StringPredicate(def.toolNames))
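The filtering idea can be shown without adk-go. The sketch below is a plain-Go analogue, not the library's API: predicate, namePredicate, and filterTools are hypothetical names, and every tool name other than activity-list is made up for the example.

```go
package main

import "fmt"

// predicate decides whether a tool name is visible to a sub-agent.
type predicate func(name string) bool

// namePredicate builds a predicate from an allow-list of tool names.
func namePredicate(allowed []string) predicate {
	set := make(map[string]bool, len(allowed))
	for _, n := range allowed {
		set[n] = true
	}
	return func(name string) bool { return set[name] }
}

// filterTools keeps the tools accepted by keep, preserving their order.
func filterTools(all []string, keep predicate) []string {
	var out []string
	for _, name := range all {
		if keep(name) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Only "activity-list" appears in this post; the rest are illustrative.
	all := []string{"activity-list", "video-list", "video-update", "video-delete"}

	retrieval := filterTools(all, namePredicate([]string{"activity-list", "video-list"}))
	modifier := filterTools(all, namePredicate([]string{"video-update"}))
	destroyer := filterTools(all, namePredicate([]string{"video-delete"}))

	fmt.Println("retrieval:", retrieval)
	fmt.Println("modifier:", modifier)
	fmt.Println("destroyer:", destroyer)
}
```

Keeping destructive tools out of the retrieval sub-agent's tool list means it cannot call them at all.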
The Duplication Problem
The one real cost is duplication: every input is defined twice, once as cobra flags and once as an MCP schema. Here is an example:
MCP Schema:
var listInSchema = &jsonschema.Schema{
    Type: "object",
    Properties: map[string]*jsonschema.Schema{
        "channel_id":  {Type: "string", Description: ciUsage},
        "max_results": {Type: "number", Description: pkg.MRUsage, Default: json.RawMessage("5")},
        "mine":        {Type: "boolean", Description: mineUsage},
        // ...
    },
}
Cobra Flags:
listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
listCmd.Flags().BoolVarP(mine, "mine", "M", true, mineUsage)
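They share descriptions (ciUsage, pkg.MRUsage), but name, type, and default are each defined twice. The names already differ: channel_id in the schema versus channelId on the command line.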
Bridging the Gap Today
Cobra and pflag already provide building blocks that get us partway there. The pflag.Flag struct exposes:
type Flag struct {
    Name        string
    Shorthand   string
    Usage       string              // → MCP description
    Value       Value               // .Type() → MCP type, .String() → MCP default
    DefValue    string              // → MCP default
    Annotations map[string][]string // extensible metadata
    // ...
}
Cobra and pflag's FlagSet add higher-level APIs on top:
MarkFlagRequired (cobra): sets an annotation (BashCompOneRequiredFlag), which maps to MCP Required
RegisterFlagCompletionFunc (cobra): provides valid values for shell completion, which conceptually maps to MCP Enum
VisitAll (pflag FlagSet): iterates every flag in a command's flag set
So in theory, you could write a converter that walks a cobra command's flags and generates an MCP schema automatically:
func SchemaFromCmd(cmd *cobra.Command) *jsonschema.Schema {
    schema := &jsonschema.Schema{Type: "object", Properties: map[string]*jsonschema.Schema{}}
    cmd.Flags().VisitAll(func(f *pflag.Flag) {
        prop := &jsonschema.Schema{
            Description: f.Usage,
            // quoteDefault (not shown) must turn f.DefValue into valid JSON,
            // e.g. wrap string defaults in quotes.
            Default: json.RawMessage(quoteDefault(f)),
        }
        switch f.Value.Type() {
        case "string":
            prop.Type = "string"
        case "int", "int64", "float64":
            prop.Type = "number"
        case "bool":
            prop.Type = "boolean"
        case "stringSlice":
            prop.Type = "array"
            prop.Items = &jsonschema.Schema{Type: "string"}
        }
        // MarkFlagRequired stores an annotation we can read back.
        // The length check avoids a panic on an empty annotation slice.
        if ann := f.Annotations[cobra.BashCompOneRequiredFlag]; len(ann) > 0 && ann[0] == "true" {
            schema.Required = append(schema.Required, f.Name)
        }
        schema.Properties[f.Name] = prop
    })
    return schema
}
This covers type, default, description, and required — the overlapping subset. But the remaining MCP-only features (Enum, Minimum/Maximum, Items constraints) have no cobra equivalent to read from.
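The converter leans on a quoteDefault helper that is not shown. The reason it matters: DefValue is plain text, so a string default such as an empty string is not valid JSON until it is quoted. Below is one possible sketch, assuming a signature that takes the type name and default text directly instead of a *pflag.Flag, so it runs with the standard library only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// quoteDefault converts a flag's textual default into a JSON literal.
// typeName mirrors what pflag's Value.Type() reports and defValue mirrors
// Flag.DefValue. This signature is an assumption made for the sketch.
// It returns nil when the default cannot be represented, so the caller
// can leave the schema's default unset.
func quoteDefault(typeName, defValue string) json.RawMessage {
	switch typeName {
	case "string":
		b, err := json.Marshal(defValue) // adds quotes and escapes
		if err != nil {
			return nil
		}
		return b
	case "int", "int64":
		n, err := strconv.ParseInt(defValue, 10, 64)
		if err != nil {
			return nil
		}
		return json.RawMessage(strconv.FormatInt(n, 10))
	case "float64":
		f, err := strconv.ParseFloat(defValue, 64)
		if err != nil {
			return nil
		}
		b, err := json.Marshal(f) // fails for NaN and infinities
		if err != nil {
			return nil
		}
		return b
	case "bool":
		v, err := strconv.ParseBool(defValue)
		if err != nil {
			return nil
		}
		return json.RawMessage(strconv.FormatBool(v))
	}
	// Slice types and anything unknown are left out of this sketch.
	return nil
}

func main() {
	fmt.Println(string(quoteDefault("string", "")))
	fmt.Println(string(quoteDefault("int64", "5")))
	fmt.Println(string(quoteDefault("bool", "true")))
	fmt.Println(quoteDefault("int", "abc") == nil)
}
```

Returning nil for unparseable defaults keeps an invalid literal out of the generated schema.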
What's Missing
The gap is narrow but real:
| MCP Schema Feature | Cobra/pflag Equivalent | Status |
|---|---|---|
| type | Flag.Value.Type() | Available |
| description | Flag.Usage | Available |
| default | Flag.DefValue | Available |
| required | MarkFlagRequired annotation | Available (read back via Flag.Annotations) |
| enum | RegisterFlagCompletionFunc | Partial: completion funcs aren't introspectable as a static value list |
| minimum/maximum | (none) | Not available |
The closest cobra has to Enum is RegisterFlagCompletionFunc, but it registers a function (for dynamic completion), not a static list of valid values. There's no way to read back "this flag accepts only these values" as data.
Possible Directions
Two lightweight options could close the gap. The first needs no changes to cobra or pflag; the second is a small API addition:
Option A: Convention over Annotations
pflag's Annotations map[string][]string is already extensible. A community convention (or thin helper library) could encode MCP-relevant metadata:
flags.SetAnnotation("privacy", "enum", []string{"public", "private", "unlisted"})
flags.SetAnnotation("maxResults", "minimum", []string{"0"})
flags.SetAnnotation("maxResults", "maximum", []string{"50"})
The schema converter above would then pick these up. No cobra changes needed — just a convention.
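As a rough idea of the reading side, the sketch below turns an annotations map into the MCP-only constraints. The keys "enum", "minimum", and "maximum" are just the convention proposed above, and the Constraints struct and function names are hypothetical. A real converter would copy these values into the schema type it uses.

```go
package main

import (
	"fmt"
	"strconv"
)

// Constraints holds the schema fields that have no cobra equivalent today.
// Hypothetical type for this sketch.
type Constraints struct {
	Enum    []string
	Minimum *float64
	Maximum *float64
}

// Annotation keys for the proposed convention (not an existing standard).
const (
	annEnum = "enum"
	annMin  = "minimum"
	annMax  = "maximum"
)

// parseBound reads a single numeric annotation; a missing key yields nil.
func parseBound(ann map[string][]string, key string) (*float64, error) {
	vals := ann[key]
	if len(vals) == 0 {
		return nil, nil
	}
	f, err := strconv.ParseFloat(vals[0], 64)
	if err != nil {
		return nil, fmt.Errorf("annotation %q: %w", key, err)
	}
	return &f, nil
}

// constraintsFrom extracts enum values and numeric bounds from a flag's
// Annotations map (the map[string][]string that pflag exposes).
func constraintsFrom(ann map[string][]string) (Constraints, error) {
	var c Constraints
	var err error
	c.Enum = append(c.Enum, ann[annEnum]...)
	if c.Minimum, err = parseBound(ann, annMin); err != nil {
		return c, err
	}
	if c.Maximum, err = parseBound(ann, annMax); err != nil {
		return c, err
	}
	return c, nil
}

func main() {
	c, err := constraintsFrom(map[string][]string{
		"minimum": {"0"},
		"maximum": {"50"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(*c.Minimum, *c.Maximum, len(c.Enum))
}
```

Since annotation values are strings, numeric bounds have to be parsed, and a malformed value is reported as an error here.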
Option B: First-class Enum / ValidValues on pflag
A more ergonomic approach: if pflag's Flag struct gained a ValidValues []string field (or cobra added a MarkFlagEnum method alongside MarkFlagRequired), the same data would serve shell completion, validation, and schema generation:
// Hypothetical
cmd.MarkFlagEnum("privacy", "public", "private", "unlisted")
// Internally: sets Flag.ValidValues + registers completion func + sets annotation
This would unify three things that are currently separate: completion, validation, and schema metadata.
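Something close to this can be approximated today with a custom flag value. The sketch below uses the stdlib flag package so it runs without dependencies. The type implements String, Set, and Type, which is the method set pflag's Value interface asks for. The enumValue type, its ValidValues method, and enumOf are hypothetical names, not an existing cobra or pflag API.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// enumValue is a hypothetical flag value that accepts only a fixed set of strings.
type enumValue struct {
	allowed []string
	value   string
}

func newEnumValue(def string, allowed ...string) *enumValue {
	return &enumValue{allowed: allowed, value: def}
}

// String returns the current value.
func (e *enumValue) String() string { return e.value }

// Set validates the input; the stored value is unchanged on failure.
func (e *enumValue) Set(s string) error {
	for _, a := range e.allowed {
		if a == s {
			e.value = s
			return nil
		}
	}
	return fmt.Errorf("must be one of: %s", strings.Join(e.allowed, ", "))
}

// Type is not used by stdlib flag; it is here because pflag.Value requires it.
func (e *enumValue) Type() string { return "string" }

// ValidValues exposes the allowed values as data, which a completion
// function or a schema generator could read back.
func (e *enumValue) ValidValues() []string {
	return append([]string(nil), e.allowed...)
}

// enumOf returns the allowed values of any flag value that exposes them.
func enumOf(v flag.Value) []string {
	if ev, ok := v.(interface{ ValidValues() []string }); ok {
		return ev.ValidValues()
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("video", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the parse error off stderr in this demo
	fs.Var(newEnumValue("public", "public", "private", "unlisted"), "privacy", "privacy status")

	err := fs.Parse([]string{"-privacy", "secret"})
	fmt.Println("rejected:", err != nil)
	fmt.Println("enum:", enumOf(fs.Lookup("privacy").Value))
}
```

The allowed list lives in one place and feeds both validation (Set) and introspection (ValidValues), which is the unification a MarkFlagEnum API would provide out of the box.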
Takeaways
- Cobra + MCP is natural: yutu mcp is just another subcommand. The MCP server is a global var Server initialized at the package level, and each resource's init() registers tools.
- Agent for free: By connecting the agent to the MCP server via in-memory transport, you get all tools without per-resource wiring.
- Shared domain logic: The pkg/ layer is completely interface-agnostic. CLI, MCP, and agent all call the same methods.
- Most flag metadata is already recoverable from pflag's Flag struct + cobra annotations. A simple VisitAll loop can generate ~80% of an MCP schema today.
- The remaining gap is enum values and numeric bounds. A lightweight Annotations convention, or a new MarkFlagEnum API, would close it.
I'd love to hear thoughts from the cobra community — has anyone else extended their CLI into an MCP server or agent? Would an Annotations-based convention or a MarkFlagEnum API be useful?