ListRepositories returns all repositories at once with no pagination, leading to memory spikes and timeouts on large projects #6340

@MayankSharmaCSE

Describe the issue

When you call minder repo list (or the equivalent ListRepositories gRPC endpoint) on a project that has a lot of registered repositories, the server fetches every single one and sends them back in a single response. There is no batching, no pagination, and no cursor: just one giant payload.

This is a known problem. There's even a TODO comment right in the code (line 194 in internal/controlplane/handlers_repositories.go) that says:

    // TODO: Implement cursor-based pagination using entity IDs
    // For now, return all results without pagination

The protobuf response actually has a cursor field, but it's always returned as an empty string, which makes it useless to the client. There's also no page_size parameter in the request, so even if a client wanted to chunk the results on their end, they couldn't ask the server for a specific page.
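To make the missing behavior concrete, here is a minimal sketch of cursor-based (keyset) pagination over entity IDs, in the spirit of the TODO. It is not Minder's code: the Repo type, the listPage helper, the base64 cursor encoding, and the page-size limits are all assumptions for illustration, and a real handler would push the "ID greater than cursor, limit N" filter into the database query rather than slicing in memory.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"sort"
)

// Repo is a hypothetical stand-in for a registered repository record.
type Repo struct {
	ID   string // assumed unique and sortable, e.g. a UUID string
	Name string
}

const (
	defaultPageSize = 50  // assumed default when page_size is unset or <= 0
	maxPageSize     = 500 // assumed hard cap so one call cannot request everything
)

// encodeCursor hides the last returned ID behind an opaque token.
func encodeCursor(lastID string) string {
	return base64.URLEncoding.EncodeToString([]byte(lastID))
}

// decodeCursor reverses encodeCursor; an empty cursor means "start from the beginning".
func decodeCursor(cursor string) (string, error) {
	if cursor == "" {
		return "", nil
	}
	b, err := base64.URLEncoding.DecodeString(cursor)
	if err != nil {
		return "", errors.New("invalid cursor")
	}
	return string(b), nil
}

// listPage returns at most pageSize repos whose ID sorts after the cursor,
// plus the cursor for the next page ("" when this is the last page).
// repos must already be sorted by ID in ascending order.
func listPage(repos []Repo, cursor string, pageSize int) ([]Repo, string, error) {
	afterID, err := decodeCursor(cursor)
	if err != nil {
		return nil, "", err
	}
	if pageSize <= 0 {
		pageSize = defaultPageSize
	}
	if pageSize > maxPageSize {
		pageSize = maxPageSize
	}
	// Binary search for the first repo whose ID is strictly greater than afterID.
	start := sort.Search(len(repos), func(i int) bool { return repos[i].ID > afterID })
	end := start + pageSize
	if end >= len(repos) {
		return repos[start:], "", nil // last page: no further cursor
	}
	page := repos[start:end]
	return page, encodeCursor(page[len(page)-1].ID), nil
}

func main() {
	repos := []Repo{
		{ID: "a", Name: "myorg/one"}, {ID: "b", Name: "myorg/two"},
		{ID: "c", Name: "myorg/three"}, {ID: "d", Name: "myorg/four"},
		{ID: "e", Name: "myorg/five"},
	}
	cursor := ""
	for {
		page, next, err := listPage(repos, cursor, 2)
		if err != nil {
			panic(err)
		}
		fmt.Printf("page with %d repos, more=%v\n", len(page), next != "")
		if next == "" {
			break
		}
		cursor = next
	}
}
```

Keying the cursor on the last returned ID, rather than on a numeric offset, keeps pages stable when repositories are registered or deleted between calls, and the cap on page size bounds the work any single request can trigger.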

The result is that as your project grows, this endpoint becomes slower and hungrier for memory. At a few hundred repos it's annoying. At a few thousand it's a serious problem: the API server can run out of memory, or the CLI just sits there spinning until it times out.

This also means anyone who can register repositories under a project they control has a potential way to overload the API server: bulk-register thousands of repos, then call the list endpoint repeatedly.

To Reproduce

What you'll need:

  • A running minder server (local or staging)
  • A project with a registered GitHub provider
  • minder CLI (or grpcurl for direct gRPC calls)

Steps:

  1. Set up a test project and enroll a GitHub provider:
    minder provider enroll --name github --type github

  2. Register a large number of repositories — the easiest way is to connect a GitHub App installation that already has access to an org with lots of repos. Minder will ingest them all automatically.
    Or, if you just want to test the API behavior, register a bunch manually:
    for i in $(seq 1 1000); do
      minder repo register --repo "myorg/test-repo-$i"
    done

  3. Call the list endpoint and watch what happens:
    Via CLI:
    minder repo list
    Via gRPC directly:
    grpcurl -plaintext -d '{}' localhost:8090 minder.v1.RepositoryService/ListRepositories

  4. Observe the problems:
    - The server hangs for several seconds before returning (or times out if you have enough repos)
    - Server memory usage spikes noticeably during the call
    - The response's `cursor` field is always an empty string, so there is no indication that more pages exist
    - There is no way to request a smaller batch of results

  5. Scale it up to confirm the pain: try with 5,000 repos and notice the latency and memory hit get significantly worse.
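For contrast with the behavior observed above, this is roughly what a client loop could look like once the endpoint honors a page size and returns a real cursor. It is only a sketch: fetchFunc, listAll, and fakeServer are hypothetical names, and fakeServer simulates a paginated server in memory (using a plain numeric offset as its cursor) rather than calling Minder.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// fetchFunc is a hypothetical abstraction over one paginated ListRepositories call.
type fetchFunc func(cursor string, pageSize int) (names []string, next string, err error)

// listAll follows cursors until the server returns an empty one.
// maxPages bounds the loop so a misbehaving server cannot keep the client spinning.
func listAll(fetch fetchFunc, pageSize, maxPages int) ([]string, error) {
	var all []string
	cursor := ""
	for i := 0; i < maxPages; i++ {
		names, next, err := fetch(cursor, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, names...)
		if next == "" {
			return all, nil // empty cursor: no more pages
		}
		if next == cursor {
			return nil, errors.New("cursor did not advance")
		}
		cursor = next
	}
	return nil, errors.New("too many pages")
}

// fakeServer simulates a server holding `total` repos; its cursor is a decimal offset.
func fakeServer(total int) fetchFunc {
	return func(cursor string, pageSize int) ([]string, string, error) {
		start := 0
		if cursor != "" {
			n, err := strconv.Atoi(cursor)
			if err != nil || n < 0 {
				return nil, "", errors.New("invalid cursor")
			}
			start = n
		}
		if start > total {
			start = total
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		if end < start {
			end = start
		}
		names := make([]string, 0, end-start)
		for i := start; i < end; i++ {
			names = append(names, fmt.Sprintf("myorg/test-repo-%d", i+1))
		}
		next := ""
		if end < total {
			next = strconv.Itoa(end)
		}
		return names, next, nil
	}
}

func main() {
	all, err := listAll(fakeServer(1000), 100, 20)
	if err != nil {
		panic(err)
	}
	fmt.Println("fetched", len(all), "repos in pages of 100")
}
```

With this shape, both the server and the client hold at most one page in memory at a time, so latency and memory per call stay roughly constant as the project grows.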

What version are you using?

d8b8b4e


Labels: bug
