Skip to content

Latest commit

 

History

History
475 lines (373 loc) · 14.2 KB

File metadata and controls

475 lines (373 loc) · 14.2 KB

Tabular Repository Implementations

A collection of storage implementations for tabular data with multiple backend support. Provides consistent CRUD operations, search capabilities, and event monitoring across different storage technologies.

Features

  • Multiple storage backends:
    • In-memory (for testing/caching)
    • SQLite (embedded database)
    • PostgreSQL (relational database)
    • IndexedDB (browser storage)
    • Filesystem (JSON file per record)
  • Type-safe schema definitions
  • Compound primary keys support
  • Indexing for efficient search
  • Event-driven architecture
  • Cross-implementation test suite

Installation

bun add @workglow/storage
# or
npm install @workglow/storage

Basic Usage

import { InMemoryTabularStorage } from "@workglow/storage/tabular";

// Define schema and primary keys
const schema = {
  id: "string",
  name: "string",
  age: "number",
  active: "boolean",
} as const;

const primaryKeys = ["id"] as const;
// Create repository instance (when using const schemas, the next three generics
// on InMemoryTabularStorage are automatically created for you)
const repo = new InMemoryTabularStorage<typeof schema, typeof primaryKeys>(schema, primaryKeys);

// Basic operations
await repo.put({ id: "1", name: "Alice", age: 30, active: true });
const result = await repo.get({ id: "1" });
await repo.delete({ id: "1" });

Schema Definitions

You can define schemas using plain JSON Schema objects, or use schema libraries like TypeBox or Zod 4 to create them. All schemas must be compatible with DataPortSchemaObject from @workglow/util.

Note: When using TypeBox or Zod schemas, you must explicitly provide the generic type parameters to the repository constructor, as TypeScript cannot infer them from non-const schema definitions.

Using TypeBox

TypeBox schemas are JSON Schema compatible and can be used directly:

import { InMemoryTabularStorage } from "@workglow/storage/tabular";
import { Type, Static } from "@sinclair/typebox";
import { DataPortSchemaObject, FromSchema } from "@workglow/util";

// Define schema using TypeBox
const userSchema = Type.Object({
  id: Type.String({ format: "uuid" }),
  name: Type.String({ minLength: 1, maxLength: 100 }),
  email: Type.String({ format: "email" }),
  age: Type.Optional(Type.Number({ minimum: 0, maximum: 150 })),
  active: Type.Boolean({ default: true }),
}) satisfies DataPortSchemaObject;

// Infer TypeScript types from schema
type User = FromSchema<typeof userSchema>;
// => { id: string; name: string; email: string; age?: number; active: boolean }

const primaryKeys = ["id"] as const;

type UserEntity = FromSchema<typeof userSchema>;

// IMPORTANT: You must explicitly provide generic type parameters for t
// TypeScript cannot infer them from TypeBox schemas
const repo = new InMemoryTabularStorage<typeof userSchema, typeof primaryKeys, UserEntity>(
  userSchema,
  primaryKeys,
  ["email", "active"] as const // Indexes
);

// Use with type safety
await repo.put({
  id: "550e8400-e29b-41d4-a716-446655440000",
  name: "Alice",
  email: "alice@example.com",
  age: 30,
  active: true,
});

Using Zod 4

Zod 4 has built-in JSON Schema support using the .toJSONSchema() method:

import { InMemoryTabularStorage } from "@workglow/storage/tabular";
import { z } from "zod";
import { DataPortSchemaObject } from "@workglow/util";

// Define schema using Zod
const userSchemaZod = z.object({
  id: z.string().uuid(),
  name: z.string().min(1).max(100),
  email: z.string().email(),
  age: z.number().min(0).max(150).optional(),
  active: z.boolean().default(true),
});

// Convert Zod schema to JSON Schema using built-in method
const userSchema = userSchemaZod.toJSONSchema() as DataPortSchemaObject;
const primaryKeys = ["id"] as const;

// Define computed types for the repository generics
type UserEntity = z.infer<typeof userSchemaZod>;

// IMPORTANT: You must explicitly provide generic type parameters
// TypeScript cannot infer them from Zod schemas (even after conversion)
const repo = new InMemoryTabularStorage<typeof userSchema, typeof primaryKeys, UserEntity>(
  userSchema,
  primaryKeys,
  ["email", "active"] as const // Indexes
);

// Use with type safety
await repo.put({
  id: "550e8400-e29b-41d4-a716-446655440000",
  name: "Alice",
  email: "alice@example.com",
  age: 30,
  active: true,
});

Auto-Generated Primary Keys

TabularStorage supports automatic generation of primary keys, allowing the storage backend to generate IDs when entities are inserted without them. This is useful for:

  • Security: Preventing clients from choosing arbitrary IDs
  • Simplicity: No need to generate IDs client-side
  • Database features: Leveraging native auto-increment and UUID generation

Schema Configuration

Mark a primary key column as auto-generated using the x-auto-generated: true annotation:

const UserSchema = {
  type: "object",
  properties: {
    id: { type: "integer", "x-auto-generated": true }, // Auto-increment
    name: { type: "string" },
    email: { type: "string" },
  },
  required: ["id", "name", "email"],
  additionalProperties: false,
} as const satisfies DataPortSchemaObject;

const DocumentSchema = {
  type: "object",
  properties: {
    id: { type: "string", "x-auto-generated": true }, // UUID
    title: { type: "string" },
    content: { type: "string" },
  },
  required: ["id", "title", "content"],
  additionalProperties: false,
} as const satisfies DataPortSchemaObject;

Generation Strategy (inferred from column type):

  • type: "integer" → Auto-increment (SERIAL, INTEGER PRIMARY KEY, counter)
  • type: "string" → UUID via uuid4() from @workglow/util

Constraints:

  • Only the first column in a compound primary key can be auto-generated
  • Only one column can be auto-generated per table

Basic Usage

import { InMemoryTabularStorage } from "@workglow/storage/tabular";

const userStorage = new InMemoryTabularStorage(UserSchema, ["id"] as const);
await userStorage.setupDatabase();

// Insert without providing ID - it will be auto-generated
const user = await userStorage.put({
  name: "Alice",
  email: "alice@example.com",
});
console.log(user.id); // 1 (auto-generated)

// TypeScript enforces: id is optional on insert, required on returned entity

Client-Provided Keys Configuration

Control whether clients can provide values for auto-generated keys:

const storage = new PostgresTabularStorage(
  db,
  "users",
  UserSchema,
  ["id"] as const,
  [], // indexes
  { clientProvidedKeys: "if-missing" } // configuration
);

Options:

Setting Behavior Use Case
"if-missing" (default) Use client value if provided, generate otherwise Flexible - supports both auto-generation and client-specified IDs
"never" Always generate, ignore client values Maximum security - never trust client IDs
"always" Require client to provide value Testing/migration - enforce client-side ID generation

Examples:

// Default: "if-missing" - flexible
const flexibleStorage = new InMemoryTabularStorage(UserSchema, ["id"] as const);

// Without ID - auto-generated
await flexibleStorage.put({ name: "Bob", email: "bob@example.com" });

// With ID - uses client value
await flexibleStorage.put({ id: 999, name: "Charlie", email: "charlie@example.com" });

// Secure mode: "never" - always generate
const secureStorage = new PostgresTabularStorage(db, "users", UserSchema, ["id"] as const, [], {
  clientProvidedKeys: "never",
});

// Even if client provides id, it will be ignored and regenerated
const result = await secureStorage.put({
  id: 999, // Ignored!
  name: "Diana",
  email: "diana@example.com",
});
// result.id will be database-generated, NOT 999

// Testing mode: "always" - require client ID
const testStorage = new InMemoryTabularStorage(UserSchema, ["id"] as const, [], {
  clientProvidedKeys: "always",
});

// Must provide ID or throws error
await testStorage.put({
  id: 1,
  name: "Eve",
  email: "eve@example.com",
}); // OK

await testStorage.put({
  name: "Frank",
  email: "frank@example.com",
}); // Throws Error!

Backend-Specific Behavior

Each storage backend implements auto-generation differently:

Backend Integer (autoincrement) String (UUID)
InMemoryTabularStorage Internal counter (1, 2, 3...) uuid4() from @workglow/util
SqliteTabularStorage INTEGER PRIMARY KEY AUTOINCREMENT uuid4() client-side
PostgresTabularStorage SERIAL/BIGSERIAL gen_random_uuid() database-side
SupabaseTabularStorage SERIAL gen_random_uuid() database-side
IndexedDbTabularStorage autoIncrement: true uuid4() client-side
FsFolderTabularStorage Internal counter uuid4() from @workglow/util

Constraints

  1. Only first column: Only the first primary key column can be auto-generated
  2. Single auto-gen key: Only one column per table can be auto-generated
  3. Type inference: Generation strategy is inferred from column type (integer → autoincrement, string → UUID)

Type Safety

TypeScript enforces correct usage through the type system:

// Auto-generated key is OPTIONAL on insert
const entity = { name: "Alice", email: "alice@example.com" };
await storage.put(entity); // ✅ OK - id can be omitted

// Returned entity has ALL fields REQUIRED
const result = await storage.put(entity);
const id: number = result.id; // ✅ OK - id is guaranteed to exist

Implementations

InMemoryTabularStorage

  • Ideal for testing/development
  • No persistence
  • Fast search capabilities
const repo = new InMemoryTabularStorage<
  typeof schema,
  typeof primaryKeys,
  Entity, // required if using TypeBox, Zod, etc, otherwise automatically created
  PrimaryKeyEntity, // should be automatically created
  ValueEntity // should be automatically created
>(schema, primaryKeys, ["name", "active"]);

SqliteTabularStorage

  • Embedded SQLite database
  • File-based or in-memory

Call await Sqlite.init() once (from @workglow/storage/sqlite or workglow) before opening a database by path or constructing new Sqlite.Database(...).

import { SqliteTabularStorage } from "@workglow/storage";
import { Sqlite } from "@workglow/storage/sqlite";

await Sqlite.init();

const repo = new SqliteTabularStorage<
  typeof schema,
  typeof primaryKeys,
  Entity, // required if using TypeBox, Zod, etc, otherwise automatically created
  PrimaryKeyEntity, // should be automatically created
  ValueEntity // should be automatically created
>(
  ":memory:", // Database path
  "users", // Table name
  schema,
  primaryKeys,
  [["name", "active"], "age"] as const // Indexes
);

PostgresTabularStorage

  • PostgreSQL backend
  • Connection pooling support
import type { Pool } from "@workglow/storage/postgres";

const pool = new Pool({
  /* config */
});
const repo = new PostgresTabularStorage<
  typeof schema,
  typeof primaryKeys,
  Entity, // required if using TypeBox, Zod, etc, otherwise automatically created
  PrimaryKeyEntity, // should be automatically created
  ValueEntity // should be automatically created
>(
  pool, // postgres connection pool
  "users",
  schema,
  primaryKeys,
  [["name", "active"], "age"] as const
);

IndexedDbTabularStorage

  • Browser-based storage
  • Automatic schema migration
const repo = new IndexedDbTabularStorage<
  typeof schema,
  typeof primaryKeys,
  Entity, // required if using TypeBox, Zod, etc, otherwise automatically created
  PrimaryKeyEntity, // should be automatically created
  ValueEntity // should be automatically created
>(
  "user_db", // Database name
  schema,
  primaryKeys,
  [["name", "active"], "age"] as const
);

FsFolderTabularStorage

  • Filesystem storage (one JSON file per record)
  • Simple persistence format
const repo = new FsFolderTabularStorage<
  typeof schema,
  typeof primaryKeys,
  Entity, // required if using TypeBox, Zod, etc, otherwise automatically created
  PrimaryKeyEntity, // should be automatically created
  ValueEntity // should be automatically created
>("./data/users", schema, primaryKeys);

Events

All implementations emit events:

  • put: When a record is created/updated
  • get: When a record is retrieved
  • delete: When a record is deleted
  • clearall: When all records are deleted
  • search: When a search is performed
repo.on("put", (entity) => {
  console.log("Record stored:", entity);
});

repo.on("delete", (key) => {
  console.log("Record deleted:", key);
});

Testing

The implementations share a common test suite. To run tests:

bun test

Test includes:

  • Basic CRUD operations
  • Compound key handling
  • Index-based search
  • Event emission
  • Concurrency tests

License

Apache 2.0