| 1 | +# E2E Testing for Obsidian Plugin — Notes |
| 2 | + |
| 3 | +## Approaches Considered |
| 4 | + |
| 5 | +### Option 1: Playwright `electron.launch()` |
| 6 | + |
| 7 | +The standard Playwright approach for Electron apps — point `executablePath` at the binary and let Playwright manage the process lifecycle. |
| 8 | + |
| 9 | +**Pros:** |
| 10 | +- First-class Playwright API — `app.evaluate()` runs code in the main process, not just renderer |
| 11 | +- Automatic process lifecycle management (launch, close, cleanup) |
| 12 | +- Access to Electron-specific APIs (e.g., `app.evaluate(() => process.env)`) |
| 13 | +- Well-documented, widely used for Electron testing |
| 14 | + |
| 15 | +**Cons:** |
| 16 | +- **Does not work with Obsidian.** Obsidian's executable is a launcher that loads an `.asar` package (`obsidian-1.11.7.asar`) and forks a new Electron process. Playwright connects to the initial process, which exits, causing `kill EPERM` and connection failures. |
| 17 | +- No workaround without modifying Obsidian's startup or using a custom Electron shell |
| 18 | + |
| 19 | +**Verdict:** Not viable for Obsidian. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +### Option 2: CDP via `chromium.connectOverCDP()` (chosen) |
| 24 | + |
| 25 | +Launch Obsidian as a subprocess with `--remote-debugging-port=9222`, then connect via Chrome DevTools Protocol. |
| 26 | + |
| 27 | +**Pros:** |
| 28 | +- Works with Obsidian's forked process architecture — the debug port is inherited by the child process |
| 29 | +- Full access to renderer via `page.evaluate()` — Obsidian's global `app` object is available |
| 30 | +- Keyboard/mouse interaction works normally |
| 31 | +- Can take screenshots, traces, and use all Playwright assertions |
| 32 | +- Process is managed explicitly — clear control over startup and teardown |
| 33 | + |
| 34 | +**Cons:** |
| 35 | +- No main process access (can't call Electron APIs directly, only renderer-side `window`/`app`) |
| 36 | +- Must manually manage process lifecycle (spawn, pkill, port polling) |
| 37 | +- Fixed debug port (9222) means tests can't run in parallel across multiple Obsidian instances without port management |
| 38 | +- Port polling adds ~2-5s startup overhead |
| 39 | +- `pkill -f Obsidian` in setup is aggressive — kills ALL Obsidian instances, not just test ones |
| 40 | + |
| 41 | +**Verdict:** Works well for PoC. Sufficient for single-worker CI/local testing. |
| 42 | + |
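The launch-and-connect flow for Option 2 could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the PoC's actual code: `cdpEndpoint`, `isPortOpen`, `waitForPort`, and `launchWithDebugPort` are hypothetical helper names, and the path to the Obsidian binary is assumed to be supplied by the caller.

```ts
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";
import * as net from "node:net";

/** HTTP endpoint form that `chromium.connectOverCDP()` accepts for a local debug port. */
export const cdpEndpoint = (port: number): string => `http://127.0.0.1:${port}`;

/** Resolves true if something accepts TCP connections on the port right now. */
export const isPortOpen = (port: number, host = "127.0.0.1"): Promise<boolean> =>
  new Promise((resolve) => {
    const socket = net.connect({ port, host });
    socket.once("connect", () => {
      socket.destroy();
      resolve(true);
    });
    socket.once("error", () => {
      socket.destroy();
      resolve(false);
    });
  });

/** Polls until the debug port opens, or throws after `timeoutMs`. */
export const waitForPort = async (
  port: number,
  timeoutMs = 15_000,
  intervalMs = 250,
): Promise<void> => {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isPortOpen(port)) return;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Debug port ${port} did not open within ${timeoutMs}ms`);
};

/** Spawns the app with remote debugging enabled (binaryPath is caller-supplied). */
export const launchWithDebugPort = (binaryPath: string, port: number): ChildProcess =>
  spawn(binaryPath, [`--remote-debugging-port=${port}`], { stdio: "ignore" });
```

Once `waitForPort(port)` resolves, a test would pass `cdpEndpoint(port)` to `chromium.connectOverCDP()`. Probing the TCP port only shows that something is listening; it does not prove the renderer has finished loading, so tests still need their own readiness check.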
| 43 | +--- |
| 44 | + |
| 45 | +### Option 3: Obsidian's built-in plugin testing (not explored) |
| 46 | + |
| 47 | +Obsidian has no official testing framework. Some community approaches exist (e.g., `obsidian-jest`, hot-reload-based testing), but none are mature or maintained. |
| 48 | + |
| 49 | +**Verdict:** Not a real option today. |
| 50 | + |
| 51 | +--- |
| 52 | + |
| 53 | +## What We Learned |
| 54 | + |
| 55 | +### Obsidian internals accessible via `page.evaluate()` |
| 56 | +- `app.plugins.plugins["@discourse-graph/obsidian"]` — check plugin loaded |
| 57 | +- `app.vault.getMarkdownFiles()` — list files |
| 58 | +- `app.vault.read(file)` — read file content |
| 59 | +- `app.vault.create(name, content)` — create files |
| 60 | +- `app.workspace.openLinkText(path, "", false)` — open a file in the editor |
| 61 | +- `app.commands.executeCommandById(id)` — could execute commands directly (alternative to command palette UI) |
| 62 | + |
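A few of these calls could be wrapped in small helpers, as in the sketch below. `PageLike` is a hypothetical structural stand-in for the single Playwright `Page` method used here, so the snippet does not depend on Playwright being installed; `isPluginLoaded` and `listMarkdownPaths` are illustrative names, not existing code.

```ts
/** Structural stand-in for the one Playwright `Page` method used here (assumption). */
export interface PageLike {
  evaluate<R, A>(fn: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

export const PLUGIN_ID = "@discourse-graph/obsidian";

/** True if the plugin is registered in Obsidian's plugin map. */
export const isPluginLoaded = (
  page: PageLike,
  pluginId: string = PLUGIN_ID,
): Promise<boolean> =>
  page.evaluate((id) => {
    // Runs in the renderer, where Obsidian exposes the global `app`.
    const app = (globalThis as any).app;
    return Boolean(app?.plugins?.plugins?.[id]);
  }, pluginId);

/** Paths of all markdown files currently in the vault. */
export const listMarkdownPaths = (page: PageLike): Promise<string[]> =>
  page.evaluate(() => {
    const app = (globalThis as any).app;
    return app.vault
      .getMarkdownFiles()
      .map((file: { path: string }) => file.path) as string[];
  }, undefined);
```

The callbacks only touch `globalThis` and their argument, because anything captured from the test's own scope would not exist in the renderer.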
| 63 | +### Plugin command IDs |
| 64 | +Commands are registered with IDs like `@discourse-graph/obsidian:create-discourse-node`. The command palette shows them as "Discourse Graph: Create discourse node". |
| 65 | + |
| 66 | +### Modal DOM structure |
| 67 | +The `ModifyNodeModal` renders React inside Obsidian's `.modal-container`: |
| 68 | +- Node type: `<select>` element (`.modal-container select`) |
| 69 | +- Content: `<input type="text">` (`.modal-container input[type='text']`) |
| 70 | +- Confirm: `<button class="mod-cta">` |
| 71 | + |
| 72 | +### Vault configuration |
| 73 | +Minimum config for plugin to load: |
| 74 | +- `.obsidian/community-plugins.json` → `["@discourse-graph/obsidian"]` |
| 75 | +- `.obsidian/app.json` → `{"livePreview": true}` (restricted mode must be off, but this is handled by Obsidian detecting the plugins dir) |
| 76 | +- Plugin files (`main.js`, `manifest.json`, `styles.css`) in `.obsidian/plugins/@discourse-graph/obsidian/` |
| 77 | + |
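The minimum vault layout above could be generated per test run with something like the following sketch. `createTestVault` and `removeTestVault` are hypothetical helpers, the `dg-e2e-vault-` prefix is arbitrary, and `pluginDistDir` is an assumed location of the built plugin files.

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const PLUGIN_ID = "@discourse-graph/obsidian";
const PLUGIN_FILES = ["main.js", "manifest.json", "styles.css"];

/**
 * Creates a throwaway vault containing the minimum config listed above.
 * `pluginDistDir` is an assumed directory holding the built plugin files.
 */
export const createTestVault = (pluginDistDir: string): string => {
  const vaultDir = fs.mkdtempSync(path.join(os.tmpdir(), "dg-e2e-vault-"));
  const configDir = path.join(vaultDir, ".obsidian");
  const pluginDir = path.join(configDir, "plugins", PLUGIN_ID);
  fs.mkdirSync(pluginDir, { recursive: true });

  fs.writeFileSync(
    path.join(configDir, "community-plugins.json"),
    JSON.stringify([PLUGIN_ID]),
  );
  fs.writeFileSync(
    path.join(configDir, "app.json"),
    JSON.stringify({ livePreview: true }),
  );
  for (const file of PLUGIN_FILES) {
    fs.copyFileSync(path.join(pluginDistDir, file), path.join(pluginDir, file));
  }
  return vaultDir;
};

/** Deletes the vault directory and everything in it. */
export const removeTestVault = (vaultDir: string): void =>
  fs.rmSync(vaultDir, { recursive: true, force: true });
```

This only covers the files named in the list; it does not address whether a fresh Obsidian install still prompts to enable community plugins.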
| 78 | +--- |
| 79 | + |
| 80 | +## Proposal: Full Agentic Testing Flow |
| 81 | + |
| 82 | +### Goal |
| 83 | +AI coding agents (Cursor, Claude Code) can run `pnpm test:e2e` after making changes to automatically verify features work end-to-end. The test suite should be comprehensive enough to catch regressions, fast enough to run frequently, and deterministic enough to trust the results. |
| 84 | + |
| 85 | +### Phase 1: Stabilize the PoC (current state + hardening) |
| 86 | + |
| 87 | +**Isolation improvements:** |
| 88 | +- Use a unique temp directory per test run (e.g., `fs.mkdtemp()` under `os.tmpdir()`) instead of a fixed `test-vault/` path to avoid stale state |
| 89 | +- Use a random debug port to allow parallel runs |
| 90 | +- Replace `pkill -f Obsidian` with tracking the specific child PID — parse it from `lsof -i :<port>` after launch |
| 91 | +- Add a global setup/teardown in Playwright config to manage the single Obsidian instance across all tests |
| 92 | + |
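The port and PID items above might look like the sketch below. All helper names are hypothetical. It assumes `lsof` is on the `PATH` (macOS/Linux) and uses its `-t` (PIDs only), `-iTCP:<port>`, and `-sTCP:LISTEN` options.

```ts
import { execFileSync } from "node:child_process";
import * as net from "node:net";

/** Asks the OS for an unused TCP port by binding to port 0, then releasing it. */
export const getFreePort = (): Promise<number> =>
  new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as net.AddressInfo;
      server.close(() => resolve(port));
    });
  });

/** Parses `lsof -t` output (one PID per line) into numbers. */
export const parseLsofPids = (output: string): number[] =>
  output
    .split("\n")
    .map((line) => Number.parseInt(line.trim(), 10))
    .filter((pid) => Number.isInteger(pid) && pid > 0);

/** PIDs listening on the port; empty if none (lsof exits non-zero in that case). */
export const pidsListeningOn = (port: number): number[] => {
  try {
    const out = execFileSync(
      "lsof",
      ["-t", `-iTCP:${port}`, "-sTCP:LISTEN"],
      { encoding: "utf8" },
    );
    return parseLsofPids(out);
  } catch {
    return [];
  }
};

/** Terminates only the given PIDs instead of every Obsidian process. */
export const killPids = (pids: number[]): void => {
  for (const pid of pids) {
    try {
      process.kill(pid, "SIGTERM");
    } catch {
      // Process already exited; nothing to do.
    }
  }
};
```

There is a small race in `getFreePort`: another process could claim the port between the probe closing and Obsidian binding it, so a retry around launch may still be worthwhile.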
| 93 | +**Reliability improvements:** |
| 94 | +- Replace `waitForTimeout()` calls with explicit wait conditions (e.g., `waitForSelector`, `waitForFunction`) so tests wait for state rather than elapsed time |
| 95 | +- Add retry logic for CDP connection (currently fails hard on timeout) |
| 96 | +- Add a `test.beforeEach` that resets vault state (delete all non-config files) instead of full vault recreation |
| 97 | + |
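Retry logic for the CDP connection could be a generic helper along these lines. This is a sketch, with `retry` and `backoffDelay` as hypothetical names and the delays chosen arbitrarily.

```ts
/** Delay before retry `attempt` (0-based): exponential, capped at `maxMs`. */
export const backoffDelay = (attempt: number, baseMs = 250, maxMs = 4_000): number =>
  Math.min(baseMs * 2 ** attempt, maxMs);

/** Retries `fn` up to `attempts` times, sleeping `backoffDelay` between tries. */
export const retry = async <T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseMs = 250,
): Promise<T> => {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Skip the sleep after the final failed attempt.
      if (attempt < attempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelay(attempt, baseMs)));
      }
    }
  }
  throw lastError;
};
```

A connection attempt would then be wrapped as `retry(() => chromium.connectOverCDP(endpoint))`, where `endpoint` is the debug URL for the chosen port. The last error is rethrown unchanged so the failure message still points at the underlying cause.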
| 98 | +### Phase 2: Expand test coverage |
| 99 | + |
| 100 | +**Core plugin features to test:** |
| 101 | +- Create each discourse node type (Question, Claim, Evidence, Source) |
| 102 | +- Verify frontmatter (`nodeTypeId`) is set correctly |
| 103 | +- Verify file naming conventions (e.g., `QUE - `, `CLM - `, `EVD - `, `SRC - `) |
| 104 | +- Open node type menu via hotkey (`Cmd+\`) |
| 105 | +- Discourse context view toggle |
| 106 | +- Settings panel opens and renders |
| 107 | + |
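The file-name and frontmatter checks could be expressed as pure helpers, as sketched below. The prefixes come from the list above; the `<PREFIX> - <title>.md` shape and a plain `nodeTypeId: <value>` frontmatter line are assumptions about the plugin's output, and the helper names are hypothetical.

```ts
/** File-name prefixes per node type, taken from the list above. */
export const NODE_PREFIXES = {
  Question: "QUE",
  Claim: "CLM",
  Evidence: "EVD",
  Source: "SRC",
} as const;

export type NodeType = keyof typeof NODE_PREFIXES;

/** True if `fileName` looks like "<PREFIX> - <title>.md" for the given type. */
export const matchesNodeFormat = (fileName: string, type: NodeType): boolean =>
  new RegExp(`^${NODE_PREFIXES[type]} - .+\\.md$`).test(fileName);

/** Extracts `nodeTypeId` from a leading YAML frontmatter block, if present. */
export const readNodeTypeId = (markdown: string): string | undefined => {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!frontmatter) return undefined;
  const line = /^nodeTypeId:\s*["']?([^"'\r\n]+?)["']?\s*$/m.exec(frontmatter[1]);
  return line ? line[1] : undefined;
};
```

In a test, the markdown would come from `app.vault.read(file)` via `page.evaluate()`, and the result of `readNodeTypeId` would be compared against the ID of the node type that was selected in the modal.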
| 108 | +**Vault-level tests:** |
| 109 | +- Create multiple nodes and verify they appear in file explorer |
| 110 | +- Verify node format regex matching (files follow the format pattern) |
| 111 | + |
| 112 | +**Use `app.commands.executeCommandById()` as the primary way to trigger commands** — faster, more reliable, and avoids flaky command palette typing. Reserve command palette tests for testing the palette itself. |
| 113 | + |
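Triggering a command directly might look like the sketch below. `PageLike` is again a hypothetical stand-in for Playwright's `Page`, and `fullCommandId` and `runCommand` are illustrative names. `executeCommandById` is not part of Obsidian's documented public API, so treating its return value as a success flag is an assumption.

```ts
/** Structural stand-in for the one Playwright `Page` method used here (assumption). */
export interface PageLike {
  evaluate<R, A>(fn: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

/** Plugin commands are namespaced as "<plugin-id>:<command-id>". */
export const fullCommandId = (pluginId: string, commandId: string): string =>
  `${pluginId}:${commandId}`;

/**
 * Runs a command in the renderer.
 * Assumption: a truthy return value from `executeCommandById` means the command ran.
 */
export const runCommand = (page: PageLike, id: string): Promise<boolean> =>
  page.evaluate((commandId) => {
    const app = (globalThis as any).app;
    return Boolean(app.commands.executeCommandById(commandId));
  }, id);
```

For example, `runCommand(page, fullCommandId("@discourse-graph/obsidian", "create-discourse-node"))` would open the node creation modal without typing into the command palette.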
| 114 | +### Phase 3: Agentic integration |
| 115 | + |
| 116 | +**For agents to use the tests effectively:** |
| 117 | + |
| 118 | +1. **Fast feedback loop** — Tests should complete in <30s total. Current PoC is ~14s for 2 tests, which is good. Keep Obsidian running between test files using Playwright's `globalSetup`/`globalTeardown`. |
| 119 | + |
| 120 | +2. **Clear error messages** — When a test fails, the agent needs to understand WHY. Add descriptive assertion messages: |
| 121 | + ```ts |
| 122 | + expect(pluginLoaded, "Plugin should be loaded — check dist/ is built and plugin ID matches manifest.json").toBe(true); |
| 123 | + ``` |
| 124 | + |
| 125 | +3. **Screenshot-on-failure for visual debugging** — Already configured. Consider adding `page.screenshot()` at key checkpoints even on success, so agents can visually verify state. |
| 126 | + |
| 127 | +4. **Test file organization** — One test file per feature area: |
| 128 | + ``` |
| 129 | + e2e/tests/ |
| 130 | + ├── plugin-load.spec.ts # Plugin loads, settings exist |
| 131 | + ├── node-creation.spec.ts # Create each node type |
| 132 | + ├── command-palette.spec.ts # Command palette interactions |
| 133 | + ├── discourse-context.spec.ts # Context view, relations |
| 134 | + └── settings.spec.ts # Settings panel |
| 135 | + ``` |
| 136 | + |
| 137 | +5. **CI integration**: Run in GitHub Actions with a macOS runner. Obsidian would need to be pre-installed on the runner (or downloaded in a setup step). This is the biggest open question: Obsidian doesn't have a headless mode, so CI needs a real or virtual display. `xvfb` is X11-only, so it would apply to a Linux runner rather than a macOS one. |
| 138 | + |
| 139 | +6. **Agent-executable test commands:** |
| 140 | + ```bash |
| 141 | + pnpm test:e2e # run all tests |
| 142 | + pnpm test:e2e -- --grep "node creation" # run specific tests |
| 143 | + pnpm test:e2e:ui # interactive Playwright UI (for humans) |
| 144 | + ``` |
| 145 | + |
| 146 | +### Phase 4: Advanced (future) |
| 147 | + |
| 148 | +- **Visual regression testing** — Compare screenshots against baselines to catch UI regressions |
| 149 | +- **Obsidian version matrix** — Test against multiple Obsidian versions (download different `.asar` files) |
| 150 | +- **Headless mode wrapper** — Investigate running Obsidian with `--disable-gpu --headless` flags (may not work due to Obsidian's renderer requirements) |
| 151 | +- **Test data fixtures** — Pre-built vaults with specific node/relation configurations for testing complex scenarios |
| 152 | +- **Performance benchmarks** — Measure plugin load time, command execution time |
| 153 | + |
| 154 | +### Open Questions |
| 155 | + |
| 156 | +1. **CI runner setup** — How to install Obsidian on GitHub Actions macOS runners? Is there a `.dmg` download URL that's stable? Or do we cache the `.app` bundle? |
| 157 | +2. **Obsidian updates** — Obsidian auto-updates the `.asar`. Should tests pin a specific version? How to prevent auto-update during test runs? |
| 158 | +3. **Multiple vaults** — Obsidian tracks known vaults globally. Test vaults may accumulate in Obsidian's vault list. Need cleanup strategy. |
| 159 | +4. **Restricted mode** — The PoC doesn't explicitly disable restricted mode via config. The plugin loads because the `community-plugins.json` file is present, but a fresh Obsidian install might prompt the user to enable community plugins. Need to investigate if there's a config flag to skip this. |