
fix(plugin-multi-tenant): forward req in afterTenantDelete cascade #16505

Open
philipdaveby wants to merge 1 commit into payloadcms:main from philipdaveby:fix/multi-tenant-cascade-forward-req

Conversation

@philipdaveby

What?

@payloadcms/plugin-multi-tenant's afterTenantDelete hook (packages/plugin-multi-tenant/src/hooks/afterTenantDelete.ts) receives req from the caller but does not forward it into the three internal Payload calls inside the hook:

  • req.payload.delete({ collection: slug, where: {...} }) — per tenant-scoped collection (cascade)
  • req.payload.find({ collection: usersSlug, ... }) — users-with-tenant lookup
  • req.payload.update({ collection: usersSlug, ... }) — per affected user

Each call therefore grabs a fresh pool connection and runs outside the caller's transaction.
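To make the mechanism concrete, here is a minimal self-contained sketch (mock names, not Payload's real internals) of why omitting req detaches an inner call from the caller's transaction:

```typescript
// Mock sketch: a stand-in for how a local-API operation decides which
// connection to run on. `connectionFor` is illustrative, not Payload code.
type Req = { transactionID?: string }

// If req.transactionID is present, the operation joins the caller's tx;
// otherwise it acquires a fresh pool connection (a sibling transaction).
function connectionFor(req?: Req): string {
  return req?.transactionID ?? 'fresh-pool-connection'
}

const parentReq: Req = { transactionID: 'tx-123' }

// Without forwarding req (current behaviour): sibling transaction.
console.log(connectionFor(undefined)) // "fresh-pool-connection"

// With the fix: the inner call shares the parent's transactionID.
console.log(connectionFor(parentReq)) // "tx-123"
```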

Why?

When the caller (e.g. a Kafka consumer that opens a transaction to delete a tenant) holds row locks via its open tx, the cascade's sibling tx tries to acquire the same locks and blocks. The parent awaits the cascade's promise; the cascade waits on the parent's locks; Postgres can't detect the cycle because the parent connection is at ClientRead (waiting on Node, not on a Lock) — producing multi-minute idle in transaction leaks visible in pg_stat_activity.

The leak shape we observed:

```
pid X (idle in tx, ClientRead) — last query:
DELETE FROM payload_preferences WHERE key IN ($1)
pid Y (active, blocked on transactionid Lock) — query:
DELETE FROM static_assets WHERE id = $1
```

pid X is the parent (its last visible statement is Payload's deleteUserPreferences cleanup at the end of the deleteByID flow, right before the multi-tenant afterDelete hook fires). pid Y is one of the cascade's sibling txs. The cycle persists until something external (idle_in_transaction_session_timeout, NAT eviction, or a process-level watchdog) terminates a connection.
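The wait cycle can be reproduced in isolation with a toy lock (illustrative only; RowLock stands in for a Postgres row lock, not Payload or pg code). The parent tx holds the lock and awaits a cascade running as a sibling tx that needs the same lock, so neither side can proceed:

```typescript
// A toy re-entrant lock: the same tx re-acquires instantly; any other tx
// blocks forever, like pid Y blocked on a transactionid Lock.
class RowLock {
  private holder?: string
  acquire(tx: string): Promise<void> {
    if (this.holder === undefined || this.holder === tx) {
      this.holder = tx
      return Promise.resolve() // re-entrant for the same transaction
    }
    return new Promise(() => {}) // a different tx blocks indefinitely
  }
}

async function simulate(cascadeTx: string): Promise<string> {
  const lock = new RowLock()
  await lock.acquire('parent-tx') // the parent's open tx touched the row
  const cascade = lock.acquire(cascadeTx).then(() => 'done')
  const timeout = new Promise<string>((r) => setTimeout(() => r('hung'), 50))
  return Promise.race([cascade, timeout])
}

// Sibling tx (current behaviour): the cascade never acquires the lock.
simulate('sibling-tx').then(console.log) // "hung"
// Shared tx (with the fix): the lock is re-entrant, the cascade completes.
simulate('parent-tx').then(console.log) // "done"
```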

Beyond fixing the deadlock, sharing the tx also restores transactional consistency: a half-failed cascade no longer commits "tenant deleted, child rows orphaned" or "child rows deleted, tenant still present".

How?

Pass the caller's req into all three Payload calls so the cascade and the users-cleanup share the parent's transactionID. Three single-line additions:

```diff
 cleanupPromises.push(
   req.payload.delete({
     collection: slug,
+    req,
     where: { [tenantFieldName]: { in: [id] } },
   }),
 )
```

```diff
 const usersWithTenant = (await req.payload.find({
   collection: usersSlug,
   depth: 0,
   limit: 0,
+  req,
   where: { ... },
 })) as PaginatedDocs
```

```diff
 cleanupPromises.push(
   req.payload.update({
     id: user.id,
     collection: usersSlug,
     data: { ... },
+    req,
   }),
 )
```

Reproduction notes

  • The bug has been present in every shipped 3.x version. Verified by extracting dist/hooks/afterTenantDelete.js from npm tarballs for 3.30.0, 3.50.0, 3.64.0 (current popular pin), and 3.84.1 (latest at time of writing) — all identical.
  • Triggers when:
    1. The tenant collection is deleted from a code path that has its own open transaction (Kafka consumer, custom endpoint, etc.).
    2. The tenant has child records in any tenant-scoped collection.
    3. The cascade's payload.delete touches a row that the parent's tx has already touched (or vice versa).
  • Effectively never observed via the admin UI delete flow because that path doesn't open a custom outer tx, so the cascade's sibling tx doesn't race itself. We hit it through a Kafka consumer that calls payload.delete(tenantsSlug, id, req) after payload.db.beginTransaction().

Test plan

  • Existing plugin-multi-tenant tests continue to pass.
  • Suggest adding a regression test: invoke addTenantCleanup with a req carrying a transactionID, assert the inner payload.delete/find/update calls receive the same transactionID (e.g. by spying on the payload mock).

I'm happy to add the regression test if helpful — let me know which test file you'd prefer it in (packages/plugin-multi-tenant/test/ looks like the right place, but I haven't audited the existing test layout).
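A mock-based sketch of that regression test (names and the hook body are simplified stand-ins; a real test would spy on the actual Payload instance and invoke the real hook):

```typescript
// MockPayload records every local-API call and the req it received.
type Req = { transactionID?: string; payload: MockPayload }
type Call = { op: string; req?: Req }

class MockPayload {
  calls: Call[] = []
  async delete(args: { collection: string; req?: Req }) {
    this.calls.push({ op: 'delete', req: args.req })
  }
  async find(args: { collection: string; req?: Req }) {
    this.calls.push({ op: 'find', req: args.req })
    return { docs: [] }
  }
  async update(args: { collection: string; req?: Req }) {
    this.calls.push({ op: 'update', req: args.req })
  }
}

// Simplified stand-in for the fixed hook: every inner call forwards req.
async function afterTenantDeleteFixed(req: Req) {
  await req.payload.delete({ collection: 'posts', req })
  await req.payload.find({ collection: 'users', req })
  await req.payload.update({ collection: 'users', req })
}

async function main() {
  const payload = new MockPayload()
  const req: Req = { transactionID: 'tx-abc', payload }
  await afterTenantDeleteFixed(req)
  // Every inner call must carry the caller's transactionID.
  console.log(payload.calls.map((c) => c.req?.transactionID).join(',')) // "tx-abc,tx-abc,tx-abc"
}
main()
```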

Risk

Minimal. The change only threads the existing req through. Behaviour is unchanged when the caller doesn't open a transaction (req.transactionID is undefined and Payload's adapter treats it as a plain pool acquisition).

