[JPA + Spring Mastery 08] Transaction Split Patterns — Saga / Outbox / REQUIRES_NEW, from academic origins to a 9-scenario EXP-09b measurement

Table of contents

Preface
1. The incident — EXP-09’s pool exhaustion, plain split as a starting point
2. PROPAGATION 7 — Spring’s transaction-propagation semantics
3. The limits of 2PC (XA) — why microservices can’t use it
4. Saga — Garcia-Molina 1987 origin
5. Outbox — Helland CIDR 2005 origin
6. EXP-09b 9-scenario matrix — the measurements that compare the patterns
7. Two latencies, separated — user response vs completion
8. Domain mapping — payments to Saga, notifications to Outbox
9. @TransactionalEventListener — Spring’s commit-after hook
10. Conclusion — transaction split through a senior lens
11. References

Preface

The prior measurement (EXP-09) showed the pitfall of calling external APIs inside a transaction, line by line. With HikariCP pool=10, extDelay=3,000ms, concurrent=60, and connection-timeout=1s, only 16.7% of requests passed; 50 calls hit SQLTimeoutException. The pool-occupation time stretched in lockstep with the external call.

The cure is obvious: move the external call outside the transaction. The how is the hard part. StackOverflow answers scatter across plain split (call externally first, then INSERT), Saga’s compensating transactions, Transactional Outbox, and more. Few articles answer which pattern fits which domain with measurements.

So I ran a 9-scenario matrix: three patterns (A: plain split / B: Saga / C: Outbox) × three chaos modes (OFF / DB_FAIL / EXT_FAIL). The result: the three patterns carry fundamentally different trade-offs. Plain split lets external success diverge from local DB failure: 60 calls to the external PG, 0 rows in our DB. Saga’s triple safety net (worker compensation + sweeper + CANCELLED audit trail) keeps consistency. Outbox shortens the user-visible response by roughly 46× (3,302ms → 72ms), at the cost of making completion 30× slower.

These patterns are not new. Saga lives in Garcia-Molina’s 1987 ACM SIGMOD paper. Outbox lives in Pat Helland’s 2005 CIDR paper Data on the Outside vs Data on the Inside. Korean tech blogs (Toss SLASH24, 29CM, Ridi) have written operational reflections on these. Stitching academic → operational → my own measurement together makes it line-by-line clear why each pattern fits its domain.

This article is that three-layer record:

The incident — EXP-09’s pool exhaustion; plain split alone is not enough
PROPAGATION 7 — REQUIRED / REQUIRES_NEW / NESTED / SUPPORTS / MANDATORY / NOT_SUPPORTED / NEVER, with their precise meanings
The limits of 2PC (XA) — why microservices can’t use it; Helland’s Life Beyond Distributed Transactions
Saga — Garcia-Molina 1987 — Choreography vs Orchestration / compensating transactions
Outbox — Helland CIDR 2005 — academic origin + Korean operational reports
EXP-09b 9-scenario measurement matrix — A/B/C × OFF/DB_FAIL/EXT_FAIL
Two latencies, separated — user response vs processing completion (different metrics)
Domain mapping — payments → Saga, notifications → Outbox, cache only → plain split
@TransactionalEventListener — AFTER_COMMIT / BEFORE_COMMIT precise semantics + Spring source

The headline:

Plain split alone is insufficient — A/DB_FAIL [measurement] ends with 60 mismatches. Wrong fit for payments / orders.
Saga’s triple safety net: B/EXT_FAIL [measurement]: worker compensates 60. B/DB_FAIL [measurement]: sweeper recovers 60. These two active nets, plus the CANCELLED audit trail they leave behind, preserve consistency.
Outbox carries three latencies: ACK 72ms / completion avg 92,573ms / max 181,935ms. Roughly 46× faster user response (3,302ms → 72ms), at the cost of 30× slower completion.
Payments → Saga, notifications → Outbox — Outbox is for “ACK is enough” domains; Saga is for “completion-consistency must hold” domains.
2PC won’t work in microservices — Helland’s CIDR 2007 makes the case: blocking commit halts the system under availability/partition stress.

The maxim “just split the transaction” is half an answer. Three layers — academic, operational, measured — get the rest of it.

1. The incident — EXP-09’s pool exhaustion, plain split as a starting point

1.1 EXP-09’s two runs

The prior measurement (EXP-09) examined the pool-exhaustion mechanism in two runs.

Run #1 — silent latency blowup (timeout 5,000ms / concurrent 30 / extDelay 2,000ms):

30/30 success (100%)
P50 / P90 / P99 = 2,200 / 4,300 / 6,350 ms
Pool stats: active=10 / awaiting=20 (sustained ~6s)
A success-rate-only monitor would call this “healthy”

Run #2 — fail-fast (timeout 1,000ms / concurrent 60 / extDelay 3,000ms):

10/60 success (16.7%)
50 SQLTimeoutException
Theory check: pool / extDelay = 10/3s = 3.33 req/s ≈ measured 3.03 req/s (91%)

Same exhaustion; the connection-timeout decides the system character. Operations sees opposite signals — Run #1 looks slow, Run #2 looks broken.
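The two runs can be reproduced on paper with an idealized FIFO model of the pool. The sketch below is not the EXP-09 harness: it assumes all requests arrive at t=0, that a connection is held for exactly the external delay, and that a waiter gives up once the earliest free connection lies beyond the connection-timeout. `PoolWaveModel` and its method names are illustrative.

```java
import java.util.PriorityQueue;

/** Idealized FIFO model of a fixed-size pool; all requests arrive at t=0. */
final class PoolWaveModel {

    /** Returns how many requests obtain a connection before connectionTimeoutMs elapses. */
    static int successes(int poolSize, int concurrent, long holdMs, long connectionTimeoutMs) {
        // Each entry is the time (ms) at which one pooled connection becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < poolSize; i++) {
            freeAt.add(0L);
        }
        int ok = 0;
        for (int i = 0; i < concurrent; i++) {
            long earliest = freeAt.peek();
            if (earliest > connectionTimeoutMs) {
                continue;                       // waiter times out; no connection changes hands
            }
            freeAt.poll();
            freeAt.add(earliest + holdMs);      // held for the whole external call
            ok++;
        }
        return ok;
    }

    /** Throughput ceiling in requests per second: pool size divided by hold time. */
    static double ceilingPerSecond(int poolSize, long holdMs) {
        return poolSize * 1000.0 / holdMs;
    }

    public static void main(String[] args) {
        System.out.println("Run #1 parameters: " + successes(10, 30, 2_000, 5_000));
        System.out.println("Run #2 parameters: " + successes(10, 60, 3_000, 1_000));
        System.out.println("Ceiling (req/s): " + ceilingPerSecond(10, 3_000));
    }
}
```

With Run #1's parameters every wave still fits inside the 5,000ms timeout, so all 30 succeed; with Run #2's parameters nothing frees up before the 1,000ms timeout, so only the first 10 get through.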

1.2 The intuitive fix — pull external calls out of the transaction

// Before: EXP-09's anti-pattern
@Transactional
public void process(OrderRequest req) {
    Order order = repo.save(req);          // INSERT
    paymentClient.charge(order.getId());   // external PG (3s)
    // pool occupied = INSERT + external + commit ≈ 3,005ms
}

// After: plain split (pattern A)
public void process(OrderRequest req) {
    paymentClient.charge(req.getId());     // external first (outside Tx)
    orderWriter.saveOrder(req);            // new transaction, entered through another bean's proxy
}

// Lives in a separate bean (orderWriter is an illustrative name): a same-class
// call would bypass the @Transactional proxy (self-invocation, article 7).
@Transactional
public void saveOrder(OrderRequest req) {
    repo.save(req);                         // INSERT only
    // pool occupied ≈ 5ms
}

Is pattern A the answer? The matrix in EXP-09b says: it depends.

1.3 EXP-09b — the 9-scenario matrix’s key finding

#	Pattern	chaos	OK	Inconsist.	Compen.	ExtFail	Sweeper	Meaning
1	A	OFF	60	0	0	0	-	normal path
2	A	DB_FAIL	0	60	0	0	-	60 charged externally / 0 saved locally ⚠️
3	A	EXT_FAIL	0	0	0	60	-	external failed, no losses
4	B	OFF	60	0	0	0	0	Saga normal
5	B	DB_FAIL	0	0	0	0	60	sweeper cleans 60 ✅
6	B	EXT_FAIL	0	0	60	0	0	worker compensates 60 ✅
7	C	OFF	60 (ACK)	0	0	0	-	Outbox normal
8	C	DB_FAIL	60 (ACK)	0	0	0	-	Outbox auto-retry
9	C	EXT_FAIL	60 (ACK)	0	0	0	-	Outbox auto-retry

The key finding: scenario 2 (A/DB_FAIL) produces 60 external charges with 0 local rows. Plain split alone breaks consistency when the external call succeeds and the local DB write then fails. Wrong fit for payments / orders.
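The A/DB_FAIL divergence is easy to reproduce in miniature. The following is a stdlib-only simulation, not the PatternARunner code: `FakeGateway` and `FakeOrderTable` are hypothetical stand-ins for the external PG and the local orders table, and the DB failure is simulated with an exception.

```java
import java.util.ArrayList;
import java.util.List;

final class PlainSplitDemo {

    /** Stand-in for the external PG: counts every successful charge. */
    static final class FakeGateway {
        int charged = 0;
        void charge(int orderId) { charged++; }
    }

    /** Stand-in for the local DB; dbFail makes every INSERT throw. */
    static final class FakeOrderTable {
        final List<Integer> rows = new ArrayList<>();
        private final boolean dbFail;
        FakeOrderTable(boolean dbFail) { this.dbFail = dbFail; }
        void insert(int orderId) {
            if (dbFail) {
                throw new IllegalStateException("simulated DB failure");
            }
            rows.add(orderId);
        }
    }

    /** Pattern A: external call first, local INSERT second. Returns the mismatch count. */
    static int run(FakeGateway pg, FakeOrderTable db, int requests) {
        int inconsistent = 0;
        for (int id = 1; id <= requests; id++) {
            pg.charge(id);                       // succeeds and cannot be rolled back locally
            try {
                db.insert(id);
            } catch (IllegalStateException e) {
                inconsistent++;                  // charged externally, no local row
            }
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        FakeGateway pg = new FakeGateway();
        FakeOrderTable db = new FakeOrderTable(true);
        int mismatches = run(pg, db, 60);
        System.out.println("charged=" + pg.charged + " rows=" + db.rows.size()
                + " inconsistent=" + mismatches);
    }
}
```

Nothing in pattern A links the charge to the INSERT, so the only record of the mismatch is whatever the catch block chooses to log.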

The remainder of this article unpacks why — academically and in measurements. First, the precise meanings of PROPAGATION’s seven values.

2. PROPAGATION 7 — Spring’s transaction-propagation semantics

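The seven values look enum-like, but each is a different transactional model. Spring 6’s AbstractPlatformTransactionManager resolves them in two places: getTransaction decides what happens when no transaction exists (start a new one, run without one, or throw for MANDATORY), and handleExistingTransaction branches on the propagation value when one already does.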

2.1 The semantics, precise

Propagation	With existing Tx	Without existing Tx	Behavior
`REQUIRED` (default)	join	start new	the most common choice
`REQUIRES_NEW`	suspend + start new	start new	separate connection
`NESTED`	create savepoint	start new	JDBC 3.0 savepoint
`SUPPORTS`	join	run without Tx	optional
`MANDATORY`	join	error	enforced
`NOT_SUPPORTED`	suspend + run without Tx	run without Tx	avoid Tx
`NEVER`	error	run without Tx	forbid Tx
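The table can be read as a pure function of two inputs: the propagation value and whether a transaction already exists. The sketch below encodes exactly that table in plain Java. The `Kind` and `Action` enums are local to this example and are not Spring types.

```java
final class PropagationModel {

    enum Kind { REQUIRED, REQUIRES_NEW, NESTED, SUPPORTS, MANDATORY, NOT_SUPPORTED, NEVER }

    enum Action {
        JOIN, START_NEW, SUSPEND_THEN_START_NEW, SAVEPOINT,
        RUN_WITHOUT_TX, SUSPEND_THEN_RUN_WITHOUT_TX, ERROR
    }

    /** Mirrors the propagation table: one row per Kind, one column per existingTx value. */
    static Action decide(Kind kind, boolean existingTx) {
        return switch (kind) {
            case REQUIRED      -> existingTx ? Action.JOIN : Action.START_NEW;
            case REQUIRES_NEW  -> existingTx ? Action.SUSPEND_THEN_START_NEW : Action.START_NEW;
            case NESTED        -> existingTx ? Action.SAVEPOINT : Action.START_NEW;
            case SUPPORTS      -> existingTx ? Action.JOIN : Action.RUN_WITHOUT_TX;
            case MANDATORY     -> existingTx ? Action.JOIN : Action.ERROR;
            case NOT_SUPPORTED -> existingTx ? Action.SUSPEND_THEN_RUN_WITHOUT_TX : Action.RUN_WITHOUT_TX;
            case NEVER         -> existingTx ? Action.ERROR : Action.RUN_WITHOUT_TX;
        };
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + ": with Tx -> " + decide(kind, true)
                    + ", without Tx -> " + decide(kind, false));
        }
    }
}
```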

2.2 REQUIRED vs REQUIRES_NEW — the decisive difference

@Service
public class OrderService {
    @Transactional  // PROPAGATION_REQUIRED
    public void process(Order order) {
        repo.save(order);
        notificationService.send(order);
        // if send() were REQUIRED it would join this Tx, and a throw inside it
        // would mark this Tx rollback-only even when caught here
    }
}

@Service
public class NotificationService {
    @Transactional(propagation = Propagation.REQUIRES_NEW)  // separate Tx
    public void send(Order order) {
        // separate connection / commit / rollback
        // a rollback here does not mark OrderService's Tx rollback-only, but the
        // exception still propagates: the caller must catch it to keep its own Tx
    }
}

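When the inner method is REQUIRED, an exception thrown inside it marks the shared transaction rollback-only even if the caller catches it; that is the pattern behind Woowahan’s “why is this rolling back?” incident. REQUIRES_NEW isolates the rollback, provided the caller catches the exception, and the price is a second pooled connection held while the outer transaction is suspended. (See series article 7, §8 for the self-invocation combination.)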

2.3 NESTED’s real meaning — Savepoints

NESTED is not a new transaction. It is a JDBC 3.0 Savepoint — same connection, same transaction, with a partial-rollback marker.

-- NESTED's SQL sequence
BEGIN;
INSERT INTO orders ...;        -- outer's work
SAVEPOINT nested_1;             -- enter inner
INSERT INTO order_items ...;   -- inner's work
ROLLBACK TO SAVEPOINT nested_1; -- roll back inner only
INSERT INTO orders ...;         -- outer continues
COMMIT;                          -- outer commits (orders persisted)

The point: NESTED commits as a whole at the outer commit point. Inner does not commit independently. It is the wrong fit for payments — you cannot commit at the external-call boundary.
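The savepoint semantics can be modeled without a database. The sketch below is an in-memory stand-in, not JDBC: a transaction is a list of pending statements, a savepoint is a position in that list, and nothing becomes visible until the outer commit. `SavepointModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class SavepointModel {
    private final List<String> pending = new ArrayList<>();    // executed, not yet committed
    private final List<String> committed = new ArrayList<>();  // durable after commit()

    void execute(String sql) { pending.add(sql); }

    /** A savepoint is just the current position in the pending log. */
    int savepoint() { return pending.size(); }

    /** Discards everything executed after the savepoint; earlier work stays pending. */
    void rollbackTo(int savepoint) { pending.subList(savepoint, pending.size()).clear(); }

    /** Only the outer commit makes anything durable. */
    void commit() { committed.addAll(pending); pending.clear(); }

    void rollback() { pending.clear(); }

    List<String> committed() { return List.copyOf(committed); }

    public static void main(String[] args) {
        SavepointModel tx = new SavepointModel();
        tx.execute("INSERT orders #1");          // outer's work
        int nested = tx.savepoint();             // enter inner (NESTED)
        tx.execute("INSERT order_items");        // inner's work
        tx.rollbackTo(nested);                   // inner fails: roll back inner only
        tx.execute("INSERT orders #2");          // outer continues
        System.out.println("before commit: " + tx.committed());
        tx.commit();
        System.out.println("after commit:  " + tx.committed());
    }
}
```

The inner scope never has a commit of its own, which is the property that rules NESTED out at an external-call boundary.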

2.4 Operational traps

Trap	Mechanism	Workaround
`REQUIRED` rollback-only	inner throws → status flagged → outer commit hits `UnexpectedRollbackException`	switch inner to `REQUIRES_NEW` and catch its exception in the caller
`REQUIRES_NEW` self-invocation	same-class call → proxy bypassed → new Tx never starts	extract a separate bean (article 7)
`NESTED` misunderstanding	mistaken for separate Tx → expect external-system commit	switch to `REQUIRES_NEW`
`SUPPORTS` ⇄ `NOT_SUPPORTED`	opposite behavior: `SUPPORTS` joins an existing Tx, `NOT_SUPPORTED` suspends it	rule of thumb: join an existing Tx if there is one → SUPPORTS / always run without one → NOT_SUPPORTED

These seven are Spring’s abstraction. In the database it boils down to connection / transaction / savepoint combinations. §3 shows how that abstraction breaks in distributed environments.

3. The limits of 2PC (XA) — why microservices can’t use it

3.1 How 2PC works

Java EE’s JTA, building on the X/Open XA interface, standardized 2-Phase Commit (2PC) as the way to coordinate distributed transactions:

sequenceDiagram
    participant TC as Transaction Coordinator
    participant DB1 as DB-1
    participant DB2 as DB-2
    participant External as External System

    Note over TC,External: Phase 1: Prepare
    TC->>DB1: prepare()
    DB1-->>TC: ready
    TC->>DB2: prepare()
    DB2-->>TC: ready
    TC->>External: prepare()
    External-->>TC: ready

    Note over TC,External: Phase 2: Commit (only if all ready)
    TC->>DB1: commit()
    TC->>DB2: commit()
    TC->>External: commit()

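Phase 1 (Prepare): the coordinator asks every participant whether it can commit. A participant that votes “ready” gives up the right to abort on its own and must hold its locks until the coordinator announces the outcome.

Phase 2 (Commit): if every vote was “ready”, the coordinator tells all participants to commit. If any participant voted no or timed out, it tells all of them to roll back.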
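The two phases reduce to a short coordinator loop. The sketch below is a single-process illustration only: it has no network, no timeouts, and no recovery log, which are exactly the parts that make real 2PC expensive. `Participant` and `Recording` are hypothetical types for this example.

```java
import java.util.List;

final class TwoPhaseCommitSketch {

    interface Participant {
        boolean prepare();   // vote: true means "ready"
        void commit();
        void rollback();
    }

    /** Runs both phases; returns true if the global transaction committed. */
    static boolean run(List<Participant> participants) {
        // Phase 1: collect votes, stopping at the first "no".
        boolean allReady = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allReady = false;
                break;
            }
        }
        // Phase 2: the same decision goes to every participant.
        for (Participant p : participants) {
            if (allReady) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allReady;
    }

    /** Test double that records the last state it was driven into. */
    static final class Recording implements Participant {
        private final boolean vote;
        String state = "INIT";
        Recording(boolean vote) { this.vote = vote; }
        public boolean prepare() { state = vote ? "PREPARED" : "REFUSED"; return vote; }
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ROLLED_BACK"; }
    }

    public static void main(String[] args) {
        Recording db1 = new Recording(true);
        Recording db2 = new Recording(false);
        boolean committed = run(List.of(db1, db2));
        System.out.println(committed + " " + db1.state + " " + db2.state);
    }
}
```

Between `prepare()` returning true and the phase-2 call, a participant can do nothing but wait; if the coordinator dies in that window, the wait has no upper bound.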

3.2 Pat Helland’s critique — Life Beyond Distributed Transactions (CIDR 2007)

Pat Helland’s CIDR 2007 paper makes the case that 2PC does not scale to large systems.

“In production systems, distributed transactions don’t seem to be needed because applications are designed without them. In the absence of distributed transactions, application designers focus on lower forms of consistency.” — Pat Helland, Life Beyond Distributed Transactions, CIDR 2007

The thrust:

Cost of blocking — between Prepare and Commit/Abort, every participant is blocked. If the coordinator dies, everyone hangs indefinitely. Under availability or partition stress, the system halts.
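Scalability ceiling: with a central coordinator, N participants cost about 4N messages per transaction (prepare, vote, decision, ack) over two sequential round trips, and nothing commits until the slowest participant has answered. With 100 microservices that is roughly 400 messages per transaction, with every participant holding its locks for the full duration.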
Operational cost — the coordinator’s persistent recovery log is a single point of failure with operational cost beyond ordinary RDB transaction logs.
The P in CAP — under partition, 2PC sacrifices availability. Microservices treat partitions as ordinary; full rollback per partition isn’t viable.

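3.3 Last Resource Gambit: a transaction-manager workaround

Some JTA transaction managers offer a Last Resource Gambit (also called Last Resource Commit Optimization): one non-XA resource is enlisted and committed last, after the XA resources have prepared, to reduce overhead. It is a vendor optimization rather than part of the JTA or XA specifications, and it carries a critical risk: a crash between the last resource’s commit and the coordinator’s log write leaves the outcome in an unknown state.

In a payment flow, that unknown state can mean money moved without a matching record. Failure modes like this are part of why Helland points toward “lower forms of consistency”.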

3.4 The conclusion — eventually consistent over 2PC

Werner Vogels’ ACM Queue 2008 article describes the model that takes its place:

“Several inconsistency models exist: causal consistency, read-your-writes consistency, session consistency, monotonic read consistency, monotonic write consistency. Eventual consistency is the most relaxed of these.”

In the microservices era, consistency across services is eventual rather than atomic. Saga (Garcia-Molina 1987) and Outbox (Helland 2005) sit on top of that model.

4. Saga — Garcia-Molina 1987 origin

4.1 Academic origin

Saga’s origin is Hector Garcia-Molina, Kenneth Salem — Sagas (ACM SIGMOD 1987).

“A saga is a long lived transaction that can be written as a sequence of transactions that can be interleaved with other transactions. Each transaction in the sequence is associated with a compensating transaction that semantically undoes its effects.” — Garcia-Molina, Salem, Sagas, ACM SIGMOD 1987

The definition:

A saga is a sequence of transactions — T1 → T2 → … → Tn
Each Ti has a compensating transaction Ci — semantically undoing Ti
Compensation is semantic, not an exact inverse: Ci doesn’t fully erase Ti’s effects (e.g., a sent notification can’t be unsent)
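The definition translates into a small executor: run T1..Tn in order, and when some Ti fails, run the compensations of the already completed steps in reverse. This is a minimal in-process sketch of that idea, not the paper's recovery protocol and not the EXP-09b runner; `Step` and `SagaSketch` are illustrative names, and a failing compensation is deliberately left unhandled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SagaSketch {

    /** One saga step Ti together with its compensating transaction Ci. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /** Returns true if every step completed; otherwise compensates in reverse order. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                // The failed step did not complete, so only earlier steps are compensated.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
                new Step("charge", () -> log.add("T1 charge"), () -> log.add("C1 refund")),
                new Step("reserve", () -> log.add("T2 reserve"), () -> log.add("C2 release")),
                new Step("ship", () -> { throw new IllegalStateException("shipping down"); },
                        () -> log.add("C3 recall"))));
        System.out.println(ok + " " + log);
    }
}
```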

4.2 Two flavors — Choreography vs Orchestration

Microsoft’s Saga pattern docs summarize the two variants.

(1) Choreography — each service listens for events and emits the next event after its step:

[Order Service] -- OrderCreated --> [Payment Service]
                                      ↓ (charge OK)
                              -- PaymentCharged --> [Inventory]
                                                     ↓ (reserve OK)
                                              -- InventoryReserved --> [Shipping]

(2) Orchestration — a Saga Coordinator commands every step and consumes responses:

[Saga Coordinator] -- Charge --> [Payment Service]
                  <-- Charged ----
                  -- Reserve --> [Inventory]
                  <-- Reserved ---
                  -- Ship --> [Shipping]

Axis	Choreography	Orchestration
Coupling	loose (event-based)	tight (Coordinator knows all)
Debugging	harder (events fan out)	easier (Coordinator is the single point)
SPOF	none	the Coordinator
Fits	event-based domains (DDD)	command-based (RPC-style)

4.3 Toss SLASH24 — Saga distributed-transaction compensation

Toss SLASH24 is the Korean operational counterpart:

compensating transactions are explicit steps
each step guarantees idempotency (an idempotency key)
compensations that also fail escalate to manual operations
Saga state machines are persisted (DB or Kafka)

A telling quote from the talk:

“Compensating transactions can themselves fail. Manual operations have to be designed in. Assuming Saga handles every failure automatically is dangerous.”

4.4 EXP-09b pattern B — three safety nets

Pattern B is a minimal Saga implementation with three safety nets.

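// 3 transactions: Tx1 reserve / Tx2 confirm / Tx3 cancel.
// reserve/confirm/cancel must be reached through a Spring proxy (a bean separate
// from process()); a same-class call would skip @Transactional (article 7).
@Transactional
public OrderId reserve(OrderRequest req) {  // Tx1
    Order order = new Order(req, Status.PENDING);
    return repo.save(order).getId();
}

public void process(OrderId orderId, OrderRequest req) {
    try {
        paymentClient.charge(req);  // external (outside Tx)
        confirm(orderId);            // Tx2
    } catch (Exception e) {
        // Reached when charge() fails, or when confirm() fails after a successful
        // charge; in a real payment flow the second case also needs a PG-side refund.
        cancel(orderId);             // Tx3 (compensating)
        throw e;
    }
}

@Transactional
public void confirm(OrderId orderId) {  // Tx2
    // assumes a Spring Data repository, where findById returns Optional
    Order o = repo.findById(orderId).orElseThrow();
    o.setStatus(Status.CONFIRMED);       // dirty checking issues the UPDATE at commit
}

@Transactional
public void cancel(OrderId orderId) {  // Tx3: compensating
    Order o = repo.findById(orderId).orElseThrow();
    o.setStatus(Status.CANCELLED);
}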

SagaSweeper adds the time-based safety net — auto-cancel any PENDING that didn’t get confirmed/compensated:

@Scheduled(fixedDelay = 5000)
public void sweep() {
    repo.findPendingOlderThan(Duration.ofSeconds(5))
        .forEach(o -> cancel(o.getId()));
}

4.5 Measured — pattern B across the 9 scenarios

Scenario	Result	Safety net
B/OFF	60 ✅	normal path
B/DB_FAIL	sweeper recovers 60 → CANCELLED	sweeper recovers what worker compensation couldn’t
B/EXT_FAIL	worker compensates 60 → CANCELLED	worker compensation fires immediately

Triple safety net: (1) try/catch worker compensation / (2) time-based sweeper recovery / (3) audit trail (CANCELLED rows). The two operational safety nets fire in turn; looking at one scenario in isolation hides the value.

P99 = 3,106ms — external call + Tx2 commit. Close to EXP-09 (3,302ms), but pool occupation is now 5ms × 2 (Tx1 + Tx2). Pool exhaustion gone.

5. Outbox — Helland CIDR 2005 origin

5.1 Origin — Data on the Outside vs Data on the Inside

The Outbox pattern’s academic origin is Pat Helland — Data on the Outside vs Data on the Inside (CIDR 2005).

“Data on the inside is the data that is private to a service… Data on the outside is the data that flows between services.” “We need a mechanism to publish outside data atomically with inside data changes.” — Pat Helland, Data on the Outside vs Data on the Inside, CIDR 2005

The insights:

Inside the transaction = inside data (your own DB)
Message queues / API calls = outside data (other systems)
We need a mechanism to publish outside data atomically with inside changes
Outbox — write the outside message to an inside outbox table in the same transaction → atomic → a separate poller / CDC publishes it

5.2 Mechanism — Polling vs CDC

graph LR
    Tx[Transaction]
    Tx --> InsideOrder[INSERT order]
    Tx --> InsideOutbox[INSERT outbox]
    Tx --> Commit[atomic commit]
    Commit --> Poller[Outbox Poller<br/>100ms~5s]
    Poller --> External[external system]
    Commit --> CDC[Debezium CDC<br/>ms]
    CDC --> External2[external system]

Variant	Latency	Operational cost
Polling	100ms~5s	simple (Spring `@Scheduled`)
CDC (Debezium)	<100ms	binlog access + Kafka Connect operations

5.3 Korean operational reports — 29CM, Ridi

29CM Transactional Outbox in production:

polling (5-second cadence)
state machine: pending → send_success / send_fail
retry-count limit + DLQ

Ridi’s Transactional Outbox:

polling + idempotency key
distributed lock (ShedLock) to prevent duplicate execution across instances

Both reports stay practical — no academic citation. What this article adds is the three-layer thread: academic origin (Helland CIDR 2005) → operational reports → my own measurement.

5.4 EXP-09b pattern C — measured implementation

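@Transactional
public OrderId acceptAndQueue(OrderRequest req) {
    Order order = new Order(req, Status.PENDING);
    OrderId id = repo.save(order).getId();
    outboxRepo.save(new OutboxEvent(id, "CHARGE_REQUEST", req));  // same Tx as the order
    return id;
}

@Scheduled(fixedDelay = 200)  // next poll starts 200ms after the previous one finishes
public void pollOutbox() {
    // Claim step: FOR UPDATE SKIP LOCKED holds row locks only until the claiming
    // Tx ends, so locked_until is what carries the claim across the external call.
    List<OutboxEvent> batch = outboxRepo.findUnprocessed(10);
    for (OutboxEvent e : batch) {
        try {
            paymentClient.charge(e);          // outside any Tx; may repeat on retry, so it must be idempotent
            confirmOrder(e.getOrderId());
            outboxRepo.markProcessed(e);
        } catch (Exception ex) {
            outboxRepo.bumpRetry(e);          // row stays unprocessed; a later cycle retries it
        }
    }
}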

Design choices:

FOR UPDATE SKIP LOCKED — prepared for multi-poller deployment
locked_until column — prevents duplicate work in flight
bumpRetry — increments retry counter on external failure
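How the three choices interact can be shown with an in-memory table. The sketch below is a model under stated assumptions, not the OutboxPoller code: a `synchronized` method stands in for the short claiming transaction with FOR UPDATE SKIP LOCKED, `lockedUntilMs` stands in for the locked_until column, and time is passed in explicitly so the behavior is deterministic. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OutboxClaimSketch {

    static final class Event {
        final int id;
        boolean processed;
        long lockedUntilMs;   // lease expiry; 0 means "not claimed"
        int retries;
        Event(int id) { this.id = id; }
    }

    private final List<Event> table = new ArrayList<>();

    void add(Event e) { table.add(e); }

    /** Short claim step: lease up to `batch` unprocessed rows whose lease has expired. */
    synchronized List<Event> claim(long nowMs, int batch, long leaseMs) {
        List<Event> claimed = new ArrayList<>();
        for (Event e : table) {
            if (claimed.size() == batch) {
                break;
            }
            if (!e.processed && e.lockedUntilMs <= nowMs) {
                e.lockedUntilMs = nowMs + leaseMs;   // other pollers skip this row until then
                claimed.add(e);
            }
        }
        return claimed;
    }

    void markProcessed(Event e) { e.processed = true; }

    /** External call failed: count the attempt and release the lease for a later cycle. */
    void bumpRetry(Event e) { e.retries++; e.lockedUntilMs = 0; }

    public static void main(String[] args) {
        OutboxClaimSketch outbox = new OutboxClaimSketch();
        for (int i = 1; i <= 25; i++) {
            outbox.add(new Event(i));
        }
        List<Event> pollerA = outbox.claim(0, 10, 30_000);
        List<Event> pollerB = outbox.claim(0, 10, 30_000);
        System.out.println(pollerA.get(0).id + ".." + pollerA.get(9).id
                + " / " + pollerB.get(0).id + ".." + pollerB.get(9).id);
    }
}
```

The lease length is a design choice: it has to outlast the slowest external call in a batch, otherwise a second poller can pick up a row that is still in flight.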

5.5 Three latencies — the finding from this measurement

(Mirrors the original measurement note, §5.4.)

C/OFF (the “happy path”) decomposes into three different metrics:

Latency type	C/OFF [measurement]	Meaning
ACK latency	72ms	worker tells the user “queued for processing”
Completion latency (avg)	92,573ms	orders.PENDING → CONFIRMED (external + poller-cycle position)
Completion latency (max)	181,935ms	the last cycle’s row

The same pattern carries two latencies:

User-perceived: about 46× faster than EXP-09 (3,302 / 72)
Completion: about 30× slower than pattern A (92,573 / 3,071)

The real Outbox trade-off: separating user response from external call costs you in time-to-completion. A poor fit for payments where the user waits for completion. A great fit for notifications where ACK is enough.

Three ways to shorten completion latency:

(a) multi-poller (ShedLock + EXP-12 in W6)
(b) parallel batch external calls (CompletableFuture.allOf)
(c) evolve to CDC (Debezium, ADR-005)

6. EXP-09b 9-scenario matrix — the measurements that compare the patterns

6.1 Core indicators across nine scenarios

#	Pattern	chaos	OK	Inconsist.	Compen.	ExtFail	sweeper	P99 (ms)	awaiting peak
1	A	OFF	60	0	0	0	—	3,071	57 ⚠️
2	A	DB_FAIL	0	60	0	0	—	—	0
3	A	EXT_FAIL	0	0	0	60	—	—	0
4	B	OFF	60	0	0	0	0	3,106	0
5	B	DB_FAIL	0	0	0	0	60	—	0
6	B	EXT_FAIL	0	0	60	0	0	—	0
7	C	OFF	60 (ACK)	0	0	0	—	72 ⭐	0
8	C	DB_FAIL	60 (ACK)	0	0	0	—	67	0
9	C	EXT_FAIL	60 (ACK)	0	0	0	—	66	0

6.2 Final DB state

#	Pattern	chaos	orders distribution	outbox
1	A	OFF	A.CONFIRMED=60	0
2	A	DB_FAIL	(none)	0
3	A	EXT_FAIL	(none)	0
4	B	OFF	B.CONFIRMED=60	0
5	B	DB_FAIL	B.CANCELLED=60 (sweeper)	0
6	B	EXT_FAIL	B.CANCELLED=60 (worker compensation)	0
7	C	OFF	C.CONFIRMED=59 / PENDING=1	1
8	C	DB_FAIL	C.CONFIRMED=9 / PENDING=51	51
9	C	EXT_FAIL	C.PENDING=60	60

6.3 Pattern A’s awaiting=57 spike

A/OFF’s P99 of 3,071ms beats EXP-09’s 3,302ms because pool occupation drops to 5ms × wave. But awaiting peak=57 — once the external sleep(3,000ms) returns, all 60 workers race to INSERT simultaneously → pool of 10 fills → awaiting spikes.

Implication: pool occupation per call is short (5ms × wave), but the instantaneous concurrency spike still matters. It can affect other APIs during that ~10ms window. Surfacing it requires millisecond-resolution pool metrics — Prometheus + Hikari’s metrics binder.

6.4 Pattern B’s triple safety net, validated by scenarios 5+6

Scenario 5 (DB_FAIL): worker compensation cannot fire → sweeper recovers 60 → CANCELLED audit trail
Scenario 6 (EXT_FAIL): worker compensation fires immediately → sweeper has nothing to do

Both safety nets work in turn. Looking at one scenario in isolation hides their value. Both must be measured.

6.5 Code line ↔ measurement mapping

Measurement	Code location	Verified
A/DB_FAIL Inconsistent=60	`PatternARunner.java` chaos branch	✅
A/EXT_FAIL ExtFail=60	`PlatformAStub.java` sleep then throw	✅
B/OFF P99=3,106ms	`PatternBRunner.java` reserve + external + confirm	✅
B/DB_FAIL sweeper=60	`SagaSweeper.java` UPDATE HOLD>5s	✅
B/EXT_FAIL Compensated=60	`PatternBRunner.java` catch → cancel	✅
C ACK P99=72ms	`PatternCRunner.java` orders+outbox in one Tx	✅
C/OFF processed=59 (180s)	`OutboxPoller.java` serial batch	✅
C/DB_FAIL chaosSkipped=50	`OutboxPoller.java` deterministic orderId%2	✅
C/EXT_FAIL retries=19	`OutboxPoller.java` bumpRetry	✅

→ All nine measurements map 1:1 to code lines.

7. Two latencies, separated — user response vs completion

7.1 What the latencies mean per pattern

Pattern	User response ↔ completion	Sync/Async
A	same (external + INSERT on same worker thread)	sync
B	same (reserve + external + confirm on same worker thread)	sync
C	different (worker = ACK / poller = completion)	async

7.2 User response latency

Indicator	EXP-09 #2	A/OFF	B/OFF	C/OFF
Success rate	16.7%	100%	100%	100% (ACK)
User response P99	3,302ms	3,071ms	3,106ms	72ms ⭐
Pool occupation per Tx	~3,000ms	~5ms	~5ms × 2	~10ms

7.3 Completion latency

Pattern	Completion min	Completion avg	Completion max
A/OFF	3,071ms (= user response)	3,071ms	3,071ms
B/OFF	3,106ms (= user response)	3,106ms	3,106ms
C/OFF	3,233ms	92,573ms	181,935ms

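→ For A and B, completion coincides with the user response, so the table repeats their P99. Compared on completion latency, C’s average is about 30× A’s (92,573 / 3,071).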

7.4 Domain mapping flows from these two latencies

Domain	User response priority	Completion priority	Recommended pattern
Payment confirm	△	⭐⭐⭐	Saga (B) — completion consistency required
Notifications (SMS / Push)	⭐⭐⭐	△	Outbox (C) — ACK is enough
Cache invalidation	⭐⭐⭐	⭐	Plain split (A) — DB consistency tolerant
Order creation	⭐⭐	⭐⭐⭐	Saga (B) — consistency
Search index update	⭐	△	Outbox (C)
Idempotent retry-state update	⭐⭐	⭐⭐	Saga or Outbox (per domain)

7.5 Why completion latency is volatile

C’s completion latency varies by cycle position:

polling overhead alone: 60 rows / batch of 10 = 6 cycles × 200ms = 1,200ms (theory)
measured average = 92,573ms (77× theory)

The gap is the external call sleep(3,000ms) serialized inside the batch. With batch=10, one cycle takes 30,000ms (10 × 3,000), so the external call dominates the cycle: six such cycles add up to about 180,000ms, in line with the measured max of 181,935ms.
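The arithmetic can be checked with a small model. It assumes 60 rows, a 3,000ms external call, and zero polling or DB overhead, so it is a lower bound rather than a reproduction of the measurement. In a serial poller, row k completes after k external calls; if each batch of 10 ran its calls in parallel, every row in batch b would complete after b calls. `CompletionLatencyModel` is an illustrative name.

```java
final class CompletionLatencyModel {

    /** Serial poller: row k (1-based) completes after k external calls. Returns {avgMs, maxMs}. */
    static long[] serial(int rows, long callMs) {
        long sum = 0;
        for (int k = 1; k <= rows; k++) {
            sum += k * callMs;
        }
        return new long[] { sum / rows, rows * callMs };
    }

    /** Parallel-in-batch: every row in batch b (1-based) completes after b external calls. */
    static long[] parallelBatches(int rows, int batch, long callMs) {
        long sum = 0;
        for (int k = 0; k < rows; k++) {
            sum += (k / batch + 1) * callMs;
        }
        int batches = (rows + batch - 1) / batch;
        return new long[] { sum / rows, batches * callMs };
    }

    public static void main(String[] args) {
        long[] s = serial(60, 3_000);
        long[] p = parallelBatches(60, 10, 3_000);
        System.out.println("serial   avg=" + s[0] + "ms max=" + s[1] + "ms");
        System.out.println("parallel avg=" + p[0] + "ms max=" + p[1] + "ms");
    }
}
```

The serial model gives an average of 91,500ms and a max of 180,000ms, within about 1% of the measured 92,573ms and 181,935ms; the remainder is consistent with polling and DB overhead. Under the same assumptions, parallel-in-batch would bring the average down to 10,500ms and the max to 18,000ms.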

7.6 Three paths to shrink completion latency

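Multi-poller (N instances; ShedLock serializes runs that share a lock name, so the extra throughput has to come from FOR UPDATE SKIP LOCKED or one lock name per shard)
Parallel-in-batch (CompletableFuture.allOf for concurrent external calls)
CDC evolution (Debezium binlog-based: the polling interval disappears, but the external-call time remains)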

These three paths are measured in W6’s commerce-batch-orchestrator, ShedLock + EXP-12 / EXP-12b.
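For the parallel-in-batch path, a minimal shape with `CompletableFuture.allOf` could look like the following. It is a sketch under assumptions: `charge` is a hypothetical stand-in for the external call that just sleeps, one thread per event is used for simplicity, and failure handling (which `join()` would surface as a `CompletionException`) is left out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelBatchSketch {

    /** Hypothetical stand-in for the external call. */
    static String charge(int eventId, long delayMs) {
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "charged-" + eventId;
    }

    /** Fires all external calls of one batch concurrently and waits for every one of them. */
    static List<String> chargeAll(List<Integer> eventIds, long delayMs) {
        if (eventIds.isEmpty()) {
            return List.of();
        }
        ExecutorService pool = Executors.newFixedThreadPool(eventIds.size());
        try {
            List<CompletableFuture<String>> futures = eventIds.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> charge(id, delayMs), pool))
                    .toList();
            // allOf completes when every future in the batch has completed.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(chargeAll(List.of(1, 2, 3), 50));
    }
}
```

Running a batch in parallel also multiplies the load on the external system by the batch size, so the batch size becomes a rate-limit decision as much as a latency one.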

8. Domain mapping — payments to Saga, notifications to Outbox

8.1 Decision tree

Q1: External call requires *synchronous* consistency?
  YES → Saga (B)
  NO  → Q2

Q2: User response can absorb the external-call duration?
  YES → Plain split (A), but accept the external-OK / DB-fail risk
  NO  → Outbox (C), then Q3

Q3: Completion can be slower without harm?
  YES → Outbox (C) is enough
  NO  → Outbox + multi-poller / parallel / CDC
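Read as code, the tree is three boolean questions, with Q3 refining the Outbox branch. The function below is one possible encoding of that reading; the enum and parameter names are invented for this example.

```java
final class PatternChooser {

    enum Pattern { SAGA_B, PLAIN_SPLIT_A, OUTBOX_C, OUTBOX_C_TUNED }

    /**
     * Q1: does the external call require synchronous consistency?
     * Q2: can the user response absorb the external-call duration?
     * Q3: can completion be slower without harm? (only asked on the Outbox branch)
     */
    static Pattern choose(boolean needsSyncConsistency,
                          boolean responseCanAbsorbCall,
                          boolean slowCompletionOk) {
        if (needsSyncConsistency) {
            return Pattern.SAGA_B;
        }
        if (responseCanAbsorbCall) {
            return Pattern.PLAIN_SPLIT_A;     // accepts the external-OK / DB-fail risk
        }
        // OUTBOX_C_TUNED stands for Outbox + multi-poller / parallel / CDC.
        return slowCompletionOk ? Pattern.OUTBOX_C : Pattern.OUTBOX_C_TUNED;
    }

    public static void main(String[] args) {
        System.out.println("payment confirm -> " + choose(true, false, false));
        System.out.println("notification    -> " + choose(false, false, true));
        System.out.println("cache refresh   -> " + choose(false, true, true));
    }
}
```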

8.2 Per-domain fit

Domain	Recommended pattern	Reason
Payment confirm	Saga (B)	Completion consistency + compensation
Refund	Saga (B)	External PG idempotency + DB reconciliation
Notifications (SMS / Push / Email)	Outbox (C)	Fast ACK; completion can lag
Search index update	Outbox (C)	Async tolerated, weak consistency
Cache invalidation	Plain split (A)	Stale cache acceptable; the next read fixes it
Order creation	Saga (B)	External inventory + local DB consistency
Domain notification (event-driven)	Outbox (C)	DDD domain-event canonical
Order cancel (self-decided)	Plain split (A)	No external call
Order cancel (with PG refund)	Saga (B)	External PG + local DB
Operations dashboard updates	Plain split (A)	Read-only cache

8.3 Applied to commerce-comment-platform-be

Domain                     → Pattern
Credit deduction (payment) → Saga (B)
Auto-comment notification  → Outbox (C)
Operator-dashboard update  → Plain split (A)
Order refund               → Saga (B)
Search index sync          → Outbox (C)

This is the mapping ADR-BE-008 codifies — validated by the 9-scenario EXP-09b.

9. `@TransactionalEventListener` — Spring’s commit-after hook

9.1 Behavior

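@TransactionalEventListener does not publish anything itself: it binds a listener to a phase of the publishing transaction, AFTER_COMMIT by default. If the event is published with no transaction active, the listener is skipped unless fallbackExecution = true.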

@Service
public class OrderEventPublisher {
    private final ApplicationEventPublisher publisher;

    @Transactional
    public void confirmOrder(Order order) {
        order.confirm();
        repo.save(order);
        publisher.publishEvent(new OrderConfirmedEvent(order));  // publish
        // listener fires after commit (default = AFTER_COMMIT)
    }
}

@Component
public class NotificationListener {
    @TransactionalEventListener  // default = AFTER_COMMIT
    public void onOrderConfirmed(OrderConfirmedEvent event) {
        // notify externally after commit
        notificationClient.send(event);
    }
}

9.2 Four phases

Phase	When	Use
`BEFORE_COMMIT`	just before commit	DB validation / extra INSERTs
`AFTER_COMMIT` (default)	just after commit	external notification / cache invalidation
`AFTER_ROLLBACK`	just after rollback	compensation / audit
`AFTER_COMPLETION`	after commit OR rollback	cleanup

9.3 The Spring source — `TransactionSynchronizationManager`

TransactionSynchronizationManager is the engine. registerSynchronization registers the callback:

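// Simplified: the listener adapter that TransactionalEventListenerFactory creates
// registers a synchronization along these lines when an event is published in a Tx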
TransactionSynchronizationManager.registerSynchronization(
    new TransactionSynchronization() {
        @Override
        public void afterCommit() {
            // listener fires
        }
    }
);

9.4 The trap — RuntimeException inside `AFTER_COMMIT`

If an AFTER_COMMIT listener throws — the transaction is already committed → no rollback possible → external system inconsistency is on the table.

The remedy:

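write the outbox row inside the business transaction itself, directly or from a BEFORE_COMMIT listener (= pattern C); an AFTER_COMMIT listener runs too late to be atomic, and any write it makes needs its own REQUIRES_NEW transaction
a separate polling worker handles the external call (= pattern C)
alternatively, @Async + an idempotency key (a simpler pattern with weaker guarantees)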

This trap is the practical argument for Outbox. An external call inside AFTER_COMMIT is not published atomically with the inside-data change, which is exactly the gap Helland’s outside-data publication mechanism is meant to close; Outbox closes it.
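The trap is independent of Spring and can be shown with a toy transaction object. In the model below, commit happens first and callbacks run afterwards, so a failing callback has nothing left to roll back. How the exception surfaces in a real application depends on the Spring version and how the callback was registered, so the model simply collects failures. `AfterCommitModel` is a hypothetical class, not a Spring API.

```java
import java.util.ArrayList;
import java.util.List;

final class AfterCommitModel {
    private final List<Runnable> afterCommit = new ArrayList<>();
    private boolean committed;

    void registerAfterCommit(Runnable callback) { afterCommit.add(callback); }

    /** Commits first, then runs the callbacks; returns whatever they threw. */
    List<RuntimeException> commit() {
        committed = true;                         // the DB commit is already durable here
        List<RuntimeException> failures = new ArrayList<>();
        for (Runnable callback : afterCommit) {
            try {
                callback.run();
            } catch (RuntimeException e) {
                failures.add(e);                  // too late to roll back; can only be reported
            }
        }
        return failures;
    }

    boolean isCommitted() { return committed; }

    public static void main(String[] args) {
        AfterCommitModel tx = new AfterCommitModel();
        tx.registerAfterCommit(() -> { throw new IllegalStateException("notification API down"); });
        List<RuntimeException> failures = tx.commit();
        System.out.println("committed=" + tx.isCommitted() + " failures=" + failures.size());
    }
}
```

The order is confirmed in the database while the notification is lost, and nothing durable records that it still needs sending. An outbox row written before the commit is that durable record.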

10. Conclusion — transaction split through a senior lens

10.1 Layered understanding — academic → operational → measured

Layer	Understanding
L1 surface	”Don’t call external APIs inside a transaction”
L2 mechanism	PROPAGATION 7 + `TransactionSynchronizationManager` + `@TransactionalEventListener`
L2.5 source	`AbstractPlatformTransactionManager#getTransaction` + `#handleExistingTransaction` / `TransactionalEventListenerFactory`
L3 measurement	EXP-09 two runs + EXP-09b 9-scenario matrix — 9 measurements mapped 1:1 to code lines
L4 ops (Korea)	Toss SLASH24 SAGA / 29CM·Ridi Outbox / Woowahan why is this rolling back?
L4 ops (global)	Stripe Idempotency / Microsoft Saga pattern docs
L5 academic	Garcia-Molina 1987 (Saga) / Helland CIDR 2005 (Outbox) / Helland CIDR 2007 (2PC limits) / Vogels 2008 (Eventually Consistent)

10.2 Domain-mapping decisions

Payment / order / refund → Saga (B) — completion consistency required
Notification / search index / async events → Outbox (C) — fast ACK + slow completion is fine
Cache / operator dashboard → Plain split (A) — external-OK / DB-fail risk acknowledged
Don’t use — 2PC (XA) — Helland’s critique + bad fit for availability/partition

10.3 Operational checklist

PR review — flag external calls inside @Transactional methods
Be explicit about PROPAGATION — don’t depend on the default
Idempotency for Saga / Outbox (idempotency key + DB UNIQUE)
Monitor Outbox completion latency (poller lag)
Define an escalation path for compensation-fails-too in Saga (manual ops + alerting)
Audit RuntimeException paths inside @TransactionalEventListener(AFTER_COMMIT)

10.4 Related articles in this series

Article 1 — JPA L1 cache / flush lifecycle (the specifics of flush at commit)
Article 3 — OSIV + transaction propagation (REQUIRED rollback-only + Vlad’s OSIV anti-pattern)
Article 7 — Spring AOP self-invocation (PROPAGATION’s self-invocation combination trap)

11. References

Academic (L5)

Garcia-Molina, Salem — Sagas (ACM SIGMOD 1987) — Saga’s origin. ACM DL
Pat Helland — Life Beyond Distributed Transactions (CIDR 2007) — 2PC limits. PDF
Pat Helland — Data on the Outside vs Data on the Inside (CIDR 2005) — Outbox’s academic origin. PDF
Werner Vogels — Eventually Consistent (ACM Queue 2008) — eventual-consistency model. ACM Queue
Bernstein, Hadzilacos, Goodman — Concurrency Control and Recovery in Database Systems (1987) — the textbook on transaction theory
Gray, Reuter — Transaction Processing: Concepts and Techniques (1992) — transactions / Saga / recovery, canonical
Newman — Building Microservices (2nd ed.) — operational guide for Saga / choreography vs orchestration

Official documentation (primary)

Spring 6 source (cited directly)

Korean tech-blog incident reports

Toss SLASH24 — SAGA distributed transaction compensation
29CM — Transactional Outbox in production
Ridi — Transactional Outbox adoption
Woowahan — 응? 이게 왜 롤백되는거지? (Wait, why is this rolling back?)
Kakao Pay — JPA Transactional readOnly + set_option

Global production reports

Stripe — Designing robust and predictable APIs with idempotency
Stripe API — Idempotent requests
Adyen — API idempotency
Confluent — Exactly-Once Semantics in Apache Kafka
Martin Fowler — Patterns of Distributed Systems

Vlad Mihalcea (Hibernate Steering Committee)

Author’s own measurements

W1 EXP-09 — Pool exhaustion from external calls inside a transaction
W1 EXP-09b — Pattern A/B/C 9-scenario matrix
This series, Article 7 — Spring AOP self-invocation @Transactional proxy