Archives
Everything I've written so far, organized by date.
-
Why saveAll() Becomes 10K INSERTs — IDENTITY and Hibernate's Structural Batch Disablement
hibernate.jdbc.batch_size=50 is set, yet saveAll() of 10,000 rows fires 10,000 INSERTs. GenerationType.IDENTITY needs LAST_INSERT_ID() after every single INSERT, which Statement.RETURN_GENERATED_KEYS cannot deliver for a batch — so Hibernate disables batching structurally. Application-managed IDs (a TABLE-strategy simulation) restore batching (~200 SQL statements). Raw JDBC batchUpdate with rewriteBatchedStatements=true rewrites the batch as multi-value INSERTs (~10 statements) — the fastest path. The DZone "IDENTITY → SEQUENCE 100x" post is PostgreSQL-specific: MySQL has no native SEQUENCE (it falls back to TABLE), so the real options on MySQL are UUID, TableGenerator pooled-lo, Snowflake/TSID, or raw JDBC batching.
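The statement counts above are just ceiling division. A minimal sketch, assuming batch_size=50 and (my illustrative assumption, not a driver guarantee) 1,000 rows packed per rewritten multi-value INSERT:

```java
// Back-of-the-envelope SQL statement counts for the three paths:
// IDENTITY (batching disabled), app-managed IDs with batch_size=50,
// and rewriteBatchedStatements multi-value packing (1,000 rows assumed).
public class BatchMath {
    static int statements(int rows, int rowsPerStatement) {
        return (rows + rowsPerStatement - 1) / rowsPerStatement; // ceiling division
    }

    public static void main(String[] args) {
        int rows = 10_000;
        System.out.println(statements(rows, 1));     // IDENTITY: 10000 statements
        System.out.println(statements(rows, 50));    // batch_size=50: 200 statements
        System.out.println(statements(rows, 1_000)); // multi-value rewrite: 10 statements
    }
}
```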
-
JPA N+1 and the Four JOIN FETCH Traps — MultipleBagFetchException, Pagination OOM, OneToOne LAZY
In a four-level domain (owner→merchant→rule→history), findAll plus child traversal yields 21 SQL statements. JOIN FETCH collapses them to 1. But fetching two collections at once raises MultipleBagFetchException — Hibernate refuses the cartesian product of two Bags. JOIN FETCH + setMaxResults emits HHH000104 and applies pagination *in memory* — load 10K rows, keep 20, OOM. A non-owning @OneToOne LAZY is *always fetched* because the proxy cannot tell whether the value is null. @BatchSize tames N+1 down to N/K+1, the standard mitigation. The fetch traps in JPA come from Bag/List/Set semantics, proxy limitations, and how Hibernate handles cartesian products — never from JOIN FETCH alone.
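Two arithmetic facts sit behind these traps. A sketch with hypothetical sizes (the 20/10/30 figures are my illustration, not the article's dataset):

```java
// Why two Bag fetches explode, and what @BatchSize buys you.
public class FetchMath {
    // JOIN FETCH on two collections multiplies rows: the cartesian
    // product Hibernate refuses with MultipleBagFetchException.
    static int cartesianRows(int parents, int childrenA, int childrenB) {
        return parents * childrenA * childrenB;
    }

    // @BatchSize(size = K) turns N+1 queries into ceil(N/K) + 1.
    static int batchedQueries(int n, int k) {
        return (n + k - 1) / k + 1;
    }

    public static void main(String[] args) {
        System.out.println(cartesianRows(20, 10, 30)); // 6000 result rows for 20 parents
        System.out.println(batchedQueries(20, 10));    // 3 queries instead of 21
    }
}
```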
-
The Real Cost of JPA Dirty Checking — readOnly, @DynamicUpdate, and Query Plan Cache Leaks
Hibernate's dirty checking copies an entity snapshot at load time and compares it against the current state at flush. With 10,000 rows, readOnly=true skips the snapshot copy (memory savings). @DynamicUpdate emits SQL with only the changed columns — but generates a fresh SQL string per update pattern, inflating the Query Plan Cache (a steady heap leak if hibernate.query.plan_cache_max_size is left untuned). @Modifying bulk JPQL is fastest but leaves the persistence context inconsistent — clearAutomatically=true is the standard remedy. The clear() pattern (flush + clear every 50 inserts) keeps memory bounded for large insert batches. The trap in JPA is never one feature; it is the interaction of flush, cache, and snapshot lifecycle.
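The heart of dirty checking fits in one method: keep the loaded state, diff it at flush. A minimal sketch (my simplification; Hibernate diffs typed property arrays, not maps):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Dirty checking stripped to its essence: snapshot at load, diff at flush.
// readOnly=true skips taking the snapshot; @DynamicUpdate emits UPDATE SQL
// containing only the columns this diff flags.
public class DirtyCheck {
    static List<String> changedColumns(Map<String, Object> snapshot,
                                       Map<String, Object> current) {
        List<String> changed = new ArrayList<>();
        for (var e : current.entrySet())
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue()))
                changed.add(e.getKey());
        return changed;
    }
}
```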
-
JPA Optimistic Lock and the Retry Stampede Trap — 6 Scenarios @Version Cannot Cover Alone
100 workers each increment the same rule's priority by +1. Without @Version, the final priority lands below 100 (Lost Update). With @Version, all you get is an OptimisticLockException — handling is the caller's responsibility, so only some increments succeed. @Retryable(3) with backoff=0 produces a *retry stampede* — retries pile up at the same instant and collide again. Exponential backoff with full jitter spreads retries out and reaches priority=100. Plus the *self Lost Update* trap discovered along the way: within a single transaction, two SELECTs return different objects in raw JDBC but the same instance under JPA's first-level cache (`==`). A different category from the distributed Lost Update. The piece also covers @Transactional + @Retryable AOP ordering and the AWS Architecture Blog rationale for full jitter.
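Full jitter is a one-liner once you see it: sleep a uniformly random duration between 0 and the capped exponential bound. A sketch following the AWS Architecture Blog formula (the base/cap values in the test are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Full-jitter backoff: sleep = random(0, min(cap, base * 2^attempt)).
// Randomizing over the whole window is what breaks up the retry stampede;
// plain exponential backoff still lands every worker on the same instants.
public class FullJitter {
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long bound = Math.min(capMillis, baseMillis * (1L << attempt));
        return ThreadLocalRandom.current().nextLong(bound + 1); // 0..bound inclusive
    }
}
```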
-
[JPA + Spring Mastery 01] L1 Cache · flush · Transaction Lifecycle — what readOnly really shaves off, dirty checking's true cost
How PersistenceContext sits on Fowler's Identity Map (PoEAA, 2002), how the four ActionQueue lists decide SQL emission order, what AutoFlush actually inspects right before a query, and how dirty checking's cost differs sharply between reflection and bytecode enhancement — decomposed line-by-line from Hibernate 6's DefaultFlushEventListener through Kakao Pay's readOnly + set_option (QPS +58%) report. The +0.4 ms baseline measured between raw JDBC (0.74 ms) and JPA variants (0.99–1.95 ms) is unpacked, and `readOnly`'s three-layer effect (Hibernate flush mode + Spring tx marker + MySQL Com_set_option round-trips) is taken apart layer by layer.
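The Identity Map idea reduces to one data structure. A toy sketch (my simplification; Hibernate's StatefulPersistenceContext does far more, but the `==` guarantee and the single SELECT come from exactly this shape):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// A first-level cache reduced to its PoEAA skeleton: a map keyed by id.
// Repeated finds within one "persistence context" hit the map, return the
// same instance (so == holds), and the loader (the SELECT) runs only once.
public class IdentityMapDemo {
    static class PersistenceContext<K, V> {
        private final Map<K, V> firstLevel = new HashMap<>();
        V find(K id, Function<K, V> loader) {
            return firstLevel.computeIfAbsent(id, loader); // cache hit: no SQL
        }
    }

    public static void main(String[] args) {
        AtomicInteger selects = new AtomicInteger();
        PersistenceContext<Long, Object> ctx = new PersistenceContext<>();
        Object a = ctx.find(1L, id -> { selects.incrementAndGet(); return new Object(); });
        Object b = ctx.find(1L, id -> { selects.incrementAndGet(); return new Object(); });
        System.out.println(a == b);        // true: same instance both times
        System.out.println(selects.get()); // 1: the second find never loaded
    }
}
```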
-
[JPA + Spring Mastery 08] Transaction Split Patterns — Saga / Outbox / REQUIRES_NEW, from academic origins to a 9-scenario EXP-09b measurement
The maxim *don't call external APIs inside a transaction* is well known; the *how* is rarely treated honestly. This article goes from PROPAGATION's seven semantics to 2PC (XA)'s limits, to Garcia-Molina's 1987 Sagas paper, Pat Helland's CIDR 2005 Data on the Outside, and Vogels's ACM Queue 2008 Eventually Consistent — then through Toss SLASH24's SAGA and 29CM and Ridi's Outbox in production — and lands on the EXP-09b 9-scenario measurement matrix (patterns A/B/C × OFF/DB_FAIL/EXT_FAIL). Payments go to Saga, notifications to Outbox, cache-only work to a plain split — academic + production + measurement, in three layers.
-
[JPA + Spring Mastery 07] Spring AOP self-invocation — the real reason @Transactional doesn't work, decomposed down to TransactionInterceptor.invoke's 6 stages
In an optimistic-lock measurement, successes=100 but balance stays at 100. The code logic was fine — the cause was a same-class call bypassing Spring AOP's proxy, so @Transactional never fired and flush never happened. This article decomposes the 6 stages of TransactionInterceptor.invoke, the line in MethodInvocation.proceed() that calls the raw target, the 6 annotations sharing the same trap (@Async / @Cacheable / @Validated / @Retryable / @PreAuthorize), and 4 workarounds (separate bean / getBean(self) / AopContext.currentProxy / AspectJ weaving), citing Spring 6 / Hibernate 6 source line-by-line.
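The bypass itself can be shown with nothing but a JDK dynamic proxy, a minimal stand-in (my analogy, not Spring code) for the proxy Spring wraps around a @Transactional bean:

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// An interceptor counts "advised" calls, like TransactionInterceptor would.
// outer() calling inner() goes through `this`, never through the proxy,
// so the advice fires once even though two interface methods executed.
public class SelfInvocationDemo {
    interface Service { void outer(); void inner(); }

    static class ServiceImpl implements Service {
        public void outer() { inner(); } // self-invocation: bypasses the proxy
        public void inner() { }
    }

    static int countAdvisedCalls() {
        AtomicInteger advised = new AtomicInteger();
        Service target = new ServiceImpl();
        Service proxy = (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[]{Service.class},
                (p, m, a) -> {
                    advised.incrementAndGet();  // stand-in for transaction advice
                    return m.invoke(target, a); // delegate to the raw target
                });
        proxy.outer(); // outer() is advised; its internal inner() call is not
        return advised.get();
    }

    public static void main(String[] args) {
        System.out.println(countAdvisedCalls()); // 1, not 2
    }
}
```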
-
MySQL Credit Deduction — 4 Locks Compared, Pessimistic at 180ms / 100% accurate, plus the self-invocation trap I hit during measurement
An ordinary scenario — 100 workers concurrently subtracting 1 from an account with balance 100. Four lock strategies (optimistic / pessimistic / MySQL GET_LOCK / Redisson) produce four different results — pessimistic 180ms / 100% / balance 0, optimistic 549ms (retry storm under contention), GET_LOCK 5015ms (advisory lock cost), Redisson 53/100 (single-instance limitation). And during measurement I hit the self-invocation trap — successes=100 but the balance never moved. The real Spring/JPA pitfall is not logic; it is AOP proxy bypass. A walkthrough that includes hands-on demos of GET_LOCK's connection-bound traps across 4 scenarios.
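Why the pessimistic path alone lands exactly on 0 is easiest to see in a JVM-local analogue, with a ReentrantLock standing in for SELECT ... FOR UPDATE (my analogy; no database involved):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// 100 workers each subtract 1 from a balance of 100 while holding an
// exclusive lock. Serializing the read-modify-write is what guarantees 0;
// without the lock, interleaved reads would lose updates.
public class PessimisticDemo {
    static int balance;
    static final ReentrantLock rowLock = new ReentrantLock();

    static int run() throws InterruptedException {
        balance = 100;
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                rowLock.lock();               // acquire the "row lock"
                try { balance -= 1; }         // read-modify-write, now atomic
                finally { rowLock.unlock(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return balance;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // 0: no lost updates
    }
}
```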
-
RDB Mastery #3 — Mastering EXPLAIN ANALYZE: Push-Down Traps and the Real Mechanics of Index Selection
Once you can read a single line of an EXPLAIN ANALYZE operator tree, you can directly verify the optimizer's decisions. `Filter:` vs `Index range scan over` — a one-line difference in the operator tree that splits push-down success from failure. The ANSI SQL standard row constructor (a,b)<(?,?) doesn't match the MySQL optimizer's whitelist patterns and fails to push down — Bug #16247, filed in 2006, is a long-standing known limitation (currently marked duplicate in the tracker). Index selection is likewise a cost-based judgment by the optimizer — the Q2 paradox, where with a small LIMIT 5 the optimizer picks the wrong index, so adding an index makes the query slower. The optimizer is not right 100 percent of the time. We unpack the push-down mechanism and the internals of cost-based index selection by reading 5 EXPLAIN ANALYZE outputs line by line, measured on 10 million rows.
-
RDB Mastery #2 — MySQL Index Types: B-tree / Hash / Covering / Composite / Multi-valued / Functional, and When to Pick What
Not every index in InnoDB is a B-tree. Hash (Memory engine), Spatial (R-tree), Full-text (inverted index), Multi-valued (8.0+, JSON arrays), Functional (8.0.13+, expressions). And even within the B-tree family, clustered vs secondary, covering or not, the leftmost-prefix rule for composites, and cardinality / selectivity become the decision axes. I built five real indexes on a 10M-row table and decided when to pick what by measuring cardinality plus Q1~Q5 latency. Q3 covering 2,476x / Q5 composite 577x / the Q2 paradox where adding an index made a query slower (0.66ms → 13.5ms). Indexes are not free — write cost 5~6x plus 1.3GB of storage. Worked through to the end with 9 diagrams.
-
RDB Mastery #1 — InnoDB Index Internals: From No-Index to Multi-Index, the Real Picture B-trees Draw
Even when you don't define an index, InnoDB already stores rows inside a B-tree. PK = clustered index = the table itself. Secondary index = a separate B-tree that points to PK. Covering index = an index where the answer lives in the leaf, no PK lookup needed. Reverse scan = walking the leaf doubly-linked list backward. OFFSET cannot skip because B-trees do not maintain row counters. A cursor is fast because its WHERE clause triggers the B-tree's binary-search primitive. Multi-index means N B-trees on the same table. In a 10M-row environment, [measured] Q3 covering 2,476x / Q5 composite 577x / OFFSET 1M = 171ms / cursor = 0.30ms — worked through to the end with 10 diagrams.
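The seek-vs-skip distinction can be felt with a TreeMap standing in for a B-tree leaf chain (my analogy; a red-black tree, not a real B-tree, but the same point: ordered structures seek by key, they do not count rows):

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// tailMap(key, false) is the O(log n) seek a cursor's WHERE clause triggers.
// An OFFSET, by contrast, must walk and discard entries one by one, because
// the tree keeps no positional counters to skip with.
public class SeekVsOffset {
    static List<Integer> cursorPage(NavigableMap<Integer, String> index,
                                    int lastSeenId, int size) {
        return index.tailMap(lastSeenId, false).keySet()
                    .stream().limit(size).toList();
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int id = 1; id <= 1_000_000; id++) index.put(id, "row-" + id);
        System.out.println(cursorPage(index, 999_000, 3)); // [999001, 999002, 999003]
    }
}
```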
-
Decoding HikariCP Pool Exhaustion via JVM Thread Dump — What TIMED_WAITING (parked) Really Means
When the pool exhaustion alert fires, staring at application code yields nothing. The thread dump from jstack is the real evidence — every worker thread is frozen in HikariCP at TIMED_WAITING (parked). I walk through the JVM Thread State machine, LockSupport.parkNanos, the ConcurrentBag and SynchronousQueue mechanics, and how the transaction-with-external-call pool-exhaustion measurement [measured] (timeout 5s = 100% pass / 1s = 16.7%) maps line-by-line to the dump — diagnosing pool exhaustion from a single dump in production.
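What "TIMED_WAITING (parking)" in a jstack dump literally is can be reproduced in a few lines: a thread sitting in LockSupport.parkNanos, which is where HikariCP waiters block while waiting for a connection with a timeout. A minimal sketch, no pool involved:

```java
import java.util.concurrent.locks.LockSupport;

// A thread parked with a deadline shows up as TIMED_WAITING, exactly the
// state every frozen worker reports in the dump described above.
public class ParkedState {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() ->
                LockSupport.parkNanos(10_000_000_000L)); // park for up to 10s
        waiter.start();
        Thread.sleep(500);                 // give it time to actually park
        System.out.println(waiter.getState()); // expected: TIMED_WAITING
        LockSupport.unpark(waiter);        // release it promptly
        waiter.join();
    }
}
```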
-
MySQL No-Offset Cursor Pagination — At 10M rows, OFFSET 1M takes 171ms / Cursor 0.30ms, and the 500x trap between them, traced down to a single line
On a 10M-row table, OFFSET 1M takes 171ms while a No-Offset cursor takes 0.30ms — about 570x faster, reproduced by direct measurement. But how you write the No-Offset code splits another 500x. The ANSI SQL row constructor `(a,b)<(?,?)` is logically equivalent to the OR-split form, yet the MySQL optimizer cannot push it down to an index range (154ms — about the same as OFFSET). The single line in EXPLAIN ANALYZE — `Filter:` vs `Covering index range scan over` — is the root cause. A production retrospective combined with a reproducible learning environment.
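The equivalence the optimizer fails to exploit is easy to state and to check exhaustively. A sketch of both predicates as plain boolean functions:

```java
// (a,b) < (x,y) two ways: lexicographic row comparison as ANSI defines it,
// and the OR-split rewrite that MySQL can push down to an index range.
// Logically identical; only the second matches the optimizer's patterns.
public class RowConstructor {
    static boolean rowLess(int a, int b, int x, int y) {
        return a != x ? a < x : b < y;            // (a, b) < (x, y)
    }

    static boolean orSplit(int a, int b, int x, int y) {
        return a < x || (a == x && b < y);        // a < x OR (a = x AND b < y)
    }
}
```

Exhaustively comparing both forms over a small grid confirms they agree everywhere, which is exactly why the 500x gap is an optimizer limitation and not a semantic difference.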
-
MySQL InnoDB Isolation Levels — Measuring phantom reads across all 4 levels and decomposing why InnoDB RR is stronger than the ANSI standard
The ANSI SQL standard does not guarantee that REPEATABLE READ blocks phantom reads. Yet MySQL InnoDB's RR does. I nailed down this commonly-cited claim with direct measurements — RU/RC: phantom occurs (A1=0 → INSERT → A2=1), RR: blocked (A2=0), SERIALIZABLE: the INSERT itself waits 1.56s. Then I decomposed why InnoDB RR is stronger than the ANSI standard via three mechanisms — consistent read snapshot, gap lock, and the MVCC undo log — and confirmed by measurement that for payment domains, RR alone is sufficient.
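The consistent-read half of the story fits in a toy model: RR reads from the snapshot taken at its first SELECT, so a concurrently committed INSERT never appears. A sketch (my simplification; real MVCC reconstructs versions from the undo log rather than copying the table):

```java
import java.util.HashMap;
import java.util.Map;

// A REPEATABLE READ transaction keeps re-reading its own snapshot, while a
// READ COMMITTED re-read sees the concurrently inserted row as a phantom.
public class SnapshotRead {
    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        table.put(1, "row-1");

        Map<Integer, String> rrSnapshot = new HashMap<>(table); // first read: A1 = 1 row
        table.put(2, "row-2");                                  // concurrent INSERT commits

        System.out.println(table.size());      // 2: RC re-read sees the phantom
        System.out.println(rrSnapshot.size()); // 1: RR still sees its snapshot
    }
}
```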
-
External API Calls Inside Transactions — Reproducing Pool Exhaustion and Comparing Simple Split, Saga, and Outbox by Measurement
I reproduced HikariCP pool exhaustion caused by external API calls inside transactions in a Spring + raw JDBC environment, then compared three remedies — Simple Split, Saga, and Outbox — across 60 workers × 9 chaos scenarios. I caught the moment Simple Split breaks consistency as 60 mismatched records, watched Saga's three-tier safety net trigger in sequence, and saw how Outbox's 72ms ACK and 93-second average completion split the same dataset into opposite conclusions depending on which metric you read.
-
Proper Connection Pool Configuration in TypeORM & NestJS
A deep dive into connection pool configuration in TypeORM and mysql2, inspired by Naver D2's Commons DBCP guide. Learn how to calculate required connections using TPS formulas, compare Before/After production code, and understand each configuration option in depth.
-
When equals/hashCode Goes Wrong: A Duplicate Payment Incident Post-Mortem
A deep dive into how forgetting to override hashCode() while implementing equals() caused duplicate payments. Includes Kafka TopicPartition analysis, HashMap internals, and code review checklists.
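The failure mode reduces to a few lines: equals() overridden, hashCode() not, so two "equal" keys land in different HashMap buckets and deduplication silently misses. A sketch with a hypothetical key type (not the incident code, and not Kafka's real TopicPartition):

```java
import java.util.HashSet;
import java.util.Set;

// equals() says the two keys are the same; the inherited identity hashCode()
// says they are not, so HashSet almost never finds the "duplicate" and the
// same payment gets processed twice.
public class BrokenKey {
    static final class TopicPartitionLike {
        final String topic; final int partition;
        TopicPartitionLike(String t, int p) { topic = t; partition = p; }

        @Override public boolean equals(Object o) {
            return o instanceof TopicPartitionLike tp
                    && tp.topic.equals(topic) && tp.partition == partition;
        }
        // hashCode() deliberately NOT overridden: inherits identity hash
    }

    public static void main(String[] args) {
        Set<TopicPartitionLike> seen = new HashSet<>();
        seen.add(new TopicPartitionLike("payments", 0));
        // Logically a duplicate, but the lookup hashes to a different bucket:
        System.out.println(seen.contains(new TopicPartitionLike("payments", 0)));
    }
}
```

Adding `@Override public int hashCode() { return java.util.Objects.hash(topic, partition); }` restores the equals/hashCode contract and makes the lookup succeed.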
-
Debugging a Memory Leak in Browser Automation: The Perfect Storm of Three Cleanup Paths
A deep dive into debugging a memory leak in a production system managing 50 concurrent Firefox browsers. The story of how Promise.race and finally blocks created a double-cleanup bug, and the journey to fix it.
-
Multi-Platform Database Design: Building Enterprise-Grade Logging Systems
From specialized table design for new platform integration to AI-driven design validation, index optimization, and partitioning strategies: a complete guide to enterprise-grade database design.
-
Dissecting Kotlin's toSet(): Engineering is About Explaining Choices
A deep dive into Kotlin's toSet() method from JVM memory model to production environments. Analyzing standard library design decisions, memory overhead, GC impact, and practical guidelines for high-traffic systems.