
Why saveAll() Becomes 10K INSERTs — IDENTITY and Hibernate's Structural Batch Disablement



Why this article

The straightforward way to insert many rows in JPA:

@Transactional
public void bulkSave(List<Entity> rows) {
    repo.saveAll(rows);
}

With hibernate.jdbc.batch_size=50 set, inserting 10,000 rows should cost 10,000 / 50 = 200 SQL calls.

Reality: 10,000 SQL calls. Why?

The common answer (“IDENTITY”) is correct but rarely explained. And the most-cited fix — DZone’s Boost JPA Bulk Insert by 100x — assumes PostgreSQL. On MySQL there is no native SEQUENCE, so the playbook differs.


1. The phenomenon

@Entity class BulkTargetIdentity {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    ...
}

@Transactional
void run() {
    List<BulkTargetIdentity> rows = new ArrayList<>(10_000);
    for (int i = 0; i < 10_000; i++) rows.add(new BulkTargetIdentity(...));
    repo.saveAll(rows);
}

hibernate.jdbc.batch_size=50 and hibernate.order_inserts=true are set in application.yml, yet Hibernate's Statistics still reports prepareStatementCount = 10000.
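For reference, a minimal sketch of the settings involved (standard Spring Boot property keys; generate_statistics is what exposes prepareStatementCount):

# application.yml (sketch)
spring:
  jpa:
    properties:
      hibernate:
        jdbc.batch_size: 50
        order_inserts: true
        generate_statistics: true   # enables the Statistics counters quoted above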


2. The structural reason — Statement.RETURN_GENERATED_KEYS vs batch

For each INSERT, Hibernate must immediately retrieve the generated ID to register the entity in the persistence context. JDBC’s mechanism:

PreparedStatement ps = conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS); // ask for auto-generated keys
ps.executeUpdate();                      // one row, one round trip
ResultSet keys = ps.getGeneratedKeys();  // the ID Hibernate needs immediately

Per the JDBC spec (see PreparedStatement.addBatch()), key retrieval after a batch execution is implementation-defined: not all databases or drivers support batch updates with generated keys. MySQL Connector/J partially does, but Hibernate's post-insert generators (IdentityGenerator implements PostInsertIdentifierGenerator) cannot rely on a guaranteed per-row key mapping across drivers, so Hibernate disables batching to keep semantics consistent.
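To see the problem concretely, here is what a batched IDENTITY insert would have to look like in raw JDBC (a sketch; conn is an open java.sql.Connection and the table is the one from section 1):

// Sketch: what IDENTITY + batching would demand from the driver.
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO bulk_target_identity (payload) VALUES (?)",
        Statement.RETURN_GENERATED_KEYS);
for (int i = 0; i < 50; i++) {
    ps.setString(1, "row-" + i);
    ps.addBatch();
}
ps.executeBatch();
// Implementation-defined: a driver may return all 50 keys in order, some of
// them, or none at all. Hibernate cannot build the persistence context on
// "maybe", so it issues one executeUpdate() per row instead.
ResultSet keys = ps.getGeneratedKeys();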

Issue HHH-12107 confirms this is intentional.

Consequence: with IDENTITY, hibernate.jdbc.batch_size has no effect on INSERTs (UPDATEs and DELETEs of the same entities can still batch).


3. S2 — flush()/clear() every 50 rows saves memory, not SQL count

for (int i = 0; i < 10_000; i++) {
    em.persist(new BulkTargetIdentity(...));
    if ((i + 1) % 50 == 0) {   // every 50 entities
        em.flush();            // push pending INSERTs to JDBC
        em.clear();            // detach them, keeping the context bounded
    }
}

The persistence context stays bounded (see P2 on dirty checking), but IDENTITY still cannot batch, so the SQL count remains 10,000.


4. S3 — TABLE strategy simulation: pre-allocated IDs

CREATE TABLE hibernate_sequence_emul (
    sequence_name VARCHAR(50) PRIMARY KEY,
    next_val      BIGINT NOT NULL
);

@Transactional
public long allocateBlock(int count) {
    // Lock the sequence row so concurrent allocators serialize here
    Long start = jdbc.queryForObject(
        "SELECT next_val FROM hibernate_sequence_emul WHERE sequence_name = ? FOR UPDATE",
        Long.class, "bulk_target_seq");
    // Reserve [start, start + count) and publish the new floor
    jdbc.update("UPDATE hibernate_sequence_emul SET next_val = ? WHERE sequence_name = ?",
        start + count, "bulk_target_seq");
    return start;
}

@Entity class BulkTargetTableSeq {
    @Id Long id;   // application-set
    ...
}

long start = allocateBlock(10_000);   // reserve IDs [start, start + 10_000)
List<BulkTargetTableSeq> rows = new ArrayList<>();
for (int i = 0; i < 10_000; i++) rows.add(new BulkTargetTableSeq(start + i, ...));
repo.saveAll(rows);

The ID is set before the INSERT, so generated-key retrieval never comes into play and batching works: with batch_size=50, roughly 200 SQL statements. One Spring Data caveat applies, shown below.
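SimpleJpaRepository decides between persist() and merge() by checking whether the ID is null, so an entity with a pre-assigned ID looks "already persistent" and saveAll() would trigger a merge (and a SELECT) per row. The usual fix is implementing Persistable; a sketch for the S3 entity:

import org.springframework.data.domain.Persistable;

// Sketch: declare the entity "new" despite its pre-assigned ID, so Spring Data
// calls persist() instead of merge() and skips the per-row SELECT.
@Entity
class BulkTargetTableSeq implements Persistable<Long> {
    @Id Long id;   // application-set, as above

    @Transient                 // not a mapped column
    private boolean isNew = true;

    @Override public Long getId() { return id; }
    @Override public boolean isNew() { return isNew; }

    @PostPersist @PostLoad
    void markNotNew() { isNew = false; }   // saved or loaded entities are not new
}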

TABLE strategy trap — sequence-table row lock

SELECT ... FOR UPDATE + UPDATE serialises ID allocation. With many concurrent bulk inserts, the sequence row becomes the bottleneck. Mitigation: pooled-lo optimizer — allocate a block at a time:

@Id @GeneratedValue(strategy = TABLE, generator = "bulk_seq")
@TableGenerator(name = "bulk_seq", table = "hibernate_sequence_emul",
        pkColumnValue = "bulk_target_seq", allocationSize = 1000)

allocationSize=1000 → one sequence-row touch per 1,000 IDs.
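In terms of table traffic, pooled-lo reduces each block to a single locked read-modify-write; roughly (a sketch of the access pattern, not Hibernate's exact SQL):

-- One round trip reserves IDs 1..1000; allocation then proceeds in memory.
SELECT next_val FROM hibernate_sequence_emul
 WHERE sequence_name = 'bulk_target_seq' FOR UPDATE;
UPDATE hibernate_sequence_emul SET next_val = next_val + 1000
 WHERE sequence_name = 'bulk_target_seq';
-- The table is not touched again until ID 1000 has been handed out.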


5. S4 — raw JDBC batchUpdate + rewriteBatchedStatements=true

@Transactional
public void s4() {
    List<Object[]> args = new ArrayList<>(10_000);
    for (int i = 0; i < 10_000; i++) args.add(new Object[]{1L, "raw-" + i});
    // JdbcTemplate sends these as one JDBC batch; the rewriting is the driver's job
    jdbc.batchUpdate(
        "INSERT INTO bulk_target_identity (owner_id, payload, created_at) VALUES (?, ?, NOW(6))",
        args);
}

The JDBC URL must include rewriteBatchedStatements=true:

jdbc:mysql://...?...&rewriteBatchedStatements=true

The MySQL driver rewrites N batched INSERTs into one multi-value INSERT:

INSERT INTO bulk_target_identity (owner_id, payload, created_at) VALUES
    (?, ?, ...), (?, ?, ...), (?, ?, ...), ..., (?, ?, ...);

10K rows at a 1K batch size → ~10 SQL statements: fewer round trips, less network overhead. This is the fastest path for one-off bulk loads; the chunked variant below makes the 1,000-row batch size explicit.
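JdbcTemplate has an overload that splits the work into fixed-size batches. A sketch reusing the table and jdbc bean from above:

// Sketch: explicit 1,000-row chunks. With rewriteBatchedStatements=true each
// chunk becomes one multi-value INSERT, so 10,000 rows -> ~10 statements.
List<String> payloads = new ArrayList<>(10_000);
for (int i = 0; i < 10_000; i++) payloads.add("raw-" + i);

jdbc.batchUpdate(
        "INSERT INTO bulk_target_identity (owner_id, payload, created_at) VALUES (?, ?, NOW(6))",
        payloads, 1_000,
        (ps, payload) -> {
            ps.setLong(1, 1L);          // owner_id, as in the S4 snippet
            ps.setString(2, payload);
        });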


6. MySQL has no native SEQUENCE

DZone’s “IDENTITY → SEQUENCE for 100x” applies to PostgreSQL, where nextval() is atomic and Hibernate can cache allocation blocks. MySQL has no SEQUENCE object, so GenerationType.SEQUENCE falls back to a table-based generator: exactly the row-locking sequence simulation above.

Practical MySQL options:

| Option | Pros | Cons |
| --- | --- | --- |
| UUID (@GeneratedValue(strategy = UUID)) | distributed-safe, no lock | random IDs cause heavy InnoDB B+tree page splits |
| @TableGenerator + pooled-lo | batching + low lock overhead | needs sequence-table maintenance |
| Snowflake / TSID | time-ordered, distributed, no lock | extra library |
| Raw JDBC + rewriteBatchedStatements | fastest | bypasses the persistence context |
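For the Snowflake / TSID row, the shape is the same as S3: the application assigns a time-ordered ID before persist, so batching works and the page-split problem of random UUIDs is avoided. A minimal sketch, where TimeOrderedIds.next() is a hypothetical stand-in for whichever library you adopt:

// Sketch: application-assigned, time-ordered 64-bit ID. TimeOrderedIds is a
// hypothetical placeholder for a TSID/Snowflake generator; because the ID
// exists before the INSERT, Hibernate batches exactly as in the S3 case.
@Entity
class BulkTargetTsid {
    @Id Long id;
    String payload;

    protected BulkTargetTsid() {}          // required by JPA

    BulkTargetTsid(String payload) {
        this.id = TimeOrderedIds.next();   // hypothetical generator call
        this.payload = payload;
    }
}

The same Persistable caveat from S3 applies when saving through saveAll().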

Project rule of thumb: see the decision tree in section 8.


7. order_inserts / order_updates revisited

hibernate.order_inserts=true reorders queued inserts at flush time to group rows of the same entity type together, increasing the batch fill rate. It only helps when batching is enabled at all, i.e. not with IDENTITY. With SEQUENCE / TABLE / application-set IDs it belongs to the standard tuning set, as the sketch below illustrates.
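A quick illustration with two hypothetical entities, Author and Book, persisted in an interleaved loop:

// With order_inserts=false the flush emits INSERT author / INSERT book / ...
// alternately, so the JDBC batch breaks on every statement change (fill rate ~1).
// With order_inserts=true Hibernate reorders the flush queue by entity type:
// one batch of 50 Author INSERTs, then one batch of 50 Book INSERTs.
for (int i = 0; i < 50; i++) {
    em.persist(new Author("author-" + i));
    em.persist(new Book("book-" + i));
}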


8. Decision tree

| Situation | Recommendation |
| --- | --- |
| 1–100 inserts in normal transactions | IDENTITY + saveAll (simple; the minor batching loss is acceptable) |
| 10K+ inserts, one-off (batch job) | raw JDBC batchUpdate + rewriteBatchedStatements |
| distributed environment, time-ordered IDs | TSID / Snowflake (library) |
| single instance, numeric PK only | @TableGenerator + pooled-lo |
| Spring Batch ItemWriter | JdbcBatchItemWriter (batches each chunk natively; pair with rewriteBatchedStatements) |

9. Conclusion

hibernate.jdbc.batch_size alone is not the answer. IDENTITY’s PostInsertIdentifierGenerator disables batching by design. Combine that with MySQL’s lack of native SEQUENCE, and the popular DZone playbook stops applying. Three orders of magnitude separate IDENTITY (10K SQL) from raw JDBC + rewrite (~10 SQL) for the same 10K rows. The choice of ID strategy is the choice of bulk-insert performance.



