Database Optimization trong Java: Hướng dẫn về Spring Boot, Quarkus, Hibernate, JPA và SQL Performance

Database là nơi hiệu năng chết. Không phải ở application server, không phải ở network, mà ở database. Trong hầu hết hệ thống production, 80% latency đến từ database query. Một index thiếu có thể biến query 2ms thành sequential scan 8 giây. Một Hibernate setting sai có thể biến 1 query thành 1000 query mà không có log nào cảnh báo. Connection pool cấu hình sai có thể làm hệ thống xử lý 10,000 rps bình thường đột ngột timeout hoàn toàn khi tải tăng lên 15,000 rps.

Bài này không dạy SQL từ đầu. Nó giải thích tại sao các cơ chế tồn tại, điều gì xảy ra khi thiếu chúng, và cách những engineer senior xử lý chúng trong production thực tế.

Phần 1: Database Performance Fundamentals

Latency vs Throughput — Hai mục tiêu thường xung đột

Latency là thời gian một request mất từ lúc gửi đến lúc nhận response. Throughput là số request hệ thống xử lý được mỗi giây.

Tại sao chúng thường xung đột? Hãy xét JDBC batching. Thay vì insert từng row (latency thấp cho mỗi operation riêng lẻ), batch 100 row vào 1 round-trip tăng throughput lên đáng kể nhưng row đầu tiên phải đợi 99 row còn lại — latency từng row tăng. Đây không phải bug, là trade-off thiết kế có chủ ý.

Ở hệ thống thanh toán, latency quan trọng hơn: user không thể đợi 500ms để thấy giao dịch confirm. Ở hệ thống report đêm, throughput quan trọng hơn: xử lý 50 triệu row nhanh nhất có thể dù latency từng row không quan trọng.

Khi nào câu hỏi đúng là “latency hay throughput”? Mỗi lần thiết kế tối ưu mới.

Network Round Trips — Chi phí ẩn nhất

Một round-trip PostgreSQL từ application server cùng data center tốn khoảng 0.5–2ms. Nghe nhỏ, nhưng:

100 round-trips × 1ms = 100ms thêm vào mỗi request

Đây là lý do N+1 query problem nguy hiểm. Nếu API endpoint tải 50 order và mỗi order trigger 3 query riêng (status, user, items), một request tạo ra 151 round-trips = 150ms overhead thuần từ network, chưa tính thời gian database thực thi.

Ứng dụng              PostgreSQL
    │                      │
    │──── SELECT orders ───>│    1 round-trip
    │<──── 50 rows ─────────│
    │──── SELECT status ───>│
    │<─────────────────────│    50 round-trips
    │──── SELECT user ─────>│
    │<─────────────────────│    50 round-trips
    │──── SELECT items ────>│
    │<─────────────────────│    50 round-trips
    │                      │
    Total: 151 round-trips

CPU vs Memory vs Disk I/O — Database bottleneck đến từ đâu?

Database engine hiện đại ưu tiên giữ data trong memory (buffer pool). Khi query cần data:

Buffer pool hit: data đã ở memory → microseconds
Buffer pool miss: phải đọc từ disk → milliseconds (SSD) hoặc tens of ms (HDD)

PostgreSQL có shared_buffers (thường 25–40% RAM). MySQL có InnoDB buffer pool. Khi working set (data được query thường xuyên) nhỏ hơn buffer pool → hầu hết request là memory hit, performance tốt. Khi working set lớn hơn buffer pool → thường xuyên đọc disk → performance tệ.

Chẩn đoán nhanh trên PostgreSQL:

-- Tỷ lệ cache hit của buffer pool
SELECT
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- Mục tiêu: > 0.99 cho OLTP workload

CPU bottleneck thường do: query phức tạp với nhiều aggregation, thiếu index (sequential scan), hoặc sort lớn không dùng index. Dấu hiệu: CPU database cao, query slow log đầy Sort và Hash Join operations.

Query Execution Cost — Database “suy nghĩ” gì?

Query optimizer của PostgreSQL không thực thi query theo cách bạn viết. Nó tính cost của nhiều execution plan khác nhau và chọn plan rẻ nhất. Cost dựa trên:

seq_page_cost (default 1.0): chi phí đọc page từ disk sequentially
random_page_cost (default 4.0, SSD nên set 1.1–1.5): chi phí đọc page ngẫu nhiên
cpu_tuple_cost: chi phí xử lý mỗi row
Row count estimate từ statistics (pg_statistic)

Vấn đề: Nếu statistics cũ hoặc sai, optimizer chọn plan tệ. ANALYZE cập nhật statistics. autovacuum chạy tự động nhưng không phải lúc nào cũng kịp sau bulk insert lớn.

-- Chạy thủ công sau bulk operation lớn
ANALYZE orders;

-- Xem statistics hiện tại
SELECT tablename, n_live_tup, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE tablename = 'orders';

Phần 2: Indexing Deep Dive

B-Tree Index — Tại sao nó là default?

B-Tree (Balanced Tree) là cấu trúc dữ liệu tối ưu cho range query và equality query. Hiểu internals giúp bạn quyết định index có cần không.

B-Tree cho column "price":
                    [50]
                 /        \
           [20, 35]       [65, 80]
          /   |   \       /   |   \
       [10] [25] [40] [55] [70] [90,95]
        ↓    ↓    ↓    ↓    ↓    ↓
      Row  Row  Row  Row  Row  Row
      ptrs ptrs ptrs ptrs ptrs ptrs

Leaf nodes chứa actual row pointers. Internal nodes chứa separator keys để navigate. Với 1 triệu row, B-Tree có chiều cao ~4–5 level → tối đa 5 page reads để tìm bất kỳ row nào. So với sequential scan: có thể phải đọc hàng nghìn pages.

Khi nào B-Tree KHÔNG hiệu quả?

Full-text search → dùng GIN index với tsvector
Geospatial query → dùng GiST index
Equality-only trên low-cardinality column → B-Tree vẫn dùng được nhưng Hash index nhỏ hơn
Array containment → GIN index

Clustered vs Non-Clustered — Sự khác biệt quan trọng nhất

PostgreSQL: Mọi table đều là heap (non-clustered). Index là cấu trúc riêng biệt chứa pointers về heap. Có thể cluster thủ công bằng CLUSTER nhưng không tự động duy trì.

MySQL InnoDB: Primary key là clustered index — data rows sắp xếp vật lý theo primary key. Secondary indexes chứa primary key value, không phải row pointer trực tiếp.

Hệ quả trong production:

-- PostgreSQL: query này luôn cần 2 lookups nếu không là covering index
-- 1. Index lookup để tìm row pointer
-- 2. Heap fetch để đọc actual data
SELECT * FROM orders WHERE customer_id = 123;

-- MySQL InnoDB: query này với secondary index trên customer_id
-- cần lookup secondary index, rồi primary key lookup (clustered)
-- = 2 B-Tree traversals
SELECT * FROM orders WHERE customer_id = 123;

Practical impact: Trong MySQL InnoDB, primary key nên nhỏ (INT, BIGINT) vì nó được nhúng vào mọi secondary index. UUID primary key làm secondary index lớn hơn đáng kể.

Composite Index — Left-Most Prefix Rule

Composite index (a, b, c) có thể phục vụ:

Query trên a ✓
Query trên a, b ✓
Query trên a, b, c ✓
Query trên b ✗ (không dùng index được)
Query trên b, c ✗
Query trên a, c ✓ cho a nhưng không filter được bằng c

Tại sao? B-Tree sort data theo key thứ nhất, trong same key thứ nhất thì sort theo key thứ hai, v.v. Để dùng index, phải có constraint trên tất cả các columns bên trái.

Production scenario — chọn thứ tự column trong composite index:

-- Query thường gặp trong order management system:
SELECT * FROM orders
WHERE status = 'PENDING'
  AND created_at > NOW() - INTERVAL '24 hours'
  AND merchant_id = 456;

-- Index option 1: (status, merchant_id, created_at)
-- Index option 2: (merchant_id, status, created_at)
-- Index option 3: (created_at, status, merchant_id)

Quy tắc: đặt equality columns trước, range columns sau. status = 'PENDING' và merchant_id = 456 là equality, created_at > là range. Vì merchant_id có cardinality cao hơn status (nhiều merchant hơn là nhiều distinct status), đặt merchant_id trước filter hiệu quả hơn:

CREATE INDEX idx_orders_merchant_status_created
ON orders (merchant_id, status, created_at DESC);

Covering Index — Eliminating Heap Fetch

Covering index chứa tất cả columns mà query cần, không cần đọc về heap.

-- Query:
SELECT order_id, status, total_amount
FROM orders
WHERE merchant_id = 456 AND status = 'PENDING';

-- Index (merchant_id, status) — không covering, phải heap fetch để lấy total_amount
-- Index (merchant_id, status, total_amount, order_id) — covering index
-- PostgreSQL EXPLAIN: "Index Only Scan" thay vì "Index Scan"

EXPLAIN ANALYZE
SELECT order_id, status, total_amount
FROM orders
WHERE merchant_id = 456 AND status = 'PENDING';

-- Với covering index:
-- Index Only Scan using idx_orders_covering on orders
--   Index Cond: ((merchant_id = 456) AND (status = 'PENDING'))
--   Heap Fetches: 0    ← không cần đọc heap
--   Rows Removed by Filter: 0

Trade-off: Covering index lớn hơn trong memory và disk. Mỗi INSERT/UPDATE phải update thêm column trong index. Chỉ tạo covering index cho query thực sự hot (top 10 slow queries, tần suất cao).

Partial Index — Index chỉ những row quan trọng

-- Hệ thống payment: 95% transaction là COMPLETED, 5% là PENDING/PROCESSING
-- Chỉ operations cần xử lý PENDING/PROCESSING mới cần index

CREATE INDEX idx_transactions_pending
ON transactions (created_at, merchant_id)
WHERE status IN ('PENDING', 'PROCESSING');

-- Index này chỉ chứa ~5% rows, nhỏ hơn 20× so với full index
-- Query với WHERE status = 'PENDING' AND merchant_id = 123 sẽ dùng index này

Khi nào partial index không phù hợp? Khi distribution data thay đổi theo thời gian. Nếu hôm nay 5% là PENDING nhưng sau migration sẽ là 50%, partial index mất ưu thế.

Index Selectivity và Cardinality

Selectivity = (số distinct values) / (tổng số rows). Selectivity cao → index hiệu quả hơn.

-- Kiểm tra cardinality
SELECT
    attname AS column_name,
    n_distinct,
    correlation  -- 1.0 = perfectly sorted, -1.0 = reverse sorted, 0 = random
FROM pg_stats
WHERE tablename = 'orders' AND attname IN ('status', 'merchant_id', 'created_at');

Column status với 5 distinct values trên 10 triệu rows: n_distinct = 5, selectivity = 0.0000005. PostgreSQL optimizer biết index này chỉ hữu ích khi filter để < ~5% rows. Với query trả về 20% rows (ví dụ status = 'COMPLETED'), sequential scan có thể nhanh hơn index scan vì tránh random I/O.

Common mistake: Index trên boolean column (is_active) — chỉ có 2 values, selectivity thấp. Index không giúp ích nhiều trừ khi kết hợp composite index hoặc dùng partial index.

Quá nhiều Index — Chi phí ẩn

-- Một table "orders" có 15 index trong production thực tế:
-- Mỗi INSERT/UPDATE/DELETE phải update tất cả 15 B-Trees

-- Đo write throughput impact:
-- 5 indexes: ~12,000 inserts/sec
-- 15 indexes: ~7,000 inserts/sec
-- 25 indexes: ~4,000 inserts/sec

Ngoài ra, index maintenance sau bulk operation (VACUUM, REINDEX) tốn thêm thời gian. Audit định kỳ index nào thực sự được dùng:

-- Các index không được dùng trong 30 ngày
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname NOT IN ('pg_catalog', 'pg_toast')
ORDER BY pg_relation_size(indexrelid) DESC;

Production strategy: Khi có performance issue, thêm index. Mỗi quý, review và drop unused indexes. Đừng index preemptively theo bản năng.

Phần 3: SQL Query Optimization

SELECT * — Không chỉ là bad practice

SELECT * không chỉ tốn bandwidth. Nó:

Phá covering index: Database phải về heap dù index đã đủ
Tăng memory pressure: Rows lớn hơn, buffer pool flush nhanh hơn
Break khi schema thay đổi: Thêm column BYTEA 10MB → query đột ngột tải 10MB mỗi row
Ngăn optimizer tối ưu: Optimizer không biết column nào thực sự cần

// Hibernate - JPA thường mặc định SELECT *
@Entity
public class Order {
    @Id private Long id;
    private String status;
    private BigDecimal totalAmount;
    @Lob private byte[] receiptPdf; // 5MB mỗi order
    // 20 fields khác
}

// Code này load 5MB receipt PDF cho mỗi order dù không cần
List<Order> orders = orderRepository.findByMerchantId(merchantId);

// Fix: dùng projection
public interface OrderSummary {
    Long getId();
    String getStatus();
    BigDecimal getTotalAmount();
}

List<OrderSummary> orders = orderRepository.findByMerchantId(merchantId, OrderSummary.class);

N+1 Query — Production Killer số 1

N+1 là pattern phổ biến nhất gây performance issue trong Hibernate:

// Code nhìn vô hại:
List<Order> orders = orderRepository.findAll(); // 1 query
for (Order order : orders) {
    // Mỗi access này trigger 1 query riêng nếu items là LAZY
    System.out.println(order.getItems().size()); // N queries
}
// Tổng: 1 + N queries

Tại sao Hibernate không tự fix? Hibernate không biết bạn sẽ access items cho mọi order. Khi bạn gọi getItems(), collection chưa load, nó issue SELECT.

Cách phát hiện trong production:

// application.properties - bật Hibernate statistics
spring.jpa.properties.hibernate.generate_statistics=true
spring.jpa.properties.hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS=100

// Log output:
// Session Metrics {
//   1000001 nanoseconds spent acquiring 1 JDBC connections;
//   2502341 nanoseconds spent on 201 JDBC statements; ← 201 queries!
//   ...
// }

Fix 1: JOIN FETCH trong JPQL

@Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.merchantId = :merchantId")
List<Order> findWithItems(@Param("merchantId") Long merchantId);
// 1 query với JOIN, Hibernate hydrate cả Order và Items

Fix 2: Entity Graph

@EntityGraph(attributePaths = {"items", "items.product"})
List<Order> findByMerchantId(Long merchantId);
// Hibernate tự tạo JOIN FETCH query

Fix 3: Batch fetch — khi JOIN FETCH không phù hợp

@Entity
public class Order {
    @OneToMany
    @BatchSize(size = 50) // Load 50 order's items trong 1 query
    private List<OrderItem> items;
}

// Với 200 orders: 1 query + ceil(200/50) = 5 queries = 6 tổng
// Thay vì 201 queries

Khi nào KHÔNG dùng JOIN FETCH? Khi collection lớn và bạn chỉ cần aggregate (count, sum) — JOIN FETCH load toàn bộ object vào memory. Dùng @Query với native SQL hoặc JPQL aggregation thay thế.

JOIN Optimization

Database optimizer thường biết cách join tốt hơn bạn nghĩ — nhưng nó cần thông tin đúng.

-- Hash Join: hiệu quả khi cả hai table lớn, không có index dùng được
-- Merge Join: hiệu quả khi cả hai input đã sorted (index scan)
-- Nested Loop: hiệu quả khi outer table nhỏ, inner có index

-- Ví dụ: join orders với merchants
EXPLAIN ANALYZE
SELECT o.id, o.total_amount, m.name
FROM orders o
JOIN merchants m ON o.merchant_id = m.id
WHERE o.status = 'PENDING'
  AND o.created_at > NOW() - INTERVAL '1 day';

-- Nếu thấy "Hash Join" với "Seq Scan" trên orders → thiếu index
-- Nếu thấy "Nested Loop" với orders là outer và merchants là inner → OK

Anti-pattern: JOIN thừa

-- Bad: JOIN merchants chỉ để check existence
SELECT o.*
FROM orders o
JOIN merchants m ON o.merchant_id = m.id
WHERE m.status = 'ACTIVE';

-- Better: Semi-join với EXISTS, không cần fetch merchant data
SELECT o.*
FROM orders o
WHERE EXISTS (
    SELECT 1 FROM merchants m
    WHERE m.id = o.merchant_id AND m.status = 'ACTIVE'
);

EXISTS vs IN — Không phải chỉ là style

-- IN với subquery: PostgreSQL materialize subquery một lần
SELECT * FROM orders
WHERE merchant_id IN (SELECT id FROM merchants WHERE status = 'ACTIVE');

-- EXISTS: short-circuit ngay khi tìm thấy match đầu tiên
SELECT * FROM orders o
WHERE EXISTS (
    SELECT 1 FROM merchants m
    WHERE m.id = o.merchant_id AND m.status = 'ACTIVE'
);

Trong PostgreSQL hiện đại (12+), optimizer thường transform IN thành semi-join tương tự EXISTS. Sự khác biệt rõ nhất với NOT IN vs NOT EXISTS:

-- NOT IN với NULL — trap phổ biến
-- Nếu subquery trả về bất kỳ NULL nào, toàn bộ NOT IN trả về empty set!
SELECT * FROM orders
WHERE merchant_id NOT IN (SELECT id FROM inactive_merchants);
-- Nếu có 1 merchant NULL trong inactive_merchants → 0 rows trả về

-- NOT EXISTS — an toàn với NULL
SELECT * FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM inactive_merchants im
    WHERE im.id = o.merchant_id
);

OFFSET Pagination — Tại sao nó không scale

-- Page 1: OFFSET 0 LIMIT 20 — nhanh
SELECT * FROM orders ORDER BY created_at DESC OFFSET 0 LIMIT 20;

-- Page 500: OFFSET 9980 LIMIT 20 — PostgreSQL vẫn phải:
-- 1. Scan index từ đầu đến row 10000
-- 2. Discard 9980 rows
-- 3. Return 20 rows
SELECT * FROM orders ORDER BY created_at DESC OFFSET 9980 LIMIT 20;

OFFSET 0     OFFSET 9980
[READ] ──────────────────────────[DISCARD]──[RETURN]
^                                           ^
Start                                     Page 500

Với 10 triệu rows, page 500,000 phải scan 10 triệu rows và discard 9,999,980. Cost tuyến tính với offset.

Keyset Pagination (Cursor-based):

-- Lần đầu tiên — không cần cursor
SELECT id, created_at, total_amount
FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Lần sau: dùng last row từ response trước làm cursor
-- last_created_at = '2026-06-01 10:30:00', last_id = 9876
SELECT id, created_at, total_amount
FROM orders
WHERE (created_at, id) < ('2026-06-01 10:30:00', 9876)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Index (created_at DESC, id DESC) làm cho query này O(log n) bất kể trang nào

// Spring Data implementation
public interface OrderRepository extends JpaRepository<Order, Long> {
    @Query("""
        SELECT o FROM Order o
        WHERE (o.createdAt < :cursorDate)
           OR (o.createdAt = :cursorDate AND o.id < :cursorId)
        ORDER BY o.createdAt DESC, o.id DESC
        """)
    List<Order> findNextPage(
        @Param("cursorDate") Instant cursorDate,
        @Param("cursorId") Long cursorId,
        Pageable pageable
    );
}

Khi nào OFFSET vẫn chấp nhận được? Admin tools với ít data, report với < 1000 rows, bất kỳ use case không cần page > 50.

Aggregation và Window Functions

-- Anti-pattern: load tất cả rows vào application, aggregate trong Java
List<Order> allOrders = orderRepository.findByMerchantId(merchantId);
Map<String, Long> countByStatus = allOrders.stream()
    .collect(Collectors.groupingBy(Order::getStatus, Collectors.counting()));
// Load hàng triệu rows vào heap để count!

-- Better: Aggregate trong database
@Query("""
    SELECT o.status AS status, COUNT(o) AS count
    FROM Order o
    WHERE o.merchantId = :merchantId
    GROUP BY o.status
    """)
List<StatusCount> countByStatus(@Param("merchantId") Long merchantId);

public interface StatusCount {
    String getStatus();
    Long getCount();
}

Window Functions — Khi GROUP BY không đủ:

-- Xếp hạng merchant theo revenue trong từng tháng
SELECT
    merchant_id,
    month,
    revenue,
    RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rank_in_month,
    SUM(revenue) OVER (PARTITION BY merchant_id) AS total_revenue
FROM monthly_revenue;
-- Một query, không cần self-join hay subquery

Phần 4: Hibernate và JPA Performance

Persistence Context — Bộ máy ẩn đằng sau mỗi transaction

Persistence Context là unit of work của Hibernate. Mỗi transaction có một persistence context (first-level cache) chứa tất cả entities đã load trong transaction đó.

Transaction bắt đầu
       │
       ▼
Persistence Context (Map<EntityKey, Object>)
       │
  ┌────┴─────┐
  │ Load     │──→ Check cache → Miss → Issue SELECT → Store in cache
  │ entity   │
  └────┬─────┘
       │
  ┌────┴─────┐
  │ Modify   │──→ Entity tracked, change recorded in "dirty checking"
  │ entity   │
  └────┬─────┘
       │
  ┌────┴─────┐
  │ Flush    │──→ Hibernate compares current state vs snapshot
  │ (commit) │    Issues UPDATE only for changed fields
  └──────────┘

Dirty Checking overhead — ít được biết đến:

Mỗi flush, Hibernate so sánh state hiện tại của mỗi entity với snapshot ban đầu. Với 10,000 entities trong persistence context, điều này tốn thời gian đáng kể.

// Problematic pattern: batch processing trong một transaction
@Transactional
public void processAllOrders() {
    List<Order> orders = orderRepository.findAll(); // load 100,000 orders
    for (Order order : orders) {
        order.setProcessed(true);
        // Sau 100,000 iterations, persistence context chứa 100,000 entities
        // Flush sẽ so sánh 100,000 states
    }
}

// Fix: clear persistence context định kỳ
@Transactional
public void processAllOrders() {
    int batchSize = 100;
    int offset = 0;
    while (true) {
        List<Order> batch = orderRepository.findAll(
            PageRequest.of(offset / batchSize, batchSize)
        ).getContent();
        if (batch.isEmpty()) break;

        for (Order order : batch) {
            order.setProcessed(true);
        }
        orderRepository.saveAll(batch);
        entityManager.flush();
        entityManager.clear(); // ← clear persistence context
        offset += batchSize;
    }
}

First-Level Cache — Không phải Silver Bullet

First-level cache chỉ sống trong transaction. Hai transaction khác nhau không share cache. Đây là nguồn gốc của một số bugs thú vị:

// Service A và Service B trong cùng request nhưng transaction khác nhau
@Service
public class ServiceA {
    @Transactional
    public Order getOrder(Long id) {
        return orderRepository.findById(id).get(); // Query 1
    }
}

@Service
public class ServiceB {
    @Transactional
    public Order getOrder(Long id) {
        return orderRepository.findById(id).get(); // Query 2 — không share cache với ServiceA!
    }
}

// Nếu cùng được gọi trong một flow: 2 SELECT queries

Second-Level Cache — Khi nào nên dùng?

Second-level cache (L2C) shared giữa tất cả transactions và sessions. Phổ biến: EhCache, Caffeine, Redis.

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class ProductCategory {
    @Id private Long id;
    private String name;
    // Data thay đổi ít (vài lần/tuần), đọc nhiều (mọi product page)
}

// application.properties
spring.jpa.properties.hibernate.cache.use_second_level_cache=true
spring.jpa.properties.hibernate.cache.region.factory_class=
    org.hibernate.cache.jcache.JCacheRegionFactory
spring.jpa.properties.javax.cache.provider=
    org.ehcache.jsr107.EhcacheCachingProvider

Khi nào KHÔNG dùng L2C:

Data thay đổi thường xuyên → cache invalidation overhead > cache hit benefit
Clustered environment với nhiều nodes → cache coherence phức tạp
Correctness quan trọng tuyệt đối (financial data) → stale read không chấp nhận được

Khi nào nên dùng:

Reference data: currency codes, country codes, category trees
User preferences (invalidate khi user update)
Configuration data

Lazy vs Eager Loading — Quyết định quan trọng nhất ở entity design

@Entity
public class Order {
    @ManyToOne(fetch = FetchType.LAZY) // ✓ Default nên là LAZY
    private Merchant merchant;

    @OneToMany(fetch = FetchType.LAZY) // ✓
    private List<OrderItem> items;

    @ManyToOne(fetch = FetchType.EAGER) // ✗ Avoid EAGER trong @xToMany
    private OrderStatus currentStatus; // EAGER trên @ManyToOne ít nguy hiểm hơn
}

EAGER loading vấn đề gì?

// Với EAGER:
Order order = orderRepository.findById(1L).get();
// Hibernate tự động join fetch merchant, items, subItems, v.v.
// Có thể tạo Cartesian product nếu nhiều EAGER collections

// Ví dụ worst case:
// Order với 50 items, mỗi item có 10 sub-items
// EAGER load cả hai collections:
// Result set: 50 × 10 = 500 rows để represent 1 order!

Rule of thumb: Tất cả @OneToMany và @ManyToMany nên là LAZY. @ManyToOne và @OneToOne thường OK là EAGER nếu cần thường xuyên.

JDBC Batching — Tại sao Hibernate insert chậm

// Không có batching: 1000 inserts = 1000 round-trips
for (int i = 0; i < 1000; i++) {
    entityManager.persist(new Order(...));
}

// Với batching: 1000 inserts = 10 round-trips (batch size 100)

# application.properties
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
# Bắt buộc nếu dùng IDENTITY generation strategy
# IDENTITY strategy disable batching! Dùng SEQUENCE thay thế
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true

IDENTITY vs SEQUENCE generation:

// IDENTITY — PostgreSQL SERIAL, MySQL AUTO_INCREMENT
// Hibernate phải flush ngay sau persist để lấy generated ID
// → Không batch được!
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

// SEQUENCE — PostgreSQL sequences, Oracle sequences
// Hibernate có thể pre-allocate IDs, persist nhiều entity, batch flush
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq",
                   allocationSize = 50) // Pre-allocate 50 IDs
private Long id;

Fetch Join vs Entity Graph — Khi nào dùng gì?

// Fetch Join — explicit trong JPQL, bạn kiểm soát hoàn toàn
@Query("""
    SELECT DISTINCT o FROM Order o
    JOIN FETCH o.merchant m
    JOIN FETCH o.items i
    JOIN FETCH i.product p
    WHERE o.createdAt > :since
    """)
List<Order> findRecentOrdersWithDetails(@Param("since") Instant since);

// Entity Graph — declarative, reusable, repository-level
@NamedEntityGraph(
    name = "Order.withMerchantAndItems",
    attributeNodes = {
        @NamedAttributeNode("merchant"),
        @NamedAttributeNode(value = "items",
                           subgraph = "items.product")
    },
    subgraphs = @NamedSubgraph(
        name = "items.product",
        attributeNodes = @NamedAttributeNode("product")
    )
)
@Entity
public class Order { ... }

// Repository
@EntityGraph("Order.withMerchantAndItems")
List<Order> findByCreatedAtAfter(Instant since);

Khi nào Fetch Join tốt hơn: Query phức tạp cần fine-grained control, khi Entity Graph syntax không đủ linh hoạt.

Khi nào Entity Graph tốt hơn: Reuse cùng fetch pattern ở nhiều nơi, code rõ ràng hơn ở repository layer.

MultipleBagFetchException — bẫy phổ biến:

// Lỗi khi fetch > 1 collection bằng JOIN FETCH:
@Query("SELECT o FROM Order o JOIN FETCH o.items JOIN FETCH o.tags")
// → MultipleBagFetchException: cannot simultaneously fetch multiple bags

// Fix 1: Đổi List thành Set (order không quan trọng)
@OneToMany
private Set<OrderTag> tags; // Thay vì List

// Fix 2: Dùng 2 query riêng biệt với Hibernate batch loading

Phần 5: Spring Boot và Quarkus Optimization

HikariCP — Connection Pool đúng cách

HikariCP là connection pool mặc định trong Spring Boot. Cấu hình sai có thể khiến hệ thống collapse hoàn toàn dù database hoàn toàn bình thường.

Sizing công thức (từ HikariCP docs):

pool_size = (core_count * 2) + effective_spindle_count

Với SSD: effective_spindle_count = 1. Server 4 core, SSD:

pool_size = (4 * 2) + 1 = 9

Đây là pool size cho một instance application. Nếu 10 instances, total connections = 90. PostgreSQL mặc định max 100 connections. Bạn cần tính cả connections từ monitoring tools, admin tools.

# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 10        # Tối đa connections trong pool
      minimum-idle: 5              # Minimum connections duy trì khi idle
      connection-timeout: 30000    # 30s timeout khi acquire connection
      idle-timeout: 600000         # 10 phút — close idle connection
      max-lifetime: 1800000        # 30 phút — recycle connection (tránh firewall timeout)
      keepalive-time: 60000        # 1 phút ping để giữ connection alive
      validation-timeout: 5000     # 5s timeout khi validate connection
      leak-detection-threshold: 60000  # 60s — log warning nếu connection không release

Connection leak detection — cứu tinh trong production:

// Leak detection phát hiện code quên close connection
// Log output khi threshold vượt:
// WARN  c.z.h.pool.ProxyLeakTask - Connection leak detection triggered for
//       com.zaxxer.hikari.pool.ProxyConnection@1a2b3c4d on thread http-nio-8080-exec-1,
//       stack trace follows
// java.lang.Exception: Apparent connection leak detected
//         at com.example.OrderService.processOrder(OrderService.java:45)

Số liệu HikariCP cần monitor:

// Expose HikariCP metrics với Micrometer
@Bean
MeterRegistryCustomizer<MeterRegistry> hikariMetrics(DataSource dataSource) {
    return registry -> {
        if (dataSource instanceof HikariDataSource hikariDS) {
            hikariDS.setMetricRegistry(registry);
        }
    };
}
// Metrics:
// hikaricp.connections.active  — connections đang được dùng
// hikaricp.connections.pending — requests đang chờ connection
// hikaricp.connections.timeout — connection acquisition timeouts

Transaction Boundaries — Nhỏ gọn và chính xác

Transaction scope ảnh hưởng trực tiếp đến:

Lock duration (transaction dài hơn = lock held lâu hơn)
Connection hold time
Memory (persistence context tích lũy entities)

// Anti-pattern: transaction quá rộng
@Transactional
public OrderResponse processOrder(OrderRequest request) {
    validatePayment(request); // Gọi external payment API — 200ms
    Order order = createOrder(request); // DB operation
    sendNotificationEmail(request); // Gọi email service — 100ms
    updateInventory(request); // DB operation
    return buildResponse(order);
}
// Transaction hold connection trong suốt 300ms+ gọi external services
// Database lock held trong 300ms

// Better: tách external calls ra ngoài transaction
public OrderResponse processOrder(OrderRequest request) {
    PaymentResult payment = validatePayment(request); // Không cần transaction
    OrderResponse response = createAndUpdateTransactionally(request, payment); // Transaction nhỏ
    sendNotificationEmail(request); // Không cần transaction
    return response;
}

@Transactional
private OrderResponse createAndUpdateTransactionally(OrderRequest request,
                                                      PaymentResult payment) {
    Order order = createOrder(request);
    updateInventory(request);
    return buildResponse(order);
}

Read-Only Transactions

// @Transactional(readOnly = true) làm gì?
// 1. Hibernate disable dirty checking — tiết kiệm CPU
// 2. Hibernate không snapshot entity state — tiết kiệm memory
// 3. JDBC driver có thể optimize (routing tới read replica)
// 4. Database có thể optimize (skip write lock overhead)

@Transactional(readOnly = true)
public List<OrderSummary> getOrderSummaries(Long merchantId) {
    return orderRepository.findSummariesByMerchantId(merchantId);
}

Spring Boot + Quarkus: Read Replica Routing:

// Spring Boot — custom routing datasource
@Configuration
public class DataSourceConfig {
    @Bean
    @Primary
    public DataSource routingDataSource(
            @Qualifier("primary") DataSource primary,
            @Qualifier("replica") DataSource replica) {
        TransactionRoutingDataSource routing = new TransactionRoutingDataSource();
        routing.setTargetDataSources(Map.of(
            "primary", primary,
            "replica", replica
        ));
        routing.setDefaultTargetDataSource(primary);
        return routing;
    }
}

public class TransactionRoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
            ? "replica"
            : "primary";
    }
}

// Quarkus — Reactive với multiple datasources
@ApplicationScoped
public class OrderService {
    @Inject
    @ReactiveDataSource("primary")
    PgPool primaryPool;

    @Inject
    @ReactiveDataSource("replica")
    PgPool replicaPool;

    public Uni<List<Order>> findOrders(Long merchantId) {
        return replicaPool.preparedQuery(
            "SELECT * FROM orders WHERE merchant_id = $1"
        ).execute(Tuple.of(merchantId))
         .map(rows -> mapToOrders(rows));
    }
}

Open Session In View — Anti-pattern cần loại bỏ

OSIV (Open Session In View) giữ Hibernate session mở trong suốt HTTP request — từ khi nhận request đến khi response được serialize.

Request vào
     │
     ▼
[Open Session]
     │
     ▼
Service Layer (Transaction mở và đóng)
     │
     ▼
[Transaction closed, nhưng Session vẫn mở]
     │
     ▼
View/Controller (Serialize response)
     │ ← Lazy loading vẫn hoạt động ở đây (session còn mở)
     │ ← Nhưng: không có transaction → mỗi lazy load = new DB call
     ▼
[Close Session]

Vấn đề:

Connection được giữ trong suốt response rendering
Lazy loading silently issue queries ngoài transaction boundary
Khó debug vì queries xảy ra ở view layer

# Disable OSIV (Spring Boot 2+ default: true, nên set false)
spring.jpa.open-in-view=false

Khi disable OSIV, lazy loading ngoài transaction sẽ throw LazyInitializationException — buộc bạn phải explicit fetch data. Đây là behavior đúng.

Quarkus Panache — Active Record vs Repository

// Active Record (Panache Entity) — tốt cho simple CRUD
@Entity
public class Order extends PanacheEntity {
    public String status;
    public BigDecimal totalAmount;
    public Long merchantId;

    // Queries là static methods trên entity
    public static List<Order> findByMerchant(Long merchantId) {
        return find("merchantId = ?1 and status = 'PENDING'", merchantId).list();
    }

    public static PanacheQuery<Order> streamAll() {
        return findAll();
    }
}

// Usage
List<Order> orders = Order.findByMerchant(merchantId);

// Repository pattern — tốt hơn cho complex domain logic, testability
@ApplicationScoped
public class OrderRepository implements PanacheRepository<Order> {
    public List<Order> findPendingByMerchant(Long merchantId, int limit) {
        return find("merchantId = ?1 and status = 'PENDING' order by createdAt",
                    merchantId)
               .page(0, limit)
               .list();
    }
}

Quarkus vs Spring Boot — Performance so sánh:

Metric	Spring Boot 3.x	Quarkus 3.x (JVM)	Quarkus 3.x (Native)
Startup time	2–5s	0.5–2s	10–50ms
Memory (idle)	256–512MB	128–256MB	20–50MB
Throughput (RPS)	Tương đương	Tương đương	Tương đương
First request latency	Cao (JIT warmup)	Thấp hơn	Thấp nhất
Reflection / dynamic proxy	Nhiều	Ít hơn (build-time)	Không

Quarkus native build quan trọng cho: Serverless (cold start), microservices nhiều instance (memory), Kubernetes resource optimization.

Phần 6: Production Scaling Patterns

Read Replicas — Scale đọc, không phải ghi

PostgreSQL streaming replication: primary nhận tất cả writes, replicas nhận WAL stream và apply asynchronously.

Primary DB ──WAL stream──→ Replica 1
           ──WAL stream──→ Replica 2

Application:
- Writes → Primary
- Reads  → Replica 1 hoặc 2 (load balanced)

Replication lag là vấn đề cần xử lý:

// Scenario: User update profile → ngay lập tức redirect to profile page
// Nếu read từ replica với 100ms lag → user thấy data cũ

// Pattern: "Read your own writes" — sau write, đọc từ primary
@Service
public class UserService {
    @Transactional
    public User updateProfile(Long userId, UpdateRequest request) {
        User user = userRepository.findById(userId).get();
        user.apply(request);
        return userRepository.save(user);
        // Return từ primary context — data mới nhất
        // Không redirect sang endpoint khác ngay lập tức
    }
}

// Hoặc: session-level primary routing sau write
public class PrimaryAfterWriteFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ...) {
        if (userJustWrote(request)) {
            routingContext.setForceWriteDataSource(true);
        }
        chain.doFilter(request, response);
    }
}

CQRS — Tách Command và Query

Command Query Responsibility Segregation tách write model (commands) và read model (queries) hoàn toàn.

┌─────────────────┐         ┌─────────────────┐
│  Command Model  │         │   Query Model   │
│  (Write Side)   │         │   (Read Side)   │
│                 │         │                 │
│  - Order        │──event──│  - OrderView    │
│  - OrderItem    │──────→  │  - MerchantStat │
│  - Payment      │         │  - DailyReport  │
│  (normalized)   │         │  (denormalized) │
└─────────────────┘         └─────────────────┘
        │                           │
   PostgreSQL                 PostgreSQL (read replica)
   (primary)                  hoặc Elasticsearch
                              hoặc Redis

// Command side — domain-driven, normalized
@Service
public class OrderCommandService {
    @Transactional
    public OrderId placeOrder(PlaceOrderCommand command) {
        Order order = Order.place(command); // Domain logic
        orderRepository.save(order);
        eventPublisher.publish(new OrderPlacedEvent(order));
        return order.getId();
    }
}

// Query side — optimized read model, flat denormalized structure
@Service
public class OrderQueryService {
    public OrderView getOrderView(Long orderId) {
        return orderViewRepository.findById(orderId)
               .orElseThrow(); // Pre-built, denormalized view
    }
}

// Event handler updates read model
@Component
public class OrderViewUpdater {
    @EventListener
    @Transactional
    public void handle(OrderPlacedEvent event) {
        OrderView view = OrderView.builder()
            .orderId(event.getOrderId())
            .merchantName(merchantRepository.findNameById(event.getMerchantId()))
            .totalAmount(event.getTotalAmount())
            // ... denormalized fields
            .build();
        orderViewRepository.save(view);
    }
}

Khi nào KHÔNG dùng CQRS:

Hệ thống đơn giản với ít read pattern
Team nhỏ — overhead vận hành cao
Strong consistency required cho mọi read

Database Partitioning — Scale data volume

PostgreSQL native partitioning (12+):

-- Range partitioning theo created_at
CREATE TABLE orders (
    id BIGINT,
    merchant_id BIGINT,
    status VARCHAR(20),
    created_at TIMESTAMP WITH TIME ZONE,
    total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);

-- Tạo partition mỗi tháng
CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE orders_2026_02 PARTITION OF orders
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Index tự động apply cho tất cả partitions
CREATE INDEX ON orders (merchant_id, created_at DESC);

-- Query chỉ scan partition liên quan (partition pruning)
EXPLAIN SELECT * FROM orders
WHERE created_at BETWEEN '2026-06-01' AND '2026-06-30';
-- → Chỉ scan orders_2026_06, không phải toàn bộ table

Partition maintenance automation:

-- Tạo partition tự động với pg_partman extension
SELECT partman.create_parent(
    p_parent_table => 'public.orders',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => 'monthly',
    p_premake => 3 -- Tạo sẵn 3 tháng tới
);

Materialized Views — Pre-computed Aggregations

-- View tính toán daily merchant stats — query nặng nếu compute realtime
CREATE MATERIALIZED VIEW merchant_daily_stats AS
SELECT
    merchant_id,
    DATE(created_at) AS date,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE status = 'COMPLETED'
GROUP BY merchant_id, DATE(created_at);

CREATE UNIQUE INDEX ON merchant_daily_stats (merchant_id, date);

-- Refresh: concurrent refresh không block reads
REFRESH MATERIALIZED VIEW CONCURRENTLY merchant_daily_stats;

-- Schedule refresh mỗi giờ (pg_cron extension)
SELECT cron.schedule('0 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY merchant_daily_stats');

// Đọc từ materialized view trong Spring Data
@Repository
public interface MerchantDailyStatsRepository extends JpaRepository<MerchantDailyStats, MerchantDateKey> {
    List<MerchantDailyStats> findByMerchantIdAndDateBetween(
        Long merchantId, LocalDate start, LocalDate end);
}

@Entity
@Table(name = "merchant_daily_stats")
@Immutable // Hibernate không track dirty changes
public class MerchantDailyStats { ... }

Application-Level Caching — Redis với Spring Cache

@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory factory) {
        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration
            .defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(10))
            .serializeValuesWith(
                RedisSerializationContext.SerializationPair.fromSerializer(
                    new GenericJackson2JsonRedisSerializer()
                )
            );

        Map<String, RedisCacheConfiguration> cacheConfigs = Map.of(
            "merchants", defaultConfig.entryTtl(Duration.ofHours(1)),
            "categories", defaultConfig.entryTtl(Duration.ofDays(1)),
            "orderViews", defaultConfig.entryTtl(Duration.ofMinutes(5))
        );

        return RedisCacheManager.builder(factory)
            .cacheDefaults(defaultConfig)
            .withInitialCacheConfigurations(cacheConfigs)
            .build();
    }
}

@Service
public class MerchantService {
    @Cacheable(value = "merchants", key = "#merchantId")
    public Merchant getMerchant(Long merchantId) {
        return merchantRepository.findById(merchantId).orElseThrow();
    }

    @CacheEvict(value = "merchants", key = "#merchant.id")
    @Transactional
    public Merchant updateMerchant(Merchant merchant) {
        return merchantRepository.save(merchant);
    }
}

Cache stampede prevention — khi cache expire đồng thời, nhiều request hit DB cùng lúc:

// Probabilistic early expiration (PER) / jittered TTL
public Merchant getMerchantWithJitter(Long merchantId) {
    // TTL = base + random(0, jitter)
    Duration jitter = Duration.ofSeconds(ThreadLocalRandom.current().nextInt(0, 300));
    Duration ttl = Duration.ofMinutes(10).plus(jitter);
    // ...
}

Phần 7: Observability và Troubleshooting

Slow Query Log — Baseline không thể thiếu

-- PostgreSQL: bật slow query log
-- postgresql.conf
log_min_duration_statement = 100  -- Log query > 100ms
log_statement = 'none'            -- Chỉ log slow queries, không phải tất cả
log_duration = off
auto_explain.log_min_duration = 1000  -- Auto-explain query > 1s
auto_explain.log_analyze = true
auto_explain.log_buffers = true

# Phân tích slow query log với pgBadger
pgbadger /var/log/postgresql/postgresql.log -o report.html
# Report: top slow queries, most frequent queries, queries by wait time

EXPLAIN ANALYZE — Đọc execution plan như senior engineer

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total_amount, m.name
FROM orders o
JOIN merchants m ON o.merchant_id = m.id
WHERE o.status = 'PENDING'
  AND o.created_at > NOW() - INTERVAL '24 hours'
ORDER BY o.created_at DESC
LIMIT 100;

Output điển hình và cách đọc:

Limit  (cost=0.56..487.23 rows=100 width=32) (actual time=0.823..12.456 rows=100 loops=1)
  ->  Nested Loop  (cost=0.56..48723.12 rows=10000 width=32) (actual time=0.821..12.201 rows=100 loops=1)
        ->  Index Scan Backward using idx_orders_status_created on orders o
              (cost=0.43..38520.10 rows=10000 width=24)
              (actual time=0.456..8.123 rows=100 loops=1)
              Index Cond: ((status = 'PENDING') AND (created_at > (now() - '1 day'::interval)))
              Buffers: shared hit=203 read=12  ← 12 blocks từ disk, 203 từ cache
        ->  Index Scan using merchants_pkey on merchants m
              (cost=0.13..1.00 rows=1 width=8)
              (actual time=0.036..0.036 rows=1 loops=100)
              Buffers: shared hit=300
Planning Time: 2.567 ms
Execution Time: 12.890 ms

Những điều cần chú ý:

cost=X..Y: X là startup cost (first row), Y là total cost. Đây là estimate, không phải thực tế.
actual time=X..Y: Thời gian thực. Nếu khác xa cost estimate → statistics cũ.
rows=10000 (estimate) vs rows=100 (actual): estimate sai 100×. Cần ANALYZE.
Buffers: shared hit=203 read=12: 12 disk reads → acceptable. Nếu reads » hits → buffer pool miss.
loops=100: node này chạy 100 lần. Multiply actual time × loops cho tổng.

Red flags trong execution plan:

Seq Scan trên table lớn → thiếu index
Sort với Sort Method: external merge → sort không vừa trong work_mem
Hash với Batches: 4 → hash table không vừa trong memory
Row count estimate lệch > 10× → chạy ANALYZE
Filter: (condition) sau Index Scan → index không cover filter condition

Hibernate Statistics — Debug N+1 trong production

// application.properties
spring.jpa.properties.hibernate.generate_statistics=true

// Programmatic access
@Autowired
private EntityManagerFactory emf;

public void printStats() {
    Statistics stats = emf.unwrap(SessionFactory.class).getStatistics();
    log.info("Query count: {}", stats.getQueryExecutionCount());
    log.info("Entity loads: {}", stats.getEntityLoadCount());
    log.info("Collections loaded: {}", stats.getCollectionLoadCount());
    log.info("Second-level cache hits: {}", stats.getSecondLevelCacheHitCount());
    log.info("Second-level cache misses: {}", stats.getSecondLevelCacheMissCount());
}

// Detect N+1 với DataSource proxy
@Bean
public DataSource dataSource(DataSourceProperties properties) {
    HikariDataSource hikariDS = properties.initializeDataSourceBuilder()
        .type(HikariDataSource.class).build();

    return ProxyDataSourceBuilder.create(hikariDS)
        .logSlowQueryBySlf4j(100, TimeUnit.MILLISECONDS)
        .countQuery()
        .build();
}

Connection Pool Monitoring

// Prometheus metrics với Micrometer
// Thêm dependency: micrometer-registry-prometheus

// application.properties
management.endpoints.web.exposure.include=health,info,metrics,prometheus
management.metrics.tags.application=${spring.application.name}

// Metrics tự động expose:
// hikaricp_connections_active  — gauge
// hikaricp_connections_idle    — gauge
// hikaricp_connections_pending — gauge
// hikaricp_connections_timeout_total — counter (acquisition timeout)
// hikaricp_connections_creation_seconds — histogram (time to create connection)

Grafana Dashboard alert rules:

# Alert: Connection pool gần đầy
- alert: HikariConnectionPoolSaturation
  expr: hikaricp_connections_pending > 2
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "HikariCP pool has {{ $value }} pending connections"

# Alert: Connection acquisition timeout tăng
- alert: HikariConnectionTimeout
  expr: rate(hikaricp_connections_timeout_total[5m]) > 0
  for: 1m
  labels:
    severity: critical

Chẩn đoán Lock Contention và Deadlock

-- Xem locks hiện tại
SELECT
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY duration DESC;

-- Xem lock dependencies (ai đang block ai)
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query,
    blocked.wait_event
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
    ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';

Deadlock detection — PostgreSQL log:

2026-06-07 10:30:15 UTC ERROR: deadlock detected
DETAIL:  Process 1234 waits for ShareLock on transaction 5678;
         blocked by process 5678.
         Process 5678 waits for ShareLock on transaction 1234;
         blocked by process 1234.
HINT:  See server log for query details.

Prevention strategies:

// Consistent lock ordering — luôn lock theo cùng thứ tự
// Bad: Transaction A lock order_1 rồi order_2
//       Transaction B lock order_2 rồi order_1 → deadlock potential
@Transactional
public void transferBalance(Long fromId, Long toId, BigDecimal amount) {
    // Sort IDs để đảm bảo consistent lock order
    Long firstId = Math.min(fromId, toId);
    Long secondId = Math.max(fromId, toId);

    Account first = accountRepository.findByIdWithLock(firstId);  // SELECT FOR UPDATE
    Account second = accountRepository.findByIdWithLock(secondId);

    // Tiếp tục với logic...
}

// SELECT FOR UPDATE trong Spring Data JPA
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT a FROM Account a WHERE a.id = :id")
Account findByIdWithLock(@Param("id") Long id);

Diagnosing High Database CPU

Các nguyên nhân phổ biến và cách tìm:

-- 1. Tìm query tốn CPU nhiều nhất (cần pg_stat_statements extension)
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_time_ms,
    rows / calls AS avg_rows,
    100.0 * total_exec_time / sum(total_exec_time) OVER () AS cpu_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 2. Tìm sequential scans lớn
SELECT
    relname AS table,
    seq_scan AS sequential_scans,
    seq_tup_read AS rows_read_sequentially,
    idx_scan AS index_scans,
    n_live_tup AS live_rows
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

-- 3. Reset statistics sau fix để verify improvement
SELECT pg_stat_reset();
SELECT pg_stat_statements_reset();

Query regression detection:

// Track query performance metrics với custom Hibernate interceptor
public class SlowQueryInterceptor implements StatementInspector {
    private final MeterRegistry meterRegistry;

    @Override
    public String inspect(String sql) {
        // Instrument tất cả queries
        return sql;
    }
}

// Hoặc dùng datasource proxy để log execution time mỗi query
// Sau đó alert khi P99 latency của specific query tăng

Phần 8: Production Checklist

Query Review Checklist

□ Không có SELECT * trong production code
□ Tất cả WHERE clauses được test với EXPLAIN ANALYZE
□ Không có N+1 query (verify với Hibernate statistics hoặc query counter)
□ Pagination dùng keyset (cursor-based) cho data > 10k rows
□ Aggregations xảy ra ở database, không phải application
□ JOINs không fetch data không cần thiết
□ NOT IN được replace bằng NOT EXISTS hoặc LEFT JOIN IS NULL
□ Queries được test với production data volume (không phải dev data 100 rows)
□ Query timeout được set (statement_timeout trong PostgreSQL)

-- Set query timeout theo session (Spring Boot)
@Bean
public DataSource dataSource() {
    HikariConfig config = new HikariConfig();
    config.setConnectionInitSql("SET statement_timeout = '30s'");
    return new HikariDataSource(config);
}

Index Review Checklist

□ Mọi foreign key có index (PostgreSQL không tự tạo)
□ Columns thường xuyên trong WHERE, JOIN ON, ORDER BY có index
□ Composite indexes được sắp xếp đúng (equality trước, range sau)
□ Covering indexes cho top 10 hot queries
□ Unused indexes được identify và scheduled để drop
□ Index bloat được monitor (sau heavy UPDATE workload)
□ Partial indexes cho low-cardinality columns với skewed distribution

-- Kiểm tra foreign keys không có index (PostgreSQL)
SELECT
    conrelid::regclass AS table,
    conname AS constraint,
    array_to_string(conkey, ', ') AS columns
FROM pg_constraint
WHERE contype = 'f'
  AND conrelid::regclass::text NOT IN (
      SELECT tablename FROM pg_indexes WHERE tablename = conrelid::regclass::text
  );

-- Index bloat
SELECT
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

Hibernate Review Checklist

□ Tất cả @OneToMany và @ManyToMany là FetchType.LAZY
□ Không có MultipleBagFetchException workarounds bằng EAGER
□ JDBC batching được enable (hibernate.jdbc.batch_size)
□ SEQUENCE generation strategy thay vì IDENTITY
□ spring.jpa.open-in-view=false
□ @Transactional(readOnly=true) cho tất cả read-only service methods
□ Second-level cache được cấu hình cho stable reference data
□ Entity không load @Lob fields khi không cần
□ Persistence context được clear trong batch processing jobs

Connection Pool Review Checklist

□ Pool size được tính đúng (công thức HikariCP) × số instances
□ max-lifetime < PostgreSQL idle_in_transaction_session_timeout
□ connection-timeout phù hợp với SLA
□ leak-detection-threshold được set
□ Pool metrics được expose và monitored
□ Alert được set cho pending connections > threshold
□ PgBouncer hoặc connection pooler ở database level nếu nhiều app instances

Deployment Review Checklist

□ Schema migration (Flyway/Liquibase) test trên production data clone
□ Index creation dùng CONCURRENTLY (không block)
□ ALTER TABLE dùng phương pháp không lock khi có thể
□ ANALYZE chạy sau bulk data changes
□ Slow query log được enable ở production
□ Backup và rollback plan cho migration
□ Connection pool drain trước rolling restart

-- Safe index creation — không block production traffic
CREATE INDEX CONCURRENTLY idx_orders_merchant_status
ON orders (merchant_id, status)
WHERE status IN ('PENDING', 'PROCESSING');

-- Safe column addition (không lock nếu có default)
-- PostgreSQL 11+: Adding column with non-volatile default không rewrite table
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMP;
-- Sau đó set default riêng
ALTER TABLE orders ALTER COLUMN processed_at SET DEFAULT NOW();

War Stories — Lessons từ Production

Incident 1: Materialized View Refresh Block

Hệ thống reporting refresh materialized view mỗi giờ. Bình thường mất 30 giây. Một ngày, refresh không xong sau 20 phút và blocking mọi read trên view.

Root cause: Ai đó thêm REFRESH MATERIALIZED VIEW thay vì REFRESH MATERIALIZED VIEW CONCURRENTLY. Non-concurrent refresh lấy AccessExclusiveLock — block tất cả reads.

Fix: Luôn dùng CONCURRENTLY. Để dùng CONCURRENTLY, view phải có unique index. Nếu không có unique index, không thể dùng concurrent refresh — thiết kế lại view để có unique key.

Incident 2: Statistics Lag sau Migration

Team migrate 200 triệu rows từ old table sang new table trong weekend. Thứ Hai sáng, mọi query trên new table chạy như sequential scan dù có đầy đủ indexes.

Root cause: pg_statistics hoàn toàn trống cho new table. Optimizer ước tính 1 row cho mọi query → chọn nhầm plan (nested loop với 200M rows outer).

Fix: Chạy ANALYZE new_table ngay sau migration. Thêm vào checklist migration.

Incident 3: HikariCP Pool Exhaustion

Service xử lý order nhận đột biến traffic trong flash sale. Connection pool (size=10) bị exhaust trong 30 giây. Mọi request timeout.

Root cause: Transaction boundary quá rộng — mỗi order request giữ connection trong suốt validation (gọi inventory service) + payment (gọi payment gateway) + DB write. Mỗi transaction giữ connection trung bình 2 giây thay vì 50ms.

Fix: Tách external calls ra ngoài transaction. Transaction scope giảm xuống < 100ms. Pool hỗ trợ throughput tăng 20×.

Incident 4: Composite Index Column Order Sai

Query chạy 8 giây trong production dù có index:

-- Query:
WHERE created_at > '2026-01-01' AND merchant_id = 456 AND status = 'COMPLETED'

-- Index hiện có: (created_at, merchant_id, status)
-- created_at là range condition → sau range condition, phần còn lại của index không được dùng
-- Optimizer chọn sequential scan vì index không selective đủ

Fix: Đổi index thành (merchant_id, status, created_at) — equality columns trước, range sau. Query time giảm từ 8 giây xuống 3ms.

Kết luận

Database optimization không có silver bullet. Mỗi quyết định — index nào tạo, fetch type nào chọn, pool size bao nhiêu — đều là trade-off giữa read performance, write performance, memory, và operational complexity.

Ba thứ cần làm ngay nếu chưa có:

Enable slow query log với threshold 100ms. Biết query nào chậm trước khi optimize bất kỳ thứ gì.
Set spring.jpa.open-in-view=false và fix LazyInitializationException. Buộc code explicit về data fetching.
Monitor HikariCP pool metrics. Connection exhaustion là failure mode khó recover nhất.

Ba thứ tránh xa:

EAGER loading trên collections. Không có exception.
OFFSET pagination trên table lớn. Switch sang keyset.
Transaction scope bao gồm external service calls. Transaction giữ connection và lock. External calls có thể timeout bất kỳ lúc nào.

Database performance là một kỹ năng đến từ kinh nghiệm đọc execution plans, hiểu tại sao optimizer chọn plan đó, và có intuition về khi nào thêm index giúp vs khi nào restructure query cần thiết. Không có shortcut — nhưng hiểu fundamentals như bài này trình bày sẽ giúp bạn debug nhanh hơn rất nhiều khi incident xảy ra lúc 2 giờ sáng.