Building High-Performance REST APIs in Java: A Guide for Senior Backend Engineers

A slow API is not just a bad experience; it costs money. Amazon has reported that every additional 100ms of latency cost roughly 1% of sales. Google observed that a 500ms slowdown led to about 20% less traffic. Shopify handles flash sales at hundreds of thousands of requests per minute. Those numbers do not come from faster machines. They come from knowing exactly where the bottlenecks are and removing them deliberately.

Performance is not something you bolt on afterwards. It is the result of every design decision. This article does not teach you to add a cache to everything or to raise the thread pool to 1000. It teaches you how to read a system, find the real bottleneck, and fix the right thing, which is how a senior engineer approaches the problem.

Part 1: What Makes an API Slow?

The full request lifecycle

Before optimizing anything, you need to understand where a request travels and where it spends its time.

Client
  │── DNS lookup (1-100ms, cached after the first lookup)
  │── TCP handshake (1 RTT ≈ 0.5-50ms depending on geography)
  │── TLS handshake (1-2 additional RTTs)
  ▼
Load Balancer
  │── Health check routing
  │── SSL termination (if not offloaded)
  │── Connection overhead (negligible with keep-alive)
  ▼
API Server (JVM)
  │── Thread pool wait (0 if a thread is available, up to seconds if exhausted)
  │── Request deserialization: JSON parse (1-50ms depending on payload size)
  │── Authentication/Authorization (token validation, database check)
  │── Business logic (depends on logic complexity)
  │── Serialization — response to JSON
  ▼
Database
  │── Connection pool acquisition (0-30s if exhausted)
  │── Query execution (1ms-10s depending on query complexity and indexes)
  │── Network round-trip (0.5-2ms within the same datacenter)
  ▼
Cache (Redis)
  │── Network round-trip (0.5-1ms)
  │── Cache hit: return data
  │── Cache miss: fall through to DB
  ▼
External Services
  │── DNS + TCP + TLS (if the connection is not reused)
  │── External API latency (10ms-5s, out of your control)
  │── Timeout handling

Request latency is the sum of every step on the path, and the slowest step dominates the total. If an external payment API takes 800ms, optimizing the database from 50ms down to 5ms only moves total latency from about 850ms to about 805ms.

This is why profiling beats guessing. Senior engineers do not guess at bottlenecks; they measure.

Realistic latency contributions in production

Below is a typical latency breakdown for an endpoint that reads data through a cache backed by a database:

Typical request on the cache-miss path: about 53ms end to end

Breakdown:
  Network (client → LB): 8ms   (15%)
  Auth token validation:  3ms   (6%)
  Cache lookup (Redis):   2ms   (4%)   (lookup that misses)
  Serialization:          5ms   (9%)
  Business logic:         2ms   (4%)
  DB query:               25ms  (47%)  (only on a cache miss)
  Network (LB → client):  8ms   (15%)

On a cache hit the DB query drops out and the same request takes about 28ms. The database dominates in most APIs, so start optimizing there.

Thread contention: the hidden bottleneck

// Real production code that causes exactly this kind of problem:
@RestController
public class ReportController {
    private static final Map<String, Report> reportCache = new HashMap<>(); // NOT thread-safe!

    @GetMapping("/reports/{id}")
    public Report getReport(@PathVariable String id) {
        // Concurrent calls can throw ConcurrentModificationException in production,
        // or worse: corrupt the map and silently return stale or wrong data
        return reportCache.computeIfAbsent(id, this::generateReport);
    }
}

Thread contention occurs when many threads compete for the same resource (a lock, I/O, CPU). Symptoms: low CPU utilization combined with low throughput, and many threads in the BLOCKED state in a thread dump.
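For the shared-map case specifically, a minimal sketch of a thread-safe alternative is shown here. It assumes reports can be modelled as strings, and the names `SafeReportCache` and `hammer` are hypothetical, not part of the original controller. The point it illustrates is that `ConcurrentHashMap.computeIfAbsent` applies the mapping function atomically, so concurrent callers for the same key trigger a single generation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the report cache; reports are plain strings here.
final class SafeReportCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger generations = new AtomicInteger();
    private final Function<String, String> generator;

    SafeReportCache(Function<String, String> generator) {
        this.generator = generator;
    }

    String get(String id) {
        // The mapping function runs atomically, at most once per absent key
        return cache.computeIfAbsent(id, key -> {
            generations.incrementAndGet();
            return generator.apply(key);
        });
    }

    int generationCount() {
        return generations.get();
    }

    // Fires `threads` concurrent lookups for one id; returns how often the generator ran.
    static int hammer(int threads) {
        SafeReportCache reports = new SafeReportCache(id -> "report-" + id);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // Release all workers at the same moment
                    reports.get("42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reports.generationCount();
    }
}
```

One design note: the generator runs while the map holds a lock on that key's bin, so a slow generator blocks other writers that hash to the same bin. For expensive reports, a cache library with asynchronous loading may be a better fit.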

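Garbage Collection: unpredictable latency spikes

GC pauses are the enemy of consistently low latency. With G1GC (the JVM's default collector since JDK 9 on server-class hardware, not something Spring Boot selects), young-generation collections are usually short, often only a few milliseconds, but a full GC can pause the entire JVM for hundreds of milliseconds.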

Timeline:
│ request │ request │ request │ request │ request │ request │
│  45ms   │  42ms   │  48ms   │ [GC 200ms pause] │  44ms   │
                                    ↑
                    This request sees roughly 5× higher latency even though nothing in the code changed

A P99 latency that is unusually high relative to P50 is often a sign of GC pressure or thread pool contention.

Part 2: Understanding Performance Metrics Correctly

Why average latency is meaningless

Scenario: 100 requests in one minute
- 95 requests: 50ms
- 4 requests: 200ms
- 1 request: 5,000ms (a query that misses its index)

Average: (95×50 + 4×200 + 1×5000) / 100 = 105.5ms

You report "average latency is about 105ms". The reality:
- 1% of requests wait 5 seconds
- At 1 million requests/day, that is 10,000 requests per day taking 5 seconds

The average is dragged by outliers toward a value that represents nobody: 95% of requests are twice as fast and 1% are almost 50 times slower. Users experience the distribution, not the average.

Percentiles: how senior engineers think

P50 (median): 50% of requests complete at or below this value. This is the "typical user" experience.

P95: 95% of requests complete at or below this value; 5% are slower. Commonly used to set SLAs.

P99: 99% of requests complete at or below this value. It matters for heavy users (they send the most requests, so they hit the tail most often), peak traffic, and operations with high fan-out.

System A:    P50=50ms  P95=100ms  P99=200ms  (consistent, predictable)
System B:    P50=40ms  P95=500ms  P99=3000ms (bimodal, a serious problem)

System B looks better at P50 but is completely unacceptable at P99.
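To make the difference between the mean and the percentiles concrete, here is a small sketch that applies the nearest-rank percentile definition to the 100-request scenario from the start of this part. The class name `LatencyStats` and its methods are illustrative assumptions; production systems normally get these values from a metrics library's histogram rather than from raw arrays.

```java
import java.util.Arrays;

// Illustrative helper, not a replacement for a real histogram implementation.
final class LatencyStats {

    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static long percentile(long[] samplesMs, int p) {
        if (samplesMs.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p/100 * n) using integer arithmetic
        return sorted[rank - 1];
    }

    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    // The scenario from the text: 95 requests at 50ms, 4 at 200ms, 1 at 5,000ms.
    static long[] scenario() {
        long[] samples = new long[100];
        Arrays.fill(samples, 0, 95, 50);
        Arrays.fill(samples, 95, 99, 200);
        samples[99] = 5000;
        return samples;
    }
}
```

On this data the mean is 105.5ms while P50 and P95 are both 50ms and P99 is 200ms; only the maximum exposes the 5-second request, which is why tail percentiles and the max are tracked alongside the median.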

Why P99 matters especially in microservices:

Service A calls Service B, which calls Service C, which calls Service D

If each service has a P99 latency of 100ms (and latencies are independent):
P(at least one hop slower than 100ms) = 1 - 0.99^4 ≈ 3.9% of requests
Worst case on a sequential chain: every hop lands in its tail, 4 × 100ms = 400ms or more

With 10 services: 1 - 0.99^10 ≈ 9.6% of requests hit at least one slow hop,
even though each service is slow only 1% of the time
→ Microservices amplify tail latency
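The amplification can be checked with a few lines of arithmetic. This sketch assumes hop latencies are independent, which real systems only approximate, and the class and method names are hypothetical.

```java
// Hypothetical helper for reasoning about tail latency in a call chain.
final class TailAmplification {

    // Probability that at least one of `hops` independent calls exceeds its own
    // percentile threshold, e.g. perHopQuantile = 0.99 for "slower than P99".
    static double probabilityAnyHopSlow(int hops, double perHopQuantile) {
        if (hops < 1 || perHopQuantile < 0.0 || perHopQuantile > 1.0) {
            throw new IllegalArgumentException("hops >= 1 and quantile in [0, 1]");
        }
        return 1.0 - Math.pow(perHopQuantile, hops);
    }
}
```

For 4 hops at P99 this gives roughly 3.9%, and for 10 hops roughly 9.6%: the more services a request touches, the more often it sees somebody's tail.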

Throughput vs. Latency: the fundamental trade-off

// Scenario: batch vs. individual processing
// Option A: process each request synchronously. The caller gets a definitive result, but throughput is lower
@PostMapping("/orders")
public OrderResponse createOrder(@RequestBody OrderRequest request) {
    Order order = orderService.create(request); // Commits immediately
    return OrderResponse.from(order);
    // Response latency: 50ms, throughput: 200 RPS
}

// Option B: buffer and batch. The response is fast and throughput is higher, but time-to-completion is longer
@PostMapping("/orders")
public OrderResponse createOrder(@RequestBody OrderRequest request) {
    orderQueue.enqueue(request);
    return OrderResponse.accepted(); // Returns immediately, processing happens asynchronously
    // Response latency: 5ms, throughput: 5000 RPS
    // Trade-off: the user does not know right away whether the order succeeded or failed
}

Good throughput and good latency never come for free together. Decide which one to prioritize based on the use case.

Saturation: the signal that something is about to break

Saturation is how much of a resource is in use relative to its capacity. A connection pool that is 90% full is a danger signal even if everything still looks fine right now.

HikariCP pool: 10 connections
8 in use (80% saturation) → still some headroom
9 in use (90% saturation) → warning: a small traffic increase exhausts the pool
10 in use (100% saturation) → requests are queuing for a connection

Monitor saturation trends, not just current usage. Saturation above 70-80% on any resource deserves attention.

Part 3: The Database Is the Most Common Bottleneck

N+1 queries: how Hibernate kills performance

// Controller returning merchants together with their orders
@GetMapping("/merchants")
public List<MerchantResponse> getMerchants() {
    List<Merchant> merchants = merchantRepo.findAll(); // 1 query

    return merchants.stream()
        .map(merchant -> MerchantResponse.builder()
            .id(merchant.getId())
            .name(merchant.getName())
            .orderCount(merchant.getOrders().size()) // N queries! Lazy-loads the orders of every merchant
            .totalRevenue(merchant.getOrders().stream()
                .mapToDouble(o -> o.getTotal().doubleValue())
                .sum())
            .build())
        .toList();
    // With 100 merchants: 1 + 100 = 101 queries
    // At 2ms per query: 202ms spent on queries, almost all of it N+1 overhead
}

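Detecting N+1 in development:

// Use datasource-proxy to count queries per request.
// countQuery() tracks per-thread totals in QueryCountHolder.
@Bean
public DataSource dataSource(DataSourceProperties properties) {
    HikariDataSource ds = properties.initializeDataSourceBuilder()
        .type(HikariDataSource.class).build();

    return ProxyDataSourceBuilder.create(ds)
        .name("Query-Counter")
        .countQuery()
        .logSlowQueryBySlf4j(50, TimeUnit.MILLISECONDS)
        .build();
}

// An afterQuery callback fires once per statement execution, so it cannot see
// the per-request total. The check belongs in a servlet filter instead.
@Component
public class QueryCountFilter extends OncePerRequestFilter {
    private static final Logger log = LoggerFactory.getLogger(QueryCountFilter.class);

    @Override
    protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        QueryCountHolder.clear(); // Start each request from zero
        try {
            chain.doFilter(req, res);
        } finally {
            long total = QueryCountHolder.getGrandTotal().getTotal();
            if (total > 10) {
                log.warn("N+1 suspected: {} queries for {} {}",
                    total, req.getMethod(), req.getRequestURI());
            }
            QueryCountHolder.clear();
        }
    }
}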

Fix 1: aggregate in the database:

// A single query fetches everything the response needs
@Query("""
    SELECT new com.example.dto.MerchantStats(
        m.id,
        m.name,
        COUNT(o.id),
        COALESCE(SUM(o.total), 0)
    )
    FROM Merchant m
    LEFT JOIN m.orders o
    GROUP BY m.id, m.name
    """)
List<MerchantStats> findAllWithStats();
// 1 query, the database aggregates, no lazy loading needed

Fix 2: JOIN FETCH when you need full entities:

@Query("""
    SELECT DISTINCT m FROM Merchant m
    LEFT JOIN FETCH m.orders o
    WHERE m.active = true
    """)
List<Merchant> findActiveWithOrders();

SELECT *: more than just wasteful

// Imagine a Product entity with 30 fields, including:
@Entity
public class Product {
    // ... 25 ordinary fields ...
    @Lob
    private byte[] fullDescription; // HTML content, 50KB per product
    @Lob
    private byte[] technicalManual; // PDF, 5MB per product
}

// API listing 50 products:
Page<Product> products = productRepo.findAll(pageable); // SELECT * → loads 5MB × 50 = 250MB!

// Fix: a projection that fetches only what is needed
public interface ProductSummary {
    Long getId();
    String getName();
    String getSku();
    BigDecimal getPrice();
    // No fullDescription, no technicalManual
}

List<ProductSummary> products = productRepo.findAllProjectedBy(pageable);
// SELECT id, name, sku, price FROM products: under 1KB per product

Quarkus Panache with projection:

@ApplicationScoped
public class ProductRepository implements PanacheRepository<Product> {
    public List<ProductSummaryDTO> findSummaries(int page, int pageSize) {
        return find("active = true")
            .page(page, pageSize)
            .project(ProductSummaryDTO.class)
            .list();
    }
}

Over-fetching and Under-fetching in API Design

Over-fetching: the client receives more data than it needs. Cost: bandwidth, serialization, memory.

Under-fetching: the client has to make multiple requests to get all the data it needs. Cost: extra network round trips and higher latency.

// Over-fetching: one endpoint returns everything
@GetMapping("/users/{id}")
public User getUser(@PathVariable Long id) {
    return userRepo.findById(id).orElseThrow();
    // Returns: profile, settings, 200 orders, 50 reviews, payment methods...
    // The mobile app only needs the name and avatar for its list view
}

// Better pattern: sparse fieldsets or separate endpoints
@GetMapping("/users/{id}")
public UserResponse getUser(
    @PathVariable Long id,
    @RequestParam(required = false) Set<String> fields
) {
    User user = userRepo.findById(id).orElseThrow();
    return UserResponse.of(user, fields); // Includes only the requested fields
}

// Client call: GET /users/123?fields=id,name,avatarUrl

Part 4: Connection Pool Optimization

Why opening a database connection is expensive

Creating a new JDBC connection to PostgreSQL involves:

TCP handshake (0.5-2ms)
TLS handshake if encrypted (1-2ms more)
PostgreSQL authentication protocol (1-2 round-trips)
Session parameter negotiation
Allocating server-side memory for the session

Total: 5-15ms of overhead to obtain a single connection. At 1,000 requests/second, opening a new connection per request adds up to 5-15 seconds of cumulative connection setup work every second, and the system never catches up.

A connection pool keeps connections warm and reuses them, which eliminates this overhead.

HikariCP sizing: a practical formula

A starting-point formula from the HikariCP wiki ("About Pool Sizing"), which credits the PostgreSQL project:

pool_size = (core_count × 2) + effective_spindle_count

core_count is the number of cores on the database server
With SSD: effective_spindle_count ≈ 1
Database server with 4 cores + SSD: pool_size = (4 × 2) + 1 = 9

Why isn't "more connections = better"?

PostgreSQL server: max_connections = 100
Application: 20 instances, each with pool_size = 20
Total connections = 400 → PostgreSQL rejects connections!

Correct: reserve 20 connections for admin, monitoring, and migrations
         pool_size = (100 - 20) / 20 instances = 4 connections per instance
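The two sizing rules above are simple enough to encode, which helps when instance counts change with autoscaling. This is a sketch under the stated assumptions; `PoolSizing` and its method names are hypothetical, and the results are starting points to validate under load rather than final settings.

```java
// Hypothetical sizing helper; treat the outputs as starting points, not final values.
final class PoolSizing {

    // Starting-point formula: (database cores × 2) + effective spindle count.
    static int coreBasedPoolSize(int dbCoreCount, int effectiveSpindleCount) {
        if (dbCoreCount < 1 || effectiveSpindleCount < 0) {
            throw new IllegalArgumentException("invalid hardware description");
        }
        return dbCoreCount * 2 + effectiveSpindleCount;
    }

    // Upper bound per application instance so that all instances together,
    // plus the reserved connections, stay within the server's max_connections.
    static int perInstancePoolSize(int dbMaxConnections, int reservedConnections, int instances) {
        if (instances < 1 || reservedConnections < 0 || reservedConnections >= dbMaxConnections) {
            throw new IllegalArgumentException("invalid connection budget");
        }
        return (dbMaxConnections - reservedConnections) / instances;
    }
}
```

With `max_connections = 100`, 20 reserved connections, and 20 instances, the per-instance ceiling is 4. If the core-based formula suggests more than that ceiling, the ceiling wins, or a pooler such as PgBouncer is needed in between.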

# application.yml: production-oriented HikariCP config
spring:
  datasource:
    hikari:
      maximum-pool-size: 10
      minimum-idle: 5
      connection-timeout: 5000       # 5s: fail fast, do not keep the user waiting
      idle-timeout: 300000           # 5 minutes
      max-lifetime: 1800000          # 30 minutes: recycle before firewall timeouts
      keepalive-time: 60000          # Ping idle connections every minute
      leak-detection-threshold: 10000 # 10s: warn about connection leaks
      validation-timeout: 3000       # 3s timeout for validating a connection

Pool Exhaustion: symptoms and diagnosis

Symptoms:
- API latency jumps from 50ms to 5+ seconds
- Error: "Unable to acquire JDBC Connection" or
          "Connection is not available, request timed out after 5000ms"
- Database CPU is low (the DB is not busy; the app is waiting for a connection)
- hikaricp_connections_pending > 0

Typical timeline:
T=0    Traffic spikes (flash sale, new deploy)
T=30s  Pool hits max, requests start to queue
T=35s  connection-timeout starts expiring → 503 errors
T=40s  Error rate > 50%
T=45s  Traffic either subsides on its own or, worse, client retries amplify the load (retry storm)

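// Detect pool exhaustion before it becomes an incident.
// Requires @EnableScheduling. Spring Boot's Micrometer integration already
// publishes hikaricp.connections.* metrics, so the custom gauge is optional.
@Component
public class ConnectionPoolHealthCheck {
    private static final Logger log = LoggerFactory.getLogger(ConnectionPoolHealthCheck.class);
    private final HikariDataSource dataSource;

    public ConnectionPoolHealthCheck(HikariDataSource dataSource) {
        this.dataSource = dataSource;
        // Register the gauge once with a function; a gauge on a boxed double would never update
        Metrics.gauge("hikaricp.utilization", dataSource, ConnectionPoolHealthCheck::utilization);
    }

    private static double utilization(HikariDataSource ds) {
        HikariPoolMXBean pool = ds.getHikariPoolMXBean(); // null until the pool has started
        return pool == null ? 0.0 : (double) pool.getActiveConnections() / ds.getMaximumPoolSize();
    }

    @Scheduled(fixedRate = 5000)
    public void checkPool() {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        if (pool == null) return;
        int pending = pool.getThreadsAwaitingConnection();

        if (pending > 0) {
            log.error("CONNECTION POOL: {} requests waiting. Active={}, Idle={}, Total={}, Max={}",
                pending,
                pool.getActiveConnections(),
                pool.getIdleConnections(),
                pool.getTotalConnections(),
                dataSource.getMaximumPoolSize());
        }
    }
}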

PgBouncer: connection pooling at the database level

When many application instances need more connections than PostgreSQL can handle:

Without PgBouncer:
20 app instances × 10 connections = 200 connections → PostgreSQL
PostgreSQL overhead: memory per connection ≈ 5-10MB → 1-2GB just for connections

With PgBouncer (transaction pooling mode):
20 app instances × 10 = 200 connections → PgBouncer
PgBouncer: 10-20 connections → PostgreSQL
PostgreSQL sees only 10-20 connections while PgBouncer scales to thousands of client connections

# pgbouncer.ini
[databases]
mydb = host=localhost dbname=mydb

[pgbouncer]
pool_mode = transaction          # Server connection is reused after each transaction
max_client_conn = 1000           # Maximum client connections to PgBouncer
default_pool_size = 20           # Connections from PgBouncer to PostgreSQL
min_pool_size = 5
reserve_pool_size = 5            # Emergency connections
server_idle_timeout = 300

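Trade-off: transaction pooling mode is incompatible with PostgreSQL features that depend on session state, such as session-level SET parameters, session advisory locks, and LISTEN. Protocol-level prepared statements need PgBouncer 1.21 or later with max_prepared_statements configured; on older versions they break as well. Verify your driver and ORM settings before adopting it.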

Part 5: Serialization and JSON Performance

Why serialization is expensive at scale

At 10,000 requests/second, with each request serializing a 10KB response:

10,000 RPS × 10KB = 100MB/second of JSON serialization
+ Jackson uses reflection to read field names and values
+ Object allocation for intermediate representations
+ GC pressure from short-lived objects

In high-throughput services, Jackson serialization can account for 10-30% of CPU.

Jackson Internals và Tuning

Jackson has two layers when serializing:

ObjectMapper: the high-level API, which caches type information
JsonSerializer: the per-type serializer, either generated or reflection-based

// ObjectMapper is EXPENSIVE to create: build it once and reuse it
// Spring Boot manages this for you, but it is important to understand why

@Configuration
public class JacksonConfig {
    @Bean
    @Primary
    public ObjectMapper objectMapper() {
        return JsonMapper.builder()
            // Payload size: skipping null fields shrinks the response
            .serializationInclusion(JsonInclude.Include.NON_NULL)
            // Compatibility, not performance: ignore unknown fields instead of failing on them
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            // Readability: write dates as ISO-8601 strings rather than numeric timestamps or arrays
            .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
            // Registers the java.time type handlers
            .addModule(new JavaTimeModule())
            // Performance: Afterburner uses bytecode generation instead of reflection
            // (on Java 11+, Jackson's Blackbird module is the recommended replacement)
            .addModule(new AfterburnerModule())
            .build();
    }
}

AfterburnerModule matters most for raw serialization throughput: it replaces reflection with generated bytecode. The gain depends heavily on payload shape and JDK version, and commonly reported improvements are in the tens of percent rather than multiples, so benchmark your own payloads before relying on a specific figure.

MapStruct instead of BeanUtils:

// BeanUtils.copyProperties uses reflection and is slow
BeanUtils.copyProperties(order, orderDto); // Slower

// MapStruct: compile-time code generation, zero reflection overhead
@Mapper(componentModel = "spring")
public interface OrderMapper {
    OrderDto toDto(Order order);
    Order toEntity(OrderDto dto);
}

// MapStruct generates code along these lines:
public OrderDto toDto(Order order) {
    OrderDto dto = new OrderDto();
    dto.setId(order.getId());
    dto.setStatus(order.getStatus().name());
    // ... plain Java, no reflection
    return dto;
}

Circular References and Object Graph Explosion

// Circular reference: causes StackOverflowError or infinite JSON
@Entity
public class Order {
    @ManyToOne
    private Customer customer; // Customer has a List<Order>
}

@Entity
public class Customer {
    @OneToMany
    private List<Order> orders; // Each Order has a Customer → infinite loop
}

// Fix with Jackson annotations:
@Entity
public class Customer {
    @OneToMany
    @JsonManagedReference // This side is serialized
    private List<Order> orders;
}

@Entity
public class Order {
    @ManyToOne
    @JsonBackReference // This side is not serialized
    private Customer customer;
}

// Or better: create a dedicated DTO and break the circular reference explicitly
public record OrderDto(Long id, String status, Long customerId) {}
// Never serialize entities directly

Response Compression: cut bandwidth by 60-90%

# application.yml: enable compression
server:
  compression:
    enabled: true
    mime-types: application/json,text/html,text/plain
    min-response-size: 1024  # Only compress responses larger than 1KB

No compression: 100KB JSON response → 100KB transferred
With gzip:      100KB JSON response → ~10-15KB transferred (85-90% smaller)
With brotli:    100KB JSON response → ~8-12KB transferred (88-92% smaller)

Trade-off: compression costs CPU. For responses under 1KB, the compression overhead outweighs the bandwidth savings. A CDN or reverse proxy usually handles compression more efficiently, so offload it when you can.
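The compression ratio depends on how repetitive the payload is, so it is worth measuring on your own responses. The following sketch uses only the JDK's `java.util.zip` classes; the `GzipDemo` class and its synthetic `sampleJson` payload are assumptions made for illustration and are far more repetitive than most real responses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative measurement helper; the sample payload is synthetic.
final class GzipDemo {

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // Safe to read after the gzip stream is closed
    }

    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a JSON array of order-like objects with repeated field names.
    static String sampleJson(int items) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < items; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i)
              .append(",\"status\":\"PAID\",\"currency\":\"USD\",\"total\":")
              .append(i * 10).append('}');
        }
        return sb.append(']').toString();
    }
}
```

Comparing `gzip(payload).length` against `payload.length` for a few representative responses gives a realistic ratio for your API, which is more useful than the generic ranges quoted above.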

Streaming for large responses

// Instead of loading everything into memory and then serializing:
@GetMapping("/reports/export")
public ResponseEntity<List<SalesRecord>> exportReport() {
    List<SalesRecord> allRecords = reportRepo.findAll(); // 1GB on the heap!
    return ResponseEntity.ok(allRecords);
}

// Use StreamingResponseBody to write directly to the output stream
@GetMapping(value = "/reports/export", produces = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<StreamingResponseBody> exportReport() {
    StreamingResponseBody stream = outputStream -> {
        // The body runs after the controller method returns, so the DB stream needs its own
        // read-only transaction (for example via TransactionTemplate) and must be closed
        try (JsonGenerator generator = objectMapper.getFactory().createGenerator(outputStream);
             Stream<SalesRecord> records = reportRepo.findAllAsStream()) {
            generator.writeStartArray();
            records.forEach(record -> { // Streams rows from the DB
                try {
                    objectMapper.writeValue(generator, record);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            generator.writeEndArray();
        }
    };

    return ResponseEntity.ok()
        .contentType(MediaType.APPLICATION_JSON)
        .body(stream);
}
// Memory usage stays roughly constant regardless of data size, provided entities are not
// accumulated in the persistence context (use DTO projections or clear it periodically)

Part 6: REST API Design for Performance

Offset pagination: why it does not scale

-- Page 1: fast
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 0;

-- Page 500: PostgreSQL has to
-- 1. Scan from the start (it can use the index)
-- 2. Skip 9,980 rows
-- 3. Return 20 rows
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 9980;
-- Cost grows linearly with the offset: page 5000 means about 100,000 rows scanned and discarded

Keyset pagination: O(log n) regardless of page:

// Request: GET /orders?limit=20&afterCreated=2026-06-01T10:00:00Z&afterId=9876
@GetMapping("/orders")
public PageResponse<OrderDto> getOrders(
    @RequestParam int limit,
    @RequestParam(required = false) Instant afterCreated,
    @RequestParam(required = false) Long afterId
) {
    List<Order> orders;
    // Fetch one extra row to learn whether another page exists
    Pageable window = PageRequest.of(0, limit + 1);

    if (afterCreated == null || afterId == null) {
        // First page
        orders = orderRepo.findFirstPage(window);
    } else {
        // Subsequent pages: cursor-based, both cursor parts are required
        orders = orderRepo.findNextPage(afterCreated, afterId, window);
    }

    boolean hasMore = orders.size() > limit;
    List<Order> page = hasMore ? orders.subList(0, limit) : orders;

    String nextCursor = hasMore
        ? buildCursor(page.get(page.size() - 1))
        : null;

    return PageResponse.of(page.stream().map(mapper::toDto).toList(), nextCursor);
}

// Repository:
@Query("""
    SELECT o FROM Order o
    WHERE (o.createdAt < :afterCreated)
       OR (o.createdAt = :afterCreated AND o.id < :afterId)
    ORDER BY o.createdAt DESC, o.id DESC
    """)
List<Order> findNextPage(
    @Param("afterCreated") Instant afterCreated,
    @Param("afterId") Long afterId,
    Pageable pageable
);
// Index: (created_at DESC, id DESC) → O(log n) traversal

Response format with a cursor:

{
  "data": [...],
  "pagination": {
    "hasMore": true,
    "nextCursor": "eyJjcmVhdGVkQXQiOiIyMDI2LTA2LTAxVDEwOjAwOjAwWiIsImlkIjo5ODc2fQ==",
    "limit": 20
  }
}
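The `nextCursor` value in the response above is Base64 over a small JSON object holding the last row's sort keys. A minimal sketch of encoding and decoding such a cursor follows; the `OrderCursor` class is hypothetical, the field layout is inferred from the sample value, and a real implementation would likely use a JSON library and may sign the cursor to prevent tampering.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cursor type: carries the (createdAt, id) of the last row on the page.
final class OrderCursor {
    private static final Pattern SHAPE =
        Pattern.compile("\\{\"createdAt\":\"([^\"]+)\",\"id\":(\\d+)\\}");

    final Instant createdAt;
    final long id;

    OrderCursor(Instant createdAt, long id) {
        this.createdAt = createdAt;
        this.id = id;
    }

    // Serializes to compact JSON, then Base64, so clients treat the cursor as opaque.
    String encode() {
        String json = "{\"createdAt\":\"" + createdAt + "\",\"id\":" + id + "}";
        return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    // Rejects anything that is not valid Base64 or does not match the expected shape.
    static OrderCursor decode(String cursor) {
        String json = new String(Base64.getDecoder().decode(cursor), StandardCharsets.UTF_8);
        Matcher m = SHAPE.matcher(json);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed cursor");
        }
        return new OrderCursor(Instant.parse(m.group(1)), Long.parseLong(m.group(2)));
    }
}
```

Keeping the cursor opaque lets the server change the sort keys later without breaking clients, which is harder when the keys are exposed as separate query parameters.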

Filtering and Sorting: direct impact on the DB

// Dynamic filter builder: no string concatenation (SQL injection!)
@GetMapping("/orders")
public Page<OrderDto> searchOrders(
    @RequestParam(required = false) String status,
    @RequestParam(required = false) Long merchantId,
    @RequestParam(required = false) @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate dateFrom,
    @RequestParam(required = false) @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate dateTo,
    @RequestParam(defaultValue = "createdAt") String sortBy,
    @RequestParam(defaultValue = "DESC") String sortDir,
    Pageable pageable
) {
    Specification<Order> spec = Specification.where(null);

    if (status != null) spec = spec.and(OrderSpecs.hasStatus(status));
    if (merchantId != null) spec = spec.and(OrderSpecs.forMerchant(merchantId));
    if (dateFrom != null) spec = spec.and(OrderSpecs.createdAfter(dateFrom));
    if (dateTo != null) spec = spec.and(OrderSpecs.createdBefore(dateTo));

    // Validate the sort column: prevents injection and avoids sorting on a non-indexed column
    Sort sort = buildSafeSort(sortBy, sortDir);

    return orderRepo.findAll(spec, PageRequest.of(pageable.getPageNumber(),
                                                   pageable.getPageSize(), sort))
                    .map(mapper::toDto);
}

private Sort buildSafeSort(String sortBy, String sortDir) {
    // Whitelist allowed sort fields
    Set<String> allowedFields = Set.of("createdAt", "totalAmount", "status");
    if (!allowedFields.contains(sortBy)) {
        sortBy = "createdAt"; // Default safe fallback
    }
    Sort.Direction direction = sortDir.equalsIgnoreCase("ASC")
        ? Sort.Direction.ASC : Sort.Direction.DESC;
    return Sort.by(direction, sortBy);
}

Performance considerations for dynamic filters: each filter combination may need its own index. With 5 optional filter fields there are up to 2^5 = 32 combinations, far too many to index individually. Options:

Index the most selective, most commonly used filter columns
Add composite indexes for the most common combinations
Use Elasticsearch or PostgreSQL full-text search for complex search

Sparse fieldsets: smaller payloads and smaller DB fetches

// The client requests only the fields it needs
// GET /orders/123?fields=id,status,totalAmount

@GetMapping("/orders/{id}")
public Map<String, Object> getOrder(
    @PathVariable Long id,
    @RequestParam(required = false) Set<String> fields
) {
    Order order = orderRepo.findById(id).orElseThrow();

    if (fields == null || fields.isEmpty()) {
        return mapper.toFullMap(order);
    }

    return mapper.toPartialMap(order, fields);
}

Sparse fieldsets reduce payload size. The example above still loads the full entity, so the database fetch shrinks only if the requested fields are pushed down into the query as a projection, which can then be served by a covering index when the projection matches an index.

Part 7: Caching Strategies

Why caches exist

The database is expensive: disk I/O, query planning, lock acquisition. For read-heavy workloads (most web APIs are 80-95% reads), serving data from memory instead of disk is one of the highest-ROI optimizations available.

Caching only pays off when data is read far more often than it is written. Caching frequently changing data creates consistency problems without delivering the benefit.

Cache-Aside (lazy loading): the most common pattern

@Service
public class ProductService {
    @Autowired private ProductRepository repo;
    @Autowired private RedisTemplate<String, Product> redisTemplate;
    private static final Duration TTL = Duration.ofMinutes(10);

    public Product getProduct(Long id) {
        String key = "product:" + id;

        // 1. Check cache
        Product cached = redisTemplate.opsForValue().get(key);
        if (cached != null) {
            return cached; // Cache hit — ~1ms
        }

        // 2. Cache miss → load from DB
        Product product = repo.findById(id).orElseThrow(); // ~5-50ms

        // 3. Populate cache
        redisTemplate.opsForValue().set(key, product, TTL);

        return product;
    }

    public void updateProduct(Product product) {
        repo.save(product);
        // Invalidate cache
        redisTemplate.delete("product:" + product.getId());
        // Or: update the cache with the new data (write-through)
    }
}

Spring Cache abstraction — cleaner:

@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory factory) {
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(10))
            .serializeValuesWith(RedisSerializationContext.SerializationPair.fromSerializer(
                new GenericJackson2JsonRedisSerializer()
            ))
            .disableCachingNullValues();

        return RedisCacheManager.builder(factory)
            .cacheDefaults(config)
            .withCacheConfiguration("products", config.entryTtl(Duration.ofHours(1)))
            .withCacheConfiguration("orders", config.entryTtl(Duration.ofMinutes(5)))
            .build();
    }
}

@Service
public class ProductService {
    @Cacheable(value = "products", key = "#id")
    public ProductDto getProduct(Long id) {
        return mapper.toDto(repo.findById(id).orElseThrow());
    }

    @CacheEvict(value = "products", key = "#product.id")
    @Transactional
    public ProductDto updateProduct(ProductDto product) {
        return mapper.toDto(repo.save(mapper.toEntity(product)));
    }

    @CacheEvict(value = "products", allEntries = true)
    @Scheduled(fixedRate = 3600000) // Clears every entry once per hour
    public void evictAllProducts() {}
}

Write-Through: stronger consistency

// Write-through: update the DB and the cache in the same operation
@Transactional
public Product updateProductWriteThrough(Product product) {
    Product saved = repo.save(product); // DB first
    redisTemplate.opsForValue().set(
        "product:" + saved.getId(),
        saved,
        Duration.ofMinutes(10)
    ); // Cache is updated right after the DB write
    return saved;
}

// Advantage: the cache is fresh immediately after a write
// Disadvantage: higher write latency (DB + Redis), and the cache holds data that may rarely be read
// Caveat: the Redis write happens before the transaction commits, so a rollback leaves
// uncommitted data in the cache; consider updating the cache after commit

Cache Stampede: when a cache miss takes down the DB

Cache key "popular-products" expires at 14:00:00.000
14:00:00.001: 1000 concurrent requests hit the cache → miss
14:00:00.002: 1000 requests query the DB at the same time → DB overload

// Fix 1: Probabilistic Early Recomputation (XFetch-style)
public Product getProductWithPER(Long id) {
    String key = "product:" + id;
    ValueOperations<String, CachedValue<Product>> ops = redisTemplate.opsForValue();

    CachedValue<Product> cached = ops.get(key);
    if (cached != null) {
        // Decide probabilistically whether to recompute before the real expiry
        Long remainingTtl = redisTemplate.getExpire(key, TimeUnit.SECONDS);
        double beta = 1.0; // Tuning parameter: > 1 recomputes earlier, < 1 later
        double fetchSeconds = cached.getFetchTimeMs() / 1000.0; // How long the last recompute took
        double rand = 1.0 - ThreadLocalRandom.current().nextDouble(); // Uniform in (0, 1]
        double earlyBy = -fetchSeconds * beta * Math.log(rand);

        if (remainingTtl != null && earlyBy < remainingTtl) {
            return cached.getValue(); // Cache hit, still fresh enough
        }
        // Otherwise recompute early so the cache is warm before the key expires
    }

    return recomputeAndCache(id);
}

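// Fix 2: Distributed lock. Only one request recomputes; the rest wait
public Product getProductWithLock(Long id) throws InterruptedException {
    String key = "product:" + id;
    String lockKey = "lock:product:" + id;

    for (int attempt = 0; attempt < 50; attempt++) { // Bounded retries instead of unbounded recursion
        Product cached = redisTemplate.opsForValue().get(key);
        if (cached != null) return cached;

        // Try to acquire the lock; the 10s expiry frees it if the holder crashes.
        // lockTemplate is assumed to be a StringRedisTemplate, since the lock value is not a Product
        Boolean locked = lockTemplate.opsForValue()
            .setIfAbsent(lockKey, "1", Duration.ofSeconds(10));

        if (Boolean.TRUE.equals(locked)) {
            try {
                Product product = repo.findById(id).orElseThrow();
                redisTemplate.opsForValue().set(key, product, Duration.ofMinutes(10));
                return product;
            } finally {
                // A production lock should store a unique token and delete only its own lock
                lockTemplate.delete(lockKey);
            }
        }
        Thread.sleep(100); // Wait for the lock holder to finish, then re-check the cache
    }
    return repo.findById(id).orElseThrow(); // Fallback: go to the DB rather than wait forever
}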
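The early-recomputation decision in Fix 1 is easier to reason about when isolated from Redis. Below is a sketch of the XFetch-style rule as a pure function; the class `EarlyExpiration` and its parameter names are hypothetical. A request recomputes when a randomly drawn "head start", scaled by how long the value takes to rebuild, reaches the remaining TTL.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper isolating the probabilistic early-expiration decision.
final class EarlyExpiration {

    // Deterministic core: `rand` must be in (0, 1]. Smaller values of rand produce a
    // larger head start, so recomputation becomes more likely as the TTL runs out.
    static boolean shouldRecompute(double recomputeSeconds, double beta,
                                   double remainingTtlSeconds, double rand) {
        if (rand <= 0.0 || rand > 1.0) {
            throw new IllegalArgumentException("rand must be in (0, 1]");
        }
        double earlyBy = -recomputeSeconds * beta * Math.log(rand);
        return earlyBy >= remainingTtlSeconds;
    }

    // Convenience overload that draws the random number itself.
    static boolean shouldRecompute(double recomputeSeconds, double beta,
                                   double remainingTtlSeconds) {
        double rand = 1.0 - ThreadLocalRandom.current().nextDouble(); // (0, 1]
        return shouldRecompute(recomputeSeconds, beta, remainingTtlSeconds, rand);
    }
}
```

Because each request draws its own random number, only a small fraction of callers refresh the value early, which spreads recomputation out instead of concentrating it at the expiry instant.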

Cache Penetration: querying data that does not exist

An attacker or a bug keeps querying IDs that do not exist
→ Each request: cache miss → DB query → not found → nothing cached (null)
→ Every request hits the DB → DB overload

// Pitfall: this configuration does NOT protect against penetration
@Cacheable(value = "products", key = "#id", unless = "#result == null")
public ProductDto getProduct(Long id) {
    return repo.findById(id).map(mapper::toDto).orElse(null);
}
// null is never cached because of unless = "#result == null"

// Better: cache the "not found" result explicitly with a short TTL
public Optional<Product> getProduct(Long id) {
    String key = "product:" + id;
    Object cached = redisTemplate.opsForValue().get(key);

    if (cached != null) {
        return cached instanceof NullMarker ? Optional.empty() : Optional.of((Product) cached);
    }

    Optional<Product> product = repo.findById(id);
    if (product.isPresent()) {
        redisTemplate.opsForValue().set(key, product.get(), Duration.ofMinutes(10));
    } else {
        redisTemplate.opsForValue().set(key, new NullMarker(), Duration.ofMinutes(1));
        // The negative result is cached with a shorter TTL
    }
    return product;
}

Cache Avalanche: all keys expire at the same time

// Instead of a fixed TTL:
redisTemplate.opsForValue().set(key, value, Duration.ofMinutes(10));
// If all keys were cached at 2:00 AM → they all expire at 2:10 AM → stampede

// Fix: TTL with jitter
Duration baseTtl = Duration.ofMinutes(10);
Duration jitter = Duration.ofSeconds(ThreadLocalRandom.current().nextInt(0, 300));
redisTemplate.opsForValue().set(key, value, baseTtl.plus(jitter));
// Keys expire across a 10-15 minute window instead of simultaneously
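If jitter is applied in many places, it may be worth centralizing it. The helper below is a sketch with a hypothetical name (`JitteredTtl`); it adds a uniformly random extra duration up to a configurable maximum, using only JDK classes.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical utility: spreads expirations by adding random extra time to a base TTL.
final class JitteredTtl {

    // Returns a TTL in the closed range [base, base + maxJitter].
    static Duration withJitter(Duration base, Duration maxJitter) {
        long maxMillis = maxJitter.toMillis();
        if (maxMillis <= 0) {
            return base; // No jitter requested
        }
        // nextLong's bound is exclusive, so +1 makes maxJitter itself reachable
        long jitterMillis = ThreadLocalRandom.current().nextLong(maxMillis + 1);
        return base.plusMillis(jitterMillis);
    }
}
```

A call such as `JitteredTtl.withJitter(Duration.ofMinutes(10), Duration.ofMinutes(5))` reproduces the 10-15 minute window from the snippet above.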

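Local Cache: Caffeine in front of Redis

For read-heavy data that rarely changes (config, currencies, categories), a local in-memory cache is orders of magnitude faster than Redis: a lookup is an in-process memory read, whereas every Redis call pays a network round trip of roughly 0.5-1ms.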

@Configuration
public class LocalCacheConfig {
    @Bean
    public CacheManager localCacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(1000)              // Max entries
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .recordStats()                  // Enable metrics
        );
        return manager;
    }
}

// Layered caching: L1 (local Caffeine) → L2 (Redis) → DB
// Caveat (verify for your Spring version): a Redis hit may not backfill the Caffeine cache
@Service
public class CurrencyService {
    @Caching(cacheable = {
        @Cacheable(cacheManager = "localCacheManager", value = "currencies", key = "#code"),
        @Cacheable(cacheManager = "redisCacheManager", value = "currencies", key = "#code")
    })
    public Currency getCurrency(String code) {
        return currencyRepo.findByCode(code);
    }
}

Trade-off of a local cache: each node can hold stale data, and the staleness differs between nodes. When config changes, every instance must be invalidated. Use Redis pub/sub to broadcast the invalidation:

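@Component
public class CacheInvalidationListener implements MessageListener {
    private final CacheManager localCacheManager;

    public CacheInvalidationListener(@Qualifier("localCacheManager") CacheManager localCacheManager) {
        this.localCacheManager = localCacheManager;
    }

    // Register with a RedisMessageListenerContainer:
    // container.addMessageListener(listener, new ChannelTopic("cache:invalidate"));
    @Override
    public void onMessage(Message message, byte[] pattern) {
        // Message body format: "<cacheName>:<key>"
        String[] parts = new String(message.getBody(), StandardCharsets.UTF_8).split(":", 2);
        Cache cache = parts.length == 2 ? localCacheManager.getCache(parts[0]) : null;
        if (cache != null) {
            cache.evict(parts[1]); // Evict the local entry when Redis broadcasts an invalidation
        }
    }
}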

Part 8: Threading and Concurrency

Tomcat Thread Pool: the default model

Spring Boot with Tomcat: each HTTP request is handled by one thread from the pool. That thread is blocked while it waits for I/O (DB query, external call).

Max threads = 200 (Tomcat default)
Each request holds a thread for 100ms
Max throughput = 200 threads / 100ms = 2,000 RPS

But if a request takes 500ms because of an external service call:
Max throughput = 200 threads / 500ms = 400 RPS
→ The thread pool is exhausted at 400 RPS even though the CPU is mostly idle
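The arithmetic above is Little's Law (concurrency = arrival rate × time in system). A minimal sketch of it as code follows; the class and method names are hypothetical, and the numbers are the ones used in the example above.

```java
final class ThreadPoolCapacity {
    private ThreadPoolCapacity() {}

    /** Upper bound on requests/second when each request holds one thread for holdMillis. */
    static double maxRequestsPerSecond(int threads, double holdMillis) {
        if (threads <= 0 || holdMillis <= 0) {
            throw new IllegalArgumentException("threads and holdMillis must be positive");
        }
        return threads * 1000.0 / holdMillis;
    }

    /** Threads needed to sustain targetRps when each request holds a thread for holdMillis. */
    static int threadsNeeded(double targetRps, double holdMillis) {
        return (int) Math.ceil(targetRps * holdMillis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerSecond(200, 100)); // 200 threads, 100ms per request
        System.out.println(maxRequestsPerSecond(200, 500)); // same pool, 500ms per request
        System.out.println(threadsNeeded(2000, 500));       // threads required to keep 2,000 RPS at 500ms
    }
}
```

Running the numbers in the other direction is often more useful: keeping 2,000 RPS at 500ms per request would need 1,000 blocked threads, which is the motivation for the sections that follow.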

# application.yml: Tomcat thread pool tuning
server:
  tomcat:
    threads:
      max: 200              # Raise only if the workload is I/O-bound
      min-spare: 20         # Minimum threads kept ready
    accept-count: 100       # Connection queue length when all threads are busy
    connection-timeout: 5000 # 5s wait for request data after a connection is accepted
    max-connections: 8192   # Max TCP connections accepted and processed

Why not raise max threads to 1000?

Each platform thread reserves roughly 512KB-1MB of stack (1MB is the usual default on 64-bit Linux), so 1000 threads can reserve 500MB-1GB for stacks alone. Context-switching overhead also grows with thread count. This is why non-blocking I/O exists.

Virtual Threads (Java 21): a game changer

Virtual threads (Project Loom) make it practical to run millions of threads without the per-thread cost of platform threads.

# application.yml (Spring Boot 3.2+ on Java 21): enable virtual threads
spring:
  threads:
    virtual:
      enabled: true

// With virtual threads:
// Each request still runs on a "thread", but it is a virtual thread
// When a virtual thread blocks (waiting for DB, HTTP call), it unmounts from its carrier thread
// The carrier thread is then free to run another virtual thread
// → Scalability close to non-blocking I/O with the traditional blocking code style

// Before virtual threads (reactive):
@GetMapping("/orders/{id}")
public Mono<OrderDto> getOrder(@PathVariable Long id) {
    return orderRepository.findById(id) // Returns Mono
        .map(mapper::toDto)
        .switchIfEmpty(Mono.error(new NotFoundException()));
    // Harder to debug; stack traces are far less informative
}

// With virtual threads (simpler):
@GetMapping("/orders/{id}")
public OrderDto getOrder(@PathVariable Long id) {
    return orderRepository.findById(id) // Blocking style
        .map(mapper::toDto)
        .orElseThrow(NotFoundException::new);
    // Ordinary blocking-style code that scales like non-blocking code
}

// Custom executor backed by virtual threads, if needed
@Bean
public Executor virtualThreadExecutor() {
    return Executors.newVirtualThreadPerTaskExecutor();
}

// Async with virtual threads
@Async("virtualThreadExecutor")
public CompletableFuture<Report> generateReport(Long merchantId) {
    // Runs on a virtual thread, so no platform thread is blocked
    Report report = expensiveReportGeneration(merchantId);
    return CompletableFuture.completedFuture(report);
}

When virtual threads are NOT enough:

CPU-bound tasks (no blocking I/O): virtual threads do not help
Tasks that need explicit backpressure: reactive streams are still a better fit
Pinning inside synchronized blocks (on Java 21): detect it with -Djdk.tracePinnedThreads=full or the JFR event jdk.VirtualThreadPinned

Backpressure: avoiding overload

// Without backpressure: the server accepts every request clients send
// → Memory exhaustion, GC pressure, OOM

// Rate limiting at the API level (Resilience4j).
// Created through the registry so that @RateLimiter(name = "api-rate-limiter") resolves this instance:
@Bean
public RateLimiter rateLimiter(RateLimiterRegistry registry) {
    RateLimiterConfig config = RateLimiterConfig.custom()
        .limitRefreshPeriod(Duration.ofSeconds(1))
        .limitForPeriod(1000)        // Max 1000 requests per second
        .timeoutDuration(Duration.ofMillis(100)) // Wait at most 100ms for a permit
        .build();
    return registry.rateLimiter("api-rate-limiter", config);
}

@GetMapping("/orders")
@RateLimiter(name = "api-rate-limiter", fallbackMethod = "rateLimitFallback")
public Page<OrderDto> getOrders(Pageable pageable) {
    return orderService.findAll(pageable);
}

public Page<OrderDto> rateLimitFallback(Pageable pageable, RequestNotPermitted e) {
    throw new TooManyRequestsException("Rate limit exceeded. Retry after 1 second.");
}
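Rate limiting caps requests per second; a complementary form of backpressure is to bound the work a server is willing to queue and reject the rest immediately. The sketch below shows this with a plain JDK `ThreadPoolExecutor`, a bounded queue, and `AbortPolicy`. The class name `LoadSheddingDemo` and the sizes are illustrative assumptions, not recommended values.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LoadSheddingDemo {
    private LoadSheddingDemo() {}

    /** Submits `tasks` blocking jobs to a bounded pool and returns how many were rejected. */
    static int countRejected(int workers, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            workers, workers, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity),   // bounded queue: the backpressure point
            new ThreadPoolExecutor.AbortPolicy());     // reject instead of queueing without limit
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();           // simulate a slow downstream call
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                        // the caller would map this to HTTP 429/503
                }
            }
        } finally {
            release.countDown();                       // let the accepted tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(2, 3, 10));
    }
}
```

With 2 workers and a queue of 3, only 5 tasks can be in flight at once, so the remaining submissions fail fast rather than accumulating in memory.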

Part 9: Async Processing

Not everything has to happen inside the request

// Synchronous: the user waits for everything:
@PostMapping("/orders")
public OrderResponse placeOrder(@RequestBody OrderRequest request) {
    Order order = orderService.create(request);         // 50ms
    emailService.sendConfirmation(order);               // 300ms — SMTP
    pushNotification.send(order.getUserId(), order);    // 200ms — FCM
    analyticsService.track("order_placed", order);      // 100ms — analytics DB
    // Total: 650ms, and the user waits for all of it
    return OrderResponse.from(order);
}

// Async: the user gets the response right away:
@PostMapping("/orders")
public OrderResponse placeOrder(@RequestBody OrderRequest request) {
    Order order = orderService.create(request);         // 50ms — critical path
    // Fire and forget: does not block the response
    CompletableFuture.runAsync(() -> emailService.sendConfirmation(order));
    CompletableFuture.runAsync(() -> pushNotification.send(order.getUserId(), order));
    CompletableFuture.runAsync(() -> analyticsService.track("order_placed", order));
    // Total: 50ms user-facing; the rest happens in the background
    return OrderResponse.from(order);
}

Pitfall of CompletableFuture.runAsync(): by default it uses ForkJoinPool.commonPool(), which is shared by all code in the JVM. One heavy task can starve the others.

// Dedicated executor:
@Bean(name = "asyncExecutor")
public Executor asyncExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(50);
    executor.setQueueCapacity(500);
    executor.setThreadNamePrefix("async-");
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.initialize();
    return executor;
}

@Async("asyncExecutor")
public CompletableFuture<Void> sendConfirmationEmail(Order order) {
    emailService.sendConfirmation(order);
    return CompletableFuture.completedFuture(null);
}

Message Queue for durability

CompletableFuture.runAsync() is not durable: if the server crashes before the email is sent, the email is lost.

// Use a message queue for critical async tasks:
@PostMapping("/orders")
@Transactional
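public OrderResponse placeOrder(@RequestBody OrderRequest request) throws JsonProcessingException {
    Order order = orderService.create(request);

    // Save the message in the same transaction (Transactional Outbox pattern);
    // a separate relay then publishes outbox rows to Kafka after commit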
    outboxRepo.save(OutboxMessage.builder()
        .eventType("ORDER_CONFIRMATION_EMAIL")
        .payload(objectMapper.writeValueAsString(new EmailRequest(order)))
        .build());

    return OrderResponse.from(order);
}

// Consumer: runs separately and is retried if it fails
@KafkaListener(topics = "order.email.requests")
public void processEmailRequest(EmailRequest request) {
    emailService.sendConfirmation(request);
    // On failure, the listener's error handler retries with backoff
    // The message is not lost if the server crashes
}

Parallel Fan-Out: aggregating multiple services

// Sequential (slow):
@GetMapping("/dashboard/{merchantId}")
public DashboardData getDashboard(@PathVariable Long merchantId) {
    MerchantStats stats = statsService.get(merchantId);    // 50ms
    List<Order> recentOrders = orderService.getRecent(merchantId); // 80ms
    RevenueChart chart = chartService.get(merchantId);     // 60ms
    // Total: 190ms sequential
    return new DashboardData(stats, recentOrders, chart);
}

// Parallel (faster):
@GetMapping("/dashboard/{merchantId}")
public DashboardData getDashboard(@PathVariable Long merchantId) {
    CompletableFuture<MerchantStats> statsFuture =
        CompletableFuture.supplyAsync(() -> statsService.get(merchantId), asyncExecutor);

    CompletableFuture<List<Order>> ordersFuture =
        CompletableFuture.supplyAsync(() -> orderService.getRecent(merchantId), asyncExecutor);

    CompletableFuture<RevenueChart> chartFuture =
        CompletableFuture.supplyAsync(() -> chartService.get(merchantId), asyncExecutor);

    CompletableFuture.allOf(statsFuture, ordersFuture, chartFuture).join();

    // Total: max(50, 80, 60) = 80ms parallel (vs 190ms sequential)
    return new DashboardData(
        statsFuture.join(),
        ordersFuture.join(),
        chartFuture.join()
    );
}

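With a timeout to avoid hanging:

try {
    CompletableFuture.allOf(statsFuture, ordersFuture, chartFuture)
        .get(2, TimeUnit.SECONDS); // Overall timeout
} catch (TimeoutException e) {
    // Mark pending futures as cancelled; note that CompletableFuture.cancel
    // does not interrupt a task that is already running
    statsFuture.cancel(true);
    ordersFuture.cancel(true);
    chartFuture.cancel(true);
    throw new ServiceUnavailableException("Dashboard data timeout");
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // Preserve the interrupt flag
    throw new ServiceUnavailableException("Dashboard request interrupted");
} catch (ExecutionException e) {
    throw new ServiceUnavailableException("Dashboard dependency failed");
}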
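Failing the whole dashboard on a timeout is one option; another is to degrade per widget and return whatever arrived in time. The following is a minimal sketch of that alternative using `CompletableFuture.completeOnTimeout` (Java 9+). The helper name `getOrFallback` and the fallback values are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class PartialDashboard {
    private PartialDashboard() {}

    /**
     * Waits up to timeoutMillis for the future. Returns the fallback if the future
     * is not done by then or if it completed exceptionally.
     * Note: completeOnTimeout completes the given future itself with the fallback.
     */
    static <T> T getOrFallback(CompletableFuture<T> future, T fallback, long timeoutMillis) {
        return future
            .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
            .exceptionally(error -> fallback)
            .join();
    }

    public static void main(String[] args) {
        CompletableFuture<String> fast = CompletableFuture.supplyAsync(() -> "stats");
        CompletableFuture<String> stuck = new CompletableFuture<>(); // never completes on its own

        System.out.println(getOrFallback(fast, "stats unavailable", 200));
        System.out.println(getOrFallback(stuck, "chart unavailable", 200));
    }
}
```

This keeps the endpoint latency bounded by the timeout while still serving the widgets whose backends responded.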

Part 10: HTTP-Level Optimizations

Keep-Alive and Connection Reuse

Without Keep-Alive:
Client → [TCP handshake] → Request → Response → [TCP close]
         3ms overhead                              3ms overhead

With Keep-Alive (HTTP/1.1 default):
Client → [TCP handshake] → Request 1 → Response 1
                        → Request 2 → Response 2  (no handshake!)
                        → Request N → Response N
         3ms overhead once

Spring Boot (Tomcat) enables Keep-Alive by default. Make sure the client (RestTemplate, HttpClient) reuses connections too:

// RestTemplate with connection pooling (Apache HttpClient 5, as used by Spring Boot 3):
@Bean
public RestTemplate restTemplate() {
    PoolingHttpClientConnectionManager connectionManager =
        PoolingHttpClientConnectionManagerBuilder.create()
            .setMaxConnTotal(100)       // Pool-wide cap (illustrative value)
            .setMaxConnPerRoute(20)     // Per-host cap (illustrative value)
            .build();
    CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connectionManager)
        .build();                       // Persistent connections are reused by default
    return new RestTemplate(new HttpComponentsClientHttpRequestFactory(httpClient));
}

// Or WebClient (Spring WebFlux), which pools connections out of the box:
@Bean
public WebClient webClient() {
    HttpClient httpClient = HttpClient.create()
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
        .responseTimeout(Duration.ofSeconds(5))
        .doOnConnected(conn -> conn
            .addHandlerLast(new ReadTimeoutHandler(5))
            .addHandlerLast(new WriteTimeoutHandler(5)));

    return WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .build();
}

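HTTP/2: Multiplexing and Header Compression

HTTP/1.1: a connection carries one request at a time (pipelining exists but is rarely supported), so parallel requests need parallel connections. Browsers open at most about 6 connections per domain.

HTTP/2: many requests are multiplexed over a single connection, which removes head-of-line blocking at the HTTP layer (TCP-level head-of-line blocking on packet loss remains).

# Spring Boot: enable HTTP/2 on the embedded server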
server:
  http2:
    enabled: true
  ssl:
    enabled: true  # In practice, browsers only speak HTTP/2 over TLS
    key-store: classpath:keystore.p12
    key-store-password: ${SSL_PASSWORD}
    key-store-type: PKCS12

HTTP/1.1 (6 parallel connections):
Conn1: GET /api/orders    → 80ms
Conn2: GET /api/users     → 60ms
Conn3: GET /api/products  → 90ms
... (3 more connections)

HTTP/2 (1 connection, multiplexed):
Stream1: GET /api/orders  ─┐
Stream2: GET /api/users   ─┤─→ All on same connection
Stream3: GET /api/products ─┘
+ Header compression (HPACK): repeated headers (Authorization, Content-Type) compressed

Impact: small for server-to-server communication (connection pooling is usually already in place). Large for browser-to-server traffic (many parallel requests, header compression, no HTTP-level HOL blocking).

ETag and Conditional Requests: avoiding unnecessary data transfer

@GetMapping("/products/{id}")
public ResponseEntity<ProductDto> getProduct(
    @PathVariable Long id,
    @RequestHeader(value = "If-None-Match", required = false) String ifNoneMatch
) {
    Product product = productService.findById(id);
    String etag = "\"" + product.getVersion() + "\""; // Or a hash of the response body

    if (etag.equals(ifNoneMatch)) {
        return ResponseEntity.status(HttpStatus.NOT_MODIFIED).build();
        // 304: no body is transferred; the client uses its cached copy
        // Saves bandwidth and serialization cost (and the DB fetch, if the version itself is cached)
    }

    return ResponseEntity.ok()
        .eTag(etag)
        .cacheControl(CacheControl.maxAge(60, TimeUnit.SECONDS))
        .body(mapper.toDto(product));
}
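When an entity has no version column, the ETag can be derived from the response body instead. The sketch below, using only the JDK, hashes the body with SHA-256 and also handles an If-None-Match header that lists several tags or weak tags. The class name `ETags` and both method names are hypothetical; the `W/` handling follows the weak comparison that HTTP defines for If-None-Match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ETags {
    private ETags() {}

    /** Builds a strong ETag (quoted hex SHA-256) from the response body bytes. */
    static String strongETag(byte[] body) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder(hash.length * 2 + 2).append('"');
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True when the If-None-Match header lists this ETag or is the wildcard "*". */
    static boolean matches(String ifNoneMatch, String etag) {
        if (ifNoneMatch == null) {
            return false;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String value = candidate.trim();
            if (value.startsWith("W/")) {
                value = value.substring(2); // weak comparison ignores the W/ prefix
            }
            if (value.equals("*") || value.equals(etag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String etag = strongETag("{\"id\":1}".getBytes(StandardCharsets.UTF_8));
        System.out.println(etag + " matches: " + matches("W/" + etag, etag));
    }
}
```

The trade-off: a body hash requires loading and serializing the resource before the 304 decision, so it saves bandwidth but not the DB fetch, whereas a version-based ETag can skip both.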

Cache-Control Headers: Browser and CDN Caching

@GetMapping("/static/product-catalog")
public ResponseEntity<List<ProductDto>> getProductCatalog() {
    List<ProductDto> catalog = catalogService.getAll();

    return ResponseEntity.ok()
        .cacheControl(CacheControl
            .maxAge(1, TimeUnit.HOURS)       // Cache for 1 hour
            .staleWhileRevalidate(5, TimeUnit.MINUTES) // Serve stale for up to 5 minutes while revalidating
            .staleIfError(1, TimeUnit.DAYS)  // Serve stale for up to 1 day if the origin is down
        )
        .body(catalog);
}

@GetMapping("/user/{id}/profile")
public ResponseEntity<UserProfile> getProfile(@PathVariable Long id) {
    UserProfile profile = userService.getProfile(id);

    return ResponseEntity.ok()
        .cacheControl(CacheControl.noStore()) // Sensitive data: never cache
        .body(profile);
}

Part 11: External Service Optimization

Timeouts: the first line of defense

No timeout means a single slow downstream service can hold every thread.

// RestTemplate with timeouts:
@Bean
public RestTemplate restTemplate() {
    HttpComponentsClientHttpRequestFactory factory =
        new HttpComponentsClientHttpRequestFactory();
    factory.setConnectTimeout(3000);     // 3s to establish the connection
    factory.setReadTimeout(5000);         // 5s to read the response
    // If the payment API takes > 5s → timeout, fail fast
    return new RestTemplate(factory);
}

// WebClient (preferred):
@Bean
public WebClient paymentClient() {
    return WebClient.builder()
        .baseUrl(paymentServiceUrl)
        .clientConnector(new ReactorClientHttpConnector(
            HttpClient.create()
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
                .responseTimeout(Duration.ofSeconds(5))
        ))
        .build();
}

Timeout strategy:

Connect timeout < read timeout: establishing a connection is usually much faster than waiting for a response
Read timeout = downstream P99 latency + a buffer
Total timeout across the call chain < the SLA of your own endpoint

Circuit Breaker: Fail Fast instead of Failing Slowly

A circuit breaker prevents cascading failure: when a downstream service keeps failing, stop calling it instead of continuing to waste resources.

CLOSED state (normal):
  Calls pass through → track failure rate

If failure rate > threshold → OPEN state:
  All calls fail immediately (no network call) → save resources
  → Wait cooldown period

After cooldown → HALF-OPEN state:
  Allow limited calls to test if service recovered
  → If success → back to CLOSED
  → If fail → back to OPEN
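The state machine above fits in a few dozen lines. The sketch below is a deliberately simplified, JDK-only illustration and not how Resilience4j is implemented: it opens after a number of consecutive failures rather than a failure rate over a sliding window, and it does not limit the number of trial calls in HALF_OPEN. The class name `SimpleCircuitBreaker` and the injectable clock are assumptions made for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker: opens after N consecutive failures (not a failure-rate window). */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long cooldownMillis;
    private final LongSupplier clock; // injectable time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long cooldownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
    }

    /** Current state; an OPEN breaker becomes HALF_OPEN once the cooldown has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAtMillis >= cooldownMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the action unless the breaker is OPEN; returns the fallback on rejection or failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: no downstream call is made
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMillis = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 30_000, System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            String result = breaker.call(
                () -> { throw new IllegalStateException("payment service down"); },
                () -> "fallback");
            System.out.println(result + " / " + breaker.state());
        }
    }
}
```

Injecting the clock keeps the cooldown testable without sleeping; in production code a maintained library is the better choice because it also handles sliding windows, slow calls, and metrics.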

// Resilience4j Circuit Breaker:
@Bean
public CircuitBreaker paymentCircuitBreaker(CircuitBreakerRegistry registry) {
    CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
        .slidingWindowSize(10)               // Last 10 calls
        .failureRateThreshold(50)            // Open when >= 50% of calls fail
        .waitDurationInOpenState(Duration.ofSeconds(30))  // 30s cooldown
        .permittedNumberOfCallsInHalfOpenState(3)
        .slowCallRateThreshold(80)           // Also open when >= 80% of calls are slow
        .slowCallDurationThreshold(Duration.ofSeconds(2))
        .build();

    return registry.circuitBreaker("payment-service", config);
}

@Service
public class PaymentService {
    @Autowired private CircuitBreaker paymentCircuitBreaker;

    public PaymentResult charge(PaymentRequest request) {
        return paymentCircuitBreaker.executeSupplier(
            () -> paymentApiClient.charge(request)
        );
    }
}

Fallback strategies:

@CircuitBreaker(name = "recommendation-service", fallbackMethod = "getDefaultRecommendations")
public List<ProductDto> getRecommendations(Long userId) {
    return recommendationClient.getForUser(userId);
}

// Fallback: degrade gracefully instead of failing the whole page
public List<ProductDto> getDefaultRecommendations(Long userId, Exception e) {
    log.warn("Recommendation service unavailable, serving popular products. Error: {}", e.getMessage());
    return popularProductsCache.getTopProducts(10); // Serve popular products instead
}

Retry Storms: when retries become the problem

Payment service returns 503
→ 1000 clients retry immediately
→ 1000 requests hit the payment service at once
→ Payment service is overloaded by the retries
→ 503 keeps coming back
→ Clients retry again...
→ A self-sustaining loop (retry storm)

// Retry with exponential backoff + jitter:
@Bean
public Retry paymentRetry() {
    RetryConfig config = RetryConfig.custom()
        .maxAttempts(3)                      // Total attempts, including the initial call
        // Wait times come from the interval function below (no separate waitDuration)
        .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(
            500,   // Initial interval ms
            2.0,   // Multiplier
            0.5,   // Randomization factor (jitter)
            10000  // Max interval ms
        ))
        // Retry only specific errors:
        .retryOnException(e -> e instanceof ConnectTimeoutException
                             || e instanceof ServiceUnavailableException)
        // Never retry client errors:
        .ignoreExceptions(BadRequestException.class, UnauthorizedException.class)
        .build();

    return Retry.of("payment-retry", config);
}

Retry timeline with jitter (maxAttempts = 3 counts the initial call):
Attempt 1: 503
Retry 1: wait 250-750ms (500ms base ± 50% jitter)
Retry 2: wait 500-1500ms (1000ms base ± 50% jitter)
Fail (max attempts reached)
→ Retries are spread out instead of arriving together
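The wait window for each retry can be computed directly, which helps when checking that the total retry time fits inside the endpoint's SLA. The sketch below assumes the common model "base interval × multiplier^(retry-1), then ± a randomization factor", with the cap applied to the base interval. The class name `BackoffWindow` is hypothetical, and a given library may apply the cap or the jitter slightly differently.

```java
final class BackoffWindow {
    private BackoffWindow() {}

    /**
     * Returns {minMillis, maxMillis} for the wait before retry number `retry` (1-based),
     * assuming base = min(maxMillis, initial * multiplier^(retry-1)) and jitter of ± randomization.
     */
    static long[] window(long initialMillis, double multiplier, double randomization,
                         long maxMillis, int retry) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry is 1-based");
        }
        double base = Math.min(maxMillis, initialMillis * Math.pow(multiplier, retry - 1));
        return new long[] {
            Math.round(base * (1 - randomization)),
            Math.round(base * (1 + randomization))
        };
    }

    public static void main(String[] args) {
        for (int retry = 1; retry <= 3; retry++) {
            long[] w = window(500, 2.0, 0.5, 10_000, retry);
            System.out.println("Retry " + retry + ": wait " + w[0] + "-" + w[1] + "ms");
        }
    }
}
```

With an initial interval of 500ms, a multiplier of 2.0, and a randomization factor of 0.5, the first two retries wait 250-750ms and 500-1500ms respectively.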

Bulkhead: isolation between downstream services

// Without a bulkhead:
// A slow payment service occupies every thread → Inventory and Shipping cannot be served either

// With a bulkhead: a dedicated thread pool per downstream service
@Bean
public ThreadPoolBulkhead paymentBulkhead() {
    ThreadPoolBulkheadConfig config = ThreadPoolBulkheadConfig.custom()
        .maxThreadPoolSize(5)         // At most 5 threads for payment calls
        .coreThreadPoolSize(3)
        .queueCapacity(10)            // Queue up to 10 requests
        .keepAliveDuration(Duration.ofSeconds(30))
        .build();

    return ThreadPoolBulkhead.of("payment", config);
}

Part 12: JVM-Level Optimizations

Garbage Collection: choosing the right GC for the workload

G1GC (the default since Java 9): balances latency and throughput. A good default for most applications.

ZGC (production-ready since Java 15): pause times typically below a millisecond. Good for latency-sensitive applications.

Shenandoah: similar to ZGC; concurrent with low pauses.

Workload             Recommended GC
─────────────────────────────────────
General OLTP API     G1GC (default)
Low-latency API      ZGC or Shenandoah
High-throughput batch ParallelGC
Large heap (> 32GB)  ZGC

# JVM flags for a production API server:

# G1GC (default):
-XX:+UseG1GC
-Xms4g -Xmx4g              # Heap: min = max to avoid resizing
-XX:MaxGCPauseMillis=200   # Pause-time target (a goal G1 aims for, not a guarantee)
-XX:G1HeapRegionSize=8m
-XX:+G1UseAdaptiveIHOP

# ZGC for low latency:
-XX:+UseZGC
-Xms4g -Xmx4g
-XX:+ZGenerational          # Generational mode (Java 21+)

Object Allocation: the invisible cost

// High-allocation code: creates many short-lived objects:
@GetMapping("/orders")
public List<OrderDto> getOrders() {
    return orderRepo.findAll().stream()
        .map(order -> {
            // Allocates a new SimpleDateFormat on every iteration
            String formattedDate = new SimpleDateFormat("yyyy-MM-dd")
                .format(order.getCreatedAt()); // Cannot be shared safely: SimpleDateFormat is not thread-safe
            return new OrderDto(
                order.getId(),
                order.getStatus(),
                formattedDate,
                // String concatenation allocates one new String per element
                "Order-" + order.getId() + "-" + order.getMerchantId()
            );
        })
        .collect(Collectors.toList());
}

// Better:
private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd");
// DateTimeFormatter is immutable and thread-safe, so create it once

@GetMapping("/orders")
public List<OrderDto> getOrders() {
    return orderRepo.findAll().stream()
        .map(order -> new OrderDto(
            order.getId(),
            order.getStatus(),
            FORMATTER.format(order.getCreatedAt()),  // Reuse the formatter (createdAt must be a java.time type)
            // Plain concatenation is kept: String.format parses its pattern on every call and is slower
            "Order-" + order.getId() + "-" + order.getMerchantId()
        ))
        .toList(); // Java 16+: returns an unmodifiable List
}

String Allocation: a frequent source of GC pressure

// String concatenation in a loop allocates a new String on every iteration:
public String buildQuery(List<Long> ids) {
    String query = "SELECT * FROM orders WHERE id IN (";
    for (Long id : ids) {
        query += id + ","; // Each += copies the whole string so far (and leaves a trailing comma)
    }
    return query + ")";
}

// Better:
public String buildQuery(List<Long> ids) {
    return "SELECT * FROM orders WHERE id IN (" +
        ids.stream()
           .map(Object::toString)
           .collect(Collectors.joining(",")) +
        ")";
}

// Or use a parameterized query (preferred for SQL):
// "SELECT * FROM orders WHERE id IN (:ids)"
// with Spring Data: findAllById(ids)
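For plain JDBC, where named parameters such as `:ids` are not available, the same idea means generating one `?` placeholder per value and binding the ids separately. A minimal sketch follows; the class name `InClause` is hypothetical and the table name comes from the example above.

```java
import java.util.Collections;
import java.util.List;

final class InClause {
    private InClause() {}

    /** Builds "... IN (?, ?, ?)" with one placeholder per id; the values are bound separately. */
    static String ordersByIds(List<Long> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty"); // "IN ()" is invalid SQL
        }
        String placeholders = String.join(", ", Collections.nCopies(ids.size(), "?"));
        return "SELECT * FROM orders WHERE id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(ordersByIds(List.of(1L, 2L, 3L)));
    }
}
```

Each id is then bound with `PreparedStatement.setLong(index, id)`, so no value is ever concatenated into the SQL text.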

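Escape Analysis: automatic JVM optimization

The JIT compiler can avoid heap-allocating a short-lived object when it proves that the object does not "escape" the method. HotSpot does this through scalar replacement (the object's fields live in registers or on the stack), so the object never needs to be garbage collected.

// Object that can be kept off the heap (does not escape):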
public int sumOrderItems(Order order) {
    Iterator<OrderItem> iter = order.getItems().iterator(); // Iterator local
    int sum = 0;
    while (iter.hasNext()) {
        sum += iter.next().getQuantity();
    }
    return sum; // The iterator never escapes the method
}

// Object that escapes (must be heap-allocated):
public List<OrderItem> getExpensiveItems(Order order) {
    List<OrderItem> result = new ArrayList<>();
    for (OrderItem item : order.getItems()) {
        if (item.getPrice().compareTo(THRESHOLD) > 0) {
            result.add(item); // result escapes method
        }
    }
    return result; // Must be heap-allocated
}

This is why JVM performance does not always match intuition: the optimizer does a lot of work you cannot see. Profile before optimizing by hand.

Part 13: Observability and Performance Debugging

The three pillars of observability

Metrics: aggregated numbers such as throughput, latency percentiles, and error rate. Good for alerting and trending.

Traces: the end-to-end flow of a request across multiple services. Good for finding the bottleneck in a distributed system.

Logs: detailed event records. Good for debugging specific incidents.

Micrometer + Prometheus + Grafana Stack

// With micrometer-registry-prometheus on the classpath and the endpoint exposed,
// Spring Boot serves metrics at /actuator/prometheus
// Adding custom metrics:
@RestController
public class OrderController {
    private final Counter orderCounter;
    private final Timer orderLatencyTimer;
    private final DistributionSummary payloadSizeDistribution;

    public OrderController(MeterRegistry registry) {
        this.orderCounter = Counter.builder("api.orders.created")
            .tag("environment", "production")
            .description("Total orders created")
            .register(registry);

        this.orderLatencyTimer = Timer.builder("api.orders.latency")
            .publishPercentiles(0.5, 0.95, 0.99) // Publish P50, P95, P99
            .publishPercentileHistogram(true)
            .register(registry);

        this.payloadSizeDistribution = DistributionSummary.builder("api.response.size")
            .baseUnit("bytes")
            .register(registry);
    }

    @PostMapping("/orders")
    public ResponseEntity<OrderResponse> createOrder(@RequestBody OrderRequest request) throws Exception {
        return orderLatencyTimer.recordCallable(() -> {
            orderCounter.increment();
            OrderResponse response = orderService.create(request);
            payloadSizeDistribution.record(objectMapper.writeValueAsBytes(response).length);
            return ResponseEntity.ok(response);
        });
    }
}

Grafana Dashboard queries:

# API latency P99 (PromQL)
histogram_quantile(0.99, sum by (le) (rate(api_orders_latency_seconds_bucket[5m])))

# Error rate
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
  / sum(rate(http_server_requests_seconds_count[5m]))

# Throughput
sum(rate(api_orders_created_total[5m]))

# HikariCP saturation
hikaricp_connections_active / hikaricp_connections_max

OpenTelemetry Distributed Tracing

# application.yml: Spring Boot 3 auto-instruments with Micrometer Tracing
spring:
  application:
    name: order-service

management:
  tracing:
    sampling:
      probability: 0.1  # Sample 10% of requests (100% is usually too expensive)
  # Send traces over OTLP/HTTP to Jaeger or another OTLP-compatible collector
  otlp:
    tracing:
      endpoint: http://jaeger:4318/v1/traces

// Custom spans for important operations:
@Service
public class OrderService {
    @Autowired private Tracer tracer;

    public Order processOrder(OrderRequest request) {
        Span span = tracer.nextSpan().name("order.process")
            .tag("merchant.id", request.getMerchantId().toString())
            .start();

        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            Order order = createOrder(request);        // Auto-traced if the DB client is instrumented
            Span paymentSpan = tracer.nextSpan().name("payment.charge").start();
            try (Tracer.SpanInScope ps = tracer.withSpan(paymentSpan)) {
                paymentService.charge(order);
            } finally {
                paymentSpan.end();
            }
            return order;
        } finally {
            span.end();
        }
    }
}

Java Flight Recorder: Production Profiling

JFR is the JDK's built-in profiler. Its overhead is typically around 1-2%, which makes always-on recording in production practical.

# Start a JFR recording (can attach to a running JVM):
jcmd <PID> JFR.start duration=60s filename=/tmp/recording.jfr settings=profile

# Analyze with JDK Mission Control:
jmc /tmp/recording.jfr

// Automated capture with the JFR API:
@Component
public class PerformanceRecorder {
    @Scheduled(fixedDelay = 3600000) // Every hour
    public void captureProfile() throws Exception {
        Path file = Path.of("/tmp/profile-" + System.currentTimeMillis() + ".jfr");
        try (Recording recording = new Recording()) { // close() releases the recording's resources
            recording.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
            recording.enable("jdk.GarbageCollection");
            recording.enable("jdk.SocketRead").withThreshold(Duration.ofMillis(10));
            recording.start();
            Thread.sleep(60000); // Record for 1 minute (this blocks the scheduler thread)
            recording.stop();
            recording.dump(file);
        }
        s3Client.upload(file); // Upload for offline analysis
    }
}

Thread Dump Analysis

# Capture a thread dump while the API is slow:
jstack <PID> > thread-dump.txt

# Or with jcmd:
jcmd <PID> Thread.print > thread-dump.txt

# Analysis: find BLOCKED threads
grep -A 5 "BLOCKED" thread-dump.txt

# Find threads waiting for a lock:
grep -B 2 "waiting to lock" thread-dump.txt

Dangerous patterns in a thread dump:

Many threads in WAITING state → all waiting on the same condition (pool exhaustion)
BLOCKED "waiting to lock <0x...>" → lock contention
TIMED_WAITING "parking" → fine for idle workers waiting on an async queue; a sign of a stuck thread if it should be doing work
RUNNABLE in "socketRead" → the thread is waiting on blocking network I/O (the JVM reports native I/O as RUNNABLE)

Async Profiler: CPU Hotspot Detection

# async-profiler: accurate CPU profiling without JVM safepoint bias
./profiler.sh -d 30 -e cpu -f /tmp/flamegraph.html <PID>
# (run again with -e alloc for an allocation flame graph)

# Flame graph: wide bars = more CPU time
# Look for:
# - Wide bars in serialization code → optimize DTOs
# - Wide bars in reflection → add Jackson's AfterburnerModule
# - Wide bars in GC → reduce allocation
# - Wide bars in JDBC → N+1 queries, missing indexes

Part 14: Production Performance Incidents

Incident 1: API Latency Jumps from 80ms to 4 Seconds

Symptoms: at 2:00 PM, P99 latency of /api/orders rose from 80ms to 4,000ms. P50 stayed at ~90ms. Error rate was near 0. CPU and memory were normal.

Investigation:

-- Check for long-running queries
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle' AND now() - query_start > INTERVAL '1 second'
ORDER BY duration DESC;

-- Result: one query has been running for 3.5 seconds:
-- SELECT * FROM orders WHERE merchant_id = 123 ORDER BY created_at DESC

EXPLAIN ANALYZE SELECT * FROM orders
WHERE merchant_id = 123
ORDER BY created_at DESC
LIMIT 20;

-- Seq Scan on orders (cost=0.00..850000.00 rows=10000000)
-- Filter: merchant_id = 123
-- MISSING INDEX!

Root cause: merchant 123's transaction volume grew 10× after a marketing campaign. The query used to be acceptable because there were few rows. The table now holds 10 million rows, and the sequential scan takes 3-4 seconds.

Why P50 stayed healthy: most merchants have few orders, so their queries are fast. P99 reflects the largest merchants.
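The gap between P50 and P99 is easy to reproduce with a handful of numbers. The sketch below computes nearest-rank percentiles over a synthetic sample (98 requests at 90ms and 2 at 4,000ms, values made up to mirror this incident) and shows how the median and even the mean hide the tail. The class name `Percentiles` is hypothetical.

```java
import java.util.Arrays;

final class Percentiles {
    private Percentiles() {}

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Synthetic latencies: `fast` requests at fastMillis followed by `slow` requests at slowMillis. */
    static long[] sample(int fast, long fastMillis, int slow, long slowMillis) {
        long[] latencies = new long[fast + slow];
        Arrays.fill(latencies, 0, fast, fastMillis);
        Arrays.fill(latencies, fast, fast + slow, slowMillis);
        return latencies;
    }

    public static void main(String[] args) {
        long[] latencies = sample(98, 90, 2, 4000);
        System.out.println("P50  = " + percentile(latencies, 50) + "ms");
        System.out.println("P99  = " + percentile(latencies, 99) + "ms");
        System.out.println("mean = " + Arrays.stream(latencies).average().orElse(0) + "ms");
    }
}
```

This is why alerting on P99 (or P99.9) catches problems that affect only the largest tenants, while dashboards built on averages stay green.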

Fix:

CREATE INDEX CONCURRENTLY idx_orders_merchant_created
ON orders (merchant_id, created_at DESC);
-- CONCURRENTLY: builds the index without blocking production traffic
-- Query time: 3.5s → 3ms

Prevention:

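# Make the slow query log mandatory:
# postgresql.conf
log_min_duration_statement = 500  # Log queries slower than 500ms
auto_explain.log_min_duration = 1000  # Log plans of queries slower than 1s (requires the auto_explain module to be loaded)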

Incident 2: Connection Pool Exhausted During a Flash Sale

Symptoms: the flash sale started at 3:00 PM. Within 90 seconds every request failed with HikariPool: Connection is not available, request timed out after 5000ms. Database CPU was 30%, so the database itself was not busy.

Investigation:

// Metrics during the incident:
// hikaricp_connections_active = 10 (MAX)
// hikaricp_connections_pending = 847
// api.orders.latency.p99 = 5000ms (timeout)

// Finding why connections were held for so long:
// The thread dump showed all 10 threads at:
// at com.example.NotificationService.sendPushNotification(NotificationService.java:45)
// at com.example.OrderService.placeOrder(OrderService.java:78)
// → The push notification call (Google FCM) was blocking the transaction!

Root cause: placeOrder() calls FCM inside a @Transactional method. FCM took 2-10 seconds during the flash sale, so each of the 10 connections was held for that long and the pool could complete only about 1-5 orders per second. Flash-sale traffic far above that rate exhausted the pool almost immediately.

Fix:

// Before:
@Transactional
public OrderResponse placeOrder(OrderRequest request) {
    Order order = createOrder(request);
    notificationService.sendPushNotification(order); // Holds the connection for 2-10 seconds!
    return OrderResponse.from(order);
}

// After:
@Transactional
public OrderResponse placeOrder(OrderRequest request) {
    Order order = createOrder(request);
    outboxRepo.save(OutboxMessage.forNotification(order)); // < 1ms
    return OrderResponse.from(order);
    // The transaction commits here and the connection is released
}
// An outbox poller sends the notification asynchronously

Incident 3: A Redis Outage Takes Down the Database

Symptoms: the Redis cluster failed at 11:00 PM. Within 10 seconds database CPU went from 20% to 100%. The database started dropping connections. The whole platform was down for 20 minutes.

Root cause: the cache layer failed all at once, with the same effect as a cache avalanche. Every cache read failed (Redis down) and fell through to the database, which suddenly received 10× its normal traffic. There was no circuit breaker in front of the database and no rate limiting.

Investigation timeline:

11:00:00  Redis master failure
11:00:01  Application detects Redis connection errors
11:00:02  All cache reads fail → all requests hit DB
11:00:05  DB connection pool exhausted
11:00:10  DB CPU 100%, queries time out
11:00:15  Application fails in cascade
11:00:30  On-call engineer paged

Fix:

// 1. Graceful degradation when Redis fails:
public ProductDto getProduct(Long id) {
    try {
        ProductDto cached = redisTemplate.opsForValue().get("product:" + id);
        if (cached != null) return cached;
    } catch (RedisConnectionFailureException e) {
        // Redis is down → log and continue to the DB (do not throw!)
        log.warn("Redis unavailable, falling back to DB for product {}", id);
        redisDownMeter.increment();
    }
    return repo.findById(id).map(mapper::toDto).orElseThrow();
}

// 2. Local cache as a buffer while Redis is down:
@Cacheable(cacheManager = "localCacheManager", value = "products-local", key = "#id")
public ProductDto getProduct(Long id) {
    // Runs only on a local (Caffeine) cache miss, so hot keys stop reaching
    // the DB even while Redis is down.
    return repo.findById(id).map(mapper::toDto).orElseThrow();
}

// 3. Circuit breaker around database queries while Redis is down:
@CircuitBreaker(name = "database", fallbackMethod = "getDatabaseFallback")
public ProductDto getProductFromDb(Long id) {
    return repo.findById(id).map(mapper::toDto).orElseThrow();
}

// Fallback with an illustrative body: same parameters as the guarded method, plus the exception.
private ProductDto getDatabaseFallback(Long id, Throwable cause) {
    throw new ResponseStatusException(HttpStatus.SERVICE_UNAVAILABLE, "Product temporarily unavailable", cause);
}
// If the DB is overwhelmed as well → the circuit breaker opens → fail fast instead of cascading
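
The root cause also names the missing rate limiting in front of the database. One way to approximate it is to cap how many cache-miss reads may hit the database concurrently and shed the rest. The following is a minimal sketch using a plain java.util.concurrent.Semaphore; BoundedFallback is a hypothetical name, and the right permit count is an assumption that depends on your pool size and query cost.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps the number of concurrent database reads taken on the cache-miss path. */
final class BoundedFallback {
    private final Semaphore permits;

    BoundedFallback(int maxConcurrentDbReads) {
        this.permits = new Semaphore(maxConcurrentDbReads);
    }

    /**
     * Runs dbRead if a permit is free. Otherwise fails fast, so the database
     * never sees more than the configured number of concurrent fallback reads.
     */
    <T> T read(Supplier<T> dbRead) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("DB fallback saturated, shedding load");
        }
        try {
            return dbRead.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Rejected reads can be mapped to a 503 or served from stale local data. Either way, the database keeps answering the requests it does accept instead of collapsing under all of them.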

Incident 4: A Retry Storm Takes Down the Platform

Symptoms: a payment service deploy fails and some instances return 500. Within 30 seconds, every service starts failing. The platform is unreachable for 45 minutes.

Root cause:

Payment service instances = 3
1 instance gets a bad deploy → returns 500

Order Service: calls Payment → 500 → retries 3 times (100ms wait before each retry)
→ each failing order request produces 4 payment requests (1 + 3 retries)
→ traffic to Payment grows by up to 4×

The bad payment instance becomes even more overloaded → the 2 other instances slow down too
→ requests to the 2 remaining instances start being retried as well
→ retries multiply across layers: if callers of Order Service retry with the same policy, traffic grows up to 16× (4 attempts × 4 attempts)
→ all payment instances go down
→ Order Service enters a retry storm
→ all services fail in cascade
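
The 4× and 16× figures follow from multiplying attempts across every layer that retries. The following is a minimal sketch of that worst-case arithmetic, assuming each layer makes the same number of attempts and every attempt fails; RetryAmplification is an illustrative name.

```java
/** Worst-case request amplification when several layers each retry a failing call. */
final class RetryAmplification {
    private RetryAmplification() {}

    /**
     * attemptsPerLayer counts the first call plus its retries (1 + 3 retries = 4).
     * retryingLayers is how many layers in the call chain apply that policy.
     * The result is attemptsPerLayer raised to the power retryingLayers.
     */
    static long worstCaseCalls(int attemptsPerLayer, int retryingLayers) {
        if (attemptsPerLayer < 1 || retryingLayers < 0) {
            throw new IllegalArgumentException("attemptsPerLayer >= 1 and retryingLayers >= 0 required");
        }
        long calls = 1;
        for (int i = 0; i < retryingLayers; i++) {
            calls = Math.multiplyExact(calls, (long) attemptsPerLayer);
        }
        return calls;
    }
}
```

worstCaseCalls(4, 1) is 4 and worstCaseCalls(4, 2) is 16. Adding a third retrying layer would raise the worst case to 64 downstream calls per user request, which is why retries are best applied at a single layer.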

Fix:

// 1. Exponential backoff with jitter (reduces thundering herd)
// 2. Low max retry count (2-3, not 5-10)
// 3. Circuit breaker: stop retrying when the failure rate is high
// 4. Total request timeout: even with retries, the total time must be capped

RetryConfig config = RetryConfig.custom()
    .maxAttempts(3)                    // Counts the first call: 1 call + 2 retries
    // Exponential backoff with jitter: 200ms initial, ×2 per attempt, 0.5 randomization factor, 2s cap.
    // waitDuration is omitted because the interval function already defines the wait.
    .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(200, 2.0, 0.5, 2000))
    .build();

// 5. Server-side rate limiting to protect the payment service
// 6. Separate retry budget: do not retry when there are already too many failures
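
The retry budget in item 6 can be sketched as a small token bucket: every first attempt earns a fraction of a token, and every retry spends a whole one. This is a minimal illustration and not a Resilience4j feature; RetryBudget and its parameters are hypothetical names, and the ratio is an assumption to tune per service.

```java
/** Token-bucket retry budget: retries are allowed only as a bounded fraction of normal traffic. */
final class RetryBudget {
    private final double maxTokens;
    private final double tokensPerRequest; // e.g. 0.1 allows roughly one retry per ten requests
    private double tokens;

    RetryBudget(double maxTokens, double tokensPerRequest) {
        if (maxTokens < 1.0 || tokensPerRequest <= 0.0) {
            throw new IllegalArgumentException("maxTokens >= 1 and tokensPerRequest > 0 required");
        }
        this.maxTokens = maxTokens;
        this.tokensPerRequest = tokensPerRequest;
        this.tokens = maxTokens;
    }

    /** Call once for every first attempt; refills the budget up to maxTokens. */
    synchronized void recordRequest() {
        tokens = Math.min(maxTokens, tokens + tokensPerRequest);
    }

    /** Spends one token and returns true if a retry is allowed; returns false when the budget is empty. */
    synchronized boolean tryAcquireRetry() {
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

When a downstream service fails broadly, the budget drains after a few retries and callers stop retrying altogether, so retry traffic stays proportional to real traffic instead of multiplying it.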

Incident 5: Serialization Produces a 50MB Response

Symptoms: one API endpoint responds very slowly (30+ seconds) and sometimes runs out of memory. Memory usage spikes whenever the endpoint is called.

Investigation:

// Endpoint:
@GetMapping("/merchants/{id}/full-report")
public MerchantReport getFullReport(@PathVariable Long id) {
    return merchantService.getFullReport(id);
}

// Service:
public MerchantReport getFullReport(Long id) {
    Merchant merchant = merchantRepo.findById(id).orElseThrow();
    // Hibernate EAGER-loads everything:
    // merchant.getOrders() → 50,000 orders
    // each order.getItems() → 10 items
    // each item.getProduct() → full product including binary data
    // Total: 50,000 × 10 × product (~100KB) = 50GB in the worst case (every item a distinct product)
    return new MerchantReport(merchant);
}

Root cause: object graph explosion. The entity has EAGER relationships, and the serializer traverses the entire graph.

Fix:

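// 1. Never serialize entities directly; always use DTOs
public record MerchantReportDto(
    Long merchantId,
    String merchantName,
    long orderCount,  // Just the count, not the list
    BigDecimal totalRevenue,
    List<OrderSummaryDto> recentOrders // Only the 20 most recent orders
) {
    // Matches the 4-argument constructor expression in the query below;
    // recentOrders is filled in by a separate query limited to 20 rows.
    public MerchantReportDto(Long merchantId, String merchantName,
                             long orderCount, BigDecimal totalRevenue) {
        this(merchantId, merchantName, orderCount, totalRevenue, List.of());
    }
}

// 2. Database aggregation instead of loading all the data
@Query("""
    SELECT new com.example.MerchantReportDto(
        m.id, m.name,
        COUNT(o.id),
        COALESCE(SUM(o.total), 0)
    )
    FROM Merchant m LEFT JOIN m.orders o
    WHERE m.id = :id
    GROUP BY m.id, m.name
    """)
MerchantReportDto findReportById(@Param("id") Long id);

// 3. Streaming for large responses
@GetMapping(value = "/merchants/{id}/orders/export",
            produces = MediaType.APPLICATION_NDJSON_VALUE)
public ResponseEntity<StreamingResponseBody> exportOrders(@PathVariable Long id) {
    // Newline-delimited JSON: one order per line, written as rows arrive.
    // orderExportService.writeNdjson(...) is a hypothetical helper, not part of the original code.
    StreamingResponseBody body = out -> orderExportService.writeNdjson(id, out);
    return ResponseEntity.ok(body);
}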
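
The streaming endpoint above depends on writing rows one at a time instead of building the full list in memory. The following is a minimal sketch of such a writer using only java.io; NdjsonWriter is a hypothetical name, and the toJson function stands in for whatever JSON serializer the service already uses.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Function;

/** Writes rows as newline-delimited JSON without holding the whole result set in memory. */
final class NdjsonWriter {
    private NdjsonWriter() {}

    /**
     * Serializes each row with toJson, writes it on its own line, and flushes every
     * flushEvery rows so the client receives data while the export is still running.
     * Returns the number of rows written.
     */
    static <T> long write(Iterator<T> rows, Function<T, String> toJson,
                          OutputStream out, int flushEvery) throws IOException {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        long written = 0;
        while (rows.hasNext()) {
            writer.write(toJson.apply(rows.next()));
            writer.write('\n');
            written++;
            if (written % flushEvery == 0) {
                writer.flush();
            }
        }
        writer.flush();
        return written;
    }
}
```

Memory use stays bounded only if the iterator itself is lazy, for example backed by a database cursor or a paged query. Passing an iterator over a fully loaded list would bring back the original problem.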

Part 15: Senior Engineer Performance Checklist

Before Launch: API Design Review

□ Pagination: use keyset/cursor instead of OFFSET for datasets > 10K rows
□ Response size: are sparse fieldsets or projections available?
  No endpoint can return unbounded data
□ Filtering: does every filter parameter have a matching index?
□ Sorting: only allow sorting on indexed columns
□ Bulk operations: is there a batch endpoint for use cases that need many items?
□ Idempotency: does every POST/PUT support an idempotency key?
□ Rate limiting: do abuse-prone endpoints have rate limiting?
□ Are Cache-Control headers set correctly for every endpoint?
□ Do sensitive endpoints use no-store?

Before Launch: Database Review

□ EXPLAIN ANALYZE run for every query against production data volume
□ No Seq Scan on tables > 100K rows
□ N+1 queries: verified with a query counter in integration tests
□ All foreign keys have indexes
□ No SELECT * in production code
□ Connection pool sized correctly for the number of instances
□ Statement timeout set (avoids hung queries)
□ Read-only transactions use @Transactional(readOnly=true)
□ Batch operations clear the persistence context periodically

Before Launch: Load Testing

□ Load test with production-realistic data volume (not 100 rows)
□ Test P50, P95, P99, not just the average
□ Test at 1x, 2x, 5x expected peak traffic
□ Test sustained load (not just spikes)
□ Test graceful degradation: turn off Redis and external services
□ Verify connection pool behavior under load
□ Verify GC behavior under load (no GC pauses > 200ms)
□ Test retry behavior with slow/failing downstream services
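
The checklist asks for P50, P95, and P99 rather than the average because a mean mixes the typical request with the tail and describes neither. The following is a minimal sketch using the nearest-rank percentile definition; LatencyStats is an illustrative name, and real load-testing tools usually compute percentiles from histograms instead of raw samples.

```java
import java.util.Arrays;

/** Nearest-rank percentiles and mean over raw latency samples. */
final class LatencyStats {
    private LatencyStats() {}

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Arithmetic mean of the samples. */
    static double mean(long[] samplesMillis) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return (double) sum / samplesMillis.length;
    }
}
```

For 100 samples where 98 requests take 10ms and 2 take 5000ms, the mean is 109.8ms, P50 and P95 are both 10ms, and P99 is 5000ms. The average matches neither the typical request nor the slow ones, while the percentiles show both.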

Monitoring Checklist

□ Latency: P50, P95, P99 alerting per endpoint
□ Error rate: alert if > 0.1%
□ Throughput: alert if it drops > 20% (indicates an issue)
□ HikariCP: pending connections, timeout count
□ JVM: heap usage, GC pause duration and frequency
□ External services: latency P99, error rate, circuit breaker state
□ Database: slow queries (> 500ms), connection count, deadlocks
□ Cache: hit rate, eviction rate, memory usage
□ Thread pool: active threads, queue size

Incident Response: Performance Triage

Latency spikes suddenly:
1. Check the error rate: is there a correlation?
2. Check database slow queries (pg_stat_activity)
3. Check connection pool metrics (pending > 0?)
4. Check external service latency (traces)
5. Check GC activity (JVM metrics)
6. Check the thread pool: any blocked threads?
7. Check recent deploys: who deployed what?

If DB:
  → EXPLAIN ANALYZE the slow query
  → Check for a missing index
  → Check lock contention (pg_locks)

If External Service:
  → Check circuit breaker state
  → Check retry behavior: is there a retry storm?

If JVM:
  → Thread dump → find BLOCKED threads
  → GC logs → major GC too frequent?
  → Async profiler → CPU hotspot
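
The "find BLOCKED threads" step can also be done from inside the JVM when attaching jstack is inconvenient. The following is a minimal sketch using the standard java.lang.management API; BlockedThreads is an illustrative name, and in practice the result would be exposed through a diagnostics endpoint or a log line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

/** Lists threads that are currently BLOCKED waiting for a monitor. */
final class BlockedThreads {
    private BlockedThreads() {}

    /** Returns one line per BLOCKED thread: its name and the name of the thread holding the lock. */
    static List<String> find() {
        List<String> blocked = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked.add(info.getThreadName() + " waiting on lock held by " + info.getLockOwnerName());
            }
        }
        return blocked;
    }
}
```

Many threads blocked on a lock held by the same owner usually point to a contended synchronized section. Threads stuck in socket reads, as in the FCM incident, typically show up as RUNNABLE instead, so their stack traces need to be checked as well.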

Conclusion

Performance engineering is not about adding a cache to everything or raising the thread count. It is the ability to read a system: knowing where latency comes from, why P99 differs so much from P50, why the database becomes the bottleneck past a certain scale, and why a retry storm can take down an entire platform.

Three things hold true for most systems:

1. Profile before you optimize. Guessing the bottleneck is usually wrong. JFR, Async Profiler, and distributed tracing show where the time actually goes. Never optimize something you have not measured.

2. The database is usually the bottleneck, so fix it there first. Missing indexes, N+1 queries, and over-fetching typically account for roughly 80% of performance problems in a new API. EXPLAIN ANALYZE is the most important tool you have.

3. Design for failure, not for the happy path. Timeouts, circuit breakers, retries with backoff, and graceful degradation when the cache is down have little to do with performance in the usual sense, but they determine P99 and how the system behaves under stress. Your P99 in production is your P50 during an incident.

Senior engineers do not memorize every optimization technique. They have a mental model of the request lifecycle, know how to ask the right questions, and know how to read the data to find the bottleneck. That is what this guide tries to convey.