Everything About Threads in Java

Concurrency bugs are among the hardest bugs to find and fix in production. There is no complete stack trace and no stable reproduction, just random misbehavior after a few hours of running. Most of these bugs come not from unfamiliarity with the API, but from not understanding why the concurrency mechanisms were designed the way they were. This post starts from the problem each mechanism solves, not from its definition.

Why Threads Exist

Consider an HTTP server processing order requests. Each request must: validate input, query the database (50ms), call a payment gateway (200ms), and write a log. If the server processes sequentially:

// Single-threaded server - sequential processing
while (true) {
    Request req = acceptConnection();  // receive request
    processOrder(req);                 // 250ms+ per request
    // Next request must WAIT until the current one finishes
}

The second request must wait for the first to finish its 250ms. With 100 concurrent users, the last user waits 25 seconds. This is the problem threads solve: while thread 1 is waiting for the database to respond (I/O-bound, CPU is idle), thread 2 can process a different request.

The trade-off from the start: Threads solve I/O-bound bottlenecks very well. For CPU-bound work (heavy computation), adding threads beyond the number of CPU cores provides no benefit and adds context-switching overhead. Understand whether the problem is I/O-bound or CPU-bound before deciding how many threads to use.

Thread Lifecycle

A thread does not simply “run” or “stop.” It has six states (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED), and knowing which state a thread is in is an important debugging skill.


The most important states to understand when debugging:

BLOCKED means the thread is waiting to acquire a lock held by another thread. If you see many BLOCKED threads in a thread dump, it indicates either lock contention (one thread holding a lock too long) or a deadlock.

WAITING means the thread voluntarily yielded the CPU and is waiting to be notified. Unlike BLOCKED: WAITING is not competing for a lock; it is waiting for a condition.

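TIMED_WAITING is the state of a thread in sleep(), or in a wait, join, or park call that has a timeout. A platform thread blocked in a socket or file read is usually reported as RUNNABLE instead, because the JVM cannot see that the operating system has suspended it.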

Production debugging: When a service hangs, run kill -3 <pid> (JVM thread dump) or use jstack. Look at the number of BLOCKED threads. If dozens of threads are BLOCKED on the same lock, that is lock contention. If A is waiting for a lock held by B and B is waiting for a lock held by A, that is a deadlock.
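The three states described above can also be observed from code with Thread.getState(). The following is a minimal sketch, not a debugging tool: the class name ThreadStateDemo and its helper methods are made up for illustration, and the polling loop exists only so the demo does not depend on timing.

```java
class ThreadStateDemo {
    // Polls until the thread reaches the expected state or two seconds pass.
    static Thread.State awaitState(Thread t, Thread.State expected) throws InterruptedException {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.sleep(5);
        }
        return t.getState();
    }

    // Returns the observed states of a sleeping, a waiting and a lock-blocked thread.
    static Thread.State[] observe() throws InterruptedException {
        final Object monitor = new Object();   // lock the third thread competes for
        final Object condition = new Object(); // wait/notify target

        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        }, "sleeper");
        Thread waiter = new Thread(() -> {
            synchronized (condition) {
                try { condition.wait(); } catch (InterruptedException ignored) { }
            }
        }, "waiter");
        Thread blocked = new Thread(() -> {
            synchronized (monitor) { /* entered only after the caller releases it */ }
        }, "blocked");

        Thread.State[] states = new Thread.State[3];
        synchronized (monitor) {               // the caller holds the lock "blocked" wants
            sleeper.start();
            waiter.start();
            blocked.start();
            states[0] = awaitState(sleeper, Thread.State.TIMED_WAITING);
            states[1] = awaitState(waiter, Thread.State.WAITING);
            states[2] = awaitState(blocked, Thread.State.BLOCKED);
        }
        sleeper.interrupt();                   // end the demo threads
        waiter.interrupt();
        sleeper.join();
        waiter.join();
        blocked.join();
        return states;
    }

    public static void main(String[] args) throws InterruptedException {
        for (Thread.State s : observe()) {
            System.out.println(s);
        }
    }
}
```

In a thread dump the same three threads would appear with these states next to their names, which is what makes the BLOCKED count a quick contention signal.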

All Ways to Create Threads in Java

Java provides several thread creation mechanisms at different levels of abstraction. Using the wrong mechanism for the problem is a common source of performance bugs and memory issues.

Thread and Runnable: the foundation

// Option 1: Subclass Thread
// Problem: cannot subclass another class, harder to test, tightly coupled
class OrderProcessor extends Thread {
    @Override
    public void run() { processOrder(); }
}
new OrderProcessor().start();

// Option 2: Runnable - separates the task from the execution mechanism
Runnable task = () -> processOrder();
new Thread(task).start();

Why Runnable is better than a Thread subclass: Runnable is a pure task definition. You can pass the same Runnable to a Thread, an ExecutorService, or any other thread abstraction. A Thread subclass ties the task to the Thread class mechanism.

Both share a common problem: each call to new Thread().start() creates a new OS thread. Creating and destroying an OS thread is expensive compared with handing a task to an existing thread, and each thread reserves its own stack (typically around 1MB by default). At 1,000 requests/second, each creating one thread, you are creating 1,000 OS threads per second: memory runs out and the OS scheduler is overloaded.

Callable and Future: tasks with return values

Runnable.run() returns no value and cannot throw checked exceptions. Callable<V> solves both:

ExecutorService executor = Executors.newFixedThreadPool(10);

// Callable: returns a result and can throw Exception
Callable<ProductData> task = () -> {
    return externalApiClient.fetchProduct(productId); // may throw IOException
};

Future<ProductData> future = executor.submit(task);

// Do other work while the task runs asynchronously
doOtherWork();

// Get the result, blocking if not yet done
try {
    ProductData data = future.get(5, TimeUnit.SECONDS); // timeout prevents blocking forever
} catch (TimeoutException e) {
    future.cancel(true); // interrupt the task on timeout
    throw new ServiceUnavailableException("External API timeout");
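} catch (ExecutionException e) {
    throw new RuntimeException("Task failed", e.getCause());
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the flag so callers can see the interrupt
    throw new RuntimeException("Interrupted while waiting for the task", e);
}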

What happens when you use Future.get() without a timeout:

// Bug: get() without timeout
ProductData data = future.get(); // blocks forever if the external API hangs
// The user's HTTP request also blocks forever -> load balancer timeout
// Thread held in pool -> pool exhaustion -> entire service hangs

Trade-off: Future is a blocking model. future.get() holds a thread while waiting. With many parallel calls, you need many threads just to wait. For I/O-heavy workloads, CompletableFuture is more efficient.
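To make the timeout behavior concrete, here is a small self-contained sketch. The class FutureTimeoutDemo and the method fetchWithTimeout are illustrative names, not part of any library, and the demo creates a throwaway executor per call only to stay self-contained.

```java
import java.util.concurrent.*;

class FutureTimeoutDemo {
    // Returns the task's value, or the fallback if it does not finish within timeoutMillis.
    static String fetchWithTimeout(Callable<String> task, long timeoutMillis, String fallback)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);      // interrupt the worker so it stops waiting
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();   // demo only: real code would share a long-lived pool
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fetchWithTimeout(() -> "fast", 1_000, "fallback"));
        System.out.println(fetchWithTimeout(() -> {
            Thread.sleep(5_000);      // simulates a hanging external API
            return "slow";
        }, 100, "fallback"));
    }
}
```

The second call gives up after 100ms and returns the fallback, and cancel(true) interrupts the sleeping worker so the pooled thread is released instead of staying blocked.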

ScheduledExecutorService: periodic tasks

ScheduledExecutorService replaces Timer. Do not use Timer in production: if one task throws an exception it kills the entire Timer thread, stopping all scheduled tasks.

ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);

// Run once after a delay
scheduler.schedule(() -> sendReminder(order), 24, TimeUnit.HOURS);

// Run periodically with a fixed rate (period starts from when the previous run started)
scheduler.scheduleAtFixedRate(
    () -> syncInventory(),
    0,          // initial delay
    5,          // period
    TimeUnit.MINUTES
);

// Run periodically with a fixed delay (period starts from when the previous run ended)
// Safer when task duration is unpredictable
scheduler.scheduleWithFixedDelay(
    () -> cleanExpiredSessions(),
    0,
    10,
    TimeUnit.MINUTES
);

scheduleAtFixedRate vs scheduleWithFixedDelay: If the inventory sync task takes 6 minutes but the period is 5 minutes, scheduleAtFixedRate starts the next run immediately when the current one finishes (because it is already overdue). Runs never overlap, but the schedule keeps falling behind and the task runs back to back with no pause. scheduleWithFixedDelay always waits the full delay after a run ends (10 minutes for the session cleanup above), which is safer for tasks with variable duration.
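The difference is easiest to see as a timeline. The sketch below does not call the scheduler; it only models the documented start-time rules, assuming a zero initial delay and a constant task duration. ScheduleTimeline and its method names are hypothetical.

```java
import java.util.Arrays;

class ScheduleTimeline {
    // Start times under scheduleAtFixedRate: run n is due at n * period,
    // but never starts before the previous run has finished.
    static long[] fixedRateStarts(long period, long duration, int runs) {
        long[] starts = new long[runs];
        for (int n = 1; n < runs; n++) {
            long due = n * period;
            long previousEnd = starts[n - 1] + duration;
            starts[n] = Math.max(due, previousEnd);
        }
        return starts;
    }

    // Start times under scheduleWithFixedDelay: each run starts
    // `delay` after the previous run ended.
    static long[] fixedDelayStarts(long delay, long duration, int runs) {
        long[] starts = new long[runs];
        for (int n = 1; n < runs; n++) {
            starts[n] = starts[n - 1] + duration + delay;
        }
        return starts;
    }

    public static void main(String[] args) {
        // A 6-minute task on a 5-minute schedule, all times in minutes
        System.out.println(Arrays.toString(fixedRateStarts(5, 6, 4)));  // [0, 6, 12, 18]
        System.out.println(Arrays.toString(fixedDelayStarts(5, 6, 4))); // [0, 11, 22, 33]
    }
}
```

With a fixed rate the overdue task restarts the moment it finishes (0, 6, 12, 18), while a fixed delay always leaves five idle minutes between runs.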

Trade-off: ScheduledExecutorService is appropriate for scheduled tasks within a single JVM. For distributed scheduling across multiple instances, use Quartz, Spring @Scheduled with a distributed lock, or Kubernetes CronJobs.

CompletableFuture: async pipelines

CompletableFuture is the modern way to compose async operations without blocking a thread for each waiting step.

// Sequential: each step waits for the previous one - slow
ProductData product = productService.fetch(productId);     // 100ms
StockInfo stock     = inventoryService.check(productId);   // 80ms
PriceData price     = pricingService.calculate(productId); // 60ms
// Total: 240ms

// Parallel with CompletableFuture: run concurrently
CompletableFuture<ProductData> productFuture =
    CompletableFuture.supplyAsync(() -> productService.fetch(productId), executor);

CompletableFuture<StockInfo> stockFuture =
    CompletableFuture.supplyAsync(() -> inventoryService.check(productId), executor);

CompletableFuture<PriceData> priceFuture =
    CompletableFuture.supplyAsync(() -> pricingService.calculate(productId), executor);

// Wait for all to complete
CompletableFuture.allOf(productFuture, stockFuture, priceFuture).join();
// Total: ~100ms (limited by the slowest task)

ProductResponse response = new ProductResponse(
    productFuture.join(),
    stockFuture.join(),
    priceFuture.join()
);

Pipeline composition:

CompletableFuture<OrderConfirmation> pipeline =
    CompletableFuture.supplyAsync(() -> validateOrder(request), executor)
        .thenApplyAsync(valid -> enrichWithProductData(valid), executor)
        .thenApplyAsync(enriched -> calculateShipping(enriched), executor)
        .thenComposeAsync(order -> paymentService.charge(order), executor) // returns CF
        .thenApplyAsync(paid -> orderRepository.save(paid), executor)
        .exceptionally(ex -> {
            log.error("Order pipeline failed", ex);
            return OrderConfirmation.failed(ex.getMessage());
        });

A common trap: CompletableFuture.supplyAsync(task) without specifying an executor uses ForkJoinPool.commonPool(). This is a JVM-wide shared pool. A blocking task in it affects parallelStream() and every other CompletableFuture. Always pass an explicit executor:

// WRONG: uses common pool
CompletableFuture.supplyAsync(() -> blockingDbCall());

// CORRECT: uses a dedicated executor
CompletableFuture.supplyAsync(() -> blockingDbCall(), dbExecutor);
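A runnable sketch that combines the pieces above: three lookups on an explicit executor, combined without blocking between steps, with a fallback on failure. The fetch methods are hypothetical stand-ins for the product, stock and price services, and their return values are made up.

```java
import java.util.concurrent.*;

class AsyncFanOutDemo {
    // Hypothetical lookups standing in for remote services.
    static String fetchName(int id)  { return "product-" + id; }
    static int    fetchStock(int id) { return 7; }
    static double fetchPrice(int id) {
        if (id < 0) throw new IllegalArgumentException("unknown product " + id);
        return 19.5;
    }

    static String describe(int id, Executor executor) {
        CompletableFuture<String>  name  = CompletableFuture.supplyAsync(() -> fetchName(id), executor);
        CompletableFuture<Integer> stock = CompletableFuture.supplyAsync(() -> fetchStock(id), executor);
        CompletableFuture<Double>  price = CompletableFuture.supplyAsync(() -> fetchPrice(id), executor);

        return name
            .thenCombine(stock, (n, s) -> n + " stock=" + s)
            .thenCombine(price, (text, p) -> text + " price=" + p)
            .exceptionally(ex -> {
                // Failures from earlier stages usually arrive wrapped in CompletionException
                Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                        ? ex.getCause() : ex;
                return "failed: " + cause.getMessage();
            })
            .join();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            System.out.println(describe(42, executor)); // product-42 stock=7 price=19.5
            System.out.println(describe(-1, executor)); // failed: unknown product -1
        } finally {
            executor.shutdown();
        }
    }
}
```

Unwrapping the CompletionException matters in practice: logging ex.getMessage() directly tends to print the wrapper's class name in front of the real message.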

ForkJoinPool: work-stealing for divide and conquer

ForkJoinPool is designed for problems that can be split recursively. The key difference from ThreadPoolExecutor is work-stealing: when a thread finishes its own queue of tasks, it steals tasks from the tail of another busy thread’s queue. The result is that all CPUs stay busy with no thread sitting idle while others are overloaded.


RecursiveTask: parallel sum of a large array

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // split until small enough
    private final long[] data;
    private final int start, end;

    public SumTask(long[] data, int start, int end) {
        this.data = data; this.start = start; this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Base case: small enough, compute directly
            long sum = 0;
            for (int i = start; i < end; i++) sum += data[i];
            return sum;
        }
        // Recursive case: split in half
        int mid = start + (end - start) / 2; // avoids int overflow on very large indices
        SumTask left  = new SumTask(data, start, mid);
        SumTask right = new SumTask(data, mid, end);

        left.fork();                    // push left onto current thread's queue
        long rightResult = right.compute(); // run right directly on this thread
        long leftResult  = left.join(); // get left result (wait if not done)

        return leftResult + rightResult;
    }
}

// Usage
ForkJoinPool pool = new ForkJoinPool(); // default: number of CPU cores
long total = pool.invoke(new SumTask(data, 0, data.length));

Why left.fork() then right.compute() instead of forking both:

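If you fork both and then join, the current thread only queues work and pays the scheduling overhead for both halves before doing anything useful itself. By calling right.compute() directly, the current thread keeps working on one half while the other half is available for stealing. This is the standard ForkJoin pattern.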

parallelStream() uses ForkJoinPool under the hood:

// parallelStream() uses ForkJoinPool.commonPool()
List<OrderSummary> summaries = orders.parallelStream()
    .filter(o -> o.getStatus() == COMPLETED)
    .map(orderMapper::toSummary)
    .collect(Collectors.toList());

// PROBLEM: commonPool is shared across the entire JVM
// If a task in parallelStream() blocks (DB call, HTTP call),
// it occupies a commonPool worker and affects all CompletableFutures and other parallelStreams

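// Common workaround: start the stream from inside a dedicated ForkJoinPool.
// This relies on undocumented behavior (the stream's tasks run in the pool
// of the thread that invokes the terminal operation), so verify it on your JDK.
ForkJoinPool customPool = new ForkJoinPool(4);
List<OrderSummary> isolatedSummaries = customPool.submit(() ->
    orders.parallelStream()
        .filter(o -> o.getStatus() == COMPLETED)
        .map(orderMapper::toSummary)
        .collect(Collectors.toList())
).get(); // throws InterruptedException / ExecutionException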

When to use ForkJoinPool:

CPU-bound divide-and-conquer (merge sort, parallel aggregation, image processing)
Large datasets where each element is processed independently
parallelStream() on large collections with pure computation

When NOT to use ForkJoinPool:

I/O-bound tasks (database, HTTP): threads block without doing work, wasting workers
Tasks with order-dependent side effects
Small collections (fork/join overhead exceeds the benefit)

Trade-off: ForkJoinPool optimizes CPU utilization but is not suited for blocking I/O. With Java 21+, Virtual Threads handle I/O-bound problems better.

Virtual Threads (Java 21+): lightweight at scale

Virtual Threads come from Project Loom: a thread model that does not map 1-to-1 to OS threads. The JVM manages millions of virtual threads, multiplexing them onto a small number of OS threads called carrier threads.

// Create a virtual thread
Thread vt = Thread.ofVirtual().start(() -> processRequest(request));

// With ExecutorService
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    // Creates a virtual thread for each task - not as expensive as OS threads
    for (Order order : orders) {
        executor.submit(() -> processOrder(order)); // thousands of tasks, no problem
    }
} // automatically shuts down when try-with-resources exits

Why Virtual Threads were created:

With platform threads (OS threads), each thread reserves roughly 1MB of stack by default. A server with 10,000 concurrent requests would reserve about 10GB just for stacks. With Virtual Threads, the overhead per thread starts at a few KB, because their stacks live on the heap and grow as needed. You can have 100,000 virtual threads without worrying about memory.

More importantly: when a Virtual Thread blocks on I/O (database, HTTP), the JVM unmounts it from its carrier thread. The carrier thread is free to run another virtual thread. No thread sits idle waiting.

// With platform threads: pool must be sized to avoid thread exhaustion
ExecutorService platformExecutor = Executors.newFixedThreadPool(200);
// With 200 threads, at most 200 concurrent requests

// With virtual threads: thread-per-request model becomes viable again
ExecutorService virtualExecutor = Executors.newVirtualThreadPerTaskExecutor();
// Thousands of concurrent requests, each with its own virtual thread
// When blocked on I/O -> unmounted, carrier thread processes another request

Traps with Virtual Threads:

// WRONG on JDK 21-23: a synchronized block pins the virtual thread to its carrier thread
// When a virtual thread inside a synchronized block is blocked on I/O,
// it cannot unmount - the carrier thread is also blocked
// (JDK 24 removed this limitation with JEP 491)
synchronized (lock) {
    dbConnection.query(sql); // I/O block -> pins carrier thread!
}

// CORRECT: use ReentrantLock instead of synchronized
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    dbConnection.query(sql); // virtual thread unmounts when blocked on I/O
} finally {
    lock.unlock();
}

Trade-off: Virtual Threads excel at I/O-bound workloads. For CPU-bound work (tight loops, heavy computation), there is no benefit over platform threads. ForkJoinPool is still the right choice for CPU-bound work. Virtual Threads do not solve race conditions or deadlocks: the fundamental concurrency problems still apply.

Comparison table

Mechanism	When to use	Avoid when
`new Thread()`	Never in production	-
`ExecutorService` (fixed pool)	I/O-bound, controlled concurrency	Task scheduling or return values needed
`Callable + Future`	Single task needing a return value	Multiple async steps that need composing
`ScheduledExecutorService`	Periodic or delayed tasks	Distributed scheduling
`CompletableFuture`	Async pipelines, parallel calls	CPU-bound heavy computation
`ForkJoinPool`	CPU-bound divide and conquer	I/O-bound, small data
`parallelStream()`	Pure computation on large collections	I/O inside the stream operation
Virtual Thread (Java 21+)	I/O-bound, high concurrency	CPU-bound computation

Creating Threads: Never Use new Thread() in Production

Despite the many mechanisms available, the simplest rule is: never create threads manually in production code. Always use an executor to control the number of threads.

A real production bug:

// Bug: creates an unbounded number of threads for each incoming order event
@KafkaListener(topics = "order-events")
public void handleOrder(OrderEvent event) {
    new Thread(() -> {
        enrichWithExternalData(event); // HTTP call 500ms
        orderRepository.save(event);
    }).start(); // creates a new thread per event, unbounded
}

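When Kafka consumer lag spiked after a maintenance window, the service received thousands of events at once, each creating a thread. Memory was exhausted (with unbounded thread creation this typically surfaces as OutOfMemoryError: unable to create native thread) and the service crashed.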

Trade-off: Pooling threads with ExecutorService is the right approach, but the pool size must be tuned. Too few threads: low throughput. Too many: memory pressure and context-switching overhead. Pool sizing is covered in the ThreadPoolExecutor section below.

Race Condition: When Two Threads Collide

A race condition occurs when the result depends on the execution order of threads, and that order is not guaranteed.

What happens:

// Unprotected shared counter
public class OrderCounter {
    private int count = 0;

    public void increment() {
        count++; // NOT atomic! This is 3 steps: read, add, write
    }

    public int get() { return count; }
}

count++ looks like one operation but is actually: read the current value, add 1, write it back. Two threads working simultaneously:

Thread 1: reads count = 100
Thread 2: reads count = 100   <- reads before Thread 1 finishes writing
Thread 1: writes count = 101
Thread 2: writes count = 101  <- overwrites Thread 1's result

Result: two threads incremented but count only went from 100 to 101, not 102. With 1,000 threads each incrementing once from zero, the final result can be any number from 1 to 1,000.
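The lost update is easy to reproduce. The following sketch runs the same increments against a plain int and an AtomicInteger; LostUpdateDemo is an illustrative name, and the unsafe total varies from run to run, so no exact value is claimed for it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    static int unsafeCount = 0;
    static final AtomicInteger safeCount = new AtomicInteger();

    // Runs `threads` threads, each performing `perThread` increments on both counters.
    static void run(int threads, int perThread) throws InterruptedException {
        unsafeCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;               // read, add, write: updates can be lost
                    safeCount.incrementAndGet(); // one atomic step
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();                            // join makes the final values visible here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        run(8, 100_000);
        System.out.println("unsafe: " + unsafeCount + " (at most 800000, usually less)");
        System.out.println("atomic: " + safeCount.get());
    }
}
```

The atomic counter always ends at threads * perThread; the plain counter can only fall short of it, never exceed it.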

Production example: An order counter used to limit orders during a flash sale.

@Service
public class FlashSaleService {
    private int orderCount = 0; // race condition!
    private static final int MAX_ORDERS = 1000;

    public boolean tryPlaceOrder(Order order) {
        if (orderCount >= MAX_ORDERS) {
            return false; // sold out
        }
        orderCount++; // race condition here
        orderRepository.save(order);
        return true;
    }
}

Result: 1,200 orders were created when the limit was 1,000. Inventory went negative and the fulfillment team had to handle it manually.

The correct fix with AtomicInteger:

private final AtomicInteger orderCount = new AtomicInteger(0);

public boolean tryPlaceOrder(Order order) {
    // CAS loop: re-read and retry until the limit check and the increment
    // succeed together as one atomic step
    int current;
    do {
        current = orderCount.get();
        if (current >= MAX_ORDERS) return false; // sold out
    } while (!orderCount.compareAndSet(current, current + 1));

    orderRepository.save(order);
    return true;
}

// Alternative: reserve a slot first and hand it back if it is over the limit.
// Use one approach or the other; running both in one method counts each order twice.
public boolean tryPlaceOrderSimple(Order order) {
    int slot = orderCount.getAndIncrement();
    if (slot >= MAX_ORDERS) {
        orderCount.decrementAndGet(); // return the slot
        return false;
    }
    orderRepository.save(order);
    return true;
}

AtomicInteger uses CAS (Compare-And-Swap) at the CPU instruction level: no lock is taken, so threads never block each other. Under heavy write contention a failed CAS simply retries, which costs CPU time but cannot deadlock.

Trade-off: AtomicInteger works well for a single variable. When you need to update multiple variables atomically at the same time, CAS is not enough; you need a lock.
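A quick way to check that a CAS loop really enforces the limit is to hammer it from many threads. This sketch isolates the reservation logic from the flash-sale example; BoundedCounterDemo, tryReserve and simulate are hypothetical names, and the pool size of 16 is arbitrary.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedCounterDemo {
    static final int MAX_ORDERS = 1_000;
    static final AtomicInteger orderCount = new AtomicInteger();

    // Reserves one slot, or returns false once the limit has been reached.
    static boolean tryReserve() {
        int current;
        do {
            current = orderCount.get();
            if (current >= MAX_ORDERS) return false;
        } while (!orderCount.compareAndSet(current, current + 1));
        return true;
    }

    // Fires `attempts` concurrent reservations and returns how many succeeded.
    static int simulate(int attempts) throws InterruptedException {
        orderCount.set(0);
        AtomicInteger accepted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < attempts; i++) {
            pool.execute(() -> {
                if (tryReserve()) accepted.incrementAndGet();
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pool did not finish in time");
        }
        return accepted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate(5_000)); // 1000
    }
}
```

However the threads interleave, exactly MAX_ORDERS reservations succeed, which is the property the unprotected int version could not provide.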

synchronized: Protecting a Critical Section

synchronized places a lock (monitor) on an object. Only one thread can hold the lock at a time. Other threads must wait in the BLOCKED state.

Why it exists: AtomicInteger is sufficient for one variable, but when multiple variables must be updated consistently together, you need to ensure no other thread sees a “halfway” state in the middle.

What happens without it:

// Transfer money between two accounts - must be atomic!
public class BankAccount {
    private int balance;

    public void transfer(BankAccount target, int amount) {
        this.balance -= amount;
        // If another thread reads both balances here, it sees the money already deducted
        // but not yet added to target. The total money in the system appears to have shrunk.
        target.balance += amount;
    }
}

Production example: A payment service processing a refund concurrently with a purchase.

public class WalletService {
    private final Map<String, Long> balances = new HashMap<>();

    // synchronized on method: lock is 'this' (the WalletService instance)
    public synchronized boolean transfer(String fromId, String toId, long amount) {
        long fromBalance = balances.getOrDefault(fromId, 0L);
        if (fromBalance < amount) return false;

        balances.put(fromId, fromBalance - amount);
        balances.put(toId, balances.getOrDefault(toId, 0L) + amount);
        return true;
    }

    // The problem: locking the whole method means only one transfer runs at a time
    // Even transfers between completely different pairs of accounts must queue up
}

The problem: synchronized on a method locks this. Every transfer serializes, including ones with no relationship to each other. Throughput is severely limited.

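A synchronized block instead of a synchronized method narrows the critical section to the code that touches shared state. Note that locking on balances still serializes every transfer; the gain is only that work which needs no lock can move outside it: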

public boolean transfer(String fromId, String toId, long amount) {
    // Lock only when actually modifying state
    synchronized (balances) {
        long fromBalance = balances.getOrDefault(fromId, 0L);
        if (fromBalance < amount) return false;
        balances.put(fromId, fromBalance - amount);
        balances.put(toId, balances.getOrDefault(toId, 0L) + amount);
        return true;
    }
    // Operations that do not need locking (logging, audit trail) can go outside
}

Trade-off: synchronized is blocking. A blocked thread uses no CPU, but it still occupies its stack memory and its slot in the thread pool while doing nothing. Under workloads where many threads compete for the same lock, throughput is severely constrained. If you see many BLOCKED threads in a thread dump competing for the same lock, that is a signal to redesign.

volatile: Visibility Between Threads

volatile solves a different problem from synchronized: visibility. Each CPU core has its own L1/L2 cache. When thread A writes to a variable, the value may sit in core A’s cache and thread B (running on a different core) may not see it.

What happens without it:

public class DataProcessor {
    private boolean running = true; // no volatile

    public void process() {
        while (running) {   // Thread A reads 'running' from CPU cache
            doWork();
        }
    }

    public void stop() {
        running = false;    // Thread B writes to RAM, but Thread A
                            // still sees running = true from its cache
                            // -> the loop never stops
    }
}

The JVM is allowed to optimize by keeping running in a register. The thread calling stop() writes false to memory, but the thread in process() may never re-read it. Once the JIT compiler has optimized the loop, it can hoist the read out of it, effectively turning while (running) into while (true).

Fix:

private volatile boolean running = true;

volatile guarantees two things:

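Visibility: a write to a volatile variable is visible to every thread that reads the variable afterwards. Conceptually, the write goes straight to main memory and the read comes straight from it.
Ordering: everything a thread wrote before the volatile write is also visible to a thread that has read the new value, because the compiler and CPU may not reorder those accesses across it.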

Production example: Graceful shutdown in a service.

@Component
public class EventProcessor implements DisposableBean {
    private volatile boolean shutdownRequested = false;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    @PostConstruct
    public void start() {
        executor.submit(() -> {
            while (!shutdownRequested) {
                try {
                    Event event = eventQueue.poll(100, TimeUnit.MILLISECONDS);
                    if (event != null) processEvent(event);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            log.info("EventProcessor stopped cleanly");
        });
    }

    @Override
    public void destroy() {
        shutdownRequested = true; // visible to the processing thread immediately
        executor.shutdown();
    }
}

The critical trade-off: volatile guarantees visibility but does NOT guarantee atomicity. volatile int count; count++ is still a race condition because count++ is three steps. Use volatile when: one thread writes, multiple threads read. Use AtomicInteger or synchronized when multiple threads write.
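The stop-flag case can be checked with a short experiment. StopFlagDemo and stopsPromptly are illustrative names; the 100ms warm-up is an arbitrary choice that gives the JIT time to compile the loop.

```java
class StopFlagDemo {
    // volatile: the worker is guaranteed to see the write made by stop()
    private volatile boolean running = true;
    private long iterations = 0; // touched only by the worker thread

    void process() {
        while (running) {
            iterations++;
        }
    }

    void stop() {
        running = false;
    }

    // Starts a worker, requests a stop, and reports whether the worker exited within one second.
    static boolean stopsPromptly() throws InterruptedException {
        StopFlagDemo demo = new StopFlagDemo();
        Thread worker = new Thread(demo::process, "worker");
        worker.setDaemon(true);  // never keeps the JVM alive if something goes wrong
        worker.start();
        Thread.sleep(100);       // let the loop run long enough to be JIT-compiled
        demo.stop();
        worker.join(1_000);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopped: " + stopsPromptly());
    }
}
```

Removing the volatile keyword makes the outcome JVM-dependent: on many setups the worker keeps spinning and stopsPromptly() returns false, but that failure is not guaranteed either, which is exactly why such bugs are hard to reproduce.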

Deadlock: When Two Threads Wait for Each Other Forever

Deadlock occurs when thread A holds lock X and waits for lock Y, while thread B holds lock Y and waits for lock X. Both wait for each other indefinitely.


A real production bug: transfer between two accounts

public class DeadlockExample {

    public void transfer(Account from, Account to, long amount) {
        synchronized (from) {              // Thread 1: locks account A
            synchronized (to) {            // Thread 1: waits for lock on account B
                from.debit(amount);
                to.credit(amount);
            }
        }
    }
}

// Thread 1: transfer(accountA, accountB, 100)  -> locks A, waits for B
// Thread 2: transfer(accountB, accountA, 200)  -> locks B, waits for A
// Both wait for each other forever

Fix: lock ordering - always acquire locks in a consistent order

public void transfer(Account from, Account to, long amount) {
    // Use System.identityHashCode to determine a consistent lock order
    Account first  = System.identityHashCode(from) < System.identityHashCode(to) ? from : to;
    Account second = first == from ? to : from;

    synchronized (first) {
        synchronized (second) {
            from.debit(amount);
            to.credit(amount);
        }
    }
}

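Now every thread acquires the two locks in the same order (the account with the smaller identity hash code first). Thread 1 and Thread 2 both try to lock the same account first: one wins, the other waits, and the deadlock cannot occur. One caveat: identity hash codes are not guaranteed to be unique, so two accounts can tie; a unique account ID, or an extra tie-breaking lock, gives a strict order.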
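As a variation on the same idea, the sketch below orders locks by a unique account ID instead of the identity hash code. It assumes every account has such an ID; the Account class here is a stripped-down stand-in, and stress is a hypothetical helper that runs opposite transfers concurrently.

```java
class OrderedTransferDemo {
    static final class Account {
        final long id;   // assumed unique, so it defines a strict lock order
        long balance;    // guarded by this account's monitor
        Account(long id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    static void transfer(Account from, Account to, long amount) {
        if (from == to) return;                       // nothing to move
        Account first  = from.id < to.id ? from : to; // lower ID is always locked first
        Account second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance   += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the combined balance afterwards.
    static long stress(int rounds) throws InterruptedException {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return a.balance + b.balance;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(100_000)); // 2000
    }
}
```

Both threads finish and the combined balance is unchanged. Swapping the two synchronized blocks to lock from and to in argument order would bring back the deadlock from the earlier example.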

Ways to prevent deadlocks:

Lock ordering: Always acquire multiple locks in a consistent order.
Lock timeout with tryLock: Use ReentrantLock.tryLock(timeout) instead of waiting indefinitely.
Minimize lock scope: Hold locks for the shortest time possible; do not call external code while holding a lock.
Use higher-level abstractions: ConcurrentHashMap, BlockingQueue instead of managing locks manually.

Trade-off: Lock ordering solves deadlocks but requires strict discipline. A new developer who does not know the convention will break it. ReentrantLock.tryLock() is safer but requires handling the case where the lock is not acquired.

ReentrantLock: When synchronized Is Not Enough

ReentrantLock provides everything synchronized does, plus:

tryLock(timeout): Try to acquire the lock; give up after the timeout (prevents deadlock).
lockInterruptibly(): A thread waiting for the lock can be interrupted.
Fair lock: The thread that has waited longest gets priority (prevents starvation).
Multiple conditions: Multiple Condition objects for the same lock.

Production example: payment with timeout to prevent deadlock

public class PaymentService {
    private final ReentrantLock lock = new ReentrantLock();

    public boolean processPayment(PaymentRequest request) throws InterruptedException {
        // Try to acquire the lock for up to 2 seconds
        if (!lock.tryLock(2, TimeUnit.SECONDS)) {
            // Could not acquire lock after 2 seconds -> reject instead of waiting forever
            log.warn("Payment processing timeout for order {}", request.getOrderId());
            throw new PaymentTimeoutException("System busy, please retry");
        }

        try {
            return doProcessPayment(request);
        } finally {
            lock.unlock(); // ALWAYS unlock in finally
        }
    }
}

Trade-off: ReentrantLock is more complex than synchronized and easy to misuse if you forget to unlock in a finally block. Use synchronized when it is sufficient. Switch to ReentrantLock only when you need tryLock, interruptible locking, or fair ordering.
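The timeout path is worth exercising once to see what the caller observes. In this sketch a second thread calls tryLock while the first thread holds the lock; TryLockDemo, tryWork and attemptFromOtherThread are made-up names, and the 100ms wait is arbitrary.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
    static final ReentrantLock lock = new ReentrantLock();

    // Attempts the guarded work; returns false if the lock stays busy for waitMillis.
    static boolean tryWork(long waitMillis) throws InterruptedException {
        if (!lock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
            return false;              // gave up instead of waiting forever
        }
        try {
            return true;               // the guarded work would go here
        } finally {
            lock.unlock();             // always released, even if the work throws
        }
    }

    // Calls tryWork from a second thread, optionally while the calling thread holds the lock.
    static boolean attemptFromOtherThread(boolean holdLock) throws InterruptedException {
        boolean[] result = new boolean[1];
        Thread other = new Thread(() -> {
            try {
                result[0] = tryWork(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        if (holdLock) lock.lock();
        try {
            other.start();
            other.join();              // the other thread times out while we still hold the lock
        } finally {
            if (holdLock) lock.unlock();
        }
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(attemptFromOtherThread(true));  // false
        System.out.println(attemptFromOtherThread(false)); // true
    }
}
```

With synchronized the second thread would have stayed BLOCKED until the lock was released; here it gets a clear false after 100ms and can reject or retry.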

ThreadPoolExecutor: Configuration

Executors.newFixedThreadPool(n) is a convenient shortcut but hides important parameters. In production, you need to understand the full ThreadPoolExecutor.


ThreadPoolExecutor executor = new ThreadPoolExecutor(
    10,                          // corePoolSize: threads always alive
    50,                          // maximumPoolSize: absolute upper limit
    60L, TimeUnit.SECONDS,       // keepAliveTime: how long extra threads stay alive
    new LinkedBlockingQueue<>(500), // workQueue: limit of 500 pending tasks
    new ThreadPoolExecutor.CallerRunsPolicy() // rejection: caller handles the task itself
);

Why each parameter matters:

When a task is submitted:

If running threads < corePoolSize: create a new thread immediately, even if idle threads exist.
If running threads >= corePoolSize: put the task in the queue.
If queue is full AND running threads < maximumPoolSize: create an additional thread.
If queue is full AND maximumPoolSize is reached: invoke the RejectedExecutionHandler.
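The four rules above can be observed on a deliberately tiny pool. The sketch uses core = 1, max = 2 and a queue of one slot, with tasks that block on a latch so nothing finishes early; PoolGrowthDemo and submitFour are illustrative names.

```java
import java.util.concurrent.*;

class PoolGrowthDemo {
    // Submits four blocking tasks to a pool with core=1, max=2, queue capacity=1
    // and returns {poolSize, queuedTasks, rejectedTasks}.
    static int[] submitFour() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 2, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1));
        Runnable blocker = () -> {
            try {
                release.await();       // keeps the worker busy until the snapshot is taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        int rejected = 0;
        for (int i = 0; i < 4; i++) {
            try {
                pool.execute(blocker); // 1: core thread, 2: queued, 3: extra thread, 4: rejected
            } catch (RejectedExecutionException e) {
                rejected++;            // default AbortPolicy throws on rejection
            }
        }
        int[] snapshot = { pool.getPoolSize(), pool.getQueue().size(), rejected };

        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] s = submitFour();
        System.out.println("threads=" + s[0] + " queued=" + s[1] + " rejected=" + s[2]);
        // threads=2 queued=1 rejected=1
    }
}
```

Note the order: the second task is queued before a second thread is created. With a large queue, the threads between corePoolSize and maximumPoolSize only appear once that queue is completely full.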

What happens with Executors.newFixedThreadPool(n):

// Executors.newFixedThreadPool source code:
return new ThreadPoolExecutor(n, n, 0L, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue<Runnable>()); // NO BOUND!

The queue is unbounded. If workers cannot keep up, tasks accumulate in the queue without limit until the JVM runs out of memory. This is a well-known cause of OOM errors in Java services.

Production configuration for a REST API service:

@Bean
public ThreadPoolExecutor orderProcessingExecutor() {
    int cpuCores = Runtime.getRuntime().availableProcessors();

    return new ThreadPoolExecutor(
        cpuCores * 2,          // core: I/O-bound tasks typically benefit from 2x cores
        cpuCores * 4,          // max: burst capacity
        30L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(1000), // bounded: prevents OOM
        new ThreadFactory() {
            private final AtomicInteger count = new AtomicInteger();
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "order-worker-" + count.incrementAndGet());
                t.setDaemon(false); // non-daemon: JVM does not exit while tasks remain
                return t;
            }
        },
        (task, executor) -> {
            // Rejection: log and throw so the caller knows and can retry
            log.error("Order processing queue full, rejecting task");
            throw new RejectedExecutionException("Order processing at capacity");
        }
    );
}

Queue type trade-offs:

LinkedBlockingQueue(n): bounded buffer, tasks wait in line. Appropriate when higher latency during busy periods is acceptable.

SynchronousQueue: no buffer, each submit must have a ready thread. Low latency, but requires a large enough maximumPoolSize or rejections occur frequently.

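ArrayBlockingQueue(n): similar to LinkedBlockingQueue but backed by a pre-allocated array, so it creates no per-task node objects and is more cache-friendly. It uses a single lock for both producers and consumers, so measure before assuming it is faster under contention.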

How to size the pool:

For I/O-bound tasks (database, HTTP calls): poolSize = N * (1 + waitTime/serviceTime), where N is the CPU core count. If a call waits 200ms on I/O and spends 10ms computing, the multiplier is 1 + 200/10 = 21. With 4 cores: 4 * 21 = 84 threads is a reasonable starting point, to be confirmed by load testing. For I/O-bound tasks, more threads than CPU cores is correct.

For CPU-bound tasks (computation, compression): poolSize = N + 1 (N = CPU core count). More threads do not help; they only add context-switching cost.
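Both sizing rules are simple enough to encode as helpers. This is a sketch of the arithmetic only; PoolSizing, ioBound and cpuBound are hypothetical names, and the result is a starting estimate rather than a tuned value.

```java
class PoolSizing {
    // I/O-bound estimate: cores * (1 + waitTime / serviceTime), rounded up
    static int ioBound(int cores, double waitMillis, double serviceMillis) {
        return (int) Math.ceil(cores * (1 + waitMillis / serviceMillis));
    }

    // CPU-bound estimate: one thread per core plus one spare
    static int cpuBound(int cores) {
        return cores + 1;
    }

    public static void main(String[] args) {
        System.out.println(ioBound(4, 200, 10)); // 84
        System.out.println(cpuBound(4));         // 5
    }
}
```

In a real service the core count would come from Runtime.getRuntime().availableProcessors(), and the wait and service times from measurements, not guesses.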

ThreadLocal: Per-Thread Private State

ThreadLocal<T> gives each thread its own copy of a variable. No synchronization is needed because each thread only sees and modifies its own copy.

Why it exists: Some objects are not thread-safe but are expensive to create on every use. Or you need to pass context (user info, request ID, DB transaction) through many layers without threading it through every method parameter.

Production example: request context tracking

// RequestContext.java
public class RequestContext {
    private static final ThreadLocal<RequestContext> CONTEXT =
        ThreadLocal.withInitial(RequestContext::new);

    private String requestId;
    private String userId;
    private long startTime;

    public static RequestContext current() { return CONTEXT.get(); }

    public static void clear() { CONTEXT.remove(); } // IMPORTANT
}

// RequestContextFilter.java (Servlet Filter)
public class RequestContextFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        try {
            RequestContext ctx = RequestContext.current();
            ctx.setRequestId(UUID.randomUUID().toString());
            ctx.setUserId(extractUserId((HttpServletRequest) req));
            ctx.setStartTime(System.currentTimeMillis());

            chain.doFilter(req, resp);
        } finally {
            RequestContext.clear(); // REQUIRED to prevent memory leaks in thread pools
        }
    }
}

// Usable anywhere in the call stack without passing parameters
@Service
public class AuditService {
    public void logAction(String action) {
        String requestId = RequestContext.current().getRequestId(); // no injection needed
        log.info("[{}] Action: {}", requestId, action);
    }
}

Memory leak with ThreadLocal in a thread pool:

// BUG: not clearing ThreadLocal in a thread pool
ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, ...);

executor.submit(() -> {
    RequestContext.current().setUserId("user-42");
    processRequest();
    // Forgot to call RequestContext.clear()!
    // Thread returns to pool with userId = "user-42" still in its ThreadLocal
    // The next request on this thread reads the previous request's userId
});

Thread pools reuse threads. Without remove(), the next task that runs on the same thread sees stale data from the previous task. In a security context this means request A sees user B’s information. It is also a memory leak: the value stays strongly reachable from the pooled thread's internal ThreadLocal map for as long as that thread lives, so it cannot be garbage collected.

Rule: ThreadLocal always pairs with try-finally { ThreadLocal.remove() }.
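
The stale-data effect and the try-finally rule can be reproduced without any framework. The sketch below uses a one-thread pool so that both tasks are guaranteed to share a worker; the class name StaleThreadLocalDemo and the cleanUp flag are illustrative assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StaleThreadLocalDemo {

    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    /** Runs two tasks on a one-thread pool and returns what the second task reads. */
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                USER.set("user-42");
                try {
                    // ... handle the request ...
                } finally {
                    if (cleanUp) {
                        USER.remove(); // the rule: always remove in finally
                    }
                }
            };
            pool.submit(firstRequest).get(); // wait until the first task is done

            Callable<String> secondRequest = USER::get; // never sets a user itself
            return pool.submit(secondRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondTaskSees(false));
        System.out.println("with remove():    " + secondTaskSees(true));
    }
}
```

Without cleanup the second task reads "user-42" even though it never set it; with cleanup it reads null.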

Trade-off: ThreadLocal hides dependencies. Code calling RequestContext.current() implicitly depends on a Filter having set the context, but this does not appear in the method signature. It is harder to test (you must set up ThreadLocal in the test). Use ThreadLocal for genuine cross-cutting concerns (logging, tracing), not for business logic.

Context Propagation in Spring Boot and Quarkus

Understanding threads is not enough. In practice you work inside a framework that manages threads for you. The problem is not how to create a thread; it is how to ensure that important metadata (security identity, request ID, transaction, CDI scope) is not lost when code moves to a different thread.

Spring Boot: context lost when using @Async

When a method is annotated with @Async, Spring executes it on a separate thread pool. That thread does not automatically inherit any context from the calling thread, because the context holders involved (SecurityContextHolder, MDC, RequestContextHolder) store their state in ThreadLocals by default.

What happens:

@Service
public class OrderService {

    @Async
    public CompletableFuture<Void> sendConfirmationAsync(Order order) {
        // SecurityContextHolder.getContext().getAuthentication() -> null
        // MDC.get("requestId") -> null
        // RequestContextHolder.getRequestAttributes() -> null
        String userId = getCurrentUserId(); // NullPointerException
        auditService.log(userId, "ORDER_CONFIRMED", order.getId());
        return CompletableFuture.completedFuture(null);
    }
}

The thread in the pool does not know which request is being processed, who is logged in, or what the requestId for logging is. Result: audit log missing userId, distributed trace loses requestId, security checks inside the async method throw NullPointerException.

Fix: TaskDecorator to copy context before dispatch

TaskDecorator is a Spring hook that lets you wrap each task before it runs on the thread pool. Use it to capture context from the calling thread and restore it on the worker thread:

public class ContextCopyingTaskDecorator implements TaskDecorator {

    @Override
    public Runnable decorate(Runnable task) {
        // Runs on the calling thread: capture current context
        SecurityContext securityCtx = SecurityContextHolder.getContext();
        Map<String, String> mdcCtx   = MDC.getCopyOfContextMap();

        return () -> {
            // Runs on the worker thread: restore context
            SecurityContextHolder.setContext(securityCtx);
            if (mdcCtx != null) MDC.setContextMap(mdcCtx);
            try {
                task.run();
            } finally {
                SecurityContextHolder.clearContext(); // required: thread returns to pool
                MDC.clear();
            }
        };
    }
}

@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {

    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("async-worker-");
        executor.setTaskDecorator(new ContextCopyingTaskDecorator());
        executor.initialize();
        return executor;
    }
}

Now every @Async method automatically receives the correct SecurityContext and MDC from the original request without changing any business logic.

Why not use MODE_INHERITABLETHREADLOCAL:

Spring Security can be configured to automatically inherit context into child threads:

SecurityContextHolder.setStrategyName(SecurityContextHolder.MODE_INHERITABLETHREADLOCAL);

This looks simpler, but it is dangerous with thread pools. InheritableThreadLocal copies context once, when a thread is created, not each time a task is submitted. Pool threads are created on demand and then reused, so a worker keeps whatever context the thread that triggered its creation had at that moment: null if it was created outside a request, or, worse, the identity of an earlier, unrelated request. TaskDecorator copies at task submission time, which is the correct behavior.
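
The inheritance timing can be seen with plain JDK classes. This is a minimal sketch that simulates two requests from a single calling thread; InheritableStaleDemo and the user names are illustrative assumptions, and the real Spring Security strategy behaves the same way only insofar as it relies on InheritableThreadLocal.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class InheritableStaleDemo {

    private static final InheritableThreadLocal<String> USER = new InheritableThreadLocal<>();

    /** Returns what the pooled worker observed for "request 1" and "request 2". */
    static List<String> run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Callable<String> readUser = USER::get;
        try {
            USER.set("alice");                           // request 1 on the calling thread
            String first = pool.submit(readUser).get();  // worker is created now, inherits "alice"

            USER.set("bob");                             // request 2 on the calling thread
            String second = pool.submit(readUser).get(); // reused worker: no new inheritance

            return List.of(first, second);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            USER.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The second task runs on behalf of "bob" but still sees "alice", because the worker copied its value only once, at creation.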

Production example: order audit with full context

@Service
public class OrderProcessingService {

    @Async("orderExecutor")
    public CompletableFuture<OrderResult> processAsync(Order order) {
        // SecurityContext and MDC have been propagated via TaskDecorator
        // (assumes the "orderExecutor" bean is also configured with ContextCopyingTaskDecorator)
        String requestId = MDC.get("requestId"); // has value from the original request
        String userId    = SecurityContextHolder.getContext()
                               .getAuthentication().getName();

        log.info("[{}] Processing order {} for user {}",
            requestId, order.getId(), userId);

        OrderResult result = doHeavyProcessing(order);

        auditRepository.save(new AuditEntry(userId, "ORDER_PROCESSED",
            order.getId(), requestId));

        return CompletableFuture.completedFuture(result);
    }
}

Trade-off: the decorator above does not copy anything; it hands the same SecurityContext instance (and therefore the same Authentication object) to the worker thread. If either thread mutates it, for example by calling setAuthentication(...) on the shared context, the other thread sees the change. In practice Authentication objects are rarely modified after login, so this is usually not a problem, but it is a convention rather than a guarantee: the Authentication interface itself exposes setAuthenticated(...). For custom context objects, ensure they are immutable or perform a deep copy.
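
For a custom context object, immutability with a defensive copy might look like the following sketch. TraceContext and its fields are hypothetical and not part of Spring; the point is only that a snapshot handed to a worker thread cannot be changed by either side afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class TraceContext {

    private final String requestId;
    private final Map<String, String> tags;

    TraceContext(String requestId, Map<String, String> tags) {
        this.requestId = requestId;
        // Defensive copy: later changes to the caller's map do not leak into this snapshot.
        this.tags = Collections.unmodifiableMap(new HashMap<>(tags));
    }

    String requestId() {
        return requestId;
    }

    /** Read-only view; put() on the returned map throws UnsupportedOperationException. */
    Map<String, String> tags() {
        return tags;
    }

    /** "Modification" produces a new snapshot and leaves this one untouched. */
    TraceContext withTag(String key, String value) {
        Map<String, String> copy = new HashMap<>(tags);
        copy.put(key, value);
        return new TraceContext(requestId, copy);
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("tenant", "acme");
        TraceContext ctx = new TraceContext("req-1", source);
        source.put("tenant", "changed-later");
        System.out.println(ctx.tags()); // still reflects the state at capture time
    }
}
```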

Quarkus: a fundamentally different thread model

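Quarkus uses the Vert.x reactive engine, which creates two types of threads with distinct roles: event loop (I/O) threads, a small set that handles many connections and must never block, and worker threads, a larger pool intended for blocking work such as JDBC calls.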

Never block the event loop thread:

// WRONG in Quarkus: blocking the event loop
@GET
@Path("/orders/{id}")
public Order getOrder(String id) {
    return orderRepository.findById(id).await().indefinitely(); // blocking!
    // event loop is blocked -> the entire server stops accepting requests
}

// CORRECT: annotate blocking methods
@GET
@Path("/orders/{id}")
@Blocking  // Quarkus automatically dispatches to a worker thread
public Order getOrder(String id) {
    return orderRepository.findById(id); // JDBC blocking, fine on a worker thread
}
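
Why one blocking call stalls everything can be illustrated with JDK executors alone. The sketch below is a simulation, not Vert.x: a single-threaded executor stands in for one event loop, a cached pool stands in for the worker pool, and a latch stands in for a slow JDBC call. All names (EventLoopSketch, secondEventHandled) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class EventLoopSketch {

    /** Returns true if a second event is handled while the first request's slow call is still pending. */
    static boolean secondEventHandled(boolean offload) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor(); // stand-in event loop
        ExecutorService workers = Executors.newCachedThreadPool();       // stand-in worker pool
        CountDownLatch slowCallDone = new CountDownLatch(1);
        CountDownLatch secondHandled = new CountDownLatch(1);

        Runnable slowCall = () -> {
            try {
                slowCallDone.await(); // simulates a blocking database call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Either block the event loop directly, or hand the slow call to a worker.
        Runnable firstRequest = offload ? () -> workers.execute(slowCall) : slowCall;

        try {
            eventLoop.execute(firstRequest);
            eventLoop.execute(secondHandled::countDown); // the next incoming event
            return secondHandled.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            slowCallDone.countDown(); // let the simulated call finish
            eventLoop.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking on event loop: " + secondEventHandled(false));
        System.out.println("offloaded to worker:    " + secondEventHandled(true));
    }
}
```

When the slow call runs on the event loop, the second event is not handled until the call returns; when it is offloaded, the event loop stays free. This is the effect that dispatching to a worker thread is meant to achieve.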

ManagedExecutorService instead of ExecutorService:

In Quarkus (and Jakarta EE), when you need to run background tasks, use ManagedExecutorService instead of creating an ExecutorService yourself. The core difference: ManagedExecutorService automatically propagates CDI context, security identity, and transaction context into the task.

@ApplicationScoped
public class BackgroundJobService {

    @Inject
    ManagedExecutorService executor; // injected, not manually created

    @Inject
    SecurityIdentity identity;

    public void scheduleInventoryUpdate(List<Product> products) {
        executor.submit(() -> {
            // CDI context is available - no manual setup needed
            String currentUser = identity.getPrincipal().getName(); // works
            products.forEach(inventoryService::update); // @Transactional works
        });
    }
}

Compared to new ThreadPoolExecutor(): tasks running in a manually created executor have no CDI context. Calling an @ApplicationScoped bean from within them works (because ApplicationScoped does not need request scope), but calling a @RequestScoped bean throws ContextNotActiveException.

Manually activating request scope when needed:

@ApplicationScoped
public class ReportService {

    @Inject
    ManagedExecutorService executor;

    public void generateReportAsync(ReportRequest request) {
        executor.submit(() -> {
            // Request scope is not active in background threads by default.
            // Activate it manually through ArC's ManagedContext
            // (io.quarkus.arc.Arc / io.quarkus.arc.ManagedContext);
            // InjectableContext itself has no activate()/terminate() methods.
            ManagedContext requestContext = Arc.container().requestContext();
            requestContext.activate();
            try {
                doGenerateReport(request); // code uses @RequestScoped beans
            } finally {
                requestContext.terminate(); // destroys request-scoped beans and deactivates
            }
        });
    }
}

Reactive pipeline with Mutiny:

Quarkus encourages Mutiny for async/reactive code. Context propagation is handled automatically by SmallRye when switching between threads in a pipeline:

@GET
@Path("/orders/process")
public Uni<OrderResult> processOrder(@Valid OrderRequest request) {
    return Uni.createFrom().item(request)
        .onItem().transformToUni(req ->
            // Still on event loop: validate, lightweight
            Uni.createFrom().item(validateAndEnrich(req))
        )
        .emitOn(executor)                    // switch to worker thread
        .onItem().transform(enriched -> {
            // On worker thread: blocking DB call
            // SecurityIdentity, CDI context propagated by SmallRye
            return orderRepository.save(enriched);
        })
        .emitOn(Infrastructure.getDefaultWorkerPool())
        .onItem().transform(saved -> {
            notificationService.sendConfirmation(saved);
            return new OrderResult(saved.getId());
        });
}

Summary: Spring Boot vs Quarkus

Aspect	Spring Boot	Quarkus
Context propagation by default	No (TaskDecorator required)	Yes (ManagedExecutorService)
Security context	Manual copy needed	Propagated automatically
MDC propagation	Manual copy needed	Manual or via filter
Background tasks	`@Async` + TaskDecorator	`ManagedExecutorService`
Blocking operations	Any thread	Worker thread, avoid event loop
CDI request scope in async	Requires setup	`requestContext.activate()`
Reactive pipeline	`CompletableFuture`, Reactor	Mutiny + SmallRye propagation

Summary

Problem	Cause	Right tool
I/O bottleneck	Single-threaded	Thread pool
Race condition	Unprotected shared state	`AtomicInteger`, `synchronized`, `ReentrantLock`
Visibility	CPU cache, JIT optimization	`volatile`
Deadlock	Circular lock dependency	Lock ordering, `tryLock(timeout)`
OOM from thread pool	Unbounded queue	`LinkedBlockingQueue(n)`
Memory leak	ThreadLocal not cleared	`try-finally { ThreadLocal.remove() }`
Context lost in @Async	ThreadLocal not propagated	`TaskDecorator` (Spring), `ManagedExecutorService` (Quarkus)

Mental model: A thread is a unit of execution, not a unit of isolation. Two threads in the same JVM share the entire heap. Everything you put in an object’s field is potential shared state. The first question when writing concurrent code: “is this variable accessed from multiple threads, and does at least one of them write to it?” If yes, a protection mechanism is needed. If no, none is needed.

Concurrency is not about using the right API. It is about designing data flow to minimize shared mutable state. The code easiest to reason about for concurrency is code with no shared mutable state: use immutable objects, local variables, and message passing instead of shared memory whenever possible.
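
As one possible illustration of message passing with immutable objects, the sketch below lets a producer hand orders to a consumer through a bounded BlockingQueue instead of sharing a mutable total. The names (MessagePassingSketch, OrderPlaced, the POISON end marker) are hypothetical choices for this example.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class MessagePassingSketch {

    /** Immutable message: safe to hand from one thread to another. */
    static final class OrderPlaced {
        final String orderId;
        final long amountCents;

        OrderPlaced(String orderId, long amountCents) {
            this.orderId = orderId;
            this.amountCents = amountCents;
        }
    }

    private static final OrderPlaced POISON = new OrderPlaced("", 0); // end-of-stream marker

    /** Sums order amounts on a consumer thread; the queue is the only shared object. */
    static long totalCents(List<OrderPlaced> orders) {
        BlockingQueue<OrderPlaced> mailbox = new LinkedBlockingQueue<>(16);
        long[] total = new long[1]; // written only by the consumer, read after join()

        Thread consumer = new Thread(() -> {
            try {
                for (OrderPlaced m = mailbox.take(); m != POISON; m = mailbox.take()) {
                    total[0] += m.amountCents;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (OrderPlaced order : orders) {
                mailbox.put(order); // blocks when the mailbox is full (back-pressure)
            }
            mailbox.put(POISON);
            consumer.join();        // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(totalCents(List.of(
            new OrderPlaced("a", 100), new OrderPlaced("b", 250))));
    }
}
```

No lock or volatile appears in the business logic: the queue handles the hand-off, and each message is immutable once created.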