The effect of -Xss JVM option, and inefficient memory allocation on MacOS

I’ll try to answer the following questions with some tests and measurements:

  • What does the -Xss JVM argument actually do? How does it affect memory usage?
  • Can scalability be improved by adjusting the JVM stack size?

In the course of these tests, I encountered memory allocation problem on MacOS.

Default JVM stack size

The default stack size for ARM64 platform is twice that of X86 platform:

$ java -XX:+PrintFlagsFinal -version | grep -i threadstack
     intx CompilerThreadStackSize             = 2048
     intx ThreadStackSize                     = 2048
     intx VMThreadStackSize                   = 2048

ThreadStackSize : Stack size used by the application, can be set with ‘-Xss’
VMThreadStackSize : Native method stack size, can be set with -XX:VMThreadStackSize

  8.0.397 11.0.21 17.0.9 21.0.1
x86_64 / MacOS or Linux 1024 1024 1024 1024
ARM64 / MacOS 2048 2048 2048 2048
ARM64 / Linux 2048 2048 2040 2040
Default stack size of major Java versions (in KB)

What does the -Xss JVM option do?

It defines how much stack memory per thread is reserved in virtual memory.

Actual memory allocation (virtual to physical mapping) happens dynamically, meaning that this setting determines the maximum size of the stack memory.

The tests below confirm this and also reveal excessive memory allocation when running JVM in containers on a MacOS host

Testing

Setup

     
Operating Systems   :   MacOS 13.6 / Linux Ubuntu 22.04
Architectures   :   ARM64, X86_64
Java Versions   :   Amazon Corretto 21.0.1, 17.0.9, 11.0.21, 8.0.392
Docker Engine   :   24.0.6

Implementation

The code below (also on GitHub) allows controlling the number of threads, stack depth, and heap allocation per thread:

/** Recursive method to fill the stack */
void fillStack(int frameCount) {
    // Unused primitives consume stack on each frame, since 
    // "frames may be heap allocated": https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-2.html#jvms-2.5.2
    var p1 = 1L; 
    var p2 = 1L;
    if (frameCount > 0) {
        fillStack(frameCount - 1);
    }
}


For each thread, we allocate the heap and stack:

public void spawnThreads(int threadCount, int heapPerThread, int frameCount) {
    System.out.println("\nSpawning " + threadCount + " threads, each consuming ~" 
            + heapPerThread + " KB heap, " + frameCount + " stack frames\n");

    IntStream.rangeClosed(1, threadCount).forEachOrdered(value -> new Thread(() -> {
         // Allocate heap
         byte[] heapArr = new byte[heapPerThread * 1024];
         // Allocate stack
         fillStack(frameCount);
         System.out.print("\r" + threadCount + " threads created");
         // Block the thread
         try {
             countDownLatch.await();
         } catch (InterruptedException e) {
             throw new RuntimeException(e);
         }
    }).start());

    System.out.println("\nPID: " + ManagementFactory.getRuntimeMXBean().getPid());
}

Testing on the host

Changing the stack depth

We will start with Java 17 on MacOS on ARM64 architecture.

Our testing parameters include a 1536 MB heap limit, 1000 threads, each handling approximately 100 KB-sized objects in heap, and a JVM stack size option set to 2048 KB.

To assess memory usage, we’ll compare scenarios with 0 and 1000 stack depth, employing both native memory tracking and the ‘ps’ command.

Stack depth 0:

$ java -Xint -Xmx1536m -Xss2048k -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary com.berksoftware.article.jvmstack.MemoryFiller 1000 100 0
Spawning 1000 threads, each consuming ~100 KB heap, 0 stack frames

1000 threads created
PID: 62972

NMT report:

Total: reserved=4981866KB, committed=2731770KB
       malloc: 27930KB #36432
       mmap:   reserved=4953936KB, committed=2703840KB

-                 Java Heap (reserved=1572864KB, committed=526336KB)
                            (mmap: reserved=1572864KB, committed=526336KB) 
...
-                    Thread (reserved=2112289KB, committed=2112289KB)
                            (thread #1025)
                            (stack: reserved=2109440KB, committed=2109440KB)
                            (malloc=1649KB #6163) 
                            (arena=1200KB #2048)
...

The NMT report inaccurately indicates the stack memory is allocated eagerly. Therefore, we measure the native memory usage with ps:

$ ps -o rss,command -p 62972
RSS COMMAND
282336 /usr/bin/java -Xint -Xmx1536m -Xss2048k ... MemoryFiller 1000 100 0

Stack depth 1000:

$ java -Xint -Xmx1536m -Xss2048k -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary com.berksoftware.article.jvmstack.MemoryFiller 1000 100 1000
Spawning 1000 threads, each consuming ~100 KB heap, 1000 stack frames

1000 threads created
PID: 63571

Once again, NMT fails to provide useful information regarding stack memory allocation:

Total: reserved=4981861KB, committed=2731829KB
       malloc: 27925KB #36322
       mmap:   reserved=4953936KB, committed=2703904KB

-                 Java Heap (reserved=1572864KB, committed=526336KB)
                            (mmap: reserved=1572864KB, committed=526336KB) 
... 
-                    Thread (reserved=2112289KB, committed=2112289KB)
                            (thread #1025)
                            (stack: reserved=2109440KB, committed=2109440KB)
                            (malloc=1649KB #6163) 
                            (arena=1200KB #2048)
...

‘ps’ command reveals the difference in total memory usage, 276 vs 414 MB:

$ ps -o rss,command -p 63571
   RSS COMMAND
424032 /usr/bin/java -Xint -Xmx1536m -Xss2048k ... MemoryFiller 1000 100 1000

Other OS / platform / Java versions behave similarly.

Thread stack is allocated dynamically.

Changing JVM stack size

We’ll analyze memory usage under the effects of 2048KB and 512KB JVM stack size configurations.

Test parameters: 1536 MB heap limit, 1000 threads, 100 KB sized object on each thread, 250 stack frames per thread.

2048 KB:

$ java -Xint -Xmx1536m -Xss2048k -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary com.berksoftware.article.jvmstack.MemoryFiller 1000 100 250
Spawning 1000 threads, each consuming ~100 KB heap, 250 stack frames

1000 threads created
PID: 8309
Native Memory Tracking:

Total: reserved=4981850KB, committed=2731690KB
       malloc: 27914KB #36250
       mmap:   reserved=4953936KB, committed=2703776KB

-                 Java Heap (reserved=1572864KB, committed=526336KB)
                            (mmap: reserved=1572864KB, committed=526336KB) 
... 
-                    Thread (reserved=2112289KB, committed=2112289KB)
                            (thread #1025)
                            (stack: reserved=2109440KB, committed=2109440KB)
                            (malloc=1649KB #6163) 
                            (arena=1200KB #2048)
...
$ ps -o rss,comm -p 8309
   RSS COMM
313232 /usr/bin/java

512 KB:

$ java -Xint -Xmx1536m -Xss512k -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary com.berksoftware.article.jvmstack.MemoryFiller 1000 100 250

Total committed memory appears less than half of 2048 KB stack size:

Native Memory Tracking:

Total: reserved=3432025KB, committed=1181993KB
       malloc: 27913KB #36277
       mmap:   reserved=3404112KB, committed=1154080KB

-                 Java Heap (reserved=1572864KB, committed=526336KB)
                            (mmap: reserved=1572864KB, committed=526336KB) 
... 
-                    Thread (reserved=562465KB, committed=562465KB)
                            (thread #1025)
                            (stack: reserved=559616KB, committed=559616KB)
                            (malloc=1649KB #6163) 
                            (arena=1200KB #2048)
...

Memory usage on the OS is more or less the same:

$ ps -o rss,comm -p 8530
   RSS COMM
314320 /usr/bin/java

Results

  • Although the Native Memory Tracking report indicates a connection between committed memory and provided stack size, the resident set size on the operating system remains constant, at around 307 MB.

  • So the stack is allocated dynamically and the JVM stack size option -Xss sets the limit on how much it can grow.

  • Java 8 and 11 utilizes memory less efficiently on MacOS:

    • On MacOS / ARM64, Java 8 utilized around 80% more memory, compared to Java 17 and 21.

    • On MacOS, for both ARM and x86 architectures, Java 11 utilized around 20% more memory then Java 17 and 21.

NMT reports for MacOs are here

Testing in a container

We will compare the container’s memory usage as reported by Docker under various stack size configurations.

This implementation is used to create images tagged with current Java version.

Test parameters:

  • Container memory limit: 2GB
  • Heap limit: 75% of the container (1536 MB)

We initiate our testing with the following OS / JVM configuration:

  • OS / Architecture: MacOS 13.6 / ARM64
  • JVM: Amazon Corretto 17.0.9

We run containers with JVM stack size options of 2040 KB, 1024 KB and 512 KB:

$ docker run -m 2g --memory-swap 2g --rm -d -e JAVA_OPTS="-Xss2040k -XX:MaxRAMPercentage=75" --name J17_2040k memoryfiller:java17 1000 100 250
$ docker run -m 2g --memory-swap 2g --rm -d -e JAVA_OPTS="-Xss1024k -XX:MaxRAMPercentage=75" --name J17_1024k memoryfiller:java17 1000 100 250
$ docker run -m 2g --memory-swap 2g --rm -d -e JAVA_OPTS="-Xss512k  -XX:MaxRAMPercentage=75" --name J17_512k  memoryfiller:java17 1000 100 250

Docker arguments used:

     
-m 2g : Limit container memory to 2 GB
--memory-swap 2g : Prevent container from using swap memory
-XX:MaxRAMPercentage=75 : Limit JVM heap to 75% of container memory

Excessive memory usage on MacOS

We see a high memory usage reported by ‘docker stats’ on container with 2040 KB stack size:

  NAME        CPU %     MEM USAGE / LIMIT    MEM %       PIDS
  J17_512k    0.25%     315.1MiB  / 2GiB     15.38%      1019
  J17_1024k   0.25%     309.4MiB  / 2GiB     15.11%      1019
  J17_2040k   0.17%     1.967GiB  / 2GiB     98.33%      1019

Memory usage increases dramatically for stack sizes larger than 1024 KB.

This anomaly only shows up on containers running on MacOS (both ARM64 and x86_64 architectures), with LTS Java versions 17 and below.

Linux is unaffected, as is Java 21.

Increasing number of threads

Lastly, we will find the maximum number of threads that can be run in a container with different stack size configurations.

docker run -m 2g --memory-swap 2g --rm -d -e JAVA_OPTS="-Xss1024k -XX:MaxRAMPercentage=75" --name J17_2040k memoryfiller:java17 7000 100 250

On MacOS with default stack size (2040 KB), container gets killed by Docker after ~1050 threads.

With a decreased JVM stack size of 1024 KB (-Xss1024k option), we can go up to ~7100 threads. Further reducing the stack size does not have an impact.

Like the previous test, this only occurs on MacOS, with LTS Java versions 17 and below.

Conclusion

  • JVM Stack size option ‘-Xss’ determines how much virtual memory is reserved per thread. Since actual memory allocation occurs dynamically, this does not affect total memory usage, or scalability, besides the minimum effect of mapping table size, but,

  • Significant high memory usage has been observed on Java versions 8 - 17 running in a container on MacOs, when the stack size is set more than 1024 KB.

  • The default stack size on ARM64 architecture being twice of X86 architecture result in Macs with Apple Silicon encountering the mentioned high memory consumption issue with the default JVM configuration.

  • Java 8 and Java 17 on MacOS allocates memory less efficiently than Java 17 and 21, the difference is bigger on Java 8 NMT reports are here.

Only a few services require more than 1024 KB of stack, with Elasticsearch being one of them. Using Java 21 or applying -Xss1024k argument will lower the memory consumption of containers.