skywalkergcm

Using large pages and vNUMA to deliver / improve the Java performance of a foglight infrastructure.

Hopefully by sharing this we can help our foglight admins improve the Java performance of their FglMS or FglAM setups by using large pages and vNUMA in the foglight infrastructure. This approach can yield significant performance gains for ANY Java application if applied correctly and tuned in a similar manner. We searched for a while for how to do this because we could find no single source describing the best way to accomplish the setup.

While these steps are specifically for RHEL 7.x, similar things can be done on Windows, where granting the “Lock pages in memory” right automatically activates large page use. You still have to tell Java to use large pages in all cases.

I have enabled large pages for MSSQL and other processes in high performance environments, so I know this works. For foglight, the team chose RHEL 7.x as the base hosting OS, so I will not be able to post the Windows steps here. Also note that the steps for RHEL 6.x are a bit different, with a few more files to touch. We took the “tuned” approach because it is easier than manipulating all the other files required in RHEL 6.x, and as RHEL 7.x moves forward, it will become the preferred way to make such tuning changes to systems. As always, please thoroughly review the information before implementing and understand the sizings. This information is provided “as-is” with no guarantee or warranty. Make backups of any and all files.

YMMV, Enjoy. Gregory Mobley

If this is a VM, some things to verify or change for the very best performance…

1. On the pHW, the BIOS settings have at least been set to “high performance” mode, not balanced, not auto..

2. On VMware, verify there are no power saving features enabled

3. On the OS, there are no power saving features enabled.

4. Login as root, or su to root; root access is required.

5. numactl package is installed  (yum install numactl)

How to setup huge pages for Java performance in foglight…

Overview

Steps to tune a FglMS for best performance on a *nix system after installation. These steps show how to enable "LargePages" on RHEL 7.x and then configure the Foglight JVM to properly use them. These steps may be applied to other applications needing to use "LargePages" in RHEL 7.x. The expected performance gains are in the 30 - 300% range depending on a number of factors such as the:

    1. process memory I/O rates
    1. NUMA / vNUMA setup / environment
    1. pStorage I/O rates
    1. pHardware and OS  power tuning profiles
    1. pHardware BIOS settings
    1. ...

Getting started

Once the FglMS is installed, the out-of-box settings can be tuned to deliver better performance with simple adjustments. The sections below contain the changes to the default RHEL 7.x environment as well as to the default Foglight setup and infrastructure. References are from various online, manual and other informative sources.

Requirements / Assumptions

    1. The size of the FglMS or FglAM is known and planned.   For instance: 
      1. FglMS (evaluating) --> 32GB vRAM, 23 GB Max Foglight Total, 22GB Single Process, 20GB Foglight JAVA, 6-8 vCPU, (When database is external)
      1. FglAM (evaluating) --> 10GB vRAM, 8GB Foglight Max Total, 7GB Single Process, 4GB Foglight Java, 3-4 vCPU
    1. The hosting OS is RHEL 7.x or later.
      1. Note: RHEL 6.x requires similar but different changes to related files.
    1. For the Foglight OS (physical or virtual)
      1. The pHOST BIOS has been tuned for "High Performance".
      1. The guest OS has been properly setup to support NUMA and has also been tuned for "High Performance".
      1. NUMA considerations are in play.
        1. You have the specific pHOST specs which will be used to help with NUMA.  These tell you the maximum sizes for the amount of vRAM and # of vCPU.
          1. The total amount of pRAM installed in the pHOST
          1. The # of pSockets in the pHOST. 
          1. The # of pCores per pSOCKET. (Note watch out on newer Intel pCPUs as they may have sub-NUMA cores.  Also watch out for this on AMD pCPUs)
          1. Maximum # of vCPU to support single-node vNUMA = (# of pCores per pSocket) - 1 or 2  (allow for overhead)
          1. Maximum amount of vRAM for a VM to support single-node vNUMA = ((Total amount of pRAM installed in pHost) / # of pSockets) * 80-85%  (allocate only 80-85%, leaving 15-20% for overhead)
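The two ceilings above can be worked out with a quick bash sketch. The pHOST numbers here are hypothetical placeholders; substitute your own socket, core and RAM counts:

```shell
# Hypothetical pHOST specs -- substitute your own hardware numbers
PRAM_GB=256           # total pRAM installed in the pHOST
PSOCKETS=2            # number of pSockets
PCORES_PER_SOCKET=12  # pCores per pSocket (watch for sub-NUMA on newer pCPUs)

# Max vCPU for single-node vNUMA: leave 1-2 cores per socket for overhead
MAX_VCPU=$(( PCORES_PER_SOCKET - 2 ))

# Max vRAM for single-node vNUMA: 80% of the per-socket share of pRAM
MAX_VRAM_GB=$(( PRAM_GB * 80 / 100 / PSOCKETS ))

echo "Max vCPU for single-node vNUMA: ${MAX_VCPU}"
echo "Max vRAM for single-node vNUMA: ${MAX_VRAM_GB} GB"
```

For this hypothetical 2-socket / 12-core / 256 GB host, that gives a ceiling of 10 vCPU and 102 GB vRAM per VM before the VM spills onto a second vNUMA node.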

Tasks and processes

Shutdown puppet and other "auto override" processes

#> login root
#> systemctl disable puppet
#> systemctl stop puppet
#> ps -A | grep puppet --> verify puppet is not running or it may overwrite any and all changes. 

Add numactl package

This package helps verify hugepages are active.

#> yum install numactl 

Confirm Transparent Huge Pages (THP) are ON

By evaluating what processes are using the THP, we decided to leave it on because Foglight's FMS IS using some THPs even with huge pages active.

#> cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never   --> "always" being selected is the correct response
#> cat /proc/meminfo
#> grep AnonHuge /proc/meminfo --> Should show some pages in use...

Determine which GID # is assigned / running the foglight or other processes needing to access Huge pages

#> id foglight
uid=6139911(foglight) gid=18111(foglight) groups=18111(foglight)

Use the web "Java memory calculator" to obtain the values you want to use for HugePages for JAVA or other processes

--> http://www.peuss.de/node/67

Some samples...
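As a rough sketch of the arithmetic such a calculator performs, the shmmax / shmall / nr_hugepages values used later in this post can be derived like so (the 22GB figure is the FglMS "Total w/ Buffer" example used throughout):

```shell
# Hypothetical target: the 22GB FglMS locked-memory budget used in this post
TOTAL_GB=22
BYTES=$(( TOTAL_GB * 1024 * 1024 * 1024 ))

SHMMAX=$BYTES                              # kernel.shmmax is in bytes
SHMALL=$(( BYTES / 4096 ))                 # kernel.shmall is in 4096-byte pages
NR_HUGEPAGES=$(( BYTES / (2048 * 1024) ))  # vm.nr_hugepages counts 2048 KB pages

echo "kernel.shmmax   = ${SHMMAX}"
echo "kernel.shmall   = ${SHMALL}"
echo "vm.nr_hugepages = ${NR_HUGEPAGES}"
```

Change TOTAL_GB to re-derive the values for the FglAM or any other sizing.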

Verify and remove any related changes to RHEL 6.x files

In RHEL 7.x there is NO NEED to alter either /etc/security/limits.conf or /etc/sysctl.conf. Verify and remove any conflicting changes already present! There is an improved and more flexible method: create two new /etc/tuned profiles, one for the fglms and one for the fglam.

Create a unique service unit file for the FglMS or other process

Create /etc/systemd/system/foglight-fms.service

[Unit]
Description=foglight-fms
#
# itperf 20160106 tuned must be up and running to set the fglms or fglam tuned profile active.
# itperf 20160106 without waiting for this service to begin, the FMS cannot get the hugepages as intended
# itperf 20160108 start AFTER networking is ONLINE
#
Wants=network-online.target
After=network.target network-online.target tuned.service
#
[Service]
Type=forking
User=foglight
Group=foglight
ExecStart=/bin/bash -c '/opt/foglight/bin/fmsStartup.sh'
ExecStop=/bin/bash -c '/opt/foglight/bin/fmsShutdown.sh'
ExecReload=/bin/bash -c '/opt/foglight/bin/fmsShutdown.sh ; /opt/foglight/bin/fmsStartup.sh'
RemainAfterExit=yes
#
# itperf 20160106 if specifying LimitMEMLOCK, value must be in BYTES!
# itperf 20160106 max = infinity
# itperf 20160106 32GB = 34359738368 B
# itperf 20160106 24GB = 25769803776 B (FglMS) for ENTIRE process, leave a buffer for non-hugepage use
# itperf 20160106 22GB = 23622320128 B
# itperf 20160106 20GB = 21474836480 B
# itperf 20160106 10GB = 10737418240 B
# itperf 20160106 8GB = 8589934592 B (FglAM) for ENTIRE process, leave buffer for non-hugepage use
# itperf 20160106 4GB = 4294967296 B
#
LimitMEMLOCK=25769803776
#
[Install]
WantedBy=multi-user.target
#
# EOF
#

Create a unique tuned profile for the FglMS or other process

Create /etc/tuned/fglms/tuned.conf. Change the values below for the FglMS, or calculate new values.

#
# itperf 20160105 FglMS and FglAM tuned.conf configurations
#
[main]
include=latency-performance
#
# include=virtual-guest
#
[vm]
#
# itperf 20160104 For Foglight do NOT disable transparent huge pages, enable them!
#
transparent_hugepages=always
#
[disk]
devices=sda
elevator=noop
# elevator=deadline
# readahead=4096
#
[sysctl]
#
# itperf 20160104 Set NUMA Balancing = 1
#
kernel.numa_balancing=1
#
# itperf 20160104 Set max shared in BYTES to 23GB = (23GB x 1024 x 1024 x 1024) = 24696061952 Bytes
# itperf 20160104 This value can be the total size of the system memory in order to adjust as required.
# itperf 20160104 This value indicates the TOTAL maximum shared memory this system CAN use, not WILL use.
# itperf 20160104 Use the values marked with JAVA... do not take the whole system!
#
# itperf 20160104 32GB = 34359738368 B(FglMS VM)
# itperf 20160104 23GB = 24696061952 B (FglMS Total)
# itperf 20160104 16GB = 17179869184 B
# itperf 20160108 10GB = 10737418240 B (FglAM VM)
# itperf 20160108 9GB = 9663676416 B
# itperf 20160104 8GB = 8589934592 B (FglAM Total)
# itperf 20160108 7GB = 7516192768 B
# itperf 20160104 4GB = 4294967296 B
#
kernel.shmmax = 24696061952
#
# itperf 20151222 shmall maximum can be ceil(shmmax / getconf PAGE_SIZE) if ONE process will access ALL of the HUGE PAGES
# itperf 20150104 Value is the # of 4096 B (4 KB) PAGES!
# itperf 20150104 Set to be shmmax - 1GB for breathing room!
#
# itperf 20150104 23GB = 24696061952 B / 4096 B = 6029312 4K PAGES
# itperf 20160104 22GB = 23622320128 B / 4096 B = 5767168 4K PAGES (FglMS Total w/ Buffer)
# itperf 20160104 15GB = 16106127360 B / 4096 B = 3932160 4K PAGES
# itperf 20160104 7GB = 7516192768 B / 4096 B = 1835008 4K PAGES (FglAM Total w/ Buffer)
# itperf 20160108 6GB = 6442450944 B / 4096 B = 1572864 4K PAGES
# itperf 20160104 4GB = 4294967296 B / 4096 B = 1048576 4K PAGES
# itperf 20160104 3GB = 3221225472 B / 4096 B = 786432 4K PAGES
#
kernel.shmall = 5767168
#
# itperf 20151222 Set # of 2048KB (2MB) Huge pages = 22GB x 1024 x 1024 = 23068672 KB / 2048 KB = 11264 huge pages required
# itperf 20151222 Allow some breathing room for other processes to use HUGE PAGES
# itperf 20160104 23GB = 24117248 KB / 2048 KB = 11776 2048KB huge pages
# itperf 20160104 22GB = 23068672 KB / 2048 KB = 11264 2048KB huge pages (FglMS Total w/ Buffer)
# itperf 20160104 15GB = 15728640 KB / 2048 KB = 7680 2048KB huge pages
# itperf 20160104 7GB = 7340032 KB / 2048 KB = 3584 2048KB huge pages (FglAM Total w/ Buffer)
# itperf 20160106 6GB = 6291456 KB / 2048 KB = 3072 2048KB huge pages
# itperf 20160104 4GB = 4194304 KB / 2048 KB = 2048 2048KB huge pages
# itperf 20160104 3GB = 3145728 KB / 2048 KB = 1536 2048KB huge pages
#
vm.nr_hugepages = 11264
#
# itperf 20151222 - Set the GID that can use these pages to foglight's GID
#
vm.hugetlb_shm_group = 18111
#
# EOF
#

Activate / Verify the customized tuned profile

[root@...]# tuned-adm profile list
Available profiles:
- balanced
- desktop
- fglms
- latency-performance
- network-latency
- network-throughput
- powersave
- throughput-performance
- virtual-guest
- virtual-host
Current active profile: virtual-guest

To activate the fglms profile:

[root@...]# tuned-adm profile fglms

Verify the profile is active...

[root@...]# tuned-adm profile list
Available profiles:
- balanced
- desktop
- fglms
- latency-performance
- network-latency
- network-throughput
- powersave
- throughput-performance
- virtual-guest
- virtual-host
Current active profile: fglms

Finally, reboot the system to ensure ALL the tuned values are picked up correctly system-wide.

shutdown -r now
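After the reboot, a quick sanity check confirms the kernel actually reserved the pages. The expected values noted here assume the 22GB FglMS sizing from the tuned.conf above; on an untuned host these counters simply read 0:

```shell
# Confirm the kernel actually reserved the huge pages after the reboot
grep HugePages /proc/meminfo            # HugePages_Total should equal vm.nr_hugepages (11264 here)
cat /proc/sys/vm/nr_hugepages           # same value, read via /proc
cat /proc/sys/vm/hugetlb_shm_group      # should be foglight's GID (18111 here)
```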

Changes to the foglight control files..

Configuring FglMS to performance optimize the configuration

After installing the FglMS, update /opt/foglight/config/server.config to enable the items below.

1) Allow the JVM to use between 70% and 80% of the available system memory.

# itperf 20151104 - Establish JVM size to 20G for a 32G vRAM VM, use LARGEPAGES (requires OS tuning)
#
server.vm.option0 = "-Xms20G";
server.vm.option1 = "-Xmx20G";
server.vm.option2 = "-XX:ReservedCodeCacheSize=512M";
server.vm.option3 = "-XX:+UseCodeCacheFlushing";
server.vm.option4 = "-XX:+ForceTimeHighResolution";
server.vm.option5 = "-XX:+UseLargePages";
server.vm.option6 = "-XX:LargePageSizeInBytes=2m";
# 

How to VERIFY the changed settings are what is active…

Requirements / Assumptions

6. The performance tuning items outlined earlier have been performed and the system rebooted.

7. Login as root, or su to root; root access is required.

8. numactl package is installed  (yum install numactl)

Tasks and processes

Verify the number of Huge Pages (HP) used by NUMA node

#> ps -A | grep -e fms -e fglam

#> numastat -p <PID>

 

Below we see the fms process is using ~20870 MB of HUGE pages (the 20G JAVA heap plus other allocations), for nearly the 22GB of HugePages we have allocated. The remaining pages are held as private, shared, heap or THP, etc.

Per-node process memory usage (in MBs) for PID 786 (fms), up 23 days...

                          Node 0           Total
                 --------------- ---------------
Huge   ------>          20870.00        20870.00
Heap                       14.72           14.72
Stack                     247.51          247.51
Private                   497.59          497.59
                 --------------- ---------------
Total                   21629.82        21629.82

 

In the example below, we see that NO HUGE pages are allocated or in use. The "Private" pages may be THP in use instead. See below for more details.

Per-node process memory usage (in MBs) for PID 755 (fms)

                          Node 0           Total
                 --------------- ---------------
Huge   ------>              0.00            0.00
Heap                       11.76           11.76
Stack                     371.73          371.73
Private                 15795.83        15795.83
                 --------------- ---------------
Total                   16179.32        16179.32

Verify the number of Transparent Huge Pages (THP) available

After evaluating the use of THP by foglight, we determined leaving THP ON along with allocating Huge Pages was optimal.

#> grep Anon /proc/meminfo

AnonPages:       609836 kB
AnonHugePages:   368640 kB

Verify which processes are using THP and how large are the segments

#> grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0 }' | awk -F "/" '{print $0; system("ps -fp " $3)}'

/proc/1052/smaps:AnonHugePages:     12288 kB
UID       PID PPID C STIME TTY         TIME CMD
foglight 1052     1 83 15:33 ?       00:32:09 Foglight 5.7.5.2: Foglight Daemon

/proc/1052/smaps:AnonHugePages:     22528 kB
UID       PID PPID C STIME TTY         TIME CMD
foglight 1052     1 83 15:33 ?       00:32:09 Foglight 5.7.5.2: Foglight Daemon

... (the same Foglight Daemon, PID 1052, owns every remaining segment: one of 38912 kB, ten of 2048 kB, one of 4096 kB and one of 6144 kB; the repeated ps output is trimmed here) ...

Verify the LimitMEMLOCK is active for the foglight-fms or foglight-fglam process

#> systemctl show foglight-fms.service | grep LimitM

LimitMEMLOCK=25769803776

#> systemctl show foglight-fglam.service | grep LimitM

LimitMEMLOCK=8589934592  

Note: top may show smaller memory use because Huge Pages are locked and not visible to it.
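To see what top is hiding, compare the system-wide counters instead, since pages backing the locked JVM heap come out of the hugepage pool rather than normal resident memory (a sketch; on a host with no reserved huge pages all three counters read 0):

```shell
# Total minus Free approximates the huge pages actually backing running processes
grep -E 'HugePages_(Total|Free|Rsvd)' /proc/meminfo
```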



skywalkergcm

RE: Using large pages and vNUMA to deliver / improve the Java performance of a foglight infrastructure.

Hi, we've been studying and testing a few more optimizing JAVA parms for our Foglight workloads. The new JAVA parms I've added were published by Mr. Emad Benjamin of VMware for generally improving Java's performance, especially when virtualized. Per the results / graphs below, we reduced the number of GCs by about 1/2 at the current workload and significantly decreased the GC durations. These graphs are from 2 FglMS children with about the same number of VMs: the 1st graph shows the sister FglMS child without the new parms, while the 2nd graph shows the other FglMS child's performance. The delta translates into faster GUI responses, fewer GC pauses, and when a GC does occur, a much shorter duration! I've been testing this for about 1 week now and so far it is stable. YMMV. No warranty written or implied.

Hope this helps.

The parms being tested are per below. The RHEL 7.x system is set up to use LARGE PAGES per the postings above.

I elaborated on the parms as comments...

#
# itperf 20151124 - Establish JVM size of 20G for 32GB of vRAM DO NOT push too close to edge, leave breathing room!
#
server.vm.option0 = "-Xms20G";
server.vm.option1 = "-Xmx20G";
#
# itperf 20160229 - Testing Emad Benjamin's high performance Java recommendations for virtualizing JAVA scaling performance
# itperf 20160229 - Many of the GC settings below are based on Emad's extensive published works.
# itperf 20160229 - Set -Xmn to ~ 30% of -Xms size
server.vm.option2 = "-Xmn6G";
server.vm.option3 = "-XX:ReservedCodeCacheSize=512M";
server.vm.option4 = "-XX:+UseCodeCacheFlushing";
#
# itperf 20151124 - Reduce JAVA's timer resolution. This flag relates to a very OLD Java defect that had the opposite impact of its original intent.
# itperf 20151124 - It has been broken so long that the decision was made to leave it unfixed.
# itperf 20151124 - The setting as actually implemented reduces the JAVA timer resolution, which eases the timing burden on the VM.
server.vm.option5 = "-XX:+ForceTimeHighResolution";
#
# itperf 20151124 - Use Large Pages
server.vm.option6 = "-XX:+UseLargePages";
server.vm.option7 = "-XX:LargePageSizeInBytes=2m";
#
# itperf 20160311 - Set to pretouch and zero ALL pages during startup initialization.  Slower init, faster long term, especially for large page use!
server.vm.option8 = "-XX:+AlwaysPreTouch";
#
# itperf 20160301 - Set to allow minor GC before CMS remark. Reduces duration of larger CMS remark cycles.
server.vm.option9 = "-XX:+CMSScavengeBeforeRemark";
#
# itperf 20160312 - Change CMS to run ONLY at 75%
server.vm.option10 = "-XX:CMSInitiatingOccupancyFraction=75";
server.vm.option11 = "-XX:+UseCMSInitiatingOccupancyOnly";
#
# itperf 20160312 - Set explicit Survivor ratios
server.vm.option12 = "-XX:TargetSurvivorRatio=80";
server.vm.option13 = "-XX:SurvivorRatio=8";
server.vm.option14 = "-XX:MaxTenuringThreshold=15";
#
# itperf 20160314 - Disable UseAdaptiveSizePolicy when using SurvivorRatio and MaxTenuringThreshold
# itperf 20160314 - The AdaptiveSizePolicy favors minimizing memory usage with more frequent GC and thus more frequent pauses.
# itperf 20160314 - The AdaptiveSizePolicy may improve server throughput at the cost of UI response per above.
# itperf 20160314 - This AdaptiveSizePolicy MAY be auto-disabled when setting Xms=Xmx.  Not able to confirm.  Disable here to be sure!
server.vm.option15 = "-XX:-UseAdaptiveSizePolicy";
#
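To measure the effect of parms like these yourself, GC logging can be enabled in the same server.config style. The option numbers and log path below are illustrative (continue from your last option number); the flags are the standard Java 7/8 HotSpot ones:

```
#
# Illustrative GC logging options - adjust option numbers to follow your last entry
server.vm.option16 = "-verbose:gc";
server.vm.option17 = "-XX:+PrintGCDetails";
server.vm.option18 = "-XX:+PrintGCDateStamps";
server.vm.option19 = "-Xloggc:/opt/foglight/logs/gc.log";
#
```

Comparing the GC counts and pause durations in the resulting log before and after a parm change is how deltas like the graphs above are produced.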

Cheers!

 
