| title | Performance Best Practices for SQL Server on Linux | |
|---|---|---|
| description | This article provides performance best practices and guidelines for running SQL Server on Linux. | |
| author | tejasaks | |
| ms.author | tejasaks | |
| ms.reviewer | randolphwest | |
| ms.date | 01/27/2026 | |
| ms.service | sql | |
| ms.subservice | linux | |
| ms.topic | best-practice | |
| ms.custom |
|
[!INCLUDE SQL Server - Linux]
This article provides best practices and recommendations to maximize performance for database applications that connect to [!INCLUDE ssnoversion-md] on Linux. These recommendations are specific to running on the Linux platform. All normal [!INCLUDE ssnoversion-md] recommendations, such as index design, still apply.
The following guidelines contain recommendations for configuring both [!INCLUDE ssnoversion-md] and the Linux operating system (OS). Consider using these configuration settings to experience the best performance for a [!INCLUDE ssnoversion-md] installation.
- Storage configuration recommendation
- Kernel and CPU settings for high performance
- SQL Server configuration
The storage subsystem hosting data, transaction logs, and other associated files (such as checkpoint files for in-memory OLTP) should be capable of managing both average and peak workload gracefully.
In on-premises environments, the storage vendor normally supports appropriate hardware RAID configuration with striping across multiple disks to ensure appropriate IOPS, throughput, and redundancy. However, this support can differ across different storage vendors and different storage offerings with varying architectures.
For [!INCLUDE ssnoversion-md] on Linux deployed on Azure Virtual Machines, consider using software RAID to ensure appropriate IOPS and throughput requirements. When configuring [!INCLUDE ssnoversion-md] on Azure virtual machines with similar storage considerations, see Configure storage for SQL Server on Azure VMs.
The following example shows how to create software RAID in Linux on an Azure Virtual Machine. Keep in mind that you should use the appropriate number of data disks for the required throughput and IOPS for volumes based on the data, transaction log, and tempdb I/O requirements. In the following example, eight data disks were attached to the VM; four to host data files, two for transaction logs, and two for tempdb workload.
To locate the devices (for example /dev/sdc) for RAID creation, use the lsblk command.
# For Data volume, using 4 devices, in RAID 5 configuration with 8KB stripes
mdadm --create --verbose /dev/md0 --level=raid5 --chunk=8K --raid-devices=4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
# For Log volume, using 2 devices in RAID 10 configuration with 64KB stripes
mdadm --create --verbose /dev/md1 --level=raid10 --chunk=64K --raid-devices=2 /dev/sdg /dev/sdh
# For tempdb volume, using 2 devices in RAID 0 configuration with 64KB stripes
mdadm --create --verbose /dev/md2 --level=raid0 --chunk=64K --raid-devices=2 /dev/sdi /dev/sdjFor [!INCLUDE ssnoversion-md], use a RAID configuration. The deployed filesystem stripe unit (sunit) and stripe width match the RAID geometry. For example, the following example shows an XFS-based configuration for a log volume.
# Creating a log volume, using 6 devices, in RAID 10 configuration with 64KB stripes
mdadm --create --verbose /dev/md3 --level=raid10 --chunk=64K --raid-devices=6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
mkfs.xfs /dev/md3 -f -L log
meta-data=/dev/md3 isize=512 agcount=32, agsize=18287648 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=585204384, imaxpct=5
= sunit=16 swidth=48 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=285744, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0The log array is a six-drive RAID-10 with a 64-KB stripe. As you can see:
- For
sunit=16 blks, 16 * 4096 block size = 64 KB, matches the stripe size. - For
swidth=48 blks,swidth/sunit= 3, which is the number of data drives in the array, excluding parity drives.
[!INCLUDE ssnoversion-md] supports both ext4 and XFS filesystems to host the database, transaction logs, and other files such as checkpoint files for in-memory OLTP in [!INCLUDE ssnoversion-md]. Use the XFS filesystem for hosting the [!INCLUDE ssnoversion-md] data and transaction log files.
Format the volume with the XFS filesystem:
mkfs.xfs /dev/md0 -f -L datavolume
mkfs.xfs /dev/md1 -f -L logvolume
mkfs.xfs /dev/md2 -f -L tempdbYou can configure the XFS filesystem to be case insensitive when creating and formatting the XFS volume. This configuration isn't frequently used in the Linux ecosystem but can be used for compatibility reasons.
For example, you can run the following command. Use -n version=ci to configure the XFS filesystem to be case insensitive.
mkfs.xfs /dev/md0 -f -n version=ci -L datavolumeFor the filesystem configuration on Persistent Memory devices, set the block allocation for the underlying filesystem to 2 MB. For more information, see Technical considerations.
Your production environment might require more connections than the default open file limit of 1,024. You can set soft and hard limits to 1,048,576. For example, in RHEL, edit the /etc/security/limits.d/99-mssql-server.conf file to have the following values:
mssql - nofile 1048576Note
This setting doesn't apply to [!INCLUDE ssnoversion-md] services started by systemd. For more information, see How to set limits for services in RHEL and systemd.
To ensure that the drives attached to the system remount automatically after a restart, add them to the /etc/fstab file. Use the UUID (Universally Unique Identifier) in /etc/fstab to refer to the drive, rather than just the device name (such as /dev/sdc1).
Use the noatime attribute with any filesystem that stores [!INCLUDE ssnoversion-md] data and log files. Refer to your Linux documentation on how to set this attribute. The following example shows how to enable the noatime option for a volume mounted in an Azure Virtual Machine.
The mount point entry in /etc/fstab:
UUID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" /data1 xfs rw,attr2,noatime 0 0In the previous example, UUID represents the device that you can find using the blkid command.
[!INCLUDE linux-forced-unit-access]
The following section describes the recommended Linux OS settings related to high performance and throughput for a [!INCLUDE ssnoversion-md] installation. See your Linux distribution's documentation for the process to configure these settings. You can use TuneD as described, to configure many CPUs and kernel configurations, described in the next section.
For Red Hat Enterprise Linux (RHEL) users, the TuneD throughput-performance profile automatically configures some kernel and CPU settings (except for C-States). Starting with RHEL 8.0, a TuneD profile named mssql was codeveloped with Red Hat and offers finer Linux performance-related tunings for [!INCLUDE ssnoversion-md] workloads. This profile includes the RHEL throughput-performance profile, and we present its definitions in this article for your review with other Linux distributions and RHEL releases without this profile.
For SUSE Linux Enterprise Server 12 SP5, Ubuntu 18.04, and Red Hat Enterprise Linux 7.x, you can manually install the tuned package. Use it to create and configure the mssql profile as described in the following section.
Note
Starting in [!INCLUDE sssql25-md], SUSE Linux Enterprise Server (SLES) isn't supported.
The following example provides a TuneD configuration for [!INCLUDE ssnoversion-md] on Linux.
[main]
summary=Optimize for Microsoft SQL Server
include=throughput-performance
[cpu]
force_latency=5
[sysctl]
vm.swappiness = 1
vm.dirty_background_ratio = 3
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.transparent_hugepages=always
# For multi-instance SQL deployments, use
# vm.transparent_hugepages=madvise
vm.max_map_count=1600000
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
kernel.numa_balancing=0If you use Linux distributions with kernel versions greater than 4.18, comment the following options as shown; otherwise, uncomment the following options if you use distributions with kernel versions earlier than 4.18.
# kernel.sched_latency_ns = 60000000
# kernel.sched_migration_cost_ns = 500000
# kernel.sched_min_granularity_ns = 15000000
# kernel.sched_wakeup_granularity_ns = 2000000To enable this TuneD profile, save these definitions in a tuned.conf file under the /usr/lib/tuned/mssql folder, and enable the profile using the following commands:
chmod +x /usr/lib/tuned/mssql/tuned.conf
tuned-adm profile mssqlVerify that the profile is active, with the following command:
tuned-adm activeOr:
tuned-adm listThe following table provides recommendations for CPU settings:
| Setting | Value | More information |
|---|---|---|
| CPU frequency governor | performance | See the cpupower command |
| ENERGY_PERF_BIAS | performance | See the x86_energy_perf_policy command |
| min_perf_pct | 100 | See your documentation on Intel p-state |
| C-States | C1 only | See your Linux or system documentation on how to ensure C-States is set to C1 only |
When you use TuneD as described, it automatically configures the CPU frequency governor, ENERGY_PERF_BIAS, and min_perf_pct settings. It uses the throughput-performance profile as the base for the mssql profile. You must manually configure the C-States parameter according to the documentation provided by Linux or your system distributor.
The following table provides recommendations for disk settings:
| Setting | Value | More information |
|---|---|---|
Disk readahead |
4096 | See the blockdev command |
| sysctl settings | kernel.sched_min_granularity_ns = 15000000kernel.sched_wakeup_granularity_ns = 2000000vm.dirty_ratio = 80vm.dirty_background_ratio = 3vm.swappiness = 1 |
See the sysctl command |
-
vm.swappiness: This parameter controls the relative weight given to swapping out runtime process memory compared to the filesystem cache. The default value for this parameter is 60, which indicates swapping runtime process memory pages compared to removing filesystem cache pages at a ratio of 60:140. Setting the value to 1 indicates a strong preference for keeping runtime process memory in physical memory at the expense of the filesystem cache. Since [!INCLUDE ssnoversion-md] uses the buffer pool as a data page cache and strongly prefers to write through to physical hardware bypassing the filesystem cache for reliable recovery, an aggressive swappiness configuration can be beneficial for high-performing and dedicated [!INCLUDE ssnoversion-md].You can find additional information at Documentation for /proc/sys/vm/ - #swappiness
-
vm.dirty_*: [!INCLUDE ssnoversion-md] file write accesses are uncached, satisfying its data integrity requirements. These parameters allow efficient asynchronous write performance and lower the storage I/O effect of Linux caching writes by allowing large enough caching while throttling flushing. -
kernel.sched_*: These parameter values represent the current recommendation for tweaking the Completely Fair Scheduling (CFS) algorithm in the Linux kernel. They improve throughput of network and storage I/O calls with respect to inter-process preemption and resumption of threads.
Using the mssql TuneD profile configures the vm.swappiness, vm.dirty_*, and kernel.sched_* settings. You must manually configure the disk readahead setting by using the blockdev command for each device.
If you install [!INCLUDE ssnoversion-md] on a multinode NUMA system, the following kernel.numa_balancing kernel setting is enabled by default. To allow [!INCLUDE ssnoversion-md] to operate at maximum efficiency on a NUMA system, disable auto NUMA balancing on a multinode NUMA system:
sysctl -w kernel.numa_balancing=0Using the mssql TuneD profile configures the kernel.numa_balancing option.
The default setting of vm.max_map_count (which is 65536) might not be high enough for a [!INCLUDE ssnoversion-md] installation. For this reason, change the vm.max_map_count value to at least 262144 for a [!INCLUDE ssnoversion-md] deployment, and refer to the Proposed Linux settings using a TuneD mssql profile section for further tunings of these kernel parameters. The maximum value for vm.max_map_count is 2147483647.
sysctl -w vm.max_map_count=1600000Using the mssql TuneD profile configures the vm.max_map_count option.
Most Linux installations have this option on by default. For the most consistent performance experience, leave this configuration option enabled. However, if there's high memory paging activity in [!INCLUDE ssnoversion-md] deployments with multiple instances, or [!INCLUDE ssnoversion-md] execution with other memory demanding applications on the server, test your application's performance after executing the following command:
echo madvise > /sys/kernel/mm/transparent_hugepage/enabledOr modify the mssql TuneD profile with the line:
vm.transparent_hugepages=madviseAnd make sure the mssql profile is active after the modification:
tuned-adm off
tuned-adm profile mssqlUsing the mssql TuneD profile configures the transparent_hugepage option.
Along with storage and CPU recommendations, you have network specific recommendations as well. The following recommendations are listed for reference. Not all settings in the following examples are available across different NICs. Refer and consult with NIC vendors for guidance for each of these options. Test and configure this on development environments before applying them on production environments. The following options are explained with examples, and the commands used are specific to NIC type and vendor.
-
Configuring network port buffer size. In the example, the NIC is named
eth0, which is an Intel-based NIC. For Intel based NIC, the recommended buffer size is 4 KB (4096). Verify the preset maximums and then configure it using the following example:Check the preset maximums with the following command. Replace
eth0with your NIC name:ethtool -g eth0
Set both the
rx(receive) andtx(transmit) buffer size to 4 KB:ethtool -G eth0 rx 4096 tx 4096
Check that the value is properly configured:
ethtool -g eth0
-
Enable jumbo frames. Before enabling jumbo frames, verify that all the network switches, routers, and anything else essential in the network packet path between the clients and the [!INCLUDE ssnoversion-md] support jumbo frames. Only then can enabling jumbo frames improve performance. After jumbo frames are enabled, connect to [!INCLUDE ssnoversion-md] and change the network packet size to 8060 using
sp_configure, as shown in the following example:# command to set jumbo frame to 9014 for a Intel NIC named eth0 is ifconfig eth0 mtu 9014 # verify the setting using the command: ip addr | grep 9014
EXECUTE sp_configure 'network packet size', '8060'; GO RECONFIGURE WITH OVERRIDE; GO
-
By default, set the port for adaptive RX/TX IRQ coalescing, meaning interrupt delivery is adjusted to improve latency when packet rate is low and improve throughput when packet rate is high. This setting might not be available across your network infrastructure, so review the existing network infrastructure and confirm that this setting is supported. The example is for the NIC named
eth0, which is an Intel-based NIC:-
Set the port for adaptive RX/TX IRQ coalescing:
ethtool -C eth0 adaptive-rx on ethtool -C eth0 adaptive-tx on
-
Confirm the setting:
ethtool -c eth0
[!NOTE]
For predictable behavior in high-performance environments, like environments for benchmarking, disable the adaptive RX/TX IRQ coalescing and then set specifically the RX/TX interrupt coalescing. See the example commands to disable the RX/TX IRQ coalescing and then specifically set the values:Disable adaptive RX/TX IRQ coalescing:
ethtool -C eth0 adaptive-rx off ethtool -C eth0 adaptive-tx off
Confirm the change:
ethtool -c eth0
Set the
rx-usecsandirqparameters.rx-usecsspecifies how many microseconds after at least one packet is received before generating an interrupt. Theirqparameter specifies the corresponding delays in updating the status when the interrupt is disabled. For Intel bases NICs, you can use the following settings:ethtool -C eth0 rx-usecs 100 tx-frames-irq 512
Confirm the change:
ethtool -c eth0
-
-
Enable receive-side scaling (RSS) and by default, combine the RX and TX side of RSS queues. There are specific scenarios, when working with Microsoft Support, where disabling RSS improves the performance as well. Test this setting in test environments before applying it on production environments. The following example is for Intel NICs.
Get the preset maximum values:
ethtool -l eth0
Combine the queues with the value reported in the preset "Combined" maximum value. In this example, the value is set to
8:ethtool -L eth0 combined 8
Verify the setting:
ethtool -l eth0
-
Work with NIC port IRQ affinity. To achieve expected performance by tweaking the IRQ affinity, consider few important parameters like Linux handling of the server topology, NIC driver stack, default settings, and
irqbalancesetting. Optimizations of the NIC port IRQ affinities settings are done with the knowledge of server topology, disabling theirqbalance, and using the NIC vendor-specific settings.The following example of Mellanox specific network infrastructure helps to explain the configuration. For more information, and to download the Mellanox mlnx tools, see Performance Tuning tools for Mellanox Network Adapters. The commands change based on the environment. Contact the NIC vendor for further guidance.
Disable
irqbalance, or get a snapshot of the IRQ settings and force the daemon to exit:systemctl disable irqbalance.service
Or:
irqbalance --oneshot
Make sure that
common_irq_affinity.shis executable:chmod +x common_irq_affinity.sh
Display IRQ affinity for Mellanox NIC port (for example,
eth0):./show_irq_affinity.sh eth0
Optimize for best throughput performance with a Mellanox tool:
./mlnx_tune -p HIGH_THROUGHPUT
Set hardware affinity to the NUMA node hosting physically the NIC and its port:
./set_irq_affinity_bynode.sh `\cat /sys/class/net/eth0/device/numa_node` eth0Verify the IRQ affinity:
./show_irq_affinity.sh eth0
Add IRQ coalescing optimizations:
ethtool -C eth0 adaptive-rx off ethtool -C eth0 adaptive-tx off ethtool -C eth0 rx-usecs 750 tx-frames-irq 2048
Verify the settings:
ethtool -c eth0
-
After you make the preceding changes, verify the speed of the NIC to ensure it matches your expectations by using the following command:
ethtool eth0 | grep -i Speed
-
For the best storage I/O performance, use Linux multiqueue scheduling for block devices. This scheduling method enables the block layer performance to scale well with fast solid-state drives (SSDs) and multicore systems. Check the documentation to see if your Linux distribution enables it by default. In most other cases, you can boot the kernel with
scsi_mod.use_blk_mq=yto enable it. The documentation for your Linux distribution might have further guidance on this setting. This setting is consistent with the upstream Linux kernel. -
Because multipath I/O is often used for [!INCLUDE ssnoversion-md] deployments, configure the device mapper (DM) multiqueue target to use the
blk-mqinfrastructure, by enabling thedm_mod.use_blk_mq=ykernel boot option. The default value isn(disabled). This setting reduces locking overhead at the DM layer when the underlying SCSI devices useblk-mq. For more information on how to configure multipath I/O, refer to your Linux distribution's documentation.
Ensure you have a properly configured swapfile to avoid any out of memory issues. Consult your Linux documentation for how to create and properly size a swapfile.
If you're running [!INCLUDE ssnoversion-md] on Linux in a virtual machine, make sure you select options to fix the amount of memory reserved for the virtual machine. Don't use features like Hyper-V Dynamic Memory.
Perform the following configuration tasks after you install [!INCLUDE ssnoversion-md] on Linux to achieve the best performance for your application.
Use ALTER SERVER CONFIGURATION to set PROCESS AFFINITY for all the NUMANODEs and CPUs you're using for [!INCLUDE ssnoversion-md] (which is typically for all NODEs and CPUs) on a Linux OS. Processor affinity helps maintain efficient Linux and SQL Scheduling behavior. Using the NUMANODE option is the simplest method. Use PROCESS AFFINITY even if you have only a single NUMA Node on your computer. For more information on how to set PROCESS AFFINITY, see the ALTER SERVER CONFIGURATION article.
Because a [!INCLUDE ssnoversion-md] on Linux installation doesn't offer an option to configure multiple tempdb files, we recommend that you consider creating multiple tempdb data files after installation. For more information, see the guidance in the article, Recommendations to reduce allocation contention in SQL Server tempdb database.
The following recommendations are optional configuration settings that you might choose to perform after installation of [!INCLUDE ssnoversion-md] on Linux. These choices are based on the requirements of your workload and configuration of your Linux OS.
To ensure there's enough free physical memory for the Linux operating system, the [!INCLUDE ssnoversion-md] process uses only 80% of the physical RAM by default. For some systems with large amounts of physical RAM, 20% might be a significant number.
For example, on a system with 1 TB of RAM, the default setting would leave around 200 GB of RAM unused. In this situation, you might want to configure the memory limit to a higher value. You can adjust this value with the mssql-conf tool. For more information, see the memory.memorylimitmb setting that controls the memory visible to [!INCLUDE ssnoversion-md] (in units of MB).
Note
You can also configure a memory limit using the MSSQL_MEMORY_LIMIT_MB environment variable. This method is commonly used when deploying containers, or automating [!INCLUDE ssnoversion-md] container or package-based deployments. The MSSQL_MEMORY_LIMIT_MB environment variable takes precedence over the memory.memorylimitmb setting.
When changing this setting, be careful not to set this value too high. If you don't leave enough memory, you could experience problems with the Linux OS and other Linux applications.
[!INCLUDE cgroup-support]
When setting memory limits for [!INCLUDE ssnoversion-md] on Linux, consider the following guidelines:
-
Use
cgroupto limit the overall memory available to the container. This setting establishes the upper bound for all processes inside the container. -
The memory limit (whether set by
memorylimitmbor theMSSQL_MEMORY_LIMIT_MBenvironment variable) controls the total memory that [!INCLUDE ssnoversion-md] on Linux can allocate across all its components, such as the buffer pool, SQLPAL, SQL Server Agent, LibOS, PolyBase, Full-Text Search, and any other process loaded in [!INCLUDE ssnoversion-md] on Linux. -
The
MSSQL_MEMORY_LIMIT_MBenvironment variable takes precedence overmemorylimitmbdefined inmssql.conf. -
memorylimitmbcan't exceed the actual physical memory of the host system. -
Set
memorylimitmblower than the host system memory and the cgroup limit (if present), to ensure there's enough free physical memory for the Linux operating system. If you don't explicitly setmemorylimitmb, [!INCLUDE ssnoversion-md] uses 80% of the lesser value between total system memory and the cgroup limit (if present). -
The max_server_memory server configuration option limits only the size of the [!INCLUDE ssnoversion-md] buffer pool, and doesn't govern overall memory usage for [!INCLUDE ssnoversion-md] on Linux. Always set this value lower than
memorylimitmbto ensure sufficient memory remains for other components, such as SQLPAL, SQL Server Agent, LibOS, PolyBase, Full-Text Search, and any other process loaded in [!INCLUDE ssnoversion-md] on Linux.