Additional Tweaks

BBR (TCP Congestion Control Algorithm)

Google developed a TCP Congestion Control Algorithm (CCA) called TCP Bottleneck Bandwidth and Round-trip propagation time (BBR) that overcomes many of the issues found in both Reno and CUBIC (the usual defaults). This new algorithm not only achieves significant bandwidth improvements, but also lower latency: it is a congestion control algorithm built for the congestion of the modern internet. TCP BBR is already employed on Google's servers, and now it's possible to apply it to your VPS — so long as your Linux machine is running kernel 4.9 or newer.
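
Since BBR needs kernel 4.9 or newer, it's worth confirming your kernel version before going any further. A minimal sketch using a version-aware sort:

```shell
# Compare the running kernel against the 4.9 minimum required for BBR.
# sort -V sorts version strings, so the smaller of the two comes first.
required="4.9"
current="$(uname -r | cut -d- -f1)"
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
    echo "Kernel $current is new enough for BBR"
else
    echo "Kernel $current is too old for BBR"
fi
```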

Further Reading:

https://cloud.google.com/blog/products/gcp/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster

https://medium.com/google-cloud/tcp-bbr-magic-dust-for-network-performance-57a5f1ccf437

To check which CCAs are available to the kernel, run:

sudo sysctl net.ipv4.tcp_available_congestion_control

It will probably display:

net.ipv4.tcp_available_congestion_control = cubic reno

and to see which one is currently in use:

sudo sysctl net.ipv4.tcp_congestion_control

To switch over to using BBR...

sudo nano /etc/sysctl.conf

and add the following two lines at the bottom of the file

net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

and reload sysctl with:

sudo sysctl -p

Now check again with

sudo sysctl net.ipv4.tcp_congestion_control

and you should see confirmation:

net.ipv4.tcp_congestion_control = bbr

Adjust Open Files Limit

Linux provides ways of limiting the amount of resources a user or process can consume, such as the 'max processes per user' (nproc) and 'max open files' (nofile) limits. The default maximum open files limit is 1024, which is very low, especially for a busy web server that may be hosting multiple sites driven by SQL databases.

The current limit can be checked with the command:

ulimit -n
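
Note that ulimit -n on its own reports the soft limit for the current shell; to see the soft and hard limits side by side:

```shell
# -S = soft limit (enforced right now), -H = hard limit (the ceiling)
echo "soft open files limit: $(ulimit -Sn)"
echo "hard open files limit: $(ulimit -Hn)"
```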

To increase this, we need to make changes in three files. Firstly...

sudo nano /etc/sysctl.conf

and add the following line:

fs.file-max = 65535

Then edit the limits.conf file:

sudo nano /etc/security/limits.conf

And add the following:

*    soft  nproc   65535
*    hard  nproc   65535
*    soft  nofile  65535
*    hard  nofile  65535
root soft  nproc   65535
root hard  nproc   65535
root soft  nofile  65535
root hard  nofile  65535

Limits can be hard or soft. Hard limits are set by the root user; only root can increase a hard limit, though unprivileged users can lower their own. Soft limits can be set and changed by any user, but they cannot exceed the hard limit.
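
You can see this behaviour for yourself in a throwaway subshell — the soft limit can be lowered and raised freely, but never above the hard limit:

```shell
# Run in a subshell so the changes don't affect the current shell.
(
    ulimit -Sn 512                # lowering the soft limit: always allowed
    echo "soft limit is now $(ulimit -Sn)"
    ulimit -Sn "$(ulimit -Hn)"    # raising it back up to the hard limit: allowed
    echo "soft limit restored to $(ulimit -Sn)"
)
```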

And finally we edit the pam.d config:

sudo nano /etc/pam.d/common-session

Add the following line:

session required pam_limits.so

We then apply the changes with:

sudo sysctl -p

Reboot, then after logging back in, check with:

ulimit -n

Raise Minimum Amount of Entropy

Explained simply by user Janne Pikkarainen on Serverfault.com:

"Your system gathers some "real" random numbers by keeping an eye on different events: network activity, hardware random number generators (if available; for example, VIA processors usually have a "real" random number generator), and so on. It feeds those to the kernel entropy pool, which is used by /dev/random. Applications which need some extreme security tend to use /dev/random as their entropy source, or in other words, the randomness source. If /dev/random runs out of available entropy, it's unable to serve out more randomness and the application waiting for the randomness stalls until more random stuff is available."

https://serverfault.com/questions/172337/explain-in-plain-english-about-entropy-available

Virtual machines have issues with the above sources of random input because they lack keyboard and mouse interrupts and may have some or all of the hardware virtualized, making it more predictable. This is especially problematic if you are hosting a web app with TLS everywhere and it doesn't receive much traffic that would help the machine generate additional entropy.
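
You can check how much entropy the kernel currently has available. On older kernels this fluctuates (values far below the pool size of 4096 suggest a starved pool); note that recent kernels (roughly 5.18 onward) report a fixed 256 here because the random subsystem no longer tracks a depleting pool:

```shell
# Read the kernel's current entropy estimate, in bits.
cat /proc/sys/kernel/random/entropy_avail
```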

Further Reading:

https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged

https://alcidanalytics.com/p/entropy-on-cloud-vps

To raise the minimum amount of entropy, we can install a service called Haveged.

The haveged project is an attempt to provide an easy-to-use, unpredictable random number generator based upon an adaptation of the HAVEGE algorithm. Haveged was created to remedy low-entropy conditions in the Linux random device that can occur under some workloads, especially on headless servers.

http://www.issihosts.com/haveged/

To install:

sudo apt update && sudo apt install haveged apparmor-utils

And edit the config:

sudo nano /etc/default/haveged

Make sure the line reads:

DAEMON_ARGS="-w 1024"

And prevent AppArmor from stopping the service from starting:

sudo aa-complain /usr/sbin/haveged

Start the service and configure it to start on boot:

sudo service haveged start
sudo update-rc.d haveged defaults

Additional Kernel Tweaks

Here are a few more tweaks that have helped, especially with the softIRQ budget warnings I was constantly being notified of by my monitoring system. I wouldn't advise just copying and pasting the whole lot into your config. Take the time to read the following pages first:

http://www.nateware.com/linux-network-tuning-for-2013.html

https://blog.cloudflare.com/the-story-of-one-latency-spike/

https://wiki.archlinux.org/index.php/sysctl

http://www.cyberciti.biz/faq/linux-kernel-etcsysctl-conf-security-hardening/

https://gist.github.com/gburd/6f555a7b1197260d8472e24a35b7752c

Edit the sysctl config file:

$ sudo nano /etc/sysctl.conf

Add the lines below if necessary (please do your research first!). I have included the lines we added earlier for the BBR Congestion Control Algorithm, and for increasing the open files limit.

#Reboot the machine 10 seconds after a kernel panic
kernel.panic=10

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1

# Protects against creating or following links under certain conditions
fs.protected_hardlinks=1
fs.protected_symlinks=1

#Enable ExecShield protection
#Note: kernel.exec-shield only exists on older Red Hat kernels; leave it
#commented out (or remove it) if sysctl -p reports an unknown key
#kernel.exec-shield = 2
kernel.randomize_va_space=2

# increase system file descriptor limit    
fs.file-max = 65535

#Allow for more PIDs 
kernel.pid_max = 65536

#Disable zone reclaim
vm.zone_reclaim_mode = 0

#Reduce swap usage
vm.swappiness = 10

###############################################
########## IPv4 networking start ##############
###############################################

# Send redirects, if router, but this is just server
# So no routing allowed 
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# Accept packets with SRR option? No
net.ipv4.conf.all.accept_source_route = 0

# Accept Redirects? No, this is not router
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 1

#Ignore bad ICMP errors
net.ipv4.icmp_ignore_bogus_error_responses=1

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# TCP window scaling allows receive windows larger than 64KB, which is
# needed to keep high-bandwidth, high-latency links full.
net.ipv4.tcp_window_scaling = 1

# If enabled, assume that no receipt of a window-scaling option means that the
# remote TCP is broken and treat the window as a signed quantity. If disabled,
# assume that the remote TCP is not broken even if we do not receive a window
# scaling option from it.
net.ipv4.tcp_workaround_signed_windows = 1

# TCP SACK and FACK refer to options found in RFC 2018 and are also documented
# back to Linux Kernel 2.6.17 with an experimental "TCP-Peach" set of
# functions. These are meant to get you your data without excessive losses.
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1

# The latency setting is 1 if you prefer more packets vs bandwidth, or 0 if you
# prefer bandwidth. More packets are ideal for things like Remote Desktop and
# VOIP: less for bulk downloading.
#net.ipv4.tcp_low_latency = 0

# RFC 2923 is a good review of PMTU. IPv6 uses PMTU discovery by default to
# avoid fragmenting packets at the router level, but it's optional for IPv4.
# PMTU discovery is meant to find the best packet size to use between links,
# but it's a common admin practice to block the ICMP messages it relies on,
# thus breaking the mechanism. Linux tries to use it by default, and so do I:
# if you have problems, you have a problem router, and can set ip_no_pmtu_disc
# to 1. "MTU probing" is also a part of this: 1 means try, and 0 means don't.
#net.ipv4.ip_no_pmtu_disc = 0
#net.ipv4.tcp_mtu_probing = 1

# FRTO is a mechanism in newer Linux kernels to optimize for wireless hosts:
# use it if you have them; delete the setting, or set to 0, if you don't.
#net.ipv4.tcp_frto = 2
#net.ipv4.tcp_frto_response = 2

# Log packets with impossible addresses to kernel log? yes
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Ignore all ICMP ECHO and TIMESTAMP requests sent to it via broadcast/multicast
net.ipv4.icmp_echo_ignore_broadcasts = 1

#Increase system IP port limits
net.ipv4.ip_local_port_range = 15000 65000

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# Enable TCP/IP SYN cookies, see http://lwn.net/Articles/277146/
# Note: This may impact IPv6 TCP sessions too.
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 5

# Enable source validation by reversed path, as specified in RFC1812, which
# turn on Source Address Verification in all interfaces to prevent some
# spoofing attacks.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1 

# RFC 1337, TIME-WAIT Assassination Hazards in TCP, a fix written in 1992
# for some theoretically-possible failure modes for TCP connections. To this
# day this RFC still has people confused if it negatively impacts performance
# or not or is supported by any decent router. Murphy's Law is that the only
# router that it would even have trouble with, is most likely your own.
net.ipv4.tcp_rfc1337 = 1

###############################################
########## IPv6 networking start ##############
###############################################

# Uncomment the next line to enable packet forwarding for IPv6.  Enabling this
# option disables Stateless Address Autoconfiguration based on Router
# Advertisements for this host.
#net.ipv6.conf.all.forwarding = 0

# Number of Router Solicitations to send until assuming no routers are present.
# This is host and not router
net.ipv6.conf.default.router_solicitations = 0

# Accept packets with SRR option? No
net.ipv6.conf.all.accept_source_route = 0

# Accept Router Preference in RA?
net.ipv6.conf.default.accept_ra_rtr_pref = 0

# Learn Prefix Information in Router Advertisement
net.ipv6.conf.default.accept_ra_pinfo = 0

# Accept a default router from router advertisements? No
net.ipv6.conf.default.accept_ra_defrtr = 0

#router advertisements can cause the system to assign a global unicast address to an interface
net.ipv6.conf.default.autoconf = 0

#how many neighbor solicitations to send out per address?
net.ipv6.conf.default.dad_transmits = 0

# How many global unicast IPv6 addresses can be assigned to each interface?
net.ipv6.conf.default.max_addresses = 1

# Do not accept ICMP redirects (prevent MITM attacks)
# (there is no secure_redirects setting for IPv6; it is IPv4-only)
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

############################################
##### TCP Tuning ###########################
############################################

# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
# Also increase the max packet backlog
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 10
net.core.netdev_max_backlog = 60000
net.core.netdev_budget = 60000
net.core.netdev_budget_usecs = 6000

# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192

# Change Congestion Control Algorithm to BBR
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

When finished, apply the new configuration. A reboot is recommended:

sudo sysctl -p
sudo reboot
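
After the reboot, you can spot-check that the key settings stuck by reading them straight back out of /proc (each path mirrors one of the sysctl keys set above):

```shell
# Each of these should echo the value set in /etc/sysctl.conf.
cat /proc/sys/net/ipv4/tcp_congestion_control   # expect: bbr
cat /proc/sys/net/core/default_qdisc            # expect: fq
cat /proc/sys/fs/file-max                       # expect: 65535
```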