Infrastructure tuning
Since Centrifugo deals with many persistent connections, your operating system and server infrastructure must be prepared for it.
Open files limit
You should increase the maximum number of open files the Centrifugo process can open if you want to handle more connections.
To get the current open files limit, run:
ulimit -n
On Linux you can check limits for a running process using:
cat /proc/<PID>/limits
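To see how many file descriptors a running Centrifugo process currently uses on Linux, you can count the entries in its /proc fd directory (here <PID> is the same placeholder as above):
ls /proc/<PID>/fd | wc -l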
The open file limit shows approximately how many clients your server can handle: each connection consumes one file descriptor. On many systems this limit is quite low by default (for example, 256 on macOS and 1024 on typical Linux distributions).
See this document for more information on how to increase this number.
If you install Centrifugo using an RPM from the repository, the max open files limit is automatically set to 65536.
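If you run Centrifugo under systemd yourself, a drop-in override is one way to raise the limit. This is a minimal sketch, assuming your unit is named centrifugo.service:
# /etc/systemd/system/centrifugo.service.d/override.conf
[Service]
LimitNOFILE=65536
After adding the override, run systemctl daemon-reload and restart the service. For a process started from a shell, ulimit -n 65536 in that shell (up to the configured hard limit) has the same effect for the current session.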
You may also need to increase max open files for Nginx (or any other proxy in front of Centrifugo).
Ephemeral port exhaustion
The ephemeral port exhaustion problem can happen between your load balancer and the Centrifugo server. If your clients connect directly to Centrifugo without any load balancer or reverse proxy software in between, you most likely won't have this problem. But load balancing is a very common setup.
The problem arises from the fact that each TCP connection is uniquely identified in the OS by a 4-tuple:
source ip | source port | destination ip | destination port
On the load balancer/server boundary the source IP, destination IP and destination port are all fixed, so only the source port varies and you are limited to 65536 possible connections. In practice not even all 65536 ports are available: the number depends on the configured ephemeral port range (the Linux default of 32768-60999 gives roughly 28k ports).
To mitigate the problem you can:
- Increase the ephemeral port range by tuning the ip_local_port_range option (see the sysctl example after this list)
- Deploy more Centrifugo server instances to load balance across
- Deploy more load balancer instances
- Use virtual network interfaces
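For example, on Linux you can check and widen the ephemeral port range via sysctl. This is a sketch: the exact range you pick should not overlap ports your own services listen on:
sysctl net.ipv4.ip_local_port_range
sysctl -w net.ipv4.ip_local_port_range="15000 65000"
To make the change survive a reboot, put net.ipv4.ip_local_port_range = 15000 65000 into /etc/sysctl.conf (or a file under /etc/sysctl.d/) and apply it with sysctl -p.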
See the post on the Pusher blog about this problem for more detailed solution steps.
Sockets in TIME_WAIT state
On the load balancer/server boundary one more problem can arise: sockets stuck in the TIME_WAIT state.
Under load, when lots of connections and disconnections happen, sockets can stay in the TIME_WAIT state and cannot be reused for a while. As a result you can get various errors when using Centrifugo, for example something like (99: Cannot assign requested address) while connecting to upstream in the Nginx error log and 502 responses on the client side.
To see how many sockets are in the TIME_WAIT state, run the following on the machine that opens connections to Centrifugo (TIME_WAIT sockets are not tied to a process, so filter by Centrifugo's port rather than its PID):
netstat -an | grep TIME_WAIT | grep <CENTRIFUGO_PORT> | wc -l
Nice article about TIME_WAIT sockets: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
The advice here is similar to the advice for the ephemeral port exhaustion problem:
- Increase the ephemeral port range by tuning the ip_local_port_range option
- Deploy more Centrifugo server instances to load balance across
- Deploy more load balancer instances
- Use virtual network interfaces
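One mitigation discussed in the article above is to let the kernel reuse sockets in the TIME_WAIT state for new outgoing connections. This is a sketch and only affects connections originated from the machine it is applied on (the load balancer side in our case):
sysctl -w net.ipv4.tcp_tw_reuse=1
When Nginx is the proxy, enabling keepalive connections to the Centrifugo upstream also helps, simply because fewer connections get opened and closed in the first place.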
Proxy max connections
Proxies like Nginx and Envoy have default limits on the maximum number of connections that can be established. Make sure you have a reasonable limit for the maximum number of incoming and outgoing connections in your proxy configuration.
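For Nginx, for example, the relevant directives are worker_connections and worker_rlimit_nofile. A minimal sketch (not a complete configuration); keep in mind that each proxied client consumes two connections, one to the client and one to the upstream:
# nginx.conf fragment
worker_rlimit_nofile 65536;
events {
    worker_connections 16384;
}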
Conntrack table
More rarely (since the default limit is usually sufficient), the possible number of connections can be limited by the conntrack table. The Netfilter framework, on which iptables is built, keeps track of all connections and has a limited capacity for this information. See how to check its limits and instructions on increasing them in this article.
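On Linux you can quickly compare the current number of tracked connections with the limit using sysctl (the keys live under net.netfilter once the conntrack module is loaded):
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
Raising net.netfilter.nf_conntrack_max works the same way as the other sysctl settings shown above.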