Lessons learned in DevOps during the past years!

A New Year post is usually about resolutions! But isn’t it also the best time to revisit last year’s mistakes and resolve never to repeat them this year (or in the years to come)?! Since my strongest skills are in DevOps, I’d like to share some oops moments (you may call them blunders) that you’d never want to repeat, whether you are starting out in DevOps or just want to understand where things go wrong in DevOps. In general, if you go with the defaults, you’ll be in trouble in the future. Whatever software you use, make sure you understand the default values and what each of them does! Here are the top three mistakes that I made…

Note: Most of them affect only high-traffic sites or a large server that hosts plenty of low-traffic websites. If you host only one low-traffic website, you’d be safe with the default values!

Nginx

Nginx uses the worker_processes directive to define the number of worker processes to run on the server. The default value for worker_processes is just 1! Keeping the default on a production server with thousands of visitors is like sending a crowd through a narrow opening!

As a rule of thumb, the value should match the number of CPU cores! Or set it to auto to let Nginx detect the number of cores by itself. There are many more options to tune. The official tuning guide is the best place to get started.
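If you’d rather pin an explicit number than rely on auto (say, from a provisioning script), a minimal Python sketch could look like the following. The function name worker_processes_directive is hypothetical, and os.cpu_count() reports logical CPUs, which is only an assumption about what should count as a “core” on your server.

```python
import os


def worker_processes_directive(cores=None):
    """Return an nginx.conf line that pins worker_processes to a core count.

    Hypothetical helper for illustration; it is not part of Nginx.
    """
    if cores is None:
        # os.cpu_count() can return None when the count is unknown;
        # fall back to 1, which is also the Nginx default.
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return f"worker_processes {cores};"


# Print the line to paste into the main (top-level) context of nginx.conf.
print(worker_processes_directive())
```

In most cases worker_processes auto; does the same job with less effort; the sketch only matters when you want the value written out explicitly.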

PHP-FPM

Unlike Nginx, PHP-FPM has no built-in default for the process manager (pm). It is a mandatory setting, so each GNU/Linux distribution ships its own value. Here are the defaults from Ubuntu Xenial (16.04)…

; Choose how the process manager will control the number of child processes.
; Possible Values:
;   static  - a fixed number (pm.max_children) of child processes;
;   dynamic - the number of child processes are set dynamically based on the
;             following directives. With this process management, there will be
;             always at least 1 children.
;             pm.max_children      - the maximum number of children that can
;                                    be alive at the same time.
;             pm.start_servers     - the number of children created on startup.
;             pm.min_spare_servers - the minimum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is less than this
;                                    number then some children will be created.
;             pm.max_spare_servers - the maximum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is greater than this
;                                    number then some children will be killed.
;  ondemand - no children are created at startup. Children will be forked when
;             new requests will connect. The following parameter are used:
;             pm.max_children           - the maximum number of children that
;                                         can be alive at the same time.
;             pm.process_idle_timeout   - The number of seconds after which
;                                         an idle process will be killed.
; Note: This value is mandatory.
pm = dynamic

; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI. The below defaults are based on a server without much resources. Don't
; forget to tweak pm.* to fit your needs.
; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
; Note: This value is mandatory.
pm.max_children = 5

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 2

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 1

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 3

There is nothing wrong with these values if you have a small server with a low-traffic site. For all other sites and servers, they can cause errors such as the following…

[pool www] server reached pm.max_children setting (5), consider raising it

Also, there is no single correct set of values that fits every server. If you have a large server with a bunch of small sites, you may want to use pm = ondemand so that resources are allocated to PHP only on demand. If you have a high-traffic site with (almost) steady traffic 24×7, you can achieve stability by setting pm = static and pm.max_children = n, where n depends on the total memory and the memory consumed by a single PHP process! Start with a lower value and keep increasing it until memory usage reaches 80% to 95%! Be sure to leave a buffer. We never know when we’ll need extra memory! In short, never set pm to dynamic, no matter how good you are at maths!
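To get a starting point for n, the arithmetic can be sketched in Python. This is only a rough estimate under stated assumptions: the function name estimate_max_children and all the sample numbers are hypothetical, the per-child memory has to be measured on your own server, and reserved_mb stands for whatever the OS, Nginx, the database and so on need.

```python
def estimate_max_children(total_mb, reserved_mb, per_child_mb, target_fraction=0.8):
    """Estimate a starting pm.max_children for pm = static.

    total_mb        -- total RAM on the server, in MB
    reserved_mb     -- RAM kept aside for everything that is not PHP-FPM
    per_child_mb    -- average memory of one PHP-FPM child (measure it!)
    target_fraction -- share of total RAM the server may use overall
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    # Memory left for PHP-FPM once the usage cap and other services are counted.
    budget_mb = total_mb * target_fraction - reserved_mb
    # Round down, and never go below one child.
    return max(1, int(budget_mb // per_child_mb))


# Hypothetical server: 4 GB RAM, 1 GB for other services, 64 MB per PHP child.
print(estimate_max_children(4096, 1024, 64))  # -> 35
```

Treat the result as the lower value to start from, then raise it gradually while watching real memory usage.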

Redis

Redis keeps its entire content in memory, and no amount of swap changes that behaviour. Redis can be used as the backend for the WP Object Cache. A number of plugins for integrating Redis are available from the official WP repository and also from GitHub.

When Redis reaches its memory limit, it is configured by default to decline any more content (until some existing content expires or is removed). Here’s the direct quote from the official redis.conf file…

If Redis can’t remove keys according to the policy, or if the policy is set to ‘noeviction’, Redis will start to reply with errors to commands that would use more memory, like SET, LPUSH, and so on, and will continue to reply to read-only commands like GET.

The result can be seen in the WordPress backend, where changes to settings do not persist!

There are two ways to resolve this situation. The first one is obvious: increase the server memory. The other is to set a memory limit for Redis and let it evict its content automatically when it reaches that limit. The official redis.conf file contains excellent documentation on how to set both the limit and the eviction policy, and on the options available for each. Here’s an overview of them…

# Don't use more memory than the specified amount of bytes.
# When the memory limit is reached Redis will try to remove keys
# according to the eviction policy selected (see maxmemory-policy).
#
# If Redis can't remove keys according to the policy, or if the policy is
# set to 'noeviction', Redis will start to reply with errors to commands
# that would use more memory, like SET, LPUSH, and so on, and will continue
# to reply to read-only commands like GET.
#
# This option is usually useful when using Redis as an LRU cache, or to set
# a hard memory limit for an instance (using the 'noeviction' policy).
#
# WARNING: If you have slaves attached to an instance with maxmemory on,
# the size of the output buffers needed to feed the slaves are subtracted
# from the used memory count, so that network problems / resyncs will
# not trigger a loop where keys are evicted, and in turn the output
# buffer of slaves is full with DELs of keys evicted triggering the deletion
# of more keys, and so forth until the database is completely emptied.
#
# In short... if you have slaves attached it is suggested that you set a lower
# limit for maxmemory so that there is some free RAM on the system for slave
# output buffers (but this is not needed if the policy is 'noeviction').
#
# maxmemory <bytes>

# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key according to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
#
# Note: with any of the above policies, Redis will return an error on write
#       operations, when there are no suitable keys for eviction.
#
#       At the date of writing these commands are: set setnx setex append
#       incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
#       sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
#       zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
#       getset mset msetnx exec sort
#
# The default is:
#
# maxmemory-policy noeviction

Did you notice the last line, maxmemory-policy noeviction?!
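The difference between noeviction and an evicting policy is easier to see in a toy model. The Python sketch below is not how Redis works internally: the ToyCache class is hypothetical, it counts keys instead of bytes, and it uses an exact LRU order, which is an assumption made to keep the example short. It only illustrates why writes fail under one policy and keep working under the other.

```python
from collections import OrderedDict


class ToyCache:
    """Toy size-capped key-value store (counts keys, not bytes)."""

    def __init__(self, max_keys, policy="noeviction"):
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        if policy not in ("noeviction", "allkeys-lru"):
            raise ValueError("unsupported policy in this toy model")
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()  # least recently used key comes first

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark the key as recently used
            return self.data[key]
        return None

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            if self.policy == "noeviction":
                # Reads keep working, but new writes are refused.
                raise MemoryError("limit reached, write rejected")
            self.data.popitem(last=False)  # evict the least recently used key
        self.data[key] = value
        self.data.move_to_end(key)


strict = ToyCache(2, "noeviction")
strict.set("a", 1)
strict.set("b", 2)
try:
    strict.set("c", 3)
except MemoryError as err:
    print("write failed:", err)
print(strict.get("a"))  # -> 1, reads still succeed

lru = ToyCache(2, "allkeys-lru")
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")       # "a" is now the most recently used key
lru.set("c", 3)    # evicts "b", the least recently used key
print(sorted(lru.data))  # -> ['a', 'c']
```

The failed write in the first half is the toy equivalent of a WordPress setting that refuses to save.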

Wishlist

In 2018, I wish Nginx would default to worker_processes auto;, PHP-FPM to pm = ondemand along with a high value for pm.max_children, and Redis to maxmemory-policy volatile-ttl! Until then, set these yourself to avoid the facepalm moments that I had!
