My client runs a few WordPress websites, and they ran into a strange issue. The stack is simple: just WordPress and MariaDB, both running in Docker on a Google Cloud VM.

When the issue hit, the server stopped responding and requests timed out. The VM itself became inaccessible; we couldn't even SSH into it. We checked the logs of both WordPress and MariaDB but found nothing useful to identify the issue: just normal activity, then silence.

The sites had been attacked/DDoSed before, so that was our first suspect. But the metrics from the incident didn't support it: there was no huge burst of external connections to the VM, so there was no attack, at least not at the application layer (the whole site is routed through Cloudflare for DDoS protection).

The metrics looked fine, for the most part:

  • A brief CPU spike, then back to normal

  • RAM was not full, only ~50% utilization

  • A sudden, huge burst of disk throughput

We didn't understand the root cause, and since it only happened a few times a month, we kind of "ignored" it.

A few days ago came the final straw, and I jumped in and debugged it for hours. It turned out to be a simple root cause with an equally simple fix.

Understanding the Cause

We had checked the container logs and the metrics, but we had forgotten a very important place: the syslog.

kernel: [ 8085.099452] mariadbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
kernel: [ 8085.099486] CPU: 1 PID: 10069 Comm: mariadbd Not tainted 6.8.0-1033-gcp #35~22.04.1-Ubuntu
kernel: [ 8085.099491] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
kernel: [ 8085.099497] Call Trace:
kernel: [ 8085.099503] <TASK>
kernel: [ 8085.099521] dump_stack_lvl+0x76/0xa0
kernel: [ 8085.099631] dump_stack+0x10/0x20
kernel: [ 8085.099635] dump_header+0x47/0x200
kernel: [ 8085.099691] oom_kill_process+0x116/0x270
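
If you suspect the same culprit on your own box, grepping the kernel log is the quickest check. A minimal sketch (exact log paths vary by distro):

# Search the kernel ring buffer for OOM Killer activity
sudo dmesg -T | grep -i "oom-killer\|out of memory"

# On systemd-based systems, the journal keeps kernel messages too
sudo journalctl -k | grep -i oom

# Classic syslog location on Debian/Ubuntu
sudo grep -i oom /var/log/syslog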

There it was! mariadbd had been killed by the kernel's OOM Killer, meaning the system ran out of memory. But didn't the metrics say memory utilization was only ~50%? Then why did it run out of memory?

What is the OOM Killer?

The OOM Killer (Out-of-Memory Killer) is the part of the Linux kernel that prevents the whole system from freezing when memory runs out. When a process requests more memory than the system can provide, either because physical memory plus swap is exhausted or because a sufficiently large contiguous block can't be found due to fragmentation, the OOM Killer steps in. It picks a victim based on an internal scoring system (the oom_score, biased by oom_score_adj), usually targeting the process with the largest memory footprint relative to its priority. In our case, MariaDB's sudden large allocation both triggered the out-of-memory condition and, being the biggest consumer, made mariadbd the top candidate, so it was killed on the spot. That likely also explains the innocent-looking memory graph: the spike and the kill happened within seconds, faster than the monitoring agent's sampling interval, so the dashboard never caught it.
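
If you're curious how the kernel ranks its victims, the scores are exposed under /proc. A quick sketch (run as root to see every process):

# Print each process's oom_score next to its name, worst candidates first
for p in /proc/[0-9]*; do
  printf "%s %s\n" "$(cat "$p/oom_score" 2>/dev/null)" "$(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head

# oom_score_adj ranges from -1000 to 1000; -1000 makes a process unkillable
cat /proc/self/oom_score_adj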

The Problem: A Tiny Buffer Pool

Quick check:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.004 sec)

Only 128 MiB (134217728 bytes) for innodb_buffer_pool_size, which is the default! That is far too small for a WordPress site under moderate load.

The buffer pool is where MariaDB caches table data and indexes. When it is undersized, the database constantly evicts pages and re-reads them from comparatively slow disk storage, which increases latency and produces bursts of disk I/O and temporary memory allocations when loading data.
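
One way to tell whether the pool is undersized is to compare logical read requests against reads that actually had to hit the disk. A sketch using the compose service from this stack (the root password prompt is a placeholder for however you authenticate):

# Innodb_buffer_pool_reads counts reads that missed the cache and went to disk.
# If it keeps climbing relative to Innodb_buffer_pool_read_requests,
# the pool is too small.
docker compose exec mariadb mariadb -u root -p \
  -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"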

On top of that, MariaDB also allocates per-connection buffers (for sorting, joins, and temporary tables), which multiply RAM usage under concurrent queries. Combined with the tiny buffer pool and possible memory fragmentation, these bursts were enough to trigger the OOM Killer, which explains both the spike in disk throughput and the abrupt crash.
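
For a rough sense of the worst case, you can multiply the per-connection buffers by max_connections. This back-of-the-envelope check is only an upper bound, but it shows how concurrent queries can blow far past the "idle" memory usage the dashboards report:

# Per-connection buffers; each active connection may allocate several of these
docker compose exec mariadb mariadb -u root -p -e "
  SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('max_connections','sort_buffer_size','join_buffer_size',
   'read_buffer_size','read_rnd_buffer_size','tmp_table_size');"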

The Fix

The VM has 2 GB of RAM, so I allocated:

  • innodb_buffer_pool_size set to 50% of RAM (1 GB).

  • A 75% RAM limit (1.5 GB) for MariaDB via Docker. This way, even if the DB somehow misbehaves, it cannot take down the whole VM.

mariadb:
  image: mariadb:11.4
  ...
  deploy:
    resources:
      limits:
        memory: 1.5G              # hard cap: ~75% of the 2 GB VM
  command:
    - "--innodb_buffer_pool_size=1G"   # ~50% of RAM for InnoDB's cache

The Result

After restarting the stack with the new settings:

  • Disk throughput dropped sharply.

  • Page loads became noticeably faster, an optimization win I hadn't even planned for!

  • No more unexpected process kills.

Takeaway

  • Check the app logs and metrics, but don't forget the syslog when your system is misbehaving; the kernel often knows what your containers don't.

  • Don't run your database on defaults; tune its config (especially the memory settings) for your workload.