---
title: 'Laravel API, PHP-FPM, Nginx, Postgres, Kubernetes'
source: 'https://youtube.com/watch?v=G7Nug1Mr9VE'
video_id: 'G7Nug1Mr9VE'
date: 2026-06-28
duration_sec: 5149
---

# Laravel API, PHP-FPM, Nginx, Postgres, Kubernetes

> Source: [Laravel API, PHP-FPM, Nginx, Postgres, Kubernetes](https://youtube.com/watch?v=G7Nug1Mr9VE)

## Summary

This video provides a comprehensive walkthrough of deploying a Laravel RESTful API with PHP-FPM, Nginx, and PostgreSQL inside Docker containers orchestrated by Kubernetes. The presenter covers production cluster architecture, configuration decisions for PHP-FPM and Nginx, development environment setup, testing, deployment, and API design features like structured error responses and database constraint validation.

## Transcript

Hello everyone. This video is a
discussion of a Laravel RESTful API
project using PHP-FPM and Nginx with
a Postgres database, all in Docker
containers deployed in Kubernetes.
So I'll cover the setup in Kubernetes,
the choices and thought processes when
configuring PHP-FPM and Nginx, the
design of the development environment,
testing, deployment, and some features
of the API. Plus, there's a Vue.js front
end thrown in as well to log into the
API and interact with it. It's a bit too
simple to discuss in this video, but
it's also included in the code, and all
the code is in the repository in the
video description.
The production environment is my
Kubernetes cluster at home with a single
control plane and two worker nodes on a
pretty slow internet connection,
but that doesn't have too much impact on
the project and it can mostly be applied
anywhere.
The API itself has been kept generic so
you won't have to listen to any domain
specific concepts. The features I've
added are focused on internal workings
rather than a specific API use case. So
they could be applied and expanded to
many use cases.
For example, responses are restricted to
a set predictable structure, and each
error has a unique integer error code,
so the client can more easily handle
error responses programmatically.
Input validation for database columns
with unique or foreign key constraints
has been pushed to the database to
reduce the number of queries needed.
That's a controversial one, so we'll
discuss the pros and the cons.
There's also a fix for a very common
Nginx inefficiency regarding HTTP
compression,
and there are some helpful logs to help
us set the PHP-FPM and Nginx
configurations.
So the structure of the video is
production environment,
PHP-FPM and Nginx configuration,
the Dockerfiles, the dev environment,
and something of a CI/CD pipeline but
very minimal and the Laravel API itself
and discussion of security
considerations throughout.
So what does the production cluster look
like? The ingress controller manages
traffic for the cluster, handles TLS
termination, and applies HTTP
compression. It forwards API requests to
the Nginx and Laravel pod. If traffic
increases, the Nginx and Laravel pod
is duplicated by a horizontal pod
autoscaler.
So, as usual, we want to make sure that
it's completely stateless.
Then for the database we're using
Postgres managed by a StatefulSet.
Postgres zero is the primary instance
that all Laravel instances write to.
Postgres one and two and so on are the
read-only standby instances to
increase read throughput if we have high
concurrent usage.
So whenever a new standby instance
starts up, it duplicates the primary
database with pg_basebackup.
And whenever the primary instance
executes a write operation, it sends the
WAL, or write-ahead log, records to each
standby so that all instances are
synchronized in near real time.
To configure this in Laravel, in the
database config file, add read and write
elements to the database you're using,
specifying the pod for the write
instance and the service for the read
instances.
And for the development or testing
environments, override that in the
Laravel .env file, specifying the
database's service name in Docker
Compose.
For cache, we're using Redis,
and of course there's the Vue front end,
which is mostly decoupled. It could be
hosted here or anywhere else. And finally,
all these components are scraped by
Prometheus, and Grafana molds that data
into dashboards, which are also
accessible through the ingress controller.
Since the cluster has so few nodes, we
can present the CPU and memory usage of
each node in a single dashboard.
And the other dashboard I watch most is
for the ingress controller, especially
latencies for the three connected
services, the API, the front end, and
the Grafana front end we're using right
now.
On my slow home internet, the latencies
don't help us model production systems
very much. They're pretty slow, but the
variance can still be insightful.
The variance is expressed here with the
average latency of the fastest 50%, 95%
and 99% of requests.
In the ingress controller, we're most
commonly receiving requests from the CDN
with the original client's IP in the
X-Forwarded-For header.
So by specifying the CDN CIDR ranges
that we trust, we can safely strip them
from the forwarding chain and just leave
the client's IP.
That's important for logging and for
rate limiting by IP address downstream
in Nginx or Laravel.
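
The trusted-range idea can be sketched in a few lines of Python. This is only an illustration of the logic, not the ingress controller's actual implementation. The CIDR ranges and the function names `is_trusted` and `client_ip` are hypothetical, and the addresses come from documentation-only blocks.

```python
from ipaddress import ip_address, ip_network

# Hypothetical trusted CDN ranges (documentation-only address blocks).
TRUSTED_PROXIES = [ip_network("203.0.113.0/24"), ip_network("198.51.100.0/24")]


def is_trusted(addr: str) -> bool:
    """Return True if addr falls inside any trusted proxy range."""
    ip = ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(x_forwarded_for: str, peer_addr: str) -> str:
    """Walk the forwarding chain right to left, skipping trusted proxies.

    The first address that is not a trusted proxy is treated as the client.
    """
    chain = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    chain.append(peer_addr)  # the directly connected peer is the last hop
    for addr in reversed(chain):
        if not is_trusted(addr):
            return addr
    return chain[0]  # every hop was trusted; fall back to the leftmost entry


# A header entry added by an untrusted party (here 10.0.0.1) is ignored,
# because the walk stops at the first untrusted hop from the right.
print(client_ip("10.0.0.1, 192.0.2.10", "203.0.113.5"))
```

Walking from the right matters: anything to the left of the first untrusted address could have been written by the client itself, so it can't be relied on for rate limiting.
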
The ingress controller, like any
Nginx instance, generates a random
request ID and here we're attaching it
to the web request with a custom header.
And that way we can reference a
consistent request ID here in the
ingress controller and also as the
request passes on through Nginx and
Laravel and back again.
Regarding HTTP compression,
by default, gzip and Brotli compress
files of any size, but they both add
metadata to the compressed files. So for
files that are already small, we're
actually expending CPU power to compress
a file and actually increase the amount
of data to be transferred.
So we specify brotli_min_length and
gzip_min_length to set the file size
threshold at which the ingress
controller will apply compression.
But what's the logical threshold to set?
Well, the equilibrium point at which
compression starts to reduce file size
is different per file, but is usually
between 100 and 400 bytes.
So, is that the answer? Well, not
really. When we send a message over TCP,
as long as the payload fits within one
TCP segment,
the size of that payload has a
negligible impact on the transmission
time.
It's like adding more passengers to a
plane. It has almost no impact on the
flight time.
So from the client's perspective,
latency doesn't scale linearly with the
file size. It potentially jumps stepwise
when the file size necessitates sending
another TCP segment.
That means HTTP compression is
worthwhile only if it has a decent
likelihood of reducing the number of TCP
segments needed to hold a particular
payload.
So a legitimate strategy is to set the
HTTP compression threshold at the file
size that would trigger a second TCP
segment.
The maximum size of a TCP segment
depends on the MTU, the maximum
transmission unit on layer 3 of the OSI
model, which is typically 1,500 bytes.
Each IP packet has an IP header taking
20 to 60 bytes and a TCP header taking
20 to 60 bytes. And the rest of the
1,500 bytes is for the payload.
For the largest possible payload that is
still contained in one TCP segment,
it would contain the TLS record for
encryption, the HTTP headers, and
finally once all of that is accounted
for, the remaining space is for the HTTP
body, which is the candidate for
compression.
In this project, it's too early to
finalize what our standard HTTP headers
will be. So, we can't finalize this
optimization yet. But this is the
formula that will decide it once all of
the components are confirmed.
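
As a rough sketch of that formula in Python: the on-screen formula isn't captured in this transcript, so the structure below is reconstructed from the description above, and every number passed in is a placeholder rather than a measured value. The function names are hypothetical.

```python
import math

MTU = 1500  # bytes, the typical MTU mentioned above


def max_body_in_one_segment(ip_header: int, tcp_header: int,
                            tls_overhead: int, http_headers: int) -> int:
    """Bytes left for the HTTP body after all per-packet overheads."""
    return MTU - ip_header - tcp_header - tls_overhead - http_headers


def segments_needed(payload: int, mss: int) -> int:
    """Number of TCP segments required to carry payload bytes."""
    return max(1, math.ceil(payload / mss))


# Placeholder values only: minimum IP and TCP headers, a guessed TLS record
# overhead, and a guessed size for the response's HTTP headers.
threshold = max_body_in_one_segment(ip_header=20, tcp_header=20,
                                    tls_overhead=29, http_headers=400)
print(threshold)  # candidate value for gzip_min_length / brotli_min_length

# Latency steps up only when the payload spills into another segment.
print(segments_needed(1460, mss=1460), segments_needed(1461, mss=1460))
```

Once the real header sizes are known, the resulting threshold would be the candidate value for both min-length directives.
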
But there's one last twist. If the
response doesn't have a content length
header, then Nginx ignores the gzip_min_length
and brotli_min_length
directives and compresses every file
regardless of file size.
So for that reason in Laravel, we've got
a middleware to measure the body and add
the content length header.
There's a php.ini directive called
mbstring.func_overload.
If we set it to zero, we can safely use
the faster strlen function for adding the
content length header.
Otherwise, we'd need to use the
multibyte equivalent, mb_strlen,
and specify 8bit encoding to be
sure we're getting the number of bytes
instead of the character count.
We have to make sure this middleware is
late in the stack after any middleware
that will modify the body because if the
response body is longer than the content
length header, Nginx will cut off the
excess.
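
The byte-versus-character distinction is easy to see in a small sketch. This is a Python analogue rather than the project's PHP middleware, and the function name `content_length` is hypothetical.

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes the body occupies on the wire."""
    return len(body.encode(encoding))


body = '{"city": "Zürich"}'
print(len(body))             # character count
print(content_length(body))  # byte count: one more, because "ü" is two bytes in UTF-8
```

A header built from the character count would be one byte short here, which is exactly the case where the end of the body would get cut off.
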
Okay, onto Laravel and Nginx. There
are four ways of connecting PHP-FPM with
Nginx in Kubernetes.
The first decision is whether to put the
containers in separate pods or in the
same pod.
Nginx can handle far more connections
than PHP-FPM can, and separate pods would
let us scale them independently. So we
could save the memory overhead of the
excess Nginx pods.
Each Nginx pod has an overhead of
about 10 MB.
The downside is that Nginx could
forward requests to PHP-FPM instances on
different nodes, which would add network
latency.
Getting around this is fiddly and might
not be worth the memory saved.
Also, separate pods means that the logs
for a particular request would be split
up between different pods.
If we go with a single pod, that opens
up the choice of how Nginx and PHP-FPM
will talk to each other: either by using
TCP or by mounting a shared volume onto
both containers to hold a Unix socket
file.
Each TCP connection needs to be
established with a TCP handshake
consisting of three messages sent back
and forth. With the default Nginx
configuration, that handshake happens
for every single request routed to
PHP-FPM.
Each message sent is bundled with IP and
TCP headers increasing the amount of
data to be transferred. And depending on
the size of the request and the
response, they might be broken up into
packets before sending, then reassembled
at the other end. And finally, the
connection needs to be torn down with
another three messages.
Due to all this redundant stuff, I
really expected sockets to be measurably
faster than TCP.
Maybe the teardown would happen
concurrently to the response being sent
out, but the rest of it surely increases
latency.
However, in the very quick tests that I
set up, latency was almost identical
between the two methods.
That's interesting academically, and I
want to investigate more when I have the
time. But practically speaking, if we're
interested in such small savings in
latency, then we'd likely be better off
considering Swoole instead of PHP-FPM and
Nginx.
And then this whole architectural
decision would go away.
Both TCP and sockets benefit from the
Nginx directive fastcgi_keep_conn,
which keeps the connection open between
requests. So we wouldn't need the TCP
handshake for every request.
The PHP-FPM counterparts are
pm.process_idle_timeout, which sets the time
a worker can be idle before it's killed,
and pm.max_requests,
which caps how many requests a
worker can serve before it's respawned.
And we need to set fastcgi_pass in
Nginx and listen in PHP-FPM.
For TCP connections, we tell them which
port, and for sockets, we tell them the
location of the socket file.
Then the fourth and final option for
connecting Nginx with PHP-FPM in
Kubernetes is putting them in the same
container in the same pod.
This is workable but it creates
complications.
Kubernetes monitors the status of the
process with PID 1 as a health check. If
that process exits, Kubernetes stops the
container even if other processes are
running. And if another process exits,
Kubernetes has no idea and does nothing.
So if a container has more than one key
process, we need to implement a
replacement for the Kubernetes health
checks and process management.
In the end, I went with the middle
ground option of same pod, different
containers with shared volume between
them for a Unix socket file. Mostly
because I hadn't tried this setup
before.
And we'll see the detailed
implementation in the Kubernetes
manifest and the Dockerfile later in
the video.
So with that decided, let's jump into
the pod manifest.
As I explain components, I'll also build
up this visual representation to help
visualize how everything links together.
Obviously the Laravel and Nginx
containers are at the core, and the first
step is to inject the configuration
files with Kubernetes Secrets and
ConfigMaps and to mount the shared volume for
PHP-FPM to create the socket file we
just talked about.
If we can make the whole file system in
a container read-only, that's great for
security. So we need to identify where
the application needs write access and
then mount volumes at those locations
with write access and make everything
else read-only,
and the socket volume is our first one
of those.
The Laravel Bootstrap caches never
change in production.
So at first thought we'd run the cache
creation commands in the Dockerfile so
that they're part of the image and then
we'd make them read-only in production.
That's fine for event:cache, route:cache,
and view:cache. But config:cache needs
to read the .env file.
For security reasons, we can't put
sensitive files like the .env into the
image. So the only safe option is to run
config:cache within each pod as it
starts up.
So that means the cache directory needs
to have write access in production. But
we'd prefer it to only have read access
because it never changes once created,
and any attacker that gets write access
to the bootstrap caches could obviously
do serious damage.
But there is a way to get the best of
both worlds.
First, in the Dockerfile, we rename the
bootstrap/cache
directory to something else like cache
temp.
Then in Kubernetes, we run a Laravel
init container when the pod starts up.
It has the .env file injected and a
writable volume mounted at
bootstrap/cache.
We copy everything from the temp cache
directory into the volume at
bootstrap/cache
and then run php artisan config:cache to
create the final bootstrap cache file.
The time command just logs the memory
usage to help us set resource limits
later.
Then after the init container has
finished, the main Laravel container
starts up with the cache volume mounted
with readOnly: true.
And that's how we get a read-only cache
with sensitive data compiled at the pod
startup.
We're also running additive database
migrations in the init container.
The --isolated flag is very
consequential.
It means while this migration is
running, although normal reads and
writes can still happen concurrently,
no other migrate command with the
isolated flag can begin.
A migrate command without the isolated
flag can still run. So it's important
that we add it to every migrate command
in production to make sure concurrent
migrations are impossible.
It uses the cache to track whether the
isolated command is running or not
currently.
So if you're using the database for
cache, it creates a catch-22 situation
for your first migration. You'd need to
run migrations once without the isolated
flag first,
but most applications would just use
Redis, so it wouldn't be a problem.
An unexpected problem I ran into is that
if a container crashes and gets
restarted by Kubernetes,
the mounted volumes are not cleared.
That's the behavior we want most of the
time. We want data to be permanent for
the life of the pod, but it can cause
some misleading error logs. In the init
container startup script, it copied and
created the cache files with no problem.
Then it hit an error with migrations.
So, Kubernetes killed the container and
ran it again. But on the second run, the
volume was already populated. So the
error logs were about file permissions,
not the migration commands.
So just bear that in mind. If all else
is equal, move any operations on mounted
volumes to the end of the script. But in
our case, migrations actually depend on
the cache files being present.
We've got another init container to set
up the directory structure for Nginx's
writable volume.
I prefer to use Chainguard images to be
minimal and more secure, but the CPUs of
my nodes are too old and don't support
them.
And finally, we have one last container
that scrapes the Nginx metrics
endpoint and presents that data in a
format that Prometheus can then scrape.
Port 8080 is for publicly available
endpoints via ingress. Port 8081
is for internal traffic like health
checks and metrics, and then Prometheus
scrapes the exporter on its default port,
9113.
Kubernetes provides three types of
probes or health checks. Probes are
attached to containers, but the actions
on success or failure can impact either
that container it's attached to or the
whole pod.
When a pod starts up, if at least one
container has a startup probe, then that
pod won't initially be added to the
service's endpoints. So, it won't be
accessible by other pods or by outside
traffic from ingress.
If a startup probe fails, the container
it's attached to is killed and by
default configuration in Kubernetes, any
killed container is instantly restarted.
So that's hoping that a restart or a
slight delay will fix whatever caused
the startup probe to fail.
Once a container startup probe passes,
it will never run again. And instead,
the container's readiness probe and
liveness probe begin running if it has
them. And they will then run repeatedly
for as long as the pod exists.
Once all startup probes in a pod pass
and if there are no readiness probes,
then the pod is added to the service's
endpoints and it starts serving traffic.
If there are one or more readiness
probes, then the pod waits on them.
Once all readiness probes pass, the pod
is added to the service's endpoints.
But the readiness probes keep running
continuously.
And if at any time a readiness probe
fails, the pod is removed again. So a
failed readiness probe only has a pod
level effect. It doesn't kill the
container.
That's what liveness probes are for.
When a liveness probe fails, it has no
pod level effect. The pod can still
receive external traffic. Instead, a
failed liveness probe kills the
container it's attached to.
So, to summarize, a failed readiness
probe stops traffic flow reaching the
pod. A failed liveness probe kills the
container it's attached to. And a failed
startup probe does both of those, but
only when the pod is starting up.
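
That summary can be written down as a small lookup table plus one rule. This is only a model of the behavior described above, not Kubernetes code, and the names `FailureEffect`, `FAILURE_EFFECTS`, and `pod_receives_traffic` are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEffect:
    stops_traffic_to_pod: bool  # pod is kept out of the service's endpoints
    kills_container: bool       # the container the probe is attached to is killed
    only_at_startup: bool       # the probe stops running once it has passed


# What a failed probe of each type does, as summarized above.
FAILURE_EFFECTS = {
    "readiness": FailureEffect(stops_traffic_to_pod=True, kills_container=False, only_at_startup=False),
    "liveness": FailureEffect(stops_traffic_to_pod=False, kills_container=True, only_at_startup=False),
    "startup": FailureEffect(stops_traffic_to_pod=True, kills_container=True, only_at_startup=True),
}


def pod_receives_traffic(startup_passed: list, readiness_passing: list) -> bool:
    """The pod serves traffic only if every startup probe has passed and every
    readiness probe is currently passing. Empty lists mean no such probes."""
    return all(startup_passed) and all(readiness_passing)


print(pod_receives_traffic([True], []))             # no readiness probes defined
print(pod_receives_traffic([True], [True, False]))  # one readiness probe failing
```
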
Phew. I think that's the most concise
summary of probes I can give. So, how
can we apply these to our Laravel and
Nginx pod?
Well, we don't want requests reaching a
broken pod. So, a readiness check is
crucial.
And the ability to serve requests
depends on Nginx and Laravel both
working. So we put the readiness probe
on the Nginx container to query a
Laravel endpoint that returns a simple
plain text response.
That means the readiness check only
passes if Nginx and Laravel and their
connection are all fine.
The ability to serve requests also
depends on the connection to the
database and the cache. So we might
consider checking those connections as
part of the readiness check.
But if a database problem did occur, it
would affect all Laravel pods.
We wouldn't have a mix of healthy and
unhealthy pods. And the readiness probe
would react by removing this pod from
its service, which doesn't do anything
to solve the database problem. And
actually we'd slightly prefer to keep
the Laravel pod serving requests to give
as graceful a response as possible.
And then we'd rely on some other probe
to heal the database problem closer to
where it occurred. So no, the readiness
probe should not check the connections
to database and cache.
However, it is a good idea to add those
connections to the startup probe. That's
so that if we're deploying a new version
and I've messed up the connection
configurations,
Kubernetes will stop it going live and
keep the old version alive, serving
requests.
Another reason to have a startup check
is because we're doing OPcache
preloading at startup. So we need some
flexibility around the slightly
unpredictable bootup time.
The startup probe of course also needs
to check both Nginx and Laravel and
their connection. So we add it to the
Nginx container and call a startup
endpoint in Laravel.
This design has one perverse side effect:
if the startup probe fails, it
kills the Nginx container it's
attached to. Even though the problem is
much more likely to come from the
Laravel container or the database or
cache connections, but that's one
imperfection I think we can live with.
And finally, what about liveness
probes? Well, in the Nginx
configuration, I created an endpoint
that returns a simple plain text
response, but I think the chance of
Nginx messing up is so low that,
currently, I don't think it's worth
running a constant probe.
Laravel is a bit more likely to mess up.
So, we've got a liveness probe querying
the PHP-FPM status page.
The last big topic of this manifest is
the memory limits that Kubernetes
imposes on each container. So this is
where we'll transition to the topic of
configuration for PHP, PHP-FPM, and
Nginx.
The Kubernetes memory limit is a
failsafe that kills the container if it's
exceeded.
That's a pretty drastic action. So we
need to set it high enough to cover the
peak memory usage in normal operation
so that it's only triggered by abnormal
memory usage that we want to catch early
and contain.
This is my formula to estimate peak
usage in normal operation.
My understanding can be improved further
but I think this is a decent formula for
now. And at the end we use a margin
component to represent the degree of
confidence we have in our estimation.
PHP-FPM has one master process and a
variable number of workers that serve
web requests. So we need to determine
which memory expenses are per worker and
which are shared between all workers.
The master process overhead, the PHP
interpreter and its extensions,
OPcache, and any mounted volumes stored in
memory should all be counted once. And
then everything in the brackets is
multiplied by the maximum number of
workers specified by the PHP-FPM
directive, pm.max_children.
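
The shape of that estimate can be sketched as follows. The on-screen formula isn't captured in this transcript, so the bracketed per-worker part is collapsed into a single `per_worker_mb` figure, the margin is assumed to be a multiplier, and all the numbers are placeholders.

```python
def fpm_peak_memory_mb(master_mb: float, interpreter_mb: float, opcache_mb: float,
                       memory_volumes_mb: float, per_worker_mb: float,
                       max_children: int, margin: float = 0.2) -> float:
    """Estimate peak PHP-FPM container memory in normal operation."""
    # Counted once, because they are shared between all workers.
    shared = master_mb + interpreter_mb + opcache_mb + memory_volumes_mb
    # Counted once per worker, up to pm.max_children workers.
    per_worker_total = per_worker_mb * max_children
    # Assumption: the margin of confidence is applied as a percentage on top.
    return (shared + per_worker_total) * (1 + margin)


# Placeholder numbers, not measurements from the project.
estimate = fpm_peak_memory_mb(master_mb=10, interpreter_mb=30, opcache_mb=128,
                              memory_volumes_mb=16, per_worker_mb=40,
                              max_children=8)
print(f"{estimate:.1f} MB")
```
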
For an application that's 100% CPU
intensive, we'd set pm.max_children equal
to the number of CPU cores available to
the container.
But the larger the I/O wait is expected
to be, like waiting for database queries
to come back, the more we can raise
pm.max_children above the number of cores.
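
One common rule of thumb for putting a number on that is to divide the core count by the fraction of time a request actually spends on the CPU. This heuristic isn't from the video. It's an assumption offered as a starting point, and `suggested_max_children` is a hypothetical helper.

```python
import math


def suggested_max_children(cpu_cores: int, io_wait_fraction: float) -> int:
    """Workers needed to keep every core busy when each request spends
    io_wait_fraction of its wall-clock time waiting on I/O."""
    if not 0 <= io_wait_fraction < 1:
        raise ValueError("io_wait_fraction must be in [0, 1)")
    # Rounding first avoids floating-point noise pushing the ceiling up by one.
    return math.ceil(round(cpu_cores / (1 - io_wait_fraction), 6))


print(suggested_max_children(2, 0.0))   # fully CPU-bound: one worker per core
print(suggested_max_children(2, 0.75))  # mostly waiting on I/O: more workers
```

Whatever the heuristic suggests still has to fit inside the memory estimate, since each extra worker multiplies the per-worker cost.
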
Workers process one request at a time.
So memory usage doesn't scale infinitely
as the concurrency of requests grows. The
number of workers is a cutoff. So
pm.max_children is the ultimate cutoff.
And any queue of waiting requests mostly
consumes Nginx's memory allowance,
not PHP-FPM's.
memory_limit in php.ini is the
memory usage failsafe on script
execution.
So just like the Kubernetes memory
limit, we need to predict the peak
memory usage in normal operation of a
single script this time and add a margin
of confidence.
If abnormal memory usage happens in a
script, we want the PHP memory limit to
kill that script.
And the additional margin of the
Kubernetes memory limit means it can
only be triggered in an even rarer and
more extreme situation and it would kill
the whole container, not just a single
request.
To help set PHP's memory limit in
Laravel, we're logging the peak memory
usage for every request.
Similarly, we're also logging the
realpath cache size and the worker ID.
Doing it in the middleware's terminate
method means it's executed after the
response is sent out.
We can also use Xdebug profiling to find
exactly how memory is used in a
particular request execution.
And while profiling in Laravel, we
should disable garbage collection at the
start of the script to ensure accurate
readings.
So there's a setting in the .env file to
toggle garbage collection.
We have a similar structure for the
Nginx prediction of peak memory usage
in normal operation.
Shared resources are counted once and
the items in brackets are per
connection. So they get multiplied by
worker_processes, which is the number of
workers, and worker_connections, which is
the maximum number of connections per
worker.
The default for worker_connections is
512
and that can realistically be set as
high as 10,000 or more.
That means that since we will add a
margin of confidence to each of these
buffer sizes here to ensure they can
satisfy legitimate requests,
those margins would then be multiplied
thousands or tens of thousands of times
when we're calculating the container
level memory limit.
We would then be reserving a huge amount
of memory for a peak usage scenario that
is very unlikely to occur.
So to manage that problem, first we need
to align worker connections with the
peak request concurrency we want to
guarantee satisfying and with the memory
or the financial constraints that we
have.
Second, let's consider the size of these
buffers extremely carefully. Can we
restrict them without hurting UX?
Does exceeding a particular buffer kill
a request or does it just downgrade
performance and by how much?
So with that as our goal, how do these
buffers work? For an incoming request,
Nginx puts the headers into the client
header buffer.
If the headers exceed that buffer,
Nginx puts them into the large client
header buffers.
And if in that scenario the initial
client header buffer is no longer used,
we could safely remove that from our
peak usage formula. But I haven't had
time to experiment with that yet, so I'm
keeping it in just to be safe.
If the large client header buffers are
exceeded, Nginx returns a 400 Bad
Request error. That's a pretty drastic
action. So, we can't squeeze this buffer
too tightly, especially if we have a
broad or non-technical user base.
But if our API end users are technical
enough, we could put a low but
reasonable limit on the size of the
request headers and then require users
to read and abide by the documentation.
Nginx stores the request body in the
client body buffer and any excess is
stored on disk. So we can be more
aggressive with this buffer. Maybe only
guaranteeing it will hold the bodies of
95% or 99% of legitimate requests.
For an Nginx instance that only serves
static assets, like the Nginx serving
our Vue.js front end, legitimate users
will never send POST, PUT, or PATCH
requests.
So, we can set the client body buffer
size very low.
But, we'd still need to count it in our
formula because users can still fill up
that buffer. So, we don't want to give
malicious users the ability to trigger
our memory limit and kill the Nginx
container.
After receiving the request line and the
headers, Nginx compiles them to
determine how to route the request. Some
of the requests will be sent to PHP-FPM,
and it will run the Laravel application
to build the response, which will then be
sent back to Nginx.
The first part of the output from
PHP-FPM is stored in the preliminary buffer
determined by fastcgi_buffer_size.
It's crucial that the FastCGI headers
are fully included in this preliminary
buffer because if not, Nginx returns a
502 Bad Gateway error.
These FastCGI headers are what will
later be translated into the response's
HTTP status and HTTP headers.
So we need to be very aware of and
control the size of the headers returned
by Laravel.
In the Nginx access log, we're
recording the embedded variables
$upstream_response_length, which is the
total size of the response payload sent
from PHP-FPM, and $body_bytes_sent, which
is the size of the response body. So,
the total minus the body gives us the
size of the headers. This is likely to
be much smaller than the size of the
equivalent HTTP headers because it's in
binary key value format and doesn't
include end-of-line characters.
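
The subtraction described above is simple enough to run over a batch of log entries. In this sketch the sample values and the 4 KiB buffer size are invented, and it assumes Nginx itself isn't compressing the body, since that would shrink the body bytes sent and inflate the apparent header size.

```python
def fastcgi_header_bytes(upstream_response_length: int, body_bytes_sent: int) -> int:
    """Header size as described above: total upstream payload minus the body."""
    return upstream_response_length - body_bytes_sent


# Hypothetical (upstream_response_length, body_bytes_sent) pairs from the access log.
samples = [(1480, 1200), (5320, 5010), (940, 702)]

largest = max(fastcgi_header_bytes(total, body) for total, body in samples)
buffer_size = 4 * 1024  # hypothetical fastcgi_buffer_size of 4 KiB

print(largest, largest <= buffer_size)
```
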
Any headers added by Nginx are not
held in this buffer because they're
added as the response is sent out.
If this preliminary FastCGI buffer fully
contains the headers, the rest of the
buffer is put to good use and fills up
with the first part of the response
body. So there's no memory saving
benefit to squeezing this buffer
aggressively.
If the response exceeds the preliminary
buffer, the rest of the body is stored
in fastcgi_buffers.
Yes, that's named confusingly.
What I'm calling the preliminary buffer
is determined by the Nginx directive
fastcgi_buffer_size,
and then these body-only buffers are
determined by fastcgi_buffers.
If these body-only buffers are also
exceeded, Nginx responds with a 502
Bad Gateway.
So we need to be very aware of the
maximum size of our responses.
The value of upstream response length in
our logs will help with that.
If we can control the response size with
a high degree of confidence, we can set
these buffers quite tightly.
And then we'd need to set up a system
such that any design or codebase change
that affects the maximum response size
triggers a reassessment of this
configuration before deployment.
And we'd also need to implement
end-to-end tests for the scenarios that
generate the largest possible responses
in production.
It's possible that Nginx clears the
request buffers before the response
buffers are filled. If that's the case,
we could safely use the max of the
request buffers and the response buffers
instead of the sum. And that would then
reduce our peak memory estimate by quite
a lot.
That would be very interesting to
examine, but I haven't had the time to
do that yet.
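
To get a feel for how much the sum-versus-max question matters, here is a sketch of the Nginx estimate with both variants. The on-screen formula isn't captured in this transcript, so this only follows the structure described: shared memory counted once, per-connection buffers multiplied by workers and connections. All sizes are placeholders, and whether the max variant is safe is exactly the open question above.

```python
KIB = 1024
MIB = 1024 * KIB


def nginx_peak_memory_bytes(shared: int, worker_processes: int,
                            worker_connections: int,
                            request_buffers: list, response_buffers: list,
                            buffers_coexist: bool = True) -> int:
    """Estimate peak Nginx container memory in normal operation."""
    if buffers_coexist:
        # Conservative: request and response buffers are held at the same time.
        per_connection = sum(request_buffers) + sum(response_buffers)
    else:
        # Optimistic and unverified: request buffers are freed before the
        # response buffers fill, so only the larger group counts.
        per_connection = max(sum(request_buffers), sum(response_buffers))
    return shared + worker_processes * worker_connections * per_connection


# Placeholder sizes: header buffer, large header buffers, body buffer.
request = [1 * KIB, 4 * 8 * KIB, 16 * KIB]
# Placeholder sizes: preliminary FastCGI buffer, body-only FastCGI buffers.
response = [4 * KIB, 8 * 4 * KIB]

with_sum = nginx_peak_memory_bytes(10 * MIB, 2, 512, request, response)
with_max = nginx_peak_memory_bytes(10 * MIB, 2, 512, request, response,
                                   buffers_coexist=False)
print(with_sum // MIB, with_max // MIB)  # estimates in MiB
```

Because the per-connection figure is multiplied by every possible connection, a few kilobytes of margin on one buffer turns into megabytes at the container level.
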
For an Nginx instance that only serves
static assets, it's impossible for a
request to use the FastCGI buffers, not
to mention the proxy or the output
buffers not mentioned here. So, we can
safely remove those from our peak usage
formula for that instance.
And the ingress controller handles TLS
termination. So we're also ignoring the SSL
buffer here.
PHP, PHP-FPM, and Nginx all have
settings for timeouts that govern
various parts of the request response
process. So I created this diagram for
my notes to help visualize how they line
up.
The x-axis represents time very roughly
as the request goes from client to
Nginx to PHP-FPM, and the response
reverses that route.
But the size of each block doesn't
correspond to how long the task takes or
the suggested timeout value. Rather, the
diagram shows how and when each timeout
is triggered and how they overlap.
If any of these timeouts are breached,
the client receives an error response.
So, we want these timeouts to
accommodate essentially 100% of
legitimate requests.
Starting from the left, the first
timeout is client_header_timeout, which
is triggered when Nginx accepts the
connection after the TCP handshake.
It sets the time needed to receive the
request line and the headers from the
client.
It's not an absolute timeout. Rather,
it's a timeout for the intervals between
reads. That is, each time some part of
the header is received, the timer resets
to zero. And that's a running theme for
a lot of these Nginx timeouts. As you
can see in the diagram,
after Nginx receives and parses the
request line and the headers, the
client_body_timeout begins and limits the
intervals between reads of the request
body from the client.
The purpose of these two request
timeouts is to stop partial requests
filling up the available connections.
A Slowloris attack is an attempt to do
that en masse, and a clever attacker could
easily determine our timeouts and then
drip-feed the server with a request
repeatedly. So it's important to also
limit concurrent requests from the same
IP address.
So we store all concurrent IP addresses
with limit_conn_zone. With my settings,
there can be a maximum of 24,048
concurrent connections. So that needs
124 kilobytes to guarantee storing all
concurrent addresses.
Then in a server or location block, use
limit_conn to put an upper limit on the
number of concurrent connections from
the same IP address.
Back to timeouts.
After the header has been received,
EngineX can determine how to process the
request. If it will be forwarded to PHP
FPM and if the EngineX worker doesn't
already have a connection with an idle
PHP FPM worker, it starts a new
connection and starts the fast CGI
connect timeout.
In normal operation, establishing a
connection is near instantaneous
unless all PHP-FPM workers are busy and
the queue is filled up. The queue size is
determined by the PHP-FPM directive
listen.backlog. So, it's advisable to
set it to the maximum 511
and then set fastcgi_connect_timeout to
just a few seconds.
Once the body is fully received and the
FastCGI connection is established,
Nginx begins sending the request to
PHP-FPM and starts the fastcgi_send_timeout,
and PHP starts the max_input_time.
max_input_time also covers the parsing
of the request body, like populating the
$_POST or $_FILES predefined
variables, before the script can begin
execution.
But of course, Nginx doesn't know
about or care about any of that. So as
soon as the transmission is complete, it
switches from the fastcgi_send_timeout
to the fastcgi_read_timeout, which
puts a limit on how long Laravel can
take to return the response in full.
More accurately, it puts a limit on the
interval between read operations. But
since Laravel typically buffers the
whole response and sends it out at the
end, that makes fastcgi_read_timeout
almost the same as an absolute timeout.
The execution time of the PHP script is
limited by PHP's max_execution_time and
by PHP-FPM's request_terminate_timeout.
max_execution_time measures CPU time. So
the timer is paused during I/O operations
like database queries. And when
exceeded, it has a slightly more
graceful termination.
Whereas request_terminate_timeout
measures wall clock time and it has a
hard termination.
So we'd align these two, but then we'd
increase request_terminate_timeout to
account for the peak expected I/O time.
And then we'd also add a little margin
to give max_execution_time a chance to
terminate the script more gracefully.
After returning the response in full, if
the PHP script continues execution, as
with the terminate method in middleware,
this is included in max_execution_time
and request_terminate_timeout, but not
in Nginx's fastcgi_read_timeout
because once Nginx has received the
response, it's already moved on to
sending it to the client.
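
The relationship between the two script limits described above can be sketched as simple arithmetic. This is only an illustration in Python rather than the project's PHP configuration; the function name, the default margin, and the example numbers are all hypothetical and not taken from the video.

```python
def request_terminate_timeout(max_execution_time_s: int,
                              peak_io_time_s: int,
                              margin_s: int = 2) -> int:
    """Wall-clock limit = CPU-time limit + peak expected I/O wait + a small margin.

    The margin gives the CPU-time limit a chance to stop the script first,
    before the hard wall-clock termination is reached.
    """
    return max_execution_time_s + peak_io_time_s + margin_s


# Hypothetical numbers: 10 s of CPU time, up to 5 s spent waiting on I/O.
print(request_terminate_timeout(10, 5))
```

Real values would come from the logged wall clock and CPU time durations rather than from guesses like these.
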
So the script execution timeouts can
extend rightwards beyond T5 in the
diagram and maybe beyond the end of
Nginx's fastcgi_read_timeout. Though
in that situation, it's probably better
to use queue workers instead.
To help set the script execution
timeouts, in Laravel we're logging wall
clock duration and CPU time duration for
each request.
And finally, at the right of the
diagram, send_timeout limits the
intervals between write operations while
sending the response to the client.
Nginx has four embedded variables
which we can log to help us with setting
some of these timeouts.
$request_time measures from the first
byte received from the client to the
last byte sent to the client.
$upstream_connect_time measures the time
to establish the FastCGI connection.
That should hopefully always be zero.
$upstream_header_time measures from the
first byte sent to PHP-FPM until the
first byte received in response by
Nginx.
And $upstream_response_time has the same
start point but keeps measuring until
the last byte of the response is
received by Nginx. Those two will be
the same unless the response is very
large.
In the Laravel documentation, the
suggested Nginx configuration is
really not great.
As an example, let's say a user sends a
request to our domain /hello.
So this embedded variable $uri
equals /hello.
With this configuration, what we're
asking Nginx to do is this.
First check for a file called hello and
if it exists serve that file to the
client.
So far so good. That could be a JS file
or a CSS file.
But if hello file doesn't exist, check
for a directory called hello.
If hello directory exists, check for an
index file which above is defined as
index.php.
If that exists, serve it to the client.
And already from a Laravel perspective,
we are way off course.
If hello directory didn't contain
index.php,
then serve the directory listing, which
very ancient internet users will
remember, but these days directory
listings are disabled by default. So,
Nginx returns 403 Forbidden simply
because the user requested a directory
which does exist on the server.
And if the request URI isn't a file and
isn't a directory, finally we're
directed to the Laravel application
index.php.
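
The lookup order walked through above can be modelled in a few lines. This is a simplified Python sketch of the decision sequence as described in the transcript, not Nginx's actual implementation; the function name, the outcome strings, and the sample file sets are hypothetical.

```python
def resolve(uri: str, files: set[str], dirs: set[str]) -> str:
    """Model the described order: file, then directory index, then Laravel."""
    if uri in files:
        return f"serve file {uri}"
    if uri in dirs:
        index = f"{uri}/index.php"
        if index in files:
            return f"use index file {index}"
        # Directory exists but has no index and listings are disabled.
        return "403 Forbidden"
    return "forward to /index.php (Laravel)"


files = {"/app.js", "/docs/index.php"}
dirs = {"/docs", "/hello"}
for uri in ("/app.js", "/docs", "/hello", "/api/v1/items"):
    print(uri, "->", resolve(uri, files, dirs))
```

Only the first and last outcomes are wanted for a Laravel application, which is the point being made about the two directory branches.
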
But I don't understand why index.php
here is so convoluted with variables.
If not to a static file, we always want
to forward to public/index.php.
So why not just hardcode it here and
state it clearly?
And these headers at the moment are
functional. But the moment we put an
add_header directive into a location
block, that block no longer inherits
add_header directives from outside. So
it's safer to just put all add_header
directives into location blocks and not
rely on inheritance.
This whole configuration feels like a
copy-paste job from a pre-Laravel PHP
project. And although it works, it has
needless inefficiencies and potential
security flaws.
For our Nginx configuration on port
8080,
we want all responses to be JSON,
including errors caught by Nginx.
So for the error pages, we're using
named locations that serve static JSON
files saved into the image.
Internal locations would also achieve
the same result.
Then apart from the favicon and
robots.txt, we're returning 404 for
any request that doesn't start with
/api/v1/.
And if we take a look at the standard
hacky requests that every server gets,
this location block alone rejects, well,
all of them. We definitely don't want any
malicious requests like this to access
any static files, and preferably we don't
want to waste resources forwarding the
request to Laravel just for it to return
a 404 response.
All API requests get sent to Laravel and
index.php is hardcoded for clarity.
Finally, we're rate limiting by IP
address here in Nginx and later by
user ID in the Laravel application.
Then for port 8081, for cluster-internal
traffic, we've got keepalive times to
sustain TCP connections with the Nginx
exporter and the readiness check for one
week and for two weeks respectively.
The Nginx up endpoint is a simple
Nginx-only endpoint for a liveness check,
which we're currently not using.
The Nginx status endpoint is for the Nginx
exporter to scrape Nginx metrics and
in turn be scraped by Prometheus.
And the rest are specific endpoints to
send to Laravel.
In the Vue.js Nginx instance, if a
request URI ends in one of these file
extensions, we check if the static file
exists and if not return index.html,
and the Vue.js router will send the 404
page.
And then for non-static asset URIs, go
straight to index.html.
In the Content-Security-Policy header,
we need to specify the API domain for
connect-src and form-action.
For static assets, we can cache the
results of the stat and open system
calls. And since we're using immutable
containers, we can safely cache for a
year or more. And then the static assets
themselves will likely be held in memory
by Linux's page cache, though that
depends on other containers in the same
node.
Let's take a quick look at the
Dockerfiles for the Laravel image. Some
dependencies are needed at build time
for compiling PHP extensions and running
composer install but not needed at
runtime.
So the overall design is compile in a
builder target and then copy the results
into a fresh minimal target for
production.
To accommodate other targets which we'll
discuss in a second, we have to split
builder and build prod targets and split
minimal base and prod targets with prod
being the ultimate image to deploy in
production.
During the build phase, the composer
install command only needs composer.json
and composer.lock. But creating
the autoloader obviously needs the
entire codebase. So a very efficient
Docker caching strategy is to copy in
the .json and .lock files and run composer
install with the --no-autoloader
flag.
Then that intensive process is cached
until we change our composer
dependencies.
Then we can copy in the codebase and
build the autoloader.
If we didn't split those commands, we'd
need to build the vendor directory every
time we modify a file.
In the production image, for PHP-FPM to
communicate with the Nginx
container, both processes need read and
write access to the socket file. So we
make sure that the www-data user in this
container and the nginx user in the
Nginx container have the same UID
and then set the file permissions
accordingly in the PHP-FPM configuration.
As mentioned in the Kubernetes section,
we run all bootstrap caches except for
the config cache in the Dockerfile.
And we're not using Laravel's built-in
health check endpoint. But if we were,
apparently that view isn't cached by php
artisan view:cache. So we can access it
once in the Dockerfile to force that
view to be rendered so that we can make
the cache directory read-only in
production.
For file permissions, I've commented out
some of my standard Laravel production
setup because they don't apply in
Kubernetes or for this particular
project, but I like to keep them here
just as a reminder or if we change the
project or architecture later.
For the local dev environment, the
simplest option is to extend the builder
target just before the composer install
command is run. Then install development
tools like Xdebug and create a directory
for the output from Xdebug profiling,
and then mount the local codebase into
the container in Docker Compose with a
bind mount.
We want to run composer install and
artisan commands inside this container
to write files to our local device.
So in the Makefile we can construct a
command to enter the container with the
www-data user and our local user's group
GID,
and then set umask to make sure that
any files generated have 775 permissions,
as in full permissions for both user and
group.
That way the vendor directory and any
other files created by this container
are readable and writable by the
www-data user in this container and our
local user on the host.
I've also got a combined target for
testing the architectural option of
PHP-FPM and Nginx in a single container,
which we discussed earlier.
The only complaint with this dev image
is that it has different dependencies to
the production image. So potentially
some tests could be passing in this
image but be failing for production.
This is a trade-off I'm happy with. But
an alternative, more complex approach
would be to run composer install with
testing dependencies included.
Then copy the resultant vendor directory
into a target that splits from the
production image just before the
codebase is copied in and then mount the
local code base with a bind mount. And
similarly for the dev environment, split
off from minimal base and install dev
tools like Xdebug.
The local development environment is
handled by Docker Compose. There's not
too much to note here. Nginx depends
on the Laravel container, and
service_started is enough because we just
need the socket file to exist before Nginx
starts.
Ideally, Laravel should depend on
Postgres and Redis passing health
checks. Postgres has a handy pg_isready
command and Redis has ping. But I
commented out the dependencies since
very often I was just testing Nginx
and PHP only and didn't want to wait an
extra 2 seconds to get running.
There are two instances of Postgres,
one for the dev environment and one for
testing. It's most common to run tests
with an in-memory instance of SQLite
as the database. But as we'll see later,
our application is unfortunately tightly
coupled to Postgres. So we need to run
the tests with Postgres to have any
confidence that the results represent
the application in production.
There's a pre-commit script to process
the code before committing to the Git
repository. For Vue.js, lint-staged
handles things pretty well. We're
running Prettier, ESLint, and Vitest on only
staged files that have changed since the
last commit and running vue-tsc on the
whole project.
In a similar way, in the Bash script,
we've got a function that returns an
array of the staged files that have
changed since the last commit, so that
we're not wasting time and resources
checking the whole codebase on each
commit.
For scripts, we feed that array into
ShellCheck. And for PHP, we're
validating the composer.json and
lock files, then feeding the changed
files array into PHPStan at level 9 and
Pint, and then running git add to
re-stage any files that were modified.
Each makes sure to reference the correct
configuration file. And we have a
stricter Pint configuration for non-tests
than for tests.
And the last step is to run PHPUnit. It
executes a script mounted in the Laravel
container which accepts arguments for
whether or not to first run database
migrations,
the test coverage threshold, test suites
to include or exclude, and which tests
specifically to run.
Since I'm not working as part of a team,
a Makefile is sufficient for a CI/CD
pipeline for testing, building, and
deploying.
Exec Laravel executes a command in the
Laravel container.
Very often we're creating or editing
files on the host machine via a bind
mount. So we enter the container with
both the www-data user and the local
user's group so we can function in both
worlds. And we're setting umask so
that any files created have 775
permissions so that both the container's
user and the local user can read and
write the files created.
Shell Laravel executes an interactive
shell in the container.
The composer commands have the same
function and are just time savers to do
specific composer functions without
opening an interactive terminal.
And then there are similar exec and
shell commands for each container.
If we want to run PHPStan, Pint, or PHPUnit
outside of the pre-commit check,
those commands are here. For testing,
there are commands for the standard test
suite and for end-to-end tests for
testing a deployment.
Then there are commands for building the
images and for deploying them in
Kubernetes.
We can switch contexts between the local
kind cluster and my physical cluster.
For this API, I want every response to
fit within a small set of predictable
JSON structures.
The top level of the response should
always be an object, not an array to
avoid JSON array hijacking.
And for every response object, we attach
an object called meta, which at least
includes a request ID to help clients
communicate problems with us and help
with debugging.
I'm also including the timestamp and
script duration for now, but these don't
have any practical purpose at the
moment.
For a query that returns a single
resource, the resource is the value of a
key called data.
For a query that returns multiple
resources, data is an array of results.
And we add an object called pagination
to help the client navigate through the
data set.
For a successful query with no resource
to return, data just contains result
success.
And finally, if at least one error
occurs, data is replaced by an array
called errors,
which contains error objects which each
have an integer code and a string
message.
These error codes help the client to
respond programmatically to a problem
without needing to parse the message
text.
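
The four response shapes described above might look like the following. This is a minimal sketch in Python rather than the project's PHP; the exact key names (`request_id`, `result`), the pagination fields, and the function names are assumptions, since the transcript only names `meta`, `data`, `pagination`, `errors`, `code`, and `message`.

```python
def _meta(request_id: str) -> dict:
    # The transcript says meta at least includes a request ID.
    return {"request_id": request_id}


def success(data, request_id: str) -> dict:
    """Single resource, or {"result": "success"} when nothing is returned."""
    return {"data": data, "meta": _meta(request_id)}


def paginated(items: list, pagination: dict, request_id: str) -> dict:
    """Multiple resources plus a pagination object."""
    return {"data": items, "pagination": pagination, "meta": _meta(request_id)}


def errors(error_list: list[tuple[int, str]], request_id: str) -> dict:
    """The data key is replaced by an errors array of code/message objects."""
    return {
        "errors": [{"code": c, "message": m} for c, m in error_list],
        "meta": _meta(request_id),
    }


print(success({"result": "success"}, "req-1"))
print(errors([(2001, "The name field is required.")], "req-1"))
```

In every case the top level is an object, never a bare array.
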
The error itself is a data transfer
object called API error and the code is
an integer enum called API error code.
API error code has a method called
message which accepts an array of
placeholders if necessary and returns an
appropriate message for each error code.
And the API error constructor calls this
message method.
So API error code is a very convenient
single location to plan and construct a
list of all possible errors, pair them
up with a suitable message, and a
reference point for the placeholders
that we need to pass to the API error
constructor.
The aim is to be as specific as possible
with error codes, but also to have more
general error codes to fall back on. For
example, we have specific error codes
for each type of input validation
employed in our project. But if
something goes wrong, there's a general
validation error code. So if we use a
new type of input validation in a
controller or form request, but we don't
account for it in our validation error
handler, we can still return a
semi-specific error response. And in the
worst case scenario, we can fall back on
the unknown error code.
Of course, when such general errors
happen, we need to analyze the logs and
construct more insightful error codes
as a result.
The JSON response structures mentioned
each have a method in the class called
API response builder:
success, paginated, and
errors.
And as well as the errors method,
there's also an error method because
returning a single error is the most
common scenario and it's easy to forget
to enclose it in an array.
Since we want to always send JSON
responses with descriptive error codes,
we want to replace Laravel's default
exception handling behavior. In Laravel
11 onwards, that's done in
bootstrap/app.php.
But since it's likely to get quite
sizable, I've extracted it to a class
called API exception handler.
In simple cases, we can just define an
error response with API response
builder.
In some cases, there's a small amount of
processing to add more detail to the
error response.
And there's a catch all default for any
exception we're not handling
specifically.
For input validation errors, I've
extracted the logic to validation errors
builder, which returns an array of the
data transfer object API error, to be
fed into API response builder's errors
method.
In validation errors builder, we loop
through the errors returned by the
validator and match each with the
correct API error code.
As mentioned throughout this project,
whenever there's a scenario that we
don't expect to happen, like if we fail
to match the validation error, we log
the details and fall back on a more
general API error code.
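
The match-or-fall-back loop described above could be sketched as follows. This is an illustrative Python version, not the project's PHP class; the rule names, codes, and the shape of the validator output are assumptions.

```python
import logging
from enum import IntEnum

logger = logging.getLogger("validation")


class ApiErrorCode(IntEnum):
    # Hypothetical codes: one general fallback plus specific rules.
    VALIDATION_GENERAL = 2000
    VALIDATION_REQUIRED = 2001
    VALIDATION_MAX = 2002


RULE_TO_CODE = {
    "required": ApiErrorCode.VALIDATION_REQUIRED,
    "max": ApiErrorCode.VALIDATION_MAX,
}


def build_validation_errors(failed: dict[str, list[str]]) -> list[tuple[str, ApiErrorCode]]:
    """Map each failed rule to a specific code, or log and fall back."""
    result = []
    for field_name, rules in failed.items():
        for rule in rules:
            code = RULE_TO_CODE.get(rule)
            if code is None:
                # Unexpected rule: record it so a specific code can be added later.
                logger.warning("Unmapped validation rule %r on field %r", rule, field_name)
                code = ApiErrorCode.VALIDATION_GENERAL
            result.append((field_name, code))
    return result


print(build_validation_errors({"email": ["required", "regex"]}))
```

The client still receives a semi-specific validation error for the unmapped rule, and the log entry points to the gap.
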
For the database, I have maybe something
of a controversial feature which I'll
have to explain carefully.
When we need to write to the database
and one of the columns has a unique
constraint or a foreign key constraint,
the standard practice is to first
validate the value with a select query
and if no results are returned, then
continue with the insert or update
operation.
So that's one database query for the
failure scenario and two queries for the
success scenario. But
with a lot of caveats, we can
potentially skip the validation in
Laravel, write the value directly to the
database and if a constraint is
violated, handle the error returned by
the database and inform the user of the
input validation error.
That means only one database query for
both success and failure scenarios which
reduces the load on the database and
speeds up responses from the client's
perspective.
It also removes the race condition
between the select query for the
validation and the eventual write to
database.
So now for the downsides. One, it splits
the validation logic in two, which is
messy. I kept the constraints in the
form request as comments, as reminders
of what will be validated by the
database.
Two, it's not great for the developer
experience. We just have to remember to
not validate constraints in form
requests and to employ our new strategy
each time we want to insert or update.
Three, the database treats the
constraint violation as an error, not as
a simple validation check, and it logs
it as such. So, we'd need a log
filtration system in production.
Four, the database returns the error as
a string which we have to parse
and that is a somewhat fragile process.
We need to run rigorous tests with many
edge cases every time we change database
version.
And five, error responses vary per
vendor. So we're tightly coupling our
application with our initial choice of
database vendor, in this case Postgres.
These are all very serious cons and
nothing else about this project is
geared towards the high concurrency
situation that would give value to the
pros. So for a real project, I would
almost certainly not implement this
feature, but I wanted to explore it as
an educational exercise. And it's a
healthy exercise to predict the cons in
advance, try it, and run head-on into
any unexpected cons, and then get better
at analysis of design choices in the
future.
So far with Postgres, with unique and
foreign key constraints, I haven't hit
any critical problems from parsing the
error message.
It returns a unique SQLSTATE code that
identifies which constraint was violated,
and the offending column is bounded by
characters that are invalid for a column
name and preceded by a substantial fixed
string.
Postgres does actually allow illegal
characters in the column name if it's
bounded by double quotes. So we'd have
to check for that.
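
The parsing approach described above can be sketched against a sample message. This Python version is only an illustration of the idea, not the project's PHP code; the sample messages are typical Postgres wording written out by hand, and the function name and return shape are hypothetical. SQLSTATE 23505 (unique violation) and 23503 (foreign key violation) are the standard Postgres codes.

```python
import re

UNIQUE_VIOLATION = "23505"
FOREIGN_KEY_VIOLATION = "23503"

# The column sits between the fixed prefix "Key (" and the delimiter ")=(".
_KEY_PATTERN = re.compile(r"Key \((?P<column>[^()]+)\)=\(")


def parse_constraint_error(sqlstate: str, message: str):
    """Return (kind, column) for a constraint violation, else None."""
    if sqlstate not in (UNIQUE_VIOLATION, FOREIGN_KEY_VIOLATION):
        return None
    match = _KEY_PATTERN.search(message)
    if match is None:
        return None  # unexpected format: caller should log and fall back
    kind = "unique" if sqlstate == UNIQUE_VIOLATION else "foreign_key"
    # Quoted identifiers appear with their double quotes, so strip them.
    return kind, match.group("column").strip('"')


sample = ('ERROR:  duplicate key value violates unique constraint "items_name_unique"\n'
          "DETAIL:  Key (name)=(widget) already exists.")
print(parse_constraint_error("23505", sample))
```

A quoted column name that itself contains parentheses would defeat this simple pattern, which is one of the edge cases a real implementation would need tests for.
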
Another nuisance is that Laravel
interpolates the actual values into the
Postgres error message. And for
security, we don't want potentially
sensitive values feeding into our parsing
logic, especially for a fragile process
like this that has a lot of logging.
So it works for Postgres, but if for
some reason we change database vendor
we'd need to rewrite the parsing logic
and there's no guarantee that the error
message provides the required detail and
format for us to parse.
Here's the comparable message from
SQLite.
The SQLSTATE code is more generic than
Postgres's. So we'd have to parse the
text to discover even which constraint
was violated.
There's also a small possibility that a
new version of Postgres will change the
error message in a way that hurts this
feature.
I implemented this feature with a trait
called handles DB errors.
For any code that inserts or updates a
database record, we wrap it in a closure
and in the handle DB errors method, and
the closure is executed inside a try
catch block looking for QueryException.
This wrapping design causes very minimal
disruption and there's a low development
cost to enabling or disabling the error
handling feature.
The design is very reusable. Just add
the trait to any class that writes to
database. And compared to a rigid method
signature, the closure gives us complete
flexibility around what variables to
pass, what type to return,
which interface to use to interact with
the database, how many queries to run,
what parts of the code to wrap in a
database transaction,
and what other actions we need to run
alongside the queries.
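
The wrapping design described above can be sketched with a closure passed to a handler. This is a Python illustration using the standard library's SQLite driver as a stand-in for Postgres and Laravel, so the exception type and message format are SQLite's, not the project's; the names `handle_db_errors` and `ValidationError` are hypothetical.

```python
import sqlite3


class ValidationError(Exception):
    """Raised when a write violates a unique constraint."""

    def __init__(self, column: str):
        super().__init__(f"Value already taken: {column}")
        self.column = column


def handle_db_errors(write):
    """Run the closure and translate a unique violation into a validation error."""
    try:
        return write()
    except sqlite3.IntegrityError as exc:
        prefix = "UNIQUE constraint failed: "
        text = str(exc)
        if text.startswith(prefix):
            raise ValidationError(text[len(prefix):]) from exc
        raise  # anything unrecognized propagates unchanged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT UNIQUE)")
insert = lambda: conn.execute("INSERT INTO items (name) VALUES (?)", ("widget",))

handle_db_errors(insert)          # first write succeeds with a single query
try:
    handle_db_errors(insert)      # second write hits the constraint
except ValidationError as exc:
    print("validation error on", exc.column)
```

Because the handler only receives a callable, the caller decides what runs inside it, which is the flexibility the closure-based design is after.
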
Currently, if a constraint violation is
detected, we're immediately returning a
validation error response to the client.
We could consider making this more
flexible, like allowing more closures to
be passed to handle specific error
scenarios. For example, we might want to
alter our reaction based on which column
violated the constraint.
Also regarding the database, we're
implementing a rule of no SELECT *
queries, or no queries that return all
columns. This is to reduce the chance of
exposing sensitive data and to make
queries faster to run.
So for any resource that will be output
by the API, we're defining a resource
class where we define how the raw output
is processed into the API output.
In this case, we're just renaming UUID
into ID.
And we're defining the columns to be
injected into the select query to fetch
only the relevant columns.
For the naming scheme, we have the model
name item, then public or private
explicitly warning if this resource will
be output by the API or not. For
example, the UUID is for public usage
while the incrementing integer ID is
strictly for internal usage.
And we have full or minimal to denote
which columns to include.
Paginated results would typically use
minimal while the results of a single
specified resource would typically use
full and include more columns.
Then in the model class, it imports the
columns constant to run the relevant
queries.
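
A resource class that owns both the column list and the output mapping might be sketched like this. It is a Python illustration rather than the project's PHP; the class name follows the naming scheme described above, but the column names, the `name` field, and the helper function are hypothetical.

```python
class ItemPublicMinimal:
    """Public output, minimal column set (for example, paginated lists)."""

    # Columns injected into the SELECT list instead of "*".
    COLUMNS = ("uuid", "name")

    @staticmethod
    def to_output(row: dict) -> dict:
        # The only processing here is renaming uuid to id.
        return {"id": row["uuid"], "name": row["name"]}


def select_sql(table: str, resource) -> str:
    """Build a query that fetches only the resource's columns."""
    return f"SELECT {', '.join(resource.COLUMNS)} FROM {table}"


print(select_sql("items", ItemPublicMinimal))
print(ItemPublicMinimal.to_output({"uuid": "3f2c", "name": "widget"}))
```

The internal incrementing ID never appears in the column list, so it cannot leak through this resource.
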
Let's run through the life cycle of a
standard request. The first thing we do
is create an instance of request context
service which will hold auxiliary
information about the request.
We define it as a singleton so that
anytime it's referenced in a method
signature throughout the application,
the service container will inject the
same instance with the same properties
similar to how request itself is
handled.
Immediately we store the current time so
that we can calculate the wall clock
duration at the end of the script
execution and we store the current
resource usage of the PHP FPM worker
process in order to calculate the CPU
time duration at the end of the script.
There's an empty array to store the
duration of any database queries which
will also be logged at the end. And the
request ID set way back in the ingress
controller is saved here and added to
the context of any logs that will be
written.
The logging middleware will call get
duration milliseconds, which is pretty
simple, and also get CPU time
milliseconds, which requires some
explanation.
ru_utime.tv_sec is the number of whole
seconds the PHP-FPM worker process has
spent in user mode, like the PHP
interpreter doing work.
The same key with usec, or microseconds,
instead of sec is the number of
microseconds towards the next whole
second in user mode. So it's bounded by
1 million.
ru_stime.tv_sec is the number of whole
seconds the
PHP-FPM worker process has spent in
kernel mode. So that's system calls,
memory mapping, network stack
processing, etc.
And the same key with usec instead of sec
is the number of microseconds towards the
next whole second in kernel mode.
The most intuitive mathematical approach
to calculating the CPU time duration of
the script is to combine the seconds
with microseconds
and sum the user time and kernel time
and then subtract the start time from
the end time just like we do for the
wall clock duration.
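
The intuitive formula described above can be written out directly. This sketch is in Python rather than the project's PHP, but the dictionary keys are the ones PHP's getrusage() returns; the function names and the sample readings are hypothetical.

```python
def cpu_seconds(usage: dict) -> float:
    """Combine seconds and microseconds, then sum user and kernel time."""
    user = usage["ru_utime.tv_sec"] + usage["ru_utime.tv_usec"] / 1_000_000
    system = usage["ru_stime.tv_sec"] + usage["ru_stime.tv_usec"] / 1_000_000
    return user + system


def cpu_time_ms(start: dict, end: dict) -> float:
    """CPU time used between two readings, in milliseconds."""
    return (cpu_seconds(end) - cpu_seconds(start)) * 1000


# Hypothetical readings taken at the start and end of a request.
start = {"ru_utime.tv_sec": 10, "ru_utime.tv_usec": 900_000,
         "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 500_000}
end = {"ru_utime.tv_sec": 11, "ru_utime.tv_usec": 100_000,
       "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 600_000}
print(round(cpu_time_ms(start, end), 3))
```

Here the user time advanced by 0.2 seconds and the kernel time by 0.1 seconds between the two readings.
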
I was worried that floating point
accuracy might be a problem here.
Floating point accuracy is 15 to 17
significant figures. So adding
everything up to a large total before
the final subtraction could cause some
precision to be lost.
And since the difference between the
start and end times will generally be
tiny, that loss of resolution could mean
we get a result of zero once the PHP FPM
worker passes a certain age.
So maybe it would be better to have a
mathematically equivalent but less
intuitive formula that subtracts large
numbers from large numbers and sums the
results at the end.
But I tried some rough calculations and
actually the loss of resolution happens
sometime after 250 years. So yes, the
intuitive formula will do fine.
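
The rough calculation can be reproduced by asking how large the gap between adjacent 64-bit floats is at a given magnitude. This Python sketch assumes the totals are held as IEEE 754 doubles, as PHP floats are; the helper name and the chosen year counts are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def resolution_us(total_cpu_seconds: float) -> float:
    """Gap to the next representable float at this magnitude, in microseconds."""
    return math.ulp(total_cpu_seconds) * 1_000_000


for years in (1, 250, 300):
    print(years, "years:", resolution_us(years * SECONDS_PER_YEAR), "microseconds")
```

The gap first exceeds one microsecond once the accumulated total passes 2^33 seconds, which is roughly 272 years of CPU time and is consistent with the "sometime after 250 years" estimate.
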
The CPU time duration will inform our
decision on setting the PHP directive
max_execution_time, and the wall clock
duration helps with the PHP-FPM
directive request_terminate_timeout.
The last two methods are to log the size
of the response headers and the body.
The only thing to note here is that the
headers are ASCII only. So we can always
safely use strlen, which is faster than
the multibyte equivalent mb_strlen.
The body is UTF-8,
so potentially non-ASCII.
So we can only use strlen as long as in
php.ini
we set mbstring.func_overload
to zero.
Otherwise, we'd have to use
mb_strlen and specify 8bit encoding to be
sure we're getting the number of bytes
instead of the number of characters.
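
The difference between counting bytes and counting characters is easy to see with a non-ASCII body. This is a Python illustration of the distinction only; Python strings count characters by default, which is the opposite of PHP's strlen, and the sample body and function name are hypothetical.

```python
def body_size_bytes(body: str) -> int:
    """Size of the body as sent on the wire, assuming UTF-8 encoding."""
    return len(body.encode("utf-8"))


body = '{"name": "caf\u00e9"}'   # the last letter of the value is non-ASCII
print("characters:", len(body))
print("bytes:", body_size_bytes(body))
```

The accented letter is one character but two bytes in UTF-8, so the two counts differ by one for this body.
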
Then we add a listener for the database
to add the query time to the query
duration array we just created.
And next we configure the rate limiter,
although it isn't actually applied at
this point in the request.
Nginx rate limits by IP and
intercepts those requests before they
reach Laravel. So in Laravel, we're
limiting by user ID and that needs to
happen after authentication.
For us, it happens later in
routes/api.php,
straight after the authentication
middleware.
Malicious users can get around this with
a coordinated system of multiple
accounts and multiple IPs.
If we were worried about that and we
couldn't control user accounts or
whitelisted IPs more tightly, then we
might consider running pattern
recognition of usage, i.e. combine the
usage of two or more users. assess them
as if they were a single user and see if
it adds up to a malicious usage pattern.
Or we might try to gather a list of
known VPN IP addresses and monitor those
accounts more closely.
For Nginx's rate limit on IP
addresses, we need to consider if our
clients are in a business at the same
address.
For example, if all users log on at 9:00
a.m. on the same IP address, then we
might need to loosen Nginx's rate limit on
IP addresses.
Next, the request hits the middleware.
I've disabled global middleware and
moved most of the default middleware to
the API group.
That's because the web group can only be
accessed internally within the
Kubernetes cluster.
For trust proxies, we're providing the
CIDR range that the ingress controller
is guaranteed to be within. And then
Laravel knows that the client's IP is
the real IP and that the connection is
indeed secure.
Then API requests pass to routes/api.php,
and for routes that require
authentication the requests pass through
the authentication middleware and the
rate limiter middleware we discussed
earlier.
I extended the authenticate class as a
convenient way to add the user ID to the
context of logs created after
authentication
and also because in the original class
if an unauthenticated request didn't
have the accept header of
application/json
it tried to redirect the user to a login
page but our API is purely JSON so
that's not the desired behavior.
The last endpoint is for browsers to
report content security policy
violations which go straight into the
log.
The web routes are mostly for the
Kubernetes probes mentioned earlier. The
Laravel readiness endpoint simply replies
immediately.
The Laravel startup endpoint, if you
recall, tests
the connections to the database and the
cache in a try catch block and returns
an error HTTP status code if there's any
problem.
And the Laravel status endpoint is for my
own tinkering and analysis of realpath
cache and OPcache in production to
further tweak those configurations.
Unfortunately, we can't just run those
commands in the command line interface
because the CLI is separate from PHP-FPM
and maintains its own OPcache memory
pool.
And finally, an API request runs back
through the middleware.
The HandleCors middleware, along with
its config, lets us tell browsers that
the Vue.js front end is permitted to
access the API.
And the inject meta middleware adds the
meta object to each JSON response object
as it's passing out of the door.
After the response has been sent, the
last action is to log the request for
future analysis.
I won't cover the Vue.js front end
because this video is already quite long
and it just logs into the API, stores
the token and interacts with the API in
a simple manner.
All the source code is available in the
description of the video and please let
me know if you have any questions or any
improvements on any part of the code.
Thanks.
