Laravel API, PHP-FPM, Nginx, Postgres, Kubernetes

1h 25m video Transcribed Jun 28, 2026

Advanced 42 min read For: Experienced Laravel developers and DevOps engineers interested in Kubernetes deployment and performance tuning.

AI Summary

This video provides a comprehensive walkthrough of deploying a Laravel RESTful API with PHP-FPM, Nginx, and PostgreSQL inside Docker containers orchestrated by Kubernetes. The presenter covers production cluster architecture, configuration decisions for PHP-FPM and Nginx, development environment setup, testing, deployment, and API design features like structured error responses and database constraint validation.

Full Transcript

[00:00] Hello everyone. This video is a

[00:02] discussion of a Laravel RESTful API

[00:04] project using PHP-FPM and Nginx with

[00:08] a Postgres database, all in Docker

[00:11] containers deployed in Kubernetes.

[00:15] So I'll cover the setup in Kubernetes,

[00:18] the choices and thought processes when

[00:20] configuring PHP-FPM and Nginx, the

[00:24] design of the development environment,

[00:27] testing, deployment, and some features

[00:29] of the API. Plus, there's a Vue.js front

[00:33] end thrown in as well to log into the

[00:35] API and interact with it. It's a bit too

[00:38] simple to discuss in this video, but

[00:40] it's also included in the code, and all

[00:43] the code is in the repository in the

[00:45] video description.

[00:49] The production environment is my

[00:51] Kubernetes cluster at home with a single

[00:54] control plane and two worker nodes on a

[00:57] pretty slow internet connection,

[00:59] but that doesn't have too much impact on

[01:01] the project and it can mostly be applied

[01:03] anywhere.

[01:06] The API itself has been kept generic so

[01:08] you won't have to listen to any domain

[01:10] specific concepts. The features I've

[01:13] added are focused on internal workings

[01:15] rather than a specific API use case. So

[01:18] they could be applied and expanded to

[01:20] many use cases.

[01:22] For example, responses are restricted to

[01:26] a set, predictable structure, and each

[01:29] error has a unique integer error code,

[01:31] so the client can more easily handle

[01:34] error responses programmatically.

[01:37] Input validation for database columns

[01:39] with unique or foreign key constraints

[01:42] has been pushed to the database to

[01:44] reduce the number of queries needed.

[01:47] That's a controversial one, so we'll

[01:49] discuss the pros and the cons.

[01:52] There's also a fix for a very common

[01:54] Nginx inefficiency regarding HTTP

[01:57] compression

[01:59] and there are some helpful logs to help

[02:02] us set the PHP-FPM and Nginx

[02:05] configurations.

[02:08] So the structure of the video is: the

[02:11] production environment;

[02:13] PHP-FPM and Nginx configuration;

[02:16] the Dockerfiles; the dev environment

[02:20] and something of a CI/CD pipeline, but

[02:23] very minimal; and the Laravel API itself,

[02:28] with discussion of security

[02:30] considerations throughout.

[02:35] So what does the production cluster look

[02:37] like? The ingress controller manages

[02:40] traffic for the cluster, handles TLS

[02:43] termination, and applies HTTP

[02:45] compression. It forwards API requests to

[02:49] the Nginx and Laravel pod. If traffic

[02:53] increases, the Nginx and Laravel pod

[02:55] is duplicated by a horizontal pod

[02:58] autoscaler.

[02:59] So, as usual, we want to make sure that

[03:02] it's completely stateless.

[03:05] Then for the database we're using

[03:07] Postgres managed by a StatefulSet.

[03:11] Postgres 0 is the primary instance

[03:13] that all Laravel instances write to.

[03:17] Postgres 1, 2 and so on are the

[03:20] read-only standby instances to

[03:23] increase read throughput if we have high

[03:25] concurrent usage.

[03:28] So whenever a new standby instance

[03:30] starts up, it duplicates the primary

[03:32] database with

[03:35] pg_basebackup.

[03:37] And whenever the primary instance

[03:39] executes a write operation, it sends the

[03:42] WAL, or write-ahead log, records to each

[03:47] standby so that all instances are

[03:49] synchronized in near real time.

[03:52] To configure this in Laravel, in the

[03:55] database config file, add read and write

[03:58] elements to the database connection you're using,

[04:02] specifying the pod for the write

[04:04] instance and the service for the read

[04:06] instances.

[04:09] And for the development or testing

[04:10] environments, override that in the

[04:13] Laravel .env file, specifying the

[04:16] database's service name in Docker

[04:18] Compose.

[04:22] For cache, we're using Redis.

[04:25] And of course there's the Vue front end,

[04:27] which is mostly decoupled; it could be

[04:29] hosted here or anywhere else. And finally,

[04:33] all these components are scraped by

[04:35] Prometheus, and Grafana molds that data

[04:39] into dashboards, which are also

[04:42] accessible through the ingress controller.

[04:46] Since the cluster has so few nodes, we

[04:49] can present the CPU and memory usage of

[04:52] each node in a single dashboard.

[04:55] And the other dashboard I watch most is

[04:58] for the ingress controller, especially

[05:00] latencies for the three connected

[05:02] services: the API, the front end, and

[05:06] the Grafana front end we're using right

[05:08] now.

[05:10] On my slow home internet, the latencies

[05:12] don't help us model production systems

[05:15] very much. They're pretty slow, but the

[05:18] variance can still be insightful.

[05:20] The variance is expressed here as the

[05:23] 50th, 95th and 99th percentile latencies: the latency within which the fastest 50%, 95%

[05:28] and 99% of requests complete.
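
A percentile is easy to mistake for an average. As a rough illustration only, here is a minimal Python sketch of the nearest-rank method using made-up latency samples; dashboards built on Prometheus histograms estimate these values from buckets (for example with `histogram_quantile`), so real figures will differ somewhat.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies (ms) with one slow outlier.
samples = [120, 95, 110, 480, 105, 130, 98, 2500, 101, 115]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(samples, p)} ms")
```

A single slow outlier dominates p95 and p99 in a small sample while leaving p50 almost untouched, which is why the spread between them is a useful signal.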

[05:35] In the ingress controller, we're most

[05:38] commonly receiving requests from the CDN

[05:41] with the original client's IP in the

[05:45] X-Forwarded-For header.

[05:47] So by specifying the CDN CIDR ranges

[05:50] that we trust, we can safely strip them

[05:53] from the forwarding chain and just leave

[05:56] the client's IP.

[05:58] That's important for logging and for

[06:00] rate limiting by IP address downstream

[06:04] in Nginx or Laravel.
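
The right-to-left walk described here can be sketched in Python with the standard `ipaddress` module. This illustrates the idea, similar in spirit to Nginx's `set_real_ip_from` with `real_ip_recursive on`; it is not the ingress controller's implementation, and the trusted ranges below are placeholder documentation networks rather than real CDN ranges.

```python
import ipaddress

# Hypothetical trusted CDN ranges (documentation networks used as placeholders).
TRUSTED_CIDRS = [
    ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in net for net in TRUSTED_CIDRS)


def client_ip(x_forwarded_for: str, remote_addr: str) -> str:
    """Strip trusted proxies from the right of the chain and return the
    first untrusted address, which is treated as the real client."""
    if not is_trusted(remote_addr):
        return remote_addr  # did not arrive via a trusted proxy: ignore the header
    chain = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    for hop in reversed(chain):
        if not is_trusted(hop):
            return hop
    # Every hop was trusted: fall back to the left-most entry, if any.
    return chain[0] if chain else remote_addr


print(client_ip("192.0.2.10, 203.0.113.5", "198.51.100.7"))
```

Only hops appended by trusted proxies are stripped, so a client that forges its own `X-Forwarded-For` value cannot hide behind it: the address the CDN appended is still found first.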

[06:07] The ingress controller, like any

[06:09] Nginx instance, generates a random

[06:12] request ID and here we're attaching it

[06:15] to the web request with a custom header.

[06:19] And that way we can reference a

[06:21] consistent request ID here in the

[06:23] ingress controller and also as the

[06:26] request passes on through Nginx and

[06:28] Laravel and back again.

[06:32] Regarding HTTP compression,

[06:35] by default, gzip and Brotli compress

[06:39] files of any size, but they both add

[06:42] metadata to the compressed files. So for

[06:45] files that are already small, we're

[06:47] actually expending CPU power to compress

[06:51] a file and end up increasing the amount

[06:53] of data to be transferred.
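
To see the metadata overhead in practice, a quick Python check with the standard `gzip` module shows a tiny JSON body growing while a larger, repetitive one shrinks. Brotli is not in the standard library, so it is left out here, and both payloads are made up for illustration.

```python
import gzip


def compressed_size(payload: bytes) -> int:
    # gzip wraps the deflate stream in a 10-byte header and an 8-byte
    # trailer (CRC32 + input length), so tiny inputs can get bigger.
    return len(gzip.compress(payload))


tiny = b'{"ok":true}'
large = b'{"id":1,"name":"example"},' * 200

for label, body in (("tiny", tiny), ("large", large)):
    print(label, len(body), "->", compressed_size(body))
```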

[06:57] So we specify brotli-min-length and

[07:00] gzip-min-length to set the file size

[07:03] threshold at which the ingress

[07:06] controller will apply compression.

[07:09] But what's the logical threshold to set?

[07:13] Well, the equilibrium point at which

[07:16] compression starts to reduce file size

[07:18] is different per file, but is usually

[07:21] between 100 and 400 bytes.

[07:25] So, is that the answer? Well, not

[07:28] really. When we send a message over TCP,

[07:32] as long as the payload fits within one

[07:34] TCP segment,

[07:36] the size of that payload has a

[07:38] negligible impact on the transmission

[07:41] time.

[07:42] It's like adding more passengers to a

[07:44] plane. It has almost no impact on the

[07:47] flight time.

[07:50] So from the client's perspective,

[07:52] latency doesn't scale linearly with the

[07:55] file size. It potentially jumps stepwise

[07:59] when the file size necessitates sending

[08:02] another TCP segment.

[08:05] That means HTTP compression is

[08:08] worthwhile only if it has a decent

[08:11] likelihood of reducing the number of TCP

[08:14] segments needed to hold a particular

[08:16] payload.

[08:18] So a legitimate strategy is to set the

[08:21] HTTP compression threshold at the file

[08:25] size that would trigger a second TCP

[08:28] segment.
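
The segment-count argument can be captured in a few lines of Python. This is a simplified model under stated assumptions: a 1,460-byte maximum segment size (a 1,500-byte MTU minus 20-byte IPv4 and TCP headers with no options), and no modelling of congestion-window or TLS effects.

```python
import math

MSS = 1460  # assumed: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers


def segments_needed(payload_bytes: int, mss: int = MSS) -> int:
    """Full-sized TCP segments needed to carry a payload (at least one)."""
    return max(1, math.ceil(payload_bytes / mss))


def compression_saves_a_segment(original: int, compressed: int, mss: int = MSS) -> bool:
    """True only if compressing moves the payload into fewer segments."""
    return segments_needed(compressed, mss) < segments_needed(original, mss)


print(compression_saves_a_segment(900, 400))    # both already fit in one segment
print(compression_saves_a_segment(2000, 900))   # two segments become one
```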

[08:31] The maximum size of a TCP segment

[08:34] depends on the MTU, the maximum

[08:38] transmission unit, which caps the size of layer 3 (IP) packets

[08:41] and is typically 1,500 bytes on Ethernet.

[08:46] Each IP packet has an IP header taking

[08:50] 20 to 60 bytes and a TCP header taking

[08:55] 20 to 40 bytes. And the rest of the

[08:59] 1,500 bytes is for the payload.

[09:03] The largest possible payload that

[09:06] still fits in one TCP segment

[09:10] contains the TLS record overhead for

[09:12] encryption and the HTTP headers, and

[09:16] finally, once all of that is accounted

[09:18] for, the remaining space is for the HTTP

[09:21] body, which is the candidate for

[09:23] compression.

[09:26] In this project, it's too early to

[09:27] finalize what our standard HTTP headers

[09:30] will be. So, we can't finalize this

[09:33] optimization yet. But this is the

[09:35] formula that will decide it once all of

[09:37] the components are confirmed.
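
Once the header sizes are known, the formula reduces to simple subtraction. A minimal Python sketch, assuming HTTP/1.1 headers travel in the same segment as the body and a TLS 1.3 AES-GCM record (22 bytes of overhead); the TCP option and header sizes in the example are hypothetical placeholders, not measurements from this project. The result plus one is the value that would go into `gzip_min_length`/`brotli_min_length`, since those directives compress responses at least that long.

```python
def max_body_in_one_segment(mtu=1500, ip_header=20, tcp_header=20,
                            tls_overhead=22, http_headers=0):
    """Largest HTTP body (bytes) that still fits in a single TCP segment.

    tls_overhead=22 assumes a TLS 1.3 AES-GCM record: a 5-byte record
    header, a 1-byte inner content type and a 16-byte authentication tag.
    """
    remaining = mtu - ip_header - tcp_header - tls_overhead - http_headers
    return max(remaining, 0)


# Hypothetical figures: 12 bytes of TCP options and ~350 bytes of headers.
limit = max_body_in_one_segment(tcp_header=32, http_headers=350)
# Bodies larger than `limit` spill into a second segment, so compression
# would only be applied from limit + 1 bytes upwards.
min_length = limit + 1
print(limit, min_length)
```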

[09:41] But there's one last twist. If the

[09:44] response doesn't have a Content-Length

[09:47] header, then Nginx ignores the gzip_min_length

[09:51] and brotli_min_length

[09:54] directives and compresses every file

[09:57] regardless of file size.

[10:00] So for that reason, in Laravel we've got

[10:03] a middleware to measure the body and add

[10:06] the Content-Length header.

[10:09] There's a php.ini directive called

[10:12] mbstring.func_overload.

[10:17] If we set it to zero, we can safely use

[10:19] the faster strlen function for adding the

[10:24] Content-Length header.

[10:26] Otherwise, we'd need to use the

[10:28] multibyte equivalent, mb_strlen,

[10:32] and specify the 8bit encoding to be

[10:36] sure we're getting the number of bytes

[10:38] instead of the character count.
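
The byte-versus-character distinction is language-agnostic, so here is a Python analogue rather than the Laravel middleware itself: it sets `Content-Length` from the encoded byte length. In PHP 8.0 and later, `mbstring.func_overload` no longer exists, so `strlen` always returns a byte count, the same number `mb_strlen($body, '8bit')` gives. The helper name is hypothetical.

```python
def with_content_length(headers: dict, body: str, encoding: str = "utf-8") -> dict:
    """Return a copy of headers with Content-Length set to the body size in bytes."""
    encoded = body.encode(encoding)
    updated = dict(headers)
    updated["Content-Length"] = str(len(encoded))  # bytes, not characters
    return updated


body = '{"name":"Zo\u00eb"}'  # "\u00eb" is one character but two UTF-8 bytes
print(len(body))                                         # character count: 14
print(with_content_length({}, body)["Content-Length"])  # byte count: 15
```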

[10:41] We have to make sure this middleware is

[10:42] late in the stack, after any middleware

[10:45] that will modify the body, because if the

[10:48] response body is longer than the Content-Length

[10:50] value, Nginx will cut off the

[10:52] excess.

[10:58] Okay, onto Laravel and Nginx. There

[11:02] are four ways of connecting PHP-FPM with

[11:06] Nginx in Kubernetes.

[11:08] The first decision is whether to put the

[11:11] containers in separate pods or in the

[11:13] same pod.

[11:16] Nginx can handle far more connections

[11:18] than PHP-FPM can, and separate pods would

[11:22] let us scale them independently. So we

[11:25] could save the memory overhead of the

[11:27] excess Nginx pods.

[11:30] Each Nginx pod has an overhead of

[11:32] about 10 MB.

[11:36] The downside is that Nginx could

[11:38] forward requests to PHP-FPM instances on

[11:42] different nodes, which would add network

[11:45] latency.

[11:46] Getting around this is fiddly and might

[11:49] not be worth the memory saved.

[11:52] Also, separate pods means that the logs

[11:55] for a particular request would be split

[11:58] up between different pods.

[12:04] If we go with a single pod, that opens

[12:07] up the choice of how Nginx and PHP-FPM

[12:10] will talk to each other: either by using

[12:13] TCP or by mounting a shared volume onto

[12:17] both containers to hold a Unix socket

[12:20] file.

[12:22] Each TCP connection needs to be

[12:25] established with a TCP handshake

[12:27] consisting of three messages sent back

[12:29] and forth. With the default Nginx

[12:33] configuration, that handshake happens

[12:35] for every single request routed to

[12:38] PHP-FPM.

[12:40] Each message sent is bundled with IP and

[12:43] TCP headers, increasing the amount of

[12:46] data to be transferred. And depending on

[12:48] the size of the request and the

[12:50] response, they might be broken up into

[12:53] packets before sending, then reassembled

[12:56] at the other end. And finally, the

[12:58] connection needs to be torn down with

[13:00] another three or four messages.

[13:03] Due to all this redundant stuff, I

[13:06] really expected sockets to be measurably

[13:08] faster than TCP.

[13:11] Maybe the teardown would happen

[13:12] concurrently to the response being sent

[13:14] out, but the rest of it surely increases

[13:17] latency.

[13:19] However, in the very quick tests that I

[13:22] set up, latency was almost identical

[13:25] between the two methods.

[13:27] That's interesting academically, and I

[13:30] want to investigate more when I have the

[13:32] time. But practically speaking, if we're

[13:36] interested in such small savings in

[13:37] latency, then we'd likely be better off

[13:40] considering Swoole instead of PHP-FPM and

[13:43] Nginx.

[13:45] And then this whole architectural

[13:47] decision would go away.

[13:50] Both TCP and sockets benefit from the

[13:53] Nginx directive fastcgi_keep_conn,

[13:57] which, together with upstream keepalive, keeps the connection open between

[14:00] requests. So we wouldn't need the TCP

[14:02] handshake for every request.

[14:05] The PHP-FPM counterparts are

[14:09] pm.process_idle_timeout, which sets the time

[14:13] a worker can be idle before it's killed,

[14:17] and pm.max_requests,

[14:19] which caps how many requests a

[14:22] worker can serve before it's respawned.

[14:27] And we need to set fastcgi_pass in

[14:30] Nginx and listen in PHP-FPM.

[14:36] For TCP connections, we tell them which

[14:38] port, and for sockets, we tell them the

[14:41] location of the socket file.

[14:46] Then the fourth and final option for

[14:48] connecting Nginx with PHP-FPM in

[14:51] Kubernetes is putting them in the same

[14:54] container in the same pod.

[14:57] This is workable but it creates

[14:59] complications.

[15:01] Kubernetes monitors the status of the

[15:03] process with PID 1. If

[15:07] that process exits, the container stops,

[15:10] even if other processes are

[15:13] running. And if another process exits,

[15:16] Kubernetes has no idea and does nothing.

[15:20] So if a container has more than one key

[15:22] process, we need to implement a

[15:24] replacement for the Kubernetes health

[15:26] checks and process management.

[15:31] In the end, I went with the

[15:33] middle-ground option: same pod, different

[15:36] containers, with a shared volume between

[15:38] them for a Unix socket file. Mostly

[15:42] because I hadn't tried this setup

[15:43] before.

[15:45] And we'll see the detailed

[15:46] implementation in the Kubernetes

[15:48] manifest and the Dockerfile later in

[15:50] the video.

[15:55] So with that decided, let's jump into

[15:57] the pod manifest.

[15:59] As I explain components, I'll also build

[16:02] up this visual representation to help

[16:05] visualize how everything links together.

[16:08] Obviously the Laravel and Nginx

[16:10] containers are at the core and the first

[16:13] step is to inject the configuration

[16:15] files with Kubernetes secrets and config

[16:18] maps and to mount the shared volume for

[16:21] PHP-FPM to create the socket file we

[16:24] just talked about.

[16:27] If we can make the whole file system in

[16:29] a container read-only, that's great for

[16:32] security. So we need to identify where

[16:35] the application needs write access and

[16:38] then mount volumes at those locations

[16:41] with write access and make everything

[16:43] else read-only.

[16:45] And the socket volume is our first one

[16:47] of those.

[16:52] The Laravel bootstrap caches never

[16:55] change in production.

[16:57] So at first thought, we'd run the cache

[16:59] creation commands in the Dockerfile so

[17:02] that they're part of the image, and then

[17:04] we'd make them read-only in production.

[17:08] That's fine for the event cache, route cache,

[17:11] and view cache. But the config cache needs

[17:15] to read the .env file.

[17:19] For security reasons, we can't put

[17:21] sensitive files like the .env file into the

[17:24] image. So the only safe option is to run

[17:28] config:cache within each pod as it

[17:31] starts up.

[17:33] So that means the cache directory needs

[17:36] to have write access in production. But

[17:39] we'd prefer it to only have read access,

[17:41] because it never changes once created,

[17:45] and any attacker that gets write access

[17:47] to the bootstrap caches could obviously

[17:49] do serious damage.

[17:52] But there is a way to get the best of

[17:54] both worlds.

[17:56] First, in the Dockerfile, we rename the

[17:59] bootstrap/cache

[18:01] directory to something else, like a

[18:05] temp cache directory.

[18:07] Then in Kubernetes, we run a Laravel

[18:10] init container when the pod starts up.

[18:14] It has the .env file injected and a

[18:18] writable volume mounted at

[18:20] bootstrap/cache.

[18:24] We copy everything from the temp cache

[18:26] directory into the volume at

[18:29] bootstrap/cache

[18:31] and then run php artisan config:cache to

[18:37] create the final bootstrap cache file.

[18:41] The time command just logs the memory

[18:43] usage to help us set resource limits

[18:45] later.

[18:48] Then after the init container has

[18:50] finished, the main Laravel container

[18:53] starts up with the cache volume mounted

[18:56] as readOnly: true.

[19:00] And that's how we get a read-only cache

[19:02] with sensitive data compiled at the pod

[19:06] startup.

[19:11] We're also running additive database

[19:14] migrations in the init container.

[19:17] The --isolated flag is very

[19:20] consequential.

[19:22] It means while this migration is

[19:24] running, although normal reads and

[19:26] writes can still happen concurrently,

[19:29] no other migrate command with the

[19:32] isolated flag can begin.

[19:35] A migrate command without the isolated

[19:38] flag can still run. So it's important

[19:40] that we add it to every migrate command

[19:43] in production to make sure concurrent

[19:46] migrations are impossible.

[19:49] It uses a cache lock to track whether an

[19:52] isolated migration is currently

[19:55] running.

[19:56] So if you're using the database for

[19:59] cache, it creates a catch-22 situation

[20:02] for your first migration: you'd need to

[20:04] run migrations once without the isolated

[20:07] flag first,

[20:09] but most applications would just use

[20:11] Redis, so it wouldn't be a problem.

[20:16] An unexpected problem I ran into is that

[20:20] if a container crashes and gets

[20:22] restarted by Kubernetes,

[20:24] the mounted volumes are not cleared.

[20:28] That's the behavior we want most of the

[20:30] time. We want data to be permanent for

[20:33] the life of the pod, but it can cause

[20:36] some misleading error logs. In the init

[20:39] container startup script, it copied and

[20:42] created the cache files with no problem.

[20:45] Then it hit an error with migrations.

[20:48] So, Kubernetes killed the container and

[20:51] ran it again. But on the second run, the

[20:54] volume was already populated. So the

[20:57] error logs were about file permissions,

[20:59] not the migration commands.

[21:02] So just bear that in mind. If all else

[21:05] is equal, move any operations on mounted

[21:08] volumes to the end of the script. But in

[21:11] our case, migrations actually depend on

[21:13] the cache files being present.

[21:18] We've got another init container to set

[21:20] up the directory structure for Nginx's

[21:23] writable volume.

[21:25] I prefer to use Chainguard images to be

[21:28] minimal and more secure, but the CPUs of

[21:31] my nodes are too old and don't support

[21:33] them.

[21:35] And finally, we have one last container

[21:38] that scrapes the Nginx metrics

[21:40] endpoint and presents that data in a

[21:43] format that Prometheus can then scrape.

[21:46] Port 8080 is for publicly available

[21:50] endpoints via ingress. Port 8081

[21:54] is for internal traffic like health

[21:56] checks and metrics, and then Prometheus

[21:59] scrapes the exporter on its default port

[22:02] 9113.

[22:08] Kubernetes provides three types of

[22:10] probes or health checks. Probes are

[22:14] attached to containers, but the actions

[22:17] on success or failure can impact either

[22:20] that container it's attached to or the

[22:23] whole pod.

[22:25] When a pod starts up, if at least one

[22:28] container has a startup probe, then that

[22:31] pod won't initially be added to the

[22:34] Service's endpoints. So, it won't be

[22:37] accessible by other pods or by outside

[22:40] traffic from ingress.

[22:43] If a startup probe fails, the container

[22:46] it's attached to is killed and by

[22:49] default configuration in Kubernetes, any

[22:52] killed container is automatically restarted.

[22:56] So that's hoping that a restart or a

[22:58] slight delay will fix whatever caused

[23:01] the startup probe to fail.

[23:04] Once a container startup probe passes,

[23:07] it will never run again. And instead,

[23:10] the container's readiness probe and

[23:12] liveness probe begin running if it has

[23:15] them. And they will then run repeatedly

[23:18] for as long as the pod exists.

[23:21] Once all startup probes in a pod pass

[23:25] and if there are no readiness probes,

[23:28] then the pod is added to the Service's

[23:30] endpoints and it starts serving traffic.

[23:34] If there are one or more readiness

[23:36] probes, then the pod waits on them.

[23:40] Once all readiness probes pass, the pod

[23:44] is added to the Service's endpoints.

[23:47] But the readiness probes keep running

[23:49] continuously.

[23:51] And if at any time a readiness probe

[23:53] fails, the pod is removed again. So a

[23:57] failed readiness probe only has a pod

[23:59] level effect. It doesn't kill the

[24:01] container.

[24:03] That's what liveness probes are for.

[24:05] When a liveness probe fails, it has no

[24:09] pod level effect. The pod can still

[24:11] receive external traffic. Instead, a

[24:14] failed liveness probe kills the

[24:16] container it's attached to.

[24:20] So, to summarize, a failed readiness

[24:22] probe stops traffic flow reaching the

[24:25] pod. A failed liveness probe kills the

[24:28] container it's attached to. And a failed

[24:32] startup probe does both of those, but

[24:35] only when the pod is starting up.
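
The probe rules above boil down to a small lookup table. This Python sketch just restates that summary as data (it is not Kubernetes code) and assumes the default `restartPolicy: Always`, under which a killed container is restarted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeFailure:
    excluded_from_endpoints: bool  # pod does not receive Service traffic
    kills_container: bool          # container is killed and, by default, restarted


def on_probe_failure(probe: str) -> ProbeFailure:
    """Direct effect of a failing probe, following the summary in the talk."""
    effects = {
        "readiness": ProbeFailure(excluded_from_endpoints=True, kills_container=False),
        # The pod can briefly go unready while the container restarts,
        # but the probe's own action is only the kill.
        "liveness": ProbeFailure(excluded_from_endpoints=False, kills_container=True),
        # Startup probes only run while the container is starting, before
        # the pod has been added to the endpoints.
        "startup": ProbeFailure(excluded_from_endpoints=True, kills_container=True),
    }
    if probe not in effects:
        raise ValueError(f"unknown probe type: {probe!r}")
    return effects[probe]


for name in ("startup", "readiness", "liveness"):
    print(name, on_probe_failure(name))
```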

[24:39] Phew. I think that's the most concise

[24:41] summary of probes I can give. So, how

[24:44] can we apply these to our Laravel and

[24:46] Nginx pod?

[24:49] Well, we don't want requests reaching a

[24:52] broken pod. So, a readiness check is

[24:55] crucial.

[24:57] And the ability to serve requests

[24:59] depends on Nginx and Laravel both

[25:02] working. So we put the readiness probe

[25:05] on the Nginx container to query a

[25:08] Laravel endpoint that returns a simple

[25:12] plain text response.

[25:14] That means the readiness check only

[25:16] passes if Nginx and Laravel and their

[25:19] connection are all fine.

[25:23] The ability to serve requests also

[25:25] depends on the connection to the

[25:27] database and the cache. So we might

[25:31] consider checking those connections as

[25:33] part of the readiness check.

[25:36] But if a database problem did occur, it

[25:40] would affect all Laravel pods.

[25:43] We wouldn't have a mix of healthy and

[25:45] unhealthy pods. And the readiness probe

[25:49] would react by removing this pod from

[25:51] its service, which doesn't do anything

[25:54] to solve the database problem. And

[25:57] actually we'd slightly prefer to keep

[25:59] the Laravel pod serving requests to give

[26:02] as graceful a response as possible.

[26:06] And then we'd rely on some other probe

[26:09] to heal the database problem closer to

[26:11] where it occurred. So no, the readiness

[26:15] probe should not check the connections

[26:17] to database and cache.

[26:20] However, it is a good idea to add those

[26:23] connections to the startup probe. That's

[26:26] so that if we're deploying a new version

[26:28] and I've messed up the connection

[26:29] configuration,

[26:31] Kubernetes will stop it from going live and

[26:33] keep the old version alive, serving

[26:36] requests.

[26:39] Another reason to have a startup check

[26:41] is because we're doing OPcache

[26:42] preloading at startup. So we need some

[26:45] flexibility around the slightly

[26:47] unpredictable bootup time.

[26:51] The startup probe of course also needs

[26:53] to check both Nginx and Laravel and

[26:56] their connection. So we add it to the

[26:58] Nginx container and call a startup

[27:01] endpoint in Laravel.

[27:05] This design has one perverse side effect:

[27:07] if the startup probe fails, it

[27:10] kills the Nginx container it's

[27:12] attached to, even though the problem is

[27:15] much more likely to come from the

[27:16] Laravel container or the database or

[27:19] cache connections. But that's one

[27:21] imperfection I think we can live with.

[27:25] And finally, what about liveness

[27:27] probes? Well, in the Nginx

[27:30] configuration, I created an endpoint

[27:32] that returns a simple plain text

[27:34] response, but I think the chance of

[27:36] Nginx messing up is so low that

[27:39] currently I don't think it's worth

[27:40] running a constant probe.

[27:43] Laravel is a bit more likely to mess up.

[27:45] So, we've got a liveness probe querying

[27:48] the PHP-FPM status page.

[27:56] The last big topic of this manifest is

[27:58] the memory limits that Kubernetes

[28:00] imposes on each container. So this is

[28:03] where we'll transition to the topic of

[28:05] configuration for PHP, PHP-FPM and

[28:09] Nginx.

[28:11] The Kubernetes memory limit is a fail

[28:13] safe that kills the container if it's

[28:16] exceeded.

[28:17] That's a pretty drastic action. So we

[28:20] need to set it high enough to cover the

[28:22] peak memory usage in normal operation

[28:26] so that it's only triggered by abnormal

[28:28] memory usage that we want to catch early

[28:30] and contain.

[28:33] This is my formula to estimate peak

[28:35] usage in normal operation.

[28:38] My understanding can be improved further

[28:40] but I think this is a decent formula for

[28:42] now. And at the end we use a margin

[28:45] component to represent the degree of

[28:48] confidence we have in our estimation.

[28:51] PHP-FPM has one master process and a

[28:55] variable number of workers that serve

[28:57] web requests. So we need to determine

[29:00] which memory expenses are per worker and

[29:04] which are shared between all workers.

[29:07] The master process overhead, the PHP

[29:10] interpreter and its extensions, OPcache,

[29:13] and any mounted volumes stored in

[29:16] memory should all be counted once. And

[29:19] then everything in the brackets is

[29:21] multiplied by the maximum number of

[29:23] workers specified by the PHP-FPM

[29:26] directive pm.max_children.
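
The shape of that formula can be written as a small Python function. The per-worker figure and every number in the example are hypothetical, and the margin is applied as a multiplier here, which is an assumption; an additive margin would work just as well.

```python
def fpm_memory_limit_mb(master, interpreter, opcache, memory_volumes,
                        per_worker, max_children, margin=0.25):
    """Estimated container memory limit (MB) for PHP-FPM.

    Shared costs are counted once; per-worker costs are multiplied by
    pm.max_children. The margin is applied multiplicatively (an assumption).
    """
    shared = master + interpreter + opcache + memory_volumes
    peak = shared + max_children * per_worker
    return peak * (1 + margin)


# Hypothetical measurements in MB.
print(fpm_memory_limit_mb(master=10, interpreter=20, opcache=128,
                          memory_volumes=5, per_worker=40, max_children=8))
```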

[29:31] For an application that's 100% CPU

[29:34] intensive, we'd set pm.max_children equal

[29:39] to the number of CPU cores available to

[29:41] the container.

[29:43] But the larger the I/O wait is expected

[29:46] to be, like waiting for database queries

[29:49] to come back, the more we can raise pm.max_children

[29:54] above the number of cores.
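
One common way to put a number on this is the thread-pool sizing heuristic cores × (1 + wait time / compute time). It is a swapped-in rule of thumb, not the method used in the video, and the result is only a starting point to validate with load tests and to keep within the container's memory budget.

```python
import math


def suggested_max_children(cpu_cores: float, wait_ms: float, compute_ms: float) -> int:
    """Starting point for pm.max_children: cores * (1 + wait / compute)."""
    if cpu_cores <= 0 or compute_ms <= 0:
        raise ValueError("cpu_cores and compute_ms must be positive")
    return max(1, math.floor(cpu_cores * (1 + wait_ms / compute_ms)))


print(suggested_max_children(2, wait_ms=0, compute_ms=50))    # purely CPU-bound
print(suggested_max_children(2, wait_ms=150, compute_ms=50))  # mostly waiting on I/O
```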

[29:58] Workers process one request at a time.

[30:01] So memory usage doesn't scale infinitely

[30:04] as the concurrency of requests grows. The

[30:08] number of workers is a cutoff. So

[30:10] pm.max_children is the ultimate cutoff.

[30:14] And any queue of waiting requests mostly

[30:17] consumes Nginx's memory allowance,

[30:20] not PHP-FPM's.

[30:24] memory_limit in php.ini is the

[30:28] memory usage failsafe on script

[30:31] execution.

[30:33] So just like the Kubernetes memory

[30:35] limit, we need to predict the peak

[30:37] memory usage in normal operation of a

[30:40] single script this time and add a margin

[30:42] of confidence.

[30:45] If abnormal memory usage happens in a

[30:48] script, we want the PHP memory limit to

[30:51] kill that script.

[30:54] And the additional margin of the

[30:56] Kubernetes memory limit means it can

[30:58] only be triggered in an even rarer and

[31:01] more extreme situation and it would kill

[31:03] the whole container, not just a single

[31:06] request.

[31:08] To help set PHP's memory limit in

[31:11] Laravel, we're logging the peak memory

[31:13] usage for every request.

[31:16] Similarly, we're also logging the

[31:18] realpath cache size and the worker ID.

[31:22] Doing it in the middleware's terminate

[31:24] method means it's executed after the

[31:27] response is sent out.
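
Outside PHP, the same pattern (send the response first, then record memory stats) can be sketched in Python with the standard `tracemalloc` module. The `serve` function and its callbacks are hypothetical stand-ins for a framework's terminate hook; `tracemalloc` only sees Python allocations and adds overhead, so it is a diagnostic tool rather than something to leave on in production.

```python
import logging
import os
import tracemalloc

logger = logging.getLogger("request_metrics")


def serve(handler, request, send):
    """Run the handler, send the response, then log peak memory afterwards."""
    tracemalloc.start()
    try:
        response = handler(request)
        send(response)  # the client gets its response first...
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # ...and the logging cost is paid afterwards, like a terminate() hook.
    logger.info("peak_bytes=%d worker_pid=%d", peak, os.getpid())
    return peak


sent = []
print(serve(lambda req: bytes(200_000), None, sent.append))
```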

[31:30] We can also use Xdebug profiling to find

[31:33] exactly how memory is used in a

[31:35] particular request execution.

[31:38] And while profiling in Laravel, we

[31:41] should disable garbage collection at the

[31:43] start of the script to ensure accurate

[31:45] readings.

[31:47] So there's a setting in the .env file to

[31:49] toggle garbage collection.

[31:55] We have a similar structure for the

[31:57] Nginx prediction of peak memory usage

[32:00] in normal operation.

[32:03] Shared resources are counted once and

[32:06] the items in brackets are per

[32:08] connection. So they get multiplied by

[32:11] worker_processes, which is the number of

[32:13] workers, and worker_connections, which is

[32:17] the maximum number of connections per

[32:19] worker.

[32:21] The default for worker_connections is

[32:24] 512

[32:26] and that can realistically be set as

[32:27] high as 10,000 or more.

[32:30] That means that since we will add a

[32:32] margin of confidence to each of these

[32:34] buffer sizes here to ensure they can

[32:36] satisfy legitimate requests,

[32:39] those margins would then be multiplied

[32:42] thousands or tens of thousands of times

[32:44] when we're calculating the container

[32:46] level memory limit.

[32:49] We would then be reserving a huge amount

[32:51] of memory for a peak usage scenario that

[32:54] is very unlikely to occur.
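
To see how the multiplication plays out, here is a Python sketch of a worst-case estimate covering only the request buffers discussed in the video. The defaults are Nginx's documented defaults on x86-64, the roughly 10 MB baseline comes from the per-pod overhead mentioned earlier, and two worker processes is a hypothetical choice. Response-side buffers such as `fastcgi_buffers` are deliberately left out, and large header buffers are only allocated when a request needs them, so real usage is normally far lower.

```python
def nginx_peak_memory_kb(worker_processes, worker_connections,
                         client_header_buffer_kb=1,
                         large_header_buffers=(4, 8),
                         client_body_buffer_kb=8,
                         shared_kb=10 * 1024):
    """Worst-case request-buffer memory for Nginx, in KB.

    Defaults mirror Nginx's documented defaults on x86-64:
    client_header_buffer_size 1k, large_client_header_buffers 4 8k,
    client_body_buffer_size 8k. shared_kb (~10 MB) is a rough baseline.
    """
    count, size_kb = large_header_buffers
    per_connection = client_header_buffer_kb + count * size_kb + client_body_buffer_kb
    return shared_kb + worker_processes * worker_connections * per_connection


for connections in (512, 10_000):
    kb = nginx_peak_memory_kb(worker_processes=2, worker_connections=connections)
    print(f"{connections:>6} connections/worker -> {kb / 1024:.0f} MB")
```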

[32:57] So to manage that problem, first we need

[33:00] to align worker connections with the

[33:03] peak request concurrency we want to

[33:05] guarantee satisfying and with the memory

[33:08] or the financial constraints that we

[33:11] have.

[33:12] Second, let's consider the size of these

[33:15] buffers extremely carefully. Can we

[33:18] restrict them without hurting UX?

[33:21] Does exceeding a particular buffer kill

[33:23] a request or does it just downgrade

[33:26] performance and by how much?

[33:29] So with that as our goal, how do these

[33:31] buffers work? For an incoming request,

[33:35] Nginx puts the headers into the client

[33:38] header buffer (client_header_buffer_size).

[33:40] If the headers exceed that buffer,

[33:43] Nginx puts them into the large client

[33:46] header buffers (large_client_header_buffers).

[33:49] And if in that scenario the initial

[33:52] client header buffer is no longer used,

[33:54] we could safely remove that from our

[33:56] peak usage formula. But I haven't had

[33:59] time to experiment with that yet, so I'm

[34:01] keeping it in just to be safe.

[34:04] If the large client header buffers are

[34:07] exceeded, Nginx returns a 400 Bad

[34:11] Request error. That's a pretty drastic

[34:14] action. So, we can't squeeze this buffer

[34:17] too tightly, especially if we have a

[34:19] broad or non-technical user base.

[34:23] But if our API end users are technical

[34:26] enough, we could put a low but

[34:28] reasonable limit on the size of the

[34:30] request headers and then require users

[34:33] to read and abide by the documentation.

[34:41] Nginx stores the request body in the

[34:44] client body buffer (client_body_buffer_size), and any excess is

[34:47] stored on disk. So we can be more

[34:50] aggressive with this buffer. Maybe only

[34:52] guaranteeing it will hold the bodies of

[34:55] 95% or 99% of legitimate requests.
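One way to pick that value is to take a high percentile of observed request body sizes and round it up to a whole kilobyte. Nginx size directives accept values such as 3k. This Python sketch assumes the body sizes have already been collected, for instance from a logged $content_length. The nearest-rank percentile is one simple choice among several, and the sample sizes are invented.

```python
import math


def percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def round_up_to_kib(size: int) -> int:
    """Round a byte count up to a whole number of KiB (at least 1 KiB)."""
    return max(1, math.ceil(size / 1024)) * 1024


# Hypothetical request body sizes in bytes.
body_sizes = [100, 150, 200, 250, 300, 400, 500, 600, 700, 800,
              900, 1000, 1200, 1400, 1600, 1800, 2000, 2500, 3000, 50000]

p95 = percentile(body_sizes, 95)
print(p95, round_up_to_kib(p95))  # 3000 3072
```

Bodies above the chosen size are not rejected. The excess goes to a temporary file on disk, so the trade-off is some disk I/O for the rare large request.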

[35:02] For an Nginx instance that only serves

[35:04] static assets, like the Nginx instance serving

[35:07] our Vue.js front end, legitimate users

[35:11] will never send POST, PUT, or PATCH

[35:13] requests.

[35:15] So, we can set the client body buffer

[35:17] size very low.

[35:20] But we'd still need to count it in our

[35:21] formula, because users can still fill up

[35:24] that buffer, and we don't want to give

[35:26] malicious users the ability to trigger

[35:28] our memory limit and kill the Nginx

[35:30] container.

[35:34] After receiving the request line and the

[35:36] headers, Nginx parses them to

[35:39] determine how to route the request. Some

[35:43] of the requests will be sent to PHP-FPM

[35:46] and it will run the Laravel application

[35:48] to build the response which will then be

[35:51] sent back to Nginx.

[35:54] The first part of the output from

[35:57] PHP-FPM is stored in the preliminary buffer

[36:00] determined by fastcgi_buffer_size.

[36:05] It's crucial that the FastCGI headers

[36:08] are fully included in this preliminary

[36:11] buffer because if not, Nginx returns a

[36:14] 502 Bad Gateway error.

[36:18] These FastCGI headers are what will

[36:20] later be translated into the response's

[36:23] HTTP status and HTTP headers.

[36:29] So we need to be very aware of and

[36:31] control the size of the headers returned

[36:34] by Laravel.

[36:35] In the Nginx access log, we're

[36:38] recording the embedded variables

[36:40] $upstream_response_length, which is the

[36:43] total size of the response payload sent

[36:46] from PHP-FPM, and $body_bytes_sent, which

[36:51] is the size of the response body. So

[36:54] the total minus the body gives us the

[36:57] size of the headers. This is likely to

[37:00] be much smaller than the size of the

[37:02] equivalent HTTP headers because it's in

[37:05] binary key-value format and doesn't

[37:08] include line-ending characters.
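Here is a small Python sketch of that subtraction applied to access-log lines. The key=value log_format in the comment is hypothetical, not the project's actual format. When no upstream is involved, Nginx logs the upstream variable as "-", so those lines are skipped. One caveat is worth checking: $body_bytes_sent counts bytes sent to the client. If Nginx compresses a response, the difference no longer reflects the header size.

```python
import re

# Assumed log_format (hypothetical), e.g.:
#   log_format api '$remote_addr "$request" $status '
#                  'url=$upstream_response_length bbs=$body_bytes_sent';
LINE_RE = re.compile(r"url=(\d+)\s+bbs=(\d+)")


def header_sizes(lines):
    """Yield total-minus-body for each line that has both numeric values."""
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # "url=-" (no upstream) does not match and is skipped
            total, body = int(match.group(1)), int(match.group(2))
            yield total - body


log = [
    '10.0.0.1 "GET /api/v1/users HTTP/1.1" 200 url=1480 bbs=1200',
    '10.0.0.2 "GET /api/v1/users/1 HTTP/1.1" 200 url=610 bbs=420',
    '10.0.0.3 "GET /robots.txt HTTP/1.1" 200 url=- bbs=68',
]

largest = max(header_sizes(log))
FASTCGI_BUFFER_SIZE = 4096  # assumed fastcgi_buffer_size of 4k
print(largest, largest < FASTCGI_BUFFER_SIZE)  # 280 True
```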

[37:13] Any headers added by Nginx are not

[37:16] held in this buffer because they're

[37:18] added as the response is sent out.

[37:23] If this preliminary FastCGI buffer fully

[37:25] contains the headers, the rest of the

[37:28] buffer is put to good use and fills up

[37:30] with the first part of the response

[37:32] body. So there's no memory saving

[37:34] benefit to squeezing this buffer

[37:36] aggressively.

[37:38] If the response exceeds the preliminary

[37:41] buffer, the rest of the body is stored

[37:43] in fastcgi_buffers.

[37:47] Yes, that's named confusingly:

[37:50] what I'm calling the preliminary buffer

[37:53] is determined by the Nginx directive

[37:56] fastcgi_buffer_size,

[37:59] and then these body-only buffers are

[38:01] determined by fastcgi_buffers.

[38:06] If these body-only buffers are also

[38:08] exceeded, Nginx responds with a 502

[38:12] Bad Gateway.

[38:14] So we need to be very aware of the

[38:16] maximum size of our responses.

[38:20] The value of upstream response length in

[38:22] our logs will help with that.
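As a rough capacity check, the two directives can be combined and compared with the largest logged $upstream_response_length. This Python sketch parses Nginx-style sizes such as 4k and "8 4k". The values and the 30,000-byte maximum are illustrative assumptions. Depending on settings such as fastcgi_max_temp_file_size, Nginx may instead write part of an oversized response to a temporary file, so it is worth confirming how a given configuration actually behaves.

```python
def parse_size(value: str) -> int:
    """Parse Nginx-style sizes such as '512', '4k' or '1m' into bytes."""
    units = {"k": 1024, "m": 1024 * 1024}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def fastcgi_capacity(buffer_size: str, buffers: str) -> int:
    """In-memory bytes for one response: fastcgi_buffer_size plus
    fastcgi_buffers given as '<number> <size>'."""
    number, size = buffers.split()
    return parse_size(buffer_size) + int(number) * parse_size(size)


capacity = fastcgi_capacity("4k", "8 4k")
largest_logged = 30_000  # hypothetical max $upstream_response_length
print(capacity, largest_logged <= capacity)  # 36864 True
```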

[38:26] If we can control the response size with

[38:28] a high degree of confidence, we can set

[38:31] these buffers quite tightly.

[38:34] And then we'd need to set up a system

[38:36] such that any design or codebase change

[38:38] that affects the maximum response size

[38:41] triggers a reassessment of this

[38:43] configuration before deployment.

[38:46] And we'd also need to implement

[38:47] end-to-end tests for the scenarios that

[38:50] generate the largest possible responses

[38:52] in production.

[38:55] It's possible that Nginx clears the

[38:57] request buffers before the response

[39:00] buffers are filled. If that's the case,

[39:03] we could safely use the max of the

[39:05] request buffers and the response buffers

[39:08] instead of the sum. And that would then

[39:10] reduce our peak memory estimate by quite

[39:13] a lot.

[39:14] That would be very interesting to

[39:16] examine, but I haven't had the time to

[39:17] do that yet.

[39:20] For an Nginx instance that only serves

[39:23] static assets, it's impossible for a

[39:25] request to use the FastCGI buffers, not

[39:29] to mention the proxy or output

[39:31] buffers, which aren't covered here. So we can

[39:34] safely remove those from our peak usage

[39:36] formula for that instance.

[39:39] And the ingress controller handles TLS

[39:42] termination, so we're also ignoring ssl_buffer_size

[39:45] here.

[39:52] PHP, PHP-FPM, and Nginx all have

[39:57] settings for timeouts that govern

[39:59] various parts of the request-response

[40:01] process. So I created this diagram for

[40:04] my notes to help visualize how they line

[40:06] up.

[40:08] The x-axis represents time very roughly

[40:13] as the request goes from client to

[40:15] Nginx to PHP-FPM, and the response

[40:19] reverses that route.

[40:22] But the size of each block doesn't

[40:24] correspond to how long the task takes or

[40:26] the suggested timeout value. Rather, the

[40:30] diagram shows how and when each timeout

[40:33] is triggered and how they overlap.

[40:38] If any of these timeouts are breached,

[40:40] the client receives an error response.

[40:43] So, we want these timeouts to

[40:45] accommodate essentially 100% of

[40:48] legitimate requests.

[40:51] Starting from the left, the first

[40:53] timeout is client_header_timeout, which

[40:57] is triggered when Nginx accepts the

[40:59] connection after the TCP handshake.

[41:03] It sets the time needed to receive the

[41:05] request line and the headers from the

[41:08] client.

[41:09] It's not an absolute timeout. Rather,

[41:12] it's a timeout for the intervals between

[41:15] reads. That is, each time some part of

[41:19] the header is received, the timer resets

[41:22] to zero. And that's a running theme for

[41:25] a lot of these Nginx timeouts. As you

[41:28] can see in the diagram,

[41:31] after Nginx receives and parses the

[41:34] request line and the headers,

[41:36] client_body_timeout begins and limits the

[41:40] intervals between reads of the request

[41:42] body from the client.

[41:45] The purpose of these two request

[41:47] timeouts is to stop partial requests

[41:50] filling up the available connections.

[41:53] A Slowloris attack is an attempt to do

[41:56] that en masse, and a clever attacker could

[41:59] easily determine our timeouts and then

[42:01] drip-feed the server with a request

[42:04] repeatedly. So it's important to also

[42:07] limit concurrent requests from the same

[42:09] IP address.

[42:11] So we store all concurrent IP addresses

[42:15] with limit_conn_zone. With my settings,

[42:19] there can be a maximum of 24,048

[42:22] concurrent connections. So that needs

[42:26] 124 kilobytes to guarantee storing all

[42:28] concurrent addresses.

[42:32] Then in a server or location block, use

[42:35] limit_conn to put an upper limit on the

[42:38] number of concurrent connections from

[42:40] the same IP address.

[42:43] Back to timeouts.

[42:46] After the header has been received,

[42:48] Nginx can determine how to process the

[42:51] request. If it will be forwarded to

[42:54] PHP-FPM and if the Nginx worker doesn't

[42:57] already have a connection with an idle

[43:00] PHP-FPM worker, it starts a new

[43:03] connection and starts

[43:07] fastcgi_connect_timeout.

[43:09] In normal operation, establishing a

[43:12] connection is near instantaneous

[43:14] unless all PHP-FPM workers are busy and

[43:18] the queue is filled up. The queue size is

[43:21] determined by the PHP-FPM directive

[43:25] listen.backlog. So, it's advisable to

[43:29] set it to the maximum, 511,

[43:32] and then set fastcgi_connect_timeout to

[43:36] just a few seconds.

[43:39] Once the body is fully received and the

[43:42] FastCGI connection is established,

[43:45] Nginx begins sending the request to

[43:47] PHP-FPM and starts fastcgi_send_timeout,

[43:53] and PHP starts its

[43:57] max_input_time.

[44:00] The max_input_time limit also covers the parsing

[44:03] of the request body, like populating the

[44:06] $_POST or $_FILES predefined

[44:11] variables, before the script can begin

[44:14] execution.

[44:16] But of course, Nginx doesn't know

[44:18] about or care about any of that. So as

[44:21] soon as the transmission is complete, it

[44:23] switches from fastcgi_send_timeout

[44:26] to fastcgi_read_timeout, which

[44:30] puts a limit on how long Laravel can

[44:32] take to return the response in full.

[44:36] More accurately, it puts a limit on the

[44:38] interval between read operations. But

[44:41] since Laravel typically buffers the

[44:43] whole response and sends it out at the

[44:45] end, that makes fastcgi_read_timeout

[44:50] almost the same as an absolute timeout.

[44:55] The execution time of the PHP script is

[44:58] limited by PHP's max_execution_time and

[45:03] by PHP-FPM's request_terminate_timeout.

[45:09] The max_execution_time directive measures CPU time, so

[45:13] the timer is paused during I/O operations

[45:16] like database queries. And when

[45:19] exceeded, it has a slightly more

[45:21] graceful termination.

[45:23] By contrast, request_terminate_timeout

[45:26] measures wall-clock time and it has a

[45:29] hard termination.

[45:31] So we'd align these two, but then we'd

[45:35] increase request_terminate_timeout to

[45:38] account for the peak expected I/O time.

[45:41] And then we'd also add a little margin

[45:44] to give max_execution_time a chance to

[45:47] terminate the script more gracefully.

[45:51] After returning the response in full, if

[45:54] the PHP script continues execution, as

[45:58] with the terminate method in middleware,

[46:01] this is included in max_execution_time

[46:05] and request_terminate_timeout, but not

[46:08] in Nginx's fastcgi_read_timeout,

[46:12] because once Nginx has received the

[46:14] response, it's already moved on to

[46:16] sending it to the client.
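The alignment rule can be sketched as a small Python helper. The peak CPU time, peak I/O time and margin would come from the per-request logging described in the video. The numbers here are placeholders, and rounding up to whole seconds is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ScriptTimeouts:
    max_execution_time: int         # PHP, seconds of script (CPU) time
    request_terminate_timeout: int  # PHP-FPM, seconds of wall-clock time


def align_timeouts(peak_cpu_s: float, peak_io_s: float,
                   margin_s: float) -> ScriptTimeouts:
    """Start both limits from the peak CPU time, then extend the wall-clock
    limit by the peak I/O time plus a margin, so the gentler
    max_execution_time normally fires before request_terminate_timeout."""
    cpu_limit = math.ceil(peak_cpu_s)
    return ScriptTimeouts(
        max_execution_time=cpu_limit,
        request_terminate_timeout=math.ceil(cpu_limit + peak_io_s + margin_s),
    )


print(align_timeouts(peak_cpu_s=4.2, peak_io_s=6.5, margin_s=2))
# ScriptTimeouts(max_execution_time=5, request_terminate_timeout=14)
```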

[46:19] So the script execution timeouts can

[46:21] extend rightwards beyond T5 in the

[46:25] diagram and maybe beyond the end of

[46:28] Nginx's fastcgi_read_timeout. Though

[46:32] in that situation, it's probably better

[46:34] to use queue workers instead.

[46:38] To help set the script execution

[46:40] timeouts, in Laravel we're logging

[46:43] wall-clock duration and CPU time duration for

[46:47] each request.
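The video logs both durations from Laravel. The same idea in Python, offered as an analogy rather than the project's code, uses time.perf_counter() for wall-clock time and time.process_time() for CPU time. The sleep stands in for I/O such as a database query. That wait counts towards wall-clock time but barely towards CPU time.

```python
import time


def measure(fn):
    """Return (result, wall_seconds, cpu_seconds) for one call of fn."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = fn()
    return result, time.perf_counter() - wall_start, time.process_time() - cpu_start


def fake_request():
    time.sleep(0.2)                           # stands in for I/O, e.g. a DB query
    return sum(i * i for i in range(10_000))  # a little CPU work


result, wall, cpu = measure(fake_request)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```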

[46:49] And finally, at the right of the

[46:51] diagram, send_timeout limits the

[46:55] intervals between write operations while

[46:58] sending the response to the client.

[47:02] Nginx has four embedded variables

[47:04] which we can log to help us with setting

[47:08] some of these timeouts.

[47:10] $request_time measures from the first

[47:13] byte received from the client to the

[47:16] last byte sent to the client.

[47:20] $upstream_connect_time measures the time

[47:23] to establish the FastCGI connection.

[47:27] That should hopefully always be zero.

[47:31] $upstream_header_time measures from the

[47:34] first byte sent to PHP-FPM until the

[47:38] first byte received in response by

[47:40] Nginx.

[47:43] And $upstream_response_time has the same

[47:46] start point but keeps measuring until

[47:49] the last byte of the response is

[47:51] received by Nginx. Those two will be

[47:54] the same unless the response is very

[47:56] large.
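Given the four values from one log line, a rough split of where the time went might look like this Python sketch. The phase names are illustrative labels. Treating $request_time minus $upstream_response_time as time spent with the client (receiving the request and sending the response) is an approximation.

```python
def timing_breakdown(request_time: float, upstream_connect_time: float,
                     upstream_header_time: float,
                     upstream_response_time: float) -> dict[str, float]:
    """Rough per-request phases in seconds. Nginx logs these with
    millisecond resolution, so results are rounded to 3 places."""
    return {
        "connect_to_php_fpm": round(upstream_connect_time, 3),
        "until_first_byte": round(upstream_header_time, 3),
        "rest_of_response": round(upstream_response_time - upstream_header_time, 3),
        "client_side": round(request_time - upstream_response_time, 3),
    }


print(timing_breakdown(0.250, 0.000, 0.180, 0.180))
```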

[48:00] In the Laravel documentation, the

[48:03] suggested Nginx configuration is

[48:05] really not great.

[48:08] As an example, let's say a user sends a

[48:12] request to our domain /hello.

[48:16] So this embedded variable, $uri,

[48:19] equals /hello.

[48:21] With this configuration, what we're

[48:23] asking Nginx to do is this.

[48:26] First, check for a file called hello and

[48:30] if it exists, serve that file to the

[48:33] client.

[48:34] So far so good. That could be a JS file

[48:36] or a CSS file.

[48:39] But if the hello file doesn't exist, check

[48:42] for a directory called hello.

[48:46] If the hello directory exists, check for an

[48:49] index file which above is defined as

[48:52] index.php.

[48:54] If that exists, serve it to the client.

[48:57] And already from a Laravel perspective,

[49:00] we are way off course.

[49:03] If the hello directory didn't contain

[49:06] index.php,

[49:08] then serve the directory listing, which

[49:11] very ancient internet users will

[49:13] remember, but these days directory

[49:15] listings are disabled by default. So,

[49:18] Nginx returns 403 Forbidden simply

[49:22] because the user requested a directory

[49:24] which does exist on the server.

[49:28] And if the request URI isn't a file and

[49:31] isn't a directory, finally we're

[49:33] directed to the Laravel application

[49:36] index.php.
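To make that walk-through concrete, here is a toy Python model of the decision order for `try_files $uri $uri/ /index.php?$query_string` combined with `index index.php`. It is a simplification, not Nginx itself. It ignores the separate PHP location, query strings and redirects, and the file and directory sets are hypothetical.

```python
def resolve(uri: str, files: set[str], dirs: set[str],
            autoindex: bool = False) -> str:
    """Toy model of: index index.php; try_files $uri $uri/ /index.php?$query_string;
    `files` and `dirs` are paths relative to the document root."""
    path = uri.strip("/")
    if path in files:                      # 1. $uri is a file
        return f"static file {path}"
    if path in dirs:                       # 2. $uri/ is a directory
        index = f"{path}/index.php"
        if index in files:
            return f"directory index {index}"
        return "directory listing" if autoindex else "403 Forbidden"
    return "Laravel public/index.php"      # 3. final fallback


examples = {
    "/js/app.js": resolve("/js/app.js", {"js/app.js"}, {"js"}),
    "/hello (empty dir)": resolve("/hello", set(), {"hello"}),
    "/api/v1/users": resolve("/api/v1/users", set(), set()),
}
for uri, outcome in examples.items():
    print(uri, "->", outcome)
```

For a pure Laravel API, only the final fallback branch does useful work.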

[49:38] But I don't understand why index.php

[49:40] here is so convoluted with variables.

[49:44] If it's not a static file, we always want

[49:46] to forward to public/index.php.

[49:50] So why not just hardcode it here and

[49:52] state it clearly?

[49:56] And these headers at the moment are

[49:58] functional, but the moment we put an

[50:01] add_header directive into a location block,

[50:04] that block no longer inherits add_header

[50:07] directives from outside. So it's safer

[50:10] to just put all add_header directives

[50:14] into location blocks and not rely on

[50:16] inheritance.
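The add_header documentation states the rule: directives are inherited from the previous level only if the current level defines none of its own. That all-or-nothing behaviour can be modelled in a few lines of Python. The header names and values are examples only.

```python
def effective_headers(parent: dict[str, str],
                      current: dict[str, str]) -> dict[str, str]:
    """add_header inheritance: all-or-nothing, never merged."""
    return dict(current) if current else dict(parent)


server_level = {"X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY"}
location_level = {"Cache-Control": "no-store"}

print(effective_headers(server_level, {}))              # inherits both headers
print(effective_headers(server_level, location_level))  # only Cache-Control
```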

[50:18] This whole configuration feels like a

[50:20] copy-paste job from a pre-Laravel PHP

[50:23] project. And although it works, it has

[50:27] needless inefficiencies and potential

[50:29] security flaws.

[50:33] For our Nginx configuration on port

[50:35] 8080,

[50:37] we want all responses to be JSON,

[50:40] including errors caught by Nginx.

[50:43] So for the error pages, we're using

[50:45] named locations that serve static JSON

[50:48] files saved into the image.

[50:51] Internal locations would also achieve

[50:53] the same result.

[50:56] Then, apart from the favicon and

[50:58] robots.txt, we're returning 404 for

[51:02] any request that doesn't start with

[51:05] /api/v1/.

[51:10] And if we take a look at the standard

[51:12] hacky requests that every server gets,

[51:15] this location block alone rejects, well,

[51:18] all of them. We definitely don't want any

[51:21] malicious requests like this to access

[51:23] any static files, and preferably we don't

[51:26] want to waste resources forwarding the

[51:28] request to Laravel just for it to return

[51:31] a 404 response.
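The allowlist logic reduces to something like this Python model. In Nginx it is expressed with location blocks. The exact favicon path and the sample scanner URIs are assumptions for illustration.

```python
def route(uri: str) -> str:
    """Toy model of the port-8080 server: a few exact static files,
    the versioned API prefix, and a JSON 404 for everything else."""
    if uri in ("/favicon.ico", "/robots.txt"):
        return "static file"
    if uri.startswith("/api/v1/"):
        return "Laravel (public/index.php)"
    return "404 (static JSON error page)"


scanner_probes = ["/wp-login.php", "/.env", "/phpmyadmin/", "/.git/config"]
for probe in scanner_probes:
    print(probe, "->", route(probe))
print("/api/v1/users", "->", route("/api/v1/users"))
```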

[51:35] All API requests get sent to Laravel and

[51:38] index.php is hardcoded for clarity.

[51:43] Finally, we're rate limiting by IP

[51:45] address here in Nginx and later by

[51:49] user ID in the Laravel application.

[51:54] Then for port 8081, for cluster-internal

[51:58] traffic, we've got keepalive times to

[52:00] sustain TCP connections with the Nginx

[52:03] exporter and the readiness check for one

[52:08] week and for two weeks respectively.

[52:12] The Nginx "up" path is a simple, Nginx-only

[52:15] endpoint for a liveness check, which

[52:18] we're currently not using.

[52:20] The Nginx status path is for the Nginx

[52:23] exporter to scrape Nginx metrics and

[52:26] in turn be scraped by Prometheus,

[52:29] and the rest are specific endpoints to

[52:31] send to Laravel.

[52:34] In the Vue.js Nginx instance, if a

[52:38] request URI ends in one of these file

[52:41] extensions, we check if the static file

[52:44] exists and, if not, return index.html,

[52:49] and the Vue.js router will show the 404

[52:52] page.

[52:55] And then for non-static asset URIs, go

[52:58] straight to index.html.
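In the same toy-model style, the Vue.js instance's behaviour might be sketched in Python as follows. The extension list is a hypothetical stand-in for whatever the real configuration matches.

```python
STATIC_EXTENSIONS = (".js", ".css", ".png", ".svg", ".ico", ".woff2")  # assumed list


def spa_resolve(uri: str, existing_files: set[str]) -> str:
    """Serve an existing static asset; otherwise fall back to index.html
    and let the client-side router decide (including its 404 page)."""
    if uri.endswith(STATIC_EXTENSIONS) and uri in existing_files:
        return uri
    return "/index.html"


files = {"/assets/app.js", "/assets/app.css"}
for uri in ("/assets/app.js", "/assets/missing.js", "/users/42"):
    print(uri, "->", spa_resolve(uri, files))
```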

[53:01] In the content security policy header,

[53:04] we need to specify the API domain for

[53:07] connect-src and form-action.

[53:12] For static assets, we can cache the

[53:14] results of the stat and open system

[53:17] calls (open_file_cache). And since we're using immutable

[53:20] containers, we can safely cache for a

[53:22] year or more. And then the static assets

[53:25] themselves will likely be held in memory

[53:27] by Linux's page cache, though that

[53:30] depends on other containers in the same

[53:32] node.

[53:37] Let's take a quick look at the Dockerfile

[53:38] for the Laravel image. Some

[53:42] dependencies are needed at build time

[53:44] for compiling PHP extensions and running

[53:47] composer install but not needed at

[53:50] runtime.

[53:52] So the overall design is to compile in a

[53:55] builder target and then copy the results

[53:58] into a fresh minimal target for

[54:01] production.

[54:03] To accommodate other targets which we'll

[54:05] discuss in a second, we have to split

[54:08] builder and build prod targets and split

[54:12] minimal base and prod targets with prod

[54:15] being the ultimate image to deploy in

[54:18] production.

[54:20] During the build phase, the composer

[54:22] install command only needs composer.json

[54:26] and composer.lock. But generating

[54:29] the autoloader obviously needs the

[54:31] entire codebase. So a very efficient

[54:34] Docker caching strategy is to copy in

[54:37] the .json and .lock files and run composer

[54:41] install with the --no-autoloader

[54:44] flag.

[54:46] Then that intensive process is cached

[54:49] until we change our composer

[54:51] dependencies.

[54:53] Then we can copy in the codebase and

[54:55] build the autoloader.

[54:57] If we didn't split those commands, we'd

[54:59] need to rebuild the vendor directory every

[55:02] time we modify a file.

[55:05] In the production image, for PHP-FPM to

[55:09] communicate with the Nginx

[55:11] container, both processes need read and

[55:14] write access to the socket file. So we

[55:17] make sure that the www-data user in this

[55:21] container and the nginx user in the

[55:24] Nginx container have the same UID

[55:28] and then set the file permissions

[55:30] accordingly in the PHP-FPM configuration,

[55:35] as mentioned in the Kubernetes section.

[55:37] We run all bootstrap caches except for

[55:40] config cache in the Dockerfile.

[55:44] And we're not using Laravel's built-in

[55:46] health check endpoint. But if we were,

[55:49] apparently that view isn't cached by

[55:52] php artisan view:cache. So we can access it

[55:56] once in the Dockerfile to force that

[55:59] view to be rendered so that we can make

[56:02] the cache directory read only in

[56:04] production.

[56:06] For file permissions, I've commented out

[56:09] some of my standard Laravel production

[56:11] setup because they don't apply in

[56:13] Kubernetes or for this particular

[56:15] project, but I like to keep them here

[56:17] just as a reminder or if we change the

[56:20] project or architecture later.

[56:24] For the local dev environment, the

[56:26] simplest option is to extend the builder

[56:28] target just before the composer install

[56:31] command is run. Then install development

[56:35] tools like Xdebug and create a directory

[56:38] for the output from Xdebug profiling,

[56:42] and then mount the local codebase into

[56:44] the container in Docker Compose with a

[56:47] bind mount.

[56:50] We want to run composer install and

[56:52] artisan commands inside this container

[56:55] to write files to our local device.

[56:59] So in the Makefile we can construct a

[57:01] command to enter the container with the

[57:04] www-data user and our local user's group

[57:09] ID (GID),

[57:10] and then set umask to make sure that

[57:13] any directories generated have 775 permissions

[57:17] and files 664, as in read and write for both user and

[57:20] group.

[57:21] That way the vendor directory and any

[57:24] other files created by this container

[57:26] are readable and writable by the www-data

[57:29] user in this container and our

[57:33] local user on the host.
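Here is a quick Python check of what a group-friendly umask does. It assumes the value is 0o002, the mask that yields 775 directories. New regular files are normally requested with mode 666 rather than 777, so they come out as 664. That still gives read and write to both user and group. Results are for a typical Linux filesystem.

```python
import os
import stat
import tempfile


def modes_with_umask(mask: int) -> tuple[str, str]:
    """Create a file and a directory under `mask`; return their permission bits."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as root:
            file_path = os.path.join(root, "example.txt")
            dir_path = os.path.join(root, "example_dir")
            with open(file_path, "w"):
                pass                 # open() requests mode 0o666
            os.mkdir(dir_path)       # mkdir() requests mode 0o777
            return (oct(stat.S_IMODE(os.stat(file_path).st_mode)),
                    oct(stat.S_IMODE(os.stat(dir_path).st_mode)))
    finally:
        os.umask(old)                # restore the previous umask


print(modes_with_umask(0o002))  # ('0o664', '0o775')
```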

[57:37] I've also got a combined target for

[57:39] testing the architectural option of PHP-FPM

[57:43] and Nginx in a single container

[57:46] which we discussed earlier.

[57:50] The only complaint with this dev image

[57:52] is that it has different dependencies to

[57:54] the production image. So potentially

[57:57] some tests could be passing in this

[57:59] image but be failing for production.

[58:02] This is a trade-off I'm happy with. But

[58:05] an alternative, more complex approach

[58:07] would be to run composer install with

[58:10] testing dependencies included.

[58:13] Then copy the resultant vendor directory

[58:16] into a target that splits from the

[58:18] production image just before the

[58:20] codebase is copied in and then mount the

[58:24] local code base with a bind mount. And

[58:27] similarly, for the dev environment, split

[58:30] off from minimal base and install dev

[58:33] tools like Xdebug.

[58:37] The local development environment is

[58:39] handled by Docker Compose. There's not

[58:42] too much to note here. Nginx depends

[58:45] on the Laravel container, and service_started

[58:48] is enough because we just need

[58:50] the socket file to exist before Nginx

[58:53] starts.

[58:55] Ideally, Laravel should depend on

[58:57] Postgres and Redis passing health

[58:59] checks. Postgres has a handy pg_isready

[59:04] command and Redis has PING. But I

[59:08] commented out the dependencies since

[59:10] very often I was just testing Nginx

[59:12] and PHP only and didn't want to wait an

[59:15] extra 2 seconds to get running.

[59:18] There are two instances of Postgres,

[59:21] one for the dev environment and one for

[59:22] testing. It's most common to run tests

[59:26] with an in-memory instance of SQLite

[59:29] as the database. But as we'll see later,

[59:32] our application is unfortunately tightly

[59:35] coupled to Postgres. So we need to run

[59:37] the tests with Postgres to have any

[59:40] confidence that the results represent

[59:42] the application in production.

[59:46] There's a pre-commit script to process

[59:48] the code before committing to the Git

[59:50] repository. For Vue.js, lint-staged

[59:55] handles things pretty well. We're

[59:58] running Prettier, ESLint, and Vitest on only

[1:00:02] staged files that have changed since the

[1:00:04] last commit, and running vue-tsc on the

[1:00:08] whole project.

[1:00:10] In a similar way, in the bash script,

[1:00:13] we've got a function that returns an

[1:00:15] array of the staged files that have

[1:00:18] changed since the last commit, so that

[1:00:20] we're not wasting time and resources

[1:00:22] checking the whole codebase on each

[1:00:24] commit.

[1:00:26] For scripts, we feed that array into

[1:00:28] ShellCheck. And for PHP, we're

[1:00:32] validating the composer.json and

[1:00:35] composer.lock files, then feeding the changed

[1:00:38] files array into PHPStan at level 9 and

[1:00:42] Pint, and then running git add to

[1:00:45] restage any files that were modified.

[1:00:48] Each makes sure to reference the correct

[1:00:50] configuration file. And we have a

[1:00:53] stricter Pint configuration for non-tests

[1:00:56] than for tests.

[1:00:59] And the last step is to run PHPUnit. It

[1:01:03] executes a script mounted in the Laravel

[1:01:05] container which accepts arguments for

[1:01:09] whether or not to first run database

[1:01:11] migrations,

[1:01:12] the test coverage threshold, test suites

[1:01:16] to include or exclude, and which tests

[1:01:19] specifically to run.

[1:01:23] Since I'm not working as part of a team,

[1:01:25] a Makefile is sufficient for a CI/CD

[1:01:28] pipeline for testing, building, and

[1:01:30] deploying.

[1:01:32] Exec Laravel executes a command in the

[1:01:35] Laravel container.

[1:01:38] Very often we're creating or editing

[1:01:40] files on the host machine via a bind

[1:01:42] mount. So we enter the container with

[1:01:45] both the www-data user and the local

[1:01:50] user's group so we can function in both

[1:01:52] worlds. And we're setting umask so

[1:01:55] that any directories created get 775

[1:01:58] permissions (and files 664), so that both the container's

[1:02:01] user and the local user can read and

[1:02:03] write the files created.

[1:02:06] Shell Laravel executes an interactive

[1:02:09] shell in the container.

[1:02:12] The composer commands have the same

[1:02:14] function and are just time savers to do

[1:02:16] specific composer functions without

[1:02:19] opening an interactive terminal.

[1:02:21] And then there are similar exec and

[1:02:23] shell commands for each container.

[1:02:27] If we want to run PHPStan, Pint, or PHPUnit

[1:02:31] outside of the pre-commit check,

[1:02:34] those commands are here. For testing,

[1:02:37] there are commands for the standard test

[1:02:39] suite and for end-to-end tests for

[1:02:42] testing a deployment.

[1:02:45] Then there are commands for building the

[1:02:46] images and for deploying them in

[1:02:49] Kubernetes.

[1:02:51] We can switch contexts between the local

[1:02:53] kind cluster and my physical cluster.

[1:02:59] For this API, I want every response to

[1:03:02] fit within a small set of predictable

[1:03:04] JSON structures.

[1:03:07] The top level of the response should

[1:03:09] always be an object, not an array, to

[1:03:11] avoid JSON array hijacking.

[1:03:14] And for every response object, we attach

[1:03:17] an object called meta, which at least

[1:03:20] includes a request ID to help clients

[1:03:23] communicate problems with us and help

[1:03:25] with debugging.

[1:03:27] I'm also including the timestamp and

[1:03:29] script duration for now, but these don't

[1:03:32] have any practical purpose at the

[1:03:33] moment.

[1:03:36] For a query that returns a single

[1:03:38] resource, the resource is the value of a

[1:03:41] key called data.

[1:03:44] For a query that returns multiple

[1:03:46] resources, data is an array of results.

[1:03:51] And we add an object called pagination

[1:03:54] to help the client navigate through the

[1:03:56] data set.

[1:03:58] For a successful query with no resource

[1:04:00] to return, data just contains "result":

[1:04:04] "success".

[1:04:06] And finally, if at least one error

[1:04:08] occurs, data is replaced by an array

[1:04:11] called errors,

[1:04:13] which contains error objects that each

[1:04:15] have an integer code and a string

[1:04:18] message.

[1:04:20] These error codes help the client to

[1:04:22] respond programmatically to a problem

[1:04:24] without needing to parse the message

[1:04:26] text.
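Here is a minimal Python sketch of those four shapes. The project itself builds them in PHP. The meta key name request_id, the pagination fields and the example error code are assumptions for illustration.

```python
import json
import uuid
from typing import Optional


def _meta(request_id: Optional[str]) -> dict:
    # Every response carries meta with at least a request ID.
    return {"request_id": request_id or str(uuid.uuid4())}


def success(resource: dict, request_id: Optional[str] = None) -> dict:
    return {"data": resource, "meta": _meta(request_id)}


def paginated(items: list, pagination: dict,
              request_id: Optional[str] = None) -> dict:
    return {"data": items, "pagination": pagination, "meta": _meta(request_id)}


def ok(request_id: Optional[str] = None) -> dict:
    # Successful operation with no resource to return.
    return {"data": {"result": "success"}, "meta": _meta(request_id)}


def errors(error_list: list[dict], request_id: Optional[str] = None) -> dict:
    # "errors" replaces "data"; each item has an integer code and a message.
    return {"errors": error_list, "meta": _meta(request_id)}


print(json.dumps(errors([{"code": 2001, "message": "The email field is required."}],
                        request_id="req-123")))
```

Every builder returns a top-level object, never a bare array, which matches the JSON array hijacking concern above.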

[1:04:28] The error itself is a data transfer

[1:04:31] object called API error and the code is

[1:04:35] an integer enum called API error code.

[1:04:41] API error code has a method called

[1:04:44] message which accepts an array of

[1:04:46] placeholders if necessary and returns an

[1:04:50] appropriate message for each error code.

[1:04:53] And the API error constructor calls this

[1:04:57] message method.

[1:05:01] So API error code is a very convenient

[1:05:04] single location to plan and construct a

[1:05:07] list of all possible errors, pair them

[1:05:10] up with a suitable message, and a

[1:05:12] reference point for the placeholders

[1:05:14] that we need to pass to the API error

[1:05:17] constructor.
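Sketched in Python rather than PHP, the enum-plus-DTO pattern might look like this. The codes, the message templates and the {field} placeholder syntax are all hypothetical.

```python
from enum import IntEnum
from typing import Optional


class ApiErrorCode(IntEnum):
    # Hypothetical codes; the project's real list isn't shown here.
    UNKNOWN = 1000
    VALIDATION = 2000
    VALIDATION_REQUIRED = 2001
    VALIDATION_UNIQUE = 2002

    def message(self, placeholders: Optional[dict[str, str]] = None) -> str:
        """Return the message template for this code, filled with placeholders."""
        templates = {
            ApiErrorCode.UNKNOWN: "An unknown error occurred.",
            ApiErrorCode.VALIDATION: "The {field} field is invalid.",
            ApiErrorCode.VALIDATION_REQUIRED: "The {field} field is required.",
            ApiErrorCode.VALIDATION_UNIQUE: "The {field} has already been taken.",
        }
        return templates[self].format(**(placeholders or {}))


class ApiError:
    """DTO whose constructor derives the message from the code."""

    def __init__(self, code: ApiErrorCode,
                 placeholders: Optional[dict[str, str]] = None):
        self.code = int(code)
        self.message = code.message(placeholders)

    def to_dict(self) -> dict:
        return {"code": self.code, "message": self.message}


print(ApiError(ApiErrorCode.VALIDATION_REQUIRED, {"field": "email"}).to_dict())
```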

[1:05:19] The aim is to be as specific as possible

[1:05:22] with error codes, but also to have more

[1:05:24] general error codes to fall back on. For

[1:05:28] example, we have specific error codes

[1:05:30] for each type of input validation

[1:05:33] employed in our project. But if

[1:05:35] something goes wrong, there's a general

[1:05:37] validation error code. So if we use a

[1:05:41] new type of input validation in a

[1:05:43] controller or form request, but we don't

[1:05:46] account for it in our validation error

[1:05:48] handler, we can still return a

[1:05:50] semi-specific error response. And in the

[1:05:53] worst case scenario, we can fall back on

[1:05:55] the unknown error code.

[1:05:58] Of course, when such general errors

[1:06:00] happen, we need to analyze the logs and

[1:06:03] construct more insightful error codes.

[1:06:05] As a result,

[1:06:08] the JSON response structures mentioned

[1:06:10] each have a method in the class called

[1:06:13] API response builder.

[1:06:16] success, paginated,

[1:06:19] and errors.

[1:06:21] And as well as the errors method,

[1:06:23] there's also an error method because

[1:06:26] returning a single error is the most

[1:06:28] common scenario and it's easy to forget

[1:06:30] to enclose it in an array.

[1:06:35] Since we want to always send JSON

[1:06:37] responses with descriptive error codes,

[1:06:40] we want to replace Laravel's default

[1:06:43] exception handling behavior in Laravel

[1:06:46] 11 onwards. That's done in

[1:06:48] bootstrap/app.php.

[1:06:53] But since it's likely to get quite

[1:06:54] sizable, I've extracted it to a class

[1:06:56] called API exception handler.

[1:07:00] In simple cases, we can just define an

[1:07:02] error response with API response

[1:07:05] builder.

[1:07:07] In some cases, there's a small amount of

[1:07:09] processing to add more detail to the

[1:07:11] error response.

[1:07:13] And there's a catch all default for any

[1:07:16] exception we're not handling

[1:07:17] specifically.

[1:07:20] For input validation errors, I've

[1:07:22] extracted the logic to validation errors

[1:07:25] builder, which returns an array of

[1:07:28] API error data transfer objects to be

[1:07:32] fed into the API response builder's errors

[1:07:35] method.

[1:07:37] In validation errors builder, we loop

[1:07:40] through the errors returned by the

[1:07:42] validator and match each with the

[1:07:45] correct API error code.

[1:07:49] As mentioned throughout this project,

[1:07:51] whenever there's a scenario that we

[1:07:53] don't expect to happen, like if we fail

[1:07:55] to match the validation error, we log

[1:07:59] the details and fall back on a more

[1:08:01] general API error code.
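
Here is a minimal Python sketch of the validation-errors builder loop. It assumes the validator reports failures as `{field: [rule names]}`, similar in spirit to Laravel's `Validator::failed()`, which is keyed by field and rule. The rule-to-code table and the `VALIDATION_FAILED` fallback are hypothetical.

```python
import logging

logger = logging.getLogger("api.validation")

# Hypothetical mapping from a failed validation rule to an API error code.
RULE_CODES = {
    "required": "FIELD_REQUIRED",
    "email": "INVALID_EMAIL",
    "max": "VALUE_TOO_LONG",
}
FALLBACK_CODE = "VALIDATION_FAILED"


def build_validation_errors(failed: dict[str, list[str]]) -> list[dict]:
    """Turn {field: [failed rule names]} into a list of API error dicts."""
    errors = []
    for field, rules in failed.items():
        for rule in rules:
            code = RULE_CODES.get(rule.lower())
            if code is None:
                # Unexpected rule: log it so a specific code can be added later.
                logger.warning("No API error code for rule %r on field %r", rule, field)
                code = FALLBACK_CODE
            errors.append({"code": code, "field": field})
    return errors
```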

[1:08:05] For the database, I have maybe something

[1:08:08] of a controversial feature which I'll

[1:08:11] have to explain carefully.

[1:08:14] When we need to write to the database

[1:08:16] and one of the columns has a unique

[1:08:18] constraint or a foreign key constraint,

[1:08:22] the standard practice is to first

[1:08:24] validate the value with a select query

[1:08:28] and if the check passes, then

[1:08:30] continue with the insert or update

[1:08:33] operation.

[1:08:35] So that's one database query for the

[1:08:37] failure scenario and two queries for the

[1:08:40] success scenario. But

[1:08:43] with a lot of caveats, we can

[1:08:46] potentially skip the validation in

[1:08:48] Laravel, write the value directly to the

[1:08:51] database and if a constraint is

[1:08:54] violated, handle the error returned by

[1:08:56] the database and inform the user of the

[1:08:59] input validation error.

[1:09:02] That means only one database query for

[1:09:05] both success and failure scenarios which

[1:09:09] reduces the load on the database and

[1:09:11] speeds up responses from the client's

[1:09:13] perspective.

[1:09:15] It also removes the race condition

[1:09:17] between the select query for the

[1:09:19] validation and the eventual write to

[1:09:22] the database.
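
To make the query-count argument concrete, here is a small Python sketch. It uses the standard library's `sqlite3` as a stand-in for Postgres, with a hypothetical `users` table that has a unique `email` column. `CountingConnection` is an illustrative wrapper that counts the statements each approach sends.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts the statements sent through it."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def create_user_check_first(db: CountingConnection, email: str) -> bool:
    # Standard approach: SELECT to validate, then INSERT. Another request
    # could insert the same email between these two statements.
    if db.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone():
        return False
    db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return True


def create_user_write_first(db: CountingConnection, email: str) -> bool:
    # Alternative: write directly and treat the constraint violation as the validation failure.
    try:
        db.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def make_db() -> CountingConnection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    return CountingConnection(conn)
```

Over one successful insert and one duplicate, the check-first approach sends three statements and the write-first approach sends two. The write-first version also leaves no gap between the check and the write, because the unique index *is* the check.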

[1:09:24] So now for the downsides. One, it splits

[1:09:28] the validation logic in two, which is

[1:09:31] messy. I kept the constraints in the

[1:09:34] form request as comments, as reminders

[1:09:37] of what will be validated by the

[1:09:39] database.

[1:09:41] Two, it's not great for the developer

[1:09:44] experience. We just have to remember to

[1:09:47] not validate constraints in form

[1:09:49] requests and to employ our new strategy

[1:09:52] each time we want to insert or update.

[1:09:57] Three, the database treats the

[1:09:59] constraint violation as an error, not as

[1:10:02] a simple validation check, and it logs

[1:10:05] it as such. So, we'd need a log

[1:10:08] filtration system in production.

[1:10:12] Four, the database returns the error as

[1:10:15] a string which we have to parse

[1:10:18] and that is a somewhat fragile process.

[1:10:21] We need to run rigorous tests with many

[1:10:23] edge cases every time we change database

[1:10:26] version.

[1:10:28] And five, error responses vary per

[1:10:31] vendor. So we're tightly coupling our

[1:10:34] application with our initial choice of

[1:10:36] database vendor, in this case Postgres.

[1:10:41] These are all very serious cons and

[1:10:44] nothing else about this project is

[1:10:46] geared towards the high concurrency

[1:10:49] situation that would give value to the

[1:10:51] pros. So for a real project, I would

[1:10:54] almost certainly not implement this

[1:10:56] feature, but I wanted to explore it as

[1:11:00] an educational exercise. And it's a

[1:11:02] healthy exercise to predict the cons in

[1:11:04] advance, try it, and run head-on into

[1:11:08] any unexpected cons, and then get better

[1:11:10] at analysis of design choices in the

[1:11:13] future.

[1:11:15] So far, with Postgres, with unique and

[1:11:18] foreign key constraints, I haven't hit

[1:11:21] any critical problems from parsing the

[1:11:23] error message.

[1:11:25] It returns a distinct SQLSTATE code that

[1:11:29] identifies which kind of constraint was violated

[1:11:33] and the offending column is bounded by

[1:11:36] characters that are invalid for a column

[1:11:38] name and preceded by a substantial fixed

[1:11:42] string.

[1:11:44] Postgres does actually allow otherwise illegal

[1:11:46] characters in a column name if it's

[1:11:48] enclosed in double quotes. So we'd have

[1:11:51] to check for that.

[1:11:53] Another nuisance is that Laravel

[1:11:55] interpolates the actual values into the

[1:11:58] Postgres error message. And for

[1:12:01] security, we don't want potentially

[1:12:03] sensitive values feeding into our parsing

[1:12:05] logic, especially for a fragile process

[1:12:08] like this that has a lot of logging.
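
Here is a minimal Python sketch of the parsing step. It assumes the message contains Postgres's usual `DETAIL:  Key (column)=(value) ...` line and that the SQLSTATE is available separately. `23505` (unique_violation) and `23503` (foreign_key_violation) are Postgres's documented codes. The function name and the deliberately strict identifier pattern are assumptions. Quoted identifiers and multi-column keys don't match, so they fall through to the generic error. Only the column name is returned, never the offending value.

```python
import re
from typing import Optional, Tuple

# SQLSTATE codes defined by PostgreSQL (class 23, integrity constraint violation).
CONSTRAINT_KINDS = {"23505": "unique", "23503": "foreign_key"}

# Accept only a single unquoted, lowercase identifier. Anything else
# (quoted names, multi-column keys) is left to a generic fallback.
KEY_COLUMN = re.compile(r"Key \(([a-z_][a-z0-9_]*)\)=\(")


def parse_violation(sqlstate: str, message: str) -> Optional[Tuple[str, str]]:
    kind = CONSTRAINT_KINDS.get(sqlstate)
    if kind is None:
        return None
    match = KEY_COLUMN.search(message)
    if match is None:
        return None
    # Return only the constraint kind and the column; the value stays in the message.
    return kind, match.group(1)
```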

[1:12:13] So it works for Postgres, but if for

[1:12:15] some reason we change database vendor

[1:12:18] we'd need to rewrite the parsing logic

[1:12:20] and there's no guarantee that the error

[1:12:22] message provides the required detail and

[1:12:25] format for us to parse.

[1:12:28] Here's the comparable message from

[1:12:30] SQLite.

[1:12:32] The SQLSTATE code is more generic than

[1:12:34] Postgres's. So we'd have to parse the

[1:12:37] text to discover even which constraint

[1:12:39] was violated.

[1:12:42] There's also a small possibility that a

[1:12:44] new version of Postgres will change the

[1:12:47] error message in a way that hurts this

[1:12:49] feature.

[1:12:52] I implemented this feature with a trait

[1:12:54] called handles DB errors.

[1:12:57] For any code that inserts or updates a

[1:13:00] database record, we wrap it in a closure

[1:13:04] and pass it to the handle DB errors method, and

[1:13:07] the closure is executed inside a try/catch

[1:13:10] block looking for a QueryException.

[1:13:15] This wrapping design causes very minimal

[1:13:18] disruption and there's a low development

[1:13:20] cost to enabling or disabling the error

[1:13:23] handling feature.

[1:13:25] The design is very reusable. Just add

[1:13:28] the trait to any class that writes to

[1:13:30] the database. And compared to a rigid method

[1:13:33] signature, the closure gives us complete

[1:13:36] flexibility around what variables to

[1:13:38] pass, what type to return,

[1:13:42] which interface to use to interact with

[1:13:44] the database, how many queries to run,

[1:13:48] what parts of the code to wrap in a

[1:13:50] database transaction,

[1:13:52] and what other actions we need to run

[1:13:54] alongside the queries.
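
Here is a rough Python analogue of the handles DB errors trait: a mixin whose method runs a caller-supplied closure inside a try/except. It uses `sqlite3.IntegrityError` in place of Laravel's `QueryException`. `ApiValidationError` and `UserRepository` are hypothetical names. The closure owns the queries and the transaction, and the wrapper only translates constraint violations.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiValidationError(Exception):
    """Hypothetical: an exception the API's exception handler would render as a 422."""


class HandlesDbErrors:
    """Mixin: wrap any write in a closure and pass it to handle_db_errors."""

    def handle_db_errors(self, write: Callable[[], T]) -> T:
        try:
            return write()
        except sqlite3.IntegrityError as exc:
            # A real implementation would parse the message to find the column;
            # this sketch only reports that a constraint was violated.
            raise ApiValidationError("CONSTRAINT_VIOLATION") from exc


class UserRepository(HandlesDbErrors):
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def create(self, email: str) -> int:
        def write() -> int:
            # The closure decides how many queries to run and what to wrap in a transaction.
            with self.conn:
                cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
                return cur.lastrowid

        return self.handle_db_errors(write)
```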

[1:13:57] Currently, if a constraint violation is

[1:14:00] detected, we're immediately returning a

[1:14:02] validation error response to the client.

[1:14:06] We could consider making this more

[1:14:07] flexible, like allowing more closures to

[1:14:10] be passed to handle specific error

[1:14:12] scenarios. For example, we might want to

[1:14:16] alter our reaction based on which column

[1:14:18] violated the constraint.

[1:14:27] Also regarding the database, we're

[1:14:30] implementing a rule of no SELECT *

[1:14:33] queries, that is, no queries that return all

[1:14:40] exposing sensitive data and to make

[1:14:42] queries faster to run.

[1:14:45] So for any resource that will be output

[1:14:48] by the API, we're defining a resource

[1:14:50] class where we define how the raw output

[1:14:54] is processed into the API output.

[1:14:58] In this case, we're just renaming uuid

[1:15:01] to id.

[1:15:04] And we're defining the columns to be

[1:15:06] injected into the select query to fetch

[1:15:08] only the relevant columns.

[1:15:11] For the naming scheme, we have the model

[1:15:14] name, Item, then public or private,

[1:15:18] explicitly signaling whether this resource will

[1:15:20] be output by the API or not. For

[1:15:23] example, the UUID is for public usage

[1:15:27] while the incrementing integer ID is

[1:15:29] strictly for internal usage.

[1:15:32] And we have full or minimal to denote

[1:15:35] which columns to include.

[1:15:38] Paginated results would typically use

[1:15:40] minimal while the results of a single

[1:15:43] specified resource would typically use

[1:15:46] full and include more columns.

[1:15:49] Then in the model class, it imports the

[1:15:51] columns constant to run the relevant

[1:15:54] queries.
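
Here is a small Python sketch of the column-constant naming scheme and the resource transform. The constant names, the `items` table, and its columns are hypothetical. The idea is that the select list is always explicit, and the public output exposes `uuid` as `id`, never the integer key.

```python
from typing import Any

# Hypothetical constants following <MODEL>_<PUBLIC|PRIVATE>_<FULL|MINIMAL>_COLUMNS.
ITEM_PUBLIC_MINIMAL_COLUMNS = ("uuid", "name")
ITEM_PUBLIC_FULL_COLUMNS = ("uuid", "name", "description", "created_at")
ITEM_PRIVATE_FULL_COLUMNS = ("id", "user_id", *ITEM_PUBLIC_FULL_COLUMNS)


def select_sql(table: str, columns: tuple[str, ...]) -> str:
    # Explicit column list instead of SELECT *. Identifiers come from
    # constants, never from user input.
    return f"SELECT {', '.join(columns)} FROM {table}"


def to_public_item(row: dict[str, Any]) -> dict[str, Any]:
    """Resource transform: rename uuid to id, keeping the column order."""
    return {("id" if key == "uuid" else key): value for key, value in row.items()}


print(select_sql("items", ITEM_PUBLIC_MINIMAL_COLUMNS))  # SELECT uuid, name FROM items
```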

[1:15:59] Let's run through the life cycle of a

[1:16:01] standard request. The first thing we do

[1:16:04] is create an instance of request context

[1:16:07] service which will hold auxiliary

[1:16:10] information about the request.

[1:16:12] We define it as a singleton so that

[1:16:15] anytime it's referenced in a method

[1:16:17] signature throughout the application,

[1:16:19] the service container will inject the

[1:16:21] same instance with the same properties

[1:16:24] similar to how request itself is

[1:16:26] handled.

[1:16:28] Immediately we store the current time so

[1:16:31] that we can calculate the wall clock

[1:16:33] duration at the end of the script

[1:16:35] execution and we store the current

[1:16:38] resource usage of the PHP FPM worker

[1:16:42] process in order to calculate the CPU

[1:16:44] time duration at the end of the script.

[1:16:48] There's an empty array to store the

[1:16:49] duration of any database queries which

[1:16:52] will also be logged at the end. And the

[1:16:55] request ID set way back in the ingress

[1:16:58] controller is saved here and added to

[1:17:01] the context of any logs that will be

[1:17:02] written.
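
Here is a minimal Python sketch of a singleton-bound request context. `Container` is a tiny hypothetical stand-in for Laravel's service container. `time.process_time()`, which returns the user plus system CPU time of the current process, stands in for PHP's `getrusage()`. Resolving the class twice returns the same object, so anything recorded on it, such as query durations, is visible everywhere.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RequestContext:
    """Hypothetical per-request holder for auxiliary request information."""
    request_id: str
    start_wall: float = field(default_factory=time.perf_counter)
    start_cpu: float = field(default_factory=time.process_time)
    query_durations_ms: list[float] = field(default_factory=list)

    def record_query(self, duration_ms: float) -> None:
        # Called from a database query listener.
        self.query_durations_ms.append(duration_ms)

    def get_duration_ms(self) -> float:
        return (time.perf_counter() - self.start_wall) * 1000

    def get_cpu_time_ms(self) -> float:
        return (time.process_time() - self.start_cpu) * 1000


class Container:
    """Tiny stand-in for a service container that supports singleton bindings."""

    def __init__(self) -> None:
        self._factories: dict[type, Callable[[], Any]] = {}
        self._instances: dict[type, Any] = {}

    def singleton(self, cls: type, factory: Callable[[], Any]) -> None:
        self._factories[cls] = factory

    def make(self, cls: type) -> Any:
        # Build on first use, then hand back the same instance every time.
        if cls not in self._instances:
            self._instances[cls] = self._factories[cls]()
        return self._instances[cls]


container = Container()
container.singleton(RequestContext, lambda: RequestContext(request_id="req-123"))
ctx = container.make(RequestContext)
```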

[1:17:05] The logging middleware will call get

[1:17:07] duration milliseconds which is pretty

[1:17:10] simple and also get CPU time

[1:17:14] milliseconds which requires some

[1:17:16] explanation.

[1:17:20] ru_utime.tv_sec

[1:17:22] is the number of whole

[1:17:26] seconds the PHP FPM worker process has

[1:17:30] spent in user mode, like the PHP

[1:17:33] interpreter doing work.

[1:17:36] The same key with usec, or microseconds,

[1:17:40] instead of sec is the number of

[1:17:42] microseconds towards the next whole

[1:17:45] second in user mode. So it's always below

[1:17:49] 1 million.

[1:17:51] ru_stime.tv_sec

[1:17:55] is the number of whole seconds the

[1:17:58] PHP FPM worker process has spent in

[1:18:01] kernel mode. So that's system calls,

[1:18:05] memory mapping, network stack

[1:18:07] processing, etc.

[1:18:10] And the same key with usec instead of sec

[1:18:13] is the number of microseconds towards the

[1:18:15] next whole second in kernel mode.

[1:18:19] The most intuitive mathematical approach

[1:18:21] to calculating the CPU time duration of

[1:18:25] the script is to combine the seconds

[1:18:28] with microseconds

[1:18:30] and sum the user time and kernel time

[1:18:34] and then subtract the start time from

[1:18:36] the end time just like we do for the

[1:18:39] wall clock duration.
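
Here is a sketch of the intuitive formula in Python. It takes dictionaries keyed like the array PHP's `getrusage()` returns (`ru_utime.tv_sec`, `ru_utime.tv_usec`, `ru_stime.tv_sec`, `ru_stime.tv_usec`). The function names are hypothetical.

```python
RUsage = dict[str, int]


def total_cpu_seconds(ru: RUsage) -> float:
    """User plus kernel CPU time, in seconds, from PHP-style getrusage() keys."""
    user = ru["ru_utime.tv_sec"] + ru["ru_utime.tv_usec"] / 1_000_000
    kernel = ru["ru_stime.tv_sec"] + ru["ru_stime.tv_usec"] / 1_000_000
    return user + kernel


def cpu_time_ms(start: RUsage, end: RUsage) -> float:
    # The intuitive formula: total at the end minus total at the start.
    return (total_cpu_seconds(end) - total_cpu_seconds(start)) * 1000


start = {"ru_utime.tv_sec": 10, "ru_utime.tv_usec": 900_000, "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 500_000}
end = {"ru_utime.tv_sec": 11, "ru_utime.tv_usec": 100_000, "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 550_000}
print(round(cpu_time_ms(start, end), 3))  # 250.0
```

Going from 10.9 s of user time and 2.5 s of kernel time to 11.1 s and 2.55 s gives 250 ms of CPU time for the request.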

[1:18:41] I was worried that floating-point

[1:18:43] precision might be a problem here.

[1:18:46] Double-precision floats hold 15 to 17

[1:18:49] significant digits. So adding

[1:18:51] everything up to a large total before

[1:18:54] the final subtraction could cause some

[1:18:57] precision to be lost.

[1:19:00] And since the difference between the

[1:19:01] start and end times will generally be

[1:19:03] tiny, that loss of resolution could mean

[1:19:06] we get a result of zero once the PHP FPM

[1:19:10] worker passes a certain age.

[1:19:14] So maybe it would be better to have a

[1:19:16] mathematically equivalent but less

[1:19:18] intuitive formula that subtracts large

[1:19:21] numbers from large numbers and sums the

[1:19:24] results at the end.

[1:19:27] But I tried some rough calculations and

[1:19:30] actually the loss of resolution happens

[1:19:32] sometime after 250 years. So yes, the

[1:19:36] intuitive formula will do fine.
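
Here is a quick numerical check of the 250-year estimate, sketched in Python. `math.ulp(x)` is the gap between `x` and the next larger double. The loop finds the first power of two, in seconds of accumulated CPU time, where that gap exceeds one microsecond.

Doubles in the range [2^32, 2^33) seconds are still spaced about 0.95 µs apart. So microsecond resolution survives until 2^33 seconds, which is roughly 272 years. A per-request difference would only round to zero far later than that.

```python
import math

MICROSECOND = 1e-6
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def first_lossy_total(resolution: float = MICROSECOND) -> float:
    """Smallest power of two (in seconds) where adjacent doubles are more than `resolution` apart."""
    total = 1.0
    while math.ulp(total) <= resolution:
        total *= 2
    return total


limit = first_lossy_total()
print(limit, limit / SECONDS_PER_YEAR)  # about 8.59e9 seconds, roughly 272 years
```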

[1:19:41] The CPU time duration will inform our

[1:19:43] decision on setting the PHP directive

[1:19:46] max_execution_time, and the wall clock

[1:19:48] duration helps with the PHP FPM

[1:19:51] directive request_terminate_timeout.

[1:19:56] The last two methods are to log the size

[1:19:59] of the response headers and the body.

[1:20:04] The only thing to note here is that the

[1:20:06] headers are ASCII only. So we can always

[1:20:09] safely use strlen, which is faster than

[1:20:13] the multibyte equivalent mb_strlen.

[1:20:19] The body is UTF-8,

[1:20:21] so potentially non-ASCII.

[1:20:24] So we can only use strlen as long as in

[1:20:28] php.ini

[1:20:30] we set mbstring.func_overload

[1:20:33] to zero.

[1:20:36] Otherwise, we'd have to use

[1:20:40] mb_strlen and specify the '8bit' encoding to be

[1:20:44] sure we're getting the number of bytes

[1:20:46] instead of the number of characters.
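
Here is a small Python sketch of the byte-versus-character distinction. Python's `len()` on a `str` counts characters, like `mb_strlen`, so byte counts come from encoding first. The layout of one `Name: value` line plus CRLF per header is an assumption about how the size is measured. On the PHP side, `mbstring.func_overload` was deprecated in PHP 7.2 and removed in PHP 8.0, so on PHP 8 `strlen` always counts bytes.

```python
def header_bytes(headers: dict[str, str]) -> int:
    """Size of the header block, assuming one 'Name: value' line plus CRLF per header."""
    # encode("ascii") raises if a header is ever non-ASCII, so this len() is a byte count.
    return sum(len(f"{name}: {value}\r\n".encode("ascii")) for name, value in headers.items())


def body_bytes(body: str) -> int:
    """Size of a UTF-8 body in bytes, not characters."""
    return len(body.encode("utf-8"))


print(len("café"), body_bytes("café"))  # 4 characters, 5 bytes
```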

[1:20:50] Then we add a listener for the database

[1:20:53] to add the query time to the query

[1:20:55] duration array we just created.

[1:20:59] And next we configure the rate limiter,

[1:21:03] although it isn't actually applied at

[1:21:05] this point in the request.

[1:21:07] Nginx rate limits by IP and

[1:21:10] intercepts those requests before they

[1:21:12] reach Laravel. So in Laravel, we're

[1:21:15] limiting by user ID and that needs to

[1:21:18] happen after authentication.

[1:21:21] For us, it happens later in

[1:21:23] routes/api.php,

[1:21:25] straight after the authentication

[1:21:28] middleware.
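
Here is a minimal sketch of per-user limiting in Python, using a simple fixed window per user ID and an injectable clock for testing. In Laravel itself this is usually configured with `RateLimiter::for(...)` returning something like `Limit::perMinute(60)->by($request->user()->id)`. The class below is a hypothetical illustration of the idea, not Laravel's implementation.

```python
import time
from typing import Callable


class PerUserRateLimiter:
    """Fixed-window limiter keyed by authenticated user ID."""

    def __init__(self, max_requests: int, window_seconds: int,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows: dict[int, tuple[int, int]] = {}  # user_id -> (window index, count)

    def allow(self, user_id: int) -> bool:
        window = int(self.clock() // self.window_seconds)
        current, count = self._windows.get(user_id, (window, 0))
        if current != window:
            count = 0  # a new window has started for this user
        allowed = count < self.max_requests
        self._windows[user_id] = (window, count + 1 if allowed else count)
        return allowed
```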

[1:21:30] Malicious users can get around this with

[1:21:33] a coordinated system of multiple

[1:21:35] accounts and multiple IPs.

[1:21:38] If we were worried about that and we

[1:21:41] couldn't control user accounts or

[1:21:43] whitelisted IPs more tightly, then we

[1:21:46] might consider running pattern

[1:21:47] recognition of usage, i.e. combine the

[1:21:51] usage of two or more users, assess them

[1:21:55] as if they were a single user and see if

[1:21:57] it adds up to a malicious usage pattern.

[1:22:02] Or we might try to gather a list of

[1:22:04] known VPN IP addresses and monitor those

[1:22:07] accounts more closely.

[1:22:10] For Nginx's rate limit on IP

[1:22:12] addresses, we need to consider whether many of our

[1:22:15] clients could be in one business at the same

[1:22:17] address.

[1:22:18] For example, if all users log on at 9:00

[1:22:22] a.m. from the same IP address, then we

[1:22:24] might need to loosen Nginx's rate limit on

[1:22:27] IP addresses.

[1:22:33] Next, the request hits the middleware.

[1:22:36] I've disabled global middleware and

[1:22:39] moved most of the default middleware to

[1:22:41] the API group.

[1:22:43] That's because the web group can only be

[1:22:46] accessed internally within the

[1:22:47] Kubernetes cluster.

[1:22:50] For the TrustProxies middleware, we're providing the

[1:22:53] CIDR range that the ingress controller

[1:22:55] is guaranteed to be within. And then

[1:22:58] Laravel trusts the ingress controller's forwarded headers, so it knows the client's IP is

[1:23:01] the real IP and that the connection is

[1:23:04] indeed secure.
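
Here is a Python sketch of what trusting a proxy CIDR range means, using the standard `ipaddress` module. Two things in it are illustrative, not the project's configuration:

- the `10.244.0.0/16` range;
- the single-hop assumption: only the ingress controller sits in front of the app, so the right-most `X-Forwarded-For` entry is the address it saw.

The same trust decision also gates `X-Forwarded-Proto`, which is how the app learns that the original connection was HTTPS.

```python
import ipaddress
from typing import Optional

# Hypothetical CIDR range that the ingress controller's pods live in.
TRUSTED_PROXIES = [ipaddress.ip_network("10.244.0.0/16")]


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Believe X-Forwarded-For only when the direct peer is a trusted proxy."""
    peer = ipaddress.ip_address(remote_addr)
    if x_forwarded_for and any(peer in net for net in TRUSTED_PROXIES):
        # The trusted proxy appends the address it saw, so take the right-most entry.
        return x_forwarded_for.split(",")[-1].strip()
    return remote_addr
```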

[1:23:07] Then API requests pass to routes/api.php,

[1:23:13] and for routes that require

[1:23:14] authentication the requests pass through

[1:23:17] the authentication middleware and the

[1:23:20] rate limiter middleware we discussed

[1:23:22] earlier.

[1:23:24] I extended the Authenticate class as a

[1:23:27] convenient way to add the user ID to the

[1:23:30] context of logs created after

[1:23:32] authentication,

[1:23:34] and also because in the original class

[1:23:36] if an unauthenticated request didn't

[1:23:39] have the Accept header of

[1:23:42] application/json

[1:23:44] it tried to redirect the user to a login

[1:23:46] page but our API is purely JSON so

[1:23:50] that's not the desired behavior.

[1:23:53] The last endpoint is for browsers to

[1:23:55] report content security policy

[1:23:57] violations which go straight into the

[1:23:59] log.

[1:24:02] The web routes are mostly for the

[1:24:04] Kubernetes probes mentioned earlier. The

[1:24:07] Laravel readiness route simply replies

[1:24:10] immediately.

[1:24:12] The Laravel startup route, if you recall, tests

[1:24:16] the connections to the database and the

[1:24:18] cache in a try/catch block and returns

[1:24:21] an error HTTP status code if there's any

[1:24:24] problem
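
Here is a minimal Python sketch of the startup check pattern. Each dependency check runs inside its own try/except, and any failure turns the response into an error status. The check names and the choice of 503 are assumptions. Kubernetes HTTP probes count any status from 200 up to 399 as success, so any 4xx or 5xx fails the probe.

```python
from typing import Callable


def startup_probe(checks: dict[str, Callable[[], None]]) -> tuple[int, str]:
    """Run each dependency check; any exception turns the probe into a 503."""
    for name, check in checks.items():
        try:
            check()
        except Exception:
            return 503, f"{name} unavailable"
    return 200, "ok"
```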

[1:24:26] and the Laravel status route is for my own

[1:24:29] tinkering and analysis of the realpath cache and

[1:24:32] OPcache in production to further tweak

[1:24:35] those configurations.

[1:24:37] Unfortunately, we can't just run those

[1:24:39] commands in the command line interface

[1:24:42] because the CLI is separate from PHP FPM

[1:24:45] and maintains its own OPcache memory

[1:24:48] pool.

[1:24:49] And finally, an API request runs back

[1:24:52] through the middleware.

[1:24:54] The HandleCors middleware, along with

[1:24:57] its config lets us tell browsers that

[1:25:00] the Vue.js front end is permitted to

[1:25:02] access the API.

[1:25:05] And the inject meta middleware adds the

[1:25:08] meta object to each JSON response object

[1:25:11] as it's passing out of the door.
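
Here is a Python sketch of an outbound middleware that adds a `meta` object to JSON object responses, modeled as a function wrapping the next handler. The `(status, body)` shape and the `request_id` field inside `meta` are assumptions. The transcript doesn't list what the project's meta object contains.

```python
import json
from typing import Callable

# A handler takes a request dict and returns (status, JSON body string).
Handler = Callable[[dict], tuple[int, str]]


def inject_meta(next_handler: Handler) -> Handler:
    """Outbound middleware: add a `meta` object to every JSON object response."""

    def handle(request: dict) -> tuple[int, str]:
        status, body = next_handler(request)
        payload = json.loads(body)
        if isinstance(payload, dict):
            # Hypothetical meta contents for illustration.
            payload["meta"] = {"request_id": request.get("request_id")}
            body = json.dumps(payload)
        return status, body

    return handle
```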

[1:25:14] After the response has been sent, the

[1:25:16] last action is to log the request for

[1:25:18] future analysis.

[1:25:23] I won't cover the Vue.js front end

[1:25:25] because this video is already quite long

[1:25:28] and it just logs into the API, stores

[1:25:30] the token and interacts with the API in

[1:25:33] a simple manner.

[1:25:35] All the source code is available in the

[1:25:37] description of the video and please let

[1:25:39] me know if you have any questions or any

[1:25:42] improvements on any part of the code.

[1:25:44] Thanks.

Topics #laravel #kubernetes #php-fpm #nginx #postgresql #docker #api design