Laravel API Kubernetes Setup
This video provides a comprehensive walkthrough of deploying a Laravel RESTful API with PHP-FPM, Nginx, and PostgreSQL inside Docker containers orchestrated by Kubernetes. The presenter covers production cluster architecture, configuration decisions for PHP-FPM and Nginx, development environment setup, testing, deployment, and API design features like structured error responses and database constraint validation.
[00:00] Hello everyone. This video is a
[00:02] discussion of a Laravel RESTful API
[00:04] project using PHP-FPM and Nginx with
[00:08] a Postgres database, all in Docker
[00:11] containers deployed in Kubernetes.
[00:15] So I'll cover the setup in Kubernetes,
[00:18] the choices and thought processes when
[00:20] configuring PHP-FPM and Nginx, the
[00:24] design of the development environment,
[00:27] testing, deployment, and some features
[00:29] of the API. Plus, there's a Vue.js front
[00:33] end thrown in as well to log into the
[00:35] API and interact with it. It's a bit too
[00:38] simple to discuss in this video, but
[00:40] it's also included in the code, and all
[00:43] the code is in the repository in the
[00:45] video description.
[00:49] The production environment is my
[00:51] Kubernetes cluster at home with a single
[00:54] control plane and two worker nodes on a
[00:57] pretty slow internet connection,
[00:59] but that doesn't have too much impact on
[01:01] the project and it can mostly be applied
[01:03] anywhere.
[01:06] The API itself has been kept generic so
[01:08] you won't have to listen to any domain
[01:10] specific concepts. The features I've
[01:13] added are focused on internal workings
[01:15] rather than a specific API use case. So
[01:18] they could be applied and expanded to
[01:20] many use cases.
[01:22] For example, responses are restricted to
[01:26] a set predictable structure, and each
[01:29] error has a unique integer error code,
[01:31] so the client can more easily handle
[01:34] error responses programmatically.
[01:37] Input validation for database columns
[01:39] with unique or foreign key constraints
[01:42] has been pushed to the database to
[01:44] reduce the number of queries needed.
[01:47] That's a controversial one, so we'll
[01:49] discuss the pros and the cons.
[01:52] There's also a fix for a very common
[01:54] Nginx inefficiency regarding HTTP
[01:57] compression,
[01:59] and there are some helpful logs to help
[02:02] us set the PHP-FPM and Nginx
[02:05] configurations.
[02:08] So the structure of the video is
[02:11] production environment,
[02:13] PHP-FPM and Nginx configuration,
[02:16] the Dockerfiles, the dev environment,
[02:20] something of a CI/CD pipeline, though
[02:23] very minimal, and the Laravel API itself,
[02:28] with discussion of security
[02:30] considerations throughout.
[02:35] So what does the production cluster look
[02:37] like? The ingress controller manages
[02:40] traffic for the cluster, handles TLS
[02:43] termination, and applies HTTP
[02:45] compression. It forwards API requests to
[02:49] the Nginx and Laravel pod. If traffic
[02:53] increases, the Nginx and Laravel pod
[02:55] is duplicated by a horizontal pod
[02:58] autoscaler.
[02:59] So, as usual, we want to make sure that
[03:02] it's completely stateless.
[03:05] Then for the database we're using
[03:07] Postgres managed by a StatefulSet.
[03:11] Postgres zero is the primary instance
[03:13] that all Laravel instances write to.
[03:17] Postgres one and two and so on are the
[03:20] read-only standby instances to
[03:23] increase read throughput if we have high
[03:25] concurrent usage.
[03:28] So whenever a new standby instance
[03:30] starts up, it duplicates the primary
[03:32] database with
[03:35] pg_basebackup.
[03:37] And whenever the primary instance
[03:39] executes a write operation, it sends the
[03:42] WAL, or write-ahead log, records to each
[03:47] standby so that all instances are
[03:49] synchronized in near real time.
[03:52] To configure this in Laravel, in the
[03:55] database config file, add read and write
[03:58] elements to the database you're using,
[04:02] specifying the pod for the write
[04:04] instance and the service for the read
[04:06] instances.
[04:09] And for the development or testing
[04:10] environments, override that in the
[04:13] Laravel .env file, specifying the
[04:16] database's service name in Docker
[04:18] Compose.
[04:22] For cache, we're using Redis,
[04:25] and of course there's the Vue front end,
[04:27] which is mostly decoupled. It could be
[04:29] hosted here or anywhere else. And finally,
[04:33] all these components are scraped by
[04:35] Prometheus, and Grafana molds that data
[04:39] into dashboards, which are also
[04:42] accessible through the ingress controller.
[04:46] Since the cluster has so few nodes, we
[04:49] can present the CPU and memory usage of
[04:52] each node in a single dashboard.
[04:55] And the other dashboard I watch most is
[04:58] for the ingress controller, especially
[05:00] latencies for the three connected
[05:02] services, the API, the front end, and
[05:06] the Grafana front end we're using right
[05:08] now.
[05:10] On my slow home internet, the latencies
[05:12] don't help us model production systems
[05:15] very much. They're pretty slow, but the
[05:18] variance can still be insightful.
[05:20] The variance is expressed here with the
[05:23] average latency of the fastest 50%, 95%
[05:28] and 99% of requests.
[05:35] In the ingress controller, we're most
[05:38] commonly receiving requests from the CDN
[05:41] with the original client's IP in the
[05:45] X-Forwarded-For header.
[05:47] So by specifying the CDN CIDR ranges
[05:50] that we trust, we can safely strip them
[05:53] from the forwarding chain and just leave
[05:56] the client's IP.
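As a rough illustration of the stripping logic just described, here is a minimal Python sketch. It is not the ingress controller's actual implementation; the CIDR range and the function names are hypothetical, chosen only to show the idea of walking the forwarding chain from the nearest hop backwards and discarding trusted proxies.

```python
import ipaddress

# Hypothetical trusted CDN range; a real list would come from the CDN provider.
TRUSTED_CIDRS = [ipaddress.ip_network("203.0.113.0/24")]


def is_trusted(ip: str) -> bool:
    """Return True if the address falls inside any trusted proxy range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_CIDRS)


def client_ip(x_forwarded_for: str, peer_ip: str) -> str:
    """Walk the chain from the nearest hop backwards, skipping trusted proxies.

    The first untrusted address found is treated as the client. Anything to
    its left could have been forged by the client, so it is ignored.
    """
    chain = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    chain.append(peer_ip)  # the direct TCP peer is the last hop
    for ip in reversed(chain):
        if not is_trusted(ip):
            return ip
    return chain[0]  # every hop was trusted; fall back to the first entry
```

Reading the chain right to left matters: a client can put anything it likes at the start of the header, but it cannot forge the entries that trusted proxies append after it.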
[05:58] That's important for logging and for
[06:00] rate limiting by IP address downstream
[06:04] in Nginx or Laravel.
[06:07] The ingress controller, like any
[06:09] Nginx instance, generates a random
[06:12] request ID and here we're attaching it
[06:15] to the web request with a custom header.
[06:19] And that way we can reference a
[06:21] consistent request ID here in the
[06:23] ingress controller and also as the
[06:26] request passes on through Nginx and
[06:28] Laravel and back again.
[06:32] Regarding HTTP compression,
[06:35] by default, gzip and Brotli compress
[06:39] files of any size, but they both add
[06:42] metadata to the compressed files. So for
[06:45] files that are already small, we're
[06:47] actually expending CPU power to compress
[06:51] a file and actually increase the amount
[06:53] of data to be transferred.
[06:57] So we specify brotli_min_length and
[07:00] gzip_min_length to set the file size
[07:03] threshold at which the ingress
[07:06] controller will apply compression.
[07:09] But what's the logical threshold to set?
[07:13] Well, the equilibrium point at which
[07:16] compression starts to reduce file size
[07:18] is different per file, but is usually
[07:21] between 100 and 400 bytes.
[07:25] So, is that the answer? Well, not
[07:28] really. When we send a message over TCP,
[07:32] as long as the payload fits within one
[07:34] TCP segment,
[07:36] the size of that payload has a
[07:38] negligible impact on the transmission
[07:41] time.
[07:42] It's like adding more passengers to a
[07:44] plane. It has almost no impact on the
[07:47] flight time.
[07:50] So from the client's perspective,
[07:52] latency doesn't scale linearly with the
[07:55] file size. It potentially jumps stepwise
[07:59] when the file size necessitates sending
[08:02] another TCP segment.
[08:05] That means HTTP compression is
[08:08] worthwhile only if it has a decent
[08:11] likelihood of reducing the number of TCP
[08:14] segments needed to hold a particular
[08:16] payload.
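The stepwise reasoning above can be sketched in a few lines of Python. This is only an illustration: the 1,460-byte segment size is an assumed value (a 1,500-byte MTU minus minimal IP and TCP headers), and the function names are hypothetical.

```python
import math

# Assumed maximum segment size: 1500-byte MTU minus 20-byte IP and
# 20-byte TCP headers, with no header options.
MSS = 1460


def segments_needed(payload_bytes: int, mss: int = MSS) -> int:
    """Number of TCP segments required to carry the payload."""
    return max(1, math.ceil(payload_bytes / mss))


def compression_worthwhile(original: int, compressed: int, mss: int = MSS) -> bool:
    """Compression pays off only when it lowers the segment count."""
    return segments_needed(compressed, mss) < segments_needed(original, mss)
```

Under these assumptions, shrinking a 1,200-byte body to 400 bytes saves nothing on the wire in terms of segments, while shrinking a 3,000-byte body to 1,400 bytes drops it from three segments to one.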
[08:18] So a legitimate strategy is to set the
[08:21] HTTP compression threshold at the file
[08:25] size that would trigger a second TCP
[08:28] segment.
[08:31] The maximum size of a TCP segment
[08:34] depends on the MTU, the maximum
[08:38] transmission unit on layer 3 of the OSI
[08:41] model, which is typically 1,500 bytes.
[08:46] Each IP packet has an IP header taking
[08:50] 20 to 60 bytes and a TCP header taking
[08:55] 20 to 60 bytes. And the rest of the
[08:59] 1,500 bytes is for the payload.
[09:03] For the largest possible payload that is
[09:06] still contained in one TCP segment,
[09:10] it would contain the TLS record for
[09:12] encryption, the HTTP headers, and
[09:16] finally once all of that is accounted
[09:18] for, the remaining space is for the HTTP
[09:21] body, which is the candidate for
[09:23] compression.
[09:26] In this project, it's too early to
[09:27] finalize what our standard HTTP headers
[09:30] will be. So, we can't finalize this
[09:33] optimization yet. But this is the
[09:35] formula that will decide it once all of
[09:37] the components are confirmed.
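The formula itself is not shown in the transcript, so the following Python sketch is a reconstruction from the description above, not the presenter's exact version. The TLS overhead and HTTP header sizes are placeholder assumptions and would need to be measured for a real deployment.

```python
def max_single_segment_body(
    mtu: int = 1500,
    ip_header: int = 20,      # minimum IP header, no options
    tcp_header: int = 20,     # minimum TCP header, no options
    tls_overhead: int = 22,   # assumed per-record overhead; depends on TLS version and cipher
    http_headers: int = 300,  # hypothetical size of the response's HTTP headers
) -> int:
    """Largest HTTP body, in bytes, that still fits in one TCP segment."""
    body = mtu - ip_header - tcp_header - tls_overhead - http_headers
    return max(body, 0)
```

With these placeholder numbers the result is 1,138 bytes, so the minimum-length threshold for compression would sit just above that. Larger IP or TCP headers, or bigger HTTP headers, pull the threshold down.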
[09:41] But there's one last twist. If the
[09:44] response doesn't have a content length
[09:47] header, then Nginx ignores the gzip_min_length
[09:51] and brotli_min_length
[09:54] directives and compresses every file
[09:57] regardless of file size.
[10:00] So for that reason in Laravel, we've got
[10:03] a middleware to measure the body and add
[10:06] the content length header.
[10:09] There's a php.ini directive called
[10:12] mbstring.func_overload.
[10:17] If we set it to zero, we can safely use
[10:19] the faster strlen function for adding the
[10:24] content length header.
[10:26] Otherwise, we'd need to use the
[10:28] multibyte equivalent,
[10:32] mb_strlen, and specify 8bit encoding to be
[10:36] sure we're getting the number of bytes
[10:38] instead of the character count.
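The difference between byte count and character count is easy to see in any language. The sketch below uses Python rather than the PHP middleware from the project, purely to illustrate why the Content-Length value must be measured in bytes; the function name and sample body are hypothetical.

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Content-Length must be the number of bytes on the wire, not characters."""
    return len(body.encode(encoding))


# A JSON body containing one non-ASCII character.
body = '{"name": "café"}'
char_count = len(body)             # counts characters
byte_count = content_length(body)  # counts encoded bytes; "é" takes two in UTF-8
```

Here the character count is one less than the byte count, which is exactly the kind of mismatch that would produce a wrong header if characters were counted instead of bytes.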
[10:41] We have to make sure this middleware is
[10:42] late in the stack after any middleware
[10:45] that will modify the body because if the
[10:48] response body is longer than the content
[10:50] length header, Nginx will cut off the
[10:52] excess.
[10:58] Okay, onto Laravel and Nginx. There
[11:02] are four ways of connecting PHP-FPM with
[11:06] Nginx in Kubernetes.
[11:08] The first decision is whether to put the
[11:11] containers in separate pods or in the
[11:13] same pod.
[11:16] Nginx can handle far more connections
[11:18] than PHP-FPM can, and separate pods would
[11:22] let us scale them independently. So we
[11:25] could save the memory overhead of the
[11:27] excess Nginx pods.
[11:30] Each Nginx pod has an overhead of
[11:32] about 10 MB.
[11:36] The downside is that Nginx could
[11:38] forward requests to PHP-FPM instances on
[11:42] different nodes, which would add network
[11:45] latency.
[11:46] Getting around this is fiddly and might
[11:49] not be worth the memory saved.
[11:52] Also, separate pods means that the logs
[11:55] for a particular request would be split
[11:58] up between different pods.
[12:04] If we go with a single pod, that opens
[12:07] up the choice of how Nginx and PHP-FPM
[12:10] will talk to each other. Either by using
[12:13] TCP or by mounting a shared volume onto
[12:17] both containers to hold a Unix socket
[12:20] file.
[12:22] Each TCP connection needs to be
[12:25] established with a TCP handshake
[12:27] consisting of three messages sent back
[12:29] and forth. With the default Nginx
[12:33] configuration, that handshake happens
[12:35] for every single request routed to
[12:38] PHP-FPM.
[12:40] Each message sent is bundled with IP and
[12:43] TCP headers increasing the amount of
[12:46] data to be transferred. And depending on
[12:48] the size of the request and the
[12:50] response, they might be broken up into
[12:53] packets before sending, then reassembled
[12:56] at the other end. And finally, the
[12:58] connection needs to be torn down with
[13:00] another three messages.
[13:03] Due to all this redundant stuff, I
[13:06] really expected sockets to be measurably
[13:08] faster than TCP.
[13:11] Maybe the teardown would happen
[13:12] concurrently to the response being sent
[13:14] out, but the rest of it surely increases
[13:17] latency.
[13:19] However, in the very quick tests that I
[13:22] set up, latency was almost identical
[13:25] between the two methods.
[13:27] That's interesting academically, and I
[13:30] want to investigate more when I have the
[13:32] time. But practically speaking, if we're
[13:36] interested in such small savings in
[13:37] latency, then we'd likely be better off
[13:40] considering Swoole instead of PHP-FPM and
[13:43] Nginx.
[13:45] And then this whole architectural
[13:47] decision would go away.
[13:50] Both TCP and sockets benefit from the
[13:53] Nginx directive fastcgi_keep_conn,
[13:57] which keeps the connection open between
[14:00] requests. So we wouldn't need the TCP
[14:02] handshake for every request.
[14:05] The PHP-FPM counterparts are
[14:09] pm.process_idle_timeout, which sets the time
[14:13] a worker can be idle before it's killed,
[14:17] and pm.max_requests,
[14:19] which caps how many requests a
[14:22] worker can serve before it's respawned.
[14:27] And we need to set fastcgi_pass in
[14:30] Nginx and listen in PHP-FPM.
[14:36] For TCP connections, we tell them which
[14:38] port, and for sockets, we tell them the
[14:41] location of the socket file.
[14:46] Then the fourth and final option for
[14:48] connecting Nginx with PHP-FPM in
[14:51] Kubernetes is putting them in the same
[14:54] container in the same pod.
[14:57] This is workable but it creates
[14:59] complications.
[15:01] Kubernetes monitors the status of the
[15:03] process with PID 1 as a health check. If
[15:07] the process exits, Kubernetes stops the
[15:10] container even if other processes are
[15:13] running. And if another process exits,
[15:16] Kubernetes has no idea and does nothing.
[15:20] So if a container has more than one key
[15:22] process, we need to implement a
[15:24] replacement for the Kubernetes health
[15:26] checks and process management.
[15:31] In the end, I went with the middle
[15:33] ground option of same pod, different
[15:36] containers with shared volume between
[15:38] them for a Unix socket file. Mostly
[15:42] because I hadn't tried this setup
[15:43] before.
[15:45] And we'll see the detailed
[15:46] implementation in the Kubernetes
[15:48] manifest and the Dockerfile later in
[15:50] the video.
[15:55] So with that decided, let's jump into
[15:57] the pod manifest.
[15:59] As I explain components, I'll also build
[16:02] up this visual representation to help
[16:05] visualize how everything links together.
[16:08] Obviously the Laravel and Nginx
[16:10] containers are at the core, and the first
[16:13] step is to inject the configuration
[16:15] files with Kubernetes Secrets and
[16:18] ConfigMaps and to mount the shared volume for
[16:21] PHP-FPM to create the socket file we
[16:24] just talked about.
[16:27] If we can make the whole file system in
[16:29] a container read only, that's great for
[16:32] security. So we need to identify where
[16:35] the application needs write access and
[16:38] then mount volumes at those locations
[16:41] with write access and make everything
[16:43] else read-only.
[16:45] And the socket volume is our first one
[16:47] of those.
[16:52] The Laravel Bootstrap caches never
[16:55] change in production.
[16:57] So at first thought we'd run the cache
[16:59] creation commands in the Dockerfile so
[17:02] that they're part of the image and then
[17:04] we'd make them read-only in production.
[17:08] That's fine for event cache, route cache,
[17:11] and view cache. But config cache needs
[17:15] to read the .env file.
[17:19] For security reasons, we can't put
[17:21] sensitive files like .env into the
[17:24] image. So the only safe option is to run
[17:28] config cache within each pod as it
[17:31] starts up.
[17:33] So that means the cache directory needs
[17:36] to have write access in production. But
[17:39] we'd prefer it to only have read access
[17:41] because it never changes once created,
[17:45] and any attacker that gets write access
[17:47] to the bootstrap caches could obviously
[17:49] do serious damage.
[17:52] But there is a way to get the best of
[17:54] both worlds.
[17:56] First, in the Dockerfile, we rename the
[17:59] bootstrap/cache
[18:01] directory to something else like cache
[18:05] temp.
[18:07] Then in Kubernetes, we run a Laravel
[18:10] init container when the pod starts up.
[18:14] It has the .env file injected and a
[18:18] writable volume mounted at
[18:20] bootstrap/cache.
[18:24] We copy everything from the temp cache
[18:26] directory into the volume at
[18:29] bootstrap/cache
[18:31] and then run php artisan config:cache to
[18:37] create the final bootstrap cache file.
[18:41] The time command just logs the memory
[18:43] usage to help us set resource limits
[18:45] later.
[18:48] Then after the init container has
[18:50] finished, the main Laravel container
[18:53] starts up with the cache volume mounted
[18:56] as readOnly: true.
[19:00] And that's how we get a read-only cache
[19:02] with sensitive data compiled at the pod
[19:06] startup.
[19:11] We're also running additive database
[19:14] migrations in the init container.
[19:17] The --isolated flag is very
[19:20] consequential.
[19:22] It means while this migration is
[19:24] running, although normal reads and
[19:26] writes can still happen concurrently,
[19:29] no other migrate command with the
[19:32] isolated flag can begin.
[19:35] A migrate command without the isolated
[19:38] flag can still run. So it's important
[19:40] that we add it to every migrate command
[19:43] in production to make sure concurrent
[19:46] migrations are impossible.
[19:49] It uses the cache to track whether the
[19:52] isolated command is running or not
[19:55] currently.
[19:56] So if you're using the database for
[19:59] cache, it creates a catch-22 situation
[20:02] for your first migration. You'd need to
[20:04] run migrations once without the isolated
[20:07] flag first.
[20:09] But most applications would just use
[20:11] Redis, so it wouldn't be a problem.
[20:16] An unexpected problem I ran into is that
[20:20] if a container crashes and gets
[20:22] restarted by Kubernetes,
[20:24] the mounted volumes are not cleared.
[20:28] That's the behavior we want most of the
[20:30] time. We want data to be permanent for
[20:33] the life of the pod, but it can cause
[20:36] some misleading error logs. In the init
[20:39] container startup script, it copied and
[20:42] created the cache files with no problem.
[20:45] Then it hit an error with migrations.
[20:48] So, Kubernetes killed the container and
[20:51] ran it again. But on the second run, the
[20:54] volume was already populated. So the
[20:57] error logs were about file permissions,
[20:59] not the migration commands.
[21:02] So just bear that in mind. If all else
[21:05] is equal, move any operations on mounted
[21:08] volumes to the end of the script. But in
[21:11] our case, migrations actually depend on
[21:13] the cache files being present.
[21:18] We've got another init container to set
[21:20] up the directory structure for Nginx's
[21:23] writable volume.
[21:25] I'd prefer to use Chainguard images to be
[21:28] minimal and more secure, but the CPUs of
[21:31] my nodes are too old and don't support
[21:33] them.
[21:35] And finally, we have one last container
[21:38] that scrapes the Nginx metrics
[21:40] endpoint and presents that data in a
[21:43] format that Prometheus can then scrape.
[21:46] Port 8080 is for publicly available
[21:50] endpoints via ingress. Port 8081
[21:54] is for internal traffic like health
[21:56] checks and metrics and then Prometheus
[21:59] scrapes the exporter on its default port
[22:02] 9113.
[22:08] Kubernetes provides three types of
[22:10] probes or health checks. Probes are
[22:14] attached to containers, but the actions
[22:17] on success or failure can impact either
[22:20] that container it's attached to or the
[22:23] whole pod.
[22:25] When a pod starts up, if at least one
[22:28] container has a startup probe, then that
[22:31] pod won't initially be added to the
[22:34] Service's endpoints. So, it won't be
[22:37] accessible by other pods or by outside
[22:40] traffic from ingress.
[22:43] If a startup probe fails, the container
[22:46] it's attached to is killed and by
[22:49] default configuration in Kubernetes, any
[22:52] killed container is instantly restarted.
[22:56] So that's hoping that a restart or a
[22:58] slight delay will fix whatever caused
[23:01] the startup probe to fail.
[23:04] Once a container startup probe passes,
[23:07] it will never run again. And instead,
[23:10] the container's readiness probe and
[23:12] liveness probe begin running if it has
[23:15] them. And they will then run repeatedly
[23:18] for as long as the pod exists.
[23:21] Once all startup probes in a pod pass
[23:25] and if there are no readiness probes,
[23:28] then the pod is added to the Service's
[23:30] endpoints and it starts serving traffic.
[23:34] If there are one or more readiness
[23:36] probes, then the pod waits on them.
[23:40] Once all readiness probes pass, the pod
[23:44] is added to the Service's endpoints.
[23:47] But the readiness probes keep running
[23:49] continuously.
[23:51] And if at any time a readiness probe
[23:53] fails, the pod is removed again. So a
[23:57] failed readiness probe only has a pod
[23:59] level effect. It doesn't kill the
[24:01] container.
[24:03] That's what liveness probes are for.
[24:05] When a liveness probe fails, it has no
[24:09] pod level effect. The pod can still
[24:11] receive external traffic. Instead, a
[24:14] failed liveness probe kills the
[24:16] container it's attached to.
[24:20] So, to summarize, a failed readiness
[24:22] probe stops traffic flow reaching the
[24:25] pod. A failed liveness probe kills the
[24:28] container it's attached to. And a failed
[24:32] startup probe does both of those, but
[24:35] only when the pod is starting up.
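That summary can be written down as a small lookup table. The Python sketch below only models the effects as described in this video; it is not Kubernetes code, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeFailure:
    blocks_traffic: bool    # pod is kept out of, or removed from, the Service's endpoints
    kills_container: bool   # the container the probe is attached to is killed


def on_probe_failure(probe: str) -> ProbeFailure:
    """Effect of a failed probe, following the summary given in the video."""
    effects = {
        "readiness": ProbeFailure(blocks_traffic=True, kills_container=False),
        "liveness": ProbeFailure(blocks_traffic=False, kills_container=True),
        # Startup probes only run while the pod is starting up.
        "startup": ProbeFailure(blocks_traffic=True, kills_container=True),
    }
    return effects[probe]
```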
[24:39] Phew. I think that's the most concise
[24:41] summary of probes I can give. So, how
[24:44] can we apply these to our Laravel and
[24:46] Nginx pod?
[24:49] Well, we don't want requests reaching a
[24:52] broken pod. So, a readiness check is
[24:55] crucial.
[24:57] And the ability to serve requests
[24:59] depends on Nginx and Laravel both
[25:02] working. So we put the readiness probe
[25:05] on the Nginx container to query a
[25:08] Laravel endpoint that returns a simple
[25:12] plain text response.
[25:14] That means the readiness check only
[25:16] passes if Nginx and Laravel and their
[25:19] connection are all fine.
[25:23] The ability to serve requests also
[25:25] depends on the connection to the
[25:27] database and the cache. So we might
[25:31] consider checking those connections as
[25:33] part of the readiness check.
[25:36] But if a database problem did occur, it
[25:40] would affect all Laravel pods.
[25:43] We wouldn't have a mix of healthy and
[25:45] unhealthy pods. And the readiness probe
[25:49] would react by removing this pod from
[25:51] its service, which doesn't do anything
[25:54] to solve the database problem. And
[25:57] actually we'd slightly prefer to keep
[25:59] the Laravel pod serving requests to give
[26:02] as graceful a response as possible.
[26:06] And then we'd rely on some other probe
[26:09] to heal the database problem closer to
[26:11] where it occurred. So no, the readiness
[26:15] probe should not check the connections
[26:17] to database and cache.
[26:20] However, it is a good idea to add those
[26:23] connections to the startup probe. That's
[26:26] so that if we're deploying a new version
[26:28] and I've messed up the connection
[26:29] configurations,
[26:31] Kubernetes will stop it going live and
[26:33] keep the old version alive, serving
[26:36] requests.
[26:39] Another reason to have a startup check
[26:41] is because we're doing OPcache
[26:42] preloading at startup. So we need some
[26:45] flexibility around the slightly
[26:47] unpredictable bootup time.
[26:51] The startup probe of course also needs
[26:53] to check both Nginx and Laravel and
[26:56] their connection. So we add it to the
[26:58] Nginx container and call a startup
[27:01] endpoint in Laravel.
[27:05] This design has one perverse side effect
[27:07] that if the startup probe fails, it
[27:10] kills the Nginx container it's
[27:12] attached to. Even though the problem is
[27:15] much more likely to come from the
[27:16] Laravel container or the database or
[27:19] cache connections, but that's one
[27:21] imperfection I think we can live with.
[27:25] And finally, what about liveness
[27:27] probes? Well, in the Nginx
[27:30] configuration, I created an endpoint
[27:32] that returns a simple plain text
[27:34] response, but I think the chance of
[27:36] Nginx messing up is so low that,
[27:39] currently, I don't think it's worth
[27:40] running a constant probe.
[27:43] Laravel is a bit more likely to mess up.
[27:45] So, we've got a liveness probe querying
[27:48] the PHP-FPM status page.
[27:56] The last big topic of this manifest is
[27:58] the memory limits that Kubernetes
[28:00] imposes on each container. So this is
[28:03] where we'll transition to the topic of
[28:05] configuration for PHP, PHP-FPM and
[28:09] Nginx.
[28:11] The Kubernetes memory limit is a
[28:13] failsafe that kills the container if it's
[28:16] exceeded.
[28:17] That's a pretty drastic action. So we
[28:20] need to set it high enough to cover the
[28:22] peak memory usage in normal operation
[28:26] so that it's only triggered by abnormal
[28:28] memory usage that we want to catch early
[28:30] and contain.
[28:33] This is my formula to estimate peak
[28:35] usage in normal operation.
[28:38] My understanding can be improved further
[28:40] but I think this is a decent formula for
[28:42] now. And at the end we use a margin
[28:45] component to represent the degree of
[28:48] confidence we have in our estimation.
[28:51] PHP-FPM has one master process and a
[28:55] variable number of workers that serve
[28:57] web requests. So we need to determine
[29:00] which memory expenses are per worker and
[29:04] which are shared between all workers.
[29:07] The master process overhead, the PHP
[29:10] interpreter and its extensions,
[29:13] OPcache, and any mounted volumes stored in
[29:16] memory should all be counted once. And
[29:19] then everything in the brackets is
[29:21] multiplied by the maximum number of
[29:23] workers specified by the PHP-FPM
[29:26] directive, pm.max_children.
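The on-screen formula is not captured in the transcript, so the Python sketch below is a reconstruction from the spoken description. The parameter names are hypothetical, the per-worker term is collapsed into a single number, and applying the margin as a multiplier is an assumption rather than the presenter's stated method.

```python
def php_fpm_peak_mb(
    master_mb: float,
    interpreter_mb: float,
    opcache_mb: float,
    memory_volumes_mb: float,
    per_worker_mb: float,
    max_children: int,
    margin: float = 0.2,
) -> float:
    """Estimate peak memory of a PHP-FPM container in normal operation.

    Shared costs are counted once; the per-worker cost is multiplied by
    pm.max_children. The margin (assumed here to be multiplicative)
    reflects how confident we are in the estimate.
    """
    shared = master_mb + interpreter_mb + opcache_mb + memory_volumes_mb
    estimate = shared + per_worker_mb * max_children
    return estimate * (1 + margin)
```

For example, with 184 MB of shared costs, 64 MB per worker, eight workers, and a 25% margin, the estimate comes to 870 MB.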
[29:31] For an application that's 100% CPU
[29:34] intensive, we'd set pm.max_children equal
[29:39] to the number of CPU cores available to
[29:41] the container.
[29:43] But the larger the I/O wait is expected
[29:46] to be, like waiting for database queries
[29:49] to come back, the more we can raise
[29:54] pm.max_children above the number of cores.
[29:58] Workers process one request at a time.
[30:01] So memory usage doesn't scale infinitely
[30:04] as the concurrency of requests grows. The
[30:08] number of workers is a cutoff. So
[30:10] pm.max_children is the ultimate cutoff.
[30:14] And any queue of waiting requests mostly
[30:17] consumes Nginx's memory allowance,
[30:20] not PHP-FPM's.
[30:24] memory_limit in php.ini is the
[30:28] memory usage failsafe on script
[30:31] execution.
[30:33] So just like the Kubernetes memory
[30:35] limit, we need to predict the peak
[30:37] memory usage in normal operation of a
[30:40] single script this time and add a margin
[30:42] of confidence.
[30:45] If abnormal memory usage happens in a
[30:48] script, we want the PHP memory limit to
[30:51] kill that script.
[30:54] And the additional margin of the
[30:56] Kubernetes memory limit means it can
[30:58] only be triggered in an even rarer and
[31:01] more extreme situation and it would kill
[31:03] the whole container, not just a single
[31:06] request.
[31:08] To help set PHP's memory limit in
[31:11] Laravel, we're logging the peak memory
[31:13] usage for every request.
[31:16] Similarly, we're also logging the
[31:18] realpath cache size and the worker ID.
[31:22] Doing it in the middleware's terminate
[31:24] method means it's executed after the
[31:27] response is sent out.
[31:30] We can also use Xdebug profiling to find
[31:33] exactly how memory is used in a
[31:35] particular request execution.
[31:38] And while profiling in Laravel, we
[31:41] should disable garbage collection at the
[31:43] start of the script to ensure accurate
[31:45] readings.
[31:47] So there's a setting in the .env file to
[31:49] toggle garbage collection.
[31:55] We have a similar structure for the
[31:57] Nginx prediction of peak memory usage
[32:00] in normal operation.
[32:03] Shared resources are counted once and
[32:06] the items in brackets are per
[32:08] connection. So they get multiplied by
[32:11] worker_processes, which is the number of
[32:13] workers, and worker_connections, which is
[32:17] the maximum number of connections per
[32:19] worker.
[32:21] The default for worker_connections is
[32:24] 512
[32:26] and that can realistically be set as
[32:27] high as 10,000 or more.
[32:30] That means that since we will add a
[32:32] margin of confidence to each of these
[32:34] buffer sizes here to ensure they can
[32:36] satisfy legitimate requests,
[32:39] those margins would then be multiplied
[32:42] thousands or tens of thousands of times
[32:44] when we're calculating the container
[32:46] level memory limit.
[32:49] We would then be reserving a huge amount
[32:51] of memory for a peak usage scenario that
[32:54] is very unlikely to occur.
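To make the multiplication effect concrete, here is a minimal Python sketch of the shape of that formula. The buffer names and sizes passed in are illustrative assumptions, not the project's settings, and the function names are hypothetical.

```python
def nginx_peak_bytes(
    shared_bytes: int,
    per_connection_buffers: dict[str, int],
    worker_processes: int,
    worker_connections: int,
) -> int:
    """Shared costs counted once, per-connection buffers for every possible connection."""
    per_connection = sum(per_connection_buffers.values())
    return shared_bytes + per_connection * worker_processes * worker_connections


def margin_cost_bytes(
    margin_per_buffer: int,
    buffer_count: int,
    worker_processes: int,
    worker_connections: int,
) -> int:
    """Memory reserved purely for safety margins across all connections."""
    return margin_per_buffer * buffer_count * worker_processes * worker_connections
```

With two workers and 10,000 connections each, a safety margin of just 1 KiB on each of three buffers reserves roughly 61 MB on its own, which is why the connection count and buffer sizes deserve careful thought.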
[32:57] So to manage that problem, first we need
[33:00] to align worker_connections with the
[33:03] peak request concurrency we want to
[33:05] guarantee satisfying and with the memory
[33:08] or the financial constraints that we
[33:11] have.
[33:12] Second, let's consider the size of these
[33:15] buffers extremely carefully. Can we
[33:18] restrict them without hurting UX?
[33:21] Does exceeding a particular buffer kill
[33:23] a request or does it just downgrade
[33:26] performance and by how much?
[33:29] So with that as our goal, how do these
[33:31] buffers work? For an incoming request,
[33:35] Nginx puts the headers into the client
[33:38] header buffer.
[33:40] If the headers exceed that buffer,
[33:43] Nginx puts them into the large client
[33:46] header buffers.
[33:49] And if in that scenario the initial
[33:52] client header buffer is no longer used,
[33:54] we could safely remove that from our
[33:56] peak usage formula. But I haven't had
[33:59] time to experiment with that yet, so I'm
[34:01] keeping it in just to be safe.
[34:04] If the large client header buffers are
[34:07] exceeded, Nginx returns a 400 Bad
[34:11] Request error. That's a pretty drastic
[34:14] action. So, we can't squeeze this buffer
[34:17] too tightly, especially if we have a
[34:19] broad or non-technical user base.
[34:23] But if our API end users are technical
[34:26] enough, we could put a low but
[34:28] reasonable limit on the size of the
[34:30] request headers and then require users
[34:33] to read and abide by the documentation.
[34:41] Nginx stores the request body in the
[34:44] client body buffer and any excess is
[34:47] stored on disk. So we can be more
[34:50] aggressive with this buffer. Maybe only
[34:52] guaranteeing it will hold the bodies of
[34:55] 95% or 99% of legitimate requests.
[35:02] For an Nginx instance that only serves
[35:04] static assets, like the Nginx serving
[35:07] our Vue.js front end, legitimate users
[35:11] will never send POST, PUT, or PATCH
[35:13] requests.
[35:15] So, we can set the client body buffer
[35:17] size very low.
[35:20] But, we'd still need to count it in our
[35:21] formula because users can still fill up
[35:24] that buffer. So, we don't want to give
[35:26] malicious users the ability to trigger
[35:28] our memory limit and kill the Nginx
[35:30] container.
[35:34] After receiving the request line and the
[35:36] headers, Nginx parses them to
[35:39] determine how to route the request. Some
[35:43] of the requests will be sent to PHP-FPM,
[35:46] and it will run the Laravel application
[35:48] to build the response, which will then be
[35:51] sent back to Nginx.
[35:54] The first part of the output from
[35:57] PHP-FPM is stored in the preliminary buffer
[36:00] determined by fastcgi_buffer_size.
[36:05] It's crucial that the FastCGI headers
[36:08] are fully included in this preliminary
[36:11] buffer because if not, Nginx returns a
[36:14] 502 Bad Gateway error.
[36:18] These FastCGI headers are what will
[36:20] later be translated into the response's
[36:23] HTTP status and HTTP headers.
[36:29] So we need to be very aware of and
[36:31] control the size of the headers returned
[36:34] by Laravel.
[36:35] In the Nginx access log, we're
[36:38] recording the embedded variables
[36:40] $upstream_response_length, which is the
[36:43] total size of the response payload sent
[36:46] from PHP-FPM, and $body_bytes_sent, which
[36:51] is the size of the response body. So,
[36:54] the total minus the body gives us the
[36:57] size of the headers. This is likely to
[37:00] be much smaller than the size of the
[37:02] equivalent HTTP headers because it's in
[37:05] binary key value format and doesn't
[37:08] include endline characters.
[37:13] Any headers added by Nginx are not
[37:16] held in this buffer because they're
[37:18] added as the response is sent out.
[37:23] If this preliminary FastCGI buffer fully
[37:25] contains the headers, the rest of the
[37:28] buffer is put to good use and fills up
[37:30] with the first part of the response
[37:32] body. So there's no memory saving
[37:34] benefit to squeezing this buffer
[37:36] aggressively.
[37:38] If the response exceeds the preliminary
[37:41] buffer, the rest of the body is stored
[37:43] in fastcgi_buffers.
[37:47] Yes, that's named confusingly.
[37:50] What I'm calling the preliminary buffer
[37:53] is determined by the Nginx directive
[37:56] fastcgi_buffer_size,
[37:59] and then these body-only buffers are
[38:01] determined by fastcgi_buffers.
[38:06] If these body-only buffers are also
[38:08] exceeded, Nginx responds with a 502
[38:12] Bad Gateway.
[38:14] So we need to be very aware of the
[38:16] maximum size of our responses.
[38:20] The value of $upstream_response_length in
[38:22] our logs will help with that.
[38:26] If we can control the response size with
[38:28] a high degree of confidence, we can set
[38:31] these buffers quite tightly.
[38:34] And then we'd need to set up a system
[38:36] such that any design or codebase change
[38:38] that affects the maximum response size
[38:41] triggers a reassessment of this
[38:43] configuration before deployment.
[38:46] And we'd also need to implement
[38:47] end-to-end tests for the scenarios that
[38:50] generate the largest possible responses
[38:52] in production.
[38:55] It's possible that Nginx clears the
[38:57] request buffers before the response
[39:00] buffers are filled. If that's the case,
[39:03] we could safely use the max of the
[39:05] request buffers and the response buffers
[39:08] instead of the sum. And that would then
[39:10] reduce our peak memory estimate by quite
[39:13] a lot.
[39:14] That would be very interesting to
[39:16] examine, but I haven't had the time to
[39:17] do that yet.
[39:20] For an Nginx instance that only serves
[39:23] static assets, it's impossible for a
[39:25] request to use the FastCGI buffers, not
[39:29] to mention the proxy or the output
[39:31] buffers not mentioned here. So, we can
[39:34] safely remove those from our peak usage
[39:36] formula for that instance.
[39:39] And the ingress controller handles TLS
[39:42] termination. So we're also ignoring
[39:45] ssl_buffer_size here.
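The buffer discussion above can be turned into a rough worst-case estimate. This is a minimal Python sketch, assuming the peak is simply worker processes times worker_connections times the sum of every per-connection buffer; the function name and all sizes are placeholders for illustration, not the formula or values used in the video.

```python
KIB = 1024

def peak_buffer_bytes(worker_processes, worker_connections, buffers):
    """Worst case: every connection fills every buffer at the same time."""
    per_connection = sum(count * size for count, size in buffers.values())
    return worker_processes * worker_connections * per_connection

# Hypothetical (count, size) pairs keyed by the Nginx directive they model.
example_buffers = {
    "client_header_buffer_size": (1, 1 * KIB),
    "large_client_header_buffers": (4, 8 * KIB),
    "client_body_buffer_size": (1, 16 * KIB),
    "fastcgi_buffer_size": (1, 4 * KIB),
    "fastcgi_buffers": (8, 4 * KIB),
}

estimate = peak_buffer_bytes(1, 1024, example_buffers)
print(f"{estimate / (KIB * KIB):.0f} MiB")  # prints "85 MiB"
```

For a static-only instance, the two FastCGI entries would be dropped from the dictionary, which is the simplification described above.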
[39:52] PHP, PHP-FPM, and Nginx all have
[39:57] settings for timeouts that govern
[39:59] various parts of the request response
[40:01] process. So I created this diagram for
[40:04] my notes to help visualize how they line
[40:06] up.
[40:08] The x-axis represents time very roughly
[40:13] as the request goes from client to
[40:15] Nginx to PHP-FPM and the response
[40:19] reverses that route.
[40:22] But the size of each block doesn't
[40:24] correspond to how long the task takes or
[40:26] the suggested timeout value. Rather, the
[40:30] diagram shows how and when each timeout
[40:33] is triggered and how they overlap.
[40:38] If any of these timeouts are breached,
[40:40] the client receives an error response.
[40:43] So, we want these timeouts to
[40:45] accommodate essentially 100% of
[40:48] legitimate requests.
[40:51] Starting from the left, the first
[40:53] timeout is client_header_timeout, which
[40:57] is triggered when Nginx accepts the
[40:59] connection after the TCP handshake.
[41:03] It sets the time needed to receive the
[41:05] request line and the headers from the
[41:08] client.
[41:09] It's not an absolute timeout. Rather,
[41:12] it's a timeout for the intervals between
[41:15] reads. That is, each time some part of
[41:19] the header is received, the timer resets
[41:22] to zero. And that's a running theme for
[41:25] a lot of these Nginx timeouts, as you
[41:28] can see in the diagram.
[41:31] After Nginx receives and parses the
[41:34] request line and the headers, the
[41:36] client_body_timeout begins and limits the
[41:40] intervals between reads of the request
[41:42] body from the client.
[41:45] The purpose of these two request
[41:47] timeouts is to stop partial requests
[41:50] filling up the available connections.
[41:53] A Slowloris attack is an attempt to do
[41:56] that en masse, and a clever attacker could
[41:59] easily determine our timeouts and then
[42:01] drip-feed the server with a request
[42:04] repeatedly. So it's important to also
[42:07] limit concurrent requests from the same
[42:09] IP address.
[42:11] So we store all concurrent IP addresses
[42:15] with limit_conn_zone. With my settings,
[42:19] there can be a maximum of 24,048
[42:22] concurrent connections. So that needs
[42:26] 124 kilobytes to guarantee storing all
[42:28] concurrent addresses.
[42:32] Then in a server or location block, use
[42:35] limit_conn to put an upper limit on the
[42:38] number of concurrent connections from
[42:40] the same IP address.
[42:43] Back to timeouts.
[42:46] After the header has been received,
[42:48] Nginx can determine how to process the
[42:51] request. If it will be forwarded to
[42:54] PHP-FPM and if the Nginx worker doesn't
[42:57] already have a connection with an idle
[43:00] PHP-FPM worker, it starts a new
[43:03] connection and starts the
[43:07] fastcgi_connect_timeout.
[43:09] In normal operation, establishing a
[43:12] connection is near instantaneous
[43:14] unless all PHP-FPM workers are busy and
[43:18] the queue is filled up. The queue size is
[43:21] determined by the PHP-FPM directive
[43:25] listen.backlog. So, it's advisable to
[43:29] set it to the maximum 511
[43:32] and then set fastcgi_connect_timeout to
[43:36] just a few seconds.
[43:39] Once the body is fully received and the
[43:42] FastCGI connection is established,
[43:45] Nginx begins sending the request to
[43:47] PHP-FPM and starts the
[43:53] fastcgi_send_timeout, and PHP starts the
[43:57] max_input_time.
[44:00] max_input_time also covers the parsing
[44:03] of the request body, like populating the
[44:06] $_POST or $_FILES predefined
[44:11] variables before the script can begin
[44:14] execution.
[44:16] But of course, Nginx doesn't know
[44:18] about or care about any of that. So as
[44:21] soon as the transmission is complete, it
[44:23] switches from the fastcgi_send_timeout
[44:26] to the fastcgi_read_timeout, which
[44:30] puts a limit on how long Laravel can
[44:32] take to return the response in full.
[44:36] More accurately, it puts a limit on the
[44:38] interval between read operations. But
[44:41] since Laravel typically buffers the
[44:43] whole response and sends it out at the
[44:45] end, that makes fastcgi_read_timeout
[44:50] almost the same as an absolute timeout.
[44:55] The execution time of the PHP script is
[44:58] limited by PHP's max_execution_time and
[45:03] by PHP-FPM's request_terminate_timeout.
[45:09] max_execution_time measures CPU time. So
[45:13] the timer is paused during I/O operations
[45:16] like database queries. And when
[45:19] exceeded, it has a slightly more
[45:21] graceful termination.
[45:23] Whereas request_terminate_timeout
[45:26] measures wall clock time and it has a
[45:29] hard termination.
[45:31] So we'd align these two, but then we'd
[45:35] increase request_terminate_timeout to
[45:38] account for the peak expected I/O time.
[45:41] And then we'd also add a little margin
[45:44] to give max_execution_time a chance to
[45:47] terminate the script more gracefully.
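The relationship between the two script timeouts can be written as simple arithmetic. This is a small Python sketch, assuming all values are in seconds; the function name, the default margin, and the example numbers are hypothetical and are not the settings used in the project.

```python
def request_terminate_timeout(max_execution_time, peak_io_time, margin=2):
    """Wall-clock limit for PHP-FPM: CPU budget + expected I/O + grace margin.

    max_execution_time counts CPU time only, so the wall-clock limit has to
    leave room for I/O and a small margin, so that the more graceful
    max_execution_time termination gets a chance to fire first.
    """
    return max_execution_time + peak_io_time + margin

# Hypothetical numbers: 10 s of CPU time, up to 15 s waiting on the database.
limit = request_terminate_timeout(max_execution_time=10, peak_io_time=15)
print(limit)  # prints 27
```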
[45:51] After returning the response in full, if
[45:54] the PHP script continues execution, as
[45:58] with the terminate method in middleware,
[46:01] this is included in max_execution_time
[46:05] and request_terminate_timeout, but not
[46:08] in Nginx's fastcgi_read_timeout,
[46:12] because once Nginx has received the
[46:14] response, it's already moved on to
[46:16] sending it to the client.
[46:19] So the script execution timeouts can
[46:21] extend rightwards beyond T5 in the
[46:25] diagram and maybe beyond the end of
[46:28] Nginx's fastcgi_read_timeout. Though
[46:32] in that situation, it's probably better
[46:34] to use queue workers instead.
[46:38] To help set the script execution
[46:40] timeouts, in Laravel we're logging wall
[46:43] clock duration and CPU time duration for
[46:47] each request.
[46:49] And finally, at the right of the
[46:51] diagram, send_timeout limits the
[46:55] intervals between write operations while
[46:58] sending the response to the client.
[47:02] Nginx has four embedded variables
[47:04] which we can log to help us with setting
[47:08] some of these timeouts.
[47:10] $request_time measures from the first
[47:13] byte received from the client to the
[47:16] last byte sent to the client.
[47:20] $upstream_connect_time measures the time
[47:23] to establish the FastCGI connection.
[47:27] That should hopefully always be zero.
[47:31] $upstream_header_time measures from the
[47:34] first byte sent to PHP-FPM until the
[47:38] first byte received in response by
[47:40] Nginx.
[47:43] And $upstream_response_time has the same
[47:46] start point but keeps measuring until
[47:49] the last byte of the response is
[47:51] received by Nginx. Those two will be
[47:54] the same unless the response is very
[47:56] large.
[48:00] In the Laravel documentation, the
[48:03] suggested Nginx configuration is
[48:05] really not great.
[48:08] As an example, let's say a user sends a
[48:12] request to our domain /hello.
[48:16] So this embedded variable $uri
[48:19] equals /hello.
[48:21] With this configuration, what we're
[48:23] asking Nginx to do is this.
[48:26] First check for a file called hello and
[48:30] if it exists serve that file to the
[48:33] client.
[48:34] So far so good. That could be a JS file
[48:36] or a CSS file.
[48:39] But if hello file doesn't exist, check
[48:42] for a directory called hello.
[48:46] If hello directory exists, check for an
[48:49] index file which above is defined as
[48:52] index.php.
[48:54] If that exists, serve it to the client.
[48:57] And already from a Laravel perspective,
[49:00] we are way off course.
[49:03] If hello directory didn't contain
[49:06] index.php,
[49:08] then serve the directory listing, which
[49:11] very ancient internet users will
[49:13] remember, but these days directory
[49:15] listings are disabled by default. So,
[49:18] Nginx returns 403 Forbidden simply
[49:22] because the user requested a directory
[49:24] which does exist on the server.
[49:28] And if the request URI isn't a file and
[49:31] isn't a directory, finally we're
[49:33] directed to the Laravel application
[49:36] index.php.
[49:38] But I don't understand why index.php
[49:40] here is so convoluted with variables.
[49:44] If not to a static file, we always want
[49:46] to forward to public/index.php.
[49:50] So why not just hardcode it here and
[49:52] state it clearly?
[49:56] And these headers at the moment are
[49:58] functional. But the moment we put an
[50:01] add_header directive into a location block,
[50:04] that block no longer inherits add_header
[50:07] directives from outside. So it's safer
[50:10] to just put all add_header directives
[50:14] into location blocks and not rely on
[50:16] inheritance.
[50:18] This whole configuration feels like a
[50:20] copy-paste job from a pre-Laravel PHP
[50:23] project. And although it works, it has
[50:27] needless inefficiencies and potential
[50:29] security flaws.
[50:33] For our Nginx configuration on port
[50:35] 8080,
[50:37] we want all responses to be JSON,
[50:40] including errors caught by Nginx.
[50:43] So for the error pages, we're using
[50:45] named locations that serve static JSON
[50:48] files saved into the image.
[50:51] Internal locations would also achieve
[50:53] the same result.
[50:56] Then, apart from the favicon and
[50:58] robots.txt, we're returning 404 for
[51:02] any request that doesn't start with
[51:05] /api/v1/.
[51:10] And if we take a look at the standard
[51:12] hacky requests that every server gets,
[51:15] this location block alone rejects, well,
[51:18] all of them. We definitely don't want any
[51:21] malicious requests like this to access
[51:23] any static files and preferably we don't
[51:26] want to waste resources forwarding the
[51:28] request to Laravel just for it to return
[51:31] a 404 response.
[51:35] All API requests get sent to Laravel and
[51:38] index.php is hardcoded for clarity.
[51:43] Finally, we're rate limiting by IP
[51:45] address here in Nginx and later by
[51:49] user ID in the Laravel application.
[51:54] Then for port 8081, for cluster-internal
[51:58] traffic, we've got keepalive times to
[52:00] sustain TCP connections with the Nginx
[52:03] exporter and the readiness check for one
[52:08] week and for two weeks respectively.
[52:12] The Nginx up endpoint is a simple Nginx-only
[52:15] endpoint for a liveness check, which
[52:18] we're currently not using.
[52:20] The Nginx status endpoint is for the Nginx
[52:23] exporter to scrape Nginx metrics and
[52:26] in turn be scraped by Prometheus,
[52:29] and the rest are specific endpoints to
[52:31] send to Laravel.
[52:34] In the Vue.js Nginx instance, if a
[52:38] request URI ends in one of these file
[52:41] extensions, we check if the static file
[52:44] exists and if not, return index.html,
[52:49] and the Vue.js router will send the 404
[52:52] page.
[52:55] And then for non-static asset URIs, go
[52:58] straight to index.html.
[53:01] In the Content-Security-Policy header,
[53:04] we need to specify the API domain for
[53:07] connect-src and form-action.
[53:12] For static assets, we can cache the
[53:14] results of the stat and open system
[53:17] calls. And since we're using immutable
[53:20] containers, we can safely cache for a
[53:22] year or more. And then the static assets
[53:25] themselves will likely be held in memory
[53:27] by Linux's page cache, though that
[53:30] depends on other containers in the same
[53:32] node.
[53:37] Let's take a quick look at the
[53:38] Dockerfile for the Laravel image. Some
[53:42] dependencies are needed at build time
[53:44] for compiling PHP extensions and running
[53:47] composer install but not needed at
[53:50] runtime.
[53:52] So the overall design is compile in a
[53:55] builder target and then copy the results
[53:58] into a fresh minimal target for
[54:01] production.
[54:03] To accommodate other targets which we'll
[54:05] discuss in a second, we have to split
[54:08] builder and build prod targets and split
[54:12] minimal base and prod targets with prod
[54:15] being the ultimate image to deploy in
[54:18] production.
[54:20] During the build phase, the composer
[54:22] install command only needs composer.json
[54:26] and composer.lock. But creating
[54:29] the autoloader obviously needs the
[54:31] entire codebase. So a very efficient
[54:34] Docker caching strategy is to copy in
[54:37] the .json and .lock files, then run composer
[54:41] install with the --no-autoloader
[54:44] flag.
[54:46] Then that intensive process is cached
[54:49] until we change our composer
[54:51] dependencies.
[54:53] Then we can copy in the codebase and
[54:55] build the autoloader.
[54:57] If we didn't split those commands, we'd
[54:59] need to build the vendor directory every
[55:02] time we modify a file
[55:05] In the production image, for PHP-FPM to
[55:09] communicate with the Nginx
[55:11] container, both processes need read and
[55:14] write access to the socket file. So we
[55:17] make sure that the www-data user in this
[55:21] container and the nginx user in the
[55:24] Nginx container have the same UID
[55:28] and then set the file permissions
[55:30] accordingly in the PHP-FPM configuration.
[55:35] As mentioned in the Kubernetes section,
[55:37] we run all bootstrap caches except for
[55:40] config cache in the Dockerfile.
[55:44] And we're not using Laravel's built-in
[55:46] health check endpoint. But if we were,
[55:49] apparently that view isn't cached by php
[55:52] artisan view:cache. So we can access it
[55:56] once in the Dockerfile to force that
[55:59] view to be rendered so that we can make
[56:02] the cache directory read only in
[56:04] production.
[56:06] For file permissions, I've commented out
[56:09] some of my standard Laravel production
[56:11] setup because they don't apply in
[56:13] Kubernetes or for this particular
[56:15] project, but I like to keep them here
[56:17] just as a reminder or if we change the
[56:20] project or architecture later.
[56:24] For the local dev environment, the
[56:26] simplest option is to extend the builder
[56:28] target just before the composer install
[56:31] command is run. Then install development
[56:35] tools like Xdebug and create a directory
[56:38] for the output from Xdebug profiling,
[56:42] and then mount the local codebase into
[56:44] the container in Docker Compose with a
[56:47] bind mount.
[56:50] We want to run composer install and
[56:52] artisan commands inside this container
[56:55] to write files to our local device.
[56:59] So in the Makefile we can construct a
[57:01] command to enter the container with the
[57:04] www-data user and our local user's group
[57:09] GID,
[57:10] and then set umask to make sure that
[57:13] any files generated have 775 permissions,
[57:17] as in full permissions for both user and
[57:20] group.
[57:21] That way the vendor directory and any
[57:24] other files created by this container
[57:26] are readable and writable by the
[57:29] www-data user in this container and our
[57:33] local user on the host.
[57:37] I've also got a combined target for
[57:39] testing the architectural option of
[57:43] PHP-FPM and Nginx in a single container
[57:46] which we discussed earlier.
[57:50] The only complaint with this dev image
[57:52] is that it has different dependencies to
[57:54] the production image. So potentially
[57:57] some tests could be passing in this
[57:59] image but be failing for production.
[58:02] This is a trade-off I'm happy with. But
[58:05] an alternative, more complex approach
[58:07] would be to run composer install with
[58:10] testing dependencies included.
[58:13] Then copy the resultant vendor directory
[58:16] into a target that splits from the
[58:18] production image just before the
[58:20] codebase is copied in and then mount the
[58:24] local code base with a bind mount. And
[58:27] similarly for the dev environment split
[58:30] off from minimal base and install dev
[58:33] tools like Xdebug.
[58:37] The local development environment is
[58:39] handled by Docker Compose. There's not
[58:42] too much to note here. Nginx depends
[58:45] on the Laravel container, and
[58:48] service_started is enough because we just need
[58:50] the socket file to exist before Nginx
[58:53] starts.
[58:55] Ideally, Laravel should depend on
[58:57] Postgres and Redis passing health
[58:59] checks. Postgres has a handy
[59:04] pg_isready command and Redis has PING. But I
[59:08] commented out the dependencies since
[59:10] very often I was just testing Nginx
[59:12] and PHP only and didn't want to wait an
[59:15] extra 2 seconds to get running.
[59:18] There are two instances of Postgres,
[59:21] one for the dev environment and one for
[59:22] testing. It's most common to run tests
[59:26] with an in-memory instance of SQLite
[59:29] as the database. But as we'll see later,
[59:32] our application is unfortunately tightly
[59:35] coupled to Postgres. So we need to run
[59:37] the tests with Postgres to have any
[59:40] confidence that the results represent
[59:42] the application in production.
[59:46] There's a pre-commit script to process
[59:48] the code before committing to the Git
[59:50] repository. For Vue.js, lint-staged
[59:55] handles things pretty well. We're
[59:58] running Prettier, ESLint, and Vitest on only
[1:00:02] staged files that have changed since the
[1:00:04] last commit and running vue-tsc on the
[1:00:08] whole project.
[1:00:10] In a similar way, in the bash script,
[1:00:13] we've got a function that returns an
[1:00:15] array of the staged files that have
[1:00:18] changed since the last commit, so that
[1:00:20] we're not wasting time and resources
[1:00:22] checking the whole codebase on each
[1:00:24] commit.
[1:00:26] For scripts, we feed that array into
[1:00:28] ShellCheck. And for PHP, we're
[1:00:32] validating the composer.json and
[1:00:35] lock files, then feeding the changed
[1:00:38] files array into PHPStan at level 9 and
[1:00:42] Pint, and then running git add to
[1:00:45] re-stage any files that were modified.
[1:00:48] Each makes sure to reference the correct
[1:00:50] configuration file. And we have a
[1:00:53] stricter Pint configuration for non-tests
[1:00:56] than for tests.
[1:00:59] And the last step is to run PHPUnit. It
[1:01:03] executes a script mounted in the Laravel
[1:01:05] container which accepts arguments for
[1:01:09] whether or not to first run database
[1:01:11] migrations,
[1:01:12] the test coverage threshold, test suites
[1:01:16] to include or exclude, and which tests
[1:01:19] specifically to run.
[1:01:23] Since I'm not working as part of a team,
[1:01:25] a Makefile is sufficient for a CI/CD
[1:01:28] pipeline for testing, building, and
[1:01:30] deploying.
[1:01:32] Exec Laravel executes a command in the
[1:01:35] Laravel container.
[1:01:38] Very often we're creating or editing
[1:01:40] files on the host machine via a bind
[1:01:42] mount. So we enter the container with
[1:01:45] both the www-data user and the local
[1:01:50] user's group so we can function in both
[1:01:52] worlds. And we're setting umask so
[1:01:55] that any files created have 775
[1:01:58] permissions so that both the container's
[1:02:01] user and the local user can read and
[1:02:03] write the files created.
[1:02:06] Shell Laravel executes an interactive
[1:02:09] shell in the container.
[1:02:12] The composer commands have the same
[1:02:14] function and are just time savers to do
[1:02:16] specific composer functions without
[1:02:19] opening an interactive terminal.
[1:02:21] And then there are similar exec and
[1:02:23] shell commands for each container.
[1:02:27] If we want to run PHPStan, Pint, or
[1:02:31] PHPUnit outside of the pre-commit check,
[1:02:34] those commands are here. For testing,
[1:02:37] there are commands for the standard test
[1:02:39] suite and for end-to-end tests for
[1:02:42] testing a deployment.
[1:02:45] Then there are commands for building the
[1:02:46] images and for deploying them in
[1:02:49] Kubernetes.
[1:02:51] We can switch contexts between the local
[1:02:53] kind cluster and my physical cluster.
[1:02:59] For this API, I want every response to
[1:03:02] fit within a small set of predictable
[1:03:04] JSON structures.
[1:03:07] The top level of the response should
[1:03:09] always be an object, not an array to
[1:03:11] avoid JSON array hijacking.
[1:03:14] And for every response object, we attach
[1:03:17] an object called meta, which at least
[1:03:20] includes a request ID to help clients
[1:03:23] communicate problems with us and help
[1:03:25] with debugging.
[1:03:27] I'm also including the timestamp and
[1:03:29] script duration for now, but these don't
[1:03:32] have any practical purpose at the
[1:03:33] moment.
[1:03:36] For a query that returns a single
[1:03:38] resource, the resource is the value of a
[1:03:41] key called data.
[1:03:44] For a query that returns multiple
[1:03:46] resources, data is an array of results.
[1:03:51] And we add an object called pagination
[1:03:54] to help the client navigate through the
[1:03:56] data set.
[1:03:58] For a successful query with no resource
[1:04:00] to return, data just contains result
[1:04:04] success.
[1:04:06] And finally, if at least one error
[1:04:08] occurs, data is replaced by an array
[1:04:11] called errors.
[1:04:13] which contains error objects which each
[1:04:15] have an integer code and a string
[1:04:18] message.
[1:04:20] These error codes help the client to
[1:04:22] respond programmatically to a problem
[1:04:24] without needing to parse the message
[1:04:26] text.
[1:04:28] The error itself is a data transfer
[1:04:31] object called API error and the code is
[1:04:35] an integer enum called API error code.
[1:04:41] API error code has a method called
[1:04:44] message which accepts an array of
[1:04:46] placeholders if necessary and returns an
[1:04:50] appropriate message for each error code.
[1:04:53] And the API error constructor calls this
[1:04:57] message method.
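The pairing of an integer error-code enum with a small error object might look like the following. The project itself is PHP; this is only a Python sketch of the same idea, and the specific codes, message templates, and the to_dict helper are assumptions made for illustration.

```python
from enum import IntEnum

class ApiErrorCode(IntEnum):
    # Hypothetical codes: a catch-all, a general validation code,
    # and two specific validation codes.
    UNKNOWN = 1000
    VALIDATION = 2000
    VALIDATION_REQUIRED = 2001
    VALIDATION_MAX_LENGTH = 2002

    def message(self, placeholders=None):
        """Return the message for this code, filling in any placeholders."""
        templates = {
            ApiErrorCode.UNKNOWN: "An unknown error occurred.",
            ApiErrorCode.VALIDATION: "The {field} field is invalid.",
            ApiErrorCode.VALIDATION_REQUIRED: "The {field} field is required.",
            ApiErrorCode.VALIDATION_MAX_LENGTH:
                "The {field} field must not exceed {max} characters.",
        }
        return templates[self].format(**(placeholders or {}))

class ApiError:
    """Data transfer object: the constructor resolves the message itself."""

    def __init__(self, code, placeholders=None):
        self.code = code
        self.message = code.message(placeholders)

    def to_dict(self):
        return {"code": int(self.code), "message": self.message}

err = ApiError(ApiErrorCode.VALIDATION_REQUIRED, {"field": "email"})
print(err.to_dict())
```

Keeping the templates next to the enum members means the list of codes, their messages, and the placeholders each one expects can all be reviewed in one place.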
[1:05:01] So API error code is a very convenient
[1:05:04] single location to plan and construct a
[1:05:07] list of all possible errors, pair them
[1:05:10] up with a suitable message, and a
[1:05:12] reference point for the placeholders
[1:05:14] that we need to pass to the API error
[1:05:17] constructor.
[1:05:19] The aim is to be as specific as possible
[1:05:22] with error codes, but also to have more
[1:05:24] general error codes to fall back on. For
[1:05:28] example, we have specific error codes
[1:05:30] for each type of input validation
[1:05:33] employed in our project. But if
[1:05:35] something goes wrong, there's a general
[1:05:37] validation error code. So if we use a
[1:05:41] new type of input validation in a
[1:05:43] controller or form request, but we don't
[1:05:46] account for it in our validation error
[1:05:48] handler, we can still return a
[1:05:50] semi-specific error response. And in the
[1:05:53] worst case scenario, we can fall back on
[1:05:55] the unknown error code.
[1:05:58] Of course, when such general errors
[1:06:00] happen, we need to analyze the logs and
[1:06:03] construct more insightful error codes
[1:06:05] as a result.
[1:06:08] The JSON response structures mentioned
[1:06:10] each have a method in the class called
[1:06:13] API response builder:
[1:06:16] success, paginated,
[1:06:19] errors.
[1:06:21] And as well as the errors method,
[1:06:23] there's also an error method because
[1:06:26] returning a single error is the most
[1:06:28] common scenario and it's easy to forget
[1:06:30] to enclose it in an array.
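The response shapes described above can be sketched as four small functions. This Python version is illustrative only: the pagination field names, the use of a random UUID as the request ID, and the "result": "success" payload are assumptions, not the project's actual structures.

```python
import uuid

def _meta():
    # Every response carries a meta object with at least a request ID.
    return {"request_id": str(uuid.uuid4())}

def success(data=None):
    """Single resource, or a bare success marker when there is nothing to return."""
    payload = data if data is not None else {"result": "success"}
    return {"data": payload, "meta": _meta()}

def paginated(items, page, per_page, total):
    """Multiple resources plus navigation details (field names are assumed)."""
    return {
        "data": list(items),
        "pagination": {
            "page": page,
            "per_page": per_page,
            "total": total,
            "total_pages": -(-total // per_page),  # ceiling division
        },
        "meta": _meta(),
    }

def errors(error_list):
    """One or more errors: the errors array replaces data."""
    return {"errors": list(error_list), "meta": _meta()}

def error(code, message):
    """Convenience wrapper so a single error is always enclosed in an array."""
    return errors([{"code": code, "message": message}])

response = error(2001, "The email field is required.")
```

In every case the top level is an object rather than a bare array, which matches the rule stated earlier.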
[1:06:35] Since we want to always send JSON
[1:06:37] responses with descriptive error codes,
[1:06:40] we want to replace Laravel's default
[1:06:43] exception handling behavior. In Laravel
[1:06:46] 11 onwards, that's done in
[1:06:48] bootstrap/app.php.
[1:06:53] But since it's likely to get quite
[1:06:54] sizable, I've extracted it to a class
[1:06:56] called API exception handler.
[1:07:00] In simple cases, we can just define an
[1:07:02] error response with API response
[1:07:05] builder.
[1:07:07] In some cases, there's a small amount of
[1:07:09] processing to add more detail to the
[1:07:11] error response.
[1:07:13] And there's a catch all default for any
[1:07:16] exception we're not handling
[1:07:17] specifically.
[1:07:20] For input validation errors, I've
[1:07:22] extracted the logic to validation errors
[1:07:25] builder, which returns an array of the
[1:07:28] data transfer object API error, to be
[1:07:32] fed into API response builder's errors
[1:07:35] method.
[1:07:37] In validation errors builder, we loop
[1:07:40] through the errors returned by the
[1:07:42] validator and match each with the
[1:07:45] correct API error code.
[1:07:49] As mentioned throughout this project,
[1:07:51] whenever there's a scenario that we
[1:07:53] don't expect to happen, like if we fail
[1:07:55] to match the validation error, we log
[1:07:59] the details and fall back on a more
[1:08:01] general API error code.
[1:08:05] For the database, I have maybe something
[1:08:08] of a controversial feature which I'll
[1:08:11] have to explain carefully.
[1:08:14] When we need to write to the database
[1:08:16] and one of the columns has a unique
[1:08:18] constraint or a foreign key constraint,
[1:08:22] the standard practice is to first
[1:08:24] validate the value with a select query
[1:08:28] and if no results are returned, then
[1:08:30] continue with the insert or update
[1:08:33] operation.
[1:08:35] So that's one database query for the
[1:08:37] failure scenario and two queries for the
[1:08:40] success scenario. But
[1:08:43] with a lot of caveats, we can
[1:08:46] potentially skip the validation in
[1:08:48] Laravel, write the value directly to the
[1:08:51] database and if a constraint is
[1:08:54] violated, handle the error returned by
[1:08:56] the database and inform the user of the
[1:08:59] input validation error.
[1:09:02] That means only one database query for
[1:09:05] both success and failure scenarios which
[1:09:09] reduces the load on the database and
[1:09:11] speeds up responses from the client's
[1:09:13] perspective.
[1:09:15] It also removes the race condition
[1:09:17] between the select query for the
[1:09:19] validation and the eventual write to
[1:09:22] database.
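The write-first approach can be demonstrated with a single query per request. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; the project itself uses Postgres through Laravel, and the table, the function name, and the error code 2003 are hypothetical.

```python
import sqlite3

def create_user(conn, email):
    """Insert directly and let the unique constraint do the validation."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return {"data": {"result": "success"}}
    except sqlite3.IntegrityError:
        # The constraint violation is reported to the client as an
        # input validation error rather than as a server error.
        return {"errors": [{"code": 2003,
                            "message": "The email has already been taken."}]}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)

first = create_user(conn, "a@example.com")   # one query, succeeds
second = create_user(conn, "a@example.com")  # one query, constraint rejects it
```

Because the check and the write are the same statement, there is no window between a validating select and the insert in which another request could take the same value.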
[1:09:24] So now for the downsides. One, it splits
[1:09:28] the validation logic in two, which is
[1:09:31] messy. I kept the constraints in the
[1:09:34] form request as comments, as reminders
[1:09:37] of what will be validated by the
[1:09:39] database.
[1:09:41] Two, it's not great for the developer
[1:09:44] experience. We just have to remember to
[1:09:47] not validate constraints in form
[1:09:49] requests and to employ our new strategy
[1:09:52] each time we want to insert or update.
[1:09:57] Three, the database treats the
[1:09:59] constraint violation as an error, not as
[1:10:02] a simple validation check, and it logs
[1:10:05] it as such. So, we'd need a log
[1:10:08] filtration system in production.
[1:10:12] Four, the database returns the error as
[1:10:15] a string which we have to parse
[1:10:18] and that is a somewhat fragile process.
[1:10:21] We need to run rigorous tests with many
[1:10:23] edge cases every time we change database
[1:10:26] version.
[1:10:28] And five, error responses vary per
[1:10:31] vendor. So we're tightly coupling our
[1:10:34] application with our initial choice of
[1:10:36] database vendor, in this case Postgres.
[1:10:41] These are all very serious cons and
[1:10:44] nothing else about this project is
[1:10:46] geared towards the high concurrency
[1:10:49] situation that would give value to the
[1:10:51] pros. So for a real project, I would
[1:10:54] almost certainly not implement this
[1:10:56] feature, but I wanted to explore it as
[1:11:00] an educational exercise. And it's a
[1:11:02] healthy exercise to predict the cons in
[1:11:04] advance, try it, and run head-on into
[1:11:08] any unexpected cons, and then get better
[1:11:10] at analysis of design choices in the
[1:11:13] future.
[1:11:15] So far with Postgres with unique and
[1:11:18] foreign key constraints, I haven't hit
[1:11:21] any critical problems from parsing the
[1:11:23] error message.
[1:11:25] It returns a distinct SQLSTATE code that
[1:11:29] identifies which kind of constraint was violated
[1:11:33] and the offending column is bounded by
[1:11:36] characters that are invalid for a column
[1:11:38] name and preceded by a substantial fixed
[1:11:42] string.
[1:11:44] Postgres does actually allow illegal
[1:11:46] characters in the column name if it's
[1:11:48] bounded by double quotes. So we'd have
[1:11:51] to check for that.
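Editor's sketch, not the speaker's code: a minimal illustration of the parsing described above, assuming the standard PostgreSQL codes 23505 (unique violation) and 23503 (foreign key violation) and the usual `Key (column)=(value)` detail line. The function name and return shape are hypothetical. It captures only the column name, never the value, and it allows for a column name wrapped in double quotes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a PostgreSQL constraint error to a rule and column.
 * Returns null when the error is not a recognised constraint violation.
 */
function parseConstraintViolation(string $sqlState, string $message): ?array
{
    // SQLSTATE class 23 is "integrity constraint violation".
    $rules = [
        '23505' => 'unique',       // unique_violation
        '23503' => 'foreign_key',  // foreign_key_violation
    ];
    if (!isset($rules[$sqlState])) {
        return null;
    }

    // The detail line looks like: Key (email)=(value) already exists.
    // Match either a double-quoted name or a plain name, and stop before the value.
    if (preg_match('/Key \(("[^"]+"|[^)"]+)\)=\(/', $message, $m) !== 1) {
        return null;
    }

    return ['rule' => $rules[$sqlState], 'column' => trim($m[1], '"')];
}
```

A composite key such as `Key (a, b)=(...)` would come back as the single string `a, b`, so a real implementation would need to decide how to report that.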
[1:11:53] Another nuisance is that Laravel
[1:11:55] interpolates the actual values into the
[1:11:58] Postgres error message. And for
[1:12:01] security, we don't want potentially
[1:12:03] sensitive values feeding into our parsing
[1:12:05] logic, especially for a fragile process
[1:12:08] like this that has a lot of logging.
[1:12:13] So it works for Postgres, but if for
[1:12:15] some reason we change database vendor
[1:12:18] we'd need to rewrite the parsing logic
[1:12:20] and there's no guarantee that the error
[1:12:22] message provides the required detail and
[1:12:25] format for us to parse.
[1:12:28] Here's the comparable message from
[1:12:30] SQLite.
[1:12:32] The SQLSTATE code is more generic than
[1:12:34] Postgres's. So we'd have to parse the
[1:12:37] text to discover even which constraint
[1:12:39] was violated.
[1:12:42] There's also a small possibility that a
[1:12:44] new version of Postgres will change the
[1:12:47] error message in a way that hurts this
[1:12:49] feature.
[1:12:52] I implemented this feature with a trait
[1:12:54] called handles DB errors.
[1:12:57] For any code that inserts or updates a
[1:13:00] database record, we wrap it in a closure
[1:13:04] and in the handle DB errors method, and
[1:13:07] the closure is executed inside a try
[1:13:10] catch block looking for QueryException.
[1:13:15] This wrapping design causes very minimal
[1:13:18] disruption and there's a low development
[1:13:20] cost to enabling or disabling the error
[1:13:23] handling feature.
[1:13:25] The design is very reusable. Just add
[1:13:28] the trait to any class that writes to
[1:13:30] database. And compared to a rigid method
[1:13:33] signature, the closure gives us complete
[1:13:36] flexibility around what variables to
[1:13:38] pass, what type to return,
[1:13:42] which interface to use to interact with
[1:13:44] the database, how many queries to run,
[1:13:48] what parts of the code to wrap in a
[1:13:50] database transaction,
[1:13:52] and what other actions we need to run
[1:13:54] alongside the queries.
[1:13:57] Currently, if a constraint violation is
[1:14:00] detected, we're immediately returning a
[1:14:02] validation error response to the client.
[1:14:06] We could consider making this more
[1:14:07] flexible, like allowing more closures to
[1:14:10] be passed to handle specific error
[1:14:12] scenarios. For example, we might want to
[1:14:16] alter our reaction based on which column
[1:14:18] violated the constraint.
[1:14:27] Also regarding the database, we're
[1:14:30] implementing a rule of no select star
[1:14:33] queries or no queries that return all
[1:14:36] columns. This is to reduce the chance of
[1:14:40] exposing sensitive data and to make
[1:14:42] queries faster to run.
[1:14:45] So for any resource that will be output
[1:14:48] by the API, we're defining a resource
[1:14:50] class where we define how the raw output
[1:14:54] is processed into the API output.
[1:14:58] In this case, we're just renaming UUID
[1:15:01] into ID.
[1:15:04] And we're defining the columns to be
[1:15:06] injected into the select query to fetch
[1:15:08] only the relevant columns.
[1:15:11] For the naming scheme, we have the model
[1:15:14] name item, then public or private
[1:15:18] explicitly warning if this resource will
[1:15:20] be output by the API or not. For
[1:15:23] example, the UUID is for public usage
[1:15:27] while the incrementing integer ID is
[1:15:29] strictly for internal usage.
[1:15:32] And we have full or minimal to denote
[1:15:35] which columns to include.
[1:15:38] Paginated results would typically use
[1:15:40] minimal while the results of a single
[1:15:43] specified resource would typically use
[1:15:46] full and include more columns.
[1:15:49] Then in the model class, it imports the
[1:15:51] columns constant to run the relevant
[1:15:54] queries.
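Editor's sketch, not the code from the repository: a framework-free illustration of a resource class that both names the columns to select and shapes the output. The class name follows the naming scheme described above, but the `name` column and the `selectColumns` helper are assumptions. In Laravel itself the constant would more likely be passed to the query builder, something like `Item::select(ItemPublicMinimal::COLUMNS)`.

```php
<?php
declare(strict_types=1);

// Hypothetical resource: model name, then public/private, then full/minimal.
final class ItemPublicMinimal
{
    // Columns injected into the SELECT; the internal integer id is deliberately absent.
    public const COLUMNS = ['uuid', 'name'];

    /** Shape one raw row into the API output, exposing the UUID as "id". */
    public static function toArray(array $row): array
    {
        return ['id' => $row['uuid'], 'name' => $row['name']];
    }
}

/** Build an explicit column list instead of selecting every column. */
function selectColumns(string $table, array $columns): string
{
    return sprintf('SELECT %s FROM %s', implode(', ', $columns), $table);
}
```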
[1:15:59] Let's run through the life cycle of a
[1:16:01] standard request. The first thing we do
[1:16:04] is create an instance of request context
[1:16:07] service which will hold auxiliary
[1:16:10] information about the request.
[1:16:12] We define it as a singleton so that
[1:16:15] anytime it's referenced in a method
[1:16:17] signature throughout the application,
[1:16:19] the service container will inject the
[1:16:21] same instance with the same properties
[1:16:24] similar to how request itself is
[1:16:26] handled.
[1:16:28] Immediately we store the current time so
[1:16:31] that we can calculate the wall clock
[1:16:33] duration at the end of the script
[1:16:35] execution and we store the current
[1:16:38] resource usage of the PHP FPM worker
[1:16:42] process in order to calculate the CPU
[1:16:44] time duration at the end of the script.
[1:16:48] There's an empty array to store the
[1:16:49] duration of any database queries which
[1:16:52] will also be logged at the end. And the
[1:16:55] request ID set way back in the ingress
[1:16:58] controller is saved here and added to
[1:17:01] the context of any logs that will be
[1:17:02] written.
[1:17:05] The logging middleware will call get
[1:17:07] duration milliseconds which is pretty
[1:17:10] simple and also get CPU time
[1:17:14] milliseconds which requires some
[1:17:16] explanation.
[1:17:20] ru_utime.tv_sec
[1:17:22] is the number of whole
[1:17:26] seconds the PHP FPM worker process has
[1:17:30] spent in user mode like the PHP
[1:17:33] interpreter doing work.
[1:17:36] The same key with usec, or microseconds,
[1:17:40] instead of sec is the number of
[1:17:42] microseconds towards the next whole
[1:17:45] second in user mode. So it's bounded by
[1:17:49] 1 million.
[1:17:51] ru_stime.tv_sec
[1:17:55] is the number of whole seconds the
[1:17:58] PHP FPM worker process has spent in
[1:18:01] kernel mode. So that's system calls,
[1:18:05] memory mapping, network stack
[1:18:07] processing etc.
[1:18:10] And the same key with usec instead of sec
[1:18:13] is the number of microseconds towards the
[1:18:15] next whole second in kernel mode.
[1:18:19] The most intuitive mathematical approach
[1:18:21] to calculating the CPU time duration of
[1:18:25] the script is to combine the seconds
[1:18:28] with microseconds
[1:18:30] and sum the user time and kernel time
[1:18:34] and then subtract the start time from
[1:18:36] the end time just like we do for the
[1:18:39] wall clock duration.
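Editor's sketch, not the code from the repository: the intuitive formula written out in plain PHP using the built-in `getrusage()` and `microtime()`. The class name `RequestContext` and the static helper `cpuSeconds` are assumptions; the two public method names follow the ones mentioned in the video.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of the timing part of a request context service.
final class RequestContext
{
    private float $startedAt;
    private array $startUsage;

    public function __construct()
    {
        $this->startedAt = microtime(true); // wall clock, in seconds
        $this->startUsage = getrusage();    // CPU usage of this process so far
    }

    /** Total CPU seconds (user plus kernel) from a getrusage() array. */
    public static function cpuSeconds(array $ru): float
    {
        return $ru['ru_utime.tv_sec'] + $ru['ru_utime.tv_usec'] / 1e6
             + $ru['ru_stime.tv_sec'] + $ru['ru_stime.tv_usec'] / 1e6;
    }

    public function getDurationMilliseconds(): float
    {
        return (microtime(true) - $this->startedAt) * 1000;
    }

    public function getCpuTimeMilliseconds(): float
    {
        // End total minus start total, just like the wall clock duration.
        return (self::cpuSeconds(getrusage()) - self::cpuSeconds($this->startUsage)) * 1000;
    }
}
```

Creating the object at the start of the request and calling the two getters from the logging step at the end gives both durations for that request only, even though the worker process itself is long-lived.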
[1:18:41] I was worried that floating point
[1:18:43] accuracy might be a problem here.
[1:18:46] Floating point accuracy is 15 to 17
[1:18:49] significant figures. So adding
[1:18:51] everything up to a large total before
[1:18:54] the final subtraction could cause some
[1:18:57] precision to be lost.
[1:19:00] And since the difference between the
[1:19:01] start and end times will generally be
[1:19:03] tiny, that loss of resolution could mean
[1:19:06] we get a result of zero once the PHP FPM
[1:19:10] worker passes a certain age.
[1:19:14] So maybe it would be better to have a
[1:19:16] mathematically equivalent but less
[1:19:18] intuitive formula that subtracts large
[1:19:21] numbers from large numbers and sums the
[1:19:24] results at the end.
[1:19:27] But I tried some rough calculations and
[1:19:30] actually the loss of resolution happens
[1:19:32] sometime after 250 years. So yes, the
[1:19:36] intuitive formula will do fine.
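Editor's sketch, not the speaker's calculation: a rough way to probe the precision concern in PHP, whose floats are 64-bit doubles. It adds one microsecond to a large running total of seconds and checks how much of that microsecond survives the subtraction. The function name and the year lengths tried are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

const SECONDS_PER_YEAR = 31557600.0; // 365.25 days

/** How much of a 1 microsecond increment survives when added to $total seconds. */
function microsecondStep(float $total): float
{
    return ($total + 1e-6) - $total;
}

// Print the surviving increment for a few accumulated totals.
foreach ([1, 250, 1000] as $years) {
    printf("%4d years: %.3e\n", $years, microsecondStep($years * SECONDS_PER_YEAR));
}
```

At one year of accumulated CPU time the microsecond comes back almost exactly, at around 250 years it is still nonzero but already coarse, and at 1000 years it is lost entirely. That agrees with the conclusion above that the intuitive formula is safe for any realistic worker lifetime.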
[1:19:41] The CPU time duration will inform our
[1:19:43] decision on setting the PHP directive
[1:19:46] max_execution_time and the wall clock
[1:19:48] duration helps with the PHP FPM
[1:19:51] directive request_terminate_timeout.
[1:19:56] The last two methods are to log the size
[1:19:59] of the response headers and the body.
[1:20:04] The only thing to note here is that the
[1:20:06] headers are ASCII only. So we can always
[1:20:09] safely use strlen which is faster than
[1:20:13] the multibyte equivalent mb_strlen.
[1:20:19] The body is UTF-8,
[1:20:21] so potentially non-ASCII.
[1:20:24] So we can only use strlen as long as in
[1:20:28] php.ini
[1:20:30] we set mbstring.func_overload
[1:20:33] to zero.
[1:20:36] Otherwise, we'd have to use mb_strlen
[1:20:40] and specify 8 bit encoding to be
[1:20:44] sure we're getting the number of bytes
[1:20:46] instead of the number of characters.
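Editor's sketch: the difference between byte count and character count, shown on a small UTF-8 body. The sample JSON is made up for illustration, and the example needs the mbstring extension. Note that `mbstring.func_overload` was deprecated in PHP 7.2 and removed in PHP 8.0, so on PHP 8 `strlen` always counts bytes.

```php
<?php
declare(strict_types=1);

// UTF-8 body containing one two-byte character (e with acute accent).
$body = "{\"name\":\"caf\u{00e9}\"}";

$bytes = strlen($body);                 // counts bytes
$chars = mb_strlen($body, 'UTF-8');     // counts characters
$bytesViaMb = mb_strlen($body, '8bit'); // counts bytes even if strlen were overloaded

printf("bytes=%d chars=%d bytesViaMb=%d\n", $bytes, $chars, $bytesViaMb);
```

For logging the size of a response body, the byte count is the figure that matters, which is why the character count would be the wrong one to record.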
[1:20:50] Then we add a listener for the database
[1:20:53] to add the query time to the query
[1:20:55] duration array we just created.
[1:20:59] And next we configure the rate limiter,
[1:21:03] although it isn't actually applied at
[1:21:05] this point in the request.
[1:21:07] Nginx rate limits by IP and
[1:21:10] intercepts those requests before they
[1:21:12] reach Laravel. So in Laravel, we're
[1:21:15] limiting by user ID and that needs to
[1:21:18] happen after authentication.
[1:21:21] For us, it happens later in
[1:21:23] routes/api.php,
[1:21:25] straight after the authentication
[1:21:28] middleware.
[1:21:30] Malicious users can get around this with
[1:21:33] a coordinated system of multiple
[1:21:35] accounts and multiple IPs.
[1:21:38] If we were worried about that and we
[1:21:41] couldn't control user accounts or
[1:21:43] whitelisted IPs more tightly, then we
[1:21:46] might consider running pattern
[1:21:47] recognition of usage, i.e. combine the
[1:21:51] usage of two or more users, assess them
[1:21:55] as if they were a single user and see if
[1:21:57] it adds up to a malicious usage pattern.
[1:22:02] Or we might try to gather a list of
[1:22:04] known VPN IP addresses and monitor those
[1:22:07] accounts more closely.
[1:22:10] For Nginx's rate limit on IP
[1:22:12] addresses, we need to consider if our
[1:22:15] clients are in a business at the same
[1:22:17] address.
[1:22:18] For example, if all users log on at 9:00
[1:22:22] a.m. on the same IP address, then we
[1:22:24] might need to loosen Nginx's rate limit on
[1:22:27] IP addresses.
[1:22:33] Next, the request hits the middleware.
[1:22:36] I've disabled global middleware and
[1:22:39] moved most of the default middleware to
[1:22:41] the API group.
[1:22:43] That's because the web group can only be
[1:22:46] accessed internally within the
[1:22:47] Kubernetes cluster.
[1:22:50] For trust proxies, we're providing the
[1:22:53] CIDR range that the ingress controller
[1:22:55] is guaranteed to be within. And then
[1:22:58] Laravel knows that the client's IP is
[1:23:01] the real IP and that the connection is
[1:23:04] indeed secure.
[1:23:07] Then API requests pass to
[1:23:11] routes/api.php,
[1:23:13] and for routes that require
[1:23:14] authentication the requests pass through
[1:23:17] the authentication middleware and the
[1:23:20] rate limiter middleware we discussed
[1:23:22] earlier.
[1:23:24] I extended the authenticate class as a
[1:23:27] convenient way to add the user ID to the
[1:23:30] context of logs created after
[1:23:32] authentication
[1:23:34] and also because in the original class
[1:23:36] if an unauthenticated request didn't
[1:23:39] have the accept header of
[1:23:42] application/json
[1:23:44] it tried to redirect the user to a login
[1:23:46] page but our API is purely JSON so
[1:23:50] that's not the desired behavior.
[1:23:53] The last endpoint is for browsers to
[1:23:55] report content security policy
[1:23:57] violations which go straight into the
[1:23:59] log.
[1:24:02] The web routes are mostly for the
[1:24:04] Kubernetes probes mentioned earlier. /
[1:24:07] Laravel readiness simply replies
[1:24:10] immediately.
[1:24:12] / Laravel startup, if you recall, tests
[1:24:16] the connections to the database and the
[1:24:18] cache in a try catch block and returns
[1:24:21] an error HTTP status code if there's any
[1:24:24] problem
[1:24:26] and / Laravel status is for my own
[1:24:29] tinkering and analysis of realpath cache and
[1:24:32] opcache in production to further tweak
[1:24:35] those configurations.
[1:24:37] Unfortunately, we can't just run those
[1:24:39] commands in the command line interface
[1:24:42] because the CLI is separate from PHP FPM
[1:24:45] and maintains its own opcache memory
[1:24:48] pool.
[1:24:49] And finally, an API request runs back
[1:24:52] through the middleware.
[1:24:54] The HandleCors middleware along with
[1:24:57] its config lets us tell browsers that
[1:25:00] the Vue.js front end is permitted to
[1:25:02] access the API.
[1:25:05] And the inject meta middleware adds the
[1:25:08] meta object to each JSON response object
[1:25:11] as it's passing out of the door.
[1:25:14] After the response has been sent, the
[1:25:16] last action is to log the request for
[1:25:18] future analysis.
[1:25:23] I won't cover the Vue.js front end
[1:25:25] because this video is already quite long
[1:25:28] and it just logs into the API, stores
[1:25:30] the token and interacts with the API in
[1:25:33] a simple manner.
[1:25:35] All the source code is available in the
[1:25:37] description of the video and please let
[1:25:39] me know if you have any questions or any
[1:25:42] improvements on any part of the code.
[1:25:44] Thanks.