[0:00] Hello everyone. This video is a
[0:02] discussion of a Laravel RESTful API
[0:04] project using PHP-FPM and Nginx with
[0:08] a Postgres database, all in Docker
[0:11] containers deployed in Kubernetes.
[0:15] So I'll cover the setup in Kubernetes,
[0:18] the choices and thought processes when
[0:20] configuring PHP-FPM and Nginx, the
[0:24] design of the development environment,
[0:27] testing, deployment, and some features
[0:29] of the API. Plus, there's a Vue.js front
[0:33] end thrown in as well to log into the
[0:35] API and interact with it. It's a bit too
[0:38] simple to discuss in this video, but
[0:40] it's also included in the code, and all
[0:43] the code is in the repository in the
[0:45] video description.
[0:49] The production environment is my
[0:51] Kubernetes cluster at home with a single
[0:54] control plane and two worker nodes on a
[0:57] pretty slow internet connection,
[0:59] but that doesn't have too much impact on
[1:01] the project and it can mostly be applied
[1:03] anywhere.
[1:06] The API itself has been kept generic so
[1:08] you won't have to listen to any domain
[1:10] specific concepts. The features I've
[1:13] added are focused on internal workings
[1:15] rather than a specific API use case. So
[1:18] they could be applied and expanded to
[1:20] many use cases.
[1:22] For example, responses are restricted to
[1:26] a set predictable structure, and each
[1:29] error has a unique integer error code,
[1:31] so the client can more easily handle
[1:34] error responses programmatically.
[1:37] Input validation for database columns
[1:39] with unique or foreign key constraints
[1:42] has been pushed to the database to
[1:44] reduce the number of queries needed.
[1:47] That's a controversial one, so we'll
[1:49] discuss the pros and the cons.
[1:52] There's also a fix for a very common
[1:54] Nginx inefficiency regarding HTTP
[1:57] compression,
[1:59] and there are some helpful logs to help
[2:02] us set the PHP-FPM and Nginx
[2:05] configurations.
[2:08] So the structure of the video is
[2:11] production environment,
[2:13] PHP-FPM and Nginx configuration,
[2:16] the Dockerfiles, the dev environment,
[2:20] and something of a CI/CD pipeline, though
[2:23] very minimal, then the Laravel API itself,
[2:28] with discussion of security
[2:30] considerations throughout.
[2:35] So what does the production cluster look
[2:37] like? The ingress controller manages
[2:40] traffic for the cluster, handles TLS
[2:43] termination, and applies HTTP
[2:45] compression. It forwards API requests to
[2:49] the Nginx and Laravel pod. If traffic
[2:53] increases, the Nginx and Laravel pod
[2:55] is duplicated by a horizontal pod
[2:58] autoscaler.
[2:59] So, as usual, we want to make sure that
[3:02] it's completely stateless.
[3:05] Then for the database we're using
[3:07] Postgres managed by a StatefulSet.
[3:11] Postgres zero is the primary instance
[3:13] that all Laravel instances write to.
[3:17] Postgres one and two and so on are the
[3:20] read-only standby instances to
[3:23] increase read throughput if we have high
[3:25] concurrent usage.
[3:28] So whenever a new standby instance
[3:30] starts up, it duplicates the primary
[3:32] database with
[3:35] pg_basebackup.
[3:37] And whenever the primary instance
[3:39] executes a write operation, it sends the
[3:42] WAL, or write-ahead log, records to each
[3:47] standby so that all instances are
[3:49] synchronized in near real time.
[3:52] To configure this in Laravel, in the
[3:55] database config file, add read and write
[3:58] elements to the database you're using,
[4:02] specifying the pod for the write
[4:04] instance and the service for the read
[4:06] instances.
[4:09] And for the development or testing
[4:10] environments, override that in the
[4:13] Laravel .env file, specifying the
[4:16] database's service name in Docker
[4:18] Compose.
[4:22] For cache, we're using Redis,
[4:25] and of course there's the Vue front end,
[4:27] which is mostly decoupled. It could be
[4:29] hosted here or anywhere else. And finally,
[4:33] all these components are scraped by
[4:35] Prometheus, and Grafana molds that data
[4:39] into dashboards, which are also
[4:42] accessible through the ingress controller.
[4:46] Since the cluster has so few nodes, we
[4:49] can present the CPU and memory usage of
[4:52] each node in a single dashboard.
[4:55] And the other dashboard I watch most is
[4:58] for the ingress controller, especially
[5:00] latencies for the three connected
[5:02] services, the API, the front end, and
[5:06] the Grafana front end we're using right
[5:08] now.
[5:10] On my slow home internet, the latencies
[5:12] don't help us model production systems
[5:15] very much. They're pretty slow, but the
[5:18] variance can still be insightful.
[5:20] The variance is expressed here with the
[5:23] average latency of the fastest 50%, 95%
[5:28] and 99% of requests.
[5:35] In the ingress controller, we're most
[5:38] commonly receiving requests from the CDN
[5:41] with the original client's IP in the
[5:45] X-Forwarded-For header.
[5:47] So by specifying the CDN CIDR ranges
[5:50] that we trust, we can safely strip them
[5:53] from the forwarding chain and just leave
[5:56] the client's IP.
[5:58] That's important for logging and for
[6:00] rate limiting by IP address downstream
[6:04] in Nginx or Laravel.
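As an illustration of the trusted-proxy idea just described, here is a minimal Python sketch. Python is used only for illustration and this is not the ingress controller's implementation; the function name `client_ip` and the CIDR range are hypothetical. It walks the forwarding chain from the nearest hop backwards and returns the first address that is not inside a trusted range.

```python
from ipaddress import ip_address, ip_network


def client_ip(x_forwarded_for, remote_addr, trusted_cidrs):
    """Return the right-most address in the chain that is not a trusted proxy."""
    trusted = [ip_network(cidr) for cidr in trusted_cidrs]
    # Full chain: what the header claims, then the peer that actually connected.
    chain = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    chain.append(remote_addr)
    # Walk from the nearest hop backwards, skipping hops we trust.
    for hop in reversed(chain):
        addr = ip_address(hop)
        if not any(addr in net for net in trusted):
            return hop
    # Every hop was trusted, so fall back to the left-most entry.
    return chain[0]


TRUSTED = ["203.0.113.0/24"]  # hypothetical CDN range (documentation address block)
print(client_ip("198.51.100.7, 203.0.113.10", "203.0.113.20", TRUSTED))
```

Reading from the right matters: anything a client prepends to the header sits to the left of the address the trusted proxy recorded, so it is never reached.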
[6:07] The ingress controller, like any
[6:09] Nginx instance, generates a random
[6:12] request ID and here we're attaching it
[6:15] to the web request with a custom header.
[6:19] And that way we can reference a
[6:21] consistent request ID here in the
[6:23] ingress controller and also as the
[6:26] request passes on through Nginx and
[6:28] Laravel and back again.
[6:32] Regarding HTTP compression,
[6:35] by default, gzip and Brotli compress
[6:39] files of any size, but they both add
[6:42] metadata to the compressed files. So for
[6:45] files that are already small, we're
[6:47] actually expending CPU power to compress
[6:51] a file and actually increase the amount
[6:53] of data to be transferred.
[6:57] So we specify brotli_min_length and
[7:00] gzip_min_length to set the file size
[7:03] threshold at which the ingress
[7:06] controller will apply compression.
[7:09] But what's the logical threshold to set?
[7:13] Well, the equilibrium point at which
[7:16] compression starts to reduce file size
[7:18] is different per file, but is usually
[7:21] between 100 and 400 bytes.
[7:25] So, is that the answer? Well, not
[7:28] really. When we send a message over TCP,
[7:32] as long as the payload fits within one
[7:34] TCP segment,
[7:36] the size of that payload has a
[7:38] negligible impact on the transmission
[7:41] time.
[7:42] It's like adding more passengers to a
[7:44] plane. It has almost no impact on the
[7:47] flight time.
[7:50] So from the client's perspective,
[7:52] latency doesn't scale linearly with the
[7:55] file size. It potentially jumps stepwise
[7:59] when the file size necessitates sending
[8:02] another TCP segment.
[8:05] That means HTTP compression is
[8:08] worthwhile only if it has a decent
[8:11] likelihood of reducing the number of TCP
[8:14] segments needed to hold a particular
[8:16] payload.
[8:18] So a legitimate strategy is to set the
[8:21] HTTP compression threshold at the file
[8:25] size that would trigger a second TCP
[8:28] segment.
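The stepwise argument can be sketched in a few lines of Python. The helper names and the 1,400-byte segment payload are assumptions for illustration only; the real usable size depends on the header overheads discussed next.

```python
import math


def segments_needed(payload_bytes, mss_bytes):
    """Number of TCP segments required to carry a payload."""
    return max(1, math.ceil(payload_bytes / mss_bytes))


def compression_saves_a_segment(raw_bytes, compressed_bytes, mss_bytes):
    """True only when compressing reduces the segment count."""
    return segments_needed(compressed_bytes, mss_bytes) < segments_needed(raw_bytes, mss_bytes)


MSS = 1400  # hypothetical usable payload per segment, in bytes
print(compression_saves_a_segment(900, 400, MSS))    # both sizes fit in one segment
print(compression_saves_a_segment(2000, 1100, MSS))  # two segments shrink to one
```

Under this model the first case spends CPU for no transmission benefit, while the second removes a whole segment.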
[8:31] The maximum size of a TCP segment
[8:34] depends on the MTU, the maximum
[8:38] transmission unit on layer 3 of the OSI
[8:41] model, which is typically 1,500 bytes.
[8:46] Each IP packet has an IP header taking
[8:50] 20 to 60 bytes and a TCP header taking
[8:55] 20 to 60 bytes. And the rest of the
[8:59] 1,500 bytes is for the payload.
[9:03] For the largest possible payload that is
[9:06] still contained in one TCP segment,
[9:10] it would contain the TLS record for
[9:12] encryption, the HTTP headers, and
[9:16] finally once all of that is accounted
[9:18] for, the remaining space is for the HTTP
[9:21] body, which is the candidate for
[9:23] compression.
[9:26] In this project, it's too early to
[9:27] finalize what our standard HTTP headers
[9:30] will be. So, we can't finalize this
[9:33] optimization yet. But this is the
[9:35] formula that will decide it once all of
[9:37] the components are confirmed.
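The on-screen formula is not captured in this transcript, so the following Python sketch reconstructs it from the spoken description only: start from the MTU and subtract each layer's overhead. The function name, and the TLS and HTTP header sizes in the example call, are placeholder assumptions rather than measured values.

```python
def compression_threshold(mtu=1500, ip_header=20, tcp_header=20,
                          tls_overhead=0, http_headers=0):
    """Largest HTTP body, in bytes, that still fits in a single TCP segment."""
    body_room = mtu - ip_header - tcp_header - tls_overhead - http_headers
    return max(0, body_room)


# Placeholder overheads: 30 bytes of TLS record framing, 400 bytes of HTTP headers.
print(compression_threshold(tls_overhead=30, http_headers=400))
```

Bodies at or below the returned size already fit in one segment, so under this reasoning the minimum-length settings would sit at roughly that value once the real header sizes are known.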
[9:41] But there's one last twist. If the
[9:44] response doesn't have a Content-Length
[9:47] header, then Nginx ignores the
[9:51] gzip_min_length and brotli_min_length
[9:54] directives and compresses every file
[9:57] regardless of file size.
[10:00] So for that reason in Laravel, we've got
[10:03] a middleware to measure the body and add
[10:06] the Content-Length header.
[10:09] There's a php.ini directive called
[10:12] mbstring.func_overload.
[10:17] If we set it to zero, we can safely use
[10:19] the faster strlen function for adding the
[10:24] Content-Length header.
[10:26] Otherwise, we'd need to use the
[10:28] multibyte equivalent,
[10:32] mb_strlen, and specify 8bit encoding to be
[10:36] sure we're getting the number of bytes
[10:38] instead of the character count.
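The difference between byte count and character count is easy to see in any language. This short Python sketch is only an analogy for the PHP functions mentioned above; `content_length` is a hypothetical helper, not the project's middleware.

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Content-Length counts encoded bytes, not characters."""
    return len(body.encode(encoding))


body = "h\u00e9llo"  # "héllo": the accented letter takes two bytes in UTF-8
print(len(body), content_length(body))
```

Using the character count here would understate the length by one byte, which is exactly the mismatch the byte-oriented measurement avoids.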
[10:41] We have to make sure this middleware is
[10:42] late in the stack after any middleware
[10:45] that will modify the body because if the
[10:48] response body is longer than the Content-Length
[10:50] header value, Nginx will cut off the
[10:52] excess.
[10:58] Okay, on to Laravel and Nginx. There
[11:02] are four ways of connecting PHP-FPM with
[11:06] Nginx in Kubernetes.
[11:08] The first decision is whether to put the
[11:11] containers in separate pods or in the
[11:13] same pod.
[11:16] Nginx can handle far more connections
[11:18] than PHP-FPM can, and separate pods would
[11:22] let us scale them independently. So we
[11:25] could save the memory overhead of the
[11:27] excess Nginx pods.
[11:30] Each Nginx pod has an overhead of
[11:32] about 10 MB.
[11:36] The downside is that Nginx could
[11:38] forward requests to PHP-FPM instances on
[11:42] different nodes, which would add network
[11:45] latency.
[11:46] Getting around this is fiddly and might
[11:49] not be worth the memory saved.
[11:52] Also, separate pods means that the logs
[11:55] for a particular request would be split
[11:58] up between different pods.
[12:04] If we go with a single pod, that opens
[12:07] up the choice of how Nginx and PHP-FPM
[12:10] will talk to each other. Either by using
[12:13] TCP or by mounting a shared volume onto
[12:17] both containers to hold a Unix socket
[12:20] file.
[12:22] Each TCP connection needs to be
[12:25] established with a TCP handshake
[12:27] consisting of three messages sent back
[12:29] and forth. With the default Nginx
[12:33] configuration, that handshake happens
[12:35] for every single request routed to
[12:38] PHP-FPM.
[12:40] Each message sent is bundled with IP and
[12:43] TCP headers increasing the amount of
[12:46] data to be transferred. And depending on
[12:48] the size of the request and the
[12:50] response, they might be broken up into
[12:53] packets before sending, then reassembled
[12:56] at the other end. And finally, the
[12:58] connection needs to be torn down with
[13:00] another three messages.
[13:03] Due to all this redundant stuff, I
[13:06] really expected sockets to be measurably
[13:08] faster than TCP.
[13:11] Maybe the teardown would happen
[13:12] concurrently to the response being sent
[13:14] out, but the rest of it surely increases
[13:17] latency.
[13:19] However, in the very quick tests that I
[13:22] set up, latency was almost identical
[13:25] between the two methods.
[13:27] That's interesting academically, and I
[13:30] want to investigate more when I have the
[13:32] time. But practically speaking, if we're
[13:36] interested in such small savings in
[13:37] latency, then we'd likely be better off
[13:40] considering Swoole instead of PHP-FPM and
[13:43] Nginx.
[13:45] And then this whole architectural
[13:47] decision would go away.
[13:50] Both TCP and sockets benefit from the
[13:53] Nginx directive fastcgi_keep_conn,
[13:57] which keeps the connection open between
[14:00] requests. So we wouldn't need the TCP
[14:02] handshake for every request.
[14:05] The PHP-FPM counterparts are
[14:09] pm.process_idle_timeout, which sets the time
[14:13] a worker can be idle before it's killed,
[14:17] and pm.max_requests,
[14:19] which caps how many requests a
[14:22] worker can serve before it's respawned.
[14:27] And we need to set fastcgi_pass in
[14:30] Nginx and listen in PHP-FPM.
[14:36] For TCP connections, we tell them which
[14:38] port and for socket, we tell them the
[14:41] location of the socket file.
[14:46] Then the fourth and final option for
[14:48] connecting Nginx with PHP-FPM in
[14:51] Kubernetes is putting them in the same
[14:54] container in the same pod.
[14:57] This is workable but it creates
[14:59] complications.
[15:01] Kubernetes monitors the status of the
[15:03] process with PID 1 as a health check. If
[15:07] the process exits, Kubernetes stops the
[15:10] container even if other processes are
[15:13] running. And if another process exits,
[15:16] Kubernetes has no idea and does nothing.
[15:20] So if a container has more than one key
[15:22] process, we need to implement a
[15:24] replacement for the Kubernetes health
[15:26] checks and process management.
[15:31] In the end, I went with the middle
[15:33] ground option of same pod, different
[15:36] containers with a shared volume between
[15:38] them for a Unix socket file. Mostly
[15:42] because I hadn't tried this setup
[15:43] before.
[15:45] And we'll see the detailed
[15:46] implementation in the Kubernetes
[15:48] manifest and the Dockerfile later in
[15:50] the video.
[15:55] So with that decided, let's jump into
[15:57] the pod manifest.
[15:59] As I explain components, I'll also build
[16:02] up this visual representation to help
[16:05] visualize how everything links together.
[16:08] Obviously the Laravel and Nginx
[16:10] containers are at the core and the first
[16:13] step is to inject the configuration
[16:15] files with Kubernetes Secrets and
[16:18] ConfigMaps and to mount the shared volume for
[16:21] PHP-FPM to create the socket file we
[16:24] just talked about.
[16:27] If we can make the whole file system in
[16:29] a container read only, that's great for
[16:32] security. So we need to identify where
[16:35] the application needs write access and
[16:38] then mount volumes at those locations
[16:41] with write access and make everything
[16:43] else read-only,
[16:45] and the socket volume is our first one
[16:47] of those.
[16:52] The Laravel bootstrap caches never
[16:55] change in production.
[16:57] So at first thought we'd run the cache
[16:59] creation commands in the Dockerfile so
[17:02] that they're part of the image and then
[17:04] we'd make them read-only in production.
[17:08] That's fine for event:cache, route:cache,
[17:11] and view:cache. But config:cache needs
[17:15] to read the .env file.
[17:19] For security reasons, we can't put
[17:21] sensitive files like the .env into the
[17:24] image. So the only safe option is to run
[17:28] config:cache within each pod as it
[17:31] starts up.
[17:33] So that means the cache directory needs
[17:36] to have write access in production. But
[17:39] we'd prefer it to only have read access
[17:41] because it never changes once created,
[17:45] and any attacker that gets write access
[17:47] to the bootstrap caches could obviously
[17:49] do serious damage.
[17:52] But there is a way to get the best of
[17:54] both worlds.
[17:56] First, in the Dockerfile, we rename the
[17:59] bootstrap/cache
[18:01] directory to something else like cache
[18:05] temp.
[18:07] Then in Kubernetes, we run a Laravel
[18:10] init container when the pod starts up.
[18:14] It has the .env file injected and a
[18:18] writable volume mounted at
[18:20] bootstrap/cache.
[18:24] We copy everything from the temp cache
[18:26] directory into the volume at
[18:29] bootstrap/cache
[18:31] and then run php artisan config:cache to
[18:37] create the final bootstrap cache file.
[18:41] The time command just logs the memory
[18:43] usage to help us set resource limits
[18:45] later.
[18:48] Then after the init container has
[18:50] finished, the main Laravel container
[18:53] starts up with the cache volume mounted
[18:56] with readOnly: true.
[19:00] And that's how we get a read-only cache
[19:02] with sensitive data compiled at the pod
[19:06] startup.
[19:11] We're also running additive database
[19:14] migrations in the init container.
[19:17] The --isolated flag is very
[19:20] consequential.
[19:22] It means while this migration is
[19:24] running, although normal reads and
[19:26] writes can still happen concurrently,
[19:29] no other migrate command with the
[19:32] isolated flag can begin.
[19:35] A migrate command without the isolated
[19:38] flag can still run. So it's important
[19:40] that we add it to every migrate command
[19:43] in production to make sure concurrent
[19:46] migrations are impossible.
[19:49] It uses the cache to track whether the
[19:52] isolated command is running or not
[19:55] currently.
[19:56] So if you're using the database for
[19:59] cache, it creates a catch-22 situation
[20:02] for your first migration. You'd need to
[20:04] run migrations once without the isolated
[20:07] flag first,
[20:09] but most applications would just use
[20:11] Redis, so it wouldn't be a problem.
[20:16] An unexpected problem I ran into is that
[20:20] if a container crashes and gets
[20:22] restarted by Kubernetes,
[20:24] the mounted volumes are not cleared.
[20:28] That's the behavior we want most of the
[20:30] time. We want data to be permanent for
[20:33] the life of the pod, but it can cause
[20:36] some misleading error logs. In the init
[20:39] container startup script, it copied and
[20:42] created the cache files with no problem.
[20:45] Then it hit an error with migrations.
[20:48] So, Kubernetes killed the container and
[20:51] ran it again. But on the second run, the
[20:54] volume was already populated. So the
[20:57] error logs were about file permissions,
[20:59] not the migration commands.
[21:02] So just bear that in mind. If all else
[21:05] is equal, move any operations on mounted
[21:08] volumes to the end of the script. But in
[21:11] our case, migrations actually depend on
[21:13] the cache files being present.
[21:18] We've got another init container to set
[21:20] up the directory structure for Nginx's
[21:23] writable volume.
[21:25] I prefer to use Chainguard images to be
[21:28] minimal and more secure, but the CPUs of
[21:31] my nodes are too old and don't support
[21:33] them.
[21:35] And finally, we have one last container
[21:38] that scrapes the Nginx metrics
[21:40] endpoint and presents that data in a
[21:43] format that Prometheus can then scrape.
[21:46] Port 8080 is for publicly available
[21:50] endpoints via ingress. Port 8081
[21:54] is for internal traffic like health
[21:56] checks and metrics and then Prometheus
[21:59] scrapes the exporter on its default port
[22:02] 9113.
[22:08] Kubernetes provides three types of
[22:10] probes or health checks. Probes are
[22:14] attached to containers, but the actions
[22:17] on success or failure can impact either
[22:20] that container it's attached to or the
[22:23] whole pod.
[22:25] When a pod starts up, if at least one
[22:28] container has a startup probe, then that
[22:31] pod won't initially be added to the
[22:34] service's endpoints. So, it won't be
[22:37] accessible by other pods or by outside
[22:40] traffic from ingress.
[22:43] If a startup probe fails, the container
[22:46] it's attached to is killed and by
[22:49] default configuration in Kubernetes, any
[22:52] killed container is instantly restarted.
[22:56] So that's hoping that a restart or a
[22:58] slight delay will fix whatever caused
[23:01] the startup probe to fail.
[23:04] Once a container's startup probe passes,
[23:07] it will never run again. And instead,
[23:10] the container's readiness probe and
[23:12] liveness probe begin running if it has
[23:15] them. And they will then run repeatedly
[23:18] for as long as the pod exists.
[23:21] Once all startup probes in a pod pass
[23:25] and if there are no readiness probes,
[23:28] then the pod is added to the service's
[23:30] endpoints and it starts serving traffic.
[23:34] If there are one or more readiness
[23:36] probes, then the pod waits on them.
[23:40] Once all readiness probes pass, the pod
[23:44] is added to the service's endpoints.
[23:47] But the readiness probes keep running
[23:49] continuously.
[23:51] And if at any time a readiness probe
[23:53] fails, the pod is removed again. So a
[23:57] failed readiness probe only has a pod
[23:59] level effect. It doesn't kill the
[24:01] container.
[24:03] That's what liveness probes are for.
[24:05] When a liveness probe fails, it has no
[24:09] pod level effect. The pod can still
[24:11] receive external traffic. Instead, a
[24:14] failed liveness probe kills the
[24:16] container it's attached to.
[24:20] So, to summarize, a failed readiness
[24:22] probe stops traffic flow reaching the
[24:25] pod. A failed liveness probe kills the
[24:28] container it's attached to. And a failed
[24:32] startup probe does both of those, but
[24:35] only when the pod is starting up.
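That summary can be written down as a small lookup table. This Python sketch only restates the summary above as data; the class and field names are made up for illustration and it does not model restart back-off, thresholds, or timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeFailure:
    blocks_traffic: bool   # pod is kept out of, or removed from, the service's endpoints
    kills_container: bool  # the container the probe is attached to is killed


# Effects of a failed probe, as summarized in the talk.
EFFECTS = {
    "readiness": ProbeFailure(blocks_traffic=True, kills_container=False),
    "liveness": ProbeFailure(blocks_traffic=False, kills_container=True),
    "startup": ProbeFailure(blocks_traffic=True, kills_container=True),  # during startup only
}

for kind, effect in EFFECTS.items():
    print(kind, effect)
```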
[24:39] Phew. I think that's the most concise
[24:41] summary of probes I can give. So, how
[24:44] can we apply these to our Laravel and
[24:46] EngineX pod?
[24:49] Well, we don't want requests reaching a
[24:52] broken pod. So, a readiness check is
[24:55] crucial.
[24:57] And the ability to serve requests
[24:59] depends on Nginx and Laravel both
[25:02] working. So we put the readiness probe
[25:05] on the Nginx container to query a
[25:08] Laravel endpoint that returns a simple
[25:12] plain text response.
[25:14] That means the readiness check only
[25:16] passes if Nginx and Laravel and their
[25:19] connection are all fine.
[25:23] The ability to serve requests also
[25:25] depends on the connection to the
[25:27] database and the cache. So we might
[25:31] consider checking those connections as
[25:33] part of the readiness check.
[25:36] But if a database problem did occur, it
[25:40] would affect all Laravel pods.
[25:43] We wouldn't have a mix of healthy and
[25:45] unhealthy pods. And the readiness probe
[25:49] would react by removing this pod from
[25:51] its service, which doesn't do anything
[25:54] to solve the database problem. And
[25:57] actually we'd slightly prefer to keep
[25:59] the Laravel pod serving requests to give
[26:02] as graceful a response as possible.
[26:06] And then we'd rely on some other probe
[26:09] to heal the database problem closer to
[26:11] where it occurred. So no, the readiness
[26:15] probe should not check the connections
[26:17] to database and cache.
[26:20] However, it is a good idea to add those
[26:23] connections to the startup probe. That's
[26:26] so that if we're deploying a new version
[26:28] and I've messed up the connection
[26:29] configurations,
[26:31] Kubernetes will stop it going live and
[26:33] keep the old version alive, serving
[26:36] requests.
[26:39] Another reason to have a startup check
[26:41] is because we're doing opcache
[26:42] preloading at startup. So we need some
[26:45] flexibility around the slightly
[26:47] unpredictable bootup time.
[26:51] The startup probe of course also needs
[26:53] to check both Nginx and Laravel and
[26:56] their connection. So we add it to the
[26:58] Nginx container and call a startup
[27:01] endpoint in Laravel.
[27:05] This design has one perverse side effect
[27:07] that if the startup probe fails, it
[27:10] kills the Nginx container it's
[27:12] attached to. Even though the problem is
[27:15] much more likely to come from the
[27:16] Laravel container or the database or
[27:19] cache connections, but that's one
[27:21] imperfection I think we can live with.
[27:25] And finally, what about liveness
[27:27] probes? Well, in the Nginx
[27:30] configuration, I created an endpoint
[27:32] that returns a simple plain text
[27:34] response, but I think the chance of
[27:36] Nginx messing up is so low that,
[27:39] currently, I don't think it's worth
[27:40] running a constant probe.
[27:43] Laravel is a bit more likely to mess up.
[27:45] So, we've got a liveness probe querying
[27:48] the PHP-FPM status page.
[27:56] The last big topic of this manifest is
[27:58] the memory limits that Kubernetes
[28:00] imposes on each container. So this is
[28:03] where we'll transition to the topic of
[28:05] configuration for PHP, PHP-FPM and
[28:09] Nginx.
[28:11] The Kubernetes memory limit is a failsafe
[28:13] that kills the container if it's
[28:16] exceeded.
[28:17] That's a pretty drastic action. So we
[28:20] need to set it high enough to cover the
[28:22] peak memory usage in normal operation
[28:26] so that it's only triggered by abnormal
[28:28] memory usage that we want to catch early
[28:30] and contain.
[28:33] This is my formula to estimate peak
[28:35] usage in normal operation.
[28:38] My understanding can be improved further
[28:40] but I think this is a decent formula for
[28:42] now. And at the end we use a margin
[28:45] component to represent the degree of
[28:48] confidence we have in our estimation.
[28:51] PHP-FPM has one master process and a
[28:55] variable number of workers that serve
[28:57] web requests. So we need to determine
[29:00] which memory expenses are per worker and
[29:04] which are shared between all workers.
[29:07] The master process overhead, the PHP
[29:10] interpreter and its extensions,
[29:13] OPcache, and any mounted volumes stored in
[29:16] memory should all be counted once. And
[29:19] then everything in the brackets is
[29:21] multiplied by the maximum number of
[29:23] workers specified by the PHP-FPM
[29:26] directive pm.max_children.
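The on-screen formula is not in the transcript, so this Python sketch follows only the spoken structure: shared costs counted once, per-worker costs multiplied by pm.max_children, then a margin. The function name, the example figures, and the choice to apply the margin as a fraction of the estimate are all assumptions.

```python
def fpm_memory_limit_mb(shared_mb, per_worker_mb, max_children, margin=0.25):
    """Estimated peak usage in normal operation, plus a confidence margin."""
    # Shared items (master process, interpreter, OPcache, in-memory volumes) count once.
    # Per-worker items are multiplied by the worker cap.
    peak = shared_mb + per_worker_mb * max_children
    return peak * (1 + margin)


# Hypothetical figures: 60 MB shared, 40 MB per worker, 8 workers, 25% margin.
print(fpm_memory_limit_mb(shared_mb=60, per_worker_mb=40, max_children=8))
```

A lower margin expresses more confidence in the estimate; the per-worker term dominates as soon as the worker count grows.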
[29:31] For an application that's 100% CPU
[29:34] intensive, we'd set pm.max_children equal
[29:39] to the number of CPU cores available to
[29:41] the container.
[29:43] But the larger the I/O wait is expected
[29:46] to be, like waiting for database queries
[29:49] to come back, the more we can raise
[29:54] pm.max_children above the number of cores.
[29:58] Workers process one request at a time.
[30:01] So memory usage doesn't scale infinitely
[30:04] as the concurrency of requests grows. The
[30:08] number of workers is a cutoff. So
[30:10] pm.max_children is the ultimate cutoff.
[30:14] And any queue of waiting requests mostly
[30:17] consumes Nginx's memory allowance,
[30:20] not PHP-FPM's.
[30:24] memory_limit in php.ini is the
[30:28] memory usage failsafe on script
[30:31] execution.
[30:33] So just like the Kubernetes memory
[30:35] limit, we need to predict the peak
[30:37] memory usage in normal operation of a
[30:40] single script this time and add a margin
[30:42] of confidence.
[30:45] If abnormal memory usage happens in a
[30:48] script, we want the PHP memory limit to
[30:51] kill that script.
[30:54] And the additional margin of the
[30:56] Kubernetes memory limit means it can
[30:58] only be triggered in an even rarer and
[31:01] more extreme situation and it would kill
[31:03] the whole container, not just a single
[31:06] request.
[31:08] To help set PHP's memory limit in
[31:11] Laravel, we're logging the peak memory
[31:13] usage for every request.
[31:16] Similarly, we're also logging the realpath
[31:18] cache size and the worker ID.
[31:22] Doing it in the middleware's terminate
[31:24] method means it's executed after the
[31:27] response is sent out.
[31:30] We can also use Xdebug profiling to find
[31:33] exactly how memory is used in a
[31:35] particular request execution.
[31:38] And while profiling in Laravel, we
[31:41] should disable garbage collection at the
[31:43] start of the script to ensure accurate
[31:45] readings.
[31:47] So there's a setting in the .env file to
[31:49] toggle garbage collection.
[31:55] We have a similar structure for the
[31:57] Nginx prediction of peak memory usage
[32:00] in normal operation.
[32:03] Shared resources are counted once and
[32:06] the items in brackets are per
[32:08] connection. So they get multiplied by
[32:11] worker_processes, which is the number of
[32:13] workers, and worker_connections, which is
[32:17] the maximum number of connections per
[32:19] worker.
[32:21] The default for worker_connections is
[32:24] 512
[32:26] and that can realistically be set as
[32:27] high as 10,000 or more.
[32:30] That means that since we will add a
[32:32] margin of confidence to each of these
[32:34] buffer sizes here to ensure they can
[32:36] satisfy legitimate requests,
[32:39] those margins would then be multiplied
[32:42] thousands or tens of thousands of times
[32:44] when we're calculating the container
[32:46] level memory limit.
[32:49] We would then be reserving a huge amount
[32:51] of memory for a peak usage scenario that
[32:54] is very unlikely to occur.
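To see how quickly a per-connection margin adds up, here is a small Python sketch of the same structure. The function name and every figure in it are hypothetical; the real per-connection total depends on the buffer directives discussed below.

```python
def nginx_peak_mb(shared_mb, per_connection_kb, worker_processes, worker_connections):
    """Shared memory counted once; per-connection buffers times every possible connection."""
    max_connections = worker_processes * worker_connections
    return shared_mb + max_connections * per_connection_kb / 1024


# Hypothetical figures: 10 MB shared, 2 workers, 10,000 connections each.
tight = nginx_peak_mb(10, 16, 2, 10_000)   # 16 KB of buffers per connection
padded = nginx_peak_mb(10, 20, 2, 10_000)  # same buffers plus a 4 KB margin
print(f"tight={tight} MB, padded={padded} MB, extra={padded - tight} MB")
```

A margin that looks trivial on a single buffer becomes tens of megabytes of reserved memory once it is multiplied across every connection slot.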
[32:57] So to manage that problem, first we need
[33:00] to align worker connections with the
[33:03] peak request concurrency we want to
[33:05] guarantee satisfying and with the memory
[33:08] or the financial constraints that we
[33:11] have.
[33:12] Second, let's consider the size of these
[33:15] buffers extremely carefully. Can we
[33:18] restrict them without hurting UX?
[33:21] Does exceeding a particular buffer kill
[33:23] a request or does it just downgrade
[33:26] performance and by how much?
[33:29] So with that as our goal, how do these
[33:31] buffers work? For an incoming request,
[33:35] Nginx puts the headers into the client
[33:38] header buffer.
[33:40] If the headers exceed that buffer,
[33:43] Nginx puts them into the large client
[33:46] header buffers.
[33:49] And if in that scenario the initial
[33:52] client header buffer is no longer used,
[33:54] we could safely remove that from our
[33:56] peak usage formula. But I haven't had
[33:59] time to experiment with that yet, so I'm
[34:01] keeping it in just to be safe.
[34:04] If the large client header buffers are
[34:07] exceeded, Nginx returns a 400 Bad
[34:11] Request error. That's a pretty drastic
[34:14] action. So, we can't squeeze this buffer
[34:17] too tightly, especially if we have a
[34:19] broad or non-technical user base.
[34:23] But if our API end users are technical
[34:26] enough, we could put a low but
[34:28] reasonable limit on the size of the
[34:30] request headers and then require users
[34:33] to read and abide by the documentation.
[34:41] Nginx stores the request body in the
[34:44] client body buffer and any excess is
[34:47] stored on disk. So we can be more
[34:50] aggressive with this buffer. Maybe only
[34:52] guaranteeing it will hold the bodies of
[34:55] 95% or 99% of legitimate requests.
[35:02] For an Nginx instance that only serves
[35:04] static assets, like the Nginx serving
[35:07] our Vue.js front end, legitimate users
[35:11] will never send POST, PUT, or PATCH
[35:13] requests.
[35:15] So, we can set the client body buffer
[35:17] size very low.
[35:20] But, we'd still need to count it in our
[35:21] formula because users can still fill up
[35:24] that buffer. So, we don't want to give
[35:26] malicious users the ability to trigger
[35:28] our memory limit and kill the Nginx
[35:30] container.
[35:34] After receiving the request line and the
[35:36] headers, Nginx compiles them to
[35:39] determine how to route the request. Some
[35:43] of the requests will be sent to PHP-FPM
[35:46] and it will run the Laravel application
[35:48] to build the response, which will then be
[35:51] sent back to Nginx.
[35:54] The first part of the output from
[35:57] PHP-FPM is stored in the preliminary buffer
[36:00] determined by fastcgi_buffer_size.
[36:05] It's crucial that the FastCGI headers
[36:08] are fully included in this preliminary
[36:11] buffer because if not, Nginx returns a
[36:14] 502 Bad Gateway error.
[36:18] These FastCGI headers are what will
[36:20] later be translated into the response's
[36:23] HTTP status and HTTP headers.
[36:29] So we need to be very aware of and
[36:31] control the size of the headers returned
[36:34] by Laravel.
[36:35] In the Nginx access log, we're
[36:38] recording the embedded variables
[36:40] $upstream_response_length, which is the
[36:43] total size of the response payload sent
[36:46] from PHP FPM, and $body_bytes_sent, which
[36:51] is the size of the response body. So,
[36:54] the total minus the body gives us the
[36:57] size of the headers. This is likely to
[37:00] be much smaller than the size of the
[37:02] equivalent HTTP headers because it's in
[37:05] binary key value format and doesn't
[37:08] include endline characters.
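The subtraction described here can be sketched in a few lines of Python. This is only an illustration: the `AccessLogEntry` class and the sample numbers are hypothetical, and it assumes the two logged values have already been parsed out of the access log as integers.

```python
from dataclasses import dataclass


@dataclass
class AccessLogEntry:
    """One parsed access log line (hypothetical structure for illustration)."""
    upstream_response_length: int  # value of $upstream_response_length
    body_bytes_sent: int           # value of $body_bytes_sent


def fastcgi_header_bytes(entry: AccessLogEntry) -> int:
    """Estimate header size as total upstream payload minus the body."""
    return entry.upstream_response_length - entry.body_bytes_sent


def largest_header(entries: list[AccessLogEntry]) -> int:
    """Largest header estimate seen, a starting point for sizing the buffer."""
    return max(fastcgi_header_bytes(e) for e in entries)


# Made-up sample values, not measurements from the project.
entries = [
    AccessLogEntry(1450, 1200),
    AccessLogEntry(980, 640),
    AccessLogEntry(5300, 5010),
]
print(largest_header(entries))
```

The largest estimate over a representative period, plus some headroom, would be one reasonable input when choosing the preliminary buffer size.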
[37:13] Any headers added by engine X are not
[37:16] held in this buffer because they're
[37:18] added as the response is sent out.
[37:23] If this preliminary CGI buffer fully
[37:25] contains the headers, the rest of the
[37:28] buffer is put to good use and fills up
[37:30] with the first part of the response
[37:32] body. So there's no memory saving
[37:34] benefit to squeezing this buffer
[37:36] aggressively.
[37:38] If the response exceeds the preliminary
[37:41] buffer, the rest of the body is stored
[37:43] in the buffers set by fastcgi_buffers.
[37:47] Yes, that's named confusingly.
[37:50] What I'm calling the preliminary buffer
[37:53] is determined by the Nginx directive
[37:56] fastcgi_buffer_size,
[37:59] and then these body-only buffers are
[38:01] determined by fastcgi_buffers.
[38:06] If these body-only buffers are also
[38:08] exceeded, Nginx responds with a 502
[38:12] bad gateway.
[38:14] So we need to be very aware of the
[38:16] maximum size of our responses.
[38:20] The value of upstream response length in
[38:22] our logs will help with that.
[38:26] If we can control the response size with
[38:28] a high degree of confidence, we can set
[38:31] these buffers quite tightly.
[38:34] And then we'd need to set up a system
[38:36] such that any design or codebase change
[38:38] that affects the maximum response size
[38:41] triggers a reassessment of this
[38:43] configuration before deployment.
[38:46] And we'd also need to implement
[38:47] end-to-end tests for the scenarios that
[38:50] generate the largest possible responses
[38:52] in production.
[38:55] It's possible that EngineX clears the
[38:57] request buffers before the response
[39:00] buffers are filled. If that's the case,
[39:03] we could safely use the max of the
[39:05] request buffers and the response buffers
[39:08] instead of the sum. And that would then
[39:10] reduce our peak memory estimate by quite
[39:13] a lot.
[39:14] That would be very interesting to
[39:16] examine, but I haven't had the time to
[39:17] do that yet.
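To make the sum-versus-max point concrete, here is a rough Python sketch. The buffer sizes are invented for illustration, and the grouping into "request buffers" and "response buffers" is an assumption about how the peak usage formula is structured, not the exact formula from the video.

```python
KIB = 1024


def per_connection_bytes(request_buffers: int, response_buffers: int,
                         buffers_overlap: bool = True) -> int:
    """Peak buffer memory for one connection.

    If request and response buffers can be held at the same time, the safe
    estimate is their sum. If the request buffers are always released first,
    the larger of the two is enough.
    """
    if buffers_overlap:
        return request_buffers + response_buffers
    return max(request_buffers, response_buffers)


# Hypothetical settings: 4 x 8 KiB header buffers plus a 16 KiB body buffer,
# and a 16 KiB preliminary buffer plus 8 x 16 KiB body-only buffers.
request_buffers = 4 * 8 * KIB + 16 * KIB
response_buffers = 16 * KIB + 8 * 16 * KIB

sum_estimate = per_connection_bytes(request_buffers, response_buffers)
max_estimate = per_connection_bytes(request_buffers, response_buffers,
                                    buffers_overlap=False)
print(sum_estimate // KIB, max_estimate // KIB)
```

Multiplying either figure by the maximum number of concurrent connections gives the corresponding peak estimate for the container.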
[39:20] For an enginex instance that only serves
[39:23] static assets, it's impossible for a
[39:25] request to use the fast CGI buffers, not
[39:29] to mention the proxy or the output
[39:31] buffers not mentioned here. So, we can
[39:34] safely remove those from our peak usage
[39:36] formula for that instance.
[39:39] And the ingress controller handles TLS
[39:42] termination. So we're also ignoring SSL
[39:45] buffer here.
[39:52] PHP, PHP FPM, and EngineX all have
[39:57] settings for timeouts that govern
[39:59] various parts of the request response
[40:01] process. So I created this diagram for
[40:04] my notes to help visualize how they line
[40:06] up.
[40:08] The x-axis represents time, very roughly,
[40:13] as the request goes from client to
[40:15] Nginx to PHP FPM and the response
[40:19] reverses that route.
[40:22] But the size of each block doesn't
[40:24] correspond to how long the task takes or
[40:26] the suggested timeout value. Rather, the
[40:30] diagram shows how and when each timeout
[40:33] is triggered and how they overlap.
[40:38] If any of these timeouts are breached,
[40:40] the client receives an error response.
[40:43] So, we want these timeouts to
[40:45] accommodate essentially 100% of
[40:48] legitimate requests.
[40:51] Starting from the left, the first
[40:53] timeout is client header timeout, which
[40:57] is triggered when EngineX accepts the
[40:59] connection after the TCP handshake.
[41:03] It sets the time needed to receive the
[41:05] request line and the headers from the
[41:08] client.
[41:09] It's not an absolute timeout. Rather,
[41:12] it's a timeout for the intervals between
[41:15] reads. That is, each time some part of
[41:19] the header is received, the timer resets
[41:22] to zero. And that's a running theme for
[41:25] a lot of these Nginx timeouts, as you
[41:28] can see in the diagram.
[41:31] After Nginx receives and parses the
[41:34] request line and the headers, the client
[41:36] body timeout begins and limits the
[41:40] intervals between reads of the request
[41:42] body from the client.
[41:45] The purpose of these two request
[41:47] timeouts is to stop partial requests
[41:50] filling up the available connections.
[41:53] A Slowloris attack is an attempt to do
[41:56] that en masse, and a clever attacker could
[41:59] easily determine our timeouts and then
[42:01] drip feed the server with a request
[42:04] repeatedly. So it's important to also
[42:07] limit concurrent requests from the same
[42:09] IP address.
[42:11] So we store all concurrent IP addresses
[42:15] with limit_conn_zone. With my settings,
[42:19] there can be a maximum of 24,048
[42:22] concurrent connections. So that needs
[42:26] 124 kilobytes to guarantee storing all
[42:28] concurrent addresses.
[42:32] Then in a server or location block, use
[42:35] limit_conn to put an upper limit on the
[42:38] number of concurrent connections from
[42:40] the same IP address.
[42:43] Back to timeouts.
[42:46] After the header has been received,
[42:48] EngineX can determine how to process the
[42:51] request. If it will be forwarded to PHP
[42:54] FPM and if the EngineX worker doesn't
[42:57] already have a connection with an idle
[43:00] PHP FPM worker, it starts a new
[43:03] connection and starts the fast CGI
[43:07] connect timeout.
[43:09] In normal operation, establishing a
[43:12] connection is near instantaneous
[43:14] unless all PHP FPM workers are busy and
[43:18] the queue is filled up. The queue size is
[43:21] determined by the PHP FPM directive
[43:25] listen.backlog. So, it's advisable to
[43:29] set it to the maximum 511
[43:32] and then set fastcgi_connect_timeout to
[43:36] just a few seconds.
[43:39] Once the body is fully received and the
[43:42] fast CGI connection is established,
[43:45] EngineX begins sending the request to
[43:47] PHP FPM and starts the fast CGI send
[43:53] timeout and PHP starts the max input
[43:57] time.
[44:00] Max input time also covers the parsing
[44:03] of the request body, like populating the
[44:06] $_POST or $_FILES predefined
[44:11] variables, before the script can begin
[44:14] execution.
[44:16] But of course, Nginx doesn't know
[44:18] about or care about any of that. So as
[44:21] soon as the transmission is complete, it
[44:23] switches from the FastCGI send timeout
[44:26] to the FastCGI read timeout, which
[44:30] puts a limit on how long Laravel can
[44:32] take to return the response in full.
[44:36] More accurately, it puts a limit on the
[44:38] interval between read operations. But
[44:41] since Laravel typically buffers the
[44:43] whole response and sends it out at the
[44:45] end, that makes fast CGI read timeout
[44:50] almost the same as an absolute timeout.
[44:55] The execution time of the PHP script is
[44:58] limited by PHP's max execution time and
[45:03] by PHP FPM's request terminate timeout.
[45:09] Max execution time measures CPU time. So
[45:13] the timer is paused during IO operations
[45:16] like database queries. And when
[45:19] exceeded, it has a slightly more
[45:21] graceful termination.
[45:23] Whereas request terminate timeout
[45:26] measures wall clock time and it has a
[45:29] hard termination.
[45:31] So we'd align these two, but then we'd
[45:35] increase request terminate timeout to
[45:38] account for the peak expected IO time.
[45:41] And then we'd also add a little margin
[45:44] to give max execution time a chance to
[45:47] terminate the script more gracefully.
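As a small worked example of that alignment, the relationship between the two limits can be written as a one-line calculation. The function name and the sample values below are hypothetical; the only thing taken from the discussion is that the wall clock limit equals the CPU time limit plus peak expected I/O time plus a small margin.

```python
def request_terminate_timeout(max_execution_time_s: int,
                              peak_io_time_s: int,
                              margin_s: int = 2) -> int:
    """Wall clock limit for PHP FPM derived from PHP's CPU time limit.

    max_execution_time counts CPU time only, so the wall clock limit has to
    add the expected I/O wait, plus a margin so the more graceful
    max_execution_time termination gets a chance to fire first.
    """
    return max_execution_time_s + peak_io_time_s + margin_s


# Illustrative values only: 10 s of CPU time, up to 5 s waiting on I/O.
print(request_terminate_timeout(10, 5))
```

The wall clock and CPU durations logged per request in Laravel would be the natural source for the two inputs.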
[45:51] After returning the response in full, if
[45:54] the PHP script continues execution, as
[45:58] with the terminate method in middleware,
[46:01] this is included in max execution time
[46:05] and request terminate timeout, but not
[46:08] in EngineX's fast CGI read timeout
[46:12] because once EngineX has received the
[46:14] response, it's already moved on to
[46:16] sending it to the client.
[46:19] So the script execution timeouts can
[46:21] extend rightwards beyond T5 in the
[46:25] diagram and maybe beyond the end of
[46:28] Nginx's FastCGI read timeout. Though
[46:32] in that situation, it's probably better
[46:34] to use queue workers instead.
[46:38] To help set the script execution
[46:40] timeouts, in Laravel we're logging wall
[46:43] clock duration and CPU time duration for
[46:47] each request.
[46:49] And finally, at the right of the
[46:51] diagram, send timeout limits the
[46:55] intervals between write operations while
[46:58] sending the response to the client.
[47:02] Nginx has four embedded variables
[47:04] which we can log to help us with setting
[47:08] some of these timeouts.
[47:10] $request_time measures from the first
[47:13] byte received from the client to the
[47:16] last byte sent to the client.
[47:20] $upstream_connect_time measures the time
[47:23] to establish the FastCGI connection.
[47:27] That should hopefully always be zero.
[47:31] $upstream_header_time measures from the
[47:34] first byte sent to PHP FPM until the
[47:38] first byte received in response by
[47:40] Nginx.
[47:43] And $upstream_response_time has the same
[47:46] start point but keeps measuring until
[47:49] the last byte of the response is
[47:51] received by Nginx. Those two will be
[47:54] the same unless the response is very
[47:56] large.
[48:00] In the Laravel documentation, the
[48:03] suggested EngineX configuration is
[48:05] really not great.
[48:08] As an example, let's say a user sends a
[48:12] request to our domain /hello.
[48:16] So this embedded variable $uri
[48:19] equals /hello.
[48:21] With this configuration, what we're
[48:23] asking Nginx to do is this.
[48:26] First check for a file called hello and
[48:30] if it exists serve that file to the
[48:33] client.
[48:34] So far so good. That could be a JS file
[48:36] or a CSS file.
[48:39] But if hello file doesn't exist, check
[48:42] for a directory called hello.
[48:46] If hello directory exists, check for an
[48:49] index file which above is defined as
[48:52] index.php.
[48:54] If that exists, serve it to the client.
[48:57] And already from a Laravel perspective,
[49:00] we are way off course.
[49:03] If hello directory didn't contain
[49:06] index.php,
[49:08] then serve the directory listing, which
[49:11] very ancient internet users will
[49:13] remember, but these days directory
[49:15] listings are disabled by default. So,
[49:18] EngineX returns 403 forbidden simply
[49:22] because the user requested a directory
[49:24] which does exist on the server.
[49:28] And if the request URI isn't a file and
[49:31] isn't a directory, finally we're
[49:33] directed to the Laravel application
[49:36] index.php.
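The lookup order walked through above can be modelled as a short Python function. This is a simplified model of the walkthrough as narrated, not Nginx's actual implementation; the function name, the returned strings, and the sets standing in for the filesystem are all hypothetical.

```python
def resolve(uri: str, files: set[str], directories: set[str]) -> str:
    """Simplified model of the file, then directory, then fallback lookup."""
    # 1. A file matching the URI exists: serve it (for example JS or CSS).
    if uri in files:
        return f"serve file {uri}"
    # 2. A directory matching the URI exists: look for its index file.
    if uri in directories:
        index = f"{uri}/index.php"
        if index in files:
            return f"serve {index}"
        # Directory listings are disabled by default.
        return "403 Forbidden"
    # 3. Neither a file nor a directory: hand over to the Laravel front controller.
    return "forward to /index.php"


print(resolve("/hello", files=set(), directories={"/hello"}))
print(resolve("/hello", files=set(), directories=set()))
```

Only the last branch is what a Laravel API actually wants for non-static requests, which is the argument for hardcoding the front controller instead.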
[49:38] But I don't understand why index.php
[49:40] here is so convoluted with variables.
[49:44] If not to a static file, we always want
[49:46] to forward to public/index.php.
[49:50] So why not just hardcode it here and
[49:52] state it clearly?
[49:56] And these headers at the moment are
[49:58] functional. But the moment we put an add
[50:01] header directive into a location block,
[50:04] that block no longer inherits add header
[50:07] directives from outside. So it's safer
[50:10] to just put all add header directives
[50:14] into location blocks and don't rely on
[50:16] inheritance.
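The inheritance rule is easy to get wrong, so here is a tiny Python model of it. It assumes the rule as described: a level inherits add_header directives from the level above only if it defines none of its own. The function name and the header values are illustrative.

```python
def effective_headers(levels: list[dict[str, str]]) -> dict[str, str]:
    """Headers that apply at the innermost level.

    `levels` is ordered from outermost (for example the server block) to
    innermost (the location block). The nearest level that defines any
    header wins outright; nothing is merged from the levels above it.
    """
    for headers in reversed(levels):
        if headers:
            return dict(headers)
    return {}


server = {"X-Frame-Options": "DENY", "X-Content-Type-Options": "nosniff"}

# A location without its own headers inherits the server's headers.
print(effective_headers([server, {}]))

# Adding a single header in the location silently drops the inherited ones.
print(effective_headers([server, {"Cache-Control": "no-store"}]))
```

Repeating every header in each location block makes the effective set visible where it applies, at the cost of some duplication.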
[50:18] This whole configuration feels like a
[50:20] copy-paste job from a pre-Laravel PHP
[50:23] project. And although it works, it has
[50:27] needless inefficiencies and potential
[50:29] security flaws.
[50:33] For our EngineX configuration on port
[50:35] 8080,
[50:37] we want all responses to be JSON,
[50:40] including errors caught by EngineX.
[50:43] So for the error pages, we're using
[50:45] named locations that serve static JSON
[50:48] files saved into the image.
[50:51] Internal locations would also achieve
[50:53] the same result.
[50:56] Then apart from the favicon and
[50:58] robots.txt, we're returning 404 for
[51:02] any request that doesn't start with
[51:05] /api/v1/.
[51:10] And if we take a look at the standard
[51:12] hacky requests that every server gets,
[51:15] this location block alone rejects, well,
[51:18] all of them. We definitely don't want any
[51:21] malicious requests like this to access
[51:23] any static files, and preferably we don't
[51:26] want to waste resources forwarding the
[51:28] request to Laravel just for it to return
[51:31] a 404 response.
[51:35] All API requests get sent to Laravel and
[51:38] index.php is hardcoded for clarity.
[51:43] Finally, we're rate limiting by IP
[51:45] address here in EngineX and later by
[51:49] user ID in the Laravel application.
[51:54] Then for port 8081 for cluster internal
[51:58] traffic, we've got keepalive times to
[52:00] sustain TCP connections with the Nginx
[52:03] exporter and the readiness check for one
[52:08] week and for two weeks respectively.
[52:12] The Nginx up endpoint is a simple Nginx-only
[52:15] endpoint for a liveness check, which
[52:18] we're currently not using.
[52:20] The Nginx status endpoint is for the Nginx
[52:23] exporter to scrape Nginx metrics and
[52:26] in turn be scraped by Prometheus,
[52:29] and the rest are specific endpoints to
[52:31] send to Laravel.
[52:34] In the Vue.js Nginx instance, if a
[52:38] request URI ends in one of these file
[52:41] extensions, we check if the static file
[52:44] exists and if not, return index.html,
[52:49] and the Vue.js router will send the 404
[52:52] page.
[52:55] And then for non-static asset URIs, go
[52:58] straight to index.html.
[53:01] In the content security policy header,
[53:04] we need to specify the API domain for
[53:07] connect-src and form-action.
[53:12] For static assets, we can cache the
[53:14] results of the stat and open system
[53:17] calls. And since we're using immutable
[53:20] containers, we can safely cache for a
[53:22] year or more. And then the static assets
[53:25] themselves will likely be held in memory
[53:27] by Linux's page cache, though that
[53:30] depends on other containers in the same
[53:32] node.
[53:37] Let's take a quick look at the
[53:38] Dockerfile for the Laravel image. Some
[53:42] dependencies are needed at build time
[53:44] for compiling PHP extensions and running
[53:47] composer install but not needed at
[53:50] runtime.
[53:52] So the overall design is compile in a
[53:55] builder target and then copy the results
[53:58] into a fresh minimal target for
[54:01] production.
[54:03] To accommodate other targets which we'll
[54:05] discuss in a second, we have to split
[54:08] builder and build prod targets and split
[54:12] minimal base and prod targets with prod
[54:15] being the ultimate image to deploy in
[54:18] production.
[54:20] During the build phase, the composer
[54:22] install command only needs composer.json
[54:26] and composer.lock. But creating
[54:29] the autoloader obviously needs the
[54:31] entire codebase. So a very efficient
[54:34] Docker caching strategy is to copy in
[54:37] the .json and .lock files, then run composer
[54:41] install with the --no-autoloader
[54:44] flag.
[54:46] Then that intensive process is cached
[54:49] until we change our composer
[54:51] dependencies.
[54:53] Then we can copy in the codebase and
[54:55] build the autoloader.
[54:57] If we didn't split those commands, we'd
[54:59] need to build the vendor directory every
[55:02] time we modify a file.
[55:05] In the production image, for PHP FPM to
[55:09] communicate with the Nginx
[55:11] container, both processes need read and
[55:14] write access to the socket file. So we
[55:17] make sure that the www-data user in this
[55:21] container and the nginx user in the
[55:24] Nginx container have the same UID
[55:28] and then set the file permissions
[55:30] accordingly in the PHP FPM configuration.
[55:35] As mentioned in the Kubernetes section,
[55:37] we run all bootstrap caches except for
[55:40] config cache in the Dockerfile.
[55:44] And we're not using Laravel's built-in
[55:46] health check endpoint. But if we were,
[55:49] apparently that view isn't cached by php
[55:52] artisan view:cache. So we can access it
[55:56] once in the Dockerfile to force that
[55:59] view to be rendered so that we can make
[56:02] the cache directory read-only in
[56:04] production.
[56:06] For file permissions, I've commented out
[56:09] some of my standard Laravel production
[56:11] setup because they don't apply in
[56:13] Kubernetes or for this particular
[56:15] project, but I like to keep them here
[56:17] just as a reminder or if we change the
[56:20] project or architecture later.
[56:24] For the local dev environment, the
[56:26] simplest option is to extend the builder
[56:28] target just before the composer install
[56:31] command is run. Then install development
[56:35] tools like Xdebug and create a directory
[56:38] for the output from Xdebug profiling,
[56:42] and then mount the local codebase into
[56:44] the container in Docker Compose with a
[56:47] bind mount.
[56:50] We want to run composer install and
[56:52] artisan commands inside this container
[56:55] to write files to our local device.
[56:59] So in the Makefile we can construct a
[57:01] command to enter the container with the
[57:04] www-data user and our local user's group
[57:09] GID,
[57:10] and then set umask to make sure that
[57:13] any files generated have 775 permissions,
[57:17] as in full permissions for both user and
[57:20] group.
[57:21] That way the vendor directory and any
[57:24] other files created by this container
[57:26] are readable and writable by the www-data
[57:29] user in this container and our
[57:33] local user on the host.
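How a umask produces those permissions can be shown with a little bit arithmetic in Python. The umask value 002 is an assumption here, since the exact value isn't stated; it is the value that leaves user and group with full permissions.

```python
def created_mode(requested_mode: int, umask: int) -> int:
    """Permission bits that survive after the umask clears its bits."""
    return requested_mode & ~umask


UMASK = 0o002  # assumed value: clears only the write bit for "other"

# Directories are typically requested with 0o777, regular files with 0o666.
print(oct(created_mode(0o777, UMASK)))
print(oct(created_mode(0o666, UMASK)))
```

Under this assumption directories end up as 775, while regular files, which are normally requested without execute bits, end up as 664. Either way both the user and the group can read and write them.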
[57:37] I've also got a combined target for
[57:39] testing the architectural option of PHP
[57:43] FPM and engine X in a single container
[57:46] which we discussed earlier.
[57:50] The only complaint with this dev image
[57:52] is that it has different dependencies to
[57:54] the production image. So potentially
[57:57] some tests could be passing in this
[57:59] image but be failing for production.
[58:02] This is a trade-off I'm happy with. But
[58:05] an alternative, more complex approach
[58:07] would be to run composer install with
[58:10] testing dependencies included.
[58:13] Then copy the resultant vendor directory
[58:16] into a target that splits from the
[58:18] production image just before the
[58:20] codebase is copied in and then mount the
[58:24] local code base with a bind mount. And
[58:27] similarly for the dev environment split
[58:30] off from minimal base and install dev
[58:33] tools like xdebug.
[58:37] The local development environment is
[58:39] handled by Docker Compose. There's not
[58:42] too much to note here. Nginx depends
[58:45] on the Laravel container, and service_started
[58:48] is enough because we just need
[58:50] the socket file to exist before Nginx
[58:53] starts.
[58:55] Ideally, Laravel should depend on
[58:57] Postgres and Redis passing health
[58:59] checks. Postgres has a handy pg_isready
[59:04] command and Redis has PING. But I
[59:08] commented out the dependencies since
[59:10] very often I was just testing Nginx
[59:12] and PHP only and didn't want to wait an
[59:15] extra 2 seconds to get running.
[59:18] There are two instances of Postgres,
[59:21] one for the dev environment and one for
[59:22] testing. It's most common to run tests
[59:26] with an in-memory instance of SQLite
[59:29] as the database. But as we'll see later,
[59:32] our application is unfortunately tightly
[59:35] coupled to Postgres. So we need to run
[59:37] the tests with Postgres to have any
[59:40] confidence that the results represent
[59:42] the application in production.
[59:46] There's a pre-commit script to process
[59:48] the code before committing to the Git
[59:50] repository. For Vue.js, lint-staged
[59:55] handles things pretty well. We're
[59:58] running Prettier, ESLint, and Vitest on only
[60:02] staged files that have changed since the
[60:04] last commit, and running vue-tsc on the
[60:08] whole project.
[60:10] In a similar way, in the bash script,
[60:13] we've got a function that returns an
[60:15] array of the staged files that have
[60:18] changed since the last commit, so that
[60:20] we're not wasting time and resources
[60:22] checking the whole codebase on each
[60:24] commit.
[60:26] For scripts, we feed that array into
[60:28] ShellCheck. And for PHP, we're
[60:32] validating the composer.json and
[60:35] lock files, then feeding the changed
[60:38] files array into PHPStan at level 9 and
[60:42] Pint, and then running git add to
[60:45] restage any files that were modified.
[60:48] Each makes sure to reference the correct
[60:50] configuration file. And we have a
[60:53] stricter Pint configuration for non-tests
[60:56] than for tests.
[60:59] And the last step is to run PHPUnit. It
[61:03] executes a script mounted in the Laravel
[61:05] container which accepts arguments for
[61:09] whether or not to first run database
[61:11] migrations,
[61:12] the test coverage threshold, test suites
[61:16] to include or exclude, and which tests
[61:19] specifically to run.
[61:23] Since I'm not working as part of a team,
[61:25] a Makefile is sufficient for a CI/CD
[61:28] pipeline for testing, building, and
[61:30] deploying.
[61:32] Exec Laravel executes a command in the
[61:35] Laravel container.
[61:38] Very often we're creating or editing
[61:40] files on the host machine via a bind
[61:42] mount. So we enter the container with
[61:45] both the www-data user and the local
[61:50] user's group so we can function in both
[61:52] worlds. And we're setting umask so
[61:55] that any files created have 775
[61:58] permissions so that both the container's
[62:01] user and the local user can read and
[62:03] write the files created.
[62:06] Shell Laravel executes an interactive
[62:09] shell in the container.
[62:12] The composer commands have the same
[62:14] function and are just time savers to do
[62:16] specific composer functions without
[62:19] opening an interactive terminal.
[62:21] And then there are similar exec and
[62:23] shell commands for each container.
[62:27] If we want to run PHPStan, Pint, or
[62:31] PHPUnit outside of the pre-commit check,
[62:34] those commands are here. For testing,
[62:37] there are commands for the standard test
[62:39] suite and for end-to-end tests for
[62:42] testing a deployment.
[62:45] Then there are commands for building the
[62:46] images and for deploying them in
[62:49] Kubernetes.
[62:51] We can switch contexts between the local
[62:53] kind cluster and my physical cluster.
[62:59] For this API, I want every response to
[63:02] fit within a small set of predictable
[63:04] JSON structures.
[63:07] The top level of the response should
[63:09] always be an object, not an array to
[63:11] avoid JSON array hijacking.
[63:14] And for every response object, we attach
[63:17] an object called meta, which at least
[63:20] includes a request ID to help clients
[63:23] communicate problems with us and help
[63:25] with debugging.
[63:27] I'm also including the timestamp and
[63:29] script duration for now, but these don't
[63:32] have any practical purpose at the
[63:33] moment.
[63:36] For a query that returns a single
[63:38] resource, the resource is the value of a
[63:41] key called data.
[63:44] For a query that returns multiple
[63:46] resources, data is an array of results.
[63:51] And we add an object called pagination
[63:54] to help the client navigate through the
[63:56] data set.
[63:58] For a successful query with no resource
[64:00] to return, data just contains result
[64:04] success.
[64:06] And finally, if at least one error
[64:08] occurs, data is replaced by an array
[64:11] called errors,
[64:13] which contains error objects which each
[64:15] have an integer code and a string
[64:18] message.
[64:20] These error codes help the client to
[64:22] respond programmatically to a problem
[64:24] without needing to parse the message
[64:26] text.
[64:28] The error itself is a data transfer
[64:31] object called API error and the code is
[64:35] an integer enum called API error code.
[64:41] API error code has a method called
[64:44] message which accepts an array of
[64:46] placeholders if necessary and returns an
[64:50] appropriate message for each error code.
[64:53] And the API error constructor calls this
[64:57] message method.
[65:01] So API error code is a very convenient
[65:04] single location to plan and construct a
[65:07] list of all possible errors, pair them
[65:10] up with a suitable message, and a
[65:12] reference point for the placeholders
[65:14] that we need to pass to the API error
[65:17] constructor.
[65:19] The aim is to be as specific as possible
[65:22] with error codes, but also to have more
[65:24] general error codes to fall back on. For
[65:28] example, we have specific error codes
[65:30] for each type of input validation
[65:33] employed in our project. But if
[65:35] something goes wrong, there's a general
[65:37] validation error code. So if we use a
[65:41] new type of input validation in a
[65:43] controller or form request, but we don't
[65:46] account for it in our validation error
[65:48] handler, we can still return a
[65:50] semi-specific error response. And in the
[65:53] worst case scenario, we can fall back on
[65:55] the unknown error code.
[65:58] Of course, when such general errors
[66:00] happen, we need to analyze the logs and
[66:03] construct more insightful error codes
[66:05] as a result.
[66:08] The JSON response structures mentioned
[66:10] each have a method in the class called
[66:13] API response builder:
[66:16] success, paginated, and
[66:19] errors.
[66:21] And as well as the errors method,
[66:23] there's also an error method because
[66:26] returning a single error is the most
[66:28] common scenario and it's easy to forget
[66:30] to enclose it in an array.
[66:35] Since we want to always send JSON
[66:37] responses with descriptive error codes,
[66:40] we want to replace Laravel's default
[66:43] exception handling behavior. In Laravel
[66:46] 11 onwards, that's done in
[66:48] bootstrap/app.php.
[66:53] But since it's likely to get quite
[66:54] sizable, I've extracted it to a class
[66:56] called API exception handler.
[67:00] In simple cases, we can just define an
[67:02] error response with API response
[67:05] builder.
[67:07] In some cases, there's a small amount of
[67:09] processing to add more detail to the
[67:11] error response.
[67:13] And there's a catch all default for any
[67:16] exception we're not handling
[67:17] specifically.
[67:20] For input validation errors, I've
[67:22] extracted the logic to validation errors
[67:25] builder, which returns an array of the
[67:28] data transfer object API error, to be
[67:32] fed into API response builder's errors
[67:35] method.
[67:37] In validation errors builder, we loop
[67:40] through the errors returned by the
[67:42] validator and match each with the
[67:45] correct API error code.
[67:49] As mentioned throughout this project,
[67:51] whenever there's a scenario that we
[67:53] don't expect to happen, like if we fail
[67:55] to match the validation error, we log
[67:59] the details and fall back on a more
[68:01] general API error code.
[68:05] For the database, I have maybe something
[68:08] of a controversial feature which I'll
[68:11] have to explain carefully.
[68:14] When we need to write to the database
[68:16] and one of the columns has a unique
[68:18] constraint or a foreign key constraint,
[68:22] the standard practice is to first
[68:24] validate the value with a select query
[68:28] and if no results are returned, then
[68:30] continue with the insert or update
[68:33] operation.
[68:35] So that's one database query for the
[68:37] failure scenario and two queries for the
[68:40] success scenario. But
[68:43] with a lot of caveats, we can
[68:46] potentially skip the validation in
[68:48] Laravel, write the value directly to the
[68:51] database and if a constraint is
[68:54] violated, handle the error returned by
[68:56] the database and inform the user of the
[68:59] input validation error.
[69:02] That means only one database query for
[69:05] both success and failure scenarios which
[69:09] reduces the load on the database and
[69:11] speeds up responses from the client's
[69:13] perspective.
[69:15] It also removes the race condition
[69:17] between the select query for the
[69:19] validation and the eventual write to
[69:22] database.
[69:24] So now for the downsides. One, it splits
[69:28] the validation logic in two, which is
[69:31] messy. I kept the constraints in the
[69:34] form request as comments, as reminders
[69:37] of what will be validated by the
[69:39] database.
[69:41] Two, it's not great for the developer
[69:44] experience. We just have to remember to
[69:47] not validate constraints in form
[69:49] requests and to employ our new strategy
[69:52] each time we want to insert or update.
[69:57] Three, the database treats the
[69:59] constraint violation as an error, not as
[70:02] a simple validation check, and it logs
[70:05] it as such. So, we'd need a log
[70:08] filtration system in production.
[70:12] Four, the database returns the error as
[70:15] a string which we have to parse
[70:18] and that is a somewhat fragile process.
[70:21] We need to run rigorous tests with many
[70:23] edge cases every time we change database
[70:26] version.
[70:28] And five, error responses vary per
[70:31] vendor. So we're tightly coupling our
[70:34] application with our initial choice of
[70:36] database vendor, in this case Postgres.
[70:41] These are all very serious cons and
[70:44] nothing else about this project is
[70:46] geared towards the high concurrency
[70:49] situation that would give value to the
[70:51] pros. So for a real project, I would
[70:54] almost certainly not implement this
[70:56] feature, but I wanted to explore it as
[71:00] an educational exercise. And it's a
[71:02] healthy exercise to predict the cons in
[71:04] advance, try it, and run head-on into
[71:08] any unexpected cons, and then get better
[71:10] at analysis of design choices in the
[71:13] future.
[71:15] So far with Postgres, with unique and
[71:18] foreign key constraints, I haven't hit
[71:21] any critical problems from parsing the
[71:23] error message.
[71:25] It returns a unique SQLSTATE code that
[71:29] identifies which kind of constraint was
[71:33] violated, and the offending column is bounded by
[71:36] characters that are invalid for a column
[71:38] name and preceded by a substantial fixed
[71:42] string.
[71:44] Postgres does actually allow illegal
[71:46] characters in the column name if it's
[71:48] bounded by double quotes. So we'd have
[71:51] to check for that.
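A minimal Python sketch of this kind of parsing is shown below. It assumes the error detail has the shape `Key (column)=(value) ...` and that SQLSTATE 23505 and 23503 mean unique and foreign key violations respectively. The function name and return shape are hypothetical, and the sample messages are illustrative rather than copied from the project.

```python
import re

# SQLSTATE codes for the two constraint types discussed here.
CONSTRAINT_BY_SQLSTATE = {"23505": "unique", "23503": "foreign_key"}

# The column list sits between "Key (" and ")=". A quoted column name that
# itself contains ")" would defeat this pattern, which is the caveat above.
KEY_PATTERN = re.compile(r"Key \((?P<columns>[^)]*)\)=")


def parse_constraint_violation(sqlstate: str, message: str):
    """Return (constraint_type, columns), or None for unrelated errors."""
    constraint = CONSTRAINT_BY_SQLSTATE.get(sqlstate)
    if constraint is None:
        return None
    match = KEY_PATTERN.search(message)
    if match is None:
        # Known constraint type but unexpected format: log and fall back.
        return (constraint, None)
    columns = [c.strip().strip('"') for c in match.group("columns").split(",")]
    return (constraint, columns)


sample = ('ERROR:  duplicate key value violates unique constraint '
          '"users_email_unique"\n'
          'DETAIL:  Key (email)=(a@example.com) already exists.')
print(parse_constraint_violation("23505", sample))
```

Only the column names are extracted; the offending value in the second pair of parentheses is deliberately left alone so it never reaches the response or the logs through this path.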
[71:53] Another nuisance is that Laravel
[71:55] interpolates the actual values into the
[71:58] Postgres error message. And for
[72:01] security, we don't want potentially
[72:03] sensitive values feeding into our parsing
[72:05] logic, especially for a fragile process
[72:08] like this that has a lot of logging.
[72:13] So it works for Postgres, but if for
[72:15] some reason we change database vendor
[72:18] we'd need to rewrite the parsing logic
[72:20] and there's no guarantee that the error
[72:22] message provides the required detail and
[72:25] format for us to parse.
[72:28] Here's the comparable message from
[72:30] SQLite.
[72:32] The SQLSTATE code is more generic than
[72:34] Postgres's. So we'd have to parse the
[72:37] text to discover even which constraint
[72:39] was violated.
[72:42] There's also a small possibility that a
[72:44] new version of Postgres will change the
[72:47] error message in a way that hurts this
[72:49] feature.
[72:52] I implemented this feature with a trait
[72:54] called handles DB errors.
[72:57] For any code that inserts or updates a
[73:00] database record, we wrap it in a closure
[73:04] and in the handle DB errors method, and
[73:07] the closure is executed inside a
[73:10] try/catch block looking for QueryException.
[73:15] This wrapping design causes very minimal
[73:18] disruption and there's a low development
[73:20] cost to enabling or disabling the error
[73:23] handling feature.
[73:25] The design is very reusable. Just add
[73:28] the trait to any class that writes to
[73:30] the database. And compared to a rigid method
[73:33] signature, the closure gives us complete
[73:36] flexibility around what variables to
[73:38] pass, what type to return,
[73:42] which interface to use to interact with
[73:44] the database, how many queries to run,
[73:48] what parts of the code to wrap in a
[73:50] database transaction,
[73:52] and what other actions we need to run
[73:54] alongside the queries.
[73:57] Currently, if a constraint violation is
[74:00] detected, we're immediately returning a
[74:02] validation error response to the client.
[74:06] We could consider making this more
[74:07] flexible, like allowing more closures to
[74:10] be passed to handle specific error
[74:12] scenarios. For example, we might want to
[74:16] alter our reaction based on which column
[74:18] violated the constraint.
[74:27] Also regarding the database, we're
[74:30] implementing a rule of no select star
[74:33] queries or no queries that return all
[74:36] columns. This is to reduce the chance of
[74:40] exposing sensitive data and to make
[74:42] queries faster to run.
[74:45] So for any resource that will be output
[74:48] by the API, we're defining a resource
[74:50] class where we define how the raw output
[74:54] is processed into the API output.
[74:58] In this case, we're just renaming UUID
[75:01] to ID.
[75:04] And we're defining the columns to be
[75:06] injected into the select query to fetch
[75:08] only the relevant columns.
[75:11] For the naming scheme, we have the model
[75:14] name item, then public or private
[75:18] explicitly warning if this resource will
[75:20] be output by the API or not. For
[75:23] example, the UUID is for public usage
[75:27] while the incrementing integer ID is
[75:29] strictly for internal usage.
[75:32] And we have full or minimal to denote
[75:35] which columns to include.
[75:38] Paginated results would typically use
[75:40] minimal while the results of a single
[75:43] specified resource would typically use
[75:46] full and include more columns.
[75:49] Then in the model class, it imports the
[75:51] columns constant to run the relevant
[75:54] queries.
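As an illustration of the resource idea, here is a small Python sketch, not the project's Laravel code. The class name follows the naming scheme just described, but the "name" column and the helper function are assumptions made up for the example.

```python
class ItemPublicMinimal:
    """Hypothetical resource: which columns to select and how to shape the output."""

    # Only these columns are ever requested from the database.
    COLUMNS = ("uuid", "name")

    @staticmethod
    def to_output(row):
        # Expose the UUID as "id"; the integer key is never selected.
        return {"id": row["uuid"], "name": row["name"]}


def select_sql(table, resource):
    """Build a SELECT that names each column instead of using '*'."""
    return f"SELECT {', '.join(resource.COLUMNS)} FROM {table}"


query = select_sql("items", ItemPublicMinimal)
output = ItemPublicMinimal.to_output({"uuid": "abc-123", "name": "Widget"})
```

Keeping the column list next to the output mapping means a new column has to be added deliberately in one place before it can reach a client.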
[75:59] Let's run through the life cycle of a
[76:01] standard request. The first thing we do
[76:04] is create an instance of request context
[76:07] service which will hold auxiliary
[76:10] information about the request.
[76:12] We define it as a singleton so that
[76:15] anytime it's referenced in a method
[76:17] signature throughout the application,
[76:19] the service container will inject the
[76:21] same instance with the same properties
[76:24] similar to how request itself is
[76:26] handled.
[76:28] Immediately we store the current time so
[76:31] that we can calculate the wall clock
[76:33] duration at the end of the script
[76:35] execution and we store the current
[76:38] resource usage of the PHP FPM worker
[76:42] process in order to calculate the CPU
[76:44] time duration at the end of the script.
[76:48] There's an empty array to store the
[76:49] duration of any database queries which
[76:52] will also be logged at the end. And the
[76:55] request ID set way back in the ingress
[76:58] controller is saved here and added to
[77:01] the context of any logs that will be
[77:02] written.
[77:05] The logging middleware will call get
[77:07] duration milliseconds which is pretty
[77:10] simple and also get CPU time
[77:14] milliseconds which requires some
[77:16] explanation.
[77:20] ru_utime.tv_sec
[77:22] is the number of whole
[77:26] seconds the PHP FPM worker process has
[77:30] spent in user mode, like the PHP
[77:33] interpreter doing work.
[77:36] The same key with usec, or microseconds,
[77:40] instead of sec is the number of
[77:42] microseconds towards the next whole
[77:45] second in user mode. So it's bounded by
[77:49] 1 million.
[77:51] ru_stime.tv_sec
[77:55] is the number of whole seconds the
[77:58] PHP FPM worker process has spent in
[78:01] kernel mode. So that's system calls,
[78:05] memory mapping, network stack
[78:07] processing, etc.
[78:10] And the same key with usec instead of sec
[78:13] is the number of microseconds towards the
[78:15] next whole second in kernel mode.
[78:19] The most intuitive mathematical approach
[78:21] to calculating the CPU time duration of
[78:25] the script is to combine the seconds
[78:28] with microseconds
[78:30] and sum the user time and kernel time
[78:34] and then subtract the start time from
[78:36] the end time just like we do for the
[78:39] wall clock duration.
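Written out, the intuitive formula looks roughly like this. It is a Python sketch that takes two snapshots shaped like the array PHP's getrusage() returns; the function names are made up for the example.

```python
def total_cpu_seconds(usage):
    """Combine user and kernel time from a getrusage-style mapping."""
    user = usage["ru_utime.tv_sec"] + usage["ru_utime.tv_usec"] / 1_000_000
    kernel = usage["ru_stime.tv_sec"] + usage["ru_stime.tv_usec"] / 1_000_000
    return user + kernel


def cpu_time_ms(start, end):
    """CPU time spent between two snapshots, in milliseconds."""
    return (total_cpu_seconds(end) - total_cpu_seconds(start)) * 1000


# Hypothetical snapshots taken at the start and end of a request.
start = {"ru_utime.tv_sec": 10, "ru_utime.tv_usec": 900_000,
         "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 500_000}
end = {"ru_utime.tv_sec": 11, "ru_utime.tv_usec": 100_000,
       "ru_stime.tv_sec": 2, "ru_stime.tv_usec": 600_000}
elapsed = cpu_time_ms(start, end)  # about 300 ms: 0.2 s user plus 0.1 s kernel
```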
[78:41] I was worried that floating point
[78:43] accuracy might be a problem here.
[78:46] Floating point accuracy is 15 to 17
[78:49] significant figures. So adding
[78:51] everything up to a large total before
[78:54] the final subtraction could cause some
[78:57] precision to be lost.
[79:00] And since the difference between the
[79:01] start and end times will generally be
[79:03] tiny, that loss of resolution could mean
[79:06] we get a result of zero once the PHP FPM
[79:10] worker passes a certain age.
[79:14] So maybe it would be better to have a
[79:16] mathematically equivalent but less
[79:18] intuitive formula that subtracts large
[79:21] numbers from large numbers and sums the
[79:24] results at the end.
[79:27] But I tried some rough calculations and
[79:30] actually the loss of resolution happens
[79:32] sometime after 250 years. So yes, the
[79:36] intuitive formula will do fine.
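One way to reproduce that kind of rough estimate is sketched below in Python, whose floats are the same 64-bit doubles PHP uses. It is my own calculation, not the one from the video. It asks how much CPU time, held as seconds in a double, has to accumulate before the gap between adjacent representable values exceeds one microsecond. It needs Python 3.9 or later for math.ulp.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
MICROSECOND = 1e-6


def years_until_gap_exceeds(target_seconds):
    """Smallest power-of-two total (in years) where double spacing passes the target."""
    total = 1.0
    # math.ulp(x) is the gap between x and the next representable double.
    while math.ulp(total) <= target_seconds:
        total *= 2
    return total / SECONDS_PER_YEAR


years = years_until_gap_exceeds(MICROSECOND)
```

Under these assumptions the threshold is 2^33 seconds of accumulated CPU time, which is a little over 270 years, so it lands in the same range as the figure quoted above.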
[79:41] The CPU time duration will inform our
[79:43] decision on setting the PHP directive
[79:46] max_execution_time, and the wall clock
[79:48] duration helps with the PHP FPM
[79:51] directive request_terminate_timeout.
[79:56] The last two methods are to log the size
[79:59] of the response headers and the body.
[80:04] The only thing to note here is that the
[80:06] headers are ASCII only. So we can always
[80:09] safely use strlen, which is faster than
[80:13] the multibyte equivalent mb_strlen.
[80:19] The body is UTF-8,
[80:21] so potentially non-ASCII.
[80:24] So we can only use strlen as long as in
[80:28] php.ini
[80:30] we set mbstring.func_overload
[80:33] to zero.
[80:36] Otherwise, we'd have to use
[80:40] mb_strlen and specify 8bit encoding to be
[80:44] sure we're getting the number of bytes
[80:46] instead of the number of characters.
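The distinction between bytes and characters is easy to show outside PHP. In this Python sketch, len() on a string counts characters, while the length of the UTF-8 encoding counts bytes, which is the number we want for a response size. The sample body is made up.

```python
def byte_length(text):
    """Size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


body = '{"name": "café"}'   # hypothetical JSON body with one non-ASCII character
header = "Content-Type: application/json"

char_count = len(body)          # counts characters
byte_count = byte_length(body)  # counts bytes; "é" takes two bytes in UTF-8
```

For the ASCII-only header the two counts are equal, which is why the cheaper byte-oriented function is always safe there.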
[80:50] Then we add a listener for the database
[80:53] to add the query time to the query
[80:55] duration array we just created.
[80:59] And next we configure the rate limiter,
[81:03] although it isn't actually applied at
[81:05] this point in the request.
[81:07] Nginx rate limits by IP and
[81:10] intercepts those requests before they
[81:12] reach Laravel. So in Laravel, we're
[81:15] limiting by user ID and that needs to
[81:18] happen after authentication.
[81:21] For us, it happens later in
[81:23] routes/api.php,
[81:25] straight after the authentication
[81:28] middleware.
[81:30] Malicious users can get around this with
[81:33] a coordinated system of multiple
[81:35] accounts and multiple IPs.
[81:38] If we were worried about that and we
[81:41] couldn't control user accounts or
[81:43] whitelisted IPs more tightly, then we
[81:46] might consider running pattern
[81:47] recognition of usage, i.e. combine the
[81:51] usage of two or more users, assess them
[81:55] as if they were a single user and see if
[81:57] it adds up to a malicious usage pattern.
[82:02] Or we might try to gather a list of
[82:04] known VPN IP addresses and monitor those
[82:07] accounts more closely.
[82:10] For Nginx's rate limit on IP
[82:12] addresses, we need to consider if our
[82:15] clients are in a business at the same
[82:17] address.
[82:18] For example, if all users log on at 9:00
[82:22] a.m. on the same IP address, then we
[82:24] might need to loosen Nginx's rate limit on
[82:27] IP addresses.
[82:33] Next, the request hits the middleware.
[82:36] I've disabled global middleware and
[82:39] moved most of the default middleware to
[82:41] the API group.
[82:43] That's because the web group can only be
[82:46] accessed internally within the
[82:47] Kubernetes cluster.
[82:50] For trust proxies, we're providing the
[82:53] CIDR range that the ingress controller
[82:55] is guaranteed to be within. And then
[82:58] Laravel knows that the client's IP is
[83:01] the real IP and that the connection is
[83:04] indeed secure.
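The trust decision boils down to a range check. This Python sketch is a simplified illustration, not Laravel's actual logic: it assumes a single proxy hop, so the last X-Forwarded-For entry is the address the proxy itself saw. The CIDR range and the addresses are made-up examples.

```python
import ipaddress


def is_trusted_proxy(remote_addr, trusted_cidr):
    """True if the directly connected peer falls inside the trusted range."""
    return ipaddress.ip_address(remote_addr) in ipaddress.ip_network(trusted_cidr)


def client_ip(remote_addr, forwarded_for, trusted_cidr):
    """Believe the forwarded header only when the peer is a trusted proxy."""
    if forwarded_for and is_trusted_proxy(remote_addr, trusted_cidr):
        # Single-hop assumption: the last entry was appended by our proxy.
        return forwarded_for.split(",")[-1].strip()
    return remote_addr


TRUSTED = "10.42.0.0/16"  # hypothetical range containing the ingress controller
via_ingress = client_ip("10.42.0.7", "203.0.113.9", TRUSTED)
direct = client_ip("198.51.100.4", "203.0.113.9", TRUSTED)
```

A request that does not arrive through the trusted range keeps its own peer address, so a forged header from outside is ignored.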
[83:07] Then API requests pass to routes/api.php,
[83:13] and for routes that require
[83:14] authentication the requests pass through
[83:17] the authentication middleware and the
[83:20] rate limiter middleware we discussed
[83:22] earlier.
[83:24] I extended the authenticate class as a
[83:27] convenient way to add the user ID to the
[83:30] context of logs created after
[83:32] authentication
[83:34] and also because in the original class
[83:36] if an unauthenticated request didn't
[83:39] have the accept header of
[83:42] application/json
[83:44] it tried to redirect the user to a login
[83:46] page but our API is purely JSON so
[83:50] that's not the desired behavior.
[83:53] The last endpoint is for browsers to
[83:55] report content security policy
[83:57] violations which go straight into the
[83:59] log.
[84:02] The web routes are mostly for the
[84:04] Kubernetes probes mentioned earlier. /
[84:07] Laravel readiness simply replies
[84:10] immediately.
[84:12] / Laravel startup if you recall tests
[84:16] the connections to the database and the
[84:18] cache in a try catch block and returns
[84:21] an error HTTP status code if there's any
[84:24] problem
[84:26] and / Laravel status is for my own
[84:29] tinkering and analysis of realpath cache and
[84:32] OPcache in production to further tweak
[84:35] those configurations.
[84:37] Unfortunately, we can't just run those
[84:39] commands in the command line interface
[84:42] because the CLI is separate from PHP FPM
[84:45] and maintains its own OPcache memory
[84:48] pool.
[84:49] And finally, an API request runs back
[84:52] through the middleware.
[84:54] The HandleCors middleware along with
[84:57] its config lets us tell browsers that
[85:00] the Vue.js front end is permitted to
[85:02] access the API.
[85:05] And the inject meta middleware adds the
[85:08] meta object to each JSON response object
[85:11] as it's passing out of the door.
[85:14] After the response has been sent, the
[85:16] last action is to log the request for
[85:18] future analysis.
[85:23] I won't cover the Vue.js front end
[85:25] because this video is already quite long
[85:28] and it just logs into the API, stores
[85:30] the token and interacts with the API in
[85:33] a simple manner.
[85:35] All the source code is available in the
[85:37] description of the video and please let
[85:39] me know if you have any questions or any
[85:42] improvements on any part of the code.
[85:44] Thanks.