[0:00] Hello everyone. This video is a [0:02] discussion of a Laravel RESTful API [0:04] project using PHP-FPM and Nginx with [0:08] a Postgres database, all in Docker [0:11] containers deployed in Kubernetes. [0:15] So I'll cover the setup in Kubernetes, [0:18] the choices and thought processes when [0:20] configuring PHP-FPM and Nginx, the [0:24] design of the development environment, [0:27] testing, deployment, and some features [0:29] of the API. Plus, there's a Vue.js front [0:33] end thrown in as well to log into the [0:35] API and interact with it. It's a bit too [0:38] simple to discuss in this video, but [0:40] it's also included in the code, and all [0:43] the code is in the repository in the [0:45] video description. [0:49] The production environment is my [0:51] Kubernetes cluster at home with a single [0:54] control plane and two worker nodes on a [0:57] pretty slow internet connection, [0:59] but that doesn't have too much impact on [1:01] the project and it can mostly be applied [1:03] anywhere. [1:06] The API itself has been kept generic so [1:08] you won't have to listen to any domain [1:10] specific concepts. The features I've [1:13] added are focused on internal workings [1:15] rather than a specific API use case. So [1:18] they could be applied and expanded to [1:20] many use cases. [1:22] For example, responses are restricted to [1:26] a set predictable structure, and each [1:29] error has a unique integer error code, [1:31] so the client can more easily handle [1:34] error responses programmatically. [1:37] Input validation for database columns [1:39] with unique or foreign key constraints [1:42] has been pushed to the database to [1:44] reduce the number of queries needed. [1:47] That's a controversial one, so we'll [1:49] discuss the pros and the cons.
[1:52] There's also a fix for a very common [1:54] Nginx inefficiency regarding HTTP [1:57] compression, [1:59] and there are some helpful logs to help [2:02] us set the PHP-FPM and Nginx [2:05] configurations. [2:08] So the structure of the video is [2:11] production environment, [2:13] PHP-FPM and Nginx configuration, [2:16] the Dockerfiles, the dev environment, [2:20] and something of a CI/CD pipeline, but [2:23] very minimal, and the Laravel API itself, [2:28] with discussion of security [2:30] considerations throughout. [2:35] So what does the production cluster look [2:37] like? The ingress controller manages [2:40] traffic for the cluster, handles TLS [2:43] termination, and applies HTTP [2:45] compression. It forwards API requests to [2:49] the Nginx and Laravel pod. If traffic [2:53] increases, the Nginx and Laravel pod [2:55] is duplicated by a horizontal pod [2:58] autoscaler. [2:59] So, as usual, we want to make sure that [3:02] it's completely stateless. [3:05] Then for the database we're using [3:07] Postgres managed by a StatefulSet. [3:11] Postgres zero is the primary instance [3:13] that all Laravel instances write to. [3:17] Postgres one and two and so on are the [3:20] read-only standby instances to [3:23] increase read throughput if we have high [3:25] concurrent usage. [3:28] So whenever a new standby instance [3:30] starts up, it duplicates the primary [3:32] database with [3:35] pg_basebackup. [3:37] And whenever the primary instance [3:39] executes a write operation, it sends the [3:42] WAL, or write-ahead log, records to each [3:47] standby so that all instances are [3:49] synchronized in near real time. [3:52] To configure this in Laravel, in the [3:55] database config file, add read and write [3:58] elements to the database you're using, [4:02] specifying the pod for the write [4:04] instance and the service for the read [4:06] instances.
[4:09] And for the development or testing [4:10] environments, override that in the [4:13] Laravel .env file, specifying the [4:16] database's service name in Docker [4:18] Compose. [4:22] For cache we're using Redis, [4:25] and of course there's the Vue front end, [4:27] which is mostly decoupled. It could be [4:29] hosted here or anywhere else. And finally, [4:33] all these components are scraped by [4:35] Prometheus, and Grafana molds that data [4:39] into dashboards, which are also [4:42] accessible through the ingress controller. [4:46] Since the cluster has so few nodes, we [4:49] can present the CPU and memory usage of [4:52] each node in a single dashboard. [4:55] And the other dashboard I watch most is [4:58] for the ingress controller, especially [5:00] latencies for the three connected [5:02] services, the API, the front end, and [5:06] the Grafana front end we're using right [5:08] now. [5:10] On my slow home internet, the latencies [5:12] don't help us model production systems [5:15] very much. They're pretty slow, but the [5:18] variance can still be insightful. [5:20] The variance is expressed here with the [5:23] average latency of the fastest 50%, 95%, [5:28] and 99% of requests. [5:35] In the ingress controller, we're most [5:38] commonly receiving requests from the CDN [5:41] with the original client's IP in the [5:45] X-Forwarded-For header. [5:47] So by specifying the CDN CIDR ranges [5:50] that we trust, we can safely strip them [5:53] from the forwarding chain and just leave [5:56] the client's IP. [5:58] That's important for logging and for [6:00] rate limiting by IP address downstream [6:04] in Nginx or Laravel. [6:07] The ingress controller, like any [6:09] Nginx instance, generates a random [6:12] request ID, and here we're attaching it [6:15] to the web request with a custom header.
[6:19] And that way we can reference a [6:21] consistent request ID here in the [6:23] ingress controller and also as the [6:26] request passes on through Nginx and [6:28] Laravel and back again. [6:32] Regarding HTTP compression, [6:35] by default, gzip and Brotli compress [6:39] files of any size, but they both add [6:42] metadata to the compressed files. So for [6:45] files that are already small, we're [6:47] actually expending CPU power to compress [6:51] a file and actually increase the amount [6:53] of data to be transferred. [6:57] So we specify brotli_min_length and [7:00] gzip_min_length to set the file size [7:03] threshold at which the ingress [7:06] controller will apply compression. [7:09] But what's the logical threshold to set? [7:13] Well, the equilibrium point at which [7:16] compression starts to reduce file size [7:18] is different per file, but is usually [7:21] between 100 and 400 bytes. [7:25] So, is that the answer? Well, not [7:28] really. When we send a message over TCP, [7:32] as long as the payload fits within one [7:34] TCP segment, [7:36] the size of that payload has a [7:38] negligible impact on the transmission [7:41] time. [7:42] It's like adding more passengers to a [7:44] plane. It has almost no impact on the [7:47] flight time. [7:50] So from the client's perspective, [7:52] latency doesn't scale linearly with the [7:55] file size. It potentially jumps stepwise [7:59] when the file size necessitates sending [8:02] another TCP segment. [8:05] That means HTTP compression is [8:08] worthwhile only if it has a decent [8:11] likelihood of reducing the number of TCP [8:14] segments needed to hold a particular [8:16] payload. [8:18] So a legitimate strategy is to set the [8:21] HTTP compression threshold at the file [8:25] size that would trigger a second TCP [8:28] segment.
[8:31] The maximum size of a TCP segment [8:34] depends on the MTU, the maximum [8:38] transmission unit on layer 3 of the OSI [8:41] model, which is 1,500 bytes. [8:46] Each IP packet has an IP header taking [8:50] 20 to 60 bytes and a TCP header taking [8:55] 20 to 40 bytes. And the rest of the [8:59] 1,500 bytes is for the payload. [9:03] For the largest possible payload that is [9:06] still contained in one TCP segment, [9:10] it would contain the TLS record for [9:12] encryption, the HTTP headers, and [9:16] finally, once all of that is accounted [9:18] for, the remaining space is for the HTTP [9:21] body, which is the candidate for [9:23] compression. [9:26] In this project, it's too early to [9:27] finalize what our standard HTTP headers [9:30] will be. So, we can't finalize this [9:33] optimization yet. But this is the [9:35] formula that will decide it once all of [9:37] the components are confirmed. [9:41] But there's one last twist. If the [9:44] response doesn't have a Content-Length [9:47] header, then Nginx ignores the [9:51] gzip_min_length and brotli_min_length [9:54] directives and compresses every file [9:57] regardless of file size. [10:00] So for that reason, in Laravel, we've got [10:03] a middleware to measure the body and add [10:06] the Content-Length header. [10:09] There's a php.ini directive called [10:12] mbstring.func_overload. [10:17] If we set it to zero, we can safely use [10:19] the faster strlen function for adding the [10:24] Content-Length header. [10:26] Otherwise, we'd need to use the [10:28] multibyte equivalent, [10:32] mb_strlen, and specify 8bit encoding to be [10:36] sure we're getting the number of bytes [10:38] instead of the character count. [10:41] We have to make sure this middleware is [10:42] late in the stack, after any middleware [10:45] that will modify the body, because if the [10:48] response body is longer than the Content-[10:50]Length header, Nginx will cut off the [10:52] excess.
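A minimal sketch of the two calculations described above, written in Python purely for illustration (the project itself is PHP). The single-segment budget follows the formula as spoken: MTU minus IP header, TCP header, TLS record overhead and HTTP headers. The TLS overhead of 22 bytes and the HTTP header size of 400 bytes are placeholder assumptions, not values from the video, and the function names are hypothetical. The second function shows why the Content-Length value has to count bytes rather than characters.

```python
def single_segment_body_budget(mtu=1500, ip_header=20, tcp_header=20,
                               tls_overhead=22, http_headers=400):
    """Largest HTTP body (in bytes) that still fits in one TCP segment.

    tls_overhead and http_headers are assumed placeholder sizes; measure
    the real values for your own stack before relying on the result.
    """
    max_segment_payload = mtu - ip_header - tcp_header
    return max_segment_payload - tls_overhead - http_headers


def content_length(body: str) -> int:
    """Content-Length counts bytes on the wire, not characters."""
    return len(body.encode("utf-8"))


# Smallest IP and TCP headers: 1500 - 20 - 20 - 22 - 400
print(single_segment_body_budget())                              # 1038
# Largest IP and TCP headers: 1500 - 60 - 40 - 22 - 400
print(single_segment_body_budget(ip_header=60, tcp_header=40))   # 978

body = "caf\u00e9"  # four characters, but the last one takes two bytes in UTF-8
print(len(body), content_length(body))                           # 4 5
```

Under these assumptions, a compression threshold somewhere around 1,000 bytes would be the candidate value, to be recalculated once the real header sizes are known.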
[10:58] Okay, on to Laravel and Nginx. There [11:02] are four ways of connecting PHP-FPM with [11:06] Nginx in Kubernetes. [11:08] The first decision is whether to put the [11:11] containers in separate pods or in the [11:13] same pod. [11:16] Nginx can handle far more connections [11:18] than PHP-FPM can, and separate pods would [11:22] let us scale them independently. So we [11:25] could save the memory overhead of the [11:27] excess Nginx pods. [11:30] Each Nginx pod has an overhead of [11:32] about 10 MB. [11:36] The downside is that Nginx could [11:38] forward requests to PHP-FPM instances on [11:42] different nodes, which would add network [11:45] latency. [11:46] Getting around this is fiddly and might [11:49] not be worth the memory saved. [11:52] Also, separate pods means that the logs [11:55] for a particular request would be split [11:58] up between different pods. [12:04] If we go with a single pod, that opens [12:07] up the choice of how Nginx and PHP-FPM [12:10] will talk to each other: either by using [12:13] TCP or by mounting a shared volume onto [12:17] both containers to hold a Unix socket [12:20] file. [12:22] Each TCP connection needs to be [12:25] established with a TCP handshake [12:27] consisting of three messages sent back [12:29] and forth. With the default Nginx [12:33] configuration, that handshake happens [12:35] for every single request routed to [12:38] PHP-FPM. [12:40] Each message sent is bundled with IP and [12:43] TCP headers, increasing the amount of [12:46] data to be transferred. And depending on [12:48] the size of the request and the [12:50] response, they might be broken up into [12:53] packets before sending, then reassembled [12:56] at the other end. And finally, the [12:58] connection needs to be torn down with [13:00] another three messages. [13:03] Due to all this redundant stuff, I [13:06] really expected sockets to be measurably [13:08] faster than TCP.
[13:11] Maybe the teardown would happen [13:12] concurrently to the response being sent [13:14] out, but the rest of it surely increases [13:17] latency. [13:19] However, in the very quick tests that I [13:22] set up, latency was almost identical [13:25] between the two methods. [13:27] That's interesting academically, and I [13:30] want to investigate more when I have the [13:32] time. But practically speaking, if we're [13:36] interested in such small savings in [13:37] latency, then we'd likely be better off [13:40] considering Swoole instead of PHP-FPM and [13:43] Nginx. [13:45] And then this whole architectural [13:47] decision would go away. [13:50] Both TCP and sockets benefit from the [13:53] Nginx directive fastcgi_keep_conn, [13:57] which keeps the connection open between [14:00] requests. So we wouldn't need the TCP [14:02] handshake for every request. [14:05] The PHP-FPM counterparts are [14:09] pm.process_idle_timeout, which sets the time [14:13] a worker can be idle before it's killed, [14:17] and pm.max_requests, [14:19] which caps how many requests a [14:22] worker can serve before it's respawned. [14:27] And we need to set fastcgi_pass in [14:30] Nginx and listen in PHP-FPM. [14:36] For TCP connections, we tell them which [14:38] port, and for sockets, we tell them the [14:41] location of the socket file. [14:46] Then the fourth and final option for [14:48] connecting Nginx with PHP-FPM in [14:51] Kubernetes is putting them in the same [14:54] container in the same pod. [14:57] This is workable, but it creates [14:59] complications. [15:01] Kubernetes monitors the status of the [15:03] process with PID 1 as a health check. If [15:07] the process exits, Kubernetes stops the [15:10] container even if other processes are [15:13] running. And if another process exits, [15:16] Kubernetes has no idea and does nothing.
[15:20] So if a container has more than one key [15:22] process, we need to implement a [15:24] replacement for the Kubernetes health [15:26] checks and process management. [15:31] In the end, I went with the middle-[15:33]ground option of same pod, different [15:36] containers, with a shared volume between [15:38] them for a Unix socket file. Mostly [15:42] because I hadn't tried this setup [15:43] before. [15:45] And we'll see the detailed [15:46] implementation in the Kubernetes [15:48] manifest and the Dockerfile later in [15:50] the video. [15:55] So with that decided, let's jump into [15:57] the pod manifest. [15:59] As I explain components, I'll also build [16:02] up this visual representation to help [16:05] visualize how everything links together. [16:08] Obviously the Laravel and Nginx [16:10] containers are at the core, and the first [16:13] step is to inject the configuration [16:15] files with Kubernetes Secrets and [16:18] ConfigMaps and to mount the shared volume for [16:21] PHP-FPM to create the socket file we [16:24] just talked about. [16:27] If we can make the whole file system in [16:29] a container read-only, that's great for [16:32] security. So we need to identify where [16:35] the application needs write access and [16:38] then mount volumes at those locations [16:41] with write access and make everything [16:43] else read-only, [16:45] and the socket volume is our first one [16:47] of those. [16:52] The Laravel bootstrap caches never [16:55] change in production. [16:57] So at first thought, we'd run the cache [16:59] creation commands in the Dockerfile so [17:02] that they're part of the image, and then [17:04] we'd make them read-only in production. [17:08] That's fine for event cache, route cache, [17:11] and view cache. But config cache needs [17:15] to read the .env file. [17:19] For security reasons, we can't put [17:21] sensitive files like the .env into the [17:24] image.
So the only safe option is to run [17:28] config:cache within each pod as it [17:31] starts up. [17:33] So that means the cache directory needs [17:36] to have write access in production. But [17:39] we'd prefer it to only have read access [17:41] because it never changes once created, [17:45] and any attacker that gets write access [17:47] to the bootstrap caches could obviously [17:49] do serious damage. [17:52] But there is a way to get the best of [17:54] both worlds. [17:56] First, in the Dockerfile, we rename the [17:59] bootstrap/cache [18:01] directory to something else, like cache [18:05] temp. [18:07] Then in Kubernetes, we run a Laravel [18:10] init container when the pod starts up. [18:14] It has the .env file injected and a [18:18] writable volume mounted at [18:20] bootstrap/cache. [18:24] We copy everything from the temp cache [18:26] directory into the volume at [18:29] bootstrap/cache [18:31] and then run php artisan config:cache to [18:37] create the final bootstrap cache file. [18:41] The time command just logs the memory [18:43] usage to help us set resource limits [18:45] later. [18:48] Then after the init container has [18:50] finished, the main Laravel container [18:53] starts up with the cache volume mounted [18:56] as readOnly: true. [19:00] And that's how we get a read-only cache [19:02] with sensitive data compiled at pod [19:06] startup. [19:11] We're also running additive database [19:14] migrations in the init container. [19:17] The --isolated flag is very [19:20] consequential. [19:22] It means while this migration is [19:24] running, although normal reads and [19:26] writes can still happen concurrently, [19:29] no other migrate command with the [19:32] isolated flag can begin. [19:35] A migrate command without the isolated [19:38] flag can still run. So it's important [19:40] that we add it to every migrate command [19:43] in production to make sure concurrent [19:46] migrations are impossible.
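The behavior of the isolated flag can be modeled with a toy lock. This is a simplified Python sketch, not Laravel's implementation: Laravel takes an atomic lock in the application's cache store, and here a small in-memory class (FakeCache, a hypothetical name) stands in for that store. It illustrates why a migrate command without the flag is not blocked.

```python
class FakeCache:
    """In-memory stand-in for a cache store with an atomic 'add if absent'."""

    def __init__(self):
        self._locks = set()

    def acquire(self, name: str) -> bool:
        # Succeeds only if nobody else currently holds the lock.
        if name in self._locks:
            return False
        self._locks.add(name)
        return True

    def release(self, name: str) -> None:
        self._locks.discard(name)


def try_start_migration(cache: FakeCache, isolated: bool) -> bool:
    """Return True if this migrate invocation would be allowed to start."""
    if not isolated:
        return True  # no lock check at all, so it can run concurrently
    return cache.acquire("migrate")


cache = FakeCache()
first = try_start_migration(cache, isolated=True)    # takes the lock
second = try_start_migration(cache, isolated=True)   # refused while the lock is held
third = try_start_migration(cache, isolated=False)   # bypasses the lock entirely
cache.release("migrate")                             # first migration finishes
fourth = try_start_migration(cache, isolated=True)   # lock is free again
print(first, second, third, fourth)                  # True False True True
```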
[19:49] The isolated flag uses the cache to track whether the [19:52] isolated command is currently running or not. [19:55] [19:56] So if you're using the database for [19:59] cache, it creates a catch-22 situation [20:02] for your first migration. You'd need to [20:04] run migrations once without the isolated [20:07] flag first, [20:09] but most applications would just use [20:11] Redis, so it wouldn't be a problem. [20:16] An unexpected problem I ran into is that [20:20] if a container crashes and gets [20:22] restarted by Kubernetes, [20:24] the mounted volumes are not cleared. [20:28] That's the behavior we want most of the [20:30] time. We want data to be permanent for [20:33] the life of the pod, but it can cause [20:36] some misleading error logs. In the init [20:39] container startup script, it copied and [20:42] created the cache files with no problem. [20:45] Then it hit an error with migrations. [20:48] So, Kubernetes killed the container and [20:51] ran it again. But on the second run, the [20:54] volume was already populated. So the [20:57] error logs were about file permissions, [20:59] not the migration commands. [21:02] So just bear that in mind. If all else [21:05] is equal, move any operations on mounted [21:08] volumes to the end of the script. But in [21:11] our case, migrations actually depend on [21:13] the cache files being present. [21:18] We've got another init container to set [21:20] up the directory structure for Nginx's [21:23] writable volume. [21:25] I prefer to use Chainguard images to be [21:28] minimal and more secure, but the CPUs of [21:31] my nodes are too old and don't support [21:33] them. [21:35] And finally, we have one last container [21:38] that scrapes the Nginx metrics [21:40] endpoint and presents that data in a [21:43] format that Prometheus can then scrape. [21:46] Port 8080 is for publicly available [21:50] endpoints via ingress.
Port 8081 [21:54] is for internal traffic like health [21:56] checks and metrics, and then Prometheus [21:59] scrapes the exporter on its default port, [22:02] 9113. [22:08] Kubernetes provides three types of [22:10] probes, or health checks. Probes are [22:14] attached to containers, but the actions [22:17] on success or failure can impact either [22:20] the container it's attached to or the [22:23] whole pod. [22:25] When a pod starts up, if at least one [22:28] container has a startup probe, then that [22:31] pod won't initially be added to the [22:34] service's endpoints. So, it won't be [22:37] accessible by other pods or by outside [22:40] traffic from ingress. [22:43] If a startup probe fails, the container [22:46] it's attached to is killed, and by [22:49] default configuration in Kubernetes, any [22:52] killed container is instantly restarted. [22:56] So that's hoping that a restart or a [22:58] slight delay will fix whatever caused [23:01] the startup probe to fail. [23:04] Once a container's startup probe passes, [23:07] it will never run again. And instead, [23:10] the container's readiness probe and [23:12] liveness probe begin running if it has [23:15] them. And they will then run repeatedly [23:18] for as long as the pod exists. [23:21] Once all startup probes in a pod pass, [23:25] and if there are no readiness probes, [23:28] then the pod is added to the service's [23:30] endpoints and it starts serving traffic. [23:34] If there are one or more readiness [23:36] probes, then the pod waits on them. [23:40] Once all readiness probes pass, the pod [23:44] is added to the service's endpoints. [23:47] But the readiness probes keep running [23:49] continuously. [23:51] And if at any time a readiness probe [23:53] fails, the pod is removed again. So a [23:57] failed readiness probe only has a pod-[23:59]level effect. It doesn't kill the [24:01] container. [24:03] That's what liveness probes are for.
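The gating rules just described can be written down as a small model. This is a simplified Python sketch of the behavior as explained in the video, not Kubernetes code, and the class and function names are hypothetical. A container with no startup or readiness probe is treated as having already passed it.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    startup_passed: bool = True   # True if there is no startup probe, or it has passed
    ready: bool = True            # True if there is no readiness probe, or it currently passes


def pod_in_service_endpoints(containers) -> bool:
    """The pod receives traffic only when every container has started and is ready."""
    return all(c.startup_passed and c.ready for c in containers)


# What a failure of each probe type does, as summarized in the video.
FAILURE_EFFECT = {
    "startup": "container is killed and restarted; pod is not yet in the endpoints",
    "readiness": "pod is removed from the endpoints; container keeps running",
    "liveness": "container is killed and restarted",
}

pod = [Container("nginx", startup_passed=False), Container("laravel")]
print(pod_in_service_endpoints(pod))   # False: startup probe has not passed yet

pod[0].startup_passed = True
print(pod_in_service_endpoints(pod))   # True: all probes pass

pod[0].ready = False
print(pod_in_service_endpoints(pod))   # False: readiness failed, pod removed again
```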
[24:05] When a liveness probe fails, it has no [24:09] pod-level effect. The pod can still [24:11] receive external traffic. Instead, a [24:14] failed liveness probe kills the [24:16] container it's attached to. [24:20] So, to summarize, a failed readiness [24:22] probe stops traffic flow reaching the [24:25] pod. A failed liveness probe kills the [24:28] container it's attached to. And a failed [24:32] startup probe does both of those, but [24:35] only when the pod is starting up. [24:39] Phew. I think that's the most concise [24:41] summary of probes I can give. So, how [24:44] can we apply these to our Laravel and [24:46] Nginx pod? [24:49] Well, we don't want requests reaching a [24:52] broken pod. So, a readiness check is [24:55] crucial. [24:57] And the ability to serve requests [24:59] depends on Nginx and Laravel both [25:02] working. So we put the readiness probe [25:05] on the Nginx container to query a [25:08] Laravel endpoint that returns a simple [25:12] plain text response. [25:14] That means the readiness check only [25:16] passes if Nginx and Laravel and their [25:19] connection are all fine. [25:23] The ability to serve requests also [25:25] depends on the connection to the [25:27] database and the cache. So we might [25:31] consider checking those connections as [25:33] part of the readiness check. [25:36] But if a database problem did occur, it [25:40] would affect all Laravel pods. [25:43] We wouldn't have a mix of healthy and [25:45] unhealthy pods. And the readiness probe [25:49] would react by removing this pod from [25:51] its service, which doesn't do anything [25:54] to solve the database problem. And [25:57] actually, we'd slightly prefer to keep [25:59] the Laravel pod serving requests to give [26:02] as graceful a response as possible. [26:06] And then we'd rely on some other probe [26:09] to heal the database problem closer to [26:11] where it occurred.
So no, the readiness [26:15] probe should not check the connections [26:17] to database and cache. [26:20] However, it is a good idea to add those [26:23] connections to the startup probe. That's [26:26] so that if we're deploying a new version [26:28] and I've messed up the connection [26:29] configurations, [26:31] Kubernetes will stop it going live and [26:33] keep the old version alive serving [26:36] requests. [26:39] Another reason to have a startup check [26:41] is because we're doing OPcache [26:42] preloading at startup. So we need some [26:45] flexibility around the slightly [26:47] unpredictable bootup time. [26:51] The startup probe of course also needs [26:53] to check both Nginx and Laravel and [26:56] their connection. So we add it to the [26:58] Nginx container and call a startup [27:01] endpoint in Laravel. [27:05] This design has one perverse side effect: [27:07] if the startup probe fails, it [27:10] kills the Nginx container it's [27:12] attached to, even though the problem is [27:15] much more likely to come from the [27:16] Laravel container or the database or [27:19] cache connections. But that's one [27:21] imperfection I think we can live with. [27:25] And finally, what about liveness [27:27] probes? Well, in the Nginx [27:30] configuration, I created an endpoint [27:32] that returns a simple plain text [27:34] response, but I think the chance of [27:36] Nginx messing up is so low that, [27:39] currently, I don't think it's worth [27:40] running a constant probe. [27:43] Laravel is a bit more likely to mess up. [27:45] So, we've got a liveness probe querying [27:48] the PHP-FPM status page. [27:56] The last big topic of this manifest is [27:58] the memory limits that Kubernetes [28:00] imposes on each container. So this is [28:03] where we'll transition to the topic of [28:05] configuration for PHP, PHP-FPM, and [28:09] Nginx.
[28:11] The Kubernetes memory limit is a fail-[28:13]safe that kills the container if it's [28:16] exceeded. [28:17] That's a pretty drastic action. So we [28:20] need to set it high enough to cover the [28:22] peak memory usage in normal operation [28:26] so that it's only triggered by abnormal [28:28] memory usage that we want to catch early [28:30] and contain. [28:33] This is my formula to estimate peak [28:35] usage in normal operation. [28:38] My understanding can be improved further, [28:40] but I think this is a decent formula for [28:42] now. And at the end we use a margin [28:45] component to represent the degree of [28:48] confidence we have in our estimation. [28:51] PHP-FPM has one master process and a [28:55] variable number of workers that serve [28:57] web requests. So we need to determine [29:00] which memory expenses are per worker and [29:04] which are shared between all workers. [29:07] The master process overhead, the PHP [29:10] interpreter and its extensions, [29:13] OPcache, and any mounted volumes stored in [29:16] memory should all be counted once. And [29:19] then everything in the brackets is [29:21] multiplied by the maximum number of [29:23] workers specified by the PHP-FPM [29:26] directive pm.max_children. [29:31] For an application that's 100% CPU-[29:34]intensive, we'd set pm.max_children equal [29:39] to the number of CPU cores available to [29:41] the container. [29:43] But the larger the I/O wait is expected [29:46] to be, like waiting for database queries [29:49] to come back, the more we can raise [29:54] pm.max_children above the number of cores. [29:58] Workers process one request at a time. [30:01] So memory usage doesn't scale infinitely [30:04] as the concurrency of requests grows. The [30:08] number of workers is a cutoff. So [30:10] pm.max_children is the ultimate cutoff. [30:14] And any queue of waiting requests mostly [30:17] consumes Nginx's memory allowance, [30:20] not PHP-FPM's.
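As a rough illustration of the shape of that formula, here is a Python sketch. The on-screen formula is not reproduced in the transcript, so the per-worker term is collapsed into a single hypothetical value (per_worker_peak_mb), the margin is assumed to be a simple percentage on top, and all the numbers are made-up placeholders rather than measurements from the project.

```python
def fpm_container_memory_limit_mb(shared_mb, per_worker_peak_mb,
                                  max_children, margin=0.25):
    """Estimate a container memory limit for PHP-FPM, in MB.

    shared_mb:          dict of costs counted once (master process, interpreter
                        and extensions, OPcache, in-memory volumes)
    per_worker_peak_mb: assumed peak memory of one worker in normal operation
    max_children:       the value of pm.max_children
    margin:             assumed safety margin, as a fraction of the estimate
    """
    shared_total = sum(shared_mb.values())
    peak = shared_total + max_children * per_worker_peak_mb
    return peak * (1 + margin)


# Placeholder figures, for illustration only.
shared = {
    "fpm_master": 10,
    "interpreter_and_extensions": 30,
    "opcache": 128,
    "in_memory_volumes": 5,
}
limit = fpm_container_memory_limit_mb(shared, per_worker_peak_mb=40, max_children=8)
print(limit)  # (173 + 8 * 40) * 1.25 = 616.25
```

Because the per-worker term is multiplied by pm.max_children, any overestimate of a single worker's peak is amplified, which is why logging real per-request peaks matters.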
[30:24] memory_limit in php.ini is the [30:28] memory usage failsafe on script [30:31] execution. [30:33] So just like the Kubernetes memory [30:35] limit, we need to predict the peak [30:37] memory usage in normal operation, of a [30:40] single script this time, and add a margin [30:42] of confidence. [30:45] If abnormal memory usage happens in a [30:48] script, we want the PHP memory limit to [30:51] kill that script. [30:54] And the additional margin of the [30:56] Kubernetes memory limit means it can [30:58] only be triggered in an even rarer and [31:01] more extreme situation, and it would kill [31:03] the whole container, not just a single [31:06] request. [31:08] To help set PHP's memory limit, in [31:11] Laravel we're logging the peak memory [31:13] usage for every request. [31:16] Similarly, we're also logging the [31:18] realpath cache size and the worker ID. [31:22] Doing it in the middleware's terminate [31:24] method means it's executed after the [31:27] response is sent out. [31:30] We can also use Xdebug profiling to find [31:33] exactly how memory is used in a [31:35] particular request execution. [31:38] And while profiling in Laravel, we [31:41] should disable garbage collection at the [31:43] start of the script to ensure accurate [31:45] readings. [31:47] So there's a setting in the .env file to [31:49] toggle garbage collection. [31:55] We have a similar structure for the [31:57] Nginx prediction of peak memory usage [32:00] in normal operation. [32:03] Shared resources are counted once, and [32:06] the items in brackets are per [32:08] connection. So they get multiplied by [32:11] worker_processes, which is the number of [32:13] workers, and worker_connections, which is [32:17] the maximum number of connections per [32:19] worker. [32:21] The default for worker_connections is [32:24] 512, [32:26] and that can realistically be set as [32:27] high as 10,000 or more.
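To show how strongly the connection count drives that estimate, here is a small Python sketch of the multiplication. The individual buffers are collapsed into one hypothetical per-connection figure, and the 10 MiB shared cost, 16 KiB per connection and two workers are placeholder assumptions, not values from the project.

```python
KIB = 1024
MIB = 1024 * KIB


def nginx_peak_memory_bytes(shared_bytes, per_connection_bytes,
                            worker_processes, worker_connections):
    """Shared costs counted once, per-connection buffers counted for every connection."""
    max_connections = worker_processes * worker_connections
    return shared_bytes + per_connection_bytes * max_connections


# Placeholder figures: 10 MiB shared, 16 KiB of buffers per connection, 2 workers.
default = nginx_peak_memory_bytes(10 * MIB, 16 * KIB, 2, 512)
raised = nginx_peak_memory_bytes(10 * MIB, 16 * KIB, 2, 10_000)

print(default / MIB)  # 26.0
print(raised / MIB)   # 322.5

# Adding just 1 KiB of margin per connection at 10,000 connections per worker:
extra = nginx_peak_memory_bytes(10 * MIB, 17 * KIB, 2, 10_000) - raised
print(extra)          # 20480000 bytes, roughly 19.5 MiB
```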
[32:30] Since we will add a [32:32] margin of confidence to each of these [32:34] buffer sizes to ensure they can [32:36] satisfy legitimate requests, [32:39] those margins would then be multiplied [32:42] thousands or tens of thousands of times [32:44] when we're calculating the container-[32:46]level memory limit. [32:49] We would then be reserving a huge amount [32:51] of memory for a peak usage scenario that [32:54] is very unlikely to occur. [32:57] So to manage that problem, first we need [33:00] to align worker_connections with the [33:03] peak request concurrency we want to [33:05] guarantee satisfying and with the memory [33:08] or the financial constraints that we [33:11] have. [33:12] Second, let's consider the size of these [33:15] buffers extremely carefully. Can we [33:18] restrict them without hurting UX? [33:21] Does exceeding a particular buffer kill [33:23] a request, or does it just downgrade [33:26] performance, and by how much? [33:29] So with that as our goal, how do these [33:31] buffers work? For an incoming request, [33:35] Nginx puts the headers into the client [33:38] header buffer. [33:40] If the headers exceed that buffer, [33:43] Nginx puts them into the large client [33:46] header buffers. [33:49] And if in that scenario the initial [33:52] client header buffer is no longer used, [33:54] we could safely remove that from our [33:56] peak usage formula. But I haven't had [33:59] time to experiment with that yet, so I'm [34:01] keeping it in just to be safe. [34:04] If the large client header buffers are [34:07] exceeded, Nginx returns a 400 Bad [34:11] Request error. That's a pretty drastic [34:14] action. So, we can't squeeze this buffer [34:17] too tightly, especially if we have a [34:19] broad or non-technical user base.
[34:23] But if our API end users are technical [34:26] enough, we could put a low but [34:28] reasonable limit on the size of the [34:30] request headers and then require users [34:33] to read and abide by the documentation. [34:41] Nginx stores the request body in the [34:44] client body buffer, and any excess is [34:47] stored on disk. So we can be more [34:50] aggressive with this buffer, maybe only [34:52] guaranteeing it will hold the bodies of [34:55] 95% or 99% of legitimate requests. [35:02] For an Nginx instance that only serves [35:04] static assets, like the Nginx serving [35:07] our Vue.js front end, legitimate users [35:11] will never send POST, PUT, or PATCH [35:13] requests. [35:15] So, we can set the client body buffer [35:17] size very low. [35:20] But we'd still need to count it in our [35:21] formula because users can still fill up [35:24] that buffer, and we don't want to give [35:26] malicious users the ability to trigger [35:28] our memory limit and kill the Nginx [35:30] container. [35:34] After receiving the request line and the [35:36] headers, Nginx compiles them to [35:39] determine how to route the request. Some [35:43] of the requests will be sent to PHP-FPM, [35:46] and it will run the Laravel application [35:48] to build the response, which will then be [35:51] sent back to Nginx. [35:54] The first part of the output from PHP-[35:57]FPM is stored in the preliminary buffer [36:00] determined by fastcgi_buffer_size. [36:05] It's crucial that the FastCGI headers [36:08] are fully included in this preliminary [36:11] buffer because if not, Nginx returns a [36:14] 502 Bad Gateway error. [36:18] These FastCGI headers are what will [36:20] later be translated into the response's [36:23] HTTP status and HTTP headers. [36:29] So we need to be very aware of and [36:31] control the size of the headers returned [36:34] by Laravel.
[36:35] In the NGINX access log, we're [36:38] recording the embedded variables [36:40] $upstream_response_length, which is the [36:43] total size of the response payload sent [36:46] from PHP-FPM, and $body_bytes_sent, which [36:51] is the size of the response body. So, [36:54] the total minus the body gives us the [36:57] size of the headers. This is likely to [37:00] be much smaller than the size of the [37:02] equivalent HTTP headers because it's in [37:05] binary key value format and doesn't [37:08] include endline characters. [37:13] Any headers added by NGINX are not [37:16] held in this buffer because they're [37:18] added as the response is sent out. [37:23] If this preliminary FastCGI buffer fully [37:25] contains the headers, the rest of the [37:28] buffer is put to good use and fills up [37:30] with the first part of the response [37:32] body. So there's no memory saving [37:34] benefit to squeezing this buffer [37:36] aggressively. [37:38] If the response exceeds the preliminary [37:41] buffer, the rest of the body is stored [37:43] in fastcgi_buffers. [37:47] Yes, that's named confusingly. [37:50] What I'm calling the preliminary buffer [37:53] is determined by the NGINX directive [37:56] fastcgi_buffer_size, [37:59] and then these body only buffers are [38:01] determined by fastcgi_buffers. [38:06] If these body only buffers are also [38:08] exceeded, NGINX responds with a 502 [38:12] Bad Gateway. [38:14] So we need to be very aware of the [38:16] maximum size of our responses. [38:20] The value of $upstream_response_length in [38:22] our logs will help with that. [38:26] If we can control the response size with [38:28] a high degree of confidence, we can set [38:31] these buffers quite tightly. [38:34] And then we'd need to set up a system [38:36] such that any design or codebase change [38:38] that affects the maximum response size [38:41] triggers a reassessment of this [38:43] configuration before deployment.
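The log arithmetic described above can be sketched as follows. This is only an illustration that follows the reasoning in the video (total upstream length minus body bytes): the log values are made up, the function name is hypothetical, and the subtraction only makes sense if this NGINX instance does not compress or otherwise alter the body before sending it.

```python
def header_bytes(upstream_response_length: int, body_bytes_sent: int) -> int:
    """Estimate the size of the upstream headers for one logged request."""
    return upstream_response_length - body_bytes_sent


# Hypothetical values, as they might be extracted from access log lines.
entries = [
    {"upstream_response_length": 1480, "body_bytes_sent": 1212},
    {"upstream_response_length": 9310, "body_bytes_sent": 8990},
    {"upstream_response_length": 402, "body_bytes_sent": 95},
]

# The largest header estimate informs fastcgi_buffer_size;
# the largest total response informs fastcgi_buffers.
largest_headers = max(header_bytes(**entry) for entry in entries)
largest_response = max(entry["upstream_response_length"] for entry in entries)

print(f"largest header estimate: {largest_headers} bytes")
print(f"largest response:        {largest_response} bytes")
```

Running this over a representative window of production logs gives an observed maximum, not a guaranteed one, so a margin on top is still a judgment call.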
[38:46] And we'd also need to implement [38:47] end-to-end tests for the scenarios that [38:50] generate the largest possible responses [38:52] in production. [38:55] It's possible that NGINX clears the [38:57] request buffers before the response [39:00] buffers are filled. If that's the case, [39:03] we could safely use the max of the [39:05] request buffers and the response buffers [39:08] instead of the sum. And that would then [39:10] reduce our peak memory estimate by quite [39:13] a lot. [39:14] That would be very interesting to [39:16] examine, but I haven't had the time to [39:17] do that yet. [39:20] For an NGINX instance that only serves [39:23] static assets, it's impossible for a [39:25] request to use the FastCGI buffers, not [39:29] to mention the proxy or the output [39:31] buffers not mentioned here. So, we can [39:34] safely remove those from our peak usage [39:36] formula for that instance. [39:39] And the ingress controller handles TLS [39:42] termination. So we're also ignoring [39:45] ssl_buffer_size here. [39:52] PHP, PHP-FPM, and NGINX all have [39:57] settings for timeouts that govern [39:59] various parts of the request response [40:01] process. So I created this diagram for [40:04] my notes to help visualize how they line [40:06] up. [40:08] The x-axis represents time very roughly [40:13] as the request goes from client to [40:15] NGINX to PHP-FPM and the response [40:19] reverses that route. [40:22] But the size of each block doesn't [40:24] correspond to how long the task takes or [40:26] the suggested timeout value. Rather, the [40:30] diagram shows how and when each timeout [40:33] is triggered and how they overlap. [40:38] If any of these timeouts are breached, [40:40] the client receives an error response. [40:43] So, we want these timeouts to [40:45] accommodate essentially 100% of [40:48] legitimate requests.
[40:51] Starting from the left, the first [40:53] timeout is client_header_timeout, which [40:57] is triggered when NGINX accepts the [40:59] connection after the TCP handshake. [41:03] It sets the time needed to receive the [41:05] request line and the headers from the [41:08] client. [41:09] It's not an absolute timeout. Rather, [41:12] it's a timeout for the intervals between [41:15] reads. That is, each time some part of [41:19] the header is received, the timer resets [41:22] to zero. And that's a running theme for [41:25] a lot of these NGINX timeouts, as you [41:28] can see in the diagram. [41:31] After NGINX receives and parses the [41:34] request line and the headers, the [41:36] client_body_timeout begins and limits the [41:40] intervals between reads of the request [41:42] body from the client. [41:45] The purpose of these two request [41:47] timeouts is to stop partial requests [41:50] filling up the available connections. [41:53] A Slowloris attack is an attempt to do [41:56] that en masse, and a clever attacker could [41:59] easily determine our timeouts and then [42:01] drip feed the server with a request [42:04] repeatedly. So it's important to also [42:07] limit concurrent requests from the same [42:09] IP address. [42:11] So we store all concurrent IP addresses [42:15] with limit_conn_zone. With my settings, [42:19] there can be a maximum of 24,048 [42:22] concurrent connections. So that needs [42:26] 124 kilobytes to guarantee storing all [42:28] concurrent addresses. [42:32] Then in a server or location block, use [42:35] limit_conn to put an upper limit on the [42:38] number of concurrent connections from [42:40] the same IP address. [42:43] Back to timeouts. [42:46] After the header has been received, [42:48] NGINX can determine how to process the [42:51] request.
If it will be forwarded to PHP-FPM [42:54] and if the NGINX worker doesn't [42:57] already have a connection with an idle [43:00] PHP-FPM worker, it starts a new [43:03] connection and starts the [43:07] fastcgi_connect_timeout. [43:09] In normal operation, establishing a [43:12] connection is near instantaneous [43:14] unless all PHP-FPM workers are busy and [43:18] the queue is filled up. The queue size is [43:21] determined by the PHP-FPM directive [43:25] listen.backlog. So, it's advisable to [43:29] set it to the maximum 511 [43:32] and then set fastcgi_connect_timeout to [43:36] just a few seconds. [43:39] Once the body is fully received and the [43:42] FastCGI connection is established, [43:45] NGINX begins sending the request to [43:47] PHP-FPM and starts the [43:53] fastcgi_send_timeout, and PHP starts the [43:57] max_input_time. [44:00] max_input_time also covers the parsing [44:03] of the request body, like populating the [44:06] $_POST or $_FILES predefined [44:11] variables, before the script can begin [44:14] execution. [44:16] But of course, NGINX doesn't know [44:18] about or care about any of that. So as [44:21] soon as the transmission is complete, it [44:23] switches from the fastcgi_send_timeout [44:26] to the fastcgi_read_timeout, which [44:30] puts a limit on how long Laravel can [44:32] take to return the response in full. [44:36] More accurately, it puts a limit on the [44:38] interval between read operations. But [44:41] since Laravel typically buffers the [44:43] whole response and sends it out at the [44:45] end, that makes fastcgi_read_timeout [44:50] almost the same as an absolute timeout. [44:55] The execution time of the PHP script is [44:58] limited by PHP's max_execution_time and [45:03] by PHP-FPM's request_terminate_timeout. [45:09] max_execution_time measures CPU time. So [45:13] the timer is paused during IO operations [45:16] like database queries.
And when [45:19] exceeded, it has a slightly more [45:21] graceful termination, [45:23] whereas request_terminate_timeout [45:26] measures wall clock time and it has a [45:29] hard termination. [45:31] So we'd align these two, but then we'd [45:35] increase request_terminate_timeout to [45:38] account for the peak expected IO time. [45:41] And then we'd also add a little margin [45:44] to give max_execution_time a chance to [45:47] terminate the script more gracefully. [45:51] After returning the response in full, if [45:54] the PHP script continues execution, as [45:58] with the terminate method in middleware, [46:01] this is included in max_execution_time [46:05] and request_terminate_timeout, but not [46:08] in NGINX's fastcgi_read_timeout [46:12] because once NGINX has received the [46:14] response, it's already moved on to [46:16] sending it to the client. [46:19] So the script execution timeouts can [46:21] extend rightwards beyond T5 in the [46:25] diagram and maybe beyond the end of [46:28] NGINX's fastcgi_read_timeout. Though [46:32] in that situation, it's probably better [46:34] to use queue workers instead. [46:38] To help set the script execution [46:40] timeouts, in Laravel we're logging wall [46:43] clock duration and CPU time duration for [46:47] each request. [46:49] And finally, at the right of the [46:51] diagram, send_timeout limits the [46:55] intervals between write operations while [46:58] sending the response to the client. [47:02] NGINX has four embedded variables [47:04] which we can log to help us with setting [47:08] some of these timeouts. [47:10] $request_time measures from the first [47:13] byte received from the client to the [47:16] last byte sent to the client. [47:20] $upstream_connect_time measures the time [47:23] to establish the FastCGI connection. [47:27] That should hopefully always be zero.
[47:31] $upstream_header_time measures from the [47:34] first byte sent to PHP-FPM until the [47:38] first byte received in response by [47:40] NGINX. [47:43] And $upstream_response_time has the same [47:46] start point but keeps measuring until [47:49] the last byte of the response is [47:51] received by NGINX. Those two will be [47:54] the same unless the response is very [47:56] large. [48:00] In the Laravel documentation, the [48:03] suggested NGINX configuration is [48:05] really not great. [48:08] As an example, let's say a user sends a [48:12] request to our domain at /hello. [48:16] So this embedded variable $uri [48:19] equals /hello. [48:21] With this configuration, what we're [48:23] asking NGINX to do is this. [48:26] First check for a file called hello and [48:30] if it exists serve that file to the [48:33] client. [48:34] So far so good. That could be a JS file [48:36] or a CSS file. [48:39] But if the hello file doesn't exist, check [48:42] for a directory called hello. [48:46] If the hello directory exists, check for an [48:49] index file, which above is defined as [48:52] index.php. [48:54] If that exists, serve it to the client. [48:57] And already from a Laravel perspective, [49:00] we are way off course. [49:03] If the hello directory didn't contain [49:06] index.php, [49:08] then serve the directory listing, which [49:11] very ancient internet users will [49:13] remember, but these days directory [49:15] listings are disabled by default. So, [49:18] NGINX returns 403 Forbidden simply [49:22] because the user requested a directory [49:24] which does exist on the server. [49:28] And if the request URI isn't a file and [49:31] isn't a directory, finally we're [49:33] directed to the Laravel application [49:36] index.php. [49:38] But I don't understand why index.php [49:40] here is so convoluted with variables. [49:44] If not to a static file, we always want [49:46] to forward to public/index.php.
[49:50] So why not just hardcode it here and [49:52] state it clearly? [49:56] And these headers at the moment are [49:58] functional. But the moment we put an [50:01] add_header directive into a location block, [50:04] that block no longer inherits add_header [50:07] directives from outside. So it's safer [50:10] to just put all add_header directives [50:14] into location blocks and not rely on [50:16] inheritance. [50:18] This whole configuration feels like a [50:20] copy paste job from a pre-Laravel PHP [50:23] project. And although it works, it has [50:27] needless inefficiencies and potential [50:29] security flaws. [50:33] For our NGINX configuration on port [50:35] 8080, [50:37] we want all responses to be JSON, [50:40] including errors caught by NGINX. [50:43] So for the error pages, we're using [50:45] named locations that serve static JSON [50:48] files saved into the image. [50:51] Internal locations would also achieve [50:53] the same result. [50:56] Then apart from the favicon and [50:58] robots.txt, we're returning 404 for [51:02] any request that doesn't start with [51:05] /api/v1/. [51:10] And if we take a look at the standard [51:12] hacky requests that every server gets, [51:15] this location block alone rejects, well, [51:18] all of them. We definitely don't want any [51:21] malicious requests like this to access [51:23] any static files, and preferably we don't [51:26] want to waste resources forwarding the [51:28] request to Laravel just for it to return [51:31] a 404 response. [51:35] All API requests get sent to Laravel and [51:38] index.php is hardcoded for clarity. [51:43] Finally, we're rate limiting by IP [51:45] address here in NGINX and later by [51:49] user ID in the Laravel application.
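The routing decision for the port 8080 server can be modelled in a few lines. This is a language-neutral sketch of the logic, not the NGINX configuration itself; the function name, the exact favicon path, and the sample probe URIs are assumptions chosen for illustration.

```python
STATIC_FILES = {"/favicon.ico", "/robots.txt"}  # assumed exact paths
API_PREFIX = "/api/v1/"


def route(uri: str) -> str:
    """Return where a request URI ends up: 'static', 'laravel', or '404'."""
    if uri in STATIC_FILES:
        return "static"
    if uri.startswith(API_PREFIX):
        return "laravel"
    # Everything else is rejected before touching the filesystem or PHP-FPM.
    return "404"


for uri in ["/api/v1/items", "/robots.txt", "/.env", "/wp-login.php", "/api/v2/items"]:
    print(f"{uri} -> {route(uri)}")
```

The point of the design is that the common probe paths fall through to the last branch without a file lookup and without waking a PHP-FPM worker.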
[51:54] Then for port 8081 for cluster internal [51:58] traffic, we've got keep-alive times to [52:00] sustain TCP connections with the NGINX [52:03] exporter and the readiness check for one [52:08] week and for two weeks respectively. [52:12] The NGINX up endpoint is a simple NGINX-only [52:15] endpoint for a liveness check, which [52:18] we're currently not using. [52:20] The NGINX status endpoint is for the NGINX [52:23] exporter to scrape NGINX metrics and [52:26] in turn be scraped by Prometheus, [52:29] and the rest are specific endpoints to [52:31] send to Laravel. [52:34] In the Vue.js NGINX instance, if a [52:38] request URI ends in one of these file [52:41] extensions, we check if the static file [52:44] exists and if not return index.html, [52:49] and the Vue.js router will send the 404 [52:52] page. [52:55] And then for non-static asset URIs, go [52:58] straight to index.html. [53:01] In the Content-Security-Policy header, [53:04] we need to specify the API domain for [53:07] connect-src and form-action. [53:12] For static assets, we can cache the [53:14] results of the stat and open system [53:17] calls. And since we're using immutable [53:20] containers, we can safely cache for a [53:22] year or more. And then the static assets [53:25] themselves will likely be held in memory [53:27] by Linux's page cache, though that [53:30] depends on other containers in the same [53:32] node. [53:37] Let's take a quick look at the [53:38] Dockerfiles for the Laravel image. Some [53:42] dependencies are needed at build time [53:44] for compiling PHP extensions and running [53:47] composer install but not needed at [53:50] runtime. [53:52] So the overall design is compile in a [53:55] builder target and then copy the results [53:58] into a fresh minimal target for [54:01] production.
[54:03] To accommodate other targets, which we'll [54:05] discuss in a second, we have to split [54:08] builder and build prod targets and split [54:12] minimal base and prod targets, with prod [54:15] being the ultimate image to deploy in [54:18] production. [54:20] During the build phase, the composer [54:22] install command only needs composer.json [54:26] and composer.lock. But creating [54:29] the autoloader obviously needs the [54:31] entire codebase. So a very efficient [54:34] Docker caching strategy is to copy in [54:37] the .json and .lock files and run composer [54:41] install with the --no-autoloader [54:44] flag. [54:46] Then that intensive process is cached [54:49] until we change our composer [54:51] dependencies. [54:53] Then we can copy in the codebase and [54:55] build the autoloader. [54:57] If we didn't split those commands, we'd [54:59] need to build the vendor directory every [55:02] time we modify a file. [55:05] In the production image, for PHP-FPM to [55:09] communicate with the NGINX [55:11] container, both processes need read and [55:14] write access to the socket file. So we [55:17] make sure that the www-data user in this [55:21] container and the nginx user in the [55:24] NGINX container have the same UID [55:28] and then set the file permissions [55:30] accordingly in the PHP-FPM configuration. [55:35] As mentioned in the Kubernetes section, [55:37] we run all bootstrap caches except for [55:40] config cache in the Dockerfile. [55:44] And we're not using Laravel's built-in [55:46] health check endpoint. But if we were, [55:49] apparently that view isn't cached by php [55:52] artisan view:cache. So we can access it [55:56] once in the Dockerfile to force that [55:59] view to be rendered so that we can make [56:02] the cache directory read only in [56:04] production.
[56:06] For file permissions, I've commented out [56:09] some of my standard Laravel production [56:11] setup because they don't apply in [56:13] Kubernetes or for this particular [56:15] project, but I like to keep them here [56:17] just as a reminder or if we change the [56:20] project or architecture later. [56:24] For the local dev environment, the [56:26] simplest option is to extend the builder [56:28] target just before the composer install [56:31] command is run. Then install development [56:35] tools like Xdebug and create a directory [56:38] for the output from Xdebug profiling, [56:42] and then mount the local codebase into [56:44] the container in Docker Compose with a [56:47] bind mount. [56:50] We want to run composer install and [56:52] artisan commands inside this container [56:55] to write files to our local device. [56:59] So in the Makefile we can construct a [57:01] command to enter the container with the [57:04] www-data user and our local user's group [57:09] GID, [57:10] and then set umask to make sure that [57:13] any files generated have 775 permissions, [57:17] as in full permissions for both user and [57:20] group. [57:21] That way the vendor directory and any [57:24] other files created by this container [57:26] are readable and writable by the [57:29] www-data user in this container and our [57:33] local user on the host. [57:37] I've also got a combined target for [57:39] testing the architectural option of [57:43] PHP-FPM and NGINX in a single container, [57:46] which we discussed earlier. [57:50] The only complaint with this dev image [57:52] is that it has different dependencies to [57:54] the production image. So potentially [57:57] some tests could be passing in this [57:59] image but be failing for production. [58:02] This is a trade-off I'm happy with. But [58:05] an alternative, more complex approach [58:07] would be to run composer install with [58:10] testing dependencies included.
[58:13] Then copy the resultant vendor directory [58:16] into a target that splits from the [58:18] production image just before the [58:20] codebase is copied in, and then mount the [58:24] local codebase with a bind mount. And [58:27] similarly for the dev environment, split [58:30] off from minimal base and install dev [58:33] tools like Xdebug. [58:37] The local development environment is [58:39] handled by Docker Compose. There's not [58:42] too much to note here. NGINX depends [58:45] on the Laravel container, and [58:48] service_started is enough because we just need [58:50] the socket file to exist before NGINX [58:53] starts. [58:55] Ideally, Laravel should depend on [58:57] Postgres and Redis passing health [58:59] checks. Postgres has a handy [59:04] pg_isready command and Redis has ping. But I [59:08] commented out the dependencies since [59:10] very often I was just testing NGINX [59:12] and PHP only and didn't want to wait an [59:15] extra 2 seconds to get running. [59:18] There are two instances of Postgres, [59:21] one for the dev environment and one for [59:22] testing. It's most common to run tests [59:26] with an in-memory instance of SQLite [59:29] as the database. But as we'll see later, [59:32] our application is unfortunately tightly [59:35] coupled to Postgres. So we need to run [59:37] the tests with Postgres to have any [59:40] confidence that the results represent [59:42] the application in production. [59:46] There's a pre-commit script to process [59:48] the code before committing to the Git [59:50] repository. For Vue.js, lint-staged [59:55] handles things pretty well. We're [59:58] running Prettier, ESLint, and Vitest on only [60:02] staged files that have changed since the [60:04] last commit and running vue-tsc on the [60:08] whole project.
[60:10] In a similar way, in the bash script, [60:13] we've got a function that returns an [60:15] array of the staged files that have [60:18] changed since the last commit, so that [60:20] we're not wasting time and resources [60:22] checking the whole codebase on each [60:24] commit. [60:26] For scripts, we feed that array into [60:28] ShellCheck. And for PHP, we're [60:32] validating the composer.json and [60:35] composer.lock files, then feeding the changed [60:38] files array into PHPStan at level 9 and [60:42] Pint, and then running git add to [60:45] re-stage any files that were modified. [60:48] Each makes sure to reference the correct [60:50] configuration file. And we have a [60:53] stricter Pint configuration for non-tests [60:56] than for tests. [60:59] And the last step is to run PHPUnit. It [61:03] executes a script mounted in the Laravel [61:05] container which accepts arguments for [61:09] whether or not to first run database [61:11] migrations, [61:12] the test coverage threshold, test suites [61:16] to include or exclude, and which tests [61:19] specifically to run. [61:23] Since I'm not working as part of a team, [61:25] a Makefile is sufficient for a CI/CD [61:28] pipeline for testing, building, and [61:30] deploying. [61:32] Exec Laravel executes a command in the [61:35] Laravel container. [61:38] Very often we're creating or editing [61:40] files on the host machine via a bind [61:42] mount. So we enter the container with [61:45] both the www-data user and the local [61:50] user's group so we can function in both [61:52] worlds. And we're setting umask so [61:55] that any files created have 775 [61:58] permissions so that both the container's [62:01] user and the local user can read and [62:03] write the files created. [62:06] Shell Laravel executes an interactive [62:09] shell in the container.
[62:12] The composer commands have the same [62:14] function and are just time savers to do [62:16] specific composer functions without [62:19] opening an interactive terminal. [62:21] And then there are similar exec and [62:23] shell commands for each container. [62:27] If we want to run PHPStan, Pint, or [62:31] PHPUnit outside of the pre-commit check, [62:34] those commands are here. For testing, [62:37] there are commands for the standard test [62:39] suite and for end-to-end tests for [62:42] testing a deployment. [62:45] Then there are commands for building the [62:46] images and for deploying them in [62:49] Kubernetes. [62:51] We can switch contexts between the local [62:53] kind cluster and my physical cluster. [62:59] For this API, I want every response to [63:02] fit within a small set of predictable [63:04] JSON structures. [63:07] The top level of the response should [63:09] always be an object, not an array, to [63:11] avoid JSON array hijacking. [63:14] And for every response object, we attach [63:17] an object called meta, which at least [63:20] includes a request ID to help clients [63:23] communicate problems with us and help [63:25] with debugging. [63:27] I'm also including the timestamp and [63:29] script duration for now, but these don't [63:32] have any practical purpose at the [63:33] moment. [63:36] For a query that returns a single [63:38] resource, the resource is the value of a [63:41] key called data. [63:44] For a query that returns multiple [63:46] resources, data is an array of results, [63:51] and we add an object called pagination [63:54] to help the client navigate through the [63:56] data set. [63:58] For a successful query with no resource [64:00] to return, data just contains result [64:04] success. [64:06] And finally, if at least one error [64:08] occurs, data is replaced by an array [64:11] called errors, [64:13] which contains error objects which each [64:15] have an integer code and a string [64:18] message.
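The response shapes just described can be sketched as plain dictionaries. This is an illustration in Python rather than the project's PHP; the function names, the pagination field names, and the sample error code are assumptions, and the timestamp and script duration in meta are left out.

```python
import json
from typing import Any, Optional


def _meta(request_id: str) -> dict[str, Any]:
    # Every response carries at least the request ID.
    return {"request_id": request_id}


def success(request_id: str, data: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Single resource, or a bare success marker when there is nothing to return."""
    body = data if data is not None else {"result": "success"}
    return {"meta": _meta(request_id), "data": body}


def paginated(request_id: str, items: list[dict[str, Any]],
              page: int, per_page: int, total: int) -> dict[str, Any]:
    """Multiple resources: data is an array, wrapped in a top-level object."""
    return {
        "meta": _meta(request_id),
        "data": items,
        "pagination": {"page": page, "per_page": per_page, "total": total},
    }


def errors(request_id: str, error_list: list[dict[str, Any]]) -> dict[str, Any]:
    """On failure, 'errors' replaces 'data'."""
    return {"meta": _meta(request_id), "errors": error_list}


example = errors("req-123", [{"code": 2001, "message": "The email field is required."}])
print(json.dumps(example, indent=2))
```

Because every shape is a top-level object with a fixed set of keys, a client can branch on the presence of errors before looking at anything else.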
[64:20] These error codes help the client to [64:22] respond programmatically to a problem [64:24] without needing to parse the message [64:26] text. [64:28] The error itself is a data transfer [64:31] object called API error, and the code is [64:35] an integer enum called API error code. [64:41] API error code has a method called [64:44] message, which accepts an array of [64:46] placeholders if necessary and returns an [64:50] appropriate message for each error code. [64:53] And the API error constructor calls this [64:57] message method. [65:01] So API error code is a very convenient [65:04] single location to plan and construct a [65:07] list of all possible errors, pair them [65:10] up with a suitable message, and a [65:12] reference point for the placeholders [65:14] that we need to pass to the API error [65:17] constructor. [65:19] The aim is to be as specific as possible [65:22] with error codes, but also to have more [65:24] general error codes to fall back on. For [65:28] example, we have specific error codes [65:30] for each type of input validation [65:33] employed in our project. But if [65:35] something goes wrong, there's a general [65:37] validation error code. So if we use a [65:41] new type of input validation in a [65:43] controller or form request, but we don't [65:46] account for it in our validation error [65:48] handler, we can still return a [65:50] semi-specific error response. And in the [65:53] worst case scenario, we can fall back on [65:55] the unknown error code. [65:58] Of course, when such general errors [66:00] happen, we need to analyze the logs and [66:03] construct more insightful error codes [66:05] as a result. [66:08] The JSON response structures mentioned [66:10] each have a method in the class called [66:13] API response builder: [66:16] success, paginated, [66:19] and errors.
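The error code idea, an integer enum that owns its messages plus a general fallback, can be sketched like this. It is a Python illustration of the pattern only; the numeric codes, message texts, rule names, and helper names are all hypothetical and do not come from the project.

```python
from enum import IntEnum
from typing import Any, Optional


class ApiErrorCode(IntEnum):
    UNKNOWN = 1000
    VALIDATION_GENERAL = 2000
    VALIDATION_REQUIRED = 2001
    VALIDATION_MAX_LENGTH = 2002

    def message(self, placeholders: Optional[dict[str, Any]] = None) -> str:
        """Return the message for this code, filling in any placeholders."""
        templates = {
            ApiErrorCode.UNKNOWN: "An unknown error occurred.",
            ApiErrorCode.VALIDATION_GENERAL: "The {field} field is invalid.",
            ApiErrorCode.VALIDATION_REQUIRED: "The {field} field is required.",
            ApiErrorCode.VALIDATION_MAX_LENGTH: "The {field} field may not exceed {max} characters.",
        }
        return templates[self].format(**(placeholders or {}))


class ApiError:
    """Data transfer object: the constructor resolves the message from the code."""

    def __init__(self, code: ApiErrorCode, placeholders: Optional[dict[str, Any]] = None) -> None:
        self.code = code
        self.message = code.message(placeholders)

    def to_dict(self) -> dict[str, Any]:
        return {"code": int(self.code), "message": self.message}


# Validator rule names mapped to specific codes; anything unmapped falls back.
RULE_TO_CODE = {
    "required": ApiErrorCode.VALIDATION_REQUIRED,
    "max": ApiErrorCode.VALIDATION_MAX_LENGTH,
}


def code_for_rule(rule: str) -> ApiErrorCode:
    return RULE_TO_CODE.get(rule, ApiErrorCode.VALIDATION_GENERAL)


print(ApiError(code_for_rule("required"), {"field": "email"}).to_dict())
print(ApiError(code_for_rule("regex"), {"field": "email"}).to_dict())
```

The second call uses a rule the mapping doesn't know about, so it degrades to the general validation code instead of failing, which is the fallback behavior described above.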
[66:21] And as well as the errors method, [66:23] there's also an error method because [66:26] returning a single error is the most [66:28] common scenario and it's easy to forget [66:30] to enclose it in an array. [66:35] Since we want to always send JSON [66:37] responses with descriptive error codes, [66:40] we want to replace Laravel's default [66:43] exception handling behavior. In Laravel [66:46] 11 onwards, that's done in [66:48] bootstrap/app.php. [66:53] But since it's likely to get quite [66:54] sizable, I've extracted it to a class [66:56] called API exception handler. [67:00] In simple cases, we can just define an [67:02] error response with API response [67:05] builder. [67:07] In some cases, there's a small amount of [67:09] processing to add more detail to the [67:11] error response. [67:13] And there's a catch-all default for any [67:16] exception we're not handling [67:17] specifically. [67:20] For input validation errors, I've [67:22] extracted the logic to validation errors [67:25] builder, which returns an array of the [67:28] data transfer object API error to be [67:32] fed into API response builder's errors [67:35] method. [67:37] In validation errors builder, we loop [67:40] through the errors returned by the [67:42] validator and match each with the [67:45] correct API error code. [67:49] As mentioned throughout this project, [67:51] whenever there's a scenario that we [67:53] don't expect to happen, like if we fail [67:55] to match the validation error, we log [67:59] the details and fall back on a more [68:01] general API error code. [68:05] For the database, I have maybe something [68:08] of a controversial feature which I'll [68:11] have to explain carefully.
[68:14] When we need to write to the database [68:16] and one of the columns has a unique [68:18] constraint or a foreign key constraint, [68:22] the standard practice is to first [68:24] validate the value with a select query [68:28] and if no results are returned, then [68:30] continue with the insert or update [68:33] operation. [68:35] So that's one database query for the [68:37] failure scenario and two queries for the [68:40] success scenario. But [68:43] with a lot of caveats, we can [68:46] potentially skip the validation in [68:48] Laravel, write the value directly to the [68:51] database and if a constraint is [68:54] violated, handle the error returned by [68:56] the database and inform the user of the [68:59] input validation error. [69:02] That means only one database query for [69:05] both success and failure scenarios which [69:09] reduces the load on the database and [69:11] speeds up responses from the client's [69:13] perspective. [69:15] It also removes the race condition [69:17] between the select query for the [69:19] validation and the eventual write to [69:22] database. [69:24] So now for the downsides. One, it splits [69:28] the validation logic in two, which is [69:31] messy. I kept the constraints in the [69:34] form request as comments, as reminders [69:37] of what will be validated by the [69:39] database. [69:41] Two, it's not great for the developer [69:44] experience. We just have to remember to [69:47] not validate constraints in form [69:49] requests and to employ our new strategy [69:52] each time we want to insert or update. [69:57] Three, the database treats the [69:59] constraint violation as an error, not as [70:02] a simple validation check, and it logs [70:05] it as such. So, we'd need a log [70:08] filtration system in production. [70:12] Four, the database returns the error as [70:15] a string which we have to parse [70:18] and that is a somewhat fragile process. 
[70:21] We need to run rigorous tests with many [70:23] edge cases every time we change database [70:26] version. [70:28] And five, error responses vary per [70:31] vendor. So we're tightly coupling our [70:34] application with our initial choice of [70:36] database vendor, in this case Postgres. [70:41] These are all very serious cons and [70:44] nothing else about this project is [70:46] geared towards the high concurrency [70:49] situation that would give value to the [70:51] pros. So for a real project, I would [70:54] almost certainly not implement this [70:56] feature, but I wanted to explore it as [71:00] an educational exercise. And it's a [71:02] healthy exercise to predict the cons in [71:04] advance, try it, and run head-on into [71:08] any unexpected cons, and then get better [71:10] at analysis of design choices in the [71:13] future. [71:15] So far with Postgres with unique and [71:18] foreign key constraints, I haven't hit [71:21] any critical problems from parsing the [71:23] error message. [71:25] It returns a unique SQLSTATE code that [71:29] identifies which constraint was violated, [71:33] and the offending column is bounded by [71:36] characters that are invalid for a column [71:38] name and preceded by a substantial fixed [71:42] string. [71:44] Postgres does actually allow illegal [71:46] characters in the column name if it's [71:48] bounded by double quotes. So we'd have [71:51] to check for that. [71:53] Another nuisance is that Laravel [71:55] interpolates the actual values into the [71:58] Postgres error message. And for [72:01] security, we don't want potentially [72:03] sensitive values feeding into our parsing [72:05] logic, especially for a fragile process [72:08] like this that has a lot of logging.
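A minimal sketch of the parsing step might look like the following. It assumes the PostgreSQL SQLSTATE codes 23505 (unique violation) and 23503 (foreign key violation) and a DETAIL line of the form `Key (column)=(value) ...`; the function name and sample messages are illustrative, and the exact message text should be checked against the Postgres version in use. The pattern only ever captures the column name, so the interpolated value is never extracted.

```python
import re
from typing import Optional

SQLSTATE_KINDS = {"23505": "unique", "23503": "foreign_key"}

# Column name is either double-quoted (any characters, "" escapes a quote)
# or a plain identifier, and must be followed by ")=(".
_KEY_PATTERN = re.compile(
    r'Key \((?P<column>"(?:[^"]|"")+"|[A-Za-z_][A-Za-z0-9_$]*)\)=\('
)


def parse_constraint_violation(sqlstate: str, message: str) -> Optional[tuple[str, str]]:
    """Return (kind, column) for a recognised violation, otherwise None."""
    kind = SQLSTATE_KINDS.get(sqlstate)
    if kind is None:
        return None
    match = _KEY_PATTERN.search(message)
    if match is None:
        # Unexpected format (for example a composite key): let the caller fall back.
        return None
    column = match.group("column")
    if column.startswith('"'):
        column = column[1:-1].replace('""', '"')
    return kind, column


sample = (
    'ERROR:  duplicate key value violates unique constraint "users_email_unique"\n'
    "DETAIL:  Key (email)=(someone@example.com) already exists."
)
print(parse_constraint_violation("23505", sample))
```

Returning None for anything unrecognised keeps the fragile part contained: the caller can log the event and fall back to a general error code.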
[72:13] So it works for Postgres, but if for [72:15] some reason we change database vendor, [72:18] we'd need to rewrite the parsing logic, [72:20] and there's no guarantee that the error [72:22] message provides the required detail and [72:25] format for us to parse. [72:28] Here's the comparable message from [72:30] SQLite. [72:32] The SQLSTATE code is more generic than [72:34] Postgres's. So we'd have to parse the [72:37] text to discover even which constraint [72:39] was violated. [72:42] There's also a small possibility that a [72:44] new version of Postgres will change the [72:47] error message in a way that hurts this [72:49] feature. [72:52] I implemented this feature with a trait [72:54] called handles DB errors. [72:57] For any code that inserts or updates a [73:00] database record, we wrap it in a closure [73:04] and in the handle DB errors method, and [73:07] the closure is executed inside a try [73:10] catch block looking for query exception. [73:15] This wrapping design causes very minimal [73:18] disruption and there's a low development [73:20] cost to enabling or disabling the error [73:23] handling feature. [73:25] The design is very reusable. Just add [73:28] the trait to any class that writes to [73:30] the database. And compared to a rigid method [73:33] signature, the closure gives us complete [73:36] flexibility around what variables to [73:38] pass, what type to return, [73:42] which interface to use to interact with [73:44] the database, how many queries to run, [73:48] what parts of the code to wrap in a [73:50] database transaction, [73:52] and what other actions we need to run [73:54] alongside the queries. [73:57] Currently, if a constraint violation is [74:00] detected, we're immediately returning a [74:02] validation error response to the client. [74:06] We could consider making this more [74:07] flexible, like allowing more closures to [74:10] be passed to handle specific error [74:12] scenarios.
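The wrapping design can be sketched in Python as a function that takes the write operation as a callable. Everything here is a stand-in: QueryException mimics the exception Laravel raises, ConstraintViolation and handle_db_errors are hypothetical names, and the optional handlers argument illustrates the "more closures" idea rather than anything the project currently does.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

VIOLATION_KINDS = {"23505": "unique", "23503": "foreign_key"}


class QueryException(Exception):
    """Stand-in for the database exception, carrying the SQLSTATE code."""

    def __init__(self, sqlstate: str, message: str) -> None:
        super().__init__(message)
        self.sqlstate = sqlstate


class ConstraintViolation(Exception):
    """Raised so the caller can turn it into a validation error response."""

    def __init__(self, kind: str) -> None:
        super().__init__(f"{kind} constraint violated")
        self.kind = kind


def handle_db_errors(
    operation: Callable[[], T],
    handlers: Optional[dict[str, Callable[[QueryException], T]]] = None,
) -> T:
    """Run the write; translate constraint violations, re-raise anything else."""
    try:
        return operation()
    except QueryException as exc:
        kind = VIOLATION_KINDS.get(exc.sqlstate)
        if kind is None:
            raise
        handler = (handlers or {}).get(kind)
        if handler is not None:
            return handler(exc)
        raise ConstraintViolation(kind) from exc


def insert_user() -> int:
    # Simulates a write that hits a unique constraint.
    raise QueryException("23505", "duplicate key value violates unique constraint")


try:
    handle_db_errors(insert_user)
except ConstraintViolation as violation:
    print(f"validation error: {violation.kind}")
```

Because the operation is just a callable, the caller decides what runs inside it, which is the flexibility the closure approach is meant to provide.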
For example, we might want to [74:16] alter our reaction based on which column [74:18] violated the constraint. [74:27] Also regarding the database, we're [74:30] implementing a rule of no select star [74:33] queries, that is, no queries that return all [74:36] columns. This is to reduce the chance of [74:40] exposing sensitive data and to make [74:42] queries faster to run. [74:45] So for any resource that will be output [74:48] by the API, we're defining a resource [74:50] class where we define how the raw output [74:54] is processed into the API output. [74:58] In this case, we're just renaming UUID [75:01] to ID. [75:04] And we're defining the columns to be [75:06] injected into the select query to fetch [75:08] only the relevant columns. [75:11] For the naming scheme, we have the model [75:14] name item, then public or private, [75:18] explicitly flagging whether this resource will [75:20] be output by the API or not. For [75:23] example, the UUID is for public usage [75:27] while the incrementing integer ID is [75:29] strictly for internal usage. [75:32] And we have full or minimal to denote [75:35] which columns to include. [75:38] Paginated results would typically use [75:40] minimal while the results of a single [75:43] specified resource would typically use [75:46] full and include more columns. [75:49] Then the model class imports the [75:51] columns constant to run the relevant [75:54] queries. [75:59] Let's run through the life cycle of a [76:01] standard request. The first thing we do [76:04] is create an instance of request context [76:07] service which will hold auxiliary [76:10] information about the request. [76:12] We define it as a singleton so that [76:15] anytime it's referenced in a method [76:17] signature throughout the application, [76:19] the service container will inject the [76:21] same instance with the same properties, [76:24] similar to how the request itself is [76:26] handled.
[76:28] Immediately we store the current time so [76:31] that we can calculate the wall clock [76:33] duration at the end of the script [76:35] execution and we store the current [76:38] resource usage of the PHP-FPM worker [76:42] process in order to calculate the CPU [76:44] time duration at the end of the script. [76:48] There's an empty array to store the [76:49] duration of any database queries which [76:52] will also be logged at the end. And the [76:55] request ID set way back in the ingress [76:58] controller is saved here and added to [77:01] the context of any logs that will be [77:02] written. [77:05] The logging middleware will call get [77:07] duration milliseconds which is pretty [77:10] simple and also get CPU time [77:14] milliseconds which requires some [77:16] explanation. [77:20] ru_utime.tv_sec [77:22] is the number of whole [77:26] seconds the PHP-FPM worker process has [77:30] spent in user mode, like the PHP [77:33] interpreter doing work. [77:36] The same key with usec, or microseconds, [77:40] instead of sec is the number of [77:42] microseconds towards the next whole [77:45] second in user mode. So it's bounded by [77:49] 1 million. [77:51] ru_stime.tv_sec [77:55] is the number of whole seconds the [77:58] PHP-FPM worker process has spent in [78:01] kernel mode. So that's system calls, [78:05] memory mapping, network stack [78:07] processing etc. [78:10] And the same key with usec instead of sec [78:13] is the number of microseconds towards the [78:15] next whole second in kernel mode. [78:19] The most intuitive mathematical approach [78:21] to calculating the CPU time duration of [78:25] the script is to combine the seconds [78:28] with microseconds [78:30] and sum the user time and kernel time [78:34] and then subtract the start time from [78:36] the end time just like we do for the [78:39] wall clock duration. [78:41] I was worried that floating point [78:43] accuracy might be a problem here.
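Before turning to precision, here is the intuitive formula just described as a minimal sketch. It is written in Python for illustration rather than the project's PHP. The dictionary keys mirror the ones PHP's `getrusage()` returns, while the function names and the sample readings are hypothetical.

```python
def cpu_time_ms(usage):
    """Total CPU time (user + kernel) in milliseconds.

    `usage` is a mapping shaped like the array PHP's getrusage() returns.
    """
    user_ms = usage["ru_utime.tv_sec"] * 1000.0 + usage["ru_utime.tv_usec"] / 1000.0
    kernel_ms = usage["ru_stime.tv_sec"] * 1000.0 + usage["ru_stime.tv_usec"] / 1000.0
    return user_ms + kernel_ms


def cpu_duration_ms(start, end):
    """CPU time consumed between two snapshots: end total minus start total."""
    return cpu_time_ms(end) - cpu_time_ms(start)


# Hypothetical snapshots taken at the start and end of a request.
start = {"ru_utime.tv_sec": 12, "ru_utime.tv_usec": 250_000,
         "ru_stime.tv_sec": 3, "ru_stime.tv_usec": 900_000}
end = {"ru_utime.tv_sec": 12, "ru_utime.tv_usec": 262_500,
       "ru_stime.tv_sec": 4, "ru_stime.tv_usec": 1_500}

duration = cpu_duration_ms(start, end)  # 12.5 ms user + 101.5 ms kernel
```

Each snapshot is first collapsed to a single large total, and only then are the two totals subtracted, which is exactly the step the precision question is about.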
[78:46] Floating point accuracy is 15 to 17 [78:49] significant figures. So adding [78:51] everything up to a large total before [78:54] the final subtraction could cause some [78:57] precision to be lost. [79:00] And since the difference between the [79:01] start and end times will generally be [79:03] tiny, that loss of resolution could mean [79:06] we get a result of zero once the PHP-FPM [79:10] worker passes a certain age. [79:14] So maybe it would be better to have a [79:16] mathematically equivalent but less [79:18] intuitive formula that subtracts large [79:21] numbers from large numbers and sums the [79:24] results at the end. [79:27] But I tried some rough calculations and [79:30] actually the loss of resolution happens [79:32] sometime after 250 years. So yes, the [79:36] intuitive formula will do fine. [79:41] The CPU time duration will inform our [79:43] decision on setting the PHP directive [79:46] max_execution_time and the wall clock [79:48] duration helps with the PHP-FPM [79:51] directive request_terminate_timeout. [79:56] The last two methods are to log the size [79:59] of the response headers and the body. [80:04] The only thing to note here is that the [80:06] headers are ASCII only. So we can always [80:09] safely use strlen, which is faster than [80:13] the multibyte equivalent mb_strlen. [80:19] The body is UTF-8, [80:21] so potentially non-ASCII. [80:24] So we can only use strlen as long as in [80:28] php.ini [80:30] we set mbstring.func_overload [80:33] to zero. [80:36] Otherwise, we'd have to use [80:40] mb_strlen and specify 8bit encoding to be [80:44] sure we're getting the number of bytes [80:46] instead of the number of characters. [80:50] Then we add a listener for the database [80:53] to add the query time to the query [80:55] duration array we just created. [80:59] And next we configure the rate limiter, [81:03] although it isn't actually applied at [81:05] this point in the request.
[81:07] Nginx rate limits by IP and [81:10] intercepts those requests before they [81:12] reach Laravel. So in Laravel, we're [81:15] limiting by user ID and that needs to [81:18] happen after authentication. [81:21] For us, it happens later in [81:23] routes/api.php, [81:25] straight after the authentication [81:28] middleware. [81:30] Malicious users can get around this with [81:33] a coordinated system of multiple [81:35] accounts and multiple IPs. [81:38] If we were worried about that and we [81:41] couldn't control user accounts or [81:43] whitelisted IPs more tightly, then we [81:46] might consider running pattern [81:47] recognition of usage, i.e. combine the [81:51] usage of two or more users, assess them [81:55] as if they were a single user and see if [81:57] it adds up to a malicious usage pattern. [82:02] Or we might try to gather a list of [82:04] known VPN IP addresses and monitor those [82:07] accounts more closely. [82:10] For Nginx's rate limit on IP [82:12] addresses, we need to consider if our [82:15] clients are in a business at the same [82:17] address. [82:18] For example, if all users log on at 9:00 [82:22] a.m. on the same IP address, then we [82:24] might need to loosen Nginx's rate limit on [82:27] IP addresses. [82:33] Next, the request hits the middleware. [82:36] I've disabled global middleware and [82:39] moved most of the default middleware to [82:41] the API group. [82:43] That's because the web group can only be [82:46] accessed internally within the [82:47] Kubernetes cluster. [82:50] For TrustProxies, we're providing the [82:53] CIDR range that the ingress controller [82:55] is guaranteed to be within. And then [82:58] Laravel knows that the client's IP is [83:01] the real IP and that the connection is [83:04] indeed secure. [83:07] Then API requests pass to routes/api.php.
[83:13] And for routes that require [83:14] authentication, the requests pass through [83:17] the authentication middleware and the [83:20] rate limiter middleware we discussed [83:22] earlier. [83:24] I extended the Authenticate class as a [83:27] convenient way to add the user ID to the [83:30] context of logs created after [83:32] authentication, [83:34] and also because in the original class, [83:36] if an unauthenticated request didn't [83:39] have the Accept header of [83:42] application/json, [83:44] it tried to redirect the user to a login [83:46] page, but our API is purely JSON so [83:50] that's not the desired behavior. [83:53] The last endpoint is for browsers to [83:55] report content security policy [83:57] violations, which go straight into the [83:59] log. [84:02] The web routes are mostly for the [84:04] Kubernetes probes mentioned earlier. / [84:07] Laravel readiness simply replies [84:10] immediately. [84:12] / Laravel startup, if you recall, tests [84:16] the connections to the database and the [84:18] cache in a try catch block and returns [84:21] an error HTTP status code if there's any [84:24] problem, [84:26] and / Laravel status is for my own [84:29] tinkering and analysis of realpath cache and [84:32] opcache in production to further tweak [84:35] those configurations. [84:37] Unfortunately, we can't just run those [84:39] commands in the command line interface [84:42] because the CLI is separate from PHP-FPM [84:45] and maintains its own opcache memory [84:48] pool. [84:49] And finally, an API request runs back [84:52] through the middleware. [84:54] The HandleCors middleware along with [84:57] its config lets us tell browsers that [85:00] the Vue.js front end is permitted to [85:02] access the API. [85:05] And the inject meta middleware adds the [85:08] meta object to each JSON response object [85:11] as it's passing out of the door.
[85:14] After the response has been sent, the [85:16] last action is to log the request for [85:18] future analysis. [85:23] I won't cover the Vue.js front end [85:25] because this video is already quite long [85:28] and it just logs into the API, stores [85:30] the token and interacts with the API in [85:33] a simple manner. [85:35] All the source code is available in the [85:37] description of the video and please let [85:39] me know if you have any questions or any [85:42] improvements on any part of the code. [85:44] Thanks.