[0:00] Hey everyone, welcome to Bitbyte Talks. [0:03] Today we are diving into something that [0:05] quietly powers every single video you [0:07] have ever watched on the internet, [0:09] whether it's a YouTube binge, a Netflix [0:12] marathon, or a video call with your [0:14] friends. Video encoding is always behind [0:17] the scenes, making it all happen. So, let [0:21] me ask you something. Have you ever [0:22] noticed how a 2-hour movie can fit into [0:25] just a few gigabytes on your phone? That [0:28] right there is the magic of video [0:30] encoding. Let us break it all down.
[0:32] Okay, so before we talk about encoding, [0:35] let us understand what raw video [0:37] actually looks like. A video is [0:40] basically just a long sequence of [0:42] images, which we call frames. Each frame [0:44] is a snapshot of what you see. Now, if [0:47] you have a 1080p video, each frame has [0:50] 1920 × 1080 pixels, and each pixel has [0:54] three color channels, red, green, and [0:57] blue, stored at one byte each. That is roughly 6 million bytes [1:00] just for one frame. Now, multiply that [1:03] by 30 frames per second for 1 second of [1:06] video, and you're looking at about 187 [1:09] megabytes every second. For a 10-minute [1:12] video, that would be over 100 gigabytes. [1:15] That is absolutely enormous. We simply [1:18] cannot store or stream that.
This is [1:21] exactly why video encoding exists. [1:24] Encoding is the process of compressing [1:26] that massive raw video data into a much [1:29] smaller file size without making the [1:31] video look terrible. Think of it like [1:33] packing a suitcase. You start with a [1:35] mountain of clothes, and you use clever [1:37] folding techniques to fit everything [1:39] neatly into a single bag. Encoding does [1:42] the same thing for your video data. It [1:44] finds patterns, removes redundant [1:47] information, and stores only what is [1:49] truly necessary to reconstruct a [1:51] great-looking video when you hit play.
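Here is a quick sketch of the raw-size arithmetic, assuming 8 bits (one byte) per color channel and 30 frames per second. It also shows why two different "megabytes per second" figures circulate: the same data rate is about 187 MB per second in decimal units, or about 178 MiB per second in binary units.

```python
# Uncompressed (raw) 1080p video size, assuming 1 byte per color channel
WIDTH, HEIGHT = 1920, 1080   # pixels in one 1080p frame
CHANNELS = 3                 # red, green, blue
FPS = 30                     # frames per second

bytes_per_frame = WIDTH * HEIGHT * CHANNELS
bytes_per_second = bytes_per_frame * FPS
bytes_per_10_min = bytes_per_second * 10 * 60

print(f"one frame:  {bytes_per_frame:,} bytes")                  # 6,220,800 bytes
print(f"one second: {bytes_per_second / 1e6:.0f} MB "
      f"({bytes_per_second / 2**20:.0f} MiB)")                   # 187 MB (178 MiB)
print(f"10 minutes: {bytes_per_10_min / 1e9:.0f} GB")             # 112 GB
```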
[1:53] So, how exactly does encoding work? The [1:56] software that handles this job is called [1:59] a codec, which stands for coder-decoder. [2:02] A codec compresses the video when you [2:04] save it or stream it, and decompresses [2:07] it when you watch it. The encoder is on [2:09] the production side: it takes raw video [2:11] and crunches it down. The decoder is on [2:14] the playback side: your phone, laptop, [2:17] or TV uses it to reconstruct the video [2:19] from the compressed data. You have [2:22] definitely heard of popular codecs [2:23] before, things like H.264, [2:27] H.265, [2:28] AV1, and VP9. These are all different [2:32] codecs with different ways of achieving [2:34] compression.
Let me show you one of the [2:36] core tricks encoding uses. Imagine a [2:39] frame that shows a clear blue sky. The [2:42] upper half of that image is almost [2:44] entirely blue, thousands of pixels that [2:47] are nearly identical. Instead of storing [2:49] each pixel individually, the encoder [2:52] says, "Hey, this whole region is the [2:54] same color. Let me just store that [2:56] once." Exploiting this spatial redundancy [3:00] is called intraframe compression. It is loosely similar [3:02] to how zip files work: instead of [3:04] repeating information, it stores a [3:06] shorthand description. The result is [3:09] that the sky takes up a fraction of the [3:11] space it normally would.
But there is an [3:13] even more powerful trick: temporal [3:16] redundancy. Here's the idea. In most [3:19] videos, consecutive frames look almost [3:22] identical. Think of a news anchor [3:24] sitting at a desk. Their face and hands [3:27] might move slightly, but the desk, [3:29] background, and studio are completely [3:31] unchanged from one frame to the next. [3:34] Instead of re-encoding the entire [3:36] background in every single frame, the [3:39] encoder only encodes what changed, just [3:42] the movement.
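As a toy illustration of "store it once", here is run-length encoding applied to one row of pixel values. This is a simplification made for illustration: H.264 and H.265 actually use intra prediction plus transform coding rather than plain run-length encoding, and the single-value "pixels" and the sky row below are made up.

```python
from typing import List, Tuple

def rle_encode(row: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of identical pixel values into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            value, count = runs[-1]
            runs[-1] = (value, count + 1)   # extend the current run
        else:
            runs.append((pixel, 1))         # start a new run
    return runs

def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original row."""
    row: List[int] = []
    for value, count in runs:
        row.extend([value] * count)
    return row

# Hypothetical 1920-pixel row: 1,900 identical "sky" values, then some detail
sky_row = [200] * 1900 + [10, 50, 90, 130, 170] * 4
encoded = rle_encode(sky_row)
print(len(sky_row), "pixels ->", len(encoded), "runs")   # 1920 pixels -> 21 runs
assert rle_decode(encoded) == sky_row                    # lossless round trip
```

The smooth sky collapses into a single run, while the detailed region gains nothing, which mirrors why flat areas are cheap to encode.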
It keeps a [3:44] reference frame and then saves only the [3:46] difference between frames. This is [3:49] called interframe compression, and it is [3:52] responsible for a huge chunk of the [3:54] compression gains in modern video.
This [3:56] brings us to the three types of frames [3:58] that encoded video uses. First, we have [4:01] I-frames, short for intra-coded frames. [4:05] These are complete snapshots of the [4:06] scene that can be decoded on their own, which is why they serve as keyframes. Then we have [4:09] P-frames, predicted frames. These only [4:12] store the difference between the current [4:14] frame and an earlier reference frame, typically the previous I- or P-frame. [4:17] They rely on what came before. And [4:20] finally, B-frames, bidirectional frames. [4:23] These are typically the most efficient. They [4:25] can reference both a frame before and a [4:28] frame after to predict what the current [4:30] frame looks like. A well-encoded video [4:33] uses a smart mix of these three to [4:35] achieve maximum compression while [4:37] maintaining great quality.
Another key [4:40] concept in video encoding is bitrate. [4:44] Bitrate is the amount of data used per [4:46] second of video, usually measured in [4:48] kilobits per second or megabits per [4:50] second. Higher bitrate means more data, [4:53] which generally means better quality, but also a [4:55] bigger file. Lower bitrate means a smaller [4:58] file, but more compression artifacts, [5:01] those blocky, blurry, or pixelated [5:03] glitches you sometimes see on a [5:05] low-quality stream. For reference, [5:07] YouTube recommends uploading standard 1080p video [5:10] at around 8 megabits per second, and 4K [5:13] HDR video at roughly 50 megabits per [5:16] second or more. Choosing the right [5:18] bitrate is a balancing act between [5:20] quality and storage or bandwidth [5:22] requirements.
People often confuse [5:25] codecs with container formats, so let me [5:27] clear that up right now.
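Here is a minimal sketch of the reference-plus-difference idea behind interframe compression, using a tiny flattened grayscale frame. It is deliberately simplified: real P-frames use motion-compensated prediction plus an encoded residual rather than a list of changed pixels, and B-frames are not modeled. The function names and frame data are hypothetical.

```python
from typing import Dict, List

Frame = List[int]  # a tiny grayscale frame, flattened into one list

def encode_p_frame(reference: Frame, current: Frame) -> Dict[int, int]:
    """Store only the pixels that differ from the reference frame."""
    return {i: new for i, (old, new) in enumerate(zip(reference, current)) if old != new}

def decode_p_frame(reference: Frame, delta: Dict[int, int]) -> Frame:
    """Rebuild the current frame by applying the stored changes to the reference."""
    frame = list(reference)
    for i, value in delta.items():
        frame[i] = value
    return frame

# Hypothetical "news anchor" clip: a static background, a few pixels change
i_frame = [30] * 100                 # keyframe, stored in full
next_frame = list(i_frame)
next_frame[45:48] = [90, 95, 90]     # a small movement

delta = encode_p_frame(i_frame, next_frame)
print(f"I-frame stores {len(i_frame)} values; P-frame stores {len(delta)} changes")
assert decode_p_frame(i_frame, delta) == next_frame
```

The decoder needs the I-frame before it can rebuild the P-frame, which is the "they rely on what came before" dependency in miniature.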
A container is [5:30] like a box that holds everything: the [5:32] video stream, the audio stream, [5:34] subtitles, and metadata. Examples of [5:37] containers are MP4, MKV, AVI, and MOV. [5:42] The codec is what is inside the box. It [5:46] defines how the video data is actually [5:48] compressed. So, for example, you can [5:50] have an MP4 file that uses the H.264 codec [5:54] for video and the AAC codec for audio, or an [5:58] MKV file using H.265 for video and DTS [6:02] for audio. The container and the codec [6:05] are two completely different things.
All [6:07] right, now let us talk about H.264, [6:11] also known as AVC, short for Advanced [6:15] Video Coding, or MPEG-4 Part 10. This codec was introduced [6:17] way back in 2003, and to this day it [6:20] remains the most widely used video codec [6:23] on the planet. If you have watched a [6:25] video on YouTube, Netflix, Zoom, or [6:28] pretty much any platform in the last [6:30] decade, there's a very high chance it [6:32] was encoded using H.264. [6:35] The reason it became so dominant is [6:37] simple. It offers great quality at [6:39] relatively small file sizes, and [6:42] virtually every device in the world can [6:44] play it. Your phone, laptop, smart TV, [6:47] gaming console: they all have dedicated [6:50] hardware to decode H.264 at lightning [6:53] speed.
So, how does H.264 achieve its [6:57] compression? It uses all the techniques [6:59] we talked about earlier, spatial and [7:01] temporal compression, and organizes its tools [7:04] into a specific set of profiles [7:06] and levels, which define which features a stream may use and how demanding it is to decode. H.264 uses macroblocks as [7:10] its basic encoding unit. Each macroblock [7:13] is a 16 × 16 pixel block. The encoder [7:17] analyzes each macroblock, looks for [7:19] similar patterns in nearby blocks and [7:21] nearby frames, and encodes only the [7:24] changes. It also uses sophisticated [7:26] motion estimation to predict where [7:29] objects in the frame are moving. The [7:31] result is incredible compression.
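Motion estimation is commonly done by block matching: for a block in the current frame, search nearby positions in the reference frame for the best match. The sketch below is a plain full-search matcher that uses the sum of absolute differences (SAD) as its cost. For readability it uses a 4 × 4 block on a 16 × 16 test frame rather than H.264's 16 × 16 macroblocks, and real encoders use faster search patterns and sub-pixel accuracy. All names and data are illustrative.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # a grayscale frame as rows of pixel values

def sad(cur: Image, ref: Image, by: int, bx: int, dy: int, dx: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of `cur` at (by, bx)
    and the block of `ref` displaced by (dy, dx)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def best_motion_vector(cur: Image, ref: Image, by: int, bx: int,
                       n: int, search: int) -> Tuple[int, int]:
    """Full-search block matching: try every offset within +/-search and
    return the (dy, dx) with the lowest SAD, skipping offsets outside the frame."""
    h, w = len(ref), len(ref[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > h or left + n > w:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def frame_with_square(top: int, left: int, size: int = 16) -> Image:
    """Hypothetical test frame: black background with a bright 4 x 4 square."""
    img = [[0] * size for _ in range(size)]
    for y in range(top, top + 4):
        for x in range(left, left + 4):
            img[y][x] = 255
    return img

ref = frame_with_square(4, 4)   # previous (reference) frame
cur = frame_with_square(5, 6)   # square moved down 1 pixel and right 2 pixels
print(best_motion_vector(cur, ref, by=5, bx=6, n=4, search=3))  # (-1, -2)
```

The returned vector points from the block's position in the current frame back to where it sits in the reference frame, so the encoder can send "copy from up 1, left 2" plus a (here empty) residual instead of the pixels themselves.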
A raw [7:34] 100 gigabyte video can often be shrunk to just [7:36] 1 or 2 gigabytes without looking [7:39] noticeably different.
But here's the [7:41] thing. H.264 was designed in an era when [7:44] 1080p was the highest mainstream [7:46] resolution. Today, we have 4K, 8K, 360° [7:51] video, HDR, and streaming to billions of [7:55] devices simultaneously. H.264 starts to [7:59] show its age at these higher [8:00] resolutions. To maintain good quality in [8:03] 4K, H.264 needs a very high bitrate, [8:07] which means bigger files and more [8:08] bandwidth. That directly translates to [8:11] higher streaming costs, more storage, [8:13] and slower load times. The world needed [8:16] something better, something that could [8:18] handle 4K and beyond without doubling or [8:21] tripling the file size. That is exactly [8:24] why H.265 was created.
Enter H.265, [8:29] also known as HEVC, [8:32] which stands for High Efficiency Video [8:34] Coding. H.265 was finalized in 2013, and [8:39] it was specifically designed to be roughly twice [8:41] as efficient as H.264. [8:44] That means H.265 can deliver the same [8:48] video quality as H.264 [8:51] at roughly half the bitrate, and so roughly half the file size. Or, if you [8:53] keep the file size the same, H.265 [8:56] will give you noticeably better quality. [8:59] This is a massive deal for 4K streaming, [9:02] video surveillance, Ultra HD Blu-ray, [9:05] and broadcasting. It is the codec of [9:08] choice for Apple, Netflix 4K, [9:10] PlayStation 5, and many modern cameras. [9:14] It represents the next generation of [9:16] video compression.
So, what makes H.265 [9:20] so much more efficient? The key [9:22] difference is the encoding block size. [9:25] While H.264 uses fixed 16 × 16 [9:29] macroblocks, H.265 [9:32] uses flexible coding tree units, or CTUs, [9:36] that can be up to 64 × 64 pixels and can be split into smaller coding units where the picture has more detail. Why [9:40] does that matter? Because for large, [9:42] smooth areas like a clear sky or a plain [9:45] wall, a single 64 × 64 block can encode [9:50] the whole region in one shot.
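To make the flexible block size concrete, here is a toy quadtree split of one 64 × 64 CTU: a block is kept whole if its pixels are nearly uniform, and is otherwise quartered, down to 8 × 8 (the smallest coding unit HEVC allows). The uniformity test and tolerance are assumptions for illustration; real encoders decide splits by comparing rate-distortion cost.

```python
from typing import List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, int]  # (top, left, size) of one coding unit

def is_uniform(img: Image, top: int, left: int, size: int, tol: int) -> bool:
    """True if every pixel in the square block is within `tol` of the others."""
    values = [img[y][x] for y in range(top, top + size) for x in range(left, left + size)]
    return max(values) - min(values) <= tol

def split_ctu(img: Image, top: int = 0, left: int = 0, size: int = 64,
              min_size: int = 8, tol: int = 4) -> List[Block]:
    """Quadtree split: keep a block whole if it is uniform (or already at
    min_size); otherwise split it into four quadrants and recurse."""
    if size <= min_size or is_uniform(img, top, left, size, tol):
        return [(top, left, size)]
    half = size // 2
    blocks: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split_ctu(img, top + dy, left + dx, half, min_size, tol)
    return blocks

sky = [[200] * 64 for _ in range(64)]   # a smooth 64 x 64 patch of sky
detail = [row[:] for row in sky]
detail[2][3] = 20                       # hypothetical dark speck near the top-left

print(len(split_ctu(sky)))              # 1 coding unit covers the whole CTU
print((64 // 16) ** 2)                  # 16 H.264 macroblocks for the same area
print(len(split_ctu(detail)))           # 10: only the corner with detail is split
```

The smooth patch stays a single unit, while the speck forces splits only along the path down to its corner, which is the adaptivity fixed 16 × 16 macroblocks lack.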
H.264 [9:54] would need 16 separate macroblocks for [9:56] the same area. H.265 also uses more [10:00] sophisticated motion compensation, [10:03] better intra prediction, and improved [10:05] entropy coding. All of these add up to [10:08] dramatically better compression [10:09] efficiency.
Let us put the two codecs [10:12] head-to-head. H.264 has been around [10:15] since 2003, while H.265 arrived in 2013. [10:21] For compression efficiency, H.265 wins [10:24] by a large margin, needing roughly 40% to 50% [10:28] less bitrate at the same quality [10:30] level. [10:31] In terms of hardware support, H.264 is [10:35] the clear winner. It runs natively on [10:37] virtually every device in use today. [10:40] H.265 support is excellent on modern [10:43] devices, but older hardware may [10:45] struggle. [10:46] For encoding speed, H.264 is [10:49] significantly faster because its [10:51] algorithms are simpler. H.265 encoding [10:55] requires much more processing power. [10:58] And when it comes to licensing, H.264 is [11:01] relatively straightforward, while H.265 [11:04] has complex and expensive patent [11:07] licensing, which slowed its adoption.
[11:09] Let me make the compression difference [11:11] really tangible with real numbers. [11:14] Imagine you have a 1-hour video at 1080p [11:17] resolution. With H.264, [11:20] a typical high-quality encode would give [11:23] you roughly 4 to 6 GB. [11:25] The same video encoded with H.265 [11:29] at the same visual quality comes to roughly 2 to [11:32] 3 GB. That is roughly half the [11:35] storage. Now, scale that up to 4K. A [11:38] 1-hour 4K video in H.264 [11:41] might need 40 to 60 GB. In H.265, [11:46] you can achieve the same quality for [11:49] roughly 20 to 30 GB. For streaming [11:52] platforms serving millions of users [11:54] simultaneously, this difference can be worth [11:56] hundreds of millions of dollars in [11:58] bandwidth savings every year.
So, which [12:01] one should you actually use?
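The 1-hour file sizes quoted in the comparison can be turned into average bitrates (size in bits divided by duration), which shows that "half the size at the same quality" really means "half the bitrate". This sketch assumes decimal gigabytes and ignores audio and container overhead; the helper names are hypothetical.

```python
def average_bitrate_mbps(size_gb: float, duration_s: float) -> float:
    """Average bitrate implied by a file size (decimal GB) over a duration."""
    return size_gb * 8e9 / duration_s / 1e6   # bytes -> bits, then bits/s -> Mbps

def file_size_gb(bitrate_mbps: float, duration_s: float) -> float:
    """File size (decimal GB) produced by a given average bitrate."""
    return bitrate_mbps * 1e6 * duration_s / 8 / 1e9

HOUR = 3600
for label, low, high in [("1080p H.264", 4, 6), ("1080p H.265", 2, 3),
                         ("4K H.264", 40, 60), ("4K H.265", 20, 30)]:
    print(f"{label}: {average_bitrate_mbps(low, HOUR):.1f}-"
          f"{average_bitrate_mbps(high, HOUR):.1f} Mbps")
```

For example, the 1080p H.264 range works out to about 8.9 to 13.3 Mbps, and running the reverse helper at 8 Mbps for one hour gives about 3.6 GB.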
The answer [12:03] depends on your use case. If you are [12:05] uploading content to YouTube, creating [12:07] videos for older devices, or need [12:09] maximum compatibility, stick with H.264. [12:13] It is the safe choice, and practically [12:15] everything can play it. If you're [12:17] working with 4K footage, distributing [12:19] large video libraries, streaming over [12:22] limited bandwidth, or targeting modern [12:24] Apple or Android devices, H.265 is the [12:27] better choice. It will save you [12:29] tremendous storage and bandwidth. Many [12:32] modern cameras, like the iPhone, GoPro, [12:34] and many DSLR and mirrorless cameras, can record in H.265 [12:38] natively, and streaming platforms like [12:40] Apple TV+ and Netflix 4K rely [12:43] heavily on H.265.
Now, just when you [12:46] thought the codec wars were over, there [12:48] is a new player making massive waves. [12:51] Meet AV1, an open-source, royalty-free [12:55] codec developed by the Alliance for Open [12:58] Media, whose members include Google, Netflix, [13:01] Amazon, Apple, and Microsoft. AV1 is [13:05] even more efficient than H.265, [13:08] with a further improvement often quoted at around 30% [13:11] at the same quality, though results vary with content and encoder settings. And because it is [13:13] completely free to use with no licensing [13:16] fees, it is rapidly gaining adoption. [13:18] YouTube already uses AV1 for many [13:21] videos. Netflix is rolling it out. The [13:24] PlayStation 5 and the latest smartphones [13:27] support it. AV1 is likely the future of [13:30] video compression, but it requires [13:32] far more processing power to encode, and [13:35] smooth playback generally depends on newer hardware with AV1 decoding support.
Let me show you how all of this [13:38] comes together in the real world. When [13:40] you upload a video to YouTube, their [13:42] servers do not just store one copy of [13:45] your video. They re-encode it into [13:47] multiple versions at different [13:49] resolutions and bitrates: 360p, 480p, [13:53] 720p, 1080p, 1440p, and 4K, using multiple [13:59] codecs including H.264, [14:02] VP9, [14:03] and AV1.
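One way to picture the "many versions" step is as a list of encoder invocations, one per rendition. The sketch below only builds ffmpeg command lines and does not run them. The ladder bitrates and output file names are hypothetical, whether the `libx264`, `libx265`, and `libaom-av1` encoders are available depends on how your ffmpeg was built, and this is not YouTube's actual pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical rendition ladder: (output height, target video bitrate).
# The heights follow the resolutions in the talk; the bitrates are illustrative.
LADDER: List[Tuple[int, str]] = [
    (360, "800k"), (480, "1400k"), (720, "2800k"), (1080, "5000k"),
]

# ffmpeg encoder names for codecs discussed in the talk
ENCODERS: Dict[str, str] = {"h264": "libx264", "h265": "libx265", "av1": "libaom-av1"}

def ladder_commands(src: str, codec: str) -> List[List[str]]:
    """Build one ffmpeg argument list per rendition; nothing is executed here."""
    commands = []
    for height, bitrate in LADDER:
        commands.append([
            "ffmpeg", "-i", src,
            "-vf", f"scale=-2:{height}",          # keep aspect ratio, even width
            "-c:v", ENCODERS[codec], "-b:v", bitrate,
            "-c:a", "aac", "-b:a", "128k",
            f"out_{codec}_{height}p.mp4",
        ])
    return commands

for cmd in ladder_commands("upload.mp4", "h264"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` to actually encode; keeping construction separate from execution makes it easy to generate the same ladder for each codec.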
Your YouTube app then monitors [14:07] your internet connection speed in real [14:09] time. If your connection drops, it [14:11] automatically switches to a lower [14:13] resolution version, usually seamlessly, without [14:15] you even noticing. This is called [14:17] adaptive bitrate streaming, or ABR. It [14:21] is the technology that aims to give you [14:23] the best quality your [14:25] connection can sustain.
All right, let us [14:28] wrap everything up. Today you learned [14:30] that raw video is absolutely enormous: [14:33] over 100 gigabytes for [14:35] just 10 minutes of 1080p. Video encoding [14:37] compresses that data using spatial and [14:39] temporal redundancy, discarding detail the [14:41] eye is unlikely to notice and skipping what did not change [14:44] between frames. We looked at H.264, [14:47] the most compatible and widely used [14:49] codec in history, and H.265, which [14:53] delivers the same quality at roughly [14:55] half the file size, making it the go-to [14:58] for 4K content. We also got a peek at [15:00] AV1, the royalty-free future of video. [15:04] And you now understand how streaming [15:05] platforms use adaptive bitrate streaming [15:08] to give you the smoothest possible [15:09] experience. If you found this video [15:11] helpful, make sure to like, subscribe, [15:13] and hit that notification bell on Bitbyte [15:15] Talks, because there is a lot more [15:18] tech explained simply coming your way. [15:20] See you in the next one.
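The adaptive bitrate behavior described in the talk can be sketched as a simple throughput rule: pick the highest rendition whose bitrate fits within a safety margin of the measured bandwidth. The rendition list, the 0.8 safety factor, and the function name are assumptions. Real players for protocols such as HLS and MPEG-DASH also weigh buffer level and smoothed bandwidth estimates.

```python
from typing import List, Tuple

# Hypothetical renditions: (label, video bitrate in kbps), lowest first
RENDITIONS: List[Tuple[str, int]] = [
    ("360p", 800), ("480p", 1400), ("720p", 2800), ("1080p", 5000), ("4K", 16000),
]

def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> str:
    """Choose the highest-bitrate rendition that fits within a fraction of the
    measured throughput; fall back to the lowest one if nothing fits."""
    budget = throughput_kbps * safety
    choice = RENDITIONS[0][0]
    for label, kbps in RENDITIONS:
        if kbps <= budget:
            choice = label
    return choice

# Simulated throughput samples (kbps) as a connection degrades and recovers
for sample in [25000, 7000, 3000, 900, 4000]:
    print(sample, "->", pick_rendition(sample))
```

With these assumed numbers, the player steps down from 4K to 1080p, 480p, and 360p as throughput falls, then climbs back to 720p when it recovers. The safety margin leaves headroom so short dips do not immediately stall playback.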