[0:00] Hey everyone, welcome to Bitbyte Talks.
[0:03] Today we are diving into something that
[0:05] quietly powers every single video you
[0:07] have ever watched on the internet,
[0:09] whether it's a YouTube binge, a Netflix
[0:12] marathon, or a video call with your
[0:14] friends. Video encoding is always behind
[0:17] the scenes making it all happen. So, let
[0:21] me ask you something. Have you ever
[0:22] noticed how a 2-hour movie can fit into
[0:25] just a few gigabytes on your phone? That
[0:28] right there is the magic of video
[0:30] encoding. Let us break it all down.
[0:32] Okay, so before we talk about encoding,
[0:35] let us understand what raw video
[0:37] actually looks like. A video is
[0:40] basically just a long sequence of
[0:42] images. We call them frames. Each frame
[0:44] is a snapshot of what you see. Now, if
[0:47] you have a 1080p video, each frame has
[0:50] 1920 × 1080 pixels, and each pixel has
[0:54] three color channels, red, green, and
[0:57] blue. That is roughly 6 million bytes
[1:00] just for one frame. Now, multiply that
[1:03] by 30 frames per second for 1 second of
[1:06] video, and you're looking at about 178
[1:09] megabytes every second. For a 10-minute
[1:12] video, that would be over 100 gigabytes.
[1:15] That is absolutely enormous. We simply
[1:18] cannot store or stream that. This is
[1:21] exactly why video encoding exists.
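If you want to check that arithmetic yourself, here is a small Python sketch. It assumes 8 bits per color channel (one byte each for red, green, and blue), no audio, and no padding; the function names are made up for illustration. The per-second result is about 187 MB in decimal units, which is the same quantity as roughly 178 MiB in binary units.

```python
# Back-of-the-envelope size of uncompressed 1080p video.
# Assumptions: 3 color channels at 1 byte each, 30 frames per second.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FPS = 30


def frame_bytes() -> int:
    """Bytes in a single uncompressed frame."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL


def raw_video_bytes(seconds: int) -> int:
    """Bytes needed for `seconds` of uncompressed video."""
    return frame_bytes() * FPS * seconds


per_second = raw_video_bytes(1)
ten_minutes = raw_video_bytes(10 * 60)

print(f"one frame:  {frame_bytes():,} bytes")        # 6,220,800 bytes
print(f"one second: {per_second / 1e6:.1f} MB "
      f"({per_second / 2**20:.1f} MiB)")             # 186.6 MB (178.0 MiB)
print(f"10 minutes: {ten_minutes / 1e9:.1f} GB")     # 112.0 GB
```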
[1:24] Encoding is the process of compressing
[1:26] that massive raw video data into a much
[1:29] smaller file size without making the
[1:31] video look terrible. Think of it like
[1:33] packing a suitcase. You start with a
[1:35] mountain of clothes, and you use clever
[1:37] folding techniques to fit everything
[1:39] neatly into a single bag. Encoding does
[1:42] the same thing for your video data. It
[1:44] finds patterns, removes redundant
[1:47] information, and stores only what is
[1:49] truly necessary to reconstruct a
[1:51] great-looking video when you hit play.
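To see why "finding patterns and removing redundant information" pays off, here is a rough sketch using Python's general-purpose `zlib` compressor as a stand-in. This is not how a video codec works internally (codecs rely on prediction and transforms, and they are lossy), and the "sky blue" color value is just an assumed example. It only shows that repetitive data shrinks dramatically while patternless data barely shrinks at all.

```python
import os
import zlib

# A highly redundant buffer: the same RGB triple repeated 20,000 times.
# (135, 206, 235) is an assumed "sky blue" value, chosen for illustration.
uniform = bytes([135, 206, 235]) * 20_000

# A buffer of the same length with no exploitable pattern.
noise = os.urandom(len(uniform))


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, level=9)) / len(data)


print(f"uniform data: {compressed_ratio(uniform):.4f} of original size")
print(f"random data:  {compressed_ratio(noise):.4f} of original size")
```

The uniform buffer compresses to a tiny fraction of its size, while the random buffer stays at roughly its original size, because there is nothing redundant to remove.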
[1:53] So, how exactly does encoding work? The
[1:56] software that handles this job is called
[1:59] a codec, which stands for coder-decoder.
[2:02] A codec compresses the video when you
[2:04] save it or stream it, and decompresses
[2:07] it when you watch it. The encoder is on
[2:09] the production side. It takes raw video
[2:11] and crunches it down. The decoder is on
[2:14] the playback side. Your phone, laptop,
[2:17] or TV uses it to reconstruct the video
[2:19] from the compressed data. You have
[2:22] definitely heard of popular codecs
[2:23] before, things like H.264,
[2:27] H.265,
[2:28] AV1, and VP9. These are all different
[2:32] codecs with different ways of achieving
[2:34] compression. Let me show you one of the
[2:36] core tricks encoding uses. Imagine a
[2:39] frame that shows a clear blue sky. The
[2:42] upper half of that image is almost
[2:44] entirely blue, thousands of pixels that
[2:47] are nearly identical. Instead of storing
[2:49] each pixel individually, the encoder
[2:52] says, "Hey, this whole region is the
[2:54] same color. Let me just store that
[2:56] once." This exploits spatial redundancy
[3:00] and is called intraframe compression. It is similar
[3:02] to how zip files work. Instead of
[3:04] repeating information, it stores a
[3:06] shorthand description. The result is
[3:09] that the sky takes up a fraction of the
[3:11] space it normally would. But there is an
[3:13] even more powerful trick, temporal
[3:16] redundancy. Here's the idea. In most
[3:19] videos, consecutive frames look almost
[3:22] identical. Think of a news anchor
[3:24] sitting at a desk. Their face and hands
[3:27] might move slightly, but the desk,
[3:29] background, and studio are completely
[3:31] unchanged from one frame to the next.
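To get a feel for how little actually changes between two such frames, here is a toy Python sketch. It assumes tiny grayscale frames stored as nested lists, with a static background and one small bright block standing in for the moving anchor; all names and values are hypothetical. Real encoders work on blocks with motion compensation rather than per-pixel change lists, so treat this as an illustration of the idea only.

```python
Frame = list[list[int]]


def make_frame(width: int, height: int, anchor_x: int) -> Frame:
    """Static background (value 50) with a 10x10 bright block at anchor_x."""
    frame = [[50] * width for _ in range(height)]
    for y in range(10, 20):
        for x in range(anchor_x, anchor_x + 10):
            frame[y][x] = 200
    return frame


def diff(reference: Frame, current: Frame) -> list[tuple[int, int, int]]:
    """(row, column, new_value) for every pixel that differs from the reference."""
    return [
        (y, x, cur)
        for y, (ref_row, cur_row) in enumerate(zip(reference, current))
        for x, (ref, cur) in enumerate(zip(ref_row, cur_row))
        if ref != cur
    ]


def apply_diff(reference: Frame, changes: list[tuple[int, int, int]]) -> Frame:
    """Rebuild the current frame from the reference plus the stored changes."""
    frame = [row[:] for row in reference]
    for y, x, value in changes:
        frame[y][x] = value
    return frame


prev = make_frame(64, 36, anchor_x=20)
curr = make_frame(64, 36, anchor_x=22)   # the block moved 2 pixels right

changes = diff(prev, curr)
total = 64 * 36
print(f"{len(changes)} of {total} pixels changed")   # 40 of 2304 pixels changed
assert apply_diff(prev, changes) == curr             # lossless reconstruction
```

Only the leading and trailing edges of the moving block differ, so under 2% of the pixels need to be stored for the second frame.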
[3:34] Instead of re-encoding the entire
[3:36] background in every single frame, the
[3:39] encoder only encodes what changed, just
[3:42] the movement. It stores something like a
[3:44] reference frame, and then only saves the
[3:46] difference between frames. This is
[3:49] called interframe compression, and it is
[3:52] responsible for a huge chunk of the
[3:54] compression gains in modern video. This
[3:56] brings us to the three types of frames
[3:58] that encoded video uses. First, we have
[4:01] I-frames, short for intra-coded frames.
[4:05] These are complete snapshots of the
[4:06] scene, like a keyframe. Then we have
[4:09] P-frames, predicted frames. These only
[4:12] store the difference between the current
[4:14] frame and the previous I or P-frame.
[4:17] They rely on what came before. And
[4:20] finally, B-frames, bidirectional frames.
[4:23] These are the most efficient. They
[4:25] reference both the frame before and the
[4:28] frame after to predict what the current
[4:30] frame looks like. A well-encoded video
[4:33] uses a smart mix of these three to
[4:35] achieve maximum compression while
[4:37] maintaining great quality. Another key
[4:40] concept in video encoding is bitrate.
[4:44] Bitrate is the amount of data used per
[4:46] second of video, usually measured in
[4:48] kilobits per second or megabits per
[4:50] second. Higher bitrate means more data,
[4:53] which means better quality, but also a
[4:55] bigger file. Lower bitrate means smaller
[4:58] file, but more compression artifacts,
[5:01] those blocky, blurry, or pixelated
[5:03] glitches you sometimes see on a
[5:05] low-quality stream. For reference, a
[5:07] standard 1080p YouTube video typically
[5:10] uses around 8 megabits per second. A 4K
[5:13] HDR video might use 50 megabits per
[5:16] second or more. Choosing the right
[5:18] bitrate is a balancing act between
[5:20] quality and storage or bandwidth
[5:22] requirements. People often confuse
[5:25] codecs with container formats, so let me
[5:27] clear that up right now. A container is
[5:30] like a box that holds everything, the
[5:32] video stream, the audio stream,
[5:34] subtitles, and metadata. Examples of
[5:37] containers are MP4, MKV, AVI, and MOV.
[5:42] The codec is what is inside the box. It
[5:46] defines how the video data is actually
[5:48] compressed. So, for example, you can
[5:50] have an MP4 file that uses the H.264 codec
[5:54] for video and the AAC codec for audio. Or an
[5:58] MKV file using H.265 for video and DTS
[6:02] for audio. The container and the codec
[6:05] are two completely different things. All
[6:07] right, now let us talk about H.264,
[6:11] also known as MPEG-4 AVC or Advanced
[6:15] Video Coding. This codec was introduced
[6:17] way back in 2003, and to this day it
[6:20] remains the most widely used video codec
[6:23] on the planet. If you have watched a
[6:25] video on YouTube, Netflix, Zoom, or
[6:28] pretty much any platform in the last
[6:30] decade, there's a very high chance it
[6:32] was encoded using H.264.
[6:35] The reason it became so dominant is
[6:37] simple. It offers great quality at
[6:39] relatively small file sizes, and
[6:42] virtually every device in the world can
[6:44] play it. Your phone, laptop, smart TV,
[6:47] gaming console, they all have dedicated
[6:50] hardware to decode H.264 at lightning
[6:53] speed. So, how does H.264 achieve its
[6:57] compression? It uses all the techniques
[6:59] we talked about earlier, spatial and
[7:01] temporal compression, but applies them
[7:04] with a specific set of encoding profiles
[7:06] and levels. H.264 uses macroblocks as
[7:10] its basic encoding unit. Each macroblock
[7:13] is a 16 × 16 pixel block. The encoder
[7:17] analyzes each macroblock, looks for
[7:19] similar patterns in nearby blocks and
[7:21] nearby frames, and encodes only the
[7:24] changes. It also uses sophisticated
[7:26] motion estimation to predict where
[7:29] objects in the frame are moving. The
[7:31] result is incredible compression. A raw
[7:34] 100-gigabyte video can be shrunk to just
[7:36] 1 or 2 gigabytes without looking
[7:39] noticeably different. But here's the
[7:41] thing. H.264 was designed in an era when
[7:44] 1080p was the highest mainstream
[7:46] resolution. Today, we have 4K, 8K, 360°
[7:51] video, HDR, and streaming to billions of
[7:55] devices simultaneously. H.264 starts to
[7:59] show its age at these higher
[8:00] resolutions. To maintain good quality in
[8:03] 4K, H.264 needs a very high bitrate,
[8:07] which means bigger files and more
[8:08] bandwidth. That directly translates to
[8:11] higher streaming costs, more storage,
[8:13] and slower load times. The world needed
[8:16] something better, something that could
[8:18] handle 4K and beyond without doubling or
[8:21] tripling the file size. That is exactly
[8:24] why H.265 was created. Enter H.265,
[8:29] also known as HEVC,
[8:32] which stands for High Efficiency Video
[8:34] Coding. H.265 was finalized in 2013, and
[8:39] it was specifically designed to be twice
[8:41] as efficient as H.264.
[8:44] That means H.265 can deliver the same
[8:48] video quality as H.264,
[8:51] but at half the file size. Or, if you
[8:53] keep the file size the same, H.265
[8:56] will give you noticeably better quality.
[8:59] This is a massive deal for 4K streaming,
[9:02] video surveillance, Ultra HD Blu-ray,
[9:05] and broadcasting. It is the codec of
[9:08] choice for Apple, Netflix 4K,
[9:10] PlayStation 5, and many modern cameras.
[9:14] It represents the next generation of
[9:16] video compression. So, what makes H.265
[9:20] so much more efficient? The key
[9:22] difference is the encoding block size.
[9:25] While H.264 uses fixed 16 × 16
[9:29] macroblocks, H.265
[9:32] uses flexible coding tree units or CTUs
[9:36] that can be up to 64 × 64 pixels. Why
[9:40] does that matter? Because for large,
[9:42] smooth areas like a clear sky or a plain
[9:45] wall, a single 64 × 64 block can encode
[9:50] the whole region in one shot. H.264
[9:54] would need 16 separate macroblocks for
[9:56] the same area. H.265 also uses more
[10:00] sophisticated motion compensation,
[10:03] better intra prediction, and improved
[10:05] entropy coding. All of these add up to
[10:08] dramatically better compression
[10:09] efficiency. Let us put the two codecs
[10:12] head-to-head. H.264 has been around
[10:15] since 2003, while H.265 arrived in 2013.
[10:21] For compression efficiency, H.265 wins
[10:24] by a large margin, roughly 40% to 50%
[10:28] better compression at the same quality
[10:30] level.
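One concrete piece of that efficiency gap is the block grid described earlier. As a rough sketch, the Python snippet below counts how many top-level blocks are needed to tile a frame at each block size. The function name is hypothetical, and it only counts the top-level grid: real H.265 encoders split CTUs into smaller coding units wherever the picture has detail, so the CTU count is a floor, not the number of units actually coded.

```python
def blocks_per_frame(width: int, height: int, block_size: int) -> int:
    """Number of block_size x block_size blocks needed to cover the frame.

    Partial blocks at the right and bottom edges count as whole blocks.
    """
    cols = (width + block_size - 1) // block_size    # ceiling division
    rows = (height + block_size - 1) // block_size
    return cols * rows


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    macroblocks = blocks_per_frame(w, h, 16)   # H.264 macroblocks
    ctus = blocks_per_frame(w, h, 64)          # largest H.265 CTU size
    print(f"{name}: {macroblocks} macroblocks vs {ctus} CTUs")

# 1080p: 8160 macroblocks vs 510 CTUs
# 4K: 32400 macroblocks vs 2040 CTUs
```

A single 64 × 64 region works out to `blocks_per_frame(64, 64, 16) == 16` macroblocks, which matches the figure of 16 quoted for a smooth area like a clear sky.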
[10:31] In terms of hardware support, H.264 is
[10:35] the clear winner. It runs natively on
[10:37] virtually every device ever made.
[10:40] H.265 support is excellent on modern
[10:43] devices, but older hardware may
[10:45] struggle.
[10:46] For encoding speed, H.264 is
[10:49] significantly faster because its
[10:51] algorithms are simpler. H.265 encoding
[10:55] requires much more processing power.
[10:58] And when it comes to licensing, H.264 is
[11:01] relatively straightforward, while H.265
[11:04] has complex and expensive patent
[11:07] licensing, which slowed its adoption.
[11:09] Let me make the compression difference
[11:11] really tangible with real numbers.
[11:14] Imagine you have a 1-hour video at 1080p
[11:17] resolution. With H.264,
[11:20] a typical high-quality encode would give
[11:23] you roughly 4 to 6 GB.
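File size and bitrate are two views of the same number, so it can help to convert between them. Here is a minimal Python sketch, assuming decimal units (1 GB = 10^9 bytes) and ignoring audio and container overhead; both function names are made up for illustration.

```python
def average_bitrate_mbps(file_size_gb: float, duration_s: float) -> float:
    """Average bitrate in megabits per second for a file of the given size."""
    bits = file_size_gb * 1e9 * 8
    return bits / duration_s / 1e6


def file_size_gb(bitrate_mbps: float, duration_s: float) -> float:
    """File size in gigabytes for a constant average bitrate."""
    bits = bitrate_mbps * 1e6 * duration_s
    return bits / 8 / 1e9


ONE_HOUR = 3600

# A 1-hour encode of 4 to 6 GB corresponds to roughly 8.9 to 13.3 Mbps.
print(f"{average_bitrate_mbps(4, ONE_HOUR):.1f} Mbps")   # 8.9 Mbps
print(f"{average_bitrate_mbps(6, ONE_HOUR):.1f} Mbps")   # 13.3 Mbps

# Going the other way: 8 Mbps sustained for an hour is about 3.6 GB.
print(f"{file_size_gb(8, ONE_HOUR):.1f} GB")             # 3.6 GB
```

So the 4 to 6 GB range sits a little above the 8 megabits per second mentioned earlier for a typical 1080p YouTube stream, which fits the idea of a high-quality encode.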
[11:25] The same video encoded with H.265
[11:29] at the same visual quality, roughly 2 to
[11:32] 3 GB. That is literally half the
[11:35] storage. Now, scale that up to 4K. A
[11:38] 1-hour 4K video in H.264
[11:41] might need 40 to 60 GB. In H.265,
[11:46] you can achieve the same quality for
[11:49] roughly 20 to 30 GB. For streaming
[11:52] platforms serving millions of users
[11:54] simultaneously, this difference is worth
[11:56] hundreds of millions of dollars in
[11:58] bandwidth savings every year. So, which
[12:01] one should you actually use? The answer
[12:03] depends on your use case. If you are
[12:05] uploading content to YouTube, creating
[12:07] videos for older devices, or need
[12:09] maximum compatibility, stick with H.264.
[12:13] It is the safe choice, and practically
[12:15] everything can play it. If you're
[12:17] working with 4K footage, distributing
[12:19] large video libraries, streaming over
[12:22] limited bandwidth, or targeting modern
[12:24] Apple or Android devices, H.265 is the
[12:27] better choice. It will save you
[12:29] tremendous storage and bandwidth. Many
[12:32] modern cameras like the iPhone, GoPro,
[12:34] and DSLRs actually capture in H.265
[12:38] natively, and streaming platforms like
[12:40] Apple TV+ and Netflix 4K rely
[12:43] heavily on H.265. Now, just when you
[12:46] thought the codec wars were over, there
[12:48] is a new player making massive waves.
[12:51] Meet AV1, an open-source, royalty-free
[12:55] codec developed by the Alliance for Open
[12:58] Media, which includes Google, Netflix,
[13:01] Amazon, Apple, and Microsoft. AV1 is
[13:05] even more efficient than H.265,
[13:08] offering another 30% to 50% compression
[13:11] improvement. And because it is
[13:13] completely free to use with no licensing
[13:16] fees, it is rapidly gaining adoption.
[13:18] YouTube already uses AV1 for many
[13:21] videos. Netflix is rolling it out. The
[13:24] PlayStation 5 and the latest smartphones
[13:27] support it. AV1 is likely the future of
[13:30] video compression, but it requires
[13:32] enormous processing power to encode and
[13:35] decode. Let me show you how all of this
[13:38] comes together in the real world. When
[13:40] you upload a video to YouTube, their
[13:42] servers do not just store one copy of
[13:45] your video. They re-encode it into
[13:47] multiple versions at different
[13:47] resolutions and bitrates: 360p, 480p,
[13:53] 720p, 1080p, 1440p, 4K, using multiple
[13:59] codecs including H.264,
[14:02] VP9,
[14:03] and AV1. Your YouTube app then monitors
[14:07] your internet connection speed in real
[14:09] time. If your connection drops, it
[14:11] automatically switches to a lower
[14:13] resolution version, seamlessly, without
[14:15] you even noticing. This is called
[14:17] adaptive bitrate streaming, or ABR. It
[14:21] is the technology that ensures you
[14:23] always get the best possible quality for
[14:25] your connection speed. All right, let us
[14:28] wrap everything up. Today you learned
[14:30] that raw video is absolutely enormous,
[14:33] more than 100 gigabytes for just
[14:35] 10 minutes of 1080p. Video encoding
[14:37] compresses that data using spatial and
[14:39] temporal redundancy, removing what the
[14:41] eye cannot see or what did not change
[14:44] between frames. We looked at H.264,
[14:47] the most compatible and widely used
[14:49] codec in history, and H.265, which
[14:53] delivers the same quality at roughly
[14:55] half the file size, making it the go-to
[14:58] for 4K content. We also got a peek at
[15:00] AV1, the royalty-free future of video.
[15:04] And you now understand how streaming
[15:05] platforms use adaptive bitrate streaming
[15:08] to give you the smoothest possible
[15:09] experience. If you found this video
[15:11] helpful, make sure to like, subscribe,
[15:13] and hit that notification bell on
[15:15] Bitbyte Talks, because there is a lot more
[15:18] tech explained simply coming your way.
[15:20] See you in the next one.