Video encoding compresses massive raw video data into manageable file sizes using techniques like spatial and temporal redundancy. This video explains how codecs like H.264, H.265, and AV1 work, comparing their efficiency and use cases.
A 1080p video at 30fps requires about 178 MB per second; a 10-minute video would be over 100 GB without compression.
A codec (coder-decoder) compresses video for storage/streaming and decompresses it for playback. Encoder compresses, decoder reconstructs.
Spatial redundancy: a large uniform area (like a blue sky) is encoded as a single block instead of pixel by pixel, similar to zip compression.
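As a toy illustration of the spatial-redundancy idea above, the sketch below uses run-length encoding on a single row of pixels. This is an assumption made for simplicity: real codecs such as H.264 rely on block prediction and transforms rather than plain run-length encoding, and the pixel values and function names here are hypothetical.

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# One hypothetical scanline: 1,000 identical "sky" pixels followed by 3 others.
SKY, CLOUD, BIRD = 200, 255, 30
scanline = [SKY] * 1000 + [CLOUD, CLOUD, BIRD]

encoded = run_length_encode(scanline)
print(encoded)  # [(200, 1000), (255, 2), (30, 1)]
```

The 1,003 stored values shrink to three pairs, and decoding restores the row exactly, which is the "store it once" shorthand in miniature.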
Temporal redundancy: only the changes between consecutive frames are encoded; for a static background, only the moving parts are stored, saving significant data.
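The "store only what changed" idea can be sketched with simple frame differencing. This is a simplified stand-in, not how a real encoder works: actual codecs use motion-compensated block prediction, and the frames, values, and function names below are hypothetical. Both frames are assumed to have the same length.

```python
def frame_delta(previous, current):
    """Return {index: new_value} for pixels that differ between two frames."""
    return {i: cur for i, (prev, cur) in enumerate(zip(previous, current)) if prev != cur}

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for i, value in delta.items():
        frame[i] = value
    return frame

frame_1 = [10] * 100            # hypothetical static background
frame_2 = list(frame_1)
frame_2[40:43] = [90, 91, 92]   # a small moving object changes three pixels

delta = frame_delta(frame_1, frame_2)
print(delta)  # {40: 90, 41: 91, 42: 92}
```

Only three of the hundred pixels need to be stored for the second frame, and `apply_delta(frame_1, delta)` reconstructs it in full.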
I-frames are complete snapshots; P-frames store differences from previous frames; B-frames use both previous and next frames for prediction, offering best compression.
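A rough way to picture how the three frame types are mixed is to lay out one group of pictures (GOP) in display order. The `gop_size` and `b_frames` parameters are illustrative assumptions; real encoders choose frame types adaptively, and B-frames at the end of a group would need a later reference frame from the next group.

```python
def gop_pattern(gop_size, b_frames):
    """Frame types, in display order, for one group of pictures (GOP)."""
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")   # complete snapshot
        elif i % (b_frames + 1) == 0:
            types.append("P")   # predicted from an earlier frame
        else:
            types.append("B")   # predicted from earlier and later frames
    return "".join(types)

print(gop_pattern(gop_size=12, b_frames=2))  # IBBPBBPBBPBB
```

With `b_frames=0` the same function yields an I-frame followed only by P-frames, which trades some compression for simpler decoding.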
Bitrate is data per second (e.g., 8 Mbps for 1080p YouTube). Higher bitrate = better quality but larger file; lower bitrate causes artifacts.
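Because bitrate is data per second, file size follows from multiplying by duration and dividing by 8 to convert bits to bytes. The sketch below assumes a constant bitrate and ignores audio and container overhead; `file_size_gb` is a hypothetical helper name.

```python
def file_size_gb(bitrate_mbps, duration_s):
    """Approximate stream size in gigabytes (10**9 bytes) at a constant bitrate."""
    bits = bitrate_mbps * 1_000_000 * duration_s
    return bits / 8 / 1e9

print(file_size_gb(8, 3600))   # 3.6
print(file_size_gb(50, 3600))  # 22.5
```

One hour at 8 Mbps comes to about 3.6 GB, and one hour at 50 Mbps to about 22.5 GB.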
Container (MP4, MKV) holds video, audio, subtitles; codec (H.264, H.265) defines how video is compressed. They are separate.
Introduced in 2003, H.264 is the most widely used codec, supported on virtually all devices. Uses 16x16 macroblocks for encoding.
Finalized in 2013, H.265 offers 40-50% better compression than H.264 at same quality. Uses flexible coding tree units (CTUs) up to 64x64 pixels.
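The block-size difference can be made concrete by counting how many blocks are needed to tile an area. This sketch counts only the largest units; in practice an encoder splits CTUs into smaller units in detailed regions, so the figures are an upper bound on the benefit rather than what an encoder would actually produce. `blocks_needed` is a hypothetical helper.

```python
import math

def blocks_needed(width, height, block_size):
    """Number of square blocks needed to tile a width x height area."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)

print(blocks_needed(64, 64, 16))      # 16
print(blocks_needed(1920, 1080, 16))  # 8160
print(blocks_needed(1920, 1080, 64))  # 510
```

A 64x64 region takes 16 macroblocks but one CTU, and a full 1080p frame drops from 8,160 blocks to 510 at the largest unit size.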
H.264: older, wider compatibility, faster encoding, simpler licensing. H.265: better compression, needed for 4K, but slower encoding and complex licensing.
1-hour 1080p: H.264 ~4-6 GB, H.265 ~2-3 GB. 1-hour 4K: H.264 ~40-60 GB, H.265 ~20-30 GB. Half the storage.
AV1 is an open-source, royalty-free codec offering 30-50% better compression than H.265. Used by YouTube, Netflix, but requires high processing power.
YouTube encodes videos in multiple resolutions/codecs. Player dynamically switches quality based on internet speed for smooth playback.
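The quality-switching decision can be sketched as picking the highest rendition that fits the measured bandwidth. This is a minimal sketch under stated assumptions, not YouTube's actual algorithm: only the 8 Mbps figure for 1080p comes from the text, the other ladder values and the `safety` margin are made up for illustration, and real players also consider buffer level and throughput history.

```python
# Hypothetical bitrate ladder (label, Mbps). Only the 8 Mbps 1080p figure comes
# from the text; the other values are made up for illustration.
LADDER = [("360p", 1.0), ("480p", 2.5), ("720p", 5.0), ("1080p", 8.0)]

def pick_rendition(measured_mbps, ladder=LADDER, safety=0.8):
    """Pick the highest-bitrate rendition that fits within a fraction of the bandwidth."""
    budget = measured_mbps * safety
    best = ladder[0]           # fall back to the lowest rung
    for rendition in ladder:   # ladder is sorted by ascending bitrate
        if rendition[1] <= budget:
            best = rendition
    return best[0]

for mbps in (20.0, 7.0, 0.5):
    print(mbps, "->", pick_rendition(mbps))
```

With these assumed values, a 20 Mbps connection gets 1080p, 7 Mbps gets 720p, and 0.5 Mbps falls back to 360p.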
Video encoding is essential for storing and streaming video efficiently. H.264 remains the most compatible, H.265 is ideal for 4K, and AV1 represents the royalty-free future with even better compression.
What is the approximate raw data rate for a 1080p video at 30fps?
About 178 MB per second.
01:00
What does codec stand for?
Coder-decoder.
01:56
What is spatial redundancy in video encoding?
Encoding large uniform areas (like a blue sky) as a single block instead of storing each pixel individually.
02:36
What is temporal redundancy?
Only encoding changes between consecutive frames, not re-encoding static parts.
03:13
Name the three types of frames in encoded video.
I-frames (intra-coded), P-frames (predicted), B-frames (bidirectional).
03:56
What is bitrate?
The amount of data used per second of video, measured in kbps or Mbps.
04:40
What is the difference between a container and a codec?
A container (e.g., MP4) holds video, audio, and metadata; a codec (e.g., H.264) defines how the video is compressed.
05:25
What year was H.264 introduced?
2003.
06:11
What is the basic encoding unit size in H.264?
16x16 pixel macroblocks.
07:10
What is the key improvement in H.265 over H.264?
It uses flexible coding tree units (CTUs) up to 64x64 pixels, allowing better compression for large uniform areas.
09:25
How much better compression does H.265 offer compared to H.264?
Roughly 40% to 50% better compression at the same quality level.
10:24
What is AV1?
An open-source, royalty-free codec developed by the Alliance for Open Media, offering 30-50% better compression than H.265.
12:51
What is adaptive bitrate streaming (ABR)?
A technology that monitors internet connection speed and switches video quality in real time for smooth playback.
14:17
Raw video is enormous
Quantifies the problem: 10-minute 1080p video would be over 100 GB without compression.
01:00
Spatial redundancy analogy
Explains compression using a clear blue sky example, making the concept intuitive.
02:36
Temporal redundancy insight
Key insight that most frames are nearly identical, enabling massive compression by storing only changes.
03:13
H.264 dominance
Explains why H.264 became the universal codec: great quality, small file size, and near-universal hardware support.
06:07
H.265 CTU advantage
Highlights the technical improvement: flexible 64x64 blocks vs fixed 16x16, enabling better compression for high-res video.
09:25
Real-world file size comparison
Concrete numbers show H.265 halves file size for same quality, with huge cost implications for streaming services.
11:09
AV1: royalty-free future
Introduces AV1 as an open-source alternative with even better compression, poised to become the next standard.
12:51
[00:00] Hey everyone, welcome to Bitbyte Talks.
[00:03] Today we are diving into something that
[00:05] quietly powers every single video you
[00:07] have ever watched on the internet,
[00:09] whether it's a YouTube binge, a Netflix
[00:12] marathon, or a video call with your
[00:14] friends. Video encoding is always behind
[00:17] the scenes making it all happen. So, let
[00:21] me ask you something. Have you ever
[00:22] noticed how a 2-hour movie can fit into
[00:25] just a few gigabytes on your phone? That
[00:28] right there is the magic of video
[00:30] encoding. Let us break it all down.
[00:32] Okay, so before we talk about encoding,
[00:35] let us understand what raw video
[00:37] actually looks like. A video is
[00:40] basically just a long sequence of
[00:42] images. We call them frames. Each frame
[00:44] is a snapshot of what you see. Now, if
[00:47] you have a 1080p video, each frame has
[00:50] 1920 x 1080 pixels, and each pixel has
[00:54] three color channels, red, green, and
[00:57] blue. That is roughly 6 million bytes
[01:00] just for one frame. Now, multiply that
[01:03] by 30 frames per second for 1 second of
[01:06] video, and you're looking at about 178
[01:09] megabytes every second. For a 10-minute
[01:12] video, that would be over 100 gigabytes.
[01:15] That is absolutely enormous. We simply
[01:18] cannot store or stream that. This is
[01:21] exactly why video encoding exists.
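The arithmetic above can be checked with a few lines of Python. This sketch assumes 8 bits per color channel, which is what the "three color channels" figure implies. Note that the 178 figure matches binary megabytes (MiB); in decimal megabytes the same rate is about 187 MB per second.

```python
# Raw (uncompressed) data rate for 8-bit RGB video, following the arithmetic above.
WIDTH, HEIGHT = 1920, 1080   # 1080p frame
BYTES_PER_PIXEL = 3          # one byte each for red, green, blue (assumed 8 bits per channel)
FPS = 30
DURATION_S = 10 * 60         # 10 minutes

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FPS
total_bytes = bytes_per_second * DURATION_S

print(f"per frame : {bytes_per_frame:,} bytes")
print(f"per second: {bytes_per_second / 2**20:.0f} MiB ({bytes_per_second / 1e6:.0f} MB)")
print(f"10 minutes: {total_bytes / 1e9:.0f} GB")
```

The result is 6,220,800 bytes per frame and roughly 112 GB for ten minutes, consistent with the "over 100 gigabytes" claim.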
[01:24] Encoding is the process of compressing
[01:26] that massive raw video data into a much
[01:29] smaller file size without making the
[01:31] video look terrible. Think of it like
[01:33] packing a suitcase. You start with a
[01:35] mountain of clothes, and you use clever
[01:37] folding techniques to fit everything
[01:39] neatly into a single bag. Encoding does
[01:42] the same thing for your video data. It
[01:44] finds patterns, removes redundant
[01:47] information, and stores only what is
[01:49] truly necessary to reconstruct a
[01:51] great-looking video when you hit play.
[01:53] So, how exactly does encoding work? The
[01:56] software that handles this job is called
[01:59] a codec, which stands for coder-decoder.
[02:02] A codec compresses the video when you
[02:04] save it or stream it, and decompresses
[02:07] it when you watch it. The encoder is on
[02:09] the production side. It takes raw video
[02:11] and crunches it down. The decoder is on
[02:14] the playback side. Your phone, laptop,
[02:17] or TV uses it to reconstruct the video
[02:19] from the compressed data. You have
[02:22] definitely heard of popular codecs
[02:23] before, things like H.264,
[02:27] H.265,
[02:28] AV1, and VP9. These are all different
[02:32] codecs with different ways of achieving
[02:34] compression. Let me show you one of the
[02:36] core tricks encoding uses. Imagine a
[02:39] frame that shows a clear blue sky. The
[02:42] upper half of that image is almost
[02:44] entirely blue, thousands of pixels that
[02:47] are nearly identical. Instead of storing
[02:49] each pixel individually, the encoder
[02:52] says, "Hey, this whole region is the
[02:54] same color. Let me just store that
[02:56] once." This is called spatial redundancy
[03:00] or intraframe compression. It is similar
[03:02] to how zip files work. Instead of
[03:04] repeating information, it stores a
[03:06] shorthand description. The result is
[03:09] that the sky takes up a fraction of the
[03:11] space it normally would. But there is an
[03:13] even more powerful trick, temporal
[03:16] redundancy. Here's the idea. In most
[03:19] videos, consecutive frames look almost
[03:22] identical. Think of a news anchor
[03:24] sitting at a desk. Their face and hands
[03:27] might move slightly, but the desk,
[03:29] background, and studio are completely
[03:31] unchanged from one frame to the next.
[03:34] Instead of re-encoding the entire
[03:36] background in every single frame, the
[03:39] encoder only encodes what changed, just
[03:42] the movement. It stores something like a
[03:44] reference frame, and then only saves the
[03:46] difference between frames. This is
[03:49] called interframe compression, and it is
[03:52] responsible for a huge chunk of the
[03:54] compression gains in modern video. This
[03:56] brings us to the three types of frames
[03:58] that encoded video uses. First, we have
[04:01] I-frames, short for intra-coded frames.
[04:05] These are complete snapshots of the
[04:06] scene, like a keyframe. Then we have
[04:09] P-frames, predicted frames. These only
[04:12] store the difference between the current
[04:14] frame and the previous I or P-frame.
[04:17] They rely on what came before. And
[04:20] finally, B-frames, bidirectional frames.
[04:23] These are the most efficient. They
[04:25] reference both the frame before and the
[04:28] frame after to predict what the current
[04:30] frame looks like. A well-encoded video
[04:33] uses a smart mix of these three to
[04:35] achieve maximum compression while
[04:37] maintaining great quality. Another key
[04:40] concept in video encoding is bitrate.
[04:44] Bitrate is the amount of data used per
[04:46] second of video, usually measured in
[04:48] kilobits per second or megabits per
[04:50] second. Higher bitrate means more data,
[04:53] which means better quality, but also a
[04:55] bigger file. Lower bitrate means smaller
[04:58] file, but more compression artifacts,
[05:01] those blocky, blurry, or pixelated
[05:03] glitches you sometimes see on a
[05:05] low-quality stream. For reference, a
[05:07] standard 1080p YouTube video typically
[05:10] uses around 8 megabits per second. A 4K
[05:13] HDR video might use 50 megabits per
[05:16] second or more. Choosing the right
[05:18] bitrate is a balancing act between
[05:20] quality and storage or bandwidth
[05:22] requirements. People often confuse
[05:25] codecs with container formats, so let me
[05:27] clear that up right now. A container is
[05:30] like a box that holds everything, the
[05:32] video stream, the audio stream,
[05:34] subtitles, and metadata. Examples of
[05:37] containers are MP4, MKV, AVI, and MOV.
[05:42] The codec is what is inside the box. It
[05:46] defines how the video data is actually
[05:48] compressed. So, for example, you can
[05:50] have an MP4 file that uses the H.264 codec
[05:54] for video and the AAC codec for audio. Or an
[05:58] MKV file using H.265 for video and DTS
[06:02] for audio. The container and the codec
[06:05] are two completely different things. All
[06:07] right, now let us talk about H.264,
[06:11] also known as MPEG-4 AVC or Advanced
[06:15] Video Coding. This codec was introduced
[06:17] way back in 2003, and to this day it
[06:20] remains the most widely used video codec
[06:23] on the planet. If you have watched a
[06:25] video on YouTube, Netflix, Zoom, or
[06:28] pretty much any platform in the last
[06:30] decade, there's a very high chance it
[06:32] was encoded using H.264.
[06:35] The reason it became so dominant is
[06:37] simple. It offers great quality at
[06:39] relatively small file sizes, and
[06:42] virtually every device in the world can
[06:44] play it. Your phone, laptop, smart TV,
[06:47] gaming console, they all have dedicated
[06:50] hardware to decode H.264 at lightning
[06:53] speed. So, how does H.264 achieve its
[06:57] compression? It uses all the techniques
[06:59] we talked about earlier, spatial and
[07:01] temporal compression, but applies them
[07:04] with a specific set of encoding profiles
[07:06] and levels. H.264 uses macroblocks as
[07:10] its basic encoding unit. Each macroblock
[07:13] is a 16 x 16 pixel block. The encoder
[07:17] analyzes each macroblock, looks for
[07:19] similar patterns in nearby blocks and
[07:21] nearby frames, and encodes only the
[07:24] changes. It also uses sophisticated
[07:26] motion estimation to predict where
[07:29] objects in the frame are moving. The
[07:31] result is incredible compression. A raw
[07:34] 100 gigabyte video can be shrunk to just
[07:36] 1 or 2 gigabytes without looking
[07:39] noticeably different. But here's the
[07:41] thing. H.264 was designed in an era when
[07:44] 1080p was the highest mainstream
[07:46] resolution. Today, we have 4K, 8K, 360°
[07:51] video, HDR, and streaming to billions of
[07:55] devices simultaneously. H.264 starts to
[07:59] show its age at these higher
[08:00] resolutions. To maintain good quality in
[08:03] 4K, H.264 needs a very high bitrate,
[08:07] which means bigger files and more
[08:08] bandwidth. That directly translates to
[08:11] higher streaming costs, more storage,
[08:13] and slower load times. The world needed
[08:16] something better, something that could
[08:18] handle 4K and beyond without doubling or
[08:21] tripling the file size. That is exactly
[08:24] why H.265 was created. Enter H.265,
[08:29] also known as HEVC,
[08:32] which stands for High Efficiency Video
[08:34] Coding. H.265 was finalized in 2013, and
[08:39] it was specifically designed to be twice
[08:41] as efficient as H.264.
[08:44] That means H.265 can deliver the same
[08:48] video quality as H.264,
[08:51] but at half the file size. Or, if you
[08:53] keep the file size the same, H.265
[08:56] will give you noticeably better quality.
[08:59] This is a massive deal for 4K streaming,
[09:02] video surveillance, Blu-ray Ultra HD,
[09:05] and broadcasting. It is the codec of
[09:08] choice for Apple, Netflix 4K,
[09:10] PlayStation 5, and many modern cameras.
[09:14] It represents the next generation of
[09:16] video compression. So, what makes H.265
[09:20] so much more efficient? The key
[09:22] difference is the encoding block size.
[09:25] While H.264 uses fixed 16 x 16
[09:29] macroblocks, H.265
[09:32] uses flexible coding tree units, or CTUs,
[09:36] that can be up to 64 x 64 pixels. Why
[09:40] does that matter? Because for large,
[09:42] smooth areas like a clear sky or a plain
[09:45] wall, a single 64 x 64 block can encode
[09:50] the whole region in one shot. H.264
[09:54] would need 16 separate macroblocks for
[09:56] the same area. H.265 also uses more
[10:00] sophisticated motion compensation,
[10:03] better intra prediction, and improved
[10:05] entropy coding. All of these add up to
[10:08] dramatically better compression
[10:09] efficiency. Let us put the two codecs
[10:12] head-to-head. H.264 has been around
[10:15] since 2003, while H.265 arrived in 2013.
[10:21] For compression efficiency, H.265 wins
[10:24] by a large margin, roughly 40% to 50%
[10:28] better compression at the same quality
[10:30] level.
[10:31] In terms of hardware support, H.264 is
[10:35] the clear winner. It runs natively on
[10:37] virtually every device ever made.
[10:40] H.265 support is excellent on modern
[10:43] devices, but older hardware may
[10:45] struggle.
[10:46] For encoding speed, H.264 is
[10:49] significantly faster because its
[10:51] algorithms are simpler. H.265 encoding
[10:55] requires much more processing power.
[10:58] And when it comes to licensing, H.264 is
[11:01] relatively straightforward, while H.265
[11:04] has complex and expensive patent
[11:07] licensing, which slowed its adoption.
[11:09] Let me make the compression difference
[11:11] really tangible with real numbers.
[11:14] Imagine you have a 1-hour video at 1080p
[11:17] resolution. With H.264,
[11:20] a typical high-quality encode would give
[11:23] you roughly 4 to 6 GB.
[11:25] The same video encoded with H.265
[11:29] at the same visual quality, roughly 2 to
[11:32] 3 GB. That is literally half the
[11:35] storage. Now, scale that up to 4K. A
[11:38] 1-hour 4K video in H.264
[11:41] might need 40 to 60 GB. In H.265,
[11:46] you can achieve the same quality for
[11:49] roughly 20 to 30 GB. For streaming
[11:52] platforms serving millions of users
[11:54] simultaneously, this difference is worth
[11:56] hundreds of millions of dollars in
[11:58] bandwidth savings every year. So, which
[12:01] one should you actually use? The answer
[12:03] depends on your use case. If you are
[12:05] uploading content to YouTube, creating
[12:07] videos for older devices, or need
[12:09] maximum compatibility, stick with H.264.
[12:13] It is the safe choice, and practically
[12:15] everything can play it. If you're
[12:17] working with 4K footage, distributing
[12:19] large video libraries, streaming over
[12:22] limited bandwidth, or targeting modern
[12:24] Apple or Android devices, H.265 is the
[12:27] better choice. It will save you
[12:29] tremendous storage and bandwidth. Many
[12:32] modern cameras like the iPhone, GoPro,
[12:34] and DSLRs actually capture in H.265
[12:38] natively, and streaming platforms like
[12:40] Apple TV Plus and Netflix 4K rely
[12:43] heavily on H.265. Now, just when you
[12:46] thought the codec wars were over, there
[12:48] is a new player making massive waves.
[12:51] Meet AV1, an open-source, royalty-free
[12:55] codec developed by the Alliance for Open
[12:58] Media, which includes Google, Netflix,
[13:01] Amazon, Apple, and Microsoft. AV1 is
[13:05] even more efficient than H.265,
[13:08] offering another 30% to 50% compression
[13:11] improvement. And because it is
[13:13] completely free to use with no licensing
[13:16] fees, it is rapidly gaining adoption.
[13:18] YouTube already uses AV1 for many
[13:21] videos. Netflix is rolling it out. The
[13:24] PlayStation 5 and the latest smartphones
[13:27] support it. AV1 is likely the future of
[13:30] video compression, but it requires
[13:32] enormous processing power to encode and
[13:35] decode. Let me show you how all of this
[13:38] comes together in the real world. When
[13:40] you upload a video to YouTube, their
[13:42] servers do not just store one copy of
[13:45] your video. They re-encode it into
[13:47] multiple versions at different
[13:49] resolutions and bit rates. 360p, 480p,
[13:53] 720p, 1080p, 1440p, 4K, using multiple
[13:59] codecs including H.264,
[14:02] H.265,
[14:03] and AV1. Your YouTube app then monitors
[14:07] your internet connection speed in real
[14:09] time. If your connection drops, it
[14:11] automatically switches to a lower
[14:13] resolution version, seamlessly, without
[14:15] you even noticing. This is called
[14:17] adaptive bitrate streaming, or ABR. It
[14:21] is the technology that ensures you
[14:23] always get the best possible quality for
[14:25] your connection speed. All right, let us
[14:28] wrap everything up. Today you learned
[14:30] that raw video is absolutely enormous,
[14:33] potentially hundreds of gigabytes for
[14:35] just a few minutes. Video encoding
[14:37] compresses that data using spatial and
[14:39] temporal redundancy, removing what the
[14:41] eye cannot see or what did not change
[14:44] between frames. We looked at H.264,
[14:47] the most compatible and widely used
[14:49] codec in history, and H.265, which
[14:53] delivers the same quality at roughly
[14:55] half the file size, making it the go-to
[14:58] for 4K content. We also got a peek at
[15:00] AV1, the royalty-free future of video.
[15:04] And you now understand how streaming
[15:05] platforms use adaptive bitrate streaming
[15:08] to give you the smoothest possible
[15:09] experience. If you found this video
[15:11] helpful, make sure to like, subscribe,
[15:13] and hit that notification bell on Bit
[15:15] Byte Talks, because there is a lot more
[15:18] tech explained simply coming your way.
[15:20] See you in the next one.