---
title: 'How Video Encoding Works | H.264 vs H.265'
source: 'https://youtube.com/watch?v=bseDrPAo9rE'
video_id: 'bseDrPAo9rE'
date: 2026-06-16
duration_sec: 0
---

# How Video Encoding Works | H.264 vs H.265

> Source: [How Video Encoding Works | H.264 vs H.265](https://youtube.com/watch?v=bseDrPAo9rE)

## Summary

Video encoding compresses massive raw video data into manageable file sizes using techniques like spatial and temporal redundancy. This video explains how codecs like H.264, H.265, and AV1 work, comparing their efficiency and use cases.
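As a toy illustration of the spatial redundancy mentioned above, the sketch below run-length encodes one row of pixels so that a long run of identical sky-blue values is stored once with a count. Real codecs such as H.264 and H.265 rely on block prediction and transforms rather than plain run-length encoding, so the function names and pixel values here are illustrative assumptions, not part of any codec.

```python
from itertools import groupby

def run_length_encode(row):
    """Collapse consecutive identical pixel values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original row of pixels."""
    row = []
    for value, count in pairs:
        row.extend([value] * count)
    return row

# Hypothetical row: 1,900 identical "sky" pixels followed by 20 "cloud" pixels.
SKY, CLOUD = (135, 206, 235), (255, 255, 255)
row = [SKY] * 1900 + [CLOUD] * 20

encoded = run_length_encode(row)
```

In this made-up row, 1,920 pixels collapse to two (value, count) pairs, and decoding restores the row exactly.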

### Key Points

- **Raw video size problem** [00:00] — A 1080p video at 30fps requires about 178 MB per second; a 10-minute video would be over 100 GB without compression.
- **What is a codec?** [01:53] — A codec (coder-decoder) compresses video for storage/streaming and decompresses it for playback. Encoder compresses, decoder reconstructs.
- **Spatial redundancy (intraframe compression)** [02:36] — Encodes large uniform areas (like a blue sky) as a single block instead of storing each pixel individually, similar to zip compression.
- **Temporal redundancy (interframe compression)** [03:13] — Only encodes changes between consecutive frames; for a static background, only moving parts are stored, saving significant data.
- **Three frame types: I, P, B** [03:56] — I-frames are complete snapshots; P-frames store differences from previous frames; B-frames use both previous and next frames for prediction, offering best compression.
- **Bitrate explained** [04:40] — Bitrate is data per second (e.g., 8 Mbps for 1080p YouTube). Higher bitrate = better quality but larger file; lower bitrate causes artifacts.
- **Container vs codec** [05:25] — Container (MP4, MKV) holds video, audio, subtitles; codec (H.264, H.265) defines how video is compressed. They are separate.
- **H.264 dominance** [06:07] — Introduced in 2003, H.264 is the most widely used codec, supported on virtually all devices. Uses 16x16 macroblocks for encoding.
- **H.265 (HEVC) efficiency** [08:24] — Finalized in 2013, H.265 offers 40-50% better compression than H.264 at same quality. Uses flexible coding tree units (CTUs) up to 64x64 pixels.
- **H.264 vs H.265 comparison** [10:12] — H.264: older, wider compatibility, faster encoding, simpler licensing. H.265: better compression, needed for 4K, but slower encoding and complex licensing.
- **Real-world file size savings** [11:09] — 1-hour 1080p: H.264 ~4-6 GB, H.265 ~2-3 GB. 1-hour 4K: H.264 ~40-60 GB, H.265 ~20-30 GB. Half the storage.
- **AV1: the future** [12:46] — AV1 is an open-source, royalty-free codec offering 30-50% better compression than H.265. Used by YouTube, Netflix, but requires high processing power.
- **Adaptive bitrate streaming (ABR)** [13:38] — YouTube encodes videos in multiple resolutions/codecs. Player dynamically switches quality based on internet speed for smooth playback.
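
The adaptive bitrate idea in the last key point can be sketched as picking the highest rendition whose bitrate fits within the measured bandwidth. The rendition ladder, every bitrate other than the 8 Mbps 1080p figure, and the safety margin are hypothetical values chosen for illustration; real players typically use more elaborate heuristics, such as buffer level.

```python
# Hypothetical rendition ladder: (label, bitrate in Mbps). Only the 8 Mbps
# figure for 1080p comes from the video; the rest are illustrative guesses.
LADDER = [
    ("360p", 1.0),
    ("480p", 2.5),
    ("720p", 5.0),
    ("1080p", 8.0),
    ("4K", 35.0),
]

def pick_rendition(measured_mbps, ladder=LADDER, safety=0.8):
    """Return the highest-bitrate rendition that fits within a safety
    margin of the measured bandwidth, or the lowest one as a fallback."""
    budget = measured_mbps * safety
    affordable = [r for r in ladder if r[1] <= budget]
    if not affordable:
        return ladder[0]
    return max(affordable, key=lambda r: r[1])
```

Calling `pick_rendition` again whenever the bandwidth estimate changes mimics the player stepping down to a lower resolution when the connection drops.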

### Conclusion

Video encoding is essential for storing and streaming video efficiently. H.264 remains the most compatible, H.265 is ideal for 4K, and AV1 represents the royalty-free future with even better compression.
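
The file sizes quoted in the key points can be roughly cross-checked, since size is approximately average bitrate multiplied by duration. This is a minimal sketch, assuming a constant average bitrate and decimal gigabytes, and ignoring audio and container overhead. The function name and the 4 Mbps H.265 figure are illustrative assumptions.

```python
def file_size_gb(bitrate_mbps, duration_sec):
    """Approximate file size in decimal gigabytes for a given average bitrate."""
    total_megabits = bitrate_mbps * duration_sec
    total_megabytes = total_megabits / 8   # 8 bits per byte
    return total_megabytes / 1000          # 1 GB = 1000 MB

ONE_HOUR = 3600
h264_gb = file_size_gb(8, ONE_HOUR)   # 8 Mbps, the 1080p figure from the video
h265_gb = file_size_gb(4, ONE_HOUR)   # assumes H.265 needs about half the bitrate
```

At 8 Mbps, one hour comes to about 3.6 GB, slightly under the 4-6 GB range quoted for H.264. That range corresponds to an average bitrate of roughly 9 to 13 Mbps.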

## Transcript

Hey everyone, welcome to Bitbyte Talks.
Today we are diving into something that
quietly powers every single video you
have ever watched on the internet,
whether it's a YouTube binge, a Netflix
marathon, or a video call with your
friends. Video encoding is always behind
the scenes making it all happen. So, let
me ask you something. Have you ever
noticed how a 2-hour movie can fit into
just a few gigabytes on your phone? That
right there is the magic of video
encoding. Let us break it all down.

Okay, so before we talk about encoding,
let us understand what raw video
actually looks like. A video is
basically just a long sequence of
images. We call them frames. Each frame
is a snapshot of what you see. Now, if
you have a 1080p video, each frame has
1920x1080 pixels, and each pixel has
three color channels, red, green, and
blue. That is roughly 6 million bytes
just for one frame. Now, multiply that
by 30 frames per second for 1 second of
video, and you're looking at about 178
megabytes every second. For a 10-minute
video, that would be over 100 gigabytes.
That is absolutely enormous. We simply
cannot store or stream that. This is
exactly why video encoding exists.
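
The arithmetic behind those figures can be reproduced in a few lines. This sketch assumes uncompressed 8-bit RGB, meaning one byte per color channel, which matches the "roughly 6 million bytes" per frame above. The variable names are illustrative.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p frame
BYTES_PER_PIXEL = 3          # one byte each for red, green, blue
FPS = 30

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FPS
bytes_ten_minutes = bytes_per_second * 10 * 60

MIB = 1024 ** 2
GIB = 1024 ** 3

print(f"per frame:  {bytes_per_frame:,} bytes")
print(f"per second: {bytes_per_second / MIB:.0f} MiB ({bytes_per_second / 1e6:.1f} MB)")
print(f"10 minutes: {bytes_ten_minutes / GIB:.1f} GiB ({bytes_ten_minutes / 1e9:.1f} GB)")
```

The 178 figure appears when counting in binary megabytes (MiB); in decimal megabytes the same data rate is about 187 MB per second. Either way, ten minutes lands above 100 gigabytes.
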
Encoding is the process of compressing
that massive raw video data into a much
smaller file size without making the
video look terrible. Think of it like
packing a suitcase. You start with a
mountain of clothes, and you use clever
folding techniques to fit everything
neatly into a single bag. Encoding does
the same thing for your video data. It
finds patterns, removes redundant
information, and stores only what is
truly necessary to reconstruct a
great-looking video when you hit play.

So, how exactly does encoding work? The
software that handles this job is called
a codec, which stands for coder-decoder.
A codec compresses the video when you
save it or stream it, and decompresses
it when you watch it. The encoder is on
the production side. It takes raw video
and crunches it down. The decoder is on
the playback side. Your phone, laptop,
or TV uses it to reconstruct the video
from the compressed data. You have
definitely heard of popular codecs
before, things like H.264,
H.265,
AV1, and VP9. These are all different
codecs with different ways of achieving
compression.

Let me show you one of the
core tricks encoding uses. Imagine a
frame that shows a clear blue sky. The
upper half of that image is almost
entirely blue, thousands of pixels that
are nearly identical. Instead of storing
each pixel individually, the encoder
says, "Hey, this whole region is the
same color. Let me just store that
once." This is called spatial redundancy
or intraframe compression. It is similar
to how zip files work. Instead of
repeating information, it stores a
shorthand description. The result is
that the sky takes up a fraction of the
space it normally would.

But there is an
even more powerful trick, temporal
redundancy. Here's the idea. In most
videos, consecutive frames look almost
identical. Think of a news anchor
sitting at a desk. Their face and hands
might move slightly, but the desk,
background, and studio are completely
unchanged from one frame to the next.
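
That observation can be sketched with simple frame differencing: keep one reference frame, then store only the pixels that changed. This is a deliberately simplified, per-pixel toy with hypothetical function names and made-up frames. Real encoders work on blocks and use motion estimation rather than raw pixel differences.

```python
def frame_delta(reference, current):
    """Return {(row, col): new_value} for pixels that differ from the reference."""
    return {
        (r, c): value
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if reference[r][c] != value
    }

def apply_delta(reference, delta):
    """Rebuild the current frame from the reference frame plus the stored changes."""
    frame = [list(row) for row in reference]
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame

# Hypothetical 4x6 grayscale frames: a static background (value 10) with a
# small "moving" object (value 200) that shifts one pixel to the right.
frame_a = [[10] * 6 for _ in range(4)]
frame_b = [[10] * 6 for _ in range(4)]
frame_a[2][1] = 200
frame_b[2][2] = 200

delta = frame_delta(frame_a, frame_b)
```

In this example only 2 of the 24 pixels need to be stored for the second frame, and `apply_delta(frame_a, delta)` rebuilds it exactly.
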
Instead of re-encoding the entire
background in every single frame, the
encoder only encodes what changed, just
the movement. It stores something like a
reference frame, and then only saves the
difference between frames. This is
called interframe compression, and it is
responsible for a huge chunk of the
compression gains in modern video.

This
brings us to the three types of frames
that encoded video uses. First, we have
I-frames, short for intra-coded frames.
These are complete snapshots of the
scene, like a keyframe. Then we have
P-frames, predicted frames. These only
store the difference between the current
frame and the previous I or P-frame.
They rely on what came before. And
finally, B-frames, bidirectional frames.
These are the most efficient. They
reference both the frame before and the
frame after to predict what the current
frame looks like. A well-encoded video
uses a smart mix of these three to
achieve maximum compression while
maintaining great quality.

Another key
concept in video encoding is bitrate.
Bitrate is the amount of data used per
second of video, usually measured in
kilobits per second or megabits per
second. Higher bitrate means more data,
which means better quality, but also a
bigger file. Lower bitrate means smaller
file, but more compression artifacts,
those blocky, blurry, or pixelated
glitches you sometimes see on a
low-quality stream. For reference, a
standard 1080p YouTube video typically
uses around 8 megabits per second. A 4K
HDR video might use 50 megabits per
second or more. Choosing the right
bitrate is a balancing act between
quality and storage or bandwidth
requirements.

People often confuse
codecs with container formats, so let me
clear that up right now. A container is
like a box that holds everything, the
video stream, the audio stream,
subtitles, and metadata. Examples of
containers are MP4, MKV, AVI, and MOV.
The codec is what is inside the box. It
defines how the video data is actually
compressed. So, for example, you can
have an MP4 file that uses the H.264 codec
for video and the AAC codec for audio. Or an
MKV file using H.265 for video and DTS
for audio. The container and the codec
are two completely different things.

All
right, now let us talk about H.264,
also known as MPEG-4 AVC or Advanced
Video Coding. This codec was introduced
way back in 2003, and to this day it
remains the most widely used video codec
on the planet. If you have watched a
video on YouTube, Netflix, Zoom, or
pretty much any platform in the last
decade, there's a very high chance it
was encoded using H.264.
The reason it became so dominant is
simple. It offers great quality at
relatively small file sizes, and
virtually every device in the world can
play it. Your phone, laptop, smart TV,
gaming console, they all have dedicated
hardware to decode H.264 at lightning
speed.

So, how does H.264 achieve its
compression? It uses all the techniques
we talked about earlier, spatial and
temporal compression, but applies them
with a specific set of encoding profiles
and levels. H.264 uses macroblocks as
its basic encoding unit. Each macroblock
is a 16x16 pixel block. The encoder
analyzes each macroblock, looks for
similar patterns in nearby blocks and
nearby frames, and encodes only the
changes. It also uses sophisticated
motion estimation to predict where
objects in the frame are moving. The
result is incredible compression. A raw
100-gigabyte video can be shrunk to just
1 or 2 gigabytes without looking
noticeably different.

But here's the
thing. H.264 was designed in an era when
1080p was the highest mainstream
resolution. Today, we have 4K, 8K, 360°
video, HDR, and streaming to billions of
devices simultaneously. H.264 starts to
show its age at these higher
resolutions. To maintain good quality in
4K, H.264 needs a very high bitrate,
which means bigger files and more
bandwidth. That directly translates to
higher streaming costs, more storage,
and slower load times. The world needed
something better, something that could
handle 4K and beyond without doubling or
tripling the file size. That is exactly
why H.265 was created.

Enter H.265,
also known as HEVC,
which stands for High Efficiency Video
Coding. H.265 was finalized in 2013, and
it was specifically designed to be twice
as efficient as H.264.
That means H.265 can deliver the same
video quality as H.264,
but at half the file size. Or, if you
keep the file size the same, H.265
will give you noticeably better quality.
This is a massive deal for 4K streaming,
video surveillance, Ultra HD Blu-ray,
and broadcasting. It is the codec of
choice for Apple, Netflix 4K,
PlayStation 5, and many modern cameras.
It represents the next generation of
video compression.

So, what makes H.265
so much more efficient? The key
difference is the encoding block size.
While H.264 uses fixed 16x16
macroblocks, H.265
uses flexible coding tree units, or CTUs,
that can be up to 64x64 pixels. Why
does that matter? Because for large,
smooth areas like a clear sky or a plain
wall, a single 64x64 block can encode
the whole region in one shot. H.264
would need 16 separate macroblocks for
the same area. H.265 also uses more
sophisticated motion compensation,
better intra prediction, and improved
entropy coding. All of these add up to
dramatically better compression
efficiency.

Let us put the two codecs
head-to-head. H.264 has been around
since 2003, while H.265 arrived in 2013.
For compression efficiency, H.265 wins
by a large margin, roughly 40% to 50%
better compression at the same quality
level.
In terms of hardware support, H.264 is
the clear winner. It runs natively on
virtually every device ever made.
H.265 support is excellent on modern
devices, but older hardware may
struggle.
For encoding speed, H.264 is
significantly faster because its
algorithms are simpler. H.265 encoding
requires much more processing power.
And when it comes to licensing, H.264 is
relatively straightforward, while H.265
has complex and expensive patent
licensing, which slowed its adoption.

Let me make the compression difference
really tangible with real numbers.
Imagine you have a 1-hour video at 1080p
resolution. With H.264,
a typical high-quality encode would give
you roughly 4 to 6 GB.
The same video encoded with H.265
at the same visual quality, roughly 2 to
3 GB. That is literally half the
storage. Now, scale that up to 4K. A
1-hour 4K video in H.264
might need 40 to 60 GB. In H.265,
you can achieve the same quality for
roughly 20 to 30 GB. For streaming
platforms serving millions of users
simultaneously, this difference is worth
hundreds of millions of dollars in
bandwidth savings every year.

So, which
one should you actually use? The answer
depends on your use case. If you are
uploading content to YouTube, creating
videos for older devices, or need
maximum compatibility, stick with H.264.
It is the safe choice, and practically
everything can play it. If you're
working with 4K footage, distributing
large video libraries, streaming over
limited bandwidth, or targeting modern
Apple or Android devices, H.265 is the
better choice. It will save you
tremendous storage and bandwidth. Many
modern cameras like the iPhone, GoPro,
and DSLRs actually capture in H.265
natively, and streaming platforms like
Apple TV Plus and Netflix 4K rely
heavily on H.265.

Now, just when you
thought the codec wars were over, there
is a new player making massive waves.
Meet AV1, an open-source, royalty-free
codec developed by the Alliance for Open
Media, which includes Google, Netflix,
Amazon, Apple, and Microsoft. AV1 is
even more efficient than H.265,
offering another 30% to 50% compression
improvement. And because it is
completely free to use with no licensing
fees, it is rapidly gaining adoption.
YouTube already uses AV1 for many
videos. Netflix is rolling it out. The
PlayStation 5 and the latest smartphones
support it. AV1 is likely the future of
video compression, but it requires
enormous processing power to encode and
decode.

Let me show you how all of this
comes together in the real world. When
you upload a video to YouTube, their
servers do not just store one copy of
your video. They re-encode it into
multiple versions at different
resolutions and bitrates (360p, 480p,
720p, 1080p, 1440p, 4K), using multiple
codecs including H.264,
H.265,
and AV1. Your YouTube app then monitors
your internet connection speed in real
time. If your connection drops, it
automatically switches to a lower
resolution version, seamlessly, without
you even noticing. This is called
adaptive bitrate streaming, or ABR. It
is the technology that ensures you
always get the best possible quality for
your connection speed.

All right, let us
wrap everything up. Today you learned
that raw video is absolutely enormous,
potentially hundreds of gigabytes for
just a few minutes. Video encoding
compresses that data using spatial and
temporal redundancy, removing what the
eye cannot see or what did not change
between frames. We looked at H.264,
the most compatible and widely used
codec in history, and H.265, which
delivers the same quality at roughly
half the file size, making it the go-to
for 4K content. We also got a peek at
AV1, the royalty-free future of video.
And you now understand how streaming
platforms use adaptive bitrate streaming
to give you the smoothest possible
experience.

If you found this video
helpful, make sure to like, subscribe,
and hit that notification bell on Bit
Byte Talks, because there is a lot more
tech explained simply coming your way.
See you in the next one.
