---
title: 'MoviePy is an Awesome Python Library for Automatic Video Editing!'
source: 'https://youtube.com/watch?v=kIxYMJUrYW0'
video_id: 'kIxYMJUrYW0'
date: 2026-06-28
duration_sec: 0
---

# MoviePy is an Awesome Python Library for Automatic Video Editing!

> Source: [MoviePy is an Awesome Python Library for Automatic Video Editing!](https://youtube.com/watch?v=kIxYMJUrYW0)

## Summary

The video demonstrates migrating an automatic YouTube Short caption generator from OpenCV to MoviePy to improve text styling and caption synchronization. The developer shows how to use MoviePy's `TextClip` with custom fonts, colors, and strokes while leveraging Whisper for accurate transcription timestamps. The final output achieves better caption quality with fewer dependencies and more flexible styling options.

### Key Points

- **Project Overview and Recap** [00:05] — The project started as a YouTube Short generator that uses AI to create video clips from text input. It generates narration and captions, but original captions were poorly synced.
- **Introducing the Captacity Project** [02:20] — A new project called 'Captacity' improves sync by transcribing audio with Whisper, getting timestamps for every word, and drawing captions on the video.
- **Text Styling with MoviePy** [04:42] — MoviePy is used to create styled text clips with parameters like font, fontsize, color, stroke_color, stroke_width, and a custom blur for shadows using Pillow.
- **Refactoring to MoviePy** [05:30] — The developer refactors the main loop to use MoviePy's VideoFileClip, setting start and duration for each text clip, and discovers MoviePy supports newlines in text.
- **Successful Test Run** [08:20] — After migration, the video renders correctly on first try. Captions are synced and styled. Rendering is slower than OpenCV but quality is better.
- **Removing OpenCV Dependency** [10:46] — The developer removes OpenCV entirely, gets frame rate and width from MoviePy, and adds padding to avoid text clipping at edges.
- **Customisation and Next Steps** [12:06] — Fonts, colors, and stroke widths can be easily changed. Example shows red color with 'Poetsen One' font. Next steps include word highlighting and background effects.
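
As a rough illustration of the refactor outlined in the key points, the sketch below shows how timed captions might be turned into styled MoviePy text clips and layered over the source video. It assumes the MoviePy 1.x API (`moviepy.editor`, `set_start`, `set_duration`, `set_position`), which is what the video appears to use, plus a working ImageMagick install for `TextClip`. The function names `caption_timings` and `render_captions`, the caption dictionary layout, and the `"Arial"` default font are hypothetical and not taken from the project's code; the font size of 130 and stroke width of 3 are values mentioned in the video.

```python
from typing import Dict, List, Tuple


def caption_timings(captions: List[Dict]) -> List[Tuple[str, float, float]]:
    """Convert captions with start/end times into (text, start, duration)."""
    timings = []
    for caption in captions:
        duration = caption["end"] - caption["start"]
        if duration <= 0:
            continue  # skip captions that would never be visible
        timings.append((caption["text"], caption["start"], duration))
    return timings


def render_captions(video_path: str, captions: List[Dict], output_path: str,
                    font: str = "Arial", font_size: int = 130) -> None:
    """Overlay one styled TextClip per caption (assumes MoviePy 1.x)."""
    # Imported lazily so the timing helper above works without MoviePy installed.
    from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip

    video = VideoFileClip(video_path)
    clips = [video]  # the source video is the bottom layer
    for text, start, duration in caption_timings(captions):
        clip = TextClip(text, font=font, fontsize=font_size, color="white",
                        stroke_color="black", stroke_width=3)
        clip = clip.set_start(start).set_duration(duration)
        clip = clip.set_position(("center", "center"))
        clips.append(clip)

    # Later list entries are drawn on top of earlier ones, like layers.
    CompositeVideoClip(clips).write_videofile(output_path, fps=video.fps)
```

Because the text passed to `TextClip` may contain newline characters, a two-line caption can be a single clip rather than two separately positioned ones.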

## Transcript

hi there in today's video I'm once again
working on my Captacity project which is
an automatic YouTube short caption
generator now to recap this project
which has now been going on for several
videos it started out with this
Shortrocity project which is an AI generated
YouTube short generator so basically it
takes in a text input and then it
generates a YouTube short based on that
and I can actually show you a demo so we
can create a new file let's call it
prompt.txt and here we put the prompt
for the short this can be either just a
description of the YouTube short you
want to create or it can be just a copy
pasted news article or whatever text
really let's try something like HTML 5
features you've never heard of so we
will save this and then we can run our
main.py and we can give it this
prompt.txt and then it will just generate the
whole thing and if we then open this
thing in VLC for example then it will
look like this HTML 5 a revamp of the
classic HTML has brought many
advancements to web development but some
fascinating features might have slipped
under your radar features like the Microdata
API a method to integrate metadata
into your sites or content security
policy enhancing protection against
XSS attacks and Web SQL Database
although somewhat controversial can be a
useful tool for web-based applications
several hidden treasures lurk within
HTML 5 enhancing the prowess and
potential of web developers everywhere
now as you might have noticed we have a
couple of issues with this the first
issue is that the captions are not quite
synced to the video or to the narration
and that is because we use kind of a
dumb way of syncing the narration with
the captions so basically we generate
the narration one sentence at a time
and then we check the duration of that
narration audio and we calculate the
duration per character and then we show
every word in the video one by one based
on how many characters there are and it
is surprisingly accurate but still not
that accurate so hence I created this
Captacity project and this one takes an
existing video with a narration and then
it transcribes the audio with OpenAI
Whisper and gets the timestamps of every
word in the narration and that way we
can sync the narration pretty much
exactly correctly so if I go quickly
back to the Shortrocity project and I go
to my text module and I disable the
drawing of the captions and then I
generate this short again then now we're
going to have a short without the
captions so let's move this file into
the Captacity project
and then let's go into the Captacity
project and add the captions to this so
here we can run again main.py and we
give it the video file which in this
case is short.avi then it will
transcribe this and it will add the
captions to it and if we now play this
with_transcript.avi which is the output
of this script it will look like this
ever wonder what HTML 5 might be hiding
in its depths there's more to it than
just to facilitate webpage display did
you know it features geolocation
abilities allowing sites to know your
location or how about its video and
audio elements making for easy
multimedia integration perhaps the most
fascinating is the offline storage
feature enabling the browser to store
your data diving into HTML 5's talents
indeed gives a new perspective into the
world of web development so now the
captions are synced better and we can
actually see multiple words at a
time and it in fact checks how many
words fit into the video on two lines
and that's how many it's going to use
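
The fitting rule described here (wrap words into lines and accept a caption only if it needs at most two lines) could be sketched as below. This is not the project's actual code: the `measure` callback stands in for a real text-size measurement, and the helper name `take_fitting_words` is an assumption made for illustration.

```python
from typing import Callable, List


def wrap_words(words: List[str], max_width: int,
               measure: Callable[[str], int]) -> List[str]:
    """Greedily pack words into lines no wider than max_width."""
    lines: List[str] = []
    current = ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if current and measure(candidate) > max_width:
            lines.append(current)  # current line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fits_frame(words: List[str], max_width: int,
               measure: Callable[[str], int], max_lines: int = 2) -> bool:
    """A caption fits if it wraps into at most max_lines lines."""
    return len(wrap_words(words, max_width, measure)) <= max_lines


def take_fitting_words(words: List[str], max_width: int,
                       measure: Callable[[str], int],
                       max_lines: int = 2) -> List[str]:
    """Return the longest prefix of words that still fits the frame."""
    count = 0
    while count < len(words) and fits_frame(
            words[:count + 1], max_width, measure, max_lines):
        count += 1
    return words[:count]
```

With a made-up measurement such as `measure = lambda s: len(s) * 60` and a width of 1080 pixels, the phrase "ever wonder what HTML 5 might be hiding" wraps into three lines, so only its first seven words would be shown in one caption.
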
which brings us to the latest video and
the subject of today's video so the
captions you just saw were added to the
video with OpenCV and OpenCV is not
that great at styling text so we are in
the process of moving this to MoviePy
and in the previous video I already
started this moviepy_test.py script to
try and style the text in the videos and
this script takes in a video file let's
change this to the short.avi that we
have here and it adds some styled text
to it so we have this create_text
function into which we can pass in the
text and a font size and a color and a
font and a background color and a blur
radius which is used for generating
shadows and an opacity and a stroke
color and stroke width and kerning and
most of these things are just passed
into this TextClip class but because
blurring is not supported in MoviePy
we add this blur radius parameter
here and if we have a blur radius then
we take the MoviePy text clip and we
convert it into the Pillow format so
we're using the Pillow library which can
handle images and then we add just a
blur effect to it and then we convert it
back into a MoviePy clip and basically
that allows us to create shadows for the
text so here we are defining the text we
want to draw and the font we want to use
and then we create the text clip of the
actual text and then we create the
shadow clip and then we create this
CompositeVideoClip into which we add
the video the original video and then
the shadow clip and the shadow clip
again to make the shadow stronger and
actually three times and then we add the
text to it so these are basically layers
in the video and right now we are just
saving the first frame of this just to
test it out but we can also write the
full video so if I run this now python3
moviepy_test.py then this should create
one frame of our short with the text
subscribe in it so if I open output.png
then this is what it looks like so now
after 30 minutes of recap which
hopefully I can condense down to a
couple of minutes we can get to the
point of today's video which is to
actually integrate this styling of the
text into Captacity so let's do it and
I'm in fact thinking about making this
a third project just for writing
styled text on video with Python because
there doesn't seem to be a very good
solution for that at the moment
but for now I will simply move this
function into a separate module so let's
create a new module here and let's call
this I don't know text_drawer.py for
lack of a better name and we just want
to add this function here and this one
but we don't want to do that we only
want to have these functions in there
and we in fact don't need to import the
VideoFileClip we have to import this
in the main application and I guess we
don't need the CompositeVideoClip we
need that again in the main application
and this one is actually just a
hallucination by ChatGPT so we don't
need that either and this comment does
not belong here either and we don't have
to save this test.png in here by the
way if you know how to do this without
having to save the frame from
MoviePy and then opening the file with
Pillow then feel free to make a pull
request because I'm not really happy
about how I did it so basically this
converts from the MoviePy clip format
into the Pillow image format anyway
let's save this now and let's go to our
main function and let's find out where
we are actually drawing the text here we
have a write_line function which takes
the text and a frame and a text y a font
font scale white color black color
thickness and a border and then it
calculates the size of the text to make
it centered and then it just puts the
text into the frame and here we're kind
of doing a shadow again so we are
drawing it twice and this is more like a
stroke than a shadow because we draw it
first in black and the thickness is
going to be the regular thickness
plus border * 2 so in CV2 this results
in kind of a stroke and where do we use
this write_line function we use it here
so this is the loop where we read every
frame of the original video and then we
Loop through all of the captions and
then we get the line data from calculate_lines
which takes the text and the frame
width and from here we are going to get
the height and the lines so the height
is apparently the height of one line and
lines is a list of the texts in the
lines okay and then we write all the
lines on the frame so we of course have
to rewrite this calculate_lines function
because we are not using CV2 anymore
let's see what calculate_lines is it
takes the text and the frame width and
basically we calculate the size of the
text so the width and height and this is
done with CV2 so we have to figure out
how to do this in MoviePy so let's go
to our text drawer and let's create that
function let's define get_text_size and
let's see what Copilot says it wants the
text the font size the font and a stroke
width which I guess makes sense because
those are all the things that matter in
the calculation of the width and the
height of the text so let's do that and
we are going to create the text clip
into which we pass the text and the font
size and the color which color doesn't
really matter so here we use White and
the font and the stroke width okay and
then we convert that into the Pillow
image and then we get the size of the
image which actually makes sense
although it just seems like kind of a
wasteful operation again because we are
saving it into a file and we might be
doing this quite a few times in the code
but let's use this to begin with and
let's see what the size actually is so
if this returns an image what is the
definition of an image we have the image
class here and it has a size which I
guess is going to be width and height
okay and in our
main we also have width and height the
same way so that will work the same so
this has to be get_text_width which we
have to import from our text drawer so
from text_drawer import get_text_size
which actually yeah it's not text width
it is text size and we don't have to
take any zero element from there but we
need to change this so we have font font
scale and thickness but we want font
size font and stroke width okay so we
just move these around let's rename font
scale to font size and let's use that
120 was I using that one what
was in the example here 120 okay
let's use that and we pass in the line
the font size and the font and stroke
width so let's get rid of thickness I
just call it stroke width and what were
we using here two so let's use two
stroke width is going to be two and by
the way these are right now global
variables which is not that great but we
have to refactor that at some point so
that should in fact work we get the text
size and the rest of this should just
work the same because we just check the
size and then if the width is greater
than the frame width then we are going
to add to the lines to write the current
line to draw if we already have some
text that we have drawn already or that
we tried to draw but it didn't fit
anymore so then we put it there okay
that makes sense that should work and
then the write_line function now
currently we are using the frame which
is a CV2 frame but in MoviePy this
works a little bit differently
now we might at first just literally
create the image of the text and then
just put that in the frame with CV2
which will be very slow I'm guessing but
let's start with that so again we have
to get the text size which we will do by
get_text_size and we pass in the text
and the font size and then the font and
the stroke width okay and no zero and
then we do the same thing here so we
take frame.shape[1] which is the
width of the frame of the video frame
and then we subtract the width of the
text and then we make this origin point
which is text X text y but then we want
to actually get the image and presumably
there is some sort of put image is there
such a thing let's see how this works um
image we can't find image here so let's
actually ask ChatGPT how do I put an
image in the frame in a specific
position in CV2 we are going to say imread
so we read the frame we read the
image and we are going to say frame y
offset y offset plus image height okay
so I think what happens here is that
it is actually going to replace that
part of the frame with the image which
is not what we want because we want to
have it on top so we can't really do
that so it's probably easier to just use
MoviePy so let's abandon this idea of
using CV2 here and we have to then
refactor this part as well so let's open
the video file which we do like this so
this we have to do in main over here and
none of this stuff and where do we
actually Define the name of the file it
is video file let's still put it here
and let's come back to these later we
still like get the frame rate and the
frame width and height with CV2 but let
that be like that for now and we have to
import this VideoFileClip so let's do
that from moviepy.editor
import VideoFileClip and here we are
reading the frames with CV2 again and
for every frame we get the time which is
calculated by adding to it one over
frame rate and what do we do with the
time we just see if the captions start
and end is within that time and what is
actually in captions where do we get
this okay we get this from the segment
part
which takes the segments which are parts
of the narration transcribed by Whisper
and we pass this fit function which
basically checks how much text fits in
the frame so these captions are already
the captions we are going to put on the
video and since MoviePy works
differently we can probably implement
this in a completely different way so we
are not going to go frame by frame over
the video
we are just going to loop through all of
the captions and in MoviePy we had
this thing called set_duration so we set
the duration of a clip but how do we set
the start time let's ask ChatGPT how
does that work how do I set the start
time of a clip in MoviePy okay
set_start that makes sense so this is
actually going to be very easy we just
go through all the captions and we
create the text so text equals create_text
was that what I call it create_text
yes and we pass in all of this stuff
like this and this was called actually
font size like this and we don't have a
color let's call this font color and set
it up here right now font color let's
just do I don't know white just to test
it out I mean I guess we can use the
color that we tried in this file which
is that so font color will be this and
the font is not actually the CV2 font
let's again use this font and we create
the text which we have to import now so
create_text and this I believe returns
an ImageClip or a TextClip which is
clip let's see how this is defined it is
a VideoClip so let's actually define it
here that this is going to return a
VideoClip which we have to import from
here so this is the VideoClip and the
text we pass in here is actually caption
text font size is fine font color is
fine font is fine background color let's
say the transparent now blur radius zero
opacity one stroke color okay that's
fine at the moment did we actually set
the stroke width somewhere we did so
let's set that to the stroke width
stroke width and we have to set the
stroke color let's do stroke color and
actually background color is transparent
by default so we don't have to set that
and blur radius is zero by default we
don't have to set that opacity is one by
default we don't have to set that
either so we just pass in the stroke
color and the stroke width and kerning is
zero as well so let's just Define stroke
color and stroke color is black so this
is going to be our text clip and then we
can say text equals text set_duration
and set_start why doesn't it suggest
text equals text.set_start caption
start wait a minute did ChatGPT lie to
me let's actually ask about a text clip
how do I set the start time of a text
clip in MoviePy what time the text clip
should start in a composite video that
is what I mean to set the start time of
a text clip in MoviePy you can use a
set_start method on the text clip but it
doesn't seem like we have that thing or
let's go to the definition of this
set_duration do we have set_start we have
set_start so VS Code is just being
annoying so we can set the start and we
can set the duration I want to set the
start first okay what so now
set_duration is not defined set_start
returns Any that is the reason it
returns Any so VS Code doesn't know
anymore what happens fine so this
will now write the text in the correct
position in the video but it will write
the whole text on one single line so we
do need this calculate_lines function
but we don't need this part we just do
that so basically this has to be moved
inside of this for line in line data
lines we are going to do that and we
have to do text equals text.set_position
and we can center it
horizontally easily but then we need to
position it in the y direction because
we want multiple lines now I wonder does
MoviePy support new lines can we add a
new line here that would simplify this a
lot let's try if we now run moviepy_test
will it actually write two lines
that would be pretty amazing if we now
check our output then it in fact
does support new lines so this is going
to be super easy let's go back to the
text drawer sorry to the main and we are
just going to do this we calculate the
lines and we only draw the thing once
down here and I'm not sure what this
break does here we don't need this so we
draw the thing once but here we set like
lines to be an empty string and for
every line we just say that lines plus
equals line plus a new line which I
guess we can just say that lines equals
new line join line data lines I think
that's how it works and then we can just
write the lines directly here so that
simplifies it now I wonder is there even
an automatic line break can I set the
width of a text clip let's see the
definition here and is there like set
width or set bounding box
bounding there is no bounding box I
think we still have to like do our own
new lines so now we calculate the lines
we get the text Y which we actually
don't need so this can be now just
Center and we don't need this stuff and
we don't even need the line height we
just want the line
okay and we have to gather these text
clips into some list so let's put like a
Clips list in which we have the video
and then we're going to add the text
into the clips and then we need to do
this stuff we have to do CompositeVideoClip
which conveniently takes a list so
we can do this after that video with
text is CompositeVideoClip with all of
the clips and we have to import
CompositeVideoClip from here and what
else currently we are using FFmpeg to
combine the video and the audio and this
is because CV2 doesn't support audio but
I guess now we can just do this so we
call on the video with text write_videofile
output video and let's put 30 FPS I
think our original video is 30 FPS and
will this work uh let me check the fits_frame
function how does this work so
basically it uses calculate_lines
with the text and the frame width and it
just checks if there are fewer than or
equal to two lines then the text will
fit because that is our rule right now
we have to fit two lines of text maximum
okay that should work and do we even use
write_line we don't use write_line we
can remove write_line and calculate_lines
should now work because we get the
text size and then based on that we
calculate how many lines fit on the
screen and we don't need to do this
extract audio on video because MoviePy
supports video or audio automatically
and this is just a Whisper
transcription and we don't use the oh
actually we want to use the frame rate
which we probably don't want to use CV2
to get but let's just put it there FPS
is the frame rate and frame height we
don't use at all right now so let's
remove that and I'm almost ready to try
this out we go through all the captions
we calculate lines which is kind of
redundant because we do this already
when we calculate the captions we might
want to save that over there already but
basically this just splits the caption
into lines and then we draw the lines
okay I am ready to test this let's run
main.py Python 3
main.py and did we give it the file how
did this work sorry I'm an idiot of
course we need to extract the audio
because Whisper needs the audio so
let me go back a little bit and take
this extract audio and let's put that
back here and what this video file okay
this is argv[1] so let's do it main.py and
short.avi which is the one without text
and then let's fix all the errors that
happen here of which there were zero
zero errors and now we're writing the
video file so let's see what happens
okay it is now done that was way slower
than CV2 kind of surprisingly and the
actual slow part was the rendering of
the video so none of my code was
really slow but the rendering part now
my code might have something to do with
that but let's see what this video looks
like let's run with VLC output_video.mp4
ever wonder what HTML 5 might be
hiding in its depths there's more to it
than just to facilitate webpage display
did you know it features geolocation
abilities allowing sites to know your
location or how about its video and
audio elements making for easy
multimedia integration perhaps the most
fascinating is the offline storage
feature enabling the browser to store
your data diving into HTML 5's talents
indeed gives a new perspective into the
world of web development okay it
actually worked first try I can't
believe it something weird happened in
the end I'm not sure if that was a bug
in VLC or what happened with the audio
but yeah it actually worked interesting
now let's see if we can modify it what
if we want to have just one line of text
is it still going to work if I change
here just one and then we run it again
so now we should have only one line
visible at a time in the video but it
should still show all of the
transcription and let's take a look at
this ever wonder what HTML 5 might be
hiding in its depths there's more to it
than just to facilitate webpage display
did you know it features geolocation
abilities allowing sites to know your
location or how about its video and
audio elements making for easy
multimedia integration perhaps the most
fascinating is the offline storage
feature enabling the browser to store
your data diving into HTML 5's talents
indeed gives a new perspective into the
world of web development yeah it works
just as expected now one thing I would
like to do is have some sort of padding
there because sometimes the text goes a
bit too close to the borders so let's
add a padding and let's call it I don't
know what we should call it is it in
pixels I guess and the video is pretty
big so let's do 50 pixels of padding and
then our fits_frame function will
calculate the lines with frame width
minus padding does that make sense maybe
we want to just pass it in here so when
we call fits_frame we pass here frame
width
minus padding like this but we also have
to pass it in here so maybe we want to
say something like text bounding box
width uh that's so long so this would be
frame width minus padding let's just
call it bbox text_bbox_width that's not
too long so then we can pass those in
here and let's actually check out our
competitor
Submagic.co what kind of size do they use
for these
videos so well they have different kind
of sizes but how about in the MrBeast
video you can basically fit algorithm
rewards so how many characters is that
algorithm rewards and in
ours we can fit ever wonder what HTML 5
ever wonder what HTML 5 and we should
have some padding so maybe two
characters from each side we should drop
so drop these two and these two so it's
basically the same size we can make it a
little bit bigger so we can say over
here that our font size is 130 and I
think we can add some stroke let's put
like four to the stroke and what is in
fact margin we don't use it so this was
used with the CV2 for the line like
margin between the lines but now it's
just the new line so I guess that is
something that we can't now change
unless we can change it directly in
movie pie and Border five what is that
we don't use it and of course white
color and black color we don't use
anymore because now they are in this
different format okay and I do want to
get rid of CV2 now I don't want to
import CV2 unnecessarily and we are not
using math so where do we use CV2 we get
the frame rate from it and the frame
width so how do we do that in MoviePy
how do I get the frame rate and width of
a video with MoviePy we can use
video.fps and video.w okay so this is
going to be video.w and frame rate we
are using just here it is video.fps and
we don't have to destroy CV2 windows and
we don't have to get the FPS here and we
don't have to get the cap and we have to
do this after the video which means we
have to do this after that so this
is opening the video and then we
calculate this bounding box width and then
we set the clips to the video and then
we get the captions so now we are not
using CV2 anymore we are just using
MoviePy which means our requirements
shall not have CV2 anymore only MoviePy
and Pillow and Whisper great so
let's run it one more time are we
getting rid of this temp audio file
right now um yeah so it is actually a
temp file okay how about temp video file
are we even using it we are not using it
and we actually Define output file but
we don't use it so let's say that this
is the output file and do we need to do
anything else that's basically it and it
works now we did have this position here
um caption_y_pos so in a previous
video I added this kind of like
multiplier for the position which is
kind of a stupid idea anyway so we might
just say position and we can put here
Center Center and then we can pass it
directly here like this so then you can
change the position if you want okay so
now we can change all the settings here
which we might want to move in some sort
of configuration file or something and
actually pass them in as parameters to
some I don't know a class or some
function at least but let's try it again
and see if it works nicely ever wonder
what HTML 5 might be hiding in its
depths there's more to it than just to
facilitate webpage display did you know
sorry this is the wrong one because now
I changed it to a different file so it is
now with_transcript.avi ever wonder
what HTML 5 might be hiding in its
depths there's more to it than just
to facilitate webpage display did you
know it features geolocation abilities
allowing sites to know your location or
how about its video and audio elements
making for easy multimedia integration
perhaps the most fascinating is the
offline storage feature enabling the
browser to store your data diving into
HTML 5's talents indeed gives a new
Perspective into the world of web
development okay it seems to work pretty
well but there's again in the end I
heard some weirdness in there and I
think we can add a little bit more
padding because at some point it went a
bit too far in my opinion let me try to
see the place like here it is a little
bit too close to the edges so maybe even
100 pixels of padding and well yeah I
guess we should do times two because I
want 50 pixels on each side and I really
find this very annoying that MoviePy
doesn't really support a proper stroke
because the stroke is in the middle so
it will make the text thinner which I
don't like but I don't have a solution
for that right now as I tried in the
previous video it doesn't help if I draw
the text twice because the thickness
doesn't work in the same way as in CV2
in MoviePy so let's do that as the
last thing I will put a stroke width of
three and I will make the padding times
two so then it should look pretty nice
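
The padding arithmetic mentioned here is small enough to show directly. In this sketch the function name `text_bbox_width` and the 1080-pixel frame width are assumptions for illustration; the 50 pixels on each side comes from the video.

```python
def text_bbox_width(frame_width: int, padding: int) -> int:
    """Width available for text when padding is kept on both left and right."""
    width = frame_width - 2 * padding
    if width <= 0:
        raise ValueError("padding leaves no room for text")
    return width


print(text_bbox_width(1080, 50))  # 980
```

Subtracting the padding only once would leave 50 pixels in total rather than 50 on each side, which is why the value is doubled.
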
and in the next video I am going to
implement the highlighting of the
current word so that the current word
will show in a different color and maybe
even a different size maybe it will
become bigger for a while when it's like
highlighted and I probably want to like
zoom in and out the background image or
pan it somehow or do something like that
but that will be in the next video so
make sure to subscribe if you want to
see that one but let's now try this one
more time and let's do a little bit of
modifications to the style so what if I
want let's say a red color and then
let's try one other font that I have we
have Poetsen One which I can't view here
but let's see how that looks like
Poetsen One regular and if we run
this thing with those settings let's see
what it will look like ever wonder what
HTML 5 might be hiding in its depths
there's more to it than just to
facilitate webpage display did you know it
features geolocation abilities allowing
sites to know your location or how about
its video and audio elements making for
easy multimedia integration perhaps the
most fascinating is the offline storage
feature enabling the browser to store
your data diving into HTML 5's talents
indeed gives a new perspective into
the world of web development okay that
wasn't the greatest font or the greatest
color but at least we can modify the
colors and the fonts so hopefully you
like this video and if you did then make
sure to subscribe and hit the like
button and also give me a comment down
below and let me know what video should
I do next and if you want to try out
Captacity then you can find it on my
GitHub page and by the way it is not
OpenCV anymore now it is MoviePy
anyway thanks for watching and I will
see you in the next one