MoviePy is an Awesome Python Library for Automatic Video Editing!

Transcribed Jun 28, 2026

Intermediate · 14 min read · For: Python developers with some experience in video processing or automation who are interested in using MoviePy to build video editing tools.

18.9K views · 331 likes · 24 comments · 3 dislikes · 1.9% (📊 Average)

AI Summary

The video demonstrates migrating an automatic YouTube Short caption generator from OpenCV to MoviePy to improve text styling and caption synchronization. The developer shows how to use MoviePy's `TextClip` with custom fonts, colors, and strokes while leveraging Whisper for accurate transcription timestamps. The final output achieves better caption quality with fewer dependencies and more flexible styling options.

Chapters

1 Introduction to the Project and Problem Recap 00:00

2 Creating the Text Drawing Module 01:41

3 Refactoring the Main Loop for MoviePy 03:27

4 Testing the Migration and Fixes 05:53

5 Removing OpenCV and Final Testing 08:04

6 Customising Styles and Wrap-Up 10:17

[00:05]

Project Overview and Recap

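The project started as a YouTube Short generator that uses AI to create videos from a text prompt. It generates narration and captions, but the original captions were poorly synced because each word's display time was estimated from its character count rather than measured from the audio.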
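The character-count timing described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `estimate_word_timings` is hypothetical, and it assumes spaces are ignored when the narration duration is divided across characters.

```python
def estimate_word_timings(sentence, audio_duration, start=0.0):
    """Spread audio_duration over the words in proportion to their length.

    Hypothetical helper: returns a list of (word, start, end) tuples.
    Assumption: spaces are not counted as characters.
    """
    words = sentence.split()
    total_chars = sum(len(word) for word in words)
    if total_chars == 0:
        return []

    seconds_per_char = audio_duration / total_chars
    timings = []
    current = start
    for word in words:
        duration = len(word) * seconds_per_char
        timings.append((word, current, current + duration))
        current += duration
    return timings


timings = estimate_word_timings("hi there", 3.5)
```

Because speech does not spend equal time on every character, the estimate drifts from the real narration, which is the sync problem the transcript describes.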

[02:20]

Introducing the Captacity Project

A new project called 'Captacity' improves sync by transcribing the audio with Whisper, getting a timestamp for every word, and drawing captions on the video.
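One way to turn per-word timestamps into on-screen captions is sketched below. It assumes the result layout that openai-whisper produces when `transcribe` is called with `word_timestamps=True` (segments containing a `words` list with `word`, `start` and `end` keys, where each word usually carries a leading space). The helper names and the `fits` callback are hypothetical stand-ins for the project's own fit check.

```python
def words_from_result(result):
    """Flatten the word entries of every segment into one list."""
    return [word for segment in result["segments"] for word in segment.get("words", [])]


def make_caption(words):
    """Build one caption dict from consecutive word entries."""
    return {
        "text": "".join(w["word"] for w in words).strip(),
        "start": words[0]["start"],
        "end": words[-1]["end"],
    }


def group_words(words, fits):
    """Greedily pack words into captions for as long as fits(text) is true."""
    captions = []
    current = []
    for word in words:
        candidate = current + [word]
        text = "".join(w["word"] for w in candidate).strip()
        if current and not fits(text):
            captions.append(make_caption(current))
            current = [word]
        else:
            current = candidate
    if current:
        captions.append(make_caption(current))
    return captions


# Hand-written sample in the assumed Whisper layout (not real output).
sample = {"segments": [{"words": [
    {"word": " Ever", "start": 0.0, "end": 0.3},
    {"word": " wonder", "start": 0.3, "end": 0.7},
    {"word": " what", "start": 0.7, "end": 0.9},
    {"word": " HTML5", "start": 0.9, "end": 1.5},
]}]}

# Stand-in fit rule: at most 12 characters per caption.
captions = group_words(words_from_result(sample), fits=lambda text: len(text) <= 12)
```

In the project the fit rule is based on how many lines the rendered text needs, not on a character limit; the lambda above only keeps the sketch self-contained.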

[04:42]

Text Styling with MoviePy

MoviePy is used to create styled text clips with parameters like font, fontsize, color, stroke_color, stroke_width, and a custom blur for shadows using Pillow.

[05:30]

Refactoring to MoviePy

The developer refactors the main loop to use MoviePy's VideoFileClip, setting start and duration for each text clip, and discovers MoviePy supports newlines in text.
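The clip-list approach might look roughly like the sketch below. It assumes the MoviePy 1.x API that the video uses (`moviepy.editor`, `set_start`, `set_duration`, `set_position`); later MoviePy releases renamed these methods. The function names, the caption dict layout and the default font are assumptions for illustration, and `TextClip` in MoviePy 1.x additionally needs ImageMagick installed.

```python
def caption_timing(caption):
    """Return (start, duration) in seconds for a caption dict."""
    return caption["start"], caption["end"] - caption["start"]


def render_captions(video_path, captions, output_path, font="Courier", font_size=120):
    """Overlay one TextClip per caption on the video (hypothetical helper)."""
    # Imported lazily so caption_timing can be used without MoviePy installed.
    from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip

    video = VideoFileClip(video_path)
    clips = [video]  # the original video is the bottom layer
    for caption in captions:
        start, duration = caption_timing(caption)
        text = TextClip(
            caption["text"],  # may contain "\n" for multi-line captions
            fontsize=font_size,
            color="white",
            font=font,
            stroke_color="black",
            stroke_width=2,
        )
        text = text.set_start(start).set_duration(duration)
        text = text.set_position(("center", "center"))
        clips.append(text)

    # The composite keeps the audio of the source video, so no separate
    # audio-merging step is needed.
    CompositeVideoClip(clips).write_videofile(output_path, fps=video.fps)


start, duration = caption_timing({"text": "Ever wonder", "start": 1.5, "end": 2.0})
```

Each caption becomes its own layer with a start time and duration, so there is no per-frame loop to maintain.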

[08:20]

Successful Test Run

After migration, the video renders correctly on first try. Captions are synced and styled. Rendering is slower than OpenCV but quality is better.

[10:46]

Removing OpenCV Dependency

The developer removes OpenCV entirely, gets frame rate and width from MoviePy, and adds padding to avoid text clipping at edges.
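The padding and line-fitting logic can be sketched as below. The function names follow the ones mentioned in the transcript (`calculate_lines`, `fits_frame`), but the bodies are an assumption about how they might work. `fake_measure` is a hypothetical stand-in for the real `get_text_size`, which renders the text to obtain its size.

```python
def fake_measure(text, char_width=10, line_height=20):
    """Stand-in for get_text_size: pretend every character is 10 px wide."""
    return len(text) * char_width, line_height


def calculate_lines(text, max_width, measure):
    """Greedily wrap text so each line's measured width stays within max_width."""
    lines = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        width, _height = measure(candidate)
        if width > max_width and current:
            lines.append(current)  # the current line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fits_frame(text, frame_width, measure, padding=50, max_lines=2):
    """True if text wraps into at most max_lines inside the padded width."""
    text_bbox_width = frame_width - padding
    return len(calculate_lines(text, text_bbox_width, measure)) <= max_lines


lines = calculate_lines("ever wonder what HTML5", 120, fake_measure)
caption_text = "\n".join(lines)  # a single text clip can draw both lines
```

In the real project `frame_width` would come from the loaded video (for example `video.w`), and the 50-pixel padding is the value tried in the video.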

[12:06]

Customisation and Next Steps

Fonts, colors, and stroke widths can be easily changed. Example shows red color with 'Poetsen One' font. Next steps include word highlighting and background effects.

Clickbait Check

85% Legit

"The title promises a demonstration of MoviePy for automatic video editing, and the video delivers exactly that by migrating a caption generator from OpenCV to MoviePy."

Mentioned in this Video

MoviePy

tool

OpenAI Whisper

tool

Pillow (PIL)

tool

GitHub Captacity project

link

CMU Sphinx (mentioned indirectly)

tool

Tutorial Checklist

1 00:32 Create a prompt text file for the YouTube Short content.

2 00:56 Run main.py with the prompt text file to generate the initial short video (using the old generator).

3 02:43 Disable caption drawing in the old generator to get a video without captions.

4 02:58 Move the captionless video file into the Captacity project folder.

5 03:06 Run main.py in the Captacity project, passing the video file path as an argument. This transcribes the audio with Whisper and adds captions.

6 07:07 Create a text drawing module (text_drawer.py) containing the create_text function and get_text_size function from the MoviePy test script.

7 08:17 In main.py, replace OpenCV's text drawing functions (write_line, calculate_lines) with MoviePy equivalents using TextClip and CompositeVideoClip.

8 09:49 Modify calculate_lines to use get_text_size from text_drawer.py instead of OpenCV's text size calculation.

9 10:25 Replace the frame-by-frame rendering loop with a clip list approach: create TextClips for each caption, set duration and start, then composite everything.

10 11:43 Remove OpenCV dependency by getting frame rate and width from MoviePy's VideoFileClip (video.fps, video.w).

11 13:20 Add padding to avoid text clipping at video borders by subtracting from frame width in fits_frame.

12 08:20 Test the video output and verify caption sync and styling.

💡 Key Takeaways

🔧

Syncing captions with Whisper timestamps

Shows a practical method for near-exact caption synchronization: transcribe the audio with Whisper and use the per-word timestamps.

01:57

💡

MoviePy's native newline support

Saves significant code complexity compared to manually splitting and positioning lines in OpenCV.

05:00

🔧

Using compositing for text layers

Demonstrates how to layer multiple text clips (e.g., for shadows) using MoviePy's `CompositeVideoClip`, which is more intuitive than OpenCV's manual pixel manipulation.

05:30

⚖️

Eliminating OpenCV dependency

Reduces project complexity by relying solely on MoviePy, Pillow, and Whisper.

10:46

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

AI generates YouTube Shorts from text!

41s

Shows a fully automated AI project that creates entire videos, surprising and inspiring viewers.

Fix caption sync with Whisper timestamps

52s

Solves a common and frustrating video editing problem with a clever technical solution.

Add shadows to text in MoviePy (hack)

58s

Teaches a workaround for MoviePy's limitations, showing resourcefulness in coding.

MoviePy newline support simplifies code!

60s

Reveals a surprising and time-saving feature, making video text layout much easier.

Full Transcript

[00:00] hi there in today's video I'm once again

[00:02] working on my Captacity project which is

[00:05] an automatic YouTube short caption

[00:08] generator now to recap this project

[00:11] which has now been going on for several

[00:13] videos it started out with this

[00:16] Shortrocity project which is an AI generated

[00:19] YouTube short generator so basically it

[00:22] takes in a text input and then it

[00:24] generates a YouTube short based on that

[00:28] and I can actually show you a demo so we

[00:30] can create a new file let's call it

[00:33] prompt.txt and here we put the prompt

[00:36] for the short this can be either just a

[00:38] description of the YouTube short you

[00:40] want to create or it can be just a copy

[00:42] pasted news article or whatever text

[00:45] really let's try something like HTML 5

[00:49] features you've never heard of so we

[00:53] will save this and then we can run our

[00:56] main.py and we can give it this

[00:59] prompt.txt and then it will just generate the

[01:01] whole thing and if we then open this

[01:04] thing in VLC for example then it will

[01:08] look like this HTML 5 a revamp of the

[01:12] classic HTML has brought many

[01:14] advancements to web development but some

[01:16] fascinating features might have slipped

[01:18] under your radar features like the

[01:20] Microdata API a method to integrate metadata

[01:23] into your sites or Content Security

[01:26] Policy enhancing protection against

[01:28] XSS attacks and Web SQL Database

[01:30] although somewhat controversial can be a

[01:32] useful tool for web-based applications

[01:35] several Hidden Treasures lurk within

[01:36] HTML 5 enhancing the prowess and

[01:39] potential of web developers everywhere

[01:41] now as you might have noticed we have a

[01:43] couple of issues with this the first

[01:45] issue is that the captions are not quite

[01:48] synced to the video or to the narration

[01:51] and that is because we use kind of a

[01:52] dumb way of syncing the narration with

[01:55] the captions so basically we generate

[01:57] the narration one sentence at a time

[02:00] and then we check the duration of that

[02:03] narration audio and we calculate the

[02:05] duration per character and then we show

[02:08] every word in the video one by one based

[02:11] on how many characters there are and it

[02:14] is surprisingly accurate but still not

[02:17] that accurate so hence I created this

[02:20] Captacity project and this one takes an

[02:24] existing video with a narration and then

[02:27] it transcribes the audio with OpenAI

[02:30] Whisper and gets the timestamps of every

[02:32] word in the narration and that way we

[02:35] can sync the narration pretty much

[02:37] exactly correctly so if I go quickly

[02:40] back to the Shortrocity project and I go

[02:43] to my text module and I disable the

[02:47] drawing of the captions and then I

[02:49] generate this short again then now we're

[02:52] going to have a short without the

[02:54] captions so let's move this file into

[02:58] the Captacity project

[03:01] and then let's go into the Captacity

[03:03] project and add the captions to this so

[03:06] here we can run again main.py and we

[03:09] give it the video file which in this

[03:11] case is short.avi then it will

[03:14] transcribe this and it will add the

[03:16] captions to it and if we now play this

[03:20] with transcript.avi which is the output

[03:23] of this script it will look like this

[03:27] ever wonder what HTML 5 might be hiding

[03:29] in its depths there's more to it than

[03:31] just to facilitate webpage display did

[03:33] you know it features geolocation

[03:35] abilities allowing sites to know your

[03:37] location or how about its video and

[03:40] audio elements making for easy

[03:42] multimedia integration perhaps the most

[03:44] fascinating is the offline storage

[03:46] feature enabling the browser to store

[03:48] your data diving into HTML5's talents

[03:51] indeed gives a new perspective into the

[03:53] world of web development so now the

[03:55] captions are synced better and we can

[03:58] actually see multiple words at a

[04:00] time and it in fact checks how many

[04:03] words fit into the video on two lines

[04:06] and that's how many it's going to use

[04:08] which brings us to the latest video and

[04:12] the subject of today's video so the

[04:15] captions you just saw were added to the

[04:18] video with OpenCV and OpenCV is not

[04:22] that great at styling text so we are in

[04:25] the process of moving this to MoviePy

[04:28] and in the previous video I started

[04:31] already this MoviePy test.py script to

[04:34] try and style the text in the videos and

[04:38] this script takes in a video file let's

[04:40] change this to the short.avi that we

[04:43] have here and it adds some styled text

[04:46] to it so we have this create_text

[04:49] function into which we can pass in the

[04:51] text and a font size and a color and a

[04:54] font and a background color and a blur

[04:57] radius which is used for generating

[05:00] shadows and an opacity and a stroke

[05:02] color and stroke width and kerning and

[05:05] most of these things are just passed

[05:08] into this TextClip class but because

[05:11] blurring is not supported in MoviePy

[05:13] then we add this blur radius parameter

[05:17] here and if we have a blur radius then

[05:19] we take the MoviePy text clip and we

[05:23] convert it into the Pillow format so

[05:25] we're using the Pillow library which can

[05:28] handle images and then we add just a

[05:30] blur effect to it and then we convert it

[05:33] back into a MoviePy clip and basically

[05:36] that allows us to create shadows for the

[05:38] text so here we are defining the text we

[05:41] want to draw and the font we want to use

[05:44] and then we create the text clip of the

[05:46] actual text and then we create the

[05:49] shadow clip and then we create this

[05:51] CompositeVideoClip into which we add

[05:53] the video the original video and then

[05:56] the shadow clip and the shadow clip

[05:58] again to make the shadow stronger and

[06:01] actually three times and then we add the

[06:04] text to it so these are basically layers

[06:07] in the video and right now we are just

[06:10] saving the first frame of this just to

[06:12] test it out but we can also write the

[06:14] full video so if I run this now Python 3

[06:19] movie test.py then this should create

[06:22] one frame of our short with the text

[06:26] subscribe in it so if I open output.png

[06:29] then this is what it looks like so now

[06:33] after 30 minutes of recap which

[06:35] hopefully I can condense down to a

[06:37] couple of minutes we can get to the

[06:39] point of today's video which is to

[06:42] actually integrate this styling of the

[06:45] text into Captacity so let's do it and

[06:49] I'm in fact thinking about making this

[06:51] another a third project just for writing

[06:54] styled text on video with python because

[06:57] there doesn't seem to be a very good

[06:59] solution for that at the moment

[07:01] but for now I will simply move this

[07:04] function into a separate module so let's

[07:07] create a new module here and let's call

[07:10] this I don't know text_drawer.py in

[07:14] lack of a better name and we just want

[07:17] to add this function here and this one

[07:21] but we don't want to do that we only

[07:24] want to have these functions in there

[07:26] and we in fact don't need to import the

[07:29] VideoFileClip we have to import this

[07:32] in the main application and I guess we

[07:35] don't need the CompositeVideoClip we

[07:37] need that again in the main application

[07:40] and this one is actually just a

[07:42] hallucination by ChatGPT so we don't

[07:45] need that either and this comment does

[07:49] not belong here either and we don't have

[07:51] to save this test.png in here by the

[07:55] way if you know how to do this without

[07:57] having to save the frame from

[08:00] MoviePy and then opening the file with

[08:02] Pillow then feel free to make a pull

[08:05] request because I'm not really happy

[08:06] about how I did it so basically this

[08:09] converts from the MoviePy clip format

[08:11] into the Pillow image format anyway

[08:14] let's save this now and let's go to our

[08:17] main function and let's find out where

[08:20] we are actually drawing the text here we

[08:24] have a write_line function which takes

[08:28] the text and a frame and a text y a font

[08:32] font scale white color black color

[08:35] thickness and a border and then it

[08:38] calculates the size of the text to make

[08:41] it centered and then it just puts the

[08:44] text into the frame and here we're kind

[08:47] of doing a shadow again so we are

[08:49] drawing it twice and this is more like a

[08:51] stroke than a shadow because we draw it

[08:55] first in black and the thickness is

[08:58] going to be the regular thick

[09:00] plus border * 2 so in CV2 this results

[09:03] in kind of a stroke and where do we use

[09:06] this write_line function we use it here

[09:10] so this is the loop where we read every

[09:14] frame of the original video and then we

[09:17] Loop through all of the captions and

[09:19] then we get the line data from

[09:22] calculate_lines which takes the text and the frame

[09:26] width and from here we are going to get

[09:29] the height and the lines so the height

[09:33] is apparently the height of one line and

[09:36] lines is a list of the texts in the

[09:39] lines okay and then we write all the

[09:42] lines on the frame so we of course have

[09:46] to rewrite this calculate_lines function

[09:49] because we are not using CV2 anymore

[09:52] let's see what is calculate_lines it

[09:55] takes the text and the frame width and

[09:57] basically we calculate the size of the

[10:00] text so the width and height and this is

[10:03] done with CV2 so we have to figure out

[10:06] how to do this in MoviePy so let's go

[10:09] to our text drawer and let's create that

[10:12] function let's define get_text_size and

[10:18] let's see what Copilot says it wants the

[10:21] text the font size the font and a stroke

[10:24] width which I guess makes sense because

[10:27] those are all the things that matter in

[10:29] the calculation of the width and the

[10:32] height of the text so let's do that and

[10:35] we are going to create the text clip

[10:37] into which we pass the text and the font

[10:40] size and the color which color doesn't

[10:42] really matter so here we use White and

[10:45] the font and the stroke width okay and

[10:48] then we convert that into the Pillow

[10:50] image and then we get the size of the

[10:53] image which actually makes sense

[10:55] although it just seems like kind of a

[10:58] wasteful operation again because we are

[11:00] saving it into a file and we might be

[11:04] doing this quite a few times in the code

[11:07] but let's use this to begin with and

[11:09] let's see what the size actually is so

[11:12] if this returns an image what is the

[11:14] definition of an image we have the image

[11:17] class here and it has a size which I

[11:21] guess is going to be width and height

[11:24] okay and in our

[11:27] main we also have width and height the

[11:30] same way so that will work the same so

[11:33] this has to be get text width which we

[11:37] have to import from our text drawer so

[11:40] from text_drawer import get_text_size

[11:45] which actually yeah it's not text width

[11:48] it is text size and we don't have to

[11:51] take any zero element from there but we

[11:55] need to change this so we have font font

[11:57] scale and thickness but we want font

[12:01] size font and stroke width okay so we

[12:04] just move these around let's rename font

[12:09] scale to font size and let's use that

[12:14] 120 was I using that one what was the

[12:18] what was in the example here 120 okay

[12:21] let's use that and we pass in the line

[12:25] the font size and the font and stroke

[12:28] width so let's get rid of thickness I

[12:32] just call it stroke width and what were

[12:36] we using here two so let's use two

[12:40] stroke width is going to be two and by

[12:43] the way these are right now global

[12:45] variables which is not that great but we

[12:48] have to refactor that at some point so

[12:50] that should in fact work we get the text

[12:52] size and the rest of this should just

[12:55] work the same because we just check the

[12:56] size and then if the width is greater

[12:59] than the frame width then we are going

[13:02] to add to the lines to write the current

[13:06] line to draw if we already have some

[13:09] text that we have drawn already or that

[13:12] we tried to draw but it didn't fit

[13:14] anymore so then we put it there okay

[13:16] that makes sense that should work and

[13:18] then the write_line function now

[13:21] currently we are using the frame which

[13:24] is a CV2 frame but in MoviePy this

[13:27] works a little bit differently

[13:29] now we might at first just literally

[13:33] create the image of the text and then

[13:35] just put that in the frame with CV2

[13:39] which will be very slow I'm guessing but

[13:42] let's start with that so again we have

[13:44] to get the text size which we will do by

[13:47] get_text_size and we pass in the text

[13:51] and the font size and then the font and

[13:55] the stroke width okay and no zero and

[13:58] then we do the same thing here so we

[14:00] take the frame.shape[1] which is the

[14:03] width of the frame of the video frame

[14:06] and then we subtract the width of the

[14:10] text and then we make this origin point

[14:14] which is text X text y but then we want

[14:18] to actually get the image and presumably

[14:22] there is some sort of put image is there

[14:25] such a thing let's see how this works um

[14:29] image we can't find image here so let's

[14:32] actually ask ChatGPT how do I put an

[14:37] image in the frame in a specific

[14:41] position in CV2 we are going to say

[14:47] imread so we read the frame we read the

[14:51] image and we are going to say frame y

[14:55] offset y offset plus image height okay

[14:58] so I think what happens here is that

[15:03] it is actually going to replace that

[15:05] part of the frame with the image which

[15:07] is not what we want because we want to

[15:10] have it on top so we can't really do

[15:13] that so it's probably easier to just use

[15:16] MoviePy so let's abandon this idea of

[15:18] using CV2 here and we have to then

[15:23] refactor this part as well so let's open

[15:25] the video file which we do like this so

[15:30] this we have to do in main over here and

[15:33] none of this stuff and where do we

[15:35] actually Define the name of the file it

[15:39] is video file let's still put it here

[15:43] and let's come back to these later we

[15:45] still like get the frame rate and the

[15:47] frame width and height with CV2 but let

[15:50] that be like that for now and we have to

[15:52] import this VideoFileClip so let's do

[15:55] that from moviepy.editor

[15:59] import VideoFileClip and here we are

[16:02] reading the frames with CV2 again and

[16:06] for every frame we get the time which is

[16:11] calculated by adding to it one over

[16:14] frame rate and what do we do with the

[16:16] time we just see if the captions start

[16:21] and end is within that time and what is

[16:24] actually in captions where do we get

[16:26] this okay we get this from the segment

[16:28] part

[16:29] which takes the segments which are parts

[16:32] of the narration transcribed by Whisper

[16:35] and we pass this fit function which

[16:38] basically checks how much text fits in

[16:41] the frame so these captions are already

[16:45] the captions we are going to put on the

[16:47] video and since MoviePy works

[16:50] differently we can probably implement

[16:52] this in a completely different way so we

[16:54] are not going to go frame by frame over

[16:58] the video

[16:59] we are just going to Loop through all of

[17:02] the captions and in MoviePy we had

[17:06] this thing called set_duration so we set

[17:11] the duration of a clip but how do we set

[17:14] the start time let's ask ChatGPT how

[17:17] does that work how do I set the start

[17:21] time of a clip in MoviePy okay

[17:26] set_start that makes sense so this is

[17:29] actually going to be very easy we just

[17:32] go through all the captions and we

[17:35] create the text so text equals

[17:39] create_text was that what I call it create_text

[17:43] yes and we pass in all of this stuff

[17:46] like this and this was called actually

[17:48] font size like this and we don't have a

[17:51] color let's call this font color and set

[17:55] it up here right now font color let's

[17:59] just do I don't know white just to test

[18:02] it out I mean I guess we can use the

[18:04] color that we tried in this file which

[18:08] is that so font color will be this and

[18:12] the font is not actually the CV2 font

[18:15] let's again use this font and we create

[18:20] the text which we have to import now so

[18:23] create text and this I believe returns

[18:29] an image clip or a text clip which is

[18:33] clip let's see how this is defined it is

[18:36] a video clip so let's actually Define it

[18:39] here that this is going to return a

[18:41] video clip which we have to import from

[18:44] here so this is the video clip and the

[18:49] text we pass in here is actually caption

[18:52] text font size is fine font color is

[18:54] fine font is fine background color let's

[18:58] say the transparent now blur radius zero

[19:01] opacity one stroke color okay that's

[19:03] fine at the moment did we actually set

[19:06] the stroke width somewhere we did so

[19:08] let's set that to the stroke width

[19:11] stroke width and we have to set the

[19:14] stroke color let's do stroke color and

[19:18] actually background color is transparent

[19:20] by default so we don't have to set that

[19:23] and blur radius is zero by default we

[19:25] don't have to set that opacity is one by

[19:27] default we don't have to set that that

[19:29] either so we just pass in the stroke

[19:30] color and the stroke width and kerning is

[19:33] zero as well so let's just Define stroke

[19:37] color and stroke color is black so this

[19:40] is going to be our text clip and then we

[19:44] can say text equals text.set_duration

[19:48] and set_start why doesn't it suggest

[19:52] text equals text.set_start caption

[19:56] start wait a minute did ChatGPT lie to

[20:00] me let's actually ask about a text clip

[20:04] how do I set the start time of a text

[20:07] clip in MoviePy what time the text clip

[20:12] should start in a composite video that

[20:18] is what I mean to set the start time of

[20:20] a text clip in MoviePy you can use a

[20:22] set_start method on the text clip but it

[20:25] doesn't seem like we have that thing or

[20:29] hm let's go to the definition of this

[20:33] set_duration do we have set_start we have

[20:37] set_start so VS Code is just being

[20:39] annoying so we can set the start and we

[20:42] can set the duration I want to set the

[20:44] start first okay what so now set

[20:47] duration is not defined set start

[20:50] returns any that is the reason it

[20:53] returns any so vs code doesn't know

[20:55] anymore what happens fine so this will

[20:58] will now write the text in the correct

[21:02] position in the video but it will write

[21:05] the whole text on one single line so we

[21:09] do need this calculate lines function

[21:11] but we don't need this part we just do

[21:14] that so basically this has to be moved

[21:18] inside of this for line in line data

[21:22] lines we are going to do that and we

[21:25] have to do text equals

[21:30] text.set_position and we can center it

[21:32] horizontally easily but then we need to

[21:34] position it in the y direction because

[21:37] we want multiple lines now I wonder does

[21:40] MoviePy support newlines can we add a

[21:43] newline here that would simplify this a

[21:45] lot let's try if we now run the MoviePy

[21:49] test will it actually write two lines

[21:53] that would be pretty amazing if we now

[21:55] check our output then it in fact

[21:59] does support new lines so this is going

[22:02] to be super easy let's go back to the

[22:05] text drawer sorry to the main and we are

[22:09] just going to do this we calculate the

[22:11] lines and we only draw the thing once

[22:16] down here and I'm not sure what this

[22:19] break does here we don't need this so we

[22:21] draw the thing once but here we set like

[22:25] lines to be an empty string and for

[22:29] every line we just say that lines plus

[22:33] equals line plus a new line which I

[22:37] guess we can just say that lines equals

[22:41] new line join line data lines I think

[22:45] that's how it works and then we can just

[22:48] write the lines directly here so that

[22:51] simplifies it now I wonder is there even

[22:55] an automatic line break can I set the

[22:59] width of a text clip let's see the

[23:01] definition here and is there like set

[23:04] width or set bounding box

[23:07] bounding there is no bounding box I

[23:10] think we still have to like do our own

[23:13] new lines so now we calculate the lines

[23:16] we get the text Y which we actually

[23:18] don't need so this can be now just

[23:21] Center and we don't need this stuff and

[23:24] we don't even need the line height we

[23:27] just want the line

[23:29] okay and we have to gather these text

[23:32] clips into some list so let's put like a

[23:35] Clips list in which we have the video

[23:39] and then we're going to add the text

[23:42] into the clips and then we need to do

[23:45] this stuff we have to do

[23:48] CompositeVideoClip which conveniently takes a list so

[23:51] we can do this after that video with

[23:55] text is CompositeVideoClip with all of

[23:58] the clips and we have to import

[24:01] CompositeVideoClip from here and what

[24:04] else currently we are using FFmpeg to

[24:09] combine the video and the audio and this

[24:11] is because CV2 doesn't support audio but

[24:15] I guess now we can just do this so we

[24:20] call on the video with text

[24:23] write_videofile output video and let's put 30 FPS I

[24:27] think our original video is 30 FPS and

[24:31] will this work uh let me check the

[24:34] fits_frame function how does this work so

[24:37] basically it uses the calculate lines

[24:39] with the text and the frame width and it

[24:41] just checks if there are fewer than or

[24:44] equal to two lines then the text will

[24:47] fit because that is our rule right now

[24:49] we have to fit two lines of text maximum

[24:53] okay that should work and do we even use

[24:56] write_line we don't use write_line we

[24:57] can remove write_line and

[25:01] calculate_lines should now work because we get the

[25:03] text size and then based on that we

[25:06] calculate how many lines fit on the

[25:09] screen and we don't need to do this

[25:12] extract audio on video because MoviePy

[25:14] supports video or audio automatically

[25:18] and this is just a whisper

[25:20] transcription and we don't use the oh

[25:23] actually we want to use the frame rate

[25:25] which we probably don't want to use CV2

[25:27] to get but let's just put it there FPS

[25:30] is the frame rate and frame height we

[25:32] don't use at all right now so let's

[25:35] remove that and I'm almost ready to try

[25:38] this out we go through all the captions

[25:41] we calculate lines which is kind of

[25:43] redundant because we do this already

[25:45] when we calculate the captions we might

[25:47] want to save that over there already but

[25:50] basically this just splits the caption

[25:53] into lines and then we draw the lines

[25:56] okay I am ready to test this let's run

[26:01] main.py Python 3

[26:03] main.py and did we give it the file how

[26:08] did this work sorry I'm an idiot of

[26:11] course we need to extract the audio

[26:12] because Whisper needs the audio so

[26:17] let me go back a little bit and take

[26:19] this extract audio and let's put that

[26:23] back here and what is this video file okay

[26:27] this is argv[1] so let's do it main.py and

[26:31] short.avi which is the one without text

[26:34] and then let's fix all the errors that

[26:36] happen here of which there were zero

[26:40] zero errors and now we're writing the

[26:43] video file so let's see what happens

[26:47] okay it is now done that was way slower

[26:50] than CV2 kind of surprisingly and the

[26:54] actual slow part was the rendering of

[26:55] the video so none of my code was was

[26:58] really slow but the rendering part now

[27:01] my code might have something to do with

[27:02] that but let's see what this video looks

[27:05] like let's run with VLC output

[27:10] video.mp4 ever wonder what HTML 5 might be

[27:13] hiding in its depths there's more to it

[27:15] than just to facilitate webpage display

[27:17] did you know it features geolocation

[27:32] your data diving into HTML5's talents

[27:37] world of web development okay it

[27:40] actually worked first try I can't

[27:43] believe it something weird happened in

[27:45] the end I'm not sure if that was a bug

[27:47] in VLC or what happened with the audio

[27:51] but yeah it actually worked interesting

[27:54] now let's see if we can modify it what

[27:57] if we want to have just one line of text

[28:00] is it still going to work if I change

[28:02] here just one and then we run it again

[28:06] so now we should have only one line

[28:09] visible at a time in the video but it

[28:12] should still show all of the

[28:14] transcription and let's take a look at
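
A rough sketch of the "only one line visible at a time" setting, assuming the caption has already been split into lines: the lines are grouped into chunks of at most `max_lines` and each chunk is joined with newlines, so every line still appears somewhere. The name `group_lines` and the `max_lines` parameter are hypothetical and not taken from the project.

```python
def group_lines(lines, max_lines=2):
    """Group already-wrapped lines into caption texts of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    # Each chunk becomes one caption; lines inside a chunk are joined by newlines.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


lines = ["ever wonder what", "HTML5 might be", "hiding"]
two_line_captions = group_lines(lines, max_lines=2)
one_line_captions = group_lines(lines, max_lines=1)
```

With `max_lines=1` the number of captions simply equals the number of lines, which matches the behaviour described here: fewer words on screen at once, but the whole transcription is still shown.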

[28:17] this ever wonder what HTML 5 might be

[28:20] hiding in its depths there's more to it

[28:22] than just to facilitate webpage display

[28:24] did you know it features geolocation

[28:39] your data diving into HTML5's talents

[28:44] world of web development yeah it works

[28:47] just as expected now one thing I would

[28:51] like to do is have some sort of padding

[28:54] there because sometimes the text goes a

[28:57] bit too close to the borders so let's

[29:00] add a padding and let's call it I don't

[29:04] know what we should call it is it in

[29:05] pixels I guess and the video is pretty

[29:08] big so let's do 50 pixels of padding and

[29:11] then our fits frame function will

[29:14] calculate the lines with frame width

[29:17] minus padding does that make sense maybe

[29:21] we want to just pass it in here so when

[29:24] we call fits frame we pass here frame

[29:27] width

[29:28] minus padding like this but we also have

[29:32] to pass it in here so maybe we want to

[29:35] say something like text bounding box

[29:41] width uh that's so long so this would be

[29:44] frame width minus padding let's just

[29:47] call it bbox text bbox width that's not

[29:51] too long so then we can pass those in
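
A minimal sketch of the idea described here, assuming a greedy word wrap: a "fits frame" check compares the measured text width against the text bounding box width, and the wrapper starts a new line whenever the next word would not fit. The function names, the `measure` callback, and the fixed per-character width are all assumptions for illustration; the project's real function may look different.

```python
def fits_frame(text, text_bbox_width, measure):
    """True if the measured width of text fits inside the text bounding box."""
    return measure(text) <= text_bbox_width


def wrap_caption(words, text_bbox_width, measure):
    """Greedily pack words into lines that each fit text_bbox_width."""
    lines = []
    current = ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if current and not fits_frame(candidate, text_bbox_width, measure):
            # The next word would overflow, so close the current line.
            lines.append(current)
            current = word
        else:
            # Fits, or the line is empty (an over-long word gets its own line).
            current = candidate
    if current:
        lines.append(current)
    return lines


def approx_width(text, char_width=60):
    """Stand-in measurement: assumes every character is char_width pixels wide."""
    return len(text) * char_width


frame_width = 1080   # assumed example width
padding = 50         # value mentioned in the video
wrapped = wrap_caption(
    "ever wonder what HTML5 might be hiding".split(),
    frame_width - padding,
    approx_width,
)
```

In practice the measurement would come from the actual font, for example Pillow's `ImageFont.FreeTypeFont.getlength`, rather than a fixed width per character.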

[29:53] here and let's actually check out our

[29:56] competitor

[30:00] submagic.co what kind of size do they use

[30:04] for these

[30:06] videos so well they have different kind

[30:08] of sizes but how about in the MrBeast

[30:12] video you can basically fit algorithm

[30:17] rewards so how many characters is that

[30:21] algorithm rewards and in

[30:26] our we can fit ever wonder what HTML 5

[30:30] ever wonder what HTML 5 and we should

[30:36] have some padding so maybe two

[30:39] characters from each side we should drop

[30:42] so drop these two and these two so it's

[30:45] basically the same size we can make it a

[30:48] little bit bigger so we can say over

[30:51] here that our font size is 130 and I

[30:56] think we can add some stroke let's put

[30:58] like four to the stroke and what is in

[31:01] fact margin we don't use it so this was

[31:04] used with the cv2 for the line like

[31:08] margin between the lines but now it's

[31:10] just the new line so I guess that is

[31:14] something that we can't now change

[31:16] unless we can change it directly in

[31:18] MoviePy and border five what is that

[31:21] we don't use it and of course white

[31:23] color and black color we don't use

[31:25] anymore because now they are in this

[31:27] different format okay and I do want to

[31:31] get rid of cv2 now I don't want to

[31:34] import cv2 unnecessarily and we are not

[31:37] using math so where do we use cv2 we get

[31:41] the frame rate from it and the frame

[31:44] width so how do we do that in MoviePy

[31:47] how do I get the frame rate and width of

[31:53] a video with MoviePy we can use

[31:58] video.fps and video.w okay so this is

[32:03] going to be video.w and frame rate we

[32:08] are using just here it is video.fps and

[32:13] we don't have to destroy cv2 windows and

[32:16] we don't have to get the FPS here and we

[32:19] don't have to get the cap and we have to

[32:22] do this after the video which means we

[32:26] have to do this after after that so this

[32:29] is opening the video and then we

[32:31] calculate this bounding box width and then

[32:34] we set the clips to the video and then

[32:37] we get the captions so now we are not

[32:40] using cv2 anymore we are just using

[32:43] MoviePy which means our requirements

[32:47] shall not have cv2 anymore only MoviePy

[32:50] and Pillow and Whisper great so
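
A small sketch of reading the frame rate and width from a MoviePy clip instead of an OpenCV capture. The `fps` and `w` attributes are the ones named in the video; the helper names `load_video` and `frame_info` are hypothetical, and the two-step import is an assumption to cover both MoviePy 1.x and 2.x.

```python
def load_video(path):
    """Open a video file with MoviePy; the import location differs by version."""
    try:
        from moviepy import VideoFileClip          # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip   # MoviePy 1.x
    return VideoFileClip(path)


def frame_info(video):
    """Return (frame_rate, frame_width) from any clip exposing .fps and .w."""
    return video.fps, video.w
```

Usage would be along the lines of `frame_rate, frame_width = frame_info(load_video("input.mp4"))`, where the file name is only a placeholder. Because the values come from the clip, the video has to be opened before the text bounding box width is calculated.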

[32:54] let's run it one more time are we

[32:58] getting rid of this temp audio file

[33:00] right now um yeah so it is actually a

[33:04] temp file okay how about temp video file

[33:08] are we even using it we are not using it

[33:10] and we actually Define output file but

[33:12] we don't use it so let's say that this

[33:15] is the output file and do we need to do

[33:19] anything else that's basically it and it

[33:21] works now we did have this position here

[33:25] um caption y pos so in a previous

[33:29] video I added this kind of like

[33:31] multiplier for the position which is

[33:33] kind of a stupid idea anyway so we might

[33:36] just say position and we can put here

[33:40] center center and then we can pass it

[33:43] directly here like this so then you can

[33:47] change the position if you want okay so

[33:50] now we can change all the settings here

[33:53] which we might want to move in some sort

[33:55] of configuration file or something and

[33:57] actually pass them in as parameters to

[34:00] some I don't know a class or some

[34:02] function at least but let's try it again
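
One way the settings could be bundled and passed in as parameters is a small dataclass. This is only a sketch: the class name, field names, font file name, and colour defaults are assumptions, while the font size, stroke width, padding, and centre position reuse values mentioned in the video.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptionStyle:
    """Hypothetical container for the caption settings discussed in the video."""
    font: str = "PoetsenOne-Regular.ttf"    # placeholder font file name
    font_size: int = 130
    color: str = "white"                    # assumed default
    stroke_color: str = "black"             # assumed default
    stroke_width: int = 3
    padding: int = 50                       # pixels kept free on each side
    max_lines: int = 1
    position: tuple = ("center", "center")  # passed straight to the text clip


default_style = CaptionStyle()
# Derive a variant without mutating the defaults.
red_style = replace(default_style, color="red")
```

Keeping the style in one immutable object means a test run with a different colour or font is a one-line change at the call site.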

[34:04] and see if it works nicely ever wonder

[34:08] what HTML 5 might be hiding in its

[34:10] depths there's more to it than just to

[34:12] facilitate webpage display did you know

[34:14] sorry this is the wrong one because now

[34:17] I changed it to a different file so it is

[34:20] now with_transcript.avi ever wonder

[34:24] what HTML 5 might be hiding in its

[34:26] depths there's more to it than just

[34:28] to facilitate webpage display did you

[34:30] know it features geolocation abilities

[34:33] allowing sites to know your location or

[34:35] how about its video and audio elements

[34:37] making for easy multimedia integration

[34:40] perhaps the most fascinating is the

[34:41] offline storage feature enabling the

[34:44] browser to store your data diving into

[34:46] HTML5's talents indeed gives a new

[34:49] perspective into the world of web

[34:51] development okay it seems to work pretty

[34:54] well but there's again in the end I

[34:56] heard some weirdness in there and I

[35:00] think we can add a little bit more

[35:03] padding because at some point it went a

[35:06] bit too far in my opinion let me try to

[35:10] see the place like here it is a little

[35:13] bit too close to the edges so maybe even

[35:15] 100 pixels of padding and well yeah I

[35:19] guess we should do times two because I

[35:22] want 50 pixels on each side and I really

[35:25] find this very annoying that MoviePy

[35:28] doesn't really support a proper stroke

[35:31] because the stroke is in the middle so

[35:34] it will make the text thinner which I

[35:37] don't like but I don't have a solution

[35:39] for that right now as I tried in the

[35:41] previous video it doesn't help if I draw

[35:43] the text twice because the thickness

[35:46] doesn't work in the same way as in cv2

[35:49] in MoviePy so let's do that as the

[35:52] last thing I will put a stroke with

[35:55] three and I will make the padding times

[35:59] two so then it should look pretty nice
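
The "padding times two" fix as a tiny worked example: to keep 50 pixels free on each side, the padding has to be subtracted twice from the frame width. The function name and the 1080-pixel frame width are assumptions for illustration.

```python
def padded_text_width(frame_width, padding):
    """Width left for text when `padding` pixels are kept free on each side."""
    width = frame_width - 2 * padding
    if width <= 0:
        raise ValueError("padding leaves no room for text")
    return width


# 1080 px wide frame, 50 px on the left and 50 px on the right
available = padded_text_width(1080, 50)
```

Subtracting the padding only once would leave 25 pixels per side for centred text, which is why the captions still looked too close to the edges.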

[36:02] and in the next video I am going to

[36:05] implement the highlighting of the

[36:07] current word so that the current word

[36:09] will show in a different color and maybe

[36:12] even a different size maybe it will

[36:14] become bigger for a while when it's like

[36:16] highlighted and I probably want to like

[36:18] zoom in and out the background image or

[36:21] pan it somehow or do something like that

[36:24] but that will be in the next video so

[36:27] make sure to subscribe if you want to

[36:29] see that one but let's now try this one

[36:32] more time and let's do a little bit of

[36:34] modifications to the style so what if I

[36:38] want let's say a red color and then

[36:43] let's try one other font that I have we

[36:47] have Poetsen One which I can't view here

[36:51] but let's see how that looks like

[36:53] Poetsen One regular and if we run

[36:59] this thing with those settings let's see

[37:02] what it will look like ever wonder what

[37:04] HTML 5 might be hiding in its depths

[37:06] there's more to it than just to

[37:08] facilitate webpage display did you know it

[37:10] features geolocation abilities allowing

[37:12] sites to know your location or how about

[37:15] its video and audio elements making for

[37:17] easy multimedia integration perhaps the

[37:19] most fascinating is the offline storage

[37:24] your data diving into HTML5's talents

[37:27] indeed gives a new perspective into

[37:29] the world of web development okay that

[37:31] wasn't the greatest font or the greatest

[37:34] color but at least we can modify the

[37:38] colors and the fonts so hopefully you

[37:41] like this video and if you did then make

[37:43] sure to subscribe and hit the like

[37:46] button and also give me a comment down

[37:48] below and let me know what video should

[37:51] I do next and if you want to try out

[37:54] capacity then you can find it on my

[37:56] GitHub page and by the way it is not

[37:59] OpenCV anymore now it is MoviePy

[38:03] anyway thanks for watching and I will

[38:04] see you in the next one