Control AI video with a stick!
Shows a novel, hands-on method to control AI video using physical objects, sparking curiosity.
This video presents a free ComfyUI workflow that allows users to control AI video generation by physically moving objects with a stick or using animated previews. The workflow, based on the Time to Move (TTM) research paper, erases the stick and animates the object to follow the motion. The creator demonstrates the process by making a short film of toys coming to life and also uses Wan Animate to transform himself into his 15-year-old self.
A workflow is introduced that lets users control AI video by moving plastic toys or printed cutouts through a scene. The workflow erases the stick and animates the objects to follow the motion.
The core of the workflow is the TTM paper, which is training-free and works with any diffusion-based video model like Wan 2.2, CogVideoX, or Stable Video Diffusion.
Users need to create a control video with motion, either by dragging things in After Effects/Blender or physically moving objects with a stick.
TTM uses dual-clock denoising: lower noise in moving areas to follow motion precisely, higher noise elsewhere to generate a clean background.
The workflow in ComfyUI automatically generates a mask using SAM 3 from Meta, extracts start and end frames, and can use the Qwen Image Edit AI model to remove sticks or clean up characters.
Users prepare a mask for the character and paint out the stick in the start and end frames using tools like After Effects or the free ComfyUI workflow.
Using SAM 3, users click on the character and right-click on parts to exclude, like the stick, then run to generate a mask.
Users copy the first frame, use the mask editor to select the stick area, and prompt to remove it. They can also adjust the prompt to make the character stand on the table.
After preparing frames and mask, users import them into the TTM workflow, set resolution, choose start/end frames, create a prompt, and run. The process splits into two parts with a rough preview after three steps.
To turn himself into his 15-year-old self, the creator used Wan Animate with a driving video and reference image. He trained a LoRA using AI Toolkit with photos from his parents to improve consistency.
The workflow is surprisingly easy and robust, with about 70% of shots working on the first try. The final short film demonstrates the technique, and the creator encourages viewers to subscribe and share their creations.
"Title accurately describes the controllable AI video workflow using a stick, and the tutorial delivers on its promise."
What is the name of the research paper that forms the core of the AI video control workflow?
Time to Move (TTM)
0:54
What technique does TTM use to control motion in AI video generation?
Dual-clock denoising: lower noise in moving areas to follow motion precisely, higher noise elsewhere for a clean background.
1:22
Which AI model is used in the ComfyUI workflow to automatically generate a mask for the character?
SAM 3 from Meta
1:43
What tool can be used to remove sticks or unwanted objects from frames in the workflow?
Qwen Image Edit AI model
1:52
What is the approximate success rate of shots on the first try using this workflow?
About 70%
2:17
What is a LoRA in the context of AI models?
A small extra model that can be trained to help the main model better understand a specific concept it didn't know before.
8:17
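The "small extra model" idea can be made concrete with a little NumPy. In the usual LoRA formulation, a frozen weight matrix receives a low-rank update, and a strength setting scales that update. The shapes, the rank, and the function name `apply_lora` below are illustrative assumptions, not values taken from the video or from AI Toolkit.

```python
import numpy as np


def apply_lora(weight, lora_down, lora_up, strength=1.0):
    """Return the base weight plus a scaled low-rank update (up @ down)."""
    return weight + strength * (lora_up @ lora_down)


rng = np.random.default_rng(0)
out_dim, in_dim, rank = 64, 64, 4  # hypothetical sizes for illustration

# Frozen base weight of one layer in the main model.
weight = rng.standard_normal((out_dim, in_dim))

# The two small matrices are the only parts a LoRA training run would update.
lora_down = rng.standard_normal((rank, in_dim)) * 0.01
lora_up = rng.standard_normal((out_dim, rank)) * 0.01

full = apply_lora(weight, lora_down, lora_up, strength=1.0)  # "full strength"
off = apply_lora(weight, lora_down, lora_up, strength=0.0)   # LoRA disabled

base_params = weight.size                    # parameters in the base layer
lora_params = lora_down.size + lora_up.size  # parameters in the LoRA pair
```

With these toy sizes the LoRA pair holds far fewer numbers than the layer it adjusts, which is why a LoRA file is small compared with the main model.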
Which AI model is Wan Animate based on?
Wan 2.2
7:31
Controlling AI video with a stick
Introduces a novel, intuitive method for controlling AI video generation using physical objects.
Time to Move (TTM) paper
Highlights a training-free architecture that works with any diffusion-based video model, making it widely accessible.
0:54
Dual clock denoising explanation
Provides a clear technical explanation of how TTM achieves precise motion control while maintaining background quality.
1:22
70% first-try success rate
Demonstrates the robustness and efficiency of the workflow compared to previous methods.
2:17
Using LoRA for character consistency
Shows a practical solution for maintaining consistent character appearance across shots using LoRA training.
7:25
[00:00] I've built a workflow that lets you
[00:01] control AI video with a stick. Check
[00:03] this out. You just move plastic toys or
[00:05] printed cutouts through your scene and
[00:06] the workflow erases whatever is holding
[00:09] them and then animates them to follow
[00:11] that exact motion. You can also skip the
[00:13] stick entirely and use animated previews
[00:15] instead using After Effects or Blender,
[00:18] for example.
[00:19] To show you exactly how this works and
[00:21] how you can use it on your own computer
[00:23] for free, we created an entire short
[00:25] film about toys coming to life. And
[00:27] yeah, I know Niko from Corridor Digital
[00:29] had pretty much the same idea, but he
[00:31] used a different technique. He used Wan
[00:33] Animate to transfer acting performances
[00:35] onto his toy characters. In this video,
[00:37] we're also going to look at Wan Animate,
[00:39] but I used it to turn myself into the
[00:41] 15-year-old version of myself for the
[00:43] short film because I'm a responsible
[00:45] adult now. I definitely don't play with
[00:47] toys anymore. Make sure to subscribe and
[00:49] stick around till the end for the full
[00:51] short film.
[00:54] Now, at the heart of this workflow is a
[00:56] research paper called Time to Move, or
[00:58] TTM for short. And the cool part is that
[01:00] it's completely training free. So, it's
[01:02] pretty much an architecture that you can
[01:04] use with any diffusion-based video
[01:06] model. We are using it with Wan 2.2, but
[01:08] you could also use it with CogVideoX
[01:10] or Stable Video Diffusion, for example.
[01:12] First, you need to create a control
[01:13] video with some motion, either by
[01:15] dragging things around in After Effects
[01:17] or Blender, or physically moving around
[01:19] stuff through your scene with a stick.
[01:22] TTM then uses something called
[01:24] dual-clock denoising. In areas where your
[01:26] character is moving, it uses lower noise
[01:28] to follow that motion precisely. In the
[01:31] rest of the scene, higher noise lets it
[01:32] generate a clean, natural background.
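The dual-clock idea described here can be sketched in a few lines of Python with NumPy. This is a toy illustration, not the TTM authors' code: the linear noise blend, the `t_background` and `t_motion` values, the step count, and the `denoise_step` callback are all assumptions chosen only to show the masking logic. The masked (moving) region is re-anchored to the control video until a lower noise level is reached, while the background is left free from a higher noise level onward.

```python
import numpy as np


def add_noise(clean, t, rng):
    """Blend clean values with Gaussian noise; t=0 is clean, t=1 is pure noise."""
    return (1.0 - t) * clean + t * rng.standard_normal(clean.shape)


def dual_clock_denoise(reference, mask, denoise_step,
                       t_background=0.9, t_motion=0.5, steps=10, seed=0):
    """Toy dual-clock loop.

    reference: control video (frames, height, width), e.g. the stick footage.
    mask: same shape, 1.0 where the character moves, 0.0 for background.
    denoise_step: callable (x, t_now, t_next) standing in for a video model.
    """
    rng = np.random.default_rng(seed)
    timesteps = np.linspace(t_background, 0.0, steps + 1)

    # Everything starts from the control video at the higher noise level.
    x = add_noise(reference, t_background, rng)

    for t_now, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = denoise_step(x, t_now, t_next)
        if t_next >= t_motion:
            # Moving region: overwrite with the control video noised to the
            # current level, so it stays tied to the given motion.
            anchored = add_noise(reference, t_next, rng)
            x = mask * anchored + (1.0 - mask) * x
    return x


# Tiny demo with a placeholder "model" that returns its input unchanged.
reference = np.ones((4, 16, 16))
mask = np.zeros_like(reference)
mask[:, :, :8] = 1.0  # left half is the moving character

result = dual_clock_denoise(reference, mask, lambda x, t_now, t_next: x)
masked_error = np.abs(result - reference)[mask == 1].mean()
background_error = np.abs(result - reference)[mask == 0].mean()
```

Because the placeholder model does nothing, the demo only shows the effect of the two clocks: the masked region ends up closer to the control video than the background does. A real model would use the extra freedom in the background to repaint it cleanly.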
[01:35] So to make this all as easy as possible,
[01:37] I slapped together some AI models in
[01:39] ComfyUI. Here you just import a video and
[01:41] it automatically generates a black and
[01:43] white mask for your character using the
[01:45] new SAM 3 model from Meta. This workflow
[01:48] also extracts the start and end frames.
[01:50] And if you want, you can use the Qwen
[01:52] Image Edit AI model to remove any sticks
[01:54] or unwanted objects or clean up your
[01:56] character if needed. Then you give it a
[01:58] simple prompt describing the action of
[02:00] your character and hit run. And that's
[02:03] it. That's all there is to it. It's
[02:05] surprisingly easy and very robust. I
[02:07] remember when we created a controllable
[02:09] creature for a previous short film using
[02:11] Wan VACE. We needed countless iterations
[02:14] to get the movement right. It was really
[02:16] exhausting. But with this workflow,
[02:17] about 70% of the shots worked just on
[02:20] the first try. To use this workflow,
[02:21] you'll need ComfyUI, which is an
[02:23] open-source AI interface. If you don't
[02:25] have ComfyUI installed yet, we've
[02:26] prepared a step-by-step guide on our
[02:29] website that walks you through
[02:30] everything. But fair warning, if this is
[02:32] your first time using ComfyUI, you might
[02:33] want to start with a simpler workflow.
[02:35] Let's start by bringing this shot to
[02:36] life. First, we must prepare a mask for
[02:39] the character. And we can paint out that
[02:41] stick in the start and end frame. Now,
[02:43] it really doesn't matter how you prepare
[02:45] this. If you want, you can use After
[02:46] Effects for the mask or Nano Banana to
[02:48] paint out the stick or Photoshop or
[02:50] something. But we also created this free
[02:51] workflow that lets you do all the
[02:53] preparation straight in ComfyUI. So to use
[02:55] it, just drag and drop it into ComfyUI.
[02:57] Now in your case, you might need to
[02:59] click manager, install missing custom
[03:02] nodes if you have any red nodes in the
[03:04] workflow, and let's zoom in on the left
[03:06] side here. Here you can find all the
[03:08] model loader nodes and you can find the
[03:11] corresponding model that you need to the
[03:13] left in this node right here. For the
[03:15] Qwen Image model, you can see that there
[03:16] are different versions and you need to
[03:19] pick the one that comfortably fits in
[03:21] your GPU's VRAM. Once you've downloaded
[03:23] all these models and made sure that all
[03:25] the correct ones are loaded, go to the
[03:26] load video nodes and load in your plate.
[03:29] If you want, you can name your shot
[03:31] right here. For us, this was 20. And
[03:34] then click run. Wait for the images to
[03:36] load. Let's first create the mask for
[03:37] our character. For this, we're using the
[03:39] new SAM 3 model by Meta. If your
[03:41] character enters at a later stage in the
[03:44] video, so it's not there in the first
[03:45] frame, you can just change the pick
[03:47] frame right here. But for me, the
[03:49] character is already in the video. So,
[03:50] all I need to do is just click on this
[03:52] character. And then I have to specify
[03:55] which parts do not belong to the
[03:56] character. For this, I'm just
[03:58] right-clicking on the other parts of the image
[04:01] like so. It's really important to
[04:03] exclude the stick that your character is
[04:05] on. So, I will also put a red dot right
[04:07] here. Then you just click run. And after
[04:09] a few seconds, our mask is done. And you
[04:11] can see it flickers a little bit, but
[04:12] that doesn't matter at all. Just ignore
[04:14] that. Let's now clean up the start
[04:16] frame. For this, I'm zooming in on this
[04:19] part right here. And now I need to copy
[04:21] over this first frame. For this, just
[04:23] copy and paste. Now, you can click open
[04:27] in mask editor. And then you can select
[04:29] the area where your stick is. And you
[04:31] can be generous there. Left click to
[04:33] select and right click to delete parts
[04:35] again. Click save. Come over to the
[04:37] right here and add a simple prompt like
[04:40] remove the wooden stick. Click run and
[04:42] the stick is gone. But you can also
[04:44] change these frames in more creative
[04:46] ways. For example, you can see that the
[04:47] Lego character is like hovering above
[04:49] the table. So what we could also do is
[04:51] just go back to the start here and
[04:53] create a bigger mask. Something like
[04:55] this. And I'm just creating a new
[04:56] prompt. Remove the stick and make the
[04:58] Lego figure stand on the table. And you
[05:00] can see that worked really well, though
[05:02] it changed the legs a little bit. So
[05:04] this prompt worked a lot better. I also
[05:05] added do not change the look of the
[05:07] figure. And yeah, that worked. So
[05:09] let's also remove the stick from the end
[05:11] frame. Open mask editor. I'm selecting
[05:14] the stick. Go over here. Create a
[05:17] prompt. Remove the stick. And that looks
[05:19] good. The stick is gone. And now we have
[05:20] everything we need. You could say it's
[05:22] time to move. So drag and drop that
[05:24] workflow in here. And you install this
[05:27] one in the exact same way. Install
[05:29] missing custom nodes. restart and then
[05:32] you can find all the models that you
[05:33] need in these model loader nodes right
[05:36] here. Once you have everything set up,
[05:38] you can import the start and end frame.
[05:40] You will find these in your comfy folder
[05:42] output and then there is a folder with a
[05:45] shot number that you created and then
[05:47] you can just drag and drop these in. So
[05:48] this is my start frame right here and
[05:51] then this is my end frame. Below that
[05:53] you will need to import the plate of
[05:55] your moving character and below that you
[05:58] need to import the mask that we just
[05:59] created. Next, come up here to the setup
[06:02] and here you can select which resolution
[06:03] you want to use. Next, you can choose if
[06:05] you want to use the start frame and end
[06:08] frame. You can use only a start frame or
[06:10] only an end frame. But you still need to
[06:12] import an image right here. Otherwise,
[06:14] it will give you an error. For static
[06:16] shots, a start frame is usually enough,
[06:18] but if you really want to make sure that
[06:20] your character does not change over the
[06:21] duration of your shot, I would recommend
[06:23] going with both. Next, you need to
[06:25] create a simple prompt, something like
[06:27] this. And then you can just click run.
[06:29] The sampling process for the video is
[06:31] split into two parts. And after three
[06:33] steps, you already get a rough preview
[06:35] like this. And usually you can already
[06:36] tell if your shot is working or not.
[06:38] Otherwise, you can just quit the
[06:40] process. And I would recommend trying
[06:42] another seed or adjusting your prompt.
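The preview-then-decide loop described here can be sketched in plain Python. This is only an outline of the control flow under stated assumptions: the total step count of 8, the function names, and the `run_stage` and `looks_promising` callbacks are hypothetical stand-ins, and in the actual workflow the decision is made by eye in ComfyUI rather than by code.

```python
def split_schedule(total_steps, preview_steps=3):
    """Split step indices into a short preview stage and the remaining steps."""
    if not 0 < preview_steps < total_steps:
        raise ValueError("preview_steps must be between 1 and total_steps - 1")
    steps = list(range(total_steps))
    return steps[:preview_steps], steps[preview_steps:]


def sample_with_preview(run_stage, looks_promising, seeds, total_steps=8):
    """Run the short first stage per seed; finish only a promising preview."""
    first, second = split_schedule(total_steps)
    for seed in seeds:
        preview = run_stage(None, first, seed)
        if looks_promising(preview):
            return seed, run_stage(preview, second, seed)
    return None, None  # no seed produced an acceptable preview


# Stand-in for the sampler: it just records which steps were run.
def fake_run_stage(state, steps, seed):
    done = [] if state is None else state["steps_done"]
    return {"seed": seed, "steps_done": done + list(steps)}


# Stand-in for the human check: pretend only even seeds look good.
chosen_seed, final = sample_with_preview(
    fake_run_stage, lambda preview: preview["seed"] % 2 == 0, seeds=[1, 3, 4]
)
```

The point of the split is cost: a rejected seed only pays for the first three steps instead of the full run.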
[06:44] Okay, first try and the result already
[06:46] looks amazing. You can see how well it
[06:48] integrates into the shot. But the
[06:50] problem is that the legs kind of
[06:51] separate and then start sticking back
[06:52] together. And I think I can just fix
[06:55] this using the prompt, I guess. Well,
[06:58] and this kind of worked. Looks much
[07:00] better now with this prompt. So, this is
[07:02] the whole process. And as you can see,
[07:04] it works really well. Now is a good
[07:06] time to mention that this video and the
[07:07] free workflows are sponsored by our
[07:08] lovely supporters on Patreon. Thank you
[07:11] for supporting us on Patreon, keeping us
[07:13] free and independent, and also allowing
[07:14] us to share all these workflows for
[07:16] free. If you want access to advanced
[07:18] workflows, extra demo files, and our
[07:20] amazing Discord community, consider
[07:22] supporting. So, that's how we created
[07:23] all the shots of the animated toys. But
[07:25] there were also those shots where I
[07:27] needed to turn myself into my 15-year-old
[07:29] self. For this, I wanted to use Wan
[07:31] Animate, which is based on Wan 2.2, but
[07:33] specifically designed for character
[07:35] animation. The concept is pretty simple.
[07:37] You just need a driving video of your
[07:39] performance and a reference image of the
[07:40] character you want to transform into.
[07:42] When I tested it a few weeks ago, it
[07:44] worked pretty well. So, I just went
[07:45] ahead and shot everything without doing
[07:47] proper tests, trusting it would work out
[07:50] of the box. In the end, I spent more
[07:51] time wrestling with Wan Animate than I
[07:53] actually spent on the toy animation
[07:55] workflow. It started pretty promising,
[07:57] though. I used this image of me when I
[07:59] was around 15 as a reference image, and
[08:02] the shot itself looked pretty decent.
[08:03] The problem was that I looked pretty
[08:05] different in every single shot. But I
[08:06] had an idea that I wanted to try for a
[08:08] long time. Since Wan Animate and Wan 2.2
[08:11] are based on the same model, LoRAs
[08:13] trained for Wan 2.2 will work for Wan
[08:15] Animate as well. For those who don't
[08:17] know, a LoRA is pretty much like a small
[08:18] extra model that you can train to help
[08:20] the main model better understand a
[08:22] specific concept that it didn't know
[08:24] before. So, I asked my parents for more
[08:26] photos of me when I was 15. And then I
[08:29] used AI Toolkit to train the LoRA. So,
[08:31] I created my data set with some very
[08:34] basic captions like this. And then I
[08:36] used these settings right here. Feel
[08:39] free to copy them if you want. Once it
[08:41] was done, I downloaded the LoRA, added
[08:43] it at full strength, and look at how
[08:45] much better these results are. There are
[08:47] still some issues, especially with like
[08:49] eye direction, but that's something I
[08:51] would like to fix in a future video. So,
[08:53] without further ado, here's the final
[08:54] short film.
[09:09] Ow.
[09:19] Yeah.
[09:36] Heat.
[09:54] No,
[10:03] it's you guys. Wait, you just robbed
[10:06] neglected because I'm playing video
[10:08] games all the time. I'm I'm so sorry.
[10:11] >> No, nerd. We just want you to go
[10:13] outside.
[10:18] >> All right, that's it for this one. Thank
[10:20] you so much for watching and thank you
[10:21] to our lovely Patreon supporters for
[10:23] making these videos possible. As always,
[10:26] if you create something with these
[10:27] workflows, feel free to tag me or send
[10:30] it to me. I always love to see what you
[10:32] come up with. Make sure to subscribe and
[10:34] see you next time.