---
title: 'AI is finally controllable! …with a stick [FREE ComfyUI Workflow + Tutorial]'
source: 'https://youtube.com/watch?v=pUb58eAZ3pc'
video_id: 'pUb58eAZ3pc'
date: 2026-06-15
duration_sec: 0
---

# AI is finally controllable! …with a stick [FREE ComfyUI Workflow + Tutorial]

> Source: [AI is finally controllable! …with a stick (FREE ComfyUI Workflow + Tutorial)](https://youtube.com/watch?v=pUb58eAZ3pc)

## Summary

This video presents a free ComfyUI workflow that allows users to control AI video generation by physically moving objects with a stick or using animated previews. The workflow, based on the Time to Move (TTM) research paper, erases the stick and animates the object to follow the motion. The creator demonstrates the process by making a short film of toys coming to life and also uses Wan Animate to transform himself into his 15-year-old self.

### Key Points

- **Controlling AI video with a stick** [0:00] — A workflow is introduced that lets users control AI video by moving plastic toys or printed cutouts through a scene. The workflow erases the stick and animates the objects to follow the motion.
- **Time to Move (TTM) research paper** [0:54] — The core of the workflow is the TTM paper, which is training-free and works with any diffusion-based video model like Wan 2.2, CogVideoX, or Stable Video Diffusion.
- **Creating a control video** [1:12] — Users need to create a control video with motion, either by dragging things in After Effects/Blender or physically moving objects with a stick.
- **Dual clock denoising** [1:22] — TTM uses dual clock denoising: lower noise in moving areas to follow motion precisely, higher noise elsewhere to generate a clean background.
- **ComfyUI workflow setup** [1:35] — The workflow in ComfyUI automatically generates a mask using SAM 3 from Meta, extracts start and end frames, and can use the Qwen Image Edit model to remove sticks or clean up characters.
- **Step-by-step mask preparation** [2:35] — Users prepare a mask for the character and paint out the stick in the start and end frames, using tools like After Effects or the free ComfyUI workflow.
- **Creating the mask with SAM 3** [3:36] — Using SAM 3, users click on the character and right-click on parts to exclude, like the stick, then run to generate a mask.
- **Cleaning up start and end frames** [4:15] — Users copy the first frame, use the mask editor to select the stick area, and prompt to remove it. They can also adjust the prompt to make the character stand on the table.
- **Running the TTM workflow** [5:20] — After preparing frames and mask, users import them into the TTM workflow, set resolution, choose start/end frames, create a prompt, and run. The process splits into two parts with a rough preview after three steps.
- **Using Wan Animate for character transformation** [7:25] — To turn himself into his 15-year-old self, the creator used Wan Animate with a driving video and reference image. He trained a LoRA using AI Toolkit with photos from his parents to improve consistency.
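
To make the LoRA idea from the last key point concrete, here is a minimal sketch in plain Python. It assumes the common low-rank formulation, in which a small pair of matrices is multiplied together and added on top of a frozen weight matrix. The names `apply_lora`, `lora_up`, `lora_down`, and `strength` are illustrative only and are not taken from the video or from any specific tool.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_down, lora_up, strength=1.0):
    """Return weight + strength * (lora_up @ lora_down).

    Hypothetical helper for illustration; real toolchains apply this per layer.
    """
    delta = matmul(lora_up, lora_down)  # low-rank update, same shape as weight
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 LoRA (2x1 times 1x2).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]
down = [[0.5, 0.0]]

full = apply_lora(base, down, up, strength=1.0)  # LoRA at full strength
off = apply_lora(base, down, up, strength=0.0)   # LoRA effectively disabled
```

With `strength=0.0` the base weights come back unchanged, which is why a strength slider can blend smoothly between the original model and the fine-tuned behavior.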

### Conclusion

The workflow is surprisingly easy and robust, with about 70% of shots working on the first try. The final short film demonstrates the technique, and the creator encourages viewers to subscribe and share their creations.

## Transcript

I've built a workflow that lets you
control AI video with a stick. Check
this out. You just move plastic toys or
printed cutouts through your scene and
the workflow erases whatever is holding
them and then animates them to follow
that exact motion. You can also skip the
stick entirely and use animated previews
instead using After Effects or Blender,
for example.
To show you exactly how this works and
how you can use it on your own computer
for free, we created an entire short
film about toys coming to life. And
yeah, I know Niko from Corridor Digital
had pretty much the same idea, but he
used a different technique. He used Wan
Animate to transfer acting performances
onto his toy characters. In this video,
we're also going to look at Wan Animate,
but I used it to turn myself into the
15-year-old version of myself for the
short film because I'm a responsible
adult now. I definitely don't play with
toys anymore. Make sure to subscribe and
stick around till the end for the full
short film.
Now, at the heart of this workflow is a
research paper called Time to Move, or
TTM for short. And the cool part is that
it's completely training free. So, it's
pretty much an architecture that you can
use with any diffusion-based video
model. We are using it with Wan 2.2, but
you could also use it with CogVideoX
or Stable Video Diffusion, for example.
First, you need to create a control
video with some motion, either by
dragging things around in After Effects
or Blender, or physically moving around
stuff through your scene with a stick.
TTM then uses something called dual
clock denoising. In areas where your
character is moving, it uses lower noise
to follow that motion precisely. In the
rest of the scene, higher noise lets it
generate a clean, natural background.
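
As a rough illustration of that idea (not the paper's actual sampler), the sketch below mixes a frame with Gaussian noise at two different strengths depending on a mask. The function names and the noise levels `0.3` and `0.9` are assumptions chosen for the example; the real method works on video latents inside a diffusion model's denoising schedule.

```python
import random


def noise_level_map(mask, sigma_motion=0.3, sigma_background=0.9):
    """Per-pixel noise level: low where the mask marks the moving object."""
    return [
        [sigma_motion if m > 0.5 else sigma_background for m in row]
        for row in mask
    ]


def add_region_noise(frame, mask, seed=0, sigma_motion=0.3, sigma_background=0.9):
    """Blend each pixel with Gaussian noise according to its noise level."""
    rng = random.Random(seed)
    sigma = noise_level_map(mask, sigma_motion, sigma_background)
    return [
        [(1.0 - s) * px + s * rng.gauss(0.0, 1.0) for px, s in zip(f_row, s_row)]
        for f_row, s_row in zip(frame, sigma)
    ]


# Tiny 2x2 "frame"; mask value 1 marks the moving character.
frame = [[0.2, 0.8], [0.5, 0.1]]
mask = [[0, 1], [1, 0]]

levels = noise_level_map(mask)
noisy = add_region_noise(frame, mask)
```

Pixels inside the mask keep most of the control video's signal, so the model has little freedom there, while the heavily noised background leaves room to generate new content.
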
So to make this all as easy as possible,
I slapped together some AI models in
ComfyUI. Here you just import a video and
it automatically generates a black and
white mask for your character using the
new SAM 3 model from Meta. This workflow
also extracts the start and end frames.
And if you want, you can use the Qwen
Image Edit AI model to remove any sticks
or unwanted objects or clean up your
character if needed. Then you give it a
simple prompt describing the action of
your character and hit run. And that's
it. That's all there is to it. It's
surprisingly easy and very robust. I
remember when we created a controllable
creature for a previous short film using
Wan VACE. We needed countless iterations
to get the movement right. It was really
exhausting. But with this workflow,
about 70% of the shots worked just on
the first try. To use this workflow,
you'll need ComfyUI, which is an
open-source AI interface. If you don't
have ComfyUI installed yet, we've
prepared a step-by-step guide on our
website that walks you through
everything. But fair warning, if this is
your first time using ComfyUI, you might
want to start with a simpler workflow.
Let's start by bringing this shot to
life. First, we must prepare a mask for
the character. And we can paint out that
stick in the start and end frame. Now,
it really doesn't matter how you prepare
this. If you want, you can use After
Effects for the mask or Nano Banana to
paint out the stick or Photoshop or
something. But we also created this free
workflow that lets you do all the
preparation straight in ComfyUI. So to use
it, just drag and drop it into ComfyUI.
Now in your case, you might need to
click Manager, then Install Missing Custom
Nodes, if you have any red nodes in the
workflow. And let's zoom in on the left
side here. Here you can find all the
model loader nodes and you can find the
corresponding model that you need to the
left in this node right here. For the
Qwen Image model, you can see that there
are different versions and you need to
pick the one that comfortably fits in
your GPU's VRAM. Once you've downloaded
all these models and made sure that all
the correct ones are loaded, go to the
load video nodes and load in your plate.
If you want, you can name your shot
right here. For us, this was 20. And
then click run. Wait for the images to
load. Let's first create the mask for
our character. For this, we're using the
new SAM 3 model by Meta. If your
character enters at a later stage in the
video, so it's not there in the first
frame, you can just change the pick
frame right here. But for me, the
character is already in the video. So,
all I need to do is just click on this
character. And then I have to specify
which parts do not belong to the
character. For this, I'm just
right-clicking on the other parts of the image
like so. It's really important to
exclude the stick that your character is
on. So, I will also put a red dot right
here. Then you just click run. And after
a few seconds, our mask is done. And you
can see it flickers a little bit, but
that doesn't matter at all. Just ignore
that. Let's now clean up the start
frame. For this, I'm zooming in on this
part right here. And now I need to copy
over this first frame. For this, just
copy and paste. Now, you can click open
in mask editor. And then you can select
the area where your stick is. And you
can be generous there. Left click to
select and right click to delete parts
again. Click save. Come over to the
right here and add a simple prompt like
"remove the wooden stick." Click run and
the stick is gone. But you can also
change these frames in more creative
ways. For example, you can see that the
Lego character is like hovering above
the table. So what we could also do is
just go back to the start here and
create a bigger mask. Something like
this. And I'm just creating a new
prompt: "Remove the stick and make the
Lego figure stand on the table." And you
can see that worked really well, though
it changed the legs a little bit. So
this prompt worked a lot better. I also
added "do not change the look of the
figure." And yeah, that worked. So
let's also remove the stick from the end
frame. Open mask editor. I'm selecting
the stick. Go over here. Create a
prompt: "Remove the stick." And that looks
good. The stick is gone. And now we have
everything we need. You could say it's
time to move. So drag and drop that
workflow in here. And you install this
one in the exact same way: install
missing custom nodes, restart, and then
you can find all the models that you
need in these model loader nodes right
here. Once you have everything set up,
you can import the start and end frame.
You will find these in your ComfyUI
output folder, in a subfolder named after the
shot number that you created, and then
you can just drag and drop these in. So
this is my start frame right here and
then this is my end frame. Below that
you will need to import the plate of
your moving character and below that you
need to import the mask that we just
created. Next, come up here to the setup
and here you can select which resolution
you want to use. Next, you can choose if
you want to use the start frame and end
frame. You can use only a start frame or
only an end frame. But you still need to
import an image right here. Otherwise,
it will give you an error. For static
shots, a start frame is usually enough,
but if you really want to make sure that
your character does not change over the
duration of your shot, I would recommend
going with both. Next, you need to
create a simple prompt, something like
this. And then you can just click run.
The sampling process for the video is
split into two parts. And after three
steps, you already get a rough preview
like this. And usually you can already
tell if your shot is working or not.
Otherwise, you can just quit the
process. And I would recommend trying
another seed or adjusting your prompt.
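
The preview-then-commit habit described here can be sketched as a small loop. This is only an illustration of the control flow: `run_steps` and `looks_ok` are hypothetical callables standing in for the sampler and for your own visual check, and `total_steps=8` is an assumed value (only the three preview steps come from the video).

```python
def sample_with_preview(run_steps, looks_ok, seeds, preview_steps=3, total_steps=8):
    """Try seeds in order; only finish sampling when the rough preview passes."""
    for seed in seeds:
        # First part: a few steps to get a rough preview.
        preview = run_steps(seed, None, preview_steps)
        if not looks_ok(preview):
            continue  # quit early and move on to the next seed
        # Second part: finish the remaining steps from the preview state.
        final = run_steps(seed, preview, total_steps - preview_steps)
        return seed, final
    return None, None


# Stand-in sampler: the "state" is just (seed, steps done so far).
def fake_run_steps(seed, state, steps):
    done = 0 if state is None else state[1]
    return (seed, done + steps)


# Stand-in check: pretend only even seeds give a usable preview.
chosen_seed, result = sample_with_preview(
    fake_run_steps, lambda state: state[0] % 2 == 0, seeds=[1, 3, 4]
)
```

Rejected seeds only cost the preview steps, which is the point of checking early rather than waiting for a full render.
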
Okay, first try and the result already
looks amazing. You can see how well it
integrates into the shot. But the
problem is that the legs kind of
separate and then start sticking back
together. And I think I can just fix
this using the prompt, I guess. Well,
and this kind of worked. Looks much
better now with this prompt. So, this is
the whole process. And as you can see,
it works really well. Now is a good
time to mention that this video and the
free workflows are sponsored by our
lovely supporters on Patreon. Thank you
for supporting us on Patreon, keeping us
free and independent, and also allowing
us to share all these workflows for
free. If you want access to advanced
workflows, extra demo files, and our
amazing Discord community, consider
supporting. So, that's how we created
all the shots of the animated toys. But
there were also those shots where I
needed to turn myself into my 15-year-old
self. For this, I wanted to use Wan
Animate, which is based on Wan 2.2, but
specifically designed for character
animation. The concept is pretty simple.
You just need a driving video of your
performance and a reference image of the
character you want to transform into.
When I tested it a few weeks ago, it
worked pretty well. So, I just went
ahead and shot everything without doing
proper tests, trusting it would work out
of the box. In the end, I spent more
time wrestling with Wan Animate than I
actually spent on the toy animation
workflow. It started pretty promising,
though. I used this image of me when I
was around 15 as a reference image, and
the shot itself looked pretty decent.
The problem was that I looked pretty
different in every single shot. But I
had an idea that I wanted to try for a
long time. Since Wan Animate and Wan 2.2
are based on the same model, LoRAs
trained for Wan 2.2 will work for Wan
Animate as well. For those who don't
know, a LoRA is pretty much like a small
extra model that you can train to help
the main model better understand a
specific concept that it didn't know
before. So, I asked my parents for more
photos of me when I was 15. And then I
used AI Toolkit to train the LoRA. So,
I created my data set with some very
basic captions like this. And then I
used these settings right here. Feel
free to copy them if you want. Once it
was done, I downloaded the LoRA, added
it at full strength, and look at how
much better these results are. There are
still some issues, especially with like
eye direction, but that's something I
would like to fix in a future video. So,
without further ado, here's the final
short film.
Ow.
Yeah.
Heat.
No,
it's you guys. Wait, you just robbed
neglected because I'm playing video
games all the time. I'm I'm so sorry.
>> No, nerd. We just want you to go
outside.
>> All right, that's it for this one. Thank
you so much for watching and thank you
to our lovely Patreon supporters for
making these videos possible. As always,
if you create something with these
workflows, feel free to tag me or send
it to me. I always love to see what you
come up with. Make sure to subscribe and
see you next time.
