HuggingFace Crash Course - Sentiment Analysis, Model Hub, Fine Tuning

0h 38m video Transcribed Jun 30, 2026 Watch on YouTube ↗
Intermediate 8 min read For: Python developers with basic knowledge of NLP and PyTorch/TensorFlow.
128.1K views · 2.8K likes · 94 comments · 53 dislikes · 2.2% (📈 Moderate)

AI Summary

Patrick introduces the Hugging Face Transformers library, a popular NLP library in Python that integrates with PyTorch or TensorFlow. The tutorial covers building a sentiment analysis pipeline, exploring the model hub, and fine-tuning a custom model.

[0:00]
Introduction to Hugging Face Transformers

The library provides state-of-the-art NLP models with a clean API, making it easy to build powerful pipelines.

[0:41]
Installation and Setup

Install PyTorch or TensorFlow first, then run 'pip install transformers' or use conda.

[1:06]
Using the Pipeline for Sentiment Analysis

Import 'pipeline' from transformers, create a classifier with 'pipeline('sentiment-analysis')', and classify text with two lines of code.

[3:41]
Processing Multiple Texts

Pass a list of texts to the pipeline to get multiple results at once.
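
The two pipeline steps above can be sketched roughly as follows. This is a minimal sketch, assuming `transformers` and a backend such as PyTorch are installed and that the default model can be downloaded on first use. The helper names `build_classifier` and `classify` are hypothetical and do not come from the video; the example sentences are based on the ones quoted in the transcript.

```python
def build_classifier(model_name=None):
    """Return a sentiment-analysis pipeline (hypothetical helper).

    The import is lazy so this module still loads where
    transformers is not installed.
    """
    from transformers import pipeline

    # model=None lets the library pick its default model for the task.
    return pipeline("sentiment-analysis", model=model_name)


def classify(classifier, texts):
    """Classify a single string or a list of strings with the given pipeline."""
    return classifier(texts)


# Example usage (downloads a model on first run):
#   classifier = build_classifier()
#   results = classify(classifier, [
#       "We are very happy to show you the Transformers library.",
#       "We hope you don't hate it.",
#   ])
#   for result in results:
#       print(result)  # a dict with 'label' and 'score' keys
```

Building the pipeline once and reusing it avoids reloading the model for every call.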

[4:54]
Specifying a Model and Tokenizer

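Store a model identifier in a variable such as 'model_name' (e.g., 'distilbert-base-uncased-finetuned-sst-2-english') and pass it to the pipeline through the 'model' argument. This checkpoint is already the default for the sentiment-analysis task, so the results do not change.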

[6:45]
Manual Tokenization and Model Inference

Import 'AutoTokenizer' and 'AutoModelForSequenceClassification', use 'from_pretrained' to load them, then tokenize text and get predictions manually.
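
The three tokenizer calls compared in the video can be sketched as below. This is only an illustration, assuming `transformers` is installed and the checkpoint can be downloaded. `inspect_tokenization` and `wrap_with_special_tokens` are hypothetical helper names, and the default ids 101 and 102 are the values shown in the video for this BERT-style vocabulary; other models may use different ids.

```python
def inspect_tokenization(
    text,
    model_name="distilbert-base-uncased-finetuned-sst-2-english",
):
    """Return (tokens, token_ids, input_ids) for one sentence."""
    from transformers import AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # 1) Split the text into string tokens.
    tokens = tokenizer.tokenize(text)
    # 2) Map each string token to its integer id.
    token_ids = tokenizer.convert_tokens_to_ids(tokens)
    # 3) Call the tokenizer directly: returns a dict-like object with
    #    'input_ids' and 'attention_mask', including special tokens.
    encoded = tokenizer(text)
    return tokens, token_ids, encoded["input_ids"]


def wrap_with_special_tokens(token_ids, cls_id=101, sep_id=102):
    """Show how input_ids relate to token_ids: start id + ids + end id."""
    return [cls_id] + list(token_ids) + [sep_id]
```

With the real tokenizer, `input_ids` should match `wrap_with_special_tokens(token_ids)` for a short sentence, which is the relationship the video points out.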

[13:01]
Batch Processing with Padding and Truncation

Use tokenizer with arguments like 'padding=True', 'truncation=True', 'max_length=512', and 'return_tensors='pt'' to prepare batches for PyTorch.
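
To make the effect of padding and truncation concrete, here is a plain-Python sketch of what those arguments do to a batch. It is not the library's implementation: `pad_and_truncate` is a hypothetical helper, the token ids other than 101 and 102 are made up, a pad id of 0 is an assumption, and the real tokenizer also keeps the closing special token when it truncates, which this sketch ignores.

```python
PAD_ID = 0  # assumed padding id; check tokenizer.pad_token_id for a real model


def pad_and_truncate(sequences, max_length=512, pad_id=PAD_ID):
    """Cut every sequence to max_length, then pad to the longest one."""
    truncated = [seq[:max_length] for seq in sequences]
    longest = max(len(seq) for seq in truncated)

    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_and_truncate([[101, 7, 8, 9, 102], [101, 7, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Every row in the batch ends up with the same length, which is what allows the batch to be returned as a single tensor when 'return_tensors' is set.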

[15:14]
Manual Inference with PyTorch

Disable gradient tracking with 'torch.no_grad()', pass the batch to the model, apply softmax to logits, and get predictions using 'torch.argmax'.
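
The post-processing step can be illustrated without PyTorch. The sketch below applies softmax and argmax to made-up logits in plain Python; the logit values are invented for illustration, and the `id2label` mapping is an assumption that mirrors the positive/negative labels shown in the video (real code would read it from `model.config.id2label`).

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return values.index(max(values))


id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed mapping
logits_batch = [[-4.0, 4.0], [0.1, -0.1]]  # made-up logits, one row per text

probabilities = [softmax(row) for row in logits_batch]
label_ids = [argmax(row) for row in probabilities]
labels = [id2label[i] for i in label_ids]
print(labels)
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw logits gives the same label ids, which is why the video notes that the softmax is only needed when the probabilities themselves are of interest.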

[21:50]
Saving and Loading Models

Save tokenizer and model with 'save_pretrained(directory)' and load them back with 'from_pretrained(directory)'.
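
A rough sketch of the save and reload round trip follows, assuming `transformers` is installed for the loading half. The helper names are hypothetical; the directory name 'saved' is the one used in the video.

```python
def save_artifacts(model, tokenizer, save_directory="saved"):
    """Write tokenizer and model files into one directory."""
    tokenizer.save_pretrained(save_directory)
    model.save_pretrained(save_directory)
    return save_directory


def load_artifacts(save_directory="saved"):
    """Reload both objects; from_pretrained accepts a directory or a model name."""
    from transformers import (  # lazy import
        AutoModelForSequenceClassification,
        AutoTokenizer,
    )

    tokenizer = AutoTokenizer.from_pretrained(save_directory)
    model = AutoModelForSequenceClassification.from_pretrained(save_directory)
    return model, tokenizer
```

Saving both objects into the same directory keeps the model weights and the vocabulary they were trained with together.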

[23:36]
Exploring the Model Hub

Visit huggingface.co/models to search for pre-trained models by task (e.g., text classification) and language (e.g., German).

[25:18]
Using a German Sentiment Model

Load a German sentiment model (e.g., 'oliverguhr/german-sentiment-bert') and test it on German sentences.

[29:30]
Fine-Tuning a Model Overview

Five steps: prepare the dataset, load a pre-trained tokenizer and encode the data, build a PyTorch dataset, load a pre-trained model, and train using Trainer or a custom loop.

[31:28]
Fine-Tuning with Trainer

Use 'Trainer' and 'TrainingArguments' from transformers to simplify training; specify epochs, output directory, learning rate, etc.
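
A Trainer-based fine-tuning run might look roughly like the sketch below. It is not the code from the video: the function name `fine_tune`, the base checkpoint 'distilbert-base-uncased', the two-label setup, and all hyperparameter values are placeholder assumptions, and it requires `torch` and `transformers` to be installed.

```python
def fine_tune(train_texts, train_labels,
              model_name="distilbert-base-uncased",  # assumed checkpoint
              output_dir="./results"):
    """Fine-tune a sequence classifier with Trainer (illustrative sketch)."""
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    class EncodedDataset(torch.utils.data.Dataset):
        """Wraps tokenizer encodings and labels as a PyTorch dataset."""

        def __init__(self, encodings, labels):
            self.encodings = encodings
            self.labels = labels

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

        def __len__(self):
            return len(self.labels)

    # Load the tokenizer and encode the raw texts.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    encodings = tokenizer(train_texts, truncation=True, padding=True)
    train_dataset = EncodedDataset(encodings, train_labels)

    # Load the pre-trained model with a fresh classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2  # assumed binary task
    )

    # Placeholder hyperparameters; tune them for a real dataset.
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=2,
        learning_rate=5e-5,
        per_device_train_batch_size=16,
    )
    trainer = Trainer(model=model, args=training_args,
                      train_dataset=train_dataset)
    trainer.train()
    return trainer
```

A call such as `fine_tune(["good", "bad"], [1, 0])` would run end to end but is far too small to train anything useful; it only shows how the pieces connect.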

[36:22]
Custom PyTorch Training Loop

For more flexibility, use a native PyTorch loop with DataLoader, optimizer (e.g., AdamW), and manual forward/backward passes.
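
The native loop could be sketched as follows, assuming `torch` is installed, that `model` is a Hugging Face sequence-classification model, and that `train_dataset` yields dicts of tensors that include a 'labels' key. The function name `train_loop` and the hyperparameter defaults are placeholders rather than values from the video.

```python
def train_loop(model, train_dataset, epochs=2, lr=5e-5, batch_size=16):
    """Minimal PyTorch training loop for a Hugging Face model (sketch)."""
    import torch
    from torch.utils.data import DataLoader

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.train()  # enable training behaviour such as dropout

    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            batch = {k: v.to(device) for k, v in batch.items()}
            # Passing 'labels' makes the model return a loss.
            outputs = model(**batch)
            outputs.loss.backward()
            optimizer.step()

    model.eval()  # switch back to inference mode
    return model
```

Compared with Trainer, this version exposes each step, so custom logging, gradient clipping, or a scheduler can be added where needed.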

The Hugging Face Transformers library simplifies NLP tasks like sentiment analysis through high-level pipelines and also allows manual control for fine-tuning. With the model hub, you can leverage pre-trained models for multiple languages and tasks.

Clickbait Check

95% Legit

"The title accurately reflects the content: a crash course covering sentiment analysis, model hub usage, and fine-tuning."

Tutorial Checklist

1 0:41 Install PyTorch or TensorFlow, then run 'pip install transformers'.
2 1:06 Import 'pipeline' from transformers and create a sentiment analysis pipeline.
3 3:41 Pass a list of texts to the pipeline for batch classification.
4 4:54 Specify a model name (e.g., 'distilbert-base-uncased-finetuned-sst-2-english') and pass it to the pipeline.
5 6:45 Import 'AutoTokenizer' and 'AutoModelForSequenceClassification', then load them with 'from_pretrained(model_name)'.
6 13:01 Tokenize batch data with arguments: padding=True, truncation=True, max_length=512, return_tensors='pt'.
7 15:14 Use 'torch.no_grad()', pass batch to model, apply softmax to logits, and get predictions with 'torch.argmax'.
8 21:50 Save model and tokenizer with 'save_pretrained(directory)' and load with 'from_pretrained(directory)'.
9 23:36 Search the model hub for a pre-trained model (e.g., German sentiment) and load it by name.
10 31:28 For fine-tuning, use 'Trainer' and 'TrainingArguments' from transformers, or implement a custom PyTorch training loop.

Study Flashcards (14)

What is the Hugging Face Transformers library?

Difficulty: easy

A popular NLP library in Python that provides state-of-the-art models with a clean API.

How do you install the transformers library?

Difficulty: easy

Run 'pip install transformers' or use conda.

0:41

What does the 'pipeline' function do?

Difficulty: easy

It provides an easy way to use a model for inference, abstracting many details.

1:06

How do you classify multiple texts at once using the pipeline?

Difficulty: easy

Pass a list of texts to the pipeline.

3:41

What is the purpose of 'AutoTokenizer' and 'AutoModelForSequenceClassification'?

Difficulty: medium

They are generic classes to load a tokenizer and model for sequence classification tasks.

6:45

What does 'from_pretrained' do?

Difficulty: medium

It loads a pre-trained model or tokenizer from a name or directory.

8:05

What is the difference between 'tokenizer.tokenize' and calling the tokenizer directly?

Difficulty: medium

'tokenizer.tokenize' returns a list of tokens (strings), while calling the tokenizer directly returns a dictionary with 'input_ids' and 'attention_mask'.

9:32

What do the special tokens 101 and 102 represent?

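Difficulty: hard

They are the special tokens the tokenizer adds at the start and end of the sequence; in BERT-style vocabularies 101 is [CLS] and 102 is [SEP]. The video calls them the beginning-of-string and end-of-string tokens.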

12:38

What arguments should be used when tokenizing a batch for PyTorch?

Difficulty: medium

padding=True, truncation=True, max_length=512, return_tensors='pt'.

13:01

How do you get predictions from model outputs?

Difficulty: hard

Apply softmax to 'outputs.logits' along dimension 1, then use 'torch.argmax' to get the label index.

15:14

How do you save and load a fine-tuned model?

Difficulty: medium

Use 'save_pretrained(directory)' to save and 'from_pretrained(directory)' to load.

21:50

Where can you find pre-trained models for different tasks?

Difficulty: easy

On the Hugging Face model hub at huggingface.co/models.

23:36

What are the five steps for fine-tuning a model?

Difficulty: hard

Prepare the dataset, load a pre-trained tokenizer and encode the data, build a PyTorch dataset, load a pre-trained model, and train.

31:28

What is the 'Trainer' class used for?

Difficulty: medium

It abstracts away the training loop, making fine-tuning easier.

33:17

💡 Key Takeaways

💡

Introduction to Hugging Face

Presents the library as probably the most popular NLP library in Python.

🔧

Two-Line Sentiment Analysis

Demonstrates the simplicity of the pipeline API.

1:06
⚖️

from_pretrained Function

Key function for loading models and tokenizers.

8:05
🔧

Model Hub Exploration

Shows how to find and use pre-trained models for different languages.

23:36
🔧

Fine-Tuning Overview

Provides a clear five-step process for custom model training.

31:28

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Build Sentiment AI in 2 Lines

30s

Shows how incredibly easy it is to create a sentiment analysis model, shocking viewers with the simplicity.

Model Hub: Free AI Models

59s

Reveals the massive library of free pre-trained AI models, making viewers curious to explore and use them.

Fine-Tune Your Own AI Model

59s

Demystifies the fine-tuning process with a clear 5-step guide, empowering viewers to customize AI for their own needs.

Sentiment Analysis in German

59s

Demonstrates multilingual capabilities, showing the model works on non-English text and expanding its appeal.

Hugging Face Trainer vs Manual

59s

Compares the easy trainer with manual PyTorch training, giving viewers options for different skill levels.

[00:00] hi everyone i'm patrick and in today's

[00:02] video we are going to learn how to get

[00:03] started with hugging face and the

[00:05] transformers library

[00:07] the hugging face transformers library is

[00:09] probably the most popular nlp library in

[00:12] python right now

[00:13] and it can be combined directly with

[00:14] pytorch or tensorflow

[00:16] it provides state-of-the-art natural

[00:19] language processing models and has a

[00:21] very clean api that makes it extremely

[00:23] simple to build powerful

[00:25] nlp pipelines so today we have a first

[00:27] look at the library and build a

[00:29] sentiment

[00:30] classification algorithm i show you some

[00:32] basic functions

[00:33] and then we have a look at the model hub

[00:35] and then i also show you how you can

[00:37] fine-tune your own model

[00:38] so let's get started all right so to get

[00:41] started you should

[00:42] either install pytorch or tensorflow

[00:45] first

[00:46] and then in order to install the

[00:48] transformers library you just have to

[00:50] say

[00:51] pip install transformers

[00:54] or there's also a conda installation

[00:56] command that you can find on the

[00:58] installation page so let's

[01:02] install it like this so i already did

[01:04] this and then we can start using this so

[01:06] we can say

[01:07] from transformers and then we import

[01:10] a pipeline as first thing and have a

[01:13] look at this

[01:14] and then we also import some utilities

[01:18] that we need from the

[01:19] pytorch library so we import torch

[01:22] and we import torch dot nn

[01:25] dot functional as f so we're going to use

[01:29] this

[01:29] later and now we can start using this

[01:33] pipeline so let's say classifier

[01:36] equals and then we create a

[01:39] pipeline and we need to specify the

[01:42] task that we want so in this case we

[01:45] want to do

[01:46] sentiment analysis so we have to call it

[01:50] like

[01:50] this and you will find the different

[01:54] available tasks on the website

[01:58] so here we can see for example we have

[02:01] this

[02:01] sentiment analysis which is just an

[02:05] alias of text classification but for

[02:08] example we also have a

[02:09] question answering pipeline or a text

[02:12] generation or a conversational pipeline

[02:16] so yeah this is how we can define a

[02:18] pipeline

[02:19] and what a pipeline does is that it

[02:22] gives you a great and easy way to use

[02:25] model for inference and it abstracts a

[02:28] lot of the things for you

[02:30] so you will see what i mean in a moment

[02:33] so now we can just use this classifier

[02:36] and classify some text by saying

[02:39] res for results equals

[02:42] and then we call this classifier and we

[02:46] want to classify a example text

[02:49] so let me copy and paste some example

[02:52] text for you

[02:54] so we want to classify we are very happy

[02:56] to show you

[02:57] the smiley face transformers library and

[03:00] then let's print

[03:02] the result and see how this looks like

[03:05] so let's run the code all right and as

[03:08] you can see we get the label

[03:09] is positive and the score is 0.99 so

[03:13] it's very confident that this is

[03:15] a positive sentence and as you can see

[03:17] it only takes

[03:18] two lines of code with this pipeline to

[03:21] create a

[03:22] sentiment analysis code so

[03:26] yeah this is exactly what we need so we

[03:28] need to see the

[03:29] label of the text if it's negative or

[03:31] positive

[03:32] and we also get the score so yeah this

[03:35] is really nice

[03:36] and now let's have a look at some more

[03:38] things that we can do with this pipeline

[03:41] so first of all we can put in

[03:44] more texts at once so we can not just

[03:47] use

[03:48] one so we can give it a list so let's

[03:50] for example use a list

[03:52] and then let's use another example text

[03:55] so let me

[03:56] copy and paste this one in here as well

[04:00] so we also want to classify this we hope

[04:03] you don't

[04:04] hate it and then we get multiple

[04:07] results back so let's call this results

[04:10] and then we can iterate over this so we

[04:12] can say for

[04:13] result in results

[04:16] and then we want to print the result

[04:19] and now let's run this code and have a

[04:22] look at how this looks like

[04:24] all right and as you can see for the

[04:26] second text we get

[04:28] another result back so here the label is

[04:31] negative and the score is maybe not that

[04:34] confident in this case

[04:35] so this text might be a little bit

[04:37] confusing we hope

[04:39] you don't hate it but basically this is

[04:41] how you can pass in multiple texts at

[04:44] once

[04:44] and now so right now we only use

[04:48] the default pipeline with the default

[04:51] model but now let's have a look at how

[04:53] we can use a

[04:54] concrete model and then also how you can

[04:57] use a concrete

[04:58] tokenizer so what we can do is

[05:02] we can specify the model name

[05:05] and say model name equals and in this

[05:09] case i use

[05:10] this distilbert base uncased and then

[05:13] fine tuned sst-2 english so i will show

[05:17] you where i got this

[05:19] string or this name in a moment

[05:22] but for now yeah this is basically just

[05:24] a distilbert model

[05:26] which is a smaller and faster version of

[05:30] bert but it was pre-trained on the same

[05:33] corpus

[05:34] and then you see that it also was

[05:36] fine-tuned and this is just the name of

[05:38] the data set so in this case

[05:40] it's an english data set from the

[05:43] stanford sentiment tree bank version two

[05:46] and yeah so now if we have the model

[05:48] name we can

[05:49] give this to our pipeline with the model

[05:53] argument so we can say model equals and

[05:56] then we use this model name

[05:58] so now in this case i can tell you that

[06:01] the

[06:01] default model for this sentiment

[06:04] analysis task

[06:06] is already this model name so this

[06:08] should do

[06:09] exactly the same but later we will

[06:12] switch this and then have a look at how

[06:14] we can use different models

[06:16] so first of all let's run this again and

[06:19] see that this is still the same

[06:21] all right so we see this is still the

[06:23] same result so this worked

[06:25] so now we um just use

[06:28] this string to define our model but now

[06:31] let's have a different

[06:33] approach to define a model and then also

[06:36] a

[06:36] tokenizer so this will give us a little

[06:39] bit more flexibility

[06:40] later so in order to do this we want to

[06:44] say

[06:45] from transformers and then here i

[06:48] import a auto tokenizer class

[06:51] and auto model for

[06:54] sequence classification and this is

[06:58] just a generic class for a tokenizer

[07:02] and this is also a generic class but a

[07:05] little bit more specific so in this case

[07:08] i want to have it for sequence

[07:10] classification

[07:11] and then it will give me a little bit

[07:13] more functionality

[07:14] specifically for this task so don't

[07:18] worry about this right now you can

[07:20] also find all the model classes

[07:22] available

[07:23] in the documentation so if you're

[07:25] interested then have a look at this

[07:27] and also if you use tensorflow then

[07:30] here you have to say tf and then

[07:33] the name of this class but the rest is

[07:36] actually

[07:36] the same so yeah this is how you use

[07:39] tensorflow

[07:40] and now after importing this

[07:43] we can create um two instances of this

[07:47] so we can do we can say model

[07:50] equals and then we use this class

[07:54] so auto model for sequence

[07:56] classification

[07:58] and then we use a function that is

[08:00] called so let's say

[08:02] dot from pre-trained

[08:05] and then it also needs the model name

[08:07] and we do the same with the tokenizer so

[08:10] we say

[08:11] tokenizer equals the auto tokenizer

[08:15] dot from pre-trained and then it needs

[08:18] the

[08:19] model name so this dot from

[08:23] pre-trained function is a very important

[08:26] function in hugging face that you will

[08:28] see a lot

[08:29] so you will see this later a few more

[08:31] times so

[08:33] now that we created this we can also

[08:37] just give the actual model and not just

[08:40] the string

[08:41] to the classifier or to the pipeline

[08:44] so we can say our model equals

[08:47] our model and our tokenizer

[08:50] equals our tokenizer so

[08:54] now if we run this we should still get

[08:56] the same results because these are the

[08:59] default versions and yeah as we see we

[09:02] still get the same result

[09:03] but then later um if you want to use a

[09:06] different

[09:07] model or tokenizer then you know how you

[09:09] can switch this

[09:11] so just by using a different model and

[09:13] tokenizer here for the pipeline so now

[09:16] instead of using this

[09:17] pipeline let's see how we can use this

[09:21] model and tokenizer directly and do some

[09:24] of the steps manually

[09:26] and this will give you a little bit more

[09:28] flexibility

[09:29] so down here um let's first

[09:32] use the tokenizer and see what this

[09:36] does so first let's

[09:39] um call the tokenizer.tokenize function

[09:44] so we say let's call this tokens and

[09:47] then

[09:48] equals tokenizer dot tokenize

[09:52] and then the string or the sentence we

[09:54] want to tokenize

[09:56] so let's copy and paste this in here

[09:59] and then once we get the tokens we can

[10:02] use them and get the

[10:04] token ids out of it so we can say

[10:07] token ids equals and then we

[10:11] again use the tokenizer and the function

[10:15] convert tokens to

[10:18] it's called ids and then it needs

[10:21] the tokens so this is one way how to do

[10:26] this

[10:26] or we can um do this directly by saying

[10:30] token ids equals and then we

[10:34] call this tokenizer like a function

[10:38] and then again we give it the same

[10:41] string here so now let's

[10:45] print all these three variables to see

[10:48] where is the difference

[10:50] so first we print the tokens then we

[10:53] print the token ids

[10:55] and then here let's actually

[10:58] give this a different name so let's call

[11:01] this

[11:02] input ids so

[11:05] now let's run this and see how this

[11:07] looks like all right so here is the

[11:09] result so as you can see when we call

[11:12] tokenizer.tokenize then we get

[11:15] a

[11:16] list of strings or the list of the words

[11:20] back so now

[11:21] each word is a oh sorry

[11:24] each word is a separate token

[11:28] and for example this one is our smiley

[11:32] face or our emoji

[11:34] so yeah this is what the tokenize

[11:37] function

[11:37] does and then once we call this

[11:41] convert tokens to ids we get

[11:44] this one back so now it converted

[11:47] each token to an id so

[11:50] each word has a very unique

[11:53] id and this is basically the

[11:56] mathematical

[11:57] representation or the numerical

[11:59] representation that our model then can

[12:02] understand

[12:03] so this is what we get after this

[12:05] function and if we call this tokenizer

[12:08] directly then we get a dictionary back

[12:12] and here we have the key input ids

[12:15] and we also have the attention mask so

[12:18] for now you don't really have to worry

[12:20] about this

[12:21] but let's have a look at the input ids

[12:25] so if we compare the token ids with the

[12:29] input ids then we see we have the exact

[12:32] same

[12:33] sequence of token ids but we also have

[12:37] this 101

[12:38] and 102 token and this is

[12:41] just the beginning of string and the end

[12:44] of string

[12:45] token but basically it's the same

[12:48] so yeah this is the difference between

[12:50] these three

[12:51] functions and then these input ids

[12:54] this is what we can pass to our model

[12:58] later to do the predictions manually

[13:01] so now like before we can also use

[13:04] multiple

[13:04] um sentences of course to for our

[13:07] tokenizers so

[13:09] um for example usually in your code you

[13:12] have your

[13:13] training data so let's say x train

[13:16] and in this example let's just use these

[13:19] two

[13:20] sentences so this is our x train

[13:23] and then we can um and then we can pass

[13:27] this to our

[13:28] tokenizer and let's call this batch so

[13:31] this is

[13:32] our batch that we put into our model

[13:35] later

[13:35] so we say batch equals tokenizer and

[13:39] then we call this

[13:40] tokenizer directly with our training

[13:43] data

[13:44] and then i also want to show you some

[13:46] useful arguments so we say

[13:48] padding equals true and we also say

[13:52] truncation

[13:53] equals true and then we say

[13:56] max length equals 512

[14:01] and we say return tensors

[14:04] equals and then as a string pt

[14:08] for pytorch so this will ensure that

[14:11] all of our samples in our batch have the

[14:14] same

[14:15] length so it will apply padding and

[14:18] truncation if necessary

[14:20] and this is also important so in this

[14:23] case we want to have a

[14:25] pytorch tensor returned directly

[14:28] so i will show you later what's the

[14:30] difference if you don't use this

[14:33] but for now let's just use this and then

[14:36] um first of all let's print this

[14:39] batch and see how this looks like and

[14:42] then

[14:42] we see we get a dictionary

[14:45] and again it has the key input ids

[14:49] and the key attention mask and then here

[14:52] it has

[14:53] two tensors so the first one

[14:56] for the first sentence and the second

[15:00] one for the

[15:01] second sentence and the same for the

[15:03] attention mask so two tensors

[15:05] so yeah as i said these input ids are

[15:08] these unique ids that our

[15:10] model can understand so yeah now we have

[15:13] this batch

[15:14] and now we can pass this to our

[15:17] model so and let's do this manually and

[15:21] see how we can call our model

[15:23] so in pytorch when we do inference we

[15:26] also want to say

[15:28] with torch dot no grad

[15:31] so this will disable the gradient

[15:33] tracking i explained this in

[15:36] a lot of my tutorials so you can just

[15:37] have a look at them if you want to learn

[15:39] more about this

[15:41] and then we can call our model by saying

[15:44] outputs equals and then we call

[15:47] the model and then here we use

[15:51] two asterisks and then we

[15:55] unpack this batch so if you remember

[15:58] here this is

[15:59] a dictionary and here basically

[16:02] with this we just unpack these

[16:06] values in our dictionary so for

[16:08] tensorflow you don't do this so

[16:10] you just pass in the batch like this but

[16:13] for pytorch you

[16:14] have to unpack this and now we get the

[16:17] outputs of our model

[16:19] so let's print the outputs and as you

[16:22] might know this

[16:23] these are just the raw values so

[16:26] to get the actual probabilities and the

[16:29] predictions

[16:30] we can apply a the softmax so let's say

[16:34] predictions equals torch or

[16:37] we also have this in f dot softmax

[16:40] and then here we say

[16:44] outputs dot logits and we want to do

[16:48] this along dimension

[16:49] equals one and let's also

[16:52] print the um predictions

[16:56] and then let's do one more thing so

[16:58] let's also get the

[17:00] labels labels equals and we just get

[17:03] this by

[17:04] taking the prediction with the or the

[17:09] index with the highest probability so we

[17:11] get this by saying

[17:12] torch dot argmax

[17:15] and we can either put in the predictions

[17:19] or we can put in the outputs and

[17:22] actually

[17:23] don't need this but just for

[17:25] demonstration

[17:26] uh let's use the predictions and then

[17:29] again

[17:29] dimension equals one and then let's

[17:33] print the labels as well

[17:36] and now let's actually do one more thing

[17:40] so let's convert the labels

[17:42] by saying labels equals and then we use

[17:45] list comprehension

[17:47] and call model dot config

[17:50] dot id to

[17:53] label and then it needs the

[17:56] actual label id

[18:00] and then we iterate so we say for

[18:04] label id in labels

[18:08] to list and now what this does you will

[18:12] see this when we print this so we print

[18:15] the labels and now

[18:19] let's actually run this and see if this

[18:22] works

[18:22] all right so this worked so as you can

[18:25] see

[18:26] um here we print the output

[18:30] so these are our output this is a

[18:33] sequence classifier output and as you

[18:37] see

[18:37] it has the logits argument so that's why

[18:40] we used

[18:42] outputs.logits and then we get the

[18:45] actual probabilities and

[18:49] then to get the labels we used argmax so

[18:52] this is a tensor with the label

[18:55] one and the label zero and then we

[18:58] converted each

[19:00] label to the actual class name and then

[19:03] we get

[19:03] positive and negative so by the way this

[19:07] function i think is only dedicated

[19:11] to a auto model for sequence

[19:13] classification

[19:15] for example if we just used an auto

[19:18] model then i

[19:18] think it won't be available so that's

[19:21] what

[19:22] these more um concrete classes will do

[19:25] for you it gives you

[19:27] a little bit more functionality for the

[19:29] dedicated task

[19:31] so we see that the loss is

[19:34] none in this case so if you also want to

[19:36] have

[19:37] a loss that we want to inspect then we

[19:40] can

[19:40] give the loss or the

[19:43] not the loss but the labels arguments

[19:47] to our model that um it knows how to

[19:49] compute the loss

[19:51] so we say labels and then we

[19:54] create a torch dot tensor by saying

[19:57] torch dot tensor and then as a list we

[20:01] give it the labels

[20:02] one and zero and now let's run this

[20:06] again

[20:06] and then you should see that we should

[20:08] see a loss here

[20:10] and yeah now here we see the loss and

[20:13] again

[20:13] this labels argument is i think

[20:17] special to this auto model for sequence

[20:20] classification

[20:22] so yeah this worked and now if we have a

[20:26] careful look at the probabilities

[20:30] so first of all we see we get label

[20:33] positive

[20:34] and negative and here for the first one

[20:37] this is the highest probability so 0.9997

[20:42] and here for the second one this is

[20:45] the largest number so it took this one

[20:49] and this

[20:49] is 0.530 so if we compare them

[20:53] with the results that we got from our

[20:56] pipeline

[20:57] then we see these are exactly the same

[21:01] numbers so now you might see

[21:04] what's the difference between a pipeline

[21:07] and

[21:07] using tokenizer and model directly

[21:10] so with the pipeline we only need two

[21:12] lines of code and then we actually

[21:15] get what we want so we get the label and

[21:17] we get the score we are interested in

[21:19] so this might be just fine but then yeah

[21:22] if you want to do it manually

[21:23] you can do it like i showed you and you

[21:25] will get the same results that you can

[21:27] then

[21:28] use so yeah that's how you can use a

[21:30] model and a

[21:32] tokenizer and yeah so using the model

[21:35] and the tokenizer will be important when

[21:38] you for example want to

[21:39] fine-tune your model so i will show you

[21:43] roughly how to do this later but

[21:46] yeah so this is how you use model and

[21:49] tokenizer

[21:50] and let's just assume we did

[21:53] fine tune our model then what we can do

[21:56] and we can say save directory and

[22:00] specify

[22:01] a directory so let's call the folder

[22:04] saved and then we can call tokenizer

[22:08] and then we can call dot save

[22:11] pre-trained

[22:12] and then the location just the save

[22:15] directory

[22:16] and the same with our model so we can

[22:18] say model

[22:19] dot save pre-trained save

[22:23] underscore pre-trained and then again

[22:27] the

[22:27] save directory and then we can load them

[22:30] in another application for example

[22:33] tokenizer

[22:34] equals and then again here we use this

[22:37] auto tokenizer class

[22:39] and then the from pre-trained and then

[22:42] here

[22:43] we can give it a directory so

[22:46] this from pre-trained we can either give

[22:49] it a

[22:50] model name or we can give it this

[22:52] directory

[22:54] and again the same for the model so

[22:56] model

[22:57] equals and then we use this auto model

[23:00] for

[23:00] sequence classification dot from

[23:03] pre-trained and then the save directory

[23:07] so this should work and then you should

[23:09] get the exact same

[23:11] model and tokenizer back and yeah as

[23:14] you might see

[23:14] these um model these dot

[23:18] from pre-trained functions are very

[23:21] important

[23:22] and you will use them a lot of time all

[23:24] right so i think these are the basic

[23:26] functions you need to build a pipeline

[23:29] or to apply the model and tokenizer

[23:31] manually

[23:33] and now let's have a look at how we can

[23:35] use a different

[23:36] model so like here you can either

[23:40] load this from your disk if you already

[23:42] have a pre-trained model somewhere on

[23:45] your computer

[23:46] but what you can also do is you can go

[23:49] to

[23:50] the hugging face model hub so you can

[23:52] find this at hugging face dot

[23:54] co slash models and here we have the

[23:58] model hub and you can search

[24:00] through different models so for example

[24:03] you

[24:04] could filter for the tasks so

[24:07] in this case we want to do text

[24:09] classification

[24:10] which is the same as sentiment analysis

[24:14] and then it applies this

[24:16] filter so

[24:17] you can see the most popular model

[24:20] is already this one and then we can

[24:23] click on this and get some more

[24:25] information

[24:26] and as you could see so this is the

[24:28] exact same

[24:30] model name that we used in our code

[24:34] so once you've decided for a model you

[24:36] can click here and copy this

[24:38] name and then paste into your code

[24:41] so let's say in this case we want to use

[24:44] a different model so in this case

[24:46] i want to do sentiment classification

[24:49] with

[24:49] german sentences so then of course i

[24:53] need one that is trained on

[24:55] german so you can filter here so you can

[24:59] search so i can either again

[25:01] search for distilbert and see what

[25:03] different versions there are available

[25:06] or let me search for german

[25:09] and then here let's take the most

[25:12] popular one so

[25:14] by oliver guhr and then we see this is a

[25:18] german sentiment bert and then we get

[25:21] more information and sometimes we also

[25:24] see

[25:24] some example code which is helpful so

[25:27] yeah this is nice and now what we have

[25:29] to do is we want to click here and

[25:31] copy this will just copy the name and

[25:35] then in our application let me

[25:38] comment this out and then let's again

[25:41] say

[25:42] model name equals and now i hit

[25:45] paste so now it pasted this

[25:48] string here so now we have this

[25:52] and now here we can give our model and

[25:55] tokenizer

[25:57] the model name so model name

[26:00] and model name and now let's do this for

[26:03] some

[26:04] example texts in german so let me copy

[26:07] and paste this in here so basically let

[26:10] me

[26:10] quickly translate this for you so this

[26:12] says not a good result

[26:15] this was unfair this was not good

[26:19] um not as bad as expected this

[26:22] was good and she drives a green car

[26:25] so basically these three texts are

[26:29] negative this one is rather positive and

[26:32] this

[26:33] is neutral so let's see if our model can

[26:36] detect this correctly

[26:38] so now again like above we do the same

[26:42] steps so

[26:43] we could copy and paste this so let's

[26:46] copy

[26:47] and paste this and then the same as

[26:50] above we say with

[26:53] torch.no_grad() and then we call

[26:57] the model so we say

[26:59] outputs equals model and then here we

[27:04] unpack our batch then we have the model

[27:08] then we want to have the label id so

[27:11] let's say

[27:11] label ids equals and then we

[27:15] use the torch.argmax function

[27:19] with the outputs and along dimension

[27:23] equals

[27:24] one and let me remove this one

[27:27] and then we print the label id so print

[27:30] the label ids

[27:32] and then we do the same as we do here so

[27:36] we want to

[27:36] convert them to the actual label names

[27:39] by calling

[27:42] model.config.id2label[label_id] for

[27:45] label_id in, and here we call

[27:49] label_ids.tolist(), and then print the labels

[27:53] and now let's run this and actually

[27:56] let's

[27:57] also print the batch in this

[28:00] case and uh let's have a look at what

[28:04] this looks like

[28:05] so let's run this and i get an error so

[28:08] here i forgot to say

[28:10] outputs.logits like we did before

[28:14] so let's try it again and this is only

[28:16] two results so

[28:18] of course here in our tokenizer we want

[28:21] to use

[28:21] these texts so let's call this

[28:25] X_train_german

[28:28] and then let's use

[28:31] X_train_german here and let's

[28:34] run it again all right and as we can see

[28:37] we get the

[28:38] labels one one one zero zero and

[28:42] two and this is equal to negative

[28:45] negative negative then two times

[28:47] positive and then neutral

[28:49] so yeah this is exactly what i told you

[28:52] the first three sentences are rather

[28:54] negative

[28:55] then two positive ones and this one is

[28:57] neutral
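The German example above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: the hub id `oliverguhr/german-sentiment-bert` is an assumption based on the author and model name mentioned, the helper names `logits_to_labels` and `classify` are hypothetical, and the hard-coded `ID2LABEL` mapping is only read off the output reported in the video (1, 1, 1, 0, 0, 2 shown as negative, negative, negative, positive, positive, neutral). The logits at the bottom are made up so the mapping step can be tried without downloading anything.

```python
import torch

# Mapping read off the output reported in the video. In real code, prefer
# model.config.id2label over a hard-coded dictionary.
ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}


def logits_to_labels(logits, id2label):
    """Pick the highest-scoring class per row and map the ids to label names."""
    label_ids = torch.argmax(logits, dim=1)
    return [id2label[label_id] for label_id in label_ids.tolist()]


def classify(texts, model_name="oliverguhr/german-sentiment-bert"):
    """Tokenize, run the model without gradient tracking, and return label names."""
    # Imported here so the helper above can be tried without loading a model.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**batch)
    # The model output is an object; the raw scores live in outputs.logits.
    return logits_to_labels(outputs.logits, model.config.id2label)


# Made-up logits for three texts: one row per text, one column per class.
example_logits = torch.tensor(
    [[0.1, 2.3, -0.5], [1.9, 0.2, 0.0], [-1.0, 0.3, 2.2]]
)
example_labels = logits_to_labels(example_logits, ID2LABEL)
print(example_labels)
```

Calling `classify(["Das war gut", "Sie fährt ein grünes Auto"])` would download the model on first use; the two sentences are illustrative and not necessarily the ones typed in the video.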

[28:58] so yeah now our german model works as

[29:01] well and this

[29:02] is how we can use different models

[29:05] so we simply search the model hub and

[29:09] hopefully there is an already

[29:11] pre-trained version for the task we want

[29:14] and then we can just use this here as

[29:16] our model name and then we are good to

[29:18] go

[29:19] or if there is not an already pre-trained

[29:22] version then we have to do this

[29:24] ourselves and fine-tune our own model so

[29:27] i will show you how you do this in a

[29:29] moment

[29:30] but now one more thing i want to mention

[29:32] so

[29:33] um i want to talk about this

[29:36] return_tensors="pt" so

[29:40] um if we print the batch here and

[29:44] here the input ids and then we see

[29:47] this is a tensor so right now it's

[29:50] already

[29:50] in the PyTorch format so we could

[29:54] use tensorflow here or we just um

[29:57] omit this and if we omit this

[30:00] then we don't have this in the tensor

[30:04] format

[30:04] so now it is just a python list i think

[30:08] but then what you could do is you could

[30:11] convert this so we can say

[30:13] batch equals and then we convert this to

[30:17] a tensor by saying

[30:18] torch dot tensor and then we

[30:21] give it the we call this batch

[30:24] and this is a dictionary so we can say

[30:28] batch and then access the key input

[30:33] ids like we see here and now

[30:36] we created an actual tensor out of this

[30:40] and then we don't have to

[30:43] unpack it like this here so now we

[30:45] remove this

[30:46] and then if we run it again then this

[30:49] should work as well

[30:51] and yeah this worked too so we get the

[30:53] same result

[30:54] and here we printed our batch and now we

[30:56] see this is a

[30:57] tensor directly so yeah be careful here

[31:00] to specify

[31:02] what you want so it's actually if you

[31:05] use pytorch then it's just simpler to

[31:08] use this as a return argument so

[31:12] return_tensors="pt" but if you don't

[31:16] use this then you know what you can do

[31:18] otherwise all right so now we know how

[31:20] we can use different

[31:21] models so yeah try this out for

[31:24] other models in your language and see if

[31:27] this works
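The `return_tensors` point from a moment ago can be illustrated without loading a tokenizer. In this sketch the `batch` dictionary is a hand-written stand-in for what a tokenizer returns when `return_tensors` is omitted (the token ids are made up), and it assumes `padding=True` was used so that every row has the same length, which `torch.tensor` needs.

```python
import torch

# Stand-in for a tokenizer result produced without return_tensors:
# a dictionary of plain Python lists. The ids below are made up.
batch = {
    "input_ids": [[101, 7592, 102, 0], [101, 2204, 2154, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}

# Convert the lists to tensors by hand, as described in the video.
input_ids = torch.tensor(batch["input_ids"])
attention_mask = torch.tensor(batch["attention_mask"])
print(type(batch["input_ids"]), tuple(input_ids.shape), input_ids.dtype)

# With a loaded model, the tensors would then be passed by keyword:
# outputs = model(input_ids=input_ids, attention_mask=attention_mask)
```

The video passes only the `input_ids` tensor to the model. Passing the attention mask as well tells the model which positions are padding, so it seems the safer habit when the batch contains texts of different lengths.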

[31:28] and now let's have another look at how

[31:30] we can

[31:31] fine-tune our own models so this is very

[31:35] important

[31:36] and i already prepared some code here

[31:39] and i will

[31:40] go over this very roughly

[31:43] but there's also very good

[31:45] documentation

[31:46] about this so we can go to this

[31:49] documentation page here and you can also

[31:52] open this in Colab so either with

[31:55] pytorch or tensorflow code so this is

[31:57] really helpful

[31:58] so i encourage you to check this out

[32:01] um but now let's go over this briefly

[32:04] so basically there are five steps you

[32:07] have to do

[32:08] um so in this example it's for pytorch

[32:12] so we have to prepare our data set for

[32:15] example

[32:16] loaded from a csv file or whatever

[32:19] then we have to load a pre-trained

[32:22] tokenizer

[32:23] and then call it with our data set so

[32:26] then we get the

[32:27] encodings or the token ids then

[32:30] we have to build a PyTorch Dataset

[32:33] out of this with these encodings so if

[32:36] you don't know

[32:37] what the PyTorch Dataset is then i

[32:39] will have a link for you here where i

[32:41] explain this then we also load a

[32:44] pre-trained

[32:44] model and then we can either load

[32:48] a hugging face trainer and train it so

[32:51] this abstracts away a lot of things or

[32:54] we can just use

[32:56] a native or normal PyTorch training

[32:58] pipeline like in our other pytorch code

[33:02] so yeah this is what we have to do so

[33:04] let's go

[33:05] over this very quickly so in this case

[33:08] we define our base model name so we want

[33:12] to start with

[33:13] the distilbert-base-uncased version

[33:17] but in this case for example not the

[33:19] fine-tuned one so

[33:20] just this one then step one we prepare

[33:23] the data set so we write a helper

[33:25] function

[33:26] to create texts and

[33:29] labels out of the actual text

[33:33] and here we downloaded some

[33:37] data set and put it in our folder so i

[33:39] already did this here and

[33:41] yeah this is available at this website

[33:44] and this contains

[33:45] movie reviews so we want to fine-tune

[33:48] our models on movie reviews for

[33:50] sentiment classification

[33:52] so here we create training texts and the

[33:55] training

[33:55] labels with our helper function and we

[33:58] also do

[33:59] a train test split to get validation

[34:02] texts and labels
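Step one could look roughly like the sketch below. The video does not spell out the helper, so everything here is an assumption made for illustration: the names `read_reviews` and `split_train_val`, the folder layout (`pos/*.txt` and `neg/*.txt` under one directory), and the label ids (0 for negative, 1 for positive). The split is a small standard-library stand-in for a train/test split utility. The demo writes a few made-up reviews to a temporary folder so the block runs on its own.

```python
import random
import tempfile
from pathlib import Path


def read_reviews(split_dir):
    """Read texts and labels from an assumed <split_dir>/{neg,pos}/*.txt layout."""
    texts, labels = [], []
    for label_name, label_id in (("neg", 0), ("pos", 1)):
        for text_file in sorted((Path(split_dir) / label_name).glob("*.txt")):
            texts.append(text_file.read_text(encoding="utf-8"))
            labels.append(label_id)
    return texts, labels


def split_train_val(texts, labels, val_fraction=0.2, seed=0):
    """Shuffle indices once, then carve off a validation part."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train_texts = [texts[i] for i in train_idx]
    val_texts = [texts[i] for i in val_idx]
    train_labels = [labels[i] for i in train_idx]
    val_labels = [labels[i] for i in val_idx]
    return train_texts, val_texts, train_labels, val_labels


# Demo with made-up reviews in a temporary folder.
with tempfile.TemporaryDirectory() as root:
    demo = {"pos": ["great film", "loved it", "a joy"], "neg": ["dull", "too long"]}
    for label_name, reviews in demo.items():
        label_dir = Path(root) / label_name
        label_dir.mkdir()
        for i, review in enumerate(reviews):
            (label_dir / f"{i}.txt").write_text(review, encoding="utf-8")
    texts, labels = read_reviews(root)

train_texts, val_texts, train_labels, val_labels = split_train_val(
    texts, labels, val_fraction=0.4
)
print(len(train_texts), len(val_texts))
```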

[34:04] and yeah then as a next step

[34:07] we create or we define a

[34:10] PyTorch Dataset so this inherits from

[34:13] the PyTorch Dataset so from torch.utils.data we

[34:18] import Dataset and then we define this

[34:21] here so again i have a tutorial where i

[34:24] explain how this works

[34:26] but basically it needs the encodings

[34:29] and the labels and it stores them in

[34:32] here
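A dataset class of the kind described here might look like this. It is a minimal sketch: the class name `ReviewDataset` is hypothetical, and the `encodings` dictionary is a hand-written stand-in (with made-up token ids) for what a tokenizer would return for two short texts.

```python
import torch
from torch.utils.data import Dataset


class ReviewDataset(Dataset):
    """Stores tokenizer encodings and labels, and serves one example at a time."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Take row idx from every field (input_ids, attention_mask, ...).
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        # The key "labels" is what Hugging Face models expect for the targets.
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


# Stand-in encodings for two texts; real ones would come from the tokenizer.
encodings = {
    "input_ids": [[101, 2307, 102], [101, 2919, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
dataset = ReviewDataset(encodings, labels=[1, 0])
first = dataset[0]
print(len(dataset), first)
```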

[34:33] so yeah this needs the encoding so for

[34:36] the

[34:36] encodings we need a tokenizer

[34:39] so again we use this from_pretrained

[34:42] function

[34:43] with the model name and in this case

[34:46] since we know

[34:47] we use the DistilBERT one we can

[34:50] use this class so remember before we

[34:53] used a generic

[34:55] tokenizer this AutoTokenizer class

[34:58] and here we use a more concrete one so

[35:01] we use the

[35:02] DistilBertTokenizerFast then we apply

[35:05] it

[35:06] to our training validation and test set

[35:08] and get the

[35:09] encodings then we put them in our data

[35:13] set

[35:14] and create the PyTorch dataset

[35:17] and then we import Trainer

[35:21] and TrainingArguments so this is

[35:24] available in the transformers library and

[35:27] then we can

[35:28] set this up so we can create the

[35:31] arguments so here for example we specify

[35:35] the number of training epochs the output

[35:38] directory

[35:38] the learning rate and other parameters

[35:41] we want and then we

[35:42] create our model again from a

[35:46] concrete model class and then with this

[35:49] dot

[35:49] from_pretrained function and then we

[35:52] set up this

[35:54] trainer and give it the model and the

[35:56] training

[35:57] arguments and then the training set and

[36:00] the validation set

[36:02] and then we simply have to call

[36:04] trainer.train

[36:05] and this will do all the training for us

[36:07] and afterwards you can test it on your

[36:09] test data set
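The tokenizer and Trainer steps just described can be sketched as two small helpers. This is a sketch under assumptions, not the code shown on screen: the function names `encode_texts` and `build_trainer` are hypothetical, and the hyperparameter values are placeholders rather than the ones used in the video. The `transformers` import sits inside `build_trainer` so that defining the helpers does not load a model.

```python
def encode_texts(tokenizer, texts):
    """Tokenize a list of strings; truncation and padding give equal-length rows."""
    return tokenizer(texts, truncation=True, padding=True)


def build_trainer(train_dataset, val_dataset,
                  model_name="distilbert-base-uncased", output_dir="./results"):
    """Create a Trainer for a DistilBERT classifier (placeholder hyperparameters)."""
    # Imported here so nothing heavy is loaded until the function is called.
    from transformers import (
        DistilBertForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    training_args = TrainingArguments(
        output_dir=output_dir,            # where checkpoints are written
        num_train_epochs=2,               # placeholder value
        per_device_train_batch_size=16,   # placeholder value
        per_device_eval_batch_size=64,    # placeholder value
        learning_rate=5e-5,               # placeholder value
        weight_decay=0.01,                # placeholder value
        logging_steps=10,
    )
    model = DistilBertForSequenceClassification.from_pretrained(model_name)
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
    )


# Typical use, assuming train_texts, val_texts and two PyTorch datasets exist:
# from transformers import DistilBertTokenizerFast
# tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
# train_encodings = encode_texts(tokenizer, train_texts)
# val_encodings = encode_texts(tokenizer, val_texts)
# trainer = build_trainer(train_dataset, val_dataset)
# trainer.train()
```

The commented lines at the bottom download the base model on first use, which is why they are left as comments.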

[36:10] and then you have a fine-tuned model so

[36:13] yeah this is basically

[36:14] all you need and then i also want to

[36:17] show you that instead of using this

[36:20] trainer if you want to do it manually

[36:22] and have

[36:23] even more flexibility you can just use a

[36:27] normal PyTorch training loop so

[36:30] for this we use a DataLoader

[36:33] and we need an optimizer so in this

[36:36] case

[36:36] we use an optimizer from the transformers

[36:39] library

[36:40] and then here we specify our device then

[36:43] again we create this

[36:44] model we push it to the device and set

[36:47] it to training mode

[36:48] then we create a data loader and the

[36:51] optimizer

[36:52] and then we do the typical training loop

[36:55] so we say

[36:55] for epoch in num epochs and for batch in

[36:59] our training loader

[37:01] and then we do the stuff we always do we

[37:03] say optimizer.zero_grad

[37:05] we also push it to the device if

[37:08] necessary

[37:09] then we call the model and we calculate

[37:11] the

[37:12] loss with this and in this case um

[37:15] this is already contained in the output

[37:18] so we can just

[37:19] access the loss like this then we call

[37:22] loss.backward

[37:23] and optimizer.step and iterate and

[37:27] afterwards we can set our model to

[37:30] evaluation mode again and yeah this is

[37:32] how we do it in native PyTorch code
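The manual loop can be sketched as below. To keep the block runnable without downloading anything, `TinyClassifier` is a hypothetical stand-in that only mimics one aspect of a Hugging Face classifier: when called with `labels`, it returns an object with a `.loss` attribute. With a real model you would create it via `from_pretrained` instead. The video takes its optimizer from the transformers library; this sketch swaps in `torch.optim.AdamW`. The training samples and hyperparameters are made up.

```python
from types import SimpleNamespace

import torch
from torch import nn
from torch.utils.data import DataLoader


class TinyClassifier(nn.Module):
    """Stand-in model that returns .loss and .logits like a HF classifier."""

    def __init__(self, vocab_size=50, hidden=8, num_labels=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # attention_mask is accepted for interface parity but ignored here.
        logits = self.head(self.embedding(input_ids).mean(dim=1))
        loss = nn.functional.cross_entropy(logits, labels) if labels is not None else None
        return SimpleNamespace(loss=loss, logits=logits)


def make_sample(token_ids, label):
    return {
        "input_ids": torch.tensor(token_ids),
        "attention_mask": torch.ones(len(token_ids), dtype=torch.long),
        "labels": torch.tensor(label),
    }


def train(model, train_loader, optimizer, device, num_epochs=2):
    """Plain training loop; returns the last batch loss as a float."""
    model.train()
    last_loss = None
    for epoch in range(num_epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)
            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss  # the loss is already part of the model output
            loss.backward()
            optimizer.step()
            last_loss = loss.item()
    model.eval()  # switch back to evaluation mode after training
    return last_loss


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyClassifier().to(device)
train_samples = [
    make_sample([1, 5, 7, 2], 1),
    make_sample([1, 9, 3, 2], 0),
    make_sample([1, 5, 4, 2], 1),
    make_sample([1, 9, 8, 2], 0),
]
train_loader = DataLoader(train_samples, batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
final_loss = train(model, train_loader, optimizer, device)
print(final_loss)
```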

[37:34] and yeah so this is basically how we do

[37:37] the

[37:38] fine-tuning and then can fine-tune our

[37:41] own models and then afterwards you can

[37:42] also

[37:43] upload them to the hugging face model

[37:45] hub if you want so

[37:47] yeah i think that's pretty cool and yeah

[37:50] that's all that i wanted to

[37:52] show you for now i think that's enough

[37:54] to get started with hugging face

[37:56] and i hope you enjoyed this tutorial and

[37:58] then i hope to see you in the next video

[38:02] bye
