[0:00] Hi everyone, I'm Patrick, and in today's video we are going to learn how to get started with Hugging Face and the Transformers library.
[0:07] The Hugging Face Transformers library is probably the most popular NLP library in Python right now, and it can be combined directly with PyTorch or TensorFlow. It provides state-of-the-art natural language processing models and has a very clean API that makes it extremely simple to build powerful NLP pipelines.
[0:25] So today we take a first look at the library and build a sentiment classification algorithm. I show you some basic functions, then we have a look at the Model Hub, and then I also show you how you can fine-tune your own model.
[0:38] So let's get started. All right, to get started you should first install either PyTorch or TensorFlow, and then to install the Transformers library you just have to say pip install transformers. There's also a conda installation command that you can find on the installation page.
[1:02] So let's install it like this. I already did this, so we can start using it: first we say from transformers import pipeline and have a look at this. Then we also import some utilities that we need from the PyTorch library, so we import torch and import torch.nn.functional as F; we're going to use this later.
[1:29] Now we can start using this pipeline. Let's say classifier equals, and then we create a pipeline, and we need to specify the task that we want. In this case we want to do sentiment analysis, so we call it like this: pipeline("sentiment-analysis"). You will find the different available tasks on the website.
[1:58] Here we can see, for example, the sentiment-analysis task, which is just an alias of text classification, but we also have a question-answering pipeline, a text-generation pipeline, or a conversational pipeline. So yeah, this is how we can define a pipeline.
[2:19] What a pipeline does is give you a great and easy way to use a model for inference, and it abstracts a lot of the steps away for you. You will see what I mean in a moment.
[2:33] So now we can just use this classifier and classify some text by saying res (for results) equals, and then we call this classifier on an example text. Let me copy and paste one for you: we want to classify "We are very happy to show you the 🤗 Transformers library." Then let's print the result and see what it looks like.
[3:05] So let's run the code. All right, as you can see, the label is POSITIVE and the score is 0.99, so the model is very confident that this is a positive sentence. And as you can see, it only takes two lines of code with this pipeline to create a sentiment analysis program. This is exactly what we need: we see the label of the text, whether it's negative or positive, and we also get the score. So yeah, this is really nice.
[3:36] Now let's have a look at some more things we can do with this pipeline. First of all, we can put in more than one text at once by giving it a list. Let me copy and paste another example text in here as well: we also want to classify "We hope you don't hate it." Then we get multiple results back, so let's call this results, iterate over it with for result in results, and print each result.
[4:19] Now let's run this code and see what it looks like. All right, as you can see, for the second text we get another result back: here the label is NEGATIVE, and the score is not that confident in this case, so this text might be a little bit confusing: "We hope you don't hate it." But basically this is how you can pass in multiple texts at once.
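To make "it abstracts a lot of the steps away" a bit more concrete, here is a minimal pure-Python sketch of the three stages a text-classification pipeline chains together: tokenize the text, run the model to get logits, then turn the logits into a label and a score with a softmax. The whitespace tokenizer, the word-counting "model", and the label mapping are hypothetical stand-ins, not Hugging Face components; only the output shape (one dict with "label" and "score" per input text) mirrors what the sentiment-analysis pipeline returned above.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_pipeline(texts, tokenize, model, id2label):
    """Toy text-classification pipeline:
    preprocess (tokenize) -> forward (logits) -> postprocess (softmax + label)."""
    results = []
    for text in texts:
        inputs = tokenize(text)   # preprocess
        logits = model(inputs)    # forward pass
        probs = softmax(logits)   # postprocess: logits -> probabilities
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append({"label": id2label[best], "score": probs[best]})
    return results


# Hypothetical stand-ins, NOT real Hugging Face components.
def toy_tokenize(text):
    return text.lower().split()


def toy_model(tokens):
    positive = sum(t in {"happy", "good", "great"} for t in tokens)
    negative = sum(t in {"hate", "bad", "unfair"} for t in tokens)
    return [float(negative), float(positive)]  # index 0 = NEGATIVE, 1 = POSITIVE


id2label = {0: "NEGATIVE", 1: "POSITIVE"}

results = run_pipeline(
    ["We are very happy", "We hope you don't hate it"],
    toy_tokenize, toy_model, id2label,
)
for result in results:
    print(result)
```

The real pipeline swaps in a trained tokenizer and model, but the caller sees the same thing: a list of label/score dicts, one per input.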
[4:44] So far we have only used the default pipeline with the default model, but now let's have a look at how we can use a specific model and also a specific tokenizer. What we can do is specify the model name: model_name equals, and in this case I use "distilbert-base-uncased-finetuned-sst-2-english". I will show you where I got this name in a moment.
[5:22] For now, this is basically just a DistilBERT model, which is a smaller and faster version of BERT that was pre-trained on the same corpus. You also see that it was fine-tuned, and "sst-2" is just the name of the data set: in this case an English data set, the Stanford Sentiment Treebank version 2.
[5:46] Now that we have the model name, we can give it to our pipeline with the model argument, so we say model=model_name. In this case I can tell you that the default model for the sentiment analysis task is already this model, so this should do exactly the same, but later we will switch it and have a look at how we can use different models. So first of all, let's run this again and see that the result is still the same. All right, we see this is still the same result, so this worked.
[6:25] So far we just used this string to define our model, but now let's take a different approach to define a model and also a tokenizer, which will give us a little bit more flexibility later. To do this, we say from transformers import AutoTokenizer, AutoModelForSequenceClassification.
[6:58] AutoTokenizer is just a generic class for a tokenizer. AutoModelForSequenceClassification is also a generic class, but a little bit more specific: in this case I want it for sequence classification, and it gives me a little bit more functionality specifically for this task. Don't worry about this right now; you can find all the available model classes in the documentation if you're interested.
[7:27] Also, if you use TensorFlow, you have to prefix the class name with TF (TFAutoModelForSequenceClassification), but the rest is actually the same. So yeah, this is how you use TensorFlow.
[7:40] After importing these, we can create two instances. We say model equals AutoModelForSequenceClassification.from_pretrained(model_name), and we do the same with the tokenizer: tokenizer equals AutoTokenizer.from_pretrained(model_name). This from_pretrained function is a very important function in Hugging Face that you will see a lot, including a few more times later in this video.
[8:33] Now that we have created these, we can also give the actual model object, and not just the string, to the pipeline: we say model=model and tokenizer=tokenizer. If we run this we should still get the same results, because these are the default versions, and as we see, we still get the same result. But later, if you want to use a different model or tokenizer, you know how to switch: just pass a different model and tokenizer to the pipeline.
[9:13] Now, instead of using the pipeline, let's see how we can use the model and tokenizer directly and do some of the steps manually. This will give you a little bit more flexibility.
[9:29] So down here, let's first use the tokenizer and see what it does. First, let's call the tokenizer.tokenize function: tokens equals tokenizer.tokenize, and then the sentence we want to tokenize, so let's copy and paste it in here.
[9:59] Once we have the tokens, we can get the token IDs out of them: token_ids equals tokenizer.convert_tokens_to_ids(tokens). This is one way to do it. Or we can do it directly by calling the tokenizer like a function, again with the same string. Let's print all three variables to see the difference: first the tokens, then the token IDs, and for the last one let's use a different name and call it input_ids.
[11:05] Now let's run this and see what it looks like. All right, here is the result. As you can see, when we call tokenizer.tokenize, we get back a list of strings: here each word becomes a separate token (longer or rarer words can also be split into several sub-word pieces), and for example this one is our smiley face, the emoji. So this is what the tokenize function does.
[11:37] Then, when we call convert_tokens_to_ids, we get this back: it converted each token to an ID. Each token has a unique ID, and this is basically the numerical representation that our model can understand.
[12:05] If we call the tokenizer directly, we get a dictionary back with the key input_ids and also the attention_mask. For now you don't really have to worry about the attention mask, but let's have a look at the input IDs. If we compare the token IDs with the input IDs, we see exactly the same sequence of token IDs, but we also have the tokens 101 and 102. These are the special [CLS] and [SEP] tokens that mark the beginning and the end of the sequence; otherwise it's the same.
[12:48] So yeah, this is the difference between these three functions. These input IDs are what we can pass to our model later to do the predictions manually.
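The difference between the three calls can be sketched in plain Python. This is a simplified model, not the library's implementation: a greedy longest-match-first WordPiece splitter (the scheme BERT-style vocabularies use, with "##" marking continuation pieces), a token-to-ID lookup, and an encode step that adds [CLS] and [SEP] plus an all-ones attention mask. It only splits on whitespace (the real tokenizer also splits off punctuation), and the vocabulary is a made-up toy: only the special-token IDs [PAD]=0, [UNK]=100, [CLS]=101, [SEP]=102 match bert-base-uncased.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    # Like tokenizer.tokenize: text -> list of token strings.
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


def convert_tokens_to_ids(tokens, vocab):
    # Like tokenizer.convert_tokens_to_ids: token strings -> integer IDs.
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


def encode(text, vocab):
    # Like calling tokenizer(text): adds [CLS]/[SEP] and an attention mask.
    body = convert_tokens_to_ids(tokenize(text, vocab), vocab)
    ids = [vocab["[CLS]"]] + body + [vocab["[SEP]"]]
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}


# Toy vocabulary: word IDs are invented; special-token IDs follow bert-base-uncased.
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
         "we": 2000, "are": 2001, "very": 2002, "happy": 2003,
         "trans": 2004, "##form": 2005, "##ers": 2006}

print(tokenize("We are very happy transformers", vocab))
print(encode("We are happy", vocab))
```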
[13:01] Like before, we can of course also pass multiple sentences to our tokenizer. For example, usually in your code you have your training data, so let's say X_train, and in this example let's just use these two sentences. Then we can pass this to our tokenizer; let's call the result batch, since this is the batch that we put into our model later. So we say batch equals tokenizer, calling the tokenizer directly with our training data.
[13:44] I also want to show you some useful arguments: we say padding=True, truncation=True, max_length=512, and return_tensors="pt" for PyTorch. Padding and truncation ensure that all of the samples in our batch have the same length, so the tokenizer pads or truncates them if necessary.
[14:20] The return_tensors argument is also important: in this case we want a PyTorch tensor returned directly. I will show you later what the difference is if you don't use it, but for now let's just use it. First of all, let's print this batch and see what it looks like.
[14:42] We see we get a dictionary that again has the key input_ids and the key attention_mask. The input_ids value is a tensor with one row per sentence, the first row for the first sentence and the second row for the second sentence, and the same goes for the attention mask. As I said, these input IDs are the unique IDs that our model can understand. So now we have this batch.
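What padding=True, truncation=True, and max_length do to a batch can be sketched on already-converted token IDs. This is a simplified stand-in, assuming BERT-style special tokens (101/102) and pad ID 0: each sequence's content is cut so that, together with [CLS] and [SEP], it fits into max_length, then every sequence is padded to the longest one in the batch, and the attention mask marks real tokens with 1 and padding with 0. The function name batch_encode is hypothetical.

```python
def batch_encode(token_id_lists, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Simplified padding=True / truncation=True behaviour for single sentences."""
    body_limit = max_length - 2  # leave room for [CLS] and [SEP]
    sequences = [[cls_id] + ids[:body_limit] + [sep_id] for ids in token_id_lists]
    longest = max(len(seq) for seq in sequences)  # padding=True pads to the longest
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = batch_encode([[2000, 2001, 2002], [2003]], max_length=512)
print(batch["input_ids"])       # every row now has the same length
print(batch["attention_mask"])  # 0 marks padding positions
```

Because every row ends up with the same length, the batch can be stacked into one rectangular tensor, which is what return_tensors="pt" needs.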
[15:14] Now we can pass this batch to our model. Let's do this manually and see how we call our model. In PyTorch, when we do inference, we also want to say with torch.no_grad(): this disables gradient tracking. I explained this in a lot of my tutorials, so you can have a look at them if you want to learn more about it.
[15:41] Then we call our model by saying outputs = model(**batch). The two asterisks unpack this batch: if you remember, the batch is a dictionary, and with this we pass its values as keyword arguments. In TensorFlow you don't do this, you just pass in the batch as it is, but in PyTorch you have to unpack it. Now we get the outputs of our model.
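Here is a small pure-Python illustration of what the two asterisks do. toy_model is a hypothetical stand-in whose parameter names match the tokenizer's output keys; model(**batch) is simply shorthand for model(input_ids=..., attention_mask=...), while passing the dictionary positionally would hand the whole dict to the first parameter.

```python
def toy_model(input_ids, attention_mask, labels=None):
    """Hypothetical stand-in for model(...): reports what it received."""
    return {
        "batch_size": len(input_ids),
        "mask_rows": len(attention_mask),
        "got_labels": labels is not None,
    }


batch = {"input_ids": [[101, 2000, 102]], "attention_mask": [[1, 1, 1]]}

# These two calls are equivalent: ** turns dictionary keys into keyword arguments.
a = toy_model(**batch)
b = toy_model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
print(a == b)

# Without unpacking, the whole dict lands in input_ids and attention_mask is missing.
try:
    toy_model(batch)
except TypeError as err:
    print("TypeError:", err)
```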
[16:19] Let's print the outputs. As you might know, these are just the raw values, so to get the actual probabilities and predictions we can apply the softmax: predictions = F.softmax(outputs.logits, dim=1) (torch.softmax works as well), and let's print the predictions too.
[16:56] Then let's do one more thing and also get the labels by taking the index with the highest probability: labels = torch.argmax(predictions, dim=1). We could pass in either the predictions or the raw logits; we actually don't need the softmax for this, but just for demonstration let's use the predictions, again with dim=1. Then let's print the labels as well.
[17:36] Now let's do one more thing and convert the labels with a list comprehension: labels = [model.config.id2label[label_id] for label_id in labels.tolist()]. You will see what this does when we print the labels. Now let's run this and see if it works.
[18:22] All right, this worked. As you can see, here we print the output, which is a SequenceClassifierOutput, and it has the logits attribute; that's why we used outputs.logits. Then we get the actual probabilities, and to get the labels we used argmax, so this is a tensor with label 1 and label 0. Then we converted each label to the actual class name, and we get POSITIVE and NEGATIVE.
[19:03] By the way, these logits come from the classification head of AutoModelForSequenceClassification; if we just used a plain AutoModel, we would only get the base model's hidden states, without the head. That's what these more concrete classes do for you: they give you a little bit more functionality for the dedicated task.
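The post-processing steps above can be mirrored without PyTorch to see exactly what dim=1 means for a (batch, num_labels) matrix: the softmax runs across each row, argmax picks the column index per row, and id2label turns indices into names. The logits are made-up numbers, and the mapping {0: "NEGATIVE", 1: "POSITIVE"} is taken from the output shown in the video (label 1 printed as POSITIVE, label 0 as NEGATIVE); real values come from outputs.logits and model.config.id2label.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax: the dim=1 case for a (batch, num_labels) matrix."""
    probs = []
    for row in logits:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def argmax_rows(matrix):
    """Index of the largest value in each row (like torch.argmax(x, dim=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


logits = [[-4.0, 4.0], [0.1, -0.1]]  # made-up logits for two sentences
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

predictions = softmax_rows(logits)
label_ids = argmax_rows(predictions)
labels = [id2label[label_id] for label_id in label_ids]
print(predictions)
print(label_ids, labels)
```

Because softmax never changes which entry in a row is largest, argmax over the raw logits gives the same label IDs; that is why the softmax is not strictly needed to get the labels.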
[19:31] We also see that the loss is None in this case. If you also want a loss to inspect, we can give the labels argument to our model so that it knows how to compute the loss. So we say labels=torch.tensor([1, 0]), and now let's run this again; we should see a loss here. And yeah, now we see the loss. Again, this labels argument belongs to models with a task head like AutoModelForSequenceClassification; a plain AutoModel doesn't take it.
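As far as I know, for a single-label classification head like this one, the loss that appears once labels are passed is the standard cross-entropy (PyTorch's CrossEntropyLoss with its default mean over the batch). A plain-Python sketch of that computation, using made-up logits:

```python
import math


def cross_entropy(logits, labels):
    """Mean over the batch of -log softmax(row)[true_label]."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[label]  # equals -log p(true label)
    return total / len(labels)


logits = [[-4.0, 4.0], [0.1, -0.1]]  # made-up values
print(cross_entropy(logits, [1, 0]))  # labels as in torch.tensor([1, 0])
```

A confident, correct prediction contributes almost nothing to the loss; a near tie like the second row contributes much more.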
[20:22] So yeah, this worked. Now let's have a careful look at the probabilities. First of all, we see we get the labels POSITIVE and NEGATIVE. For the first sentence, this is the highest probability, 0.9997, and for the second one, this is the largest number, so it took this one, and it is 0.53. If we compare them with the results that we got from our pipeline, we see these are exactly the same numbers.
[21:01] So now you might see the difference between a pipeline and using the tokenizer and model directly. With the pipeline we only need two lines of code and we get exactly what we want: the label and the score we are interested in. That might be just fine, but if you want to do it manually, you can do it like I showed you, and you will get the same results that you can then use. Using the model and the tokenizer directly will be important when you, for example, want to fine-tune your model; I will show you roughly how to do this later. So yeah, this is how you use a model and tokenizer.
[21:50] Now let's just assume we fine-tuned our model. Then we can specify a save directory (let's call the folder "saved") and call tokenizer.save_pretrained(save_directory), and the same with our model: model.save_pretrained(save_directory). Then we can load them in another application, for example tokenizer = AutoTokenizer.from_pretrained(save_directory): to from_pretrained we can give either a model name or this directory. Again the same for the model: model = AutoModelForSequenceClassification.from_pretrained(save_directory).
[23:07] This should work, and you should get the exact same model and tokenizer back. As you might see, these from_pretrained functions are very important, and you will use them a lot.
[23:22] All right, I think these are the basic functions you need to build a pipeline or to apply the model and tokenizer manually.
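The save_pretrained / from_pretrained round trip follows a simple pattern: write files into a directory, then rebuild the object from that directory. The hypothetical ToyClassifier class below imitates that pattern with only a config.json; real Transformers models also save their weights, tokenizers save their vocabulary files, and a real from_pretrained falls back to the Model Hub when the argument is not a local directory. One detail worth copying is that JSON keys are always strings, so id2label keys have to be turned back into integers when loading (something the real config loading also takes care of).

```python
import json
import os
import tempfile


class ToyClassifier:
    """Hypothetical class imitating the save_pretrained/from_pretrained pattern."""

    def __init__(self, id2label):
        self.id2label = id2label

    def save_pretrained(self, save_directory):
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, "config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"id2label": self.id2label}, f)

    @classmethod
    def from_pretrained(cls, name_or_path):
        if not os.path.isdir(name_or_path):
            # A real from_pretrained would treat this as a Model Hub name instead.
            raise ValueError(f"{name_or_path!r} is not a local directory")
        with open(os.path.join(name_or_path, "config.json"), encoding="utf-8") as f:
            config = json.load(f)
        # JSON object keys are always strings, so convert them back to ints.
        return cls({int(k): v for k, v in config["id2label"].items()})


with tempfile.TemporaryDirectory() as tmp:
    save_directory = os.path.join(tmp, "saved")
    ToyClassifier({0: "NEGATIVE", 1: "POSITIVE"}).save_pretrained(save_directory)
    reloaded = ToyClassifier.from_pretrained(save_directory)
    print(reloaded.id2label)
```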
[23:33] Now let's have a look at how we can use a different model. Like here, you can load it from your disk if you already have a pre-trained model somewhere on your computer, but you can also go to the Hugging Face Model Hub, which you can find at huggingface.co/models. Here you can search through different models; for example, you can filter by task. In this case we want to do text classification, which is what sentiment analysis falls under, and once the filter is applied you can see that the most popular model is already this one. We can click on it to get some more information, and as you can see, this is the exact same model name that we used in our code.
[24:34] Once you've decided on a model, you can click here to copy its name and then paste it into your code. Let's say in this case we want to use a different model: I want to do sentiment classification with German sentences, so of course I need one that is trained on German. You can filter or search here: I could again search for distilbert and see which versions are available, or let me search for german.
[25:09] Let's take the most popular one, by oliverguhr: this is a German sentiment BERT, and we get more information. Sometimes we also see some example code, which is helpful. Now we click here to copy the name, and in our application let me comment this out and again say model_name equals, and paste, so now we have this string here: oliverguhr/german-sentiment-bert. Now we give our model and tokenizer this model name in both from_pretrained calls.
[26:00] Now let's do this for some example texts in German; let me copy and paste them in here and quickly translate them for you. They say: "Not a good result", "That was unfair", "That was not good", "Not as bad as expected", "That was good", and "She drives a green car". So basically the first three texts are negative, the next two are rather positive, and the last one is neutral. Let's see if our model can detect this correctly.
[26:38] Now, like above, we do the same steps: we copy and paste the tokenizer call and then, the same as above, say with torch.no_grad() and call the model: outputs = model(**batch). Then we want the label IDs, so label_ids = torch.argmax(outputs, dim=1), and we print the label IDs. Then we do the same as before to convert them to the actual label names: labels = [model.config.id2label[label_id] for label_id in label_ids.tolist()], and print the labels. Let's also print the batch in this case and have a look at what it looks like.
[28:05] So let's run this, and I get an error: here I forgot to say outputs.logits like we did before. Let's try it again, and we only get two results, because of course in our tokenizer we want to use these German texts. So let's call them X_train_german, use X_train_german here, and run it again.
[28:34] All right, as we can see, we get the labels 1, 1, 1, 0, 0, and 2, which correspond to negative, negative, negative, then two times positive, and then neutral. So this is exactly what I told you: the first three sentences are rather negative, then come two positive ones, and this one is neutral.
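The conversion step with model.config.id2label matters because label IDs are only meaningful together with the model's own mapping. Based on the output in the video, the German model maps 0 to positive, 1 to negative, and 2 to neutral, while the SST-2 model used earlier printed label 1 as POSITIVE. Here is a short sketch with those two mappings written out by hand; in real code, read them from model.config.id2label instead of hard-coding them.

```python
# Mappings inferred from the outputs shown in the video; in real code use
# model.config.id2label rather than hard-coding them.
german_id2label = {0: "positive", 1: "negative", 2: "neutral"}
sst2_id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def to_label_names(label_ids, id2label):
    """Same idea as [model.config.id2label[i] for i in label_ids.tolist()]."""
    return [id2label[label_id] for label_id in label_ids]


german_ids = [1, 1, 1, 0, 0, 2]  # the label IDs printed for the six German texts
print(to_label_names(german_ids, german_id2label))

# The same ID means different things for different models:
print(german_id2label[1], sst2_id2label[1])
```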
[28:58] So yeah, now our German model works as well, and this is how we can use different models: we simply search the Model Hub, and hopefully there is already a pre-trained version for the task we want. Then we just use it as our model name and we are good to go. If there is no pre-trained version yet, we have to fine-tune our own model, and I will show you how to do this in a moment.
[29:30] But now one more thing I want to mention: I want to talk about return_tensors="pt". Here we print the batch, and we see the input IDs are a tensor, so they are already in the PyTorch format. We could use TensorFlow here ("tf"), or we can just omit the argument, and if we omit it, we don't get the tensor format; now it's just a Python list, I think.
[30:08] What you could do then is convert it yourself: batch = torch.tensor(batch["input_ids"]), accessing the input_ids key like we see here. Now we created an actual tensor out of it, and we don't have to unpack it anymore, so we remove the two asterisks, and if we run it again this should work as well. And yeah, this worked too: we get the same result, and here we printed our batch and see that it's a tensor directly.
[30:57] So be careful here to specify what you want. If you use PyTorch, it's just simpler to pass return_tensors="pt", but if you don't use it, you know what you can do otherwise.
[31:18] All right, so now we know how we can use different models. Try this out with other models in your language and see if it works.
[31:30] we can fine
[31:31] tune our own models so this is very
[31:35] important
[31:36] and i already prepared some code here
[31:39] and i will
[31:40] go over this very roughly
[31:43] but there's also a very great
[31:45] documentation
[31:46] about this so we can go to this
[31:49] documentation page here and you can also
[31:52] open this in collab so either with
[31:55] pytorch or tensorflow code so this is
[31:57] really helpful
[31:58] so i encourage you to check this out
[32:01] um but now let's go over this briefly
[32:04] so basically there are five steps you
[32:07] have to do
[32:08] um so in this example it's for pytorch
[32:12] so we have to prepare our data set for
[32:15] example
[32:16] loaded from a csv file or whatever
[32:19] then we have to load a pre-trained
[32:22] tokenizer
[32:23] and then call it with our data set so
[32:26] then we get the
[32:27] encodings or the token ids then
[32:30] we have to build a pie torch data set
[32:33] out of this with these encodings so if
[32:36] you don't know
[32:37] what the pi torch data set is then i
[32:39] will have a link for you here where i
[32:41] explain this then we also load a
[32:44] pre-trained
[32:44] model and then we can either load
[32:48] a hugging face trainer and train it so
[32:51] this abstracts away a lot of things or
[32:54] we can just use
[32:56] a native or normal python training
[32:58] pipeline like in our other pytorch code
[33:02] So yeah, this is what we have to do; let's go over it very quickly. In this case we define our base model name: we want to start with distilbert-base-uncased, in this case not the fine-tuned version, just this one. Then in step one we prepare the data set: we write a helper function to create texts and labels out of the actual text files.
[33:33] Here we downloaded a data set and put it in our folder; I already did this, and it's available at this website. It contains movie reviews, so we want to fine-tune our model on movie reviews for sentiment classification. Here we create the training texts and training labels with our helper function, and we also do a train/test split to get validation texts and labels.
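Step one can be sketched with the standard library alone. The folder layout (one pos and one neg subfolder of .txt files per split) is an assumption about the downloaded movie-review data, read_split and train_val_split are hypothetical helper names, and the split is a simple seeded shuffle standing in for a library train/test split function; it returns its four lists in the order train texts, validation texts, train labels, validation labels.

```python
import random
import tempfile
from pathlib import Path


def read_split(split_dir):
    """Read <split_dir>/pos/*.txt and <split_dir>/neg/*.txt (assumed layout)
    into parallel lists of texts and labels (1 = positive, 0 = negative)."""
    texts, labels = [], []
    for sub_dir, label in (("pos", 1), ("neg", 0)):
        for path in sorted((Path(split_dir) / sub_dir).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


def train_val_split(texts, labels, val_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out val_fraction for validation."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(texts, train_idx), pick(texts, val_idx),
            pick(labels, train_idx), pick(labels, val_idx))


# Usage on a tiny fake data set written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for sub_dir, reviews in (("pos", ["great movie", "loved it", "fun"]),
                             ("neg", ["boring", "awful plot"])):
        (Path(tmp) / sub_dir).mkdir()
        for i, review in enumerate(reviews):
            (Path(tmp) / sub_dir / f"{i}.txt").write_text(review, encoding="utf-8")
    texts, labels = read_split(tmp)

train_texts, val_texts, train_labels, val_labels = train_val_split(texts, labels)
print(len(train_texts), len(val_texts))
```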
[34:04] As a next step, we define a PyTorch data set. It inherits from the PyTorch Dataset class, so from torch.utils.data we import Dataset and then define our class here. Again, I have a tutorial where I explain how this works, but basically it needs the encodings and the labels, and it stores them.
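The data set class only needs __len__ and __getitem__. Below is a dependency-free sketch of that shape with a hypothetical EncodedDataset name; a real PyTorch version would subclass torch.utils.data.Dataset and wrap each value in torch.tensor(...) so a DataLoader can batch them.

```python
class EncodedDataset:
    """Same interface as a torch.utils.data.Dataset subclass (len + indexing).
    A PyTorch version would subclass Dataset and return torch.tensor values."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # e.g. {"input_ids": [...], "attention_mask": [...]}
        self.labels = labels

    def __getitem__(self, idx):
        # One example: every encoding field at position idx, plus its label.
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

    def __len__(self):
        return len(self.labels)


encodings = {
    "input_ids": [[101, 2000, 102], [101, 2001, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
dataset = EncodedDataset(encodings, [1, 0])
print(len(dataset), dataset[1])
```

The key is named labels so that, when a batch is unpacked with **, it arrives at the model's labels argument and the model returns a loss.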
[34:33] This needs the encodings, and for the encodings we need a tokenizer. Again we use the from_pretrained function with the model name, and in this case, since we know we use the DistilBERT one, we can use this class: remember, before we used a generic tokenizer, the AutoTokenizer class, and here we use a more concrete one, DistilBertTokenizerFast. Then we apply it to the training, validation, and test sets to get the encodings, and we put them into our data set class to create the PyTorch data sets.
[35:17] Then we import the Trainer and TrainingArguments, which are available in the Transformers library, and set them up. We create the arguments; here, for example, we specify the number of training epochs, the output directory, the learning rate, and other parameters we want. Then we create our model, again from a concrete model class with the from_pretrained function, and we set up the trainer and give it the model, the training arguments, the training set, and the validation set. Then we simply call trainer.train(), which does all the training for us, and afterwards you can evaluate it on your test data set.
[36:10] Then you have a fine-tuned model, and this is basically all you need. I also want to show you that instead of using this trainer, if you want to do it manually and have even more flexibility, you can just use a normal PyTorch training loop. For this we use a DataLoader, and we need an optimizer; in this case we use an optimizer from the Transformers library.
[36:40] Here we specify our device, then again we create the model, push it to the device, and set it to training mode. Then we create a data loader and the optimizer, and then we do the typical training loop: for epoch in range(num_epochs), and for batch in our training loader. Then we do the stuff we always do: we call optimizer.zero_grad(), push the batch to the device if necessary, call the model, and get the loss; in this case it is already contained in the output, so we can just access outputs.loss. Then we call loss.backward() and optimizer.step() and iterate. Afterwards we can set our model to evaluation mode again, and yeah, this is how we do it in native PyTorch code.
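The skeleton of that loop (epochs, batches, zero the gradients, forward pass, loss, backward pass, optimizer step) is the same for any model. To show just the skeleton without PyTorch, here is a toy one-feature logistic regression trained with hand-written gradients and plain gradient descent. It illustrates the loop's structure, not how a Transformers model is trained, and the comments point at the PyTorch call each step corresponds to.

```python
import math
import random


def train(data, num_epochs=20, batch_size=2, lr=0.5, seed=0):
    """Toy logistic regression trained with the usual loop skeleton."""
    data = list(data)  # copy so shuffling does not modify the caller's list
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    epoch_losses = []
    for _ in range(num_epochs):  # for epoch in range(num_epochs)
        rng.shuffle(data)  # like DataLoader(..., shuffle=True)
        losses = []
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w, grad_b = 0.0, 0.0  # optimizer.zero_grad()
            loss = 0.0
            for x, y in batch:
                p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
                loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
                grad_w += (p - y) * x  # loss.backward(): gradients by hand
                grad_b += p - y
            n = len(batch)
            w -= lr * grad_w / n  # optimizer.step()
            b -= lr * grad_b / n
            losses.append(loss / n)
        epoch_losses.append(sum(losses) / len(losses))
    return w, b, epoch_losses


w, b, epoch_losses = train([(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)])
print(round(w, 3), round(b, 3), round(epoch_losses[0], 3), round(epoch_losses[-1], 3))
```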
[37:34] So this is basically how we do fine-tuning and fine-tune our own models. Afterwards you can also upload them to the Hugging Face Model Hub if you want, and I think that's pretty cool. That's all I wanted to show you for now; I think it's enough to get started with Hugging Face. I hope you enjoyed this tutorial, and I hope to see you in the next video.
[38:02] Bye!