What is Hugging Face? 3 Cool Things You Can Do
This video from Simplilearn introduces Hugging Face, a company that provides pre-trained AI models for language tasks like translation, text analysis, and generation. The presenter demonstrates three practical applications: speech-to-text, sentiment analysis, and text generation using the Transformers library and pipelines.
Hugging Face is a company that helps people use AI models for language tasks like translation, text analysis, and generating new text. They created the Transformers library with pre-trained models.
The video covers speech-to-text, sentiment analysis, and text generation. Speech-to-text turns spoken words into written text. Sentiment analysis determines if text is positive, negative, or neutral. Text generation creates human-like text.
The Transformers library is installed via pip install transformers. It allows downloading and running thousands of pre-trained open-source AI models.
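The video describes a pipeline as the flow of data from an origin system to a destination system, including how the data is transformed along the way. In Transformers, pipeline() wraps preprocessing, model inference, and postprocessing behind a single call, which simplifies using Hugging Face models.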
Libraries imported include librosa (audio analysis), torch (PyTorch), IPython.display (interactive display), and transformers (Wav2Vec2ForCTC, Wav2Vec2Tokenizer).
The model used is 'facebook/wav2vec2-base-960h'. The tokenizer and model are loaded using from_pretrained.
An audio file (v.m4a) is loaded using librosa.load with a sampling rate of 16000. The audio is played using IPython.display.
Input values are obtained by tokenizing the audio with return_tensors='pt'. Logits (non-normalized predictions) are extracted from the model.
Predicted IDs are obtained via torch.argmax on logits, then decoded using tokenizer.decode to get the transcription: 'hello and welcome this is an ai voice message'.
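The speech-to-text steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the video's exact notebook: it uses Wav2Vec2Processor in place of the Wav2Vec2Tokenizer shown in the video (recent Transformers releases deprecate calling that tokenizer on raw audio), and the helper names greedy_ctc_decode and transcribe are hypothetical. The toy helper shows on plain lists what argmax followed by decode roughly amounts to for a CTC model: pick the best token per frame, merge repeats, then drop the blank token.

```python
def greedy_ctc_decode(frame_scores, vocab, blank="<pad>", word_sep="|"):
    """Toy CTC decoding on plain lists: argmax per frame, merge repeats, drop blanks."""
    # Step 1: index of the highest score in each frame (the role of torch.argmax on dim=-1).
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Step 2: merge consecutive duplicates, then remove the blank token.
    tokens = []
    previous = None
    for i in ids:
        if i != previous:
            tokens.append(vocab[i])
        previous = i
    text = "".join(t for t in tokens if t != blank)
    # Step 3: the word separator token becomes a space.
    return text.replace(word_sep, " ").strip()


def transcribe(path, model_name="facebook/wav2vec2-base-960h"):
    """Transcribe an audio file. Heavy imports are local so the toy helper runs without them."""
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: Wav2Vec2Processor stands in for the video's Wav2Vec2Tokenizer.
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    # Resample to 16 kHz, the rate used in the video.
    audio, rate = librosa.load(path, sr=16000)
    inputs = processor(audio, sampling_rate=rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # unnormalized scores per frame
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("v.m4a") would follow the same flow as the demo; the first call downloads the model weights.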
Libraries imported: warnings, numpy, pandas, matplotlib, seaborn, sklearn (train_test_split, metrics), transformers (pipeline), torch.
A sentiment analysis pipeline is created using pipeline('sentiment-analysis'). It classifies text as positive or negative with confidence scores.
Testing with 'this is a great movie' returns label POSITIVE with high score. 'I did not understand any of it' returns NEGATIVE.
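The sentiment pipeline call can be sketched as below. The helper names classify and summarize are hypothetical, and the optional classifier argument is an assumption added so the function can be exercised with a stand-in object; by default it builds the same pipeline('sentiment-analysis') used in the video, which downloads a default checkpoint on first use.

```python
def classify(texts, classifier=None):
    """Return a list of {'label': ..., 'score': ...} dicts for a list of strings."""
    if classifier is None:
        # Imported lazily; the first call downloads the task's default model.
        from transformers import pipeline
        classifier = pipeline("sentiment-analysis")
    return classifier(texts)


def summarize(results):
    """Format each result as 'LABEL (score)' with two decimals."""
    return [f"{r['label']} ({r['score']:.2f})" for r in results]
```

A call such as summarize(classify(["this is a great movie", "I did not understand any of it"])) would print one label and confidence score per sentence.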
A CSV file 'airline_tweets.csv' is loaded using pandas. The dataset contains columns: tweet_id, airline_sentiment (neutral, positive, negative), and text.
Rows with neutral sentiment are filtered out, leaving 11,541 rows. Sentiment is mapped: positive -> 1, negative -> 0.
The classifier predicts sentiments on the text column. Predictions are converted to binary (1 for positive, 0 for negative).
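The label handling described above can be illustrated without any libraries. This is a sketch in which plain Python lists stand in for the pandas columns used in the video; the function names are hypothetical. It drops neutral rows, maps the remaining labels to 1/0, and converts pipeline outputs into binary predictions and positive-class probabilities (the score when the label starts with "P", otherwise 1 minus the score, as in the video).

```python
LABEL_TO_TARGET = {"positive": 1, "negative": 0}


def prepare(rows):
    """rows: iterable of (airline_sentiment, text). Returns (texts, targets) without neutral rows."""
    kept = [(text, LABEL_TO_TARGET[s]) for s, text in rows if s != "neutral"]
    texts = [text for text, _ in kept]
    targets = [target for _, target in kept]
    return texts, targets


def to_binary(results):
    """1 for a POSITIVE label, 0 otherwise."""
    return [1 if r["label"].startswith("P") else 0 for r in results]


def to_positive_probability(results):
    """Probability of the positive class implied by each pipeline result."""
    return [
        r["score"] if r["label"].startswith("P") else 1 - r["score"]
        for r in results
    ]
```

The probabilities, not the binary predictions, are what a ranking metric such as ROC AUC needs.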
Accuracy is 88.99%. A confusion matrix is plotted showing true vs predicted labels for negative and positive classes.
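To make the two numbers concrete, here is a plain-Python sketch of accuracy and a row-normalized 2x2 confusion matrix. The video computes these with NumPy and scikit-learn and plots the matrix with seaborn; the functions below are hypothetical stand-ins that assume labels are already encoded as 0 (negative) and 1 (positive).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def confusion_matrix_2x2(y_true, y_pred, normalize=True):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    if not normalize:
        return counts
    # Row normalization: each row sums to 1, so cells read as per-class rates.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
```

With row normalization, the diagonal cells are the per-class recall, which is easier to read than raw counts when the classes are imbalanced, as they are in the airline tweets.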
ROC AUC score is 0.94 (94%), which indicates that the model's scores separate positive tweets from negative ones well across classification thresholds.
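One way to read a ROC AUC value is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way by comparing every positive/negative pair. It is an illustration of the definition, not the method used in the video (which calls scikit-learn's roc_auc_score), and the function name is hypothetical.

```python
def roc_auc(y_true, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic, so for a dataset of the size used in the demo a library implementation is the practical choice.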
A dataset of poems (robert_frost.csv) is loaded. Content column is extracted, lines are split and cleaned.
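The line extraction can be sketched as below, assuming the poems have already been read into a list of strings (in the video this list comes from the CSV's content column with missing values dropped). The function name is hypothetical.

```python
def extract_lines(poems):
    """Split each poem on newlines, strip trailing whitespace, and drop empty lines."""
    lines = []
    for poem in poems:
        for line in poem.split("\n"):
            lines.append(line.rstrip())
    return [line for line in lines if len(line) > 0]
```

Each remaining line can then serve as a prompt for the text generation pipeline.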
A text generation pipeline is created using pipeline('text-generation'). It generates text based on a prompt.
Using a line from a poem as prompt, text is generated with max_length=20. Example: 'Whose woods these are I think I know' generates 'I wish to go to church because I feel'.
With max_length=30 and num_return_sequences=2, two different continuations are generated from the same prompt.
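A sketch of the generation call follows. The helper name generate and its optional generator argument are assumptions added for illustration; by default it builds pipeline('text-generation') as in the video. do_sample=True is passed explicitly here as an assumption, since returning several different sequences requires sampling (or beam search) rather than greedy decoding. Note that max_length is counted in tokens and includes the prompt.

```python
def generate(prompt, max_length=30, num_return_sequences=2, generator=None):
    """Return a list of generated continuations (each includes the prompt text)."""
    if generator is None:
        # Imported lazily; the first call downloads the task's default model.
        from transformers import pipeline
        generator = pipeline("text-generation")
    outputs = generator(
        prompt,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        do_sample=True,  # assumption: sample so the returned sequences differ
    )
    return [o["generated_text"] for o in outputs]
```

Because sampling is random, repeated calls with the same prompt will generally return different text.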
A custom prompt 'Transformers have a wide variety of applications in NLP' is used to generate text with max_length=100.
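Longer outputs like this one are easier to read when wrapped. The video prints generated text through a small helper built on the standard-library textwrap module; the sketch below follows that idea, with the width parameter added as an assumption.

```python
import textwrap


def wrap(text, width=70):
    """Wrap text to the given width, keeping existing whitespace characters."""
    return textwrap.fill(
        text,
        width=width,
        replace_whitespace=False,   # keep newlines that the model generated
        fix_sentence_endings=True,  # try to put two spaces after sentence ends
    )
```

Typical use is print(wrap(out[0]["generated_text"])), where out is the list returned by the text generation pipeline.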
Hugging Face simplifies AI model usage with pre-trained models and pipelines. The video demonstrates speech-to-text, sentiment analysis, and text generation, showing how to implement these tasks with minimal code.
What is Hugging Face?
A company that provides pre-trained AI models for language tasks like translation, text analysis, and text generation.
00:04
What is the Transformers library?
A Python library created by Hugging Face that allows downloading, manipulating, and running thousands of pre-trained open-source AI models.
03:05
What does a pipeline in Hugging Face do?
It describes the flow of data from origin to destination and defines how to transform data along the way.
03:51
What is librosa used for?
A Python package for audio and music analysis, used to extract key audio features and metrics from audio files.
05:18
What is tokenization in NLP?
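As defined in the video: the conversion of text into meaningful lexical tokens belonging to categories like nouns, verbs, adjectives, punctuation, etc. In Transformers, a tokenizer more specifically converts between raw input and the token IDs a model works with.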
08:02
What model is used for speech-to-text in the demo?
facebook/wav2vec2-base-960h
09:10
How do you get the transcription from the model?
Tokenize audio, get logits from model, apply torch.argmax to get predicted IDs, then decode with tokenizer.decode.
13:52
What accuracy did the sentiment analysis model achieve on the airline tweets dataset?
88.99%
32:10
What does ROC AUC score measure?
It sums up how well a model can produce relative scores to discriminate between positive and negative instances across all classification thresholds.
37:46
What is the purpose of the text generation pipeline?
To generate creative and coherent text based on a given prompt or starting point.
39:08
Hugging Face Introduction
Defines Hugging Face as a company simplifying AI model usage for language tasks.
00:04
Transformers Library
Explains the core library that enables access to thousands of pre-trained models.
03:05
Pipeline Concept
Describes the pipeline as a data flow abstraction that simplifies model usage.
03:51
Sentiment Analysis Accuracy
Demonstrates high accuracy (88.99%) and ROC AUC (94%) on a real dataset.
16:50
Text Generation from Poems
Shows creative text generation using a poem dataset and custom prompts.
39:08
[00:04] Hello everyone, I am Mang and welcome to
[00:06] our YouTube channel. Today we are
[00:08] exploring hugging face, an amazing tool
[00:10] that makes working with language and AI
[00:13] super easy. If you're curious about
[00:15] using advanced technology to understand
[00:17] or create text, this video is perfect
[00:19] for you. Hugging Face is a company that helps
[00:22] people use AI models for language tasks
[00:24] like translation, text analysis or even
[00:27] generating new text. They have created a
[00:29] library called Transformers which comes
[00:31] with pre-trained models. So you don't
[00:34] have to build everything from scratch.
[00:36] It's simple, powerful, and perfect for
[00:38] developers, researchers, or even
[00:40] beginners. So in this video, I will show
[00:43] you three cool things you can do with
[00:44] hugging face. Speech to text. Turn
[00:47] spoken words into written text easily.
[00:49] Great for captions or voice apps. Second
[00:52] one is sentiment analysis. Find out if a
[00:54] text is positive, negative, or neutral.
[00:57] helpful for understanding reviews or
[00:59] comments. The third one is text
[01:01] generation. Create humanlike text that
[01:04] sounds natural, perfect for chat bots or
[01:06] creative writing. I will also explain
[01:08] some important basics like how pipelines
[01:11] make using Hugging Face models super easy
[01:14] and how tokenization helps AI understand
[01:16] text. By the end of this video, you will
[01:19] know how to use Hugging Face to start
[01:21] building your own projects. It's simple,
[01:23] fun, and really powerful. Before we move
[01:26] forward, unlock the future of AI with
[01:28] a comprehensive program in collaboration
[01:29] with E&ICT Academy, IIT Guwahati. In just
[01:33] a few months, you will master cutting
[01:34] edge skills in generative AI, large
[01:36] language models and machine learning
[01:38] from learning tools like ChatGPT and DALL-E 2
[01:42] to gaining hands-on experience with
[01:43] Python and TensorFlow. This course will
[01:46] equip you to lead AI projects and
[01:48] qualify for top roles like machine
[01:50] learning engineer or data scientist.
[01:52] plus with an IIT certificate on your
[01:55] resume. You will stand out to the best
[01:57] employers in tech. So don't forget to
[01:59] check out the course link from the
[02:00] description box below and the pinned
[02:02] comment. So without any further ado,
[02:03] let's get started. So welcome to this
[02:06] demo part of this video. So here as you
[02:09] know we will be doing three things.
[02:12] First thing speech to text recognition.
[02:14] The second thing text generation from a
[02:16] particular sentence or a word. And the
[02:20] third thing we will do sentiment
[02:22] analysis. Okay. So we will perform this
[02:25] one by one. So let's start with
[02:28] something called speech to text
[02:30] recognition. Okay. Using hugging face.
[02:33] So first I will write rename
[02:36] this. I will write here hugging
[02:41] face speech to
[02:44] text. Okay. Yeah. So first I will
[02:49] install transformers
[02:53] library. It is already installed on my
[02:56] system but again just for you as I'm
[02:59] doing this okay pip install transformers.
[03:02] So now let's see what is transformers.
[03:05] So transformers is a powerful Python
[03:07] library created by hugging face. So that
[03:10] allows you to download, manipulate and
[03:13] run thousands of pre-trained open-source
[03:15] AI models. Fine. So as you can see
[03:18] requirement already satisfied. It means
[03:21] like already installed. So these
[03:24] transformer models cover multiple tasks
[03:26] across you know modalities like NLP,
[03:29] natural language processing, computer
[03:31] vision, audio and multimodal learning
[03:33] like many things. Fine. So now I will do
[03:38] from
[03:41] transformers
[03:43] import
[03:45] pipeline. Okay. So now let's run it. So
[03:50] now what is
[03:51] pipeline? Okay. So a transformer
[03:54] pipeline describes the flow of data from
[03:57] origin system to destination system.
[04:00] Fine. because you already know
[04:03] Transformers means you can run or
[04:05] manipulate thousands of pre-trained
[04:07] open-source AI models. So pipeline
[04:10] describes the flow of data from origin
[04:12] system to destination system and defines
[04:14] how to transform the data along the
[04:17] way. Okay. So let's check
[04:22] the
[04:24] versions
[04:26] transformers
[04:39] version. Okay that long
[04:43] tab. Okay. Why it is
[04:46] coming? Import
[04:50] transformers. Yeah. So we have currently
[04:55] version 4.42.4 of Transformers. Okay.
[04:58] So I'm using Google Colab. Okay. So you
[05:01] can use either Jupyter notebook, Visual
[05:04] Studio Code or
[05:06] Colab. Okay. So now let's import
[05:11] libraries. import
[05:14] librosa. Okay. So what is librosa? So
[05:18] librosa is a python package for audio
[05:21] and music analysis. Right? Because we
[05:24] are doing speech to text. So we need
[05:26] this. So it provides various functions
[05:28] to quickly extract key audio features
[05:31] and metrics from the audio
[05:34] files. Okay. And this librosa can also
[05:38] be used to analyze and manipulate audio
[05:40] files in a variety of formats such as
[05:44] wav, mp3, m4a and like that. Okay. So
[05:50] next we will import
[05:53] torch. I hope everyone knows
[05:56] torch. So torch is nothing but PyTorch. It
[06:00] is a machine learning library based on
[06:02] the Torch library used for applications
[06:04] such as computer vision and NLP and
[06:07] originally it was developed by Meta
[06:11] and is now part of the Linux Foundation, you
[06:13] know, umbrella. Let's import one more
[06:16] thing
[06:19] import so now let's
[06:23] import
[06:28] IPython
[06:30] Okay capital
[06:33] ipython dot
[06:37] display as
[06:39] display. So now you will be wondering
[06:42] what is this IPython
[06:43] display. Okay. So it is an interactive
[06:48] command line uh for the terminal for
[06:51] Python and it uh you know it can provide
[06:55] a IPython terminal and the web-based
[06:57] notebook platform for Python
[06:59] computing and this uh IPython have you
[07:04] know more advanced features than the
[07:05] Python standard interpreter and we
[07:08] quickly execute a single line of Python
[07:10] code nothing okay for that I'm using
[07:13] this and the Next is from
[07:19] transformers
[07:23] import
[07:25] Wav2Vec2ForCTC and
[07:38] Wav2Vec2Tokenizer.
[07:46] Okay. So now what is
[07:50] this? So Wav2Vec2ForCTC is
[07:53] supported by the notebook on how to
[07:56] fine-tune a speech recognition
[07:59] model. Okay. And this tokenizer, so
[08:02] tokenizer is nothing, tokenization is a
[08:05] conversion of a text into meaningful
[08:08] lexical tokens belonging to categories
[08:11] defined by a lexer program. In case of NLP,
[08:15] those categories include nouns, verbs,
[08:17] adjectives, punctuation, etc. Okay. Now,
[08:21] and the last library, let's import
[08:26] numpy as np. I hope everyone knows numpy.
[08:30] What is numpy? Again, numpy is a Python
[08:32] library used for working with arrays. It
[08:34] also has functions, you know, for
[08:36] working in domains like linear algebra,
[08:39] Fourier transform and matrices and many
[08:42] others. Fine. I hope everyone know about
[08:47] these library which we have
[08:49] imported. Okay. So let's move forward by
[08:58] tokenizer equals
[09:04] to
[09:08] [Music]
[09:10] Wav2Vec2Tokenizer dot
[09:14] from
[09:17] pretrained. This is why I love this
[09:19] Google Colab. You just write some
[09:22] words and it will show the
[09:24] suggestions you know
[09:27] Facebook so this is again one pre uh
[09:30] trained model okay so we are just
[09:33] importing
[09:35] it
[09:38] base
[09:40] 960h this is nothing but the name okay then
[09:45] model equals
[09:48] to Wav2Vec2
[09:50] For
[09:53] CTC dot
[09:57] from
[10:00] pretrained then
[10:04] Facebook
[10:06] wav2vec2
[10:10] then
[10:14] base
[10:17] 960h
[10:19] Fine. Now let's run
[10:22] it. So now you can see. So we are
[10:25] loading this. Okay, they they're
[10:31] downloading. Okay. So you can ignore
[10:33] these warning. So now let's load the
[10:36] audio file. So here I will write
[10:39] audio
[10:43] sampling
[10:46] rate equals to
[10:50] librosa
[10:52] dot
[10:55] load. Okay. So you know about librosa
[10:59] right? Then I have v
[11:03] dot m4 a. So we I have already one
[11:09] speech or one audio you can say.
[11:13] So I will play it. Don't worry. Before
[11:17] the final output I will you
[11:20] know show you 15,000.
[11:27] Okay. So now I will write
[11:31] Okay. Okay. I've already loaded I
[11:38] guess.
[11:41] Okay. Okay. No issues. I will load
[11:48] again. Okay. The file is
[11:52] loaded. I will rename it to
[11:56] V. Yeah. Fine. So now it won't there
[12:01] won't come.
[12:05] Again we have v.m4a
[12:09] in.
[12:11] Okay. So
[12:13] now what we have to
[12:18] do?
[12:20] Okay. Let me try
[12:23] it. We have
[12:25] v.m4a but I don't know why it is not
[12:28] coming.
[12:32] Okay, let me copy the path. And now it
[12:36] will run I
[12:39] guess it's running.
[12:43] Yeah. So now I will write here
[12:47] audio comma
[12:50] sampling
[12:54] rate. Okay. Now listen carefully. Then I
[12:57] will write
[13:01] display display dot
[13:06] audio
[13:08] then
[13:10] path
[13:17] comma auto
[13:20] play to true. Okay.
[13:26] Hello and welcome. This is an AI voice
[13:28] message.
[13:30] I guess you heard but again let me play
[13:34] play it again. Hello and welcome. This
[13:36] is an AI voice message.
[13:39] Okay. So it is saying hello and welcome.
[13:42] This is an AI voice message. Fine. So
[13:46] now what I will do? I will input some
[13:49] values.
[13:52] Input values equals
[13:59] tokenizer then
[14:03] audio comma
[14:06] return
[14:09] tensors equals to
[14:12] pt fine
[14:15] dot
[14:17] input values.
[14:20] Okay. Then here
[14:24] input
[14:27] values. Okay. Input
[14:31] values.
[14:35] Okay. Yeah. So now logits will come
[14:40] here. So we have to now store the logits
[14:43] which means non-normalized predictions.
[14:46] Okay. So logits equals
[14:50] to
[14:53] model
[14:55] input
[14:57] values dot
[15:00] logits.
[15:01] Okay. Then here I'll write
[15:08] logits. Okay. It's running. Done. So now
[15:11] what we will do? We will store predicted
[15:14] ids. Then we will pass the logits to
[15:17] values to softmax to get the predictive
[15:20] value. Okay. So here I will write
[15:23] predicted
[15:28] ids equals
[15:32] torch dot arg
[15:35] max. Okay. Then
[15:39] logit dimension equals to minus one.
[15:45] Okay. Then I will pass the you know
[15:48] prediction uh to the tokenizer decode to
[15:51] the
[15:52] transcription. Okay. So here I will
[15:55] write
[15:57] transcriptions equals
[15:59] to
[16:03] tokenizer
[16:08] tokenizer dot
[16:12] decode predicted
[16:16] ids
[16:19] zero running fine. So now let's see our
[16:24] output
[16:27] transcriptions. So now you can see hello
[16:30] and welcome. This is an AI voice voice
[16:32] message. So now let's play it again.
[16:34] Hello and welcome. This is an AI voice
[16:37] message.
[16:38] Amazing right? So this is how you can
[16:41] use Hugging Face for this speech to text
[16:43] recognition. It's a very small code line
[16:47] of code as you can see. Okay. So now
[16:50] let's move forward and do the sentiment
[16:55] analysis. Okay. Or text
[16:58] generation. So let me open new
[17:02] drive. So
[17:03] yeah. So now I will write
[17:07] here. I will write sentiment
[17:12] slash
[17:14] text
[17:15] [Music]
[17:18] generation hugging
[17:20] face.
[17:22] Okay. So first of all let's import the
[17:26] files. So first I'll write
[17:30] import
[17:32] warnings then
[17:36] warnings dot filter
[17:39] warnings okay I will write here
[17:44] ignore then we'll
[17:47] import
[17:48] numpy you already know what is numpy and
[17:54] import pandas so these are some basic
[17:56] basic uh Python libraries and I hope
[17:59] everyone knows and import matplotlib
[18:03] for the
[18:04] plots matplotlib
[18:08] dot
[18:10] pyplot as
[18:13] plt then I will import seaborn for the
[18:19] graphs the statistical graphs okay sns then
[18:25] sns dot
[18:27] set. So now we'll import sklearn model
[18:32] train
[18:33] split. So sklearn is nothing but scikit-learn.
[18:37] It is probably the most useful library
[18:39] for machine learning in Python. So this
[18:42] uh library contains a lot of efficient
[18:44] tools for machine learning and you know
[18:46] statistical modeling including
[18:48] classification you can do regression you
[18:50] can do clustering you can do from this
[18:52] sklearn okay
[18:55] from
[18:57] sklearn dot
[19:00] model
[19:01] [Music]
[19:03] selection
[19:10] import train test
[19:15] split from
[19:23] sklearn
[19:25] dot
[19:27] metrics
[19:29] import f1
[19:36] comma confusion
[19:41] matrix comma roc
[19:49] auc score. So again we will import gyms from
[19:57] transformers
[19:59] import pipeline. Everyone know what is
[20:03] this? Then
[20:07] import
[20:09] torch. Okay. Now let's run
[20:12] it. You have too many active session.
[20:14] Tell me existing to continue. If you're
[20:17] interesting using one more
[20:19] session. Okay. Wait. Yeah. Fine.
[20:24] Now which one is open,
[20:27] bro? Terminate this.
[20:31] Terminate this. Terminate
[20:34] this. Okay. Fine. So now let's do
[20:38] sentiment
[20:41] analysis. So uh we will explore
[20:43] sentiment analysis using a pre-trained
[20:45] transformer model. Okay. The Hugging Face
[20:48] library provides you know convenient
[20:50] pipeline as you already know function
[20:52] that allows us to easily perform
[20:54] sentiment analysis on the text. So now let's
[20:57] first import the necessary dependencies.
[21:01] and create a sentiments pipeline using
[21:04] this line. So I will write here
[21:08] classifier equals
[21:10] to
[21:14] pipeline
[21:16] sentiment
[21:20] analysis then
[21:24] type classifier.
[21:29] Okay. So, it will download the
[21:32] pre-trained sentiment analysis pipeline.
[21:36] Pipeline is not
[21:39] defined.
[21:40] How?
[21:42] Okay. Okay. Error is
[21:47] there. I hope it will run now. Yeah. So
[21:52] now we can uh pass a single sentence or
[21:56] a list of sentences to the classifier
[21:58] and get the predicted sentiment
[22:01] labels and
[22:02] associated confidence scores. Fine. So
[22:07] now just for the
[22:09] testing classifier. So, let's write
[22:14] this is a
[22:20] great
[22:23] movie. Okay, now let me run it. So now
[22:26] you can see here label positive. So
[22:29] label positive typically refers to the
[22:32] outcome or a class of interest that the
[22:35] model is designed to predict. So here we
[22:38] are just checking the sentiment analysis
[22:40] model. Okay. So this is why I wrote
[22:45] this. Okay. So let's check one
[22:56] more.
[22:59] This was
[23:02] a
[23:04] great
[23:07] course.
[23:09] Then
[23:11] I did
[23:14] not
[23:16] understood any of
[23:20] it. Now let's check
[23:22] this. Yeah, perfect. And the score you
[23:26] can see, the confidence score is 99%. 99%
[23:29] which is almost close to 100%. And which
[23:32] is amazing. So if you have access to a
[23:34] GPU, you can also utilize this for
[23:37] the faster processing by specifying the
[23:39] device using the parameter. Okay. So
[23:44] now what I will
[23:47] do first I
[23:51] will okay wait. Yeah fine. So now I will
[23:57] import data
[24:00] set airline
[24:04] tweets equals to pd dot read. PD means
[24:10] pandas library here we are using. Okay.
[24:13] Now first import will import here
[24:17] itself. So I have twitter dot tweets dot
[24:21] csv.
[24:25] Okay.
[24:27] Tweets dot
[24:29] CSV. Let it
[24:33] upload. Yeah. Done. So now let
[24:37] me
[24:41] airline tweets.
[24:47] Y dot head means you it will show me the
[24:51] top five
[24:53] rows. Okay. 0 1 2 3 4 5. Okay. You can
[24:57] see tweet ID, airline sentiment: neutral,
[24:59] positive, neutral, negative. Then this is
[25:01] this. Okay. So
[25:03] now let's do something. Okay. So what I
[25:08] will do df equal df means data frame
[25:12] [Music]
[25:15] airline
[25:17] tweets then I will write
[25:21] airline
[25:23] sentiment we have airline sentiment this
[25:27] column
[25:29] okay text I need these two
[25:33] columns again df dot
[25:37] head
[25:39] five. So yeah, airline sentiment neutral and
[25:45] this is what this text is all about.
[25:49] Okay, because these two are the main
[25:52] things text and the sentiment. So now
[25:55] let's make
[25:58] plot count
[26:01] plot then I will write df
[26:04] comma x =
[26:07] to airline
[26:11] sentiment then
[26:17] palette equals
[26:20] to
[26:23] is then here I will write plt dot x
[26:30] label airline
[26:35] sentiment okay
[26:37] then plt dot y
[26:43] label
[26:46] count plt dot show
[26:52] So it
[26:53] will draw one
[26:56] graph.
[26:59] Okay. Okay. Spelling is
[27:04] wrong. Yeah. So it will show neutral
[27:07] positive sentiment and the negative. So
[27:09] as you can see the negative sentences or
[27:11] the sentiments are the
[27:14] more. Okay. So now so we have now three
[27:18] classes which do not match the two class
[27:20] available in the hugging face pipeline.
[27:23] So therefore we will filter out all the
[27:26] rows which have been labeled as neutral.
[27:29] Okay. So,
[27:31] DF equals to
[27:34] DF and again
[27:37] DF airline
[27:40] sentiment.
[27:42] Fine was not equals
[27:45] to
[27:48] neutral. Okay. Then
[27:53] DF
[27:55] target equals to
[27:58] DF
[28:01] airline
[28:05] sentiment dot
[28:07] uh
[28:10] map then I will write here
[28:17] positive positive 1 and the negative
[28:20] 0
[28:22] find negative will
[28:26] be
[28:28] zero. Okay. Then
[28:34] print number of
[28:39] rows comma df
[28:43] dot shape.
[28:50] Okay. So now you can see number of rows
[28:54] are
[28:56] 11,541. Okay. So now I will write
[29:03] predictions
[29:06] five it will
[29:09] okay predictions is not defined.
[29:14] Okay. So now what I will
[29:18] do. So here I will write
[29:22] text= to
[29:26] df text
[29:29] text and to
[29:31] list dot to
[29:33] list then here I will add
[29:37] predictions equals
[29:39] to
[29:42] classifier text.
[29:45] Okay. So here write
[29:50] probabilities equals
[29:52] to
[29:54] predictions
[29:56] then
[29:58] score. If
[30:02] predictions
[30:05] label
[30:07] dot
[30:08] starts
[30:13] with starts with P means
[30:17] positive. Okay.
[30:21] else 1
[30:23] minus prediction
[30:26] score
[30:29] prediction
[30:31] score
[30:33] okay then I will write
[30:35] for
[30:37] prediction in
[30:39] predictions. Why is it not
[30:43] running this is taking time too much
[30:46] time I don't know
[30:49] why So as you can see finally we have
[30:52] the output predictions values. So this
[30:55] depends on you know system to system. My
[30:58] system took almost 17 minutes 47 seconds
[31:01] to complete. Okay. So now let's run
[31:06] it. Okay. Predictions is now defined
[31:10] row. Now
[31:13] predictions. Yeah. So now I will write
[31:17] here predictions equals to
[31:21] np dot array. np means numpy numpy dot
[31:27] array then I will write one if
[31:32] prediction label is
[31:36] positive
[31:37] dot
[31:39] starts
[31:42] with p. Okay.
[31:45] else
[31:48] zero
[31:51] for prediction in prediction
[31:56] values.
[31:59] Yep. Okay. Fine. So now let's check the
[32:02] accuracy of our model.
[32:10] Print
[32:12] accuracy.
[32:15] Okay, then I will do round
[32:19] up the values. So here I will write np
[32:24] dot
[32:27] mean then
[32:30] df then the
[32:32] target
[32:34] right.
[32:36] Yeah. Then I will write equals equals to
[32:41] predictions. Then I will write into
[32:46] 100
[32:48] comma two. Then I will write
[32:54] percentage.
[32:56] Okay. Fine. Looks good. Yes.
[33:01] So as you can see our accuracy is
[33:04] 88.99% which is you know very good
[33:07] again. So now let's do some confusion
[33:12] matrix
[33:15] confusion
[33:17] matrix then
[33:22] df
[33:25] target predictions comma
[33:29] normalize equals
[33:31] to
[33:34] true. Okay.
[33:36] Then I will
[33:39] plot confusion
[33:44] matrix. Okay.
[33:52] DF
[33:54] then
[33:58] target predictions command
[34:03] normalize equals
[34:06] to
[34:08] true. Okay. Okay. My bad. My bad. My
[34:10] bad.
[34:14] that
[34:18] confusion
[34:19] [Music]
[34:22] matrix
[34:24] comma
[34:26] labels. Okay. So here uh I will you
[34:30] know plot a confusion matrix using
[34:33] seaborn. Okay. So here I will use args
[34:36] which is confusion matrix np array. So
[34:40] which is you know labels list. So I will
[34:43] write plt dot
[34:47] figure then figure
[34:50] size equals to 8 comma
[34:55] 6 then sns
[34:57] dot
[35:00] set font
[35:03] scale equals to
[35:09] 1.4. Okay. Then let's create the heat
[35:17] map confusion
[35:20] matrix comma annot will be
[35:26] true. Okay. Then I will write
[35:30] fmt equals to
[35:33] g then
[35:36] cmap equals to
[35:40] [Music]
[35:43] Blues. Then x tick
[35:47] labels equals to
[35:49] labels. Then
[35:56] y tick labels equals
[36:00] to
[36:01] labels.
[36:05] Okay. Yes. So now I'll write plt
[36:09] dot title will
[36:14] be confusion
[36:17] matrix then plt dot x label will
[36:23] be predicted values then plt dot y label
[36:29] will
[36:30] be actual values then plt dot
[36:38] show.
[36:40] Okay. Why the chart is not
[36:45] coming? Okay. So I have to write
[36:51] plot confusion matrix then write
[36:56] cm
[37:00] then
[37:04] negative
[37:08] comma
[37:10] positive. Okay. Now the chart will come.
[37:21] Okay. Yeah. So now you can see here this
[37:25] is you know actual and this is a
[37:28] confusion matrix and the negative and
[37:31] the positive ratio is there. Okay. So
[37:35] now let's print
[37:37] the let's check the ROC score. So I will
[37:40] write here
[37:46] print
[37:47] roc auc
[37:52] score then here I will write roc auc
[37:58] score then
[38:02] df
[38:06] target
[38:08] probs.
[38:10] Okay. So 94. Okay. So first let me tell
[38:15] you what is this ROC AUC score. So this
[38:18] is the area under the ROC curve. So what
[38:21] it does it sum up how well a model can
[38:24] produce relative scores to discriminate
[38:26] between positive and the negative
[38:28] instance across all the classification
[38:31] threshold. So with this ROC score of
[38:35] 0.94 which is 94% we can conclude
[38:38] that the pre-trained sentiment
[38:41] analysis model has achieved a high
[38:43] level of accuracy and
[38:45] effectiveness. So in predicting the
[38:47] sentiment labels so this indicates that
[38:49] the model is capable of accurately
[38:51] classifying text into positive or the
[38:53] negative sentiment categories. Okay. So
[38:56] now we'll do text generation. Okay. So
[39:00] text generation involves generating
[39:01] creative and coherent text based thing.
[39:05] So I will write here
[39:08] text.
[39:10] Okay. So text generation involves
[39:12] generating creative and coherent text
[39:14] based on a given prompt or starting
[39:16] point. So what we will do first? We will
[39:19] import the necessary dependencies and
[39:20] load the data data set of the poem.
[39:23] Okay. So here I am write poems. Let me
[39:26] write poems equals to pd dot read dot
[39:30] csv. Don't worry I will give you these
[39:34] files.
[39:37] Okay description box below I will
[39:40] add. So here I will write
[39:43] robert
[39:47] frost dot
[39:51] csv. Fine. And then I will write
[39:56] poems dot
[39:58] head
[40:01] five or head just have to write it
[40:04] module pandas
[40:07] has okay not
[40:09] doc
[40:12] csv
[40:13] yeah okay it will show the top five
[40:16] stopping by woods on a snowy evening
[40:19] fire and ice the aim was
[40:21] song collection content, year of
[40:25] publish.
[40:27] Okay. So now what we will do? We will
[40:33] write
[40:35] content equals to
[40:44] poems
[40:46] content dot
[40:49] dropna dot to
[40:57] list. Okay. So to generate text we
[41:00] extract individual lines from the poems
[41:03] and use the pipeline text generation
[41:05] function to create a text generation
[41:07] pipeline. Fine. So here I will write
[41:11] lines equals to then I will write
[41:16] for poem in
[41:21] content. Okay. For
[41:25] line in poem
[41:29] dot
[41:30] split on the new
[41:33] line.
[41:36] Okay. Then lines dot
[41:41] append. Then I will add
[41:43] line dot
[41:47] rstrip, to the right. It will add. Okay.
[41:56] Fine. So now lines equals
[42:00] to line for
[42:03] line in lines.
[42:08] If
[42:10] length of
[42:12] [Music]
[42:14] line
[42:17] zero then show me the
[42:21] lines
[42:23] five okay so now you have seen here
[42:27] Whose woods these are I think I know
[42:29] then the next line His house is in the
[42:31] village though the next line He will not
[42:33] see okay like this so now Let's
[42:38] uh you know import that pipeline text
[42:40] generation module. Okay. Gen equals
[42:45] to
[42:46] [Music]
[42:49] pipeline. So these are the some
[42:51] pre-trained model you already
[42:55] know
[43:02] generation. Okay. So now I will write
[43:06] here
[43:09] lines. Let's run
[43:12] it. Yeah. Done. So in line zero you have
[43:17] we have Whose woods these are. I think I
[43:20] know. Okay. So now we can now generate
[43:23] text by providing a prompt and
[43:25] specifying the parameters such as max
[43:27] length and num return sequence. Okay.
[43:30] Why? Because we have imported this text
[43:31] generation module. Okay. For example,
[43:35] see
[43:36] [Music]
[43:37] gen. Okay,
[43:40] sorry.
[43:42] Lines
[43:44] zero dot max
[43:50] length max length equals to 20. So now
[43:53] it will generate the this to the maximum
[43:58] 20. Okay. till 20 word maximum length.
[44:03] Okay.
[44:05] Check. Okay.
[44:07] Uh okay. Expression cannot
[44:11] contain assignment, perhaps double equals
[44:13] to where
[44:18] okay now it will run. So see the line
[44:22] was this much only Whose woods these
[44:25] are. I think I know. Whose woods these
[44:27] are? I think I know. But here the
[44:31] generated text I wish to go to church
[44:33] because I feel
[44:34] like okay this is how you can do you
[44:37] know text generation. Now let's check
[44:40] for the more
[44:42] gen lines
[44:44] 1. Okay. Then
[44:49] max length equals to
[44:53] 30,
[44:54] [Music]
[44:56] num return sequence
[45:00] says
[45:02] sequences equals to
[45:05] two. Okay. So our first line was this.
[45:10] His house is in the village though.
[45:12] Okay. 0 1 2 3 4 5. Okay. So here you can
[45:16] see see his house in the village though.
[45:19] However you might say that the place was
[45:21] the same with the place this this this.
[45:24] So these are the generated text. Okay.
[45:27] And this is another you know return
[45:29] sequence second. So there are two one
[45:32] and the second. Okay. So
[45:36] now let me you know
[45:39] import textwrap.
[45:44] Okay,
[45:46] then creating function wrap x. I don't
[45:51] need
[45:57] this. Then
[46:02] return textwrap
[46:05] dot
[46:08] fill x comma
[46:13] replace
[46:17] whitespace equals to
[46:26] false comma uh
[46:30] fix
[46:32] sentence endings equals to
[46:36] true. Okay. So now
[46:40] out equals to generated
[46:46] lines
[46:48] zero to maximum
[46:55] length equals to 30.
[47:00] Then
[47:04] print wrap
[47:07] out to zero.
[47:09] [Music]
[47:11] Then
[47:15] generated
[47:17] text. Okay. So now we are setting uh the
[47:21] pad tokens to EOS tokens. Okay. So whose
[47:24] woods these are I think I know. And this is
[47:27] a maximum 30 till 30.
[47:29] Okay. So now I will write
[47:34] here okay preview equals to okay. So now
[47:39] what you can do you
[47:42] can you know generate a prompt to
[47:45] generate text on a specific topic like
[47:48] this prompt equals
[47:52] to
[47:55] transformers have a wide
[47:59] variety of
[48:04] applications in NLP
[48:10] Okay. So this is my prompt. Okay. This
[48:12] I'm not importing from the data set.
[48:15] Okay. So I will write out equals to
[48:21] gen prompt comma max
[48:27] length equals to 100.
[48:36] print
[48:38] wrap out.
[48:42] Okay, here I will write anyways here I
[48:46] will write
[48:50] generated
[48:55] text
[49:02] prompt. Okay, so it is
[49:06] running. Let's wait.
[49:13] Yeah.
[49:15] So, okay. Somewhere is
[49:34] there. Okay. So, here the issue is with
[49:37] 100 words I go with. We'll go till
[49:42] 50. Yeah. So let me do it again.
[49:48] 100. Yeah. So now you can see we have we
[49:52] can generate text using by giving a
[49:56] prompt. So we have covered you know
[50:00] three topics in this. First thing is
[50:02] speech to text using Hugging Face. Second
[50:04] thing is sentiment analysis using
[50:06] hugging face. And the third is text
[50:09] generation using hugging face. Okay. And
[50:12] in this text generation we have we did
[50:14] with two methods. One with the data set
[50:18] and second thing with the you know by
[50:20] giving prompt like in ChatGPT you can
[50:23] say. So with this we have come to end of
[50:25] this video. If you have any question or
[50:27] doubt please feel free to ask in the
[50:28] comment section below and our team of
[50:30] experts will help you as soon as
[50:31] possible. So thank you and keep learning
[50:33] with Simplilearn.
[50:35] [Music]
[50:38] Hi there. If you like this video,
[50:40] subscribe to the Simplilearn YouTube
[50:42] channel and click here to watch similar
[50:44] videos. To nerd up and get certified,
[50:47] click here.