
# Deploying a Large Language Model as a Hugging Face Inference Endpoint

Jyoti Dabass, Ph.D


Large language models (LLMs) are artificial intelligence systems capable of understanding and generating human-like text at scale. These models are trained on massive amounts of text data, typically billions of words, enabling them to learn complex patterns and relationships between words, sentences, and concepts.

*Figure: Large language models*

Hugging Face Inference Endpoints are lightweight, self-contained instances designed specifically for serving machine learning models with low latency and high throughput. They allow developers to easily deploy and manage models built with the open-source Hugging Face Transformers library, without requiring expertise in cloud computing or DevOps.

*Figure: Hugging Face*
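To make the distinction concrete, the sketch below loads a text-generation model locally with the Transformers pipeline API. The tiny gpt2 model is used here purely for illustration (Falcon-40B-Instruct is far too large for most local machines); an Inference Endpoint serves the same kind of model behind a managed HTTP API instead.

```python
# A minimal local sketch using the Transformers pipeline API.
# "gpt2" is chosen purely for illustration; an Inference Endpoint
# hosts the same kind of model behind a managed HTTP API instead.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```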

This blog walks you through the process of deploying a large language model as a Hugging Face Inference Endpoint, using the Falcon-40B-Instruct model from the Hugging Face Hub as the example.

*Figure: Deploy LLM*

To start, we first need to create an account on Hugging Face (huggingface.co). It requires only an email address and a password.

*Figure: Hugging Face account*

Once that is done, you can log in with the credentials provided at registration. From the home page, you can search for the model you need and select it to view its details.

*Figure: Search for a model*
*Figure: Selected model details page*

On the right-hand side of the model page there is a “Deploy” button, under which you can see multiple deployment options. Since we are focusing on Inference Endpoints, we select that option and proceed.

*Figure: Deployment options*

Selecting Inference Endpoints takes you to a page where you can create an inference endpoint for the chosen model. Several fields need to be filled in.

*Figure: Create endpoint*

The next step is to enter the Hugging Face repository ID and the name of the desired endpoint.

*Figure: Model repository and endpoint name*

Next, select the cloud provider and region.

*Figure: Cloud provider and region*

Then, define the security level for the endpoint.

*Figure: Security level*

Create your endpoint by clicking “Create Endpoint”. By default, the endpoint is created with a large GPU. The estimated cost assumes the endpoint will be up for an entire month and does not take autoscaling into account.

*Figure: Create endpoint*
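If you prefer scripting to clicking through the UI, the huggingface_hub library exposes a create_inference_endpoint helper that covers the same form fields (repository ID, endpoint name, cloud provider, region, security level, and instance). Here is a minimal sketch; the endpoint name, vendor, region, and instance values are assumptions to adapt to your own account.

```python
# A sketch of creating the endpoint programmatically with huggingface_hub
# (pip install huggingface_hub; requires a token with write access).
# The name, vendor, region, and instance values below are assumptions;
# check the options the UI offers for your account before using them.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "falcon-40b-instruct-demo",               # endpoint name (hypothetical)
    repository="tiiuae/falcon-40b-instruct",  # Hugging Face repository ID
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                              # cloud provider
    region="us-east-1",
    type="protected",                          # security level
    instance_type="nvidia-a100",               # assumed GPU instance
    instance_size="x4",
)
print(endpoint.name, endpoint.status)
```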

We need to wait for the endpoint to build, initialize, and run, which may take up to 10 minutes.

*Figure: Waiting for the endpoint to build and run*
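Rather than watching the status in the UI, you can also poll for readiness from the same library; a small sketch, assuming the endpoint name from the creation example above:

```python
# Block until the endpoint reaches the "running" state.
# Assumes the endpoint name from the creation sketch above.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("falcon-40b-instruct-demo")
endpoint.wait()  # polls the status; raises if the endpoint fails
print(endpoint.status, endpoint.url)
```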

Finally, we can test the endpoint by calling the API URL generated during deployment, authenticating with a Hugging Face access token.

*Figure: Test endpoint*
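A protected endpoint expects the access token in an Authorization header. Here is a minimal sketch using plain requests; the endpoint URL and token are placeholders to replace with the values from your deployment page and account settings.

```python
# Query the deployed endpoint over HTTP. URL and token are placeholders;
# use the endpoint URL from the deployment page and a Hugging Face
# access token from your account settings.
import requests

API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HEADERS = {"Authorization": "Bearer <your-hf-access-token>"}     # placeholder

payload = {
    "inputs": "Explain what an inference endpoint is in one sentence.",
    "parameters": {"max_new_tokens": 64},
}
response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())
```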

Cheers!! Happy reading!! Keep learning!!

Please upvote if you liked this!! Thanks!!

You can connect with me on LinkedIn (Jyoti Dabass, Ph.D) and GitHub (jyotidabass) for more related content. Thanks!!

Source: https://python.plainenglish.io/deploying-large-language-model-as-hugging-face-inference-points-159b9a3bf4a2?source=rss——large_language_models-5
