Deploy Financial BERT model API in AWS SageMaker | Arjun #FinancialBERTDeployment



In our recent articles, we explored stock market analysis using Python and Large Language Models (LLMs). The series covers the application of LLMs to several facets of market analysis, including technical pattern analysis, fundamental analysis, and news-based trend analysis. If you’re a retail investor keen on using AI and technology for your investments, I recommend reading the Python and LLM for Market Analysis series. Whether you’re a seasoned investor or just starting out, these articles, from Part I to Part V, offer perspectives and strategies to improve your understanding and decision-making in finance.

Up until now, our news sentiment analysis has been powered by cloud-based Google Bard models. In my personal projects, however, I’ve opted for a fine-tuned BloombergGPT model, pre-trained on an extensive dataset of financial information.

In this article, we deploy our own financial model using AWS SageMaker. Our model of choice is FinBERT, and we’ll go through setting it up and using it for more accurate and nuanced financial sentiment evaluation.

Although the spotlight here is on FinBERT, the same steps apply to other Large Language Models (LLMs) hosted on Hugging Face, so you can adapt them to the model you prefer.

1. Why is FinBERT my choice of financial model?

For a detailed understanding, check out the FinBERT article by Prosus AI. In essence, Financial Sentiment Analysis differs from regular sentiment analysis.

  • In the finance world, a company making a loss may still be seen positively if it’s better than analysts’ expectations.
  • Conversely, a profitable company falling short of market expectations can generate negative sentiment.
  • Sometimes, seemingly good news triggers large institutional sell-offs, causing prices to drop.

Relying solely on news sentiment may not suffice; factors like bulk trades and additional data are crucial for a comprehensive analysis.

Yes, take a deep breath…

Unless a model is trained on such nuances, it struggles to interpret what really happened. FinBERT is one model trained for exactly this. More details about the model here.

2. What is BERT?

BERT, which stands for Bidirectional Encoder Representations from Transformers, has achieved state-of-the-art performance on various benchmark datasets and has become a foundational model in natural language processing. What sets BERT apart is its bidirectional approach to language understanding. Unlike traditional language models that read text in one direction (either left-to-right or right-to-left), BERT considers both directions simultaneously. This bidirectional context allows BERT to capture complex relationships and nuances in language, making it particularly effective in tasks such as text classification, sentiment analysis and question answering.

Different from the Sesame Street BERT though!

3. How are we going to use AWS SageMaker to host our own LLM?

What we are going to build has three parts:

  • SageMaker hosts our language model. This needs a dedicated machine.
  • A Lambda function (a small piece of Python code triggered by an event such as an API call) lets us interact with the language model conveniently. It needs no dedicated server: it is serverless, so the code runs and produces results on demand.
  • API Gateway is how external systems, including a local machine (anything outside the AWS ecosystem), interact with the model through traditional REST APIs.
  • For more details, see API Gateway, Lambda Functions and SageMaker.

4. Concerned about the cost of hosting your own Language Models?

Let’s break it down. With a pay-per-token model like GPT-4 in Azure OpenAI, the cost looks something like this.

For decent accuracy, we will use GPT-4.

  • Completion tokens cost more than prompt tokens. Fortunately, our use case has far more prompt tokens than completion tokens (a 10:1 ratio).
  • Approximately 1 token ~= 4 characters in English.
  • We will easily have 250 words (~400 tokens) per request, at $0.03 per 1,000 prompt tokens.
  • In a day we will easily make 1,000 requests, which comes to 400K prompt tokens, or ~$12.
  • Since the response is 1/10th the size and its price per token is double, completions add $2.40, for a total of ~$14.40/day.
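The arithmetic in the list above can be checked with a few lines of Python. This is only a rough sketch using the figures quoted there (400 prompt tokens per request, 1,000 requests a day, $0.03 per 1,000 prompt tokens, completions at a tenth of the volume and twice the price); the function and parameter names are made up for illustration, and actual pricing may differ.

```python
def pay_per_token_daily_cost(
    requests_per_day=1000,
    prompt_tokens_per_request=400,
    prompt_price_per_1k=0.03,        # USD per 1,000 prompt tokens (figure quoted above)
    completion_ratio=0.1,            # completion tokens as a fraction of prompt tokens
    completion_price_multiplier=2,   # completion tokens priced at twice the prompt rate
):
    """Return (prompt_cost, completion_cost, total) in USD per day."""
    prompt_tokens = requests_per_day * prompt_tokens_per_request
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_tokens = prompt_tokens * completion_ratio
    completion_cost = (
        completion_tokens / 1000 * prompt_price_per_1k * completion_price_multiplier
    )
    return prompt_cost, completion_cost, prompt_cost + completion_cost


prompt_cost, completion_cost, total = pay_per_token_daily_cost()
print(f"prompt ${prompt_cost:.2f} + completion ${completion_cost:.2f} = ${total:.2f} per day")
```

Changing `requests_per_day` or `prompt_tokens_per_request` shows how quickly the bill grows with usage.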

When hosting our own model in AWS SageMaker, without fine-tuning, the cost breakdown is as follows:

1. AWS Instance for Notebook (ml.t3.medium):

  • A notebook is needed to run the code that deploys the language model.
  • 1 instance at $0.05 per hour
  • Created for 1 hour per day and later deleted

2. AWS Instance for Language Model (ml.m5.xlarge):

  • 1 instance at $0.23 per hour
  • Operational for 8 hours per day and later deleted

3. AWS Lambda Serverless Function:

  • Runs custom Python program invoking the LLM.
  • AWS Lambda free tier includes 1M free requests and 400K GB-seconds of compute time per month

4. AWS API Gateway:

  • Allows API consumption from local machines or any location
  • Free tier includes one million API calls received for REST APIs, one million for HTTP APIs, and one million messages with 750,000 connection minutes for WebSocket APIs per month for up to 12 months

Even with 1M API requests per month, API Gateway costs $3.50 for the entire month, roughly $0.12/day (and only after the free tier ends).

In a smaller use case, costs for Lambda and API Gateway may be minimal, but for a production-like system, additional charges may apply if exceeding the free tier limits.

Total Cost: ($0.05 × 1 hour) + ($0.23 × 8 hours) + $0.12 = $2.01 per day

– This assumes we clean up all instances once the work is completed.

– My current setup: every morning an automated deployment brings these services up in about 10 to 15 minutes, and at the end of the day (after 8 hours) the services are deleted.

– This calculation only covers a minimal use case and may not apply to every setup, but it gives a fair idea of what to expect.
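The same total can be reproduced in Python. The sketch below uses only the hourly rates and hours quoted above, plus the assumption of a 30-day month when spreading the $3.50 API Gateway charge across days; the function name is illustrative.

```python
def self_hosted_daily_cost(
    notebook_rate=0.05, notebook_hours=1,   # ml.t3.medium notebook instance
    endpoint_rate=0.23, endpoint_hours=8,   # ml.m5.xlarge inference instance
    api_monthly=3.5, days_per_month=30,     # API Gateway after free tier; 30-day month assumed
):
    """Return the estimated daily cost in USD of the self-hosted setup."""
    api_daily = api_monthly / days_per_month
    return notebook_rate * notebook_hours + endpoint_rate * endpoint_hours + api_daily


daily = self_hosted_daily_cost()
print(f"self-hosted: ${daily:.2f} per day")
```

Passing a larger `endpoint_hours` shows what happens if the endpoint is left running around the clock instead of being deleted after 8 hours.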

Let’s dive into the most important part of it — the implementation.

1. It all starts with a visit to the FinBERT (ProsusAI/finbert) model page on Hugging Face.

2. Click on Deploy, choose the Amazon SageMaker option, and copy the code that is presented.

3. Create an AWS account (a free tier may be available initially). Log in to the AWS account and navigate to SageMaker.

4. To deploy the model, create a Notebook instance. Under the ‘Notebook’ section, start a new instance and give it a name.

5. Once the instance is up, open the notebook and paste the code copied from Hugging Face.

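import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

# Use the notebook's execution role; fall back to looking it up by name.
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration.
hub = {
    'HF_MODEL_ID': 'ProsusAI/finbert',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
# (the version values below are assumptions; use the ones in the snippet Hugging Face generates for you)
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

# send a request to the endpoint
predictor.predict({
    "inputs": "<your input text for sentiment classification goes here>",
})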

Let’s try to understand this code a bit.

  1. Import boto3, sagemaker and the Hugging Face model class.
  2. Initialize the role, hub configuration and model.
  • Note how we use ProsusAI/finbert for the model and text-classification for the task.

You might notice that this is only a classification model. It doesn’t provide a numeric value (-1.0 to 1.0) like Google Bard did in our previous article. That’s fine for now; in the upcoming article we will use another model and fine-tune it to predict the intensity of the market sentiment.

3. Look at this line carefully

predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

  • Here an ml.m5.xlarge instance is spun up to run the model. If needed, we can change this to another instance type that suits the model.
  • Executing this line takes up to ~10 minutes for the model to be ready.
  • Once the model is ready, we can run predictor.predict.

4. When the notebook runs, the predictor.predict line returns a label and a score as output.
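To make the label-and-score output concrete, here is a small sketch of how the result might be handled. It assumes the endpoint returns a list of dictionaries with `label` and `score` keys, which is the usual shape for a Hugging Face text-classification task; the sample values and the `top_sentiment` helper are made up for illustration.

```python
def top_sentiment(prediction):
    """Return (label, score) for the highest-scoring entry in a prediction."""
    if not prediction:
        raise ValueError("empty prediction")
    best = max(prediction, key=lambda item: item["score"])
    return best["label"], best["score"]


# Made-up sample in the assumed response shape, not real model output.
sample = [{"label": "neutral", "score": 0.82}]
label, score = top_sentiment(sample)
print(label, score)
```

In the notebook, the same helper could be applied directly to the return value of `predictor.predict(...)`.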

5. Now, deploy the model as a service by creating a Lambda function. Give it a name and paste the code below under the Code tab.

import json
import boto3

def lambda_handler(event, context):
    # client for invoking SageMaker endpoints
    client = boto3.client('sagemaker-runtime')
    payload = json.dumps(event)
    # name of the deployed SageMaker endpoint
    model = "huggingface-pytorch-inference-xxxx-xx-xx-xx-xx-xx-xxx"
    response = client.invoke_endpoint(EndpointName=model, ContentType='application/json', Body=bytes(payload, 'utf-8'))
    result = json.loads(response['Body'].read().decode())
    return {
        'statusCode': 200,
        'body': result
    }

6. As you can see, we simply invoke the model using invoke_endpoint, parse the response from the model, and pass it along with a statusCode.

7. Replace the model value with your endpoint name. You can find it under the SageMaker Inference tab. Copy the Name, not the ARN.

Legend says you should put the model variable under Environment configurations and refer to it from the code.

event and context are passed in by default, and we shouldn’t attempt to change them. event is where the JSON input from the API request arrives.

This is just a plain lambda_handler. A production-ready lambda_handler will handle more scenarios than this.
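As a rough idea of what handling more scenarios could mean, the sketch below reads the endpoint name from an environment variable, validates the input, and returns an error status when the call fails. The variable name `ENDPOINT_NAME`, the status codes chosen, and the optional `client` argument (there only so the handler can be exercised locally with a stub) are all assumptions for illustration, not part of the original setup.

```python
import json
import os


def lambda_handler(event, context, client=None):
    # ENDPOINT_NAME is a hypothetical environment variable set in the Lambda configuration.
    endpoint = os.environ.get("ENDPOINT_NAME")
    if not endpoint:
        return {"statusCode": 500, "body": {"error": "ENDPOINT_NAME is not configured"}}

    # Reject requests that do not carry a non-empty "inputs" string.
    text = event.get("inputs") if isinstance(event, dict) else None
    if not isinstance(text, str) or not text.strip():
        return {"statusCode": 400, "body": {"error": "'inputs' must be a non-empty string"}}

    if client is None:
        import boto3  # imported lazily so a stub client can be used in local tests
        client = boto3.client("sagemaker-runtime")

    try:
        response = client.invoke_endpoint(
            EndpointName=endpoint,
            ContentType="application/json",
            Body=json.dumps({"inputs": text}).encode("utf-8"),
        )
        result = json.loads(response["Body"].read().decode("utf-8"))
    except Exception as exc:  # surface endpoint failures as a gateway-style error
        return {"statusCode": 502, "body": {"error": str(exc)}}

    return {"statusCode": 200, "body": result}
```

Lambda itself only passes `event` and `context`, so the extra `client` parameter stays at its default in AWS.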

8. Click on Deploy under the code tab each time a change is made to the code.

9. Make sure to update the policies by following this article. This ensures there is no permission denial between Lambda and the SageMaker inference endpoint.
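For orientation, a minimal policy of this kind might look like the sketch below, which builds the JSON document in Python. This is an assumption about what such a policy typically contains (permission for the `sagemaker:InvokeEndpoint` action on one endpoint) and not necessarily what the referenced article prescribes; the ARN is a placeholder and the function name is illustrative.

```python
import json


def invoke_endpoint_policy(endpoint_arn):
    """Build an IAM policy document allowing invocation of one SageMaker endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arn,
            }
        ],
    }


# Placeholder ARN: replace region, account id and endpoint name with your own.
policy = invoke_endpoint_policy(
    "arn:aws:sagemaker:<region>:<account-id>:endpoint/<endpoint-name>"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the Lambda function’s execution role in the IAM console.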

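Then go to the Test tab, enter a JSON input with an "inputs" field (the same shape passed to predictor.predict), and hit Test.

A successful run returns statusCode 200 with the model’s label and score in the body.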

10. Next, expose the Lambda function over an API. In the AWS console, open the API Gateway service, create a new API, choose REST API and click Build.

11. Give it a name and create a method. Make it a POST and choose Lambda function as the integration type.

12. In the Lambda configuration, a dropdown menu will appear where you can choose the appropriate Lambda function you’ve created. Select the correct one.

13. After creating the method, deploy the API by clicking the button in the top right corner. Choose or create a stage; think of a stage as an environment in this context.

14. Make sure to enable throttling so that a burst of API calls doesn’t put a huge dent in your wallet. Yes, that happens to everyone playing around with LLMs these days!

15. For security, it’s advisable to set up Authorizers as well, or simply create an API key under the newly created API Gateway resource.

That’s it! We have now deployed and hosted our own language model.

We can test the API using Postman or a simple Python program.

import requests
import json

API_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
URL = ''

body = {
"inputs": """In light of recent discussions surrounding inquiries by the Enforcement Directorate (ED), Paytm Payments Bank clarified that no probe against Bank or One97 Communications Ltd by the probe agency. "One 97 Communications Ltd and Paytm Payments Bank operate with the highest ethical standards. We can confirm that neither we nor OCL's founder-CEO have been the subject matter of investigation by the Enforcement Directorate regarding money laundering," a spokesperson for Paytm Payments Bank said, news agency ANI reported.
If there are any fresh charges of money laundering against Paytm by RBI, those will be investigated by Directorate of Enforcement as per the law of the land," Reuters quoted Revenue Secretary Sanjay Malhotra.
The clarification comes amidst reports suggesting otherwise. "Occasionally, some merchants on our platforms have been the subject of inquiries, and we cooperate fully with authorities in such instances. We categorically deny any involvement in money laundering activities and believe fair and responsible journalism is crucial for accurate information dissemination," the spokesperson added."""
}

data = json.dumps(body)

# POST the text to the API Gateway endpoint, authenticating with the API key
response = requests.post(URL, data=data, headers={'Content-Type': 'application/json', 'x-api-key': API_KEY})
print(response.json())


Make sure to replace the API_KEY and URL, then run it!

Please make sure to delete the model and all the other services associated with it once you are done, especially the ml.m5.xlarge and ml.t3.medium instances. Remember, they are charged per hour.
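From the notebook, the model and endpoint can be removed with the predictor object returned by the deploy call. The helper below is a small sketch: `tear_down` is an illustrative name, and it assumes `predictor` is the SageMaker SDK predictor created earlier. The notebook instance itself still has to be stopped and deleted separately, for example from the SageMaker console.

```python
def tear_down(predictor):
    """Delete the model and the endpoint behind a SageMaker predictor."""
    # Delete the model first: the SDK looks it up through the endpoint's configuration.
    predictor.delete_model()
    # Then delete the endpoint, which stops the hourly charge for the inference instance.
    predictor.delete_endpoint()


# In the notebook, after you are done for the day:
# tear_down(predictor)
```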

I’m intrigued by how effortlessly AWS and HuggingFace have simplified this for us. Curious, I inquired with ChatGPT, and here’s what it had to say.

Deploying an AI model with AWS and HuggingFace feels like waving a wand and making magic happen. Thanks to the seamless integration and user-friendly interfaces, what used to be a complex process has become as enchanting as a spell. Now, turning your machine learning dreams into reality is as easy as saying Abracadabra with the help of AWS and the magical touch of HuggingFace!

In the upcoming article, we will discuss fine-tuning a language model trained on a large finance corpus and see how the fine-tuned model performs under real market conditions.

Thanks for reading this far. If you found this post useful, please leave a clap or two, and if you have suggestions or feedback, please feel free to comment. It would mean a lot to me!

In case of queries, please feel free to connect with me on LinkedIn or X (formerly Twitter).
