#**Goal:**

**The primary goal of this project is to build a chatbot that answers user queries using content extracted from a set of provided PDF documents. The chatbot relies on a pre-trained large language model (LLM) to generate answers; the document content is embedded and stored in a vector index (llama-index's `VectorStoreIndex`), from which relevant passages are retrieved at query time. The project demonstrates how pre-trained language models and vector stores can be combined to build an informative chatbot.**

#**Installing Required Packages**

In [None]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade
!pip install langchain einops accelerate transformers bitsandbytes scipy
!pip install xformers sentencepiece
!pip install llama-index==0.7.21 llama_hub==0.0.19
!pip install sentence-transformers
!pip install gradio==3.48.0

Looking in indexes: https://download.pytorch.org/whl/cu117


#**Importing Transformer Classes and Setting up Model**

In [None]:
# Import transformer classes for generation
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Import torch for datatype attributes
import torch

In [None]:
# Define variable to hold the Mistral model name on the Hugging Face Hub
name = "mistralai/Mistral-7B-Instruct-v0.2"
# Set auth token variable from hugging face
auth_token = "AUTH_TOKEN"
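Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token has been exported in an environment variable named `HF_TOKEN`; the `get_auth_token` helper is illustrative and not part of the original notebook:

```python
import os

def get_auth_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token stored in env_var, or None if unset."""
    token = os.environ.get(env_var, "").strip()
    # An empty string is treated the same as a missing variable
    return token or None

auth_token = get_auth_token()
```

If the variable is not set, `auth_token` is `None`, which is sufficient for public models.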

In [None]:
# Create tokenizer
tokenizer = AutoTokenizer.from_pretrained(name,
    cache_dir='./model/', use_auth_token=auth_token)




In [None]:
# Create model
model = AutoModelForCausalLM.from_pretrained(name,
    cache_dir='./model/', use_auth_token=auth_token, torch_dtype=torch.float16,
    load_in_8bit=True)




#**Running the Model to Generate Output**

In [None]:
# Set up a prompt (adjacent string literals are joined without stray whitespace)
prompt = ("### User: What is the fastest car in the world "
          "and how much does it cost? "
          "### Assistant:")
# Pass the prompt to the tokenizer
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Setup the text streamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

**This step creates a prompt, passes it to the tokenizer, sets up a text streamer, and runs the model to generate output text based on the prompt.**
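The prompt above uses a generic `### User:` / `### Assistant:` layout. Mistral's instruct models are documented as expecting the instruction wrapped in `[INST]` and `[/INST]` markers, so a helper along the following lines may give more reliable answers. This is only a sketch: `build_instruct_prompt` is a hypothetical helper, and it assumes the tokenizer adds the beginning-of-sequence token itself.

```python
def build_instruct_prompt(question):
    """Wrap a single user question in Mistral-style instruction markers."""
    # Collapse runs of whitespace so line continuations do not leak into the prompt
    cleaned = " ".join(question.split())
    return f"[INST] {cleaned} [/INST]"

prompt = build_instruct_prompt(
    "What is the fastest car in the world and how much does it cost?"
)
print(prompt)
```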

In [None]:
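# Run generation; the streamer prints tokens as they are produced.
# max_new_tokens should be a finite integer; 512 is an arbitrary cap.
output = model.generate(**inputs, streamer=streamer,
                        use_cache=True, max_new_tokens=512)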

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


I'd be happy to help answer your question. However, it's important to note that the title of the fastest car in the world can change as new models are released. As of now, the SSC Tuatara holds the record for the fastest production car with a top speed of 316.11 mph (504.87 km/h). However, this speed has not been officially recognized by Guinness World Records yet. 

As for the cost, the SSC Tuatara is priced at $1.9 million. Please keep in mind that prices can vary based on customizations and other factors.


In [None]:
# Convert the output tokens back to text
output_text = tokenizer.decode(output[0], skip_special_tokens=True)

#**Importing and Setting up LLM**

In [None]:
# Import the prompt wrapper from llama-index
from llama_index.prompts.prompts import SimpleInputPrompt
# Create a system prompt
system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.

Your goal is to provide answers relating to
the company.<</SYS>>
"""
# Throw together the query wrapper
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

In [None]:
# Complete the query prompt
query_wrapper_prompt.format(query_str='hello')

'hello [/INST]'

**This step imports and sets up the LLM (`HuggingFaceLLM`) using the llama-index wrapper, passing in the system prompt and the query wrapper prompt created above.**

In [None]:
# Import the llama index HF Wrapper
from llama_index.llms import HuggingFaceLLM
# Create a HF LLM using the llama index wrapper
llm = HuggingFaceLLM(context_window=4096,
                    max_new_tokens=256,
                    system_prompt=system_prompt,
                    query_wrapper_prompt=query_wrapper_prompt,
                    model=model,
                    tokenizer=tokenizer)
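When a query arrives, the wrapper fills `{query_str}` and the system prompt is placed in front of it. The sketch below mimics that assembly with plain string formatting. The exact concatenation inside `HuggingFaceLLM` may differ between llama-index versions, `assemble_prompt` is a hypothetical helper, and the system prompt is shortened to a stand-in.

```python
# Shortened stand-in for the notebook's system prompt
SYSTEM_PROMPT = "<s>[INST] <<SYS>>\nYou are a helpful assistant.<</SYS>>\n"
QUERY_TEMPLATE = "{query_str} [/INST]"

def assemble_prompt(query, system_prompt=SYSTEM_PROMPT, template=QUERY_TEMPLATE):
    """Fill the query template and prepend the system prompt."""
    wrapped = template.format(query_str=query)
    return f"{system_prompt} {wrapped}"

print(assemble_prompt("hello"))
```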

#**Bringing in Embeddings Wrapper**

In [None]:
# Bring in embeddings wrapper
from llama_index.embeddings import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

**This step brings in the llama-index embeddings wrapper and the HuggingFace embeddings used to represent document chunks as vectors.**

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# Create the embeddings instance (downloads the model on first use)
embeddings=LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
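Embeddings are useful because chunks with similar meaning end up with vectors that point in similar directions. One common way to compare two vectors is cosine similarity. A minimal sketch, using hand-written 3-dimensional vectors as stand-ins for real embeddings (real models produce hundreds of dimensions); `cosine_similarity` is an illustrative helper, not a llama-index function:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings
query_vec = [1.0, 0.0, 1.0]
chunk_a = [2.0, 0.0, 2.0]   # same direction as the query
chunk_b = [0.0, 1.0, 0.0]   # orthogonal to the query
print(cosine_similarity(query_vec, chunk_a))
print(cosine_similarity(query_vec, chunk_b))
```

The first score is close to 1 (same direction) and the second is 0 (unrelated), which is the signal a retriever uses to rank chunks.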



#**Setting up Service Context**

**This step sets up the service context for the application, which bundles the LLM, the embedding model, and the chunk size.**

In [None]:
# Bring in stuff to change service context
from llama_index import set_global_service_context
from llama_index import ServiceContext

In [None]:
# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)
# And set the service context
set_global_service_context(service_context)
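The `chunk_size` setting controls how documents are cut into pieces before embedding. llama-index does this with a token-based splitter; the sketch below is a simplified word-based version, only meant to show the idea of fixed-size chunks with a small overlap. The `chunk_words` helper and the default overlap of 20 are assumptions for illustration.

```python
def chunk_words(text, chunk_size=1024, overlap=20):
    """Split text into word-based chunks of at most chunk_size words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Consecutive chunks share `overlap` words, so a sentence that straddles a boundary still appears intact in at least one chunk.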


#**Importing Dependencies to Load Documents**

This step imports the dependencies needed to load documents and creates a `VectorStoreIndex` from the loaded documents.

In [None]:
# Import deps to load documents
from llama_index import VectorStoreIndex, download_loader, SimpleDirectoryReader
from pathlib import Path

In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-3.17.4-py3-none-any.whl (278 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.17.4


In [None]:
documents = SimpleDirectoryReader("/content/sample_data/Data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
len(documents)

1140

In [None]:
# Setup index query engine using LLM
query_engine = index.as_query_engine()
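Conceptually, the query engine embeds the question, picks the most similar chunks from the index, and passes them to the LLM together with the question. A toy sketch of that flow, assuming hand-written 2-dimensional vectors in place of real embeddings. `top_k` and `build_context_prompt` are hypothetical helpers, and the prompt wording is not the template llama-index actually uses.

```python
def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks with the highest dot-product score."""
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in chunk_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def build_context_prompt(question, chunks):
    """Place the retrieved chunks ahead of the question."""
    context = "\n---\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Placeholder chunk texts and toy vectors, purely illustrative
chunk_texts = ["chunk about initial fees", "chunk about training", "chunk about total investment"]
chunk_vecs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
query_vec = [1.0, 0.0]

best = top_k(query_vec, chunk_vecs, k=2)
prompt = build_context_prompt("What is the total investment?", [chunk_texts[i] for i in best])
print(prompt)
```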

#**Testing Queries and Printing Results**

In [55]:
print("Result from PDF 1")
response1 = query_engine.query("What is total investment necessary to begin operation of a BIGGBY® COFFEE franchise?")
print(response1)

Result from PDF 1


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


The total investment necessary to begin operation of a BIGGBY® COFFEE franchise is from $202,450 to $418,700. This includes $25,750 to $38,750 that must be paid to the franchisor or its affiliates. The document provides a breakdown of estimated expenditures for various categories such as insurance, utilities, license permits, initial advertising and grand opening promotions, organizational expenses, and additional funds for the first three months. It's important to note that these are estimates and actual costs may vary.


In [None]:
print("Result from PDF 2")
response2 = query_engine.query("What is total investment necessary to begin operation of a Wahlburgers franchise?")
print(response2)

Result from PDF 2


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Based on the information provided in the Franchise Disclosure Document (FDD), the total estimated initial investment for a Wahlburgers Master Franchise ranges from $1,106,000 to $1,191,000. This includes expenses such as a development fee, furnishing, fixtures, equipment, computer software and system, travel and living expenses during training, legal and accounting fees, franchise registration fees, additional funds for the first three months of operation, and other miscellaneous costs. However, it's important to note that these figures are estimates and actual costs may vary depending on specific circumstances. Additionally, none of the costs shown on the table are refundable unless a supplier has a refund policy of which Wahlburgers is not aware.


In [None]:
print("Result from PDF 3")
response3 = query_engine.query("What is total investment necessary to begin operation of a Bloomin’ Blinds franchise?")
print(response3)

Result from PDF 3


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Based on the information provided in the Franchise Disclosure Document (FDD), the total investment necessary to begin operation of a Bloomin’ Blinds franchise can range from $62,570 to $137,425. This includes expenses such as travel and meals for initial training, additional funds for the first three months, initial franchise fee, start-up expense fee, insurance, vehicle, vehicle signage, office expenses, inventory, licenses and permits, professional fees, and uniforms. However, it's important to note that some expenses, such as rent, utilities, and leasehold improvements, may vary depending on individual circumstances and are not included in this range. Additionally, the franchisee is responsible for paying these expenses directly to the respective vendors or service providers.


In [None]:
print("Result from PDF 4")
response4 = query_engine.query("What is total investment necessary to begin operation of a Amazing Athletes franchise?")
print(response4)

Result from PDF 4


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Based on the information provided in the context, the total investment necessary to begin operation of a Amazing Athletes franchise offering the Complete AA Program ranges from $43,650 to $64,950. This includes the initial franchise fee, training expenses, furniture and equipment, startup kit and initial uniforms and marketing materials, computer system and technology maintenance fee, insurance and professional services, permits, licenses and certifications, and funds for the first three months of operation. Please note that this estimate does not include an owner's salary or draw.


In [None]:
print("Result from PDF 5")
response5 = query_engine.query("What is total investment necessary to begin operation of a Atomic Wings franchise?")
print(response5)

Result from PDF 5


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Based on the information provided in the Franchise Disclosure Document (FDD), the total investment necessary to begin operation of a Atomic Wings franchise ranges from $119,750 to $332,500. This includes expenses such as leasehold improvements, furniture and fixtures, insurance, advertising, travel and living expenses, vehicle expenses, filing fees, professional fees, and additional funds for the first three months of operation. It is important to note that this does not include any financing fees or other expenses not listed in the FDD. Additionally, the franchisor does not finance any portion of the initial investment.
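The five test cells above repeat the same question with a different franchise name. A sketch of how they could be driven from a list, assuming a callable that takes a question string and returns something printable; `investment_question` and `run_all` are hypothetical helpers, and a stand-in lambda replaces the query engine so the example runs without a model.

```python
FRANCHISES = [
    "BIGGBY® COFFEE",
    "Wahlburgers",
    "Bloomin’ Blinds",
    "Amazing Athletes",
    "Atomic Wings",
]

def investment_question(name):
    """Build the investment question with the right article for the name."""
    article = "an" if name[0].lower() in "aeiou" else "a"
    return (f"What is the total investment necessary to begin operation "
            f"of {article} {name} franchise?")

def run_all(query_fn, names=FRANCHISES):
    """Call query_fn once per franchise and collect the answers as strings."""
    return {name: str(query_fn(investment_question(name))) for name in names}

# Stand-in for the real query function
answers = run_all(lambda q: f"(answer to: {q})")
for name, answer in answers.items():
    print(name, "->", answer)
```

With the notebook's engine, the call would be `run_all(query_engine.query)`.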


#**Deployment in Gradio**

In [None]:
torch.cuda.empty_cache()


In [None]:
import gradio as gr

# Define the function to handle user inputs and return responses
def chatbot_interface(user_input):
    response = query_engine.query(user_input)
    # The llama-index response object is not subscriptable; str() yields the answer text
    return str(response)

# Create the Gradio interface
iface = gr.Interface(
    fn=chatbot_interface,
    inputs=gr.Textbox(),
    outputs=gr.Textbox()
)

# Launch the app with a temporary public share link
iface.launch(share=True)



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://1af141e8f07f63e5e8.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
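A publicly shared interface will receive empty or malformed inputs, and a failing query should not surface as a raw traceback. A minimal sketch of a more defensive callback, assuming any function that maps a question string to a printable response; `make_handler` is a hypothetical helper, and a stand-in lambda replaces the query engine so the example runs on its own.

```python
def make_handler(query_fn):
    """Build a Gradio-compatible callback around any query function."""
    def handler(user_input):
        text = (user_input or "").strip()
        if not text:
            return "Please enter a question."
        try:
            return str(query_fn(text))
        except Exception as exc:  # show failures in the UI instead of crashing
            return f"Error while answering: {exc}"
    return handler

# Stand-in query function so the sketch runs without the index
handler = make_handler(lambda q: f"echo: {q}")
print(handler("  hello  "))
print(handler(""))
```

In the notebook, the callback could be wired in with `gr.Interface(fn=make_handler(query_engine.query), inputs=gr.Textbox(), outputs=gr.Textbox())`.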


