How to build Copilot using GPT4

Vinay Kumar
March 31, 2023


LLMs have taken the world by storm and redefined how we interact with machines. ChatGPT became the most used general-purpose copilot. The biggest complaint about ChatGPT concerns its knowledge base. First, it is trained on public datasets only up to Sep 2021, and for many serious users the quality of the knowledge sources is extremely important. Second, it can never access the verticalized knowledge that lives inside organizations. Verticalization presents a significant opportunity to bridge these gaps: vertical-specific or topic-specific Copilots are the next big thing in knowledge-based assistants. The recent update from OpenAI, offering plugins on ChatGPT, allows such specialized copilots to be deployed within ChatGPT. But, as with Microsoft’s strategy of building product-specific and topic-specific Copilots, every other organization will aim to own its Copilot and position it as the industry's best at assisting users.

This article offers a starting approach to building such professional copilots by showing how to build one using GPT-4. The bot uses OpenAI's GPT-4 to answer developer and natural-language questions.

We divide this process into three stages: 

  1. Data Preparation
  2. Preparing Embeddings
  3. Designing the Copilot module

Stage 1: Data Preparation: 

To begin with, let's prepare the data. You may have the knowledge base stored in different locations. So, let’s collate them and store them in one place. 

Langchain’s ‘Document Loaders’ or LlamaIndex (formerly GPT Index) can be used to load the data. Both provide multiple off-the-shelf options for loading data.

In our case, the knowledge base is spread across websites, documentation, whitepapers and videos.

From Websites: 

From the sitemap, we created a list of URLs that need to be scraped for the knowledge base. This list was then stored as ‘urls’. 
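
As a rough illustration of this step, the sketch below pulls every URL out of a sitemap that follows the sitemaps.org format. The function name ‘urls_from_sitemap’ and the sample sitemap are hypothetical and not part of our pipeline; in practice the XML would be downloaded from your own site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, used here only for illustration
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

urls = urls_from_sitemap(sample)
print(urls)
```

The resulting list can be trimmed by hand before scraping, for example to drop pages that should not end up in the knowledge base.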

from langchain.document_loaders import UnstructuredURLLoader

# 'urls' starts as the list of URL strings and is then reused for the loaded documents
loader = UnstructuredURLLoader(urls=urls)
urls = loader.load()

This stores the parsed data along with metadata like source URL. 

[Document(page_content="PARSED CONTENT", lookup_str='', metadata={'source': 'URL'}, lookup_index=0)]

Langchain’s URL loader was used to scrape the data from these URLs. You can add any URL to the list and prepare the knowledge base.

From PDFs: 

Our whitepapers and research papers were stored as PDFs, and Langchain supports loading them through PyPDF. We stored all our whitepapers in a directory and loaded them with Langchain’s ‘PyPDFLoader’. 

import os
import glob
from langchain.document_loaders import PyPDFLoader

# Replace this with the path to your directory of PDFs
pdf_dir = "/content/AryaXAI Copilot/White Papers"

# Get a list of all PDF files in the directory
pdf_files = glob.glob(os.path.join(pdf_dir, "*.pdf"))

# Load each PDF file using the PyPDFLoader
pages = []
for pdf_file in pdf_files:
   loader = PyPDFLoader(pdf_file)
   pdf_pages = loader.load_and_split()
   # Collect the pages of every PDF in one list
   pages.extend(pdf_pages)

# Print the list of pages
print(pages)

From text/transcription files: 

The transcriptions from our webinars, tutorials, etc. were stored in a directory, which was then loaded using Langchain’s ‘DirectoryLoader’.

from langchain.document_loaders import DirectoryLoader

# Load the texts
loader = DirectoryLoader('/content/AryaXAI Copilot/Video Transcriptions', glob='**/*.txt')
texts = loader.load()

Merging the lists: 

Next, we merged the whole knowledge base into one list.

# Merge the urls, pages and texts lists into a single list
documents = urls + pages + texts

Split the data: 

To keep the context precise and minimize the token length, the data must be split before the embeddings are stored. Langchain has multiple text splitter options - more info here: Text Splitters

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into text chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=25)
texts = text_splitter.split_documents(documents)

Reduce the chunk size if you face a 'MaxToken' error.
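
To make the two parameters concrete, here is a simplified sketch of what ‘chunk_size’ and ‘chunk_overlap’ mean. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraph and line breaks); the function ‘split_with_overlap’ is a hypothetical stand-in that cuts fixed windows.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 25) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Smaller chunks give more precise contexts and fewer tokens per request, while the overlap reduces the chance that a sentence is cut off from the text it depends on.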

Stage 2: Preparing Embeddings: 

You can use OpenAI embeddings to convert your textual data into high-dimensional vectors that can later be used to create conversations, summaries, searches, etc. 
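
The idea behind searching with embeddings can be sketched with plain cosine similarity. The three-dimensional vectors and document names below are made up for illustration; real OpenAI embeddings have far more dimensions, and vector stores use much faster search methods than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values)
store = {
    "pricing page": [0.9, 0.1, 0.0],
    "drift monitoring guide": [0.1, 0.9, 0.2],
    "explainability whitepaper": [0.0, 0.3, 0.9],
}
query_vector = [0.2, 0.8, 0.1]

# Rank the stored documents from most to least similar to the query
ranked = sorted(
    store,
    key=lambda name: cosine_similarity(store[name], query_vector),
    reverse=True,
)
print(ranked)
```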

How to store the embeddings: 

There are many databases which can be used to store the embeddings. Langchain has support for both open-source and paid options. Read more about various options here.

ChromaDB and FAISS are the most used options when building a demo application. AtlasDB, Pinecone and Qdrant are paid options that are scalable.

import os
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = FAISS.from_documents(texts, embeddings)

Note that this recomputes the embeddings every time the app runs. Instead, you can compute the embeddings once, store them in a DB, JSON or Pickle file, and load them when the application starts. This is where DBs can help. 

We stored the embeddings in Pinecone and updated them frequently to add additional knowledge.
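
A minimal sketch of the "compute once, load afterwards" idea, using a Pickle file. The helper ‘load_or_build’ and the stand-in ‘fake_embed_all’ are hypothetical; in a real app the build function would call the embedding API, and Pickle files should only be loaded from locations you trust.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return the cached object if cache_path exists; otherwise build, cache and return it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    value = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def fake_embed_all():
    # Stand-in for the expensive embedding step (hypothetical values)
    calls.append(1)
    return {"chunk-0": [0.1, 0.2], "chunk-1": [0.3, 0.4]}


cache_file = os.path.join(tempfile.mkdtemp(), "embeddings.pkl")
first = load_or_build(cache_file, fake_embed_all)   # computes and writes the cache
second = load_or_build(cache_file, fake_embed_all)  # reads the cache, no recomputation
print(len(calls))
```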

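import os
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.pinecone import Pinecone

# Initialise the Pinecone client first (the environment variable names are assumptions)
pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ['PINECONE_ENVIRONMENT'])

embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Pinecone.from_documents(texts, embeddings, pinecone_client=pinecone)

This automatically creates an index in Pinecone along with the embeddings. 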

Stage 3: Designing the Copilot module:

Once the embeddings are created, you are ready to create the Copilot for your apps or data. 

The requirements of your ‘Copilot’ can be very specific to your use case. It can simply be answering a question or engaging in a conversation using your knowledge source. These are primarily differentiated by two key components - ‘Prompting’ and ‘Memory’. They need to be specifically optimized for various use cases. 

Langchain has multiple options for building Copilots:

  • load_qa_with_sources_chain
  • VectorDBQA.from_chain_type
  • VectorDBQAWithSourcesChain.from_chain_type
  • ConversationChain

OpenAI, in turn, has two API types that can be used for building a Copilot:

  • openai.Completion.create
  • openai.ChatCompletion.create

Langchain provides ad hoc functions to use these directly. 

Copilot: To provide answers to questions as a QnA with sources

If the use case is a simple QnA module, you need to get the snippets with high similarity with the question, combine these snippets and summarize them along with the prompt. 

For simple QnA, you can use any of the above options for your QA module. 
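
The "combine the snippets with the prompt" step can be pictured as plain string assembly. The sketch below is our simplified reading of what a ‘stuff’-style chain does, not Langchain's actual template; ‘build_stuff_prompt’, the snippets and the example.com URLs are hypothetical.

```python
def build_stuff_prompt(persona, snippets, question):
    """Combine a persona, the retrieved snippets and the question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {text} (source: {source})"
        for i, (text, source) in enumerate(snippets, start=1)
    )
    return (
        f"{persona}\n\n"
        "Answer the question using only the context below "
        "and cite the snippet numbers you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical snippets, as they might come back from the similarity search
snippets = [
    ("AryaXAI monitors models for drift.", "https://example.com/monitoring"),
    ("AryaXAI explains model predictions.", "https://example.com/explainability"),
]
prompt = build_stuff_prompt(
    "You are a customer success manager at AryaXAI.",
    snippets,
    "How can AryaXAI help a bank?",
)
print(prompt)
```

Because every snippet is pasted into a single prompt, the number and size of the snippets directly determine the token count of each request.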

from langchain.llms import OpenAI
from langchain.chains import VectorDBQA
llm = OpenAI(temperature=0.1, model_name="gpt-4", max_tokens=256)
qa = VectorDBQA.from_chain_type(llm=llm, chain_type="stuff", vectorstore=docsearch, return_source_documents=True)

Now let's define a prompt for this:

initial_prompt = "For this conversation, you are a data scientist but working as a customer success manager at AryaXAI. AryaXAI is an ML Observability platform to help users build explainable, auditable and safe AI. It helps users to explain models that are understandable by everyone, and monitor models for drifts, performance, and privacy to improve model performances. Consider the input as a question on explaining AryaXAI and convincing the users to use AryaXAI "

It is now ready to take queries:

query = "how can aryaxai help me if I'm in banking?"
result = qa({"query": query, "prompt": initial_prompt})

This runs the query through ‘qa’ and returns the result. 

Let's see the result: 


AryaXAI can help you in banking by providing explainable AI solutions that can translate complex deep-learning models into understandable insights. This can assist in areas such as credit risk assessment, fraud detection, and customer segmentation by offering transparent and interpretable results, which can improve decision-making and regulatory compliance in the banking industry.

Because ‘return_source_documents’ is set to True, the source documents from which the answer was fetched are also returned, under result["source_documents"]. 


The same can be done using ‘load_qa_with_sources_chain’ from Langchain. This example is provided in the Colab notebook. 

Copilot: A Conversational Engine using Langchain

To convert this QnA engine into a conversation engine, a memory function needs to be added. You can use Langchain’s memory classes directly, which provide options such as ConversationBufferMemory and ConversationSummaryMemory. In our case, we wanted to customize the memory to our requirements, or even use a different LLM to build it. 

Things to remember when building ‘Memory’:

  • Memory consumes tokens, both for summarizing and within the prompts. The bigger the memory, the sooner you reach the token limit and the more expensive each answer becomes. 
  • It can also be built to remember only the last ‘n’ conversations, if you are confident that this stays within the token limit.
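
The second option, remembering only the last ‘n’ conversations, can be sketched with a bounded queue. ‘WindowMemory’ is a hypothetical class, not a Langchain one, and the token estimate is only a rough rule of thumb (about four characters per token for English text), not a real tokenizer.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent n question/answer turns."""

    def __init__(self, n):
        # A deque with maxlen silently drops the oldest turn when full
        self.turns = deque(maxlen=n)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

    def approx_tokens(self):
        # Rough heuristic only; use a tokenizer for exact counts
        return len(self.as_text()) // 4


memory = WindowMemory(n=2)
memory.add("q1", "a1")
memory.add("q2", "a2")
memory.add("q3", "a3")  # pushes the first turn out of the window
print(memory.as_text())
```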

To build memory, we simply captured and summarized the conversation history before answering a new question. 

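from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

previous_query = query
previous_answer = result["result"]
docs = [Document(page_content=previous_query + previous_answer)]
chain = load_summarize_chain(llm, chain_type="map_reduce")
# Run the chain; 'summary' is an illustrative name for the summarized history
summary = chain.run(docs)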

The summarization chain uses the ‘llm’ defined earlier to summarize the previous conversation. 
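
Since the summarizer is just another model call, it can be treated as a pluggable function, which is what makes it possible to build the memory with a different LLM. The sketch below shows that shape under our own assumptions: ‘update_summary’ is a hypothetical helper, and ‘keep_last_lines’ is a stand-in that keeps the last few lines instead of calling a model.

```python
def update_summary(summary, question, answer, summarize):
    """Fold the latest turn into the running summary using any summarizer callable."""
    transcript = "\n".join(
        part
        for part in (summary, f"User: {question}", f"Assistant: {answer}")
        if part
    )
    return summarize(transcript)


def keep_last_lines(text, n=4):
    # Stand-in summarizer: keeps the last n lines instead of calling an LLM
    return "\n".join(text.splitlines()[-n:])


summary = ""
summary = update_summary(summary, "q1", "a1", keep_last_lines)
summary = update_summary(summary, "q2", "a2", keep_last_lines)
summary = update_summary(summary, "q3", "a3", keep_last_lines)
print(summary)
```

Swapping ‘keep_last_lines’ for a function that calls a cheaper model keeps the rest of the conversation loop unchanged.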

Next, we built conversational interaction with our copilot. 

Copilot: Designing the conversational copilot with customizable doc search and few-shot learning

Customizing Document Search

Another component that can be customized is the ‘document search’. Instead of using the default document search in Langchain, your own document search can be used and passed on to the conversational engine for responding to the query. 

In this example, we defined the document search separately and passed these results as ‘Contexts’ to the conversational engine. 

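import openai

def documents(query):
   # Create an embedding for the input query.
   # The model name is an assumption; it must match the model used to build the index.
   doc = openai.Embedding.create(input=[query], model="text-embedding-ada-002")
   xq = doc['data'][0]['embedding']

   # Retrieve the relevant contexts (including the questions) from Pinecone.
   # 'psearch' is assumed to be a pinecone.Index handle for the index created earlier.
   res = psearch.query(xq, top_k=4, include_metadata=True)
   contexts = [x['metadata']['text'] for x in res['matches']]
   sources = [x['metadata']['source'] for x in res['matches']]
   return contexts, sources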

Why do you need a custom document search?

If needed, the search algorithm can be modified: you can define a cut-off cosine distance and/or the number ‘n’ of search results to create custom ‘contexts’, which are passed on to the conversational engine along with the prompt. You can also use document ranking, which together with the cosine score can be used to pick the preferred ‘Context’. For example, when several sources are shortlisted, you may want certain information sources to be used in the answers more than others. 
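
A minimal sketch of such a custom selection step, assuming the search returns matches shaped like the Pinecone results used above and that the score is a similarity where higher is better. The function ‘select_contexts’, the threshold, and the per-source weights are all hypothetical values chosen for illustration.

```python
def select_contexts(matches, min_score=0.75, top_n=3, source_weights=None):
    """Drop weak matches, re-rank by score times a per-source weight, keep top_n."""
    source_weights = source_weights or {}
    kept = [m for m in matches if m["score"] >= min_score]
    kept.sort(
        key=lambda m: m["score"] * source_weights.get(m["metadata"]["source"], 1.0),
        reverse=True,
    )
    return kept[:top_n]


# Hypothetical search results
matches = [
    {"score": 0.90, "metadata": {"text": "Blog post on drift.", "source": "blog"}},
    {"score": 0.85, "metadata": {"text": "Docs page on drift.", "source": "docs"}},
    {"score": 0.60, "metadata": {"text": "Unrelated transcript.", "source": "video"}},
]

# Prefer documentation over other sources when both pass the cut-off
selected = select_contexts(matches, min_score=0.75, top_n=2, source_weights={"docs": 1.2})
contexts = [m["metadata"]["text"] for m in selected]
print(contexts)
```

Here the documentation page overtakes the blog post despite its lower raw score, and the weak transcript match never reaches the prompt.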

Advanced Prompting: 

Recent updates from OpenAI allow users to pass on more information within the prompts using ‘Roles’. Few-shot learning can be done by passing sample prompts and responses within the prompt and jump-starting the learning. 

import openai

def chat_complete(query, system_prompt):
   chat = openai.ChatCompletion.create(
       model="gpt-4",
       messages=[
           {"role": "system", "content": system_prompt},
           {"role": "user", "content": query},
       ],
   )
   answer = chat['choices'][0]['message']['content']
   return answer

In ‘messages’, your own prompts can be added for the ‘system’, ‘user’ and ‘assistant’ roles. For few-shot learning, add the examples under these roles as additional messages. These act as in-context feedback to the engine before it answers the question.  
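
As an illustration, the few-shot examples can be laid out as alternating ‘user’ and ‘assistant’ messages placed between the system prompt and the real question. The helper ‘build_messages’ is hypothetical, and the example pair is only a placeholder; the resulting list is what would be passed as ‘messages’.

```python
def build_messages(system_prompt, examples, query):
    """Build a chat message list: system prompt, few-shot pairs, then the real query."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": query})
    return messages


# Placeholder few-shot pair showing the tone and length we want in answers
examples = [
    (
        "What is AryaXAI?",
        "AryaXAI is an ML Observability platform to help users build "
        "explainable, auditable and safe AI.",
    ),
]
messages = build_messages(
    "You are a customer success manager at AryaXAI.",
    examples,
    "how can aryaxai help me if I'm in banking?",
)
for message in messages:
    print(message["role"], ":", message["content"])
```

Each example pair adds tokens to every request, so a handful of short, representative examples is usually a better trade-off than a long list.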

Final Notes: 

Copilots are helpful AI assistants that provide answers or direct users to the right sources for answers. Copilots built on vertical-specific or topic-specific knowledge can differentiate themselves on answer quality compared to general-purpose Copilots. We have created detailed documentation on how to build Copilots, and the various ways to build them, primarily using Langchain and OpenAI. You can use the same components with other LLMs too. Within OpenAI, the next addition to the Copilot is using ‘agents’. We will cover this in our future articles.

Access the Colab notebook here:

You can access the demo of this Copilot here: 

