
Enhancing LLM Responses through Document Objects and Retrieval Methods

Techniques for Implementing Generative AI Document Objects and Vector-Based Datastores to Optimize LLM Performance and Accuracy


General-purpose large language models (LLMs) continue to evolve and improve in performance, cost, and accuracy. Generative AI constructs such as reinforcement learning from human feedback (RLHF) and the model context protocol (MCP) are providing the path for multi-model, custom tooling networks that extend the LLM's utility as a reactive agent, with the potential to compound acquired knowledge at elevated accuracy and precision. Despite these prodigious advancements, providing the LLM with the most current data to reason over is still a function of fine-tuning, pre- and post-training, and external data collection that allows custom and/or proprietary content to be integrated.


Improvements have been rapid in the systems and methods that introduce current content into a generative AI workflow and in the ability of the LLM to efficiently reference that content as it reasons and formulates a response. Reasoning begins with a query or instruction that is routed into a system tasked with producing a contextual response most similar to the input primitives. The mechanisms to decorate the inputs with context reflecting current or custom content that is not part of the LLM's fine-tuning or pre/post-training can take on many permutations, weighted by the objective function for the desired response. For example, there are document objects that enable context to be introduced into the input query or instruction directly without referencing external data sources, which is not entirely scalable. There are also retriever functions that broadcast LLM input requests across multiple vector indexes and then use similarity functions to aggregate a consolidated response.


This article provides insight into various patterns of associating context with LLM input sequences and the supporting functions that preprocess, store, and retrieve the context that shapes an LLM's response. We will dive deeper into Langchain document objects to manage page content, build custom retrievers that supply context directly to prompts, and use vectorstores to load and retrieve additional context when needed. We will also explore retriever routing using Pydantic parsers to determine the most suitable retriever for querying a vectorstore, based on the retriever's own similarity to an input request or query.


1. Loading Documents

The Langchain document class provides the parameters that store additional data and can be used by the LLM when responding to the input request. The document parameters function as the object primitives that a prompt uses to answer a user's question, find specific information based on a query, or aggregate related data to improve a response's accuracy. In addition to the content parameter, called page_content, the document class also provides metadata parameters to decorate the content and an id to differentiate the content types.
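
As a minimal sketch (assuming only that the langchain_core package is installed), a document object can be constructed directly with its content, metadata, and identifier:

from langchain_core.documents import Document

# A minimal document object: page_content holds the context text,
# metadata decorates it (here with a source URL), and id tags it.
doc = Document(
    page_content="Prompt engineering is the process of crafting an instruction "
                 "to produce the best possible output from a generative AI model.",
    metadata={"source": "https://en.wikipedia.org/wiki/Prompt_engineering"},
    id="1",
)
print(doc.page_content, doc.metadata)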

In our first example, we will use two prompt templates: one to carry the input instructions to the LLM, and one that wraps the document object into a document prompt that the input prompt template can reference for context. All examples use the Anthropic Claude model, which is referenced through the llm parameter in our chain definitions.


The input prompt below contains the instructions the LLM uses to generate a response from the context included in the document prompt. We use page_content to store the context data and a source parameter in the metadata to identify the source of the page_content.


input_prompt = PromptTemplate.from_template(prompt_template)
document_prompt = PromptTemplate(
    template="Context:\ncontent:{page_content}\nsource:{source}",
    input_variables=["page_content", "source"],
)

In our example we prompt the LLM to use the provided context and answer the question in a single sentence, because the context parameter might hold a large volume of reference data that would otherwise be echoed back in the LLM's response.

prompt_template = """The following document context will help you answer the question, therefore use it when answering the question. You must use the following rules when answering the question as well:
1. Answer the question using a single sentence that is concise and relevant to the question being asked.
2. For each answer, make sure to provide the sources that were used.
3. If you do not know the answer or do not have enough context to answer the question, do not try to guess or assume you know the answer.
4. If you do not know the answer, tell the user that **I am sorry, but I cannot answer the question because I do not have enough information, but here are some links that you can use to look up the answer**, then include links that the user can use to help answer their question.

{context}

Question: {question}
Helpful Answer:"""

We then combine our input prompt and document prompt into a chain using Langchain's create_stuff_documents_chain. This allows us to pass document objects that the input prompt can use for context and direct them to the LLM for response generation.

But how do we associate the expected context variable name in our input prompt template with the document variable name? The create_stuff_documents_chain provides a document_variable_name parameter that assigns a variable name to the document prompt, which we can use to reference the documents from the input prompt. The default variable name is 'context', which we have already used in the input prompt template and thus do not need to override when creating the chain.

combine_document_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=input_prompt,
    document_prompt=document_prompt,
    # document_variable_name="context" is the default, which is where the document context is referenced.
)

Now that the data structures have been defined, we need to route the user query to a data collection mechanism and send back a response.


The retriever interface in Langchain provides the ability to consume a user query in natural language and produce results packaged into document objects. We can then use the invoke method from the retriever interface to return the document objects formatted into prompts and have the LLM generate a response to a query or request input. The retriever allows this because the retriever interface is a runnable and participates in the Langchain Expression Language (LCEL). In our example we will extend Langchain's BaseRetriever class to create a custom retriever that can return either document objects directly or documents generated by loading a file.

retriever = CustomRetriever(format="documents")
# or, to load documents from a file instead:
retriever = CustomRetriever(format="files")

We also need to implement _get_relevant_documents to determine how the documents will be processed and returned. First, extend the BaseRetriever class with a class called CustomRetriever and pass in a parameter to decide which type of document to return. In our example we request that a document or file be loaded first and then respond with document objects for that request. We then implement _get_relevant_documents to return the document objects requested by the calling function. The format variable is the input that tells the function which decision branch to take and which document objects to return. The PyPDFLoader loads a PDF file and returns parsed documents with page content and metadata. We will use the lazy loading method to load resources as needed, which helps to improve performance; the load() function can also be used directly as an alternative.

class CustomRetriever(BaseRetriever):
    """Return either three static documents or documents loaded from a PDF file."""
    format: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        if self.format == "documents":
            return [
                Document(page_content="Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Prompt_engineering"}),
                Document(page_content="Reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"}),
                Document(page_content="The Model Context Protocol (MCP) is an open standard, open-source framework to standardize the way artificial intelligence (AI) models like large language models (LLMs) integrate and share data with external tools, systems, and data sources.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Model_Context_Protocol"}),
            ]

        elif self.format == "files":
            # Lazily load the PDF and return one document per page.
            pages = []
            loader = PyPDFLoader("../datasets/agents.pdf")
            for doc in loader.lazy_load():
                pages.append(doc)
            return pages

We then use Langchain's create_retrieval_chain to combine our retriever and input chain and produce a runnable with context. The retrieval chain takes as one input the retriever we implemented to collect documents, built by extending BaseRetriever with the custom override above. The other input is the document chain that we created for passing our request and document prompts to the model.

chain = create_retrieval_chain(retriever, combine_document_chain)

To run the document chain with a custom retriever, we use the invoke method to send the request through document loading and selection, retrieval, and LLM response generation.

# using format=documents 
print(chain.invoke({"question": "Summarize generative AI prompt engineering.", "input": "text"}))

>> {'question': 'Summarize generative AI prompt engineering.', 'input': 'text', 'context': [Document(page_content='Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.', 
metadata={'source': 'https://en.wikipedia.org/wiki/Prompt_engineering'}), 
Document(page_content='Reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences.', 
metadata={'source': 'https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback'}), Document(page_content='The Model Context Protocol (MCP) is an open standard, open-source framework to standardize the way artificial intelligence (AI) models like large language models (LLMs) integrate and share data with external tools, systems, and data sources.', metadata={'source': 'https://en.wikipedia.org/wiki/Model_Context_Protocol'})], 

'answer': 'Generative AI prompt engineering is the process of structuring or crafting an instruction to produce the best possible output from a generative artificial intelligence (AI) model.\n\nSources:\n- https://en.wikipedia.org/wiki/Prompt_engineering'}

# using format=files 
print(chain.invoke({"question": "Summarize multi-agent collaboration based on the context you have.", "input": "text"})['answer'])

>> Multi-agent collaboration pattern works as follows:
1. Specialized Agents: Each agent is designed to perform specific roles or tasks, such as a software engineer, a product manager, or a designer.
2. Task Decomposition: Complex tasks are broken down into smaller, manageable subtasks that can be distributed among the agents.
3. Communication and Coordination: Agents interact with each other, exchanging information and coordinating their actions to achieve common goals.
4. Distributed Problem Solving: The system uses the collective capabilities of multiple agents to address problems too complex for a single agent.
Sources:
- ../datasets/agents.pdf

The code for our first example:

from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.retrievers import BaseRetriever
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import PyPDFLoader

prompt_template = """The following document context will help you answer the question, therefore use it when answering the question. You must use the following rules when answering the question as well:
1. Answer the question using a single sentence that is concise and relevant to the question being asked.
2. For each answer, make sure to provide the sources that were used.
3. If you do not know the answer or do not have enough context to answer the question, do not try to guess or assume you know the answer.
4. If you do not know the answer, tell the user that **I am sorry, but I cannot answer the question because I do not have enough information, but here are some links that you can use to look up the answer**, then include links that the user can use to help answer their question.

{context}

Question: {question}
Helpful Answer:"""

class CustomRetriever(BaseRetriever):
    """Return either three static documents or documents loaded from a PDF file."""
    format: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        if self.format == "documents":
            return [
                Document(page_content="Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Prompt_engineering"}),
                Document(page_content="Reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"}),
                Document(page_content="The Model Context Protocol (MCP) is an open standard, open-source framework to standardize the way artificial intelligence (AI) models like large language models (LLMs) integrate and share data with external tools, systems, and data sources.",
                         metadata={"source": "https://en.wikipedia.org/wiki/Model_Context_Protocol"}),
            ]

        elif self.format == "files":
            pages = []
            loader = PyPDFLoader("../datasets/agents.pdf")
            for doc in loader.lazy_load():
                pages.append(doc)
            return pages

def load_docs():
    retriever = CustomRetriever(format="files")
    # can toggle with either files or documents:
    # retriever = CustomRetriever(format="documents")
    qa_chain_prompt = PromptTemplate.from_template(prompt_template)
    document_prompt = PromptTemplate(
        template="Context:\ncontent:{page_content}\nsource:{source}",
        input_variables=["page_content", "source"],
    )
    combine_document_chain = create_stuff_documents_chain(
        llm=llm,  # the Anthropic Claude model referenced in our chain definitions
        prompt=qa_chain_prompt,
        document_prompt=document_prompt,
        # document_variable_name="context" is the default, which is where the document context is referenced.
    )
    chain = create_retrieval_chain(retriever, combine_document_chain)

    print(chain.invoke({"question": "Summarize generative AI prompt engineering.", "input": "text"}))

    print(chain.invoke({"question": "Summarize multi-agent collaboration based on the context you have.", "input": "text"})['answer'])

2. Vector Datastores

In the next example, we will create document objects like our prior example but this time use a vectorstore to load and retrieve the document parameters. We will use the base document class to define our page content and metadata parameters, including an additional id parameter which allows us to identify a document using a unique identifier.

def load_docs_vectorstore():
    doc1 = Document(page_content="Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Prompt_engineering"},
                    id=1, )

    doc2 = Document(page_content="Reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"},
                    id=2, )

    doc3 = Document(page_content="The Model Context Protocol (MCP) is an open standard, open-source framework to standardize the way artificial intelligence (AI) models like large language models (LLMs) integrate and share data with external tools, systems, and data sources.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Model_Context_Protocol"},
                    id=3, )

    documents = [doc1, doc2, doc3]
    uuids = [str(uuid4()) for _ in range(len(documents))]

Instead of calling a custom retriever to augment the query with document parameters, we will first load the documents we created into a vector database and then create a retriever that can read from it and respond to the user's request.


A vector database enables data (text, images, video) to be transformed and embedded into numeric representations and stored in a high-dimensional vectorstore. A vector is a mathematical object that has both direction and length. The direction is the overall orientation of the object in the n-dimensional space, while the length, also referred to as magnitude, is the size of the vector relative to a point of origin in that same space. These numeric objects, together with the direction and magnitude primitives that describe them, enable the vector database to group similar objects together and form the fabric the LLM uses to retrieve results most related to the original query. The high dimensionality allows data to be localized with other similar embeddings for faster retrieval of related content and provides a malleable solution for augmenting input context with a vectorstore. Furthermore, the semantic search functions of vector databases deliver a more accurate response to a query by returning the vector datasets most similar to that query.
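
To make the similarity idea concrete, here is a minimal sketch (using NumPy and small hypothetical embedding vectors rather than real model outputs) of the cosine similarity that many vector databases use to rank embeddings against a query:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity compares direction: the dot product of the vectors
    # divided by the product of their magnitudes (lengths).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings; real embedding models produce
# hundreds or thousands of dimensions.
query_vec = np.array([0.9, 0.1, 0.3, 0.0])
doc_vecs = {
    "prompt_engineering": np.array([0.8, 0.2, 0.4, 0.1]),
    "rlhf":               np.array([0.1, 0.9, 0.2, 0.3]),
}

# The document whose vector points in the most similar direction scores highest.
for name, vec in doc_vecs.items():
    print(name, round(cosine_similarity(query_vec, vec), 3))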

We will use the Chroma vectorstore in our example, which is an open-source vector database that integrates well with the Langchain framework and interfaces.

We first specify the embedding model that will transform the documents into numeric representations and store them in the Chroma database as vectors. We will use the Amazon Bedrock embeddings class and select a model to embed the documents.


embed_model = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

Next, we initiate the Chroma instance and pass the embedding configuration that we defined above. We then call the add_documents function, which accepts a list of documents to add to the Chroma vectorstore. In our example, we pass the document objects that we created above as a list, along with the ids for those documents.

vectorstore = Chroma(embedding_function=embed_model)
vectorstore.add_documents(documents=documents, ids=uuids)

A retriever is then initialized from the vectorstore to invoke the model and collect the results. As optional parameters, we pass the maximal marginal relevance (MMR) algorithm to the search function, which selects the vector objects most similar to the input query while optimizing for diversity among the objects. We also use the fetch_k parameter to set the number of candidate documents fetched before MMR narrows the results down to the k documents the retriever returns. The retriever is then invoked with the given query, references the documents (stored as numerical vectors in Chroma), selects the most similar documents, and the LLM infers a response to the query.

retriever = vectorstore.as_retriever(
    search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5})

print(retriever.invoke("Can you tell me more about prompt engineering?"))


>> [Document(page_content='Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.', 
    metadata={'source': 'https://en.wikipedia.org/wiki/Prompt_engineering'})]
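
If you want to inspect the raw similarity scores rather than rely on the retriever wrapper, Chroma also exposes a similarity_search_with_score method. A short sketch, reusing the vectorstore created above (with Chroma's default distance metric, lower scores generally indicate closer matches):

# Return documents together with their scores for the same query.
results = vectorstore.similarity_search_with_score(
    "Can you tell me more about prompt engineering?", k=2)
for doc, score in results:
    print(round(score, 3), doc.metadata.get("source"))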

The code for our second example:

from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from uuid import uuid4

embed_model = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

def load_docs_vectorstore():
    doc1 = Document(page_content="Prompt engineering is the process of structuring or crafting an instruction in order to produce the best possible output from a generative artificial intelligence (AI) model.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Prompt_engineering"}, id=1, )
    doc2 = Document(page_content="Reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"}, id=2, )
    doc3 = Document(page_content="The Model Context Protocol (MCP) is an open standard, open-source framework to standardize the way artificial intelligence (AI) models like large language models (LLMs) integrate and share data with external tools, systems, and data sources.",
                    metadata={"source": "https://en.wikipedia.org/wiki/Model_Context_Protocol"}, id=3, )

    documents = [doc1, doc2, doc3]
    uuids = [str(uuid4()) for _ in range(len(documents))]
    vectorstore = Chroma(embedding_function=embed_model)
    vectorstore.add_documents(documents=documents, ids=uuids)
    retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5})

    print(retriever.invoke("Can you tell me more about prompt engineering?"))

3. Loading Multiple Documents

Creating document objects directly may serve a viable use case for decorating context with lower-volume datasets, but what about LLMs that need a larger context window to produce higher-fidelity responses? We now have a method to create documents directly, embed the documents into a vectorstore, and read results using a retriever.

In this next example, we will extend the workflow to embed full document files across multiple vectorstores. This will allow us to embed full file content into dedicated vectorstores and persist the vectorstores so we don't have to recreate them in subsequent request routines.


First, we load a file using the PyPDFLoader as before and obtain the document type with page content and metadata. Here we set the mode parameter to "page", which splits the PDF by page and associates the corresponding page number in the document metadata.

Next, we use the Langchain RecursiveCharacterTextSplitter to organize the file data based on the chunk size, which is the number of characters that will be grouped together iteratively in the file. The chunk overlap ensures we preserve some contextual information shared across character and/or sentence transitions. This text splitter is unique because it uses a predefined list of separators to split the document, then validates whether the split results in a character length less than or equal to the chunk size parameter. If the length is greater than the chunk size, the splitter uses the next separator in the predefined list to further split the document until the chunk size is satisfied. The result of split_documents is a list of document objects derived from the original PDF file that has been recursively split and prepared to be routed to the vectorstore.

def load_multiple_docs():    
data = PyPDFLoader("../datasets/agents.pdf", mode="page").load()    
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)    
splits = text_splitter.split_documents(data)
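
As a quick illustration of the recursive behavior (a sketch over a made-up string, independent of the PDF above), the same splitter also exposes a split_text method that shows how chunk_size and chunk_overlap shape the output:

from langchain.text_splitter import RecursiveCharacterTextSplitter

sample = ("Agents decompose tasks. Agents communicate and coordinate. "
          "Agents solve problems that a single model cannot handle alone.")

# A small chunk_size forces the splitter to fall back through its separator
# hierarchy (double newlines, newlines, spaces, characters) until chunks fit.
splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=10)
for chunk in splitter.split_text(sample):
    print(repr(chunk))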

We again use the Chroma vector datastore as the source to load and retrieve the documents partitioned by the text splitter. The from_documents function creates a datastore from a list of documents while specifying the embedding model to use, similar to the prior example. The collection_name parameter names the collection, and the persist_directory is where the collection is written to disk, so the vectorstore does not need to be recreated; it can be referenced from disk or memory when needed.

vectorstore = Chroma.from_documents(splits, embedding=embed_model, collection_name="agents_collection", persist_directory='./chroma_db')

Once the vectorstore has been created, we can reference the persist directory where the vectorstore is located. In most cases, persisting the vectorstore will improve overall performance by reducing the computational cycles required to load, split, and create a new vectorstore during every runtime operation. A retriever wrapper is created to query the text and retrieve the documents from the vectorstore using an argument 'k' to limit the number of documents returned.

vectorstore = Chroma(collection_name="agents_collection", embedding_function=embed_model, persist_directory='./chroma_db',)
retriever_agents = vectorstore.as_retriever(search_kwargs={"k": 2})

We then proceed to create another vectorstore that uses a different source PDF but the same splitter and Chroma object definitions as before. The main difference, other than the dataset, is that we give this retriever wrapper a different name, since we will let the LLM decide which retriever to use based on the user's query. A retriever dictionary is created and the retrievers are assigned specific names that we will look up later, based on which retriever the LLM believes will best support the user's query.

# second vector store and retriever
data = PyPDFLoader("../datasets/prompts.pdf").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
splits = text_splitter.split_documents(data)
vectorstore = Chroma.from_documents(splits, embedding=embed_model, collection_name="prompts_collection", persist_directory='./chroma_db')
vectorstore = Chroma(collection_name="prompts_collection", embedding_function=embed_model, persist_directory='./chroma_db',)
retriever_prompts = vectorstore.as_retriever(search_kwargs={"k": 1})
retrievers = {"agents": retriever_agents, "prompts": retriever_prompts,}

To determine which retriever and vectorstore to query, we can write a function that uses an LLM to infer from the user's query which retriever to use, match that recommendation to a key in the retriever dictionary, and use the value to reference the right retriever object.

There are two steps in our call pattern that invoke an LLM for a response. The first is to reformat the user's query and infer which retriever to use. The second is to use the inferred retriever and instruct the LLM to generate a response to the query.

The Python Pydantic library is useful for validating incoming data against defined fields and rules, and can also be used to define the expected LLM output schema for an incoming query.

For our use case, we will create a Pydantic class (called a model) that inherits from the Pydantic BaseModel and defines the output data structure. The input to the class will be the query that the user submits. The class defines a query field to reformat the user's query into a question string and a topic field that returns a string we can then use to look up the retriever function from the dictionary. Our topic field constrains the LLM to select one of the strings in the list by applying the Literal type. Both the query and topic fields have description attributes that also instruct the LLM how to process and format the response.


Pydantic supports custom validation as well so we can check if each output parameter aligns to a specific constraint defined in the validator function. Since we want to ask the LLM a question, a check for a question mark is implemented after the LLM returns the reformatted response.

class Search(BaseModel):
    query: str = Field(description="Main query to search for, answer in form of a question.")
    topic: Optional[Literal["agents", "prompts"]] = Field(description="Topic to look up, should be agents or prompts.")

    @field_validator("query")
    @classmethod
    def query_ends_in_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Not in the form of a question.")
        return field
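
As a quick, hypothetical usage check (independent of the LLM chain), constructing the model directly shows the validator in action:

from pydantic import ValidationError

# Valid: the query ends with a question mark and the topic is in the Literal list.
print(Search(query="What is negative prompting?", topic="prompts"))

# Invalid: the validator rejects a query that is not phrased as a question.
try:
    Search(query="Tell me about agents", topic="agents")
except ValidationError as err:
    print(err)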

The Pydantic parser is set up using Langchain's PydanticOutputParser, referencing the custom Search class we created. A simple prompt is created to instruct the LLM, along with variables that reference the user's query and the format instructions based on the Pydantic parser. The get_format_instructions method returns a string specifying how the output of the LLM should be formatted. We declare the formatting parameter as a partial variable, since the formatting instructions are available early in the chain and do not depend on the input variable being populated.


Next, we set up the chain with the call sequences and invoke the chain with a query input using Langchain's invoke interface. The output of the invoke call will consist of the parameters defined in the Pydantic class: the query field, which reformats the query in the form of a question, and the topic field, under which the LLM classifies the query as either 'agents' or 'prompts'.

parser = PydanticOutputParser(pydantic_object=Search)
prompt = PromptTemplate(
    template="You have the ability to dissect user queries and extract the correct information.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

query_search = (
    {"query": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)

response = query_search.invoke({"query": "Can you tell me about negative prompting?"})

Now that the LLM has determined which topic from the Pydantic Literal is most similar to the user query, we can use that result as a key into the retriever dictionary and look up the retriever that reads from the related vectorstore created and loaded with the PDF file earlier. We can then invoke the retriever using the reformatted user query, because the retriever component in Langchain implements the runnable protocol, allowing it to be invoked on an input.


The retriever uses similarity searches on the vectorstore to best match the user query and returns the corresponding document objects with content and metadata. We then iterate through the objects to list the content, or use the documents as context in prompts that an LLM can use to generate a response.

retriever = retrievers[response.topic]
doc = retriever.invoke(response.query)
for result in doc:
    print(f"* {result.page_content} [{result.metadata}]")

>> In this method, you tell the AI what not to do. For instance, you might specify that you don’t want a certain type of content in the response. Example: “Explain the concept of Foundation Models in AI without mentioning natural language processing or NLP.” [{'page': 5, 'source': '../datasets/prompts.pdf'}]

The code for our third example:

from typing import Optional, Literal

from langchain_aws import BedrockEmbeddings
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.output_parsers import PydanticOutputParser
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from pydantic import BaseModel, Field, field_validator

embed_model = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

class Search(BaseModel):
    """Search for information about a specific topic."""
    query: str = Field(description="Main query to search for, answer in form of a question.")
    topic: Optional[Literal["agents", "prompts"]] = Field(description="Topic to look up, should be agents or prompts.")

    @field_validator("query")
    @classmethod
    def query_ends_in_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Not in the form of a question.")
        return field

def load_multiple_docs():
    data = PyPDFLoader("../datasets/agents.pdf", mode="page").load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    splits = text_splitter.split_documents(data)
    # first store to disk, then can comment out and just call back the db later
    vectorstore = Chroma.from_documents(splits, embedding=embed_model, collection_name="agents_collection", persist_directory='./chroma_db')

    vectorstore = Chroma(collection_name="agents_collection", embedding_function=embed_model, persist_directory='./chroma_db',)
    retriever_agents = vectorstore.as_retriever(search_kwargs={"k": 2})

    # second vector store and retriever
    data = PyPDFLoader("../datasets/prompts.pdf").load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    splits = text_splitter.split_documents(data)
    vectorstore = Chroma.from_documents(splits, embedding=embed_model, collection_name="prompts_collection", persist_directory='./chroma_db')
    vectorstore = Chroma(collection_name="prompts_collection", embedding_function=embed_model, persist_directory='./chroma_db',)

    retriever_prompts = vectorstore.as_retriever(search_kwargs={"k": 1})
    retrievers = {"agents": retriever_agents, "prompts": retriever_prompts,}
    parser = PydanticOutputParser(pydantic_object=Search)
    prompt = PromptTemplate(
        template="You have the ability to dissect user queries and extract the correct information.\n{format_instructions}\n{query}\n",
        input_variables=["query"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    query_search = (
        {"query": RunnablePassthrough()}
        | prompt
        | llm
        | parser
    )
    response = query_search.invoke({"query": "Can you tell me about negative prompting?"})
    retriever = retrievers[response.topic]
    doc = retriever.invoke(response.query)
    for result in doc:
        print(f"* {result.page_content} [{result.metadata}]")

Conclusion:

In this post, we identified and brought together methods that use external context to help an LLM respond to a user query and improve the relevancy of those responses. We explored the document object as the data primitive to store context directly and to write context to vectorstores that are embedded and persisted for similarity searches. To expose the n-dimensional embedding and similarity advantages of vectorstores, data access methods are needed to compare a user's query to the data in a vectorstore. The retriever interface is used as the proxy to read the document objects and vectorstores while also defining the similarity methods and request parameters those methods will use. We also introduced custom classes that can select the type of documents we are most interested in referencing and customize the LLM's response format.


We created two prompts, the first instructing the LLM to use the retrieved context to respond to the user's question and the second acting as a placeholder to store the document list context. A chain is created using Langchain's create_stuff_documents_chain, which passes a documents list to the LLM by combining both the LLM prompt and the document prompt. A custom retriever is created by extending Langchain's BaseRetriever class to return document objects either directly or by loading a file first. This provides flexibility in how we build and load document objects from sources that can range from PDF files to inline static content. We then create the retrieval chain with the custom retriever and stuff-documents parameters to retrieve the documents as context, because we instructed the LLM to use the context parameter when answering the user's question. Finally, the retrieval chain returns a runnable, which we use to invoke the LLM with a user query and respond using the context provided.


We then extended our document storage and retrieval constructs by using vector databases to embed the document data and respond to queries based on vector similarity searches. We created three document objects that included content, metadata, and ids for each object, which were then stored in a list. The Chroma vectorstore was instantiated as our vector database, using the Amazon Titan embeddings as our embedding model to convert the document datasets into vectors that have magnitude and direction. Again, we embed the document objects so that we can compare how close the user's query is in semantic and contextual relevancy to the document content. We then add the document list created earlier to the vectorstore and create a retriever wrapper around the vectorstore to retrieve the documents. The retriever method provides optional parameters we can define, including the search type algorithm and the number of documents to return. We specified the maximal marginal relevance (MMR) algorithm in our parameters, which returns documents that are most similar to the query while also calibrating similarity among the documents themselves to reduce redundancy in the returned document datasets.


In our final example, we build upon our document modeling and surface area by distributing unique documents across multiple vectorstores using the native Langchain PyPDF document loader. To prepare the documents for embedding and vectorstore loading, the PDF files are split by specific chunk sizes and overlaps. The split routine uses a recursive approach that compares the content against a hierarchy of separators to determine where the split must occur, which helps to preserve contextual meaning in the data chunks. We create a vectorstore using Chroma and define its parameters prior to loading the chunked documents. The documents are first embedded as vectors using the embedding model, which in our case was the Amazon Titan embedding model, and we define a directory to persist the vectorstore once created. If we have already created the vectorstore, we can reference it by the collection name and directory path, and thus not be required to recreate the entire vectorstore.


Retrievers are created for each vectorstore and stored in a dictionary. To enable an LLM query search across all vectorstores, we have the LLM first determine which retriever is most relevant to the user's query based on the retriever dictionary. This is achieved by creating a prompt template instructing the LLM to parse the query and use output formatting from a class that inherits from the Pydantic BaseModel. The class informs the LLM to output a field that matches a retriever name in the dictionary, with additional validation performed in the class. Now that we have semantically matched the most similar retriever to the query, that retriever is invoked to return the context used to answer the user's question.


In the next post we will continue exploring new ways to introduce custom context that helps an LLM respond to a user's query. We will introduce LlamaIndex to load documents into vectorstores and create indexes on top of the vectorstores for faster and more efficient context searches. We will demonstrate how to fuse multiple indexes into a single retriever and automatically expand the user's query into multiple queries to ensure our index search coverage is optimal. The retrieved nodes will be our added context and will be used to generate a response to a user's request.

 
 
 
