LLM Tracing
Quick Summary
LLM tracing allows you to easily debug LLM applications for both production monitoring and development evaluation environments at a component level. You should consider setting up tracing because tracing allows you to easily:
- debug failing test cases (ie. figure out how unsatisfactory
actual_output
s are generated during evaluation). - understand (in real-time) how LLM responses are generated when monitoring in production.
Confident AI is NOT tied to or vendor-locked into any LLM provider or framework, which means you can trace literally any LLM application of your choice.
Tracing Integrations
If your LLM application was built using a hybrid approach (eg., a bit of langchain
, and bit of llama_index
, and maybe a bit of custom LLM API calling), your LLM application will need to consider hybrid tracing instead.
deepeval
allows anyone to automatically log traces for LLM applications built using either llama_index
or langchain
in just one line of code. All traces will be sent to Confident AI upon setup and all you need to do is login to Confident AI.
LlamaIndex
import deepeval
deepeval.trace_llama_index()
For more information on the DeepEval's LlamaIndex Integration, visit this page.
LangChain
import deepeval
deepeval.trace_langchain()
For more information on the DeepEval's LangChain Integration, visit this page.
Setup Custom Tracing
Custom tracing allows you to more flexibly define which part of your LLM application you want traced. There are a two main scenarios where custom tracing comes in handy:
- your LLM application wasn't built on any frameworks
- your LLM application was built on multiple frameworks
In deepeval
, tracing a particular component in your LLM application is as simple as adding a with
block along with deepeval
's Tracer
in python.
from deepeval.tracing import Tracer, TraceType
...
with Tracer(trace_type=TraceType.LLM) as llm_trace:
response = openai.ChatCompletion.create(
model='gpt-4o',
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "..."},
],
)
llm_response = response.choices[0].message.content
print(llm_response)
Nesting with
blocks in python allows deepeval
to recognize the trace hierarchy in your LLM application. For example, this trace structure:
from deepeval.tracing import Tracer, TraceType
with Tracer(trace_type="My LLM Application") as custom_trace:
...
with Tracer(trace_type=TraceType.RETRIEVER) as retriever_trace:
...
with Trace(trace_type=TraceType.EMBEDDING) as embedding_trace:
...
with Tracer(trace_type=TraceType.LLM) as llm_trace:
...
This particular nested with
block will give the following structure:
My LLM Application
|
* --- Retriever
| |
| * --- Embedding
|
* --- LLM
|
You'll notice the Tracer
accepts a trace_type
argument. The trace_type
argument accepts either a str
(for custom types of traces), or TraceType
. The trace_type
parameter is an easy way for you to classify components in your LLM pipeline and here are all the TraceType
s that deepeval
offers.
TraceType.AGENT
TraceType.CHAIN
TraceType.EMBEDDING
TraceType.LLM
TraceType.QUERY
TraceType.RERANKING
TraceType.RETRIEVER
TraceType.SYNTHESIZE
TraceType.TOOL
Before the end of each with
block, as you'll learn later, requires you to call the set_attributes()
method to set the attributes associated with your trace. For example, you'll need to set the output_str
and input_str
to your LLM when defining a trace of type TraceType.LLM
via the set_attributes()
method.
Although you can technically use custom traces for all trace_type
s to avoid the trouble of setting trace specific attributes, deepeval
's default approach provides you with a better UI on Confident AI.
TraceType.LLM
The LLM
TraceType
should be reserved for components where LLM generation happens.
from deepeval.tracing import Tracer, TraceType, LlmAttributes
with Tracer(trace_type=TraceType.LLM) as llm_trace:
response = openai.ChatCompletion.create(
model='gpt-4o',
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "..."},
],
)
llm_response = response.choices[0].message.content
print(llm_response)
# Don't forget to set attributes for the LLM trace type
attributes = LlmAttributes(
input_str="...",
output_str=llm_response,
model='gpt-4o', # Optional
total_token_count=response['usage']['total_tokens'], # Optional
prompt_token_count=response['usage']['prompt_tokens'], # Optional
completion_token_count=response['usage']['completion_tokens'], # Optional
)
llm_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofLlmAttributes
The LlmAttributes
accepts the following arguments:
input_str
: a string representing the input text provided to the llm.output_str
: a string representing the output generated by the llm.- [Optional]
model
: a string specifying the model name of the llm used for generation. - [Optional]
total_token_count
: the total number of tokens used in the response generation. - [Optional]
prompt_token_count
: the number of tokens used from the prompt in the response generation. - [Optional]
completion_token_count
: the number of tokens generated as part of the completion. - [Optional]
prompt_template
: a string specifying the template used for the prompt. - [Optional]
prompt_template_variables
: a str to str dictionary of variables used in the prompt template.
TraceType.EMBEDDING
The EMBEDDING
TraceType
should be reserved for components where text is embedded into its vector representation.
from deepeval.tracing import Tracer, TraceType, EmbeddingAttributes
with Tracer(trace_type=TraceType.EMBEDDING) as embedding_trace:
response = openai.embeddings.create(
input="Your text string goes here",
model="text-embedding-3-small"
)
vector = response.data[0].embedding
# Don't forget to set attributes for the EMBEDDING trace type
attributes = EmbeddingAttributes(
embedding_text="Your text string goes here",
model='text-embedding-3-small', # Optional
embedding_length=len(vector) # Optional
)
embedding_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofEmbeddingAttributes
The EmbeddingAttributes
accepts the following arguments:
embedding_text
: a string representing the text being embedded.- [Optional]
model
: a string specifying the embedding model name . - [Optional]
embedding_length
: an integer representing the length of the embedding vector.
TraceType.RETRIEVER
The RETRIEVER
TraceType
should be reserved for components where nodes/text chunks/documents are retrieved based based on some vector embedding (as is often the case for RAG applications).
from typing import List
from deepeval.tracing import Tracer, TraceType, RetrievalNode, RetrieverAttributes
with Tracer(trace_type=TraceType.RETRIEVER) as retriever_trace:
...
retrieval_nodes : List[str] = get_nodes_in_vector_database(vector, k)
average_chunk_size = sum(len(node) for node in retrieval_nodes) / len(retrieval_nodes) if retrieval_nodes else 0
# Don't forget to set attributes for the RETRIEVER trace type
attributes = RetrieverAttributes(
query_str="Your query string goes here",
nodes=[RetrievalNode(content=node) for node in retrieval_nodes],
top_k=len(retrieval_nodes), # Optional
average_chunk_size=average_chunk_size, # Optional
similarity_scorer="cosine" # Optional
)
retriever_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofRetrieverAttributes
The RetrieverAttributes
accepts the following arguments:
query_str
: a string representing the query string used for retrieval.nodes
: a list ofRetrievalNodes
.- [Optional]
top_k
: an integer representing the number of top retrieval nodes. - [Optional]
average_chunk_size
: an integer representing the average chunk size of the retrieved text chunks. - [Optional]
top_score
: a float representing the top score for retrieval. - [Optional]
similarity_scorer
: a string specifying the similarity function used for retrieval.
The Retrieval Node
class represents a single node in a retrieval process. It accepts the following arguments:
content
: a string representing the content of the node.- [Optional]
id
: a string representing the identifier for the node. - [Optional]
score
: a float representing the relevance score of the node. - [Optional]
source_file
: a string specifying the source file from which the node was retrieved.
TraceType.RERANKING
The RERANKING
TraceType
should be reserved for components where nodes/text chunks/documents are reranked POST retrieval (ie., this should be contained within the RETRIEVER
).
from typing import List
from deepeval.tracing import Tracer, TraceType, RerankingAttributes, RetrievalNode
with Tracer(trace_type=TraceType.RERANKING) as reranking_trace:
...
reranked_nodes : List[str] = rerank_nodes(retrieval_nodes)
# Don't forget to set attributes for the RERANKING trace type
attributes = RerankingAttributes(
input_nodes=[RetrievalNode(content=node) for node in retrieval_nodes],
output_nodes=[RetrievalNode(content=node) for node in rerank_nodes],
model="some-reranking-model", # Optional
top_n=len(reranked_nodes), # Optional
batch_size=10, # Optional
query_str="Your reranking query string", # Optional
)
reranking_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofRerankingAttributes
The RerankingAttributes
accepts the following arguments:
input_nodes
: a list of inputRetrievalNode
s.output_nodes
: a list of outputRetrievalNode
s.- [Optional]
model
: a string specifying the model name used for reranking. - [Optional]
top_n
: an integer representing the number of top reranked nodes. - [Optional]
batch_size
: an integer representing the batch size used for reranking. - [Optional]
query_str
: a string representing the query string used for reranking.
The Retrieval Node
class represents a single node in a retrieval process. It accepts the following arguments:
content
: a string representing the content of the node.- [Optional]
id
: a string representing the identifier for the node. - [Optional]
score
: a float representing the relevance score of the node. - [Optional]
source_file
: a string specifying the source file from which the node was retrieved.
TraceType.SYNTHESIZE
The SYNTHESIZE
TraceType
should be reserved for components where synthesis of data, queries, or responses occurs.
from typing import List
from deepeval.tracing import Tracer, TraceType, SynthesizeAttributes
with Tracer(trace_type=TraceType.SYNTHESIZE) as syntheisze_trace:
...
user_query = "Tell me about historical events."
response, retrieved_context = synthesize_response(user_query)
# Don't forget to set attributes for the SYNTHESIZE trace type
attributes = SynthesizeAttributes(
user_query=user_query,
response=response,
retrieved_context=retrieved_context # Optional
)
synthesize_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofSynthesizeAttributes
The SynthesizeAttributes
accepts the following arguments:
user_query
: a string representing the user query input.response
: a string representing the generated response.- [Optional]
retrieved_context
: a string specifying the retrieved context used in synthesis.
TraceType.TOOL
The TOOL
TraceType
should be reserved for components involving the usage of specific LLM tools or utilities.
from typing import List
from deepeval.tracing import Tracer, TraceType, ToolAttributes
with Tracer(trace_type=TraceType.TOOL) as tool_trace:
...
original_text = "The history of natural language processing (NLP) generally started in the 1950s, although work can be found from earlier periods. In 1950, Alan Turing published an article titled 'Computing Machinery and Intelligence' which proposed what is now called the Turing test as a criterion of intelligence, a fundamental goal of natural language processing."
summarized_text = summarize_text(original_text)
# Don't forget to set attributes for the TOOL trace type
attributes = ToolAttributes(
name="Summarization Tool",
description="A tool that uses an LLM to automatically summarize lengthy text into a concise version."
)
tool_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofToolAttributes
The ToolAttributes
accepts the following arguments:
name
: a string representing the name of the tool.description
: a string describing the tool.
TraceType.QUERY
The QUERY
TraceType
should be reserved for components where engine querying occurs.
from typing import List
from deepeval.tracing import Tracer, TraceType, QueryAttributes
with Tracer(trace_type=TraceType.QUERY) as query_trace:
...
user_query = "What are the latest trends in artificial intelligence?"
output_response = fetch_information_from_llm(user_query)
# Don't forget to set attributes for the QUERY trace type
attributes = QueryAttributes(
input=user_query,
output=output_response
)
query_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofQueryAttributes
The QueryAttributes
accepts the following arguments:
input
: a string representing the input query string.output
: a string representing the output response to the query.
TraceType.AGENT
The AGENT
TraceType
should be reserved for components where agent-based interaction happens.
from typing import List
from deepeval.tracing import Tracer, TraceType, AgentAttributes
with Tracer(trace_type=TraceType.AGENT) as agent_trace:
...
user_input = "What were the top selling products?"
sql_output = text_to_sql(user_input)
# Don't forget to set attributes for the AGENT trace type
attributes = AgentAttributes(
input=user_input,
output=sql_output,
name="Text-to-SQL Converter",
description="An agent that converts natural language queries into SQL queries for database interaction."
)
agent_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofAgentAttributes
The AgentAttributes
accepts the following arguments:
input
: a string representing the input provided to the agent.output
: a string representing the output generated by the agent.name
: a string representing the name of the agent.description
: a string describing the agent.
TraceType.CHAIN
The CHAIN
TraceType
should be reserved for components where a sequence of operations or tasks is performed.
from typing import List
from deepeval.tracing import Tracer, TraceType, ChainAttributes
with Tracer(trace_type=TraceType.CHAIN) as chain_trace:
...
input_text = "This is a good example of how complex processing chains can be structured."
sentiment_result = analyze_sentiment(input_text)
summary = summarize_text(sentiment_result)
final_output = translate_text(summary, "Spanish")
# Don't forget to set attributes for the CHAIN trace type
attributes = ChainAttributes(
input=input_text,
output=final_output,
prompt_template="Process involves sentiment analysis, summarization, and translation." # Optional
)
chain_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofChainAttributes
The ChainAttributes
accepts the following arguments:
input
: a string representing the input provided to the chain.output
: a string representing the output generated by the chain.- [Optional]
prompt_template
: a string specifying the template used for generating prompts.
TraceType.CHUNK
The CHUNK
TraceType
is used for components where a large input text is divided into smaller chunks. This can be useful in scenarios where the input text is too large to process in a single step, so it needs to be split into manageable pieces.
from deepeval.tracing import Tracer, TraceType, ChunkAttributes
with Tracer(trace_type=TraceType.CHUNK) as chunk_trace:
input_text = "Your large input text goes here..."
chunk_size = 1000 # Example chunk size
output_chunks = [input_text[i:i + chunk_size] for i in range(0, len(input_text), chunk_size)]
# Set attributes for the CHUNK trace type
attributes = ChunkAttributes(
input=input_text,
output_chunks=output_chunks
)
chunk_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofChunkAttributes
The ChunkAttributes
accepts the following arguments:
input
: a string representing the large input text being chunked.output_chunks
: a list of strings representing the smaller chunks resulting from the chunking process.
TraceType.NODE_PARSING
The NODE_PARSING
TraceType
is used for components where nodes are parsed from a larger dataset or document. This can involve extracting specific sections, elements, or pieces of information from the larger content.
from typing import List
from deepeval.tracing import Tracer, TraceType, RetrievalNode, NodeParsingAttributes
with Tracer(trace_type=TraceType.NODE_PARSING) as node_parsing_trace:
document = "Your large document goes here..."
parsed_nodes: List[str] = parse_document_to_nodes(document) # Function to parse the document
# Set attributes for the NODE_PARSING trace type
attributes = NodeParsingAttributes(
output_nodes=[RetrievalNode(content=node) for node in parsed_nodes]
)
node_parsing_trace.set_attributes(attributes)
Additionally, set_attributes()
must be called with the following parameters:
attributes
: an instance ofNodeParsingAttributes
The NodeParsingAttributes
accepts the following arguments:
content
: a string representing the content of the node.- [Optional]
id
: a string representing the identifier for the node. - [Optional]
score
: a float representing the relevance score of the node. - [Optional]
source_file
: a string specifying the source file from which the node was retrieved.
Remember, you can also supply a custom str
as trace_type
for custom trace types. GenericAttributes
(which accepts only input and output) must be set via the set_attributes
method before the end of custom traces.
Custom Tracing During Monitoring in Production
To enable Custom Tracing for production response monitoring, you'll need to call the monitor()
method at the end of the root trace with
block. This method has the same exact arguments (and types) as deepeval.monitor()
. You can learn more about deepeval.monitor()
in production monitoring here.
from deepeval.tracing import Tracer, TraceType
with Tracer(trace_type=TraceType.AGENT) as agent_trace:
...
agent_trace.monitor(
event_name="Chatbot",
model='gpt-4-turbo-preview',
input=user_input,
response=output,
)
Tracing During Evaluation in development
to be documented...
Full Advanced Hybrid Tracing Example
Lastly, deepeval
supports hybrid tracing, which is particularly helpful if you are using a combination of frameworks to build your LLM application.
In this example, we'll setup hybrid tracing on a RAG pipeline built using LlamaIndex and some raw OpenAI APIs. To do so, simply import set_global_handler
from llama_index.core
and set it to ‘deepeval’. deepeval
will handle all the logic behind the scenes and automatically nest the LlamaIndex traces in the correct trace positions.
Here's how you can set it up:
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from openai import AsyncOpenAI
from llama_index.core import set_global_handler
from deepeval.tracing import Tracer, TraceType, LlmAttributes
class RAGPipeline:
def __init__(self, model_name="gpt-4-turbo-preview", top_k=5, chunk_size=200, chunk_overlap=20, min_similarity=0.5, data_dir="data"):
openai_key = os.getenv("OPENAI_API_KEY")
if not openai_key:
raise ValueError("OpenAI API key not found in environment variables.")
self.openai_client = AsyncOpenAI(api_key=openai_key)
self.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
documents = SimpleDirectoryReader(data_dir).load_data()
node_parser = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
nodes = node_parser.get_nodes_from_documents(documents, show_progress=True)
self.index = VectorStoreIndex(nodes, embed_model=self.embed_model)
self.retriever = self.index.as_retriever(similarity_top_k=top_k, similarity_cutoff=min_similarity)
self.model_name = model_name
def format_nodes(self, query):
with Tracer(trace_type="CustomNodeParsing") as node_parsing_trace:
nodes = self.retriever.retrieve(query)
combined_nodes = "\n".join([node.get_content() for node in nodes])
attributes = GenericAttributes(
input=query,
output=combined_nodes
)
node_parsing_trace.set_attributes(attributes)
return combined_nodes
async def generate_completion(self, prompt, context):
with Tracer(trace_type=TraceType.LLM) as llm_trace:
full_prompt = f"Context: {context}\n\nQuery: {prompt}\n\nResponse:"
response = await self.openai_client.chat.completions.create(
model=self.model_name,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": full_prompt}
],
temperature=0.7,
max_tokens=200
)
output = response.choices[0].message.content
attributes = LlmAttributes(
input_str=full_prompt,
output_str=output,
model=self.model_name,
total_token_count=response['usage']['total_tokens'],
prompt_token_count=response['usage']['prompt_tokens'],
completion_token_count=response['usage']['completion_tokens'],
)
llm_trace.set_attributes(attributes)
return output
async def aquery(self, query_text):
with Tracer(trace_type=TraceType.QUERY) as query_trace:
context = self.format_nodes(query_text)
response = await self.generate_completion(query_text, context)
attributes = QueryAttributes(
input=query_text,
output=response
)
query_trace.set_attributes(attributes)
query_trace.monitor(
input=query_text,
response=response,
model='gpt-4-turbo-preview',
)
return response