LLM Tracing

Quick Summary

LLM tracing lets you debug LLM applications at the component level, in both development (evaluation) and production (monitoring) environments. Setting up tracing makes it easy to:

  • debug failing test cases (i.e., figure out how unsatisfactory actual_outputs are generated during evaluation).
  • understand, in real time, how LLM responses are generated when monitoring in production.
info

Confident AI is NOT tied to or vendor-locked into any LLM provider or framework, which means you can trace literally any LLM application of your choice.

Tracing Integrations

note

If your LLM application was built using a hybrid approach (e.g., a bit of langchain, a bit of llama_index, and maybe some custom LLM API calls), you will need to set up hybrid tracing instead.

deepeval can automatically log traces for LLM applications built with either llama_index or langchain in just one line of code. Once set up, all traces are sent to Confident AI, and all you need to do is log in to Confident AI.

LlamaIndex

import deepeval

deepeval.trace_llama_index()

For more information on DeepEval's LlamaIndex integration, visit this page.

LangChain

import deepeval

deepeval.trace_langchain()

For more information on DeepEval's LangChain integration, visit this page.

Setup Custom Tracing

Custom tracing gives you more flexibility in choosing which parts of your LLM application are traced. There are two main scenarios where custom tracing comes in handy:

  • your LLM application wasn't built on any frameworks
  • your LLM application was built on multiple frameworks

In deepeval, tracing a particular component of your LLM application is as simple as wrapping it in a Python with block that uses deepeval's Tracer.

import openai
from deepeval.tracing import Tracer, TraceType
...

with Tracer(trace_type=TraceType.LLM) as llm_trace:
    # Uses the openai>=1.0 client interface
    response = openai.chat.completions.create(
        model='gpt-4o',
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "..."},
        ],
    )
    llm_response = response.choices[0].message.content
    print(llm_response)
DID YOU KNOW?

Nesting with blocks in python allows deepeval to recognize the trace hierarchy in your LLM application. For example, this trace structure:

from deepeval.tracing import Tracer, TraceType

with Tracer(trace_type="My LLM Application") as custom_trace:
    ...
    with Tracer(trace_type=TraceType.RETRIEVER) as retriever_trace:
        ...
        with Tracer(trace_type=TraceType.EMBEDDING) as embedding_trace:
            ...
    with Tracer(trace_type=TraceType.LLM) as llm_trace:
        ...

This particular nested with block will give the following structure:

My LLM Application
|
* --- Retriever
| |
| * --- Embedding
|
* --- LLM
|

You'll notice the Tracer accepts a trace_type argument, which can be either a str (for custom types of traces) or a TraceType. The trace_type parameter is an easy way to classify components in your LLM pipeline. Here are the TraceTypes that deepeval offers:

  • TraceType.AGENT
  • TraceType.CHAIN
  • TraceType.CHUNK
  • TraceType.EMBEDDING
  • TraceType.LLM
  • TraceType.NODE_PARSING
  • TraceType.QUERY
  • TraceType.RERANKING
  • TraceType.RETRIEVER
  • TraceType.SYNTHESIZE
  • TraceType.TOOL

Before the end of each with block, you must call the set_attributes() method to set the attributes associated with your trace, as you'll learn in the sections below. For example, when defining a trace of type TraceType.LLM, you'll need to pass the input_str and output_str of your LLM to set_attributes().

info

Although you can technically use custom traces for all trace_types to avoid the trouble of setting trace-specific attributes, deepeval's default approach gives you a better UI on Confident AI.

TraceType.LLM

The LLM TraceType should be reserved for components where LLM generation happens.

import openai
from deepeval.tracing import Tracer, TraceType, LlmAttributes

with Tracer(trace_type=TraceType.LLM) as llm_trace:
    # Uses the openai>=1.0 client interface
    response = openai.chat.completions.create(
        model='gpt-4o',
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "..."},
        ],
    )
    llm_response = response.choices[0].message.content
    print(llm_response)

    # Don't forget to set attributes for the LLM trace type
    attributes = LlmAttributes(
        input_str="...",
        output_str=llm_response,
        model='gpt-4o', # Optional
        total_token_count=response.usage.total_tokens, # Optional
        prompt_token_count=response.usage.prompt_tokens, # Optional
        completion_token_count=response.usage.completion_tokens, # Optional
    )
    llm_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of LlmAttributes

The LlmAttributes accepts the following arguments:

  • input_str: a string representing the input text provided to the LLM.
  • output_str: a string representing the output generated by the LLM.
  • [Optional] model: a string specifying the name of the LLM used for generation.
  • [Optional] total_token_count: the total number of tokens used in the response generation.
  • [Optional] prompt_token_count: the number of prompt tokens used in the response generation.
  • [Optional] completion_token_count: the number of tokens generated as part of the completion.
  • [Optional] prompt_template: a string specifying the template used for the prompt.
  • [Optional] prompt_template_variables: a str to str dictionary of variables used in the prompt template.
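As an illustration of how these arguments relate to each other, the sketch below gathers them into a plain dictionary. The helper name build_llm_attribute_kwargs is hypothetical, and the sketch assumes a template written with Python's str.format placeholders; it also assumes token usage may arrive either as a dictionary or as an object with attributes, depending on the client you use.

```python
from typing import Any, Dict, Mapping


def _read(usage: Any, key: str) -> Any:
    """Read a field from a dict-style or attribute-style usage object."""
    if isinstance(usage, Mapping):
        return usage.get(key)
    return getattr(usage, key, None)


def build_llm_attribute_kwargs(
    template: str,
    variables: Dict[str, str],
    output: str,
    model: str,
    usage: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for LlmAttributes."""
    return {
        # The rendered prompt is what the LLM actually received
        "input_str": template.format(**variables),
        "output_str": output,
        "model": model,
        "total_token_count": _read(usage, "total_tokens"),
        "prompt_token_count": _read(usage, "prompt_tokens"),
        "completion_token_count": _read(usage, "completion_tokens"),
        # The unrendered template and its variables are kept separately
        "prompt_template": template,
        "prompt_template_variables": dict(variables),
    }


kwargs = build_llm_attribute_kwargs(
    template="Answer briefly: {question}",
    variables={"question": "What is tracing?"},
    output="Tracing records each step of a pipeline.",
    model="gpt-4o",
    usage={"total_tokens": 30, "prompt_tokens": 12, "completion_tokens": 18},
)
print(kwargs["input_str"])
```

Inside a with block, a dictionary like this could be unpacked as LlmAttributes(**kwargs) before calling set_attributes().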

TraceType.EMBEDDING

The EMBEDDING TraceType should be reserved for components where text is embedded into its vector representation.

import openai
from deepeval.tracing import Tracer, TraceType, EmbeddingAttributes

with Tracer(trace_type=TraceType.EMBEDDING) as embedding_trace:
    response = openai.embeddings.create(
        input="Your text string goes here",
        model="text-embedding-3-small"
    )
    vector = response.data[0].embedding

    # Don't forget to set attributes for the EMBEDDING trace type
    attributes = EmbeddingAttributes(
        embedding_text="Your text string goes here",
        model='text-embedding-3-small', # Optional
        embedding_length=len(vector) # Optional
    )
    embedding_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of EmbeddingAttributes

The EmbeddingAttributes accepts the following arguments:

  • embedding_text: a string representing the text being embedded.
  • [Optional] model: a string specifying the embedding model name.
  • [Optional] embedding_length: an integer representing the length of the embedding vector.

TraceType.RETRIEVER

The RETRIEVER TraceType should be reserved for components where nodes/text chunks/documents are retrieved based on some vector embedding (as is often the case for RAG applications).

from typing import List
from deepeval.tracing import Tracer, TraceType, RetrievalNode, RetrieverAttributes

with Tracer(trace_type=TraceType.RETRIEVER) as retriever_trace:
    ...
    # get_nodes_in_vector_database, vector, and k stand in for your own retrieval code
    retrieval_nodes: List[str] = get_nodes_in_vector_database(vector, k)
    average_chunk_size = sum(len(node) for node in retrieval_nodes) / len(retrieval_nodes) if retrieval_nodes else 0

    # Don't forget to set attributes for the RETRIEVER trace type
    attributes = RetrieverAttributes(
        query_str="Your query string goes here",
        nodes=[RetrievalNode(content=node) for node in retrieval_nodes],
        top_k=len(retrieval_nodes), # Optional
        average_chunk_size=average_chunk_size, # Optional
        similarity_scorer="cosine" # Optional
    )
    retriever_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of RetrieverAttributes

The RetrieverAttributes accepts the following arguments:

  • query_str: a string representing the query string used for retrieval.
  • nodes: a list of RetrievalNodes.
  • [Optional] top_k: an integer representing the number of top retrieval nodes.
  • [Optional] average_chunk_size: an integer representing the average chunk size of the retrieved text chunks.
  • [Optional] top_score: a float representing the top score for retrieval.
  • [Optional] similarity_scorer: a string specifying the similarity function used for retrieval.

The RetrievalNode class represents a single node in a retrieval process. It accepts the following arguments:

  • content: a string representing the content of the node.
  • [Optional] id: a string representing the identifier for the node.
  • [Optional] score: a float representing the relevance score of the node.
  • [Optional] source_file: a string specifying the source file from which the node was retrieved.
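The optional retrieval statistics can be derived from whatever your retriever returns. The sketch below assumes the retriever yields (text, score) pairs; the helper name summarize_retrieval is hypothetical, and the average is rounded because average_chunk_size is described above as an integer.

```python
from typing import Any, Dict, List, Tuple


def summarize_retrieval(scored_chunks: List[Tuple[str, float]]) -> Dict[str, Any]:
    """Hypothetical helper: derive optional RetrieverAttributes values."""
    if not scored_chunks:
        # Nothing retrieved: avoid dividing by zero and report no top score
        return {"top_k": 0, "average_chunk_size": 0, "top_score": None}
    total_chars = sum(len(text) for text, _ in scored_chunks)
    return {
        "top_k": len(scored_chunks),
        "average_chunk_size": round(total_chars / len(scored_chunks)),
        "top_score": max(score for _, score in scored_chunks),
    }


stats = summarize_retrieval([("abcd", 0.9), ("abcdefgh", 0.4)])
print(stats)
```

The resulting values could be passed as top_k, average_chunk_size, and top_score when constructing RetrieverAttributes, with each pair's score also going into the matching RetrievalNode.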

TraceType.RERANKING

The RERANKING TraceType should be reserved for components where nodes/text chunks/documents are reranked after retrieval (i.e., this trace should be nested within the RETRIEVER trace).

from typing import List
from deepeval.tracing import Tracer, TraceType, RerankingAttributes, RetrievalNode

with Tracer(trace_type=TraceType.RERANKING) as reranking_trace:
    ...
    # rerank_nodes and retrieval_nodes stand in for your own reranking code and its input
    reranked_nodes: List[str] = rerank_nodes(retrieval_nodes)

    # Don't forget to set attributes for the RERANKING trace type
    attributes = RerankingAttributes(
        input_nodes=[RetrievalNode(content=node) for node in retrieval_nodes],
        output_nodes=[RetrievalNode(content=node) for node in reranked_nodes],
        model="some-reranking-model", # Optional
        top_n=len(reranked_nodes), # Optional
        batch_size=10, # Optional
        query_str="Your reranking query string", # Optional
    )
    reranking_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of RerankingAttributes

The RerankingAttributes accepts the following arguments:

  • input_nodes: a list of input RetrievalNodes.
  • output_nodes: a list of output RetrievalNodes.
  • [Optional] model: a string specifying the model name used for reranking.
  • [Optional] top_n: an integer representing the number of top reranked nodes.
  • [Optional] batch_size: an integer representing the batch size used for reranking.
  • [Optional] query_str: a string representing the query string used for reranking.

The RetrievalNode class represents a single node in a retrieval process. It accepts the following arguments:

  • content: a string representing the content of the node.
  • [Optional] id: a string representing the identifier for the node.
  • [Optional] score: a float representing the relevance score of the node.
  • [Optional] source_file: a string specifying the source file from which the node was retrieved.
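To show where input_nodes, output_nodes, and top_n come from, here is a toy reranker. It is only a sketch: rerank_by_overlap is a hypothetical function that scores chunks by word overlap with the query, standing in for whatever reranking model you actually use.

```python
from typing import List


def rerank_by_overlap(query: str, chunks: List[str], top_n: int) -> List[str]:
    """Toy reranker: order chunks by how many query words they contain."""
    query_words = set(query.lower().split())

    def overlap(chunk: str) -> int:
        return len(query_words & set(chunk.lower().split()))

    # sorted() is stable, so chunks with equal overlap keep their retrieval order
    return sorted(chunks, key=overlap, reverse=True)[:top_n]


retrieved = [
    "cooking pasta at home",
    "tracing llm apps in production",
    "llm basics",
]
reranked = rerank_by_overlap("tracing llm apps", retrieved, top_n=2)
print(reranked)
```

Here retrieved would supply the content for input_nodes, reranked the content for output_nodes, and len(reranked) the value for top_n.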

TraceType.SYNTHESIZE

The SYNTHESIZE TraceType should be reserved for components where synthesis of data, queries, or responses occurs.

from deepeval.tracing import Tracer, TraceType, SynthesizeAttributes

with Tracer(trace_type=TraceType.SYNTHESIZE) as synthesize_trace:
    ...
    user_query = "Tell me about historical events."
    # synthesize_response stands in for your own synthesis code
    response, retrieved_context = synthesize_response(user_query)

    # Don't forget to set attributes for the SYNTHESIZE trace type
    attributes = SynthesizeAttributes(
        user_query=user_query,
        response=response,
        retrieved_context=retrieved_context # Optional
    )
    synthesize_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of SynthesizeAttributes

The SynthesizeAttributes accepts the following arguments:

  • user_query: a string representing the user query input.
  • response: a string representing the generated response.
  • [Optional] retrieved_context: a string specifying the retrieved context used in synthesis.

TraceType.TOOL

The TOOL TraceType should be reserved for components involving the usage of specific LLM tools or utilities.

from deepeval.tracing import Tracer, TraceType, ToolAttributes

with Tracer(trace_type=TraceType.TOOL) as tool_trace:
    ...
    original_text = (
        "The history of natural language processing (NLP) generally started in the 1950s, "
        "although work can be found from earlier periods. In 1950, Alan Turing published an "
        "article titled 'Computing Machinery and Intelligence' which proposed what is now "
        "called the Turing test as a criterion of intelligence, a fundamental goal of "
        "natural language processing."
    )
    # summarize_text stands in for your own tool
    summarized_text = summarize_text(original_text)

    # Don't forget to set attributes for the TOOL trace type
    attributes = ToolAttributes(
        name="Summarization Tool",
        description="A tool that uses an LLM to automatically summarize lengthy text into a concise version."
    )
    tool_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of ToolAttributes

The ToolAttributes accepts the following arguments:

  • name: a string representing the name of the tool.
  • description: a string describing the tool.

TraceType.QUERY

The QUERY TraceType should be reserved for components where engine querying occurs.

from deepeval.tracing import Tracer, TraceType, QueryAttributes

with Tracer(trace_type=TraceType.QUERY) as query_trace:
    ...
    user_query = "What are the latest trends in artificial intelligence?"
    # fetch_information_from_llm stands in for your own query engine
    output_response = fetch_information_from_llm(user_query)

    # Don't forget to set attributes for the QUERY trace type
    attributes = QueryAttributes(
        input=user_query,
        output=output_response
    )
    query_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of QueryAttributes

The QueryAttributes accepts the following arguments:

  • input: a string representing the input query string.
  • output: a string representing the output response to the query.

TraceType.AGENT

The AGENT TraceType should be reserved for components where agent-based interaction happens.

from deepeval.tracing import Tracer, TraceType, AgentAttributes

with Tracer(trace_type=TraceType.AGENT) as agent_trace:
    ...
    user_input = "What were the top selling products?"
    # text_to_sql stands in for your own agent
    sql_output = text_to_sql(user_input)

    # Don't forget to set attributes for the AGENT trace type
    attributes = AgentAttributes(
        input=user_input,
        output=sql_output,
        name="Text-to-SQL Converter",
        description="An agent that converts natural language queries into SQL queries for database interaction."
    )
    agent_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of AgentAttributes

The AgentAttributes accepts the following arguments:

  • input: a string representing the input provided to the agent.
  • output: a string representing the output generated by the agent.
  • name: a string representing the name of the agent.
  • description: a string describing the agent.

TraceType.CHAIN

The CHAIN TraceType should be reserved for components where a sequence of operations or tasks is performed.

from deepeval.tracing import Tracer, TraceType, ChainAttributes

with Tracer(trace_type=TraceType.CHAIN) as chain_trace:
    ...
    input_text = "This is a good example of how complex processing chains can be structured."

    # analyze_sentiment, summarize_text, and translate_text stand in for your own chain steps
    sentiment_result = analyze_sentiment(input_text)
    summary = summarize_text(sentiment_result)
    final_output = translate_text(summary, "Spanish")

    # Don't forget to set attributes for the CHAIN trace type
    attributes = ChainAttributes(
        input=input_text,
        output=final_output,
        prompt_template="Process involves sentiment analysis, summarization, and translation." # Optional
    )
    chain_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of ChainAttributes

The ChainAttributes accepts the following arguments:

  • input: a string representing the input provided to the chain.
  • output: a string representing the output generated by the chain.
  • [Optional] prompt_template: a string specifying the template used for generating prompts.

TraceType.CHUNK

The CHUNK TraceType is used for components where a large input text is divided into smaller chunks. This can be useful in scenarios where the input text is too large to process in a single step, so it needs to be split into manageable pieces.

from deepeval.tracing import Tracer, TraceType, ChunkAttributes

with Tracer(trace_type=TraceType.CHUNK) as chunk_trace:
    input_text = "Your large input text goes here..."
    chunk_size = 1000 # Example chunk size
    output_chunks = [input_text[i:i + chunk_size] for i in range(0, len(input_text), chunk_size)]

    # Set attributes for the CHUNK trace type
    attributes = ChunkAttributes(
        input=input_text,
        output_chunks=output_chunks
    )
    chunk_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of ChunkAttributes

The ChunkAttributes accepts the following arguments:

  • input: a string representing the large input text being chunked.
  • output_chunks: a list of strings representing the smaller chunks resulting from the chunking process.
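Chunkers often keep some overlap between neighbouring chunks so that context is not cut at a boundary. The sketch below is a character-based variation on the slicing shown above; chunk_with_overlap is a hypothetical helper, not part of deepeval, and its output is a list of strings that could be passed as output_chunks.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

With overlap=0 this reduces to the plain slicing used in the example above.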

TraceType.NODE_PARSING

The NODE_PARSING TraceType is used for components where nodes are parsed from a larger dataset or document. This can involve extracting specific sections, elements, or pieces of information from the larger content.

from typing import List
from deepeval.tracing import Tracer, TraceType, RetrievalNode, NodeParsingAttributes

with Tracer(trace_type=TraceType.NODE_PARSING) as node_parsing_trace:
    document = "Your large document goes here..."
    # parse_document_to_nodes stands in for your own function that parses the document
    parsed_nodes: List[str] = parse_document_to_nodes(document)

    # Set attributes for the NODE_PARSING trace type
    attributes = NodeParsingAttributes(
        output_nodes=[RetrievalNode(content=node) for node in parsed_nodes]
    )
    node_parsing_trace.set_attributes(attributes)

Additionally, set_attributes() must be called with the following parameters:

  • attributes: an instance of NodeParsingAttributes

The NodeParsingAttributes accepts the following arguments:

  • output_nodes: a list of RetrievalNodes representing the parsed nodes.

The RetrievalNode class represents a single node. It accepts the following arguments:

  • content: a string representing the content of the node.
  • [Optional] id: a string representing the identifier for the node.
  • [Optional] score: a float representing the relevance score of the node.
  • [Optional] source_file: a string specifying the source file from which the node was retrieved.
tip

Remember, you can also supply a custom str as trace_type for custom trace types. GenericAttributes (which accepts only input and output) must be set via the set_attributes method before the end of custom traces.

Custom Tracing During Monitoring in Production

To enable custom tracing for production response monitoring, call the monitor() method at the end of the root trace's with block. This method takes exactly the same arguments (and types) as deepeval.monitor(). You can learn more about deepeval.monitor() in production monitoring here.

from deepeval.tracing import Tracer, TraceType

with Tracer(trace_type=TraceType.AGENT) as agent_trace:
    ...

    # Call monitor() last, while still inside the root with block
    agent_trace.monitor(
        event_name="Chatbot",
        model='gpt-4-turbo-preview',
        input=user_input,
        response=output,
    )

Tracing During Evaluation in Development

to be documented...

Full Advanced Hybrid Tracing Example

Lastly, deepeval supports hybrid tracing, which is particularly helpful if you are using a combination of frameworks to build your LLM application.

In this example, we'll set up hybrid tracing on a RAG pipeline built using LlamaIndex and some raw OpenAI API calls. To do so, import set_global_handler from llama_index.core and call it with "deepeval". deepeval will handle all the logic behind the scenes and automatically nest the LlamaIndex traces in the correct trace positions.

Here's how you can set it up:

test_hybrid_chatbot.py
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import set_global_handler
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from openai import AsyncOpenAI

from deepeval.tracing import (
    Tracer,
    TraceType,
    GenericAttributes,
    LlmAttributes,
    QueryAttributes,
)

# Route LlamaIndex traces to deepeval so they nest inside the custom traces below
set_global_handler("deepeval")

class RAGPipeline:
    def __init__(self, model_name="gpt-4-turbo-preview", top_k=5, chunk_size=200, chunk_overlap=20, min_similarity=0.5, data_dir="data"):
        openai_key = os.getenv("OPENAI_API_KEY")
        if not openai_key:
            raise ValueError("OpenAI API key not found in environment variables.")

        self.openai_client = AsyncOpenAI(api_key=openai_key)
        self.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

        # Load documents, split them into nodes, and index them
        documents = SimpleDirectoryReader(data_dir).load_data()
        node_parser = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        nodes = node_parser.get_nodes_from_documents(documents, show_progress=True)

        self.index = VectorStoreIndex(nodes, embed_model=self.embed_model)
        self.retriever = self.index.as_retriever(similarity_top_k=top_k, similarity_cutoff=min_similarity)
        self.model_name = model_name

    def format_nodes(self, query):
        # Custom trace type (a plain str), so GenericAttributes is used
        with Tracer(trace_type="CustomNodeParsing") as node_parsing_trace:
            nodes = self.retriever.retrieve(query)
            combined_nodes = "\n".join([node.get_content() for node in nodes])

            attributes = GenericAttributes(
                input=query,
                output=combined_nodes
            )
            node_parsing_trace.set_attributes(attributes)
            return combined_nodes

    async def generate_completion(self, prompt, context):
        with Tracer(trace_type=TraceType.LLM) as llm_trace:
            full_prompt = f"Context: {context}\n\nQuery: {prompt}\n\nResponse:"
            response = await self.openai_client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": full_prompt}
                ],
                temperature=0.7,
                max_tokens=200
            )
            output = response.choices[0].message.content

            # The openai>=1.0 client returns objects, so usage is read via attributes
            attributes = LlmAttributes(
                input_str=full_prompt,
                output_str=output,
                model=self.model_name,
                total_token_count=response.usage.total_tokens,
                prompt_token_count=response.usage.prompt_tokens,
                completion_token_count=response.usage.completion_tokens,
            )
            llm_trace.set_attributes(attributes)
            return output

    async def aquery(self, query_text):
        # Root trace: the traces created in the calls below nest under it
        with Tracer(trace_type=TraceType.QUERY) as query_trace:
            context = self.format_nodes(query_text)
            response = await self.generate_completion(query_text, context)

            attributes = QueryAttributes(
                input=query_text,
                output=response
            )
            query_trace.set_attributes(attributes)
            query_trace.monitor(
                input=query_text,
                response=response,
                model=self.model_name,
            )
            return response
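
Because aquery is a coroutine, it has to be driven by an event loop. The sketch below shows one way to do that with asyncio; StubPipeline is a hypothetical stand-in with the same aquery(query_text) shape, used here so the example runs without API keys or a data directory.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in exposing the same async aquery(query_text) shape."""

    async def aquery(self, query_text: str) -> str:
        await asyncio.sleep(0)  # placeholder for retrieval and the LLM call
        return f"stub answer for: {query_text}"


async def main() -> list:
    pipeline = StubPipeline()
    # gather runs both queries concurrently and returns results in call order
    return await asyncio.gather(
        pipeline.aquery("What is tracing?"),
        pipeline.aquery("What is a retriever?"),
    )


answers = asyncio.run(main())
print(answers)
```

With the real pipeline, StubPipeline() would be replaced by RAGPipeline(), which additionally requires OPENAI_API_KEY to be set and documents in the data directory.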