How to Build an AI Journal with LlamaIndex

A step-by-step guide for building an AI assistant powered by LlamaIndex.


This post shares how to build an AI journal with LlamaIndex. We will cover one essential function of this AI journal: asking for advice. We will start with the most basic implementation and iterate from there, and we will see significant improvements in this function when we apply design patterns such as Agentic RAG and multi-agent workflows.

You can find the source code of this AI journal in my GitHub repo here, along with more about who I am.

Overview of AI Journal

I want to build my principles by following Ray Dalio’s practice. An AI journal will help me to self-reflect, track my improvement, and even give me advice. The overall function of such an AI journal looks like this:

AI Journal Overview. Image by Author.

Today, we will only cover the implementation of the seek-advice flow, which is represented by the purple circles in the diagram above.

Simplest Form: LLM with Large Context

In the most straightforward implementation, we can pass all the relevant content into the context and append the question we want to ask. We can do that in LlamaIndex with a few lines of code.

import pymupdf
from llama_index.llms.openai import OpenAI

path_to_pdf_book = './path/to/pdf/book.pdf'
def load_book_content():
    # Extract the text of every page, dropping characters that cannot be encoded as UTF-8.
    text = ""
    with pymupdf.open(path_to_pdf_book) as pdf:
        for page in pdf:
            text += page.get_text().encode("utf8", errors="ignore").decode("utf8")
    return text

system_prompt_template = """You are an AI assistant that provides thoughtful, practical, and *deeply personalized* suggestions by combining:
- The user's personal profile and principles
- Insights retrieved from *Principles* by Ray Dalio
Book Content: 
```
{book_content}
```
User profile:
```
{user_profile}
```
User's question:
```
{user_question}
```
"""

def get_system_prompt(book_content: str, user_profile: str, user_question: str):
    system_prompt = system_prompt_template.format(
        book_content=book_content,
        user_profile=user_profile,
        user_question=user_question
    )
    return system_prompt

def chat():
    llm = get_openai_llm()  # helper returning a LlamaIndex OpenAI LLM instance (sketched below)
    user_profile = input(">>Tell me about yourself: ").strip()
    user_question = input(">>What do you want to ask: ")
    book_content = load_book_content()
    response = llm.complete(get_system_prompt(book_content, user_profile, user_question))
    return response
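
get_openai_llm() is a small helper that is not shown above. A minimal sketch of it, assuming you simply want to instantiate LlamaIndex's OpenAI wrapper (the model choice is my assumption, not the author's setup):

from llama_index.llms.openai import OpenAI

def get_openai_llm():
    # Hypothetical helper: returns a LlamaIndex OpenAI LLM; pick whichever model you prefer.
    return OpenAI(model="gpt-4o-mini", temperature=0.2)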

This approach has downsides:

  • Low precision: Loading the entire book into the context can cause the LLM to lose focus on the user’s question.
  • High cost: Sending such a large amount of content in every LLM call means higher cost and slower responses.

With this approach, if you pass in the whole content of Ray Dalio’s Principles book, responses to questions like “How to handle stress?” become very general. Even though they cover many important concepts, such as embracing reality, the 5-step process to get what you want, and being radically open-minded, answers that do not relate to my specific question made me feel that the AI was not listening to me. I want the advice I get to be targeted at the question I raised. Let’s see how we can improve it with RAG.

Enhanced Form: Agentic RAG

So, what is Agentic RAG? Agentic RAG combines dynamic decision-making with data retrieval. In our AI journal, the Agentic RAG flow looks like this:

Stages of Agentic RAG. Image by Author.
  • Question Evaluation: Poorly framed questions lead to poor query results. The agent evaluates the user’s query and asks clarifying questions if it believes that is necessary.
  • Question Rewrite: Rewrite the user’s enquiry so that it projects onto the indexed content in the semantic space. I found this step essential for improving retrieval precision. For example, if your knowledge base is a set of Q/A pairs and you index the question part to search for answers, rewriting the user’s query into a proper question will help you find the most relevant content.
  • Query Vector Index: Many parameters can be tuned when building such an index, including chunk size, overlap, and index type. For simplicity, we use VectorStoreIndex here, which has a default chunking strategy.
  • Filter & Synthesize: Instead of a complex re-ranking process, I explicitly instruct the LLM in the prompt to filter for and pick out the relevant content. I see the LLM picking up the most relevant content, even when it sometimes has a lower similarity score than other chunks.

With this Agentic RAG, you can retrieve content that is highly relevant to the user’s question and generate more targeted advice.

Let’s examine the implementation. With the LlamaIndex SDK, creating and persisting an index in your local directory is straightforward.

from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(api_key="ak-xxxx")
PERSISTED_INDEX_PATH = "/path/to/the/directory/persist/index/locally"

def create_index(content: str):
    documents = [Document(text=content)]
    vector_index = VectorStoreIndex.from_documents(documents)
    vector_index.storage_context.persist(persist_dir=PERSISTED_INDEX_PATH)

def load_index():
    storage_context = StorageContext.from_defaults(persist_dir=PERSISTED_INDEX_PATH)
    index = load_index_from_storage(storage_context)
    return index
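
As a quick usage sketch, reusing the load_book_content() helper from the first section, you build the index once and reload the persisted copy on later runs:

# Build the index once from the book text, then reuse the persisted copy afterwards.
create_index(load_book_content())
index = load_index()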

Once we have an index, we can create a query engine on top of it. The query engine is a powerful abstraction that allows you to adjust parameters during the query (e.g., the top K) and the synthesis behaviour after content retrieval. In my implementation, I override the response_mode to NO_TEXT because the agent will process the book content returned by the function call and synthesize the final result; having the query engine synthesize the result before passing it to the agent would be redundant.

from llama_index.core.indices.vector_store import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core import VectorStoreIndex, get_response_synthesizer

TOP_K = 5  # number of chunks to retrieve per query; tune as needed

def _create_query_engine_from_index(index: VectorStoreIndex):
    # configure retriever
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=TOP_K,
    )
    # return the original chunks without using the LLM to synthesize them, for later evaluation
    response_synthesizer = get_response_synthesizer(response_mode=ResponseMode.NO_TEXT)
    # assemble query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer
    )
    return query_engine
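
With ResponseMode.NO_TEXT, the query engine returns the retrieved chunks without generating an answer; you read them from response.source_nodes. A small usage sketch:

query_engine = _create_query_engine_from_index(load_index())
response = query_engine.query("How should I handle stress at work?")
# NO_TEXT means no synthesized answer; the retrieved chunks live in source_nodes.
chunks = [node.get_content() for node in response.source_nodes]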

The prompt looks like the following:

You are an assistant that helps reframe user questions into clear, concept-driven statements that match
the style and topics of Principles by Ray Dalio, and then look up the Principles book for relevant content.

Background:
Principles teaches structured thinking about life and work decisions.
The key ideas are:
* Radical truth and radical transparency
* Decision-making frameworks
* Embracing mistakes as learning

Task:
- Task 1: Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.
- Task 2: Rewrite the user’s question into a statement that matches how Ray Dalio frames ideas in Principles. Use a formal, logical, neutral tone.
- Task 3: Look up the Principles book with the rewritten statements. You should provide at least {REWRITE_FACTOR} rewritten versions.
- Task 4: Find the most relevant content from the book as your final answer.

Finally, we can build the agent with those functions defined.

from typing import List

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool

def get_principle_rag_agent():
    index = load_index()
    query_engine = _create_query_engine_from_index(index)

    def look_up_principle_book(original_question: str, rewrote_statement: List[str]) -> List[str]:
        result = []
        for q in rewrote_statement:
            response = query_engine.query(q)
            content = [n.get_content() for n in response.source_nodes]
            result.extend(content)
        return result

    def clarify_question(original_question: str, your_questions_to_user: List[str]) -> str:
        """
        Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.
        """
        response = ""
        for q in your_questions_to_user:
            print(f"Question: {q}")
            r = input("Response:")
            response += f"Question: {q}\nResponse: {r}\n"
        return response

    tools = [
        FunctionTool.from_defaults(
            fn=look_up_principle_book,
            name="look_up_principle_book",
            description="Look up principle book with re-wrote queries. Getting the suggestions from the Principle book by Ray Dalio"),
        FunctionTool.from_defaults(
            fn=clarify_question,
            name="clarify_question",
            description="Clarify the user's question if needed. Ask follow-up questions to ensure you understand the user's intent.",
        )
    ]

    agent = FunctionAgent(
        name="principle_reference_loader",
        description="You are a helpful agent will based on user's question and look up the most relevant content in principle book.\n",
        system_prompt=QUESTION_REWRITE_PROMPT,
        tools=tools,
    )
    return agent

rag_agent = get_principle_rag_agent()
response = await rag_agent.run(chat_history=chat_history)
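
Here, chat_history is the conversation collected so far. A minimal sketch of driving the agent end to end, assuming you assemble the history from ChatMessage objects (the wrapper function and the sample question are illustrative, not from the repo):

import asyncio

from llama_index.core.llms import ChatMessage

async def ask_for_advice():
    chat_history = [
        ChatMessage(role="user", content="How should I handle stress at work?"),
    ]
    rag_agent = get_principle_rag_agent()
    # The agent may call clarify_question and look_up_principle_book before answering.
    response = await rag_agent.run(chat_history=chat_history)
    print(str(response))

asyncio.run(ask_for_advice())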

There are a few observations I had during the implementations:

  • One interesting fact I found is that providing an unused parameter, original_question, in the function signature helps. When I did not have such a parameter, the LLM sometimes did not follow the rewrite instruction and passed the original question in the rewrote_statement parameter. Having the original_question parameter somehow emphasizes the rewriting mission to the LLM.
  • Different LLMs behave quite differently given the same prompt. I found DeepSeek V3 much more reluctant to trigger function calls than other model providers. This doesn’t necessarily mean it is not usable: if a function call should be initiated 90% of the time, it should be part of the workflow instead of being registered as a function call. Also, compared to OpenAI’s models, I found Gemini good at citing the source from the book when it synthesizes the results.
  • The more content you load into the context window, the more inference capability the model needs. A smaller model with less inference power is more likely to get lost in the large context provided.

However, to complete the seek-advice function, you’ll need multiple Agents working together instead of a single Agent. Let’s talk about how to chain your Agents together into workflows.

Final Form: Agent Workflow

Before we start, I recommend the article Building Effective Agents by Anthropic. Its one-line summary is that you should always prioritise building a workflow over a dynamic agent when possible. In LlamaIndex, you can do both: it allows you to create an agent workflow with more automatic routing, or a customised workflow with more explicit control over the transition of steps. I will provide an example of both implementations.

Workflow explained. Image by Author.

Let’s take a look at how you can build a dynamic workflow. Here is a code example.

from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent

interviewer = FunctionAgent(
    name="interviewer",
    description="Useful agent to clarify the user's questions",
    system_prompt=_interviewer_prompt,
    can_handoff_to=["retriever"],
    tools=tools,
)
retriever = FunctionAgent(
    name="retriever",
    description="Useful agent to retrieve the principle book's content.",
    system_prompt=_retriever_prompt,
    can_handoff_to=["advisor"],
    tools=tools,
)
advisor = FunctionAgent(
    name="advisor",
    description="Useful agent to advise the user.",
    system_prompt=_advisor_prompt,
    can_handoff_to=[],
    tools=tools,
)
workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)
response = await workflow.run(user_msg="How to handle stress?")

It is dynamic because the agent transitions are based on the LLM’s function calls. Under the hood, the LlamaIndex workflow exposes the agent descriptions as functions to the LLM. When the LLM triggers such an “agent function call”, LlamaIndex routes to the corresponding agent for the subsequent step. The previous agent’s output is added to the workflow’s internal state, and the following agent picks up that state as part of the context in its call to the LLM. You can also leverage the state and memory components to manage the workflow’s internal state or load external data (see the documentation here).
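
As a rough sketch of the shared-state idea, the snippet below follows the initial_state pattern from the LlamaIndex AgentWorkflow documentation at the time of writing; the tool name and state keys are illustrative, and the exact API may differ between versions:

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.workflow import Context

async def record_user_profile(ctx: Context, profile: str) -> str:
    # A tool that accepts a Context can read and update the workflow's shared state.
    state = await ctx.get("state")
    state["user_profile"] = profile
    await ctx.set("state", state)
    return "Profile recorded."

workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
    initial_state={"user_profile": ""},  # shared state visible to every agent
)

Such a tool would be registered in one of the agents’ tools lists, just like the function tools shown earlier.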

However, as I suggested, you can explicitly define the steps in your workflow to gain more control. With LlamaIndex, this can be done by extending the Workflow object. For example:

from typing import List

from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class ReferenceRetrivalEvent(Event):
    question: str

class Advice(Event):
    principles: List[str]
    profile: dict
    question: str
    book_content: str

class AdviceWorkFlow(Workflow):
    def __init__(self, verbose: bool = False, session_id: str = None):
        state = get_workflow_state(session_id)
        self.principles = state.load_principle_from_cases()
        self.profile = state.load_profile()
        self.verbose = verbose
        super().__init__(timeout=None, verbose=verbose)

    @step
    async def interview(self, ctx: Context,
                        ev: StartEvent) -> ReferenceRetrivalEvent:
        # Step 1: Interviewer agent asks questions to the user
        interviewer = get_interviewer_agent()
        question = await _run_agent(interviewer, question=ev.user_msg, verbose=self.verbose)

        return ReferenceRetrivalEvent(question=question)

    @step
    async def retrieve(self, ctx: Context, ev: ReferenceRetrivalEvent) -> Advice:
        # Step 2: RAG agent retrieves relevant content from the book
        rag_agent = get_principle_rag_agent()
        book_content = await _run_agent(rag_agent, question=ev.question, verbose=self.verbose)
        return Advice(principles=self.principles, profile=self.profile,
                      question=ev.question, book_content=book_content)

    @step
    async def advice(self, ctx: Context, ev: Advice) -> StopEvent:
        # Step 3: Adviser agent provides advice based on the user's profile, principles, and book content
        advisor = get_adviser_agent(ev.profile, ev.principles, ev.book_content)
        advise = await _run_agent(advisor, question=ev.question, verbose=self.verbose)
        return StopEvent(result=advise)
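
get_workflow_state, get_interviewer_agent, get_adviser_agent, and _run_agent are helpers from the repo that are not shown here. As a minimal sketch of what _run_agent might look like (an assumption, not the repo’s exact implementation), it runs a single agent on one question and returns the response as text:

async def _run_agent(agent, question: str, verbose: bool = False) -> str:
    # Run the agent on a single question and return the response as plain text.
    response = await agent.run(user_msg=question)
    return str(response)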

The return type of each step controls the workflow’s step transitions. For instance, the retrieve step returns an Advice event, which triggers the execution of the advice step. You can also leverage the Advice event to pass along the information the next step needs.

During implementation, if you are annoyed by having to restart the whole workflow to debug a step in the middle, the context object is essential for failing over the workflow execution. You can store your state in a serialised format and recover your workflow by deserialising it back into a context object; the workflow will then continue executing from that state instead of starting over.

workflow = AgentWorkflow(
    agents=[interviewer, advisor, retriever],
    root_agent="interviewer",
)
try:
    handler = workflow.run(user_msg="How to handle stress?")
    result = await handler
except Exception as e:
    print(f"Error during initial run: {e}")
    await fail_over()
    # Optional: serialise and save the context for debugging
    ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())
    json_dump_and_save(ctx_dict)
    # Resume from the saved context
    ctx_dict = load_failed_dict()
    restored_ctx = Context.from_dict(workflow, ctx_dict, serializer=JsonSerializer())
    handler = workflow.run(ctx=restored_ctx)
    result = await handler

Summary

In this post, we have discussed how to use LlamaIndex to implement an AI journal’s core function. The key learning includes:

  • Using Agentic RAG to leverage the LLM’s capability to dynamically rewrite the original query and synthesize the result.
  • Using a customised workflow to gain more explicit control over step transitions, and building dynamic agents when necessary.

The source code of this AI journal is in my GitHub repo here. I hope you enjoy this article and this small app I built. Cheers!
