Harnessing the Power of JSON-based Agents: Integrating Ollama, LangChain, and Neo4j
Introduction: The Evolving Landscape of LLM Agents
The integration of Large Language Models (LLMs) with external tools has become a cornerstone in advancing their capabilities beyond mere text generation. This synergy allows LLMs to access dynamic information, maintain user context through memory, and understand intricate relationships within knowledge graphs. While platforms like OpenAI offer sophisticated fine-tuned models for tool usage, the open-source community is rapidly developing powerful alternatives. This tutorial delves into building a JSON-based agent that bridges the gap between open-source LLMs, specifically Mixtral via Ollama, and the robust graph database capabilities of Neo4j, orchestrated by LangChain.
Enhancing LLMs with Tools
The concept of augmenting LLMs with tools is revolutionizing how we interact with artificial intelligence. Imagine an LLM that doesn't just answer questions based on its training data but can actively search the web, execute code, or query a specialized database. This is precisely the power that tools provide. For instance, ChatGPT's paid version integrates tools like Bing Search and a Python interpreter, enabling it to perform real-world actions and access up-to-date information. These tools act as extensions, granting the LLM dynamic access to information, enabling personalization through memory, and facilitating a deeper understanding of relationships, particularly within a knowledge graph structure. This leads to more accurate recommendations, a better grasp of user preferences over time, and a more adaptive user experience.
The Challenge with Open-Source LLMs
While advanced models like GPT-4 excel at function calling and tool usage, many open-source LLMs present a different challenge. Models available through platforms like Ollama, while powerful in their own right, often struggle with consistently generating the predefined structured output required to power an agent reliably. Some models are fine-tuned for function calling, but they may adhere to undocumented or highly specific prompt engineering schemas, limiting their versatility. The goal here is to enable these capable open-source models to act as effective agents by guiding them to produce structured, actionable output.
Introducing the Semantic Layer
To address the challenges of integrating open-source LLMs with external systems like Neo4j, we employ a "semantic layer." This layer comprises a set of predefined tools that the LLM can utilize. These tools are designed to abstract the complexities of interacting with the Neo4j graph database, allowing the LLM to focus on understanding user intent and selecting the appropriate tool. For example, we might have tools for recommending movies based on genre or specific movie titles, or tools to retrieve information about actors and movies. The semantic layer essentially provides a structured vocabulary and set of actions that the LLM can understand and invoke.
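As a rough sketch of what one such tool might look like under the hood, the snippet below queries a movie graph with the official Neo4j Python driver. The connection details, node labels, and Cypher query are illustrative assumptions rather than the article's actual schema; the key point is that the LLM only ever sees the tool, never the Cypher.

```python
# Hypothetical semantic-layer tool: connection details, labels, and the
# Cypher query are illustrative assumptions, not a prescribed schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def recommend_movies_by_genre(genre: str, limit: int = 5) -> list[str]:
    """Return top movie titles for a genre, hiding the Cypher from the LLM."""
    query = (
        "MATCH (m:Movie)-[:IN_GENRE]->(g:Genre {name: $genre}) "
        "RETURN m.title AS title "
        "ORDER BY m.imdbRating DESC LIMIT $limit"
    )
    with driver.session() as session:
        result = session.run(query, genre=genre, limit=limit)
        return [record["title"] for record in result]
```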
Defining Tool Inputs with Pydantic
A crucial aspect of building an effective agent is defining how the LLM should interact with the tools. This involves specifying the inputs each tool expects. We utilize Python's Pydantic library to define these inputs in a structured and type-safe manner. For instance, a recommender tool might accept optional `movie` and `genre` parameters. The `genre` parameter can be further refined using an enumeration of available genres, ensuring the LLM provides valid input. While these inputs might seem straightforward, their correct interpretation by the LLM is paramount for the agent's success. The Pydantic models serve as a clear contract for what information the tools require.
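A minimal sketch of such an input schema, assuming a small set of genres (in practice the enumeration would be populated from the database), might look like this:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field

class Genre(str, Enum):
    # Illustrative subset; real values would come from the graph database
    ACTION = "Action"
    COMEDY = "Comedy"
    DRAMA = "Drama"

class RecommenderInput(BaseModel):
    movie: Optional[str] = Field(
        None, description="Movie used as the basis for a recommendation"
    )
    genre: Optional[Genre] = Field(
        None, description="Genre used as the basis for a recommendation"
    )
```

Because both fields are optional, the LLM can supply a seed movie, a genre, or both, and the enumeration prevents it from inventing genres that don't exist in the database.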
Crafting the JSON-based Prompt
The core of enabling a JSON-based agent lies in its system prompt. This prompt must meticulously instruct the LLM on how to format its output, especially when it needs to invoke a tool. The desired output structure typically includes a "Thought" section where the LLM explains its reasoning, an "Action" section containing a JSON blob, and an "Observation" section for the tool's response. The JSON blob itself must adhere to a specific format, containing an "action" key (the name of the tool to use) and an "action_input" key (the parameters for the tool, also in JSON format). It is critical that the LLM produces only a single action at a time and avoids returning lists of actions. The prompt must also clearly define the available tools and their names. The final output structure guides the LLM to provide a "Final Answer" once it has successfully completed the task or gathered sufficient information.
Example Prompt Structure:
The prompt begins by listing the available tools. The most critical part is the instruction on the output format. When the LLM needs to call a function, it should use the following JSON structure:
```json
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```
This is why it's termed a JSON-based agent: the LLM is instructed to produce JSON when it intends to use any of the available tools. However, this is only a portion of the output definition. The complete output should follow this structure:
````
Thought: you should always think about what to do
Action:
```json
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
````
The LLM is expected to articulate its reasoning in the "Thought" section. When invoking tools, it must provide the action input as a JSON blob. The "Observation" section is reserved for the output from the tools. When the agent determines it can provide a final answer to the user, it should use the "Final Answer" key.
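To give a feel for how these pieces fit together, here is a hedged sketch of the agent wiring using LangChain's JSON agent output parser and Mixtral served by Ollama. It assumes a `system_prompt` string containing the format instructions above (with `{tools}` and `{tool_names}` placeholders) and a `tools` list defined as in the earlier sections.

```python
# Hedged sketch: `system_prompt` and `tools` are assumed to exist as
# described above; everything else uses standard LangChain components.
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.tools.render import render_text_description_and_args
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="mixtral")
# Stop at "Observation" so the model doesn't hallucinate tool results itself
llm_with_stop = llm.bind(stop=["\nObservation"])

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("user", "{input}\n\n{agent_scratchpad}")]
).partial(
    tools=render_text_description_and_args(tools),
    tool_names=", ".join(t.name for t in tools),
)

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_stop
    | JSONAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Can you recommend a comedy?"})
```

Rendering the tools with `render_text_description_and_args` injects both the tool descriptions and their Pydantic-derived argument schemas into the prompt, which is what lets the LLM fill in valid `action_input` values.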
Handling LLM Output Variability: The Smalltalk Tool Workaround
A common challenge encountered when working with open-source LLMs like Mixtral is their tendency to deviate from the strict JSON output format, especially when they don't intend to use a tool. In some experimental scenarios, when the LLM decided not to use any tools, it produced a JSON blob like `{"action": null, "action_input": ""}`. LangChain's output parsing functions, however, do not gracefully handle such null actions and raise errors, as "null" is not a defined tool. To circumvent this, a practical workaround is to introduce a "dummy" smalltalk tool. This tool can be invoked by the agent when the user's input is conversational or doesn't require a specific tool function. By providing this escape hatch, the agent can gracefully handle situations where no specific tool is needed, thereby preventing parsing errors and ensuring a smoother user experience. This approach effectively provides an "out" for the LLM when it doesn't need any of the domain-specific tools.
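A minimal sketch of such a dummy tool, assuming LangChain's `BaseTool` interface; the class names and the canned observation are illustrative:

```python
# Minimal sketch of the "dummy" smalltalk tool; names and the canned
# response are illustrative.
from typing import Optional, Type

from langchain.tools import BaseTool
from pydantic import BaseModel, Field

class SmalltalkInput(BaseModel):
    query: Optional[str] = Field(None, description="User query")

class SmalltalkTool(BaseTool):
    name: str = "Smalltalk"
    description: str = "useful when the user greets you or wants to chat"
    args_schema: Type[BaseModel] = SmalltalkInput

    def _run(self, query: Optional[str] = None) -> str:
        # A fixed observation nudges the agent straight to a Final Answer
        # instead of emitting a null action the parser cannot handle.
        return (
            "Greet the user and ask if they have any questions "
            "about movies or actors"
        )
```

The tool does no real work; its only job is to give the agent a valid action to select, after which the canned observation steers it toward producing a "Final Answer".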