Implementing "Modular RAG" with Haystack and Hypster: A Tech Tutorial
Introduction to Modular RAG
Navigating the rapidly evolving landscape of Artificial Intelligence, particularly in the domain of Retrieval Augmented Generation (RAG), can be a daunting task. The proliferation of diverse solutions and implementation strategies often leaves professionals feeling overwhelmed and struggling to keep pace with new acronyms and methodologies. This article aims to demystify the process by introducing a structured approach to building RAG systems: "Modular RAG." This framework allows for the decomposition of RAG systems into distinct, interchangeable components, akin to building with LEGOs. By adopting this modularity, developers can achieve greater flexibility, clarity, and ease in designing, understanding, and navigating the complexities of RAG system development.
The Modular RAG Framework
The core concept of Modular RAG, as presented in recent research, involves breaking down a RAG system into six fundamental components:
- Indexing: The process of organizing data to facilitate efficient searching.
- Pre-Retrieval: Steps taken to process the user's query before the retrieval phase.
- Retrieval: Identifying and fetching the most relevant information from the data store.
- Post-Retrieval: Refining the information that has been retrieved.
- Generation: Utilizing a Large Language Model (LLM) to formulate a response based on the retrieved and processed information.
- Orchestration: Managing the overall flow and coordination of all components within the RAG system.
The key insight is that a wide array of existing RAG solutions can be conceptualized and implemented using these modular building blocks. This approach provides a unified framework for understanding, designing, and managing RAG systems with enhanced flexibility and clarity. The paper demonstrates this by re-expressing various RAG solutions using these common components, illustrating how different methods can be seen as different combinations or implementations of these core modules.
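To make the decomposition concrete, here is a toy sketch (not the paper's formalism, and far simpler than a real system) in which each of the components is an interchangeable callable and orchestration is simply how they compose:

```python
# Each stage is a swappable function; "orchestration" is the composition.
def build_rag(index, pre_retrieval, retrieve, post_retrieval, generate):
    def answer(query, corpus):
        store = index(corpus)         # Indexing
        q = pre_retrieval(query)      # Pre-Retrieval
        docs = retrieve(q, store)     # Retrieval
        docs = post_retrieval(docs)   # Post-Retrieval
        return generate(q, docs)      # Generation
    return answer

# Swapping any single stage yields a different RAG variant
# without touching the rest of the system.
rag = build_rag(
    index=lambda corpus: corpus,
    pre_retrieval=str.lower,
    retrieve=lambda q, store: [d for d in store if q in d.lower()],
    post_retrieval=lambda docs: docs[:1],
    generate=lambda q, docs: docs[0] if docs else "no answer",
)
print(rag("LEGO", ["RAG is like LEGO bricks", "unrelated text"]))
```

Replacing, say, the `retrieve` stage with a dense-embedding search while leaving the other four untouched is exactly the kind of swap the modular framing is meant to enable.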
Implementing Modular RAG with Haystack and Hypster
To practically implement this Modular RAG framework, we will leverage two powerful Python libraries: Haystack and Hypster. Haystack serves as the primary library for constructing the core components and pipelines of our RAG system, offering a rich set of pre-built components for various stages of the RAG process. Hypster, on the other hand, is a configuration management system designed to handle complex, hierarchical, and swappable configurations, making it ideal for managing the diverse options within a modular framework.
Haystack: The Component Library
Haystack is an open-source framework designed for building production-ready LLM applications, retrieval-augmented generative pipelines, and sophisticated search systems. Its strengths lie in its well-designed components, flexible pipeline architecture that allows for dynamic configurations, and extensive documentation. While its pipeline interface can be somewhat verbose, and using components outside of a pipeline might be less ergonomic, Haystack provides a robust and customizable foundation for LLM application development.
Hypster: Managing Configuration Spaces
Hypster is a lightweight, Pythonic configuration system developed to manage AI and Machine Learning project configurations. It supports hierarchical and swappable configurations, enabling the definition of a "superposition of workflows" or a "hyper-workflow." This allows users to define a range of possible configurations and easily switch between them for experimentation and optimization. While currently under active development and not yet recommended for production, Hypster offers a powerful way to manage the complexity of modular systems.
Codebase Structure and Implementation
This tutorial assumes a familiarity with the fundamental concepts of RAG. We will break down the implementation into key areas, demonstrating how Haystack and Hypster work together.
LLM Configuration Space
We begin by defining the configuration space for our Large Language Models (LLMs). Using Hypster's `@config` decorator, we create a function that encapsulates various LLM options, including providers like Anthropic and OpenAI, and specific models within each provider. Conditional logic within this configuration allows us to instantiate the appropriate Haystack component (e.g., `OpenAIGenerator` or `AnthropicGenerator`) based on the selected model. This enables dynamic switching between LLMs without altering the core code structure.
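A rough sketch of such a configuration might look as follows. The exact Hypster calls, model names, defaults, and the `values=` override syntax shown here are illustrative assumptions, not verbatim from the project:

```python
# Sketch of an LLM configuration space, assuming Hypster's @config/hp.select
# API and Haystack 2.x generator components.
from hypster import config, HP

@config
def llm_config(hp: HP):
    model_name = hp.select(
        ["gpt-4o-mini", "claude-3-haiku-20240307"],  # illustrative options
        default="gpt-4o-mini",
    )
    # Conditional logic picks the matching Haystack generator component.
    if model_name.startswith("claude"):
        from haystack_integrations.components.generators.anthropic import AnthropicGenerator
        llm = AnthropicGenerator(model=model_name)
    else:
        from haystack.components.generators import OpenAIGenerator
        llm = OpenAIGenerator(model=model_name)

# Selecting a model at instantiation time, without touching the code above:
# result = llm_config(values={"model_name": "claude-3-haiku-20240307"})
# llm = result["llm"]
```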
Indexing Pipeline Configuration
Next, we configure the indexing pipeline, which handles the processing of input documents, such as PDFs. This pipeline can optionally include an LLM-based enrichment step: the document's initial content is summarized and keywords are extracted, and both are then inherited by the document's chunks. This is achieved by conditionally adding components like `PromptBuilder`, an LLM component, and a custom `AddLLMMetadata` component to the Haystack pipeline. Hypster's ability to nest configurations (e.g., `hp.nest("configs/llm.py")`) allows us to integrate the LLM configuration seamlessly into the indexing pipeline. We can also define various splitting strategies (by sentence, word, passage, or page) with configurable lengths and overlaps.
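A condensed sketch of how such a pipeline might be assembled is shown below. Component names follow Haystack 2.x; the enrichment branch is abbreviated, and the exact wiring in the project may differ:

```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

enrich_doc = True  # this flag would come from the Hypster configuration

pipeline = Pipeline()
pipeline.add_component("converter", PyPDFToDocument())
pipeline.add_component(
    "splitter",
    DocumentSplitter(split_by="sentence", split_length=10, split_overlap=2),
)
pipeline.add_component("writer", DocumentWriter(document_store=InMemoryDocumentStore()))

if enrich_doc:
    # The enrichment branch: a PromptBuilder feeding an LLM, whose summary and
    # keywords a custom AddLLMMetadata component copies onto every chunk.
    # (The LLM and AddLLMMetadata components and their connections are omitted.)
    from haystack.components.builders import PromptBuilder
    pipeline.add_component(
        "enrich_prompt",
        PromptBuilder(template="Summarize and extract keywords:\n{{ documents[0].content }}"),
    )

pipeline.connect("converter", "splitter")
pipeline.connect("splitter", "writer")
```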
Retrieval Pipeline Configuration
For the retrieval stage, we build a flexible configuration space that supports multiple retrieval types: BM25 (keyword-based) and embeddings (semantic search). Haystack's in-memory document store is used for rapid experimentation. The configuration allows for selecting either retriever, or both, and combining their results using different join modes such as `distribution_based_rank_fusion` or `reciprocal_rank_fusion`. We also configure similarity functions for embedding retrievers and algorithms for BM25 retrievers. Helper components like `PassThroughText` and `PassThroughDocuments` ensure a consistent pipeline structure regardless of the selected retrieval types.
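In Haystack, these join modes are handled by the `DocumentJoiner` component. To make the fusion logic concrete, here is a small self-contained sketch of the reciprocal rank fusion scoring scheme itself (an illustration of the idea, not Haystack's implementation):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists that contain
    it; k=60 is the constant commonly used for this scheme.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_a", "doc_b", "doc_c"]   # keyword retriever ranking
dense_results = ["doc_b", "doc_d", "doc_a"]  # embedding retriever ranking
print(reciprocal_rank_fusion([bm25_results, dense_results]))
# doc_b ranks first: it appears near the top of both lists.
```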
Integrating Components and Building the Full RAG Configuration
The main configuration, `rag_config`, binds all these modular pipelines together. It allows for nested configurations, enabling us to specify settings for indexing, retrieval, embedding, and generation components through a unified interface. For instance, we can select different embedders (e.g., `fastembed` or `jina`) and document stores (in-memory or Qdrant). The configuration also allows for optional reranking steps and defines the final response generation pipeline. This hierarchical configuration system, powered by Hypster, provides a powerful way to manage a vast space of potential RAG configurations.
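Conceptually, the top-level configuration nests the child configuration files along these lines. Apart from `configs/llm.py`, which the text above mentions, the file paths and the dotted-override syntax are assumptions:

```python
from hypster import config, HP

@config
def rag_config(hp: HP):
    # Each hp.nest call pulls a child configuration space into this one.
    llm = hp.nest("configs/llm.py")
    indexing = hp.nest("configs/indexing.py")    # path illustrative
    retrieval = hp.nest("configs/retrieval.py")  # path illustrative

# One point in the combined configuration space, selected at
# instantiation time via dotted parameter names:
# results = rag_config(values={"llm.model_name": "gpt-4o-mini",
#                              "retrieval.retriever_type": "bm25"})
```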
Executing the Modular RAG System
Once the configurations are defined, we can instantiate and execute the pipelines. This involves warming up the indexing pipeline, running it on a set of documents, and then executing the retrieval and generation pipeline with a user query. The output demonstrates how the system effectively retrieves relevant information and generates a coherent response. The example showcases the flexibility by configuring specific models, document stores, and retrieval strategies.
Benefits of a Modular RAG Approach
The modular approach to RAG, facilitated by tools like Haystack and Hypster, offers significant advantages:
- Hyperparameter Optimization: The configurable nature makes it straightforward to experiment with different hyperparameters for various components.
- Scenario-Specific Configurations: Different types of queries or use cases can be handled by distinct RAG configurations within the same codebase. For example, one configuration might prioritize BM25 retrieval for specific queries, while another focuses on dense embeddings.
- Agentic Tool Use: This modular system can be easily wrapped as a tool that an AI agent can instantiate and utilize, opening up possibilities for more sophisticated agentic behaviors.
- A/B Testing in Production: The ability to dynamically select configurations allows for A/B testing of different RAG setups directly in a production environment by specifying configurations for individual API requests.
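As a toy illustration of the A/B-testing point, per-request configuration selection can be as simple as deterministic user bucketing. The bucketing scheme and configuration names here are not from the project:

```python
import hashlib

# Two hypothetical RAG configurations to compare in production.
CONFIGS = {
    "A": {"retriever": "bm25"},
    "B": {"retriever": "embeddings", "reranker": True},
}

def pick_variant(user_id: str, split: float = 0.5) -> str:
    """Deterministically assign a user to variant A or B."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1_000 / 1_000  # roughly uniform in [0, 1)
    return "A" if bucket < split else "B"

# Each API request then instantiates its RAG pipeline from CONFIGS[variant],
# so both setups run side by side against live traffic.
variant = pick_variant("user-123")
print(variant, CONFIGS[variant])
```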
Conclusion
Building a fully configurable, modular RAG system represents a significant advancement in developing adaptable and efficient AI applications. By breaking down the RAG process into interchangeable components and managing configurations with tools like Hypster, developers can create a "superposition of workflows" or a "hyper-workflow." This approach not only simplifies the management of complex RAG systems but also unlocks considerable benefits for optimization, customization, and deployment in diverse scenarios. This tutorial provides a foundational understanding and a practical example of how to implement such a system, encouraging further exploration and adaptation for specific use cases.
Resources
For further exploration and implementation details, please refer to the project's GitHub repository and the official documentation for Haystack and Hypster.