The system is a modular retrieval framework that manages the retrieval part of a RAG (retrieval-augmented generation) pipeline.
It makes it easy to ingest different kinds of data, experiment with different retrieval systems, evaluate each of them, and produce accuracy measurements showing which one performs best.
The retrieval system takes a plain-text user query and searches for relevant data using a combination of strict filtering and similarity search against a vector database populated with embeddings of the data.
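A minimal sketch of this retrieval step, assuming an in-memory index and a hypothetical `embed_query` function standing in for the real embedding model and vector database:

```python
# Strict metadata filtering followed by cosine-similarity search over embeddings.
from dataclasses import dataclass

import numpy as np


@dataclass
class IndexedItem:
    item_id: str
    metadata: dict          # e.g. {"source": "catalog", "lang": "en"}
    embedding: np.ndarray   # embedding of one derived output


def retrieve(query, filters, index, embed_query, top_k=5):
    """Return ids of the top_k items that pass the filters, ranked by similarity."""
    # Strict filtering: keep only items whose metadata matches every filter exactly.
    candidates = [it for it in index
                  if all(it.metadata.get(k) == v for k, v in filters.items())]
    if not candidates:
        return []

    # Similarity search: cosine similarity between the query and candidate embeddings.
    q = np.asarray(embed_query(query), dtype=float)
    q = q / np.linalg.norm(q)
    mat = np.stack([it.embedding / np.linalg.norm(it.embedding) for it in candidates])
    scores = mat @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [candidates[i].item_id for i in best]
```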
Initially we have input data, which can be a SQL dump, a pandas DataFrame, JSON, … We then transfer this data into our own data store (such as a PostgreSQL database and/or an S3 server) for easy access.
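A sketch of this ingestion step, assuming a JSON input and a PostgreSQL target; the connection string and table name are placeholders:

```python
import json

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; swap in the real PostgreSQL credentials.
engine = create_engine("postgresql://user:password@localhost:5432/retrieval")

# JSON input: load it into a DataFrame first (a SQL dump or an existing
# DataFrame would be handled similarly).
with open("input.json") as f:
    df = pd.DataFrame(json.load(f))

# Copy the raw records into our own store so later stages can access them easily.
df.to_sql("raw_documents", engine, if_exists="append", index=False)
```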
We may need to transform the input data.
If a record is too large, we chunk it into smaller parts.
If it is unstructured, we can try to extract a proper structure from it, for example with an LLM or a classifier.
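A minimal chunking sketch, with arbitrary size and overlap values:

```python
def chunk_text(text, max_chars=1000, overlap=200):
    """Split long text into overlapping windows so each piece stays embeddable."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap keeps context across chunk boundaries
    return chunks
```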
For each input record (or chunk), we then derive one or more outputs.
These outputs can be the raw data itself, but also a transformed form of it, such as a rephrasing, a plain-text description generated from a structured object, possible queries a user could ask to find it, …
This can be done in different ways, either programmatically or via an LLM or a classifier.
Since retrieval relies on similarity search (vector distance between embeddings), embedding the data in a form close to what a user query might look like makes relevant data easier to find.
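A sketch of this derivation step, where the raw text is kept as-is, a plain-text description is built programmatically from structured fields, and `generate_candidate_queries` stands in for a hypothetical LLM call:

```python
def derive_outputs(record, generate_candidate_queries=None):
    """Return several text views of one record, each of which will be embedded."""
    outputs = [record["text"]]  # the raw data itself

    # Plain-text description built programmatically from structured fields.
    if record.get("attributes"):
        description = ", ".join(f"{k}: {v}" for k, v in record["attributes"].items())
        outputs.append(f"{record.get('title', 'item')}: {description}")

    # Hypothetical LLM-backed step: queries a user could ask to find this record.
    if generate_candidate_queries is not None:
        outputs.extend(generate_candidate_queries(record["text"]))

    return outputs
```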
For each input record (or chunk), we create an embedding for each of its derived outputs.
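A sketch of the embedding step, assuming the sentence-transformers library is available; the model name is an arbitrary choice, not a requirement of the framework:

```python
from sentence_transformers import SentenceTransformer

# Arbitrary model choice for the sketch; any embedding model would do.
model = SentenceTransformer("all-MiniLM-L6-v2")


def embed_derived_outputs(derived_outputs):
    """One embedding per derived output; all of them point back to the same record."""
    return model.encode(derived_outputs, normalize_embeddings=True)
```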