RAG + evaluation
v1.0
Flagship project

Reasoning Ladder

Retrieval, SQL, benchmarks, and recursive reasoning patterns turned into repeatable engineering work instead of one-off prompt tricks.

Quality comes from structure, measurement, and feedback loops. The prompt is only one layer in that stack.

Evidence first | Repeatable tests | Explainable outputs
What this proves
Use retrieval to ground the answer in the right evidence.
Use SQL to make data questions deterministic where possible.
Use evaluation to compare outputs over time instead of trusting memory.
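As an illustration of the last two points, here is a minimal sketch of an evaluation log kept in SQLite, so that "did the new version get worse?" becomes a deterministic query rather than a recollection. It is not the project's actual harness: the table name `eval_runs`, the run labels, and the helper functions `open_log`, `record`, `regressions`, and `pass_rate` are all hypothetical names chosen for this example.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a hypothetical evaluation log with one row per run and test case."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS eval_runs ("
        " run_id TEXT, case_id TEXT, passed INTEGER,"
        " PRIMARY KEY (run_id, case_id))"
    )
    return conn


def record(conn, run_id, results):
    """Store pass/fail results for one run; `results` maps case_id -> bool."""
    conn.executemany(
        "INSERT OR REPLACE INTO eval_runs VALUES (?, ?, ?)",
        [(run_id, case_id, int(ok)) for case_id, ok in results.items()],
    )
    conn.commit()


def regressions(conn, baseline, candidate):
    """Return case ids that passed in `baseline` but fail in `candidate`."""
    rows = conn.execute(
        "SELECT b.case_id FROM eval_runs b"
        " JOIN eval_runs c ON c.case_id = b.case_id"
        " WHERE b.run_id = ? AND c.run_id = ?"
        " AND b.passed = 1 AND c.passed = 0"
        " ORDER BY b.case_id",
        (baseline, candidate),
    ).fetchall()
    return [row[0] for row in rows]


def pass_rate(conn, run_id):
    """Fraction of cases that passed in a run, or None if the run has no rows."""
    row = conn.execute(
        "SELECT AVG(passed) FROM eval_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0]
```

Recording each run under its own label and calling `regressions(conn, "old", "new")` lists exactly the cases that went from pass to fail. The same question asked twice gives the same answer, which is the property the SQL layer is meant to provide.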
Open-source stack
SQL | RAG | PyTorch | scikit-learn | promptfoo | Benchmark harness
Experience mode
Step 1: Question
Step 2: Retriever
Step 3: Context
Step 4: Answer
Context pull
The right answer starts with the right evidence.
Live pattern
Engineering lens
  • The system should pull evidence before it reasons.
  • Good retrieval keeps hallucinations from becoming policy.
  • The evidence set should be inspectable.
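The points above, together with the question, retriever, context, answer sequence, can be sketched in a few lines. This is an assumption-laden illustration, not the project's retriever: scoring is plain token overlap standing in for a real embedding or BM25 retriever, the `Evidence` class and the functions `retrieve`, `build_context`, and `answer` are hypothetical names, and the "answer" step simply returns the top passage where a model call would normally go.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One retrieved passage, kept alongside its score so it can be inspected."""
    doc_id: str
    text: str
    score: float


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus, k=2):
    """Rank documents by token overlap with the question (a stand-in scorer)."""
    q_tokens = tokenize(question)
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(q_tokens & tokenize(text))
        if overlap:
            scored.append(Evidence(doc_id, text, overlap / len(q_tokens)))
    # Highest score first; ties broken by doc_id so results are repeatable.
    scored.sort(key=lambda e: (-e.score, e.doc_id))
    return scored[:k]


def build_context(evidence):
    """Format the evidence set as labelled lines that a reader can audit."""
    return "\n".join(f"[{e.doc_id}] {e.text}" for e in evidence)


def answer(question, corpus, k=2):
    """Pull evidence first, then answer; decline when nothing was retrieved."""
    evidence = retrieve(question, corpus, k)
    if not evidence:
        return {"answer": None, "evidence": [], "context": ""}
    context = build_context(evidence)
    # Placeholder for the reasoning step: a real system would pass
    # `context` and `question` to a model here.
    return {"answer": evidence[0].text, "evidence": evidence, "context": context}
```

Returning the evidence list and the formatted context next to the answer is the design choice that matters here: a reviewer can see which passages supported an output, and an empty evidence set produces no answer rather than an ungrounded one.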
Platform fit
This project belongs in Llewellyn Systems because it turns a repeated engineering pattern into a governed operating asset. The page is not a slide deck. It is a proof surface for how the system is built and how it behaves.
Toolchain note
Use Jupyter Book for publication, MyST for source text, Voilà for notebook apps, Binder for reproducible environments, and JupyterLab or Colab for interactive editing. The page itself is the front door to that workflow.
Reasoning Ladder v1.0 | Llewellyn Christian