tech · May 9, 2026

How We Placed 2nd Out of 350+ Teams at the Agentic Legal RAG Challenge

What our 2nd place at an international Legal RAG hackathon means for clients, and the takeaways we'd apply to any production RAG project

Gleb Drozdov
[Image: Legal RAG pipeline architecture — Gless AI]

Two weeks ago our team Gless AI placed 2nd out of 350+ teams at the international Agentic Legal RAG Challenge 2026 and took home $8,000. We were supposed to fly to Dubai for the Machines Can See conference at Dubai AI Week to present in person, but couldn't make it due to the situation — so here are the takeaways instead.

What the competition was

The task sounds simple: answer 900 legal questions over a corpus of 300 PDFs — court decisions, laws, regulations from the DIFC (Dubai International Financial Centre) — with exact source pages cited as grounding.

In practice, the scoring formula was the harshest we've seen in any RAG benchmark:

  • Scoring was multiplicative — weak grounding (the pages you cite as sources) crushed the entire result no matter how good the answers were.
  • Some questions were traps: they referenced cases or laws that don't exist in the corpus, and the correct answer was "nothing found." A confident hallucination cost multiple scoring components at once.
  • Latency mattered: slow pipelines were penalized by a separate multiplier.

In other words, the benchmark was specifically built to reward production-grade systems. A "beautifully over-engineered slow pipeline" and a "fast but ungrounded bot" both lost.
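To make the multiplicative effect concrete, here is a toy sketch. The component names and numbers are ours for illustration, not the challenge's actual formula or weights:

```python
def combined_score(answer_quality: float, grounding: float, speed: float) -> float:
    """Each component in [0, 1]; a single weak factor drags the whole product down."""
    return answer_quality * grounding * speed

print(combined_score(0.95, 0.40, 0.90))  # great answers, weak citations -> 0.342
print(combined_score(0.80, 0.85, 0.90))  # solid everywhere              -> 0.612
```

The second pipeline answers worse on average but wins, because no single factor in the product collapses.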

Final results

The top-5 teams finished within six points of each other:

  1. RAGnarok — 77.9
  2. Gless AI — 76.7
  3. CPBD — 76.0
  4. Cohomology — 72.0
  5. Dmitry Ulybin — 71.9

Full leaderboard at agentic-challenge.ai/leaderboard.

Why this matters for business

Most RAG systems we see in production are optimized for a single metric — "does it answer or not." That's not enough. In legal, medical, or financial work, every citation has to be verifiable, and a confident wrong answer is worse than "I didn't find it."

That's exactly what this competition tested: neither a slow over-engineered pipeline nor a fast but ungrounded one could win. So 2nd place out of 350+ teams matters to us more than the prize money — it's concrete proof that we can build RAG that holds up under production load and in front of clients who actually check the citations.

Takeaways

A few short lessons that apply to any RAG project, not just legal:

  • Grounding isn't a feature — it's the core of the system. If a user can't click through to the source page, RAG in serious domains is useless.
  • A simple pipeline with the right details beats "smart" agents. In the final phase, our 600 lines of Python beat a SOTA agent that ran for 1.5 hours on the same task.
  • Measure answer quality, grounding, and latency together (see the sketch below). Optimizing for just one of them is misleading — a pipeline that answers beautifully but is slow or ungrounded won't survive in production.
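A minimal sketch of what "measure them together" can look like in an eval harness. Here `judge_answer` and `pages_match` stand in for whatever quality and grounding checks your project uses, and the pipeline is assumed to return an answer plus the pages it cites:

```python
import time

def evaluate(pipeline, eval_set, judge_answer, pages_match):
    """Score every eval question on answer quality, grounding, and latency in one pass."""
    rows = []
    for item in eval_set:
        start = time.perf_counter()
        result = pipeline(item["question"])     # expected to return .answer and .source_pages
        latency = time.perf_counter() - start
        rows.append({
            "quality": judge_answer(result.answer, item["reference_answer"]),
            "grounding": pages_match(result.source_pages, item["gold_pages"]),
            "latency_s": latency,
        })
    # Average all three side by side; a great number on one axis can hide failures on another.
    n = len(rows)
    return {key: sum(row[key] for row in rows) / n for key in ("quality", "grounding", "latency_s")}
```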

Technical details

If you want the full technical breakdown — parsing, embeddings, retrieval, reranking, structured output, what we tried and dropped — we wrote a detailed technical write-up on LinkedIn.

If you're building a RAG system where the cost of a wrong answer is your reputation or regulatory exposure, get in touch — we'll help you design a pipeline where citations actually work and users trust the answers.

Want to implement an AI agent?

Tell us about your task and we'll suggest the optimal solution
