Legal AI assistant

AI copilot for business law

Stack: Python, PyTorch, FastAPI, Docker, SyncFusion

Website: copilex.com

Blog post: Pseudonymization, Context Engineering, Word documents: Three Years Building an AI Legal Assistant

Co-founded with Paul Lefeuvre (lawyer and CEO) in late 2022, Copilex is an AI assistant that helps lawyers analyze and draft legal documents. As CTO and co-founder, I designed and built every technical aspect of the product, from AI engineering to security and compliance.

We developed Copilex in close partnership with design partners from top-tier law firms, then opened it to smaller firms and businesses. It was adopted by a wide range of users: solo practitioners, mid-size firms, small legal teams, and entrepreneurs. In early 2026, after three years of building, my co-founder joined Legora, the European Legal AI leader, to continue the legal AI mission there. For my part, I’m now looking for new opportunities to apply what I’ve learned building AI products in complex, high-stakes domains.

Problem

Large Language Models unlocked tremendous potential for the legal domain, but two major blockers stood in the way of adoption by law firms: confidentiality (sending sensitive documents to external providers was a non-starter) and document complexity (contracts can span hundreds of pages with intricate cross-references and formatting that carry legal significance). See the blog post for a deeper dive into these challenges.

Solution

Security, compliance, and pseudonymization

Confidentiality is paramount to the legal profession, so we invested heavily in security. The platform runs on dedicated servers located in France, with AES-256 end-to-end encryption and zero-data-retention agreements with external LLM providers. We implemented the security controls required by ISO 27001, covering access management, incident response, and data handling procedures.

On top of this infrastructure, we built Sentinel, a pseudonymization system that automatically replaces sensitive information before it is sent to any external LLM: “John Smith” becomes “[PERSON_NAME_1]”, “Acme Corp” becomes “[ORGANIZATION_NAME_1]”. The LLM’s reasoning is unaffected, and the response is de-pseudonymized before being shown to the user.
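The round-trip can be sketched as follows. This is a minimal illustration, not the actual Sentinel implementation: entity detection is assumed to have already happened (in production it is done by the custom NER model), and the naive string replacement ignores real-world concerns like overlapping or nested mentions.

```python
def pseudonymize(text: str, entities: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each detected entity with a typed, numbered placeholder.

    `entities` maps surface form -> entity type, e.g. {"John Smith": "PERSON_NAME"}.
    Returns the pseudonymized text and the reverse map needed to restore it.
    """
    reverse: dict[str, str] = {}
    counters: dict[str, int] = {}
    for surface, etype in entities.items():
        counters[etype] = counters.get(etype, 0) + 1
        placeholder = f"[{etype}_{counters[etype]}]"
        reverse[placeholder] = surface
        text = text.replace(surface, placeholder)
    return text, reverse

def depseudonymize(text: str, reverse: dict[str, str]) -> str:
    """Restore the original entities in the LLM's response."""
    for placeholder, surface in reverse.items():
        text = text.replace(placeholder, surface)
    return text
```

The key property is that the reverse map never leaves the platform: only the placeholder text is sent to the external LLM.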

Standard PII detection models did not fit the legal domain (e.g., court names should not be anonymized since they determine jurisdiction, while private corporations almost always should be). We built a custom entity ontology and trained a NER model on it, using a data augmentation pipeline: small hand-annotated “gold” dataset, LLM-generated “silver” annotations at scale, then distillation into a smaller model able to run locally in real time.

The platform is also LLM-agnostic: we continuously benchmarked different models to select the best ones for each sub-task, while letting users choose a specific model if they prefer.

Two analysis modes proved particularly useful: sanity check (flag issues in a single contract) and comparative analysis (compare a contract against a reference document to identify discrepancies).

For comparative analysis, I built a multi-stage pipeline: structure extraction, section matching (vector search + BM25), context augmentation (pulling in referenced articles and definitions), LLM-based comparison, and post-processing. Interestingly, the LLM step that analyzed each specific section was not the hardest part, more like a cherry on top of a complex multi-layered cake. This illustrates why context engineering is key to getting good results out of LLMs.

Working with Word documents

Lawyers work in Word, and every detail matters. Standard libraries could not extract everything faithfully (e.g., Pandoc would miscalculate section numbering, producing false positives in cross-reference checks). I forked python-docx to handle edge cases and built a mapping from XML nodes to text positions, allowing format-preserving modifications when the LLM suggests changes.
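The core idea of the node-to-position mapping can be sketched with a toy XML paragraph (real OOXML uses namespaced `w:r`/`w:t` run elements; the markup and offsets here are illustrative only):

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a DOCX paragraph: each <r> run carries its own formatting.
XML = ("<p><r id='r1'><t>The Seller shall </t></r>"
      "<r id='r2'><t>indemnify</t></r>"
      "<r id='r3'><t> the Buyer.</t></r></p>")

def build_offset_map(paragraph: ET.Element) -> list[tuple[int, int, ET.Element]]:
    """Map each run to its (start, end) span in the paragraph's plain text,
    so an edit expressed as text offsets can be routed back to the XML run
    (and hence the formatting) that contains it."""
    spans, pos = [], 0
    for run in paragraph.iter("r"):
        text = run.find("t").text or ""
        spans.append((pos, pos + len(text), run))
        pos += len(text)
    return spans

def run_at(spans: list[tuple[int, int, ET.Element]], offset: int) -> ET.Element:
    """Return the run whose span contains the given character offset."""
    for start, end, run in spans:
        if start <= offset < end:
            return run
    raise IndexError(offset)
```

With a map like this, an LLM suggestion such as "replace the word at offset 17" can be applied to exactly one run, leaving the rest of the paragraph's formatting untouched.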

I also built an in-app document editor using SyncFusion with version control, enabling iterative collaboration between the lawyer and the AI while keeping a full audit trail.

Challenges

Building on shifting ground. Developing from late 2022 / early 2023 onward meant the LLM landscape changed dramatically under our feet. Tasks that required elaborate prompting and retries started to work without special handling as models improved. Some custom models we trained became obsolete when newer LLMs could handle the same tasks via few-shot prompting with better accuracy.

Document structure extraction was a prime example of this. We invested significant effort training custom models to extract document structure and hierarchy. Performance was acceptable but never fully satisfactory, and crucially, any error at this step cascaded through the entire analysis pipeline. With later versions of GPT-4o, few-shot prompting outperformed our custom models, so we shifted approaches. The training data and annotation effort were not wasted (reused for evaluation and prompt design), but the models themselves became obsolete.
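The replacement approach boils down to assembling a few-shot prompt from annotated examples. This sketch uses an invented output schema and a single illustrative example, not the prompts we actually shipped:

```python
# Hypothetical few-shot examples: (document excerpt, expected structure as JSON).
EXAMPLES = [
    (
        "1. Definitions\n1.1 'Agreement' means this contract.",
        '[{"level": 1, "number": "1.", "heading": "Definitions"},'
        ' {"level": 2, "number": "1.1", "heading": null}]',
    ),
]

def build_prompt(document: str) -> str:
    """Assemble an instruction, the worked examples, and the target document."""
    parts = ["Extract the section hierarchy of the document as JSON."]
    for doc, structure in EXAMPLES:
        parts.append(f"Document:\n{doc}\nStructure:\n{structure}")
    parts.append(f"Document:\n{document}\nStructure:")
    return "\n\n".join(parts)
```

The curated gold annotations slot directly into `EXAMPLES`, which is one reason the earlier annotation effort carried over.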

Word document edge cases consumed far more effort than anticipated. From the outside it seems like a solved problem, but when precision matters and a misrendered footnote or incorrect section number could have legal consequences, the standard tools fall short.

Precision vs. cost trade-offs in the retrieval pipeline required careful tuning. More context generally means better results, but also slower and more expensive processing. More sophisticated approaches (agentic retrieval, GraphRAG) add complexity and risk of cascading failures. Finding the right balance was an ongoing challenge.

Overall, this was an amazing experience, one that made me truly "full-stack" in the broadest sense of the term, while also sharpening my product skills.