LegalQAEval: a new benchmark for legal AI

We’re releasing LegalQAEval, the first open-source extractive question answering benchmark for the legal domain.

LegalQAEval consists of 2,410 legal questions spanning a broad range of topics, from ecological crime to international trade law.

Every answer in LegalQAEval has been manually extracted from, and annotated within, a reference document. This makes the dataset suitable for evaluating extractive and generative question answering models, as well as rerankers and relevance scorers (by recasting each example as a binary relevant-versus-irrelevant classification problem).
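To illustrate the binary-classification transformation, here is a minimal sketch. The field names (`question`, `passages`, `contains_answer`) are illustrative assumptions, not the actual LegalQAEval schema:

```python
def to_relevance_pairs(example):
    """Recast one extractive QA example as (question, passage, label)
    tuples, where label 1 means the passage is relevant (contains the
    answer) and 0 means it is irrelevant."""
    pairs = []
    for passage in example["passages"]:
        label = 1 if passage["contains_answer"] else 0
        pairs.append((example["question"], passage["text"], label))
    return pairs


# Toy example (not drawn from the dataset):
sample = {
    "question": "What is the statute of limitations for breach of contract?",
    "passages": [
        {
            "text": "An action for breach of contract must be brought within six years.",
            "contains_answer": True,
        },
        {
            "text": "Trademarks must be renewed every ten years.",
            "contains_answer": False,
        },
    ],
}

print(to_relevance_pairs(sample))
```

The resulting pairs can be fed directly to a reranker or relevance scorer and scored with standard classification metrics.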

Notably, LegalQAEval was bootstrapped entirely from existing question answering datasets: SQuAD 2.0, MS MARCO, HotpotQA, and Natural Questions. We used Isaacus' Kanon Universal Classifiers to filter out non-legal examples from those datasets, with a legal expert optimizing the models' prompts and manually verifying the accuracy of classifications.
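The filtering step can be sketched as follows. Here `classify` is a stand-in for an actual zero-shot classifier (such as Isaacus' Kanon Universal Classifiers); its name, signature, and keyword-matching body are placeholder assumptions, as is the prompt:

```python
def classify(text: str, prompt: str) -> float:
    """Placeholder classifier: return the probability that `text`
    matches `prompt`. A real pipeline would call a zero-shot
    classification model here; this toy version keyword-matches."""
    legal_terms = ("statute", "contract", "court", "liability", "law")
    return 1.0 if any(term in text.lower() for term in legal_terms) else 0.0


# A prompt like this would be tuned by a legal expert (hypothetical wording):
PROMPT = "This text concerns a legal topic."
THRESHOLD = 0.5


def filter_legal(examples):
    """Keep only examples whose question scores as legal-domain."""
    return [ex for ex in examples if classify(ex["question"], PROMPT) >= THRESHOLD]


# Toy general-domain input (not drawn from the source datasets):
examples = [
    {"question": "What is the statute of limitations for negligence claims?"},
    {"question": "How many moons does Jupiter have?"},
]

print(filter_legal(examples))
```

In practice, classifier outputs on a sample would be manually verified, as described above, before trusting the filter at scale.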

In the spirit of giving back to the open-source community, we’ve made LegalQAEval free for commercial use, and we encourage data scientists to use it to validate and test their own legal AI models.

Those interested in accessing LegalQAEval can find it on Hugging Face.

If you’d like to integrate the technology we used to create LegalQAEval into your own solutions, or to learn more about our legal AI offerings, we encourage you to reach out.