
How to Measure AI Quality

The prerequisite for measuring AI quality is a test set: a fixed set of examples on which you test how well a task is solved by a human and how well it is solved by an AI.

These test sets are the basis for continuously improving the AI so that its outputs become more precise. Legartis, for example, only releases AI results that achieve an accuracy of over 92%.

1. A task is solved by humans

Example: A specific clause is to be extracted from 100 contracts. Have this task solved by specialists.

 

2. The same task is solved by an AI

Now let an AI perform the same task: the AI should extract the same clause from the same 100 contracts.

3. Comparison of the results

The AI's answers are compared with the human answers, which shows where they match and where they deviate. This lets you quantify how well the AI performs compared to humans.
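As a rough illustration of this comparison, the sketch below scores AI answers against human answers for a single clause. It assumes each contract gets a yes/no label (clause present or not) and treats the human labels as the reference. The function name `compare_with_human` and the toy data are hypothetical and not taken from any vendor's tooling. The result is expressed as precision, recall and the F1 score, a common way to summarise such a comparison in one number.

```python
def compare_with_human(human_labels, ai_labels):
    """Score AI labels against human reference labels (True = clause present)."""
    if len(human_labels) != len(ai_labels):
        raise ValueError("both label lists must cover the same contracts")

    pairs = list(zip(human_labels, ai_labels))
    true_pos = sum(1 for h, a in pairs if h and a)       # both found the clause
    false_pos = sum(1 for h, a in pairs if not h and a)  # AI found a clause the human did not
    false_neg = sum(1 for h, a in pairs if h and not a)  # AI missed a clause the human found

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (assumption): 100 contracts, 40 of which contain the clause
# according to the specialists.
human = [True] * 40 + [False] * 60
ai = [True] * 36 + [False] * 4 + [True] * 2 + [False] * 58

scores = compare_with_human(human, ai)
print(scores)
```

With this toy data the AI finds 36 of the 40 clauses and raises 2 false alarms, which gives an F1 score of roughly 0.92.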

 

AI Quality Standards

There are growing calls for AI quality standards. This makes sense: it is sometimes difficult for laypeople to judge AI performance, or to understand why an AI makes certain mistakes.

However, there are currently no official AI quality standards. There are initiatives, such as the LegalBench project in the English-speaking world, which compares large language models across certain categories of tasks.

But as a user, you currently have little way of knowing how good a vendor's AI actually is.

The Challenge of Measuring AI Results

The more complex the issue, the more difficult it is to evaluate the AI output. For simple tasks, such as recognising certain contract clauses, an AI's performance can be quantified by direct comparison with human results. However, as tasks become more unstructured, such as interpreting legal documents, it becomes more difficult to develop standardised testing methods.

 

High AI Quality Indicators

Based on discussions with experts, the following indicators point to good AI quality at an AI provider:

  • Data protection and security: One indicator is the provider's data protection regulations and the security standards of its technology infrastructure (for example, ISO certificates).

  • In-house test sets: There are AI providers who work with their own test sets. They continuously measure and improve the AI output.

  • Experts & AI developers working together: The development of an AI for a specific use case is particularly successful when specialists who can evaluate the AI's output work together with the AI developers.

Figure: At Legartis, only AI results with an accuracy of over 92% (F1 score over 0.92) are made available to the customer.

How To Evaluate an AI for Your Business Case?

There are more and more providers offering AI solutions for different areas of application. How do you find out whether the AI quality meets your requirements?  

1. Narrow down the area of application and define AI quality

The more clearly defined and limited the area of application, the easier it is to say how high the quality of the AI needs to be. Should the AI fully automate a process, without subsequent verification by a human (autonomous AI)? Then the error tolerance for the AI output should be close to zero.

Or is the AI used for a process that is then briefly checked by a human (AI assistance)? Then the error tolerance is slightly higher.

To evaluate an AI, it is important to narrow down the area of application and define the error tolerance.
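One way to make such a tolerance concrete is to write it down as a number and check measured test-set results against it. The sketch below is only an illustration: the two limits and the function name `meets_tolerance` are assumptions chosen for the example, not recommended values.

```python
# Illustrative limits only (assumption); choose values that fit your own process.
MAX_ERROR_RATE = {
    "autonomous": 0.001,  # no human check afterwards: close to zero
    "assisted": 0.05,     # a human briefly checks the output: slightly higher
}


def meets_tolerance(mode, errors, total):
    """Return True if the observed error rate is within the limit for this mode."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total <= MAX_ERROR_RATE[mode]


# Example: the AI made 4 errors on a test set of 100 contracts.
assisted_ok = meets_tolerance("assisted", errors=4, total=100)
autonomous_ok = meets_tolerance("autonomous", errors=4, total=100)
print(assisted_ok, autonomous_ok)
```

In this example the same result would be acceptable for AI assistance but not for autonomous use, which is why the mode of use has to be decided before the AI is evaluated.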

2. Check the provider's data protection regulations and security guidelines

As with other technology applications, the provider's data protection regulations and security guidelines should be in line with those of your own organisation.

3. Perform a plausibility check

If you know which process you want to use an AI for, you can contact the various providers and ask them directly:

  • Do you use your own test sets? What do they look like?

  • Are there specialists in your AI development team (in the legal field, for example, lawyers) who check and improve the AI results?

 

Areas of Application and AI Quality

AI offers significant advantages for repetitive tasks, such as spending hours looking through contracts and searching for clauses. It can perform these tasks more efficiently and accurately than humans, who become more prone to error as they tire.

Examples of good applications for AI:

  1. Routine tasks with a clear focus: AI works well with routine work that is clearly defined and not too complex. It completes such tasks efficiently and reliably. Example: AI is successfully used for reviewing contracts and analysing clauses. It is very well suited for initial contract reviews.
  2. Document review and risk analysis: In areas such as due diligence, where large volumes of documents need to be searched for specific questions, AI provides valuable preliminary work by extracting and structuring relevant information.
  3. Creation of text suggestions: AI provides good support when drafting text. These may be drafts for e-mails, briefs or wording suggestions in contracts. The suggestions are then checked by a human and adapted if necessary.

The right solution for many teams

"The AI-assisted review allows us to improve risk management by immediately identifying which clauses violate company standards, are contrary, or are phrased differently."
Nicole Steuer, General Counsel DACH, Rexel Group

"The manual review of a data protection agreement takes approximately 45-60 minutes, depending on the scope and complexity of an agreement, even by an experienced lawyer. With Legartis, we are able to reduce the initial review of a DPA to under 10 minutes."
Dr. Marc Hansmann, Director Legal & Governance, Arvato Supply Chain Solutions

"The routine task of NDA contract review is better solved with AI than by costly back-office resources. AI is fast. We only need to follow the AI's guidance following the AI contract analysis."
Norbert Knapp, Chief Commercial Officer, Publicis Group

The right solution for every team

Legal Teams

Legartis automates contract review so you can focus on more complex challenges.

Sales Teams

Legartis ensures complete compliance with your guidelines on the way to signature and accelerates sales processes.

Procurement Teams

Legartis increases purchasing efficiency and allows you to focus on shaping optimal contract conditions.

Unleash the Full Potential of Legartis

Bid farewell to time-consuming, complicated contract review processes! Legartis artificial intelligence supports legal, sales and procurement teams in getting contract reviews accomplished at scale.

Frequently asked questions (FAQ)

What mistakes does an AI make?

  1. Incorrect results due to prompts: These are errors that occur due to unclear inputs or instructions ("prompts"). These misunderstandings can lead to the AI delivering unexpected or incorrect results.
  2. Incomplete or inaccurate answers: Errors in which the AI only partially fulfils a task correctly due to a lack of context or insufficient data.
  3. Interpretation errors: In tasks that allow for multiple correct answers, such as summarising a court case, deviations may occur that are not necessarily errors, but different interpretations or solutions.
  4. Hallucinations: Errors where the AI invents information that is not present in the source document. This type of error is described as particularly problematic as it produces unexpected and unpredictable results, making it difficult to trust the AI.

Can you trust an AI?

In a way, an AI is like a trainee employee. When you train someone, at some point you rely on their work being done well, especially once they have proven themselves 30 or 40 times. In the same way, when you see that the AI works well and does the work reliably, trust builds up.

Who is liable if an AI makes mistakes?

If a lawyer sends a signed document, it is irrelevant how they created it, whether with or without AI. The lawyer is liable in this case.

How much faster can you analyse contracts with an AI?

In contract review, for example with Legartis, AI serves as an assistance solution. It helps to complete the work up to 85% faster, depending on the level of automation selected. In the case of risk analyses across all active contracts, costs are reduced by over 90% due to the elimination of manual work.

Why use AI for contracts today?

Automation with AI reduces the costs for external law firms and minimises legal risks. Companies benefit from faster contract processing and greater efficiency.

Which data protection and compliance requirements does Legartis cover?

Legartis is fully GDPR compliant and meets the highest security standards, including ISO/IEC 27001 certification.

What sets Legartis apart from other legal AI solutions?

Legartis offers a pre-trained and pre-tested solution (internal test sets) that is ready for immediate use. By combining language models and legal expertise, Legartis provides accurate and consistent results and supports multiple languages (German, English, French).