This article is part of a series of articles on AI-powered contract reviewing and redlining. In this article, we explore the compliance challenges that come with procuring AI-powered contract reviewing tools, and how to avoid the most common pitfalls. Part 1 of the series can be found here, and part 2 here.

There is a great deal of fear surrounding the use of AI tools. In fact, it’s one of the more often-cited barriers to adoption.

And honestly, that fear is completely justified. Legal professionals carry significant confidentiality, deontological, and regulatory obligations. Throwing large language models (LLMs) into that high-stakes environment warrants some forethought.

The problem? Fear often leads organisations to restrict AI adoption so much that it becomes useless or, even worse, counterproductive. Some of the delightfully ironic situations we’ve seen include:

  • Allowing an AI tool to assist with reading through hundreds of documents, but only if they are anonymised first. That anonymisation must be done manually, of course, or did you think the AI was going to anonymise the documents so the AI couldn’t read any confidential information?
  • Prohibiting lawyers from using AI tools outright because of security concerns, and then watching as lawyers circumvent the ban by using a (much less secure) personal ChatGPT account.

This is unfortunate, because while the fear is valid, there are easy ways to address it without throwing AI out with the bathwater.

Below, we tackle the most common compliance concerns we hear from lawyers on the topic of integrating AI and how the underlying fears can be resolved.  

Do LLMs use my data to train their models?  

A definitive answer to this question requires reading the terms of service of your chosen LLM, or of the AI tool that uses an LLM. Different providers have different rules, but a clear trend has emerged: if the product is free, you are the product.

Data reuse policies of major LLM providers: a quick breakdown

In other words: if you use the free version of a tool like ChatGPT, your data is reused by default to train the model (although some providers allow you to opt out). If you use the paid or API version (i.e., the LLM as integrated into a different tool like ClauseBuddy), then by default your data will not be reused.

  • Free versions: Typically reuse your data by default to improve their models.  
  • Paid versions or APIs: Usually do not reuse your data.  

From the LLM provider’s perspective, this is very understandable. LLMs are extremely data hungry, to the point that providers have already run out of available high-quality public data to train on. To improve their products, they need additional data, and one of the best ways to gather it is by analysing the documents and instructions their own users submit, along with how those users respond to the answers they receive. Much like search engines learn a great deal from user feedback (do users return to Google after clicking a search result, or are they satisfied with it?), LLMs learn from the feedback and follow-up questions of their users.

At the same time, LLM providers want to be trusted by their customers, particularly in a business environment. That trust would be breached if customers found out that their confidential questions and documents end up as training material. The compromise is therefore to reuse data in the free versions, but not in the paid versions.

A good example is GPT. This LLM is developed by OpenAI, and each version is trained on a combination of publicly available internet data, paid content (e.g., from online newspapers and discussion forums like Reddit) and feedback gathered from users of the free version of ChatGPT. Once training is done, OpenAI makes the model available as both a free version (where user data continues to be gathered as training data for the next version) and a paid version (where no such data is gathered). In addition, as part of its investment deal with Microsoft, OpenAI also makes a copy available to Microsoft, which offers exactly the same model through its Azure server environment, with very strict guarantees that no user data is gathered for training purposes.

Because of this, the Azure-hosted version is also what our own ClauseBuddy uses as its default (but not exclusive) model.

There is one important exception to the general guarantee that your data is not reused or stored for longer than necessary: abuse monitoring. In short, most LLM providers reserve the right to store prompts, and even output, for a limited period of time.

Examples of abuse include looking up tips to hide a body, or recipes for building a bomb from household chemicals. Just as hosting providers have cooperated with authorities for decades (e.g., Google providing a list of search queries for someone searching for “ways to hide a body”, or Microsoft giving authorities access to documents stored in OneDrive), LLM providers will cooperate with authorities if you ask for bomb recipes or ways to hack into IT systems.

In other words, the abuse monitoring gets a lot of attention because it seems like an important exception, but in reality it’s business as usual. Read the terms & conditions of your average IT-provider, and you’ll likely see very similar wording.

Should we anonymize documents before using an LLM on them?  

In short: no, provided you work with professional-grade tools (i.e., not the free versions of ChatGPT, Claude, Gemini, etc.) and all providers involved are reliable and provide sufficient legal guarantees. In such a scenario, anonymisation can even be harmful.

It’s understandable that anonymisation feels like a sensible step, but it is neither necessary nor effective. Here is our thinking on the matter:

  • Confidentiality: As discussed in the previous section, your data is not reused, nor stored for longer than necessary, when working with professional tools.  
  • Data protection: Any provider worth their salt has made sure to tick the necessary regulatory checkboxes – most notably the GDPR in Europe. In the AI-powered reviewing framework we’ve discussed here, these providers act as processors and should therefore have taken the necessary precautions, such as having data processing agreements in place and ensuring proper international data transfer safeguards.  
  • Effectiveness: Even if anonymisation were necessary and beneficial for using AI in the contract reviewing process, it would be practically impossible to implement effectively:

Manual anonymisation would destroy any productivity gains. No lawyer would ever take the time to manually remove all the sensitive information from a document before sending it on to an LLM for review. With the time spent on that exercise, you could have reviewed the document yourself twice over. This would mean the technology is essentially dead on arrival.  

Automated anonymisation is a delightfully ironic alternative. This entails automatic anonymisation with the help of a specialised tool – typically another LLM. In other words, you would use an LLM to anonymise a document to make sure that another LLM cannot read the sensitive information it contains. Or as we like to call it: a field day for Kafka.

There are also vendors of AI tools that exploit this compliance fear, claiming that their tool is the only safe option because it offers an anonymisation engine. This is compliance theatre of the worst kind: you would be sending your personal data to the anonymisation engine of a small startup because you don’t trust a behemoth like Microsoft (in the case of Azure-hosted GPT-4o).

Sadly, the anonymisation misconception has been reinforced by well-reputed sources such as the Belgian Bar Association (in that case the emphasis is on pseudonymisation rather than anonymisation, but similar concerns apply). This is usually well-intentioned but misguided advice, born from a lack of understanding of the technical infrastructure behind the professional-grade tools discussed above.

What security measures should vendors be able to show?

Choosing a trusted vendor eliminates many typical compliance concerns, but how do you know whether a vendor can be trusted? How do you properly assess their trustworthiness? When selecting an AI-powered contract review tool, don’t settle for vague claims of “military-grade encryption” or generic data security promises. Ask for tangible compliance measures. The following are a good start.

ISO-27001 or SOC2 certification

These are the gold standards for information security:

  • ISO 27001 is stricter and internationally recognized.  
  • SOC 2 is a strong but more US-focused certification, often emphasizing internal controls.  

Either certification is a positive sign, but ISO 27001 is especially reassuring for international use.

Ability to use your own API key

More and more legal teams are developing the technical expertise to take charge of their data infrastructure (specifically in law firms, where this knowledge has long been lacking). As a result, more legal teams are now able to set up their own LLM subscription.  

A key component of such a subscription is an API key, which allows you to plug your subscription into a tool that communicates directly with the language model to generate responses. With an API key from your own subscription, you’re not forced to use the vendor’s default API: you can use your own, track billing, modulate access, and ensure that the data you send to the LLM never even hits the vendor’s own LLM account.  

A responsible vendor should give you the option to plug in your own LLM API key (e.g., from OpenAI or Azure). This ensures you remain in full control of where the data goes, and who processes it.
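To make the “bring your own key” setup concrete, below is a minimal Python sketch (using OpenAI’s official client library) of how a review tool could accept customer-supplied credentials and send requests either to the customer’s own OpenAI account or to their Azure OpenAI deployment. The configuration keys, function names and prompt are illustrative assumptions, not ClauseBuddy’s actual implementation.

```python
# Minimal sketch (illustrative, not any vendor's actual code): routing review
# requests through a customer-supplied API key, so LLM traffic runs on the
# customer's own OpenAI or Azure OpenAI subscription rather than the vendor's.
from openai import AzureOpenAI, OpenAI


def build_client(cfg: dict):
    """Create an LLM client from customer-managed credentials."""
    if cfg["provider"] == "azure":
        # Azure-hosted GPT: the customer's own tenant, endpoint and key.
        return AzureOpenAI(
            api_key=cfg["api_key"],
            azure_endpoint=cfg["endpoint"],  # e.g. https://<your-tenant>.openai.azure.com
            api_version="2024-06-01",
        )
    # Plain OpenAI API, billed to the customer's own account.
    return OpenAI(api_key=cfg["api_key"])


def review_clause(client, model: str, clause_text: str) -> str:
    """Ask the model to flag risky language in a single clause."""
    response = client.chat.completions.create(
        model=model,  # deployment name on Azure, model name on OpenAI
        messages=[
            {"role": "system", "content": "You are a contract-review assistant."},
            {"role": "user", "content": f"Flag any risky language in this clause:\n{clause_text}"},
        ],
    )
    return response.choices[0].message.content
```

Because the key and endpoint belong to you, billing, access controls and data-handling guarantees are governed by your own LLM subscription rather than by whatever arrangement the vendor happens to have.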

LLM-agnostic architecture

The Generative AI space moves incredibly fast. The best LLM today may be obsolete in six months. Vendors that tie themselves too tightly to a single LLM provider risk leaving you behind when better options emerge.  

An LLM-agnostic tool allows for easy model-switching and ensures that your workflows stay future-proof. Bonus points if the vendor actively monitors the LLM landscape and swaps in newer, better-performing models proactively.
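To illustrate what “LLM-agnostic” can look like under the hood, here is a hypothetical Python sketch in which the review workflow depends only on a small interface, so a newer or better model can be swapped in through configuration rather than code changes. The class and method names are our own invention, not a description of any specific vendor’s architecture.

```python
# Illustrative sketch of an LLM-agnostic layer: the review logic talks to a
# small interface, and concrete providers can be swapped without touching it.
from typing import Protocol


class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


class OpenAIBackend:
    """Wraps an OpenAI (or Azure OpenAI) chat-completions client."""
    def __init__(self, client, model: str):
        self.client, self.model = client, model

    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content


class AnthropicBackend:
    """Wraps an Anthropic messages client."""
    def __init__(self, client, model: str):
        self.client, self.model = client, model

    def complete(self, prompt: str) -> str:
        resp = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text


def review_contract(backend: LLMBackend, contract_text: str) -> str:
    # The workflow only sees the interface, so swapping models is a config change.
    return backend.complete(
        f"Review this contract and list any problematic clauses:\n{contract_text}"
    )


# Assumed usage (model names will change as the landscape evolves):
#   import openai, anthropic
#   backend = OpenAIBackend(openai.OpenAI(), "gpt-4o")
#   backend = AnthropicBackend(anthropic.Anthropic(), "claude-3-5-sonnet-latest")
#   print(review_contract(backend, contract_text))
```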

Abuse monitoring exception

As mentioned earlier, LLM providers like Microsoft and Google have abuse monitoring systems in place. While these are a good thing on paper, they do constitute a big exception to the rule that data is not stored for longer than necessary or used for any purpose other than providing the service.

Vendors of AI reviewing tools will therefore typically try to get an exemption from the LLM providers powering their tools to ensure that this final loophole is closed. If a vendor has secured such an exemption, it’s a strong signal that even the biggest LLM providers trust them enough to make an exception.