Background
Senne Mennes, co-founder of ClauseBase, recently had the honor of being invited as keynote speaker to TechTorget Oslo. Senne shared the results of ClauseBase’s experiments with combining a firm’s legal knowledge with Large Language Models.
The feedback we received was overwhelmingly positive and so we decided to open the talk up to a wider audience.
Below, you can find the transcript of the presentation. If you prefer watching the presentation, you can also check out this video.
AI, pure magic or disappointing?
If you're anything like the lawyers that we talk to daily, I imagine you've all tried ChatGPT and were blown away by how it can create coherent, creative, and even funny text out of nowhere with a short prompt, and simultaneously a little bit underwhelmed with the quality of its output when it comes to legal drafting.
This has been the reaction of lawyers everywhere since the technology first came to prominence in late 2022.
The language manipulation and generation skills of large language models like GPT-4 are truly revolutionary, but in the end, it's just that: language manipulation.
The next Big Frontier: Large Legal Models
The next big frontier, and one that law firms and legal tech vendors are already looking into, is to take this miracle of a language reading and language generation machine and augment it with the knowledge collected by a law firm over years of providing legal services.
Imagine asking a tool like ChatGPT to draft a memo for a client and having it automatically search your own internal database to come up with an answer based on the experience and expertise of your firm.
Imagine asking this tool to look at the share purchase agreement you are drafting and propose the perfect drag-along clause, considering the context of the document, the party you are representing, the industry of the parties, the applicable law, the way that you and your colleagues have dealt with this issue in the past, et cetera. That's what's at stake with a technology like this.
Now, there is an incredible amount of misinformation out there already, because the technology to do this already kind of exists and is beautiful to watch in tailored demos, but leaves a lot to be desired if the setup is not done properly.
My intention is to arm you with the knowledge that will allow you to distinguish fact from fiction and marketing hype from real life use cases.
Let's start by laying the groundwork on what LLMs can and cannot do out of the box.
Fact: Generative AI is really good with language
By now, it is no secret. LLMs are really good at reading and writing languages, especially English. Because of the deliberate, built-in randomness of their responses, they possess a certain creativity that makes them something more than just information recycling machines. While LLMs can get a little bit wordy, they are able to answer like a human would, choosing to supply only the information that is relevant and writing it in a coherent way.
Fact: Generative AI is great at search
It used to be that search technology could only find the literal keyword you searched for, perhaps complemented by synonyms and conjugations and things like that. But LLMs have kicked this up a notch because of their huge database of associations.
These tools were trained on enormous quantities of texts from which they essentially distilled huge word clouds or associations between words. When using this technology to search through a document, this allows it to not only perform your search literally, but also to guess at the underlying meaning of your search.
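To make the difference between literal and meaning-based search concrete, here is a minimal sketch in Python. It is a toy illustration only: the word vectors are invented by hand for this example (a real LLM learns its associations from training data), and the names `TOY_VECTORS`, `embed`, `keyword_search` and `semantic_search` are hypothetical.

```python
import math

# Hand-made "association" vectors, invented for this illustration only.
# Dimensions: [liability-related, termination-related, payment-related]
TOY_VECTORS = {
    "liability": [1.0, 0.0, 0.0],
    "damages": [0.9, 0.1, 0.1],
    "indemnify": [0.8, 0.0, 0.2],
    "terminate": [0.0, 1.0, 0.0],
    "notice": [0.1, 0.8, 0.0],
    "invoice": [0.0, 0.0, 1.0],
    "payment": [0.1, 0.0, 0.9],
}


def embed(text):
    """Average the toy vectors of every known word in the text."""
    words = [w.strip(".,").lower() for w in text.split()]
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: close to 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(query, clauses):
    """Old-style search: only clauses containing the literal query."""
    return [c for c in clauses if query.lower() in c.lower()]


def semantic_search(query, clauses):
    """Association-based search: the clause closest in meaning to the query."""
    query_vector = embed(query)
    return max(clauses, key=lambda c: cosine(query_vector, embed(c)))


CLAUSES = [
    "The supplier shall indemnify the customer for all damages.",
    "Either party may terminate with thirty days notice.",
    "Each invoice is due for payment within sixty days.",
]

print(keyword_search("liability", CLAUSES))   # no clause contains the word
print(semantic_search("liability", CLAUSES))  # the indemnity clause
```

The literal search comes back empty because no clause contains the word "liability", while the association-based search still lands on the indemnity clause.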
Fact: Generative AI is fantastic at summarization
But where generative AI really shines is in the act of summarization of texts.
In fact, a recent study has shown that LLMs like GPT outperform humans in the act of summarization. Not only are they faster and better at creating summaries, but the tests even found that humans hallucinated more than the machine. People would read a text and would misremember or make up facts to fill in the gaps in their memory.
Fiction: Generative AI has legal knowledge
We know that LLMs can search text very well. We already know that they can write well, and we know that they can summarize very well. Couldn't we then take all those elements and let them loose on our document management system? Our LLM could read through our documents and then take that knowledge to draft memos, legal opinions, and contracts for us. Right?
Unfortunately, this is where we start to enter the realm of fiction.
First, it's important to emphasize that LLMs do not have an inherent feel for legal considerations. They happen to be trained partly on a large volume of publicly available, mostly Anglo-Saxon, contracts and other legal documents which allow them to draft clauses and rudimentary contracts but not in a thoughtful or reliable way. After all, they are called "large language models". They know enough about the world in general to understand what you mean and write meaningful things about it, but they are not "large legal models".
The data to properly train a large legal model is locked away from companies like OpenAI. It's in the case law that may or may not always be reported. It's in legal doctrine that may be hidden behind paywalls, but most importantly, it's in the type of knowledge that a legal practitioner learns from years of experience; from trial and error.
Suppose you are a lawyer who is negotiating a liability clause in an outsourcing agreement. How do you know what is acceptable, what is customary, what favors your client when it comes to such things as the scope of the clause, the amount of the liability cap, etc.? The answer is simple: years of experience.
It's this knowledge that really makes the difference, and it's this knowledge that tools like GPT simply do not have access to.
Future versions like GPT-5 and GPT-6 won't help, by the way. The focus for these improvements will be on increasing the character limit and making the output better and faster based mostly on existing data. They will get progressively better at reasoning based on existing data, but they won't magically gain legal knowledge if there's simply insufficient legal data to train them on.
If LLMs are like a supercar, then legal data is the road
Basically, you can think of LLMs like a supercar, a general-purpose vehicle that can get you to certain places very fast, but a car can only go as fast as the road allows it to.
Data is that road, especially for niche use cases like legal services. You can have the fastest car ever built, but if all you have is a bumpy country road to drive it on, then that won't get you anywhere any faster than a tractor would.
Fiction: you can just upload your internal database
It's tempting to think then that you can build this data highway toward Large Legal Valhalla by simply letting a tool like GPT-4 crawl through your document management system.
There are a few problems with that assumption.
The reality today is that every LLM is struggling with an inherent character limit. Think of it as a limited concentration span. The LLM can only ingest so much text before it simply loses focus, and this is the reason why you can't ask ChatGPT to write a book.
Some LLMs do this better than others. Claude, which is backed by Google and Amazon, can theoretically handle around 75,000 words. The newly released GPT-4 Turbo can handle around 100,000 words. There is, however, always an inherent risk that the tools lose concentration in the middle of the document. And of course, this falls short if you want to interact with your entire database, which is far more than 100,000 words unless you've been in business for two weeks.
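A quick back-of-the-envelope calculation shows the size of the gap. The 100,000-word figure is the approximate limit quoted above; the firm (20,000 documents averaging 2,500 words each) is a made-up assumption for illustration, as are the function names.

```python
import math

# Approximate word limit quoted in the talk for GPT-4 Turbo.
WORD_LIMIT = 100_000


def fits_in_one_prompt(word_count, limit=WORD_LIMIT):
    """True if the text could be handed to the model in one go."""
    return word_count <= limit


def prompts_needed(word_count, limit=WORD_LIMIT):
    """How many separate prompts it would take to show the model everything."""
    return math.ceil(word_count / limit)


# Hypothetical firm: 20,000 documents averaging 2,500 words each.
database_words = 20_000 * 2_500

print(database_words)                      # 50000000
print(fits_in_one_prompt(database_words))  # False
print(prompts_needed(database_words))      # 500
```

Even under these modest assumptions, the database is hundreds of times larger than what the model can look at in one sitting.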
One question that we get daily from law firms is, well, can I just train my own LLM based on the huge number of documents that I'm currently sitting on? After all, GPT was also trained on a big unstructured pile of data. Why couldn't I do the same for my organization?
And the reality is: no, you can't. Even the largest firms have volumes of interesting data on the order of gigabytes, and unfortunately that is nowhere near enough to get the kind of automatic cleanup and balancing of signal and noise that you get with terabytes of information.
You only have a few hundred contracts of the same type. You would need hundreds of thousands of them.
Putting it all together – Retrieval Augmented Generation (RAG)
Fortunately, there is still a way to combine your firm's legal knowledge with GPT. One promising solution, where a lot of work is being done, is something called retrieval augmented generation, or RAG. The idea is this: your firm has gigabytes of information, but you can only feed a few pages to an LLM at a time. So, you use technology that breaks up all the information into small chunks, and then you use separate technology to locate those chunks, see which ones match the prompt in terms of their semantic meaning, and then selectively feed them to an LLM as legal knowledge.
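The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular product works: real systems score chunks with learned semantic embeddings, whereas this sketch uses plain word overlap as a stand-in, and the firm notes, function names and prompt wording are all hypothetical. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Step 1: break a document into small blocks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def word_counts(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Stand-in for semantic matching: cosine similarity of word counts."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=2):
    """Step 2: keep only the chunks that best match the question."""
    ranked = sorted(chunks, key=lambda c: similarity(question, c), reverse=True)
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=2):
    """Step 3: feed the selected chunks to the LLM as legal knowledge."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return f"Use only the firm knowledge below.\n{context}\n\nQuestion: {question}"


# Hypothetical firm knowledge, invented for this example.
FIRM_NOTES = [
    "Liability cap in outsourcing agreements is usually set at the annual fees.",
    "Drag-along clauses should require approval by a qualified majority of shareholders.",
    "Payment terms of thirty days are customary for software licences.",
]

chunks = [chunk for note in FIRM_NOTES for chunk in split_into_chunks(note)]
question = "What liability cap is customary in an outsourcing agreement?"
print(build_prompt(question, chunks, top_k=1))
```

Note that the LLM never sees the chunks that were filtered out, which is why the quality of the matching step matters so much.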
This does allow you to run large volumes of text through GPT or any other LLM, but its number one problem is the character limit. Text must be split up in blocks because RAG cannot handle the text in its entirety. Only those blocks that have a high percentage match semantically speaking with the prompt will run through to GPT-4. The best that you can get with this approach is a 6 out of 10.
Furthermore, the technology has trouble coming up with the right answer if it must combine it from multiple sources.
Compared to the bumpy country road that your supercar must drive over if there is no augmentation of legal knowledge, this approach is more like a provincial road with a lot of traffic signs, a speed limit, and where things can go very wrong very quickly as soon as things become too complex.
As soon as the character limit gets expanded to be able to incorporate gigabytes of documents, this will be much less of a problem. But it's anyone's guess when that might happen.
Putting it all together – Prompt Engineering
There is an even better road to Large Legal Valhalla, and that's prompt engineering.
Remember how I mentioned earlier that the most valuable data to train an LLM with is not out on the internet somewhere, but in your collective heads as lawyers, thanks to years of experience? What if there was a way to let the AI read through that?
I have good news and bad news.
The good news: yes, it is possible.
The bad news: there is no brain chip yet that can do it for you automatically. What it requires is to write that knowledge down in a way and a place that the AI can read it. This usually takes the form of comments or tags that you attach to certain clauses, documents or pieces of text in legal opinions, et cetera, to explain to the AI the reflexes that you as a lawyer already have, or the process that you go through in your head when you are, for example, deciding whether a clause is a useful one or not.
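As a rough sketch of what such comments and tags might look like in practice, consider the Python example below. The clause library, the tag names and the prompt layout are all hypothetical; the point is only that a lawyer's note travels with the clause and ends up in front of the AI.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedClause:
    title: str
    text: str
    tags: list = field(default_factory=list)  # e.g. whose side the clause favors
    comment: str = ""                         # the lawyer's reasoning, written down


def build_drafting_prompt(task, clauses, required_tag):
    """Combine the task with every clause, and lawyer's note, carrying the tag."""
    relevant = [c for c in clauses if required_tag in c.tags]
    sections = [
        f"Clause: {c.title}\nText: {c.text}\nLawyer's note: {c.comment}"
        for c in relevant
    ]
    return f"{task}\n\nFirm guidance:\n" + "\n\n".join(sections)


# Hypothetical clause library, invented for this example.
LIBRARY = [
    AnnotatedClause(
        title="Liability cap (customer-friendly)",
        text="Supplier's liability is capped at 200% of the annual fees.",
        tags=["outsourcing", "customer-friendly"],
        comment="Use when we act for the customer; suppliers usually push back to 100%.",
    ),
    AnnotatedClause(
        title="Liability cap (supplier-friendly)",
        text="Supplier's liability is capped at the fees paid in the last 12 months.",
        tags=["outsourcing", "supplier-friendly"],
        comment="Our default when acting for the supplier.",
    ),
]

prompt = build_drafting_prompt(
    "Draft a liability clause for our client, the customer.",
    LIBRARY,
    required_tag="customer-friendly",
)
print(prompt)
```

The unbillable work sits in writing the `tags` and `comment` fields; once they exist, assembling the prompt is trivial.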
Coming back to the legal-data-as-a-road analogy: no data is a bumpy country road, and RAG gives you a faster road, but one that has a lot of confusing signs. Prompt engineering creates a veritable racetrack, and this approach is by far the most promising.
Currently its only downside is that it is a lot of unbillable work, at least for those firms that aren't already curating their knowledge with the help of templates, clause libraries, et cetera.
Impact on law firms
We at ClauseBase have been lawyers ourselves. We get it. We know what it’s like. Lawyers are simply too busy to spend time on curating their knowledge. On really anything that isn’t billable.
But we also believe that this model for law firms as a professional organization is coming to an end.
When we talk to law firms about using LLMs in this way, we get one of two reactions:
- Some firms are really excited about the opportunities
- Some firms do not see the opportunities or, as a result, why they should choose to spend their time on making the most of them.
To those firms, we say this: it may not be a matter of choice, but one of survival.
The baseline will move
Instead of looking at what ChatGPT and the like are doing for law firms, think of what they are already doing for your clients today, and will increasingly do tomorrow.
From your client’s perspective, asking ChatGPT a legal question is an easy way to get a response they can more or less trust. You may think it’s silly to trust output you can’t verify yourself, but remember: clients often can’t verify a lawyer’s output either and only have trust to go on.
Furthermore, they know you have access to this tool as well. This has two consequences:
- One, clients will expect greater speed from attorneys. One of our clients recently told us they felt like the expectations of their clients were shifting from a one-week model to a one-day model.
- Two, pricing pressure is going to continue mounting. With increased expectations on speed, the billable hour model is going to come under pressure.
We personally don’t believe this means that law firms who are able to do their work two times faster are all of a sudden going to see their revenue cut in half. What we do expect to see is that in-house legal teams will become less inclined to rely on external counsel for matters that aren’t of bet-the-farm-level importance.
Ultimately, trustworthy law firms are always going to be able to charge a premium, but to gain that trust is going to require a combination of speed, quality and competitive pricing that was unthinkable 5 years ago but which will become the norm going forward.
Think of it this way: lawyers have always focused on external-facing work, and while that has allowed them to excel at what they do, it has also kept them so busy with day-to-day affairs that there was no room left for technological advancement.
The real difference in the future will be made by looking inward, taking stock of the knowledge that a firm has gathered, leveraging it in a structured way, and then putting that knowledge to work for client-facing lawyers or even for clients themselves.
This is not impossible. A firm of any size can do it, from solo lawyers to the legal giants.
But it’s going to require a huge shift in the way lawyers work and think, in the way roles are divided at law firms and the roles they hire for.
If people tell you that this is going to be easy, they are lying. But the same is true if they tell you it’s not worth the effort.