Do Human Lawyers Dream of Electric Briefs? Using AI Wisely in the Practice of Law
| What area(s) of law does this episode consider? | Legal AI Tools; Emerging Technologies; Artificial Intelligence. |
| Why is this topic relevant? | The law is sometimes criticised for being slow to change, but it feels like legal practice is evolving rapidly these days. And there are few rapidly evolving developments more disruptive or promising – or, for that matter, divisive – than artificial intelligence. As generative AI tools integrate into the daily workflows of lawyers to varying degrees of depth and rapidity, they’re transforming how legal professionals research, draft and perhaps even think about the law. These systems can provide up-to-date conversational answers to natural language questions, complete with citations and source links, which changes how information is accessed and used. But the reliability of that information is sometimes called into question, which raises important questions about a lawyer’s responsibility and independence when using these tools. As with any powerful tool, the potential for misuse or misunderstanding is significant. Effective use of AI in your legal practice requires an awareness of its limitations and an understanding of how it all actually works. |
| What are the main points? |
|
| What are the practical takeaways? |
|
| Show notes | The Supreme Court of New South Wales, Practice Note SC Gen 23 – Use of Generative Artificial Intelligence, effective from 3 February 2025; Benedict Evans, ‘Are better models better?’, published online January 22, 2025 |
DT = David Turner; WM = Will McCartney
| 00:00:00 | DT: | Hello and welcome to Hearsay the Legal Podcast, a CPD podcast that allows Australian lawyers to earn their CPD points on the go and at a time that suits them. I’m your host, David Turner. Hearsay the Legal Podcast is proudly supported by Lext Australia. Lext’s mission is to improve user experiences in the law and legal services and Hearsay the Legal Podcast is how we’re improving the experience of CPD. The law is sometimes criticised for being slow to change, but it feels like legal practice is evolving rapidly these days. And there’s few rapidly evolving developments, more disruptive or promising – or for that matter, divisive – as artificial intelligence. As generative AI tools integrate into the daily workflows of lawyers to varying degrees of depth and rapidity, they’re transforming how legal professionals research, draft and think about the law, perhaps. These systems can provide up-to-date conversational answers to natural language questions complete with citations and source links, which change how information is accessed and used, but the reliability of that information is sometimes called into question and raises important questions about a lawyer’s responsibility and independence when using these tools. As with any powerful tool, the potential for misuse or misunderstanding is significant. An effective use of AI in your legal practice requires an awareness of its limitations and an understanding of how it all actually works. To unpack these challenges and opportunities, we’re joined today by Will McCartney, founder of Habeas AI, an Australian company dedicated to revolutionising the way Australian lawyers engage with legal research. Habeas provides easily readable responses backed by verifiable legal sources. I’ve known Will for a while in the legal tech community. Love Habeas, love what you’re doing. Will, thanks so much for joining us on Hearsay. |
| 00:01:57 | WM: | Thank you so much for having me on. I really appreciate the opportunity. I think it’s gonna be a great session. And obviously we met a few years ago back when your company Lext was actually on a similar trajectory in certain ways too, Habeas was working on some similar problems. So I was really excited to meet someone else in the space who was working on legal AI and research and those kinds of problems. So yeah, I’m looking forward to today. |
| 00:02:20 | DT: | Yeah, we met back at Web Directions in 2023, I think, in the, like, 30-second pitch competition? |
| 00:02:27 | WM: | Yep, yeah. |
| 00:02:28 | DT: | Yeah. And I guess since 2023, back when we met at Web Directions we’ve seen an enormous number of legal tech startups enter the market and we’ve seen a lot of incumbent legal technology companies develop AI features. You know, I think you and I go to a lot of the same conferences, right? And we see at those conferences over the last 12, 24 months, basically everything is about AI features or new AI products, right? But some of those entrants to the market, and I say this particularly about some recent entrants, although sometimes it can be true of features released by incumbents, is that they don’t have much of a background in the problem that they’re solving, right? As we’ll talk about a bit later, I feel like the law is a very enticing sector for AI companies to work in because we’ve got all of this unstructured text to work with. You know, we work with cases and legislation and submissions and opinions, and it’s all this data that’s trapped in writing that we now have a tool to unlock. But you do have a background in the law. Tell us a bit about your background in the law and how you came to found Habeas. |
| 00:03:28 | WM: | Yeah. I think even going back a bit even to high school days, I was always really interested in the humanities. So that was where I grew up and did a lot of English, did a lot of history, did a lot of essay writing and that sort of stuff. So immediately after uni I actually went and studied at Cambridge University. So I was studying HSPS, which is sort of politics, sociology, social sciences, that sort of stuff. But my story I guess there was a bit of a spanner in the works that occurred with COVID, right? So that impacted international students in a number of complex ways, right? It impacted everyone globally in a very different way, and the end result of that for me was that it became a better option to study in Sydney, in Australia with my family at that point in time. So I transferred to an arts degree at USyd carrying on some of the same stuff, but I added law to that because I was like, “well, I think this would be an interesting thing that I’m not currently studying, that aligns with my area of interest and could be quite fascinating to add this on top of the arts component.” And I was studying arts/law for a couple of years there. But at the same time, I will say that definitely throughout my uni years, particularly 2021, 2022, I was getting really into technology at the same time. So I was building projects on the side. I was studying projects, not really just in legal initially, but in other spaces that were deeply fascinating to me. And I liked the fact that with tech you have this enablement, this sense of autonomy that you might not always get in the structure of say, a university degree and what you are studying there, which can be a bit more rigid or constrained to the parameters of what you are working on, right? So I think that’s a bit of my background in the sense of, yes, I’ve got the humanities interest, but increasingly got interested in tech and capacities we could do there. 
And I think eventually where that led me to was the fact that whilst in law school I saw there were quite a lot of problems that not only students, but also the professors you’re talking with at law school, face around the retrieval of legal information and the search engines that we were using at law school even three to four years ago. Of course, there are the dominant ones that you might be exposed to in law school, like AustLII or bigger providers like Lexis or [Thomson Reuters] that the university has a subscription to, right? And there’d be these common phrases from teachers being like, “oh, this is good to get you started,” or, “you know, this is how you kind of do a keyword or a Boolean search,” et cetera. |
| 00:05:35 | DT: | It’s funny you mentioned “this is how you do a Boolean search” at law school because close to 20 years ago now, right? – we’ve got a bit of a gap between us in our legal educations – but almost 20 years ago now, I was at law school and was also taught the Boolean search on LexisNexis. So not a huge amount has changed, or had changed, I suppose, in that almost 20-year period. |
| 00:05:55 | WM: | Yeah. And I think that was something I was, like, fascinated with at the time. I was seeing that there were also certain constraints around that and how you could access information. It was a problem I became well acquainted with, and then on the side of course I was doing a lot of tech work and improving my skillset there. And so, early 2023 was when the bones of Habeas started coming into play, and at that time, I really don’t think I had fully concretized this idea, if it’s gonna be a company, et cetera, if it’s gonna be a business. It was more like a project that I was working on and seeing, “does this have legs? Is this a real problem to solve?” With AI and the rise of LLMs, my thesis was this is gonna be transformative in certain ways, but there’s gonna be a lot of technical problems to solve and get your head around. So I think it was the emergence of various interests that I’d had in tech entrepreneurship and the humanities, and just on a personal level I think I was super passionate about the fact that, as you said before, LLMs are a way to unlock an entire new system of knowledge, an entire new data set. They’re a way to access textual data and information, and that’s incredibly interesting to me as well. The hybrid of those two elements, of using tech products and software to access this material and create a bridge with human language, is super interesting. |
| 00:07:05 | DT: | Now we’re gonna dive into legal research and Habeas’ product and talk a bit about what’s a good use for AI in legal research and what’s not, and how you might compare different ways of using AI to do legal research. But before that, I just wanted to do a quick refresher. We had an ‘AI 101’ episode in our last season. We talked a bit about transformers and next token prediction and how LLMs work. TIP: The episode that David just mentioned that gives a realistic look at AI in legal practice is our episode with Dr Armin Alimardani. That one is episode 131 and is called ‘The Lawyer’s Guide to Generative AI: Where It Fits (and Doesn’t) in Legal Practice’. Check that one out if you get a chance. It includes a quick crash course on how generative AI actually works. And I don’t know if you would agree with this, but at least anecdotally, I feel like there’s less and less magical thinking about AI in 2025. I think in 2023 there was a lot of misconception about how AI worked and a lot of inflated expectations of what it might be capable of. I think people are starting to understand that LLMs are next token predictors. They generate plausible responses, but not necessarily correct ones. But there’s still some of that misconception, right? And I guess there’s still a lot of people who haven’t used them regularly or at all. So, just as a bit of a refresher, the elevator summary: when we’re talking about AI in this episode, we’re really talking about large language models and derivatives thereof, like embedding models and things like that. What are we describing when we’re describing an LLM? |
| 00:07:28 | WM: | Yeah, so when we’re describing a large language model, we’re looking at something that engages in what’s fundamentally known as next token prediction. So, a token could be done in a few different ways. A token is like a representation of text, and it could be a character-based token. So it could be letters in the alphabet that are constricted down. Or it could even be word-based tokenization. So it could be “cat,” “dog.” Those could be individual tokens, depending on the model and how it’s been designed, basically, right? |
| 00:07:50 | DT: | They’re like Lego blocks for making words and sentences. |
| 00:08:34 | WM: | Exactly, exactly. |
| 00:09:01 | DT: | So you think about the word “cognition.” That might have three tokens in it, like “cog,” which is a word on its own but could also form part of a word. The letter “n” could be a token on its own. And “-ition” is the ending to a lot of words. So that’s probably a token in and of itself. |
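The “Lego blocks” picture of tokens can be sketched in a few lines. This is a toy illustration only: the vocabulary `VOCAB` and the function `tokenize` are made up for this example, and real models learn much larger vocabularies with their own splitting rules, so an actual tokenizer may well split “cognition” differently.

```python
# Toy greedy longest-match subword tokenizer.
# VOCAB is a hypothetical vocabulary for illustration, not taken from any real model.
VOCAB = {"cog", "n", "ition", "cat", "dog"}


def tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split `word` into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matches here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("cognition"))  # ['cog', 'n', 'ition']
print(tokenize("cat"))        # ['cat']
```

With this made-up vocabulary, “cognition” comes out as the three pieces described above, while a short common word such as “cat” is a single token.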
| 00:09:05 | WM: | Yeah. So with large language models, most of them work in some sense that they’ve been trained on a large body of text or material that could be domain specific material. It could also just be more general material across the web. And that’s gonna make an impact on the way that those models actually give responses to the user, the way that they write the knowledge that might be encoded within them, so to speak. Now in terms of that notion of next token prediction as well I think some people maybe get this a bit wrong, that they then assume, “oh, this is just a sophisticated auto-complete.” I think it is a much more powerful technology in the sense that large language models are still oftentimes working from a core conceptual space. For example, Anthropic, one of the providers who’s published a lot of material in this space, has shown that their model has at least a limited capacity to plan. It doesn’t necessarily just next-token-predict, it can also think ahead. The token prediction model is still a very sophisticated one because as we can see, and as people see with the results that they get from AI, it can still be capable of producing quite meaningful results. So just because we talk about next-token-prediction doesn’t mean that’s necessarily a limitation. And the other thing to really point out with all this is most modern legal AI companies at least if they’re worth their salt and not really just whacking up an LLM in a new interface and calling it a day, you might see some companies that start off their journey doing that to draft an MVP, most modernly aware companies are combining that technology with other tech that already exists like sophisticated search technology and retrieval tech. And the large language model might be like the cherry on top, the analysis or inference layer, but that doesn’t mean it’s necessarily going to be responsible for the initial retrieval of information. 
And that is one of the key things that might distinguish a legal AI product like Habeas, from say, someone who’s using ChatGPT, where all they might be getting in their answer from ChatGPT is just a one shot response from the model based on the initial question that they give. And so that really plays into the sophistication and the architecture of the tool you are using, right? It really matters. |
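To make the “sophisticated auto-complete” comparison concrete, here is a deliberately naive predictor that looks only at the previous word. It is not how an LLM works (there is no attention and no training beyond counting word pairs), and the corpus and function names are invented for illustration. It shows why a predictor with so little context quickly starts going in circles.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
CORPUS = "the court held that the court held that the party pay costs".split()


def train_bigrams(words):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length):
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never followed by anything in the corpus
        out.append(followers.most_common(1)[0][0])
    return out


counts = train_bigrams(CORPUS)
print(" ".join(generate(counts, "the", 6)))
# the court held that the court held
```

Each step is locally plausible, yet the output loops because the predictor cannot see further back than one word.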
| 00:09:06 | DT: | Yeah, all great points. I guess one thing that sets an LLM apart as a next token predictor from your auto complete on your phone is that if you were to select the auto complete option on your phone, after five or six selections it clearly becomes nonsense, right? LLMs are capable of producing coherent, meaningful, often valuable responses because of the attention mechanism, the ability to form connections between words over quite a long distance, and as you say, sometimes thinking ahead from a prior token. And you make a great point about the idea that LLMs are probably not even the main event I think in legal AI or legal research, certainly at this point. At Lext – and we do something slightly different in that we do document review, but the concepts are similar – we call the LLM the last mile of the journey. It’s the last step, but probably not the most important or the longest step in that journey. A few years ago, I remember someone describing what LLMs are good for or good at, and I think this is a great summary – maybe you can tell me if you agree or disagree – “LLMs are really good at telling you the answer to a question if given the answer.” |
| 00:09:20 | WM: | Yep. Sure. |
| 00:11:18 | DT: | At first I thought, what is that good for, right? If I know the answer, why do I need the LLM to tell me the answer again? But this succinctly summarises what we talk about when we talk about Retrieval Augmented Generation. An LLM can tell you the answer if given the answer first, and you can do all of these effective techniques around context engineering, searching for relevant information to provide to the LLM in order to surface the right answers. |
| 00:12:18 | WM: | Exactly, yeah. And that architecture of Retrieval Augmented Generation has become so crucial in the stack of not only Habeas, but also some of the ones that are getting a lot of attention and news at the moment. They also oftentimes are leveraging some form of RAG as part of their stack, as part of their technology, right? That’s really important to these companies as a way of mitigating the perceived hallucination risk. If you’re just using a naive model, especially one that is not trained on the law or legal knowledge, one way you can combat that is to incorporate that initial search step. And the thing to point out here is there’s just a whole range of ways to actually do that. It’s not like everyone has got the same approach. Even though we say, you know, companies are using similar architectures, the way one company approaches RAG might be entirely different. The way that they process and embed meaning and data, and the way that they represent that to the end user, or what is actually being selectively retrieved, could be entirely different to another company that might just have a really simple RAG stack or might just be using a more basic architecture. So there’s just a whole realm of difference within that, even though a lot of engineers and legal engineers are looking at the same problems of course, right? And coming to similar conclusions of, how can we mitigate the risk with AI and build better architectures? |
| 00:12:19 | DT: | Yeah. That term, “Retrieval Augmented Generation” (“RAG”), captures a lot of different techniques, doesn’t it? Because all it really means is, searching for and retrieving some information or context and giving that to the large language model before the user prompts it. So that means that you could be querying a database, you could be doing a web search, you could be doing a simple keyword search over some text, you could be using more advanced search techniques like semantic search where we look for relationships between text based on their underlying meaning rather than their keywords. There’s lots of different ways that you can retrieve and that can mean there’s a great deal of variability in one RAG implementation to another. |
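The retrieve-then-prompt pattern described above can be sketched as follows. This is a minimal sketch under stated assumptions: the documents, their labels and the functions `retrieve` and `build_prompt` are all hypothetical, the retrieval is the crudest keyword-overlap variant rather than semantic search, and the final call to a language model is left out entirely.

```python
import re

# Hypothetical mini-corpus; the labels and text are placeholders, not real authorities.
DOCUMENTS = [
    {"citation": "Source A", "text": "A director may be liable as an accessory if knowingly concerned in a contravention."},
    {"citation": "Source B", "text": "Costs ordinarily follow the event unless the court orders otherwise."},
    {"citation": "Source C", "text": "Accessory liability requires knowledge of the essential facts of the contravention."},
]


def words(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by how many query words they share (simple keyword search)."""
    q = words(query)
    scored = [(len(q & words(d["text"])), d) for d in documents]
    scored = [s for s in scored if s[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, retrieved):
    """Place the retrieved passages ahead of the question, each with its label."""
    context = "\n".join(f"[{d['citation']}] {d['text']}" for d in retrieved)
    return (
        "Answer using only the sources below and cite them.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


question = "When is a director liable as an accessory to a contravention?"
print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

Swapping `retrieve` for a database query, a web search or an embedding-based semantic search changes the quality of what reaches the model without changing the overall shape, which is one reason two “RAG” products can behave so differently.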
| 00:12:47 | WM: | A hundred percent. And this is where I think sometimes you’ll see these interesting examples of smaller startups that are emerging, but maybe they’ve got some specialist expertise in that space and their implementation for a particular workflow, whether that be research or document analysis or contracts, is a particularly good implementation and is quite specialised to that workflow. And that matters quite a bit as well, in that how you design the underlying technology stack of your legal AI product will have huge ramifications and impacts for the quality of the tool in day-to-day use. The other thing this brings me to is – I think you mentioned this point – around LLMs as well, and potentially that in certain instances they’re good when they already have the retrieved knowledge at their fingertips. That’s not to say that there’s not a realm of questions in the law and tasks that people are doing that sort of range from subjective to objective. So there’s certain tasks where hundred percent factual accuracy is required to validate a source or to understand where information came from. There is a right answer. But there’s also a lot of tasks, new areas, fundamentally constructing a new legal argument, where creativity, subjectivity, these sorts of factors come into play. And I still think that’s an awesome thing about large language models. They can be a brilliant creative aid in that capacity. They’re still obviously rooted in a model of the world, a certain way of, I wouldn’t necessarily describe it as thinking, but reasoning, as we might call it colloquially. And so we can remember that. And then for those situations where that objectivity is required, we can build stacks so that an LLM is always able to reference real data in a more context-informed way, but we should also embrace those aspects where the creative and the ideation capabilities of LLMs come out and are really strong as a tool in the lawyer’s toolbox. |
| 00:13:57 | DT: | Yeah. Benedict Evans, who is an independent analyst and a former partner at the venture capital firm Andreessen Horowitz, is one of my favourite writers on the diffusion of AI in business. And he makes a distinction between questions that have a true answer and questions that have a better answer. |
| 00:14:39 | WM: | Exactly. Yep. |
| 00:16:17 | DT: | And he uses it in the context of, does it matter if the model is better now? Right. So model improvements are better for questions where the answer can be better, but there are questions where the answer can just be true or false. And then there are some fundamental architectural limitations there, right? As I said at the top of the episode, there’s a bit of division, I suppose, about the appropriateness of using AI in legal work or the efficacy of it. We’ve seen some restrictions imposed by certain courts and then some relaxation of those restrictions. And I bet speaking to a lot of lawyers about using AI in legal research, you still see a lot of misconceptions about AI, especially from people for whom Habeas might be their first experience with a legal AI tool. What are some of the biggest misconceptions that you’re still seeing in the market? |
| 00:16:34 | WM: | Okay, so there’s a few on this front. I think firstly you’ll have some people really, I think the LinkedIn world in particular, they’re still operating in the mindset of fundamentally at a basic level, AI always hallucinates. It cannot be trusted for anything really. It will just make stuff up. I think we need to be careful around that, in the sense that distinguishing between general purpose, large language models that might not be designed for legal use case, and architectures that have not solved but fundamentally mitigated that problem quite a bit, where with applications like Habeas, people can point a model to say, “look, only retrieve based on this specific subset of information,” or, “only give me an answer based on this particular court’s rulings,” et cetera, right? People have that capability oftentimes with a legal AI tool like Habeas. They can always see the citation, they can see the reference point, and then they can then also make an evaluation if they have an AI summary as to whether the analysis given by the model about the relevance of sources is correct, or whether it’s subjectively true or potentially needs revision, right? So I think that’s misconception number one is that the tech has come quite a long way. We are already on top of these issues and legal engineers are thinking about them and have implemented stacks to at least significantly mitigate the risk of hallucination. The other thing though is that secondly, I’ll see the tendency where there’s a very different success outcome depending on the type of lawyer and how they embrace AI. So the lawyer who says, “well, look, this tool is an incredible source of leverage, of capability for my practice compared to traditional research and search tools that are oftentimes frustrating, laborious.” There’s no point of connection with some of these traditional search engines. 
If you’ve got a query in your head, a complex legal issue that you want answering, you can’t necessarily just whack that into a keyword search engine. There’s so many steps in between, right? So the lawyers who embrace that and say, “look, this is a tool that gets me 80% of the way there” have a real success mindset and they tend to do well, but they’re not approaching it as a magic eight ball. Whereas there are also some lawyers who will come in and submit one query. If they don’t feel they’ve gotten the perfect, senior lawyer level, hundred percent response to that question on a one shot question, they might log outta the platform and never use it again, right? And so we do encourage people to think about how they’re prompting as well, and that makes a difference. The fact is that you are actually free with AI to submit several different prompts, and the model, the search engine, whatever it may be, is gonna come up with a few slightly conceptually different answers that you then as a lawyer can piece together, just as you would with a traditional research tool. So I think the misconception there is just that understanding of: this is an incredibly improved research aid, but it’s not a sort of magic eight ball necessarily, in the sense that it can always achieve a hundred percent authenticity and accuracy, right? Saying that, legal AI tools are still so much better than what we became accustomed to as search and inference tools, you know, three, four years ago. And the lawyers who are not using these tools, at least in a safe and effective manner, are probably going to see results and implications for their practice if they don’t adopt some form of generative AI. |
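The earlier point about telling a tool to answer only from a particular court’s rulings, and always being able to see the citation, can be sketched as a metadata filter applied before any text matching. Everything here is hypothetical: the `Passage` fields, the case names, the court codes and the `search` and `pinpoint` helpers are placeholders for illustration, not a description of how Habeas or any other product is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    citation: str   # placeholder label, not a real case name
    court: str      # placeholder court code
    paragraph: int
    text: str


PASSAGES = [
    Passage("Example v Sample", "FCAFC", 42, "Knowledge of the essential facts is required."),
    Passage("Demo v Placeholder", "NSWSC", 17, "Knowledge of the essential facts was not established."),
    Passage("Example v Sample", "FCAFC", 58, "Costs follow the event."),
]


def search(passages, term, court=None):
    """Return passages containing `term`, optionally limited to a single court."""
    hits = []
    for p in passages:
        if court is not None and p.court != court:
            continue  # outside the subset the user asked for
        if term.lower() in p.text.lower():
            hits.append(p)
    return hits


def pinpoint(p):
    """Format a pinpoint reference so the reader can open the exact paragraph."""
    return f"{p.citation} ({p.court}) at [{p.paragraph}]"


for hit in search(PASSAGES, "essential facts", court="FCAFC"):
    print(pinpoint(hit))
# Example v Sample (FCAFC) at [42]
```

Because every hit carries its own court and paragraph number, whatever summary is later written on top of it can be traced back to a location the lawyer can read for themselves.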
| 00:16:35 | DT: | Yeah. I think in some ways the level of hype and inflated expectation in the marketing from foundational model companies, “this is gonna cure cancer, eliminate all jobs,” makes it difficult to have a normal expectation of, “this is gonna be a modest productivity gain for my practice.” You know what I mean? Like, “oh, it didn’t do the whole job for me, but it’s still vastly better than doing a Boolean search,” the way I was taught to 20 years ago, and you were taught to recently. It’s still an improvement. And I guess we’ll talk about this further a bit later, but in my own practice, I don’t think of them as a binary either. Like, I use both. I use a legal AI research tool to find cases that are relevant, but then for me, you would be mad not to then read that case yourself, right? So I then do that. And it’s great that I’ve been able to find relevant, on-point cases so early rather than reading 20 headnotes before I find the relevant case. And having found the right cases, I can then read those – as you should, because you should never hand up a decision in court that you haven’t yourself read – and I can use traditional tools to check things like, “this case has been cited by that case recently,” or “this case was actually overturned but not on a point that matters to you.” |
| 00:17:19 | WM: | A hundred percent. The way that you’ve just expressed and conceptualised that is the success pathway that I’m seeing with our own customers. And so I’ll see, oftentimes, senior litigators – which is a real pleasure to watch, how they go about and how they’re using these tools, right? Because they are quite smart people – they’re using it to get 80% of the way there or run an initial couple of searches that give them a vast wealth of information to draw on. But they don’t necessarily have this approach of, “look, I’m one shotting, I’m copying and pasting what I’ve got from Habeas and submitting that as my advice.” No, of course not, right? They are taking little bits and pieces from the analysis. They’re spotting a great little insight that has shortcutted them straight to what might have otherwise taken hours to find, right? Sometimes that’s even the locus classicus, or that’s the key test that they need, or a key authority that is so conceptually close to the matter that they’re working on, right? And then that in and of itself has saved them several hours, or it’s even just saved some of the more manual work that they otherwise could be doing. But then as you say, there’s other tools that do other jobs. So they might take the citations and put them into something like Jade, where Jade has its own features. Jade has a really great tracking capability for if, as you say, something has been re-cited or there’s been an evolution to that matter that maybe an AI tool is not gonna spot, right? |
| 00:20:15 | DT: | The Habeas to Jade pipeline is the one I use. |
| 00:21:27 | WM: | Oh really? Okay. Interesting, yeah. Yeah. So, that’s really interesting to hear, and that is probably one of the most common pipelines that I hear from Australian litigators. It can still be helpful to retrieve a few relevant authorities or principles where, as you say, you might have to read a bit deeper into that case because that subparagraph may not be the best subparagraph, but it’ll give you a location in the source where that issue is being discussed. And then if you read around that, that becomes a sort of new practice. And it is really fascinating to hear that you are using it. How has that impacted your practice? Are you finding it requires entirely new workflows and approaches? Or are you just almost taking the same approaches that you learned in law school and just reapplying them when you use AI? |
| 00:22:41 | DT: | I think with the core task of legal research, there’s kind of two stages to it, right? There’s this walk in the fog initially of trying to establish the principles, and then once the principles are well established, it’s about application, and application usually requires a deep reading of the cases. Having now understood the framework, conceptually, what about this case is distinguishable from my own or similar to my own? If this is an inconvenient finding for me, why might it be different in my own case? And those things require a deep reading, I think. But a lot of the time, before the availability of semantic search for cases, that walk in the fog stage would be much longer than the application stage, right? |
| 00:22:44 | WM: | Yes, a hundred percent. |
| 00:23:25 | DT: | Just trying to find a relevant case, right? My old legal research methodology was: type keywords into Jade and open 30 tabs, right? And then just start reading through, and it’s like: irrelevant, actually a different Act, overturned, irrelevant, irrelevant, irrelevant. And eventually you find something that cites another case and you read that case, and then that’s more recently cited by the Full Federal Court, and that’s a good authority. And then, you’re at the principle stage, right? But it takes a long time to get there. I think AI is really good for getting to that principle stage quickly because you can find that relevant case faster. I mean, I do have a conviction that in many domains the real advance or the real innovation that AI represents, and I think this is true for law, is that it’s a leap forward in search more so than being a drafting tool, or – maybe we even differ a little bit on this – a reasoning aid. I think of it as, this is a really powerful search tool that replaces a pretty stagnant search paradigm, and you can get to the stage of thinking and principles much, much sooner. One thing I do like about Habeas is that you do provide that subparagraph, right? Because what I wanna ask you about next is verifiability. I think you put it exactly right, which is that retrieval, and retrieval that’s specifically adapted for the type of work you’re doing, so, specific to law, substantially mitigates hallucinations, right? But they kind of can’t be solved, in the sense that Meta’s chief AI scientist says it’s a fundamental feature of large language models. It will just happen no matter how large they get, no matter how sophisticated. And so there is always a need to verify, right? And there are some lawyers who complain, “well, the verification takes as long as it would take to do the task. So why bother? 
Like, I should just do the task myself.” And we do see this in other domains: the METR study – that’s METR, not META – in software engineering found that some software engineers thought they were saving 20% of their time because it took them less time to write the code, but it took them 20% more time to review the code. So tell me a bit about verification and how you solve for what I’ve heard called “verification fatigue,” which is that challenge of verification being a really laborious task that can perhaps rival the original task in terms of its time, because I know Habeas has some useful features there. |
| 00:24:05 | WM: | Yeah, a hundred percent. Really what that issue points to is you could have a highly complex response given by Claude or ChatGPT or even legal AI tools that aren’t built for research. But then, as you said, there’s going to be a set of steps in between where a person might have to fact check every element of that answer. And if it’s not clearly cited or it’s not clear what the reasoning chain getting to that answer is, people can’t break it down. And I guess the other thing I’d point to is LLMs are sometimes guilty of something lawyers do, which is like reasoning by analogy. So they’ll extract something from a case and say, maybe this is a principle from over here in construction law, and they say, yeah this is generally conceptually relevant to what you are working on, which is in IP law. Now a seasoned lawyer immediately can spot that and think, well, it’s not necessarily inherently wrong, but it is reasoning by analogy or that it’s sort of a conflation of principles and I need to be careful in taking that at face value and how I apply it, right? And those are the sorts of mistakes that aren’t necessarily hallucinations, but they’re just things that almost any AI tool is oftentimes going to sometimes make those sorts of jumps and reasoning, et cetera, and lawyers have to be on the lookout for it. |
| 00:24:06 | DT: | Yeah. I actually had an experience like that recently. I was doing a bit of research on accessorial liability in corporations, and depending on the provision you’re dealing with, the relevant test of conduct for accessorial liability might be aids, abets, counsels, procures, that kind of active encouraging step, or it might be knowingly concerned in, right? That more passive step that you see in the Competition and Consumer Act, for example. And I was using an AI tool – not Habeas, I should hasten to add – to do some of that research. And it said, “oh yeah, well, here’s a bunch of cases about the test of knowingly concerned in.” And that’s the sort of mistake of saying, “well, this test equals that test.” They’re not irrelevant. They’re accessorial liability cases about the liability of corporations, right? It’s right on point. But that leap of, “this test equals that test” when it’s not quite that way, you could easily see a graduate lawyer making the same mistake. You’ve still gotta exercise that independent judgment. It goes back to what we were saying, that this is a part of your process, it is not your process. I think so far what we’ve compared is research without AI and research with AI. And just before we move off that topic, it occurs to me that litigators are actually well-placed to do this verification task, right? And are well-placed to do it efficiently. Because as litigators, we are used to the process of assessing the credibility of evidence, right? If your research tool says to you, “yep, the relevant case that you’re looking for is X. And that case stands for Y,” and the citation for that is an ACCC press release about exactly that case and what they say it stands for, well, that seems pretty reliable. But if my source for that is a Medium article, I’m like, “oh, okay, well maybe I want to scrutinise that a bit more closely.” As litigators we’re good at looking at, is this source credible? Is it current? How much do I trust it? 
What questions do I ask to test its credibility? And that’s kind of the task you’re doing, right, with verification. But some tools make that easier and some tools make that not so easy, which brings me onto the next topic I wanted to talk about with you, which is, we’ve talked about legal research with AI and without it. I wanna talk about general use AI tools versus specific AI tools, because I’ve noticed over the past few years – if we were having this conversation two years ago or even a year ago – maybe we would’ve said the distinction between Habeas and ChatGPT is that ChatGPT is just an interface for the core model, right? It’s prompt in, black box does a thing that even the world’s best researchers in this area can’t completely explain, and response comes out. And it’s not really inspectable, it’s not verifiable, you just have to kind of take it on faith and do your own manual research that it’s true. Whereas what Habeas is doing is retrieval and all of the clever search techniques that you’re using and that’s the difference, but ChatGPT increasingly is implementing these other retrieval steps, right? |
| 00:26:21 | WM: | Sure. Yeah. |
| 00:27:21 | DT: | So, it does web search. And I think it relies on web search very heavily when you’re looking at research tasks, but it now implements a retrieval step, sometimes there are sources, where there’s this blurring of the lines between general use tools that are just an interface for the model and general use tools that based on the query will add a reasoning step or will do some search first or will call an MCP server to use some other tool that’s available. Talk to me a little bit about what you see as the continuing difference between general use AI tools and domain specific AI tools as a lot of the techniques used in both start to coalesce, I guess. |
| 00:30:05 | WM: | Yeah. Awesome question. And I think not a question I actually hear get asked as much, but it’s like the thing everyone should be thinking about in a way. So I think to provide some context for this and then I will give you my best answer – What is the landscape for the OpenAI and Anthropic type companies of the world? Well, these are massive VC funded type companies. They’re also competing with these more specialist type companies. So as you said, OpenAI has had this strategy where increasingly it’s trying to go for the jugular of other companies in different, more specialised spaces. And this is also somewhat true of Anthropic as well, right? So a good example of that recently is like OpenAI has released their own agent kit. They’re also releasing their own agent search browser for the web which competes with other products like Perplexity’s Comet and so forth. So they’re actually observing what else is in the market, what is popular, and then trying to cover that use case themselves. What I see is that oftentimes those sort of sub features that OpenAI is implementing, they oftentimes will have something that isn’t liked as much by the market compared to the other specialised products. So a lot of people might still choose Perplexity for retrieval-augmented type responses because there’s something in the efficacy of their engineering, et cetera, that they prefer over, say, the OpenAI output in that space. So I think that’s the first thing to say. And it just makes logical sense because when a company’s focused on a very specific problem set and set of users, it’s oftentimes gonna do better than some more general purpose models that are trying to cater to every usage at once, right? So that’s one thing to note there. But I do think there’s gonna be a whole bunch of losers. 
So like these massive, VC funded AI plays for agents or for search, et cetera, that will just start to fall off because they didn’t meet their revenue targets, there was competitive pressure, OpenAI has come part of their business model and they don’t have enough of a moat, right? So that is true. That’s a broader context that’s going on, it’s a really fascinating one. In terms of the distinction point, well, it’s more about the conversations that you are having and how you’re gonna implement that in your stack. So a legal specific tool will A), have a very specific type of interface, an interface designed for that ICP, for that type of customer or target profile. And oftentimes you find that interface becomes more specific over time. So as you say, as we found a lot of litigators adopt Habeas, that means that we might make decisions that are less conducive to say, something that a lawyer in a totally different field of law might want, and that might disadvantage them a bit. So interface is a big one. I think secondly, the architecture. When you’re just building with a very specific type of customer and person in mind, you are thinking about that problem all the time. You are aware of the nuances; your customers are giving you feedback on what they don’t like, what they do like, et cetera, in an extremely nuanced way that I think any general purpose lab maybe won’t be able to replicate. So I do think there’s a value in the market for specialised tools. Also just bridging that gap of understanding is important because speaking to a lawyer on their terms and the specific problems that their firm is experiencing and then understanding how AI or a specialised AI tool can actually solve that problem, right? There will always be kind of an advantage for that, but saying that, I think that presents an interesting conundrum for big legal AI companies that are maybe on that VC model of, “make X amount of return within five years,” right? 
if they don’t prove that their tools are sophisticated enough and comparatively much better than say ChatGPT or Claude, because a lot of people are putting them through those tests, right? How does this compare to Y? And a lot of people are starting to wake up and say, “I’m not sure this provides the additional value in certain instances.” And then there’s, we say, “no, I do think this is something different from what I’m getting with a general purpose AI.” And so you also see a trend of some of the curiosity revenue that may have come from big law firms that have done a couple of years investment. They will be more specific with the tools that they use going into 2026, 2027. They might drop off a few tools. They might stay with a few tools and they might adopt a few more specialist tools for different areas of their practice as well. |
| 00:30:06 | DT: | We might be reaching the end of the piloting phase where there’s lots of nibbles at different tools in the environment and it’s time to make some commitments. I mean, what you say about the importance of UI, to a choice between general and specific tools, makes a lot of sense to me. |
| 00:30:46 | WM: | Yeah. |
| 00:34:32 | DT: | I used ChatGPT for a legal research task recently to see how it would perform after GPT-5, after the introduction of web search in the app, all these sorts of things. And what I noticed was that completely hallucinated cases, that is cases that just don’t exist, are far less prevalent. But I was presented with I think maybe four cases in that session that existed, but were completely irrelevant to the question. Even leaving aside whether or not a legal specific tool might have a better retrieval process, right? Just leave that whole question aside of, “can you improve on search if you know the law really well?” Just doing the UI piece, right? Just thinking about the UI, if those four irrelevant cases were presented to me in a UI that was optimised for law and so showed me, this is the medium neutral citation and these are the catch words for that case, and this is a little paragraph from the case, I don’t have to go and read those cases to know they’re irrelevant, right? Even holding all else equal, I can look at the catch words and say, “no, I don’t need to look at that one.” “No, I don’t need to look at that one.” “Oh, that one’s good.” “No, I don’t need to look at that one.” “No, I don’t need to look at that one.” So even just at the UI level, without any distinction around the quality of the model or the quality of the retrieval process, there’s benefits in a specific tool. I absolutely get that. |
| 00:34:47 | WM: | Awesome. Yeah. I agree with that wholeheartedly because that is such a niche but nuanced point that people might not get in a scenario where even sometimes they’re using legal AI specific tools, but maybe they’re not quite for research, maybe they’re for contract or drafting or something in that realm. But then people start trying to use those tools for research and encounter a similar problem to that and become quite frustrated, right? And just the fact that they can’t immediately check and assess for themself is such a blocker and limiter. And then they might also come to the conclusion that, “oh, legal AI, it’s not good for research. I’m just gonna use it for drafting and admin tasks in my firm.” Right? That is kind of the wrong conclusion in the sense that no, there are specialised tools that have thought about your use case and the UI that you need as a lawyer that you care about. Right? That can at the very least solve that initial problem and then many times help a lot more with relevant case retrieval and more relevant results, et cetera, right, in the second instance. |
| 00:34:47 | DT: | Yeah. Absolutely. And I guess that brings me to my next question, which is about hallucinations. We’ve talked about verifiability, making sure that cases stand for what they’re said to stand for, and that RAG mitigates hallucination risk a great deal, but doesn’t eliminate it. It’s still happening, right? I think there’s 31 cases now in Australia reported that involve a party, a lawyer, or a self-represented litigant seeking to rely on a non-existent case. And we have some guidance from the Court that this really is a personal responsibility of practitioners now. So the Supreme Court of New South Wales Practice Note says that if you’re going to submit generative AI assisted work product – so for affidavits and witness statements, generative AI can’t be used to create those at all – but for written submissions, you are permitted to do so, but you must personally check that every case cited therein, to start with, actually exists and then stands for the proposition you claim it stands for. |
| 00:36:00 | WM: | Exactly, yeah. |
| 00:36:52 | DT: | Two part question: At Habeas, what do you do to, so far as possible, mitigate the risk that non-existent hallucinated citations appear in results? And second, what do you recommend that lawyers do to ensure that they’re not putting hallucinated cases before the court? |
| 00:37:47 | WM: | So, two good questions. I think in terms of mitigating the hallucination risk, it’s almost everything we’ve talked about up until this point in the podcast is relevant to that, to a certain degree, right? Like the architecture that we’ve got, being so specific to the legal mind and the legal thought processes, going back to simple processes like issue, rule, application, conclusion (“IRAC”), just building those into our models and our architectures and having a sophisticated search layer, all of that mitigates the hallucination risk. I don’t believe that there’s a company out there that could claim to have a hundred percent solved the hallucination risk, simply because no human can even really stand up to that bar. Even prior to AI, there’s so many instances of humans and real lawyers making mistakes in law, right? Bad submissions, they’ve drafted something based on – they quickly read the catch words, assumed the case was relevant – this occurred prior to AI and it’s happening even in the midst of AI as well. And then in terms of the second part of that question, the first thing you can be doing is using a more specialised tool, whether that’s Habeas, whether that’s something else out there in the market that is designed for your use case and area of law, right? Because it might be that you are actually doing a lot of contracts work, et cetera, but there’s still a risk of hallucinated material creeping in there. So another tool might be necessary for you compared to, for example, one of the lawyers who was recently pinged for this; he’d even designed his own custom like workflow with Claude and some internal models. So he was doing some smart things but again, even that was not enough to actually mitigate the hallucination risk because it was not a domain specific tool, et cetera. And so that would be the first thing is, be very intentional about the tools your firm is buying and using for research if you’re using gen AI assisted tools. 
But I think beyond that, taking on that element of personal responsibility, just because you have this tool that can give, of course, such a quantum leap in your capacity for judgment analysis research, right? That is extremely exciting. But there’s always gonna be that element of personal responsibility. And I guess a lawyer recently, this is not my words, he said to me, “it’s a laziness issue.” I’m not sure I a hundred percent agree with that, but I get where he’s coming from, right? Which is that there is this increasing psychological thing of, you’ve got a pristine answer in front of you, all looks fundamentally just about right, you are maybe on the clock for work and under some kind of time pressure. And I would just encourage people to still, you practice those skills and still think of this tool as a really high powered research aid rather than a magic eight ball that is always a hundred percent accurate and correct. And where you can get mad or annoyed if the tool is not a hundred percent correct, or you think there’s something that’s unfaithful with the law, that’s still part of the personal responsibility piece, which you alluded to as part of the Practice Note. These sorts of things I’m almost encouraging is also people developing an understanding of the underlying architecture. Don’t assume as well just because you’re a lawyer that it’s not gonna be helpful to learn about RAG, to learn about the systems that these tools are using. Learn what a large language model is, equip yourself with that knowledge so that when you use this tool, you understand exactly what it is and isn’t good for, right? And how much can you trust it? And by becoming a more tech informed and tech savvy lawyer, you are gonna put yourself ahead of so many more practices. Which actually leads me to another point, which is seeing so many small practices and small law firms basically take off with AI. 
They’re happy because maybe they’re increasing their billables or they’re maybe able to take on more client work. There’s an element of caution there, but it’s an incredibly empowering thing if you are a smaller firm that maybe lacked those resources before, to add that volume or some sort of leverage that your practice didn’t have before. |
| 00:37:48 | DT: | Yeah. I mean, look, long before generative AI was even a term out there in the world, I was taught you don’t cite a case unless you’ve read it. And I think that’s still a very good rule to follow. It’s a good rule to follow to avoid hallucinations. But we should remember why we had that rule before generative AI, which is that it’s such a missed opportunity for forensic advantage to know the cases you’re citing deeply. |
| 00:38:04 | WM: | Exactly. |
| 00:41:29 | DT: | You don’t wanna be surprised when you’ve said, “oh yeah, and this case is authority for the proposition that I should win this case.” And your opponent jumps up and says, “well, actually that’s not very helpful, because it was decided 50 years ago and it really turned on a prohibition that now no longer exists under the statute. And so it’s of limited assistance now even though it hasn’t been overturned.” And you go, “oh man, I wish I had spent the 20 minutes it took to read that.” We’re not living in such a different world that some of these core lessons can be thrown away. |
| 00:41:50 | WM: | Yeah. And just as an example that I recently read, a ruling, I think it was by Justice Jackson. A self-represented litigant had come in with a particular argument in that case. And one of the interesting things there is, yeah, again, as part of his judgment, he was alluding to the fact that, well, he agreed with many of the key tests and the key case law that had been given and retrieved. But of course, it turned on the facts of the matter, and the application of that test really actually involved like a very specific subparagraph or a very specific qualification right in that piece. And that’s where the crucial issue lay in determining that judgment. So there are so many instances of this where you’re gonna see that more and more. You’re gonna see judgments where it’s like, “look, everything here is technically correct,” or “everything you brought before the Court is sound, but you’ve maybe missed a key nuance.” So that’s where I think actually, embrace the fact that AI can save so much time on the backend, and then actually use that spare time to do more of the deep research that maybe you couldn’t before. Actually do the inverse of what some practices might do, which is just “look, I’ll just take on a million more matters and be okay with just the efficiency gain.” So actually there’s a middle way. There’s a middle pathway. Yeah. |
| 00:41:50 | DT: | Yeah. I think one of the unfortunate consequences of the widespread adoption of eDiscovery tools, right? Like we had a similar conversation over 10 years ago with technology assisted review and eDiscovery tools. “Oh, this is all work that grads and paralegals do. There’s gonna be no more jobs for graduates in litigation. What are we gonna do?” None of that happened. What we ended up with was, I think, something that a lot of judges and maybe plenty of lawyers bemoan, which is millions and millions of documents submitted and reviewed in discovery, and often ending up in evidence without anyone really having paid close attention to whether or not they ought to be in evidence, whether they are ever going to be cited in submissions, whether they’re gonna assist a judge in making a decision at all. We ended up with what you described, which is, “well, let’s just be content to do more, irrespective of whether it’s valuable.” I think yeah, there is an opportunity to use AI to do the same amount, but at a much greater quality. |
| 00:42:20 | WM: | A hundred percent. Yeah. And I’d even think in that volume issue as well, it’s an interesting distinction because what I’ve seen is a bit of a distinction with some self-represented litigants I’ve noticed over the past year who have logged on and started using Habeas – so we don’t advertise or really market ourselves as a tool for everyday people and it’s not really how the infrastructure is designed – however, there are some self-represented litigants who are going through multi-year battles and complex jurisdictional issues and all this sort of stuff going on. So they actually start to develop this awareness of the law and try to upskill themselves in the law. They might turn to tools like Habeas as part of that. Now I think it’s a positive thing that someone has at least had the wherewithal to be like, “I’ll use Habeas rather than ChatGPT,” which is by the way, what 99% of people will be doing, is using something free like ChatGPT, something that doesn’t cost them anything. But again, that will lead to even more problems when they misuse a case or they don’t have the legal background. |
| 00:43:24 | DT: | I mean, we’ve already seen plenty of judgements, I think including in the Court of Appeal where we are seeing self-represented litigants filing voluminous submissions stuffed with references, many of which are completely irrelevant. |
| 00:44:16 | WM: | Yeah. A hundred percent. |
| 00:45:09 | DT: | But that’s sometimes not even a question of, “oh, well these are all hallucinated,” right? Sometimes I use the term that AI can be a differential multiplier, right? It’s really useful for someone who already knows what they’re doing, right? If you can ask the right question on Habeas, you can get great results. But if you don’t know what you don’t know and you start down the wrong path, you can get lots of great, relevant, accurate answers about completely irrelevant questions, right? |
| 00:45:22 | WM: | Exactly. That’s actually the core skill, is question asking. And honestly, that’s probably one of the core skills that will separate top litigators from people who are not maybe at the top level of the craft, is asking the right questions. Anyway, this skill that has always been super important pre AI and continues to be of increased importance now, is which questions are you using? How confident are you with the tools you’re using? Are you checking and actually going through a really sophisticated analysis step when you do receive material and validating it as well. And even to the point I was making before, that’s where I notice a difference between if self-represented litigants are using a platform versus how someone who’s got like 20 years PQE. And it’s just a very different thing, and especially a more tech savvy lawyer who’s more experienced because the self-represented litigant might sometimes almost use it again in that more magic eight ball of just “draft me a submission that it’s x, y, z, blah blah, blah, and then reference it” and just try and one shot that prompt. It’s the admin stuff. “I need to draft this particular form and I need the margins to be three inches,” right? It’s like, well you need to do that within Microsoft margins, right? You’re not gonna be able to set your margins within Habeas. It’s gonna give you research, it’s gonna give you information, right? So, on the one hand it’s a positive thing because I have seen some self-represented litigants achieve successful outcomes with the aid of Habeas and other tools. And that’s great to see. But I’m also just cautious that a lot of the reason people are self-represented is also lack of access to justice or just not having the money at their disposal to pay a lawyer. So, it brings up all these complex issues of seeing these interesting different trends in different usage patterns of someone who might want to use Habeas. 
And it is my view; that combination of an experienced lawyer with a search tool like this is the best, happy medium. |
| 00:45:23 | DT: | Yeah. I think it tells us that as much as we might like to believe it, AI is not this silver bullet for access to justice. I think there’s a lot of potential there. We do a bit of access to justice work at Lext as well, but there’s that important human expert in the middle that has to contextualise and identify issues, which brings me to your earlier point around question asking being a very important skill both now and in the past – and I wonder whether we’ll disagree on this – one of the pearls of received wisdom you see a lot, and I think this has been enduring over the past two years, is that prompt engineering is a really important skill for lawyers to develop. What do you think about that proposition? |
| 00:45:48 | WM: | Yeah, I have mixed thoughts. So first of all, especially in 2023, what you will see is like LinkedIn and spaces like this, right, where what people need to do is they need to find almost a model or a skillset that, if you just learn this skillset, this complex crazy thing that has come out, ChatGPT and AI models, et cetera, you can sort of wield it better, right? Like that was the promise of prompt engineering. You can control the beast. There’s a strong element of skill. And then it also gives a bit of an excuse as well, by the way, for big labs, because their models were hallucinating a lot at that point in time in particular. “It’s because you haven’t prompted correctly here,” right? “It’s not to do with our architecture.” So there’s multiple components to why that became so popular and thirdly, it also becomes something I think people can sell. So you can sell courses in prompt engineering. This is the expertise, this is what you need. And it becomes a term people can get their head around and identify with. The caveat I’ll say is that again, with legal AI tools like Habeas, we already put a massive amount of work on the backend, on sometimes quite vast or sometimes more simple prompts where we imbue the model with a persona or a particular tool set, or a way that it needs to answer questions. A model that it needs to be faithful to. So already a lot of people are not sometimes realizing that there’s a system prompt on the back end or something like that that is being incorporated. Now we can try and make that quite patently clear to our customers because we actually give them the ability to customize their own assistant where they can put in their own system prompt and mess around and experiment until they get something that is optimized for them. So that’s the first thing to say is like your prompt, whatever you might be giving to the model is already being influenced by that. 
Secondly, I think that alludes to the point that any developer or tool worth their salt is trying to create mechanisms so that someone doesn’t have to be an expert in this. |
| 00:47:24 | DT: | Yeah. |
| 00:48:01 | WM: | Again, I do think there’s gonna be a different output for someone who gives the tool a more detailed prompt and asks the right questions. But Habeas can cater to both scenarios, right? If someone just says, “okay, give me injury law related to X, Y, Z.” It’s quite a simple query. It’ll still do well. It would be better if they had a more informed, “I really want to look at this subset or this particular issue in the context of X, Y, Z.” But it can handle both those scenarios. And so I actually see that as, increasingly, this sort of skill. It’s not necessarily a science. There’s probably some art and there’s some nuance to it, but we have no one perfect model for what the perfect prompt is. And a lot of specialist legal AI tools are already doing a lot of the work on the backend for you. So that even if you don’t have the perfect prompt, that prompt is being reformulated anyway. It’s being contextualized. Yeah. So that’s really important to note, I think. |
| 00:49:43 | DT: | I agree with all of that. I think. I see those posts on LinkedIn that are like, “this prompt will turn ChatGPT into a financial advisor.” “This prompt will turn it into a PhD level stock analyst.” I’m like, these are the top hat wearing snake oil salesmen in the wild west town, right? |
| 00:49:43 | WM: | And it contributes to so much misunderstanding. I will talk to lawyers who maybe say, “oh, I’ve heard that.” And then they might misconstrue what Habeas is. “Oh, is Habeas just a large language model or sophisticated prompt?” No, it’s like an entire architecture. There’s various steps to it. Yes, you can create your own, for example, customizable ChatGPT conversation – they have tools for that as well, I’ve forgotten what they call it, what they market it as – where you can put in a system prompt, but again, you’re still gonna be relying on the same underlying architecture at play there. |
| 00:50:31 | DT: | Yeah, exactly. Even if we go back to Habeas’ agents feature, right? That’s a classic example of your prompt immediately being transformed into multiple different prompts. And so the cause and effect relationship between your prompt and your result is more and more attenuated. Because you’ve got the retrieval step – and that’s happening again, as we said, not just in specialist tools, but even in generalist tools because OpenAI and Anthropic are adding web search to their core interfaces – you’ve got prompt transformation that’s happening, you’ve got a system prompt that’s affecting what you are doing. In some cases, the user prompt is being transformed or appended to, or prepended to as well. And as all of these features become more common in the applications that we use, the cause and effect between a prompt and a result is just really hard to observe. Then you’ve got the problem that a lot of the research says that these prompting techniques aren’t actually consistent across interactions or different models. If you are using a prompt-based ChatGPT application which has some level of randomness, a non-zero temperature value, it’s not gonna behave the same way every time. And there is even some interesting research, which we will include in the show notes, showing that prompts sometimes behave in the opposite way to the way we think. So you can’t just tell ChatGPT to behave like a PhD level stock analyst and expect it to do it. There was some interesting research that if you told ChatGPT to avoid positional bias, which is that if given the options one and two or A and B, it might tend to prefer the first option or the second option, it’s more positionally biased than if you didn’t. |
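TIP: David mentions a “non-zero temperature value” as one reason the same prompt can produce different answers. The sketch below is a toy illustration of that idea only, not how ChatGPT or Habeas is actually implemented. The function name `sample_token` and the three candidate words with their scores are made up for the example. At temperature 0 the highest-scoring word is always chosen; at a higher temperature the choice is a weighted random draw, so repeated runs can differ.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token from a dict of token -> raw score (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Divide each score by the temperature, then turn scores into weights
    # (a softmax, with the maximum subtracted for numerical stability).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Made-up scores for the next word; real models score tens of thousands of tokens.
logits = {"knowingly": 2.0, "aids": 1.6, "procures": 1.0}

rng = random.Random(0)
greedy = [sample_token(logits, 0, rng) for _ in range(200)]
sampled = [sample_token(logits, 1.0, rng) for _ in range(200)]

print("distinct outputs at temperature 0:", len(set(greedy)))
print("distinct outputs at temperature 1:", len(set(sampled)))
```

The practical point for verification is that re-running the same prompt is not a reliable way to reproduce or audit an earlier answer when the tool samples at a non-zero temperature.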
| 00:50:45 | WM: | Which is yeah, very fascinating to speculate on why, but yeah. Yeah. |
| 00:51:15 | DT: | I mean, you could speculate that the idea of positional bias is now in the latent space. |
| 00:52:50 | WM: | Yes, Yes. Yeah. |
| 00:52:53 | DT: | It’s like the Streisand effect. Now I’m thinking about positional bias, I’m gonna be more positionally biased. TIP: So David just mentioned a study on trying to negate positional bias in LLMs. That study is called ‘Can We Instruct LLMs to Compensate for Position Bias?’ and it was by Zhang, Meng and Collier in 2024. We’ll include a link to that one in the show notes if you’d like to take a further read. One of the most interesting takeaways from that study is that prompting large language models doesn’t always work the way our intuition suggests. You might think that telling a model to “avoid bias” or “be more objective” would improve its behaviour, but in reality, sometimes it can actually do the opposite. So the study is looking at something called position bias, which is the tendency for language models to prefer information that appears at the beginning or end of a prompt, while underweighting content in the middle. The researchers tested whether you could “talk the model out of” this behaviour using instructions. They tried two approaches. The first was what you might naturally attempt: relative positioning. For example, telling the model to focus on “the middle section” or “the later part” of the input. That didn’t work. Across multiple models, including closed-source ones, the models showed little understanding of relative position concepts. In other words, “the middle” doesn’t mean much to a system that doesn’t actually perceive the prompt spatially the way humans do. The second approach was more concrete: explicitly labelling documents with numeric IDs and telling the model to focus on “Document 2” or “Document 5.” That did work. When the correct document was given a clear identifier and referenced directly in the instruction, the model shifted attention and performance then improved. 
When the instruction pointed at the wrong document, performance dropped sharply, which is actually good evidence that the model was genuinely following the instruction rather than guessing. The lesson is that vague behavioural instructions like “avoid bias,” “be objective,” or “act like an expert” often don’t map cleanly onto how models operate internally. In some cases, telling a model to avoid positional bias actually made it more biased, because the instruction itself introduced confusion. So rather than abstract, human-style advice, what does seem to work better is being explicit and giving the model clear anchors, such as numbered references, explicit constraints, or concrete targets. The broader takeaway is that prompting is about control surfaces. If you want reliable behaviour, you need to point, label, and constrain as a user. That’s a very different mental model from how most people approach AI today. All of these things tell me that prompt engineering is a really important skill for people like yourself who are testing prompts at scale and designing them as part of a cohesive framework, and who can observe how a prompt works when you’ve run it a hundred times, a thousand times in many different scenarios, but not so obviously useful to the lawyer who maybe prompts once a week. |
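To make the labelling idea from the TIP above concrete, here is a minimal Python sketch of a prompt builder that gives each document a numeric ID and then points at one by ID rather than by relative position. It is an illustration only, not the method used in the study or by any particular tool: the function name `build_labelled_prompt`, its parameters, and the exact wording of the instruction are all assumptions made for this example.

```python
from typing import Optional, Sequence


def build_labelled_prompt(
    documents: Sequence[str],
    question: str,
    focus_id: Optional[int] = None,
) -> str:
    """Label each document with a numeric ID and optionally point at one by ID.

    Hypothetical helper for illustration; the wording is an assumption.
    """
    if focus_id is not None and not 1 <= focus_id <= len(documents):
        raise ValueError(f"focus_id must be between 1 and {len(documents)}")

    # Give every document an explicit anchor: "Document 1", "Document 2", ...
    parts = [f"Document {i}:\n{text}" for i, text in enumerate(documents, start=1)]

    # Use a concrete pointer instead of a relative one like "the middle section".
    if focus_id is not None:
        parts.append(f"Focus on Document {focus_id} when answering.")

    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    docs = [
        "Summary of the first authority.",
        "Summary of the second authority.",
        "Summary of the third authority.",
    ]
    print(build_labelled_prompt(docs, "Which authority is most relevant?", focus_id=2))
```

The design choice here is simply to make the target something the model can match on literally (the string "Document 2") instead of something it would have to infer spatially.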
| 00:52:58 | WM: | Yep. Yeah. A hundred percent. Increasingly, we are going to probably see a diffusion or less and less of these posts where it’s just prompt engineering related type posts. I think because, as you say, as people in the market become more skilled themselves and read more about this stuff, they start to come to the same observations that you’ve just made, right? So I have already seen a lessening of that to a certain extent over the past year. But nevertheless, I’ll even get emails like, “oh, I created my particular agent with this persona to do X, et cetera.” But maybe it had a slightly different outcome on the way it actually went about this research. There’s always an incredible complexity because what people ultimately want is they almost want this J.A.R.V.I.S. human level thing that they can just talk to, and that is the perceived standard in a lot of lawyers’ minds. And then occasionally, I do think sometimes AI hits the mark there and it’s incredible, right? But sometimes it falls short. And that’s important: there’s something that maybe goes wrong in the backend or in the way the assistant interprets that prompt as well. But yeah, I think in particular, just even to provide more context there, our agents feature, which we call assistants, that’s how we frame it to people, it is an agentic architecture on the backend. When we say agents, it’s just really this concept of rather than exposing it to a one-shot answer, the model can actually go and reason and call upon different tools as part of its response. And it can take that initial query, maybe break it down into three to four subqueries, right? Because maybe that initial query covers so much ground and it can call all those tools in tandem, get all that context and then take it into the inference step, the step where it actually generates an answer as well. So it becomes a much more potentially informed step. 
However, the interesting thing about that is you are also delegating some decision making to a model, just as you would if you asked a paralegal. Five different people would probably have slightly different methodologies and ways of submitting those queries to search engines, right? So this is the interesting step as well: there’s some element of almost choice or autonomy that we’re now giving models, right, to create better answers. So handing off some of that, but then also realising that you can still point things in a very specific direction and that will lead to better search queries or a stronger understanding for the model of what you want and need. |
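The flow Will describes (split a broad query into sub-queries, call tools in tandem, gather the context, then generate) can be sketched roughly as follows in Python. This is not Habeas’ implementation: `decompose`, `search_tool` and `gather_context` are hypothetical stand-ins, and in a real system the decomposition would be done by the model itself and the search would hit an actual retrieval service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def decompose(query: str) -> List[str]:
    """Hypothetical stand-in for the model splitting a broad query into sub-queries."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def search_tool(sub_query: str) -> str:
    """Hypothetical stand-in for a retrieval tool call."""
    return f"[sources for: {sub_query}]"


def gather_context(query: str, tool: Callable[[str], str] = search_tool) -> str:
    """Run one tool call per sub-query concurrently and assemble the context."""
    sub_queries = decompose(query)

    # Call the tool for each sub-query in tandem; map() preserves input order.
    with ThreadPoolExecutor() as pool:
        context = list(pool.map(tool, sub_queries))

    # In a real system, the query plus this context would be passed to the
    # model's generation step; here we only return the assembled text.
    return "\n".join([f"Query: {query}", *context])


if __name__ == "__main__":
    print(gather_context("limitation periods and contract remedies"))
```

Note that the decomposition step is where the delegated decision making sits: a different splitter (or a different model) would produce different sub-queries and therefore different context.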
| 00:52:59 | DT: | Something I wanted to ask you because I sometimes see this a bit myself both working on legal technology and working in legal practice. Habeas is a small team. As you said, you’re a humanities guy, but you’ve built most of this tool yourself. You would have exposure to a lot of AI tools for software developers, for software engineers that you are using every day to get your own labor to go further faster. Do you draw any inspiration from the tools that you yourself use in building Habeas for the features that you intend to add to Habeas? |
| 00:53:06 | WM: | Sometimes I do. I think it tends to often be more on like the user interface element, right? I might spot something from another tool or another provider that I think, look, that’s a really smart way to convey what this feature does. So for example, some of the inspiration for our assistants, for our agents interface, was drawn from a company called Relevance AI, which is another Sydney-based company, quite a big VC-funded company now, that’s come into prominence over the last couple of years. They are not in the legal space at all, but they allow people to create custom assistants or agents for all sorts of day-to-day work use cases. And I observed what went well with their user interface over time, and then how they refined it. And that actually gives you some insights, rather than me going on that whole same journey. What if we apply a similar interface? Not quite a one-to-one copy in any way, but a similar interface or principle for how people engage with it because if it’s worked in this other context, maybe it can work over here if we reapply it in a legal context. So, that’s an example of drawing inspiration from other tools that are being built and other things that you see that work well. Another example might be Perplexity, right? One of the things that allowed them to get more popular, at least with a lot of users, is their user interface being quite simple, with this easy interaction between a citation being clickable and going straight to the source and understanding the relationship of citations, right? That was really their key advantage, was that user interface much more so than the actual engineering, at least, one to two years ago. 
So we took inspiration from that to a certain degree as well for the search engine part of our platform, where we refined that over time and figured out what would work for lawyers, what met them in the middle with the old tools that they’re using, but then could still be intuitive and engaging enough that they get it on a first pass, right? Day to day, I’ve tried and experimented with tools like Cursor, so coding-type tools that I think increasingly are getting better for certain types of tasks. And those tools work, especially if you’ve got an established code base and so you’re not cheffing up a new project from scratch, which is probably not the right time to use AI. You are maybe working on a particular bug or solving a bug, iterating an element of your user interface. I think tools like that are incredible in the sense that back in the day, you might have to choose one path only, code that path and hope that it achieves your vision of what you want. With AI, you can sometimes do this thing where you might iteratively plan, you might test out four to five different interfaces for a particular feature on the fly, and then you realise it’s the fourth or fifth one that makes the most sense. And you can do that in the span of a couple of days rather than individually coding each of those. You have a way to visualise that quite quickly in real time, at least a scaffold. So also super powerful for how quickly you can move and add new features to your product. |
| 00:55:21 | DT: | Absolutely. We’re nearly out of time. I like to finish each episode with a question for our law student listeners, and I feel like this is a particularly important episode for law students because they’re entering the profession at a pivotal time, and there’s a lot of disagreement about what the core skills they should be focusing on are, right? And whether they’re entering the profession with the right skills based on how they’ve been doing research or writing during their degree. We’re at the point now where if someone is graduating now, they’ve had the use of generative AI for their tertiary education for more of that period than they haven’t. What would your tips be for law students who are coming into the profession now, they’re wanting to make sure that they have the relevant skills to use AI in practice, but they don’t wanna erode those core professional skills that are gonna make them good lawyers and keep them professional and competitive in the market for young lawyers? |
| 00:55:43 | WM: | Yeah, it’s a great question. There’s a lot of things on that front. So I alluded to the notion of personal responsibility before, but really what that means in practice for a junior lawyer is, especially as you’re going through those years of law school, understanding when it’s a good time to leverage and use AI and when it might not be a good time, actually still honing and refining those deep research skills that could become easy things for someone to skip over or not always do, as we alluded to before. So really the same skills that are also relevant to senior lawyers as we were discussing, right? I think another couple of things is sometimes junior lawyers are only exposed to legal AI specific stuff beyond ChatGPT, et cetera, once they move into, say, a firm context. Even then if it’s a larger enterprise firm, it’s the specific tools that have already been sanctioned. Sometimes that’s only like Copilot, which is not actually gonna be the best tool for a more specific use case. So I would actually encourage junior lawyers – because you are tech savvy because you understand the internet, you understand how to search and assess all sorts of different products – to go out and look for those more specialised tools, et cetera, that you might wanna start testing and experimenting with as well. The third thing I’d say is becoming a tech enabled lawyer. And what does that mean? It doesn’t mean you have to become an expert developer, no. But realising how easy it is to, say, learn the basics of code or learn the basics of understanding how at the very least software works and what the underlying principles are and how a particular software works. That becomes so much easier now that you have so much knowledge at your fingertips. You can ask LLMs, at least for the high level stuff about how this stuff works. 
So actually not being afraid to upskill in that and really dive deep into it so that it gives you an extra string to your bow when you come to a practice. They will actually want to know, have you used AI? What do you think about this? How are you using it? Are you using it in a skillful way or do you have a surface level understanding of this? And that can become a real thing on your CV or a distinguishing point that, maybe if you’re in a competitive market for new hires, et cetera, gives you a special advantage. So those are the main things I’d recommend. And then also, as a final point, not being afraid in the sense that there’s newer opportunities emerging. So some of the people, for example, working at Lext, your company, are people who are at that intersection of law, like maybe they’ve completed their law degree and they are doing both some practical law, but they’re also doing some development work, or coding, or marketing work. So they’re actually building other skills in tandem. So there’s these sorts of new opportunities from startups and small businesses for more flexible or multifaceted types of work as well. Your one career pathway is not necessarily only doing your years of study and then getting into a large enterprise firm, even though that’s of course a great pathway if that’s the one for you, right? So hopefully I’ve covered a good chunk of things there, and if I can think of anything else, I’ll let you know at another time, but I think that gives a good little summary. |
| 00:57:50 | DT: | Yeah, I think if there’s one takeaway from our discussion today it’s that domain expertise really matters to the capability of a technology tool, right, to an AI tool. And so there are a lot of opportunities for great lawyers in a growing legal technology sector. |
| 00:58:22 | WM: | A hundred percent actually. Yeah. |
| 01:00:54 | DT: | So there’s absolutely alternate career paths available. As you said, those core skills, you absolutely have to develop. These are additional skills, not replacement skills. And the question of whether lawyers should learn to code is such an interesting one. I remember when the blockchain and smart contracts were the new hotness, there were similar calls for like, everyone’s got to learn to code so they can write these smart contracts because otherwise the tech sector’s gonna eat our lunch as transactional lawyers, which didn’t really work out. Now you sometimes hear people say, “oh, well, who would learn to code now? Because AI can generate code, why bother?” I think it’s a great time to learn to code because a basic level of coding ability unlocks so much more capability with the assistance of AI, right? As we said before, AI is great at helping someone who knows what they’re doing do more, but it’s not so useful for someone who has no idea what they’re doing. If you’ve got that basic level of coding knowledge, you can do a lot very quickly in your practice with code. Even as just one little example, it doesn’t have to be build an app and make that your core business. A couple of months ago, I used a coding agent to create a little interactive app to present advice to a client instead of using a PowerPoint. Just a cool little applet with clickable buttons and accordions that you could open. And it was just a cool way to present that advice. Did I put the confidential advice into the model? Absolutely not. But it let me get that interface really quickly. Yeah, there’s I think good arguments for learning to code now. |
| 01:01:50 | WM: | And we absolutely know that this market and the way the world is moving now, there’s always gonna be a degree of exaggeration, but it is increasingly a world where you know, you need some kind of flexibility in your skillset. It’s no longer necessarily the case that people stay in one job or stay in one career pathway in the way that 30 to 50 years ago, that was a trend, and that could be over a really extended period of time. People might settle in one job for 20 years. Now, increasingly we’re seeing people switching jobs every three to four years or they might even make more frequent switches in their path or where they navigate from one space to another, right? And it might be you move from family law to criminal or whatever it may be as well. So these little movements that are actually gonna become more possible with AI. Like certain practices that were previously like “we only do this,” will increasingly be like, “no, we’re actually starting to take on matters for this broader range of segments.” And yeah, that’s gonna be quite interesting to witness as well. |
| 01:04:31 | DT: | Yeah. Oh, well, don’t get me started on this question. I am a big believer that we live in the age of the generalist. Sure. I’m absolutely on the side of the fox in the fox and the hedgehog fable. But we’ll talk for another 90 minutes if we start that. |
| 01:04:46 | WM: | Yeah, yeah. Yeah. |
| 01:04:47 | DT: | Will, thanks so much for joining me today on Hearsay. |
| 01:06:13 | WM: | Thank you so much, David. It was a pleasure. |
| 01:07:04 | TH: | As always, you’ve been listening to Hearsay the Legal Podcast. We’d like to thank our guest today, Will McCartney, for coming on the show. Now, we mentioned our previous episode with Dr Armin Alimardani that gives an honest take on generative AI in legal practice. We recommend checking that one out if you want a practice management and business skills point. Again, that one is episode 131 and is called ‘The Lawyer’s Guide to Generative AI: Where It Fits (and Doesn’t) in Legal Practice’. If you’re an Australian legal practitioner, you can claim one continuing professional development point for listening to this episode. Now, as you know, whether an activity entitles you to claim a CPD unit is self-assessed but we suggest that this episode entitles you to claim one professional skills point or a professional skills point – take your pick. For more information on claiming and tracking your points, head on over to the Hearsay website. Hearsay the Legal Podcast is brought to you by Lext Australia, a legal innovation company that makes the law easier to access and easier to practice and that includes your CPD. Now, before you go, we’d like to ask you a favour. If you’re enjoying Hearsay the Legal Podcast, please consider leaving us a Google review. It helps other listeners to find us and ultimately that keeps us in business. Thanks for listening and we’ll see you on the next episode of Hearsay. |