Grasping Causation: A Data Science Explanation of Causal Inference and the Role of Counterfactuals
What area(s) of law does this episode consider? | Litigation; understanding causal inference and effect through a data science lens. |
Why is this topic relevant? | Understanding causal inference is a fundamental aspect of litigation, as how causal inferences are established can significantly impact the outcome of a case. Causation is relevant not only in proving liability but also in calculating damages and shaping the overall strategy of proceedings. The ability to present accurate and well-supported inferences can significantly strengthen a case, while weak or erroneous ones can undermine it. As such, lawyers must be able to navigate the complexities of causation to build strong arguments and present compelling cases. This must be done with care, as treating causal inferences with a casual attitude can lead to inaccurate conclusions. The legal industry often relies on experts to assist in addressing causal questions, and those experts may rely on simple before-and-after comparisons or correlations when refuting causal claims. Being able to effectively identify and assess the assumptions at hand when dealing with causal inferences places lawyers in a better position both to advocate for their clients and to fulfil their duty to the court. Developing a data-driven understanding of causation is therefore an immensely useful tool to add to your litigation toolbelt. |
What legislation is considered in this episode? | Evidence Act 1995 (NSW) |
What cases are considered in this episode? | R v Mason (2003) 140 A Crim R 274; R v Milenkovic (2005) 158 A Crim R 4 |
What are the main points? |
|
What are the practical takeaways? |
|
Show notes | Weemaes, Hans, and Joshua Arnold, ‘Litigation Insights – Causation and Counterfactuals in Litigation’ (2024) |
DT = David Turner; HW = Hans Weemaes
00:00:00 | DT: | Hello and welcome to Hearsay the Legal Podcast, a CPD podcast that allows Australian lawyers to earn their CPD points on the go and at a time that suits them. I’m your host, David Turner. Hearsay the Legal Podcast is proudly supported by Lext Australia. Lext’s mission is to improve user experiences in the law and legal services, and Hearsay the Legal Podcast is how we’re improving the experience of CPD. Today on Hearsay, we’re joined by Hans Weemaes, Head of Economics and Data Analytics at Vincents. Now, Hans recently co-authored a paper entitled Causation, A Data Science Perspective for Lawyers – a really incredible paper which delves into the intricacies of dealing with causal inferences and the role of counterfactuals. With his extensive experience and expertise in economics and data analytics, Hans can provide a wealth of insight into these critical aspects of litigation. Hans, thank you so much for joining me today on Hearsay. |
00:01:10 | HW: | Glad to be here. Thank you. |
00:01:11 | DT: | Now, before we get into your paper and some of the topics that it covers, tell me a bit about your journey in your career. How did you get to working in this field and appearing as an expert witness testifying in it? |
00:01:22 | HW: | Sure. Well, I’ve been in this field for quite some time. I originally am from Australia and studied here, did some time at the Federal Treasury and Department of Foreign Affairs and Trade, and then went to the US and studied at the University of Chicago, business and economics and fell into this industry. Litigation consulting, they call it over there. It’s a fairly obscure sort of industry, but absolutely fascinating because you get to apply economics and statistics and finance to real life cases. So, I got a taste for it and returned around four years ago and have been doing similar work ever since. |
00:01:55 | DT: | Fantastic. So I suppose it’s an application of something you’ve been working on for a long time, but in a new environment. |
00:02:01 | HW: | Correct. Yes. It’s an interesting environment. It’s obviously court schedule driven, so it can be intense and, you know, in some jurisdictions, you really have the opportunity to work with leading experts who are leaders in their field and they’re applying cutting edge frameworks and have access to data. So it’s actually quite fascinating. |
00:02:17 | DT: | I suppose some litigators – I certainly think this – say that one of the fascinating things about working as a litigator is you get to learn a lot very intensively about an area that was previously unknown to you. If you have a case about the management of an oil rig, you learn a lot about oil rigs, or you learn a lot about pharmaceuticals, or a lot about winemaking. You sort of get really deep into the specifics of a particular industry or someone else’s business, and then you move on. I imagine in litigation consulting there’s a similar allure, in the sense that you get to learn a lot about something new and then move on to the next exciting thing. |
00:02:50 | HW: | Absolutely. I can identify with that. Every case is different. Even if you work within the same genre of case, you’re dealing with a different industry, a different environment. So you’re always learning. And I completely agree, that is the attraction of the industry. |
00:03:03 | DT: | Now, we’re talking about your paper. It’s called Causation. A lot of our listeners who work in litigation are very familiar with that word, they’re familiar with its use to describe a particular element of a cause of action, especially in torts, but they might not be familiar with some of the other terms that we’ll be discussing today, like ‘causal inference’, ‘causal effect’, the concept of causation from a data science perspective rather than from a legal or civil procedure perspective. So let’s start with those terms. What’s causal inference? |
00:03:32 | HW: | Yes, that’s a good place to start. So causal inference is the study of cause and effect relationships among variables and it’s a fascinating topic because it’s a topic that we’ve thought about as humans for a very long time, obviously, and it’s one of those topics that transcend a number of industries. So you have contributions from computer science, philosophy, statistics, economics, so it’s quite a rich tapestry of disciplines that approach this. I like to think of causal inference as having two pieces; there’s a causal discovery side, and then there’s more of the quantification of causal effects, and if you like, I can get into those two pieces just very quickly. The causal discovery, which is really not the focus of today’s conversation, but what that is really concerned about is we’re trying to identify the causal structure of a set of variables and I like to think of it as almost like a map of relationships. So if you think back to primary school or high school and you were studying ecology, for instance, I don’t know if you might have seen in your textbook or whatever, maybe perhaps a teacher put something on the overhead, which was a chart with all these arrows pointing to animals and plants and there’s a giant ecosystem. |
00:04:45 | DT: | Yeah, like the food chain. |
00:04:46 | HW: | Like a food chain, correct. It’s not a perfect analogy, but it’s a little bit like that where we’re trying to understand a phenomenon in terms of the causal relationships – and it literally does use arrows – we want to understand the direction of the cause and the effect. So you can think about a causal discovery and before moving on, that part of causal inferences is quite exciting right now because we are seeing algorithms being used where you can apply them to datasets and these algorithms will suggest causal relations in the data and there may be relationships you haven’t thought of. Now, that’s not to say that it will spit out a precise causal relationship, but it could be an indicator or a lead, if you like, as to what might be out there. So that’s causal discovery. What the paper is about, is the quantification of what’s called causal effects. And a causal effect is, if you like, a pure effect of a cause and perhaps the best way to explain this is an example. So, if I have a fever, and I take some medicine, and my temperature drops, say, five points, is that the causal effect? The five points. What do you think? Is that the causal effect of the medicine? |
00:05:55 | DT: | I see what you mean. You’re saying you’ve been given the two events, sort of as an assumption, what we’re investigating here is the medicine and the drop in your temperature. To what extent did A influence B? |
00:06:07 | HW: | Correct. Correct. So, would you say that the five points is a causal effect? And the answer is, well, no. And so why is that? And the reason is there can be other influences – just my physiology; I could simply be good at getting my temperature down. But that still doesn’t answer, well, how do I get to that causal effect? And what I would have to do is, essentially, I’d have to go back in time, or I’d have to go to a parallel universe, meet myself there, and have a look at what the temperature difference was in that universe, or in that time period, without the medicine. And if I saw that the temperature fell four points, that four points also isn’t a causal effect, because there’s no treatment in that case. But the difference between the two would give me an estimate of the causal effect. In fact, it would be the causal effect if everything else was held constant, and so that’s just the notion of a pure effect. So we have this weird situation where you have to be able to take the treatment and not take the treatment at the same time. Now, you know, if I’m a listener to this, yeah, I’m thinking “well, we’re not off to a good start here. This guy’s telling me that I need to invent a time machine.” |
00:07:12 | DT: | I was about to say, how do we do this without interdimensional time travel? |
00:07:16 | HW: | Exactly. So where is this going? Now, the good news is you can stay firmly planted on this earth and still come up with a causal effect. And that’s precisely what causal inference is about, is how can we use what we have, to come up with robust estimates of causal effects? |
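For listeners who like to see the idea in code, here is a minimal Python sketch of the potential-outcomes framing Hans is describing. All of the numbers (the size of the fever drop, the one-point effect of the medicine) are invented purely for illustration; the point is simply that each person has two potential outcomes, but we only ever observe one of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Potential outcomes for each person (hypothetical numbers):
# temperature drop WITHOUT the medicine, and WITH the medicine.
drop_without = rng.normal(loc=4.0, scale=1.0, size=n)   # the body recovers on its own
drop_with = drop_without + 1.0                           # the medicine adds a true effect of 1 point

treated = rng.integers(0, 2, size=n).astype(bool)

# The "fundamental problem": we only ever observe ONE of the two outcomes per person.
observed = np.where(treated, drop_with, drop_without)

# The naive reading ("my temperature fell about 5 points, so the effect is 5").
print("average observed drop among the treated:", observed[treated].mean())   # ~5.0

# The causal effect is the difference between the two potential outcomes,
# which we only know here because the simulation gives us both "universes".
print("true average causal effect:", (drop_with - drop_without).mean())       # 1.0
```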
00:07:31 | DT: | So, okay, we’ve covered two important terms so far – causal discovery, that’s mapping the relationships in the food chain, this causes that, or that causes this – our map of arrows. We’ve got causal inference, which is about the quantification of those relationships, how large or small is the causal effect between those two variables. Just pausing there, one I can think of that some of our listeners might be familiar with, there was a case involving Colgate and misleading and deceptive conduct. It’s an interesting case because it’s about how misleading and deceptive conduct in the market is actionable between competitors in the market, not just someone who was misled. So I’m doing this off the top of my head, but I believe that Colgate suggested one of its competitors had published some misleading marketing material, which had caused it to suffer damage in the market because consumers were misled away from using their product. There, I imagine, someone had to conduct an exercise in causal inference to look at, “well, there was some marketing material published by the defendant and there was this drop in revenue for Colgate.” What was the size of the causal effect between those two variables? Is that sort of the right way of thinking about it? |
00:08:40 | HW: | Yeah. Yeah. I think you’ve hit on a very nice application of causal inference and I think marketing is and marketing data is one of those data sources where you do have the right shape of the data, the right circumstances sometimes, where you can exploit these types of techniques. So again, it might be a good example to come back to. TIP: So I just mentioned a case involving Colgate. I was doing that off the top of my head. In that one, Procter & Gamble, the maker of Oral-B, filed a lawsuit against Colgate-Palmolive in the Federal Court, accusing the company of misleading and deceptive advertising for its Optic White Renewal Toothpaste. The lawsuit challenged Colgate’s claim that their toothpaste removes 10 years of yellow stains, alleging that this statement lacks reliable scientific support. Procter & Gamble sought a court declaration that Colgate’s advertising breached Australian Consumer Law. Oral-B argued that Colgate’s claims were damaging Oral-B’s sales, citing Woolworth’s data that showed that customers who bought the new Colgate product had previously purchased Oral-B or non-Colgate items. This is the causal inference exercise that I was discussing. Oral-B noted that Colgate had aggressively marketed the toothpaste across various platforms, including free to air television, with unusually high advertising spending between March and May. In response, Colgate dismissed the Woolworths data as uninformative or without context, suggesting that factors like pricing or promotions at individual retailers could influence consumer choices. |
00:10:04 | DT: | And the last term that I wanted to cover, and we’ll have to restrain ourselves here, because I guess this could be its own hour long podcast, is the difference between correlation and causation, or more specifically, the difference between correlation and the size of a causal effect. Your discipline looks at measuring the size of a causal effect. Why can’t I just say, “well, I’ve looked at the correlation coefficient between these two variables in a dataset, there’s the size of my causal effect.” Tell me why that’s not right. |
00:10:34 | HW: | Yeah, that’s a good question and that arguably is the question here, because I think that’s the classic manifestation of poor causal inference. So I think before we even get into any of these statistical notions, and I’ll try to avoid that, I think it just starts with the human brain. We as humans have these brains that are predictive machines. And again, this is my opinion, I don’t have a background in neuroscience here, but I think the origin of poor causal inference is that in order to do prediction, you have to rely on patterns, and so our brains are these pattern recognition systems. And incidentally, it’s not just our brains we see it in. We see it in animals. And there’s even interesting research right now where we’re seeing pattern recognition at a cellular level in life forms that don’t have brains, like plants, for instance. So clearly it’s part of how nature operates in terms of surviving and navigating the world. And so just to give you an example, this morning driving to the airport, I saw a stop sign. I don’t need to stop and research what that stop sign… I didn’t even need to read the word ‘stop’. I recognise the pattern and I act. So they’re very efficient mechanisms for getting through life. The problem is that not everything fits into these routines. Life gets complicated, you deal with complex phenomena, and yet our brains are still operating in this pattern recognition mode, and I think that is, at least it’s my opinion, why we run into the problem of confusing relationships. Now, a lot of that sounds abstract, so I’ll get into some examples, maybe with some fake data if you like. So, let’s imagine I’ve got in front of me a spreadsheet, and I’ve got two columns of data. That’s it, just two. One is ice cream sales, and the other is drowning incidents at a beach. And I look at the data, and I see, as you say, a correlation. There’s a strong statistical association between the two. One goes up, the other goes up, and vice versa. Now, if I think about it, nothing’s really registering in my brain. There’s no pattern there, and I think, well, it does seem a bit odd why there would be a relationship, and really, it’s the absence of the pattern in my head that saves me, and I don’t really see an association there. It’s correlation, but I don’t take that next leap and draw any causality conclusion from it. The reason why there is an association to begin with is something I’m not really thinking about – the data that’s missing – and this will be, no doubt, something we’ll come back to over and over again. There are things that are invisible. So what’s ultimately happening in that situation is we’ve got some hidden factor – it could be hot weather, or it could be some holiday season – that’s affecting both the ice cream sales and people visiting the beach. If I control for that, the relationship just disappears. There is no relationship between the two, generally. So that’s what’s going on. Now, let’s keep the data series that I like, which is ice creams, and I’ll get rid of the drownings and replace them with advertising. We’ll come back to advertising. And again, I see a positive association between the two, a positive correlation, they move together. This time I think “okay, well, I know something about advertising, it makes sense.” It’s plausible that if you advertise more, you’ll generate more sales, and I might stop there. I might say, “well, yeah, that makes sense. It’s advertising that is causing those sales.” In reality, the situation is no different to the drowning example. 
In both cases, there are a lot of these hidden variables that I’m not seeing. It could be that there is an event that’s driving both ice cream sales, and it’s causing me to advertise more. There’s all sorts of influences, so there’s really no difference between the two, but yet in one, I’ve got a previous pattern that is biasing, I’m projecting my knowledge onto the data, and the other one I’m not but in any case, we need to step back and be conscious of those biases and think about what is missing. TIP: So Hans just mentioned that humans have developed these pattern-recognising techniques that can occur very quickly in the human brain, but sometimes trick us into being convinced of things that aren’t true. We talk about these heuristics, these mental shortcuts that our brains sometimes take in the context of behavioural science and how we can harness this aspect of our psychology to make the law more effective in practice in episode 66 with Dr Alex Gyani called Don’t Shove – Nudge: Promoting Better Choices in the Legal System. If I could just add one more thing, perhaps a different way of couching correlation versus causation and I think this is quite important to mention. The task of coming up with a measure of causal relationship is an order of magnitude greater than the rigour that’s required for coming up with a correlated relationship, and I like to think of it almost in a negative versus positive sense. So, in order to come up with a correlation, I can apply a statistical measure of how they move together, and I’m done. I can conclude whether they’re correlated, how strongly they’re correlated. But when it comes to causation, and I think this is the critical point, is that in order to come up with a causal effect, I have to eliminate every other influence that’s out there. It turns the problem on its head. And so that’s what I mean in terms of thinking about causation in a negative sense, that I’ve got to eliminate everything else and we’ll expand on that a bit later, but that is a key difference between the two concepts. |
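Hans’s ice cream and drownings example can be reproduced with a few lines of simulated data. The figures below are entirely made up; the takeaway is that a hidden common cause (hot weather) manufactures a strong correlation that vanishes once that cause is controlled for.

```python
import numpy as np

rng = np.random.default_rng(1)
days = 365

temperature = rng.normal(25, 6, size=days)                        # the hidden common cause
ice_cream_sales = 20 * temperature + rng.normal(0, 40, size=days)
drownings = 0.3 * temperature + rng.normal(0, 2, size=days)        # no effect of ice cream at all

print("raw correlation:", np.corrcoef(ice_cream_sales, drownings)[0, 1])

# "Control" for temperature: regress each series on temperature and
# correlate what is left over (a simple partial correlation).
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

partial = np.corrcoef(residuals(ice_cream_sales, temperature),
                      residuals(drownings, temperature))[0, 1]
print("correlation after controlling for temperature:", partial)   # close to zero
```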
00:15:39 | DT: | And I guess for some of our listeners who have practiced or do practice in the criminal jurisdiction, that sounds a little familiar. Eliminate any other plausible explanation is one way of thinking about the test of beyond a reasonable doubt. In your paper, though, you say that a casual attitude towards causal inference dominates in courtrooms, leading to both legal professionals and testifying experts drawing some unreliable conclusions. Why do you say that is? |
00:16:06 | HW: | Yes. So I do think there is somewhat of an inconsistency that I see in the industry, where on the one hand, I think the average expert will know the difference between correlation and causation at an intuitive level. And yet, when you read expert reports, there’s generally analysis that you see that relies on a type of counterfactual that doesn’t line up with truly understanding the difference between the two concepts. And in my paper, I talk about two manifestations of that. We call them naïve counterfactuals and maybe I can just give you a quick example of what you would typically see and why, from a causal standpoint, it’s not as robust as you would like. |
00:16:46 | DT: | Yeah, well let’s start there with the bad counterfactual, I suppose, or the naïve counterfactual that you’ve described, and maybe a better one, and also look at the role of the counterfactual in elucidating some of this expert evidence. |
00:16:58 | HW: | So, we’ll go back to our one of the first examples about the impact of taking medicine in reducing your fever. You know, you want to try and get to that pure counterfactual. So, in a position where essentially if you’re dealing with a unit that’s an individual, almost a clone of yourself, or if it’s a business, the same business. |
00:17:15 | DT: | For the listeners of ours who love Latin, ceteris paribus, you know, holding all else equal. |
00:17:20 | HW: | So a counterfactual is critical for drawing causal inference. If you look at data without a counterfactual, the data can only be observational. You’re just making an observation. So you have to have a counterfactual. So one of the types of naïve counterfactuals you see is a pre and post type of analysis. And maybe, just to go back to my previous example of stop signs… So let’s imagine we’ve got data on road accidents, and we’ve got a high incidence of road accidents, the government puts in a stop sign on that section of the road, and then we see the volume of road accidents fall. And it’s that sort of setup that we typically see. We look at the past, we look at the future, and we say, “okay, we have a causal relation.” But there are obviously issues with that approach. Now, the key one being that we’re not controlling for time. There are a lot of things over time that will affect the incidence of road accidents. So, there’s no attempt to control for that. |
00:18:12 | DT: | Drivers might be getting better, the population aware of how dangerous that stretch of road is might now be driving more sensibly. There’s a lot of things that might change because of the incidence of road deaths in the past, independent of the stop sign. |
00:18:27 | HW: | Correct. There could be all sorts of factors, and in fact, when you look particularly at road data, it can be quite variable. It almost looks random. So you can have years where it’s quite high and then it drops and then it’s up again. And so the other problem you have with what we call longitudinal data, so the pre and post setup, is that you can have situations where you have a high incidence of accidents and then the next year it goes down, irrespective of whether there’s a stop sign there or not. That’s just the natural flow of the data. And so that can also happen. That’s another phenomenon in some data sources that can undermine the conclusions you draw from that simple type of analysis. TIP: So Hans has just mentioned longitudinal data. For any listeners who might be unfamiliar with that term, by longitudinal data we just mean a collection of repeated observations over a long period of time. So in fields like medicine, social sciences, economics, longitudinal data is often used to study the long-term effects of interventions or programs, like the progression of diseases or the impact of policy changes. If anyone remembers that program called ‘Seven Up!’, which looked at people from the age of seven throughout their lives – that’s a great example of some longitudinal data over the life of a set of subjects. |
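The “natural flow of the data” point can be illustrated with a quick simulation of regression to the mean, using hypothetical accident counts: if sites get a stop sign because they just had a bad year, the next year usually looks better even when the sign does nothing.

```python
import numpy as np

rng = np.random.default_rng(2)
sites = 500

# Accidents at each site are just noise around the same underlying rate.
rate = 6.0
year_1 = rng.poisson(rate, size=sites)
year_2 = rng.poisson(rate, size=sites)   # the "treatment" has no effect at all

# The authority installs stop signs wherever year 1 was unusually bad.
treated = year_1 >= 10

before = year_1[treated].mean()
after = year_2[treated].mean()
print(f"treated sites: {before:.1f} accidents before vs {after:.1f} after")
# A naive before/after comparison "finds" a large drop that is pure regression to the mean.
```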
00:19:41 | DT: | I didn’t even think of that. Your mind immediately goes to look for some other rational explanation. I guess this is the pattern recognition part of the brain firing again, right? We like to see a story and explanation, but I suppose sometimes you might be giving the evidence that there’s no cause for any of this. It’s just random. |
00:19:56 | HW: | That’s right, yep. There’s things you can do, you can study the patterns of data beforehand and there’s ways you can deal with that as well. The other type of naïve counterfactual you see is where you don’t have time dimension but perhaps you’re looking at the differences in two groups. So it’s more of a cross sectional setup. So imagine, for instance, you’ve got two groups, and maybe we’ll make this set up, we’ve got cancer patients, for instance. And we’ve got one group that has taken an experimental drug, and one that has not and we could look at the differences in outcomes. So what is the issue with that? Well, we could control for certain characteristics, which is great, but there may be factors we haven’t controlled for that are impacting whether a group is treated, but it’s also impacting the outcome. So, for instance, if I’m in the treated group, the reason why I may have been in that treated group is that I may be at a different stage of cancer, or maybe in the same stage, maybe we’ve controlled for it, but I feel like my treatment’s not going very well, and I’m willing to take a chance on this drug. So there could be something about me that’s not measured that’s influencing both the treatment and the effect and if that’s not taken into account, then the conclusions that I draw from that simple, naïve setup can be challenged. So that’s another type of counterfactual that you see. |
00:21:11 | DT: | So you see the, if I can categorise those, the before and after is one category, and then the other is what looks at face value like the randomised double blind, but in fact isn’t controlling for some important variable that we haven’t observed. |
00:21:25 | HW: | Correct and they’re notoriously data heavy. You essentially need to control for a lot of factors. You need a lot of variability in that data. |
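Here is a rough sketch of the hidden-selection problem Hans describes, with invented numbers: an unmeasured factor pushes sicker patients toward the experimental drug and also worsens their outcomes, so a naive comparison of the two groups badly misstates the effect of a drug that genuinely helps.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Unmeasured severity: influences BOTH who opts into the drug and the outcome.
severity = rng.normal(0, 1, size=n)
prob_treated = 1 / (1 + np.exp(-2 * severity))      # sicker patients are more likely to try it
treated = rng.random(n) < prob_treated

true_effect = 2.0                                    # the drug genuinely helps by 2 points
outcome = 10 - 3 * severity + true_effect * treated + rng.normal(0, 1, size=n)

naive = outcome[treated].mean() - outcome[~treated].mean()
print("naive treated-vs-untreated difference:", round(naive, 2))   # well below 2.0, may even look harmful
print("true causal effect:", true_effect)
```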
00:21:33 | DT: | Do you think the casual attitude about causal inference… Well, I’m asking you a causal question now, is attributable a little bit to the nature of the exercise that we conduct in a civil courtroom? We talked a moment ago about the burden of proof in criminal proceedings beyond a reasonable doubt. I wonder if there’s something about the environment of civil proceedings that leads legal practitioners and testifying experts to have this casual attitude about causal inference, because in civil proceedings we don’t have that high burden of proof; we have a burden of proof on the balance of probabilities, and, you know, that’s not the highest burden of proof to which we can put a set of factual assertions. We even have a mechanism in our rules of evidence to draw meaningful, permissible conclusions from correlation. There’s a category of evidence, our listeners may be aware, called coincidence evidence, where the tribunal of fact is permitted to draw a conclusion that two things are so unlikely to have occurred independently of one another that they must be related. So there’s a permission in the legislation to draw a conclusion about the ultimate fact without a causal relationship in certain circumstances. TIP: Coincidence evidence is dealt with under Section 98 of the Evidence Act in NSW and equivalent provisions in the Evidence Act in the Commonwealth and other uniform law jurisdictions. Coincidence evidence means evidence that’s adduced to show that it’s unlikely that two or more events happened coincidentally, having regard to the respective features of each incident. For example, if the victim of a hit-and-run recalls the vehicle in question was a blue and pink striped bus, and on a small island where the incident occurred, the alleged perpetrator also imported and reportedly drove around in a pink and blue striped bus, the chances that those events were not related are incredibly small, even though there’s no direct evidence that it was the perpetrator’s pink and blue striped bus that was involved in the incident. Another example might be, in a murder case where a person is charged with murdering their spouse, evidence that that person has been married several times before to spouses all of whom died in suspicious circumstances. It’s necessary that the person seeking to adduce coincidence evidence give notice to the other party of their intention to do so – you can’t do it by surprise – and that the evidence has significant probative value, something that a judge needs to make a determination about. Now experts often draw conclusions from statistical data by extrapolating general patterns observed in groups to individual cases. This process involves establishing a general proposition based on a sample and inferring that it applies to the broader population and to the individual case. Now what lawyers then do, which causes scientists a bit of concern, is to say that because a certain fact is true of the general population, this particular individual in this particular case also followed that same path, or that that certain fact applies to that individual. Even if individual similarities across multiple events could be dismissed as common, the key question is whether the combination of these similar features is distinct enough to provide significant probative value, a question that was highlighted in the case of R v Mason (2003) 140 A Crim R 274. 
One example of where a set of facts was not coincidental enough to be admitted is R v Milenkovic (2005) 158 A Crim R 4. In that case, the accused was charged with the armed robbery of a Westpac bank, and the Crown presented evidence linking him to a subsequent similar robbery. Both robberies involved men dressed in dark clothing, armed with a shotgun and sledgehammers, and driving stolen vehicles. A key point in the evidence was the use of the same changeover vehicle, owned by the father of one of the other men involved, in both robberies. DNA evidence connected the accused to the second robbery, but not the first. Despite these similarities, the trial judge dismissed the connections as typical of armed robberies – lots of armed robbers wear dark clothing, lots use shotguns and other weapons, lots drive stolen vehicles – and the Court of Criminal Appeal upheld that decision, finding that while the shared changeover vehicle gave the evidence some probative value, it wasn’t significant enough to be deemed statistically meaningful in proving the accused’s involvement. Now we talk about this vexed question of using statistical data in individual cases in an earlier episode of Hearsay with Nicholas Lennings and John-Henry Eversgerd. That’s episode 14 called, fittingly enough, Statistics in Adjudicative Fact-Finding. Go check out that one after you finish this one. |
00:25:45 | HW: | It’s not an issue of aptitude of the experts. My opinion is that, you know, a hundred years ago or whenever our curriculum became specialised, we chopped up a big discipline into very small disciplines. So we’ve got accounting and we’ve got economics and statistics, and we’ve got, particularly in the Australian jurisdiction, a lot of people that testify come from forensic accounting backgrounds and they understand the language of the business and you know, they’re experts of financial statements and accounting rules and all that but when it comes to understanding issues like causal inference, advanced data analytics, it’s just not part of the curriculum. And I do see that over time, as that evidentiary potential of big data increases, I do think there’s going to be somewhat of a tension where you’ve got a background that is not geared towards this type of analysis, and yet the data available is becoming more useful and subject to these type of techniques. |
00:26:42 | DT: | Previously on the show, we’ve talked about the importance of selecting the right expert witness, or selecting the right expert witness from the right discipline for the factual question that you intend to pose to them, and the important preparatory research that might be done there, even using so-called dirty expert witnesses, who you do not intend to produce evidence from, just to get your own head around some of the concepts. TIP: Now some of our listeners will already be familiar with the concept of a clean witness and a dirty witness when it comes to expert evidence, but colloquially, a clean witness is someone who’s offering unbiased expert opinions that are admissible to help the court or tribunal understand a specific issue. This person is called to give evidence in the party’s case. On the other hand, a dirty expert witness works directly for one of the parties in a dispute, but they aren’t actually called to give evidence. The instruction that they’ve been given, the material with which they’ve been briefed… Well, it might bias them against giving evidence that’s admissible or probative or persuasive in a court hearing, but the dirty expert witness’s job isn’t to give evidence in court. No, it’s to give advice to a party and its legal advisors on how to strategically build and present their case and adduce expert testimony from the clean witness. They’re more in the nature of a litigation consultant, part of the litigation team, rather than an impartial expert witness who’s abiding by the Expert Witness Code of Conduct. Tell me about the importance of the expert witness’s discipline, training and background in drawing causal inferences and measuring causal effects? |
00:28:17 | HW: | That’s a good question, and in particular because we are seeing the evidentiary potential of big data increasing all the time, which allows you, among other things, to address causal inference-type questions. Look, irrespective of the assignment, an expert does have a duty to understand what is asked of him or her, and so if you’ve got a question that is causal in nature, irrespective of how it’s framed, it is what it is in terms of a causal inquiry. An expert needs to recognise that and can’t pretend that it’s not a causal question. And having identified it as a causal question, then the issue becomes, well, is the expert qualified to take on that type of question? And again, in an age where you’ve got more data to work with, and you are using an expert who may not have that background, what are the implications of that? You could have a missed opportunity to draw strong causal conclusions, and it may even be a risk: even if you don’t have the data, if the other side is causally minded, they may take issue with your expert’s assignment, particularly if that assignment or set of assumptions implies a causal relationship. So an expert not only has a duty, but I think it’s part of the risk management process. Again, in this age of big data, you do need to marry the right expert with the right question. TIP: The specific professional and ethical obligations of expert witnesses vary between jurisdictions in Australia. For each jurisdiction, these obligations can be found both in the court rules and the practice notes of that jurisdiction. In 2015, the Council of Chief Justices of Australia and New Zealand approved a harmonised Expert Witness Code of Conduct. And today, this harmonised code has been adopted in the Supreme Courts of New South Wales, Victoria, Tasmania, the ACT, and the Northern Territory, as well as the Federal Court of Australia. So it’s Queensland, WA, and South Australia who are missing out. NSW listeners, the harmonised code appears in Schedule 7 of the NSW Uniform Civil Procedure Rules. The rules state that an expert’s report and oral testimony can’t be admitted as evidence unless the expert first confirms that they’ve read the Code of Conduct and agree to follow it. That Code of Conduct mandates a general duty to the court, a duty to comply with court orders, a duty to collaborate with other expert witnesses where possible, as well as specific requirements for the content of the expert’s report – things like enumerating the assumptions on which they’ve relied. For listeners outside of New South Wales, we’ll include a link in the show notes to a helpful summary provided by the Council of Chief Justices comparing the rules in each jurisdiction. And by the way, if you’re interested in learning more about the ethical obligations of experts, you can check out episode 10 of Hearsay with John-Henry Eversgerd called How to Be an Expert at Briefing Experts. |
00:30:56 | DT: | Now, we were talking just before about how randomness in longitudinal data in particular can be a bit of a confounding factor when trying to draw causal inferences. On the other hand, your article talks about how the key to overcoming the challenge of estimating causal effects is randomisation. So, tell us about how we can use randomisation, use randomness, to measure causation. |
00:31:19 | HW: | Yes, that’s a very good question. Randomness is an intriguing and powerful phenomenon, and I think at one level, it’s intuitive. We can think back to a medical trial, for instance, and we know we don’t want to put our hand on the scale, so to speak, to bias the results. So we have this concept that randomness is good in a medical trial setting but if you go one level deeper, I think you can truly understand what it’s doing and just how powerful it is. But before we get there, just to go back to something that we talked about earlier in the podcast. Recall just the level of rigour that’s required to come up with a causal effect and we talked about this notion of thinking about the negative in terms of eliminating all other possible influences, right? It’s a very rigorous process. So let’s think about, well, how does randomisation help with that? So if you imagine you’re creating two groups, and so I’m reaching into a population of individuals and I’m starting to populate both groups. If I do that in a random way and I start populating, I start to see that they become balanced in observable characteristics. So I see some men in this group, some men in that group, some women in this one, it goes on and on in terms of balancing on age, for instance, and other characteristics. They’re things that I can see and measure, so that’s great, but what might not be intuitive is that as these groups populate, they’re also becoming balanced in the hidden characteristics. So your preferences and your attitudes and your routines, things that some of them I can observe but I can’t measure, and some things I can’t even observe at all, things in your head or whatever and so that’s key. And so through the act of randomisation, in one process, I’ve taken care of all these, it could be hundreds, it could be thousands, it could be tens of thousands of factors, but they’re equally balanced. So therefore, when I compare the two groups, they wash out. So that’s why I say it’s almost like this magical property of randomisation, where I can get to a causal effect, and from an efficiency standpoint, it just simplifies the process of estimating a causal effect. So that’s why it’s referred to as, really the gold standard of causal inference, because it’s so efficient, so effective. It gets you very, very close to that pure effect under the right circumstances. |
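A small simulation of the “magical property” Hans describes, with made-up covariates and effect sizes: random assignment balances both the characteristics we can measure and the ones we cannot, so a simple difference in means lands close to the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

age = rng.normal(45, 12, size=n)              # observable characteristic
hidden_attitude = rng.normal(0, 1, size=n)    # unobservable characteristic that also affects the outcome

treated = rng.random(n) < 0.5                 # coin-flip assignment

true_effect = 1.5
outcome = 0.05 * age + 2.0 * hidden_attitude + true_effect * treated + rng.normal(0, 1, size=n)

# Randomisation balances what we can see AND what we cannot.
print("mean age:         ", age[treated].mean(), "vs", age[~treated].mean())
print("mean hidden trait:", hidden_attitude[treated].mean(), "vs", hidden_attitude[~treated].mean())

# So the raw difference in means is (approximately) the causal effect.
print("difference in means:", outcome[treated].mean() - outcome[~treated].mean())  # ~1.5
```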
00:33:37 | DT: | Now, how do we achieve that in a litigation context? Because generally speaking, unless it’s a truly enormous piece of litigation, we can’t practically run a randomised experiment. I’m thinking even in terms of experiment design there are probably things like ethics considerations and voluntariness that even start to introduce some biases into the groups because you can’t conscript people randomly from the electoral roll for example. So, how do we leverage randomisation in a litigation context where we don’t have a lot of control over, say, experiment design, and we can really only use the historical data? |
00:34:16 | HW: | Yes, that’s right. So obviously, in most litigation you’re dealing with backward-looking data, you’re looking at observational data. Now, there can be a few exceptions where you might be able to run some experiments, like for instance in intellectual property or some misleading advertising, some of these consumer actions, you could run studies to, for instance, impute prices on particular variables and that sort of thing. But putting that aside, you’re quite right, most of the time you’re dealing with these observational data, and so the question is, where’s the randomness in that? And so what it comes down to is finding situations where there are random events, and to be specific, what we’re looking for is where the assignment of a treatment is external or random. So that’s one area, and they’re called natural experiments. So to give you a non-litigation example, these situations are all around us. You can see them in daily life, you can see them on the news. So for instance, we’ve all been through COVID, and if you look at the variability in teaching during COVID, so we might have grade 12 students that had 12 months at home or six months at home, we can use that variability, and importantly, that variability may have been random, it just could have been where they were in the state or a particular area, and so we can look at the differences there. So the idea being that there was a random treatment, but generally these groups are the same. So we can see these situations in real life that we might be able to exploit. You know, you can have situations where perhaps one city had contaminated water and a neighbouring city did not – that could be another example. Or if you wanted to look at the impact of fares on ridership, Queensland has just introduced a 50 cent fare; it’s a type of experiment, and you could compare that with a similar city. So anyway, it’s that type of thinking. So it’s not like we are subjecting the data to randomness. We need to find those situations, and they don’t always exist, I mean, it’s certainly not the case that this is going to be available in every case. Instead, the takeaway is it’s useful to look at the history of whatever you’re dealing with, the company or social setting, and see if those events are there, see if that randomness is there that you can exploit. But there is another situation to exploit. It’s not quite a natural experiment, but it’s more of a, we call it semi-natural or quasi-experiment. So you may still have a situation where there is variability but it’s not random, and there are procedures where we can take a look at that data and transform the data so that the output is what you would expect from a random process. So it involves an extra modelling step. I won’t get into the details there, but it’s not the case that you should only look for treatments that have come about due to randomness. You can just look for variation, and then we can deal with that. It’s probably not as robust, but it can get you further along that causal spectrum. |
00:36:59 | DT: | Yeah. |
00:36:59 | HW: | So those sorts of situations are the types of data contexts that you should be on the lookout for. And if it does work in your particular case, it can lead to some very powerful insights. |
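One standard way to exploit a natural experiment like the Queensland fare change is a difference-in-differences comparison. This is a generic textbook technique rather than the method used in any particular matter, and the ridership numbers below are entirely invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Monthly ridership (thousands) for a "treated" city that introduces a cheap flat fare
# half-way through the period, and a comparable neighbouring city that does not.
months = 24
common_trend = np.linspace(100, 110, months) + rng.normal(0, 2, size=months)  # shocks shared by both cities
post = np.arange(months) >= 12

treated_city = common_trend + 5 + 8 * post + rng.normal(0, 2, size=months)    # built-in true effect: +8
control_city = common_trend - 3 + rng.normal(0, 2, size=months)

# Difference-in-differences: the change in the treated city minus the change in the control city.
did = ((treated_city[post].mean() - treated_city[~post].mean())
       - (control_city[post].mean() - control_city[~post].mean()))
print("difference-in-differences estimate:", round(did, 1))   # close to 8
```

The subtraction removes both the fixed level differences between the two cities and whatever common trend they share, which is why the estimate lands near the built-in effect.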
00:37:11 | DT: | I suppose looking for those natural experiments where they can be found, normalising the data where that’s possible, removing outliers, I suppose, interquartile ranges, things like that. I suppose one challenge there is, there’s almost some causal thinking to be done about whether that is a natural experiment, whether these populations are different for a factor that has a causal relationship to the thing you’re studying or not. You know, you describe the study of students and their time spent being taught remotely during COVID. That might be a consequence of their geographic location. You would want to make sure that geographic location doesn’t have some causal relationship with the variable you’re studying. |
00:37:52 | HW: | That’s exactly right. That’s exactly right. So what we talked about previously holds. You want to think about these potential influences on the treatment and the outcome. You’re absolutely, absolutely right. If there was an influence on the treatment, then of course you’re not dealing with a random situation. You have to think carefully about the design of these natural experiments. |
00:38:09 | DT: | I suppose this is why researchers really like twins as the kind of natural experiment, especially for longitudinal data. |
00:38:16 | HW: | Yes. |
00:38:17 | DT: | Because so much is held equal between them, right down to genetics, provided that there’s some measurable difference. Okay, so we’ve talked about using randomness, finding natural experiments. We’ve talked about some of the naïve ways that people look at causal effect and causal inference and how selecting the right expert can help you to mitigate that as a litigator. Tell me about some of the other causal inference techniques that you might use as a witness to measure a causal effect, to estimate a causal effect that we might not have talked about yet. |
00:38:51 | HW: | Yeah. So there’s quite a few causal inference techniques out there, but I would say that I wouldn’t think of them as a menu in the sense that you have a case and you say, “well, I could try any of these.” It’s more of a strange sort of menu where what’s available ultimately depends on who’s ordering from the menu. Your circumstance may not be good for most techniques. You may not have the right data structure. So, there is definitely a marriage between techniques and the circumstances. It’s not an automatic process, but they all have some similarities conceptually. All of them deal with the notion that we have observational data. We can’t run experiments. So they have that in common. And as you said, you’ve got to have a counterfactual twin. So they have different types of twins. Say, for instance, the unit is a business. The twin might be a comparable set of companies that have a very similar trend in the outcome. So they have similar characteristics, but also a trend in the outcome before an event. So that could be your twin. Sometimes a twin could be your own company, could be a model of that company, so this is the classic event study setup that you see in securities cases, so that’s a type of causal inference technique. Sometimes your twin can be your neighbour or your peers. So there’s techniques there, there’s one called regression discontinuity design, and the idea is that you’ve got some sort of rule or some sort of boundary that separates individuals, let’s say, or businesses that are practically the same on either side of that boundary, and it’s almost like a quasi-random situation, so you could exploit natural boundaries. So there are all different types of creative techniques you can think about for developing a twin. But they all have that sort of concept of a twin. And again, they’re doing that to address those confounders, those invisible characteristics. So they all have that in common. And then finally, they all have different sets of assumptions, which you need to make in order to make a causal conclusion. |
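Of the techniques Hans lists, regression discontinuity design is perhaps the easiest to sketch. The data below are entirely invented; the idea is simply that units just either side of a rule-based cutoff are assumed to be comparable, so the jump in the outcome at the cutoff estimates the causal effect of the treatment the rule assigns.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000

# A rule assigns the "treatment" to everyone whose score crosses a cutoff.
score = rng.uniform(0, 100, size=n)
cutoff = 50
treated = score >= cutoff

true_effect = 4.0
outcome = 0.1 * score + true_effect * treated + rng.normal(0, 2, size=n)

# Compare units in a narrow window either side of the cutoff:
# fit a line on each side and measure the jump at the boundary.
window = 10
left = (score >= cutoff - window) & (score < cutoff)
right = (score >= cutoff) & (score < cutoff + window)

fit_left = np.polyfit(score[left], outcome[left], deg=1)
fit_right = np.polyfit(score[right], outcome[right], deg=1)
jump = np.polyval(fit_right, cutoff) - np.polyval(fit_left, cutoff)
print("estimated effect at the cutoff:", round(jump, 2))   # close to 4
```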
00:40:48 | DT: | Now, you said that which of these techniques might be available to you really depends on the data that is available concerning your case and the structure of that data. And in your article, you say it’s important for lawyers to understand data early in a case. Now, I sympathise with litigators, everyone’s always telling them to think about X or Y early in the case, they have to think about everything early in the case if you listen to everyone. Make your case for why data is so important and why it’s something that litigators need to be thinking about very early in their case. |
00:41:20 | HW: | Yeah, look, it goes back to something that we talked about earlier: it boils down to risk, ultimately. There’s a risk of a missed opportunity, and there’s a risk that the other side may employ some robust methods. When I say you think about data early in a case, it’s a little bit broader than that, in that you should think about the causal question early. So what I mean by that is you need to think about, well, what is the treatment and what is the effect? And be precise about it. If we’re saying that your class has been damaged, for instance, what does that mean? Are you talking about a reduction in health outcomes, or if it’s a business, is it a reduction in sales? So you have to be very precise, first of all, and think about that early. We’ve talked about this notion of a map of causal structure, so think about the variables, the influences. And then the next step is to think about, well, with the benefit of having a causal inference framework, what would be some ideal settings? So without looking at the data, think about, well, what sort of context could set up these random or quasi-random situations? So for instance, we talked about misleading advertising. Ideally, you might have the advertising in question rolled out in one spot and not in another. So we’ve created some variation, and it’d be even better if the decision to roll it out in the first instance to, say, a pilot group was random. So I always start by thinking, well, what would be the ideal situation? And then that allows you to say, okay, well, what data is actually available? And without that causal inference framework, I don’t have that sort of third step. I think it’s just useful to engage in a little bit of exploration and imagination. So that’s what I mean by thinking about data early in the case. You mentioned earlier in misleading advertising that you’ve looked at sales before and after. You don’t want to go to the end of the case and then learn that, well, you did actually have data on another market, a similar market, where there was no advertising – data that, with that one additional question, would have allowed you to make potentially quite a strong causal inference statement – but you didn’t ask the question. So that’s what I mean by a missed opportunity. If you have that right mindset – in that case the data was available, in other cases it might not be. But the other thing to keep in mind from a risk standpoint is that, again, there’s a lot of data that’s freely available out there now, there’s administrative datasets. We have social media. So to get back to misleading advertising, you could take a look at what people are saying about your product, and you’ve got that data on sales, and you come up with a strong causal inference analysis. It’s almost like a preemptive attack. You could deal with those other opinions that are coming from a different data source. |
00:43:57 | DT: | Absolutely, and I suppose a lot of litigators at a pleading stage, they’re thinking about causation because it’s an element of the case; they need to be pleading causation in their originating process. They need to have some proper basis, or at least it has to be plausible in order to plead it, but what you’re talking about is thinking about the data that might support that conclusion at an earlier stage. And I suppose, and you say this in your article, it can unlock case strategies or case theories that weren’t available to you before. I think with more traditional forms of evidence, speaking to that witness or finding that document in the discovery bundle, that reveals something that makes an entirely separate cause of action available, or, you know, I’m thinking about an extreme example here, but evidence that you uncover in discovery that suggests some intentionality or malice behind the actions of the defendant, which opens up the door to exemplary damages or special damages. That kind of evidence changes the way you plead the case and the way you present it. You’re talking about, well, hey, there’s an opportunity to do that with data-driven evidence as well. |
00:45:05 | HW: | Oh, absolutely. I think what causal inference does is give you some generic research designs that, if you can pull them off, allow you to make very strong causal conclusions. And the implication of that, when I say think about data early, is just to ask, with that in mind: well, what data is available? Can we get close? And it is worth exploring quite early, because it may turn out to be a very strong form of evidence. Now again, not always, but in the age of big data – both data coming from your client and publicly available sources – I think it’s going to be harder to ignore. |
00:45:46 | DT: | During our conversation today, we’ve referred to a few possible examples of the kinds of cases where this exercise of causal inference, the use of data, might be particularly relevant. We’ve talked about false advertising cases, securities cases. What are the kinds of cases that you see in your practice tend to involve this kind of causal inference exercise and what sort of data do listeners need to be looking for, gathering, thinking about in those kinds of cases? |
00:46:16 | HW: | Yeah, so you’ve hit on some of the cases that I think are applicable, but it’s really not the case genre, it’s really the circumstances of your case. So, from a data standpoint, again we’ve talked a little bit about this before, but I think being on the lookout for those natural experiments where you’ve got randomisation, that’s a good place to start, irrespective of your case. But more generally, you know, I don’t like to give the answer “it depends” – I mean, that’s the correct answer – but as a general matter, where you’ve got datasets such as, you know, consumer data, marketing data, we talked about securities, high-frequency data, prices on the competition side, and then you’ve also got big administrative datasets and health datasets – really rich data in terms of not only cross-sectional detail, so details on units that can give you a lot of potential, but also data over time. And really, one of the most versatile datasets is where you track individuals over time, so you combine those two features, and that’s called panel data. That allows you to construct some of those twins we were talking about earlier, where you can find a group of individuals or businesses that have a common trend before a treatment. So that panel data structure is particularly versatile for these types of cases. So it’s not really a case genre, but it comes down to the circumstances where there’s randomisation, and again, where you’ve got fairly rich datasets. |
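As a rough illustration of the panel-data “twin” idea – not a method drawn from the paper itself, and with entirely invented sales figures – the sketch below tracks businesses over time, picks the control units whose pre-event path most closely matches the affected business, and uses their average post-event path as the counterfactual.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
units, periods, event = 20, 12, 8      # 20 businesses tracked over 12 quarters; event at quarter 8

# Panel data: one row per unit per period (long format).
base = rng.normal(100, 10, size=units)
trend = rng.normal(1.0, 0.5, size=units)
rows = []
for u in range(units):
    for t in range(periods):
        sales = base[u] + trend[u] * t + rng.normal(0, 1)
        if u == 0 and t >= event:
            sales -= 15                 # unit 0 suffers the alleged harm of 15 per quarter
        rows.append({"unit": u, "quarter": t, "sales": sales})
panel = pd.DataFrame(rows)

# Find "twins": control units whose pre-event path is closest to the treated unit's.
wide = panel.pivot(index="unit", columns="quarter", values="sales")
pre = wide.loc[:, :event - 1]
distance = ((pre - pre.loc[0]) ** 2).sum(axis=1).drop(0)
twins = distance.nsmallest(3).index

counterfactual = wide.loc[twins, event:].mean()          # what unit 0 "would have" looked like
estimated_loss = (wide.loc[0, event:] - counterfactual).mean()
print("estimated per-quarter loss:", round(estimated_loss, 1))   # in the ballpark of the built-in -15
```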
00:47:37 | DT: | I’m just thinking back to so many of the cases I’ve worked on in my career in the context of our conversation today and thinking how many of them probably could have benefited from this causal inference exercise. Things like an acquiring company making a material change in the target company before the end of the earn-out period, and its revenue declines; or a landlord failing to fix an issue the tenant raises in a restaurant, and the restaurant’s revenue declines; or even – some of our listeners, I’m sure, work in personal injury torts – the many causation questions around potential earning capacity in the future, the relationship between the injuries or capacity of the plaintiff and other characteristics that person has, independent of the injury they suffered. There are so many things I can think of now where I would, with the benefit of hindsight, think about the kinds of causal inferences that I would draw. If there was one tip that you would want to leave our listeners with, something to think about the next time they’re preparing a case that will help them to remember, “I should consider how I might measure this causal effect” or “I should consider how I might persuade a court that there is a strong causal relationship here” – what would that tip be? |
00:48:56 | HW: | Yeah, I mean, perhaps it’s the point from when we first started the podcast, which is to be conscious that when you’re looking at data, we mustn’t project biases or aspirations for a case onto that data, and it’s more of a mindset where you’re constantly thinking, “well, what am I missing?” You can go in saying, “100%, I’m missing something here; I’m not seeing the full story.” Every data point that you look at is just one window into reality, and I think it’s just good to keep in mind that there are these lurking variables that can really mess with the conclusions that you think you have. So that would be my message, and I think, particularly if you’re starting out as a lawyer, given where we are in terms of the explosion of data that’s available – we’ve got exciting data analytics areas and there’s also now the rise of AI, and we talked a little bit about causal discovery and the applications there – I think it is important for a young lawyer starting out to learn about the basics of data analytics. You don’t need to know a lot, but I think it would help you pose the right questions early on. And one of the things you can do is, you can obviously read litigation handbooks and business statistics books, but you can also invite experts to law firms to give seminars, and I think that’s quite healthy for starting to develop that intuition. |
00:50:11 | DT: | That’s a great tip. And I think your takeaway about having that inquiring gaze at a new matter is a really good one. We owe it to our clients to do that. I think every lawyer has maybe experienced this phenomenon as an advocate for our clients. We sometimes have a new client come in and say to us, “defendant X did this, then this bad thing happened to me. Get me justice, get me redress.” And as our client’s advocate, as someone standing in their corner, we’re quick to draw a relationship between those two things. What you’re saying is “well, we owe it to our clients to slow down and think, hang on, are those two things related or not?” And I think doing that not just helps us make sure that we’re being appropriately critical about our own cases, but, as you said, can open up new case strategies, can help us find more persuasive arguments than we otherwise might have had, so that’s a great tip. Well, Hans Weemaes, thank you so much for joining me today on Hearsay. |
00:51:16 | HW: | Thanks very much. It’s been a lot of fun. Thank you. |
00:51:17 | DT: | As always, you’ve been listening to Hearsay the Legal Podcast. I’d like to thank my guest today, Hans Weemaes, for coming on the show. Now, as we mentioned in the episode, if you want to learn more about the role of data and statistics in the law, especially in litigation, check out our episode with Nicholas Lennings and John-Henry Eversgerd. That’s episode 14, way back in the archive, called ‘Statistics in Adjudicative Fact-Finding’. If you’re an Australian legal practitioner, you can claim one continuing professional development point for listening to this episode. Whether an activity entitles you to claim a CPD unit is self assessed, as you know, but we suggest this episode entitles you to claim a professional skills point. For more information on claiming and tracking your points on Hearsay, go to our website. Hearsay the Legal Podcast is brought to you by Lext Australia, a legal technology company that makes the law easier to access and easier to practise, and that includes your CPD. I’d like to ask you a favour, listeners. If you like Hearsay the Legal Podcast, please leave us a Google review. It helps other listeners to find us, and that keeps us in business. Thanks for listening, and I’ll see you on the next episode of Hearsay. |