Building Trust in Software-Generated Evidence in Criminal Justice

Introduction

Last week, I attended an international seminar hosted by the Financial Intelligence Analysis Unit (FIAU), where we touched on the use of technology and blockchain analytics in investigations, asset freezing, and the eventual prosecution of crypto cases. Interestingly, over the past month I had also been closely following the UK consultation on software-generated evidence in criminal proceedings.

Truth be told, both experiences left me with the same lingering thoughts and echoed some of my long-held concerns. It became clear to me that we are now at a pivotal point in legal and regulatory practice, where software-generated outputs, whether from AI, analytics tools, or legacy accounting systems, are becoming central to criminal investigations and prosecutions. And yet the legal framework has not caught up. In many cases, we are still treating machines as infallible witnesses, even when their inner workings are opaque.

Remember the Post Office Horizon scandal? Hundreds of sub-postmasters were wrongly accused of theft, fraud, and misconduct. Their “accuser”? A software system named Horizon. It logged accounting shortfalls that didn’t exist, allowed remote tampering, and often couldn’t explain itself. Yet, it was treated as gospel.

People lost everything, jobs, reputations, their freedom, because no one asked the obvious question: is this system accurate? Courts presumed it was. The presumption of reliability, a leftover from an era when computers were rare and simple, turned into a weapon. And those sub-postmasters? They paid for that assumption in ways no one should.

Beyond Horizon: In United States v Sharma (D NJ, 2022), blockchain clustering software linked addresses to criminal activity. The evidence stood, but the court made one thing clear: the software’s inner workings must be open to scrutiny.  

The same tension arose in the Bitcoin Fog case, United States v Sterlingov. Prosecutors relied on Chainalysis Reactor, and defence attorneys pushed back, calling it a black box. The judge allowed the evidence, but not without acknowledging the concern.

In United States v Catala (SDNY, 2023), a case not about software per se, the court reminded everyone that assumptions aren't enough. Evidence must be specific, traceable, and logically connected. That's just as true for zeros and ones as it is for fingerprints.

Meanwhile, Matter of Weber (2024 NY Slip Op 24258 (Sur Ct, Saratoga County)) presented a surreal scenario: an expert relied on Microsoft Copilot to assess a trustee’s accounting... and was unable to explain how it reached its conclusions. The judge ended up interrogating the chatbot himself.  

Then there’s Mata v Avianca, Inc (2023 US Dist LEXIS 97638 (SDNY)), where an AI-generated affidavit cited non-existent legal precedents. The court threw it out. “Credibility,” the judge said, “cannot be AI-generated.”  (my emphasis)

In Europe, frameworks like the GDPR, the EU AI Act, the AML Regulation (Article 76), and the ECHR already address black-box dependency by emphasising transparency and adversarial rights.

Learning From Safety-Critical Industries: Here's something we can learn from industries where lives are on the line every day. Aviation software? Nuclear control systems? They don't get deployed unless they've been tested, certified, monitored, and scrutinised from every angle. There's an entire discipline built around "assurance cases": structured arguments demonstrating that a system is safe within its intended scope.

Why should evidence-generating software be any different? Imagine a blockchain analysis tool like Chainalysis Reactor as the modern equivalent of a radar gun. If a radar gun is off by 10%, that's a ticket overturned. But with software tracing digital assets or inferring guilt, we often don't even know its error margin. We just trust the output because it looks precise. Trust, however, is not a substitute for testing. Ultimately, who are we trusting, and on what basis?

So how do we move forward? It starts with treating the tool itself as a piece of evidence. Whether it's AI, blockchain analytics, or legacy code, if it's used to investigate, detect, or prosecute crimes, its outputs must be tested, assessed, and verified by an independent party, not just for accuracy but also in light of how the tool was built, tested, and deployed.

Borrowing from a model proposed by Marshall et al. and supported by the Safety-Critical Systems Club and the British Computer Society (BCS), here is a two-stage system every jurisdiction should consider:

Stage 1: Disclosure and Assessment  

  • What version was used?  
  • What bugs were known at the time?  
  • How has the tool been calibrated, certified, or audited?  
  • Is it operated by certified professionals?  

Stage 2: Evidentiary Challenge 

If doubts arise, the party relying on the tool must demonstrate the credibility of its outputs. Courts then consider whether the software’s complexity, update history, or methodology compromises its reliability in the case.  
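To make the Stage 1 items concrete, here is a minimal, purely illustrative sketch of what a structured disclosure record for a forensic tool might look like as a simple data structure. The class name, field names, and example values are my own assumptions for illustration; they are not drawn from Marshall et al. or from any existing standard.

```python
# Illustrative only: a hypothetical "structured disclosure" record covering the
# Stage 1 questions (version used, known defects, calibration/audit history,
# operator certification). Names and values are assumptions, not a standard.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForensicToolDisclosure:
    tool_name: str                                                 # e.g. a blockchain clustering tool
    version: str                                                   # exact version that produced the output
    known_defects: List[str] = field(default_factory=list)         # bugs known at the time of use
    calibration_records: List[str] = field(default_factory=list)   # calibration / certification / audit references
    operator_certifications: List[str] = field(default_factory=list)  # credentials of the people who ran it

    def gaps(self) -> List[str]:
        """Return the Stage 1 items that are missing and would invite a Stage 2 challenge."""
        missing = []
        if not self.version:
            missing.append("version used")
        if not self.calibration_records:
            missing.append("calibration / certification / audit history")
        if not self.operator_certifications:
            missing.append("operator certification")
        return missing


# Usage sketch: an incomplete disclosure immediately shows what Stage 2 would probe.
disclosure = ForensicToolDisclosure(
    tool_name="ClusteringToolX",  # hypothetical tool name
    version="4.2.1",
    known_defects=["address co-spend heuristic over-merges in edge cases"],
)
print(disclosure.gaps())  # ['calibration / certification / audit history', 'operator certification']
```

The point of the sketch is simply that disclosure can be made structured and checkable: if the record has gaps, the party relying on the tool knows exactly where its credibility will be tested.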

Essentially, we hold a trial of the tool before we let it accuse anyone else. Software reflects the flaws of its creators; it is not infallible. And when it fails, someone pays the price. In criminal cases, that price is a heavy toll on both justice and public trust.

This is why we must:  

  • Certify developers whose tools end up in court.  
  • Require independent oversight and structured disclosure for forensic tools.  
  • Demand reproducibility and calibration data, just like we would from any expert. 

Conclusion

Software-generated evidence can be immensely powerful if it is used responsibly and with due process. But that means no shortcuts, no rubber-stamping, and no shrugging off questions because the tool comes from a big-name vendor. Calibrated trust isn't about cynicism; it's about care. And in criminal justice, being careful isn't a luxury. It's a duty.

References 

  1. Matter of Weber, 2024 NY Slip Op 24258 (Sur Ct, Saratoga County).
  2. Mata v Avianca, Inc, 2023 US Dist LEXIS 97638 (SDNY).
  3. United States v Sharma, No 2:22-cr-00321 (D NJ 2022).
  4. United States v Sterlingov, No 1:21-cr-00309 (D DC 2024).
  5. Illinois Supreme Court, 'Policy on Artificial Intelligence in the Courts' (2024) (https://ilcourts.gov).
  6. Paul W Grimm and Maura R Grossman, 'Potentially Fabricated or Altered Electronic Evidence' (Advisory Committee on Evidence Rules, April 2024).
  7. Andrea Roth, 'Machine Testimony' (2017) 126 Yale LJ 1972.
  8. John P LaMonaga, 'Digital Evidence and the Deepfake Defense' (2020) 70 Am U L Rev 1027.
  9. Rebecca Delfino, 'Authenticating Deepfakes: A Framework for Digital Evidence' (2023) 75 Hastings LJ 601.
  10. Marshall et al., Assurance Cases for Forensic Software (Safety-Critical Systems Club, 2022).

Article by Dr Ian Gauci

Disclaimer: This article is not intended to impart legal advice and readers are asked to seek verification of statements made before acting on them.