234 | 👀 💥 You're missing out

Brainyacts #234

It’s Friday. I am sitting in the back of a cab as I write this. Sometimes we overlook the things that once seemed incredible and now have become unremarkable. Cruising in a cab while connecting to the internet. How boring. I wonder when GenAI will become boring and unremarkable in our lives? Right now, it is stunning and I hope you are not missing out!

Onward 👇

In today’s Brainyacts:

  1. AI predicting case outcomes

  2. Try NotebookLM now. Seriously, go try it!

  3. Meta unleashes serious AI advancements and other AI model news

  4. Law students versus ChatGPT and more news you can use

    👋 to all subscribers!

To read previous editions, click here.

Lead Memo

⁉️🙋 AI Predicts Case Outcomes

Two posts on LinkedIn caught my eye recently and I want to share them here. They are about this research and paper.

The first is from Prof. Felix Steffek, one of the authors of the research. This is taken from his LinkedIn post, which you can find here.

“Can we use AI to predict how a court will decide a dispute? Yes, we can – but how good will the prediction be and will it be better than the prediction of legal experts? We are working on these questions and have just published our first paper focussing on decisions of the UK Employment Tribunal. To the best of our knowledge, this is the first prediction paper on UK court decisions since the advent of generative AI.

We found that both AI and employment law experts are pretty good in predicting what the Employment Tribunal will decide. The task for both was to predict on the basis of the facts and the claims made whether the claimant will win, partly win or lose, or whether the Tribunal will make another decision such as asking for more evidence. On the AI side, we assessed different models (BERT, T5, GPT-3.5 and GPT-4) including different types of prompting for the generative models.


The human experts achieved an overall F-score of 67% while the best AI model (T5) came back with an F-score of 56%. GPT-4 got remarkably close to that with an F-score of 55%. When the ‘win’ and ‘partly win’ results are aggregated, the F-score of the human experts rises to 81% while T5 and GPT-4 achieve F-scores of 71% and 66%, respectively.

The Recall score is a useful measure if the costs of missing a true positive are high. In the context of court proceedings, this is the case when the opportunity cost of not initiating likely successful litigation is high, for example, if the expected remedy has a high monetary value or otherwise has a high relevance for the potential claimant (e.g., for emotional reasons). Some of these scores were impressive, such as the human’s Recall score of 82% for ‘wins’ and BERT’s Recall score of 83% for ‘wins’.

This is just the start of our work in this area and our paper engages with the limitations of the results. We discuss issues such as restrictions, possible errors and information leakage in the input data as well as the terminology of prediction v classification. In particular, all scores are baselines only. More effort and better conditions could have been applied on both the human and the machine side. Hence, great care should be taken when drawing conclusions from the results.”

For a quick summary of the paper findings, I am sharing Sam Burrett’s post. Sam is AI Lead at MinterEllison in Sydney.

“Here are the key points:

• Dataset: 14,582 UK Employment Tribunal cases
• Human baseline: Two PhD candidates in UK employment law assessed 1,371 cases
• Predictions classified as "claimant wins," "loses," "partly wins," or "other"
• AI models tested include BERT, T5, GPT-3.5, and GPT-4

The findings are fascinating:

• Best AI model F-score: 56%
• Human experts F-score: 67%

An F-score is a measure of accuracy. It balances finding what you’re looking for (true positives) while avoiding mistakes (false positives). This matters because it reflects real-world usefulness of legal predictions.

In short, AI is quite good at identifying likely case outcomes - but doesn't match human experts yet.

What surprised me was GPT-4's accuracy scores. No fine tuning, no RAG, no Chain of Thought. And the ablation analysis suggests it’s getting 7/10 predictions right out-of-the-box. That’s pretty wild.

Here’s bottom line:

This isn't an AI ceiling; it's a baseline. And that's exciting for AI in law - and potentially for access to justice too.”
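For readers who want to see what sits behind an F-score, here is a minimal Python sketch. It computes precision, recall and the F1 score for a single outcome class. The counts are made up for illustration and are not taken from the paper, and the function name is my own. The paper scores four outcome classes and may average across them in a way this sketch does not show.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for a single outcome class."""
    predicted_pos = true_pos + false_pos   # everything predicted as this class
    actual_pos = true_pos + false_neg      # everything that truly was this class
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a "claimant wins" class (illustrative only)
precision, recall, f1 = precision_recall_f1(true_pos=80, false_pos=40, false_neg=20)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# prints: precision=0.67 recall=0.80 F1=0.73
```

In this made-up example, recall is high (few real wins are missed) while precision is lower (some predicted wins were not wins). The F1 score lands between the two.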

🙏 Thank you to both Felix and Sam!

Spotlight

🔥🤯 Google’s NotebookLM - It’s time you tried it! Seriously!

I’ve talked about NotebookLM many times. Recently it has been gaining some incredible features. People are still getting blown away by the Audio Overview feature that turns documents into a podcast with two hosts discussing the content. It’s incredible.

I made a short YouTube video of it. https://youtu.be/ozfZcfmXW68

Let’s quickly revisit what NotebookLM is. NotebookLM is an experimental AI-first notebook from Google Labs that uses language models to help users gain insights from their existing content. It functions as a virtual research assistant that can summarize facts, explain complex ideas, and brainstorm new connections based on the sources the user selects.

They just added new features: you can now upload YouTube video URLs and audio files directly to NotebookLM, in addition to Google Docs, PDFs, text files, Google Slides and web pages. You can also now share your Audio Overviews (the podcasts).

Hey, how about those dreadful and boring client alerts or law firm blogs? Why not try Audio Overview on those?

This tweet was posted by one of the Google directors working on NotebookLM. It is directed at students (which is terrific). But I also thought this could be helpful for working professionals!

🔥 🔥 How might you use this workflow to get more out of your meetings?

AI Model Notables

Meta AI is getting more useful. At Meta Connect earlier this week, Zuckerberg announced a ton of impressive AI upgrades, including voice and image understanding in Meta AI, four new Llama models, and Voice Mode (similar to ChatGPT’s recent Advanced Voice Mode), which lets users talk with Meta AI on Messenger, Facebook, WhatsApp and Instagram DMs. And of course, there was a crazy cool AR glasses prototype.

Microsoft announced a series of artificial intelligence security, privacy and transparency capabilities, including a new feature in its Copilot AI chatbot designed to give more visibility into the components of AI responses.

OpenAI to remove non-profit control and give Sam Altman equity. Several key executives — including CTO Mira Murati — announced they were departing OpenAI as the firm finalizes a massive new funding round and the nonprofit-controlled company edges toward for-profit status.

ChatGPT's human-like Advanced Voice Mode feature can say “Sorry I’m late” in over 50 languages — but the EU is reportedly turning a deaf ear to OpenAI's plea because it can detect user emotion.

OpenAI is floating plans for 5 gigawatt data centers that would each use as much energy as entire cities. The pitch? It's essential to keep the U.S. ahead in AI and compete with China. Energy execs are skeptical, to put it mildly. One called it "not only something that's never been done, but I don't believe it's feasible as an engineer."

Hugging Face reached 1 million free public AI models on its platform, highlighting the trend towards specialized models for diverse use cases rather than a single dominant model.

News You Can Use:

Mizzou professor trained a chatbot to go up against her law students in a mock negotiation. Coaching the bot to be obstructive, aggressive and difficult presented the perfect opportunity for her students to perform a live negotiation with the chatbot with the goal of coming to an agreement on obtaining documents.

An Aussie university lecturer pitted his law students against AI. The study found that GenAI performed below the student average in questions that required detailed legal and critical analysis. However, all GenAI papers performed better than students in open-ended questions and essay writing tasks. Link to PAPER.

After a decade of explosive growth, body cameras are now standard-issue for most American police as they interact with the public. The vast majority of those millions of hours of video are never watched — it's just not humanly possible – but it is AI possible.

Australia - Victoria’s child protection agency has been ordered to ban staff from using generative AI services after a worker was found to have entered significant amounts of personal information, including the name of an at-risk child, into ChatGPT.

France to host AI Action Summit in Feb. 2025. Here, France’s special envoy for AI Anne Bouverot discusses the need for a global AI framework.

AI model identifies existing drugs that can be repurposed for treatment of rare diseases.

Who is the author, Josh Kubicki?

Some of you know me. Others do not. Here is a short intro. I am a lawyer, entrepreneur, and teacher. I have transformed legal practices and built multi-million dollar businesses. Not a theorist, I am an applied researcher and former Chief Strategy Officer, recognized by Fast Company and Bloomberg Law for my unique work. Through this newsletter, I offer you pragmatic insights into leveraging AI to inform and improve your daily life in legal services.

DISCLAIMER: None of this is legal advice. This newsletter is strictly educational and is not legal advice or a solicitation to buy or sell any assets or to make any legal decisions. Please be careful and do your own research.