203 | Apple AI. Stanford vs. Westlaw. Copilot makes impact
Brainyacts #203
It's Tuesday. This edition is longer than usual. This is not vanity; it's because there is a lot going on and much to share. I try to be direct and pithy when breaking everything down, so hopefully you will take an extra minute or two to read it.
Here we go.
In today's Brainyacts:
AI-powered legal research: a problem?
Apple joins the AI party (a lot to unpack so a bit lengthy)
UK's Ashurst GenAI study and more news you can use
👋 to new subscribers!
To read previous editions, click here.
Lead Memo
WHY YOU SHOULD READ THIS: Does your legal research tool provide you with accurate and factual responses? This seems to be a fair and logical question. But it is not. Two groups have been battling it out in the public as to how well these tools perform.
Stanford vs. Thomson Reuters on AI in Legal Research: The Battle of Unrealistic Expectations
Large language models (LLMs) like OpenAI's GPT-4 have brought transformative changes to various fields, including legal research. Despite their amazing capabilities, these tools are often criticized for their occasional inaccuracies, or "hallucinations." Regular readers know my take on so-called "hallucinations" - that they are a feature, not a bug. Regardless, this critique has sparked a debate between Stanford University researchers and Thomson Reuters.
The Stanford Study: Exposing Limitations
Leading legal research products "hallucinate" at high rates. A recent study by Stanford University researchers, hailed as the first "preregistered empirical evaluation of AI-driven legal research tools," tested products from major legal research providers and compared them to OpenAI's GPT-4 on over 200 manually constructed legal queries (which they have not released yet). The findings were stark: while LLM-based legal tools performed better than general-purpose chatbots, they still exhibited a significant rate of hallucinations, ranging from 17% to 33%.
The researchers noted that legal queries often lack a single clear-cut answer and that deciding what to retrieve can be highly complex. This complexity often leads to inaccuracies, as the retrieved documents may not always apply appropriately to the context of the legal case. It is worth noting that the researchers categorized a response as inaccurate even if it got four points of law right but misstated a fifth. I note this because I would not necessarily disregard such a response as though it contained zero value. Moving on.
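To make that scoring point concrete, here is a minimal sketch, in Swift with invented data, of the gap between the study's strict all-or-nothing scoring and a claim-level view of the same responses. The Response type and the sample numbers are my own assumptions for illustration, not anything from the study.

```swift
// Strict vs. claim-level scoring of legal research responses (hypothetical data).
struct Response {
    let claims: [Bool]  // true = factually correct point of law
}

let responses = [
    Response(claims: [true, true, true, true, false]),  // 4 of 5 points correct
    Response(claims: [true, true, true]),               // fully correct
    Response(claims: [false, true]),                    // 1 of 2 points correct
]

// Strict scoring (as in the study): one misstated point taints the whole response.
let strictAccurate = responses.filter { $0.claims.allSatisfy { $0 } }.count
let strictRate = Double(strictAccurate) / Double(responses.count)

// Claim-level scoring: fraction of individual points of law that are correct.
let allClaims = responses.flatMap { $0.claims }
let claimRate = Double(allClaims.filter { $0 }.count) / Double(allClaims.count)

print("Strict accuracy: \(strictRate)")       // ~0.33 - two responses count as hallucinated
print("Claim-level accuracy: \(claimRate)")   // 0.8 - most points of law were right
```

Under strict scoring, two of the three sample responses count as hallucinated; at the claim level, 80% of the individual points of law are correct - which is exactly the nuance the all-or-nothing number hides.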
Thomson Reuters' Rebuttal: Defending Practical Use
In response, representatives from Thomson Reuters defended their products. They acknowledged that while no Gen AI tool could deliver 100% accuracy, their internal testing showed significantly lower rates of hallucinations compared to the Stanford findings.
Mike Dahn, head of Westlaw Product Management at Thomson Reuters, emphasized that their tools are designed to enhance legal research, not replace the judgment of lawyers. He suggested that the discrepancies in the study might be due to the inclusion of atypical query types that their products are not usually designed to handle. Dahn also pointed out that their tools are tested rigorously with real-world legal questions, and customers find them invaluable, even if they occasionally produce inaccuracies.
The Core Issue: Misunderstanding and Misusing AI
The debate between Stanford researchers and Thomson Reuters underscores a broader issue: the unrealistic expectation of 100% factual accuracy from LLMs. This standard is not applied to other research tools, whether they be web search engines, books, or even expert mentors.
The question then arises: why are LLMs held to such an unattainable standard? A more challenging question: why are these tools marketed in a manner that might lead some to believe they do not hallucinate at all?
Several factors are at play here:
Technological Hype - Companies want to sell Panaceas, not Piecemeal: The rapid advancements and marketing of AI technologies often create exaggerated expectations. LLMs are frequently portrayed as revolutionary, leading users to believe in their infallibility. Marketing these tools is challenging. Revisit Dahn's point above about the researchers using atypical search queries. Nobody wants to sell by saying "these tools are amazing, you just have to know the right way to ask questions and the right questions to ask."
Misunderstanding and therefore misapplying AI: Most users lack a deep understanding of how LLMs function. When LLMs are integrated into tools users already know and have formed habits and behaviors around, users bring their expectations for what the core tool should do. They are not going to automatically change their behavior to fit how the AI works - they are just going to go about doing their work as they always have. Working with generative AI requires a different thought process from the user. Without it, most will get less useful and likely inaccurate responses.
Replacing or Augmenting Human Expertise: Everyone is in a race to replace humans with AI - either to justify their fears about this tech or to sell AI as a key driver of the future of work. Regardless, people often compare LLMs to human experts, expecting them to exhibit similar reliability. I argue, and my experience bears out, that LLMs are about as reliable as humans. But humans, even expert ones, err, and their advice is always taken with a degree of skepticism and cross-checked. Why not do the same for AI?
Historical Context: Remembering Legal Research Before LLMs
To flesh this out a bit more, it's helpful to understand the historical context of legal research. Before digital databases like Westlaw and LexisNexis, legal professionals relied on manual methods, such as "Shepardizing" cases, a laborious and error-prone process of checking the validity and treatment of legal precedents. I know because I am old enough to have had to learn to do this in law school - visiting the stacks in the library, chasing down every conceivable weakness in my case law's precedent. Even with the introduction of digital databases, mastering complex query techniques was challenging, and one often had to seek assistance from research librarians to ensure accuracy. Despite these efforts, inaccuracies persisted.
The point: legal research has always had subjective nuances that make one person's research stronger than, or simply different from, another's.
The other point: LLMs should be viewed as powerful tools that augment human capabilities rather than replace them. But saying this out loud is not as simple or as sexy as the marketers of these tools would like.
The advice to use these tools as supplements, as companions to human judgment, is not a disclaimer to mitigate their inaccuracies but a realistic portrayal of their best use case. Just as legal databases transformed research by providing easier access to information, LLMs enhance our ability to process and reason over data quickly and efficiently. But they are just one more tool, not the one tool.
The Path Forward: Embracing Imperfection
Reminder to everyone: we are still in the early days of generative AI and LLMs.
Despite their imperfections, LLMs represent a significant advancement in legal research. They provide faster, more comprehensive access to information and can assist in identifying relevant cases and statutes that might otherwise be overlooked. The call for transparency and benchmarking, as highlighted in the Stanford study, is essential. Legal professionals need to understand the strengths and limitations of these tools to use them effectively. By setting realistic expectations and focusing on continuous improvement, we can leverage LLMs to their full potential without falling into the trap of expecting perfection.
Oh, and let's not vilify the AI model for "hallucinating" - rather, let's be smarter, more capable users of these tools right now. There will come a day when this burden is lifted from us, but we are not there yet. And the companies selling us these things should be bold and out front about this.
Spotlight
WHY YOU SHOULD READ THIS: Put aside the consumer-facing updates below and note the innovation around privacy and security for generative AI - Apple may have just given us a pathway to reliably private and secure interactions with these AI models.
APPLE JOINS THE AI PARTY
For this edition I am combining the Spotlight and AI Model News sections. It is Apple week so there is much to cover.
All this week, Apple is holding its annual WWDC event, which is geared toward developers but always packs some major releases and updates. Yesterday's keynote was a cinematic (kinda goofy) production that came with plenty of updates and news.
There will be more news and analysis coming, but I will sum it up like this: If you are a giddy Apple consumer hoping for nifty upgrades and snazzy AI gizmos - this was for you. If you are looking for a serious AI powerhouse productivity suite from Apple - you will be waiting a bit (but it looks like it is coming if you read between the lines).
Read below or watch this 5-minute overview from Apple.
Siri Upgrades
An upgraded Siri will converse more naturally, remember context across requests, and accomplish more complex tasks by better understanding both voice and text.
Siri also gains "onscreen awareness," with the ability to take actions and utilize on-device info to better tailor requests to the individual user.
Controversially, Siri will also connect the user to ChatGPT (GPT-4o) for any requests it cannot handle. Users will be prompted for permission to do so.
New AI Features
New AI writing tools built into apps like Mail, Messages, and Notes will allow users to auto-generate and edit text.
Mail will utilize AI to better organize and surface content in inboxes, while Notes and Phone gain new audio transcription and summarization capabilities.
iPhones will let you record calls - and will notify everyone on the call, for their privacy.
AI-crafted "Genmojis" enable personalized text-to-image emojis, and a new "Image Playground" feature introduces prompt-based image generation.
Photos gains more conversational search abilities, the ability to create photo "stories," and new editing tools.
Privacy
A focus of the AI reveal was privacy - with new features leveraging on-device processing when possible and Private Cloud Compute for more complex tasks.
Private Cloud Compute (PCC) is Apple's new intelligence system built specifically for private AI processing in the cloud.
The new AI features will be opt-in, so users will not be forced to adopt them.
OpenAI Integration
Beyond Siri, there is more to how OpenAI and Apple partnered - see OpenAI's blog, which outlines additional ChatGPT tools like image generation and document understanding embedded into the new OS.
Now let's break down Apple's approach to AI
Apple's approach to artificial intelligence divides its AI tasks into three groups: on-device, private cloud compute, and third-party model inference.
On-Device AI: Making Devices Smarter: Apple is adding a small, fast AI model to future iPhones that can understand commands and interact with apps without needing the internet. For example, you can ask Siri to find the best weather day this week at a specific location, plan a one-hour hike for you, and set up an Uber pickup to get you there, and it will handle everything. This AI runs on Apple's own chips, not Nvidia's, making it quick and efficient.
Private Cloud Compute: Powerful and Secure: When the on-device AI can't handle a task, it sends it to Apple's private cloud - powerful computers in Apple's data centers. These centers use Apple's own hardware, ensuring everything is secure and efficient. This vertical integration - where Apple controls the whole process - helps keep costs down and improves privacy. See more below.
Third-Party AI: Integrating ChatGPT: Apple also allows users to use OpenAI's ChatGPT for certain tasks. This isn't about replacing Siri but about providing options. For example, say you want help capturing your ideas for an article and need some initial research questions; Siri will likely punt this to ChatGPT. But Apple will be watching, meaning it benefits by learning how people use ChatGPT. It will likely use this insight to improve its own AI models.
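As a rough mental model of this three-tier split, here is a minimal Swift sketch of the routing decision. The types, cases, and function are hypothetical - Apple has not published an API like this - but they capture the escalation logic just described, including the opt-in handoff to a third-party model.

```swift
// Hypothetical sketch of Apple's three-tier AI routing (not a real Apple API).
enum TaskComplexity {
    case simple      // intent parsing, short on-device commands
    case heavy       // large-context summarization, bigger models
    case openEnded   // broad world-knowledge requests
}

enum Destination {
    case onDevice             // small local model on Apple silicon
    case privateCloudCompute  // Apple's secure data-center hardware
    case thirdParty           // e.g. ChatGPT, only with user permission
}

func route(_ task: TaskComplexity, userApprovedThirdParty: Bool) -> Destination {
    switch task {
    case .simple:
        return .onDevice
    case .heavy:
        return .privateCloudCompute
    case .openEnded:
        // Third-party handoff is opt-in; otherwise stay inside Apple's cloud.
        return userApprovedThirdParty ? .thirdParty : .privateCloudCompute
    }
}

// Example: an open-ended request without user consent stays with Apple.
print(route(.openEnded, userApprovedThirdParty: false))  // privateCloudCompute
```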
Private Cloud Compute is what is getting many experts excited - and this matters to you.
This system is designed to ensure that even when your data is processed in the cloud, it remains as private and secure as if it were on your own device. Here's a simple explanation of how it works and why it's special.
What is Private Cloud Compute?
Think of it as a super-smart assistant that helps your device when it needs extra brainpower. But what makes PCC unique is how it keeps your personal data safe and private.
How Does PCC Work?
1. On Your Device: Normally, your Apple device handles most tasks itself. This keeps your data secure because it never leaves your device.
2. When PCC is Needed: For really complex tasks, like understanding a lot of data or running big AI models, your device asks PCC for help. This involves sending some of your data to Apple's data centers.
3. Super Secure Transfer: Before your data is sent to PCC, it is encrypted (turned into a secret code) so that no one else can read it, not even Apple. Only the special computers in PCC can decrypt (decode) it. (See the sketch after these steps.)
4. Processing the Task: The powerful computers in PCC process your request using advanced AI models. These computers are custom-built by Apple and run on special software designed for security.
5. Immediate Deletion: Once the task is completed, all your data is deleted from PCC. It's like it was never there. This ensures that there's no trace of your data left behind.
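To make the "super secure transfer" step concrete, here is a minimal Swift sketch using CryptoKit's authenticated encryption. This is not Apple's actual PCC protocol - the real system involves hardware attestation and key-release mechanisms well beyond this - it only illustrates the core idea that the payload is sealed on-device so that only the intended endpoint can open it. The session key here is a stand-in assumption.

```swift
import CryptoKit
import Foundation

// Hypothetical stand-in for a key negotiated with a verified PCC node.
// In the real system, establishing this key securely is the hard part.
let sessionKey = SymmetricKey(size: .bits256)

// Step 3: the device seals the request before it leaves the device.
let request = Data("summarize this 40-page agreement...".utf8)
let sealed = try! ChaChaPoly.seal(request, using: sessionKey)
let wireBytes = sealed.combined  // nonce + ciphertext + auth tag, safe to transmit

// Step 4: only a holder of the session key (the PCC node) can open it.
let box = try! ChaChaPoly.SealedBox(combined: wireBytes)
let decrypted = try! ChaChaPoly.open(box, using: sessionKey)
print(String(data: decrypted, encoding: .utf8)!)

// Step 5: after processing, the node discards plaintext, ciphertext, and key,
// mirroring PCC's "immediate deletion" guarantee.
```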
Why is PCC Special?
Unmatched Privacy: PCC extends the high level of security and privacy of Apple devices to the cloud. Your data is never accessible to anyone other than you, not even Apple employees.
Custom-Built Hardware: PCC uses custom-built hardware with Apple's own chips, ensuring tight control over the entire process.
No Privileged Access: Unlike traditional cloud systems, even Apple's engineers can't access your data. There are no back doors or special access routes.
Transparency and Verification: Security experts can inspect the PCC system to ensure it's working as promised. Apple publishes the software running on PCC so researchers can verify its security, as sketched below.
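In spirit, that verification resembles checking a software measurement against a published known-good value. Here is a toy Swift illustration - the build bytes and the "published" digest are invented for the example, and Apple's real mechanism rests on cryptographic attestation of the running software rather than a manual hash check.

```swift
import CryptoKit
import Foundation

// Toy example: does the software we observe match a published measurement?
// Both values are invented; a real check would pull the published digest
// from a transparency log of known-good PCC builds.
let publishedDigest = SHA256.hash(data: Data("pcc-node-build-2024.06".utf8))

let observedBuild = Data("pcc-node-build-2024.06".utf8)  // alter this to see a mismatch
let observedDigest = SHA256.hash(data: observedBuild)

print(observedDigest == publishedDigest
    ? "Measurement matches the published build"
    : "Mismatch - do not trust this node")
```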
News You Can Use:
→ Beyond anecdotal success: Copilot delivers value for Microsoft's corporate, external, and legal affairs (CELA) organization
→ Gen AI cut lawyers' drafting time in half, UK's Ashurst says (deep dive coming on this next edition)
→ 5th Circuit scraps plans to adopt AI rule after lawyers object
→ Can ChatGPT's output defame someone through hallucination? This Georgia court case is testing the future of AI law
→ Spending too much time on emails? This Pillsbury IP partner developed an AI tool to solve the problem
→ California lawmakers demand AI firms install 'kill switch'
→ French data protection authority CNIL has released recommendations for developing AI systems that comply with the EU's General Data Protection Regulation (GDPR)
Was this newsletter useful? Help me to improve! With your feedback, I can improve the letter. Click on a link to vote:
Who is the author, Josh Kubicki?
Some of you know me. Others do not. Here is a short intro. I am a lawyer, entrepreneur, and teacher. I have transformed legal practices and built multi-million dollar businesses. Not a theorist, I am an applied researcher and former Chief Strategy Officer, recognized by Fast Company and Bloomberg Law for my unique work. Through this newsletter, I offer you pragmatic insights into leveraging AI to inform and improve your daily life in legal services.
DISCLAIMER: None of this is legal advice. This newsletter is strictly educational and is not legal advice or a solicitation to buy or sell any assets or to make any legal decisions. Please be careful and do your own research.