087 | ✅👾 Guide to LLMs

Brainyacts #87

June 02, 2023

🏡🔨 Back from vacation, just in time for the weekend. This is a good one.

In today’s Brainyacts we:

say ‘no, you’re a parrot’ (or why calling AI a bulls!t generator is BS)
give you a guide to LLM approaches for business
meet a humanoid robot with a real face
see OpenAI giving away $1M
don’t care if you dislike Joshua Browder, he is right
look at a fun AI-misspelled Wonder of the World

👋 A special Welcome! to NEW SUBSCRIBERS.
To reach previous posts, go here.

🚨🚨 Feel free to skip this Op-ed and go right to the next post. This is a setup, but it is neither actionable nor pragmatic, just a straight-up reaction to the recent overreaction to AI in legal.

🦜💩 A ‘Stochastic Parrot’ or the Unveiling of Human Pretense?

A reaction to this article: Beware ‘death by GPT syndrome’, which you don’t have to read to get what I am throwing down next.

As an ardent follower and an insatiable consumer of human intellect, far be it from me to throw down the gauntlet and express disagreement with our esteemed computational linguist, Emily Bender, and our illustrious AI entrepreneur who swapped his biglaw lawyer’s robes for an arguably more alluring affair with artificial intelligence, Mathias Strasser. Yet, I find myself called upon to say, "Bullshit."

Strasser, a former counsel at the venerable US law firm Sullivan & Cromwell, quite pointedly proclaimed, “The legal industry is based on words. Lawyers are outsourced word processors.” A statement brimming with the tragicomic reality of the situation - reducing lawyers to mere word processors, but equally, a statement that unveils the shuddering vulnerability of his own profession to the onslaught of artificial intelligence.

Generative AI, Strasser contends, presents an existential threat to the legal profession. If, however, the quintessence of lawyering is tantamount to outsourced word processing, the fear seems as illogical as a calculator being unnerved by an abacus.

Let us, for a moment, turn our attention to Emily Bender's poetic reduction of language models like ChatGPT to “stochastic parrots.” These "parrots," she says, mimic with an eye towards the highest probability rather than accuracy. Apparently, we’re supposed to accept this and mourn the purported loss of substance and nuance in the generated content.

But let's call it as it is, shall we?

First, what Bender’s argument describes is, in essence, not a failing of AI but a reflection of its user's skill. A ‘stochastic parrot’ merely imitates what it has been taught. If the outcome isn’t accurate or satisfactory, it's indicative not of the parrot’s incompetence but of the shortcomings in the quality or comprehension of the data it was trained on. As any good teacher would, perhaps we should look inward, rather than berating the AI.

Second, if we, in our humanistic arrogance, call AI a mimicry machine while conveniently dismissing our own storied history of mimicry, it reeks of hypocrisy. Humans have long learned from mimicking, from the first caveman imitating fire-making to a child mimicking speech patterns. The notion that AI mimicking human language is somehow less worthy because it lacks 'original' thought strikes me as a peculiarly self-serving critique.

The issue at hand is not whether AI tools like ChatGPT are just word factories, or merely stochastic parrots. They are what we make them. The issue is how the two groups involved, namely the lawyers and computational linguists, project their fears and shortcomings onto these tools.

Far from being unreliable, these tools instead seem to be a mirror reflecting our own limitations. They expose the uncomfortable truth about how we've been operating. They put in perspective our highfalutin claims of nuanced expression and our pretensions of superiority. In other words, they reveal our own bullshit.

In short, it seems rather clear that those who dismiss these tools are either unable to harness their true potential or, perhaps more insidiously, have a self-interest in maligning the capabilities of AI. To them, I say, might we be better served to look inward before pointing fingers outward? But what do I know? I am but a humble observer, disagreeing with the grandees of computational linguistics and legal jurisprudence.

To this end, let’s turn to some intermediate basics (is that a thing?) to address the various approaches or flavors of LLMs (large language models) aka what we teach parrots on.

✅ 👾 A Guide to Large Language Models (LLMs) for Legal Leaders and Technologists

LLMs are sophisticated AI models that generate human-like text by predicting the probability of the next word given the words that came before it. They've found use in various applications, from drafting emails to answering customer queries, and even generating creative content and legal content.

Most people are familiar with OpenAI’s ChatGPT and GPT-4, and perhaps even Bing Chat and Google Bard, but there are many more types and kinds of LLMs. As legal organizations and teams continue to explore (or ignore) the advancements in generative AI, it is important to understand what lies beyond the basic consumer-facing LLMs that I mentioned above.
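To make "predicting the probability of a word given the previous words" concrete, here is a toy sketch. It is not how GPT-4 or any production LLM works internally (those use neural networks over far longer contexts); it only counts which word follows which in a tiny made-up sentence. The function names and the sample corpus are my own illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_word_probs(follows, prev):
    """Convert the raw follow-counts for `prev` into probabilities."""
    counts = follows.get(prev.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # word never seen, so no prediction is possible
    return {word: count / total for word, count in counts.items()}

# Hypothetical one-sentence "training set", for illustration only.
corpus = "the court granted the motion and the court denied the appeal"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)  # {'court': 0.5, 'motion': 0.25, 'appeal': 0.25}
```

After "the", this toy model would guess "court" half the time, simply because that is what it saw most often. That is the parrot at its most basic: the quality of the guess depends entirely on what it was fed.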

This is a basic guide to LLMs. I will be diving more deeply into these in the coming newsletters.

Overview of the Four Approaches to LLMs

There are four primary approaches to employing LLMs in a business: Consumer-Grade, Public-Facing LLMs; Public Access/Open Source Models; LangChain Models; and Proprietary Models. The choice among these depends on the specific needs and resources of a business.

Consumer-Grade, Public-Facing LLMs

Consumer-grade LLMs include popular AI models like GPT-4, ChatGPT, Bing Chat, and Google Bard. These are ready-to-use and can be integrated directly into applications or platforms.

Their biggest advantages include ease of use, high-quality output, and widespread adoption, which can often lead to a large user base and community support.

The downside is limited customization, reliance on the provider for updates and maintenance, and potential concerns about data privacy, accuracy, and bias.

Ideal Use Cases

These models are perfect for small-scale operations, startups, and prototype development due to their low setup requirements. They are also exceptional initial training tools for anyone to get introduced to this flavor of AI. For firms and legal teams, these tools are highly usable on the business and operation side. For the practice of law, they are also useful but come with risks that must be mitigated.

Public Access/Open Source Models

Public Access/Open Source Models include GPT-2, GPT4All, OpenLLaMA, RedPajama-INCITE, h2oGPT, FastChat-T5, and others. These models are open to the public and can be customized to specific needs.

The biggest advantages are flexibility, access to an active development community, and the lack of licensing costs.

These models, however, require more technical expertise and there is a potential for misuse due to the lack of control mechanisms.

Ideal Use Cases

They're ideally suited for research, development of customized solutions, and academic institutions.

LangChain Models

LangChain Models, named here for LangChain (an open-source framework for chaining LLM calls together), involve combining different models to broaden capabilities and reduce model bias.

The synergy of different models can result in a more robust and versatile AI solution. For example, a LangChain Model can use one language model to generate a question, another language model to answer it, and a third language model to evaluate the quality of the answer. This way, the LangChain Model can create more diverse and accurate texts than a single language model could on its own.
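The question, answer, evaluate example above can be sketched in a few lines. This is a minimal illustration in plain Python and deliberately does not use the actual LangChain library or its API. Each "model" is just a function that takes text and returns text, and the three stub functions are hypothetical stand-ins for wherever a real LLM call would go.

```python
from typing import Callable, Dict

# Assumption: any "model" is simply a function from prompt text to output text.
Model = Callable[[str], str]

def run_chain(topic: str, asker: Model, answerer: Model, judge: Model) -> Dict[str, str]:
    """Pass the output of each model to the next: ask, then answer, then evaluate."""
    question = asker(f"Write one question about: {topic}")
    answer = answerer(question)
    verdict = judge(f"Question: {question}\nAnswer: {answer}\nIs the answer adequate?")
    return {"question": question, "answer": answer, "verdict": verdict}

# Hypothetical stubs so the sketch runs without any API keys or network access.
def stub_asker(prompt: str) -> str:
    return "What is " + prompt.split(": ", 1)[1] + "?"

def stub_answerer(question: str) -> str:
    return "A placeholder answer to: " + question

def stub_judge(prompt: str) -> str:
    return "adequate" if "Answer: " in prompt else "inadequate"

result = run_chain("a force majeure clause", stub_asker, stub_answerer, stub_judge)
print(result["question"])  # What is a force majeure clause?
print(result["verdict"])   # adequate
```

Because each step only cares about text in and text out, any of the three stubs could be swapped for a different provider's model. That is where both the flexibility and the added complexity of this approach come from.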

The challenges include ensuring model compatibility, managing the increased complexity, and potential cost implications.

Ideal Use Cases

These models are suited for advanced AI applications and projects that need diverse AI inputs.

Proprietary Models

Proprietary Models involve building in-house LLMs tailored to specific business needs.

The advantages include control over data, customization to specific needs, and providing a unique competitive advantage.

The challenges include significant resource requirements, long development times, and the need to keep the models updated.

Ideal Use Cases

These are suited for large corporations, industries with specific requirements, and situations where high security is required.

News You Can Use:

Meet Ameca - A Humanoid Robot with a Nightmare AI Scenario

Yes, what this robot says is a bit funny on its face (pun intended), but it is serious in terms of AI development and worth the watch.

#FPVideo: Humanoid robot 'imagines' nightmare AI scenario
— Firstpost (@firstpost)
12:38 PM • Jun 1, 2023

OpenAI Launches a $1M Cybersecurity Grant Program

We are launching the Cybersecurity Grant Program—a $1M initiative to boost and quantify AI-powered cybersecurity capabilities and to foster high-level AI and cybersecurity discourse.

Below are some general project ideas that the OpenAI team has put forward:

Collect and label data from cyber defenders to train defensive cybersecurity agents
Detect and mitigate social engineering tactics
Automate incident triage
Identify security issues in source code
Assist network or device forensics
Automatically patch vulnerabilities
Optimize patch management processes to improve prioritization, scheduling, and deployment of security updates
Develop or improve confidential compute on GPUs
Create honeypots and deception technology to misdirect or trap attackers
Assist reverse engineers in creating signatures and behavior-based detections of malware
Analyze an organization’s security controls and compare to compliance regimes
Assist developers to create secure by design and secure by default software
Assist end users to adopt security best practices
Aid security engineers and developers to create robust threat models
Produce threat intelligence with salient and relevant information for defenders tailored to their organization
Help developers port code to memory safe languages

OpenAI cybersecurity grant program

Our goal is to facilitate the development of AI-powered cybersecurity capabilities for defenders through grants and other support.

openai.com/blog/openai-cybersecurity-grant-program

This is Tough to Argue With

Like or despise him, Joshua Browder is always ‘on message’ and for that I respect him. He may go a bit too far with claims, PR stunts, and calling his tech the first ‘robot lawyer’, but boldness is what it takes to confront the extreme traditions and self-protectionism of our profession.

Joshua was on national news programming this week, making the case for consumer-grade, AI-assisted legal services.

Went on NBC News this week to discuss the lawyer who got in trouble for using ChatGPT, DoNotPay and the future of consumer rights. Part of the clip here:
— Joshua Browder (@jbrowder1)
3:00 PM • Jun 1, 2023

In the Memetime:

DISCLAIMER: None of this is legal advice. This newsletter is strictly educational and is not legal advice or a solicitation to buy or sell any assets or to make any legal decisions. Please be careful and do your own research.