092 | Generative AI Ops
Brainyacts #92
In today's Brainyacts we:
give thanks to a bunch of global referrers
learn what books GPT-4 has eaten
talk Generative AI Ops roles in law firms
get a smarter Google Bard
get longer Bing Chat sessions
chide colleges who jump the gun on students and AI
tempt you with a lawsuit rabbit hole against OpenAI
A special Welcome! to NEW SUBSCRIBERS.
To reach previous posts, go here.
It is time to give out a huge THANK YOU to the most recent batch of folks who have been referring subscribers to this newsletter. I am always humbled when one of you shares this with others. You are the best!
Ellen Connor: CFO | Business Support Team Lead at FSC - Melbourne, Australia
Eli Zbar: Lawyer - Vancouver, Canada
Katrina Perez: Technical Services Manager at Faegre Drinker - Chicago, US
Catherine Ongiri: Managing Attorney in the Office of Professional Competence at the State Bar of California - San Francisco, US
Mark Dodd: Head of Market Insights at LOD | Former Lawyer - Perth, Australia
Aurea Garrido: Vice President & Associate General Counsel WM News & Sports - London, UK
Brooke Bornick: Co-Founder of Lodgeur | Alumni Advisory Council Member at Cambridge Judge Business School - Houston, US
Evan Reed: Category Development Manager at The Wonderful Company - LA, US
Nathan Dudgeon: Legal Counsel at Capgemini - London, UK
David Curle: Independent Writer and Analyst for the Legal Services Industry - Minneapolis, US
Emily Solley: Legal at Glue - Winston-Salem, US
Alan Robinson: Client Council at GLG - Bellevue, US
Rob Otty: Global COO, Norton Rose Fulbright
Graeme Grovum: Head of Technology & Client Solutions, Allens, Sydney, AUS
The Books That GPT-4 Ate
David Bamman, an information scientist at UC Berkeley, was intrigued by how Large Language Models (LLMs) like GPT-4 were capable of producing informed responses, seemingly indicative of a specific knowledge base. In order to unravel the mystery behind this, Bamman initiated an experiment where he posed a question about the character relationships in the classic novel "Pride and Prejudice" to GPT-4. He was impressed when the model provided accurate insights about the novel's characters, suggesting an intimate familiarity with the book.
However, it was unclear whether the model's accurate response was due to it having been specifically trained on "Pride and Prejudice" or whether it had merely synthesized the information from related texts and resources it had seen on the internet.
To investigate further, Bamman's team embarked on a journey to become "data archaeologists", aiming to discover what books GPT-4 might have been trained on. They employed a method akin to quizzing, asking GPT-4 questions about various books, almost like testing a high school English student's knowledge of their reading list. The performance on these tests was then used to score the likelihood of each book being part of GPT-4's training data.
The team also used a technique called a name cloze, which involved extracting short passages from numerous novels, removing character names and any identifying clues, and prompting GPT-4 to fill in the blanks with the appropriate character names.
Through these techniques, Bamman's team aimed to approximate GPT-4's "reading list" and gain a deeper understanding of the literature that shaped its outputs.
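To make the name-cloze idea concrete, here is a minimal sketch of how such a probe might be run and scored against a chat model. It assumes the legacy openai Python client (the pre-1.0 ChatCompletion interface, with an API key already configured), and the masked passage is invented purely for illustration; the actual study used hundreds of real excerpts per book and a more careful scoring scheme.

```python
# Minimal sketch of a "name cloze" probe, assuming the legacy openai Python
# client (openai < 1.0, ChatCompletion API) and an API key set via environment.
# The excerpt below is invented for illustration; the real study drew many
# short passages from each novel.
import openai

passages = [
    # (excerpt with the character name replaced by [MASK], expected name)
    ("[MASK] closed the study door and stared out at the rain-soaked moor.",
     "Heathcliff"),  # hypothetical example passage
]

def name_cloze_accuracy(model: str = "gpt-4") -> float:
    correct = 0
    for excerpt, answer in passages:
        reply = openai.ChatCompletion.create(
            model=model,
            temperature=0,
            messages=[{
                "role": "user",
                "content": ("Fill in [MASK] with the single proper name that "
                            "belongs there. Reply with the name only.\n\n"
                            + excerpt),
            }],
        )
        guess = reply["choices"][0]["message"]["content"].strip()
        correct += int(guess.lower() == answer.lower())
    # A high hit rate across many excerpts from one book suggests the book
    # itself was in the training data, not just commentary about it.
    return correct / len(passages)
```

Run across many books, the per-book hit rate becomes a rough proxy for how likely each title was to appear in the training data.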
Based on this work, here is a list of the Top 50 books GPT-4 most likely consumed.
Why is this important to know?
Understanding an LLM's training dataset is crucial as it defines the model's knowledge base and potential output.
The composition of the training dataset can raise copyright violation concerns if the LLM is perceived to be memorizing and repeating copyrighted source materials.
The bias in the training dataset can influence an LLM's output. If the LLM is trained with a specific genre, it may replicate the biases inherent in those works.
GPT-4's training data includes a significant proportion of classic literature and genre literature, particularly science fiction and fantasy. This could potentially influence the themes and associations in its responses.
The impact of individual books on GPT-4's outputs is uncertain due to the vastness of its dataset and complex internal interactions.
The inclusion of diverse genres in the training dataset is critical for broadening an LLM's conceptual associations and worldview.
Understanding an LLM's training data is beneficial for its use in fields like digital humanities, potentially revealing new insights in the analysis of lesser-known works.
Transparency in disclosing training datasets is necessary for more informed usage and scrutiny of LLMs. It helps users understand the nature and composition of the data on which an LLM is trained.
Yesterday I Said "NO" to Hiring AI Tech Talent. Today I Say "YES" to Hiring Generative AI Ops Talent.
A Guide to Generative AI Ops for Law Firms
What is Generative AI Ops?
Generative AI Ops refers to the application of generative artificial intelligence models in a firm's operations and tech stack. These models are capable of generating new insights, workflows, and data, and they can be trained to improve the efficiency and reliability of processes. Generative AI Ops utilizes these capabilities to automate operational tasks, optimize resource utilization, identify patterns related to application performance, connect practices and employees to tech capabilities in more intuitive ways, and much more.
Why Choose Generative AI Ops over Generative AI Tech?
While Generative AI Tech talent focuses on the development of specific AI models, Generative AI Ops offers a more comprehensive approach to AI management. By harnessing generative AI operations, law firms can optimize the management of their burgeoning AI environments, deal with increasing complexity, and drive improved business outcomes. With a focus on operational efficiency, Generative AI Ops offers an opportunity to drive business value rather than solely focusing on the technical or aspirational aspects of AI.
The Basics of a Generative AI Ops Role
A Generative AI Ops role involves mapping business needs to technology and services. The professional in this role has to:
Understand the law firm's business needs and translate them into requirements that generative AI might address.
Implement generative AI technologies that align with the firm's strategic goals.
Manage the integration of systems and data across back-office and practice domains.
Contribute to and execute on a firm generative AI strategy.
Use generative AI models to analyze data, identify patterns, and make informed decisions.
Foster a culture of innovation and continuous learning centered on generative AI without creating hype, fatigue, or misuse.
First 100 Days as a Generative AI Ops Professional
In the first 100 days, a Generative AI Ops professional might:
Assess the Existing Tech Environment: Analyze the current state of the firm's tech infrastructure, identifying both client-facing and back-office technologies and capabilities.
Understand Business Goals: Meet with key stakeholders (practice leads and business function heads) to comprehend the firm's strategic goals and how operationalizing generative AI might support these objectives.
Formulate a Generative AI Ops Strategy: Based on the initial assessment and understanding of business needs, develop a roadmap for implementing generative AI Ops in the firm.
Set Key Performance Indicators (KPIs): Define measurable KPIs that can track the effectiveness of the Generative AI Ops strategy.
Initiate Training Programs: Start upskilling programs for the firm's staff to familiarize them with the use of generative AI technologies in their day-to-day operations.
By embracing Generative AI Ops, law firms can create a more efficient, responsive, and innovative IT environment that aligns with their business needs and enhances their service delivery.
News you can Use:
Google Bard Updates
I have poked at Bard a lot over the last month or so. Out of the Big 3 (OpenAI, Bing, and Bard) it is the weakest by far. But they are improving. The latest development is really interesting. Basically, they have created the ability for Bard to determine the nature of the prompt and decide if it needs to write some code (that you don't see) in order to do some higher-level thinking and produce a more accurate reply.
The idea is based on a theory about human thinking from a book called "Thinking, Fast and Slow" by Daniel Kahneman.
In the book, he talks about two types of thinking: "System 1" which is fast and intuitive, and "System 2" which is slow and takes more effort. For example, playing a jazz improvisation or touch typing would use System 1 thinking. On the other hand, solving a long division problem or learning a new instrument would use System 2 thinking.
So far, LLMs like Bard have been using System 1 thinking - quickly generating text without deep thought. Now, with this new method, Bard can also use System 2 thinking - generating and using code to solve more complex problems. The result is that Bard's answers are about 30% more accurate on challenging tasks that involve logic or math.
Even with this upgrade, Bard might not always provide the right answer because it might not use code when it should, the code it writes could be incorrect, or it might not use the result of the executed code in its response. But overall, this new feature is a big step towards making Bard more useful.
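As a rough illustration, the pattern described above might look something like the sketch below. This is not Google's actual implementation: generate() is a stand-in for any LLM call, and the routing logic is purely an assumption about how such a system could work.

```python
# Rough sketch of the "write code behind the scenes" pattern described above.
# NOT Google's implementation; generate() stands in for any LLM call, and the
# routing logic is an assumption about how such a system could work.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM (Bard, GPT-4, etc.)."""
    raise NotImplementedError

def answer(user_prompt: str) -> str:
    # Step 1 (routing): decide whether the question needs exact computation
    # (System 2) or can be answered directly (System 1).
    needs_code = generate(
        "Does answering this require exact logic or arithmetic? "
        "Reply YES or NO.\n\n" + user_prompt
    ).strip().upper().startswith("YES")

    if needs_code:
        # Step 2: have the model write a small Python snippet that stores
        # its answer in a variable named `result`.
        code = generate(
            "Write Python that computes the answer and assigns it to a "
            "variable named result. Return only the code.\n\n" + user_prompt
        )
        scope: dict = {}
        exec(code, scope)  # running model-written code is unsafe outside a sandbox
        # Step 3: fold the computed result back into a natural-language reply.
        return generate(
            f"The computed result is {scope.get('result')!r}. "
            f"Use it to answer:\n{user_prompt}"
        )

    # System 1 path: answer directly, with no hidden code.
    return generate(user_prompt)
```

The caveats above map directly onto this sketch: the router may say NO when it should have said YES, the generated code may be wrong, or the final reply may ignore the computed result.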
Bing Chat Updates
Microsoft is rolling out an increase to the number of "turns" (think prompts or interactions) you can have in a single chat session, from 20 to 30. Also, the daily limit is now 300.
You will see this reflected in the "x of 30" number circled below.
What does this mean? Bing Chat's context window (how long a conversation it can remember) has grown.
For other minor updates click here.
Donโt Do This, Colleges!
A 21-year-old political science major from Santa Barbara, about to graduate from the University of California, Davis, with plans to attend law school, was falsely accused of using generative AI on one of her assignments. The problem was that the anti-plagiarism and AI checker falsely flagged her paper.
Down a Lawsuit Rabbit Hole Against OpenAI
It is an interesting thread, and I must admit I did not open all the sub-threads and links to determine what is in here. Nevertheless, there have been many murmurings and even outright declarations by the likes of Elon as to the weirdness (some say illegality) of OpenAI's business structure. Anyway, have fun with this one and please share what you learn.
I promised a thread this weekend about OpenAI and the lawsuit I filed against them, and an explanation of what I hope to achieve here. Sorry for the length, but there's a lot going on here.
To begin with, we need to understand what "OpenAI" really is: a poorly constructed scheme… twitter.com/i/web/status/1…
— The Short Straw (@short_straw)
11:21 PM • May 7, 2023
Was this newsletter useful? Help me to improve! With your feedback, I can improve the letter. Click on a link to vote:
DISCLAIMER: None of this is legal advice. This newsletter is strictly educational and is not legal advice or a solicitation to buy or sell any assets or to make any legal decisions. Please be careful and do your own research.