Anthropic’s $1.5 Billion (Avoidable) Mistake
The most expensive mistake in AI wasn’t chips. It was ebooks.
Anthropic, the company behind Claude (aka one of ChatGPT’s cousins), is negotiating a proposed $1.5 billion settlement to escape a copyright lawsuit.
Not because training AI on books is inherently illegal. A federal judge already ruled: if you (legally) own a book, scanning it and using it to train an AI model is fair use. That means lawful training alone doesn’t violate copyright.
Anthropic’s billion-dollar problem is narrower.
The authors who filed the lawsuit also said Anthropic illegally obtained the books (in other words… actual stealing). They are accusing Anthropic of downloading + storing half a million pirated ebooks from “shadow libraries” like LibGen and PiLiMi (think LimeWire for books).
And the $1.5B number? It’s not a jackpot for authors. It’s a discounted settlement number the parties agreed on because Anthropic’s potential liability at trial could have been many times higher.
The Anthropic lawsuit in plain English
When the case was filed in 2024, the authors’ claims boiled down to two arguments:
1️⃣ Training is infringement.
The authors argued that making digital copies of books to feed into an AI training set automatically violates copyright, even if you bought the books. In other words: you can read the book or use it to decorate your shelf, but not train an LLM on it.
2️⃣ Piracy is infringement.
They also alleged that Anthropic didn’t always use books it bought. Instead, it obtained copies from pirate sites with free downloads and warehoused them in datasets.
💭 The difference is subtle but crucial.
If you buy a novel at a bookstore and scan it into your system, that’s a lawful copy — and under the judge’s ruling, training on that copy is legal fair use. But if you grab the same novel from LibGen, that’s an illegal copy from the start. Copyright law says that’s infringement as soon as the file hits your server, regardless of what you do with it later.
What the judge already decided
In June 2025, Judge William Alsup split the case into two:
✅ Training on lawful copies = fair use.
He called it “spectacularly transformative.” Why? Because the model isn’t memorizing books or competing with their market. It’s distilling patterns of language to generate new text. So this claim by the authors was thrown out.
❌ Acquiring pirated copies = still infringement.
Fair use doesn’t excuse theft. If the source was illegal, you’re already infringing. Those claims survived.
This ruling narrowed the battlefield. The trial wouldn’t have been about training in general. It would have been about whether Anthropic downloaded and stored pirated copies at massive scale.
How we got here (to settlement talks)
August 2024: Authors file suit against Anthropic.
June 2025: Court rules training on lawful books = fair use; piracy claims remain.
July 2025: Judge certifies a “class” (case is now a class action lawsuit for all authors whose books were stolen).
August 2025: The parties mediate and sign a settlement term sheet.
September 5, 2025: $1.5B settlement news leaks (~$3,000 per pirated book for ~500,000 books).
September 9, 2025: Judge Alsup refuses to approve the $1.5B settlement (because the settlement was missing details like: which books, which authors, and how payouts would actually be divided).
Now: The judge has ordered the parties to meet again and fill in the blanks later this month (stay tuned…).
$1.5 billion isn’t what you think it is
The headlines make it sound like every author is about to get a check. That’s not how class action settlements work.
Here’s what this actually means:
✦ The court approves the deal as a package.
The judge doesn’t sign off one author at a time. He has to approve the entire settlement structure — the fund itself, the rules for who can claim, how money is distributed, and how fees and expenses are handled. Right now, Judge Alsup has said the proposal isn’t specific enough, so nothing has been approved yet.
✦ The fund only exists if the deal is approved.
If the judge gives the green light, Anthropic sets up a dedicated fund. It’s not money being directly handed to authors — it’s more like a reserve account set aside for paying all of the costs of the settlement.
✦ Money gets sliced before authors see anything.
Lawyers and administrators are paid first. Class counsel takes their share as fees, and the cost of running the settlement — websites, staff, fraud prevention, notice to class members — comes out next. What’s left is what flows to authors.
✦ Authors have to raise their hands.
Rightsholders don’t just get automatic payments. They’ll receive notice and must file a claim saying, in effect, “Yes, my book is covered.” Those claims are then checked, duplicates removed, and eligibility confirmed before money is allocated.
✦ Timing is slow and staggered.
Even if approved, money doesn’t flow immediately. These funds are often paid by the company in installments or held in escrow and released over time. Distributions to authors typically happen months — sometimes years — later.
So when you see “$1.5B settlement,” don’t picture checks in the mail. Picture a cap on Anthropic’s liability, subject to court approval, deductions, claims processing, and a long administrative process before anyone gets paid.
What was Anthropic really on the hook for?
Settlements are almost always smaller than what someone would actually be on the hook for after losing at trial.
That’s the whole point of settling, and why anyone takes the deal.
For Anthropic, this is the part that makes $1.5B look like a decent deal:
Minimum statutory damages: $750 × 500,000 = $375 million.
Mid-range statutory damages: $30,000 × 500,000 = $15 billion.
Maximum if willful: $150,000 × 500,000 = $75 billion.
Would a jury actually award $75B? Almost certainly not.
But the authors would argue Anthropic’s conduct was willful, and juries are unpredictable. Even a fraction of the maximum, say 10%, is $7.5B (still 5x the settlement).
That’s why $1.5B is really just a discounted risk cap, not a windfall.
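As a back-of-envelope check, the exposure math above works out like this. (A quick sketch in Python; the per-work amounts are the statutory ranges cited above, and the 500,000 figure is the reported class size, not a court-verified count.)

```python
# Statutory-damages exposure per scenario, per the ranges cited above.
BOOKS = 500_000            # approximate number of pirated books (reported)
SETTLEMENT = 1_500_000_000  # proposed settlement fund

scenarios = {
    "minimum": 750,        # statutory floor per work
    "mid-range": 30_000,   # ordinary statutory ceiling per work
    "willful": 150_000,    # cap per work if infringement is willful
}

for name, per_work in scenarios.items():
    exposure = per_work * BOOKS
    print(f"{name}: ${exposure:,} ({exposure / SETTLEMENT:.1f}x the settlement)")
```

Even the statutory floor is a quarter of the settlement fund; the willful-infringement cap is 50 times it. That gap is the negotiating leverage behind the $1.5B number.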
The Kindle trap
Too many people (including Anthropic’s team) don’t know this… but ebooks are NOT digital versions of books you own.
You don’t “own” them at all.
For example… if you buy an ebook on Kindle, you’re licensing access to a file that’s wrapped in DRM (digital rights management).
Strip that DRM, and you risk violating the DMCA, even if copyright law might otherwise consider training fair use.
Paper books are different. When you buy a paper book, you own that copy. Scanning it for transformative uses like training is protected as fair use.
Alsup was clear: lawful purchase + scanning = legal. Piracy = liability.
The $15M fix that Anthropic missed
Ironically, if Anthropic had simply bought the books, this liability wouldn’t exist.
Let’s lay out what that would have cost to build the same 500,000-book dataset lawfully:
📚 Buying the books
Used copies (bulk resale, libraries, online marketplaces): around $8 each → about $4M total
Mixed supply (some new, some used): about $15 each → about $7.5M total
All new retail copies: around $25 each → about $12.5M total
🖨️ Scanning and digitizing
Each book has to be prepped (unbound or cradle-mounted), scanned page by page, and run through OCR to make the text machine-readable.
At a $15/hour wage with overhead, the labor looks like this:
Fast throughput (15 mins/book): 0.25 hours × $15/hour = $3.75 in labor per book (add ~30% overhead for supervision/payroll = ~$5 per book) → $2.4M total
Moderate throughput (30 mins/book): 0.5 hours × $15/hour = $7.50 (with overhead = ~$10 per book) → $4.9M total
Slow throughput (60 mins/book): 1 hour × $15/hour = $15.00 (with overhead = ~$20 per book) → $9.7M total
📦 Equipment, logistics, storage
Beyond labor and purchase price, you need:
Equipment amortization: $0.50–$2.00 per book depending on scanner type and throughput
Shipping/procurement: ~$3.00 per book for freight and handling
Consumables (blades, glue, etc.): ~$0.50 per book
Digital storage/backups: ~$0.05 per book
Per book for these line items = $4.05–$5.55. Across 500,000 books, we can add ~ $2M–$2.8M to the total.
💸 Total costs for a 500k-book dataset
Low end (used books, fast scans): ~ $8.5M total
Base case (mixed books, moderate scans): ~ $14.7M total
High end (new books, slow scans): ~ $25M total
Even if you assume higher wages, slower throughput, or pricier long-tail titles, the totals only creep into the $30–$40M range. Still nowhere near the $1.5B settlement fund.
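The whole model fits in a few lines. (A rough sketch in Python; the per-book prices, scan times, and logistics figures are the estimates above, not real procurement quotes.)

```python
# Back-of-envelope lawful-acquisition cost model for a 500k-book dataset.
BOOKS = 500_000
WAGE = 15.0       # $/hour scanning labor
OVERHEAD = 1.30   # ~30% on labor for supervision/payroll

def total_cost(price, scan_hours, logistics):
    """Per-book purchase price + burdened scan labor + logistics, across all books."""
    labor = scan_hours * WAGE * OVERHEAD
    return (price + labor + logistics) * BOOKS

low = total_cost(price=8, scan_hours=0.25, logistics=4.05)    # used books, fast scans
base = total_cost(price=15, scan_hours=0.50, logistics=4.70)  # mixed supply, moderate scans
high = total_cost(price=25, scan_hours=1.00, logistics=5.55)  # new retail, slow scans

for name, total in [("low", low), ("base", base), ("high", high)]:
    print(f"{name}: ~${total / 1e6:.1f}M")
```

Under these assumptions the three scenarios land at roughly $8.5M, $14.7M, and $25M, matching the totals above.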
For ~1-2% of the $1.5B, Anthropic could have built the dataset lawfully, with clean provenance, and avoided this entire piracy fight.
The $1.5B isn’t the “cost of data.” It’s the price Anthropic is paying for a lack of foresight in how it sourced its data.
Why the judge hit pause
Alsup’s refusal wasn’t about the size of the settlement. It was about missing details.
He wants:
A precise list of covered works and authors.
A workable allocation plan (handling duplicates, multiple rights holders).
Plain-English notice so authors understand the deal.
Clarity on how the fund grows if more books are validated.
Verification of dataset destruction (including backups and duplicates).
Until those answers are provided, his approval is off the table. The parties are meeting later this month to discuss details.
Key takeaways
💥 Training on books you actually own is legal fair use. That’s a major clarification for AI.
💥 Piracy is the problem. That’s what created a billion-dollar liability.
💥 $1.5B is a discount. The real trial risk stretched into the tens of billions.
💥 Doing it the right way is cheap. Buying and scanning half a million books would have cost tens of millions, not billions.
💥 Clean sourcing is compliance. You need to know where your data, info, and content come from… and be able to prove it (and delete anything questionable).
This case isn’t just about Anthropic. It’s a signal to every business exploring AI: your risk lives in your inputs.
The difference between clean and unclean data isn’t academic. It’s the difference between a manageable line item and a lawsuit that can put your entire company at risk.
⸻
🤖 Subscribe to AnaGPT
Every week, I break down the latest legal news in AI and tech, minus the jargon. Whether you’re a founder, creator, or lawyer, this newsletter will help you stay two steps ahead of the lawsuits.
➡️ Forward this post to someone working on AI. They’ll thank you later.
➡️ Follow Ana on Instagram @anajuneja
➡️ Add Ana on LinkedIn @anajuneja

