Authors, here is a quick way to check if your books were used to train AI

Over 190,000 books were used without permission to train AI tools from Meta and Bloomberg. Are yours among them?

In late summer 2023, news broke that books by Stephen King, Zadie Smith, and thousands of other contemporary authors had been used to train generative artificial intelligence.

Large language models (LLMs) are a type of artificial intelligence trained on massive amounts of text data. This training allows them to learn the patterns and relationships in language.

Alex Reisner, who revealed the news in a series of articles for The Atlantic, says:

💬 Few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on.

Reisner acquired a data set of more than 191,000 books that were used without permission to train LLMs.

The data set is known as “Books3” and is based on thousands of pirated books, the vast majority of which were published in the past 20 years.

If you subscribe to The Atlantic, you can access that data set here. If you don’t, you can use the source provided by the Authors Guild in this article.

💬 If you’re an author, you may have recently discovered that your published book was included in a dataset of books used to train artificial intelligence systems without your permission.

When you go to the data set, you will see a simple search box where you can type your name to see which of your books were used to train LLMs (a screenshot with an example search is shown below).

If your book is in the data set, you can be sure of two things:

▸ It was used to train generative artificial intelligence models without your permission. 

▸ It was pirated in the first place.

The article on the Authors Guild blog also lists actions you can take to speak out in defense of your rights. Among them are signing the Authors Guild’s open letter and taking steps to prevent future unauthorized use.

Source: You Just Found Out Your Book Was Used to Train AI. Now What? – The Authors Guild
