Useful tech info

Some of the suits against AI companies are just lawyers riding a trend to make money. This one is serious and could set a meaningful precedent if it isn’t quietly settled.

The publishers of 400 local papers are suing OpenAI and Microsoft for scraping their content, chewing it up into AI answers, and often reproducing it verbatim in those answers. Importantly, the suit demands a jury trial.

The core of the lawsuit includes some useful info about AI tricks:

= = = = = START QUOTE:

Using automated systems, Defendants systematically and secretly crawled the Publishers’ websites—including content behind paywalls and other access restrictions—and copied the Publishers’ articles, stories, and other original works onto their own servers without authorization. As part of that process, Defendants’ systems stripped from the Publishers’ works all copyright management information (“CMI”) embedded in and associated with those works, such as author credits, publication names, copyright notices, and terms of use information, that establish ownership and signal that a work is protected. That CMI-stripping, an instrumental part of Defendants’ ingestion pipeline, helped sever the link between the copied content and its rightful owners and authorizations. The scraped, stripped content was then used to train Defendants’ large language models (“LLMs”), which have “memorized” that material and likely reproduced it, verbatim or near-verbatim, in response to user prompts for years.

= = = = = END QUOTE.

CMI-stripping is the text equivalent of grinding the VINs off stolen car parts before selling them.
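
For anyone wondering what CMI-stripping looks like mechanically, here is a minimal Python sketch. It is purely illustrative and not taken from the complaint or from anyone's real pipeline: the Article class, the extract function, the keep_cmi flag, and the two crude regexes are all my own assumptions. The point is only that a text-extraction step can either carry the byline, publication name, and copyright notice along with the body, or throw them away.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical record for one scraped article."""
    body: str
    author: Optional[str] = None
    publication: Optional[str] = None
    copyright_notice: Optional[str] = None


# Crude, assumed patterns for lines that carry CMI. Real pages vary widely,
# and the byline pattern would also catch a body line that starts with "By ".
BYLINE = re.compile(r"^By\s+(.+)$")
COPYRIGHT = re.compile(r"^(?:©|\(c\)|copyright)\s", re.IGNORECASE)


def extract(raw: str, publication: str, keep_cmi: bool) -> Article:
    """Split raw article text into body text and CMI fields.

    With keep_cmi=False the CMI is discarded, which is the "stripping"
    behavior described in the quote. With keep_cmi=True it travels with
    the body as metadata.
    """
    body_lines = []
    author = None
    notice = None
    for line in raw.splitlines():
        text = line.strip()
        if not text:
            continue
        byline = BYLINE.match(text)
        if byline:
            author = byline.group(1)
            continue
        if COPYRIGHT.match(text):
            notice = text
            continue
        body_lines.append(text)

    body = " ".join(body_lines)
    if keep_cmi:
        return Article(body, author, publication, notice)
    # Stripped: only the bare text survives, with no link to its owner.
    return Article(body)


if __name__ == "__main__":
    sample = (
        "By Jane Doe\n"
        "City council approves budget.\n"
        "Vote was 5-2.\n"
        "© 2024 Example Gazette"
    )
    print(extract(sample, "Example Gazette", keep_cmi=True))
    print(extract(sample, "Example Gazette", keep_cmi=False))
```

Both calls return the same body text. The only difference is whether the author, publication, and copyright notice come along with it, which is exactly the link the complaint says was severed.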