EU's AI rules spark debate over transparency

A new set of laws governing the use of artificial intelligence (AI) in the European Union will force companies to be more transparent about the data used to train their systems, prying open one of the industry’s most closely guarded secrets.

But as the industry booms, questions have been raised over how AI companies obtain the data used to train their models, and whether feeding them bestselling books and Hollywood movies without their creators’ permission amounts to a breach of copyright.

One of the more contentious sections of the AI Act states that organisations deploying general-purpose AI models, such as ChatGPT, will have to provide “detailed summaries” of the content used to train them.

While the details have yet to be hammered out, AI companies are highly resistant to revealing what their models have been trained on, describing the information as a trade secret that would give competitors an unfair advantage were it made public.

Cooking secrets revealed?

“It would be a dream come true to see my competitors’ datasets, and likewise for them to see ours,” said Matthieu Riouf, CEO of AI-powered image-editing firm Photoroom.

“It’s like cooking,” he added. “There’s a secret part of the recipe that the best chefs wouldn’t share, the ‘je ne sais quoi’ that makes it different.”

Over the past year, a number of prominent tech companies, including Google, OpenAI, and Stability AI, have faced lawsuits from creators claiming their content was improperly used to train their models.

Scarlett Johansson lookalike

Amid growing scrutiny, tech companies have signed a flurry of content-licensing deals with media outlets and websites. Among others, OpenAI signed deals with the Financial Times and The Atlantic, while Google struck deals with NewsCorp and social media site Reddit.

Despite such moves, OpenAI drew criticism in March when CTO Mira Murati declined to answer a question from the Wall Street Journal on whether YouTube videos had been used to train its video-generating tool Sora, which the company said would breach its terms and conditions.

Last month, OpenAI faced further backlash when a public demonstration of the newest version of ChatGPT featured an AI-generated voice that actress Scarlett Johansson described as “eerily similar” to her own.