Digestly

Mar 19, 2025

How AI Models Steal Creative Work — and What to Do About It | Ed Newton-Rex | TED


Generative AI companies rely on three main resources: people, compute, and data. While they invest heavily in engineers and computing power, they often use creative works as training data without permission or compensation. This practice is unsustainable and unfair to creators whose works are used without consent. Many AI companies scrape the web for data, including copyrighted material, and the resulting models then compete with the original creators. AI-generated music and art are already affecting the livelihoods of musicians and artists: filmmaker Ram Gopal Varma plans to use AI music in his projects, and artist Kelly McKernan's income has fallen. Legal challenges are ongoing, with creators arguing that AI training violates copyright law. Licensing is proposed as a solution that lets AI companies use creative works legally and fairly. Some companies have already adopted this approach, demonstrating its feasibility. Licensing ensures creators are compensated and discourages rights holders from restricting access to valuable online content. Public opinion supports compensating data providers, and initiatives like Fairly Trained certify companies that license their data. A collaborative approach between the AI and creative industries can bring mutual benefits, but it must begin with respect for creators' rights.

Key Points:

  • Generative AI companies often use unlicensed creative works for training, leading to unfair competition with creators.
  • AI training on copyrighted work without permission is common, with significant negative impacts on creators' livelihoods.
  • Legal frameworks like copyright laws are being tested, with ongoing lawsuits challenging unlicensed AI training.
  • Licensing training data is a viable solution, with some companies already adopting this approach to ensure fair compensation.
  • Public opinion favors compensating creators for their work, and initiatives like Fairly Trained promote ethical AI practices.

Details:

1. 🎨 Generative AI: Innovation vs. Ethical Concerns

  • Generative AI holds immense potential for innovation, transforming industries by creating new content autonomously.
  • A significant ethical concern is the use of creators' work without permission, which infringes on intellectual property and moral rights, as evidenced by cases where artists' styles are replicated without credit.
  • Balancing innovation with ethical practice is paramount, as highlighted by initiatives requiring AI companies to obtain explicit permission from original creators.
  • The ongoing debate in the tech community stresses the need for transparent AI systems that respect copyright laws and individual rights.
  • Solutions like watermarking AI-generated content and developing frameworks for ethical AI use are being considered to address these challenges.

2. 🔧 Building AI Models: People, Compute, and Data

  • AI companies need three key resources to build models: people, compute, and data.
  • The "people" resource is primarily engineers, who develop the models.
  • Compute resources, particularly GPUs, are necessary for running the training process.
  • Large volumes of high-quality data are required for effective model training.

3. 💰 Cost of AI Development: Paying for Resources

3.1. Financial Investments in AI Engineering and Modeling

3.2. Ethical Concerns in the Use of Training Data

4. 🔄 Resetting AI Ecosystems: Licensing Training Data

  • Licensing training data can establish a fair AI ecosystem, ensuring both AI companies and creators benefit.
  • Creators are compensated for their contributions, preventing exploitation and fostering sustainable practices.
  • AI companies gain access to high-quality, licensed data, which can enhance model performance and innovation.
  • Equitable licensing practices can lead to a more transparent and ethical AI development environment.

5. 🕵️‍♂️ Unlicensed AI Training: Current Practices

  • Most AI companies today do not license the majority of their training data.
  • Instead, they use web scrapers to find, download, and train on as much content as they can gather, often without considering its copyright status.
  • AI companies are often secretive about their training data sources, making it difficult to assess the legality or ethics of their data acquisition methods.
  • Training on copyrighted work without a license is widespread, raising significant legal and ethical concerns for the industry.
  • This practice exposes companies to potential legal actions and damages, as they may infringe on intellectual property rights.
  • The lack of transparency and regulation in data acquisition for AI training poses risks not only to companies but also to the creators of the original content.
  • If these practices continue, the AI industry could face increased scrutiny and possible intervention from regulatory bodies.

6. 📉 Impact of AI on Creative Industries

  • 64% of large language models developed between 2019 and 2023 were trained on data from Common Crawl, a web-scraped dataset that includes copyrighted works collected without licenses.
  • 21% of these models did not disclose their training data sources, highlighting a lack of transparency in the industry.
  • The practice of unlicensed training on copyrighted work is becoming a standard procedure in the generative AI industry, leading to significant negative impacts on creators, who may not be compensated or credited for their work.

7. 🤖 AI Models Competing with Original Works

  • Generative AI inherently competes with the data it was trained on. This competition is unavoidable, despite industry narratives that frame the technology primarily as democratizing creativity.
  • AI-generated content, such as art and writing, often directly competes with original creators, impacting industries by offering cheaper and faster alternatives.
  • For example, AI-generated art can undercut traditional artists by producing similar works at a fraction of the time and cost.
  • The implications extend to legal and ethical considerations, as the line between inspiration and replication blurs with AI's capabilities.

8. 🎥 Real-World Effects of Generative AI

  • Generative AI models trained on short stories can now generate similar stories, effectively competing with their source material, impacting the publishing industry.
  • AI models trained on stock images are producing new stock images, indicating a shift in sourcing and licensing practices in the stock image industry.
  • Music models trained on TV show music are capable of creating competing tracks, affecting the traditional music licensing business.
  • Despite being in early stages, generative AI is already altering market dynamics by providing efficient and accessible alternatives to traditional content sources.
  • The impact of AI is observable in real-world scenarios, with industries like publishing, stock images, and music licensing experiencing competition from AI-generated content.

9. 🎨 Artists and Creators: Struggling with AI Competition

  • Filmmaker Ram Gopal Varma plans to integrate AI music into all future projects, highlighting a trend towards AI-produced music in the film industry.
  • An AI-generated song achieved significant success by reaching number 48 on the German charts, demonstrating AI's growing influence in music.
  • Visual artist Kelly McKernan faced a 33% decline in income after their artwork was used to train an AI model, illustrating financial impacts on individual artists.
  • Illustrators around the world face increasing competition from AI models that were often trained on those illustrators' own artwork, raising concerns about copyright and creative ownership.

10. 📊 Freelance Market Impact: AI vs. Human Work

  • Demand for freelance writing jobs has fallen by 8% since the introduction of ChatGPT.
  • This reduction highlights a shift in the freelance market, where AI tools are increasingly replacing traditional writing roles.
  • Freelancers are encouraged to adapt by upskilling in areas that AI cannot easily replicate, such as creative and strategic writing.
  • The demand for AI-related skills, such as prompt engineering and AI integration, is on the rise, offering new opportunities for freelancers.
  • Freelancers focusing on unique human skills or combining AI with creativity can remain competitive in the evolving market.
  • Future trends suggest that AI will continue to reshape the freelance landscape, necessitating continuous learning and adaptation from freelancers.

11. ⚖️ Legal and Ethical Frameworks: Copyright Challenges

  • Generative AI is seen as competing with the work it is trained on, presenting a challenge for original creators.
  • Creators argue that AI training, which involves copying, is illegal under current copyright laws, as it infringes on their exclusive rights to authorize copies.
  • In the US, AI companies claim AI training is protected by the fair use exception, which allows unlicensed copying in certain cases, such as parody.
  • Creators and rights holders dispute this claim, arguing that the fair use exception cannot justify the mass exploitation of creative works for automated competition.
  • There are approximately 30 ongoing lawsuits in the US brought by rights holders against AI companies, aiming to resolve this legal question.
  • These lawsuits are likely to take time to conclude, leaving creators to contend with what they perceive as unfair competition in the interim.
  • The outcome of these lawsuits could significantly impact the creative industry, determining whether AI companies can continue using copyrighted works for training under the fair use doctrine.
  • A decision favoring AI companies could lead to increased use of copyrighted material without compensation to creators, while a decision against them might enforce stricter regulations and protect creators' rights.

12. 📜 Proposing Licensing as a Solution

  • Licensing is proposed as a viable solution for commercial entities to use copyrighted work, aligning with existing practices in merchandise and streaming services.
  • AI companies oppose licensing by invoking fair use, arguing AI should learn from copyrighted works without licenses, akin to human learning.
  • By contrast, human education compensates creators through paid resources such as books, whereas AI companies scrape content without paying for it.
  • Generative AI firms, valued at millions or billions, utilize extensive copyrighted content without compensation, raising legal and ethical concerns over copyright infringement.

13. 📈 Licensing Feasibility: Options and Examples

  • AI image generators are creating approximately 2.5 million images daily, and AI song generators are producing 10 songs every second, highlighting the scalability of AI content creation.
  • AI companies argue that licensing training data is impractical due to the vast amounts of data used, resulting in potentially small payments to individual creators.
  • Despite the argument about small payments, many content-licensing markets function this way, and creators still expect compensation.
  • The feasibility of licensing is challenged by AI companies, yet there have been 27 major licensing deals between AI companies and rights holders in the past year, indicating that licensing is possible and being actively pursued.
  • Specific examples of licensing deals include partnerships between major AI firms and music labels, where rights holders receive a share of profits from AI-generated music, illustrating a successful model of compensating creators.
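The "small payments" argument comes down to simple arithmetic. A fixed licensing pool is divided across a very large number of works, so each work's share is small, even though the total paid to creators can be substantial. The sketch below illustrates this with a pro-rata split. It is a hypothetical illustration only: the function name `pro_rata_payouts` and the idea of weighting by usage counts are assumptions, not a scheme described in the talk or used by any specific licensing deal.

```python
def pro_rata_payouts(pool: float, usage_counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical pro-rata split: each work gets pool * (its usage / total usage)."""
    total = sum(usage_counts.values())
    if total == 0:
        # No recorded usage: nothing to distribute to any work.
        return {work: 0.0 for work in usage_counts}
    return {work: pool * count / total for work, count in usage_counts.items()}


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the talk.
    print(pro_rata_payouts(100.0, {"song_a": 1, "song_b": 3}))
```

With these made-up inputs, `song_a` receives 25.0 and `song_b` receives 75.0. Scaling the same formula to, say, a pool of 1,000,000 spread evenly over 10,000,000 works gives 0.10 per work. As the talk notes, many existing content-licensing markets already operate with per-item payments this small.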

14. 🏢 Fairly Trained: Certifying Ethical AI Practices

  • Marketplaces provide access to training data, including public domain data like the 500-billion-word Common Corpus.
  • Synthetic data generated by AI models can be used to avoid copyright infringement.
  • Multiple companies have successfully licensed their data for AI model training, proving it is feasible.
  • Stability AI released an AI music model trained on licensed music, demonstrating practical application.
  • Fairly Trained is a nonprofit certifying generative AI companies that do not use copyrighted work without a license.
  • Since January, Fairly Trained has certified 18 companies, showcasing a commitment to ethical AI practices.
  • Certified companies adopt various licensing approaches, including licensing individual voices and music catalogs.
  • A large language model has been trained solely on public domain data, illustrating diverse data sourcing strategies.
  • Fairly Trained evaluates companies using criteria such as data sourcing, licensing agreements, and compliance with copyright laws.
  • Companies face challenges such as aligning business models with ethical practices and managing licensing costs.
  • Certified companies benefit from enhanced reputational credibility and potential market advantages.

15. 🌐 Public and Creator Sentiments on AI Training

15.1. Flexible Licensing Models for AI Training

15.2. Public Sentiments and Access Restrictions

16. 🗣️ Statement on AI Training: A Call for Fairness

  • 60 percent of people said the unlicensed use of creative works for AI training should not be allowed, versus only 19 percent who said it should.
  • 74 percent of people believe that AI companies should compensate data providers, while only nine percent opposed it.
  • The public consistently supports requirements around permission and payment for the use of data in AI training, rejecting the notion that public availability makes data fair game.
  • A "Statement on AI Training" has been launched, signed by 11,000 creators, including Nobel-winning authors and Oscar-winning composers, opposing the unlicensed use of creative works for AI training.
  • The unlicensed use of creative works for AI training is seen as an unjust threat to the livelihoods of creators, with potential catastrophic impacts on their professions.

17. 🤝 Mutually Beneficial AI and Creative Industries

  • Many artists, writers, and musicians currently oppose generative AI because it trains on their work without consent.
  • A mutually beneficial relationship between AI and creative industries requires beginning with respect for the value of creative works and the rights of creators.
  • Examples of potential benefits include AI assisting in repetitive tasks, allowing creators more time for original work, and AI-driven tools enhancing creative processes.
  • Challenges include ensuring fair compensation and recognition for creators whose work is used in AI training datasets.
  • Strategies for overcoming these challenges involve transparent agreements and collaborations between AI developers and creative professionals.

18. 🚀 The Path Forward: Licensing as a Sustainable Solution

  • Licensing training data will slow progress in the short term but will ultimately lead to equally capable and powerful models.
  • Following licensing practices will prevent conflicts with publishers and creators, fostering a sustainable ecosystem.
  • The Fairly Trained certification is an example of companies implementing licensing for training data, addressing ethical concerns and enhancing credibility.
  • AI companies are urged to license their training data, and their employees to advocate for this practice.
  • Users of generative AI are urged to inquire about the training data of their favorite models, promoting transparency and accountability.
  • A sustainable future is envisioned where generative AI and human creativity coexist symbiotically, driven by responsible licensing practices.

19. 👏 Closing Remarks

  • The talk ends with a closing thank-you and applause.