Training AI on User-Generated Content: Problems and Solutions

Note - by Evan Hancock Volume 104 - Issue 6

Introduction

The ongoing flurry of developments in generative Artificial Intelligence (AI) has the power to reshape society—through sheer volume of output, if nothing else. As a result, AI raises a variety of intimidatingly large issues.^[1] But it is best not to catastrophize.^[2] Instead, the best approach is to seek pragmatic, attainable policies to isolate and solve the various problems that AI creates.

To do so, it is critical to develop a basic understanding of how current AI technology works, what it can do, and what it cannot do. The current wave of AI advancement was teed up by a breakthrough in machine learning technology that occurred a handful of years before name-brand products like ChatGPT came to market.^[3] “Transformer” architecture, which enables today’s consumer AI models, was originally developed as an outgrowth of existing translation technology.^[4] Though the shift in model architecture initially seemed subtle, transformers immediately proved more powerful than older types of AI.^[5] This enhancement came because transformers encode relationships between input data points by themselves rather than simply matching input data to something a human had already equipped them to recognize.^[6]

Transformer architectures have enabled AI models to perform feats as incredible as they are ominous. New models can generate text so convincingly emotional that the models become objects of human affection, sometimes with tragic results.^[7] Despite its power, though, AI is not magic. When it comes to producing coherent information that a human could have or has already produced, AI works fairly well.^[8] But when it comes to performing original reasoning outside their training data, AI models struggle.^[9] This limitation led researchers from Apple, a company itself invested in AI, to describe AI models’ ability to reason as “fragile.”^[10] Perhaps, then, AI models will not lead to the apocalypse, and the problems they create can be addressed through measured policy proposals while preserving their utility as tools.

One such problem lies in the way models are made. To generate data, AI models must first be fed data; to feed models data, tech companies must get their hands on some. The industry’s current practice is simple: if data is available on the internet, use it.^[11] This practice has growing privacy implications as models incorporate large amounts of personal data.^[12] The practice is also in the crosshairs of celebrity artists whose works were scooped up for AI training.^[13] Those are problems for another day.

This Note will address an aspect of training data acquisition that, while perhaps less sensational, is certainly more familiar to the workaday social media scroller—the incorporation of user-generated content into AI models. For the purposes of this Note, “user-generated content” means “any kind of text, data or action performed by online digital systems users, published and disseminated by the same user through independent channels, that incur an expressive or communicative effect either [i]n an individual manner or combined with other contributions from the same or other sources.”^[14] That means that the scope of this Note excludes, say, a sitcom aired on network television; instead, the focus here is on TikTok users’ videos, Bandcamp users’ music, and photos of you and your family shared on Instagram.

As Part I discusses, unwanted incorporation of such content into AI training datasets is potentially unlawful and certainly both unethical and unpopular. Part II evaluates deficient efforts to address these problems, including the industry’s current practice of licensing user-generated content through consumer contracts. Part III then proposes that states leverage consumer protection law to achieve an opt-in regime for AI training on user-generated content, a solution effective against the practice’s problematic aspects. Finally, Part IV addresses challenges to this novel approach to prescribing cultural policy.

I. AI Training and the User-Generated Content Problem(s)

User-generated content is incorporated into AI models at the “training” stage,^[15] a process of machine learning and fine-tuning that enables models to encode language, images, or other information as the numerical values that AI operates upon.^[16] The goal of this process is not so much to create a replica of human cognition as it is to minimize the rate of error in the model’s output.^[17] Historically, large AI companies have pursued this end not through fundamental changes to the technology but by drastically increasing the size and complexity of transformer models.^[18] Recent model performance suggests that this scaling process is already experiencing diminishing returns, as was forecasted by some researchers.^[19] The approach has, however, significantly improved model capabilities over the last four years.^[20]

As models increase in size, so does their appetite for training data. If trends continue, training datasets will soon incorporate all human-generated data on the internet.^[21] Considering that models have a nasty habit of breaking down when trained on AI-generated data,^[22] AI firms are incentivized to incorporate as much human-generated data as they can as quickly as possible to stay ahead of the competition. This includes user-generated content, whether published for profit or not, which host platforms license to AI firms or use to train their own models.^[23] Such usage is potentially problematic in multiple ways.

A. Legal Liability

The problem of the greatest immediate concern to AI firms and user-generated content creators is the potential that use of such content in model training might render AI firms liable for damages. For starters, AI models can directly reproduce copyright-protected works included in their training data.^[24] Industry leader OpenAI, of course, insists that such “regurgitation” is a rare bug that will soon be eliminated.^[25]

Even if OpenAI is right on that count, the training process itself may be legally problematic. Several lawsuits alleging that AI training violates copyright are underway but far from resolution.^[26] Among these cases, the putative class action Andersen v. Stability AI Ltd.^[27] stands out as it involves user-generated content. To use the first named plaintiff as an example, Sarah Andersen differs from the average social media user in that she has several registered copyrights for comic books,^[28] but she also posts artworks as free-to-view user-generated content on various social media platforms.^[29] The defendant AI firms succeeded in having most of the plaintiffs’ claims dismissed.^[30] However, the court did not dismiss a direct copyright infringement claim against Stability AI because the plaintiffs “adequately alleged . . . that Stability downloaded or otherwise acquired copies of billions of copyrighted images without permission . . . and used those images . . . to train Stable Diffusion.”^[31]

Andersen and cases like it are evolving at a rapid pace. Filings in a similar case, Kadrey v. Meta Platforms, Inc.,^[32] allege that the company behind Facebook and Instagram knew that training datasets it used “included copyrighted works without the permission of the copyright holders.”^[33] Furthermore, the plaintiffs discovered internal Meta communications indicating that its “engineers ‘filtered . . . copyright lines’” out of the dataset and that CEO Mark Zuckerberg personally approved the use of a dataset that other members of “Meta’s AI executive team . . . ‘kn[e]w to be pirated.’”^[34] If this is true, it appears that tech companies themselves believed their training activities were potentially infringing. Still, whether courts will find copyright infringement in AI training remains to be seen. All that can be said for certain at this point is that Andersen has the potential to evolve into a precedent that would threaten all AI firms that train models on the open internet.

B. The Ethics of Copyright

Whether or not AI firms lose cases like Andersen, their training methods are offensive to the prevailing ethical theory underpinning American copyright law. This theory, that “[a]uthors behave well when they create and offer works that enrich the audience’s intellectual and cultural lives” and “[a]udiences behave well when they offer authors the financial support needed to engage in creative work,”^[35] traces back to Locke’s conception of property.^[36] A justification for copyright based on a labor theory of property must grapple with whether the production of intellectual property involves labor,^[37] but the issue is moot when the intellectual property right protects an intentionally created piece of personal expression posted to a user-generated content hosting platform. In the case of user-generated content, the products of human labor (however negligible the amount) are taken without compensation to train AI models.^[38] This is a serious ethical breach if Lockean property theory indeed underpins American copyright law.

Whether that is true remains contested, though. Locke’s theory of property was originally created to address the withdrawal of physical necessities from the common; to Locke, claiming something as property is only justified if that thing is not wasted and enough is left for others.^[39] This condition is not obviously satisfied by American copyright law, which protects even ideas that are “wasted” by being withheld from others—at least, for a certain period of time.^[40] Furthermore, the use of ideas and creative expressions (especially digital ones, copies of which are fungible and losslessly transmittable) is not rivalrous in the same way as physical property.^[41] These gaps render “a labor theory of intellectual property . . . powerful, but incomplete.”^[42]

In the alternative, one might view copyright law as a manifestation of authorship ideology. Rather than a property right based on labor, the story goes, a copyright is a heroic author’s reward for creating something original.^[43] If this theory is the true justification for copyright, that body of law protects one’s freedom to self-actualize via the creation of ideas that reflect one’s personality.^[44] This perspective calls to mind Professor Radin’s theory of property as personhood, which argues that some property interests are so central to one’s being that they ought to be presumptively afforded strong protection.^[45] Radin’s original article advancing that theory covered copyright only briefly and did not decide whether property interests in ideas are personal.^[46] However, Radin’s treatment of ideas implies that a copyright—a protection of economic interests alone—is insufficient when unaccompanied by moral rights if expressing ideas really is integral to an author’s personhood.^[47] In this way, a bare copyright is more aligned with a Lockean, economic protection of labor than a Hegelian, metaphysical protection of personality.

As critics of intellectual property economies point out, the copyright regime awkwardly form-fits exclusionary rights with ideas and creative works, which lack the literal rivalry in use of physical property.^[48] But the aim of this subpart is to describe rather than to critique; taking American copyright law as it is, AI firms’ use of user-generated content in training data is problematic regardless of which theory sounds louder. The Lockean conception—reduced to its simplest form, the notion that artists ought to be paid for their labor—is most aligned with current public perceptions of AI training and is ingrained in American law.^[49] Under that theory, AI firms’ current practice is an ethical breach so long as the users behind training data go uncompensated. That theory does not necessarily support creators earning more than a mere recoupment of their production costs^[50]—but even if that is all creators deserve, AI firms are not paying. If a personhood theory is more accurate, the practice is still unethical, at least insofar as training datasets incorporate user-generated content that is sufficiently personal to merit protection.^[51] When an AI firm trains a model on a piece of personality-infused content without the author’s consent, it violates the boundaries of that author’s personhood—boundaries that are usually fenced off by intellectual property law.

C. Popular Backlash to AI Training

Regardless of the legal or ethical implications of training AI models on user-generated content, the practice is deeply unpopular. A significant proportion of Americans believe that tech companies should not be allowed to train AI models on the open internet.^[52] One study found that supermajorities of Americans favored requiring AI firms to compensate creators of training data (though it should be noted that the mission of the group behind the study is to “seek[] political solutions to potential catastrophic risks from emerging AI technology”).^[53] Discomfort with the use of user-generated content in AI training is particularly prevalent among influencers, professional social media users whose livelihoods depend on receiving compensation for user-generated content. Many influencers with audiences numbering in the millions have broadcast that such use of user-generated content is not only wrong but outright theft.^[54]

Tech companies are already feeling the heat. As discussed in subpart I(A) above, users may have copyright claims against AI firms—but in the internet age, lawsuits operate on a much slower timescale than consumer outrage. In a recent update to the terms of service for its popular Creative Cloud software suite, Adobe faced scrutiny for merely appearing to clear the way for AI training.^[55] The update was unassuming; new language clarified that Adobe reserved the right to access users’ content “through both automated and manual methods.”^[56] Consumers’ reactions, though, were not; they forced Adobe to make multiple statements clarifying that it did not plan to train models on users’ content and to add explanatory language to the terms themselves.^[57]

Users’ distaste for AI is sufficiently potent that appealing to it can be a useful selling point for software products. The digital illustration app Procreate has specifically positioned itself against AI and, by implication, competitors who might train models on user-generated content.^[58] Procreate’s marketing copy on the issue is as strident as the consumer rhetoric it emulates, claiming that generative AI is “[b]uilt on a foundation of theft.”^[59] If rejecting AI training is a viable economic niche, negative consumer sentiment toward AI training must be significant.

Such harsh sentiment among user-generated content creators fills one of the key gaps in intellectual property ethics—namely, why a producer of a creative work that may not be particularly valuable on the market should be granted any right to exclude against certain uses. As intellectual property critics Oren Bracha and Talha Syed point out, the premises that (1) expressions are deeply imbued with the author’s personality only and (2) authors deserve enough exclusive control over works to effect expression of their personalities logically imply only that authors may exclude others from the creation process, not use or consumption after a work is published.^[60] Even if a legal right is not logically implied, though, it may be justified. Here, the repugnance of AI training to creators justifies the exercise of a legal right—that is, the entitlement of creators to exclude against repugnant uses and the correlative disentitlement of AI firms to make such use.^[61]

II. False Solutions

This Note is not the first writing in history to approach the problem of AI firms gobbling up Americans’ data en masse, much less Big Tech’s other affronts to data privacy. This Part, though, will explain why previous proposals to address AI training on user-generated content fall short. Most are poor fits for the problems previously identified or are simply wishful thinking. The industry’s current “solution,” deployed via adhesive consumer contracts, is predictably cynical and likewise ineffectual.

A. Revolutionary Proposals

The emergence of generative AI has intensified calls for the United States to address a broad range of sociotechnological harms in one fell swoop through large-scale federal action. This subpart will examine why those policies are not the best way to address AI training on user-generated content.

1. EU-Style Data Protection.—New AI issues intersect with continuing demand for the United States to enact a European Union-style comprehensive data protection law.^[62] The EU’s General Data Protection Regulation (GDPR) provides users with a variety of rights related to accuracy of and control over personal data.^[63] AI firms operating in the EU are also subject to the Union’s AI Act^[64] and data mining copyright directive.^[65] Pushes for the federal government to standardize United States data privacy law are certainly worthy and, according to some, necessary.^[66] However, such an approach has two key deficiencies when it comes to AI training.

First, in the current American political landscape, federal data protection is too far off to react swiftly enough to the booming AI industry. This is not for lack of trying: two members of Congress, one Democrat and one Republican, sponsored a major data privacy bill in early 2024.^[67] The bill was dead by summer.^[68] “The decision to cancel the hearing” at which the House Committee on Energy and Commerce would have considered the bill “came amid news that House GOP leaders had vowed . . . to scuttle the bill whether it was approved by the committee or not.”^[69] Republicans considered the bill insufficiently “pro-business,”^[70] a claim not without irony considering that the current jumble of state data protection laws may be much worse for small and medium businesses than a bipartisan federal law.^[71] But it seems those are not the businesses in question, as the tech industry’s oligarchs, almost to a man, recently dropped all pretenses as to which political party is more useful to them.^[72] No progress toward data protection will be made under a Republican trifecta government.

Even in a world where the EU’s regulations on AI were adopted stateside, though, the particular problems with AI training on user-generated content would go unsolved. While the EU gives consumers a right to opt out of data collection and have their data erased from corporate servers,^[73] there is little EU users can do to stop ongoing use of their content in AI models. Because transformer models do not actually contain copies of training data,^[74] deletion requests, like those Europeans are entitled to make, do not affect the models themselves. Sending a deletion request to an AI firm results in the erasure of only any personal data one gave that firm while using an AI product.^[75] Besides, though the future of American copyright claims over AI training is up in the air, that Andersen made it past the motion to dismiss stage^[76] is already better than a similar claim in Europe would fare. The EU’s copyright directive on data mining instructs member states to make an “exception or limitation” to copyright protections for “reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.”^[77] The exception does not apply if the right to use works has been “expressly reserved by their rightsholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”^[78] But a plain-language declaration that one dislikes AI on one’s social media page,^[79] for example, might not qualify as “machine-readable.”^[80] In fact, the Directive has already been cited as the primary grounds to dismiss a copyright complaint against a training dataset compiler.^[81] All told, EU-style regulation in the United States would be both too little and too late.

2. Federal AI Agency.—Alongside calls for nationwide data use regulations, policy think tanks and tech companies alike have proposed the creation of a new federal agency for AI regulation.^[82] An “AI agency” would not solve issues with AI training on user-generated content. Though such an agency might well produce some useful regulations, current proposals focus on model output over input and center enterprise AI over consumer AI.^[83] Brookings scholar Anton Korinek’s proposal would empower an AI agency to “pursue solutions to the direct AI control problem” (enterprises’ abilities to ensure AI systems function as intended) and “to oversee and . . . regulate the way AI is used across the U.S. economy to address the social control problem” (i.e., to ensure that AI output conforms to social norms).^[84] Korinek portrays these broad categories as AI’s essential issues and assumes that more targeted approaches would come at the expense of accomplishing his grand designs.^[85] While portraying AI as potentially more revolutionary than aviation or nuclear energy, Korinek joins industry leaders in centering calls for regulation on AI’s output, which can be discriminatory, misinformative, or otherwise disagreeable.^[86]

Conspicuous by their absence in calls for a federal AI agency are any mentions of AI training. In fact, the industry’s proposal to the government seems to be: “draw the line for us now so that we know how to avoid provoking you going forward.”^[87] This guardrail-setting approach is a sensible way to start fixing AI output.^[88] Addressing problems with training, though, would necessarily abridge tech companies’ abilities to create models with wild abandon. Therefore, even if one assumes that industry leaders’ calls for regulation are made in good faith, they are unlikely to advocate for any policy that would meaningfully address training problems. This probability is bolstered by worries that AI industry leaders’ friendliness with top-level government officials would lead to rampant industry capture of an AI agency, making the AI sector particularly vulnerable to a problem that already affects federal agencies.^[89] In short, a federal “AI agency” may well help the tech industry set guardrails for model output, but the incentives at play would make it unlikely to address problems with AI training on user-generated content.

B. 1:1 Compensation

Another proposal to solve the problems of AI training goes that AI firms should simply be made to pay users back for their data.^[90] What “levying one’s way out of a doctrinal and economic dilemma”^[91] has in conceptual elegance, though, it lacks in practicality. A chief inspiration for a compensatory system is found in the Audio Home Recording Act (AHRA), an early-nineties federal law that taxed equipment that could be used to pirate music so that the music industry could be compensated for lost profits.^[92] However, that model is the exact inverse of AI training. In the case of the AHRA, a massive, decentralized body of potential infringers was made to compensate centralized rightsholders. AI firms are, instead, a few centralized infringers drawing content from a massive, decentralized body of rightsholders. The sheer number of social media users whose data has been swept up by models is probably in itself fatal to this proposal’s workability.

Trying to compensate users for their content on a one-to-one basis also raises difficult questions about how levies should be calculated and the value of user-generated content. Would the amount collected from AI firms be some percentage of their revenue imagined to represent how much their use of user-generated content earned them? Which companies would be taxed? Would the levy reach only firms with AI at their core, like OpenAI, which carry valuations in the hundreds of billions?^[93] Or would its sweep include Nvidia, a firm with a multi-trillion-dollar valuation that deals primarily in the hardware used to make AI?^[94] Similar problems of scope gutted the AHRA when MP3 players, though definitely devices used by music pirates, were held not to fall within the Act’s ambit.^[95] Furthermore, it would be tremendously difficult to decide how much should be paid to each user. Not all user-generated content is created equal; setting aside sponsorships and other ways of making money off-platform, content-hosting platforms require users to accrue certain levels of popularity before participating in profit-sharing schemes.^[96] Surely, then, users should not be compensated equally, as some content is ordinarily deemed to be worth nothing.^[97] However, if each piece of user-generated content has equal influence upon model weights, then users should be compensated equally. This fundamental contradiction makes compensating sources of training data much less appealing.

Finally, compensation would be partially ineffective against the popular backlash to AI training. As discussed above, there is a notion among consumers that AI training is “theft.”^[98] Perhaps some users would be happy to receive fair monetary value for their content. However, particularly strident opponents insist that, because content incorporated into the training dataset theoretically influences every operation of the model, each piece of model output is also theft.^[99] For those who believe that AI outputs influenced by their works are a sort of stolen goods, a post-hoc payment for the input would not be satisfying. They would only be appeased by the ability to decide whether or not AI firms may use their content before model training occurs.

C. Current Industry Practice: Licensing via Consumer Contracts

The tech industry’s current practice, which purports to solve the problems with AI training on user-generated content, is to obtain users’ assent via consumer contracts.^[100] This is not altogether new; hosting platforms have licensed user-generated content for most purposes since before such platforms started training AI models.^[101] Thanks to this existing practice, platforms have adapted their terms to include AI training without conferring any additional benefits upon users.^[102]

American law is especially friendly to this practice. Meta, for example, was able to begin training AI on American users’ content without any special notification.^[103] Though some scholars object to such abrupt changes in terms,^[104] the contracts remain effective. Like all adhesive contract practices, changes in terms test the limits of traditional principles, but American law has expanded to accommodate.^[105] Contemporary American law generally endorses contracts like this as long as they are properly made.^[106]

To give credit where it is due, some platforms allow American users to opt out of having their new content used for AI training.^[107] However, Meta platforms do not,^[108] leaving Meta’s Terms of Service as the social media monolith’s only means of addressing the problems with AI training. This may well be enough to escape legal liability. A key feature of the Andersen case is that Stability AI did not acquire the works at issue through a deal with a licensee^[109]—instead, it acquired them through a third-party data compiler without any license.^[110] Supposing the Andersen plaintiffs manage to recover against Stability AI, the same reasoning could not be applied in a case against an AI firm with a licensed training set of user-generated content.

However, licensing user-generated content through adhesive consumer contracts is ineffective against the ethical problems with AI training. Operating under a Lockean theory of intellectual property law, the licensee platforms might insist that they do compensate users for their labor in that they provide a service in return for that labor. However, as discussed above, platforms introduced the specific provisions allowing for AI training only recently and well after many users initially manifested assent to their transactions with those platforms.^[111] For users who joined licensee platforms years before the emergence of generative AI models,^[112] the proposition is not “give us a license and you may use our platform”; rather, it is the ultimatum, “give us a license and you may continue to use our platform. Otherwise, leave.”^[113] The discussion, then, should not be whether users are adequately compensated but whether the licenses are enforceable at all.^[114] Regardless, these licenses miss the mark when it comes to Lockean copyright ethics for want of additional consideration. Finding the ethical breach is even easier under a personality theory of copyright, which problematizes any use of a personality-imbued work that strips its creator of self-determination, even if the creator is pressured into yielding partial legal control over that work.^[115]

Consumer contracts are likewise ineffective devices for the mitigation of popular backlash to AI training. In fact, the adhesive licensing provisions themselves form part of the controversy.^[116] Current public outcry, vocal as it is, may not even reflect the true prevalence of negative sentiment regarding AI training.^[117] Consumer contracts are useful to businesses in that consumers tend to feel bound by their terms, but those same consumers do not necessarily believe the terms to be fair.^[118] All told, the practice of training AI models on user-generated content remains unethical and unpopular despite users’ assent to fine print permitting the practice. A better approach is needed.

III. A Proposed Solution in State Consumer Protection Law

Despite the strain they place upon traditional contract doctrines, adhesive consumer contracts are accepted because they can efficiently serve modern consumers’ wants and needs.^[119] This being the case, drafters are afforded broad leeway to impose terms that smack of unfairness.^[120] Coerced assent to AI training is just the latest manifestation. As a response, the law provides safeguards to keep consumer contracts within certain boundaries of fairness.^[121] Consumer protection statutes are particularly powerful tools for doing so, as they enable governments to proscribe conduct that businesses with the power of the drafter’s pen might otherwise get away with.^[122] This Part will explain why the imposition of an “opt-in” regime via consumer protection law would solve AI’s user-generated content problems, then propose a particular method of doing so through state consumer protection statutes.

A. Opt-In Is Effective

Consumer protection law provides a way to shape trade practices for the better. The particulars of this method are laid out in subpart III(B), infra, but the gist is simple: consumer protection law provides a route to effectively forbid training AI models on user-generated content without separate, specific, affirmative consent.

Several major platforms’ training data collection programs operate on an “opt-out” basis.^[123] This trend mirrors the current landscape of state data privacy law.^[124] Just like opt-out schemes for the sale of personal data, user-generated content licenses for AI commit users by default to conduct they would often prefer to avoid.^[125] Swapping opt-out for opt-in would place the control back in users’ hands. True, the change would essentially have the effect of swapping the current default rule for its opposite, but the difference would be material. After all, content can always be added to new training sets but generally cannot be removed from a trained model, at least for now.^[126] An opt-in requirement would also prevent consumers with low digital literacy from unknowingly having their content used without their consent.^[127]

1. Opt-In Means Ethical Use of User-Generated Content.—An opt-in paradigm would provide a route to ethical inclusion of user-generated content in AI models. Under a labor theory of intellectual property, the alienation of one’s works is right so long as it is voluntary.^[128] A user who is entirely willing to contribute content to AI training datasets without receiving any form of compensation beyond use of the platform collecting such content should be entitled to do so. Likewise, a user who is unwilling to contribute to AI training should be permitted to withhold content. An opt-in paradigm achieves both ends.

While the inquiry is not quite so simple under a personality theory of property, the author’s will remains the touchstone in the unique circumstances discussed here. The alienation of personality-infused property is somewhat paradoxical in that Hegel and his successors equate alienation with abandonment—and if one abandons a thing by withdrawing one’s personality from it, what right has one to decide what happens to that thing?^[129] This might pose a problem in the case of voluntarily giving up, say, a physical painting or manuscript in exchange for nothing. But user-generated content online is a horse of a different color. Media posted online can remain linked to the author’s personhood—say, on the author’s personal Facebook page—while simultaneously being reproduced elsewhere, including in AI training datasets. Therefore, authors retain an interest in the destiny of their works.^[130] An up-front opt-in system would allow authors to determine whether they will participate in AI training according to their own personalities without ever being forced to act any other way.

2. Opt-In Is Popular.—Likewise, an opt-in system would address the causes of public backlash to AI training practices. As discussed above, such sentiments are often couched in a rhetoric of “theft,” reflecting that the primary objection is the default non-consensual (or, via contract, artificially consensual) nature of AI training licenses.^[131] An opportunity for users to exercise their wills before any content is incorporated into training data would address this concern. While opt-out systems provide at least some opportunity for choice, it may come too late for those users who do not want to contribute any content to AI training. Besides, if the analogous issue of personal data privacy is any indicator, a large majority of American users would likely prefer an opt-in requirement for AI training.^[132]

Consumers are crying out for protection. As luck would have it, state governments have purpose-built tools for such issues. An effective opt-in requirement for the incorporation of user-generated content into AI training datasets could be achieved through state consumer protection laws.

B. The Mechanism: State UDAP Statutes

State-level Unfair and Deceptive Acts and Practices (UDAP) law provides the right mechanism to impose a de facto opt-in requirement. Every state has some form of UDAP statute that allows the state to protect consumers and allows consumers to protect themselves.^[133] By doing so, these statutes have an advantage over federal law; federal consumer protection law can do the former but not the latter.^[134] Two hallmarks of state UDAP laws are their applicability to most all “personal consumer transactions involving products and services” and their “choice of a principle . . . over a rule.”^[135] The latter means that UDAP laws center value-laden, flexible language like “unfair.”^[136] Some states lean into the adaptability of such language, making general prohibitions of “deceptive acts” in trade with relatively few elaborations.^[137] Other states, though, follow the National Conference of Commissioners on Uniform State Laws’ model, the Uniform Deceptive Trade Practices Act, by listing specific practices they define to be examples of unfair or deceptive conduct.^[138] These prohibitions can be very granular—for example, California defines as a deceptive trade practice “[a]dvertising furniture without clearly indicating that it is unassembled if that is the case.”^[139] A solution to AI’s user-generated content problems exists in this ability to proscribe narrow categories of conduct.^[140] For the sake of simplicity, this subpart will explore this possibility mainly with reference to Texas’s Uniform Act-style UDAP law, the Texas Deceptive Trade Practices Act (DTPA).^[141]

It is relatively unlikely that consumer protection suits under UDAP statutes’ broadest protections could cause a shift to an opt-in system. For example, one of the most used protections of Texas’s DTPA is its general protection against companies “representing that goods or services have sponsorship, approval, characteristics . . . or quantities which they do not have.”^[142] Perhaps a user could argue that platforms, especially those that allow users to monetize their content, implicitly represent that they will not make unwanted copies of user-generated content. However, the practice of licensing user-generated content for AI training would likely not be found a violation of such a broad protection unless a platform made an affirmative representation indicating that it would not do so.^[143] Most state UDAP statutes, including the DTPA, also allow consumers to recover for unconscionable actions.^[144] The DTPA defines an “[u]nconscionable action or course of action” as “an act or practice which, to a consumer’s detriment, takes advantage of the lack of knowledge, ability, experience, or capacity of the consumer to a grossly unfair degree.”^[145] Texas courts’ evaluations of unconscionability are hard to predict, depending on factfinders’ decisions as to whether the “unfairness was glaringly noticeable, flagrant, complete[,] and unmitigated.”^[146] Unconscionable acts claims against AI firms stand a chance of success, especially because the relevant question is the nature of the practice of AI training rather than the contractual provisions enabling tech companies to do so.^[147] But the true nature of UDAP laws’ potential to bring about an opt-in system is not judicial—it is legislative.

Instead of depending on broad existing protections, state legislatures should take initiative.^[148] Defining, say, “the use, acquisition, or licensing of user-generated content for the purposes of generative AI model training without obtaining users’ specific informed consent”^[149] as a deceptive trade practice would very likely bring about an opt-in system. Though taking such action in response to AI would be innovative, it would not be unprecedented. As mentioned previously, legislatures in those states that follow the Uniform Deceptive Trade Practices Act’s model have enacted similarly specific provisions in an attempt to eliminate particular harmful practices.^[150] Prescribing cultural policy like this, through a provision with limited scope and a clear normative foundation, is preferable to the status quo as it achieves the objectives of copyright law without carrying the same baggage: namely, copyright law’s “unmooring” from theory due to market incentives and accreted legal complexity.^[151] This is perhaps more true in AI regulation than anywhere else, as AI has the potential to cause unprecedented distortion in copyright law.^[152] Of course, the tech companies would not simply roll over in response to the imposition of a de facto opt-in requirement—but fortunately, state UDAP laws are built to ensure effective consumer protection.

1. Providing a Private Right of Action: AI Training and Damages.—A significant benefit of UDAP laws for states is that they provide a sort of private law enforcement.^[153] However, enforcing the law through the provision of a private right of action is complicated in that it implicates all of the usual limitations of private lawsuits. Therefore, a natural hurdle to enforcing an opt-in requirement for AI training with UDAP law is in those laws’ damages requirements.^[154] Texas’s DTPA specifically requires consumers’ damages to stem from pecuniary loss or mental anguish.^[155] An unauthorized copy of some piece of user-generated content, a licensee might argue, is not worth any amount of money to the user. Perhaps that is true of the average Facebook post; in DTPA actions, Texas courts look to the fair market value of things lost by the consumer,^[156] which is probably exactly zero dollars for most user-generated content. But popular internet users are able to monetize their content in a variety of ways. Whether they make money through sponsorships, affiliate marketing kickbacks, or even art auctions, there exist users whose content is economically valuable and who would serve as useful private attorneys general.^[157] Mental anguish damages are also available under the DTPA and can serve as the basis for a claim where no great economic injury is alleged.^[158] The evidentiary burden to prove mental anguish is not inconsequential—plaintiffs need “direct evidence of the nature, duration, and severity of their mental anguish, thus establishing a substantial disruption in [their] daily routine”^[159]—but AI training is certainly capable of producing such evidence. For example, consider the (mis)use of AI models trained on photos of a person uploaded as user-generated content to generate sexually explicit images of that person without consent.^[160] Such events must produce mental anguish—especially for those victimized by AI-generated child sexual abuse material.^[161]

If the suggested legislation were passed, then, there would be plenty of consumers with worthy claims for damages. In Texas, tech companies would also have virtually no way to make users waive those claims. The DTPA allows waiver of a prescribed form, but the consumer must be represented by an attorney in the transaction, and the parties must have roughly commensurate bargaining power.^[162] This rules out waiver in adhesive consumer contracts. The lawsuits, then, would roll in, placing an economic strain upon noncompliant companies. This pressure would be limited, in some cases, due to “single-file” provisions in consumer contracts.^[163] In the absence of class actions, though, UDAP laws’ provision of exemplary damages and attorneys’ fees would significantly increase potential awards.^[164]

Of course, licensee tech companies are careful to limit their liability to negligible dollar amounts in their terms.^[165] If states adopt the recommended UDAP provision, though, that effort may be for naught. Texas courts, for example, interpret the provisions of the DTPA prohibiting waiver in most consumer transactions to mean that a defendant cannot limit its liability for claims alleging violations of the Act’s specific protections.^[166] Tech companies would also have equitable relief to worry about; UDAP laws, as tools of law enforcement, allow courts to issue orders enjoining violations.^[167] It is likely, then, that licensee tech companies would conclude that their chance of winning multiple private UDAP lawsuits is not sufficient to outweigh the risk of courts enjoining their licensing practices and billing them for exemplary damages.

2. Public Law Enforcement.—But private attorneys general are not the only parties able to enforce UDAP statutes—good old-fashioned state attorneys general can, too.^[168] Attorneys general can bring actions in the public interest to enjoin practices that are specifically proscribed by UDAP statutes or fall within catch-all prohibitions of unfair conduct.^[169] A prohibition of non-consensual AI training on user-generated content would, in fact, give state attorneys general another tool to engage in the growing trend of public enforcement actions against tech companies’ unfair practices. The Texas DTPA itself was used in conjunction with Texas data privacy laws to extract a billion-dollar settlement from Meta for its collection of Texans’ facial geometry data.^[170] In one action, the Attorneys General of forty states even used consumer protection law as a vehicle to set up a kind of opt-in regime for certain data; namely, Google agreed to “refrain from sharing a user’s precise location information with a third-party advertiser, absent [that user’s] express affirmative consent for sharing and use by that third party.”^[171] The public and private law enforcement functions of UDAP laws, taken together, provide a significant chance to solve AI’s user-generated content problem, though the solution is not without its limitations.

IV. Challenges to Consumer Protection-Driven Approaches

Statehouses around the country have challenged Big Tech before. Some attempts are successful and widely adopted, while the industry is incentivized to circumvent others. This Part will evaluate the industry’s likely response and challenges inherent to UDAP enforcement.

A. The Compliance Gambit

Any single state’s adoption of the proposed legislation would represent an improvement over the current state of play as it would at least protect the citizens of that jurisdiction. Whether the protection of all Americans would require mass adoption among the states, though, is another matter. This subpart will compare two paths the consumer protection approach to AI training could take: one in which nationwide compliance might be achieved after just a few key jurisdictions enact the law and another in which mass adoption would be required.

1. The Best Scenario: Broad Compliance à la Data Privacy Laws.—In the best-case scenario, just a handful of large states adopting the proposed legislation would lead to nationwide compliance. If data privacy laws are any indication, this outcome is even likely. Almost any large-company website that Americans visit today complies with at least one privacy law from outside their home jurisdiction and, depending on the state in which they reside, potentially several.^[172] The two most impactful are the EU’s GDPR and the California Consumer Privacy Act (CCPA).^[173] Both laws have extraterritorial reach.^[174] As a result, both European and Californian authorities can collect penalties from companies all over the United States.^[175] This reach has caused the proliferation of data protections like opt-ins for all but service-essential cookies in America.^[176] If the recommended legislation mirrors this pattern, its adoption in just a few jurisdictions—say, California, Texas, and Pennsylvania^[177]—could lead to nationwide compliance.

2. A Worse Outcome: Local Compliance à la Porn Bans.—Not all internet regulations see precautionary compliance outside of their own jurisdictions, though. As the current crop of state laws requiring age verification to view adult content online proves, tech companies are capable of tailoring legal compliance on a state-by-state basis.^[178]

Which path would the proposed legislation follow, then? The most likely answer lies in the difference between a law like the CCPA and a “porn ban.” Both laws impose high costs of compliance, but the scope of their protections and business incentives cause companies to think “better safe than sorry” of the former and cut their losses with the latter. The CCPA is, at its core, a piece of consumer protection legislation. Its protections follow Californians across state lines.^[179] Meanwhile, “porn ban” laws regulate adult websites directly via public law enforcement within particular states.^[180] This means that a company can completely escape liability by stopping all distribution of adult content in jurisdictions with bans. Considering that consumers can circumvent “porn bans” with near-trivial effort,^[181] pornographic websites are incentivized to make a dramatic exit from banning states while still reaping revenue from their residents.^[182] In contrast, a company could modify its website so as to collect zero data from any user within California but still be liable under the CCPA for mishandling traveling Californians’ data.^[183] Most likely, then, a consumer protection approach to AI training on user-generated content would result in large-scale compliance after passage in just a few states. UDAP laws are more like the CCPA than “porn bans” in that they allow individuals to enforce their protections.^[184] Furthermore, UDAP prohibitions apply to violators outside of their home states.^[185] All told, a path to mass compliance will not likely require mass adoption—but the full landscape of internet regulations reveals that this gambit is not without risk.

B. Standing in Suits Against Non-Licensed Opaque Models

A consumer protection approach to AI training will face challenges if AI firms respond by making their data sourcing even less transparent. Thus far, many popular consumer AI models have been trained on datasets compiled by nonprofit organizations.^[186] At least, the AI firms do not deny it, though they are cagey about exactly which datasets they use for each model so as not to reveal their methods to competitors.^[187] While nonprofits have compiled data en masse without users’ specific consent, their datasets at least have the virtue of being free to the public and therefore searchable with the right technical know-how.^[188] The Andersen complaint even cites an AI training dataset search tool to show that the plaintiffs’ alleged injury is traceable to defendant Stability AI’s conduct.^[189]

Therein lies the rub. While the proposed legislation would provide a means to combat unfair licenses of user-generated content or unfair uses of above-board datasets, such a litigation-dependent solution may not be so effective against an AI company that decides to use a total black-box method. Consider the following scenario: An American AI firm purchases training data from a compiler outside United States jurisdiction that does not make its datasets public. But instead of receiving the data itself, the firm only purchases a right to access that data for such a time as is necessary to train a model. The firm trains a new model on that data, then loses any ability to access the dataset, and never knows exactly what the dataset contained. After the model is deployed, a digital artist comes to believe the model was trained on his art. How would that artist show that any injury was fairly traceable to the AI firm’s conduct? It is possible that the artist would be found to lack standing. However, he is not without hope. A showing that an AI model can “regurgitate” close copies of the artist’s works strongly supports the inference that the model was trained on that content.^[190] And, if a user has a large portfolio of content, a model producing content convincingly mimicking the style of that user would strongly support traceability.^[191]

Conclusion

There is much to be done about AI. The industry’s power demands, social effects, and increasing integration into business and daily life all call for sensible but potent policy approaches. But while the federal government promotes (and even uses)^[192] AI, problems with the way models are made fall by the wayside. This includes the fact that AI models are trained on user-generated content—including yours and mine—without permission. That particular training practice could prove unlawful, but that much is uncertain; in the meantime, it is definitely unethical and controversial. Luckily, our nation’s fifty policy laboratories have just the tools for the job. State legislatures should use their powers to protect consumers from the AI industry’s user-generated content addiction; in doing so, they would make strides in ensuring that the next generation of models is trained right.

. See, e.g., Dara Kerr, AI Brings Soaring Emissions for Google and Microsoft, a Major Contributor to Climate Change, NPR (July 12, 2024), https://www.npr.org/2024/07/12/g-s1-9545/ai-brings-soaring-emissions-for-google-and-microsoft-a-major-contributor-to-climate-change [https://perma.cc/NC4B-K77Y] (surveying the emissions implications of AI’s notorious energy demands and much of the industry’s opacity with respect thereto); Olga Akselrod & Cody Venzke, How Artificial Intelligence Might Prevent You from Getting Hired, ACLU (Aug. 23, 2023), https://www.aclu.org/news/racial-justice/how-artificial-intelligence-might-prevent-you-from-getting-hired [https://perma.cc/V4ZQ-KRHG] (explaining how AI tools replicate human racism, sexism, and other bigotries in the employment arena). ↑
. To do so would play into the industry’s hand; OpenAI CEO Sam Altman has made downright eschatological statements about the very products he sells in an apparent ploy to position current AI leaders as uniquely powerful oracles whose economic positions must be maintained. See Peter Guest & Morgan Meaker, Sam Altman’s Second Coming Sparks New Fears of the AI Apocalypse, Wired (Nov. 22, 2023), https://www.wired.com/story/sam-altman-second-coming-sparks-new-fears-ai-apocalypse/ [https://perma.cc/K2Q9-69RS] (critiquing “Altman’s strategy of raising billions of dollars . . . while . . . profess[ing] fears of extinction-level events”). ↑
. Google researchers created the transformer model architecture in 2017. Ashish Vaswani et al., Attention Is All You Need 2 (June 12, 2017) (unpublished manuscript), https://arxiv.org/pdf/1706.03762v1 [https://perma.cc/YNQ8-L44Z]. ChatGPT and Midjourney debuted to end users in 2022. Navigating the AI Revolution Timeline of 2023-2024, AI-PRO (Aug. 7, 2024), https://ai-pro.org/learn-ai/articles/navigating-the-ai-revolution-timeline-of-2023-2024/ [https://perma.cc/P79S-M6WT]. ↑
. Vaswani et al., supra note 3, at 2. ↑
. See id. at 8 (noting that the initial transformer model significantly outperformed previous single models at significantly lower training cost). ↑
. Rick Merritt, What Is a Transformer Model?, Nvidia (Mar. 25, 2022), https://blogs.nvidia.com/blog/what-is-a-transformer-model/ [https://perma.cc/R7P7-3APR]. ↑
. See, e.g., Kate Payne, An AI Chatbot Pushed a Teen to Kill Himself, a Lawsuit Against Its Creator Alleges, Associated Press (Oct. 25, 2024), https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0 [https://perma.cc/AYK5-MAMR] (reporting that a chatbot designed for social interaction, speaking as a character from Game of Thrones, encouraged a teenage boy to “come home” moments before he committed suicide). ↑
. See OpenAI, GPT-4 Technical Report 4–5 (Mar. 4, 2024) (unpublished manuscript) [hereinafter OpenAI, GPT-4], https://arxiv.org/pdf/2303.08774 [https://perma.cc/47EJ-DBT8] (documenting the high scores models behind ChatGPT were able to achieve on various scholastic and professional examinations). But see Sam Skolnik, Lawyer Sanctioned over AI-Hallucinated Case Cites, Quotations, Bloomberg L. (Nov. 26, 2024), https://news.bloomberglaw.com/litigation/lawyer-sanctioned-over-ai-hallucinated-case-cites-quotations [https://perma.cc/3LHU-8V78] (documenting a Texas attorney’s punishment for submitting a brief containing AI-generated citations to fake cases); Casey Newton (@crumbler), Threads (May 23, 2024), https://www.threads.net/@crumbler/post/C7VGpYSPOgT [https://perma.cc/5ZPF-Z73F] (preserving for posterity that a Google AI model generated the text, “[a]ccording to UC Berkeley geologists, eating at least one small rock per day is recommended”). ↑
. See Iman Mirzadeh et al., GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models 7–10 (Aug. 27, 2025), https://arxiv.org/pdf/2410.05229 [https://perma.cc/3DM4-YU3Z] (observing that transformers performed well on elementary-level math problems that were likely included in their training data but significantly worse when the names or numerical values were changed); Amit Prakash, Emergent Properties of Large Language Models (LLMs) Including ChatGPT, ThoughtSpot (Feb. 22, 2023), https://www.thoughtspot.com/data-trends/ai/large-language-models-vs-chatgpt [https://perma.cc/R5SX-2MWZ] (receiving nonsense outputs from ChatGPT in response to problems ranging in difficulty from the famously unsolved “3n+1” problem to arithmetic with large numbers). ↑
. Mirzadeh et al., supra note 9, at 10; see Apple Intelligence, Apple, https://www.apple.com/apple-intelligence/ [https://perma.cc/9AUS-V7ER] (marketing AI integration in new Apple products). ↑
. Ina Fried, For AI Firms, Anything “Public” Is Fair Game, Axios (Apr. 5, 2024), https://www.axios.com/2024/04/05/open-ai-training-data-public-available-meaning [https://perma.cc/JY6N-SWSP] (“‘Publicly available’ can sound like the company has permission to use the information—but, in many ways, it’s more like the legal equivalent of ‘finders, keepers.’”). ↑
. Id. ↑
. See, e.g., Class Action Complaint at 10–11, 13–15, Authors Guild v. OpenAI Inc., No. 1:23-cv-08292-SHS (S.D.N.Y. Dec. 4, 2023) (alleging that OpenAI models were trained on books by celebrity authors, including George R.R. Martin, Jodi Picoult, and John Grisham). ↑
. Marcelo Luis Barbosa dos Santos, The “So-Called” UGC: An Updated Definition of User-Generated Content in the Age of Social Media, 46 Online Info. Rev. 95, 108 (2022) (emphasis omitted). The privacy implications of AI models incorporating other personally identifiable information are beyond the scope of this Note. ↑
. See How ChatGPT and Our Foundation Models Are Developed, OpenAI, https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-foundation-models-are-developed [https://perma.cc/NWC8-K6JS] (“A significant portion of online content involves information about people, so our training data may incidentally include personal information.”). ↑
. Model Training, C3.ai, https://c3.ai/glossary/data-science/model-training/ [https://perma.cc/8V87-X27J]. ↑
. Id. ↑
. See Cal Newport, What if A.I. Doesn’t Get Much Better Than This?, New Yorker (Aug. 12, 2025), https://www.newyorker.com/culture/open-questions/what-if-ai-doesnt-get-much-better-than-this [https://perma.cc/W8FA-WMRN] (documenting the hype cycle and massive increases in model size following OpenAI’s 2020 publication of wishful “scaling law[s]” forecasting exponential increases in performance). ↑
. The sloppy 2025 releases of OpenAI’s GPT-5 and xAI’s Grok 4, for instance, were harbingers of the industry’s realization of scaling’s limits. Id. Such limits were theorized years earlier by machine learning researchers. Neil C. Thompson et al., The Computational Limits of Deep Learning 15 (July 27, 2022) (unpublished manuscript), https://arxiv.org/abs/2007.05558 [https://perma.cc/F799-EKPE]; see Neil C. Thompson et al., Deep Learning’s Diminishing Returns: The Cost of Improvement Is Becoming Unsustainable, IEEE Spectrum, Oct. 2021, at 51, 53–54 (finding that models have experienced proportionally less improvement per unit of computational power over time). ↑
. See OpenAI, GPT-4, supra note 8, at 4–5 (reporting that GPT-3.5 scored in the 10th percentile on the Uniform Bar Exam, while GPT-4 scored in the 90th percentile). ↑
. Pablo Villalobos et al., Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data, Epoch AI (June 6, 2024), https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data [https://perma.cc/F6GP-KDAH]. ↑
. Ilia Shumailov et al., AI Models Collapse When Trained on Recursively Generated Data, 631 Nature 755, 759 (2024). ↑
. Clare Duffy, Social Media Platforms Are Using What You Create for Artificial Intelligence. Here’s How to Opt Out, CNN (Sep. 23, 2024), https://www.cnn.com/2024/09/23/tech/social-media-ai-data-opt-out/index.html [https://perma.cc/H22A-YLTA] (listing X, Meta platforms, Reddit, and more among social networking platforms that license users’ data to AI firms or use it to train in-house AI products); OpenAI, GPT-4, supra note 8, at 2 (listing “data licensed from third-party providers” as a component of ChatGPT’s training dataset). ↑
. Nicholas Carlini et al., Extracting Training Data from Diffusion Models, 32 USENIX Sec. Symp. 5253, 5266 (2023). ↑
. OpenAI and Journalism, OpenAI (Jan. 8, 2024), https://openai.com/index/openai-and-journalism/ [https://perma.cc/28M6-NTM7]. ↑
. See Case Tracker: Artificial Intelligence, Copyrights and Class Actions, BakerHostetler, https://www.bakerlaw.com/services/artificial-intelligence-ai/case-tracker-artificial-intelligence-copyrights-and-class-actions/ [https://perma.cc/L9R5-TYSL] (monitoring a dozen cases currently in progress). ↑
. 700 F. Supp. 3d 853 (N.D. Cal. 2023). ↑
. Class Action Complaint at 6, Andersen v. Stability AI Ltd., 700 F. Supp. 3d 853 (N.D. Cal. 2023) (No. 3:23-cv-00201-WHO) [hereinafter Andersen Complaint]. ↑
. E.g., Sarah Andersen (@SarahCAndersen), X, https://x.com/sarahcandersen [https://perma.cc/YAT5-6GAN]. ↑
. Andersen, 700 F. Supp. 3d at 870–75 (dismissing DMCA, right of publicity, and unfair competition claims, among others). ↑
. Id. at 864 (internal quotation marks omitted). Note that the issue is how the AI firm obtained the works for training purposes, not whether or how they were used by model outputs. Challenges of the latter type have already failed. See, e.g., Order on Fair Use at 27–30, Bartz v. Anthropic PBC, No. C 24-05417 (N.D. Cal. 2025) (ruling that AI output from a model trained on the plaintiffs’ novels was fair use because it did not harm the market for those works—but leaving open the possibility that it might if the model had produced “knockoffs”). ↑
. 788 F. Supp. 3d 1026 (N.D. Cal. 2025). ↑
. Notice of Motion and Motion for Leave to File Third Amended Consolidated Complaint at 1–3, Kadrey, 788 F. Supp. 3d 1026 (N.D. Cal. 2025) (No. 3:23-cv-03417-VC). ↑
. Reply in Support of Plaintiffs’ Motion for Leave to File Third Amended Consolidated Complaint at 1, Kadrey, 788 F. Supp. 3d 1026 (N.D. Cal. 2025) (No. 3:23-cv-03417-VC). ↑
. James Grimmelmann, The Ethical Visions of Copyright Law, 77 Fordham L. Rev. 2005, 2014 (2009). ↑
. Justin Hughes, The Philosophy of Intellectual Property, 77 Geo. L.J. 287, 300 (1988). ↑
. Id. at 300–02. ↑
. Some AI companies have struck deals to compensate professionals, such as journalists, but creators of user-generated content have not yet seen compensation. Melissa Heikkilä, AI Companies Are Finally Being Forced to Cough Up for Training Data, MIT Tech. Rev. (July 2, 2024), https://www.technologyreview.com/2024/07/02/1094508/ai-companies-are-finally-being-forced-to-cough-up-for-training-data/ [https://perma.cc/GP5G-KVTK]; see Eli Tan, When the Terms of Service Change to Make Way for A.I. Training, N.Y. Times (June 26, 2024), https://www.nytimes.com/2024/06/26/technology/terms-service-ai-training.html [https://perma.cc/N7YW-QREU] (noting that multiple tech companies have changed terms of service to use data for AI training without giving any additional consideration). ↑
. Hughes, supra note 36, at 325–29. ↑
. Id. at 328 (“[S]ince 1976, publication has not been required for federal copyright protection, and even before 1976, common law copyright or state statutes protected the author’s unpublished work . . . .”). ↑
. Oren Bracha & Talha Syed, A Law and Political Economy of Intellectual Property, 103 Texas L. Rev. 1403, 1412–13 (2025). ↑
. Hughes, supra note 36, at 329. ↑
. Oren Bracha, The Ideology of Authorship Revisited: Authors, Markets, and Liberal Values in Early American Copyright, 118 Yale L.J. 186, 192–93 (2008). ↑
. Hughes, supra note 36, at 330–32. ↑
. Margaret Jane Radin, Property and Personhood, 34 Stan. L. Rev. 957, 1014–15 (1982). ↑
. See id. at 1013 n.202 (noting that personhood “seems relevant” in copyright law but not exhaustively reviewing the issue). Radin’s avoidance of intellectual property was justified given the breadth of concepts protected by copyright. See Hughes, supra note 36, at 339–40 (discussing variance in the incorporation of personality into intellectual property). ↑
. Radin, supra note 45, at 1014 n.202 (discussing how California laws granting moral rights and rights to royalties in art align with Hegelian personhood theory and its legal antecedents). ↑
. Bracha & Syed, supra note 41, at 1413. ↑
. See infra subpart I(C) (discussing popular backlash to AI training). Scholar James Grimmelmann describes Lockean ideas as the “default ethical vision” of copyright and notes that the labor theory of copyright is central to the Copyright Act of 1976. Grimmelmann, supra note 35, at 2014–15. ↑
. Bracha & Syed, supra note 41, at 1414. ↑
. For example, works like those at issue in the Andersen case discussed above are imbued with personality. Andersen Complaint, supra note 28, at 6 (describing plaintiff Andersen’s artworks as “semi-autobiographical” and “find[ing] the humor in living as an introvert”). ↑
. Bryn Healy, Should Americans’ Consent Be Required for Their Data to Be Used to Train AI Models?, YouGov (June 27, 2024), https://today.yougov.com/technology/articles/49867-americans-consent-be-required-data-used-to-train-ai-models-artificial-intelligence-poll [https://perma.cc/8AR4-T37V] (reporting that 44% of American adult respondents said AI should not be allowed to train on open internet, only 20% said it should be allowed, and the remainder were unsure). ↑
. Derek Robertson, Exclusive Poll: Americans Favor AI Data Regulation, Politico (May 6, 2024), https://www.politico.com/newsletters/digital-future-daily/2024/05/06/exclusive-poll-americans-favor-ai-data-regulation-00156350 [https://perma.cc/LNC2-KY3T]; About, The AI Pol’y Inst., https://theaipi.org/about/ [https://perma.cc/C4RD-NLKQ]. ↑
. E.g., Marques Brownlee, The Truth About AI Getting “Creative”, at 08:23 (YouTube, Dec. 9, 2022), https://www.youtube.com/watch?v=0gNauGdOkro [https://perma.cc/3KCC-M5P3] (opining to an audience of nearly twenty million subscribers that “AI steals art without consent”); Tom Nicholas, How AI Theft Is Killing the Internet, at 00:00 (YouTube, Oct. 1, 2024), https://www.youtube.com/watch?v=ihRr7diYuKA [https://perma.cc/ER3K-8VK9] (representing YouTube video transcripts used in AI training datasets as “stolen”). ↑
. Tan, supra note 38; Craig Hale, Adobe Users Are Furious About the Company’s Terms of Service Change to Help It Train AI, Yahoo! (June 7, 2024), https://tech.yahoo.com/ai/articles/adobe-users-furious-companys-terms-170102270.html [https://perma.cc/SS6U-WWU9]. ↑
. Tan, supra note 38. ↑
. Id. ↑
. See Procreate (@Procreate), X (Aug. 18, 2024, at 6:17 PM), https://x.com/Procreate/status/1825311104584802470 [https://perma.cc/F4Q6-GNJW] (displaying, at approximately 0:02, a message from a user expressing their delight that Procreate would not use user-generated content for training “like [A]dobe”). ↑
. AI Is Not Our Future, Procreate, https://procreate.com/ai [https://perma.cc/5R6M-ZLXG]. ↑
. Bracha & Syed, supra note 41, at 1416–18. ↑
. Even Bracha and Syed acknowledge that some forms of exclusive IP rights are justified so long as they are properly conceptualized as social relations comprising “correlative entitlement/disentitlement pairs,” are embedded in law with clear normative justification, and have purposes other than pure rent-seeking. Id. at 1411–12, 1426–27. They specifically highlight policy responses to generative AI as an area where such reforms are needed before inconsistent fair use outcomes push copyright law further from theory. Id. at 1427–28. ↑
. E.g., Michele E. Gilman, Five Privacy Principles (from the GDPR) the United States Should Adopt to Advance Economic Justice, 52 Ariz. St. L.J. 368, 412–14 (2020) (suggesting that the United States should adopt the GDPR’s core protections); Caitlin Chin-Rothmann, Protecting Data Privacy as a Baseline for Responsible AI, Ctr. for Strategic & Int’l Stud. (July 18, 2024), https://www.csis.org/analysis/protecting-data-privacy-baseline-responsible-ai [https://perma.cc/R8VZ-XKSK] (inquiring into “opportunities for EU and U.S. regulatory alignment” in an AI-specific context). ↑
. See Fed. Trade Comm’n, A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services 34–35 (2024) [hereinafter Fed. Trade Comm’n, A Look] (summarizing the core rights granted by the GDPR). See generally 2016 O.J. (L 119) 1–88 (restricting certain data collection and processing activities with respect to the data of persons in the EU and granting such persons certain rights against data controllers and processors). ↑
. 2024 O.J. (L 1689) 1–144. ↑
. 2019 O.J. (L 130) 92–125. ↑
. See Philip N. Yannella & Tim Dickens, New State Privacy Laws Creating Complicated Patchwork of Privacy Obligations, Reuters (June 11, 2024), https://www.reuters.com/legal/legalindustry/new-state-privacy-laws-creating-complicated-patchwork-privacy-obligations-2024-06-07/ [https://perma.cc/8JJW-6PTY] (arguing that the current “patchwork” of state data privacy laws is too complicated for businesses and that a preemptive federal law is preferable). ↑
. Id. ↑
. Dell Cameron, Surprise! The Latest ‘Comprehensive’ US Privacy Bill Is Doomed, Wired (June 27, 2024), https://www.wired.com/story/apra-privacy-bill-doomed/ [https://perma.cc/T74Q-2J8K]. ↑
. Id. ↑
. Id. ↑
. See Yannella & Dickens, supra note 66 (discussing operational complexities created by targeted state privacy laws and businesses’ preference for comprehensive federal regulation). ↑
. See Kathryn Watson & Libby Cathey, Meta, Amazon and Tech CEOs Make $1 Million Investments in Trump’s Inauguration, CBS News (Dec. 16, 2024), https://www.cbsnews.com/news/trump-tech-ceos-meta-amazon-donate-millions-inauguration/ [https://perma.cc/9MVB-PFZR] (naming Jeff Bezos, Mark Zuckerberg, Sam Altman, Elon Musk, and Tim Cook among the data profiteers who inspired then president-elect Trump to gleefully exclaim “everyone wants to be my friend”). ↑
. 2016 O.J. (L 119) 43–44. ↑
. E.g., How ChatGPT and Our Foundation Models Are Developed, OpenAI, https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-foundation-models-are-developed [https://perma.cc/49W2-2TQE] (“These models do not store or retain copies of the data they are trained on.”). ↑
. Sara Morrison, The Tricky Truth About How Generative AI Uses Your Data, Vox (July 27, 2023), https://www.vox.com/technology/2023/7/27/23808499/ai-openai-google-meta-data-privacy-nope [https://perma.cc/L8XE-4J7E] (“ChatGPT’s opt out and data deletion tools, for example, are only for data collected by people using the ChatGPT service.”); OpenAI Privacy Request Portal, OpenAI, https://privacy.openai.com/policies?submissionGuid=28bf34ce-f808-4419-b72c-e9227cd18e6a [https://perma.cc/JG5A-8JQL] (providing a venue to remove personal data from ChatGPT “responses”; this language makes it very likely that these requests do not force ChatGPT to “unlearn” anything about the requester, but rather add a post-inference filter to prohibit any personal data from appearing to end users). ↑
. See supra note 30 and accompanying text. ↑
. 2019 O.J. (L 130) 113. ↑
. Id. at 114. ↑
. E.g., Sarah Andersen, supra note 29 (including “no A.I.” in her bio section). ↑
. In the first decision under the Directive, a plain-language statement purporting to prohibit the reuse of certain copyrighted photographs was held not to be “machine-readable” within the meaning of the Directive. Landgericht Hamburg [LG Hamburg] [Hamburg Regional Court] Sep. 27, 2024, 310 O 227/23, at 6 (¶ 42), https://www.landesrecht-hamburg.de/bsha/document/NJRE001588058 [https://perma.cc/RA94-GX97]. ↑
. Landgericht, supra note 80. The plaintiff, unnamed in the opinion, was photographer Robert Kneschke, whose works were incorporated into the image dataset LAION. Germany—Hamburg District Court, 310 O.22723, LAION v Robert Kneschke, [27 September 24], Eur. Union Intell. Prop. Off., https://www.euipo.europa.eu/sk/law/recent-case-law/germany-hamburg-district-court-310-o-22723-laion-v-robert-kneschke [https://perma.cc/G93M-GFLH]. ↑
. Anton Korinek, Why We Need a New Agency to Regulate Advanced Artificial Intelligence: Lessons on AI Control from the Facebook Files, Brookings (Dec. 8, 2021), https://www.brookings.edu/articles/why-we-need-a-new-agency-to-regulate-advanced-artificial-intelligence-lessons-on-ai-control-from-the-facebook-files/ [https://perma.cc/Q4LY-TLUJ]; Brad Smith, Foreword to Microsoft, Governing AI: A Blueprint for the Future 7 (2023), https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Governing-AI-A-Blueprint-for-the-Future.pdf [https://perma.cc/Z7YF-UMRN]. ↑
. “Enterprise AI” models are tools for business operations and are typically trained on specific business data, while “consumer AI” models are trained on generalized public information and designed for individual users. Jeffrey Erickson, What Is Enterprise AI?, Oracle (Aug. 29, 2024), https://www.oracle.com/artificial-intelligence/enterprise-artificial-intelligence/ [https://perma.cc/X4TJ-LQQF]. ↑
. Korinek, supra note 82. ↑
. See id. (comparing single-objective regulatory efforts to the greed of King Midas). ↑
. See id. (describing how AI products from Amazon, Google, and Microsoft reproduced racist and sexist biases); Craig S. Smith, OpenAI’s Sam Altman to Congress: Regulate Us, Please!, Forbes (May 17, 2023), https://www.forbes.com/sites/craigsmith/2023/05/16/openais-sam-altman-to-congress-regulate-us-please/ [https://perma.cc/WZ4Q-TN8A] (recounting a congressional hearing over AI regulation that largely focused on AI output, including OpenAI CEO Sam Altman’s call for “limits on what a deployed model is capable of”). ↑
. See Smith, supra note 82, at 4–5 (emphasizing the need for “guardrails” but not limitations on development). ↑
. See White House Off. of Sci. & Tech. Pol’y, Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People 5–7 (2022), https://bidenwhitehouse.archives.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf [https://perma.cc/9GNS-5PG3] (focusing on discrimination, safety, and keeping humans in the loop); Exec. Order No. 14110, 88 Fed. Reg. 75191, 75191–93 (Nov. 1, 2023) (focusing on similar topics along with privacy and economic development concerns). ↑
. Katie Gilbert, U.S. Government Regulators May Be Favoring Their Future Private-Sector Employers, Yale Insights (Oct. 26, 2023), https://insights.som.yale.edu/insights/us-government-regulators-may-be-favoring-their-future-private-sector-employers [https://perma.cc/W6LL-Q6A7]; Ivana V. Katic & Jerry W. Kim, Caught in the Revolving Door: Firm-Government Employee Mobility as a Fleeting Regulatory Advantage, 35 Org. Sci. 281, 298 (2024) (finding that “revolving door” employment changes give firms significantly better regulatory outcomes in the short term). ↑
. Benjamin L.W. Sobel, Artificial Intelligence’s Fair Use Crisis, 41 Colum. J.L. & Arts 45, 91 (2017). ↑
. Id. at 92. ↑
. Id. at 91. ↑
. Cade Metz, OpenAI Completes Deal That Values Company at $157 Billion, N.Y. Times (Oct. 2, 2024), https://www.nytimes.com/2024/10/02/technology/openai-valuation-150-billion.html [https://perma.cc/36UA-7KQV]. ↑
. Noel Randewich, Nvidia Surpasses $3.6 Trillion Market Value After Trump Win, Reuters (Nov. 7, 2024), https://www.reuters.com/technology/nvidia-surpasses-36-trillion-market-value-after-trump-win-2024-11-07/ [https://perma.cc/SE5X-VGY8]. ↑
. Recording Indus. Ass’n of Am. v. Diamond Multimedia Sys. Inc., 180 F.3d 1072, 1081 (9th Cir. 1999) (holding that an MP3 player is not a “digital audio recording device”). ↑
. E.g., YouTube Partner Program Overview & Eligibility, YouTube Help, https://support.google.com/youtube/answer/72851?hl=en&co [https://perma.cc/5FAE-4ZDG]. ↑
. Though your humble author wonders how much AI firms might be willing to pay not to train models on his nonsense. ↑
. See supra notes 54–59 and accompanying text. ↑
. E.g., Zachary Olson, Op-Ed: AI Art Is Art Theft and Should Be a Crime, The Eagle (Feb. 28, 2024), https://www.theeagleonline.com/article/2024/01/op-ed-ai-art-is-art-theft-and-should-be-a-crime [https://perma.cc/RWX3-RTTD] (arguing that “[a]rt generated by AI should be a crime, at least while AI models are made this way”). ↑
. E.g., X Terms of Service, X (Nov. 15, 2024) [hereinafter X ToS], https://cdn.cms-twdigitalassets.com/content/dam/legal-twitter/site-assets/terms-of-service-2024-11-15/en/x-terms-of-service-2024-11-15.pdf [https://perma.cc/KUF2-KGTG] (“By submitting, posting or displaying Content . . . you grant us a . . . royalty-free license . . . to use . . . such Content . . . . [This includes] use with and training of our machine learning and artificial intelligence models.”); Reddit User Agreement, Reddit (May 29, 2025), https://redditinc.com/policies/user-agreement [https://perma.cc/VD38-7PVM] (“[Our] license includes the right to use Your Content to train AI and machine learning models . . . .”). Meta, the company behind Facebook and Instagram, mentions its right to “create derivative works” in its Terms of Service without specifically mentioning AI. Terms of Service, Meta (Jan. 1, 2025) [hereinafter Meta ToS], https://mbasic.facebook.com/legal/terms/plain_text_terms/ [https://perma.cc/EGA8-L4CF]. However, it notifies users of such usage in its privacy policy. Privacy Policy, Meta (Dec. 16, 2025), https://mbasic.facebook.com/privacy/policy/printable/ [https://perma.cc/Z32M-RRL3] (“The provision of the Meta Products includes . . . automated processing . . . to: . . . enable creation of content like text, audio, images and videos, including through artificial intelligence technology. . . .”). ↑
. See Tan, supra note 38 (discussing revisions to multiple tech companies’ terms of service, including those that modified existing licenses to make room for AI training); Terms of Service, Facebook (Jan. 30, 2015) [hereinafter Facebook ToS], https://web.archive.org/web/20170101192952/https://www.facebook.com/terms.php/ [https://perma.cc/FDC7-NQ9R] (licensing user-generated content in the version of terms users were subject to on Jan. 1, 2016). See generally Vaswani et al., supra note 3 (dating the emergence of the technological paradigm that mainstream AI models currently use to 2017). ↑
. Compare Facebook ToS, supra note 101 (granting Facebook a license to user-generated content), with Meta ToS, supra note 100 (doing the same, incorporating AI only by cross-reference to Meta’s Privacy Policy). ↑
. Tan, supra note 38 (comparing the United States to the EU, in which a special alert was required). ↑
. Peter A. Alces & Michael M. Greenfield, They Can Do What!? Limitations on the Use of Change-of-Terms Clauses, 26 Ga. St. U. L. Rev. 1099, 1137 (2010) (arguing that change-of-terms provisions leave consumers with “an absence of meaningful choice”); see, e.g., Meta ToS, supra note 100 (containing a change-of-terms provision and giving users the option only to delete their accounts if they disagree with any updates). ↑
. See Niva Elkin-Koren, Giovanni De Gregorio & Maayan Perel, Social Media as Contractual Networks: A Bottom Up Check on Content Moderation, 107 Iowa L. Rev. 987, 1022–24 (2022) (explaining that social media users have very little recourse under adhesive Terms of Service because American courts largely honor terms of service even though the gulf in bargaining power between platform and user is virtually infinite). ↑
. Restatement of Consumer Conts. § 2 (A.L.I., Tentative Draft No. 2, 2022) (proposing that only notice of such terms and an opportunity to cancel the transaction are required to make them binding); cf. Nguyen v. Barnes & Noble, Inc., 763 F.3d 1171, 1178–79 (9th Cir. 2014) (adhesive arbitration clause was unenforceable because the consumer was not on constructive notice). ↑
. See, e.g., About Grok, Your Humorous AI Assistant on X, X, https://help.x.com/en/using-x/about-grok [https://perma.cc/BE4Y-PFH8] (providing an opt-out mechanism). But see Fed. Trade Comm’n, A Look, supra note 63, at 33 (finding that some U.S. companies responded to data deletion requests not by actually erasing data such as user-generated content but by “de-identifying” it). ↑
. Melissa Heikkilä, How to Opt Out of Meta’s AI Training, MIT Tech. Rev. (June 14, 2024), https://www.technologyreview.com/2024/06/14/1093789/how-to-opt-out-of-meta-ai-training/ [https://perma.cc/J32N-WFZQ] (“Users in the US . . . don’t have any foolproof ways to prevent Meta from using their data to train AI . . . .”). ↑
. Platforms’ licenses authorize them to sub-license user-generated content. See supra note 100. ↑
. Stability AI licensed a dataset of labeled images called LAION, which itself sourced images from a free web crawl dataset called Common Crawl. Second Amended Complaint at 13–14, Andersen v. Stability AI Ltd., 700 F. Supp. 3d 853 (N.D. Cal. 2023) (No. 3:23-cv-00201-WHO). ↑
. See supra notes 101–06 and accompanying text. ↑
. For example, Andersen v. Stability AI plaintiff Sarah Andersen joined X (then Twitter) in October 2012. See Sarah Andersen, supra note 29 (noting that Andersen “[j]oined October 2012”). Facebook passed the billion-user mark in the same year. Hayley Tsukayama, Facebook Reaches 1 Billion Users, Wash. Post (Oct. 4, 2012), https://www.washingtonpost.com/business/technology/facebook-reaches-1-billion-users/2012/10/04/5edfefb2-0e14-11e2-bb5e-492c0d30bff6_story.html [https://perma.cc/Z7CF-TLJV]. ↑
. Social media platforms ensure their ability to issue such ultimatums by empowering themselves to modify Terms of Service midstream. Ismail Amin, The Fine Print: Social Media Terms of Service, Lexology (July 15, 2025), https://www.lexology.com/library/detail.aspx?g=ab4bcef0-05b1-4cee-9f62-cd9e133c6e5c [https://perma.cc/H6PX-Y4W2]. ↑
. See Restatement of Consumer Conts. § 5(d) (A.L.I., Tentative Draft No. 2, 2022) (proposing that the touchstone of procedural unconscionability is whether a consumer “does not meaningfully account for the term in making the contracting decision”); id. § 5(c) (proposing that terms are substantively unconscionable if they “unreasonably limit the consumer’s ability to . . . seek reasonable redress for a violation of a legal right”); Eric Andrew Horwitz, An Analysis of Change-of-Terms Provisions as Used in Consumer Service Contracts of Adhesion, 15 U. Mia. Bus. L. Rev. 75, 101–03 (2006) (arguing that both change-of-terms provisions and the resulting changes are procedurally unconscionable in non-periodic contracts); Alces & Greenfield, supra note 104, at 1137 (“[T]he consumer may reasonably avoid a particular exercise of the power given by a change-of-terms clause only by ending the relationship with the other party. Whether this is sufficient to supply a meaningful choice depends at best on the expense and inconvenience in ending the relationship.”). It would certainly be inconvenient to delete a decade-old social media presence over a minor change in contractual language. ↑
. See Radin, supra note 45, at 960 (“Once we admit that a person can be bound up with an external ‘thing’ . . . the person should be accorded broad liberty with respect to control over that ‘thing.’”). ↑
. See Tan, supra note 38 (describing controversial changes in major companies’ terms and conditions relating to AI training). ↑
. The existence of fine-print contractual language depresses consumers’ self-reported intention to speak out against unfairness in public. Meirav Furth-Matzkin & Roseanna Sommers, Consumer Psychology and the Problem of Fine-Print Fraud, 72 Stan. L. Rev. 503, 524–27 (2020). ↑
. Id. at 521 (“[A]lthough laypeople perceive the law governing fraud-and-fine-print situations as overly formalistic, they simultaneously believe that it is unfair to impose contractual obligations on consumers in fraud-and-fine-print cases.”). ↑
. Andrew Burgess, Consumer Adhesion Contract and Unfair Terms: A Critique of Current Theory and a Suggestion, 15 Anglo-Am. L. Rev. 255, 270–71 (1986) (arguing that law has adapted to accommodate adhesion contracts not because they satisfy classical theories of assent or bargaining, but simply because such acceptance is often in the public interest). ↑
. See, e.g., id. at 261–62 (discussing the potential unfairness of the various ways companies limit liability). ↑
. Id. at 263–64, 272–73 (discussing applications of unconscionability and “public interest” limitations). ↑
. For example, see the Federal Trade Commission’s rule requiring door-to-door salespeople to provide consumers with a “right to cancel” over the following three days. 16 C.F.R. § 429.1 (2025). In requiring a cooling-off period, the FTC proscribed the use of high-pressure sales to make irreversible transactions—a problem which otherwise required the resurrection of a dormant contract doctrine to address. See Williams v. Walker-Thomas Furniture Co., 350 F.2d 445, 448–50 (D.C. Cir. 1965) (establishing the doctrine of unconscionability). ↑
. See supra notes 107–09 and accompanying text. ↑
. Morgan Carter, Note, The Optimal Opt-In Option: Protecting Vulnerable Consumers in the Expanding Privacy Landscape, 124 Colum. L. Rev. 431, 441–43 (2024) (reviewing opt-out requirements in several state data privacy laws and noting that an Illinois privacy law contains an opt-in requirement, though one specific to biometric data). ↑
. See id. at 446–48 (criticizing opt-out schemes as “dark patterns” which trick consumers—especially disadvantaged consumers with low digital literacy—into acting against their own preferences). ↑
. “Unlearning,” or the ability to selectively remove certain information from AI models, is an emerging discipline. Currently, the technology is limited to models designed to reproduce training data or match data rather than models designed to generate new data. See, e.g., Guihong Li et al., Machine Unlearning for Image-to-Image Generative Models 23–25 (Feb. 2, 2024), https://arxiv.org/pdf/2402.00351 [https://perma.cc/B6P7-BNPS] (demonstrating an unlearning process for “image-to-image” models, which can ordinarily reconstruct images on which they were trained). ↑
. See Carter, supra note 124, at 453–54 (arguing the same in the personal data privacy context). ↑
. For Locke, the object of civil society itself was to preserve the individual’s right to obtain and, within the law’s boundaries, dispose of property. John Locke, The Second Treatise of Civil Government 28–29 (J.W. Gough ed., Basil Blackwell 1948). ↑
. Hughes, supra note 36, at 345. ↑
. Of course, it is impractical to expect that an author will retain complete control over a work online. See Kevin Collier, NFT Art Sales Are Booming. Just Without Some Artists’ Permission, NBC (Jan. 10, 2022), https://www.nbcnews.com/tech/security/nft-art-sales-are-booming-just-artists-permission-rcna10798 [https://perma.cc/XC7N-PEBG] (documenting unauthorized sales of visual art as speculative investments with little regard for the author’s wishes). However, note that United States law already provides authors and owners with some means of combatting unauthorized reproductions of works online. See 17 U.S.C. § 512(c)(1)(C) (allowing rightsholders to send takedown notices to hosting platforms, which are then required to remove infringing content). ↑
. See supra notes 54–59 and accompanying text. ↑
. Future of Tech Comm’n, The Future of Tech: A Blueprint for Action 43 (2022), https://www.politico.eu/wp-content/uploads/2022/02/15/Blueprint-Full-Future-of-Tech-Commission-Feb2022.pdf [https://perma.cc/JCP2-8D6Z] (reporting that 78% of registered voters interviewed, n=1,003, supported opt-in requirements for data sharing). ↑
. Prentiss Cox, Amy Widman & Mark Totten, Strategies of Public UDAP Enforcement, 55 Harv. J. on Legis. 37, 42 (2018). ↑
. Id. at 42–43. ↑
. Id. at 44. ↑
. Id. ↑
. See, e.g., N.Y. Gen. Bus. Law § 349 (McKinney 2025) (prohibiting “[d]eceptive acts or practices in the conduct of any business”). ↑
. Revised Unif. Deceptive Trade Pracs. Act § 2(a) (Nat’l Conf. of Comm’rs on Unif. State L. 1966). States following the Uniform Act include California, Pennsylvania, and Texas. Cal. Civ. Code Ann. § 1770(a) (West 2025); 73 Pa. Stat. and Cons. Stat. Ann. § 201-2(4) (West 2025); Tex. Bus. & Com. Code Ann. § 17.46 (West 2025). ↑
. Cal. Civ. Code § 1770(a)(11) (West 2025). ↑
. The protection recommended by this Note could be adopted via administrative processes in some states. See, e.g., 73 Pa. Stat. and Cons. Stat. Ann. § 201-3.1 (West 2025) (granting the Attorney General of Pennsylvania rulemaking authority to enforce the state’s UDAP law). Because UDAP rulemaking authority is only granted to agencies in about half of all states, this Note will continue to refer to legislation for simplicity. See Carolyn Carter, Nat’l Consumer L. Ctr., Consumer Protection in the States: A 50-State Evaluation of Unfair and Deceptive Practices Laws 17 (2018), https://www.nclc.org/wp-content/uploads/2022/09/UDAP_rpt.pdf [https://perma.cc/3NDY-76T8] (“Twenty-eight states and the District of Columbia give rulemaking authority to a state agency, but the remaining jurisdictions do not.”). ↑
. Tex. Bus. & Com. Code Ann. §§ 17.41–17.63 (West 2025). ↑
. Tex. Bus. & Com. Code Ann. § 17.46(b)(5) (West 2025). ↑
. See Yumilicious Franchise, L.L.C. v. Barrie, 819 F.3d 170, 175–76 (5th Cir. 2016) (holding that a franchisor did not violate the DTPA by allegedly leading a franchisee to have mistaken beliefs about the franchise agreement because it did not make affirmative misrepresentations). ↑
. See generally Carolyn Carter, Nat’l Consumer L. Ctr., Appendix C: State-by-State Summaries of State UDAP Statutes (2018), https://www.nclc.org/wp-content/uploads/2022/08/udap-appC-1.pdf [https://perma.cc/W8VP-XVUN] (surveying state UDAP statutes); Tex. Bus. & Com. Code Ann. § 17.50(a)(3) (West 2025). ↑
. Tex. Bus. & Com. Code Ann. § 17.45(5) (West 2025). ↑
. Chastain v. Koonce, 700 S.W.2d 579, 584 (Tex. 1985). ↑
. See, e.g., Tex. Bus. & Com. Code Ann. § 17.45(5) (West 2025) (applying to actions rather than contractual terms). ↑
. Alternatively, 28 states could achieve the same through administrative rulemakings. See supra note 140. ↑
. Any effective definition needs to cover use and acquisition as well as licensing to prevent AI firms from simply shifting to whole-internet datasets without any licenses at all (e.g., Common Crawl). See supra note 110. The definition should also be limited to “generative AI” so as not to reach uses of user-generated content by systems that are termed “AI” but lack the legal, ethical, and political implications discussed in this Note. See, e.g., Korinek, supra note 82 (discussing Facebook’s “Deep Learning Recommendation Model,” which is described as “AI” but is not a generative transformer model). ↑
. E.g., Tex. Bus. & Com. Code Ann. § 17.46(b)(16) (West 2025) (declaring “disconnecting, turning back, or resetting the odometer of any motor vehicle so as to reduce the number of miles indicated on the odometer gauge” to be a deceptive trade practice). ↑
. Bracha & Syed, supra note 41, at 1420–22, 1426. ↑
. See id. at 1427–28 (positing AI regulation as an area of increasing trouble for copyright law, thus meriting better “embedded” cultural policy). ↑
. All states now grant a private UDAP right of action, which “increases the likelihood that a violator will be found out . . . and helps discourage illegal conduct.” Carter, supra note 140, at 33. ↑
. E.g., Tex. Bus. & Com. Code Ann. § 17.50(a) (West 2025). ↑
. Id.; Tex. Bus. & Com. Code Ann. § 17.45(11) (West 2025). ↑
. Mercedes-Benz of N. Am., Inc. v. Dickenson, 720 S.W.2d 844, 848 (Tex. App.—Fort Worth 1986). ↑
. For an extreme example, digital “artist” Mike Winkelmann, who goes by “Beeple” online, made millions in 2021 by selling a collage of works as an NFT. Jacob Kastrenakes, Beeple Sold an NFT for $69 Million, Verge (Mar. 11, 2021), https://www.theverge.com/2021/3/11/22325054/beeple-christies-nft-sale-cost-everydays-69-million [https://perma.cc/X38Y-96RC]. Winkelmann posts his (mostly vulgar) works on social media, meaning they have almost certainly been used for AI training. See Beeple (@beeple), X, https://x.com/beeple [https://perma.cc/55AS-9XXA] (exhibiting many of Winkelmann’s works posted as user-generated content). ↑
. Latham v. Castillo, 972 S.W.2d 66, 69 (Tex. 1998). ↑
. Parkway Co. v. Woodruff, 901 S.W.2d 434, 444 (Tex. 1995). ↑
. See, e.g., Philosophy Tube, AI Is an Ethical Nightmare, at 20:43–21:16 (YouTube, Oct. 13, 2023), https://www.youtube.com/watch?v=AaU6tI2pb3M [https://perma.cc/L8R9-JSDB] (recounting that a fan used AI to generate pornography in the presenter’s likeness, a feat only possible thanks to AI training on her user-generated content). ↑
. Such attacks on minors have been launched by users of purpose-built deepfake models and, more recently, such mainstream general-purpose models as xAI’s Grok. Jim Axelrod, Teen Victim of AI-Generated “Deepfake Pornography” Urges Congress to Pass “Take It Down Act”, CBS News (Dec. 18, 2024), https://www.cbsnews.com/news/deepfake-pornography-victim-congress/ [https://perma.cc/YKX4-DE85]; Emma Woollacott, Watchdogs Around the World Probing Grok over CSAM Accusations, Forbes (Jan. 8, 2026), https://www.forbes.com/sites/emmawoollacott/2026/01/08/watchdogs-around-the-world-probing-grok-over-csam-accusations/ [https://perma.cc/EFP5-BVK8]. ↑
. Tex. Bus. & Com. Code Ann. § 17.42(a) (West 2025). ↑
. See, e.g., X ToS, supra note 100 (“To the extent permitted by law, you also waive the right to participate as a plaintiff or class member in any purported class action . . . .”). ↑
. See, e.g., Tex. Bus. & Com. Code Ann. § 17.50(b)(1) (West 2025) (providing a formula for exemplary damages consisting of multipliers applied to economic and mental anguish damages depending on the defendant’s mental state). ↑
. E.g., Meta ToS, supra note 100 (limiting liability to $100). ↑
. Sw. Bell Tel. Co. v. FDP Corp., 811 S.W.2d 572, 576–77 (Tex. 1991) (holding that liability for DTPA § 17.46(b) violations cannot be limited by contract). ↑
. See, e.g., Tex. Bus. & Com. Code Ann. § 17.50(b)(2) (West 2025) (allowing prevailing plaintiffs to obtain an injunction). ↑
. Note that some states delegate public UDAP enforcement authority to some agency other than that state’s Office of the Attorney General; for simplicity’s sake, this Note will continue to refer to attorneys general as agents of public enforcement. Carter, supra note 140, at 28. ↑
. E.g., Tex. Bus. & Com. Code Ann. § 17.47(a) (West 2025); see Carter, supra note 140, at 28 (“Every state designates a state agency—usually the [A]ttorney [G]eneral’s office—to enforce its UDAP statute.”). ↑
. Plaintiff’s Petition at 13–14, 21–24, Texas v. Meta Platforms, Inc., No. 22-0121 (71st Dist. Ct., Harrison Cnty., Tex. Feb. 14, 2022); Agreed Final Judgment at 4, Texas v. Meta Platforms, Inc., No. 22-0121 (71st Dist. Ct., Harrison Cnty., Tex. July 30, 2024). ↑
. Assurance of Voluntary Compliance at 21, Massachusetts v. Google, LLC, No. 2284CV02578 (Super. Ct., Suffolk Cnty., Mass. Nov. 15, 2022) (emphasis omitted). ↑
. See Data Protection Laws in the United States, DLA Piper (Feb. 6, 2025), https://www.dlapiperdataprotection.com/?c=US [https://perma.cc/JG4Z-YLM9] (explaining the layered, “patchwork” nature of American data privacy laws); Who Does the Data Protection Law Apply To?, Eur. Comm’n, https://commission.europa.eu/law/law-topic/data-protection/rules-business-and-organisations/application-regulation/who-does-data-protection-law-apply_en [https://perma.cc/3GM2-RVAL] (clarifying via hypothetical that the EU’s GDPR applies extraterritorially). ↑
. See Richard Lawne, GDPR vs U.S. State Privacy Laws: How Do They Measure Up?, Fieldfisher (Mar. 1, 2023), https://www.fieldfisher.com/en/insights/gdpr-vs-u-s-state-privacy-laws-how-do-they-measure [https://perma.cc/N9TE-NPJS] (noting the CCPA’s distinctive character and that the GDPR remains the most comprehensive general-purpose privacy regulation). ↑
. 2016 O.J. (L 119) 33; Cal. Civ. Code § 1798.140(d) (West 2025). ↑
. Brian Daigle & Mahnaz Khan, U.S. Int’l Trade Comm’n, One Year In: GDPR Fines and Investigations Against U.S.-Based Firms (2019), https://www.usitc.gov/publications/332/executive_briefings/gdpr_enforcement.pdf [https://perma.cc/QPR6-JA36] (“Since May 2018, . . . the EU has collectively imposed more than €380 million ($417 million) in total fines under GDPR.”); e.g., Class Action Complaint at 3–4, In Re Hanna Andersson & Salesforce.com Data Breach Litig., No. 3:20-cv-00812-EMC (N.D. Cal. 2020) (settling a $400,000 CCPA case against a Delaware corporation headquartered in Oregon); Bradley Arant Boult Cummings LLP, Hanna Andersson and Salesforce Receive Preliminary Approval for Settlement of CCPA-Based Class Action Litigation, JDSUPRA (Jan. 6, 2021), https://www.jdsupra.com/legalnews/hanna-andersson-and-salesforce-receive-9222631 [https://perma.cc/K496-SV97]. ↑
. Emily Stewart, Why Every Website Wants You to Accept Its Cookies, Vox (Dec. 10, 2019), https://www.vox.com/recode/2019/12/10/18656519/what-are-cookies-website-tracking-gdpr-privacy [https://perma.cc/5EMP-9ZSW]. ↑
. These three large states’ UDAP statutes already contain “laundry lists” of proscribed conduct and are thus in position to enact the recommended legislation swiftly. See supra note 138 and accompanying text. ↑
. Anthony Cuthbertson, Porn Ban Mapped: Pornhub Now Blocks Visitors in a Third of US States, Independent (Jan. 6, 2025), https://www.the-independent.com/tech/pornhub-ban-news-states-florida-redtube-b2673628.html [https://perma.cc/8E6V-DUNC] (mapping states in which pornographic websites are blocked). ↑
. Cal. Civ. Code § 1798.140(i) (West 2025) (defining “consumer” as a California resident); Cal. Code Regs. tit. 18, § 17014 (2025) (providing definition of “resident” cross-referenced in the CCPA that includes “every individual who is domiciled in the State who is outside the State for a temporary or transitory purpose”). ↑
. E.g., Tex. Civ. Prac. & Rem. Code Ann. § 129B.006(a) (West 2025) (allowing the Attorney General of Texas to bring actions for civil penalties and injunctive relief). ↑
. Joshua Nelken-Zitser, After Pornhub Left Florida, VPN Demand Surged by More Than 1,000%, Bus. Insider (Jan. 6, 2025), https://www.businessinsider.com/pornhub-exited-florida-vpn-demand-surged-by-over-1000-percent-2025-1 [https://perma.cc/7UL8-QCWT]. ↑
. For example, Aylo (the parent company of several large porn sites) has blocked service to states with bans while publicly advocating for less invasive age-verification methods. Cuthbertson, supra note 178. ↑
. See Frequently Asked Questions (FAQs), Cal. Priv. Prot. Agency, https://cppa.ca.gov/faq.html [https://perma.cc/ZD9B-DPN2] (“The CCPA provides privacy rights to California residents . . . . even if the person is temporarily outside of the state.”). ↑
. E.g., Tex. Bus. & Com. Code Ann. § 17.50(a) (West 2025). ↑
. E.g., Tex. Bus. & Com. Code Ann. § 17.50(a)(1) (West 2025) (providing consumers a right of action against “the use or employment by any person” of an unlawful act, regardless of that person’s characteristics); Agyei v. Endurance Power Prods., Inc., 198 F. Supp. 3d 764, 768, 772–74 (S.D. Tex. 2016) (holding that “a Delaware corporation with a principal place of business in Nebraska” could potentially be found liable under the DTPA). ↑
. E.g., Kate Knibbs, The Battle Over Books3 Could Change AI Forever, Wired (Sept. 4, 2023), https://www.wired.com/story/battle-over-books3/ [https://perma.cc/DJ75-GRGD] (noting that Meta and Bloomberg trained models on “Books3,” a dataset compiled by AI nonprofit Eleuther); Marissa Newman & Aggi Cantrill, The Future of AI Relies on a High School Teacher’s Free Database, Bloomberg (Apr. 23, 2023), https://www.bloomberg.com/news/features/2023-04-24/a-high-school-teacher-s-free-image-database-powers-ai-unicorns [https://perma.cc/SKK7-7PXC] (profiling Christoph Schuhmann, the mind behind LAION, a free dataset used to train image generation models). ↑
. See, e.g., OpenAI, GPT-4, supra note 8, at 2 (“Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about . . . dataset construction, training method, or similar.”). ↑
. See Benj Edwards, Have AI Image Generators Assimilated Your Art? New Tool Lets You Check, Ars Technica (Sept. 15, 2022), https://arstechnica.com/information-technology/2022/09/have-ai-image-generators-assimilated-your-art-new-tool-lets-you-check/ [https://perma.cc/PR3B-Y2TR] (describing methods to search if data has been used in AI training). ↑
. See Second Amended Complaint, supra note 110, at 6 (citing consumer-facing database querying tool “Have I Been Trained?” that accesses two popular image datasets used in AI training). ↑
. See Carlini et al., supra note 24, at 5266 (demonstrating the extraction of close copies of training images from image generation models). ↑
. See, e.g., RJ Palmer (@arvalis), X (Aug. 13, 2022), https://x.com/arvalis/status/1558632898336501761 [https://perma.cc/6XSG-SPYJ] (documenting that, early on, Stable Diffusion was advertised as able to emulate the styles of various visual artists). ↑
. Lauren Weber & Caitlin Gilbert, White House MAHA Report May Have Garbled Science by Using AI, Experts Say, Wash. Post (May 29, 2025), https://www.washingtonpost.com/health/2025/05/29/maha-rfk-jr-ai-garble/ [https://perma.cc/BD82-JGYY]. ↑