Agenda
Part 1: Intersection between AI and Open Source
Part 2: Litigation Overview
Part 3: Statutory/Legislation Overview
Part 4: Data Protection and AI
Part 1: Intersection between AI and Open Source
Introduction to GAI
Artificial General Intelligence (AGI): on par with humans; able to understand, learn, and reason; possesses a broad spectrum of cognitive abilities.
Artificial Intelligence (AI): mimics human intelligence, usually for a specific task; solves a problem in a specific situation or environment (WIPO Technology Trends 2019 – Artificial Intelligence, p. 42).
Machine Learning (ML): a branch of computer science in which an algorithm learns from a dataset to generate a model.
Deep Learning (DL): requires a lot of data and computing power; neural networks are the backbone of most deep learning architectures (Hacker Noon – Big Challenge in Deep Learning: Training Data).
Generative AI (GAI): the ability to generate data.
Some GAI Applications
GAI in Action – ChatGPT and Gemini
Just for fun – DALL-E vs Gemini
Open Source Licenses for Common ML and AI Projects
Licenses: Apache 2, Apache 2, Apache 2, Apache 2, BSD 3-Clause, BSD 2-Clause, mostly Apache 2, GPL, BSD 3-Clause, BSD 3-Clause, Llama 2 Community License Agreement
Responsible AI Licenses
AI code used irresponsibly raises moral dilemmas and can cause harm. Responsible AI Licenses (RAIL) attempt to curb harmful applications of AI; their clauses include behavioral-use restrictions.
RAIL licenses impose use restrictions:
Behavioral-use restrictions.
Downstream users must include, at minimum, the same behavioral-use restrictions.
Responsible AI End-User License: the initiative recommends also having an end-user license.
RAIL naming convention:
RAIL-D: use restrictions apply only to the data
RAIL-A: use restrictions apply only to the application/executable
RAIL-M: use restrictions apply only to the model
RAIL-S: use restrictions apply only to the source code
Open RAIL-S
Grant of rights: a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, and distribute the creation.
Restrictions:
A copy of the license must be included when the work is distributed.
The license must carry use restrictions covering surveillance, computer-generated media, health care, and criminal applications.
Downstream users are required to comply with the use restrictions.
A violation of the restrictions allows the Licensor to:
Terminate the license agreement.
Post a notice on the Licensor's website for 1 year indicating that the Licensee violated the terms of the license.
Litigation Against GAI Tool Providers
Copyright litigation
Trademark litigation: trademark infringement and unfair competition; fraudulent registration
Privacy litigation: violation of the Electronic Communications Privacy Act; violation of the Computer Fraud and Abuse Act
Class Actions Against GitHub Copilot
Two class actions were filed in November 2022 against Microsoft, OpenAI, and GitHub, alleging software piracy. These were likely the first class actions in the US to challenge the training and output of AI systems.
Issues
Copilot ingests and distributes licensed material without the associated attribution, copyright notices, and license terms.
Ingest: during training, e.g., generating the model from licensed materials.
Distribute: during inference, e.g., generating code that is subject to a license.
It is not clear how the training data was collected, and GitHub has been cagey about its source. Was it only GitHub repositories, or other public repositories as well?
Claims
Status of the claims following the Order Granting in Part and Denying in Part Motions to Dismiss (unsealed July 5, 2024):
Digital Millennium Copyright Act (DMCA) violations: violation of Section 1202 (removal of copyright management information). Dismissed in the order partially dismissing the second amended complaint.
Contract-related claims: breach of open source license (contract); breach of GitHub's Terms of Service and Privacy Statement.
Interference with prospective economic relations: dropped in the amended complaint.
Unjust enrichment: downplayed in the amended complaint and dismissed in the order partially dismissing the second amended complaint.
Unfair competition: dropped in the amended complaint.
17 U.S.C. § 1202(b): Integrity of copyright management information
(b) Removal or Alteration of Copyright Management Information.—No person shall, without the authority of the copyright owner or the law—
(1) intentionally remove or alter any copyright management information,
(2) distribute or import for distribution copyright management information knowing that the copyright management information has been removed or altered without authority of the copyright owner or the law, or
(3) distribute, import for distribution, or publicly perform works, copies of works, or phonorecords, knowing that copyright management information has been removed or altered without authority of the copyright owner or the law,
knowing, or, with respect to civil remedies under section 1203, having reasonable grounds to know, that it will induce, enable, facilitate, or conceal an infringement of any right under this title.
Stable Diffusion
A model that generates photorealistic images from text input; it also supports inpainting, outpainting, and image-to-image translation.
Released in August 2022 under a RAIL-M-style license, the CreativeML Open RAIL-M license: a permissive license allowing commercial and non-commercial use, subject to use restrictions.
Trained on a dataset created by the non-profit organization LAION.
Available for free from Hugging Face; a paid version offers more features.
Getty Images v. Stability AI
Copyright infringement: images recreated by Stable Diffusion.
Copyright management information violations:
Getty Images applies watermarks, and Stable Diffusion sometimes applies a modified version of those watermarks to its output.
Removal of copyright management information (removed in the second amended complaint filed on July 8, 2024): Stable Diffusion removes the watermarks and metadata associated with the images, and those watermarks and metadata contain copyright management information.
Side-by-side images labeled "Getty Images" and "Stable Diffusion"
Clean Input/Output?
The common issue in the previous cases: plaintiffs sue defendants for unauthorized use of the plaintiffs' copyrighted materials.
Possible solutions? Data licensing?
Google, Microsoft
Part 3: Statutory/Legislation Overview
Types of AI Regulation
Part 4: Data Protection and AI
2024 Privacy Law Roundup
20 comprehensive state privacy laws:
Effective dates between 2020 and 2026.
Most apply to "personal data" regarding "consumers"; California's law is broader.
Complicated interplay with other regulations: each state has some entity- or data-level carve-outs for entities or data regulated by HIPAA, GLBA, FCRA, etc.
States have also been passing sectoral privacy laws:
Consumer health data laws in Washington, Connecticut, and Nevada.
Several state laws on the privacy of children's data or establishing rules for children's use of social media.
Biometric privacy laws.
AI statutes with transparency or privacy-related obligations.
Source: International Association of Privacy Professionals, US State Privacy Legislation Tracker, July 2024.
AI + Data Protection Intersection
Security concerns and data breaches: state data breach notification laws, the HIPAA Breach Notification Rule, the GLBA Safeguards Rule FTC notice obligation, the GDPR, etc. Some regulated data may be carved out, but what happens when regulated data is combined with non-regulated data?
Complicated privacy compliance obligations:
Companies must permit consumers to opt out of automated decision-making (ADM) or profiling that results in legal or similarly significant decisions regarding the consumer.
The opt-out covers decisions made by solely automated means (i.e., without human intervention).
Significant decisions include those concerning access to medical care, employment, financial services, housing, etc.
Under the draft CCPA Regulations, training ADM / AI systems for certain purposes, including training LLMs, is considered high risk.
Using ADM systems to process sensitive personal data (medical diagnoses or health conditions, race, ethnicity, biometrics, precise geolocation, etc.) may be subject to heightened scrutiny.
AI laws and regulations focus on protecting against bias and discrimination.
For high-risk processing, pre-use notices and risk assessments are required.