Overview of all risks covered: Table 1.1, Ethical and Social Risks of Harm from Language Models (Weidinger et al., 2021)

I. Discrimination, Exclusion and Toxicity

    Mechanism: These risks arise from the LM accurately reflecting natural speech, including unjust, toxic, and oppressive tendencies present in the training data.

    Types of Harm: Potential harms include justified offense, material (allocational) harm, and the unjust representation or treatment of marginalized groups.

  • Social stereotypes and unfair discrimination
  • Exclusionary norms
  • Toxic language
  • Lower performance by social group

II. Information Hazards

    Mechanism: These risks arise from the LM predicting utterances that constitute private or safety-critical information which is present in, or can be inferred from, the training data.

    Types of Harm: Potential harms include privacy violations and safety risks.

  • Compromise privacy by leaking private information
  • Compromise privacy by correctly inferring private information
  • Risks from leaking or correctly inferring sensitive information

III. Misinformation Harms

    Mechanism: These risks arise from the LM assigning high probabilities to false, misleading, nonsensical or poor quality information.

    Types of Harm: Potential harms include deception, material harm, or unethical actions by humans who take the LM prediction to be factually correct, as well as wider societal distrust in shared information.

  • Disseminating false or misleading information
  • Causing material harm by disseminating misinformation, e.g. in medicine or law
  • Nudging or advising users to perform unethical or illegal actions

IV. Malicious Uses

    Mechanism: These risks arise from humans intentionally using the LM to cause harm.

    Types of Harm: Potential harms include undermining public discourse, crimes such as fraud, personalized disinformation campaigns, and the weaponization or production of malicious code.

  • Reducing the cost of disinformation campaigns
  • Facilitating fraud and impersonation scams
  • Assisting code generation for cyber attacks, weapons, or malicious use
  • Illegitimate surveillance and censorship

V. Human-Computer Interaction Harms

    Mechanism: These risks arise from LM applications, such as conversational agents, that directly engage a user via the mode of conversation.

    Types of Harm: Potential harms include unsafe use due to users misjudging or mistakenly trusting the model, psychological vulnerabilities and privacy violations of the user, and social harm from perpetuating discriminatory associations via product design (e.g. making “assistant” tools “female” by default).

  • Anthropomorphizing systems can lead to overreliance or unsafe use
  • Create avenues for exploiting user trust to obtain private information
  • Promoting harmful stereotypes by implying gender or ethnic identity

VI. Automation, Access, and Environmental Harms

    Mechanism: These risks arise where LMs are used to underpin widely used downstream applications that disproportionately benefit some groups rather than others.

    Types of Harm: Potential harms include increasing social inequalities from the uneven distribution of risks and benefits, loss of high-quality and safe employment, and environmental harm.

  • Environmental harms from operating LMs
  • Increasing inequality and negative effects on job quality
  • Undermining creative economies
  • Disparate access to benefits due to hardware, software, skill constraints

References

Dodge, Jesse, et al. “Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.” (2021).

Weidinger, Laura, et al. “Ethical and Social Risks of Harm from Language Models.” arXiv preprint arXiv:2112.04359 (2021).
