Overview of all risks covered: Table 1.1, Ethical and Social Risks of Harm from Language Models (Weidinger et al., 2021)

I. Discrimination, Exclusion and Toxicity

    Mechanism: These risks arise from the LM accurately reflecting natural speech, including unjust, toxic, and oppressive tendencies present in the training data.

    Types of Harm: Potential harms include justified offense, material (allocational) harm, and the unjust representation or treatment of marginalized groups.

  • Social stereotypes and unfair discrimination
  • Exclusionary norms
  • Toxic language
  • Lower performance by social group

II. Information Hazards

    Mechanism: These risks arise from the LM predicting utterances that constitute private or safety-critical information which is present in, or can be inferred from, the training data.

    Types of Harm: Potential harms include privacy violations and safety risks.

  • Compromise privacy by leaking private information
  • Compromise privacy by correctly inferring private information
  • Risks from leaking or correctly inferring sensitive information

III. Misinformation Harms

    Mechanism: These risks arise from the LM assigning high probabilities to false, misleading, nonsensical or poor quality information.

    Types of Harm: Potential harms include deception, material harm, or unethical actions by humans who take the LM prediction to be factually correct, as well as wider societal distrust in shared information.

  • Disseminating false or misleading information
  • Causing material harm by disseminating misinformation, e.g. in medicine or law
  • Nudging or advising users to perform unethical or illegal actions

IV. Malicious Uses

    Mechanism: These risks arise from humans intentionally using the LM to cause harm.

    Types of Harm: Potential harms include undermining public discourse, crimes such as fraud, personalized disinformation campaigns, and the weaponization or production of malicious code.

  • Reducing the cost of disinformation campaigns
  • Facilitating fraud and impersonation scams
  • Assisting code generation for cyber attacks, weapons, or malicious use
  • Illegitimate surveillance and censorship

V. Human-Computer Interaction Harms

    Mechanism: These risks arise from LM applications, such as conversational agents, that directly engage a user via the mode of conversation.

    Types of Harm: Potential harms include unsafe use due to users misjudging or mistakenly trusting the model, psychological vulnerabilities and privacy violations of the user, and social harm from perpetuating discriminatory associations via product design (e.g. making “assistant” tools “female” by default).

  • Anthropomorphizing systems can lead to overreliance or unsafe use
  • Create avenues for exploiting user trust to obtain private information
  • Promoting harmful stereotypes by implying gender or ethnic identity

VI. Automation, Access, and Environmental Harms

    Mechanism: These risks arise where LMs are used to underpin widely used downstream applications that disproportionately benefit some groups rather than others.

    Types of Harm: Potential harms include increasing social inequalities from the uneven distribution of risks and benefits, loss of high-quality and safe employment, and environmental harm.

  • Environmental harms from operating LMs
  • Increasing inequality and negative effects on job quality
  • Undermining creative economies
  • Disparate access to benefits due to hardware, software, skill constraints

References

Dodge, Jesse, et al. “Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.” (2021).

Weidinger, Laura, et al. “Ethical and Social Risks of Harm from Language Models.” arXiv preprint arXiv:2112.04359 (2021).
