“Sociodemographic Biases in Natural Language Processing: Two Case Studies”
Dr. Shomir Wilson, Assistant Professor and Director of the Human Language Technologies Lab in the College of Information Sciences and Technology at Penn State
Friday, February 10, 9:00–10:30 a.m. EST, in 127 Moore Building and virtually via Zoom
Large language models (LLMs) are widely used in natural language processing (NLP) to achieve high performance on a variety of tasks. However, the large corpora used to train these models contain sociodemographic biases, and the models tend to inherit those biases, with potentially harmful results. Shomir Wilson will present two case studies that reveal the sociodemographic biases of select models within the context of sentiment analysis, a common NLP task. The first study shows that Word2Vec and GloVe exhibit negative sentiment bias toward terms for people with disabilities. The second study shows that GPT-2 exhibits a range of sentiment biases for nationality demonyms, i.e., words that specify national origins. Wilson will conclude with some thoughts on the significance of these biases and the challenges of mitigating or eliminating them.