With language models on the rise, how can natural language processing be used for good?

A research team led by Prof. Rada Mihalcea and PhD student Zhijing Jin has created a method for identifying and categorizing research that uses NLP to address social problems.

Today’s news media and public discourse resound with concerns about AI and its potential effects on society. The growing popularity of natural language processing (NLP) tools such as ChatGPT has brought these issues to the fore, raising new questions about what role these technologies will play in our lives. Many wonder whether these developments are a good thing and what they actually contribute to society.

Seeking to address these concerns and to demonstrate that NLP can help, rather than harm, humanity, Janice M. Jenkins Collegiate Professor of Computer Science Rada Mihalcea and her team in the Michigan AI Lab have launched a project called NLP for Social Good. The initiative aims to empower researchers to apply NLP in the service of the greater good.

Zhijing Jin, a doctoral student working in Mihalcea’s research group and one of the project’s leads, emphasizes the importance of this project.

“For a long time,” Jin says, “most people in the NLP domain mainly focused on specific linguistic research questions. While those continue to be a core interest for the NLP community, recent progress in the field has also made possible the use of NLP for practical applications. Our team is the first to really analyze and promote how NLP is being used for social good.”

To that end, Mihalcea, Jin, and their team have created a new dataset called NLP4SG Papers. To build it, the team developed a set of NLP models and visualizations that identify and sort research papers using NLP methods to address social problems.

The decision tree of NLP4SG PaperAnalyzer, which identifies social good papers (Task 1), classifies the relevant UN SDGs (Task 2), and analyzes salient scientific terms (Task 3).

In building this system, Jin explains, “we asked ourselves, how do we define social good and how does NLP align with it?” To answer this question, the research team turned to the 17 United Nations Sustainable Development Goals (SDGs), which include promoting good health and well-being, supporting quality education, addressing climate change, and ending poverty and hunger, among others. The team used these internationally agreed-upon goals as a framework for understanding how NLP researchers are advancing the greater good.

With their PaperAnalyzer tool, the research team mapped the entire Association for Computational Linguistics (ACL) Anthology, comprising 76,000 papers, against the 17 UN SDGs, applying a suite of NLP tools to tag and sort the papers.

Jin emphasizes the comprehensive nature of the NLP4SG dataset and the multiple steps involved in creating it. “There are several tasks involved in this process, such as classifying the papers and then looking into what UN goals they align with, as well as what NLP methods they used. We used a variety of models to accomplish these tasks, all of them state-of-the-art.” These include, according to Jin, a combination of large language models (LLMs), interpretability tools, language representation models, and more traditional techniques such as keyword matching and named entity recognition.
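To give a sense of the simplest of these techniques, keyword matching, here is a minimal Python sketch of tagging a paper abstract with SDGs. It is not the team's PaperAnalyzer code: the keyword lists, the goal each keyword is assigned to, and the function names `tag_sdgs` and `is_social_good` are hypothetical, chosen only for illustration. The real system, as Jin describes, combines keyword matching with LLMs and other models.

```python
import re

# Hypothetical keyword lists for a handful of the 17 UN SDGs.
# Illustrative only; NOT the lexicon used by NLP4SG PaperAnalyzer.
SDG_KEYWORDS = {
    1: ["poverty", "low-income"],                           # No Poverty
    3: ["health", "mental health", "clinical", "well-being"],  # Good Health and Well-being
    4: ["education", "student", "learning outcomes"],       # Quality Education
    13: ["climate change", "carbon emissions"],             # Climate Action
    16: ["misinformation", "hate speech", "fake news"],     # Peace, Justice and Strong Institutions (assumed mapping)
}


def tag_sdgs(text: str) -> list[int]:
    """Return the sorted SDG numbers whose keywords appear in `text`."""
    lowered = text.lower()
    goals = []
    for goal, keywords in SDG_KEYWORDS.items():
        for kw in keywords:
            # Word boundaries keep "health" from matching inside "healthy".
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                goals.append(goal)
                break  # one hit is enough to tag this goal
    return sorted(goals)


def is_social_good(text: str) -> bool:
    """Hypothetical stand-in for Task 1: a paper counts if any SDG is tagged."""
    return len(tag_sdgs(text)) > 0


print(tag_sdgs("We detect health misinformation in social media posts."))
print(is_social_good("We propose a faster parser for English."))
```

On the two example inputs, the first abstract is tagged with goals 3 and 16, and the second is not flagged as social good. A lexicon like this is cheap and transparent but misses paraphrases and produces false positives, which is one reason a pipeline would pair it with learned models.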

Using these diverse tools, Mihalcea, Jin, and their colleagues identified several trends in how NLP research is applied to social problems. For instance, they found that health and well-being has been one of the most researched areas among social-good-related NLP papers. Another area of rapid development, particularly in recent years, is misinformation detection. “The emphasis in these areas could be due to a set of factors,” Jin notes. “It might be easier to formulate research questions in these domains, or it might be due to abundant funding.”

A Sankey diagram showing which social good-related topic areas are most frequently represented in ACL papers, as well as what tasks and methods are applied in these papers.

Other topics, such as helping people out of poverty, have received less research attention in NLP. “One reason behind this is funding distribution,” Jin says, “but another factor could be that some of these issues are never brought to people in the ivory tower.”

But change is possible, Jin argues, by raising awareness of new and emerging research areas that address social problems and that might otherwise be neglected. “We want to actively build connections so that new researchers in the field are aware of rising research directions and can see the value in them, regardless of the current funding situation,” Jin says.

Describing the project’s long-term goals, Mihalcea says, “Our hope is that by highlighting the role that NLP can play for social good, we can inspire the next generation of NLP researchers to start making a difference in these high social-impact research areas.” In that way, she adds, “we want to promote gradual change in the field.”