TrustNLP: Fourth Workshop on Trustworthy Natural Language Processing

Colocated with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

About

Recent advances in Natural Language Processing, and the emergence of pretrained Large Language Models (LLMs) in particular, have made NLP systems omnipresent in various aspects of our everyday life. In addition to traditional examples such as personal voice assistants and recommender systems, more recent developments include content-generation models such as ChatGPT and text-to-image models such as DALL-E. While these emergent technologies have unquestionable potential to power various innovative NLP and AI applications, they also pose a number of challenges in terms of their safe and ethical use. To address such challenges, NLP researchers have formulated various objectives, e.g., to make models more fair, safe, and privacy-preserving. However, these objectives are often considered separately, which is a major limitation, since it is often important to understand the interplay and/or tension between them. For instance, meeting a fairness objective might require access to users’ demographic information, which creates tension with privacy objectives. The goal of this workshop is to move toward a more comprehensive notion of Trustworthy NLP by bringing together researchers working on these distinct yet related topics, as well as on their intersection.

Call for Papers

Topics

We invite papers which focus on different aspects of safe and trustworthy language modeling. Topics of interest include (but are not limited to):

  • Secure, Faithful & Trustworthy Generation with LLMs
  • Fairness in LLM alignment, Human Preference Elicitation, Participatory NLP
  • Data Privacy Preservation and Data Leakage Issues in LLMs
  • Toxic Language Detection and Mitigation
  • Red-teaming, backdoor or adversarial attacks and defenses for LLM safety
  • Explainability and Interpretability of LLM generation
  • Robustness of LLMs
  • Mitigating LLM Hallucinations & Misinformation
  • Fairness and Bias in multi-modal generative models: Evaluation and Treatments
  • Industry applications of Trustworthy NLP
  • Trustworthy NLP challenges and opportunities for Latin American and Caribbean languages
  • Regionally-relevant NLP fairness applications (toxicity, sentiment, content moderation, translation, etc.)

We also welcome contributions that draw upon interdisciplinary knowledge to advance Trustworthy NLP. This may include working with, synthesizing, or incorporating knowledge across areas of expertise, sociopolitical systems, cultures, or norms.

Important Dates

  • Tuesday, April 2, 2024: Workshop Paper Due Date (Direct Submission via Softconf)
  • Friday, April 5, 2024: Workshop Paper Due Date (Fast-Track)
  • April 23, 2024: Notification of Acceptance
  • May 1, 2024: Deadline for relevant NAACL Findings papers to submit non-archivally (direct submission via this form)
  • May 3, 2024: Camera-ready Papers Due
  • Friday, June 21, 2024: TrustNLP Workshop Day

Submission Information

All submissions undergo double-blind peer review by the program committee (author names and affiliations must be removed) and will be assessed based on their relevance to the workshop themes.

All submissions go through the Softconf START conference management system. To submit, use this Softconf submission link.

Submitted manuscripts may be up to 8 pages for full papers and up to 4 pages for short papers. Please follow NAACL submission policies. Both full and short papers may include unlimited pages for references and appendices. Please note that at least one author of each accepted paper must register for the workshop and present the paper. Template files can be found here.

We also ask authors to include a limitations section and a broader impact statement, following the guidelines of the main conference.

Fast-Track Submission

If your paper has been reviewed by ACL, EMNLP, EACL, or ARR and its average rating (either average soundness or excitement score) is higher than 2.5, it qualifies for the fast track. In the appendix, please include the reviews and a short statement discussing which parts of the paper have been revised.

Non-Archival Option

NAACL workshops are traditionally archival. To allow dual submission of work, we also offer a non-archival track. Authors of accepted non-archival submissions will still participate in the workshop and present their work. A reference to the paper will be hosted on the workshop website (if desired) but will not be included in the official proceedings. Please submit through Softconf and indicate that this is a cross submission at the bottom of the submission form. You may also skip this step and inform us of your non-archival preference after the reviews. Papers accepted to the Findings of NAACL 2024 may also be submitted non-archivally to the workshop here.

Policies

Papers that have been accepted at, or are under review for, another venue may be submitted to the workshop but will not be included in the proceedings.

No anonymity period will be required for papers submitted to the workshop, per the latest updates to the ACL anonymity policy. However, submissions must still remain fully anonymized.

Invited Speakers

Maria Pacheco

Assistant Professor, University of Colorado, Boulder

Maria Pacheco is an Assistant Professor in the Department of Computer Science and a Faculty Fellow in the Institute of Cognitive Science at the University of Colorado Boulder. Before joining CU, she was a Postdoctoral Researcher at Microsoft Research NYC. Maria completed her PhD in Computer Science at Purdue University, and her BSc. in Computer Science and Engineering at the Universidad Simon Bolivar in Caracas, Venezuela, where she was born and raised.

Talk Title: Empowering text-as-data researchers with explainable NLP systems.

NLP systems are everywhere, and the broader research ecosystem is no exception. Researchers in the social sciences, humanities and industry have been making increasing use of NLP technologies to make sense of large textual repositories and answer fundamental questions in their fields of study. When NLP systems are employed as research tools, ensuring their reliability and transparency is incredibly important. Most current systems, while seemingly powerful, are opaque and often fail to satisfy the needs of researchers. In this talk, I will discuss the main challenges that arise when incorporating NLP technologies in text-as-data research, and make the case for explainable, controllable alternatives that can strike a balance between generalization and trustworthiness.


Prasanna Sattigeri

Principal Research Scientist, IBM Research

Prasanna Sattigeri is a Principal Research Scientist at IBM Research AI and the MIT-IBM Watson AI Lab, where his primary focus is on developing reliable AI solutions. His research interests encompass areas such as generative modeling, uncertainty quantification, and learning with limited data. His current projects are focused on the governance and safety of large language models (LLMs), aiming to establish both theoretical frameworks and practical systems that ensure these models are reliable and trustworthy. He has played a significant role in the development of several well-known open-source trustworthy AI toolkits, including AI Fairness 360, AI Explainability 360, and Uncertainty Quantification 360.

Talk Title: LLM Governance Elements: Detection and Alignment

This talk will go over the challenges and opportunities of developing and deploying large language models (LLMs), with a focus on building trustworthy AI systems. We investigate the difficulties of creating a detector library that can label a variety of risks, such as biased or hallucinated outputs. Detectors can be used not only as critical safeguards after deployment, but also for data curation and alignment during the development phase, allowing for effective governance across the LLM lifecycle. In addition, we will go over a few approaches that can help application developers tailor LLM behavior to not only mitigate common harms but also align with specific values or business requirements. Finally, we emphasize the importance of human-centered model evaluation, discussing how explanations and source attribution can improve transparency and trust in LLM applications.

Jieyu Zhao

Assistant Professor, University of Southern California

Jieyu Zhao is an assistant professor in the Computer Science Department at the University of Southern California. Prior to that, she was an NSF Computing Innovation Fellow at the University of Maryland, College Park. Jieyu received her Ph.D. from the Computer Science Department at UCLA. Her research interest lies in the fairness of ML/NLP models. Her work received the EMNLP Best Long Paper Award (2017). She was a recipient of the 2020 Microsoft PhD Fellowship and was selected to participate in the 2021 Rising Stars in EECS workshop. Her research has been covered by news media such as Wired and The Daily Mail. She was invited by UN Women Beijing to a panel discussion about gender equality and social responsibility.

Talk Title: Building Accountable NLP Models for Social Good

The rapid advancement of natural language processing (NLP) technologies has unlocked a myriad of possibilities for positive societal impact, ranging from enhancing accessibility and communication to supporting disaster response and public health initiatives. However, the deployment of these technologies also raises critical concerns regarding accountability, fairness, transparency, and ethical use. In this talk, I will discuss our efforts for auditing NLP models, detecting and mitigating biases, and understanding how LLMs make decisions. We hope to open the conversation to foster a community-wide effort towards more accountable and inclusive NLP practices.


Ahmad Beirami

Research Scientist, Google Research

Ahmad Beirami is a research scientist at Google Research, co-leading a research team on building safe, helpful, and scalable generative language models. At Meta AI, he led research to power the next generation of virtual digital assistants with AR/VR capabilities through robust generative language modeling. At Electronic Arts, he led the AI agent research program for automated playtesting of video games and cooperative reinforcement learning. Before moving to industry in 2018, he held a joint postdoctoral fellow position at Harvard & MIT, focused on problems in the intersection of core machine learning and information theory. He is the recipient of the 2015 Sigma Xi Best PhD Thesis Award from Georgia Tech.

Talk Title: Language Model Alignment: A Theoretical View

The goal of the language model alignment (post-training) process is to draw samples from an aligned distribution that improves a reward (e.g., makes the generation safer) but does not deviate much from the base model. A simple baseline for this task is best-of-N, where N responses are drawn from the base model, ranked by a reward, and the highest-ranking one is selected. More sophisticated techniques generally solve a KL-regularized reinforcement learning (RL) problem with the goal of maximizing expected reward subject to a KL divergence constraint between the aligned model and the base model. An alignment technique is preferred if its reward-KL tradeoff curve dominates those of other techniques. In this talk, we give an overview of language model alignment and build an understanding of known results in this space through simplified examples. We also present a new modular alignment technique, called controlled decoding, which solves the KL-regularized RL problem while keeping the base model frozen by learning a prefix scorer, offering inference-time configurability. Finally, we shed light on the remarkable performance of best-of-N in achieving competitive or even better reward-KL tradeoffs compared to state-of-the-art alignment baselines.
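
For readers who want the objective in symbols, the KL-regularized alignment problem sketched above is commonly written as follows (the notation here is illustrative and not taken from the talk):

\[
\pi^{\star} \;=\; \arg\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\left[ r(x, y) \right] \;-\; \beta\, \mathrm{KL}\!\left( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{base}}(\cdot \mid x) \right),
\]

where \(\pi_{\mathrm{base}}\) is the frozen base model, \(r\) is the reward, and \(\beta\) trades off reward improvement against drift from the base model. Best-of-N can be viewed as a simple inference-time heuristic operating on this same reward-KL tradeoff.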


Schedule

All times are in Mexico City Time (UTC−06:00).

Opening Remarks: 9:00 - 9:10am
Keynote 1 (Maria Pacheco): 9:10 - 9:50am
Keynote 2 (Ahmad Beirami): 9:50 - 10:30am
Virtual Poster Session + Coffee Break: 10:30 - 11:10am
Keynote 3 (Jieyu Zhao): 11:10 - 11:50am
Keynote 4 (Prasanna Sattigeri): 11:50am - 12:30pm
Lunch: 12:30 - 2:00pm
In-person Poster Session: 2:00 - 3:30pm
Coffee Break: 3:30 - 4:00pm
Oral Presentations / Best Paper Presentation: 4:00 - 5:20pm
Closing Remarks: 5:20 - 5:30pm

Accepted Papers

Archival Papers

Non-Archival Papers

  • Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning
    Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie and Xin Eric Wang
  • Uncertainty Assessment of Language Models through Rank-Calibration
    Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Hamed Hassani, Insup Lee, Osbert Bastani and Edgar Dobriban
  • (Best Short Paper) White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency
    Yixin Wan and Kai-Wei Chang
  • ConsEval: Illuminating and Improving the Consistency of LLM Evaluators
    Jiwoo Hong and James Thorne
  • Quantifying Memorization of Domain-Specific Pre-trained Language Models using Japanese Newspaper and Paywalls
    Shotaro Ishihara
  • Beyond Visual Augmentation: Investigating Bias in Multi-Modal Text Generation
    Fnu Mohbat, Vijay Sadashivaiah, Keerthiram Murugesan, Amit Dhurandhar, Ronny Luss and Pin-Yu Chen
  • Reevaluating Bias Detection in Language Models: The Role of Implicit Norms
    Farnaz Kohankhaki, Jacob-Junqi Tian, David B. Emerson, Laleh Seyyed-Kalantari and Faiza Khan Khattak
  • BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models
    Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu and Faiza Khan Khattak
  • (Runner-up Best Short Paper) Can Language Models Interpret Verbalized Uncertainty?
    Catarina Belem, Markelle Kelly, Sameer Singh, Mark Steyvers and Padhraic Smyth
  • (Spotlight Paper) CULTURE-GEN: Natural Language Prompts Reveal Uneven Culture Presence in Language Models
    Huihan Li, Liwei Jiang, Nouha Dziri, Xiang Ren and Yejin Choi
  • Big Brother is Watching You: Automatically Jailbreak GPT-4V for Facial Recognition
    Yuanwei Wu, Yue Huang, Yixin Liu, Hanchi Sun, Xiang Li, Pan Zhou and Lichao Sun
  • Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
    Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason and Khyathi Raghavi Chandu
  • Evaluating Personal Information Parroting in Language Models
    Nishant Subramani, Kshitish Ghate and Mona Diab
  • Mitigating Social Biases in Language Models through Unlearning
    Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Mohammad Bolandraftar, Xiaodan Zhu and Faiza Khan Khattak
  • WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models
    Piotr Molenda, Adian Liusie and Mark Gales
  • TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction
    Shuo Li, Sangdon Park, Insup Lee and Osbert Bastani
  • Multi-Level Explanations for Generative Language Models
    Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer and Soumya Ghosh
  • On the Calibration of Multilingual Question Answering LLMs
    Yahan Yang, Soham Dan, Dan Roth and Insup Lee
  • (Spotlight Paper) ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
    Jiazhao Li, Yijin Yang, Zhuofeng Wu, V.G.Vinod Vydiswaran and Chaowei Xiao

Accepted NAACL Papers

  • From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
    Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu and Dong Yu
  • Rationale-based Opinion Summarization
    Haoyuan Li, Snigdha Chaturvedi
  • MisgenderMender: A Community-Informed Approach to Interventions for Misgendering
    Tamanna Hossain, Sunipa Dev, Sameer Singh
  • Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
    Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Yuval Pinter, Rahul Gupta

Committee

Organizers

Program Committee

  • Saied Alshahrani
  • Connor Baumler
  • Gagan Bhatia
  • Keith Burghardt
  • Yang Trista Cao
  • Javier Carnerero Cano
  • Canyu Chen
  • Xinyue Chen
  • Jwala Dhamala
  • Árdís Elíasdóttir
  • Aram Galstyan
  • Usman Gohar
  • Zihao He
  • Pengfei He
  • Qian Hu
  • Satyapriya Krishna
  • Jooyoung Lee
  • Yanan Long
  • Subho Majumdar
  • Ninareh Mehrabi
  • Sahil Mishra
  • Isar Nejadgholi
  • Huy Nghiem
  • Anaelia Ovalle
  • Jieyu Zhao
  • Aishwarya Padmakumar
  • Kartik Perisetla
  • Salman Rahman
  • Chahat Raj
  • Anthony Rios
  • Patricia Thaine
  • Simon Yu
  • Xinlin Zhuang
  • Chupeng Zhang
  • Chenyang Zhu
  • Christina Chance
  • Nishant Balepur
  • Elaine Yixin Wan
  • Xinchen Yang

Interested in reviewing for TrustNLP?

If you are interested in reviewing submissions, please fill out this form.

Questions?

Please contact us at trustnlp24naaclworkshop@googlegroups.com.