This project covers a series of my work on uncertainty quantification under distribution shift, especially covariate shift. We study various metrics for evaluating the quality of uncertainty estimates, ranging from marginal coverage and sharpness to expected calibration error (including for subgroups). We aim to develop rigorous and principled methods that are also practical in real-world applications. For example, we work to improve and better balance the sample complexity and computational complexity of popular distribution-free methods such as conformal prediction. We also utilize data beyond the labeled data of traditional supervised learning for model calibration, for example by leveraging human annotations to better align LLMs with human uncertainty.
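To make these evaluation targets concrete, here is a minimal, illustrative sketch (my own example, not code from any of the papers below) of marginal coverage and sharpness for prediction intervals, together with a standard binned expected calibration error for a classifier:

```python
import numpy as np

def marginal_coverage(y_true, lower, upper):
    """Fraction of held-out labels falling inside their prediction intervals."""
    return np.mean((y_true >= lower) & (y_true <= upper))

def sharpness(lower, upper):
    """Average interval width: smaller is sharper, at a fixed coverage level."""
    return np.mean(upper - lower)

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE for a classifier: weighted |accuracy - confidence| gap over bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece
```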
Here are some featured papers in this line of work:
A series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, built on the more sample-efficient jackknife+ (see the weighted conformal sketch after this list for the basic idea these methods extend): Drew Prinster, Anqi Liu, Suchi Saria. “JAWS: Auditing Predictive Uncertainty Under Covariate Shift”, in NeurIPS 2022.
A collection of methods based on the jackknife+ that achieve a practical balance of computational and statistical efficiency: Drew Prinster, Suchi Saria, Anqi Liu. “JAWS-X: Addressing Efficiency Bottlenecks of Conformal Prediction Under Standard and Feedback Covariate Shift”, in ICML 2023.
Calibrated and efficient uncertainty estimation for deep classification under distribution shifts: Ha Manh Bui, Anqi Liu. “Density-Softmax: Scalable and Calibrated Uncertainty Estimation under Distribution Shifts”, arXiv preprint, 2023.
Calibrated and efficient uncertainty estimation for deep regression under distribution shifts: Ha Manh Bui, Anqi Liu. “Density-Regression: Efficient and Distance-Aware Deep Regressor for Uncertainty Estimation under Distribution Shifts”, in AISTATS 2024.
Calibrating pre-trained large language models in cross-lingual tasks: Zhengping Jiang, Anqi Liu, Benjamin Van Durme. “Calibrating Zero-shot Cross-lingual (Un-)structured Predictions”, in EMNLP 2022.
Human scalar annotations for better calibrating large language models: Zhengping Jiang, Anqi Liu, Benjamin Van Durme. “Addressing the Binning Problem in Calibration Assessment through Scalar Annotations”, in TACL 2024.
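For readers unfamiliar with the machinery behind the JAWS line of work, the sketch below illustrates likelihood-ratio-weighted split conformal prediction under covariate shift, a simpler relative of the jackknife+-based methods in the papers above. The function and variable names are my own illustrative choices, not the papers' implementation.

```python
import numpy as np

def weighted_conformal_quantile(resid_cal, w_cal, w_test, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration residuals under covariate shift.

    resid_cal: nonconformity scores |y_i - f(x_i)| on a held-out calibration set
    w_cal:     likelihood ratios p_test(x_i) / p_train(x_i) at calibration points
    w_test:    likelihood ratio at the test point
    Returns the half-width q of the interval f(x_test) +/- q.
    """
    # Normalize the weights over the calibration points plus the test point.
    weights = np.append(w_cal, w_test)
    weights = weights / weights.sum()
    # Append +inf so the quantile stays conservative when the weights
    # concentrate on the test point.
    scores = np.append(resid_cal, np.inf)
    order = np.argsort(scores)
    cum_weights = np.cumsum(weights[order])
    # Smallest score whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cum_weights, 1.0 - alpha)
    return scores[order][idx]
```

Given a fitted regressor f, the returned quantile q yields the interval f(x_test) ± q. Roughly speaking, the jackknife+-based JAWS methods avoid the held-out calibration split by working with leave-one-out residuals, which is what makes them more sample efficient.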