I am a third-year CS PhD student at Duke University, advised by Prof. Bhuwan Dhingra. I develop methods to improve the factuality, reliability, and efficiency of large language models (LLMs), with the goal of making them more trustworthy and practical for real-world use.
Previously, I received my Master’s degree from Columbia University, where I was fortunate to be advised by Prof. Zhou Yu and Prof. Kathleen McKeown. I obtained my Bachelor’s degree from Tsinghua University. I have also interned at Amazon AGI and ByteDance.
Research Interests
My research focuses on advancing LLM Agents across several key areas in AI/NLP/ML:
- Factuality and Knowledge: Improving how LLM Agents search, ground, and learn knowledge to produce accurate and verifiable answers
- Trustworthiness: Calibrating LLMs’ confidence to align with knowledge boundaries
- Efficient Inference: Developing algorithms for faster, resource-efficient inference without compromising performance
Recent News
- Aug 2025: Completed my internship at Amazon AGI
- May 2025: Two papers accepted to ACL 2025
- Feb 2025: RAG conflict resolution paper accepted as a Spotlight at ICLR 2025
Publications
When Human Gold Breaks: Co-Evolving Agents and Benchmarks for Evaluating Factuality in Deep Research Reports
Yukun Huang, Leonardo Ribeiro, Momchil Hardalov, Bhuwan Dhingra, Venkatesh Saligrama, Markus Dreyer
Under Review, 2025
Paper Bib
@article{Huang2025EvalDeepResearch,
title={When Human Gold Breaks: Co-Evolving Agents and Benchmarks for Evaluating Factuality in Deep Research Reports},
author={Huang, Yukun and Ribeiro, Leonardo and Hardalov, Momchil and Dhingra, Bhuwan and Saligrama, Venkatesh and Dreyer, Markus},
journal={Under Review},
year={2025},
url={https://kkkevinkkkkk.github.io},
note={Summary: We propose Evolve Benchmarking, a new paradigm for Deep Research Report (DRR) factuality evaluation where benchmarks and agents co-evolve through an Audit-then-Score protocol, replacing unreliable static human "gold". It yields two artifacts: DeepFact-Bench, an auditable evolving benchmark, and DeepFact-Eval, an expert-level agentic verifier for DRR claims.}
}
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra
Under Review, 2025
Paper Bib
@misc{huang2025citepretrain,
title={Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models},
author={Huang, Yukun and Chen, Sanxing and Pei, Jian and Zaheer, Manzil and Dhingra, Bhuwan},
year={2025},
eprint={2506.17585},
archivePrefix={arXiv},
primaryClass={cs.AI},
doi={10.48550/arXiv.2506.17585},
url={https://arxiv.org/abs/2506.17585}
}
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Sanxing Chen, Xiaoyin Chen, Yukun Huang, Roy Xie, Bhuwan Dhingra
Under Review, 2025
Paper Code Bib
@article{Chen2025WhenGW,
title={When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training},
author={Chen, Sanxing and Chen, Xiaoyin and Huang, Yukun and Xie, Roy and Dhingra, Bhuwan},
journal={Under Review},
year={2025},
eprint={2509.24923},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2509.24923},
note={Summary: We compare Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) for teaching LLMs to explore in multi-armed bandit tasks. We introduce strategic and algorithmic reward designs, showing that RL generalizes better across environments but tends to learn more exploitative, short-sighted strategies than SFT.}
}
To Trust or Not to Trust? Enhancing Large Language Models’ Situated Faithfulness to External Contexts
Yukun Huang, Sanxing Chen, Hongyi Cai, Bhuwan Dhingra
ICLR Spotlight, 2025
Paper Code Bib
@inproceedings{huang2025situatedfaithfulness,
title={To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts},
author={Huang, Yukun and Chen, Sanxing and Cai, Hongyi and Dhingra, Bhuwan},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
url={https://openreview.net/forum?id=K2jOacHUlO},
note={Spotlight}
}
Real-time Factuality Assessment from Adversarial Feedback
Sanxing Chen, Yukun Huang, Bhuwan Dhingra
ACL, 2025
Paper Code Bib
@inproceedings{chen-etal-2025-real,
title={Real-time Factuality Assessment from Adversarial Feedback},
author={Chen, Sanxing and Huang, Yukun and Dhingra, Bhuwan},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year={2025},
pages={1610--1630},
url={https://aclanthology.org/2025.acl-long.81/},
doi={10.18653/v1/2025.acl-long.81}
}
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
Maximilian Holsman, Yukun Huang, Bhuwan Dhingra
ACL Findings, 2025
Paper Code Bib
@inproceedings{holsman-etal-2025-fuzzy,
title={Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff},
author={Holsman, Maximilian and Huang, Yukun and Dhingra, Bhuwan},
booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
year={2025},
pages={26257--26273},
url={https://aclanthology.org/2025.findings-acl.1346.pdf},
doi={10.18653/v1/2025.findings-acl.1346}
}
Calibrating Long-form Generations From Large Language Models
Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, Bhuwan Dhingra
EMNLP Findings, 2024
Paper Code Bib
@inproceedings{huang-etal-2024-calibrating,
title={Calibrating Long-form Generations From Large Language Models},
author={Huang, Yukun and Liu, Yixin and Thirukovalluru, Raghuveer and Cohan, Arman and Dhingra, Bhuwan},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
year={2024},
pages={13441--13460},
url={https://aclanthology.org/2024.findings-emnlp.785/},
doi={10.18653/v1/2024.findings-emnlp.785}
}
Atomic Self-Consistency for Better Long Form Generations
Raghuveer Thirukovalluru, Yukun Huang, Bhuwan Dhingra
EMNLP, 2024
Paper Code Bib
@inproceedings{thirukovalluru-etal-2024-atomic,
title={Atomic Self-Consistency for Better Long Form Generations},
author={Thirukovalluru, Raghuveer and Huang, Yukun and Dhingra, Bhuwan},
booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
year={2024},
pages={12681--12694},
url={https://aclanthology.org/2024.emnlp-main.706.pdf},
doi={10.18653/v1/2024.emnlp-main.706}
}
Learning and Evaluating Factual Clarification Question Generation Without Examples
Matthew Toles, Yukun Huang, Zhou Yu
ACL GEM2 Workshop, 2025
Paper Bib
@inproceedings{toles-etal-2025-learning,
title={Learning and Evaluating Factual Clarification Question Generation Without Examples},
author={Toles, Matthew and Huang, Yukun and Yu, Zhou},
booktitle={Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)},
year={2025},
pages={200--211},
url={https://aclanthology.org/2025.gem-1.15.pdf}
}
Learning a Better Initialization for Soft Prompts via Meta-Learning
Yukun Huang, Kun Qian, Zhou Yu
AACL Oral, 2023
Paper Bib
@inproceedings{huang-etal-2023-learning-better,
title={Learning a Better Initialization for Soft Prompts via Meta-Learning},
author={Huang, Yukun and Qian, Kun and Yu, Zhou},
booktitle={Proceedings of the 13th International Joint Conference on Natural Language Processing (Short Papers)},
year={2023},
pages={67--75},
url={https://aclanthology.org/2023.ijcnlp-short.8.pdf},
doi={10.18653/v1/2023.ijcnlp-short.8}
}
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
Preprint, 2022
Paper Bib
@misc{huang2022incontext,
title={In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models},
author={Huang, Yukun and Chen, Yanda and Yu, Zhou and McKeown, Kathleen},
year={2022},
eprint={2212.10670},
archivePrefix={arXiv},
primaryClass={cs.CL},
doi={10.48550/arXiv.2212.10670},
url={https://arxiv.org/abs/2212.10670}
}
Services
Reviewer: ICLR (Notable Reviewer 2025), NeurIPS (Top Reviewer 2025), ACL, EMNLP, ARR
Teaching
- Teaching Assistant, Large Language Models (CS 590) — Duke University, Fall 2025
- Teaching Assistant, Probabilistic Machine Learning (STA 561) — Duke University, Spring 2025
- Teaching Assistant, Natural Language Processing (COMS 4701) — Columbia University, Summer 2022 & Spring 2023
- Teaching Assistant, Analysis of Algorithms (CSOR 4231) — Columbia University, Spring 2022