Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Published in Under Review, 2025
We introduce CitePretrainBench, the first benchmark that tests whether an LLM can trace its answers back to the passages it saw during continual pre-training. We also propose Active Indexing, a bidirectional fact-identifier augmentation that explicitly teaches citation during pre-training; with it, LLMs achieve noticeable improvements over baselines.
Recommended citation: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra. (2025). "Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models." Under Review.
Download Paper
