Papers

2025

(How) do language models track state?

Belinda Z. Li, Zifan Carl Guo, Jacob Andreas.

ICML 2025.
The surprising effectiveness of test-time training for few-shot learning.

Ekin Akyürek, Mehul Damani, Adam Zweiger, Linlu Qiu, Han Guo, Jyothish Pari, Yoon Kim, Jacob Andreas.

ICML 2025.
A hitchhiker’s guide to scaling law estimation.

Leshem Choshen, Yang Zhang, Jacob Andreas.

ICML 2025.
Modeling open-world cognition as on-demand synthesis of probabilistic models.

Lio Wong, Katherine Collins, Lance Ying, Cedegao Zhang, Tobias Gerstenberg, Timothy O’Donnell, Alex Lew, Jacob Andreas, Tyler Brooke-Wilson and Josh Tenenbaum.

CogSci 2025.
Language and experience: A computational model of social learning in complex tasks.

Cédric Colas, Tracey Mills, Ben Prystawski, M.H. Tessler, Noah Goodman, Jacob Andreas and Josh Tenenbaum.

CogSci 2025.
Learning how hard to think: Input-adaptive allocation of LM computation.

Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu and Jacob Andreas.

ICLR 2025.
Eliciting human preferences with language models.

Belinda Li*, Alex Tamkin*, Noah Goodman and Jacob Andreas.

ICLR 2025.

Press: VentureBeat
A probabilistic framework for LLM hallucination detection via belief tree propagation.

Bairu Hou, Yang Zhang, Jacob Andreas and Shiyu Chang.

NAACL 2025.
Language modeling with editable external knowledge.

Belinda Z. Li, Emmy Liu, Alexis Ross, Abbas Zeitoun, Graham Neubig and Jacob Andreas

NAACL Findings, 2025.

2024

Loose LIPS sink ships: Asking questions in Battleship with language-informed program sampling.

Gabe Grand, Valerio Pepe, Jacob Andreas and Joshua B. Tenenbaum.

CogSci, 2024.
Algorithmic capabilities of random transformers.

Ziqian Zhong and Jacob Andreas.

NeurIPS, 2024.
Generative AI in the era of “alternative facts”.

Saadia Gabriel, Liang Lyu, James Siderius, Marzyeh Ghassemi, Jacob Andreas, Asu Ozdaglar.

EMNLP 2024.
Adaptive language-guided abstraction from contrastive explanations.

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas and Andreea Bobu.

CoRL 2024.
Inspecting and editing knowledge representations in language models.

Evan Hernandez, Belinda Z. Li and Jacob Andreas.

COLM 2024.
An incomplete loop: Deductive, inductive and abductive learning in large language models.

Emmy Liu, Graham Neubig and Jacob Andreas.

COLM 2024.
Unforgettable generalization in language models.

Eric Zhang, Leshem Choshen and Jacob Andreas.

COLM 2024.
Toward in-context teaching: Adapting examples to students’ misconceptions.

Alexis Ross and Jacob Andreas.

ACL 2024.
Lexicon-level contrastive visual grounding improves language modeling.

Chengxu Zhuang, Ev Fedorenko and Jacob Andreas.

ACL Findings 2024.
Deductive closure training of language models for coherence, accuracy and updatability.

Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas.

ACL Findings 2024.
In-context language learning: Architectures and algorithms.

Ekin Akyürek, Bailin Wang, Yoon Kim and Jacob Andreas.

ICML 2024.
A multimodal automated interpretability agent.

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba.

ICML 2024.

project page
Decomposing uncertainty for large language models through input clarification ensembling.

Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang and Yang Zhang.

ICML 2024 Oral.
Learning phonotactics from linguistic informants.

Canaan Breiss*, Alexis Ross*, Amani Maina-Kilaas, Roger P. Levy, Jacob Andreas.

SCiL 2024.
Contextual and combinatorial structure in sperm whale vocalizations.

Pratyusha Sharma, Shane Gero, Roger Payne, David F. Gruber, Daniela Rus*, Antonio Torralba*, Jacob Andreas*.

Nature Communications 2024.

Press: New York Times (The Daily podcast), Washington Post
Regularized conventions: Equilibrium computation as a model of pragmatic reasoning.

Athul Paul Jacob, Gabriele Farina and Jacob Andreas.

NAACL 2024, SCiL 2024.
Visual grounding helps learn word meanings in low-data regimes.

Chengxu Zhuang, Evelina Fedorenko and Jacob Andreas.

NAACL 2024 Best paper.
Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual evaluations.

Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas and Yoon Kim.

NAACL 2024.
LaMPP: Language models as probabilistic priors for perception and action.

Belinda Z. Li, William Chen, Pratyusha Sharma and Jacob Andreas.

ICLR 2024 Workshop on Generative AI for Decision-Making.
The consensus game: Language model generation via equilibrium search.

Athul Paul Jacob, Yikang Shen, Gabriele Farina and Jacob Andreas.

ICLR 2024 Spotlight, NeurIPS R0-FoMo Workshop Best Paper.

Press: Quanta
Linearity of relation decoding in transformer language models.

Evan Hernandez*, Arnab Sen Sharma*, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov and David Bau.

ICLR 2024 Spotlight.

Press: Hacker News, MIT News.
Learning with language-guided state abstractions.

Andi Peng, Ilia Sucholutsky, Belinda Z. Li, Theodore Sumers, Thomas L. Griffiths, Jacob Andreas and Julie Shah.

ICLR 2024.

Press: MIT News
Learning adaptive planning representations with natural language guidance.

Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum and Jacob Andreas.

ICLR 2024.

Press: MIT News
Modeling boundedly rational agents with latent inference budgets.

Athul Paul Jacob, Abhishek Gupta and Jacob Andreas.

ICLR 2024.

Press: MIT News.
LILO: Learning interpretable libraries by compressing and documenting code.

Gabriel Grand, Lionel Wong, Maddy Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum and Jacob Andreas.

ICLR 2024.

Press: MIT News

2023

Alignment via mutual information.

Shinjini Ghosh, Yoon Kim, Ramon Fernandez Astudillo, Tahira Naseem and Jacob Andreas.

CoNLL 2023.

code
Cognitive dissonance: Why do language model outputs disagree with internal representations of truthfulness?

Kevin Liu, Stephen Casper, Dylan Hadfield-Menell and Jacob Andreas.

EMNLP 2023.

code
Pushdown layers: Encoding recursive structure in transformer language models.

Shikhar Murty, Pratyusha Sharma, Jacob Andreas and Christopher D. Manning.

EMNLP 2023.

code
AutoReply: Detecting nonsense in dialogue introspectively with discriminative replies.

Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis.

EMNLP Findings 2023.
Pseudointelligence: A unifying framework for language model evaluation.

Shikhar Murty*, Orr Paradise*, Pratyusha Sharma*.

EMNLP Findings 2023.
The clock and the pizza: Two stories in mechanistic explanation of neural networks.

Ziming Liu*, Ziqian Zhong*, Max Tegmark and Jacob Andreas.

NeurIPS 2023 Oral.

Press: Quanta code
A function interpretation benchmark for evaluating interpretability methods.

Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba. NeurIPS Datasets & Benchmarks 2023.

Press: MIT News. project page
Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network.

Carina Kauf, Greta Tuckute, Roger Levy, Jacob Andreas and Evelina Fedorenko.

Neurobiology of Language 2023.

Press: The Atlantic code
Compositionality as lexical symmetry..

Ekin Akyürek and Jacob Andreas.

ACL 2023 Lexical Semantics Area Award.

code
Grokking of hierarchical structure in vanilla transformers.

Shikhar Murty, Pratyusha Sharma, Jacob Andreas and Chris Manning.

ACL 2023.

code
Language modeling with latent situations.

Belinda Z. Li, Max Nye and Jacob Andreas.

ACL Findings 2023.
Guiding pretraining in reinforcement learning with large language models.

Yuqing Du*, Olivia Watkins*, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta and Jacob Andreas.

ICML 2023.

code
PromptBoosting: Text classification with langauge models in ten forward passes.

Bairu Hou, Joe O’Connor, Jacob Andreas, Yang Zhang and Shiyu Chang.

ICML, 2023.
What learning algorithm is in-context learning? Investigations with linear models.

Ekin Akyürek, Dale Schuurmans, Jacob Andreas*, Tengyu Ma*, Denny Zhou*.

ICLR 2023 Notable Top-5% Paper.

Press: Motherboard, MIT News. code
Characterizing intrinsic compositionality in transformers with tree projections.

Shikhar Murty, Pratyusha Sharma, Jacob Andreas and Christopher Manning. ICLR 2023.
Compositional semantic parsing with large language models.

Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, Denny Zhou. ICLR 2023.
Mastering the game of no-press Diplomacy via human-regularized reinforcement learning and planning.

Anton Bakhtin*, David J Wu*, Adam Lerer*, Jonathan Gray*, Athul Paul Jacob*, Gabriele Farina*, Alexander H Miller, Noam Brown.

ICLR 2023 Notable Top-5% Paper.

Press: NewScientist
Top-down synthesis for library learning.

Matthew Bowers, Theo X. Olausson, Lionel Wong, Gabriel Grand, Joshua B. Tenenbaum, Kevin Ellis, Armando Solar-Lezama.

POPL 2023.

code

2022

Pre-trained language models for interactive decision-making.

Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar*, Jacob Andreas*, Igor Mordatch*, Antonio Torralba*, Yuke Zhu*.

NeurIPS 2022 Oral.

project page
Human-level play in the game of Diplomacy by combining language models with strategic reasoning.

FAIR, Anton Bakhtin* , Noam Brown*, Emily Dinan*, Gabriele Farina, Colin Flaherty*, Daniel Fried, Andrew Goff, Jonathan Gray*, Hengyuan Hu*, Athul Paul Jacob*, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer*, Mike Lewis*, Alexander H. Miller*, Sasha Mitts, Adithya Renduchintala*, Stephen Roller, Dirk Rowe, Weiyan Shi*, Joe Spisak, Alexander Wei, David Wu*, Hugh Zhang*, Markus Zijlstra. Science 2022.

Press: The Economist, Forbes, The Washington Post, The New York Times, The New Yorker, VentureBeat.
Language models as agent models.

Jacob Andreas.

EMNLP Findings 2022.
Toward tracing factual knowledge in language models back to the training data.

Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu.

EMNLP Findings 2022.

code
Hierarchical phrase-based sequence-to-sequence learning.

Bailin Wang, Ivan Titov, Jacob Andreas and Yoon Kim.

EMNLP 2022.

code
Modeling strong and human-Like gameplay with KL-regularized search.

Athul Paul Jacob*, David J. Wu*, Gabriele Farina*, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown.

ICML 2022 Spotlight.
Toward understanding the communication in sperm whales.

Jacob Andreas, Gašper Beguš, Michael M. Bronstein, Roee Diamant, Denley Delaney, Shane Gero, Shafi Goldwasser, David F. Gruber, Sarah de Haas, Peter Malkin, Nikolay Pavlov, Roger Payne, Giovanni Petri, Daniela Rus, Pratyusha Sharma, Dan Tchernov, Pernille Tønnesen, Antonio Torralba, Daniel Vogt, Robert J. Wood.

iScience 2022.

Press: The New Yorker
Identifying concept libraries from language about object structure.

Lionel Wong*, William P. McCarthy*, Gabriel Grand*, Yoni Friedman, Joshua B. Tenenbaum, Jacob Andreas, Robert D. Hawkins, Judith E. Fan.

CogSci 2022.

code
Correcting robot plans with natural language feedback.

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, Dieter Fox.

RSS 2022.
Quantifying adaptability in pre-trained language models with 500 tasks.

Belinda Z. Li, Jane Yu, Madian Khabsa, Luke Zettlemoyer, Alon Halevy, Jacob Andreas.

NAACL 2022.

code
Skill induction and planning with latent language.

Pratyusha Sharma, Antonio Torralba, and Jacob Andreas.

ACL 2022.

project page
Natural language descriptions of deep visual features.

Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba and Jacob Andreas.

ICLR 2022 Oral.

Press: MIT News project page
Subspace regularizers for few-shot class incremental learning.

Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya and Jacob Andreas.

ICLR 2022.

code

2021

Teachable reinforcement learning via advice distillation.

Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas and Abhishek Gupta.

code
How do neural sequence models generalize? Local and global context cues for out-of-distribution prediction.

Anthony Bau and Jacob Andreas.

EMNLP 2021.
Toward a visual concept vocabulary for generative adversarial networks.

Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba.

ICCV 2021.

project page
The low-dimensional linear geometry of contextualized word representations.

Evan Hernandez and Jacob Andreas.

CoNLL 2021.

code
Leveraging language to learn program abstractions and search heuristics.

Lionel Wong, Kevin Ellis, Joshua B. Tenenbaum and Jacob Andreas.

ICML 2021.

code
Implicit representations of meaning in neural language models.

Belinda Z. Li, Maxwell Nye and Jacob Andreas.

ACL 2021.

Press: Scientific American code
What context features can transformer language models use?

Joe O’Connor and Jacob Andreas.

ACL 2021.
Lexicon learning for few-shot sequence modeling.

Ekin Akyürek and Jacob Andreas.

ACL 2021.

code
Multitasking inhibits semantic drift.

Athul Paul Jacob, Mike Lewis and Jacob Andreas.

NAACL 2021.
Representing partial programs with blended abstract semantics.

Maxwell Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B. Tenenbaum, Armando Solar-Lezama.

ICLR 2021.
Learning to recombine and resample data for compositional generalization.

Ekin Akyürek, Afra Feyza Akyürek and Jacob Andreas.

ICLR 2021.

code

2020

Compositional explanations of neurons.

Jesse Mu and Jacob Andreas.

NeurIPS 2020 Oral.

code
A benchmark for systematic generalization in grounded language understanding.

Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt and Brenden Lake.

NeurIPS 2020.

code
Good-enough compositional data augmentation.

Jacob Andreas.

ACL 2020.

code
Unnatural language processing: bridging the gap between synthetic and natural language data.

Alana Marzoev, Sam Madden, Frans Kaashoek, Mike Cafarella and Jacob Andreas.

NeurIPS workshop on Emergent Communication.

2019

A survey of reinforcement learning informed by natural language.

Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson and Tim Rocktäschel.

IJCAI 2019.
Measuring compositionality in representation learning.

Jacob Andreas.

ICLR 2019.

code