3rd LoG NYC Workshop

Joan Bruna

Affiliation

New York University

Talk Title

Separations in Self-Attention Layers via Compositional Reasoning

Abstract

We introduce a natural class of functions called compositional reasoning queries (CRQs), based on recursive inner-product comparisons along a tree, as a template for studying the approximation power of transformers. First, focusing on single-layer self-attention, we establish super-polynomial separations between low-rank and full-rank self-attention by exploiting the rotation symmetry of CRQs. Next, building on recent work by Merrill and Sabharwal, we show that transformers provably require depth logarithmic in the context size to compute CRQs, while providing better parallelization than alternative architectures such as RNNs and chain-of-thought (CoT). Joint work with Noah Amsel and Gilad Yehudai.
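To make the phrase "recursive inner-product comparisons along a tree" concrete, here is a minimal, hypothetical sketch in Python. It is not the paper's formal CRQ definition; the tree encoding, the `eval_node` helper, and the argmax-style selection rule are all illustrative assumptions about what such a recursive comparison could look like.

```python
# Illustrative toy only: NOT the exact CRQ construction from the talk/paper.
# Leaves select context vectors; each internal node compares its two children's
# results by inner product with its own query vector and passes the winner up.
import numpy as np

def eval_node(node, context):
    """Recursively evaluate one node of the comparison tree.

    Leaves are integer indices into `context`; internal nodes are
    (query_vector, left_subtree, right_subtree) triples.
    """
    if isinstance(node, int):          # leaf: return the selected context vector
        return context[node]
    query, left, right = node
    left_vec = eval_node(left, context)
    right_vec = eval_node(right, context)
    # Inner-product comparison: forward the higher-scoring child vector.
    return left_vec if query @ left_vec >= query @ right_vec else right_vec

# Example: a depth-2 balanced tree over a context of 4 random vectors in R^8.
rng = np.random.default_rng(0)
context = rng.standard_normal((4, 8))
tree = (rng.standard_normal(8),          # root query
        (rng.standard_normal(8), 0, 1),  # internal node over leaves 0 and 1
        (rng.standard_normal(8), 2, 3))  # internal node over leaves 2 and 3
print(eval_node(tree, context))
```

Under this toy reading, evaluating a balanced tree over n leaves requires on the order of log n levels of sequential comparison, which is plausibly the depth scale the abstract's logarithmic-depth lower bound refers to.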

Bio

Website

https://cims.nyu.edu/~bruna/