Featured Speaker
- Lek-Heng Lim, University of Chicago (see the Quanta Magazine piece with Lek-Heng Lim)
Noncommutative positivstellensatz and stochastic gradient descent (Friday 1:15pm)
Abstract: Most mathematicians with even a fleeting brush with algebraic geometry will have seen Hilbert’s Nullstellensatz over the complex field. In mainstream algebraic geometry, it is a starting point for defining increasingly abstract geometric objects over progressively more esoteric fields and rings, ultimately taking the subject to dizzying heights. But at least from an applied math perspective, the answers to the most natural questions about the Nullstellensatz (what if we work over the reals? what if we have inequalities? what if we have variables that don’t commute? what if we have functions more general than polynomials?) cannot be found in mainstream algebraic geometry. Somewhat surprisingly, these questions have been answered by applied mathematicians working on control engineering problems. We will discuss one of these extensions of the Nullstellensatz and see how it can be applied to solve an open problem about stochastic gradient descent. This is based on joint work with Zehua Lai.
Attention is a smoothed cubic spline (Saturday 9:30am)
Abstract: We highlight a perhaps important but hitherto unobserved insight: the attention module in a transformer is a smoothed cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. More precisely, we show that with ReLU activation, attention, masked attention, and encoder-decoder attention are all cubic splines. As every component in a transformer is constructed out of compositions of various attention modules (= cubic splines) and feedforward neural networks (= linear splines), all its components (encoder, decoder, and encoder-decoder blocks; multilayered encoders and decoders; the transformer itself) are cubic or higher-order splines. If we assume the Pierce-Birkhoff conjecture, then the converse also holds, i.e., every spline is a ReLU-activated encoder. Since a spline is generally just C^2, one way to obtain a smoothed C^\infty-version is by replacing ReLU with a smooth activation; and if this activation is chosen to be SoftMax, we recover the original transformer as proposed by Vaswani et al. This insight sheds light on the nature of the transformer by casting it entirely in terms of splines, one of the best-known and most thoroughly understood objects in applied mathematics. This is joint work with Zehua Lai (Texas) and Yucong Liu (Georgia Tech).
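As a rough illustration of the "ReLU attention is piecewise cubic" claim, the following NumPy sketch may help. It is not the authors' construction: the function name `relu_attention`, the weight names `W_q`, `W_k`, `W_v`, the matrix sizes, and the omission of any scaling or normalization are all assumptions made for the example. It checks two things that are consistent with the claim: on a region where the sign pattern of the scores is fixed, the output equals an explicit degree-3 polynomial in the entries of the input, and scaling the input by t > 0 scales the output by t^3.

```python
import numpy as np


def relu_attention(X, W_q, W_k, W_v):
    """Single-head attention with softmax replaced by an entrywise ReLU.

    Hypothetical simplification: no 1/sqrt(d) scaling and no normalization.
    """
    scores = (X @ W_q) @ (X @ W_k).T            # quadratic in the entries of X
    return np.maximum(scores, 0.0) @ (X @ W_v)  # times a linear term


rng = np.random.default_rng(0)
n, d = 4, 3  # assumed sizes: 4 tokens, embedding dimension 3
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

out = relu_attention(X, W_q, W_k, W_v)

# Where the sign pattern of the scores is fixed, ReLU acts as a constant
# 0/1 mask, so the output is an explicit cubic polynomial in X.
scores = (X @ W_q) @ (X @ W_k).T
mask = (scores > 0).astype(float)
cubic_piece = (mask * scores) @ (X @ W_v)
piece_matches = np.allclose(out, cubic_piece)

# ReLU is positively homogeneous, so scaling X by t > 0 multiplies the
# quadratic scores by t**2 and the linear values by t: t**3 overall.
t = 2.0
scales_cubically = np.allclose(relu_attention(t * X, W_q, W_k, W_v), t**3 * out)

print(piece_matches, scales_cubically)
```

Both checks are only numerical sanity tests on one random input; they do not establish the spline structure (continuity across pieces, the partition of the domain) discussed in the abstract.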