Applied Math Seminar: Galerkin Transformer

Date and Time: Friday, January 28th, 3:15-4:15 pm

Zoom link: https://gwu-edu.zoom.us/s/98013143310

Passcode: 349324

Speaker: Shuhao Cao, Washington University in St. Louis

Title: Galerkin Transformer

Abstract: The Transformer in "Attention Is All You Need" is now the ubiquitous architecture in state-of-the-art models across Natural Language Processing (NLP) and Computer Vision (CV). At its heart and soul is the "attention mechanism". We study how to apply the attention mechanism, for the first time, to a data-driven operator learning problem related to partial differential equations. Inspired by the Fourier Neural Operator, we put together an effort to explain the heuristics of the attention mechanism and to improve its efficacy. It is demonstrated that the widely accepted, supposedly "indispensable" softmax normalization in scaled dot-product attention is sufficient but not necessary. Without the softmax normalization, the representation capability of a linearized Transformer variant can be proven to be on par, layer-wise, with a Petrov-Galerkin projection. Some simple changes mimicking projections in Hilbert spaces are applied to the attention mechanism, helping the final model achieve remarkable accuracy in end-to-end operator learning tasks with unnormalized data and surpassing the evaluation accuracy of the classical Transformer applied directly by a factor of 100. Meanwhile, in many other experiments, the newly proposed simple attention-based operator learner, the Galerkin Transformer, shows significant improvements in both speed and evaluation accuracy over its softmax-normalized counterparts, as well as over other concurrently proposed linearizing variants.
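To make the softmax-free idea concrete, below is a minimal, single-head sketch of the "Galerkin-type" attention described in the abstract: softmax is dropped, the keys and values are layer-normalized instead, and the product (K^T V) is computed first, giving cost linear in the sequence length. The class name, shapes, and hyperparameters are illustrative assumptions, not the speaker's reference implementation.

import torch
import torch.nn as nn

class GalerkinAttention(nn.Module):
    """Single-head, softmax-free attention: z = Q (LN(K)^T LN(V)) / n."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Layer normalization on keys and values replaces softmax normalization.
        self.k_norm = nn.LayerNorm(d_model)
        self.v_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model), where n is the number of grid points / tokens.
        n = x.size(1)
        q = self.q_proj(x)
        k = self.k_norm(self.k_proj(x))
        v = self.v_norm(self.v_proj(x))
        # (K^T V) is d_model-by-d_model: cost O(n * d^2) instead of O(n^2 * d).
        kv = torch.einsum("bnd,bne->bde", k, v) / n
        return torch.einsum("bnd,bde->bne", q, kv)

if __name__ == "__main__":
    layer = GalerkinAttention(d_model=64)
    u = torch.randn(2, 1024, 64)   # e.g. a function sampled at 1024 grid points
    print(layer(u).shape)          # torch.Size([2, 1024, 64])

Reading the columns of LN(K)^T LN(V) as learned basis functions, each output row is a set of projection coefficients of the query onto that basis, which is the sense in which the layer mimics a Petrov-Galerkin projection.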

