
TTT Models Poised to Revolutionize Generative AI Landscape

By Yasmeeta Oon

Jul 21, 2024


After years of dominance by transformers, the AI community is actively searching for new architectures to overcome their limitations.

Transformers are central to models like OpenAI’s Sora and GPT-4, Anthropic’s Claude, and Google’s Gemini. However, they face technical challenges, especially around computation: the attention mechanism at their core grows more expensive as inputs get longer, so transformers are not efficient at processing large amounts of data on standard hardware, and power demand keeps rising as companies scale their infrastructure.

Recently, researchers from Stanford, UC San Diego, UC Berkeley, and Meta proposed a new architecture called test-time training (TTT). Developed over a year and a half, TTT models promise to process more data than transformers while using much less compute power.

Feature                    | Transformers                      | TTT Models
Data processing efficiency | Inefficient on standard hardware  | Highly efficient
Power consumption          | High                              | Low
Hidden state               | Long, growing list of data        | Replaced by a small machine learning model
Scalability                | Limited by computational demands  | Scales without growing its internal size

A fundamental part of transformers is the “hidden state,” a long list of data that the model updates as it processes information. The hidden state works like a lookup table: it keeps growing with the input, and the model has to consult it again and again, which becomes computationally demanding. In contrast, TTT models replace the lookup table with an internal machine learning model that encodes the data it processes into representative variables called weights. Because new information is folded into those weights rather than appended to a list, the internal model stays the same size no matter how much data it processes.
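To make that contrast concrete, here is a minimal, illustrative Python sketch, not the researchers’ actual implementation: it assumes a toy linear inner model updated with a simple reconstruction loss, and the names (transformer_step, ttt_step, kv_cache) are made up for the example. The point is only that the transformer-style cache grows with every token, while the TTT-style state remains a fixed-size weight matrix.

import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy embedding size

# Transformer-style "hidden state": a cache that grows with every token.
kv_cache = []

def transformer_step(token_vec):
    # Each processed token is appended, so memory (and attention cost)
    # grows with the length of the sequence.
    kv_cache.append(token_vec)

# TTT-style "hidden state": the weights of a tiny inner model (assumed
# linear here purely for illustration).
W = np.zeros((d, d))   # fixed size, no matter how many tokens arrive
lr = 0.01

def ttt_step(token_vec):
    global W
    # The inner model tries to reconstruct the token; one gradient step
    # on that self-supervised error is the state update.
    err = W @ token_vec - token_vec
    W -= lr * np.outer(err, token_vec)

for _ in range(1000):             # stream 1,000 toy "tokens"
    tok = rng.normal(size=d)
    transformer_step(tok)
    ttt_step(tok)

print(len(kv_cache), "cached entries vs. a fixed", W.shape, "weight matrix")

Running the sketch prints 1,000 cached entries against a constant 16-by-16 weight matrix, which is the scaling property the table above describes.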

According to Yu Sun, a post-doc at Stanford and co-contributor to the TTT research, future TTT models could process billions of pieces of data, from words to images to videos, far beyond the capabilities of current models.

“Our system can say X words about a book without the computational complexity of rereading the book X times,” Sun explained. For example, video models like Sora can only process 10 seconds of video because of the limitations of their lookup-table “brains.” The eventual goal is a system that can process a long video, approaching the visual experience of a human lifetime.

While TTT models show promise, they are not yet a drop-in replacement for transformers. The researchers have only developed two small models, making it difficult to compare TTT to larger transformer implementations.

Mike Cook, a senior lecturer at King’s College London’s department of informatics, noted the innovation but remained cautious. He pointed out that adding another neural network layer is a familiar approach in computer science but doesn’t necessarily guarantee better performance.

The quest for transformer alternatives is gaining momentum. AI startup Mistral recently released Codestral Mamba, a model built on state space models (SSMs), which also promise computational efficiency and scalability. AI21 Labs is exploring SSMs as well, as is Cartesia, which pioneered some of the earliest SSM-based models.

If successful, these efforts could make generative AI more accessible and widespread, impacting various sectors and everyday life.


Featured Image courtesy of DALL-E by ChatGPT

Yasmeeta Oon

Just a girl trying to break into the world of journalism, constantly on the hunt for the next big story to share.
