After years of dominance by transformers, the AI community is actively searching for new architectures to overcome their limitations.
Transformers are central to models like OpenAI’s Sora, Anthropic’s Claude, Google’s Gemini, and GPT-4. But they face a basic computational challenge: self-attention compares every token with every other token, so compute and memory grow rapidly with the length of the input. Processing long sequences on standard hardware is therefore costly, and power demand climbs as companies scale their infrastructure.
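As a rough illustration of that scaling (a simplified sketch, not tied to any particular model), the number of pairwise attention scores a single self-attention head computes grows with the square of the sequence length:

```python
# Back-of-the-envelope illustration of quadratic attention cost.
# Each token attends to every other token, so one head over one
# sequence produces seq_len * seq_len attention scores.
def attention_matrix_entries(seq_len: int) -> int:
    """Number of pairwise attention scores for one head over one sequence."""
    return seq_len * seq_len

for n in (1_000, 10_000, 100_000):
    print(f"{n:,} tokens -> {attention_matrix_entries(n):,} attention scores")
# 1,000 tokens   ->          1,000,000
# 10,000 tokens  ->        100,000,000
# 100,000 tokens ->     10,000,000,000
```

A 100x longer input means roughly 10,000x more attention scores, which is why long-context processing drives up hardware and power costs.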
Recently, researchers from Stanford, UC San Diego, UC Berkeley, and Meta proposed a new architecture called test-time training (TTT). Developed over a year and a half, TTT models promise to process more data than transformers while using much less compute power.
| Feature | Transformers | TTT Models |
|---|---|---|
| Data processing efficiency | Inefficient on standard hardware for long inputs | Highly efficient |
| Power consumption | High | Low |
| Hidden state | Growing list of data (lookup table) | Replaced by an internal machine learning model |
| Scalability | Limited by computational demands | Scales without growing its internal state |
A fundamental part of transformers is the “hidden state,” a long list of data that grows as the model processes more input. Because it works like an ever-expanding lookup table, consulting it becomes increasingly demanding as sequences get longer. TTT models replace this hidden state with a small internal machine learning model that compresses incoming data into representative variables called weights, so the internal state stays the same size no matter how much data the model processes.
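The contrast can be sketched in a few lines of code. This is only a conceptual illustration under simplifying assumptions: the class names are hypothetical, and the single gradient step below is a toy stand-in for the self-supervised update the researchers describe, not their actual method.

```python
import numpy as np


class GrowingCache:
    """Transformer-style memory: keep every token around, like a lookup table."""
    def __init__(self):
        self.entries = []          # grows linearly with the input

    def update(self, token_vec):
        self.entries.append(token_vec)

    def size(self):
        return len(self.entries)   # the cost of consulting it grows with this


class TTTLinearState:
    """TTT-style memory: a small model whose weights absorb the data stream."""
    def __init__(self, dim, lr=0.01):
        self.W = np.zeros((dim, dim))   # fixed-size state, however long the input
        self.lr = lr

    def update(self, token_vec):
        # One toy self-supervised gradient step: nudge W toward reproducing
        # the token (a simplified stand-in for the paper's update rule).
        pred = self.W @ token_vec
        grad = np.outer(pred - token_vec, token_vec)
        self.W -= self.lr * grad

    def size(self):
        return self.W.size          # constant


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cache, ttt = GrowingCache(), TTTLinearState(dim=16)
    for _ in range(10_000):        # stream 10,000 "tokens"
        x = rng.normal(size=16)
        cache.update(x)
        ttt.update(x)
    print("cache entries:", cache.size())   # 10000, and still growing
    print("TTT state size:", ttt.size())    # 256, fixed
```

The point of the sketch is the last two lines: the transformer-style cache keeps growing with the input, while the TTT-style state stays a constant size because new information is folded into its weights.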
According to Yu Sun, a post-doc at Stanford and co-contributor to the TTT research, future TTT models could process billions of pieces of data, from words to images to videos, far beyond the capabilities of current models.
“Our system can say X words about a book without the computational complexity of rereading the book X times,” Sun explained. He pointed to video models like Sora, which can process only about 10 seconds of video because of the limits of their lookup-table “brains.” The goal is a system that can process long video, approaching the visual experience of a human lifetime.
While TTT models show promise, they are not yet a drop-in replacement for transformers. The researchers have only developed two small models, making it difficult to compare TTT to larger transformer implementations.
Mike Cook, a senior lecturer at King’s College London’s department of informatics, noted the innovation but remained cautious. He pointed out that adding another neural network layer is a familiar approach in computer science but doesn’t necessarily guarantee better performance.
The quest for transformer alternatives is gaining momentum. AI startup Mistral released a model called Codestral Mamba, based on state space models (SSMs), which also promise computational efficiency and scalability. AI21 Labs and Cartesia are also exploring SSMs, with Cartesia pioneering some of the first models.
If successful, these efforts could make generative AI more accessible and widespread, impacting various sectors and everyday life.