Summarized by Dodly:

New AI Model Claims 12M Token Context, But Skepticism Abounds

Summary

A new AI model, SubQ from Subquadratic, claims an unprecedented 12 million token context window and 52 times greater efficiency than existing models, with no loss in quality. If true, this would be groundbreaking for both cloud and local AI applications, potentially allowing models to process vast amounts of data such as entire codebases or months of documents. The core innovation appears to be a sparse attention architecture, a concept that has historically been difficult to implement effectively. However, the announcement has been met with significant skepticism because it offers little technical detail and no formal technical report. Benchmarks are provided, including competitive scores on SWE-Bench Verified and on 1 million token retrieval tasks, but it is stated inconsistently which model version and context length were tested. Crucially, no benchmarks are given for the claimed 12 million token capability, and comparisons are often made against models using a million tokens. The speaker has applied for early access to test these claims, especially the 12 million token version, since widely available, efficient, large-context local models could significantly benefit users.
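To give a rough sense of why sparse attention matters at this scale, the sketch below counts how many query-key pairs dense attention would score compared with a scheme where each token attends to a fixed number of keys. This is only back-of-the-envelope arithmetic, not SubQ's actual method, which has not been published. The function names and the budget of 4,096 keys per query are assumptions chosen purely for illustration.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense attention: every token attends to every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored when each query attends to at most a fixed number of keys."""
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    KEYS_PER_QUERY = 4096  # hypothetical budget, not a figure from Subquadratic
    for n in (1_000_000, 12_000_000):
        d = dense_pairs(n)
        s = sparse_pairs(n, KEYS_PER_QUERY)
        print(f"{n:>12,} tokens: dense={d:.3e}  sparse={s:.3e}  ratio={d / s:,.0f}x")
```

Under these assumptions, going from 1 million to 12 million tokens multiplies the dense pair count by 144 but the sparse pair count by only 12. The count says nothing about whether quality holds at that length, which is why the missing 12 million token benchmarks matter.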
