News

Amid the rapid development of artificial intelligence, especially large language models (LLMs), the field has gradually shifted from monolithic models to multi-agent systems.
The cost of false failures caused by test sockets is rising sharply because system-level testing takes much longer to run.
On benchmark evaluations, K2 Think leads all other open-source models in competitive math performance. It scored 90.8 on AIME 2024, 81.2 on AIME 2025, and 73.8 on HMMT 2025, according to benchmarks ...