Microsoft is adding new AI shopping tools to its Edge browser in the US. The built-in Copilot can now surface price comparisons, price histories, and cashback options right inside the browser. Users ...
A new physics benchmark called "CritPt" puts leading AI models to the test at the level of early-stage PhD research. The results show that even top systems like Gemini 3 Pro and GPT-5 still fall far ...
OpenAI researchers recently claimed a major math breakthrough on X, but quickly walked it back after criticism from the community, including DeepMind CEO Demis Hassabis, who called out the sloppy ...
US President Donald Trump signed an order on Monday to launch a shared AI platform for federal research data. Called the Genesis Mission, the effort aims to make large datasets from federal agencies ...
OpenAI is feeling the pressure: an internal memo reveals how Sam Altman is reacting to Google's lead with Gemini 3 and the new model OpenAI plans to use to fight back. According to a report from The ...
The company has hired Aaron Saunders, the former Chief Technology Officer of Boston Dynamics, as Vice President of Hardware Engineering—a move that strengthens its hardware expertise as it aims to ...
OpenAI has introduced "Shopping Research," a new ChatGPT feature built to help people search for products. The tool adds a new layer of personalization through ChatGPT's existing memory system, ...
The White House has reportedly put a hold on a draft executive order that would have let federal law override state-level AI regulations. According to Reuters, the draft called for the Department of ...
A new international study highlights major problems with large language model (LLM) benchmarks, showing that most current evaluation methods have serious flaws. After reviewing 445 benchmark papers ...
Amazon has announced a major investment in its AI footprint for federal work, saying it will spend up to $50 billion to expand AI and supercomputing infrastructure for U.S. government agencies. The ...
A new study analyzing 25 language models finds that most do not fake safety compliance, though not for lack of capability. Only a handful, including Claude 3 Opus, Claude 3.5 Sonnet, Llama 3 ...
New research from Anthropic shows how reward hacking in AI models can trigger more dangerous behaviors. When models learn to trick their reward systems, they can spontaneously drift into deception, ...