In early June, Apple researchers released a study suggesting that simulated reasoning (SR) models, such as OpenAI's o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, produce outputs consistent with pattern-matching from training data when faced with novel problems requiring systematic thinking. The researchers found similar results to a recent study by the United States of America Mathematical Olympiad (USAMO) in April, showing that these same models achieved low scores on novel mathematical proofs.
The new study, titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," comes from a team at Apple led by Parshin Shojaee and Iman Mirzadeh, and it includes contributions from Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar.
The researchers examined what they call "large reasoning models" (LRMs), which attempt to simulate a logical reasoning process by producing a deliberative text output sometimes called "chain-of-thought reasoning" that ostensibly assists with solving problems in a step-by-step fashion.
Fonte: Ars Technica - Leia mais