A new paper from Apple’s artificial intelligence scientists has found that engines based on large language models, such as those from Meta and OpenAI, still lack basic reasoning skills.
The group has proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various large language models (LLMs). Their initial testing reveals that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models.
The group investigated the “fragility” of mathematical reasoning by adding contextual information to their queries that a human could understand, but which should not affect the fundamental mathematics of the solution. Despite this, the models produced varying answers, even though the underlying solution was unchanged.
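As an illustration of the kind of perturbation described (this is a hedged sketch, not the paper's actual GSM-Symbolic templates), the snippet below generates surface variants of the same grade-school arithmetic question, with names, numbers, and an irrelevant clause swapped in. The template, names, and distractor sentences are all hypothetical; only the idea, that the added context should not change the expected answer, comes from the article.

```python
import random

# Hypothetical template in the spirit of the benchmark: the arithmetic is fixed,
# but names, values, and an irrelevant clause vary between variants.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)

# Distractor clauses are true but irrelevant: they should not affect the answer.
DISTRACTORS = [
    "",  # baseline: no extra context
    "Five of the apples are slightly smaller than average. ",
    "It was unusually windy on Tuesday. ",
]

def make_variants(seed: int = 0) -> list[tuple[str, int]]:
    """Return (question, expected_answer) pairs that differ only in surface form."""
    rng = random.Random(seed)
    variants = []
    for distractor in DISTRACTORS:
        name = rng.choice(["Ava", "Liam", "Noah"])
        a, b = rng.randint(2, 20), rng.randint(2, 20)
        question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
        variants.append((question, a + b))  # the answer never depends on the distractor
    return variants

if __name__ == "__main__":
    for question, answer in make_variants():
        print(f"Q: {question}\nExpected: {answer}\n")
```

A model with robust mathematical reasoning should return the same total for every variant; the paper's finding is that answers drift when such irrelevant details are introduced.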