By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
First reported by TechCrunch, OpenAI's system card detailed the results of the PersonQA evaluation, which is designed to test for hallucinations. On that evaluation, o3's hallucination rate was 33 percent, and o4-mini's was 48 percent, or almost half the time. By comparison, o1's hallucination rate was 16 percent, meaning o3 hallucinated about twice as often.
The system card noted that o3 "tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims." But OpenAI doesn't know the underlying cause, saying only, "More research is needed to understand the cause of this result."
OpenAI's reasoning models are billed as more accurate than its non-reasoning models like GPT-4o and GPT-4.5 because they use more computation to "spend more time thinking before they respond," as described in the o1 announcement. Rather than largely relying on stochastic methods to provide an answer, the o-series models are trained to "refine their thinking process, try different strategies, and recognize their mistakes."
However, the system card for GPT-4.5, which was released in February, shows a 19 percent hallucination rate on the PersonQA evaluation. The same card also compares it to GPT-4o, which had a 30 percent hallucination rate.
In a statement to Mashable, an OpenAI spokesperson said, “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability.”
Evaluation benchmarks are tricky. They can be subjective, especially if developed in-house, and research has found flaws in their datasets and even in how they evaluate models.
Plus, different benchmarks use different methods to test accuracy and hallucinations. HuggingFace's hallucination benchmark, which evaluates models on the "occurrence of hallucinations in generated summaries" of around 1,000 public documents, found much lower hallucination rates across the board for major models on the market than OpenAI's evaluations did. GPT-4o scored 1.5 percent, GPT-4.5 preview scored 1.2 percent, and o3-mini-high with reasoning scored 0.8 percent. It's worth noting that o3 and o4-mini weren't included in the current leaderboard.
That's all to say: even industry-standard benchmarks make it difficult to assess hallucination rates.
Then there's the added complexity that models tend to be more accurate when they tap into web search to source their answers. But to power ChatGPT search, OpenAI shares data with third-party search providers, and enterprise customers using OpenAI models internally might not be willing to expose their prompts to that.
Regardless, if OpenAI is saying its brand-new o3 and o4-mini models hallucinate more often than its non-reasoning models, that could be a problem for its users.
UPDATE: Apr. 21, 2025, 1:16 p.m. EDT This story has been updated with a statement from OpenAI.