Ask ChatGPT to estimate the carbs in your lunch. Now ask it again. And again. Five hundred times. You’d expect the same answer each time. It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a
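A minimal sketch of how that spread could be quantified, assuming the repeated answers have already been collected as numbers (the API call itself is omitted, and the figures below are made up for illustration):

```python
import statistics

def spread(estimates):
    """Summarize how much repeated carb estimates disagree.

    `estimates` is a list of numeric answers (grams of carbs) gathered
    by asking the model the same question many times.
    """
    mean = statistics.mean(estimates)
    stdev = statistics.stdev(estimates)
    return {
        "mean": mean,
        "stdev": stdev,
        # Worst-case disagreement between any two runs, in grams.
        "range": max(estimates) - min(estimates),
        # Coefficient of variation: relative spread, unitless.
        "cv": stdev / mean,
    }

# Hypothetical numbers, not real model output:
print(spread([45, 60, 52, 80, 38]))
```

If the coefficient of variation is large, the estimates are not just noisy around a stable answer; any single reply is close to a coin flip.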
They beat any human on that knowledge benchmark, which is completely unrelated to your 40% “test”. Try answering any of the example questions on the main page.
I don’t need a metaphor; I know LLMs hallucinate, lie, and bullshit. That doesn’t invalidate my point.