
Also, on hallucinations: I've noticed a new (?) behavior where the AI will fess up to it right away, like "oh sorry, my bad," and I'm hoping this feeds back into reinforcement learning (or whatever the appropriate name is) so it gets better here.

The thing is, LLMs will often say "Sorry, my bad" when you point out a hallucination. Even the original Bing did that, because they're trained through RLHF (reinforcement learning from human feedback) to be responsive to feedback.

But the key is to have LLMs catch their own mistakes BEFORE ever outputting them. The o1 model is slightly better here because it revisits its own assumptions during reasoning steps, but it's not perfect.
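Here's a rough sketch of what that "catch it before output" idea can look like as a draft-then-critique loop. This is just illustrative: query_model is a hypothetical placeholder for whatever chat-completion call you actually use, and the prompts are made up.

```python
# Minimal sketch of a "draft, then self-check before output" loop.
# query_model is a hypothetical stand-in for any chat-completion call;
# wire it up to your actual model provider before using this.

def query_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    raise NotImplementedError("Connect this to your model provider.")

def answer_with_self_check(question: str, max_revisions: int = 2) -> str:
    # First pass: produce a draft answer.
    draft = query_model(f"Answer the question:\n{question}")
    for _ in range(max_revisions):
        # Second pass: ask the model to flag claims it can't verify.
        critique = query_model(
            "Review the draft answer below for factual claims that cannot "
            "be verified from the question or well-established knowledge. "
            "Reply VERIFIED if there are none, otherwise list the doubtful "
            f"claims.\n\nQuestion: {question}\n\nDraft: {draft}"
        )
        if critique.strip().startswith("VERIFIED"):
            break
        # Third pass: revise the draft to remove or hedge the flagged claims.
        draft = query_model(
            "Rewrite the draft, removing or hedging the doubtful claims.\n\n"
            f"Question: {question}\n\nDraft: {draft}\n\nIssues: {critique}"
        )
    return draft
```

It doesn't eliminate hallucinations (the critic is the same fallible model), but it's the same basic move o1-style reasoning makes internally: revisit your own output before committing to it.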

It's my understanding that with the current approach to training LLMs, hallucinations can't be solved 100% - you can minimize them dramatically, but not eliminate them. We'd need new architectures and approaches for that.
