> One of the most prominent improvements in Opus 4.8 is its honesty. We train al...

majormajor · 2026-05-28T17:03:19 1779987799

"Honesty" seems like unnecessary (and annoying) anthropomorphism there. I don't think there's any intent of fraud or deception in outputs from these things, just overreaching of prediction. Based on the latter part of the paragraph, I wish they'd just say something like "less likely to skip steps or overemphasize thin evidence" in the first place.

Don't play to the sci-fi "this thing's trying to outsmart me" tropes.

Kiro · 2026-05-28T17:08:32 1779988112

Using words people understand is more important than this strange fixation on not anthropomorphizing things.

wasabi991011 · 2026-05-28T17:10:32 1779988232

I think "honesty" is not a particularly good descriptor, independent of anthropomorphism. Previous commenters suggestion was much more understandable to me.

dugidugout · 2026-05-28T17:27:31 1779989251

Being that can be understood is language. The previous commenter is making an particular argument for how we can improve this understanding. They didn't suggest we should use less familiar words, but different familiar words. Why is this strange?

giraffe_lady · 2026-05-28T17:12:33 1779988353

Anthropomorphizing is a shorthand for a powerful and poorly defined set of metaphors. There are tradeoffs going both ways but trying to dismiss it as merely "strange fixation" shows your own weakness.

tadfisher · 2026-05-28T17:17:31 1779988651

To be clear, this is about anthropomorphizing large language models, not the general category of "things". Also, we should be evaluating these constructs using well-defined and measurable criteria; evaluating "honesty" fails to achieve both goals.

derac · 2026-05-28T17:31:34 1779989494

I think Honesty can be evaluated. Does the model push back when it knows the user is wrong? How often does the model hallucinate data vs. say it doesn't know? Provide a prompt with contradictions or other issues and see if the model corrects you.

Here is an article by Anthropic that explains what they do and mean in more detail: https://alignment.anthropic.com/2025/honesty-elicitation/

swader999 · 2026-05-28T17:14:10 1779988450

Just swap 'Honesty' with 'correctness in its claims' and you'll get what you need out of this aspect of the model description.

stratos123 · 2026-05-28T21:14:53 1780002893

Honesty and correctness are not the same thing, even when talking about LLMs. Sometimes an LLM says a false thing and you don't know whether it's being dishonest or merely incorrect. Sometimes, however, you can see in the CoT that the model does know the true fact and is reasoning about how to deceive the user. That's lying, not just being incorrect.

swader999 · 2026-05-28T23:43:29 1780011809

Fair points. I notice it's not hiding as much from me as earlier versions. It's telling me exactly where it has gaps, where someone might be critical of what it did. Then it's easy for me to adjust. Before it used to lie or just not tell me. Feels like it is acting more like a senior that has enough game and credibility to just tell it like it is. It's noticable in only a few long prompts so far.

adamtaylor_13 · 2026-05-28T17:26:24 1779989184

People get so wrapped around the axle with "anthropomorphizing". For regular folks with no technical background, sure maybe a bit of caveat sprinkled here or there is useful to help them understand what is or isn't true, but on HN it would seem to me that the bar is high enough that we can just use shared language to generally talk about capabilities.

When they say "Honesty" I don't think to myself, "Goodness, does this model have moral understanding?" No, I understand they mean it's less likely to directly bullshit me, which models frequently do.

I don't feel like this level of pedantry around language is useful for people who more or less know what's going on with LLMs. (Again, I concede that perhaps with a less technical audience, there's more need for it.)

krupan · 2026-05-28T20:53:13 1780001593

I agree. In connection with LLMs we also shouldn't use the words intelligent, smart, reasoning, thinking, chat, conversation, etc.

ealready_value · 2026-05-28T17:12:44 1779988364

Opus 4.7 was already trying hard to appear honest. Most conversations I have with it about advice or focusing an opinion often include "my honest take" or "my honest opinion".

The problem is that once I asked it "I'm thinking about A or B" twice, once with "I like A more but suspect B would be best" and a second time with them reversed. Not surprisingly, both times it chose the one I said I suspected was best as it's honest opinion.

MaxikCZ · 2026-05-28T19:55:17 1779998117

I wish I knew how to make it regressively verify its assumptions, like a kind of hook but firing before a sentence is written, or perhaps after and then corrected. I feel like it assuming things clearly wrong is its biggest weakness.

benzible · 2026-05-28T17:15:35 1779988535

In the context of Claude Code, "honest" usually means that the agent took a shortcut, skipped requirements, etc. It's the model giving itself credit for admitting to failing rather than actually doing what was requested.

HAL3000 · 2026-05-28T17:21:55 1779988915

Yeah, it's super annoying. A few days ago, Opus 4.7 created a plan with several items on it, including an auth feature. It then went through the plan and reported that it had created the auth feature, that everything was secure, and that the tests passed.

The issue was that it hadn't actually implemented the auth feature. After I confronted it about this, it admitted that it indeed hadn't done it and said it would implement it now.

If we had just trusted its output, we would now have a security vulnerability in production, allowing anyone to access other people's accounts.

FireBeyond · 2026-05-28T19:06:01 1779995161

I had a lower acuity incident exactly the same.

Had it implement a feature, "commit and merge to develop".

"Built, tested, committed, merged to develop. Up to you to continue testing and merge to main when ready."

Great. Poke at the web app. No feature.

"Where is feature, I can't see it on develop". "Well, that's because it's not on develop, but on feature-branch, so you wouldn't see it."

"I'm confused. I asked you to commit it and merge to develop."

"You're right, you asked me to and I said I would do it and I told you I did it but I did not actually do it. Want me to do it now, then?"

Claude is in sulky-teenager phase.

gwd · 2026-05-28T18:22:16 1779992536

> If we had just trusted its output, we would now have a security vulnerability in production, allowing anyone to access other people's accounts.

This is one reason you always get a different model to review a model's PR. Gemini Or GPT-codex would have certainly noticed the missing auth.

Schiendelman · 2026-05-28T17:35:30 1779989730

How do you test other features?

legitster · 2026-05-28T17:17:03 1779988623

Part of the problem is also garbage-in/garbage-out. There's a lot of human information on the internet that is also confidently wrong.

I use Sonnet a lot for learning about history or contextualizing news topics. It's really good at this for the most part. But there are a lot of topics where "consensus" between either academics or journalists is really "one secondary source which gets repeated a lot".

mitjam · 2026-05-28T18:13:21 1779992001

A failure mode I see more, recently is that it gives superficially correct answers but after digging deeper, I get answers that contradict the superficial answers - really an important thing to be aware of, in my point of view, and it often leaves me wondering if I dug deep enough.

soperj · 2026-05-28T17:02:34 1779987754

My guess is that Claude Opus 4.8 wrote that and is lying to you.

malfist · 2026-05-28T17:00:20 1779987620

And yet, every release has claimed lower hallucination rates. But they persist.

kentm · 2026-05-28T17:00:54 1779987654

Do they persist at the same rates? Lower doesn't mean eliminated, so both of these can be true.

simianwords · 2026-05-28T17:10:06 1779988206

False. Hallucination has meaningfully reduced.

Barbing · 2026-05-28T17:13:01 1779988381

Is Gemini still the biggest confabulator of the big three?