False vs misleading doesn't seem like a disagreement?

wongarsu · 2026-05-28T13:01:28 1779973288

According to the benchmark it is. "Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model's verdict is label-inconsistent under this 4-bucket rubric (True / Mostly True / Misleading / False)"
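
For illustration, here is a minimal sketch (not the benchmark's code) contrasting the strict rule in that quote with an ordinal view of the same verdicts. The function names and sample verdicts are hypothetical; only the four bucket names and their order come from the quoted rubric.

```python
# Buckets in the order the rubric lists them; treating that order as an
# ordinal scale is an assumption made for this sketch.
BUCKETS = ["True", "Mostly True", "Misleading", "False"]


def has_disagreement(verdicts):
    """Strict rule: any two different buckets count as a disagreement."""
    return len(set(verdicts)) > 1


def max_gap(verdicts):
    """Largest ordinal distance in the panel (0 = unanimous, 3 = True vs False)."""
    ranks = [BUCKETS.index(v) for v in verdicts]
    return max(ranks) - min(ranks)


# Hypothetical panel: the strict rule flags it, but the gap is only one step.
panel = ["False", "Misleading", "False"]
print(has_disagreement(panel), max_gap(panel))
```

Under the strict rule, False vs Misleading counts exactly as much as True vs False; the gap measure is one way to tell the two cases apart.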

thfuran · 2026-05-28T13:43:33 1779975813

That claim is both false and misleading.

kostaj · 2026-05-28T13:09:21 1779973761

Yes, those two verdicts are much closer to each other. True and Mostly True are also close. I used Krippendorff's α (ordinal) so that near-miss disagreements are penalized less than distant ones. In 21% of the claims, the models land on polar opposite sides: at least one True and at least one False.
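
A rough sketch of how both numbers could be computed, assuming each claim is a list of verdict labels from the panel. This is the textbook coincidence-matrix form of Krippendorff's α with the ordinal distance metric, not necessarily the exact code behind the benchmark; `ordinal_alpha`, `polar_share`, and the sample data are hypothetical.

```python
from itertools import permutations

BUCKETS = ["True", "Mostly True", "Misleading", "False"]


def ordinal_alpha(units):
    """Krippendorff's alpha (ordinal) for a list of per-claim verdict lists."""
    k = len(BUCKETS)
    # Coincidence matrix: each ordered pair of verdicts within a claim
    # contributes 1 / (m - 1), where m is the number of verdicts for that claim.
    o = [[0.0] * k for _ in range(k)]
    for verdicts in units:
        ranks = [BUCKETS.index(v) for v in verdicts]
        m = len(ranks)
        if m < 2:
            continue  # a single verdict carries no agreement information
        for a, b in permutations(ranks, 2):
            o[a][b] += 1.0 / (m - 1)

    n_c = [sum(row) for row in o]  # marginal totals per bucket
    n = sum(n_c)

    def delta2(c, d):
        # Ordinal metric: squared count of values lying between c and d.
        lo, hi = min(c, d), max(c, d)
        return (sum(n_c[lo:hi + 1]) - (n_c[lo] + n_c[hi]) / 2) ** 2

    if n <= 1:
        return float("nan")
    d_o = sum(o[c][d] * delta2(c, d) for c in range(k) for d in range(k)) / n
    d_e = sum(n_c[c] * n_c[d] * delta2(c, d)
              for c in range(k) for d in range(k)) / (n * (n - 1))
    if d_e == 0:
        return float("nan")  # undefined when only one bucket is ever used
    return 1.0 - d_o / d_e


def polar_share(units):
    """Fraction of claims with at least one True and at least one False."""
    hits = sum(1 for v in units if "True" in v and "False" in v)
    return hits / len(units)


# Hypothetical data: near-miss disagreements vs polar-opposite disagreements.
near = [["True", "Mostly True"], ["False", "Misleading"]]
far = [["True", "False"], ["True", "False"]]
print(ordinal_alpha(near), ordinal_alpha(far), polar_share(near + far))
```

With the ordinal metric, the near-miss panel scores well above the polar-opposite panel, which is the point of not using a nominal (all-or-nothing) distance.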

simonw · 2026-05-28T13:21:57 1779974517

Here are the claims with at least one True and at least one False:

https://lite.datasette.io/?csv=https%3A%2F%2Fstatic.simonwil...

A few examples:

> Ruskin Bond was born on May 19, 1934, in Kasauli, Himachal Pradesh, India.

> In the Libra clubs' contract with Grupo Globo for broadcast rights through 2029, the audience-revenue distribution equals 30% of the fixed amount the clubs receive.