Some Erdős problems are basically trivial using sophisticated techniques that we...

CSMastermind · 2026-04-26T05:00:35 1777179635

Worth mentioning, though, that people have already tried running all of them through LLMs at this point.

So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).

Tarq0n · 2026-04-26T06:02:23 1777183343

Not definitively. LLMs are stochastic with respect to input, temperature and the exact prompt. It's possible that the model was already capable of it but never received the exact right conditions to produce this output.

teiferer · 2026-04-26T06:48:10 1777186090

Every model is able to solve each problem, given the right prompt. (Worst case, the prompt contains the solution.)

pontifier · 2026-04-26T11:39:50 1777203590

Interesting... Exhaustive brute force prompting might expose previously unknown capabilities in existing models. Seems like a whole can of worms.

Calazon · 2026-04-26T14:26:30 1777213590

Exhaustive brute force prompting is completely unfeasible. The number of potential prompts is impossibly large.
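For a sense of scale, here is a back-of-the-envelope count. It assumes, hypothetically, a vocabulary of $V = 100{,}000$ tokens and prompts of exactly $n = 100$ tokens, and counts every token sequence of that length:

$$
N = V^{n} = \left(10^{5}\right)^{100} = 10^{500}
$$

Allowing shorter prompts only increases the total.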

teiferer · 2026-04-27T05:55:28 1777269328

The "exhaustive brute forcing" approach doesn't even need an LLM in the loop. Just brute force the possible outputs instead. They will contain all the most beautiful novels you can imagine!

imiric · 2026-04-26T06:04:16 1777183456

> So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).

No, it's not.

While I don't dispute that new models may perform better at certain tasks, the fact that someone was able to use them to solve a novel problem is not proof of this.

LLM output is nondeterministic. Given the same prompt, the same LLM will generate different output, especially when the output is long, as it is in this case. One of those attempts might produce a correct output, but that is not guaranteed. It is also difficult, if not impossible, for someone who is not an expert in the domain to tell whether a given output is correct, as this thread shows.
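To make the nondeterminism point concrete, here is a minimal sketch that assumes plain temperature-scaled softmax sampling over a made-up set of next-token logits. It is not any particular model's decoder. With temperature above 0, the same logits (i.e. the same prompt) can yield different tokens on different runs. At temperature 0 (greedy decoding), the choice is fixed.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick a token index: greedy (argmax) at temperature 0, otherwise sample."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical next-token logits for a 4-token vocabulary (illustrative only).
logits = [2.0, 1.5, 0.3, -1.0]

rng = random.Random()  # unseeded, so sampled results vary between runs
sampled = [sample_token(logits, 0.8, rng) for _ in range(20)]
greedy = [sample_token(logits, 0, rng) for _ in range(20)]

print("temperature 0.8:", sampled)  # usually a mix of indices
print("temperature 0:  ", greedy)   # always index 0, the largest logit
```

A real model repeats this choice at every generated token, so small differences early on compound over a long output.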

famouswaffles · 2026-04-29T17:43:38 1777484618

This is one of a number of such results achieved only in the last few months, and only with the latest crop of models. They have undoubtedly gotten better in this domain; saying anything else is just denial. You can run these same problems on GPT-4 or GPT-5 all you want and you'll get nowhere. In fact, people did, and you're hearing about it now because this crop of models is getting meaningful results.

notahacker · 2026-04-26T12:32:04 1777206724

As others have pointed out, a key part of the prompt used here may have been "don't search the internet", since otherwise the model would most likely have defaulted to starting from the existing approaches to that problem...

_ccwi · 2026-04-26T06:08:50 1777183730

Minor aside: these models do not return the same answer every time you prompt them, which makes it harder to reason about their effectiveness.

rjh29 · 2026-04-26T06:12:38 1777183958

You don't need to say "Minor aside" either. Thankfully, language is a creative endeavour, not a scientific one.

rjh29 · 2026-04-26T12:42:25 1777207345

Context: parent originally said "you should not say 'worth mentioning', if it's worth mentioning you can just say it". That sentence has now been edited out so my comment looks weird.

_ccwi · 2026-04-26T13:47:20 1777211240

Your reply was so rude it convinced me to edit. Your second reply is a distortion of my original message too.

rjh29 · 2026-04-26T14:35:06 1777214106

Well I'm glad it had the desired effect. Your comment was ruder.

_ccwi · 2026-04-26T14:50:38 1777215038

I disagree; you have quoted me in a way that reflects neither the tone nor the content of what I wrote.

vessenes · 2026-04-26T04:17:23 1777177043

Tao mentions that the conventional approach to this problem seems to be a dead end, but it’s apparently a super ‘obvious’ first step. This seems very hopeful to me, in that we now have a new line of approach to evaluate and assess for related problems.