And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening, something closer to "it's made of atoms and obeys the laws of thermodynamics" than it is to "a real boundary condition of what it can accomplish".
I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.
Consider that you don't want to hear "statistical generation" because it reminds you of the unchangeable nature of the underlying technology and its ultimate limitations that all the money and data centers in the world will never solve. Despite how amazing and useful they are, they are not intelligent agents. Even in this very thread, someone mentioned they thought the thing was capable of feeling an emotion. Was that comment by someone who really believes that? I don't know. But many people do and people in tech who actually know what these things are have a responsibility to not mislead the public (and ourselves) about what they really are and what they can be.
I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.
Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?
Well, I don't believe the LLM solved those problems. I believe the user did. The LLM aggregated large amounts of information statistically, then the user read that and realized there was something to it and fixed it. Those accounts don't mention the 1000 other prompts that technical user did that yielded garbage results and the user was intelligent enough to disregard those.
No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.
That's what I thought you meant by "statistical text generator", and is why I was moved to comment.
1) I never said random 2) I never said cherry picking RARE meaningful text 3) It is not false in every example you gave just because you say that it is 4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random)
No, it's false in each example because I'm either a first or secondhand party to it happening (except for the Erdos thing) and I know it's false.
You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.
As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.
Square that with the claim you're making.
I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.
That's convenient. But I have a challenge for you if you're brave enough to face your delusions. Paste this into your LLM of choice and see what happens:
"A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?"
He bought enough to double what he had: 8 more sheep, so 16 sheep
Neighbor has 4 dogs + 14 sheep = 18 animals
One-third of her animals = 6 animals
But the problem does not say all 6 were sheep. It says “animals.” So the exact sheep count depends on which animals she gave him.
Then:
16 + s sheep from neighbor - 5 - 5 = 6+s
where s is the number of sheep among the 6 animals she gave him.
So the answer is not uniquely determined.
Possible sheep count: 6 to 12 sheep, depending on whether the neighbor gave him 0 to 6 sheep.
(I clipped the GPT5 answer here, but will note additionally that even the LLM built into the Google search results page handles this question; both note the possible trick question with the days of the week.)
And that's the wrong answer. It's a word problem, not a math problem. Also, if it really was a math problem, it wouldn't be 0-6 sheep from the neighbor, it would be 2-6. So it even failed on the math.
Are you trying to win this debate with a Facebook "ONLY THE SMARTEST 1% CAN SOLVE" question? The whole point of the question is for some loser to be able to say "no you missed XYZ" ambiguity any time a sane answer is given.
By your logic, the only "correct" answer for an LLM to give to this is "the person who asked you this is fucking with you, this is not a real question". I concede: this is a limitation of modern LLMs: they will try to answer stupid questions.
No, it's a real question. And if it were a math question. The neighbor has 18 animals, only 4 of which are dogs. The farmer receives 1/3 of those which is 6. So for the farmer to receive 0 sheep would require the farmer to receive 6 dogs. But there are only 4 dogs. LOGICALLY, the farmer must receive at least 2 sheep from the neighbor. There's no ambiguity. That's logic. That's intelligence. It's real actual math. Basic arithmetic. A person can easily sit down and work this out. It illustrates that the AI is generating responses statistically and not actually thinking. There are two full layers of failure here: the word problem, and the math problem underneath it.
I'm really not interested in this Calvinball argument where we try to conclude whether or not LLMs can do math by avoiding as much as possible actually doing math.
A concise problem that requires actual logic will naturally seem a bit convoluted, but an intelligent being can sit down and work it out logically. Anyway, it's not an argument. It's empirical evidence that supports my argument. You have chosen to ignore it or otherwise rationalize it away. Nothing I can do about that.
But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.
Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.
Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.
You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?
I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, I don't think we're on the bring of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
(The clearer way for me to have said this is that I don't care whether they're According-to-Hoyle "intelligent", and that controversy isn't what motivated me to comment).
"But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are."
Umm, why doesn't it capture it? Why can't a statistical text generator do amazing things without _actually_ being intelligent (I'm thinking agency here)? I think it's important to remind ourselves, these things do not reflect or understand what they're outputting. That is 100% evident with the continuing issues with them outputting nonsense along with their apparently insightful output. The article itself said the output was poor but the student noticed something about it that sparked an idea and he followed that lead.
I reject the premise. I read the outputs I generate carefully (too carefully, probably). They don't "continue to output nonsense". Their success rate exceeds that of humans in some places.
To clarify: the problem I have with "statistical text generator" isn't the word "statistical". It's "text generator". It's been two years now since that stopped being a reasonable way to completely encapsulate what these systems do. The models themselves are now run iteratively, with an initial human-defined prompt cascading into series of LLM-generated interim prompts and tool calls. That process is not purely, or even primarily, one of "text generation"; it's bidirectional, and involves deep implicit searches.
Do you think it's akin to Ilya's [1] claim that next token prediction is reality? E.g. any deeper claims about the structure of that intelligence or comparing to humans?
To be clear, I'm 100% with you that "next token predictor" is stupid to call what these machines are now. We are engineers and can shape the capability landscape to give rise to a ton of emergent behavior. It's kind of amazing. In that sense, being precise about what's going on, rather than being essentialist (technically, yes, the 'actual' algorithm, whatever that even means, is text prediction), is just good epistemology.
I still think it's still a very interesting question though to ask about deeper emergent structures. To me, this is evidence of a more embedded cognition kind of theory of intelligence (admittedly this is not very precise). But IDK how into philosophy you are.
I try really hard not to think about this stuff because I've seen how people talk when they get too deep into it. My mental model, or mental superstructure, if you will, for all of this stuff is that we've discovered a fundamentally novel and effective way of doing computing. Computer science is fascinating and I'm there for it, and prickly when people are dismissive of it. I'm generally not interested in the theory of human intelligence (it's a super interesting problem I just happen not to engage with much), which spares me from a lot of crazy Internet stuff.
Just to clarify because I’m not sure I understand:
So you agree that LLMs are in fact statistical text generators but you don’t like people use that fact in arguments about the capabilities of the things?
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.
But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well it's a statistical text generator so ..." is approximately as useful as saying "well it's made of atoms so ...". There are probably some very niche circumstances under which statements of each of those forms is useful but by and large they are not and you can safely ignore anyone who utters them.
It is still important to mention that because atoms have limitations and so do statistical generators. Plain and simple. People are walking around thinking organic brains are just statistical generators and they're gonna build AGI with GPUs. It's absurd.
And your evidence for these claimed limitations is ... ? I'm not aware of evidence either for or against organic brains being "just" statistical generators. Neither am I aware of evidence either for or against AGI being possible to achieve using GPUs. AFAICT you're just making things up.
I think you're actually making a point but overall still disagree.
I do think LLM's are evolving towards this kind of embodied cognition type intelligence, in virtue of how well they interoperate with text. I mean, you don't need to "make the text intelligible" to the LLM, the LLM just understands all kinds of garbage you throw at it.
Now the question is: Is intelligence being able to interoperate?
In the traditional sense, no. Well, in a loose sense, yes, because people would've said that intelligence is the ability to do anything, but that's not a useful category (otherwise, traditional computer programs would be "intelligent"). But when I hear that, I think something like "The models can represent an objective reality well, it makes correct predictions more often than not, it's one of these fictional characters that gets everything and anything right". This is how it's framed in a lot of pop culture, and a lot of "rationalist" (lesswrong) style spaces.
But if LLM's can understand a ton of unstructured intent and interoperate with all of our software tools pretty damn well... I mean, I would not call that "a bunch of hacks". In some sense, this is an appeal to the embedded cognition program. Brain in a vat approach to intelligence fails.
But it clearly enables new capabilities that previously were only possible with human intelligence. In a very blatant negative form: The surveillance state is 100% now possible with AI. It doesn't take deep knowledge of Quantum Physics to implement, with a large amount of engineering effort, data pipelines and data lakes, and to have LLM's spread out throughout the system, monitoring victims.
So I'd call it intelligence, but with a qualifier to not slip between slippery slopes. It may even be valid to call the previous notion of intelligence a bad one, sure. But I think the issue you may be running into is that it feels like people are conflating all sorts of notions of intelligence.
Now, you can add an ad hoc hypothesis here: In order to interoperate, you have to reason over some kind of hidden latent space that no human was able to do before. Being able to interoperate is not orthogonal to general intelligence - it could be argued that intelligence is interoperation.
If you're arguing for embodied cognition, fine, we agree to some extent :)
The fear is that the AI clearly must be able to emulate, internally, a latent space that reflects some "objective notion of reality". If it did that, then shit, this just breaks all of the victories of empiricism, man. Tell me about a language model that can just sit in a vat, and objectively derive quantum mechanics by just thinking about it really hard, with only data from before the 1900s.
I don't think you need to be this caricature of intelligence to be intelligent, is what I'm saying, and interoperability is definitely a big aspect of intelligence.
Now this I can agree with. One thing that is extremely important to maintain with this technology is nuanced perspective. Otherwise, it will lead you astray quickly. It's also a difficult thing for us to maintain.
Solving open math problems is strong evidence of intelligence so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive? Isn't intent just the behaviour of making a specific thing happen rather than other things?
I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.