I think I’ve only once flat-out told one it was wrong about a specific assertion I quoted, and it immediately found its way to what I knew to be the correct claim.
I just wonder what would happen if I were in fact mistaken and confidently told it it was wrong without elaborating.


A concept that I think is really helpful for interpreting what an LLM does is John Searle’s “Chinese room”. The idea: someone slips a piece of paper containing a message in Chinese under a door. Inside the room is someone who doesn’t know Chinese, following a set of rules for converting the characters and numerals into a response based purely on their syntax. The person in the room writes out the response and slips it back under the door. At no point does the person inside understand the Chinese in either the input or the output, yet the person standing outside might well believe there is a Chinese speaker in the room. The same idea applies to computer systems like LLMs: they only provide the illusion of intentionality and don’t actually understand their inputs or their outputs.
What’s the difference between converting the characters and numerals into a response based on their syntax versus understanding them? How can you establish, as the person submitting the notes, whether whoever is in the Chinese room is doing the former or the latter?
I remember reading something about LLMs not being able to learn “x is y” relations in both directions, i.e. learning “x is y” without also learning “y is x”. I can’t find it now, but limitations like this are what make the difference clear between what humans do and what we’ve managed to teach the neural network (and which will, of course, be used to keep iterating on and improving the model).
In the Chinese room analogy, this would be like the person inside knowing that cats are considered cute, but not knowing whether the animals considered cute include cats (if I remember the type of limitation correctly). If you happen to slip in the right instructions or questions, something they’ve seen before or something they’re able to extrapolate from, then nothing seems off. But if someone says in one paragraph that cats are cute and then that they know of no cute animals, you wouldn’t think they understand what they’re saying, and so they don’t really understand the language, even if they give you plausible words in every other case.
(For cats it’ll work, because there are a billion example sentences out there. LLM vendors are also trying to sidestep such problems by having the model generate a bunch of tangential text before answering the prompt, in which it might happen to regurgitate the tokens it needs to piece together the answer; but that still isn’t the same as being able to apply logic.)
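The one-directional limitation described above can be made concrete with a deliberately crude toy. This is a minimal sketch, assuming the room’s “rulebook” is nothing more than a table of forward continuations keyed on the exact words that came before; real LLMs are not lookup tables, and the names `SEEN`, `RULEBOOK`, `complete` and `who_is` (and the made-up word “Zorbs”) are hypothetical, purely for illustration.

```python
from typing import Optional

# Hypothetical "training text": each statement was only ever seen in the
# "<subject> <relation> <object>" direction. "Zorbs" is a made-up word.
SEEN = [
    ("cats", "are", "cute"),
    ("Zorbs", "are", "blue"),
]

# The toy "rulebook": pure pattern matching on the exact (subject, relation) prefix.
RULEBOOK = {(subj, rel): obj for subj, rel, obj in SEEN}


def complete(subject: str, relation: str) -> Optional[str]:
    """Continue '<subject> <relation> ...' only if that exact prefix was seen."""
    return RULEBOOK.get((subject, relation))


def who_is(obj: str) -> list[str]:
    """What a reader who grasps the relation can do: answer the reverse question."""
    return [subj for subj, _, o in SEEN if o == obj]


print(complete("cats", "are"))              # cute
print(complete("cute animals", "include"))  # None: that prefix was never seen
print(who_is("cute"))                       # ['cats']
```

The table answers the forward question but has nothing to say about the inverted one, while `who_is` succeeds only because it inverts the stored relation, which is the kind of step the room’s rulebook never takes.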
Intentionality is the key difference. You can eventually tell a Chinese room’s nature by giving it new variables it hasn’t encountered before: genuinely new problems make the rule-following break down.
That said, there are deeper conversations to be had about what consciousness truly is, of course. My personal view is that it requires a level of complexity that we are still very far from architecturally, and a level of scale that we may not even be able to support ecologically. The thought experiment is mainly meant to show what the inner workings of a computerized process can look like, and to offer a demystified perspective on them.