

The writers of “reality” are a bunch of plagiarists: https://www.youtube.com/watch?v=14WE3A0PwVs /s
I’m also on Mastodon as https://hachyderm.io/@BoydStephenSmithJr .




Once companies started suing people trying to practice “responsible disclosure”, I stopped attacking people who choose maximum disclosure.
Responsible disclosure has always been a bit of a hedge. It’s rare to be able to show you are actually the first person/organization to discover a vulnerability.


we are going to need to develop a different model of learning, using, and processing information that considers the provenance of where the information came from and how it got there
They used to teach this in schools under “critical thinking skills”. Following the chain of sources to the primary sources was a task I had to do (at least in part) more than once in secondary school.
Authoritarians don’t like that tho.


I just bail on any site that requires age verification. It sucks, but there are still some that work. I do hear that using a VPN can often help.


<ominous>They’re learning…</ominous>


Buy the dip?
I’d like to get back to contributing, but I’m getting close to 12 months unemployed. :(


Yeah, there was some phonics in my primary school education, and I continue to approach new words in that way sometimes. But, they said Phonetically.


Cave 1.0 scored 1000000% but also force fed the proxy lemons, so it was treated as a failure.


Wait, I thought phonetically (example: papa hotel oscar november echo tango india charlie alfa lima lima yankee) meant using a phonetic alphabet, not using word(s) with the same Soundex encoding.
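For anyone who wants to see the difference, here's a rough Python sketch. The function names `spell` and `soundex` are just mine for illustration, and the Soundex here follows the common American Soundex rules as I understand them; other variants differ in the details.

```python
# NATO/ICAO spelling alphabet: one code word per letter.
NATO = {
    "a": "alfa", "b": "bravo", "c": "charlie", "d": "delta", "e": "echo",
    "f": "foxtrot", "g": "golf", "h": "hotel", "i": "india", "j": "juliett",
    "k": "kilo", "l": "lima", "m": "mike", "n": "november", "o": "oscar",
    "p": "papa", "q": "quebec", "r": "romeo", "s": "sierra", "t": "tango",
    "u": "uniform", "v": "victor", "w": "whiskey", "x": "x-ray",
    "y": "yankee", "z": "zulu",
}

# Soundex digit for each coded consonant; vowels, h, w, and y have no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def spell(word):
    """Spell a word letter by letter with the NATO alphabet (hypothetical helper)."""
    return " ".join(NATO[ch] for ch in word.lower() if ch in NATO)


def soundex(word):
    """American Soundex sketch: first letter plus three digits (hypothetical helper)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first.upper() + "".join(digits) + "000")[:4]


print(spell("phonetically"))
print(soundex("Robert"), soundex("Rupert"))
```

The first is a reversible letter-by-letter spelling; the second lumps different words that sound alike (Robert and Rupert here) under one code.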


If you look at the list of tasks, you can see how the 4 frontier models did. Some of them did complete one or two levels of one or two tasks. None of them completed a whole task. Some of the reasoning logs are funny in the replays.


Here’s another replay where the model mistakes running out of time/moves for making progress


Yeah, for a fixed ruleset that can be provided up front the Alpha-Zero approach seems to work great.
These tasks strike me as a bit different. I’m sure the ruleset is fixed somewhere, but it’s not disclosed to the participants. In the task I walked myself through, there was a new wrinkle in each part – a new interactable, a (more) hidden goal, or an information limit. And, of course, part of the task is “discovering” all that from the bitmap frame(s) provided.
I’m unconvinced of the hype around “AI”, but this does seem like a legitimate research target that might stymie the Alpha{Go,Zero,Fold} series at least a bit.


The founder of ARC worked at Google until 2024 and wrote 2.5+ books on deep learning. So, I expect some of these benchmarks are based on limitations seen in DeepMind.
That said, it would be interesting to see how well DeepMind does at these tasks. My understanding is that the private tasks would still be dynamic enough to require “on the job training”, so an AlphaGo / AlphaZero / AlphaFold approach is unlikely to do well on ARC-AGI-3.
Still, I think commentary around models (including, but not limited to, something from DeepMind) attempting these tasks would be much more interesting than most of the discourse around generative AI, whether text, image, video, or code generation.


https://arcprize.org/arc-agi/1
https://arcprize.org/arc-agi/2
(They were more static, but yes, eventually frontier models got good at them.)


I couldn’t find replays. Are there more? Also, it is a bit funny that “building the bridge”, which at one point seems to be Claude’s “chosen goal”, is just “running out of moves” and failing the task.
Task failed successfully, Claude. Task failed, successfully.


I finished one of the tasks. And, I imagine I could finish at least some of the others. But, I wasn’t being paid, and it wasn’t very entertaining, so I stopped.
EDIT: Got all the way through several other tasks that suddenly looked interesting while watching the replays available of the frontier models. If you paid me $250, I’d finish them all, though probably not optimally. I still don’t know how I’d do against the private sets.
They should add a “global” and “friends-only” leaderboard (like the Zachtronics games, etc.) and really see the competition (at least human competition) heat up.


I believe there’s precedent in both directions, but if Britannica can provide decent evidence of the “cannibalization”, then the use (by OpenAI) is unlikely to pass the “fair use” criteria.


Maybe he should meet Putin.


On my 45th, instead of a colonoscopy, my Dr. had me do the “Cologuard” stuff. Insurance covered that. I turn 46 this year.
I thought that was just the Cybertruck, which yes, I wouldn’t drive even if someone gave me one. I’d flip it and buy something else.
I think both the sedan and roadster are okay electric cars, and I think they have enough range I could use them to reduce the amount of gas I burn in my Volt for longer trips.
But, I haven’t really been paying attention to Tesla recently, and Elmu has certainly been looking horrible to me.