

I wouldn't read too much into the lower scores; they include some absolutely tiny models. The one 70% lower than the top score, at 24% correct, is a 1B model from 2024. Honestly, the fact that it can do any information retrieval from a 32k context is impressive.


I’m not saying it’s anything other than morally repugnant, obviously, but in the example of a password with billions or trillions of combinations, where you can check the answers given, torture is pretty obviously better than guessing.
That’s not a scenario that is ever likely to come up, and it wouldn’t be justifiable even if it did, but pretending it wouldn’t be effective is ridiculous.
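A rough back-of-envelope sketch of what "billions or trillions of combinations" means for plain guessing. It assumes a uniformly random password and a hypothetical rate of 1,000 guesses per second; real login prompts are usually far slower because of lockouts.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_guesses(keyspace: int) -> float:
    # Trying candidates in a fixed order, a uniformly random secret is equally
    # likely to sit at any position, so on average (N + 1) / 2 attempts are needed.
    return (keyspace + 1) / 2


def expected_years(keyspace: int, guesses_per_second: float) -> float:
    return expected_guesses(keyspace) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical keyspaces matching "billions or trillions" and a made-up guess rate.
    for keyspace in (10**9, 10**12):
        years = expected_years(keyspace, guesses_per_second=1000)
        print(f"{keyspace:.0e} combinations: ~{years:.2f} years of guessing on average")
```

Under those assumptions that is roughly six days for a billion combinations and about 16 years for a trillion, whereas an answer you can check directly costs a single attempt.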


Torture can be a useful way of extracting information if you have a way to instantly verify it, which actually makes it a good analogy to LLMs. If I want to know the password to your laptop, and I torture you until you give me the correct password and I can log in, then that works.
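The same pattern of untrusted generator plus cheap verifier is what makes LLM output usable in some settings. A minimal sketch, where the hypothetical `unreliable_source` stands in for whatever produces candidate answers and `verify` stands in for the instant check (here just a comparison against a made-up secret):

```python
import random
from typing import Callable, Iterable, Iterator, Optional


def first_verified(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
    max_tries: int,
) -> Optional[str]:
    """Return the first candidate that passes verify(), or None after max_tries."""
    for attempt, candidate in enumerate(candidates):
        if attempt >= max_tries:
            break
        if verify(candidate):
            return candidate
    return None


SECRET = "correct horse"  # hypothetical secret, known only to the verifier


def unreliable_source(rng: random.Random, p_correct: float = 0.3) -> Iterator[str]:
    # Hypothetical answerer that is right only 30% of the time.
    while True:
        yield SECRET if rng.random() < p_correct else f"guess-{rng.randint(0, 999)}"


def verify(candidate: str) -> bool:
    # Cheap, instant check, like trying the password at the login prompt.
    return candidate == SECRET


if __name__ == "__main__":
    answer = first_verified(unreliable_source(random.Random(42)), verify, max_tries=50)
    print(answer)
```

The loop never trusts the source; all of the reliability comes from `verify`, which is why the approach falls apart when there is no cheap way to check an answer.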


I'm not misunderstanding at all, but do you really think governments make multi-billion-dollar purchases without having technical experts go over things with a fine-tooth comb? Again, if even one customer nation (and there are dozens) found something like this, or if it were ever used, it would wipe out the entire export market for US high-tech weapons. Why would they do that when they have effective soft-power ways of achieving the same thing?


Do you honestly think that all the countries buying these planes haven't inspected them? Even if they were incredibly well disguised, being discovered would essentially stop the US from ever selling military hardware abroad again, as it would be hard proof that they couldn't be trusted, so even a small chance of discovery is a huge risk.
There is no reason to do that when, as others have pointed out, they can just restrict access to parts, updates and mission planning software.
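A quick way to see why "dozens" of customers matter: if each nation independently had some small chance of finding a hidden kill switch, the chance that at least one does is 1 - (1 - p)^n. The per-nation probabilities and the figure of 30 nations below are purely hypothetical.

```python
def p_any_discovery(p_each: float, n_customers: int) -> float:
    # Assumes every customer inspects independently with the same chance p_each.
    return 1 - (1 - p_each) ** n_customers


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):  # hypothetical per-nation chances
        print(f"p={p:.2f} per nation, 30 nations -> "
              f"{p_any_discovery(p, 30):.0%} chance someone finds it")
```

With 30 nations that works out to roughly 26%, 79% and 96% for per-nation chances of 1%, 5% and 10%. Independence is a simplifying assumption; shared inspection methods would make the real picture messier.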


But also:

Google is still up 100% from where it was in May last year, even taking that drop into account.


Search results have been degrading for a lot longer than LLMs have been a thing. Peak usefulness for them was around a decade ago.


CO2 doesn’t vary much in concentration with how close you are to an emission source, unless you are literally sucking air out of a tailpipe. You might get a 10-20% increase in the centre of a city compared with the countryside, hardly enough to outweigh the benefit of being somewhere with so much energy coming in that they frequently have to curtail it (which could then be used for this instead).
This isn’t CCS, which cheaply turns CO2 into an inert form of carbon; it’s an expensive process for turning CO2 into a very useful form.
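One way to check the "10-20% hardly matters" point is the ideal minimum work needed to pull CO2 out of air, which for a small amount captured as pure CO2 is about R·T·ln(1/x) per mole, where x is the CO2 mole fraction. Because it depends on the logarithm of the concentration, a 20% higher concentration barely moves it. A sketch, assuming 420 ppm as the rural background and 25 °C; real capture processes need several times this thermodynamic floor.

```python
import math

R = 8.314   # J/(mol*K), gas constant
T = 298.15  # K, assumed ambient temperature (25 C)


def min_capture_work_kj_per_mol(ppm: float) -> float:
    """Ideal minimum work (kJ per mol of CO2) to extract a small amount of CO2
    from air at `ppm`, delivered as pure CO2: W = R*T*ln(1/x)."""
    x = ppm * 1e-6
    return R * T * math.log(1 / x) / 1000


background_ppm = 420.0           # assumed rural background level
city_ppm = background_ppm * 1.2  # upper end of the 10-20% city uplift
w_rural = min_capture_work_kj_per_mol(background_ppm)
w_city = min_capture_work_kj_per_mol(city_ppm)
saving = 1 - w_city / w_rural

if __name__ == "__main__":
    print(f"rural: {w_rural:.2f} kJ/mol, city: {w_city:.2f} kJ/mol, saving: {saving:.1%}")
```

That comes out at roughly 19.3 versus 18.8 kJ/mol, a saving of only about 2%, which is small next to the value of running on energy that would otherwise be curtailed.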


Sure, but you can't store that electricity as electricity. IMO this is most interesting as an energy storage technology, so the comparison isn't with what that gasoline would do in an ICE car compared to an EV; it's with what it would cost compared to battery storage (or compressed air, or whatever other technology) to store a few weeks' worth of output for periods on the order of months. The big advantage I see here is that, unlike those other technologies, storage capacity is dirt cheap to build: it's just a metal tank. So whenever a renewable plant would otherwise curtail its output, it can instead redirect that power to making gasoline, to be burned when the renewables aren't producing much electricity.
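The curtail-to-fuel idea can be sketched as a tiny dispatch model: surplus that would be curtailed is turned into fuel, and fuel is burned during shortfalls. Both efficiencies below are hypothetical inputs rather than figures for any real process, and the tank is deliberately given no capacity limit, reflecting the point that tank space is cheap.

```python
from dataclasses import dataclass


@dataclass
class FuelStore:
    """Power-to-fuel storage with an (assumed) unlimited, cheap tank."""

    to_fuel_eff: float     # electricity -> chemical energy in the fuel (hypothetical)
    gen_eff: float         # chemical energy -> electricity at the generator (hypothetical)
    fuel_mwh: float = 0.0  # chemical energy currently in the tank

    def step(self, surplus_mwh: float) -> float:
        """Process one period. Positive input is energy that would otherwise be
        curtailed; negative input is a shortfall. Returns unserved shortfall (MWh)."""
        if surplus_mwh >= 0:
            self.fuel_mwh += surplus_mwh * self.to_fuel_eff
            return 0.0
        need = -surplus_mwh
        burned = min(need / self.gen_eff, self.fuel_mwh)  # can't burn more than we have
        self.fuel_mwh -= burned
        return need - burned * self.gen_eff


if __name__ == "__main__":
    store = FuelStore(to_fuel_eff=0.5, gen_eff=0.4)
    # Hypothetical periods: two windy ones, then a calm spell.
    for surplus in [100, 100, -30, -30]:
        unserved = store.step(surplus)
        print(f"surplus {surplus:+} MWh -> unserved {unserved:.1f} MWh, "
              f"tank {store.fuel_mwh:.1f} MWh")
```

With these made-up numbers the round trip is only 20% efficient, but the input is energy that would have been thrown away, so what matters is the cost of capacity rather than efficiency. A battery model would need a capacity cap that this one omits.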


I wonder if a scaled-up version of this could work for grid-scale, medium-duration storage. Smoothing out weeks of Dunkelflaute is the main blocker to going to a primarily renewable grid. Gasoline is a lot easier to store than hydrogen, and large-scale gasoline generators should get close to the efficiency of natural gas peaker plants.
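To give a feel for "a lot easier to store", here is a rough sizing sketch for covering a hypothetical 1 GW shortfall over a two-week Dunkelflaute. The volumetric energy densities are approximate round figures (about 32 MJ/L for gasoline and about 5 MJ/L for hydrogen compressed to 700 bar), and the 35% generator efficiency is an assumption applied to both fuels for simplicity.

```python
HOURS = 14 * 24         # assumed two-week Dunkelflaute
SHORTFALL_GW = 1.0      # hypothetical average shortfall to cover
GEN_EFFICIENCY = 0.35   # assumed fuel -> electricity efficiency, same for both fuels

# Approximate volumetric energy densities (lower heating value), MJ per litre.
DENSITY_MJ_PER_L = {
    "gasoline": 32.0,
    "hydrogen @ 700 bar": 5.0,
}


def tank_volume_m3(mj_per_litre: float) -> float:
    electricity_mj = SHORTFALL_GW * 1e3 * HOURS * 3600  # MW * seconds = MJ
    fuel_mj = electricity_mj / GEN_EFFICIENCY
    return fuel_mj / mj_per_litre / 1000  # litres -> cubic metres


if __name__ == "__main__":
    for fuel, density in DENSITY_MJ_PER_L.items():
        print(f"{fuel}: ~{tank_volume_m3(density):,.0f} m^3")
```

Under these assumptions that is roughly 108,000 m³ of gasoline against roughly 690,000 m³ of 700-bar hydrogen, and the gasoline sits in an unpressurised tank.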


That doesn’t make any sense as a reason to turn off Gemini in your inbox, though. Either you are OK with having your emails scanned and used in ML systems, in which case why bother turning off the feature, or you aren’t, in which case turning off the feature doesn’t help you.


Google promises that Gmail’s 3 billion users will benefit from a “personal, proactive inbox assistant”. But given that these features are free, what’s the catch? Make no mistake, Google isn’t doing this out of generosity. The contents of your inbox are valuable to the company.
Email used to be a more private space where your communications could potentially be intercepted by bad actors, but largely your data was your own.
I don't think that is true of Gmail, is it? Google have been scanning your messages and using them for machine-learning-based ad targeting since it was released.


Sure, it probably won't take them as long, but it's still misleading to portray “China reaches milestone a western company did a quarter of a century ago” as being equivalent to catching up.


China “has EUV” lithography in the same way ASML had it in 2001:
ASML built its first working prototype of EUV technology in 2001, and told Reuters it took nearly two decades and billions of euros in R&D spending before it produced its first commercially-available chips in 2019.
They are still an awfully long way behind the West in this regard.


Those are instructions for installing Linux inside Windows.


The UK does not have regional electricity pricing. This is actually an issue, as it means energy-intensive businesses aren't attracted to places close to large sources of renewable power (the North East and Scotland) and instead crowd into the overheated South East.
But it also means that the locals won't be helping with the leccy bill any more than someone in Aberdeen is.


The number of people suggesting that the appropriate response to an optional feature in the standard-bearer FOSS browser is to jump to a Chrome-based browser, further cementing Google’s dominance, is depressing.


It's not a Microsoft thing. Also, I have no idea what you are agitated about; is there some sort of pop-culture MCP that it would be terrible for this to be linked to? Searching for it, the only things other than Model Context Protocol I find are “make contribution payments”, “Metcalfe Copeman & Pettefar LLP Solicitors” and “MCP fixings”, so whatever it is, I imagine MS are unaware of it.


Thing is, that’s fine if you’re doing something like working on a version-controlled codebase where you can just roll back whatever the agent does if you don’t like it. The idea of using a Windows computer that had an AI fucking around with system settings and registry entries gives me shivers. That’s before getting into the possibility of hostile actors managing to prompt your AI into doing something like giving up sensitive information, by getting it to read malicious instructions on a website.
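The "just roll back" safety net can be sketched for a project folder. This is a stand-in for version control rather than how any particular agent tool works: it takes a full directory snapshot with `shutil.copytree` before a hypothetical `agent` callable runs, then restores it unless a hypothetical `approve` callable accepts the result.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def _restore(snapshot: Path, workdir: Path) -> None:
    # Throw away whatever the agent left behind and put the snapshot back.
    shutil.rmtree(workdir)
    shutil.copytree(snapshot, workdir)


def run_with_rollback(
    workdir: Path,
    agent: Callable[[Path], None],
    approve: Callable[[Path], bool],
) -> bool:
    """Let a (hypothetical) agent modify workdir, keeping the result only if approved.

    Returns True if the changes were kept, False if they were rolled back.
    """
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        shutil.copytree(workdir, snapshot)  # full copy taken before the agent runs
        try:
            agent(workdir)
            accepted = approve(workdir)
        except BaseException:
            _restore(snapshot, workdir)  # roll back if the agent crashes, then re-raise
            raise
        if not accepted:
            _restore(snapshot, workdir)
        return accepted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        project = Path(root) / "project"
        project.mkdir()
        (project / "main.txt").write_text("original")

        def careless_agent(path: Path) -> None:
            (path / "main.txt").write_text("rewritten by agent")
            (path / "junk.txt").write_text("stray file")

        kept = run_with_rollback(project, careless_agent, approve=lambda p: False)
        print(kept, (project / "main.txt").read_text(), (project / "junk.txt").exists())
```

This only works because everything the agent touches lives inside one directory that can be copied wholesale. System settings and registry entries on a live Windows machine have no equivalent single snapshot to restore, which is exactly the worry.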


It should be pointed out that those two changes are very much not equal. Energy density has only increased by a factor of ~5, whereas cost has fallen by a factor of ~90 (by eye).
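Putting the two rough factors from the comment on a log scale makes the gap clearer: counting doublings of energy density against halvings of cost. The inputs are just the ~5x and ~90x figures read off by eye.

```python
import math

density_factor = 5  # ~5x improvement in energy density (read off by eye)
cost_factor = 90    # ~90x fall in cost (read off by eye)

density_doublings = math.log2(density_factor)
cost_halvings = math.log2(cost_factor)

if __name__ == "__main__":
    print(f"energy density doubled ~{density_doublings:.1f} times")
    print(f"cost halved ~{cost_halvings:.1f} times")
    print(f"cost moved {cost_factor / density_factor:.0f}x further than density")
```

That is roughly 2.3 doublings of energy density against about 6.5 halvings of cost, so cost has moved about 18 times further.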