Google is cannibalizing the web to feed AI

Sahwa@reddthat.com · 1 day ago

Tollana1234567@lemmy.today · edit-2 12 hours ago

he means reddit mostly. AI SLOP GENERATOR, TRAINING ON SLOP like reddit, with a little plagiarizing from authors and artists.

thermal_shock@lemmy.world · 2 hours ago

I called to schedule a play date at my local dog daycare/boarding, and it went to an AI answering service. I asked if it was AI since I could hear noise in the background (literal fake background chatter and noise). When she said yes, I hung up. SO tired of AI everywhere. Fuck it all.

very_well_lost@lemmy.world · 56 minutes ago

lmao, I bet they trained the stupid thing on recordings from massive call centers. It probably thinks that all the ‘background’ noise is just part of how humans communicate.

BilSabab@lemmy.world · 3 hours ago

ah yes, reddit, the most well-mannered and measured of social media platforms that never indulges itself in spreading hate and misinformation.

borth@sh.itjust.works · 15 hours ago

I don’t understand how these companies want to seem and think they are so smart by choosing new niche data (scraped) to train AI in a bid to try and make it “smart”…

Has any other living being become “smart” by only ingesting information directly from the Internet? You can train other animals to perform many tasks and can probably say they are smart when they perform them as expected. I doubt any of the training methods is to tape headphones, a screen and sometimes a microphone to their faces forever (I kinda don’t wanna know if this is false 😶).

The best example we have, is ourselves, and even though we use the Internet, babies are not taught how to walk and talk by only interacting with the Internet.

I feel like I might be saying too much, but I think the best AI we’re gonna get is to unplug it from the Internet, and then fucking raise it for 20 years like a normal, super fast-thinking child prodigy. Then just make copies of that and train further by having it go to school for the things needed.

BilSabab@lemmy.world · 3 hours ago

straightforward data scraping from the web usually ends up in having a whole lot of dark data

PhoenixDog@lemmy.world · 56 minutes ago

See: Grok creating child porn

NιƙƙιDιɱҽʂ@lemmy.world · 8 hours ago

That’s a very naive simplification of the AI training process. You start with that, then pay people pennies in a developing nation to produce hand-crafted training data, resulting in it using stupid words like delve and whimsical entirely too much.

Merely training on internet content with no RLHF results in probable gibberish like that of GPT-2

sureshot0@discuss.online · 10 hours ago

Has any other living being become “smart” by only ingesting information directly from the Internet?

LMFAO

Uriel238 [all pronouns]@lemmy.blahaj.zone · 19 hours ago

More Perfect Union did a video on Google’s descent into evil. I think it’s this one

TLDW: Once Google pivoted from being a search service to an advertising agency, it was motivated to keep users from hyperlinking away from Google, and so offered summaries and alternatives controlled by Alphabet that allowed it to keep offering you ads.

So this AI service is just a natural iteration.

BrightCandle@lemmy.world · 1 day ago

The death of Stack Overflow is one of these events where the site has been completely killed by AI, and yet its content is completely necessary for AI to know about solving programming problems. Its death will mark the end of AI’s ability to learn how to solve programming issues. It’s cannibalizing itself in the process: as it destroys its sources, it destroys its own ability to learn.

Strider@lemmy.world · 3 hours ago

And yet, they have not created the AI that could do without it. Any day now, promise!

artyom@piefed.social · edit-2 1 day ago

It’s not just that, it’s shitting where it eats. People are using it to fill the internet with disinformation, then it trains itself on its own disinformation, and breeds even worse disinformation. This is why AI can never be smarter than it was in 2021.

On top of that, due to the indiscriminate DDOSing of the entire internet by AI bots, websites have been blocking any web crawlers that are not Google, which just contributes to their monopoly.

chunes@lemmy.world · 22 hours ago

Model collapse isn’t a thing anymore. https://arxiv.org/html/2510.16657v1

CmdrShepard49@sh.itjust.works · edit-2 14 hours ago

Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.

Lol, so to make a great model, they just need to have an even better one available first or a human who can verify every single thing it ingests.

Hmm, call me skeptical on this claim.

Grandwolf319@sh.itjust.works · 21 hours ago

Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.

Yeah if you have a source of truth then your model is basically getting trained on that.

It’s like already having the answer

chunes@lemmy.world · 20 hours ago

The point is that it only needs to comprise a very small part of the model.

Grandwolf319@sh.itjust.works · 19 hours ago

My point was that having a verifier means you’re not really training a model on another model’s data, it’s basically as if you get new raw data from a non-AI source
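To make the verifier idea concrete, here is a minimal sketch of what "injecting information through an external verifier" could look like in the simplest case. It is only an illustration, not the method from the linked paper: the `Sample` format, `verified_only`, and `arithmetic_verifier` are all made-up names, and the verifier here is trivially a source of truth (real arithmetic), which is exactly the point being argued above.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical toy format: a (question, answer) pair produced by some model.
Sample = Tuple[str, str]


def verified_only(candidates: Iterable[Sample],
                  verifier: Callable[[Sample], bool]) -> List[Sample]:
    """Keep only the synthetic samples that the external verifier accepts."""
    return [s for s in candidates if verifier(s)]


def arithmetic_verifier(sample: Sample) -> bool:
    """Toy verifier: recompute the sum instead of trusting the model's answer."""
    question, answer = sample
    left, right = question.split("+")
    return int(left) + int(right) == int(answer)


# Pretend these came out of a model; the second one is wrong.
synthetic = [("2+2", "4"), ("2+3", "6"), ("10+5", "15")]
training_set = verified_only(synthetic, arithmetic_verifier)
print(training_set)
```

Everything that survives the filter is only as trustworthy as the verifier itself, so if the verifier is another model with its own errors, those errors pass straight through.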

corsicanguppy@lemmy.ca · 17 hours ago

This assumes everything is valid on the external. If one slop cluster feeds off another - a slopveyor? - then there is nothing external for the validation hall-monitor to compare against. They’re trusting another model’s output as if it were gospel.

artyom@piefed.social · 18 hours ago

LOL OK

Zarxrax@lemmy.world · 1 day ago

I’m pretty sure AI is objectively smarter today than it was 5 years ago.

MrSmith@lemmy.world · edit-2 3 hours ago

Objectively + smarter, huh.

urandom@lemmy.world · 4 hours ago

It is not alive and cannot really think, so I doubt it’s smarter. It likely contains a bit more knowledge and a better interconnected network for it

bthest@lemmy.world · edit-2 4 hours ago

Actually it just appears smarter because people are objectively dumber than they were 5 years ago. “AI” is actually stagnant.

oce 🐆@jlai.lu · edit-2 10 hours ago

There’s better integration with all sorts of other sources of truth beyond the LLM training, which makes it seem smarter.

SpaceNoodle@lemmy.world · 1 day ago

Since LLMs literally can’t learn, no. They’re just increasingly tweaked to seem even more convincing.

Sharkticon@lemmy.zip · 22 hours ago

How can something with no intelligence be smarter?

TeamAssimilation@infosec.pub · 3 hours ago

It is evident that it has intelligence, it outputs intelligent responses usually adequate to its input, even if it’s badly phrased. What it doesn’t have is sentience, conscience, and a learning loop.

SystemDisc@feddit.org · 20 hours ago

This is true, depending on what you mean by smarter. They are undeniably more capable. However, the trendy, cool thing is to hate on AI, rejecting all else. Sure, capitalism sucks, and the powerful rich people and companies who control AI suck. AI itself, though, can very easily result in massive benefits for humanity as a whole.

MrSmith@lemmy.world · 2 hours ago

“AI” is a garbage generator that consumes our resources to output mediocre slop. Case in point is your PFP.

It’s neither “trendy” nor “cool”, just common opinion, as more and more people get to try it and see how useless it actually is, for anything beyond average (often empirically wrong) output.

SystemDisc@feddit.org · 2 hours ago

Recent LLMs are very capable at doing almost all simple computing tasks. At some point in the future, we will have AI that is more generally useful than current LLMs. If and when we have truly general purpose AI, it could benefit humanity in countless ways. It could also continue to cause harm. We can’t know for sure.

MrSmith@lemmy.world · 2 hours ago

Full self driving next year.

SystemDisc@feddit.org · 1 hour ago

We already have it

CmdrShepard49@sh.itjust.works · 14 hours ago

More capable doesn’t mean it’s smarter. A hammer can be made to be more capable but that doesn’t make it smart.

SystemDisc@feddit.org · 5 hours ago

That’s not 1:1, and like I said, it depends on what you mean by smarter.

Iusedtobeanalien@lemmy.world · 21 hours ago

It’s a self-defeating strategy: as more people turn to AI, less content gets produced, so AI becomes static.

I truly believe the token model will kill AI, it will become too expensive

iocase@lemmy.zip · 20 hours ago

It already is too expensive and adding more compute doesn’t make it cheaper lol it just causes a race to the bottom among data center providers and an eventual crash there too.

CapuccinoCoretto@lemmy.world · 1 day ago

Friends don’t let friends use Google.

Rhaedas@fedia.io · 1 day ago

I can’t remember the name, but when the internet was just starting and there were a lot of search engines with no dominant ones, there was an aggregator program that you could input many search engines into, then use it as the searching tool. It would query all the engines and combine, sort, rank, and remove duplicate finds.

Edit: more specific - It was much like an FTP or torrent program but you’d load up what search engines to use and your search words, and it would actively pull the info then provide a single page with all results.

The reason I mention it is because we’re sort of back at that point. Google is failing, Bing never was great, and all the alternatives have their issues, usually with not having the same database to work with. So if you gathered all the best ones, the ones without ties to corporate or AI, then put their results together, maybe you’d have something like what Google was at its peak before “don’t be evil” got painted over.

Incidentally, Google became what it was/is because it gobbled up a lot of those early search engines’ databases. I miss you, Hotbot. You were a good one.
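For anyone curious what that kind of aggregator does under the hood (query every engine, combine, rank, drop duplicates), here is a minimal sketch. It is not how that old program or any real metasearch tool works internally: the `Engine` type, `normalize`, `metasearch`, and the two canned engines are hypothetical stand-ins, and the ranking rule (summed reciprocal rank) is just one simple choice.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit

# Hypothetical engine type: takes a query, returns result URLs, best first.
Engine = Callable[[str], List[str]]


def normalize(url: str) -> str:
    """Crude duplicate key: lowercased host plus path without a trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def metasearch(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Query every engine, merge duplicates, rank by summed reciprocal rank."""
    scores: Dict[str, float] = {}
    first_seen: Dict[str, str] = {}
    for engine in engines.values():
        for rank, url in enumerate(engine(query), start=1):
            key = normalize(url)
            # A result found high up by several engines accumulates more score.
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    ranked = sorted(scores, key=lambda k: scores[k], reverse=True)
    return [first_seen[k] for k in ranked]


# Stand-in engines with canned results; a real tool would do HTTP requests here.
def engine_a(query: str) -> List[str]:
    return ["https://example.org/clutch-switch", "https://example.com/starter"]


def engine_b(query: str) -> List[str]:
    return ["https://example.net/battery", "https://EXAMPLE.org/clutch-switch/"]


results = metasearch("civic won't start", {"a": engine_a, "b": engine_b})
print(results)
```

The clutch-switch page comes out on top because both engines returned it, which is the whole appeal of pooling several smaller indexes.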

PlantJam@lemmy.world · 1 day ago

Search used to be so good. I had an old Honda civic that suddenly wouldn’t start. It wasn’t the starter, alternator, or battery. I managed to find a forum post with my exact issue, which was that a small rubber piece on the clutch pressed a button to “tell” the starter it was okay to start. Twenty minutes later I had zip tied a piece of plastic into place and had a working car again.

If I tried to diagnose that same issue today, it’d be dozens of SEO garbage slop sites without any actual useful information.

bluegreenpurplepink@lemmy.world · 7 hours ago

They are literally walling off all this information that used to be easy to access and for the public. It’s our data that we the people decide to share with the world, and these rent-seeking corporations are hiding it away so they can start charging us “tokens” to access our own public information.

unglueclass23@programming.dev · edit-2 8 hours ago

I was thinking the same thing recently. It’s not the place it once was. But in general the internet has changed a lot. And it’s not just AI.

1. All sorts of paywalls, especially on news sites.
2. Everything is getting centralized into a few sites, and they’re usually either poorly indexable or not indexable at all (Discord, Facebook, X, Instagram and so on).
3. Fediverse (Lemmy, Mastodon) also struggles with search engines.
4. People trying to sell you shit and create a brand, even more than before. Because of this, all sorts of SEO crap is done, like writing BS articles nobody cares about.
5. AI slop.
6. Search engines have gotten better at getting rid of “illegal stuff”.
7. A lot of sites are just presentational bloat with no substance. Very cool looking landing pages with all sorts of cool animations, but when you need to actually find the information that you need… the same UI usually gets in the way.

Oh and now we’re getting into age verification crap also yay

PlantJam@lemmy.world · 22 hours ago

An example of number 4, there’s a poster I’ve seen on reddit that’s posting very relevant content, but then every post ends with “@xxxxxxxx on all socials”. It just takes the whole thing from content I might want to engage with to the exact opposite.

SillyDude@lemmy.zip · 1 day ago

I asked gpt5 and it told me to check the clutch safety switch. The thing you fixed.

CmdrShepard49@sh.itjust.works · 14 hours ago

Did it give a diagram and troubleshooting steps from the factory service manual too? This is all stuff you would typically find in forums. There’s always some dealer tech around who can copy and paste from their service equipment/library

skulblaka@sh.itjust.works · 1 day ago

5-10 years ago, you could be pretty sure this was a thing that actually needed checked, since the post about the clutch safety switch was posted by a real person who presumably had the same problem as you and fixed it with this method.

Now, there’s no way to know if that’s actually the case, or if “clutch safety switch” is just a likely string of words to feed someone who is having car trouble. You might get lucky, or you might get sent on eight consecutive goose chases because an LLM fundamentally doesn’t know what factual knowledge is, it only knows how to reorder and regurgitate things that other people have said in other contexts.

fta@lemmy.zip · 21 hours ago

I agree with the larger point you’re making, but chatbots are getting better at referencing posts / websites from which they’re taking a solution.

That’s if and only if of course they used a web search tool to answer, and if that website is still alive — made less likely due to AI.

But for debugging something like this, it is actually helpful for now with citations enabled.

CmdrShepard49@sh.itjust.works · 14 hours ago

The problem is that right now would be the peak of this information being available. What are you going to find in 20 years when everyone has abandoned forums in favor of asking ChatGPT for all the answers? There would be nothing left to train the models on.

bthest@lemmy.world · 4 hours ago

I’m here now so me want thing now!

PlantJam@lemmy.world · 22 hours ago

I tried giving minimal information and still got similar results.

When I think about what got worse about the internet, it’s mostly the life stories before recipes, the novel length pages to maybe answer a simple question, and pretty much anything else related to SEO.

sexhaver87@sh.itjust.works · 4 hours ago

Not the virtually endless large language model babble?

PlantJam@lemmy.world · 3 hours ago

That’s what I was referring to by novel length pages. Feels like those predate the recent LLM stuff though.

Shifty Eyes@leminal.space · 21 hours ago

Was it MetaCrawler? https://en.wikipedia.org/wiki/MetaCrawler

SearXNG is the spiritual successor. https://en.wikipedia.org/wiki/SearXNG

Rhaedas@fedia.io · 20 hours ago

No, I don’t see any mention of it being an application; like Dogpile, it’s a web-based collector.

I did a search myself, but (given how searching sucks now) couldn’t find anything. Lots of hits for search engines themselves, but getting past that to other methods back then is difficult.

It was much like an FTP or torrent program but you’d load up what search engines to use and your search words, and it would actively pull the info then provide a single page with all results.

it's not often that shit just works@sh.itjust.works · 17 hours ago

I barely ever used it, but I’m assuming you are remembering gopher.

Gopher remained the most popular means of accessing the internet until 1994

Rhaedas@fedia.io · 17 hours ago

Not quite that old, more in the 2000 range based on when I had my PC that I used it on. This was a GUI app for Windows. Wish I had an idea, that was like… too long ago.

Stopwatch1986@lemmy.ml · 18 hours ago

I thought it was Autonomy. You installed a program, instructed puppies agents, logged out, and while you were offline the puppies searched through several engines. Next time you logged in the findings waited for you. That was the time of 56k modems and metered connections.

vrek@programming.dev · 1 day ago

I think the old aggregator you were thinking of was dogpile.com

Rhaedas@fedia.io · 1 day ago

I know the name, but no, it was an actual program on the computer.

vrek@programming.dev · 1 day ago

Hmm…interesting but I got no clue then

Flagstaff@programming.dev · 1 day ago

DuckDuckGo and Ecosia?

oce 🐆@jlai.lu · edit-2 10 hours ago

Ecosia and Qwant are building an EU based index. Maybe then they will become actually independent alternatives to American giants. It will take a while.

FarraigePlaisteaċ (sé/é)@lemmy.world · 1 day ago

Ecosia have planted 250,000+ trees so far and publish their accounts every month. I can’t think of a better option, unless there is a niche requirement.

RiverRabbits@lemmy.blahaj.zone · 1 day ago

they burn all their efforts by pushing genAI tech on their platform

FarraigePlaisteaċ (sé/é)@lemmy.world · 1 day ago

Isn’t that hyperbole rather than truth? They’re still carbon negative.

They don’t provide AI by default (at least, I don’t get it). So people like us can continue to not use AI and the hundreds of millions who use it every day can still support tree planting.

I don’t like AI, but if they don’t add it they could risk limiting their reach and environmental goals.

RiverRabbits@lemmy.blahaj.zone · 22 hours ago

AI companies do not release any numbers themselves for carbon emissions. Therefore, companies that use AI cannot in any certainty claim to be carbon negative or neutral, because they have to count the supply chain emissions as well.

not adding AI does not stifle environmental goals, in fact you can only truthfully claim to strive for carbon goals if you do not use AI. After all, there is a reason that Microsoft abandoned their emission goals with AI as the cited reason first and foremost, which shows how incredibly dirty AI can be, even if no one releases any sensible metrics.

FarraigePlaisteaċ (sé/é)@lemmy.world · edit-2 3 hours ago

AI companies do not release any numbers themselves for carbon emissions.

I think the EU is even helping them keep that “operationally sensitive information” private, which is a shame.

not adding AI does not stifle environmental goals.

I can’t agree or disagree here. I know the demand for AI is huge: hundreds of millions of users per day and at least a billion per week. If Ecosia is seen not to have this feature, I would consider it possible that it hurts their adoption and therefore their goals.

which shows how incredibly dirty AI can be, even if no one releases any sensible metrics.

Yes. I’m not sure how much solace the “world greenest AI” slogan can really offer in that context. https://blog.ecosia.org/ecosia-ai/ - but when I’m recommending search to someone, I recommend Ecosia over Google, Bing, DDG, Qwant, Mojeek, etc. simply because I think they are more of a net positive than the other options.

Who knows, maybe in a year or two’s time I’ll look back and regret it when more information surfaces. But they’ve been sensible enough until now with their operational choices to reach tree-planting goals.

Flagstaff@programming.dev · 4 hours ago

Yes, I was about to say to @RiverRabbits@lemmy.blahaj.zone that they’re nearly all using AI but Ecosia is still the only one planting trees, so it seems like a no-brainer to me…

squirrel@cake.kobel.fyi · edit-2 1 day ago

Go to https://ecosia.org in a private browser window. It says “AI that answers to the planet”. Search something and the AI Overview on top is enabled by default.

I use them with AI disabled, but it should be the default setting.

Edit: I just did a couple test searches and didn’t get the AI overview. Don’t know what triggers it.

unglueclass23@programming.dev · 23 hours ago

Go to https://ecosia.org/ in a private browser window. It says “AI that answers to the planet”. Search something and the AI Overview on top is enabled by default.

For what it’s worth, I’ve been using them for like a year, I clear my cookies often, and I never got this AI overview thingy you’re talking about. I actually have no clue what it even looks like.

bthest@lemmy.world · edit-2 4 hours ago

That’s A/B testing. Another evil gaslighting dystopian thing that has got people questioning their sanity. Even something as basic as how a website’s UI behaves has to be hidden behind some secret “innovative” algorithm that “optimizes our engagement.”
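For context on why two people can load the same site and see different things, here is a minimal sketch of how A/B bucketing is commonly done. This is a generic illustration and an assumption, not a description of what Ecosia or anyone else actually runs: `ab_bucket`, the `"ai_overview"` experiment name, and the cookie-style visitor ID are all hypothetical.

```python
import hashlib


def ab_bucket(visitor_id: str, experiment: str, share_b: float = 0.5) -> str:
    """Deterministically assign a visitor to variant 'A' or 'B'."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if fraction < share_b else "A"


# The same visitor ID always lands in the same bucket for a given experiment.
first = ab_bucket("cookie-1234", "ai_overview")
again = ab_bucket("cookie-1234", "ai_overview")
print(first, again)
```

If the ID lives in a cookie, clearing cookies or opening a private window effectively makes you a new visitor, so the feature can appear and disappear between sessions with no setting changed.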

FarraigePlaisteaċ (sé/é)@lemmy.world · 1 day ago

It’s not me downvoting you, by the way. Anyway, AI is the antithesis of eco-friendly so I share the criticism. But given their success to date I defer to their sense for pragmatism and results.

Yliaster@lemmy.world · 1 day ago

Startpage and Mojeek.

DDG has contracts w Microslop.

Flagstaff@programming.dev · 4 hours ago

Thanks for the warning; I’ll leave DDG for Qwant!

M137@lemmy.today · 1 hour ago

Qwant is so bad IMO, I tried to use it but the results just suck. I kept having to use Google or DDG to get better ones. Only truly great one is Kagi but since they’re US based and I can’t afford the monthly cost (even though it’s low) I don’t use it anymore.

Mwa@thelemmy.club · edit-2 1 day ago

Startpage uses google btw, Mojeek is decent. (I like Mojeek backend with SearXNG.)

Yliaster@lemmy.world · 1 day ago

That’s disappointing. But Mojeek is kind of unusable tbh.

Mwa@thelemmy.club · 1 day ago

true

Impractical_Island@lemmy.world · 21 hours ago

Only a minute away until Google starts a Soylent Green subsidiary company

Schwim Dandy@piefed.zip · 1 day ago

It’s the same arc every monopolistic corporation has taken before it, AI is just accelerating the pace of consuming your customer/product because profits must always increase.

There will be no large-scale shift from these experiences because most people are either OK with, apathetic about, or blissfully ignorant of the situation. The best you can do is to remove yourself from the exploitation of the userbase: Linux instead of Windows or Android, almost any search engine other than Google, fediverse instead of reddit, etc.

DylanMc6 [any, any]@lemmy.dbzer0.com · 23 hours ago

We more less vibe-coding and more coders with thigh-high striped socks.

DylanMc6 [any, any]@lemmy.dbzer0.com · 5 hours ago

I didn’t mean to say “we more less”, I meant to type “we need less”. We need less vibe-coding and more coders with thigh-high striped socks.

greyscale@lemmy.grey.ooo · 22 hours ago

were you typing with your left hand?

DylanMc6 [any, any]@lemmy.dbzer0.com · 5 hours ago

I wasn’t, and I didn’t mean to type “we more less” - I meant to type “we need less”.

Impractical_Island@lemmy.world · 21 hours ago

Then the dating pool I chase can now support me! Wonderful!

GreenKnight23@lemmy.world · 21 hours ago

Evotech@lemmy.world · 22 hours ago

Honestly I think Google is pretty fucked in the long term

Nobody googles anymore. They just ask chat

NickwithaC@lemmy.world · 10 hours ago

Google knows that. This is their response.

givesomefucks@lemmy.world · 1 day ago

We put this real question to AI Mode, “Why does Google Search suck now?” And you won’t believe the answer! (Actually, you won’t be surprised.)

The Chocolate Factory’s AI search substitute offered this completely trustworthy reply:

“You are definitely not imagining it. Users, tech critics, and researchers have documented a measurable decline in Google Search quality. The core issue is that the search engine no longer feels like a tool designed to find the best corner of the web; instead, it feels like a vehicle designed to keep you on Google-owned properties or clicking on monetized links.

“The degradation of Google Search stems from a mix of aggressive monetization, an ongoing arms race with web spammers, and the disruptive introduction of AI features.”

AI is gonna be the same path. Once people trust it, it’ll stop mattering if it gives the best info; what matters is if people buy products and believe advertisements/propaganda.

There just will never be as much money in providing nonbiased and accurate answers as there is in manipulating people. So as long as capitalism runs everything, the end goal will always be manipulating people for someone’s profit.

It’s not an accident, it’s not “enshittification”, it’s the natural and inevitable result of unregulated capitalism. The only way to avoid it is to heavily regulate capitalism. Acting like it’s a separate problem that can be avoided without regulating capitalism just makes people think we can make unregulated capitalism work. If we could, it would be working already.

lookingforanALFpolycule@lemmy.world · 1 day ago

Maybe put a nsfw tag on the image of a corpse?

Deebster@infosec.pub · edit-2 1 day ago

Are you referring to the photo of a baby eating meat from a bone?

lookingforanALFpolycule@lemmy.world · 1 day ago

binux@sh.itjust.works · 1 day ago

Do you faint whenever you hear about the existence of meat-eating animals?

lookingforanALFpolycule@lemmy.world · 1 day ago

I don’t faint when I hear about human murder but wouldn’t want human corpses on my timeline either

mabeledo@lemmy.world · 22 hours ago

The baby is not feasting on a human corpse. I don’t see the problem here.

GreenKnight23@lemmy.world · 21 hours ago

as a human who barbecues other humans, I can say without a doubt that is not a human.

the bone, however, looks like a scapula, which can make human meat quite tough unless it’s from a white collar worker like a banker or a data entry specialist.

it's not often that shit just works@sh.itjust.works · 17 hours ago

If you ever get the chance, do try the Arrogant CEO, A5 grade.

Pairs well with fava beans and a nice Chianti.

FartsWithAnAccent@fedia.io · 1 day ago

Maybe keep your head on a swivel so the cannibal babies (sponsored by Google) don’t eat you? Stay alert, stay safe.