Seems like Chromium-based browsers run into issues with local DNS blocklists

antimidas@sopuli.xyz · edit-2 3 months ago

One of the main historical reasons was the Debian project’s purist approach to open source, meaning the distro was very picky about what hardware it could easily run on. As an example, most network drivers for Realtek NICs weren’t included out of the box as they contained non-free code, there was no direct way to install Nvidia drivers instead of nouveau, a lot of hardware didn’t work in the installer unless you sideloaded the drivers from a USB stick, and so on.

There was a non-free ISO version to get around this, but you needed to know of it to use it, and it wasn’t provided anywhere by default. The download page for it was just a barebone directory listing within the mirror. No link or information was provided for it on the main project page.

Starting from version 12, non-free firmware has been included in the official installation images, which removed the biggest pain point (IMO) for novice users. Apart from that, Debian has been one of the easier distros to install, and it offers a considerably better experience when upgrading to the next major release. It’s not really slower to update packages than Ubuntu either, since the fair comparison is with Ubuntu LTS: I’d be wary of recommending the non-LTS versions to novice users, as they tend to be quite unstable compared to LTS.

Personally I’ve daily driven Debian for close to five years, on all my devices except the work laptop. That one is running Ubuntu 24.04 as the employer requires either that or Fedora for Linux users.

antimidas@sopuli.xyz · 3 months ago

Android Open Source Project, it’s the open base that the actual Android releases are built upon. It’s not really usable as is, since it lacks the required kernel blobs and software that people have come to expect (like Google’s proprietary stuff).

antimidas@sopuli.xyz · 9 months ago

Yep, infuriatingly installers often default to small /boot volumes, and if you want to change that value better say goodbye to automatic partitioning. Although, after trying to make the installer behave, giving up and manually formatting the drive, I finally got the push required to set up both encrypted root and encrypted /home on separate drives.

Currently I use an 8 GiB /boot, but I really think the Debian installer should start making 2 GiB or even 4 GiB /boot the default now. Dumb to have the installer shoot itself in the foot like this. Ubuntu still does the same thing for some reason, as if we didn’t have room on the drives to fit a slightly more future-proof /boot there.

antimidas@sopuli.xyz · edit-2 1 year ago

Yep, precisely.

It’s also quite literally one of the recommended methods of installation for e.g. UHB, for which there’s even a pre-made script in the repo.

Edit: Also, Chromium devs are aware of this use case and have even added optimizations for it in the past, as visible in the highlighted comment. And the max hosts file size defaults to 32 MiB which is well over the size I’m using (24 MiB). Makes it even weirder for it to bog down completely when experimenting with a ~250 MiB hosts file, as it should just reject it outright according to implementation.

antimidas@sopuli.xyz · edit-2 1 year ago

Don’t seem to be any disk reads on request at a glance, though that might just be due to read caching on OS level. There’s a spike on first page refresh/load after dropping the read cache, so that could indicate reading the file in every time there’s a fresh page load. Would have to open the browser with call tracing to be sure, which I’ll probably try out later today.

For my other devices I use unbound hosted on the router, so this is the first time encountering said issue for me as well.

antimidas@sopuli.xyz · 1 year ago

> You’re using software to do something it wasn’t designed to do

As such, Chrome isn’t exactly following the best practices either – if you want to reinvent the wheel at least improve upon the original instead of making it run worse. True, it’s not the intended method of use, but resource-wise it shouldn’t cause issues – at this point one would’ve needed active work to make it run this poorly.

> Why would you even think to do something like this?

As I said, due to company VPN enforcing their own DNS for intranet resources etc. Technically I could override it with a single rule in configuration, but this would also technically be a breach of guidelines as opposed to the more moderate rules-lawyery approach I attempt here.

If it was up to me the employer should just add some blocklist to their own forwarder for the benefit of everyone working there…

But guess I’ll settle for local dnsmasq on the laptop for now. Thanks for the discussion 👌🏼

antimidas@sopuli.xyz · edit-2 1 year ago

TLDR: looks like you’re right, although Chrome shouldn’t be struggling with that amount of hosts to chug through. This ended up being an interesting rabbit hole.

My home network already uses unbound with a proper blocklist configured, but I can’t use the same setup directly with my work computer as the VPN sets its own DNS. I can only override this with a local resolver on the work laptop, and I’d really like to get by with just systemd-resolved instead of having to add dnsmasq or similar for this. None of the other tools I use struggle with this setup, as they use the system IP stack.

Might well be that Chromium has a somewhat more sophisticated network stack (than just using the system-provided libraries), and I remember the docs indicating something about that being the case. Either way, it’s not like the code is (or should be) paging through the whole file every time there’s a query – either it forwards the query to another resolver or resolves it locally, but in both cases there will be a cache. That cache will then end up holding the queried domains in order of access, after which having a long /etc/hosts won’t matter. The worst case after initially paging in the hosts file is 3-5 ms (per query) for comparing against all of the 100k-700k lines without finding a match, and that only needs to happen once per domain regardless of where the actual resolving takes place. At a glance, the Chrome net stack should cache lookups against the hosts file as well. So at the very least it doesn’t make sense for it to struggle for 5-10 seconds on every consecutive refresh of the page with a warm DNS cache in memory…

…or that’s how it should happen. Your comment inspired me to test it a bit more, and lo: after trying out a hosts file with 10 000 000 bogus entries, Chrome was brought completely to its knees. However, that amount of string comparisons is absolutely nothing in practice – Python with its plain lists and slow interpreter manages comparing against every row in 300 ms, and a crude C implementation manages it in 23 ms (approx. 2 ms with 1 million rows, both a lot more than what I have appended to the hosts file). So the file being long should have nothing to do with it unless there’s something very wrong with the implementation. Comparing against /etc/hosts should be cheap as it doesn’t support wildcard entries – as such the comparisons are just a simple 1:1 check that stops at the first matching row. I’ll continue investigating and see if there’s a quick change to be made in how the hosts are read in. Fixing this shouldn’t cause any issues for other use cases from what I see.

For reference, if you want to check the performance for 10 million comparisons on your own hardware:

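#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N_ROWS 10000000

int main(void) {
	struct timeval start_t;
	struct timeval end_t;

	/* Build N_ROWS bogus host names to compare against. */
	char **strs = malloc(sizeof(char *) * N_ROWS);
	if (strs == NULL) {
		return 1;
	}
	for (int i = 0; i < N_ROWS; i++) {
		char *urlbuf = malloc(50);
		if (urlbuf == NULL) {
			return 1;
		}
		snprintf(urlbuf, 50, "%d.bogus.local", i);
		strs[i] = urlbuf;
	}

	printf("Checking comparisons through array of 10M strings.\n");
	gettimeofday(&start_t, NULL);

	/* Count matches so the compiler cannot drop the loop as dead code. */
	long matches = 0;
	for (int i = 0; i < N_ROWS; i++) {
		if (strcmp(strs[i], "test.url.local") == 0) {
			matches++;
		}
	}

	gettimeofday(&end_t, NULL);

	/* Include the seconds field; tv_usec alone wraps around every second. */
	long duration = ((long)(end_t.tv_sec - start_t.tv_sec) * 1000000L
		+ (long)(end_t.tv_usec - start_t.tv_usec)) / 1000;
	printf("Spent %ld ms on the operation (%ld matches).\n", duration, matches);

	for (int i = 0; i < N_ROWS; i++) {
		free(strs[i]);
	}
	free(strs);
	return 0;
}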
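
Going one step further than the linear scan above: since hosts entries are exact matches, a lookup doesn't need to touch every row at all. Below is a rough sketch of an open-addressing hash set for host names. The `host_set` type and its functions are invented for this example and say nothing about how Chromium actually stores its hosts entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical open-addressing set of host names (illustration only). */
typedef struct {
	const char **slots; /* NULL marks an empty slot */
	size_t capacity;    /* always a power of two */
	size_t count;       /* number of stored names */
} host_set;

/* FNV-1a, a simple non-cryptographic string hash. */
uint64_t host_hash(const char *s) {
	uint64_t h = 14695981039346656037ULL;
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 1099511628211ULL;
	}
	return h;
}

/* Allocates an empty set; capacity must be a power of two. */
bool host_set_init(host_set *set, size_t capacity) {
	assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
	set->slots = calloc(capacity, sizeof(*set->slots));
	set->capacity = capacity;
	set->count = 0;
	return set->slots != NULL;
}

/* Stores the pointer (no copy). Returns false when the set is full. */
bool host_set_add(host_set *set, const char *name) {
	size_t i = host_hash(name) & (set->capacity - 1);
	while (set->slots[i] != NULL) {
		if (strcmp(set->slots[i], name) == 0) {
			return true; /* already present */
		}
		i = (i + 1) & (set->capacity - 1);
	}
	/* Keep one slot free so that probing always terminates. */
	if (set->count + 1 >= set->capacity) {
		return false;
	}
	set->slots[i] = name;
	set->count++;
	return true;
}

/* Exact-match lookup: probes until the name or an empty slot is found. */
bool host_set_contains(const host_set *set, const char *name) {
	size_t i = host_hash(name) & (set->capacity - 1);
	while (set->slots[i] != NULL) {
		if (strcmp(set->slots[i], name) == 0) {
			return true;
		}
		i = (i + 1) & (set->capacity - 1);
	}
	return false;
}
```

Fill it once after reading the file (with a capacity comfortably above the number of rows), and every later query costs a hash plus a handful of comparisons instead of a pass over the whole list. The set only borrows the string pointers, so the caller keeps ownership of the names and frees `slots` when done.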

antimidas@sopuli.xyz · 1 year ago

Thanks for the suggestion – I’ll have to give a try to the ungoogled version.

antimidas@sopuli.xyz · 1 year ago

Seems like Chromium-based browsers run into issues with local DNS blocklists

antimidas@sopuli.xyz · 1 year ago

Yep, that’s a bit of a sketchy thing, and probably indeed has to do with marketing and getting more funding. Overhyping their quantum stuff might also have something to do with them trying to hide the poor image of their latest AI “achievements”.

But I’m mainly worried all these companies crying wolf will cause people in relevant fields to push back on implementing quantum-proof encryption – multiple companies are making considerable progress with quantum computing and it’s not a threat to be ignored.

antimidas@sopuli.xyz · 1 year ago

There’s still noticeable incremental progress, and since liboqs is out now, and the first somewhat quantum-proof algorithms are out with working initial implementations, I see no reason why you wouldn’t want to move to a hybrid solution for now, just in case. Especially with more sensitive data like communication, healthcare and banking.

Just encapsulate the current asymmetric stuff with oqs, e.g. ed25519 inside ML-KEM. That way you’ll have an added layer of security on top of the oqs implementation just in case there are growing pains, since the library hasn’t yet passed audits and is yet to be fully peer-reviewed.

Cryptography has to be unbreakable for multiple decades, and the added headroom is a small price to pay for future security. Health data e.g. can have an impact on a person even 30 years later, so we have a responsibility to ensure this data can’t be accessed without authorization even that far in the future. No one can guarantee it’ll not be possible, but we should at least make our best effort to achieve that.

Have we really not gotten past collectively shooting ourselves in the foot with poor security planning? Even AWS was allowing SHA-1 signatures for authentication as recently as 2014, over a decade after it was deemed to be insecure. Considering how poorly people do key management, it’s reasonable to expect there are old AWS-style requests out there whose still-working keys could be brute-forced.

No, we don’t have working quantum computers that threaten encryption now. Yes, it is indeed feasible this technology matures in the next 30 years, and that’s the assumption we need to work with.

antimidas@sopuli.xyz · 2 years ago

Given that there are engineers involved I wouldn’t be at all surprised if that was deliberate. Trying to get potentially offensive or otherwise NSFW acronyms past marketing without them noticing is practically an industry-wide joke at this point, which is why they are so prevalent in the FOSS space. (no marketing staff to complain)

If that’s true in this case, though, hats off to whoever managed to get it through to official commercial standards.

antimidas@sopuli.xyz · 3 years ago

And we’ve nowadays taken it even further, in spoken Finnish we’ve even got rid of the “hän” and mostly use “se”, which is the Finnish word for “it”. The same pronoun is used for people in all forms, animals, items, institutions and so on, and in practice the only case for “hän” is people trying to remind others they consider their pets human.

Context will tell which one it is.