Today was when I realized dbzer0 doesn’t use Cloudflare. Which is kinda on brand for them tbh.
Running an instance without Cloudflare in front is hard work, because AI scrapers bring it to its knees. It’s a never-ending battle to block them even with Cloudflare, but at least Cloudflare can help reduce the load, and even the free version comes with many tools to identify and block problematic bots.
Though if you turn on bot blocking you break federation, so you have to be a lot more refined in your security rules.
Why does turning on bot blocking break federation?
Cloudflare’s bot detection triggers the blocking because federation looks a lot like a bot (well, it is a bot).
For example, Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It’s telling my instance about every post, comment, or vote. AI scrapers also send hundreds of thousands or millions of requests a day, in a near steady stream.
For all intents and purposes, federation is bot traffic and looks just like it. Typically I block by identifying high-traffic ASNs (an ASN is a group of IPs run by the same entity, which matters because blackhat AI scrapers use many IPs) and showing a Cloudflare challenge (which will typically have a 0% pass rate). If the traffic comes from 1 IP then it’s probably a federated instance, but with scrapers I typically see many IPs from the same area, with the requests spread evenly across them.
I also try to exclude federation/API endpoints, which can help stop false positives as scrapers are generally loading the web page.
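To make the idea concrete, here is a rough sketch of that kind of rule: skip federation/API paths, count the rest per ASN, and only flag ASNs that are both busy and spread over many IPs. The path prefixes, thresholds, and function names are all assumptions made up for illustration. It is not how Cloudflare rules are actually written, and real paths depend on the software you run.

```python
from collections import Counter

# Hypothetical prefixes treated as federation/API traffic (assumption, adjust per software).
EXEMPT_PREFIXES = ("/inbox", "/api/", "/.well-known/")


def is_exempt(path: str) -> bool:
    """True if the request path looks like federation/API traffic."""
    return path.startswith(EXEMPT_PREFIXES)


def asns_to_challenge(requests, min_requests=1000, min_distinct_ips=20):
    """Return the ASNs that would be shown a challenge.

    `requests` is an iterable of (asn, ip, path) tuples, e.g. parsed from logs.
    An ASN is flagged only if it sends a lot of non-exempt requests AND those
    requests come from many different IPs. The thresholds are made-up defaults.
    """
    hits = Counter()
    ips_by_asn = {}
    for asn, ip, path in requests:
        if is_exempt(path):
            continue  # federation/API endpoints are never counted
        hits[asn] += 1
        ips_by_asn.setdefault(asn, set()).add(ip)
    return sorted(
        asn
        for asn, count in hits.items()
        if count >= min_requests and len(ips_by_asn[asn]) >= min_distinct_ips
    )
```

With these made-up thresholds, a single IP hammering web pages is left alone (it might be a federated instance), while 30 IPs in one ASN each loading pages at a steady rate get flagged.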
This is something Lemmy (and PieFed, Mbin) admins try to help each other with by sharing strategies, because one day a bot will find you and suddenly your instance is down because it is hammering you too hard.
I bet if you are in China, Brazil, Singapore, Argentina, etc., then you will see a lot of blocked content on Lemmy, as this is often where the bot traffic comes from (Google, Facebook, OpenAI, Amazon, etc. will typically respect the robots.txt, so US traffic is less of an issue).
The thing that confuses me is, wouldn’t a whitelist for federated instances and request frequency throttling at the account level solve this issue?
I suppose this would require that the client not have a public front end that keeps full navigation functionality, but for a smaller instance that seems like an easy sacrifice to make in exchange for stability.
“But then how will new instances get federated?” Maybe they have to actually talk to the admins of other instances to get vouched in to the whitelist. Just because the network is distributed doesn’t mean it needs to be fully inclusive by default, and in fact it explicitly isn’t.
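Roughly what I have in mind, as a sketch only: a whitelist check for instances plus a token bucket per logged-in account, with anonymous browsing simply refused. The instance names are just ones mentioned in this thread, and `Gatekeeper`, `TokenBucket`, the rate, and the capacity are all made-up assumptions, not anything Lemmy or PieFed actually ships.

```python
import time

# Hypothetical whitelist of vouched-for instances (assumption for illustration).
ALLOWED_INSTANCES = {"lemmy.world", "sh.itjust.works"}


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gatekeeper:
    """Whitelisted instances pass; accounts are throttled; anonymous is refused."""

    def __init__(self, rate=1.0, capacity=5):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def check(self, *, instance=None, account=None, now=None):
        if instance is not None:
            # Federation traffic: only vouched-for instances get through.
            return instance in ALLOWED_INSTANCES
        if account is None:
            # No public front end in this model, so no anonymous requests.
            return False
        bucket = self.buckets.get(account)
        if bucket is None:
            bucket = self.buckets[account] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

Note that this trusts whatever instance name the request claims, so on its own it does nothing against a bot pretending to be a whitelisted instance.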
I’m assuming I’m missing something super basic that makes all this not enough. Bots spoofing the requests with the credentials of a whitelisted instance, maybe?
Seems like maybe the instances should have encrypted keys that handshake each other with batch requests.
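Something like this, very roughly. It is a toy that signs a whole batch of activities with a shared secret using HMAC, so the receiver can reject anything that was not signed by a known instance. The shared-secret scheme and the `sign_batch` / `verify_batch` names are my own stand-ins for illustration. As far as I know, real federation signs each request with per-instance public keys (HTTP Signatures) rather than doing anything like this.

```python
import hashlib
import hmac
import json


def sign_batch(secret: bytes, activities: list) -> dict:
    """Serialize a batch of activities and attach an HMAC-SHA256 signature."""
    body = json.dumps(activities, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "signature": signature}


def verify_batch(secret: bytes, envelope: dict):
    """Return the activities if the signature matches, otherwise None."""
    body = envelope["body"].encode()
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, envelope["signature"]):
        return None
    return json.loads(body)
```

A spoofed batch without the right secret, or a batch edited in transit, fails the check and comes back as `None`.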
Am I on to something or just wildly gesticulating?
I’m very much a Jon Snow when it comes to how DDoS mitigation, DNS, etc. actually work. But surely there are other corporations that offer the same services and work just as well as or better than Cloudflare, no?
Cloudflare has a generous free tier. I think that’s why it got so popular.
Begs the question: when will its generous free tier go the route of other services?
A good chance. Depends on if they think the free tier is still stacking up for them.
E.g. getting their name out there with hobbyists means people recognise the name at work and staff are already familiar with it. Is this still important? Probably not, considering how widespread they are now.
Being able to say in sales speeches that they mitigate X billion DDoS attacks and save X trillion GB of data, etc. Maybe that is still worth it to them to keep the free tier in order to win big contracts?
Since they dropped their no video streaming clause from the T&Cs of free accounts, I’m guessing they aren’t about to back down on the unlimited bandwidth, but over time they are adding more and more value-add premium features, which may be their core strategy.
But I do not doubt that they will drop or enshittify the free tier as soon as they think it’s the best strategic move.
Truth.
Too lazy to create the meme. Insert the two astronauts looking at earth meme
Wait, there is no decentralized internet?
Always has been.
Apparently there is a decentralized internet out there. Just we are not experiencing it right now. Skill issue, huh?
insert cursed wojak reaction
Was this the Cloudflare thing? I didn’t even notice which Lemmy servers were using that
Mine were PieFed.social, PieFed.zip and fedit.pl. I am aware of lemmy.world too. Probably there were more
sh.itjust.works was out
Breath of fresh air
Always have an extra alt
Just only one extra alt, I swear…
Just one billion more alts bro!
And 43 proxies, a vpn, tor, ip2, irc, archie, finger, and fortran!









