• Dave@lemmy.nz
    English
    arrow-up
    29
    ·
    6 hours ago

    Running an instance without Cloudflare in front is hard work, because AI scrapers bring it to its knees. Even with Cloudflare it’s a never-ending battle to block them, but at least Cloudflare can help reduce the load, and even the free version comes with many tools to identify and block problematic bots.

    Though if you turn on bot blocking you break federation, so you have to be a lot more refined in your security rules.

      • Dave@lemmy.nz
        English
        arrow-up
        2
        ·
        2 hours ago

        Cloudflare’s bot detection triggers the blocking because federation looks a lot like a bot (well, it is a bot).

        For example, Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near-steady stream. It’s telling my instance about every post, comment, or vote. AI scrapers look the same: hundreds of thousands or millions of requests a day, in a near-steady stream.

        For all intents and purposes, federation is bot traffic and looks just like it. Typically I block by identifying high-traffic ASNs (an ASN is a group of IPs run by the same entity; blackhat AI scrapers use many IPs) and showing a Cloudflare challenge (which will typically have a 0% pass rate). If the traffic comes from one IP then it’s probably a federated instance, but I typically see many IPs from the same area with requests spread evenly across them.

        I also try to exclude federation/API endpoints, which can help stop false positives as scrapers are generally loading the web page.
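
        A rough sketch of that triage, written as an offline pass over access logs rather than as an actual Cloudflare rule. Everything named here is an assumption for illustration: the `(asn, ip, path)` record shape, the path prefixes in `FEDERATION_PREFIXES`, and the thresholds. The ASNs and IPs in the example come from ranges reserved for documentation.

```python
from collections import defaultdict

# Hypothetical path prefixes treated as federation/API traffic and skipped.
# Adjust these to whatever your instance actually serves.
FEDERATION_PREFIXES = ("/inbox", "/api/", "/.well-known/", "/nodeinfo")


def suspicious_asns(requests, min_requests=1000, min_ips=20, max_share=0.2):
    """Return ASNs whose web-page traffic looks like a distributed scraper.

    requests: iterable of (asn, ip, path) tuples, e.g. parsed from access logs.
    An ASN is flagged when it sends many requests from many IPs and no single
    IP accounts for more than max_share of them (an even spread).
    """
    per_asn = defaultdict(lambda: defaultdict(int))
    for asn, ip, path in requests:
        if path.startswith(FEDERATION_PREFIXES):
            continue  # federation/API endpoints are left out of the count
        per_asn[asn][ip] += 1

    flagged = []
    for asn, ip_counts in per_asn.items():
        total = sum(ip_counts.values())
        if total < min_requests or len(ip_counts) < min_ips:
            continue  # low volume, or few IPs (possibly a federated instance)
        if max(ip_counts.values()) / total <= max_share:
            flagged.append(asn)
    return sorted(flagged)


# 50 IPs in one ASN loading web pages evenly, plus one instance hitting /inbox.
scraper = [(64500, f"203.0.113.{i % 50}", "/post/123") for i in range(5000)]
federated = [(64501, "198.51.100.7", "/inbox") for _ in range(5000)]
print(suspicious_asns(scraper + federated))  # prints [64500]
```

        The flagged ASNs would then be candidates for a challenge rule; the thresholds need tuning per instance, since a large legitimate ISP can also show many IPs with an even spread.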

        This is something Lemmy (and PieFed, Mbin) admins try to help each other with by sharing strategies, because one day a bot will find you and suddenly your instance is down because it is hammering you too hard.

        I bet if you are in China, Brazil, Singapore, Argentina, etc., you will see a lot of blocked content on Lemmy, as this is often where the bot traffic comes from (Google, Facebook, OpenAI, Amazon, etc. will typically respect robots.txt, so US traffic is less of an issue).

        • Cooper8@feddit.online
          English
          arrow-up
          1
          ·
          52 minutes ago

          The thing that confuses me is, wouldn’t a whitelist for federated instances and request frequency throttling at the account level solve this issue?

          I suppose this would require that the client not have a public front end that keeps full navigation functionality, but for a smaller instance that seems like an easy sacrifice to make in exchange for stability.
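
          A minimal sketch of the idea as described, not of how Lemmy actually works: an allowlist for federated instances plus a token-bucket throttle per logged-in account, with anonymous browsing refused. `ALLOWED_INSTANCES`, `TokenBucket`, and `Gate` are hypothetical names, and the sketch takes the instance name on trust and does nothing to authenticate it.

```python
import time

# Hypothetical allowlist of federated instances vouched for by the admin.
ALLOWED_INSTANCES = {"lemmy.world", "piefed.social"}


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gate:
    """Decide whether a request is served, by instance allowlist or account quota."""

    def __init__(self, rate=1.0, capacity=5):
        self.rate, self.capacity = rate, capacity
        self.buckets = {}  # account name -> TokenBucket

    def allow(self, instance=None, account=None, now=None):
        if instance is not None:
            # Federation traffic: allowlisted instances pass, others are refused.
            return instance in ALLOWED_INSTANCES
        if account is None:
            return False  # anonymous browsing is refused in this model
        bucket = self.buckets.get(account)
        if bucket is None:
            bucket = self.buckets[account] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


gate = Gate(rate=1.0, capacity=2)
print(gate.allow(instance="lemmy.world"))      # prints True
print(gate.allow(instance="scraper.example"))  # prints False
print(gate.allow())                            # prints False
```

          The `now` parameter exists only so the throttle can be exercised with fixed timestamps; in normal use it is left out and `time.monotonic()` is used.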

          “But then how will new instances get federated?” Maybe they have to actually talk to the admins of other instances to get vouched onto the whitelist. Just because the network is distributed doesn’t mean it needs to be fully inclusive by default, and in fact it explicitly isn’t.

          I’m assuming I’m missing something super basic that makes all this not enough. Bots spoofing the requests with the credentials of a whitelisted instance, maybe?

          Seems like maybe the instances should have encrypted keys that handshake each other with batch requests.

          Am I on to something or just wildly gesticulating?

      • Dave@lemmy.nz
        English
        arrow-up
        8
        ·
        5 hours ago

        Cloudflare has a generous free tier. I think that’s why it got so popular.

        • dohpaz42@lemmy.world
          English
          arrow-up
          5
          ·
          4 hours ago

          Begs the question: when will its generous free tier go the route of other services’?

          • Dave@lemmy.nz
            English
            arrow-up
            5
            ·
            4 hours ago

            A good chance. Depends on if they think the free tier is still stacking up for them.

            E.g. getting their name out there with hobbyists means people recognise the name at work and staff are already familiar with it. Is this still important? Probably not, considering how widespread they are now.

            Being able to say in sales pitches that they mitigated X billion DDoS attacks and saved X trillion GB of data, etc.: maybe that is still worth enough to them to keep the free tier in order to win big contracts?

            Since they dropped their no-video-streaming clause from the T&Cs of free accounts, I’m guessing they aren’t about to back down on the unlimited bandwidth, but over time they are adding more and more value-add premium features, which may be their core strategy.

            But I do not doubt that they will drop or enshittify the free tier as soon as they think it’s the best strategic move.

  • sk1nnym1ke@piefed.social
    English
    arrow-up
    35
    ·
    edit-2
    6 hours ago

    Too lazy to create the meme. Insert the two astronauts looking at earth meme

    Wait, there is no decentralized internet?

    Always has been.

    • Kierunkowy74@piefed.social OP
      English
      arrow-up
      12
      ·
      7 hours ago

      Apparently there is a decentralized internet out there. We are just not experiencing it right now. Skill issue, huh?

      insert cursed wojak reaction