• NuXCOM_90Percent@lemmy.zip
    3 hours ago

    > A lot of people don’t understand how AI training and AI inference work, they are two completely separate processes.

    Yes, they are. Not sure why you are bringing that up.

    For those wondering what the actual difference is (possibly because they don’t seem to know):

    At a high level, training is when you ingest data to create a model based on characteristics of that data. Inference is when you then apply a model to (preferably new) data. So think of training as “teaching” a model what a cat is, and inference as having that model scan through images for cats.
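
    To make the split concrete, here is a toy sketch. It is not how any real product does it: the nearest-centroid approach, the made-up 2-D "features", and the names `train` and `infer` are all assumptions for illustration.

```python
# Toy illustration only: a nearest-centroid "cat detector" on made-up 2-D features.
# The function names and the data are hypothetical, not taken from any real system.

def train(labeled_examples):
    """Training: ingest labeled data and build a model (one centroid per label)."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def infer(model, features):
    """Inference: apply the finished model to new data. The model is only read."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


training_data = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.1, 0.2), "not cat"),
    ((0.2, 0.1), "not cat"),
]

model = train(training_data)             # the "teaching" step
prediction = infer(model, (0.7, 0.75))   # the "scan new images" step
print(prediction)
```

    Note that `infer` never writes to `model`. Whether your data ends up in a later call to `train` is a separate decision from whether the model gets run on it.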

    And a huge part of making a good model is providing good data. That is, generally speaking, done by labeling things ahead of time. Back in the day, that meant paying people to take an Amazon survey where they answered “hot dog or no hot dog”. These days… it is “anti-bot” technology that gets those labels for free (think about WHY every single website cares which squares contain a fire hydrant or a bicycle…).

    But labeling is ALSO just simple metrics like “Did the user use what we suggested?” Instead of “hot dog” or “not hot dog”, the label is “good reply” or “no reply”, or “still read email” or “ignored email”, and so forth.
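
    A rough sketch of that kind of implicit labeling, assuming made-up event names (real products define their own signals, and nothing here is any vendor's actual pipeline):

```python
# Hypothetical event names, chosen only for illustration.
POSITIVE_EVENTS = {"accepted_suggestion", "read_email"}
NEGATIVE_EVENTS = {"dismissed_suggestion", "ignored_email"}


def label_events(events):
    """Turn (item, event) pairs into (item, label) examples: 1 = good, 0 = bad."""
    examples = []
    for item, event in events:
        if event in POSITIVE_EVENTS:
            examples.append((item, 1))
        elif event in NEGATIVE_EVENTS:
            examples.append((item, 0))
        # Events in neither set carry no usable signal and are skipped.
    return examples


events = [
    ("suggested reply A", "accepted_suggestion"),
    ("suggested reply B", "dismissed_suggestion"),
    ("newsletter 17", "ignored_email"),
    ("newsletter 18", "opened_settings"),
]

labeled = label_events(events)
print(labeled)
```

    Nobody filled in a survey here. The labels fall out of ordinary usage.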

    And once you know what your pain points are with TOTALLY anonymized user data, you can then “reproduce” said user data to add to your training set. This is the kind of bullshit Facebook, allegedly, has done for years: they’ll GLADLY delete your data if you request it… but not that picture of you at the McDonald’s down the street, because that belongs to Ronjon Buck who worked there one summer. And they’ll gladly anonymize your user data, so the picture of you actually just corresponds to “User 25156161616”, who happens to be the sibling of your sister, and so forth…

    > in fact a lot of research is being done right now trying to make it possible to do both because it would be really handy to be able to do them together and it can’t really be done like that yet.

    That is literally just a feedback loop and is core to pretty much any “agentic” network/graph.

    > Go ahead and do so, they will have separate sections specifically about the use of data for training. Data privacy is regulated by a lot of laws, even in the United States, and corporate users are extremely picky about that sort of stuff.

    There also tend to be laws about opting in and forced EULAs. It is almost like the megacorps have acknowledged that they’ll just do whatever and MAYBE pay a fine after they have already made far more money than it costs them.

    • FaceDeer@fedia.io
      2 hours ago

      > Yes, they are. Not sure why you are bringing that up.

      I am bringing it up because the setting Google is presenting only describes using AI on your data, not training AI on your data.