My limited knowledge on this subject: the z-score is how many standard deviations a value is from the mean, i.e. z = (x − mean) / (standard deviation).
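For instance (toy numbers, just to make the definition concrete), here's that formula in Python:

```python
# Toy example with made-up numbers: the z-score of one value
# relative to a sample's mean and standard deviation.
import statistics

scores = [62, 68, 70, 71, 74, 75, 80]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

z = (85 - mean) / sd  # how many SDs above the mean 85 sits
print(f"z = {z:.2f}")
```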
In statistical analysis, results are often evaluated against a p-value (probability) threshold of 0.05 (or 5%), which for a two-tailed test corresponds to a z-score of ±1.96 (roughly ±2).
So, when you’re looking at your data, things with a z-score > 1.96 or < −1.96 (i.e. roughly |z| > 2) would count as “statistically significant”: if there were actually no effect, a result that extreme would show up by chance less than 5% of the time.
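If it helps, here's where that 1.96 ↔ 0.05 correspondence comes from, as a minimal Python sketch (scipy is just my choice here, not anything from the original post):

```python
# How a z-score maps to a two-tailed p-value under a normal
# distribution, and vice versa.
from scipy.stats import norm

z = 1.96
# Probability of landing at least |z| standard deviations from
# the mean, in either direction.
p = 2 * norm.sf(abs(z))
print(p)  # ~0.05

# The reverse: the z cutoff for a two-tailed p of 0.05.
print(norm.ppf(1 - 0.05 / 2))  # ~1.96
```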
As others here have pointed out, z-scores closer to 0 correspond to findings where the researchers couldn’t be confident that whatever was being tested was any different from the control, akin to a boring paper that wouldn’t get published: “We tried some stuff but idk, didn’t seem to make a difference.” But it could also make for an interesting paper: “We tried putting healing crystals above cancer patients but it didn’t seem to make any difference.”
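To put a number on that (same sketch as above): a z-score of, say, 0.3 corresponds to a two-tailed p of about 0.76, i.e. the kind of result random chance produces most of the time.

```python
from scipy.stats import norm

# A z-score near 0 is well within what chance alone would
# produce, nowhere near the 1.96 cutoff.
print(2 * norm.sf(0.3))  # ~0.76, far above 0.05
```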


There’s certainly a lot to discuss regarding experimental design and ethics. Peer review and good design hopefully minimize the clearly undesirable scenarios you describe, as well as other, subtler sources of error.
I was really just trying to explain what we’re looking at in OP’s graph.