P-Value Calculator
P-value from a z-score (normal)
About P-Value Calculator
A p-value is the probability of observing test results at least as extreme as the ones you got, assuming the null hypothesis is true. The Toolenza calculator converts a z-statistic (or a t-statistic with its degrees of freedom) into a one- or two-tailed p-value using the standard normal or Student's t distribution.
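As a rough illustration of the z-statistic conversion described above, here is a minimal Python sketch using only the standard library. It is not Toolenza's implementation; the function names `normal_sf` and `p_value_from_z` and the `tail` labels are made up for this example. The t-statistic case needs the Student's t distribution, which the standard library does not provide, so it is left out here.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_value_from_z(z: float, tail: str = "two") -> float:
    """Convert a z-statistic to a p-value (hypothetical helper).

    tail: "two" (any difference), "upper" (effect above zero),
          or "lower" (effect below zero).
    """
    if tail == "two":
        return 2.0 * normal_sf(abs(z))
    if tail == "upper":
        return normal_sf(z)
    if tail == "lower":
        return normal_sf(-z)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

print(round(p_value_from_z(1.96), 4))           # 0.05
print(round(p_value_from_z(1.96, "upper"), 4))  # 0.025
```

For the same z, the two-tailed value is twice the one-tailed value in the direction of the effect, which is why the choice of tail has to be made before looking at the data.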
What p-value actually means (and doesn't)
A p-value is not "the probability the null is true"; that's the famous misinterpretation. It answers a different question: if the null were true, how often would we see this result, or one more extreme, by chance?
A p-value of 0.03 means: if there's truly no effect, we'd still see a result this strong (or stronger) 3% of the time by random sampling.
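One way to see this is to simulate many experiments in which the null really is true and count how often the p-value lands at or below 0.03. The sketch below assumes a z-test whose statistic is standard normal under the null; the function names, the number of simulated experiments, and the seed are arbitrary choices for illustration.

```python
import math
import random

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def fraction_at_or_below(threshold: float, n_experiments: int = 20000, seed: int = 0) -> float:
    """Share of simulated null experiments whose p-value is <= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        z = rng.gauss(0.0, 1.0)  # test statistic when there is truly no effect
        if two_tailed_p(z) <= threshold:
            hits += 1
    return hits / n_experiments

# Should come out close to 0.03, up to simulation noise.
print(fraction_at_or_below(0.03))
```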
The 0.05 threshold
Fisher proposed 0.05 in 1925 as a conventional cutoff, with the caveat that it shouldn't be treated as a rule. A century later, "p < 0.05 = significant" has hardened into a reflex across the social sciences, and it is widely blamed as a contributor to the replication crisis. Modern best practice:
- Pre-register your hypothesis before collecting data. P-hacking (re-running analyses until p < 0.05 appears) is a major source of irreproducible findings.
- Report effect size, not just p. A tiny p-value with a tiny effect is statistical significance without practical significance.
- Use confidence intervals to communicate the uncertainty range, not p-values to communicate a binary verdict.
- For exploratory analyses, use 0.01 or stricter; reserve 0.05 for confirmatory work.
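To make the effect-size and confidence-interval points concrete, the sketch below reports a p-value alongside Cohen's d and a 95% confidence interval for a difference in means. It assumes two equal-sized groups with a common, known standard deviation (a z-test); the function name and the example numbers are invented for illustration.

```python
import math

def summarize_difference(mean_a: float, mean_b: float, sd: float, n_per_group: int) -> dict:
    """Two-sample z summary, assuming equal SDs and equal group sizes."""
    diff = mean_b - mean_a
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-tailed p-value
    cohen_d = diff / sd                      # standardized effect size
    ci95 = (diff - 1.96 * se, diff + 1.96 * se)
    return {"p": p, "cohen_d": cohen_d, "ci95": ci95}

# Made-up example: a 0.2-point difference on a scale with SD 10,
# measured on 100,000 people per group.
result = summarize_difference(mean_a=100.0, mean_b=100.2, sd=10.0, n_per_group=100_000)
print(result)
```

With a sample this large the p-value is far below 0.05, yet Cohen's d is only about 0.02 and the interval shows the difference is a fraction of a point: statistically significant, but probably not practically important.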
Pitfalls
Multiple comparisons: if you test 20 independent outcomes at p < 0.05 and no real effects exist, expect about 1 false positive on average, and a roughly 64% chance of at least one. Adjust with Bonferroni or false-discovery-rate methods.
Small samples: p-values from normal-theory tests (z or t) with n < 30 are unreliable for non-normal data; use bootstrapping or exact tests instead.
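The multiple-comparisons pitfall above can be sketched in a few lines. The example below computes the chance of at least one false positive across independent tests, then applies a Bonferroni cutoff and the Benjamini-Hochberg step-up procedure (one common false-discovery-rate method). The function names and the list of p-values are hypothetical, chosen only to show how the two adjustments differ.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p <= cutoff for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false-discovery-rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_rank = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:largest_rank]:
        rejected[i] = True
    return rejected

print(round(familywise_error_rate(0.05, 20), 2))  # 0.64

example_p = [0.001, 0.012, 0.025, 0.039, 0.2]  # made-up p-values
print(bonferroni(example_p))          # [True, False, False, False, False]
print(benjamini_hochberg(example_p))  # [True, True, True, True, False]
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected share of false positives among the rejected hypotheses, so it usually keeps more findings.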
Frequently asked questions
What does a p-value of 0.03 mean?
If the null hypothesis were true, there would be a 3% chance of observing data at least as extreme as yours. It does NOT mean the null is 3% likely to be true.
Why is 0.05 the usual threshold?
It is Fisher's convention from the 1920s; there's nothing magical about it. Modern guidance emphasises effect sizes, confidence intervals, and pre-registration over arbitrary thresholds.
Should I use a one-tailed or two-tailed test?
Two-tailed is the default: you're testing for any difference. Use one-tailed only if you have a directional hypothesis stated before collecting data.
What is p-hacking?
Running many tests until one crosses 0.05 by chance, then reporting only that one. Pre-register your hypothesis and analysis plan to avoid it.
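To illustrate how easily this happens, the simulation below imagines studies that each measure 20 outcomes with no real effect anywhere and then report only the smallest p-value. It assumes independent z-tests; the function names, the number of simulated studies, and the seed are arbitrary choices for this sketch.

```python
import math
import random

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def share_of_studies_with_a_hit(n_outcomes: int = 20, n_studies: int = 5000,
                                alpha: float = 0.05, seed: int = 1) -> float:
    """Share of all-null studies whose smallest p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Every outcome is pure noise; keep only the best-looking one.
        smallest_p = min(two_tailed_p(rng.gauss(0.0, 1.0)) for _ in range(n_outcomes))
        if smallest_p < alpha:
            hits += 1
    return hits / n_studies

# Roughly 0.64 with 20 outcomes, versus roughly 0.05 with a single planned test.
print(share_of_studies_with_a_hit())
print(share_of_studies_with_a_hit(n_outcomes=1))
```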