P-Value Calculator

P-value from a z-score (normal)

Calculators

P-Value Calculator

P-value from a z-score (normal)

Z-score

P-value (two-tailed)

Updates as you type

About P-Value Calculator

A p-value is the probability of observing test results at least as extreme as what you got, assuming the null hypothesis is true. The Toolenza calculator converts a z-statistic (or t-statistic with degrees of freedom) into a one- or two-tailed p-value via the normal / Student-t distribution.

What p-value actually means (and doesn't)

It does not mean "the probability the null is true" — that's the famous misinterpretation. It means: if the null were true, how often would we see this result or one more extreme by chance.

A p-value of 0.03 means: if there's truly no effect, we'd still see a result this strong (or stronger) 3% of the time by random sampling.

The 0.05 threshold

Fisher proposed 0.05 in 1925 as a conventional cutoff, with the caveat that it shouldn't be treated as a rule. A century later, "p < 0.05 = significant" has hardened into automatic-reasoning across the social sciences — and the replication crisis is the consequence. Modern best practice:

Pre-register your hypothesis before collecting data. P-hacking (re-running analyses until p < 0.05 appears) is the #1 source of irreproducible findings.
Report effect size, not just p. A tiny p-value with a tiny effect is statistical significance without practical significance.
Use confidence intervals to communicate the uncertainty range, not p-values to communicate a binary verdict.
For exploratory analyses, use 0.01 or stricter; reserve 0.05 for confirmatory work.

Pitfalls

Multiple comparisons — if you test 20 outcomes at p < 0.05, expect 1 false positive by chance. Adjust with Bonferroni or false-discovery-rate methods. Small samples — p-values from n < 30 are unreliable for non-normal data; use bootstrapping or exact tests instead.

Frequently asked questions

If the null hypothesis were true, there's a 3% chance of observing data at least as extreme as yours. It does NOT mean the null is 3% likely to be true.

Fisher's convention from the 1920s — there's nothing magical about it. Modern guidance emphasises effect sizes, confidence intervals, and pre-registration over arbitrary thresholds.

Two-tailed is the default — you're testing for any difference. One-tailed if you have a directional hypothesis stated before collecting data.

Running many tests until one crosses 0.05 by chance, then reporting only that one. Pre-register your hypothesis and analysis plan to avoid it.

Embed this tool on your site

Drop a one-line iframe snippet into any blog, lesson plan, or knowledge base. Powered-by-Toolenza link included.

Embed this tool

Paste this snippet into any HTML page. The tool runs entirely in your reader's browser.

<iframe src="https://toolenza.io/tools/p-value" width="100%" height="600" frameborder="0" loading="lazy" allow="clipboard-write" style="border:0;border-radius:12px;box-shadow:0 1px 3px rgba(0,0,0,.08);"></iframe>

Preview embed →