What is a Breach Corpus?

A breach corpus is a publicly disclosed dataset of credentials, identifiers, or records exposed in a documented security incident.

Definition

When a major service is breached and the attacker dumps the stolen data publicly, security researchers index that data so users can check whether their accounts were affected. The collection of these indexed disclosures is called a breach corpus. The most authoritative public breach corpus is Have I Been Pwned (HIBP), maintained by Troy Hunt. Other corpora aggregate re-published dumps and add fields HIBP doesn't expose (password hashes, IP addresses, usernames).

Working with breach corpora is legal and standard practice in OSINT, because the data is publicly available once it has been disclosed. What matters is what you do with it. Checking your own credentials is legal and recommended. Helping a fraud team check whether a new customer's email shows up in known breaches is legal and standard. Stuffing leaked credentials into login forms is illegal regardless of where you got them. Skopio uses HIBP-class corpora plus aggregated re-published dumps and exposes search by email, phone, username, IP, and password hash (using a k-anonymity protocol, so the plaintext password never leaves your device).

Real-world examples

  1. Have I Been Pwned (HIBP): the canonical public breach corpus
  2. The 'Collection #1-#5' aggregated breach drops (2019)
  3. LinkedIn 2021 (700M records exposed through scraping)
  4. Adobe 2013 (153M records, encrypted passwords and plaintext password hints leaked)
  5. Twitter 2022 (5.4M records exposed via API vulnerability)

Frequently asked questions

Is it legal to use breach corpora?

Yes, for verification and security research. Breach data is publicly available once it has been disclosed. Using leaked credentials to break into accounts is illegal; that is credential stuffing, which is separate from corpus indexing.

Should I be checking my own breach exposure?

Yes — regularly. New breaches are disclosed weekly. HIBP's free notify-me-on-new-breach service is the simplest way; Skopio's email category gives the same answer plus social/WHOIS context.

Where do breach corpora actually come from?

Attackers publish dumps on forums or paste sites. Security researchers download, deduplicate, normalize, and index them. The corpus exposes 'is this email in breach X?' without redistributing the underlying credentials.
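
The normalize, deduplicate, and index steps can be sketched roughly as follows. This is a minimal illustration and not how HIBP or Skopio actually build their corpora: the record layout (dicts with an "email" key), the function names, and the breach names in the usage note are all hypothetical. The point it shows is that the index keeps only "email appears in breach X" and never copies passwords or other leaked fields.

```python
from __future__ import annotations

from collections import defaultdict


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase so the same address always compares equal."""
    return raw.strip().lower()


def build_index(dumps: dict[str, list[dict[str, str]]]) -> dict[str, list[str]]:
    """Map each normalized email to the sorted names of the breaches it appears in.

    `dumps` maps a (hypothetical) breach name to its raw records. Only the
    "email" field is read; passwords and other fields are never copied.
    """
    index = defaultdict(set)
    for breach_name, records in dumps.items():
        for record in records:
            email = normalize_email(record.get("email", ""))
            if email:  # skip records with no usable email
                index[email].add(breach_name)  # the set deduplicates repeats
    return {email: sorted(names) for email, names in index.items()}


def breaches_for(index: dict[str, list[str]], email: str) -> list[str]:
    """Answer 'which breaches contain this email?' from the index alone."""
    return index.get(normalize_email(email), [])
```

For example, feeding in two made-up dumps named "ExampleShop" and "ExampleForum" that both contain differently cased copies of the same address yields a single index entry listing both breach names.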

Are passwords stored in breach corpora?

Some breaches exposed passwords in plaintext (RockYou 2009) or under weak, reversible encryption (Adobe 2013); most exposed only hashes. Modern breach search uses k-anonymity protocols (only the first 5 characters of the password's hash are sent), so plaintext passwords never travel.
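
As a rough sketch of the k-anonymity lookup described above, the example below follows the scheme used by HIBP's Pwned Passwords range endpoint: hash the password with SHA-1, send only the first 5 hex characters, and compare the remaining 35 characters against the returned "SUFFIX:COUNT" lines locally. The function names and the User-Agent string are illustrative assumptions, and whether Skopio uses this exact endpoint is not stated here.

```python
from __future__ import annotations

import hashlib
import urllib.request

# HIBP Pwned Passwords range endpoint; the 5-char hash prefix is appended.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix): first 5 hex chars of the SHA-1, then the other 35."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range_body(suffix: str, body: str) -> int:
    """Search a range response ("SUFFIX:COUNT" per line) for our suffix, locally."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix not listed: no known exposure under this prefix


def pwned_count(password: str) -> int:
    """Network call: only the 5-character prefix ever leaves the device."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "breach-corpus-example"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_from_range_body(suffix, response.read().decode("utf-8"))
```

Because many different hashes share any given 5-character prefix, the server cannot tell which password was being checked; the match is decided on the client.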

Can I get the actual leaked data?

Public corpora like HIBP let you check status but don't redistribute the underlying credentials. Some commercial services (Snusbase) expose more fields under subscription. Skopio returns matches and breach names without redistributing leaked credentials.

Try Skopio for the "Breach Corpus" workflow

Your first lookup each day is free. No card. No commitment.