
What Is PII (Personally Identifiable Information)?

Personally identifiable information (PII) is any data that can be used, on its own or in combination with other information, to identify a specific living individual. The term originates in US regulatory and federal information security language but is now used broadly across privacy, security, and data governance globally — often as a near-synonym for what European law calls "personal data" under the GDPR.

PII includes the obvious (name, government-issued ID number, email address, phone number, residential address) but also a long list of less obvious data: IP addresses, cookie identifiers, device fingerprints, MAC addresses, biometric templates, behavioral patterns, location traces, and combinations of demographic attributes that uniquely identify someone in context. Modern privacy regulation is built around the principle that identifiability, not data type, is what matters. Two innocuous pieces of data can become PII when combined, even if neither is PII on its own.

TL;DR

PII is any information that can identify a specific living person — directly or indirectly, alone or in combination. It includes names, IDs, emails, IP addresses, location data, biometrics, and identifying behavioral patterns. Governance depends on knowing where PII lives, how it flows, and who can access it — capabilities a data catalog with classification and lineage provides. The PII inventory is the prerequisite for every privacy regulation, every breach response, and every subject access request.

PII Defined

The most widely cited PII definition comes from NIST Special Publication 800-122: "Any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."

Two ideas in this definition matter operationally. The first is "distinguish or trace" — PII includes both direct identifiers (a passport number identifies one person) and indirect identifiers (a birthdate plus postal code plus gender identifies a small handful of people, often exactly one in a small population). The second is "linked or linkable" — information that becomes PII when joined with other information available to a reasonable actor. The implication: whether a field is PII is not always a property of the field in isolation. It is a property of the field plus the rest of the data ecosystem it can be combined with.
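The "linked or linkable" point is easiest to see as a join. The sketch below uses made-up records (every name and value is hypothetical) to show how a dataset with names removed can be re-identified by matching its quasi-identifiers against a second dataset that still carries names. It is an illustration of a linkage attack under those toy assumptions, not a description of any particular incident.

```python
# Hypothetical toy data: a "de-identified" extract with names removed, and a
# separate public list that still carries names (e.g. a voter-roll-style file).
DEIDENTIFIED = [
    {"dob": "1961-07-31", "zip": "02138", "sex": "F", "diagnosis": "hypertension"},
    {"dob": "1975-03-02", "zip": "02139", "sex": "M", "diagnosis": "asthma"},
]
PUBLIC_LIST = [
    {"name": "Alice Example", "dob": "1961-07-31", "zip": "02138", "sex": "F"},
    {"name": "Bob Example", "dob": "1980-11-20", "zip": "02139", "sex": "M"},
]
QUASI_IDENTIFIERS = ("dob", "zip", "sex")


def link(records, reference, keys=QUASI_IDENTIFIERS):
    """Join two datasets on quasi-identifiers and return (name, record) pairs
    for records that match exactly one row in the reference data."""
    index = {}
    for row in reference:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for record in records:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0]["name"], record))
    return matches


for name, record in link(DEIDENTIFIED, PUBLIC_LIST):
    print(f"{name} -> {record['diagnosis']}")  # prints "Alice Example -> hypertension"
```

Neither dataset contains a name next to a diagnosis, yet the join produces one. That is why identifiability has to be judged against the other data a reasonable actor can reach, not field by field.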

Categories of PII

PII is usually grouped into categories that drive different protection requirements and governance treatments.

Direct identifiers

Data points that, on their own, identify an individual. Examples: full name, government identification numbers (SSN, national ID, passport, driver's license), email address, phone number, residential or work address, unique account identifiers, biometric records (fingerprints, facial geometry, iris patterns, DNA profiles), photographs and video where the subject is recognizable.

Indirect or quasi-identifiers

Data points that do not identify someone alone, but do in combination. Examples: date of birth, postal code, gender, ethnicity, occupation, employer, education level, behavioral patterns. Classic re-identification research (Sweeney, Narayanan and Shmatikov, and others) has repeatedly shown that surprisingly small combinations of quasi-identifiers — date of birth + ZIP code + gender — are sufficient to uniquely identify the large majority of individuals in a population. Datasets stripped of direct identifiers but rich in quasi-identifiers are not anonymous.
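A common way to quantify this risk is k-anonymity: group rows by their quasi-identifier values and look at the smallest group. The sketch below computes k and the share of rows that are unique on a chosen set of columns. The column names and rows are hypothetical, and k-anonymity is only one of several disclosure-risk measures; a real assessment would also consider what external data an attacker could link against.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return (k, unique_share) for a table.

    k is the size of the smallest group of rows sharing identical values on
    the quasi-identifiers; unique_share is the fraction of rows that sit alone
    in their group, i.e. rows that the combination singles out.
    """
    if not rows:
        raise ValueError("empty table")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    k = min(groups.values())
    unique_rows = sum(size for size in groups.values() if size == 1)
    return k, unique_rows / len(rows)


# Hypothetical rows with direct identifiers already removed.
rows = [
    {"dob": "1990-01-15", "zip": "10001", "gender": "F"},
    {"dob": "1990-01-15", "zip": "10001", "gender": "F"},
    {"dob": "1985-06-30", "zip": "10001", "gender": "M"},
    {"dob": "1972-09-09", "zip": "10002", "gender": "F"},
]
print(k_anonymity(rows, ["dob", "zip", "gender"]))  # (1, 0.5)
print(k_anonymity(rows, ["gender"]))  # (1, 0.25)
```

A result of k = 1 means at least one person is singled out. Dropping columns lowers the unique share here but does not raise k, which is why generalization (year instead of full date, ZIP prefix instead of full ZIP) is usually needed as well.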

Online and device identifiers

Identifiers that originate in digital systems but are routinely linked back to individuals: IP addresses, MAC addresses, IMEI numbers, advertising IDs (AAID, IDFA), cookie identifiers, device fingerprints, persistent session IDs. GDPR explicitly classifies online identifiers as personal data; US case law and FTC guidance increasingly do the same.

Financial PII

Payment card numbers, bank account numbers, IBANs, transaction histories tied to identifiable individuals. Subject to PCI-DSS for cardholder data and to financial privacy regulations like GLBA in the US.
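Card numbers are one of the few PII types with a built-in validity check: the last digit is a Luhn checksum. Discovery tools commonly pair a digit-pattern match with that checksum to cut false positives. The sketch below is a minimal version of the idea; the regex and the helper names are assumptions for illustration, and a production scanner would add issuer-prefix checks and context rules, since other Luhn-protected numbers (IMEIs, for example) also pass.

```python
import re

# Candidate PANs: 13 to 19 digits, optionally separated by single spaces or
# hyphens, and not embedded in a longer run of word characters.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9
    from any result above 9, and require the total to be divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pans(text: str) -> list:
    """Return normalized digit strings that look like card numbers."""
    found = []
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


print(find_pans("Card 4111 1111 1111 1111 on file; ref 4111-1111-1111-1112"))
# ['4111111111111111']
```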

Health PII (PHI)

Protected Health Information in US terminology — health status, medical history, treatment, and payment information linked to an individual. Subject to HIPAA in the US and to GDPR's special category protection in the EU.

Genetic and biometric PII

Genetic profiles, fingerprints, facial geometry, voice prints, gait patterns, retinal scans. Treated as especially sensitive under most major privacy regimes because of their immutability — a compromised password can be changed, a compromised fingerprint cannot.

Behavioral and inferred PII

Increasingly, data inferred about individuals — predicted health conditions, predicted political views, predicted sexual orientation — is treated as PII under modern privacy laws. The fact that the data was inferred rather than collected does not change its identifiability or sensitivity. AI and machine learning systems generate substantial volumes of inferred PII that need governance treatment equivalent to collected PII.

Figure: PII categories and governance treatment. Seven categories (direct identifiers; quasi-identifiers; online and device identifiers; financial PII; health PII (PHI); biometric and genetic; behavioral and inferred) share one governance backbone: discovery and classification, catalog tagging, lineage, ownership, access control, masking and tokenization, retention and deletion, audit trail, and DSR fulfillment.

PII vs Personal Data

"PII" and "personal data" are often used interchangeably, but the technical scope differs and the difference matters for compliance:

  • PII (US terminology) — Anchored in US federal information security and NIST guidance. Historically narrower, focusing on information that can be used to "distinguish or trace" identity. The exact boundary has expanded over time but remains less formal than the EU equivalent.
  • Personal data (EU terminology) — Defined in GDPR Article 4 as "any information relating to an identified or identifiable natural person." The definition is intentionally broad, broader than the historical US PII definition: it explicitly names location data and online identifiers, and it is widely interpreted to cover inferred information as well.

In practical governance, the safer working definition is the broader one: any information that relates to an identifiable individual, alone or in combination, including online identifiers and inferred attributes. Build governance against this superset and the entity is covered under both regimes. Build it against a narrower US-style PII definition and the entity is exposed under GDPR, the UK GDPR, and an increasing number of US state laws that have adopted the EU-style definition.

Sensitive PII

Within the universe of PII, most regulations identify a subset that warrants additional protection. The exact categories differ, but the typical "sensitive" or "special category" set includes:

  • Racial or ethnic origin
  • Political opinions, religious or philosophical beliefs
  • Trade union membership
  • Genetic and biometric data (when used for identification)
  • Health data, including mental health
  • Sex life and sexual orientation
  • Criminal convictions and offenses
  • Government-issued identification numbers in many jurisdictions
  • Financial account credentials and payment card data
  • Precise geolocation
  • Children's data

Processing sensitive PII typically requires a stronger legal basis (explicit consent, employment or social security necessity, vital interests, etc.) and stronger technical controls. From a governance perspective, the practical requirement is that sensitive PII fields can be reliably identified across the catalog so that policy can be enforced consistently — encryption, masking, access restriction, retention limits — without exception.
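"Enforced consistently, without exception" in practice means policy is driven by the classification tag, not by per-query judgment. The sketch below applies masking rules to a row based on hypothetical catalog tags: sensitive PII is redacted for non-privileged readers, other PII is replaced with a keyed HMAC token, and any column the catalog has not classified is withheld. The tag names, the key, and the rules themselves are assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical tags as they might be exported from a catalog's classification.
COLUMN_TAGS = {
    "email": "pii.direct",
    "diagnosis": "pii.sensitive.health",
    "zip": "pii.quasi",
    "order_total": "public",
}
# Hypothetical key; a real deployment would load it from a secrets manager.
TOKEN_KEY = b"replace-with-managed-secret"


def tokenize(value: str) -> str:
    """Keyed, deterministic pseudonym: equal inputs give equal tokens, so joins
    and counts still work, but without the key nobody can confirm a guessed
    value by recomputing its token."""
    return hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def apply_policy(row: dict, privileged: bool) -> dict:
    """Return a copy of `row` with each column treated according to its tag."""
    out = {}
    for column, value in row.items():
        tag = COLUMN_TAGS.get(column)
        if tag is None:
            out[column] = None  # fail closed: unclassified columns are withheld
        elif tag == "public" or privileged:
            out[column] = value
        elif tag.startswith("pii.sensitive"):
            out[column] = "***REDACTED***"
        else:
            out[column] = tokenize(str(value))  # other PII: pseudonymize
    return out


row = {"email": "a@example.com", "diagnosis": "asthma", "zip": "10001",
       "order_total": 42.5, "notes": "free text"}
print(apply_policy(row, privileged=False))
```

Failing closed on untagged columns is the design choice that makes gaps in classification visible instead of silently exposing data. Note that tokenized values are pseudonymised, not anonymous: under GDPR they remain personal data because the organization holding the key can still link them back.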

Regulations Governing PII

PII handling is regulated by a thick layer of overlapping laws and standards. The most consequential in 2026:

  • GDPR (EU) — The most comprehensive personal data regime globally. See the data privacy entry for the full breakdown.
  • UK GDPR & Data Protection Act 2018 — UK's post-Brexit GDPR analogue, substantively similar to EU GDPR with some divergences.
  • CCPA / CPRA (California) — Sets requirements for sale, sharing, and use of California residents' personal information; the de facto US national floor.
  • HIPAA (US) — Federal law governing Protected Health Information for covered entities and business associates.
  • GLBA (US) — Federal financial privacy law for financial institutions handling Non-Public Personal Information.
  • PCI-DSS — Industry standard for entities handling payment card data. Not a law, but contractually mandatory.
  • State-level US privacy laws — Virginia (VCDPA), Colorado (CPA), Connecticut (CTDPA), Utah (UCPA), Texas (TDPSA), Oregon (OCPA), and a growing list of other states, most modeled loosely on GDPR.
  • LGPD (Brazil), PIPL (China), POPIA (South Africa), India DPDPA, Quebec Law 25 — Major non-US, non-EU regimes with similar structural obligations but distinct legal bases and rights.

Multinational organizations rarely have the luxury of complying with one regime in isolation. The practical pattern is to build a single PII inventory and treatment policy that satisfies the strictest applicable rule for each data category — and to serve different regulators with focused views of that single underlying truth.
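Satisfying "the strictest applicable rule for each data category" can be treated as a merge: for limits take the minimum, for required controls take the union. The sketch below shows that merge over hypothetical requirement records; the regime names, fields, and numbers are placeholders, not real legal values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Requirement:
    """One regime's rule for one data category (all fields hypothetical)."""
    max_retention_days: Optional[int]  # None: this regime sets no upper limit
    requires_encryption: bool
    breach_notice_hours: Optional[int]  # None: no deadline for this category


def _min_defined(values):
    """Smallest non-None value, or None if no input sets a limit."""
    defined = [v for v in values if v is not None]
    return min(defined) if defined else None


def strictest(requirements: Iterable[Requirement]) -> Requirement:
    """Merge rules so the result satisfies every input at once."""
    reqs = list(requirements)  # iterated several times below
    return Requirement(
        max_retention_days=_min_defined(r.max_retention_days for r in reqs),
        requires_encryption=any(r.requires_encryption for r in reqs),
        breach_notice_hours=_min_defined(r.breach_notice_hours for r in reqs),
    )


# Placeholder regimes and numbers, not real legal values.
POLICY = {
    "health": {
        "regime_a": Requirement(max_retention_days=730, requires_encryption=True,
                                breach_notice_hours=96),
        "regime_b": Requirement(max_retention_days=None, requires_encryption=False,
                                breach_notice_hours=48),
    },
}
print(strictest(POLICY["health"].values()))
# Requirement(max_retention_days=730, requires_encryption=True, breach_notice_hours=48)
```

This only works when every rule is a ceiling or an on/off control. Mandatory minimum retention periods, localization requirements, or directly conflicting obligations cannot be resolved by taking a minimum and still need case-by-case legal review.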

How to Govern PII at Scale

PII governance is not solved by a privacy policy on the website. It is operationally solved by infrastructure that answers four questions reliably:

  1. Where is the PII? Automated data classification tied to a comprehensive data catalog. The classification engine must cover structured systems (databases, warehouses, lakehouses), semi-structured stores, and increasingly unstructured data — documents, emails, support tickets, transcripts — where most undiscovered PII actually lives.
  2. How does it flow? System-level and column-level lineage so that when a subject requests erasure, the catalog can locate every downstream copy. Without lineage, erasure is best-effort.
  3. Who can access it? Role-based and attribute-based access controls integrated with the catalog so that PII fields and PII-bearing assets carry their access rules with them as they are reused. Data masking for non-privileged access reduces both regulatory exposure and breach blast radius.
  4. Who is accountable for it? Documented data ownership so that every PII-bearing asset has a named accountable party — for breach response, subject access requests, retention decisions, and ongoing risk reviews.
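Questions 2 and 4 combine naturally when a subject requests erasure: walk column-level lineage from the source field to every downstream copy, then flag any copy without a named owner. The sketch below does that over a small hypothetical catalog extract; the asset names, the lineage dictionary, and the owner map are assumptions, and a real catalog would expose this graph through its own API.

```python
from collections import deque

# Hypothetical catalog extract: column-level lineage (source -> derived copies).
LINEAGE = {
    "crm.customers.email": ["warehouse.dim_customer.email", "marketing.export.email"],
    "warehouse.dim_customer.email": ["bi.churn_model.features.email_hash"],
    "marketing.export.email": [],
}
# Hypothetical ownership records; None or a missing entry means no named owner.
OWNERS = {
    "crm.customers.email": "crm-team",
    "warehouse.dim_customer.email": "data-platform",
    "marketing.export.email": None,
}


def downstream(column, lineage):
    """Breadth-first walk returning every asset derived from `column`.
    The `seen` set keeps the walk finite even if lineage contains cycles."""
    seen, queue = set(), deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def erasure_plan(column, lineage, owners):
    """Return (all assets to erase, the subset with no accountable owner)."""
    targets = sorted({column} | downstream(column, lineage))
    unowned = [t for t in targets if not owners.get(t)]
    return targets, unowned


targets, unowned = erasure_plan("crm.customers.email", LINEAGE, OWNERS)
print("erase:", targets)
print("no accountable owner:", unowned)
```

Without the lineage edges, the derived feature table would never appear in the plan; without the owner map, nobody would be on the hook for the marketing export.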

These four capabilities together convert PII governance from a paper exercise into an enforceable operating model. They are also, not coincidentally, the same capabilities that satisfy GDPR Article 30 records of processing, NIS2 supply chain provisions, DORA's asset inventory obligations, and the data-handling requirements of every major AI regulation now appearing on the horizon. The PII inventory is the substrate on which all of these regimes sit.

Conclusion

PII is not a category of fields. It is a property of information in a context — a property that expands as data is combined, joined, inferred, and shared. Treating PII governance as a list of tagged columns will miss most of the actual risk. Treating it as an infrastructure problem — discovery, classification, lineage, ownership, access control, audit — is what scales. The organizations that have built that infrastructure for general data governance are the ones with credible PII programs. The organizations that are still trying to assemble PII inventories from spreadsheets are the ones surprised by every breach and every subject access request.

© Dawiso s.r.o. All rights reserved