Concepts

Understand the core concepts that power DataKit. This section explains the fundamental building blocks you'll use when building data pipelines.

Core Concepts

High-level architecture and how the components fit together.

Self-contained units of data processing with metadata and code.

Configuration files that define your package: Transform, DataSet, Connector, and Store manifests.

Data contracts — tables, S3 prefixes, topics — with schema and column-level lineage.

Reactive dependency graphs derived from Transform and DataSet manifests.

Track data flow and dependencies with OpenLineage and Marquez.

PII classification, compliance metadata, and data protection.

Development, integration, and production workflows.

Infrastructure contexts that separate what runs from where it runs.
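
To make the idea of a dependency graph derived from Transform and DataSet manifests more concrete, here is a minimal Python sketch. The manifest fields (`name`, `inputs`, `outputs`) and the `build_graph` helper are hypothetical and chosen only for illustration; they are not DataKit's actual manifest schema or API. The sketch links each Transform to the Transforms that produce its input DataSets, then computes a valid run order with the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified Transform manifests. The field names are
# assumptions for illustration, not DataKit's real manifest format.
transforms = [
    {"name": "clean_orders", "inputs": ["raw_orders"], "outputs": ["orders"]},
    {"name": "daily_revenue", "inputs": ["orders"], "outputs": ["revenue"]},
]


def build_graph(transforms):
    """Map each transform to the set of transforms that produce its inputs."""
    # Which transform produces each DataSet?
    producer = {ds: t["name"] for t in transforms for ds in t["outputs"]}
    # Inputs with no producer (e.g. raw sources) add no upstream edge.
    return {
        t["name"]: {producer[ds] for ds in t["inputs"] if ds in producer}
        for t in transforms
    }


graph = build_graph(transforms)
# static_order() yields upstream transforms before their dependents.
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['clean_orders', 'daily_revenue']
```

Because the edges come entirely from declared inputs and outputs, no transform has to name its upstream transforms explicitly; changing a manifest changes the graph.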

We recommend reading the concepts in this order:

DataKit is built on these principles:

Principle	Description
Developer Experience First	Simple happy path: bootstrap → run → validate → publish → promote
Immutability	Released artifacts cannot be modified; versions are permanent
Separation of Concerns	Connectors, Stores, DataSets, and Transforms have distinct ownership
Security by Default	PII classification is required, not optional
Observability	Every operation emits metrics and lineage events
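
The Immutability principle in the table above can be pictured as a write-once registry. The following Python sketch is an assumption-laden illustration, not DataKit's implementation: the `ImmutableRegistry` class and its `publish` and `get` methods are hypothetical names. It shows only the core rule that a released name and version pair can never be overwritten.

```python
class ImmutableRegistry:
    """Hypothetical write-once store: each (name, version) is published once."""

    def __init__(self):
        self._artifacts = {}

    def publish(self, name, version, payload):
        key = (name, version)
        # Refuse to overwrite an existing release; a change needs a new version.
        if key in self._artifacts:
            raise ValueError(f"{name}@{version} is already released")
        self._artifacts[key] = payload

    def get(self, name, version):
        return self._artifacts[(name, version)]


registry = ImmutableRegistry()
registry.publish("orders", "1.0.0", b"artifact-bytes")

try:
    # A second publish of the same version must fail.
    registry.publish("orders", "1.0.0", b"changed-bytes")
    overwritten = True
except ValueError:
    overwritten = False

print(overwritten)  # False
```

Under this rule, fixing a released artifact always means publishing a new version, so consumers pinned to an existing version keep seeing the same content.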