Concepts
Understand the core concepts that power DataKit. This section explains the fundamental building blocks you use to construct data pipelines.
Core Concepts
Data Packages¶
Self-contained units of data processing that bundle code together with its metadata.
Manifests¶
Configuration files that define your package: Transform, DataSet, Connector, and Store manifests.
DataSets¶
Data contracts (tables, S3 prefixes, topics) with schema and column-level lineage.
Cells & Stores¶
Infrastructure contexts that separate what runs from where it runs.
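To make the split between what runs and where it runs concrete, here is a minimal Python sketch. The `DataSet`, `DataPackage`, and `Cell` types and the `deployments` helper are hypothetical, invented for illustration; DataKit's real manifests and APIs may look quite different. The sketch shows one versioned package being paired with several cells without the package itself changing.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; they are not DataKit's actual API.


@dataclass(frozen=True)
class DataSet:
    """A data contract: where the data lives plus the schema it promises."""
    name: str
    location: str              # a table, S3 prefix, or topic
    schema: dict[str, str]     # column name -> type


@dataclass(frozen=True)
class DataPackage:
    """What runs: a versioned unit of processing that reads and writes DataSets."""
    name: str
    version: str
    inputs: tuple[DataSet, ...]
    outputs: tuple[DataSet, ...]


@dataclass(frozen=True)
class Cell:
    """Where it runs: an infrastructure context binding logical stores to endpoints."""
    name: str
    stores: dict[str, str]     # logical store name -> physical endpoint


def deployments(package: DataPackage, cells: list[Cell]) -> list[tuple[str, str]]:
    """Pair one unchanged package with each target cell (Package x Cell)."""
    return [(f"{package.name}@{package.version}", cell.name) for cell in cells]


orders = DataSet("orders", "s3://example-bucket/orders/",
                 {"order_id": "string", "amount": "decimal"})
daily = DataSet("daily_revenue", "analytics.daily_revenue",
                {"day": "date", "revenue": "decimal"})
package = DataPackage("daily-revenue", "1.0.0", inputs=(orders,), outputs=(daily,))
cells = [
    Cell("dev", {"warehouse": "dev-warehouse.example"}),
    Cell("prod", {"warehouse": "prod-warehouse.example"}),
]
print(deployments(package, cells))
# [('daily-revenue@1.0.0', 'dev'), ('daily-revenue@1.0.0', 'prod')]
```

In this sketch the package carries no infrastructure details, so the same released version can be paired with any cell; only the cell's store bindings differ.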
Learning Path
We recommend reading the concepts in this order:
- Overview - Start with the big picture
- Data Packages - Understand the core unit of work
- Manifests - Learn how to configure packages
- DataSets - Data contracts with schema and classification
- Pipelines - Understand the reactive dependency graph
- Lineage - Track data flow
- Governance - Classify and protect data
- Environments - Deploy across stages
- Cells & Stores - Understand the Package × Cell model
Key Principles
DataKit is built on these principles:
| Principle | Description |
|---|---|
| Developer Experience First | Simple happy path: bootstrap → run → validate → publish → promote |
| Immutability | Released artifacts cannot be modified; versions are permanent |
| Separation of Concerns | Connectors, Stores, DataSets, and Transforms have distinct ownership |
| Security by Default | PII classification is required, not optional |
| Observability | Every operation emits metrics and lineage events |
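Two of these principles, Immutability and Security by Default, can be illustrated with a small Python sketch. The `Registry`, `Column`, and `Classification` names and the three classification levels are hypothetical, chosen only for illustration; this is not how DataKit stores or validates releases. The toy registry refuses to publish any column that lacks a classification and refuses to overwrite a version that already exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical sketch of two principles; names are illustrative, not DataKit's API.


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PII = "pii"


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    classification: Optional[Classification] = None  # None = not yet classified


class Registry:
    """Toy artifact registry: a (name, version) pair can be published only once."""

    def __init__(self) -> None:
        self._releases: dict[tuple[str, str], tuple[Column, ...]] = {}

    def publish(self, name: str, version: str, columns: tuple[Column, ...]) -> None:
        # Security by default: refuse any column without an explicit classification.
        unclassified = [c.name for c in columns if c.classification is None]
        if unclassified:
            raise ValueError(f"unclassified columns: {unclassified}")
        # Immutability: a released version is permanent and never overwritten.
        key = (name, version)
        if key in self._releases:
            raise ValueError(f"{name}@{version} is already published")
        self._releases[key] = columns

    def get(self, name: str, version: str) -> tuple[Column, ...]:
        return self._releases[(name, version)]


registry = Registry()
columns = (
    Column("email", "string", Classification.PII),
    Column("amount", "decimal", Classification.INTERNAL),
)
registry.publish("orders", "1.0.0", columns)

for attempt in [
    ("orders", "1.0.0", columns),                      # same version again
    ("orders", "1.1.0", (Column("note", "string"),)),  # missing classification
]:
    try:
        registry.publish(*attempt)
    except ValueError as err:
        print(err)
# orders@1.0.0 is already published
# unclassified columns: ['note']
```

Checking classification before the version lookup means an invalid release is rejected without ever touching the stored versions.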