DataKit Documentation
DataKit is a Kubernetes-native data pipeline platform that lets teams contribute reusable, versioned "data packages" through a complete developer workflow:
bootstrap → local run → validate → publish → promote
What is DK?
DK (DataKit) is a developer-first platform for building, testing, and deploying data pipelines. It provides:
- 📦 Data Packages: Self-contained units of data processing with Transforms, Assets, Stores, and Connectors
- 🔄 GitOps Workflow: PR-based promotion through dev → int → prod environments
- 📊 Data Lineage: Automatic OpenLineage tracking with Marquez integration
- 🔒 Governance: Built-in PII classification and compliance metadata
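The promotion order named in the GitOps bullet can be sketched as a small lookup. This is an illustration only: `next_env` is a hypothetical helper, and the environment names are taken from the dev → int → prod sequence above. How DK itself decides what may be promoted is not described on this page.

```shell
# Sketch: map an environment to the next stage in dev -> int -> prod.
# next_env is a hypothetical helper, not part of the DK CLI.
next_env() {
  case "$1" in
    dev)  echo "int" ;;
    int)  echo "prod" ;;
    prod) return 1 ;;   # prod is the last stage; there is nothing to promote to
    *)    echo "unknown environment: $1" >&2; return 2 ;;
  esac
}
```

For example, `next_env dev` prints `int`, while `next_env prod` exits non-zero because there is no later stage.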
Quick Links
Getting Started
New to DK? Start here to install the CLI and run your first pipeline in under 30 minutes.
Concepts
Understand the core concepts: data packages, manifests, lineage, and governance.
The DK Workflow
# 1. Create a new data package
dk init my-pipeline --runtime generic-python
# 2. Start local development environment
dk dev up
# 3. Validate your package
dk lint ./my-pipeline
# 4. Run locally
dk run ./my-pipeline
# 5. Build and publish
dk build ./my-pipeline
dk publish ./my-pipeline
# 6. Promote to an environment
dk promote my-pipeline v0.1.0 --to dev
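The six steps above can be chained into one fail-fast function. This is a minimal sketch, assuming the `dk` subcommands behave as listed above; the function name `dk_workflow` and the `DK_BIN` override are hypothetical conveniences and not part of DK.

```shell
# Sketch only: chains the documented dk commands and stops at the first failure.
# dk_workflow and DK_BIN are illustrative names, not part of the DK CLI.
# Uses `local`, so run it under bash (or another shell that supports it).
dk_workflow() {
  local pkg="$1"            # package name, e.g. my-pipeline
  local version="$2"        # version to promote, e.g. v0.1.0
  local env="${3:-dev}"     # target environment, defaults to dev
  local dk="${DK_BIN:-dk}"  # binary to invoke; override for a dry run

  # Each step runs only if the previous one succeeded.
  "$dk" init "$pkg" --runtime generic-python &&
  "$dk" dev up &&
  "$dk" lint "./$pkg" &&
  "$dk" run "./$pkg" &&
  "$dk" build "./$pkg" &&
  "$dk" publish "./$pkg" &&
  "$dk" promote "$pkg" "$version" --to "$env"
}
```

Calling `dk_workflow my-pipeline v0.1.0` runs the full sequence against the `dev` environment. Setting `DK_BIN=echo` for the call prints each command's arguments instead of executing them, which is a quick way to check the order without touching a cluster.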
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                          Developer                          │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                     DK CLI (dk)                     │   │
│   │    init, dev, run, lint, build, publish, promote    │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│                ┌─────────────┴──────────────┐               │
│                ▼                            ▼               │
│   ┌────────────────────┐           ┌────────────────────┐   │
│   │        SDK         │           │    OCI Registry    │   │
│   │     Validation     │           │   Immutable Pkgs   │   │
│   │    Lineage Emit    │           │                    │   │
│   └────────────────────┘           └────────────────────┘   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │             GitOps (Kustomize + ArgoCD)             │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │           Kubernetes Platform Controller            │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
Need Help?
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
- Contributing - How to contribute to DK
- GitHub Issues - Report bugs or request features