
DataKit Documentation

A Kubernetes-native data pipeline platform where teams contribute reusable, versioned "data packages" through a complete developer workflow.

bootstrap → local run → validate → publish → promote



What is DK?

DK (DataKit) is a developer-first platform for building, testing, and deploying data pipelines. It provides:

  • 📦 Data Packages: Self-contained units of data processing with Transforms, Assets, Stores, and Connectors
  • 🔄 GitOps Workflow: PR-based promotion through dev → int → prod environments
  • 📊 Data Lineage: Automatic OpenLineage tracking with Marquez integration
  • 🔒 Governance: Built-in PII classification and compliance metadata

🚀 Getting Started

New to DK? Start here to install the CLI and run your first pipeline in under 30 minutes.

Get Started →

📚 Concepts

Understand the core concepts: data packages, manifests, lineage, and governance.

Learn Concepts →

🛠 Tutorials

Step-by-step guides for building real-world pipelines and workflows.

View Tutorials →

📖 Reference

Complete CLI reference, manifest schemas, and configuration options.

Browse Reference →


The DK Workflow

# 1. Create a new data package
dk init my-pipeline --runtime generic-python

# 2. Start local development environment
dk dev up

# 3. Validate your package
dk lint ./my-pipeline

# 4. Run locally
dk run ./my-pipeline

# 5. Build and publish
dk build ./my-pipeline
dk publish ./my-pipeline

# 6. Promote to an environment
dk promote my-pipeline v0.1.0 --to dev
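
The repeatable part of the workflow above (steps 3 to 6) could be chained in a small wrapper script. This is only a sketch: the names `PKG`, `VERSION`, `TARGET_ENV`, `DRY_RUN`, `step`, and `dk_workflow` are hypothetical and not part of DK, and the script uses only the `dk` subcommands shown above. It assumes `dk init` and `dk dev up` have already been run once, and it defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical wrapper around the dk commands listed above (not part of DK itself).
set -eu

PKG="${PKG:-my-pipeline}"        # package directory name (assumed)
VERSION="${VERSION:-v0.1.0}"     # version to promote (assumed)
TARGET_ENV="${TARGET_ENV:-dev}"  # target environment for promotion
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands, anything else = run them

# Print a command, then execute it unless DRY_RUN is 1.
step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Validate, run, build, publish, and promote, stopping at the first failure.
dk_workflow() {
  step dk lint "./$PKG"
  step dk run "./$PKG"
  step dk build "./$PKG"
  step dk publish "./$PKG"
  step dk promote "$PKG" "$VERSION" --to "$TARGET_ENV"
}

dk_workflow
```

Setting `DRY_RUN=0` executes the commands for real. Because of `set -e`, a failing `dk lint` or `dk run` stops the script before anything is built, published, or promoted.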

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                         Developer                            │
│                            │                                 │
│                            ▼                                 │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                   DK CLI (dk)                        │   │
│  │  init, dev, run, lint, build, publish, promote      │   │
│  └─────────────────────────────────────────────────────┘   │
│                            │                                 │
│              ┌─────────────┴──────────────┐                 │
│              ▼                            ▼                 │
│  ┌────────────────────┐      ┌────────────────────┐        │
│  │       SDK          │      │    OCI Registry    │        │
│  │  Validation        │      │  Immutable Pkgs    │        │
│  │  Lineage Emit      │      │                    │        │
│  └────────────────────┘      └────────────────────┘        │
│                                          │                  │
│                                          ▼                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │               GitOps (Kustomize + ArgoCD)            │   │
│  └─────────────────────────────────────────────────────┘   │
│                            │                                 │
│                            ▼                                 │
│  ┌─────────────────────────────────────────────────────┐   │
│  │            Kubernetes Platform Controller            │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

View Full Architecture →


Need Help?