Pipelines

A pipeline is the dependency graph derived from Transform and DataSet manifests (dk.yaml files). Each Transform declares its inputs and outputs; the graph is built automatically by scanning those declarations.

Overview

There is no separate pipeline manifest. The graph emerges from the individual Transform and DataSet manifests already present in your project:

Transforms declare spec.inputs and spec.outputs (DataSet references).
DataSets are the nodes that connect transforms together.
Triggers on each Transform control when it runs (schedule, on-change, manual, or composite).
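
To make the idea concrete, here is a minimal Python sketch of how a graph can be derived from per-Transform declarations. It is not dk's implementation. The dictionaries stand in for parsed dk.yaml manifests; only spec.inputs and spec.outputs come from the text above, while the name key, the plain-string DataSet references, and the example names are assumptions.

```python
from collections import defaultdict

# Hypothetical stand-ins for parsed dk.yaml Transform manifests.
# Only spec.inputs / spec.outputs are taken from the docs; "name" and the
# plain-string DataSet references are assumptions for illustration.
transforms = [
    {"name": "parse-events",
     "spec": {"inputs": ["raw-events"], "outputs": ["clean-events"]}},
    {"name": "summarize",
     "spec": {"inputs": ["clean-events"], "outputs": ["event-summary"]}},
]


def build_graph(transforms):
    """Return an adjacency mapping with edges DataSet -> Transform -> DataSet."""
    graph = defaultdict(set)
    for t in transforms:
        for ds in t["spec"]["inputs"]:
            graph[ds].add(t["name"])    # the DataSet feeds the Transform
        for ds in t["spec"]["outputs"]:
            graph[t["name"]].add(ds)    # the Transform produces the DataSet
    return {node: sorted(targets) for node, targets in graph.items()}


graph = build_graph(transforms)
```

Because clean-events is an output of one Transform and an input of another, it becomes the node that links them, which is why no separate pipeline manifest is needed.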

Viewing the Graph

# Show full dependency graph
dk pipeline show

# Show graph leading to a specific destination DataSet
dk pipeline show --destination event-summary

# Render as Mermaid diagram
dk pipeline show --output mermaid

# Render as Graphviz DOT
dk pipeline show --output dot

# JSON adjacency list
dk pipeline show --output json

# Scan specific directories
dk pipeline show --scan-dir ./transforms --scan-dir ./datasets
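
Restricting the graph to a destination DataSet amounts to collecting every node that can reach it. The sketch below illustrates that idea with a reverse breadth-first walk; it is an assumption about the concept, not a description of how dk implements --destination. The edge data and all node names other than event-summary are hypothetical.

```python
from collections import deque

# Hypothetical adjacency mapping: each node lists the nodes it feeds.
edges = {
    "raw-events": ["parse-events"],
    "parse-events": ["clean-events"],
    "clean-events": ["summarize"],
    "summarize": ["event-summary"],
    "raw-users": ["load-users"],
    "load-users": ["users"],
}


def upstream_of(edges, destination):
    """Return the destination plus every node that can reach it."""
    # Invert the edges so we can walk from the destination back to its sources.
    parents = {}
    for src, targets in edges.items():
        for dst in targets:
            parents.setdefault(dst, []).append(src)

    seen = {destination}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


subgraph = upstream_of(edges, "event-summary")
```

In this example the users branch is left out because nothing in it leads to event-summary.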

Output Formats

Format	Description
`text`	Text tree (default)
`mermaid`	Mermaid diagram
`json`	JSON adjacency list
`dot`	Graphviz DOT format
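
For readers unfamiliar with DOT, the following Python sketch shows how an adjacency mapping translates into Graphviz DOT text. The exact output that dk emits is not shown in this page, so the graph name, layout, and node names here are assumptions chosen for illustration.

```python
def to_dot(edges):
    """Render an adjacency mapping as a Graphviz DOT digraph."""
    lines = ["digraph pipeline {"]
    for src in sorted(edges):
        for dst in sorted(edges[src]):
            # Quote identifiers so names containing hyphens stay valid DOT.
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges; real ones would come from the project's manifests.
edges = {"raw-events": ["parse-events"], "parse-events": ["clean-events"]}
dot = to_dot(edges)
```

The resulting text can be passed to Graphviz tools such as dot to produce an image.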

Scheduling

Scheduling is configured via the trigger field on a Transform manifest:

# In dk.yaml (Transform)
spec:
  trigger:
    policy: schedule
    schedule:
      cron: "0 6 * * *"
      timezone: America/New_York

Field	Required	Default	Description
`trigger.policy`	Yes	—	Trigger policy (schedule, on-change, manual, composite)
`trigger.schedule.cron`	Yes*	—	Standard 5-field cron expression. *Required only when policy is schedule.
`trigger.schedule.timezone`	No	UTC	IANA timezone for cron evaluation
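
The rules in this table can be expressed as a small check. The Python sketch below is a hypothetical validator, not part of dk: the function name and error messages are invented, and the cron check only counts fields rather than parsing them. Timezone lookup uses the standard zoneinfo module, which needs a timezone database on the host.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Policy names as listed in the table above.
POLICIES = {"schedule", "on-change", "manual", "composite"}


def validate_trigger(trigger):
    """Return a list of problems found in a trigger mapping (empty if none)."""
    problems = []
    policy = trigger.get("policy")
    if policy not in POLICIES:
        problems.append(f"trigger.policy must be one of {sorted(POLICIES)}")

    schedule = trigger.get("schedule") or {}
    cron = schedule.get("cron")
    if policy == "schedule" and not cron:
        problems.append("trigger.schedule.cron is required when policy is schedule")
    if cron and len(cron.split()) != 5:
        # Crude check: counts whitespace-separated fields only.
        problems.append("trigger.schedule.cron must have exactly 5 fields")

    tz = schedule.get("timezone")
    if tz is not None:  # omitted means UTC, the documented default
        try:
            ZoneInfo(tz)
        except (ZoneInfoNotFoundError, ValueError):
            problems.append(f"unknown IANA timezone: {tz}")
    return problems


ok = validate_trigger({"policy": "schedule", "schedule": {"cron": "0 6 * * *"}})
missing = validate_trigger({"policy": "schedule"})
```

Here ok is an empty list, while missing reports that the cron expression is required.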

CLI Commands

Command	Description
`dk pipeline show`	Display pipeline dependency graph