Testing Guide¶
This document describes the testing strategy and best practices for the DataKit codebase.
Quick Start¶
# Run all tests
make test
# Run tests with coverage
make coverage
# Run tests for a specific module
cd sdk && go test -v ./...
# Run tests in short mode (skip E2E)
go test -short ./...
Test Organization¶
Tests are organized following Go conventions:
contracts/
├── transform.go
├── transform_test.go # Unit tests for transform.go
├── asset.go
├── asset_test.go # Unit tests for asset.go
├── connector.go
├── store.go
└── testdata/ # Test fixtures
sdk/
├── validate/
│ ├── validator.go
│ ├── validator_test.go # Unit tests
│ └── testdata/ # Valid/invalid fixtures
└── manifest/
├── parser.go
├── parser_test.go
└── testdata/
cli/
├── cmd/
│ ├── lint.go
│ └── lint_test.go
└── internal/
└── testutil/ # Shared test utilities
tests/
└── e2e/ # End-to-end tests
├── workflow_test.go
└── testdata/
Test Patterns¶
Table-Driven Tests¶
We use table-driven tests for functions with multiple input scenarios:
func TestValidateTransform(t *testing.T) {
    tests := []struct {
        name    string
        input   *contracts.Transform
        wantErr bool
    }{
        {"valid transform", validTransform(), false},
        {"missing name", &contracts.Transform{}, true},
        {"empty version", &contracts.Transform{Metadata: contracts.TransformMetadata{Name: "test"}}, true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := ValidateTransform(tt.input)
            if (err != nil) != tt.wantErr {
                t.Errorf("error = %v, wantErr %v", err, tt.wantErr)
            }
        })
    }
}
Test Fixtures¶
Test data is stored in testdata/ directories:
testdata/
├── valid/
│ ├── transform-full.yaml
│ └── asset-basic.yaml
├── invalid/
│ ├── missing-name.yaml
│ └── invalid-store-ref.yaml
└── golden/
└── expected-output.json
Load fixtures in tests:
func loadFixture(t *testing.T, path string) []byte {
    t.Helper()
    data, err := os.ReadFile(filepath.Join("testdata", path))
    if err != nil {
        t.Fatalf("failed to load fixture: %v", err)
    }
    return data
}
Mocking¶
We use interface-based mocks for external dependencies:
// Mock implementation
type mockRegistryClient struct {
    pushFunc func(ctx context.Context, artifact Artifact) error
}

func (m *mockRegistryClient) Push(ctx context.Context, a Artifact) error {
    if m.pushFunc != nil {
        return m.pushFunc(ctx, a)
    }
    return nil
}

// Usage in test
client := &mockRegistryClient{
    pushFunc: func(ctx context.Context, a Artifact) error {
        return nil // Simulate success
    },
}
Running Tests¶
Unit Tests¶
# All unit tests
make test-unit
# Specific package
cd sdk && go test -v ./validate/...
# With race detection
go test -race ./...
# With coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out # View HTML report
End-to-End Tests¶
E2E tests require Docker and test the full workflow:
E2E tests:
- Use the real dk CLI binary
- Run in isolated temp directories
- Clean up after themselves
- Skip automatically if Docker is unavailable
Coverage Reports¶
# Generate coverage report
make coverage
# View coverage by function
go tool cover -func=coverage/combined.out
# View HTML report
go tool cover -html=coverage/sdk.out
Coverage Targets¶
| Package | Target | Rationale |
|---|---|---|
| contracts/ | 90% | Core types, high stability |
| sdk/validate/ | 80% | Business logic |
| sdk/manifest/ | 80% | Parsing logic |
| sdk/registry/ | 70% | External integration |
| cli/cmd/ | 60% | UI layer |
| Overall | 70% | CI threshold |
Writing New Tests¶
Checklist¶
- ✅ Test file named *_test.go in same package
- ✅ Use table-driven pattern for multiple scenarios
- ✅ Mock external dependencies
- ✅ Include edge cases and error conditions
- ✅ Add fixtures to testdata/ if needed
- ✅ Run go test -race to check for races
Common Patterns¶
Testing errors:
if (err != nil) != tt.wantErr {
    t.Errorf("error = %v, wantErr %v", err, tt.wantErr)
}
if err != nil && tt.wantErrMsg != "" && !strings.Contains(err.Error(), tt.wantErrMsg) {
    t.Errorf("error message = %v, want containing %v", err, tt.wantErrMsg)
}
Testing CLI commands:
cmd := NewRootCmd()
buf := new(bytes.Buffer)
cmd.SetOut(buf)
cmd.SetErr(buf)
cmd.SetArgs([]string{"lint", "--strict"})
if err := cmd.Execute(); err != nil {
    t.Fatalf("execute: %v", err)
}
// Assert on buf.String() for expected output
Testing file operations:
tmpDir := t.TempDir() // Automatically cleaned up
path := filepath.Join(tmpDir, "test.yaml")
if err := os.WriteFile(path, data, 0644); err != nil {
    t.Fatal(err)
}
CI Integration¶
Tests run automatically on every PR:
- Test: go test with race detection and coverage
- Build: Verify all modules build successfully
Coverage reports are uploaded as artifacts and the coverage percentage is displayed in the job summary.
Using Seed Profiles for Test Data¶
Assets can declare multiple seed profiles in their dev.seed section. This lets you set up different data scenarios for tests without writing SQL or managing fixture files separately.
Defining profiles¶
spec:
  dev:
    seed:
      inline: # default profile
        - { id: 1, name: "alice" }
        - { id: 2, name: "bob" }
      profiles:
        large:
          file: testdata/1000-rows.csv
        edge-cases:
          inline:
            - { id: -1, name: "" }
            - { id: 999, name: "O'Reilly" }
        empty: {}
Loading profiles in tests¶
Switch between profiles to run the same pipeline against different data:
# Reset to default data
dk dev seed
# Run with edge-case data
dk dev seed --profile edge-cases
dk run
# Run with a large data set
dk dev seed --profile large
dk run
# Test with an empty table
dk dev seed --profile empty --clean
dk run
Idempotency¶
Seed runs are idempotent by default. A SHA-256 checksum of the resolved data is stored in a _dp_seed_meta table in PostgreSQL. On subsequent runs, if the data hasn't changed, the seed is skipped entirely — no duplicate-key errors, no unnecessary writes.
Use --force to re-seed even when the checksum matches, or --clean to DROP and recreate the table from scratch.
Troubleshooting¶
Tests fail with "package not found"¶
Ensure you're in the correct module directory:
# Each top-level directory is a separate Go module
cd sdk && go test ./...
E2E tests require Docker¶
Check Docker is running:
docker info
Skip E2E tests if Docker is unavailable:
go test -short ./...
Coverage is below threshold¶
Identify untested code:
go tool cover -func=coverage.out
go tool cover -html=coverage.out # Inspect uncovered lines