Data cleaning & validation pipeline
Attach a raw dataset, and the agent profiles, cleans, and validates it against explicit rules, then drafts the cleaned data and a quality report for you to approve.
What it installs
Agents (1)
- Data Engineer Agent: Cleans and validates data, QAs its own output, and drafts a quality report.
Workflows (1)
- Clean and validate data: Profile, clean, and validate the data, QA the output, then draft the cleaned data and quality report for approval.
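The workflow's stage order can be sketched as plain Python functions. This is a minimal illustration, not Task Machine's implementation. The stage functions, the `RunResult` type, the `awaiting_approval` status, and the example cleaning and validation rules are all hypothetical. The point is the shape: each stage adds to a report, and the result is held for approval rather than published.

```python
from dataclasses import dataclass, field

Row = dict[str, str]  # hypothetical row shape: column name -> raw string value


@dataclass
class RunResult:
    cleaned: list[Row]
    report: list[str] = field(default_factory=list)
    status: str = "awaiting_approval"  # nothing is published until a human approves


def profile(rows: list[Row]) -> list[str]:
    # Record basic facts about the raw input before transforming it.
    columns = sorted({key for row in rows for key in row})
    return [f"profile: {len(rows)} rows, columns={columns}"]


def clean(rows: list[Row]) -> list[Row]:
    # Example cleaning step: trim whitespace, then drop exact duplicate rows.
    seen = set()
    out = []
    for row in rows:
        trimmed = {k: v.strip() for k, v in row.items()}
        key = tuple(sorted(trimmed.items()))
        if key not in seen:
            seen.add(key)
            out.append(trimmed)
    return out


def validate(rows: list[Row]) -> list[str]:
    # Example rule: every row must have a non-empty "id".
    missing = sum(1 for row in rows if not row.get("id"))
    return [f"validate: rows missing id = {missing}"]


def run(rows: list[Row]) -> RunResult:
    report = profile(rows)
    cleaned = clean(rows)
    report += validate(cleaned)
    # QA: record how the row count changed so a reviewer can sanity-check it.
    report.append(f"qa: {len(rows)} -> {len(cleaned)} rows")
    return RunResult(cleaned, report)
```

Running it on three rows, two of which differ only by whitespace, yields two cleaned rows, a report noting one row without an `id`, and a status that still waits for approval.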
Documents (1)
- Validation rules: Your editable config of the checks, tolerances, and exclusions that define when a dataset counts as clean. Edit these to steer the pipeline.
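The page does not specify the format of the Validation rules document. The sketch below assumes a hypothetical Python structure with one field for each kind of rule it mentions: checks (`required`, `ranges`), a tolerance (`max_failure_rate`, the share of rows allowed to fail), and exclusions (`excluded` columns that every check skips). All of these names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationRules:
    # Hypothetical rule config; field names are illustrative only.
    required: list[str] = field(default_factory=list)  # columns that must be non-empty
    ranges: dict[str, tuple[float, float]] = field(default_factory=dict)  # inclusive numeric bounds
    max_failure_rate: float = 0.0  # tolerance: share of rows allowed to fail
    excluded: set[str] = field(default_factory=set)  # columns skipped by every check


def row_failures(row: dict, rules: ValidationRules) -> list[str]:
    """Return a human-readable list of rule violations for one row."""
    problems = []
    for col in rules.required:
        if col in rules.excluded:
            continue
        if not str(row.get(col, "")).strip():
            problems.append(f"{col}: missing")
    for col, (low, high) in rules.ranges.items():
        if col in rules.excluded:
            continue
        try:
            value = float(row.get(col, ""))
        except (TypeError, ValueError):
            problems.append(f"{col}: not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{col}: {value} outside [{low}, {high}]")
    return problems


def is_clean(rows: list[dict], rules: ValidationRules) -> tuple[bool, float]:
    """A dataset counts as clean when its failing-row rate is within tolerance."""
    if not rows:
        return True, 0.0
    failed = sum(1 for row in rows if row_failures(row, rules))
    rate = failed / len(rows)
    return rate <= rules.max_failure_rate, rate
```

Separating the tolerance from the checks lets you loosen or tighten what "clean" means without rewriting any rule.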
Goals (1)
- Only clean data flows downstream: Only validated, reproducibly cleaned data enters downstream analysis.
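One way to make "reproducibly cleaned" checkable is to fingerprint the cleaned output. Downstream, the data is admitted only if re-running the documented cleaning on the same raw input produces the same fingerprint. This is a minimal sketch under that assumption. The `fingerprint`, `clean_rows`, and `admit` helpers are hypothetical, not part of the template.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    # Serialize each row with sorted keys, then sort the rows so that
    # source row order does not affect the hash (a deliberate design choice).
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def clean_rows(rows: list[dict]) -> list[dict]:
    # Hypothetical deterministic cleaning: no randomness, no timestamps.
    return [{k: v.strip().lower() for k, v in row.items()} for row in rows]


def admit(raw: list[dict], delivered: list[dict], clean_fn) -> bool:
    """Admit delivered data only if re-cleaning the raw input reproduces it exactly."""
    return fingerprint(clean_fn(raw)) == fingerprint(delivered)
```

If row order matters for your analysis, drop the `sorted` call so that reordering also changes the fingerprint.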
Skills (3)
- validate-data: QA a dataset and the analysis built on it before it is trusted, using methodology, calculation, and bias checks that produce a confidence assessment. Adapted from anthropics/knowledge-work-plugins/validate-data.
- data-context-extractor: Profile messy data and extract its structure, entities, metrics, and hidden hygiene rules before transforming it. Adapted from anthropics/knowledge-work-plugins/data-context-extractor.
- etl-pipeline: Design a repeatable extract-transform-load flow with cleaning, validation, idempotency, and quality reporting. Adapted from claude-office-skills/skills/etl-pipeline.
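Idempotency, as listed for the etl-pipeline skill, means re-running a load with the same batch leaves the target unchanged. A minimal sketch, assuming an in-memory `dict` stands in for the target table and `id` is the natural key (both hypothetical), is an upsert that counts what it did:

```python
def load(target: dict[str, dict], batch: list[dict], key: str = "id") -> dict[str, int]:
    """Upsert rows into target keyed by `key`; re-running the same batch changes nothing."""
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in batch:
        k = row[key]
        existing = target.get(k)
        if existing is None:
            stats["inserted"] += 1
        elif existing == row:
            # Same key, same content: skip the write entirely.
            stats["unchanged"] += 1
            continue
        else:
            stats["updated"] += 1
        target[k] = dict(row)  # store a copy so later caller mutations don't leak in
    return stats
```

Against a real warehouse, the same idea is usually expressed as a keyed upsert or merge rather than append-only inserts. The returned counts can feed directly into the quality report.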
Folders (1)
- Data Cleaning
Requirements
What this template expects to do its job. Task Machine does not verify these — you decide whether your setup is ready.
- Product analytics access: Connect your product analytics so the agent can profile and cross-check source tables directly. Until you do, it works from attached exports and the Validation rules document.
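Working from an attached export usually starts with a profile of each column, the kind of structure the data-context-extractor skill looks for before any transformation. As a rough sketch, the example below assumes the export is CSV and uses only the standard library. The `profile_csv` and `infer_type` helpers, the type labels, and the output shape are all hypothetical.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Guess a column type from its non-empty values: int, float, text, or empty."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            pass  # at least one value failed this cast; try the next, looser type
    return "text"


def profile_csv(text: str) -> dict[str, dict]:
    """Per-column type guess, count of blank cells, and count of distinct non-blank values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for col in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat them as blank.
        values = [row[col] or "" for row in rows]
        profile[col] = {
            "type": infer_type(values),
            "nulls": sum(1 for v in values if not v.strip()),
            "distinct": len({v for v in values if v.strip()}),
        }
    return profile
```

For example, in an export with an `age` column holding `34`, a blank, and `29.5`, the column would be profiled as `float` with one null and two distinct values.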
Get started
Install the Data cleaning & validation pipeline template and run it with approvals.
Join the waitlist, and we will send you early access when the first private beta spots open.
Private beta. We invite teams in batches and never share your email.