Guides

In-depth coverage of DataChain capabilities. Start with Get Data In and Transform for the core workflow, then explore deeper topics as needed.

Get Data In

  • Reading Data: storage files, structured formats, SQL databases, in-memory sources, metadata merging
  • Remote Storage: S3, GCS, Azure configuration, credentials, and access patterns

Transform

Get Data Out

  • Exporting Data: pandas, Parquet, CSV, JSON, PyTorch DataLoader, train/test split, storage, SQL databases

Datasets

  • Datasets: creating, versioning, namespaces, comparing, management, metrics

Knowledge Base

  • Knowledge Base: skill installation, dc-knowledge/ generation, agent workflow, browsing

Scale and Recover

Reference