R2 Data Catalog snapshot expiration now removes unreferenced data files
Key Points
- Snapshot expiration now deletes unreferenced data files
- Reduces storage costs and manual maintenance
- Enable via Wrangler snapshot-expiration settings
Summary
R2 Data Catalog (the managed Apache Iceberg catalog in R2) now removes unreferenced data files from R2 storage when snapshots are expired. Previously, snapshot expiration only cleaned up Iceberg metadata (manifests and manifest lists), so orphaned data files remained in the bucket until you ran manual maintenance (for example, Spark's remove_orphan_files or expire_snapshots). Automating data-file cleanup reduces both storage costs and operational overhead.
Key Points
- Automatic deletion of data files that are no longer referenced by retained snapshots when a snapshot is expired.
- Prior behavior only removed metadata files; stale data files required manual reclamation.
- Operational impact: lower storage costs and fewer manual maintenance jobs.
- Example to enable catalog-level expiration via Wrangler:
  npx wrangler r2 bucket catalog snapshot-expiration enable my-bucket --older-than-days 7 --retain-last 10
- See the maintenance documentation for retention semantics and other automatic maintenance operations.
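To make the cleanup rule in the points above concrete, here is a minimal Python sketch of how a set of deletable data files could be derived when snapshots expire. It is an illustration only, not R2 Data Catalog's implementation: the `Snapshot` type, the `plan_expiration` function, and the file names are hypothetical, and the reading of the two settings (expire snapshots older than N days, but always keep the most recent M) is an assumption to be checked against the maintenance documentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for an Iceberg snapshot and the data files it references."""
    snapshot_id: int
    committed_at: datetime
    data_files: frozenset


def plan_expiration(snapshots, now, older_than_days, retain_last):
    """Return (expired snapshots, data files no retained snapshot references).

    Assumed semantics: a snapshot expires only if it is older than the cutoff
    AND it is not among the `retain_last` most recent snapshots.
    """
    ordered = sorted(snapshots, key=lambda s: s.committed_at, reverse=True)
    cutoff = now - timedelta(days=older_than_days)

    retained, expired = [], []
    for index, snap in enumerate(ordered):
        if index < retain_last or snap.committed_at >= cutoff:
            retained.append(snap)
        else:
            expired.append(snap)

    # A file is safe to delete only if every snapshot that references it expired.
    still_referenced = set().union(*(s.data_files for s in retained))
    candidates = set().union(*(s.data_files for s in expired))
    return expired, candidates - still_referenced


# Illustrative data: three snapshots that share some data files.
snapshots = [
    Snapshot(1, datetime(2025, 1, 1), frozenset({"a.parquet", "b.parquet"})),
    Snapshot(2, datetime(2025, 1, 10), frozenset({"b.parquet", "c.parquet"})),
    Snapshot(3, datetime(2025, 1, 30), frozenset({"c.parquet", "d.parquet"})),
]

expired, deletable = plan_expiration(
    snapshots, now=datetime(2025, 1, 31), older_than_days=7, retain_last=1
)
print([s.snapshot_id for s in expired])
print(sorted(deletable))
```

In this sketch, c.parquet survives even though an expired snapshot references it, because the retained snapshot still needs it. That distinction is what separates unreferenced data files from files that merely appear in old snapshots.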