R2 Data Catalog snapshot expiration now removes unreferenced data files
Key Points
- Automatic removal of unreferenced data files during snapshot expiration
- Eliminates manual maintenance jobs and reduces storage costs
- Simplified configuration with Wrangler CLI commands
Summary
Cloudflare R2 Data Catalog, a managed Apache Iceberg catalog, now automatically removes unreferenced data files during snapshot expiration. This eliminates manual maintenance overhead and reduces storage costs.
Key Points
- Automatic cleanup: Snapshot expiration now removes both metadata files (manifests, manifest lists) and orphaned data files in a single operation
- Cost reduction: Stale data files no longer consume storage after being dereferenced by active snapshots
- Operational simplification: Eliminates the need to manually run remove_orphan_files or expire_snapshots through Spark or other engines
- Easy enablement: Configure with the npx wrangler r2 bucket catalog snapshot-expiration enable command
- Flexible retention: Set expiration policies with the --older-than-days and --retain-last parameters
Configuration Example
Enable catalog-level snapshot expiration to automatically clean up snapshots older than 7 days while retaining the last 10 snapshots.
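The configuration described above might look like the following sketch. The subcommand and both flags are the ones named in the key points. The bucket name my-bucket is a placeholder, and passing it as a positional argument is an assumption, so check the Wrangler help output for the exact argument order before relying on it.

```shell
# Hypothetical bucket name; replace with a bucket that has R2 Data Catalog enabled.
# Assumption: the bucket name is passed as a positional argument after "enable".
# Expire snapshots older than 7 days, but always keep the 10 most recent ones.
npx wrangler r2 bucket catalog snapshot-expiration enable my-bucket \
  --older-than-days 7 \
  --retain-last 10
```

With a policy like this in place, data files that are no longer referenced by any retained snapshot should be removed as part of the same expiration run, so no separate orphan-file cleanup job is needed.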