Seek Optimized Zip (SOZip) files are cloud-native archives
I recently came across a new specification called Seek Optimized Zip (SOZip). It builds on the ZIP format and remains compatible with existing ZIP tools. The key feature of SOZip is that it is seekable: you can decompress and read a file, or even just part of a large compressed file, from the archive without having to download the entire archive.
SOZip offers some excellent advantages for storing large datasets in the cloud:
- We can combine multiple files into a single archive when storing them in object storage like S3. This can help reduce per-request costs and improve performance when storing a large number of small files.
- Since SOZip is a seekable archive format, we can decompress and read a specific file from the archive without having to download the entire archive. This can help reduce the amount of data that needs to be downloaded when working with large datasets.
- The files in the SOZip archive are compressed, so we save on storage costs too.
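
To make the "read one file without downloading everything" point concrete, here is a minimal sketch in Python. It is not an SOZip-aware reader: it only uses the standard `zipfile` module, so it shows random access to a single member of an ordinary ZIP, not seeking inside a compressed member through the SOZip index. `RangeReader`, `fetch_range`, and `build_archive` are hypothetical names I made up for illustration, and the "remote object" is simulated with an in-memory buffer so the example runs anywhere. In a real setup, `fetch_range` would presumably issue an HTTP range request against the object store.

```python
import io
import os
import zipfile


class RangeReader(io.RawIOBase):
    """Seekable, read-only file object that pulls byte ranges on demand.

    Hypothetical stand-in for a remote object: fetch_range slices an
    in-memory buffer here, where a real version would request a byte range.
    """

    def __init__(self, blob: bytes):
        self._blob = blob
        self._pos = 0
        self.bytes_fetched = 0  # total bytes "downloaded" so far

    def fetch_range(self, start: int, end: int) -> bytes:
        chunk = self._blob[start:end]
        self.bytes_fetched += len(chunk)
        return chunk

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = len(self._blob) + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        return self._pos

    def readinto(self, buf) -> int:
        # RawIOBase.read() is built on top of readinto().
        chunk = self.fetch_range(self._pos, self._pos + len(buf))
        buf[: len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def build_archive(n_fillers: int = 50) -> bytes:
    """Create a test archive: many incompressible fillers plus one small target."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_fillers):
            zf.writestr(f"filler_{i:03d}.bin", os.urandom(20_000))
        zf.writestr("target.txt", b"hello from inside the archive\n" * 100)
    return out.getvalue()


blob = build_archive()
reader = RangeReader(blob)
with zipfile.ZipFile(reader) as zf:
    # zipfile reads the central directory at the end of the archive,
    # then seeks straight to the one member we asked for.
    data = zf.read("target.txt")

print(f"archive size:  {len(blob)} bytes")
print(f"bytes fetched: {reader.bytes_fetched} bytes")
```

The fetched byte count should come out as a small fraction of the archive size, since only the end-of-archive records, the central directory, and the one requested member are read. SOZip extends the same idea to seeking within a single large compressed member.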