Don’t Just Download S3 Files, Stream and Zip Them the Go Way
Need to download hundreds of files from S3 and serve them as a single downloadable zip file, without overloading your system or your user’s patience? In this blog, I’ll walk you through a production-grade solution written in Go that uses parallelism, streaming, and minimal memory overhead to efficiently zip and deliver S3 files.
🧠 The Problem
Let’s say your system stores PDFs, images, or reports in AWS S3, and a client wants to download a selected set of them, ideally, as a single zip archive.
Instead of downloading them one by one and then zipping them (which is inefficient and resource-heavy), you want to:
- Fetch all files in parallel
- Stream them into a single `.zip` file on the fly
- Avoid loading entire files into memory
- Serve or upload the resulting archive efficiently
What seems like a simple task quickly becomes non-trivial when performance, memory constraints, and scalability enter the equation, especially when you’re working with large file sets.
🏗️ The Solution Architecture
To handle this efficiently, we’ll leverage Go’s concurrency model and streaming IO:
- `goroutines` + `sync.WaitGroup` for concurrent S3 downloads
- `io.Pipe` to stream data from S3 directly into a zip archive
- `channels` for coordination between downloading and zipping
Flow Overview
- Spin up a goroutine for each S3 file using the AWS SDK
- Stream each file into an `io.Pipe`
- Read from each pipe and write it into a zip archive in parallel
- Properly close all writers and the zip stream after completion
🧬 Core Concepts in Use
- `io.Pipe`: bridges two goroutines; one writes (downloads from S3), the other reads (writes into the zip).
- `sync.WaitGroup`: waits for all download and zip workers to complete.
- `chan`: channels coordinate the work and carry metadata like filenames and zip paths.
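To see the bridging idea in isolation, here is a minimal, self-contained sketch of `io.Pipe` with no S3 involved: one goroutine plays the producer (the S3 download in our pipeline), the caller plays the consumer (the zip writer).

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pipeDemo streams a payload through an io.Pipe: a producer goroutine writes,
// the calling goroutine reads, and no buffer ever holds the whole payload
// beyond what io.Copy moves per chunk.
func pipeDemo(payload string) string {
	pr, pw := io.Pipe()
	go func() {
		defer pw.Close() // closing the writer delivers io.EOF to the reader
		io.Copy(pw, strings.NewReader(payload))
	}()
	data, _ := io.ReadAll(pr)
	return string(data)
}

func main() {
	fmt.Println(pipeDemo("streamed payload")) // prints "streamed payload"
}
```

Note that `io.Pipe` is synchronous: each write blocks until a reader consumes it, which is exactly the backpressure that keeps memory bounded in the pipeline below.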
💻 Code Walkthrough
Let’s walk through the simplified core functions that do the heavy lifting.
1. Kick Off Concurrent S3 Downloads
```go
func DownloadMultipleS3DocumentAsync(
	s3Documents []DocumentS3Data,
	pipeMap map[string]*io.PipeReader,
	wg *sync.WaitGroup,
	fileChannel chan S3ChannelData,
	s3Client *s3.Client,
) {
	downloader := manager.NewDownloader(s3Client)
	for _, doc := range s3Documents {
		pipeReader, pipeWriter := io.Pipe()
		pipeMap[doc.Key] = pipeReader
		wg.Add(1)
		go func(doc DocumentS3Data, pw *io.PipeWriter) {
			defer wg.Done()
			defer pw.Close()
			fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
			_, err := downloader.Download(context.TODO(), PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
				Bucket: aws.String(doc.Bucket),
				Key:    aws.String(doc.Key),
			})
			if err != nil {
				log.Errorf("Failed download: %s", err)
			}
		}(doc, pipeWriter)
	}
}
```
Each file is streamed directly from S3 into a PipeWriter, which the zip writer will later consume.
2. Stream Pipes Into a Zip File
```go
func WriteToZipAsync(
	pipeMap map[string]*io.PipeReader,
	fileChannel chan S3ChannelData,
	wg *sync.WaitGroup,
	zipFileChannel chan *os.File,
) {
	go func() {
		zipFile, err := os.Create(fmt.Sprintf("%s.zip", uuid.NewString()))
		if err != nil {
			log.Fatalf("Zip file creation failed: %v", err)
		}
		zipWriter := zip.NewWriter(zipFile)
		for data := range fileChannel {
			reader := pipeMap[data.Key]
			f, err := zipWriter.Create(data.Name)
			if err != nil {
				log.Errorf("Zip entry creation failed: %v", err)
				continue
			}
			// Copy from the pipe into the zip entry in 5 MB chunks, so the
			// whole object is never held in memory at once.
			buf := make([]byte, 5*1024*1024)
			wg.Add(1)
			for {
				n, readErr := reader.Read(buf)
				if n > 0 {
					if _, writeErr := f.Write(buf[:n]); writeErr != nil {
						log.Errorf("Error writing to zip entry for file %s: %s", data.Name, writeErr)
						reader.Close() // unblock the download goroutine feeding this pipe
						break
					}
				}
				if readErr == io.EOF { // this file has been read completely
					break
				}
				if readErr != nil {
					log.Errorf("Error reading from pipeReader for file %s: %s", data.Key, readErr)
					break
				}
			}
			wg.Done() // always release the WaitGroup, even on error
		}
		zipWriter.Close()
		// Rewind so the caller can read the finished archive from the start;
		// closing the file is left to the caller.
		zipFile.Seek(0, io.SeekStart)
		zipFileChannel <- zipFile
	}()
}
```
- Zipping happens concurrently but safely, via buffered IO and synchronization.
- Files are never fully loaded into memory: `f.Write` transfers data in chunks from the pipe directly into the zip entry.
3. Putting It All Together
```go
func CreateZippedArchive(s3Docs []DocumentS3Data, s3Client *s3.Client) (*os.File, error) {
	var wg sync.WaitGroup
	pipeMap := make(map[string]*io.PipeReader)
	fileChannel := make(chan S3ChannelData)
	zipFileChannel := make(chan *os.File)

	DownloadMultipleS3DocumentAsync(s3Docs, pipeMap, &wg, fileChannel, s3Client)
	WriteToZipAsync(pipeMap, fileChannel, &wg, zipFileChannel)

	wg.Wait()
	close(fileChannel)
	return <-zipFileChannel, nil
}
```
This is your end-to-end engine for streaming files from S3 and producing a zip archive, quickly and efficiently.
⚡ Why This Is Fast and Scalable
- No Temp Storage: all files are streamed; nothing is written to disk until the final zip.
- Controlled Memory: `io.Pipe` and chunked writes keep memory usage minimal, even with large files.
- High Throughput: goroutines allow full network and IO parallelism.
- Safe Cleanup: proper use of `WaitGroup` and pipe closures avoids leaks and zip corruption.
🛠️ Production Considerations
- Use `context.WithTimeout` to cancel slow or hanging downloads.
- Monitor memory use if files are massive.
- Limit parallelism if you hit AWS throttling or bandwidth bottlenecks.
- Add logging and retries for network reliability.
🧪 Real-World Use Case
This pattern is live in a production system that serves thousands of users, enabling them to bulk download reports, media, and invoices without latency or resource bottlenecks. It has been performance-tested with dozens to hundreds of files downloaded concurrently.
✅ TL;DR
To download and zip multiple S3 files in Go, efficiently and concurrently:
- Use `io.Pipe` to stream file content
- Launch parallel downloads using goroutines
- Zip them using a shared `zip.Writer`
- Coordinate all work with `sync.WaitGroup` and channels
🚀 Conclusion
Implementing a high-performance S3 bulk download and zip service in Go isn't just about saving time; it's about building scalable systems that respect memory, I/O, and user experience.
By leveraging Go’s concurrency primitives, streaming via io.Pipe, and clean channel orchestration, we avoid temporary file overhead and unlock true parallelism.
Whether you’re zipping tens or thousands of documents, this pattern can serve as a battle-tested foundation for production-scale document bundling workflows. If you’re looking to integrate similar capabilities in your system, I hope this breakdown gives you both the architectural clarity and code-level confidence to get started.
