Don’t Just Download S3 Files, Stream and Zip Them the Go Way
Need to download hundreds of files from S3 and serve them as a single downloadable zip file, without overloading your system or your user’s patience? In this blog, I’ll walk you through a production-grade solution written in Go that uses parallelism, streaming, and minimal memory overhead to efficiently zip and deliver S3 files.
🧠 The Problem
Let’s say your system stores PDFs, images, or reports in AWS S3, and a client wants to download a selected set of them, ideally as a single zip archive.
Instead of downloading them one by one and then zipping them (which is inefficient and resource-heavy), you want to:
- Fetch all files in parallel
- Stream them into a single .zip file on the fly
- Avoid loading entire files into memory
- Serve or upload the resulting archive efficiently
What seems like a simple task quickly becomes non-trivial when performance, memory constraints, and scalability enter the equation, especially when you’re working with large file sets.
🏗️ The Solution Architecture
To handle this efficiently, we’ll leverage Go’s concurrency model and streaming IO:
- goroutines + sync.WaitGroup for concurrent S3 downloads
- io.Pipe to stream data from S3 directly into a zip archive
- channels for coordination between downloading and zipping
Flow Overview
- Spin up a goroutine for each S3 file using the AWS SDK
- Stream each file into an io.Pipe
- Read from each pipe and write to a zip archive in parallel
- Properly close all writers and the zip stream after completion
🧬 Core Concepts in Use
- io.Pipe: Bridges two goroutines: one writes (downloads from S3), the other reads (writes into the zip).
- sync.WaitGroup: Waits for all download and zip workers to complete.
- channels: Coordinate the work and share metadata like filenames and zip paths.
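Before the full implementation, here is a minimal, self-contained sketch of the io.Pipe bridging pattern on its own: one goroutine writes into the pipe while another reads from the other end, so data flows through a small in-memory buffer instead of being accumulated anywhere.

package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	pr, pw := io.Pipe()

	// Producer goroutine: writes into the pipe and closes it when done,
	// which signals EOF to the reader.
	go func() {
		defer pw.Close()
		io.Copy(pw, strings.NewReader("streamed through a pipe"))
	}()

	// Consumer: reads from the other end; each Read blocks until the producer writes.
	data, _ := io.ReadAll(pr)
	fmt.Println(string(data))
}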
💻 Code Walkthrough
Let’s walk through the simplified core functions that do the heavy lifting.
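The snippets also reference a couple of small supporting types that aren’t shown in the post. As an assumption about their shape (the field names here are illustrative), they might look like this:

// DocumentS3Data describes one object to fetch from S3 (assumed shape).
type DocumentS3Data struct {
	Bucket string // bucket the object lives in
	Key    string // object key within the bucket
	Name   string // file name to use for the entry inside the zip
}

// S3ChannelData is the metadata the download workers hand to the zip worker (assumed shape).
type S3ChannelData struct {
	Key  string // S3 key, used to look up the matching pipe reader
	Name string // name of the zip entry to create
}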
1. Kick Off Concurrent S3 Downloads
func DownloadMultipleS3DocumentAsync(
	s3Documents []DocumentS3Data,
	pipeMap map[string]*io.PipeReader,
	wg *sync.WaitGroup,
	fileChannel chan S3ChannelData,
	s3Client *s3.Client,
) {
	// A pipe only supports sequential writes, so force the download manager to
	// fetch parts one at a time instead of writing them at arbitrary offsets.
	downloader := manager.NewDownloader(s3Client, func(d *manager.Downloader) {
		d.Concurrency = 1
	})
	for _, doc := range s3Documents {
		pipeReader, pipeWriter := io.Pipe()
		pipeMap[doc.Key] = pipeReader
		wg.Add(1)
		go func(doc DocumentS3Data, pw *io.PipeWriter) {
			defer wg.Done()
			defer pw.Close()
			// Announce the file to the zip worker before streaming its content.
			fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
			// context.TODO() is a placeholder; prefer a caller-provided context with a timeout.
			_, err := downloader.Download(context.TODO(), PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
				Bucket: aws.String(doc.Bucket),
				Key:    aws.String(doc.Key),
			})
			if err != nil {
				log.Errorf("Failed download: %s", err)
			}
		}(doc, pipeWriter)
	}
}
Each file is streamed directly from S3 into a PipeWriter, which the zip writer will later consume.
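One piece the snippet assumes but doesn’t show is PipeWriterWrapper. The S3 download manager writes through an io.WriterAt, while a pipe only supports sequential writes, so the wrapper’s job is to drop the offset and forward bytes in order, which is also why the downloader’s concurrency is pinned to 1 above. A minimal sketch of such an adapter (not necessarily the exact production type) could be:

// PipeWriterWrapper adapts a sequential writer to the io.WriterAt interface that
// manager.Downloader.Download expects. It ignores the offset, so it is only safe
// when parts arrive in order (downloader Concurrency = 1). Sketch, assumed shape.
type PipeWriterWrapper struct {
	Writer io.Writer
}

func (w PipeWriterWrapper) WriteAt(p []byte, _ int64) (int, error) {
	return w.Writer.Write(p)
}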
2. Stream Pipes Into a Zip File
func WriteToZipAsync(
	pipeMap map[string]*io.PipeReader,
	fileChannel chan S3ChannelData,
	wg *sync.WaitGroup,
	zipFileChannel chan *os.File,
) {
	go func() {
		zipFile, err := os.Create(fmt.Sprintf("%s.zip", uuid.NewString()))
		if err != nil {
			log.Fatalf("Zip file creation failed: %v", err)
		}
		zipWriter := zip.NewWriter(zipFile)
		for data := range fileChannel {
			reader := pipeMap[data.Key]
			f, err := zipWriter.Create(data.Name)
			if err != nil {
				log.Errorf("Zip entry creation failed: %v", err)
				continue
			}
			wg.Add(1)
			buf := make([]byte, 5*1024*1024) // 5 MB read buffer, reused for every chunk
			for {
				n, readErr := reader.Read(buf)
				if n > 0 {
					if _, writeErr := f.Write(buf[:n]); writeErr != nil {
						log.Errorf("Error writing to zip entry %s: %s", data.Name, writeErr)
						break
					}
				}
				if readErr == io.EOF { // the downloader closed the pipe: this file is done
					break
				}
				if readErr != nil {
					log.Errorf("Error reading from pipeReader for file %s: %s", data.Key, readErr)
					break
				}
			}
			wg.Done()
		}
		// All files have been written; finish the archive and hand it back.
		zipWriter.Close()
		zipFile.Close()
		zipFileChannel <- zipFile
	}()
}
- Zipping happens concurrently but safely, via buffered reads and synchronization.
- Files are never fully loaded into memory: f.Write moves data from the pipe into the zip entry in fixed-size chunks.
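If a custom buffer size isn’t important, the manual read loop above can be collapsed into io.Copy, which performs the same chunked transfer from the pipe into the zip entry. A simplified variant of the body of the for data := range fileChannel loop, keeping the same WaitGroup bookkeeping, would be:

// Replaces the inner read/write loop: io.Copy streams from the pipe into the
// zip entry in chunks and returns once the downloader closes the pipe (EOF).
wg.Add(1)
if _, err := io.Copy(f, reader); err != nil {
	log.Errorf("Error copying %s into zip entry %s: %s", data.Key, data.Name, err)
}
wg.Done()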
3. Putting It All Together
func CreateZippedArchive(s3Docs []DocumentS3Data, s3Client *s3.Client) (*os.File, error) {
	var wg sync.WaitGroup
	pipeMap := make(map[string]*io.PipeReader)
	fileChannel := make(chan S3ChannelData)
	zipFileChannel := make(chan *os.File)
	DownloadMultipleS3DocumentAsync(s3Docs, pipeMap, &wg, fileChannel, s3Client)
	WriteToZipAsync(pipeMap, fileChannel, &wg, zipFileChannel)
	wg.Wait()          // wait for every download and every zip entry to finish
	close(fileChannel) // lets the zip worker exit its loop and finalize the archive
	return <-zipFileChannel, nil // the finished (already closed) zip file
}
This is your end-to-end engine for streaming files from S3 and producing a zip archive quickly and efficiently.
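To make the usage concrete, here is a hypothetical HTTP handler that wires CreateZippedArchive into a bulk-download endpoint. The handler name, request plumbing, and cleanup strategy are assumptions for illustration, not part of the original code:

// handleBulkDownload builds the archive for the requested documents and streams it back.
func handleBulkDownload(w http.ResponseWriter, r *http.Request, s3Client *s3.Client, docs []DocumentS3Data) {
	zipFile, err := CreateZippedArchive(docs, s3Client)
	if err != nil {
		http.Error(w, "failed to build archive", http.StatusInternalServerError)
		return
	}
	defer os.Remove(zipFile.Name()) // delete the temporary archive once it has been served

	w.Header().Set("Content-Type", "application/zip")
	w.Header().Set("Content-Disposition", `attachment; filename="documents.zip"`)
	// The returned handle is already closed, so serve the file by name.
	http.ServeFile(w, r, zipFile.Name())
}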
⚡ Why This Is Fast and Scalable
- No Temp Storage: All files are streamed; nothing is written to disk until the final zip.
- Controlled Memory: io.Pipe and chunked copying ensure minimal memory usage, even with large files.
- High Throughput: goroutines allow full network and IO parallelism.
- Safe Cleanup: Proper use of WaitGroup and pipe closures avoids leaks or zip corruption.
🛠️ Production Considerations
- Use context.WithTimeout to cancel slow or hanging downloads (a sketch combining this with a concurrency limit follows this list).
- Monitor memory use if files are massive.
- Limit parallelism if you hit AWS throttling or bandwidth bottlenecks.
- Add logging and retries for network reliability.
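As a sketch of the first and third points (the function name, timeout, and limit are illustrative, and consumeIntoZip is a hypothetical stand-in for the zip worker shown earlier), a timeout-aware, bounded-concurrency variant of the download loop could look like this:

// downloadWithLimits is an illustrative variant of the download loop that adds a
// per-file timeout and caps the number of in-flight downloads with a semaphore.
func downloadWithLimits(ctx context.Context, docs []DocumentS3Data, downloader *manager.Downloader, maxParallel int) {
	sem := make(chan struct{}, maxParallel) // counting semaphore that bounds concurrency
	var wg sync.WaitGroup
	for _, doc := range docs {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the download
		go func(doc DocumentS3Data) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this download finishes

			dlCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
			defer cancel()

			pr, pw := io.Pipe()
			go consumeIntoZip(doc, pr) // hypothetical consumer; stands in for the zip worker

			_, err := downloader.Download(dlCtx, PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
				Bucket: aws.String(doc.Bucket),
				Key:    aws.String(doc.Key),
			})
			pw.CloseWithError(err) // nil closes with EOF; non-nil surfaces the failure to the reader
		}(doc)
	}
	wg.Wait()
}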
🧪 Real-World Use Case
This pattern is live in a production system that serves thousands of users, enabling them to bulk download reports, media, and invoices without latency or resource bottlenecks. It has been performance-tested with dozens to hundreds of files downloaded at once.
✅ TL;DR
To download and zip multiple S3 files in Go, efficiently and concurrently:
- Use io.Pipe to stream file content
- Launch parallel downloads using goroutines
- Zip them using a shared zip.Writer
- Coordinate all work with sync.WaitGroup and channels
🚀 Conclusion
Implementing a high-performance S3 bulk download and zip service in Go isn’t just about saving time; it’s about building scalable systems that respect memory, I/O, and user experience.
By leveraging Go’s concurrency primitives, streaming via io.Pipe, and clean channel orchestration, we avoid temporary file overhead and unlock true parallelism.
Whether you’re zipping tens or thousands of documents, this pattern can serve as a battle-tested foundation for production-scale document bundling workflows. If you’re looking to integrate similar capabilities in your system, I hope this breakdown gives you both the architectural clarity and code-level confidence to get started.
Learn more: Building a High-Performance S3 Bulk Downloader and Zipper in Go