Building a High-Performance S3 Bulk Downloader and Zipper in Go

Don’t Just Download S3 Files, Stream and Zip Them the Go Way

Need to download hundreds of files from S3 and serve them as a single downloadable zip file, without overloading your system or your user’s patience? In this post, I’ll walk you through a production-grade solution written in Go that uses parallelism, streaming, and minimal memory overhead to efficiently zip and deliver S3 files.

🧠 The Problem

Let’s say your system stores PDFs, images, or reports in AWS S3, and a client wants to download a selected set of them, ideally as a single zip archive.

Instead of downloading them one by one and then zipping them (which is inefficient and resource-heavy), you want to:

  • Fetch all files in parallel
  • Stream them into a single .zip file on the fly
  • Avoid loading entire files into memory
  • Serve or upload the resulting archive efficiently

What seems like a simple task quickly becomes non-trivial when performance, memory constraints, and scalability enter the equation, especially when you’re working with large file sets.

🏗️ The Solution Architecture

To handle this efficiently, we’ll leverage Go’s concurrency model and streaming IO:

  • goroutines + sync.WaitGroup for concurrent S3 downloads
  • io.Pipe to stream data from S3 directly into a zip archive
  • channels for coordination between downloading and zipping

Flow Overview

  1. Spin up a goroutine for each S3 file using the AWS SDK
  2. Stream each file into an io.Pipe
  3. Read from each pipe and stream the data into a single zip archive as it arrives
  4. Properly close all writers and the zip stream after completion
(Figure: flow architecture diagram showing S3 downloads streaming through io.Pipe into the zip writer)

🧬 Core Concepts in Use

  • io.Pipe: Bridges two goroutines: one writes (downloads from S3) while the other reads (writes into the zip); see the minimal sketch below.
  • sync.WaitGroup: Waits for all download and zip workers to complete.
  • chan: Channels coordinate the work and share metadata like filenames and zip paths.
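
Before the S3-specific code, here is a minimal, self-contained illustration of the io.Pipe bridge. The data and names are purely illustrative and have nothing to do with S3:

package main

import (
    "fmt"
    "io"
)

func main() {
    pr, pw := io.Pipe()

    go func() {
        // Writer side (in the real pipeline: the S3 download goroutine).
        // Closing the writer is what lets the reader see io.EOF.
        defer pw.Close()
        fmt.Fprint(pw, "hello from the writer goroutine")
    }()

    // Reader side (in the real pipeline: the zip goroutine).
    data, err := io.ReadAll(pr)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(data))
}

The key property: writes block until someone reads, so data flows through a small in-memory buffer instead of accumulating anywhere.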

💻 Code Walkthrough

Let’s walk through the simplified core functions that do the heavy lifting.
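
The functions below also reference two small data types that are not shown in full here. A minimal sketch of what they might look like, with field names inferred from how they are used (so treat them as assumptions):

// Assumed shapes of the helper types used throughout the walkthrough.
// The field names are inferred from usage and may differ in the real code.
type DocumentS3Data struct {
    Bucket string // S3 bucket holding the object
    Key    string // S3 object key
    Name   string // file name to give the entry inside the zip
}

type S3ChannelData struct {
    Key  string // S3 object key, used to look up the matching pipe reader
    Name string // file name for the zip entry
}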

1. Kick Off Concurrent S3 Downloads

func DownloadMultipleS3DocumentAsync(
    s3Documents []DocumentS3Data,
    pipeMap map[string]*io.PipeReader,
    wg *sync.WaitGroup,
    fileChannel chan S3ChannelData,
    s3Client *s3.Client,
) {
    downloader := manager.NewDownloader(s3Client)
    for _, doc := range s3Documents {
        // One pipe per file: the download goroutine writes into pipeWriter,
        // and the zip goroutine later consumes the matching pipeReader.
        pipeReader, pipeWriter := io.Pipe()
        pipeMap[doc.Key] = pipeReader
        wg.Add(1)
        go func(doc DocumentS3Data, pw *io.PipeWriter) {
            defer wg.Done()
            defer pw.Close()
            // Announce the file so the zip goroutine knows to pick up its pipe.
            fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
            _, err := downloader.Download(context.TODO(), PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
                Bucket: aws.String(doc.Bucket),
                Key:    aws.String(doc.Key),
            })
            if err != nil {
                log.Errorf("Failed download: %s", err)
            }
        }(doc, pipeWriter)
    }
}

Each file is streamed directly from S3 into a PipeWriter, which the zip writer will later consume.
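
One detail worth calling out: manager.Downloader writes through an io.WriterAt, not a plain io.Writer, because it normally fetches parts of an object concurrently and writes them at offsets. PipeWriterWrapper is not shown above; this is one possible shape for it, assuming the downloader is configured to fetch parts sequentially:

// PipeWriterWrapper (one possible implementation; the original type isn't
// shown) adapts an io.Writer to the io.WriterAt interface required by
// manager.Downloader. It ignores the offset, so it only works when parts
// arrive in order, which means the downloader should be created with its
// concurrency set to 1, for example:
//
//     downloader := manager.NewDownloader(s3Client, func(d *manager.Downloader) {
//         d.Concurrency = 1
//     })
type PipeWriterWrapper struct {
    Writer io.Writer
}

func (w PipeWriterWrapper) WriteAt(p []byte, _ int64) (int, error) {
    return w.Writer.Write(p)
}

If you keep the default concurrency, parts can arrive out of order and the zip entry would be corrupted, so the sequential setting is the important half of this adapter.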

2. Stream Pipes Into a Zip File

func WriteToZipAsync(
    pipeMap map[string]*io.PipeReader,
    fileChannel chan S3ChannelData,
    wg *sync.WaitGroup,
    zipFileChannel chan *os.File,
) {
    go func() {
        zipFile, err := os.Create(fmt.Sprintf("%s.zip", uuid.NewString()))
        if err != nil {
            log.Fatalf("Zip file creation failed: %v", err)
        }
        zipWriter := zip.NewWriter(zipFile)
        buffer := make([]byte, 5*1024*1024) // 5 MB read buffer, reused for every file
        for data := range fileChannel {
            reader := pipeMap[data.Key]
            f, err := zipWriter.Create(data.Name)
            if err != nil {
                log.Errorf("Zip entry creation failed: %v", err)
                reader.CloseWithError(err) // unblock the downloader feeding this pipe
                continue
            }
            wg.Add(1)
            // Copy the pipe into the zip entry chunk by chunk until EOF.
            for {
                n, readErr := reader.Read(buffer)
                if n > 0 {
                    if _, writeErr := f.Write(buffer[:n]); writeErr != nil {
                        log.Errorf("Error writing to zip entry %s: %s", data.Name, writeErr)
                        break
                    }
                }
                if readErr == io.EOF {
                    break // this file has been fully read and written
                }
                if readErr != nil {
                    log.Errorf("Error reading from pipeReader for file %s: %s", data.Key, readErr)
                    break
                }
            }
            reader.Close() // harmless after EOF; unblocks the downloader if we bailed out early
            wg.Done()      // this zip entry is finished, successfully or not
        }
        zipWriter.Close()
        // Rewind so the caller receives a handle it can stream from the start;
        // closing (and removing) the file is left to the caller.
        if _, err := zipFile.Seek(0, io.SeekStart); err != nil {
            log.Errorf("Failed to rewind zip file: %v", err)
        }
        zipFileChannel <- zipFile
    }()
}
  • Zipping runs concurrently with the downloads, but every write to the archive goes through this single goroutine, which keeps the zip stream consistent.
  • Files are never fully loaded into memory; only one 5 MB buffer is held at a time.
  • f.Write copies each chunk read from the pipe straight into the current zip entry.

3. Putting It All Together

func CreateZippedArchive(s3Docs []DocumentS3Data, s3Client *s3.Client) (*os.File, error) {
    var wg sync.WaitGroup
    pipeMap := make(map[string]*io.PipeReader)
    fileChannel := make(chan S3ChannelData)
    zipFileChannel := make(chan *os.File)

    DownloadMultipleS3DocumentAsync(s3Docs, pipeMap, &wg, fileChannel, s3Client)
    WriteToZipAsync(pipeMap, fileChannel, &wg, zipFileChannel)

    wg.Wait()          // every download and every zip entry has finished
    close(fileChannel) // lets the zip goroutine finalize and hand back the archive
    return <-zipFileChannel, nil
}

This is the end-to-end engine: it streams the selected files out of S3 and produces a zip archive quickly and with minimal overhead.
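
To make the "serve it" part concrete, here is one way the archive could be handed to an HTTP client. This handler is an illustrative sketch, not part of the original code; it assumes CreateZippedArchive returns an open handle positioned at the start of the archive (as in the version above) and that the caller is responsible for cleaning the file up afterwards:

// Hypothetical HTTP handler (not part of the original code) that bundles the
// requested documents and streams the finished zip back to the client.
func handleBulkDownload(w http.ResponseWriter, r *http.Request, s3Docs []DocumentS3Data, s3Client *s3.Client) {
    zipFile, err := CreateZippedArchive(s3Docs, s3Client)
    if err != nil {
        http.Error(w, "failed to build archive", http.StatusInternalServerError)
        return
    }
    defer func() {
        zipFile.Close()
        os.Remove(zipFile.Name()) // the archive is only needed for this response
    }()

    w.Header().Set("Content-Type", "application/zip")
    w.Header().Set("Content-Disposition", `attachment; filename="documents.zip"`)
    if _, err := io.Copy(w, zipFile); err != nil {
        log.Errorf("Failed to stream zip to client: %v", err)
    }
}

An equally valid option is to upload the finished file back to S3 and hand the client a pre-signed URL; the zipping pipeline stays exactly the same either way.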

⚡ Why This Is Fast and Scalable

  • No Temp Storage: All files are streamed, nothing is written to disk until the final zip.
  • Controlled Memory: io.Pipe and a fixed-size read buffer keep memory usage minimal, even with large files.
  • High Throughput: goroutines allow full network and IO parallelism.
  • Safe Cleanup: Proper use of WaitGroup and pipe closures avoids leaks or zip corruption.

🛠️ Production Considerations

  • Use context.WithTimeout to cancel slow or hanging downloads (see the sketch after this list).
  • Monitor memory use if files are massive.
  • Limit parallelism if hitting AWS throttling or bandwidth bottlenecks.
  • Add logging and retries for network reliability.
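
The timeout and the parallelism cap are straightforward to bolt onto the download loop from step 1. Here is a hedged sketch of one way to do it, using a per-file timeout and a semaphore channel; maxConcurrent and downloadTimeout are assumed configuration values, not part of the original function signature:

// Illustrative variant of the download loop inside DownloadMultipleS3DocumentAsync,
// with a per-file timeout and a cap on concurrent downloads.
sem := make(chan struct{}, maxConcurrent) // counting semaphore

for _, doc := range s3Documents {
    pipeReader, pipeWriter := io.Pipe()
    pipeMap[doc.Key] = pipeReader
    wg.Add(1)
    go func(doc DocumentS3Data, pw *io.PipeWriter) {
        defer wg.Done()
        defer pw.Close()

        sem <- struct{}{}        // acquire a slot; blocks once maxConcurrent downloads are in flight
        defer func() { <-sem }() // release the slot when this download finishes

        ctx, cancel := context.WithTimeout(context.Background(), downloadTimeout)
        defer cancel()

        fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
        if _, err := downloader.Download(ctx, PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
            Bucket: aws.String(doc.Bucket),
            Key:    aws.String(doc.Key),
        }); err != nil {
            log.Errorf("Failed download: %s", err)
            pw.CloseWithError(err) // unblock the zip goroutine instead of leaving it waiting on EOF
        }
    }(doc, pipeWriter)
}

Note that the semaphore is acquired before the file is announced on fileChannel, so the zip goroutine only ever waits on pipes whose downloads are actually running. Retries are best done at the whole-file level: with the sequential pipe adapter, re-sending bytes that were already written into the pipe would corrupt the zip entry.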

🧪 Real-World Use Case

This pattern is live in a production system that serves thousands of users, enabling them to bulk download reports, media, and invoices without latency or resource bottlenecks. It has been performance-tested with batches ranging from dozens to hundreds of files at a time.

✅ TL;DR

To download and zip multiple S3 files in Go, efficiently and concurrently:

  • Use io.Pipe to stream file content
  • Launch parallel downloads using goroutines
  • Zip them using a shared zip.Writer
  • Coordinate all work with sync.WaitGroup and channels

🚀 Conclusion

Implementing a high-performance S3 bulk download and zip service in Go isn’t just about saving time; it’s about building scalable systems that respect memory, I/O, and user experience.

By leveraging Go’s concurrency primitives, streaming via io.Pipe, and clean channel orchestration, we avoid temporary file overhead and unlock true parallelism.

Whether you’re zipping tens or thousands of documents, this pattern can serve as a battle-tested foundation for production-scale document bundling workflows. If you’re looking to integrate similar capabilities in your system, I hope this breakdown gives you both the architectural clarity and code-level confidence to get started.
