Building a High-Performance S3 Bulk Downloader and Zipper in Go

Don’t Just Download S3 Files, Stream and Zip Them the Go Way

Need to download hundreds of files from S3 and serve them as a single downloadable zip file, without overloading your system or your user’s patience? In this post, I’ll walk you through a production-grade solution written in Go that uses parallelism, streaming, and minimal memory overhead to efficiently zip and deliver S3 files.

🧠 The Problem

Let’s say your system stores PDFs, images, or reports in AWS S3, and a client wants to download a selected set of them, ideally as a single zip archive.

Instead of downloading them one by one and then zipping them (which is inefficient and resource-heavy), you want to:

  • Fetch all files in parallel
  • Stream them into a single .zip file on the fly
  • Avoid loading entire files into memory
  • Serve or upload the resulting archive efficiently

What seems like a simple task quickly becomes non-trivial when performance, memory constraints, and scalability enter the equation, especially when you’re working with large file sets.

🏗️ The Solution Architecture

To handle this efficiently, we’ll leverage Go’s concurrency model and streaming IO:

  • goroutines + sync.WaitGroup for concurrent S3 downloads
  • io.Pipe to stream data from S3 directly into a zip archive
  • channels for coordination between downloading and zipping

Flow Overview

  1. Spin up a goroutine for each S3 file using the AWS SDK
  2. Stream each file into an io.Pipe
  3. Read from each pipe and stream the data into a single zip archive as it arrives
  4. Properly close all writers and the zip stream after completion
(Figure: flow architecture diagram showing S3 downloads streaming through io.Pipe into the zip writer)

🧬 Core Concepts in Use

  • io.Pipe: Bridges two goroutines: one writes (downloads from S3) while the other reads (writes into the zip); see the minimal sketch below.
  • sync.WaitGroup: Waits for all download and zip workers to complete.
  • chan: Channels coordinate the work and share metadata like filenames and zip paths.
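
Before the S3-specific code, here is a minimal, self-contained illustration of the io.Pipe bridge. The data and names are purely illustrative and have nothing to do with S3:

package main

import (
    "fmt"
    "io"
)

func main() {
    pr, pw := io.Pipe()

    go func() {
        // Writer side (in the real pipeline: the S3 download goroutine).
        // Closing the writer is what lets the reader see io.EOF.
        defer pw.Close()
        fmt.Fprint(pw, "hello from the writer goroutine")
    }()

    // Reader side (in the real pipeline: the zip goroutine).
    data, err := io.ReadAll(pr)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(data))
}

The key property: writes block until someone reads, so data flows through a small in-memory buffer instead of accumulating anywhere.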

💻 Code Walkthrough

Let’s walk through the simplified core functions that do the heavy lifting.
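
The functions below also reference two small data types that are not shown in full here. A minimal sketch of what they might look like, with field names inferred from how they are used (so treat them as assumptions):

// Assumed shapes of the helper types used throughout the walkthrough.
// The field names are inferred from usage and may differ in the real code.
type DocumentS3Data struct {
    Bucket string // S3 bucket holding the object
    Key    string // S3 object key
    Name   string // file name to give the entry inside the zip
}

type S3ChannelData struct {
    Key  string // S3 object key, used to look up the matching pipe reader
    Name string // file name for the zip entry
}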

1. Kick Off Concurrent S3 Downloads

func DownloadMultipleS3DocumentAsync(
    s3Documents []DocumentS3Data,
    pipeMap map[string]*io.PipeReader,
    wg *sync.WaitGroup,
    fileChannel chan S3ChannelData,
    s3Client *s3.Client,
) {
    downloader := manager.NewDownloader(s3Client)
    for _, doc := range s3Documents {
        // One pipe per file: the download goroutine writes into pipeWriter,
        // and the zip goroutine later consumes the matching pipeReader.
        pipeReader, pipeWriter := io.Pipe()
        pipeMap[doc.Key] = pipeReader
        wg.Add(1)
        go func(doc DocumentS3Data, pw *io.PipeWriter) {
            defer wg.Done()
            defer pw.Close()
            // Announce the file so the zip goroutine knows to pick up its pipe.
            fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
            _, err := downloader.Download(context.TODO(), PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
                Bucket: aws.String(doc.Bucket),
                Key:    aws.String(doc.Key),
            })
            if err != nil {
                log.Errorf("Failed download: %s", err)
            }
        }(doc, pipeWriter)
    }
}

Each file is streamed directly from S3 into a PipeWriter, which the zip writer will later consume.
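
One detail worth calling out: manager.Downloader writes through an io.WriterAt, not a plain io.Writer, because it normally fetches parts of an object concurrently and writes them at offsets. PipeWriterWrapper is not shown above; this is one possible shape for it, assuming the downloader is configured to fetch parts sequentially:

// PipeWriterWrapper (one possible implementation; the original type isn't
// shown) adapts an io.Writer to the io.WriterAt interface required by
// manager.Downloader. It ignores the offset, so it only works when parts
// arrive in order, which means the downloader should be created with its
// concurrency set to 1, for example:
//
//     downloader := manager.NewDownloader(s3Client, func(d *manager.Downloader) {
//         d.Concurrency = 1
//     })
type PipeWriterWrapper struct {
    Writer io.Writer
}

func (w PipeWriterWrapper) WriteAt(p []byte, _ int64) (int, error) {
    return w.Writer.Write(p)
}

If you keep the default concurrency, parts can arrive out of order and the zip entry would be corrupted, so the sequential setting is the important half of this adapter.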

2. Stream Pipes Into a Zip File

func WriteToZipAsync(
    pipeMap map[string]*io.PipeReader,
    fileChannel chan S3ChannelData,
    wg *sync.WaitGroup,
    zipFileChannel chan *os.File,
) {
    go func() {
        zipFile, err := os.Create(fmt.Sprintf("%s.zip", uuid.NewString()))
        if err != nil {
            log.Fatalf("Zip file creation failed: %v", err)
        }
        zipWriter := zip.NewWriter(zipFile)
        buffer := make([]byte, 5*1024*1024) // 5 MB read buffer, reused for every file
        for data := range fileChannel {
            reader := pipeMap[data.Key]
            f, err := zipWriter.Create(data.Name)
            if err != nil {
                log.Errorf("Zip entry creation failed: %v", err)
                reader.CloseWithError(err) // unblock the downloader feeding this pipe
                continue
            }
            wg.Add(1)
            // Copy the pipe into the zip entry chunk by chunk until EOF.
            for {
                n, readErr := reader.Read(buffer)
                if n > 0 {
                    if _, writeErr := f.Write(buffer[:n]); writeErr != nil {
                        log.Errorf("Error writing to zip entry %s: %s", data.Name, writeErr)
                        break
                    }
                }
                if readErr == io.EOF {
                    break // this file has been fully read and written
                }
                if readErr != nil {
                    log.Errorf("Error reading from pipeReader for file %s: %s", data.Key, readErr)
                    break
                }
            }
            reader.Close() // harmless after EOF; unblocks the downloader if we bailed out early
            wg.Done()      // this zip entry is finished, successfully or not
        }
        zipWriter.Close()
        // Rewind so the caller receives a handle it can stream from the start;
        // closing (and removing) the file is left to the caller.
        if _, err := zipFile.Seek(0, io.SeekStart); err != nil {
            log.Errorf("Failed to rewind zip file: %v", err)
        }
        zipFileChannel <- zipFile
    }()
}
  • Zipping runs concurrently with the downloads, but every write to the archive goes through this single goroutine, which keeps the zip stream consistent.
  • Files are never fully loaded into memory; only one 5 MB buffer is held at a time.
  • f.Write copies each chunk read from the pipe straight into the current zip entry.

3. Putting It All Together

func CreateZippedArchive(s3Docs []DocumentS3Data, s3Client *s3.Client) (*os.File, error) {
    var wg sync.WaitGroup
    pipeMap := make(map[string]*io.PipeReader)
    fileChannel := make(chan S3ChannelData)
    zipFileChannel := make(chan *os.File)

    DownloadMultipleS3DocumentAsync(s3Docs, pipeMap, &wg, fileChannel, s3Client)
    WriteToZipAsync(pipeMap, fileChannel, &wg, zipFileChannel)

    wg.Wait()          // every download and every zip entry has finished
    close(fileChannel) // lets the zip goroutine finalize and hand back the archive
    return <-zipFileChannel, nil
}

This is the end-to-end engine: it streams the selected files out of S3 and produces a zip archive quickly and with minimal overhead.
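
To make the "serve it" part concrete, here is one way the archive could be handed to an HTTP client. This handler is an illustrative sketch, not part of the original code; it assumes CreateZippedArchive returns an open handle positioned at the start of the archive (as in the version above) and that the caller is responsible for cleaning the file up afterwards:

// Hypothetical HTTP handler (not part of the original code) that bundles the
// requested documents and streams the finished zip back to the client.
func handleBulkDownload(w http.ResponseWriter, r *http.Request, s3Docs []DocumentS3Data, s3Client *s3.Client) {
    zipFile, err := CreateZippedArchive(s3Docs, s3Client)
    if err != nil {
        http.Error(w, "failed to build archive", http.StatusInternalServerError)
        return
    }
    defer func() {
        zipFile.Close()
        os.Remove(zipFile.Name()) // the archive is only needed for this response
    }()

    w.Header().Set("Content-Type", "application/zip")
    w.Header().Set("Content-Disposition", `attachment; filename="documents.zip"`)
    if _, err := io.Copy(w, zipFile); err != nil {
        log.Errorf("Failed to stream zip to client: %v", err)
    }
}

An equally valid option is to upload the finished file back to S3 and hand the client a pre-signed URL; the zipping pipeline stays exactly the same either way.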

⚡ Why This Is Fast and Scalable

  • No Temp Storage: All files are streamed, nothing is written to disk until the final zip.
  • Controlled Memory: io.Pipe and a fixed-size read buffer keep memory usage minimal, even with large files.
  • High Throughput: goroutines allow full network and IO parallelism.
  • Safe Cleanup: Proper use of WaitGroup and pipe closures avoids leaks or zip corruption.

🛠️ Production Considerations

  • Use context.WithTimeout to cancel slow or hanging downloads (see the sketch after this list).
  • Monitor memory use if files are massive.
  • Limit parallelism if hitting AWS throttling or bandwidth bottlenecks.
  • Add logging and retries for network reliability.
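
The timeout and the parallelism cap are straightforward to bolt onto the download loop from step 1. Here is a hedged sketch of one way to do it, using a per-file timeout and a semaphore channel; maxConcurrent and downloadTimeout are assumed configuration values, not part of the original function signature:

// Illustrative variant of the download loop inside DownloadMultipleS3DocumentAsync,
// with a per-file timeout and a cap on concurrent downloads.
sem := make(chan struct{}, maxConcurrent) // counting semaphore

for _, doc := range s3Documents {
    pipeReader, pipeWriter := io.Pipe()
    pipeMap[doc.Key] = pipeReader
    wg.Add(1)
    go func(doc DocumentS3Data, pw *io.PipeWriter) {
        defer wg.Done()
        defer pw.Close()

        sem <- struct{}{}        // acquire a slot; blocks once maxConcurrent downloads are in flight
        defer func() { <-sem }() // release the slot when this download finishes

        ctx, cancel := context.WithTimeout(context.Background(), downloadTimeout)
        defer cancel()

        fileChannel <- S3ChannelData{Key: doc.Key, Name: doc.Name}
        if _, err := downloader.Download(ctx, PipeWriterWrapper{Writer: pw}, &s3.GetObjectInput{
            Bucket: aws.String(doc.Bucket),
            Key:    aws.String(doc.Key),
        }); err != nil {
            log.Errorf("Failed download: %s", err)
            pw.CloseWithError(err) // unblock the zip goroutine instead of leaving it waiting on EOF
        }
    }(doc, pipeWriter)
}

Note that the semaphore is acquired before the file is announced on fileChannel, so the zip goroutine only ever waits on pipes whose downloads are actually running. Retries are best done at the whole-file level: with the sequential pipe adapter, re-sending bytes that were already written into the pipe would corrupt the zip entry.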

🧪 Real-World Use Case

This pattern is live in a production system that serves thousands of users, enabling them to bulk download reports, media, and invoices without latency or resource bottlenecks. It has been performance-tested with batches ranging from dozens to hundreds of files at a time.

✅ TL;DR

To download and zip multiple S3 files in Go, efficiently and concurrently:

  • Use io.Pipe to stream file content
  • Launch parallel downloads using goroutines
  • Zip them using a shared zip.Writer
  • Coordinate all work with sync.WaitGroup and channels

🚀 Conclusion

Implementing a high-performance S3 bulk download and zip service in Go isn’t just about saving time; it’s about building scalable systems that respect memory, I/O, and user experience.

By leveraging Go’s concurrency primitives, streaming via io.Pipe, and clean channel orchestration, we avoid temporary file overhead and unlock true parallelism.

Whether you’re zipping tens or thousands of documents, this pattern can serve as a battle-tested foundation for production-scale document bundling workflows. If you’re looking to integrate similar capabilities in your system, I hope this breakdown gives you both the architectural clarity and code-level confidence to get started.
