Introduction
Efficiently downloading large files in Python can be challenging, especially when you want to support asynchronous downloads, caching, file validation, and real-time progress feedback. In this blog, we’ll walk through building a production-ready async file downloader using the aiohttp library, with features like cache validation, file size and MD5 hash checking, and a customizable progress callback.
Why Choose aiohttp for Asynchronous Downloads?
aiohttp is my go-to library for asynchronous HTTP operations in Python. It’s fast, mature, and designed for non-blocking network tasks—making it ideal for:
- Downloading large files efficiently
- Handling multiple downloads concurrently (see the sketch after this list)
- Integrating with modern async Python workflows
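To make the concurrency point concrete, here is a minimal sketch of fetching several resources at once with a shared session. The URLs and the fetch helper are placeholders for illustration, not part of the downloader we build below:

```python
import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    # Read the response body; raise on HTTP error statuses.
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.read()

async def main():
    urls = [  # hypothetical URLs
        "https://example.com/a.bin",
        "https://example.com/b.bin",
    ]
    async with aiohttp.ClientSession() as session:
        # gather() runs all requests concurrently on one event loop.
        bodies = await asyncio.gather(*(fetch(session, u) for u in urls))
    for url, body in zip(urls, bodies):
        print(f"{url}: {len(body)} bytes")

asyncio.run(main())
```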
Key Features of Our Async File Downloader
Here’s what sets this downloader apart:
- Asynchronous Downloading: Harness the power of async/await for non-blocking file transfers.
- Smart Caching: Skip downloads if the file already exists and matches expected size or MD5 hash.
- Robust Validation: Automatically check file size and MD5 hash after download to ensure integrity.
- Custom Progress Callback: Get real-time feedback with a callback function for download progress.
Implementation Overview
Below is a streamlined version of the AsyncFileDownloader class. It’s designed for clarity and extensibility:
```python
import aiohttp
import asyncio
import hashlib
import os
import time


class AsyncFileDownloader:
    def __init__(self, output_dir="."):
        self.output_dir = output_dir
        os.makedirs(self.output_dir, exist_ok=True)

    async def _md5sum(self, file_path, chunk_size=8192):
        # Hash the file in chunks so large files never need to fit in memory.
        md5 = hashlib.md5()
        with open(file_path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                md5.update(chunk)
        return md5.hexdigest()

    async def _validate_file(self, file_path, expected_size=None, expected_md5=None):
        # A file is valid if it exists and passes whichever checks were requested.
        if not os.path.exists(file_path):
            return False
        if expected_size is not None and os.path.getsize(file_path) != expected_size:
            return False
        if expected_md5 is not None:
            actual_md5 = await self._md5sum(file_path)
            if actual_md5 != expected_md5:
                return False
        return True

    async def download(self, url, filename=None, expected_size=None,
                       expected_md5=None, callback=None, frequency=0.5):
        if not filename:
            filename = os.path.basename(url)
        file_path = os.path.join(self.output_dir, filename)

        # Cache validation: skip the download if a valid copy already exists.
        if await self._validate_file(file_path, expected_size, expected_md5):
            print(f"File {file_path} already valid. Skipping download.")
            return file_path

        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                # Fall back to expected_size if the server omits Content-Length.
                total_bytes = int(resp.headers.get('Content-Length', 0)) or expected_size or 0
                bytes_downloaded = 0
                start_time = time.time()
                last_callback = start_time
                with open(file_path, "wb") as f:
                    async for chunk in resp.content.iter_chunked(8192):
                        f.write(chunk)
                        bytes_downloaded += len(chunk)
                        now = time.time()
                        # Throttle callbacks to `frequency` seconds, but always
                        # fire when the byte count reaches the reported total.
                        if callback and (now - last_callback >= frequency
                                         or bytes_downloaded == total_bytes):
                            time_elapsed = now - start_time
                            callback(bytes_downloaded, total_bytes, time_elapsed)
                            last_callback = now

        print(f"Downloaded {file_path}")
        # Post-download validation: delete the file and raise if it is corrupt.
        if not await self._validate_file(file_path, expected_size, expected_md5):
            os.remove(file_path)
            raise ValueError(f"Downloaded file {file_path} failed validation.")
        return file_path
```
Let’s break down the workflow:
- Initialization: Set your output directory for downloads.
- Cache Validation: Before downloading, check if the file already exists and matches the expected size or MD5 hash.
- Async Download: If needed, stream the file in chunks and write to disk.
- Progress Callback: Receive real-time updates on download progress, bytes transferred, and elapsed time.
- Post-download Validation: After download, validate the file again. If it fails, delete and raise an error.
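One refinement worth flagging: `_md5sum` above is declared `async` but reads the file with ordinary blocking I/O, so hashing a large file briefly stalls the event loop. A sketch of one way around this, offloading the hash to a worker thread with `asyncio.to_thread` (Python 3.9+); the function names here are illustrative, not part of the class above:

```python
import asyncio
import hashlib

def _md5sum_blocking(file_path: str, chunk_size: int = 8192) -> str:
    # Plain synchronous hashing; safe to run in a worker thread.
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()

async def md5sum(file_path: str) -> str:
    # Run the blocking hash off the event loop so downloads keep flowing.
    return await asyncio.to_thread(_md5sum_blocking, file_path)
```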
Example Usage
Here’s how you can use the downloader in your own projects:
```python
import asyncio

def print_progress(bytes_downloaded, total_bytes, time_elapsed):
    percent = (bytes_downloaded / total_bytes) * 100 if total_bytes else 0
    print(f"Downloaded: {bytes_downloaded}/{total_bytes} bytes ({percent:.2f}%), Time elapsed: {time_elapsed:.2f}s")

async def main():
    downloader = AsyncFileDownloader(output_dir="downloads")
    await downloader.download(
        url="https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p1p41242.bz2",
        expected_size=291274499,
        expected_md5="5b594c2af71ecf65505dc42d49ab6121",
        callback=print_progress,
        frequency=1.0,
    )

asyncio.run(main())
```
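Since the callback receives both the byte count and the elapsed time, deriving throughput is a one-liner. A variant of `print_progress` (same signature, just an extra computed value) might look like:

```python
def print_progress_with_speed(bytes_downloaded, total_bytes, time_elapsed):
    percent = (bytes_downloaded / total_bytes) * 100 if total_bytes else 0
    # Average speed in MB/s; guard against division by zero on the first call.
    speed = bytes_downloaded / time_elapsed / 1_000_000 if time_elapsed else 0
    print(f"{bytes_downloaded}/{total_bytes} bytes ({percent:.2f}%) at {speed:.2f} MB/s")
```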
Other Considerations
- Limit Concurrency: For large files or many simultaneous downloads, use a semaphore or queue to avoid overwhelming your system (see the sketch after this list).
- Validate Everything: Always check files after download to guarantee data integrity.
- Explore Alternatives: While aiohttp is excellent, consider httpx for advanced async HTTP needs.
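For the concurrency point above, a semaphore keeps the number of in-flight downloads bounded. This sketch reuses the AsyncFileDownloader class from earlier; the helper name and the limit of 4 are arbitrary choices for illustration:

```python
import asyncio

async def download_all(urls, max_concurrent=4):
    downloader = AsyncFileDownloader(output_dir="downloads")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        # At most max_concurrent downloads run at any one time.
        async with semaphore:
            return await downloader.download(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```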
Conclusion
With aiohttp, you get speed, reliability, and flexibility—perfect for data engineering, web scraping, and AI workflows.
If you found this post helpful, consider subscribing to my newsletter for more deep dives into Python, AI, and engineering best practices.
Have questions, feedback, or your own download tips? Drop a comment below.