Downloading large files with integrity checks in JavaScript

Downloading large files in web applications comes with its own set of challenges — especially when it comes to maintaining data integrity. Recently, we worked on a system for downloading large files. Since these files are critical, it’s essential to ensure they’re downloaded completely and without corruption.

These kinds of downloads are often affected by slow connections, browser memory limitations, or interruptions that can lead to incomplete or corrupted files. That’s where efficient downloading techniques and checksum validation become really important.

Memory Usage

A naive download, for example fetching the whole response into a single ArrayBuffer, loads the entire file into memory before you can do anything with it. While this works fine for small files, it becomes a serious issue with large ones. You might end up with memory errors or even crash the tab.

A better approach is to download the file in chunks, allowing you to handle it bit by bit without loading the entire file into memory at once. This also makes it easier to resume downloads or recover from interruptions.
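
As a rough browser-side sketch (the URL and progress callback here are placeholders, not from the original post), you can read the response body as a stream and collect the chunks as they arrive:

// Sketch: stream a download chunk by chunk with the Fetch API.
async function downloadInChunks(url, onProgress) {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error(`Download failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);                   // each value is a Uint8Array chunk
    received += value.length;
    if (onProgress) onProgress(received); // e.g. update a progress bar
  }

  // Reassemble the chunks into a Blob for saving or hashing
  return new Blob(chunks);
}

Resuming an interrupted download additionally needs HTTP Range support on the server, which is outside the scope of this sketch.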

File Integrity

Making sure the downloaded file is complete and uncorrupted is key — especially for large data sets. This is typically done using hashing.

You can generate a SHA-256 hash of the file on the server and send it along with the file. Once the download is complete on the client side, you calculate the hash again and compare it to the original. If they match, you know the file is good.
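
As a minimal sketch of that flow (the X-File-Hash header name is purely hypothetical; the server sample later in this post instead stores the hash and exposes a /verify endpoint):

// Sketch: compare a server-provided hash with one computed on the client.
async function downloadAndCheck(url) {
  const response = await fetch(url);
  const expectedHash = response.headers.get('X-File-Hash'); // hypothetical header name
  const blob = await response.blob();
  const actualHash = await calculateSHA256(blob); // chunked hashing helper, shown later
  if (actualHash !== expectedHash) {
    throw new Error('Checksum mismatch: the file may be corrupted.');
  }
  return blob;
}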

Server Side Implementation

Let’s say you’re using FastAPI on the backend. You can create an endpoint like /download that does the following:

  • Takes a large file
  • Compresses it (e.g. using ZIP)
  • Streams it back to the frontend
  • Computes the file’s SHA-256 hash while streaming so the client can check integrity afterwards (in the sample below, the hash is stored server-side and exposed through a /verify endpoint, since it isn’t known until the last chunk has been sent)

This approach streams the response in chunks and allows the client to validate the file once it’s downloaded. (For simplicity, the sample below builds the ZIP in an in-memory buffer; for truly huge files you’d want to zip to disk or stream the compression instead.)

import hashlib
import os
from io import BytesIO
from zipfile import ZIP_DEFLATED, ZipFile

from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse, StreamingResponse

app = FastAPI()

# === Configuration ===
TARGET_FILE = "sample.dcm"
ZIP_FILENAME = "sample.zip"
HASH_STORE = {}
CHUNK_SIZE = 8192

# === Utility Function ===
def create_zip_and_hash(filename: str):
    zip_buffer = BytesIO()
    hasher = hashlib.sha256()

    # Create the ZIP in memory
    with ZipFile(zip_buffer, mode='w', compression=ZIP_DEFLATED) as zipf:
        # Add the file to the archive with writestr
        with open(filename, 'rb') as f:
            zipf.writestr(os.path.basename(filename), f.read())
    zip_buffer.seek(0)

    # Stream and hash the ZIP chunk by chunk
    while True:
        chunk = zip_buffer.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
        yield chunk

    # Store the hash once all chunks have been streamed
    HASH_STORE["last_hash"] = hasher.hexdigest()

# === API Endpoints ===
@app.get("/download")
def download_file():
    if not os.path.exists(TARGET_FILE):
        raise HTTPException(status_code=404, detail="Target file not found.")

    # The generator records the hash in HASH_STORE after the last chunk is sent
    zip_stream = create_zip_and_hash(TARGET_FILE)

    headers = {
        "Content-Disposition": f"attachment; filename={ZIP_FILENAME}"
    }
    return StreamingResponse(
        zip_stream,
        media_type="application/zip",
        headers=headers)

# === Verify Checksum ===
@app.post("/verify")
def verify_hash(hash_from_frontend: str):
    original_hash = HASH_STORE.get("last_hash")
    if not original_hash:
        return JSONResponse(status_code=400, content={"detail": "No hash stored for comparison."})
    if hash_from_frontend == original_hash:
        return {"status": "success", "message": "Hash matched successfully."}
    else:
        return {"status": "failure", "message": "Hash did not match."}

Frontend: Downloading Large Files in JavaScript

Handling large files on the frontend — especially in JavaScript — comes with its own quirks.

At first, you might consider using crypto-js for hashing. However, it’s outdated and no longer maintained.

Modern browsers offer the SubtleCrypto API through window.crypto.subtle, which can perform operations like SHA-256 hashing. But here’s the catch: crypto.subtle.digest() can’t process data in chunks. It requires loading the full file into memory first—again, not ideal for large files.
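
For reference, a one-shot SubtleCrypto digest looks roughly like this; note how the entire file has to be pulled into an ArrayBuffer first:

// One-shot hashing with the built-in Web Crypto API.
// file.arrayBuffer() loads the whole file into memory, which is the problem.
async function sha256WholeFile(file) {
  const buffer = await file.arrayBuffer();
  const digest = await crypto.subtle.digest('SHA-256', buffer);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

A streaming hasher such as @noble/hashes (covered in more detail below) avoids this by letting you feed the file to the hash slice by slice: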

import { sha256 } from '@noble/hashes/sha256';

export const calculateSHA256 = async (file) => {
  const sliceSize = 10 * 1024 * 1024; // 10 MiB
  let start = 0;
  const hash = sha256.create(); // streaming hash

  while (start < file.size) {
    const slice = file.slice(start, start + sliceSize);
    const buffer = await slice.arrayBuffer();
    hash.update(new Uint8Array(buffer));
    start += sliceSize;
  }

  // Convert the Uint8Array digest to a hex string
  const hashBytes = hash.digest();
  const hashHex = Array.from(hashBytes)
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
  return hashHex;
};
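
To tie this to the FastAPI endpoints above, a minimal sketch could hash the downloaded Blob and then ask the server to confirm the checksum. Note that the sample /verify endpoint takes hash_from_frontend as a query parameter:

// Sketch: download the ZIP, hash it in slices, and verify against the server.
async function downloadAndVerify() {
  const response = await fetch('/download');
  const blob = await response.blob();

  // calculateSHA256 is the chunked helper defined above
  const hashHex = await calculateSHA256(blob);

  const verify = await fetch(`/verify?hash_from_frontend=${hashHex}`, { method: 'POST' });
  const result = await verify.json();

  if (result.status !== 'success') {
    throw new Error('Downloaded file failed the integrity check.');
  }
  return blob;
}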

Better Solutions: WebAssembly and @noble/hashes

To hash large files in chunks, you have a couple of solid options:

– WebAssembly (WASM)

  • Offers near-native performance in the browser
  • Ideal for large files
  • Some libraries support streaming or chunked hashing
  • But it’s more complex to set up (e.g. managing binary dependencies, ensuring browser compatibility)

– @noble/hashes

  • A modern, pure JavaScript library
  • Lightweight, easy to use, and actively maintained
  • Supports chunked hashing
  • No binary dependencies, so it works smoothly across browsers
  • Not as fast as WASM, but fast enough for most real-world needs

Conclusion

If you’re building a web app that needs to download large files, it’s essential to think about both performance and data integrity. Using techniques like chunked downloading, server-side streaming, and hash validation can help you handle these large files reliably.

For hashing, if speed is your top priority and you’re okay with a bit more complexity, WASM-based libraries are the way to go. But if you’re looking for a simple, reliable, and browser-friendly solution, @noble/hashes is a great choice.
