Manually sorting downloaded files is tedious and error-prone, especially when filenames don’t tell the full story. With a few lines of Python, you can automate file organization by content type and file hash, making your archive more searchable, tamper-proof, and robust against duplicates.
The Problem: Manual File Sorting Is Inefficient
The traditional approach involves:
- Guessing file types based on extension (which can be missing or wrong)
- Manually renaming or moving files
- Risking overwrites or leaving duplicate files scattered across folders
The Pattern: Type and Hash-Based Auto-Organization
By using pathlib.Path.glob
, shutil.move
, hashlib.md5
, and mimetypes.guess_type
, you can scan a directory, classify files by their real content type, and move them to new locations based on a content hash.
from pathlib import Path
import shutil, hashlib, mimetypes
for f in Path("downloads").glob("*"):
with open(f, "rb") as b: h = hashlib.md5(b.read()).hexdigest()
kind = mimetypes.guess_type(f.name)[0] or "unknown"
ext_folder = Path("organized") / kind.split('/')[0]
ext_folder.mkdir(parents=True, exist_ok=True)
shutil.move(str(f), ext_folder / f"{h[:8]}-{f.name}")
How It Works
Path("downloads").glob("*")
…
Learn more Auto-Organize Downloads by Content Type and Hash in Python