Automate High-Resolution Image Download for Amazon ASINs with Python using the Crawlbase API

Introduction

If you’re working with Amazon product data, you may find yourself needing to automate the process of downloading high-resolution images for a list of products identified by their ASINs (Amazon Standard Identification Numbers). Whether you’re building an e-commerce site, conducting market research, or simply archiving product images, this task can quickly become tedious if done manually.

In this article, I’ll walk you through a Python script that automates the entire process, from reading ASINs from a CSV file to downloading the images and organizing them in folders named after each ASIN. By the end of this tutorial, you’ll have a script that takes over the repetitive manual work.

Prerequisites

Before diving into the code, make sure you have the following:

  1. Basic Knowledge of Python: You should be familiar with Python programming.
  2. Python Installed: If you don’t have Python installed, download and install it from python.org.
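  3. Python Libraries: The script uses requests for downloading images, along with csv, os, json, and urllib from Python’s standard library.
  4. Crawlbase API Token: The script calls the Crawlbase API, so you need a token to put in place of your_token_here in the code.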

You can install the requests library using pip:

pip install requests

The Problem

The task is to automate the following steps:

  1. Compose the Product URL: Generate the Amazon product URL using the ASIN.
  2. Fetch Product Data: Use an API to fetch product details, including the number of images and their URLs.
  3. Create Folders: For each ASIN, create a corresponding folder to store the images.
  4. Download Images: Download the images and rename them sequentially.
  5. Handle Multiple ASINs: Process each ASIN in a list from a CSV file.

Let’s break down the solution step by step.

Step 1: Read ASINs from a CSV File

First, we need to read ASINs from a CSV file. The file (asins_list.csv) should contain ASINs in the first column. We use Python’s built-in csv module to handle this:

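import csv
# Read the ASINs from the first column, skipping blank rows
with open('asins_list.csv', 'r') as csvfile:
    asin_reader = csv.reader(csvfile)
    asins = [row[0] for row in asin_reader if row]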
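The reading step can be made a little more defensive. The following is a minimal sketch, not part of the original script: the helper name read_asins is hypothetical, and the pattern assumes that an ASIN is 10 uppercase letters or digits, which also filters out a header row such as "asin".

```python
import csv
import re

# Assumption: a standard ASIN is 10 uppercase alphanumeric characters.
ASIN_PATTERN = re.compile(r"^[A-Z0-9]{10}$")


def read_asins(csv_path):
    """Return unique, valid-looking ASINs from the first column, in file order."""
    asins = []
    seen = set()
    # utf-8-sig also copes with files that start with a byte-order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # skip blank lines
            candidate = row[0].strip().upper()
            if ASIN_PATTERN.match(candidate) and candidate not in seen:
                seen.add(candidate)
                asins.append(candidate)
    return asins

# Example call (the file name comes from the article):
# asins = read_asins("asins_list.csv")
```

Dropping duplicates here avoids paying for the same API request twice when the list contains repeated ASINs.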

Step 2: Compose the URL and Fetch Data

For each ASIN, we construct the Amazon product URL and fetch product data using the Crawlbase API. This requires us to handle URL encoding and work with JSON responses:

from urllib.request import urlopen
from urllib.parse import quote_plus
import json
for asin in asins:
    url = f'https://www.amazon.es/dp/{asin}'
    encoded_url = quote_plus(url)
    handler = urlopen('https://api.crawlbase.com/?token=your_token_here&scraper=amazon-product-details&url=' + encoded_url)
    response = handler.read().decode('utf-8')
    parsed_json = json.loads(response)
    body = parsed_json.get('body', {})
    images_count = body.get('imagesCount', 0)
    high_res_images = body.get('highResolutionImages', [])
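As an alternative to concatenating the query string by hand, the request URL can be assembled with urlencode, which applies the same quote_plus encoding to every parameter. This is a sketch under assumptions: build_api_url and extract_image_urls are hypothetical helper names, and the endpoint, scraper name, and JSON field names are taken from the code above rather than checked against the Crawlbase documentation.

```python
from urllib.parse import urlencode

# Endpoint and scraper name as used in the article's code
API_ENDPOINT = "https://api.crawlbase.com/"


def build_api_url(asin, token, marketplace="www.amazon.es"):
    """Compose the API request URL for one ASIN."""
    product_url = f"https://{marketplace}/dp/{asin}"
    # urlencode percent-encodes each value, including the nested product URL
    query = urlencode({
        "token": token,
        "scraper": "amazon-product-details",
        "url": product_url,
    })
    return f"{API_ENDPOINT}?{query}"


def extract_image_urls(payload):
    """Return the high-resolution image URLs, or an empty list if absent."""
    body = payload.get("body") or {}
    return list(body.get("highResolutionImages") or [])
```

Keeping the token as a parameter means it can be read from an environment variable or a config file instead of being hard-coded in the script.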

Step 3: Create a Local Folder

For organizational purposes, we create a folder named after each ASIN. If the folder already exists, it won’t be recreated:

import os
folder_name = asin
if not os.path.exists(folder_name):
    os.makedirs(folder_name)
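The existence check and the creation can also be combined into one call, since os.makedirs accepts exist_ok=True. A small sketch follows; ensure_folder and its base_dir parameter are hypothetical additions that let all ASIN folders live under one parent directory.

```python
import os


def ensure_folder(base_dir, asin):
    """Create base_dir/asin if needed and return its path."""
    folder = os.path.join(base_dir, asin)
    # exist_ok=True makes the call a no-op when the folder is already there
    os.makedirs(folder, exist_ok=True)
    return folder
```

Calling ensure_folder(".", asin) reproduces the behavior of the snippet above, with folders created in the current working directory.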

Step 4: Download and Rename Images

We loop through the list of image URLs, download each one, and save it with a sequential name in the corresponding folder:

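import os
import requests
def download_image(url, folder, file_name):
    # Fetch the image; the timeout keeps a stalled request from hanging the run
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        return False
    with open(os.path.join(folder, file_name), 'wb') as file:
        file.write(response.content)
    return True
for i, img_url in enumerate(high_res_images):
    file_name = f"{asin}_{i+1}.jpg"
    # Report success only when the file was actually written
    if download_image(img_url, folder_name, file_name):
        print(f"Downloaded {file_name} for ASIN {asin}")
    else:
        print(f"Could not download {file_name} for ASIN {asin}")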
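The loop above names every file with a .jpg extension. If some image URLs point to other formats, the extension could instead be derived from the URL path. The sketch below is one possible approach, not the article's method: image_file_name and the KNOWN_EXTENSIONS set are hypothetical, and the example URLs in the comments are placeholders.

```python
import os
from urllib.parse import urlsplit

# Assumption: these are the only extensions worth preserving
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def image_file_name(asin, index, url, default_ext=".jpg"):
    """Build a sequential file name, keeping the URL's extension when known.

    index is the zero-based position from enumerate().
    """
    # Use only the path so query strings do not leak into the extension
    path = urlsplit(url).path
    ext = os.path.splitext(path)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        ext = default_ext
    return f"{asin}_{index + 1}{ext}"

# image_file_name("B08N5WRWNW", 0, "https://example.com/img/a.png?x=1")
# gives "B08N5WRWNW_1.png"; a URL without an extension falls back to ".jpg"
```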

Step 5: Handle Errors and Continue

To ensure the script continues even if it encounters errors (e.g., missing data or network issues), we wrap the code in a try-except block:

for asin in asins:
    try:
        # [code for composing URL, fetching data, creating folders, and downloading images]
        print(f"Completed processing ASIN {asin}: {images_count} images found.")
    except Exception as e:
        print(f"Failed to process ASIN {asin}: {e}")
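Printing each failure works for short lists, but with many ASINs it can help to collect the failures and review them at the end. Here is a minimal sketch of that idea, assuming the per-ASIN work is wrapped in a function; process_all and process_one are hypothetical names, not part of the original script.

```python
def process_all(asins, process_one):
    """Run process_one on every ASIN, recording failures instead of stopping."""
    succeeded = []
    failed = {}
    for asin in asins:
        try:
            process_one(asin)
        except Exception as exc:
            # Keep the message so the failed ASINs can be reviewed or retried
            failed[asin] = str(exc)
        else:
            succeeded.append(asin)
    return succeeded, failed
```

The returned failed dictionary maps each ASIN to its error message, so it can be printed as a summary or written to a file and fed back into a later run.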

Complete Script

Putting it all together, here’s the complete Python script:

import os
import csv
import json
import requests
from urllib.request import urlopen
from urllib.parse import quote_plus
def download_image(url, folder, file_name):
    # Fetch the image; the timeout keeps a stalled request from hanging the run
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        return False
    with open(os.path.join(folder, file_name), 'wb') as file:
        file.write(response.content)
    return True
# Read the ASINs from the first column, skipping blank rows
with open('asins_list.csv', 'r') as csvfile:
    asin_reader = csv.reader(csvfile)
    asins = [row[0] for row in asin_reader if row]
for asin in asins:
    try:
        # Compose the product URL and fetch its details through the Crawlbase API
        url = f'https://www.amazon.es/dp/{asin}'
        encoded_url = quote_plus(url)
        handler = urlopen('https://api.crawlbase.com/?token=your_token_here&scraper=amazon-product-details&url=' + encoded_url)
        response = handler.read().decode('utf-8')
        parsed_json = json.loads(response)
        body = parsed_json.get('body', {})
        images_count = body.get('imagesCount', 0)
        high_res_images = body.get('highResolutionImages', [])
        # Create a folder named after the ASIN
        folder_name = asin
        if not os.path.exists(folder_name):
            os.makedirs(folder_name)
        # Download each image under a sequential name
        for i, img_url in enumerate(high_res_images):
            file_name = f"{asin}_{i+1}.jpg"
            if download_image(img_url, folder_name, file_name):
                print(f"Downloaded {file_name} for ASIN {asin}")
            else:
                print(f"Could not download {file_name} for ASIN {asin}")
        print(f"Completed processing ASIN {asin}: {images_count} images found.")
    except Exception as e:
        # Log the failure and move on to the next ASIN
        print(f"Failed to process ASIN {asin}: {e}")
print("Processing complete.")

Conclusion

This script automates downloading high-resolution product images from Amazon for a list of ASINs. With minimal setup, it can replace hours of manual work and keep your image assets organized and consistently named.

Feel free to adapt and expand this script based on your specific needs. Happy coding!
