Download Kaggle Datasets within Google Colab

Photo by Claudio Schwarz on Unsplash

Google Colab is a free, cloud-based platform that allows users to write and execute Python code in a Jupyter Notebook environment. It’s an invaluable tool for data scientists and machine learning practitioners because it offers powerful computing resources, including free access to GPUs, all within a browser-based interface.

One of the most common tasks in data science projects is accessing and manipulating datasets. Kaggle provides a vast repository of datasets that are crucial for these projects. Traditionally, many users manually download these datasets from Kaggle and then upload them to Google Drive / Google Colab for use. However, this approach can be time-consuming and cumbersome, especially with large datasets.

This article focuses on how to download Kaggle datasets within Google Colab and store them within your Google Drive for later use.

Create a Kaggle API token

The first step is to create an API token from your Kaggle account in order to connect to Kaggle externally:

  • Log in to your Kaggle account (if you don’t have one yet, it’s quick to create one using a Google account)
  • Go to Settings -> API
  • Select Create new token; this downloads a kaggle.json file to your computer
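Before uploading the token anywhere, it can be worth checking that the downloaded file is intact. The sketch below assumes the token is a JSON object with username and key fields, which is the layout of the classic kaggle.json file; the helper name check_kaggle_token is hypothetical and used only for illustration.

```python
import json
from pathlib import Path


def check_kaggle_token(path):
    """Return True if the file looks like a Kaggle API token.

    Assumption: the token is a JSON object with non-empty
    "username" and "key" string fields.
    """
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON.
        return False
    if not isinstance(data, dict):
        return False
    return all(
        isinstance(data.get(field), str) and bool(data.get(field))
        for field in ("username", "key")
    )
```

For example, check_kaggle_token("kaggle.json") run from your downloads folder should return True for a freshly created token. The function only inspects the file's shape; it does not contact Kaggle to confirm that the key is still valid.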

Set up Google Drive and Kaggle API token within Google Colab

  • Open Google Colab and start a new notebook
  • Click on the folder icon to the left, and then select Mount Drive
Mount Drive button within Google Colab
  • Once clicked, Google Drive is mounted to your notebook (this can take 1–2 minutes) and appears in the sidebar
Google Drive mounted in Google Colab
  • Navigate to drive/MyDrive and create a Kaggle folder inside it
  • Create a kaggle_install.sh file locally with the following contents, or download it from here
#!/bin/bash
pip install kaggle
mkdir -p ~/.kaggle
# change the path /content/drive/MyDrive/<folder>
# based on where you have uploaded kaggle.json within your Drive
cp /content/drive/MyDrive/Kaggle/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
  • Upload the kaggle.json and kaggle_install.sh files to the Kaggle folder
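If you would rather keep everything in the notebook, the copy and permission steps of the script can be mirrored in Python. This is a minimal sketch, not a replacement for the script: the function name install_kaggle_token and its kaggle_dir parameter are hypothetical, and it does not install the kaggle package, so pip install kaggle is still needed.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_src, kaggle_dir=None):
    """Copy kaggle.json into the Kaggle config folder and restrict access.

    Mirrors the mkdir -p, cp and chmod 600 lines of kaggle_install.sh.
    kaggle_dir defaults to ~/.kaggle; it is a parameter only so the
    function can be tried out against a scratch directory.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_src, dest)
    # Owner may read and write; nobody else has access (chmod 600).
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

In Colab, with Drive already mounted, the call would be install_kaggle_token("/content/drive/MyDrive/Kaggle/kaggle.json"), adjusted to wherever you uploaded the token.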

The steps above (downloading the API token and uploading the setup files to your Google Drive) are only required once, unless you lose your API token.

Run the setup file

For every new notebook you create, you need to do the following steps to connect to Kaggle:

  • Make sure your Google Drive is mounted to Colab
  • Navigate to the Kaggle folder from within the notebook
%cd "/content/drive/MyDrive/Kaggle"
  • Run the kaggle_install.sh file
!bash kaggle_install.sh
  • Return to the main directory (if you don’t, your files will be saved within the Kaggle folder)
%cd "/content"
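The per-notebook steps can also be wrapped in a small Python helper that first checks whether the setup files are reachable and then runs the script by its full path, so the working directory never changes. This is a sketch under assumptions: the helper names are hypothetical, and the default folder matches the Kaggle folder used in this article.

```python
import subprocess
from pathlib import Path

# Files that the setup steps above place in the Kaggle folder on Drive.
REQUIRED_FILES = ("kaggle.json", "kaggle_install.sh")


def missing_setup_files(folder):
    """Return the names of required setup files not found in folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def run_setup(folder="/content/drive/MyDrive/Kaggle"):
    """Run kaggle_install.sh from folder without changing directory."""
    missing = missing_setup_files(folder)
    if missing:
        # Usually means Drive is not mounted or the files were not uploaded.
        raise FileNotFoundError(
            f"Missing in {folder}: {', '.join(missing)}"
        )
    script = Path(folder) / "kaggle_install.sh"
    subprocess.run(["bash", str(script)], check=True)
```

Calling run_setup() in a fresh notebook either runs the script or fails early with a message naming the missing file, which tends to be easier to read than a cp error from inside the script.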

Download datasets from Kaggle

After the setup is done, you are ready to download any dataset from Kaggle. Here are the instructions:

  • Within your notebook, use the following command and replace the author, dataset and filename placeholders. The author/dataset part can be taken from the dataset URL (e.g. for https://www.kaggle.com/datasets/shariful07/student-mental-health, we would use shariful07/student-mental-health)
# Command for downloading datasets from Kaggle
!kaggle datasets download <author>/<dataset> -f <filename>
  • If the -f <filename> parameter is omitted, the whole dataset is downloaded as a single .zip file
  • If the filename contains spaces or special characters, wrap it in double quotation marks, e.g.:
!kaggle datasets download shariful07/student-mental-health -f "Student Mental health.csv"
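The author/dataset extraction and the quoting rule above can be sketched in Python. The helper names dataset_slug and download_command are hypothetical; the -p option is the Kaggle CLI flag for choosing the download folder, which is one way to save files directly to your Drive.

```python
import shlex
from urllib.parse import urlparse


def dataset_slug(url):
    """Extract '<author>/<dataset>' from a Kaggle dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a Kaggle dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_command(url, filename=None, dest=None):
    """Build the kaggle CLI command as a shell-safe string."""
    args = ["kaggle", "datasets", "download", dataset_slug(url)]
    if filename:
        args += ["-f", filename]  # single file instead of the full .zip
    if dest:
        args += ["-p", dest]  # download folder, e.g. a Drive path
    # shlex.join quotes any argument containing spaces or special characters.
    return shlex.join(args)


cmd = download_command(
    "https://www.kaggle.com/datasets/shariful07/student-mental-health",
    filename="Student Mental health.csv",
    dest="/content/drive/MyDrive/Kaggle",
)
print(cmd)
```

In a notebook cell, the resulting string can then be executed with !{cmd}. Note that shlex.join uses single quotes rather than double quotes, which the shell treats the same way for a filename like this one.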
