Download OSM Data as Parquet and Query Using DuckDB

TL;DR

OpenStreetMap (OSM) is a collaborative project that provides freely accessible, community-driven geospatial data representing features on Earth. OSM data is very useful when you need to perform GIS-related data transformations, for example when you want to download polygons or extract specific statistics for an area. Although there are plenty of ways to interact with OSM data through libraries and APIs, I would like to show you very quickly how you can download OSM Earth, OSM features and OSM elements to your local machine and explore them using DuckDB. Please be aware that you need around 1 TB of free disk space to download all three datasets.

OSM Earth refers broadly to the representation of the planet within OSM’s ecosystem, encompassing all the geographical data contributed by its users, including natural and man-made elements. OSM features are the specific objects or entities represented in this dataset, such as roads, buildings, rivers, parks, or businesses. For example, a “school” is a feature tagged with amenity=school, and a “forest” is a feature tagged with landuse=forest. These features are built from OSM elements, which are the building blocks of OSM data. The three primary elements are nodes (points with latitude and longitude, like a tree or a bus stop), ways (ordered collections of nodes, forming lines or polygons, like a road or a lake), and relations (groups of nodes, ways, or other relations that represent complex structures, like a bus route or a multipolygon forest).
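To make the three element types a bit more concrete, here is a minimal sketch of how they could be modelled in Python. The class and field names (Node, Way, Relation, Member, node_ids, is_closed) are my own illustrative choices and not the schema of the Daylight Parquet files.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point with coordinates and free-form key=value tags."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids forming a line or, if closed, a ring."""
    id: int
    node_ids: list[int]
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A ring needs at least three distinct nodes plus the repeated first node
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    """One entry of a relation: which element it points to and in what role."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """A group of members describing a complex structure."""
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical example objects (ids and coordinates are made up)
school = Node(id=1, lat=52.52, lon=13.405, tags={"amenity": "school"})
forest_outline = Way(id=10, node_ids=[1, 2, 3, 1], tags={"landuse": "forest"})
road = Way(id=11, node_ids=[1, 2, 3])
forest = Relation(
    id=100,
    members=[Member(type="way", ref=10, role="outer")],
    tags={"type": "multipolygon", "landuse": "forest"},
)

print(forest_outline.is_closed(), road.is_closed())
```

The point of the sketch is only the nesting: a relation refers to ways, and a way refers to nodes, which is why nodes alone carry coordinates.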

One way to access the complete OSM data, across all of its releases, is the Daylight project, which is available through the AWS open data repository (link here).

You do not need an AWS account, but you do need to install the AWS CLI for your terminal.


You can open a terminal in VS Code and run the following command to see which versions are available:

aws s3 ls --no-sign-request s3://daylight-map-distribution/release/

At the time of writing this article the latest version is v1.58.
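If you prefer not to read the list by eye, the listing can be parsed to pick the newest release. Below is a small sketch that assumes the prefixes are printed in the usual `aws s3 ls` form (`PRE v1.58/`) and that versions follow a `v<major>.<minor>` pattern. The function name latest_release and the sample listing are hypothetical, not real output.

```python
import re
from typing import Optional


def latest_release(listing: str) -> Optional[str]:
    """Return the highest 'vMAJOR.MINOR' prefix found in `aws s3 ls` output."""
    versions = []
    for line in listing.splitlines():
        match = re.search(r"PRE\s+v(\d+)\.(\d+)/", line)
        if match:
            # Compare numerically so that v1.10 sorts after v1.9
            versions.append((int(match.group(1)), int(match.group(2))))
    if not versions:
        return None
    major, minor = max(versions)
    return f"v{major}.{minor}"


# Illustrative listing only; the real command prints many more prefixes
sample_listing = """
                           PRE v1.9/
                           PRE v1.10/
                           PRE v1.58/
"""
print(latest_release(sample_listing))
```

Comparing the version parts as integers matters here, because a plain string sort would place v1.9 after v1.58.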

If you want to see the Parquet files inside a specific version of OSM Earth and estimate their total size, you can run the following command:

aws s3 ls --no-sign-request s3://daylight-openstreetmap/earth/release=v1.58/ --recursive > osm_earth_file.txt

Now you should see a file named osm_earth_file.txt with the following content:

Details of the OSM Earth Files

You can estimate the total size of OSM Earth using the following Python script, which ChatGPT helped me write:

# Python code generated with the help of ChatGPT
# Listing files (without extension) to calculate the total size for
files = ['osm_features_file', 'osm_elements_file', 'osm_earth_file']
for file in files:
    total_size = 0  # Reset total size for each file
    # utf-16 matches output redirected with ">" in Windows PowerShell;
    # use "utf-8" instead if your shell wrote the listing as UTF-8
    with open(f"./data/osmdetails/{file}.txt", "r", encoding="utf-16") as f:
        for line in f:
            parts = line.split()
            # Check if the line contains a valid size entry (3rd column is numeric)
            if len(parts) >= 3 and parts[2].isdigit():
                file_size = int(parts[2])  # Extract the file size from the 3rd column
                total_size += file_size  # Add to total size for this file
    # Print total size in GB for each file
    print(f"Total size of {file} is {total_size / 1e9:.2f} GB")  # Convert bytes to GB

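Before running the above Python snippet to estimate the total Parquet file sizes, you also need to run the commands below in your terminal and place the resulting .txt files where the script expects them (./data/osmdetails/ in the snippet):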

# Storing the osm features file details
aws s3 ls --no-sign-request s3://daylight-openstreetmap/parquet/osm_features/release=v1.58/ --recursive > osm_features_file.txt
# Storing the osm elements file details
aws s3 ls --no-sign-request s3://daylight-openstreetmap/parquet/osm_elements/release=v1.58/ --recursive > osm_elements_file.txt

After saving the file listings as .txt files with the commands above and running the Python snippet, you should see the following sizes:

Parquet File sizes for each of OSM files

So you need around 1 TB of free disk space on your laptop to download the whole OSM data.
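Before starting a download of this size, it can help to check the free space on the target drive. Here is a minimal sketch using only the standard library. The function name has_room and the 10% safety margin are my own assumptions, and 1 TB is taken as 1e12 bytes to match the GB conversion used in the script above.

```python
import shutil


def has_room(required_bytes: int, path: str = ".", margin: float = 1.1) -> bool:
    """Return True if the drive holding `path` has enough free space.

    `margin` adds headroom on top of the raw download size (assumed 10%).
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * margin


ONE_TB = 10**12  # decimal terabyte, consistent with dividing by 1e9 for GB

if has_room(ONE_TB):
    print("Enough free space for the whole OSM data")
else:
    print("Not enough free space; consider downloading OSM Earth only")
```

Pass the folder you plan to download into as `path`, since free space is reported per drive.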

OSM Earth is the smallest of the three. To download its files, run the following command:

aws s3 cp --no-sign-request s3://daylight-openstreetmap/earth/release=v1.58/ your_local_path\release=v1.58 --recursive

After downloading the Parquet files, you can query them (here theme=adminstrative) using a Python code snippet, but you need to have DuckDB installed beforehand. If you want to know more about DuckDB, you can read my other post through this link.

Running the query should display the content:
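A query along these lines could look like the sketch below. It assumes DuckDB is installed (pip install duckdb), that the downloaded files end in .parquet, and that they sit under a theme=adminstrative folder inside your download path. The helper names build_preview_query and preview, and the example path, are hypothetical.

```python
def build_preview_query(parquet_glob: str, limit: int = 5) -> str:
    """Build a SQL statement that previews rows from a set of Parquet files."""
    # Escape single quotes so the path is safe inside the SQL string literal
    escaped = parquet_glob.replace("'", "''")
    return (
        f"SELECT * FROM read_parquet('{escaped}', hive_partitioning = true) "
        f"LIMIT {int(limit)}"
    )


def preview(parquet_glob: str, limit: int = 5):
    """Run the preview query with DuckDB and return the rows as tuples."""
    import duckdb  # imported here so the query builder works without DuckDB

    return duckdb.sql(build_preview_query(parquet_glob, limit)).fetchall()


# Assumed local layout; adjust to wherever you downloaded the release
example_glob = "your_local_path/release=v1.58/theme=adminstrative/*.parquet"
print(build_preview_query(example_glob))

# Once the files are downloaded, preview the first rows with:
# rows = preview(example_glob)
```

DuckDB reads the Parquet files in place, so nothing has to be imported into a database first.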

Content of OSM Earth — theme=adminstrative

As shown above, there is a wkt column from which you can extract the polygons of your choice.

Great! Following the steps above, you should be able to download the whole OSM data as Parquet files locally. If you are interested in more data engineering, data science and GIS content, you can follow my Medium and/or YouTube channel.
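As a rough illustration of what extracting a polygon from a WKT string involves, here is a standard-library sketch that reads the outer ring of a simple POLYGON. It assumes the wkt column holds plain text and it deliberately ignores holes, MULTIPOLYGON and other geometry types. For real work a geometry library such as Shapely (shapely.wkt.loads) is the safer choice. The function name outer_ring and the sample string are hypothetical.

```python
import re


def outer_ring(wkt: str) -> list[tuple[float, float]]:
    """Return the outer ring of a simple 'POLYGON ((x y, ...))' WKT string."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a simple POLYGON WKT string")
    coords = []
    for pair in match.group(1).split(","):
        x, y = pair.split()[:2]  # keep x and y, drop any extra dimension
        coords.append((float(x), float(y)))
    return coords


# Made-up triangle, not taken from the dataset
sample_wkt = "POLYGON ((0 0, 1 0, 1 1, 0 0))"
print(outer_ring(sample_wkt))
```

In WKT the coordinate order is x then y, so for geographic data each tuple is (longitude, latitude).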

Thanks a lot for reading. Please follow for more!
