Top 15 Python Packages with 100 Million+ Downloads in 2024

Created by Meng Li

Discover the top 15 Python packages with over 100 million downloads on PyPI, their uses, and why they’re essential for developers.

Today, I’ll share with you the most downloaded Python packages on PyPI over the past year. Let’s explore what these packages do, how they relate to each other, and why they are so popular.

1. Urllib3: 893 million downloads

Urllib3 is a powerful HTTP client for Python. It offers many features that the standard library lacks:

  • Thread safety
  • Connection pooling
  • Client-side SSL/TLS verification
  • Multipart file uploads
  • Tools for request retries and handling HTTP redirects
  • Support for gzip and deflate encoding
  • HTTP and SOCKS proxy support

Despite its name, Urllib3 is not the successor to urllib2, which was part of Python 2's standard library.

If you'd rather stick to Python's standard library (for example, when you can't install packages), look at urllib.request.

For end-users, I highly recommend the `requests` package (see item 6 in the list).

Urllib3 ranks first because nearly 1,200 other packages depend on it, many of which are also top downloads on this list.
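Several of the features listed above (connection pooling, thread safety, retries, and redirect handling) come together in urllib3's PoolManager. Below is a minimal sketch. The retry count, back-off factor, and timeout values are arbitrary examples, and `make_pool_manager` and `fetch_status` are hypothetical helper names, not part of urllib3.

```python
import urllib3
from urllib3.util import Retry, Timeout


def make_pool_manager() -> urllib3.PoolManager:
    """Build a reusable client with retries and timeouts (illustrative values)."""
    # Retry up to 3 times, backing off between attempts, on these transient server errors.
    retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
    # Separate limits for opening the connection and for reading the response.
    timeout = Timeout(connect=2.0, read=5.0)
    # A PoolManager keeps reusable connections per host and can be shared across threads.
    return urllib3.PoolManager(num_pools=10, retries=retries, timeout=timeout)


def fetch_status(http: urllib3.PoolManager, url: str) -> int:
    """Send a GET request (redirects are followed by default) and return the status code."""
    response = http.request("GET", url)
    return response.status
```

Creating the PoolManager does not open any connections. A call such as `fetch_status(make_pool_manager(), "https://example.com")` is what actually goes over the network.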

2. Six: 732 million downloads

Six is a tool for Python 2 and Python 3 compatibility.

Its goal is to allow code to run on both Python 2 and 3.

It provides functions that hide the differences between the two versions. A simple example is `six.print_()`.

In Python 3, print() is a function called with parentheses, while in Python 2, print is a statement used without them.

So, using `six.print_()` lets your code work in both versions.
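Besides `six.print_()`, six provides aliases that resolve to the right built-in types on each interpreter. Here is a minimal sketch. `describe()` is a hypothetical helper used only for illustration.

```python
import six


def describe(value):
    """Classify a value the same way on Python 2 and Python 3 (hypothetical helper)."""
    # six.string_types is (str,) on Python 3 and (basestring,) on Python 2.
    if isinstance(value, six.string_types):
        return "text"
    # six.integer_types is (int,) on Python 3 and (int, long) on Python 2.
    if isinstance(value, six.integer_types):
        return "integer"
    return "other"


# six.print_() behaves like Python 3's print() function on both versions.
six.print_("42 is", describe(42))
```

The helper deliberately has no type annotations, so the same file can still run under Python 2.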

Key Points:

  • The name ‘six’ comes from 2 x 3 = 6.
  • Similar tools include `future`.
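  • If you plan to move your code to Python 3 only (and drop Python 2), check out `2to3`. Note that 2to3 was removed from the standard library in Python 3.13, so you'll need Python 3.12 or earlier to run it.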

While I understand why this package is popular, I hope people move away from Python 2 soon, especially since it’s been officially unsupported since January 1, 2020.

3. Botocore, Boto3, S3transfer, AWS CLI

I'm grouping these projects together because they're closely related:

  • Botocore: 660 million downloads (Rank 3)
  • S3transfer: 584 million downloads (Rank 7)
  • AWS CLI: 394 million downloads (Rank 17)
  • Boto3: 329 million downloads (Rank 22)

Botocore provides the low-level interface to AWS. It is the foundation of Boto3 (Rank 22), the AWS SDK for Python, which lets you interact with AWS services like S3 and EC2.

Botocore also powers AWS CLI, the command-line interface for AWS.

S3transfer (Rank 7) is a Python library for managing Amazon S3 transfers. It's still under development, and its homepage advises caution: users should lock the version they use, because the API might change even between minor versions.

Boto3, AWS CLI, and many other projects rely on S3transfer.

The high rankings of these AWS-related libraries show how widely AWS is used.

4. Pip: 627 million downloads

Many of you probably know and love pip, the Python package installer.

Pip makes it easy to install packages from the Python Package Index and other repositories (like local mirrors or custom ones with private software).

Fun facts about pip:

  • Pip stands for “Pip Installs Packages.”
  • Pip is very simple to use. To install a package, just run `pip install <package_name>`. To uninstall, run `pip uninstall <package_name>`.
  • One of pip's most useful features is installing a whole list of packages at once, usually from a `requirements.txt` file (`pip install -r requirements.txt`). This file can also pin exact package versions, and many Python projects include one.
  • Using pip with virtualenv (Rank 57) allows you to create isolated environments that won’t interfere with your system’s Python environment.
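You normally run pip from the shell, but a script sometimes needs to trigger an install itself. pip's documentation recommends running pip as a subprocess of the current interpreter rather than importing pip's internals. Below is a minimal sketch that assumes a requirements file exists in the working directory. `pip_install_command` and `install_requirements` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def pip_install_command(requirements_file: str = "requirements.txt") -> List[str]:
    """Build the equivalent of `python -m pip install -r requirements.txt`."""
    # sys.executable is the interpreter running this script, so packages are
    # installed into the same environment (for example, the active virtualenv).
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]


def install_requirements(requirements_file: str = "requirements.txt") -> None:
    """Install every package listed in the requirements file."""
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(requirements_file), check=True)
```

The requirements file itself is plain text with one requirement per line. Each line is either a bare name like `requests` or a pinned form like `requests==X.Y.Z`.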

5. python-dateutil: 617 million downloads

The python-dateutil module adds powerful extensions to the standard datetime module.

It fills many gaps in the standard datetime module, such as parsing dates out of free-form text and doing calendar arithmetic with months and years.

Here’s one useful example: parsing fuzzy date strings from a log file:

from dateutil.parser import parse
log_line = "INFO 2020-01-01T00:00:01 Happy new year, human."
timestamp = parse(log_line, fuzzy=True)
print(timestamp)
# Output: 2020-01-01 00:00:01
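Calendar arithmetic is another gap in the standard library: `datetime.timedelta` has no notion of months or years. dateutil's `relativedelta` covers it. Here is a short sketch with arbitrary example dates:

```python
from datetime import date

from dateutil.relativedelta import relativedelta

# Adding one calendar month to January 31 clamps to the last valid day
# of February (2024 is a leap year).
print(date(2024, 1, 31) + relativedelta(months=+1))  # 2024-02-29

# The difference between two dates, expressed in years, months and days.
diff = relativedelta(date(2024, 3, 15), date(2020, 1, 1))
print(diff.years, diff.months, diff.days)  # 4 2 14
```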

6. Requests: 611 million downloads

Requests is built on top of Urllib3, the most downloaded library on this list, and makes sending HTTP requests remarkably simple.

Many developers use Requests rather than Urllib3 directly, so Requests may actually have more end users. Urllib3 is lower-level and is mostly used as a dependency of other projects.

Here’s an example showing how easy it is to use Requests:

import requests
r = requests.get("https://api.github.com/user", auth=("user", "pass"))
r.status_code
# 200
r.headers["content-type"]
# 'application/json; charset=utf8'
r.encoding
# 'utf-8'
r.text
# '{"type":"User"…'
r.json()
# {'disk_usage': 368627, 'private_gists': 484, …}
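In real code you'll usually also want a timeout and an explicit error check. Below is a minimal sketch that uses GitHub's public repository search endpoint purely as an example target. `build_search_request` and `search_repositories` are hypothetical helper names.

```python
import requests


def build_search_request(query: str, page: int = 1) -> requests.PreparedRequest:
    """Prepare (but do not send) a GET request; requests URL-encodes params."""
    req = requests.Request(
        "GET",
        "https://api.github.com/search/repositories",
        params={"q": query, "page": page},
        headers={"Accept": "application/vnd.github+json"},
    )
    return req.prepare()


def search_repositories(query: str) -> dict:
    """Send the request over a Session, which reuses urllib3 connections."""
    with requests.Session() as session:
        # Without an explicit timeout, requests can wait indefinitely for a response.
        response = session.send(build_search_request(query), timeout=10)
        # Turn 4xx/5xx status codes into a requests.HTTPError exception.
        response.raise_for_status()
        return response.json()
```

Splitting out the preparation step lets you inspect the final URL without touching the network.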

7. S3transfer: 584 million downloads

See the description under Rank 3 for the connection between Ranks 3, 7, 17, and 22.

8. Certifi: 552 million downloads

In recent years, almost all websites have switched to SSL/TLS. The padlock icon in the address bar shows that your connection to the site is encrypted, which prevents eavesdropping.

Encryption relies on SSL certificates, which are issued and digitally signed by trusted certificate authorities: companies or nonprofits like Let's Encrypt.

Using the public keys in their trusted root certificates, browsers can verify that a website's certificate was signed by one of these authorities. This confirms you're talking to the real site and not an impostor.

Python can do the same, using Certifi.

Certifi provides Mozilla's curated collection of root certificates, similar to those built into browsers like Chrome, Firefox, and Edge. This lets Python code verify the authenticity of SSL/TLS certificates.

Many projects trust and rely on Certifi, which is why it ranks so high.
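To make the standard library use Certifi's bundle explicitly, point an SSL context at the file returned by `certifi.where()`. Here is a minimal sketch. `make_verified_context` is a hypothetical helper name.

```python
import ssl

import certifi


def make_verified_context() -> ssl.SSLContext:
    """Create a TLS context that trusts Certifi's roots instead of the system store."""
    # create_default_context() requires a valid certificate chain and a matching
    # hostname; cafile= loads the bundle at certifi.where() (a cacert.pem file).
    return ssl.create_default_context(cafile=certifi.where())


print(certifi.where())  # path to cacert.pem inside the installed certifi package
```

You could then pass the context to the standard library, for example `urllib.request.urlopen("https://example.com", context=make_verified_context())`.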

9. Idna: 527 million downloads

According to its PyPI page, Idna provides “support for the IDNA protocol (Internationalised Domain Names in Applications) as defined in RFC5891.”

Let’s break down what Idna means:

IDNA is a set of rules for handling domain names that contain non-ASCII characters. But can't domain names already contain non-ASCII characters? What's the issue?

The problem is that many applications (like email clients and web browsers) don't support non-ASCII characters in domain names. More specifically, the email and HTTP protocols don't support these characters.

This isn’t a big issue in some countries, but it can be inconvenient in places like China, Russia, Germany, and Indonesia.

So, a standard workaround was needed, and the IETF created IDNA.

At the core of IDNA are two functions: ToASCII and ToUnicode.

ToASCII converts internationalized Unicode domain names to ASCII strings, and ToUnicode does the reverse. In the Idna package, these functions are called `idna.encode()` and `idna.decode()`. Here’s an example:

import idna
idna.encode("ドメイン.テスト")
# b'xn--eckwd4c7c.xn--zckzah'
print(idna.decode("xn--eckwd4c7c.xn--zckzah"))
# ドメイン.テスト
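In practice, you often need the ASCII form of a hostname before handing it to lower-level code, and you need a way to reject names that aren't valid under IDNA. Here is a minimal sketch. The helper names are hypothetical.

```python
from typing import Optional

import idna


def to_ascii_hostname(hostname: str) -> Optional[str]:
    """Return the ASCII ("xn--") form of a hostname, or None if it is invalid."""
    try:
        # idna.encode() processes the name label by label and returns bytes.
        return idna.encode(hostname).decode("ascii")
    except idna.IDNAError:
        # Raised for names that break IDNA rules, such as an empty label in "a..b".
        return None


def to_unicode_hostname(hostname: str) -> str:
    """Convert an ASCII-compatible hostname back to its Unicode form."""
    return idna.decode(hostname)


print(to_ascii_hostname("ドメイン.テスト"))  # xn--eckwd4c7c.xn--zckzah
```

Plain ASCII names pass through unchanged, so the same helper works for every hostname.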

10. PyYAML: 525 million downloads

YAML is a data serialization format designed to be both human-readable and easy for machines to parse.

PyYAML is Python’s YAML parser and emitter, meaning it can read and write YAML.

It can serialize almost any Python object into YAML: lists, dictionaries, and even class instances.

Python's standard library has its own configuration parser, ConfigParser, but it only supports basic INI-style files. YAML is far more capable.

For example, YAML natively supports typed values such as booleans, integers, floats, and lists.

In ConfigParser, everything is stored as a string. If you need to load an integer using ConfigParser, you must specify that you want an integer:

config.getint("section", "my_int")

But with PyYAML, the type is recognized automatically, so you get an int with just:

config["section"]["my_int"]

YAML also allows deep nesting, which is handy, even if not all projects need it.

You can choose which one to use, but many projects use YAML for configuration files, which is why PyYAML is so popular.
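The difference between the two approaches is easy to see side by side. Below is a minimal sketch with made-up configuration values. It uses `yaml.safe_load`, which only builds plain Python types and is the safer choice for files you don't fully control.

```python
import configparser

import yaml

INI_TEXT = """
[section]
my_int = 42
"""

YAML_TEXT = """
section:
  my_int: 42
  enabled: true
  ratios: [0.5, 0.25]
"""

# ConfigParser keeps every value as a string until you ask for a type.
ini = configparser.ConfigParser()
ini.read_string(INI_TEXT)
print(repr(ini["section"]["my_int"]))   # '42'
print(ini.getint("section", "my_int"))  # 42

# PyYAML resolves integers, booleans, floats and lists on its own.
config = yaml.safe_load(YAML_TEXT)
print(config["section"]["my_int"] + 1)  # 43
print(config["section"]["enabled"])     # True
```

`yaml.dump(config)` goes the other way and writes the Python data back out as YAML.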

11. Pyasn1: 512 million downloads

Like Idna, this project has a very technical description:

“A pure Python implementation of ASN.1 types and DER/BER/CER encoding (X.208).”

Fortunately, there’s still plenty of information available about this decades-old standard.

ASN.1 stands for Abstract Syntax Notation One and is an early form of data serialization.

It originated in the telecommunications industry.

You may have heard of Protocol Buffers or Apache Thrift; ASN.1 is essentially their version from 1984.

ASN.1 describes a cross-platform interface for sending data structures between different systems.

Remember Certifi from Rank 8?

ASN.1 is used to define the format of X.509 certificates, which HTTPS and many other encryption systems rely on.

It's also widely used in protocols like SNMP, LDAP, Kerberos, UMTS, LTE, and VoIP.

It’s a very complex standard, and some implementations have been found to be vulnerable.

Check out this discussion on Reddit about ASN.1: https://www.reddit.com/r/programming/comments/1hf7ds/useful_old_technologies_asn1/

Unless absolutely necessary, I recommend avoiding ASN.1. However, many packages depend on pyasn1, including rsa (Rank 14), which is why it's so widely downloaded.
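To give a feel for what pyasn1 does, here is a minimal sketch that declares a tiny ASN.1 type and round-trips it through DER, the encoding used for X.509 certificates. The `Point` type is a made-up example, not part of any standard.

```python
from pyasn1.codec.der import decoder, encoder
from pyasn1.type import namedtype, univ


class Point(univ.Sequence):
    """Equivalent of the ASN.1 definition: Point ::= SEQUENCE { x INTEGER, y INTEGER }"""

    componentType = namedtype.NamedTypes(
        namedtype.NamedType("x", univ.Integer()),
        namedtype.NamedType("y", univ.Integer()),
    )


point = Point()
point["x"] = 1
point["y"] = 2

# DER: tag 0x30 (SEQUENCE), length 6, then two INTEGERs (tag 0x02, length 1, value).
der_bytes = encoder.encode(point)
print(der_bytes.hex())  # 3006020101020102

# decode() returns the decoded value plus any trailing, unconsumed bytes.
decoded, rest = decoder.decode(der_bytes, asn1Spec=Point())
print(int(decoded["x"]), int(decoded["y"]))  # 1 2
```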

12. Docutils: 508 million downloads

Docutils is a modular system for converting plain text documents into other formats like HTML, XML, and LaTeX.

It can read plain text files written in reStructuredText, a format similar to Markdown.

You’ve probably heard of or read PEP documents. What are they?

PEP stands for Python Enhancement Proposal.

PEPs are design documents providing information to the Python community or describing a new feature for Python or its processes.

PEPs should provide precise technical specifications and justifications for the feature.

PEP documents are written in a standard reStructuredText template and then converted into polished documents using Docutils.

Sphinx, a tool for building documentation projects, also uses Docutils at its core. If Docutils is the machine, Sphinx is the factory.

Sphinx was originally designed to build Python’s own documentation, but many other projects use it for their docs.

You've probably read documentation hosted on readthedocs.org; many of those docs are built with Sphinx and Docutils.
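You can also drive Docutils directly from Python. Here is a minimal sketch that converts a reStructuredText string to an HTML fragment with `publish_parts`. `rst_to_html` is a hypothetical helper name, and the sample text is made up.

```python
from docutils.core import publish_parts

SAMPLE = """
Release notes
=============

*Docutils* turns reStructuredText into **HTML**.
"""


def rst_to_html(text: str) -> str:
    """Return the HTML body (including the title) generated from reStructuredText."""
    parts = publish_parts(source=text, writer_name="html")
    # publish_parts() returns a dict of fragments; "html_body" is the rendered body.
    return parts["html_body"]


print(rst_to_html(SAMPLE))
```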

13. Chardet: 501 million downloads

You can use the Chardet module to detect the character encoding of a file or data stream.

This is especially useful when working with large amounts of text of unknown origin.

It can also help determine the encoding of strings in data downloaded from the internet.

After installing Chardet, you can use the command-line tool `chardetect` like this:

chardetect somefile.txt
# Output: somefile.txt: ascii with confidence 1.0

You can also use the library in your programs. See the documentation here: https://chardet.readthedocs.io/en/latest/usage.html.
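As a library, the core call is `chardet.detect()`. It takes bytes and returns a dict with the guessed encoding and a confidence score. Below is a minimal sketch. The helper names and the UTF-8 fallback are assumptions made for illustration.

```python
import chardet


def guess_encoding(raw: bytes, fallback: str = "utf-8") -> str:
    """Guess the encoding of raw bytes, falling back when chardet has no answer."""
    # detect() returns a dict with keys such as "encoding" and "confidence";
    # "encoding" is None when nothing could be detected.
    result = chardet.detect(raw)
    return result["encoding"] or fallback


def decode_unknown(raw: bytes) -> str:
    """Decode bytes of unknown origin, replacing any undecodable bytes."""
    return raw.decode(guess_encoding(raw), errors="replace")


print(guess_encoding(b"plain ASCII text"))  # ascii
```

Detection is statistical, so results on very short non-ASCII inputs can be unreliable. Check the confidence value when it matters.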

Requests and many other packages have depended on Chardet, although newer versions of Requests use charset-normalizer on Python 3 instead. I suspect few people use Chardet directly, so its popularity likely comes from these dependencies.

14. RSA: 492 million downloads

RSA is a pure Python implementation of the RSA encryption algorithm. It supports:

  • Encryption and decryption
  • Signing and verifying signatures
  • Key generation according to PKCS#1 version 1.5

It can be used both as a Python library and from the command line.

  • The three letters in RSA stand for the surnames of Ron Rivest, Adi Shamir, and Leonard Adleman, who invented the algorithm in 1977.
  • RSA is one of the first public-key cryptosystems, widely used for secure data transmission. This system involves two keys: a public key and a private key. Data encrypted with the public key can only be decrypted with the private key.
  • RSA is slow, so it’s typically used to encrypt the shared key in a symmetric encryption system, which is faster and better suited for encrypting large amounts of data.

Here’s a simple example of using RSA:

import rsa
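# Bob creates a key pair (512 bits is quick for a demo but too weak for real use):
bob_pub, bob_priv = rsa.newkeys(512)
# Alice encrypts a message for Bob with his public key (rsa works on bytes, not str):
crypto = rsa.encrypt("hello Bob!".encode("utf8"), bob_pub)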
# When Bob gets the message, he decrypts it with his private key:
message = rsa.decrypt(crypto, bob_priv)
print(message.decode("utf8"))
# Output: hello Bob!

As long as only Bob has the private key, Alice can be sure that only Bob can read the message.

But Bob can’t be sure that Alice is the sender because anyone can get Bob’s public key.

To prove the message is from Alice, she can sign it with her private key.

Bob can then verify the signature using Alice’s public key, confirming that the message indeed came from Alice.
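The signing flow just described can be sketched with the same package. This assumes 512-bit keys to keep the demo fast, which is far too small for real use. SHA-256 is one of several hash names `rsa.sign()` accepts.

```python
import rsa

# Alice's key pair (512 bits only to keep the example quick; use 2048+ in practice).
alice_pub, alice_priv = rsa.newkeys(512)

message = b"hello Bob! - Alice"

# Alice signs a hash of the message with her private key.
signature = rsa.sign(message, alice_priv, "SHA-256")


def is_from_alice(msg: bytes, sig: bytes) -> bool:
    """Bob checks the signature against Alice's public key."""
    try:
        # verify() raises VerificationError if the message or signature was altered.
        rsa.verify(msg, sig, alice_pub)
        return True
    except rsa.VerificationError:
        return False


print(is_from_alice(message, signature))                  # True
print(is_from_alice(b"hello Bob! - Mallory", signature))  # False
```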

Many other packages, like google-auth (Rank 37), oauthlib (Rank 54), and awscli (Rank 17), depend on RSA. The package itself isn't often used directly, because faster implementations backed by native code are available.

15. JMESPath: 473 million downloads

Working with JSON in Python is easy because JSON maps naturally onto Python dictionaries and lists.

I think this is one of Python's best features.

To be honest, I hadn’t heard of JMESPath before, even though I’ve used JSON a lot.

I usually use `json.loads()` and manually pull data from dictionaries, often writing loops.

JMESPath (pronounced “James path”) makes working with JSON in Python even easier.

It lets you define, in a declarative way, how to extract data from a JSON document. Here are some basic examples:

import jmespath
# Get a specific element
d = {"foo": {"bar": "baz"}}
print(jmespath.search("foo.bar", d))
# Output: baz
# Using a wildcard to get all names
d = {"foo": {"bar": [{"name": "one"}, {"name": "two"}]}}
print(jmespath.search("foo.bar[*].name", d))
# Output: ['one', 'two']

This is just the tip of the iceberg. For more, see its documentation and PyPI page.
