If you’re an active trader, a data scientist, or just someone who likes building tools around financial data, you’ve probably asked yourself: What’s the best way to keep a fresh dataset of daily stock prices without paying a fortune or managing clunky Excel files?
In this article, I’ll walk you through a complete Python workflow that:
- Downloads daily price bars for a list of tickers using the Polygon API
- Automatically appends only new data from yesterday
- Stores everything in a clean, deduplicated CSV file that grows over time
This solution is ideal for retail traders or analysts who want a scalable, code-first approach to market data — and don’t want to spend hours manually exporting and cleaning it.
🛠️ Tools We’ll Use
- Python 3.10+
- Polygon.io API — a reliable market data provider with a free tier
- Pandas — for reading and storing structured tabular data
- Requests — to make HTTP calls to the Polygon API
We’ll also build in rate limiting to stay within your plan’s request quota.
Disclaimer
This article is for educational purposes only. I am not affiliated with Polygon.io, Deepnote, or any other company mentioned in this post. All opinions are my own, and I do not receive any compensation for recommending these tools. Use them at your own discretion and always review their pricing and usage terms before integrating into your workflow.
Understand the Goal
Every day, you want to:
- Check yesterday’s date
- Pull OHLCV (Open, High, Low, Close, Volume) bars for all your tickers
- Append that data to a local price_history.csv file
- Make sure there are no duplicate entries (e.g., if you run the script twice by accident)
The solution should be:
- Fully automated
- Storage-efficient (no redundant entries)
- Respectful of API rate limits
Set Up Your Ticker List
We download the full list of active stock tickers (the "ticker metadata") from the Polygon API using the /v3/reference/tickers endpoint.
import requests
import pandas as pd

def get_active_polygon_tickers(api_key, asset_class="stocks", limit=1000, verbose=True, save_path=None):
    """
    Downloads all active tickers from Polygon.io's reference endpoint.

    Parameters:
        api_key (str): Your Polygon API key
        asset_class (str): 'stocks', 'crypto', 'fx' (default: 'stocks')
        limit (int): Max items per page (Polygon max: 1000)
        verbose (bool): Print page progress
        save_path (str or None): If set, saves the results to a CSV file

    Returns:
        pd.DataFrame: All active tickers as a DataFrame
    """
    url = "https://api.polygon.io/v3/reference/tickers"
    all_results = []
    page = 1
    params = {
        "active": "true",
        "market": asset_class,  # the v3 reference endpoint filters by 'market'
        "limit": limit,
        "apiKey": api_key
    }
    while True:
        if verbose:
            print(f"📥 Fetching page {page}...")
        response = requests.get(url, params=params)
        data = response.json()
        if "results" not in data:
            print("⚠️ No results or error:", data)
            break
        all_results.extend(data["results"])
        if not data.get("next_url"):
            break
        # next_url carries the pagination cursor but not the API key,
        # so keep passing the key as a query parameter on follow-up pages
        url = data["next_url"]
        params = {"apiKey": api_key}
        page += 1
    df = pd.DataFrame(all_results)
    if verbose:
        print(f"✅ Retrieved {len(df)} active tickers")
    if save_path:
        df.to_csv(save_path, index=False)
        if verbose:
            print(f"💾 Saved to {save_path}")
    return df
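Here's a minimal usage sketch. It assumes your key lives in an environment variable named POLYGON_API_KEY (a name I'm choosing for this example) and that you want the result saved to tickers.csv for the daily workflow further down:

import os

# POLYGON_API_KEY is an environment variable you set yourself (hypothetical name)
tickers_df = get_active_polygon_tickers(
    api_key=os.environ["POLYGON_API_KEY"],
    asset_class="stocks",
    save_path="tickers.csv"
)
print(tickers_df[["ticker", "name"]].head())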
Handle Polygon Rate Limits
The free tier of Polygon allows 5 requests per minute for the aggregates endpoint, so we'll build a function that spaces out API calls accordingly using Python's time.monotonic() clock.
Here’s the function that does the heavy lifting:
def get_aggregated_price_bars_rl(api_key, symbols, start_date, end_date, time_span="day", multiplier=1, as_dataframe=True, reqs_per_min=5, session=None, verbose=True):
    import time
    import requests
    import pandas as pd

    if session is None:
        session = requests.Session()

    def one_call(sym):
        url = f"https://api.polygon.io/v2/aggs/ticker/{sym}/range/{multiplier}/{time_span}/{start_date}/{end_date}?apiKey={api_key}"
        r = session.get(url, timeout=15)
        r.raise_for_status()
        js = r.json()
        js["ticker"] = sym
        return js

    def format_json_to_df(raw):
        df_all = pd.DataFrame()
        rename = {"t": "Date", "v": "Volume", "vw": "VWAP", "o": "Open", "c": "Close", "h": "High", "l": "Low", "n": "Trades"}
        for js in raw:
            if "results" not in js:
                continue
            df = pd.DataFrame(js["results"])[list(rename.keys())].rename(columns=rename)
            df["Date"] = pd.to_datetime(df["Date"], unit="ms").dt.date.astype(str)
            df["symbol"] = js["ticker"]
            df_all = pd.concat([df_all, df], ignore_index=True)
        return df_all

    raw_responses = []
    start_window = time.monotonic()
    calls_in_window = 0
    for sym in symbols:
        # Once the per-minute quota is used up, sleep out the remainder of the
        # 60-second window and start a fresh one
        if calls_in_window >= reqs_per_min:
            elapsed = time.monotonic() - start_window
            wait = max(0, 60 - elapsed)
            if verbose:
                print(f"⏳ Rate limit hit – sleeping {wait:.1f}s")
            time.sleep(wait)
            start_window = time.monotonic()
            calls_in_window = 0
        try:
            raw_responses.append(one_call(sym))
        except Exception as e:
            print(f"⚠️ Failed: {sym} – {e}")
        calls_in_window += 1

    return format_json_to_df(raw_responses) if as_dataframe else raw_responses
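As a quick sanity check, you might call the function for a handful of symbols and a single trading day before wiring it into the daily job (the symbols and date below are purely illustrative):

bars = get_aggregated_price_bars_rl(
    api_key="YOUR_POLYGON_API_KEY",
    symbols=["AAPL", "MSFT", "NVDA"],
    start_date="2024-06-03",
    end_date="2024-06-03",
    reqs_per_min=5
)
print(bars.head())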
Put It All Together in a Daily Workflow
Here’s the main script that:
- Loads yesterday's date
- Reads tickers.csv
- Calls the above function
- Appends results to your long-term CSV (price_history.csv)
import os
import pandas as pd
from datetime import datetime, timedelta

# === CONFIG ===
API_KEY = "YOUR_POLYGON_API_KEY"
TICKER_CSV_FILE = "tickers.csv"
HISTORY_FILE = "price_history.csv"
REQS_PER_MIN = 5

# === USE YESTERDAY'S DATE FOR START AND END DATE ===
yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
START_DATE = yesterday
END_DATE = yesterday

# === LOAD TICKERS ===
tickers_df = pd.read_csv(TICKER_CSV_FILE)
# The reference download earlier names the symbol column 'ticker';
# fall back to it if a 'symbol' column isn't present
symbol_col = "symbol" if "symbol" in tickers_df.columns else "ticker"
symbols = tickers_df[symbol_col].dropna().astype(str).str.upper().unique().tolist()

# === DOWNLOAD DATA ===
new_data = get_aggregated_price_bars_rl(
    api_key=API_KEY,
    symbols=symbols,
    start_date=START_DATE,
    end_date=END_DATE,
    time_span="day",
    multiplier=1,
    as_dataframe=True,
    reqs_per_min=REQS_PER_MIN,
    verbose=True
)

# === LOAD EXISTING HISTORY IF PRESENT ===
if os.path.exists(HISTORY_FILE):
    old_data = pd.read_csv(HISTORY_FILE)
    combined = pd.concat([old_data, new_data], ignore_index=True)
    # Keep the most recent row per (symbol, Date) pair so re-runs are idempotent
    combined.drop_duplicates(subset=["symbol", "Date"], keep="last", inplace=True)
else:
    combined = new_data.copy()

# === SAVE UPDATED HISTORY ===
combined.sort_values(["symbol", "Date"], inplace=True)
combined.to_csv(HISTORY_FILE, index=False)
print(f"✅ Saved updated history to {HISTORY_FILE}")
Run Everything in Deepnote (No Local Setup Required)
We’ll also run this workflow in Deepnote — a collaborative, cloud-based Jupyter environment.
Deepnote is perfect for this kind of job:
- No installation needed — just paste in your code
- Environment management is built-in
- You can schedule notebooks to run daily or trigger them manually
- CSV files can be stored in the Files panel or connected to cloud drives (like Google Drive)
You can run this entire pipeline — from loading tickers to appending data — right from your browser.
What You Get
After a few weeks, your price_history.csv will contain daily OHLCV data for every ticker in your list — updated daily and with no duplicates.
It’s a solid foundation for:
- Building trading models
- Running backtests
- Tracking portfolio history
- Producing daily reports and charts
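As a small, non-authoritative sketch of what that looks like in practice, the snippet below loads price_history.csv, pivots closing prices into one column per symbol, and computes daily returns plus a 20-day moving average (the column names match the ones produced by the downloader above):

import pandas as pd

history = pd.read_csv("price_history.csv", parse_dates=["Date"])

# One row per trading day, one column per symbol
closes = history.pivot(index="Date", columns="symbol", values="Close").sort_index()

daily_returns = closes.pct_change()        # simple daily returns
sma_20 = closes.rolling(window=20).mean()  # 20-day simple moving average

print(daily_returns.tail())
print(sma_20.tail())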
A Note on Free API Plans
Polygon's free plan comes with:
- 5 requests/minute for aggregate bars
- No real-time quotes (historical data only)
- Roughly two years of history for daily bars
If you want faster refreshes or intraday intervals, consider upgrading your plan or batching the download across time chunks.
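If you go the batching route, one possible approach is to split a long date range into monthly chunks and feed each chunk through the same rate-limited function. The helper below (backfill_in_chunks is my name for it, not something from Polygon's SDK) is only a sketch under that assumption:

import pandas as pd

def backfill_in_chunks(api_key, symbols, start, end):
    """Hypothetical helper: download a long range month by month and deduplicate."""
    month_starts = pd.date_range(start=start, end=end, freq="MS")
    edges = sorted(set([pd.Timestamp(start), *month_starts, pd.Timestamp(end)]))
    frames = []
    for chunk_start, chunk_end in zip(edges[:-1], edges[1:]):
        frames.append(get_aggregated_price_bars_rl(
            api_key=api_key,
            symbols=symbols,
            start_date=chunk_start.strftime("%Y-%m-%d"),
            end_date=chunk_end.strftime("%Y-%m-%d"),
        ))
    combined = pd.concat(frames, ignore_index=True)
    # Chunk boundaries overlap by one day, so drop those duplicates
    return combined.drop_duplicates(subset=["symbol", "Date"], keep="last")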
Final Thoughts
As a retail investor with a full-time job, you don't need a Bloomberg Terminal or a hedge fund's budget to track the markets intelligently. By working with end-of-day (EOD) price data and a simple Python script, you can build a clean, automated data pipeline that updates your stock watchlist every day — no servers, no databases, no stress.
One of the biggest advantages of using EOD data is time. You’re not reacting to every tick — you’re collecting, analyzing, and making decisions on your own schedule. This leaves plenty of room for:
- Signal generation
- Portfolio rebalancing
- Backtesting strategies
- Running technical or statistical models
In contrast, real-time data demands a far more complex setup: streaming architectures, live execution, and constant monitoring. For many retail traders, data analysts, and solo quant enthusiasts, that overhead simply isn’t worth it.
With EOD data, the pace is slower — but the possibilities are just as powerful.