Data format reference
Every dataset uses one file per market round, named by the round's closing unix timestamp. This page documents the exact schema so you can build a parser before downloading anything — or just grab a free sample and run the snippets below.
File naming
btc-updown-5m-1770992100.csv # BTC 5-minute price ticks
eth-updown-5m-1770992400.csv # ETH 5-minute price ticks
1777014300_orderbook.jsonl # BTC 5m orderbook depth
The 10-digit number is the unix timestamp of the round's scheduled close. Consecutive 5-minute rounds are exactly 300 seconds apart, so a full day is 288 files.
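As a rough sketch of how a parser might handle this naming scheme, the snippet below pulls the 10-digit timestamp out of either filename style and lists the 288 stamps a full UTC day would contain. The helper names `round_stamp` and `stamps_for_utc_day` are illustrative and not part of any dataset tooling, and the regular expression assumes only the two filename patterns shown above.

```python
import re
from datetime import datetime, timezone

ROUND_SECONDS = 300  # consecutive 5-minute rounds are 300 s apart

# Assumption: the only 10-digit run in a filename is the round timestamp.
_STAMP = re.compile(r"(?<!\d)(\d{10})(?!\d)")


def round_stamp(filename: str) -> int:
    """Return the 10-digit unix timestamp embedded in a dataset filename."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no 10-digit timestamp in {filename!r}")
    return int(match.group(1))


def stamps_for_utc_day(day: str) -> list[int]:
    """List every round timestamp that falls on a UTC day given as YYYY-MM-DD."""
    midnight = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start = int(midnight.timestamp())
    return list(range(start, start + 86400, ROUND_SECONDS))


print(round_stamp("btc-updown-5m-1770992100.csv"))
print(round_stamp("1777014300_orderbook.jsonl"))
print(len(stamps_for_utc_day("2026-02-13")))
```

Comparing the stamps found on disk against `stamps_for_utc_day` is one simple way to see which rounds of a day are absent.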
Price CSV schema
| column | type | description |
|---|---|---|
| timestamp | float | Unix time of the tick, sub-second precision. Timezone-independent; use this for analysis. |
| datetime | string | Human-readable time as recorded. Note: written in the recorder's local timezone (UTC+8). |
| up_price | float | Last traded price of the Up share, 0.00–1.00. |
| down_price | float | Last traded price of the Down share, 0.00–1.00. |
| remaining_minutes | float | Minutes until the round resolves. |
timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92
Ticks arrive roughly every 1.5–2 seconds, so a 5-minute round typically holds 150–200 rows. As a round approaches resolution the winning side converges toward 1.00. The last row's larger price identifies the winner.
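To make the timezone note concrete, here is a minimal sketch that parses the three sample rows with the standard library, converts `timestamp` to an aware UTC datetime, and checks that re-rendering it in UTC+8 reproduces the `datetime` column. The function name `parse_ticks` and the dictionary keys `utc` and `local_label` are illustrative choices, not part of the dataset.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# The three sample rows from the schema above, inlined so the sketch runs offline.
SAMPLE = """\
timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92
"""

RECORDER_TZ = timezone(timedelta(hours=8))  # UTC+8, per the datetime column note


def parse_ticks(text):
    """Parse price CSV text into dicts with typed values and a UTC datetime."""
    ticks = []
    for row in csv.DictReader(io.StringIO(text)):
        ts = float(row["timestamp"])
        ticks.append({
            "timestamp": ts,
            "utc": datetime.fromtimestamp(ts, tz=timezone.utc),
            "local_label": row["datetime"],
            "up_price": float(row["up_price"]),
            "down_price": float(row["down_price"]),
            "remaining_minutes": float(row["remaining_minutes"]),
        })
    return ticks


ticks = parse_ticks(SAMPLE)
for tick in ticks:
    # Re-render the epoch time in UTC+8 and compare with the recorded string.
    rendered = tick["utc"].astimezone(RECORDER_TZ).strftime("%Y-%m-%d %H:%M:%S")
    print(tick["utc"].isoformat(), rendered == tick["local_label"])
```

Working from `timestamp` rather than `datetime` avoids having to carry the recorder's offset through later analysis.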
Orderbook JSONL schema
One JSON object per line. Each object is a complete depth snapshot:
{
  "timestamp": 1777014301.46,
  "datetime": "2026-04-24 07:05:01",
  "remaining_minutes": 4.98,
  "up":   { "bids": [{"price": "0.01", "size": "8045.24"}, ...],
            "asks": [{"price": "0.99", "size": "7966.61"}, ...] },
  "down": { "bids": [...], "asks": [...] }
}
- bids are sorted ascending: the best (highest) bid is the last element.
- asks are sorted descending: the best (lowest) ask is the last element.
- Prices and sizes are strings, exactly as returned by the Polymarket CLOB API.
- Both outcome tokens (Up and Down) carry a full ladder in every snapshot.
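Because prices and sizes arrive as strings, one option is to parse them into `Decimal` and check the documented sort order before trusting the last element. The sketch below does that on a single made-up snapshot: only the 0.01 bid and 0.99 ask on the Up side come from the example above, every other level is an invented placeholder, and `parse_ladder` and `check_book` are hypothetical helper names.

```python
import json
from decimal import Decimal

# Illustrative snapshot: the 0.01 bid and 0.99 ask on "up" come from the example
# above; all other levels are invented placeholders to give the ladders depth.
LINE = json.dumps({
    "timestamp": 1777014301.46,
    "datetime": "2026-04-24 07:05:01",
    "remaining_minutes": 4.98,
    "up": {
        "bids": [{"price": "0.01", "size": "8045.24"}, {"price": "0.50", "size": "120.00"}],
        "asks": [{"price": "0.99", "size": "7966.61"}, {"price": "0.52", "size": "95.50"}],
    },
    "down": {
        "bids": [{"price": "0.01", "size": "100.00"}, {"price": "0.48", "size": "80.00"}],
        "asks": [{"price": "0.99", "size": "100.00"}, {"price": "0.50", "size": "60.00"}],
    },
})


def parse_ladder(levels):
    """Convert string price/size pairs into exact Decimal tuples."""
    return [(Decimal(lv["price"]), Decimal(lv["size"])) for lv in levels]


def check_book(book):
    """Verify the documented ordering and return (best_bid, best_ask)."""
    bids = parse_ladder(book["bids"])
    asks = parse_ladder(book["asks"])
    bid_prices = [price for price, _ in bids]
    ask_prices = [price for price, _ in asks]
    if bid_prices != sorted(bid_prices):
        raise ValueError("bids are not sorted ascending")
    if ask_prices != sorted(ask_prices, reverse=True):
        raise ValueError("asks are not sorted descending")
    # With that ordering the best level on each side is the last element.
    return bids[-1][0], asks[-1][0]


snap = json.loads(LINE)
for side in ("up", "down"):
    best_bid, best_ask = check_book(snap[side])
    print(side, best_bid, best_ask, best_ask - best_bid)
```

`Decimal` keeps the spread exact (0.02 here rather than a float approximation). An empty ladder would raise `IndexError` in this sketch, so a real loader may want to handle that case explicitly.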
Loading the archive with pandas
import glob
import pandas as pd
frames = []
for path in glob.glob("btc_5m/btc-updown-5m-*.csv"):
    df = pd.read_csv(path)
    # the round timestamp is the trailing number in the filename
    df["market_close"] = int(path.rsplit("-", 1)[1].removesuffix(".csv"))
    frames.append(df)
ticks = pd.concat(frames, ignore_index=True)
# parse the epoch column as timezone-aware UTC
ticks["ts"] = pd.to_datetime(ticks["timestamp"], unit="s", utc=True)
# label every round with its winner
winners = (
    ticks.sort_values("timestamp")
    .groupby("market_close")
    .last()
    .assign(winner=lambda d: (d.up_price > d.down_price).map({True: "up", False: "down"}))
)
Reading orderbook snapshots
import json
def best_bid_ask(book):
    # bids ascend and asks descend, so the best level on each side is last
    return float(book["bids"][-1]["price"]), float(book["asks"][-1]["price"])
with open("1777014300_orderbook.jsonl") as fh:
    for line in fh:
        snap = json.loads(line)
        bid, ask = best_bid_ask(snap["up"])
        print(snap["remaining_minutes"], bid, ask, ask - bid)
Data quality notes
- Rounds with fewer than 10 recorded ticks are treated as incomplete and excluded from site statistics; they may still appear in the raw archive.
- Short recorder outages can produce missing rounds — each day page shows exactly how many rounds were captured.
- All dates on this site group rounds by UTC day derived from the filename timestamp.
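The notes above can be approximated in a few lines, assuming you already have a mapping from filename timestamp to tick count from your own loader. The sketch groups rounds by UTC day, counts rounds with at least 10 ticks as complete, and lists grid timestamps that are absent. `summarize`, `missing_stamps` and the sample counts are hypothetical and do not reflect how the site computes its statistics.

```python
from collections import defaultdict
from datetime import datetime, timezone

ROUND_SECONDS = 300
MIN_TICKS = 10  # rounds with fewer ticks are treated as incomplete


def utc_day(stamp: int) -> str:
    """UTC calendar day (YYYY-MM-DD) of a filename timestamp."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc).strftime("%Y-%m-%d")


def summarize(tick_counts: dict[int, int]) -> dict[str, dict[str, int]]:
    """Per UTC day: rounds captured, rounds complete, rounds missing out of 288."""
    days = defaultdict(lambda: {"captured": 0, "complete": 0})
    for stamp, n_ticks in tick_counts.items():
        day = days[utc_day(stamp)]
        day["captured"] += 1
        if n_ticks >= MIN_TICKS:
            day["complete"] += 1
    for day in days.values():
        day["missing"] = 86400 // ROUND_SECONDS - day["captured"]
    return dict(days)


def missing_stamps(stamps: list[int]) -> list[int]:
    """Grid timestamps absent between the first and last captured round."""
    have = set(stamps)
    grid = range(min(stamps), max(stamps) + 1, ROUND_SECONDS)
    return [s for s in grid if s not in have]


# Hypothetical tick counts keyed by filename timestamp (not real archive figures).
counts = {1770992100: 172, 1770992400: 7, 1770993000: 165}
print(summarize(counts))
print(missing_stamps(list(counts)))
```

Here the 7-tick round is captured but not complete, and the gap at 1770992700 shows up as a missing round.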
License
Free samples may be used for evaluation and research with attribution. The purchased archive is licensed to the buyer for internal research, backtesting and model training; redistribution or resale of the raw files is not permitted.