Data format reference

Every dataset uses one file per market round, named by the round's unix timestamp. This page documents the exact schema so you can build a parser before downloading anything, or you can grab a free sample and run the snippets below.

File naming

btc-updown-5m-1770992100.csv          # BTC 5-minute price ticks
eth-updown-5m-1770992400.csv          # ETH 5-minute price ticks
1777014300_orderbook.jsonl            # BTC 5m orderbook depth

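The 10-digit number is a unix timestamp on the 300-second round grid. In the samples on this page the first tick arrives about one second after that timestamp with 4.98 minutes remaining, which indicates that it marks the round's scheduled open, with the close 300 seconds later. Consecutive 5-minute rounds are exactly 300 seconds apart, so a full day is 288 files per asset.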
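As a rough sketch, the names above can be split into their parts with the standard library. The two regular expressions and the helpers `parse_name` and `utc_label` are illustrative assumptions inferred only from the three example names; other assets or intervals in the archive may need a looser pattern.

```python
import re
from datetime import datetime, timezone

# Patterns inferred from the example names on this page (an assumption, not a spec).
PRICE_RE = re.compile(r"^(?P<asset>[a-z]+)-updown-(?P<interval>\d+m)-(?P<ts>\d{10})\.csv$")
BOOK_RE = re.compile(r"^(?P<ts>\d{10})_orderbook\.jsonl$")

def parse_name(name):
    """Return (kind, asset, unix_ts) for a file name, or None if it does not match."""
    m = PRICE_RE.match(name)
    if m:
        return "price", m["asset"], int(m["ts"])
    m = BOOK_RE.match(name)
    if m:
        # The orderbook name carries no asset prefix.
        return "orderbook", None, int(m["ts"])
    return None

def utc_label(ts):
    """Format a unix timestamp as a UTC wall-clock string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

for name in ["btc-updown-5m-1770992100.csv", "1777014300_orderbook.jsonl"]:
    kind, asset, ts = parse_name(name)
    print(kind, asset, ts, utc_label(ts))
# price btc 1770992100 2026-02-13 14:15:00
# orderbook None 1777014300 2026-04-24 07:05:00
```

Returning `None` for unrecognised names lets a directory scan skip stray files instead of raising.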

Price CSV schema

column	type	description
`timestamp`	float	Unix time of the tick, sub-second precision. Timezone-independent — use this for analysis.
`datetime`	string	Human-readable time as recorded. Note: written in the recorder's local timezone (UTC+8).
`up_price`	float	Last traded price of the Up share, 0.00–1.00.
`down_price`	float	Last traded price of the Down share, 0.00–1.00.
`remaining_minutes`	float	Minutes until the round resolves.

timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92

Ticks arrive roughly every 1.5–2 seconds, so a 5-minute round typically holds 150–200 rows. As a round approaches resolution, the winning side's price converges toward 1.00, so the side with the higher price in the last row identifies the winner.
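The schema can be checked against the three sample rows without downloading anything. The sketch below uses only the standard library; `parse_ticks` and `RECORDER_TZ` are hypothetical names, and the UTC+8 offset is taken from the `datetime` description in the table above.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# The three sample rows shown on this page.
SAMPLE = """timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92
"""

RECORDER_TZ = timezone(timedelta(hours=8))  # UTC+8, as stated in the schema table

def parse_ticks(text):
    """Parse price CSV text into dicts with typed fields."""
    ticks = []
    for row in csv.DictReader(io.StringIO(text)):
        ts = float(row["timestamp"])
        ticks.append({
            "timestamp": ts,
            "utc": datetime.fromtimestamp(ts, tz=timezone.utc),
            "datetime": row["datetime"],
            "up_price": float(row["up_price"]),
            "down_price": float(row["down_price"]),
            "remaining_minutes": float(row["remaining_minutes"]),
        })
    return ticks

ticks = parse_ticks(SAMPLE)

# Shifting the UTC instant to UTC+8 should reproduce the `datetime` column.
rebuilt = [t["utc"].astimezone(RECORDER_TZ).strftime("%Y-%m-%d %H:%M:%S") for t in ticks]
print(rebuilt == [t["datetime"] for t in ticks])  # True

# Gaps between consecutive ticks, in seconds.
gaps = [round(b["timestamp"] - a["timestamp"], 2) for a, b in zip(ticks, ticks[1:])]
print(gaps)  # [1.71, 1.74]
```

Working from `timestamp` and deriving any wall-clock view on demand avoids depending on the recorder's local timezone.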

Orderbook JSONL schema

One JSON object per line. Each object is a complete depth snapshot:

{
  "timestamp": 1777014301.46,
  "datetime": "2026-04-24 07:05:01",
  "remaining_minutes": 4.98,
  "up":   { "bids": [{"price": "0.01", "size": "8045.24"}, ...],
            "asks": [{"price": "0.99", "size": "7966.61"}, ...] },
  "down": { "bids": [...], "asks": [...] }
}

bids are sorted ascending — the best (highest) bid is the last element.
asks are sorted descending — the best (lowest) ask is the last element.
Prices and sizes are strings, exactly as returned by the Polymarket CLOB API.
Both outcome tokens (Up and Down) carry a full ladder in every snapshot.
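Because prices and sizes are stored as strings, they can be converted to `Decimal` to avoid float rounding in spread and depth arithmetic. The sketch below assumes the ordering described above (best level last on both sides). The ladder is a toy example with illustrative numbers, and `levels` and `depth_within` are hypothetical helpers, not part of any API.

```python
from decimal import Decimal

# Toy ladder (numbers are illustrative), laid out like one side of a snapshot:
# bids ascending, asks descending, so the best level is last on both sides.
book = {
    "bids": [{"price": "0.01", "size": "8000.00"},
             {"price": "0.48", "size": "120.00"},
             {"price": "0.50", "size": "35.50"}],
    "asks": [{"price": "0.99", "size": "7900.00"},
             {"price": "0.55", "size": "80.00"},
             {"price": "0.52", "size": "40.25"}],
}

def levels(side):
    """Convert one side to (price, size) Decimal pairs, best level first."""
    return [(Decimal(lvl["price"]), Decimal(lvl["size"])) for lvl in reversed(side)]

def depth_within(side, band):
    """Total size resting within `band` of the best price on this side."""
    lv = levels(side)
    if not lv:
        return Decimal("0")
    best = lv[0][0]
    return sum((size for price, size in lv if abs(price - best) <= band), Decimal("0"))

bids, asks = levels(book["bids"]), levels(book["asks"])
best_bid, best_ask = bids[0][0], asks[0][0]
print("spread:", best_ask - best_bid)                                          # spread: 0.02
print("mid:", (best_bid + best_ask) / 2)                                       # mid: 0.51
print("bid depth within 0.02:", depth_within(book["bids"], Decimal("0.02")))   # 155.50
```

Reversing each side once means the same "best level first" logic works for bids and asks alike.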

Loading the archive with pandas

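import glob
import pandas as pd

frames = []
# sorted() keeps the load order deterministic; glob.glob returns paths in arbitrary order
for path in sorted(glob.glob("btc_5m/btc-updown-5m-*.csv")):
    df = pd.read_csv(path)
    # 10-digit timestamp from the file name (str.removesuffix needs Python 3.9+)
    df["round_ts"] = int(path.rsplit("-", 1)[1].removesuffix(".csv"))
    frames.append(df)

# pd.concat raises ValueError if the glob matched no files
ticks = pd.concat(frames, ignore_index=True)
ticks["ts"] = pd.to_datetime(ticks["timestamp"], unit="s", utc=True)

# label every round with its winner: the side priced higher in the round's last tick
# (an exact tie in the last row is labelled "down" by this comparison)
winners = (
    ticks.sort_values("timestamp")
         .groupby("round_ts")
         .last()
         .assign(winner=lambda d: (d.up_price > d.down_price).map({True: "up", False: "down"}))
)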

Reading orderbook snapshots

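import json

def best_bid_ask(book):
    # bids ascend and asks descend, so the best level on each side is the last element;
    # prices are stored as strings, hence the float() conversion
    return float(book["bids"][-1]["price"]), float(book["asks"][-1]["price"])

with open("1777014300_orderbook.jsonl", encoding="utf-8") as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines
        snap = json.loads(line)
        bid, ask = best_bid_ask(snap["up"])
        # remaining minutes, best bid, best ask and spread for the Up token
        print(snap["remaining_minutes"], bid, ask, ask - bid)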

Data quality notes

Rounds with fewer than 10 recorded ticks are treated as incomplete and excluded from site statistics; they may still appear in the raw archive.
Short recorder outages can produce missing rounds — each day page shows exactly how many rounds were captured.
All dates on this site group rounds by UTC day derived from the filename timestamp.
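The notes above can be turned into a simple per-day audit. This is a minimal sketch, assuming round timestamps sit on the 300-second grid counted from UTC midnight (both example file names on this page do). `audit_day` and the tick counts are invented for illustration; in practice the counts would come from the row count of each downloaded file.

```python
from datetime import datetime, timezone

ROUND_SECONDS = 300
MIN_TICKS = 10  # threshold quoted in the notes above

def utc_day(ts):
    """UTC calendar day (YYYY-MM-DD) of a unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

def expected_rounds(day):
    """All 288 round timestamps on the 300-second grid for a UTC day."""
    start = int(datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp())
    return [start + i * ROUND_SECONDS for i in range(86400 // ROUND_SECONDS)]

def audit_day(day, tick_counts):
    """tick_counts maps a file-name timestamp to the number of rows in that file."""
    present = {ts for ts in tick_counts if utc_day(ts) == day}
    missing = [ts for ts in expected_rounds(day) if ts not in present]
    incomplete = sorted(ts for ts in present if tick_counts[ts] < MIN_TICKS)
    return missing, incomplete

# Hypothetical counts for three rounds on 2026-02-13 (invented for illustration).
counts = {1770992100: 172, 1770992400: 6, 1770993000: 165}
missing, incomplete = audit_day("2026-02-13", counts)
print(len(missing), incomplete)  # 285 [1770992400]
```

Keeping missing and incomplete rounds in separate lists mirrors the distinction drawn above: incomplete rounds may still exist in the raw archive, while missing rounds were never recorded.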

License

Free samples may be used for evaluation and research with attribution. The purchased archive is licensed to the buyer for internal research, backtesting and model training; redistribution or resale of the raw files is not permitted.