Data format reference

Every dataset uses one file per market round, named by the round's opening unix timestamp. This page documents the exact schema so you can build a parser before downloading anything, or you can grab a free sample and run the snippets below.

File naming

btc-updown-5m-1770992100.csv          # BTC 5-minute price ticks
eth-updown-5m-1770992400.csv          # ETH 5-minute price ticks
1777014300_orderbook.jsonl            # BTC 5m orderbook depth

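The 10-digit number is the unix timestamp of the round's scheduled open; the round closes 300 seconds later. This matches the samples on this page, where the first tick lands about one second after the filename timestamp with 4.98 minutes remaining. Consecutive 5-minute rounds are exactly 300 seconds apart, so a full day is 288 rounds, one file each.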
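
As a rough sketch of how the naming scheme could be handled in code, the snippet below splits a price filename into asset, interval and timestamp, then lists the timestamps for a full day of rounds. The names parse_price_filename, round_timestamps and PriceFile are hypothetical, and the regular expression assumes every price file follows the pattern of the two example filenames above.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the two example price filenames above.
PRICE_FILE_RE = re.compile(
    r"^(?P<asset>[a-z]+)-updown-(?P<interval>\d+m)-(?P<ts>\d{10})\.csv$"
)

ROUND_SECONDS = 300  # consecutive 5-minute rounds are 300 seconds apart


class PriceFile(NamedTuple):
    asset: str
    interval: str
    round_ts: int


def parse_price_filename(name: str) -> PriceFile:
    """Split a price CSV filename into its parts (hypothetical helper)."""
    match = PRICE_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    return PriceFile(match["asset"], match["interval"], int(match["ts"]))


def round_timestamps(first_ts: int, count: int) -> list[int]:
    """Filename timestamps of `count` consecutive rounds starting at `first_ts`."""
    return [first_ts + i * ROUND_SECONDS for i in range(count)]


info = parse_price_filename("btc-updown-5m-1770992100.csv")
day = round_timestamps(info.round_ts, 86400 // ROUND_SECONDS)
print(info.asset, info.interval, info.round_ts, len(day))
```

Comparing a list like this against the files actually on disk is one simple way to spot missing rounds.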

Price CSV schema

column             type    description
timestamp          float   Unix time of the tick, sub-second precision. Timezone-independent; use this for analysis.
datetime           string  Human-readable time as recorded. Note: written in the recorder's local timezone (UTC+8).
up_price           float   Last traded price of the Up share, 0.00–1.00.
down_price         float   Last traded price of the Down share, 0.00–1.00.
remaining_minutes  float   Minutes until the round resolves.

timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92
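
Because the datetime column is written in the recorder's local time, it is usually safer to derive times from the timestamp column. The sketch below uses only the standard library to convert the first sample tick to UTC and to a fixed UTC+8 offset, which reproduces the datetime string in that row. The constant name RECORDER_TZ is made up for illustration, and a fixed offset with no daylight saving is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC+8 offset, per the datetime column description above.
RECORDER_TZ = timezone(timedelta(hours=8))

tick_ts = 1770992101.078051  # timestamp of the first sample row

utc_time = datetime.fromtimestamp(tick_ts, tz=timezone.utc)
local_time = utc_time.astimezone(RECORDER_TZ)

fmt = "%Y-%m-%d %H:%M:%S"
print(utc_time.strftime(fmt))    # 2026-02-13 14:15:01 (UTC)
print(local_time.strftime(fmt))  # 2026-02-13 22:15:01 (matches the datetime column)
```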

Ticks arrive roughly every 1.5–2 seconds, so a 5-minute round typically holds 150–200 rows. As a round approaches resolution, the winning side's price converges toward 1.00, so the side with the higher price in the last row is the winner.
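
The winner rule can be sketched with the standard library alone. The example below parses CSV text, orders the rows by timestamp, and picks the side with the higher price in the final row. The input is just the three sample rows from above, taken from the start of a round, so the result only illustrates the mechanics and is not a real outcome. The helper name round_winner is hypothetical, and treating an exact tie as undecided is an assumption.

```python
import csv
import io

# The three sample rows shown above (start of a round, not a finished one).
SAMPLE = """\
timestamp,datetime,up_price,down_price,remaining_minutes
1770992101.078051,2026-02-13 22:15:01,0.5100,0.5100,4.98
1770992102.787616,2026-02-13 22:15:02,0.5300,0.4700,4.95
1770992104.529263,2026-02-13 22:15:04,0.5600,0.4500,4.92
"""


def round_winner(csv_text):
    """Return 'up', 'down', or None (tie or no rows) based on the last tick."""
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda row: float(row["timestamp"]),
    )
    if not rows:
        return None
    up = float(rows[-1]["up_price"])
    down = float(rows[-1]["down_price"])
    if up == down:
        return None  # assumption: an exact tie is left undecided
    return "up" if up > down else "down"


print(round_winner(SAMPLE))  # up
```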

Orderbook JSONL schema

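One JSON object per line. Each object is a complete depth snapshot of both the Up and Down books. Prices and sizes are encoded as strings, and the ellipses mark levels omitted from this example: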

{
  "timestamp": 1777014301.46,
  "datetime": "2026-04-24 07:05:01",
  "remaining_minutes": 4.98,
  "up":   { "bids": [{"price": "0.01", "size": "8045.24"}, ...],
            "asks": [{"price": "0.99", "size": "7966.61"}, ...] },
  "down": { "bids": [...], "asks": [...] }
}
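
Since prices and sizes arrive as strings, they need converting before any arithmetic. The sketch below turns one side's levels into (price, size) float tuples and sums the resting size. The JSON line is a cut-down stand-in containing only the two levels visible in the example above; the elided levels are not reproduced. The helper name parse_levels is hypothetical.

```python
import json

# Cut-down snapshot: only the levels visible in the example above.
LINE = (
    '{"timestamp": 1777014301.46, "remaining_minutes": 4.98, '
    '"up": {"bids": [{"price": "0.01", "size": "8045.24"}], '
    '"asks": [{"price": "0.99", "size": "7966.61"}]}}'
)


def parse_levels(levels):
    """Convert [{'price': str, 'size': str}, ...] into (float, float) tuples."""
    return [(float(level["price"]), float(level["size"])) for level in levels]


snap = json.loads(LINE)
bids = parse_levels(snap["up"]["bids"])
asks = parse_levels(snap["up"]["asks"])

bid_depth = sum(size for _, size in bids)  # total size resting on the bid side
ask_depth = sum(size for _, size in asks)  # total size resting on the ask side
print(bids, asks, bid_depth, ask_depth)
```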

Loading the archive with pandas

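import glob
import pandas as pd

frames = []
# sorted() gives a stable chronological order; glob alone guarantees no ordering
for path in sorted(glob.glob("btc_5m/btc-updown-5m-*.csv")):
    df = pd.read_csv(path)
    # round timestamp taken from the filename (str.removesuffix needs Python 3.9+)
    df["round_ts"] = int(path.rsplit("-", 1)[1].removesuffix(".csv"))
    frames.append(df)

# pd.concat raises ValueError if the glob matched no files
ticks = pd.concat(frames, ignore_index=True)
# timezone-aware UTC times; prefer these over the local-time datetime column
ticks["ts"] = pd.to_datetime(ticks["timestamp"], unit="s", utc=True)

# label every round with its winner: the side priced higher in the round's final tick
# (an exact tie in the final tick is labelled "down" by this expression)
winners = (
    ticks.sort_values("timestamp")
         .groupby("round_ts")
         .last()
         .assign(winner=lambda d: (d.up_price > d.down_price).map({True: "up", False: "down"}))
)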

Reading orderbook snapshots

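import json

def best_bid_ask(book):
    # In the sample above, levels appear worst-first (bids from 0.01 up,
    # asks from 0.99 down), so the best price on each side is the last element.
    # Prices are stored as strings, hence float().
    return float(book["bids"][-1]["price"]), float(book["asks"][-1]["price"])

with open("1777014300_orderbook.jsonl") as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        snap = json.loads(line)
        if not snap["up"]["bids"] or not snap["up"]["asks"]:
            continue  # one side is empty, so there is no spread to report
        bid, ask = best_bid_ask(snap["up"])
        print(snap["remaining_minutes"], bid, ask, ask - bid)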

Data quality notes

License

Free samples may be used for evaluation and research with attribution. The purchased archive is licensed to the buyer for internal research, backtesting and model training; redistribution or resale of the raw files is not permitted.