Methodology & Data Sources
Everything on this site is computed from public datasets with a documented, reproducible method. This page explains where the numbers come from, how they are calculated, and what their limits are.
Data sources
- Nasdaq/Quandl WIKI Prices (public domain) - about 3,000 US stocks with split- and dividend-adjusted daily prices, frozen at March 2018.
- DoltHub post-no-preference/stocks (CC BY-SA 4.0) - US stocks and ETFs from 2011 to the present, including delisted tickers, with dividend and split tables. Refreshed roughly daily.
- Kaggle ETF history (benjaminpo, CC BY-SA 4.0; borismarjanovic, CC0) - deep ETF price history back to each fund's inception (SPY to 1993, QQQ to 1999, GLD to 2004).
- FRED TB3MS (US government, public domain) - the 3-month Treasury-bill rate used as the risk-free rate in Sharpe and Sortino calculations.
- Long-history academic series - Robert Shiller's S&P 500 data (1871+), Ken French's market and size portfolios (1926+), World Bank gold and silver prices (1960+), and Shiller home-price data, used only for the clearly labeled backfill tickers.
- Crypto - Bitcoin from Blockchain.com (2010+) and Ethereum from Binance data (2017+).
Total-return method
For each ticker we build a daily total-return index: prices are split-adjusted, and every dividend is reinvested on its ex-date. Monthly returns are the month-end percentage change of that index. Formally, for each trading day:
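$$\text{ret}_t = \frac{\text{split\_mult}_t \times \text{close}_t + \text{dividend}_t}{\text{close}_{t-1}} - 1$$
where split_mult_t is the split ratio taking effect on day t (1 on days without a split) and dividend_t is the cash dividend per share going ex on day t.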
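A minimal Python sketch of the formula, assuming the raw feed provides, for each trading day, an unadjusted close, a split multiplier (1.0 normally, 2.0 for a 2-for-1 split taking effect that day) and the cash dividend going ex that day. The function names are hypothetical, not the site's pipeline code. It computes daily returns, compounds them into an index, and takes the month-end percentage change.

```python
from datetime import date


def daily_total_returns(closes, split_mults, dividends):
    """Apply ret_t = (split_mult_t * close_t + dividend_t) / close_{t-1} - 1.

    The three lists are aligned by trading day; the result has one fewer
    element than `closes` because day 0 has no previous close.
    """
    return [
        (split_mults[t] * closes[t] + dividends[t]) / closes[t - 1] - 1
        for t in range(1, len(closes))
    ]


def total_return_index(returns, start=1.0):
    """Compound daily returns into an index level series starting at `start`."""
    levels = [start]
    for r in returns:
        levels.append(levels[-1] * (1 + r))
    return levels


def monthly_returns(dates, levels):
    """Month-end percentage change of the index, keyed by (year, month).

    Assumes `dates` is sorted ascending, so the last value stored for each
    month belongs to its final trading day.
    """
    month_end = {}
    for d, level in zip(dates, levels):
        month_end[(d.year, d.month)] = level
    months = sorted(month_end)
    return {
        months[i]: month_end[months[i]] / month_end[months[i - 1]] - 1
        for i in range(1, len(months))
    }


# A 2-for-1 split on day 1, then a $1 dividend on day 2.
rets = daily_total_returns([100.0, 50.0, 52.0], [1.0, 2.0, 1.0], [0.0, 0.0, 1.0])
# about [0.0, 0.06]: the split is neutral; day 2 is +4% price plus +2% dividend
levels = total_return_index(rets)
dates = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1)]
monthly = monthly_returns(dates, levels)
```

Multiplying the current close by the split ratio, rather than dividing the previous close, keeps every return expressed in today's share count, so the split day produces no artificial gain or loss.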
Different sources are joined by chain-linking returns, never by concatenating price levels, so the combined series is continuous across source boundaries regardless of price-scale differences.
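Chain-linking can be sketched by merging two sources' monthly return maps at a cutover month and compounding once. The cutover month and the function names are illustrative assumptions; the page does not say which source is preferred where WIKI and Dolt overlap (2011 to March 2018).

```python
def chain_link(older, newer, cutover):
    """Join two {(year, month): return} maps by returns, not price levels.

    Months before `cutover` come from `older`; months from `cutover` onward
    come from `newer`. Returns a new dict in chronological order.
    """
    combined = {m: r for m, r in older.items() if m < cutover}
    combined.update({m: r for m, r in newer.items() if m >= cutover})
    return dict(sorted(combined.items()))


def compound(returns, start=100.0):
    """Rebuild one continuous index level series from chained returns."""
    level, out = start, {}
    for month, r in returns.items():
        level *= 1 + r
        out[month] = level
    return out


# Toy numbers: suppose the older source's prices sit near $40 and the newer
# one's near $160 (different adjustment bases). Concatenating levels would
# fake a +300% month; chaining returns cannot, because only percentage
# changes cross the boundary.
older = {(2018, 1): 0.02, (2018, 2): -0.03, (2018, 3): 0.01}
newer = {(2018, 3): 0.012, (2018, 4): 0.015}
linked = chain_link(older, newer, cutover=(2018, 3))
series = compound(linked)
```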
Data cleaning
Raw feeds contain errors, so the pipeline validates rather than trusts them:
- Split validation. A recorded split only counts if the price actually breaks by about the recorded ratio near the recorded date. Bogus or misdated split records (which would otherwise fabricate huge one-day jumps) are dropped, and each split is applied on the day the price actually moved.
- Unrecorded-split repair. A clean one-day break of 2x, 3x, ... or 1/2, 1/3, ... with no matching split record is treated as a missing adjustment and neutralized. Genuine market moves, which rarely land on an exact integer ratio and usually revert, are kept.
- Exchange test symbols (ZVZZT and friends) are excluded entirely.
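The two split checks above can be sketched as follows. The search window, the tolerances (measured in log terms), the largest ratio tested, and the "reverts within five days" rule are all hypothetical parameters chosen for illustration; the page does not publish its thresholds. A detected ratio would be plugged in as split_mult for that day.

```python
import math


def confirm_split_day(closes, recorded_day, ratio, window=3, tol=0.10):
    """Return the day within `window` of `recorded_day` whose one-day break
    best matches `ratio` (2.0 for 2-for-1, 0.1 for 1-for-10), or None if no
    break is within `tol`. None means the split record is dropped."""
    best_day, best_err = None, tol
    lo = max(1, recorded_day - window)
    hi = min(len(closes) - 1, recorded_day + window)
    for t in range(lo, hi + 1):
        err = abs(math.log(closes[t - 1] / closes[t] / ratio))
        if err <= best_err:
            best_day, best_err = t, err
    return best_day


def implied_split_ratio(prev_close, close, max_n=10, tol=0.02):
    """Return n or 1/n (2 <= n <= max_n) if prev_close / close is that clean
    a ratio, else None."""
    observed = prev_close / close
    for n in range(2, max_n + 1):
        for candidate in (float(n), 1.0 / n):
            if abs(math.log(observed / candidate)) <= tol:
                return candidate
    return None


def reversion_day(closes, t, lookahead=5, tol=0.10):
    """First day after the break at t where the price is back near the
    pre-break close (spike behavior), or None if it never returns."""
    for k in range(t + 1, min(len(closes), t + 1 + lookahead)):
        if abs(math.log(closes[k] / closes[t - 1])) <= tol:
            return k
    return None


def unrecorded_splits(closes, confirmed_days):
    """Map day -> implied split multiplier for clean, persistent breaks that
    no confirmed split record explains."""
    found, t = {}, 1
    while t < len(closes):
        ratio = None if t in confirmed_days else implied_split_ratio(closes[t - 1], closes[t])
        if ratio is not None:
            k = reversion_day(closes, t)
            if k is None:
                found[t] = ratio  # persistent clean break: treat as a missing split
            else:
                t = k  # spike that reverted: skip its rebound day too
        t += 1
    return found


closes = [100.0, 101.0, 50.4, 50.9, 51.0]
confirmed = confirm_split_day(closes, recorded_day=4, ratio=2.0)  # 2: price halved on day 2
bogus = confirm_split_day(closes, recorded_day=4, ratio=3.0)  # None: record dropped
```

Skipping past the rebound day matters: without it, the recovery leg of a one-day spike would itself look like a clean 1/2 break. A genuine crash that halves a price and never recovers would still be flagged, which is the trade-off any ratio-based rule accepts.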
Backfill (estimated) tickers
Tickers ending in X, such as SP500X, GOLDX, or AGGX, are separate estimated series that extend an asset class back to before its cheapest real fund existed. Each one uses the proxy index's monthly return minus a pro-rated expense ratio, then chain-links onto the real fund's returns once the fund exists. Real fund tickers are never modified; estimated months are flagged, and the UI shows where real data begins.
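A sketch of how an X ticker could be assembled, assuming both the proxy index and the real fund are available as {(year, month): return} maps and that "pro-rated" means one twelfth of the annual expense ratio per month. The function name, the toy returns, and the 0.09% fee are hypothetical.

```python
def build_backfill(proxy, fund, annual_expense_ratio):
    """Return [(month, return, is_estimated)] for an X-suffixed series.

    Months before the fund's first real month use the proxy return minus a
    monthly fee; from then on the fund's own returns are copied unchanged.
    Neither input dict is mutated, so the real fund's series stays intact.
    """
    first_real = min(fund)
    monthly_fee = annual_expense_ratio / 12
    estimated = [
        (m, r - monthly_fee, True) for m, r in sorted(proxy.items()) if m < first_real
    ]
    real = [(m, r, False) for m, r in sorted(fund.items())]
    return estimated + real


# Toy data: the proxy overlaps the fund's first month, but the fund wins there.
proxy = {(1992, 11): 0.010, (1992, 12): 0.012, (1993, 1): 0.007}
fund = {(1993, 1): 0.006, (1993, 2): 0.011}
series = build_backfill(proxy, fund, annual_expense_ratio=0.0009)
first_real = next(m for m, _, est in series if not est)  # where the UI marks real data
```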
Statistics
- CAGR is the compound annual growth rate over the selected window: the constant yearly rate that turns the starting value into the ending value.
- Volatility is the standard deviation of monthly total returns, annualized by multiplying by √12.
- Max drawdown is the largest peak-to-trough decline of the total-return series.
- Sharpe and Sortino ratios divide the excess return over the T-bill rate by total volatility and downside volatility, respectively.
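The four statistics can be sketched from a list of monthly total returns. Several conventions here are assumptions, not confirmed by this page: sample (n − 1) standard deviation, arithmetic mean excess return times 12, downside deviation with a zero target, drawdown measured on month-end values only (which can understate intramonth troughs), and converting TB3MS, an annualized percentage, to a monthly rate by compounding (dividing by 1200 is an equally common choice).

```python
import math
import statistics


def tb3ms_to_monthly(annual_pct):
    """Convert an annualized TB3MS reading in percent (e.g. 5.0) to a monthly rate."""
    return (1 + annual_pct / 100) ** (1 / 12) - 1


def cagr(monthly):
    """Compound annual growth rate over the window covered by `monthly`."""
    growth = math.prod(1 + r for r in monthly)
    return growth ** (12 / len(monthly)) - 1


def annualized_volatility(monthly):
    """Sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return statistics.stdev(monthly) * math.sqrt(12)


def max_drawdown(monthly):
    """Largest peak-to-trough decline of the compounded series (negative)."""
    level = peak = 1.0
    worst = 0.0
    for r in monthly:
        level *= 1 + r
        peak = max(peak, level)
        worst = min(worst, level / peak - 1)
    return worst


def sharpe_sortino(monthly, rf_monthly):
    """Annualized Sharpe and Sortino ratios of returns in excess of the T-bill rate."""
    excess = [r - f for r, f in zip(monthly, rf_monthly)]
    annual_excess = statistics.mean(excess) * 12
    total_vol = statistics.stdev(excess) * math.sqrt(12)
    # Downside deviation: root-mean-square of negative excess returns only
    # (target 0), averaged over all months, then annualized.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess)) * math.sqrt(12)
    sortino = annual_excess / downside if downside > 0 else math.inf
    return annual_excess / total_vol, sortino


returns = [0.02, -0.01, 0.03, 0.0]
rf = [tb3ms_to_monthly(0.0)] * len(returns)
sharpe, sortino = sharpe_sortino(returns, rf)  # Sortino is higher: only one month is negative
```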
Limitations
- Hypothetical results exclude trading costs, spreads, and taxes. Backfilled months apply an expense-ratio haircut but no tracking error; real funds use actual market prices, so their fees and tracking error are already reflected.
- The WIKI-era stock universe is community-maintained and carries some survivorship bias; the Dolt era includes delisted tickers.
- Mutual funds are not covered (the sources are exchange-traded securities only).
- The most recent day or two may lag the live market, matching the upstream feed's cadence.
Frequently asked questions
Where does the price data come from?
US stock data splices the public-domain Nasdaq/Quandl WIKI dataset (through March 2018) with the DoltHub post-no-preference/stocks dataset (2011 to present, refreshed daily). ETF history extends to each fund's inception via Kaggle ETF datasets, and the risk-free rate is the 3-month US Treasury bill from FRED.
Do the returns include dividends?
Yes. All results use total returns: every dividend and distribution is assumed reinvested on its ex-date, and prices are fully split-adjusted.
What are the backfill tickers like SP500X and GOLDX?
They are separate, clearly labeled estimated series that extend an asset class before the real fund existed (the S&P 500 to 1871, gold to 1968, US bonds to 1953) using academic index data, with an expense-ratio haircut applied. Real fund tickers only ever show their real history; you opt into a backfill explicitly.
How often is the data updated?
Daily. A refresh worker pulls the newest end-of-day rows from the upstream dataset each day, recomputes the trailing month, and republishes. The upstream feed itself typically runs one or two days behind the live market.
Nothing on this site is investment advice; see the disclosures in the footer. Attribution for CC BY-SA sources appears in the footer of every page.