A Reader Said 'Hold 13F Picks for Years, Not Weeks.' We Tested It — and Nearly Fooled Ourselves.

June 30, 2026 · FindataFox

The sharpest pushback on our backtests was a good one: "the 45-day filing lag only kills a copy-it-next-week trade. The signal that holds up is manager conviction over years, not a quarter." Fair. So we held the most patiently-owned institutional names for one to four years. They still trailed the S&P at every horizon — and the one result that looked like a win turned out to be a statistical illusion we had to catch in our own output.

TL;DR

A reader on one of our earlier studies made the best argument against our null results: we had only tested 1- and 2-quarter holds, and of course those fail, because by the time a 13F is public a fast trade is already stale. But a position a manager intends to hold for years doesn't care about a 45-day lag. Treat the filing as a conviction signal over years, and it should survive.

It's a genuinely good point, so we tested it properly: take the stocks held longest by their institutional base (the literal "conviction over years" signal), buy the top decile at the 45-day-lag price, and hold for 1, 4, 8, 12, or 16 quarters — up to four years.
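To make the setup concrete, here is a minimal sketch of the two mechanical steps: rank names by a conviction score and keep the top decile, then date the purchase 45 days after quarter end. The score values, the `T00`-style tickers, and the function names are hypothetical placeholders, not our production tool, and a fixed 45-day offset is a simplification of when filings actually become public.

```python
from datetime import date, timedelta

FILING_LAG_DAYS = 45  # 13F filings are due within 45 days of quarter end


def entry_date(quarter_end: date) -> date:
    """Assumed buy date: quarter end plus the full 45-day filing lag."""
    return quarter_end + timedelta(days=FILING_LAG_DAYS)


def top_decile(scores: dict[str, float]) -> list[str]:
    """Return the 10% of tickers with the highest score (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = max(1, len(ranked) // 10)
    return ranked[:n]


# Hypothetical conviction scores, e.g. average quarters each name has been held.
scores = {f"T{i:02d}": float(i) for i in range(20)}
basket = top_decile(scores)              # the two highest-scoring tickers
buy_on = entry_date(date(2020, 3, 31))   # holdings as of Q1 2020
```

Each basket is then held for 1, 4, 8, 12, or 16 quarters from `buy_on`, and a new basket is formed every quarter.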

Hold	Annualized return	S&P 500	Factor-adjusted α	Honest t-stat
1 quarter	13.7%	15.1%	+6.9%	1.8
1 year	12.6%	15.1%	+5.7%	1.6
2 years	10.6%	14.4%	+2.9%	0.7
3 years	9.3%	13.5%	+6.3%	1.0
4 years	9.1%	13.1%	+3.9%	0.4

Every single horizon underperformed the S&P — and holding longer returned less, not more. Nothing is statistically significant. The reader's hypothesis is reasonable, and it doesn't hold up.

But the interesting part is the column that isn't in that table — the one that almost fooled us.

The trap: how we nearly announced a win

Our raw output for the 1-year hold showed a factor-adjusted alpha of +5.7% with a t-statistic of 3.26 — and the 3-year hold showed t=3.54. A t above 2 is the conventional "this is real, not luck" threshold. For a moment it looked like we'd finally found the thing: hold institutional favorites for a year and you generate significant alpha.

That would have been wrong, and here's why.

When you test a 1-year hold but start a new test every quarter, consecutive tests overlap by nine months. The 1-year return starting in March and the one starting in June share three-quarters of the same data. They are not independent observations — but the standard t-stat math assumes they are. So it counts the same returns over and over and reports false confidence. The longer the hold, the worse the overlap: a 4-year hold started each quarter overlaps the next by 3.75 years.
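The effect is easy to reproduce on pure noise. The sketch below is an illustration under stated assumptions, not our backtest: it draws 52 quarters of zero-mean Gaussian returns (the 8% quarterly volatility, trial count, and function names are all arbitrary choices), builds a new multi-quarter window every quarter, and counts how often the naive t-stat exceeds 2 in absolute value even though there is no signal by construction.

```python
import math
import random
import statistics


def naive_t(xs):
    """One-sample t-stat for mean != 0, assuming independent observations."""
    return statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def overlapping_sums(quarterly, hold):
    """Rolling `hold`-quarter returns, one window started every quarter."""
    return [sum(quarterly[i:i + hold]) for i in range(len(quarterly) - hold + 1)]


def share_abs_t_above_2(hold, n_quarters=52, trials=2000, seed=0):
    """Fraction of pure-noise histories whose naive |t| exceeds 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        quarterly = [rng.gauss(0.0, 0.08) for _ in range(n_quarters)]
        if abs(naive_t(overlapping_sums(quarterly, hold))) > 2:
            hits += 1
    return hits / trials


false_positive_1q = share_abs_t_above_2(hold=1)    # no overlap
false_positive_4q = share_abs_t_above_2(hold=4)    # 1-year holds
false_positive_12q = share_abs_t_above_2(hold=12)  # 3-year holds
```

With no overlap the false-positive rate should sit near the nominal 5%. It should climb as the hold lengthens, which is the same inflation that showed up in our raw output.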

The honest correction is to deflate the t by roughly the square root of the hold length measured in quarters, which is approximately the factor by which overlap inflates it: a 4-quarter hold is divided by 2, a 16-quarter hold by 4. Do that, and:

1-year: 3.26 → 1.6
3-year: 3.54 → 1.0
4-year: 1.44 → 0.4

Nothing clears 2. The eye-catching significance was autocorrelation, not signal. With 13 years of data, a non-overlapping 3-year hold gives you only four independent windows — you simply cannot establish significance from four data points, no matter what the naive math prints.
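The arithmetic behind those corrected figures is short enough to check by hand. This sketch applies the square-root rule of thumb to the naive t-stats quoted above and counts the back-to-back windows that fit in the sample. The function names are illustrative, and the rule is a rough approximation rather than a formal standard-error estimate.

```python
import math


def deflate_t(naive_t: float, hold_quarters: int) -> float:
    """Rule-of-thumb overlap correction: naive t divided by sqrt(hold in quarters)."""
    return naive_t / math.sqrt(hold_quarters)


def independent_windows(sample_years: int, hold_years: int) -> int:
    """Number of back-to-back, non-overlapping holds that fit in the sample."""
    return sample_years // hold_years


# Naive t-stats quoted in the article, keyed by hold length in quarters.
naive = {4: 3.26, 12: 3.54, 16: 1.44}
corrected = {h: round(deflate_t(t, h), 1) for h, t in naive.items()}
windows_3y = independent_windows(13, 3)

print(corrected)   # {4: 1.6, 12: 1.0, 16: 0.4}
print(windows_3y)  # 4
```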

We're telling you this because it's exactly the trap that produces the "edges" other sites sell. Run a long-hold backtest with overlapping windows, don't correct for it, and the software will happily hand you a t-stat of 3 on a strategy that's actually noise. We caught it in our own output and fixed the tool. That's what testing in the open is supposed to look like.

"Positive alpha" while losing to the market — the low-beta mirage

There's a second subtlety. Even uncorrected, how can a basket show positive factor-adjusted alpha (+5.7%) while underperforming the S&P (12.6% vs 15.1%)?

Because patiently-held names are lower-beta — they swing less than the market. The factor model rewards "did well for how much risk you took," and on that basis the basket looks fine. But you don't spend risk-adjusted returns; you spend dollars, and in dollars it trailed the index. A positive factor alpha that still loses to the market is not an edge — it's a low-risk portfolio doing what low-risk portfolios do.
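A single-factor version of the calculation shows how the two results coexist. This is a simplified sketch, not the multi-factor model behind our +5.7% figure, so it will not reproduce that number. The risk-free rate and the beta below are illustrative assumptions; only the 12.6% and 15.1% returns come from the table.

```python
def single_factor_alpha(portfolio_ret: float, market_ret: float,
                        risk_free: float, beta: float) -> float:
    """Alpha = portfolio return minus the return its market exposure explains."""
    expected = risk_free + beta * (market_ret - risk_free)
    return portfolio_ret - expected


# 12.6% and 15.1% are the 1-year basket and S&P 500 figures from the table.
# risk_free=2% and beta=0.7 are assumed for illustration, not study outputs.
alpha = single_factor_alpha(portfolio_ret=0.126, market_ret=0.151,
                            risk_free=0.02, beta=0.7)
trails_market = 0.126 < 0.151

print(f"alpha = {alpha:.2%}, trails market: {trails_market}")
```

Under these assumptions a beta of 0.7 only "expects" about 11.2% from the basket, so 12.6% registers as positive alpha even though it is 2.5 points behind the index in plain returns.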

What it means

The 45-day lag isn't the only problem — there just isn't a tradeable edge at any horizon. Holding longer makes the lag stop hurting, but "robust to the lag" was never the same as "has skill." The patient names earn their long-run, mostly-mega-cap factor return: roughly the market, minus a bit, at lower risk.
The reader's instinct — that the stable core matters more than the quarterly churn — is probably right. It just doesn't translate into beating the market by buying it 45 days late.
This closes the last open door from our backtest scoreboard: short holds failed, and now long holds fail too. The public, lagged 13F signal has no tradeable edge at one quarter or at four years.

The honest caveats

13F is long U.S. equity only — no shorts, options, leverage, or non-US. A real long-horizon edge could live outside what we can see.
The signal is the stock's institutional base, not a single famous fund. "Held longest by institutions overall" is a structural property, tested as a basket — not a bet on one manager's conviction.
Long-hold significance is fundamentally hard. Thirteen years of data contain only three non-overlapping four-year windows, far too few to significance-test a four-year hold; we report the limitation rather than paper over it.
Research and education, not investment advice. Past performance does not predict future results.

Part of our 13F research series: we backtested the popular strategies — none beat the market · skill doesn't persist · the famous pickers are closet indexers · beating the S&P isn't an edge. NOT investment advice.