Goldman's Tail Hedging Analysis Is Right About the Direction. The Hard Part Starts After the Model.
Goldman's framing is directionally right. The implementation stack that sits below the model is where programs actually live or die.
Goldman Sachs Asset Management published "From Defense to Offense: Finding the True Value of Tail-Risk Hedging" in January. The paper makes the case that standalone tail hedging adds almost nothing: 0.8 basis points annually, even at 99% reliability. Where the value shows up, they argue, is in what hedging permits. A 50% reliable strategy with modest alpha enables roughly 30 bps of incremental compounding through higher equity allocation. Most readers skim past that.
The directional claim is right. Tail hedging is a permission structure. It lets you stay long through drawdowns instead of panic-selling at the bottom. We are glad to have reached the same conclusion in both our fundamental research and our psychological philosophy.
The paper earns credit for the things it does well: the two-regime framing of defense versus offense, the model-based demonstration that tail hedging has almost no standalone value, the explicit utility-based sizing framework, and a useful benchmarking lens for comparing strategies and managers. Those are useful conceptual anchors for a conversation that too often gets stuck arguing about cost alone.
The paper models hedging at the conceptual level using two parameters, reliability and alpha. That is a reasonable scope choice for a framework piece, especially because Goldman is discussing a broad set of tail-risk approaches rather than just options overlays. But for option-based tail hedging programs, it leaves five operational decisions - the ones that actually determine whether a real program survives - downstream of the analysis rather than inside it.
Five implementation variables for option-based hedges
Figure: The implementation stack beneath the reliability/alpha model. Reliability emerges from strike, entry, monetization, rehedge, and reinvestment decisions working together as an operating stack.
Cost. In one illustrative case, Goldman sizes the optimal hedge at about 1.3% of portfolio risk for a 50% reliable, zero-alpha strategy. Premium cost is not modeled explicitly; it is absorbed into the alpha parameter. The CBOE PPUT Index, which is passive monthly put-buying and about as naive as it gets, has underperformed the S&P by roughly 4-5% annualized over 30 years. AQR showed similar results: systematic put-buying runs around -6.4% annualized (Ilmanen, Thapar, Tummala & Villalon, 2020). Israelov's "Pathetic Protection" went further: protective puts are "quite ineffective at reducing drawdowns versus the simple alternative of statically reducing exposure."
Verdad Capital reached the same wall from a different direction. After testing passive put-buying across OptionMetrics data since 1996, they concluded: "we were unable to identify a simple options-based approach that both protected against black swan events and did so at an acceptably low cost." Their caveat is worth reading in full, because it names the gap directly: "achieving this goal requires either a complexity of strategy or a robust active approach that goes beyond our basic quantitative efforts."
That's the passive baseline, and three independent research teams arrived there from different angles. Programs with real entry discipline, sizing up when protection is cheap and scaling back when it's expensive, operate at a fraction of that drag. The spread between naive and intelligent implementation is large, yet the model folds all of it into a single alpha parameter. In practice, cost is the variable that ends hedge programs before they ever have the chance to pay off.
Strike selection. Choosing between 15% out-of-the-money (OTM) puts and 35% OTM puts is a choice between two completely different strategies. One responds to corrections at maybe 2-3% annual drag. The other is catastrophe insurance at a fraction of that cost. The payout frequencies, the management requirements, and the way the total portfolio actually feels to the investor all differ.
Figure: Different drawdown shapes create different hedge outcomes. The same hedge structure can look brilliant in a fast crash and useless in a slow grind. That is a shape problem before it is a strike problem.
Figure: 15% OTM and 35% OTM are different programs. The deductible determines which class of event you insure and how much carry you agree to tolerate while waiting.
The same Verdad study showed that deeper OTM strikes were more efficient in sudden crises like COVID, because the primary payoff driver is implied vol convexity, not intrinsic value. Most allocators who say tail hedging is "too expensive" are quoting near-the-money protection on passive strategies and applying that cost to a problem that only calls for catastrophic coverage. Goldman's framework doesn't specify a strike for option-based hedges.
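To make the "vol convexity, not intrinsic value" point concrete, here is a minimal sketch that reprices two puts through a stylized crash using the textbook Black-Scholes formula. It is not Verdad's methodology and not a reconstruction of actual 2020 prices. The implied vols, the 20% one-month drop, and the six-month tenor are all hypothetical inputs chosen for illustration.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_put(spot: float, strike: float, vol: float, t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European put (no dividends)."""
    if t_years <= 0.0 or vol <= 0.0:
        return max(strike - spot, 0.0)
    sd = vol * math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t_years) / sd
    d2 = d1 - sd
    return strike * math.exp(-rate * t_years) * norm_cdf(-d2) - spot * norm_cdf(-d1)

def crash_repricing(otm, vol_before, vol_after, drop=0.20, t_before=0.5, t_after=5 / 12):
    """Return (cost before crash, value after crash, intrinsic value after crash).

    Spot starts at 100. All scenario inputs are hypothetical.
    """
    strike = 100.0 * (1.0 - otm)
    spot_after = 100.0 * (1.0 - drop)
    cost = bs_put(100.0, strike, vol_before, t_before)
    value = bs_put(spot_after, strike, vol_after, t_after)
    intrinsic = max(strike - spot_after, 0.0)
    return cost, value, intrinsic

# Hypothetical implied vols (with skew) before and after a 20% drop in one month.
near = crash_repricing(otm=0.15, vol_before=0.25, vol_after=0.60)
deep = crash_repricing(otm=0.35, vol_before=0.35, vol_after=0.70)

for label, (cost, value, intrinsic) in (("15% OTM", near), ("35% OTM", deep)):
    print(f"{label}: cost {cost:.2f}, value {value:.2f}, "
          f"intrinsic {intrinsic:.2f}, multiple {value / cost:.1f}x")
```

Under these assumed inputs the 35% OTM put is still out of the money after the drop, so its intrinsic value is zero, yet it reprices at a higher multiple of its cost than the 15% OTM put. The whole gain comes from the vol spike and the move toward the strike. Different vol assumptions will change the numbers, which is why the scenario should be read as a shape, not a forecast.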
Monetization. In our experience, monetization is the single most consequential operational decision for option-based programs - and the one the reliability/alpha framing collapses most aggressively. The model treats payoff as a function of reliability. In practice, payoff is a function of whether someone pulls the trigger in a compressed window. In real markets, nobody delivers the payoff to you. You take it or you don't, and either way that is a choice. It is your option.
The S&P fell 34% in 23 trading days in March 2020. Universa reported a 3,612% return on its tail-risk book, although this reporting has been debated.
Eight trading days after the bottom, the S&P had recovered 17%. By mid-April, up 28%. A hedge that peaked around March 16-23 and wasn't monetized inside that window gave back most of its value within weeks. Israelov & Nze Ndong (2023) studied this directly: the monetization window during COVID was compressed to days, not weeks, and the V-shaped recovery made timing the dominant variable in hedge value capture.
Figure: The payoff spike is perishable. Own the hedge and delay the monetization decision long enough, and the market will hand back a meaningful share of the convexity for you.
Sell too early and you clip the tail you're paying to capture. Sell too late and the V-shaped recovery eats your hedge P&L. Don't sell at all and you're not running a hedge program, you're buying lottery tickets and letting them expire. Bhansali, Chang, Holdom & Rappaport showed in "Monetization Matters" (2020) that simple rules-based monetization, selling at pre-defined price multiples of initial cost, significantly improved hedge program performance versus hold-to-expiry using actual March 2020 data. Their conclusion: actively managed tail-risk strategies can result in significant increases in efficacy.
A monetization system needs a ruleset that specifies when to take profits, in what tranches, and on what triggers. We walked through specific rulesets in a separate note. It is the most consequential operational decision in a tail hedge program, and Goldman's framework either doesn't address it or hides it behind the alpha and reliability parameters. It's also the clearest example of why Verdad's "simple, passive approach" hits a ceiling. A passive strategy has no monetization framework. It holds to expiry or it doesn't. The gap between that and a managed program with real exit discipline is where most of the reliability range lives.
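As a minimal sketch of what a rules-based monetization schedule of the kind described above might look like, the code below sells fixed tranches of the original position whenever the hedge's value crosses pre-defined multiples of its initial cost. The `Tranche` class, the trigger levels, the tranche sizes, and the value path are hypothetical. They are not the rules tested in "Monetization Matters" and not a recommended ruleset. The sketch also assumes each sale fills at the observed value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tranche:
    trigger_multiple: float  # sell once hedge value >= this multiple of initial cost
    fraction: float          # fraction of the ORIGINAL position to sell

def monetize(value_multiples, tranches):
    """Walk a path of hedge values (in multiples of initial cost).

    Returns (realized, remaining): proceeds in units of initial cost and the
    fraction of the position still held at the end of the path.
    """
    pending = sorted(tranches, key=lambda t: t.trigger_multiple)
    realized, remaining = 0.0, 1.0
    for multiple in value_multiples:
        # Several triggers can fire on the same observation if the value gaps up.
        while pending and multiple >= pending[0].trigger_multiple:
            tranche = pending.pop(0)
            sold = min(tranche.fraction, remaining)
            realized += sold * multiple
            remaining -= sold
    return realized, remaining

# Hypothetical path: the hedge spikes to 12x its cost, then decays to zero at expiry.
path = [1.0, 2.5, 6.0, 12.0, 7.0, 2.0, 0.0]
rules = [Tranche(3.0, 0.25), Tranche(5.0, 0.25), Tranche(10.0, 0.5)]

realized, remaining = monetize(path, rules)
hold_to_expiry = path[-1]                      # passive holder collects the final value
managed_total = realized + remaining * path[-1]
print(f"managed: {managed_total:.2f}x cost, hold-to-expiry: {hold_to_expiry:.2f}x cost")
```

On this made-up path the passive holder ends with nothing while the rules capture part of the spike. The design choice that matters is that the triggers are fixed before the event, so the decision in the compressed window is execution, not deliberation.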
Reinvestment. Goldman frames the value of tail hedging as enabling a higher static equity allocation: hold more beta across the cycle. But the bigger mechanism is what you do with the proceeds once you've monetized. A program that cashes out during a crash generates liquidity at exactly the moment equity prices are depressed. Reinvesting those proceeds, buying cheap equity with hedge profits, is counter-cyclical capital deployment that compounds through the recovery. That turns the hedge into a compounding engine. Bhansali & Davis formalized this at PIMCO in "Offensive Risk Management" (2010), showing that the "shadow value" of a tail hedge program, the optionality to deploy capital at distressed prices, can exceed the direct hedge payoff. Goldman's paper stops at the risk-budget framing; the reinvestment mechanism sits outside its scope. And the mechanical benefit of reinvestment, buying at depressed prices rather than being forced to sell, doesn't even begin to describe the emotional and psychological effects during a major drawdown, which we have covered in separate notes.
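The arithmetic of reinvestment can be sketched in a few lines. The example below compares three paths through a 34% drawdown followed by a full recovery to the prior high: unhedged, hedged with proceeds left in cash, and hedged with proceeds reinvested at the trough. The 34% figure comes from the March 2020 episode discussed above. The 1% premium and the 10x payoff multiple are hypothetical, and the sketch assumes the hedge is monetized exactly at the trough.

```python
def terminal_wealth(hedge_cost, hedge_multiple, drawdown, reinvest, start=100.0):
    """Wealth after a crash and a full recovery of equities to their pre-crash level."""
    equity = (start - hedge_cost) * (1.0 - drawdown)  # equity value at the trough
    cash = hedge_cost * hedge_multiple                # hedge monetized at the trough
    if reinvest:
        equity += cash                                # buy equity at depressed prices
        cash = 0.0
    recovery = 1.0 / (1.0 - drawdown)                 # gross return back to the old high
    return equity * recovery + cash

# 34% drawdown from the text; 1% premium and 10x payoff are hypothetical.
unhedged = terminal_wealth(0.0, 0.0, 0.34, reinvest=False)
held_cash = terminal_wealth(1.0, 10.0, 0.34, reinvest=False)
reinvested = terminal_wealth(1.0, 10.0, 0.34, reinvest=True)

print(f"unhedged {unhedged:.2f}, hedge held as cash {held_cash:.2f}, "
      f"hedge reinvested {reinvested:.2f}")
```

The gap between the last two numbers is the hedge proceeds multiplied by the recovery return, which is the piece a static risk-budget framing leaves out. In reality nobody monetizes exactly at the trough, so this is an upper bound on the effect for the assumed inputs.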
Rehedging. After you monetize, do you re-establish protection immediately in an elevated vol environment? Or wait for normalization and accept being unhedged through a potential second leg? The answer depends on regime and probability assessment, and it changes the character of the program entirely. Those decisions sit outside the reliability-and-alpha abstraction.
Governance
Governance is arguably the dominant determinant of program survival, and the most common failure mode we see in practice. We've laid out a governance checklist for options overlays specifically.
Goldman's sizing framework uses CRRA (constant relative risk aversion) utility. It assumes the investor has stable preferences and acts rationally across all market conditions. If they held all equities before, that reveals their risk tolerance, and the model optimizes from there.
Do you know of any investors like that? Didn't think so.
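For reference, the CRRA family named above has a standard textbook form, shown here with wealth $W$ and risk-aversion coefficient $\gamma$. The symbols are ours and the paper's exact parameterization is not reproduced:

```latex
U(W) =
\begin{cases}
\dfrac{W^{1-\gamma}}{1-\gamma}, & \gamma > 0,\ \gamma \neq 1, \\[6pt]
\ln W, & \gamma = 1,
\end{cases}
\qquad
-\frac{W\,U''(W)}{U'(W)} = \gamma .
```

The second expression is the point at issue. Relative risk aversion is the same constant $\gamma$ at every wealth level and in every market state, which is precisely the stability that real investment committees tend not to exhibit.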
In August 2017, CalPERS, the largest US public pension at over $400B under management, allocated to Universa Investments and LongTail Alpha as tail-risk hedges. The programs ran for two years during a bull market.
In October 2019, CalPERS terminated both programs. The hedges were a visible drag during a rally. The cost was small but it showed up negative in every quarterly review. The benefit, protection against something that hadn't happened, was invisible.
By January 2020 the positions were fully unwound. On February 19, the S&P peaked. Thirty-three calendar days later it had fallen 34%. CalPERS' estimated missed payout: over $1 billion.
The cost of the hedge was probably a handful of basis points. The cost of cancelling it was the largest missed windfall in public pension history.
And this is hardly unique. It follows the familiar failure pattern we've called the quiet tax. Allocate with conviction, bleed through quiet months, field uncomfortable questions in quarterly reviews, cancel the program, watch the tail event arrive. Benartzi and Thaler documented the underlying mechanism in 1993: myopic loss aversion. Evaluate a tail hedge on a quarterly horizon and it always looks like a waste. Evaluate it over a full cycle and it transforms the portfolio. Most governance structures evaluate quarterly. Gneezy & Potters (1997) confirmed it experimentally: subjects who evaluated their portfolios more frequently took less risk and earned lower returns, even when the underlying opportunities were identical.
Goyal & Wahal (2008) showed the same pattern at institutional scale: across 3,400 plan sponsor decisions, the managers that sponsors fired subsequently outperformed the managers they hired. Committees systematically buy high and sell low. The pressure to act on recent performance is structural, not a character flaw.
Goldman's model can tell you the optimal hedge allocation. It says much less about how to keep the program in place through a raging bull market and exuberant investor appetite, which is probably exactly when one needs it. In our experience, governance is often the primary failure mode, more so than the math, strike selection, or headline cost. That's why, in our view, a deep understanding of the strategy, with all its nuances, is essential to running it successfully over the long term.
Reliability
Goldman's central metric, "reliability," is defined as the correlation between a strategy's returns and those of an ideal hedge. They cite PPUT at 40% and their own blended strategies at 20-70%.
That's a wide range. But what actually drives it? Goldman doesn't unpack the implementation choices behind that range here. Verdad's work gives a useful floor for options-based programs: passive put-buying, the simplest possible implementation, couldn't simultaneously protect against tail events and keep costs survivable. They tried. Their conclusion was that you need either structural complexity or active management to get there. For options-based hedges, that reads like the low end of the range. The high end likely reflects a materially different operating stack, with stronger implementation choices and management.
In practice, reliability isn't a parameter you set. It reflects how well the five decisions above work together: cost discipline, strike selection, monetization framework, reinvestment logic, rehedge rules. Two programs with identical "50% reliability" can behave completely differently depending on how they got there. A move from 20% to 70% reliability often comes from shifting from a passive allocation toward a managed program with real implementation expertise.
The paper presents reliability as an input. In our view, it is more useful as a scoreboard for implementation outcomes.
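Reading reliability as a scoreboard suggests measuring it after the fact. The sketch below computes it as a Pearson correlation between a program's realized returns and an ideal hedge. The article gives only the correlation definition, so the construction of the ideal hedge here (a payoff equal to equity losses beyond a 5% deductible) is our assumption, and every return series is made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ideal_hedge(equity_returns, deductible=0.05):
    """Assumed ideal hedge: pays equity losses beyond the deductible, zero otherwise."""
    return [max(0.0, -r - deductible) for r in equity_returns]

# Hypothetical monthly returns: an equity index and two hedge programs.
equity = [0.02, 0.01, -0.03, 0.04, -0.12, -0.20, 0.09, 0.03]
managed = [-0.002, -0.002, -0.001, -0.002, 0.05, 0.12, -0.004, -0.003]
passive = [-0.004, -0.004, 0.002, -0.004, 0.03, 0.01, -0.01, -0.004]

target = ideal_hedge(equity)
reliability_managed = pearson(managed, target)
reliability_passive = pearson(passive, target)
print(f"managed {reliability_managed:.2f}, passive {reliability_passive:.2f}")
```

In this toy data the passive series pays early and then gives back its gains before the worst month, so it scores lower even though both programs carry similar drag in quiet months. The score is an output of monetization and rehedge behavior, not a dial set in advance.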
Where that leaves us
Goldman's thesis is correct and it's what we've seen in our research: tail hedging enables risk-taking. That framing is valuable and worth reading regardless of whether you agree with the implementation details.
But a framework that treats the hedge as a black box with two dials misses the part that determines whether the program actually works. The implementation layer, how you manage cost, when you monetize, what you do with the proceeds, how you govern the program through the quiet periods when everyone wants to cancel it, is where tail hedge programs live or die.
The philosophy is public. The engineering details are much less so. Maybe that is by design?
Which raises a question worth sitting with: if the implementation is what matters and the implementation is opaque, what exactly is an allocator supposed to do with a paper that only models the philosophy?
If you're evaluating or running a tail hedge program and these questions sound familiar, that's what we do.
Philosophical note for veriolab.com. Educational only. Not investment advice. Verio Labs provides modeling, analytics, and evaluation. We do not manage assets or give trade recommendations. See our Disclosures.