
2 January 2026 at 12:26:36 pm

Markouts: What Enterprise AI Must Steal From Quants

In quantitative trading, you don't wait for production P&L to know if your system works.


You use markouts.


A markout is a fast proxy for whether your system makes good trades. For each trade, you measure how the price moved immediately afterward at different time horizons: 1 second, 10 seconds, 1 minute, 10 minutes, etc. Consistently negative markouts mean your P&L will bleed.
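The calculation itself is trivial; the value is in the infrastructure around it. A minimal sketch in Python, with illustrative field names and a made-up `price_at` lookup:

```python
# Sketch of a markout calculation. Assumes each trade is a dict with a
# timestamp, fill price, and side, and that price_at(ts) returns the
# market price at a given timestamp (all names here are illustrative).

HORIZONS = [1, 10, 60, 600]  # seconds after the trade

def markout(trade, price_at, horizon):
    """Signed price move `horizon` seconds after the trade.

    Positive means the market moved in the trade's favour:
    up after a buy, down after a sell.
    """
    future = price_at(trade["ts"] + horizon)
    move = future - trade["price"]
    return move if trade["side"] == "buy" else -move

def average_markouts(trades, price_at):
    """Mean markout per horizon across all trades."""
    return {
        h: sum(markout(t, price_at, h) for t in trades) / len(trades)
        for h in HORIZONS
    }
```

A per-horizon curve like this, sliced by strategy or venue, is the diagnostic: where along the time axis it turns negative tells you which part of the system is broken.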


Markouts also help you diagnose your strategy in ways P&L alone can't. Bad markouts from the very first second? Your alpha is garbage or you're getting picked off by someone faster. Good markouts at 1 second but shit by 10? Your alpha decays too fast. Negative P&L just tells you something's wrong. Markouts tell you exactly where to look.


Quant trading is one of the few domains where autonomous systems actually work, and the feedback infrastructure is what makes it possible: layered mechanisms (feature correlations, markouts, backtests, paper trading) that let you iterate fast before terminal ground truth arrives. You don't have to wait for production P&L to tell you your trading system is fucked.


Enterprise workflows have nothing like this.


Why Enterprise AI Can't Learn


Three years in, most enterprises still can't tell you whether their AI works.


First, there's an architectural reason. Trading firms build tech and ops end-to-end, controlling every layer and instrumenting down to the nanosecond. Enterprise runs on fragmented systems that were never designed to trace a decision from input to outcome. CRM doesn't talk to contract management doesn't talk to support doesn't talk to revenue. There's barely any backprop of what works and what doesn't.


Take something as simple as using AI for writing content. Most companies run the same loop: AI drafts, humans edit and post. No signal flows back to the AI. Did that blog post convert? Did that email sequence outperform the last one? The AI never learns because nobody closes the loop.


There's also a deeper behavioral reason: markouts answer questions people inside the organization might not want answered.


You can't hide as a quant because your P&L is your P&L. Your system either works or it doesn't, and when it doesn't, your markouts turn negative and your P&L bleeds. The feedback is immediate and loud.


Most business functions aren't like this. Judgment stays implicit and attribution stays fuzzy. People can claim their playbook works, their process adds value, their instincts matter, and it's pretty hard to prove otherwise.


Markouts fuck that up.


They tell you which playbooks actually move numbers. Which teams create value. Who's busy versus who actually affects close rates, cycle times, churn. They make it impossible to coast.


The VP of Sales with 20 years of instinct suddenly has a scoreboard, and he's doing worse than the weird email sequence an intern tested last quarter. The team that's always "swamped" gets exposed as producing half the output per hour of the team nobody notices.


That's scary. Not just for underperformers, but for everyone. Most people have never operated in an environment where their output gets evaluated with trading-level precision. That's what a "quant business" paradigm would look like: applying the same feedback loops that make autonomous trading work to enterprise operations. The rigor that quants accept as normal would terrify most business teams.


Most companies keep judgment implicit because clarity is politically expensive.


Most companies aren't failing at enterprise AI because they're incompetent. They're failing because their systems can't learn.


French Restaurants and FDEs


Palantir figured this out early with their Forward Deployed Engineer (FDE) model. The story goes that CEO Alex Karp asked CTO Shyam Sankar:


"Do you know why French restaurants are so good?"


The answer: the wait staff is part of the kitchen. They understand the food, the methodology, the technique. They're not just carrying plates; they're part of a feedback system that shapes what the kitchen does next. Karp wanted that for engineering, and that birthed the Forward Deployed Engineer.


FDEs sat in war rooms to understand what needed to be built, built it, and watched analysts actually use the software. They saw which data got used and which got ignored. They noticed when someone sighed at a recommendation and manually overrode it. When an analyst copied data into a side spreadsheet because they didn't trust the output. When a screen got glanced at and closed without action.


These micro-signals were the markout. Humans closing feedback loops by hand.


FDEs didn't just observe whether the software worked. Being embedded in the workflow meant they captured signals that made the software actually work. They were also hunting for anomalies: moments where an analyst's counter-intuitive action hinted at a workflow nobody had designed, or where an unexpected data source turned out to be the one that actually mattered. Manual data mining.


The FDE wasn't a glorified consultant. It was the cost of manufacturing ground truth in domains that don't produce it naturally.


You can't build this from the outside. You have to get inside the enterprise, sit in the workflows, watch the decisions, see what actually happens. But once you've built it for one client, what you learn transfers. The patterns in legal contracting at Company A inform what you instrument at Company B. The calibration that works for one sales team gives you a head start on the next.


Palantir proved that the FDE model works, and companies that win in enterprise AI will have to master the art of doing this thing that doesn't scale at scale.


Building Enterprise Markouts


Every enterprise thinks their context and workflows are unique. They're right, and that's the problem. You can't import best practices because the signals that matter vary by company, by team, by workflow. The metric that predicts closed deals at Company A might be noise at Company B.


Most companies respond by giving up on real measurement. They track vanity metrics that feel good but predict nothing. Reply rates that don't correlate with close rates. NPS that doesn't correlate with retention. Dashboards full of numbers nobody has validated.


An enterprise markout is a fast, instrumented signal that tells you whether a decision is working before the terminal outcome arrives.


Speed: Feedback at hours, days, weeks. Not just when the deal closes 6 months later. Otherwise you can't iterate.

Validation: You prove your fast signals predict terminal outcomes. Most companies skip this step.

Attribution: You trace outcomes back to decisions. Which model, which prompt, which context? Without this, you know something's broken but not what to fix.
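Attribution is mostly a logging discipline: every decision gets recorded with the config that produced it, so outcomes can be joined back later. A minimal sketch, with all field names invented for illustration:

```python
# Sketch of decision-level attribution: log each AI decision with the
# model/prompt/context that produced it, then group measured outcomes
# by any of those attributes. All names are illustrative.
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class Decision:
    model: str
    prompt_version: str
    context_hash: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)

log: dict[str, Decision] = {}
outcomes: dict[str, float] = {}  # decision id -> measured outcome

def record(decision: Decision) -> str:
    """Persist the decision so later outcomes can be traced to it."""
    log[decision.id] = decision
    return decision.id

def attribute(by: str) -> dict[str, list[float]]:
    """Group outcomes by a decision attribute, e.g. 'prompt_version',
    so you can see which variant actually moved the numbers."""
    grouped: dict[str, list[float]] = {}
    for did, value in outcomes.items():
        key = getattr(log[did], by)
        grouped.setdefault(key, []).append(value)
    return grouped
```

The point isn't this particular schema. It's that without some join key from outcome back to decision, "something's broken" is the most precise statement you can ever make.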


In quant, you don't use a signal until you've proven it predicts returns. You test thousands of candidates. 99% are noise. You throw them away. Sales teams often do the opposite: they track dozens of metrics and assume they matter. Opens, clicks, replies, meetings booked. Nobody checks if any of it actually predicts closed deals.


So how should you build markouts for sales? Take a year of data and run the correlations. Does "replied within 24 hours" predict closed deals? Does "multiple stakeholders attended demo"? Most of what you track is noise. A few signals actually correlate with outcomes.
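That validation step is mechanical once the historical data exists. A hedged sketch of the idea, with made-up deal fields and a deliberately naive correlation threshold:

```python
# Sketch: test which tracked signals actually predict closed deals.
# `deals` is illustrative historical data; each row holds candidate
# signals plus the terminal outcome. The 0.3 cutoff is arbitrary.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def validated_signals(deals, candidates, threshold=0.3):
    """Keep only candidates whose correlation with the terminal outcome
    clears the threshold; everything else is treated as noise."""
    outcome = [d["closed_won"] for d in deals]
    kept = {}
    for c in candidates:
        r = pearson([d[c] for d in deals], outcome)
        if abs(r) >= threshold:
            kept[c] = r
    return kept
```

In practice you'd want significance tests and out-of-sample checks on top of raw correlation, but even this crude filter is more validation than most dashboards ever get.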


Layer them by time:

  • Hours: They didn't just reply. They forwarded it to three people you've never heard of. Two opened it within an hour. Your CRM shows none of this.

  • Days: A VP who wasn't on any call just visited your pricing page twice. Your champion's reply latency dropped from 2 hours to 20 minutes. The buying committee is expanding and you can't see it.

  • Weeks: The deal that closed had four stakeholders engaged by week two. The one that stalled had one for six weeks. The pipeline stage was identical.

  • Quarters: Won or lost. To whom. The real reason was implementation timeline, not price. But that's not what the dropdown said.


Each layer should predict the next. If "replied quickly" doesn't correlate with "meeting booked", and "meeting booked" doesn't correlate with "closed-won", you're tracking noise that feels like signal.
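The chain check above can be run mechanically: correlate each layer with the next and look for the weak link. A sketch under invented field names and an arbitrary threshold:

```python
# Sketch of the layer-chain check: each faster signal should predict
# the next slower one. Deal fields and the layer ordering here are
# illustrative, not a real schema.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

LAYERS = ["replied_quickly", "meeting_booked",
          "multi_stakeholder", "closed_won"]

def chain_health(deals, layers=LAYERS):
    """Correlation of each layer with the next. A weak link means the
    faster signal is noise dressed up as signal."""
    return {
        f"{a}->{b}": pearson([d[a] for d in deals],
                             [d[b] for d in deals])
        for a, b in zip(layers, layers[1:])
    }
```

If `replied_quickly->meeting_booked` comes back near zero, you stop trusting reply speed no matter how good it feels on a dashboard.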


Once the loop is closed, you find patterns humans would never design. Your highest-converting sequence sends the third follow-up at 6am on a Saturday. Deals where a second stakeholder engages in week one close at 3x the rate, but only if champion response latency stays under four hours. No sales playbook contains that. The patterns emerge because the infrastructure exists to capture them.


This is why you can't spec markout infrastructure from outside. You have to be embedded inside the business, watching real workflows work and break, to learn what signals matter in that specific context.


That's why FDEs come first. The infrastructure is what you encode after you've learned what works. And once you've built it, it compounds.


Move 37 for Enterprise


AlphaGo's Move 37 violated 3,000 years of Go wisdom. No human would have played it. But that dubious-looking move won the game, and the reason AlphaGo could come up with it was millions of self-play games with a clear feedback signal: win or lose. Strong feedback infrastructure lets you explore moves beyond human intuition.


Same pattern in quant trading. XTX, RenTec, and HRT became giants by testing thousands of hypotheses systematically, finding what works statistically, and deploying it even when they couldn't explain why. You don't need to know why some high-dimensional relationship exists to profit from it. You just need the infrastructure to detect it and double down. The "why it works" matters less than "does it work".


Some of the most profitable trading signals aren't human-interpretable at all. They don't need to be. The feedback infrastructure lets you mine signal from data at scale, regardless of whether it fits existing mental models.


This doesn't mean uncontrolled. It means counter-intuitive. You still see everything your systems do. But you're finding edges that competitors stuck on gut feeling will never see.


Enterprise is waiting for its Move 37 moment. But there's no infrastructure to find it. We're stuck with human-legible playbooks because there's no way to test alternatives at scale.


What if the resume signal that best predicts performance is response latency to the interview scheduling email? What if customers who open a support ticket in their first week churn less than customers who never contact support at all? What if the signal that best predicts loan default isn't credit score, but the timestamp of when the application was submitted?


These patterns only emerge when you can test at scale. And they're the kind of patterns that make VPs uncomfortable, because they imply that the playbook they've been running for a decade is leaving money on the table.


The Litmus Test


If you're building enterprise AI, ask yourself: can you take a decision your system made last week and show, quantitatively, whether it helped? If you can't, you have no markouts. You're hoping. And hope is a bad trading strategy.


Soon the gap between companies with markouts and companies without will look like the gap between Renaissance and a retail day trader.

Vihan Singh @ 2025
