
The GhostScore: a 100-point framework for smart contract risk.

How we built a scoring model from $4.3 billion in historical exploits, why governance gets the same weight as code security, and what happens when a project fails too many critical thresholds at once. Published benchmarks included.

01 / The Problem

Binary audits miss the shape of risk.

The conventional smart contract audit produces a binary verdict: pass or fail, with a list of findings sorted by severity. This served the industry when the failure mode was a single reentrancy exploit. It no longer reflects how projects actually collapse.

Between 2021 and early 2026, on-chain exploits drained over $4.3 billion from protocols and token holders. When we studied the post-mortems, a pattern emerged that binary audits structurally cannot capture: the majority of catastrophic losses involved governance failure, not code failure.

Of the exploit value we analysed, 67% traced back to compromised admin keys, single-signer treasuries, unaudited proxy upgrades pushed by privileged wallets, or opaque team structures where accountability was architecturally impossible. The code was often technically sound. The humans around it were not.

This insight shaped the GhostScore: a scoring model that treats governance as structurally equal to code security, that penalises critical failures non-linearly, and that produces a single composite number legible to traders, compliance officers, and AI systems alike.

The core thesis

A project with perfect code and terrible governance is not a secure project. It is a project where the exploit path runs through people rather than functions. The scoring model must capture both.

02 / The Architecture

Five pillars, twenty questions each.

The GhostScore decomposes smart contract risk into five orthogonal dimensions. Each pillar asks a battery of binary-outcome questions derived from the exploit corpus, and each pillar scores 0 to 20. The composite score is a weighted combination of all five on a 100-point scale, subject to the amplifier and the hard overrides described in sections 04 and 05.
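As a minimal sketch of how a battery of binary-outcome questions could roll up into a 0 to 20 pillar score: the one-point-per-question rule is an assumption (the text gives only the question count and the score range), and the function name and sample answers are hypothetical.

```python
from typing import Sequence

QUESTIONS_PER_PILLAR = 20  # from the "twenty questions each" framing


def pillar_score(answers: Sequence[bool]) -> int:
    """Return a 0-20 pillar score, assuming one point per passed question."""
    if len(answers) != QUESTIONS_PER_PILLAR:
        raise ValueError(
            f"expected {QUESTIONS_PER_PILLAR} answers, got {len(answers)}"
        )
    return sum(1 for passed in answers if passed)


# Hypothetical pillar: 17 questions passed, 3 failed.
example_answers = [True] * 17 + [False] * 3
print(pillar_score(example_answers))  # 17
```

The production model may weight questions unequally within a pillar; equal points are used here only to keep the example small.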

Pillar 01

Security

Code-level risk. Static analysis findings, known vulnerability patterns, bytecode verification status, audit history, proxy upgrade safety, dependency health. The traditional audit surface, necessary but not sufficient.

Highest weight band

Pillar 02

Team & Governance

Human-layer risk. Admin key configuration, multi-sig presence, time-lock controls, team transparency, deployer wallet history, governance structure. The pillar that catches what code audits cannot see.

Highest weight band

Pillar 03

Tokenomics

Economic design risk. Supply concentration, vesting schedules, fee structures, max-transaction limits, liquidity lock status, treasury controls, mint authority. The mechanics of value extraction.

Medium weight band

Pillar 04

Value

External validation signals. Liquidity depth, trading volume consistency, exchange listing breadth, market capitalisation stability, holder distribution patterns. What the market already knows.

Medium weight band

Pillar 05

Health

Vitality indicators. Contract age, deployment chain maturity, development activity, integration count, community size, documentation quality. The long-term viability signal.

Lower weight band

Pillar 06 · Deep Audit Only

The AI Consensus Layer

What no single engine sees. An AI consensus engine reads the contract end to end, traces the exploit paths that static analysis misses, writes the narrative that explains why the code does what it does, and seals the report. Available on Deep Audit and Enterprise tiers.

AI consensus overlay

Why five, not three or seven

We tested architectures with three pillars (collapsing governance into security, tokenomics into market) and seven (splitting security into static and dynamic, splitting governance into on-chain and off-chain). The five-pillar model produced the tightest correlation with actual exploit outcomes while maintaining pillar orthogonality. Each pillar captures a meaningfully different failure mode.

Collapsing governance into security obscured the single most important finding in the dataset: that governance failures and code failures are statistically independent. A project can have flawless code and catastrophic governance, or vice versa. The model must see both.

03 / Weight Calibration

Empirical weights from $4.3 billion in exploits.

The five pillar weights are not arbitrary. They were derived from a regression analysis of exploit outcomes between 2021 and early 2026, using dollar-value-at-risk as the dependent variable.

The calibration process worked as follows. We assembled a corpus of 340+ documented exploit incidents with known root causes and quantified losses. Each incident was retrospectively scored against the five-pillar framework: which pillar failures would the model have needed to detect in order to flag the project before the exploit?

We then optimised pillar weights to maximise the model's ability to separate exploited projects from non-exploited projects in the same market conditions. The optimisation was constrained so that every weight stays positive and the weights sum to 1.0.
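To make the constraint concrete, here is a toy sketch of a constrained weight search. It is not the regression described above: it swaps in a coarse grid search over positive weights that sum to 1.0 and scores each candidate by pairwise separation (an AUC-style measure). The pillar scores are made-up illustration values, not incidents from the corpus, and every name in the block is hypothetical.

```python
from itertools import product

PILLARS = ("security", "governance", "tokenomics", "value", "health")

# Made-up pillar scores (0-20) for illustration only; not real incident data.
EXPLOITED = [
    (6, 4, 10, 12, 11),
    (15, 3, 9, 14, 12),
    (5, 12, 8, 10, 9),
    (9, 6, 7, 13, 14),
]
NOT_EXPLOITED = [
    (17, 16, 12, 11, 10),
    (14, 15, 13, 12, 13),
    (16, 13, 9, 14, 9),
    (13, 17, 11, 10, 12),
]


def composite(weights, scores):
    """Weighted sum of pillar scores for one project."""
    return sum(w * s for w, s in zip(weights, scores))


def separation(weights):
    """Fraction of (safe, exploited) pairs where the safe project scores higher."""
    wins = 0.0
    for safe in NOT_EXPLOITED:
        for bad in EXPLOITED:
            s, e = composite(weights, safe), composite(weights, bad)
            wins += 1.0 if s > e else 0.5 if s == e else 0.0
    return wins / (len(NOT_EXPLOITED) * len(EXPLOITED))


def candidate_weights(steps=10):
    """Yield weight vectors on a grid: every weight positive, total exactly steps/steps."""
    for combo in product(range(1, steps), repeat=len(PILLARS) - 1):
        last = steps - sum(combo)
        if last >= 1:
            yield tuple(c / steps for c in combo) + (last / steps,)


best = max(candidate_weights(), key=separation)
print(dict(zip(PILLARS, best)), separation(best))
```

With so few toy projects, many weight vectors separate the two groups equally well, so the vector printed is simply the first one found. A real calibration would need far more data and a way to break such ties.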

The governance finding

The result that most surprised us: the optimal weight for Team & Governance was nearly identical to the weight for Security. Not slightly lower. Not a fraction. Nearly equal.

This runs counter to the industry's implicit assumption that code is king. In practice, the data shows that governance compromise is responsible for a roughly equal share of aggregate exploit losses. The implications are significant: any scoring model that treats governance as a secondary concern is structurally miscalibrated against the actual threat landscape.

Weight band summary

Security and Team & Governance together account for approximately 60% of the composite score. They are in the same weight band. Deliberately, and empirically.

Tokenomics and Value occupy the middle band, together roughly 30%.

Health carries the smallest weight, approximately 10%. It captures long-term viability but rarely predicts acute exploit risk.
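The band totals above can be turned into a rough composite calculation. In this sketch the individual weights are hypothetical values chosen only to match the published band totals (about 60%, 30%, and 10%); the exact weights are not published. Rescaling the 0 to 20 pillar scores to a 100-point composite is likewise an assumption, and the result is the score before any amplifier or override is applied.

```python
PILLARS = ("security", "governance", "tokenomics", "value", "health")

# Hypothetical weights: only the approximate band totals are published.
WEIGHTS = {
    "security": 0.30,
    "governance": 0.30,
    "tokenomics": 0.15,
    "value": 0.15,
    "health": 0.10,
}


def composite_score(pillar_scores: dict[str, float]) -> float:
    """Weighted sum of 0-20 pillar scores, rescaled to 0-100 (pre-amplifier)."""
    total = 0.0
    for name in PILLARS:
        score = pillar_scores[name]
        if not 0 <= score <= 20:
            raise ValueError(f"{name} score {score} is outside 0-20")
        total += WEIGHTS[name] * (score / 20)
    return round(100 * total, 1)


# Hypothetical project: strong code, weak governance.
example = {
    "security": 19,
    "governance": 4,
    "tokenomics": 14,
    "value": 16,
    "health": 15,
}
print(composite_score(example))  # 64.5
```

With equal Security and Governance weights, the weak governance pillar pulls the example down by 24 points relative to a perfect governance score, the same amount a comparably weak Security pillar would cost.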

Recalibration cadence

Weights are recalibrated semi-annually as new exploit data accrues. The May 2026 calibration is the current production version. We publish the calibration date and the approximate exploit corpus size with every GhostScore, but we do not publish exact weight values. This is a deliberate competitive boundary: the concepts are public and replicable, the precise calibration is our edge.

04 / The Non-Linear Amplifier

One critical failure is a warning. Multiple critical failures are a pattern.

The GhostScore includes a non-linear penalty function, the amplifier, which converts critical-threshold question failures into a score deduction that grows faster than linearly with the number of failures.

Across the five pillars, a subset of questions are designated as critical-threshold questions. These represent failure modes with a direct path to loss of funds: hidden mint functions, unverified source code, honeypot detection, sanctions exposure, active exploit indicators, deployer wallets linked to previous incidents.

A project that fails one critical question receives a meaningful but contained penalty. A project that fails three receives a disproportionately larger penalty. A project that fails ten or more will see its composite score driven toward zero regardless of how well it performs on non-critical questions.
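One way to get this behaviour is a power-law deduction with an exponent above 1. The sketch below is illustrative only: the functional form, the `base` and `exponent` values, and the function names are assumptions, since the production amplifier is not published.

```python
def amplifier_penalty(
    critical_failures: int, base: float = 4.0, exponent: float = 1.5
) -> float:
    """Points deducted for n critical failures; exponent > 1 gives superlinear growth."""
    if critical_failures < 0:
        raise ValueError("critical_failures must be non-negative")
    return base * critical_failures ** exponent


def amplified_score(composite: float, critical_failures: int) -> float:
    """Subtract the amplifier deduction and clamp the result to 0-100."""
    return max(0.0, min(100.0, composite - amplifier_penalty(critical_failures)))


# Hypothetical project with a pre-amplifier composite of 85.
for n in (0, 1, 3, 10):
    print(n, round(amplifier_penalty(n), 1), round(amplified_score(85.0, n), 1))
```

Under these assumed parameters one failure costs 4 points, three cost about 21 (more than three times the single-failure penalty), and ten cost more than 100, so the score is clamped to zero whatever the pillars contributed.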

Why non-linear

Linear penalty functions treat each critical failure as independent. In reality, critical failures cluster. A project with a hidden mint function is far more likely to also have an unverified deployer, absent governance, and suspicious tokenomics than a random project would be. The amplifier captures this correlation: as critical failures accumulate, the probability that the project is adversarial rises superlinearly, and the score should reflect that.

The amplifier also solves a structural problem in weighted-average scoring. Without it, a project could score high on four pillars and catastrophically on one, yet still produce a composite score in the "caution" range rather than the "evacuate" range. The amplifier ensures that systemic critical failure is legible in the final number.

Amplifier in practice

In our benchmark scoring, a confirmed honeypot site with 18 critical failures scored 0 out of 100. Not because every individual question failed, but because the amplifier recognised the pattern: when nearly every critical threshold is breached, the project is not unlucky. It is adversarial. The score reflects that conclusion.

05 / Hard Overrides

Some signals bypass the model entirely.

Certain conditions are so unambiguously dangerous that they override the composite scoring pipeline and cap the final score directly.

The GhostScore includes a set of hard overrides: conditions that, when detected, impose a maximum score ceiling regardless of what the five-pillar assessment produces. These are not penalty deductions. They are architectural circuit breakers.

  • Confirmed honeypot. The contract traps user funds by design. Score is capped at a very low ceiling.
  • OFAC-sanctioned address. The deployer or a linked wallet appears on international sanctions lists. Score is capped near zero.
  • Known exploit match. The contract address matches a confirmed exploit in curated incident databases. Score is forced to the minimum.
  • Active indicators of value extraction. Real-time detection of rug-pull or pump-and-dump patterns in progress. Score is capped low.
  • Mixer or tumbler association. The deployer wallet has direct interaction with mixing services, indicating obfuscation intent. Score is capped.

Hard overrides exist because a probabilistic model should not produce an optimistic score for a project that has already been confirmed as malicious by external evidence. The model scores risk; overrides recognise certainty.
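The ceiling behaviour can be sketched as a cap applied after the model has produced its score. The numeric ceilings below are hypothetical placeholders: the list above describes them only qualitatively ("very low", "near zero", "minimum", "low", "capped"). The flag names and function name are also assumptions.

```python
# Hypothetical ceilings; the published text gives no numeric values.
OVERRIDE_CEILINGS = {
    "confirmed_honeypot": 5,
    "sanctioned_address": 2,
    "known_exploit_match": 0,
    "active_value_extraction": 10,
    "mixer_association": 25,
}


def apply_hard_overrides(model_score: float, triggered: set[str]) -> float:
    """Cap the model score at the lowest ceiling among the triggered overrides."""
    unknown = triggered - set(OVERRIDE_CEILINGS)
    if unknown:
        raise ValueError(f"unknown override flags: {sorted(unknown)}")
    ceiling = min((OVERRIDE_CEILINGS[name] for name in triggered), default=100)
    # A ceiling never raises a score; it only limits how high it can go.
    return min(model_score, ceiling)


print(apply_hard_overrides(82.0, set()))
print(apply_hard_overrides(82.0, {"mixer_association", "confirmed_honeypot"}))
```

Because the override is a ceiling rather than a deduction, a project that already scores below the ceiling is left unchanged, and several triggered overrides resolve to the strictest one.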

06 / Published Benchmarks

Named scores for real projects.

A scoring model is only credible if it produces scores you can challenge. Below are published GhostScore benchmarks for projects scored against the production model as of May 2026.

| Project | Score | Tier | Key insight |
| --- | --- | --- | --- |
| Bitcoin (BTC) | 95 | 10 · The Gold Standard | Zero critical failures. 15 years of operation without exploit. Perfect Security and Value pillars. The benchmark ceiling. |
| Tether (USDT) 0xdac17f...831ec7 | 27 | 3 · Literary Fiction | Strong Value pillar. Multiple critical failures in Team & Governance: centralised control, no on-chain governance, opaque attestations. The model treats governance structurally, not reputationally. |
| Binance-Peg ETH 0x2170ed...f933f8 | 29 | 3 · Literary Fiction | Healthy ecosystem metrics. Governance black hole: single-entity control, no time-locks, no multi-sig, complete centralisation. Same pattern as USDT. Brand does not override structure. |
| elienmusk.site | 0 | 1 · Total Asymmetry | 18 critical failures. Amplifier drove score to zero. Honeypot confirmed. The model’s floor case: adversarial by every available signal. |
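The tier numbers in the benchmarks are consistent with equal-width, ten-point bands, although the actual tier boundaries are not stated. The sketch below assumes that mapping and checks it against the four published rows; the function name is hypothetical.

```python
def tier_for(score: int) -> int:
    """Map a 0-100 score to a 1-10 tier, assuming equal ten-point bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    # 0-9 -> tier 1, 10-19 -> tier 2, ..., 90-100 -> tier 10.
    return min(10, score // 10 + 1)


# Scores and tiers as published in the benchmarks.
published = [
    ("Bitcoin (BTC)", 95, 10),
    ("Tether (USDT)", 27, 3),
    ("Binance-Peg ETH", 29, 3),
    ("elienmusk.site", 0, 1),
]
for project, score, tier in published:
    print(project, score, tier_for(score), tier_for(score) == tier)
```

Four data points cannot confirm the band edges, so this should be read as one mapping that fits the published rows rather than the production rule.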

Why publish USDT and Binance-Peg ETH scores

Because the model's credibility depends on applying the same framework to large-cap tokens as to obvious honeypots. A scoring model that scores elienmusk.site at zero but gives USDT a pass because of brand recognition is not a model. It is marketing.

The USDT and Binance-Peg ETH scores are not commentary on whether those tokens will lose value. They are a structural assessment: these tokens concentrate extraordinary control in a single entity without the governance safeguards the model requires for a high score. That is an empirical observation, not a prediction.

Note what the model captures that a traditional audit cannot: USDT's smart contract code is simple and functional. A code-only audit would likely return a clean report. The GhostScore sees the governance layer around that code and scores accordingly.

07 / Limitations & Honest Boundaries

What the GhostScore does not do.

Transparency about limitations is itself a signal. A model that claims completeness is a model that has not been tested against reality.

Not investment advice

The GhostScore describes structural properties of smart contracts and the entities that deploy them. It does not predict price movement, market conditions, or future team behaviour. A score of 95 does not mean "buy." A score of 27 does not mean "sell." It means the governance structure has measurable deficiencies that increase asymmetric risk.

Not a replacement for human audit

The free automated assessment is a triage layer. It surfaces high-confidence signals in seconds across every major EVM chain. For protocols deploying significant TVL, the GhostScore should be one input alongside a dedicated human audit from firms like Trail of Bits, OpenZeppelin, or Spearbit.

Point-in-time assessment

A GhostScore reflects the state of the contract and its surrounding signals at the moment of assessment. Proxy upgrades, ownership transfers, and liquidity events can change the score. GhostLabs Sentinel monitoring addresses this for continuously monitored contracts, but one-time assessments are inherently snapshots.

EVM-first coverage

The current model is calibrated for EVM-compatible chains. Non-EVM chains (Solana, Cosmos, Move-based chains) have different attack surfaces and require separate calibration. Solana support is in development for Q3 2026.

The weight recalibration lag

Pillar weights are recalibrated semi-annually. A novel exploit category that emerges between calibrations may be underweighted until the next recalibration cycle. This is a known limitation and one reason we publish the calibration date.

08 / Frequently Asked Questions

Questions we get asked most.

How does GhostLabs score smart contracts?

GhostLabs uses a 100-point scoring framework called the GhostScore. It evaluates smart contracts across five pillars: Security, Team & Governance, Tokenomics, Value, and Health. Each pillar is scored out of 20, then combined using empirically calibrated weights derived from analysis of $4.3 billion in historical exploits. A non-linear amplifier penalises projects that fail multiple critical-threshold questions.

Is the GhostScore free?

Yes. The automated assessment is free, unlimited, and available on every supported EVM chain. Enter any contract address at ghostlabs.asia for an instant GhostScore. Deep Audits with line-level findings, exploit narratives, and a signed PDF report are available as a paid tier.

What chains does GhostLabs support?

Ethereum, Base, Arbitrum, Optimism, BNB Chain, Polygon, Avalanche, Linea, zkSync Era, Scroll, Mantle, Blast, Sonic, Berachain, and Hyperliquid. Solana is scheduled for Q3 2026. See the full chain coverage list.

Why does USDT score 27 when it has never been exploited?

The GhostScore measures structural risk, not historical outcomes. USDT concentrates extraordinary control in a single entity without on-chain governance safeguards, multi-sig protections, or transparency mechanisms that the model requires for a high governance score. The exploit data shows that this governance pattern, applied to other projects, has historically preceded catastrophic losses. The model does not grant exceptions for brand size.

How is the GhostScore different from CertiK, Hacken, or GoPlus?

Most existing tools focus primarily on code-level security analysis. The GhostScore is a multi-dimensional model that weights governance equally with code security, based on empirical exploit data. It uses a non-linear amplifier for critical failures and produces a single composite score on a 10-tier scale. The free tier is unlimited with no API key required. See our detailed side-by-side comparison or the full methodology.

Can the GhostScore predict whether a token will be exploited?

No. The GhostScore is a risk assessment, not a prediction. It identifies structural properties that have historically correlated with exploit outcomes. A high score means the project exhibits the structural characteristics of projects that have not been exploited. A low score means it exhibits characteristics associated with higher exploit probability. It is a signal, not a guarantee.

How often are scores updated?

Anyone can trigger a free re-assessment at any time. For contracts under Sentinel monitoring, scores update automatically within seconds of any detected on-chain change (proxy upgrade, ownership transfer, liquidity event). The pillar weights themselves are recalibrated semi-annually against the growing exploit corpus.

Score any contract. Free.

The same 100-point framework, the same five pillars, the same non-linear amplifier. Enter a contract address and see what the model finds. Every chain. Every time.