
Why Do We Need an AI-Based MVP System?

Many players have reported that our MVP score — intended to measure a player's impact on the game outcome — does not always reflect the true contribution of a player in a match.

We iterated multiple times on improving the MVP calculation logic using hand-crafted rules and weighted formulas. While each iteration improved consistency, the system still failed in certain scenarios.

The underlying problem is that this is an RTS game, where player impact is highly contextual.

There is no single statistic, or fixed combination of statistics, that can correctly identify the most impactful player across all matches.

Each match differs in:

  • map topology and size
  • number of players
  • match duration
  • strategies, pacing, and meta

As a result, identical statistics can represent very different levels of impact depending on the context.


Limitations of Rule-Based MVP Scoring

In some matches, the MVP is clearly the player with the highest combat efficiency (e.g. units defeated).

In others, the most impactful contribution comes from:

  • economic disruption rather than direct combat,
  • territorial control enabling team expansion,
  • absorbing enemy pressure to create strategic advantage for teammates.

Unfortunately, our game API does not expose high-level tactical information such as:

  • pressure absorption,
  • indirect enablement of allied players.

Instead, we only have access to aggregated, low-level telemetry.

This makes rule-based systems inherently brittle:

  • they require constant tuning,
  • they fail in edge cases,
  • they encode developer assumptions that may not generalize.

Step 1: Data Collection and Normalization

Over several months, we collected telemetry data from all submitted matches, resulting in:

  • 13,000+ matches
  • 160,000+ player entries

Per-Match Normalization

Raw metrics such as:

  • resources consumed,
  • units defeated,
  • military power,
  • territory controlled

cannot be compared globally across matches.

For example, longer matches naturally inflate resource and unit statistics, introducing spurious correlations.

To mitigate this, all player metrics are normalized on a per-match basis.

Each value is represented as a percentage relative to other players in the same match, ensuring that:

  • players are compared against all other participants of the match, including both allies and opponents,
  • the model learns relative contribution instead of absolute magnitude.

The processed dataset is stored as a CSV file (~50 MB) and used as training input.
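The per-match normalization described above can be sketched in a few lines of pandas. This is a minimal illustration with made-up column names and values, not the production pipeline:

```python
import pandas as pd

# Toy telemetry: two matches of different length, so raw values are
# not comparable across matches (column names are illustrative).
df = pd.DataFrame({
    "match_id":       [1, 1, 1, 2, 2, 2],
    "player_id":      ["a", "b", "c", "d", "e", "f"],
    "units_defeated": [30, 10, 20, 300, 100, 200],
})

# Express each metric as a share of the match total, so every player
# is compared against all participants of the same match -- allies
# and opponents alike.
df["units_defeated_pct"] = (
    df["units_defeated"]
    / df.groupby("match_id")["units_defeated"].transform("sum")
)
```

Note that players "a" and "d" end up with the same normalized value (0.5) despite a 10x difference in raw kills: the model sees relative contribution, not absolute magnitude.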


Step 2: Framing the Problem as Ranking

Rather than predicting an absolute “impact score”, we frame MVP detection as a learning-to-rank problem.

Why Not Regression?

Regression models attempt to predict an absolute target value.
However, in RTS games:

  • impact does not have a globally meaningful scale,
  • match context dominates absolute values.

What matters is not how much impact a player had, but how their impact compares to other players in the same match.

Ranking Objective

Our goal becomes:

Given a group of players from the same match, produce an ordering from lowest to highest impact.

This framing naturally aligns with the MVP concept.


Step 3: Pseudo-Target Construction

Because there is no ground-truth MVP label, we use weak supervision.

We construct a pseudo-target using a weighted combination of normalized metrics, representing an approximate notion of player impact:

  • combat contribution,
  • survival,
  • resource usage,
  • territorial control.

Players are ranked within each match using this pseudo-target. The pseudo-target serves only as a relative ordering signal, not as a ground-truth score, and is intentionally noisy.

Importantly:

  • the model is not trained to reproduce this formula,
  • it is trained to learn feature interactions that explain the relative ordering across many matches.

This approach provides a stable training signal while allowing the model to generalize beyond fixed weights.
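A sketch of the pseudo-target construction, with hypothetical weights and feature names (the real weighting is internal; what matters is only the relative ordering it induces within each match):

```python
import pandas as pd

# Normalized per-match metrics (values are match-relative shares).
players = pd.DataFrame({
    "match_id":  [1, 1, 1],
    "combat":    [0.50, 0.20, 0.30],
    "survival":  [0.30, 0.40, 0.30],
    "economy":   [0.25, 0.45, 0.30],
    "territory": [0.40, 0.30, 0.30],
})

# Hypothetical weights -- intentionally crude; the pseudo-target is a
# noisy ordering signal, not a ground-truth score.
WEIGHTS = {"combat": 0.4, "survival": 0.2, "economy": 0.2, "territory": 0.2}

players["pseudo_target"] = sum(players[m] * w for m, w in WEIGHTS.items())

# Rank within each match: 0 = lowest impact, n-1 = highest.
# These ranks become the training labels; the weighted sum is discarded.
players["rank_label"] = (
    players.groupby("match_id")["pseudo_target"]
    .rank(method="first")
    .astype(int) - 1
)
```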


Step 4: Model Choice — LambdaRank (LightGBM)

We use LambdaRank, implemented in LightGBM, which is designed for learning-to-rank tasks.

Key properties:

  • optimization is pairwise (player A vs player B within the same match),
  • training explicitly respects match boundaries,
  • the loss function focuses on ranking quality rather than numeric error.

We optimize for NDCG@k, prioritizing correct ordering of top-ranked players (MVP candidates).


Step 5: Training Process

Training data consists of:

  • feature vectors describing player behavior,
  • per-match ranking labels,
  • explicit group definitions representing match boundaries.

The model is trained using gradient-boosted decision trees, allowing it to capture:

  • non-linear feature interactions,
  • different performance profiles (economic, military, hybrid),
  • context-dependent importance of features.

With over 160,000 samples, the model converges reliably without overfitting when using conservative hyperparameters and a moderate number of boosting rounds.


Step 6: Inference and Impact Score Scaling

At inference time, the model outputs a raw ranking score for each player.

These scores:

  • are unbounded,
  • have no absolute meaning,
  • are only comparable within the same match.

For presentation purposes, we apply per-match min–max normalization and map the result to a 0–1000 Impact Score.

This preserves relative ordering while providing a stable and intuitive value for UI display.
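The scaling step is a straightforward per-match min–max mapping. A sketch (the midpoint fallback for a degenerate match where all scores are equal is our assumption, not a documented convention):

```python
import numpy as np

def impact_scores(raw_scores: np.ndarray) -> np.ndarray:
    """Map raw per-match ranking scores to a 0-1000 Impact Score
    via min-max normalization. Only valid within a single match."""
    lo, hi = raw_scores.min(), raw_scores.max()
    if hi == lo:  # degenerate match: all players scored identically
        return np.full_like(raw_scores, 500.0)
    return (raw_scores - lo) / (hi - lo) * 1000.0

# Raw model outputs for one match (unbounded; the sign is meaningless).
raw = np.array([-1.3, 0.2, 2.7, 0.9])
scaled = impact_scores(raw)  # MVP maps to 1000, lowest to 0
```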


Summary

By reframing MVP detection as a ranking problem and leveraging weak supervision with LambdaRank, we are able to:

  • compare players fairly within highly variable match contexts,
  • reduce reliance on brittle rule-based systems,
  • support diverse and non-obvious playstyles,
  • adapt naturally as strategies and meta evolve.

This system does not attempt to define “the perfect player”.
Instead, it answers a more tractable and meaningful question:

Who had the greatest impact in this specific match?

Created by: FoRoKo, SEGMK
