Skill Score Methodology Whitepaper
A fully transparent explanation of how we calculate, normalize, weight, and validate your Skill Score — including sample sizes, benchmarks, and known limitations.
1. Design Philosophy
Core principle: Your Skill Score should never feel arbitrary. Every number you see is traceable to a specific formula, with publicly documented inputs and weights.
The Skill Score is a composite metric designed to give professionals a single, interpretable number (0–100) that reflects their current skill level across multiple communication and productivity dimensions.
We chose a hybrid weighted-average model over alternatives (percentile ranking, machine-learned scores, or simple averages) for three reasons:
- Transparency: Every input, weight, and normalization step is documented and auditable.
- Stability: Hysteresis rules prevent frustrating level fluctuations from single bad sessions.
- Actionability: Category breakdowns show exactly which skills to improve.
2. Score Architecture
```
Overall Score = (Typing × 0.25) + (Writing × 0.25) + (Resume × 0.25)
              + (Reading × 0.15) + (Consistency × 0.10)

Where: each category score ∈ [0, 100]
       Overall Score ∈ [0, 100]
       Window = 7-day rolling average (configurable)
```
The score is computed server-side using a 7-day rolling window. Only sessions within the last 7 days contribute to the score, ensuring it reflects current ability rather than historical peaks.
If a user has no activity in a category during the window, that category score is 0 — not interpolated. This is intentional: we want the score to reflect actual, recent practice.
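The composite can be sketched in a few lines of Python. This is an illustrative translation of the formula above, not the production code; the function name and dictionary keys are ours.

```python
# Category weights from section 3. A category with no activity in the
# 7-day window is simply absent and contributes 0 (not interpolated).
WEIGHTS = {
    "typing": 0.25,
    "writing": 0.25,
    "resume": 0.25,
    "reading": 0.15,
    "consistency": 0.10,
}

def overall_score(category_scores: dict) -> float:
    """Weighted average over the rolling window; missing categories count as 0."""
    return sum(w * category_scores.get(cat, 0.0) for cat, w in WEIGHTS.items())
```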
3. Category Weights
| Category | Weight | Rationale | Data Sources |
|---|---|---|---|
| Typing | 25% | Foundational skill for all digital work. Measured objectively. | typing_history (WPM, accuracy) |
| Writing | 25% | Written communication quality — readability + error rate. | grammar_history (readability, issues, words) |
| Resume | 25% | Career-document quality. Activity frequency + AI-scored quality. | ai_usage (resume endpoints), baseline_results |
| Reading | 15% | Reading speed + comprehension. Lower weight due to shorter sessions. | reading_history (WPM, comprehension) |
| Consistency | 10% | Active days in 7-day window. Rewards regular practice. | Distinct active days across all tables |
| Total | 100% | — | — |
4. Normalization Functions
Each category has a different natural scale (e.g., typing WPM ranges 0–200+, accuracy 0–100%). We normalize all categories to a 0–100 scale using piecewise linear functions:
Typing Normalization
```
WPM component      = min(60, (avgWPM / 120) × 60)   // 0–120 WPM → 0–60 pts
Accuracy component = (avgAccuracy / 100) × 40       // 0–100% → 0–40 pts
Typing Score       = min(100, WPM + Accuracy)       // Combined: 0–100
```
We cap the WPM component at 120 because professional typing benchmarks consider 120+ WPM "expert." Accuracy is weighted less (40 of 100 points) because variation in WPM has higher practical impact.
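As a runnable sketch of the typing normalization (the function name is ours; inputs are the window averages):

```python
def typing_score(avg_wpm: float, avg_accuracy: float) -> float:
    """Normalize typing to 0–100: WPM capped at 120 (60 pts) + accuracy (40 pts)."""
    wpm_component = min(60.0, (avg_wpm / 120.0) * 60.0)
    accuracy_component = (avg_accuracy / 100.0) * 40.0
    return min(100.0, wpm_component + accuracy_component)
```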
Writing Normalization
```
Readability component = min(60, avgReadability × 0.6)   // 0–100 score → 0–60 pts
Issue rate            = avgIssues / avgWordCount × 100  // Issues per 100 words
Issue component       = max(0, 40 − (issueRate × 8))    // Lower is better → 0–40 pts
Writing Score         = min(100, Readability + Issue)   // Combined: 0–100
```
Readability uses a composite Flesch-Kincaid-like score. Issue rate penalty: 5+ issues per 100 words zeroes out the issue component entirely.
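The writing normalization can be sketched the same way (function and parameter names are ours). Note how an issue rate of 5 per 100 words zeroes the issue component, matching the penalty described above.

```python
def writing_score(avg_readability: float, avg_issues: float, avg_word_count: float) -> float:
    """Normalize writing to 0–100: readability (60 pts) + issue-rate penalty (40 pts)."""
    readability_component = min(60.0, avg_readability * 0.6)
    issue_rate = avg_issues / avg_word_count * 100.0   # issues per 100 words
    issue_component = max(0.0, 40.0 - issue_rate * 8.0)
    return min(100.0, readability_component + issue_component)
```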
Resume Normalization
```
Activity component = min(40, (sessionsCount / 20) × 40)    // 0–20 sessions → 0–40 pts
Quality component  = min(60, (avgResumeScore / 100) × 60)  // 0–100 → 0–60 pts
Resume Score       = min(100, Activity + Quality)          // Combined: 0–100
```
The session-count component rewards iterative resume improvement. Quality comes from the most recent baseline resume score or AI analysis result.
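A sketch of the resume normalization (names are ours, not the production API):

```python
def resume_score(sessions_count: int, avg_resume_score: float) -> float:
    """Normalize resume to 0–100: activity caps at 20 sessions (40 pts) + quality (60 pts)."""
    activity = min(40.0, (sessions_count / 20.0) * 40.0)
    quality = min(60.0, (avg_resume_score / 100.0) * 60.0)
    return min(100.0, activity + quality)
```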
Reading Normalization
```
Speed component         = min(50, (avgWPM / 500) × 50)   // 0–500 WPM → 0–50 pts
Comprehension component = (avgComprehension / 100) × 50  // 0–100% → 0–50 pts
Reading Score           = min(100, Speed + Comprehension) // Combined: 0–100
```
We cap reading speed at 500 WPM based on upper bounds from speed-reading research. Speed and comprehension are equally weighted because fast reading without understanding is not valuable.
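Sketched in the same style (names are ours):

```python
def reading_score(avg_wpm: float, avg_comprehension: float) -> float:
    """Normalize reading to 0–100: speed capped at 500 WPM (50 pts) + comprehension (50 pts)."""
    speed = min(50.0, (avg_wpm / 500.0) * 50.0)
    comprehension = (avg_comprehension / 100.0) * 50.0
    return min(100.0, speed + comprehension)
```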
Consistency Normalization
```
Consistency Score = min(100, (activeDays / 7) × 100)   // 0–7 days → 0–100

Active day = any day with at least 1 session in:
             typing_history, grammar_history, pomodoro_history, or reading_history
```
7/7 active days = perfect consistency score. Pomodoro sessions count toward active days but do not affect skill categories directly.
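The consistency formula is a one-liner (function name is ours):

```python
def consistency_score(active_days: int) -> float:
    """Normalize active days in the 7-day window to 0–100; 7/7 days is a perfect score."""
    return min(100.0, (active_days / 7.0) * 100.0)
```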
5. Level Bands & Stability Rules
Stability Rule (Hysteresis)
```
Level UP:   requires ≥ 3 active days in the current window
Level DOWN: requires ≥ 5 active days in the current window

If active days are insufficient, the level remains unchanged regardless of score.
```
Why? Prevents frustrating level drops from one bad day after vacation. The asymmetry (3 up / 5 down) is intentional — we want upward progress to feel rewarding while downward movement requires sustained evidence of decline.
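The asymmetric rule can be sketched as a small state transition. This assumes the candidate level has already been derived from the new score via the (server-side) level bands; the function name is ours.

```python
def next_level(current_level: int, candidate_level: int, active_days: int) -> int:
    """Apply hysteresis: 3 active days to move up a band, 5 to move down."""
    if candidate_level > current_level and active_days >= 3:
        return candidate_level
    if candidate_level < current_level and active_days >= 5:
        return candidate_level
    return current_level  # insufficient evidence either way: hold steady
```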
6. Benchmark Data
| Metric | Internal Benchmark | External Reference | Sample Size | Collection Period |
|---|---|---|---|---|
| Average Typing WPM | 52 WPM | 40–50 WPM (TypeRacer avg) | n=347 | Jan–Feb 2026 |
| Average Typing Accuracy | 94.2% | 92–96% (industry avg) | n=347 | Jan–Feb 2026 |
| Average Writing Readability | 68/100 | 60–70 (Flesch-Kincaid avg) | n=189 | Jan–Feb 2026 |
| Average Grammar Issues/100 words | 2.1 | 1.5–3.0 (Grammarly reports) | n=189 | Jan–Feb 2026 |
| Average Reading WPM | 238 WPM | 200–250 WPM (adult avg) | n=94 | Feb 2026 |
| Baseline Completion Rate | 67% | — | n=412 | Jan–Feb 2026 |
| 7-Day Consistency (active days) | 3.1 days | — | n=256 | Feb 2026 |
| Median Overall Skill Score | 34.7 | — | n=256 | Feb 2026 |
All internal benchmarks are calculated from anonymized, aggregated user data. No individual user data is exposed. External references are from publicly available industry reports and academic literature.
7. Validation & Sample Sizes
Early Cohort Validation
- Cohort size: n=127 (beta users, Nov 2025 – Jan 2026)
- Weight validation: A/B tested 3 weight configurations
- User satisfaction: 78% rated score as "fair or better"
- Correlation test: Score correlated 0.71 with self-reported skill level
Score Distribution
- Level 1 (Starter): 23% of users
- Level 2 (Developing): 38% of users
- Level 3 (Proficient): 27% of users
- Level 4 (Advanced): 10% of users
- Level 5 (Expert): 2% of users
Based on n=256 users, Feb 2026
8. Score Decay & Freshness
```
// Maintenance cron (weekly):
// Scores not recalculated within 7 days are decayed:
decayed_score = current_score × 0.98

// This applies to all sub-scores independently.
// After decay, level bands are recalculated.
// Expired sessions (>30 days) are cleaned up.
// AI usage logs older than 90 days are purged.
```
Decay rate: 2% per week for inactive users. This ensures the score reflects current ability rather than historical performance. Active users are never decayed.
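Because the cron runs weekly and multiplies by 0.98 each pass, decay compounds per inactive week. A sketch (function name is ours):

```python
def apply_decay(score: float, weeks_inactive: int) -> float:
    """Compound 2% weekly decay; active users (0 inactive weeks) are untouched."""
    return score * (0.98 ** weeks_inactive)
```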
9. Known Limitations
Small sample sizes
Our benchmarks are based on early adopters (n=94 to n=412). Distributions will shift as the user base grows. We commit to updating benchmark tables quarterly.
Weight subjectivity
Category weights are partially subjective. While validated against user satisfaction data, there is no universally "correct" weighting for professional skills. We plan to offer customizable weights in a future update.
AI-dependent categories
Resume scoring relies on AI (GPT-4o-mini) analysis, which introduces model-specific biases. We mitigate this with cached, deterministic prompts but acknowledge AI scoring is not equivalent to human expert review.
English-centric
All benchmarks and normalization are calibrated for English text. Non-native English speakers may score lower in writing/reading categories due to language proficiency rather than skill gaps.
10. Methodology Changelog
- Added reading category (15% weight); reduced consistency from 15% to 10%. Added stability-rule hysteresis. Published benchmark data with sample sizes.
- Full rewrite with 5-category model. Introduced weekly rolling window, normalization functions, and level bands. Replaced EMA smoothing with windowed averages.
- Initial 3-category model (Typing, Grammar, Focus). Simple EMA-based calculation. No level bands.
Questions about our methodology? Contact our team.
This whitepaper is versioned and maintained on our changelog.