SkillFocusLab Research & Methodology Team
v2.1 · Peer-reviewed internally

Skill Score Methodology Whitepaper

A fully transparent explanation of how we calculate, normalize, weight, and validate your Skill Score — including sample sizes, benchmarks, and known limitations.

1. Design Philosophy

Core principle: Your Skill Score should never feel arbitrary. Every number you see is traceable to a specific formula, with publicly documented inputs and weights.

The Skill Score is a composite metric designed to give professionals a single, interpretable number (0–100) that reflects their current skill level across multiple communication and productivity dimensions.

We chose a hybrid weighted-average model over alternatives (percentile ranking, machine-learned scores, or simple averages) for three reasons:

  1. Transparency: Every input, weight, and normalization step is documented and auditable.
  2. Stability: Hysteresis rules prevent frustrating level fluctuations from single bad sessions.
  3. Actionability: Category breakdowns show exactly which skills to improve.

2. Score Architecture

Overall Score = (Typing × 0.25) + (Writing × 0.25) + (Resume × 0.25) + (Reading × 0.15) + (Consistency × 0.10)

Where each category score ∈ [0, 100]
Overall Score ∈ [0, 100]
Window = 7-day rolling average (configurable)

The score is computed server-side using a 7-day rolling window. Only sessions within the last 7 days contribute to the score, ensuring it reflects current ability rather than historical peaks.

If a user has no activity in a category during the window, that category score is 0 — not interpolated. This is intentional: we want the score to reflect actual, recent practice.
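The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration of the rules in this section, not the production implementation; the dictionary keys and function name are assumptions.

```python
# Weights from Section 2; each category score is already normalized to [0, 100].
WEIGHTS = {
    "typing": 0.25,
    "writing": 0.25,
    "resume": 0.25,
    "reading": 0.15,
    "consistency": 0.10,
}

def overall_score(category_scores: dict) -> float:
    """Weighted average over the 7-day window.

    Categories with no activity are absent from the input and count as 0,
    not interpolated, per the rule above.
    """
    return sum(w * category_scores.get(name, 0.0) for name, w in WEIGHTS.items())
```

For example, a user who only practiced typing (score 80) in the window receives an overall score of 20, since the other four categories contribute nothing.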

3. Category Weights

Category    | Weight | Rationale                                                            | Data Sources
Typing      | 25%    | Foundational skill for all digital work. Measured objectively.       | typing_history (WPM, accuracy)
Writing     | 25%    | Written communication quality — readability + error rate.            | grammar_history (readability, issues, words)
Resume      | 25%    | Career-document quality. Activity frequency + AI-scored quality.     | ai_usage (resume endpoints), baseline_results
Reading     | 15%    | Reading speed + comprehension. Lower weight due to shorter sessions. | reading_history (WPM, comprehension)
Consistency | 10%    | Active days in 7-day window. Rewards regular practice.               | Distinct active days across all tables
Total       | 100%   |                                                                      |
Weight selection disclosure: Weights were set based on educational research on composite skill metrics and internal A/B testing with early cohorts (n=127). We plan to publish weight sensitivity analysis in Q3 2026. Weights may be adjusted based on user feedback — any changes will be documented in the changelog.

4. Normalization Functions

Each category has a different natural scale (e.g., typing WPM ranges 0–200+, accuracy 0–100%). We normalize all categories to a 0–100 scale using piecewise linear functions:

Typing Normalization

WPM component   = min(60, (avgWPM / 120) × 60)     // 0–120 WPM → 0–60 pts
Accuracy component = (avgAccuracy / 100) × 40         // 0–100% → 0–40 pts
Typing Score       = min(100, WPM + Accuracy)          // Combined: 0–100

We cap the WPM component at 120 WPM because professional typing benchmarks consider 120+ WPM "expert." Accuracy receives less weight because variation in WPM has the greater practical impact.
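The typing normalization above translates directly to Python. A minimal sketch (the function name and parameter names are illustrative):

```python
def typing_score(avg_wpm: float, avg_accuracy: float) -> float:
    """Piecewise-linear typing normalization to [0, 100]."""
    wpm_component = min(60.0, (avg_wpm / 120.0) * 60.0)  # 0-120 WPM -> 0-60 pts, capped
    accuracy_component = (avg_accuracy / 100.0) * 40.0   # 0-100%   -> 0-40 pts
    return min(100.0, wpm_component + accuracy_component)
```

A 60 WPM typist at 90% accuracy scores 30 + 36 = 66; WPM beyond 120 earns no additional points.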

Writing Normalization

Readability component = min(60, avgReadability × 0.6)  // 0–100 score → 0–60 pts
Issue rate            = avgIssues / avgWordCount × 100  // Issues per 100 words
Issue component       = max(0, 40 − (issueRate × 8))   // Lower is better → 0–40 pts
Writing Score         = min(100, Readability + Issue)    // Combined: 0–100

Readability uses a composite Flesch-Kincaid-like score. Issue rate penalty: 5+ issues per 100 words zeroes out the issue component entirely.
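As a sketch of the writing formula (names are illustrative), note how the issue-rate penalty inverts the scale so that fewer issues mean more points:

```python
def writing_score(avg_readability: float, avg_issues: float, avg_word_count: float) -> float:
    """Readability (0-60 pts) plus an inverted issue-rate penalty (0-40 pts)."""
    readability_component = min(60.0, avg_readability * 0.6)  # 0-100 score -> 0-60 pts
    issue_rate = avg_issues / avg_word_count * 100.0          # issues per 100 words
    issue_component = max(0.0, 40.0 - issue_rate * 8.0)       # 5+/100 words -> 0 pts
    return min(100.0, readability_component + issue_component)
```

At exactly 5 issues per 100 words the penalty consumes the full 40 points, leaving only the readability component.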

Resume Normalization

Activity component = min(40, (sessionsCount / 20) × 40)  // 0–20 sessions → 0–40 pts
Quality component  = min(60, (avgResumeScore / 100) × 60) // 0–100 → 0–60 pts
Resume Score       = min(100, Activity + Quality)           // Combined: 0–100

Sessions count rewards iterative resume improvement. Quality comes from the most recent baseline resume score or AI analysis result.
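The resume formula follows the same pattern — a capped activity component plus a quality component. A minimal sketch with illustrative names:

```python
def resume_score(sessions_count: int, avg_resume_score: float) -> float:
    """Activity (0-40 pts, capped at 20 sessions) plus AI-scored quality (0-60 pts)."""
    activity_component = min(40.0, (sessions_count / 20.0) * 40.0)
    quality_component = min(60.0, (avg_resume_score / 100.0) * 60.0)
    return min(100.0, activity_component + quality_component)
```

Ten sessions with a quality score of 50 yields 20 + 30 = 50; sessions beyond the twentieth add nothing, so iteration is rewarded but not grinding.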

Reading Normalization

Speed component         = min(50, (avgWPM / 500) × 50)        // 0–500 WPM → 0–50 pts
Comprehension component = (avgComprehension / 100) × 50       // 0–100% → 0–50 pts
Reading Score           = min(100, Speed + Comprehension)      // Combined: 0–100

We cap at 500 WPM, the upper bound reported in speed-reading research. Speed and comprehension are equally weighted because fast reading without understanding is not valuable.
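The equal weighting is visible in the sketch below (names are illustrative): both components max out at 50 points.

```python
def reading_score(avg_wpm: float, avg_comprehension: float) -> float:
    """Speed (0-50 pts, capped at 500 WPM) and comprehension (0-50 pts)."""
    speed_component = min(50.0, (avg_wpm / 500.0) * 50.0)
    comprehension_component = (avg_comprehension / 100.0) * 50.0
    return min(100.0, speed_component + comprehension_component)
```

A reader at 250 WPM with 80% comprehension scores 25 + 40 = 65; raw speed above 500 WPM cannot compensate for poor comprehension.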

Consistency Normalization

Consistency Score = min(100, (activeDays / 7) × 100)  // 0–7 days → 0–100

Active day = any day with at least 1 session in:
  typing_history, grammar_history, pomodoro_history, or reading_history

7/7 active days = perfect consistency score. Pomodoro sessions count toward active days but do not affect skill categories directly.
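Consistency is the simplest normalization — a single linear scale over the 7-day window. A one-line sketch:

```python
def consistency_score(active_days: int) -> float:
    """Active days in the 7-day window, scaled to [0, 100] and capped."""
    return min(100.0, (active_days / 7.0) * 100.0)
```

Three active days out of seven, for instance, yields roughly 42.9 points.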

5. Level Bands & Stability Rules

Level 1 (Starter):    0–20 pts
Level 2 (Developing): 20–40 pts
Level 3 (Proficient): 40–60 pts
Level 4 (Advanced):   60–80 pts
Level 5 (Expert):     80–100 pts

Stability Rule (Hysteresis)

Level UP:   requires ≥ 3 active days in the current window
Level DOWN: requires ≥ 5 active days in the current window

If active days are insufficient, level remains unchanged regardless of score.

Why? Prevents frustrating level drops from one bad day after vacation. The asymmetry (3 up / 5 down) is intentional — we want upward progress to feel rewarding while downward movement requires sustained evidence of decline.

Provisional flag: Scores calculated with <3 active days are marked "Provisional" in the UI, signaling that the score may be unreliable.
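The band lookup and the hysteresis rule can be sketched together. The whitepaper does not specify which band a boundary score (e.g. exactly 40) falls into; this sketch assumes boundary scores belong to the upper band, and all names are illustrative.

```python
def level_for(score: float) -> int:
    """Map a 0-100 score to a level band (boundary scores go to the upper band)."""
    thresholds = [20, 40, 60, 80]
    return 1 + sum(score >= t for t in thresholds)

def next_level(current_level: int, score: float, active_days: int) -> int:
    """Apply the asymmetric hysteresis rule: 3 active days to go up, 5 to go down."""
    candidate = level_for(score)
    if candidate > current_level and active_days >= 3:
        return candidate
    if candidate < current_level and active_days >= 5:
        return candidate
    return current_level  # insufficient evidence: level holds regardless of score
```

A Proficient user whose score drops to 10 after only 4 active days stays at level 3; the demotion applies only once 5+ active days confirm the decline.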

6. Benchmark Data

Transparency note: Sample sizes are small as SkillFocusLab is an early-stage product. We report exact n values for all benchmarks and will update as our user base grows.
Metric                           | Internal Benchmark | External Reference            | Sample Size | Collection Period
Average Typing WPM               | 52 WPM             | 40–50 WPM (TypeRacer avg)     | n=347       | Jan–Feb 2026
Average Typing Accuracy          | 94.2%              | 92–96% (industry avg)         | n=347       | Jan–Feb 2026
Average Writing Readability      | 68/100             | 60–70 (Flesch-Kincaid avg)    | n=189       | Jan–Feb 2026
Average Grammar Issues/100 words | 2.1                | 1.5–3.0 (Grammarly reports)   | n=189       | Jan–Feb 2026
Average Reading WPM              | 238 WPM            | 200–250 WPM (adult avg)       | n=94        | Feb 2026
Baseline Completion Rate         | 67%                | n/a                           | n=412       | Jan–Feb 2026
7-Day Consistency (active days)  | 3.1 days           | n/a                           | n=256       | Feb 2026
Median Overall Skill Score       | 34.7               | n/a                           | n=256       | Feb 2026

All internal benchmarks are calculated from anonymized, aggregated user data. No individual user data is exposed. External references are from publicly available industry reports and academic literature.

7. Validation & Sample Sizes

Early Cohort Validation

  • Cohort size: n=127 (beta users, Nov 2025 – Jan 2026)
  • Weight validation: A/B tested 3 weight configurations
  • User satisfaction: 78% rated score as "fair or better"
  • Correlation test: Score correlated 0.71 with self-reported skill level

Score Distribution

  • Level 1 (Starter): 23% of users
  • Level 2 (Developing): 38% of users
  • Level 3 (Proficient): 27% of users
  • Level 4 (Advanced): 10% of users
  • Level 5 (Expert): 2% of users

Based on n=256 users, Feb 2026

8. Score Decay & Freshness

// Maintenance cron (weekly):
// Scores not recalculated within 7 days are decayed:
decayed_score = current_score × 0.98

// This applies to all sub-scores independently.
// After decay, level bands are recalculated.
// Expired sessions (>30 days) are cleaned up.
// AI usage logs older than 90 days are purged.

Decay rate: 2% per week for inactive users. This ensures the score reflects current ability rather than historical performance. Active users are never decayed.
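Because the weekly cron compounds, an inactive user's score decays geometrically. A sketch under the assumption that decay is parameterized by full weeks of inactivity (the function name and signature are illustrative):

```python
def apply_decay(score: float, weeks_inactive: int) -> float:
    """2% multiplicative decay per inactive week, applied to each sub-score
    independently. Active users (weeks_inactive == 0) are returned unchanged."""
    return score * (0.98 ** weeks_inactive)
```

After 10 inactive weeks a score of 100 has drifted down to roughly 81.7, a gentle but steady pull toward re-engagement.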

9. Known Limitations

Small sample sizes

Our benchmarks are based on early adopters (n=94 to n=412). Distributions will shift as the user base grows. We commit to updating benchmark tables quarterly.

Weight subjectivity

Category weights are partially subjective. While validated against user satisfaction data, there is no universally "correct" weighting for professional skills. We plan to offer customizable weights in a future update.

AI-dependent categories

Resume scoring relies on AI (GPT-4o-mini) analysis, which introduces model-specific biases. We mitigate this with cached, deterministic prompts but acknowledge AI scoring is not equivalent to human expert review.

English-centric

All benchmarks and normalization are calibrated for English text. Non-native English speakers may score lower in writing/reading categories due to language proficiency rather than skill gaps.

10. Methodology Changelog

v2.1 Feb 2026

Added reading category (15% weight), reduced consistency from 15% to 10%. Added stability rule hysteresis. Published benchmark data with sample sizes.

v2.0 Jan 2026

Full rewrite with 5-category model. Introduced weekly rolling window, normalization functions, and level bands. Replaced EMA smoothing with windowed averages.

v1.0 Nov 2025

Initial 3-category model (Typing, Grammar, Focus). Simple EMA-based calculation. No level bands.

Questions about our methodology? Contact our team.

This whitepaper is versioned and maintained on our changelog.