dev-zen

How It Works: Burnout Risk Detection

A detailed look at the science, algorithm, and design decisions behind dev-zen's burnout risk score.


1. Why Burnout Matters for Developers

Burnout is not just "feeling tired." The World Health Organization included burn-out in ICD-11 (2019) as an occupational phenomenon: a syndrome resulting from chronic workplace stress, characterised by three dimensions: exhaustion, increased mental distance from or cynicism toward one's job, and reduced professional efficacy [1]. Software developers face a disproportionately high risk.

The challenge is that burnout is gradual. People rarely notice the slide from "a tough week" to "I can't look at another pull request." dev-zen tackles this by computing a daily, objective risk signal from data the user already tracks (mood, stress, sleep, hydration, and work hours), turning subjective feelings into an actionable early warning.


2. Research Foundations

Our model draws on three bodies of established research:

2.1 Maslach Burnout Inventory (MBI)

The gold-standard burnout assessment since 1981, the MBI measures emotional exhaustion, depersonalisation, and personal accomplishment via a 22-item questionnaire [5]. While clinically validated, a 22-item survey is too heavy for daily tracking. dev-zen operationalises the MBI's core dimensions through proxy signals:

| MBI Dimension | dev-zen Proxy Signal |
|---|---|
| Emotional Exhaustion | Low mood trend + high stress |
| Depersonalisation | Declining engagement (mood + habit completion) |
| Reduced Efficacy | Elevated work hours without corresponding recovery |

2.2 Job Demands-Resources (JD-R) Model

Bakker & Demerouti's JD-R model (2007) frames burnout as the result of an imbalance between demands (workload, time pressure) and resources (recovery, social support, autonomy) [6]. dev-zen maps this directly:

When demands consistently exceed resources, burnout risk rises.

2.3 Allostatic Load Theory

McEwen's concept of allostatic load (1998) describes the cumulative physiological cost of chronic stress [7]. Sleep deprivation, dehydration, and sustained high cortisol (for which user-reported stress is only a rough proxy) all increase allostatic load. Our algorithm captures this through multiple biological and behavioural channels rather than relying on any single metric.

2.4 Developer-Specific Research

These informed our decision to weight recovery signals (sleep, hydration) alongside psychological signals (mood, stress).


3. The Algorithm

3.1 Overview

dev-zen computes a daily Burnout Risk Score (0–100) using a weighted linear model. The score is deterministic (the same inputs always produce the same output) and is computed entirely on-device, so it works offline and no data leaves the phone.

Burnout Score = 100 × Σ_i (weight_i × normalised_component_i)

Higher score = higher burnout risk.

3.2 Input Signals

The algorithm consumes a 7-day rolling window of user-logged data:

| Signal | Source | Raw Range | Aggregation |
|---|---|---|---|
| Mood | Daily mood logs | 1–5 (awful → great) | Mean over last 7 days |
| Stress | Daily stress logs | 1–10 (calm → extreme) | Mean over last 7 days |
| Sleep | Apple Health / manual | Hours per night | Mean over last 7 days |
| Work Hours | User profile (onboarding) | 4–16 hours/day | Static value |
| Hydration | Water log entries | Glasses per day | Mean over days with logs |

Why 7 days? A 7-day window captures short-term trends (a bad week) without being diluted by older data. It also aligns with standard epidemiological practice for self-reported wellbeing measures [11]. Recomputation happens on each dashboard load, so the score is always current.
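The aggregation step can be sketched as follows. This is a minimal illustration, not the app's actual query code: the `LogEntry` shape, the `rollingMean` name, and the use of ISO date strings are assumptions made for the example.

```typescript
// Hypothetical log shape: one numeric value per calendar day ("YYYY-MM-DD").
interface LogEntry {
  date: string;
  value: number;
}

const MS_PER_DAY = 86400000;

// Mean of all entries whose date falls in the window ending on `today` (inclusive).
// Returns null when the window holds no entries, so callers can tell
// "no data" apart from a genuine average of 0.
function rollingMean(entries: LogEntry[], today: string, windowDays = 7): number | null {
  const end = Date.parse(`${today}T00:00:00Z`);
  const start = end - (windowDays - 1) * MS_PER_DAY;
  const inWindow = entries.filter((e) => {
    const t = Date.parse(`${e.date}T00:00:00Z`);
    return t >= start && t <= end;
  });
  if (inWindow.length === 0) return null;
  const sum = inWindow.reduce((acc, e) => acc + e.value, 0);
  return sum / inWindow.length;
}
```

Because only the entries present are averaged, the same helper would cover the "mean over days with logs" rule for hydration, assuming water entries have first been summed per day.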

3.3 Normalisation

Each signal is normalised to a 0–1 scale where 1 = worst case (maximum burnout contribution) and 0 = best case (no burnout contribution):

| Component | Formula | Worst Case (→ 1) | Best Case (→ 0) |
|---|---|---|---|
| Mood | (5 - moodAvg) / 4 | Mood = 1 (awful) | Mood = 5 (great) |
| Stress | (stressAvg - 1) / 9 | Stress = 10 (extreme) | Stress = 1 (calm) |
| Work | (workHours - 4) / 12 | 16 hours/day | 4 hours/day |
| Sleep | 1 - (sleepAvg - 5) / 4 | ≤ 5 hours/night | ≥ 9 hours/night |
| Hydration | 1 - (waterAvg / waterGoal) | 0 glasses | Meeting/exceeding goal |

All values are clamped to [0, 1] after computation.
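The normalisation formulas translate almost line for line into code. The sketch below is illustrative only: the function names are hypothetical, and the guard for a non-positive water goal is an assumption the source does not specify.

```typescript
// Clamp to [0, 1], as the text specifies for every normalised component.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// 1 = worst case, 0 = best case for every component.
const normaliseMood = (moodAvg: number): number => clamp01((5 - moodAvg) / 4);
const normaliseStress = (stressAvg: number): number => clamp01((stressAvg - 1) / 9);
const normaliseWork = (workHours: number): number => clamp01((workHours - 4) / 12);
const normaliseSleep = (sleepAvg: number): number => clamp01(1 - (sleepAvg - 5) / 4);

// Assumption: a non-positive goal is treated as "goal met" to avoid dividing by zero.
const normaliseHydration = (waterAvg: number, waterGoal: number): number =>
  waterGoal > 0 ? clamp01(1 - waterAvg / waterGoal) : 0;
```

Clamping is what makes sleep below 5 hours or above 9 hours saturate at 1 and 0 respectively, and what stops drinking more than the goal from producing a negative contribution.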

3.4 Weights

┌───────────────────────────────────────────────────────┐
│                  Weight Distribution                  │
│                                                       │
│   ███████████████  Mood          30 %                 │
│   ███████████████  Stress        30 %                 │
│   ██████████       Work Hours    20 %                 │
│   █████            Sleep         10 %                 │
│   █████            Hydration     10 %                 │
│                                                       │
│   Psychological ─────── 60 %                          │
│   Behavioural / Physical 40 %                         │
└───────────────────────────────────────────────────────┘

Rationale for the 30-30-20-10-10 distribution:

3.5 Final Score Computation

score = (mood_weight × mood_norm
       + stress_weight × stress_norm
       + work_weight × work_norm
       + sleep_weight × sleep_norm
       + hydration_weight × hydration_norm) × 100

Result is rounded to the nearest integer and clamped to [0, 100].
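As a rough sketch of this step, the weighted sum, rounding, and clamping might look like the following. The `weightedScore` name and the `Component` type are hypothetical; the source names a calculateBurnoutScore() function but does not show its signature, so this is not a reproduction of it.

```typescript
type Component = "mood" | "stress" | "work" | "sleep" | "hydration";

// The 30-30-20-10-10 distribution from section 3.4.
const WEIGHTS: Record<Component, number> = {
  mood: 0.3,
  stress: 0.3,
  work: 0.2,
  sleep: 0.1,
  hydration: 0.1,
};

// Takes already-normalised components (each in [0, 1]) and returns an integer in [0, 100].
function weightedScore(norm: Record<Component, number>): number {
  const components = Object.keys(WEIGHTS) as Component[];
  const weighted = components.reduce((sum, c) => sum + WEIGHTS[c] * norm[c], 0);
  return Math.min(100, Math.max(0, Math.round(weighted * 100)));
}

// Normalised values taken from the worked examples in section 3.6.
const exampleA = weightedScore({ mood: 0.2, stress: 0.22, work: 0.33, sleep: 0.375, hydration: 0.125 });
const exampleB = weightedScore({ mood: 0.725, stress: 0.722, work: 0.667, sleep: 0.875, hydration: 0.625 });
console.log(exampleA, exampleB); // 24 72
```

Keeping the function free of I/O and state is what makes the score deterministic and easy to unit-test.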

3.6 Worked Examples

Example A: Healthy Developer

| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 4.2 / 5 | (5 - 4.2) / 4 = 0.20 |
| Stress avg | 3.0 / 10 | (3.0 - 1) / 9 = 0.22 |
| Work hours | 8 h | (8 - 4) / 12 = 0.33 |
| Sleep avg | 7.5 h | 1 - (7.5 - 5) / 4 = 0.375 |
| Hydration avg | 7 / 8 goal | 1 - 7/8 = 0.125 |

Score = (0.30 × 0.20 + 0.30 × 0.22 + 0.20 × 0.33 + 0.10 × 0.375 + 0.10 × 0.125) × 100
      = (0.060 + 0.066 + 0.066 + 0.0375 + 0.0125) × 100
      = 0.242 × 100
      = 24   → Low Risk ✅

Example B: Developer Approaching Burnout

| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 2.1 / 5 | (5 - 2.1) / 4 = 0.725 |
| Stress avg | 7.5 / 10 | (7.5 - 1) / 9 = 0.722 |
| Work hours | 12 h | (12 - 4) / 12 = 0.667 |
| Sleep avg | 5.5 h | 1 - (5.5 - 5) / 4 = 0.875 |
| Hydration avg | 3 / 8 goal | 1 - 3/8 = 0.625 |

Score = (0.30 × 0.725 + 0.30 × 0.722 + 0.20 × 0.667 + 0.10 × 0.875 + 0.10 × 0.625) × 100
      = (0.2175 + 0.2166 + 0.1334 + 0.0875 + 0.0625) × 100
      = 0.7175 × 100
      = 72   → High Risk 🔴

Primary driver: Mood (highest weighted contribution at 0.2175).


4. Risk Levels

The continuous 0–100 score is mapped to three risk levels for user-facing communication:

| Score Range | Risk Level | Colour | User Message |
|---|---|---|---|
| 0 – 39 | Low | Teal (#18B89A) | "Great work - keep up your healthy routines." |
| 40 – 69 | Moderate | Amber (#E8940A) | "Watch your stress and sleep this week." |
| 70 – 100 | High | Red (#E84848) | "Take a break. Your burnout risk is elevated." |
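The banding above amounts to two threshold checks. A minimal sketch, with `riskBand` and its return shape as hypothetical names and the user messages left out for brevity:

```typescript
type RiskLevel = "Low" | "Moderate" | "High";

interface RiskBand {
  level: RiskLevel;
  colour: string; // hex colour from the table above
}

// Maps an integer score in [0, 100] to its risk band.
function riskBand(score: number): RiskBand {
  if (score >= 70) return { level: "High", colour: "#E84848" };
  if (score >= 40) return { level: "Moderate", colour: "#E8940A" };
  return { level: "Low", colour: "#18B89A" };
}
```

Checking from the highest threshold down keeps each boundary (40 and 70) in exactly one place.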

Why three buckets, not five or a continuous gradient?

Risk-communication research suggests that people act on categorical risk labels more readily than on precise numbers (Weinstein, 1999) [15]. Three levels are actionable: "you're fine", "pay attention", or "intervene now." More granularity invites decision paralysis; fewer levels lose the "watch out" middle ground.


5. Primary Driver Identification

Beyond the aggregate score, the algorithm identifies the primary driver: the single component contributing the most weighted risk, i.e. the component with the highest value of (weight_i × normalised_component_i).

This powers the actionable insight: "Your burnout risk is Moderate. Main driver: stress. Consider taking a 10-minute walk between meetings."

When all inputs are null (new user, no data yet), primaryDriver is null and the UI shows a prompt to start logging.
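Picking the primary driver is an argmax over the weighted contributions. The sketch below makes two assumptions that the source does not state: a `null` argument stands in for the "all inputs are null" case, and ties go to the component listed first in the weight table. `primaryDriver` here is a hypothetical standalone helper.

```typescript
type Component = "mood" | "stress" | "work" | "sleep" | "hydration";

const WEIGHTS: Record<Component, number> = {
  mood: 0.3,
  stress: 0.3,
  work: 0.2,
  sleep: 0.1,
  hydration: 0.1,
};

// Returns the component with the largest weight × normalised value,
// or null when there is no data to rank.
function primaryDriver(norm: Record<Component, number> | null): Component | null {
  if (norm === null) return null;
  let best: Component | null = null;
  let bestContribution = -Infinity;
  for (const c of Object.keys(WEIGHTS) as Component[]) {
    const contribution = WEIGHTS[c] * norm[c];
    // Strict ">" means the earlier component wins a tie (an assumption).
    if (contribution > bestContribution) {
      best = c;
      bestContribution = contribution;
    }
  }
  return best;
}
```

For Example B in section 3.6 this returns "mood" (0.2175 against 0.2166 for stress), matching the driver named there.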


6. Handling Missing Data

Real users don't log perfectly. The algorithm gracefully handles null (missing) inputs:

| Missing Signal | Default Normalised Value | Rationale |
|---|---|---|
| Mood | 0.4 | Slightly pessimistic; nudges user to log |
| Stress | 0.3 | Conservative; assumes some baseline stress |
| Sleep | 0.3 | Conservative; assumes some sleep deficit |
| Hydration | 0.3 | Conservative; assumes mild underhydration |

Design principle: "absent data is not good data." We deliberately avoid defaulting to zero (which would imply "everything is fine") because a user who stops logging may be disengaging, which is itself a burnout signal. The conservative defaults produce a mildly elevated score that encourages the user to resume logging without triggering false alarms.
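One way to apply these defaults is with the nullish coalescing operator, which substitutes a value only for `null` or `undefined`. The `NormalisedInputs` shape and `applyDefaults` name below are assumptions for illustration.

```typescript
// Hypothetical input shape: null means the user logged nothing for that signal.
interface NormalisedInputs {
  mood: number | null;
  stress: number | null;
  sleep: number | null;
  hydration: number | null;
}

// Default normalised values from the table above.
const DEFAULTS = { mood: 0.4, stress: 0.3, sleep: 0.3, hydration: 0.3 };

function applyDefaults(n: NormalisedInputs): { mood: number; stress: number; sleep: number; hydration: number } {
  return {
    // "??" keeps a genuine 0 (best case) and only replaces null/undefined.
    mood: n.mood ?? DEFAULTS.mood,
    stress: n.stress ?? DEFAULTS.stress,
    sleep: n.sleep ?? DEFAULTS.sleep,
    hydration: n.hydration ?? DEFAULTS.hydration,
  };
}
```

Using `||` instead would silently overwrite a real best-case value of 0 with the pessimistic default, which is why `??` is the safer choice here.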


7. Data Pipeline

┌──────────────┐    ┌──────────────┐    ┌───────────────┐    ┌───────────────┐
│  User Logs   │    │  7-Day Agg   │    │  Algorithm    │    │  Dashboard    │
│  (mood,      │───▶│  (averages   │───▶│  (normalise,  │───▶│  (score,      │
│  stress,     │    │  per signal) │    │  weight, sum) │    │  risk label,  │
│  water)      │    │              │    │               │    │  driver)      │
└──────────────┘    └──────────────┘    └───────────────┘    └───────────────┘
       │                                        │                    │
       ▼                                        ▼                    ▼
  WatermelonDB                            Pure function          UI Card +
  (mood_logs,                             (src/lib/              Wellbeing
   stress_logs,                            burnout.ts)            Ring
   water_logs)                                  │
                                                ▼
                                      burnout_scores table
                                      (daily snapshot for
                                       trend analysis)

  1. Collection: User logs mood (1–5), stress (1–10), and water (glasses) throughout the day.
  2. Aggregation: The useTodayData() hook queries WatermelonDB for the last 7 days and computes means.
  3. Computation: calculateBurnoutScore(), a pure function, normalises, weights, and sums.
  4. Persistence: Result is stored in burnout_scores table with all five normalised components, enabling historical trend charts.
  5. Display: The Today dashboard renders a colour-coded card with score, risk label, contextual message, and a wellbeing progress ring (inverted score).

8. Limitations & Future Improvements

Current Limitations

| Limitation | Impact | Planned Mitigation |
|---|---|---|
| Sleep data is null | 10 % of the model runs on a default value | Apple Health integration (Sprint 4) |
| Work hours are static | Doesn't capture overtime spikes | Calendar/time-tracking API integration |
| Linear model | Cannot capture nonlinear interactions (e.g., low sleep + high stress compounds worse than the sum) | TFLite on-device ML model (v2) |
| Self-reported data | Subject to mood-congruent bias (stressed people rate stress higher) | Passive sensors (HRV, screen time) |
| No temporal weighting | Day 1 and day 7 contribute equally | Exponential decay weighting |

Roadmap (v2: ML-Based)

The current rule-based model is a deliberate v1 choice: interpretable, debuggable, and privacy-safe. The v2 upgrade path involves:

  1. On-device TFLite model trained on anonymised, opt-in data.
  2. HRV (Heart Rate Variability) from Apple Watch / Google Health as a physiological stress indicator.
  3. Screen time as a passive work-hours proxy.
  4. Temporal weighting: Recent days count more (exponential decay with λ = 0.85).
  5. Interaction terms: Sleep × Stress interaction to capture compounding effects.
  6. Personalised baselines: Individual thresholds learned from each user's historical data, rather than fixed cutoffs.
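
The temporal weighting in item 4 could be realised as a decay-weighted mean. The sketch below assumes that λ is a per-day decay factor, so a value logged k days ago gets weight λ^k; the source does not spell out this interpretation, and `decayWeightedMean` is a hypothetical helper rather than planned API.

```typescript
// `values` must be ordered oldest first, newest last.
// Returns null for an empty series.
function decayWeightedMean(values: number[], lambda = 0.85): number | null {
  let weightedSum = 0;
  let weightTotal = 0;
  values.forEach((v, i) => {
    const ageInDays = values.length - 1 - i; // newest entry has age 0
    const w = Math.pow(lambda, ageInDays);
    weightedSum += w * v;
    weightTotal += w;
  });
  return weightTotal === 0 ? null : weightedSum / weightTotal;
}
```

With λ = 1 this reduces to the plain 7-day mean used in v1, so the current behaviour would remain a special case of the weighted version.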

9. Privacy & Ethics


10. Validation & Testing

The algorithm is covered by a comprehensive test suite (src/lib/__tests__/burnout.test.ts):


References

[1] World Health Organization. (2019). International Classification of Diseases 11th Revision (ICD-11). Burn-out (QD85). https://icd.who.int/browse/2024-01/mms/en#129180281

[2] GitLab. (2024). 2024 Global DevSecOps Report. https://about.gitlab.com/developer-survey/

[3] Haystack Analytics. (2023). Developer burnout and its impact on software security.

[4] Maslach, C., & Leiter, M. P. (2016). Understanding the burnout experience: Recent research and its implications for psychiatry. World Psychiatry, 15(2), 103–111.

[5] Maslach, C., Jackson, S. E., & Leiter, M. P. (1996). Maslach Burnout Inventory Manual (3rd ed.). Consulting Psychologists Press.

[6] Bakker, A. B., & Demerouti, E. (2007). The Job Demands-Resources model: State of the art. Journal of Managerial Psychology, 22(3), 309–328.

[7] McEwen, B. S. (1998). Stress, adaptation, and disease: Allostasis and allostatic load. Annals of the New York Academy of Sciences, 840(1), 33–44.

[8] Thorp, A. A., et al. (2011). Sedentary behaviors and subsequent health outcomes in adults. American Journal of Preventive Medicine, 41(2), 207–215.

[9] Mark, G., Gudith, D., & Klocke, U. (2008). The cost of interrupted work: More speed and stress. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 107–110.

[10] Perlow, L. A., & Porter, J. L. (2009). Making time off predictable - and required. Harvard Business Review, 87(10), 102–109.

[11] Stone, A. A., et al. (1998). A comparison of coping assessed by ecological momentary assessment and retrospective recall. Journal of Personality and Social Psychology, 74(6), 1670–1680.

[12] Kodz, J., et al. (2003). Working Long Hours: A Review of the Evidence. UK Department of Trade and Industry.

[13] Walker, M. (2017). Why We Sleep: Unlocking the Power of Sleep and Dreams. Scribner.

[14] Ganio, M. S., et al. (2011). Mild dehydration impairs cognitive performance and mood of men. British Journal of Nutrition, 106(10), 1535–1543.

[15] Weinstein, N. D. (1999). What does it mean to understand a risk? Evaluating risk comprehension. Journal of the National Cancer Institute Monographs, 25, 15–20.