How It Works: Burnout Risk Detection
A detailed look at the science, algorithm, and design decisions behind dev-zen's burnout risk score.
1. Why Burnout Matters for Developers
Burnout is not just "feeling tired." The World Health Organization classified occupational burnout as a syndrome in ICD-11 (2019), characterised by three dimensions: emotional exhaustion, depersonalisation (cynicism toward work), and reduced professional efficacy [1]. Software developers face a disproportionately high risk:
- 40 % of developers reported experiencing burnout in 2024 (GitLab DevSecOps Survey) [2].
- 83 % of developers cited burnout as a contributing factor to security incidents (Haystack Analytics) [3].
- Developer burnout costs companies an estimated $6,200 to $12,000 per employee per year in lost productivity, turnover, and errors [4].
The challenge is that burnout is gradual. People rarely notice the slide from "a tough week" to "I can't look at another pull request." dev-zen tackles this by computing a daily, objective risk signal from data the user already tracks (mood, stress, sleep, hydration, and work hours), turning subjective feelings into an actionable early warning.
2. Research Foundations
Our model draws on three bodies of established research:
2.1 Maslach Burnout Inventory (MBI)
The gold-standard burnout assessment since 1981, the MBI measures emotional exhaustion, depersonalisation, and personal accomplishment via a 22-item questionnaire [5]. While clinically validated, a 22-item survey is too heavy for daily tracking. dev-zen operationalises the MBI's core dimensions through proxy signals:
| MBI Dimension | dev-zen Proxy Signal |
|---|---|
| Emotional Exhaustion | Low mood trend + high stress |
| Depersonalisation | Declining engagement (mood + habit completion) |
| Reduced Efficacy | Elevated work hours without corresponding recovery |
2.2 Job Demands-Resources (JD-R) Model
Bakker & Demerouti's JD-R model (2007) frames burnout as the result of an imbalance between demands (workload, time pressure) and resources (recovery, social support, autonomy) [6]. dev-zen maps this directly:
- Demands: Stress level, work hours
- Resources: Sleep quality, hydration, mood (as a proxy for psychological resource availability)
When demands consistently exceed resources, burnout risk rises.
2.3 Allostatic Load Theory
McEwen's concept of allostatic load (1998) describes the cumulative physiological cost of chronic stress [7]. Sleep deprivation, dehydration, and sustained high cortisol (indicated by user-reported stress) all increase allostatic load. Our algorithm captures this through multiple biological and behavioural channels rather than relying on any single metric.
2.4 Developer-Specific Research
- Prolonged sitting + screen time compounds cognitive fatigue (Thorp et al., 2011) [8].
- Flow state disruption from notifications and context switching creates hidden stress spikes (Mark et al., 2008) [9].
- "Always-on" culture in software teams blurs recovery boundaries (Perlow & Porter, 2009) [10].
These informed our decision to weight recovery signals (sleep, hydration) alongside psychological signals (mood, stress).
3. The Algorithm
3.1 Overview
dev-zen computes a daily Burnout Risk Score (0–100) using a weighted linear model. The score is deterministic (the same inputs always produce the same output) and runs entirely on-device (offline-capable, no data leaves the phone).
Burnout Score = Σ (weight_i × normalised_component_i) × 100
Higher score = higher burnout risk.
3.2 Input Signals
The algorithm consumes a 7-day rolling window of user-logged data:
| Signal | Source | Raw Range | Aggregation |
|---|---|---|---|
| Mood | Daily mood logs | 1–5 (awful → great) | Mean over last 7 days |
| Stress | Daily stress logs | 1–10 (calm → extreme) | Mean over last 7 days |
| Sleep | Apple Health / manual | Hours per night | Mean over last 7 days |
| Work Hours | User profile (onboarding) | 4–16 hours/day | Static value |
| Hydration | Water log entries | Glasses per day | Mean over days with logs |
Why 7 days? A 7-day window captures short-term trends (a bad week) without being diluted by older data. It also aligns with standard epidemiological practice for self-reported wellbeing measures [11]. Recomputation happens on each dashboard load, so the score is always current.
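The 7-day aggregation can be sketched as follows. This is a minimal illustration, not dev-zen's actual code: the `DailyEntry` shape and the `meanOverWindow` helper are hypothetical names, and returning null for an empty window is an assumption.

```typescript
// Hypothetical shape for one logged value; not taken from dev-zen's schema.
interface DailyEntry {
  date: string; // ISO date, e.g. "2025-01-08"
  value: number;
}

const WINDOW_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Mean of the entries dated within the 7 days ending on `today` (inclusive).
// Assumption: returns null when nothing was logged inside the window.
function meanOverWindow(entries: DailyEntry[], today: string): number | null {
  const end = Date.parse(today); // date-only ISO strings parse as UTC midnight
  const start = end - (WINDOW_DAYS - 1) * MS_PER_DAY;
  const inWindow = entries.filter((entry) => {
    const t = Date.parse(entry.date);
    return t >= start && t <= end;
  });
  if (inWindow.length === 0) return null;
  const total = inWindow.reduce((sum, entry) => sum + entry.value, 0);
  return total / inWindow.length;
}
```

Only days that have a log contribute to the mean, which matches the "mean over days with logs" rule listed for hydration.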
3.3 Normalisation
Each signal is normalised to a 0β1 scale where 1 = worst case (maximum burnout contribution) and 0 = best case (no burnout contribution):
| Component | Formula | Worst Case (→ 1) | Best Case (→ 0) |
|---|---|---|---|
| Mood | (5 - moodAvg) / 4 | Mood = 1 (awful) | Mood = 5 (great) |
| Stress | (stressAvg - 1) / 9 | Stress = 10 (extreme) | Stress = 1 (calm) |
| Work | (workHours - 4) / 12 | 16 hours/day | 4 hours/day |
| Sleep | 1 - (sleepAvg - 5) / 4 | ≤ 5 hours/night | ≥ 9 hours/night |
| Hydration | 1 - (waterAvg / waterGoal) | 0 glasses | Meeting/exceeding goal |
All values are clamped to [0, 1] after computation.
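The normalisation formulas and the clamping step could look like this in TypeScript. The function names are illustrative rather than taken from src/lib/burnout.ts, and the guard for a non-positive water goal is an assumption added to avoid a division by zero.

```typescript
// Clamp a value into [0, 1], as required after each formula.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Each function mirrors one row of the normalisation table (1 = worst case).
function normaliseMood(moodAvg: number): number {
  return clamp01((5 - moodAvg) / 4);
}

function normaliseStress(stressAvg: number): number {
  return clamp01((stressAvg - 1) / 9);
}

function normaliseWork(workHours: number): number {
  return clamp01((workHours - 4) / 12);
}

function normaliseSleep(sleepAvg: number): number {
  return clamp01(1 - (sleepAvg - 5) / 4);
}

function normaliseHydration(waterAvg: number, waterGoal: number): number {
  // Assumption: a non-positive goal counts as "no deficit" rather than dividing by zero.
  if (waterGoal <= 0) return 0;
  return clamp01(1 - waterAvg / waterGoal);
}
```

Clamping matters at the edges: 4 hours of sleep would otherwise give 1.25, and drinking more than the goal would give a negative hydration component.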
3.4 Weights
Weight distribution:
| Signal | Weight | Group |
|---|---|---|
| Mood | 30 % | Psychological |
| Stress | 30 % | Psychological |
| Work Hours | 20 % | Behavioural / Physical |
| Sleep | 10 % | Behavioural / Physical |
| Hydration | 10 % | Behavioural / Physical |
Psychological signals total 60 %; behavioural / physical signals total 40 %.
Rationale for the 30-30-20-10-10 distribution:
Mood (30 %) and Stress (30 %): Psychological signals are the strongest predictors of burnout per MBI and JD-R literature. Mood captures the exhaustion axis; stress captures the demands axis. Together they account for 60 % of the score, reflecting that burnout is fundamentally a psychological phenomenon.
Work Hours (20 %): The single strongest objective demand signal. Research consistently links >50 hours/week with elevated burnout risk (Kodz et al., 2003) [12]. Weighted lower than mood/stress because perception of workload matters more than raw hours: some developers thrive on long hours if the work is engaging.
Sleep (10 %): Sleep deprivation impairs emotional regulation and amplifies perceived stress (Walker, 2017) [13]. Weighted at 10 % because the current input is coarse (hours only, no quality metrics) and Apple Health integration is pending.
Hydration (10 %): Even mild dehydration (1–2 % body mass loss) impairs cognitive performance and mood (Ganio et al., 2011) [14]. Weighted lowest because the relationship to burnout is indirect: it's a recovery/self-care proxy more than a direct burnout driver.
3.5 Final Score Computation
score = (mood_weight × mood_norm
+ stress_weight × stress_norm
+ work_weight × work_norm
+ sleep_weight × sleep_norm
+ hydration_weight × hydration_norm) × 100
Result is rounded to the nearest integer and clamped to [0, 100].
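A minimal sketch of the weighted sum, rounding, and clamping, assuming the five components have already been normalised to [0, 1]. The names `SCORE_WEIGHTS` and `weightedScore` are illustrative and may not match the real implementation.

```typescript
// Weights from the 30-30-20-10-10 distribution.
const SCORE_WEIGHTS = { mood: 0.3, stress: 0.3, work: 0.2, sleep: 0.1, hydration: 0.1 };

type NormalisedComponents = Record<keyof typeof SCORE_WEIGHTS, number>;

// Weighted linear combination, scaled to 0-100, rounded, then clamped.
function weightedScore(c: NormalisedComponents): number {
  const raw =
    SCORE_WEIGHTS.mood * c.mood +
    SCORE_WEIGHTS.stress * c.stress +
    SCORE_WEIGHTS.work * c.work +
    SCORE_WEIGHTS.sleep * c.sleep +
    SCORE_WEIGHTS.hydration * c.hydration;
  return Math.min(100, Math.max(0, Math.round(raw * 100)));
}
```

Because the weights sum to 1 and every component lies in [0, 1], the final clamp only guards against floating-point drift or unclamped inputs.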
3.6 Worked Examples
Example A: Healthy Developer
| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 4.2 / 5 | (5 - 4.2) / 4 = 0.20 |
| Stress avg | 3.0 / 10 | (3.0 - 1) / 9 = 0.22 |
| Work hours | 8 h | (8 - 4) / 12 = 0.33 |
| Sleep avg | 7.5 h | 1 - (7.5 - 5) / 4 = 0.375 |
| Hydration avg | 7 / 8 goal | 1 - 7/8 = 0.125 |
Score = (0.30 × 0.20 + 0.30 × 0.22 + 0.20 × 0.33 + 0.10 × 0.375 + 0.10 × 0.125) × 100
= (0.060 + 0.066 + 0.066 + 0.0375 + 0.0125) × 100
= 0.242 × 100
= 24 → Low Risk ✅
Example B: Developer Approaching Burnout
| Signal | Value | Normalised |
|---|---|---|
| Mood avg | 2.1 / 5 | (5 - 2.1) / 4 = 0.725 |
| Stress avg | 7.5 / 10 | (7.5 - 1) / 9 = 0.722 |
| Work hours | 12 h | (12 - 4) / 12 = 0.667 |
| Sleep avg | 5.5 h | 1 - (5.5 - 5) / 4 = 0.875 |
| Hydration avg | 3 / 8 goal | 1 - 3/8 = 0.625 |
Score = (0.30 × 0.725 + 0.30 × 0.722 + 0.20 × 0.667 + 0.10 × 0.875 + 0.10 × 0.625) × 100
= (0.2175 + 0.2166 + 0.1334 + 0.0875 + 0.0625) × 100
= 0.7175 × 100
= 72 → High Risk 🔴
Primary driver: Mood (highest weighted contribution at 0.2175).
4. Risk Levels
The continuous 0–100 score is mapped to three risk levels for user-facing communication:
| Score Range | Risk Level | Colour | User Message |
|---|---|---|---|
| 0–39 | Low | Teal (#18B89A) | "Great work - keep up your healthy routines." |
| 40–69 | Moderate | Amber (#E8940A) | "Watch your stress and sleep this week." |
| 70–100 | High | Red (#E84848) | "Take a break. Your burnout risk is elevated." |
Why three buckets, not five or a continuous gradient?
Risk-communication research suggests that people respond better to categorical risk labels than to precise numbers for behaviour change (Weinstein, 1999) [15]. Three levels are actionable: "you're fine", "pay attention", or "intervene now." More granularity creates decision paralysis; fewer levels lose the "watch out" middle ground.
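The thresholds in the risk-level table can be expressed as a small lookup. This is a sketch: `riskLevelFor` and `RISK_COLOURS` are illustrative names, not confirmed identifiers from the codebase.

```typescript
type RiskLevel = "low" | "moderate" | "high";

// Colours copied from the risk-level table.
const RISK_COLOURS: Record<RiskLevel, string> = {
  low: "#18B89A",
  moderate: "#E8940A",
  high: "#E84848",
};

// Map an integer score in [0, 100] to one of the three buckets.
function riskLevelFor(score: number): RiskLevel {
  if (score >= 70) return "high";
  if (score >= 40) return "moderate";
  return "low";
}
```

Checking the highest threshold first keeps each boundary (40 and 70) in exactly one place.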
5. Primary Driver Identification
Beyond the aggregate score, the algorithm identifies the primary driver: the single component contributing the most weighted risk. This is the component with the highest value of (weight_i × normalised_component_i).
This powers the actionable insight: "Your burnout risk is Moderate. Main driver: stress. Consider taking a 10-minute walk between meetings."
When all inputs are null (new user, no data yet), primaryDriver is null and the UI shows a prompt to start logging.
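One way to pick the primary driver is an argmax over the weighted contributions. The sketch below makes two assumptions the text does not confirm: signals with no data are skipped rather than filled with defaults, and ties go to the signal listed first. `findPrimaryDriver` is a hypothetical name.

```typescript
type SignalName = "mood" | "stress" | "work" | "sleep" | "hydration";

const DRIVER_WEIGHTS: Record<SignalName, number> = {
  mood: 0.3,
  stress: 0.3,
  work: 0.2,
  sleep: 0.1,
  hydration: 0.1,
};

// Returns the signal with the largest weight × normalised value,
// or null when no signal has data.
function findPrimaryDriver(
  components: Partial<Record<SignalName, number | null>>
): SignalName | null {
  let best: SignalName | null = null;
  let bestContribution = -Infinity;
  for (const key of Object.keys(DRIVER_WEIGHTS) as SignalName[]) {
    const value = components[key];
    if (value == null) continue; // assumption: missing signals cannot be the driver
    const contribution = DRIVER_WEIGHTS[key] * value;
    if (contribution > bestContribution) {
      best = key;
      bestContribution = contribution;
    }
  }
  return best;
}
```

With Example B's normalised values, mood contributes 0.2175 against 0.2166 for stress, so the function returns "mood".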
6. Handling Missing Data
Real users don't log perfectly. The algorithm gracefully handles null (missing) inputs:
| Missing Signal | Default Normalised Value | Rationale |
|---|---|---|
| Mood | 0.4 | Slightly pessimistic; nudges user to log |
| Stress | 0.3 | Conservative; assumes some baseline stress |
| Sleep | 0.3 | Conservative; assumes some sleep deficit |
| Hydration | 0.3 | Conservative; assumes mild underhydration |
Design principle: "absent data is not good data." We deliberately avoid defaulting to zero (which would imply "everything is fine") because a user who stops logging may be disengaging, which is itself a burnout signal. The conservative defaults produce a mildly elevated score that encourages the user to resume logging without triggering false alarms.
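The defaults can be applied with a nullish fallback per signal. This is a sketch with hypothetical names (`MISSING_DEFAULTS`, `fillMissing`); work hours are left out because they come from the user profile rather than daily logs.

```typescript
// Normalised values for the four logged signals; null means "no data in the window".
interface LoggedComponents {
  mood: number | null;
  stress: number | null;
  sleep: number | null;
  hydration: number | null;
}

// Default normalised values from the missing-data table.
const MISSING_DEFAULTS = { mood: 0.4, stress: 0.3, sleep: 0.3, hydration: 0.3 };

// Replace each null with its default. `??` keeps a legitimate 0 (best case),
// which `||` would wrongly overwrite.
function fillMissing(c: LoggedComponents) {
  return {
    mood: c.mood ?? MISSING_DEFAULTS.mood,
    stress: c.stress ?? MISSING_DEFAULTS.stress,
    sleep: c.sleep ?? MISSING_DEFAULTS.sleep,
    hydration: c.hydration ?? MISSING_DEFAULTS.hydration,
  };
}
```

As a check on the "mildly elevated" claim: with every logged signal missing and an 8-hour work day (work component 0.333), the score is (0.30 × 0.4 + 0.30 × 0.3 + 0.20 × 0.333 + 0.10 × 0.3 + 0.10 × 0.3) × 100 ≈ 34, which stays in the Low band.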
7. Data Pipeline
User Logs → 7-Day Agg → Algorithm → Dashboard
| Stage | Contents | Backed by |
|---|---|---|
| User Logs | mood, stress, water | WatermelonDB (mood_logs, stress_logs, water_logs) |
| 7-Day Agg | averages per signal | |
| Algorithm | normalise, weight, sum | Pure function (src/lib/burnout.ts); output written to the burnout_scores table (daily snapshot for trend analysis) |
| Dashboard | score, risk label, driver | UI Card + Wellbeing Ring |
- Collection: User logs mood (1–5), stress (1–10), and water (glasses) throughout the day.
- Aggregation: The useTodayData() hook queries WatermelonDB for the last 7 days and computes means.
- Computation: calculateBurnoutScore(), a pure function, normalises, weights, and sums.
- Persistence: Result is stored in the burnout_scores table with all five normalised components, enabling historical trend charts.
- Display: The Today dashboard renders a colour-coded card with score, risk label, contextual message, and a wellbeing progress ring (inverted score).
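The persistence and display steps might be modelled as below. Everything here is an assumption for illustration: the `BurnoutSnapshot` shape is not the real burnout_scores schema, and "inverted score" is read as 100 minus the score.

```typescript
// Hypothetical snapshot shape; the real burnout_scores columns are not shown in this document.
interface BurnoutSnapshot {
  date: string; // ISO date of the daily snapshot
  score: number; // 0-100 burnout risk score
  components: {
    mood: number;
    stress: number;
    work: number;
    sleep: number;
    hydration: number;
  };
}

// Build the record that would be stored for trend charts.
function makeSnapshot(
  date: string,
  score: number,
  components: BurnoutSnapshot["components"]
): BurnoutSnapshot {
  return { date, score, components: { ...components } };
}

// Assumption: the ring shows 100 - score, so a fuller ring means lower risk.
function wellbeingRingValue(score: number): number {
  return 100 - Math.min(100, Math.max(0, score));
}
```

Under this reading, Example A's score of 24 would fill the ring to 76, and Example B's 72 would fill it to 28.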
8. Limitations & Future Improvements
Current Limitations
| Limitation | Impact | Planned Mitigation |
|---|---|---|
| Sleep data is null | 10 % of the model runs on a default value | Apple Health integration (Sprint 4) |
| Work hours are static | Doesn't capture overtime spikes | Calendar/time-tracking API integration |
| Linear model | Cannot capture nonlinear interactions (e.g., low sleep + high stress compounds worse than the sum) | TFLite on-device ML model (v2) |
| Self-reported data | Subject to mood-congruent bias (stressed people rate stress higher) | Passive sensors (HRV, screen time) |
| No temporal weighting | Day 1 and day 7 contribute equally | Exponential decay weighting |
Roadmap (v2: ML-Based)
The current rule-based model is a deliberate v1 choice: interpretable, debuggable, and privacy-safe. The v2 upgrade path involves:
- On-device TFLite model trained on anonymised, opt-in data.
- HRV (Heart Rate Variability) from Apple Watch / Google Health as a physiological stress indicator.
- Screen time as a passive work-hours proxy.
- Temporal weighting: Recent days count more (exponential decay with λ = 0.85).
- Interaction terms: Sleep × Stress interaction to capture compounding effects.
- Personalised baselines: Individual thresholds learned from each user's historical data, rather than fixed cutoffs.
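To illustrate the planned temporal weighting, here is one possible exponentially weighted mean. It assumes λ = 0.85 is a per-day decay factor, so a value k days old gets weight λ^k; the roadmap does not specify this, and `decayWeightedMean` is a hypothetical name.

```typescript
// Exponentially weighted mean over daily values, ordered oldest first.
// Assumption: the most recent day has weight 1 and each older day is
// multiplied by `lambda` once per day of age.
function decayWeightedMean(valuesOldestFirst: number[], lambda = 0.85): number | null {
  if (valuesOldestFirst.length === 0) return null;
  const lastIndex = valuesOldestFirst.length - 1;
  let weightedSum = 0;
  let weightTotal = 0;
  valuesOldestFirst.forEach((value, i) => {
    const weight = Math.pow(lambda, lastIndex - i);
    weightedSum += weight * value;
    weightTotal += weight;
  });
  return weightedSum / weightTotal;
}
```

Under this assumption, the oldest day in a 7-day window carries a weight of 0.85^6 ≈ 0.38 relative to today, so a recent stress spike moves the average more than one from last week. Setting lambda to 1 recovers the plain mean used in v1.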
9. Privacy & Ethics
- All computation is on-device. No health data leaves the phone. The burnout score is computed locally using WatermelonDB (SQLite on iOS/Android, LokiJS on web).
- No external ML inference. The rule-based model requires zero network access.
- User controls all data. Logs can be deleted at any time; the score recomputes from available data.
- No employer visibility. dev-zen is a personal tool: there is no team dashboard, manager view, or data sharing.
- Conservative defaults. Missing data nudges the score up slightly (encouraging logging) rather than down (falsely reassuring).
10. Validation & Testing
The algorithm is covered by a comprehensive test suite (src/lib/__tests__/burnout.test.ts):
- Boundary tests: Worst-case inputs → score ≥ 70; best-case inputs → score < 10.
- Range validation: Score always in [0, 100]; components always in [0, 1].
- Risk level mapping: Verified at boundary values (39 → low, 40 → moderate, 69 → moderate, 70 → high).
- Sensitivity tests: Each component independently raises the score when worsened.
- Determinism: Same inputs → identical output (pure function, no side effects).
- Null handling: All-null inputs → valid result with null primaryDriver; partial nulls → valid score with identified driver.
References
[1] World Health Organization. (2019). International Classification of Diseases 11th Revision (ICD-11). Burn-out (QD85). https://icd.who.int/browse/2024-01/mms/en#129180281
[2] GitLab. (2024). 2024 Global DevSecOps Report. https://about.gitlab.com/developer-survey/
[3] Haystack Analytics. (2023). Developer burnout and its impact on software security.
[4] Maslach, C., & Leiter, M. P. (2016). Understanding the burnout experience: Recent research and its implications for psychiatry. World Psychiatry, 15(2), 103–111.
[5] Maslach, C., Jackson, S. E., & Leiter, M. P. (1996). Maslach Burnout Inventory Manual (3rd ed.). Consulting Psychologists Press.
[6] Bakker, A. B., & Demerouti, E. (2007). The Job Demands-Resources model: State of the art. Journal of Managerial Psychology, 22(3), 309–328.
[7] McEwen, B. S. (1998). Stress, adaptation, and disease: Allostasis and allostatic load. Annals of the New York Academy of Sciences, 840(1), 33–44.
[8] Thorp, A. A., et al. (2011). Sedentary behaviors and subsequent health outcomes in adults. American Journal of Preventive Medicine, 41(2), 207–215.
[9] Mark, G., Gudith, D., & Klocke, U. (2008). The cost of interrupted work: More speed and stress. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 107–110.
[10] Perlow, L. A., & Porter, J. L. (2009). Making time off predictable - and required. Harvard Business Review, 87(10), 102–109.
[11] Stone, A. A., et al. (1998). A comparison of coping assessed by ecological momentary assessment and retrospective recall. Journal of Personality and Social Psychology, 74(6), 1670–1680.
[12] Kodz, J., et al. (2003). Working Long Hours: A Review of the Evidence. UK Department of Trade and Industry.
[13] Walker, M. (2017). Why We Sleep: Unlocking the Power of Sleep and Dreams. Scribner.
[14] Ganio, M. S., et al. (2011). Mild dehydration impairs cognitive performance and mood of men. British Journal of Nutrition, 106(10), 1535–1543.
[15] Weinstein, N. D. (1999). What does it mean to understand a risk? Evaluating risk comprehension. Journal of the National Cancer Institute Monographs, 25, 15–20.