Data-Driven HR Strategy to Boost Retention & Performance

Introduction

In today’s fast-paced work environments, productivity is often judged by surface-level metrics like tasks completed or hours logged. While useful, these numbers don’t tell the whole story. They overlook important human factors like job satisfaction, training, and whether employees feel engaged in their work.

This article explores how data analytics and machine learning can offer a deeper look into what really drives productivity — and why some employees choose to leave. We go beyond basic performance stats by creating a Comprehensive Productivity Index (CPI), predicting who might resign, and grouping employees into behavioral segments using clustering. Finally, we use SHAP, a tool that helps explain AI decisions, to understand why the model makes certain predictions.

Whether you’re in HR, leadership, or analytics, this guide shows how modern data tools can help spot problems early, support your team better, and improve retention.

Data & Methodology

We used an open-source dataset from Kaggle that includes anonymized information on employees’ performance, satisfaction, and resignation status. Key fields include:

  • Hours worked
  • Projects handled
  • Satisfaction scores
  • Monthly salary
  • Training hours
  • Years at the company
  • Performance score
  • Resignation status

Nine-step HR analytics pipeline illustrating the data analysis process from data collection to improving productivity and retention.

Step 1: Creating a Comprehensive Productivity Index (CPI)

To understand productivity more holistically, we created a Comprehensive Productivity Index (CPI). Instead of relying on one measure, the CPI blends multiple aspects of performance:

  • Projects per hour worked (output efficiency)
  • Performance score per salary (value for investment)
  • Satisfaction score (engagement)
  • Training hours per year at the company (growth mindset)

Each factor was weighted equally, and the combined score (CPI) allowed us to fairly compare employees across roles and tenure.

Python code snippet for CPI calculation.
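The original CPI code appears only as a screenshot. The sketch below is a minimal pandas version of the four-component, equal-weight index described above. It is not the author's exact code. The input column names (such as `projects_handled` and `monthly_salary`) are hypothetical, and so are the `add_cpi` helper and its `min_tenure_years` and `standardize` parameters. Two choices are assumptions. First, tenure is clipped at one year so that new hires do not cause a division by zero. Second, an optional z-score step puts the components on a common scale before they are averaged.

```python
import pandas as pd

# Hypothetical names for the four derived CPI components.
COMPONENTS = ["output_efficiency", "value_for_investment", "engagement", "growth"]


def add_cpi(df: pd.DataFrame, min_tenure_years: float = 1.0, standardize: bool = False) -> pd.DataFrame:
    """Return a copy of df with the four CPI components and an equal-weight `cpi` column.

    Input column names (hours_worked, projects_handled, ...) are hypothetical.
    """
    out = df.copy()
    # Assumption: tenure below min_tenure_years counts as min_tenure_years,
    # so new hires (years_at_company == 0) do not trigger a division by zero.
    tenure = out["years_at_company"].clip(lower=min_tenure_years)

    out["output_efficiency"] = out["projects_handled"] / out["hours_worked"]  # projects per hour
    out["value_for_investment"] = out["performance_score"] / out["monthly_salary"]  # performance per salary
    out["engagement"] = out["satisfaction_score"]
    out["growth"] = out["training_hours"] / tenure  # training hours per year at the company

    parts = out[COMPONENTS]
    if standardize:
        # Optional z-scores (assumption) so each component contributes on the same scale.
        parts = (parts - parts.mean()) / parts.std(ddof=0)
    # Equal weights: the CPI is the plain mean of the four components.
    out["cpi"] = parts.mean(axis=1)
    return out


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the Kaggle dataset.
    employees = pd.DataFrame(
        {
            "hours_worked": [160, 200],
            "projects_handled": [8, 4],
            "performance_score": [4, 3],
            "monthly_salary": [4000, 6000],
            "satisfaction_score": [3.0, 4.0],
            "training_hours": [72, 45],
            "years_at_company": [0, 5],
        }
    )
    print(add_cpi(employees)[["cpi"]])
    print(add_cpi(employees, standardize=True)[["cpi"]])
```

With raw values, the training-per-year component can dwarf the others. A performance-per-salary ratio is a tiny fraction, while training hours per year can run into the dozens. For that reason, "equal weight" is only nominal unless the components are standardized. The article does not say which variant it used.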

A histogram of CPI scores revealed a typical “long tail”: most employees had moderate productivity, with a few high-performing outliers.

Distribution of Comprehensive Productivity Index (CPI) across employees: most are moderate, with a few outliers.

Step 2: Predicting Employee Turnover

Next, we used machine learning models to predict who might resign. We tested the following models:

  • Decision Tree
  • Random Forest
  • Gradient Boosting

We trained each model on 80% of the data and evaluated it on the remaining 20%. Gradient Boosting performed best on this held-out test set.
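The 80/20 evaluation above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the original notebook. The article does not name its evaluation metric, so the sketch scores each model by ROC AUC on the held-out 20%. It also stratifies the split so that the share of resignations is the same in both parts. `compare_models` is a hypothetical helper, and `make_classification` only produces synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, random_state: int = 42) -> dict:
    """Train each model on 80% of the rows and return its ROC AUC on the held-out 20%."""
    # stratify=y keeps the resigned/stayed ratio equal in both splits (an assumption;
    # the article only states the 80/20 split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Predicted probability of the positive class (label 1 = resigned, by assumption).
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba)
    return scores


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the HR features; replace with the real data.
    X_demo, y_demo = make_classification(
        n_samples=500, n_features=6, weights=[0.8, 0.2], random_state=0
    )
    results = compare_models(X_demo, y_demo)
    for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: ROC AUC = {auc:.3f}")
```

Because resignations are usually the minority class, a ranking metric such as ROC AUC is less misleading here than plain accuracy. Swap in whichever metric your team actually reports.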

Step 3: Segmenting Employees Using Clustering

To better understand workforce behaviors, we applied K-Means clustering, a technique that groups data points with similar characteristics. We grouped employees based on:

  • CPI
  • Satisfaction score
  • Training hours
  • Years at company

The clustering revealed three key employee segments with overlapping but distinct patterns.
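The clustering step can be sketched with scikit-learn's `KMeans`. The sketch assumes the four features sit in columns with the hypothetical names below, and `assign_segments` is likewise a hypothetical helper. Standardizing the features first is our assumption, not a step the article states. K-Means relies on Euclidean distance, so without scaling, training hours (tens of hours) would outweigh a satisfaction score measured in single digits. The choice of three clusters follows the article.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the four clustering features.
FEATURES = ["cpi", "satisfaction_score", "training_hours", "years_at_company"]


def assign_segments(df: pd.DataFrame, n_clusters: int = 3, random_state: int = 0) -> pd.Series:
    """Return a K-Means cluster label for each employee (row) in df."""
    # Standardize so large-range features do not dominate the distance calculation (assumption).
    X = StandardScaler().fit_transform(df[FEATURES])
    # n_init=10 runs K-Means from 10 random starts and keeps the best result.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    return pd.Series(labels, index=df.index, name="cluster")


if __name__ == "__main__":
    # Random stand-in data; replace with the real employee table.
    rng = np.random.default_rng(0)
    demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=FEATURES)
    demo["cluster"] = assign_segments(demo)
    print(demo["cluster"].value_counts())
```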

Step 4: Explaining Predictions with SHAP

Machine learning can often feel like a “black box.” To interpret our model’s predictions, we used SHAP (SHapley Additive exPlanations). SHAP shows how much each feature contributes to an individual prediction.

The SHAP summary plot showed:

  • Satisfaction had the largest impact: lower scores significantly increased resignation risk.
  • Training hours showed a mixed effect; lots of training without support may signal burnout.
  • CPI had some influence, but less than satisfaction (emotional engagement).

Understanding the CPI: A Smarter Way to Measure Productivity

Productivity is more than just hours or task counts. The Comprehensive Productivity Index (CPI) lets us look at multiple aspects of performance:

Factor                 | What It Tells Us
Projects per hour      | Output efficiency
Performance per salary | Value for investment
Satisfaction score     | Emotional engagement
Training per year      | Growth and development effort

This blended score gives companies a more meaningful way to identify high performers and spot areas for improvement.

Predicting Resignations with Machine Learning

Top predictors of resignation:

  • Employee satisfaction
  • Training hours
  • Years at the company

CPI on its own was not a strong predictor, which suggests that engagement and support matter more than productivity.

Key Insight: Even high performers are at risk of leaving if they’re not satisfied or well-supported.

Segmenting the Workforce: The 3 Employee Types We Found

Clusters of employees based on productivity, satisfaction, training, and tenure. While some overlap exists, distinct segments emerge, supporting tailored workforce strategies.

With clustering, we identified three overlapping but informative employee segments. Although the groups were not sharply separated, the analysis still revealed meaningful patterns in productivity, satisfaction, and training behaviors.

Average values of CPI, satisfaction, training, and tenure across employee clusters.
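Per-cluster averages like those in this table can be computed with a pandas `groupby`. The sketch assumes a DataFrame with a `cluster` label column plus the four metrics under hypothetical column names, and `cluster_profile` is a hypothetical helper. It also adds a per-cluster employee count, which is not in the original table but helps judge how much weight each average deserves.

```python
import pandas as pd

# Hypothetical column names; "cluster" holds each employee's K-Means label.
PROFILE_COLS = ["cpi", "satisfaction_score", "training_hours", "years_at_company"]


def cluster_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of each metric per cluster (rounded to 2 decimals) plus the cluster size."""
    profile = df.groupby("cluster")[PROFILE_COLS].mean().round(2)
    # size() counts rows per cluster; it aligns on the same cluster index.
    profile["n_employees"] = df.groupby("cluster").size()
    return profile


if __name__ == "__main__":
    # Toy values for illustration only, not the article's results.
    toy = pd.DataFrame(
        {
            "cluster": [0, 0, 1, 2],
            "cpi": [1.0, 3.0, 5.0, 2.0],
            "satisfaction_score": [2.0, 4.0, 5.0, 1.0],
            "training_hours": [10, 20, 30, 40],
            "years_at_company": [1, 3, 5, 6],
        }
    )
    print(cluster_profile(toy))
```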

Cluster 0: High-Output Newcomers

  • Very high CPI (12.15)
  • Moderate satisfaction (3.03)
  • High training hours (~72)
  • New to the company (less than 1 year)

These are highly trained, very productive newcomers. But their satisfaction isn’t yet high — possibly because they’re still adjusting or under pressure.

Action: Focus on onboarding support and early engagement to prevent burnout.

Cluster 1: Stable Contributors

  • Moderate CPI (3.21)
  • High satisfaction (4.01)
  • Consistent training (~45 hours)
  • Long tenure (5+ years)

These are reliable, satisfied employees with experience and balanced productivity.

Action: Keep them engaged with growth opportunities and recognition — they’re your core workforce.

Cluster 2: Disengaged Veterans

  • Lowest CPI (2.68)
  • Very low satisfaction (1.99)
  • Training and tenure similar to Cluster 1

Despite being with the company a long time, these employees are unhappy and underperforming.

Action: Consider coaching, role shifts, or deeper engagement strategies to reconnect them.

Key Takeaway: Tenure and training don’t guarantee performance or satisfaction. How employees feel about their work is just as important as how long they’ve been doing it.

Making Predictions Explainable with SHAP

To go deeper into why someone might leave, we used SHAP values. These show how much each factor (like satisfaction or training) influenced the model’s prediction.

SHAP summary plot showing the impact of each feature (such as satisfaction score, training hours, and work hours) on resignation predictions. Employee satisfaction had the strongest effect, with low satisfaction strongly increasing turnover risk.

SHAP Insights:

SHAP helped us understand why certain employees were predicted to resign:

  • Low satisfaction → higher resignation risk
  • Excessive training without support → possible burnout
  • CPI and tenure had weaker influence

Final Recommendations

Based on everything we learned, here’s how organizations can use data to improve productivity and retention:

Focus on Employee Satisfaction

  • Satisfaction was a stronger predictor of whether people stay than productivity was.
  • Use regular surveys and open communication to catch dissatisfaction early.

Train with Purpose

  • Training is valuable, but it needs to align with career goals.
  • Avoid overwhelming employees with training if they’re not supported emotionally or professionally.

Customize Your HR Strategy

  • Use employee segments to tailor support.
  • Newcomers need onboarding.
  • Veterans may need fresh challenges.
  • Stable contributors may be ready for leadership or stretch roles.

Use Predictive Tools Proactively

  • Predictive models and SHAP help you spot flight risks early.
  • Use data not just to understand who left — but who might leave next.

Key Takeaways

  • Surface metrics miss much of what drives productivity; a blended index like CPI captures more of the picture.
  • High performers can still leave if they’re disengaged.
  • Employee clusters reveal meaningful patterns for tailored HR action.
  • SHAP makes AI decisions transparent and actionable.

Conclusion

This project underscores the power of data-driven insight in understanding productivity, engagement, and retention. But turning these insights into action takes more than tools — it takes strategy.

At Productivity Advocates, we help organizations look past surface-level KPIs and uncover the human stories behind the data. Through advanced analytics like clustering, predictive modeling, and SHAP, we empower leaders to:

  • Identify at-risk employees before it’s too late
  • Tailor engagement strategies to different employee types
  • Improve training, morale, and retention

Don’t let guesswork drive your workforce strategy. Use data science with a human lens to build a stronger, more sustainable team.

Partner with Productivity Advocates. Let your data tell the story — and make decisions that matter.

Image Credits

All images, charts, and graphics in this article were created by Stephanie M. Liechty using Google Colab and Canva.
