{"id":2152,"date":"2017-01-05T10:18:50","date_gmt":"2017-01-05T15:18:50","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2152"},"modified":"2018-03-24T02:34:14","modified_gmt":"2018-03-24T07:34:14","slug":"predicting-when-people-quit-their-jobs","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2017\/01\/05\/predicting-when-people-quit-their-jobs\/","title":{"rendered":"Predicting When People Quit Their Jobs"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>2017 is upon us and that means some of you may be going into your annual review or thinking about your career after graduation. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a> used machine learning on a job-related dataset for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Predictive_analytics\">predictive analytics<\/a>. Let's see what he learned.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/hr_analytics.jpg\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#d8b06063-22a7-4d28-ad2c-e28f9bc1b3f2\">Dataset<\/a><\/li><li><a href=\"#30aa61e9-c83c-4f46-ad9e-e900ce9eb909\">Big Picture - How Bad Is Turnover at This Company?<\/a><\/li><li><a href=\"#53159959-d07b-4fca-bb41-c166c2be0794\">Defining Who Are the \"Best\"<\/a><\/li><li><a href=\"#02c689ab-974e-45d6-98c9-32b551de4825\">Defining Who Are the \"Most Experienced\"<\/a><\/li><li><a href=\"#a32df18d-ffe4-47f9-95c2-08f97bc3c87d\">Job Satisfaction Among High Risk Group<\/a><\/li><li><a href=\"#b0ad3e01-9dcf-4db7-b213-8bf1245ac265\">Was It for Money?<\/a><\/li><li><a href=\"#17f9139b-34a9-46be-a99a-9efd21b5a4c5\">Making Insights Actionable with Predictive Analytics<\/a><\/li><li><a href=\"#8a14240e-ff0a-4af7-9950-fcaf6ef00d74\">Evaluating the Predictive Performance<\/a><\/li><li><a href=\"#347ef57f-8ac9-4b2a-962f-b42154fcd703\">Using the 
Model to Take Action<\/a><\/li><li><a href=\"#13ec5457-e0fb-4d5a-9e76-08fa969743cc\">Explaining the Model<\/a><\/li><li><a href=\"#22a3cf56-02c0-4461-8bc0-178c58471277\">Operationalizing Action<\/a><\/li><li><a href=\"#92694283-dca5-44e1-8833-62a174f83b27\">Summary<\/a><\/li><\/ul><\/div><p>Companies spend money and time recruiting talent and they lose all that investment when people leave. Therefore companies can save money if they can intervene before their employees leave. Perhaps it is a sign of a robust economy that one of the popular datasets on Kaggle deals with this issue: <a href=\"\">Human Resources Analytics - Why are our best and most experienced employees leaving prematurely?<\/a>  Note that this dataset is no longer available on Kaggle.<\/p><p>This is an example of <b>predictive analytics<\/b>, where you try to predict future events based on historical data using machine learning algorithms. When people talk about predictive analytics, you often hear that the key is to turn insights into action. What does that really mean? Let's also examine this question through an exploration of this dataset.<\/p><h4>Dataset<a name=\"d8b06063-22a7-4d28-ad2c-e28f9bc1b3f2\"><\/a><\/h4><p>The Kaggle page says \"our example concerns a big company that wants to understand why some of their best and most experienced employees are leaving prematurely. The company also wishes to predict which valuable employees will leave next.\"<\/p><p>The fields in the dataset include:<\/p><div><ul><li>Employee satisfaction level, on a scale of 0 to 1<\/li><li>Last evaluation, on a scale of 0 to 1<\/li><li>Number of projects<\/li><li>Average monthly hours<\/li><li>Time spent at the company in years<\/li><li>Whether they have had a work accident<\/li><li>Whether they have had a promotion in the last 5 years<\/li><li>Sales (which actually means job function)<\/li><li>Salary - low, medium, or high<\/li><li>Whether the employee has left<\/li><\/ul><\/div><p>Let's load it into MATLAB. 
The new <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/detectimportoptions.html\">detectImportOptions<\/a><\/tt> makes it easy to set up import options based on the file content.<\/p><pre class=\"codeinput\">opts = detectImportOptions(<span class=\"string\">'HR_comma_sep.csv'<\/span>);     <span class=\"comment\">% set import options<\/span>\r\nopts.VariableTypes(9:10) = {<span class=\"string\">'categorical'<\/span>};         <span class=\"comment\">% turn text to categorical<\/span>\r\ncsv = readtable(<span class=\"string\">'HR_comma_sep.csv'<\/span>, opts);          <span class=\"comment\">% import data<\/span>\r\nfprintf(<span class=\"string\">'Number of rows: %d\\n'<\/span>,height(csv))         <span class=\"comment\">% show number of rows<\/span>\r\n<\/pre><pre class=\"codeoutput\">Number of rows: 14999\r\n<\/pre><p>We will then hold out 10% of the dataset for model evaluation (<tt>holdout<\/tt>), and use the remaining 90% (<tt>train<\/tt>) to explore the data and train predictive models.<\/p><pre class=\"codeinput\">rng(1)                                              <span class=\"comment\">% for reproducibility<\/span>\r\nc = cvpartition(csv.left,<span class=\"string\">'HoldOut'<\/span>,0.1);            <span class=\"comment\">% partition data<\/span>\r\ntrain = csv(training(c),:);                         <span class=\"comment\">% for training<\/span>\r\nholdout = csv(test(c),:);                           <span class=\"comment\">% for model evaluation<\/span>\r\n<\/pre><h4>Big Picture - How Bad Is Turnover at This Company?<a name=\"30aa61e9-c83c-4f46-ad9e-e900ce9eb909\"><\/a><\/h4><p>The first thing to understand is how bad a problem this company has. Assuming each row represents an employee, this company employed close to 15,000 people over some unspecified period, and about 24% of them left during that same period. Is this bad? Turnover is usually calculated on an annual basis and we don't know what period this dataset covers. 
Also the turnover rate differs from industry to industry. That said, this seems pretty high for a company of this size with an internal R&amp;D team.<\/p><p>When you break it down by job function, the turnover seems to be correlated with job satisfaction. For example, HR and accounting have low median satisfaction levels and high turnover ratios, whereas R&amp;D and Management have higher satisfaction levels and lower turnover ratios.<\/p><p>Please note the use of new functions in R2016b: <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/xticklabels.html\">xticklabels<\/a><\/tt> to set x-axis tick labels and <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/xtickangle.html\">xtickangle<\/a><\/tt> to rotate x-axis tick labels.<\/p><pre class=\"codeinput\">[g, job] = findgroups(train.sales);                 <span class=\"comment\">% group by sales<\/span>\r\njob = cellstr(job);                                 <span class=\"comment\">% convert to cell<\/span>\r\njob = replace(job,<span class=\"string\">'and'<\/span>,<span class=\"string\">'&amp;'<\/span>);                       <span class=\"comment\">% clean up<\/span>\r\njob = replace(job,<span class=\"string\">'_'<\/span>,<span class=\"string\">' '<\/span>);                         <span class=\"comment\">% more clean up<\/span>\r\njob = regexprep(job,<span class=\"string\">'(^| )\\s*.'<\/span>,<span class=\"string\">'${upper($0)}'<\/span>);    <span class=\"comment\">% capitalize first letter<\/span>\r\njob = replace(job,<span class=\"string\">'Hr'<\/span>,<span class=\"string\">'HR'<\/span>);                       <span class=\"comment\">% more capitalize<\/span>\r\nfunc = @(x) sum(x)\/numel(x);                        <span class=\"comment\">% anonymous function<\/span>\r\nturnover = splitapply(func, train.left, g);         <span class=\"comment\">% get group stats<\/span>\r\nfigure                                              <span class=\"comment\">% new figure<\/span>\r\nyyaxis <span 
class=\"string\">left<\/span>                                         <span class=\"comment\">% left y-axis<\/span>\r\nbar(turnover*100)                                   <span class=\"comment\">% turnover percent<\/span>\r\nxticklabels(job)                                    <span class=\"comment\">% label bars<\/span>\r\nxtickangle(45)                                      <span class=\"comment\">% rotate labels<\/span>\r\nylabel(<span class=\"string\">'Employee Turnover %'<\/span>)                       <span class=\"comment\">% left y-axis label<\/span>\r\ntitle({<span class=\"string\">'Turnover &amp; Satisfaction by Job Function'<\/span>; <span class=\"keyword\">...<\/span>\r\n    sprintf(<span class=\"string\">'Overall Turnover %.1f%%'<\/span>, sum(train.left)\/height(train)*100)})\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% overlay another plot<\/span>\r\nsatisfaction = splitapply(@median, <span class=\"keyword\">...<\/span><span class=\"comment\">              % get group median<\/span>\r\n    train.satisfaction_level, g);\r\nyyaxis <span class=\"string\">right<\/span>                                        <span class=\"comment\">% right y-axis<\/span>\r\nplot(1:length(job),satisfaction)                    <span class=\"comment\">% plot median line<\/span>\r\nylabel(<span class=\"string\">'Median Employee Satisfaction'<\/span>)              <span class=\"comment\">% right y-axis label<\/span>\r\nylim([.5 .67])                                      <span class=\"comment\">% scale right y-axis<\/span>\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% stop overlay<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_01.png\" alt=\"\"> <h4>Defining Who Are the \"Best\"<a name=\"53159959-d07b-4fca-bb41-c166c2be0794\"><\/a><\/h4><p>We 
are asked to analyze why the best and most experienced employees are leaving. How do we identify the \"best\"? For the purpose of this analysis, I will use the performance evaluation score to identify high performers. As the following histogram shows, employees with lower scores as well as higher scores tend to leave, and people with average scores are less likely to leave. The median score is 0.72. Let's say anyone with a score of 0.8 or higher is a high performer.<\/p><pre class=\"codeinput\">figure                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(train.last_evaluation(train.left == 0))   <span class=\"comment\">% histogram of those stayed<\/span>\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(train.last_evaluation(train.left == 1))   <span class=\"comment\">% histogram of those left<\/span>\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% stop overlay<\/span>\r\nxlabel(<span class=\"keyword\">...<\/span><span class=\"comment\">                                          % x-axis label<\/span>\r\n    sprintf(<span class=\"string\">'Last Evaluation - Median = %.2f'<\/span>,median(train.last_evaluation)))\r\nylabel(<span class=\"string\">'# Employees'<\/span>)                               <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Stayed'<\/span>,<span class=\"string\">'Left'<\/span>)                             <span class=\"comment\">% add legend<\/span>\r\ntitle(<span class=\"string\">'Distribution of Last Evaluation'<\/span>)            <span class=\"comment\">% add title<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_02.png\" alt=\"\"> <h4>Defining Who Are the \"Most Experienced\"<a 
name=\"02c689ab-974e-45d6-98c9-32b551de4825\"><\/a><\/h4><p>Among high performers, the company is particularly interested in \"most experienced\" people - let's use Time Spent at Company to measure the experience level. The plot shows that high performers with 4 to 6 years of experience are at higher risk of turnover.<\/p><pre class=\"codeinput\">hp = train(train.last_evaluation &gt;= 0.8,:);         <span class=\"comment\">% subset high performers<\/span>\r\n\r\nfigure                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(hp.time_spend_company(hp.left == 0))      <span class=\"comment\">% histogram of those stayed<\/span>\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(hp.time_spend_company(hp.left == 1))      <span class=\"comment\">% histogram of those left<\/span>\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% stop overlay<\/span>\r\nxlabel(<span class=\"keyword\">...<\/span><span class=\"comment\">                                          % x-axis label<\/span>\r\n    sprintf(<span class=\"string\">'Time Spent @ Company - Median = %.2f'<\/span>,median(hp.time_spend_company)))\r\nylabel(<span class=\"string\">'# Employees'<\/span>)                               <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Stayed'<\/span>,<span class=\"string\">'Left'<\/span>)                             <span class=\"comment\">% add legend<\/span>\r\ntitle(<span class=\"string\">'Time Spent @ Company Among High Performers'<\/span>) <span class=\"comment\">% add title<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_03.png\" alt=\"\"> <h4>Job Satisfaction Among High Risk Group<a 
name=\"a32df18d-ffe4-47f9-95c2-08f97bc3c87d\"><\/a><\/h4><p>Let's isolate the at-risk group and see how their job satisfaction stacks up. It is interesting to see that not only people with very low satisfaction levels (no surprise) but also the highly satisfied people left the company. It seems like people with a satisfaction level of 0.7 or higher are at an elevated risk.<\/p><pre class=\"codeinput\">at_risk = hp(hp.time_spend_company &gt;= 4 &amp;<span class=\"keyword\">...<\/span><span class=\"comment\">        % subset high performers<\/span>\r\n    hp.time_spend_company &lt;= 6,:);                  <span class=\"comment\">% with 4-6 years experience<\/span>\r\n\r\nfigure                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(at_risk.satisfaction_level(<span class=\"keyword\">...<\/span><span class=\"comment\">            % histogram of those stayed<\/span>\r\n    at_risk.left == 0))\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(at_risk.satisfaction_level(<span class=\"keyword\">...<\/span><span class=\"comment\">            % histogram of those left<\/span>\r\n    at_risk.left == 1))\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% stop overlay<\/span>\r\nxlabel(<span class=\"keyword\">...<\/span><span class=\"comment\">                                          % x-axis label<\/span>\r\n    sprintf(<span class=\"string\">'Satisfaction Level - Median = %.2f'<\/span>,median(at_risk.satisfaction_level)))\r\nylabel(<span class=\"string\">'# Employees'<\/span>)                               <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Stayed'<\/span>,<span class=\"string\">'Left'<\/span>)                             <span class=\"comment\">% add legend<\/span>\r\ntitle(<span class=\"string\">'Satisfaction Level of 
the Best and Most Experienced'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_04.png\" alt=\"\"> <h4>Was It for Money?<a name=\"b0ad3e01-9dcf-4db7-b213-8bf1245ac265\"><\/a><\/h4><p>Let's isolate high performing seasoned employees and check their salaries. It is clear that people who get a higher salary are staying while people with medium or low salary leave. No big surprise here, either.<\/p><pre class=\"codeinput\">at_risk_sat = at_risk(<span class=\"keyword\">...<\/span><span class=\"comment\">                           % subset at_risk<\/span>\r\n    at_risk.satisfaction_level &gt;= .7,:);\r\nfigure                                              <span class=\"comment\">% new figure<\/span>\r\nhistogram(at_risk_sat.salary(at_risk_sat.left == 0))<span class=\"comment\">% histogram of those stayed<\/span>\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% overlay another plot<\/span>\r\nhistogram(at_risk_sat.salary(at_risk_sat.left == 1))<span class=\"comment\">% histogram of those left<\/span>\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% stop overlay<\/span>\r\nxlabel(<span class=\"string\">'Salary'<\/span>)                                    <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'# Employees'<\/span>)                               <span class=\"comment\">% y-axis label<\/span>\r\nlegend(<span class=\"string\">'Stayed'<\/span>,<span class=\"string\">'Left'<\/span>)                             <span class=\"comment\">% add legend<\/span>\r\ntitle(<span class=\"string\">'Salary of the Best and Most Experienced with High Satisfaction'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" 
src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_05.png\" alt=\"\"> <h4>Making Insights Actionable with Predictive Analytics<a name=\"17f9139b-34a9-46be-a99a-9efd21b5a4c5\"><\/a><\/h4><p>At this point, you may say, \"I knew all this stuff already. I didn't get any new insight from this analysis.\"  It's true, but why are you complaining about that? <b>Predictability is a good thing for prediction!<\/b><\/p><p>What we have seen so far all happened in the past, which you cannot undo. However, if you can predict the future, you can do something about it. Making data actionable - that's the true value of predictive analytics.<\/p><p>In order to make this insight actionable, we need to quantify the turnover risk as scores so that we can identify at-risk employees for intervention. Since we are trying to classify people into those who are likely to stay vs. leave, what we need to do is build a classification model to predict such a binary outcome.<\/p><p>I have used the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\">Classification Learner<\/a> app to run multiple classifiers on our training data <tt>train<\/tt> to see which one provides the highest prediction accuracy.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/classification_learner.png\" alt=\"\"> <\/p><p>The winner is the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/treebagger.html\">Bagged Trees<\/a> classifier (also known as \"Random Forest\") with 99.0% accuracy. To evaluate a classifier, we typically use the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/perfcurve.html\">ROC curve plot<\/a>, which lets you see the trade-off between the true positive rate and the false positive rate. 
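<\/p><p>If you want to reproduce such a curve outside the app, <tt>perfcurve<\/tt> returns the ROC coordinates and the AUC directly. Here is a minimal sketch; the labels and scores below are made up purely for illustration, not taken from the exported model:<\/p><pre class=\"codeinput\">labels = [0 0 0 1 1 1];                             <span class=\"comment\">% hypothetical true classes<\/span>\r\nscores = [0.1 0.4 0.35 0.8 0.65 0.9];               <span class=\"comment\">% hypothetical P(leave)<\/span>\r\n[fpr,tpr,~,auc] = perfcurve(labels,scores,1);       <span class=\"comment\">% ROC coordinates and AUC<\/span>\r\nplot(fpr,tpr)                                       <span class=\"comment\">% draw the curve<\/span>\r\nxlabel(<span class=\"string\">'False Positive Rate'<\/span>)                      <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'True Positive Rate'<\/span>)                       <span class=\"comment\">% y-axis label<\/span>\r\ntitle(sprintf(<span class=\"string\">'ROC Curve (AUC = %.2f)'<\/span>,auc))       <span class=\"comment\">% show AUC in title<\/span>\r\n<\/pre><p>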
If you have a high AUC score like 0.99, the classifier is very good at identifying the true class without causing too many false positives.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/roc.png\" alt=\"\"> <\/p><p>You can also export the trained predictive model from the Classification Learner. I saved the exported model as <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/btree.mat\">btree.mat<\/a><\/tt>, which comes with some instructions.<\/p><pre class=\"codeinput\">load <span class=\"string\">btree<\/span>                                          <span class=\"comment\">% load trained model<\/span>\r\nhow_to = btree.HowToPredict;                        <span class=\"comment\">% get how to use<\/span>\r\ndisp([how_to(1:80) <span class=\"string\">' ...'<\/span>])                         <span class=\"comment\">% show snippet<\/span>\r\n<\/pre><pre class=\"codeoutput\">To make predictions on a new table, T, use: \r\n  yfit = c.predictFcn(T) \r\nreplacing ...\r\n<\/pre><h4>Evaluating the Predictive Performance<a name=\"8a14240e-ff0a-4af7-9950-fcaf6ef00d74\"><\/a><\/h4><p>Let's pick 10 samples from the <tt>holdout<\/tt> data partition and see how the predictive model scores them. Here are the samples - 5 people who left and 5 who stayed. 
They are all high performers with 4 or more years of experience.<\/p><pre class=\"codeinput\">samples = holdout([111; 484; 652; 715; 737; 1135; 1293; 1443; 1480; 1485],:);\r\nsamples(:,[1,2,5,8,9,10,7])\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n    satisfaction_level    last_evaluation    time_spend_company    promotion_last_5years      sales       salary    left\r\n    __________________    _______________    __________________    _____________________    __________    ______    ____\r\n     0.9                  0.92               4                     0                        sales         low       1   \r\n    0.31                  0.92               6                     0                        support       medium    0   \r\n    0.86                  0.87               4                     0                        sales         low       0   \r\n    0.62                  0.95               4                     0                        RandD         low       0   \r\n    0.23                  0.96               6                     0                        marketing     medium    0   \r\n    0.39                  0.89               5                     0                        support       low       0   \r\n    0.09                  0.92               4                     0                        sales         medium    1   \r\n    0.73                  0.87               5                     0                        IT            low       1   \r\n    0.75                  0.97               6                     0                        technical     medium    1   \r\n    0.84                  0.83               5                     0                        accounting    low       1   \r\n<\/pre><p>Now let's use the model to predict whether they left the company.<\/p><pre class=\"codeinput\">actual = samples.left;                              <span class=\"comment\">% actual outcome<\/span>\r\npredictors = samples(:,[1:6,8:10]);                 
<span class=\"comment\">% predictors only<\/span>\r\n[predicted,score]= btree.predictFcn(predictors);    <span class=\"comment\">% get prediction<\/span>\r\nc = confusionmat(actual,predicted);                 <span class=\"comment\">% get confusion matrix<\/span>\r\ndisp(array2table(c, <span class=\"keyword\">...<\/span><span class=\"comment\">                             % show the matrix as table<\/span>\r\n    <span class=\"string\">'VariableNames'<\/span>,{<span class=\"string\">'Predicted_Stay'<\/span>,<span class=\"string\">'Predicted_Leave'<\/span>}, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'RowNames'<\/span>,{<span class=\"string\">'Actual_Stayed'<\/span>,<span class=\"string\">'Actual_Left'<\/span>}));\r\n<\/pre><pre class=\"codeoutput\">                     Predicted_Stay    Predicted_Leave\r\n                     ______________    _______________\r\n    Actual_Stayed    5                 0              \r\n    Actual_Left      0                 5              \r\n<\/pre><p>The model was able to predict the outcome of those 10 samples accurately. It also returns the probability of each class as a score. You can also use the entire <tt>holdout<\/tt> data to check the model performance, but I will skip that step here.<\/p><h4>Using the Model to Take Action<a name=\"347ef57f-8ac9-4b2a-962f-b42154fcd703\"><\/a><\/h4><p>We can use the probability of leaving as the risk score. We can select high performers based on the last evaluation and time spent at the company, and then score their risk level and intervene in high-risk cases. In this example, the sales employee with the 0.92 evaluation score is 100% at risk of leaving. 
Since this employee has not been promoted in the last 5 years, perhaps it is time to do so.<\/p><pre class=\"codeinput\">samples.risk = score(:,2);                        <span class=\"comment\">% probability of leaving<\/span>\r\n[~,ranking] = sort(samples.risk,<span class=\"string\">'descend'<\/span>);       <span class=\"comment\">% sort by risk score<\/span>\r\nsamples = samples(ranking,:);                     <span class=\"comment\">% sort table by ranking<\/span>\r\nsamples(samples.risk &gt; .7,[1,2,5,8,9,10,11])      <span class=\"comment\">% intervention targets<\/span>\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n    satisfaction_level    last_evaluation    time_spend_company    promotion_last_5years      sales       salary     risk  \r\n    __________________    _______________    __________________    _____________________    __________    ______    _______\r\n    0.09                  0.92               4                     0                        sales         medium          1\r\n    0.73                  0.87               5                     0                        IT            low             1\r\n    0.84                  0.83               5                     0                        accounting    low        0.9984\r\n    0.75                  0.97               6                     0                        technical     medium    0.96154\r\n     0.9                  0.92               4                     0                        sales         low       0.70144\r\n<\/pre><h4>Explaining the Model<a name=\"13ec5457-e0fb-4d5a-9e76-08fa969743cc\"><\/a><\/h4><p>Machine learning algorithms used in predictive analytics are often a black-box solution, so HR managers may need to provide an easy-to-understand explanation about how it works in order to get the buy-in from their superiors. 
We can use the predictor importance score to show which attributes are used to compute the prediction.<\/p><p>In this example, the model uses a mixture of attributes with different weights, with emphasis on the satisfaction level, followed by the number of projects and time spent at the company. We also see that people who got a promotion in the last 5 years are less likely to leave. These attributes are intuitive, so we can feel more confident about the predictions from this model.<\/p><pre class=\"codeinput\">imp = oobPermutedPredictorImportance(<span class=\"keyword\">...<\/span><span class=\"comment\">            % get predictor importance<\/span>\r\n    btree.ClassificationEnsemble);\r\nvals = btree.ClassificationEnsemble.PredictorNames; <span class=\"comment\">% predictor names<\/span>\r\nfigure                                              <span class=\"comment\">% new figure<\/span>\r\nbar(imp);                                           <span class=\"comment\">% plot importance<\/span>\r\ntitle(<span class=\"string\">'Out-of-Bag Permuted Predictor Importance Estimates'<\/span>);\r\nylabel(<span class=\"string\">'Estimates'<\/span>);                                <span class=\"comment\">% y-axis label<\/span>\r\nxlabel(<span class=\"string\">'Predictors'<\/span>);                               <span class=\"comment\">% x-axis label<\/span>\r\nxticklabels(vals)                                   <span class=\"comment\">% label bars<\/span>\r\nxtickangle(45)                                      <span class=\"comment\">% rotate labels<\/span>\r\nax = gca;                                           <span class=\"comment\">% get current axes<\/span>\r\nax.TickLabelInterpreter = <span class=\"string\">'none'<\/span>;                   <span class=\"comment\">% turn off latex<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_06.png\" 
alt=\"\"> <h4>Operationalizing Action<a name=\"22a3cf56-02c0-4461-8bc0-178c58471277\"><\/a><\/h4><p>The predictor importance score also gives us a hint about when we should intervene. Given that satisfaction level, last evaluation, and time spent at the company are all important predictors, it is probably a good idea to update the predictive scores at the time of each annual evaluation and then decide who may need intervention.<\/p><p>It is feasible to implement a system that schedules such analysis when a performance review is conducted and automatically generates a report of high-risk employees using the model derived from MATLAB. Here is <a href=\"http:\/\/jsfiddle.net\/Toshiaki\/03xhhqk9\/show\/\">a simple live demo<\/a> of such a system running the following code using MATLAB Production Server. It uses the trained predictive model we just generated.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/jsfdemo.png\" alt=\"\"> <\/p><p>The example code just loads the data from a MAT-file, but you would instead access data from an <a href=\"https:\/\/www.mathworks.com\/help\/database\/index.html\">SQL database<\/a> in a real-world scenario. Please also note the use of a new function, <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsonencode.html\">jsonencode<\/a><\/tt>, introduced in R2016b, that converts structure arrays into JSON formatted text. 
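<\/p><p>As a quick illustration, encoding a small struct takes a single call; the record below is hypothetical, not part of the dataset:<\/p><pre class=\"codeinput\">record = struct(<span class=\"string\">'employee_id'<\/span>,42,<span class=\"string\">'risk'<\/span>,0.87);    <span class=\"comment\">% hypothetical record<\/span>\r\ntxt = jsonencode(record)                            <span class=\"comment\">% JSON formatted text<\/span>\r\n<\/pre><p>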
Naturally, we also have <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsondecode.html\">jsondecode<\/a><\/tt> that converts JSON formatted text into appropriate MATLAB data types.<\/p><pre class=\"codeinput\">dbtype <span class=\"string\">scoreRisk.m<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n1     function risk = scoreRisk(employee_ids)\r\n2     %SCORERISK scores the turnover risk of selected employees\r\n3     \r\n4     load data                                  % load holdout data\r\n5     load btree                                 % load trained model\r\n6     X_sub = X(employee_ids,:);                 % select employees\r\n7     [predicted,score]= btree.predictFcn(X_sub);   % get prediction\r\n8     risk = struct('Ypred',predicted,'score',score(:,2));% create struct\r\n9     risk = jsonencode(risk);                   % JSON encode it\r\n10    \r\n11    end\r\n<\/pre><p>Depending on your needs, you can build it in a couple of ways:<\/p><div><ul><li><a href=\"https:\/\/www.mathworks.com\/products\/compiler\/features.html#sharing-matlab-programs-with-excel-users\">Excel add-in<\/a> with <a href=\"https:\/\/www.mathworks.com\/products\/compiler.html\">MATLAB Compiler<\/a><\/li><li><a href=\"https:\/\/www.mathworks.com\/products\/matlab-compiler-sdk.html\">Server-based solution<\/a> with <a href=\"https:\/\/www.mathworks.com\/products\/matlab-production-server.html\">MATLAB Production Server<\/a><\/li><\/ul><\/div><p>Often, you would need to retrain the predictive model as human behavior changes over time. If you use code generated from MATLAB, it is very easy to retrain using the new dataset and redeploy the new model for production use, as compared to cases where you reimplement the model in some other languages.<\/p><h4>Summary<a name=\"92694283-dca5-44e1-8833-62a174f83b27\"><\/a><\/h4><p>In the beginning of this analysis, we explored the dataset which represents the past. 
You cannot undo the past, but you can do something about the future if you can find quantifiable patterns in the data for prediction.<\/p><p>People often think the novelty of insights is important. What really matters is what action you can take on them, whether novel or well-known. If you do nothing, no fancy analytics will deliver any value. Harvard Business Review recently published an article, <a href=\"https:\/\/hbr.org\/2016\/12\/why-youre-not-getting-value-from-your-data-science\">Why You&#8217;re Not Getting Value from Your Data Science<\/a>, but failed to mention this issue.<\/p><p>In my opinion, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Analysis_paralysis\">analysis paralysis<\/a> is one of the biggest reasons companies are not getting value: they fail to take action and remain stuck at the data analysis phase of the data science process.<\/p><p>Has this example helped you understand how you can use predictive analytics to solve practical problems? Can you think of a way to apply this in your area of work? 
Let us know what you think <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2152#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_bff9db25bfb94aa3b922ab8f3731e822() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='bff9db25bfb94aa3b922ab8f3731e822 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' bff9db25bfb94aa3b922ab8f3731e822';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_bff9db25bfb94aa3b922ab8f3731e822()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016b<br><\/p><\/div><!--\r\nbff9db25bfb94aa3b922ab8f3731e822 ##### SOURCE BEGIN #####\r\n%% Predicting When People Quit Their Jobs\r\n% 2017 is upon us and that means some of you may be going into your annual\r\n% review or thinking about your career after graduation. Today's guest\r\n% blogger, <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\r\n% Toshi Takeuchi> used machine learning on a job-related dataset for\r\n% <https:\/\/en.wikipedia.org\/wiki\/Predictive_analytics\r\n% predictive analytics>. Let's see what he learned.\r\n% \r\n% <<hr_analytics.jpg>>\r\n%\r\n%%\r\n% Companies spend money and time recruiting talent and they lose all that\r\n% investment when people leave. Therefore companies can save money if they\r\n% can intervene before their employees leave. 
Perhaps it is a sign of a\r\n% robust economy that one of the popular datasets on Kaggle deals with\r\n% this issue: < Human\r\n% Resources Analytics - Why are our best and most experienced employees\r\n% leaving prematurely?>\r\n% \r\n% This is an example of *predictive analytics*, where you try to predict\r\n% future events based on historical data using machine learning\r\n% algorithms. When people talk about predictive analytics, you often hear\r\n% that the key is to turn insights into action. What does that really mean?\r\n% Let's also examine this question through exploration of this dataset.\r\n\r\n%% Dataset\r\n% The Kaggle page says \"our example concerns a big company that wants to\r\n% understand why some of their best and most experienced employees are\r\n% leaving prematurely. The company also wishes to predict which valuable\r\n% employees will leave next.\"\r\n% \r\n% The fields in the dataset include:\r\n% \r\n% * Employee satisfaction level, on a scale of 0 to 1\r\n% * Last evaluation, on a scale of 0 to 1\r\n% * Number of projects\r\n% * Average monthly hours\r\n% * Time spent at the company in years\r\n% * Whether they have had a work accident\r\n% * Whether they have had a promotion in the last 5 years\r\n% * Sales (which actually means job function)\r\n% * Salary - low, medium or high\r\n% * Whether the employee has left\r\n% \r\n% Let's load it into MATLAB. 
The new\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/detectimportoptions.html\r\n% detectImportOptions>| makes it easy to set up import options based on the\r\n% file content.\r\n\r\nopts = detectImportOptions('HR_comma_sep.csv');     % set import options\r\nopts.VariableTypes(9:10) = {'categorical'};         % turn text to categorical\r\ncsv = readtable('HR_comma_sep.csv', opts);          % import data\r\nfprintf('Number of rows: %d\\n',height(csv))         % show number of rows\r\n\r\n%% \r\n% We will then hold out 10% of the dataset for model evaluation\r\n% (|holdout|), and use the remaining 90% (|train|) to explore the data and\r\n% train predictive models.\r\n\r\nrng(1)                                              % for reproducibility\r\nc = cvpartition(csv.left,'HoldOut',0.1);            % partition data\r\ntrain = csv(training(c),:);                         % for training \r\nholdout = csv(test(c),:);                           % for model evaluation\r\n\r\n%% Big Picture - How Bad Is Turnover at This Company?\r\n% The first thing to understand is how bad a problem this company has.\r\n% Assuming each row represents an employee, this company employs about\r\n% 15,000 people over some unstated period, and about 24% of them left the\r\n% company in that same period. Is this bad? Turnover is usually\r\n% calculated on an annual basis and we don't know what period this dataset\r\n% covers. Also, the turnover rate differs from industry to industry. That\r\n% said, this seems pretty high for a company of this size with an internal\r\n% R&D team.\r\n% \r\n% When you break it down by job function, the turnover seems to be\r\n% correlated with job satisfaction. 
For example, HR and accounting have low\r\n% median satisfaction levels and high turnover ratios, whereas R&D and\r\n% Management have higher satisfaction levels and lower turnover ratios.\r\n% \r\n% Please note the use of new functions in R2016b:\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/xticklabels.html\r\n% xticklabels>| to set x-axis tick labels and\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/xtickangle.html xtickangle>|\r\n% to rotate x-axis tick labels.\r\n\r\n[g, job] = findgroups(train.sales);                 % group by sales\r\njob = cellstr(job);                                 % convert to cell\r\njob = replace(job,'and','&');                       % clean up\r\njob = replace(job,'_',' ');                         % more clean up\r\njob = regexprep(job,'(^| )\\s*.','${upper($0)}');    % capitalize first letter\r\njob = replace(job,'Hr','HR');                       % more capitalization\r\nfunc = @(x) sum(x)\/numel(x);                        % anonymous function\r\nturnover = splitapply(func, train.left, g);         % get group stats\r\nfigure                                              % new figure\r\nyyaxis left                                         % left y-axis\r\nbar(turnover*100)                                   % turnover percent\r\nxticklabels(job)                                    % label bars\r\nxtickangle(45)                                      % rotate labels\r\nylabel('Employee Turnover %')                       % left y-axis label\r\ntitle({'Turnover & Satisfaction by Job Function'; ...\r\n    sprintf('Overall Turnover %.1f%%', sum(train.left)\/height(train)*100)})\r\nhold on                                             % overlay another plot\r\nsatisfaction = splitapply(@median, ...              
% get group median\r\n    train.satisfaction_level, g);\r\nyyaxis right                                        % right y-axis\r\nplot(1:length(job),satisfaction)                    % plot median line\r\nylabel('Median Employee Satisfaction')              % right y-axis label\r\nylim([.5 .67])                                      % scale right y-axis\r\nhold off                                            % stop overlay\r\n\r\n%% Defining Who Are the \"Best\"\r\n% We are asked to analyze why the best and most experienced employees are\r\n% leaving. How do we identify the \"best\"? For the purpose of this\r\n% analysis, I will use the performance evaluation score to determine who\r\n% the high performers are. As the following histogram shows, employees with\r\n% lower scores as well as higher scores tend to leave, and people with\r\n% average scores are less likely to leave. The median score is 0.72. Let's\r\n% say anyone with a score of 0.8 or higher is a high performer.\r\n\r\nfigure                                              % new figure\r\nhistogram(train.last_evaluation(train.left == 0))   % histogram of those stayed\r\nhold on                                             % overlay another plot\r\nhistogram(train.last_evaluation(train.left == 1))   % histogram of those left\r\nhold off                                            % stop overlay\r\nxlabel(...                                          % x-axis label\r\n    sprintf('Last Evaluation - Median = %.2f',median(train.last_evaluation)))\r\nylabel('# Employees')                               % y-axis label\r\nlegend('Stayed','Left')                             % add legend\r\ntitle('Distribution of Last Evaluation')            % add title\r\n\r\n%% Defining Who Are the \"Most Experienced\"\r\n% Among high performers, the company is particularly interested in \"most\r\n% experienced\" people - let's use Time Spent at Company to measure the\r\n% experience level. 
The plot shows that high performers with 4 to 6 years\r\n% of experience are at higher risk of turnover.\r\n\r\nhp = train(train.last_evaluation >= 0.8,:);         % subset high performers\r\n\r\nfigure                                              % new figure\r\nhistogram(hp.time_spend_company(hp.left == 0))      % histogram of those stayed\r\nhold on                                             % overlay another plot\r\nhistogram(hp.time_spend_company(hp.left == 1))      % histogram of those left\r\nhold off                                            % stop overlay\r\nxlabel(...                                          % x-axis label\r\n    sprintf('Time Spent @ Company - Median = %.2f',median(hp.time_spend_company)))\r\nylabel('# Employees')                               % y-axis label\r\nlegend('Stayed','Left')                             % add legend\r\ntitle('Time Spent @ Company Among High Performers') % add title\r\n\r\n%% Job Satisfaction Among High Risk Group\r\n% Let's isolate the at-risk group and see how their job satisfaction stacks\r\n% up. It is interesting to see that not only people with very low\r\n% satisfaction levels (no surprise) but also the highly satisfied people\r\n% left the company. It seems like people with a satisfaction level of 0.7\r\n% or higher are at an elevated risk.\r\n\r\nat_risk = hp(hp.time_spend_company >= 4 &...        % subset high performers\r\n    hp.time_spend_company <= 6,:);                  % with 4-6 years experience\r\n\r\nfigure                                              % new figure\r\nhistogram(at_risk.satisfaction_level(...            % histogram of those stayed\r\n    at_risk.left == 0)) \r\nhold on                                             % overlay another plot\r\nhistogram(at_risk.satisfaction_level(...            % histogram of those left\r\n    at_risk.left == 1)) \r\nhold off                                            % stop overlay\r\nxlabel(...                                          
% x-axis label\r\n    sprintf('Satisfaction Level - Median = %.2f',median(at_risk.satisfaction_level)))\r\nylabel('# Employees')                               % y-axis label\r\nlegend('Stayed','Left')                             % add legend\r\ntitle('Satisfaction Level of the Best and Most Experienced')\r\n\r\n%% Was It for Money?\r\n% Let's isolate high performing seasoned employees and check their\r\n% salaries. It is clear that people who get a higher salary are staying\r\n% while people with medium or low salary leave. No big surprise here,\r\n% either.\r\n\r\nat_risk_sat = at_risk(...                           % subset at_risk\r\n    at_risk.satisfaction_level >= .7,:);\r\nfigure                                              % new figure\r\nhistogram(at_risk_sat.salary(at_risk_sat.left == 0))% histogram of those stayed\r\nhold on                                             % overlay another plot\r\nhistogram(at_risk_sat.salary(at_risk_sat.left == 1))% histogram of those left\r\nhold off                                            % stop overlay\r\nxlabel('Salary')                                    % x-axis label\r\nylabel('# Employees')                               % y-axis label\r\nlegend('Stayed','Left')                             % add legend\r\ntitle('Salary of the Best and Most Experienced with High Satisfaction')\r\n\r\n%% Making Insights Actionable with Predictive Analytics\r\n% At this point, you may say, \"I knew all this stuff already. I didn't get\r\n% any new insight from this analysis.\"  It's true, but why are you\r\n% complaining about that? *Predictability is a good thing for prediction!*\r\n% \r\n% What we have seen so far all happened in the past, which you cannot\r\n% undo. However, if you can predict the future, you can do something about\r\n% it. 
Making data actionable - that's the true value of predictive\r\n% analytics.\r\n% \r\n% In order to make this insight actionable, we need to quantify the\r\n% turnover risk as scores so that we can identify at-risk employees for\r\n% intervention. Since we are trying to classify people into those who are\r\n% likely to stay vs. leave, what we need to do is build a classification\r\n% model to predict such a binary outcome.\r\n% \r\n% I have used the\r\n% <https:\/\/www.mathworks.com\/help\/stats\/classificationlearner-app.html\r\n% Classification Learner> app to run multiple classifiers on our training\r\n% data |train| to see which one provides the highest prediction accuracy.\r\n% \r\n% <<classification_learner.png>>\r\n% \r\n% The winner is the <https:\/\/www.mathworks.com\/help\/stats\/treebagger.html\r\n% Bagged Trees> classifier (also known as \"Random Forest\") with 99.0%\r\n% accuracy. To evaluate a classifier, we typically use the\r\n% <https:\/\/www.mathworks.com\/help\/stats\/perfcurve.html ROC curve plot>,\r\n% which lets you see the trade-off between the true positive rate and the\r\n% false positive rate. If you have a high AUC score like 0.99, the\r\n% classifier is very good at identifying the true class without causing\r\n% too many false positives.\r\n% \r\n% <<roc.png>>\r\n% \r\n% You can also export the trained predictive model from the Classification\r\n% Learner. I saved the exported model as\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/btree.mat btree.mat>|, which\r\n% comes with some instructions.\r\n\r\nload btree                                          % load trained model\r\nhow_to = btree.HowToPredict;                        % get how to use\r\ndisp([how_to(1:80) ' ...'])                         % show snippet\r\n\r\n%% Evaluating the Predictive Performance\r\n% Let's pick 10 samples from the |holdout| data partition and see how the\r\n% predictive model scores them. Here are the samples - 5 people who left\r\n% and 5 who stayed. 
They are all high performers with 4 or more years of\r\n% experience.\r\n\r\nsamples = holdout([111; 484; 652; 715; 737; 1135; 1293; 1443; 1480; 1485],:);\r\nsamples(:,[1,2,5,8,9,10,7])\r\n%% \r\n% Now let's use the model to predict whether they left the company. \r\n\r\nactual = samples.left;                              % actual outcome\r\npredictors = samples(:,[1:6,8:10]);                 % predictors only\r\n[predicted,score]= btree.predictFcn(predictors);    % get prediction\r\nc = confusionmat(actual,predicted);                 % get confusion matrix\r\ndisp(array2table(c, ...                             % show the matrix as table\r\n    'VariableNames',{'Predicted_Stay','Predicted_Leave'}, ...\r\n    'RowNames',{'Actual_Stayed','Actual_Left'}));\r\n\r\n%%\r\n% The model was able to predict the outcome of those 10 samples accurately.\r\n% In addition, it also returns the probability of each class as a score.\r\n% You can also use the entire |holdout| data to check the model\r\n% performance, but I will skip that step here. \r\n\r\n%% Using the Model to Take Action\r\n% We can use the probability of leaving as the risk score. We can select\r\n% high performers based on the last evaluation and time spent at the\r\n% company, and then score their risk level and intervene in high risk\r\n% cases. In this example, the sales employee with the 0.92 evaluation score\r\n% is 100% at risk of leaving. 
Since this employee has not been promoted in\r\n% the last 5 years, perhaps it is time to do so.\r\n\r\nsamples.risk = score(:,2);                        % probability of leaving\r\n[~,ranking] = sort(samples.risk,'descend');       % sort by risk score\r\nsamples = samples(ranking,:);                     % sort table by ranking\r\nsamples(samples.risk > .7,[1,2,5,8,9,10,11])      % intervention targets\r\n\r\n%% Explaining the Model\r\n% Machine learning algorithms used in predictive analytics are often\r\n% black-box solutions, so HR managers may need to provide an\r\n% easy-to-understand explanation of how the model works in order to get\r\n% buy-in from their superiors. We can use the predictor importance score to\r\n% show which attributes are used to compute the prediction.\r\n% \r\n% In this example, the model uses a mixture of attributes with different\r\n% weights to compute the prediction, with emphasis on the satisfaction level,\r\n% followed by the number of projects and time spent at the company. We also\r\n% see that people who got a promotion in the last 5 years are less likely\r\n% to leave. Those attributes make intuitive sense, and therefore we feel more\r\n% confident about the predictions from this model.\r\n\r\nimp = oobPermutedPredictorImportance(...            
% get predictor importance\r\n    btree.ClassificationEnsemble);\r\nvals = btree.ClassificationEnsemble.PredictorNames; % predictor names\r\nfigure                                              % new figure\r\nbar(imp);                                           % plot importance\r\ntitle('Out-of-Bag Permuted Predictor Importance Estimates');\r\nylabel('Estimates');                                % y-axis label\r\nxlabel('Predictors');                               % x-axis label\r\nxticklabels(vals)                                   % label bars\r\nxtickangle(45)                                      % rotate labels\r\nax = gca;                                           % get current axes\r\nax.TickLabelInterpreter = 'none';                   % turn off latex\r\n\r\n%% Operationalizing Action\r\n% The predictor importance score also gives us a hint about when we should\r\n% intervene. Given that satisfaction level, last evaluation, and time spent\r\n% at the company are all important predictors, it is probably a good idea\r\n% to update the predictive scores at the time of each annual evaluation and\r\n% then decide who may need intervention.\r\n% \r\n% It is feasible to implement a system that schedules such analysis when a\r\n% performance review is conducted and automatically generates a report of\r\n% high-risk employees using the model derived from MATLAB. Here is\r\n% <http:\/\/jsfiddle.net\/Toshiaki\/03xhhqk9\/show\/ a simple live demo> of such\r\n% a system running the following code using MATLAB Production Server. It\r\n% uses the trained predictive model we just generated.\r\n%\r\n% <<jsfdemo.png>>\r\n%\r\n% The example code just loads the data from a MAT-file, but you would instead\r\n% access data from an <https:\/\/www.mathworks.com\/help\/database\/index.html\r\n% SQL database> in a real world use scenario. 
Please also note the use of a\r\n% new function |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsonencode.html\r\n% jsonencode>| introduced in R2016b that converts structure arrays into\r\n% JSON formatted text. Naturally, we also have\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsondecode.html jsondecode>|\r\n% that converts JSON formatted text into appropriate MATLAB data types.\r\ndbtype scoreRisk.m\r\n\r\n%%\r\n% Depending on your needs, you can build it in a couple of ways:\r\n% \r\n% * <https:\/\/www.mathworks.com\/products\/compiler\/features.html#sharing-matlab-programs-with-excel-users\r\n% Excel add-in> with <https:\/\/www.mathworks.com\/products\/compiler.html\r\n% MATLAB Compiler>\r\n% * <https:\/\/www.mathworks.com\/products\/matlab-compiler-sdk.html Server-based\r\n% solution> with\r\n% <https:\/\/www.mathworks.com\/products\/matlab-production-server.html MATLAB\r\n% Production Server>\r\n%\r\n% Often, you would need to retrain the predictive model as human behavior\r\n% changes over time. If you use code generated from MATLAB, it is very easy\r\n% to retrain using the new dataset and redeploy the new model for\r\n% production use, as compared to cases where you reimplement the model in\r\n% some other languages.\r\n\r\n%% Summary\r\n% In the beginning of this analysis, we explored the dataset which\r\n% represents the past. You cannot undo the past, but you can do something\r\n% about the future if you can find quantifiable patterns in the data for\r\n% prediction.\r\n% \r\n% People often think the novelty of insights is important. What really\r\n% matters is what action you can take on them, whether novel or well-known.\r\n% If you do nothing, no fancy analytics will deliver any value. 
Harvard\r\n% Business Review recently published an article\r\n% <https:\/\/hbr.org\/2016\/12\/why-youre-not-getting-value-from-your-data-science\r\n% Why You're Not Getting Value from Your Data Science> but failed to\r\n% mention this issue.\r\n%\r\n% In my opinion, <https:\/\/en.wikipedia.org\/wiki\/Analysis_paralysis\r\n% analysis paralysis> is one of the biggest reasons companies are not\r\n% getting value: they are failing to take action and are stuck\r\n% at the data analysis phase of the data science process.\r\n%\r\n% Has this example helped you understand how you can use predictive analytics\r\n% to solve practical problems? Can you think of a way to apply this in your\r\n% area of work? Let us know what you think\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=2152#respond here>.\r\n##### SOURCE END ##### bff9db25bfb94aa3b922ab8f3731e822\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predict_when_people_quit_job_06.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>2017 is upon us and that means some of you may be going into your annual review or thinking about your career after graduation. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a> used machine learning on a job-related dataset for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Predictive_analytics\">predictive analytics<\/a>. Let's see what he learned.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/01\/05\/predicting-when-people-quit-their-jobs\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[66,43],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2152"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2152"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2152\/revisions"}],"predecessor-version":[{"id":2770,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2152\/revisions\/2770"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2152"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}