November 2023
TD Magazine

Make Sense Out of Evaluation Data

Wednesday, November 1, 2023

Converting raw data into credible insights doesn't require sophisticated statistical knowledge or skills.

Collecting training program evaluation data is only one aspect of an L&D professional's job. To be a fully competent L&D professional, however, you must also know how to make sense of the data you collect. That means having the ability to convert raw data into actionable insights.


The benefits are twofold: You will make learning program design decisions based on data, not intuition or opinion; and you will demonstrate learning program value using facts, not anecdotal information.

The data on data

Unfortunately, analyzing learning data is not an area in which many L&D practitioners have much knowledge or experience. According to Towards Maturity's L&D's Relationship With Data and How to Use It More Effectively report, 51 percent of L&D professionals said their department can't use data effectively because it lacks analytic skills. In addition, the Association for Talent Development's Effective Evaluation whitepaper reveals that only 40 percent of responding TD professionals believe their learning evaluation efforts helped them achieve their organization's business goals. Only 50 percent see those efforts as helping achieve the company's learning goals.

One reason for those troubling percentages is that most organizations only evaluate learning programs at Level 1 (reaction; 83 percent) and Level 2 (learning; 80 percent). Further, the ATD study found that only 54 percent conduct Level 3 (behavior) evaluations, 38 percent conduct Level 4 (results) evaluations, and 16 percent conduct Level 5 (return on investment) evaluations.

A second reason is that few L&D professionals systematically analyze the evaluation data they collect to identify program trends, make program comparisons, or develop targeted corrective actions to improve program effectiveness—as evidenced by the fact that just half of TD professionals view their L&D department's efforts as helping the department achieve its learning goals.

Measurement and evaluation experts such as Don and Jim Kirkpatrick, Jack and Patti Phillips, and Dave Vance have frequently noted that the reasons for evaluating L&D initiatives are to demonstrate a training program's value, improve a training program, decide whether to continue or discontinue a training program, and run the learning department like a business. Yet none of those are possible without first organizing evaluation data so that it offers insight for making informed learning program decisions.

Table 1. Calculating the Percentage of Favorable Responses
Quantity of 5s on a seven-point scale: 4
Quantity of 6s on a seven-point scale: 6
Quantity of 7s on a seven-point scale: 2
Total favorable responses (4 + 6 + 2): 12
Total responses (favorable and unfavorable): 20
Percentage of favorable responses (12 ÷ 20 = 0.6): 60%
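The arithmetic in Table 1 can be sketched in a few lines of Python. The function below uses the table's counts and assumes, as the table does, that ratings of 5, 6, and 7 on a seven-point scale count as favorable:

```python
def percent_favorable(favorable_counts, total_responses):
    """Favorable responses (here, 5s, 6s, and 7s on a seven-point
    scale) as a percentage of all responses received."""
    favorable = sum(favorable_counts)
    return 100 * favorable / total_responses

# Counts from Table 1: four 5s, six 6s, two 7s, out of 20 responses.
print(percent_favorable([4, 6, 2], 20))  # 60.0
```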

Guidelines for collecting data

The data collected with Level 1–4 evaluations is of two types: quantitative and qualitative. Quantitative data is numeric and consists of Level 2 knowledge test scores, Level 1 and Level 3 Likert scale survey results, and Level 4 HR or business unit metrics such as turnover, lost-time accidents, and sales. Qualitative data, on the other hand, is non-numeric and consists of written responses to open-ended or comment-box survey questions and participant responses recorded during a focus group or interview session.

When analyzing your collected evaluation data, keep in mind four general guidelines.

Include all relevant data. Don't discard some data points because they don't show the program you're evaluating in a positive light. The goal is to paint an accurate picture of what is and isn't working regarding a particular program so that you can take targeted, corrective actions to improve its effectiveness.

Keep all evaluation data individually anonymous. You will get more authentic data if the respondents know their responses won't be linked to them. Emphasize the anonymity to learners before administering an evaluation. Another way to ensure personal anonymity is by making the training cohort group the unit of analysis rather than the individual participants.

Use population samples where appropriate. If you're delivering a program to multiple cohorts, collecting data from all the attendees is unnecessary. Instead, select a random sample of participants—but ensure the selection process is random because, otherwise, you risk incorporating unintentional bias into the data.
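If you keep a participant roster, Python's standard library can draw that random sample without replacement, so every participant is equally likely to be chosen and no one is chosen twice. The roster and sample size below are invented for illustration:

```python
import random

# Hypothetical roster of 120 participants across several cohorts.
participants = [f"participant_{i}" for i in range(1, 121)]

# random.sample draws without replacement: each participant appears at
# most once, and every participant has the same chance of selection.
sample = random.sample(participants, k=40)
print(len(sample))  # 40
```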

Place participant demographic questions at the end of the evaluation. If you choose to collect such data, this best practice will help ensure you get honest responses to your evaluation items. Placing the questions at the beginning of the evaluation can cause some respondents to select more favorable responses out of concern that you may view their true responses as unfavorable. Make sure to also keep the number of demographic questions to a minimum and make completion optional (if appropriate).

Table 2. Calculating the Norm for Survey Items
Sum of scores for survey item A: 178
Quantity of respondents to survey item A: 40
Norm for survey item A (178 ÷ 40): 4.45
Table 3. Calculating the Norm for Test Questions
Quantity of correct responses for test question A: 32
Quantity of respondents to test question A: 40
Norm for test question A (32 ÷ 40 = 0.8): 80%


Making sense of quantitative data

Now that you know what statistics to use to decipher your collected evaluation data, let's look at how you would apply them to make sense of Level 1, 2, and 3 data (I won't cover Level 4 evaluations because they don't involve survey data or test scores).

For Level 1 and Level 3 Likert scale survey data:

  1. Consolidate all like-minded survey items into scales.
  2. Compute the mean or the percentage of favorable responses for each item and scale.
  3. When using a mean score, include a comparison score such as a pretraining score, a norm, or a cut score.
  4. Calculate the percentage of scores that exceed the comparison score.
For Level 2 knowledge test score data:

  1. Consolidate all like-minded test items into scales.
  2. Compute the percentage of participants who answered each test question correctly; do the same for each scale.
  3. Use a pre-program percentage, a norm, or a cut score as a comparison score.
  4. Calculate the percentage of correct responses that exceed the comparison score.
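A minimal sketch of how the survey-data steps might look in Python for Level 1 or Level 3 results. The item names, scale groupings, ratings, and comparison score are invented for illustration, not values from the article:

```python
# Step 1: consolidate like-minded survey items into scales.
scales = {
    "instructor": ["item_1", "item_2"],
    "relevance": ["item_3", "item_4"],
}
responses = {  # Likert ratings (1-7) collected for each survey item
    "item_1": [6, 7, 5, 6],
    "item_2": [5, 6, 6, 5],
    "item_3": [7, 6, 6, 7],
    "item_4": [4, 4, 5, 4],
}
comparison_score = 5.5  # e.g., a norm or cut score set before training

def mean(values):
    return sum(values) / len(values)

# Step 2: mean per item, then per scale; step 3: compare to the norm.
scale_means = {}
for scale, items in scales.items():
    scale_means[scale] = mean([mean(responses[item]) for item in items])
    status = "above" if scale_means[scale] > comparison_score else "below"
    print(f"{scale}: {scale_means[scale]:.2f} ({status} comparison score)")
```

The same loop works for Level 2 data by swapping the mean for the percentage of correct answers per question.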

Making sense of qualitative data

Comprehending the qualitative data you collect with your Level 1–3 evaluations is sometimes more challenging than deciphering the quantitative data. Note that providing a business executive with only a list of the comments received in response to a particular question is only half the work. The other half involves examining the comments so that they provide actionable insights. While there is no exact formula for doing so, you can use these steps:

  1. Analyze the responses for themes or patterns.
  2. Consolidate all like-minded comments into clusters.
  3. Count the quantity of comments in each cluster.
  4. Place the clusters in numeric order from highest quantity to lowest.
  5. Compare the results with the quantitative data collected and look for correlations.
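Once each comment has been tagged with a theme (step 1, typically a human judgment), steps 2 through 4 reduce to counting and sorting. A small Python sketch with invented comments and theme tags:

```python
from collections import Counter

# Each comment has already been tagged with a theme by a reviewer;
# the comments and themes below are illustrative assumptions.
tagged_comments = [
    ("Great pacing overall", "pacing"),
    ("Too fast in module 2", "pacing"),
    ("Loved the examples", "examples"),
    ("More real-world examples, please", "examples"),
    ("The room was cold", "facilities"),
    ("The examples made it click", "examples"),
]

# Count comments per cluster, then list clusters from highest to lowest.
cluster_counts = Counter(theme for _, theme in tagged_comments)
for theme, count in cluster_counts.most_common():
    print(theme, count)
```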

Formulating conclusions and recommendations

After summarizing the quantitative and qualitative data from the Level 1–3 evaluations, draw conclusions and make recommendations based on the results. Focus on scale scores first and compare each to a comparison score. Then you can highlight the scores above the comparison score and recommend ways to improve scores below it. As you examine the individual survey or test items that comprise each scale, note the high and low scores that account for the scale falling above or below the comparison score. Finally, recommend ways to improve item scores that fall significantly below the comparison score.

There is no single best way to report the results from your evaluation efforts to a stakeholder. Nonetheless, a best practice is to start by determining the stakeholder's preferred method of communication. For example, some executives may want a report to review before meeting to discuss the results. Others may like a presentation that provides an overview of the results. Still others may want to meet and have an informal discussion.

In any case, conveying the results using the stakeholder's preferred communication method will increase interest in your data and further position you as a valued business partner. You can also pique a stakeholder's interest by beginning with a hook (something interesting within the data) and delivering the data as a story. Take the stakeholder on a journey of discovery with a beginning, middle, and end. Make sure to tell the truth without bias and to provide context. Last, leave behind a final report.

Making sense of training evaluation data may seem like a daunting task. However, the math skills you learned in elementary school are all you need to produce credible, insightful results.

About the Author

Ken is the founder and CEO of Phillips Associates and the creator of the Predictive Learning Analytics™ (PLA) learning evaluation methodology. He is also a measurement and evaluation master, having spoken and received rave reviews at the ATD International Conference & EXPO on measuring and evaluating learning issues every year since 2008. He also has presented at the Annual Training Conference and Expo every year since 2013 on similar topics.
Ken has pooled his measurement and evaluation knowledge and experience into workshops and presentations for L&D professionals. All the sessions are highly engaging, practical, and filled with relevant content most L&D professionals haven't previously heard. In short, they are not a rehash of traditional measurement and evaluation theory.
Before pursuing a PhD in the combined organizational behavior and educational administration fields at Northwestern University, Ken held management positions with two colleges and two national corporations. Also, he has more than two dozen published learning instruments and articles to his credit. Ken is also a contributing author to five L&D books and the author of the recently released TD at Work publication titled, "Evaluate Learning with Predictive Learning Analytics."
Ken earned the Certified Professional in Learning and Performance CPLP® (now CPTD) credential from ATD in 2006 as a pilot pioneer and has recertified five times, most recently in 2021.

Such an informative article, thanks for sharing, Mr. Ken. I'm one of the team members who established the Customer Retention Unit at our bank. It's highly important to deliver data backed with facts and analysis to convince higher management to implement new policies or procedures for retaining our valued customers. 1. I use our tracking sheet to filter the main reasons for customer churn (raw data). 2. Point out the weaknesses of our lending policy. 3. Present solutions.
Thanks for the kind words about the article, Mohammed. It appears that you are using data to pinpoint areas needing improvement and recommending targeted corrective actions to make the improvements. Your approach is spot-on and a fantastic example of data-driven decision-making. Kudos!
Great article! Any best practices on getting comparison data/scores when an organization doesn't have this historical data?
Hi, Josh. Thanks for the kind words about my article. As for your question, I have two suggestions. 1) You can always establish a cut score with the business executive who requested the training. It's not as "scientific" as calculating a norm or having pre-program data. However, because the score is established in cooperation with the business executive, it's a credible starting point. 2) After collecting data from approximately 40 participants, you can calculate a norm.