My 2011 Personal Annual Report

On the Struggle of Locating High-Quality Images

I’ve been pretty vocal about the need for greater use of images in our evaluation communications. And while I can get most people to vow to halt the use of clip art, finding high-quality images can be a total pain. What’s at our fingertips (i.e., available on Google Images) is a problem because it isn’t often licensed for free use and it sort of sucks. I mean, lots of what’s available via Google Images are the cliche, emotionless images that actually work against the connection we’re trying to make with our audience. Here’s a sampling of what to avoid:

The alien dudes – totally un-connectable

The kumbaya symbol of diversity – so overused, it’s a turnoff

The handshake – cliche and ubiquitous

Susan Kistler, Executive Director of the American Evaluation Association, blogged about other free sites to locate high-quality images and I suggest you bookmark these places.

But oh! The time you can spend scrolling through images! Hours lost!

What’s the solution? Just like you wouldn’t wander around the grocery store aimlessly, you should go into the stock photo site with a list of appropriate images in mind. Get really specific in the site’s search engine. Tell that thing what you want! Make it do your bidding! Which brings me to the most important point of this post:

You have to know what you want.

The most efficient method of high-quality communication in our evaluation reporting is to invest in 30 minutes of visual thinking. Take this as your hall pass to get out of the office with a sketch pad and just doodle. What images come to mind when you think about your topic, your client, your message? Brainstorm, sketch, and play. Grab a small group of people and ask them to do some free association with you (e.g., “What images come to mind when I say ‘connect’?”). Then you’ll be much better prepared to shop the stock photo site like you shop at the supermarket – as quickly as possible.

Juice Analytics

Zach Gemignani, of Juice Analytics fame, gave the keynote at the AEA/CDC Summer Institute yesterday. I had followed their 30 Days to Context Connection list earlier last year, so I was super excited to witness the fun in person. His keynote speech focused on the 10 steps to becoming a Data Vizard. Yep, vizard.

Good tips in there, too. One was to follow the leaders – meaning, check out the awesome folks who have already done some of the hard work on data visualization. Though I thought his list was a little slim (okay, he only had 45 minutes), he did point out the range of leaders out there, from Stephen Few to Jonathan Harris. (Side note: Why are only white men getting to lead the field of data viz?)

My favorite tip was to think like a designer. He said there’s a thin overlap of folks who are both data junkies and designers (that’s me). But those more on the data junkie side can make tiny adjustments to normal presentations that will help make a bigger impact. For example, choose one color for emphasis and use it to actually emphasize, not decorate. My hack job of his slide, illustrating this idea, is below.
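The one-color-for-emphasis idea is easy to try in code, too. Here’s a minimal sketch in Python with matplotlib – the programs, scores, and colors below are my own made-up stand-ins, not anything from Zach’s slides: every bar gets a muted grey except the one finding you actually want the audience to look at.

```python
# One color for emphasis: grey everything, then let a single saturated
# color do the emphasizing. Data and hex colors are invented for the demo.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

programs = ["Program A", "Program B", "Program C", "Program D"]
scores = [62, 71, 88, 67]
highlight = "Program C"  # the finding the talk is actually about

# Grey for everything, one saturated color for the point of emphasis
colors = ["#b0b0b0" if p != highlight else "#d62728" for p in programs]

fig, ax = plt.subplots()
ax.bar(programs, scores, color=colors)
ax.set_ylabel("Satisfaction score")
ax.set_title("Color used to emphasize, not decorate")
fig.savefig("emphasis.png")
```

Swap in your own labels and values; the point is that the color list is doing the rhetorical work, not the chart type.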

Another tip was about choosing the right chart. For help on that task, check out Juice Analytics’ chart chooser. It’ll guide you through your data needs and let you download a chart template for Excel that is designed for clarity and beauty. Cool!
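I can’t reproduce their interactive tool here, but the gist of a chart chooser can be sketched as a small lookup: start from what you want the chart to show, then map that to a sensible chart type. The purposes and picks below are my own simplification, not Juice Analytics’ actual logic.

```python
# A hypothetical chart-chooser sketch: map a communication purpose
# (and how many data series you have) to a reasonable chart type.
def choose_chart(purpose: str, series: int = 1) -> str:
    """Suggest a chart type for a given communication purpose."""
    if purpose == "comparison":
        return "grouped bar chart" if series > 1 else "bar chart"
    if purpose == "trend":
        return "line chart" if series <= 4 else "small multiples of line charts"
    if purpose == "distribution":
        return "histogram"
    if purpose == "part-to-whole":
        return "stacked bar chart"
    if purpose == "relationship":
        return "scatterplot"
    return "table"  # when in doubt, a clean table beats a bad chart

print(choose_chart("trend", series=2))  # → line chart
```

The design point is the same as the grocery-list advice above: decide what you want the data to say before you go shopping for a chart.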

Where Are The Bad Evaluators?

I’m fresh out of a weekly discussion forum on evaluation. We talked about some of the same topics that always seem to arise: the importance of knowing your client’s organizational culture, the perils of unclear boundaries, the stigma of bad evaluation. Something struck me this time about that last point: Where are all of these bad evaluators we keep referring to?

Who are these offenders who have so tainted my client that she has built up a resistance to evaluation? Did they dry up and find jobs in academia? Did they go native and become entrepreneurs? Or did they move along to another project and continue their streak of damage?

Surely it is none of us. Surely it is all of us.

The simple fact is that we rarely know the truth of how our clients view us and our work. Most often, our contract is over when a report is delivered (rightly or wrongly). We don’t typically stick around, watching how the program uses (or doesn’t use) our findings and recommendations. We have no idea if they shelve our report or use it as a doorstop. If they think we suck, we wouldn’t know. If we traumatized them, we would probably be unaware.

The second simple fact is that no evaluation is perfect. Something, somewhere is bound to fall short of expectations (rightly or wrongly). So chances are that there is something “bad” about every evaluation project… I’m sure you can see where I’m headed with this.

Of course, this question implies that there are tons of bad evaluators out there who keep leaving sour tastes in our clients’ mouths. Yet even with my frequent communication with dozens of evaluators around the country, I would not be able to pinpoint any of them as particularly “bad.” I suspect this is because we are all a little bad (and not in the good let’s-go-have-a-beer-at-lunch kind of bad). We all make mistakes that irrevocably mar our projects. But we lack a safe culture that allows evaluators to talk about their own mishaps. It is far easier and more professionally reputable to refer to Those Bad Evaluators Over There who hurt our chances at doing good work.

Where are the bad evaluators? It is probably safe to say there is one typing this blog post. And one reading it, too.

Remember This

Data visualization (or information visualization or infographics) isn’t just a sweet way to display your evaluation findings. It is a critical pathway to helping clients actually remember what you said. Blame the brain.

Visual processing of information is the dominant method among all the senses – it is called the Pictorial Superiority Effect. There are like 10 kajillion sensory receptors in the eye. And this has served us well, evolutionarily. The ability to pick up slight differences in motion, color, and shape has saved us from being dinner for the lurking tiger or waking the snoozing python. While we don’t have to be quite as perceptive these days (unless you’ve recently driven in downtown Chicago), the biological functions are still there. This preattentive functioning works without intentional effort, as our eyes scan the grassy horizon or the latest evaluation report. Evaluators should be making better use of the preattentive function with data visualization. Clients will be much more likely to have their attention caught if the heights of two bars on a graph are different or if an image is included in a page of otherwise gray text.

But once we have caught a client’s preattention with an infographic, we need to help the client use their working memory to process the information. Working memory is like a sieve (how many times have you forgotten what you went into a room looking for?). Evaluators will need to do as much as possible to reduce the cognitive load when trying to guide the processing of our findings. This can be accomplished through clean, clear, undistracting graphics. The graphic should do the mental calculations for the viewer.

Then to encode the information into long-term memory, it needs a bit more of our assistance. By combining the graphic with verbal explanation, more connections are created in the brain, more schemas are activated, and better recall occurs. Verbal communication alone results in about 10% retention after 72 hours. Combining verbal and visual increases the retention rate to 75%. Using graphic visualization to emphasize information speeds the acquisition of that information and reduces opportunity for misinterpretation. These end results are precisely what we want to encourage among clients listening to or reading our evaluation findings. It is another step we evaluators can take responsibility for in trying to ensure that our findings are used. While comprehension, retention, and recall may not (yet) predict use of our results, they are surely steps in the right direction. And pretty ones, at that.

I’m going to talk more about data visualization and the use of graphic design in evaluation at this year’s American Evaluation Conference. Check me out.

I’m also working to organize a new Topical Interest Group on data visualization and reporting. If you are an AEA member and want to join, contact me or come to the informational meeting at the conference this year on Friday night, 6:05-6:25 PM in the Goliad Room.

In the meantime, here’s what I’ve been reading on this topic:

Brain Rules by John Medina

Visual Language for Designers by Connie Malamed

Design Elements: A Graphic Style Manual by Timothy Samara

Wine Evaluation. Yep, Wine Evaluation.

I had the awesome opportunity to host our local wine guru – Terry Stingley – at The Evaluation Center this week. He spoke to us about how to evaluate wine. We, of course, were thinking strictly about how to apply these notions to program evaluation. I learned so much about wine!

Terry said there are four things one does when tasting wine, called the Deductive Tasting Method:

1. Look at it. Tilt it to the side. If it clings to the side of the glass (or, in our case, styrofoam cups – sorry, Terry) and appears thick, it has a higher alcohol content. Also look at color. For reds, lighter means older. For whites, darker means older. In reds one also wants to look for the color change in the wine between the middle of the glass and the rim – lots of variation in color means it is older. Ready to taste? Me, too, but not yet.

2. Smell it. The wine will reflect what was grown in the region. Wines made in Europe will have traces of earthiness or wood to them. Fruits should be present in all wines. But to detect this, you have to swirl the wine and really stick your nose way down in the glass.

3. Okay, now you can taste it. The same and new fruits, earths, and woods should emerge. I swear I tasted leather in one glass we sampled. Olfactory in full effect here.

4.  Evaluate it. The true test comes in step four, where the taster simply determines how well the essences deduced in the first three steps are balanced. Imbalance is referred to as angularity. You’ve had that too sweet wine – it’s actually angular. One wine we tasted was too acidic and Terry said it could be in better balance if we had let it breathe at least an hour (he even suggested letting it breathe for a full day).

And while this seems to be derived from pure subjective opinion – little-e evaluation – trained wine tasters reportedly have 95-99% reliability when basing their judgment on the first three indicators. If only we could find the same level of consistency in determining program effectiveness, where the characteristics also change on an annual basis.
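That 95-99% figure is Terry’s report, not mine to verify, but “reliability” here presumably means agreement across tasters. The simplest version of that is percent agreement – a quick sketch, with made-up tasting notes standing in for real ratings:

```python
# Simple inter-rater percent agreement: the share of items on which
# two raters gave the same rating. Ratings below are invented examples.
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same rating."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

taster_1 = ["earthy", "fruity", "angular", "fruity", "earthy"]
taster_2 = ["earthy", "fruity", "angular", "angular", "earthy"]
print(percent_agreement(taster_1, taster_2))  # → 0.8
```

Fancier reliability statistics (like Cohen’s kappa) correct for chance agreement, but even this bare version makes the point: program evaluation rarely gets anywhere near 0.95.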

Terry is my new favorite person and I’m following him on Twitter. He works as The Wine Guru for Harding’s Marketplace, a local family-owned grocery chain, where he has revolutionized the placement of Michigan wine. He’ll be working there until I replace him.

Eval + Comm

It had perhaps fewer than six words per slide. It had high-quality graphics. It had systematic and consistent placement of elements. But something about the presentation today still bugged the kernel of a graphic designer inside me.

The presenter had clearly read some basic literature on slideshow presentations (Presentation Zen is my fave) or heard me rant about this topic in the hallways. Like many of us who are clued in to the need for better communication of evaluation topics, he totally thought he knew what he was doing. Two major issues still need to be addressed for those of us who have Graphic Design for Evaluators 101 under our belts.

1. Pick a metaphor or theme and stick with it. The presentation in focus was on nonequivalent dependent variables. Sheesh, right? Normally, I’d suggest thinking of an awesome and relatable metaphor for your topic that can be consistently carried throughout the presentation. I will give $5 to whoever can come up with a good metaphor for nonequivalent dependent variables. In lieu of a metaphor, pick some theme – but just one. Today’s talk featured targets. You know, bull’s eyes. It related to internal validity, I get it. And the target icons were repeated throughout the presentation. This is good. But then odd elements were also chosen, like writing slide text on graphic images of post-it notes. The post-it note communicates draft quality, office work, perhaps even organization. But it didn’t relate to bull’s eyes at all. The post-it note, while cute, was conflicting with the main message. It may have also been…

2. Graphic overload. Adding more graphic elements to the presentation decreases its communication ability. If it isn’t necessary, eliminate it from the slide. Don’t put a border around it and call more attention to it. Likewise, the presentation had extraneous arrows and excessive animation. Like “chartjunk,” these elements distracted from the message. Slidejunk! I’m watching the alien genderless being wave around a dartboard, not listening to your message, my friend. When a client thinks back to your evaluation debriefing, surely it is not the silver alien you want them to be remembering. The presentation should support the speaker.

While we’re probably still patting ourselves on the back for discovering the power of stock photo sites, let’s move ever upward.

Data Visualization and Reporting TIG

Evaluation use is a hot topic, but no one is looking at the role of graphic design.

Guidance on graphic design of evaluation reports in the literature of our field is sparse. Typically, discussion of use of evaluation findings (or lack thereof) focuses on types of use (e.g., conceptual, process) and factors affecting use (e.g., relevance, timing), but graphic design is notably absent. Texts on the topic of communicating or reporting evaluation findings are also limited in this regard. They tend to limit their discussion to knowing one’s audience and formats of reporting (e.g., brochures, newsletters, oral presentations). Some texts acknowledge the role of graphic design in reporting but give it only cursory treatment, such as suggesting that one hire a graphic designer or “use white space” with no direction on how to make that happen. A couple of evaluators have advocated for the “methods-oriented report” that emphasizes results over the traditional layout, but these have been short on the details of how to enact their recommendations in a word-processing program. Only a few texts have attempted to give guidance on graphic design, such as providing direction on how to create charts or structure a report. However, the resources are all dated. In fact, if one takes into consideration contemporary teachings on graphic design principles, the evaluation texts have the potential to be miseducating.

In my last post, I said I’d develop a checklist for the use of graphic design in reporting. Yep, it’s coming. In the meantime, I am working on the proposal for a new TIG (Topical Interest Group) within the American Evaluation Association. It’s time to bring consideration of data visualization and communication to the mainstream in evaluation. If you’re interested in being a member, send me an email.

And in that meantime, check out how data visualization is developing in other fields. (A shoutout to Humphrey Costello for sending me the links to these blogs, none of which are written by AEA members.)

Andrew Gelman’s blog, which makes statistical translation look easy.

Nathan Yau’s blog, which also has a link to the periodic table of swearing, FYI.

Shawn Allen’s blog, which is designed for a course but we can peek anyway.

Let’s bring this same caliber of work into evaluation. Let’s be interesting. Join me.

Valuing Values with Values and Values

or How Semantics Constrained a Field

Our discipline is stricken with too many values. I’m speaking semantically, of course. We use the word “values” to mean many things, including personal values, cultural and organizational values, criteria (or dimensions of merit), general and specific values (in terms of standards), monetary value (or worth), and the action form, valuing (or judging). A conflation of terms hinders the discipline’s ability to be accessible to others, particularly our clients and stakeholders, and unnecessarily confuses beginning evaluators, perhaps even more experienced ones. Although Scriven’s (1991) Evaluation Thesaurus defines several of these, the text itself is a hurdle for an evaluation newcomer and hence does not serve the purpose I aim for here: clarity.

Given that even well-published and experienced evaluators confuse their uses of the word “value” or some derivation of it, it is fair to say that additional clarifying language may bring focus to our work and at least help us understand each other a bit better. At least six different meanings of the word “values” (or some derivative of it) appear throughout our field and our lives.

Table 1. Values Taxonomy

1. Personal values
Contextual meaning: Beliefs or morals. Core understandings and dispositions held by individuals. Informed by #2, but with individual variation.
Use in evaluation literature: To some, hardly mentioned except as a source of bias. To others, the basis from which the remaining types of values are decided.[i]

2. Cultural or organizational values
Contextual meaning: Central tenets that can characterize a mass of people. Variably held by the people in the mass. The UN Statement on Human Rights might represent the largest mass; religious texts would be another.
Use in evaluation literature: When asked for details, some evaluators fall back on a widely agreed-upon set of values, like the UN’s work. Other skinny answers, like the values of the organization, are used when criteria are needed, without examining the fit or questioning the organization’s values.[ii]

3. Criteria (general values)
Contextual meaning: Those dimensions on which an evaluand will be judged, based on its performance. What makes a “good” evaluand of a certain type.
Use in evaluation literature: Often conflated with #1 and #2. While #1 and #2 may appear here, they are not sufficient to determine a good evaluand of a certain type.[iii]

4. Standards (specific values)
Contextual meaning: The distinction between performance levels for an evaluand. The demarcation between “good,” “fair,” and “bad” levels of performance. Specific aspects of #3.
Use in evaluation literature: Often conflated with #3. Can be presented in a rubric or grading scheme to show how an evaluand will be judged (#6). Again, #1 and #2 appear here.[iv]

5. Monetary worth
Contextual meaning: The cost of something, given its benefits or the opportunities it affords.
Use in evaluation literature: One of the three determinants of an evaluative conclusion. Tangentially related to #1-#4 and #6, but most distinct.[v]

6. Judging (valuing)
Contextual meaning: The root of the word “evaluation”: determining the merit, significance, or #5 of an evaluand, and the ultimate distinguishing factor of our profession.
Use in evaluation literature: Use of some of #1-#5 to make a decision about an evaluand. Lack of clarity about which of the above terms leads to “valuing.”[vi]

Without specifying the type of values to which we are referring, we end up making (or not making) sense that sounds like this:

Evaluation: The valuing (#6) of value (#5) based on values (#3), determined by values (#4), underscored by values (#2) and influenced by many values (#1).

So when we ask a client to discuss with us the values of the evaluation, which type are we referring to? More importantly, which type do the clients think we are referring to? When we use the term “value judgment” with a group of stakeholders, it is quite possible to have multiple interpretations.

Let’s look at a specific example of how our vocabulary has caused confusion for the field of evaluation, and hence for our stakeholders:

As Scriven (2007) notes in the Values subcomponent of the Key Evaluation Checklist, many sources of values should be taken into consideration when making a criteria (#3) list. While forms #1 and #2 should be part and parcel of this list, so too must they be scrutinized against other sources, such as legal considerations. In this way, values types #1 and #2 become part of the evaluation (#6). However, contrary to the assertions of Schwandt (1997), House, and others (cf. SenGupta, Hopson & Thompson-Robinson [2004]), the act of valuing (#6) does not create values type #1. At best, and in a long-reaching circumstance, evaluators may cause people to reflect on or rethink their personal values (#1), but this seems to be an overstatement of the impact of evaluation. Value type #2 also doesn’t stand the logical test of being determined by value type #6. While organizations may change practices as a result of evaluations, organizational values, like cultural and personal values (#2 and #1), are deep-seated, underpinning day-to-day actions as well as overall visions and missions. In essence, it appears that an oversimplification of the values taxonomy has occurred. However, it is also possible that the authors speak to a more specific variation of values that is not made clear in their writings.

The main mistake involved in this conflation of terms seems to be due more to an overstatement of evaluation impact and an underestimation of the sources of our values (#3). Let’s look at Schwandt’s (1997) phrasing: “Assessments of value [#6] cannot help but entail making claims about what ought to be done or avoided or what is right to do… Through making such interpretations, evaluators not only inform the means by which human or social good is realized but shape our definition of the social good as well” (p. 26). Essentially, Schwandt is claiming that our criteria of merit (#3) have the potential to shape social views of good, bad, and ugly. Yet it is clear through the research in the field and reflection on individual practice that criteria are not created in a vacuum. Rather, evaluators pull together criteria from existing social norms, even if from varying sources. So rather than create new definitions of social good, we are simply reflecting what society (or some subcategorized portion thereof) has said is good. This is precisely why we consider values (#2) when making criteria. (Still, if we take organizational values to be our criteria, we are essentially making the same mistake as many goal-based evaluations, which do not question the appropriateness of the organization’s goals prior to making determinations about the merit, worth, or significance of the organization’s efforts to achieve those goals.)

It is evident from this illustration that even when an evaluator-author holds two separate meanings in her head, when using the same word to represent those meanings, confusion can easily occur for readers, evaluation students, clients, and perhaps even the evaluator-author herself. (The same can be said for our double meaning for the word “standards” – those of the Joint Committee origin and those on which performance is judged [i.e. value type #4] can be confusing when we take a moment to educate stakeholders on what it is we do. But that’s another blog post.)

Instead of picking away at the logic of scholars, let us focus on where we can improve. The larger point for us to address is that we are struggling to be understood and it has been a self-defeating act.  As a first cut at clarity, I propose in Table 2 a sort of glossary from which we can adopt simpler, more appropriate linguistic choices and better express the logistics of our work together as evaluators.

Table 2. Revised Values Taxonomy

1. Revised term: Personal values. Old term: Personal values.
Beliefs or morals. Core understandings and dispositions held by individuals. Informed by #2, but with individual variation.

2. Revised term: Cultural or organizational values. Old term: Cultural or organizational values.
Central tenets that can characterize a mass of people. Variably held by the people in the mass. The UN Statement on Human Rights might represent the largest mass; religious texts would be another.

3. Revised term: Criteria. Old term: General values.
Those dimensions on which an evaluand will be judged, based on its performance. What makes a “good” evaluand of a certain type.

4. Revised term: Standards. Old term: Specific values.
The distinction between performance levels for an evaluand. The demarcation between “good,” “fair,” and “bad” levels of performance. Specific aspects of #3.

5. Revised term: Monetary worth. Old term: Monetary worth.
The cost of something, given its benefits or the opportunities it affords.

6. Revised term: Judging. Old term: Valuing.
The root of the word “evaluation”: determining the merit, significance, or #5 of an evaluand, and the ultimate distinguishing factor of our profession.
Perhaps unsurprisingly, there has been little manipulation of the “old” value-related terms in creating new ones. Rather, the idea is that we use other established names when we have them, and in places where we don’t, we use the appropriate adjectives to describe ourselves and our work, with the explicit purpose of reducing confusion in the field. To restate the earlier example:

Evaluation: The judging of monetary worth [among other things] based on criteria, determined by standards, underscored by cultural and organizational values and influenced by many personal values.

Ah, now don’t we all feel much better?

[i] This is often considered the source of “subjectivity” and seen as having little place in evaluation. See, for example, Davidson’s (2005) description of Subjective 1 (based on Scriven) in Evaluation Methodology Basics, pages 88-92.

[ii] Stufflebeam (2007) suggests the use of widely-accepted cultural values, such as those stemming from the United Nations. He discusses this in his description of the CIPP model in Evaluation Theory, Models, and Applications, page 331.

[iii] Scriven (2007) outlines the sources of values quite well in his Key Evaluation Checklist, pages 6-10, though it would have served the field better to consistently call it “criteria of merit” or just “criteria.”

[iv] Referring to standards as specific values is becoming more common. Coryn (2007) delineates this in his dissertation, Evaluation of Researchers and Their Research, page 43.

[v] Interchanging “monetary worth” and “value” is common, but especially included in evaluation through Scriven’s Evaluation Thesaurus, pages 382-383.

[vi] For more explication, see Scriven’s Evaluation Thesaurus, page 375.

When the Evaluator is Evaluated

Last year at about this time I was knee deep in survey redesign. I had joined this awesome project that has been conducting an annual survey for over 10 years. The surveyed parties are grantees in one of NSF’s program streams. Nice as they are, they’d been quite vocal about how the survey doesn’t meet their own evaluation needs, is difficult to complete, is too long, etc.

When I joined the project, I was put onto the task of redesign. As a team, we cut over a quarter of the questions and replaced another 15 percent. Better layout improved the design, cutting out several more pages (it was, like, over 25 to begin with). We’d even conducted three public forums with the grantees to get their input on what could be more useful. I felt pretty great when we released the revised version of the survey last winter.

Then, I got the request from my boss – the one where he delegated me to complete part of the survey. (In a weird twist, because we too were funded by the NSF program stream, we also had to complete the survey. It’s like the snake eating its own tail.) So I found myself trying to answer questions about the number of students we have. What? We don’t have students – we serve the other grantees! Do I write in “0” or “n/a”? How many collaborations do we have with other organizations? Geez, it depends on what you mean by “collaboration.” I’d like to think our partners find mutual benefit, but you’d really have to ask them.

In short, it was torturous. I had always known that the types of grantees were so diverse that making a survey applicable to everyone was a difficult task. But I didn’t even think about our own work and how we would answer these questions. Half of me wants to throw the whole thing away and the other half wants to be content knowing no instrument is perfect. But when congressional budgets rely on the data to stay alive and the project has had over 10 years to get it right, we ought to be a bit closer to perfection.