User Testing

Kobayashi Maru. To merely mention the name Kobayashi Maru invites debate among Trekkies, the devoted followers of all things Star Trek. It is a test – a computer simulation. Participants take the test to evaluate their leadership skills by virtually commanding a spaceship traveling across the galaxy. The Kobayashi Maru test uncovers hidden weaknesses and unforeseen strengths—a practice not unlike user testing.

Now, it is your turn. You sit in the captain’s chair. Shortly after the test begins, you receive a distress call from the Kobayashi Maru, another spaceship, which sits damaged and unmovable across a contested border. The ship’s crew cries out for your help. To rescue the Kobayashi Maru, you must cross the border. Yet, to do so could cause a war and lead to your own destruction.

Do you try to sneak across the border? Do you fight? Do you run? Whatever choice you make, whatever action you take, you will fail. Failing is certain, for this is how the test is designed.


The star of Star Trek was Captain James T. Kirk. If you are familiar with the story, you know that when he faced the Kobayashi Maru, he failed it, too. But on his third attempt, he succeeded. How did he win in a no-win scenario? He cheated. He reprogrammed the computer simulation to turn a no-win scenario into a no-lose one. Kirk discovered the moments of failure within the simulation and changed them into moments of success. His actions demonstrate a vital lesson about testing: sometimes you have to lose to learn how to win.

For those who are new to user testing, the concept can sound frightening and dramatic. User testing exposes all your hard work to the whims and opinions of strangers. “What if they don’t like what we built?” you wonder. “What if they hate it?”

Take a moment to imagine people testing an application you designed. Test participants flow through your application, link by link and screen by screen. You start to think, “Hey, this testing thing isn’t so bad.” Then it happens.

A tester clicks a link. He pauses for a moment. He clicks the back button. He tries another button. He tries again. He gets lost. He gets frustrated. You watch your design take on damage as he shoots barrages of criticism and vents his anger into open space. “Raise the shields,” you scream. Alarms blare. Fires burn. Sparks shoot across the room as wires dangle from the rafters. Soon after, your once-promising application floats lifeless, scorched and battered, surrounded by a debris field of scribbled Post-it notes and haphazard observations.

Of course, that scenario is fictitious. Testing is far less dramatic and far more practical than many believe. More often than not, testers blame themselves for failures rather than the software they are testing. They feel incapable — sometimes even stupid. They direct their frustration inward, not at you. As software creators, we should never fear testers; we should only feel empathy for them. They experience moments of failure so that we may design moments of success.

User testing strives for discovery, not destruction. We discover the hidden weaknesses and unforeseen strengths of software: the stuff we do not otherwise notice as captains of our own creations.


Rather than the high-stakes drama of that fictitious scenario, testing tends to go much more like this. You sit in a room. You greet a participant as she walks in. “Thanks for coming in today,” you say. “Sure, I hope I can help,” she replies. You ask her to complete a task, such as buying an airline ticket online. She does her best. You record your observations. After a few minutes, her smile transforms into pursed lips. She lets out a small “hmm.” Your ears perk up and your eyes widen as you take notice of her mouse pointer floating across her screen. She searches and clicks. She searches and clicks again. The “hmm” becomes a “hmmpf!” She is lost. You wait a few seconds and ask, “How do you think you’d get back to the previous screen?” That is about as dramatic as it gets. No alarms. No fires. At most, you see a few sparks.

Qualitative and quantitative testing

Let’s start with a testing method you can use today: qualitative testing. You can run a qualitative test at any time during a project. It’s quick. It’s painless. It’s helpful.

Please take a moment and read the following paragraph aloud. Whisper it to yourself if you wish. Ready, go!

Testing is quick, painless, and helpful. 
I’m participating in a test right now.

If you read those lines aloud, we just ran a qualitative test together, albeit a small one. I asked you to do something, and you attempted to do so. Qualitative testing shares similarities with surveying. In surveying, we ask people questions. In testing, we ask people to perform tasks. Qualitative testing offers insights based on what you observe when participants perform those tasks. For example, you might ask a participant to locate information about NASA’s Curiosity Mars rover mission. You notice that she first visits Google and searches for “NASA Mars.” You ask her why she chose that phrase. She replies, “I recall hearing about a NASA and Mars website.” She clicks the first link listed in the search results, “mars.nasa.gov.” After a few moments, she scrolls down the page, pauses to review it, and then finds a tout for “Looking for Curiosity?” You ask her why she paused, and she tells you she was looking for the word “Curiosity.”

On the surface, such a test reveals little insight; however, it may indicate the future behaviors of other users. The participant recalled hearing about a similar website, potentially signaling an audience’s exposure to press coverage. She clicked the first search result, possibly demonstrating which website a future user may choose. We witnessed her pausing and scanning the page for the term “Curiosity,” perhaps highlighting the importance of the term. All observations must be taken with a grain of salt, however. Qualitative tests help us understand how some users may perceive an experience, but they do not prove anything. Each observation poses a question: “Will other people experience the same?” User testing does not provide an answer, but we should take comfort in the ubiquity of this dilemma. As the medical researcher Jonas Salk once wrote, “What people think of as the moment of discovery is really the discovery of the question.”

Quantitative methods enter the equation when you score an observation. The score can be any type of quantification, but it is frequently a success/fail mark or a numerical tally. For example, suppose you wish to prove that a button is difficult to find. Ask enough participants, and you can measure how many perceive the button as easy or difficult to find. Yet perceptions are subject to bias. How can we prove others will feel the same way?
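
Scoring itself is simple. As a minimal sketch in Python (the task, scores, and participant count are all hypothetical), a success/fail tally might look like this:

```python
# Hypothetical success/fail scores for one task: "find the checkout button."
# 1 = the participant found the button unaided; 0 = gave up or needed a hint.
scores = [1, 0, 1, 1, 0]

successes = sum(scores)
rate = successes / len(scores)
print(f"{successes} of {len(scores)} participants succeeded ({rate:.0%})")
```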

A complete explanation of confidence intervals, error rates, samples, and populations warrants its own book, but the short answer is that truly quantitative tests require lots of participants. To reach 95% confidence in the answer to a question such as “Will someone feel something is either easy or hard?” with a 5% margin of error, we would need to test approximately 384 randomly selected people (assuming a population of one million). That is quite an effort, and often one too daunting for a user test. This is why most user tests tend to be purely qualitative, or a mixture of qualitative and low-confidence quantitative.
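
For the curious, that 384 comes from the standard sample-size formula for a proportion. Here is a rough Python sketch, assuming 95% confidence (z ≈ 1.96), a 5% margin of error, and maximum variability (p = 0.5); the function name and defaults are illustrative:

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Estimate how many participants a yes/no question requires.

    z=1.96 corresponds to 95% confidence; margin is the acceptable
    error (+/-5% here); p=0.5 assumes maximum variability.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite-population correction for the stated population.
    return round(n0 / (1 + (n0 - 1) / population))

print(sample_size(1_000_000))  # -> 384, the figure cited above
```

Tighten the margin of error to 3% and the requirement jumps past a thousand participants, which is one more reason most teams settle for qualitative insight.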

We could measure the time it takes to complete a task. Participant A takes 2 minutes. Participant B takes 3 minutes. Participant C takes 4 minutes. Afterward, we tally the results, giving us an average of 3 minutes ([2 + 3 + 4] / 3 results = 3). The more participants we add to our test, the greater our confidence in the result. You will find that similar scoring can be used to determine the average of all sorts of numerical measurements. However, always be wary of making decisions based on small sample sizes alone. Augment your findings with qualitative questions to help bolster or disprove quantitative claims.
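
To make the arithmetic concrete, here is a small Python sketch using the hypothetical times above. The crude 95% confidence interval it prints spans roughly 1.9 to 4.1 minutes, a reminder of how little certainty three participants buy:

```python
import statistics

# Hypothetical task-completion times in minutes for participants A, B, and C.
times = [2, 3, 4]

mean = statistics.mean(times)    # (2 + 3 + 4) / 3 = 3.0
stdev = statistics.stdev(times)  # sample standard deviation = 1.0
n = len(times)

# Rough 95% interval using the normal approximation; a t-interval
# would be even wider for a sample this small.
margin = 1.96 * stdev / n ** 0.5
print(f"mean = {mean:.1f} min, 95% CI ~ {mean - margin:.1f} to {mean + margin:.1f} min")
```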

Remote testing

I am a remote testing convert. The idea of remote testing seemed absurd to me at first. How can you run a test without being in the room with the participant? How could you gauge the participant’s attitude, emotional state, or comfort level? Then I ran a few remote tests. I remained unconvinced until I heard a baby cry.

Many remote and online testing services record a test participant’s computer screen. You see recordings of the tests much in the same way that the participants do, including how participants set up their home computers, their desktop backgrounds, and all the crap they leave on them. You see browser toolbars, running applications, and even the occasional telltale sign of a virus infection. The most telling information you receive from remote testing is the audio. Not only do you hear what the participant is saying, but you also hear everything else going on around him or her. During the testing of a financial website, I heard a baby cry. The participant paused the test, came back, and I could hear the baby jostling and cooing nearby.

So, what does such a remote test tell us? For one, it tells us that whatever testing environment we set up in a laboratory will be light years ahead of what most participants have at home. It is a sobering realization that while software creators tend to have fast processors, high-resolution screens, and the latest OS updates, a sizable proportion of Americans do not. If your software is for home use, there is no better place to test it than on a participant’s home computer. Perhaps most importantly, participants often feel more comfortable in their own homes than they do in a lab. They pause. They tend to their kids. They answer phone calls. They use your software in the context of their own messy lives, not in the context of your organized lab.

Still, face-to-face interaction will always have its place in user testing. With the increased need to test gestures on mobile devices, it is helpful to see both participants and what they are testing. Participants may hold their phones with one hand and swipe with the other. They may turn their tablets from portrait to landscape and rest them on their knees. Some may be vision impaired or hard of hearing — all things perhaps best suited to testing in a controlled environment. Only you can determine when face-to-face or remote testing is preferable. Regardless of which you choose, you will still find value in the discoveries, sights, and sounds revealed when testing your work.

User testing: The final frontier

You can test prototypes, visual design, wireframes, cocktail napkin sketches, behaviors, nomenclature, and sentiment — in fact, you can test almost anything. The border between ignorance and evidence is far-reaching but easily crossed. Only one obstacle blocks our way.

Although financial cost may occasionally be a barrier to user testing, the real impediment is fear. Fear that testing displaces prerogative. Fear that testing exposes ineptitude. Fear that testing threatens creativity. None of which proves true. Prerogative, ineptitude, and creativity remain whether you test or not. Testing allows you to discover the strengths and weaknesses of software — not of the people who created it. Once we accept this fact, we lower our shields. We seek out new knowledge and new challenges, and boldly go where many others will.