User Research with Humans vs. AI

Apr 03, 2024

Summary: Beware the allure of fully replacing humans with AI in user research! AI offers many benefits in user research and can enhance, but not replace, the human touch in user-centered design. It cannot replicate the surprises and nuances that make real customers invaluable as study participants.

Two days ago, my April Fools’ Day hoax discussed using AI to conduct usability testing with another AI serving as the test user. That piece was a joke, and all my arguments were false. In contrast, the article you’re reading now is true and represents my genuine analysis of the potential for using AI in user research.

There is indeed some potential for using AI to cut costs and gain a number of benefits. But ultimately, user-centered design must be about the users, who are humans. (One of my favorite UX slogans is “UX Is People.”)

The user is at the center of everything in UX. Icons, data windows, and menus all exist to serve that individual human being. UX Is People. (Ideogram)

Traditional User Research: All Human Actors

Traditional user research is summarized in the following image, showing a classic user testing setup:

A classic user test has 3 (or 4) elements: the design, a user, a study facilitator, and (optionally) additional observers from the team behind a one-way mirror. (Leonardo)

The most important element in user testing is the design. We always tell test participants, “we’re not testing you, we’re testing the design.” This is true. Since we only test a handful of people in each study, their individual performance is irrelevant. (In a quantitative study, we would measure each user’s performance, and then average these numbers.)

In discovery research, we don’t have a design yet, so this component is not present in all user research. But mostly, we do have a design which we want to make better. (In competitive testing, we would be testing multiple designs, but the story is basically the same.)

The second element in user research is the user. We recruit people who are representative of our target audience to come into the lab (or connect to a remote testing service) and try out the design. As shown in my drawing of a stereotypical user test, the user has her hands on the keyboard (or touches the phone) and is driving the interaction while trying to perform test tasks that we have written to be representative of the most important use cases for our product.

The third element is the facilitator. In my picture, he’s taking notes on a checklist with the study plan. It’s the facilitator’s job to keep the session on track, proceed through the test tasks, and administer any survey questions (which I like to minimize in order to spend most of our time with a user observing his or her behavior). Maybe most important, the facilitator aims to keep the user talking while not injecting any bias or helping the user perform the task. Finally, as shown, the facilitator takes notes that often will suffice for analyzing the test results. (Sometimes, we review a recording of the session.)

An optional element in user research is represented in my drawing by the ghosted image of a stakeholder. One of the best ways to educate team members and stakeholders about their customers is to allow them to observe user test sessions from an observation room. We stick stakeholders, developers, and others behind a one-way mirror (or in a different room) to avoid having them interact with the user, which is bound to bias the test results. In real life, the observers will be invisible rather than appearing as the ghost in my drawing.

For more examples of the traditional way of running usability studies, complete with real photos, see my article about usability laboratories.

Replacing Humans with AI?

You will notice that all the roles in this study setup are played by humans, except for the design, which is on a computer. The design might transition from traditional software prototypes to Generative UI, where an AI generates the user interface in real time. This will be particularly useful for analytics experiments that can automatically morph from old-school A/B testing into multivariate testing with design variations fine-tuned by the AI as the experiment keeps running.

For the other 3 actors (user, facilitator, observer), can we replace some — or all — of these humans with AI? That would be much cheaper, after all.

We could replace the live observers with an AI summary of the study report. The stakeholders and team members would still learn the research's most important findings. However, all experience shows that even the best-written report doesn’t have the persuasive impact of the visceral experience of directly watching a few customers struggle with your beloved product. Thus, I recommend keeping the observers as live humans at all costs for the sake of stakeholder relations.

The real question is what to do about the test participant and the study facilitator. Can either or both be replaced by AI? With 2 roles and 2 possible players, we have 2x2=4 possible combinations. I’ll now analyze each of these 4 cases.
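As a minimal sketch of that 2x2 count, the four combinations can be enumerated mechanically. The names PLAYERS and research_setups are made up for this illustration; the A to D labels follow the order of the options analyzed in this article.

```python
from itertools import product

# Hypothetical names for illustration: each role can be filled by either player.
PLAYERS = ("Human", "AI")


def research_setups():
    """Return the four (researcher, user) pairings, labeled A-D."""
    # product() varies the last slot fastest, so the order is:
    # (Human, Human), (Human, AI), (AI, Human), (AI, AI)
    combos = product(PLAYERS, repeat=2)  # first slot: researcher, second: user
    return {
        label: {"researcher": researcher, "user": user}
        for label, (researcher, user) in zip("ABCD", combos)
    }


if __name__ == "__main__":
    for label, setup in research_setups().items():
        print(f"Option {label}: {setup['researcher']} researcher, {setup['user']} user")
```

Adding a third role with the same two players (say, the observer) would turn this into 2x2x2=8 cases, which is one reason the analysis here holds the observers fixed as humans.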

Option A: Human Researcher, Human User

This is the classic research setup that I just explained. We know from 70 years of experience with UX that it works. We also know that it’s expensive, though the cost of human UX research has been dropping due to a combination of the discount usability methods I’ve been advocating for 35 years and the move to remote studies for most research.

Even though this option requires a human to conduct the research, this human researcher should really be a human-AI symbiont, because any decent researcher will employ a profusion of AI tools to augment skills and improve productivity. As a simple example, describe your research goals to ChatGPT (or similar frontier-level AI) and ask it to give you a set of 20 test tasks for the participant. You won’t use all of these, and you may even need to rewrite the ones you use, but getting inspiration from AI beats staring at a blank screen and trying to think up tasks from scratch.

My article on top AI tools for UX professionals lists many such uses of AI to improve the planning, analysis, and reporting of user research. For a research project to qualify as my Option A, I simply require that actual facilitation is done by a human and that the actual test participant is a human. (Unmoderated user testing also qualifies, assuming that a human researcher sets up the study in the system and defines how the test participants will be treated.)

I give the human-human combo my thumbs up. Keep doing this, even as you experiment with other options.

Option B: Human Researcher, AI User

It’s a hassle to contact representative customers and convince them to participate in user research. Since we already know a lot about human psychology, what about using AI to simulate what users would do on each screen? We can ask the AI what it would do and have it generate great quotes, at any desired length.

For example, we know from extensive research on perception that users tend to look at big things before they look at small things and that they also look at strong colors before muted colors. Any good visual design course will teach a long list of such well-established principles that can be encoded in AI.

The problem is that the principles only tell you what humans usually will do. If you have 100 screen designs, it’s quite likely that most users will look at a big, colorful element before a small, muted element on 80 of those screens. But how will people interpret what they see? And what about those 20 screens where people exhibit unusual behaviors?

The main reason to conduct user research is to be surprised. Your actual customers are not “average humans.” They are individual humans, each with their own characteristics and strange behaviors.

If we simulate users, we get common behaviors if the AI is trained on enough valid data about expected human behaviors. But we don’t get those surprises that show you how your actual customers (mis)interpret your content and go off on wild tangents — maybe never to return to the expected path through the system. Maybe they establish an erroneous mental model on screen 3 that causes them to do something unexpected on screen 7.

The point is that there are always surprises in research with real customers. (If you don’t get a surprise, it proves you ran the study wrong and either biased the outcome or didn’t look deeply enough.)

If you don’t get surprised during a usability test, it only proves you ran the study wrong. (That said, a good research facilitator will keep a neutral poker face, no matter what outrageous things the user does. This image is a caricature I made with Midjourney, not how you should actually look when the user surprises you.)

One of the main goals of any user research is risk management: we reduce the risk of unpleasant surprises in the market after product release because we observe these surprises in testing before release. Fixing problems before launch (a) is up to 100x cheaper, depending on how early we discover the problem, and (b) avoids the reputational damage from releasing a substandard — or potentially dangerous — product to customers.

Resampling is very hard: once prospects have a bad experience with your product, it’s almost impossible to persuade them to sample it again, no matter how loudly you bang the marketing drum to claim that it’s been improved.

Because of the need to observe these real-world surprises, having AI simulate the users is something I don’t expect to ever become fully possible. My article on What AI Can and Cannot Do for UX covers many areas where AI is currently weak, but can be expected to get better. Replacing humans in user research is one of the few areas that’s likely to be impossible forever.

I give the human researcher-AI user combo a skull. Don’t do this.

Option C: AI Researcher, Human User

I would not have current-level AI (GPT-4, or probably also GPT-5) analyze a complete usability test session. Current AI, which is mostly a language model, will know what users say, but not what they do, which is much more important. Future AI will have better vision capabilities, but it will probably take several years before they become sufficiently skilled to pick up on nuances in user behavior.

On the other hand, there are significant potential benefits to replacing the human study facilitator with an AI, especially in methods like surveys and user interviews that do focus on what users say. The AI is tireless, so it can interview as many users as you please. In contrast, human facilitators are expensive and quickly become bored asking the same question hundreds of times.

AI speaks all major and medium-sized languages, meaning that it can conduct international research at no added cost. (It can also translate the users’ responses back to the sponsoring research team’s local language.) This ability to conduct international interviews was one of the features UX professionals appreciated about the service Wondering in my recent study of what AI tools are used for UX work.

Along the lines of being tireless, AI is also completely neutral. No matter what the user says (or does, when better AI becomes available), the AI will proceed in an unbiased manner. All users will be treated exactly the same, something human facilitators can never fully achieve, no matter how much they endeavor to maintain a professional and unbiased attitude toward all users.

Finally, much research shows that respondents are more likely to be honest and frank when communicating with a computer as opposed to when they talk with a fellow human. (The classic example is to ask medical patients how much alcohol they drink. People invariably confess to drinking much more when discussing their alcohol consumption with a computer than when talking with a doctor.)

My drawing of a traditional user test was set in a Japanese company for a reason: so that I can comment on the notorious difficulties in getting Japanese study participants to speak up about their troubles using computers. Strong cultural norms prevent many Japanese test users from complaining or admitting to having difficulties. While I don’t know for a fact that this situation would be better if an AI were to facilitate a user research study in Japan, I suspect that it would. And while other countries don’t have as strong cultural inhibitions against being a complainer, I also suspect that we will be getting somewhat more frank comments from users in an AI-facilitated study.

I give the AI researcher-human user combo both a skull (for the risk) and a thumbs up (for the potential gains in select circumstances). Experiment with this, but be careful.

Option D: AI Researcher, AI User

I described this option in my April Fools’ Day hoax, complete with nice pictures of Mistral testing GPT-5. I even discussed the supposed benefits, such as the ability to conduct many test sessions in parallel and to employ different AIs as the researchers, to gain a wider range of insights than what you’ll get from a single researcher.

However, that piece was a hoax. I wrote it to be convincing in the hope of (temporarily) fooling readers. But all the arguments are false. Do not get tempted by the siren songs you’ll no doubt be subjected to from time to time. The arguments for getting completely rid of humans are seductive. It’s a great song, but it’s fiction.

Beware the siren song about the attractive idea of eliminating humans from user research. Follow the example set by Odysseus and lash yourself to the mast to avoid jumping overboard. (Midjourney)

All the reasons I gave under Option B to avoid “user” research without human users apply equally well to Option D.

The AI-AI combo rates a skull. Don’t do this.

Summary Infographic

Here is my infographic to summarize my recommendations in this article:

Feel free to copy or reuse this infographic, provided you give this URL as the source.

About the Author

Jakob Nielsen, Ph.D., is a usability pioneer with 41 years of experience in UX and the Founder of UX Tigers. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. He has been named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today. Previously, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including the best-selling Designing Web Usability: The Practice of Simplicity (published in 22 languages), the foundational Usability Engineering (26,840 citations in Google Scholar), and the pioneering Hypertext and Hypermedia (published two years before the Web launched). Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI and was named a “Titan of Human Factors” by the Human Factors and Ergonomics Society.

· Follow Jakob on LinkedIn.

· Subscribe to Jakob’s newsletter to get the full text of new articles emailed to you as soon as they are published.

· Read: article about Jakob Nielsen’s career in UX

· Watch: Jakob Nielsen’s 41 years in UX (8 min. video)

Anil Kumar Soman

Apr 3, 2024

AI is already tricking human users into an addictive attention economy ... A prime example of this is the infinite-scroll newsfeed in Facebook and auto-playing YouTube ... a well-curated personalised experience is being engineered from the insights and learning from a demographic profile ... This is already here and imagine AGI entering this space ... It's going to watch every move of a user in a digital space and anticipate your next move ... AH! is it lovely or scary?

Ted Boren

Apr 12, 2024 (edited)

I felt something was wrong with the first image in the article, and it was feeling creepy somehow... I finally saw why. The right hand of the person is backwards. When turned, the thumb should be closest to the viewer, not the pinky, according to all healthy anatomical rules. Try to assume the pose yourself and you'll break your wrist. Did AI mean to do this? Did Jakob mean to let it slip through? Am I reading too much into it?
