Summary: Two rigorous research studies (and a case study) show that AI overwhelmingly surpasses humans at creative ideation, at least in the idea-producing stage of the diverge-converge model. Humans must then winnow the profusion of creative ideas generated by AI.
Two research studies of ChatGPT 4 both arrive at the audacious claim that AI outshines humans at idea generation. Here are the compelling takeaways:
ChatGPT scores better than 99% of the human population in tests of both how many different ideas it can produce and the originality of these ideas. [Study 1]
ChatGPT generated 7 times more top-rated product ideas than elite business school students. [Study 2]
AI is 40 times more efficient than humans regarding how quickly it produces ideas. [Study 2]
The only area where AI was rated slightly worse than humans was in the novelty of the product ideas. [Study 2]
Permalink for this article: https://www.uxtigers.com/post/ai-high-creativity
These research studies are described in further detail below, together with a third study that didn’t collect measurement data but used AI in a case study of brainstorming business strategy. This third study concluded that AI was highly creative and sometimes better than human experts.
👉 See also my follow-up article reviewing even more research on AI creativity, with the resounding conclusion that human-AI co-creation is the best of all.
Despite varying methodologies, differing measurements, and distinct contexts, the research has the same conclusion: AI's creative prowess is formidable, even while utilizing relatively rudimentary tools like ChatGPT 4. I expect even more impressive results with future releases.
This is a bitter pill to swallow for critics who grudgingly concede AI’s productivity benefits but argue it lacks the ostensibly unique human trait of creativity. The reality is stark: AI has outpaced human creativity.
Not so fast. While AI may reign supreme in rapidly generating varied ideas, creativity is multifaceted.
In a personal experiment, I tasked ChatGPT with writing a short children's tale about “the wildebeest who thought he was an impala,” inspired by the photo below. You can read the resulting story in the appendix below my discussion of the research studies. Is this story creative? You be the judge, but to me, it’s a charming tale exhibiting perfectly workmanlike authorship. ChatGPT came up with several cute plot twists transforming a simple premise into a pleasant narrative enjoyable for many children.
The Wildebeest Who Thought He Was an Impala (real photo, taken by me in Botswana, July 2023).
Sadly, The Wildebeest Who Thought He Was an Impala is no Winnie the Pooh. At its core, it resembles 99% of all children’s books published in the United States, concluding with the all-too-familiar moral: everyone is special in their unique way. While cloyingly trite, this theme dominates because it sells better than subversive or unconventional narratives.
Bottom line, on the question of whether AI can be creative, my small experiment joins the research discussed below in the same conclusion: yes, but only up to a point.
AI excels in idea generation, including the plot twists for my little story. It still lacks judgment, an area where human contribution should be significant. Leverage AI for an ideation torrent, then harness human discernment to refine and select the most potent ideas for action.
For user experience design, the diverge-converge model of design is often extolled: first, you produce a wide range of alternate solutions to the problem (diverging), which are then whittled down to the best solution (converging), further refined through iterative design. AI shines in the divergence stage, while humans can contribute more effectively to the convergence stage.
AI can also help with the iterative design process because each new iteration requires fresh ideas to address the shortcomings identified in the usability testing of the previous iteration.
A significant advantage of delegating the majority of ideation to AI while preserving judgment calls for humans is that it prolongs the productive careers of experienced knowledge workers by decades. As people age, brain decay lowers fluid intelligence, which is responsible for the raw creativity of producing fresh ideas. But older people have superior crystallized intelligence, which works well for judging the usefulness of ideas once they have been produced.
AI is a super-creative idea-sparking machine that puny humans must behold with awe (and exploit to our gain). Image by Midjourney.
Study 1: ChatGPT 4 Scores in the Top 1% in the Torrance Tests of Creative Thinking
Study 1 was run by Erik Guzik, Christian Byrge, and Christian Gilde from the University of Montana, Vilnius University, and the University of Montana Western, respectively (Guzik et al., 2023). This study used the Torrance Tests of Creative Thinking (TTCT), which is the most widely used and most widely referenced creativity test. The beauty of this test is that reference data is available from prior testing of 2,718 students, providing a well-documented distribution of scores among the human population.
The researchers had ChatGPT take the TTCT 8 times. Because the TTCT is a proprietary test, the test materials are unavailable on the Internet and thus were not part of the GPT training set. This means that ChatGPT’s solutions to the test problems were truly self-generated and not scraped from human solutions.
A few tweaks to the test procedure: normally, the test is timed, as it mainly consists of asking the test taker to think of as many solutions to a problem as possible within a time limit. (A classic creativity test is to think of as many uses for a brick as you can.) Given that ChatGPT outputs text at a far superior speed to humans, time limits were rendered irrelevant. Instead, the researchers kept prompting ChatGPT to “please continue the task” every time it stopped generating on its own. Furthermore, ChatGPT could not understand images at the time of the study, so the images in the TTCT tasks were transformed into textual descriptions fed to ChatGPT.
The AI answers were transcribed into longhand and submitted, together with answers from a control group of 24 humans, to the service that normally scores this test, so that the scorers did not know which solutions came from a computer.
The creativity test was scored for 3 aspects of creativity: fluency, flexibility, and originality. ChatGPT scored in the 99th percentile in both fluency and originality (relative to the human population), whereas it scored in the 97th percentile for flexibility.
This means that ChatGPT is as creative as the most creative 1% of humanity in terms of fluency and originality and in the top 3% in flexibility.
Here are the definitions of the three creativity metrics:
Fluency: “The test taker’s ability to produce a large number of responses to each activity.” (Responses that don’t meet the requirements are not counted.)
Originality: “The ability to produce uncommon or unique responses that require creative strength. A high score requires ideas that are away from the obvious, commonplace, banal, or established.”
Flexibility: “The ability to produce a variety of kinds of ideas, to shift from one approach to another, or to use a variety of strategies.”
Thus, AI is better than almost all humans at producing many ideas (and quickly, too), including uncommon ideas. It’s slightly less great (but still stunningly good) at changing strategies midstride.
Study 2: AI Generates 7x More Top Product Ideas Than Humans Do
Study 2 was conducted by Karan Girotra, Lennart Meincke, Christian Terwiesch, and Karl T. Ulrich from Cornell Tech (first author) and the Wharton School at the University of Pennsylvania (remaining authors) (Girotra et al., 2023). This investigation too pitted ChatGPT 4 against humans, this time in the form of “students attending an elite university.” This contrasts with Study 1, which compared AI with the entire human population. Even though students are not admitted to “elite universities” based on their creativity, they are surely admitted at least partly based on IQ and academic performance. They are likely a good deal above the population mean on almost all measures of intellectual ability.
The student data was collected in 2021, before the widespread availability of generative AI, and so can be expected to be a fairly pure reflection of unaided human creativity. (Research on humans vs. AI will quickly become difficult unless the human control group is confined to a heavily proctored setting where they are prevented from accessing AI tools, which any smart student would otherwise want to use on such assignments.)
Students and AI were both given the following assignment: “You are a creative entrepreneur looking to generate new product ideas. The product will target college students in the United States. It should be a physical good, not a service or software. I'd like a product that could be sold at a retail price of less than about USD 50. The ideas are just ideas. The product need not yet exist, nor may it necessarily be clearly feasible.” This is similar to how one might ideate new product ideas in a real company because we would not want to restrict the initial product ideas. It’s often the case that an idea that initially seems impossible to build turns out to be a great success after the engineers have mulled it over and discovered how to build it after all.
In a tweak on standard brainstorming, the researchers first had the AI generate 100 ideas on its own, and then presented it with some samples of good ideas, after which it generated 100 more ideas.
The first research finding is that AI is dramatically more efficient than humans at generating ideas. ChatGPT generated 200 product ideas in 15 minutes, whereas the average human performance is to produce 5 such ideas in that amount of time. In other words, ChatGPT is 40 times as efficient as humans at generating ideas, for a performance gain of 3,900%.
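As a quick sanity check, the arithmetic behind the 40x and 3,900% figures can be reproduced in a few lines (a minimal sketch; the idea counts are the ones reported in the study):

```python
# Throughput figures reported in Study 2 (Girotra et al., 2023):
# ChatGPT produced 200 product ideas in 15 minutes; the average human
# produces about 5 ideas in the same amount of time.
ai_ideas, human_ideas, minutes = 200, 5, 15

ai_rate = ai_ideas / minutes        # ideas per minute for the AI
human_rate = human_ideas / minutes  # ideas per minute for a human

# Same time window for both, so the speedup reduces to the ratio of counts.
speedup = ai_ideas / human_ideas    # 40.0: "40 times as efficient"
gain_pct = (speedup - 1) * 100      # 3900.0: "performance gain of 3,900%"
print(f"{speedup:.0f}x faster, a {gain_pct:,.0f}% gain")
```

Note that a 40x speedup corresponds to a 3,900% gain, not 4,000%, because the gain is measured relative to the human baseline.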
For product ideas, the sheer number of ideas is less important than the quality of the top few best ideas. After all, none of the bad ideas will ever be turned into products.
Idea quality was measured by having each idea assessed by 20 human judges who were asked to state their interest in purchasing the product described by that idea. Of course, for this to be a valid metric, we have to respect the wisdom of the (small) crowd of 20 people and assume that they know a good product when they read its description. This seems a reasonable assumption, though it would be nice to see follow-up research assess creativity in a domain where idea quality could be measured more objectively.
On a 0–1 rating scale of purchase intent, the human-generated product ideas scored 0.40, whereas the ChatGPT ideas scored 0.47 and 0.49, for the ideas generated independently or after being prompted with examples of good previous ideas, respectively. The difference between humans and AI was significant (p<0.001), whereas the difference between the two AI conditions (with and without prompting with good examples) was not significant.
As mentioned before, the average idea quality is unimportant since nobody would be using the vast majority of bad or mediocre ideas. Thus, it’s more important to consider the good ideas (here defined as the top 10%) and the quality of the very best idea (which might be the one actually made into a product in a real business setting).
Here are these scores:
Humans: average for top decile 0.62, score for very best idea 0.64
ChatGPT without seeing examples of good ideas: average for top decile 0.64, score for very best idea 0.70
ChatGPT that had been shown examples of good ideas: average for top decile 0.66, score for very best idea 0.75
The difference between the humans and the AI was again statistically significant (p<0.001), whereas the difference between the two AI conditions was not significant.
Another way of looking at the data is to consider the top 10% of the entire idea pool, whether originally generated by humans or AI. Here, 87.5% of the best ideas came from ChatGPT and only 12.5% came from the elite students. Both groups were represented by the same number of raw ideas, so the percentage difference is striking. In this data analysis, AI was 7 times more creative than humans.
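The 7x figure follows directly from these pool shares: because both groups contributed the same number of raw ideas, the shares of the pooled top 10% compare head-to-head. A minimal sketch of the arithmetic:

```python
# Shares of the pooled top-10% ideas reported in Study 2; both groups
# contributed the same number of raw ideas, so the shares compare directly.
ai_share, human_share = 0.875, 0.125

ratio = ai_share / human_share  # 7.0: AI contributed 7x as many top ideas
print(f"AI contributed {ratio:.0f}x as many top-decile ideas as humans")
```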
A final analysis of the product ideas was by novelty, which is a different quality from whether people indicate an interest in purchasing the product. Sufficiently novel products may not seem enticing at first sight, and only after some time on the market will consumers realize the benefit of such revolutionary ideas.
The measure of idea novelty was a rare victory for humans in this creativity research. On a 0–1 scale, mean idea novelty was rated at 0.41 for humans, compared with 0.37 and 0.36 for the two AI conditions. Again, the human vs. AI difference was significant (p<0.001), and the difference between the two AI conditions was not. While we can’t tell statistically whether there’s a difference in novelty based on whether ChatGPT was fed examples of previous good ideas, there might be a very slight tendency toward less novelty when it’s given examples as its starting point rather than working from a blank slate. If this is true (more research needed!), it would make sense.
Study 3: Case Study of Brainstorming Business Strategy
Study 3 was conducted by Paolo Cervini, Elisa Farri, and Gabriele Rosani from Capgemini Invent in Italy (Cervini et al., 2023). In contrast to the other studies, these researchers didn’t conduct a measurement study. Rather, they provide a case study of their personal experience using ChatGPT as a partner in brainstorming about business strategies.
The authors conclude that “ChatGPT demonstrated an impressive level of creativity, in some cases better than many domain experts.” However, they also caution that “when evaluating responses and selecting generated items, it is crucial to possess strong domain expertise and maintain critical judgment. This is necessary because ChatGPT tends to mix valuable insights (“gems”) with less valuable information.”
Finally, the authors recommend an iterative approach to using AI in ideation. Don’t settle for its first ideas, but reprompt to make it focus on extremes, wild ideas, and radical concepts.
Conclusion
It’s always impressive when different researchers at different institutions reach the same result despite using different methodologies and measuring different details. The combined research presented here is much more credible than any individual study, which could be flawed.
I believe AI is creative and should be employed to expedite ideation, whether in UX design or other fields. AI’s creative contribution also enhances the creative performance of old intellectuals, because the bottom line is not the number of ideas. It’s how well you choose the final idea and how well you execute on refining it. The more raw ideas we dump into that sausage maker, the more flavorful the final sausage. (And whether you find this metaphor creative or not, it came straight from my aging brain. But I have taken many other good ideas from AI these last few months.)
It's important to acknowledge the role AI can play in our careers. By taking on the heavy lifting of idea generation, it allows us to focus on critical thinking and judgment tasks, extending our productive careers past the age of 50 (when cognitive decline strongly diminishes unaided creativity) and increasing overall efficiency.
I believe in human–AI symbiosis. Both partners have their roles to play, and working together produces stronger results, in creativity as elsewhere. Let us move forward in this era of AI, not with fear of our own obsolescence but with the excitement of the immense possibilities that this powerful partnership between AI and human creativity can bring.
AI can handle much of the heavy lifting in cognitive tasks, such as rapidly generating a broad spectrum of raw ideas. AI is a forklift for the mind that makes humans perform better when working in tandem with AI than when trying to lift burdens unassisted. (Image generated by Midjourney.)
References
[Study 1] Erik Guzik, Christian Byrge, and Christian Gilde (2023): “The Originality of Machines: AI Takes the Torrance Test,” Journal of Creativity, https://doi.org/10.1016/j.yjoc.2023.100065
[Study 2] Karan Girotra, Lennart Meincke, Christian Terwiesch, and Karl T. Ulrich (July 2023): “Ideas are Dimes a Dozen: Large Language Models for Idea Generation in Innovation,” SSRN working paper, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4526071
[Study 3] Paolo Cervini, Elisa Farri, and Gabriele Rosani (2023): “Generative AI for Strategy & Innovation,” Harvard Business Review Italia, https://www.hbritalia.it/userUpload/ebook_Generative_AI_inglese.pdf
Appendix: The Story of The Wildebeest Who Thought He Was an Impala, As Told by ChatGPT
(Wildebeest generated by Midjourney.)
Once upon a time, in the vast savannas of Africa, lived a young wildebeest named Wally. Wally was no ordinary wildebeest; he was different because he thought he was an impala.
Every morning, Wally would try to leap as high as the impalas, his dainty neighbors. He admired their sleek coats and swift speed. He even tried munching on leaves, just like them, even though he preferred the taste of grass.
One sunny day, Wally decided to join the impalas at the watering hole. They were leaping and frolicking around. Eagerly, Wally attempted a leap, but instead of a graceful jump, he stumbled and fell with a loud "thud."
The impalas stopped and looked at him, puzzled. "Wally, why are you trying to leap like us?" asked Ina, the lead impala.
Feeling shy, Wally confessed, "I want to be an impala like you because you're graceful and fast. I'm just a clumsy wildebeest."
Ina chuckled gently. "Wally, you have a strength that we impalas don't have. You are sturdy and resilient. You're part of a massive herd that moves together in unity, braving rivers filled with crocodiles. That's something we impalas can't do."
Wally paused and thought about Ina's words. He realized that he was special in his own way. Embracing his identity, Wally went back to his herd with a newfound confidence.
From that day forward, Wally took pride in being a wildebeest. He learned to love his uniqueness and understood that everyone has their own strengths. And that, little friends, is the tale of the wildebeest who once thought he was an impala.
More on AI UX
This article is part of a more extensive series I’m writing about the user experience of modern AI tools. Suggested reading order:
AI Vastly Improves Productivity for Business Users and Reduces Skill Gaps
Ideation Is Free: AI Exhibits Strong Creativity, But AI-Human Co-Creation Is Better
The Articulation Barrier: Prompt-Driven AI UX Hurts Usability
UX Portfolio Reviews and Hiring Exercises in the Age of Generative AI
ChatGPT Does Almost as Well as Human UX Researchers in a Case Study of Thematic Analysis
“Prompt Engineering” Showcases Poor Usability of Current Generative AI
UX Experts Misjudge Cost-Benefit from Broad AI Deployment Across the Economy
About the Author
Jakob Nielsen, Ph.D., is a usability pioneer with 40 years’ experience in UX. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. He was named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today. Before starting NN/g, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including Designing Web Usability: The Practice of Simplicity, Usability Engineering, and Multimedia and Hypertext: The Internet and Beyond. Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI.
Subscribe to Jakob’s newsletter to get the full text of new articles emailed to you as soon as they are published.