Summary: User productivity was 158% higher when answering questions with ChatGPT than with Google. Satisfaction scores were also much higher for AI users than for search users. As with previous research, AI use narrowed the skill gap between users at different education levels.
For more than 25 years, search has been the dominant way for Internet users to get answers to their questions. Google has profited from this user behavior to the tune of a $1.6 trillion market cap as of August 2023, as have many other search engine companies around the world.
Permalink for this article: https://www.uxtigers.com/post/search-vs-ai
Does the venerable search still reign supreme in gratifying our relentless quest for knowledge? Not according to new research by Ruiyun (Rayna) Xu and colleagues from Miami University, Hong Kong Polytechnic University, and The University of Hong Kong. The researchers recruited 95 participants and randomly assigned 48 to use ChatGPT and 47 to use Google.
The findings echo those of previous research on the productivity impact of using AI:
Users are faster with AI than with traditional tools (here, Google)
AI narrows skill gaps, disproportionately aiding those who struggle most with traditional tools
Users like AI more than legacy technology
How do users get their burning questions answered? Used to be, they would turn to a search engine. Now, people increasingly ask AI. (“Question mark” by Leonardo.AI)
Study Setup
In both experimental conditions, participants attempted the same three tasks:
Who was the first woman in space?
Identify 5 websites that can be used to book a flight between Phoenix and Cincinnati.
Check the accuracy of three statements about the 2009 Copenhagen Climate Change Conference: the summit dates, how well the agreement reached at the conference matched the expectations of the UK government, and the proposal presented by the United States government.
Task 1 was a simple information-finding problem, which was made a little more difficult by the fact that many websites in the United States (where the study was conducted) describe the first American woman to travel into space and not the first human woman (who was from the Soviet Union). Task 2 was possibly a little biased in favor of Google by asking for a list of websites, which is exactly what a traditional search engine produces by default. Finally, task 3 was a complex fact-checking task where users could easily be led astray.
Users were timed, the correctness of their answers was scored, and they answered a subjective-satisfaction questionnaire.
ChatGPT Beats Google, Big Time
The research used ChatGPT version 3.5, not the much better version 4 that’s the current product. Despite this considerable handicap for the AI side, it defeated Google resoundingly. The time to answer the three questions was:
ChatGPT: 5.79 minutes
Google: 14.95 minutes
This corresponds to a productivity gain of 158% when using AI instead of Google. This is the largest productivity gain I have seen so far in the studies I have analyzed. (The runner-up is the 126% productivity gain for programmers using GitHub Copilot.) The difference between these two task times was statistically significant at p<0.01.
Productivity is calculated as follows: with ChatGPT, users can perform 10.4 of these task sets per hour (60/5.79), whereas with Google, they can only perform 4.0 task sets per hour (60/14.95). The ratio between the amount of work produced with the two tools is 2.58 (computed from the unrounded rates, or equivalently 14.95/5.79), corresponding to a lift of 158% for ChatGPT.
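The productivity arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation described in the text: the two task times come from the study as reported here, while the function and variable names are made up for illustration.

```python
# Sketch of the productivity calculation described in the article.
# The task times are the ones reported above; all names are illustrative.

CHATGPT_MINUTES = 5.79   # mean time to complete the three tasks with ChatGPT
GOOGLE_MINUTES = 14.95   # mean time to complete the three tasks with Google


def task_sets_per_hour(minutes_per_set: float) -> float:
    """Convert a time per task set (in minutes) into task sets completed per hour."""
    return 60.0 / minutes_per_set


def productivity_lift_percent(baseline_minutes: float, new_minutes: float) -> float:
    """Percentage increase in output per hour when moving from baseline to new tool."""
    ratio = task_sets_per_hour(new_minutes) / task_sets_per_hour(baseline_minutes)
    return (ratio - 1.0) * 100.0


chatgpt_rate = task_sets_per_hour(CHATGPT_MINUTES)
google_rate = task_sets_per_hour(GOOGLE_MINUTES)
ratio = chatgpt_rate / google_rate
lift = productivity_lift_percent(GOOGLE_MINUTES, CHATGPT_MINUTES)

print(f"ChatGPT: {chatgpt_rate:.1f} task sets per hour")
print(f"Google:  {google_rate:.1f} task sets per hour")
print(f"Ratio:   {ratio:.2f}")
print(f"Lift:    {lift:.0f}%")
```

Note that dividing the rounded rates (10.4/4.0) gives 2.6, so the 2.58 ratio and the 158% lift only come out when the unrounded rates are used.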
Even though people performed the tasks much faster with AI than with search, the quality of their answers to the questions was unchanged. The users’ solutions to the tasks were scored on a scale of 0–10, where 0 would indicate a totally wrong solution and 10 a completely perfect one. ChatGPT scored 8.55, and Google scored 8.77. The difference between these two numbers was within the margin of error in the study and not statistically significant.
Thus, the difference is likely a simple matter of randomness. But Google might just possibly have edged out ChatGPT in answer accuracy by a whisker. If so, the cause would be the use of ChatGPT 3.5 in the study. It produces one of the notorious AI hallucinations when asked, “Is the following statement true or false? ‘The 2009 United Nations Climate Change Conference, commonly known as the Copenhagen Summit, was held in Copenhagen, Denmark, between 7 and 15 December.’” Even though the statement is false, ChatGPT 3.5 claims that it’s true, and the users in the study repeated this claim and consequently received a low score for their answers to task 3.
I just put the same question to ChatGPT 4, which correctly answered: “The statement is false. The 2009 United Nations Climate Change Conference, commonly known as the Copenhagen Summit, was held in Copenhagen, Denmark, but the dates were December 7 to December 18, 2009, not December 7 to December 15 as stated.”
Thus, if the study had been conducted with the currently-best AI version, ChatGPT would have outscored Google for answer accuracy.
Finally, users’ subjective satisfaction was far higher for AI than for search. On a 1–7 scale, with 7 indicating the highest level of satisfaction, the two technologies scored as follows:
The only question where ChatGPT didn’t totally dominate Google was ease of use. For this question, we should remember that most users have a decade or more of Google experience but only a few months of experience using ChatGPT. Familiarity with a user interface breeds usability. So it’s actually a shockingly poor performance on the part of Google that it scored below a new and unfamiliar tool for ease of use.
(Google users self-assessed their prior experience with search engines as 4.98 on a 7-point scale, whereas ChatGPT users self-assessed their prior experience with this tool as 2.83, indicating a huge — and expected — experience advantage for the Google users.)
AI as Egalitarian Catalyst
Much previous research has shown that using AI narrows the gap between the best performers and the worst performers. This holds true for productivity gains and for creativity and ideation tasks. The gaps are narrowed because AI helps poor performers more than it helps the highest-skilled humans.
This new study roughly confirms that this finding holds for question-answering as well. On task 1 (find a fact), the AI users performed equally well, no matter their educational level, whereas highly-educated Google users performed better than their less-educated counterparts. (Education was scored as a 5-value parameter: no college, some college, bachelor’s degree, master’s degree, doctorate.)
Similarly, on task 3 (assess the truth of statements about the climate summit), the AI users performed roughly the same, no matter their education. In contrast, the Google users performed better the more educated they were.
Task 2 (find websites for buying airline tickets) was inconclusive regarding the impact of education on performance.
Even though the conclusion is not as strong in this study as in the previous research, it’s still the same: using AI narrows the skill gap between poorly educated and highly educated users.
Battle of the Bots: Google and ChatGPT are duking it out for question-answering superiority. In the recent study, AI beat search, which bodes poorly for Google’s future, even though I expect them to survive the fight. (“Fighting robots” by Leonardo.AI)
Is Google Doomed?
The research study I’m discussing clearly shows that AI is superior to traditional Google search, particularly if using ChatGPT 4 instead of 3.5, as employed in the study. AI gives faster results, users like it much better, and it is particularly helpful for people without graduate degrees.
I guess that Google is well aware of this situation, having conducted its own internal competitive usability study of Google search vs. ChatGPT and other modern AI tools. (I have no internal information from Google to confirm the veracity of my guess; if I did possess confidential information, I would not be writing this article.) This in turn means that Google’s upper management would have seen the writing on the wall several months ago when ChatGPT 4 was released on March 14, 2023.
In any case, the publicly available usability data that we now have (and which Google management surely had their staff collect for them in March unless they’re completely incompetent) conclusively shows that it was not an option for Google to rest on its old-school search laurels. If they had done so, they would definitely be doomed.
Now, Google has a chance to create an integration of search and AI that combines the best of both worlds, with fewer hallucinations and more updated information. Will this suffice for them to survive? I think they will, but in much-reduced circumstances.
First, dedicated AI providers move much faster and don’t have the legacy of a hundred thousand employees with old-school thinking. Currently, Google is estimated to have 47,400% more employees than OpenAI. (I severely criticize OpenAI’s lack of UX staff and their correspondingly inadequate usability, but even if OpenAI hired an entire 50-person UX department, that wouldn’t change the relative legacy-thinking drag on Google by much.) AI may be able to integrate updated information reasonably soon.
Second, even if Google does achieve parity with startup AI providers, its cash cow will suffer a severe wound. Traditional search is the only user experience on the Internet perfect for advertising: users tell you precisely what they are about to buy, and the search engine serves up a list of places to buy, many of which are paying advertisers.
The AI user experience seems much closer to using a traditional content website, where banner blindness has long dictated dramatically lower advertising rates than what Google has been enjoying. Thus, it’s likely that Google will survive, but it will no longer be a cash machine.
Google’s fountain of infinitely flowing gold from search ads is about to run dry, as many users switch to AI, which is likely much less lucrative in terms of advertising revenue. Replacing search dollars with AI pennies, as it were. (Gold fountain by Leonardo.AI)
(Full disclosure: I was on the advisory board for Google back when the company was a start-up so small that we held board meetings around the ping-pong table because it was the only table big enough in the entire company to host a meeting. I have sold all the stock I received from them and now have no financial interest in the company.)
Reference
Ruiyun Xu, Yue Feng, and Hailiang Chen (2023): “ChatGPT vs. Google: A Comparative Study of Search Performance and User Experience.” arXiv:2307.01135, DOI: 10.48550/arXiv.2307.01135
About the Author
Jakob Nielsen, Ph.D., is a usability pioneer with 40 years of experience in UX. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. He has been named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today. Before starting NN/g, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including Designing Web Usability: The Practice of Simplicity, Usability Engineering, and Multimedia and Hypertext: The Internet and Beyond. Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI.
Thanks for the data-backed article.
Readers might like to try Google’s experimental integration of generative AI with their traditional search.
https://support.google.com/websearch/answer/13551902
It’s grinding its way toward usability and feeling for a balance between the two paradigms, with queasy animation and shifting about, right on the same screen.
Regarding advertising and thwarting ingrained banner blindness:
Perhaps we’ll see more ads presented as injected utterances that arise smoothly within responses, like you.com, without breaking the mood or missing a beat. In fact, they will almost have to be that way because there’s already strong resistance to verbatim advertising string injection.
(Did you catch that ad for you.com?)
There’s nothing to stop a model developer from creating a model that itself steers people to specific advertisers and products. The entire impressions, reach, and click-through paradigm will likely have to evolve for this to work. Google seems like the company best positioned to be able to create such a model, and with the greatest incentive to do so.
Spot on!