UX Roundup: Jakob Live | Reducing Extreme Poverty | Training with AI Agents | Healthcare Leapfrogging | Human vs AI Doctors | AI Leaderboard
Summary: Jakob Live | Progress in reducing extreme poverty | AI Agents as training aides | Will AI finally leapfrog healthcare delivery into the modern world? | AI beats human doctors for diagnosis | AI Leaderboard
UX Roundup for October 7, 2024. (Midjourney)
Jakob Nielsen Live on Stage in San Francisco
I will have a rare in-person event next week in San Francisco as part of SFTechWeek: a conversation with Felix Lee, CEO of ADPList. The event is hosted by Dovetail (thank you!) at its main San Francisco location.
Event time: Thursday, October 10, 6:00 PM.
The event is free, but advance registration is required due to limited seating.
Live on stage in San Francisco October 10. This was the best I could do, but the characters don’t really look like us. (Ideogram)
Progress in Reducing Extreme Poverty in the World
I’ll start this newsletter with some good news: the world economy is back on track in reducing extreme poverty. The percentage of the world population living in extreme poverty is now the lowest it has been in human history: 8.5%.
This is 690 M people who are extremely poor, but that’s much better than the 1.8 B people who were extremely poor in 2000. More than a billion humans have escaped the miserable existence of extreme poverty in less than a generation.
This happy news comes from the World Bank’s global poverty project.
The number of people living in poverty, though not extreme poverty, is also down: it now stands at 43.6% of the world's population, or 3.5 B people, down from 4.2 B in 2000.
The definitions of extreme poverty and regular poverty are:
Extreme poverty: living on less than US $2.15 per day, in inflation-adjusted 2017 dollars, corrected for purchasing power parity (PPP). Because of inflation, this corresponds to $2.76 in 2024 dollars, or $1,007 per year.
Regular poverty: living on less than US $6.85 per day, in inflation-adjusted 2017 dollars, corrected for PPP. This corresponds to $8.80 in 2024, or $3,212 per year. (Each annual figure is simply the daily threshold times 365, as the quick sketch below shows.)
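For readers who want to check the arithmetic, here is a minimal sketch that annualizes those thresholds. The daily dollar figures are the World Bank's; multiplying by a flat 365 days is my simplifying assumption.

```python
# Annualizing the daily poverty thresholds quoted above.
# Daily figures are the World Bank thresholds in 2024 PPP dollars;
# multiplying by a flat 365 days is a simplifying assumption.
THRESHOLDS_2024_PPP = {
    "Extreme poverty": 2.76,  # US dollars per day
    "Regular poverty": 8.80,  # US dollars per day
}

for label, daily in THRESHOLDS_2024_PPP.items():
    print(f"{label}: ${daily:.2f}/day = ${daily * 365:,.0f}/year")

# Extreme poverty: $2.76/day = $1,007/year
# Regular poverty: $8.80/day = $3,212/year
```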
I doubt many of my readers earn less than $3 K per year, let alone less than $1 K. But that’s because you mostly live in rich countries and mostly have good jobs. (I target readers with IQ and education equivalent to at least an undergraduate college degree.)
The following chart shows the percentage of the world population living in either extreme poverty or regular poverty from 1981 to 2024, based on that World Bank data. There was a small bump up in 2020 due to the economic slowdown caused by the Covid lockdowns, but the big trend is clear: Down, especially for extreme poverty, which is by far the worst problem.
Worldwide poverty rates since 1981. Extreme poverty is less than $2.76 daily (2024 PPP dollars), and regular poverty is less than $8.80 daily (2024 PPP dollars). Plotted based on World Bank data.
Even though poverty trends are down, extreme poverty is not dropping as fast as it used to. Before 2020, the biggest gains came from introducing market economies in China and India. Both countries can still improve, especially in the area of regular poverty. But almost all remaining extreme poverty is in Sub-Saharan Africa, where the economy has been resistant to improvements.
This is one of the reasons I am so aggressive in promoting fast buildout of AI. Yes, this will likely double our standard of living in rich countries. Nice, and I’ll take it.
But the most significant benefits will accrue to poor people in both categories tracked here. Doubling the world economy will undoubtedly give them more money, just as contracting the world economy in 2020 hurt them. Even better, AI will incalculably improve education and healthcare in poor countries.
When people live on less than $3 per day, they can’t afford $2 in daily tuition per child to free their kids from government schools where the teachers are incompetent and rarely show up for work in the first place. It won’t take many years before AI-powered education turns this problem around. As poor children finally get a decent education, they’ll grow up qualified for much better jobs than their parents had and will likely join the middle class. For sure, they’ll escape the extreme poverty trap.
AI Agents as Training Aides
Olivia Moore is one of my favorite analysts of the AI business space. She and Anish Acharya have written an interesting analysis of the potential for AI voice agents, now that we have technologies like OpenAI’s advanced voice mode and Google’s NotebookLM automated podcast hosts.
Their slogan is, “Now is the time to reinvent the phone call,” because they consider phone calls “an API to the world.” I’m not sure I agree that phone calls are the best way for AI agents to interact with companies in the long run, but there are certainly many promising cases for the short term.
An interesting voice AI use case they discuss is Hyperbound, which helps companies train salespeople using simulated calls with voice agents that are customized to their customer personas. Sales reps from 7k companies have used Hyperbound to practice 100k calls. (The Hyperbound AI agent is not an actual customer; rather, human sales reps use it to train their sales technique in simulated calls before bothering actual customers.)
A related idea is to make an AI voice agent that can simulate users in user research. This would allow newbie UX researchers to hone their skills in usability study facilitation, user interviews, and other research methodologies without the need for expensive human test participants.
Just as the Hyperbound simulated sales calls don’t close actual sales with real customers, it’s important to limit this use of simulated research participants to training sessions intended to build the skills of junior staff and students. A simulated user is not a real customer, so you cannot use anything the AI says as the basis for design decisions. Eventually, you’ll want to do real research with real customers, but training sessions with AI will help you get better at those studies.
Besides the cost savings of using AI instead of customers, having research training sessions with an AI agent also allows researchers to experience “difficult users” and practice different ways of steering a test session back on track when and if certain test participants derail it. Luckily, most test participants are nice and attempt to comply with study protocols. But sometimes you get participants who clam up and don’t think out loud. Or, conversely, you get chatty people who go off on extended tangents at the smallest chance. And you even get people who cheated on the recruiting screener to get admitted to a study they don’t qualify for.
How do you get the quiet people to talk? How do you get the chatty ones to focus on the test tasks? And how do you spot cheaters? Practice sessions help you improve your skills in these situations.
Practicing study facilitation skills while interacting with an AI agent that simulates a user. This doesn’t produce valid test data, but is a good way to train students or new UX research staff. (Midjourney)
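If you want to experiment with this idea today, a text-only version is easy to prototype. Below is a minimal sketch using the OpenAI chat API; the “chatty participant” persona prompt is my own illustration, and a production training tool would presumably use a voice interface. As noted above, such transcripts are for practice only, never for design decisions.

```python
# Minimal sketch of a simulated "difficult participant" for practicing
# study facilitation. Text-only for simplicity; the persona prompt and
# model choice are illustrative assumptions, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = (
    "You are role-playing a usability-test participant who is extremely "
    "chatty and drifts into long personal tangents at every opportunity. "
    "Stay in character, and only return to the task when the facilitator "
    "explicitly redirects you."
)

history = [{"role": "system", "content": PERSONA}]

def participant_reply(facilitator_says: str) -> str:
    """Send the facilitator's line; return the simulated participant's answer."""
    history.append({"role": "user", "content": facilitator_says})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(participant_reply(
    "Please find a flight to Boston on this site, thinking out loud as you go."
))
```

Swap in a persona that clams up and refuses to think aloud, or one that cheated on the screener, and trainees get to practice handling each of the difficult-participant types described above.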
This idea exemplifies the “AI as a teacher” metaphor for working with AI. Of course, you could do the same with a human instructor, but no university (or commercial) course I have ever seen has enough instructors for every student to practice twenty study sessions, which is probably what it takes to develop most students’ skills.
Will AI Finally Leapfrog Healthcare Delivery into the Modern World?
Julie Yoo (general partner at the A16Z VC firm) wrote an interesting article titled “The leapfrog opportunity in healthcare AI.” Anybody who has been to a doctor’s office in the last 20 years will have noticed that the doctor spends more time with a terrible computer system than with the patient.
Physicians spend more time with low-usability health information systems than with patients. (Midjourney)
Particularly in the United States, healthcare runs on antiquated technology. Honestly, doctors still often send prescriptions by fax. Yoo mentions that IT budgets in the healthcare sector are only 3.5% of revenues — less than half of IT spending in the financial sector.
However, this lack of good IT in healthcare can now be turned into an opportunity: there’s not as much legacy technology standing in the way of AI. (In contrast, most other enterprise businesses run on immense legacy systems that will take time and expense to replace with AI.) AI in healthcare is a classic leapfrog scenario.
Frogs are notoriously strong jumpers. The idea of leapfrogging is that you can jump straight to the newest technology when you never installed the legacy technology that would otherwise stand in the way. A classic example is mobile telephony in Africa, which leapfrogged the landlines that were never installed in many places. (Midjourney)
Yoo points out that “the scarcest asset in healthcare is clinical judgment, which today only exists in the form of human doctors and nurses.” AI has the potential to take over some of this clinical judgment, especially in routine cases, and to free up much of the time clinicians currently waste on non-clinical paperwork.
Being an investor, Yoo is particularly thrilled about the business opportunity of using AI in medicine in the United States. The country spends a stunning $4T on healthcare annually, yet it’s almost impossible to innovate due to heavy-handed government regulatory agencies. The good news, from her perspective, is that agencies like the FDA eventually do approve many AI systems, and once a system is approved, the immense regulatory bureaucracy serves as a moat that keeps competitors from entering the market, allowing fat profits for the companies that succeed in overcoming the agencies.
When a wide moat surrounds a castle, it’s hard for besieging enemy forces to get across to attack the defenders. Warren Buffett famously used moats as a metaphor for a business that’s impervious to competition. (Midjourney)
While I appreciate that investors can make the most money in American healthcare, I am most excited about AI’s potential to improve healthcare in developing countries. While clinical judgment is scarce in the U.S., it’s non-existent in many places around the world. Even a poor town with a few overworked doctors typically won’t have the specialists needed for advanced care. AI-driven healthcare will uplevel the quality of medicine for billions of people.
Of course, there’s no conflict between using AI to improve healthcare in rich and poor countries. Once an AI service has been developed (likely funded by investors in rich countries), we can also add sufficient compute to run inference for patients in poor countries. It may take two to four additional years before effective AI compute becomes cheap enough for poor countries to afford it. But since effective compute prices drop by about a factor of 100 every two years, something that’s expensive now will be cheap soon enough. (Unlike human experts: a specialist physician who makes $500K in the U.S. this year won’t be willing to work in Africa for $5K in two years or $50 in four years.)
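To spell out that arithmetic, here is a minimal sketch of the cost curve. The factor-of-100 drop every two years is the estimate stated above; real prices won't fall in a perfectly smooth exponential.

```python
# Cost of an expert-equivalent AI service, assuming effective compute
# prices fall by a factor of 100 every two years (the estimate above).
def projected_cost(initial_cost: float, years: float) -> float:
    """Cost after `years`, under a constant 100x-per-2-years price drop."""
    return initial_cost / (100 ** (years / 2))

for years in (0, 2, 4):
    print(f"Year {years}: ${projected_cost(500_000, years):,.0f}")

# Year 0: $500,000
# Year 2: $5,000
# Year 4: $50
```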
AI Beats Human Doctors for Medical Diagnosis
A new paper by Ethan Goh and many coauthors, mainly from Stanford University, compares the performance of ChatGPT-4 and human physicians in diagnosing patients. The study included 50 human doctors from major hospitals, half junior (residents) and half mid-level (attending physicians).
The participants were asked to diagnose 6 case studies “chosen to reflect diagnostic challenges across different adult medicine specialties” based on written information. The challenge of diagnosing patients based on their medical records without access to the actual patient ensured that all participants (human or AI) received the exact same experimental stimulus. However, not having the patient at hand is probably not the preferred approach to healthcare.
The diagnoses were then scored by a team of expert human doctors, who obviously didn’t know the study conditions that produced the various diagnoses. The scores represent the percentage of medical insights the experts wanted included in a diagnosis for each case study.
In the two study conditions, participants performed the diagnosis either with ChatGPT-4 or with conventional medical resources such as Epocrates or even Google. The no-AI participants were explicitly asked to refrain from using any AI tool but could use any other medical resource they normally use for clinical care.
The doctors who used ChatGPT-4 scored a little higher than those who used conventional medical resources: 76% vs. 74%. However, this difference was not statistically significant.
Of more interest was that ChatGPT-4 on its own handily outperformed the human doctors, with a score of 92%. The difference between AI and humans was significant at p=0.03.
Diagnosing patients based on written health records. AI alone did better than human doctors alone, and sadly, the human doctors didn’t take advantage of all the AI’s diagnostic insights when they were given AI help. (Midjourney)
In other words, when AI was given written medical case studies, the experts rated its average diagnostic insights at 92%. But when junior or mid-level human doctors were given AI insights to help with their diagnoses, the resulting diagnoses were only rated at 76%. The human doctors rejected 16 percentage points’ worth of insights they had received from the AI.
Sadly, it’s all too common right now for humans to undervalue AI. Yes, AI makes mistakes: even in this case study, AI only got 92% of diagnostic insights and missed 8%. But we have probably overemphasized the problem of AI errors to the extent that many people are too skeptical of AI.
After all, the question is not whether AI is perfect but whether it is better than what humans would do without it. When AI helps, we should use it, despite its weaknesses.
My hope is that as we get more experience with AI being equal to or better than humans, people will stop discriminating against it.
AI Leaderboard
The LMSYS Chatbot Arena is a website where you can submit the same prompt to two different AI services and compare their output. It comes in two versions:
“Battle,” where the site randomly chooses two AI versions for you and doesn’t disclose their names until after you have voted for the answer you think is best.
“Side-by-side,” where you pick the two competing AIs yourself. Of course, when you know who generated which answer, any biases you may have will influence your vote, so this is not as fun an experiment, though it can be useful if you specifically want to compare two particular AIs.
Participating in a few Arena battles is a service to the community because you provide an independent assessment of how the many different AI services perform on the particular problems you are interested in.
The data for the AI Leaderboard comes from people like you feeding the same prompt to two AI services and then refereeing their fight to produce the best answer. (Midjourney)
It is also a way for you to hone your skills at assessing AI output. When you get two different answers to the same prompt and have to vote on which is the better solution, you have to think about what makes one AI answer better than another.
Based on thousands of pairwise votes, the site then constructs the AI Leaderboard, which ranks the current AIs by how often they have been rated better than their competitors.
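Rating systems in the Elo family are the standard way to turn such pairwise votes into a ranking: each vote nudges the winner’s rating up and the loser’s down, with bigger adjustments for upsets. Here is a minimal sketch of the technique; it is illustrative, not LMSYS’s actual implementation.

```python
# Minimal Elo-style leaderboard built from pairwise votes.
# Illustrates the general technique only; not LMSYS's actual code.
from collections import defaultdict

K = 32  # step size per vote; a common Elo choice
ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000

def record_vote(winner: str, loser: str) -> None:
    """Update both models' ratings after one human vote."""
    # Expected probability that the winner would beat the loser,
    # given their current ratings.
    expected = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected)  # surprise wins move ratings more
    ratings[loser] -= K * (1 - expected)

# Three hypothetical battles:
record_vote("model-b", "model-a")
record_vote("model-b", "model-c")
record_vote("model-a", "model-c")

for rank, (model, rating) in enumerate(
    sorted(ratings.items(), key=lambda kv: kv[1], reverse=True), start=1
):
    print(rank, model, round(rating))
```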
As an example, I asked for bullet points to promote this newsletter’s healthcare-AI item on LinkedIn. Here are the two answers I received:
AI A
💾 Less IT legacy in healthcare = faster AI adoption
👨‍⚕️ Doctors spend more time with bad computers than patients
⏱️ AI can free up doctors' time.
⚕️ Focus on patient care, not paperwork
💔 Lack of access to healthcare in developing countries
🌍 Cheaper AI-powered diagnostics for poor areas
AI B
🐸 Healthcare’s tech is stuck in the past, but AI can leapfrog ahead
🩺 AI stepping in where clinical judgment is scarce
🧠 Routine decisions, automated. Sharpen clinical focus where it matters
🏰 A moat around healthcare? Regulations make AI approval tough, but profitable
📉 From scarcity to abundance: AI’s promise for global healthcare
💸 AI tech getting cheaper fast: AI-driven healthcare for everyone
Which do you prefer? There’s a clear stylistic difference, with AI B being more verbose.
As it turns out, AI A was a bot called “gemeni-test,” which I assume is an unreleased AI version being tested by Google. AI B was good old ChatGPT 4o. (If you check the posting image I created, I used a mix of the two, now that I had them, following my old advice to ideate with AI by looking at more variations than you need.)
About the Author
Jakob Nielsen, Ph.D., is a usability pioneer with 41 years of experience in UX and the Founder of UX Tigers. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. Named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today.
Previously, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including the best-selling Designing Web Usability: The Practice of Simplicity (published in 22 languages), the foundational Usability Engineering (27,548 citations in Google Scholar), and the pioneering Hypertext and Hypermedia (published two years before the Web launched).
Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI and was named a “Titan of Human Factors” by the Human Factors and Ergonomics Society.
· Subscribe to Jakob’s newsletter to get the full text of new articles emailed to you as soon as they are published.
· Read: article about Jakob Nielsen’s career in UX
· Watch: Jakob Nielsen’s 41 years in UX (8 min. video)