Automated Content Strategy and Tone-of-Voice Metrics

Aug 23, 2023

Summary: AI tools can analyze large volumes of content to provide metrics like readability, style, and sentiment. This allows companies to easily track content quality at scale and steer their content strategy using objective metrics.

With the advent of advanced AI tools specializing in language analysis, anybody can effortlessly collect and track several important metrics across a company’s content production. This empowers organizations to steer their content strategy and keep it on track, whether they produce large or small volumes of content.

Readability

Readability measures readers’ ability to effortlessly comprehend written text, which is crucial for usability. Users must read the words and process the content, understand its meaning, connect it to prior knowledge, and synthesize this information into a broader picture. The ultimate goal is to enable users to act correctly based on this understanding.

When content becomes impenetrable, comprehension fails at one or multiple stages, leading users down a path of misunderstanding. Complex language can also alienate readers. Often, if the content is too hard to read, people will abandon a website.

Content complexity can hurt users in many ways:

Difficult or rare words that people don’t understand or (even worse) misunderstand. This includes jargon that insiders find easy but alienates outside customers.
Long or convoluted sentence structures that prevent users from following the writer’s train of thought.
Passive voice, which forces readers to do extra cognitive gymnastics to work out the relationships between subjects and objects. If told that the lawn was mowed by Peter, you need extra mental effort to realize that Peter mowed the lawn.

The deep definition of readability requires us to run usability studies to test whether members of the target audience can understand our content and whether they act correctly based on their understanding. For example, on an e-commerce site, have people read two different product descriptions to determine which product is best for specific conditions. I strongly recommend conducting such studies occasionally, but it’s too much overhead to conduct complete research for every piece of content. Also, although qualitative usability testing is the best way to reveal effective rewrites for better understanding, it leaves us without metrics that can be tracked over time or compared across sections of a website or departments of content contributors.

Readability Scores

This is where readability scores come into play, usually as a calculated reading level. The reading level is the number of years of formal education required to comprehend the text easily. For example, saying that a text reads at the 9th-grade reading level means that it’s supposed to be easy for somebody who has completed the first year of high school (in an education system where high school follows 8 years of previous education). Similarly, text written at a 12th-grade reading level is easy for a high school graduate, and a 16th-grade reading level would be easy for somebody with a bachelor’s degree. (Much as we don’t usually refer to the 4th year of college as the 16th grade, we persist in counting the years when discussing reading levels.)

A caveat for all these grade levels is that they refer to students who have mastered the curriculum, learning what they were supposed to during each school year. In reality, many students learn very little and are passed through the education system without the ability to read at grade level. This is why my recommendation for writing for a broad consumer audience is to produce text at the 8th-grade reading level, even in a country like the United States, where most people have graduated high school. Diploma or not, many of these people can only read at the 8th-grade level.

Reading levels are easy to compute automatically, using one of several widely recognized formulas. As Wikipedia says, “Polysyllabic words affect this score significantly.” Or, as I would say, “Long words are harder to read.” The readability scores are mainly based on counting word length and sentence length, so they are approximations of actual readability, as would be assessed by testing users’ actual understanding.

The most common readability formula is the Flesch-Kincaid reading level, named after two pioneering researchers in the field. Rudolf Flesch developed the original Flesch Reading Ease formula in 1948, and J. Peter Kincaid modified it in 1975 to produce the formula we use today. A calculator for Flesch-Kincaid reading level is built into Microsoft Word: under the “Editor” button, go to the “Insights” heading and click “Document Stats.”
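To make the arithmetic concrete, here is a minimal Python sketch of the two published formulas. The coefficients are the standard ones for Flesch Reading Ease and Flesch-Kincaid Grade Level. The helper names, the sentence splitter, and the vowel-group syllable counter are my own rough assumptions, not what Word or any AI tool actually runs.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent "e" as non-syllabic ("score"), but keep "-le" ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> Tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Higher scores mean easier text."""
    sentences, words, syllables = text_stats(text)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate years of schooling needed to read the text easily."""
    sentences, words, syllables = text_stats(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both formulas use only two ratios: words per sentence and syllables per word. That is why long words and long sentences push the grade level up, and also why results from this sketch may differ somewhat from another tool’s: each tool counts syllables and sentences in its own way.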

We can get more detailed information from AI. For example, here’s how Claude analyzed my article on UX Portfolio Reviews and Hiring Exercises in the Age of Generative AI:

Based on the Flesch-Kincaid Grade Level formula, the content has a grade level of 11.4, making it suitable for readers at the 11th grade level or higher. The Flesch Reading Ease score of 42.6 also indicates that the content is quite difficult to read.

Some key factors contributing to the high grade level score are:

Long, complex sentences with multiple clauses
Academic and technical vocabulary like "symbiants", "divergence", "homogeneity"
Abstract concepts that require background knowledge
Lengthy paragraphs with dense information

To make the content more readable for a general audience, suggestions include:

Shortening sentences
Using more common vocabulary
Explaining concepts clearly with examples
Breaking content into shorter paragraphs
Using headers, lists, and other formatting for scanability

In summary, the high grade level and low reading ease score reflect advanced academic writing. Simplification would make the content more accessible to a mainstream readership.

Good advice, Claude! I didn’t take it because I was targeting a highly educated and sophisticated audience of UX professionals and UX hiring managers. These readers are expected to have the requisite background knowledge. Without it, my article would be incomprehensible to an average person picked off the street.

The Claude AI tool is an excellent helper for tracking readability and identifying ways authors can simplify their content.

Unfortunately, AI-generated copy often reads at too high a level for the general public, even when you ask for a lower reading level in the prompt. So while AI-produced text has advantages, you need to retain a copy review step in your editorial process.

Automated AI analysis is an essential tool to track the quality of rapidly burgeoning content at scale. (“Content pile” by Leonardo.AI)

Writing Style and Tone of Voice

As we’ve seen, readability is easy to calculate automatically, meaning we can record and track these scores over time across a vast corpus of text.

What about more complex matters like writing style and tone of voice? These can now be assessed automatically as well. We can’t use simple formulas, but current AI is up to doing the analysis for us.

For individual articles, you can copy and paste the text into an AI tool with a prompt like “analyze the following article and provide an analysis of the prevailing style and tone of voice employed in writing the content.” You’ll get good insights, and if you perform this exercise repeatedly, you may see changes over time if or when you intend to recalibrate your writing style towards specific objectives.

For larger amounts of text, the regular chat interface won’t work. Instead, compile the text into a corpus contained in a large text file that you upload to the Code Interpreter tool in ChatGPT 4.
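Compiling the corpus can itself be scripted. The following is a minimal sketch, assuming each article is saved as a UTF-8 `.txt` file in one folder. The function name `build_corpus` and the folder layout are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def build_corpus(article_dir: str, corpus_path: str) -> int:
    """Concatenate every .txt file in article_dir into one corpus file.

    Returns the word count of the resulting corpus. Write the corpus outside
    article_dir so that a later run does not pick it up as an article.
    """
    articles = sorted(Path(article_dir).glob("*.txt"))
    parts = [p.read_text(encoding="utf-8").strip() for p in articles]
    # Separate articles with a blank line and skip any empty files.
    corpus = "\n\n".join(part for part in parts if part)
    Path(corpus_path).write_text(corpus, encoding="utf-8")
    return len(corpus.split())
```

Calling `build_corpus("articles", "corpus.txt")` would produce the single file to upload, and the returned word count is a quick check that nothing was left out.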

(As an aside, “Code Interpreter” is an abomination of a name for a feature, and even advanced users of ChatGPT often don’t know about it. The term has zero discoverability and probably makes most people think it’s a debugging tool for developers. In reality, the ChatGPT Code Interpreter is a feature for working with data sets instead of having a chat session. Sad, but what else but terrible feature names can you expect from a company that used a name like ChatGPT for its leading product? No competent marketing professional ever touched OpenAI’s naming strategy. Though “strategy” may be too charitable a word here. Update September 2, 2023: ChatGPT has changed the feature name from Code Interpreter to the much more reasonable Advanced Data Analysis.)

The mastermind behind the nomenclature of ChatGPT and its features seems to be none other than the Abominable Snowman, captured mid-action in naming the "Code Interpreter." (Image by Leonardo.AI.)

After uploading your corpus to Code Interpreter, you can now enter a prompt like the one above and have ChatGPT perform the content analysis on all your content. For example, an appendix to this article shows ChatGPT’s analysis of my articles from the last two months, forming a corpus of 42,000 words. Lots of words, but the analysis was cooked in a minute.

I’ll leave it to you whether you agree that my writing style is “technical and professional, clear and direct, and analytical and thoughtful,” with a tone of voice that’s “authoritative, informative, and concerned.” Such is the conclusion drawn by ChatGPT, which matches quite well with my understanding. More important, it aligns with my intentional direction, affirming that my writing embodies my vision. No major course corrections are needed.

Adjustments would have been imperative if the analysis had rendered a different verdict.

For a big company (as opposed to an individual writer), automated AI analysis of style and tone of voice becomes a valuable content strategy tool:

Track across time. Are you veering from your approved style?
Compute for sections of your website. Are they uniformly aligned?
Compute for departments or groups of content providers. Are specific teams straying from the standard?
Use the AI analysis as a starting point in creating a fresh style guide or tone-of-voice definition.

Tracking any of these without AI is so arduous that it is destined to remain unfulfilled. With AI? Run monthly analyses and see how you do. You can steer even a supertanker of a content ship back on course before it runs aground.

A downside of the AI style and tone analyses is that they produce prose descriptors that must be interpreted. If you get the same bullets each month, you’re probably pretty steady, but even if the terms change slightly, that may not matter much. You’ll have to read the full description and conclusions from the analysis and likely supplement with your human insight from reading some docs yourself.

Polarity and Subjectivity Scores

The ChatGPT Code Interpreter can also calculate polarity and subjectivity scores for individual content pieces or an entire corpus.

Polarity score quantifies the overall sentiment of a piece of text. It ranges from -1 (most negative) to +1 (most positive), with 0 indicating a neutral sentiment.
Subjectivity score gauges the objective or subjective feel of the text, with higher scores indicating more subjectivity. It ranges from 0 (utterly objective or fact-based) to 1 (wholly subjective or opinion-based).
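To show how two such numbers can come out of plain text, here is a deliberately tiny Python sketch. It averages per-word values from a toy lexicon that I made up for illustration. I don’t know which library ChatGPT ran behind the scenes. Sentiment libraries such as TextBlob report scores on the same two scales, but with far larger lexicons and more refined rules than this.

```python
import re
from typing import Tuple

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
TOY_LEXICON = {
    "excellent": (1.0, 1.0),
    "good": (0.7, 0.6),
    "easy": (0.4, 0.8),
    "difficult": (-0.5, 1.0),
    "terrible": (-1.0, 1.0),
}


def sentiment_scores(text: str) -> Tuple[float, float]:
    """Average the lexicon values of all recognized words.

    Text with no recognized words is scored as neutral and objective (0.0, 0.0).
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity
```

In this sketch, strong positive and negative words cancel out in the polarity average while still raising subjectivity. So a polarity near 0 can describe either neutral text or text with balanced praise and criticism.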

Upon evaluating two months of my writings, ChatGPT computed my polarity score as 0.13 (a slightly positive sentiment) and my subjectivity score as 0.48 (a moderate level of subjectivity). I am happy with both numbers and don’t feel the need to change, but other organizations with different content goals might arrive at different conclusions for identical scores. There’s no objectively best level of subjectivity — it depends on your goals.

In my case, I’m happy to be slightly positive overall since I was soundly criticized during my earlier writing career (the dot-com bubble days) for being too negative. Now, I’m optimistic about technology, but not hopelessly so. You can even see in the present article that I am positive about using AI to quantify and guide content strategy. Similarly, for my goal of being a thought leader in user experience, it’s appropriate to write with a blend of facts and interpretation or judgment. Entirely objective writing would be too dull and provide insufficient guidance for my audience. Still, an excessively emotional or opinionated style would be useless for my audience of pragmatic business professionals.

Your situation is likely different on all counts, but you can define your desired levels of polarity and subjectivity and track the scores over time.
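Tracking against a target can be as simple as comparing each month’s scores with an agreed range. Below is a minimal Python sketch of that idea. The target bands and the names `Target`, `TARGETS`, and `out_of_range` are hypothetical, so set your own ranges to match your content goals.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Target:
    """An acceptable band for one metric (both bounds inclusive)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical target bands; each organization would define its own.
TARGETS = {
    "polarity": Target(0.05, 0.25),
    "subjectivity": Target(0.35, 0.60),
}


def out_of_range(
    history: Dict[str, Dict[str, float]], targets: Dict[str, Target]
) -> List[Tuple[str, str, float]]:
    """Return (month, metric, value) for every score outside its target band."""
    flags = []
    for month, scores in history.items():
        for metric, value in scores.items():
            target = targets.get(metric)
            if target is not None and not target.contains(value):
                flags.append((month, metric, value))
    return flags
```

With these bands, a month scoring 0.13 for polarity and 0.48 for subjectivity passes quietly, while a month with a polarity of -0.10 would be flagged for review. The same check works per website section or per department if you keep a separate history for each.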

Conclusion: Style Is No Matter of Taste

Current AI tools already provide extensive support for monitoring content objectively and efficiently, meaning that it can be done at scale and with regularity. Future AI will undoubtedly excel further.

With AI, we can now dissect content for multifaceted writing styles and deduce explicit, often measurable conclusions. This capability transcends the often-contentious personal biases of editors, content strategists, or corporate leadership. These stakeholders should be involved in defining your company’s desired style. But once defined, disengage from human subjective feelings and use AI to track how well your content meets the goals.

Or, in a haiku:

Polarity scores,
Subjective or objective,
AI’s judgment fair.

Full Poetic Summary

Through realms uncharted, over heaps of words,
The AI scans with unrelenting gaze,
The style and tone of scribes in distant herds,
Their quills and ink replaced by metrics' blaze.

Readability, that chariot of light,
Guides readers down the path with grace and ease,
While complexity, a challenging fight,
Can leave the mind adrift in foggy seas.

The scores and numbers, once the cryptic lore,
Of scholars in the halls of learning high,
Now democratized, they touch the shore,
Of common minds, with truth they multiply.

With AI's help, no more the content drifts,
In stormy seas of inconsistency,
The ship is guided by analytical gifts,
Toward the harbor of transparency.

Appendix: AI Analysis of Jakob Nielsen’s Articles

I commissioned ChatGPT 4 Code Interpreter to analyze a corpus of 42,000 words comprising articles I wrote over the last two months. I agree with its analysis, except for its naïve understanding of social media engagement. Here is what it said:

The text appears to be focused on the field of user experience (UX) and technology, with a particular emphasis on usability and AI systems.

Style

Technical and Professional: The language used is technical and aimed at professionals in the field of UX and AI. Terms like "Generative AI systems," "ChatGPT," and "command-based UI paradigm" indicate that the content is geared towards an educated audience.
Clear and Direct: Despite the technical nature, the writing attempts to make the concepts accessible. It is structured with clear headings and summaries.
Engaging with Social Media Callouts: The inclusion of social media tags and connections like "Follow me on LinkedIn" adds an engaging and contemporary aspect to the writing.
Analytical and Thoughtful: The writing is not just descriptive; it's analytical and reflects a deep understanding of the subject matter.

Tone

Authoritative: The tone is confident and assertive, indicative of a thought leader in the field. The writer takes a clear stance on the topic, particularly in favor of usability.
Informative: The tone is also educational and informative, aiming to enlighten the reader about the subject matter without oversimplifying the complexity.
Concerned: There is a clear concern for the usability issues related to current technologies, and this concern is expressed without being alarmist.

Conclusion

The prevailing style and tone of the provided corpus align well with the objectives of engaging a highly educated audience of UX designers and researchers. The writing is assertive and takes a clear stand in favor of usability, reflecting an authoritative voice in the field.

Here's the analysis of the corpus in terms of positivity and subjectivity:

Polarity Score: 0.1292

This positive value indicates a slightly positive sentiment in the overall text. The writing likely contains more positive expressions and opinions than negative ones, but it's not overly optimistic or enthusiastic. This aligns with a professional and balanced tone that might be expected in a thought leadership piece focusing on usability and user experience.

Subjectivity Score: 0.4786

The subjectivity score falls within the range of [0, 1], with a score of 0.4786 indicating a moderate level of subjectivity. This suggests that the text includes both objective information and personal opinions or judgments. The blend of factual data with interpretative analysis aligns well with a thought leadership approach, where the writer’s expertise and perspective are integral to the content.

More on AI UX

This article is part of a more extensive series I’m writing about the user experience of modern AI tools. Suggested reading order:

About the Author

Jakob Nielsen, Ph.D., is a usability pioneer with 40 years of experience in UX. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. He was named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today. Before starting NN/g, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including Designing Web Usability: The Practice of Simplicity, Usability Engineering, and Multimedia and Hypertext: The Internet and Beyond. Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI.

Follow Jakob on LinkedIn.
Subscribe to Jakob’s newsletter to get the full text of new articles emailed to you as soon as they are published.
