The Ultimate Guide to Google Ngram
Google Ngram shows you the popularity of any keyword in books over the past 200+ years. It's like Google Trends, but instead of looking at searches, it looks at books.
It's easy to spend hours exploring the tool, which highlights fascinating long-term trends such as the rise of chicken meat (covered in a separate article).
The tool also reveals interesting patterns in culture and politics: mentions of "war on drugs" exploded in the late 1970s and, about a decade later, in 1990, mentions of "war on terror" surged.
The percentages on the Y-axis in Google Ngram represent the share of all words in Google's sample of books (written in English and published in the United States) for a given year that are the target keyword. For multi-word phrases, the comparison is against all phrases of the same length. For example, searching Google Ngram for "the" shows that "the" makes up 4.2% of modern published text.
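As a rough illustration of how that percentage is calculated, the sketch below divides a keyword's count in one year by the total number of same-length n-grams for that year. The function name and the counts are hypothetical placeholders, not real Ngram data.

```python
def relative_frequency(term_count: int, total_ngrams: int) -> float:
    """Return a term's share of all same-length n-grams in one year, as a percentage."""
    if total_ngrams <= 0:
        raise ValueError("total_ngrams must be positive")
    return 100.0 * term_count / total_ngrams


# Hypothetical counts for a single year: occurrences of "the" and total 1-grams.
the_count = 42_000
total_words = 1_000_000
print(f"{relative_frequency(the_count, total_words):.1f}%")  # prints 4.2%
```

Because the denominator changes every year, a flat line means a word kept the same share of published text, not that the same number of books mentioned it.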
Google Ngram also lets you plot several keywords on one graph by separating them with commas, which makes it a useful tool for comparing the historical popularity of major brands.
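If you run the same brand comparisons often, you can build the chart link programmatically. This is a minimal sketch that assumes the query-string parameter names (`content`, `year_start`, `year_end`, `smoothing`) seen in public Ngram Viewer URLs; Google may change them, and the helper name `ngram_compare_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the public Ngram Viewer chart page.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def ngram_compare_url(terms, year_start=1800, year_end=2019, smoothing=3):
    """Build an Ngram Viewer link that plots several terms on one chart.

    The terms are joined with commas, which is how the viewer separates
    multiple queries. Parameter names are assumptions based on the
    viewer's public URLs.
    """
    params = {
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    # urlencode escapes spaces as "+" and commas as "%2C".
    return f"{NGRAM_GRAPH_URL}?{urlencode(params)}"


print(ngram_compare_url(["American Airlines", "Delta Air Lines"]))
```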
How accurate is Google Ngram?
Google Ngram is mostly accurate, though it does have small errors from time to time. In the graph above, which depicts airline popularity over time, "American Airlines" appears to have been mentioned in the 1830s. This is obviously impossible, because planes weren't invented until 1903 and airlines didn't exist until 1909. Digging deeper, we can see that one of the articles Google dates to around this time (1843) is not in fact from 1843. While The Economist was indeed around in 1843 (the very year it was founded), none of the companies mentioned in the article existed then.
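One simple way to catch errors like this is to flag any year with mentions before the term could plausibly exist. The sketch below assumes you already have yearly counts for a term (the numbers here are invented for illustration) and a cutoff year you choose yourself, such as 1909 for airlines.

```python
def flag_anachronisms(yearly_counts, earliest_possible_year):
    """Return the years with nonzero counts before the term could plausibly exist."""
    return sorted(
        year
        for year, count in yearly_counts.items()
        if count > 0 and year < earliest_possible_year
    )


# Hypothetical yearly counts for "American Airlines".
american_airlines = {1843: 2, 1900: 0, 1935: 150, 1950: 900}
print(flag_anachronisms(american_airlines, 1909))  # prints [1843]
```

Each flagged year is a lead worth checking by hand, as with the misdated Economist article.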
Some errors, like this misdated article, come from incorrect publication data, while others are a result of Google's imperfect digitization process. Google Books uses machine learning to convert scanned images of book pages into searchable text, but when it encounters words that are too hard to decipher, it passes them to a human.
In fact, Google has the equivalent of more than 500,000 full-time employees working for free to decipher failed book scans, among other things, and you yourself are one of them. It does this using CAPTCHAs (specifically, its reCAPTCHA service).
When Google's system for converting book photos to text fails, it presents the problem to a user. CAPTCHAs often show two words because one comes from a book and is unknown, while the other is computer-generated and distorted but already known to Google. If the user gets the known word right, Google assumes they got the unknown word right too. In some cases, the same unknown word is shown to multiple users to be sure, but as with all human-based processes, mistakes slip through.
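The two-word scheme can be sketched as a small voting system. This is a hypothetical simulation of the idea described above, not Google's actual implementation: the word IDs, the lowercase comparison, and the three-vote threshold (`min_votes`) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical answer key for the computer-generated control words.
KNOWN_WORDS = {"ctrl-1": "morning"}

# Votes collected for each unknown (scanned) word.
votes = defaultdict(Counter)


def submit(control_id, control_answer, scan_id, scan_answer):
    """Record the scan answer only if the user typed the control word correctly."""
    if control_answer.strip().lower() != KNOWN_WORDS[control_id]:
        return False  # Wrong control word, so the unknown-word answer is discarded.
    votes[scan_id][scan_answer.strip().lower()] += 1
    return True


def resolved_word(scan_id, min_votes=3):
    """Return the leading transcription once at least min_votes users agree on it."""
    if not votes[scan_id]:
        return None
    word, count = votes[scan_id].most_common(1)[0]
    return word if count >= min_votes else None


submit("ctrl-1", "morning", "scan-42", "harbour")
submit("ctrl-1", "Morning", "scan-42", "harbour")
submit("ctrl-1", "mourning", "scan-42", "arbour")  # Control word wrong: ignored.
submit("ctrl-1", "morning", "scan-42", "harbour")
print(resolved_word("scan-42"))  # prints harbour
```

Requiring agreement between several users reduces mistakes, but if enough people make the same misreading, a wrong word can still be accepted.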
Despite these occasional errors, Google Ngram is mostly accurate. When you see a surprising rise or decline, remember that it may instead be caused by changes in language. After all, the way people talk and spell can change significantly over the course of hundreds of years.
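When a word's spelling has shifted, adding the variants together gives a truer picture of how often the concept was mentioned. The sketch below sums yearly frequencies for two spellings; the data is invented, and "connexion" and "connection" are just example variants. Because both series share the same per-year denominator, their sum is the combined share of all words.

```python
def combine_variants(*series):
    """Sum the yearly frequencies of several spelling variants into one series."""
    combined = {}
    for variant in series:
        for year, freq in variant.items():
            combined[year] = combined.get(year, 0.0) + freq
    return combined


# Hypothetical frequencies (in percent) for two spellings of the same word.
connexion = {1850: 0.004, 1900: 0.003, 1950: 0.0005}
connection = {1850: 0.0005, 1900: 0.002, 1950: 0.006}
total = combine_variants(connexion, connection)
print(total)
```

The Ngram Viewer can also do this directly with its addition operator, using a query such as (connexion + connection).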
It's also important to consider sentence structure when using Google Ngram, because the tool matches words, not meanings. A search for "food can" would count matches from both of the following sentences:
1. "She had a food can with extra vegetables."
2. "Perishable food can go bad if left out."
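A plain n-gram count only looks at adjacent words, so it can't tell these two uses apart. The sketch below uses a deliberately simple tokenizer (an assumption; Google's real tokenizer is more sophisticated) to show that both sentences contain the bigram ("food", "can").

```python
import re


def bigrams(sentence):
    """Lowercase the sentence, extract word tokens, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return list(zip(words, words[1:]))


sentences = [
    "She had a food can with extra vegetables.",
    "Perishable food can go bad if left out.",
]
# The pair ("food", "can") appears in both, whatever "can" means in each sentence.
matches = [s for s in sentences if ("food", "can") in bigrams(s)]
print(len(matches))  # prints 2
```

The Ngram Viewer's part-of-speech tags, such as can_NOUN versus can_VERB, are meant to separate cases like these, although the tagging is automated and can be wrong.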
The easiest way to test the accuracy is to look up terms tied to well-known events and check whether the graphs match what you'd expect. (Note that WWI, of course, wasn't called that until WWII, because nobody knew there was going to be a second global war.)
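This kind of sanity check can be partly automated by comparing a term's average frequency in the years just before and just after a known event. The function below is a hypothetical helper, the five-year window is an arbitrary choice, and the "World War II" figures are invented for illustration.

```python
from statistics import mean


def rises_after(yearly_freq, event_year, window=5):
    """Return True if the term's average frequency in the `window` years starting
    at event_year exceeds its average in the `window` years before it."""
    before = [yearly_freq.get(y, 0.0) for y in range(event_year - window, event_year)]
    after = [yearly_freq.get(y, 0.0) for y in range(event_year, event_year + window)]
    return mean(after) > mean(before)


# Hypothetical frequencies (in percent) for "World War II".
wwii = {1935: 0.0, 1936: 0.0, 1937: 0.0, 1938: 0.0,
        1939: 0.00001, 1940: 0.0001, 1941: 0.0003, 1942: 0.0005, 1943: 0.0008}
print(rises_after(wwii, 1939))  # prints True
```

A term that shows no rise after its defining event, or plenty of mentions before it, deserves a closer look.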
Alternatives to Google Ngram
Although few other tools cover such long timelines, the best free alternative to Google Ngram is Google Trends, whose data dates back to 2004.