Saturday, July 23

The Best Thing Since Sliced Bread: Amazon's Text Stats


Update (Oct 30, 2012): Apparently Amazon's Text Stats have disappeared. A shame.

Text stats at Amazon aren't new; they've been around since 2007. But I never knew about them until I read Gabe Habash's article, Book Lies: Readability is Impossible to Measure.

Apparently, for every book in Amazon's "Search Inside" program you can see its readability statistics as well as how these stats compare with those of other books.

Gabe Habash writes:
One of Amazon’s best and little-known book features is its “Text Stats” page, a tiny link that’s tucked three-quarters down a book’s page under the “Inside This Book” heading. Clicking the link takes you to a page with graphs and numbers, the most interesting (and objective) of which is word count. It’s always fun to compare War and Peace’s word count (590,000) to major textbooks, and to see that Tolstoy smashes most of them with his stern Russian will.

But there are other figures on the page, and these are meant to tell you, as close to objectively as possible, how readable and how complex the book is. We put these measurements to the test to see how accurate they are in determining how readable and how complex a text is. The six books we sampled are Finnegans Wake by James Joyce, Where I’m Calling From by Raymond Carver, The Great Gatsby by F. Scott Fitzgerald, The Memory Keeper’s Daughter by Kim Edwards, The Tipping Point by Malcolm Gladwell, and Moby-Dick by Herman Melville.

As an example, let's take The Scarlet Letter by Nathaniel Hawthorne.

Here's the book page for The Scarlet Letter. About 2/3 down the page you'll come to the heading: Inside This Book (this is after both the Customer Reviews and the Customers Also Bought headings). Here you will learn what the first sentence of the book is, as well as various statistically improbable phrases, capitalized phrases, a concordance of the 100 most common words and (drum roll please) a link that will take you to a place called "Text Stats".
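If you're curious what a concordance like that involves, here's a minimal sketch in Python. It is only my guess at the general idea, not Amazon's actual method: the function name `top_words` and the way it splits text into words are assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper: words are lowercased runs of letters and
    apostrophes, which is an assumption, not Amazon's tokenizer.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    for word, count in top_words(sample, 2):
        print(word, count)
```

A real concordance would probably skip very common words such as "the" and "and", which this sketch does not do.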

If you click this link, you will learn that the Fog Index for The Scarlet Letter is 14.1 (the number of years of education you need to understand the text), the Flesch Index is 14.3 (90-100 would indicate a book very easy to read, 10 would indicate one abominably difficult) and so on.
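For anyone wondering where numbers like these come from, here's a rough sketch of the two standard formulas, the Gunning Fog Index and the Flesch Reading Ease score. The formulas themselves are the published ones, but everything else is an assumption on my part: the function names `count_syllables` and `text_stats` are made up for this example, the syllable counter is a crude vowel-counting heuristic, and I have no idea how Amazon actually split sentences or counted syllables, so don't expect it to reproduce their figures.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): count runs of vowels,
    dropping a probable silent trailing 'e'. Never returns less than 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return word count, sentence count, Fog Index and Flesch score.

    Assumes text contains at least one word and one sentence.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # "Complex" words are those with three or more syllables.
    complex_share = sum(1 for n in syllables if n >= 3) / len(words)

    # Gunning Fog: roughly the years of schooling needed.
    fog = 0.4 * (words_per_sentence + 100 * complex_share)
    # Flesch Reading Ease: higher means easier to read.
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

    return {
        "words": len(words),
        "sentences": len(sentences),
        "fog": round(fog, 1),
        "flesch": round(flesch, 1),
    }


if __name__ == "__main__":
    print(text_stats("The cat sat on the mat. The dog ran."))
```

Short sentences and short words push the Fog Index down and the Flesch score up, which is why a children's book and a Hawthorne novel land so far apart.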

Now, what's really fun is comparing books. The first thing I did was look at Harry Potter and the Philosopher's Stone, but it didn't have readability statistics. In a mischievous mood I pulled up Twilight, but while it did display certain readability statistics, the text stats were missing.

Disappointed, I headed on over to The Vampire Diaries: The Return: Midnight, but that one didn't have text stats either.

Not to be put off, I looked at Great Expectations and was not disappointed. Apparently you need four fewer years of education to read Great Expectations than you would for The Scarlet Letter, and the words Dickens used are, overall, 4% less complex than the ones Hawthorne chose.

I'm not sure how useful this is, but it would have been fun to compare, say, the text stats of Harry Potter and the Philosopher's Stone to Twilight.

Links:
Book Lies: Readability is Impossible to Measure
The Scarlet Letter
Text Stats for The Scarlet Letter
Great Expectations
Text Stats for Great Expectations
