I’m wasting my afternoon with Google’s n-grams database, a tool that tracks the frequencies of words, character strings, and phrases in the Google Books corpus. It was the subject of a recent Science paper. Among the many nifty little results the paper reported was the shape of the frequency-over-time curve for names of years: there’s a strong spike right before and during the year in question, then a more gradual – and often predictable – decrease in usage.
My first thought, obviously (at least if you’re me), was “1984”. Is there a signature in the Google Books corpus of everyone’s (my) favourite dystopian novel? Well, I’m glad you (I) asked!
First I tried just plugging in the number 1984 and comparing it to other years from the 80s as a reference. Oddly, there was nothing exceptional about ’84. Like all the other years, it has a sharp increase a year or two before and then a more gradual decrease afterwards. (I’m linking to the results, not posting pictures, because I’m lazy. I think that’s my prerogative.)
But of course, the book is not called “1984”. It’s “Nineteen Eighty-Four.” And, unfortunately, n-grams can’t deal with hyphens. But let’s forge ahead sans hyphen. Here’s a link to the results for “Nineteen Eighty Four”, “Nineteen eighty four”, “nineteen eighty four”, and, for reference, “nineteen eighty five”. (As you might have guessed, the search is case-sensitive. I’ll also note that the year 1985, spelled out with different capitalization patterns, gives the same result.) Guess what – usage of the title starts to increase right when Orwell published the book, in 1949.
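If you’d rather not type out every capitalization by hand, here’s a minimal sketch of how the variants could be generated. The function name `ngram_variants` is made up for illustration, and the last line assumes the search box takes comma-separated phrases; it only builds the strings and doesn’t talk to Google at all.

```python
def ngram_variants(title):
    """Return hyphen-free capitalization variants of a title, in a fixed order."""
    # The n-grams search can't deal with hyphens, so swap them for spaces.
    plain = " ".join(title.replace("-", " ").split())
    candidates = [
        plain.title(),       # every word capitalized
        plain.capitalize(),  # only the first word capitalized
        plain.lower(),       # all lowercase
    ]
    # Drop duplicates (e.g. for one-word titles) while keeping the order.
    variants = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants


# Assumption: phrases are pasted into the search box as a comma-separated list.
print(",".join(ngram_variants("Nineteen Eighty-Four")))
```

Because the search is case-sensitive, each variant shows up as its own curve, which is exactly what makes the comparison worth doing.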
I still find it odd that the number 1984 doesn’t get a bump at all. Whatever. Here, for your amusement, is the rise of several more Nineteen Eighty-Four-themed words! At first it looks like there’s a huge spike in “Big Brother” use right when the book was published, but that spike actually starts in 1945 and ends the year after publication. And the phrase has been on the increase (why?!) since long before then. And, if you zoom in to post-1940, you’ll see that both “Thought Police” and “doublethink” were in the corpus just before Orwell’s book came out. I have no idea why this is! Maybe there’s a 1945 draft of the book in the corpus? Or was Orwell reflecting on his own time more than I’ve been taught, and more than he himself said, he was?