Results for Big Data

Taking Google’s Ngram For a Spin

November 01, 2013
Eric Schultz

As part of a recent parents weekend at a fine New England educational institution, I had the pleasure of watching Peter Norvig, director of research at Google, demonstrate Google’s Ngram Viewer (Ngram section starts around 9:30). Released in December 2010, Ngram (Wikipedia tells us) is a “phrase-usage graphing tool” that charts the yearly count of selected n-grams (letter combinations), or words and phrases. The Google database currently includes 5.2 million books published between 1500 and 2008 containing 500 billion words in American English, British English, French, German, Spanish, Russian, and Chinese.
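For readers who like to see the idea in code, here is a minimal sketch of what "counting n-grams" means for word sequences. It is not how Google builds its database; the function names `word_ngrams` and `count_ngrams` and the sample sentence are my own inventions for illustration, and the splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words in text (split on whitespace)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(text, n):
    """Count how often each n-word phrase appears in text."""
    return Counter(word_ngrams(text, n))


# Made-up sample sentence, purely for illustration.
sample = "the United States is a nation and the United States are many states"
counts = count_ngrams(sample, 4)
print(counts["the United States is"])
print(counts["the United States are"])
```

Run over a whole library, year by year, a tally like this is roughly the raw material behind each line on an Ngram chart.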

Peter Norvig’s example that day illustrated the power of the Ngram database, comparing the word combinations “The United States is” and “The United States are.”


The result seems logical: the singular “is” becomes the dominant verb after the American Civil War. This was a relatively straightforward example, however, and I should warn you that, as you read through this article, you’ll need to channel a little of your inner art historian as the graphs become more complicated and require a longer look, often informed by a refresh on dates.

I also need to warn that the Ngram tool is a little like a chain saw in the hands of a beginner; my graph does not look exactly like Peter’s (and I’m not sure of the reason), everything is case-sensitive, and there’s no simple facility yet (of which I am aware) for combining terms. So, for example, one cannot search on “Franklin Delano Roosevelt” and “FDR” combined.
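If one had the yearly counts in hand, combining terms would be simple arithmetic. The sketch below assumes the counts sit in a plain dictionary keyed by term and then by year; the function name `combine_terms`, the data layout, and every number in it are hypothetical, invented only to show the idea, and none of it comes from Google's tool or data.

```python
from collections import defaultdict


def combine_terms(counts_by_term, terms, ignore_case=True):
    """Sum the yearly counts of several terms into a single series."""
    wanted = {t.lower() if ignore_case else t for t in terms}
    combined = defaultdict(int)
    for term, yearly in counts_by_term.items():
        key = term.lower() if ignore_case else term
        if key in wanted:
            for year, count in yearly.items():
                combined[year] += count
    return dict(sorted(combined.items()))


# Hypothetical counts, made up for illustration only.
made_up_counts = {
    "Franklin Delano Roosevelt": {1933: 40, 1945: 90},
    "FDR": {1933: 5, 1945: 60},
    "fdr": {1945: 2},
    "Herbert Hoover": {1933: 30},
}

print(combine_terms(made_up_counts, ["Franklin Delano Roosevelt", "FDR"]))
print(combine_terms(made_up_counts, ["Franklin Delano Roosevelt", "FDR"],
                    ignore_case=False))
```

With `ignore_case=True` the lowercase “fdr” is folded into the total; with `ignore_case=False` it is left out, which is the case-sensitive behavior described above.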



As my kindly doctor says, he won’t prescribe for an illness he cannot diagnose. So, as a complete novice, I am not endorsing Ngram for serious historical research. I have concluded, however, that in a competition between Angry Birds and Ngram, the latter is a far more fascinating diversion.

Here’s a comparison I ran on the terms “one nation under God” and “one nation indivisible.”  Remembering that “under God” became a pressing issue in the 1950s and was signed into law by President Eisenhower in 1954, this graph again makes good sense.



Now let me offer something a little more nuanced, comparing the terms “George Washington” and “Abraham Lincoln” (and remembering that “President Washington” or “Abe Lincoln” might be good terms to one day combine in a total search).  The results are below.


What to make of this? I would have bet that Lincoln had more sheer volume of mentions than Washington, at least in the last generation, but it turns out to be the opposite. We can see increases at the time of Lincoln’s 100th birthday in 1909, and Washington’s 200th in 1932. Beyond that, it appears that Washington really remains first in the hearts (or at least the publications) of his countrymen. (To complicate the picture, when John Adams is added, he dominates both Washington and Lincoln for most of the 19th century before falling behind permanently around 1900.)

Another graph shows the comparison of four wartime events and seems more straightforward, with the emotional force of Pearl Harbor clearly reflected in literature.



Having written about the history of air conditioning for United Technologies (“Weathermakers to the World,” 2012), I was curious to see what would happen when I tested the term. Sure enough, the history of the technology was plotted on the screen, from its introduction to the public in movie theaters and department stores beginning around 1925, to its hyped status in the 1930s as a technology capable of pulling America out of the Great Depression, to its growth as GIs returned from WWII and their Baby Boom families invaded suburbia.



When the New York Times called America’s 1970 census “the Air-Conditioned Census,” it resulted in a decade of torrid press until air conditioning became a mature, more mundane topic.  As climate change becomes a persistent topic (and Google updates its Ngram data beyond 2008), we might well see another upsurge in “air conditioning” literature.

I graphed a small sample of American historians, just to get a sense for the push and pull of various interpretations.  (I could see this as the dreaded final exam in a History class, with the simple instructions: “Comment.”)  



Remembering that every “John Fiske” (historian, philosopher and other) ever written about is contained in the Ngram results, I leave it to my professional historian friends to make sense of this chart.  I might add only that, knowing the world a bit as an entrepreneur, the emphasis on Frederick Jackson Turner’s frontier thesis beginning in the 1980s is not surprising, as he is the adopted historian of the high-tech crowd.

Finally, as some of this writing was done with the World Series raging, I wanted to compare the Cardinals and Red Sox. Being a lifelong Sox fan, I was overjoyed to learn that the recent separation in press shown by Ngram accurately foretold the results of the on-field competition.




Why History Students Should Love Big Data

April 09, 2013
Eric Schultz

Spring 1976. Wilson Hall, Brown University. The late, great Professor William McLoughlin has just informed his 85 students in “American Social and Intellectual History” that they are to write their first paper. All he has given us is the title: “The Age of Jefferson and Adams.” We groan. Then he adds: “Keep it to three pages or less. Double-spaced.” We smile. Three pages? How hard can that be?

“If you make the margins too wide,” McLoughlin adds, “I’ll mark you down a grade.”

Needless to say, nobody got an A on that paper, or so the good professor informed us. There may have been a B or two. Not me. It was all I could do to contain my flowery opening paragraph to a single page. Some of us recovered slightly in round two, wherein we committed “The Age of Lincoln and Calhoun” to three, double-spaced pages. Some retreated to organic chemistry and other more reasonable challenges.

Little did I know, but I had just been introduced to Big Data—though it would take 35 years to earn that name. Take an endless, insurmountable, seemingly disconnected pile of information, separate the grain from the chaff (or, as my engineering buddies would say, the signal from the noise), and tell a concise, compelling story about what it all means.

For the last year, you may have noticed, it’s been hard to escape stories about “Big Data.” In a world where everything can be measured, from your location to how well you sleep to how long you brush your teeth to all of your “Likes” on Facebook, Big Data is upon us with a vengeance. Or, as it were, like a Cloud.

Some of you will be lifelong historians and wrestle with Big Historical Data for your careers. I happened to take a left turn into business school and ended up working at and running companies. In the process, I spent an awful lot of time pondering questions about marketing and strategy.

This is how strategy works: Take an endless, insurmountable, seemingly disconnected pile of information, separate the grain from the chaff, and tell a concise, compelling story about what it all means. Sound familiar? I’ve had to do that kind of work in everything from baby products to pet food to Red Sox baseball to the global perishable supply chain. I thank my lucky stars every day for Professor McLoughlin.

Now, we’re being told, in the emerging world of Big Data there will be more and more piles of the stuff lying around. Is there any group in the world better trained to make sense of it all, to wade confidently into the sea of Big Data, than historians? It’s not just about monitoring, data gathering, and quantitative analysis. (Though a course or two in statistics is a good replacement for the Greek most of us got to skip. And get yourself over to the Computer Science building and spend a semester or two doing some simple coding, just so you can see how the other half will live their lives.)

But in the end, conventional data analysis falls short. When Chris Anderson (of “long tail” and “information wants to be free” fame) wrote that “[w]ith enough data, the numbers speak for themselves,” he was dead wrong. Causal analysis is an extraordinarily deceptive and nuanced thing. Data sets on their own are neutral and largely useless. Cobbled into relational information they begin to sing. But only when a storyteller comes along and provides context and human insight does Big Data really give up the goods.

Tom Friedman, author of The World Is Flat, believes that integration is the new specialty—that someone with a renaissance view of the world is more likely to spark an innovation than a pure engineer. If you are learning the craft of history, that could very well be you.

I do not know exactly what Big Data jobs will look like over the next generation, but I couldn’t have predicted a decade ago that there would be thousands of “app developers” or positions called “Chief Evangelist” or professional bloggers. I certainly didn’t know to put “Pope” and “tweet” in the same sentence. But I stand firm in the belief that God blesses the storyteller; it is he or she who makes data human, and our only real chance to use it like a tool instead of a club.