In this assignment you will learn how to use software to analyze book-length texts.
Students will be able to:
Identify how historical novels can be used to study the past.
Correctly identify the century and country in which Miguel de Cervantes, William Shakespeare, and Wu Cheng-en wrote, and about what region each author wrote.
Enter texts into Voyant and words into Ngrams.
Interpret the Voyant and Ngram outputs for historical importance.
Write about how distant reading informs our understanding of the past.
Miguel de Cervantes wrote Don Quijote at the very beginning of the 17th century (1605–1615). Most literature scholars agree it is one of the most important novels of the last 500 years. Similarly, Journey to the West, published by Wu Cheng-en in 1592, is considered a classic of Chinese literature, and recounts the humorous exploits of a Monkey King in 100 chapters. William Shakespeare wrote 38 plays and many poems that forever shaped the English language.
Taken together, these two novels and one collection of plays offer a substantial introduction to the historical thought of the English, Spanish, and Chinese around the turn of the 17th century. Ideally, we’d read all the works and then compare how each author treated certain subjects. However, it is unrealistic to read that much literature in one course. Yet we may still analyze all of these texts by using distant reading.
To start, let’s explore how using lots of words differs from using just a few words. For one, we can’t read word-for-word everything we’re going to analyze. When you read a newspaper article, you understand each word as you read it, and you think about the overall meaning of the words in a sentence and the meaning of the sentence in a paragraph. This type of reading is what we call “close reading,” and you’ve been doing it since you could read.
“Distant reading” uses digital tools, such as Ngrams and Voyant, to look at large groups of words, entire collections of texts in fact. Figuratively standing this far back from the words, we need some help understanding patterns that might emerge. So we use “Big Data,” that is, software or websites that collect, process, and explain large amounts of data in a way we probably couldn’t do on our own personal computers.
Read articles and view tools.
Google scanned thousands of books, and now we can ask one of its databases about the number of times a word appears in books written since 1500. To be clear, Google has not scanned all books in the world, or even all the books in English, but that doesn’t make the tool useless, just not all-inclusive.
Ngrams shows the incidence of a word (how often it appears in books in a given year) and how that incidence changes over time.
What you see is the trend in usage for words. The overall number of times (expressed as a percentage of the total words used) a word appears is not important to us as historians. We care about change over time.
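To see what “a percentage of the total words used” means, here is a minimal Python sketch. It is not how Google builds Ngrams; the two short “texts” and the name `toy_corpus` are made up for illustration. It only shows the idea of dividing a word’s count by all the words for that year.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration):
# each year maps to all the text "scanned" for that year.
toy_corpus = {
    1600: "the poor man begged for bread and the rich man gave none",
    1601: "the king rode out and the poor followed the king",
}

def relative_frequency(word, texts_by_year):
    """Return {year: percentage of that year's words that are `word`}."""
    result = {}
    for year, text in sorted(texts_by_year.items()):
        tokens = text.lower().split()
        counts = Counter(tokens)
        # Share of all words in that year, expressed as a percentage.
        result[year] = 100 * counts[word] / len(tokens) if tokens else 0.0
    return result

trend = relative_frequency("poor", toy_corpus)
for year, pct in trend.items():
    print(year, f"{pct:.2f}%")
```

The word “poor” appears once in each year, but its percentage differs because the total number of words differs. That is why we read the graph for change over time rather than for raw counts.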
Now, try expanding the dates to 1500–2000. You’ll see that there are big spikes in the graph before 1700. This is likely due to incomplete records, not because words became much more popular or unpopular in a short amount of time. For some years lots of books were scanned; for other years, few books were scanned.
Once you’ve played with Ngrams for a bit, choose three words related to poverty from the background readings on Shakespeare, Wu, and Cervantes, and enter them in the Ngram viewer. What do you see? What interpretation might you make from this graph? Write up your analysis as the first paragraph for your Experiment 4 assignment.
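The sketch below illustrates why years with few scanned books can produce spikes. All of the numbers in `scans` are invented for illustration, not taken from Google’s data. When the total is small, one or two extra occurrences move the percentage a lot; when the total is large, the line stays smooth.

```python
# Hypothetical counts (assumed, not real data):
# year -> (occurrences of the word, total words scanned that year)
scans = {
    1550: (1, 2_000),          # only a few books scanned
    1551: (0, 1_500),
    1552: (3, 2_500),
    1850: (1_010, 2_000_000),  # many books scanned
    1851: (990, 2_000_000),
    1852: (1_005, 2_000_000),
}

def percent(count, total):
    """Word count as a percentage of all words scanned that year."""
    return 100 * count / total

def spread(years):
    """Gap between the highest and lowest percentage in the given years."""
    values = [percent(*scans[y]) for y in years]
    return max(values) - min(values)

early = spread([1550, 1551, 1552])
late = spread([1850, 1851, 1852])

print(f"early spread: {early:.4f} percentage points")
print(f"late spread:  {late:.4f} percentage points")
```

In this toy example the early years swing far more than the later ones, even though the word is about equally common in both periods.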
- Mind the fallacy of treating absence of evidence as evidence of absence. Simply because a word does not appear in the data does not mean that people in the period studied didn’t use it or care about it; the word just isn’t in the data.
Now that you’ve played with a tool that is simple to use and explores a huge body of texts (what we sometimes call a corpus), we’re going to use a more advanced digital tool that lets us choose specific works to analyze.
You can also ask Voyant to show you which words are connected to other words with “Links.”(1) Voyant also shows the most common phrases in the collection in “Phrases.”(2)
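If you are curious about what “connected words” and “common phrases” might mean under the hood, the Python sketch below gives a rough idea. It is not Voyant’s actual method; the sample sentence, the function names `collocates` and `repeated_phrases`, and the window size are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical sample text (an assumption for illustration).
text = "the poor knight met the poor squire and the poor knight wept"
tokens = text.lower().split()

def collocates(tokens, keyword, window=2):
    """Count words that appear within `window` words of each `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbor in the window except the keyword itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts

def repeated_phrases(tokens, n=3):
    """Return phrases of `n` words that occur more than once."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {" ".join(g): c for g, c in grams.items() if c > 1}

links = collocates(tokens, "poor")
phrases = repeated_phrases(tokens)

print(links.most_common(3))
print(phrases)
```

Here `links` counts the words that sit near “poor,” and `phrases` lists any three-word sequence that repeats. Real tools add refinements such as ignoring very common words, but the basic idea is the same.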