Words Assignment I: Introduction to Distant Readings Using Voyant and Ngrams

In this assignment you will learn how to use software to analyze book-length texts.

Learning Objectives

Students will be able to:

  1. Identify how historical novels can be used to study the past.

  2. Correctly identify the century and country in which Miguel de Cervantes, William Shakespeare, and Wu Cheng-en wrote, and about what region each author wrote.

  3. Enter texts into Voyant and words into Ngrams.

  4. Interpret the Voyant and Ngram outputs for historical importance.

  5. Write about how distant reading informs our understanding of the past.

Background Information

Miguel de Cervantes wrote Don Quijote at the very beginning of the 17th century (1605–1615). Most literary scholars agree it is one of the most important novels of the last 500 years. Similarly, Journey to the West, published by Wu Cheng-en in 1592, is considered a classic of Chinese literature and recounts the humorous exploits of a Monkey King in 100 chapters. William Shakespeare wrote 38 plays and many poems that forever shaped the English language.

Taken together, these two novels and one collection of plays offer a substantial introduction to the historical thought of the English, Spanish, and Chinese around the turn of the 17th century. Ideally, we’d read all the works and then compare how each author treated certain subjects. However, it is unrealistic to read that much literature in one course. Still, we can analyze all of these texts by using distant reading.

To start, let’s explore how using lots of words differs from using just a few words. For one, we can’t read word-for-word everything we’re going to analyze. When you read a newspaper article, you understand each word as you read it, and you think about the overall meaning of the words in a sentence and the meaning of the sentence in a paragraph. This type of reading is what we call “close reading,” and you’ve been doing it since you could read.

“Distant reading” uses digital tools, such as Ngrams and Voyant, to look at large groups of words, entire collections of texts in fact. Figuratively standing this far back from the words, we need some help understanding patterns that might emerge. So we use “Big Data,” that is, software or web sites that collect, process, and explain large amounts of data in a way we probably couldn’t on our own personal computers.

Directions for Assignment

  1. Read articles and view tools.

Wu Cheng’en Reading: Journey to the West

Introduction to William Shakespeare

Introduction to Miguel de Cervantes – NOTE- NEW READING!

Voyant tools

Google ngrams

Google scanned thousands of books, and now we can ask one of its databases how many times a word appears in books written since 1500. To be clear, Google has not scanned all the books in the world, or even all the books in English, but that doesn’t make the tool useless, just not all-inclusive.

Ngrams shows the incidence of a word (how often it appears in books in a given year) over time.
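To make the idea of incidence concrete, here is a minimal Python sketch of the kind of calculation that sits behind a trend graph. It is not Google's code or data: the tiny `books_by_year` collection and the function name `relative_frequency` are invented for illustration, and the Ngram Viewer applies its own processing that is not reproduced here.

```python
from collections import Counter

# Hypothetical mini-corpus: a few words of "scanned text" per year (made up).
books_by_year = {
    1600: "the king and the queen rode to the castle",
    1601: "the knight served the king and the king was glad",
    1602: "a merchant sailed to a distant port",
}

def relative_frequency(word, text):
    """Return the share of all words in `text` that are `word` (0.0 to 1.0)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words)[word.lower()] / len(words)

# One value per year: this is the kind of series a trend line is drawn from.
trend = {year: relative_frequency("king", text) for year, text in books_by_year.items()}

for year, share in sorted(trend.items()):
    print(year, f"{share:.1%}")
```

Dividing by the total number of words in each year is what lets years with many scanned books be compared with years that have few.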

  1. Go to https://books.google.com/ngrams.

Ngrams (continued)

  1. What you see is the trend in usage for words. The overall number of times (expressed as a percentage of the total words used) a word appears is not important to us as historians. We care about change over time.

  2. Now, try expanding the dates to 1500–2000. You’ll see that there are big spikes in the graph before 1700. This is likely due to incomplete records, not because words became much more popular or unpopular in a short amount of time. For some years, lots of books were scanned; for other years, few books were scanned.

  3. Once you’ve played with Ngrams for a bit, choose three words from the background readings on Shakespeare, Wu, and Cervantes, and enter them in the Ngram viewer. What do you see? What interpretation might you make from this graph? Write up your analysis as the first paragraph for your **Words I Assignment**.
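The spikes described in step 2 can be illustrated with simple arithmetic. In the sketch below, all the numbers are invented, not taken from Google's data. The point is only that one extra occurrence of a word moves the percentage a great deal when few words were scanned for a year, and hardly at all when many were.

```python
def share(occurrences, total_words):
    """Percentage of all scanned words in a year that are the word of interest."""
    return 100 * occurrences / total_words

# Invented totals: a sparsely scanned year and a heavily scanned year.
sparse_year_total = 10_000
dense_year_total = 10_000_000

# The word has the same share of the text in both years to begin with...
sparse_before = share(1, sparse_year_total)
dense_before = share(1_000, dense_year_total)

# ...then one more occurrence of the word turns up in each year.
sparse_after = share(2, sparse_year_total)
dense_after = share(1_001, dense_year_total)

print(f"sparse year: {sparse_before:.4f}% -> {sparse_after:.4f}%")
print(f"dense year:  {dense_before:.4f}% -> {dense_after:.4f}%")
```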

Introduction to Voyant

Now that you’ve played with a tool that is simple to use and explores a huge body of texts (what we sometimes call a corpus), we’re going to use a more advanced digital tool that lets us choose specific works to analyze.

Links and Phrases

You can also ask Voyant to show you which words are connected to other words with “Links” (1). Voyant also shows the most common phrases in the collection in “Phrases” (2).

Using stopwords

Stopwords are words that are filtered out of results, and stopword lists are often based on lists of very common words. For instance, in the screenshot, the most common word is “shall.” Voyant has already filtered out common words, such as “and” and “the.” As historians, we may want to filter out other words as well. To do this, we click on the little button in the upper right-hand corner of the word cloud.

Edit stopwords

Clicking that button brings up a window, where we click on “Edit List” next to Stopwords.

Stopwords

I’m going to add “shall” and “good” to the stopwords and click “save” and then “confirm.”

What changed?

You can see our word cloud changed, telling us what other words were most important in all of Shakespeare’s plays and sonnets.
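The effect of editing the stopword list can be sketched in a few lines of Python. This illustrates the general idea only and is not Voyant's implementation: the sample sentence, the short `stopwords` set, and the helper `top_words` are all made up for this example.

```python
from collections import Counter

# Made-up sample text standing in for a much larger corpus.
text = "thou shall love the good king and the good king shall rule and all shall love thee"

def top_words(text, stopwords, n=3):
    """Count the words that are not in `stopwords` and return the n most frequent."""
    words = [w for w in text.lower().split() if w not in stopwords]
    return Counter(words).most_common(n)

# A tiny default list of very common words (real lists are much longer).
stopwords = {"and", "the"}
before = top_words(text, stopwords)

# Add "shall" and "good" to the list, then count again.
stopwords |= {"shall", "good"}
after = top_words(text, stopwords)

print("before:", before)
print("after: ", after)
```

Once “shall” and “good” are filtered out, words that were previously crowded out move to the top of the list, which is the same shift the word cloud shows.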

Choose two words in Shakespeare and compare his usage of them based on the counts.

For example, I might enter “love” and “death” and see which term he used more.
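A comparison like this comes down to counting each term and setting the counts side by side. The sketch below assumes a short invented passage and a hypothetical helper `count_terms`. With Voyant you would read the counts from the tool rather than compute them yourself.

```python
import re
from collections import Counter

# Invented passage for illustration; it is not a quotation from Shakespeare.
passage = "My love is deep and my love is true, though death may part us."

def count_terms(text, terms):
    """Return how many times each term appears as a whole word, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {term: counts[term.lower()] for term in terms}

counts = count_terms(passage, ["love", "death"])
more_frequent = max(counts, key=counts.get)

print(counts)
print("used more often:", more_frequent)
```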

Write one paragraph about what your two-term comparison could tell us about 16th-century England. You can use your background reading to aid your analysis.

Final product

Your assignment is two paragraphs, one using Ngrams, one using Voyant, based on the instructions above. Save your paragraphs as .txt and upload them to the D2L Assignment Submission Folder.

Grading criteria

Student:

  1. Entered texts into Voyant and words into Ngrams.

  2. Interpreted the Voyant and Ngram outputs for historical importance.

  3. Wrote about how a distant reading analysis informs our understanding of the past.

  4. Used the background readings to inform her/his analysis.

  5. Wrote two paragraphs with complete sentences, and proper citation of all quotations.