Words Assignment I: Introduction to Distant Readings Using Voyant and Ngrams

In this assignment you will learn how to use software to analyze book-length texts.

Learning Objectives

Students will be able to:

  1. Identify how historical novels can be used to study the past.
  2. Enter texts into Voyant and words into Ngrams.
  3. Interpret the Voyant and Ngram outputs for historical importance.
  4. Write about how distant reading informs our understanding of the past.

Background Information

To start, let’s explore how working with a great many words differs from working with just a few. For one, we can’t read word for word everything we’re going to analyze. When you read a newspaper article, you understand each word as you read it, and you think about the meaning of the words in a sentence and the meaning of the sentence in a paragraph. This type of reading is what we call “close reading,” and you’ve been doing it since you learned to read.

“Distant reading” uses digital tools, such as Ngrams and Voyant, to look at large groups of words, in fact entire collections of texts. Figuratively standing this far back from the words, we need some help understanding the patterns that might emerge. So we use “Big Data” tools: software or web sites that collect, process, and explain large amounts of data in a way we probably couldn’t on our own personal computers.

Directions for Assignment

  1. View tools.

Voyant tools

Google Ngrams

Google scanned millions of books, and now we can ask one of their databases how many times a word appears in books written since 1500. To be clear, Google has not scanned all the books in the world, or even all the books in English, but that doesn’t make their tool useless, just not all-inclusive.

Ngrams shows the incidence of a word (how often it appears in the books from a given year) and how that incidence changes over time.

  1. Go to https://books.google.com/ngrams.

Ngrams (continued)

  1. What you see is the trend in usage for words. The overall number of times (expressed as a percentage of the total words used) a word appears is not important to us as historians. We care about change over time.

  2. Once you’ve played with Ngrams for a bit, choose three words from The Tale of Kieu and enter them in the Ngram viewer. What do you see? What interpretation might you make from this graph? Write up your analysis as the first paragraph for your **Words I Assignment**.
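For anyone curious about what “expressed as a percentage of the total words used” means in practice, here is a minimal sketch in Python. It is not how Google builds its graphs; the tiny two-year “corpus” and the function names are invented for illustration. It only shows the basic idea of dividing a word’s count by the total number of words in a year, then comparing years.

```python
from collections import Counter

# Hypothetical toy corpus: year -> all the text "published" that year.
# These sentences are invented for illustration; they are not Google Books data.
toy_corpus = {
    1800: "the river runs to the sea and the moon rises over the river",
    1900: "the moon rises and the moon sets over the quiet sea",
}

def relative_frequency(word, text):
    """Return the share of all words in `text` that are `word` (0.0 to 1.0)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens)

def trend(word, corpus):
    """Map each year to the word's relative frequency, in chronological order."""
    return {year: relative_frequency(word, corpus[year]) for year in sorted(corpus)}

if __name__ == "__main__":
    # Print one line per year so the change over time is easy to compare.
    for year, share in trend("moon", toy_corpus).items():
        print(f"{year}: {share:.1%}")
```

Because each year’s count is divided by that year’s total, years with many books and years with few can be compared fairly, which is why the direction of the line matters more than any single number.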

Introduction to Voyant

Now that you’ve played with a tool that is simple to use and explores a huge body of texts (what we sometimes call a corpus), we’re going to use a more advanced digital tool that lets us choose specific works to analyze.

Links and Phrases

You can also ask Voyant to show you which words are connected to other words with “Links” (1). Voyant also shows the most common phrases in the collection in “Phrases” (2).
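One simple way to think about “connected” words is to count which words appear close to a keyword. The Python sketch below does that with a small window of neighboring words. It is an assumption about the general idea, not a description of how Voyant calculates its Links; the function name and the sample sentences are invented.

```python
from collections import Counter
import re

def collocates(text, keyword, window=2):
    """Count words appearing within `window` tokens of each `keyword` occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    nearby = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        # Collect up to `window` words before and after the keyword.
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        nearby.update(neighbours)
    return nearby

# Invented sample text, for illustration only.
sample = "The king spoke of love. The queen spoke of love and war. The king feared war."
links = collocates(sample, "love")

if __name__ == "__main__":
    for word, count in links.most_common():
        print(word, count)
```

A wider window finds more neighbors but also more noise, so the window size is a judgment call.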

Using stopwords

Stopwords are words that are filtered out of the results, usually drawn from lists of very common words. For instance, in the screenshot, the most common word is “shall.” Voyant has already edited out common words, such as “and” and “the.” As historians, we may want to remove other words from the list. To do this, we click on the little button in the upper right-hand corner of the word cloud.

Edit stopwords

Clicking that button brings up a window, where we click on “Edit List” next to Stopwords.

Stopwords

I’m going to add “shall” and “good” to the stopwords and click “save” and then “confirm.”

What changed?

You can see our word cloud changed, telling us what other words were most important in all of Shakespeare’s plays and sonnets.
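The effect of a stopword list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the short stopword list and the sample sentence are invented, and Voyant’s own list is much longer. The sketch counts the most frequent words twice, first with the default list and then with “shall” and “good” added.

```python
from collections import Counter
import re

# Assumed, much-shortened stopword list; Voyant's real list is far longer.
DEFAULT_STOPWORDS = {"and", "the", "to", "of", "a", "be", "in", "i", "we"}

def top_words(text, stopwords, n=3):
    """Return the n most frequent words in `text` that are not stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Invented sample sentence, not a quotation from Shakespeare.
sample = "We shall love and we shall mourn, and the good shall love the good in death."

# Same text, two different stopword lists.
before = top_words(sample, DEFAULT_STOPWORDS)
after = top_words(sample, DEFAULT_STOPWORDS | {"shall", "good"})

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

The text itself never changes. Only the filter does, which is why choosing stopwords is an interpretive decision you should be ready to explain.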

Choose two words in this document and compare their usage based on the counts.

For example, I might enter “love” and “death” and see which term he used more.

Write one paragraph about what your two-term comparison could tell us about 18th-century Vietnam. You can use your background reading to aid your analysis.

Final product

Your assignment is two paragraphs, one using Ngrams, one using Voyant, based on the instructions above. Save your paragraphs as .txt and upload them to the D2L Assignment Submission Folder.

Grading criteria

Student:

  1. Entered texts into Voyant and words into Ngrams.

  2. Interpreted the Voyant and Ngram outputs for historical importance.

  3. Wrote about how a distant reading analysis informs our understanding of the past.

  4. Used the background readings to inform her/his analysis.

  5. Wrote two paragraphs with complete sentences and proper citation of all quotations.