Example: Shakespeare – WordSeer Project Page

Other posts have shown how WordSeer can be used to explore small, well-defined questions: what word did Shakespeare use for ‘beautiful’? Is the occurrence of the word ‘love’ the same in the comedies and tragedies? This post is different. WordSeer has developed enough to support a simple, but complete, exploratory analysis.

The question we’ll think about is this:

“How does the portrayal of men and women in Shakespeare’s plays change under different circumstances?”

As one answer, we’ll see how WordSeer suggests that when love is a major plot point, the language referring to women changes to become more physical, and the language referring to men becomes more sentimental. You can watch a screencast here, or just read this post.

Search

We began our analysis with the question, “What are some things that are portrayed as ‘his’ and some things that are ‘hers’?” A typical keyword search returns an unstructured list of results, and a standard approach in literature study is to view them in a concordance: a list of all the sentences in which a word occurs, with the target word aligned in the center of the view, exposing the contexts to its left and right, sorted in some way. WordSeer uses the word tree concordance visualization, which makes common contexts easier to see by grouping them in a tree-like structure.
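To make the idea of a concordance concrete, here is a minimal keyword-in-context sketch in Python. It is not WordSeer's code: the function name `kwic`, the `width` parameter, and the sample sentences are all invented for illustration, and the rows are sorted by right-hand context as one possible choice of ordering.

```python
import re

def kwic(sentences, target, width=30):
    """Return keyword-in-context lines for every occurrence of `target`."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    rows = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[:match.start()][-width:]   # context before the word
            right = sentence[match.end():][:width]     # context after the word
            rows.append((left, match.group(0), right))
    # Sort by right-hand context so similar continuations sit together.
    rows.sort(key=lambda row: row[2].lower())
    # Right-justify the left context so the target word lines up in one column.
    return [f"{left:>{width}} {word} {right}" for left, word, right in rows]

# Invented sample sentences (assumption), not quotations from the plays.
sample = [
    "He praised her beauty before the court.",
    "I told her that the ship had sailed.",
    "All admired her beauty and her wit.",
]

for line in kwic(sample, "her"):
    print(line)
```

Sorting by the right-hand context is what lets repeated continuations such as “her beauty” end up on adjacent rows. A word tree goes one step further and merges those shared continuations into branches.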

Figure 1. Word Tree for the word “her” generated by WordSeer.

The word tree for her is shown in Figure 1 above. Some words like beauty stand out, but constructions like her own muddy the picture. The problem lies in the different ways in which his and her are used. The word his is almost always possessive, so word sequences containing his are nearly always relevant. However, her can also be an object pronoun, and yields constructions like “I told her that X” and “I gave her the Y”.

Figure 2. The grammatical relationships made searchable by WordSeer.

With WordSeer, we can get around this problem with grammatical search. The system uses natural language processing (NLP) to extract relationships between words, and allows users to specify both keywords and relationships between them. In the tool’s search interface, pairs of words are specified using input boxes, and the relationship between them is selected from a drop-down menu (Figure 2). Leaving a word-input box blank returns all matches.

With this feature, we can take advantage of the fact that possessive relationships between words can be automatically detected, to express our question precisely: “What are all the words with which his has a possessive relationship?” The results are shown in Figure 3 below.
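The query model described above (two word slots plus a relationship, with a blank slot matching anything) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Relation` type, the `grammatical_search` function, and the hand-written triples are hypothetical stand-ins for whatever the parser and index actually produce, and the "subject-of" label is made up for contrast.

```python
from collections import Counter
from typing import NamedTuple

class Relation(NamedTuple):
    kind: str       # relationship name, e.g. "possessed-by"
    governor: str   # the thing possessed
    dependent: str  # the possessor

# Hand-written, invented triples (assumption) standing in for parser output.
RELATIONS = [
    Relation("possessed-by", "father", "her"),
    Relation("possessed-by", "eyes", "her"),
    Relation("possessed-by", "father", "her"),
    Relation("possessed-by", "sword", "his"),
    Relation("subject-of", "king", "speaks"),  # hypothetical relation name
]

def grammatical_search(relations, kind, governor=None, dependent=None):
    """Count matching word pairs; a slot left as None matches any word."""
    hits = Counter()
    for rel in relations:
        if rel.kind != kind:
            continue
        if governor is not None and rel.governor != governor:
            continue
        if dependent is not None and rel.dependent != dependent:
            continue
        hits[(rel.governor, rel.dependent)] += 1
    # Most frequent pairs first, as in a ranked results list.
    return hits.most_common()

# "What are all the words with which 'her' has a possessive relationship?"
for (possessed, possessor), count in grammatical_search(
        RELATIONS, "possessed-by", dependent="her"):
    print(possessed, possessor, count)
```

Because the relationship is part of the query, object-pronoun uses such as “I told her that X” never enter the results. They simply do not produce a possessed-by triple.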

Figure 3. Grammatical search results for possessed-by his

Comparing these words with those for her (Figure 4 below) reveals immediate differences. The word father is most common for her, with husband and son close behind. Several body parts enter the picture: eyes, hand, face, tongue, lips, cheek. A picture emerges: women’s most commonly mentioned possessions are their male relatives and their bodies.

Figure 4. Grammatical search results for possessed-by her

Visualization, Reading, and Hypothesis-Generation

Our next question was whether this physical, male-dominated picture of women was consistent, or whether it changed in different types of plays. We used the tool’s collections feature to divide the plays into comedies, tragedies, and histories: the three most commonly accepted categories of Shakespeare’s plays. We also created pre-1600 and post-1600 categories to check whether there were temporal differences.

Figure 5. Initial collections

Collections were created using the “collections” bay, a collapsible window at the bottom of the screen. We added the appropriate plays through the document listing (sortable and filterable by date, title, full-text search, grammatical search, and length).

We used the tool’s newspaper-strip visualization (Figure 6) to compare the prevalence of the two categories of words in different types of plays. Each play is represented as a long column. Within each column, small, colored horizontal blocks (corresponding to 10 sentences each) highlight the presence of a match.
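The block structure behind such a strip can be sketched as follows. This is a simplified illustration, not the tool's implementation: `strip_blocks`, `render_column`, and the toy "play" are invented here, and the sketch matches plain words rather than possessed-by relationships. The two word groups are the ones compared in this post.

```python
import re

# Word groups compared in this post; the colors map to letters in the sketch.
GROUPS = {
    "body": {"eyes", "lips", "cheeks", "face"},
    "relatives": {"husband", "father", "sons", "daughters", "children"},
}

def strip_blocks(sentences, word_groups, block_size=10):
    """For each block of `block_size` sentences, record which groups match."""
    blocks = []
    for start in range(0, len(sentences), block_size):
        chunk = " ".join(sentences[start:start + block_size]).lower()
        tokens = set(re.findall(r"[a-z']+", chunk))
        blocks.append({name for name, words in word_groups.items()
                       if tokens & words})
    return blocks

def render_column(blocks):
    """Draw one play as a text column: B = body, R = relatives, X = both."""
    symbols = []
    for groups in blocks:
        if groups == {"body"}:
            symbols.append("B")
        elif groups == {"relatives"}:
            symbols.append("R")
        elif groups:
            symbols.append("X")
        else:
            symbols.append(".")
    return "\n".join(symbols)

# An invented 25-sentence "play" (assumption), giving three blocks.
play = ["Nothing of note is said."] * 25
play[3] = "Her eyes are bright."
play[14] = "Her father is angry."
play[15] = "Her lips are red."

print(render_column(strip_blocks(play, GROUPS)))
```

Placing several such columns side by side, one per play, gives the overall shape of the visualization: dense runs of one symbol show where a group of words clusters within a play.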

The results for the tragedies collection were similar to the results for comedies (Figure 6) but in histories (Figure 7), an interesting pattern emerged. It seemed that body parts (blue) were somewhat less prevalent in these plays, but family (orange) remained unchanged.

Figure 7. Comparing the prevalence of body parts possessed-by her (eyes, lips, cheeks, and face) (blue) and relatives possessed-by her (husband, father, sons, daughters, children) (orange) in the histories. Each column is a play, represented in alternating shades of grey.

Hypothesis-building: close reading, annotation, and exploration

WordSeer supports quick, large-scale analysis through search and visualization, but in all cases maintains links back to the source text. Hovering over a blue or orange highlighted block in Figures 6 or 7 brings up a popup displaying the matching sentence. Clicking opens the reading interface to that point (Figure 8). The full text of the document is loaded, and the system automatically scrolls to the relevant sentence, and highlights it.

Figure 8. WordSeer’s reading interface. If the document is subdivided into sections, these appear on the right as a table of contents.

Hovering over a few body-part results quickly led to a new hypothesis. In our rough sample, many of the mentions sounded romantic. We used the reading and annotating interface to follow up on this by clicking on the highlighted blocks in the newspaper-column visualization.

Figure 9. Highlighting text creates a snippet, to which tags and notes can be attached.

We selected the speeches referring to body parts and tagged them by the topics they seemed to contain. It soon became apparent that many of the mentions were speeches by a lover.

Our hypothesis was strengthened when we viewed related words. For exploration of style and language, WordSeer uses computational linguistics to calculate words commonly used in similar contexts, or commonly used within a 10-sentence window of each other. Clicking on any word while reading brings up a small window showing related words.
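One of the two notions of relatedness mentioned above, co-occurrence within a 10-sentence window, can be approximated with a simple counter. The sketch below is an assumption-laden illustration rather than WordSeer's actual algorithm: `related_words`, the tiny stopword list, and the sample sentences are invented, and "within a window" is interpreted here as "sentence indices differ by less than `window`".

```python
import re
from collections import Counter

# A tiny illustrative stopword list (assumption); a real one would be longer.
STOPWORDS = {"her", "is", "the", "a"}

def related_words(sentences, target, window=10, top=5):
    """Rank words that occur within `window` sentences of `target`."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    counts = Counter()
    for i, tokens in enumerate(tokenized):
        if target not in tokens:
            continue
        # Neighboring sentences whose index differs from i by less than window.
        start = max(0, i - window + 1)
        stop = min(len(tokenized), i + window)
        for neighbor in tokenized[start:stop]:
            counts.update(w for w in set(neighbor)
                          if w != target and w not in STOPWORDS)
    return counts.most_common(top)

# Invented sample sentences (assumption), not quotations from the plays.
sample = [
    "Her face is fair.",
    "Sweet love, her face is fair.",
    "The army marched north.",
    "Nothing stirs.",
    "A distant drum sounds.",
]

# A small window keeps the toy example readable.
print(related_words(sample, "face", window=2, top=10))
```

Raw counts like these favor words that are frequent everywhere, which is why a stopword list is needed even in a toy example. A fuller treatment would normalize by each word's overall frequency.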

Figure 10. Related words for face

In our example, the related words for body parts (e.g. Figure 10 for face) strengthened our growing suspicion that mentions of female body parts were associated with romance. The popup shows that other body parts are frequently mentioned, along with love, fair, and sweet.

Assembling Evidence

We created a final pair of categories focusing on love: not-love-stories for plays in which love is not a major plot point, and love-stories for plays in which it is. When we reorganized the plays along these lines, the results were immediate.

Figure 11. Visualization of the love-stories collection comparing the prevalence of body parts possessed-by her (blue) and relatives possessed-by her (orange).

In the love-stories (Figure 11), we see both body parts and male relatives. By contrast, the not-love-stories visualization (Figure 12) shows predominantly male relatives, and hovering over the occurrences of body parts reveals a gloomy picture of her tear-stained cheeks and her sorrowful eyes.

The grammatical search results (below) agree with the newspaper-strip visualizations and related words. We see more physical attributes possessed-by her in the love-stories collection (Figure 13a) than in the not-love-stories collection (Figure 13b).

The grammatical search results show that the language around men changes as well (Figures 14a and 14b below). In the not-love case, the only woman to appear is mother, at number 20, but in the love case, wife takes first place, followed by favor. Compared to the physical language for women, these words have a more sentimental quality.

Thus, we see that, while a male-dominated picture of both men and women is always present, physical aspects are more prominent for women in plays about love. For men, the more sentimental aspects come to the fore.

Conclusion

WordSeer is being developed through case studies. This means we observe scholars working with texts, figure out what they need, and then try to translate it into interactions, text mining algorithms, and visualizations. Therefore, when the time comes to demonstrate it, I always think examples work better than anything else.

So what do the literature scholars among you think of this simple example? How might it be improved, and made more convincing? What are its flaws? What would you have done? Please comment, even if it is to criticize. It would be great to hear your thoughts.