Example: Shakespeare
Other posts have shown how WordSeer can be used to explore small, well-defined questions: what word did Shakespeare use for ‘beautiful’? Is the occurrence of the word ‘love’ the same in the comedies and tragedies? This post is different. WordSeer has developed enough to support a simple, but complete, exploratory analysis.
The question we’ll think about is this:
“How does the portrayal of men and women in Shakespeare’s plays change under different circumstances?”
As one answer, we’ll see how WordSeer suggests that when love is a major plot point, the language referring to women changes to become more physical, and the language referring to men becomes more sentimental. You can watch a screencast here, or just read this post.
Search
We began our analysis with the question, “What are some things that are portrayed as ‘his’ and some things that are ‘hers’?” A typical keyword search returns an unstructured list of results, and a standard approach in literature study is to view them in a concordance. This is a list of all the sentences in which a word occurs, with the target word aligned in the center of the view, exposing the contexts to its left and right, sorted in some way. WordSeer uses the word tree concordance visualization, which makes common contexts easier to see by grouping them in a tree-like structure.
The word tree for her is shown in Figure 1 above. Some words like beauty stand out, but constructions like her own muddy the picture. The problem lies in the different ways in which his and her are used. The word his is always a possessive pronoun, and word sequences containing his would nearly always be relevant. However, her can also be a third-person object pronoun, and will yield constructions like “I told her that X” and “I gave her the Y”.
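To make the concordance idea concrete, here is a minimal sketch in Python. It is not WordSeer's code: the function names and the sample sentences are invented for illustration. It lists each occurrence of a target word with its left and right context, then counts the word that immediately follows the target, which is roughly the first level of branches in a word tree.

```python
import re
from collections import Counter

def concordance(sentences, target, width=3):
    """Return (left, keyword, right) context tuples for every occurrence of target."""
    rows = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, token, right))
    return rows

def first_branches(rows):
    """Count the word right after the target: the first level of a word tree."""
    return Counter(right.split()[0] for _, _, right in rows if right)

# Invented sample sentences, not quotations from the plays.
sentences = [
    "Her beauty is praised by all.",
    "I told her that he was gone.",
    "He praised her beauty and her own wit.",
    "I gave her the ring.",
]

rows = concordance(sentences, "her")
for left, word, right in rows:
    print(f"{left:>20} | {word} | {right}")
print(first_branches(rows).most_common())
```

Even on this toy input, the possessive uses (her beauty) sit side by side with the object uses (told her that, gave her the), which is the ambiguity described above.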
With WordSeer, we can get around this problem with grammatical search. The system uses natural language processing (NLP) to extract relationships between words, and allows users to specify both keywords and relationships between them. In the tool’s search interface, pairs of words are specified using input boxes, and the relationship between them is selected from a drop-down menu (Figure 2). Leaving a word-input box blank returns all matches.
With this feature, we can take advantage of the fact that possessive relationships between words can be automatically detected, to express our question precisely: “What are all the words with which his has a possessive relationship?” The results are shown in Figure 3 below.
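One way to picture a grammatical search is as a filter over (word, relationship, word) triples. The sketch below assumes the triples have already been extracted by a parser; the `Relation` type, the relation labels, and the sample data are all hypothetical and do not reflect WordSeer's internals. Passing `None` for a field plays the role of a blank input box.

```python
from collections import Counter
from typing import NamedTuple

class Relation(NamedTuple):
    governor: str   # e.g. the possessor
    relation: str   # hypothetical label such as "possessive"
    dependent: str  # e.g. the thing possessed

def grammatical_search(relations, governor=None, relation=None, dependent=None):
    """Return relations matching every given field; None matches anything."""
    def matches(value, wanted):
        return wanted is None or value == wanted
    return [
        r for r in relations
        if matches(r.governor, governor)
        and matches(r.relation, relation)
        and matches(r.dependent, dependent)
    ]

# Hand-written toy triples, standing in for parser output.
relations = [
    Relation("her", "possessive", "father"),
    Relation("her", "possessive", "eyes"),
    Relation("her", "possessive", "father"),
    Relation("his", "possessive", "sword"),
    Relation("told", "object", "her"),  # object use of "her", not possessive
]

hits = grammatical_search(relations, governor="her", relation="possessive")
counts = Counter(r.dependent for r in hits)
print(counts.most_common())
```

Because the query names the relationship, the object use of her in the last triple is never returned, unlike in a plain keyword search.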
Comparing these words with those for her (Figure 4 below) reveals immediate differences. The word father is most common for her, with husband and son close behind. Several body parts enter the picture: eyes, hand, face, tongue, lips, cheek. A picture emerges: women’s most commonly mentioned possessions are their male relatives and their bodies.
Visualization, Reading, and Hypothesis-Generation
Our next question was whether this physical, male-dominated picture of women was consistent, or whether it changed in different types of plays. We used the tool’s collections feature to divide the plays into comedies, tragedies, and histories, the three most commonly accepted categorizations of Shakespeare’s plays. We also created pre-1600 and post-1600 categories to check whether there were temporal differences.
Collections were created using the “collections” bay, a collapsible window at the bottom of the screen. We added the appropriate plays through the document listing (sortable and filterable by date, title, full-text search, grammatical search, and length).
We used the tool’s newspaper-strip visualization (Figure 6) to compare the prevalence of the two categories of words in different types of plays. Each play is represented as a long column. Within each column, small, colored horizontal blocks (corresponding to 10 sentences each) highlight the presence of a match.
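The block structure of the newspaper-strip view can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it matches plain keywords rather than grammatical search results, the category vocabularies are abbreviated, and the "play" is invented filler text. Each block of 10 sentences becomes one character: B for a body-part match, F for a family match, X for both, and a dot for neither.

```python
def strip_blocks(sentences, categories, block_size=10):
    """For each block of sentences, return the set of category names that match."""
    blocks = []
    for start in range(0, len(sentences), block_size):
        found = set()
        for sentence in sentences[start:start + block_size]:
            words = set(sentence.lower().split())
            for name, vocabulary in categories.items():
                if words & vocabulary:
                    found.add(name)
        blocks.append(found)
    return blocks

def render(blocks):
    """Draw one character per block, top of the column first."""
    symbols = []
    for found in blocks:
        if {"body", "family"} <= found:
            symbols.append("X")
        elif "body" in found:
            symbols.append("B")
        elif "family" in found:
            symbols.append("F")
        else:
            symbols.append(".")
    return "".join(symbols)

# Abbreviated, hypothetical vocabularies for the two categories.
categories = {
    "body": {"eyes", "hand", "face", "tongue", "lips", "cheek"},
    "family": {"father", "husband", "son"},
}

# An invented 25-sentence "play" with three matching sentences.
play = ["nothing here"] * 25
play[3] = "her eyes shine"
play[14] = "her father speaks"
play[17] = "her hand and her husband"

blocks = strip_blocks(play, categories)
print(render(blocks))
```

Placing several such columns side by side, one per play, gives a rough text version of the comparison across collections.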
The results for the tragedies collection were similar to the results for comedies (Figure 6), but in histories (Figure 7) an interesting pattern emerged. Body parts (blue) seemed somewhat less prevalent in these plays, while family (orange) remained unchanged.
Hypothesis-building: close reading, annotation, and exploration
WordSeer supports quick, large-scale analysis through search and visualization, but in all cases maintains links back to the source text. Hovering over a blue or orange highlighted block in Figures 6 or 7 brings up a popup displaying the matching sentence. Clicking opens the reading interface to that point (Figure 8). The full text of the document is loaded, and the system automatically scrolls to the relevant sentence, and highlights it.
Hovering over a few body-part results quickly led to a new hypothesis. In our rough sample, many of the mentions sounded romantic. We used the reading and annotating interface to follow up on this by clicking on the highlighted blocks in the newspaper-column visualization.
We selected the speeches referring to body parts and tagged them by the topics they seemed to contain. It soon became apparent that many of the mentions were speeches by a lover.
Our hypothesis was strengthened when we viewed related words. For exploration of style and language, WordSeer uses computational linguistics to calculate words commonly used in similar contexts, or commonly used within a 10-sentence window of each other. Clicking on any word while reading brings up a small window showing related words.
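As a rough illustration of the second kind of relatedness, the sketch below counts words that occur within a fixed number of sentences of a target word. It is a simplified stand-in, not WordSeer's algorithm: we assume the window extends the same distance on both sides, we count raw co-occurrences with no statistical weighting, and the sample sentences and stopword list are invented.

```python
from collections import Counter

def related_words(sentences, target, window=10, stopwords=frozenset()):
    """Count words within `window` sentences of each sentence containing target."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter()
    for i, tokens in enumerate(tokenized):
        if target not in tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokenized), i + window + 1)
        # Overlapping windows are counted more than once in this simple version.
        for neighbour in tokenized[lo:hi]:
            counts.update(
                w for w in neighbour if w != target and w not in stopwords
            )
    return counts

# Invented sentences; a window of 1 keeps the toy example readable.
sentences = [
    "her face is fair",
    "sweet love",
    "the army marches",
    "the army rests",
    "fair face",
]
counts = related_words(
    sentences, "face", window=1, stopwords={"her", "is", "the"}
)
print(counts.most_common())
```

A real system would normalize these counts against overall word frequency, so that very common words do not dominate the list.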
In our example, the related words for body parts (e.g. Figure 10 for face) strengthened our growing suspicion that female body part mentions were associated with romance. The popup shows that other body parts are frequently mentioned, along with love, fair, and sweet.
Assembling Evidence
We created a final pair of categories focusing on love: not-love-stories for plays in which love is not a major plot point, and love-stories for plays in which it is. When we reorganized the plays along these lines, the results were immediate.
In the love-stories (Figure 11), we see both body parts and male relatives. By contrast, the not-love-stories visualization (Figure 12) shows predominantly male relatives, and hovering over the occurrences of body parts reveals a gloomy picture of her tear-stained cheeks and her sorrowful eyes.
The grammatical search results (below) agree with the newspaper-strip visualizations and related words. We see more physical attributes possessed by her in the love-stories collection (Figure 13a) than in the not-love-stories collection (Figure 13b).
The grammatical search results show that the language around men changes as well (Figures 14a and 14b below). In the not-love case, the only woman to appear is mother, at number 20, but in the love case, wife takes first place, followed by favor. Compared to the physical language for women, these words have a more sentimental quality.
Thus, we see that, while a male-dominated picture of both men and women is always present, physical aspects are more prominent for women in plays about love. For men, the more sentimental aspects come to the fore.
Conclusion
WordSeer is being developed through case studies. This means we observe scholars working with texts, figure out what they need, and then try to translate it into interactions, text mining algorithms, and visualizations. Therefore, when the time comes to demonstrate it, I always think examples work better than anything else.
So what do the literature scholars among you think of this simple example? How might it be improved, and made more convincing? What are its flaws? What would you have done? Please comment, even if it is to criticize. It would be great to hear your thoughts.