How do you read? An analysis of survey responses.

A big question for me, as a designer of text analysis tools for the humanities, is: how do the tools I’m building fit in? Sure, you can have fancy word trees and grammatical search histograms. Sure, they’re chock-full of interesting information that you can make an argument about. But where exactly in the humanistic analysis process does a scholar need things like that? I have no idea.

But there’s more. I don’t just build tools, I build environments. And that means support for reading the text, navigating it, searching it, and (most importantly) “working” with it. And I have no idea what that means either. So over the past few weeks I’ve been having hour-long chats with late-stage PhD students from the literature and history departments, and asking them to tell me about how they do research. I asked all kinds of confusing and mundane questions like, “How do you decide what to underline?” and, “Can you define formalism for me?” and, “You mean you actually copy it out by hand?” and, “How do you organize all the quotes you collect?” and, “How do you go about proving that?” and, “So you scanned in everything in those boxes?”

I only did twelve of those interviews, but patterns began to emerge. So I did a survey. A simple one, with six questions about reading habits. This survey’s purpose was to confirm whether some of the patterns I noticed around reading were general. If you just want the charts summarizing the responses, you can find them here (those numbers include around 20 more responses I got while I was writing this post). For a full analysis in which I extract some general patterns in humanities scholars’ reading processes, read on.

This was the first page of the survey, with two screener questions. Over two days, I got 153 responses from humanities scholars whose primary sources were mostly textual (and 18 responses from others, but I removed those responses for this analysis). Filter questions: are you a humanities scholar? Are your primary sources mostly textual?

Working with text

My first question was about copied-out snippets. From what I heard in the interviews, humanities scholars working on a project eventually reach a point where they have some interpretation, some interesting angle they want to take. At this point, they start actively reading and re-reading their primary sources. Snippets begin to collect in their notes in large numbers.

Do you copy out snippets of text from your primary sources?

The results are below. An overwhelming 90% of textual humanities scholars surveyed copy out snippets either frequently or occasionally.

Answers to "Do you copy out snippets of text from your primary sources into your notes?"

What are snippets for?

But why? From what the interviewees said, copied-out snippets seem to be in a different category from other annotations (such as margin notes, underlines, highlights, or circles on the page). In fact, copying out seems reserved for a more important class of items. But what makes these copied-out snippets so special? This was my next question.

Why do you copy out snippets of text

The results are below, and the responses confirmed the interviews. Most of the textual humanities scholars surveyed said that snippets were evidence, interesting or thought-provoking passages, or examples of something they were looking for.

Answer to why do you copy out snippets of text

In addition to the 4 existing choices, there were 28 “other” responses, which fell into these groups:

  1. As a way to store quotations I think I’ll need (5 such responses)
  2. Writing it out helps me understand and think about it (2)
  3. Writing it out helps me remember it (5)
  4. The snippet is a good summary of an author’s point or argument (4)
  5. They help me outline a longer argument (3)
  6. I need to translate it into another language (3)

It seems that copied-out snippets play a role very much like the evidence a lawyer lays before a jury. They provoke thought, they justify an interpretation, they are examples that support an argument (or indeed counterexamples that need to be explained away). A scholar amasses many such “pieces of argument” and then organizes them to tell a coherent story.

What do snippets look like?

As a tool builder and an information retrievalist I need to know: how long are these snippets, and is there accompanying information that users will want to add?

Snippet length

Are snippets a few words long? A few paragraphs?

What do snippets look like?

The results are below. They revealed that snippets varied greatly in length.

  • Many respondents had “some” or “many” snippets between a few words and a paragraph long.
  • About 25% had “some” or “many” snippets that were longer than a paragraph.

Accompanying information

The interviews suggested that in addition to the literal text of the snippets themselves, there were often various kinds of notes as well as visual finding aids and citation information.

The results are below. Notes about why the snippet was relevant were very common.

And so were ideas that the scholars got from the copied-out snippet.

By far the most common was citation information: 76% always added it.

Other markers, such as post-it flags, colored highlights, and tags to enable keyword search, were less common but still in use:

For a tool designer, the message is clear. If you want humanities scholars to read text using your tool, you must support all of the above activities.

Collecting evidence

When a scholar wonders, “Where else have I seen that before?” or, “Is this an unusual exception?” or, “Is this a pattern?”, my interviewees told me that one of two things happens: either they had thought of this before and the relevant passages were already copied into their notes, or they had to go back and re-read the relevant texts to search for evidence. To see if this was a general pattern, I created the next two survey questions.

When do you revisit and re-read textual primary sources that you've already read?

The responses are below, and they confirm my interviewees’ responses. 79% re-read when they had a new idea or interpretation, 65% when they had a new hypothesis, and 67% when they noticed a new pattern.

Answers to why do you revisit sources you've already read.

There were also 13 “other” responses, which fell into the following categories:

  1. Just for fun (2 responses)
  2. When I teach (5)
  3. When I find my notes don’t have everything I need (3)
  4. To understand it better (2)
  5. To compare with other documents (1)

But what do scholars want to do with the text they re-read? This was my final question. I wanted to compare how note-taking behavior varied between first-time reading and re-reading:

The results are below. Compare how frequently scholars copied out snippets the first time (top) with when they were re-reading (bottom). The copy-out rate is pretty much the same, or perhaps even a little higher while re-reading.

These responses reveal an inefficiency in the “finding evidence” portion of the scholarly process. Reading is a great way to understand, to learn, to remember. But it is a very inefficient way to search for something. First, it’s very slow (and when you speed it up, it starts to lose its reliability). Second, it’s very subject to your state of mind (sometimes things pop out at you, sometimes they don’t). And third, it’s impossible to do thoroughly for very large collections. Yes, you need to revisit, re-find and re-acquaint yourself with the material. But what I object to is that you often have to re-read in order to do it.

A (partial) solution already exists: full-text digitization and search. Search can take you a long way, especially if your examples are associated with particular words. However, there are a great many sophisticated information retrieval technologies that go beyond keyword search. We can calculate and retrieve text passages by similarity; we can allow you to mark relevant passages and return more like those; we can train classifiers on what you’ve marked interesting and have them automatically classify text; and we can use the Google Translate API to help identify foreign words, and online dictionaries to find synonyms.
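
To make “retrieve text passages by similarity” a little more concrete, here is a minimal sketch in Python. It is only an illustration, not the method behind any tool described in this post: it assumes passages are plain strings, compares simple word counts with cosine similarity, and the names tokenize, cosine and more_like_this are made up for this example. A real system would at least drop common words and weight rare ones more heavily.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query, passages, top_n=3):
    """Rank passages by word overlap with the query, most similar first.

    Passages that share no words with the query are left out.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, p) for score, p in scored[:top_n] if score > 0]
```

Calling more_like_this with a copied-out snippet as the query and a list of paragraphs from the source text returns (score, passage) pairs, with the passages that share the most vocabulary with the snippet ranked first.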

Based on the answers to these two questions, it seems to me that the following three “finding a snippet-of-text” problems might really benefit from a little information retrieval and visualization, because finding them by reading is especially hard, and formulating them as search queries can be difficult:

  1. Find me more examples like this
  2. Find me other places in the text where this happens
  3. Show me all the places in the text where this concept comes up
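
As a rough illustration of the third problem, the sketch below lists every place where any of several words standing for one concept occurs, together with some surrounding context. It is an assumption-laden simplification: a “concept” is reduced to a hand-picked list of terms, the text is one plain string, and the function name concept_occurrences and the width parameter are invented for this example.

```python
import re


def concept_occurrences(text, terms, width=30):
    """Return (offset, context) pairs for every whole-word match of any term.

    `terms` is a hand-picked list of words that stand for one concept;
    `width` is the number of characters of context kept on each side.
    """
    if not terms:
        return []
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append((match.start(), text[start:end]))
    return hits
```

Because each hit keeps its character offset, an interface could jump from a hit straight back to that position in the source text. Searching several terms at once also helps when the same idea was labeled differently at different times.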

In conclusion

Together, the survey and interviews helped me understand the mechanics of scholarly reading. As I interpret it, humanities scholars working around textual primary sources follow a process like this (and this is why I’m blogging, so you can all violently disagree with me in the comments):

  1. Scholars begin a project with a (sometimes vague) hypothesis, interest, or interpretation in mind
  2. They read primary sources to solidify their understanding and find evidence for their arguments
  3. They notice or realize things while reading passages from the text
  4. They copy out the passages if they are sufficiently thought-provoking, provide evidence, or are relevant to an interpretation.
  5. They add information to the passages they copy out:
    1. Why the passage is relevant / how it fits into their argument
    2. Ideas they got from the passage
    3. Citation information so they can find it and cite it properly later
  6. If they haven’t already collected the necessary supporting material, they search, read and re-read other texts to find more support.
  7. When they find supporting or relevant passages, they copy them out (see step 5)
  8. They curate their collected passages (“pieces of argument”) into a written product representing their argument

Laying out these steps has helped me come up with design requirements. For example, even if no extra visualizations and analysis tools are added, and the interface only supports reading, these steps told me the basic “reading tools” my system must have:

  1. Copying out variable-length snippets of text into a note-taking area
  2. Preserving the link between the copied-out snippet and its place in the text
  3. Taking notes around a snippet
  4. Tagging and keyword search
  5. Exporting all of the above into a format compatible with MS Word or similar, so that scholars can integrate it with their other work
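
One way to picture these five requirements is as a small data structure. The sketch below is hypothetical and is not the design of any existing system: the Snippet class, its fields, and the helper functions copy_out, search and export_plain_text are all names invented here. It assumes the source text is available as a single string, so a snippet can remember its character offsets, and that plain text is an acceptable export format for pasting into a word processor.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Snippet:
    text: str     # the copied-out passage
    source: str   # citation information for the source
    start: int    # character offset where the passage begins in the source
    end: int      # character offset where the passage ends
    notes: List[str] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)


def copy_out(source, full_text, start, end):
    """Copy a variable-length passage while keeping its place in the text."""
    return Snippet(text=full_text[start:end], source=source, start=start, end=end)


def search(snippets, keyword):
    """Return snippets whose text or notes contain the keyword, or that carry it as a tag."""
    kw = keyword.lower()
    return [
        s for s in snippets
        if kw in s.text.lower()
        or any(kw in note.lower() for note in s.notes)
        or kw in {tag.lower() for tag in s.tags}
    ]


def export_plain_text(snippets):
    """Render snippets, citations, notes and tags as plain text."""
    blocks = []
    for s in snippets:
        lines = ['"' + s.text + '"']
        lines.append("Source: {} (characters {}-{})".format(s.source, s.start, s.end))
        lines.extend("Note: " + note for note in s.notes)
        if s.tags:
            lines.append("Tags: " + ", ".join(sorted(s.tags)))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

Storing the offsets alongside the copied text is what preserves the link back to the source: the interface can always reopen the text at full_text[start:end], even after the notes and tags have changed.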

Next, if we’re talking about systems with keyword search functionality, steps 6 and 7 make it clear that the ability to save, organize, and annotate search results is key. This is in addition to the ability to switch seamlessly between looking at search results and reading the source text surrounding any particular search result.

If we want to get more sophisticated with information retrieval (which I do), the three “finding snippets of text” problems I identified above need attention.

But what about when you have visualizations? Coming back to my original question, where do these fit in? I’m still not sure, but I think the steps I’ve found above will give me a good place to start looking for answers.

Posted in Digital Humanities
6 comments on “How do you read? An analysis of survey responses.”
  1. If this survey is for your academic research, are you not obligated to secure ethics approval from your institution, and does that not usually end up in being required to secure informed consent from all survey participants before the survey is launched? You may not be able to use this data if it is collected without having the necessary approvals ….

    • silverasm says:

      Thanks Aimee, I’ve looked at this. For many studies, this is true, but surveys are an easier category, especially completely anonymous ones.

  2. Natalia says:

    I’ll concede that re-reading is inefficient, but it’s also awesome, and sometimes leads me to new and better ideas. (We’re not just trying to find evidence to confirm something we already think, but also understand its context and revise what we think.) So I’m not sad to re-read a lot of the time. I also often have a visual memory of the thing I’m looking for, which makes re-reading a little closer to skimming. (I know not all brains work this way; I often remember exact phrases from my initial reading.)

    That said, there are different circumstances in which an alternative to re-reading is helpful; in the initial writing phase re-reading may be very generative, but if you’re, say, revising an article for publication or tracking down a citation, it can be a gigantic pain.

  3. DrDavisTCE says:

    I basically agree with what your survey found. I do wonder though if you would be able to create a key word search that encouraged/facilitated/allowed multiple search terms at one time for the same thing. I don’t always label my notes in the same way, depending on what I am thinking at the time.

    I wrote about my answers and thoughts here:

  4. D.A. says:

    This is such an illuminating post–thank you! One thing that’s somewhat missing here, though–and it’s something I find missing from most digital reading experiences (and therefore the reason why I still, stubbornly, tend to print things)–is the tactile feel of reading. Sometimes, the notes and movements I make when I navigate a text are less about separating out what will be useful to me later and more about keeping my attention and focus while I read. I underline, for instance, because I find that doing so helps me process what I’m reading; I circle key words; I use vertical lines in the margins for snippets that I think will be especially important (the more parallel lines, the more important). These are all things that I find difficult to do in digital tools, and they occur mostly at a step in the process before copying things out into my notes. I don’t know if this is a common practice or not, but it’s something that the survey didn’t quite capture.

    • silverasm says:

      Thanks D.A. It is a very common practice! So common that there is already a lot of research on reading interfaces that can help replicate this feel. Sadly, products still have to catch up with the research. But yes, I’m aware of the role that marking the text plays in focusing attention and furthering understanding. I left such questions out because they’ve already been well researched (although it seems not to have resulted in better interfaces).

5 Pings/Trackbacks for "How do you read? An analysis of survey responses."
  1. [...] just completed an interesting and brief survey on humanities reading practices from Aditi Muralidharan. If you complete the survey, you can see the aggregate survey results. [...]

  2. [...] This survey is now closed, after 153 responses. [...]

  3. [...] was on Twitter (to get a break from grading) and saw a post I really wanted to read. It is called “How do you read?” and is on Text Mining and the Digital Humanities. The author did 12 in-depth interviews and had a [...]

  4. [...] the steps I’ve found above will give me a good place to start looking for answers. [...]

  5. [...] sensemaking cycle. In our studies of the work processes of Historians and Literary Scholars, and a survey of their reading habits, we’ve found that the humanities scholars begin by reading, making annotations, and [...]