On Tuesday, Feb. 1, I’ll be presenting my latest project, WordSeer, at the Farsight 2011 conference on the future of search. The event will be streamed live by TechCrunch, the tech world’s favorite blog about new technology and startup news, and will be attended by high-profile techies from Bing, Google, Blekko, and the like. Please tune in at 10am PST on Tuesday, follow along with #futuresearch on Twitter, and let’s get the digital humanities some high-tech exposure that day!
WordSeer is a new way of searching through text, inspired by the way literary scholars work. Literary scholars ask detailed, analytical questions of text, and to answer them they need a sense of how different words are used and in what contexts. For our project, we teamed up with scholars who are exploring language use in a collection of North American slave narratives.
When analyzing text, traditional keyword-based search can only take you so far. It does help: instead of having to read every document hoping to come across relevant passages, you can immediately zoom in on them with a search. But can we do better? When trying to form a hypothesis or get a sense of the contents, a long list of search results is still unwieldy, because it’s not really the matching sentences we’re interested in; it’s what they have to say about our topic.
Luckily, we don’t have to stop at matching keywords. Sentences aren’t mysterious bags of words; they follow rules and have structures, which computers have been capable of deciphering with speed and precision for some years now. From these structures, computers can automatically infer relationships between words. For example, in the sentence,
“The good God has given every man intellect”
computers can automatically infer that “God” is described as “good”, and that he is the agent doing the giving.
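To make this concrete, here is a minimal sketch of how those inferred relationships might be stored and looked up. The parse below is written by hand for the example sentence, not produced by a parser, and the relation names (amod, nsubj, and so on) follow one common dependency-grammar convention that I am assuming for illustration. The names Dependency, described_as and done_by are hypothetical helpers, not part of WordSeer.

```python
from collections import namedtuple

# One grammatical relationship, written as relation(head, dependent).
Dependency = namedtuple("Dependency", ["relation", "head", "dependent"])

SENTENCE = "The good God has given every man intellect"

# Hand-written parse of SENTENCE (an assumption for illustration,
# not the output of any particular parser).
PARSE = [
    Dependency("det", "God", "The"),
    Dependency("amod", "God", "good"),         # adjective modifying a noun
    Dependency("nsubj", "given", "God"),       # subject of a verb
    Dependency("aux", "given", "has"),
    Dependency("det", "man", "every"),
    Dependency("iobj", "given", "man"),        # indirect object
    Dependency("dobj", "given", "intellect"),  # direct object
]


def described_as(parse, noun):
    """Return the adjectives applied to `noun`."""
    return [d.dependent for d in parse
            if d.relation == "amod" and d.head == noun]


def done_by(parse, agent):
    """Return the verbs whose subject is `agent`."""
    return [d.head for d in parse
            if d.relation == "nsubj" and d.dependent == agent]


print(described_as(PARSE, "God"))  # ['good']
print(done_by(PARSE, "God"))       # ['given']
```

The two lookups correspond to the two facts above: “God” is described as “good”, and God is the agent of “given”.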
With WordSeer, we’re going beyond keyword search by using language processing to automatically extract and aggregate the parts of matching sentences relevant to a query. In the first place, we make it easy to express an analytical query in terms of a grammatical relationship. For example, if a scholar wanted to know what the slave narratives collection indicated about the relationship between slaves and God, they could simply ask (live demo link) how God “is described” (for which WordSeer finds and displays all the adjectives that are applied to the word God) and what “is done by” God (for which WordSeer finds and categorizes all instances of verbs in which God is an agent).
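A rough sketch of the aggregation idea, under heavy assumptions: the triples below are a toy stand-in for parsed text. Only the first pair comes from the example sentence in this post; the rest are invented for illustration and are not drawn from the slave narratives collection. The QUERIES table, the search function and the relation names are hypothetical, and this is not WordSeer’s actual implementation.

```python
from collections import Counter

# (relation, head, dependent) triples standing in for a parsed collection.
TRIPLES = [
    # From "The good God has given every man intellect".
    ("amod", "God", "good"), ("nsubj", "given", "God"),
    # Invented toy triples, for illustration only.
    ("amod", "God", "merciful"), ("nsubj", "hears", "God"),
    ("amod", "God", "good"), ("nsubj", "gave", "God"),
    ("amod", "master", "cruel"), ("nsubj", "sold", "master"),
]

# Each query names a relation, the position the search word must match,
# and the position whose word is returned.
QUERIES = {
    "is described": ("amod", 1, 2),  # match the noun, return the adjective
    "is done by": ("nsubj", 2, 1),   # match the agent, return the verb
}


def search(triples, query, word):
    """Count the words related to `word` by the relation behind `query`."""
    relation, match_idx, return_idx = QUERIES[query]
    return Counter(t[return_idx] for t in triples
                   if t[0] == relation and t[match_idx] == word)


print(search(TRIPLES, "is described", "God").most_common())
print(search(TRIPLES, "is done by", "God").most_common())
```

The sketch counts exact word forms, so “given” and “gave” are tallied separately; grouping them under one verb would need an extra normalization step that is left out here.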
Of course, this is only a rough, high-level picture of what the slave narratives say about how God is described and what God does, but a rough idea can often serve to guide intuition and help generate or discredit hypotheses. By making the process of “getting a rough idea” quick and inexpensive, we can speed up the entire research pipeline.