Digital Humanities and the Future of Search

On Tuesday, Feb. 1, I’ll be presenting my latest project, WordSeer, at the Farsight 2011 conference on the future of search. The event will be streamed live on TechCrunch, the tech world’s favorite blog about new technology and startup news, and will be attended by high-profile techies from Bing, Google, Blekko, and the like. Please tune in at 10am PST on Tuesday, follow along with #futuresearch on Twitter, and let’s get the digital humanities some high-tech exposure that day!

[Image: the WordSeer search box]

WordSeer is a new way of searching through text, inspired by the way literary scholars work. Literature scholars ask detailed, analytical questions of texts, and answering them depends on getting a sense of how particular words are used and in what contexts. For this project, we teamed up with scholars who are studying language use in a collection of North American slave narratives.

Traditional keyword-based search already helps when analyzing text: instead of reading every document in the hope of coming across relevant passages, you can zoom in on them immediately with a search. But keyword search can only take you so far. Can we do better? When you are trying to form a hypothesis or get a sense of a collection’s contents, a long list of search results is still unwieldy, because it’s not really the matching sentences we’re interested in; it’s what they have to say about our topic.

[Image: the grammatical structure of a sentence]

Luckily, we don’t have to stop at matching keywords. Sentences aren’t mysterious bags of words: they follow grammatical rules and have structure, and for some years now automatic parsers have been able to recover that structure quickly and reliably. From these structures, computers can infer relationships between words. For example, in the sentence,

“The good God has given every man intellect”

computers can automatically infer that “God” is described as “good”, and that he is the agent doing the giving.
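To make this concrete, here is a minimal sketch that assumes the sentence has already been parsed into Stanford-style typed dependencies, i.e. triples of the form relation(head, dependent). The triples below are written by hand for illustration rather than produced by a parser, and the helper names `adjectives_for` and `actions_by` are hypothetical. Reading off the `amod` (adjectival modifier) and `nsubj` (grammatical subject) relations recovers both facts about “God”.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One typed dependency: rel(head, dep)."""
    rel: str
    head: str
    dep: str


# Hand-written Stanford-style dependencies for
# "The good God has given every man intellect" (illustrative, not parser output).
SENTENCE = [
    Dep("det", "God", "The"),
    Dep("amod", "God", "good"),       # "good" modifies "God"
    Dep("nsubj", "given", "God"),     # "God" is the subject of "given"
    Dep("aux", "given", "has"),
    Dep("det", "man", "every"),
    Dep("iobj", "given", "man"),
    Dep("dobj", "given", "intellect"),
]


def adjectives_for(word, deps):
    """Return adjectives that directly modify `word` (amod relations)."""
    return [d.dep for d in deps if d.rel == "amod" and d.head == word]


def actions_by(word, deps):
    """Return verbs whose grammatical subject is `word` (nsubj relations)."""
    return [d.head for d in deps if d.rel == "nsubj" and d.dep == word]


print(adjectives_for("God", SENTENCE))  # ['good']
print(actions_by("God", SENTENCE))      # ['given']
```

In an active-voice sentence like this one, the grammatical subject is also the agent, which is why the `nsubj` relation is enough to identify God as the one doing the giving.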

With WordSeer, we’re going beyond keyword search by using natural language processing to automatically extract and aggregate the parts of matching sentences that are relevant to a query. To begin with, we make it easy to express an analytical query in terms of a grammatical relationship. For example, if a scholar wanted to know what the slave narratives collection reveals about the relationship between slaves and God, they could simply ask how God “is described” (for which WordSeer finds and displays all the adjectives applied to the word “God”) and what is done by God (for which WordSeer finds and categorizes all the verbs of which God is the agent).
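The aggregation step can be sketched the same way. This is a minimal illustration, not WordSeer’s actual implementation: it assumes every sentence in the collection has already been parsed into lemmatized (relation, head, dependent) triples plus coarse part-of-speech tags, and the five toy sentences are invented for the example rather than taken from the slave narratives. “How is God described” becomes a count of adjectives linked to God either attributively (`amod`, “the good God”) or predicatively (an adjective whose `nsubj` is God, “God is good”). “What does God do” becomes a count of verbs where God is the active subject (`nsubj`) or the passive agent (`agent`, “punished by God”). Grouping by verb lemma and counting is just one simple way to categorize; the relation set and the function names are assumptions.

```python
from collections import Counter

# Toy parsed corpus (invented for illustration). Each sentence lists
# lemmatized (relation, head, dependent) triples, plus coarse POS tags
# for the heads we need to tell apart: adjective ("JJ") vs. verb ("VB").
CORPUS = [
    {   # "The good God has given every man intellect."
        "pos": {"give": "VB"},
        "deps": [("amod", "God", "good"), ("nsubj", "give", "God"),
                 ("aux", "give", "have"), ("iobj", "give", "man"),
                 ("dobj", "give", "intellect")],
    },
    {   # "God is merciful."
        "pos": {"merciful": "JJ"},
        "deps": [("nsubj", "merciful", "God"), ("cop", "merciful", "be")],
    },
    {   # "God is good."
        "pos": {"good": "JJ"},
        "deps": [("nsubj", "good", "God"), ("cop", "good", "be")],
    },
    {   # "The wicked were punished by God."
        "pos": {"punish": "VB"},
        "deps": [("nsubjpass", "punish", "wicked"), ("auxpass", "punish", "be"),
                 ("agent", "punish", "God")],
    },
    {   # "God gave him strength."
        "pos": {"give": "VB"},
        "deps": [("nsubj", "give", "God"), ("iobj", "give", "him"),
                 ("dobj", "give", "strength")],
    },
]


def described_as(word, corpus):
    """Count adjectives applied to `word`, attributively or predicatively."""
    counts = Counter()
    for sent in corpus:
        for rel, head, dep in sent["deps"]:
            if rel == "amod" and head == word:
                counts[dep] += 1      # "the good God"
            elif rel == "nsubj" and dep == word and sent["pos"].get(head) == "JJ":
                counts[head] += 1     # "God is good"
    return counts


def done_by(word, corpus):
    """Count verbs whose active subject or passive agent is `word`."""
    counts = Counter()
    for sent in corpus:
        for rel, head, dep in sent["deps"]:
            if dep != word or sent["pos"].get(head) != "VB":
                continue              # only verbs governing `word`
            if rel in ("nsubj", "agent"):  # "God gave ...", "... by God"
                counts[head] += 1
    return counts


print(described_as("God", CORPUS).most_common())  # [('good', 2), ('merciful', 1)]
print(done_by("God", CORPUS).most_common())       # [('give', 2), ('punish', 1)]
```

The part-of-speech check matters because the same `nsubj` relation links God to both “merciful” (a description) and “give” (an action); without it, the two queries would bleed into each other.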

Of course, this is only a rough, high-level picture of what the slave narratives say about how God is described and what God does, but a rough idea can often serve to guide intuition and help generate or discredit hypotheses. By making the process of “getting a rough idea” quick and inexpensive, we can speed up the entire research pipeline.

Posted in Digital Humanities, Information Seeking, Natural Language Processing, Text Mining
4 comments on “Digital Humanities and the Future of Search”
  1. jonpincus says:

    Wow, fascinating research — and love the foundation in digital humanities. Excellent presentation at the Future of Search, too!

  2. Gazal Garg says:

    This is so full of sense – using language rules for searching. A very novel step for Artificial Intelligence!

  3. Hi!
    I want to invite you as speaker to this meetup: http://www.meetup.com/San-Francisco-Information-Retrieval-Group/. Let me know if you are interested or would like to do it.

    Thanks a lot!

    You can contact me at
    griscz at indextank . com
    :)

1 Ping/Trackback for "Digital Humanities and the Future of Search"
  1. [...] Aditi Muralidharan (@silverasm), a fellow alum from THATCamp Bay Area, has an update on her WordSeer project, at Digital Humanities and the Future of Search. [...]