A big question for me, as a designer of text analysis tools for the humanities, is: how do the tools I’m building fit in? Sure, you can have fancy word trees and grammatical search histograms. Sure, they’re chock-full of interesting…
A common task in literature study is to find examples of a theme. Until now, literary scholars searching for examples have had to rely on searching for sets of words they think are associated with the theme. Theme-finding by searching…
A new version of WordSeer is in the works. It’s been guided by the advice of our long-suffering literature-scholar collaborators. And by the tales of frustration and trial-and-error of the students of the Hamlet class who tried to use WordSeer to…
When scholars try to make sense out of large collections of text, they frequently do two things: compare, and collect. They collect samples of “interesting” things, and compare them with each other along various relevant dimensions. In this post, I…
A common problem in search and exploration interfaces is the vocabulary problem. This refers to the great variety of words that different people can use to describe the same concept. For people exploring a text collection, this makes search difficult. There…
On Tuesday, Feb. 1, I’ll be presenting my latest project, WordSeer, at the Farsight 2011 conference on the future of search. This event will be streamed live from TechCrunch, the tech world’s favorite blog about new technology and startup news,…
More and more source text in the humanities gets digitized every day, making it accessible to large-scale computational analysis. Nevertheless, traditional methods of humanistic analysis are based on detailed arguments built upon close readings of individual texts. How…
This year’s conference of the Association for Computational Linguistics, the most prestigious event in computational linguistics, had a paper that got me very excited. It’s called Extracting Social Networks from Literary Fiction [pdf], and here’s the abstract (emphasis added): We…
Take an example question that a literary scholar might have:
“How is the character Mary talked about in this text by author X?”
It’s fairly open-ended: what does “talked about” mean? How do we translate this into computational terms? In this post, I’ll describe some tools that natural language processing (NLP) has to offer, and show how each can be used to tackle this question, along with pointers to software and tutorials.
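One very rough way to make “talked about” computational is to pull out every sentence that mentions the character and count the words that appear alongside her name. The sketch below is only an illustration, not the method the post goes on to describe: the function names, the tiny stop-word list, and the naive sentence splitter are all assumptions made for the example, and it uses nothing beyond Python’s standard library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "is", "was", "of", "to", "in",
             "her", "she", "that", "it"}


def sentences_mentioning(text, name):
    """Return the sentences of `text` that contain `name` as a whole word."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [s for s in sentences if pattern.search(s)]


def cooccurring_words(text, name):
    """Count non-stop-words in the sentences that mention `name`."""
    counts = Counter()
    for sentence in sentences_mentioning(text, name):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word != name.lower() and word not in STOPWORDS:
                counts[word] += 1
    return counts


sample = ("Mary was kind. Everyone said that Mary was clever and kind. "
          "John left early.")
# Print the words that most often share a sentence with "Mary".
print(cooccurring_words(sample, "Mary").most_common(3))
```

Sentence-level co-occurrence is about the crudest reading of “talked about” there is. It ignores who is speaking and how the words relate grammatically to the name, which is exactly the gap that richer NLP tools are meant to fill.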
Joseph Turian & co. at MetaOptimize have started a Q+A forum for “data geeks” – people in machine learning or data mining who deal with questions about visualizing, processing, or otherwise making sense of big data sets.