July 24, 2014

Text Visualization / Content Analysis with @VoyantTools

Many of my research colleagues do content analysis on newspapers, and there's a new tool which may prove useful to them. Let's explore Voyant Tools, a "web-based reading and analysis environment" which provides lots of high-level insight into text.

I did a quick LexisNexis search on articles written in college newspapers about sexual assault and pasted a few of them into Voyant-Tools.org. You can see the word cloud above as well as the text on the right.

If you click on any of the words in the cloud or in the text itself you'll also see where in the document the term appears, and you can see a list of Keywords in Context.

Click on the plus-sign next to the phrase, and you see more of the context.
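If you're curious what a Keywords in Context list is doing behind the scenes, here's a minimal Python sketch of the idea. This is not Voyant's code; the function name `keywords_in_context`, the `width` parameter, and the sample sentence are all made up for illustration.

```python
import re

def keywords_in_context(text, keyword, width=5):
    """Return (left, match, right) tuples for each occurrence of keyword.

    `width` is the number of words of context kept on each side.
    """
    # Simple tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Made-up sample text, not taken from the LexisNexis articles.
sample = ("The campus paper said women on campus reported concerns. "
          "Women organized a forum.")

for left, kw, right in keywords_in_context(sample, "women", width=3):
    print(f"{left:>30} | {kw} | {right}")
```

Raising `width` is roughly what clicking the plus-sign does: same match, more surrounding text.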

I was able to export a URL for this Keywords in Context chart, so you can see it in all its glory.

There are myriad other export features in the tool, including a list of words by count, comma- and tab-separated options, and more.

It seems like a good option for exploring text on a very broad level. And it's a quick way to provide graphics for publications or presentations on your text analysis.

There is a stop-word list so you can exclude common words, and you can edit this list as well. I excluded lots of common LexisNexis terminology like "u-wire" and "document"; should I have excluded "said" as well? It is also possible to upload multiple documents, so that you can compare coverage of a topic in one newspaper against coverage in another paper.
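To make the stop-word idea concrete, here's a rough Python sketch of counting words while skipping an editable stop list. The list below is hypothetical and far shorter than anything Voyant ships with; the function name and sample sentence are also my own inventions.

```python
import re
from collections import Counter

# Hypothetical stop-word list: a few common English words plus the
# LexisNexis terms mentioned above. Edit it to suit your own corpus.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on",
              "u-wire", "document"}

def word_counts(text, stop_words=STOP_WORDS):
    """Count lowercase words in text, ignoring anything in stop_words."""
    # Keep hyphenated and apostrophe forms (e.g. "u-wire") as one token.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample text for illustration only.
sample = "U-Wire document: The students said the policy failed students."
counts = word_counts(sample)
print(counts.most_common(3))
```

Adding "said" to `STOP_WORDS` and re-running is a quick way to see how much one frequent word changes the top of the list.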

Some of the limitations for newspaper research include:
  • It's not possible to analyze PDFs, for relatively obvious reasons, and this rules out many historic newspapers which are available online only as PDFs.
  • If you export multiple stories from LexisNexis or America's News, they are exported as one document, which makes it impossible to compare documents against each other in Voyant-Tools. To do this kind of analysis, you'd need to export the documents one at a time, which would quickly get tiresome.
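One possible workaround for the combined-export problem is to split the single file back into separate articles with a short script. The sketch below assumes each article in the export begins with a header line such as "3 of 8 DOCUMENTS"; that is an assumption about the export layout, so check your own file and adjust the pattern. The function names and the sample text are hypothetical.

```python
import re
from pathlib import Path

# Assumption: every article starts with a line like "3 of 8 DOCUMENTS".
DOC_HEADER = re.compile(r"^[ \t]*\d+ of \d+ DOCUMENTS?[ \t]*$", re.MULTILINE)

def split_export(text):
    """Split one combined export into a list of article texts."""
    parts = DOC_HEADER.split(text)
    return [p.strip() for p in parts if p.strip()]

def save_articles(articles, folder):
    """Write each article to its own numbered .txt file in folder."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)
    for n, article in enumerate(articles, start=1):
        (out / f"article_{n:02d}.txt").write_text(article, encoding="utf-8")

# Made-up stand-in for a combined export.
combined = """1 of 2 DOCUMENTS

First article text.

2 of 2 DOCUMENTS

Second article text.
"""

articles = split_export(combined)
print(len(articles), "articles found")
```

Calling `save_articles(articles, "split_articles")` would then give you one file per story, ready to upload as separate documents.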
Here's a screenshot of an analysis I did of eight individually downloaded articles from LexisNexis -- that process was a bit cumbersome, but the data is interesting:

The chart at right shows the number of times the word "women" appears in each of the eight articles. You can see a quick analysis of all the words in the eight articles under the word cloud (or here).

This has great potential in the newspaper content analysis toolbox.
