Bookworm is a tool for visualization and analysis that is useful for plotting usage trends in collections of texts. The HathiTrust Digital Library is one such collection: it comprises materials from the digitized holdings of some of the most important research libraries in the world (which form part of the HathiTrust consortium), and currently holds approximately twelve million physical volumes of text in digitized form.
I conducted a workshop earlier this month here at the University of Illinois, Urbana-Champaign, to “incubate creativity” for interested students who attended with the purpose of participating in the Shout Out for the Humanities Student Contest organized by 4humanities.org.
Both undergraduate and graduate students can participate in the contest that 4humanities.org is organizing. The theme of the contest is:
“Why is studying the humanities–e.g., history, literature, languages, philosophy, art history, media history, and culture–important to you? To society? How would you convince your parents, an employer, a politician, or others that there is value in learning the humanities?”
I noticed that humanities centers at several universities (UCSB, UI Chicago and IUPUI@Indianapolis among them) were already planning workshops to “incubate” student submissions to the contest, so it made sense to organize a workshop right here at UIUC, because it occurred to me that:
(1) The content/topic of the contest itself is (or ought to be) of interest to all of us who care about the humanities
(2) Not only the HathiTrust+Bookworm tool (which, of course, is what this blog is about), but also the Word Similarity Tool (built by David Mimno of Cornell using text data provided by the HathiTrust Research Center) could allow some of the students to build interesting philological arguments in support of what they choose to write about the theme: by discovering historical lexical trends associated with words having to do with the humanities, and by exploring how other words have tended to co-occur with those words over particular historical time-spans. Basically, with the help of these two tools, students would be able to build “philological” arguments to provide context for their thesis.
In the workshop, I had the students who attended explore the words that they said they found interesting or attractive about the humanities; “narrative”, “plurality”, and “story” were some of the words that came up. Were these words always as prevalent as they are now? How have their fortunes fared over time (as shown by usage trends), especially when the query is narrowed using facets (by Library of Congress class, or by country (USA or UK))? And, using the Word Similarity Tool (which creates a co-occurrence table), what other words have tended, over time, to co-occur with the words that “define” the humanities for the students? These were some of the things the students explored.
It occurs to me that it will be useful for the students to be reflexive about the tools: are “counting”-based (statistically based) tools like these antithetical to the humanities? One can probably argue that some kind of “bridging” of the “two cultures” (to use C.P. Snow’s phrase) takes place when we use text-analysis tools like this to take a philological approach to humanistic inquiry and argumentation. Here I am also thinking of Rens Bod’s book A New History of the Humanities: The Search for Principles and Patterns from Antiquity to the Present (2014), in which Bod suggests that quantitative thinking has always been a part of the philological tradition and has in fact been quite integral to the genealogy of the humanities.
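For readers curious about what a co-occurrence table is, here is a minimal Python sketch. It is not the Word Similarity Tool's actual method, which is not described here; the function name `cooccurrence_counts`, the window size, and the toy sentences are all hypothetical, chosen only to illustrate the idea of counting which words appear near a target word.

```python
from collections import Counter

def cooccurrence_counts(texts, target, window=2):
    """Count words appearing within `window` tokens of `target` (illustrative only)."""
    counts = Counter()
    target = target.lower()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts

# Hypothetical toy sentences, not drawn from the HathiTrust collection.
toy_texts = [
    "every story needs a narrative voice",
    "the narrative voice tells the story",
]
table = cooccurrence_counts(toy_texts, "narrative")
```

To look at change over time, one could run the same count separately on the texts from each period and compare the resulting tables.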
For the past few weeks, I have been thinking a lot about how HT+Bookworm can be used in the classroom, in the context of undergraduate teaching. I started talking about this in the previous blog post, in which I mentioned that HT+Bookworm was about to be tried out in Prof. Christi Merrill’s undergraduate Comparative Literature class at the University of Michigan as part of a classroom activity.
We had the students carry out that in-class activity on November 19, 2015. What follows is a description of what happened, and a reflection — incorporating feedback from the HTRC user community during the conference call — of what we learned from it.
Christi had thoughtfully prepared an assignment that she wanted the students in the class to turn in by the evening, after they had had a chance to explore HT+Bookworm in class during the day. The assignment, which had students working in small groups of three or four each, was this:
- Each group of students has to create a document with screenshots of HT+Bookworm plots for at least five queries (corresponding to five keywords that they get to choose — the only restriction is that the keywords must be related to “translation” in some way, as that is the subject of the class).
- From the HT+Bookworm plots, the group then identifies source texts (each source text is a volume from the HathiTrust Digital Library) that they find to be of particular interest to them. (Recall that by clicking on particular points on the plot, students can display on the screen the digitized source texts that contributed to that point in the plot — as HT+Bookworm’s Graphical User Interface makes it possible to drill down to them from the usage trend plot.) The student group is asked to identify at least nine such source texts.
- Using the Translation Network Builder tool (developed by Christi’s team at Michigan), the students then create a network illustrating relationships between the source texts from the HathiTrust Digital Library that they have identified. This being a polyglot class around the theme of translation, Christi provided some additional requirements/constraints on the assignment. These were the following:
- The nine source texts should encompass at least three different languages between them
- These nine source texts should contain all the five query words between them
- The group then adds to the network a quotation from each of three sources, chosen such that each quotation is in a different language. The quotations should help illustrate the thematic content of the network.
- The group turns in the URL for this translation network they created, to the instructor. Here is an example of a network, which was turned in by one of the groups in the class.
What was the pedagogical utility of using HT+Bookworm in this way? One student wrote something in the post-class survey that is of interest here — (s)he wrote:
It helped me think about… the context of my source work beyond the moment of its creation.
Last but not least, what additional affordances did the Translation Network Builder built by Christi’s team provide for this student exercise, and what synergy did it create with HT+Bookworm? The Translation Network Builder has a built-in functionality that comes in handy: when you add a new node (i.e., a new vertex) to the network graph, and that node is a volume contained in the HathiTrust Digital Library, you can choose to create what the Translation Network Builder calls a “work” node. (By “work” it means what, in library parlance, one would call a “volume”.) You can do this by selecting “Search from Record” from the contextual menu when creating the new node. This is helpful to the students: once they have identified, through HT+Bookworm, which source texts from the HathiTrust Digital Library they want to put in their network, adding those volumes as vertices in the graph is easy. As long as the group members know the volumeID of the volume (which they can note when they identify the volume in HT+Bookworm), they can bring up that volume’s representation (a graphic showing the book’s cover from the HathiTrust Digital Library, which could be made clickable in the future to bring up the actual digitized text) and add it as a node/vertex to the network graph.
However, an important thing to remember is that Christi’s Translation Network Builder tool is primarily intended for building networks in which quite a few of the nodes stand in a relationship of “translation” to each other (i.e., are translations of each other). That is not the case in this student exercise: here the students are finding volumes in multiple languages, conceptually linked to each other by relations that the students can describe (and annotate on the edges between the nodes, with those edges representing the relations). These relations will not simply be ones of translation (in the sense of the volumes being translations of each other). So the translation-specific functionalities of the Translation Network Builder tool (such as the one that makes it easy to add “translated-from” and “translated-to” types of links/edges between nodes) will be greatly underutilized, or not used at all, during this particular student exercise.
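The kind of network described here (volumes as “work” nodes, with student-annotated relations on the edges) can be sketched as a small data structure. This is only an illustration in Python, not the Translation Network Builder's actual data model; the class names, method names, and the volume identifiers below are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class WorkNode:
    volume_id: str  # hypothetical placeholder, not a real HathiTrust volumeID
    language: str

@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # volume_id -> WorkNode
    edges: list = field(default_factory=list)  # (source_id, target_id, relation)

    def add_work(self, volume_id, language):
        self.nodes[volume_id] = WorkNode(volume_id, language)

    def relate(self, source_id, target_id, relation):
        """Add an edge annotated with a free-text relation."""
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source_id, target_id, relation))

    def languages(self):
        return {node.language for node in self.nodes.values()}

net = Network()
net.add_work("example.vol1", "English")
net.add_work("example.vol2", "French")
net.add_work("example.vol3", "Hindi")
# The relation is a description the students supply, not necessarily "translated-from".
net.relate("example.vol1", "example.vol2", "shares the theme of exile")
net.relate("example.vol2", "example.vol3", "discusses the same author")
```

A helper like `languages()` would also make it easy to check a requirement such as "at least three different languages" before the network is turned in.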
We are excited that a classroom activity (our first-ever with HT+Bookworm), involving students in an undergraduate comparative literature class at the University of Michigan taught by Christi Merrill, will take place tomorrow (Thursday Nov 19) in Ann Arbor, Michigan.
HT+Bookworm is good at tracking changes in the frequency of use of words over time. (Right now, we can track only individual words (a.k.a. unigrams), although it will be extended to track bigrams and trigrams, too, very soon.)
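To make "frequency of use over time" concrete, here is a minimal Python sketch of a unigram trend computed as occurrences per million tokens for each year. It is an assumption-laden illustration, not how HT+Bookworm itself is implemented; the function name `yearly_relative_frequency` and the toy data are hypothetical.

```python
from collections import defaultdict

def yearly_relative_frequency(volumes, word):
    """Return {year: occurrences of `word` per million tokens} for (year, text) pairs."""
    hits = defaultdict(int)    # year -> occurrences of the word
    totals = defaultdict(int)  # year -> total number of tokens
    target = word.lower()
    for year, text in volumes:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {
        year: 1_000_000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }

# Hypothetical toy corpus, not drawn from the HathiTrust collection.
toy_volumes = [
    (1850, "the lady spoke to the lady of the house"),
    (1850, "a woman and a child"),
    (1900, "the woman spoke to another woman"),
]
trend = yearly_relative_frequency(toy_volumes, "woman")
```

Dividing by the total number of tokens in each year matters because the number of digitized volumes differs from year to year, so raw counts alone would mostly reflect the size of the collection.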
“Over time” is a key phrase here. When we do close readings of individual texts, which those of us who are interested in literary studies do all the time, we get an interpretive snapshot of the text at a fixed, synchronic point in time. What we do not get is a sense of the change that a word has undergone over time, and that is important for students in a literature class to understand, too.
So, in Christi’s class tomorrow, we will have the students use HT+Bookworm as an exploratory tool, in the service of three learning objectives:
- seeing how social change over time correlates with change in preponderance of one word-concept over another over time
- noting how occurrences of related word-concepts in multiple languages/places compare with each other over time
- observing how metaphorical associations of a word vary over time, by making the different metaphorical associations of the word easy to compare.
The astute reader will have noted that all three bullet points above have to do with comparisons: comparing the prevalence of one word-concept relative to another; comparing occurrences of word-concepts in different languages/places; and comparing the different metaphorical associations of a word. This is not surprising, not only because this is a comparative literature class, but also because when we are trying to understand something as complex as social processes and how they are reflected in patterns of lexical use, comparison is very helpful in tuning out any common noise.
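One way to read "tuning out common noise" is arithmetical: if some distortion in a given year scales the counts of both words by the same factor, the share of one word within the pair is unaffected. The Python sketch below illustrates that reading only; it is my interpretation rather than a description of HT+Bookworm, and the function name and counts are hypothetical.

```python
def share_of_pair(count_a, count_b):
    """Fraction of the pair's combined occurrences that belong to word A."""
    total = count_a + count_b
    return count_a / total if total else None

# Hypothetical yearly counts for a word pair: (word A, word B).
counts = {1850: (300, 100), 1900: (600, 600)}
shares = {year: share_of_pair(a, b) for year, (a, b) in counts.items()}

# Suppose half the 1850 tokens were lost, halving both counts:
# the share is the same as before, because the common factor cancels.
halved = share_of_pair(150, 50)
```

Here the share of word A falls from 0.75 in 1850 to 0.5 in 1900, and the halved 1850 counts still give 0.75.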
The HathiTrust Research Center (HTRC) will be doing a two-hour expository workshop at the Linguistic Society of America (LSA)’s Biennial Linguistic Institute, at the University of Chicago, on July 13, 2015: “The HathiTrust Research Center: Large-scale Computational Analysis with the World’s First Massive Digital Library.”
The workshop will touch upon all aspects of HTRC’s functionalities, but, especially since this is an expository rather than hands-on workshop, we will be spending quite a bit of time on HT+BW in particular, given that we felt that HT+BW’s interactive and instantly-gratifying responsiveness will lend itself well to keeping people alert and interested as they sit through two hours of discourse in a crowded room in midsummer! Here is a preview of the slides for the workshop. (The HT+BW slides are slides 16 through 39 in this slide deck).
In the slides, we explore the regular/irregular verb past tense of “wed”/“wedded” (an example that Erez Lieberman Aiden broached at the last HTRC UnCamp), and we make a tantalizing discovery that Shakespeare gets mixed up in the story when we try to follow the history of this pair. HT+BW is great for literary sleuthing, and thinking about irregular verbs is making me think of Sherlock Holmes and his Baker Street irregulars…
In other slides, we also extend our exploration of the fortunes of the “lady”/“woman” word pair, which we had showcased a couple of months ago in our public LibGuide to HT+BW, but only for our earlier prototype, which ran over a much smaller set of non-Google-digitized public domain volumes. This time, we plotted “lady”/“woman” for the full pre-1923 public domain content of about 4 million volumes, and we seem to be getting cleaner results. In the slides, we speculate about what this particular case may have to tell us about what effect less/more democracy and less/more of class-society may have on word usage.
Eleanor Dickson, who has recently joined the HTRC team as a visiting digital humanities specialist, also contributes to the slide deck her exploration of the trajectories of the word “playground” in the UK and the USA, which seem to tell an interesting sociological story.
A workshop on HT+BW was held for interested faculty and students at the University of Illinois, Urbana-Champaign Scholarly Commons, on April 29, 2015.
- “The HathiTrust+Bookworm tool for lexical trend discovery”. Workshop, Scholarly Commons, University of Illinois at Urbana-Champaign Library, April 29, 2015.
Materials from this workshop are available below. (You are invited to reuse and/or adapt these workshop materials to organize a workshop for your own institution, research group or class.)
Explore the demo from UnCamp 2015 over millions of texts!
A LibGuide for the HT+Bookworm prototype has now been published.
LibGuides are sets of web pages for assistance that are compiled by library personnel. This particular LibGuide has been created by the Scholarly Commons of the Library of the University of Illinois, Urbana-Champaign.