Web App Screenshots

[Screenshot: Original WikiCandidate Site]

[Screenshot: WikiCandidate Explorer]

The WikiCandidate Project



The WikiCandidate Project was a collaborative research project I undertook with Tarleton Gillespie and Dmitry Epstein at Cornell University. The project centered around a Website, titled “WikiCandidate ’08,” that was built to look like the campaign site of a fictional U.S. Presidential candidate running in the 2008 election. The twist was that all of the information on the campaign site, including the candidate’s image, biography, press releases, and issue statements, was tied to a wiki software engine and was entirely created and editable by visitors to the site, who were instructed to work together to create an “ideal” President.

The original site was coded by a team of undergraduate developers headed by Alan Garcia and Brian Alson from the university’s Information Science Department, who did a tremendous job putting together a lot of esoteric functionality for the modified wiki engine. When it came time to look at the data from the site, however, I needed to write my own piece of software.

Tarleton, Dmitry, and I each had different questions about what users would do with the WikiCandidate software. Tarleton was curious to see how the design of the software might affect the discourse of the users who came to the site to write WikiCandidate’s political platform and debate the issues, as well as how the contributions of early users might set the terms for later ones. Dmitry was curious about how users would settle their differences of opinion to create a common platform, and focused on issues of dispute resolution using digital tools.

My questions were about whether users, left to their own devices to create an ideal candidate, would come up with innovative styles of discourse and/or reproduce the tone and tropes of conventional political campaign speech.

Users contributed a great deal of content, but the site was still of a moderate enough size to read “cover to cover.” In doing so, I noticed that, while users of different political stripes frequently engaged in edit wars—especially regarding the candidate’s stance on particular issues—they often preserved bits and pieces of previous users’ contributions, while changing the valence of the candidate’s position. For instance, under the issue of “Gun Control,” a user originally wrote “This candidate understands that guns and firearms have made a significant mark on our country’s history,” adding “the Columbine Massacre is just one example.” Another editor eventually came along and kept the original sentence, but changed the subsequent one to read, “the American Revolution is just one example.”

Occurrences like this were not uncommon, and I wanted to do more to explore the surviving phrase fragments, which arguably constituted tiny areas of tacit agreement among users and could have framing effects on ensuing edits, even when other aspects of the candidate’s stance changed entirely. Despite noticing events like this, it was difficult to keep track, across the many issues and editors on the site, of exactly which phrases were being preserved and who was preserving them. In short, phrase survival seemed (a) like an interesting thing to look at more closely, and (b) like a good candidate for programmatic analysis. So I built a software tool, which I dubbed “WikiCandidate Explorer,” to identify exceptional instances of phrase survival across edits of various campaign issues. It turned out there were many such instances, and they were interesting, but I will save the results for a different time and focus here on the process of building the Explorer tool.

I built out the application in a Socratic fashion. I started with a simple question, writing a method to identify all phrases of a set length, then added a method for finding the longest surviving phrase in an article. Seeing these, I wondered about phrases that were shorter but still prominent, so I built functions for finding the longest phrase that occurred in a user-specified number or percentage of drafts. I built out other portions of the tool in a similar way, starting with a simple question and then adding functionality to give me more nuance.
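To give a sense of the core mechanics, here is a minimal PHP sketch of the phrase-survival idea: break each draft into every contiguous phrase of a set word length, then look for the longest phrase shared by a given share of drafts. The function names, the length cap, and the normalization choices are my own illustrative assumptions, not the originals from the Explorer.

```php
<?php
// Sketch of the phrase-survival idea. Names and details are illustrative.

// Break a draft's text into every contiguous phrase of $length words.
function extract_phrases(string $text, int $length): array
{
    $words = preg_split('/\s+/', strtolower(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);
    $phrases = [];
    for ($i = 0; $i <= count($words) - $length; $i++) {
        $phrases[] = implode(' ', array_slice($words, $i, $length));
    }
    return array_unique($phrases);
}

// Find the longest phrase that appears in at least $minShare of the drafts.
function longest_surviving_phrase(array $drafts, float $minShare = 0.5): ?string
{
    $needed = (int) ceil($minShare * count($drafts));
    // Try long phrases first, shrinking until something survives.
    for ($length = 30; $length >= 2; $length--) {
        $counts = [];
        foreach ($drafts as $draft) {
            foreach (extract_phrases($draft, $length) as $phrase) {
                $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
            }
        }
        foreach ($counts as $phrase => $count) {
            if ($count >= $needed) {
                return $phrase;   // first hit at the longest qualifying length
            }
        }
    }
    return null;
}
```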

For instance, the tool initially told me how many drafts a phrase occurred in, but not what proportion of drafts. So I changed the program slightly to report the percentage of drafts in addition to the number. Once I knew the proportion of drafts, I wanted to know where the phrase appeared in the history of the article and whether or not all the occurrences were contiguous. So I had the program list the draft numbers. This generated long spools of numbers, though, and I wanted something cleaner. So I created a small visualization for each phrase to mark on a timeline where the drafts occurred, while keeping the number list available as a mouse-over display.
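To illustrate the kind of readout I mean, here is a hedged sketch of a per-phrase report: the draft count, the percentage, the list of draft numbers, and a simple HTML timeline whose markers keep the raw numbers available on mouse-over. The markup, the function name, and the substring-matching shortcut are illustrative; the original tool’s output looked different.

```php
<?php
// Illustrative report for one phrase: which drafts contain it (by
// case-insensitive substring search), as a count, a percentage, and a
// simple HTML timeline with draft numbers kept as mouse-over titles.

function phrase_report(string $phrase, array $drafts): string
{
    $hits = [];
    foreach ($drafts as $i => $draft) {
        if (stripos($draft, $phrase) !== false) {
            $hits[] = $i + 1;                 // 1-based draft numbers
        }
    }
    $total   = count($drafts);
    $percent = round(100 * count($hits) / $total, 1);

    // One cell per draft; filled cells mark drafts containing the phrase.
    $cells = '';
    foreach (range(1, $total) as $n) {
        $mark   = in_array($n, $hits) ? '&#9608;' : '&middot;';
        $cells .= "<span title=\"draft $n\">$mark</span>";
    }

    return sprintf(
        '<p>"%s": %d of %d drafts (%s%%)</p><div title="drafts %s">%s</div>',
        htmlspecialchars($phrase), count($hits), $total, $percent,
        implode(', ', $hits), $cells
    );
}
```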

Then I wanted to know who was writing the drafts, so I had the program color code the timeline markers by author. When I saw the output, I realized that many of the “repeated phrases” were actually coming from the same authors, who were simply proofreading their work and saving their changes to consecutive drafts of an article. This realization led me to add an option to the tool and its user interface to ignore back-to-back drafts by the same user when looking for surviving phrases, keeping only the last draft in each consecutive run by an author. This feature changed the phrases the tool found, and in many cases made them more interesting.
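The collapsing step itself is straightforward. A minimal sketch, assuming the drafts arrive as a chronologically ordered list of author/text pairs (the field names and function name here are made up for illustration):

```php
<?php
// Collapse runs of consecutive drafts by the same author, keeping only
// the last draft in each run. Each entry in $drafts is assumed to be an
// array like ['author' => ..., 'text' => ...], in chronological order.

function collapse_consecutive_drafts(array $drafts): array
{
    $collapsed = [];
    foreach ($drafts as $draft) {
        $last = end($collapsed);
        if ($last !== false && $last['author'] === $draft['author']) {
            // Same author as the previous draft: replace it with the newer one.
            array_pop($collapsed);
        }
        $collapsed[] = $draft;
    }
    return $collapsed;
}
```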

In the end, this iterative process, in which I built out a custom tool to explore the data, answering my own questions as they occurred to me, struck me as an especially useful research method. It is not a replacement for in-depth qualitative analysis, but a supplement to it. Rather than using programming to mine data sets from Websites that I would never read, I could write tools to aid me in qualitative research—helping me to deepen and challenge my understanding of data that I had already read cover to cover.

I continue to be impressed by the ease with which a few string parsing functions let you explore new aspects of a data set on a whim. I was also struck by how the tool came together. Every function in the PHP files, every form field in the interface, and every readout in the reports it generated had a story behind it about a question I’d asked, an aspect of the data I’d considered. The software itself was, in some sense, the manifestation of my research narrative—in a way that could never be true for an all-purpose piece of software like Atlas.ti or SPSS. This is something I have continued to do as I’ve moved on to new research projects.
