Session Proposals – This&THATCamp Sussex Humanities Lab

TALK/MAKE – What are the odds: Big data meets political science and they go to the races

Stephen Lindsay — Tue, 26 Apr 2016 17:53:07 +0000

‘What are the odds?’ (WATO) was an interdisciplinary collaboration between political scientists and human-computer interaction researchers at Swansea University that aimed to bring elements of big data to the world of political forecasting. The project used page scraping to gather data on political bets from gambling websites, and used that data to form a picture of the likely outcome of large public votes.
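As a rough illustration of how scraped odds can be turned into "a picture of the likely outcome", the sketch below converts fractional odds into implied probabilities and rescales them to sum to 1. This is a standard textbook conversion, not necessarily the calculation WATO used; the outcome names and odds are invented, and the function names are hypothetical.

```python
def implied_probability(fractional_odds: str) -> float:
    """Convert fractional odds such as '9/4' into a raw implied probability."""
    numerator, denominator = (int(part) for part in fractional_odds.split("/"))
    # Odds of a/b mean a stake of b wins a, so the implied chance is b / (a + b).
    return denominator / (numerator + denominator)


def normalise(raw: dict) -> dict:
    """Rescale raw probabilities so they sum to 1.

    Raw bookmaker probabilities usually sum to more than 1 because of the
    bookmaker's margin; dividing by the total is the simplest correction.
    """
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


# Invented odds for a two-outcome vote (illustrative values only).
odds = {"Outcome A": "1/3", "Outcome B": "9/4"}
raw = {outcome: implied_probability(o) for outcome, o in odds.items()}
probabilities = normalise(raw)

for outcome, p in probabilities.items():
    print(f"{outcome}: {p:.1%}")
```

Dividing by the total is only one way to remove the margin, and the choice affects how reliable the resulting figures look to a reader.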

In recent years, the politics of predicting political events has been front and centre of public debate, thanks to surprise results in UK general elections.

While the data collected in WATO was initially intended for presentation to political science researchers, it was also made available to individual members of the public on a public-facing website, ‘tell me the odds’.

Designing and building the WATO system raised more questions than it has answered for us; we still need to better understand:

The role that trust plays in intensely political areas of research and design
The best ways to present complex gambling data to members of the public without misleading them about its reliability
How we can help members of the public to engage with the analysis of this data in complex, real-time, transparent ways
How we can help researchers make use of the large amount of archival data we have and, more generally, what the techniques are to harvest data that is in the public domain, but which doesn’t necessarily want to be

This session will explore all of these issues and more. It should be of interest to data scientists, political scientists, social scientists with an interest in big data, and anyone with an opinion on the intersection between politics and research.

TALK/PLAY – TextLab Project in Practice

Rebecca Russell — Wed, 02 Mar 2016 16:09:50 +0000

TextLab is a Vertically Integrated Project at the University of Strathclyde involving students from the English Literature department and the Computer & Information Science department.

We use tools like:

AntConc (www.laurenceanthony.net/software/antconc/), a freeware text analysis toolkit for concordancing and text analysis.

Ubiqu+ity (vep.cs.wisc.edu/ubiq/) which generates statistics and identifies linguistic patterns and groups.

WordHoard (wordhoard.northwestern.edu/), an application for the close reading and scholarly analysis of texts, largely used on this project for determining the log-likelihoods of specific words and generating word clouds to display this information in a user-friendly manner.

In TextLab we use these programs to analyse the language of Shakespeare and to find patterns and discrepancies that would almost certainly be invisible to the naked eye.

But can we also use them to solve a murder?

To demonstrate these tools, we have developed a murder-mystery scenario in which Romeo (of Romeo and Juliet) has been found murdered while staying in a house with Hamlet, Brutus, and Lady Macbeth. A confession note signed by Brutus was found by the body, but he claims he is innocent. We will demonstrate how some of these analytical tools could help us identify the killer, simply from the language used in the note.
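For readers unfamiliar with log-likelihood scores, here is a minimal sketch of the idea: it compares how often each word occurs in two texts and ranks words by how strongly they distinguish one from the other. It uses the two-term form of the statistic common in corpus linguistics, which may differ from the exact calculation WordHoard performs. The function names are hypothetical and the sample strings are invented toy text, not quotations.

```python
import math
from collections import Counter


def log_likelihood(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Log-likelihood (G2) for one word observed in two texts of given sizes."""
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def distinctive_words(text_a: str, text_b: str, top: int = 3) -> list:
    """Rank words by how unevenly they are spread across the two texts."""
    words_a = Counter(text_a.lower().split())
    words_b = Counter(text_b.lower().split())
    total_a, total_b = sum(words_a.values()), sum(words_b.values())
    scores = {
        word: log_likelihood(words_a[word], total_a, words_b[word], total_b)
        for word in set(words_a) | set(words_b)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Invented toy "texts": a suspect's known vocabulary versus the note.
suspect_sample = "honour honour honour rome"
note_sample = "blood blood night rome"
ranking = distinctive_words(suspect_sample, note_sample)
print(ranking)
```

A word that both texts use at the same rate scores zero, so high scores point to vocabulary that one writer favours and the other avoids.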

TEACH/PLAY – Scaling Up Impact

Julie Weeds — Fri, 26 Feb 2016 09:57:32 +0000

Researchers in every field are being made increasingly aware of the need for their research to have impact. However, researchers often don’t realize where, beyond their own field of study, their research might have impact. How does one go about finding documents on the Internet that could be connected conceptually to another document, such as one’s own work or project proposal? Searching for key terms is a good start, but the same kinds of documents often come up at the top again and again, and it is difficult to sift through the results to find something different and relevant. Further, documents from other domains or sources (e.g., government policy documents) may use a different vocabulary, making them less likely to come up in keyword searches.

In the Text Analytics Group (TagLab) at Sussex, we are developing a system which does four things. First, it automatically identifies key words and phrases for a document or set of documents. Second, it searches the web using queries based on combinations of the key words/phrases and related words. Third, it allows the user to build custom classifiers (using active learning), e.g., for relevance. Finally, it clusters the results with a view to making it easier to identify documents or clusters of documents outside of existing known clusters. The purpose of this session is to teach delegates about the underlying technology and to give delegates the opportunity to play with and evaluate the prototype system, using their own work as input.
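To give a flavour of the first step (identifying key words for a document), the sketch below scores terms with TF-IDF, which favours words that are frequent in the target document but rare in a background collection. TF-IDF is a stand-in chosen for illustration; the text above does not say which method the TagLab system uses, and the function names and sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def key_terms(target: str, background: list, top: int = 5) -> list:
    """Return the highest-scoring TF-IDF terms of `target`.

    `background` is a list of other documents used to judge how common
    each term is in general.
    """
    docs = [tokenize(doc) for doc in [target] + background]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    term_freq = Counter(docs[0])
    scores = {
        term: (count / len(docs[0])) * math.log(len(docs) / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top]


# Invented toy documents (illustrative only).
paper = "corpus analysis of corpus data"
others = ["analysis of policy", "data on policy"]
print(key_terms(paper, others, top=3))
```

The resulting terms could then seed web queries, which is the second step described above.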

Delegates will ideally have a laptop with Google Chrome installed to be able to access the software which is run as a web service. It would also be helpful if delegates brought with them a digital copy of some of their own work (e.g., an academic paper or grant proposal) in raw text format (i.e., ‘.txt’) which can be uploaded and processed by the system. However, we can help with both installation of software and file formatting as required.

MAKE – Digitising a single day

Adam Crymble — Tue, 23 Feb 2016 10:40:58 +0000

[Image: hand-coloured stipple engraving by James Gillray, published by Hannah Humphrey, 28 July 1792]

The theme of this event is ‘scale’. Lately in digital humanities, we’ve tended to think about big scales – big data, longue durée. But what about the very small? In this session, I propose that we work together to ‘digitise’ a single day from the past, thinking about not only what that challenge means, but also what we can find out about the value of computers for understanding the small and the mundane. What does a digitised day look like? How much survives? Can we build a coherent picture? Picture of what?

I’d like to propose Friday 6 February 1789 as our case study. For most people then living, it was a very normal day. But for King George III, it was the first day his doctors allowed him to use his knife and fork, after an extended period of mental health problems. Thus, for George, it was an extraordinary day.

Drawing on our various experiences and disciplinary backgrounds, I hope you’ll help me explore the challenge of bringing together the various digital traces of a scant 24 hours from long ago. In the process I suspect we will be reorganizing the archive from one typically categorised by creator into one that emphasises a moment with innumerable perspectives.

I hope you’ll join me.

Adam Crymble,

Digital History Research Centre,

University of Hertfordshire

TEACH/PLAY – Treating BMD data as a ‘(semi-)closed set’

Richard Light — Thu, 18 Feb 2016 18:13:21 +0000

Starting from 1837, the GRO Civil Registration index provides a nominally complete record of births, marriages and deaths for England and Wales. The FreeBMD project has transcribed the vast majority of this material up to 1983, when the GRO register went digital. (Ironically, the FreeBMD project is still negotiating to make this more recent data available.) Free UK Genealogy, of which FreeBMD is one project, is committed to making all its data available under an open data licence and is working towards this goal, initially by getting all contributors to sign an agreement that allows this. This session is intended to explore what might be possible once these data sources are indeed open.

The high degree of completeness in this data makes it feasible to think in terms of a ‘closed set’ (as against the ‘open world’ assumption that cultural history usually has to adopt). In principle it should be possible to algorithmically match deaths to births that fall within this period, thereby providing an extra impetus to single-name studies.
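One way such matching might work is sketched below: estimate a birth year from the death year and the recorded age at death, then look for births with the same name in that window. The sketch assumes the death record carries an age at death, uses invented records, and is not a description of how FreeBMD data is actually structured. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Birth:
    forename: str
    surname: str
    year: int


@dataclass(frozen=True)
class Death:
    forename: str
    surname: str
    year: int
    age: int  # age at death, assumed to be recorded in the index


def candidate_births(death: Death, births: list, tolerance: int = 1) -> list:
    """Births with the same name whose year fits the estimated birth year."""
    estimated_year = death.year - death.age
    return [
        birth for birth in births
        if birth.surname.lower() == death.surname.lower()
        and birth.forename.lower() == death.forename.lower()
        and abs(birth.year - estimated_year) <= tolerance
    ]


def match(death: Death, births: list) -> Optional[Birth]:
    """Accept a match only when exactly one candidate exists.

    In a closed set, a single surviving candidate is strong evidence;
    zero or several candidates are left for a human to resolve.
    """
    candidates = candidate_births(death, births)
    return candidates[0] if len(candidates) == 1 else None


# Invented records (illustrative only).
births = [
    Birth("John", "Example", 1850),
    Birth("John", "Example", 1870),
    Birth("Mary", "Example", 1850),
]
death = Death("John", "Example", 1912, 62)
print(match(death, births))
```

The tolerance of one year allows for ages rounded or misreported at registration. Real data would also need to handle spelling variants and people born before 1837.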

A companion project – FreeCEN – offers census data, which places individuals within households on a specific date. While the coverage of FreeCEN is less complete than that of FreeBMD, the data it does hold offers much richer information about relationships between individuals, placing them in a social/family context.

Richard Light has scraped all the FreeBMD and FreeCEN data relating to his own name and his mother’s maiden name. The data behind these experiments will be made available as open data. It currently lives in an XML database, and can be published using the Linked Data approach. The plan for this session is for Richard to explain what has been achieved so far with this data, and then for everyone to explore what other techniques might be applied to it.

TEACH – Open Source Personal Digital Archiving

James Baker — Mon, 15 Feb 2016 09:43:09 +0000

SESSION REQUIREMENTS
If you intend to come to this session and want to do this work on your own laptop, please make sure you do the following in advance of coming to Sussex (it may be possible during the event, but the files are a bit big!):

Download and extract the latest BitCurator Virtual Machine at wiki.bitcurator.net/index.php?title=Main_Page
Download and setup VirtualBox and make sure the BitCurator Virtual Machine works as per wiki.bitcurator.net/index.php?title=BitCurator_Virtual_Machine_Install
If you have any old floppies, flash drives, CDs, DVDs, or hard disks you want to try to capture as part of the session, bring them along! (And don’t worry, I’ll bring along some dummy media for us to play with.)
See Processing Workflow for Digital Media for the session handout (I’ll bring some copies along!)

The paper archive has been replaced by physical data storage – a new format that requires historians, archivists, and humanists to think and act afresh. In just 35 years most people – in Britain and worldwide – have come to create text and data in a fundamentally new way. The first step towards working with these personal digital archives is to preserve them. You can’t just turn on an old computer and start browsing: the act of booting it up adds new data to the archive with fresh date stamps, thus compromising its authenticity. Thankfully, open source digital forensic tools aimed at archivists and scholars have made huge strides in recent years, thanks largely to the efforts of the BitCurator project led by the University of North Carolina at Chapel Hill.
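A common way to check that a captured copy has not changed is to record a checksum when the disk image is made and compare it later. The sketch below shows the idea with Python's standard library alone. It is not a description of how BitCurator works internally, and the file name and function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """True when the file still matches the checksum recorded at capture."""
    return sha256_of(path) == recorded_checksum


# Demonstration with a small stand-in file instead of a real disk image.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "floppy.img"  # hypothetical file name
    image.write_bytes(b"stand-in disk image contents")
    recorded = sha256_of(image)          # stored alongside the image
    unchanged = is_unchanged(image, recorded)

    image.write_bytes(b"stand-in disk image contents, modified")
    changed = not is_unchanged(image, recorded)

print(unchanged, changed)
```

Any alteration to the file, however small, produces a different checksum, which is why the copy is examined rather than the original media.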

In this session, we’ll work together to capture some dummy media (bring your own if you want to work with the real thing!) and explore that media using BitCurator: a suite of open source digital forensics and data analysis tools designed to help collecting institutions process born-digital materials.