How do we locate the individual in the noise of data? How do we tell big stories that aren’t reductive? What tools and technologies can empower researchers, educators and learners in and outside the academe to grapple with the macroscope?
This&THATCamp Sussex Humanities Lab takes place on 19-20 May 2016 at the University of Sussex. It brings together humanists, technologists, educators, and learners to share, build, and make together around the theme of scale. Spread over two days to enable a fruitful balance of doing and talking, of teaching and demonstrating, of hacking and yacking, this delegate-led unconference throws open the Sussex Humanities Lab to stimulate novel collaborations that reinvent the humanities, one bit at a time.
The event will focus on hands-on sessions that explore methods, practice, and strategies for working with humanities data at scale, be that close up or at a distance; but in reality, anything that isn’t a standard talk goes! Sessions proposed thus far can be found at this.thatcamp.org/category/session-proposals/. As participants, you will pick on the first day when, where, and whether the sessions proposed take place.
In addition to the unconference elements, the event will feature a keynote from Melodee Beals (Loughborough) entitled ‘A Series of Small Things: The Case Study in the Age of Big Data’.
Those interested in joining us should register at this.thatcamp.org/register/. Please note that spaces are limited so registration is vital.
‘What are the odds?’ (WATO) was an interdisciplinary collaboration between political scientists and human-computer interaction researchers at Swansea University to try to bring elements of big data to the world of political forecasting. The project used page scraping to gather data on political bets on gambling websites to form a picture of the likely outcome of large public votes.
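WATO's own scrapers are not public, but the core idea (pulling outcome/odds pairs out of bookmaker markup and converting fractional odds into implied probabilities) can be sketched with the standard library alone. The markup fragment, class names, and odds figures below are all invented for illustration:

```python
from html.parser import HTMLParser

class OddsParser(HTMLParser):
    """Collects (outcome, fractional odds) pairs from tagged spans."""
    def __init__(self):
        super().__init__()
        self._field = None
        self.rows = []  # list of [outcome, odds] pairs

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "") or ""
        if "outcome" in classes:
            self._field = "outcome"
        elif "odds" in classes:
            self._field = "odds"

    def handle_data(self, data):
        if self._field == "outcome":
            self.rows.append([data.strip(), None])
        elif self._field == "odds":
            self.rows[-1][1] = data.strip()
        self._field = None

def implied_probability(fractional):
    """Convert fractional odds like '5/2' into an implied probability."""
    num, den = (int(x) for x in fractional.split("/"))
    return den / (num + den)

# A made-up fragment standing in for a bookmaker's page.
sample = """
<div><span class="outcome">Remain</span><span class="odds">1/4</span></div>
<div><span class="outcome">Leave</span><span class="odds">3/1</span></div>
"""

parser = OddsParser()
parser.feed(sample)
for outcome, odds in parser.rows:
    print(outcome, round(implied_probability(odds), 2))
```

Real bookmaker pages are, of course, messier than this, and implied probabilities carry the bookmaker's margin, which is one reason presenting such data responsibly is hard.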
In recent years, the politics of predicting political events has been front and centre of public debate, thanks to surprise results in recent UK general elections.
While the data collected in WATO was initially intended for presentation to political science researchers, it was also made available to members of the public on a front-facing website, ‘tell me the odds’.
Designing and building the WATO system raised more questions than it answered; we still need to better understand:
The role that trust plays in intensely political areas of research and design
The best ways to present complex gambling data to members of the public without misleading them about its reliability
How we can help members of the public to engage with the analysis of this data in complex, real-time, and transparent ways
How we can help researchers make use of the large amount of archival data we have and, more generally, what the techniques are for harvesting data that is in the public domain, but which doesn’t necessarily want to be
This session will seek to explore all of these issues and more, and should be of interest to data scientists, political scientists, and social scientists with an interest in big data, as well as anyone with an opinion on the intersection between politics and research.
As computer systems get increasingly sophisticated, the experiences of users — even specialist ones working in the digital humanities — are increasingly abstracted from the underlying electrical and mechanical operations of computers. Underneath everything there are only zeroes and ones travelling through machines. Building on the work of the Minimal Computing Lab at the Centre for Textual Studies at De Montfort University, this session seeks to encourage play and making at the lowest possible level of abstraction, to remind digital humanists of what computers are doing with binary operations and think about how these activities might be explained. In particular, we will try out various practical experiments that might be suitable for conveying to a wider audience — our non-digital peers in the humanities and the general public — just what it is that these machines do.
A fundamental mystery underpins much of today’s work in digital humanities: just how is it that a machine can store and process language? To help dispel the mystery we return to the mechanical and electrical level of binary operations, leaving aside the usual mathematical accounts to look at what happens when human language is encoded in this form.
What is it about computers that we need to explain to non-specialists when we are thinking this ‘close to the metal’? The building blocks of all our computers are logic gates, memory, and data storage. Using hands-on experimentation with simple electrical circuits (for gates and memory) and paper-tape (for data storage), the session will attempt to think through just how we might best help those who are entirely baffled by the digitization of our field to get a grounding in the use of machines for cultural work. The premise being explored here is that it is best not to start with programming environments like Scratch or Python, or high-level encoding standards like XML and TEI, that run within highly complex operating systems whose operations we fail to explain. Instead we should begin with noughts and ones and how current flowing in wires and marks in physical media can represent the two fundamental numbers, 0 and 1, that we use to encode language.
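As a bridge between the wires-and-paper-tape activities and the machines we actually use, the same encoding idea can be shown in a few lines of Python (purely for demonstration): under ASCII, each character of a word becomes eight noughts and ones.

```python
# Encode a word as the stream of noughts and ones a machine actually
# stores, using ASCII (one byte, i.e. eight bits, per character).
def to_bits(text):
    return " ".join(format(ord(ch), "08b") for ch in text)

def from_bits(bits):
    return "".join(chr(int(b, 2)) for b in bits.split())

encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Those sixteen digits are exactly what could be punched into paper tape or flashed across the room with lamps in the session's experiments.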
As well as hands-on experimentation with old computer equipment, the session will consider whether we can ‘perform’ various aspects of computer operations using actors. We will experiment with sending messages across the room in binary using physical props (flags, hats, lamps) and with the human body standing in for various parts of the computer as it performs its functions. This work is suitable for anyone willing to get up and follow simple instructions: no acting ability is required or even desirable.
Ubiqu+ity (vep.cs.wisc.edu/ubiq/), which generates statistics and identifies linguistic patterns and groups.
WordHoard (wordhoard.northwestern.edu/), an application for the close reading and scholarly analysis of texts, largely used on this project for determining the log-likelihoods of specific words and generating word clouds to display this information in a user-friendly manner.
In TextLab we use these programs to analyse the language of Shakespeare and to find patterns and discrepancies that would almost certainly be invisible to the naked eye.
But can we also use them to solve a murder?
To demonstrate the uses of these various tools, we have developed a murder-mystery type scenario in which Romeo (of Romeo and Juliet) has been found murdered while staying in a house with Hamlet, Brutus, and Lady Macbeth. A confession note was found by the body, signed by Brutus, but he claims he is innocent. We will demonstrate how some of these analytical tools could help us identify the killer, simply from the language used in the letter.
Researchers in every field are being made increasingly aware of the need for their research to have impact. However, researchers often don’t realize where beyond their own field of study their research might have impact. How does one go about finding the documents on the Internet which could be connected conceptually to another document, such as one’s own work or project proposal? Searching for key terms is a good start, but often the same kinds of documents come up at the top again and again, and it is difficult to sift through the results to find something different and relevant. Further, documents from other domains or sources (e.g., government policy documents) may use a different vocabulary, making them less likely to come up in keyword searches.
In the Text Analytics Group (TagLab) at Sussex, we are developing a system which does four things. First, it automatically identifies key words and phrases for a document or set of documents. Second, it searches the web using queries based on combinations of the key words/phrases and related words. Third, it allows the user to build custom classifiers (using active learning), e.g., for relevance. Finally, it clusters the results with a view to making it easier to identify documents or clusters of documents outside of existing known clusters. The purpose of this session is to teach delegates about the underlying technology and to give delegates the opportunity to play with and evaluate the prototype system, using their own work as input.
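TagLab's system itself is a research prototype, but the first of the four steps, identifying key words and phrases, can be illustrated with a toy TF-IDF ranking. Everything below (the documents, the stop-word list, the scoring) is a simplification invented for this sketch, not TagLab's actual method:

```python
import math
from collections import Counter

STOP = {"a", "and", "for", "of", "the"}  # tiny illustrative stop-word list

def tokenize(text):
    return [w.lower().strip(".,;:!?") for w in text.split()]

def top_keywords(docs, doc_index, k=3):
    """Rank words in docs[doc_index] by TF-IDF against the other documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()                      # document frequency of each word
    for toks in tokenized:
        df.update(set(toks))
    tf = Counter(tokenized[doc_index])  # term frequency in the target doc
    scores = {w: tf[w] * math.log(n / df[w]) for w in tf if w not in STOP}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

docs = [
    "funding policy for heritage archives and heritage digitisation",
    "machine learning for text classification",
    "archives of parliamentary policy debates",
]
print(top_keywords(docs, 0))  # ['heritage', 'funding', 'digitisation']
```

Words that are frequent in the target document but rare elsewhere score highest; the extracted terms can then seed the web queries and clustering that the later steps of the pipeline perform.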
Delegates will ideally have a laptop with Google Chrome installed to be able to access the software which is run as a web service. It would also be helpful if delegates brought with them a digital copy of some of their own work (e.g., an academic paper or grant proposal) in raw text format (i.e., ‘.txt’) which can be uploaded and processed by the system. However, we can help with both installation of software and file formatting as required.
[Image: hand-coloured stipple engraving by James Gillray, published by Hannah Humphrey, 28 July 1792]
The theme of this event is ‘scale’. Lately in digital humanities, we’ve tended to think about big scales – big data, longue durée. But what about the very small? In this session, I propose that we work together to ‘digitise’ a single day from the past, thinking about not only what that challenge means, but also what we can find out about the value of computers for understanding the small and the mundane. What does a digitised day look like? How much survives? Can we build a coherent picture? Picture of what?
I’d like to propose Friday 6 February 1789 as our case study. For most people then living, it was a very normal day. But for King George III, it was the first day his doctors allowed him to use his knife and fork, after an extended period of mental health problems. Thus, for George, it was an extraordinary day.
Drawing on our various experiences and disciplinary backgrounds, I hope you’ll help me explore the challenge of bringing together the various digital traces of a scant 24 hours from long ago. In the process I suspect we will be reorganizing the archive from one typically categorised by creator into one that emphasises a moment with innumerable perspectives.
Starting from 1837, the GRO Civil Registration index provides a nominally complete record of births, marriages, and deaths for England and Wales. The FreeBMD project has transcribed the vast majority of this material, up to 1983, when the GRO register went digital (ironically, the FreeBMD project is still negotiating to make this more recent data available). Free UK Genealogy, of which FreeBMD is one project, is committed to making all its data available under an open data licence and is working towards this goal, initially by getting all contributors to sign an agreement which allows this. This session is intended to explore what might be possible once these data sources are indeed open.
The high degree of completeness in this data makes it feasible to think in terms of a ‘closed set’ (as against the ‘open world’ assumption that cultural history usually has to adopt). In principle it should be possible to algorithmically match deaths to births that fall within this period, thereby providing an extra impetus to single-name studies.
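A toy sketch of that matching idea follows. All records below are invented, and real linkage would be far more involved (quarter dates, registration districts, and spelling variants, for a start); this only shows the shape of the 'closed set' reasoning:

```python
# Hypothetical index records, heavily simplified from what GRO entries hold.
births = [
    {"id": "b1", "name": "Ada Light", "year": 1851},
    {"id": "b2", "name": "Ada Light", "year": 1900},
    {"id": "b3", "name": "Tom Hale", "year": 1870},
]
deaths = [
    {"name": "Ada Light", "year": 1925, "age": 74},
]

def match(death, births, tolerance=1):
    """Return birth ids consistent with the death's name and stated age.

    The implied birth year is death year minus age at death; a tolerance
    of one year absorbs births falling either side of a birthday.
    """
    implied = death["year"] - death["age"]
    return [b["id"] for b in births
            if b["name"] == death["name"]
            and abs(b["year"] - implied) <= tolerance]

print(match(deaths[0], births))  # ['b1']
```

Because the index is nominally complete, every death that matches no birth (or more than one) is itself an interesting finding rather than an artefact of missing data.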
A companion project – FreeCEN – offers census data, which places individuals within households on a specific date. While the coverage of FreeCEN is less complete than that of FreeBMD, the data it does hold offers much richer information about relationships between individuals, placing them in a social/family context.
Richard Light has scraped all the FreeBMD and FreeCEN data relating to his own name and his mother’s maiden name. The data behind these experiments will be made available as open data. It currently lives in an XML database, and can be published using the Linked Data approach. The plan for this session is for Richard to explain what has been achieved so far with this data, and then for everyone to explore what other techniques might be applied to it.
SESSION REQUIREMENTS

If you intend to come to this session and want to do this work on your own laptop, please make sure you do the following before coming to Sussex (it may be possible to do during the session, but the files are a bit big!):
If you have any old floppies, flash drives, CDs, DVDs, or hard disks you want to try to capture as part of the session, bring them along! (And don’t worry, I’ll bring along some dummy media for us to play with.)
The paper archive has been replaced by digital data storage – a new format that requires historians, archivists, and humanists to think and act afresh. In just 35 years most people – in Britain and worldwide – have come to create text and data in a fundamentally new way. The first step towards working with these personal digital archives is to preserve them. You can’t just turn on an old computer and start browsing: the act of booting it up adds new data to the archive with fresh timestamps, thus compromising its authenticity. Thankfully, open source digital forensic tools aimed at archivists and scholars have made huge strides in recent years, thanks largely to the efforts of the BitCurator project led by the University of North Carolina at Chapel Hill.
In this session, we’ll work together to capture some dummy media (bring your own if you want to work with the real thing!) and explore that media using BitCurator: a suite of open source digital forensics and data analysis tools designed to help collecting institutions process born-digital materials.
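One low-level step in any such capture workflow is verifying that an acquired disk image has not changed since it was taken, usually by recording a checksum at acquisition and re-computing it later. The sketch below is a generic illustration of that idea, not BitCurator's own code, and the file name is invented:

```python
import hashlib

def image_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Stream a (potentially large) disk image in 1 MiB chunks and
    return its hex digest, without loading the whole file into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Create a tiny dummy 'image' so the sketch is self-contained.
with open("dummy.img", "wb") as f:
    f.write(b"hello")

print(image_checksum("dummy.img"))  # 5d41402abc4b2a76b9719d911017c592
```

If the digest recomputed months later matches the one recorded at capture, the image is bit-for-bit intact; any drift signals corruption or tampering.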