TEACH/PLAY – Treating BMD data as a ‘(semi-)closed set’

Since 1837, the GRO Civil Registration index has provided a nominally complete record of births, marriages and deaths for England and Wales.  The FreeBMD project has transcribed the vast majority of this material up to 1983, when the GRO register went digital; ironically, FreeBMD is still negotiating to make the more recent data available.  Free UK Genealogy, of which FreeBMD is one project, is committed to making all its data available under an open data licence.  As a first step towards this goal, it is asking all contributors to sign an agreement that allows this.  This session is intended to explore what might be possible once these data sources are indeed open.

The high degree of completeness in this data makes it feasible to think in terms of a ‘closed set’ (as against the ‘open world’ assumption that cultural history usually has to adopt).  In principle it should be possible to algorithmically match deaths to births that fall within this period, thereby providing an extra impetus to single-name studies.

A companion project – FreeCEN – offers census data, which places individuals within households on a specific date.  While the coverage of FreeCEN is less complete than that of FreeBMD, the data it does hold offers much richer information about relationships between individuals, placing them in a social/family context.
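To make the household idea concrete, the following Python sketch groups census rows into households and reads parent-child links off a 'relation to head' value.  The `CensusRow` fields, including the `household_id`, are assumptions made for this illustration and are not FreeCEN's actual schema; the rows themselves are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class CensusRow(NamedTuple):
    # Hypothetical fields; a real transcription will differ.
    census_year: int
    household_id: str
    name: str
    relation_to_head: str
    age: int


def households(rows):
    """Group rows by (census year, household id)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.census_year, row.household_id)].append(row)
    return dict(grouped)


def children_of_head(members):
    """Return (head name, child name) pairs for one household.

    Only households with exactly one recorded head are used, and only
    members explicitly described as son or daughter are linked.
    """
    heads = [m for m in members if m.relation_to_head.lower() == "head"]
    if len(heads) != 1:
        return []
    head = heads[0]
    return [
        (head.name, m.name)
        for m in members
        if m.relation_to_head.lower() in {"son", "daughter"}
    ]


# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    CensusRow(1881, "h1", "John Sample", "Head", 40),
    CensusRow(1881, "h1", "Mary Sample", "Wife", 38),
    CensusRow(1881, "h1", "Ann Sample", "Daughter", 12),
    CensusRow(1881, "h2", "Tom Other", "Head", 60),
]
```

Links of this kind could then be used to narrow down ambiguous birth and death matches, since a child's age and parents' names constrain which index entries can refer to the same person.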

Richard Light has scraped all the FreeBMD and FreeCEN data relating to his own name and his mother’s maiden name.  The data behind these experiments will be made available as open data.  It currently lives in an XML database, and can be published using the Linked Data approach.  The plan for this session is for Richard to explain what has been achieved so far with this data, and then for everyone to explore what other techniques might be applied to it.
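As a sketch of the step from XML to Linked Data, the Python below turns a small XML fragment into N-Triples statements using only the standard library.  The element names, the `type` and `id` attributes, and the `http://example.org/` URIs are all placeholders invented for this example; they do not describe the actual database or any published vocabulary.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces; a real publication would use agreed URIs.
BASE = "http://example.org/bmd/"
VOCAB = "http://example.org/vocab#"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslash and quote."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def entries_to_ntriples(xml_text):
    """Emit one triple per non-empty child element of each <entry>."""
    root = ET.fromstring(xml_text)
    lines = []
    for entry in root.iter("entry"):
        # Subject URI built from the (hypothetical) type and id attributes.
        subject = f"<{BASE}{entry.get('type')}/{entry.get('id')}>"
        for child in entry:
            if child.text and child.text.strip():
                predicate = f"<{VOCAB}{child.tag}>"
                lines.append(f"{subject} {predicate} {literal(child.text.strip())} .")
    return lines


# Invented sample document, for illustration only.
SAMPLE_XML = (
    "<entries>"
    '<entry type="birth" id="b1">'
    "<surname>Sample</surname><year>1860</year>"
    "</entry>"
    "</entries>"
)
```

Each entry gets its own URI, so that a later matching step could assert a link between, say, a birth entry and a death entry simply by adding one more triple.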

About Richard Light

Worked for the Museum Documentation Association until 1991, developing museum data standards and systems. Freelance since then. Particular interest in markup languages (XML and friends) and Linked Data. Helped develop the Modes museum cataloguing software.