The Hansard website receives more than 7 million hits a year, but we rarely get the chance to meet the people who use our content. So we had a bright idea. We could track down some of them, ask them how they use Hansard, and then blog about them. I say “track down” because I like to think I’d make a pretty mean Detective Columbo. But the truth is it's not always that hard. Sometimes opportunities just fall into your lap.
Hansard at Huddersfield
At the start of July, Hansard held a launch event for our exciting new website. That's right - it now comprises over 200 years of debates in one location! It was at that event, over a coffee and a sugary pastry, that I met Fransina De Jager, the administrator of a project called Hansard at Huddersfield, which is currently funded by the Arts and Humanities Research Council.
Fransina and I geeked out for a while at the Hansard launch event. I graduated in linguistics, and we talked about corpus linguistics, which is the branch of the subject that informs the Hansard at Huddersfield project. Corpus linguistics is the study of language in collections of text. A corpus is a body of text - usually stored as an electronic database - that is representative of a language or an aspect of a language. For example, you could have a corpus of Jane Austen novels. Or, for that matter, Columbo scripts.
I remember only just scraping through my corpus linguistics module at university. But I can't have embarrassed myself too much because Fransina invited me to the project's second end-user meeting. It was at this event, a couple of weeks later, that I met the rest of the team - over more coffee and sugary pastries.
The project is led by Professor Lesley Jeffries from the University of Huddersfield, with Dr Alexander von Lünen, Professor Marc Alexander, Dr Hugo Sanjurjo-González and Fransina. At the meeting, the team introduced the project to possible end users. They described the linguistic methods involved, and listened to user feedback and ideas.
Hansard at Huddersfield is a project that aims to use linguistic methods to make Hansard searchable in a variety of ways. "Isn't it already searchable?", you ask. Of course it is. You can search Hansard by date, by Member of Parliament, using a keyword or a string of words, or even by volume and column number. And this is very useful for people who are interested in the political narrative.
The Hansard at Huddersfield team hope to provide a tool that makes Hansard easy to search in a different way. This could be helpful for people who are more interested in monitoring patterns in data. Linguists and historians might want to study in detail changes in language use over time. Others might want to find out how subjects that affect them are debated by MPs, by looking at the wider topic, rather than searching for debates using keywords.
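For the curious, here is a rough idea of what "monitoring patterns in data" can look like in practice. This is only a minimal sketch and not the project's actual method: the tiny `SPEECHES` list, the `tokenize` helper and the `frequency_by_decade` function are all made up for illustration. It counts how often a word turns up per 1,000 words in each decade.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs standing in for debate contributions.
# These sentences are invented for illustration; they are not real Hansard text.
SPEECHES = [
    (1805, "The navy must defend the trade of the nation."),
    (1809, "Trade with the colonies depends on the navy."),
    (1995, "The internet will change trade and the economy."),
    (2003, "The economy needs investment in the internet economy."),
]


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_decade(speeches, word):
    """Return the word's frequency per 1,000 tokens, keyed by decade."""
    totals = Counter()  # all tokens seen in each decade
    hits = Counter()    # occurrences of the target word in each decade
    target = word.lower()
    for year, text in speeches:
        decade = year // 10 * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(target)
    return {decade: hits[decade] * 1000 / totals[decade] for decade in sorted(totals)}


if __name__ == "__main__":
    for decade, rate in frequency_by_decade(SPEECHES, "economy").items():
        print(f"{decade}s: {rate:.1f} per 1,000 words")
```

Using a rate per 1,000 words rather than a raw count matters because some decades contain far more text than others, so raw counts would mostly reflect how much was said rather than how it was said.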
Hansard at Huddersfield builds on work done in the SAMUELS (Semantic Annotation and Mark-Up for Enhancing Lexical Searches) project, which was led by Professor Marc Alexander at the University of Glasgow in 2014-15.
The SAMUELS project developed a system for "automatically annotating words in texts with their precise meanings, disambiguating between possible meanings of the same word, with the aim of ultimately enabling a step-change in the way we deal with large textual data."
And what larger textual dataset can there be than 200 years of Hansard? The SAMUELS project semantically tagged Hansard debates from 1803 to 2005 using the Historical Thesaurus of English. This tagged version of Hansard is called the Hansard corpus.
The problem with the Hansard corpus is that it’s not made for the interested general user. It's not really even made for researchers in any field other than corpus linguistics. I definitely don’t find it intuitive, although that may be because I didn’t do so well in that corpus linguistics module seven years ago…
Lesley and her team in Huddersfield aim to change that by making this potentially powerful search tool more user-friendly, for example by providing insightful data-driven visuals and perhaps by updating the corpus to include content from the past decade.
Just one more thing
Not particularly original, but that subheading is the only Columbo reference I could think of to round off this post. This is the first in our “meet the users” series, and we’re on the lookout. We love hearing about the ways that people use Hansard. If you use or have used Hansard - either on a regular basis or for a one-off project - and would like to write a blog post or feature in one, please email email@example.com.
If you don’t have time to write a blog post, please let us know your thoughts on Hansard in our reader survey. It only takes two minutes, and we’d really value your contribution.