Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus

Andrew Kehoe, Matt Gee

    Research output: Contribution to journal, Article, peer-review

    Abstract

    This paper presents work based on the new Birmingham Blog Corpus: a 600 million word collection of blog posts and reader comments, available through the WebCorp Linguist's Search Engine interface. We begin by describing the steps involved in building the corpus, including a discussion of the sources chosen for blog data, the 'seeding' techniques used, and the design decisions taken. We then go on to focus on textual 'aboutness' (Phillips 1985). Whereas in previous work we examined social tagging sites as an aboutness indicator (Kehoe & Gee 2011), in this paper we analyse the reader comments found at the bottom of posts in our blog corpus. Our aim is to determine whether free-text comments offer different insights into the reader perspective on aboutness than those offered by social tags, and whether comments present further linguistic challenges. Online comments are often associated with blogs but are increasingly found on web documents of all kinds, and we also examine the growing importance of reader comments on online news articles.
    Original language: English
    Journal: Studies in Variation, Contacts and Change in English: Aspects of Corpus Linguistics: Compilation, Annotation, Analysis
    Volume: 12
    Publication status: Published (VoR), 2012
