Best Practices for Spoken Corpora in Linguistic Research

A key concern of researchers involved in the creation and sharing of language resources is to attain maximum usability, reliability and longevity of these resources for present and future researchers in the language sciences. The view developed in this volume is that spoken corpora construction and sharing are major research endeavours that should also be laid open to academic debate in a manner that is more visible than is currently the case in corpus linguistics.

The present volume brings multiple research perspectives to bear on the question of what constitutes best practices for the construction of spoken corpora. The book brings into closer contact scholars whose specializations have often remained in relatively separate streams of scientific investigation: on the one hand, scholars whose work falls primarily in conversation analysis, pragmatics and discourse analysis, but who are involved in spoken corpus compilation; on the other hand, scholars who also specialize in linguistics but who have been intensively involved in developing various infrastructures for spoken corpora. This combination of scholars brings into better relief the concerns of data providers, data curators and data users in linguistic research.

This book is thus unique in that it highlights best practices both from the perspective of assembling, annotating and linguistically analysing spoken corpora and from the perspective of processing, archiving and disseminating spoken language. In doing so, the contributions emphasise not only the considerable promise offered by the rapid technological changes that society continues to experience in this area, but also possible dangers for the unwary.

Şükriye Ruhi retired from Middle East Technical University as Professor of Linguistics in 2012. She continues to conduct research in pragmatics and is project director of the Spoken Turkish Corpus.

Michael Haugh is an Associate Professor in Linguistics and International English at Griffith University, Brisbane, Australia. His main research interests are in pragmatics and conversation analysis. He has taken a leading role in the establishment of the Australian National Corpus (www.ausnc.org.au), as well as in creating the Griffith Corpus of Spoken Australian English.

Thomas Schmidt is Head of the Archive of Spoken German at the Institute for the German Language in Mannheim. His main research interests are in text and corpus technology. He is one of the developers of the EXMARaLDA system (www.exmaralda.org), and is currently in charge of building up the Research and Teaching Corpus of Spoken German (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).

Kai Wörner is currently coordinating all activities related to the curation and archiving of research data at the Faculty of Humanities of the Universität Hamburg. He is also a developer of the EXMARaLDA system and a member of the Hamburg Centre for Language Corpora (www.corpora.uni-hamburg.de).

"The book provides an overview of numerous projects of spoken corpora, and discusses the main issues related to the standardization, creation, annotation, copyright, and conservation of these data. It provides clear explanations for the non-specialist, and discusses key issues of interest for the specialist as well. [...] This book is an important contribution to the documentation of ongoing projects on corpora creation. The authors provide detailed descriptions that offer the reader enough information on current standards, project content, and the rationale behind the decision making in corpus linguistics. This book is particularly useful in the creation of large databases for a diverse body of languages."

Yolanda Rivera Castillo, University of Puerto Rico, LINGUIST List, 27.1.15
