ELFA – English as a Lingua Franca in Academic Settings

Current versions of this resource:
The Helsinki Korp Version of the ELFA Corpus
Metadata and license
Attribution instructions
Select the corpus in Korp
The Transcriptions of the ELFA Corpus, Downloadable Version
Metadata and license
Attribution instructions
Download the resource
The Audio Files of the ELFA Corpus, Downloadable Version
Metadata and license
Attribution instructions
Apply for rights to access the resource
Download the resource
Search for other versions of this resource

The ELFA corpus (English as a Lingua Franca in Academic Settings) contains approximately 1 million words of transcribed spoken academic English as a Lingua Franca (approximately 131 hours of recorded speech).

The data consists of both recordings and their transcripts, which are available in the several versions listed above.

The recordings were made at the University of Tampere, the University of Helsinki, Tampere University of Technology, and Helsinki University of Technology.

The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic/polylogic events, such as seminars, thesis defences, and conference discussions, which are emphasised in the data (67%).

As for the disciplinary domains, the ELFA corpus is composed of social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%).

The speakers in ELFA also represent a wide range of first-language backgrounds: the data comprises approximately 650 speakers with 51 different first languages, ranging from African languages (e.g. Akan, Dagbani, Igbo, Kikuyu, Somali, Swahili) to Asian languages (e.g. Arabic, Bengali, Chinese, Hindi, Japanese, Persian, Turkish, Uzbek) and European languages (e.g. Czech, Danish, Dutch, French, German, Italian, Lithuanian, Polish, Portuguese, Russian, Romanian, Swedish). Native English speakers account for 5% of the speech. Although the recordings were made at Finnish-speaking universities, speakers with Finnish as their mother tongue account for a relatively low 28.5% of the speech.

Please note that this corpus contains personal data. By using the material, you agree to follow the personal data guidelines given by the Language Bank of Finland.

Further details on the terms and conditions regarding the different corpus versions are available in the corresponding metadata records.

The old LAT version of this corpus was removed in 2020

The experimental corpus version, The Helsinki LAT Version of the ELFA Corpus, is no longer available because the LAT service (lat.csc.fi) of the Language Bank of Finland was discontinued in December 2020. However, more accessible versions of the same content are maintained in Korp and in the download service.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201403262
