ELFA – English as a Lingua Franca in Academic Settings (elfa)

Currently available versions of this resource

Shortname	Name and metadata	License	Location	Cite	Resource group and help	Apply	Publication year	Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname	Name and metadata	License	Formats	Support level	Contact Person	Resource group and help	Location	Other information

Resource information

The ELFA corpus (English as a Lingua Franca in Academic Settings) contains approximately 1 million words of transcribed spoken academic English as a Lingua Franca (approximately 131 hours of recorded speech).

The data consists of both recordings and their transcripts, which are available in several versions:

The transcripts can be queried via the Korp interface (The Helsinki Korp Version of the ELFA Corpus). The Korp version of the corpus is publicly available.
The transcripts can be downloaded in plain text and XML format (The Transcriptions of the ELFA Corpus, Downloadable Version). This version is also publicly available.
The audio files corresponding to the transcript files can be downloaded for research use (The Audio Files of the ELFA Corpus, Downloadable Version). Because this part of the ELFA material contains personal data, it requires individual access permissions, which you can apply for in the Language Bank Rights system (see instructions).

The recordings were made at the University of Tampere, the University of Helsinki, Tampere University of Technology, and Helsinki University of Technology.

The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic/polylogic events, such as seminars, thesis defences, and conference discussions (67%), which have been given emphasis in the data.

As for the disciplinary domains, the ELFA corpus is composed of social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%).
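As a rough illustration of what these shares mean in practice, the sketch below converts the stated percentages into approximate hours of speech per domain. It assumes that the percentages apply proportionally to the roughly 131 hours of recordings, which the description does not state explicitly; the names DOMAIN_SHARES, TOTAL_HOURS, and estimated_hours are illustrative only.

```python
# Domain shares (percent of the recorded data), as listed in the description.
DOMAIN_SHARES = {
    "social sciences": 29,
    "technology": 19,
    "humanities": 17,
    "natural sciences": 13,
    "medicine": 10,
    "behavioural sciences": 7,
    "economics and administration": 5,
}

# Approximate total duration of the recordings, in hours.
TOTAL_HOURS = 131


def estimated_hours(shares, total_hours):
    """Return approximate hours per domain.

    Assumption: each percentage share applies proportionally to
    recording time. The figures are estimates, not corpus metadata.
    """
    return {domain: total_hours * pct / 100 for domain, pct in shares.items()}


if __name__ == "__main__":
    # Sanity check: the listed shares should cover the whole corpus.
    assert sum(DOMAIN_SHARES.values()) == 100
    for domain, hours in estimated_hours(DOMAIN_SHARES, TOTAL_HOURS).items():
        print(f"{domain}: ~{hours:.1f} h")
```

The results are estimates for orientation only; actual per-domain durations should be taken from the corpus metadata.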

The speakers in ELFA also represent a wide range of first-language backgrounds: the data comprises approximately 650 speakers with 51 different first languages, ranging from African languages (e.g. Akan, Dagbani, Igbo, Kikuyu, Somali, Swahili) to Asian languages (e.g. Arabic, Bengali, Chinese, Hindi, Japanese, Persian, Turkish, Uzbek) and European languages (e.g. Czech, Danish, Dutch, French, German, Italian, Lithuanian, Polish, Portuguese, Russian, Romanian, Swedish). The percentage of speech by native English speakers is 5%. Considering that the recordings were made in Finnish-speaking universities, the percentage of speech by speakers whose mother tongue is Finnish is also relatively low, at 28.5%.

Please note that this corpus contains personal data. By using the material, you agree to follow the personal data guidelines given by the Language Bank of Finland.

Content corresponding to the previous LAT version of the material is now available in Korp and the Language Bank download service

The experimental corpus version, The Helsinki LAT Version of the ELFA Corpus, is no longer available because the LAT service (lat.csc.fi) was discontinued in the Language Bank of Finland in December 2020. However, more accessible versions of the same content are maintained in Korp and in the download service.

License and access

Some versions of this resource are available publicly (PUB), whereas others require you to log in as an academic user (ACA) or to apply for individual access rights (RES).
Click on the license image to see the resource-specific license text.
Some/all versions of this resource may contain personal data (license condition +PRIV). The license may then include additional data protection terms and conditions that you must follow. If processing personal data, maintain a public Privacy Notice regarding your project and provide the link to the Language Bank of Finland, see instructions.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-201403262

Last modified on 2025-09-26