Name

aaltoasr-recognize.py — transcribe speech

Synopsis

Mylly: Speech → Aalto ASR Recognize
Input file
    an audio file containing speech
Parameters
    Character encoding (default UTF-8, TODO)
    RawTranscript – produce transcript in raw format
    SegMorph – also produce morph level segmentation
    SegPhone – also produce phone level segmentation
Output files
    script.txt (plain text file with requested content)
    script.textgrid (Praat TextGrid transcript)
    error.log

Description

Aalto ASR Recognize applies automatic speech recognition (ASR) to
transcribe a spoken audio file.

This is a heavy computation that can take many minutes even for
a short input file. It is recommended to experiment with just a
few words at first.
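
One way to follow this recommendation is to cut a short excerpt from a longer recording before submitting it. The sketch below uses Python's standard wave module and assumes the input is an uncompressed PCM WAV file. The function name trim_wav and the five-second default are illustrative only and are not part of Aalto ASR or Mylly.

```python
import wave


def trim_wav(src_path, dst_path, seconds=5.0):
    """Copy at most the first `seconds` of a PCM WAV file to dst_path.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never ask for more frames than the file actually contains.
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and sample rate of the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames
```

For example, trim_wav("speech.wav", "speech-short.wav", seconds=3) would write the first three seconds of a hypothetical speech.wav to speech-short.wav, which can then be used as the input file for a first trial run.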

Input

Input consists of one audio file and parameters that
indicate optional forms of output.

Input file
    some recognized audio format, e.g. WAV
SegPhone
    Produce a phone level section (letter by letter) in the
    plain text transcript file.
Character encoding
    UTF-8 (not sure yet what this does in this tool)

Output

Output consists of a plain text transcript file together with a
Praat TextGrid version of the same result. There may also be an
error log that contains diagnostic output (even on a successful
execution).

script.txt
script.textgrid
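
The TextGrid output can be opened in Praat, but it can also be read programmatically. The following is a minimal sketch, assuming that script.textgrid is written in Praat's long text format and has already been decoded to a string (for example as UTF-8). The helper name read_intervals is hypothetical, and the tier layout produced by the recognizer is not documented here, so the function simply collects the labelled intervals from all tiers in file order.

```python
import re

# One interval entry in Praat's long text format looks like:
#     intervals [2]:
#         xmin = 0.4
#         xmax = 0.9
#         text = "hyvää"
# A double quote inside a label is written as two double quotes.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (\S+)\s*'
    r'xmax = (\S+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (start, end, label) tuples for every non-empty interval."""
    result = []
    for xmin, xmax, text in INTERVAL_RE.findall(textgrid_text):
        label = text.replace('""', '"')  # undo Praat's quote doubling
        if label.strip():                # skip silent or unlabelled intervals
            result.append((float(xmin), float(xmax), label))
    return result
```

A caller would read the file contents first, for instance with open("script.textgrid", encoding="utf-8"), and pass the resulting string to read_intervals. If the file turns out to use a different encoding or Praat's short text format, this sketch would need to be adapted.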

Examples

See also

This recognizer is available in Taito as aaltoasr-rec, in
module aaltoas.

An aligner is also available.

The Aalto ASR command line tools have a --help option and
a user guide.

Bugs


Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4140599 / +358 29 4129317