aaltoasr-recognize.py — transcribe speech
- Mylly: Speech → Aalto ASR Recognize
- Input file
  - an audio file containing speech
- Character encoding (default UTF-8, TODO)
- RawTranscript – produce transcript in raw format
- SegMorph – also produce morph level segmentation
- SegPhone – also produce phone level segmentation
- Output files
  - script.txt (plain text file with requested content)
  - script.textgrid (Praat TextGrid transcript)
Aalto ASR Recognize applies automatic methods to
transcribe a spoken audio file.
Recognition is computationally heavy and can take many minutes even
for a short input file, so it is best to start by experimenting with
a recording of just a few words.
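One way to follow that advice is to cut a short test clip from a longer recording before submitting it. The sketch below assumes the input is an uncompressed (PCM) WAV file, the example format named under the input parameters, and uses only Python's standard `wave` module. The function name `trim_wav` and the file names are hypothetical; they are not part of the tool.

```python
import wave


def trim_wav(src_path: str, dst_path: str, seconds: float) -> int:
    """Copy the first `seconds` of a PCM WAV file to `dst_path`.

    Hypothetical helper, not part of Aalto ASR. Returns the number of
    frames written (fewer than requested if the input is shorter).
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()  # channels, sample width, frame rate, ...
        wanted = int(seconds * src.getframerate())
        n_frames = min(src.getnframes(), wanted)
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # Keep the input's audio format; the wave module corrects the
        # frame count stored in the header after the frames are written.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames
```

For example, `trim_wav("speech.wav", "speech_5s.wav", 5.0)` writes the first five seconds of `speech.wav`, or the whole recording if it is shorter. Cutting by time is crude and may split a word; listening to the clip before submitting it is still worthwhile.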
Input consists of one audio file and parameters that
indicate optional forms of output.
- Input file
  - an audio file in a recognized format, e.g. WAV
- SegPhone
  - Produce a phone level section (letter by letter) in the
    plain text transcript file.
- Character encoding
  - UTF-8 (not sure yet what this does in this tool)
Output consists of a plain text transcript file together with a
Praat TextGrid version of the same result. There may also be an
error log that contains diagnostic output (even on a successful run).
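The TextGrid output can be read back for further processing. The following is a minimal sketch that assumes the file is in Praat's long text TextGrid format and that each interval's label fits on one line; it does not handle the short format or point tiers. It does not assume any particular tier names, since those produced by this tool are not documented here. `read_interval_tiers` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional, Tuple

# (start time in seconds, end time in seconds, label text)
Interval = Tuple[float, float, str]

_NAME = re.compile(r'^name = "(.*)"$')
_INTERVAL = re.compile(r'^intervals \[\d+\]:$')
_BOUND = re.compile(r'^(xmin|xmax) = (\S+)$')
_TEXT = re.compile(r'^text = "(.*)"$')


def read_interval_tiers(textgrid: str) -> Dict[str, List[Interval]]:
    """Parse interval tiers from a long-format Praat TextGrid string.

    Hypothetical helper: assumes one-line labels and ignores point tiers.
    """
    tiers: Dict[str, List[Interval]] = {}
    tier: Optional[str] = None
    in_interval = False
    xmin = xmax = 0.0
    for raw in textgrid.splitlines():
        line = raw.strip()  # Praat indents lines and may add trailing spaces
        m = _NAME.match(line)
        if m:
            # A new tier starts; its own xmin/xmax header lines are skipped
            # until the first "intervals [n]:" line.
            tier = m.group(1)
            tiers[tier] = []
            in_interval = False
            continue
        if _INTERVAL.match(line):
            in_interval = True
            continue
        if not in_interval or tier is None:
            continue  # file-level or tier-level header lines
        m = _BOUND.match(line)
        if m:
            value = float(m.group(2))
            if m.group(1) == "xmin":
                xmin = value
            else:
                xmax = value
            continue
        m = _TEXT.match(line)
        if m:
            # Praat escapes a literal quote inside a label by doubling it.
            tiers[tier].append((xmin, xmax, m.group(1).replace('""', '"')))
            in_interval = False
    return tiers
```

For example, `read_interval_tiers(open("script.textgrid", encoding="utf-8").read())` returns a dictionary mapping each tier name to its list of `(xmin, xmax, text)` tuples. The UTF-8 encoding in that call is an assumption; TextGrid files saved from Praat itself may be UTF-16 instead.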
This recognizer is available in Taito as aaltoasr-rec, in
An aligner is also available.
The Aalto ASR command-line tools have a --help option and