Name
aaltoasr-recognize.py — transcribe speech
Synopsis
- Mylly: Speech → Aalto ASR Recognize
- Input file
- an audio file containing speech
- Parameters
- Character encoding (default UTF-8, TODO)
- RawTranscript – produce transcript in raw format
- SegMorph – also produce morph level segmentation
- SegPhone – also produce phone level segmentation
- Output files
- script.txt (plain text file with requested content)
- script.textgrid (Praat TextGrid transcript)
- error.log
Description
Aalto ASR Recognize applies automatic methods to
transcribe a spoken audio file.
Recognition is a heavy computation that can take many minutes
even for a short input file. It is recommended to experiment
first with a recording of just a few words.
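One way to follow this recommendation is to cut a short excerpt
from the start of a recording before submitting it. The sketch
below uses Python's standard wave module and assumes that the
input is an uncompressed PCM WAV file. The function name
trim_wav, the file names and the 5-second default are
illustrative and are not part of Aalto ASR.

```python
import wave


def trim_wav(src_path, dst_path, seconds=5.0):
    """Copy the first `seconds` of a PCM WAV file to dst_path.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never ask for more frames than the file contains.
        n_frames = min(src.getnframes(),
                       int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate from the source;
        # the frame count in the header is corrected on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames
```

For example, trim_wav("speech.wav", "speech-short.wav", 3.0)
would write the first three seconds of a hypothetical speech.wav
to a new file that can be used as a quick test input.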
Input
Input consists of one audio file and parameters that
indicate optional forms of output.
- Input file
- some recognized audio format, e.g. WAV
- SegPhone
- Produce a phone level section (letter by letter) in the
plain text transcript file.
- Character encoding
- UTF-8 (the effect of this setting in this tool is not yet
documented)
Output
Output consists of a plain text transcript file together with a
Praat TextGrid version of the same result. There may also be an
error log that contains diagnostic output (even on a successful
execution).
- script.txt
- script.textgrid
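The TextGrid output can also be read by a script instead of being
opened in Praat. The sketch below assumes that script.textgrid is
written in Praat's long text format, which is not confirmed here;
a file in the short format would need a different parser. The
function name parse_intervals is illustrative.

```python
import re

# One interval entry in Praat's long text format: an index line
# followed by xmin, xmax and a double-quoted text value.
_INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (\S+)\s*'
    r'xmax = (\S+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def parse_intervals(textgrid_text):
    """Return (start, end, label) tuples for every interval.

    Intervals from all tiers are returned in file order, without
    tier names. Praat writes a literal quote as two quotes, so
    those are turned back into one.
    """
    return [
        (float(xmin), float(xmax), label.replace('""', '"'))
        for xmin, xmax, label in _INTERVAL.findall(textgrid_text)
    ]
```

The function takes the file contents as a string, so the caller
chooses the encoding when reading the file, for example
open("script.textgrid", encoding="utf-8").read().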
Examples
See also
This recognizer is available in Taito as aaltoasr-rec, in
module aaltoas.
An aligner is also available.
The Aalto ASR command line tools have a --help option and a
user guide.
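The help text can also be fetched from a script. The sketch below
assumes that the aaltoasr-rec command mentioned above is on the
PATH, which on Taito presumably requires loading the module
first. The function name tool_help is illustrative.

```python
import shutil
import subprocess


def tool_help(command="aaltoasr-rec"):
    """Return the --help text of `command`.

    Returns None when the command is not found on the PATH.
    """
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--help"], capture_output=True, text=True
    )
    # Some tools print help to stderr; fall back to it when
    # stdout is empty.
    return result.stdout or result.stderr
```

A None result means that the command was not found, so the
caller can tell a missing installation apart from empty help
output.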