There are three ways to provide input: enter text in the text box, choose a demo text, or upload a file. Supported file formats are plain UTF-8 text (.txt) and, unless the formatting is especially convoluted, .pdf, .doc, .docx, .csv, .epub, .html, .odt, .rtf and .xls files.
The output is presented as a table, which is also available for download as a spreadsheet or as a TSV (tab-separated values) file.
The table has five columns. The first shows the token (a word, punctuation mark, URL, or whatever else the tokenizer considers to be one token) as it appeared in the original text. The second column shows the lemma, or base form, of the token. The third column shows the most likely morphological tags for the token. The final two columns show named entities in BIO notation (B marks the beginning of an entity, I its continuation, and O a token outside any entity). The fourth column comes from FiNER, a rule-based tagger for contemporary Finnish text (rules written by Pekka Kauppinen, see Ruokolainen et al.), and the fifth from a Stanford NER tagger trained on historical (19th-century) Finnish texts (see Kettunen and Löfberg).
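As a rough illustration of how the downloaded TSV file could be processed, the Python sketch below reads the five columns and groups the BIO tags into entity spans. Several details are assumptions rather than documented behaviour: that the file has no header row, that the columns appear in the order described above, and that the tags look like B-PER, I-PER and O. The sample rows, including the morphological tags and entity labels, are invented for the example, and the helper name bio_spans is hypothetical.

```python
import csv
import io

# Hypothetical sample of the TSV download. The real morphological tags and
# entity labels may look different; these rows are invented for illustration.
SAMPLE_TSV = (
    "Pekka\tPekka\tN Prop Sg Nom\tB-PER\tB-PER\n"
    "Kauppinen\tKauppinen\tN Prop Sg Nom\tI-PER\tI-PER\n"
    "asuu\tasua\tV Prs Act Sg3\tO\tO\n"
    "Helsingissä\tHelsinki\tN Prop Sg Ine\tB-LOC\tB-LOC\n"
    ".\t.\tPunct\tO\tO\n"
)


def bio_spans(tokens, tags):
    """Group tokens into (label, text) entity spans based on BIO tags."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label continues the open entity.
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
        # A B- tag (or a stray I- tag) opens a new entity; O opens nothing.
        if prefix in ("B", "I") and label:
            current_label, current_tokens = label, [token]
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# QUOTE_NONE keeps quotation-mark tokens from being treated as CSV quoting.
reader = csv.reader(io.StringIO(SAMPLE_TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = [row for row in reader if len(row) == 5]

tokens = [row[0] for row in rows]
finer_entities = bio_spans(tokens, [row[3] for row in rows])       # fourth column
historical_entities = bio_spans(tokens, [row[4] for row in rows])  # fifth column

print(finer_entities)
print(historical_entities)
```

To use this with a real download, replace io.StringIO(SAMPLE_TSV) with a file opened using encoding="utf-8" and newline="". If the file turns out to have a header row, skip the first row before extracting the columns.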