This resource collection contains word embeddings trained with word2vec from various corpora.
The embedding file is in a simple, easily parsed text format produced by word2vec. The first line of the file gives the vocabulary size and the vector dimension. Each subsequent line begins with a vocabulary item, followed by a space, followed by 128 floating-point numbers (written as text), each followed by a space.
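The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `read_word2vec_text` is hypothetical, the sample data is made up for illustration, and the code assumes that vocabulary items contain no spaces. It takes the dimension from the header line rather than hard-coding 128.

```python
import io


def read_word2vec_text(stream):
    """Parse word2vec text-format embeddings from an open text stream.

    Returns (vocab_size, dim, vectors), where vectors maps each
    vocabulary item to a list of floats.
    """
    # Header line: "<vocabulary size> <dimension>"
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])

    vectors = {}
    for line in stream:
        # Each number is followed by a space, so strip the trailing
        # space and newline before splitting.
        parts = line.rstrip(" \r\n").split(" ")
        if parts == [""]:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        # Assumption: the vocabulary item itself contains no spaces.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dim, vectors


# Tiny made-up sample in the same layout (3 dimensions instead of 128).
sample = "2 3\nkissa 0.1 0.2 0.3 \nkoira -1.0 0.5 2.0 \n"
size, dim, vecs = read_word2vec_text(io.StringIO(sample))
print(size, dim, vecs["koira"])
```

For a real file, the same function could be called on a file opened with `open(path, encoding="utf-8")`; the encoding is an assumption and should be checked against the metadata of the downloaded version.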
Latest versions/subcorpora:
Word embeddings trained with word2vec from the Finnish Text Collection | Metadata and license | Attribution instructions | Download the resource
Word embeddings trained with word2vec from the Suomi24 corpus | Metadata and license | Attribution instructions | Download the resource
Search for all versions of this resource in META-SHARE
Several versions of this language resource are (or will be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.
Detailed information on the content, user rights and license of each version can be found in its specific metadata record in META-SHARE.
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022041401