wordvec – Word embeddings trained with word2vec

word2vec is a tool, originally developed at Google, for learning word embeddings, which can be used to analyze the semantic similarity of words.

This resource collection contains word embeddings trained with word2vec on various corpora. Each embedding file is in the simple, easily parsed textual format produced by word2vec. The first line of the file gives the vocabulary size and the vector dimension. Each subsequent line begins with a vocabulary item, followed by a space, followed by 128 floating-point numbers (represented textually), each followed by a space.
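The file format described above can be read with a few lines of code. The following is a minimal sketch in Python, not an official loader: the function names `read_word2vec_text` and `cosine_similarity` are hypothetical, the toy sample uses 3-dimensional vectors and invented values instead of the real 128-dimensional data, and cosine similarity is shown only as one common way to compare word vectors (this page does not prescribe a particular measure).

```python
import io
import math
from typing import Dict, List, TextIO, Tuple


def read_word2vec_text(stream: TextIO) -> Tuple[int, int, Dict[str, List[float]]]:
    """Parse word2vec textual embeddings from an open text stream.

    Returns the vocabulary size and dimension declared on the header line,
    plus a mapping from each vocabulary item to its vector.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # Each data line is "<word> <f1> <f2> ... <fN> " with a trailing space.
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dim, vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy data in the same layout (assumed values, 3 dimensions for brevity).
sample = "2 3\nkissa 1.0 0.0 0.0 \nkoira 0.6 0.8 0.0 \n"
vocab_size, dim, vectors = read_word2vec_text(io.StringIO(sample))
print(vocab_size, dim, sorted(vectors))
print(cosine_similarity(vectors["kissa"], vectors["koira"]))
```

For a downloaded file, the same function could be called on a stream opened with `open(path, encoding="utf-8")`; the UTF-8 encoding is an assumption and should be checked against the metadata of the specific resource.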

Latest versions/subcorpora:  
Word embeddings trained with word2vec from the Finnish Text Collection
Metadata and license
Attribution instructions
Download the resource
Word embeddings trained with word2vec from the Suomi24 corpus
Metadata and license
Attribution instructions
Download the resource
Search for all versions of this resource in META-SHARE  

Different versions or subcorpora of this resource are (or will be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool. Links to the different versions can be found in the list above.

Detailed information on the content, user rights and licenses of each version can be found in its metadata record in META-SHARE.

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2022041401
