Parallel Corpus of Finnish and Easy-to-read Finnish from the Yle News Archive 2019-2020, source
Suomi-selkosuomi-rinnakkaiskorpus Ylen suomenkielisestä uutisarkistosta 2019-2020, lähdeaineisto

short name: ylenews-fi-2019-2020-selko-par-src
metadata: http://urn.fi/urn:nbn:fi:lb-2022111625

license: CLARIN ACA - NC
The complete license is available at http://urn.fi/urn:nbn:fi:lb-2022050901
A copy of the license is included in LICENSE.txt.
The license details may be subject to change, so before downloading the resource, please refer to the latest version of the license at the link above.

This is a parallel corpus created from Yle News articles published in 2019-2020 by aligning the standard Finnish versions with the Easy Finnish (selkosuomi) versions. The corpus was created by Anna Dmitrieva and Aleksandra Konovalova and is distributed in CSV format.

Description of the columns in the dataset:

- index_in_selko: This index consists of two parts separated by an underscore. The first (longer) part is the identifier of the entire Easy Finnish article, taken from the original dataset. The second (shorter) part is the paragraph number. Yle Selkosuomi articles usually consist of multiple paragraphs, each describing a separate piece of news, so we represent each paragraph as a separate short article in our dataset. Paragraph numbering starts from 0.
- index_in_regular: The identifier of the regular Finnish article, taken from the original dataset.
- selko_text: A piece of news in Easy Finnish.
- regular_text: The corresponding piece of news in regular Finnish.
- cos_sim: The cosine similarity between the two articles in the pair, computed from their first 15 sentences: each sentence was vectorized with a SentenceTransformer model, the sentence vectors of each article were averaged into one vector, and the two average vectors were compared.
- status: A score given to the pair by the human assessor.
  Positive status means that the articles definitely talk about the same phenomenon. Negative status means the opposite: the articles definitely talk about different things. Neutral status means that it is unclear whether the articles talk about the same thing.
- comments: Comments given by the human assessor.

The news articles were obtained from the datasets available via Kielipankki (http://urn.fi/urn:nbn:fi:lb-2021050401 and http://urn.fi/urn:nbn:fi:lb-2021050701).

For further information, please contact fin-clarin@helsinki.fi.
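As an illustration of the index_in_selko format described above, the following Python sketch splits an index into the Easy Finnish article identifier and the paragraph number, and groups paragraph numbers by article. It splits on the last underscore, assuming the paragraph number itself never contains one. The function names are hypothetical helpers, not part of the corpus distribution.

```python
from collections import defaultdict


def split_selko_index(index_in_selko: str) -> tuple[str, int]:
    """Split an index_in_selko value into (article_id, paragraph_number).

    Splits on the LAST underscore only (hypothetical helper), so the
    article identifier is kept intact even if it contained underscores.
    """
    article_id, paragraph = index_in_selko.rsplit("_", 1)
    return article_id, int(paragraph)


def group_paragraphs(indices: list[str]) -> dict[str, list[int]]:
    """Map each Easy Finnish article id to its sorted paragraph numbers."""
    groups = defaultdict(list)
    for idx in indices:
        article_id, paragraph = split_selko_index(idx)
        groups[article_id].append(paragraph)
    return {aid: sorted(nums) for aid, nums in groups.items()}
```

For example, a hypothetical index "abc123_2" would yield the article identifier "abc123" and paragraph 2 (the third paragraph, since numbering starts from 0).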
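A minimal sketch for reading the corpus with Python's standard csv module and keeping only the pairs that the assessor marked as positive. It assumes a comma-separated file with a header row using the column names listed above, and that status is stored as a text label such as "positive"; neither is confirmed by this README, so check the actual file first. load_pairs is a hypothetical helper.

```python
import csv
from typing import Iterable


def load_pairs(lines: Iterable[str], keep_status: str = "positive") -> list[dict[str, str]]:
    """Read corpus rows and keep those whose status matches keep_status.

    ASSUMPTION: comma-separated CSV with a header row, and status stored
    as a text label ("positive" / "negative" / "neutral"); matching is
    case-insensitive and ignores surrounding whitespace.
    """
    reader = csv.DictReader(lines)
    return [
        row
        for row in reader
        if row["status"].strip().lower() == keep_status
    ]
```

Any iterable of text lines works as input, so the same function can read an opened file, e.g. `open(path, encoding="utf-8", newline="")`, where path is the CSV file from the download (its file name is not given here).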
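The cos_sim computation described above (average the sentence vectors of each article's first 15 sentences, then take the cosine similarity of the two averages) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the sentence vectors are taken as given lists of floats, whereas in practice they would come from a SentenceTransformer model's `encode` method; the specific model and sentence splitter used for the corpus are not stated in this README.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equally sized sentence vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_similarity(
    selko_vecs: list[list[float]],
    regular_vecs: list[list[float]],
    max_sentences: int = 15,
) -> float:
    """Average the first max_sentences sentence vectors of each article,
    then compare the two average vectors with cosine similarity."""
    selko_mean = mean_vector(selko_vecs[:max_sentences])
    regular_mean = mean_vector(regular_vecs[:max_sentences])
    return cosine(selko_mean, regular_mean)
```

Averaging before comparing gives one score per pair regardless of how many sentences each article has; sentences beyond the 15th do not affect the score.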