Corpus for the study of Language and Gender in Mexico and Spain (CoLaGe), text version
Korpus kielen ja sukupuolen tutkimiseen Meksikossa ja Espanjassa (CoLaGe), tekstiversio

Shortname: colage-txt
Metadata: http://urn.fi/urn:nbn:fi:lb-2025090323

Rightholder: Pekka Posio
Data controller, regarding personal data: University of Helsinki

License: CLARIN ACA +PRIV +OTHER* v2.1
The complete license is available at http://urn.fi/urn:nbn:fi:lb-2025090321
A copy of the license is included in LICENSE_colage-txt.txt.
The license details may be subject to change, so before downloading the
resource, please refer to the latest version of the license at the above link.

NB. This resource contains personal data. You must comply with the data
protection terms and conditions when processing the personal data. See the
license for details.

Resource group page: http://urn.fi/urn:nbn:fi:lb-2024030607

Resource description:

This resource is the downloadable version of the Corpus for the study of
Language and Gender in Mexico and Spain (CoLaGe), text version. The data were
collected in Valencia, Spain (2021–2022) and Guadalajara, Mexico (2022–2023)
as part of the research project "Gender, society, and language use: evidence
from Mexico and Spain", funded by the Kone Foundation. The objective was to
create a comparable corpus of spoken Spanish from each city to enable the
study of the interconnections between speaker gender, societal gender roles
and expectations, and variation in spoken language, combining sociolinguistic
and social psychological methodologies.

The data consist of sociolinguistic interviews, divided into parts in which
gender is or is not activated as a discourse topic, and two role plays
simulating conflictive situations, with the informant playing one role and
the interviewer the other. The informants represent a middle-class
socioeconomic background and are divided into two age groups, 30–40 and 60–70.
A thorough description of the data and the sociolinguistic variables is
available with the data.

Structure of the data in download:

The transcriptions are in .xlsx (Excel) and .eaf (Elan) format, except for
the phonetic material, which is in .TextGrid (Praat) format.

The data are divided into 3 packages depending on the subset (GDL_Diversity;
Guadalajara; Valencia). Below is a list of the packages, their approximate
sizes (when unpacked) and the number of files they contain:

colage-txt-GDL_Diversity_transcripts.zip   11M    64 files
colage-txt-Guadalajara_transcripts.zip     50M   300 files
colage-txt-Valencia_transcripts.zip        31M   255 files

The metadata in pseudonymized format is in the file
CoLaGe_metadata_CSV_pseudonymized.csv, which is included in all the packages.

For further information, please contact fin-clarin@helsinki.fi.
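
To check a downloaded package against the file counts listed above, the
following minimal Python sketch counts the files in a zip archive by
extension. It is an illustration only, not part of the resource: the function
name count_by_extension is hypothetical, and the sketch makes no assumptions
about the directory layout inside the packages. Extensions are lowercased, so
.TextGrid files are reported as .textgrid.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path):
    """Return a Counter mapping lowercased file extensions to file counts.

    Hypothetical helper for illustration; directory entries are skipped.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Member names inside a zip archive always use forward slashes.
            counts[PurePosixPath(info.filename).suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_files.py colage-txt-Valencia_transcripts.zip
    for path in sys.argv[1:]:
        counts = count_by_extension(path)
        print(path, sum(counts.values()), "files", dict(counts))
```

The total may differ slightly from the figures in the list if the archive
contains additional files such as the license or the metadata CSV.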