Helsinki Corpus of Swahili 2.0 (HCS 2.0)

General information

The Helsinki Corpus of Swahili 2.0 is available in the Language Bank of Finland (Kielipankki) as two versions: a Not Annotated Version and an Annotated Version.

The Not Annotated Version of the Helsinki Corpus of Swahili 2.0 is available for download. (More information in META-SHARE.)

The Annotated Version of the Helsinki Corpus of Swahili 2.0 is available in the Korp concordance service at http://urn.fi/urn:nbn:fi:lb-201608301. (More information in META-SHARE.)

To use the HCS 2.0 Annotated Version in Korp, you need either academic status at an institution belonging to the Haka or eduGAIN federation or personal access rights, which you can apply for. You also need to log in to Korp.

The annotated version of HCS 2.0

The corpus is divided into subcorpora as follows:

  • Old material (old-mat, ~1952–2003)
    • Books (see list of authors below)
    • News (~1988–2003)
  • New material (new-mat, 2004–2015)
    • Bunge (2004–2006)
    • News (2005–2015)

Authors in old-mat/books

filename=”hcs-na-v2/old-mat/books/ADI”
title=”Adili na Nduguze”
author=”Shaaban Robert”
publisher=”EALB”
year=”1952”

filename=”hcs-na-v2/old-mat/books/EPO”
author=”Farouk Topan”
title=”Aliyeonja Pepo”
publisher=”TPH”
place=”Dar es Salaam”
year=”1977”

filename=”hcs-na-v2/old-mat/books/FAR”
author=”J. S. Mushi”
title=”Baada ya Dhiki Faraja”
publisher=”TPH”
place=”Dar es Salaam”
year=””

filename=”hcs-na-v2/old-mat/books/FAS”
author=””
title=”Makala za Semina ya Kimataifa ya Waandishi wa Kiswahili III Fasihi”
publisher=”Taasisi ya Uchunguzi wa Kiswahili cha Chuo Kikuu”
place=”Dar es Salaam”
year=”1983”

filename=”hcs-na-v2/old-mat/books/FED”
author=”Amanda Lihamba”
title=”Hawala ya fedha”
publisher=”TPH”
place=”Dar es Salaam”
year=””

filename=”hcs-na-v2/old-mat/books/GAM”
author=”E. Kezilahabi”
title=”Gamba la Nyoka”
publisher=”EAP”
place=”Dar es Salaam”
year=”1981 (1979)”

filename=”hcs-na-v2/old-mat/books/HIS”
author=”Ireri Mbaabu”
title=”Historia ya Usanifishaji wa Kiswahili”
publisher=”Longman”
place=”Nairobi”
year=”1991”

filename=”hcs-na-v2/old-mat/books/HUK”
author=”N. Ngahyoma”
title=”Huka”
publisher=”TPH”
place=”Dar es Salaam”
year=”1973”

filename=”hcs-na-v2/old-mat/books/KIC”
author=”E. Kezilahabi”
title=”Kichwamaji”
publisher=”Typography Ltd”
place=”Nairobi”
year=”1974”

filename=”hcs-na-v2/old-mat/books/KIN”
author=”Ebrahim N. Hussein”
title=”Kinjeketile”
publisher=”Oxford UP”
place=”Dar es Salaam na Nairobi”
year=”1969”

filename=”hcs-na-v2/old-mat/books/KUN”
author=”M. Mulokozi na K. Kahigi”
title=”Kunga za Ushairi na Diwani Yetu”
publisher=”TPH”
place=”Dar es Salaam”
year=”1979”

filename=”hcs-na-v2/old-mat/books/LI-NYO”
author=”George Liwenga”
title=”Nyota za Huzuni”
publisher=”TPH”
place=”Dar es Salaam”
year=”1981”

filename=”hcs-na-v2/old-mat/books/LUG”
author=”Abdu Mtajuka Khamisi”
title=”Makala za Semina ya Kimataifa ya Waandishi wa Kiswahili I, Lugha ya Kiswahili”
publisher=”Chuo Kikuu cha Dar es Salaam”
place=”Dar es Salaam”
year=”1983”

filename=”hcs-na-v2/old-mat/books/LWI”
author=”Martha Mlagala Mvungi”
title=”Lwidiko”
publisher=”TPH”
place=”Dar es Salaam”
year=””

filename=”hcs-na-v2/old-mat/books/MAS”
author=”Ebrahim N. Hussein”
title=”Mashetani”
publisher=”Oxford UP”
place=”Nairobi”
year=””

filename=”hcs-na-v2/old-mat/books/MAT”
author=”Salim A. Kibao”
title=”Matatu ya Thamani”
publisher=”Heinemann Educational Books”
place=”Nairobi”
year=”1975”

filename=”hcs-na-v2/old-mat/books/MKE”
author=””
title=”Mke mmoja waume watatu”
publisher=”EAPH”
place=”Dar es Salaam”
year=”1975”

filename=”hcs-na-v2/old-mat/books/MO-NYO”
author=”Suleiman Mohamed”
title=”Nyota ya Rehema”
publisher=”Oxford UP”
place=”Nairobi”
year=”1983”

filename=”hcs-na-v2/old-mat/books/MU-NYO”
author=”C. G. Mung’ong’o”
title=”Njozi iliyopotea”
publisher=”TPH”
place=”Dar es Salaam”
year=”1980”

filename=”hcs-na-v2/old-mat/books/MZI”
author=”E. Kezilahabi”
title=”Mzingile”
publisher=”DUP”
place=”Dar es Salaam”
year=”1991”

filename=”hcs-na-v2/old-mat/books/NAG”
author=”E. Kezilahabi”
title=”Nagona”
publisher=”DUP”
place=”Dar es Salaam”
year=”1990”

filename=”hcs-na-v2/old-mat/books/NGO”
author=””
title=”Ng’ombe akivunjika mguu”
publisher=”Longman”
place=”Nairobi”
year=””

filename=”hcs-na-v2/old-mat/books/NJO”
author=”William B. Seme”
title=”Njozi za Usiku”
publisher=”Longman”
place=”Nairobi”
year=”1975”

filename=”hcs-na-v2/old-mat/books/PEP”
author=”Saad S. Yahya”
title=”Pepeta”
publisher=”Kenya Litho Ltd”
place=”Nairobi”
year=”1973”

filename=”hcs-na-v2/old-mat/books/ROS”
author=”E. Kezilahabi”
title=”Rosa Mistika”
publisher=”EALB”
place=”Dar es Salaam”
year=”1971”

filename=”hcs-na-v2/old-mat/books/SH-INS”
author=”Shaaban Robert”
title=”Insha na mashairi”
publisher=”Thomas Nelson and Sons Ltd”
place=”Dar es Salaam”
year=”1959”

filename=”hcs-na-v2/old-mat/books/SH-KIE”
author=”Shaaban Robert”
title=”Kielelezo cha Insha”
publisher=”Oxford University Press”
place=”Nairobi”
year=”1966”

filename=”hcs-na-v2/old-mat/books/SH-KUF”
author=”Shaaban Robert”
title=”Kufikirika”
publisher=”Mkuki na Nyota Publishers”
place=”Dar es Salaam”
year=”1991”

filename=”hcs-na-v2/old-mat/books/SH-KUS”
author=”Shaaban Robert”
title=”Kusadikika”
publisher=”Evans Brothers Kenya Ltd”
place=”Nairobi”
year=”1990 (1951)”

filename=”hcs-na-v2/old-mat/books/SH-PAM”
author=”Shaaban Robert”
title=”Pambo la Lugha”
publisher=”Oxford University Press”
place=”Nairobi”
year=”1966”

filename=”hcs-na-v2/old-mat/books/SH-SAN”
author=”Shaaban Robert”
title=”Sanaa ya Ushairi”
publisher=”Nelson”
place=”Nairobi”
year=”1972”

filename=”hcs-na-v2/old-mat/books/SH-WAS”
author=”Shaaban Robert”
title=”Wasifu wa Siti binti Saad”
publisher=”Mkuki na Nyota Publishers”
place=”Dar es Salaam”
year=”1991”

filename=”hcs-na-v2/old-mat/books/TAT”
author=”Said A. Mohamed”
title=”Tata za Asumini”
publisher=”Longman”
place=”Nairobi”
year=””

filename=”hcs-na-v2/old-mat/books/TWE”
author=”Freddy Macha”
title=”Twen’ Zetu Ulaya”
publisher=”Grand Arts Promotions”
place=””
year=”1984”

filename=”hcs-na-v2/old-mat/books/UJA”
author=”Julius K. Nyerere”
title=”Ujamaa”
publisher=”Oxford UP”
place=”Dar es Salaam”
year=”1968”

filename=”hcs-na-v2/old-mat/books/USH”
author=”F. Senkoro”
title=”Ushairi”
publisher=”DUP”
place=”Dar es Salaam”
year=”1988”

filename=”hcs-na-v2/old-mat/books/YAR”
author=”Mark Lemki”
title=”Yarabi Maskini”
publisher=”EALB”
place=”Dar es Salaam”
year=”1976”
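
The records above use a simple key=”value” layout, one field per line, with a blank line between books. The following is a minimal Python sketch for reading such a listing into dictionaries, assuming it has been saved as plain text. The function name parse_records is hypothetical, and the quote handling is an assumption: it tolerates straight quotes, typographic quotes, and the double-prime character that sometimes replaces a closing quote in copied text.

```python
import re

# Quote characters that may surround a value: straight, curly, double prime.
QUOTES = '"\u201c\u201d\u2033'
FIELD = re.compile(rf'^(\w+)=[{QUOTES}]?(.*?)[{QUOTES}]?\s*$')

def parse_records(text):
    """Split blank-line-separated blocks of key="value" lines into dicts."""
    records = []
    for block in re.split(r'\n\s*\n', text.strip()):
        record = {}
        for line in block.splitlines():
            match = FIELD.match(line.strip())
            if match:
                key, value = match.groups()
                record[key] = value.strip()
        if record:
            records.append(record)
    return records

# Two records copied from the listing above.
sample = '''filename=”hcs-na-v2/old-mat/books/ADI”
title=”Adili na Nduguze”
author=”Shaaban Robert”
publisher=”EALB”
year=”1952”

filename=”hcs-na-v2/old-mat/books/FAR”
author=”J. S. Mushi”
title=”Baada ya Dhiki Faraja”
publisher=”TPH”
place=”Dar es Salaam”
year=””
'''

books = parse_records(sample)
undated = [b["title"] for b in books if not b.get("year")]
```

Empty values such as year=”” are kept as empty strings, so records with missing fields can be filtered afterwards, as in the last line of the sketch.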

Annotations

The following tables list the annotation features used in the annotated version of HCS 2.0 available in Korp.

Part-of-speech features (attribute pos)

ABBR abbreviation
ADJ adjective
ADV adverb
AG-PART agent particle
CC coordinating conjunction
CONJ conjunction
CONJ/CC conjunction or coordinator
DEM demonstrative pronoun
EXCLAM exclamation
GEN-CON genitive connector
GEN-CON-KWA genitive connector, classes 15 and 17
INTERROG interrogative
N noun
NUM numeral
NUM-ROM Roman numeral
POSS-PRON possessive pronoun
PREP preposition
PREP/ADV preposition or adverb
PRON pronoun
PROPNAME proper name
REL-LI relative construction with -li- marker
REL-LI-VYO relative construction with -li- marker, referring to manner
REL-SI relative negative construction with -si- marker
REL-SI-VYO relative negative construction with -si- marker, referring to manner
TITLE title
V verb
V-BE auxiliary verb
V-DEF defective verb

Punctuation marks and diacritics (attribute pos)

COLON colon
COMMA comma
DOUBLE-QUOTE double quote
DOUBLE-QUOTE-CLOSING closing double quote
DOUBLE-QUOTE-OPENING opening double quote
HYPHEN hyphen
LEFT-PARENTHESIS left parenthesis
PERCENT-MARK percent mark
QUESTION-MARK question mark
RIGHT-PARENTHESIS right parenthesis
SEMI-COLON semicolon
SINGLE-QUOTE single quote
SINGLE-QUOTE-CLOSING closing single quote
SINGLE-QUOTE-OPENING opening single quote
SLASH slash
STOP stop

Features for nouns

Features for nouns (attribute msd)
1/2-PL plural of the noun class group 1/2
1/2-SG singular of the noun class group 1/2
10-PL noun class 10 plural
11-SG noun class 11 singular
11/10-PL plural of the noun class group 11/10
11/10-SG singular of the noun class group 11/10
11/6-PL plural of the noun class group 11/6
11/6-SG singular of the noun class group 11/6
15-SG noun class 15 singular
16-SG noun class 16 singular
17-SG noun class 17 singular
18-SG noun class 18 singular
3/4-PL plural of noun class group 3/4
3/4-SG singular of noun class group 3/4
5/6-PL plural of noun class group 5/6
5/6-SG singular of noun class group 5/6
6-PLSG noun class 6 plural with singular meaning, e.g. ’maji’
7/8-PL plural of noun class group 7/8
7/8-SG singular of noun class group 7/8
9/10-PL plural of noun class group 9/10
9/10-SG singular of noun class group 9/10
9/6-PL plural of noun class group 9/6
9/6-SG singular of noun class group 9/6
Single noun class features (attribute msd)
1-SG noun class 1, singular
1-SG1 first person singular
1-SG3 third person singular
2-PL noun class 2 plural
2-PL1 first person plural
3-SG noun class 3 singular
4-PL noun class 4 plural
5-SG noun class 5 singular
6-PL noun class 6 plural
7-SG noun class 7 singular
8-PL noun class 8 plural
9-SG noun class 9 singular
PL1 first person plural
PL2 second person plural
PL3 third person plural
SG singular
SG1 first person singular
SG2 second person singular
SG3 third person singular
LOC locative
LOC-16 locative class 16
LOC-17 locative class 17
LOC-18 locative class 18
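
The noun class tags above follow a regular pattern: one class number, or a singular/plural class pair separated by a slash, followed by SG, PL or PLSG. The sketch below shows one way to decode that pattern in Python. The function name decode_noun_class and the returned dictionary layout are illustrative assumptions, not part of the corpus tooling.

```python
import re

# Matches tags such as 9/10-SG, 11-SG or 6-PLSG from the tables above.
NOUN_CLASS = re.compile(r"^(\d+)(?:/(\d+))?-(SG|PL|PLSG)$")

def decode_noun_class(tag):
    """Return the class numbers and the number value of a noun class tag.

    Returns None for tags that do not follow the class pattern,
    for example person tags such as SG1 or 1-SG3, or locatives.
    """
    match = NOUN_CLASS.match(tag)
    if match is None:
        return None
    first, second, number = match.groups()
    classes = (int(first),) if second is None else (int(first), int(second))
    return {"classes": classes, "number": number}

info = decode_noun_class("9/10-SG")
```

Tags that combine a class with a person (1-SG1, 2-PL1) are deliberately left out of the pattern and would need their own handling.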

Features for adjectives and numerals

A-INFL inflecting adjective
A-UNINFL noninflecting adjective
AD-ADJ adverb modifying adjective
ADJ-POST postmodifying adjective
ADJ-PR-REL adjective constructed with present tense relative
ADJ-PRE premodifying adjective
ADJ-REL adjective constructed with relative verb structure
ADJ-REL-NEG adjective constructed with negative present tense relative
CARD cardinal number
COMP comparative
NUM-INFL inflecting numeral
NUM-ROM Roman numeral
NUM-UNINFL noninflecting numeral
ORD ordinal number
OTE inflection class ’OTE’
SUPER superlative

Features for verbs

Features for verbs: Subject prefix (attribute msd)
9-SG-SP subject prefix of noun class 9 singular
4-PL-SP subject prefix of noun class 4 plural
SUB-PREF=1-SG subject prefix of class 1 singular
SUB-PREF=1-SG1 subject prefix of first person singular
SUB-PREF=1-SG2 subject prefix of second person singular
SUB-PREF=1-SG3 subject prefix of third person singular
SUB-PREF=10-PL subject prefix of class 10 plural
SUB-PREF=11-SG subject prefix of class 11 singular
SUB-PREF=15-SG subject prefix of class 15 singular
SUB-PREF=16-SG subject prefix of class 16 singular
SUB-PREF=17-SG subject prefix of class 17 singular
SUB-PREF=18-SG subject prefix of class 18 singular
SUB-PREF=2-PL subject prefix of class 2 plural
SUB-PREF=2-PL1 subject prefix of first person plural
SUB-PREF=2-PL2 subject prefix of second person plural
SUB-PREF=2-PL3 subject prefix of third person plural
SUB-PREF=3-SG subject prefix of class 3 singular
SUB-PREF=4-PL subject prefix of class 4 plural
SUB-PREF=5-SG subject prefix of class 5 singular
SUB-PREF=6-PL subject prefix of class 6 plural
SUB-PREF=6-PLSG subject prefix of class 6 plural with singular meaning
SUB-PREF=7-SG subject prefix of class 7 singular
SUB-PREF=8-PL subject prefix of class 8 plural
SUB-PREF=9-SG subject prefix of class 9 singular
SUB-PREF=HABIT-PL subject prefix of habitual verb form plural
SUB-PREF=HABIT-SG subject prefix of habitual verb form singular
SUB-PREF=PL1 subject prefix of first person plural
SUB-PREF=PL2 subject prefix of second person plural
SUB-PREF=SG1 subject prefix of first person singular
SUB-PREF=SG2 subject prefix of second person singular
Features for verbs: TAM markers (attribute msd)
TAM=COND-NEG:singe conditional present negative, marker -singe-
TAM=COND:ki conditional present, marker -ki-
TAM=COND:nge conditional present, marker -nge-
TAM=COND:ngeli conditional past, marker -ngeli-
TAM=FUT future
TAM=FUT:ta future, marker -ta-
TAM=FUT:taka future, marker -taka-
TAM=FUT:to future, marker -to-
TAM=NARR:ka narrative, marker -ka-
TAM=NEG-a present negative
TAM=PAST simple past
TAM=PAST-NEG:ku past negative, marker -ku-
TAM=PAST-NEG:kw past negative, marker -kw-
TAM=PAST:liisha past, marker -liisha-
TAM=PAST:likwisha past, marker -likwisha-
TAM=PAST:lisha past, marker -lisha-
TAM=PERF:me perfect, marker -me-
TAM=PERF:meisha perfect, marker -meisha-
TAM=PERF:mekwisha perfect, marker -mekwisha-
TAM=PERF:mesha perfect, marker -mesha-
TAM=PERF:sha perfect, marker -sha-
TAM=PR:a present, marker -a-
TAM=PR:na present, marker -na-
TAM=SBJN subjunctive
TAM=SBJN-CONS subjunctive consecutive
IMP imperative
IMP-PL2 imperative of second person plural
Features for verbs: Relative prefix (attribute msd)
REL-PREF=1-SG-SUB relative prefix referring to subject of class 1 singular
REL-PREF=10-PL relative prefix referring to class 10 plural
REL-PREF=11-SG relative prefix referring to class 11 singular
REL-PREF=15-SG relative prefix referring to class 15 singular
REL-PREF=16-SG relative prefix referring to class 16 singular
REL-PREF=17-SG relative prefix referring to class 17 singular
REL-PREF=18-SG relative prefix referring to class 18 singular
REL-PREF=2-PL-SUB relative prefix referring to subject of class 2 plural
REL-PREF=3-SG relative prefix referring to class 3 singular
REL-PREF=4-PL relative prefix referring to class 4 plural
REL-PREF=5-SG relative prefix referring to class 5 singular
REL-PREF=6-PL relative prefix referring to class 6 plural
REL-PREF=6-PLSG relative prefix referring to class 6 plural with singular meaning
REL-PREF=7-SG relative prefix referring to class 7 singular
REL-PREF=8-PL relative prefix referring to class 8 plural
REL-PREF=9-SG relative prefix referring to class 9 singular
1-SG-OBJ-REL noun class 1, relative, referring to object
10-PL-REL relative prefix referring to noun class 10 plural
16-SG-REL relative prefix referring to noun class 16 singular
2-PL-OBJ-REL relative prefix referring to object of noun class 2 plural
3-SG-REL relative prefix referring to noun class 3 singular
8-PL-REL relative prefix referring to noun class 8 plural
Features for verbs: Object prefix (attribute msd)
OBJ-PREF=1-SG1 object prefix referring to first person singular
OBJ-PREF=1-SG2 object prefix referring to second person singular
OBJ-PREF=1-SG3 object prefix referring to third person singular
OBJ-PREF=10-PL object prefix referring to class 10 plural
OBJ-PREF=11-SG object prefix referring to class 11 singular
OBJ-PREF=15-SG object prefix referring to class 15 singular
OBJ-PREF=16-SG object prefix referring to class 16 singular
OBJ-PREF=2-PL1 object prefix referring to first person plural
OBJ-PREF=2-PL2 object prefix referring to second person plural
OBJ-PREF=2-PL3 object prefix referring to third person plural
OBJ-PREF=3-SG object prefix referring to class 3 singular
OBJ-PREF=4-PL object prefix referring to class 4 plural
OBJ-PREF=5-SG object prefix referring to class 5 singular
OBJ-PREF=6-PL object prefix referring to class 6 plural
OBJ-PREF=7-SG object prefix referring to class 7 singular
OBJ-PREF=8-PL object prefix referring to class 8 plural
OBJ-PREF=9-SG object prefix referring to class 9 singular
OBJ-PREF=PL-REFL reflexive object prefix -ji- referring to plural
OBJ-PREF=SG-REFL reflexive object prefix -ji- referring to singular
1-SG2-OBJ object prefix referring to second person singular
1-SG3-OBJ object prefix referring to third person singular
10-PL-OBJ object prefix referring to noun class 10 plural
15-SG-OBJ object prefix referring to noun class 15 singular
16-SG-OBJ object prefix referring to noun class 16 singular
17-SG-OBJ object prefix referring to noun class 17 singular
2-PL1-OBJ object prefix referring to first person plural
2-PL3-OBJ object prefix referring to third person plural
7-SG-OBJ object prefix referring to class 7 singular
9-SG-OBJ object prefix referring to class 9 singular
SG-REFL-OBJ reflexive object prefix -ji- referring to singular
PL-REFL-OBJ reflexive object prefix -ji- referring to plural
Features for verbs: Verb extension markers (attribute msdextra)
APPL applicative
CAUS causative
CS subordinating conjunction
PASS passive
PS passive
REC reciprocal
REDUPL reduplication
STAT stative
Miscellaneous features of verbs (attribute msdextra)
AN-S verb requiring animate subject
HUM-S verb requiring human subject
AUX-WA auxiliary verb
COMPL completed action
COND-IF conditional verb form with the marker -if-
EMPH emphasis
HUM-ACT verb expressing human action
INFMARK infinitive marker
MONOSLB monosyllabic verb
NEG negative
NO-IN preposition ’in’ not required
NO-TO preposition ’to’ not required
NOSUBJ no subject required
OBJ object prefix
REL-LI relative verb construction with the tense marker -li-
REL-LI-VYO relative verb construction with the marker -vyo-
REL-SI relative prefix, marker -si-
SV intransitive verb
SVO monotransitive verb
SVOO ditransitive verb
VFIN finite verb
INF infinitive
INF-NEG negative infinitive
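
Many of the verb features in msd have the form KEY=VALUE (SUB-PREF=, TAM=, REL-PREF=, OBJ-PREF=), and TAM values may carry the marker after a colon, as in TAM=PERF:me. The following Python sketch separates keyed features from bare tags. It assumes that the features of one token are separated by whitespace, which is a guess about the export format and is not stated on this page. The function names parse_msd and split_tam are hypothetical.

```python
def parse_msd(msd):
    """Split an msd string into keyed features and bare tags.

    Assumption: features are whitespace-separated in the msd value.
    """
    keyed, bare = {}, []
    for feature in msd.split():
        if "=" in feature:
            key, value = feature.split("=", 1)
            keyed[key] = value
        else:
            bare.append(feature)
    return keyed, bare

def split_tam(value):
    """Split a TAM value such as 'PERF:me' into (category, marker)."""
    category, _, marker = value.partition(":")
    return category, marker or None

# Hypothetical msd value built from tags in the tables above.
keyed, bare = parse_msd("SUB-PREF=1-SG3 TAM=PERF:me OBJ-PREF=2-PL3")
category, marker = split_tam(keyed["TAM"])
```

If the actual separator differs, only the msd.split() call needs to change.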

Miscellaneous features (attribute msdextra)

**CLB clause boundary
<LOC locative on the left
AR word of Arabic origin
CAP capital initial
CC-PL associational copula attached to plural, e.g. ’nao’
CC-SG associational copula attached to singular, e.g. ’naye’
DIM diminutive
DOM-AN domestic animal
FEM female
HC health care domain
HUM human
IDIOM idiom
MALE male
MASS mass
MWE multi-word expression
NA-POSS possessive pronoun ’na’
NO-GLOSS no gloss required
NOART no article required
NOGLOSS no gloss required
NON-STD non-standard form
NOUN+POSSESSIVE noun and possessive combined
NOVERB no verb required
PERS Persian origin
PLACE place
POR Portuguese origin
POSS possessive
PREFR preferred reading
PROP-CAND candidate for proper name
PROVERB proverb
TIME time
WEEK week

Syntactic tags (attribute syntax)

@-FAUXV non-finite auxiliary predicator
@-FMAINV non-finite main predicator
@-FMAINV-n non-finite main predicator
@-FMAINVkwisha< non-finite main predicator, referring to completed action
@<AD-A postmodifying ad-adjective
@<DN determiner, noun on the left
@<NADJ adjective qualifier, noun on the left
@<NDEM demonstrative, noun on the left
@<NH postmodifying noun, head on the left
@<P other postmodifier
@<QN postmodifying quantifier
@A> appositional premodifier
@AD-A> premodifying ad-adjective
@ADVL adverbial
@AG agentive adverbial
@CC coordinator
@CS subordinator
@DN> determiner, noun on the right
@FAUXV finite auxiliary predicator
@FMAINV finite main predicator
@FMAINVintr finite main predicator, intransitive
@FMAINVintr-ass defective main predicator, associated with subject
@FMAINVintr-def defective main predicator, intransitive
@FMAINVintr-loc defective main predicator, locative
@FMAINVtr+OBJ> finite main predicator, transitive
@FMAINVtr-OBJ> finite main predicator, intransitive
@GCON genitive
@I-OBJ indirect object
@NADJ adjective
@NADJ> postmodifying adjective
@NDEM> demonstrative, noun on the right
@NH noun head
@OBJ object
@P> other premodifier
@PAT patient
@PCOMPL-S subject complement
@QN quantifier
@SUBJ subject
@SUBJ+rel referent of the following relative verb
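
Attribute values in Korp's CQP expressions are regular expressions, so a tag containing a character such as + (for example @FMAINVtr+OBJ>) has to be escaped to be matched literally. The Python sketch below builds a single-token CQP expression with escaped values. The helper name cqp_token is hypothetical, and the choice of which characters to escape is an assumption based on common regular expression syntax.

```python
import re

# Characters with special meaning in regular expressions.
_META = re.compile(r'([.^$*+?()\[\]{}|\\])')

def cqp_token(**constraints):
    """Build a one-token CQP expression that matches each value literally."""
    parts = []
    for attr, value in constraints.items():
        # Put a backslash in front of every regex metacharacter.
        escaped = _META.sub(r'\\\1', value)
        parts.append(f"{attr}='{escaped}'")
    return "[" + " & ".join(parts) + "]"

query = cqp_token(pos="V", syntax="@FMAINVtr+OBJ>")
```

Values that should act as patterns, such as '.*CAUS.*', must not go through this helper, because their metacharacters would be escaped as well.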

Moving from HCS in Lemmie to HCS 2.0 in Korp

If you have used the previous version of HCS in the Lemmie service or if you try to replicate the examples in the HCS instructions for Lemmie in Korp’s extended search, please see the instructions for moving from Lemmie to Korp. In particular, note the following differences:

  • Annotation values in HCS 2.0 are in uppercase.
  • Some morphological features are in the attribute msdextra, although most are in msd. See the headings of the annotation tables above for the attribute names.
  • The attribute values of the CQP expressions used in Korp are regular expressions, so you need to replace the truncation symbol “*” of Lemmie with “.*” (a full stop followed by an asterisk).
  • Attribute constraints for a single token are separated by an ampersand (&) in CQP.

For example, the Lemmie query


  [pos='v' msd='*caus*' msd='*appl*']

is converted to the following CQP query for Korp:


  [pos='V' & msdextra='.*CAUS.*' & msdextra='.*APPL.*']
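
The conversion rules listed above can be sketched in Python as follows. This is a minimal illustration, not an official converter: the function name lemmie_to_cqp is hypothetical, MSDEXTRA_FEATURES is a partial set taken from the verb extension table and must be extended by hand, and only single-token queries with single-quoted values are handled.

```python
import re

# Features listed under msdextra on this page (partial, illustrative set).
MSDEXTRA_FEATURES = {"APPL", "CAUS", "PASS", "REC", "REDUPL", "STAT"}

# Attributes whose values are uppercase annotation tags in HCS 2.0.
UPPERCASE_ATTRS = {"pos", "msd"}

CONSTRAINT = re.compile(r"(\w+)\s*=\s*'([^']*)'")

def lemmie_to_cqp(query):
    """Convert a single-token Lemmie query to a CQP token expression."""
    parts = []
    for attr, value in CONSTRAINT.findall(query):
        if attr in UPPERCASE_ATTRS:
            value = value.upper()
        # Lemmie truncation symbol * becomes the regular expression .*
        value = value.replace("*", ".*")
        # Move the constraint to msdextra if its core feature lives there.
        if attr == "msd" and value.strip(".*") in MSDEXTRA_FEATURES:
            attr = "msdextra"
        parts.append(f"{attr}='{value}'")
    return "[" + " & ".join(parts) + "]"

converted = lemmie_to_cqp("[pos='v' msd='*caus*' msd='*appl*']")
print(converted)  # prints [pos='V' & msdextra='.*CAUS.*' & msdextra='.*APPL.*']
```

Other regular expression metacharacters in a Lemmie value are passed through unchanged, so queries containing characters such as + or . should be checked by hand.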