Akan POS Tagger (spaCy model)

A custom-trained part-of-speech (POS) tagging model for the Akan language, built with spaCy.

✨ Installation and Usage

Prerequisites

pip install spacy huggingface_hub

Loading and Using the Model

from huggingface_hub import snapshot_download
import spacy

# Download the model from Hugging Face Hub
model_path = snapshot_download(repo_id="michsethowusu/akan-pos-tagger")

# Load the model
nlp = spacy.load(model_path)

# Use the model for POS tagging
doc = nlp("bosom som nyɛ")
for token in doc:
    print(f"{token.text} -> {token.tag_}")

Expected Output

bosom -> N
som -> V
nyɛ -> ADV
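
For more than a handful of sentences, it is usually more efficient to stream texts through the pipeline with spaCy's `nlp.pipe` than to call `nlp()` once per string. The sketch below is illustrative only: `tag_texts` and `tag_frequencies` are hypothetical helper names that are not part of this model or of spaCy, and `nlp` is assumed to be the pipeline loaded in the snippet above.

```python
from collections import Counter


def tag_texts(nlp, texts):
    """Tag several texts in one pass; return a list of (token, tag) pairs per text."""
    # nlp.pipe yields one Doc per input text, in the same order as the input.
    return [[(token.text, token.tag_) for token in doc] for doc in nlp.pipe(texts)]


def tag_frequencies(tagged_texts):
    """Count how often each tag occurs across all tagged texts."""
    return Counter(tag for pairs in tagged_texts for _, tag in pairs)
```

With the model loaded, `tag_texts(nlp, ["bosom som nyɛ"])` returns one list of pairs per input text, and `tag_frequencies(...)` gives a quick view of which tags dominate a corpus.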

🏷 Complete POS Tags Reference

This model uses a custom POS tagset of over 100 tags (107 are listed below), grouped by category:

Adjectives

  • ADJ – adjective
  • ADJatt – attributive adjective
  • ADJC – comparative adjective
  • ADJpret – predicative adjective
  • ADJS – superlative adjective

Adverbs

  • ADV – adverb
  • ADVdir – directional adverb
  • ADVm – manner adverb
  • ADVneg – negative operator
  • ADVplc – place adverb
  • ADVtemp – temporal adverb

Articles and Determiners

  • ART – article
  • DET – determiner
  • DEM – demonstrative

Auxiliary and Copula

  • AUX – auxiliary
  • COP – copula
  • COPident – identity copula
  • COPloc – locative copula
  • COPneg – negative copula

Numerals

  • CARD – cardinal numeral (e.g. 4, sixty-five)
  • NUM – numeral
  • NUMpart – partitive numeral
  • ORD – ordinal

Conjunctions

  • CONJ – conjunction
  • CONJC – coordinating conjunction (e.g. and, or)
  • CONJS – subordinating conjunction (e.g. although, when)
  • CONJSinf – subordinating conjunction (introducing an infinitive clause)

Nouns

  • CN – common noun
  • N – noun
  • Nbare – bare noun
  • Ncomm – noun with common gender
  • NFEM – feminine noun
  • NMASC – masculine noun
  • NNEUT – neuter noun
  • NNO – noun neutral for number (e.g. data, aircraft)
  • Np – proper noun
  • Npinst – name of an institution
  • Nploc – name of a location
  • Npname – personal name
  • Nrel – relational noun
  • Nspat – spatial noun

Pronouns

  • PN – personal pronoun
  • PNabs – absolute pronoun (Bantu)
  • PNana – pronominal anaphor
  • PNdem – demonstrative pronoun
  • PNposs – possessive pronoun
  • PNrefl – reflexive pronoun
  • PNrel – relative pronoun
  • PROint – interrogative pronoun
  • PROPana – propositional anaphor

Prepositions and Postpositions

  • PREP – preposition
  • PREPdir – directional preposition
  • PREPplc – locative preposition
  • PREPsel – selected preposition
  • PREPtemp – temporal preposition
  • PPOST – postposition

Particles

  • PRT – particle
  • PRTexist – existential marker
  • PRTinf – infinitive marker
  • PRTint – interrogative particle
  • PRTn – nominal particle
  • PRTneg – negative particle
  • PRTposs – possessive particle
  • PRTpred – predicative particle
  • PRTprst – presentational particle
  • PRTresp – response word (e.g. thanks, please, no, yes)
  • PRTv – verbal particle

Verbs

  • V – verb
  • V1 – first verb in a verbal chain
  • V2 – second verb in a verbal chain
  • V3 – third verb in a verbal chain
  • V4 – fourth verb in a verbal chain
  • Vbid – verb bid (Kwa)
  • Vcon – converb
  • Vdtr – ditransitive verb
  • Vimprs – impersonal verb
  • Vitr – intransitive verb
  • VitrOBL – intransitive verb with a prepositional object
  • Vlght – light verb
  • Vmod – modal verb
  • Vneg – negative verb
  • Vpre – preverb
  • Vrefl – reflexive verb
  • Vtr – transitive verb
  • VtrOBL – transitive verb with a prepositional object
  • Vvector – vector verb

Other Categories

  • CIRCP – circumposition
  • CL – clitic
  • CLF – classifier
  • CLFnom – nominal classifier
  • CLFnum – numeral classifier
  • COMP – complementiser
  • EXPL – expletive pronoun
  • INTRJCT – interjection
  • IPHON – ideophone, onomatopoeia
  • MOD – modifier
  • PTCP – participle
  • QUANT – quantifier
  • REL – relative clause marker
  • TRUNC – truncation
  • Wh – wh-word

Punctuation

  • PUL – punctuation: left bracket (e.g. ( or [)
  • PUN – punctuation: general separating mark (e.g. , ; . ! : ?)
  • PUQ – punctuation: quotation marks (' or ")
  • PUR – punctuation: right bracket (e.g. ) or ])

Special Categories

  • NE – named entity
  • XY – non-words such as "JU52"
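
Because the tagset is fine-grained, it can help to collapse a predicted tag back into the category it is listed under in this reference. The following is a minimal sketch, not part of the model: the `TAG_GROUPS` table is transcribed by hand from the lists above, and `category_of` is a hypothetical helper name.

```python
# Tag groups transcribed from the reference lists above (category -> tags).
TAG_GROUPS = {
    "Adjectives": "ADJ ADJatt ADJC ADJpret ADJS",
    "Adverbs": "ADV ADVdir ADVm ADVneg ADVplc ADVtemp",
    "Articles and Determiners": "ART DET DEM",
    "Auxiliary and Copula": "AUX COP COPident COPloc COPneg",
    "Numerals": "CARD NUM NUMpart ORD",
    "Conjunctions": "CONJ CONJC CONJS CONJSinf",
    "Nouns": "CN N Nbare Ncomm NFEM NMASC NNEUT NNO Np Npinst Nploc Npname Nrel Nspat",
    "Pronouns": "PN PNabs PNana PNdem PNposs PNrefl PNrel PROint PROPana",
    "Prepositions and Postpositions": "PREP PREPdir PREPplc PREPsel PREPtemp PPOST",
    "Particles": "PRT PRTexist PRTinf PRTint PRTn PRTneg PRTposs PRTpred PRTprst PRTresp PRTv",
    "Verbs": "V V1 V2 V3 V4 Vbid Vcon Vdtr Vimprs Vitr VitrOBL Vlght Vmod Vneg Vpre Vrefl Vtr VtrOBL Vvector",
    "Other Categories": "CIRCP CL CLF CLFnom CLFnum COMP EXPL INTRJCT IPHON MOD PTCP QUANT REL TRUNC Wh",
    "Punctuation": "PUL PUN PUQ PUR",
    "Special Categories": "NE XY",
}

# Reverse lookup built once: tag -> category name.
TAG_TO_GROUP = {
    tag: group for group, tags in TAG_GROUPS.items() for tag in tags.split()
}


def category_of(tag):
    """Return the reference category for a tag, or "Unknown" if it is not listed."""
    return TAG_TO_GROUP.get(tag, "Unknown")
```

For instance, `category_of(token.tag_)` could be applied to each token of a tagged `doc` to report results at the level of the 14 categories rather than the individual tags.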

🔧 Technical Details

  • Framework: spaCy
  • Language: Akan (ak)
  • Training Epoch: 1 (best-performing checkpoint)
  • Tag Set: Custom 100+ tag dictionary based on Akan linguistic structures
  • Model Type: Token classification for POS tagging
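
To check these details against a loaded pipeline, spaCy exposes the component names via `nlp.pipe_names` and a component's label set via its `labels` attribute. The sketch below assumes the tagging component is registered under spaCy's conventional name `"tagger"`; this card does not confirm that name, and `summarize_pipeline` is a hypothetical helper.

```python
def summarize_pipeline(nlp, component="tagger"):
    """Return the pipeline's component names and the labels of one component.

    The default component name "tagger" is an assumption about this model;
    if no such component exists, the label list is left empty.
    """
    names = list(nlp.pipe_names)
    labels = sorted(nlp.get_pipe(component).labels) if component in names else []
    return {"components": names, "labels": labels}
```

Comparing the returned `labels` with the tag reference shows which of the listed tags the trained model can actually predict.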

📊 Performance

The model achieves competitive accuracy on Akan POS tagging tasks, with specialized handling for the complex morphological and syntactic features of the Akan language.

🀝 Citation

If you use this model in your research, please cite:

@misc{akan-pos-tagger,
  title={Akan POS Tagger},
  author={michsethowusu},
  year={2025},
  url={https://huggingface.co/michsethowusu/akan-pos-tagger}
}
