Stanford CoreNLP API

Package edu.stanford.nlp.wordseg

A package for doing Chinese word segmentation.

Package edu.stanford.nlp.wordseg Description

This package uses the CRFClassifier class (edu.stanford.nlp.ie.crf.CRFClassifier), a conditional random field sequence classifier, to segment Chinese text into words.

On the Stanford NLP machines, ready-to-use properties files are in: /u/nlp/data/chinese-segmenter/Sighan2005/prop

Usage (simplified Chinese):

java -mx200m edu.stanford.nlp.ie.crf.CRFClassifier -sighanCorporaDict $CH_SEG/data -NormalizationTable $CH_SEG/data/norm.simp.utf8 -normTableEncoding UTF-8 -loadClassifier $CH_SEG/data/ctb.gz -testFile $file -inputEncoding $enc
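The same invocation can be assembled programmatically, for example to launch the segmenter from another Java program. The following is a minimal sketch, not part of this package: the class name SegmenterCommand, the method build, and the example paths are hypothetical, and only the flags and file names shown in the command above are used. It has no CoreNLP dependency; it only builds the argument list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of edu.stanford.nlp.wordseg): assembles the
// documented command line for segmenting simplified Chinese text.
class SegmenterCommand {

    // chSeg stands in for $CH_SEG, file for $file, and enc for $enc.
    static List<String> build(String chSeg, String file, String enc) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-mx200m");
        cmd.add("edu.stanford.nlp.ie.crf.CRFClassifier");
        // Each flag below is copied from the usage line, followed by its value.
        cmd.add("-sighanCorporaDict");
        cmd.add(chSeg + "/data");
        cmd.add("-NormalizationTable");
        cmd.add(chSeg + "/data/norm.simp.utf8");
        cmd.add("-normTableEncoding");
        cmd.add("UTF-8");
        cmd.add("-loadClassifier");
        cmd.add(chSeg + "/data/ctb.gz");
        cmd.add("-testFile");
        cmd.add(file);
        cmd.add("-inputEncoding");
        cmd.add(enc);
        return cmd;
    }

    public static void main(String[] args) {
        // Example values are placeholders; substitute real locations.
        List<String> cmd = build("/path/to/segmenter", "input.txt", "UTF-8");
        System.out.println(String.join(" ", cmd));
    }
}
```

The returned list can be passed to java.lang.ProcessBuilder to run the segmenter as a child process, provided the CoreNLP classes are on the classpath of the launched JVM.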
Author:
Pi-Chuan Chang, Huihsin Tseng, Galen Andrew

© 2002-2013 Stanford NLP Group