public class Gale2007ChineseSegmenterFeatureFactory<IN extends CoreLabel> extends FeatureFactory<IN>
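In the feature templates below, the trailing c stands for a Chinese character ("char"). The leading letter gives the position: c means current, n means next, and p means previous. For example, pc is the previous character.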
| Feature | Templates |
|---|---|
| Current position clique | |
| useWord1 | CONSTANT, cc, nc, pc, pc+cc, if (As\|Msr\|Pk\|Hk) cc+nc, pc,nc |
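
To make the template notation concrete, here is a minimal sketch of how the unconditional character templates in the useWord1 row (cc, nc, pc, pc+cc) could be turned into string features. It is not the library's implementation: the class name `TemplateSketch`, the `<pad>` boundary symbol, and the `|name` feature suffixes are all assumptions made for illustration, and the CONSTANT and corpus-conditional templates are left out.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateSketch {
    // Hypothetical padding symbol for positions outside the sentence.
    static final String PAD = "<pad>";

    // Returns the character at position i, or PAD when i is out of range.
    static String charAt(String s, int i) {
        return (i >= 0 && i < s.length()) ? String.valueOf(s.charAt(i)) : PAD;
    }

    // Builds the templates cc, nc, pc and pc+cc for the character at loc.
    static List<String> features(String s, int loc) {
        String cc = charAt(s, loc);      // current char
        String nc = charAt(s, loc + 1);  // next char
        String pc = charAt(s, loc - 1);  // previous char
        List<String> out = new ArrayList<>();
        out.add(cc + "|cc");
        out.add(nc + "|nc");
        out.add(pc + "|pc");
        out.add(pc + cc + "|pc+cc");     // conjunction of previous and current
        return out;
    }

    public static void main(String[] args) {
        // Four Chinese characters (zhong wen fen ci), written as Unicode
        // escapes so the source file stays ASCII-only.
        String s = "\u4e2d\u6587\u5206\u8bcd";
        for (int loc = 0; loc < s.length(); loc++) {
            System.out.println(loc + ": " + features(s, loc));
        }
    }
}
```

The sketch indexes by Java `char`, so it assumes every character fits in a single UTF-16 code unit.
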
Fields inherited from class FeatureFactory: cliqueC, cliqueCnC, cliqueCp2C, cliqueCp3C, cliqueCp4C, cliqueCp5C, cliqueCpC, cliqueCpCnC, cliqueCpCp2C, cliqueCpCp2Cp3C, cliqueCpCp2Cp3Cp4C, cliqueCpCp2Cp3Cp4Cp5C, flags, knownCliques

| Constructor and Description |
|---|
| Gale2007ChineseSegmenterFeatureFactory() |

| Modifier and Type | Method and Description |
|---|---|
| `protected java.util.Collection<java.lang.String>` | `featuresC(PaddedList<? extends CoreLabel> cInfo, int loc)` |
| `protected java.util.Collection<java.lang.String>` | `featuresCnC(PaddedList<? extends CoreLabel> cInfo, int loc)` For a CRF, this shouldn't be necessary, since the features duplicate those from CpC, but Huihsin found some valuable, presumably because it modified the regularization a bit. |
| `protected java.util.Collection<java.lang.String>` | `featuresCpC(PaddedList<? extends CoreLabel> cInfo, int loc)` |
| `protected java.util.Collection<java.lang.String>` | `featuresCpCp2C(PaddedList<? extends CoreLabel> cInfo, int loc)` Second order clique features |
| `protected java.util.Collection<java.lang.String>` | `featuresCpCp2Cp3C(PaddedList<? extends CoreLabel> cInfo, int loc)` |
| `java.util.Collection<java.lang.String>` | `getCliqueFeatures(PaddedList<IN> cInfo, int loc, Clique clique)` Extracts all the features from the input data at a certain index. |
| `void` | `init(SeqClassifierFlags flags)` |
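
One way to read the note on featuresCnC is that the (current, next) pair seen at position loc is the same pair of characters that a CpC feature sees as (previous, current) at position loc + 1. The sketch below only illustrates that overlap on a plain string. The class and method names are hypothetical, and the sketch says nothing about which features the real methods emit.

```java
class CliqueOverlapSketch {
    // Character pair a hypothetical CpC feature would see at loc:
    // (previous, current). Requires loc >= 1.
    static String pairCpC(String s, int loc) {
        return s.substring(loc - 1, loc + 1);
    }

    // Character pair a hypothetical CnC feature would see at loc:
    // (current, next). Requires loc + 1 < s.length().
    static String pairCnC(String s, int loc) {
        return s.substring(loc, loc + 2);
    }

    public static void main(String[] args) {
        // Four Chinese characters written as Unicode escapes (ASCII-only source).
        String s = "\u4e2d\u6587\u5206\u8bcd";
        for (int loc = 0; loc + 1 < s.length(); loc++) {
            // The CnC pair at loc and the CpC pair at loc + 1 cover the same text.
            System.out.println(pairCnC(s, loc) + " / " + pairCpC(s, loc + 1));
        }
    }
}
```
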
Methods inherited from class FeatureFactory: addAllInterningAndSuffixing, eachClique, getCliques, getCliques, getWord

public Gale2007ChineseSegmenterFeatureFactory()

public void init(SeqClassifierFlags flags)
init in class FeatureFactory<IN extends CoreLabel>

public java.util.Collection<java.lang.String> getCliqueFeatures(PaddedList<IN> cInfo, int loc, Clique clique)
getCliqueFeatures in class FeatureFactory<IN extends CoreLabel>
Parameters:
cInfo - The complete data set as a List of WordInfo
loc - The index at which to extract features.
clique - The particular clique for which to extract features. It should be a member of the knownCliques list.
Returns:
Collection of the features calculated for the word at the specified position in info.

protected java.util.Collection<java.lang.String> featuresC(PaddedList<? extends CoreLabel> cInfo, int loc)

protected java.util.Collection<java.lang.String> featuresCpC(PaddedList<? extends CoreLabel> cInfo, int loc)

protected java.util.Collection<java.lang.String> featuresCnC(PaddedList<? extends CoreLabel> cInfo, int loc)
Parameters:
cInfo - The list of characters
loc - Position of c in list

protected java.util.Collection<java.lang.String> featuresCpCp2C(PaddedList<? extends CoreLabel> cInfo, int loc)
Parameters:
cInfo - The list of characters
loc - Position of c in list

protected java.util.Collection<java.lang.String> featuresCpCp2Cp3C(PaddedList<? extends CoreLabel> cInfo, int loc)
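
getCliqueFeatures takes a clique and returns the features for that clique at loc, and the class exposes one featuresXxx method per clique. A minimal sketch of such a dispatch follows, assuming one extractor per clique. The `Clique` enum, the placeholder extractor bodies, the `<pad>` symbol, and the `|` plus clique-name suffix are illustrative assumptions and not the library's actual types or output format.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class CliqueDispatchSketch {
    // Hypothetical stand-in for the library's Clique objects.
    enum Clique { C, CpC, CnC, CpCp2C, CpCp2Cp3C }

    // Placeholder extractor for the current-position clique.
    static Collection<String> featuresC(List<String> chars, int loc) {
        return List.of(chars.get(loc) + "-c");
    }

    // Placeholder extractor for the previous+current clique.
    static Collection<String> featuresCpC(List<String> chars, int loc) {
        String p = loc > 0 ? chars.get(loc - 1) : "<pad>";
        return List.of(p + chars.get(loc) + "-pc");
    }

    // Picks the extractor for the requested clique, then tags every
    // feature with the clique name so features from different cliques
    // never collide.
    static Collection<String> getCliqueFeatures(List<String> chars, int loc, Clique clique) {
        Collection<String> raw;
        switch (clique) {
            case C:
                raw = featuresC(chars, loc);
                break;
            case CpC:
                raw = featuresCpC(chars, loc);
                break;
            default:
                raw = List.of(); // remaining cliques omitted in this sketch
        }
        Collection<String> out = new ArrayList<>();
        for (String f : raw) {
            out.add(f + "|" + clique.name());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes (ASCII-only source).
        List<String> chars = List.of("\u4e2d", "\u6587");
        System.out.println(getCliqueFeatures(chars, 1, Clique.C));
        System.out.println(getCliqueFeatures(chars, 1, Clique.CpC));
    }
}
```

Tagging each feature with its clique keeps the per-clique feature spaces separate, which is one plausible reason for a suffixing helper such as the inherited addAllInterningAndSuffixing.
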