public class AnCoraProcessor
extends java.lang.Object
- Multi-word token expansion (see MultiWordPreprocessor,
  SpanishTreeNormalizer.normalizeForMultiWord(Tree, TreeFactory))
- Heuristic parsing of expanded multi-word tokens (see
  MultiWordTreeExpander)
- Splitting of elided forms (al, del, conmigo, etc.) and clitic
  pronouns from verb forms (see
  SpanishTreeNormalizer.expandElisions(Tree),
  SpanishTreeNormalizer.expandCliticPronouns(Tree))
- Miscellaneous cleanup of parse trees, spelling fixes, parsing
  error corrections (see SpanishTreeNormalizer)
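
To make the elision-splitting step concrete, here is a minimal, self-contained sketch of the idea at the token level. It is not the code in SpanishTreeNormalizer, which operates on Tree objects. The class name `ElisionSketch`, its lookup table, and the chosen expansions are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hypothetical token-level view of elision splitting.
// The real processor rewrites parse trees, not flat token lists.
final class ElisionSketch {

    // Hypothetical lookup table (an assumption, not taken from the library).
    // "m\u00ed" is "mi" with an accented i, written as a Unicode escape.
    private static final Map<String, List<String>> ELISIONS = Map.of(
        "al", List.of("a", "el"),
        "del", List.of("de", "el"),
        "conmigo", List.of("con", "m\u00ed"));

    // Returns the expansion of one token, or the token itself if it is not
    // in the table. Matching is case-insensitive; the expansion is always
    // lower-case, so the original casing of an elided form is not preserved.
    static List<String> expand(String token) {
        List<String> parts = ELISIONS.get(token.toLowerCase(Locale.ROOT));
        return parts == null ? List.of(token) : parts;
    }

    // Expands every token in a sentence, keeping the original order.
    static List<String> expandAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(expand(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: [Voy, a, el, mercado, de, el, pueblo]
        System.out.println(expandAll(List.of("Voy", "al", "mercado", "del", "pueblo")));
    }
}
```

A table lookup is enough for a closed set of elided forms. Clitic pronouns attached to verb forms are an open set, so they would need morphological rules rather than a fixed map.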
Apart from raw corpus data, this processor depends upon unigram
part-of-speech tag data. If not provided explicitly to the
processor, the data will be collected from the given files. (You can
pre-compute POS data from AnCora XML using AnCoraPOSStats.)
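
As a rough picture of what unigram part-of-speech tag data can look like, the sketch below counts how often each word appears with each tag and reports the most frequent tag per word. The class `UnigramTagStats`, its methods, and the tag names are hypothetical. This is not the data structure or file format produced by AnCoraPOSStats.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hypothetical word -> (tag -> count) table.
final class UnigramTagStats {

    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Records one observation of a word with a tag.
    void observe(String word, String tag) {
        counts.computeIfAbsent(word, w -> new HashMap<>())
              .merge(tag, 1, Integer::sum);
    }

    // Returns the most frequent tag for a word, or null if the word was
    // never observed. Ties go to the lexicographically smaller tag so the
    // result does not depend on map iteration order.
    String mostFrequentTag(String word) {
        Map<String, Integer> tags = counts.get(word);
        if (tags == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : tags.entrySet()) {
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        UnigramTagStats stats = new UnigramTagStats();
        stats.observe("como", "VERB");
        stats.observe("como", "SCONJ");
        stats.observe("como", "SCONJ");
        // Prints: SCONJ
        System.out.println(stats.mostFrequentTag("como"));
    }
}
```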
For invocation options, execute the class with no arguments.

| Modifier and Type | Field and Description |
|---|---|
| static java.util.HashSet<java.lang.String> | auxTagConversion |
| static java.util.HashSet<java.lang.String> | potentialAUXWords |

| Constructor and Description |
|---|
| AnCoraProcessor(java.util.List<java.io.File> inputFiles, java.util.Properties options) |

| Modifier and Type | Method and Description |
|---|---|
| static void | convertTreeTagsToUD(Tree tree) |
| static void | main(java.lang.String[] args) |
| java.util.List<Tree> | process() |

public static java.util.HashSet<java.lang.String> auxTagConversion

public static java.util.HashSet<java.lang.String> potentialAUXWords

public AnCoraProcessor(java.util.List<java.io.File> inputFiles,
                       java.util.Properties options)
                throws java.io.IOException,
                       java.lang.ClassNotFoundException
Throws: java.io.IOException, java.lang.ClassNotFoundException

public java.util.List<Tree> process()
                throws java.lang.InterruptedException,
                       java.io.IOException,
                       java.util.concurrent.ExecutionException
Throws: java.lang.InterruptedException, java.io.IOException, java.util.concurrent.ExecutionException

public static void convertTreeTagsToUD(Tree tree)

public static void main(java.lang.String[] args)
                throws java.lang.InterruptedException,
                       java.io.IOException,
                       java.util.concurrent.ExecutionException,
                       java.lang.ClassNotFoundException
Throws: java.lang.InterruptedException, java.io.IOException, java.util.concurrent.ExecutionException, java.lang.ClassNotFoundException