public class Ssurgeon
extends java.lang.Object
<ssurgeon-pattern-list>
<ssurgeon-pattern>
<uid>...</uid>
<notes>...</notes>
<semgrex>...</semgrex>
<language>...</language>
<edit-list>...</edit-list>
</ssurgeon-pattern>
</ssurgeon-pattern-list>
The uid is the id of the Ssurgeon operation. notes are comments on the Ssurgeon. semgrex is a Semgrex pattern to use when matching for this operation. edit-list is the actual Ssurgeon operation to execute. language is an optional field that determines which language formalism to use when making new dependencies. By default it will be English for SD when using the Java API, although most people probably want UniversalEnglish for UD (including non-English UD datasets).
Operations and their arguments include:
addEdge -gov node1 -dep node2 -reln depType -weight 0.5
relabelNamedEdge -edge edgename -reln depType
removeEdge -gov node1 -dep node2 -reln depType
removeNamedEdge -edge edgename
reattachNamedEdge -edge edgename -gov gov -dep dep
addDep -gov node1 -reln depType -position where ...attributes...
editNode -node node ...attributes...
setRoots n1 (n2 n3 ...)
killAllIncomingEdges -node node
deleteGraphFromNode -node node
killNonRootedNodes
addEdge adds a new edge between two existing nodes.
-gov and -dep will be nodes matched by the Semgrex pattern.
-reln is the name of the dependency type to add.
relabelNamedEdge changes the dependency type of a named edge.
-edge is the name of the edge in the Semgrex pattern.
-reln is the name of the dependency type to use.
removeEdge deletes an edge based on its description.
-gov is the governor to delete, a named node from the Semgrex pattern.
-dep is the dependent to delete, a named node from the Semgrex pattern.
-reln is the name of the dependency to delete.
If -gov or -dep are left empty, then all (matching) edges to or from the
remaining argument will be deleted.
removeNamedEdge deletes an edge based on its name.
-edge is the name of the edge in the Semgrex pattern.
reattachNamedEdge changes an edge's gov and/or dep based on its name.
-edge is the name of the edge in the Semgrex pattern.
-gov is the governor to attach to, a named node from the Semgrex pattern. If left blank, no edit.
-dep is the dependent to attach to, a named node from the Semgrex pattern. If left blank, no edit.
At least one of -gov or -dep must be set.
addDep adds a word and a dependency arc to the dependency graph.
-gov is the governor to attach to, a named node from the Semgrex pattern.
-reln is the name of the dependency type to use.
-position is where in the sentence the word should go. - will be the first word of the sentence,
+ will be the last word of the sentence, and -node or +node will be before or after the
named node.
...attributes... means any attributes which can be set from a string or numerical value,
e.g., -text ... sets the text of the word,
-pos ... sets the xpos of the word, -cpos ... sets the upos of the word, etc.
You cannot set the index of a word this way; an exception will be thrown.
To put whitespace in an attribute, you can quote it.
So, for example, a Vietnamese word can be set as -word "xin chào"
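The quoting rule can be illustrated with a small stand-alone tokenizer. This is only a sketch of the idea, not Ssurgeon's actual parser: the class `EditLineTokens` and its `split` method are hypothetical names, and the sketch assumes plain double quotes with no escape sequences. The Vietnamese word from the example above is written with a Unicode escape so the source stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Ssurgeon: splits an edit line on whitespace, keeping quoted spans whole. */
final class EditLineTokens {
  static List<String> split(String line) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    boolean hasToken = false; // true once the current token has started
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == '"') {
        // Quotes toggle quoted mode and are not copied into the token.
        inQuotes = !inQuotes;
        hasToken = true;
      } else if (Character.isWhitespace(c) && !inQuotes) {
        // Unquoted whitespace ends the current token, if any.
        if (hasToken) {
          tokens.add(current.toString());
          current.setLength(0);
          hasToken = false;
        }
      } else {
        current.append(c);
        hasToken = true;
      }
    }
    if (hasToken) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // The last token is the two-word value "xin ch\u00e0o", kept as one attribute value.
    List<String> tokens = split("addDep -gov antennae -reln dep -word \"xin ch\u00e0o\"");
    System.out.println(tokens.size() + " tokens, last = " + tokens.get(tokens.size() - 1));
  }
}
```

Without the quotes, the same line would split into eight tokens and the second half of the word would be read as a stray argument.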
editNode will edit the attributes of a word.
-node is the node to edit.
...attributes... are the attributes to change, same as with addDep
setRoots sets the roots of the sentence to a new root.
n1, n2, ... are the names of the nodes from the Semgrex to use as the root(s).
This is best done in conjunction with other operations which actually manipulate the structure
of the graph, or the new root will weirdly have dependents and the graph will be incorrect.
killAllIncomingEdges deletes all edges to a node.
-node is the node to edit.
Note that this is the same as removeEdge with only the dependent set.
deleteGraphFromNode deletes all nodes reachable from a specific node.
-node is the node to delete.
You will only want to do this after separating the node from the parts of the graph you want to keep.
killNonRootedNodes searches the graph and deletes all nodes which have no path to a root.
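The effect of killNonRootedNodes can be pictured as a reachability check. The sketch below is a toy model on plain Java collections, not the CoreNLP implementation: `RootedNodes`, `reachable`, and `unrooted` are hypothetical names, and it assumes that "a path to a root" means the node can be reached from a root by following governor-to-dependent edges.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a dependency graph: each governor maps to the list of its dependents. */
final class RootedNodes {
  /** Breadth-first search from the roots along governor -> dependent edges. */
  static Set<String> reachable(Map<String, List<String>> dependents, List<String> roots) {
    Set<String> seen = new HashSet<>(roots);
    Deque<String> queue = new ArrayDeque<>(roots);
    while (!queue.isEmpty()) {
      String node = queue.poll();
      for (String dep : dependents.getOrDefault(node, List.of())) {
        if (seen.add(dep)) {
          queue.add(dep);
        }
      }
    }
    return seen;
  }

  /** Returns, in sorted order, the nodes that no root can reach; these are the deletion candidates. */
  static Set<String> unrooted(Set<String> allNodes, Map<String, List<String>> dependents, List<String> roots) {
    Set<String> doomed = new TreeSet<>(allNodes);
    doomed.removeAll(reachable(dependents, roots));
    return doomed;
  }

  public static void main(String[] args) {
    // "clean" has been detached from "easy", so "clean" and its dependent "to" are unrooted.
    Map<String, List<String>> deps = Map.of("easy", List.of("Hers", "is"), "clean", List.of("to"));
    Set<String> nodes = Set.of("easy", "Hers", "is", "clean", "to");
    System.out.println(unrooted(nodes, deps, List.of("easy"))); // prints [clean, to]
  }
}
```

This also shows why the operation pairs naturally with removeEdge or killAllIncomingEdges: once a subtree is cut loose, one pass removes everything that was hanging from it.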
A practical example comes from the UD_English-Pronouns
dataset, where some words had both nsubj and csubj
dependencies:
1 Hers hers PRON PRP Gender=Fem|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3 nsubj _ _
2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 cop _ _
3 easy easy ADJ JJ Degree=Pos 0 root _ _
4 to to PART TO _ 5 mark _ _
5 clean clean VERB VB VerbForm=Inf 3 csubj _ SpaceAfter=No
6 . . PUNCT . _ 5 punct _ _
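The double-subject problem in this sentence can be confirmed mechanically. The following sketch uses only the JDK; `DoubleSubjectCheck` and `headsWithBothSubjects` are hypothetical names, and it assumes each token row carries the ten CoNLL-U columns separated by whitespace, with the head in column 7 and the relation in column 8.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Stand-alone check, independent of CoreNLP, for heads with both nsubj and csubj dependents. */
final class DoubleSubjectCheck {
  static List<String> headsWithBothSubjects(List<String> conlluRows) {
    // Collect the set of relations governed by each head id.
    Map<String, Set<String>> relationsByHead = new TreeMap<>();
    for (String row : conlluRows) {
      String[] cols = row.trim().split("\\s+");
      if (cols.length != 10) {
        throw new IllegalArgumentException("expected 10 columns: " + row);
      }
      String head = cols[6];
      String deprel = cols[7];
      relationsByHead.computeIfAbsent(head, k -> new HashSet<>()).add(deprel);
    }
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : relationsByHead.entrySet()) {
      if (entry.getValue().contains("nsubj") && entry.getValue().contains("csubj")) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> rows = List.of(
        "1 Hers hers PRON PRP Gender=Fem|Number=Sing|Person=3|Poss=Yes|PronType=Prs 3 nsubj _ _",
        "2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 cop _ _",
        "3 easy easy ADJ JJ Degree=Pos 0 root _ _",
        "4 to to PART TO _ 5 mark _ _",
        "5 clean clean VERB VB VerbForm=Inf 3 csubj _ SpaceAfter=No",
        "6 . . PUNCT . _ 5 punct _ _");
    System.out.println(headsWithBothSubjects(rows)); // prints [3]
  }
}
```

Token 3 ("easy") governs both "Hers" as nsubj and "clean" as csubj, which is the configuration the Semgrex pattern is written to find.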
We can update this with the following Semgrex/Ssurgeon pair:
{}=source >nsubj {} >csubj=bad {}
relabelNamedEdge -edge bad -reln advcl
The result is that the csubj edge is relabeled as advcl.
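The pair above can be placed into the XML layout shown at the top of this page. The sketch below builds such a document as a string and reads it back with the JDK's DOM parser, so it runs without CoreNLP. The class and method names are hypothetical, the uid and notes values are placeholders, and it assumes the edit text goes directly inside edit-list as the skeleton suggests. The Semgrex is XML-escaped because relations such as >nsubj contain '>'.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper that writes and re-reads one pattern in the ssurgeon-pattern-list layout. */
final class PatternFileSketch {
  /** Escapes the characters that are unsafe in XML text. */
  static String escapeXml(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  /** Wraps one Semgrex/Ssurgeon pair in the pattern-list layout. */
  static String toPatternXml(String uid, String notes, String language, String semgrex, String edit) {
    return String.join("\n",
        "<ssurgeon-pattern-list>",
        "  <ssurgeon-pattern>",
        "    <uid>" + escapeXml(uid) + "</uid>",
        "    <notes>" + escapeXml(notes) + "</notes>",
        "    <semgrex>" + escapeXml(semgrex) + "</semgrex>",
        "    <language>" + escapeXml(language) + "</language>",
        "    <edit-list>" + escapeXml(edit) + "</edit-list>",
        "  </ssurgeon-pattern>",
        "</ssurgeon-pattern-list>");
  }

  /** Returns the unescaped text of the first element with the given tag. */
  static String tagText(String xml, String tag) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Element element = (Element) doc.getElementsByTagName(tag).item(0);
      return element.getTextContent();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse pattern XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = toPatternXml("1", "relabel csubj when nsubj is also present", "UniversalEnglish",
        "{}=source >nsubj {} >csubj=bad {}", "relabelNamedEdge -edge bad -reln advcl");
    System.out.println(xml);
    System.out.println(tagText(xml, "semgrex")); // prints {}=source >nsubj {} >csubj=bad {}
  }
}
```

A string built this way is the kind of input that readFromString accepts; whether a given CoreNLP version needs further fields is not something this sketch verifies.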
For the most part, each of these operations is already bomb-proof,
that is, the pattern will execute once and not repeat on the same part of
the same dependency graph.
However, in the case of addDep, it is not possible to automatically bomb-proof the command,
as certain sentences may legitimately have multiple words with the same attributes as dependents
of the same governor. In this case, it is necessary to make the Semgrex pattern itself bomb-proof.
As an example, if the intent is to change "Jennifer has lovely antennae" to "Jennifer has lovely blue antennae", the following command would "bomb":
{word:antennae}=antennae
addDep -gov antennae -reln dep -word blue
The following would not:
{word:antennae}=antennae !> {word:blue}
addDep -gov antennae -reln dep -word blue
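The difference between the two patterns can be simulated with a toy loop. This is a sketch of the failure mode only, not of how Ssurgeon iterates: `BombProofSketch` and `applyUntilNoMatch` are hypothetical names, the dependents of "antennae" are modeled as a plain list of words, and the iteration cap exists only so the demonstration terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model: keep adding a dependent while the pattern still matches. */
final class BombProofSketch {
  /** Returns how many times the edit ran before the pattern stopped matching or the cap was hit. */
  static int applyUntilNoMatch(List<String> dependents, Predicate<List<String>> matches,
                               String word, int maxIterations) {
    int applied = 0;
    while (applied < maxIterations && matches.test(dependents)) {
      dependents.add(word);
      applied++;
    }
    return applied;
  }

  public static void main(String[] args) {
    int cap = 5; // arbitrary safety cap for the demonstration

    // Like {word:antennae}=antennae : still matches after every edit, so it "bombs".
    List<String> naive = new ArrayList<>(List.of("lovely"));
    System.out.println(applyUntilNoMatch(naive, deps -> true, "blue", cap)); // prints 5

    // Like {word:antennae}=antennae !> {word:blue} : stops matching once "blue" is attached.
    List<String> guarded = new ArrayList<>(List.of("lovely"));
    System.out.println(applyUntilNoMatch(guarded, deps -> !deps.contains("blue"), "blue", cap)); // prints 1
  }
}
```

The unguarded version runs until the cap, while the guarded version applies the edit exactly once, which is the behavior the negated relation in the Semgrex pattern is meant to secure.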
| Modifier and Type | Class and Description |
|---|---|
| static class | Ssurgeon.ArgsBox |
| static class | Ssurgeon.RUNTYPE |
| protected static class | Ssurgeon.SsurgeonArgs |
| Modifier and Type | Field and Description |
|---|---|
| protected static Ssurgeon.ArgsBox | argsBox |
| static java.lang.String | DEP_NODENAME_ARG |
| static java.lang.String | EDGE_NAME_ARG |
| static java.lang.String | GOV_NODENAME_ARG |
| static java.lang.String | NAME_ARG |
| static java.lang.String | NODE_PROTO_ARG |
| static java.lang.String | NODENAME_ARG |
| static java.lang.String | POSITION_ARG |
| static java.lang.String | RELN_ARG |
| static java.lang.String | WEIGHT_ARG |
| Modifier and Type | Method and Description |
|---|---|
| static SsurgPred | assemblePredFromXML(org.w3c.dom.Element elt) Constructs a SsurgPred structure from file, given the root element. |
| java.util.Collection<SemanticGraph> | exhaustFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) Similar to the expandFromPatterns, but performs an exhaustive search, performing simplifications on the graphs until exhausted. |
| java.util.List<SemanticGraph> | expandFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) Given a list of SsurgeonPattern edit scripts, and a SemanticGraph to operate over, returns a list of expansions of that graph, with the result of each edit applied against a copy of the graph. |
| static java.lang.String | getEltText(org.w3c.dom.Element element) For a given Element, treats the first child as a text element and returns its value. |
| static SsurgeonPattern | getOperationFromFile(java.lang.String path) Given a path to a file, converts it into a SsurgeonPattern. TODO: finish implementing this stub. |
| SsurgeonWordlist | getResource(java.lang.String id) Returns the given resource with the id. |
| java.util.Collection<SsurgeonWordlist> | getResources() |
| static java.lang.String | getTagText(org.w3c.dom.Element element, java.lang.String tag) For the given element, returns the text for the first child Element with the given tag. |
| void | initLog(java.io.File logFilePath) |
| static Ssurgeon | inst() |
| static void | main(java.lang.String[] args) Performs a simple test and print of a given file. |
| static SsurgeonEdit | parseEditLine(java.lang.String editLine, java.util.Map<java.lang.String,java.lang.String> attributeArgs, Language language) Given a string entry, converts it into a SsurgeonEdit object. |
| java.util.List<SsurgeonPattern> | readFromDirectory(java.io.File dir) Reads all Ssurgeon patterns from file. |
| java.util.List<SsurgeonPattern> | readFromDocument(org.w3c.dom.Document doc) |
| java.util.List<SsurgeonPattern> | readFromFile(java.io.File file) Given a path to a file containing a list of SsurgeonPatterns, returns TODO: deal with resources |
| java.util.List<SsurgeonPattern> | readFromString(java.lang.String text) |
| void | setLogPrefix(java.lang.String logPrefix) |
| static SsurgeonPattern | ssurgeonPatternFromXML(org.w3c.dom.Element elt) Given the root Element for a SsurgeonPattern (SSURGEON_ELEM_TAG), converts it into its corresponding SsurgeonPattern object. |
| void | testRead(java.io.File tgtDirPath) Reads in the test file and prints readable to string (for debugging). |
| static void | writeToFile(java.io.File tgtFile, java.util.List<SsurgeonPattern> patterns) Given a target filepath and a list of Ssurgeon patterns, writes them out as XML forms. |
| static java.lang.String | writeToString(SsurgeonPattern pattern) |
public static final java.lang.String GOV_NODENAME_ARG
public static final java.lang.String DEP_NODENAME_ARG
public static final java.lang.String EDGE_NAME_ARG
public static final java.lang.String NODENAME_ARG
public static final java.lang.String RELN_ARG
public static final java.lang.String NODE_PROTO_ARG
public static final java.lang.String WEIGHT_ARG
public static final java.lang.String NAME_ARG
public static final java.lang.String POSITION_ARG
protected static Ssurgeon.ArgsBox argsBox
public static Ssurgeon inst()
public void initLog(java.io.File logFilePath) throws java.io.IOException
Throws: java.io.IOException
public void setLogPrefix(java.lang.String logPrefix)
public java.util.List<SemanticGraph> expandFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) throws java.lang.Exception
Throws: java.lang.Exception
public java.util.Collection<SemanticGraph> exhaustFromPatterns(java.util.List<SsurgeonPattern> patternList, SemanticGraph sg) throws java.lang.Exception
Throws: java.lang.Exception
public static SsurgeonPattern getOperationFromFile(java.lang.String path)
public SsurgeonWordlist getResource(java.lang.String id)
public java.util.Collection<SsurgeonWordlist> getResources()
public static SsurgeonEdit parseEditLine(java.lang.String editLine, java.util.Map<java.lang.String,java.lang.String> attributeArgs, Language language)
public static void writeToFile(java.io.File tgtFile,
java.util.List<SsurgeonPattern> patterns)
public static java.lang.String writeToString(SsurgeonPattern pattern)
public java.util.List<SsurgeonPattern> readFromString(java.lang.String text)
public java.util.List<SsurgeonPattern> readFromFile(java.io.File file)
public java.util.List<SsurgeonPattern> readFromDocument(org.w3c.dom.Document doc)
public java.util.List<SsurgeonPattern> readFromDirectory(java.io.File dir) throws java.lang.Exception
Throws: java.lang.Exception
public static SsurgeonPattern ssurgeonPatternFromXML(org.w3c.dom.Element elt)
Throws: java.lang.Exception
public static SsurgPred assemblePredFromXML(org.w3c.dom.Element elt)
Constructs a SsurgPred structure from file, given the root element.
Throws: java.lang.Exception
public void testRead(java.io.File tgtDirPath) throws java.lang.Exception
Throws: java.lang.Exception
public static java.lang.String getTagText(org.w3c.dom.Element element, java.lang.String tag)
public static java.lang.String getEltText(org.w3c.dom.Element element)
public static void main(java.lang.String[] args)