java.lang.Object
  de.tuebingen.uni.sfs.germanet.relatedness.Statistics
public class Statistics
Calculates some values used in the Relatedness class for the current GermaNet version (GN 8.0).
Constructor Summary | |
---|---|
Statistics()
|
Method Summary | |
---|---|
static double |
correlationBetweenTwoLists(java.lang.String file1,
java.lang.String file2,
int index,
java.lang.String encoding,
java.lang.String separator,
double min,
double max,
boolean includeUnknown)
Calculates Pearson's correlation between values from two files with relatedness values for the same word pairs; order does not matter. |
static int |
getLeskMax(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet,
boolean oneSense,
int size,
int limit,
boolean hypernymsOnly,
boolean includeGloss)
NO LONGER IN USE. |
static int |
getMaxDepth(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Calculates the maximum depth by finding all the leaves and comparing their distance to the root (edge counting). |
static int |
getMaxGlossLength(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Retrieves the maximum number of words in any GermaNet gloss (currently 33). |
static int |
getMaxHypernyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Retrieves the maximum number of hypernyms of any GermaNet Synset (currently 6). |
static int |
getMaxHyponyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Retrieves the maximum number of hyponyms of any GermaNet Synset (currently ). |
static double |
getMaxJcnValue(java.util.HashMap<java.lang.String,java.lang.Long> frequencies)
Finds the maximum possible 'distance' (sum of information content values) used in the Jiang & Conrath relatedness measure. This is the distance between two leaf nodes with the highest IC (information content) that have the root as their LCS (least common subsumer): max_IC + max_IC - 2*0.0 = 2*max_IC. Assuming that a leaf has the assigned default minimal frequency of 1, max_IC = -log(1/rootFreq), which is approx. 37.51 for the current version and frequency files. |
static int |
getMaxLeskValue(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
NO LONGER IN USE. |
static int |
getMaxOrthForms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Retrieves the maximum number of orthForms of any GermaNet Synset (currently 18). |
static int |
getMaxRelsNoHyponyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Retrieves the maximum number of relations of any GermaNet Synset, excluding hyponymy (currently 65). |
static int |
getMaxShortestPath(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
Returns the length of the shortest path between the two Synsets that are farthest apart. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Statistics()
Method Detail |
---|
public static int getMaxDepth(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
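The edge-counting idea behind getMaxDepth can be illustrated on a toy graph. The following is only a sketch, not the class's actual implementation: it does not use the GermaNet API, the class name MaxDepthSketch and the map-based graph are hypothetical, and it assumes that "distance to the root" means the shortest hypernym path when a node has several hypernyms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a synset graph: each node maps to its direct hypernyms.
class MaxDepthSketch {

    // Number of edges on the shortest hypernym path from node to the root (assumed acyclic).
    static int distanceToRoot(String node, Map<String, List<String>> hypernyms,
                              Map<String, Integer> memo) {
        Integer cached = memo.get(node);
        if (cached != null) {
            return cached;
        }
        List<String> parents = hypernyms.getOrDefault(node, List.of());
        int result = 0; // a node without hypernyms is treated as the root
        if (!parents.isEmpty()) {
            int best = Integer.MAX_VALUE;
            for (String parent : parents) {
                best = Math.min(best, distanceToRoot(parent, hypernyms, memo));
            }
            result = best + 1;
        }
        memo.put(node, result);
        return result;
    }

    // Finds all leaves (nodes that are nobody's hypernym) and returns their largest distance to the root.
    static int maxDepth(Map<String, List<String>> hypernyms) {
        Set<String> nonLeaves = new HashSet<>();
        for (List<String> parents : hypernyms.values()) {
            nonLeaves.addAll(parents);
        }
        Map<String, Integer> memo = new HashMap<>();
        int max = 0;
        for (String node : hypernyms.keySet()) {
            if (!nonLeaves.contains(node)) {
                max = Math.max(max, distanceToRoot(node, hypernyms, memo));
            }
        }
        return max;
    }

    // Toy data: d has two hypernyms (c and b); e hangs below c.
    static Map<String, List<String>> toyGraph() {
        return Map.of(
                "root", List.of(),
                "a", List.of("root"),
                "b", List.of("root"),
                "c", List.of("a"),
                "d", List.of("c", "b"),
                "e", List.of("c"));
    }

    public static void main(String[] args) {
        System.out.println(maxDepth(toyGraph()));
    }
}
```

In the toy graph the leaves are d (two edges from the root via b) and e (three edges via c and a), so the sketch reports a maximum depth of 3.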
public static int getMaxShortestPath(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
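One way to picture the value returned by getMaxShortestPath is as the largest of all pairwise shortest-path lengths. The sketch below is an illustration under assumptions, not the library's code: the class MaxShortestPathSketch and its edge-list input are hypothetical, edges are treated as undirected (paths may go up and down the hierarchy), and the graph is assumed to be connected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: nodes are strings, edges are undirected hypernym/hyponym links.
class MaxShortestPathSketch {

    // Builds an undirected adjacency list from pairs of node names.
    static Map<String, List<String>> undirected(String[][] edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    // Breadth-first search: length of the longest shortest path starting at start.
    static int farthestDistance(String start, Map<String, List<String>> adj) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        int farthest = 0;
        while (!queue.isEmpty()) {
            String node = queue.remove();
            int d = dist.get(node);
            farthest = Math.max(farthest, d);
            for (String next : adj.getOrDefault(node, List.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return farthest;
    }

    // Runs one BFS per node and keeps the overall maximum.
    static int maxShortestPath(Map<String, List<String>> adj) {
        int max = 0;
        for (String node : adj.keySet()) {
            max = Math.max(max, farthestDistance(node, adj));
        }
        return max;
    }

    // Toy tree: e - c - a - root - b - d
    static String[][] toyEdges() {
        return new String[][] {
            {"root", "a"}, {"root", "b"}, {"a", "c"}, {"c", "e"}, {"b", "d"}
        };
    }

    public static void main(String[] args) {
        System.out.println(maxShortestPath(undirected(toyEdges())));
    }
}
```

For the toy tree, the two nodes farthest apart are e and d, connected by a path of 5 edges. Running one BFS per node is quadratic in the graph size, which is acceptable for a sketch but may be slow on a full lexical network.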
public static double getMaxJcnValue(java.util.HashMap<java.lang.String,java.lang.Long> frequencies)
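The maximum distance is the distance between two leaf nodes with the highest IC (information content) that have the root as their LCS (least common subsumer):
max_IC + max_IC - 2*0.0 = 2*max_IC
Assuming that a leaf has the assigned default minimal frequency of 1,
max_IC = -log(1/rootFreq)
which is approx. 37.51 for the current version and frequency files.
frequencies - HashMap holding the frequencies of all synsets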
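The arithmetic above can be sketched in a few lines. This is a hedged illustration rather than the method's actual body: the class MaxJcnSketch is hypothetical, the natural logarithm is assumed because the documentation does not state the base, and the root frequency is assumed to be the largest value in the map (which holds if frequencies are accumulated upward to the root).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the 2 * max_IC computation described in the text.
class MaxJcnSketch {

    static double maxJcnValue(Map<String, Long> frequencies) {
        // Assumption: the root carries the highest (cumulative) frequency.
        long rootFreq = Collections.max(frequencies.values());
        // A leaf with the minimal frequency 1 has IC = -log(1 / rootFreq).
        double maxIc = -Math.log(1.0 / rootFreq);
        // Two such leaves whose LCS is the root (IC 0.0): max_IC + max_IC - 2 * 0.0.
        return maxIc + maxIc - 2 * 0.0;
    }

    public static void main(String[] args) {
        // Made-up frequencies, for illustration only.
        Map<String, Long> frequencies = new HashMap<>();
        frequencies.put("root", 1000L);
        frequencies.put("inner", 600L);
        frequencies.put("leaf", 1L);
        System.out.println(maxJcnValue(frequencies));
    }
}
```

With the made-up root frequency of 1000, the result equals 2 * ln(1000); real values depend entirely on the frequency files in use.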
public static int getMaxRelsNoHyponyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getMaxHypernyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getMaxHyponyms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getMaxOrthForms(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getMaxGlossLength(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getMaxLeskValue(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet)
gnet - an instance of GermaNet
public static int getLeskMax(de.tuebingen.uni.sfs.germanet.api.GermaNet gnet, boolean oneSense, int size, int limit, boolean hypernymsOnly, boolean includeGloss)
gnet - an instance of GermaNet
public static double correlationBetweenTwoLists(java.lang.String file1, java.lang.String file2, int index, java.lang.String encoding, java.lang.String separator, double min, double max, boolean includeUnknown)
file1 - word pairs with relatedness values from one measure
file2 - word pairs with relatedness values from another measure
index - position of the value in the CSV file (0, 1, 2, ...). Must come after the word columns; must be the same for both files.
encoding - encoding of both files
separator - the character(s) used to separate words in the input files
min - smallest possible value in the distribution (e.g. 0)
max - largest possible value in the distribution (e.g. 4)
includeUnknown - if true, pairs including one or two words with -1 values (unknown to GermaNet) are included in the calculation; if false, the correlation is calculated based only on the pairs of known words.
WARNING: as is, this also excludes entries where the method failed due to different categories. These two cases still need to be distinguished.
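To make the calculation more concrete, here is a minimal sketch of Pearson's correlation over two sets of word-pair values. It is not the implementation of correlationBetweenTwoLists: the class PearsonSketch and its methods are hypothetical, file parsing is left out, the handling of min and max is not modeled, and the sketch only covers the includeUnknown = false case by skipping any pair with a -1 value. Word pairs are normalized to an order-independent key so that "Hund, Katze" in one list matches "Katze, Hund" in the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: align two value maps by word pair, then compute Pearson's r.
class PearsonSketch {

    // Order-independent key for a word pair.
    static String pairKey(String w1, String w2) {
        return w1.compareTo(w2) <= 0 ? w1 + "\t" + w2 : w2 + "\t" + w1;
    }

    // Pearson's r = cov(x, y) / (sd(x) * sd(y)), computed from deviations from the means.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long lists with at least 2 values");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Correlates the values of pairs present in both maps, skipping pairs marked -1 (unknown).
    static double correlateKnown(Map<String, Double> first, Map<String, Double> second) {
        List<double[]> kept = new ArrayList<>();
        for (Map.Entry<String, Double> entry : first.entrySet()) {
            Double other = second.get(entry.getKey());
            if (other == null || entry.getValue() == -1.0 || other == -1.0) {
                continue;
            }
            kept.add(new double[] {entry.getValue(), other});
        }
        double[] x = new double[kept.size()];
        double[] y = new double[kept.size()];
        for (int i = 0; i < kept.size(); i++) {
            x[i] = kept.get(i)[0];
            y[i] = kept.get(i)[1];
        }
        return pearson(x, y);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        System.out.println(pearson(x, y));
    }
}
```

Note that the sketch cannot tell why a value is -1, which mirrors the warning above: a word unknown to GermaNet and a measure that failed for another reason would both be skipped.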