Tool in Mylly: Simple statistics / Contingency tables as corner
Contingency tables as corner
TODO how often some values were observed together and how often with something else
TODO four counts determine the other five
Input and output relations are represented as TSV files.
- attribute names for first combination
- attribute names for second combination
Write X = x for the assertion that the first attributes X in the input relation have the particular values x, and X ≠ x for the assertion that they do not; similarly, write Y = y and Y ≠ y for the second attributes. For each observed combination x, y, the output relation then has the four schematically named counts cM12, cM1o, cMo2, cMss.
| | Y = y | Y ≠ y | ∑ | ∼ | | Y = y | Y ≠ y | ∑ |
| X = x | cM12 | cM1o | | | X = x | cM12 | cM1o | cM1s |
| X ≠ x | cMo2 | | | | X ≠ x | cMo2 | cMoo | cMos |
| ∑ | | | cMss | | ∑ | cMs2 | cMso | cMss |
The four numbers on the left determine the full table on the right.
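The arithmetic behind this can be sketched in a few lines of Python. The function name `complete_table` is hypothetical and not part of Mylly, and the names cMs2 and cMso for the two column sums are assumed here by analogy with cM1s and cMos.

```python
def complete_table(cM12, cM1o, cMo2, cMss):
    """Derive the five remaining counts from the four given ones.

    Hypothetical helper, not part of Mylly. The keys cMs2 and cMso
    (column sums) are assumed names, patterned after cM1s and cMos.
    """
    cM1s = cM12 + cM1o   # row sum: X = x, any Y
    cMos = cMss - cM1s   # row sum: X != x, any Y
    cMoo = cMos - cMo2   # X != x and Y != y
    cMs2 = cM12 + cMo2   # column sum: any X, Y = y
    cMso = cMss - cMs2   # column sum: any X, Y != y
    if min(cM12, cM1o, cMo2, cMoo) < 0:
        raise ValueError("the four counts do not describe a valid table")
    return {
        "cM12": cM12, "cM1o": cM1o, "cM1s": cM1s,
        "cMo2": cMo2, "cMoo": cMoo, "cMos": cMos,
        "cMs2": cMs2, "cMso": cMso, "cMss": cMss,
    }


# Made-up counts, for illustration only.
table = complete_table(cM12=3, cM1o=2, cMo2=4, cMss=20)
print(table["cMoo"], table["cM1s"], table["cMos"])
```

Each derived count is a sum or difference of the given ones, so no information beyond the four output attributes is needed.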
The output attributes that correspond to the selected input attributes are named by appending to each input name a suffix (of1 or of2) to indicate whether the attribute belongs to X (the first selection) or to Y (the second selection).
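As a rough illustration of the counting and the naming, the following Python sketch builds such records from a TSV relation. It is not the Mylly implementation: the function name `corner_counts` is hypothetical, the sample data is invented, and the sketch assumes that every input record counts as one observation.

```python
import csv
import io
from collections import Counter


def corner_counts(tsv_text, first, second):
    """Return one record per observed combination of X and Y values.

    Hypothetical sketch, not the Mylly tool. Assumes that each input
    record is one observation and that the TSV text has a header line.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    xs = [tuple(r[a] for a in first) for r in rows]
    ys = [tuple(r[a] for a in second) for r in rows]
    both = Counter(zip(xs, ys))   # X = x and Y = y
    x_total = Counter(xs)         # X = x, any Y
    y_total = Counter(ys)         # any X, Y = y
    out = []
    for (x, y), c12 in sorted(both.items()):
        # Suffix of1 marks the first selection, of2 the second.
        rec = {a + "of1": v for a, v in zip(first, x)}
        rec.update({a + "of2": v for a, v in zip(second, y)})
        rec.update(cM12=c12,
                   cM1o=x_total[x] - c12,
                   cMo2=y_total[y] - c12,
                   cMss=len(rows))
        out.append(rec)
    return out


# Invented sample relation, for illustration only.
sample = ("dept\treln\thead\n"
          "dog\tnsubj\tbarks\n"
          "dog\tnsubj\truns\n"
          "cat\tnsubj\truns\n"
          "dog\tobj\tsees\n")
for record in corner_counts(sample, ["dept", "reln"], ["head"]):
    print(record)
```

With `dept` and `reln` as the first selection and `head` as the second, the output records carry the attributes `deptof1`, `relnof1`, `headof2` and the four counts.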
TODO add here a concrete example from Mylly. Say, word and deprel against head word in dependency triple. TODO!
Show a couple of records from an actual relation.
The first record represents the following table.
| | headof2 = TODO | headof2 ≠ TODO | ∑ |
| (deptof1, relnof1) = (TO, DO) | TODO | TODO | |
| (deptof1, relnof1) ≠ (TO, DO) | TODO | | |
| ∑ | | | TODO |
The filled part of the table determines the remaining part.
There should be tools to compute cooccurrence statistics from the contingency tables. TODO!
- contingency tables as cells
- contingency tables as columns
- contingency tables as margins
- contingency tables as rows
- Some link