-
Notifications
You must be signed in to change notification settings - Fork 35
Characterizing table completeness
timrdf edited this page May 1, 2011
·
27 revisions
bash-3.2$ java edu.rpi.tw.data.csv.util.BinaryTable
usage: BinaryTable <file> [--comment-character char] [--header-line headerLineNumber] [--delimiter delimiter]
Column numbers along top, pattern occurrence frequency along right, completeness indication along bottom (for geonames US zip codes):
bash-3.2$ java edu.rpi.tw.data.csv.util.BinaryTable source/US.txt --header-line 0 --delimiter '\t'
123456789012
..... ...| 3
....... .. | 41940
..... . .. | 1
... . .. | 4
....... ...| 147
... ... .. | 10
..... .. | 1408
...... .. | 84
... . . .. | 5
|||_|____||_
- Script: cr test conversion.sh uses this to gray out columns to indicate which may have missing values, which helps design the enhancement parameters for geonames.