Characterizing table completeness

$ java edu.rpi.tw.data.csv.util.BinaryTable
usage: BinaryTable <file> [--comment-character char] [--header-line headerLineNumber] [--delimiter delimiter]
                          [--column-stop colNumber]
see https://github.com/timrdf/csv2rdf4lod-automation/wiki/Characterizing-table-completeness

Column numbers along top, pattern occurrence frequency along right, completeness indication along bottom (|: all there, _: some missing).

The following sample output is applied against geonames US zip codes). One of the things this says is "41,940 rows have values for all cells except for cells 8 and 9. Three rows have values for all cells except for cells 6, 7, 8 and 9.".

bash-3.2$ java edu.rpi.tw.data.csv.util.BinaryTable source/US.txt --header-line 0 --delimiter '\t'
123456789012
.....    ...| 3
.......  .. | 41940
..... .  .. | 1
... .    .. | 4
.......  ...| 147
... ...  .. | 10
.....    .. | 1408
......   .. | 84
... . .  .. | 5
|||_|____||_

Characterizing table completeness

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!