Example: White House Visitor Access Records
http://www.whitehouse.gov/briefing-room/disclosures/visitor-records has been providing links to data files describing visits to the White House. Those links have changed over the years, and the corresponding data provided has also changed. To deal with these changes, we cached the data to create the following versions (listed according to date of retrieval). The seven versions of the dataset are available from RPI's SVN.
- 0310
- 0510
- 0810
- 0910
- 2009-2010
- 0511
- 2011-Aug-26
This is the earliest version that we saved. We don't know where we got it, except that we promise that we followed the link that was listed at http://www.whitehouse.gov/briefing-room/disclosures/visitor-records. We weren't capturing any provenance.
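For contrast, here is a minimal sketch of the kind of provenance we could have recorded alongside each cached file. The function name `provenance_record` and the choice of fields are assumptions made for illustration, not what we actually used.

```python
import hashlib
from datetime import datetime, timezone


def provenance_record(source_url, content, retrieved_at=None):
    """Describe one cached download: where it came from, when, and a digest.

    `content` is the raw bytes of the downloaded file. The field names are
    illustrative only.
    """
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "source_url": source_url,
        "retrieved_at": retrieved_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }
```

Storing a record like this next to the cached copy would answer "where did we get it, and when?" even after the upstream link changes.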
The headers in the data file are listed below. We're pretty sure that WhiteHouse-WAVES-Key-1209.txt is a copy of the documentation available in March 2010.
NAMELAST
NAMEFIRST
NAMEMID
UIN
BDGNBR
ACCESS_TYPE
TOA
POA
TOD
POD
APPT_MADE_DATE
APPT_START_DATE
APPT_END_DATE
APPT_CANCEL_DATE
Total_People
LAST_UPDATEDBY
POST
LastEntryDate
TERMINAL_SUFFIX
visitee_namelast
visitee_namefirst
MEETING_LOC
MEETING_ROOM
CALLER_NAME_LAST
CALLER_NAME_FIRST
CALLER_ROOM
description
RELEASE_DATE
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0510.csv on 2010-07-08 and got this.
When comparing the 0310 headers to the 0510 headers, we only see a capitalization/underscore tweak:
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
description description
RELEASE_DATE | Release Date
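A header comparison like the one above can also be done programmatically. The following is only a sketch: `normalize` and `compare_headers` are hypothetical helpers, and treating a space as equivalent to an underscore is our assumption about what counts as "the same" column.

```python
def normalize(name):
    """Fold case and treat spaces like underscores: 'Release Date' -> 'release_date'."""
    return "_".join(name.strip().lower().split())


def compare_headers(old, new):
    """Classify headers as restyled (equal after normalizing), removed, or added."""
    old_by_key = {normalize(h): h for h in old}
    new_by_key = {normalize(h): h for h in new}
    restyled = [
        (old_by_key[k], new_by_key[k])
        for k in old_by_key
        if k in new_by_key and old_by_key[k] != new_by_key[k]
    ]
    removed = [h for k, h in old_by_key.items() if k not in new_by_key]
    added = [h for k, h in new_by_key.items() if k not in old_by_key]
    return {"restyled": restyled, "removed": removed, "added": added}


# Last three headers of 0310 and 0510, as listed above.
old_0310 = ["CALLER_ROOM", "description", "RELEASE_DATE"]
new_0510 = ["CALLER_ROOM", "description", "Release Date"]
print(compare_headers(old_0310, new_0510))
```

A restyled header keeps its meaning but breaks any code that looks columns up by exact name, so it is worth reporting separately from a column that was really added or removed.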
Although the data file had the string 0827 in its name (presumably a month and day), we used 0810 to be consistent with the "monthyear" convention that they had been following. Having set a pattern with the first releases, they changed it on the very next one!
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0827.csv on 2010-09-12 and got this.
When comparing the headers from 0510 to the headers of 0810, we see that they reverted to the older CAPS_UNDERSCORE naming for Release Date and decided that a capital Description looked nicer. Looks like we didn't save a copy of the header documentation this time; probably because the headers didn't change enough.
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
description | Description
Release Date | RELEASE_DATE
I really wish I knew who came up with, approved, and carried out the following ideas:
- anonymous government entity: "Hey! Let's switch to TABS!"
- anonymous government entity: "And we'll keep the .csv extension!"
No changes to the headers (once you deal with the tabs):
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
Description Description
RELEASE_DATE RELEASE_DATE
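One way to cope with a tab-delimited file that still carries a .csv extension is to choose the delimiter from the header line rather than from the file name. This is a sketch under that assumption; `detect_delimiter` and `read_rows` are hypothetical names, and it relies on the header names themselves containing neither commas nor tabs (true for the headers listed above).

```python
import csv
import io


def detect_delimiter(header_line, candidates=(",", "\t")):
    """Pick whichever candidate delimiter occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_rows(text):
    """Parse delimited text, trusting the first line rather than the extension."""
    first_line = text.splitlines()[0] if text else ""
    delimiter = detect_delimiter(first_line)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Header names are from the list above; the data row is a made-up placeholder.
sample = "NAMELAST\tNAMEFIRST\tUIN\nLAST\tFIRST\tX0\n"
print(read_rows(sample))
```

Python's standard library also offers `csv.Sniffer` for this, but counting delimiters in the header is easier to reason about when only two candidates are plausible.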
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-1210.zip on 2010-12-29 and got this, which uncompressed to this.
This appears to be some aggregate of the previous releases.
No header changes from 0910:
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM CALLER_ROOM
Description Description
RELEASE_DATE RELEASE_DATE
We retrieved http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0511.zip on 2011-05-27 and got this, which uncompressed to this.
Government censorship! They removed CALLER_ROOM in this release.
NAMELAST NAMELAST
NAMEFIRST NAMEFIRST
NAMEMID NAMEMID
UIN UIN
BDGNBR BDGNBR
ACCESS_TYPE ACCESS_TYPE
TOA TOA
POA POA
TOD TOD
POD POD
APPT_MADE_DATE APPT_MADE_DATE
APPT_START_DATE APPT_START_DATE
APPT_END_DATE APPT_END_DATE
APPT_CANCEL_DATE APPT_CANCEL_DATE
Total_People Total_People
LAST_UPDATEDBY LAST_UPDATEDBY
POST POST
LastEntryDate LastEntryDate
TERMINAL_SUFFIX TERMINAL_SUFFIX
visitee_namelast visitee_namelast
visitee_namefirst visitee_namefirst
MEETING_LOC MEETING_LOC
MEETING_ROOM MEETING_ROOM
CALLER_NAME_LAST CALLER_NAME_LAST
CALLER_NAME_FIRST CALLER_NAME_FIRST
CALLER_ROOM <
Description Description
RELEASE_DATE RELEASE_DATE
On 2011-09-14, http://www.whitehouse.gov/briefing-room/disclosures/visitor-records said:
- "To download Part 1 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 7.4MB)"
- "To download Part 2 of the data released in 2011 in its raw format, click here. (.zip of a .csv, 3.4MB)"
- "To download an explanation of the column headers contained in the raw data file, click here. (.txt, 1.3KB)"
This version was named 2011-Aug-26 for the following reasons:
- The White House is not providing a clear identifier for this version, like it had (implicitly) done in the past (e.g., 0310 and 0511). Although 2011 is the closest thing they provide now, we are concerned that this will not be distinctive enough in the future.
- Part 1 (http://www.whitehouse.gov/files/disclosures/visitors/WhiteHouse-WAVES-Released-0711b.zip) was Last-Modified 2011-Jul-29.
- Part 2 (http://www.whitehouse.gov/sites/default/files/visitors/whitehouse-waves-released-2011_part2.csv_.zip) was Last-Modified 2011-Aug-26.
- The most recent date of the two data files referenced from their page is the best distinctive identifier we have.
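The naming rule described in this list (take the most recent Last-Modified date among the files) can be sketched as follows. The function name `version_identifier` is hypothetical, and the sketch assumes the dates have already been reduced to the year-month-day style used above and that the default (English) locale is in effect for month abbreviations.

```python
from datetime import datetime

# Date style used for this version's name, e.g. 2011-Aug-26.
DATE_FORMAT = "%Y-%b-%d"


def version_identifier(last_modified_dates):
    """Return the most recent of the given dates, formatted like the inputs."""
    parsed = [datetime.strptime(d, DATE_FORMAT) for d in last_modified_dates]
    return max(parsed).strftime(DATE_FORMAT)


# Last-Modified dates of Part 1 and Part 2, as reported above.
print(version_identifier(["2011-Jul-29", "2011-Aug-26"]))  # prints 2011-Aug-26
```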
Cached versions of the URLs referenced above are available at:
- Part 1 expands to a csv (with a VERY disorienting name!)
- Part 2 expands to a csv
- Documentation for the column headers
The headers of Part 1 and Part 2 are inconsistent:
0 NAMELAST 0 NAMELAST
1 NAMEFIRST 1 NAMEFIRST
2 NAMEMID 2 NAMEMID
3 UIN 3 UIN
4 BDGNBR 4 BDGNBR
5 ACCESS_TYPE 5 ACCESS_TYPE
6 TOA 6 TOA
7 POA 7 POA
8 TOD 8 TOD
9 POD 9 POD
10 APPT_MADE_DATE 10 APPT_MADE_DATE
11 APPT_START_DATE 11 APPT_START_DATE
12 APPT_END_DATE 12 APPT_END_DATE
13 APPT_CANCEL_DATE 13 APPT_CANCEL_DATE
14 Total_People 14 Total_People
15 LAST_UPDATEDBY 15 LAST_UPDATEDBY
16 POST 16 POST
17 LastEntryDate 17 LastEntryDate
18 TERMINAL_SUFFIX 18 TERMINAL_SUFFIX
19 visitee_namelast 19 visitee_namelast
20 visitee_namefirst 20 visitee_namefirst
21 MEETING_LOC 21 MEETING_LOC
22 MEETING_ROOM 22 MEETING_ROOM
23 CALLER_NAME_LAST 23 CALLER_NAME_LAST
24 CALLER_NAME_FIRST 24 CALLER_NAME_FIRST
25 Description | 25 CALLER_ROOM
26 RELEASE_DATE | 26 description
27 | 27 release_date
28 28
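Before the two parts can be treated as one dataset, their rows need to be re-keyed onto a common set of column names. The sketch below shows one way to do that; `normalize`, `union_columns`, and `harmonize` are hypothetical helpers, and skipping blank header entries is our assumption about what the empty positions at the end of the comparison represent.

```python
def normalize(name):
    """Fold case and treat spaces like underscores."""
    return "_".join(name.strip().lower().split())


def union_columns(*headers):
    """Ordered union of normalized header names across files, skipping blanks."""
    seen = []
    for header in headers:
        for name in header:
            key = normalize(name)
            if key and key not in seen:
                seen.append(key)
    return seen


def harmonize(rows, all_columns):
    """Re-key each row by normalized header; columns a file lacks become None."""
    out = []
    for row in rows:
        by_key = {normalize(k): v for k, v in row.items()}
        out.append({col: by_key.get(col) for col in all_columns})
    return out


# Trailing headers of Part 1 and Part 2, as compared above.
part1 = ["CALLER_NAME_FIRST", "Description", "RELEASE_DATE"]
part2 = ["CALLER_NAME_FIRST", "CALLER_ROOM", "description", "release_date"]
print(union_columns(part1, part2))
```

Rows from Part 1 then carry `None` for `caller_room`, which keeps "this file did not have the column" distinguishable from "the cell was an empty string".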