class: center, middle, inverse, title-slide

# Data wrangling
## Campaign finance
### Ben Baumer
### SDS 192, Oct 7, 2020 (http://beanumber.github.io/sds192/lectures/mdsr_wrangling_07-fec16.html)

---
class: center, middle, inverse

![](https://raw.githubusercontent.com/baumer-lab/fec16/master/data-raw/Sticker/hex_fec16.png)

---
background-image: url("../gfx/fec16_poster.png")
background-size: contain

---

## `fec16`: the elections 🗳

```r
library(fec16)
```

.footnote[https://www.fec.gov/data/browse-data/?tab=bulk-data]

- `candidates`: master table of ~5,000 candidates
  - e.g., `CLINTON, HILLARY RODHAM / TIMOTHY MICHAEL KAINE`

--

- `committees`: master table of ~17,000 committees
  - e.g., `DONALD J. TRUMP FOR PRESIDENT, INC.`

--

  - e.g., ` KILLARY CLINTON`

--

  - May or may not be affiliated with a candidate
  - Lots of [different types](https://www.fec.gov/campaign-finance-data/committee-type-code-descriptions/) (e.g., PACs)

--

- Election results
  - `results_house`: more than 435 congressional elections
  - `results_senate`: ~33 Senatorial elections
  - `results_president`: state-by-state results

---

## `fec16`: the money 🤑

- `pac`: **Summary** of PAC activity
  - ~12,000 rows

--

- Sampled data: 1,000 records each
  - `contributions`
  - `individuals`
  - `expenditures`
  - `transactions`

--

- `contributions`: Contributions from committees to candidates
  - ~500,000 transactions
  - Need to fetch full results with `read_all_contributions()`
  - Can be **for or against** candidate! (foreshadowing)

--

- **You should probably just ignore the other tables!!**

.footnote[https://www.fec.gov/campaign-finance-data/contributions-committees-candidates-file-description/]

---

## Ex: Who "gave" to Hillary?

```r
library(tidyverse)  # provides %>%, filter(), str_detect(), pull()
library(fec16)

# Look up Clinton's FEC candidate ID
hillary_id <- candidates %>%
  filter(
    cand_election_yr == 2016,
    cand_pty_affiliation == "DEM",
    str_detect(cand_name, "CLINTON")
  ) %>%
  pull(cand_id)
hillary_id
```

```
## [1] "P00003392"
```

---

## Ex: Who "gave" to Hillary? (cont'd)

```r
contributions %>%
  filter(cand_id == hillary_id) %>%
  group_by(cmte_id) %>%
  summarize(
    num_transactions = n(),
    total = sum(transaction_amt)
  ) %>%
  arrange(desc(total)) %>%
  left_join(committees, by = "cmte_id") %>%
  select(num_transactions, total, cmte_nm)
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

```
## # A tibble: 16 x 3
##    num_transactions total cmte_nm
##               <int> <dbl> <chr>
##  1                1 23116 "UNITED BROTHERHOOD OF CARPENTERS AND JOINERS "
##  2                1 20000 "REPUBLICAN NATIONAL COMMITTEE"
##  3                1 17044 "FOR OUR FUTURE"
##  4                1 12500 "MOVEON.ORG POLITICAL ACTION"
##  5                1 11894 "NEA ADVOCACY FUND"
##  6                1  7391 "CALIFORNIA STATE COUNCIL OF SERVICE EMPLOYEES"
##  7                1  6199 "CONSERVATIVE MAJORITY FUND"
##  8               94  3518 "WORKING AMERICA"
##  9                1  2261 "HUMAN RIGHTS CAMPAIGN EQUALITY VOTES"
## 10                1  1267 "TEXAS ORGANIZING PROJECT PAC"
## 11                1  1000 "AMERICAN SOCIETY FOR METABOLIC AND BARIATRIC SURGERY…
## 12                1   830 "CITIZENS UNITED SUPER PAC LLC"
## 13                1   351 "TEA PARTY MAJORITY FUND"
## 14                1   300 "THE NATIONAL REPUBLICAN TRUST PAC"
## 15                2   206 "SEIU COPE (SERVICE EMPLOYEES INTERNATIONAL UNION COM…
## 16                4   110 "VIGOP (VIRGIN ISLANDS REPUBLICAN PARTY)"
```

---

## Huh?
```r
contributions %>%
  filter(cand_id == hillary_id) %>%
* group_by(cmte_id, transaction_tp) %>%
  summarize(.groups = "drop",
    num_transactions = n(),
    total = sum(transaction_amt)
  ) %>%
  arrange(desc(total)) %>%
  left_join(committees, by = "cmte_id") %>%
  select(transaction_tp, num_transactions, total, cmte_nm)
```

```
## # A tibble: 16 x 4
##    transaction_tp num_transactions total cmte_nm
##    <chr>                     <int> <dbl> <chr>
##  1 24F                           1 23116 "UNITED BROTHERHOOD OF CARPENTERS AND …
##  2 24A                           1 20000 "REPUBLICAN NATIONAL COMMITTEE"
##  3 24E                           1 17044 "FOR OUR FUTURE"
##  4 24E                           1 12500 "MOVEON.ORG POLITICAL ACTION"
##  5 24E                           1 11894 "NEA ADVOCACY FUND"
##  6 24F                           1  7391 "CALIFORNIA STATE COUNCIL OF SERVICE E…
##  7 24A                           1  6199 "CONSERVATIVE MAJORITY FUND"
##  8 24E                          94  3518 "WORKING AMERICA"
##  9 24E                           1  2261 "HUMAN RIGHTS CAMPAIGN EQUALITY VOTES"
## 10 24E                           1  1267 "TEXAS ORGANIZING PROJECT PAC"
## 11 24K                           1  1000 "AMERICAN SOCIETY FOR METABOLIC AND BA…
## 12 24A                           1   830 "CITIZENS UNITED SUPER PAC LLC"
## 13 24A                           1   351 "TEA PARTY MAJORITY FUND"
## 14 24A                           1   300 "THE NATIONAL REPUBLICAN TRUST PAC"
## 15 24E                           2   206 "SEIU COPE (SERVICE EMPLOYEES INTERNAT…
## 16 24A                           4   110 "VIGOP (VIRGIN ISLANDS REPUBLICAN PART…
```

.footnote[https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/]

---

## The FEC

> How do they have so much information about all these candidates and where did the information get stored?

> How did they get the addresses of all those people??

--

- Federal election law requires disclosure
- Addresses are probably self-reported

---

## Corporate contributions

> Is there a way to view the monetary values of contributions not made by individuals?

--

- I don't know. You may be referring to ["dark money"](https://en.wikipedia.org/wiki/Dark_money)
- Corporations can't donate "hard money"
- [Citizen's Brochure](https://transition.fec.gov/pages/brochures/citizens.shtml)
- [Who can and can't contribute](https://www.fec.gov/help-candidates-and-committees/candidate-taking-receipts/who-can-and-cannot-contribute/)

.footnote[https://www.fec.gov/help-candidates-and-committees/candidate-taking-receipts/who-can-and-cannot-contribute/]

---

## Contribution limits

> I am a little confused by the context of the data. I thought that **committees have a cap** ($5000) for their donations but, from the data, the contributions look much more than that.

.footnote[https://en.wikipedia.org/wiki/Campaign_finance_in_the_United_States#Sources_of_campaign_funding]

---

## Codebook

> I don't know what some variables in contribution and committees datasets.

> What do all the numbers and symbols mean?

> the abbreviations of column and row names are confusing me. Is there a key somewhere?

--

- Use the help documentation in R (e.g., `?contributions`)

.footnote[https://www.fec.gov/data/browse-data/?tab=bulk-data]

--

- [Ballotpedia](https://ballotpedia.org/)

---

## Missing data

> Why are employer and occupation all **empty** in contributions dataset? It happens in `cand_id` in committees dataset as well.

> Lots of **data is missing** from the dataset and the labels are defiantly more confusing than other datasets in terms of what the categories are labeled.

--

- That's what real data is like 🤷‍♂

---

## Committees?

> It doesn't say to whom the various individuals made the donations to, is there a way we can find that out?
--

- individuals don't give to candidates, they give to committees
- committees spend on behalf of or against candidates
- use `cmte_id` and `cand_id` to link tables
  - note that `contributions` table has **both**
- [different types of committees](https://www.fec.gov/campaign-finance-data/committee-type-code-descriptions/)

.footnote[https://www.fec.gov/campaign-finance-data/committee-type-code-descriptions/]

---

## Negative amounts?

> In the individuals and transactions data, how were some of the transaction amounts negative or zero?

--

- donations can be [returned](https://www.fec.gov/help-candidates-and-committees/taking-receipts-political-party/refunds-contributions/)
- Pay attention to [`transaction_tp` codes](https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/):
  - `24A`: Independent expenditure **opposing** election of candidate
  - `24E`: Independent expenditure **advocating** election of candidate

---

## Run-offs

> Under house results what is a run-off vote?

--

- Some states hold a run-off vote when no candidate wins a majority (below, no primary candidate reaches 50%)
- This is not common, but when it does occur it determines the winner

```r
results_house %>%
  filter(state == "TX", district_id == 15)
```

```
## # A tibble: 11 x 13
##    state district_id cand_id incumbent party primary_votes primary_percent
##    <chr> <chr>       <chr>   <lgl>     <chr>         <dbl>           <dbl>
##  1 TX    15          H6TX15… FALSE     D             22151          0.422
##  2 TX    15          H6TX15… FALSE     D              9913          0.189
##  3 TX    15          H6TX15… FALSE     D              8888          0.169
##  4 TX    15          H6TX15… FALSE     D              6152          0.117
##  5 TX    15          H6TX15… FALSE     D              3149          0.0600
##  6 TX    15          H6TX15… FALSE     D              2224          0.0424
##  7 TX    15          H6TX15… FALSE     R             13164          0.450
##  8 TX    15          H6TX15… FALSE     R              9349          0.320
##  9 TX    15          H6TX15… FALSE     R              6734          0.230
## 10 TX    15          H6TX15… FALSE     GRE              NA         NA
## 11 TX    15          H8TX28… FALSE     LIB              NA         NA
## # … with 6 more variables: runoff_votes <dbl>, runoff_percent <dbl>,
## #   general_votes <dbl>, general_percent <dbl>, won <lgl>, footnotes <chr>
```

.footnote[https://en.wikipedia.org/wiki/Texas%27s_15th_congressional_district#2016]
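
---

## For or against? (sketch)

A minimal sketch, not part of `fec16`, of how the `24A` and `24E` codes from the earlier slides might be used to separate spending for a candidate from spending against. The four rows are copied from the per-committee totals on the "Huh?" slide; the `spending`, `by_stance`, and `stance` names are made up for illustration, and every other code is simply lumped into "other".

```r
library(dplyr)

# Toy data: four per-committee totals copied from the "Huh?" slide
spending <- tibble(
  cmte_nm = c(
    "REPUBLICAN NATIONAL COMMITTEE",
    "FOR OUR FUTURE",
    "MOVEON.ORG POLITICAL ACTION",
    "CONSERVATIVE MAJORITY FUND"
  ),
  transaction_tp = c("24A", "24E", "24E", "24A"),
  total = c(20000, 17044, 12500, 6199)
)

# Label each row by stance (hypothetical column), then total by stance
by_stance <- spending %>%
  mutate(
    stance = case_when(
      transaction_tp == "24A" ~ "opposing",
      transaction_tp == "24E" ~ "advocating",
      TRUE ~ "other"
    )
  ) %>%
  group_by(stance) %>%
  summarize(total = sum(total), .groups = "drop")

by_stance
```

With these four rows, `opposing` sums to 26199 and `advocating` to 29544. The same `mutate()` step could be placed before `group_by()` in the pipeline on the "Huh?" slide, assuming only these two codes matter for the question at hand.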