Archives For May 2014


As part of the National Day of Civic Hacking, we are organizing an OpenElections challenge for hacking events at locations all over the country on Saturday, May 31 and Sunday, June 1.

If you are attending one of these events near you, and would like to join in on our effort to write scrapers for election results, let us know!

Write Scrapers for us…
Help us extend our core scraper architecture to create a series of custom scrapers that account for the idiosyncrasies in how each state structures data, stores it, and makes it available.

**Our docs for this process are now up on our site. Look here to see what would be involved with joining in**

Your time and expertise would be most appreciated either day. Also, feel free to join in from home.

If you would like to help out, email sschnadt.projects@gmail.com or tweet at us @OpenElex, either before the event or on the day. Our team will be online and available to get you set up.

Thank you!

The OpenElections Team

Interview with TurboVote Co-Founder Kathryn Peters


In this series of interviews, OpenElections has conversations with the leadership of other initiatives that are improving data transparency, easing the voting process and applying new technologies to elections.

For our first piece we talk to Kathryn Peters, co-founder of TurboVote, our sister Knight News Challenge: Data project. TurboVote is a service that aims to make it as easy to vote, and keep track of all the elections you can participate in, as it is to do all the other things we now do online.

***

OE: How did the TurboVote project and Democracy Works Inc. come about, and what were your motivations for starting them?

KP: Seth [Flaxman, TurboVote co-founder] spent a summer in college registering voters in Philadelphia with a sandwich board and a stack of paper forms, and recognized that there had to be a better way to reach would-be voters than standing on street corners. When he finished his first semester of grad school and realized he’d missed a local election back home, that same realization struck him again – voting should fit the way we live. We live online, on our phones, with services and applications that help organize our lives and simplify daily tasks.

Seth asked my advice in building an election-reminder service. My first response was incredulity. I’m from Columbia, MO, where the county clerk Wendy Noren builds her own voter engagement tools and has sent email reminders about upcoming elections for a decade already. I just assumed that these were normal voter services. Once Seth convinced me that Wendy’s online voter services were rare, it made perfect sense to try and make them available to every voter. So we started prototyping.

OE: What background(s) do you bring to this work?

KP: Seth and I met in a graduate policy program, so we’re both deeply committed to innovating with and for government–in this case, local election administrators–which sets us apart from most of the tech startups we know. Seth’s previous work had been as a researcher (at the Council on Foreign Relations), and he approached graduate school with a big research question: why does the Internet seem to be passing government by? I had worked in both political organizing and information management, but was studying international affairs and thinking about how we promote and support democratic processes abroad. Those two concerns came together in a really fantastic way, even if it means I’m in Brooklyn instead of, say, Cairo right now.

OE: Can you describe how TurboVote impacts an individual voter?

KP: It depends a lot on the voter, where they are and what they need. But let’s imagine a college freshman, who arrives on campus and is offered the opportunity to register to vote during orientation, and decides to register at her parents’ home in another state. As she signs up, we’ll also get her on-campus address, and ask if she’ll need to vote by mail in elections back home. So after she joins TurboVote, we’ll send her a voter registration form filled out with her information with an addressed, stamped envelope so she can return it to her election administrator. And then as an election comes up, we’ll send her an email reminder and mail her an absentee ballot request form, again with a stamped envelope so all she has to do is sign it and send it in. And then we’ll send her reminders about the deadline to submit those forms so she gets everything in the mail on time. And election after election, she’ll hear from us and have whatever forms and information she needs to take part, even in local elections she might not hear about living on a college campus the next state over, for example.

We designed a flow chart to summarize the many ways we serve different voters.

[Image: TurboVote process flow chart]

OE: TurboVote is one of three projects currently in your roster. How has your work expanded and further defined itself this year?

KP: TurboVote’s growth in 2012 demonstrated how much demand there is for voting information and services, but the only way to do this sustainably is if government eventually adopts it and takes on these new tools for voter outreach. To that end, we spent 2013 researching local election administrations across the country, spending six weeks shadowing offices across six states and learning about their work, their staff, the tech they’re using, their needs and motivations. We found dedicated innovators making incremental improvements at every election in pursuit of better elections for their voters. And we found dozens of ideas worth building or popularizing that could help them run elections better, more simply.

From that research, we started building Ballot Scout, which makes it easy to add Intelligent Mail barcodes to absentee ballot envelopes and trace them through the postal system. Right now, most election officials send out their absentee ballots, get some of them back, and have no way of knowing if the others went undelivered, or weren’t cast, or are delayed in a postal processing facility and will arrive three days after the election. Barcode tracking gives officials better insight into what happens to those ballots as they leave the election office, and the ability to intervene if anything goes wrong. We’re working with seven counties from Oregon to Florida to test Ballot Scout this fall (and we’re still looking for three more counties to join the beta).

And last summer, the Pew Charitable Trusts asked us if we’d consider taking on data and technology support for the Voting Information Project. It’s the biggest election dataset in the country, providing tens of millions of Americans with polling place information each cycle, and we were eager to help build out its permanent infrastructure for data collection and processing. It’s also connected us to state election officials and let us get to know their work and needs, as well as those of the counties we’d been working with previously.

OE: What is your business model, and how does it inform your effectiveness?

KP: We’re a 501(c)(3) nonprofit, currently funded through grants from the Knight Foundation, Democracy Fund, and Google, among many others. TurboVote operates on a partnership/fee model, where each of our partner organizations contributes a small amount toward our operating costs, and we’re developing a pricing model for Ballot Scout that will do the same for that service. As we continue to grow and add new partners, these revenues should bring us to fiscal sustainability by 2017, ensuring that we can continue our work without major donations.

OE: How does Democracy Works fit within the ecosystem of voting infrastructure projects going on now? Are there other best practices you are aware of?

KP: Great question. The ecosystem is somewhat ad hoc, but we’ve used research by Dana Chisnell and Whitney Quesenbery at Civic Design for information on what voters are looking for and how they interact with election data online, and we’re currently collaborating with ELECTricity on a project to offer free website templates to local election offices that takes the Civic Design best practices and implements them by default. We pool our election research with Long Distance Voter, whose forms we use in states that don’t otherwise provide a ballot request form, for example, and we compare deadlines, election administrator addresses, and other data where we can help check and support each other’s work.

We’re also participating in the third-annual National Voter Registration Day, which brings together civic organizations like the League of Women Voters, the Bus Federation, and Voto Latino to celebrate voting and engage new voters across the country.

I’m also keeping an eye on projects in both Los Angeles County, CA and Travis County, TX, where election administrators have recruited designers, computer scientists, academics and citizens to reimagine voting machines. Both are designing their projects to be open-source and available to other jurisdictions, and I think it’s a fantastic model for the kind of collaboration I’d like to see become even more popular in this space.

OE: What do you think of the recent Presidential Commission on Election Administration and its findings? Will it affect how your work is rolled out?

KP: I’m a big fan of the report! The Presidential Commission on Election Administration issued a practical list of recommendations–and accompanying tools–that can help election officials run better elections. They think postal ballot-tracking is a great idea, too, so I may be a little bit biased.

OE: Ideally, what kinds of organizations and systems would come together to make a robust, transparent and cost-effective elections infrastructure?

KP: I think the collaborations in Travis and Los Angeles counties have the right mix – administrators, technologists, designers, and ordinary voters – and that it’s mostly a question of how we scale that and build communications among election innovators so good ideas can really take root and spread nationally.

Kathryn Peters is a co-founder of TurboVote. Her belief in better democracy has taken her from campaign organizing in rural Missouri to a Master’s in Public Policy at the Kennedy School of Government to political rights monitoring in Afghanistan. Katy has also worked for the information management team for the United Nations Department of Safety and Security and the National Democratic Institute’s Information and Communications Technology staff. In 2011, she was honored as one of Forbes magazine’s “30 Under 30” in the field of law and policy.


When we embarked on this quest to bring sanity to election data in the U.S., we knew we were in for a heavy lift.

A myriad of data formats awaited us, along with variations in data quality across states and within them over time.  In the past few months, the OpenElections team and volunteers have crafted a system to tame this wild landscape. This post takes a closer look at how we applied this system to Maryland, the first state that we took on to define the data workflow process end to end. Hopefully it helps shine some light on our process and generates ideas on how we can improve things.

The Data Source

Maryland offers relatively clean, precinct-level results on the web. In fact, it provides so many result CSVs (over 700!) that we abstracted the process for generating links to the files, rather than scraping them off the site.
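As a rough illustration of generating links instead of scraping them, here is a minimal sketch. The base URL, the file-name pattern, and the short jurisdiction list are hypothetical placeholders, not Maryland's actual site layout or the project's own code.

```python
from itertools import product

# Hypothetical placeholders: the real site layout and naming scheme may differ.
BASE_URL = "https://elections.example.org/results"
JURISDICTIONS = ["Allegany", "Anne Arundel", "Baltimore City"]


def build_result_urls(years, election_types, jurisdictions=JURISDICTIONS):
    """Return one CSV URL per (year, election type, jurisdiction) combination."""
    urls = []
    for year, etype, name in product(years, election_types, jurisdictions):
        slug = name.replace(" ", "_")
        urls.append(f"{BASE_URL}/{year}/{etype}/{slug}_by_precinct_{year}_{etype}.csv")
    return urls


urls = build_result_urls([2010, 2012], ["primary", "general"])
print(len(urls))
print(urls[0])
```

When file names follow a predictable pattern, building the list this way avoids depending on the HTML of the results pages, which tends to change more often than the file names do.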

Other states provide harder-to-manage formats such as database dumps and image PDFs that must be massaged into tabular data. We’ve devised a pre-processing workflow to handle these hard cases, and started to apply it in states such as Washington and West Virginia.

The common denominator across all states is the Datasource. It can be a significant effort to wire up code-wise, but once complete, it allows us to easily feed raw results into the data processing pipeline.  Our goal in coming months is to tackle this problem for as many states as possible, freeing contributors to work on more interesting problems such as data loading and standardization.

Raw Results

When the datasource was in place, we were ready to load Maryland’s data as RawResult records in Mongo, our backend datastore. The goal was to minimize the friction of initial data loading. While we retained all available data points, the focus in this critical first step was populating a common core of fields that are available across all states.

In Maryland, this meant writing a series of data loaders to handle variations in data formats across time. Once these raw result loaders were written, we turned our attention to cleanups that make the data more useful to end users.
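A loader that copes with layouts changing over time might look something like the sketch below. The `RawResult` class here is an illustrative stand-in rather than the project's actual model, the two header layouts are invented, the sample row is made-up data, and nothing is written to Mongo.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class RawResult:
    """Illustrative stand-in for a raw result record (not the project's actual model)."""
    jurisdiction: str
    office: str
    candidate: str
    votes: int
    extra: dict = field(default_factory=dict)  # source columns outside the common core


# Hypothetical header layouts for two eras of source files.
COLUMN_MAPS = {
    "old": {"County": "jurisdiction", "Office Name": "office", "Name": "candidate", "Total": "votes"},
    "new": {"jurisdiction": "jurisdiction", "office": "office", "candidate": "candidate", "votes": "votes"},
}


def load_raw_results(csv_text, layout):
    """Map source columns onto the common core; keep everything else in `extra`."""
    column_map = COLUMN_MAPS[layout]
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        core = {target: row[source].strip() for source, target in column_map.items()}
        core["votes"] = int(core["votes"].replace(",", ""))  # "1,204" -> 1204
        extra = {k: v for k, v in row.items() if k not in column_map}
        results.append(RawResult(extra=extra, **core))
    return results


# Made-up sample row in the hypothetical "old" layout.
sample = 'County,Office Name,Name,Total,Party\nAllegany,Governor,Jane Doe,"1,204",DEM\n'
records = load_raw_results(sample, "old")
print(records[0])
```

Keeping unmapped columns in `extra` reflects the idea of retaining every available data point while only the common core is standardized at this stage.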

Transforms

Loading raw results into a common set of fields is a big win, but we’ve set our sights much higher. Election data becomes much more useful after standardizing candidate names, parties, offices, and other common data points.

The types of data transforms we implement will vary by state, and in many cases, one set of cleanups must precede others. Normalizing data into unique contests and candidates is a transform common to all states, usually one that should be performed early in the process.

Transforms let us correct, clean or disambiguate results data in a discrete, easy-to-document, and replicable way.  This helps keep the data loading code simple and clear, especially when dealing with varying data layouts or formats between elections.
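One way to keep transforms discrete and ordered is a small registry of functions, sketched below. The decorator, the function names, and the party lookup table are assumptions made for illustration and are not taken from the OpenElections codebase.

```python
import re

TRANSFORMS = []  # ordered registry: earlier transforms run first


def transform(func):
    """Register a function as a transform, preserving declaration order."""
    TRANSFORMS.append(func)
    return func


@transform
def normalize_candidate_name(record):
    # Collapse runs of whitespace and use title case, e.g. "DOE,  JANE" -> "Doe, Jane".
    record["candidate"] = re.sub(r"\s+", " ", record["candidate"]).strip().title()
    return record


@transform
def standardize_party(record):
    # Hypothetical lookup table; unknown values pass through upper-cased.
    parties = {"DEMOCRATIC": "DEM", "DEM": "DEM", "REPUBLICAN": "REP", "REP": "REP"}
    raw = record["party"].strip().upper()
    record["party"] = parties.get(raw, raw)
    return record


def run_transforms(records):
    """Apply every registered transform, in order, to a copy of each record."""
    cleaned = []
    for record in records:
        record = dict(record)  # leave the raw input untouched
        for func in TRANSFORMS:
            record = func(record)
        cleaned.append(record)
    return cleaned


raw = [{"candidate": "DOE,  JANE", "party": "Democratic"}]
print(run_transforms(raw))
```

Because each transform is a small named function, it can be documented, reordered, or re-run on its own. Simple title-casing is only a starting point: it would mangle names such as "McDonald", which is the kind of case a state-specific transform would need to handle.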

In Maryland, we used the core framework to create unique Contest and Candidate records for precinct results. These transforms included:

This opened the door to generating totals at the contest-wide level for each candidate.

Validations

At this point, you might be getting nervous about all this processing. How do we ensure accuracy with all this data wrangling? Enter data validations, which provide a way to link data integrity checks with a particular transformation, or to check data loading and transformation more broadly. In Maryland, for example, we implemented a validation and bound it to a transform that normalizes the format of precinct names. In this case, the validation acts like a unit test for the transform. We also cross-check the loaded and transformed result data in validations that aren’t bound to specific transforms. These confirm that we’ve loaded the expected number of results for a particular election, or ensure that the sum of a candidate’s sub-racewide vote totals matches the published racewide total.

Implementing and running validations has helped us uncover data quirks, such as precinct-level data reflecting only election day vote totals, while result data for other reporting levels includes absentee and other types of votes. Validations have also exposed discrepancies between vote counts published on the State Board of Elections website and ones provided in CSV format.  We’ve circled back to Maryland officials with our findings, prompting them to fix their data at the source.
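The sum check described above can be sketched roughly as follows. The record layout, the function name, and the sample figures are all made up for illustration.

```python
from collections import defaultdict


def validate_precinct_sums(precinct_results, racewide_totals):
    """Return a list of mismatches between summed precinct votes and published totals."""
    summed = defaultdict(int)
    for row in precinct_results:
        summed[(row["contest"], row["candidate"])] += row["votes"]

    errors = []
    for key, expected in racewide_totals.items():
        actual = summed.get(key, 0)
        if actual != expected:
            errors.append(f"{key[0]} / {key[1]}: precincts sum to {actual}, expected {expected}")
    return errors


# Made-up sample data: one candidate matches, one is off by five votes.
precincts = [
    {"contest": "Governor", "candidate": "Jane Doe", "votes": 120},
    {"contest": "Governor", "candidate": "Jane Doe", "votes": 80},
    {"contest": "Governor", "candidate": "John Roe", "votes": 150},
]
published = {("Governor", "Jane Doe"): 200, ("Governor", "John Roe"): 155}
errors = validate_precinct_sums(precincts, published)
print(errors)
```

A mismatch reported by a check like this is a prompt to investigate rather than proof of a loading bug. As noted above, precinct files that cover only election day votes will legitimately fall short of racewide totals that include absentee ballots.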

Summary

Maryland has been a guinea pig of sorts for the OpenElections project (thank you Maryland!). It’s helped us flesh out a data processing framework and conventions that we hope to apply across the country. Of course, challenges remain: standardizing party names across states, mapping precincts to counties, and sundry other issues we didn’t cover here.

As we tackle more states, we hope to refine our framework and conventions to address the inevitable quirks in U.S. election data. In the meantime, we hope this provides a window into our process and gives you all some footing to make it easier to contribute.

As our volunteers have gathered details on the scope and availability of election results from across the country, one thing became clear: not all election results are created equal.

Some states provide results data in multiple formats and variations; all you have to do is choose and click. Florida has a download for every election. In Pennsylvania, we found that for $7 the state provides a CD with 12 years of consistently formatted results. Idaho has multiple files covering different reporting levels.

But that’s not every state. Some, such as West Virginia, have made the transition from producing PDFs to CSVs. Others, like Mississippi, basically provide a picture of the results. For states where data isn’t the norm, OpenElections needs to fill the gap, turning results into data.

This isn’t a glamorous job, but we’d like to tell you a little about how we go about it. For states that provide electronically generated PDFs, like West Virginia does for elections from 2000-2006, there are several good options for parsing PDF tables into data. The command-line utility pdftotext from the Xpdf package works well in many cases, while the excellent Tabula (a product of ProPublica, La Nación and the Knight-Mozilla OpenNews project) can do wonders with more complex files. For West Virginia, Xpdf was all we needed (along with some search and replace in a text editor) to make CSV files from the original PDFs. Here’s an example command that generates a fixed-width text file while preserving the row and column format of the original file:

$ pdftotext -layout "2000 House of Rep Pri.pdf"

We used TextWrangler, a free text editor for the Mac, to convert the spaces between columns into tabs, and from there it was trivial to copy results into CSV files. In the process of converting these results, we found several apparent errors (typos or likely copy and paste mistakes) and notified the Secretary of State’s office. To its credit, the office responded quickly and is in the process of fixing the original files (and we’ll update our CSVs when they do).
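The column-splitting step was done by hand in a text editor. For anyone who would rather script it, a sketch along these lines could work on the output of `pdftotext -layout`. It assumes columns are separated by two or more spaces and that no cell is empty, and the sample text is invented rather than taken from a West Virginia file.

```python
import csv
import io
import re


def layout_text_to_csv(text, min_gap=2):
    """Split each non-blank line on runs of `min_gap` or more spaces and emit CSV."""
    splitter = re.compile(r" {%d,}" % min_gap)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():
            writer.writerow(splitter.split(line.strip()))
    return out.getvalue()


# Invented sample resembling fixed-width output; single spaces stay inside a cell.
sample = (
    "County        Jane Doe    John Roe\n"
    "\n"
    "Barbour         1,204         987\n"
    "Berkeley        3,310       2,845\n"
)
print(layout_text_to_csv(sample))
```

Splitting on gaps is fragile when a cell is blank, because the remaining values shift left. Files with empty cells are better handled by slicing at fixed column offsets or by a tool such as Tabula, and the result still needs to be proofread against the original PDF.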

In Mississippi, however, there are no programmatic options, or at least no good ones. Data entry is the best way for us to get precinct-level results that are contained in county-by-county files like this one. Here’s what we’re dealing with, a scanned image of a fax:

 

[Image: scanned fax of Mississippi precinct-level results]

 

When it comes to doing data entry, we need to be very specific about what we want and how we want it stored in the CSV file. For our Mississippi files, we’ve developed a guide to the process that we’ll adapt to other states where manual entry is required. Which is where you come in, if you’re up for it. If you’d like to try your hand at a Mississippi file (or another state with PDFs) let us know in the Google Group and we can get you set up. Or you can fork the Mississippi results repository on GitHub and send us an email per the instructions in the README file.

We know that data entry is neither fun nor exciting (well, most of the time), but think of this: you’ll be part of a project that will provide a great service to journalists, researchers and citizens. And we still have some t-shirts left, too.