Friday, May 14, 2010

I was able to mirror the election results site and parse the HTML files. That's 21M+ cluster-position-candidate vote entries.

Here are some stats:

all_positions_candidates.csv - 21483907 lines - 1969712099 bytes

all_positions.csv - 713438 lines - 61965718 bytes

parsed_clusters.csv - 78230 lines - 18701511 bytes
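If you end up with a copy of the CSVs, a quick sanity check is to count newlines and bytes and compare them with the numbers above. This is only a sketch in Python and not part of erparser: `file_stats` is a made-up helper name, and it assumes the line counts above are plain newline counts and that the files sit in the current directory.

```python
import os


def file_stats(path, chunk_size=1 << 20):
    """Return (newline_count, size_in_bytes) for the file at `path`.

    Hypothetical helper, not from erparser. Reads in binary chunks so a
    ~2 GB CSV never has to fit in memory.
    """
    lines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Count newline bytes only; a final line without a trailing
            # newline is not counted.
            lines += chunk.count(b"\n")
    return lines, os.path.getsize(path)


if __name__ == "__main__":
    # File names are the ones listed in the stats above.
    names = (
        "all_positions_candidates.csv",
        "all_positions.csv",
        "parsed_clusters.csv",
    )
    for name in names:
        if os.path.exists(name):
            lines, size = file_stats(name)
            print(f"{name} - {lines} lines - {size} bytes")
        else:
            print(f"{name} - not found, skipping")
```

Reading in fixed-size chunks keeps memory use flat even for the biggest file, which is why it does not just slurp the whole thing.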

I published the code used to mirror and parse the HTML files here:

http://github.com/markjeee/erparser

Hope someone can make some sense of the data generated by the script. Have fun, and use it for something good! :) I'm not going to publish the data in a public forum. If you want a copy, because you don't know how to run the script or would rather not run it and just want to work directly with the parsed CSV data, well, I might still have the leftover data from the last test run (before my lizard brain tells me to delete it).

SUPER BIG DISCLAIMER: This will not give you the official results. There's no way this is official code from COMELEC. There's no way I work for COMELEC. I don't even have friends who work there, or at Smartmatic. It's a hack to mirror and parse data, so don't base serious decisions on it. Though you can use it as a start, then use the actual numbers on the COMELEC site for confirmation. :D

 

 

Posted via web from markjeee.com
