Calculating BOINC Throughput Data
Post date: Nov 25, 2014 11:27:41 PM
Our Fragile X simulation runs on a BOINC grid made up of the computers in Brooklyn College's WEB computing lab. The computers run the simulation only when students or faculty are not using them, so output varies widely: most successful jobs are completed on weekends, when most students are not on campus. We wanted a way to automate calculating and visualizing how much work the computers accomplish each day.
The Python script get_throughput_data.py queries the BOINC MySQL database for all jobs completed within a given time period, then counts the jobs completed on each day of that period. The script produces two output files:
- a .dat file, formatted for gnuplot, with the date, number of successful jobs, and number of failed jobs for each day
- a .out file with throughput information in human-readable form, including:
  - total number of completed jobs
  - number of jobs completed each day
  - number of successful jobs each day
  - number of failed jobs each day
  - reasons for failure
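The post doesn't show the counting step itself, so here is a minimal sketch of how the rows from the database could be turned into the .dat and .out contents. The SQL string is one plausible query; the `result` table columns, the `server_state = 5` filter, Unix timestamps in `received_time`, and `outcome = 1` meaning success are all assumptions about the BOINC schema, not details taken from get_throughput_data.py. The helper names (`daily_counts`, `dat_lines`, `summary_lines`) are hypothetical, and days are bucketed in UTC purely for simplicity.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical query against BOINC's `result` table. Column names, the
# server_state = 5 ("over") filter and Unix timestamps in received_time
# are assumptions about the schema, not the original script.
QUERY = (
    "SELECT received_time, outcome, exit_status FROM result "
    "WHERE server_state = 5 AND received_time BETWEEN %s AND %s"
)

SUCCESS = 1  # assumed BOINC `outcome` code for a successful result


def daily_counts(rows, start, end):
    """Bucket (received_time, outcome, exit_status) rows into per-day counts.

    Returns {day: [successes, failures]} for every day from start to end
    inclusive, plus a Counter of (outcome, exit_status) pairs for failures.
    Days are computed in UTC here; that choice is an assumption.
    """
    counts = {start + timedelta(days=i): [0, 0]
              for i in range((end - start).days + 1)}
    reasons = Counter()
    for received_time, outcome, exit_status in rows:
        day = datetime.fromtimestamp(received_time, tz=timezone.utc).date()
        if day not in counts:
            continue  # result falls outside the requested window
        if outcome == SUCCESS:
            counts[day][0] += 1
        else:
            counts[day][1] += 1
            reasons[(outcome, exit_status)] += 1
    return counts, reasons


def dat_lines(counts):
    """One whitespace-separated line per day: date, successes, failures."""
    return [f"{day:%Y/%m/%d} {ok} {failed}"
            for day, (ok, failed) in sorted(counts.items())]


def summary_lines(counts, reasons):
    """Human-readable total, per-day breakdown and failure reasons."""
    total = sum(ok + failed for ok, failed in counts.values())
    lines = [f"Total completed jobs: {total}"]
    for day, (ok, failed) in sorted(counts.items()):
        lines.append(f"{day:%Y/%m/%d}: {ok + failed} completed, "
                     f"{ok} successful, {failed} failed")
    for (outcome, exit_status), n in reasons.most_common():
        lines.append(f"failed with outcome={outcome}, exit_status={exit_status}: {n}")
    return lines
```

With a DB-API driver such as MySQLdb, the rows would come from `cursor.execute(QUERY, (start_ts, end_ts))` followed by `cursor.fetchall()`. Pre-filling every day in the window with zeros keeps quiet weekdays visible in the plot instead of silently dropping them.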
A simple gnuplot script plots the daily successful and failed jobs as a stacked histogram.
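The gnuplot script itself lives in the attached folder. As a hedged sketch, the Python helper below generates one plausible gnuplot script for a row-stacked histogram, assuming each .dat line holds the date, the successful count, and the failed count in that order. The function name, file names, and terminal size are hypothetical; the gnuplot commands (`set style histogram rowstacked`, `using 2:xtic(1)`) are standard gnuplot syntax.

```python
def gnuplot_script(dat_path, png_path):
    """Build gnuplot commands that draw daily successes and failures as stacked bars.

    Assumes each line of dat_path is: date successes failures.
    """
    commands = [
        "set terminal png size 1200,500",
        f"set output '{png_path}'",
        "set style data histograms",
        "set style histogram rowstacked",  # stack column 3 on top of column 2
        "set style fill solid 1.0 border -1",
        "set boxwidth 0.8",
        "set xtics rotate by -45",
        "set ylabel 'Jobs per day'",
        # Column 1 (the date) labels the x axis; columns 2 and 3 are stacked.
        f"plot '{dat_path}' using 2:xtic(1) title 'Successful', \\",
        "     '' using 3 title 'Failed'",
    ]
    return "\n".join(commands) + "\n"
```

Writing the returned text to a file such as throughput.gp and running `gnuplot throughput.gp` would produce the PNG. Using `xtic(1)` lets gnuplot treat the date column as a plain string label, so no time-axis parsing is needed.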
Here's an example of our BOINC server's output from September 1st to November 25th:
We can also look more closely at a week of output:
Attached is a folder with the scripts I wrote for this analysis.
Usage: get_throughput_data.py <start date YYYY/MM/DD> <end date YYYY/MM/DD>
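The post shows the command-line form but not how the dates are read. A minimal sketch, assuming the two dates arrive as `sys.argv[1]` and `sys.argv[2]` in YYYY/MM/DD form; the helper names `parse_args` and `days_in_range` are hypothetical.

```python
from datetime import datetime, timedelta


def parse_args(argv):
    """Turn argv (script name, start, end) into a (start, end) pair of dates."""
    if len(argv) != 3:
        raise SystemExit(
            f"usage: {argv[0]} <start date YYYY/MM/DD> <end date YYYY/MM/DD>")
    # "%Y/%m/%d" matches the YYYY/MM/DD form in the usage line.
    start = datetime.strptime(argv[1], "%Y/%m/%d").date()
    end = datetime.strptime(argv[2], "%Y/%m/%d").date()
    if end < start:
        raise SystemExit("end date must not be earlier than start date")
    return start, end


def days_in_range(start, end):
    """Every day from start to end inclusive, so days with zero jobs are kept."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

The script would call `parse_args(sys.argv)` once at startup; a malformed date makes `strptime` raise `ValueError`, which stops the run before any database query is made.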