18 July 2013
"Site Reliability Engineering and Data at Google"
Meeting Room 1, Ground Floor, JLB
12:30pm - 1:45pm
Adrian Hilton - Google
In this talk we look at the problem of storing user data when you need to scale to hundreds of millions of users across the planet yet respond to a query within a quarter of a second. We discuss the problems of computing at a planet-wide scale, where you run across very improbable events on a daily basis and where no single system can be sufficiently reliable for your needs. We look at how Google's Site Reliability Engineering team approaches the problem of building and maintaining an infrastructure that operates at this scale when failure is really not an option.
Adrian is a Launch Lead Engineer in Google's Site Reliability Engineering team. He works on Google's main campus in Mountain View, California. His job is to work with Google teams launching software to the public, helping them choose the right production design and implementation (including storage, network, provisioning, load balancing, replication, diagnostics, monitoring, and job configuration) and analysing each launch's potential effect on the stability and reliability of the product and of shared infrastructure. He studied for his Ph.D. in Computer Science part-time with the Open University from 1998 to 2004, supervised by Jon Hall and Darryl Ince; his thesis title was "High Integrity Hardware-Software Codesign".