Engineering leadership for highly-scalable, fault-tolerant, real-time, distributed systems.
Professional Experience
Elastic, Platform Engineering
Somerville, MA, April 2021-Present
Senior Manager of Engineering
Leading the team that develops all distributed systems aspects of
Elasticsearch, including cluster coordination and data
replication. Doubled the team's size in one year; currently 11 engineers.
Oversaw four promotions and four patent applications. The team
delivered traffic compression of more than 50%, a
resource-sensitive shard allocator, and scalability improvements
reducing out-of-memory errors 12-fold. Updated the support
escalation process across 10 Elasticsearch teams and more than 70
engineers, reducing median response time by 36%, time investment
by 276%, and stress by 84%. Documented the responsibilities of
Elasticsearch managers. Mentored close to half a dozen engineering
managers.
Akamai Technologies, Platform Engineering
Cambridge, MA, October 2008-April 2021
Engineering Manager, March 2018-April 2021
Led the team that developed the alerting infrastructure responsible
for monitoring Akamai's platform with operational
efficiency. Managed up to 8 engineers, including matrix
reporting. The team delivered alert correlation and notification
services, and a web-based interface for viewing alerts. Owned
backend services spanning 3 teams and approximately 20 engineers,
automating the response to thousands of alerts per day.
Principal Lead Software Engineer, January 2015-February 2018
Managed the Alert Management Systems team, including roadmap
planning, performance evaluations, and career progressions. Grew
the team from 1 to 3 engineers. Led the team to evolve a single
database backend to geographically distributed, real-time replicas
while maintaining four 9s of availability, migrate a variety of
database clients to REST APIs, and establish modern development
infrastructure and processes. Delivered projects on schedule with
engineers spanning three continents.
Principal Software Engineer, July 2013-December 2014
Carried out scalability projects in Query, a distributed,
event-based system that continuously processes data
from the entire Akamai platform. Mentored over half a dozen engineers.
Senior Software Engineer, February 2010-June 2013
Designed and implemented multi-threaded system software for real-time
publication, aggregation, delivery, and processing of data across Akamai's
distributed platform. Developed C, C++, Java,
Python, and Perl interfaces used by both internal and customer-facing
applications for monitoring, alerting, and reporting.
Senior Performance Engineer, October 2008-January 2010
Used and developed tools to measure and analyze the performance,
robustness, and scalability of large distributed systems. Took end-to-end
responsibility for complex systems.
IBM Research
Advanced Enterprise Middleware, Watson Research Center, Hawthorne, NY, Summer 2007
Developed replication middleware for distributed,
multi-tier server architectures. Quantified the server
replication and data partitioning performance benefits, as well as
the consistency overhead, using the TPC-W transactional web
commerce benchmark. Patented the
middleware's efficient, distributed, strong-consistency protocol.
Intel Research
Corporate Technology Group, Pittsburgh, PA, Summer 2006
Built an event-driven,
collaborative spam filter that employed a
distributed protocol to defend against Sybil attacks.
Hewlett-Packard
Enterprise Storage & Servers, Colorado Springs, CO, Summer 2005
Developed and
documented a logging mechanism used for asynchronous replication in a
distributed disk array.
Analyzed the H.323 protocol family, used for multimedia
applications (VoIP) in packet-switched networks, and summarized
the results in a technical report, including detailed protocol
description and performance evaluation of applications under IPv6
in Solaris.