Thomas S. Repantis
https://www.ninewhilenine.org
thomas@ninewhilenine.org

INTERESTS

Engineering leadership for highly scalable, fault-tolerant, real-time, distributed systems.

PROFESSIONAL EXPERIENCE

Elastic, Platform Engineering, Somerville, MA, April 2021-Present

Senior Manager of Engineering
Leading the team that develops all distributed systems aspects of Elasticsearch, including cluster coordination and data replication. Doubled the team's size in one year; currently 11 engineers. Oversaw four promotions and four patent applications. The team delivered traffic compression of more than 50%, a resource-sensitive shard allocator, and scalability improvements reducing out-of-memory errors 12-fold. Updated the support escalation process across 10 Elasticsearch teams and more than 70 engineers, reducing median response time by 36%, time investment by 276%, and stress by 84%. Documented the responsibilities of Elasticsearch managers. Mentored close to half a dozen engineering managers.

Akamai Technologies, Platform Engineering, Cambridge, MA, October 2008-April 2021

Engineering Manager, March 2018-April 2021
Led the team that develops the alerting infrastructure responsible for monitoring Akamai's platform with operational efficiency. Managed up to 8 engineers, including matrix reporting. The team delivered alert correlation and notification services, and a web-based interface for viewing alerts. Owned backend services spanning 3 teams and approximately 20 engineers, automating the response to thousands of alerts per day.

Principal Lead Software Engineer, January 2015-February 2018
Managed the Alert Management Systems team, including roadmap planning, performance evaluations, and career progressions. Grew the team from 1 to 3 engineers.
Led the team to evolve a single database backend into geographically distributed, real-time replicas while maintaining four 9s of availability, migrate a variety of database clients to REST APIs, and establish modern development infrastructure and processes. Delivered projects spanning engineers across three continents on schedule.

Principal Software Engineer, July 2013-December 2014
Carried out scalability projects in Query, a distributed, event-based system that continuously processes data from the entire Akamai platform. Mentored over half a dozen engineers.

Senior Software Engineer, February 2010-June 2013
Designed and implemented multi-threaded system software for real-time publication, aggregation, delivery, and processing of data across Akamai's distributed platform. Developed C, C++, Java, Python, and Perl interfaces used by both internal and customer-facing applications for monitoring, alerting, and reporting.

Senior Performance Engineer, October 2008-January 2010
Used and developed tools to measure and analyze the performance, robustness, and scalability of large distributed systems. Took end-to-end responsibility for complex systems.

IBM Research, Advanced Enterprise Middleware, Watson Research Center, Hawthorne, NY, Summer 2007
Developed a replication middleware for distributed, multi-tier server architectures. Quantified the server replication and data partitioning performance benefits, as well as the consistency overhead, using the TPC-W transactional web commerce benchmark. Patented the middleware's efficient, distributed, strong-consistency protocol.

Intel Research, Corporate Technology Group, Pittsburgh, PA, Summer 2006
Built an event-driven, collaborative spam filter that employed a distributed protocol to defend against sybil attacks.

Hewlett-Packard, Enterprise Storage & Servers, Colorado Springs, CO, Summer 2005
Developed and documented a logging mechanism used for asynchronous replication in a distributed disk array.

FGAN e.V. (Fraunhofer FKIE), Bonn, Germany, Summer 2000
Analyzed the H.323 protocol family, used for multimedia applications (VoIP) in packet-switched networks, and summarized the results in a technical report, including a detailed protocol description and a performance evaluation of applications under IPv6 on Solaris.

EDUCATION

Ph.D. in Computer Science, University of California, Riverside, August 2008
Thesis: Synergy: Quality of Service Support for Distributed Stream Processing Systems

M.Sc. in Computer Science, University of California, Riverside, August 2005
Thesis: Adaptive Data Dissemination and Content-Driven Routing in Peer-to-Peer Systems

Diploma in Electrical & Computer Engineering (5-year program), University of Patras, Greece, March 2003
Thesis: Implementation of Page Forwarding on Clusters