Thomas S. Repantis

https://www.ninewhilenine.org

Interests

Engineering leadership for highly scalable, fault-tolerant, real-time, distributed systems.

Professional Experience

Elastic, Platform Engineering
Somerville, MA, April 2021-Present
Senior Manager of Engineering
Leading the team that develops all distributed-systems aspects of Elasticsearch, including cluster coordination and data replication. Doubled the team's size in one year; currently 11 engineers. Oversaw four promotions and four patent applications. The team delivered traffic compression of more than 50%, a resource-sensitive shard allocator, and scalability improvements reducing out-of-memory errors 12-fold. Updated the support escalation process across 10 Elasticsearch teams and more than 70 engineers, reducing median response time by 36%, time investment by 276%, and stress by 84%. Documented the responsibilities of Elasticsearch managers. Mentored close to half a dozen engineering managers.
Akamai Technologies, Platform Engineering
Cambridge, MA, October 2008-April 2021
Engineering Manager, March 2018-April 2021
Led the team that develops the alerting infrastructure responsible for monitoring Akamai's platform with operational efficiency. Managed up to 8 engineers, including matrix reporting. The team delivered alert correlation and notification services, and a web-based interface for viewing alerts. Owned backend services spanning 3 teams and approximately 20 engineers, automating the response to thousands of alerts per day.
Principal Lead Software Engineer, January 2015-February 2018
Managed the Alert Management Systems team, including roadmap planning, performance evaluations, and career progressions. Grew the team from 1 to 3 engineers. Led the team to evolve a single database backend to geographically distributed, real-time replicas while maintaining four 9s of availability, migrate a variety of database clients to REST APIs, and establish modern development infrastructure and processes. Delivered projects on schedule with engineers across three continents.
Principal Software Engineer, July 2013-December 2014
Carried out scalability projects in Query, a distributed, event-based system that continuously processes data from the entire Akamai platform. Mentored over half a dozen engineers.
Senior Software Engineer, February 2010-June 2013
Designed and implemented multi-threaded system software for real-time publication, aggregation, delivery, and processing of data across Akamai's distributed platform. Developed C, C++, Java, Python, and Perl interfaces used by both internal and customer-facing applications for monitoring, alerting, and reporting.
Senior Performance Engineer, October 2008-January 2010
Used and developed tools to measure and analyze the performance, robustness, and scalability of large distributed systems. Took end-to-end responsibility for complex systems.
IBM Research
Advanced Enterprise Middleware, Watson Research Center, Hawthorne, NY, Summer 2007
Developed a replication middleware for distributed, multi-tier, server architectures. Quantified the server replication and data partitioning performance benefits, as well as the consistency overhead, using the TPC-W transactional web commerce benchmark. Patented the middleware's efficient, distributed, strong-consistency protocol.
Intel Research
Corporate Technology Group, Pittsburgh, PA, Summer 2006
Built an event-driven, collaborative spam filter that employed a distributed protocol to defend against sybil attacks.
Hewlett-Packard
Enterprise Storage & Servers, Colorado Springs, CO, Summer 2005
Developed and documented a logging mechanism used for asynchronous replication in a distributed disk array.
FGAN e.V. (Fraunhofer FKIE)
Bonn, Germany, Summer 2000
Analyzed the H.323 protocol family, used for multimedia applications (VoIP) in packet-switched networks, and summarized the results in a technical report, including a detailed protocol description and a performance evaluation of applications under IPv6 in Solaris.

Education

Ph.D. in Computer Science
University of California, Riverside, August 2008
Thesis: Synergy: Quality of Service Support for Distributed Stream Processing Systems
M.Sc. in Computer Science
University of California, Riverside, August 2005
Thesis: Adaptive Data Dissemination and Content-Driven Routing in Peer-to-Peer Systems
Diploma in Electrical & Computer Engineering
(5-year program)
University of Patras, Greece, March 2003
Thesis: Implementation of Page Forwarding on Clusters
