Thomas S. Repantis
https://www.ninewhilenine.org
thomas@ninewhilenine.org

INTERESTS
Engineering leadership for highly scalable, fault-tolerant, real-time, distributed systems.

PROFESSIONAL EXPERIENCE

Elastic, Platform Engineering, Somerville, MA, April 2021-Present
https://www.elastic.co
Senior Manager of Engineering
- Leading the team that develops all distributed systems aspects of Elasticsearch, including cluster coordination and data replication.
- Doubled the team's size in one year; currently 11 engineers.
- Oversaw four promotions and four patent applications.
- The team delivered traffic compression of more than 50%, a resource-sensitive shard allocator, and scalability improvements reducing out-of-memory errors 12-fold.
- Defined processes for tackling planned and unplanned work.
- Updated the support escalation process across 10 Elasticsearch teams and more than 70 engineers, reducing median response time by 36%, time investment by 276%, and stress by 84%.
- Documented the responsibilities of Elasticsearch managers. Mentored close to half a dozen engineering managers.

Akamai Technologies, Platform Engineering, Cambridge, MA, October 2008-April 2021
https://www.akamai.com
Engineering Manager, March 2018-April 2021
- Led the team that develops the alerting infrastructure responsible for monitoring Akamai's platform with operational efficiency.
- Managed up to 8 engineers, including matrix reporting.
- The team delivered alert correlation and notification services, and a web-based interface for viewing alerts.
- Owned backend services spanning 3 teams and approximately 20 engineers, automating the response to thousands of alerts per day.

Principal Lead Software Engineer, January 2015-February 2018
- Managed the Alert Management Systems team, including roadmap planning, performance evaluations, and career progressions.
- Grew the team from 1 to 3 engineers.
- Led the team to evolve a single database backend to geographically distributed, real-time replicas while maintaining four 9s of availability, and migrate a variety of direct database clients to REST APIs.
- Led the team to modernize its software development and build infrastructure.
- Established planning, code review, and escalation processes.
- Delivered projects on schedule that spanned engineers across three continents.

Principal Software Engineer, July 2013-December 2014
- Carried out scalability projects in Query, a distributed, event-based system that continuously processes data from the entire Akamai platform.
- Implemented in C++ multi-threaded SQL processing, table merging, and network communication, increasing query processing throughput by up to 47%.
- Mentored over half a dozen senior software and performance engineers.

Senior Software Engineer, February 2010-June 2013
- Implemented in C++ system software for real-time publication, aggregation, delivery, and processing of data across Akamai's distributed platform, including multi-threaded data encoding, a thread statistics collection framework, and stateful redirection and retrying of SQL queries across servers.
- Developed C, C++, Java, Python, and Perl interfaces used by both internal and customer-facing applications for monitoring, alerting, and reporting.
- Built in Python a testing component for automatically replaying production SQL load.
- Designed, implemented in C++, tested, documented, and operationalized a distributed aggregation feature that enabled data reductions ranging from 62% to 78%.

Senior Performance Engineer, October 2008-January 2010
- Used and developed tools to measure, analyze, and characterize the performance, robustness, and scalability of large distributed systems.
- Gathered and analyzed data to root out errors, discern trends, and diagnose complex, customer-facing issues. Responded to incidents and prevented incidents through proactive analysis and monitoring.
- Enabled capabilities for the operational networks that spanned multiple technical areas, including safely rolling out a 64-bit OS that reduced infrastructure machines by 18%.
- Communicated across all areas of the company by taking a holistic view and end-to-end responsibility of complex systems. Used quality metrics, reviewed with executives, to shine a light on areas of improvement for the whole company.

University of California, Riverside
Department of Computer Science & Engineering, Distributed Real-Time Systems Laboratory
Graduate Student Researcher, January 2004-August 2008
- Led the development of the Synergy distributed stream processing middleware; supervised other student research projects that used the platform. http://synergy.cs.ucr.edu
- Evaluated Synergy's performance over PlanetLab by implementing a network traffic monitoring application operating on real streaming data.
- Designed and implemented in Java:
  - QoS-aware, distributed algorithms for composing stream processing applications.
  - A DHT-based resource monitoring architecture that used statistical forecasting to predict and prevent QoS violations by relieving overloaded nodes.
  - A decentralized replica placement protocol that aimed to maximize availability while respecting resource and performance constraints.
  - A mechanism for routing peer-to-peer queries using Bloom filters.

IBM Research
Advanced Enterprise Middleware, Watson Research Center, Hawthorne, NY
Research Intern, June 2007-September 2007
https://www.research.ibm.com
Developed in Java a replication middleware for distributed, multi-tier, server architectures. Proposed and incorporated in the middleware an efficient, distributed, strong-consistency protocol. Quantified the server replication and data partitioning performance benefits, as well as the consistency overhead, using the TPC-W transactional web commerce benchmark.
Intel Research
Corporate Technology Group, Pittsburgh, PA
Research Intern, June 2006-September 2006
http://www.intel-research.net
Contributed to the reliable email project. Built in C++ an event-driven, collaborative spam filter that employed a distributed protocol to defend against sybil attacks.

Hewlett-Packard
Enterprise Storage & Servers, Colorado Springs, CO
Software Intern, June 2005-September 2005
https://www.hpe.com
As a member of the replication team of an upcoming product of HP's grid storage portfolio, developed in C++ and documented a logging mechanism used for asynchronous replication in a distributed disk array.

University of California, Riverside
Department of Computer Science & Engineering
Teaching Assistant, September 2003-December 2003
https://www.cs.ucr.edu
Instructed a lab on C++ programming, and evaluated assignments and exams. Anonymous student reviews included: "Very caring TA. Will teach and willing to spend extra time." and "The best TA I ever had."

University of Patras, Greece
Department of Computer Engineering & Informatics, High Performance Information Systems Laboratory
Undergraduate Student Researcher, January 2001-November 2002
http://old.hpclab.ceid.upatras.gr
Implemented in C a protocol for dynamic memory page migration across the nodes of a Software Distributed Shared Memory System, as part of an inter-departmental diploma thesis. Dynamic page migration improved performance by increasing locality and adaptability, while remaining transparent to the application programmer.

FGAN e.V. (Fraunhofer FKIE)
Research Institute for Communication, Information Processing and Ergonomics, Computer Networks Department, Bonn, Germany
Intern, July 2000-August 2000
https://www.fgan.de
Analyzed the H.323 protocol family, used for multimedia applications (VoIP) in packet switched networks, and summarized the results in a technical report, including detailed protocol description and performance evaluation of applications under IPv6 in Solaris.
University of Patras, Greece
Department of Electrical & Computer Engineering
Web Developer, October 1998-March 1999
https://www.ece.upatras.gr
Developed web pages in HTML, as part of a team that created the department's web site.

EDUCATION

Ph.D. in Computer Science
University of California, Riverside, August 2008
Thesis: Synergy: Quality of Service Support for Distributed Stream Processing Systems

M.Sc. in Computer Science
University of California, Riverside, August 2005
Thesis: Adaptive Data Dissemination and Content-Driven Routing in Peer-to-Peer Systems
GPA: 3.900/4.000

Diploma in Electrical & Computer Engineering (5-year program)
University of Patras, Greece, March 2003
Thesis: Implementation of Page Forwarding on Clusters
GPA: 7.70/10.00

PATENT

Coordinating Updates to Replicated Data, Arun Iyengar, Thomas Repantis, US 7,996,360 B2, August 9, 2011.

REFEREED PUBLICATIONS

Refereed Journals:

Scaling a Monitoring Infrastructure for the Akamai Network, Thomas Repantis, Jeff Cohen, Scott Smith, Joel Wein, ACM SIGOPS Operating Systems Review (OSR), vol. 44, no. 3, pp. 20-26, July 2010.

QoS-Aware Shared Component Composition for Distributed Stream Processing Systems, Thomas Repantis, Xiaohui Gu, Vana Kalogeraki, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 20, no. 7, pp. 968-982, July 2009.

Adaptive Component Composition and Load Balancing for Distributed Stream Processing Applications, Thomas Repantis, Yannis Drougas, Vana Kalogeraki, Springer Peer-to-Peer Networking and Applications (PPNA), vol. 2, no. 1, pp. 60-74, March 2009.

Refereed Book Chapters:

Data Dissemination and Query Routing in Mobile Peer-to-Peer Networks, Thomas Repantis, Vana Kalogeraki, Mobile Peer-to-Peer Computing for Next Generation Distributed Environments: Advancing Conceptual and Algorithmic Applications, IGI Global Publishing, pp. 26-49, May 2009.
Refereed Conference Proceedings:

Consistent Replication in Distributed Multi-Tier Architectures, Thomas Repantis, Arun Iyengar, Vana Kalogeraki, Isabelle Rouvellou, Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2011).

Keeping Track of 70,000+ Servers: The Akamai Query System, Jeff Cohen, Thomas Repantis, Sean McDermott, Scott Smith, Joel Wein, Proceedings of the 24th USENIX Large Installation System Administration Conference (LISA 2010).

Hot-Spot Prediction and Alleviation in Distributed Stream Processing Applications, Thomas Repantis, Vana Kalogeraki, Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008).
Acceptance rate (PDS track): 22%

Replica Placement for High Availability in Distributed Stream Processing Systems, Thomas Repantis, Vana Kalogeraki, Proceedings of the 2nd International Conference on Distributed Event-Based Systems (DEBS 2008).

Synergy: Sharing-Aware Component Composition for Distributed Stream Processing Systems, Thomas Repantis, Xiaohui Gu, Vana Kalogeraki, Proceedings of the 7th ACM/IFIP/USENIX International Middleware Conference (MIDDLEWARE 2006).
Acceptance rate: 17%

Load Balancing Techniques for Distributed Stream Processing Applications in Overlay Environments, Yannis Drougas, Thomas Repantis, Vana Kalogeraki, Proceedings of the 9th IEEE International Symposium on Object- and Component-Oriented Real-Time Distributed Computing (ISORC 2006).
Acceptance rate: 35%

A Case for Dynamic Page Migration in Multiple-Writer Software DSM Systems, Thomas Repantis, Christos D. Antonopoulos, Vana Kalogeraki, Theodore S. Papatheodorou, Proceedings of the 7th IEEE International Conference on Cluster Computing (CLUSTER 2005).
Acceptance rate: 33%

Data Dissemination in Mobile Peer-to-Peer Networks, Thomas Repantis, Vana Kalogeraki, Proceedings of the 6th International Conference on Mobile Data Management (MDM 2005).
Acceptance rate: 25%

Coordinated Media Streaming and Transcoding in Peer-to-Peer Systems, Fang Chen, Thomas Repantis, Vana Kalogeraki, Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005).
Acceptance rate: 34%

Dynamic Page Migration in Software DSM Systems, Thomas Repantis, Christos D. Antonopoulos, Vana Kalogeraki, Theodore S. Papatheodorou, Proceedings of the 6th IEEE International Conference on Cluster Computing (CLUSTER 2004) (poster session).

Refereed Workshop Proceedings:

Efficient Data Dissemination in Overlays, Dung Vu, Thomas Repantis, Vana Kalogeraki, Proceedings of the 1st International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009) (in conjunction with ISORC 2009).

Alleviating Hot-Spots in Peer-to-Peer Stream Processing Environments, Thomas Repantis, Vana Kalogeraki, Proceedings of the 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007) (in conjunction with VLDB 2007).
Acceptance rate: 21%

Decentralized Trust Management for Ad-Hoc Peer-to-Peer Networks, Thomas Repantis, Vana Kalogeraki, Proceedings of the 4th International Workshop on Middleware for Pervasive and Ad-Hoc Computing (MPAC 2006) (in conjunction with MIDDLEWARE 2006).
Acceptance rate: 25%

A Comprehensive Comparison of Routing Protocols for Large-Scale Wireless MANETs, Ioannis Broustis, Gentian Jakllari, Thomas Repantis, Mart Molle, Proceedings of the 3rd International Workshop on Wireless Ad Hoc and Sensor Networks (IWWAN 2006) (in conjunction with SECON 2006).

Adaptive Resource Management in Peer-to-Peer Middleware, Thomas Repantis, Yannis Drougas, Vana Kalogeraki, Proceedings of the 13th International Workshop on Parallel and Distributed Real-Time Systems (WPDRTS 2005) (in conjunction with IPDPS 2005).
Towards Self-Managing QoS-Enabled Peer-to-Peer Systems, Vana Kalogeraki, Fang Chen, Thomas Repantis, Demetris Zeinalipour-Yazti, Self-Star Properties in Complex Information Systems, Hot Topics in Computer Science, Springer LNCS, vol. 3460, 2005.

TECHNICAL REPORTS

Helping Query Scale with Region Views, Thomas Repantis, Daisy Deng, Internal Report, Akamai Technologies, 2014.

Logging Service Architecture Strategy and Design, Chris Stroberger, Thomas Repantis, Internal Report, Hewlett-Packard, 2005.

A Performance Comparison of Routing Protocols for Large-Scale Wireless Mobile Ad Hoc Networks, Ioannis Broustis, Gentian Jakllari, Thomas Repantis, Mart Molle, Technical Report UCR-CS-2003-12001, University of California, Riverside, 2003.

Analysis of the H.323 Protocol Suite, Thomas Repantis, Peter Sevenich, Technical Report FKIE-KOM 2000/4, FGAN e.V., 2000.

POSTERS

Synergy: Quality of Service Support for Distributed Stream Processing Systems, Thomas Repantis, Vana Kalogeraki, Xiaohui Gu, Graduate Research Awards and Colloquium, University of California, Riverside, 2008. Graduate research award.

Synergy: Quality of Service Support for Distributed Stream Processing Systems, Thomas Repantis, Vana Kalogeraki, Xiaohui Gu, Graduate Students Association Annual Research Conference, University of California, Riverside, 2008.

The Synergy Distributed Stream Processing Middleware, Thomas Repantis, Xiaohui Gu, Vana Kalogeraki, Board of Advisors Meeting, University of California, Riverside, 2007.

Replication Trade-Offs in Composite Distributed Applications, Thomas Repantis, Arun Iyengar, Isabelle Rouvellou, IBM Summer Student Poster Session 2007.

The Synergy Distributed Stream Processing Middleware, Thomas Repantis, Xiaohui Gu, Vana Kalogeraki, TechHorizons, University of California, Riverside, 2007.

Synergy: A Distributed Stream Processing Middleware, Thomas Repantis, Xiaohui Gu, Vana Kalogeraki, Graduate Research Awards and Colloquium, University of California, Riverside, 2007.
Graduate research honorable mention.

Defending Against Sybil Attacks in the Reliable Email Project, Thomas Repantis, Haifeng Yu, Michael Kaminsky, Phillip B. Gibbons, Abraham Flaxman, Poster and Demo Session, Intel Research Symposium, 2006.

Cooperative Media Processing and Streaming, Thomas Repantis, Fang Chen, Vana Kalogeraki, 7th Annual Industry Day Poster Session, University of California, Riverside, 2005. Second best graduate poster award.

Dynamic Page Migration in Software DSM Systems, Thomas Repantis, Christos D. Antonopoulos, Vana Kalogeraki, Theodore S. Papatheodorou, 6th Annual Industry Day Poster Session, University of California, Riverside, 2004.

TECHNICAL SKILLS

Operating Systems: UNIX (Linux, FreeBSD, Solaris, HP-UX), Windows, OS X, DOS.
Languages: C++, C, Java, Python, Perl, Ruby, Go, Scala, Shell, SQL, Assembly (i8085, i80x86, ADSP-21xx), JavaScript, HTML, CSS, XML, UML, LISP, PROLOG, FORTRAN, VB, BASIC.
APIs: STL, pthreads, TCP/IP (sockets), RPC, libasync, Node.js, Bootstrap, Express, async, Sequelize, Passport, EJS, JDBC, JNI, Kubernetes, FreePastry, Servlets, Swing.
Applications: gdb, make, maven, ant, OProfile, valgrind, TCMalloc, Coverity, CppUnit, JUnit, log4j, git, CVS, Subversion, Perforce, ClearCase, Eclipse, doxygen, tomcat, apache, mod_ssl, OpenSSL, Elasticsearch, Kibana, PostgreSQL, Slony-I, MySQL, matlab, spice, LabVIEW, PlanetLab, ns-2, NeuroGrid P2P Simulator, ComNet III, QualNet, ERwin, LaTeX, gnuplot, RRDtool.

SCHOLARSHIPS AND AWARDS

Elastic: ElastiSpot award for "coordinating the Elasticsearch team's on-call procedures", August 2023.
Elastic: Engineering monthly recognition award for "helping the whole Elasticsearch team evolve our escalation process", April 2022.
Akamai Technologies: "One Akamai" award for "great attitude and helpfulness", March 2021.
Akamai Technologies: "One Akamai" award for "collaboration and support", May 2020.
Akamai Technologies: "Urgency and Persistence" award for "exceptional attitude toward helping another team", January 2020.
Akamai Technologies: "Urgency and Persistence" award for "help on incidents", January 2020.
Akamai Technologies: "One Akamai" award for "dedication and hard work", January 2020.
Akamai Technologies: "One Akamai" award for "gamely pitching in when called to help out", October 2019.
Akamai Technologies: "One Akamai" award for "participating in the FedRAMP 2019 on-site assessment", June 2019.
Akamai Technologies: Spot award for "unending patience in assisting another team, by providing expertise, alternatives, and pure effort", November 2018.
Akamai Technologies: Spot award for "efforts, impact, and accomplishments above and beyond expectations", November 2015.
University of California, Riverside: Graduate research award, four recipients throughout the university, the only recipient from the Computer Science & Engineering Department, June 2008.
Gerondelis Foundation: Graduate study scholarship, May 2008.
IFIP: Student travel award for attending DSN 2008, June 2008.
University of California, Riverside: Graduate research honorable mention, four recipients throughout the university, the only recipient from the Computer Science & Engineering Department, April 2007.
ACM: Student travel award for attending MIDDLEWARE 2006, November 2006.
City of Riverside: Honorary residency for pursuing an international academic goal, May 2006.
University of California, Riverside: Second best graduate poster award at the 7th Annual Industry Day Poster Session, three recipients throughout the university, the only recipient from the Computer Science & Engineering Department, October 2005.
IEEE Computer Society, Technical Committee on Scalable Computing: Student travel award for attending CLUSTER 2005, September 2005.
IEEE Computer Society, Technical Committee on Parallel Processing: Student travel award for attending IPDPS 2005, April 2005.
University of California, Riverside: Dean's graduate fellowship award, September 2003-May 2005.
Erdos Number: 4 (Vana Kalogeraki, Dimitrios Gunopulos, Bela Bollobas, Paul Erdos).
University of Patras: Honor for the highest GPA in the Department of Electrical & Computer Engineering graduating class, March 2003.
Zosima Foundation: Scholarship for excellence during university studies, received after exams administered by the Greek Ministry of Education, September 1997-May 2002.
International Association for the Exchange of Students for Technical Experience, German Academic Exchange Service: Scholarship for practical traineeship abroad, July 2000-August 2000.
City of Patras: Honor for ranking first in the national exams for admission to the Department of Electrical & Computer Engineering of the University of Patras, approximately 150,000 total participants, September 1997.
Greek Ministry of Education: Annual honors and awards for the highest GPA in class, 1990-1996.

PROFESSIONAL ACTIVITIES

Program Committee member for IDCS'14, IDCS'13, ICDCS'12, IDCS'12, IDCS'11, CDN'10.
Reviewer for IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Mobile Computing, ACM Transactions on Autonomous and Adaptive Systems, IEEE Intelligent Systems, Elsevier Information Sciences, Elsevier Computer Networks, Elsevier Computer Communications, Wiley Security and Communication Networks, IGI Handbook on Mobile P2P Computing, Springer Peer-to-Peer Networking and Applications, Springer Frontiers of Computer Science, Journal of Zhejiang University, IDCS'12, ICDCS'11, IDCS'11, SoftCOM'10, DSN'09, IPDPS'09, ICC'08, CCNS'08, PV'07, ISORC'06, LCN'05, ICPS'05, RTAS'04.
External Reviewer for IEEE TPDS, IPDPS'11, WWW'08, RTSS'08, WOWMOM'08, NETWORKING'08, ICDCS'07, ICPP'07, DOA'07, NETWORKING'07, INFOSCALE'07, DSN'06, SIGCOMM'05, RTSS'05, PODS'05, MDM'05, GLOBECOM'04, DBISP2P'04.
Session Chair for CollaborateCom'11.
Graduate of Akamai Management Academy.
Founder of Akamai software engineering book club.
Mentor for Elastic Employee Peer-2-Peer Program.
Mentor for the UC Riverside Student Alumni Mentor Program.
Mentor for STEM students in MentorNet.
Volunteer for Web/IT team of MIT European Career Fair'10.
Volunteer for HP Employee Demo Day'05.
Student Volunteer for CLUSTER'04.

LANGUAGES

English: Cambridge Certificate of Proficiency in English: Grade A, June 1999
German: Grosses Deutsches Sprachdiplom: Gut, November 2002
Greek: Native