State Machine Replication (SMR) protocols form the backbone of many distributed systems. Enterprises and startups increasingly build their distributed systems on the cloud because of its advantages, such as scalability and cost-effectiveness. One of the first technical questions a company faces when building a system on the cloud is which programming language to use. Among the many factors in this decision is whether to use a language with garbage collection (GC), such as Java or Go, or a language with manual memory management, such as C++ or Rust. Today, companies predominantly prefer languages with GC, like Go, Kotlin, or even Python, for their ease of development; however, there is no free lunch: GC consumes resources (memory and CPU) and hurts performance (long tail latencies due to GC pauses). Although there have been anecdotal reports of reduced cloud cost and improved tail latencies after switching from a language with GC to a language with manual memory management, there has so far been no systematic study of the GC overhead of running an SMR-based cloud system. This paper studies the overhead of running an SMR-based cloud system written in a language with GC. To this end, we design from scratch a canonical SMR system, a MultiPaxos-based replicated in-memory key-value store, and implement it in C++, Java, Rust, and Go. We compare the performance and resource usage of these implementations when running on the cloud under different workloads and resource constraints and report our results. Our findings have implications for the design of cloud systems.