Server load balancers (LBs) are critical components of interactive services, routing client requests to servers in a pool. LBs improve service performance and increase availability by spreading the request load evenly across servers. It is time to rethink what LBs can do for applications. As application compute becomes increasingly granular (e.g., microservices), request-processing latencies at servers will be ever more impacted by software and system variability at small time scales (e.g., 100μs-1ms). Beyond balancing load, we argue that LBs must actively optimize application response time by adapting request routing to rapidly varying server performance. Specifically, we advocate for in-band feedback control: LBs should adapt the request-routing policy using purely local observations of server performance, derived from requests traversing the LB. A key challenge in designing such feedback controllers is that high-speed LBs see only the requests, not the responses. We present the design of an LB that adapts to a server latency inflation of 1 ms and reduces tail latencies within milliseconds, while observing only client-to-server traffic.
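To make the in-band idea concrete, here is a minimal sketch of a feedback-controlled LB, under assumptions of our own: the class name `InbandFeedbackLB`, the EWMA smoothing, and in particular the proxy signal (the gap between consecutive requests on the same persistent connection, which loosely tracks how long the previous response took) are illustrative choices, not the paper's actual design. The point is only that a controller can adjust per-server routing weights while seeing client-to-server traffic alone.

```python
import random


class InbandFeedbackLB:
    """Hypothetical sketch of an in-band feedback-controlled load balancer.

    The LB never sees responses. As an illustrative proxy signal, it uses
    the time gap between consecutive requests on the same persistent
    connection as a rough estimate of the previous response's latency.
    """

    def __init__(self, servers, alpha=0.2):
        self.servers = list(servers)
        self.alpha = alpha  # EWMA smoothing factor for latency estimates
        self.latency_est = {s: 1e-3 for s in self.servers}  # seconds
        self.last_request = {}  # conn_id -> (server, timestamp)

    def on_request(self, conn_id, now):
        # Feedback step: update the proxy latency estimate for the server
        # this connection last used, based only on request arrival times.
        if conn_id in self.last_request:
            server, t_prev = self.last_request[conn_id]
            gap = now - t_prev
            est = self.latency_est[server]
            self.latency_est[server] = (1 - self.alpha) * est + self.alpha * gap
        server = self.pick_server()
        self.last_request[conn_id] = (server, now)
        return server

    def pick_server(self):
        # Control step: route in inverse proportion to estimated latency,
        # so a server whose proxy latency inflates receives less traffic.
        weights = [1.0 / self.latency_est[s] for s in self.servers]
        total = sum(weights)
        r = random.random() * total
        for s, w in zip(self.servers, weights):
            r -= w
            if r <= 0:
                return s
        return self.servers[-1]
```

A server whose estimated latency inflates (say, from 1 ms to 10 ms) quickly loses routing weight, which is the response-time-optimizing behavior the abstract argues for; a real design must additionally cope with noise in such request-only signals.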