A primary challenge in building reliable and secure computer systems is managing the persistent state of the system: all the executable files, configuration settings, and other data that govern how a system functions. The difficulty comes from the sheer volume of this persistent state, the frequency of changes to it, and the variety of workloads and requirements that call for customizing it. The cost of not managing a system's persistent state effectively is high: configuration errors are the leading cause of downtime at Internet services, troubleshooting configuration problems is a leading component of total cost of ownership in corporate environments, and malware (effectively, unwanted persistent state) is a serious privacy and security concern on personal computers. In this paper, we analyze how computer systems dynamically interact with files and configuration settings in order to gain insight into the problem of persistent state management. We analyze over 3648 machine-days of these persistent state interactions, collected over an 8-month period from 193 machines. These machines run real workloads and include Internet servers, corporate desktops, and home machines. We characterize the scope and magnitude of the persistent state management problem today, measuring not only the gross characteristics of persistent state but also how applications use it and when administrators and users modify it. We find that monitoring persistent state interactions provides important visibility, and we show how it can serve as a foundation for building better persistent state management tools.