The cgroup freezer is useful to batch job management systems which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks.
We can demonstrate this problem using nested bash shells:

	$ echo $$
	16644
	$ bash
	$ echo $$
	16690

	From a second, unrelated bash shell:
	$ kill -SIGSTOP 16690
	$ kill -SIGCONT 16690

	<at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
Reading will return the current state.

Note that freezer.state doesn't exist in the root cgroup, which means the
root cgroup is non-freezable.

* Examples of usage :

	# mkdir /sys/fs/cgroup/freezer
	# mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
	# mkdir /sys/fs/cgroup/freezer/0
	# echo $some_pid > /sys/fs/cgroup/freezer/0/tasks

to get status of the freezer subsystem :

	# cat /sys/fs/cgroup/freezer/0/freezer.state
	THAWED

to freeze all tasks in the container :

	# echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
	# cat /sys/fs/cgroup/freezer/0/freezer.state
	FREEZING
	# cat /sys/fs/cgroup/freezer/0/freezer.state
	FROZEN

to unfreeze all tasks in the container :

	# echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
	# cat /sys/fs/cgroup/freezer/0/freezer.state
	THAWED

This is the basic mechanism which should do the right thing for user space
tasks in a simple scenario.

It's important to note that freezing can be incomplete. In that case we return
EBUSY.
This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

	1) Userspace cancels the freezing operation by writing "THAWED" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EINVAL)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.
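
The retry and cancel options listed above can be combined in a small shell
helper. The following is only a sketch, not part of the freezer interface:
the function name freeze_cgroup, the default of 10 attempts and the one
second delay between attempts are illustrative assumptions. It expects the
path of a cgroup directory containing freezer.state, as in the examples above.

```shell
# freeze_cgroup DIR [TRIES]  (hypothetical helper, for illustration only)
# Write "FROZEN" to DIR/freezer.state until the state reads back as
# "FROZEN" or TRIES attempts have been made. On failure, cancel the
# freeze by writing "THAWED" and return non-zero.
freeze_cgroup() {
	fc_dir=$1
	fc_tries=${2:-10}	# assumed default, not mandated by the kernel

	while [ "$fc_tries" -gt 0 ]; do
		# The write itself may fail with EBUSY; ignore the error
		# and rely on the state read back below.
		echo FROZEN > "$fc_dir/freezer.state" 2>/dev/null || true

		fc_state=$(cat "$fc_dir/freezer.state")
		if [ "$fc_state" = "FROZEN" ]; then
			return 0
		fi

		# Still "FREEZING" (or another state): wait, then retry.
		fc_tries=$((fc_tries - 1))
		sleep 1
	done

	# Give up and cancel the partial freeze.
	echo THAWED > "$fc_dir/freezer.state"
	return 1
}
```

For example, "freeze_cgroup /sys/fs/cgroup/freezer/0 5" would make up to five
attempts and leave the cgroup thawed if none of them succeeds.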