How does progress(1) work?
We’ll cover a neat little utility called
progress(1). Many common utilities like
gzip don’t spit out a progress bar by default.
progress finds those
processes and estimates how far along they are with their operation. For
example, if you’re copying a
progress will indicate
that it’s progressed
1Gb, and has another
9Gb to go.
Here’s an example, kindly borrowed from the project’s README:
What I was interested in is, how does it work? The README briefly goes
over it, but I wanted to go a little deeper. Fortunately, it’s a fairly simple C
program. While this utility works on MacOS, I’ll cover how it works on Linux.
For MacOS, the methods for obtaining the information about the file-descriptors
and processes is slightly different, utilizing a library called
to the absence of the
/proc file-system. That’s the depth we’ll cover with
At the heart of
progress, we find the function
On Linux, every process exposes itself as a directory on the file-system in
/proc/<pid>. In the directory, there’s e.g. the
exe file is a
link pointing to the binary that the process is executing, this could be for
/bin/tar. There’s many other interesting links and files in here. I
environ regularly in production to check which environment variables a
process has open. Other files will you about its memory usage, various process
configuration, or its priority if the OOM-killer is looking for its next target.
progress will look through the
exe links for all processes on the system to
find interesting binaries, like
md5sum, and many more.
For each of these processes, it’ll scan every file descriptor the process has
opened through the
/proc/<pid>/fdinfo directories. These
contain ample information about the file, such as the name of the file, the
size, what position we’re reading at, and so on.
progress will skip file
descriptors that are invalid or are not for files, e.g. a socket.
progress will find the biggest file-descriptor opened by the process, e.g.
cp is copying and see what offset in the file the process is at.
Based on that, the total file size, and waiting a second before doing a second
read it can estimate the process of the process and its throughput.
progress has done this for all processes, it’ll either quit or do it all
over again (this only takes a few milliseconds). To the user, this appears as
continues monitoring of the processes’ progress!
Of course, this simple method has its limitations. If you’re copying a lot of
small files, then it won’t help you very much. It could be extended to detect
such programs and monitor them, but it’s certainly not trivial. The way this
works also limits its usability in networks, depending on how the network
program is written. If it streams a file locally as it transfers it, it’ll work
well, but if it loads the whole thing into memory and then transfers it,
progress won’t know what to do. From the documentation, it appears that this
works well for downloads by many browsers. Presumably because they pre-allocate
a large file based on the header of the content-length.
progress can then
monitor how far along the offset we are.
You might also like...