Cool Tidbits from “Linux System Programming”

Recently I’ve undertaken a period of deep study related to Linux in all its aspects, but especially for embedded programming. While I’ve used Linux off and on professionally and privately for at least 18 years, my knowledge has always been just enough to get by.

Starting in about 1999 I’ve set up and managed Linux firewall / routers for my home and office use, including DNS, email, email list, and web servers. I continue this to this day.

In 2005, I created a C++ cross-platform framework for using sensors and motion controllers on mobile robots, which I got to work under Linux and Windows with various built-in or external I2C or serial port devices as the attachment points.

From 2013-2017, my team at EM used various off-the-shelf single board computers such as the Pandaboard or Dragonboard as test platforms. We ran the Dragonboard with Debian and wrote a lot of utilities and test software for the ASIC we were developing on it. While I didn’t write the core C framework, I did need to, at times, troubleshoot bugs deep within it.

But programming-wise, I have not otherwise had the need to do much beyond simple POSIX-compliant C programs under Linux, and some occasional PHP, Python, and Java work. This was primarily as a result of depending on Windows for office and embedded toolchain support. In my career I’ve spent a lot of time deep into smaller embedded systems using PIC, AVR, ARM, ARC, and MSP430 processors, with or without an RTOS, which kept me away from digging deeper into Linux.

The first Linux book I finished reading is the O’Reilly book, Linux System Programming by Robert Love. Overall, I found it well written and easy to read. I like that, even though it is a Linux book, he points out which of the various POSIX, System V, BSD, and Linux APIs are portable or not, and which are best avoided.

There were no major surprises in the content for me, as I was familiar with most of the concepts, but certainly there were interesting APIs and command line tools as well as some higher-level concepts which I was not aware of. Here is a list of some things that stood out.

system call interface: on the i386, user code loads parameters into registers ebx, ecx, and so on, then invokes int 0x80 to cause a trap into the kernel; very similar to the now ancient DOS int 21h scheme
standards: the history of POSIX (Portable Operating System Interface), SUS (Single UNIX Specification), and LSB (Linux Standard Base); I remember reading about the Unix Wars, OSF, and X/Open, but didn’t recall the merger that formed the Open Group, which then released SUS
inodes and how hard links work: two or more directory entries point to the same inode; a link count is maintained to ensure the contents are not removed until all hard links are; cannot span filesystems; the stat() family of functions returns a stat structure containing the inode number and hard link count (among other useful tidbits I was already familiar with)
symbolic links: essentially a regular file containing the path name of the linked-to file, which can be on any filesystem
processes: process id 1 is always the init process, while process id 0 is the idle process
process groups: represent a parent process and its children, such as happens when a shell starts up a pipeline (e.g., ls | less), and provides a way then to send signals to or get info from an entire pipeline or all children thereof
forking: if a parent process terminates before its child, the child is reparented to the init process
zombies: a terminated process is a zombie until it has been waited on; the init process will clean up the zombies as it becomes their parent (if the parent terminated first) and thus can wait on them; this zombie state is necessary so parent processes can obtain information about why a child terminated, such as its return value
waiting on processes: just like with many of these topics, there are many different functions providing varying levels of control for waiting on a process and obtaining info from it: wait(), waitpid(), waitid(), wait3(), and wait4(); the latter two provide lots of resource usage statistics such as memory use, page faults / swaps, block I/O operations, messages sent/received, signals received, and context switch counts
open(): O_ASYNC requests that a signal be sent when the FIFO, pipe, socket, or terminal becomes readable or writeable; O_DIRECT requests direct I/O; O_NONBLOCK requests non-blocking I/O (usually for FIFOs); O_SYNC requests synchronous writes
pread() and pwrite(): positional equivalents of lseek() + read() or write(), which ignore the file position and leave it alone
multiplexed I/O: select(), pselect(), poll(), ppoll(), and the epoll family (epoll_create1(), epoll_ctl(), epoll_wait()) all allow a process to block on a set of file descriptors until one of them is ready to be read or written, with varying control over how signals are handled; epoll appears to be the superior choice on Linux
buffered I/O: besides the familiar fopen(), fread(), etc., there are _unlocked() equivalents which are unsafe but give a sizable performance improvement compared to the standard locking functions
scatter/gather I/O: reads or writes contiguously from or to a file using one or more segments of memory, where each segment can reside at a different location and have a different size; interestingly, this does not provide a way to modify the position within the file between segments, so to me, it is not truly scatter/gather (perhaps my opinion is colored by the old SCSI bus scatter/gather concept, which kind of does that); functions are readv() and writev()
memory mapped files: while I’m familiar with the concept, the details of what can be done on Linux are interesting; for example, you have control over protection (read, write, exec), you can make it private or shared (with other processes that open the same file), and memory must be aligned to MMU page size boundaries and be of multiples of a page in size; you can give the kernel hints about how it will be used so it can optimize its read ahead strategy using madvise(); mmap() is the main function involved;
by setting the MAP_ANONYMOUS flag, one can create a mapping not backed by a file; the kernel initially maps such pages copy-on-write to an already zeroed page, so the mapped memory returned will already be cleared (passing NULL as the starting address simply lets the kernel choose where to place the mapping)
normal file I/O: a similar posix_fadvise() allows you to help the kernel optimize read ahead for an unmapped file too
copy-on-write: this MMU optimization strategy prevents wasting time copying memory from a parent to a forked child unless the child modifies it; if never written, they share the same pages of physical RAM; but if written, new pages are allocated, the original contents are copied there, and then the process can write cleanly and uniquely there
user and group ids: a process actually has four user IDs and four group IDs associated with it — real, effective, saved, and filesystem; and there are APIs to read and modify them (though root privileges are needed for many modifications, as one would expect)
sessions and session leaders: this is associated with a login shell and a controlling terminal; a session is a collection of one or more process groups
daemons: a daemon is simply a process with no controlling terminal, running as a child of init; this can be done by calling fork() and exiting from the parent, then, from the child, calling setsid() to create a new process group and session and cleaning up various file descriptors (such as 0, 1, 2 = stdin, stdout, stderr); or, the process can simply call daemon()
processor affinity: this provides the ability to control which processor in a multicore system the process will run in; this is known as hard affinity, and can help when a given process is very sensitive to the CPU cache; if not set, the kernel uses soft affinity, which tries to keep a process running on the same CPU each time its timeslice occurs, but this is not guaranteed; calls include sched_setaffinity() and sched_getaffinity()
real time support: besides setting the nice() value or the process priority, the scheduling policy can be set to either FIFO, Round Robin, or Other (default); FIFO and Round Robin help ensure that response latency for a process that handles an external signal is predictable; calls include sched_setscheduler() and sched_getscheduler(); pro tip from the book: “while developing a real time process, keep a terminal open, running as a real-time process with a higher priority than the process in development”, which ensures you can kill your process if it runs amok; the util-linux package of tools includes the chrt utility, which helps you set real-time attributes on other processes
extended file attributes: while I was familiar with EAs in NTFS on Windows, POSIX (and Linux) provides a relatively generic file-system-agnostic mechanism to associate key-value pairs with files (though not all Linux filesystems support this); often this information is stored in unused portions of a file’s inode; namespaces are provided for system, security, trusted, and user; functions include setxattr() and getxattr() to set and get a specific key’s value, removexattr() to delete a key, and listxattr() to get a list of all keys; it makes me wonder what information is commonly hidden there!
special device nodes: besides the commonly known /dev/null, there are also /dev/zero, which returns an infinite stream of zeros when read and accepts and discards writes as /dev/null does, and /dev/full, which reads like /dev/zero but whose writes fail immediately with ENOSPC; these are useful for testing purposes
monitoring file events: as Windows does, Linux provides a mechanism to watch changes of various kinds to specified file or directory paths; a single notifier can handle multiple files, and behaves like a file, so reading notifications is done using a normal read() call, and the file descriptor can be waited on with any of the multiplexed I/O mechanisms; functions include inotify_init1(), inotify_add_watch(), inotify_rm_watch(), and close(); there are a lot of options for controlling what events you are interested in watching
memory locking: high performance programs can benefit by locking important regions of their memory against being paged out to swap by the kernel, using mlock() to lock a specific range of addresses or mlockall() to lock a process’s entire address space in physical memory; there are of course equivalent functions for unlocking: munlock() and munlockall()
signals: the book touches on some important weaknesses regarding signals in Linux which must be understood to avoid serious problems when using them; for example, a process could be executing anywhere, including in a system call, so signal handlers need to stick to reentrant, signal-safe library functions; a process that needs to manage multiple signals can combine them in a signal set using functions like sigaddset(), sigismember(), etc.; sigprocmask() can block specific signals to protect critical regions; in addition to the simple signal() function, sigaction() provides a much more powerful way to handle signals, including the ability to block specific signals while inside your signal handlers, and gives the handler a lot of information about what was going on when the signal occurred; sigqueue() provides a way to send a payload together with a signal, which sigaction()’s SA_SIGINFO type handler is passed when it is called
time: besides the familiar time_t, struct tm, and time() functions, Linux supports 5 different POSIX clocks, which include CLOCK_REALTIME (the normal system time), CLOCK_MONOTONIC (won’t go backwards during leap seconds, for example), CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID, the latter two of which give access to the x86 high resolution CPU registers; clock_getres() tells you what resolution the specified clock has, and clock_gettime() obtains time from that clock; clock_nanosleep() lets you sleep using one of these clocks for relative or absolute times, and returns the amount of time remaining if the sleep was interrupted by a signal; rather than sleeping, timers can be set up; of note are the advanced timers using timer_create(), timer_settime(), and timer_delete(); such timers can use any of the POSIX clocks, can either send a signal or spawn a thread to execute the specified handler function, and can return to you the amount of time a timer might have overrun with timer_getoverrun()
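To make the hard link item in the list above a little more concrete, here is a minimal sketch of my own (not taken from the book). It assumes /tmp sits on a filesystem that supports hard links; the function name hard_link_demo() and the temporary file names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a temp file, adds a second hard link,
 * and returns the link count seen through the second name (-1 on error).
 * *same_inode is set to 1 when both names resolve to the same inode. */
long hard_link_demo(int *same_inode)
{
    char path[] = "/tmp/hl_demo_XXXXXX";   /* assumed scratch location */
    char link_path[sizeof(path) + 8];
    struct stat a, b;
    long nlink = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(link_path, sizeof(link_path), "%s.link", path);
    if (link(path, link_path) == 0) {
        if (stat(path, &a) == 0 && stat(link_path, &b) == 0) {
            /* Two directory entries, one inode on one device. */
            *same_inode = (a.st_ino == b.st_ino && a.st_dev == b.st_dev);
            nlink = (long)b.st_nlink;
        }
        unlink(link_path);
    }
    unlink(path);   /* the data is freed once the last link is gone */
    return nlink;
}
```

Calling hard_link_demo(&same) should report a link count of 2 with same set to 1, since both names point at one inode.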
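The zombie and waiting items above can be sketched in a few lines. This is only an illustration under simple assumptions (one child, no signal handling); reap_child() is a hypothetical name, not something from the book.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code`, then reaps it.
 * Returns the child's exit status, or -1 on failure. */
int reap_child(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    /* From its exit until this call returns, the child is a zombie:
     * the kernel keeps just enough state to report how it ended. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Swapping waitpid() for wait4() would additionally fill in a struct rusage with the resource statistics mentioned in the list.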
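The scatter/gather item above is easier to picture with code. In this sketch (hypothetical function name, pipe used only to keep it self-contained), three separate strings are gathered into one writev() call and then scattered by one readv() into two buffers of different sizes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes read back, or -1.
 * `first` must hold at least 6 bytes and `rest` at least 32 bytes. */
ssize_t scatter_gather_demo(char *first, char *rest)
{
    char a[] = "Linux ", b[] = "System ", c[] = "Programming";
    struct iovec out[3], in[2];
    int fds[2];
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    /* Gather: three segments at unrelated addresses, one system call. */
    out[0].iov_base = a; out[0].iov_len = strlen(a);
    out[1].iov_base = b; out[1].iov_len = strlen(b);
    out[2].iov_base = c; out[2].iov_len = strlen(c);
    if (writev(fds[1], out, 3) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Scatter: the stream fills `first` completely, then spills into `rest`. */
    memset(first, 0, 6);
    memset(rest, 0, 32);
    in[0].iov_base = first; in[0].iov_len = 5;    /* leave room for NUL */
    in[1].iov_base = rest;  in[1].iov_len = 31;
    n = readv(fds[0], in, 2);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Note that the segments are always consumed in order as one contiguous byte stream, which matches the observation in the list that there is no way to reposition within the file between segments.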

Clearly a lot of powerful features are available in Linux beyond the normal libc functions I’ve used for years. These will be very useful as I embark on digging deeper into embedded Linux.

After reading this book, I now want to revisit that robot framework I mentioned and take advantage of the many new (2.6.x kernel and later) Linux mechanisms described above.

Next up: the excellent Packt Publishing book Mastering Embedded Linux Programming by Chris Simmonds.
