Why do I get inconsistent signals for orphaned process group?

I’m working through an example in Advanced Programming in the UNIX Environment:

#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

static void
sig_hup(int signo)
{
    printf("SIGHUP received, pid = %ld\n", (long)getpid());
}

static void
pr_ids(char *name)
{
    printf("%s: pid = %ld, ppid = %ld, pgrp = %ld, tpgrp = %ld\n",
        name, (long)getpid(), (long)getppid(), (long)getpgrp(),
        (long)tcgetpgrp(STDIN_FILENO));
    fflush(stdout);
}

int
main(void)
{
    char    c;
    pid_t   pid;

    pr_ids("parent");
    if ((pid = fork()) < 0) {
        exit(1);
    } else if (pid > 0) {   /* parent */
        sleep(5);       /* sleep to let child stop itself */
    } else {            /* child */
        pr_ids("child");
        signal(SIGHUP, sig_hup);    /* establish signal handler */
        printf("test\n");
        kill(getpid(), SIGTSTP);    /* stop ourself */
        pr_ids("child");    /* prints only if we're continued */
        if (read(STDIN_FILENO, &c, 1) != 1)
            printf("read error %d on controlling TTY\n", errno);
    }
    exit(0);
}

The idea is that:

  • We create a child process
  • It has a handler for SIGHUP
  • We stop the child with SIGTSTP
  • The parent terminates and the child is orphaned (a member of an orphaned process group)
  • POSIX.1 requires that stopped processes be sent SIGHUP (which we handle) followed by SIGCONT
  • This should run the child’s SIGHUP handler and continue the process

When running this program from the shell, I get flaky results. The example in the book shows this output:

$ ./a.out
parent: pid = 6099, ppid = 2837, pgrp = 6099, tpgrp = 6099
child: pid = 6100, ppid = 6099, pgrp = 6099, tpgrp = 6099
$ SIGHUP received, pid = 6100
child: pid = 6100, ppid = 1, pgrp = 6099, tpgrp = 2837
read error 5 on controlling TTY

However, my results are:

  • either no message that SIGHUP was received
$ ./a.out
parent: pid = 294729, ppid = 291330, pgrp = 294729, tpgrp = 294729
child: pid = 294730, ppid = 294729, pgrp = 294729, tpgrp = 294729
  • or SIGHUP was received but the child didn’t print its pid output (so it didn’t continue correctly)
$ ./a.out
parent: pid = 295709, ppid = 291330, pgrp = 295709, tpgrp = 295709
child: pid = 295710, ppid = 295709, pgrp = 295709, tpgrp = 295709
SIGHUP received, pid = 295710
  • or the child did print its output after SIGCONT, but I didn’t get an error on the tty.
$ ./a.out
parent: pid = 294722, ppid = 291330, pgrp = 294722, tpgrp = 294722
child: pid = 294723, ppid = 294722, pgrp = 294722, tpgrp = 294722
SIGHUP received, pid = 294723
child: pid = 294723, ppid = 2065, pgrp = 294722, tpgrp = 294722

I’d like to get an idea on why there is such variance between the results.

What you’re experiencing is expected. The kernel’s part is deterministic: when the parent exits, the stopped child’s newly orphaned process group receives SIGHUP and then SIGCONT. What varies is what happens next, because the resumed child races with your shell, which is taking the terminal back at the same moment.

This example in Advanced Programming in the UNIX Environment is a great case study in how process groups, terminal control, and signal semantics work in POSIX. But because it depends on precise interactions between the kernel, the terminal driver, and your shell, slight variations in timing, terminal configuration, or shell behavior can change the result.

Let’s break down what’s happening and why you see different outputs:


What the program is designed to demonstrate

  1. The child installs a SIGHUP handler and stops itself (SIGTSTP).
  2. The parent sleeps briefly, then exits, orphaning the child.
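  3. POSIX requires that when a process’s exit causes a process group to become orphaned, and any member of that newly orphaned group is stopped, every process in the group is sent SIGHUP followed by SIGCONT. The kernel does this as part of the parent’s exit; the terminal plays no part in it.
  4. The child runs its SIGHUP handler and then, having received SIGCONT, resumes after the kill(getpid(), SIGTSTP) line.
  5. It tries to read from the terminal. The child still has its controlling terminal, but once the shell has made its own process group the foreground group again, the child is in a background group. A background read would normally raise SIGTTIN, but that would stop the child and nothing would ever continue an orphaned group, so read() fails with errno == 5 (EIO) instead and the error message is printed.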
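To see that the kernel’s part really is deterministic, the orphaning can be reproduced without any terminal. This is a sketch under stated assumptions, not the book’s code: the helper name observe_orphan_signals and the pipe used for reporting are hypothetical; the middle process calls setsid() so that whichever process inherits the grandchild is guaranteed to be outside its session; and the grandchild stops with SIGSTOP instead of SIGTSTP, because SIGSTOP cannot be caught, blocked, or ignored through an inherited disposition.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;                  /* pipe write end, used by the grandchild */

static void report(char c)
{
    ssize_t n = write(report_fd, &c, 1);    /* write() is async-signal-safe */
    (void)n;                                /* nothing useful to do on failure */
}

static void on_hup(int signo)
{
    (void)signo;
    report('H');
}

/* Hypothetical helper: builds a stopped process group, orphans it, and
 * collects up to cap bytes reported by its member: 'H' when the SIGHUP
 * handler runs, 'C' once the process continues. Returns bytes read or -1. */
static ssize_t observe_orphan_signals(char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t mid = fork();
    if (mid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (mid == 0) {                         /* middle process: the book's parent */
        close(fds[0]);
        setsid();                           /* fresh session without a terminal */
        pid_t child = fork();
        if (child == 0) {                   /* grandchild: the book's child */
            struct sigaction sa;
            sigset_t none;

            report_fd = fds[1];
            memset(&sa, 0, sizeof sa);
            sa.sa_handler = on_hup;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGHUP, &sa, NULL);
            sigemptyset(&none);
            sigprocmask(SIG_SETMASK, &none, NULL); /* don't inherit a blocked SIGHUP */
            setpgid(0, 0);                  /* own group; parent is in the same session */
            kill(getpid(), SIGSTOP);        /* stop ourself */
            report('C');                    /* reached only after SIGCONT */
            _exit(0);
        }
        close(fds[1]);
        if (child > 0)
            waitpid(child, NULL, WUNTRACED); /* wait until it has really stopped */
        _exit(0);                           /* this exit orphans the grandchild's group */
    }

    close(fds[1]);
    waitpid(mid, NULL, 0);                  /* reap the middle process */

    size_t got = 0;
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    while (got < cap && poll(&pfd, 1, 5000) > 0) { /* give up after 5 s of silence */
        ssize_t n = read(fds[0], buf + got, cap - got);
        if (n <= 0)
            break;                          /* EOF: the grandchild has exited */
        got += (size_t)n;
    }
    close(fds[0]);
    return (ssize_t)got;
}
```

On Linux this should report the two bytes H and C on every run: H from the SIGHUP handler, then C once SIGCONT has resumed the process. Since no terminal is involved, none of the variation described in the next section applies.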

Why you’re seeing inconsistent results

1. Race conditions and timing

The parent only sleep(5)s, hoping that during that time:

  • The child gets stopped (SIGTSTP)
  • The child is still stopped when the parent exits

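With a 5-second sleep this nearly always works, because kill(getpid(), SIGTSTP) stops the child before kill() even returns. Still, sleep() is only a guess: on a heavily loaded system the child might not have stopped yet when the parent exits, and then the newly orphaned group has no stopped member, so no SIGHUP or SIGCONT is sent at all.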

Fix: Use waitpid(pid, &status, WUNTRACED) in the parent instead of sleep(5) to ensure the child has really stopped.

waitpid(pid, NULL, WUNTRACED);  // replace sleep(5)

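2. What happens next depends on your shell and terminal

The SIGHUP and SIGCONT themselves come from the kernel, as part of the parent’s exit; neither your shell nor your terminal emulator sends them. What varies is what the resumed child runs into, depending on:

  • Your shell (bash, zsh, fish): when the parent exits, the shell sees its job finish and calls tcsetpgrp() to take the terminal back. The child is resumed at about the same moment, so whether the child prints and reads before or after that switch is a race.
  • Whether you’re running under tmux, screen, or a graphical terminal emulator: these change the timing of that race, not the rules.
  • Whether job control is enabled: without it (for example, in an IDE console that has no real terminal), the program does not get its own process group, and the book’s scenario does not play out as described.
  • Whether the terminal’s tostop flag is set (see stty -a): if it is, a write from a background process in an orphaned process group fails with EIO instead of producing output. Combined with the race above, that could explain your first two runs: whatever the child writes before the shell reclaims the terminal appears, and whatever it writes afterward is silently dropped.

… all of these affect the result.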
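Whether tostop is on can be checked from inside the program as well as with stty -a. A minimal sketch; tostop_enabled is a hypothetical helper name, built only on the standard tcgetattr() call and the TOSTOP flag from termios.h:

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the terminal on fd has TOSTOP set (the "tostop"
 * flag shown by stty -a), 0 if it is clear, -1 if fd is not a terminal. */
static int tostop_enabled(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    return (t.c_lflag & TOSTOP) != 0;
}
```

Calling tostop_enabled(STDOUT_FILENO) at the start of main() in the book’s program would show whether the dropped-output explanation can apply on your terminal at all.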

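3. The child’s group may still own the terminal

read() fails with EIO only if the shell has already taken the terminal back. If the child gets there first, its process group is still the terminal’s foreground group, so read() is an ordinary foreground read: it does not fail and waits for input instead. Your third run matches this: tpgrp = 294722 is the child’s own process group, whereas in the book’s output tpgrp = 2837 is the shell’s. (The ppid = 2065 in that run, rather than 1, is a separate and harmless difference: modern Linux reparents orphans to the nearest “subreaper” process, such as systemd --user, instead of always to init.)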
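To see which side won the race in a particular run, the child can compare its own process group with the terminal’s foreground group just before calling read(). A sketch; in_foreground is a hypothetical name, and it only wraps the standard tcgetpgrp() and getpgrp() calls that pr_ids already uses:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic: 1 if the caller's process group is currently the
 * foreground process group of the terminal on fd, 0 if it is in the
 * background, -1 if tcgetpgrp() fails (e.g. fd is not the controlling tty). */
static int in_foreground(int fd)
{
    pid_t fg = tcgetpgrp(fd);

    if (fg < 0)
        return -1;
    return fg == getpgrp();
}
```

In the book’s child, a result of 1 right before read() corresponds to your third run (read() waits for input), and 0 corresponds to the book’s output (read() fails with EIO). The answer can change between the check and the read, so treat this as a diagnostic, not a fix.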


Recommendations to get stable behavior

1. Use waitpid() instead of sleep() in the parent:

waitpid(pid, NULL, WUNTRACED);

This ensures the child is actually in a stopped state before the parent exits.
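The one-line call above ignores waitpid()’s result. A slightly more defensive version, sketched with a hypothetical helper name, retries if a signal interrupts the wait and tells the caller whether the child really stopped rather than exiting:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: blocks until child pid stops. Returns 0 if it
 * stopped, -1 if it exited, was killed, or waitpid() failed. */
static int wait_until_stopped(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, WUNTRACED);
    } while (r < 0 && errno == EINTR);      /* retry if a signal interrupted the wait */

    if (r != pid)
        return -1;                          /* e.g. ECHILD */
    return WIFSTOPPED(status) ? 0 : -1;
}
```

In the book’s parent branch, wait_until_stopped(pid) would take the place of sleep(5). A return value of -1 means the child never stopped, so the orphaned-group signals will not be sent.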

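2. Use async-signal-safe output in the handler (optional)

printf() is not async-signal-safe, so it should not be called from a signal handler. Output buffering is not the reason output goes missing here, though: stdout on a terminal is line-buffered, so each newline-terminated printf() is flushed immediately. write() is async-signal-safe: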

static void sig_hup(int signo) {
    write(STDOUT_FILENO, "SIGHUP received\n", 16);  // async-signal-safe
}

Calling fflush(stdout) in the handler would not help, since fflush() is not async-signal-safe either.
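The write() version drops the pid that the original handler printed, because write() cannot format numbers. If you want the pid back without printf(), the digits can be built by hand. A sketch; format_hup_msg is a hypothetical helper, and it assumes a non-negative pid, which getpid() always returns:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes "SIGHUP received, pid = <pid>\n" into buf
 * (at least 64 bytes) using plain loops, no stdio, and returns the length.
 * Assumes pid >= 0. */
static size_t format_hup_msg(char *buf, long pid)
{
    static const char prefix[] = "SIGHUP received, pid = ";
    char digits[24];
    size_t n = 0, d = 0;
    unsigned long v = (unsigned long)pid;

    for (size_t i = 0; prefix[i] != '\0'; i++)
        buf[n++] = prefix[i];
    do {                                    /* collect digits, least significant first */
        digits[d++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (d > 0)                           /* copy them back in the right order */
        buf[n++] = digits[--d];
    buf[n++] = '\n';
    return n;
}

static void sig_hup(int signo)
{
    char buf[64];
    ssize_t w;

    (void)signo;
    /* getpid() and write() are both async-signal-safe */
    w = write(STDOUT_FILENO, buf, format_hup_msg(buf, (long)getpid()));
    (void)w;
}
```

The handler is installed with signal(SIGHUP, sig_hup) exactly as in the original program.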

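3. Run the program directly from an interactive shell in a real terminal, not from an IDE console or a script

An IDE console often has no controlling terminal at all (tcgetpgrp() then fails and pr_ids prints -1), and a non-interactive shell runs without job control, so the program never gets its own process group. A terminal emulator such as gnome-terminal or xterm, a virtual console, or even tmux or screen all provide a real terminal with job control; they mainly change the timing.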


Summary

You’re seeing flaky behavior because the example is sensitive to:

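  • Timing (the resumed child races with your shell taking the terminal back; with sleep(), the child could in principle still be running when the parent exits)
  • Shell and terminal settings (whether job control is enabled, how quickly the shell calls tcsetpgrp(), whether tostop is set)
  • Terminal output rules (with tostop set, writes from the orphaned background group fail with EIO, so that output is lost)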

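To make results more predictable (the race with the shell cannot be removed entirely):

  • Replace sleep(5) with waitpid(..., WUNTRACED)
  • Use async-signal-safe output (write()) inside signal handlers
  • Run the program directly from an interactive shell in a real terminal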