Serial Console Pros and Cons

Why does StorPool recommend against serial console on production installations?

TL;DR – try 'echo t > /proc/sysrq-trigger' and see what happens.

The serial console and the frame buffer are two separate ways for the kernel to display text, either to a remote machine or on the local display. They both suffer from the same problem: they block the CPU they run on for long periods of time, which in turn leads to weird latencies. The command above asks the kernel to dump the state of every task to its console, and the net result is that the command (and the CPU it's running on) blocks until all of that information gets displayed.
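If you want to see this for yourself, here is a quick test (a sketch; it needs root, sysrq must be enabled, and the numbers depend on how many tasks you have and whether a serial console is actually configured):

# see how much text a single task dump produces
dmesg -C                           # clear the kernel ring buffer first
echo t > /proc/sysrq-trigger
dmesg | wc -c                      # often hundreds of KB on a busy machine

# see how long the write itself blocks when a console has to drain it
time sh -c 'echo t > /proc/sysrq-trigger'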

Let’s look at both.

Serial Console

The serial console is probably one of the oldest methods of actually sending information to a terminal. The hardware still in use, known as the 16550 UART and introduced in 1987, is the successor of the 8250 UART, released in 1981. The 16550 has a buffer of 16 bytes, is of a design that's at least two eras old, and is probably the slowest output interface still found on any modern server (on the input side it could only have been matched by PS/2, but that's also extinct).

This serial port is limited to a baud rate of 115200 bps. Counting raw bits that would be about 14 KB/s, but with the usual 8N1 framing (one start bit, eight data bits, one stop bit) the actual payload rate is about 11.5 KB/s. This is a limitation of the hardware, and of the transmission medium it uses.
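The arithmetic, for the curious:

echo $((115200 / 10))                          # 10 bits on the wire per byte => 11520 bytes/s
echo "scale=1; 80 * 10 * 1000 / 115200" | bc   # a single 80-character line takes ~6.9 ms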

Now, let's see how this very old hardware is used for the serial console. If you don't feel like reading code, skip this bit and continue below. :)

The serial console lives in the drivers/tty/serial/8250/ directory in the Linux kernel. There is a config option called "SERIAL_8250_CONSOLE", described as "Console on 8250/16550 and compatible serial port". In the file "8250_core.c" you can see that there is a function called "serial8250_console_putchar()". It's pretty simple in the end, doing:

static void serial8250_console_putchar(struct uart_port *port, int ch)
{
    struct uart_8250_port *up =
            container_of(port, struct uart_8250_port, port);

    /* busy-wait until the transmit holding register is empty */
    wait_for_xmitr(up, UART_LSR_THRE);
    /* then write the single byte into the TX register */
    serial_port_out(port, UART_TX, ch);
}

The "wait_for_xmitr()" function in turn does this:

    for (;;) {
            /* poll the line status register... */
            status = serial8250_early_in(port, UART_LSR);
            /* ...until both the transmit holding and shift registers are empty */
            if ((status & BOTH_EMPTY) == BOTH_EMPTY)
                    return;
            /* hint to the CPU that we're in a busy-wait loop */
            cpu_relax();
    }

and, for completeness, "cpu_relax()" on x86 is just the PAUSE instruction ("rep; nop" is its legacy encoding), a hint to the CPU that it's spinning:

#define cpu_relax()     asm volatile("rep; nop")

So, what does this tell us? To transmit a character via the serial console, the kernel busy-waits until the previous one has been sent, i.e. to transmit a string of characters it keeps the CPU busy until that string has been fully transmitted. At ~11.5 KB/s, a 100 KB task dump keeps a CPU spinning for roughly nine seconds.

This didn't seem right to me, as I know that there is another way to use this hardware – fill its buffer and ask it to raise an interrupt when it's empty (or close to empty). The problem is that the kernel's output routines are made to be as simple as possible, as they write out information that something (possibly very bad) has happened. They don't want to be interrupted, EVER, by anything, because otherwise you could be left without the information needed to debug the problem. This actually follows the implementation in the original Unixes pretty closely – you can see it in "Lions' Commentary on UNIX 6th Edition with Source Code" and the related source code, where the kernel prints to the console in the same way, with busy-waiting (see the explanation for putchar() in prf.c). The hardware there predates the 8250 by a decade or two, but the concept remains the same.

This leads to latencies: if anything you do triggers the kernel to write to the console, you have to wait for it. One example we have seen in production: a process triggered the OOM killer, which printed a lot of information to the console, and because the OOM killer was holding some locks (like the ones controlling the creation of new processes), it basically prevented the whole server from working.

What year is this and why is it still used?

(It’s 2017)

The reason is that for a lot of people it still works. Before IPMI, DRAC, iLO and other management solutions were common, this was the way to access the console of a remote server if you had problems with its network, or to capture the output of kernel crashes (for which we now have kdump). This, paired with a PDU with network-controlled sockets, was the greatest thing you could have for your servers.

Also, a lot of recent network hardware still comes with a serial console (and all of the old hardware has one), so it's just something normal and standard in the industry.

Why do we care about this?

In a storage system, latencies are a killer of performance. Because there isn't any really good way to work with storage asynchronously, most software just does read() and write() and waits for them, so any latency added to an operation is felt directly by the process. To illustrate: if a request that used to take 5 ms starts taking 8 ms, the application issuing these requests will drop from 200 requests per second to 125. This is without taking into account the ability of latencies to propagate – as other processes might be waiting for this one, they'll be slowed down too.
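The numbers follow directly, since a synchronous client's throughput is the reciprocal of its per-request latency:

echo $((1000 / 5))   # 5 ms per request => 200 requests/s
echo $((1000 / 8))   # 8 ms per request => 125 requests/s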

So the serial console and the frame buffer are things we try to disable, and we persuade people not to use them.
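If you want to do the same, a minimal sketch (assuming a GRUB2-based distribution; adjust to your setup):

# 1. remove any console=ttyS0,... (and console=tty0) arguments from the
#    kernel command line, then regenerate the GRUB config
#    (update-grub or grub2-mkconfig, depending on the distro)
grep GRUB_CMDLINE_LINUX /etc/default/grub

# 2. at runtime, lower the console log level so that only emergency
#    messages are printed (everything still lands in the dmesg buffer)
dmesg -n 1
# equivalently, via the first field of the kernel.printk sysctl:
sysctl -w kernel.printk="1 4 1 7"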

We also do other optimizations in that area, for example disabling adaptive-rx and other features on network cards, but those deserve a separate article.
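For the curious, that one looks roughly like this (the interface name is just an example):

ethtool -C eth0 adaptive-rx off   # turn off adaptive interrupt coalescing on receive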

If you have any questions, feel free to contact us at info@storpool.slm.dev
