Category Archives: Embedded C Best Practices

Battle of the Standards: C90 vs. C99 – Round 3
Inline Functions

Before the C programming language was created, when engineers programmed in assembly, optimizing code was a big part of the daily routine. Today, optimization is less the job of the engineer and more the responsibility of the compiler. Most will agree that when writing code, readability and maintainability take precedence over fancy or confusing optimizations; engineers should only take optimization into their own hands when absolutely necessary. Modern compilers are much more intelligent than they used to be and are able to make ‘decisions’ about how to optimize code for speed and size. To help compilers with some of these decisions, the C99 standard introduced the inline keyword, which suggests to the compiler that the body of a function should be placed inline wherever it is called. What does this mean? Let’s take a deeper look.

C provides us with many different ways of reusing a common piece of code. Functions are the most obvious that come to mind, but there are also macros and gotos. Each has its own advantages and disadvantages. Functions are the safest option, as they have typed arguments and return values and provide the best modularity. However, they do have disadvantages. Functions have the highest overhead and therefore the greatest impact on performance. Having a lot of small functions is great for readability and structure, but each function call produces several instructions of additional code which consume memory and CPU cycles. Every time a function is called, the arguments must be loaded into registers (and/or onto the stack), the address of the next instruction is pushed onto the stack (so the processor knows where to return to), and then the processor jumps to the function. Inside the function, stack space must be allocated, the arguments may be moved from registers onto the stack, and local variables are stored on the stack as well. Similarly, when exiting a function, the stack space has to be freed, the return value stored in a register, and then the processor jumps back to continue from where the function was originally called. To be fair, this seems like a huge process, but really it boils down to a handful of instructions, depending on the function and the architecture. Nonetheless, if you are trying to write a complex program with hard real-time deadlines, that overhead can affect your ability to meet them.
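To make that sequence concrete, here is a rough pseudo-assembly sketch of what a call like x = max(a, b); can expand to. This is not any real instruction set – the register names, labels and layout are illustrative only:

```
        mov   r12, a        ; load the arguments into registers
        mov   r13, b
        call  #max          ; push return address, branch to max
        mov   x, r12        ; return value comes back in a register
        ...
max:    push  r4            ; prologue: save frame pointer,
        mov   r4, sp        ;           set up the stack frame
        ...                 ; body: compare r12/r13, result in r12
        pop   r4            ; epilogue: tear down the frame
        ret                 ; pop return address, jump back to caller
```

Everything except the body is overhead that an inlined version would avoid.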

A common ‘solution’ to these issues is to use macros instead of functions. Code defined by a macro is always executed inline: the preprocessor searches the translation unit (C file) for instances of the macro name and replaces them with the defined code. This generally makes macros faster and more efficient than functions because they have no call overhead, require no additional stack space, and can be better optimized by the compiler. Macros are not typed, which makes them more flexible, but this can also be a double-edged sword. Multi-line macros are more difficult to read than functions, and there are other caveats such as side-effects, cryptic compiler errors, and code bloat. Code bloat occurs when macros are scattered throughout the code: since the same code is repeated each time the macro is used, additional program memory must be allocated. For example, if a macro compiles down to 10 bytes of machine code and is used 1000 times, that’s 10000 bytes of program memory required. If a function were used instead, there would be some call overhead, but the body of the function would occupy only 10 bytes of program space. Finally, macros (specifically function-like macros) are frowned upon by many coding standards. The general rule of thumb is to use macros in place of functions only when performance is an issue. However, this is a broad generalization, and as a code base grows big and complex, it may not necessarily be the best solution.
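The side-effect caveat deserves a quick illustration. The function and macro names below are my own; the double-increment behavior is exactly what the C preprocessor produces:

```c
#include <assert.h>

#define MAX(a, b)   (((a) > (b)) ? (a) : (b))

/* Evaluates MAX(x++, y) and returns the final value of x.
 * The macro expands to (((x++) > (y)) ? (x++) : (y)), so x is
 * incremented twice: once in the comparison and once more when
 * the result is selected. A real function would increment x once. */
int macro_side_effect_demo(void)
{
    int x = 1;
    int y = 0;
    int m = MAX(x++, y);

    (void)m;    /* m is 2: the already-incremented x, not 1 */
    return x;   /* 3 with the macro; 2 if max() were a function */
}
```

This is one of the reasons function-like macros end up on coding-standard blacklists.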

When C99 introduced inline functions, it tried to address the caveats of both functions and macros, providing the best of both worlds while catering to the increasingly complex code being produced. Inline functions provide all the benefits of a normal function but leave it up to the compiler to determine the best way to optimize the code – i.e. to inline or not to inline. This decision is based on a number of code characteristics defined by the compiler’s algorithms and the optimization level used. inline is a keyword just like static or volatile. The difference is that inline is only a suggestion to the compiler: it does not have to inline a function just because it was declared as such. By pointing out which functions might be worth optimizing, you let the compiler focus its optimization efforts while keeping compilation time to a minimum. Until you have no other option, trust the compiler to do its job and let it optimize your code as best it can.

To have a better understanding of how this all works and how the compiler can optimize code using inline functions, let’s look at a simple example with a very common operation: max – take two values and return the bigger of the two. We will implement it as a regular function, a macro and an inline function, and then compare the output. The implementation is trivial. First, as a function:

int max(int a, int b)
{
    return (a > b) ? a : b;
}

To make an inline function, simply add the inline keyword in front:

inline int max(int a, int b)
{
    return (a > b) ? a : b;
}

And defined as a macro:

#define MAX(a,b)	(((a) > (b)) ? (a) : (b)) /* parenthesize the arguments to avoid precedence surprises */

Now let’s create a simple test function. Say we want to find the maximum value in an array of integers. In our test code, the array will be extern’d rather than defined in the same file, because if the compiler can see the array, it can optimize based on the initialized values. The test function will traverse the array, compare the current maximum with the current element, and store the greater of the two in a variable which is returned to the caller.

extern const int test_array[];
extern const size_t test_array_size;

int max_element(void)
{
    size_t i;
    int max_value = test_array[0];

    for (i = 1; i < test_array_size; i++)
    {   
        max_value = max(max_value, test_array[i]);
    }   

    return max_value;
}

The variable max_value is initialized to the first element of the array, so we start iterating from index 1. On each iteration, the greater value is stored back in max_value.
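For completeness, the extern’d array lives in a separate translation unit. A hypothetical companion file might look like this (the file name and values are my own, chosen arbitrarily):

```c
/* test_data.c – hypothetical companion file with arbitrary values.
 * Keeping the data in its own translation unit stops the compiler from
 * seeing the initializers and constant-folding the whole search away. */
#include <stddef.h>

const int test_array[] = { 12, -4, 7, 42, 3 };
const size_t test_array_size = sizeof(test_array) / sizeof(test_array[0]);
```

Deriving the size with sizeof means it stays correct if elements are added later.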

To compile the code, I have used the following gcc arguments in my makefile:

msp430-gcc -mmcu=msp430g2553 -mhwmult=none -c -O0 -MMD -Wall -Werror -Wextra -Wshadow -std=c99 -Wpedantic

We start off by compiling the code with max as a regular function. Optimizations should be turned off by passing the argument -O0 so that the compiler doesn’t automatically inline our function (the compiler might inline a function even if it’s not declared inline when optimizations are turned on). This can be confirmed by using the objdump utility to look at the disassembled output.

[Image: func_O0 – disassembled output with max as a regular function, -O0]

Let’s take a look at the output in greater detail. Our first consideration is performance – i.e. speed – the fewer instructions and CPU cycles, the better. We can see where the max function gets called because the call instruction is invoked with a destination address equal to that of max (0xc13e). The two mov instructions before it fill r12 and r13 – setting up the function arguments. The call instruction is not really a single operation – it represents two, saving the address of the next instruction on the stack (the one to return to) and then branching to the max function – and consumes several CPU cycles (4-5). Inside max, we have the overhead of creating the stack frame and saving the two arguments to it (the first three instructions), and at the end of the function the last two instructions free the stack used by the function and jump back to the address saved before the call. This also consumes several CPU cycles. Actually calculating the overhead would require researching how many CPU cycles each instruction takes and adding them up. That is beyond the scope of this post, so for an example calculation let’s say function overhead costs 15 additional CPU cycles per call. If the array is only 5 elements long, the overhead consumes 75 CPU cycles. Not a big deal, right? But what if the array had 1000 elements? That would be 15000 CPU cycles. Executing this function once a second on a 1MHz CPU clock would mean 1.5% of all CPU cycles being spent on this one function’s overhead!

The other characteristic we have to investigate is the size of the code image, because that determines how much memory (flash, in the case of the MSP430) is required. Using the objdump utility to view the section headers (passing the argument -h), the size of the compiled code, defined by the .text section, is 0x284 (644) bytes. Keep in mind this size is for the whole image, not just this one function.

[Image: func_size_O0 – objdump section headers for the regular-function build, -O0]

Let’s see how this differs if we use a macro instead of a function. Implement the max calculation as a macro (as above) and compile again using the same arguments. The output will look like this:

[Image: define_O0 – disassembled output with the macro version, -O0]

The first thing to note is that the max function is gone altogether, as expected. The code defined by the macro has become part of the max_element function rather than a separate function. We can see the actual comparison where the cmp instruction is invoked, similar to how it is implemented in the max function above. Comparing the number of instructions in these two examples, it is safe to say the macro gives us a pretty significant performance improvement.

In terms of code size, it has dropped quite a bit, down to 0x26c (620) bytes. Of course, this is expected only because the macro is used once. If it were used several times, the code size would start increasing.

Finally, let’s build the inline version of the code and see how it compares. Note that with optimizations turned off, the compiler does not inline the function – I had to use a gcc attribute to force it ( __attribute__((always_inline)) ). Compiling and getting the output…
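For reference, here is one way to apply that attribute. This is a gcc extension, not part of C99, and the exact placement below is just one accepted form:

```c
/* gcc-specific sketch: always_inline forces gcc to inline the function
 * even at -O0 (gcc reports an error if inlining is impossible).
 * The attribute is a compiler extension, not part of the C99 standard. */
static inline int max(int a, int b) __attribute__((always_inline));

static inline int max(int a, int b)
{
    return (a > b) ? a : b;
}
```

With the attribute in place, recompiling at -O0 produces the inlined output shown next.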

[Image: inline_func_O0 – disassembled output with the forced-inline version, -O0]

Interestingly, it is not as efficient as the macro. There is still some overhead, possibly because the compiler inserts the inline function after both are compiled and therefore has to add some glue code. The code size has also increased to 0x27c (636) bytes – 16 bytes more than the macro, but still 8 bytes less than the regular function. At -O0, then, inline functions appear to be less efficient than macros.

To see if we can make the inline function as efficient as the macro, let’s turn on optimizations. Recompiling the same code with the -O2 option instead of -O0, we can see that the inline function and the macro produce the same machine code and are therefore equally efficient in terms of both performance and space. No need to paste the same code twice – take my word for it, or try it yourself and you will see that they are identical.
[Image: define_O2 – disassembled output at -O2, identical for the macro and inline versions]

Keep in mind that just because in this example the generated machine code turned out to be the same, this may not always be the case. Optimizations use very complex algorithms and the optimal solution could change based on many variables that might not be easily spotted by basic analysis.

If we wanted to optimize for size instead of speed, we would use the argument -Os. In this example, the compiler will still inline the function because it is small and called only once. However, if the function were bigger and used in many places, the compiler may choose not to inline it.

All in all, inline functions are a great tool to have. In fact, they proved so useful that most compilers support them in C90 mode via extensions. Modern compilers are very advanced and will most likely do the right thing regardless of whether a function is declared inline or not. However, it doesn’t hurt to make the suggestion – worst case, the compiler ignores you. +1 C99 for introducing inline functions.
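If you do need to stay in strict C90 mode but want the hint where available, a small shim is a common approach. This is my own sketch, not from the post; the __inline__ spelling is a gcc extension that gcc accepts even with -ansi:

```c
/* Portability shim: use the C99 keyword when available, fall back to
 * the gcc extension in C90 mode, and degrade to nothing on compilers
 * with neither. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#define INLINE inline           /* real C99 keyword */
#elif defined(__GNUC__)
#define INLINE __inline__       /* gcc extension, accepted in -ansi mode */
#else
#define INLINE                  /* no inlining hint available */
#endif

static INLINE int max(int a, int b)
{
    return (a > b) ? a : b;
}
```

The code compiles unchanged under -ansi, -std=c99 and beyond; only the hint given to the compiler varies.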

Battle of the Standards: C90 vs. C99 – Round 2
Designated Initializers

In C, the initialization of structures can be dangerous if the structure definition ever changes. When a structure is initialized, the order of initializers must follow the structure definition; otherwise the members will be assigned the wrong values. This may seem trivial, but consider the following example. You have a module which is controlled using a message queue. When the module is first written, the message takes the following simple form.

enum message_id
{ 
    MSG_ID_READ = 0,
    MSG_ID_WRITE,
    MSG_ID_MAX
};

struct message
{
    enum message_id id; 
    uint8_t length;
};

A helper function is used to build and send the write message. With C90, the function might look like this:

static void send_write_msg(uint8_t length)
{
    struct message msg = {MSG_ID_WRITE, length};
    send_msg(&msg);
}

Several years later there is an urgent need to add a priority to the message. Someone comes along and hastily modifies the structure to add the new member.

struct message
{
    enum message_id id; 
    uint8_t priority;
    uint8_t length;
};

But they add it in the middle of the structure… The existing initializations will still compile without error, since the second member in both the old and new structure definitions – length and priority respectively – have the same data type. As long as the types match, the compiler has no way of knowing that the structure has changed and is no longer initialized correctly. Only if the types differ and all the strict compiler checks are enabled will it complain if, say, a uint32_t is assigned to a uint8_t. Even if existing code does not need the new member, the initialization is still wrong. Developer testing or a code review might catch this type of error, but it would be even better if it could be avoided altogether.
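Here is the failure mode in one self-contained snippet (the helper name is my own, and I return the message instead of queuing it so the result can be inspected):

```c
#include <stdint.h>

enum message_id { MSG_ID_READ = 0, MSG_ID_WRITE, MSG_ID_MAX };

/* The structure after the hasty change: priority inserted in the middle. */
struct message
{
    enum message_id id;
    uint8_t priority;
    uint8_t length;
};

/* The old positional initializer still compiles cleanly, but the value
 * meant for 'length' now lands in 'priority', and 'length' itself is
 * zero-initialized. */
struct message build_write_msg(uint8_t length)
{
    struct message msg = {MSG_ID_WRITE, length};
    return msg;
}
```

No warning is emitted at any warning level, because the initializer is type-correct – only the meaning is wrong.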

C99 introduced a new feature to address this, called designated initializers: a syntax for explicitly initializing a member by name. The function to send a write message could instead be written to take advantage of this syntax.

void send_write_msg(uint8_t length)
{
    struct message msg = {.id = MSG_ID_WRITE, .length = length};
    send_msg(&msg);
}

The member name is prefixed with a period (.) and then assigned its initializer. In the example above, the existing initialization code remains correct after the structure definition is updated. A structure can therefore be changed without having to update all the initialization code, as long as the new member is not required. So there are two advantages here: 1) it is safer, and 2) it can lead to more efficient development.
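There is a third, more subtle benefit worth noting: per the C99 standard, any member not named in the initializer list is zero-initialized. A quick sketch (the helper name is my own):

```c
#include <stdint.h>

enum message_id { MSG_ID_READ = 0, MSG_ID_WRITE, MSG_ID_MAX };

struct message
{
    enum message_id id;
    uint8_t priority;
    uint8_t length;
};

/* Members not named in a designated initializer list are
 * zero-initialized (C99 6.7.8), so 'priority' here is guaranteed to be
 * 0 without being spelled out. */
struct message build_read_msg(uint8_t length)
{
    struct message msg = {.id = MSG_ID_READ, .length = length};
    return msg;
}
```

So a newly added member defaults to zero in all existing initializations – usually the safest possible value.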

This feature also applies to arrays. Let’s say each message has a handler function. If the message IDs are mostly consecutive, the easiest and often most efficient implementation is to define an array of function pointers. A dispatcher then reads the incoming messages and invokes the appropriate handler.

typedef void (*message_handler_t)(uint8_t priority, uint8_t length);

static message_handler_t msg_handler[] =    
{        
    read_handler,
    write_handler
};
...
void message_dispatcher(void)
{
    struct message msg;

    while (1) {
        read_msg(&msg);

        if (msg.id < MSG_ID_MAX) {
            if (msg_handler[msg.id] != NULL) {
                msg_handler[msg.id](msg.priority, msg.length);
            }
        }
    }
}

Defining the array of message handlers seems pretty trivial, but only because there are just two messages in this example. If you had 100 messages, keeping track of which handler goes where in the array can be difficult. How do you verify that each handler is at the right index without manually counting, especially when you may be implementing the handlers out of order? Well, designated initializers in C99 can help here too. Just as with structures, we can tell the compiler which index an initializer is intended for.

static message_handler_t msg_handler[] =
{
    [MSG_ID_READ] = read_handler,
    [MSG_ID_WRITE] = write_handler
};

The nice thing about this syntax is that the array doesn’t have to be initialized in order, or even consecutively. And since it explicitly states which index (or in this case, which message) each initializer is intended for, it is easier to read with really big arrays. However, you do have to be careful, since you can inadvertently create a really big array for no reason. For example, if the first message ID is non-zero and starts at 200 instead – i.e. MSG_ID_READ = 200 – using the ID as a designated initializer will force the compiler to allocate an array of 202 elements, even though the first 200 are unused. This is obviously undesirable, and you might consider using an offset from the first ID instead of the raw message ID.
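The offset approach can be sketched like this (MSG_ID_BASE and the stub handlers are my own additions for illustration):

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*message_handler_t)(uint8_t priority, uint8_t length);

/* Hypothetical variant: the message IDs start at 200 instead of 0. */
enum message_id
{
    MSG_ID_BASE = 200,
    MSG_ID_READ = MSG_ID_BASE,
    MSG_ID_WRITE,
    MSG_ID_MAX
};

static void read_handler(uint8_t p, uint8_t l)  { (void)p; (void)l; }
static void write_handler(uint8_t p, uint8_t l) { (void)p; (void)l; }

/* Index by the offset from the first ID so the table holds only
 * MSG_ID_MAX - MSG_ID_BASE entries (2 here), not 202. */
static message_handler_t msg_handler[MSG_ID_MAX - MSG_ID_BASE] =
{
    [MSG_ID_READ - MSG_ID_BASE]  = read_handler,
    [MSG_ID_WRITE - MSG_ID_BASE] = write_handler
};
```

The dispatcher would then subtract MSG_ID_BASE from the incoming ID before indexing the table.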

Overall, designated initializers in C99 are a definite improvement over C90. Being able to initialize both structures and arrays explicitly can improve the readability and maintainability of your code. And with that, I believe that brings the score to C99:2 – C90:0.

Battle of the Standards: C90 vs. C99 – Round 1
Non-Constant Initializers

I have a confession to make…

I love C90.

Yes, I said it. Strict, pedantic ISO C90 – also known as ANSI C. You might be wondering why I support a 30+ year old standard. You might even be wondering why you ever started following this website in the first place. But before you start telling me how wrong I am, let me explain myself. When using strict ISO C90, I feel it forces me to produce more portable, robust and clean code than, say, the C99 standard. I also like to think that if I use the most mature standard, the compiler will be more mature as well – more stable, quicker, and producing better machine code (the man pages for gcc v4.8.2 still say C99 is only partially supported). Many of the most stringent coding standards used in high-reliability/safety-critical systems only allow ANSI C. There could be many reasons for this – legacy code, legacy compilers, etc… But to be quite honest, my love for ISO C90 is partly (if not mostly) in my head. C99 introduced a wide variety of new features to the C programming language. A lot of them are actually good, if not great. But others are just dangerous and, in my opinion, can’t be justified, and for this reason I have stayed away from the newer standard as much as possible.

But no more. I will [try to] embrace its positive aspects. Over the course of this mini-series, I will show you examples of both the good and the bad of C99. Hopefully, by the end of it, we will come to the conclusion that, used correctly, the C99 standard can help write clearer, more robust and better optimized code than C90.

One side note here before we get started. Whether you are using C90 or C99, it is highly recommended to disable compiler specific constructs and extensions when possible. There are times when they are needed, but their use should be limited to hardware specific code and should not be scattered around in every source file. Try porting code that randomly assigns variables to hardcoded memory locations to a different hardware platform…

Now, let’s get started shall we?

Round 1: Initialization of complex data types

The first difference between C90 and C99 we will explore is the initialization of complex data types within a function. A complex data type is an array, structure or union. In C90, all initializers of complex data types must be constant expressions – expressions that can be evaluated at compile time. This means, for example, that you cannot use a function argument as an initializer; similarly, a function call cannot be used as an initializer. In C99, both are possible. This is a very useful feature in many circumstances. Let’s say you have a driver for a command-based device (think SPI NOR). You write a small helper function to support a specific command which requires one address byte and one data byte following the command (three bytes in total, assuming 8-bit commands). The address and data are passed as arguments by the caller. Let’s take a look at how we have to do this in C90.

int spi_nor_cmd_xxx (uint8_t address, uint8_t data)
{
    uint8_t cmd[3];
    cmd[0] = 0x50; /* Don't use magic numbers !! */
    cmd[1] = address;
    cmd[2] = data;
                                               
    return spi_nor_write(cmd, sizeof(cmd));
}

So what is wrong with that? Technically, nothing at all (other than the magic number 🙂 it’s only there to show that it is constant). But there are three points which, if addressed, would make this better.

  1. The array cmd could be defined as a constant. Once initialized, it should never change.
  2. The array is initialized with a defined length of three (3) rather than allowing the compiler to determine the length based on the number of initializers. This could lead to copy-paste errors if multiple of these helper functions are implemented (yes… I shouldn’t have used magic numbers either).
  3. It takes four lines of code to initialize this three-byte command.

All three of these points can be addressed by using C99 and its support for non-constant initializers.

int spi_nor_cmd_xxx (uint8_t address, uint8_t data)
{
    const uint8_t cmd[] = {0x50, address, data};
    return spi_nor_write(cmd, sizeof(cmd));
}

That looks much better in my opinion. With the C99 version, the array is now constant and it is defined in a single line. There is less room for error because the array size is determined by the number of initializers.

An important factor must be considered: does this affect the output? Is there any change in performance, memory usage or the compiler’s ability to optimize? I would assume not, considering it is functionally the exact same code, but this could vary by compiler. I tested this theory using gcc v4.8.2 (x86) on Linux. I chose to compile for x86 instead of the usual msp430 because I figured that since x86 assembly is vastly more complex, if there were any way to optimize one form over the other, the compiler would find it. To compile with strict C90 enabled and all GNU extensions disabled, the command line arguments ‘-ansi -Wpedantic’ must be used. The ‘-ansi’ argument is equivalent to ‘-std=c90’ in gcc. The ‘-Wpedantic’ flag tells the compiler to check for strict ISO compliance. Compiling the C90 version of the code produces the output below (using objdump).

[Image: c90_array_init_objdump – objdump output for the C90 version]

Compiling the C99 version with ‘-std=c99’ to enable C99 but not the GNU extensions produces the output below.

[Image: c99_array_init_objdump – objdump output for the C99 version]

Comparing the two, we can see that they are exactly the same. Compiling again with optimizations set to -O2 shows no difference. Compiling once more with msp430-gcc shows the same result. Interestingly enough, msp430-gcc produced more efficient code than x86 gcc in terms of code size / number of operations – I guess my hypothesis was wrong. This is by no means an exhaustive set of tests, but it shows that the compiler recognizes these two examples as two forms of the same construct. However, this is a very simple example; with more variables and code that might change the order of execution, the output would undoubtedly change as well.

Can you think of any downsides to the C99 form? Potentially. Let’s say one of the initializers is a function call which returns a uint8_t. If the function can fail and return an error instead of a valid value (as it should), then I would not recommend this approach. Instead, the function should be called first, the return value checked, and then, if and only if it is sane, the command initialized. The wrong/dangerous way:

int spi_nor_cmd_xxx (uint8_t data)
{
    const uint8_t cmd[] = {0x50, saved_data_get_address(), data};
    return spi_nor_write(cmd, sizeof(cmd));
}

The better way:

int spi_nor_cmd_xxx (uint8_t data)
{
    int err = -1;
    const uint8_t address = saved_data_get_address();

    if (address > 0) {
        const uint8_t cmd[] = {0x50, address, data};
        err = spi_nor_write(cmd, sizeof(cmd));
    }   

    return err;
}

What about all these new stack variables I just allocated? Isn’t that less efficient? Maybe, but the compiler should easily optimize them out. Plus, if you were to implement the same checks correctly in C90, you would have the exact same issue. So yes, with C99 we could have written more concise code (although arguably more difficult to read), but it would not have met our standards for error checking. Use this feature wisely.

Keep in mind that all of this applies to the initialization of structures as well. Say we have an example similar to the one above, but instead of a byte array being written to a device, the command is a message which will be queued for use by a different code module.

struct message
{
    uint8_t cmd;
    uint8_t address;
    uint8_t data;
};

Again, we want to make a simple helper function to send a specific message. In C90, it would look like this:

int send_message_xxx(uint8_t address, uint8_t data)
{
    struct message msg;
    msg.cmd = 0x50; /* Don’t use magic numbers !! */
    msg.address = address;
    msg.data = data;

    return send_message(&msg);
}

Ok, not all that bad. Let’s see how it looks with C99.

int send_message_xxx(uint8_t address, uint8_t data)
{
    const struct message msg = {0x50, address, data};
    return send_message(&msg);
}

As with the array example, the C99 version of this code is cleaner, more concise and allows us to define the message as a constant. However, this type of initialization can be dangerous for structures. If a colleague adds a new member to the structure right after cmd, the C99 initialization code above becomes wrong. But as we all know, this is a general problem with initializing structures in C. If the structure is initialized outside a function, you don’t even have the option of assigning the members explicitly – with C90, at least…

In the next round of Battle of the Standards, we will see that C99 addresses this issue quite elegantly. But for today, I conclude that initializing complex data types with non-constant initializers is a C99 feature I gladly accept into my toolkit. It has a few key benefits and virtually no pitfalls. +1 for C99.