Filesystem Basics with the Raspberry Pi

Note: the following post was originally published for Victoria Pi Makers and Friends as part of a presentation I prepared for the group.

In this tutorial we are going to look at the filesystems on the Raspberry Pi and how they are used to boot Linux from an SD card. This is the only officially supported boot method on the Raspberry Pi, and although this is written specifically with the Raspberry Pi in mind, many other boards boot in a similar fashion. To begin, we need to understand the basics of filesystems, since the SD card contains several partitions with different types of filesystems. On Linux and Unix operating systems, the df utility can be used to check disk space, mount points and filesystem types, among other things. It is a very useful tool for seeing which filesystems are available and where each is mounted, as these may vary by distribution or platform. Running df -Th (the -T and -h flags tell df to display filesystem types and human readable sizes respectively) on the Raspberry Pi, we can see that there are several different filesystems mounted.
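
The exact output varies by board revision and distribution, but on a Raspbian image it looks something like this (the sizes and partition numbers shown here are illustrative, not from any particular board):

```
$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/root      ext4       15G  4.2G  9.6G  31% /
devtmpfs       devtmpfs  459M     0  459M   0% /dev
tmpfs          tmpfs     464M     0  464M   0% /dev/shm
tmpfs          tmpfs     464M  6.3M  457M   2% /run
tmpfs          tmpfs     464M     0  464M   0% /tmp
/dev/mmcblk0p6 vfat       66M   21M   46M  32% /boot
```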

Understanding the output of df is important to understanding how Linux operates. The first thing to note from the output is that there is a mount point called /boot. The existence of this mount point indicates that the device most likely booted off a filesystem. In many embedded systems, the operating system boots directly out of raw flash memory. Supporting a filesystem-based approach makes a device easier to upgrade and harder to brick: the user need only update the files on the SD card, reboot and off it goes. Now, you might be wondering how the Raspberry Pi booted off a filesystem. Well, let’s take a look.

Filesystems – General Overview

A filesystem is a piece of software, usually part of the OS (but not necessarily) that stores data onto some type of non-volatile memory. It translates generic operating system calls such as open/close/read/write etc… into its own format and then stores the data on the media. There are many different types of filesystems, some of which are designed for general purpose use, while others are for specific types of hardware (i.e. flash, hard disk, etc…) or performance requirements (i.e. servers and data centres).

Let’s start by looking at some examples of different media. On embedded systems, the typical memory of choice is flash, so we will concentrate on this. However, there are many types of flash. The two main categories are raw flash and managed flash. In the category of raw flash memory, the two most common types are NAND and NOR. These typically come with either a serial interface – such as SPI – or a parallel interface. Although NAND and NOR are fundamentally different technologies, they both work using the concept of sectors or blocks (sector is the NOR term, block is the NAND term). A block is the smallest section of memory which can be erased at a time. On these types of devices, you cannot write over existing data. When data needs to be written, the block must first be erased, which sets all the bits to ‘1’. Once the block is erased, data can be written in bytes, words or pages depending on the device.

As you can imagine, writing to a raw memory device requires some additional software to handle all these details. This can degrade performance and is extremely complex. To make things worse, erasing a block can take a very long time – on the order of seconds.

Example of timing characteristics of a Micron NOR Flash (datasheet)

Finally, both have a limited number of erase cycles. When erased too many times, blocks can go bad: they are no longer able to be erased, or they corrupt data when it is written or read. On a NOR flash device this limit is typically upwards of 100000 erase cycles, while for some higher density NAND devices it can be as low as 3000 program/erase cycles. The physics behind bad blocks is different between the two technologies, but the point is that special algorithms are required to spread writes across the device so that no single block is erased continuously. These algorithms are called wear levelling. Both wear levelling and keeping track of bad blocks are the responsibility of the filesystem. Examples of filesystems for raw flash memory devices are UBIFS and YAFFS.

The other category of flash memory technology is managed memory. While the underlying memory is most often NAND based, these devices have their own memory controller and firmware built in which performs all the wear levelling and bad block management for you, without any software interaction. SD cards as well as (e)MMC and USB keys are all examples of managed memory. Because managed memory handles everything flash specific in hardware, you can run any generic filesystem on the device with reasonable confidence.

SD cards typically use the FAT filesystem. FAT stands for file allocation table, which refers to the way the filesystem stores metadata for the volume. FAT has a few different flavours which depend on the device size, such as FAT16, FAT32 and exFAT. FAT32 is the de facto standard for most bulk memory devices such as USB keys, SD and MMC cards. That is because it is well known and relatively easy to understand and implement, especially in resource constrained environments. FAT is sometimes referred to as vfat, or a DOS filesystem.

The Raspberry Pi Boot Partition

Devices such as the Raspberry Pi don’t just boot straight into Linux. They have a bootloader which runs first and loads the operating system. What you might not know, though, is that there are actually several stages of bootloader that run, each one responsible for setting up some specific hardware feature in preparation for the next stage. This is very common on other devices in this class, such as the BeagleBone. The first stage bootloader is often called the ROM boot because the firmware is burned directly onto the chip during manufacturing and cannot be updated. Hence it makes sense for the chip designers and firmware engineers to support a well known, widely implemented and easily implemented filesystem such as FAT. The ROM boot on the Raspberry Pi actually runs on the GPU – not the CPU – and its job is to read the SD card’s FAT32 filesystem, search for a file called bootcode.bin, load that file into memory (cache at this point), and then run it. Cracking open the /boot directory we saw using df, we can begin to see how the boot process works.

We can see here in the /boot directory there is a file called bootcode.bin. This is the second stage GPU bootloader. It is responsible for setting up the SDRAM and loading the GPU firmware from the SD card. SDRAM is required at this point most likely because the cache is not big enough to hold the GPU firmware. The other component of bootcode.bin is an ELF loader. This is required because the GPU firmware, start.elf, is an ELF file: a type of object file produced as the output of the compiler. It contains the machine code but also information about where to load each section of code into memory.

The GPU firmware is responsible for reading configuration data from the SD card and loading the kernel. There are several files in the directory which are part of this process:

  • kernel.img: the kernel itself in the zImage file format – a type of compressed image
  • fixup.dat: partitions the SDRAM between the GPU and CPU.
  • *.dtb: the device tree blobs and overlays (in the overlays directory)
  • cmdline.txt: the kernel command line parameters
  • config.txt: more OS configuration
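
For reference, cmdline.txt on a NOOBS-installed Raspbian image contains a single line of kernel parameters, something along these lines (exact values differ between images):

```
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/mmcblk0p7 rootfstype=ext4 elevator=deadline rootwait
```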

We’ll go into these in a bit more detail shortly, but to finish up with the GPU firmware: once it has loaded all these files into memory, control is transferred from the GPU to the CPU.

Device tree

All the files with the .dtb extension are device tree blobs. The device tree is an open standard which is used to describe hardware. Using a device tree, a single kernel image can be compiled which supports multiple boards using the same processor. In older versions of the kernel, there was a C file which described the hardware, meaning the hardware support was compiled directly into the kernel. There are multiple .dtb files to support the multiple revisions and types of Raspberry Pi boards. The bootloader chooses the file that is applicable for the board it is running on and reads the correct device tree from the SD card. It is then passed to the Linux kernel, which upon boot loads the appropriate drivers based on what it finds in the device tree. Device trees are written in plain text using a hierarchical syntax of nodes and properties, and there are many binding specifications depending on the architecture, driver type and so on. The plain text device tree file that humans write has the extension .dts. It is compiled to a binary .dtb file using a device tree compiler.
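
To give a feel for the syntax, here is a heavily simplified fragment in .dts form (the node names and values are illustrative, not a complete board file):

```
/dts-v1/;

/ {
    model = "Raspberry Pi Model B";
    compatible = "raspberrypi,model-b", "brcm,bcm2835";

    leds {
        compatible = "gpio-leds";

        act {
            label = "led0";
            gpios = <&gpio 16 1>;    /* GPIO 16, active low */
        };
    };
};
```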

Inside the boot directory there is another directory called ‘overlays’. This contains several device tree overlays that can be enabled using the config.txt file. The GPU firmware combines the base device tree blob with any overlays that have been defined in config.txt. This allows you to easily add support for other hardware peripherals, assuming compatible support for them is compiled into the kernel (or included as kernel modules).
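
For example, enabling an external I2C real-time clock takes only a couple of lines in config.txt (i2c-rtc is one of the overlays shipped in the overlays directory; ds3231 is one of the chips it supports):

```
# Enable the ARM I2C interface
dtparam=i2c_arm=on

# Merge the i2c-rtc overlay into the base device tree at boot
dtoverlay=i2c-rtc,ds3231
```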

Root Filesystem

There is one more crucial component required by the Linux kernel to boot – the root filesystem (rootfs). The kernel image only includes the low level components – the actual kernel, the drivers and some utilities. By itself, it is pretty much useless. The rootfs contains everything else that we think of when we use a Linux distribution – all the user level features and system utilities. After the kernel loads all the drivers, it looks at the rootfs and calls the init program. Depending on the distribution built, the initialization can change substantially, but basically it will load all the user facing features like the console and command line, user support, a graphical interface if there is one, etc… All the standard Linux utilities we use on a daily basis are built as part of the rootfs.

Going back to our output of df once again, we can see the first entry is /dev/root, which is mounted at /. This is the root filesystem. However, one thing to note here is that /dev/root is not actually a device. The kernel command line parameters tell us the actual device is /dev/mmcblk0p7, where mmcblk0 represents the SD card interface and p7 represents the partition on that device. Remember how we said earlier that the SD card was formatted with FAT32? Well, in reality only one partition was. The NOOBS installer actually divides the SD card into several partitions. This particular partition holds the rootfs and is formatted as ext4. Ext4 is a Linux specific, journalling filesystem commonly used on almost every mainstream Linux distribution. But this is running on an SD card, you say? True, but an SD card, as we saw earlier, is managed memory. Therefore we can treat it just as we would a regular hard drive. In fact, solid state drives (SSD) use the same memory technology as SD cards… there’s just much more of it.

So what is in the rootfs? Well the rootfs is defined by the Filesystem Hierarchy Standard (FHS) which was created by the Linux Foundation to help standardize across distributions. Let’s have a look:

Below is a brief description of each one, paraphrased from the specification with some of my own commentary:

/bin: directory where system executables (such as commands) go and are available to all users.

/boot: contains all the files required by bootloader to boot the kernel.

/dev: this directory is not actually part of the rootfs, it is created each time on boot and contains files that represent each of the devices which can be accessed in user space (look at the output from df to see that it is a special type of filesystem called devtmpfs).

/etc: where system and application configuration files are stored.

/home: contains a ‘private’ directory for each user. All of the user’s personal files are typically stored in here. Documents, desktop, downloads etc. all make up a user’s /home directory.

/lib: the directory where shared libraries and kernel modules are stored. When you compile and need to link in libraries, this is one directory where you would typically point the linker to.

/media: the default mount point for removable devices – for example, if you plug in a USB key

/mnt: the default mount point for temporarily mounted filesystems (i.e. network shares)

/opt: where add-on and third party application packages are installed.

/proc: another virtual directory which is not really part of the root filesystem. In it are files which represent each of the processes in the system.

/root: the home directory for root

/run: a directory for applications to store data during runtime. This is not actually part of the rootfs, it is a temporary directory created at runtime.

/sbin: contains executables like the /bin directory, but these are typically for use only by the system and administrator.

/sys: another virtual filesystem which exposes some of the hardware interfaces in the kernel. It provides a way to see the kernel’s configuration of the hardware. Though not recommended, you can access some hardware through this interface (i.e. read/write GPIOs, LEDs, etc)…
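
As a quick illustration, here is how a GPIO could be driven from the shell through the legacy sysfs interface (the pin number is arbitrary, and newer kernels deprecate this interface in favour of the character device API):

```
# Export GPIO 17 to user space, set it as an output and drive it high
echo 17 > /sys/class/gpio/export
echo out > /sys/class/gpio/gpio17/direction
echo 1 > /sys/class/gpio/gpio17/value
```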

/tmp: a temporary filesystem, not actually part of the root filesystem. Essentially a RAM disk which can be used by applications and the system to store temporary files

/usr: a directory where user application binaries, libraries, header files and documentation are stored. Typically files in here are all read-only. In some distributions this directory is on a separate partition from the root filesystem.

/var: another directory where application runtime information can be stored so that it doesn’t end up in the /usr directory. Normally used for logs and locks. Some of its functions have been replaced by the /run directory, but it is kept for compatibility.

An important thing to keep in mind is that different distributions, applications, etc… manage and use these directories differently. It is not always so cut and dry, and to maintain compatibility many distribution maintainers prefer to keep legacy files and directories around rather than deprecating them. The FHS specification contains guidelines on how applications should use these directories, but that doesn’t mean all application developers follow them. It is just a guideline and will vary even from version to version of the same distribution or application.

What is important to take away from all this is that the root filesystem is imperative to the operation of Linux. You can boot the kernel, but not much will happen afterwards. The filesystem can be built to suit your own needs – including specific utilities, applications, etc. – giving you the freedom to have anything from a really big filesystem, such as Ubuntu’s which requires 4GB straight out of the box, to a really small one – I have built one that is approximately 12MB and it’s still too big! Either way, it’s up to you, and it’s a great exercise to try out, but that’s for another tutorial!

Battle of the Standards: C90 vs. C99 – Round 3
Inline Functions

Before the C programming language was created, when engineers programmed in assembly, optimizing code was a big part of the daily routine. Today, optimization is less the job of the engineer than the responsibility of the compiler. Most will agree that when writing code, readability and maintainability take precedence over fancy or confusing optimizations. Engineers should only take optimization into their own hands when absolutely necessary. Modern compilers are much more intelligent than they used to be and are able to make ‘decisions’ on how to optimize code for speed and size. To help compilers with some of these decisions, the C99 standard introduced the inline keyword, which suggests to the compiler that the body of a function should be placed inline wherever it is called. What does this mean? Let’s take a deeper look.

C provides us with many different ways of reusing a common piece of code. Functions are the most obvious that come to mind, but there are also macros and gotos. Each has its own advantages and disadvantages. Functions are the safest option as they have typed arguments and return types and provide the best modularity of code. However, they do have disadvantages. Functions have the highest overhead and therefore the greatest impact on performance. Having a lot of small functions is great for readability and structuring of code, but each function call will produce several instructions of additional code which consume memory and CPU cycles. This is because every time a function is called, the function arguments must be loaded into registers (and/or onto the stack), the address of the next instruction is stored on the stack (so the processor knows where to return to) and finally the processor jumps to the function. Inside the function, stack space must be allocated and the arguments moved from registers onto the stack. Local variables are also stored on the stack. Similarly, when exiting a function, the stack has to be freed and the return value stored in a register before jumping back to continue from where the function was originally called. To be fair, this seems like a huge process, but really it boils down to a handful of instructions depending on the function and architecture. Nonetheless, if you are trying to write a complex program with hard real-time deadlines, it can have an impact on the ability to meet those deadlines.

A common ‘solution’ to these issues is to use macros instead of functions. Code defined by a macro is always executed inline. The preprocessor searches the translation unit (C file) for instances of the macro name and replaces them with the defined code. This makes macros generally faster and more efficient than functions: there is no call overhead, no additional stack space is required, and the expanded code can be better optimized by the compiler. Macros are not typed, making them more flexible, but this can also be a double-edged sword. Multi-line macros are more difficult to read than functions, and there are other caveats like side effects, cryptic compiler errors and code bloat. Code bloat occurs when macros are scattered everywhere throughout the code. Since the same code is repeated each time the macro is used, additional program memory must be allocated. For example, if a macro compiles down to 10 bytes of machine code and is used 1000 times, that’s 10000 bytes of program memory required. If a function was used instead, there would be some function call overhead, but the body of the function would only occupy 10 bytes of program space. Finally, macros (specifically function-like macros) are frowned upon by many coding standards. The general rule of thumb is to use macros to replace functions only when performance is an issue. However, this is a broad generalization, and when a code base becomes big and complex, it may not necessarily be the best solution.

When C99 introduced inline functions, it tried to address the caveats of both functions and macros, providing the best of both worlds while catering to the increasingly complex code being produced. Inline functions provide all the benefits of a normal function but leave it up to the compiler to determine the best way to optimize the code – i.e. to inline or not to inline. This is based on a number of code characteristics as defined by the compiler’s algorithms and the optimization level used. inline is a keyword just like static or volatile. The difference with the inline keyword is that it is just a suggestion to the compiler: it does not have to inline a function just because it was declared as such. By pointing out which functions might be worth optimizing, you let the compiler focus its optimization efforts while keeping compilation time to a minimum. Until you have no other option, trust the compiler to do its job and let it optimize your code the best way it can.

To get a better understanding of how this all works and how the compiler can optimize code using inline functions, let’s look at a simple example with a very common operation: max – take two values and return the bigger of the two. We will implement it as a regular function, a macro and an inline function, and then compare the output. The implementation used is trivial. First, as a function:

int max(int a, int b)
{
    return (a > b) ? a : b;
}

To make an inline function, simply add the inline keyword in front:

inline int max(int a, int b)
{
    return (a > b) ? a : b;
}

And defined as a macro:

#define MAX(a,b)    (((a) > (b)) ? (a) : (b))

Now let’s create a simple test function. Say we want to find the maximum value in an array of integers. In our test code, the array will be extern’d rather than defined in the same file. This is because if the compiler can see the array, it can optimize based on the initialized values. The test function will traverse the array, compare the current maximum with the current element and store the greater of the two in a variable, which will be returned to the caller.

#include <stddef.h>

extern const int test_array[];
extern const size_t test_array_size;

int max_element(void)
{
    size_t i;
    int max_value = test_array[0];

    for (i = 1; i < test_array_size; i++)
        max_value = max(max_value, test_array[i]);

    return max_value;
}

The variable max_value is initialized to the first member in the array, so we start iterating from index 1. Each iteration the higher value will be stored in the variable max_value.

To compile the code, I have used the following gcc arguments in my makefile:

msp430-gcc -mmcu=msp430g2553 -mhwmult=none -c -O0 -MMD -Wall -Werror -Wextra -Wshadow -std=c99 -Wpedantic

We start off by compiling the code with max as a regular function. Optimizations should be turned off by passing the argument -O0 so that the compiler doesn’t automatically inline our function (the compiler might inline a function even if it is not declared inline when optimizations are turned on). This can be confirmed by using the objdump utility and looking at the disassembled output.


Let’s take a look at the output in greater detail. Our first consideration is performance – i.e. speed – the fewer instructions / CPU cycles the better. We can see where the max function gets called because the call instruction is invoked with a destination address the same as max (0xc13e). The two mov instructions prior are filling r12 and r13 – setting up the function arguments. The call instruction is not actually a single operation – it represents two operations and consumes several CPU cycles (4-5). This includes saving the address of the next instruction on the stack (the one to return to) and then branching to the max function. Inside the max function, we have the overhead of creating the stack frame and saving the two arguments to it (the first three instructions), and at the end of the function the last two instructions free the stack used by the function and jump back to the address saved before the call. This also consumes several CPU cycles. Actually calculating the overhead would require doing some research to find how many CPU cycles each instruction takes and then adding them up. That is beyond the scope of this post, so for an example calculation let’s say there are 15 additional CPU cycles of function overhead per call. If the array is only 5 elements long, the overhead would consume 75 CPU cycles. Not a big deal, right? But what if the array was 1000 elements? That would be 15000 CPU cycles. Executing this function once a second with a 1MHz CPU clock would result in 1.5% of the CPU cycles being used for this one function’s overhead!

The other characteristic we have to investigate is the size of the code image because that will determine how much memory (flash in the case of the MSP430) is required. Using the objdump utility to view the section headers (passing the argument -h), the size of the compiled code which is defined by the .text section is 0x284 (644) bytes. Keep in mind this size is for the whole image, not just this one function.


Let’s see how this differs if we use a macro instead of a function. Implement the code with the max calculation as a macro (as above) and compile it again using the same arguments. The output will look like this:


The first thing to note is that the max function is gone altogether, as expected. The code defined by macro has become part of the max_element function rather than a separate function. We can see the actual comparison occurs where the cmp instruction is invoked, similar to how it is implemented in the max function above. By comparing the number of instructions in these two examples, it is safe to say we will benefit from a pretty significant performance improvement using the macro.

In terms of code size, it has reduced quite a bit down to 0x26c (620) bytes. Of course this is expected only because the macro is called once. If it was called several times, the code size would start increasing.

Finally, let’s build the inline version of the code and see how that compares. Note that with optimizations turned off, the compiler does not inline the function – I had to use a gcc attribute to force it ( __attribute__((always_inline)) ). Compiling and getting the output…


Interestingly, it is not as efficient as the macro. There is still some overhead, possibly because the compiler inserts the inline function after both are compiled and therefore has to add some glue code. The code size has also increased to 0x27c (636) bytes, 16 bytes more than the macro, but still 8 bytes less than a regular function. Therefore, it appears that with optimizations disabled, inline functions are less efficient than macros.

To see if we can make the inline function as efficient as the macro, let’s turn on optimizations. Recompiling the same code with the -O2 option instead of -O0, we can see that both the inline function and the macro result in the same generated machine code, and are therefore equally efficient in terms of performance and space. No need to paste the same code twice, take my word for it or try it yourself and you will see that they are the same.

Keep in mind that just because in this example the generated machine code turned out to be the same, this may not always be the case. Optimizations use very complex algorithms and the optimal solution could change based on many variables that might not be easily spotted by basic analysis.

If we wanted to optimize for size instead of speed, we would use the argument -Os. In this example, the compiler will still inline the function because it is small and called only once. However, if the function was bigger and used in many places, the compiler may choose not to inline the function.

All in all, inline functions are a great tool to have. In fact, inline functions proved to be so useful that most compilers support them in C90 mode with extensions. Modern compilers are very advanced and will most likely do the best thing regardless of whether a function is declared inline or not. However, it doesn’t hurt to make the suggestion – the worst case is the compiler ignores you. +1 C99 for introducing inline functions.

Battle of the Standards: C90 vs. C99 – Round 2
Designated Initializers

In C, the initialization of structures can be dangerous if the structure definition ever changes. When a structure is initialized, the order of initializers must follow the structure definition, otherwise the members will be assigned the wrong values. This may seem trivial, but take the following example. You have a module which is controlled using a message queue. When the module is first written, the message takes the following simple form.

#include <stdint.h>

enum message_id
{
    MSG_ID_READ = 0,
    MSG_ID_WRITE,
    MSG_ID_MAX
};

struct message
{
    enum message_id id;
    uint8_t length;
};

A helper function is used to build and send the write message. With C90, the function might look like this:

static void send_write_msg(uint8_t length)
{
    struct message msg = {MSG_ID_WRITE, length};

    /* ... queue msg for transmission ... */
}

Several years later there is an urgent need to add a priority to the message. Someone comes along and hastily modifies the structure to add the new member.

struct message
{
    enum message_id id;
    uint8_t priority;
    uint8_t length;
};

But they add it in the middle of the structure… The existing initializations will still compile without error, since the second member in the old and new structure definitions – length and priority respectively – have the same data type. As long as the types match, the compiler has no way of knowing that the structure has changed and is no longer initialized correctly. Only if the types are different and all the strict compiler checks are enabled will it throw an error – if, say, a uint32_t is assigned to a uint8_t. Even if existing code does not require this new member, the initialization will still be wrong. Developer testing or a code review might catch this type of error, but it would be even better if it could be avoided altogether.

C99 introduced a new feature to address this called designated initializers. It is a way to explicitly initialize a member using a new syntax. The function to send a write message could be instead written to take advantage of this syntax.

void send_write_msg(uint8_t length)
{
    struct message msg = {.id = MSG_ID_WRITE, .length = length};

    /* ... queue msg for transmission ... */
}

The member name is prefixed with a period (.) and then assigned the initializer. In the example above, the existing initialization code is still correct after the structure definition was updated. Therefore a structure can be changed without having to update all the initialization code if the new member is not required. So there are two advantages here: 1) it is safer, and 2) it can lead to more efficient development.

This feature also applies to arrays. Let’s say each message has a handler function. If the message IDs are mostly consecutive, the easiest and often most efficient implementation is to define an array of function pointers. A dispatcher then reads the incoming messages and invokes the appropriate handler.

typedef void (*message_handler_t)(uint8_t priority, uint8_t length);

static message_handler_t msg_handler[] = {read_handler, write_handler};

void message_dispatcher(void)
{
    struct message msg;

    while (1) {
        receive_msg(&msg); /* hypothetical blocking read from the queue */

        if (msg.id < MSG_ID_MAX) {
            if (msg_handler[msg.id] != NULL) {
                msg_handler[msg.id](msg.priority, msg.length);
            }
        }
    }
}
Defining the array of message handlers seems pretty trivial. However, that is because there are only two messages in this example. If you had 100 messages, keeping track of which handler goes where in the array can be difficult. How do you verify that each handler is at the right index without manually counting? Especially when you may be implementing the handlers out of order. Well, designated initializers in C99 can help here too. Just like with structures, we can tell the compiler which index each initializer is intended for.

static message_handler_t msg_handler[] =
{
    [MSG_ID_READ] = read_handler,
    [MSG_ID_WRITE] = write_handler
};

The nice thing about this syntax, is that the array doesn’t have to be initialized in order or even consecutively anymore. And since it explicitly states which index (or in this case message) the initializer is intended for, it makes it easier to read with really big arrays. However you do have to be careful since you can inadvertently create a really big array for no reason. For example, if the first message ID is non-zero and starts at 200 instead – i.e. MSG_ID_READ = 200 – using the ID as a designated initializer will force the compiler to allocate an array of 202 elements, even though the first 200 are unused. This is obviously undesirable, and you might consider using an offset instead of the actual message ID.

Overall, designated initializers in C99 are a definite improvement over C90. Being able to initialize both structures and arrays explicitly can improve the readability and maintainability of your code. And with that, I believe that brings the score to C99:2 – C90:0.

Battle of the Standards: C90 vs. C99 – Round 1
Non-Constant Initializers

I have a confession to make…

I love C90.

Yes, I said it. Strict, pedantic ISO C90 – also known as ANSI C. You might be wondering why I support a 30+ year old standard. You might even be wondering why you ever started following this website in the first place. But before you start telling me how wrong I am, let me explain myself. When using strict ISO C90, I feel like it forces me to produce more portable, robust and clean code than, say, the C99 standard. I also like to think that if I use the most mature standard, the compiler will be more mature, more stable, quicker, and produce better machine code (the man pages for gcc v4.8.2 still say C99 is only partially supported). Many of the most stringent coding standards used in high reliability/safety critical systems only support ANSI C. There could be many reasons for this – legacy code, legacy compilers, etc… But to be quite honest, my love for ISO C90 is partly (if not mostly) in my head. C99 introduced a wide variety of new features to the C programming language. A lot of them are actually good, if not great. But others are just dangerous and can’t be justified in my opinion, and for this reason I have stayed away from the newer standard as much as possible.

But no more. I will [try to] embrace its positive aspects. Over the course of this mini-series, I will show you examples of both the good and the bad of C99. Hopefully at the end of it, we will come to the conclusion that if used correctly, the C99 standard can help in writing more clear, robust and optimized code than with C90.

One side note here before we get started. Whether you are using C90 or C99, it is highly recommended to disable compiler specific constructs and extensions when possible. There are times when they are needed, but their use should be limited to hardware specific code and should not be scattered around in every source file. Try porting code that randomly assigns variables to hardcoded memory locations to a different hardware platform…

Now, let’s get started shall we?

Round 1: Initialization of complex data types

The first difference between C90 and C99 we will explore is the initialization of complex data types within a function. A complex data type is an array, structure or union. In C90, all initializers of complex data types must be constant expressions. A constant expression is one that can be evaluated at compile time. This means, for example, that you cannot pass in an argument as an initializer. Similarly, a function call cannot be used as an initializer. In C99, both of these are possible. This is a very useful feature in many circumstances. Let’s say you have a driver for a command based device (think SPI NOR). You write a small helper function to support a specific command which requires one address byte and one data byte following the command (three bytes in total, assuming 8-bit commands). The address and data are passed as arguments by the calling function. Let’s take a look at how we have to do this in C90.

int spi_nor_cmd_xxx(uint8_t address, uint8_t data)
{
    uint8_t cmd[3];

    cmd[0] = 0x50; /* Don't use magic numbers !! */
    cmd[1] = address;
    cmd[2] = data;

    return spi_nor_write(cmd, sizeof(cmd));
}

So what is wrong with that? Technically, nothing at all (other than the magic number 🙂 it’s only there to show it is constant). But there are three points I would consider which would make this better.

  1. The array cmd could be defined as a constant. Once initialized, it should never change.
  2. The array is initialized with a defined length of three (3) rather than allowing the compiler to determine the length based on the number of initializers. This could lead to copy-paste errors if multiple of these helper functions are implemented (yes… I shouldn’t have used magic numbers either).
  3. It takes 4 lines of code to initialize this three byte command.

All three of these points can be addressed by using C99 and its support for non-constant initializers.

int spi_nor_cmd_xxx(uint8_t address, uint8_t data)
{
    const uint8_t cmd[] = {0x50, address, data};
    return spi_nor_write(cmd, sizeof(cmd));
}

That looks much better in my opinion. With the C99 version, the array is now constant and it is defined in a single line. There is less room for error because the array size is defined based on the number of initializers.

An important factor which must be considered: does this affect the output? Is there any change in performance, memory usage or the ability to optimize? I would assume not, considering it is functionally the exact same code, however this could vary by compiler. I tested this theory using gcc v4.8.2 (x86) on Linux. I chose to compile for x86 instead of the usual msp430 because I figured that since x86 assembly is vastly more complex, if there were any way to optimize one form over the other, the compiler would do so. To compile with strict C90 enabled and all GNU extensions disabled, the command line arguments ‘-ansi -Wpedantic’ must be used. The ‘-ansi’ argument is equivalent to ‘-std=c90’ in gcc. The ‘-Wpedantic’ flag tells the compiler that it must check for strict ISO compliance. Compiling the C90 version of the code produces the output below (using objdump).


Compiling the C99 version with ‘-std=c99’ to enable C99 but not the GNU extensions produces the output below.


Comparing the two, we can see that they are exactly the same. Compiling again with optimizations set to O2 shows no difference. Compiling once more with msp430-gcc shows the same results. Interestingly enough, msp430-gcc produced more efficient code than x86-gcc in terms of code size and number of operations – guess my hypothesis was wrong. This is by no means an exhaustive set of tests, but it shows that the compiler recognizes these two examples as two forms of the same construct. However, this is a very simple example; with more variables and code which might change the order of execution, the output would undoubtedly change as well.

Can you think of any downsides to the C99 form? Potentially. Let’s say one of the initializers was a function call which returns a uint8_t. If the function call can fail and return an error instead of a valid value (as it should), then I would not recommend this approach. Instead the function should be called first, the return value checked, and then if and only if it is sane, the command can be initialized. The wrong/dangerous way:

int spi_nor_cmd_xxx(uint8_t data)
{
    const uint8_t cmd[] = {0x50, saved_data_get_address(), data};
    return spi_nor_write(cmd, sizeof(cmd));
}

The better way:

int spi_nor_cmd_xxx (uint8_t data)
    int err = -1; 
    const uint8_t address = saved_data_get_address(0);

    if (address > 0) {
        const uint8_t cmd[] = {0x50, address, data};
        err = spi_write(cmd, sizeof(cmd));

    return err;

What about all these new stack variables I just allocated? Isn’t that less efficient? Well maybe, but the compiler should easily optimize all that out. Plus, if you were to implement the same code in C90 correctly, you would have the exact same issue. So yes, here with C99 we could have written more concise code (although arguably maybe more difficult to read), however it would not meet our standards for error checking. So use this feature wisely.

Keep in mind that all this applies to the initialization of structures as well. Say we have a similar example as above, but instead of a byte array being written to a device, the command is a message which is going to be queued for use by a different code module.

struct message
{
    uint8_t cmd;
    uint8_t address;
    uint8_t data;
};
Again, we want to make a simple helper function to send a specific message. In C90, it would look like this:

int send_message_xxx(uint8_t address, uint8_t data)
{
    struct message msg;

    msg.cmd = 0x50; /* Don’t use magic numbers !! */
    msg.address = address;
    msg.data = data;

    return send_message(&msg);
}
Ok, not all that bad. Let’s see how it looks with C99.

int send_message_xxx(uint8_t address, uint8_t data)
{
    const struct message msg = {0x50, address, data};
    return send_message(&msg);
}

As with the example using arrays, the C99 version of this code is cleaner, more concise and allows us to define the message as a constant. However, this type of initialization for structures can be dangerous. If my colleague adds a new member to the structure right after cmd, the initialization code in C99 would be wrong. But as we all know, this is a general problem with initializing structures in C. If the structure was initialized outside a function, you don’t even have the option of assigning each member explicitly. With C90 at least…

In the next round of Battle of the Standards, we will see that C99 addresses this issue quite elegantly. But for today, I conclude that initializing complex data types with non-constant initializers is a feature of C99 which I have certainly accepted into my toolkit. It has a few key benefits, and virtually no pitfalls. +1 for C99.

Thank you for joining us for the unveiling of the LaunchPad Explorer!

We would like to thank everyone for their support and all those who joined us live for the unveiling of the LaunchPad Explorer! If you were not able to attend the live webinar (or just want to watch it again), check out the recorded copy along with the slides from the presentation.

Join us for the unveiling of the LaunchPad Explorer!

There is only so much we can do in terms of peripherals with our LaunchPad alone. It was designed to mate with a BoosterPack. There are plenty of shields out there, but to cover everything, we would need – well – a lot of shields. Instead I decided to design a BoosterPack specifically tailored to the content you have asked for.

I have officially teamed up with Kevin (see hardware tutorials) to bring you a new BoosterPack that we will be using to cover all the topics for the next while. This board has it all – enough to push our little MSP430G2553 to its limits. I won’t tell you much more about it here, but Kevin and I will be unveiling the board at a TI hosted webinar on November 17th, 2015 @ 10:00am (CST). The link to the webinar is:

Please share this link with your friends and colleagues. If you can’t watch it live don’t worry, the webinar will be recorded and I will post the link after.

Lesson 11: Timing Event now Available

In this latest tutorial, we learn how to time events using the timer module’s capture feature and then implement a simple stopwatch integrated into the menu. After you have read through the tutorial, if you are inclined to do some hardware try and implement the stopwatch using an additional push button on a breadboard! If you want to share your success (or failure) send me an email with the details and I’ll post it on the site.

Debugging with GDB, mspdebug and TI MSP430-GCC

For a while now, I have wanted the option of using gdb instead of mspdebug. Don’t get me wrong, mspdebug is excellent, but sometimes it helps to have more powerful and mature debugging tools. When I first investigated using mspdebug along with gdb in a server/client setup, I ran into a connection problem which I did not have the time or the desire to solve. I pretty much just gave up – that is, until [yeltrow] wrote to me and told me how he got it working. He even has a nice write-up about it here. I couldn’t resist, I had to try this out ASAP. So I got to work following his instructions, but unfortunately ended up with the same error. I updated my compiler to the latest version – still no luck. Finally, I realized that my mspdebug was version 0.22 while his was 0.23. Could this be it? Well, it turns out this latest version hasn’t been updated in the Ubuntu repositories. This time, I was not giving up, so I compiled it from source. Luckily it’s really quick and simple.

  1. Remove the old version of mspdebug
    sudo apt-get remove mspdebug
  2. Clone/download the mspdebug repository from here
  3. Install readline devel – this is for command line autocomplete, history etc…
    sudo apt-get install libreadline6-dev
  4. Compile
    make
  5. Install
    make install

After updating mspdebug to the latest version, success!!

Before you start debugging, you absolutely must modify the CFLAGS in the makefile to include the compiler flags that [yeltrow] figured out. Those are:

  • -O0: turn off optimizations
  • -g3: include the highest level of debugging information
  • -gdwarf-2: set the debugging information format to DWARF 2
  • -ggdb: produce debugging information for gdb

So now your CFLAGS should look like this:

CFLAGS:= -mmcu=msp430g2553 -mhwmult=none -c -O0 -g3 -ggdb -gdwarf-2 -Wall -Werror -Wextra -Wshadow -std=gnu90 -Wpedantic -MMD -I$(INC_DIR)

The makefile has been updated in the github repository so you can just pull and the changes will be there. Now we need to do a rebuild to make sure all the debugging information is included.

make clean && make

Start mspdebug as we normally do and program the application to the board. Instead of running the program from mspdebug, you will start the gdb server by typing the command ‘gdb’. This starts the server on port 2000 of localhost. The job of the gdb server is to translate gdb commands into hardware actions and communicate these actions to the hardware. Mspdebug is no longer the debugger; it’s really just a translation layer. The client, which is gdb, is the debugger. As with the rest of the toolchain, we must run the cross-compiled version under /opt/msp430-toolchain/bin.

/opt/msp430-toolchain/bin/msp430-gdb build/bin/app.o

Passing the object file as an argument tells gdb to load the symbol table from that file. Now we can connect gdb to the mspdebug server and start running.

target remote localhost:2000

The ‘continue’ command is the gdb equivalent to ‘run’ in mspdebug. Now you can start using gdb to debug your code. If you are unfamiliar with gdb, I would start here and work through the ‘breakpoint’ and ‘continuing and stepping’ sections. Enjoy!