It’s All In The Libs – Building A Plugin System Using Dynamic Loading

Shared libraries are our best friends to extend the functionality of C programs without reinventing the wheel. They offer a collection of exported functions, variables, and other symbols that we can use inside our own program as if the content of the shared library was a direct part of our code. The usual way to use such libraries is to simply link against them at compile time, and let the linker resolve all external symbols and make sure everything is in place when creating our executable file. Whenever we then run our executable, the loader, a part of the operating system, will again try to resolve all the symbols, and load every required library into memory, along with our executable itself.

But what if we didn’t want to add libraries at compile time, but instead load them ourselves as needed during runtime? Instead of a predefined dependency on a library, we could make its presence optional and adjust our program’s functionality accordingly. Well, we can do just that with the concept of dynamic loading. In this article, we will look into dynamic loading, how to use it, and what to do with it — including building our own plugin system. But first, we will have a closer look at shared libraries and create one ourselves.

Note that some details may vary on different architectures. All examples here focus on x86_64 Linux, although the main principles should be identical on other systems, including Linux on ARM (Raspberry Pi) and other Unix-like systems.

Building A Shared Library

The first step toward dynamically loadable libraries is the normal shared library. Shared libraries are just a collection of program code and data, and there is nothing too mysterious about them. They are ELF files just like a regular executable, except they usually don’t have a main() function as entry point, and their symbols are arranged in a way that any other program or library can use them as needed in their own context. To arrange them that way, we use gcc with the -fPIC option to generate position-independent code. Take the following code and place it in a file libfunction.c.

int double_me(int value)
{
    return value + value;
}

Yes, that’s all there is going to be, a simple function double_me() that will double a given value and return it. To turn this into our own shared library libmylib.so, we first compile the C file as a position-independent object file, and then link it as a shared library:

$ gcc -c -fPIC libfunction.c
$ gcc -shared -o libmylib.so libfunction.o

Of course, we can combine it into a single call to gcc and avoid the intermediate object files. Note that you might want to add a soname with the -Wl,-soname,<name> option, and add some versioning to the output file, but for simplicity, we leave that out for now.

$ gcc -shared -fPIC -o libmylib.so libfunction.c

Either way, we now have our own shared library libmylib.so, so let’s go right ahead and use it.

// file main.c
#include <stdio.h>

// declare the function, ideally the library has a .h file for this
int double_me(int);

int main(void)
{
    int i;
    for (i = 1; i <= 10; i++) {
        // call our library function
        printf("%d doubled is %d\n", i, double_me(i));
    }
    return 0;
}

Now we just have to remember to link against our library when we compile the file, and add our current working directory to the list of paths gcc should search for libraries. Keep in mind that library file names are expected to be in the form of liblibrary_name.so and are then linked via -llibrary_name.

$ gcc -o main main.c -L. -lmylib

This should keep the linker happy and produce our main executable. But what about the loader? Will it automatically find our library?

$ ./main
./main: error while loading shared libraries: libmylib.so: cannot open shared object file: No such file or directory

Well that’s a big nope, and it shows that telling the linker (part of the compiler suite) about our library won’t make the loader (part of the OS) magically know about its location. To find out which libraries are required, and whether the loader can currently resolve each of those dependencies, we can use the ldd command. To get some more debug output from the loader, we can set the LD_DEBUG=all environment variable when calling our executable.

So in order to make the loader find our library, we have to tell it where to look, either by adding the correct directory to the LD_LIBRARY_PATH environment variable, or by adding it to the ldconfig paths, either in /etc/ld.so.conf or in a file inside the /etc/ld.so.conf.d/ directory, and then running ldconfig to update the loader’s cache. Let’s try it with the environment variable for now.

$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./main
1 doubled is 2
2 doubled is 4
...
10 doubled is 20
$

Yes, the loader will now find our library and successfully run the executable.

Dynamically Loading A Shared Library

For our next trick, we will use dynamic loading to read the library into our code at runtime. Once loaded, we can search for symbols in it and extract them to pointers, and then use them as if the library was linked in the first place. Unix and Unix-like systems provide libdl for this. Let’s have a look at how we can call our double_me() function this way.

// dynload.c
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    // handle for dynamic loading functions
    void *handle;

    // function pointer for the library's double_me() function
    int (*double_me)(int);

    // just a counter
    int i;

    // open our library ..hopefully
    if ( (handle = dlopen("./libmylib.so", RTLD_LAZY)) == NULL) {
        return 1;
    }

    // try to extract "double_me" symbol from the library,
    // clearing any stale error state first so dlerror() reflects this call
    dlerror();
    double_me = dlsym(handle, "double_me");
    if (dlerror() != NULL) {
        dlclose(handle);
        return 2;
    }

    // use double_me() just like with a regularly linked library
    for (i = 1; i <= 10; i++) {
        printf("%d doubled is %d\n", i, double_me(i));
    }

    dlclose(handle);
    return 0;
}

We try to load our library using the dlopen() function, which returns a generic pointer handle on success. We can then find and extract the double_me symbol from our library with dlsym(), passing the previously returned handle to it. If the symbol is found, dlsym() returns its address as void *, which can be assigned to a (preferably matching) pointer type representing the symbol. In our case, that is a function pointer that takes an int as parameter and returns int, just like our double_me() function. If all succeeded, we can call the freshly extracted double_me() function as if it was there from the very beginning, and the output will be just the same. Just remember to link against libdl when compiling. On glibc 2.34 and later, these functions are part of libc itself, so -ldl is no longer strictly required, but it does no harm.

$ gcc -o dynload dynload.c -ldl
$ ./dynload
1 doubled is 2
2 doubled is 4
...
10 doubled is 20
$

There we go, instead of linking at compile time, we’ve now loaded our library at runtime, and after extracting our symbols, we can use it just as before. Admittedly, using dynamic loading solely as a replacement for the linker isn’t too useful on its own. A more common use for dynamic loading is to extend a program’s core functionality by integrating a plugin system that allows the users to add external components as they need them. A prime example is the Apache webserver that has an extensive list of modules to add individually as one pleases. Of course, we will focus on a much simpler approach here.
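Before we get to plugins, the idea from the introduction of making a library optional can be sketched in a few lines. This is only an illustration under our own assumptions: get_doubler(), builtin_double() and the double_fn typedef are names made up for this example, and the library path is hard-coded to the ./libmylib.so we built earlier.

```c
#include <stdio.h>
#include <dlfcn.h>

// pointer type matching double_me(); the typedef name is our own invention
typedef int (*double_fn)(int);

// built-in fallback used when the library is not available
static int builtin_double(int value)
{
    return 2 * value;
}

// Return double_me() from ./libmylib.so if it can be loaded,
// otherwise fall back to the built-in implementation.
// The handle is deliberately kept open: the returned pointer is
// only valid for as long as the library stays loaded.
double_fn get_doubler(void)
{
    void *handle = dlopen("./libmylib.so", RTLD_LAZY);
    double_fn fn = NULL;

    if (handle != NULL) {
        fn = (double_fn)dlsym(handle, "double_me");
        if (fn == NULL) {
            // library is there, but the symbol is not
            dlclose(handle);
        }
    }
    if (fn == NULL) {
        fprintf(stderr, "libmylib.so not usable, using built-in fallback\n");
        fn = builtin_double;
    }
    return fn;
}
```

A caller would simply do double_fn f = get_doubler(); and then use f(i) in the loop, without caring where the implementation came from.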

Building Your Own Plugin System

Take the good old kids’ game Telephone (or Chinese Whispers, Whisper Mail, Broken Phone, etc). Someone starts with a message and it gets whispered around, and the last child says what the initial message was supposed to be. Well, this sounds like something a bunch of plugins could do by passing a message from one to the other, slightly messing up the data as we go. We’ll write the code to run the telephone system, and anyone can contribute a kid/plugin.

As an API, let’s say that the plugin takes a pointer to the message and a length as parameters and alters the message directly in memory. Let’s simply call it process, so the function would look like void process(char **, int). This is what a plugin with a process() function that sets every second character to uppercase could look like:

// file plugin-uppercase.c
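#include <ctype.h>

void process(char **message, int len)
{
    int i;
    char *msg = *message;

    // start at the second character and skip every other one
    for (i = 1; i < len; i += 2) {
        // cast to unsigned char, toupper() is undefined for negative values
        msg[i] = toupper((unsigned char)msg[i]);
    }
}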

Let’s turn it right away into an uppercase.plugin file, and assume we have two more plugins, increase.plugin that increases each digit it finds, and leet.plugin that makes our message just that: l337.

$ gcc -shared -fPIC -o uppercase.plugin plugin-uppercase.c
$ gcc -shared -fPIC -o increase.plugin plugin-increase.c
$ gcc -shared -fPIC -o leet.plugin leet-replace.c
$
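To give an idea of how little a second plugin needs, here is a guess at what the digit-increasing plugin might look like. This is a sketch following the process() API described above, not the actual file from the repository, and wrapping 9 around to 0 is purely our assumption.

```c
// file plugin-increase.c (hypothetical sketch)
#include <ctype.h>

void process(char **message, int len)
{
    int i;
    char *msg = *message;

    for (i = 0; i < len; i++) {
        if (isdigit((unsigned char)msg[i])) {
            // '0'..'8' move up by one; wrapping '9' to '0' is an assumption
            msg[i] = (msg[i] == '9') ? '0' : msg[i] + 1;
        }
    }
}
```

Since it exports the same process symbol with the same signature, the main program does not need to know anything else about it.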

Our main program would then take a message as first argument, and an arbitrary number of plugin files as the rest of the argument list. It will load the plugins one by one, pass the message along from one plugin to the other through their process() functions, and then print out the result. (For focus, we’re pretending that we live in a perfect little world where errors do not happen.)

// file telephone.c
#include <stdio.h>
#include <string.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
    void *handle;
    void (*process)(char **, int);
    int index;

    if (argc < 3) {
        printf("usage: %s <message> <plugin> [<plugin> ...]\n", argv[0]);
        return 1;
    }

    // argv[1] is the message, start from index 2 for the plugin list
    for (index = 2; index < argc; index++) {
        // open next plugin
        handle = dlopen(argv[index], RTLD_NOW);
        // extract the process() function
        process = dlsym(handle, "process");
        // call the process function, modifying argv[1] directly
        process(&argv[1], strlen(argv[1]));
        // close the plugin
        dlclose(handle);
    }

    // print the resulting message
    printf("%s\n", argv[1]);
    return 0;
}

Just like before, we load the plugin file (a dynamically loaded shared library), extract the function we need, and execute it — only this time in a loop. So let’s compile and test it.
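Outside of our perfect little world, a mistyped plugin path would make dlopen() return NULL and the program crash on the next line. A more defensive version of the loop body could look roughly like the following sketch, where run_plugin() is a hypothetical helper name and skipping a broken plugin (instead of aborting) is our own design choice.

```c
#include <stdio.h>
#include <string.h>
#include <dlfcn.h>

// Load one plugin, run its process() on the message, and unload it again.
// Returns 0 on success, -1 if the plugin could not be loaded or does not
// provide a process() function.
int run_plugin(const char *path, char **message)
{
    void *handle;
    void (*process)(char **, int);

    handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        // dlerror() describes why the last dlopen() failed
        fprintf(stderr, "skipping %s: %s\n", path, dlerror());
        return -1;
    }

    process = dlsym(handle, "process");
    if (process == NULL) {
        fprintf(stderr, "skipping %s: no process() function found\n", path);
        dlclose(handle);
        return -1;
    }

    process(message, (int)strlen(*message));
    dlclose(handle);
    return 0;
}
```

In the main loop, the four calls would then shrink to run_plugin(argv[index], &argv[1]), and a failing plugin simply leaves the message untouched.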

$ gcc -o telephone telephone.c -ldl
$ ./telephone "hello hackaday" ./uppercase.plugin
hElLo hAcKaDaY
$ ./telephone "hello hackaday" ./uppercase.plugin ./leet.plugin 
h3lL0 h4cK4D4Y
$ ./telephone "hello hackaday" ./uppercase.plugin ./leet.plugin ./increase.plugin 
h4lL1 h5cK5D5Y
$

As expected, with each plugin altering the input message in its own way, the number and order of plugins given as parameters to our main program will affect the final message. Now this may not count for much as a data processing example, but the same concept can of course be used for some more useful scenarios. If you’re curious about the full implementation, you can find it on GitHub. Also note that our main program has never changed, and if we decide to make adjustments to one of the plugins, we only have to recompile that one plugin. We could even add mechanisms to the main program to reload the plugins, and we wouldn’t even have to restart the main program itself.
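Such a reload mechanism mostly comes down to closing the old handle and opening the file again. The sketch below assumes a long-running main program that keeps its plugins loaded. The struct plugin type and both function names are made up for this example.

```c
#include <stdio.h>
#include <dlfcn.h>

// bookkeeping for one loaded plugin (hypothetical helper type)
struct plugin {
    const char *path;
    void *handle;
    void (*process)(char **, int);
};

// drop the current copy of the plugin, if there is one
void plugin_unload(struct plugin *p)
{
    if (p->handle != NULL) {
        dlclose(p->handle);
    }
    p->handle = NULL;
    p->process = NULL;
}

// unload and load again from p->path; returns 0 on success, -1 on failure
int plugin_reload(struct plugin *p)
{
    plugin_unload(p);

    p->handle = dlopen(p->path, RTLD_NOW);
    if (p->handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", p->path, dlerror());
        return -1;
    }

    p->process = (void (*)(char **, int))dlsym(p->handle, "process");
    if (p->process == NULL) {
        fprintf(stderr, "%s has no process() function\n", p->path);
        plugin_unload(p);
        return -1;
    }
    return 0;
}
```

Keep in mind that dlclose() only drops a reference, so the library is actually unloaded once nothing else holds it open. Any function pointers obtained from the old copy must not be used after the reload.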

Raspberry Pi GPIO Monitor

One of those more useful scenarios that would follow the same principles could be a program that monitors the GPIO pins on a Raspberry Pi. We’d have different plugins that can all handle any information our main program reads from the GPIOs. Each plugin would have a set of basic functions it can implement: a function for the plugin setup phase, one to handle each GPIO’s state change, and one to tear down the plugin when it’s not used anymore. One plugin could handle input change on one pin to change the state of an output pin, another one could perform some tasks when one specific input pin goes high, and a third one could just write all state changes to a log file.

In the end, the dynamic loading part won’t be much different than in the previous example, and going into the details of such a GPIO monitor would go beyond the scope of this article. However, we wouldn’t mention it if we hadn’t implemented it, so a basic GPIO monitor can also be found on GitHub.
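For a rough idea of what "a set of basic functions it can implement" could mean in code, here is one possible shape. It is not the interface of the GPIO monitor on GitHub: the struct, the function names and the symbol names gpio_setup, gpio_on_change and gpio_teardown are all assumptions made for this sketch.

```c
#include <stddef.h>
#include <dlfcn.h>

// hypothetical plugin interface, every function is optional
struct gpio_plugin {
    int  (*setup)(void);                    // called once after loading
    void (*on_change)(int pin, int state);  // called for each state change
    void (*teardown)(void);                 // called before unloading
};

// Fill api from an opened plugin and return how many of the three
// functions the plugin provides. Missing ones are left as NULL.
int gpio_plugin_resolve(void *handle, struct gpio_plugin *api)
{
    int found = 0;

    api->setup     = (int  (*)(void))dlsym(handle, "gpio_setup");
    api->on_change = (void (*)(int, int))dlsym(handle, "gpio_on_change");
    api->teardown  = (void (*)(void))dlsym(handle, "gpio_teardown");

    found += (api->setup != NULL);
    found += (api->on_change != NULL);
    found += (api->teardown != NULL);
    return found;
}

// forward a state change to the plugin, if it cares about those
void gpio_plugin_notify(const struct gpio_plugin *api, int pin, int state)
{
    if (api->on_change != NULL) {
        api->on_change(pin, state);
    }
}
```

Treating a missing symbol as "not interested" rather than as an error lets a logging plugin skip setup and teardown entirely, while a plugin that drives output pins can implement all three.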

Where To Go From Here

With dynamic loading, we have seen an alternative approach to compile-time linking that makes it easier to extend our program’s main functionality with external libraries. While it adds a bit of complexity to extract the symbols from a library, the main principles are rather simple and straightforward: you open a library, you extract symbols from it, and you close it.
However, this simplicity also has its shortcoming: in order to extend a program’s functionality through dynamic loading, we need to know beforehand what we can find in the loaded library or plugin. We cannot simply add a completely new function and hope that our program will magically know about it on the fly. But if you design your core program with these limitations in mind, dynamic loading will give you a flexible way to extend functionality as needed.

Note that we’ve opened up a Pandora’s box of security issues. If arbitrary external functions can run within our main code, it’s only as secure as the libraries that it dynamically links to. Abusing this trust is the basis of DLL injection attacks or DLL hijacking. If an attacker can fool the operating system into feeding the calling program their dynamically loadable library, they’ve won.

Dynamic loading needs support from the operating system, so this isn’t really anything for an 8-bit microcontroller environment. You will always have function pointers though.
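Without a loader, the "plugins" have to be compiled in, but the calling side can still look much the same. The following is a minimal sketch of a static plugin table built from plain function pointers. The two transformations and all names in it are our own examples.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// every compiled-in "plugin" has this signature
typedef void (*process_fn)(char *msg, size_t len);

static void upper_every_second(char *msg, size_t len)
{
    size_t i;
    for (i = 1; i < len; i += 2) {
        msg[i] = (char)toupper((unsigned char)msg[i]);
    }
}

static void spaces_to_underscores(char *msg, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (msg[i] == ' ') {
            msg[i] = '_';
        }
    }
}

// the table replaces the plugin files given on the command line
static const process_fn plugins[] = {
    upper_every_second,
    spaces_to_underscores,
};

// pass the message through all plugins in table order
void run_all(char *msg)
{
    size_t i;
    size_t len = strlen(msg);

    for (i = 0; i < sizeof(plugins) / sizeof(plugins[0]); i++) {
        plugins[i](msg, len);
    }
}
```

Adding a plugin now means adding a function and a table entry and rebuilding the firmware, so we lose the runtime flexibility but keep the decoupled structure.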

Some Words On Looking Up Symbols

You may be thinking that if dlsym() can resolve symbols in a dynamically loaded file, there must be a way to also find the available symbols in the first place, maybe get a whole list of them. Well, yes, common binutils tools such as readelf or nm do just that. nm does so with the help of the Binary File Descriptor library libbfd, while readelf parses ELF files on its own. Also, the GNU extension of our dynamic loading library libdl offers the dlinfo() function to obtain further information about the loaded file. Some further reading about the ELF file format is recommended before you go down that rabbit hole.

from Blog – Hackaday https://ift.tt/2N98LVn