Validating strings at compile time

In my previous post, I looked at one way of solving a problem when interfacing with a library that takes a string argument with a list of parameters, such as the pledge(2) system call in OpenBSD.

In that post, we used a list of enum values that we formatted as a string at compile time. There is an alternative approach to this problem though: what if we validated the string we pass at compile time?

Let's remind ourselves of the problem first. The pledge(2) system call takes a space-separated list of promises as its first argument, passed as a string. This list can only contain the pre-defined promises that the system call understands, and passing any unknown value results in an EINVAL error at run time. As the parameter is a plain string, it's easy to make a spelling mistake that won't be caught until run time. It would be nice if we could catch any errors relating to the format of the string at compile time instead.

In the previous post, I only looked at the Nim programming language, but in this post I want to cover some other languages too, namely D and Zig.

Problem definition and approach

Rather than implementing a full-blown pledge(2) wrapper in each language, let's take inspiration from it and define a minimal problem to solve.

We should have a function which takes a single string argument, which should be a list of promises separated by spaces. The supported promises should be:

  • stdio
  • rpath
  • wpath
  • cpath
  • dpath

The function should validate that the given string doesn't contain any unknown promise entries at compile time and at run time should simply print the provided promises.

Approaching the problem in D

D is described as follows:

D is a general-purpose programming language with static typing, systems-level access, and C-like syntax. With the D Programming Language, write fast, read fast, and run fast.

We'll start by defining an enum to cover the possible promises. This should be immediately familiar to anyone with experience of C-like languages:

enum Promises
{
    stdio,
    rpath,
    wpath,
    cpath,
    dpath
};

Then we'll write a function which validates a string to ensure it only contains space-separated entries from this enum:

bool validatePromises(string promises)
{
    import std.algorithm.iteration : splitter;
    import std.conv : parse, ConvException;

    auto promisesSplitter = promises.splitter(' ');

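    foreach (s; promisesSplitter)
    {
        try
        {
            parse!Promises(s);
        }
        catch (ConvException)
        {
            return false;
        }

        // parse only consumes a matching prefix and advances s past it, so
        // any leftover characters (as in "stdiofoo") mean an unknown promise.
        if (s.length != 0)
        {
            return false;
        }
    }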

    return true;
}

There are a few interesting aspects to this function:

  • Functions in D can have local imports — in this case, we import the splitter function from std.algorithm.iteration and the parse function and ConvException type from std.conv.
  • The splitter function is called using Uniform Function Call Syntax.
  • The splitter function returns a range that yields slices of the original input string.
  • We iterate over the returned range, trying to parse each slice as a member of the enum. If it fails, we return false to indicate the promise string is invalid.

Now we need to call this function at compile time from our promise function. We'll do this using a template. In D, functions can have two sets of parameters: compile-time parameters and run-time parameters. We'll pass the string as a compile-time parameter and check it with a static assert, which asserts a boolean condition at compile time.

void promise(string promises)()
{
    static assert(validatePromises(promises), "invalid promises: " ~ promises);

    import std.stdio : writeln;

    writeln("valid promises: ", promises);
}

To test the implementation, we'll use some conditional compilation so that we can confirm the validation really happens at compile time. Let's add a main function:

void main()
{
    version (invalidPromises)
    {
        promise!"stdio foobar";
    }
    else
    {
        promise!"stdio rpath wpath";
    }
}

We use a version condition to check a compile-time identifier and choose between a valid and an invalid string of promises. If we build with the invalidPromises version set, the build should fail.

Building and running the valid version

To build and run the version with the valid promises string, simply run dmd -run main.d (assuming you named the source file main.d). You should see some output like the following:

valid promises: stdio rpath wpath

Building and running the invalid version

To build and run the version with the invalid promises string, simply run dmd -version=invalidPromises -run main.d (assuming you named the source file main.d). You should see some output like the following:

main.d(34): Error: static assert:  "invalid promises: stdio foobar"
main.d(45):        instantiated from here: promise!"stdio foobar"

As you can see, the build failed and printed out the message we provided to the static assert call.

Approaching the problem in Nim

Nim is described as follows:

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula.

Much like the D version above, we'll begin by defining an enum and a function to verify a promises string:

from strutils import split, parseEnum

type Promises = enum
  stdio,
  rpath,
  wpath,
  cpath,
  dpath

proc validatePromises(promises: string): bool =
  result = true

  for s in promises.split(' '):
    try:
      discard parseEnum[Promises](s)
    except ValueError:
      return false

Interesting features to note here are:

  • Functions in Nim are declared with the proc keyword.
  • Nim has an implicit result variable to return a result from a function — in validatePromises we set this to true and if parsing fails we return false instead.
  • Nim requires you to explicitly ignore return values from function calls using discard — in this case we must ignore the parsed enum value.
  • The split function is called using Uniform Function Call Syntax.
  • The split function is actually an iterator that yields each substring of the original string in turn.
  • Generic functions are called with square brackets ([, ]) - parseEnum is a generic function where we specify the type of the enum to parse as Promises.

Again, we need to call this function at compile time. In Nim, we can do this using a doAssert within a static block. The static block forces its contents to be evaluated at compile time. We use doAssert rather than assert because assert can be switched off (for example with --assertions:off), whereas doAssert is always checked. The parameter to the function must also be declared as static[string], meaning its value must be known at compile time.

proc promise(promises: static[string]): void =
  static: doAssert(validatePromises(promises), "invalid promises: " & promises)

  echo "valid promises: ", promises

Testing the implementation is again done using some conditional compilation. Nim allows us to pass custom definitions during compilation, and we can check if a value is defined with the defined procedure.

when isMainModule:
  when defined(invalidPromises):
    promise "stdio foobar"
  else:
    promise "stdio rpath wpath"

If we define the invalidPromises compile time symbol, the promise procedure will be called with an invalid promise string.

An interesting note here: in Nim, functions can be called with or without parentheses! The call to the promise function can be written as promise "stdio foobar" or as promise("stdio foobar").

Building and running the valid version

To build and run the version with the valid promises string, simply run nim c --hints:off -r main.nim (assuming you named the source file main.nim). You should see some output like the following:

valid promises: stdio rpath wpath

Building and running the invalid version

To build and run the version with the invalid promises string, simply run nim c -d:invalidPromises --hints:off -r main.nim (assuming you named the source file main.nim). You should see some output like the following:

stack trace: (most recent call last)
.../main.nim(20, 11) promise
.../nim-1.2.0/lib/system/assertions.nim(29, 26) failedAssertImpl
.../nim-1.2.0/lib/system/assertions.nim(22, 11) raiseAssert
.../nim-1.2.0/lib/system/fatal.nim(49, 5) sysFatal
.../main.nim(26, 13) template/generic instantiation of `promise` from here
.../nim-1.2.0/lib/system/fatal.nim(49, 5) Error: unhandled exception: .../main.nim(20, 19) `validatePromises(promises)` invalid promises: stdio foobar [AssertionError]

As you can see, the build failed and printed out the message we provided to the doAssert call.

Approaching the problem in Zig

Zig is described as follows:

Zig is a general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

Much like the D and Nim implementations above, we'll begin by defining an enum and a function to verify a promises string:

const std = @import("std");
const tokenize = std.mem.tokenize;
const stringToEnum = std.meta.stringToEnum;
const warn = std.debug.warn;

const Promises = enum {
    stdio,
    rpath,
    wpath,
    cpath,
    dpath,
};

fn validatePromises(promises: []const u8) bool {
    var splitter = tokenize(promises, " ");

    while (splitter.next()) |s| {
        _ = stringToEnum(Promises, s) orelse return false;
    }

    return true;
}

Interesting features to note here are:

  • Zig has no specific string type — instead, you use byte slices ([]const u8 is a slice of constant byte values).
  • The call to tokenize returns an iterator struct whose next function returns an optional value. While items remain, next returns a non-null value, which the while loop unwraps into s for the loop body. Once the end of the input is reached, next returns null and the loop exits.
  • Zig requires you to explicitly ignore function return values. Here we do so by assigning the result of stringToEnum to the special _ variable.
  • Zig reports failure through return values rather than exceptions, and the language has some handy syntax for dealing with them. Here, stringToEnum returns an optional: if it is null (no enum member matched), orelse makes us immediately return false from the function.
  • Generics in Zig are implemented using the language’s comptime functionality.

As before, we'll call this function at compile time. Zig makes working with parameters and calling functions at compile time extremely easy using the language's comptime facilities.

fn promise(comptime promises: []const u8) void {
    comptime {
        if (!validatePromises(promises)) {
            @compileError("invalid promises: " ++ promises);
        }
    }

    warn("valid promises: {}", .{promises});
}

This code is fairly self-explanatory: it takes a string parameter named promises that must be known at compile time, runs validatePromises() at compile time, and emits a compiler error if validation fails.

We can once again test the implementation with some conditional compilation. In Zig, we do this by specifying a build argument in the build.zig for the project, then referencing it.

In the build.zig, we'll add an option called invalidPromises. My whole build.zig is quite small, so I'll just repeat it here:

const Builder = @import("std").build.Builder;

pub fn build(b: *Builder) void {
    const target = b.standardTargetOptions(.{});

    const mode = b.standardReleaseOptions();

    const invalidPromises = b.option(
        bool,
        "invalidPromises",
        "Set to true to use invalid promises",
    ) orelse false;

    const exe = b.addExecutable("app", "src/main.zig");
    exe.setTarget(target);
    exe.setBuildMode(mode);
    exe.install();

    exe.addBuildOption(bool, "invalidPromises", invalidPromises);

    const run_cmd = exe.run();
    run_cmd.step.dependOn(b.getInstallStep());

    const run_step = b.step("run", "Run the app");
    run_step.dependOn(&run_cmd.step);
}

Zig's build system is one of the language’s most interesting features in my opinion — the whole build can be configured using Zig code!

We can then use this option in our code:

const build_options = @import("build_options");

pub fn main() anyerror!void {
    if (build_options.invalidPromises) {
        promise("stdio foobar");
    } else {
        promise("stdio rpath wpath");
    }
}

Building and running the valid version

To build and run the version with the valid promise string, simply run zig build run. You should see some output like the following:

valid promises: stdio rpath wpath

Building and running the invalid version

To build and run the version with the invalid promise string, simply run zig build run -DinvalidPromises=true. You should see some output like the following:

./src/main.zig:28:13: error: invalid promises: stdio foobar
            @compileError("invalid promises: " ++ promises);
            ^
./src/main.zig:37:16: note: called from here
        promise("stdio foobar");
               ^
./src/main.zig:35:29: note: called from here
pub fn main() anyerror!void {
                            ^
app...The following command exited with error code 1:
/usr/local/bin/zig/zig build-exe /app/src/main.zig --pkg-begin build_options /app/zig-cache/app_build_options.zig --pkg-end --cache-dir /app/zig-cache --name app --cache on

Build failed. The following command failed:
/app/zig-cache/o/wMleWcfAyxi0-V7hinMQiA6MwCIf5CPSrsWSt18CnevnaWmCFqTjgWgQA91NzXzT/build /usr/local/bin/zig/zig /app /app/zig-cache run -DinvalidPromises=true

As you can see, the build failed and printed out the message we provided to the @compileError() call.

Other languages

During the course of writing this blog post, I had a look at some other languages to see if this kind of functionality was available elsewhere. I thought it may be useful to include some notes on that front here.

Rust

I don't have a great deal of experience with Rust, having only used it to write a couple of very minimal services. It seems like this may be possible using procedural macros, but there appear to be ongoing discussions about exactly how they should work, and the documentation seems a little slim at the moment. Unfortunately, I couldn't come up with a workable solution. Maybe you can? I'd love to see it!
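
That said, newer Rust releases may make this workable without procedural macros. What follows is only a minimal sketch, assuming Rust 1.57 or later (where panicking in const contexts is stable). Every name in it (PROMISES, is_known, validate_promises and the promise! macro) is hypothetical and chosen to mirror the examples above; none of them come from a library. The idea is to write the validator as a const fn and force its evaluation inside a const item, so that a failed assert! becomes a compile error:

```rust
// Hypothetical names throughout; this is a sketch, not a library API.
const PROMISES: [&str; 5] = ["stdio", "rpath", "wpath", "cpath", "dpath"];

/// Returns true if the bytes input[start..end] spell one of the known promises.
/// Written with index loops because iterators are not usable in a const fn.
const fn is_known(input: &[u8], start: usize, end: usize) -> bool {
    let mut k = 0;
    while k < PROMISES.len() {
        let name = PROMISES[k].as_bytes();
        if name.len() == end - start {
            let mut i = 0;
            while i < name.len() && name[i] == input[start + i] {
                i += 1;
            }
            if i == name.len() {
                return true;
            }
        }
        k += 1;
    }
    false
}

/// Returns true if every space-separated word in `s` is a known promise.
/// Empty words (an empty string or doubled spaces) are rejected.
const fn validate_promises(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut start = 0;
    let mut i = 0;
    while i <= bytes.len() {
        // A word ends at a space or at the end of the input.
        if i == bytes.len() || bytes[i] == b' ' {
            if !is_known(bytes, start, i) {
                return false;
            }
            start = i + 1;
        }
        i += 1;
    }
    true
}

macro_rules! promise {
    ($promises:expr) => {{
        // A const item must be evaluated by the compiler, so a failing
        // assert here stops the build instead of panicking at run time.
        const _: () = assert!(validate_promises($promises), "invalid promises");
        println!("valid promises: {}", $promises);
    }};
}

fn main() {
    promise!("stdio rpath wpath");
    // promise!("stdio foobar"); // uncommenting this should fail to compile
}
```

Uncommenting the second call should make the build fail, because the assertion is evaluated while compiling. One limitation of this sketch is that the error message is fixed: it doesn't attempt to format the offending string into the const panic message.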

Conclusion

Being able to harness powerful compile time code execution opens up a whole range of possibilities. Validating that a string only contains valid values is just the beginning; we could do so much more.

I've created a Git repository with all of my implementations, hooked up to CI to test that they all function as expected. I'd welcome any Pull Requests to either add other implementations or improve code!