A collection of out-of-tree Clang plugins for teaching and learning

banach-space/clang-tutor

clang-tutor

Example Clang plugins for C and C++ - based on Clang 19

clang-tutor is a collection of self-contained reference Clang plugins. It's a tutorial that targets novice and aspiring Clang developers. Key features:

  • Modern - based on the latest version of Clang (and updated with every release)
  • Complete - includes build scripts, LIT tests and CI set-up
  • Out of tree - builds against a binary Clang installation (no need to build Clang from sources)

Corrections and feedback always welcome!

Overview

Clang (together with LibTooling) provides a very powerful API and infrastructure for analysing and modifying source files from the C language family. With Clang's plugin framework one can relatively easily create bespoke tools that aid development and improve productivity. The aim of clang-tutor is to showcase this framework through small, self-contained and testable examples, implemented using idiomatic LLVM.

This document explains how to set up your environment, build and run the project, and go about debugging. The source files, apart from the code itself, contain comments that will guide you through the implementation. The tests highlight what edge cases are supported, so you may want to skim through them as well.

HelloWorld

The HelloWorld plugin from HelloWorld.cpp is a self-contained reference example. The corresponding CMakeLists.txt implements the minimum set-up for an out-of-tree plugin.

HelloWorld extracts some interesting information from the input translation unit. It visits all C++ record declarations (more specifically class, struct and union declarations) and counts them. Recall that a translation unit consists of the input source file and all the header files that it includes (directly or indirectly).

HelloWorld prints the results on a file-by-file basis, i.e. separately for every header file that has been included. It visits all declarations, including the ones in header files included by other header files. This may lead to some surprising results!
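The per-file bookkeeping can be illustrated without any Clang dependencies. The following is a minimal sketch in plain C++, not the plugin's actual code. VisitedRecord and countRecordsPerFile are hypothetical names, and the sketch assumes the file name of each declaration is already known. In a real plugin, that file name would typically be obtained through Clang's SourceManager.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a visited class/struct/union declaration. Only
// the file that contains the declaration matters for the statistics.
struct VisitedRecord {
  std::string Name;
  std::string File;
};

// Tally record declarations per file. std::map keeps the keys sorted, so the
// per-file report comes out in a deterministic order.
std::map<std::string, unsigned>
countRecordsPerFile(const std::vector<VisitedRecord> &Records) {
  std::map<std::string, unsigned> Counts;
  for (const VisitedRecord &R : Records) {
    assert(!R.File.empty() && "every declaration should map to a file");
    ++Counts[R.File];
  }
  return Counts;
}
```

For example, three records declared in main.cpp and one in header.h produce two entries: main.cpp with a count of 3 and header.h with a count of 1.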

You can build and run HelloWorld like this:

# Build the plugin
export Clang_DIR=<installation/dir/of/clang/19>
export CLANG_TUTOR_DIR=<source/dir/clang/tutor>
mkdir build
cd build
cmake -DCT_Clang_INSTALL_DIR=$Clang_DIR $CLANG_TUTOR_DIR/HelloWorld/
make
# Run the plugin (the library is libHelloWorld.so on Linux and libHelloWorld.dylib on macOS)
$Clang_DIR/bin/clang -cc1 -load ./libHelloWorld.so -plugin hello-world $CLANG_TUTOR_DIR/test/HelloWorld-basic.cpp

You should see the following output:

# Expected output
(clang-tutor) file: <source/dir/clang/tutor>/test/HelloWorld-basic.cpp
(clang-tutor)  count: 3

How To Analyze STL Headers

In order to see what happens with multiple indirectly included header files, you can run HelloWorld on one of the header files from the Standard Template Library. For example, you can use the following C++ file that simply includes the vector header:

// file.cpp
#include <vector>

When running a Clang plugin on a C++ file that includes headers from STL, it is easier to run it with clang++ (rather than clang -cc1) like this:

$Clang_DIR/bin/clang++ -c -Xclang -load -Xclang libHelloWorld.dylib -Xclang -plugin -Xclang hello-world file.cpp

This way you can be confident that all the necessary include paths (required to locate STL headers) are automatically added. For the above input file, HelloWorld will print:

  • an overview of all header files included when using #include <vector>, and
  • the number of C++ records declared in each.

Note that there are no explicit declarations in file.cpp and only one header file is included. However, the output on my system consists of 37 header files (one of which contains 371 declarations). The actual output depends on your host OS, the C++ standard library implementation and its version, so your results are likely to be different.

Development Environment

Platform Support And Requirements

clang-tutor has been tested on Ubuntu 20.04 and Mac OS X 10.14.6. In order to build clang-tutor you will need:

  • LLVM 19 and Clang 19
  • C++ compiler that supports C++17
  • CMake 3.13.4 or higher

As Clang is a subproject within llvm-project, it depends on LLVM (i.e. clang-tutor requires development packages for both Clang and LLVM).

There are additional requirements for tests (FileCheck is installed together with LLVM 19, but lit has to be installed separately, see Building & Testing):

  • lit (aka llvm-lit, LLVM tool for executing the tests)
  • FileCheck (LIT requirement, it's used to check whether tests generate the expected output)

Installing Clang 19 On Mac OS X

On Darwin you can install Clang 19 and LLVM 19 with Homebrew:

brew install llvm

If you already have an older version of Clang and LLVM installed, you can upgrade it to Clang 19 and LLVM 19 like this:

brew upgrade llvm

Once the installation (or upgrade) is complete, all the required header files, libraries and tools will be located in /usr/local/opt/llvm/ (on Apple Silicon Macs, Homebrew uses /opt/homebrew/opt/llvm/ instead).

Installing Clang 19 On Ubuntu

On Ubuntu Jammy Jellyfish, you can install modern LLVM from the official repository:

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main"
sudo apt-get update
sudo apt-get install -y llvm-19 llvm-19-dev libllvm19 llvm-19-tools clang-19 libclang-common-19-dev libclang-19-dev libmlir-19 libmlir-19-dev

This will install all the required header files, libraries and tools in /usr/lib/llvm-19/.

Building Clang 19 From Sources

Building from sources can be slow and tricky to debug. It is not necessary, but might be your preferred way of obtaining LLVM/Clang 19. The following steps will work on Linux and Mac OS X:

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout release/19.x
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" <llvm-project/root/dir>/llvm/
cmake --build .

For more details read the official documentation.

Note for macOS users

As per this great description by Arthur O’Dwyer, add -DDEFAULT_SYSROOT="$(xcrun --show-sdk-path)" to your CMake invocation when building Clang from sources. Otherwise, clang won't be able to find the standard C headers (e.g. wchar.h).

Building & Testing

You can build clang-tutor (and all the provided plugins) as follows:

cd <build/dir>
cmake -DCT_Clang_INSTALL_DIR=<installation/dir/of/clang/19> <source/dir/clang-tutor>
make

The CT_Clang_INSTALL_DIR variable should be set to the root of either the installation or build directory of Clang 19. It is used to locate the corresponding LLVMConfig.cmake script that is used to set the include and library paths.

In order to run the tests, you need to install llvm-lit (aka lit). It's not bundled with LLVM 19 packages, but you can install it with pip:

# Install lit - note that this installs lit globally
pip install lit

Running the tests is as simple as:

$ lit <build_dir>/test

Voilà! You should see all tests passing.

Overview of The Plugins

This table contains a summary of the examples available in clang-tutor. The Framework column refers to a plugin framework available in Clang that was used to implement the corresponding example. This is either RecursiveASTVisitor, ASTMatcher or both.

Name | Description | Framework
HelloWorld | counts the number of class, struct and union declarations in the input translation unit | RecursiveASTVisitor
LACommenter | adds comments to literal arguments in function calls | ASTMatcher
CodeStyleChecker | issues a warning if the input file does not follow one of LLVM's coding style guidelines | RecursiveASTVisitor
Obfuscator | obfuscates integer addition and subtraction | ASTMatcher
UnusedForLoopVar | issues a warning if a for-loop variable is not used | RecursiveASTVisitor + ASTMatcher
CodeRefactor | renames class/struct methods | ASTMatcher

Once you've built this project, you can experiment with every plugin separately. All of them accept C and C++ files as input. Below you will find more detailed descriptions (except for HelloWorld, which is documented above).

LACommenter

The LACommenter (Literal Argument Commenter) plugin will comment literal arguments in function calls. For example, in the following input code:

extern void foo(int some_arg);

void bar() {
  foo(123);
}

LACommenter will decorate the invocation of foo as follows:

extern void foo(int some_arg);

void bar() {
  foo(/*some_arg=*/123);
}

This commenting style follows LLVM's official guidelines. LACommenter will comment character, integer, floating-point, boolean and string literal arguments.

This plugin is based on a similar example by Peter Smith presented here.
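The string manipulation behind this rewrite can be sketched in plain C++. This is a simplified model under stated assumptions, not the plugin's implementation. CallArg, commentArg and rewriteArgs are hypothetical names. The sketch assumes each argument has already been classified as literal or not, whereas the plugin relies on Clang's AST for that, and it does not model variadic calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one call argument: its source text and whether it is
// a literal (character, integer, floating-point, boolean or string).
struct CallArg {
  std::string Text;
  bool IsLiteral;
};

// Prefix a single argument with an LLVM-style "/*ParamName=*/" comment.
std::string commentArg(const std::string &ParamName, const std::string &Arg) {
  return "/*" + ParamName + "=*/" + Arg;
}

// Rebuild the argument list of a call. Only literal arguments whose
// parameter has a name are commented; everything else is copied verbatim.
std::string rewriteArgs(const std::vector<std::string> &ParamNames,
                        const std::vector<CallArg> &Args) {
  assert(ParamNames.size() == Args.size() && "variadic calls not modelled");
  std::string Result;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (I != 0)
      Result += ", ";
    const bool Comment = Args[I].IsLiteral && !ParamNames[I].empty();
    Result += Comment ? commentArg(ParamNames[I], Args[I].Text) : Args[I].Text;
  }
  return Result;
}
```

With this sketch, rewriteArgs({"some_arg"}, {{"123", true}}) returns /*some_arg=*/123, which matches the decorated call to foo above.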

Run the plugin

You can test LACommenter on the example presented above. Assuming that it was saved in input_file.cpp, you can add comments to it as follows:

$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libLACommenter.dylib -plugin LAC input_file.cpp

Run the plugin through ct-la-commenter

ct-la-commenter is a standalone tool that runs the LACommenter plugin without the need to invoke clang and load the plugin:

<build_dir>/bin/ct-la-commenter input_file.cpp --

If you don't append -- at the end of the tool invocation, Clang tools will complain about a missing compilation database as follows:

Error while trying to load a compilation database:
Could not auto-detect compilation database for file "input_file.cpp"
No compilation database found in <source/dir/clang-tutor> or any parent directory
fixed-compilation-database: Error while opening fixed database: No such file or directory
json-compilation-database: Error while opening JSON database: No such file or directory
Running without flags.

Another way to solve the issue is to set the CMAKE_EXPORT_COMPILE_COMMANDS flag during the CMake invocation. This generates a compilation database named compile_commands.json in your build directory. A more detailed explanation can be found on Eli Bendersky's blog.

CodeStyleChecker

This plugin demonstrates how to use Clang's DiagnosticsEngine to generate custom compiler warnings. Essentially, CodeStyleChecker checks whether the names of classes, functions and variables in the input translation unit adhere to LLVM's style guide. If not, a warning is printed. For every warning, CodeStyleChecker generates a suggestion that would fix the corresponding issue. This is done with the FixItHint API, while the SourceLocation API is used to generate valid source locations.

CodeStyleChecker is robust enough to cope with complex examples like the vector header from the STL, yet the actual implementation is fairly compact. For example, it can correctly analyze names expanded from macros and knows that it should ignore user-defined conversion operators.

Run the plugin

Let's test CodeStyleChecker on the following file:

// file.cpp
class clangTutor_BadName;

The name of the class doesn't follow LLVM's coding guide and CodeStyleChecker indeed captures that:

$Clang_DIR/bin/clang -cc1 -fcolor-diagnostics -load libCodeStyleChecker.dylib -plugin CSC file.cpp
file.cpp:2:7: warning: Type and variable names should start with upper-case letter
class clangTutor_BadName;
      ^~~~~~~~~~~~~~~~~~~
      ClangTutor_BadName
file.cpp:2:17: warning: `_` in names is not allowed
class clangTutor_BadName;
      ~~~~~~~~~~^~~~~~~~~
      clangTutorBadName
2 warnings generated.

Two warnings are generated because two rules have been violated. Alongside every warning, a suggestion (i.e. a FixItHint) is printed that would make the corresponding warning go away. Note that CodeStyleChecker also supplements the warnings with correct source location information.

-fcolor-diagnostics above instructs Clang to generate color output (unfortunately Markdown doesn't render the colors here).
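The two rules violated above can be modelled with plain string operations, along with the fixes suggested under each warning. The sketch below is a hypothetical simplification: fixLeadingLowerCase and fixUnderscores are not names from the plugin. It only shows how each replacement string could be computed. In CodeStyleChecker, the suggestion is attached to the warning as a FixItHint.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Rule 1: type and variable names should start with an upper-case letter.
// Returns the suggested replacement, or std::nullopt if the name is fine.
std::optional<std::string> fixLeadingLowerCase(const std::string &Name) {
  assert(!Name.empty() && "only named declarations are checked");
  if (!std::islower(static_cast<unsigned char>(Name[0])))
    return std::nullopt;
  std::string Fixed = Name;
  Fixed[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(Fixed[0])));
  return Fixed;
}

// Rule 2: `_` is not allowed in names. The suggestion simply drops every
// underscore. Returns std::nullopt if the name contains no underscore.
std::optional<std::string> fixUnderscores(const std::string &Name) {
  if (Name.find('_') == std::string::npos)
    return std::nullopt;
  std::string Fixed;
  for (char C : Name)
    if (C != '_')
      Fixed += C;
  return Fixed;
}
```

Applied to clangTutor_BadName, the two functions return ClangTutor_BadName and clangTutorBadName respectively. These are the same suggestions as in the output above.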

Run the plugin through ct-code-style-checker

ct-code-style-checker is a standalone tool that runs the CodeStyleChecker plugin without the need to invoke clang and load the plugin:

<build_dir>/bin/ct-code-style-checker input_file.cpp --

Obfuscator

The Obfuscator plugin will rewrite integer addition and subtraction according to the following formulae:

a + b == (a ^ b) + 2 * (a & b)
a - b == (a + ~b) + 1

The above transformations are often used in code obfuscation. You may also know them from Hacker's Delight.

The plugin runs twice over the input file. First it scans for integer additions. If any are found, the input file is updated and printed to stdout. If there are no integer additions, there is no output. Similar logic is implemented for integer subtraction.

Similar code transformations are possible at the LLVM IR level. In particular, see MBASub and MBAAdd in llvm-tutor.
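Both formulae can be sanity-checked in plain C++. The names obfuscatedAdd, obfuscatedSub and checkIdentities are hypothetical. This sketch deliberately uses 32-bit unsigned integers, because unsigned arithmetic wraps around modulo 2^32, which matches two's complement behaviour. By contrast, overflow of a signed intermediate such as 2 * (a & b) would be undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// a + b == (a ^ b) + 2 * (a & b): XOR adds the bits without carries, while
// A & B marks the positions that produce a carry, each worth twice its bit.
std::uint32_t obfuscatedAdd(std::uint32_t A, std::uint32_t B) {
  return (A ^ B) + 2 * (A & B);
}

// a - b == (a + ~b) + 1: in two's complement ~B == -B - 1.
std::uint32_t obfuscatedSub(std::uint32_t A, std::uint32_t B) {
  return (A + ~B) + 1;
}

// Compare the rewrites with the native operators on a handful of inputs,
// including values at which the arithmetic wraps around.
void checkIdentities() {
  const std::uint32_t Samples[] = {0u, 1u, 2u, 7u, 0x7FFFFFFFu,
                                   0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t A : Samples)
    for (std::uint32_t B : Samples) {
      assert(obfuscatedAdd(A, B) == A + B);
      assert(obfuscatedSub(A, B) == A - B);
    }
}
```

For instance, obfuscatedAdd(5u, 3u) computes (5 ^ 3) + 2 * (5 & 3) = 6 + 2 = 8.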

Run the plugin

Let's use the following file as our input:

int foo(int a, int b) {
  return a + b;
}

You can run the plugin like this:

$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libObfuscator.dylib -plugin Obfuscator input.cpp

You should see the following output on your screen.

int foo(int a, int b) {
  return (a ^ b) + 2 * (a & b);
}

UnusedForLoopVar

This plugin detects unused for-loop variables (more specifically, the variables defined inside the traditional and range-based for loop statements) and issues a warning when one is found. For example, in function foo the loop variable j is not used:

int foo(int var_a) {
  for (int j = 0; j < 10; j++)
    var_a++;

  return var_a;
}

UnusedForLoopVar will warn you about it. Clearly the for loop in this case can be replaced with var_a += 10;, so UnusedForLoopVar does a great job of drawing the developer's attention to it. It can also detect unused loop variables in range-based for loops, for example:

#include <vector>

int bar(std::vector<int> var_a) {
  int var_b = 10;
  for (auto some_integer: var_a)
    var_b++;

  return var_b;
}

In this case, some_integer is not used and UnusedForLoopVar will highlight it. The loop could be replaced with a much simpler expression: var_b += var_a.size();.

Unused loop variables may indicate an issue, a potential optimisation (e.g. unrolling the loop) or a simplification (e.g. replacing the loop with one arithmetic operation). However, that does not have to be the case and sometimes we have good reasons not to use the loop variable. If the name of a loop variable matches the regular expression [U|u][N|n][U|u][S|s][E|e][D|d] (i.e. "unused" spelled with any mix of upper- and lower-case letters), UnusedForLoopVar will ignore it. For example, the following modified version of the first example will not be reported:

int foo(int var_a) {
  for (int unused = 0; unused < 10; unused++)
    var_a++;

  return var_a;
}
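The name-based exemption can be sketched with std::regex. The helper isDeliberatelyUnused is hypothetical (it is not from the plugin) and reuses the pattern quoted above. It assumes the pattern may match anywhere in the name, so it calls std::regex_search. Use std::regex_match instead if only whole names should match.

```cpp
#include <regex>
#include <string>

// Returns true if a loop variable name opts out of the "not used" warning.
// Assumption: substring semantics, so "unused_idx" is exempt as well.
bool isDeliberatelyUnused(const std::string &Name) {
  static const std::regex Pattern("[U|u][N|n][U|u][S|s][E|e][D|d]");
  return std::regex_search(Name, Pattern);
}
```

Inside square brackets, | is an ordinary character, so [U|u] is simply the set {U, |, u}. The pattern therefore behaves like a case-insensitive "unused" for all practical loop variable names.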

UnusedForLoopVar mixes both the ASTMatcher and RecursiveASTVisitor frameworks. It is an example of how to leverage both of them to solve a slightly more complex problem. The generated warnings are labelled so that you can see which framework was used to capture a particular case of an unused for-loop variable. For example, for the first example above you will get the following warning:

warning: (Recursive AST Visitor) regular for-loop variable not used

The second example leads to the following warning:

warning: (AST Matcher) range for-loop variable not used

Reading the source code should help you understand why different frameworks are needed in different cases. I have also added a few test files that you can use as reference examples (e.g. UnusedForLoopVar_regular_loop.cpp).

Run the plugin

$Clang_DIR/bin/clang -cc1 -fcolor-diagnostics -load <build_dir>/lib/libUnusedForLoopVar.dylib -plugin UFLV input.cpp

CodeRefactor

This plugin will rename a specified member method in a class (or a struct) and in all classes derived from it. It will also update all call sites in which the method is used so that the code remains semantically correct.

The following example contains all cases supported by CodeRefactor.

// file.cpp
struct Base {
    virtual void foo() {};
};

struct Derived: public Base {
    void foo() override {};
};

void StaticDispatch() {
  Base B;
  Derived D;

  B.foo();
  D.foo();
}

void DynamicDispatch() {
  Base *B = new Base();
  Derived *D = new Derived();

  B->foo();
  D->foo();
}

We will use CodeRefactor to rename Base::foo to Base::bar. Note that this consists of two steps:

  • update the declaration and the definition of foo in the base class (i.e. Base) as well as in all the derived classes (i.e. Derived)
  • update all call sites that use static dispatch (e.g. B.foo()) and dynamic dispatch (e.g. B->foo()).

CodeRefactor will do all this refactoring for you! See below how to run it.

The implementation of CodeRefactor is rather straightforward, but it can only operate on one file at a time. clang-rename is much more powerful in this respect.

Run the plugin

CodeRefactor requires 3 command line arguments: -class-name, -old-name, -new-name. Hopefully these are self-explanatory. Passing the arguments to the plugin is a bit cumbersome and probably best demonstrated with an example:

$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libCodeRefactor.dylib -plugin CodeRefactor -plugin-arg-CodeRefactor -class-name -plugin-arg-CodeRefactor Base  -plugin-arg-CodeRefactor -old-name -plugin-arg-CodeRefactor foo  -plugin-arg-CodeRefactor -new-name -plugin-arg-CodeRefactor bar file.cpp

It is much easier when you run the plugin through a stand-alone tool like ct-code-refactor!
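Each -plugin-arg-CodeRefactor X pair in the clang invocation above passes the single string X to the plugin. Clang collects these strings into the std::vector<std::string> that it hands to PluginASTAction::ParseArgs. The sketch below shows one possible way of turning that vector into the three required options. It is a plain C++ illustration, not CodeRefactor's code, and RefactorOptions and parseArgs are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the three options CodeRefactor needs.
struct RefactorOptions {
  std::string ClassName;
  std::string OldName;
  std::string NewName;
};

// The plugin arguments arrive as flag/value pairs, e.g.
// {"-class-name", "Base", "-old-name", "foo", "-new-name", "bar"}.
// Returns std::nullopt on malformed input.
std::optional<RefactorOptions> parseArgs(const std::vector<std::string> &Args) {
  RefactorOptions Opts;
  for (std::size_t I = 0; I < Args.size(); I += 2) {
    if (I + 1 >= Args.size())
      return std::nullopt; // flag without a value
    const std::string &Flag = Args[I];
    const std::string &Value = Args[I + 1];
    if (Flag == "-class-name")
      Opts.ClassName = Value;
    else if (Flag == "-old-name")
      Opts.OldName = Value;
    else if (Flag == "-new-name")
      Opts.NewName = Value;
    else
      return std::nullopt; // unknown flag
  }
  if (Opts.ClassName.empty() || Opts.OldName.empty() || Opts.NewName.empty())
    return std::nullopt; // all three options are required
  return Opts;
}
```

For the clang invocation above, this sketch yields ClassName Base, OldName foo and NewName bar. Keeping the validation in one place means the plugin can bail out early with a clear error instead of running with partial options.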

Run the plugin through ct-code-refactor

ct-code-refactor is a standalone tool that is basically a wrapper for CodeRefactor. You can use it to refactor your input file as follows:

<build_dir>/bin/ct-code-refactor --class-name=Base --new-name=bar --old-name=foo file.cpp  --

ct-code-refactor uses LLVM's CommandLine 2.0 library for parsing command line arguments. It is very well documented, relatively easy to integrate and the end result is a very intuitive interface.

References

Below is a list of clang resources available outside the official online documentation that I have found very helpful.

License

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org/