Skip to content

Latest commit

 

History

History
125 lines (112 loc) · 4.82 KB

README.md

File metadata and controls

125 lines (112 loc) · 4.82 KB

Overview

Write high-level control flow => get correct optimized bit-twiddled parallel C code

Design

  • Embedded DSL
    • Host language: Scala
    • Tagless Final Interpreter
      • Evaluator, Tracer, Twiddle AST emitter
    • AST
    • Optimizers (twiddler/parallelizer) => adds annotations to AST
    • CodeGen module
    • Verifier module

DSL Overview

  • Primitives: integers, bit-vector
  • Constructs: coniditionals, for loops, while loops, do loops
  • Arithmetics: +, -, *, /, **
  • Bit-wise operators: |, &, ~, >>, <<, ^
  • Logical operators: &&, ||, ==, <, >, <=, =>, !=
  • Optimized builtins: log2, log10, sqrt, ceil, has_zero, byte_{eq,gt,lt,gte,lte}, signof, abs, min, max, count_bits_set, rev_bits, swap_bits, %, is_power_2, next_power_2
  • Optimizations on constructs: vectorize/unroll loops, high-level op -> bit-operation, branch -> bit-operations without branches

Example:

x = 20
y = 10
if( hasZero(x) )
    swapBits(y, log10(x))

should produce

#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))
#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
int x = 20;
int y = 10;
if( haszero(x) )
{
  unsigned int v = x;
  int r;
  r = (v >= 1000000000) ? 9 : (v >= 100000000) ? 8 : (v >= 10000000) ? 7 : 
      (v >= 1000000) ? 6 : (v >= 100000) ? 5 : (v >= 10000) ? 4 : 
      (v >= 1000) ? 3 : (v >= 100) ? 2 : (v >= 10) ? 1 : 0;
  SWAP(y, r)
}

Using the OpenMP annotation backend:

x = "Print me in parallel"
for(i = 0; i < 10000000000; i++)
 prints(x)

should produce

char* x = "Print me in parallel"
#pragma omp for
for(int i = 0; i < 10000000000; i++)
 printf("%s", x)

Outstanding Design Choices

  • Choose host language
    • Scala
  • Should generated code be portable? Compiler intrinsics/ifdefs/etc. or not
    • Yes
  • Should aggressiveness level be adjustable?
    • Yes. Perhaps different evaluator for aggressive optimizations
  • Choose codegen backend. E.g. LMS? lms-verify?
    • No more LMS. Documentation is virtually non-existent. C backend is not consistent with the C language. Maybe in the long term with a few pull requests to LMS.
  • Provide option to verify code? E.g. lms-verify
    • Not yet. Perhaps when supporting LMS style codegen
  • Parallelize C code where bit-hacks were not possible. E.g. OMP, Vectorization or provide pluggable parallelizers
    • Yes

Why no LMS?

  • Documentation for this use-case is hard to come by
  • The C and C++ backends are not well separated and require manual patchwork to be sound
  • Control over variable generation in the output code is not flexible (without significant complexity overhead)

TODO

  • Implement tagless interpreter for core object language (see language overview)
  • Add code generation facilities to core language
  • Add optimization facilities
  • Build out core library
  • OpenMP codegen for LMS?
  • Add verifier and extend
  • ScalaTest support Testsuite
  • Extend with LMS

Instructions

  1. Run LMS installation instructions
  2. sbt
  3. run
  4. Choose main entry point (either testsuite or scratch area)

or to run all tests immediately (and continuously):

  1. sbt ~"runMain twiddle.dsl.Main"

or run from the REPL:

  1. sbt ~"runMain twiddle.dsl.REPL"

Resources