Skip to content

ddcovery/expressive_sort

Repository files navigation

expressive sort

In this quora answere I wrote about javascript/python expressiveness vs Go/Rust/D/... performance.

As an example, I mentioned the "3 lines" haskell quick sort and wrote this javascript version

const sorted = ([pivot, ...others]) => 
  pivot === void 0 ? [] : [
    ...sorted(others.filter(n => n < pivot)),
    pivot,
    ...sorted(others.filter(n => n >= pivot))
  ];

This, of course, is not a "quick sort" because the original one is an "in place" algorithm that doesn't require additional memory space allocation. This is a functional oriented expression that exemplarizes how expressive a "functional" orientation can be (You "express" that the sorted version of an array is, given one of it's elements, the sorted version of the smaller ones, plus the item, plus the sorted version of the bigger ones).

As an enthusiastic newbie to the "D" programming language, I thought that D could affort this expressiveness too...

D has no support for destructuring as javascript has (remember de sorted([pivot, ...others])), but it has lambdas, map/filter/reduce support, array slices and array concatenation that allows you to write easily a similar expression:

T[] sorted(T)(T[] xs)
{
  return xs.length == 0 ? [] : 
    xs[1..$].filter!(x=> x < xs[0]).array.sorted ~ 
    xs[0..1] ~ 
    xs[1..$].filter!(x=> x >= xs[0]).array.sorted;
}

note: D notation allows to write foo(a) as a.foo() or a.foo, this is the reason we can write sorted( array( something ) ) as something.array.sorted

note: sorted is a templated method (T is the type of the elements of the array): "under the scenes", D compiler detects if the final used type is comparable (i.e.: it is a class with a opCmp method, or it is a numerical type, or ...): yo don't need to tell that T extends something like "IComparable" because D libraries are not "interface" based: D prefers to use "conventions" and check them using instrospection at compile time (D developers write compile-time code and run-time code at the same time: D allows you to mix them naturally).

Seeing the similarities, I assume (I really don't know) that javascript and D versions are doing the same "under the scenes":

  • The [...array1, ...array2, ...array3] javascript is equivalent to the array1 ~ array2 ~ array3 D code. That is, a new array is being generated as a result of copying the elements of the original 3.
  • The .filter!(...).array D code is using a "Range" to filter the elements and the ".array()" method to materialize the selected elements as an array. Internally, it is similar to the javascript code where .filter(...) iterates and selects the resulting elements and finally materializes the array

Wich one will perform better?

First surprise was Javascript (nodejs) version performs better than D (about 20% faster for 1_000_000 random Float64 numbers).

  • Javascript (node): 1610 ms
  • D (DMD compiler): 2017 ms

Fortunately, D has 3 compilers: DMD (official reference compiler), GDC (GCC based compiler) and LDC (LLVM based compiler).

  • D (LDC compiler): 693 ms !!!

That's speed :-)

After some tests, I realized javascript destructuring [pivot,...others] caused a performance penalty and I rewrote to a more efficient version (very similar to D syntax)

const sorted = (xs) => 
  xs.length===0 ? [] : [].concat(
    sorted(xs.slice(1).filter(x => x < xs[0])),
    xs[0],
    sorted(xs.slice(1).filter(x => x >= xs[0]))
  );

And the execution on 1 million floats reduced to 921 ms!!! That's impressive :-O

I decided to write similar code in other languajes to compare.

In Python:

def sorted(xs):
  return [] if len(xs) == 0 else \
    sorted([x for x in xs[1:] if x < xs[0]]) + \
    xs[0:1] + \
    sorted([x for x in xs[1:] if x >= xs[0]])

In Crystal:

def sorted(xs : Array(Float64)) : Array(Float64)
  return xs.size == 0 ? [] of Float64  :
    sorted(x[1..].select { |x| x < xs[0] }) +
    [ xs[0] ] +
    sorted(xs[1..].select { |x| x >= xs[0] })
end

In Julia (thanks to Camilo Chacón Sartori)

function sorted(xs::Array{Float64, 1})::Array{Float64, 1}
  return length(xs) == 0 ? Array{Float64, 1}() : vcat(
    sorted([x for x in xs[2:end] if x < xs[1]]),
    xs[1],
    sorted([x for x in xs[2:end] if x >= xs[1]])
  )
end

In Scala

def sorted( xs:List[Double] ): List[Double] =
  xs match {
    case Nil =>
      List()
    case pivot :: rest =>
      sorted( rest.filter( _ < pivot ) ) ++
      List(pivot) ++
      sorted( rest.filter( _ >= pivot ) )
  }

In Nim

func sorted[T](xs:seq[T]): seq[T] = 
  if xs.len==0: @[] else: concat(
    xs[1..^1].filter(x=>x<xs[0]).sorted,
    @[xs[0]],
    xs[1..^1].filter(x=>x>=xs[0]).sorted
  )

For better measurement:

  • each test is internally ran 5 times, each time with a newly generated set of numbers (to avoid run-to-run optimization effects).
  • Between compilers (or interpreters) tests, there is a pause of 30s to normalize system (Mainly CPU temperature)

The results, as CSV, are

compiler,lang,size,ms
"ldc2","D",1000000,645
"ldc2","D",1500000,978
"ldc2","D",3000000,2013
"ldc2","D",6000000,4165
"crystal","Crystal",1000000,713
"crystal","Crystal",1500000,1102
"crystal","Crystal",3000000,2277
"crystal","Crystal",6000000,4750
"node","Javascript",1000000,904
"node","Javascript",1500000,1384
"node","Javascript",3000000,2977
"node","Javascript",6000000,6117
"nim","Nim",1000000,1038
"nim","Nim",1500000,1564
"nim","Nim",3000000,3352
"nim","Nim",6000000,6951
"scala","Scala",1000000,1524
"scala","Scala",1500000,2537
"scala","Scala",3000000,5741
"scala","Scala",6000000,19819
"dmd","D",1000000,1818
"dmd","D",1500000,2773
"dmd","D",3000000,5822
"dmd","D",6000000,12123
"julia","julia",1000000,3224
"julia","julia",1500000,4648
"julia","julia",3000000,9429
"julia","julia",6000000,20264
"python3","Python",1000000,4512
"python3","Python",1500000,7339
"python3","Python",3000000,17428
"python3","Python",6000000,38981

Execution time histogram by array size:

Process time

Changing to logarithmic scale

Process time

Do you know how to improve?

I include the code to the 7 tests. Please, tell me if you see something we can improve:

  • Avoid imperative instructions: "sorted" must be an unique expression or, at least, an unique "return ..." statement funcion.
  • Of course, you can't use built-in library sort methods :-)
  • Remember that this is not a quick-sort performance test (that, obviously, can be implemented in a more efficient way)

Running the tests

Prerequisites

All tests have been executed on a Ubuntu 20.04 linux.

Tests require Nodejs, Python3, Julia, DMD/LDC for D, scala, Nim and Crystal compilers

  • Nodejs: I use node 12.20 (see NodeSource distributions for more information)
  • Python3: Ubuntu comes with python 3 preinstalled.
  • Julia: It is available in apt and snap repositories. The apt version is newest (but not latest). See download page for newest versions
$ sudo apt install julia
  • DMD and LDC: They are available in apt and snap repositories (see guide ) . Because ldc version is newest in snap, I use snap repos:
$ snap install dmd --classic
$ snap install ldc2 --classic
  • scala 2.11.12: It is availabe in apt repository
$ sudo apt install scala
  • Crystal: It is availabe in apt and snap repositories (see guide for more information )

Running

You can run all tests using test.sh

$ ./test.sh 

If no want to generate a result.csv file

$ ./test.sh | tee result.csv

If you prefer to run tests individually:

  • D (LDC)
$ ldc2 -O5 -release -enable-cross-module-inlining --run sorted.d 
  • D (DMD)
$ dmd -O -release -run sorted.d
  • Javascript
$ node sorted.js
  • Crystal
$ crystal run sorted.cr --release
  • Python
$ python3 -OO sorted.py
  • Julia
julia -O3 --inline=yes --check-bounds=no sorted.jl

thanks to