Been a while - not much time for nlvm these days, but thanks to llvm some things are easy.

One such thing is to compile nim to wasm - all it takes is a few lines of code and off we go.

Basically, I followed https://gist.github.com/yurydelendik/4eeff8248aeb14ce763e, except that we'll use nlvm instead of clang. I'll assume you've set up nlvm and the stuff in that post (binaryen mainly).

Here's a simple nim example:

# exportc keeps the symbol name unmangled, so it shows up as plain adder
proc adder(i, j: int): int {.exportc.} =
  i + j

Swapping clang for nlvm, here are some interesting points from the nlvm command line that follows:

  • gc turned off - less generated code, and who likes garbage collectors anyway?
  • js defined for an even lighter version of system.nim - oh boy, does this file need cleanup - I'm sure there's a magic combination of defines that cuts this down even further
  • target is wasm32 - wasm64 seems to have some issues / doesn't work out of the box - don't know why

nlvm c --gc:none --deadcodeelim:on --checks:off --nlvm.target=wasm32 -d:js -c mini
llc mini.ll
$BINARYEN/bin/s2wasm mini.s > mini.wast

We end up with the following:

(module
 (table 0 anyfunc)
 (memory $0 1)
 (data (i32.const 16) "\00\00\00\00\00\00\00\00")
 (data (i32.const 40) "\00\00\00\00\00\00\00\00")
 (data (i32.const 48) "\00\00\00\00")
 (export "memory" (memory $0))
 (export ".nlvmInit" (func $.nlvmInit))
 (export "main" (func $main))
 (func $.nlvmInit (; 0 ;)
 )
 (func $adder (; 1 ;) (param $0 i64) (param $1 i64) (result i64)
  (local $2 i32)
  (set_local $2
   (i32.sub
    (i32.load offset=4
     (i32.const 0)
    )
    (i32.const 32)
   )
  )
  (i64.store offset=24
   (get_local $2)
   (get_local $0)
  )
  (i64.store offset=16
   (get_local $2)
   (get_local $1)
  )
  (i64.store offset=8
   (get_local $2)
   (i64.add
    (get_local $0)
    (get_local $1)
   )
  )
  (i64.load offset=8
   (get_local $2)
  )
 )
 (func $main (; 2 ;) (param $0 i32) (param $1 i32) (result i32)
  (call $.nlvmInit)
  (i32.const 0)
 )
)

There's a bit of junk in there because of globals from system.nim, but we can see adder in its full glory.

One reason for using llvm is to get access to its optimization pipeline. Here is the same build with -d:release added:

nlvm c --gc:none --deadcodeelim:on --checks:off --nlvm.target=wasm32 -d:js -c -d:release mini
llc mini.ll
$BINARYEN/bin/s2wasm mini.s > mini.wast

(module
 (table 0 anyfunc)
 (memory $0 1)
 (data (i32.const 16) "\00\00\00\00\00\00\00\00")
 (data (i32.const 24) "\00\00\00\00\00\00\00\00")
 (data (i32.const 32) "\00\00\00\00")
 (export "memory" (memory $0))
 (export ".nlvmInit" (func $.nlvmInit))
 (export "adder" (func $adder))
 (export "main" (func $main))
 (func $.nlvmInit (; 0 ;)
 )
 (func $adder (; 1 ;) (param $0 i64) (param $1 i64) (result i64)
  (i64.add
   (get_local $1)
   (get_local $0)
  )
 )
 (func $main (; 2 ;) (param $0 i32) (param $1 i32) (result i32)
  (i32.const 0)
 )
)
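
Much better! The stores to the stack are gone, adder is now a single i64.add, and main no longer calls .nlvmInit.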

Of course, this post is really a marketing ploy - to really support wasm, there's a bunch of work to do and it's great to see more people experimenting with it, like stisa: https://github.com/stisa/nwasm. As a prominent example, system.nim would probably require some even more convoluted when blocks, and either backend will benefit from those (so go on, submit some patches!).
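
To make the "when blocks" remark concrete, here is a minimal sketch of how a define selects code at compile time. The wasm symbol is purely an assumption for illustration (it would have to be passed as -d:wasm); the post does not say that system.nim supports it, and platformName is a made-up helper.

```nim
# Sketch only: `wasm` is a hypothetical define and `platformName` is a
# made-up helper; only `js` is a define mentioned in the post.
proc platformName(): string =
  when defined(js):
    result = "js"       # branch compiled when -d:js (or the JS backend) is used
  elif defined(wasm):
    result = "wasm"     # branch compiled only with the hypothetical -d:wasm
  else:
    result = "native"   # everything else

echo platformName()
```

Only the branch whose condition holds is compiled at all, which is why such blocks can cut down the generated code, and also why they pile up quickly in a file like system.nim.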

Another thing that's easy is debugging. Since the last update, nlvm has gained some nice debugging capabilities - here's what it can look like, with a slightly expanded example:

proc adder(i,j: int): int =
  let x = i+j
  let y = x + 5
  result = y

echo adder(5, 6)

Compile with debug info enabled, and enjoy a proper call stack and working breakpoints in gdb:

nlvm c --debuginfo mini
gdb ./mini

GNU gdb (GDB) Fedora 8.0.1-36.fc27

(gdb) break mini.nim:3
Breakpoint 1 at 0x410ad2: file mini.nim, line 3.
(gdb) run
Starting program: /home/arnetheduck/src/hello/mini
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, adder_vA9cx9aw6X6V333AMUevrmBA (i=5, j=6) at mini.nim:3
3	  let y = x + 5
(gdb) bt full
#0  adder_vA9cx9aw6X6V333AMUevrmBA (i=5, j=6) at mini.nim:3
        result = 0
        x = 11
        y = 0
#1  0x0000000000401260 in .nlvmInit () at /home/arnetheduck/src/nlvm/Nim/lib/system.nim:6
No locals.
#2  0x0000000000410b0c in main (argc=1, argv=0x7fffffffdcc8)
No locals.
(gdb) step
4	  result = y
(gdb)

Have fun!

2018-04-15 09:27:21
This is pretty nice! As I have said elsewhere, I think a good way forward is to introduce an AST-to-AST lowering step that is backend agnostic. Then compiler spin-offs like nlvm become easier to write and maintain. Right now the C(++) codegen contains a lot of logic (dealing with producing efficient GC write barriers) that needs to be replicated for every low-level backend. The JS codegen would also benefit from an explicit lowering step, though its lowering looks quite different. It's unclear whether we should combine these or keep them separate.
2018-04-15 11:48:22

re AST-to-AST, sgtm - especially if you can nail and simplify some of the assignment mess. Gut-feeling-wise I think I would try to keep a single one for all backends. Might have to leak some info back to the frontend though - just like in c and llvm, where some of the target info leaks back (int, pointer sizes etc), but overall it's probably reasonable, especially if the info leaked back can be part of an official API (a set of props like has_exceptions, has_string, etc?)
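
A rough sketch of what such a "set of props" could look like. Every name and value below is hypothetical: hasExceptions and hasString simply mirror the has_exceptions / has_string idea, and the two example targets are illustrative rather than a description of any real backend.

```nim
# Sketch only: all names and values are hypothetical illustrations of
# "a set of props" that a backend could report back to the frontend.
type
  BackendProps = object
    intBits: int         # size of Nim's int on the target
    pointerBits: int     # size of a pointer on the target
    hasExceptions: bool  # target can express raise/try directly
    hasString: bool      # target has a native string type

proc needsExceptionLowering(p: BackendProps): bool =
  # A shared lowering pass would only rewrite raise/try for backends
  # that cannot express them directly.
  not p.hasExceptions

let cLike = BackendProps(intBits: 64, pointerBits: 64,
                         hasExceptions: false, hasString: false)
let jsLike = BackendProps(intBits: 32, pointerBits: 32,
                          hasExceptions: true, hasString: true)

echo needsExceptionLowering(cLike)
echo needsExceptionLowering(jsLike)
```

The point of the sketch is only that the frontend asks a small, documented record instead of knowing which backend it is talking to.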

I would also really like it if it was painfully obvious in that lowered AST which nodes can actually appear, and which are guaranteed to be gone (i.e. will a for loop always be lowered to a while? then it should be evident from the API of the new AST).
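
One way to make that evident, sketched under the assumption that the lowered AST gets its own node type: the lowered kind enum simply has no for entry, so a backend cannot even write a case branch for it. All type and proc names here are invented for the example and have nothing to do with the real compiler AST.

```nim
# Sketch only: SrcNode, LowNode and lower are invented for illustration.
type
  SrcKind = enum
    skCall, skWhile, skFor
  SrcNode = ref object
    kind: SrcKind
    label: string
    kids: seq[SrcNode]

  LowKind = enum
    lkCall, lkWhile        # deliberately no lkFor
  LowNode = ref object
    kind: LowKind
    label: string
    kids: seq[LowNode]

proc lower(n: SrcNode): LowNode =
  # Lower the children first, then map the node itself.
  var kids: seq[LowNode] = @[]
  for k in n.kids:
    kids.add lower(k)
  case n.kind
  of skCall:
    result = LowNode(kind: lkCall, label: n.label, kids: kids)
  of skWhile:
    result = LowNode(kind: lkWhile, label: n.label, kids: kids)
  of skFor:
    # A real lowering would also emit iterator setup and advance code;
    # here only the node kind changes.
    result = LowNode(kind: lkWhile, label: n.label, kids: kids)

let body = SrcNode(kind: skCall, label: "echo i", kids: @[])
let loop = SrcNode(kind: skFor, label: "for i in 0..3", kids: @[body])
let lowered = lower(loop)
echo lowered.kind
```

A backend that consumes LowNode gets an exhaustive case over LowKind checked by the compiler, so "for is guaranteed to be gone" is a property of the types rather than a convention.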

2018-04-15 12:17:14
Nice! I wonder what your opinion on exceptions is. Is it possible to do the AST transformation in such a way that we don't have to call out to JS (like emscripten does), or at least call out to JS only when throwing but not when trying?
2018-04-15 18:05:40

Not sure actually.

In practice, I've only looked at setjmp and DWARF EH. https://stackoverflow.com/questions/15464891/is-it-possible-to-write-a-zero-cost-exception-handling-in-c doesn't quite seem to get there, but it has a comment saying that implementing C++ exceptions using the cfront approach, with C as the intermediary language, was considered "infeasible". I haven't read The Design and Evolution of C++ (though it looks interesting), but that's probably where I would start.

2018-04-16 02:24:47