Interestingly, Nim still has remnants of some sort of LLVM backend in its source (or at least it did for a while). Evidently, something didn't work out too well.
2015-05-23 18:20:17
There are the following potential downsides to compiling to C:
  1. There is no portable support in C for efficient integer overflow checks. However, both clang and gcc support non-portable efficient versions, and there's an open pull request to support those.
  2. Exceptions in C are slow due to the reliance on setjmp() and friends, both because setjmp-style register saving can be fairly slow and because it requires local variables to be declared as volatile. However, compilation to C++ does not suffer from this problem, and it could also be worked around by using libunwind.
  3. It is not possible for a garbage collector to scan the stack precisely without sacrificing performance. This is not an issue for all GC implementations; however, a copying or fully compacting GC cannot work without precise stack scans (because it may have to modify addresses on the stack). This is not possible to work around; however, neither is it necessary to have such a GC (one can still have a mostly compacting GC).
  4. There is no portable solution for coroutines or lightweight threads, though that is mostly an issue of having an API for the appropriate library. (As far as I know, LLVM does not currently have great support for this, either.)
  5. Neither gcc nor clang currently makes any guarantees regarding tail call optimizations. They do optimize them, but it's a "best effort" thing and nothing that you can control with 100% certainty, especially not across compilation units. While tail call optimization within a single function can be done at the C level via gotos, all known portable universal solutions (such as trampolines) may incur overhead. That's typically only an issue for functional and logical programming languages, though.
  6. C compilers provide you with very little control over calling conventions (e.g., passing or returning additional values in registers for performance).
  7. There is no portable way to specify or know the memory layout of arbitrary data structures (though, again, for specific C compilers, this information can be known or even controlled).
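
To make item 2 concrete, here is a minimal sketch of setjmp-style exception handling in plain C. It is not Nim's actual generated code; handler, raise_error and run_steps are made-up names. The point is the volatile qualifier: a local that is modified between setjmp() and longjmp() has an indeterminate value after the jump unless it is declared volatile, which also keeps it out of registers.

```c
#include <setjmp.h>

/* Hypothetical names throughout; this only illustrates the technique. */
static jmp_buf handler;             /* the innermost active "try" frame */

static void raise_error(int code)   /* "throw": jump back to the handler */
{
    longjmp(handler, code);         /* code must be nonzero */
}

/* Runs up to five steps and returns how many completed before an
   error was raised at step fail_at (if fail_at is in range). */
int run_steps(int fail_at)
{
    /* volatile is required here: the variable is modified after
       setjmp() and read after longjmp(). */
    volatile int completed = 0;

    if (setjmp(handler) == 0) {     /* "try" */
        for (int i = 0; i < 5; i++) {
            if (i == fail_at)
                raise_error(1);
            completed = completed + 1;
        }
    }
    /* Both the "except" path and the normal path end up here. */
    return completed;
}
```

Every local that has to survive the jump pays the same price, which is where much of the overhead comes from.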

There are other downsides to using C/C++ as a backend, but they mostly relate to the convenience of code generation/tooling (such as generating debugging information) and not to the quality of code, and, of course, there are plenty of countervailing convenience benefits you get from generating C.

Of the issues listed above, the only one that cannot be worked around in practice is that the stack layout is essentially opaque. From a code generation point of view, this affects the GC and any context switching mechanism such as coroutines or lightweight threads (and both only to an extent); it also affects tooling, such as anything that needs to access debugging information and profiling.
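
For the trampolines mentioned under item 5, a rough sketch of the idea (all names are hypothetical): instead of making a tail call, each function returns the next function to run, and a driver loop calls it. Stack depth stays constant no matter what the C compiler decides to optimize, at the cost of an indirect call per step.

```c
#include <stddef.h>

/* Hypothetical example: mutually recursive even/odd test. */
typedef struct state {
    unsigned n;      /* remaining count */
    int result;      /* 1 = even, 0 = odd; valid once the loop ends */
} state;

/* A function type cannot return itself, so the pointer is wrapped. */
typedef struct step step;
typedef step (*step_fn)(state *);
struct step { step_fn next; };

static step is_odd(state *s);

static step is_even(state *s)
{
    if (s->n == 0) { s->result = 1; return (step){ NULL }; }
    s->n--;
    return (step){ is_odd };        /* "tail call" to is_odd */
}

static step is_odd(state *s)
{
    if (s->n == 0) { s->result = 0; return (step){ NULL }; }
    s->n--;
    return (step){ is_even };       /* "tail call" to is_even */
}

/* The trampoline: keeps bouncing until no next step is returned. */
int parity_is_even(unsigned n)
{
    state s = { n, 0 };
    step cur = { is_even };
    while (cur.next != NULL)
        cur = cur.next(&s);
    return s.result;
}
```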

2015-05-23 20:20:32
Some of those issues don't apply to Nim, though. In particular, Nim uses deferred reference counting (solves 3).
2015-05-23 22:33:44

Evidently, something didn't work out too well.

Well we never finished it; generating LLVM is quite some work with not enough benefits to justify the effort.

2015-05-23 22:34:47
Jehan, you left out a major factor: you are subject to the C language rules on undefined behavior that allow aggressive indeterminate optimization by the compiler. See, e.g., https://news.ycombinator.com/item?id=9050999
2015-05-26 04:10:22
Compiling to C is an excellent choice in terms of practicality. C is almost a cross-platform assembly language and has incredibly efficient compilers now. Interop with the vast collection of libraries is easy. Of course, it's not the ideal intermediate language for compilation (but then, neither is LLVM). I think compilation to C is one of Nim's current strengths.
2015-05-26 04:48:58

jibal: Jehan, you left out a major factor

No, I didn't. That was included under "convenience of code generation". It does not affect what your generated code can do, but it can make it tedious (and sometimes inefficient, as in the case of int overflow checks) to generate safe code. Mind you, it can be a bloody pain in the neck.

Nim's code generation, most importantly, does not target the C (or C++) standard. While it has support for generating code for a fairly generic C compiler (--cc:ucc), it is generally assumed that code will be generated for a very specific compiler (such as gcc, clang, or vcc). Nim can turn off options selectively for these compilers and generate code specifically optimized for them. For example, Nim knows that gcc's switch statement supports case ranges and generates code for them (see TInfoCCProp in compiler/extccomp.nim for details). Nim generally assumes that code is being generated for one of the (numerous) approved backends. This means that the question is not whether we can generate memory-safe code for the C standard, but for clang/gcc/vcc.
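
As an illustration of the case range point, this is roughly what compiler-specific output versus generic output could look like. The classify function is made up and this is not actual Nim output; the `case lo ... hi:` syntax is a GNU C extension accepted by gcc and clang, not ISO C.

```c
/* Hypothetical classifier: 1 = digit, 2 = letter, 0 = anything else. */
#if defined(__GNUC__)               /* also defined by clang */
int classify(int c)
{
    switch (c) {
    case '0' ... '9': return 1;     /* GNU case range */
    case 'a' ... 'z':
    case 'A' ... 'Z': return 2;
    default:          return 0;
    }
}
#else
/* Fallback for a generic C compiler without case ranges. */
int classify(int c)
{
    if (c >= '0' && c <= '9') return 1;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 2;
    return 0;
}
#endif
```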

Note that you can fix pretty much any of these issues quickly by passing -fsanitize=undefined (or an appropriate subset) to gcc or clang. It will ensure that rather than being considered undefined behavior, such code is considered illegal and cannot possibly mislead the optimizer. This is not necessarily the most efficient way of doing it (see below), but it quickly fixes any concerns you may have.

Some notes on specific concerns:

  1. That shift widths are not being tested is probably an oversight rather than intentional (they should be considered under either range checks or overflow checks). Shifts are usually done by constant amounts, anyway, so almost all checks can be done by the compiler without runtime overhead. Note that there is no issue with signed shifts being undefined, because Nim does not have signed shifts (shr will do a logical right-shift even on ints).
  2. Signed integer overflow is tricky, not in that checking for it can't be done (in fact, it's already there), but in that overflow checks can be expensive; there's a pull request to integrate the clang/gcc builtins for efficient overflow checking into Nim. Right now, -fsanitize=undefined or -fsanitize=signed-integer-overflow will address this more cheaply; -fno-strict-overflow, -ftrapv and -fwrapv are also options you may want to consider. In particular, -fno-strict-overflow tells the compiler that it cannot assume that signed integer overflow is undefined but should assume that signed integer overflow is implementation-defined.
  3. Undefined behavior due to null pointer dereferences is generally a non-issue for gcc, where -O2 implies -fisolate-erroneous-paths-dereference (which makes null pointer dereferencing illegal rather than undefined), but clang will happily optimize them away unless you use -fsanitize=null or unless you generate C code that prevents this. However, generating nil checks is not really a problem.
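
For reference, the clang/gcc overflow builtins mentioned in note 2 look roughly like this when used from C. This is a sketch, not what the pull request does: checked_add is a made-up helper, and the preprocessor test assumes gcc 5 or later, or a clang recent enough to provide __builtin_add_overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores a + b in *out when the
   sum fits in an int. On failure the contents of *out should not be
   relied upon. */
bool checked_add(int a, int b, int *out)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
    /* The builtin returns true if the addition overflowed; it
       typically compiles down to an add plus a branch on the
       overflow flag. */
    return !__builtin_add_overflow(a, b, out);
#else
    /* Portable fallback: test before adding, so that no signed
       overflow (undefined behavior) can ever occur. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
#endif
}
```

The fallback branch shows why the portable version costs more: two comparisons and a subtraction before every addition.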

In practice, gcc and clang authors know that their compilers are being used as backends by other language implementations and accordingly provide mechanisms that facilitate this (just as they know that conservative garbage collectors are common and don't generate code that intentionally breaks them).

With respect to memory safety, I'm more concerned with how careless people are with using -d:release, which disables all checks (worse, it's hardcoded into nimble build and nimble install). Once you use -d:release, undefined behavior by C (which can only bite you because you had a bug somewhere) becomes the least of your concerns.

2015-05-26 06:14:55

generating LLVM is quite some work with not enough benefits to justify the effort.

It's hard to quantify the benefits without doing the experiment, so you have to rely on judgement. Just looking at Jehan's list, and looking at the experiment in a 'similar' language, D, it would seem reasonable to guess that there would be a performance boost in generated code from using LLVM directly rather than going through C to Clang, and that a native code compiler would also generate slower code than LLVM, though the compiler itself might be much faster, as with dmd vs ldc. LLVM is also an active and improving project, so I expect that current deficiencies will be addressed.

It seems that there are many more pressing issues and bugs to fix now, but I hope someone tries to couple Nim to LLVM again in the future.

2015-05-26 15:04:12
With respect to memory safety, I'm more concerned with how careless people are with using -d:release, which disables all checks (worse, it's hardcoded into nimble build and nimble install). Once you use -d:release, undefined behavior by C (which can only bite you because you had a bug somewhere) becomes the least of your concerns.

@Jehan, I will change this behaviour.

2015-05-26 19:32:02
@dom96 not sure why that's necessary. What -d:release means is determined by the configuration.
2015-05-26 19:36:26