Yesterday evening I saw the post "Slow performance compared to C++, ideas?" on the D forum and wondered how Nimrod performs on this code.

My translation to nimrod is here

On my laptop it performs quite well against the C++ code:

Nimrod: 450 ms
C++: 690 ms

I have no working D compiler installed anymore, but I think the result for Nimrod isn't too bad.

2013-06-13 14:43:05

Wow, this is very cool.

I'm actually surprised that Nimrod outperforms C++; I wonder why that is.

I have a question about the output, this is what it looks like on my computer (amazing that this is only ~250 lines):

Why are the reflections of the red and dark blue balls shown on the silver ball? Is it because the silver ball is transparent?

2013-06-13 18:36:17

I was surprised too that Nimrod was faster. Maybe it is because I implemented Vec3 as a simple array rather than a class and statically unrolled all the operations on it. I also hoisted an invariant operation out of the loop (and there is another one which I have overlooked).

I would like to port the code to use SIMD. Does anyone have an example of how to do SIMD with Nimrod?
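For readers who want to see what those two ideas look like, here is a rough sketch in C. It is not the Nimrod code from the post; the names `Vec3`, `vec3_sub`, `vec3_dot` and `column_coords` are made up for illustration, and the pixel-coordinate loop is only an assumed stand-in for whatever loop the invariant was hoisted from.

```c
/* Illustrative only: a Vec3 stored as a bare array, with the
   component-wise operations unrolled by hand instead of looped. */
typedef float Vec3[3];

/* out = a - b, one statement per component. */
void vec3_sub(Vec3 out, const Vec3 a, const Vec3 b) {
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
}

/* Dot product, again fully unrolled. */
float vec3_dot(const Vec3 a, const Vec3 b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Hypothetical helper: normalised x coordinate (-1..1) of the centre
   of every pixel column. The reciprocal of the width does not change
   inside the loop, so it is computed once before it. */
void column_coords(float *xs, int width) {
    const float inv_width = 1.0f / (float)width; /* hoisted invariant */
    for (int i = 0; i < width; ++i) {
        xs[i] = 2.0f * ((float)i + 0.5f) * inv_width - 1.0f;
    }
}
```

A good optimizer will often do both transformations on its own, so whether writing them by hand helps is something to measure rather than assume.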

And yes the silver sphere is 80% transparent.

2013-06-14 05:58:56

I've seen GCC's optimizer generate SIMD often enough with code emitted by Nimrod. I've also seen native Nimrod code outperform C code that explicitly used SIMD intrinsics. So my advice is to check the generated assembly; if it doesn't use SIMD already, you can try Visual C++ or Intel C++, or play around with your coding style to make it emit SIMD. Afaik we have no wrapper yet for the SIMD intrinsics.

2013-06-14 07:56:46

It's indeed amazing that it is so fast yet so concise.

AdrianV: I ported the code to Crystal and put it in the samples directory. Is that ok with you? If not I can remove it.

On my machine it takes 3680 ms using Crystal. Very poor. But we still don't have any kind of optimization in the front end, only the ones that llvm gives us.

I couldn't compare it with nimrod because when I execute it I get this:

---
2013-06-29 14:15:24.053 raytracer[5775:707] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Error (1000) creating CGSWindow on line 259'
*** First throw call stack:
(
    0   CoreFoundation       0x00007fff91ecab06 __exceptionPreprocess + 198
    1   libobjc.A.dylib      0x00007fff930d93f0 objc_exception_throw + 43
    2   CoreFoundation       0x00007fff91eca8dc +[NSException raise:format:] + 204
    3   AppKit               0x00007fff8ed07b49 _NSCreateWindowWithOpaqueShape2 + 655
    4   AppKit               0x00007fff8ed06340 -[NSWindow _commonAwake] + 2002
    5   AppKit               0x00007fff8ecc4d82 -[NSWindow _commonInitFrame:styleMask:backing:defer:] + 1763
    6   AppKit               0x00007fff8ecc3ecf -[NSWindow _initContent:styleMask:backing:defer:contentView:] + 1568
    7   AppKit               0x00007fff8ecc389f -[NSWindow initWithContentRect:styleMask:backing:defer:] + 45
    8   libSDL-1.2.0.dylib   0x0000000105aecbfa -[SDL_QuartzWindow initWithContentRect:styleMask:backing:defer:] + 279
    9   libSDL-1.2.0.dylib   0x0000000105aeaacd QZ_SetVideoMode + 2629
    10  libSDL-1.2.0.dylib   0x0000000105ae1907 SDL_SetVideoMode + 886
    11  raytracer            0x0000000105a89992 test_78942 + 82
    12  raytracer            0x0000000105a8a189 raytracerInit + 9
    13  raytracer            0x0000000105a8a210 main + 112
    14  libdyld.dylib        0x00007fff8fcfe7e1 start + 0
    15  ???                  0x0000000000000001 0x0 + 1
)
libc++abi.dylib: terminate called throwing an exception
SIGABRT: Abnormal termination.
---

I'm on a Mac and apparently it's an issue with SDL on Mac. In fact, it gives the same error in Crystal but I renamed the "main" function to "SDL_main" and it compiles and runs fine. Is there a way to make the nimrod raytracer work on a Mac?

2013-06-29 17:20:20

Oh, it doesn't work on Mac? I can't say I'm surprised. It's the OS which costs us the most maintenance time.

2013-06-29 21:51:28

I know this is a few months old, but if you're trying to use SDL on a Mac, try adding the following at the top of your code before any SDL calls:

when defined(macosx):
  const
    LibCocoa = "/System/Library/Frameworks/Cocoa.framework/Cocoa"
  # Bind Cocoa's NSApplicationLoad() and call it once before any SDL call.
  proc NSAppLoad*(): bool {.cdecl, importc: "NSApplicationLoad", dynlib: LibCocoa.}
  discard NSAppLoad()

It sets up the NSApplication object that SDL requires; without it that object isn't available, hence the crashes.



2013-10-31 18:47:17

Nice. I will be using this as a benchmark to test SIMD code eventually.

A little more on SIMD (which I am no expert on, but know a little about): SIMD is only really useful in well-controlled hot loops. Otherwise you'll likely thrash your registers and lose any performance SIMD may have brought (or even slow things down), at least on non-x86_64 hardware. Apparently scalar floating-point ops on x86_64 already go through the SIMD (SSE) registers, so bottlenecks on ARM or PowerPC aren't always bottlenecks on x86_64.

SIMD's benefit isn't just about calculating multiple values at once; it's also about "compressing" the number of vector operations an algorithm needs. For instance, a 'MADD' SIMD op performs both the multiply and the addition of a vector in a single op. So proper SIMD code can give more than a factor-of-4 performance gain in some areas.
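To make the multiply-add idea concrete, here is a small C sketch. It uses no intrinsics; `Vec4` and `vec4_madd` are hypothetical names, and whether the loop really becomes a single packed (or fused) multiply-add instruction is an assumption that depends on the compiler, the flags and the target CPU.

```c
/* Illustrative only: four float lanes, the width of one SSE/NEON register. */
typedef struct {
    float lane[4];
} Vec4;

/* r = a * b + c for all four lanes.
   Written as a plain loop: an optimizing compiler may turn it into
   packed SIMD instructions, and on hardware with a fused multiply-add
   it may emit one instruction for the whole expression. */
Vec4 vec4_madd(Vec4 a, Vec4 b, Vec4 c) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    }
    return r;
}
```

One way to check what actually happened is to compile with something like `gcc -O3 -S` and look in the resulting `.s` file for packed instructions (for example `mulps` and `addps` on x86_64) rather than four scalar multiplies and adds.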

I don't want to make up numbers (I don't have my old code around), but I remember spending some time writing SIMD-based matrix/vector structs in C to compare performance, and the SIMD version was significantly faster in many conditions (by orders of magnitude, depending on the CPU). The biggest CPU hot loops in games are animation and physics processing (both prime candidates for SIMD optimization), so for game-engine engineers SIMD is very important.

2013-11-01 18:57:02

Yes, this code is nice for studying different processor- and compiler-dependent aspects. I tried, for example, clang vs gcc, float64 vs float32, and OpenMP vs single-threaded. Some results are really astonishing:
  • On an old AMD machine OpenMP scales as expected with every core used. On a new Intel Core i5/i7, OpenMP (gcc) was slower than single-threaded, or even slower than on the old AMD.
  • With float64, gcc and clang perform almost on par on my i7. With float32, clang is about 30% better than gcc.
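For anyone who wants to repeat the OpenMP comparison, this is roughly what the parallel variant of a per-pixel loop looks like in C. It is a sketch under assumptions: `shade_rows` is a hypothetical name, and the trivial gradient stands in for the real per-pixel ray tracing work.

```c
/* Illustrative only: fill a width*height image row by row.
   Each row writes only its own slice of `image`, so rows can be
   handed to different threads without any locking. */
void shade_rows(float *image, int width, int height) {
    /* Honoured when compiled with gcc -fopenmp; without that flag the
       pragma is ignored and the loop simply runs on one thread. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            /* Placeholder for the per-pixel work. */
            image[y * width + x] = (float)(x + y) / (float)(width + height);
        }
    }
}
```

Building the same file once with and once without `-fopenmp` gives the two variants to time against each other.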

For me it would be very interesting to see how to improve (and understand) the performance of this code.

2013-11-02 19:50:42

My brother and I played with the test a bit. On my machine I've improved it by ~40 ms (Nimrod 0.9.2). We also ported the code to C# for comparison (and recorded results). The repo is here:

For future reference, if anyone wants to extend the results page or add a language, just send me a message or make a pull request.

It would be nice to see your results for comparison, adrianv.

2013-11-03 23:07:30