I was reading this: Golang's Real-time GC in Theory and Practice and thought I'd try the same code in Nim.

Here's my initial cut, anyone see anything that should be changed/improved to make the benchmark more comparable/meaningful?

# run with 'nim -c -r -d:release bm.nim'

import strutils
import times

const windowSize = 200000
const msgCount   = 1000000

type msg = ref array[1024, byte]
type buffer = ref array[windowSize, msg]

var worst: float

proc mkMessage( n: int ): msg =
    result = new( msg )
    for i in items( result[] ):
        result[i] = byte( n )

proc pushMsg( b: var buffer, highID: int ) =
    let start = cpuTime()
    let m = mkMessage( highID )
    b[highID mod windowSize] = m
    let elapsed = cpuTime() - start
    if elapsed > worst:
        worst = elapsed

proc main() =
    var b = new( buffer )
    for i in 0..<msgCount:
        pushMsg( b, i )
    
    let worst_ms = formatFloat( worst*1000, format=ffDecimal, precision=2 )
    echo( "Worst push time: ", worst_ms, "ms" )

when isMainModule:
    main()

I also don't know enough about the various garbage collectors to ID which would produce the best numbers but I thought that would be good information to include in the top comment block, e.g. something like:

# Try with nim -c -r -d:release --gc:stack bm.nim for best performance!

Once there are no concerns/objections I'll submit it back to their repo for inclusion. Thanks!

2016-12-02 16:04:39

Thanks for the link, sounds interesting, will read soon...

For your test code, I am not really sure what the purpose is. Nim's realtime GC is tunable, so you can get very short pause time with a tradeoff in throughput. See

http://nim-lang.org/docs/gc.html

You seems to test only only pause time for the default, which may be more tuned to throughput. (For my understanding, for most software pause times are not critical, or at least pause times of a few ms are fully OK, as for action games. For hardware control pauses may be critical, but if we do not allocate GC memory in critical parts, then the GC will not run in that critical part.) And when you are testing, you may test the boehm Gc as well.

I hope my understanding is not too wrong, I have newer spent time investigationg or understanding the GC.

For your code itself there is some room for improvement of course, for example

for i in items( result[] ):
        result[i] = byte( n )

should be writable as

for i in mitems( result[] ):
        i = byte( n )

2016-12-02 16:47:13

I was planning on porting these benchmarks to Nim myself, happy to see that you beat me to it

I will play around with your version and see if I can speed it up.

2016-12-02 17:12:11

Here is my version:

# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'

import strutils
import times

const
  windowSize = 200000
  msgCount   = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: float

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg(b: var Buffer, highID: int) =
  let start = epochTime()
  
  let m = mkMessage(highID)
  b[highID mod windowSize] = m
  
  GC_step(10_000) # 10ms
  
  let elapsed = epochTime() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  GC_disable()
  
  var b = newSeq[Msg](windowSize)
  for i in 0 .. <msgCount:
    pushMsg(b, i)
  
  let worst_ms = formatFloat(worst * 1000, format=ffDecimal, precision=2)
  echo("Worst push time: ", worst_ms, "ms")
  
  echo(GC_getStatistics())

when isMainModule:
  main()

Things i've changed:

  • Removed extraneous spaces after ( and before ).
  • Rewrote code to make it more idiomatic.
    • Using seq[byte] instead of ref array[1024, byte] (this is not idiomatic at all)
  • Switched from cpuTime to epochTime, the former only measures CPU time(!).
  • Added some real-time GC API calls.

Strangely, switching to seq causes an overall slowdown (the max push time is unaffected, but the application takes longer to complete).

On my machine the typical worst push time is 10ms, both for my version of the code and yours. The addition of GC_step doesn't really help, but it is nice to show off. Go 1.5 (maybe I should upgrade) gets 8ms. So Nim is 2ms off, which isn't bad. As pointed out by /u/matthieum on Reddit, the power of Nim is the fact that you can specify exactly where the GC should run (as I've done using the GC_step call).

BTW, stack isn't a GC.

2016-12-02 20:00:54

Here is my version, I switched to measuring nano seconds to get numbers different from 0:

# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'

import strutils
#import times

include "$lib/system/timers"

const
  windowSize = 200000
  msgCount   = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: Nanos

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg(b: var Buffer, highID: int) =
  let start = getTicks()
  
  let m = mkMessage(highID)
  # seqs copy, we don't want a copy here, so I used 'shallowCopy'.
  # Alternatively the 'ref array' version could be used.
  # Note that due to Nim's design we could also *re-use* the memory
  # completely, eliminating the GC runs but this would make this
  # benchmark pointless.
  shallowCopy(b[highID mod windowSize], m)
  
  GC_step(500) # 0.5ms, but GC never takes this long.
  
  let elapsed = getTicks() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  GC_disable()
  # The MS backup collector is only required for cylic data structures,
  # but in this program we generate no cycles. Nim allows us to specify this
  # behaviour directly, so IMO it's fair to disable it. Nim's GC docs also
  # mention this fact, it's not undocumented voodoo:
  GC_disableMarkAndSweep()
  
  var b = newSeq[Msg](windowSize)
  for i in 0 .. <msgCount:
    pushMsg(b, i)
  
  # nano seconds into ms:
  echo("Worst push time: ", worst, "nano seconds")
  
  echo(GC_getStatistics())

when isMainModule:
  main()

On my machine I get to roughly 23000 nano seconds.

Take with a big grain of salt!

  • Read my comments in the code.
  • I am not sure if "lib/system/timers" is reliable.
  • I have a fast CPU.
2016-12-02 21:15:24

on my win7 64-bit system

running @Araq's version 25 times

        25 	Count
   
   204,940 	Min
   975,141 	Mean
 8,422,127 	Max
 
 1,601,464 	StdDev

So a fairly wide deviation of values (and a much slower system).

2016-12-03 01:05:16

Follow on comments:

Removed extraneous spaces after ( and before ).

You'll take the spaces from my cold, dead fingers.

Rewrote code to make it more idiomatic.

Using seq[byte] instead of ref array[1024, byte] (this is not idiomatic at all)

Tracking. I used the array/ref array because that seemed a more comparable data structure to Go's array & slice types but I agree that idiomatic code is probably better for a benchmark like this.

Switched from cpuTime to epochTime, the former only measures CPU time(!).

I originally used epochTime but the STL docs say:

This may be more useful for benchmarking than epochTime.

Thanks for all the feedback everyone!

2016-12-03 03:30:09

Ok, on my Mac this one produces better timings.

# Compile and run with 'nim c -r -d:useRealtimeGC -d:release main.nim'

import strutils
#import times

include "$lib/system/timers"

const
  windowSize = 200000
  msgCount   = 1000000

type
  Msg = seq[byte]
  Buffer = seq[Msg]

var worst: Nanos

proc mkMessage(n: int): Msg =
  result = newSeq[byte](1024)
  for i in 0 .. <result.len:
    result[i] = byte(n)

proc pushMsg0(b: var Buffer, highID: int) =
  # warmup:
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)

proc pushMsg1(b: var Buffer, highID: int) =
  # with benchmarking:
  let start = getTicks()
  
  let m = mkMessage(highID)
  shallowCopy(b[highID mod windowSize], m)
  
  let elapsed = getTicks() - start
  if elapsed > worst:
    worst = elapsed

proc main() =
  # Don't use GC_disable() and GC_step(). Instead use GC_setMaxPause().
  GC_disableMarkAndSweep()
  GC_setMaxPause(300)
  
  var b = newSeq[Msg](windowSize)
  # we need to warmup Nim's memory allocator so that not most
  # of the time is spent in mmap()... Hopefully later versions of Nim
  # will be smarter and allocate larger pieces from the OS:
  for i in 0 .. <msgCount:
    pushMsg0(b, i)
  
  # now after warmup, we can measure things:
  for i in 0 .. <msgCount:
    pushMsg1(b, i)
  
  echo("Worst push time: ", worst, " nano seconds")
  
  echo(GC_getStatistics())

when isMainModule:
  main()

On my slower machine, I get latencies of max 0.3ms.

2016-12-03 11:04:23

For comparison purposes, @Araq's second solution gives me

        25 	Count
   
   272,765 	Min
 1,271,322 	Mean
 7,401,332 	Max
 
 1,503,168 	StdDev

which over 25 tests is not much different to the first test.

2016-12-04 20:58:40
@jlp765 On what OS do you get that much jitter? 2016-12-05 08:03:28
<<<••123••>>>