Hi,

I started looking into the Nim language recently after hearing about it on a forum. I find it to be a fun language because of its syntax and metaprogramming capabilities. To test the language, I implemented what is called an isolation forest ( http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html ), during which I encountered some difficulties. Based on that, I have some questions:

1- How to return a seq from a function without doing an unnecessary copy:

proc foo(): seq[int] =
    var arr = @[1, 2, 3]
    echo cast[int](arr[1].addr)
    arr

var arr = foo()
echo cast[int](arr[1].addr)

The two echo calls print different addresses.

2- Related to the first question, is there a plan to add a move operator to the language? I think such an operator is necessary since copying a seq is a deep copy by default, which may harm execution speed significantly in some cases. For example, while building the trees of my isolation forest, I have to push some arrays that represent random indices into a double-ended queue. After pushing them, they are no longer needed. I didn't find a way to move them instead of making a copy.

The same problem occurs when building an array of arrays (but in this case it can be done with an ugly shallowCopy call).

3- What is the state of using Nim without the GC? I know the GC in Nim is thread-local and can be tweaked, but I work in the domain of computer vision, where every millisecond counts. If the GC has any impact on performance, then I need to avoid it. Is there a way, for example, to know where the GC is used (as in the D language) and to see which modules can be used without it? Are there any plans to offer more tools for manual memory management, like smart pointers? Are destructors now stable and usable? And does anybody here have experience using Nim without the GC?

Excuse my bad English, as I'm not a native speaker, and thanks in advance.

2017-08-18 15:26:44

You can simulate move semantics by using shallowCopy.

proc worker(): seq[int] =
    var arr = @[1, 2, 3]
    # do some heavy work here

    # option 1: a deep copy is made here
    # return arr

    # option 2: move arr to result without copying
    shallowCopy(result, arr)
    return # not necessary at the end of a proc (implicit)

    # you can also use the result variable from the beginning to avoid copies.

If you don't like the shallowCopy syntax, you can use a template to emulate shallow assignment semantics.

template `:=`[T](x: var seq[T]; v: seq[T]) =
    shallowCopy(x, v)

# used like this
var s0 = @[1,2,3]
let s1 = s0 # no copy is made since s1 cannot be modified
var s2 = s0 # a copy is made here (unless s0 was marked as shallow)
var s3: seq[int]
s3 := s0 # the custom template uses shallowCopy

There is also the swap proc, which swaps two variables without any deep copy.
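As a rough illustration of the swap approach, here is a minimal sketch. The variable names are made up for the example, and the last line only prints whether the element buffer changed hands; it does not promise a particular result for every compiler version or GC.

```nim
var a = @[1, 2, 3]
var b = @[10, 20]
let before = cast[int](a[0].addr)  # address of the first element in a's buffer

swap(a, b)  # exchanges the two seqs without deep-copying their elements

doAssert a == @[10, 20]
doAssert b == @[1, 2, 3]
echo cast[int](b[0].addr) == before  # does b now point at a's old buffer?
```

This can stand in for a move when the destination already holds a seq you no longer need.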

You can also mess with assignment semantics for types:

https://nim-lang.org/docs/manual.html#type-bound-operations-operator

2017-08-19 14:32:41

@BigEpsilon

CMIIW, as I don't really know Nim's internals, but I'll try to answer to the extent of what I know:

1. A seq is already a ref type, and what happens to its content depends on the element type. In your example you used literal values; maybe if you used some ref types, it would pass the memory instead of copying it. Also, as @Parashurama mentioned, using the result variable from the beginning won't copy the seq

proc foo(): seq[int] =
  result = newSeq[int]()
  result.add 1
  result.add 2
  result.add 3
  echo cast[int](result[0].addr)

var arr = foo()
echo cast[int](arr[0].addr)

2. I read somewhere that once memory regions are implemented, it would be possible to have move and lent semantics. Using a ptr to an array/seq certainly won't copy the seq.

3. Using Nim without the GC is the same as using C with nicer syntax. You'll have to allocate and deallocate memory manually. Look for the related procs in the system module. The GC manual explains the details of GC usage and how to tweak it.
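To give an idea of what the manual route looks like, here is a small sketch using alloc0 and dealloc from the system module. The proc name sumOfSquares is made up for the example, and it assumes a Nim version recent enough to provide UncheckedArray.

```nim
proc sumOfSquares(n: int): int =
  ## Sums i*i for i in 0 ..< n using a manually managed buffer.
  # alloc0 returns zero-initialised memory that the GC does not track
  let buf = cast[ptr UncheckedArray[int]](alloc0(n * sizeof(int)))
  for i in 0 ..< n:
    buf[i] = i * i
  for i in 0 ..< n:
    result += buf[i]
  dealloc(buf)  # forgetting this call leaks the buffer

echo sumOfSquares(4)
```

Nothing here is bounds-checked or freed automatically, which is the trade-off of bypassing the GC.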

2017-08-19 14:35:32

Hi, Thanks both for your responses.

I didn't know that I could treat result as a normal variable. The template := will also be very helpful for me.

The problem remains when using containers like a double-ended queue, where the implementation will make copies. For this reason, I tried to play with the assignment operator. However, when implementing the operator, I get a runtime error:

type
    container[T] = object
        val: seq[T]

proc `=`[T](d: var container[T], s: container[T]) =
    echo "= container called"
    echo s
    echo d
    shallowCopy(d.val, s.val)

var c = container[int](val: @[1, 2, 3])

which outputs:

= container called
(val: @[1, 2, 3])
(val: nil)
Traceback (most recent call last)
main2.nim(11) main2
main2.nim(9) =
gc.nim(287) unsureAsgnRef
gc.nim(196) incRef
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Any idea how I can solve this error?

2017-08-19 18:24:53

I can reproduce the issue with the latest devel. This is likely a bug in the default GC (refc).

You can use --gc:markAndSweep to try an alternative garbage collector.

See https://nim-lang.org/docs/nimc.html#compiler-usage-command-line-switches for a list of possible compiler arguments.

2017-08-19 18:50:16

1. This is the result of copying from arr to result, not the actual return (return x does an implicit result = x). The following code avoids it:

proc foo(): seq[int] =
  result = @[1, 2, 3]
  echo cast[int](result[0].addr)

var arr = foo()
echo cast[int](arr[0].addr)

You can also avoid copying by using let instead of var (the copying is done to avoid aliasing). Passing a value to a procedure will also not copy it.
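A quick way to check the parameter-passing claim is to compare element addresses on both sides of a call. This is only a sketch: firstElemAddr is a name made up for the example, and unsafeAddr is used because both the parameter and the let variable are immutable.

```nim
proc firstElemAddr(s: seq[int]): int =
  # s is an immutable parameter, hence unsafeAddr rather than addr
  cast[int](s[0].unsafeAddr)

let data = @[1, 2, 3]
# compares the address seen inside the proc with the one seen by the caller
echo firstElemAddr(data) == cast[int](data[0].unsafeAddr)
```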

2. There is a shallowCopy that avoids the copying. Note that shallowCopy is unsafe when the target is a global variable or managed heap location and the source is a constant. You can avoid the unsafety by using a version where the right-hand side must be mutable, e.g.:

proc `<-`[T](lhs: var T, rhs: var T) {.noSideEffect, magic: "ShallowCopy".}

3. If throughput is your concern, then manual memory management won't help you much per se. The primary cost of the GC in Nim is the allocation/deallocation overhead (plus the write barrier, but you incur that only if you write a reference to a heap location or global variable), and you incur that in C/C++ also, unless you use custom allocation schemes. I know that Araq is also working on a region-based collector, which might alleviate the overhead (enabled via --gc:stack, not sure how mature it is). Note that RAII in particular may not help you much; reference counting as in std::shared_ptr has more overhead than Nim's GC, and something like std::unique_ptr is either not memory-safe or incurs significant overhead (C++ chose the memory-unsafe option). In order to have memory-safe move semantics without overhead, you need linear or affine types, which would be a significant increase in language complexity.
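Since the cost is mostly in allocation and deallocation, one simple mitigation is to allocate a buffer once and reuse it. The sketch below assumes that shrinking a seq with setLen keeps its allocated storage, which is an implementation detail rather than a documented guarantee; fillFrame and scratch are names made up for the example.

```nim
var scratch = newSeqOfCap[int](256)  # one up-front allocation

proc fillFrame(buf: var seq[int], frame: int) =
  buf.setLen(0)  # resets the length so the existing storage can be reused
  for i in 0 ..< 100:
    buf.add(i * frame)

for frame in 1 .. 3:
  fillFrame(scratch, frame)  # no new seq is created per frame

echo scratch.len
```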

2017-08-19 20:08:40

Thank you all for your responses. It is pleasant to see such a helpful community around the language!

@Parashurama: indeed, the program does not crash with the markAndSweep GC. I posted an issue on GitHub.

@Jehan: Thanks for your comment on the usability of the GC. As I said in my first message, I work on computer vision (mainly on the Android and iOS platforms). My dream is to be able to use something other than C++ for that. I know a GC can be used in this domain because some of the best-known decoders (barcodes, QR codes, etc.) on Android are written in Java (but they are definitely slower than some C++ alternatives, like our in-house decoder). Once Nim stabilizes, I think this could be a niche where Nim succeeds, provided we can make good wrappers for the best-known CV libraries like OpenCV.

(By the way, I started looking at a way to port the OpenCV wrapper generator for Python to Nim. Using Nim for OpenCV could remove the need to port to C++ after prototyping in Python and make the experience far more enjoyable.)

2017-08-20 10:26:29

Just an update on the question of pushing seqs into a double-ended queue without copying.

I tried this:

import deques

proc `<-`[T](lhs: var T, rhs: var T) {.noSideEffect, magic: "ShallowCopy".}

type
    container[T] = object
        val: seq[T]

proc `=`[T](d: var container[T], s: container[T]) =
    echo "copy"
    echo s.val
    d.val <- s.val

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[1].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[1].addr)

But it does not work. I looked at the implementation of deque, but I don't see where the problem comes from.

However, I found an easier solution:

import deques

type
    container[T] = ref object
        val: seq[T]

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[1].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[1].addr)

Which works as I want.

2017-08-20 18:11:07

Objects (without ref) are value types, and those semantics also apply to their components. If you want shallow copying for objects, the easiest way is to use the {.shallow.} pragma.

Example:

import deques

type
    container[T] = object {.shallow.}
        val: seq[T]

var c = container[int](val: @[1, 2, 3])
echo repr(c.val)
echo cast[int](c.val[0].addr)
var deque = initDeque[container[int]]()
deque.addFirst(c)
var c2 = deque.popFirst()
echo repr(c2.val)
echo cast[int](c2.val[0].addr)

Note also that a[1] accesses the second element of a seq. If you want the first element, use a[0].

2017-08-20 18:22:29

CMIIW, object is a value type while ref object is a reference type.

If the object will be used in many places and is costly to construct, it's better to use a reference type.
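A small sketch of the difference, with type and variable names made up for the example: assigning an object copies its fields, while assigning a ref object only copies the reference.

```nim
type
  ValueBox = object
    val: seq[int]
  RefBox = ref object
    val: seq[int]

var v1 = ValueBox(val: @[1, 2, 3])
var v2 = v1            # copies the object, including its seq
v2.val[0] = 99
doAssert v1.val[0] == 1   # v1 is unaffected

var r1 = RefBox(val: @[1, 2, 3])
var r2 = r1            # copies only the reference
r2.val[0] = 99
doAssert r1.val[0] == 99  # both names refer to the same object
```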

2017-08-20 23:49:01