# module m1 proc p1*[T](a: T) = echo a.wuff template t1*[T](a: T) = echo a.wuff
# module m2 import m1 type T = object i: int proc wuff(a: T): string = result = "miau" var t: T m1.t1(t) m1.p1(t)
$ nim c m2.nim $ ./m2 miau miau
My assumption was, that p1 would not compile, because proc wuff is unknown inside module m1. But it magically works! (That template t1 should work is not a surprise, because I assume it is "generated" inside module m2.)
Does somebody know what the trick is? And is inlining for proc wuff possible here.
A consequence may be, that for algorithm.sort() we do not need a cmp() proc parameter (which can not be inlined and is a bit slow so). We may just define a plain cmp() proc in our module for the types which we pass to compare? (of course we would have to modify the code of proc sort().)
Why should it be surprising? Procedure calls and field accesses in generics aren't resolved until instantiation time. What if wuff was a field of 'a'? Anyway, if by 'inlining' you mean, 'can the Nim compiler/backend compiler inline generics', then yes, it can be inlined.
I don't quite understand your last question. If there is a generic cmp[T](x, y: T) that applies to all objects, and a more specific cmp(x, y: Foo) defined in the current module/an imported module, the more specific procedure will take precedence during the resolution process.
Looking at the algorithm module's sort procedure, it looks like a sensible addition might be to have a 'sort' procedure which defaults to calling 'cmp' on the items, but the current implementation is ok. I wouldn't be surprised if a compile inlined any calls to 'sort', detected the function pointer, and inlined the referenced function too.
Why should it be surprising? Procedure calls and field accesses in generics aren't resolved until instantiation time.
Yes, you may know that from other languages, I can not remember reading it in the Nim docs. I saw a hint in Tables module defining a custom hash proc, but until today I thought that that works only because table module uses templates. The question was, if the wuff() proc in my example can be inlined from the compiler.
Looking at the algorithm module's sort procedure,
Some months ago I tested a quicksort proc, and found out that giving a cmp() proc for custom types decreases performance by about 25 percent compared to using a default cmp proc for known standard types. I tested that of course for the same data types like int. And the result did not surprised me that time, because the cmp() proc parameter is called often and is not inlined. My conclusion was, that it may make sense to provide a sort proc for arbitrary custom types with cmp() proc parameter, and additional procs for basic data types like int which do not need a cmp() proc parameter.
Reading this blog post
I just remembered a discussion some time ago:
S. Salewski wrote:
A consequence may be, that for algorithm.sort() we do not need a cmp() proc parameter (which can not be inlined and is a bit slow so). We may just define a plain cmp() proc in our module for the types which we pass to compare?
Some months ago I tested a quicksort proc, and found out that giving a cmp() proc for custom types decreases performance by about 25 percent compared to using a default cmp proc for known standard types.
So maybe we should provide indeed a sort proc without cmp() function.
@Stefan_Salewski - I think this idea of having a family of overloaded procs to sort on standard types is very good. I have almost raised it myself several times.
It would be useful for cmp-less sort procs on such type to be stable sorts (meaning elements with identical keys retain relative positions in the array). Then a user could implement a simple 2-level or 3-level sort on distinct embedded keys by just calling such sort procs 2 or 3 times. (A multi-level sort orders elements first by a primary key, then a secondary key for equal primary keys, and so on).
This arrangement has another performance bonus potentially much greater than the inlining effect mentioned so far in this thread. It allows one to tune which sorting algorithm is used based on the nature of the key, e.g. how long it is, integer or floating point or string, etc. For example, for single-byte integer keys it is hard to be faster than counting sort which does two pretty linear sweeps through the input - the first to histogram byte values and the second to output the answer (with a keyspace-sized prefix sum in between to convert counts to output offsets). The simplest version of that does require an additional copy of the input array which may have some consequences on the proc signature/interface/specification.
template sort[T](inp: seq[T], keyField: untyped) = when T.`keyField` is byte: ## XXX Handle all the standard types ... ## XXX good & stable algos for each one myArray.sort(myKey)The implementation might look nicer with a family of overloaded calls rather than a big when T.`keyField` is dispatcher. I am not sure there is a clean way to do that in this case. Each when clause should be able to dispatch to a type-specialized case, though. So, it's not too bad.