I would like to use cudaHostAlloc as a custom allocator.

It's a replacement for C's malloc that

  • allocates page-locked memory
  • greatly improves Host<->GPU memory transfers via Direct Memory Access
  • enables non-blocking host<->GPU memcpy w.r.t. both host and GPU computations

Here is the Nim wrapper:

proc cudaHostAlloc*(pHost: ptr pointer;
                    size: csize;
                    flags: cuint): cudaError_t
{.cdecl, importc: "cudaHostAlloc", dynlib: "libcudart.so".}
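For context, here is a hedged sketch of how such a wrapper might be called from Nim. The helper names `pinnedAlloc`/`pinnedFree` are my own invention, the error handling is an assumption, and this obviously needs the CUDA runtime library at run time:

```nim
# Sketch only: assumes libcudart.so is available and that cudaError_t
# maps to a cint where 0 means cudaSuccess.
type cudaError_t = cint

proc cudaHostAlloc(pHost: ptr pointer; size: csize; flags: cuint): cudaError_t
  {.cdecl, importc: "cudaHostAlloc", dynlib: "libcudart.so".}

proc cudaFreeHost(p: pointer): cudaError_t
  {.cdecl, importc: "cudaFreeHost", dynlib: "libcudart.so".}

proc pinnedAlloc(size: Natural): pointer =
  ## Allocate `size` bytes of page-locked host memory (hypothetical helper).
  ## Flag 0 is cudaHostAllocDefault in the CUDA runtime API.
  if cudaHostAlloc(result.addr, size.csize, 0) != 0:
    raise newException(OutOfMemError, "cudaHostAlloc failed")

proc pinnedFree(p: pointer) =
  ## Pinned memory must be released with cudaFreeHost, not free/dealloc.
  discard cudaFreeHost(p)
```

Whatever region or wrapper scheme is chosen below, the underlying raw allocation/deallocation calls would be these two helpers instead of Nim's `alloc`/`dealloc`.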

Questions

1. Is it possible to replace the default memory allocator for say seq?

2. If not is it possible to use a custom memory allocator for a custom ref object?

I had a look at Nim's memory regions; I have a feeling they may help, but I don't understand how to use them in practice.

type
  UncheckedArray {.unchecked.}[T] = array[0..100_000_000, T]
  
  PinnedArray[T] = ref object
    len: int
    data: UncheckedArray[T]


# Allocate 50 kB
# this will be replaced by cudaHostAlloc code

var memRegion = alloc(50_000)

# Doesn't compile
var foo: PinnedArray[int] ptr memRegion

# I get "Error: type expected"
# Nimsuggest also says "region needs to be an object type"

2017-09-09 05:29:06

Well, just following that manual section and the compiler messages: replace memRegion (a variable) in the last line with a type, namely the type you want to point to (PinnedArray[int] in your case), and to the left of that ptr put another object type serving to distinguish your memory region, like:

type Cuda = object # nothing needed inside, serves just as a mark
var foo: Cuda ptr PinnedArray[int] # read this as "Cuda-pointer to PinnedArray[int]"

Nimsuggest also says "region needs to be an object type"

That's explicit in that manual section.


UPD

Something that compiles, though with too many casts, and maybe not the best fit for your needs:

type
  UncheckedArray {.unchecked.}[T] = array[0..100_000_000, T]
  PinnedArray[T] = object
    len: int
    data: ptr UncheckedArray[T]
  Cuda = object

var foo: Cuda ptr PinnedArray[int]
foo = cast[ptr[Cuda, PinnedArray[int]]](alloc sizeOf(PinnedArray[int]))
foo.data = cast[ptr UncheckedArray[int]](alloc 50_000)
foo.len = 50_000

foo.data[][2] = 7
echo foo.data[][2]

2017-09-09 07:17:05

Thanks, that's really cool.

If I can't use a custom allocator or a custom memory region with "new" or "newSeq" in the next couple of months, I will probably go with that.

2017-09-09 08:44:34

While direct support for seqs and strings is not there yet (it's planned, according to the manual), you can wrap them in objects:

type
  MySeq[T] = object
    data: seq[T]
  Cuda = object

var foo = cast[ptr[Cuda, MySeq[int]]](alloc sizeOf(MySeq[int]))
foo.data = newSeq[int]()

foo.data.add 7
echo foo.data[0]

or with a constructor and wrapped procs

proc newMySeq[T](size = 0.Natural): Cuda ptr MySeq[T] =
  result = cast[ptr[Cuda, MySeq[T]]](alloc sizeOf(MySeq[T]))
  result.data = newSeq[T](size)
proc add[T](s: Cuda ptr MySeq[T], v: T) = s.data.add v
proc `[]`[T](s: Cuda ptr MySeq[T], i: int): T = s.data[i]

for more convenient usage:

var s = newMySeq[float]()
s.add 5
echo s[0]

2017-09-09 10:52:31

Interesting, however if I understand this correctly:

proc newMySeq[T](size = 0.Natural): Cuda ptr MySeq[T] =
  result = cast[ptr[Cuda, MySeq[T]]](alloc sizeOf(MySeq[T]))
  result.data = newSeq[T](size)

result = ... uses the custom allocator only for the pointer to result.data; result.data itself is still allocated via the default allocator within newSeq.

2017-09-09 12:39:12
Yes, it seems you're right. Then you probably need to create something seq-like via pointers, with alloc and the like... probably what you tried with UncheckedArray. That is, if you cannot just use static arrays (fixed capacity) + a length field, which would be the easiest and fastest, of course.
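To sketch that seq-like idea (a hedged, minimal example; RawSeq and its procs are names I made up, and plain alloc/realloc/dealloc stand in for cudaHostAlloc/cudaFreeHost; on newer Nim UncheckedArray is built in, on 0.17 use the {.unchecked.} array trick from earlier in the thread):

```nim
# Minimal seq-like container over a raw, manually managed buffer.
# Swap alloc/realloc/dealloc for the CUDA pinned-memory calls to get
# a fully pinned "seq" with no GC allocations involved.
type
  RawSeq[T] = object
    len, cap: int
    data: ptr UncheckedArray[T]

proc newRawSeq[T](cap = 4): RawSeq[T] =
  result.cap = cap
  result.data = cast[ptr UncheckedArray[T]](alloc(cap * sizeof(T)))

proc add[T](s: var RawSeq[T], v: T) =
  if s.len == s.cap:          # grow geometrically when full
    s.cap *= 2
    s.data = cast[ptr UncheckedArray[T]](realloc(s.data, s.cap * sizeof(T)))
  s.data[s.len] = v
  inc s.len

proc `[]`[T](s: RawSeq[T], i: int): T = s.data[i]

proc free[T](s: var RawSeq[T]) =
  dealloc(s.data)
  s.data = nil
  s.len = 0
  s.cap = 0

var s = newRawSeq[int]()
for i in 0 .. 9: s.add i * i
echo s[9]                     # prints 81
s.free()
```

Note there is no bounds checking here, and `free` must be called manually, which is the price of bypassing the GC entirely.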
2017-09-09 13:43:10
--gc:regions will let you do that and is not far away, maybe a couple of weeks...
2017-09-09 17:23:20
@Araq Wasn't --gc:regions supposed to apply to all allocations in the given scope? I can't really remember much about that mode (I only remember that when you first mentioned it on this forum, people were quite surprised because it was nowhere to be found in the docs), but I was pretty sure it was something like scoped memory pools?
2017-09-10 21:16:44