I am currently looking for ways to generate CUDA C code.

For example, the following fake Hello World (yes, the kernel is an empty function on the GPU):

#include <stdio.h>
#include <iostream>

__global__ void kernel (void){

}

int main( void ) {
  kernel<<<1,1>>>();
  printf( "Hello, World!\n" );
  return 0;
}

For the __global__ qualifier in the proc declaration, I can create a new pragma with codegenDecl, similar to the following one that requests alignment of a variable:

{.pragma: align16, codegenDecl: "$# $# __attribute__((aligned(16)))".}
var foo {.align16.}: array[100, int]

I just have to figure out the $# placeholders.

Now for the chevron notation (<<<...>>>) in the function call, where should I start? Are the mixins mentioned in the macros module the way to generate custom C?

Note: I'm aware of cudanim. However, instead of generating

kernel<<<1,1>>>();

it generates something like the following, which is a functionally equivalent alternative:

cudaLaunchKernel(1,1, kernel)

2017-09-13 07:39:50

There is little you cannot do with emit:

template notSure(x, y) =
  {.emit: ["kernel<<<", x, ", ", y, ">>>();"].}

notSure(1, 1)
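
As for the __global__ part of the question: per the Nim manual, the $# placeholders in a codegenDecl string are filled in by position. For a variable they are the type and then the name; for a proc they are the return type, the name, and the parameter list. A minimal sketch along those lines, assuming the pragma name cudaGlobal (made up here) and assuming the generated code is compiled with nvcc rather than a plain C compiler:

```nim
# Hypothetical pragma name. The format string mirrors the manual's
# "__interrupt $# $#$#" example for procs.
{.pragma: cudaGlobal, codegenDecl: "__global__ $# $#$#".}

# For a proc: $1 = return type, $2 = proc name, $3 = parameter list,
# so this should be declared roughly as: __global__ void kernel(void)
# exportc keeps the C name "kernel" unmangled so emitted code can refer to it.
proc kernel() {.cudaGlobal, exportc.} =
  discard
```

I have not checked the exact declaration Nim produces here, so it is worth inspecting the generated file in nimcache before relying on it.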

2017-09-14 14:06:55

After testing, the proper syntax is:

template squareCuda(bpg, tpb: int, y: var GpuArray, x: GpuArray) =
  ## Compute the square of x and store it in y
  ## bpg: BlocksPerGrid
  ## tpb: ThreadsPerBlock
  ## Output square<<<bpg, tpb>>>(y,x)
  {.emit: ["""square<<<""",bpg.cint,""",""",tpb.cint,""">>>(""",y.data[],""",""",x.data[],""");"""].}

The triple quotes are a bit verbose. I tried with backticks as well: "kernel<<<`bpg`,`tpb`>>>(`y`,`x`);"
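
On the verbosity: as far as I can tell, the array form of emit takes ordinary string literals as well as triple-quoted ones, so the fragments can be written with plain quotes. A CPU-only sketch of how the fragments and Nim expressions are spliced into one C statement (the proc name scaledSum is made up for illustration, and nothing in it is CUDA-specific):

```nim
proc scaledSum(a, b: cint): cint =
  ## Returns 2 * a + b, computed by an emitted C statement.
  var res: cint
  # String fragments are copied verbatim into the generated C;
  # res, a and b are replaced by the names the code generator gives them.
  {.emit: [res, " = 2 * ", a, " + ", b, ";"].}
  result = res

echo scaledSum(3, 4)
```

The same pattern with "square<<<", ",", ">>>(" and ");" as the fragments should give the kernel launch above, though I have only reasoned about that variant and not compiled it with nvcc.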

2017-09-15 22:47:56