Hi,

I'm trying to understand what goes on in "lib\system\atomics.nim". This is in part because I'm missing atomicLoadN/atomicStoreN on Windows, and I'm trying to work out how to implement that myself. I've just stumbled upon this declaration (atomics.nim, line #220):

 interlockedCompareExchange8(p: pointer; exchange, comparand: byte): byte
  {.importc: "_InterlockedCompareExchange64", header: "<intrin.h>".}

At first, I though using _InterlockedCompareExchange64 was a bug, but then I found out that there is no _InterlockedCompareExchange8.

So, I guess exchange and comparand get cast to _int64, and the return value just gets cast to byte. So far so good.

But _InterlockedCompareExchange64 assumes p points to a _int64 value, and so will overwrite the 8 bytes at that location.

How can that not go horribly wrong?

2017-11-12 12:39:52
Actually _InterlockedCompareExchange8 does exist, It was added in VS2012 silently and it is not mentioned in the docs. 2017-11-12 21:40:10
Thank you, you rock!
2017-11-13 09:13:57

@cdome Great!

@Araq So, is/was (I'm assuming the '8' version is going to be used in the future), the call safe? I cannot imagine it would have been used at all, if it caused random memory overwrite. Maybe the 'compare' part of the call makes this safe? But then, why even bother offering 8/16/32 versions in the Windows API? Just because it's faster to only access as much memory as you need?

2017-11-13 11:57:33
Can't really remember but it was likely something like "argh this needs to compile now and I know my data is aligned at an 8 byte boundary".
2017-11-13 14:58:04
within the atomics.nim the gcc-api is very different (and better) than the vcc ones. a consolidation would be nice. AtomicLoad could be replaced by atomicInc or atomicDec but unfortunately not AtomicStore. And doing cas is not the same as AtomicStore. If you need two atomics for a specific operation it´s not atomic anymore
2017-11-15 14:43:32

@mikra That is basically what I was trying to do; have a single API for all platforms. I could share it, once it works, but having a unified API in the standard library would surely be beneficial to many users.

On second thought, I'm not sure I want anyone to think I have any clue about the Windows atomics APIs; pthreads makes perfect sense to me, but the Windows calls are just weird. I'm just guessing what does what based on the online M$ docs.

EDIT: @mikra I finally finished typing my "unified atomics" module, and realized that nim was actually compiling with some kind of gcc/clang under the hood. It seems to even have support for the pthread API, totally against my expectations. If the default compiler under windows does have pthread support, maybe it would be simple for you to make your own missing atomic procs? Since I'm going to have to use vcc in the end (due to independent "technical reasons"), it doesn't actually help me.

2017-11-15 19:25:41

@monster agree the windows api is weird . Unfortunately I have not much experience with the vcc(thread model and so on) I just need atomicLoad and atomicStore. Also I don´t know what happens if you use the Nim compiler (built with gcc) and then the vcc to compile your app. It works for me actually.

Iam using now interlockedAnd(atomicLoad) and interlockedExchange(atomicStore). see here (at line 75/180). works perfect for me.

https://github.com/mikra01/timerpool/blob/master/timerpool.nim

both native calls are missing within atomics.nim. what do you think?

2017-11-15 22:04:30

@mikra I can't say yet if my code will work, as I have to get Nim to use vcc first (found this thread), but my approach is somewhat different. Firstly, I tried to always use the "right size" call, by delegating to the appropriate Windows method using "when sizeof(T) == 8: ..." style code. Secondly, I also used "exchange" to replace "store" like you; I could not find anything better, but I have seen on stack-overflow people saying you should just set it non-atomically, and call a fence afterward. Maybe it works, but I didn't like that solution. Thirdly, I think "load" is better replaced by using "_InterlockedOr"; (x | 0) makes more sense to me than (x & F...).

What I still haven't understood yet, is why there seems to exist both "_InterlockedOr64_acq" and "InterlockedOr64Acquire" (for example), doing the same thing.

2017-11-15 22:23:32
@monster you are absolutely right. The or-solution is much better for the atomicLoad substitution. For your question: have a look at: https://docs.microsoft.com/en-us/cpp/intrinsics/intrinsics-available-on-all-architectures seems to me that the "_acq" functions are ARM-platform specific
2017-11-16 07:31:47
<<<••12••>>>