Hi

I have programmed in a few languages extensively in my life, though now I primarly use Ruby.

I just found out about Nim literlly 3 weeks ago now (mid July 2017) after reading this article: Nim for the discerning Rubyist http://www.rubyflow.com/p/1j08ef-nim-for-the-discerning-rubyist

I primarly use Ruby to apply to math/science problems/projects because it's so easy to programm in, and it allows me to think about how to solve problems without worrying about how to code it. I've also played around with Crystal (aka Ruby on steroids) but it's still young, and doesn't let me (currently) do what I want, or without a lot of hassles.

However Nim looks promising, especially it's parallel programing features (if real and not brain destroying).

So here's a request, and maybe a challenge, to the Nimers (Nimrods, Nimists, Nimsters,..) gurus.

I developed a prime sieve called the Sieve of Zakiya (SoZ), and it's companion the Segmented Sieve of Zakiya (SSoZ).

I wrote a paper, The Segmented Sieve of Zakiya (SSoZ) which describes its mathematical foundations and algorithm, and provides a working cpp implementation at the end of the paper (compiled, run, and verified). It's a relatively short program.

Here's a link to read and download the paper:

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

My request/challenge is for someone(s) who are good at Nim to translate the code into idiomatic Nim, to demonstrate the best way to do this algorithm in Nim. Extra points if someone can do a parallel version of the algorithm, which I attempted to do using OpemMP, but what I did didnt' seem to make the code faster than the serial version.

I know this may be a lot to ask, but I learn best from code doing problems I understand, like the Rosetta Code benchmarks, but those are too short to learn enough for this algorithm.

Ultimately, I'd like to publish to results of benchmarks in different languages doing the SSoZ in an updated paper.

If anyone has any question, I'd be pleased to answer them as best I can.

Thanks in advance.

Jabari

2017-08-07 04:45:13

Honestly, this sounds like an interesting little project to bang out. However, I must admit that I am a humble Web Developer and, by extension, am not well versed in mathematics.

The best I could do is translate your C code in the linked document into Nim and maybe make a few small changes. However, without a deeper understanding of your algorithm, I doubt I would be able to optimize it in any meaningful way. It seems like your paper does a good job explaining the concepts behind the algorithm, but I do not have enough free time at the moment to work on this as I am approaching a deadline to launch a web application.

My suggestion is for you to possibly try and implement the algorithm in Nim yourself. You will find it to be surprisingly simple, I'm sure. If you then link to the repository here, we can help you by reviewing the code and making pull requests.

2017-08-08 15:37:29

Thanks. I'm in the process of learning enough Nim to take on this effort. However, I would do what you were thinking, and first try a comparable direct C++ to Nim translation, just to get something to work. But it's going to be some time (with my time/learning curve constraints) to create something workable.

I'm hoping someone who's really fluent in Nim, and who thinks in Nim, can perform the algorithm in a Nimby way that best showcases its capabilites. Once you get your head around the algorithm (and/or math) it shouldn't be too hard to understand what the C++ code is doing. Then someone versed in Nim can structure the alogrithm, particularly a true parallel processing version, in an efficient manner.

2017-08-08 19:25:44
Can you put the paper on a link that is not behind a paywall? I am normally pretty trained to port from different languages to different languages, but from the paper I can't do copy paste at all. 2017-08-08 19:36:29

The link I provided in my first post allows you to read/download it free from my Scribd site. Here it is again.

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

I've tested it many times, and there is no filter/paywall to get around. Let me know if this doesn't work and I'll provide another method to get to it.

2017-08-08 20:55:48

Sounds interesting, I might try doing a direct port from cpp to nim when I get some free time.

Btw, download is indeed behind a paywall (looks like scribd offers a month free but after that it's paid). Reading is free though.

Copy pasting from the pdf is impossible, as latex or whatever mangles the symbols, but I found a gist here

Also found this link in the references https://www.4shared.com/dir/TcMrUvTB/sharing.html but it won't let me download anything.

I'd suggest setting up a github pages website with an html version (pandoc or similar can convert most formats to html) or at least some kind of git repo with the code, as it's free and most people here are used to cloning things to look at them on their pc (at least I am).

2017-08-09 07:05:35

OK, thanks for the feedback on your problems downloading the paper.

Can you tell me what platform you are using, and what country you're accessing from?

I see from my Scribd stats that the paper is being downloaded so it appears to be working for others.

I'll look at putting the paper up on github, though.

2017-08-09 15:39:41

Below is the Nim direct translation of the C++ code in my paper. I have verified it on Nim 0.17 on Linux. In comparison on this same machine the Nim code runs a bit faster for some (larger >= 10^9) test values (not comprehensibly benchmarked yet).

I'll post it to a gist later to make it easier to download. I'm now re-architecting it to make it parallel friendlier (I hope). Feedback welcome on style, and performance tips.

ssozp5.nim

#[
 This Nim source file will compile to an executable program to
 perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
 It is based on the P5 Strictly Prime (SP) Prime Generator.
 
 Prime Genrators have the form:  mod*k + ri; ri -> {1,r1..mod-1}
 The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
 For P5, mod = 2*3*5 = 30 and the number of residues are
 rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
 
 Adjust segment byte length parameter B (usually L1|l2 cache size)
 for optimum operational performance for cpu being used.
 
 Verified on Nim 0.17, using clang (smaller exec) or gcc
 
 $ nim c --cc:[clang|gcc] --d:release  ssozp5.nim
 
 Then run executable: $ ./ssozp5 <cr>, and enter value for N.
 As coded, input values cover the range: 7 -- 2^64-1
 
 Related code, papers, and tutorials, are downloadable here:
 
 http://www.4shared.com/folder/TcMrUvTB/_online.html
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2017/08/21
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math
import strutils, typetraits

let pbits = [
 8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]

let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]

# Global parameters
const
  modp5 = 30           # prime generator mod value
  rescnt = 8           # number of residues for prime generator
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  primecnt = 0'u64     # number of primes <= N
  next: seq[uint64]    # array of primes first multiples
  primes: seq[int]     # array of primes <= sqrt(N)
  seg: seq[uint8]      # seg[B] segment byte array

proc sozp7(val: int | int64) =
  let md = 210
  let rscnt = 48
  let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
          , 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
          ,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
  
  var posn = newSeq[int](210)
  for i in 0 .. <rscnt: posn[res[i]] = i-1
  
  let num = (val-1) or 1           # if value even number then subtract 1
  var k = num div md               # kth residue group
  var modk = md * k                # base num value
  var r = 1                        # starting with first residue
  while num >= modk+res[r]: r += 1 # find last pc position <= num
  let maxprms = k*rscnt + r - 1    # maximum number of prime candidates
  var prms = newSeq[bool](maxprms) # array of prime candidates set False
  
  let sqrtN = int(sqrt float64(num))
  modk=0; r=0; k=0
  
  # sieve to eliminate nonprimes from small primes prms array
  for prm in prms:
    r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
    if prm: continue
    let res_r = res[r]
    let prime = modk + res_r
    if prime > sqrtN: break
    let prmstep = prime * rscnt
    for ri in res[1..rscnt]:
      let prod = res_r * ri         # residues cross-product
      var nonprm = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
      while nonprm < maxprms: prms[nonprm] = true; nonprm += prmstep
  
  # the prms array now has all the positions for primes r1..N
  # extract prime numbers and count from prms into primes array
  primes = @[7]                     # r1 prime for P5
  modk = 0; r = 0
  for prm in prms:
    r += 1; if r > rscnt: (r = 1; modk += md)
    if not prm: primes.add(modk + res[r])
  pcnt = len(primes)

proc next_init() =
  var pos = newSeq[int](modp5)
  for i in 0 .. <rescnt: pos[residues[i]] = i-1
  pos[1] = rescnt-1
  
  # for each prime store resgroup on each restrack for prime*(modk+ri)
  for j, prime in primes:
    let k = uint((prime-2)) div uint(modp5)       # find the resgroup it's in
    let r = uint((prime-2)) mod uint(modp5) + 2   # its base residue value
    for ri in residues[1 .. rescnt]:  # for each residue value
      let prod: int = int(r) * ri                 # create residues cross-product r*ri
      let row:  int = pos[prod mod modp5] * pcnt  # create residue track address
      next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)

proc segsieve(Kn: int) =
                                      # for Kn resgroups in segment
  for b in 0 .. <Kn:                  # for every byte in the segment
    seg[b] = 0                        # set every byte bit to prime (0)
  for r in 0 .. <rescnt:              # for each ith (of 8) residues for P5
    let biti = uint8(1 shl r)         # set the ith residue track bit mask
    let row  = r*pcnt                 # set address to ith row in next[]
    for j in 0 .. <pcnt:              # for each prime <= sqrt(N) for restrack
      if next[row+j] < uint(Kn):      # if 1st mult resgroup index <= seg size
        var k: int = int(next[row+j]) # get its resgroup value
        let prime = primes[j]         # get the prime
        while k < Kn:                 # for each primenth byte < segment size
          seg[k] = seg[k] or biti     # set ith residue in byte as nonprime
          k += int(prime)
        next[row+j] = uint(k - Kn)    # 1st resgroup in next eligible segment
      else: next[row+j] -= uint(Kn)   # if 1st mult resgroup index > seg size
                                      # count the primes in the segment
  for byt in seg[0..<Kn]:             # for the Kn resgroup bytes
    primecnt += uint(pbits[byt])      # count the '0' bits as primes

proc printprms(Kn: int, Ki: uint64) =
  
  # Extract and print the primes in each segment:
  # recommend piping output to less: ./ssozpxxx | less
  # can then use Home, End, Pgup, Pgdn keys to see primes
  for k in 0 .. <Kn:                 # for Kn residues groups|bytes
    for r in 0 .. <8:                # for each residue|bit in a byte
      if (int(seg[k]) and (1 shl r)) == 0:   # if it's '0' it's a prime
        write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
  #echo "\n"

proc ssozp5() =
  
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # find primes <= val (7..2^64-1)
  
  let B = 262144                    # L2D_CACHE_SIZE 256*1024 bytes
  let KB = B                        # number of segment resgroups
  seg = newSeq[uint8](B)            # create segment array of B bytes size
  
  writeLine(stdout, "segment has ", B, " bytes and ", KB, " residues groups")
  
  let num = uint64((val-1) or 1)    # if val even subtract 1
  var k: uint64 = num div modp5     # resgroup of num
  var modk = uint64(modp5) * k      # base value of num
  var r = 1                         # starting with first residue
  while num >= modk+uint64(residues[r]): r += 1  # find last pc position <= num
  let maxpcs = k * uint64(rescnt) + uint64(r-1)  # maximum number of prime candidates
  let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for val
  
  writeLine(stdout, "prime candidates = ", maxpcs, "; resgroups = ", Kmax)
  
  let sqrtN = int(sqrt float64(num))
  
  sozP7(sqrtN)                     # get pcnt and primes <= sqrt(nun)
  
  writeLine(stdout, "create next[", rescnt, "x", pcnt, "] array")
  
  next = newSeq[uint64](rescnt*pcnt)  # create the next[] array
  next_init()                      # load with first nonprimes resgroups
  
  primecnt = 3                     # 2,3,5 the P5 excluded primes count
  var Kn: int = KB                 # set sieve resgroups size to segment size
  var Ki = 0'u64                   # starting resgroup index for each segment
  
  echo "perform segmented SoZ\n";
  
  while Ki < Kmax:                 # for KB size resgroup slices up to Kmax
    if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki)  # set sieve resgroups size for last segment
    segsieve(Kn)                   # sieve primes for current segment
    #printprms(Kn,Ki)              # print primes for the segment (optional)
    Ki += uint64(KB)               # next segment slice
  
  var lprime = 0'u64               # get last prime and primecnt <= val
  modk = uint64(modp5)*(Kmax-1)    # mod for last resgroup in last segment
  var b = Kn-1                     # num bytes to last resgroup in segment
  r = rescnt-1                     # from last restrack in resgroup
  while true:                      # repeat until last prime determined
    if (int(seg[b]) and (1 shl r)) == 0:
      lprime = modk+uint64(residues[r+1])  # determine the prime value
      if lprime <= num: break      # if <= num exit from loop with lprime
      primecnt -= 1                # else reduce total primecnt
                                   # reduce restrack, setup next resgroup
    r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
  
  writeLine(stdout, "last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
  writeLine(stdout, "last prime (of ", primecnt, ") is ", lprime)

ssozp5()

2017-08-21 04:34:40
I don't have time right at this very moment, but I will take a look at this within the next few days and see if there are any changes I can make to get a faster runtime. Love all your comments by the way. 2017-08-21 15:55:57

I cleaned up the code and added more commments. It should be much easier now to uderstand what/why the code is doing after reading my paper.

Here is the gist location of the source code for this version.

https://gist.github.com/jzakiya/94670e6144735eb0041919f633d6011c

I've created a rearchitected version to run in parallel, and in fact I got it to compile, and run with correct results, but is 2x slower than the serial version. If anyone knows how to really create parallel running programs please let me know.

I will create a github project for all the different versions I|others create, for different languages, prime generators, parallel versions, etc. So far I have various C++ versions, and now this Nim version, and I plan to do a Crystal version too.

If people want to, you can contact me directly via email, which is included in my pape and the code intro.

I would urge people who run the code to play with the B value for their system cache. I've only run this code on Intel I5|7 cpu laptops. I'm interested how the code runs on other systems (AMD, ARM, PowerPC, etc) and operarting systems (I've only used Linux).

Here's the code in the gist file.

#[
 This Nim source file will compile to an executable program to
 perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
 It is based on the P5 Strictly Prime (SP) Prime Generator.
 
 Prime Genrators have the form:  mod*k + ri; ri -> {1,r1..mod-1}
 The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
 For P5, mod = 2*3*5 = 30 and the number of residues are
 rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
 
 Adjust segment byte length parameter B (usually L1|l2 cache size)
 for optimum operational performance for cpu|system being used.
 
 Verified on Nim 0.17, using clang (smaller exec) or gcc
 
 $ nim c --cc:[clang|gcc] --d:release  ssozp5.nim
 
 Then run executable: $ ./ssozp5 <cr>, and enter value for N.
 As coded, input values cover the range: 7 -- 2^64-1
 
 Related code, papers, and tutorials, are downloadable here:
 
 http://www.4shared.com/folder/TcMrUvTB/_online.html
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2017/08/23
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math                   # for sqrt function
import strutils, typetraits   # for number input

# Global var used to count number of primes in 'seg' variable
# Each value is number of '0' bits (primes) for values 0..255
let pbits = [
 8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]

# The global residue values for the P5 prime generator.
let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]

# Global parameters
const
  modp5 = 30           # mod value for P5 prime generator
  rescnt = 8           # number of residues for P5 prime generator
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  primecnt = 0'u64     # number of primes <= N
  next: seq[uint64]    # table of regroups vals for primes multiples
  primes: seq[int]     # list of primes r1..sqrt(N)
  seg: seq[uint8]      # segment byte array to perform ssoz

# This routine is used to compute the list of primes r1..sqrt(input_num),
# stored in global var 'primes', and its count stored in global var 'pcnt'.
# Any algorithm (fast|small) can be used. Here the SoZ using P7 is used.
proc sozp7(val: int | int64) =      # Use P7 prime gen to finds upto val
  let md = 210                      # P7 mod value
  let rscnt = 48                    # P7 residue count and residues list
  let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
          , 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
          ,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
  
  var posn = newSeq[int](210)              # small array (hash) to convert
  for i in 0 .. <rscnt: posn[res[i]] = i-1 # (x mod md) to values -1,0,1,2..6
  
  let num = (val-1) or 1            # if val even then subtract 1
  var k = num div md                # compute its residue group val
  var modk = md * k                 # compute its base num
  var r = 1                         # starting with first residue
  while num >= modk+res[r]: r += 1  # find last pc position <= num
  let maxprms = k*rscnt + r - 1     # maximum number of prime candidates
  var prms = newSeq[bool](maxprms)  # array of prime candidates set False
  
  let sqrtN = int(sqrt float64(num))
  modk = 0; r = 0; k = 0
  
  # sieve to eliminate prime multiples from list of pcs r1..sqrtN
  for prm in prms:                  # from list of pc positions in prms
    r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
    if prm: continue                # if pc not prime go to next location
    let res_r = res[r]              # if prime save residue of prime value
    let prime = modk + res_r        # numerate the prime value
    if  prime > sqrtN: break        # exit if > sqrtN
    let prmstep = prime * rscnt     # prime's stepsize to eliminate its mults
    for ri in res[1..rscnt]:        # eliminate prime's multiples from prms
      let prod = res_r * ri         # residues cross-product for this prime
      # compute resgroup val of 1st prime multiple for each restrack
      # starting there, mark all prime multiples on restrack upto end of prms
      var prm_mult = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
      while prm_mult < maxprms: prms[prm_mult] = true; prm_mult += prmstep
  
  # prms now contains the nonprime positions for the prime candidates r1..N
  # extract primes into global var 'primes' and count into global var 'pcnt'
  primes = @[7]                     # r1 prime for P5
  modk = 0; r=0
  for prm in prms:                  # numerate primes from processed pcs list
    r += 1; if r > rscnt: (r = 1; modk += md)
    if not prm: primes.add(modk + res[r])  # put prime in global 'primes' list
  pcnt = len(primes)                       # set global count of primes

# This routine initializes the [rescnt x pcnt] global var 'next' table with the
# resgroup values of the 1st prime multiples for each prime r1..sqrtN along the
# restracks for the 8 P5 residues (7, 11..29, 31). 'next' is used to eliminate
# prime multiples in 'seg' for each SSoZ segment, and is updated with new
# resgroup values for each prime for use with subsequent segment iterations.
proc next_init() =
  var pos = newSeq[int](modp5)                    # create small array (hash)
  for i in 0 .. <rescnt: pos[residues[i]] = i-1   # to convert an (x mod modp5)
  pos[1] = rescnt-1                               # val into restrack val 0..7
  
  # load 'next' with 1st prime multiples regroups vals along each residue track
  for j, prime in primes:                         # for each prime r1..sqrt(N)
    let k = uint((prime-2)) div uint(modp5)       # find the resgroup it's in
    let r = uint((prime-2)) mod uint(modp5) + 2   # and its residue value
    for ri in residues[1 .. rescnt]:              # for each prime|residue pair
      let prod: int = int(r) * ri                 # compute res cross-product r*ri
      let row:  int = pos[prod mod modp5] * pcnt  # compute residue track address
      # compute|store resgroup val of 1st prime multiple for prime|ri residue pair
      next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)

# This routine performs the segment sieve for a segment of Kn resgroups|bytes.
# Each 'seg' bit represents a residue track of Kn resgroups, which are processed
# sequentially. 'next' resgroup vals are used to mark prime multiples in 'seg' along
# each restrack, and is udpated for each prime for the next segment. Upon completion
# the number of primes ('0' bits) per 'seg' byte are added to global var 'primecnt'.
proc segsieve(Kn: int) =              # for Kn resgroups in segment
  for b in 0 .. <Kn: seg[b] = 0       # initialize seg bits to all prime '0'
  for r in 0 .. <rescnt:              # for each ith (of 8) residues for P5
    let biti = uint8(1 shl r)         # set the ith residue track bit mask
    let row  = r * pcnt               # set address to ith row in next[]
    for j, prime in primes:           # for each prime r1..sqrt(N)
      if next[row + j] < uint(Kn):    # if 1st mult resgroup index <= seg size
        var k = int(next[row + j])    # get its resgroup value
        while k < Kn:                 # for each primenth byte < segment size
          seg[k] = seg[k] or biti     # set resgroup's restrack bit to '1' (nonprime)
          k += int(prime)             # compute next prime multiple resgroup
        next[row + j] = uint(k - Kn)  # 1st resgroup in next eligible segment
      else: next[row + j] -= uint(Kn) # do if 1st mult resgroup index > seg size
                                      # count the primes in the segment
  for byt in seg[0..<Kn]:             # for the Kn resgroup bytes
    primecnt += uint(pbits[byt])      # count the '0' bits as primes

# Print segment primes. Can pipe output as: $ ./ssozp5 | less (or output file)
# to make viewing primes easier
proc printprms(Kn: int, Ki: uint64) = # print out prime values in segment
  for k in 0 .. <Kn:                  # for Kn residues groups|bytes
    for r in 0 .. <8:                 # for each restrack bit in byte
      if (int(seg[k]) and (1 shl r)) == 0:  # if it's '0' it's a prime
        write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
  #echo "\n"

proc ssozp5() =                    # Perform SSoZ using P5 prime generator
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # input integer val (7..2^64-1)
  
  const B = 256 * 1024             # adjust size to optimize for system's cache
  let KB = B                       # number of segment resgroups
  seg = newSeq[uint8](B)           # create segment array of B bytes size
  
  echo("segment has ", B, " bytes and ", KB, " residues groups")
  
  let num = uint64((val-1) or 1)   # if val even subtract 1
  var k: uint64 = num div modp5    # compute its resgroup val
  var modk = uint64(modp5) * k     # compute its base num
  var r = 1                        # starting with first residue
  while num >= modk+uint64(residues[r]): r += 1  # find last pc position <= num
  let maxpcs = k * uint64(rescnt) + uint64(r-1)  # maximum number of prime candidates
  let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for num
  
  echo("prime candidates = ", maxpcs, "; resgroups = ", Kmax)
  
  let sqrtN = int(sqrt float64(num))
  
  sozP7(sqrtN)                     # get pcnt and primes <= sqrt(nun)
  
  echo("create next[", rescnt, "x", pcnt, "] array")
  
  next = newSeq[uint64](rescnt*pcnt)  # create size of the global next[] table
  next_init()                      # load with first prime multiples resgroups
  
  primecnt = 3                     # 2,3,5 the P5 excluded primes count
  var Kn: int = KB                 # set sieve resgroups size to segment size
  var Ki = 0'u64                   # starting resgroup index for first segment
  
  echo("perform segmented SoZ")
  
  while Ki < Kmax:                 # for KB size resgroup slices up to Kmax
    if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki) # to set last segment size
    segsieve(Kn)                   # sieve primes for current segment
    #printprms(Kn,Ki)              # print primes for the segment (optional)
    Ki += uint64(KB)               # first resgroup index for next seg slice
  
  var lprime = 0'u64               # get last prime and primecnt <= val
  modk = uint64(modp5) * (Kmax-1)  # mod for last resgroup in last segment
  var b = Kn-1                     # number of bytes|resgroup in last segment
  r = rescnt-1                     # set to last restrack in resgroup
  while true:                      # repeat until last prime determined
    if (int(seg[b]) and (1 shl r)) == 0:
      lprime = modk + uint64(residues[r+1])  # numerate the prime value
      if lprime <= num: break      # if <= num exit from loop with lprime
      primecnt -= 1                # else reduce primecnt if prime > num
                                   # reduce restrack, setup next resgroup
    r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
  
  echo("last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
  echo("total primes = ", primecnt, "; last prime = ", lprime)

ssozp5()

2017-08-23 17:25:12
<<<••1234••>>>