Need optimization tips for a subset sum like problem with a big constraint

Question

Given a number 1 <= N <= 3*10^5, count all subsets in the set {1, 2, ..., N-1} that sum up to N. This is essentially a modified version of the subset sum problem, but with a modification that the sum and number of elements are the same, and that the set/array increases linearly by 1 to N-1.

I think i have solved this using dp ordered map and inclusion/exclusion recursive algorithm, but due to the time and space complexity i can't compute more than 10000 elements.

#include <iostream>
#include <chrono>
#include <map>
#include "bigint.h"


using namespace std;

//2d hashmap to store values from recursion; keys- i & sum; value- count
map<pair<int, int>, bigint> hmap;

bigint counter(int n, int i, int sum){

    //end case
    if(i == 0){ 
        if(sum == 0){
            return 1;
        }
        return 0;
    }

    //alternative end case if its sum is zero before it has finished iterating through all of the possible combinations
    if(sum == 0){
        return 1;
    }

    //case if the result of the recursion is already in the hashmap
    if(hmap.find(make_pair(i, sum)) != hmap.end()){
        return hmap[make_pair(i, sum)];
    }

    //only proceed further recursion if resulting sum wouldnt be negative
    if(sum - i < 0){
        //optimization that skips unecessary recursive branches
        return hmap[make_pair(i, sum)] = counter(n, sum, sum);
    }
    else{
                                        //include the number   dont include the number
        return hmap[make_pair(i, sum)] = counter(n, i - 1, sum - i) + counter(n, i - 1, sum);
    }
}

The function has starting values of N, N-1, and N, indicating number of elements, iterator(which decrements) and the sum of the recursive branch(which decreases with every included value).

This is the code that calculates the number of the subsets. for input of 3000 it takes around ~22 seconds to output the result which is 40 digits long. Because of the long digits i had to use an arbitrary precision library bigint from rgroshanrg, which works fine for values less than ~10000. Testing beyond that gives me a segfault on line 28-29, maybe due to the stored arbitrary precision values becoming too big and conflicting in the map. I need to somehow up this code so it can work with values beyond 10000 but i am stumped with it. Any ideas or should i switch towards another algorithm and data storage?

This is [A000009](http://oeis.org/A000009) I think (various ways to compute it are known). Do you need to provide the answer modulo `P` (then you can reduce intermediate results modulo `P` as well, keeping the numbers small)? That is commonly the case for programming challenges with big integer results and this looks like one of those. — harold, Jan 28 '23 at 20:35
`map, bigint> hmap;` -- This is not a hash map. The hash map in C++ is `std::unordered_map`. Second, given [this answer](https://stackoverflow.com/questions/67630357/c-factorial-of-number-100/67630658#67630658), try the boost multiprecision library instead of the one you chose. — PaulMcKenzie, Jan 28 '23 at 20:44
Yes sorry i forgot map is ordered and not a hash map. For an inputted N the answer must be the number of possible combinations, or subsets that sum up to N, so i think there is no modulo P but just raw, big integer result — svok, Jan 28 '23 at 22:35
Consider asking this on Code Review Stack Exchange: https://codereview.stackexchange.com/help/on-topic — Solomon Ucko, Jan 28 '23 at 23:03
After reading some time later the A000009 sequence is indeed the one i need to compute. Albeit the site is a little poorly constructed for me i dont know where to find the computation algorithms — svok, Jan 29 '23 at 00:45

score 2 · Accepted Answer · answered Jan 29 '23 at 17:35

Here is a different algorithm, described in a paper by Evangelos Georgiadis, "Computing Partition Numbers q(n)":

std::vector<BigInt> RestrictedPartitionNumbers(int n)
{
    std::vector<BigInt> q(n, 0);
    // initialize q with A010815
    for (int i = 0; ; i++)
    {
        int n0 = i * (3 * i - 1) >> 1;
        if (n0 >= q.size())
            break;
        q[n0] = 1 - 2 * (i & 1);
        int n1 = i * (3 * i + 1) >> 1;
        if (n1 < q.size())
            q[n1] = 1 - 2 * (i & 1);
    }
    // construct A000009 as per "Evangelos Georgiadis, Computing Partition Numbers q(n)"
    for (size_t k = 0; k < q.size(); k++)
    {
        size_t j = 1;
        size_t m = k + 1;
        while (m < q.size())
        {
            if ((j & 1) != 0)
                q[m] += q[k] << 1;
            else
                q[m] -= q[k] << 1;
            j++;
            m = k + j * j;
        }
    }
    return q;
}

It's not the fastest algorithm out there, and this took about half a minute for on my computer for n = 300000. But you only need to do it once (since it computes all partition numbers up to some bound) and it doesn't take a lot of memory (a bit over 150MB).

The results go up to but excluding n, and they assume that for each number, that number itself is allowed to be a partition of itself eg the set {4} is a partition of the number 4, in your definition of the problem you excluded that case so you need to subtract 1 from the result.

Maybe there's a nicer way to express A010815, that part of the code isn't slow though, I just think it looks bad.

What bigint library is this? Keep getting error "no match for 'operator<<' (operand types are '__gnu_cxx::__alloc_traits, bigint>::value_type' {aka 'bigint'} and 'int')" at lines 23 and 24 (q[m] += q[k] << 1; and q[m] -= q[k] << 1;) — svok, Jan 29 '23 at 20:29
@svok you can replace `q[k] << 1` with `q[k] + q[k]` or `q[k] * 2`, but I think it's a little silly for that bigint to not implement left shifts.. — harold, Jan 29 '23 at 20:32
Actually I just took a look at [the bigint library you were using](https://github.com/rgroshanrg/bigint) and it looks very silly. Based on decimal numbers stored in `std::string`.. yikes. That's not even a good plan when converting to and from decimal strings is very important. — harold, Jan 29 '23 at 20:38
Can you give me the link to the library you are using? Im stuck still computing 30000 for like 10 minutes — svok, Jan 29 '23 at 21:09
@svok my own so there's no link, you could try Boost multiprecision — harold, Jan 29 '23 at 21:15
Changing to boost solved it. Marking it as accepted but, could there be a way to solve it recursively or half recursively? Since that is one of the requirements of the problem — svok, Jan 29 '23 at 22:54

Need optimization tips for a subset sum like problem with a big constraint

1 Answers1