1

I have an array that can vary in size, with n columns and m rows, and I need to find all the combinations of one element for each row/column combination, but exclude any combinations where the element is zero. So, in practice, if I have:

Row Item1 Item2 Item3
1 A B C
2 D E F

I will have 2^3 = 8 possible combinations: ABC, ABF, AEC, AEF, DBC, DBF, DEC, DEF.

But if instead of B I have a zero in row 1 Item2, I want to exclude that cell from the list of combinations (in bold above), so I would end up with: AEC, AEF, DEC and DEF.

I found some code that give me all the possible combinations on a fixed number of columns (Macro to make all possible combinations of data in various columns in excel sheet), but it doesn't account for an array that can change dimensions, or for the exclusion rule above.

Anton
  • 19
  • 4
  • Could a zero be positioned anywhere and would all the possible combinations including a zero have to excluded from the resulting array? Also, include your own attempt at solving your issue. – JvdV Aug 26 '21 at 10:43
  • A column must have at least one non-zero value, other than that yes, the zeroes can be positioned anywhere. In practice the data will be checked before running the code, so it can be assumed the condition above is satisfied. I don't have a working code yet I'm afraid. Once and if I do, I will post it. – Anton Aug 26 '21 at 11:01
  • Have you considered editing the code to match your needs? – Solar Mike Aug 26 '21 at 12:41
  • I don't think a nested loop approach really works for a variable number of rows and columns - I would use some form of counting with a radix equal to the number of rows (in this case 2, i.e. binary) so each digit selects the row from which the next letter is drawn (000=ABC, 001=ABF 010=AEC etc.) – Tom Sharpe Aug 26 '21 at 13:09
  • I would have agreed with this before stumbling on this helpful post [link](https://stackoverflow.com/questions/19780016/vba-write-all-possible-combinations-of-4-columns-of-data) - Tim Williams' reply is close to what I am trying to achieve, but I still need to figure out how to reduce the number of combos by excluding those with a zero. – Anton Aug 26 '21 at 13:19
  • This is a pretty sophisticated piece of code isn't it, not the easiest to get your head round to modify. I might still post something simple using Application.Base - but I see that something has just appeared! – Tom Sharpe Aug 26 '21 at 14:13
  • @Tom Sharpe yes, something just appeared but I would be happy to see a different approach :) – gimix Aug 26 '21 at 14:16
  • Yes, thank you @gimix for your contribution. The biggest challenge I am trying to overcome here is dealing with the dimensionality - hence why I am trying to eliminate certain nodes from the calculation to begin with. – Anton Aug 26 '21 at 15:17
  • @Anton, how many columns/rows do you have in the real use case? I didn't expect dimensions to be an issue – gimix Aug 26 '21 at 15:19
  • In a conservative scenario you can assume 15 by 10 (so 15^10 combos, assuming no zeroes), but it can be more (say max would be 20 x 20). I'm trying to limit the size as much as possible, though the dataset is of such nature that it may not be possible. – Anton Aug 26 '21 at 15:45
  • 15^10 or more, phew, but I guess I can still try my simplistic Base method (Base function allows up to base 36 but I don't have to actually use it). I have it working for the simple case (no zeroes) and will try excluding zeroes next. – Tom Sharpe Aug 26 '21 at 16:19
  • Where are you storing the results? Assuming 15x15, and hence 15-chars strings in the result, you would need about 6 exabytes (that is 6 billion gigabytes). I'd recommend using a different tool: Python for instance has a standard library `product` function (implemented in C) which does what you need, with two important features: it's lightning fast (a couple of microseconds), and it only creates an "iterator" which will return values on demand, so it does not require too much memory – gimix Aug 26 '21 at 17:20
  • For now in memory - I have thought of trying to use Python for the calculation, but have to admit it's a bit outside my realm of abilities. My data is an array of prices and I am trying to find the cheapest combination. This is the target of the code. I am trying to eliminate combos where one product is an obvious outlier and would never be part of the winning combo. I will take a look at the Python function you suggested above and see if I can do something about it. Thank you. – Anton Aug 26 '21 at 17:30

3 Answers3

2

I'm just going to post the code for the simple (no zeroes) case so you can see where I'm going with this (of course I have realised that Base switches over to letters for radix 11 onwards so this might not be the smartest approach :) )

Function ListCombos(r As Range)

    Dim s As String, result As String
    Dim arr()
    Dim j As Integer, offset As Integer
    Dim rows As Integer, cols As Integer
    Dim nComb As Long, i As Long
    
    rows = r.rows.Count
    cols = r.Columns.Count
    
    nComb = rows ^ cols
    ReDim arr(1 To nComb)
    
    For i = 1 To nComb
        s = Application.Base(i - 1, rows, cols)

        result = ""
        For j = 1 To cols
            offset = CInt(Mid(s, j, 1))
            result = result & r.Cells(1, 1).offset(offset, j - 1)
        Next j

        arr(i) = result
    Next i
    
    ListCombos = arr
End Function

This is the version skipping combinations which contain zeroes. The method is to move non-zero values to the first rows of a holding array so effectively if you start with something like this

enter image description here

You make it look like this

enter image description here

So you don't have to generate or check all the combinations that contain zeroes.

Then use mixed radix to cycle through the combinations:

Option Explicit
Option Base 1

Function ListCombosWithZeroes(r As Range)
    Dim s As String, result As String
    Dim arr()
    Dim i As Integer, j As Integer, offset As Integer, count As Integer, carry As Integer, temp As Integer
    Dim rows As Integer, cols As Integer
    Dim nComb As Long, iComb As Long
    Dim holdingArr(20, 20) As String
    Dim countArr(20) As Integer
    Dim countUpArr(20) As Integer
    
    
    rows = r.rows.count
    cols = r.Columns.count
    
    ' Move non-zero cells to first rows of holding array and establish counts per column
    
    For j = 1 To cols
        count = 0
        For i = 1 To rows
            If r.Cells(i, j) <> 0 Then
                count = count + 1
                holdingArr(count, j) = r.Cells(i, j)
            End If
        Next i
        countArr(j) = count
    Next j
                
    
    ' Calculate number of combos
    
    nComb = 1
    For j = 1 To cols
        nComb = nComb * countArr(j)
    Next j
        
    ReDim arr(1 To nComb)
    
    'Loop through combos
    
    For iComb = 1 To nComb

        result = ""
        For j = 1 To cols
            offset = countUpArr(j)
            result = result & holdingArr(offset + 1, j)
        Next j

        arr(iComb) = result
        
        'Increment countup Array - this is the hard part.
        
        j = cols
        
        'Set carry=1 to force increment on right-hand column
        
        carry = 1
        
        Do

            temp = countUpArr(j) + carry
            countUpArr(j) = temp Mod countArr(j)
            carry = temp \ countArr(j)
            j = j - 1
            
        Loop While carry > 0 And j > 0

    Next iComb
    
    ListCombosWithZeroes = arr

End Function

enter image description here

You don't have to have equal numbers of letters per column.

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
  • Thank you Tom - an interesting use of Application.Base, I've never come across this until now. The solution above generates all possible combos as you flagged, including where zeroes are present (the combo would just omit that element, so instead of ABF I have AF). The dimension problem is still there though: is there a way of getting all the possible combos after having eliminated the ones containing zeroes? – Anton Aug 26 '21 at 17:26
  • My idea, for what it's worth, would be to shuffle all of the non-zero values up to the top of a 2D array and just process those using mixed radix so as to generate just the ones that you need. I will have a go at it but it might take a little while. But that's not quite what you said in your last comment, the combo would be omitted altogether (like ABC, ABF,DBE,DBF in your question) rather than just being shortened – Tom Sharpe Aug 26 '21 at 17:32
  • Apologies if I was not clear - in summary I don't even care about the combos that would capture the zeroes. I want to eliminate them before getting all the possible combos, to reduce the dimensionality of the problem. – Anton Aug 26 '21 at 20:49
  • Version skipping combos with zeroes posted. – Tom Sharpe Aug 27 '21 at 17:05
  • 1
    Tom, thank you very much - this does the trick, though it still blows up once I get a to arrays of 10^9, even after eliminating the zeroes. The code works perfectly, it's just the sheer amount of calculations that causes Excel to crash. I guess I need to predetermine a maximum number of possible combos before I launch the execution, and only then go ahead. – Anton Aug 27 '21 at 18:09
1

Here's a solution. Probably not most efficient, since it is O(n2), but it works.

Caveats

  • I put a '.' instead of zero to avoid dealing with numeric vs alphanumeric values, but you can easily change this
  • Since I build the strings incrementally I need indices to be predictable. Hence I fill all the possible combinations and then remove the ones containing a '.' in a second pass
Global aws As Worksheet
Global ur As Range
Global ccount, rcount, size, rptline, rptblock, iblk, iln, idx As Integer
Global tempcombos(), combos() As String
Public Sub Calc_combos()
    Set aws = Application.ActiveSheet
    Set ur = aws.UsedRange
    ccount = ur.Columns.Count
    rcount = ur.Rows.Count
    size = (rcount - 1) ^ (ccount - 1)
    ReDim tempcombos(size - 1)
    ReDim combos(size - 1)
    rptline = size / (rcount - 1)
    rptblock = 1
    For c = 2 To ccount
        idx = 0
        For iblk = 1 To rptblock
            For r = 2 To rcount
                For iln = 1 To rptline
                    tempcombos(idx) = tempcombos(idx) & Cells(r, c)
                    idx = idx + 1
                Next iln
            Next r
        Next iblk
        rptline = rptline / (rcount - 1)
        rptblock = rptblock * (rcount - 1)
    Next c
    idx = 0
    For iln = 0 To size - 1
        If InStr(tempcombos(iln), ".") = 0 Then
            combos(idx) = tempcombos(iln)
            idx = idx + 1
        End If
    Next iln
End Sub
gimix
  • 3,431
  • 2
  • 5
  • 21
  • Thanks @gimix - the code above does indeed work as intended, but it's the size that quickly becomes an issue. As you highlighted, you are first building **all the combos** and then removing the unwanted ones. This quickly runs into overflow problems with Excel. The more I think about it, the more I believe a recursive approach works better, where the unwanted combos are excluded from the onset. – Anton Aug 26 '21 at 15:15
0

The Python way:

from dataclasses import dataclass, field
from itertools import product
from random import randint
from typing import Dict, List

@dataclass
class PriceComparison():
    rows : int
    cols : int
    maxprice : int = 50
    threshold : int = 0
    itemcodes : List[List[str]] = field(init=False)
    pricelist : Dict[str, int] = field(init=False)
    
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [[f'A{r+self.cols*c:03d}' for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print(f'removing {item} from {col}')
                    col.remove(item)
                    del self.pricelist[item]

    def find_cheapest(self):
        iterations = 1
        for col in self.itemcodes:
            iterations *= len(col)
        print(f'this may require {iterations} iterations!')
        cheapest = self.maxprice * self.cols + 1
        for i, combo in enumerate(product(*self.itemcodes)):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < cheapest:
                print(f'current cheapest is {price} at iteration {i}')
                cheapest = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest

Some notes:

  • I assume the cheapest combo is not simply given by selecting the cheapest item in each column, otherwise we would not need all this complicated machinery; so I inserted a random coefficient while calculating the total price of a combo - this should be replaced with the actual formula
  • I also assume we have item codes in our input table, with prices for each item stored elsewhere. As sample data I create codes from 'A000' to 'Axxx', and assign a random price between 0 and a maxprice to each one
  • Items with price = 0 are removed immediately, before the search for the cheapest combo
  • For large input tables the search will take a very long time. So although it wasn't requested I also added an optional threshold parameter: if we find a total price under that value we consider it is cheap enough and stop the search

EDIT

The following is a Python 3.5 compatible version.

However it must be noted that with a 10x15 input table the number of required iterations will be somewhere near 1E+15 (something less actually, depending on how many cells we are able to ignore as "obvious outliers"). Even if we check 1 million combos per second it will still run for (something less than) 1E+09 seconds, or about 32 years.

So we need a way to improve our strategy. I integrated two options:

  • Setting a threshold, so that we don't search for the actual best price but stop as soon as we find an "acceptable" one
  • Splitting the tables in "zones" (subsets of columns), looking for the best partial solution for each zone and then combining them.

Sample runs:

##10 x 15, 5 zones, each 3 columns wide
this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 71 in 0.06 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 2 at iteration 291 in 0.11 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 330 in 0.07 secs

this may require up to 8.100000e+02 iterations!
...
current best price is 4 at iteration 34 in 0.09 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 82 in 0.07 secs
['A000', 'A106', 'A017', 'A033', 'A139', 'A020', 'A051', 'A052', 'A008', 'A009', 'A055', 'A131', 'A147', 'A133', 'A044']

##10 x 15, no zones, threshold = 25
this may require up to 8.100000e+14 iterations!
...
current best price is 24 at iteration 267493282 in 1033.24 secs
under threshold: returning
['A000', 'A001', 'A002', 'A003', 'A004', 'A005', 'A051', 'A052', 'A008', 'A039', 'A055', 'A071', 'A042', 'A133', 'A044']

Code follows:

from itertools import product
from random import randint
from time import time

class PriceComparison():
    def __init__(self, rows, cols, zones = [], maxprice = 50, threshold = 0):
        self.rows = rows
        self.cols = cols
        if zones == []:
            self.zones = [cols]
        else:
            self.zones = zones
        self.maxprice = maxprice
        self.threshold = threshold
        self.__post_init__()
    
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [['A%03d' % (r+self.cols*c) for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print('removing %s from %s' % (item, col))
                    col.remove(item)
                    del self.pricelist[item]

    def find_cheapest(self, lo, hi):
        iterations = 1
        for col in self.itemcodes[lo:hi]:
            iterations *= len(col)
        start = time()
        print('\nthis may require up to %e iterations!' % (iterations))
        bestprice = self.maxprice * self.cols + 1
        for i, combo in enumerate(product(*self.itemcodes[lo:hi])):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < bestprice:
                elapsed = time() - start
                print('current best price is %d at iteration %d in %.2f secs' % (price, i, elapsed))
                cheapest = combo
                bestprice = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest

    def find_by_zones(self):
        print(self.zones)
        fullcombo = []
        lo = 0
        for zone in self.zones:
            hi = lo + zone
            fullcombo += self.find_cheapest(lo, hi)
            lo = hi
        return fullcombo
gimix
  • 3,431
  • 2
  • 5
  • 21
  • You're right with regards to the cheapest combo, there is limit associated with each row, such that it's not always the case that the cheapest price can be selected. It's the cheapest overall combo that fits within the limits. I omitted this from the original explanation as I wanted to keep the question simple. I'll give the above a try. – Anton Aug 27 '21 at 11:13
  • EDIT: I'm getting an error right at the very beginning (see below). It's when it's declaring rows as integer. I'm using Python 3.5. Any ideas? File "", line 8 rows : int ^ SyntaxError: invalid syntax – Anton Aug 27 '21 at 11:22
  • Oh, sorry. Typing should be supported in 3.5, but dataclasses are not, so my code cannot work anyhow. Later I will post a 3.5 compatible version – gimix Aug 27 '21 at 12:36