0

I have written a class that generates random TCP ports that meet two criteria: 1, the port isn't being used by other processes; 2, the port hadn't been assigned by the object before.

To solve the problem, I created three sets, set 1 is a class attribute that won't change, it is the constant set representing all TCP ports from 1 to 65535.

Set 2 is dynamically created using a staticmethod, that contains all listening ports at the runtime of the staticmethod.

Set 3 is an instance attribute that is initially empty.

When the method that gets random ports is called, it calculates set1 - set2 - set3 and randomly chooses from the result and adds the choice to set3.

The code:

import psutil
import random

class Port_Getter:
    TOTAL = set(range(1, 65536))
    @staticmethod
    def busyports():
        return set(i.laddr.port for i in psutil.net_connections())
    
    def __init__(self):
        self.assigned = set()
    
    def randomport(self):
        available = list(Port_Getter.TOTAL - Port_Getter.busyports() - self.assigned)
        port = random.choice(available)
        self.assigned.add(port)
        return port

It works, but it is not very performant:

In [214]: p = Port_Getter()

In [215]: %timeit [p.randomport() for i in range(32)]
144 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

I understand the list conversion is likely the culprit that slows down the function, however random.choice() doesn't work with sets and random.sample() with sets is deprecated and not very fast either.

Using a list will eliminate the need of conversion for random.choice(), however I need to either convert the lists back to sets to get the difference or use list comprehension to get the difference, both methods will slow down the code.

So what is a more efficient way to get the same results? I think it might be a third party library that can randomly choose from sets, but I really don't know what can do this.

This is not a duplicate of Python list subtraction operation, however this does indeed contain certain facets from that question.

Ξένη Γήινος
  • 2,181
  • 1
  • 9
  • 35
  • 1
    1st: measure where your times is being spent - might be the list() maybe it is psutil.net_connections(). 2nd: if `a = Port_Getter.TOTAL - Port_Getter.busyports() - self.assigned` is "sufficiently big" it may be faster to draw `c = choice(range(1,65536))` and check `if c in a: return c` then create the whole list (even 20 * O(1) checks may outperform 1 listgeneration). You could try something to that effect if `list(Port_Getter.TOTAL - Port_Getter.busyports() - self.assigned)` is really your culprit – Patrick Artner Aug 19 '21 at 11:23
  • Does it truly have to be a *random* choice from the set, or would an *arbitrary* choice be good enough? Arbitrary is easy - `S.pop()`, or `next(iter(S))` if you want a non-destructive choice – jasonharper Aug 21 '21 at 18:44

2 Answers2

1

The time complexity for turning a set into a list (or the other way around) is O(n), where as the complexity to check whether the element is in either busyports() or assigned is O(1) each. Additionally you have O(1) for guessing a new element.

That means as long as you have more than 3 of your ports available, guessing randomly and checking membership is faster in expectation.

TOTAL = range(1, 65536)
def randomport(self):
    port = random.choice(Port_Getter.TOTAL)
    busy_ports = Port_Getter.busyports()
    while port in busy_ports or port in self.assigned:
        port = random.choice(Port_Getter.TOTAL)
    self.assigned.add(port)
    return port
Lukas Schmid
  • 1,895
  • 1
  • 6
  • 18
  • "That means as long as you have more than 3 of your ports available, guessig randomly and checking membership is faster in expectation". I'm not very good at probability, but I'm a bit skeptical about this line of reasoning. Even if checking each randomly guessed value is `O(1)`, the probability of that guessed value being already busy or assigned would increase progressively, as the number of available ports decreases. Which means, more expected iterations of that `while` loop, which means more expected time to produce a valid random guess. – fountainhead Aug 20 '21 at 16:59
  • It's not that the chance is very good, it's just that creating a list from a set is very bad. It's a `geometric distribution`, so assuming you only have 3/n ports free to choose it'll take you n/3 tries on average to find a unused port. `Set` membership lookups are `O(1)` due to the hashmap used. Since it's (about) 3 operations to test membership and get a new testee that's the limit where making a set and then a list becomes better – Lukas Schmid Aug 21 '21 at 17:55
0

Perhaps you could use a list for the all the remaining ports and shuffle that. Then just pick (and remove) the next one to get a random unused value. Using a deque would make that even more efficient.

from collections import deque
from random import sample

busy      = {1,2,3} # use your busyport() method here
remaining = deque(sample([p for p in range(65536) if p not in busy],65536-len(busy)))

You can then get a random unused port using port = remaining.pop()

If you need to take into account dynamic changes in the busy port list (i.e. usage by other processes), you will need to check for membership and skip to the next random port when the selected port is busy. This will add extra overhead at each call to randomPort but makes the initialization code somewhat simpler:

class Port_Getter:            
    REMAINING = deque(sample(list(range(65536)),65536))

    def randomPort(self):
        while True:
            port = Port_Getter.REMAINING.pop()
            if port in Port_Getter.busyports(): continue
            return port

Port_Getter.REMAINING.pop() will never return the same port twice so you don't need to check against an assigned set (although you may want to keep track of the assigned ports for some other reason).

Note that it may be more efficient to use the 1st approach and just get a new random port if binding fails (ask for forgiveness rather than permission). Given that you'll never get the same port twice, only other processes can interfere so binding should succeed on the first try most of the time

Alain T.
  • 40,517
  • 4
  • 31
  • 51