I have the following vector:

import numpy as np
my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])

Could someone suggest a way to randomly generate similar vectors, with just slightly different values? The desired output would be, for instance:

[0.002, -0.06, 0.2, 0.4, 0.02, -0.02]

To give some context, this vector represents a sample that I feed into a classification model. My plan is to randomly generate a set of similar samples and feed them into the same model to observe the variation in its output. The end goal is to verify whether the model generates similar outputs for similar samples.

I tried the approach from "Create random vector given cosine similarity", setting my desired cosine similarity to 1, but with this method I can only obtain one similar vector (see below), and I need at least 10.

def rand_cos_sim(v, costheta):
    # Form the unit vector parallel to v:
    u = v / np.linalg.norm(v)

    # Pick a random vector:
    r = np.random.multivariate_normal(np.zeros_like(v), np.eye(len(v)))

    # Form a vector perpendicular to v:
    uperp = r - r.dot(u)*u

    # Make it a unit vector:
    uperp = uperp / np.linalg.norm(uperp)

    # w is the linear combination of u and uperp with coefficients costheta
    # and sin(theta) = sqrt(1 - costheta**2), respectively:
    w = costheta*u + np.sqrt(1 - costheta**2)*uperp

    return w


new_vector = rand_cos_sim(my_vector, 1)
print(new_vector)

# [ 0.00170622 -0.08531119  0.51186714  0.8531119   0.01706224 -0.05118671]
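
A note on the result above: with costheta = 1, the coefficient sqrt(1 - costheta**2) is 0, so the random perpendicular part vanishes and the function always returns the unit vector along v. That is probably why only one vector comes back. The following is a minimal sketch, assuming a cosine similarity slightly below 1 (0.99 is an arbitrary choice) and an extra rescaling to the norm of the input, which the function above does not do. The rng argument and the seed are additions for illustration only.

```python
import numpy as np

def rand_cos_sim(v, costheta, rng):
    # Unit vector parallel to v.
    u = v / np.linalg.norm(v)
    # Random direction with its component along u removed, then normalized.
    r = rng.standard_normal(len(v))
    uperp = r - r.dot(u) * u
    uperp = uperp / np.linalg.norm(uperp)
    # Unit vector whose cosine similarity with v equals costheta.
    return costheta * u + np.sqrt(1 - costheta**2) * uperp

my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
norm = np.linalg.norm(my_vector)

# Ten distinct vectors, each rescaled to the norm of my_vector.
samples = np.array([norm * rand_cos_sim(my_vector, 0.99, rng) for _ in range(10)])

# Cosine similarity of each sample with my_vector.
cos = samples @ my_vector / (np.linalg.norm(samples, axis=1) * norm)
print(cos)  # every entry is 0.99 up to rounding
```

Lowering costheta further gives samples that differ more from the original.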

I do not have a particular similarity measure in mind; it could be Euclidean, cosine, or whichever works best. Any suggestions are most welcome.

Please note that my_vector is for illustration purposes only; in reality, my vectors will have different ranges of values depending on the data and the model I am testing.

Thank you.

desertnaut
Notna

4 Answers

Maybe I'm oversimplifying, but could you not just generate random vectors of the same size as yours and add them to it to get similar vectors (or add one to the random values and multiply instead, since your example seems to vary less on the smaller numbers)?

import numpy as np
def similar_vector(my_vector):
    # Scale each element by an independent random factor in [0.95, 1.05).
    return (0.95 + np.random.rand(len(my_vector)) * 0.1) * my_vector
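
Since the question asks for at least 10 samples, the same idea can be vectorized to produce them in one call. This is only a sketch: the function name similar_vectors and the parameters n_samples, spread and rng are hypothetical names introduced here, and the 5% spread simply mirrors the factors used above.

```python
import numpy as np

def similar_vectors(my_vector, n_samples=10, spread=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # One row of factors per sample, uniform in [1 - spread, 1 + spread).
    factors = rng.uniform(1 - spread, 1 + spread,
                          size=(n_samples, len(my_vector)))
    # Broadcasting multiplies every row of factors by my_vector.
    return factors * my_vector

my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])
samples = similar_vectors(my_vector, rng=np.random.default_rng(0))
print(samples.shape)  # (10, 6)
```

Because the noise is multiplicative, each element keeps its sign and changes by at most the chosen spread relative to its own size.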
Simon Notley

I think the best way is to add a random number between two values to each element. Look into Python's random module for this purpose.

import numpy as np
import random
my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])

for i in range(len(my_vector)):
    my_vector[i] += random.uniform(.001,.1)

print(my_vector)

You can tune this by tweaking the value range passed to random.uniform.
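
Note that the loop above modifies my_vector in place and, with a range of .001 to .1, only ever adds positive offsets. A possible variant, sketched under the assumption that a symmetric range and an untouched original are wanted, is shown below. The name jitter and the parameters low, high, n_samples and rng are hypothetical, and the ±0.01 range is an arbitrary choice.

```python
import numpy as np

def jitter(vector, low=-0.01, high=0.01, n_samples=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Independent additive offsets, uniform in [low, high), one row per sample.
    noise = rng.uniform(low, high, size=(n_samples, len(vector)))
    # Returns a new array; the input vector is not modified.
    return vector + noise

my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])
original = my_vector.copy()
samples = jitter(my_vector, rng=np.random.default_rng(0))
print(samples.shape)  # (10, 6)
```

With additive noise the same absolute range applies to every element, so small entries such as 0.001 change proportionally far more than large ones.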

angrymantis

You could generate random multiplicative factors by calling numpy.random.lognormal. Use mean=0 and a small value of sigma to generate random values near 1.

For example,

In [23]: my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])

In [24]: a = np.random.lognormal(sigma=0.1, size=my_vector.shape)

In [25]: a
Out[25]:
array([1.07162745, 0.99891183, 1.02511718, 0.85346562, 1.04191125,
       0.87158183])

In [26]: a * my_vector
Out[26]:
array([ 0.00107163, -0.04994559,  0.30753516,  0.42673281,  0.01041911,
       -0.02614745])
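
To get several samples at once, as the question requires, the factors can be drawn as a 2-D array. The sketch below assumes NumPy's Generator API (np.random.default_rng) rather than the legacy np.random.lognormal call used above; the seed and the choice of 10 rows are for illustration only.

```python
import numpy as np

my_vector = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])
rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# One row of lognormal factors per sample; mean=0 centers them around 1.
factors = rng.lognormal(mean=0.0, sigma=0.1, size=(10, my_vector.size))

# Broadcasting scales my_vector by each row of factors.
samples = factors * my_vector
print(samples.shape)  # (10, 6)
```

Lognormal factors are always positive, so every sample keeps the signs of the original vector; a smaller sigma keeps the samples closer to it.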
Warren Weckesser

I'm not a Python programmer, but I think your question can be solved by recording the length of the start vector (vstart), generating a random unit vector (vnew), and then multiplying vnew by the length of vstart, which gives you a vector of the same length. Follow this pseudocode, assuming we're talking about 3D vectors:

// get the length of the start vector
vslength = vector length float(vstart)

// generate new random vector
vnew = new vector(random x, random y, random z)

// convert it to a unit vector (length = 1.0)
vnew = vector normalize (vnew)

// multiply it by the length of vstart
vnew = vnew * vslength

I imagine that in Python there's probably a way to do all of this in one line of code, using APIs and some of the language's built-in functionality.

If you don't need the functionality of a full cosine similarity implementation, this is far simpler and takes much less execution time.
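
The pseudocode above might translate to NumPy roughly as follows. This is a sketch, not the answerer's own code: the function name random_vector_same_length and the rng parameter are hypothetical, and drawing the components from a standard normal distribution is an assumption about what "random x, random y, random z" should mean. Note that it preserves only the length of vstart; the direction is random, so individual elements can differ substantially from the original.

```python
import numpy as np

def random_vector_same_length(vstart, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Get the length (Euclidean norm) of the start vector.
    vslength = np.linalg.norm(vstart)
    # Generate a new random vector with the same number of components.
    vnew = rng.standard_normal(vstart.shape)
    # Convert it to a unit vector (length = 1.0).
    vnew = vnew / np.linalg.norm(vnew)
    # Multiply it by the length of vstart.
    return vnew * vslength

vstart = np.array([0.001, -0.05, 0.3, 0.5, 0.01, -0.03])
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
samples = np.array([random_vector_same_length(vstart, rng) for _ in range(10)])
print(samples.shape)  # (10, 6)
```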