  • I have a large document corpus, D, which is basically a Python list of n filtered tweets.

    For example, D[0] is "New Exploit to 'Hack Android Phones Remotely' threatens Millions of Devices"

    Also, n is of the order 10^4.

  • Say there's another list, Z, of m = 10 topics, and I wish to randomly assign one topic to each document, where

    Z = ['hack', 'tools', 'android', 'google', 'anonymous', ... ].

How do I go about creating an n x 2 array such that the assignment of topics is (as close as possible to) a truly random process?


Edit:

I'm not sure how to code this. Sorry if the explanation is a little vague, but there isn't much more information to give. I simply want a way to map from Z to D randomly, to obtain an n x 2 array (not an n x m array; that was an honest mistake).

  • It would be helpful if you clarify your question with a simple example using small values of n and m. Also, you should post your own attempt at coding this. – PM 2Ring Mar 18 '16 at 11:09
  • @PM2Ring I've added as much detail as I could. There's not a lot going on in the code itself. I simply want to map from Z to D, *randomly*. –  Mar 18 '16 at 11:45
  • I can show you how to build a Python list of _n_ rows. The _i_ th row consists of _m_ tuples. Each tuple pairs the _i_ th tweet with one of the _m_ topics, in random order. Would that help? – PM 2Ring Mar 18 '16 at 12:05
  • @PM2Ring yes, that should work. I realised that I don't need an n x m matrix at all. –  Mar 18 '16 at 12:13
  • Take a look at [random.choice](https://docs.python.org/3/library/random.html#random.choice); numpy may provide something similar, but I don't know numpy. – PM 2Ring Mar 18 '16 at 12:13
  • Yeah, I was confused by _why_ you wanted a `nxm` array. :) – PM 2Ring Mar 18 '16 at 12:15
  • Turns out there's also a [numpy.random.choice](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html). Worked, thanks! –  Mar 18 '16 at 12:32

1 Answer

I think this is what you are after.

>>> import random
>>> D = [1,2,3,4,5,6,7,8,9]
>>> Z = ['a','b','c','d','e','f','g']
>>> [[i, random.choice(Z)] for i in D]
[[1, 'a'], [2, 'd'], [3, 'c'], [4, 'f'], [5, 'b'], [6, 'g'], [7, 'f'], [8, 'f'], [9, 'f']]

This list comprehension iterates through D (your corpus) and pairs each element with a randomly chosen element of Z (your topics). Each choice is independent, so the same topic can be assigned to several documents.

Tuples might be a better choice than lists for the individual pairs, though, as they are more commonly used to represent a collection of different things; see this answer for when to use lists vs. tuples.
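As a rough sketch of the tuple variant, the same idea can be wrapped in a small helper. The function name `assign_topics` and its `rng` parameter are made up for illustration and are not part of any library. By default `random.choice` draws from Python's pseudo-random generator; passing a seeded `random.Random` makes the assignment reproducible, and `random.SystemRandom` draws from the operating system's entropy source instead, which may be closer to what "truly random" means here.

```python
import random

def assign_topics(docs, topics, rng=None):
    """Pair each document with one topic chosen uniformly at random.

    `rng` is a hypothetical optional argument: any object with a
    `choice` method (e.g. random.Random or random.SystemRandom).
    """
    if rng is None:
        rng = random  # fall back to the module-level generator
    # One (document, topic) tuple per document: an n x 2 structure.
    return [(doc, rng.choice(topics)) for doc in docs]

# Small stand-ins for the real corpus and topic list.
D = ["tweet one", "tweet two", "tweet three", "tweet four"]
Z = ["hack", "tools", "android", "google", "anonymous"]

pairs = assign_topics(D, Z)                             # default PRNG
repro = assign_topics(D, Z, random.Random(42))          # reproducible run
os_pairs = assign_topics(D, Z, random.SystemRandom())   # OS entropy source
```

Each element of `pairs` is a `(tweet, topic)` tuple, so `pairs[i][0]` is the i-th tweet and `pairs[i][1]` is its assigned topic.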

SiHa