NumPy can parse comma-separated strings, but that is a different task. I want to convert each character of a string into an entry of a NumPy array:
import numpy as np

x = np.frombuffer('fooλ'.encode(), dtype=np.uint8)  # x = [102 111 111 206 187]
But UTF-8 is a variable-width encoding: ASCII characters take one byte, while other Unicode characters take up to four. In this example "λ" (U+03BB) is encoded as two bytes, 206 and 187, so the array has five entries instead of four.
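To illustrate the variable width (the specific characters here are just illustrative examples):

# UTF-8 byte counts grow with the code point:
len('f'.encode('utf-8'))   # 1 byte  (ASCII)
len('λ'.encode('utf-8'))   # 2 bytes (U+03BB)
len('€'.encode('utf-8'))   # 3 bytes (U+20AC)
len('🙂'.encode('utf-8'))  # 4 bytes (U+1F642)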
To get the correct result, ord() works well:
x = np.asarray([ord(c) for c in 'fooλ'])  # x = [102 111 111 955]
But this solution relies on a list comprehension, which is slow because it is not vectorized: the Python interpreter has to call ord() once per character instead of calling a single function on the whole string. Is there a faster way?
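For what it's worth, one idea I have experimented with is a fixed-width encoding, since UTF-32 assigns exactly four bytes to every character ('utf-32-le' is used here to avoid a byte-order mark at the start of the buffer). I am not sure whether this is the idiomatic or fastest approach:

import numpy as np

# UTF-32-LE is fixed-width: every character becomes exactly 4 bytes,
# so the buffer can be reinterpreted directly as 32-bit code points.
x = np.frombuffer('fooλ'.encode('utf-32-le'), dtype=np.uint32)
# x = [102 111 111 955]
# Note: np.frombuffer returns a read-only array; call .copy() if it must be writable.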
Edit: this question is very similar, although my answer is much more concise.