1

I am looking to convert a string like "ASDF" into one number like "123456789" that can easily be turned back into "ASDF", how would I go about doing this in python 3?

PyPylia

4 Answers

3

You can convert each character into its ASCII value using ord, format each value to the same length using f"{ord(c):03d}", and concatenate the results (with a leading 1 so that leading zeros are not lost). To get the string back, reverse the process.

Something like

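def to_num(s):
    # Pad each character code to 3 digits; the leading "1" stops int() from dropping leading zeros.
    return int("1" + "".join(f"{ord(c):03d}" for c in s))

def to_str(n):
    num = str(n)[1:]  # drop the leading "1" marker
    # Split the digits back into 3-digit chunks, one per character.
    chunks = [num[i:i+3] for i in range(0, len(num), 3)]
    return "".join(chr(int(chunk)) for chunk in chunks)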
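
The fixed 3-digit width only covers character codes below 1000. If larger code points matter, one possible variation is to pad to 7 digits, since the largest Unicode code point (1114111) has 7 digits. This is only a sketch; WIDTH, to_num_wide and to_str_wide are names made up for illustration:

```python
WIDTH = 7  # assumption: 7 digits is enough for any code point up to 0x10FFFF (1114111)

def to_num_wide(s):
    # Zero-pad every code point to WIDTH digits; the leading "1" protects leading zeros.
    return int("1" + "".join(f"{ord(c):0{WIDTH}d}" for c in s))

def to_str_wide(n):
    digits = str(n)[1:]  # drop the leading "1" marker
    # Read the digits back WIDTH at a time, one chunk per character.
    return "".join(chr(int(digits[i:i + WIDTH])) for i in range(0, len(digits), WIDTH))

text = "ASDF \u20ac"  # includes the euro sign, code point 8364
print(to_str_wide(to_num_wide(text)) == text)
```

The price is a much longer number: 7 decimal digits per character instead of 3.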
RedKnite
  • 1
    Just tested out your code, it doesn't seem to work? https://imgur.com/a/ix7BOIy I am using python 3.7.5 (Also ignore the background, I have my cmd semi-transparent). – PyPylia May 17 '20 at 08:30
  • 1
    @LiamBogur Thank you! I was formatting it to always have 2 digits not 3. It should work now. – RedKnite May 17 '20 at 08:44
3

How about that:

int.from_bytes("ASDF".encode('utf-8'), byteorder='big', signed=False)
1095976006

and back

import math
(1095976006).to_bytes(math.ceil((1095976006).bit_length() / 8), byteorder = 'big', signed=False).decode('utf-8')
'ASDF'

It uses encode/decode to get the UTF-8 representation of the Python string as a byte array, and then from_bytes/to_bytes to convert it to and from an integer, so it works with any string. Surprisingly, you have to calculate the number of bytes yourself in order to use to_bytes.
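
Wrapped into two helpers, the same approach could look like the sketch below. The names str_to_int and int_to_str are made up here, and the byte count uses integer arithmetic instead of math.ceil. One caveat to be aware of: leading "\x00" characters are lost, because leading zero bytes do not change the integer.

```python
def str_to_int(s):
    # UTF-8 bytes of the string, read as one big-endian unsigned integer.
    return int.from_bytes(s.encode("utf-8"), byteorder="big")

def int_to_str(n):
    # Smallest number of whole bytes that can hold n (0 bytes when n == 0).
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, byteorder="big").decode("utf-8")

n = str_to_int("ASDF")
print(n)              # 1095976006
print(int_to_str(n))  # ASDF
```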

  • I forgot that existed, great answer, and funnily enough, it gives the exact same result as my answer. – PyPylia May 17 '20 at 10:25
  • @LiamBogur Not surprising, this is the exact same logic (as long as we stay within ASCII land). I still wonder, what's the point of the exercise. It's not any kind of conversion: it's just - ok, take string's content as is in memory, that's your number. And thanks for the accept, now I can finally write nasty comments to other people's answers. Ok, not necessarily nasty. – Yaroslav Fyodorov May 17 '20 at 10:32
  • As to why I asked this, I was implementing RC4 myself as a challenge and I wondered how to convert strings to ints (even though what I was using already did it for me), so I tried to figure it out myself for a while and couldn't so I asked this question, then later figured it out and made my answer. – PyPylia May 17 '20 at 10:35
  • I was right! The best way is to let Python do the work! But, I never think of the encoding/decoding stuff. Good job! – RedKnite May 19 '20 at 00:36
  • @RedKnite here encode doesn't even do anything (I think) it's just the way to get to the underlying utf-8 representation of the string. Or maybe it does create new bytes object, who knows – Yaroslav Fyodorov May 19 '20 at 07:05
0

Just figured out a possible way, since I am only using ASCII this should work:

inp = "ASDF" #Input string
bits = 8     #Amount of bits per character

num = 0      #Just some setup value, this will be our encoded number
numstr = [ord(letter) for letter in inp] #Turns the string into a list containing each of the letters ascii code
for numletter in numstr: #Loop over each letter
    num = (num<<bits)+numletter #First shift the current num stored by 8 [In binary 01010101 becomes 0101010100000000] which makes room for the next character, then adds the new character in

print("Encoded:",num) #Print the encoded number

#num = 1234, not included as we've defined it before with the encoding part
#bits = 8, same reason as above
outp = "" #Another setup value, this will be our decoded string
AND = ((2**bits)-1) #A setup constant, we'll need this later, this is a number of just 1's in binary with the length of 8.
while num>=1: #Loop over each character
    numletter = num&AND #Bitwise AND it with the constant we got earlier, this'll extract the last character stored in the number (its lowest 8 bits)
    outp = chr(numletter) + outp #First convert the extracted number to its character, then add the character to the front of the output
    num = num>>bits #Remove the extracted number by shifting everything back by 8

print(outp) #Print the decoded string
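
The same steps can be wrapped in two functions and checked against int.from_bytes. This is only a sketch: encode, decode and the bits parameter are illustrative names, and it assumes every character code fits in the chosen number of bits (true for ASCII with bits=8).

```python
def encode(text, bits=8):
    num = 0
    for ch in text:
        code = ord(ch)
        if code >= 1 << bits:
            # Assumption of this sketch: refuse characters that would overflow their slot.
            raise ValueError(f"{ch!r} does not fit in {bits} bits")
        num = (num << bits) | code  # make room, then drop the new character in
    return num

def decode(num, bits=8):
    mask = (1 << bits) - 1  # `bits` ones in binary
    chars = []
    while num > 0:
        chars.append(chr(num & mask))  # lowest slot holds the last character
        num >>= bits
    return "".join(reversed(chars))

n = encode("ASDF")
print(n, decode(n))
print(n == int.from_bytes(b"ASDF", "big"))
```

Collecting the characters in a list and reversing once avoids rebuilding the output string on every iteration.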
PyPylia
  • What do you guys think of my code? It might not work for UTF-8 but it should still work for things like Unicode as long as you change the bits to the number of bits required, but I don't exactly know how computers deal with Unicode so this might be wrong. – PyPylia May 17 '20 at 08:24
-1

Kinda problematic by definition. Depending on the length of the string, you can easily run out of ints. Unless you mean a number in string representation; then it's theoretically possible, but in practice you will need arithmetic that works on those ints-as-strings. Basically, if you are restricted to ASCII, use a base 26 representation:

"F"*26^0 + "D"*26^1 + "S" * 26^ 2 and so on

Or you can just concatenate the character codes. That's also an option, although it seems rather pointless: you don't really convert the string to a number, you just print its representation in another way.

Or perhaps you are talking about mapping a closed set of strings to integers; then just put your strings in a list and use each string's index as its matching int.
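
For the base 26 idea, here is a rough sketch that only handles the uppercase letters A to Z. It maps A to 1 through Z to 26 (an assumption made here so that leading "A"s survive the round trip); the function names are made up for illustration.

```python
def letters_to_int(s):
    n = 0
    for ch in s:
        if not "A" <= ch <= "Z":
            raise ValueError(f"only A-Z is supported, got {ch!r}")
        # Shift the previous digits one place left in base 26, then add this letter (A=1 ... Z=26).
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def int_to_letters(n):
    chars = []
    while n > 0:
        # Subtract 1 first because the digits run from 1 to 26, not 0 to 25.
        n, r = divmod(n - 1, 26)
        chars.append(chr(ord("A") + r))
    return "".join(reversed(chars))

n = letters_to_int("ASDF")
print(n)                  # 30530
print(int_to_letters(n))  # ASDF
```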

  • 1
    Python, or at least Python 3, has arbitrarily large ints. You can't run out unless your memory does. – RedKnite May 17 '20 at 08:04
  • @RedKnite Still, it can be a kinda big number. But then base 26 is good if you are in ASCII. Ah, I forgot to convert letters to numbers. So, need to add a mapping `letters = {'A': 1, 'B': 2, ...}` `letters["F"]*26**0 + letters["D"]*26**1 + letters["S"]*26**2` I still miss the purpose of the conversion – Yaroslav Fyodorov May 17 '20 at 08:18
  • ints are stored the same way regardless of what base you pass them to int as. And ASCII has more characters than just uppercase letters, but that would be a clever trick if you're just dealing with letters. Though OP only uses capital letters so maybe that would work here :) – RedKnite May 17 '20 at 08:25
  • You actually don't need the dictionary: `int("ASDF", 36)` will result in 503331 – RedKnite May 17 '20 at 08:27
  • 1
    @RedKnite I can say the same about your solution: not every letter is encodable with a 2- or 3-digit number. Think Unicode. So, your split into chunks of 3 will mess things up in decoding – Yaroslav Fyodorov May 17 '20 at 08:30
  • touche - I suppose the best way would be to see how python stores unicode strings internally since everything has to be converted to binary eventually anyway, but I don't want to go find that – RedKnite May 17 '20 at 08:35
  • Remember, I said "ASCII" so while it would probably be best for others if it included Unicode support, as I have stated ASCII that is all I need. – PyPylia May 17 '20 at 08:36
  • @LiamBogur I can't comment on your answer so I will write here: Your code is ok, but in some sense, when you look at it from a low-level perspective, it's kinda funny: basically a string is saved in memory as a sequence of character codes. In Python it's a bit more complicated, but in C this is what a string is - a sequence of bytes where each byte contains an ASCII code (especially if we are in a pre-Unicode world). So, this code does a lot of operations which basically translate to "this is your string in memory" - now treat it as an integer. I ignore byte order and stuff. And the decode is the opposite ... – Yaroslav Fyodorov May 17 '20 at 09:26
  • @LiamBogur ... this is an int in memory - ok, treat each byte as a char. Almost like a nop (no operation). And what a lot of work to do this in Python – Yaroslav Fyodorov May 17 '20 at 09:28
  • @YaroslavFyodorov Well, I know that, but this is basically the best I could have done as I don't currently know of anything that allows you to see the raw C representation of variables in python. – PyPylia May 17 '20 at 09:49
  • @LiamBogur See my new answer that uses underlying representation of strings – Yaroslav Fyodorov May 17 '20 at 10:12