I need an algorithm (preferably in Python) that converts an arbitrary string into a string containing only characters from the GSM alphabet. I need this filter so that I can send the string as the text of SMS messages. If possible, the algorithm should also replace unencodable characters with their closest encodable equivalent. Examples:
>>> gsm_convert('© all rights reserved')
[copyright sign] all rights reserved
# or
C all rights reserved
>>> gsm_convert('––– long dashes –––')
--- long dashes ---
Python has some built-in ways of doing this, but they also convert the input string to ASCII, which is not what I want: GSM can encode several characters that are not in ASCII.
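To make the desired behaviour concrete, here is a minimal sketch of what I have in mind. It assumes the target is the GSM 03.38 default 7-bit alphabet plus its basic extension table. The `FALLBACKS` table and the `placeholder` parameter are my own illustrative choices, not part of any library, and the fallback list is far from complete.

```python
import unicodedata

# GSM 03.38 default alphabet (assumed target), without the ESC code itself.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters (each costs two septets when encoded).
GSM_EXTENDED = "^{}\\[~]|€"
GSM_CHARS = frozenset(GSM_BASIC + GSM_EXTENDED)

# Hand-picked replacements (illustrative only, extend as needed).
FALLBACKS = {
    "©": "C",
    "–": "-", "—": "-",
    "‘": "'", "’": "'",
    "“": '"', "”": '"',
}


def gsm_convert(text, placeholder="?"):
    """Return text with every non-GSM character replaced or approximated."""
    # Compose first so that e.g. 'e' + combining acute becomes 'é',
    # which GSM can encode directly.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in GSM_CHARS:
            out.append(ch)
        elif ch in FALLBACKS:
            out.append(FALLBACKS[ch])
        else:
            # Decompose and keep only the parts GSM can encode,
            # e.g. 'ê' becomes 'e' once the combining accent is dropped.
            decomposed = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in decomposed if c in GSM_CHARS)
            out.append(kept or placeholder)
    return "".join(out)
```

With this sketch, `gsm_convert('© all rights reserved')` gives `'C all rights reserved'`, the long dashes become plain hyphens, and characters such as `ñ`, `é` and `€` pass through unchanged because GSM can encode them. Anything with no known approximation becomes `?`.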