43

What is the best way to convert a String to a ByteString in Haskell?

My gut reaction to the problem is

import qualified Data.ByteString as B
import Data.Char (ord)

packStr = B.pack . map (fromIntegral . ord)

But this doesn't seem satisfactory.

Thomas Eding
  • 6
    Modern: You should typically convert `[Char]` to `Text` and `[Word8]` to `ByteString`. It's still `pack` though :) – alternative Apr 04 '12 at 16:01
  • 4
    Converting Unicode to bytes involves using a Unicode encoding. Using `pack` is more similar to an unsafe cast. – tibbe Apr 02 '14 at 09:38

3 Answers

60

Here is my cheat sheet for converting between String, Text, and ByteString (strict and lazy) in Haskell, assuming the desired encoding is UTF-8. The Data.Text.Encoding module provides other encodings as well.

Make sure not to write (with OverloadedStrings enabled):

lazyByteString :: BL.ByteString
lazyByteString = "lazyByteString ä ß" -- BAD!

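This does not UTF-8 encode the literal: the IsString instance for ByteString truncates each character to a single byte, so ä becomes the byte 0xE4 rather than the UTF-8 sequence 0xC3 0xA4. Use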

lazyByteString = BLU.fromString "lazyByteString ä ß" -- good

instead.

String literals of type 'Text' work fine with regard to encoding.
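
To make the difference concrete, here is a minimal sketch that compares the raw bytes of a ByteString literal with those of a Text literal that is then encoded explicitly. It uses only the bytestring and text packages; the names `truncated` and `encoded` are made up for this illustration, and the truncating behaviour is what I expect from the bytestring versions I know of.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import qualified Data.Text as TS
import qualified Data.Text.Encoding as TSE
import Data.Word (Word8)

-- ByteString literal: each Char is truncated to a single byte.
truncated :: [Word8]
truncated = BS.unpack ("ä ß" :: BS.ByteString)

-- Text literal followed by an explicit UTF-8 encoding step.
encoded :: [Word8]
encoded = BS.unpack (TSE.encodeUtf8 ("ä ß" :: TS.Text))

main :: IO ()
main = do
  print truncated  -- [228,32,223]
  print encoded    -- [195,164,32,195,159]
```

The first list holds the Latin-1 code points of the characters, not their UTF-8 encoding, which is why the literal surprises anything that later decodes the bytes as UTF-8.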

Cheat sheet:

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString as BS
import qualified Data.Text as TS
import qualified Data.Text.Lazy as TL
import qualified Data.ByteString.Lazy.UTF8 as BLU -- from utf8-string
import qualified Data.ByteString.UTF8 as BSU      -- from utf8-string
import qualified Data.Text.Encoding as TSE
import qualified Data.Text.Lazy.Encoding as TLE

-- String <-> ByteString

BLU.toString   :: BL.ByteString -> String
BLU.fromString :: String -> BL.ByteString
BSU.toString   :: BS.ByteString -> String
BSU.fromString :: String -> BS.ByteString

-- String <-> Text

TL.unpack :: TL.Text -> String
TL.pack   :: String -> TL.Text
TS.unpack :: TS.Text -> String
TS.pack   :: String -> TS.Text

-- ByteString <-> Text

TLE.encodeUtf8 :: TL.Text -> BL.ByteString
TLE.decodeUtf8 :: BL.ByteString -> TL.Text
TSE.encodeUtf8 :: TS.Text -> BS.ByteString
TSE.decodeUtf8 :: BS.ByteString -> TS.Text

-- Lazy <-> Strict

BL.fromStrict :: BS.ByteString -> BL.ByteString
BL.toStrict   :: BL.ByteString -> BS.ByteString
TL.fromStrict :: TS.Text -> TL.Text
TL.toStrict   :: TL.Text -> TS.Text
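
As a rough illustration of how the entries combine, the sketch below goes from String to a lazy ByteString and back using only the text and bytestring packages. The helper names `stringToLazyBytes` and `lazyBytesToString` are hypothetical. For decoding I swapped in `decodeUtf8'` (also from Data.Text.Encoding), because plain `decodeUtf8` throws an exception on invalid input while `decodeUtf8'` returns an `Either`.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as TS
import qualified Data.Text.Encoding as TSE

-- String -> lazy ByteString, UTF-8 encoded via strict Text.
stringToLazyBytes :: String -> BL.ByteString
stringToLazyBytes = BL.fromStrict . TSE.encodeUtf8 . TS.pack

-- Lazy ByteString -> String; yields Left instead of throwing on invalid UTF-8.
lazyBytesToString :: BL.ByteString -> Either String String
lazyBytesToString bytes =
  case TSE.decodeUtf8' (BL.toStrict bytes) of
    Left err   -> Left (show err)
    Right text -> Right (TS.unpack text)

main :: IO ()
main = do
  let bytes = stringToLazyBytes "hellö♥"
  print (BL.unpack bytes)
  print (lazyBytesToString bytes)
  -- A lone 0xFF byte is never valid UTF-8, so this prints a Left.
  print (lazyBytesToString (BL.pack [0xFF]))
```

Going through strict Text keeps the example short; for large lazy inputs, the TL/TLE variants from the cheat sheet avoid forcing the whole value at once.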

Please +1 Peaker's answer, because he correctly deals with encoding.

cmaher
thetrutz
32

Data.ByteString.UTF8.fromString is also useful. The Char8 version truncates every character to a single byte, losing anything outside the Latin-1 range, while the UTF8 version produces a UTF-8-encoded ByteString. You have to choose one or the other.

Philippe Fanaro
Peaker
  • In case the question comes up: this function is not located by Hoogle because it only indexes a small set of libraries (those shipped with GHC). Expanding the set of libraries indexed by Hoogle has come up several times but hasn't been done, I think due to time constraints of the Hoogle developer (Neil). FYI, the function discussed here is from the utf8-string package. – Thomas M. DuBuisson Jul 13 '10 at 22:18
  • @TomMD: Hayoo addresses this: http://holumbus.fh-wedel.de/hayoo/hayoo.html#0:String%20-%3E%20ByteString – Peaker Jul 14 '10 at 10:24
  • @peaker: Not to my satisfaction. Hayoo does a poor job at type search, particularly when the type is general or polymorphic. – Thomas M. DuBuisson Jul 14 '10 at 14:20
15

A safe approach involves explicitly encoding the Unicode string:

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

packStr'' :: String -> B.ByteString
packStr'' = encodeUtf8 . T.pack

Regarding the other answers: Data.ByteString.Char8.pack is effectively the same as the version in the question, and is unlikely to be what you want:

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Char (ord)

packStr, packStr', packStr'' :: String -> B.ByteString
packStr   = B.pack . map (fromIntegral . ord)
packStr'  = C.pack
packStr'' = encodeUtf8 . T.pack

*Main> packStr "hellö♥"
"hell\246e"
*Main> packStr' "hellö♥"
"hell\246e"
*Main> packStr'' "hellö♥"
"hell\195\182\226\153\165"
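
One way to see that the first two variants lose information is to try to get the original String back. The sketch below is only an illustration: `roundTripChar8` and `roundTripUtf8` are made-up names, and it uses `decodeUtf8` only on bytes that were just produced by `encodeUtf8`, so the decode cannot fail here.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- Pack with Char8 and unpack again: characters above 255 are truncated.
roundTripChar8 :: String -> String
roundTripChar8 = C.unpack . C.pack

-- Encode as UTF-8 and decode again: the original string is recovered.
roundTripUtf8 :: String -> String
roundTripUtf8 = T.unpack . decodeUtf8 . encodeUtf8 . T.pack

main :: IO ()
main = do
  let s = "hellö♥"
  print (roundTripChar8 s == s)               -- False
  print (roundTripUtf8 s == s)                -- True
  print (B.length (C.pack s))                 -- 6
  print (B.length (encodeUtf8 (T.pack s)))    -- 9
```

The byte counts match the GHCi output above: one byte per character for Char8, versus two bytes for ö and three for ♥ under UTF-8.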

Data.ByteString.UTF8.fromString is fine, but requires the utf8-string package, while Data.Text.Encoding comes with the Haskell Platform.

robx