Copyright | (c) Duncan Coutts 2012-2013 |
---|---|
License | BSD-style |
Maintainer | [email protected] |
Stability | stable |
Portability | ghc only |
Safe Haskell | Trustworthy |
Language | Haskell98 |
A compact representation suitable for storing short byte strings in memory.
In typical use cases it can be imported alongside Data.ByteString, e.g.
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as B (ShortByteString, toShort, fromShort)
Other ShortByteString operations clash with Data.ByteString or Prelude functions, however, so they should be imported qualified with a different alias, e.g.
import qualified Data.ByteString.Short as B.Short
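As a minimal sketch of how the two import styles can sit together in one module (the helper names internKey and keyLength are hypothetical, chosen only for illustration):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as B (ShortByteString, toShort, fromShort)
import qualified Data.ByteString.Short as B.Short

-- | Hypothetical helper: copy a key into the compact representation.
-- 'B.toShort' comes from the short-string import that shares the alias B.
internKey :: B.ByteString -> B.ShortByteString
internKey = B.toShort

-- | Hypothetical helper: 'length' would clash with both Prelude and
-- Data.ByteString, so it is reached through the separate alias B.Short.
keyLength :: B.ShortByteString -> Int
keyLength = B.Short.length
```

In GHCi, keyLength (internKey (B.pack [1, 2, 3])) evaluates to 3.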
data ShortByteString
A compact representation of a Word8 vector.
It has a lower memory overhead than a ByteString and does not contribute to heap fragmentation. It can be converted to or from a ByteString (at the cost of copying the string data). It supports very few other operations.
It is suitable for use as an internal representation for code that needs to keep many short strings in memory, but it should not be used as an interchange type. That is, it should not generally be used in public APIs. The ByteString type is usually more suitable for use in interfaces; it is more flexible and it supports a wide range of operations.
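One way to follow this advice is to convert at the module boundary: the exported functions take and return ByteString, while the stored keys are ShortByteString. The sketch below assumes the containers package for Data.Map.Strict; NameTable and its functions are hypothetical names, not part of this library.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Short (ShortByteString, toShort, fromShort)
import qualified Data.Map.Strict as Map

-- | Hypothetical internal table: keys are stored in the compact form.
newtype NameTable = NameTable (Map.Map ShortByteString Int)

emptyTable :: NameTable
emptyTable = NameTable Map.empty

-- | The public interface accepts a ByteString and copies it on insertion.
insertName :: B.ByteString -> Int -> NameTable -> NameTable
insertName name v (NameTable m) = NameTable (Map.insert (toShort name) v m)

-- | Lookups also convert the query key (one O(n) copy per call).
lookupName :: B.ByteString -> NameTable -> Maybe Int
lookupName name (NameTable m) = Map.lookup (toShort name) m

-- | Keys are converted back to ByteString before leaving the module.
names :: NameTable -> [B.ByteString]
names (NameTable m) = map fromShort (Map.keys m)
```

The conversions cost a copy each, so this design pays off mainly when the table is long-lived and holds many short keys.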
With GHC, the memory overheads are as follows, expressed in words and in bytes (words are 4 and 8 bytes on 32-bit or 64-bit machines respectively).
- ByteString unshared: 9 words; 36 or 72 bytes.
- ByteString shared substring: 5 words; 20 or 40 bytes.
- ShortByteString: 4 words; 16 or 32 bytes.
For the string data itself, both ShortByteString and ByteString use one byte per element, rounded up to the nearest word. For example, including the overheads, a length 10 ShortByteString would take 16 + 12 = 28 bytes on a 32-bit platform and 32 + 16 = 48 bytes on a 64-bit platform.
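The arithmetic above can be written out as a small sketch. The function names are hypothetical, and the results are estimates derived only from the figures quoted above, not measurements.

```haskell
-- | Round a byte count up to a whole number of machine words.
-- The first argument is the word size in bytes (4 or 8).
roundUpToWord :: Int -> Int -> Int
roundUpToWord wordBytes n = ((n + wordBytes - 1) `div` wordBytes) * wordBytes

-- | Estimated bytes for a ShortByteString of the given length:
-- 4 words of overhead plus the padded string data.
shortByteStringBytes :: Int -> Int -> Int
shortByteStringBytes wordBytes len = 4 * wordBytes + roundUpToWord wordBytes len

-- | Estimated bytes for an unshared ByteString of the given length:
-- 9 words of overhead plus the padded string data.
unsharedByteStringBytes :: Int -> Int -> Int
unsharedByteStringBytes wordBytes len = 9 * wordBytes + roundUpToWord wordBytes len
```

Here shortByteStringBytes 4 10 gives 28 and shortByteStringBytes 8 10 gives 48, matching the worked example; the same length as an unshared ByteString comes to 48 and 88 bytes respectively.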
These overheads can all be reduced by 1 word (4 or 8 bytes) when the ShortByteString or ByteString is unpacked into another constructor.
For example:
data ThingId = ThingId {-# UNPACK #-} !Int {-# UNPACK #-} !ShortByteString
This will take 1 + 1 + 3 words (the ThingId constructor + unpacked Int + unpacked ShortByteString), plus the words for the string data.
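As a rough sketch of that count, the declaration can be paired with a hypothetical estimator, thingIdWords, which adds the 1 + 1 + 3 words to the padded string data. It assumes the unpacking actually takes place, which normally requires compiling with optimisation.

```haskell
import Data.ByteString.Short (ShortByteString)
import qualified Data.ByteString.Short as Short

-- | The declaration from the text above.
data ThingId = ThingId {-# UNPACK #-} !Int {-# UNPACK #-} !ShortByteString

-- | Hypothetical estimate of the words used by one ThingId value.
-- The first argument is the word size in bytes (4 or 8).
thingIdWords :: Int -> ThingId -> Int
thingIdWords wordBytes (ThingId _ name) =
  1 + 1 + 3 + (Short.length name + wordBytes - 1) `div` wordBytes
```

For a 10-byte name this estimate is 7 words on a 64-bit machine (5 + 2) and 8 words on a 32-bit machine (5 + 3).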
With GHC, the ByteString representation uses pinned memory, meaning it cannot be moved by the GC. This is usually the right thing to do for larger strings, but for small strings using pinned memory can lead to heap fragmentation which wastes space. The ShortByteString type (and the Text type from the text package) use unpinned memory so they do not contribute to heap fragmentation. In addition, with GHC, small unpinned strings are allocated in the same way as normal heap allocations, rather than in a separate pinned area.
toShort :: ByteString -> ShortByteString
O(n). Convert a ByteString into a ShortByteString.
This makes a copy, so it does not retain the input string.
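The copying behaviour is useful when only a small slice of a large ByteString needs to be kept. A minimal sketch, with keepPrefix as a hypothetical helper name:

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Short (ShortByteString, toShort)

-- | Hypothetical helper: keep the first n bytes as a compact copy.
-- 'B.take' alone yields a substring that shares the original buffer;
-- 'toShort' copies those bytes, so the result does not reference it.
keepPrefix :: Int -> B.ByteString -> ShortByteString
keepPrefix n = toShort . B.take n
```

Once the large input is otherwise unreferenced, it can be collected even though the prefix is still in use.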
fromShort :: ShortByteString -> ByteString
O(n). Convert a ShortByteString into a ByteString.
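Because ShortByteString supports few operations, one option is to convert, apply a ByteString function, and convert back. The sketch below uses a hypothetical helper, viaByteString; note that it copies the data twice.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Short (ShortByteString, fromShort, toShort)

-- | Hypothetical helper: run a ByteString transformation on a
-- ShortByteString. Costs one O(n) copy in each direction.
viaByteString :: (B.ByteString -> B.ByteString) -> ShortByteString -> ShortByteString
viaByteString f = toShort . f . fromShort
```

For example, viaByteString B.reverse reverses the bytes of a ShortByteString.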
pack :: [Word8] -> ShortByteString
O(n). Convert a list into a ShortByteString.
unpack :: ShortByteString -> [Word8]
O(n). Convert a ShortByteString into a list.
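A small sketch of pack and unpack in use, converting between String and ShortByteString one byte per character. The names fromLatin1 and toLatin1 are hypothetical, and the encoding step assumes every character has a code point below 256; larger code points are silently truncated by fromIntegral.

```haskell
import Data.Char (chr, ord)
import Data.ByteString.Short (ShortByteString, pack, unpack)

-- | Hypothetical helper: encode a String one byte per character.
-- Assumes code points below 256; others are truncated.
fromLatin1 :: String -> ShortByteString
fromLatin1 = pack . map (fromIntegral . ord)

-- | Hypothetical helper: decode each byte as one character.
toLatin1 :: ShortByteString -> String
toLatin1 = map (chr . fromIntegral) . unpack
```

For ASCII input the two functions are inverses, so toLatin1 (fromLatin1 "hello") gives back "hello".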
empty :: ShortByteString
O(1). The empty ShortByteString.
null :: ShortByteString -> Bool
O(1). Test whether a ShortByteString is empty.
length :: ShortByteString -> Int
O(1). The length of a ShortByteString.
index :: ShortByteString -> Int -> Word8
O(1). ShortByteString index (subscript) operator, starting from 0.
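The description of index does not say what happens for an out-of-range position, so callers may prefer to check bounds themselves. A minimal sketch using null, length and index, with indexMaybe and lastByte as hypothetical helper names:

```haskell
import Data.Word (Word8)
import Data.ByteString.Short (ShortByteString)
import qualified Data.ByteString.Short as Short

-- | Hypothetical helper: bounds-checked indexing.
indexMaybe :: ShortByteString -> Int -> Maybe Word8
indexMaybe s i
  | i >= 0 && i < Short.length s = Just (Short.index s i)
  | otherwise = Nothing

-- | Hypothetical helper: the final byte, if there is one.
lastByte :: ShortByteString -> Maybe Word8
lastByte s
  | Short.null s = Nothing
  | otherwise = Just (Short.index s (Short.length s - 1))
```

Both helpers stay O(1), since each of the underlying operations is O(1).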
© The University of Glasgow and others
Licensed under a BSD-style license (see top of the page).
https://downloads.haskell.org/~ghc/7.10.3/docs/html/libraries/bytestring-0.10.6.0/Data-ByteString-Short.html