目录
FFI:外语函数接口
Haskell 的外语函数接口描述了当需要从其它语言(主要是 C)调用函数时,在 Haskell 中应该怎样做。写起来一般是如下形式:
-
在文件
Foo.hs
中:foreign import ccall unsafe "foo" c_foo :: CInt -> CInt -> IO CInt
-
在文件
foo.c
中:int foo(int x, int y){ ... }
-
在 Cabal 配置文件(proj.cabal)中:
... c-sources: foo.c ...
此外,GHC 也提供了用于处理 C 函数绑定的预处理器:hsc2hs
。
一旦合适的准备工作完成,Cabal 便可以指挥编译过程并返回静态连接的二进制文件。Haskell 的 FFI 细则规定了在 Haskell 端的具体语法。要确保 FFI 调用顺利完成,需要注意以下几个方面:
- 需要自行确保在 Haskell 端和 C 端的类型匹配
- 在 C 端应该如何分配和释放内存
- 函数所涉及副作用的控制
- 安全(safe)和不安全(unsafe)的 FFI 调用的区别,参考此处
除此之外,需要使用正确的调用方式(一般是 ccall
)。如果要调用 C++ 函数,可以通过写一个对应的 C 函数封装。
FFI 类型
下表是一些常用的 FFI 类型和它们的位置,这些类型可以用于在 C 与 Haskell 之间传递。
C 类型和所在头文件 | Haskell 类型和所在模块 | Haskell 类型(启用 UnliftedFFITypes )和所在模块 |
---|---|---|
bool, built-in | CBool, Foreign.C.types | - |
int, built-in | CInt, Foreign.C.types | - |
uint, built-in | CUInt, Foreign.C.types | - |
long, built-in | CLong, Foreign.C.types | - |
ulong, built-in | CULong, Foreign.C.types | - |
uchar, built-in | Word8, Data.Word | - |
char, built-in | Int8, Data.Word | - |
uint8_t, stdint.h | Word8, Data.Word | - |
uint16_t, stdint.h | Word16, Data.Word | - |
uint32_t, stdint.h | Word32, Data.Word | - |
uint64_t, stdint.h | Word64, Data.Word | - |
int8_t, stdint.h | Int8, Data.Int | - |
int16_t, stdint.h | Int16, Data.Int | - |
int32_t, stdint.h | Int32, Data.Int | - |
int64_t, stdint.h | Int64, Data.Int | - |
type *, built-in | Ptr type, Foreign.Ptr | Addr#, GHC.Prim |
HsInt, HsFFI.h | Int, Prelude | Int#, GHC.Prim |
HsWord, HsFFI.h | Word, Prelude | Word#, GHC.Prim |
HsBool, HsFFI.h | Bool, Prelude | - |
double, built-in | Double, Prelude | Double#, GHC.Prim |
float, built-in | Float, Prelude | Float#, GHC.Prim |
size_t, stddef.h | CSize, Foreign.C.types | Word#, GHC.Prim |
一些类型的大小是平台相关(如 32 位、64 位间存在区别)的,如 HsInt
和 Int
的大小在 32 位计算机上是 32 位,而在 64 位计算机上是 64 位。GHC 也支持将某些数组(array)类型传递到 C 但并不是反之亦然:
C 类型和所在头文件 | Haskell 类型和所在模块 | Haskell 类型(启用 UnliftedFFITypes )和所在模块 |
type *, built-in | - | MutableByteArray#, GHC.Prim |
const type *, built-in | - | ByteArray#, GHC.Prim |
StgMutArrPtrs *(ghc<8.10), StgArrBytes **, Rts.h | - | ArrayArray#, GHC.Prim |
Haskell 的 FFI 细则同时支持使用函数地址。可以用来处理弱指针的析构函数。
foreign import ccall "&free" free :: FunPtr (Ptr Word8 -> IO ())
内存的分配与释放
在 C 函数中常常需要用到动态分配内存的数组,要在 Haskell 的 FFI 中调用它们,下面是两种常见的解决思路:
- 在 C 端分配内存并将对应的指针传回 Haskell 端。使用模块
Foreign.ForeignPtr
中的类型ForeignPtr
或模块Z.Foreign.CPtr
中的CPtr
类型来封装它。注意确保在不再需要时释放这片内存。 - 在 Haskell 端作为 GC(垃圾回收)可管理的堆对象分配内存,再传递给 C 端处理。
一般推荐参考第二种模式,由于此时使用的内存仍然处于 GHC 的 GC 管理之下,因此不用手动释放和担心泄露。
Allocate memory and pass to C
There’re some helpers in Z.Foreign
to help you with allocating and passing, it’s important to have some knowledge about GHC runtime system to get things right. GHC runtime is garbaged collected, and there’re two types of primitive array in GHC, with the objective to minimize overall memory management cost:
-
Small primitive arrays created with
newPrimArray
are directly allocated on GHC heap, which can be moved by GHC garbage collector, we call these arrays unpinned. Allocating these array is cheap, we only need to check heap limit and bump heap pointer just like any other haskell heap objects. But we will pay GC cost , which is OK for small arrays. -
Large primitive array and those created with
newPinnedPrimArray
are allocated on GHC managed memory blocks, which is also traced by garbage collector, but will never moved before freed, thus are called pinned. Allocating these arrays are bit more expensive since it’s more like how malloc works, but we don’t have to pay for GC cost.
Beside the pinned/unpinned difference, we have two types of FFI calls in GHC:
-
Safe FFI call annotated with
safe
keyword. These calls are executed on separated OS thread, which can be running concurrently with GHC garbage collector, thus we want to make sure only pinned arrays are passed. The main use case for safe FFIs are long running functions, for example, doing IO polling. Since these calls are running on separated OS thread, haskell thread on original OS thread will not be affected. -
Unsafe FFI call annotated with
unsafe
keyword. These calls are executed on the same OS thread which is running the haskell side FFI code, which will in turn stop GHC from doing a garbage collection. We can pass both pinned and unpinned arrays in this case. The use case for unsafe FFIs are short/small functions, which can be treated like a fat primitive operations, such as memcpy, memcmp. Using unsafe FFI with long running functions will effectively block GHC runtime thread from running any other haskell threads, which is dangerous. Even if you use threaded runtime and expect your haskell thread can be stolen by other OS threads, but this will not work since GHC garbage collector will refuse to run if one of the OS thread is blocked by FFI calls.
Base on above analysis, we have following FFI strategy table:
FFI \ Array | pinned | unpinned |
---|---|---|
unsafe | directly pass | directly pass |
safe | directly pass | make a copy |
Helpers in Z.Foreign
are also divided into two categories: those with unsafe suffix to be used with unsafe
FFI, and those with safe suffix to be used with safe
FFI. Following is a example to try accommodate a small C function:
include <HsFFI.h>
void c_add_and_time(HsInt x, HsInt y, HsInt* add_result, HsInt* time_result){
*add_result = x + y;
*time_result = x * y;
}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE UnliftedFFITypes #-}
import Z.Foreign
foreign import ccall unsafe c_add_and_time :: Int -> Int -> MBA# Int -> MBA# Int -> IO ()
cAddTime :: Int -> Int -> (Int, Int)
cAddTime x y = do
fst <$> allocPrimUnsafe @Int (\ add_result ->
fst <$> allocPrimUnsafe @Int (\ time_result ->
c_add_and_time x y add_result time_result))
Now when you call cAdd
in haskell:
allocPrimUnsafe
function will allocate a single elementMutablePrimArray Int
to be used asInt
pointer, here we use twoallocPrimUnsafe
to allocate memory for save add and time results.- The
x
andy
parameters are passed asInt
, and receive asHsInt
in C. Theadd_result
andtime_result
are passed asMBA# Int
, which is type alias forMutableByteArray#
, and received asHsInt*
in C. allocPrimUnsafe
will auto peek result from the single element array, and return together with FFI’s return value, which is ignored byfst
.
The memory allocated by allocPrimUnsafe
, allocPrimArrayUnsafe
and allocPrimVectorUnsafe
is not pinned, so you can’t get the address first, then pass it as Ptr a
. The only way to pass them is to use MutableByteArray#
and ByteArray#
primitive types. In Z.Foreign
module BA# a
and MBA# a
type alias are defined for writing convenience:
-- for const pointers
type BA# a = ByteArray#
-- for writable pointers
type MBA# a = MutableByteArray# RealWorld
Since they are type aliases, the type tag is only for document. You should use proper pointer types on C side to receive them just like a Ptr a
. Another common problem with BA#
and MBA#
is that they can only pass the array’s first element’s address, thus you have to manually pass a seperate offset parameter if you want to work with certain range of the array. This can be illustrated by following code:
include <HsFFI.h>
// here we write a wrapper to receive a slice of bytearray
HsInt hs_memchr(const uint8_t *a, HsInt aoff, uint8_t b, HsInt n) {
a += aoff;
uint8_t *p = memchr(a, b, (size_t)n);
if (p == NULL) return -1;
else return (p - a);
}
import Z.Foreign
import Data.Word
import qualified Z.Data.Vector as V
foreign import ccall unsafe hs_memchr :: BA# Word8 -> Int -> Word8 -> Int -> IO Int
memchrBytes :: V.Bytes -> Word8 -> Int
memchrBytes bs x = withPrimVector bs $ \ mba off len -> hs_memchr mba off x len
The safe FFI variation withPrimVectorSafe
is simplier, the offset is directly added to the address of pinned memory, so there’s only a pointer and an address parameter. It’s highly recommended to use unpinned allocation if possible, because pinned allocation often lead to memory fragmentation due their garbage collection strategy, especially under a lot of small repetitive allocations.
Null terminated strings
C use a lot of null ternimated strings, i.e. char*
where no length info is needed because it’s assumed that the string always ended with a NULL ternimator. In Haskell we provide a special type for this, that is the CBytes
type from Z.Data.CBytes
module. Similar to withPrimVectorUnsafe
and WithPrimVectorSafe
, use WithCBytesUnsafe
and withCBytes
to pass a CBytes
to C FFI.
> :m + Z.Data.CBytes Z.Foreign Data.Word
> foreign import ccall unsafe strlen :: BA# Word8 -> IO CSize
> withCBytesUnsafe "hello, world!" strlen
13
> foreign import ccall safe "strlen" strlen_safe :: Ptr Word8 -> IO CSize
> withCBytes "hello, world!" strlen_safe
13
Use allocCBytesUnsafe
, allocCBytes
to allocate memory to be passed to C, return CBytes
back.
> foreign import ccall unsafe sprint :: MBA# Word8 -> BA# Word8 -> Int -> IO ()
> allocCBytesUnsafe 32 $ \ dest -> withCBytesUnsafe "result is %d" $ \ fmt -> sprintf dest fmt 3
("result is 3",())
To get CBytes
from null terminated char*
, use fromCString
or peekMBACBytes
. If the memory is allocated from C, it’s recommend to use bracket
to ensure memory get freed.
Unaligned Class
Sometime the memory passed to C are written with some struct fields, you could use Storable
machinery from Foreign.Storable
to peek/poke data from/to the memory, but Storable
use Ptr a
, so it requires pinned memory whose address is fixed. In Z-Data an alternative way to do this is to use Unaligned
class from Z.Data.Array.Unaligned
module. Here’s a code sample from Z-IO:
// definitions from libuv
typedef struct uv_passwd_s {
char* username;
long uid;
long gid;
char* shell;
char* homedir;
} uv_passwd_t;
int uv_os_get_passwd(uv_passwd_t* pwd);
void uv_os_free_passwd(uv_passwd_t* pwd);
import Z.Foreign
import Z.Data.Array.Unaligned
import Z.IO.Exception
import Z.Data.CBytes
-- | Data type for password file information.
data PassWD = PassWD
{ passwd_username :: CBytes
, passwd_uid :: UID
, passwd_gid :: GID
, passwd_shell :: CBytes
, passwd_homedir :: CBytes
} deriving (Eq, Ord, Show, Read)
foreign import ccall unsafe uv_os_get_passwd :: MBA## PassWD -> IO CInt
foreign import ccall unsafe uv_os_free_passwd :: MBA## PassWD -> IO ()
-- | Gets a subset of the password file entry for the current effective uid (not the real uid).
--
-- The populated data includes the username, euid, gid, shell, and home directory.
-- On non-Windows systems, all data comes from getpwuid_r(3).
-- On Windows, uid and gid are set to -1 and have no meaning, and shell is empty.
getPassWD :: HasCallStack => IO PassWD
getPassWD = bracket
(do mpa@(MutableByteArray mba##) <- newByteArray (#size uv_passwd_t)
throwUVIfMinus_ (uv_os_get_passwd mba##)
return mpa)
(\ (MutableByteArray mba##) -> uv_os_free_passwd mba##)
(\ (MutableByteArray mba##) -> do
username <- fromCString =<< peekMBA mba## (#offset uv_passwd_t, username)
uid <- fromIntegral <$> (peekMBA mba## (#offset uv_passwd_t, uid) :: IO CLong)
gid <- fromIntegral <$> (peekMBA mba## (#offset uv_passwd_t, gid) :: IO CLong)
shell <- fromCString =<< peekMBA mba## (#offset uv_passwd_t, shell)
homedir <- fromCString =<< peekMBA mba## (#offset uv_passwd_t, homedir)
return (PassWD username uid gid shell homedir))
Note above Haskell code use hsc2hs to get constants(struct size, field offset, etc.) from C code, ##
is #
escaped in .hsc
file. uv_os_get_passwd
asks for a uv_passwd_t*
struct pointer which must a valid writable memory location, so in Haskell we manually allocate memory with newByteArray
and pass the MutableByteArray#
as a pointer. After FFI is complete, we use peekMBA
from Unaligned
class to read the char*
pointer, then use fromCString
from Z.Data.CBytes
to copy the result. After copy completes, uv_os_free_passwd
is called to free any memory allocated in C code.
CPtr
For some cases, allocation from C is mandatory, e.g. you can’t get size to allocate(hidden from C). We will use CPtr
as an example to illustrate how do we keep reference to some opaque C struct.
First you have to prepare a pair of allocation and free functions:
struct foo_s{
...
};
typedef struct foo_s foo_t;
// the allocation function
foo_t *new_foo(int x);
// the free function
void destroy_foo(foo_t* foo);
// some function need foo_t
void bar(foo_t* foo);
Now we import these functions in Haskell:
import Z.Foreign
import Z.Foreign.CPtr
data Foo
foreign import ccall unsafe new_foo :: CInt -> IO (Ptr Foo)
foreign import ccall unsafe "&destroy_foo" destroy_foo :: FunPtr (Ptr Foo -> IO ())
newFoo :: Int -> IO (CPtr Foo)
newFoo x = newCPtr' (new_foo (fromIntegral x)) destroy_foo
-- use `withCPtr` if you want to get foo_t pointer.
foreign import ccall unsafe bar :: Ptr Foo -> IO ()
...
foo <- newFoo ...
withCPtr foo bar
...
We encapsulate the C strcut foo_t
in a Haskell heap object CPtr Foo
with following steps:
- Define a type tag
Foo
. - Import allocation and free functions, the free function should be imported as a
FunPtr
with its address. - Use
newCPtr'
fromZ.Foreign.CPtr
to attach the free function as finalizer, which will be call once theCPtr Foo
is collected. withCPtr
will get the pointer back and ensure it will not get collected during the FFI computation.
Exception handling
C libraries usually have some conventions on error handling, e.g. return a minus error code to indicate exception case. It’s recommend to define an exception type then provide helpers. Following is an example in Z-Botan:
- Import Error code in hsc file:
pattern BOTAN_FFI_ERROR_UNKNOWN_ERROR :: CInt
pattern BOTAN_FFI_SUCCESS = (#const BOTAN_FFI_SUCCESS)
pattern BOTAN_FFI_INVALID_VERIFIER = (#const BOTAN_FFI_INVALID_VERIFIER)
pattern BOTAN_FFI_ERROR_INVALID_INPUT = (#const BOTAN_FFI_ERROR_INVALID_INPUT)
...
- Define an extensible exception type.
data SomeBotanException = forall e . Exception e => SomeBotanException e
instance Show SomeBotanException where
show (SomeBotanException e) = show e
instance Exception SomeBotanException
botanExceptionToException :: Exception e => e -> SomeException
botanExceptionToException = toException . SomeBotanException
botanExceptionFromException :: Exception e => SomeException -> Maybe e
botanExceptionFromException x = do
SomeBotanException a <- fromException x
cast a
#define BotanE(e) data e = e CInt CallStack deriving Show; \
instance Exception e where \
{ toException = botanExceptionToException \
; fromException = botanExceptionFromException \
}
BotanE(InvalidVerifier)
BotanE(InvalidInput)
BotanE(BadMac)
...
- And provide helpers for FFI code:
throwBotanIfMinus :: (HasCallStack, Integral a) => IO a -> IO a
throwBotanIfMinus f = do
r <- f
when (r < 0) (throwBotanError_ (fromIntegral r) callStack)
return r
throwBotanIfMinus_ :: (HasCallStack, Integral a) => IO a -> IO ()
throwBotanIfMinus_ f = do
r <- f
when (r < 0) (throwBotanError_ (fromIntegral r) callStack)
throwBotanError :: HasCallStack => CInt -> IO ()
throwBotanError r = throwBotanError_ r callStack
throwBotanError_ :: CInt -> CallStack -> IO ()
throwBotanError_ r cs = case r of
BOTAN_FFI_ERROR_INVALID_INPUT -> throw (InvalidInput r cs)
BOTAN_FFI_ERROR_BAD_MAC -> throw (BadMac r cs)
BOTAN_FFI_ERROR_INSUFFICIENT_BUFFER_SPACE -> throw (InsufficientBufferSpace r cs)
...
- In FFI code, use helper to throw exception when needed:
foreign import ccall unsafe hs_botan_mac_update :: BotanStructT -> BA## Word8 -> Int -> Int-> IO CInt
updateMAC :: HasCallStack => MAC -> V.Bytes -> IO ()
updateMAC (MAC bts _ _) bs =
withBotanStruct bts $ \ pbts ->
withPrimVectorUnsafe bs $ \ pbs off len ->
throwBotanIfMinus_ (hs_botan_mac_update pbts pbs off len)