Use LibC #366

Open
TravisCardwell opened this issue Jan 17, 2025 · 3 comments

Comments

@TravisCardwell
Collaborator

After discussing with @edsko, we concluded that we need to start using the LibC module in the code that we generate.

When translating C code, we must be able to recognize when a C type is one of the standard types that we support. Such types include the following:

  • Standard types defined in base in Foreign.**
  • stdint types that map to Haskell types defined in base in Data.Int and Data.Word
  • Standard types that we define in HsBindgen.Runtime.LibC

A known C type is mapped to a Haskell package, module, and identifier. Making such a mapping implementation general would be useful, since it could be used if/when we add support for translating multiple headers/modules.

Perhaps we can identify standard types using the name and source location information provided by libclang. Note that the actual source header files for many types may differ between C library implementations, since they have different internal implementations. For example, Musl defines many types in the bits/alltypes.h header file that is included by the standard header files.

@edsko has a promising idea: perhaps we can parse our standard_headers.h bootstrap file and keep track of where the standard types are defined in that execution (which depends on which include directories are configured). We could then use the results to identify standard types as we translate the user code. With this idea, the implementation details should not matter, and we do not need to worry about the same types being imported from different header files.

If we run into trouble, perhaps we can build up a call-stack-style "include stack." I think we can do this with the information provided by libclang while folding, but we could parse the tokens of include directives if necessary.
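To make the "include stack" idea concrete, here is a minimal sketch. The IncludeEvent type and declStacks function are hypothetical and only for illustration; they are not part of libclang or hs-bindgen. The sketch assumes that we can somehow obtain an ordered stream of enter/leave/declaration events, and it pairs each declaration with the stack of headers that were open when it was seen.

```haskell
-- Hypothetical event stream; an assumption for illustration only.
data IncludeEvent
  = EnterInclude FilePath  -- started processing an included header
  | LeaveInclude           -- finished the most recently entered header
  | SawDecl String         -- a declaration (by name) seen in the current header
  deriving (Eq, Show)

-- Pair each declaration with the include stack at that point,
-- innermost header first and the root file last.
declStacks :: FilePath -> [IncludeEvent] -> [(String, [FilePath])]
declStacks root = go [root]
  where
    go _ [] = []
    go stack (EnterInclude path : es) = go (path : stack) es
    go stack (LeaveInclude : es) =
      case stack of
        (_ : rest@(_ : _)) -> go rest es   -- pop, but never pop the root file
        _                  -> go stack es
    go stack (SawDecl name : es) = (name, stack) : go stack es

-- A declaration was reached via a given header if that header is on its stack.
reachedVia :: FilePath -> [FilePath] -> Bool
reachedVia = elem
```

With such a stack, a type defined in Musl's bits/alltypes.h could still be attributed to stdint.h, because stdint.h is on the stack when the declaration is seen.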

Should we add checks to confirm that a parsed standard type matches the Haskell definition/instances? We could check that size and alignment are consistent. We could check that structure/union field names and types are consistent.
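A size/alignment check could look roughly like the following sketch. Layout and checkLayout are hypothetical names; the parsed values are assumed to come from the C side (libclang can report them, for example via clang_Type_getSizeOf and clang_Type_getAlignOf), while the expected values come from the Storable instance of the Haskell type.

```haskell
import Data.Int (Int32)
import Foreign.Storable (Storable, alignment, sizeOf)

-- Size and alignment in bytes (hypothetical type for illustration).
data Layout = Layout
  { layoutSize  :: Int
  , layoutAlign :: Int
  }
  deriving (Eq, Show)

-- Layout of a Haskell type according to its Storable instance.
-- The argument is used only for its type and is never evaluated.
hsLayout :: Storable a => a -> Layout
hsLayout x = Layout (sizeOf x) (alignment x)

-- Compare the layout parsed from the C headers with the Haskell layout.
checkLayout :: Storable a => a -> Layout -> Either String ()
checkLayout x parsed
  | parsed == expected = Right ()
  | otherwise =
      Left ("layout mismatch: parsed " ++ show parsed
            ++ ", Haskell " ++ show expected)
  where
    expected = hsLayout x

-- Example: check a parsed int32_t against Data.Int.Int32.
checkInt32 :: Layout -> Either String ()
checkInt32 = checkLayout (undefined :: Int32)
```

Checking structure/union field names and types would need more information than Storable provides, so it is not covered by this sketch.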

@TravisCardwell
Collaborator Author

Types may be defined differently in different C library implementations.

For example, on my platform, using my default system (glibc) include path:

typedef signed int __int32_t;  // /usr/include/bits/types.h
typedef __int32_t int32_t;     // /usr/include/bits/stdint-intn.h

Using Musl:

typedef signed int int32_t;    // hs-bindgen/musl-include/x86_64/bits/alltypes.h

We need to map the C int32_t to Haskell Data.Int.Int32 for all such definitions.


Details:

Test header:

#include <stdint.h>

typedef int32_t MyInt32;

Using my default system include path:

$ cabal run hs-bindgen -- dev parse --input travis.h
Header
  { headerDecls =
      [ DeclTypedef
          Typedef
            { typedefName = CName { getCName = __int32_t }
            , typedefType = TypePrim (PrimIntegral PrimInt Signed)
            , typedefSourceLoc = "/usr/include/bits/types.h:41:20"
            }
      , DeclTypedef
          Typedef
            { typedefName = CName { getCName = int32_t }
            , typedefType = TypeTypedef CName { getCName = __int32_t }
            , typedefSourceLoc = "/usr/include/bits/stdint-intn.h:26:19"
            }
      , DeclTypedef
          Typedef
            { typedefName = CName { getCName = MyInt32 }
            , typedefType = TypeTypedef CName { getCName = int32_t }
            , typedefSourceLoc = "travis.h:3:17"
            }
      ]
  }

Using the vendored Musl headers:

$ cabal run hs-bindgen -- \
    --clang-option='-nostdinc' \
    --clang-option="-isystem$(pwd)/hs-bindgen/musl-include/x86_64" \
    dev parse --input travis.h
Header
  { headerDecls =
      [ DeclTypedef
          Typedef
            { typedefName = CName { getCName = int32_t }
            , typedefType = TypePrim (PrimIntegral PrimInt Signed)
            , typedefSourceLoc =
                "hs-bindgen/musl-include/x86_64/bits/alltypes.h:106:25"
            }
      , DeclTypedef
          Typedef
            { typedefName = CName { getCName = MyInt32 }
            , typedefType = TypeTypedef CName { getCName = int32_t }
            , typedefSourceLoc = "travis.h:3:17"
            }
      ]
  }

@TravisCardwell
Collaborator Author

There is one aspect of the LibC (usage) design that we should consider. The translation of C types as reported by libclang to standard/shared Haskell types (from base or LibC) should be implemented within hs-bindgen, where the needed types are available. Directly referencing or even enumerating LibC types within hs-bindgen would result in tight coupling and narrow dependency version constraints, though. Since the types are only used in generated code, it is possible to avoid this.

Here is a brief description of my current (WIP) design.

The C types for which we want to use standard/shared Haskell types are defined in the hs-bindgen-runtime package. The code in this package cannot use types defined in hs-bindgen, so it defines simple types that can be easily translated. The core of the API is as follows:

  • Sum type StdCType enumerates those types, both those defined in base and those defined in LibC.

  • A function resolves a C type name, returning a StdCType and a "check" when found. A check includes information that can be used to confirm that the parsed type matches the base/LibC type.

    resolveStdCType :: CTypeName -> Maybe (StdCType, Check)

  • A function gets the Haskell type reference (package, module, and identifier) for the Haskell type corresponding to a given StdCType.

    getHsTypeRef :: StdCType -> HsTypeRef
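To illustrate the shape of this API, here is a minimal sketch. Only the two function signatures come from the design above; the constructors, the record fields, and the contents of Check (size and alignment in bytes) are assumptions for illustration. Only two base types are shown; LibC types would be further constructors with "hs-bindgen-runtime" as the package.

```haskell
-- A C type name as a simple wrapper (assumed representation).
newtype CTypeName = CTypeName String
  deriving (Eq, Ord, Show)

-- Hypothetical constructors; the real enumeration would be much larger.
data StdCType
  = StdInt32
  | StdUInt8
  deriving (Eq, Ord, Show)

-- Assumed contents of a check: expected size and alignment in bytes.
data Check = Check
  { checkSize  :: Int
  , checkAlign :: Int
  }
  deriving (Eq, Show)

-- Package, module, and identifier of a Haskell type.
data HsTypeRef = HsTypeRef
  { hsPackage    :: String
  , hsModule     :: String
  , hsIdentifier :: String
  }
  deriving (Eq, Show)

-- Resolve a C type name to a standard type and its check, if known.
resolveStdCType :: CTypeName -> Maybe (StdCType, Check)
resolveStdCType (CTypeName name) =
  case name of
    "int32_t" -> Just (StdInt32, Check 4 4)
    "uint8_t" -> Just (StdUInt8, Check 1 1)
    _         -> Nothing

-- Where the corresponding Haskell type lives.
getHsTypeRef :: StdCType -> HsTypeRef
getHsTypeRef StdInt32 = HsTypeRef "base" "Data.Int" "Int32"
getHsTypeRef StdUInt8 = HsTypeRef "base" "Data.Word" "Word8"
```

Because HsTypeRef is just strings, hs-bindgen can emit references to these types in generated code without importing them itself.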

Note that checks are only for these standard/shared types. They are not needed for user code since we generate those Haskell types.

In hs-bindgen, we can first parse standard_headers.h. Function resolveStdCType can be used to check if we have a Haskell type for a given C type, and the check helps us confirm compatibility. We can build up a mapping from (CName, SingleLoc) to StdCType.

When parsing user code, we can use that mapping to determine which types should be translated to a shared Haskell type. I think that this needs to be done in the C parser, where we have information from libclang. The varied definitions described in my previous comment must be handled.
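The two phases can be sketched as follows. This is a simplified illustration, not the actual implementation: CName and Loc are plain strings standing in for the real CName and SingleLoc types, recognize is a hypothetical name-only stand-in for resolveStdCType (without the check), and StdCType has a single made-up constructor.

```haskell
import qualified Data.Map.Strict as Map

type CName = String
type Loc   = String  -- stand-in for a real source location type

data StdCType = StdInt32
  deriving (Eq, Show)

-- Hypothetical name-based recognizer.
recognize :: CName -> Maybe StdCType
recognize "int32_t" = Just StdInt32
recognize _         = Nothing

-- Phase 1: from the declarations found while parsing the standard headers,
-- keep those that are recognized, keyed by name and definition location.
buildStdMap :: [(CName, Loc)] -> Map.Map (CName, Loc) StdCType
buildStdMap decls =
  Map.fromList [ ((n, l), t) | (n, l) <- decls, Just t <- [recognize n] ]

-- Phase 2: while parsing user code, a type is standard only if both its
-- name and its definition location match an entry from phase 1.
lookupStd :: Map.Map (CName, Loc) StdCType -> CName -> Loc -> Maybe StdCType
lookupStd m n l = Map.lookup (n, l) m
```

Since the map is rebuilt for whichever include directories are configured, the glibc location (/usr/include/bits/stdint-intn.h:26:19) and the Musl location (hs-bindgen/musl-include/x86_64/bits/alltypes.h:106:25) each resolve to the same StdCType in their respective runs, while a user-defined type that merely shares the name int32_t does not match.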

When translating from C to Haskell, function getHsTypeRef can be used to get a reference to the Haskell type that should be used.

Any thoughts or ideas? Does it sound like I am heading in the right direction?

@TravisCardwell
Collaborator Author

The next issue is long double support (#349).

$ cabal run hs-bindgen -- \
    --select-all \
    --clang-option='-nostdinc' \
    --clang-option="-isystem$(pwd)/hs-bindgen/musl-include/x86_64" \
    dev parse --input hs-bindgen/bootstrap/standard_headers.h
...
hs-bindgen: tcMacro: long double not supported
CallStack (from HasCallStack):
  error, called at src/HsBindgen/C/Tc/Macro.hs:1249:21 in hs-bindgen-0.1.0-inplace:HsBindgen.C.Tc.Macro

Parsing of the standard headers halts when long double is encountered for the first time.
