Question

If I have a C++ program that declares a struct, say:

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

I generate some LLVM IR via the LLVM C++ API to mirror the C++ declaration:

vector<Type*> members;
members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
// since LLVM doesn't support unions, just use an ArrayType that's the same size
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

StructType *const llvm_S = StructType::create( ctx, "S" );
llvm_S->setBody( members );

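How do I guarantee that sizeof(S) in the C++ code is the same as the size of the StructType in the LLVM IR code? The same goes for the offsets of individual members, e.g. u.b.

I also have a case where an array of S is allocated in C++: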

S *s_array = new S[10];

I pass s_array to the LLVM IR code, where I access individual elements of the array. For this to work, sizeof(S) must be the same in both C++ and LLVM IR, so that:

%elt = getelementptr %S* %ptr_to_start, i64 1

will correctly access s_array[1].

When I compile and run the program below, it outputs:

sizeof(S) = 16
allocSize(S) = 10

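The problem is that LLVM is missing the 6 bytes of padding between S::s and S::u. The C++ compiler makes the union start on an 8-byte-aligned boundary, whereas LLVM does not.

I have been playing with DataLayout. For my machine [Mac OS X 10.9.5, g++ Apple LLVM 6.0 (clang-600.0.57) (based on LLVM 3.5svn)], if I print the data layout string, I get: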

e-m:o-i64:64-f80:128-n8:16:32:64-S128

If I force the data layout to be:

e-m:o-i64:64-f80:128-n8:16:32:64-S128-a:64

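where the addition is a:64, which means that an object of aggregate type is aligned on a 64-bit boundary, then I get the same size. So why is the default data layout not correct?

Complete working program below: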

// LLVM
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Type.h>
#include <llvm/Support/TargetSelect.h>

// standard
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using namespace std;
using namespace llvm;

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

ExecutionEngine* createEngine( Module *module ) {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();

    unique_ptr<Module> u( module );
    EngineBuilder eb( move( u ) );
    string errStr;
    eb.setErrorStr( &errStr );
    eb.setEngineKind( EngineKind::JIT );
    ExecutionEngine *const exec = eb.create();
    if ( !exec ) {
        cerr << "Could not create ExecutionEngine: " << errStr << endl;
        exit( 1 );
    }
    return exec;
}

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

Answer 1

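Since the original answer was the correct answer to the "pre-edit" question, I am writing a completely new answer for the new question (and my guess that the structs were not actually the same was a pretty good one).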

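The problem is not the DataLayout as such [but you will need the DataLayout to solve the problem, so you need to update the code to create the module before you start making the LLVM IR], but the fact that you are trying to combine a union that has alignment restrictions into a struct with smaller alignment restrictions:

struct S {
    short s;          // Alignment = 2
    union U {
        bool b;       // Alignment = 1
        void *v;      // Alignment = 4 or 8
    };
    U u;              // Alignment = 4 or 8
};

Now, in your LLVM code-gen:

members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

The second element in your struct is effectively:

char dummy[sizeof(S::U)]

which has an alignment requirement of 1. So, of course, LLVM will align the struct differently from the C++ compiler, which has the stricter alignment criterion.

In this particular case, using an i8* (aka void *) in place of the array of i8 would do the trick [obviously with the relevant bitcast to translate to other types as needed when accessing the value of b].

To fix this in a completely generic way, you need to generate a struct consisting of the element in the union with the largest alignment requirement, and then pad it with enough char elements to make up the largest size.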

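I am going to have something to eat now, but I will come back with some code that solves this properly; it is a little more complex than I originally thought.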

Here is the main posted above, modified to use a pointer instead of the char array:

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( PointerType::getUnqual( IntegerType::get( ctx, 8 ) ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( *layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

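There are also some minor changes to cover the fact that setDataLayout has changed between your version of LLVM and the one I am using.

And finally, a generic version that allows any type to be used: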

// Builds a struct type that stands in for a union of the types in um.
Type* MakeUnionType( Module* module, LLVMContext& ctx, vector<Type*> um )
{
    const DataLayout dl( module );
    size_t maxSize = 0;
    size_t maxAlign = 0;
    Type*  maxAlignTy = 0;

    // Find the largest member size and the member with the strictest alignment.
    for( auto m : um )
    {
        size_t sz = dl.getTypeAllocSize( m );
        size_t al = dl.getPrefTypeAlignment( m );
        if( sz > maxSize )
            maxSize = sz;
        if( al > maxAlign )
        {
            maxAlign = al;
            maxAlignTy = m;
        }
    }
    // The most strictly aligned member comes first, so the struct gets its alignment.
    vector<Type*> sv = { maxAlignTy };
    size_t mas = dl.getTypeAllocSize( maxAlignTy );
    // If another member is larger, pad with i8 elements up to the largest size.
    if( mas < maxSize )
    {
        size_t n = maxSize - mas;
        sv.push_back(ArrayType::get( IntegerType::get( ctx, 8 ), n ) );
    }
    StructType* u = StructType::create( ctx, "U" );
    u->setBody( sv );
    return u;
}

int main() {
    LLVMContext ctx;

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( *layout );

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    vector<Type*> unionMembers = { PointerType::getUnqual( IntegerType::get( ctx, 8 ) ), 
                   IntegerType::get( ctx, 1 )  };
    members.push_back( MakeUnionType( module, ctx, unionMembers ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

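Note that in both cases you need a bitcast operation to convert the address of b. In the second case you also need a bitcast to convert the struct into a void *, but assuming you actually want generic union support, that is how you would have to do it anyway.

A complete piece of code to generate a union type can be found here; it is for my Pascal compiler's variant record [which is Pascal's way of making a union]:

https://github.com/Leporacanthicus/lacsap/blob/master/types.cpp#L525 and the code generation, including the bitcast: https://github.com/Leporacanthicus/lacsap/blob/master/expr.cpp#L520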

Answer 2

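The main purpose of DataLayout is to know the alignment of elements. If you do not need to know the size, alignment or offsets of elements in your code [and LLVM does not really have a useful way beyond the GEP instruction to find the offset, so you can pretty much ignore the offset part], you do not need the data layout until you come to execute (or generate an object file) from the IR.

(I did have some very interesting bugs from trying to compile 32-bit code with the 64-bit "native" data layout when I implemented the -m32 switch for my compiler. It is not a good idea to switch DataLayout in the middle of compilation, which I did because I used the "default" one and then set a different one when it came to creating the actual object file.)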
