Question

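I'm working on a project in C, and I'm trying to figure out how to debug an obscure bug that crashes my program. The program is fairly large, and trying to isolate the problem by making smaller versions of the code hasn't worked. So I'm trying to come up with a way to debug and pin down memory leaks.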

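I came up with the following plan. I know the problem comes from running a certain function, and that function calls itself recursively. So I figured I could take various snapshots of the program's memory allocation, since I don't know what's happening under the hood (I know that's a bit of a handicap in this case):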

#include <stdlib.h>

typedef struct record_mem {
    int num_allocs;
    int num_frees;
    int size_space;
    int num_structure_1;
    /* ... */
    int num_structure_N;
    int num_records;
    struct record_mem *next;
} RECORD;

extern RECORD *top;

/* Pushes a new snapshot onto the stack *top. Each snapshot carries the
   running totals of the previous one plus the cost of the snapshot itself. */
void pushmem(RECORD **top)
{
    RECORD *nnew = malloc(sizeof(RECORD));
    if (nnew == NULL)
        return;  /* out of memory: skip this snapshot */
    nnew->num_allocs = 1;
    nnew->num_frees = 0;
    nnew->size_space = sizeof(RECORD);
    nnew->num_structure_1 = 0;
    /* ... */
    nnew->num_structure_N = 0;
    nnew->num_records = 1;
    nnew->next = NULL;
    if (*top)
    {
        nnew->num_allocs += (*top)->num_allocs;
        nnew->num_frees = (*top)->num_frees;
        nnew->size_space += (*top)->size_space;
        nnew->num_structure_1 = (*top)->num_structure_1;
        /* ... */
        nnew->num_structure_N = (*top)->num_structure_N;
        nnew->num_records += (*top)->num_records;
        nnew->next = *top;
    }
    *top = nnew;
}

My idea is to print out my memory records just before the program crashes (GDB tells me where it crashes).

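Then, throughout the program (I have a push function like the one above for every data structure in my program), I can simply add a one-liner that tallies that data structure's allocations plus the total stack (heap?) memory allocated, i.e. whatever I can track. I just create more memory_record structures wherever I need to record a snapshot of the program's run. The problem is that this memory balance sheet won't help unless I can somehow record the amount of memory actually in use.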

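But how do I do that? Also, how do I account for dangling pointers and leaks? I'm on OS X, and I'm currently looking into how to record the stack pointer and similar information.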

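Edit: since you asked, here is the valgrind output. closure() is a function called from main, and it returns the bad pointer: it should return the head of a doubly linked list. traversehashmap() is a function called by closure() that I use to compute extra nodes and append them to the linked list. It calls itself recursively because it needs to jump between nodes.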

jason-danckss-macbook:project Jason$ valgrind --leak-check=full --tool=memcheck ./testc
Will attempt to compute closure of AB:
Result: testcl: 0x10000d0b0
==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)
==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd
==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 
==7682== HEAP SUMMARY:
==7682==     in use at exit: 5,360 bytes in 48 blocks
==7682==   total heap usage: 99 allocs, 51 frees, 6,640 bytes allocated
==7682== 
==7682== 48 (24 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 33 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100001104: getnewrelation (relation.h:134)
==7682==    by 0x100001848: copyrelation (relation.h:343)
==7682==    by 0x100003991: closure (computation.h:531)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 1,128 (24 direct, 1,104 indirect) bytes in 1 blocks are definitely lost in loss record 36 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100002315: getnewholder (dependency.h:129)
==7682==    by 0x100003B17: main (test-computation.c:14)
==7682== 
==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks
==7682== Reachable blocks (those to which a pointer was found) are not shown.
==7682== To see them, rerun with: --leak-check=full --show-reachable=yes
==7682== 
==7682== For counts of detected and suppressed errors, rerun with: -v
==7682== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

Answer 1

Have you tried valgrind (and its memcheck tool)?

$ valgrind --tool=memcheck --leak-check=full ./yourprogram

(preferably after compiling your program with -g)

Edit: sorry, I missed that you didn't want to use Valgrind. But as dureuill pointed out in a comment on your post, it is very useful and worth learning.

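Another piece of information: a memory leak is caused by a missing free after a malloc or realloc (you can see a simple example in C here). You can also use grep to list every memory-allocation line in your program (-n prints line numbers, -r searches recursively) and then try to match each one with a call to free. That can be tedious, though, and I really believe Valgrind will get you there faster.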

Answer 2

From the valgrind output:

This is probably what's causing your problem:

==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)
==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd
==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)

Let's dig into it:

==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)

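This is the summary of your error. In printrelation2, at line 490 of relation.h, you read 8 bytes from a memory location that is not allocated (or that was allocated earlier and then deallocated).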

==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd

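The address you accessed is 8 bytes into a 24-byte block. Combined with the read size of 8, this points to an 8-byte field (for example a pointer on a 64-bit system) at offset 8 of a 24-byte structure, so look for such a structure. You had already freed this block before the read.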

==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)

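This is the call stack that freed the address you later access when the program crashes. It starts with free, which is expected, since you presumably release memory with free. The file and line shown for it (vg_replace_malloc.c) belong to Valgrind's replacement allocator, not to your code, so they are not relevant. What matters is that this free is called from destroyrelation2 at line 161 of relation.h: that is the faulty free. destroyrelation2 is itself called by destroyallhashmap, which is called by closure, which main calls at line 38 of test-computation.c. You need to find the error in your allocation logic that makes printrelation2 reuse a pointer that was already freed during that call at line 38 of main.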

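The memory leaks reported afterwards are real, but they are unlikely to be the cause of the crash.

Is the valgrind output clearer now?

Note 1: this leak report may change once you fix the segfault, but as it stands, here is how I interpret it: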

==7682== 48 (24 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 33 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100001104: getnewrelation (relation.h:134)
==7682==    by 0x100001848: copyrelation (relation.h:343)
==7682==    by 0x100003991: closure (computation.h:531)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 1,128 (24 direct, 1,104 indirect) bytes in 1 blocks are definitely lost in loss record 36 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100002315: getnewholder (dependency.h:129)
==7682==    by 0x100003B17: main (test-computation.c:14)
==7682== 
==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks

Let's start with the summary:

==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks

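You have two allocated blocks that can no longer be reached through any pointer. Somewhere in your program you used them and then lost track of them completely. Those are the bad memory leaks: you need to fix your logic so that these blocks get released, or free them as early as possible in the program's lifetime. "Indirectly lost" blocks are reachable only through pointers stored inside blocks that are themselves definitely lost (for example, the nodes of a list whose head pointer was lost). They usually disappear once you free the definitely lost blocks together with everything they point to. "Possibly lost" means valgrind found only a pointer into the middle of a block rather than to its start; you have none of those. "Still reachable" blocks are the benign leaks. When the program exited you had not freed them, but a pointer to them still existed, so you can easily add a call to free that pointer and fix the leak.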

The two call stacks show the mallocs that caused the leaks. They exclude the "still reachable" ones; to see those, add the options --leak-check=full --show-reachable=yes to the valgrind command line.

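Note 2: avoid function names like destroyallhashmap (hard to read) or destroyrelation2 (numbered). Prefer destroy_all_hashmap or, less commonly in C, destroyAllHashmap, and don't number your functions. Likewise, avoid variable names like nnew; use names that convey meaning.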

Answer 3

Since everyone is suggesting Valgrind, I'll recommend a few more general techniques that have proven useful.

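Narrow down the code to find the bug

First of all, tracking down bugs in a large system is harder with any tool. Try to narrow down the scope of the problem.

For example, switch off modules: comment out sections of code and see whether you can still reproduce the problem. Some trial and error should let you eliminate a large part of your code, unless it's a really nasty random memory corruption.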

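Remove dynamic memory, or at least comment out memory deallocation

Try commenting out the calls that free memory, if your case allows it without exhausting system memory. That way you can at least rule out, or narrow down, problems with deallocation. Better still, try running the whole system on statically allocated memory. I know it may not be very practical, but once you have a limited scope that consistently produces the crash, you may be able to allocate a large enough static block and need no dynamic memory at all. You could create an array of nodes and hand them out to your pointers.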

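Check the call stack and look at the crash location

I assume you have already examined the call stack at the crash point and verified the pointers that are available locally. That should be the first, most direct step before trying any of the above.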

C: strategies for debugging obscure memory leaks?

3 answers