SanitizerCoverage Study Notes

Translation of the Official SanitizerCoverage Documentation

Introduction

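LLVM has a simple built-in code coverage instrumentation tool (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of these callbacks are provided and implement simple coverage reporting and visualization. However, if you only need coverage visualization, SourceBasedCodeCoverage may be a better fit.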

Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard, the compiler inserts the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard_variable (uint32_t).

The compiler also inserts a call to a module constructor:

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With the additional flag ...=trace-pc,indirect-calls, a call to __sanitizer_cov_trace_pc_indirect(void *callee) is inserted at every indirect call.

The __sanitizer_cov_trace_pc_* functions should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}
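
The comment in the callback above notes that the guard values can be made consecutive and used to index an array. The following is a minimal sketch of that idea, not the run-time's own implementation. The names kMaxEdges, g_covered, g_num_guards, and CoveredEdgeCount are hypothetical. Like the callback file above, it is meant to be compiled without coverage instrumentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity and storage; none of these names come from LLVM.
const uint32_t kMaxEdges = 1 << 16;
uint8_t g_covered[kMaxEdges];   // g_covered[id] is 1 once edge `id` has run
uint32_t g_num_guards;          // number of IDs handed out so far

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Empty range or already initialized.
  for (uint32_t *x = start; x < stop; x++) {
    assert(g_num_guards + 1 < kMaxEdges);
    *x = ++g_num_guards;                // Consecutive IDs starting from 1.
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  uint32_t id = *guard;
  if (!id) return;       // Guard is disabled or was never initialized.
  g_covered[id] = 1;     // Use the ID as an index into the coverage map.
  *guard = 0;            // This edge does not need to be reported again.
}

// Hypothetical helper: number of distinct edges seen so far.
size_t CoveredEdgeCount() {
  size_t n = 0;
  for (uint32_t id = 1; id <= g_num_guards; id++) n += g_covered[id];
  return n;
}
```

A fixed-size global array is used here because the init callback runs from a module constructor, possibly before other static objects are constructed.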

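Questions

1. In SanitizerCoverage, what does "Tracing PCs with guards" mean, and what is it for?

Answer: It is the instrumentation mode enabled by -fsanitize-coverage=trace-pc-guard. The compiler inserts a call to __sanitizer_cov_trace_pc_guard on every edge of the control flow and passes it the address of that edge's own 32-bit guard variable. Inside the callback, the return address gives the PC of the instrumented edge, which is where the name comes from.

Its purpose is to record which edges a test case actually executes. Conditional branches, loops, and other control structures all contribute edges, so the set of guards that fired shows which parts of the program an input reached and which remain uncovered. That helps developers find uncovered code and then debug, optimize, and improve it.

Concretely, the process works as follows:

1. At compile time, the compiler allocates one guard variable per edge and inserts the callback call on that edge.
2. When a module is loaded, its constructor calls __sanitizer_cov_trace_pc_guard_init, which typically assigns each guard a nonzero ID.
3. At run time, an executed edge calls __sanitizer_cov_trace_pc_guard with the address of its guard; the compiler usually emits the call behind an if (*guard) check.
4. The callback records the edge, for example by its guard ID or its PC, and may set *guard to 0 so that the edge is not reported again.
5. The recorded data is collected and analyzed to determine which edges were reached and which were not.

With this information, developers get a fuller picture of how their code behaves under different inputs, which helps improve software quality and reliability.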

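2. What is guard, and what is *guard?

Answer:

guard is the address of a guard variable, and the variable itself is 32 bits wide (uint32_t). Instrumentation effectively assigns one guard variable to every basic block (or edge, depending on the granularity), and each guard variable stores the ID of its basic block. When the block executes, __sanitizer_cov_trace_pc_guard is called, and the guard argument identifies which block is being handled.

*guard is the ID stored in that variable. Its value is whatever __sanitizer_cov_trace_pc_guard_init wrote there; in the example callback the IDs are consecutive, starting from 1, in the order in which the guards are laid out, which follows the order of the basic blocks in the source code. This can be verified as follows.

Take the example from the official LLVM documentation: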

void foo(){}
int main(int argc, char **argv){
    if (argc > 1) foo();
}

After compiling and running it, the result is as follows:

Now modify the code slightly:

void foo(){}
void foo2(){}
int main(int argc, char **argv){
    if (argc > 1) foo();
}

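As can be seen, after adding a function in the middle, the value in the third line is still 1, while the values in the first two lines each increase by 1. Inserting a function shifted the basic blocks of main back by one position, so their IDs grew by 1.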
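
To make the roles of guard and *guard concrete, here is a hand-written approximation of what the instrumented example might look like. It is only a sketch: the placement of the three guards is an assumption (the real compiler decides where edges are instrumented), and TracePcGuard, InitGuards, InstrumentedMain, g_guards, and g_trace are hypothetical stand-ins, not LLVM symbols.

```cpp
#include <cassert>
#include <cstdint>

uint32_t g_guards[3];   // hypothetical: one guard variable per instrumented point
uint32_t g_trace[8];    // guard IDs in the order they were reported
int g_trace_len;

// Stand-in for __sanitizer_cov_trace_pc_guard:
// `guard` is the address of a guard variable, `*guard` is the ID stored in it.
void TracePcGuard(uint32_t *guard) {
  if (!*guard) return;
  assert(g_trace_len < 8);
  g_trace[g_trace_len++] = *guard;
}

// Stand-in for __sanitizer_cov_trace_pc_guard_init: number the guards 1, 2, 3.
void InitGuards() {
  uint32_t n = 0;
  for (uint32_t &g : g_guards) g = ++n;
}

// foo() comes first in the source file, so its guard receives ID 1.
void foo() { TracePcGuard(&g_guards[0]); }

int InstrumentedMain(int argc) {
  TracePcGuard(&g_guards[1]);      // entry of main, ID 2
  if (argc > 1) {
    TracePcGuard(&g_guards[2]);    // edge taken only when argc > 1, ID 3
    foo();
  }
  return 0;
}
```

Adding another function between foo and main would give it the next guard slot, which pushes the IDs used in main up by one while the ID of foo stays at 1.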

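3. How should start and stop be understood? [start, stop)

Answer: Take the trace-pc-guard example from the official LLVM documentation. Compiling and running it with the documented commands gives the following result:

From the result, start is 0x6f9c70 and stop is 0x6f9c80.

The address that start points to is exactly the guard address shown in the last line of the output, which belongs to the instrumentation inserted into foo(), and that guard's ID is 1.

The comment says that the guards occupy the range [start, stop), closed on the left and open on the right. So stop does not point to the last guard (address 0x6f9c7c, ID 4) but to the position right after it: 0x6f9c7c + 0x4 (each guard is 4 bytes) = 0x6f9c80.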
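
The address arithmetic can be checked with a small sketch. GuardCount and LastGuardAddress are hypothetical helper names, and the addresses are the ones reported in the run described above; they are treated as plain integers and never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of guards in the half-open range [start, stop).
size_t GuardCount(uintptr_t start, uintptr_t stop) {
  assert(start <= stop && (stop - start) % sizeof(uint32_t) == 0);
  return (stop - start) / sizeof(uint32_t);
}

// Address of the last guard; `stop` itself is one guard past it.
uintptr_t LastGuardAddress(uintptr_t start, uintptr_t stop) {
  assert(GuardCount(start, stop) > 0);
  return stop - sizeof(uint32_t);
}
```

For start = 0x6f9c70 and stop = 0x6f9c80 the range is 0x10 bytes long, which is 4 guards of 4 bytes each, and the last guard sits at 0x6f9c7c.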

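The memory region [start, stop) holds the ID table of the instrumented edges, that is, the list of guards.

__sanitizer_cov_trace_pc_guard_init initializes the ID table in this region.

A custom implementation of this function must include that initialization. Without it, every guard stays 0 and __sanitizer_cov_trace_pc_guard produces no output, as shown below:

With the initialization code commented out, the program's output is as shown in the figure:

Note: for the code example used here, see https://github.com/lcatro/Source-and-Fuzzing/blob/master/12.%E6%B7%B1%E5%85%A5%E8%A7%A3%E6%9E%90libfuzzer%E4%B8%8Easan.md

If it is not commented out, that is, the region is initialized, the output is as follows: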

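4. Where is the number of executions of each edge counted?

Does libFuzzer not count how many times an edge executes?

Answer: It does. It appears to keep the counts in a global array, using the PC to determine which edge is involved and using that edge as the index.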
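
One documented way to obtain such a per-edge counter array is the -fsanitize-coverage=inline-8bit-counters mode, in which the compiler increments one 8-bit counter per edge inline and reports the array through __sanitizer_cov_8bit_counters_init. To my understanding this is the mode libFuzzer builds rely on, but the sketch below is only an illustration of reading such an array, not libFuzzer's code. g_counters_start, g_counters_end, EdgesHit, and HitCount are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals that remember where the counter array lives.
char *g_counters_start = nullptr;
char *g_counters_end = nullptr;

// Called by the instrumented module with its counter array [start, end).
// This sketch keeps only the most recently reported range.
extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  if (start == end) return;
  g_counters_start = start;
  g_counters_end = end;
}

// Hypothetical helper: number of edges whose counter is nonzero.
size_t EdgesHit() {
  size_t hit = 0;
  for (char *c = g_counters_start; c != g_counters_end; c++)
    if (*c) hit++;
  return hit;
}

// Hypothetical helper: counter value of one edge. Each counter is only
// 8 bits wide, so it cannot represent more than 255 executions.
uint8_t HitCount(size_t edge_index) {
  assert(edge_index < static_cast<size_t>(g_counters_end - g_counters_start));
  return static_cast<uint8_t>(g_counters_start[edge_index]);
}
```

In this mode the index of an edge is its position in the counter array rather than a value computed from the PC at run time.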

How does libFuzzer measure coverage? Does it track path coverage at all?

5. How can path coverage be derived from edge coverage?

What exactly does a path look like?

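What values can ASAN_OPTIONS=coverage take?

In the sanitizer run-time, the coverage option in the ASAN_OPTIONS environment variable is an on/off flag rather than a level selector:

coverage=0: the run-time does not dump coverage information. This is the default.
coverage=1: at program exit, the run-time dumps the covered PCs to a .sancov file, provided the binary was built with -fsanitize-coverage instrumentation.

The granularity is not selected through ASAN_OPTIONS. It is chosen at compile time, for example with -fsanitize-coverage=func, bb, or edge. Values such as coverage=2, 3, or 4 are not described in the official documentation as selecting PC-table, function-level, or depth-based modes.
Note that the available options can differ between compilers, operating systems, and tool versions, so check the official documentation for the version in use.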