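ELF and DWARF are two components you may never have heard of but probably use almost every day. ELF (Executable and Linkable Format) is the most widely used object file format in the Linux world. It specifies how the different parts of a binary are stored, such as code, static data, debug information, and strings. It also tells the loader how to take the binary and get it ready to execute, which involves loading the different parts of the binary into memory and fixing up (relocating) data according to where other components end up. I won't cover much more of ELF in this post, but if you're interested you can look at this wonderful infographic or the ELF standard.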
```cpp
int main() {
    long a = 3;
    long b = 2;
    long c = a + b;
    a = 4;
}
```
```
.debug_line: line number info for a single cu
Source lines (from CU-DIE at .debug_info offset 0x0000000b):

            NS new statement, BB new basic block, ET end of text sequence
            PE prologue end, EB epilogue begin
            IS=val ISA number, DI=val discriminator value
<pc>        [lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
0x00400670  [   1, 0] NS uri: "/home/simon/play/MiniDbg/examples/variable.cpp"
0x00400676  [   2,10] NS PE
0x0040067e  [   3,10] NS
0x00400686  [   4,14] NS
0x0040068a  [   4,16]
0x0040068e  [   4,10]
0x00400692  [   5, 7] NS
0x0040069a  [   6, 1] NS
0x0040069c  [   6, 1] NS ET
```
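So what do we do if we want to set a breakpoint on line 4 of variable.cpp? We look up the entries for that file, find the relevant line number, take the address mapped to it, and set a breakpoint there. In our small program, that is this entry:

```
0x00400686  [   4,14] NS
```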
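The reverse works the same way. If we have a memory location, say the value of RIP, and want to find out where that is in the source, we find the closest mapped address in the line table that is not greater than it and take the line number from that entry.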
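Both directions of this lookup can be sketched over a plain in-memory copy of the table. This is a minimal illustration, not how a DWARF library stores line tables: `LineEntry`, `example_table`, `address_for_line`, and `line_for_address` are hypothetical names, and the rows are copied by hand from the dump above. The sketch also ignores the end-of-sequence (`ET`) flag, which a real implementation would use to reject addresses past the end of the code.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// One row of the line table (hypothetical struct, not a library type).
struct LineEntry {
    std::uint64_t address;
    unsigned line;
    unsigned column;
    bool is_stmt;  // the "NS" flag: this address starts a new statement
};

// Rows copied from the dwarfdump output above, sorted by address.
inline std::vector<LineEntry> example_table() {
    return {
        {0x00400670, 1, 0, true},
        {0x00400676, 2, 10, true},
        {0x0040067e, 3, 10, true},
        {0x00400686, 4, 14, true},
        {0x0040068a, 4, 16, false},
        {0x0040068e, 4, 10, false},
        {0x00400692, 5, 7, true},
        {0x0040069a, 6, 1, true},
        {0x0040069c, 6, 1, true},
    };
}

// Source line -> address: the first statement-start entry for that line.
// Returns false if the line has no entry.
bool address_for_line(const std::vector<LineEntry>& table, unsigned line,
                      std::uint64_t& address) {
    for (const auto& entry : table) {
        if (entry.is_stmt && entry.line == line) {
            address = entry.address;
            return true;
        }
    }
    return false;
}

// Address -> source line: the last entry whose address is <= pc.
// Returns false if pc lies before the first entry.
bool line_for_address(const std::vector<LineEntry>& table, std::uint64_t pc,
                      unsigned& line) {
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint64_t value, const LineEntry& e) { return value < e.address; });
    if (it == table.begin()) {
        return false;
    }
    line = std::prev(it)->line;
    return true;
}
```

With this table, asking for line 4 yields `0x00400686`, and an address in the middle of that statement such as `0x0040068c` maps back to line 4.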
```
< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              clang version 3.9.1 (tags/RELEASE_391/final)
                    DW_AT_language              DW_LANG_C_plus_plus
                    DW_AT_name                  /super/secret/path/MiniDbg/examples/variable.cpp
                    DW_AT_stmt_list             0x00000000
                    DW_AT_comp_dir              /super/secret/path/MiniDbg/build
                    DW_AT_low_pc                0x00400670
                    DW_AT_high_pc               0x0040069c

LOCAL_SYMBOLS:
< 1><0x0000002e>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400670
                      DW_AT_high_pc               0x0040069c
                      DW_AT_frame_base            DW_OP_reg6
                      DW_AT_name                  main
                      DW_AT_decl_file             0x00000001 /super/secret/path/MiniDbg/examples/variable.cpp
                      DW_AT_decl_line             0x00000001
                      DW_AT_type                  <0x00000077>
                      DW_AT_external              yes(1)
< 2><0x0000004c>      DW_TAG_variable
                        DW_AT_location              DW_OP_fbreg -8
                        DW_AT_name                  a
                        DW_AT_decl_file             0x00000001 /super/secret/path/MiniDbg/examples/variable.cpp
                        DW_AT_decl_line             0x00000002
                        DW_AT_type                  <0x0000007e>
< 2><0x0000005a>      DW_TAG_variable
                        DW_AT_location              DW_OP_fbreg -16
                        DW_AT_name                  b
                        DW_AT_decl_file             0x00000001 /super/secret/path/MiniDbg/examples/variable.cpp
                        DW_AT_decl_line             0x00000003
                        DW_AT_type                  <0x0000007e>
< 2><0x00000068>      DW_TAG_variable
                        DW_AT_location              DW_OP_fbreg -24
                        DW_AT_name                  c
                        DW_AT_decl_file             0x00000001 /super/secret/path/MiniDbg/examples/variable.cpp
                        DW_AT_decl_line             0x00000004
                        DW_AT_type                  <0x0000007e>
< 1><0x00000077>    DW_TAG_base_type
                      DW_AT_name                  int
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000004
< 1><0x0000007e>    DW_TAG_base_type
                      DW_AT_name                  long int
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000008
```
```
DW_AT_producer   clang version 3.9.1 (tags/RELEASE_391/final)        <-- The compiler which produced this binary
DW_AT_language   DW_LANG_C_plus_plus                                 <-- The source language
DW_AT_name       /super/secret/path/MiniDbg/examples/variable.cpp    <-- The name of the file which this CU represents
DW_AT_stmt_list  0x00000000                                          <-- An offset into the line table which tracks this CU
DW_AT_comp_dir   /super/secret/path/MiniDbg/build                    <-- The compilation directory
DW_AT_low_pc     0x00400670                                          <-- The start of the code for this CU
DW_AT_high_pc    0x0040069c                                          <-- The end of the code for this CU
```
```
for each compile unit:
    if the pc is between DW_AT_low_pc and DW_AT_high_pc:
        for each function in the compile unit:
            if the pc is between DW_AT_low_pc and DW_AT_high_pc:
                return function information
```
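The pseudocode above translates fairly directly into C++. The sketch below uses hypothetical, simplified stand-ins for DIEs (`Function`, `CompileUnit`, `find_function`) rather than a real DWARF parser, and assumes `DW_AT_high_pc` holds the first address past the end of the code, so the ranges are treated as half-open.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for a DW_TAG_subprogram DIE (hypothetical type).
struct Function {
    std::string name;
    std::uint64_t low_pc;
    std::uint64_t high_pc;  // first address past the end of the function
};

// Simplified stand-in for a DW_TAG_compile_unit DIE (hypothetical type).
struct CompileUnit {
    std::string name;
    std::uint64_t low_pc;
    std::uint64_t high_pc;
    std::vector<Function> functions;
};

// True if pc lies in the half-open range [low, high).
inline bool in_range(std::uint64_t low, std::uint64_t high, std::uint64_t pc) {
    return low <= pc && pc < high;
}

// Returns the function whose range covers pc, or nullptr if there is none.
const Function* find_function(const std::vector<CompileUnit>& units,
                              std::uint64_t pc) {
    for (const auto& cu : units) {
        if (!in_range(cu.low_pc, cu.high_pc, pc)) {
            continue;  // pc is not in this compile unit at all
        }
        for (const auto& fn : cu.functions) {
            if (in_range(fn.low_pc, fn.high_pc, pc)) {
                return &fn;
            }
        }
    }
    return nullptr;
}
```

Checking the compile unit range first means the functions of unrelated units are never scanned, which is the point of the outer test in the pseudocode.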
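Once we've found the function, we could simply set a breakpoint at the address given by DW_AT_low_pc. However, that would break at the start of the function prologue, and it is preferable to break at the start of the user code. Since the line table can mark the address where the prologue ends, we can look up the value of DW_AT_low_pc in the line table and keep reading until we reach the entry flagged as the prologue end. Some compilers don't emit this flag, so another option is to set the breakpoint at the address given by the second line entry for that function.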
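For `main` in our example program, the DIE is:

```
< 1><0x0000002e>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400670
                      DW_AT_high_pc               0x0040069c
                      DW_AT_frame_base            DW_OP_reg6
                      DW_AT_name                  main
                      DW_AT_decl_file             0x00000001 /super/secret/path/MiniDbg/examples/variable.cpp
                      DW_AT_decl_line             0x00000001
                      DW_AT_type                  <0x00000077>
                      DW_AT_external              yes(1)
```

Its DW_AT_low_pc, 0x00400670, matches this line table entry:

```
0x00400670  [   1, 0] NS uri: "/super/secret/path/MiniDbg/examples/variable.cpp"
```

The next entry carries the prologue end (PE) flag, so the breakpoint goes at 0x00400676:

```
0x00400676  [   2,10] NS PE
```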
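The prologue-skipping rule, including the fallback for compilers that omit the prologue end flag, could look roughly like this. `LineRow` and `breakpoint_address` are hypothetical names for illustration, and the rows are assumed to be sorted by address as they are in the dump.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A line table row reduced to what this decision needs (hypothetical type).
struct LineRow {
    std::uint64_t address;
    unsigned line;
    bool prologue_end;  // the "PE" flag
};

// Chooses where to break for a function covering [low_pc, high_pc).
// Falls back to low_pc itself if the line table offers nothing better.
std::uint64_t breakpoint_address(const std::vector<LineRow>& rows,
                                 std::uint64_t low_pc, std::uint64_t high_pc) {
    // Find the row for the function's first instruction.
    std::size_t start = rows.size();
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].address == low_pc) {
            start = i;
            break;
        }
    }
    if (start == rows.size()) {
        return low_pc;
    }

    // Preferred: the first row inside the function flagged as prologue end.
    for (std::size_t i = start; i < rows.size() && rows[i].address < high_pc; ++i) {
        if (rows[i].prologue_end) {
            return rows[i].address;
        }
    }

    // Fallback: the second line entry of the function, if it has one.
    if (start + 1 < rows.size() && rows[start + 1].address < high_pc) {
        return rows[start + 1].address;
    }
    return low_pc;
}
```

For the entries above this selects 0x00400676. If the PE flag were missing, the fallback would pick whichever row follows the function's first one.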
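Reading variables can be very complex. They are elusive things which can move around throughout a function, sit in registers, be placed in memory, be optimised out, hide in a corner, and so on. Fortunately, our simple example really is simple. If we want to read the contents of variable `a`, we need to look at its DW_AT_location attribute:

```
DW_AT_location              DW_OP_fbreg -8
```

`DW_OP_fbreg -8` means the variable lives at an offset of -8 from the frame base, and the DW_AT_frame_base of `main` is `DW_OP_reg6`. On x86-64, reg6 is RBP, as specified by the System V x86_64 ABI. So we read the contents of RBP, subtract 8, and we have the address of our variable. To actually make sense of the value stored there, we also need to look at its type: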
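```
< 2><0x0000004c>      DW_TAG_variable
                        DW_AT_name                  a
                        DW_AT_type                  <0x0000007e>
```

Following the DW_AT_type offset `<0x0000007e>` leads to this DIE, which describes an 8-byte signed integer:

```
< 1><0x0000007e>    DW_TAG_base_type
                      DW_AT_name                  long int
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000008
```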
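Of course, types can get far more complex than this, since they have to be able to express things like C++ types, but this gives the basic idea of how they work.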
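The arithmetic for variable `a` is small enough to sketch. The two helpers below are hypothetical (`fbreg_address`, `decode_long`): the first applies a `DW_OP_fbreg` offset to a frame base value, the second interprets 8 raw bytes as the signed 8-byte type from the DIE. It assumes the bytes are already in host byte order; in a debugger they would have to be read from the traced process first, which is not shown here.

```cpp
#include <cstdint>
#include <cstring>

// Address of a variable described by "DW_OP_fbreg <offset>".
// frame_base is the value of the frame base register (RBP in our example).
inline std::uint64_t fbreg_address(std::uint64_t frame_base, std::int64_t offset) {
    // Unsigned addition wraps, so negative offsets subtract as intended.
    return frame_base + static_cast<std::uint64_t>(offset);
}

// Interprets 8 raw bytes as the DW_ATE_signed, DW_AT_byte_size 8 base type.
inline std::int64_t decode_long(const unsigned char* bytes) {
    std::int64_t value;
    std::memcpy(&value, bytes, sizeof value);  // avoids aliasing problems
    return value;
}
```

For a frame base of `0x7ffc0000`, `fbreg_address(0x7ffc0000, -8)` gives `0x7ffbfff8`, which is where `a` would be read from.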
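Coming back to RBP for a moment: Clang was nice enough to track the frame base through RBP. Recent versions of GCC tend to prefer DW_OP_call_frame_cfa, which involves parsing the .eh_frame ELF section, and that is an entirely different article which I don't plan to write. If you tell GCC to use DWARF 2 instead of a more recent version, it will tend to output location lists, which are somewhat easier to read: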
```
DW_AT_frame_base
 low-off : 0x00000000 addr  0x00400696 high-off  0x00000001 addr 0x00400697>DW_OP_breg7+8
 low-off : 0x00000001 addr  0x00400697 high-off  0x00000004 addr 0x0040069a>DW_OP_breg7+16
 low-off : 0x00000004 addr  0x0040069a high-off  0x00000031 addr 0x004006c7>DW_OP_breg6+16
 low-off : 0x00000031 addr  0x004006c7 high-off  0x00000032 addr 0x004006c8>DW_OP_breg7+8
```
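A location list gives a different location expression depending on where the program counter is. Evaluating one amounts to finding the entry whose address range covers the current PC and applying its expression. Here is a minimal sketch restricted to the `DW_OP_bregN+offset` form seen in the dump; `LocEntry`, `example_loclist`, `entry_for_pc`, and `frame_base` are hypothetical names, and the ranges are assumed to be half-open.

```cpp
#include <cstdint>
#include <vector>

// One location list entry, reduced to "register + offset" (hypothetical type).
struct LocEntry {
    std::uint64_t low;   // first address covered
    std::uint64_t high;  // first address no longer covered
    int dwarf_reg;       // 6 is RBP and 7 is RSP in the x86-64 DWARF numbering
    std::int64_t offset;
};

// Entries copied by hand from the dump above.
inline std::vector<LocEntry> example_loclist() {
    return {
        {0x00400696, 0x00400697, 7, 8},
        {0x00400697, 0x0040069a, 7, 16},
        {0x0040069a, 0x004006c7, 6, 16},
        {0x004006c7, 0x004006c8, 7, 8},
    };
}

// Returns the entry covering pc, or nullptr if no entry applies.
const LocEntry* entry_for_pc(const std::vector<LocEntry>& list, std::uint64_t pc) {
    for (const auto& entry : list) {
        if (entry.low <= pc && pc < entry.high) {
            return &entry;
        }
    }
    return nullptr;
}

// Applies the entry, given the current value of the register it names.
inline std::uint64_t frame_base(const LocEntry& entry, std::uint64_t reg_value) {
    return reg_value + static_cast<std::uint64_t>(entry.offset);
}
```

In this list the frame base is RSP+8 at the very first instruction, RSP+16 for the next three bytes, and RBP+16 for the body of the function.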
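If you want to learn more about DWARF, you can read the standard itself. At the time of writing, DWARF 5 had just been released, but DWARF 4 was still the more widely used version.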
```cpp
class debugger {
public:
    debugger (std::string prog_name, pid_t pid)
         : m_prog_name{std::move(prog_name)}, m_pid{pid} {
        // Load the ELF file, then the DWARF data inside it
        auto fd = open(m_prog_name.c_str(), O_RDONLY);

        m_elf = elf::elf{elf::create_mmap_loader(fd)};
        m_dwarf = dwarf::dwarf{dwarf::elf::create_loader(m_elf)};
    }
    //...

private:
    //...
    dwarf::dwarf m_dwarf;
    elf::elf m_elf;
};
```
## Debug information primitives

Next we can implement functions that retrieve line entries and function DIEs from the value of RIP. Let's start with `get_function_from_pc`:

```cpp
dwarf::die debugger::get_function_from_pc(uint64_t pc) {
    for (auto &cu : m_dwarf.compilation_units()) {
        if (die_pc_range(cu.root()).contains(pc)) {
            for (const auto& die : cu.root()) {
                if (die.tag == dwarf::DW_TAG::subprogram) {
                    if (die_pc_range(die).contains(pc)) {
                        return die;
                    }
                }
            }
        }
    }

    throw std::out_of_range{"Cannot find function"};
}
```
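Here I take a naive approach: iterate through the compilation units until one contains the value of RIP, then iterate through its children until the relevant function (DW_TAG_subprogram) turns up. As mentioned in the previous post, you could also handle things like member functions and inlined functions here if you wanted to.

Next is `get_line_entry_from_pc`: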
```cpp
dwarf::line_table::iterator debugger::get_line_entry_from_pc(uint64_t pc) {
    for (auto &cu : m_dwarf.compilation_units()) {
        if (die_pc_range(cu.root()).contains(pc)) {
            auto &lt = cu.get_line_table();
            auto it = lt.find_address(pc);
            if (it == lt.end()) {
                throw std::out_of_range{"Cannot find line entry"};
            }
            else {
                return it;
            }
        }
    }

    throw std::out_of_range{"Cannot find line entry"};
}
```
`print_source` prints a window of source lines around a given line and marks the current line with a cursor:

```cpp
void debugger::print_source(const std::string& file_name, unsigned line, unsigned n_lines_context) {
    std::ifstream file {file_name};

    //Work out a window around the desired line
    auto start_line = line <= n_lines_context ? 1 : line - n_lines_context;
    auto end_line = line + n_lines_context + (line < n_lines_context ? n_lines_context - line : 0) + 1;

    char c{};
    auto current_line = 1u;
    //Skip lines up until start_line
    while (current_line != start_line && file.get(c)) {
        if (c == '\n') {
            ++current_line;
        }
    }

    //Output cursor if we're at the current line
    std::cout << (current_line==line ? "> " : "  ");

    //Write lines up until end_line
    while (current_line <= end_line && file.get(c)) {
        std::cout << c;
        if (c == '\n') {
            ++current_line;
            //Output cursor if we're at the current line
            std::cout << (current_line==line ? "> " : "  ");
        }
    }

    //Write newline and make sure that the stream is flushed properly
    std::cout << std::endl;
}
```
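We want to be able to report which signal was sent to the process, and we also want to know how it was produced. For example, we want to tell whether a SIGTRAP was sent because we hit a breakpoint, because a single step completed, because a new thread was spawned, and so on.

Fortunately, ptrace comes to our rescue again. One of the possible requests to ptrace is PTRACE_GETSIGINFO, which returns information about the last signal the process was sent. We use it like so: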
```cpp
siginfo_t debugger::get_signal_info() {
    siginfo_t info;
    ptrace(PTRACE_GETSIGINFO, m_pid, nullptr, &info);
    return info;
}
```
The `siginfo_t` object we get back provides the following information:

```cpp
siginfo_t {
    int      si_signo;     /* Signal number */
    int      si_errno;     /* An errno value */
    int      si_code;      /* Signal code */
    int      si_trapno;    /* Trap number that caused
                              hardware-generated signal
                              (unused on most architectures) */
    pid_t    si_pid;       /* Sending process ID */
    uid_t    si_uid;       /* Real user ID of sending process */
    int      si_status;    /* Exit value or signal */
    clock_t  si_utime;     /* User time consumed */
    clock_t  si_stime;     /* System time consumed */
    sigval_t si_value;     /* Signal value */
    int      si_int;       /* POSIX.1b signal */
    void    *si_ptr;       /* POSIX.1b signal */
    int      si_overrun;   /* Timer overrun count;
                              POSIX.1b timers */
    int      si_timerid;   /* Timer ID; POSIX.1b timers */
    void    *si_addr;      /* Memory location which caused fault */
    long     si_band;      /* Band event (was int in
                              glibc 2.3.2 and earlier) */
    int      si_fd;        /* File descriptor */
    short    si_addr_lsb;  /* Least significant bit of address
                              (since Linux 2.6.32) */
    void    *si_lower;     /* Lower bound when address violation
                              occurred (since Linux 3.19) */
    void    *si_upper;     /* Upper bound when address violation
                              occurred (since Linux 3.19) */
    int      si_pkey;      /* Protection key on PTE that caused
                              fault (since Linux 4.6) */
    void    *si_call_addr; /* Address of system call instruction
                              (since Linux 3.5) */
    int      si_syscall;   /* Number of attempted system call
                              (since Linux 3.5) */
    unsigned int si_arch;  /* Architecture of attempted system call
                              (since Linux 3.5) */
}
```
With that information available, `wait_for_signal` can dispatch on the signal number:

```cpp
void debugger::wait_for_signal() {
    int wait_status;
    auto options = 0;
    waitpid(m_pid, &wait_status, options);

    auto siginfo = get_signal_info();

    switch (siginfo.si_signo) {
    case SIGTRAP:
        handle_sigtrap(siginfo);
        break;
    case SIGSEGV:
        std::cout << "Yay, segfault. Reason: " << siginfo.si_code << std::endl;
        break;
    default:
        std::cout << "Got signal " << strsignal(siginfo.si_signo) << std::endl;
    }
}
```

SIGTRAP gets its own handler, which inspects `si_code` to work out why the trap was raised:

```cpp
void debugger::handle_sigtrap(siginfo_t info) {
    switch (info.si_code) {
    //one of these will be set if a breakpoint was hit
    case SI_KERNEL:
    case TRAP_BRKPT:
    {
        set_pc(get_pc()-1); //put the pc back where it should be
        std::cout << "Hit breakpoint at address 0x" << std::hex << get_pc() << std::endl;
        auto line_entry = get_line_entry_from_pc(get_pc());
        print_source(line_entry->file->path, line_entry->line);
        return;
    }
    //this will be set if the signal was sent by single stepping
    case TRAP_TRACE:
        return;
    default:
        std::cout << "Unknown SIGTRAP code " << info.si_code << std::endl;
        return;
    }
}
```
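The classification step can also be pulled out into a small pure function, which makes it easy to check without a traced process. This is only a sketch: `describe_sigtrap` is a hypothetical helper that is not part of the debugger above, and it assumes a Linux toolchain where `<csignal>` exposes `SI_KERNEL`, `TRAP_BRKPT`, and `TRAP_TRACE`.

```cpp
#include <csignal>
#include <string>

// Maps the si_code of a SIGTRAP to a short description (hypothetical helper).
std::string describe_sigtrap(int si_code) {
    switch (si_code) {
    // Either of these is reported when a breakpoint was hit
    case SI_KERNEL:
    case TRAP_BRKPT:
        return "breakpoint";
    // Reported when a single step completed
    case TRAP_TRACE:
        return "single step";
    default:
        return "unknown (" + std::to_string(si_code) + ")";
    }
}
```

Keeping the mapping separate from the side effects, such as resetting the program counter and printing source, is a design choice rather than something the original code requires.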
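There are a bunch of different flavours of signal which you could handle. See `man sigaction` for details.

Since we now correct RIP when we receive the SIGTRAP, we can remove that code from `step_over_breakpoint`, which now looks like this: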
```cpp
void debugger::step_over_breakpoint() {
    if (m_breakpoints.count(get_pc())) {
        auto& bp = m_breakpoints[get_pc()];
        if (bp.is_enabled()) {
            bp.disable();
            ptrace(PTRACE_SINGLESTEP, m_pid, nullptr, nullptr);
            wait_for_signal();
            bp.enable();
        }
    }
}
```