
Sunday, May 23, 2021

Now that we have a high-level idea of what binaries look like and how they work, we’re ready to dive into a real binary format. Executable and Linkable Format (ELF) is the default binary format on Linux-based systems, while Portable Executable (PE) is the default on Windows systems. In this series, we will discuss the Linux-based ELF format. ELF is used for executable files, object files, shared libraries, and core dumps. We’ll focus on ELF executables here, but the same concepts apply to other ELF file types. Let’s take a look inside an ELF binary.




1. Executable header:

Every ELF file starts with an executable header, which is just a structured series of bytes telling you that it’s an ELF file, what kind of ELF file it is, and where in the file to find all the other contents. The figure below shows the type definition for the 64-bit ELF executable header from /usr/include/elf.h.



e_ident: The executable header (and the ELF file) starts with a 16-byte array called e_ident. The e_ident array always starts with a 4-byte “magic value” identifying the file as an ELF binary. The magic value consists of the hexadecimal number 0x7f, followed by the ASCII character codes for the letters E, L, and F. Having these bytes right at the start is convenient because it allows tools such as the file utility, as well as specialized tools such as the binary loader, to quickly discover that they’re dealing with an ELF file.
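The magic value check described above is easy to try yourself. Here is a minimal Python sketch; the helper names is_elf and is_elf_file are made up for this post and are not part of any library.

```python
# The 4-byte ELF magic: 0x7f followed by ASCII 'E', 'L', 'F'.
ELF_MAGIC = b"\x7fELF"


def is_elf(data: bytes) -> bool:
    """Return True if the buffer starts with the ELF magic value."""
    return data[:4] == ELF_MAGIC


def is_elf_file(path: str) -> bool:
    """Read only the first four bytes of a file and check them."""
    with open(path, "rb") as f:
        return is_elf(f.read(4))


# A buffer that begins like an ELF file, and one that does not.
print(is_elf(b"\x7fELF\x02\x01\x01"))
print(is_elf(b"MZ\x90\x00"))
```

Only four bytes need to be read, which is why this kind of check is so cheap for tools to perform.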


The EI_CLASS byte denotes whether the binary is for a 32-bit or 64-bit architecture. In the former case, the EI_CLASS byte is set to the constant ELFCLASS32 (which is equal to 1), while in the latter case, it’s set to ELFCLASS64 (equal to 2).


The EI_DATA byte indicates the endianness of the binary. A value of ELFDATA2LSB (equal to 1) indicates little-endian, while ELFDATA2MSB (equal to 2) means big-endian.


The EI_VERSION byte indicates the version of the ELF specification used when creating the binary. Currently, the only valid value is EV_CURRENT, which is defined to be equal to 1.


The EI_OSABI and EI_ABIVERSION bytes denote information regarding the application binary interface (ABI) and operating system (OS) for which the binary was compiled. If the EI_OSABI byte is set to a nonzero value, it means that some ABI- or OS-specific extensions are used in the ELF file. The default value of zero indicates that the binary targets the UNIX System V ABI.


The EI_ABIVERSION byte denotes the specific version of the ABI indicated in the EI_OSABI byte that the binary targets. You’ll usually see this set to zero because it’s not necessary to specify any version information when the default EI_OSABI is used.


The EI_PAD field (byte indexes 9 through 15) is reserved for possible future use and is currently set to zero.

You can see the e_ident array of any ELF binary by using readelf.


The e_ident array is shown on the line marked Magic. It starts with the familiar four magic bytes, followed by a value of 2 (indicating ELFCLASS64), then a 1 (ELFDATA2LSB), and finally another 1 (EV_CURRENT). The remaining bytes are all zeroed out since the EI_OSABI and EI_ABIVERSION bytes are at their default values; the padding bytes are all set to zero as well. The information contained in some of the bytes is explicitly repeated on dedicated lines, marked Class, Data, Version, OS/ABI, and ABI Version, respectively.
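To make the byte layout concrete, here is a small Python sketch that decodes the same 16 bytes by hand. The index constants mirror the names in elf.h; the function name describe_ident and the sample bytes are made up for illustration, with the sample following the values described above (class 2, data 1, version 1, everything else zero).

```python
# Byte indexes into e_ident, mirroring the names used in elf.h.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION, EI_PAD = 4, 5, 6, 7, 8, 9

CLASS_NAMES = {1: "ELFCLASS32", 2: "ELFCLASS64"}
DATA_NAMES = {1: "ELFDATA2LSB", 2: "ELFDATA2MSB"}


def describe_ident(e_ident: bytes) -> dict:
    """Decode the individual bytes of a 16-byte e_ident array."""
    if len(e_ident) != 16:
        raise ValueError("e_ident must be exactly 16 bytes")
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic value")
    return {
        "class": CLASS_NAMES.get(e_ident[EI_CLASS], "unknown"),
        "data": DATA_NAMES.get(e_ident[EI_DATA], "unknown"),
        "version": e_ident[EI_VERSION],
        "osabi": e_ident[EI_OSABI],
        "abiversion": e_ident[EI_ABIVERSION],
        # Bytes 9 through 15 are padding and should all be zero.
        "padding_is_zero": not any(e_ident[EI_PAD:]),
    }


# Sample e_ident: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT, then zeros.
sample = bytes.fromhex("7f454c46020101" + "00" * 9)
for key, value in describe_ident(sample).items():
    print(f"{key}: {value}")
```

To try it on a real binary, you could pass in the first 16 bytes read from the file in binary mode.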


e_type: This field specifies the type of the binary. Common values are ET_REL (a relocatable object file), ET_EXEC (an executable binary), and ET_DYN (a dynamic library, also called a shared object file).

e_machine: This field denotes the architecture that the binary is intended to run on. Here it is set to EM_X86_64. Other values include EM_386 (32-bit x86) and EM_ARM (for ARM binaries).

e_version: This field serves the same role as the EI_VERSION byte in the e_ident array.

e_entry: This field denotes the entry point of the binary. This is the virtual address at which execution should start, that is, where the interpreter (typically ld-linux.so) transfers control after it finishes loading the binary into virtual memory.

e_phoff: This field indicates the file offset to the beginning of the program header table, which means the number of bytes you should read into the file to get to the table.

e_shoff: This field indicates the file offset to the beginning of the section header table.

e_flags: This field provides room for flags specific to the architecture for which the binary is compiled. For x86 binaries, e_flags is typically set to zero.

e_ehsize: This field specifies the size of the executable header, in bytes. For 64-bit x86 binaries, the executable header size is always 64 bytes, as you can see in the readelf output, while it’s 52 bytes for 32-bit x86 binaries.

e_phnum: Number of program headers in the binary.

e_phentsize: Size of each program header, in bytes.

e_shnum: Number of section headers in the binary.

e_shentsize: Size of each section header, in bytes.

e_shstrndx: This field contains the index (in the section header table) of the header associated with a special string table section, called .shstrtab, which stores the names of all the sections in the binary.
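Putting the fields together, here is a Python sketch that unpacks a 64-bit executable header with the standard struct module. It assumes a little-endian 64-bit ELF file and rejects anything else. The names parse_ehdr64 and Elf64Ehdr are made up for this post, and the demo header uses invented values rather than data from a real binary. Note that in the actual struct layout e_phentsize comes before e_phnum, and e_shentsize before e_shnum.

```python
import struct
from collections import namedtuple

# Little-endian layout of Elf64_Ehdr: 16-byte e_ident, two 16-bit fields,
# one 32-bit field, three 64-bit fields, one 32-bit field, six 16-bit fields.
EHDR64_FORMAT = "<16sHHIQQQIHHHHHH"

Elf64Ehdr = namedtuple(
    "Elf64Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

ET_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN"}
EM_NAMES = {3: "EM_386", 40: "EM_ARM", 62: "EM_X86_64"}


def parse_ehdr64(data: bytes) -> Elf64Ehdr:
    """Unpack the executable header from the first 64 bytes of a file."""
    if len(data) < struct.calcsize(EHDR64_FORMAT):
        raise ValueError("buffer too small for a 64-bit ELF header")
    hdr = Elf64Ehdr._make(struct.unpack_from(EHDR64_FORMAT, data))
    if hdr.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic value")
    # e_ident[4] is EI_CLASS, e_ident[5] is EI_DATA.
    if hdr.e_ident[4] != 2 or hdr.e_ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return hdr


# Hand-built demo header; every value here is invented for illustration.
demo_header = struct.pack(
    EHDR64_FORMAT,
    bytes.fromhex("7f454c46020101" + "00" * 9),  # e_ident
    2,         # e_type: ET_EXEC
    62,        # e_machine: EM_X86_64
    1,         # e_version: EV_CURRENT
    0x401000,  # e_entry
    64,        # e_phoff
    0x2000,    # e_shoff
    0,         # e_flags
    64,        # e_ehsize
    56,        # e_phentsize
    2,         # e_phnum
    64,        # e_shentsize
    5,         # e_shnum
    4,         # e_shstrndx
)

hdr = parse_ehdr64(demo_header)
print(ET_NAMES.get(hdr.e_type, "other"), EM_NAMES.get(hdr.e_machine, "other"))
print("entry point:", hex(hdr.e_entry))
print("section header table at offset", hdr.e_shoff, "with", hdr.e_shnum, "entries")
```

The format string adds up to exactly 64 bytes, which matches the e_ehsize value for 64-bit binaries mentioned above.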



To make the following discussion easier to follow, we’ll discuss section headers and sections before program headers.


2. Section Headers:

The code and data in an ELF binary are logically divided into contiguous non-overlapping chunks called sections. Sections don’t have any predetermined structure; instead, the properties of each section are described by a section header, which also allows you to locate the bytes belonging to the section. Let’s begin by discussing the format of the section headers.



sh_name: Section name (string table index). If set, it contains an index into the section name string table (.shstrtab). If the index is zero, it means the section doesn’t have a name.

sh_type: The section type tells the linker something about the structure of a section’s contents. Sections of type SHT_PROGBITS contain program data, such as machine instructions or constants. Other types include SHT_SYMTAB for static symbol tables, SHT_DYNSYM for symbol tables used by the dynamic linker, and SHT_STRTAB for string tables. Sections with type SHT_REL or SHT_RELA contain relocation entries. Sections of type SHT_DYNAMIC contain information needed for dynamic linking.

sh_flags: Section flags describe additional information about a section. SHF_WRITE indicates that the section is writable at runtime. The SHF_ALLOC flag indicates that the contents of the section are to be loaded into virtual memory when executing the binary. Finally, SHF_EXECINSTR tells you that the section contains executable instructions, which is useful to know when disassembling a binary.

sh_addr: Section virtual address at execution. Sections that aren’t intended to be loaded into virtual memory have an sh_addr value of zero.

sh_offset: Section file offset, in bytes from the start of the file.

sh_size: Section size in bytes.

sh_link: This field makes the relationships between sections explicit by denoting the index (in the section header table) of a related section that the linker needs to know about.

sh_info: This field contains additional information about the section. For instance, for relocation sections, sh_info denotes the index of the section to which the relocations are to be applied. 

sh_addralign: Some sections may need to be aligned in memory in a particular way for efficiency of memory accesses. For instance, if this field is set to 16, it means the base address of the section (as chosen by the linker) must be some multiple of 16. The values 0 and 1 are reserved to indicate no special alignment needs.

sh_entsize: Some sections, such as symbol tables or relocation tables, contain a table of well-defined data structures (such as Elf64_Sym or Elf64_Rela). For such sections, the sh_entsize field indicates the size in bytes of each entry in the table. When the field is unused, it is set to zero.
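As with the executable header, the section header fields can be unpacked with Python’s struct module. The sketch below assumes a little-endian 64-bit ELF file. The helper names (parse_shdr64, flag_names, section_name) are made up for this post, and the demo header and string table are hand-built with invented values rather than taken from a real binary.

```python
import struct
from collections import namedtuple

# Little-endian layout of Elf64_Shdr: two 32-bit fields, four 64-bit fields,
# two 32-bit fields, two 64-bit fields (64 bytes in total).
SHDR64_FORMAT = "<IIQQQQIIQQ"

Elf64Shdr = namedtuple(
    "Elf64Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

SHT_PROGBITS, SHT_SYMTAB, SHT_STRTAB = 1, 2, 3
SHF_FLAGS = {0x1: "SHF_WRITE", 0x2: "SHF_ALLOC", 0x4: "SHF_EXECINSTR"}


def parse_shdr64(data: bytes, offset: int = 0) -> Elf64Shdr:
    """Unpack one section header starting at the given byte offset."""
    return Elf64Shdr._make(struct.unpack_from(SHDR64_FORMAT, data, offset))


def flag_names(sh_flags: int) -> list:
    """Translate the sh_flags bit field into a list of flag names."""
    return [name for bit, name in SHF_FLAGS.items() if sh_flags & bit]


def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Look up a NUL-terminated name in the .shstrtab contents."""
    end = shstrtab.index(b"\x00", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# Hand-built demo data; all values are invented for illustration.
demo_shstrtab = b"\x00.text\x00.shstrtab\x00"
demo_shdr = struct.pack(
    SHDR64_FORMAT,
    1,             # sh_name: index of ".text" in demo_shstrtab
    SHT_PROGBITS,  # sh_type
    0x2 | 0x4,     # sh_flags: SHF_ALLOC | SHF_EXECINSTR
    0x401000,      # sh_addr
    0x1000,        # sh_offset
    0x200,         # sh_size
    0,             # sh_link
    0,             # sh_info
    16,            # sh_addralign
    0,             # sh_entsize
)

shdr = parse_shdr64(demo_shdr)
print(section_name(demo_shstrtab, shdr.sh_name), flag_names(shdr.sh_flags))
# With sh_addralign set to 16, the base address must be a multiple of 16.
print("aligned:", shdr.sh_addr % shdr.sh_addralign == 0)
```

In a real file, you would find the .shstrtab contents by reading the section header at index e_shstrndx and then reading sh_size bytes starting at its sh_offset.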


The section headers for all sections in the binary are contained in the section header table. To keep this post from getting too long, we will discuss the sections themselves and the program headers in the next part.


