# Terminology

## DOL

A [DOL file](https://wiki.tockdom.com/wiki/DOL_(File_Format)) is the executable format used by GameCube and Wii games.
It's essentially a raw binary with a header that records the file offset, load address, and size of each code and data
section, as well as the BSS range and the entry point.
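
As a rough illustration of that header, the sketch below reads the section tables and the entry point from the first
`0x100` bytes of a DOL. It assumes the commonly documented layout (7 text and 11 data section slots, all fields
big-endian 32-bit values); `parse_dol_header` and the dictionary keys are hypothetical names chosen for this example.

```python
import struct

TEXT_SECTIONS = 7    # assumed number of text (code) section slots
DATA_SECTIONS = 11   # assumed number of data section slots
HEADER_SIZE = 0x100  # assumed total header size


def parse_dol_header(header: bytes) -> dict:
    """Parse a DOL header into its used sections, BSS range, and entry point."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a DOL header is expected to be at least 0x100 bytes")

    count = TEXT_SECTIONS + DATA_SECTIONS
    # Three consecutive tables of big-endian u32 values, one entry per section slot.
    offsets = struct.unpack_from(f">{count}I", header, 0x00)
    addresses = struct.unpack_from(f">{count}I", header, 0x48)
    sizes = struct.unpack_from(f">{count}I", header, 0x90)
    bss_address, bss_size, entry_point = struct.unpack_from(">3I", header, 0xD8)

    sections = []
    for index in range(count):
        if sizes[index] == 0:
            continue  # unused slot
        sections.append({
            "kind": "text" if index < TEXT_SECTIONS else "data",
            "offset": offsets[index],
            "address": addresses[index],
            "size": sizes[index],
        })

    return {
        "sections": sections,
        "bss_address": bss_address,
        "bss_size": bss_size,
        "entry_point": entry_point,
    }
```

Calling `parse_dol_header(data[:0x100])` on the contents of a DOL file would return the used sections in header order.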
## ELF

An [ELF file](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) is the executable and object file format
used by most Unix-like operating systems. There are two common types of ELF files: **relocatable** and **executable**.

A relocatable ELF (`.o`, also called an "object file") contains machine code and relocation information, and is used as
input to the linker. Each object file is compiled from a single source file (`.c`, `.cpp`).

An executable ELF (`.elf`) contains the final machine code that can be loaded and executed. It *can* include
symbol information, debug information (DWARF), and sometimes the original relocations, but it
is often missing some or all of these (in which case it is referred to as "stripped").
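
To tell the two types apart, the `e_type` field of the ELF header can be inspected. The sketch below is a minimal
example that only needs the first 18 bytes of the file; `elf_type` is a hypothetical helper name, and the returned
strings are labels chosen for this example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# e_type values from the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def elf_type(header: bytes) -> str:
    """Return a label for the ELF type stored in the file header."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")

    # e_ident[5] (EI_DATA) selects the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        byte_order = "<"
    elif header[5] == 2:
        byte_order = ">"
    else:
        raise ValueError("unknown ELF data encoding")

    # e_type is a 16-bit field directly after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")
```

GameCube and Wii ELF files are big-endian PowerPC, so the big-endian branch is the one that applies to them.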
## Symbol

A symbol is a name that is assigned to a memory address. Symbols can be functions, variables, or other data.

**Local** symbols are only visible within the object file they are defined in.
These are usually defined as `static` in C/C++ or are compiler-generated.

**Global** symbols are visible to all object files, and their names must be unique.

**Weak** symbols are similar to global symbols, but can be replaced by a global symbol with the same name.
For example: the SDK defines a weak `OSReport` function, which can be replaced by a game-specific implementation.
Weak symbols are also used for functions generated by the compiler or as a result of C++ features, since the same
function can exist in multiple object files. The linker will deduplicate these functions, keeping only the first copy.
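
The binding rules above can be sketched as a small resolution loop. This is a simplified model, not how any particular
linker is implemented; `resolve_symbols`, the object names, and the symbol names other than `OSReport` are hypothetical.

```python
def resolve_symbols(objects):
    """Pick one definition per symbol name.

    `objects` is a list of (object_name, [(symbol_name, binding), ...]) in link order,
    where binding is "local", "global", or "weak".
    """
    resolved = {}  # symbol name -> (object name, binding)
    for object_name, symbols in objects:
        for symbol_name, binding in symbols:
            if binding == "local":
                continue  # locals are never visible to other object files
            existing = resolved.get(symbol_name)
            if existing is None:
                resolved[symbol_name] = (object_name, binding)
            elif binding == "global":
                if existing[1] == "global":
                    raise ValueError(f"duplicate global symbol: {symbol_name}")
                # A global definition replaces an earlier weak one.
                resolved[symbol_name] = (object_name, binding)
            # Otherwise this is a weak duplicate: keep the first copy.
    return resolved


objects = [
    ("sdk.o", [("OSReport", "weak")]),
    ("game.o", [("OSReport", "global"), ("helper", "local")]),
    ("a.o", [("inline_function", "weak")]),
    ("b.o", [("inline_function", "weak")]),
]
resolved = resolve_symbols(objects)
```

In this example `OSReport` resolves to the global definition in `game.o`, `inline_function` resolves to the first weak
copy in `a.o`, and `helper` does not appear in the result at all.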
## Relocation

A relocation is essentially a pointer to a symbol: it marks a location in the code or data that refers to that symbol's
address. At compile time, the final address of a symbol is not known yet, so the compiler emits a relocation instead.
At link time, each symbol is assigned a final address, and the linker uses the relocations to update the machine
code with the final address of each symbol.

Before:
```asm
# Unrelocated, instructions point to address 0 (unknown)
lis r3, 0
ori r3, r3, 0
```
After:
```asm
# Relocated, instructions point to 0x80001234
lis r3, 0x8000
ori r3, r3, 0x1234
```
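
The same patch can be expressed on the raw instruction words. The sketch below assumes the pair is covered by a
high-half and a low-half 16-bit relocation (`R_PPC_ADDR16_HI` and `R_PPC_ADDR16_LO` in the PowerPC ELF ABI); the
function and constant names are hypothetical.

```python
LIS_R3_0 = 0x3C600000     # encoding of `lis r3, 0`
ORI_R3_R3_0 = 0x60630000  # encoding of `ori r3, r3, 0`


def apply_addr16(instruction: int, value: int) -> int:
    """Write a 16-bit value into the immediate field (low 16 bits) of an instruction."""
    return (instruction & 0xFFFF0000) | (value & 0xFFFF)


def relocate_lis_ori(lis: int, ori: int, address: int) -> tuple:
    """Patch a lis/ori pair so that it loads `address`."""
    high = (address >> 16) & 0xFFFF  # upper half, goes into `lis`
    low = address & 0xFFFF           # lower half, goes into `ori`
    return apply_addr16(lis, high), apply_addr16(ori, low)


lis, ori = relocate_lis_ori(LIS_R3_0, ORI_R3_R3_0, 0x80001234)
print(f"{lis:08X} {ori:08X}")  # prints "3C608000 60631234"
```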
Once the linker performs the relocation with the final address, the relocation is no longer needed. The final ELF
sometimes still contains the relocation information, but the conversion to DOL **always** removes it.

When we analyze a file, we attempt to rebuild the relocations. This is useful for several reasons:

- It allows us to split the file into relocatable objects. Each object can then be replaced with a decompiled version,
  as matching code is written.
- It allows us to modify or add code and data to the game and have all machine code still point to the correct
  symbols, which may now be in a different location.
- It allows us to view the machine code in a disassembler and show symbol names instead of raw addresses.
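
As a very rough sketch of what rebuilding a relocation involves, the example below scans instruction words for an
adjacent `lis`/`ori` pair on the same register, reconstructs the address it loads, and looks that address up in a
symbol table. Real analysis has to handle many more patterns (non-adjacent pairs, other instruction combinations,
pointers in data), so this is only an illustration; `find_lis_ori_targets` and `my_symbol` are hypothetical names.

```python
def find_lis_ori_targets(words, symbols):
    """Return (word_index, symbol_name) for each adjacent lis/ori pair that loads a known address.

    `words` is a list of 32-bit PowerPC instruction words, and `symbols` maps
    addresses to symbol names.
    """
    results = []
    for index in range(len(words) - 1):
        first, second = words[index], words[index + 1]
        # `lis rD, imm` is `addis rD, 0, imm`: primary opcode 15 with rA = 0.
        is_lis = (first >> 26) == 15 and ((first >> 16) & 0x1F) == 0
        # `ori rA, rS, imm` has primary opcode 24.
        is_ori = (second >> 26) == 24
        if not (is_lis and is_ori):
            continue
        lis_dest = (first >> 21) & 0x1F
        ori_source = (second >> 21) & 0x1F
        ori_dest = (second >> 16) & 0x1F
        if not (lis_dest == ori_source == ori_dest):
            continue  # the pair does not build a value in a single register
        address = ((first & 0xFFFF) << 16) | (second & 0xFFFF)
        if address in symbols:
            results.append((index, symbols[address]))
    return results


matches = find_lis_ori_targets([0x3C608000, 0x60631234], {0x80001234: "my_symbol"})
```

Here `matches` contains a single entry for word index 0, which a disassembler could use to show `my_symbol` instead of
the raw halves `0x8000` and `0x1234`.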