decomp-toolkit/docs/terminology.md


2024-04-23 05:17:09 +00:00
# Terminology
## DOL
A [DOL file](https://wiki.tockdom.com/wiki/DOL_(File_Format)) is the executable format used by GameCube and Wii games.
It's essentially a raw binary with a header that contains information about the code and data sections, as well as the
entry point.
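
As an illustration of how little structure the format has, the following is a minimal Python sketch that reads the header. It assumes the commonly documented layout: a 0x100-byte big-endian header with 7 text and 11 data section slots, followed by the BSS address, BSS size, and entry point. The function name `parse_dol_header` and the dictionary keys are hypothetical and are not part of decomp-toolkit.

```python
import struct

DOL_HEADER_SIZE = 0x100
NUM_TEXT = 7    # text (code) section slots
NUM_DATA = 11   # data section slots
NUM_SECTIONS = NUM_TEXT + NUM_DATA


def parse_dol_header(data: bytes) -> dict:
    """Parse the fixed-size DOL header (hypothetical helper, assumed layout)."""
    if len(data) < DOL_HEADER_SIZE:
        raise ValueError("DOL header is 0x100 bytes; input is too short")

    # Every field is a big-endian 32-bit integer.
    offsets = struct.unpack_from(">18I", data, 0x00)    # file offsets
    addresses = struct.unpack_from(">18I", data, 0x48)  # load addresses
    sizes = struct.unpack_from(">18I", data, 0x90)      # section sizes
    bss_address, bss_size, entry_point = struct.unpack_from(">3I", data, 0xD8)

    sections = []
    for i in range(NUM_SECTIONS):
        if sizes[i] == 0:
            continue  # unused slot
        sections.append({
            "kind": "text" if i < NUM_TEXT else "data",
            "file_offset": offsets[i],
            "address": addresses[i],
            "size": sizes[i],
        })

    return {
        "sections": sections,
        "bss_address": bss_address,
        "bss_size": bss_size,
        "entry_point": entry_point,
    }
```

Note that the header carries no symbol names and no relocations, only where each section lives in the file and in memory.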
## ELF
An [ELF file](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) is the executable format used by most
Unix-like operating systems. There are two common types of ELF files: **relocatable** and **executable**.
A relocatable ELF (`.o`, also called "object file") contains machine code and relocation information, and is used as
input to the linker. Each object file is compiled from a single source file (`.c`, `.cpp`).
An executable ELF (`.elf`) contains the final machine code that can be loaded and executed. It *can* include
information about symbols, debug information (DWARF), and sometimes information about the original relocations, but it
is often missing some or all of these (referred to as "stripped").
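
The two types can be told apart from the `e_type` field of the ELF header. The sketch below is a rough illustration, not a full parser: it relies on the standard ELF header layout (magic bytes, the `EI_DATA` byte at index 5, and the 16-bit `e_type` field at offset 16). The function name `elf_kind` is hypothetical.

```python
import struct

ET_REL = 1   # relocatable object file
ET_EXEC = 2  # executable file


def elf_kind(data: bytes) -> str:
    """Classify an ELF image as 'relocatable', 'executable', or 'other'."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")

    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    if data[5] == 1:
        endian = "<"
    elif data[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown ELF data encoding")

    # e_type is the 16-bit field immediately after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return {ET_REL: "relocatable", ET_EXEC: "executable"}.get(e_type, "other")
```

GameCube and Wii ELF files are big-endian, so `EI_DATA` would be 2 for them.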
## Symbol
A symbol is a name that is assigned to a memory address. Symbols can be functions, variables, or other data.
**Local** symbols are only visible within the object file they are defined in.
These are usually defined as `static` in C/C++ or are compiler-generated.
**Global** symbols are visible to all object files, and their names must be unique.
**Weak** symbols are similar to global symbols, but can be replaced by a global symbol with the same name.
For example: the SDK defines a weak `OSReport` function, which can be replaced by a game-specific implementation.
Weak symbols are also used for functions generated by the compiler or as a result of C++ features, since they can exist
in multiple object files. The linker will deduplicate these functions, keeping only the first copy.
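
The binding rules above can be sketched as a small resolution routine. This is a simplified model under stated assumptions, not how any particular linker is implemented: objects are processed in link order, and each symbol is a `(name, binding)` pair. The function `resolve_symbols` and the object and symbol names in the usage example (other than `OSReport`) are hypothetical.

```python
def resolve_symbols(objects):
    """Pick the defining object for each non-local symbol.

    `objects` is a list of (object_name, [(symbol_name, binding), ...])
    in link order, where binding is "local", "global", or "weak".
    Returns {symbol_name: object_name}.
    """
    chosen = {}  # symbol name -> (binding, object name)
    for obj_name, symbols in objects:
        for name, binding in symbols:
            if binding not in ("local", "global", "weak"):
                raise ValueError(f"unknown binding: {binding}")
            if binding == "local":
                continue  # visible only inside its own object file
            previous = chosen.get(name)
            if previous is None:
                chosen[name] = (binding, obj_name)
            elif binding == "global":
                if previous[0] == "global":
                    raise ValueError(f"duplicate global symbol: {name}")
                chosen[name] = (binding, obj_name)  # global replaces weak
            # A weak symbol never replaces an earlier definition,
            # so the first weak copy is the one that is kept.
    return {name: obj for name, (_, obj) in chosen.items()}


objects = [
    ("sdk.o", [("OSReport", "weak"), ("helper", "local")]),
    ("game.o", [("OSReport", "global"), ("helper", "local")]),
    ("a.o", [("inline_fn", "weak")]),
    ("b.o", [("inline_fn", "weak")]),
]
resolved = resolve_symbols(objects)
```

Here `OSReport` resolves to `game.o`, `inline_fn` resolves to `a.o`, and the two local `helper` symbols never conflict because neither is visible outside its own object.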
## Relocation
A relocation is essentially a pointer to a symbol: a record that tells the linker which location in the machine code or
data must be filled in with that symbol's address. At compile time, the final address of a symbol is not yet known, so
the compiler emits a relocation instead.
At link time, each symbol is assigned a final address, and the linker uses the relocations to update the machine
code with the final address of each symbol.
Before:
```asm
# Unrelocated, instructions point to address 0 (unknown)
lis r3, 0
ori r3, r3, 0
```
After:
```asm
# Relocated, instructions point to 0x80001234
lis r3, 0x8000
ori r3, r3, 0x1234
```
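
The patch shown in the two listings can be sketched in Python as follows. This is an illustrative model of the `lis`/`ori` case only, not a general relocation processor: it assumes big-endian PowerPC instruction words whose low 16 bits hold the immediate. The function name `apply_addr16_pair` is hypothetical.

```python
import struct


def apply_addr16_pair(code: bytearray, offset: int, address: int) -> None:
    """Patch the lis/ori pair starting at `offset` so that it loads `address`."""
    hi = (address >> 16) & 0xFFFF
    lo = address & 0xFFFF
    # The 16-bit immediate is the low half of each big-endian instruction
    # word, i.e. the last two bytes of each 4-byte instruction.
    struct.pack_into(">H", code, offset + 2, hi)  # lis immediate
    struct.pack_into(">H", code, offset + 6, lo)  # ori immediate


# lis r3, 0 ; ori r3, r3, 0
code = bytearray(struct.pack(">2I", 0x3C600000, 0x60630000))
apply_addr16_pair(code, 0, 0x80001234)
patched = struct.unpack(">2I", code)
# patched now encodes: lis r3, 0x8000 ; ori r3, r3, 0x1234
```

When the low half is added with a sign-extending instruction such as `addi` instead of `ori`, the high half has to be adjusted by the carry from the low half, which this sketch does not handle.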
Once the linker has applied a relocation with the final address, the relocation is no longer needed. The final ELF
sometimes still contains the relocation information, but the conversion to DOL will **always** remove it.
When we analyze a file, we attempt to rebuild the relocations. This is useful for several reasons:
- It allows us to split the file into relocatable objects. Each object can then be replaced with a decompiled version,
as matching code is written.
- It allows us to modify or add code and data to the game and have all machine code still point to the correct
symbols, which may now be in a different location.
- It allows us to view the machine code in a disassembler and show symbol names instead of raw addresses.
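
To give a feel for what rebuilding a relocation involves, here is a deliberately naive Python sketch that reverses the `lis`/`ori` example: it scans for adjacent pairs, recombines the two immediates, and reports the pair if the address matches a known symbol. This is not the analysis decomp-toolkit actually performs; real code separates the two instructions, uses other instruction pairs, and needs control-flow tracking. The function name `find_lis_ori_refs` and the symbol name `some_symbol` are hypothetical.

```python
import struct


def find_lis_ori_refs(code: bytes, symbols: dict) -> list:
    """Return (code_offset, symbol_name) for adjacent lis/ori pairs that
    together form the address of a known symbol.

    `symbols` maps address -> name.
    """
    count = len(code) // 4
    words = struct.unpack(f">{count}I", code[:count * 4])
    refs = []
    for i in range(count - 1):
        first, second = words[i], words[i + 1]
        # lis rD, imm is addis rD, 0, imm: primary opcode 15 with rA == 0.
        is_lis = (first >> 26) == 15 and ((first >> 16) & 0x1F) == 0
        # ori rA, rS, imm has primary opcode 24.
        is_ori = (second >> 26) == 24
        if not (is_lis and is_ori):
            continue
        lis_dest = (first >> 21) & 0x1F
        ori_src = (second >> 21) & 0x1F
        if lis_dest != ori_src:
            continue  # the ori does not consume the lis result
        address = ((first & 0xFFFF) << 16) | (second & 0xFFFF)
        name = symbols.get(address)
        if name is not None:
            refs.append((i * 4, name))
    return refs


# lis r3, 0x8000 ; ori r3, r3, 0x1234 ; blr
code = struct.pack(">3I", 0x3C608000, 0x60631234, 0x4E800020)
refs = find_lis_ori_refs(code, {0x80001234: "some_symbol"})
```

Each reported pair corresponds to a relocation that can be re-attached to the symbol, so the instructions can later be re-patched if the symbol moves.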