Split up README a bit
parent
a156c3697f
commit
e46c6a72bc
146
README.md
|
@ -15,9 +15,9 @@ project structure and build system that uses decomp-toolkit under the hood.
|
||||||
|
|
||||||
- [Goals](#goals)
|
- [Goals](#goals)
|
||||||
- [Background](#background)
|
- [Background](#background)
|
||||||
- [Other approaches](#other-approaches)
|
|
||||||
- [Terminology](#terminology)
|
|
||||||
- [Analyzer features](#analyzer-features)
|
- [Analyzer features](#analyzer-features)
|
||||||
|
- [Other approaches](docs/other_approaches.md)
|
||||||
|
- [Terminology](docs/terminology.md)
|
||||||
- [Commands](#commands)
|
- [Commands](#commands)
|
||||||
- [ar create](#ar-create)
|
- [ar create](#ar-create)
|
||||||
- [ar extract](#ar-extract)
|
- [ar extract](#ar-extract)
|
||||||
|
@ -79,148 +79,7 @@ binary that is byte-for-byte identical to the original, then we know that the de
|
||||||
decomp-toolkit provides tooling for analyzing and splitting the original binary into relocatable objects, as well
|
decomp-toolkit provides tooling for analyzing and splitting the original binary into relocatable objects, as well
|
||||||
as generating the linker script and other files needed to link the decompiled code.
|
as generating the linker script and other files needed to link the decompiled code.
|
||||||
|
|
||||||
## Other approaches
|
|
||||||
|
|
||||||
### Manual assembly
|
|
||||||
|
|
||||||
With existing GameCube/Wii decompilation tooling, the setup process is very tedious and error-prone.
|
|
||||||
The general process is:
|
|
||||||
|
|
||||||
- Begin by disassembling the original binary with a tool like
|
|
||||||
[doldisasm.py](https://gist.github.com/camthesaxman/a36f610dbf4cc53a874322ef146c4123). This produces one giant
|
|
||||||
assembly file per section.
|
|
||||||
- Manually comb through the assembly files and fix many issues, like incorrect or missing relocations, incorrect or
|
|
||||||
missing symbols, and more.
|
|
||||||
- Manually find-and-replace the auto-generated symbol names based on other sources, like other decompilation projects
|
|
||||||
or a map file. (If you're lucky enough to have one)
|
|
||||||
- Manually determine data types and sizes, and convert them accordingly. (For example, `.4byte` -> `.float`, strings,
|
|
||||||
etc)
|
|
||||||
- Manually split the assembly files into individual objects. This is a very tedious process, as it requires identifying
|
|
||||||
the boundaries of each function, determining whether adjacent functions are related, finding associated
|
|
||||||
data from each data section, and cut-and-pasting all of this into a new file.
|
|
||||||
|
|
||||||
Other downsides of this approach:
|
|
||||||
|
|
||||||
- Manually editing the assembly means that the result is not reproducible. You can't run the script again to
|
|
||||||
make any updates, because your changes will be overwritten. This also means that the assembly files must be
|
|
||||||
stored in version control, which is not ideal.
|
|
||||||
- Incorrectly splitting objects is very easy to do, and can be difficult to detect. For example, a `.ctors` entry _must_
|
|
||||||
be located in the same object as the function it references, otherwise the linker will not generate the correct
|
|
||||||
`.ctors` entry. `extab` and `extabindex` entries _must also_ be located in the same object as the function they
|
|
||||||
reference, have a label and have the correct size, and have a direct relocation rather than a section-relative
|
|
||||||
relocation. Otherwise, the linker will crash with a cryptic error message.
|
|
||||||
- Relying on assembly means that you need an assembler. For GameCube/Wii, this means devkitPro, which is a
|
|
||||||
large dependency and an obstacle for new contributors. The assembler also has some quirks that don't interact well
|
|
||||||
with `mwldeppc`, which means that the object files must be manually post-processed to fix these issues. (See the
|
|
||||||
[elf fixup](#elf-fixup) command)
|
|
||||||
|
|
||||||
With decomp-toolkit:
|
|
||||||
|
|
||||||
- Many analysis steps are automated and highly accurate. Many DOL files can be analyzed and split into re-linkable
|
|
||||||
objects with no configuration.
|
|
||||||
- Signature analysis automatically labels common functions and objects, and allows for more accurate relocation
|
|
||||||
rebuilding.
|
|
||||||
- Any manual adjustments are stored in configuration files, which are stored in version control.
|
|
||||||
- Splitting is simplified by updating a configuration file. The analyzer will check for common issues, like
|
|
||||||
incorrectly split `.ctors`/`.dtors`/`extab`/`extabindex` entries. If the user hasn't configured a split for these,
|
|
||||||
the analyzer will automatically split them along with their associated functions to ensure that the linker will
|
|
||||||
generate everything correctly. This means that matching code can be written without worrying about splitting all
|
|
||||||
sections up front.
|
|
||||||
- The splitter generates object files directly, with no assembler required. This means that we can avoid the devkitPro
|
|
||||||
requirement. (Although we can still generate assembly files for viewing, editing, and compatibility with other tools)
|
|
||||||
|
|
||||||
### dadosod
|
|
||||||
|
|
||||||
[dadosod](https://github.com/InusualZ/dadosod) is a newer replacement for `doldisasm.py`. It has more accurate function
|
|
||||||
and relocation analysis than `doldisasm.py`, as well as support for renaming symbols based on a map file. However, since
|
|
||||||
it operates as a one-shot assembly generator, it still suffers from many of the same issues described above.
|
|
||||||
|
|
||||||
### ppcdis
|
|
||||||
|
|
||||||
[ppcdis](https://github.com/SeekyCt/ppcdis) is one of the tools that inspired decomp-toolkit. It has more accurate
|
|
||||||
analysis than doldisasm.py, and has similar goals to decomp-toolkit. It's been used successfully in several
|
|
||||||
decompilation projects.
|
|
||||||
|
|
||||||
However, decomp-toolkit has a few advantages:
|
|
||||||
|
|
||||||
- Faster and more accurate analysis. (See [Analyzer features](#analyzer-features))
|
|
||||||
- Emits object files directly, with no assembler required.
|
|
||||||
- More robust handling of features like common BSS, `.ctors`/`.dtors`/`extab`/`extabindex`, and more.
|
|
||||||
- Requires very little configuration to start.
|
|
||||||
- Automatically labels common functions and objects with signature analysis.
|
|
||||||
|
|
||||||
### Honorable mentions
|
|
||||||
|
|
||||||
[splat](https://github.com/ethteck/splat) is a binary splitting tool for N64 and PSX. Some ideas from splat inspired
|
|
||||||
decomp-toolkit, like the symbol configuration format.
|
|
||||||
|
|
||||||
## Terminology
|
|
||||||
|
|
||||||
### DOL
|
|
||||||
|
|
||||||
A [DOL file](https://wiki.tockdom.com/wiki/DOL_(File_Format)) is the executable format used by GameCube and Wii games.
|
|
||||||
It's essentially a raw binary with a header that contains information about the code and data sections, as well as the
|
|
||||||
entry point.
|
|
||||||
|
|
||||||
### ELF
|
|
||||||
|
|
||||||
An [ELF file](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) is the executable format used by most
|
|
||||||
Unix-like operating systems. There are two common types of ELF files: **relocatable** and **executable**.
|
|
||||||
|
|
||||||
A relocatable ELF (`.o`, also called "object file") contains machine code and relocation information, and is used as
|
|
||||||
input to the linker. Each object file is compiled from a single source file (`.c`, `.cpp`).
|
|
||||||
|
|
||||||
An executable ELF (`.elf`) contains the final machine code that can be loaded and executed. It *can* include
|
|
||||||
information about symbols, debug information (DWARF), and sometimes information about the original relocations, but it
|
|
||||||
is often missing some or all of these (referred to as "stripped").
|
|
||||||
|
|
||||||
### Symbol
|
|
||||||
|
|
||||||
A symbol is a name that is assigned to a memory address. Symbols can be functions, variables, or other data.
|
|
||||||
|
|
||||||
**Local** symbols are only visible within the object file they are defined in.
|
|
||||||
These are usually defined as `static` in C/C++ or are compiler-generated.
|
|
||||||
|
|
||||||
**Global** symbols are visible to all object files, and their names must be unique.
|
|
||||||
|
|
||||||
**Weak** symbols are similar to global symbols, but can be replaced by a global symbol with the same name.
|
|
||||||
For example: the SDK defines a weak `OSReport` function, which can be replaced by a game-specific implementation.
|
|
||||||
Weak symbols are also used for functions generated by the compiler or as a result of C++ features, since they can exist
|
|
||||||
in multiple object files. The linker will deduplicate these functions, keeping only the first copy.
|
|
||||||
|
|
||||||
### Relocation
|
|
||||||
|
|
||||||
A relocation is essentially a pointer to a symbol. At compile time, the final address of a symbol is
|
|
||||||
not known yet, therefore a relocation is needed.
|
|
||||||
At link time, each symbol is assigned a final address, and the linker will use the relocations to update the machine
|
|
||||||
code with the final addresses of the symbol.
|
|
||||||
|
|
||||||
Before:
|
|
||||||
|
|
||||||
```asm
|
|
||||||
# Unrelocated, instructions point to address 0 (unknown)
|
|
||||||
lis r3, 0
|
|
||||||
ori r3, r3, 0
|
|
||||||
```
|
|
||||||
|
|
||||||
After:
|
|
||||||
|
|
||||||
```asm
|
|
||||||
# Relocated, instructions point to 0x80001234
|
|
||||||
lis r3, 0x8000
|
|
||||||
ori r3, r3, 0x1234
|
|
||||||
```
|
|
||||||
|
|
||||||
Once the linker performs the relocation with the final address, the relocation is no longer needed. Sometimes the
|
|
||||||
final ELF will still contain the relocation information, but the conversion to DOL will **always** remove it.
|
|
||||||
|
|
||||||
When we analyze a file, we attempt to rebuild the relocations. This is useful for several reasons:
|
|
||||||
|
|
||||||
- It allows us to split the file into relocatable objects. Each object can then be replaced with a decompiled version,
|
|
||||||
as matching code is written.
|
|
||||||
- It allows us to modify or add code and data to the game and have all machine code still point to the correct
|
|
||||||
symbols, which may now be in a different location.
|
|
||||||
- It allows us to view the machine code in a disassembler and show symbol names instead of raw addresses.
|
|
||||||
|
|
||||||
## Analyzer features
|
## Analyzer features
|
||||||
|
|
||||||
|
@ -261,7 +120,6 @@ Generates `ldscript.lcf` for `mwldeppc.exe`.
|
||||||
|
|
||||||
- Support RSO files
|
- Support RSO files
|
||||||
- Add more signatures
|
- Add more signatures
|
||||||
- Rework CodeWarrior map parsing
|
|
||||||
|
|
||||||
## Commands
|
## Commands
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,74 @@
|
||||||
|
# Other approaches
|
||||||
|
|
||||||
|
## Manual assembly
|
||||||
|
|
||||||
|
With existing GameCube/Wii decompilation tooling, the setup process is very tedious and error-prone.
|
||||||
|
The general process is:
|
||||||
|
|
||||||
|
- Begin by disassembling the original binary with a tool like
|
||||||
|
[doldisasm.py](https://gist.github.com/camthesaxman/a36f610dbf4cc53a874322ef146c4123). This produces one giant
|
||||||
|
assembly file per section.
|
||||||
|
- Manually comb through the assembly files and fix many issues, like incorrect or missing relocations, incorrect or
|
||||||
|
missing symbols, and more.
|
||||||
|
- Manually find-and-replace the auto-generated symbol names based on other sources, like other decompilation projects
|
||||||
|
or a map file. (If you're lucky enough to have one)
|
||||||
|
- Manually determine data types and sizes, and convert them accordingly. (For example, `.4byte` -> `.float`, strings,
|
||||||
|
etc)
|
||||||
|
- Manually split the assembly files into individual objects. This is a very tedious process, as it requires identifying
|
||||||
|
the boundaries of each function, determining whether adjacent functions are related, finding associated
|
||||||
|
data from each data section, and cut-and-pasting all of this into a new file.
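
As a small illustration of the data-typing step above, the sketch below reinterprets a big-endian 32-bit word as an IEEE 754 float, which is the decision behind rewriting a `.4byte` directive as `.float`. The helper names `word_to_float` and `render_directive`, and the output formatting, are illustrative assumptions and not part of any tool mentioned here.

```python
import struct


def word_to_float(word: int) -> float:
    """Reinterpret a 32-bit word as a big-endian IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def render_directive(word: int, as_float: bool) -> str:
    """Render one data word as an assembler directive (hypothetical formatting)."""
    if as_float:
        return f".float {word_to_float(word)!r}"
    return f".4byte 0x{word:08X}"
```

Deciding whether a given word really is a float still takes context (how the code loads it), which is why this step is tedious to do by hand.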
|
||||||
|
|
||||||
|
Other downsides of this approach:
|
||||||
|
|
||||||
|
- Manually editing the assembly means that the result is not reproducible. You can't run the script again to
|
||||||
|
make any updates, because your changes will be overwritten. This also means that the assembly files must be
|
||||||
|
stored in version control, which is not ideal.
|
||||||
|
- Incorrectly splitting objects is very easy to do, and can be difficult to detect. For example, a `.ctors` entry _must_
|
||||||
|
be located in the same object as the function it references, otherwise the linker will not generate the correct
|
||||||
|
`.ctors` entry. `extab` and `extabindex` entries _must also_ be located in the same object as the function they
|
||||||
|
reference, have a label and have the correct size, and have a direct relocation rather than a section-relative
|
||||||
|
relocation. Otherwise, the linker will crash with a cryptic error message.
|
||||||
|
- Relying on assembly means that you need an assembler. For GameCube/Wii, this means devkitPro, which is a
|
||||||
|
large dependency and an obstacle for new contributors. The assembler also has some quirks that don't interact well
|
||||||
|
with `mwldeppc`, which means that the object files must be manually post-processed to fix these issues. (See the
|
||||||
|
[elf fixup](/README.md#elf-fixup) command)
|
||||||
|
|
||||||
|
With decomp-toolkit:
|
||||||
|
|
||||||
|
- Many analysis steps are automated and highly accurate. Many DOL files can be analyzed and split into re-linkable
|
||||||
|
objects with no configuration.
|
||||||
|
- Signature analysis automatically labels common functions and objects, and allows for more accurate relocation
|
||||||
|
rebuilding.
|
||||||
|
- Any manual adjustments are stored in configuration files, which are stored in version control.
|
||||||
|
- Splitting is simplified by updating a configuration file. The analyzer will check for common issues, like
|
||||||
|
incorrectly split `.ctors`/`.dtors`/`extab`/`extabindex` entries. If the user hasn't configured a split for these,
|
||||||
|
the analyzer will automatically split them along with their associated functions to ensure that the linker will
|
||||||
|
generate everything correctly. This means that matching code can be written without worrying about splitting all
|
||||||
|
sections up front.
|
||||||
|
- The splitter generates object files directly, with no assembler required. This means that we can avoid the devkitPro
|
||||||
|
requirement. (Although we can still generate assembly files for viewing, editing, and compatibility with other tools)
|
||||||
|
|
||||||
|
## dadosod
|
||||||
|
|
||||||
|
[dadosod](https://github.com/InusualZ/dadosod) is a newer replacement for `doldisasm.py`. It has more accurate function
|
||||||
|
and relocation analysis than `doldisasm.py`, as well as support for renaming symbols based on a map file. However, since
|
||||||
|
it operates as a one-shot assembly generator, it still suffers from many of the same issues described above.
|
||||||
|
|
||||||
|
## ppcdis
|
||||||
|
|
||||||
|
[ppcdis](https://github.com/SeekyCt/ppcdis) is one of the tools that inspired decomp-toolkit. It has more accurate
|
||||||
|
analysis than doldisasm.py, and has similar goals to decomp-toolkit. It's been used successfully in several
|
||||||
|
decompilation projects.
|
||||||
|
|
||||||
|
However, decomp-toolkit has a few advantages:
|
||||||
|
|
||||||
|
- Faster and more accurate analysis. (See [Analyzer features](/README.md#analyzer-features))
|
||||||
|
- Emits object files directly, with no assembler required.
|
||||||
|
- More robust handling of features like common BSS, `.ctors`/`.dtors`/`extab`/`extabindex`, and more.
|
||||||
|
- Requires very little configuration to start.
|
||||||
|
- Automatically labels common functions and objects with signature analysis.
|
||||||
|
|
||||||
|
## Honorable mentions
|
||||||
|
|
||||||
|
[splat](https://github.com/ethteck/splat) is a binary splitting tool for N64 and PSX. Some ideas from splat inspired
|
||||||
|
decomp-toolkit, like the symbol configuration format.
|
|
@ -0,0 +1 @@
|
||||||
|
### Visit the [dtk-template](https://github.com/encounter/dtk-template) repository for additional documentation, including a guide.
|
|
@ -0,0 +1,67 @@
|
||||||
|
# Terminology
|
||||||
|
|
||||||
|
## DOL
|
||||||
|
|
||||||
|
A [DOL file](https://wiki.tockdom.com/wiki/DOL_(File_Format)) is the executable format used by GameCube and Wii games.
|
||||||
|
It's essentially a raw binary with a header that contains information about the code and data sections, as well as the
|
||||||
|
entry point.
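
To make the header description concrete, here is a minimal sketch of reading it in Python. It assumes the commonly documented layout: 7 code and 11 data section slots, stored as big-endian arrays of file offsets, load addresses, and sizes, followed by the BSS address, BSS size, and entry point, in a 0x100-byte header. The `DolSection` type and `parse_dol_header` function are hypothetical names for illustration, not decomp-toolkit's API.

```python
import struct
from dataclasses import dataclass

TEXT_SECTIONS = 7    # code section slots in the header
DATA_SECTIONS = 11   # data section slots in the header
HEADER_SIZE = 0x100


@dataclass
class DolSection:
    kind: str          # "text" or "data"
    file_offset: int
    address: int
    size: int


def parse_dol_header(header: bytes):
    """Return (sections, bss_address, bss_size, entry_point) from a DOL header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a DOL header is 0x100 bytes long")
    n = TEXT_SECTIONS + DATA_SECTIONS
    count = 3 * n + 3  # three arrays of n words, then bss address/size and entry
    words = struct.unpack(f">{count}I", header[: count * 4])
    offsets, addresses, sizes = words[0:n], words[n:2 * n], words[2 * n:3 * n]
    bss_address, bss_size, entry_point = words[3 * n:]
    sections = []
    for i in range(n):
        if sizes[i] == 0:
            continue  # unused slot
        kind = "text" if i < TEXT_SECTIONS else "data"
        sections.append(DolSection(kind, offsets[i], addresses[i], sizes[i]))
    return sections, bss_address, bss_size, entry_point
```

Note that the header carries no symbol names or relocations, only where each section lives in the file and in memory.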
|
||||||
|
|
||||||
|
## ELF
|
||||||
|
|
||||||
|
An [ELF file](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) is the executable format used by most
|
||||||
|
Unix-like operating systems. There are two common types of ELF files: **relocatable** and **executable**.
|
||||||
|
|
||||||
|
A relocatable ELF (`.o`, also called "object file") contains machine code and relocation information, and is used as
|
||||||
|
input to the linker. Each object file is compiled from a single source file (`.c`, `.cpp`).
|
||||||
|
|
||||||
|
An executable ELF (`.elf`) contains the final machine code that can be loaded and executed. It *can* include
|
||||||
|
information about symbols, debug information (DWARF), and sometimes information about the original relocations, but it
|
||||||
|
is often missing some or all of these (referred to as "stripped").
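
The two ELF types can be told apart from the file header alone. The following sketch reads the `e_type` field (a 16-bit value at offset 16, after the 16-byte `e_ident` block) and honors the byte order declared in `e_ident`. The function name `elf_kind` is an assumption made for this example.

```python
import struct

ET_REL = 1   # relocatable object file (.o)
ET_EXEC = 2  # executable file (.elf)


def elf_kind(data: bytes) -> str:
    """Classify an ELF image as "relocatable", "executable", or "other"."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = ">" if data[5] == 2 else "<"
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return {ET_REL: "relocatable", ET_EXEC: "executable"}.get(e_type, "other")
```

GameCube and Wii objects are big-endian PowerPC, so `e_ident[EI_DATA]` would be 2 for them.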
|
||||||
|
|
||||||
|
## Symbol
|
||||||
|
|
||||||
|
A symbol is a name that is assigned to a memory address. Symbols can be functions, variables, or other data.
|
||||||
|
|
||||||
|
**Local** symbols are only visible within the object file they are defined in.
|
||||||
|
These are usually defined as `static` in C/C++ or are compiler-generated.
|
||||||
|
|
||||||
|
**Global** symbols are visible to all object files, and their names must be unique.
|
||||||
|
|
||||||
|
**Weak** symbols are similar to global symbols, but can be replaced by a global symbol with the same name.
|
||||||
|
For example: the SDK defines a weak `OSReport` function, which can be replaced by a game-specific implementation.
|
||||||
|
Weak symbols are also used for functions generated by the compiler or as a result of C++ features, since they can exist
|
||||||
|
in multiple object files. The linker will deduplicate these functions, keeping only the first copy.
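
The binding rules above can be sketched as a toy resolver. This is a simplified model for illustration only (the `Symbol` type and `resolve` function are hypothetical, and a real linker handles many more cases): locals are ignored, a global replaces a weak definition of the same name, the first weak copy wins among weak duplicates, and two globals with the same name are an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    binding: str       # "local", "global", or "weak"
    object_file: str


def resolve(symbols):
    """Return {name: Symbol} for the definitions visible across object files."""
    resolved = {}
    for sym in symbols:
        if sym.binding == "local":
            continue  # locals never leave their own object file
        current = resolved.get(sym.name)
        if current is None:
            resolved[sym.name] = sym
        elif sym.binding == "global":
            if current.binding == "global":
                raise ValueError(f"duplicate global symbol: {sym.name}")
            resolved[sym.name] = sym  # a global replaces a weak definition
        # Otherwise sym is weak and a definition already exists: keep the first.
    return resolved
```

With the `OSReport` example, a weak definition from an SDK object and a global one from a game object resolve to the game's copy, regardless of the order they are seen in.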
|
||||||
|
|
||||||
|
## Relocation
|
||||||
|
|
||||||
|
A relocation is essentially a pointer to a symbol. At compile time, the final address of a symbol is
|
||||||
|
not known yet, therefore a relocation is needed.
|
||||||
|
At link time, each symbol is assigned a final address, and the linker will use the relocations to update the machine
|
||||||
|
code with the final addresses of the symbol.
|
||||||
|
|
||||||
|
Before:
|
||||||
|
|
||||||
|
```asm
|
||||||
|
# Unrelocated, instructions point to address 0 (unknown)
|
||||||
|
lis r3, 0
|
||||||
|
ori r3, r3, 0
|
||||||
|
```
|
||||||
|
|
||||||
|
After:
|
||||||
|
|
||||||
|
```asm
|
||||||
|
# Relocated, instructions point to 0x80001234
|
||||||
|
lis r3, 0x8000
|
||||||
|
ori r3, r3, 0x1234
|
||||||
|
```
|
||||||
|
|
||||||
|
Once the linker performs the relocation with the final address, the relocation is no longer needed. Sometimes the
|
||||||
|
final ELF will still contain the relocation information, but the conversion to DOL will **always** remove it.
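
The Before/After example can be reproduced on the raw instruction words. The sketch below patches the 16-bit immediate fields of a `lis`/`ori` pair with the high and low halves of the final address. The constants are the standard encodings of `lis r3, 0` and `ori r3, r3, 0`; the function name `apply_hi_lo` is an assumption for this example.

```python
LIS_R3 = 0x3C600000  # lis r3, 0      (addis r3, 0, 0)
ORI_R3 = 0x60630000  # ori r3, r3, 0


def apply_hi_lo(lis_word: int, ori_word: int, address: int):
    """Write the high/low 16 bits of `address` into a lis/ori instruction pair."""
    hi = (address >> 16) & 0xFFFF
    lo = address & 0xFFFF
    # Keep the opcode and register fields (upper 16 bits), replace the immediate.
    return (lis_word & 0xFFFF0000) | hi, (ori_word & 0xFFFF0000) | lo
```

Calling `apply_hi_lo(LIS_R3, ORI_R3, 0x80001234)` yields the words for `lis r3, 0x8000` and `ori r3, r3, 0x1234`. When the low half is consumed by an instruction that sign-extends its immediate (such as `addi` or a load), the high half has to be adjusted to compensate, which this sketch does not cover.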
|
||||||
|
|
||||||
|
When we analyze a file, we attempt to rebuild the relocations. This is useful for several reasons:
|
||||||
|
|
||||||
|
- It allows us to split the file into relocatable objects. Each object can then be replaced with a decompiled version,
|
||||||
|
as matching code is written.
|
||||||
|
- It allows us to modify or add code and data to the game and have all machine code still point to the correct
|
||||||
|
symbols, which may now be in a different location.
|
||||||
|
- It allows us to view the machine code in a disassembler and show symbol names instead of raw addresses.
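
Rebuilding a relocation is the reverse of applying one. As a deliberately simplified sketch (not decomp-toolkit's actual analysis, which has to handle far more patterns than one adjacent pair), the code below recognizes a `lis rX` / `ori rX, rX` pair, recovers the address it forms, and prints it with a symbol name if one is known. The function names and the `symbols` mapping are hypothetical.

```python
def decode_lis_ori(lis_word: int, ori_word: int):
    """Return (register, address) for a `lis rX` / `ori rX, rX` pair, else None."""
    lis_op, lis_rd, lis_ra = lis_word >> 26, (lis_word >> 21) & 31, (lis_word >> 16) & 31
    ori_op, ori_rs, ori_ra = ori_word >> 26, (ori_word >> 21) & 31, (ori_word >> 16) & 31
    if lis_op != 15 or lis_ra != 0:  # addis with rA = 0 is `lis`
        return None
    if ori_op != 24 or ori_rs != lis_rd or ori_ra != lis_rd:
        return None
    address = ((lis_word & 0xFFFF) << 16) | (ori_word & 0xFFFF)
    return lis_rd, address


def annotate(lis_word: int, ori_word: int, symbols: dict):
    """Render the pair with a symbol name, or return None if nothing matches."""
    decoded = decode_lis_ori(lis_word, ori_word)
    if decoded is None:
        return None
    reg, address = decoded
    name = symbols.get(address)
    if name is None:
        return None
    return [f"lis r{reg}, {name}@h", f"ori r{reg}, r{reg}, {name}@l"]
```

Given the relocated words from the example and a table mapping `0x80001234` to a name, `annotate` produces the symbolic form of the two instructions instead of raw immediates.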
|