Or: How to reverse engineer a 90s compression algorithm?
During my analysis of Ace Combat 3, I noticed something strange: The EU/US versions both had an additional ACE.BIN that was missing from the Japanese version. What’s inside?
50 53 2D 59 34 61 E3 77 00 B0 0B 80 4C 50 06 00 PS-Y4aãw.°.€LP.. 00 B0 0A 00 5C 00 00 00 00 00 00 00 1C 52 02 80 .°..\........R.€ 00 00 00 00 00 00 01 80 00 B0 0A 00 00 00 00 00 .......€.°...... 00 00 00 00 00 00 00 00 00 00 00 00 F0 FF 1F 80 ............ðÿ.€ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 B0 FF 88 42 0C DF 24 62 ........°ÿˆB.ß$b 02 80 2C 30 04 34 AA 30 08 3C 30 0C 44 30 10 4C .€,0.4ª0.<0.D0.L 30 14 FD 54 30 18 43 61 6E 27 74 FF 20 6F 70 65 0.ýT0.Can'tÿ ope 6E 20 54 FF 49 4D 20 5B 25 30 38 DF 78 5D 0A 00 n TÿIM [%08ßx].. 00 60 18 72 F7 65 61 64 E0 18 45 72 72 FF 6F 72 .`.r÷eadà.Errÿor 3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76 : MemÿAlloc: ÿOv 65 72 66 6C 6F FF 77 20 25 64 20 62 79 FF 74 65 erfloÿw %d byÿte 73 20 28 52 45 DB 51 3A 90 0F 29 0A B0 34 4E E1 s (REÛQ:..).°4Ná 6F 00 36 00 36 A0 36 C0 6C 46 72 FF 65 65 3A 20 o.6.6 6ÀlFrÿee: 55 6E 64 B8 00 6C 00 6C B0 A0 53 65 74 00 33 FC Und¸.l.l° Set.3ü 40 9F 30 DC 30 31 32 33 34 FF 35 36 37 38 39 41 @Ÿ0Ü01234ÿ56789A 42 8F 43 44 45 46 00 10 30 7B B0 24 BF 61 62 63 B.CDEF..0{°$¿abc
Some text strings will be followed by a varying number of zero bytes. This is padding, inserted by the compiler to align strings for faster memory access. If you don’t understand it, just acknowledge that they are an invisible part of the text and you’ll be okay.
These are not just random word pieces. I recognize them! I’ve seen them a dozen times whenever I looked at the Japanese game executable (AC3.BIN)! Seeing them side-by-side:
EU/US ACE.BIN JAP ACE3.BIN PS-Y4aãw.°.€LP.. PS-X EXE........ .°..\........R.€ ,W.€.......€.Ð.. .......€.°...... ................ ............ðÿ.€ ðÿ.€............ ................ ............Sony ↓ Computer Entert ↓ ainment Inc. for ↓ Japan area..... … … ↓ bye bye... .€,0.4ª0.<0.D0.L ....Äf.€Ìf.€Ôf.€ ↓ Üf.€äf.€ìf.€ôf.€ 0.ýT0.Can'tÿ ope Can't open TIM [ n TÿIM [%08ßx] %08x] .Can't re .`.r÷eadà.Errÿor ad TIM [%08x] . : MemÿAlloc: ÿOv Error: MemAlloc: erfloÿw %d byÿte Overflow %d byt s (REÛQ:..).°4Ná es (REQ: %d byte ↓ s) Error: MemNo ↓ Alloc: Overflow ↓ %d bytes (REQ: % ↓ d bytes) ..Erro o.6.6 6ÀlFrÿee: r: MemFree: Unde Und¸.l.l° Set.3ü rflow %d bytes ( ↓ REQ: %d bytes) ↓ Error: MemSet: U ↓ nderflow %d byte @Ÿ0Ü01234ÿ56789A s .0123456789AB B.CDEF .0{°$¿abc CDEF0123456789AB ↓ CDEF ...01234567 def@.¿X|.€”‚..€. 89abcdef ...ø€.€
These strings relate to basic memory management – something that is needed very early in the program. It makes sense to find these strings pretty early in the executable.
However, you can clearly see that the strings in the EU/US version are butchered to pieces.
Let’s figure out what happened!
I cannot find anything related to the magic ID PS-Y on the internet. I asked a few people from the Ace Combat 3 modding scene and Soapy (for his knowledge on the Psy-Q SDK, the development environment of the original PlayStation), but they had no idea either.
Infrid from the Load Word Team told me they had loaded the extracted file from their emulator’s memory instead of bothering with the compression.
So I was probably not re-inventing the wheel here 🙂
When you look at the two most readable strings, you see an odd character ÿ. This is the byte value FF (hexadecimal) in my Hex Editor’s default encoding. It appears in a very regular pattern:
43 61 6E 27 74 FF 20 6F 70 65 6E 20 54 FF 49 4D 20 5B 25 30 38 DF 78 5D 0A 00 00 C a n ' t ÿ o p e n T ÿ I M [ % 0 8 ß x ] . 45 72 72 FF 6F 72 3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76 65 72 66 6C E r r ÿ o r : M e m ÿ A l l o c : ÿ O v e r f l 6F FF 77 20 25 64 20 62 79 FF 74 65 73 20 28 52 45 DB 51 3A o ÿ w % d b y ÿ t e s ( R E Û Q :
You can clearly see that the green FF bytes mean Seven characters follow now!
.
Using this logic, DF and DB mean Five characters follow!
and Two characters follow!
, respectively.
Look again at this section:
EU/US ACE.BIN JAP ACE3.BIN 0.ýT0.Can'tÿ ope Can't open TIM [ n TÿIM [%08ßx] %08x] .Can't re .`.r÷eadà.Errÿor ad TIM [%08x] . : MemÿAlloc: ÿOv Error: MemAlloc:
The middle (red) string is almost completely gone. In fact, it is gone except for a single word: read
. What’s so special about this word?
Well, this word is the only difference compared to the previous (blue) string!
Can't open TIM [%08x] ↕↕↕↕ Can't read TIM [%08x]
The gibberish before and after read probably means something like Copy the beginning of the previous sentence!
and Copy the end of the previous sentence!
, respectively. Let’s have a look at the exact bytes:
60 18 72 F7 65 61 64 E0 18 r e a d
This pattern is more subtle than the last one. Take a look at the pieces we want to copy:
Can't open TIM [%08x] .Can't read TIM [%08x] . ╰──6─╯ ╰───14d=Eh───╯│ │ │ ┊ │ │ └─────── 24d=18h ───────┘ │ ┊ │ └─────── 24d=18h ───────┘
The first piece is 6 characters long, and it precedes the second sentence by 24 bytes (18 hexadecimal). This matches the 60 18 block exactly!
Then, the character r is inserted. The following byte F7 probably means Three more characters follow
, so we write ead.
Finally, we again reach back 24 bytes (18 hexadecimal) and grab 14 bytes (E hexadecimal) – the E0 18 block – and the text is complete.
We now know most definitely that this is an LZ77-like compression algorithm. Going back to grab previously extracted stuff is called a match, and directly writing stuff is called a literal. This kind of compressor is called dictionary coder.
But we’re not quite done.
Let’s keep decoding and see where it takes us!
00 60 18 72 F7 65 61 64 E0 18 45 72 72 FF 6F 72 .`.r÷eadà.Errÿor 3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76 : MemÿAlloc: ÿOv 65 72 66 6C 6F FF 77 20 25 64 20 62 79 FF 74 65 erfloÿw %d byÿte 73 20 28 52 45 DB 51 3A 90 0F 29 0A B0 34 4E E1 s (REÛQ:..).°4Ná Can't open TIM [%08x] .Can't read E0 18 match 14 B from 24 B before:TIM [%08x] .45 72 72 literal 3 BErr. FF 6F 72 3A 20 4D 65 6D literal 7 Bor: MemFF 41 6C 6C 6F 63 3A 20 literal 7 BAlloc:FF 4F 76 65 72 66 6C 6F literal 7 BOverfloFF 77 20 25 64 20 62 79 literal 7 Bw %d byFF 74 65 73 20 28 52 45 literal 7 Btes (REDB 51 3A literal 2 BQ:90 0F match 9 B from 15 B before:%d bytes29 0A literal 2 B:)TIM [%08x] .Error: MemAlloc: Overflow %d bytes (REQ: %d bytes)
It’s all fun and games, except for the parts I highlighted in yellow.
These are literals that immediately follow a match. But if we look at the match, we don’t find any indication on how many literal bytes follow, if at all! There is no 3 in E0 18, and there is no 2 in 90 0F!
This one took me a whole evening to crack; but it’s obvious once you see it.
All this time, we were thinking about the FF/DB/DF codes as arbitrary codes. But they have a structure, too!
Let’s check them out in binary:
FF (7-byte literal) = 11111111 F7 (3-byte literal) = 11110111 DF (5-byte literal) = 11011111 DB (2-byte literal) = 11011011
It’s not just that. The two bits I highlighted in yellow match the literal 2 B from the previous listing exactly! So the deal is:
Read one byte – let’s call it control byte.
Check the least significant bit. If it is set, write a literal byte. Else, process a match.
Repeat for all bits except for the most significant one.
This bit has a special meaning, but we’ll come to that later.
One question I haven’t touched yet: Where does the compressed stream start in the first place?
I don’t know, but I’m pretty sure that it’s 93 bytes into the executable. It is the first byte that makes any sense; earlier bytes cannot be interpreted in a meaningful way.
With only two files, it is hard to learn about the structure of the header. However, it does contain the compressed size as well as the extracted size. I included a definition with the source code.
As we worked out above, a match of 90 11 means copy 9 bytes from 17 bytes before
. But there are two more forms we need to take care of:
In 92 11, the 2 counts towards the offset, not the length! I.e. the offset becomes 2×256+17 bytes while the length remains 9 B.
In 00 11, the zero does not mean that the length is zero – this wouldn’t make any sense. It must be treated as 16.
One case I’m not sure of is matches of the form 90 00, i.e. zero offsets. Go back zero bytes and copy nine bytes
does not make any sense! I’m treating it as setting 9 B to zero.
The extracted size in the header is a bit off, and I haven’t quite figured out why. I found that the most reliable way to identify the end-of-file is reading a control byte whose most significant bit is zero. So that’s why not all eight bits seem used!
You may have noticed that the PS-X EXE part is missing. I assume that the decompressor will add it, because it can easily be re-computed. It may not even be necessary if the executable is loaded dynamically.
Here’s my header structure and my decoding loop:
struct Header { // Always “PS-Y” (50 53 2D 59 hexadecimal). UInt4B_LE id; UInt4B unknown_a[2]; UInt4B_LE compressedSize; // includes the header Byte unknown_b[24]; UInt4B_LE extractedSize; // slightly off; use EOF bit instead Byte unknown_c[49]; Byte stream[]; }; // Extracts the given PS-Y to the given address. // • requires sufficient space void extract( Byte * RESTRICT toOutput, void const * const RESTRICT toData, USize const inputSize ) { auto & header = *static_cast<Header const *>(toData); auto toInput = &header.stream[0]; for(;;) { // The 7 least significant bits of the control byte each decide whether a // 1-B literal follows (1) or a match (0). // An unset most significant bit indicates EOF. auto control = UInt4B(*toInput++); if (0 == control) [[unlikely]] return; // end of stream do { if(control & 1) { // Literal? *toOutput++ = *toInput++; } else { // Match? auto const lengthAndBlock = *toInput++; auto const offset = 256 * (lengthAndBlock & 0xF) + *toInput++; if(0 == offset) // not sure here *toOutput = 0; auto length = UInt4B(lengthAndBlock >> 4); if(0 == length) length = 16; auto toMatch = toOutput - offset; auto const toMatchEnd = toMatch + length; do { *toOutput++ = *toMatch++; } while(toMatch < toMatchEnd); } control >>= 1; } while(control != 1); } }
Remember that we identified the algorithm as being LZ77-like
? Wikipedia has a nice overview of LZ77 implementations. While reading it, one particular implementation caught my attention:
LZSS improves on LZ77 by using a 1-bit flag to indicate whether the next chunk of data is a literal or a length–distance pair
Wikipedia’s LZSS article eventually gives us a clue with Japanese connections:
Most implementations stem from a public domain 1989 code by Haruhiko Okumura.[3][4]
The article helpfully links the original 1989 implementation. In LZSS.C we find Okumura’s original Decode() function, and the similarities are striking:
for ( ; ; ) { if (((flags >>= 1) & 256) == 0) { if ((c = getc(infile)) == EOF) break; flags = c | 0xff00; /* uses higher byte cleverly */ } /* to count eight */ if (flags & 1) { if ((c = getc(infile)) == EOF) break; putc(c, outfile); text_buf[r++] = c; r &= (N - 1); } else { if ((i = getc(infile)) == EOF) break; if ((j = getc(infile)) == EOF) break; i |= ((j & 0xf0) << 4); j = (j & 0x0f) + THRESHOLD; for (k = 0; k <= j; k++) { c = text_buf[(i + k) & (N - 1)]; putc(c, outfile); text_buf[r++] = c; r &= (N - 1); } } }
It is not identical – LZSS handles eight consecutive bits where Namco’s compression only handles seven (the most significant bit implies EOF). But it does process bits from the control byte, and it reads matches as pairs of bytes (with a 4-bit length and a 12-bit offset just like we’ve seen earlier).
Namco’s compression may not be an exact copy of LZSS, but I’d say it’s directly derived. The epoch (90s), the location (Japan), and the overall structure leave little room for other explanations.
I call this compression PS-Y after the magic ID in its header.
It’s similar to the ULZ compression used everywhere throughout Ace Combat 3 … but PS-Y is a bit more primitive, especially in handling matches. Where ULZ is data-oriented (all matches in one place, all literals in another place, all control bytes in another place), PS-Y mixes all of this into one stream.
PS-Y is obviously optimized for MIPS (the PSX CPU) – all bit shifts use fixed operands. Still, I can imagine ULZ being quite a lot faster because it allows longer matches and doesn’t force you to read everything byte-wise.
It’s an interesting design choice to compress the game executable. This may be an attempt at reducing loading time – the inner loop of this algorithm is surely faster than a CD-ROM drive. The compression is not very good however – it removes a mere 15 %.
For fun, mostly 🙂
Having it all extracted, we can see that these are the main executables for the EU/US version of Ace Combat 3. Their differences from the Japanese version will probably make an article on their own.
But the really interesting quesion is: Where else did Namco use this PS-Y compression? We know that their ULZ compression was used in other titles, e.g. in Time Crisis (1997).
Having written this article, the next dataminer encountering this compression will at least have one Google result to rely on 🙂