Namco’s PS-Y Compression

Or: How to reverse engineer a 90s compression algorithm?

Odd Ace Combat 3 Executables

During my analysis of Ace Combat 3, I noticed something strange: The EU/US versions both had an additional ACE.BIN that was missing from the Japanese version. What’s inside?

50 53 2D 59 34 61 E3 77 00 B0 0B 80 4C 50 06 00  PS-Y4aãw.°.€LP..
00 B0 0A 00 5C 00 00 00 00 00 00 00 1C 52 02 80  .°..\........R.€
00 00 00 00 00 00 01 80 00 B0 0A 00 00 00 00 00  .......€.°......
00 00 00 00 00 00 00 00 00 00 00 00 F0 FF 1F 80  ............ðÿ.€
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 B0 FF 88 42 0C DF 24 62  ........°ÿˆB.ß$b
02 80 2C 30 04 34 AA 30 08 3C 30 0C 44 30 10 4C  .€,0.4ª0.<0.D0.L
30 14 FD 54 30 18 43 61 6E 27 74 FF 20 6F 70 65  0.ýT0.Can'tÿ ope
6E 20 54 FF 49 4D 20 5B 25 30 38 DF 78 5D 0A 00  n TÿIM [%08ßx]..
00 60 18 72 F7 65 61 64 E0 18 45 72 72 FF 6F 72  .`.r÷eadà.Errÿor
3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76  : MemÿAlloc: ÿOv
65 72 66 6C 6F FF 77 20 25 64 20 62 79 FF 74 65  erfloÿw %d byÿte
73 20 28 52 45 DB 51 3A 90 0F 29 0A B0 34 4E E1  s (REÛQ:..).°4Ná
6F 00 36 00 36 A0 36 C0 6C 46 72 FF 65 65 3A 20  o.6.6 6ÀlFrÿee: 
55 6E 64 B8 00 6C 00 6C B0 A0 53 65 74 00 33 FC  Und¸.l.l° Set.3ü
40 9F 30 DC 30 31 32 33 34 FF 35 36 37 38 39 41  @Ÿ0Ü01234ÿ56789A
42 8F 43 44 45 46 00 10 30 7B B0 24 BF 61 62 63  B.CDEF..0{°$¿abc

Some text strings will be followed by a varying number of zero bytes. This is padding, inserted by the compiler to align strings for faster memory access. If you don’t understand it, just acknowledge that they are an invisible part of the text and you’ll be okay.

These are not just random word pieces. I recognize them! I’ve seen them a dozen times whenever I looked at the Japanese game executable (AC3.BIN)! Seeing them side-by-side:

EU/US ACE.BIN       JAP ACE3.BIN
PS-Y4aãw.°.€LP..    PS-X EXE........
.°..\........R.€    ,W.€.......€.Ð..
.......€.°......    ................
............ðÿ.€    ðÿ.€............
................    ............Sony
↓                    Computer Entert
↓                   ainment Inc. for
↓                    Japan area.....
…                   …
↓                         bye bye...
.€,0.4ª0.<0.D0.L    ....Äf.€Ìf.€Ôf.€
↓                   Üf.€äf.€ìf.€ôf.€
0.ýT0.Can'tÿ ope    Can't open TIM [
n TÿIM [%08ßx]      %08x]  .Can't re
.`.r÷eadà.Errÿor    ad TIM [%08x]  .
: MemÿAlloc: ÿOv    Error: MemAlloc:
erfloÿw %d byÿte     Overflow %d byt
s (REÛQ:..).°4Ná    es (REQ: %d bytes)  Error: MemNoAlloc: Overflow %d bytes (REQ: %d bytes)  ..Erro
o.6.6 6ÀlFrÿee:     r: MemFree: Unde
Und¸.l.l° Set.3ü    rflow %d bytes (REQ: %d bytes)  Error: MemSet: Underflow %d byte
@Ÿ0Ü01234ÿ56789A    s  .0123456789AB
B.CDEF .0{°$¿abc    CDEF0123456789ABCDEF ...01234567
def@.¿X|.€”‚..€.    89abcdef ...ø€.€

These strings relate to basic memory management – something that is needed very early in the program. It makes sense to find these strings pretty early in the executable.

However, you can clearly see that the strings in the EU/US version are butchered to pieces.

Let’s figure out what happened!

Previous Work

I cannot find anything related to the magic ID PS-Y on the internet. I asked a few people from the Ace Combat 3 modding scene and Soapy (for his knowledge on the Psy-Q SDK, the development environment of the original PlayStation), but they had no idea either.

Infrid from the Load Word Team told me they had loaded the extracted file from their emulator’s memory instead of bothering with the compression.

So I was probably not re-inventing the wheel here 🙂

Finding the Basic Pattern

When you look at the two most readable strings, you see an odd character ÿ. This is the byte value FF (hexadecimal) in my Hex Editor’s default encoding. It appears in a very regular pattern:

43 61 6E 27 74 FF 20 6F 70 65 6E 20 54 FF 49 4D 20 5B 25 30 38 DF 78 5D 0A 00 00
 C  a  n  '  t  ÿ     o  p  e  n     T  ÿ  I  M     [  %  0  8  ß  x  ]  .      

45 72 72 FF 6F 72 3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76 65 72 66 6C
 E  r  r  ÿ  o  r  :     M  e  m  ÿ  A  l  l  o  c  :     ÿ  O  v  e  r  f  l
6F FF 77 20 25 64 20 62 79 FF 74 65 73 20 28 52 45 DB 51 3A
 o  ÿ  w     %  d     b  y  ÿ  t  e  s     (  R  E  Û  Q  :

You can clearly see that the green FF bytes mean Seven characters follow now!.

Using this logic, DF and DB mean Five characters follow! and Two characters follow!, respectively.

Compression!

Look again at this section:

EU/US ACE.BIN       JAP ACE3.BIN
0.ýT0.Can'tÿ ope    Can't open TIM [
n TÿIM [%08ßx]      %08x]  .Can't re
.`.r÷eadà.Errÿor    ad TIM [%08x]  .
: MemÿAlloc: ÿOv    Error: MemAlloc:

The middle (red) string is almost completely gone. In fact, it is gone except for a single word: read. What’s so special about this word?

Well, this word is the only difference compared to the previous (blue) string!

Can't open TIM [%08x]
      ↕↕↕↕
Can't read TIM [%08x]

The gibberish before and after read probably means something like Copy the beginning of the previous sentence! and Copy the end of the previous sentence!, respectively. Let’s have a look at the exact bytes:

60 18 72 F7 65 61 64 E0 18
       r     e  a  d

This pattern is more subtle than the last one. Take a look at the pieces we want to copy:

Can't open TIM [%08x]  .Can't read TIM [%08x]  .
╰──6─╯    ╰───14d=Eh───╯│         │
│         ┊             │         │
└─────── 24d=18h ───────┘         │
          ┊                       │
          └─────── 24d=18h ───────┘

The first piece is 6 characters long, and it precedes the second sentence by 24 bytes (18 hexadecimal). This matches the 60 18 block exactly!

Then, the character r is inserted. The following byte F7 probably means Three more characters follow, so we write ead.

Finally, we again reach back 24 bytes (18 hexadecimal) and grab 14 bytes (E hexadecimal) – the E0 18 block – and the text is complete.

We now know most definitely that this is an LZ77-like compression algorithm. Going back to grab previously extracted stuff is called a match, and directly writing stuff is called a literal. This kind of compressor is called dictionary coder.

But we’re not quite done.

Inner Loop

Let’s keep decoding and see where it takes us!

00 60 18 72 F7 65 61 64 E0 18 45 72 72 FF 6F 72  .`.r÷eadà.Errÿor
3A 20 4D 65 6D FF 41 6C 6C 6F 63 3A 20 FF 4F 76  : MemÿAlloc: ÿOv
65 72 66 6C 6F FF 77 20 25 64 20 62 79 FF 74 65  erfloÿw %d byÿte
73 20 28 52 45 DB 51 3A 90 0F 29 0A B0 34 4E E1  s (REÛQ:..).°4Ná

Can't open TIM [%08x]  .Can't read

E0 18                    match 14 B from 24 B before:  TIM [%08x]  .
45 72 72                 literal 3 B Err.
FF 6F 72 3A 20 4D 65 6D  literal 7 B or: Mem
FF 41 6C 6C 6F 63 3A 20  literal 7 B Alloc: 
FF 4F 76 65 72 66 6C 6F  literal 7 B Overflo
FF 77 20 25 64 20 62 79  literal 7 B w %d by
FF 74 65 73 20 28 52 45  literal 7 B tes (RE
DB 51 3A                 literal 2 B Q:
90 0F                    match 9 B from 15 B before:  %d bytes
29 0A                    literal 2 B: ) 

 TIM [%08x]  .Error: MemAlloc: Overflow %d bytes (REQ: %d bytes) 

It’s all fun and games, except for the parts I highlighted in yellow.

These are literals that immediately follow a match. But if we look at the match, we don’t find any indication on how many literal bytes follow, if at all! There is no 3 in E0 18, and there is no 2 in 90 0F!

This one took me a whole evening to crack; but it’s obvious once you see it.

The last piece

All this time, we were thinking about the FF/DB/DF codes as arbitrary codes. But they have a structure, too!

Let’s check them out in binary:

FF (7-byte literal) = 11111111
F7 (3-byte literal) = 11110111
DF (5-byte literal) = 11011111
DB (2-byte literal) = 11011011

It’s not just that. The two bits I highlighted in yellow match the literal 2 B from the previous listing exactly! So the deal is:

  1. Read one byte – let’s call it control byte.

  2. Check the least significant bit. If it is set, write a literal byte. Else, process a match.

  3. Repeat for all bits except for the most significant one.

    This bit has a special meaning, but we’ll come to that later.

Edge Cases

Header

One question I haven’t touched yet: Where does the compressed stream start in the first place?

I don’t know, but I’m pretty sure that it’s 93 bytes into the executable. It is the first byte that makes any sense; earlier bytes cannot be interpreted in a meaningful way.

With only two files, it is hard to learn about the structure of the header. However, it does contain the compressed size as well as the extracted size. I included a definition with the source code.

Large Lengths/Offsets

As we worked out above, a match of 90 11 means copy 9 bytes from 17 bytes before. But there are two more forms we need to take care of:

Zero Offsets

One case I’m not sure of is matches of the form 90 00, i.e. zero offsets. Go back zero bytes and copy nine bytes does not make any sense! I’m treating it as setting 9 B to zero.

End of File

The extracted size in the header is a bit off, and I haven’t quite figured out why. I found that the most reliable way to identify the end-of-file is reading a control byte whose most significant bit is zero. So that’s why not all eight bits seem used!

The EXE Header

You may have noticed that the PS-X EXE part is missing. I assume that the decompressor will add it, because it can easily be re-computed. It may not even be necessary if the executable is loaded dynamically.

Source Code

Here’s my header structure and my decoding loop:

struct Header {
	// Always “PS-Y” (50 53 2D 59 hexadecimal).
	UInt4B_LE id;
	UInt4B    unknown_a[2];
	UInt4B_LE compressedSize; // includes the header
	Byte      unknown_b[24];
	UInt4B_LE extractedSize; // slightly off; use EOF bit instead
	Byte      unknown_c[49];
	Byte      stream[];
};

// Extracts the given PS-Y to the given address.
//  • requires sufficient space
void extract(
	Byte * RESTRICT             toOutput,
	void const * const RESTRICT toData,
	USize const                 inputSize
) {
	auto & header  = *static_cast<Header const *>(toData);
	auto   toInput = &header.stream[0];
	for(;;) {

		// The 7 least significant bits of the control byte each decide whether a
		//  1-B literal follows (1) or a match (0).
		// An unset most significant bit indicates EOF.
		auto control = UInt4B(*toInput++);
		if (0 == control) [[unlikely]]
			return; // end of stream
		do {

			if(control & 1) { // Literal?

				*toOutput++ = *toInput++;

			} else { // Match?

				auto const lengthAndBlock = *toInput++;
				auto const offset = 256 * (lengthAndBlock & 0xF) + *toInput++;
				if(0 == offset) // not sure here
					*toOutput = 0;

				auto length = UInt4B(lengthAndBlock >> 4);
				if(0 == length)
					length = 16;

				auto       toMatch    = toOutput - offset;
				auto const toMatchEnd = toMatch + length;
				do {
					*toOutput++ = *toMatch++;
				} while(toMatch < toMatchEnd);

			}

			control >>= 1;
		} while(control != 1);

	}
}

Origins

Remember that we identified the algorithm as being LZ77-like? Wikipedia has a nice overview of LZ77 implementations. While reading it, one particular implementation caught my attention:

LZSS improves on LZ77 by using a 1-bit flag to indicate whether the next chunk of data is a literal or a length–distance pair

Wikipedia’s LZSS article eventually gives us a clue with Japanese connections:

Most implementations stem from a public domain 1989 code by Haruhiko Okumura.[3][4]

The article helpfully links the original 1989 implementation. In LZSS.C we find Okumura’s original Decode() function, and the similarities are striking:

for ( ; ; ) {
	if (((flags >>= 1) & 256) == 0) {
		if ((c = getc(infile)) == EOF) break;
		flags = c | 0xff00;		/* uses higher byte cleverly */
	}							/* to count eight */
	if (flags & 1) {
		if ((c = getc(infile)) == EOF) break;
		putc(c, outfile);  text_buf[r++] = c;  r &= (N - 1);
	} else {
		if ((i = getc(infile)) == EOF) break;
		if ((j = getc(infile)) == EOF) break;
		i |= ((j & 0xf0) << 4);  j = (j & 0x0f) + THRESHOLD;
		for (k = 0; k <= j; k++) {
			c = text_buf[(i + k) & (N - 1)];
			putc(c, outfile);  text_buf[r++] = c;  r &= (N - 1);
		}
	}
}

It is not identical – LZSS handles eight consecutive bits where Namco’s compression only handles seven (the most significant bit implies EOF). But it does process bits from the control byte, and it reads matches as pairs of bytes (with a 4-bit length and a 12-bit offset just like we’ve seen earlier).

Namco’s compression may not be an exact copy of LZSS, but I’d say it’s directly derived. The epoch (90s), the location (Japan), and the overall structure leave little room for other explanations.

Review

I call this compression PS-Y after the magic ID in its header.

It’s similar to the ULZ compression used everywhere throughout Ace Combat 3 … but PS-Y is a bit more primitive, especially in handling matches. Where ULZ is data-oriented (all matches in one place, all literals in another place, all control bytes in another place), PS-Y mixes all of this into one stream.

PS-Y is obviously optimized for MIPS (the PSX CPU) – all bit shifts use fixed operands. Still, I can imagine ULZ being quite a lot faster because it allows longer matches and doesn’t force you to read everything byte-wise.

It’s an interesting design choice to compress the game executable. This may be an attempt at reducing loading time – the inner loop of this algorithm is surely faster than a CD-ROM drive. The compression is not very good however – it removes a mere 15 %.

What For?

For fun, mostly 🙂

Having it all extracted, we can see that these are the main executables for the EU/US version of Ace Combat 3. Their differences from the Japanese version will probably make an article on their own.

But the really interesting quesion is: Where else did Namco use this PS-Y compression? We know that their ULZ compression was used in other titles, e.g. in Time Crisis (1997).

Having written this article, the next dataminer encountering this compression will at least have one Google result to rely on 🙂