I found a long-forgotten CD-ROM with artwork and screenshots of Namco’s Ace Combat 3: Electrosphere, Ridge Racer Type 4, and other games. But finding it really was just half of the deal! Read on if you wanna see me perform some forensic data analysis. If you just want to see the content, skip forward to part 2 and part 3.
When Namco released Ace Combat 3: Electrosphere in Europe in 2000, it also handed out a press kit. Press kits are collections of screenshots and artwork, intended for use in magazine articles and on websites. The Ace Combat 3 PAL press kit is well-documented; Skyward FM has an excellent article on it.
One of my favorite internet forums closed down last week, so I decided to contact Jason Scott – Internet Archive wizard – for preservation tips. In doing so, I stumbled upon an interesting tweet of his:
The Internet Archive has over 700 Press Kit CDs/DVDs from videogames of the last 25 years or so.
https://archive.org/details/gamepr…
Needless to say, I spent the next hour scrolling through the various CD rips. This one caught my attention immediately:
This scan tells us two important things (aside from the CD being about Namco, obviously):
The disc probably contains something rare and unique. Why? It is hand-labelled, not mass-produced!
Downloading the disc took a while, because the Internet Archive has severe bandwidth problems (as is expected of a top-100 site serving assets hundreds of megabytes in size). And then … it wouldn’t mount, e.g. with WinCDEmu: “The image is unsupported or in the wrong format.”
Did I mess up the download?
I tried again with PowerISO, and now I could at least see a file system. But some folders were empty, and some files were obviously missing.
… and the few files I could see? Well, they looked like this:
The periodic patterns look a lot like disc errors.
At this point I assumed the disc was damaged beyond repair.
As I later learned from Infrid, the disc image mounts fine on Linux, rendering the rest of this article obsolete. Choose your tools wisely! 🙂
I had contacted DragonSpikeXIII about the disc, and he said that some of the damaged pictures do look new to him. This at least motivated me to look further: There is damaged stuff on the disc, but at least it’s new, previously unknown damaged stuff!
PowerISO showed me 240 MiB of data on the disc. But the downloaded file is 382 MiB large. So there must be quite a lot of stuff hidden in the damaged file system!
Opening the file in a hex editor, I could see filenames that wouldn’t show up in PowerISO:
0012CC30  00 00 00 56 05 66 69 6F 6E 61 A2 90 DD 92 E8 83  ...V.fiona¢.Ý’èƒ
0012CC40  74 83 48 83 8B 83 5F 00 00 00 00 00 00 00 00 00  tƒHƒ‹ƒ_.........
0012CC50  00 00 00 00 11 00 00 00 00 70 0B 46 49 4F 4E 41  .........p.FIONA
0012CC60  30 31 2E 50 53 44 02 00 00 00 38 42 50 53 38 42  01.PSD....8BPS8B
0012CC70  49 4D 05 00 00 00 00 40 00 00 00 00 00 71 00 00  IM.....@.....q..
0012CC80  00 2E 38 BF 00 2E 50 00 00 00 00 00 22 68 00 00  ..8¿..P....."h..
0012CC90  26 00 B2 A4 4E 52 B2 A4 4E 5B 00 00 00 00 00 00  &.²¤NR²¤N[......
0012CCA0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0012CCB0  3C 62 01 38 00 00 00 00 00 00 00 00 3D 9A 00 01  <b.8........=š..
0012CCC0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0012CCD0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0012CCE0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Searching for the data directly, however, wasn’t easy. PowerISO displayed the file system as HFS, an ancient Apple format with dynamic block allocation.
It got too complicated. Let’s keep it simple! Instead of tracing the file system’s blocks, let’s just search for image files the brute-force way!
Every BMP image begins with the bytes 42 4D, and every PSD image begins with the string 8BPS. Why not just parse the entire disc, look for those signatures byte after byte, and dump the next, say, 2 MiB once we find one? That’s just a few lines of code!
// Map the whole dump into memory and test every single byte offset.
auto f = mapForReading("C:\\Users\\Krishty\\Desktop\\Namco 03-Mar-99 (USA).bin");
USize offset = 0;
auto sizeRemaining = f.size;
for(auto it = f.begin; it < f.end; ++it, ++offset, --sizeRemaining) {
	if(isBMP(it, sizeRemaining)) {
		// Name the output file after the offset at which the signature was found.
		auto path = sprintf("C:\\Users\\Krishty\\Desktop\\test\\%08X.BMP", offset);
		if(auto out = createEmptyFile(path, writeOnly)) {
			// Dump the next 2 MiB (or whatever is left of the disc).
			out->write(it, minimumOf(2 * 1024 * 1024, sizeRemaining));
			out->close();
		}
	}
}
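The isBMP() helper isn’t shown above. Testing for 42 4D alone would fire on every stray “BM” on the disc, so here is a minimal sketch of how such a check could look, assuming the standard BMP file header layout (signature, file size, two reserved words, pixel data offset, then the size of the info header). The names looksLikeBMP and readLE32 are made up for illustration; this is not the code I actually ran.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian 32-bit value from an unaligned byte pointer.
static std::uint32_t readLE32(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical stand-in for isBMP(): checks the "BM" signature plus a few
// cheap plausibility tests to reduce false positives.
bool looksLikeBMP(const unsigned char* p, std::size_t sizeRemaining) {
    if (sizeRemaining < 18) return false;             // file header + info header size field
    if (p[0] != 0x42 || p[1] != 0x4D) return false;   // "BM"
    // Assumption: the two reserved words are zero, as common tools write them.
    if (p[6] != 0 || p[7] != 0 || p[8] != 0 || p[9] != 0) return false;
    // The info header size identifies the BMP flavor; only a few values exist.
    const std::uint32_t infoHeaderSize = readLE32(p + 14);
    switch (infoHeaderSize) {
        case 12: case 40: case 52: case 56: case 64: case 108: case 124: break;
        default: return false;
    }
    // Pixel data cannot start before the end of both headers.
    const std::uint32_t pixelDataOffset = readLE32(p + 10);
    return pixelDataOffset >= 14 + infoHeaderSize;
}
```

The file size field at offset 2 could also tell you how much to dump instead of a flat 2 MiB, but only if the file is stored contiguously, which a brute-force scan cannot know.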
This worked … I guess? It did dig out images, but … see for yourself:
You can spot some image data, but why is it so strangely distorted?!
Well, what’s in that .bin file anyway? Websites will tell you “It’s a CD dump, duh!” But what is a CD dump?
In this case, the dump contains the actual bytes that have been read from the CD-ROM, but those bytes are not what you expect. Well, not exclusively.
CD-ROMs have their data organized in sectors. Each sector holds 2048 B of data (the payload) … and additional metadata.
What metadata? That’s specified in ISO/IEC 10149 (ISO 9660 only covers the file system on top of the sectors), and you can purchase it for a few hundred bucks if you want to know!
…
Or you grab yourself a different specification, like Nocash’s reverse engineered PSX specification (I used this earlier to decode the XA sound files in Ace Combat 3). It contains a very useful chapter CDROM Sector Encoding. Here we see that a sector contains a rich set of metadata, e.g.:
Whether the sector contains ordinary data or sound.
This is the reason you can’t see any music files on a music CD-ROM when you insert it into your computer: The computer will only display the data sectors, not music sectors.
A synchronization signal, consisting of the bytes 00 FF FF FF FF FF FF FF FF FF FF 00 and sector numbers.
This allows the laser to skip an arbitrary number of sectors on the disc and find the destination safely.
A checksum and error correction codes.
This allows a CD-ROM player to detect errors (checksum of the payload doesn’t match) and possibly to fix them (using the error correction codes). In short, it’s the reason you can still play a CD with dust or scratches on it, if they aren’t too severe.
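To make the first bytes of a sector a bit more tangible, here is a small sketch that checks for the synchronization pattern and decodes the sector number. It assumes raw 2352-byte sectors and that the three address bytes are BCD-encoded minute, second, and sector (75 sectors per second, with the first data sector sitting at 00:02:00), which is how I read the Nocash document. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kRawSectorSize = 2352; // sync + header + payload + checksum + ECC

// True if the 12 bytes at `sector` are 00 FF FF FF FF FF FF FF FF FF FF 00.
bool hasSyncPattern(const unsigned char* sector) {
    static const unsigned char sync[12] = {
        0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00
    };
    return std::memcmp(sector, sync, sizeof sync) == 0;
}

// Converts one BCD byte (e.g. 0x16) to its decimal value (16).
int fromBCD(unsigned char b) {
    return (b >> 4) * 10 + (b & 0x0F);
}

// Returns the logical sector number stored in the header,
// or -1 if the sync pattern is missing.
long sectorNumber(const unsigned char* sector) {
    if (!hasSyncPattern(sector)) return -1;
    const int minute = fromBCD(sector[12]);
    const int second = fromBCD(sector[13]);
    const int frame  = fromBCD(sector[14]);
    // 75 sectors per second; the first 2 seconds (150 sectors) are the lead-in gap.
    return (minute * 60L + second) * 75 + frame - 150;
}
```

Running hasSyncPattern() on every 2352nd byte of a .bin file is a cheap way to confirm that it really is a raw dump and not a plain 2048-bytes-per-sector image.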
That’s where I realized: The images came out distorted because the image data was interleaved with the metadata in the CD-ROM sectors!
Extracting the payload from the CD-ROM sectors takes just a few lines of code:
struct Sector {
	Byte sync[12]; // 00 FF FF FF FF FF FF FF FF FF FF 00
	struct {
		Byte minute;
		Byte second;
		Byte sector;
		Byte mode;
	} header;
	Byte data[2048]; // the payload
	int  checksum;
	Byte zero[8];
	Byte errorCorrectionCodes[276];
};
static_assert(sizeof(Sector) == 2352, "a raw CD-ROM sector is 2352 bytes");

MappedFile decodeCDROMDump(void const * data, USize size) {
	auto sector      = (Sector const *)data;
	auto resultBegin = (Byte *)malloc(size); // more than enough; the payload is smaller than the dump
	auto resultIt    = resultBegin;
	// Copy the payload of every complete sector, skipping all metadata.
	while(size >= sizeof(Sector)) {
		memcpy(resultIt, sector->data, sizeof sector->data);
		resultIt += sizeof sector->data;
		++sector;
		size -= sizeof(Sector);
	}
	return { resultBegin, resultIt };
}
For the sake of simplicity, I have omitted error correction.
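Detecting errors (as opposed to correcting them) is not much code either. The following is a sketch, not something I ran against this disc. It assumes the checksum is the CRC-32 variant described in the Nocash document (reflected polynomial D8018001, start value zero), that it covers the first 2064 bytes of the sector (sync, header, payload), and that it is stored little-endian right after the payload. The function names are made up.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC over `size` bytes using the CD-ROM checksum polynomial.
// Assumptions: reflected polynomial 0xD8018001, initial value 0, no final XOR.
std::uint32_t computeChecksum(const unsigned char* data, std::size_t size) {
    std::uint32_t crc = 0;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xD8018001u : 0u);
    }
    return crc;
}

// Compares the stored checksum of a raw 2352-byte data sector with a freshly
// computed one. Only meaningful for sectors laid out like the struct above.
bool sectorLooksIntact(const unsigned char* sector) {
    const std::size_t coveredBytes = 12 + 4 + 2048; // sync + header + payload
    const unsigned char* p = sector + coveredBytes;
    const std::uint32_t stored =
        std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
        (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    return computeChecksum(sector, coveredBytes) == stored;
}
```

A sector that fails this test would be a candidate for the error correction codes; a dump where every sector passes is, at least at this level, healthy.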
And then something really, really weird happened: All images came out perfectly clear!
Remember that the CD contains an Apple HFS file system? I quickly downloaded HFSExplorer (then not-so-quickly configured a portable Java installation to run it) and it reported the file system as perfectly healthy. I could easily extract the whole disc!
Images on the disc are in PSD (Photoshop) and BMP (Windows bitmap) format. Some vector graphics are in a PostScript-like format I couldn’t load. The presence of specific empty folders and database files indicates that the disc was burnt on a Mac; the images were authored with Adobe Photoshop 4.0.
There’s so much to show that I had to put the Ace Combat 3 content into a second article and the other games into a third one. Enjoy!