The Future Is Now

What Happened to the Japanese PC Platforms?

(This was originally posted on a social media site; I’ve revised and updated it for my blog.)

The other day a friend asked me a pretty interesting question: what happened to all those companies who made those Japanese computer platforms that were never released outside Japan? I thought it’d be worth expanding that answer into a full-size post.

A quick introduction: the players

It’s hard to remember these days, but there used to be an incredible amount of variety in the computer space. There were a lot of different computer platforms, pretty much all of them totally incompatible with each other. North America settled on the IBM PC/Mac duopoly pretty early1, but Europe still had plenty of other computers popular well into the 90s, and Japan had its own computers that essentially didn’t exist anywhere else.

So who were they? By the 16-bit computer era, there were three I’m going to talk about today2: NEC’s PC-98, Fujitsu’s FM Towns, and Sharp’s X68000. The PC-98 was far and away the biggest of those platforms, with the other two having a more niche market.

The PC-98 in a time of transition

First, a quick digression: what is this DOS thing?

The thing about DOS is that it’s a much thinner OS than what we think of in 2024. When you’re writing DOS software of any kind of complexity, you’re talking straight to the hardware, or to drivers that are specific to particular classes of hardware. When we talk about “DOS” in the west, we specifically mean “DOS on IBM compatible PCs”. PC-98 and FM Towns both had DOS-based operating systems, but their hardware was nothing at all like IBM compatible PCs and there was no level of software compatibility between them. The PC-98 was originally a DOS-based computer without a GUI of any kind - just like DOS-based IBM PCs. When we talk about “PC-98” games and software, what we really mean is DOS-based PC-98 software that only runs on that platform.

Windows software is very different from DOS in one important way: Windows incorporates a hardware abstraction layer. Software written for Windows APIs doesn’t need to be specific to particular hardware, and that set the stage for the major transition that was going to come.

NEC and Microsoft teamed up on porting Windows to the PC-98 platform. Both the PC-98 and the IBM PC use the same CPU, even though the rest of their hardware is very different, which made the port technically feasible. The first Windows release for PC-98 came out in 1992, but Windows didn’t really take off in a big way until Windows 95 in the mid-90s. And so, suddenly, for the first time software could run on both IBM PCs running Japanese-language Windows and PC-98s running Windows.3 Software developers didn’t have to do anything special to get that compatibility: it happened by default, so long as they were using the standard Windows software features and didn’t talk directly to the hardware.

Around the same time, NEC started making IBM-compatible PCs. As far as I can tell, they made both PC-98s and IBM PCs alongside each other for quite a few years. With Windows software not caring what the underlying hardware was, the distinction between “PC-98” and “PC” got a lot fuzzier. If you were buying a PC, you had no reason to buy a PC-98 unless you wanted to run DOS-based PC-98 software. If you just wanted that shiny new Windows software, why not buy the cheaper IBM PC that NEC would also sell you?

So, for the PC-98, the answer isn’t really that it died - it sort of faded away and merged into what every other system was becoming.

The FM Towns

The FM Towns had a similar transition. While it had a homegrown GUI-based OS called Towns OS, it was relatively primitive compared to Windows 3 and especially Windows 95. The FM Towns also used the same CPU as IBM PCs and the PC-98, which means Microsoft could work with Fujitsu to port their software to the platform. And, just like what happened with the PC-98, the platform became far less relevant and less distinctive when it was just another platform to run Windows software on. If you didn’t care about running the older FM Towns-specific software, why would you care about buying an FM Towns instead of any other IBM PC?

Fujitsu, just like NEC, made the transition to making standard Windows PCs and discontinued the FM Towns a few years later.

The X68000 loses out in the CPU wars

Unlike the other two platforms, the X68000 had a different CPU and a distinct homegrown OS. It used the 68000 series of processors from Motorola, which were incredibly popular in the 80s and 90s. The same CPU was used by the Mac until the mid 90s, the Amiga, and a huge number of home consoles and arcade boards. It was a powerful CPU, but when every other platform was looking for a way to merge with the Windows platform, Sharp had a big problem: you simply couldn’t port Windows to the X68000 and get it to run regular Windows software, because it didn’t use the same CPU as the IBM PC. Sharp was locked out. While it also switched to making Windows PCs in the 90s, it had no way to bring its existing users along by giving them a transition path.

The lure of multitasking

Why did Windows win out, though? In the west we often credit Microsoft Office as the killer app, but it wasn’t a major player in Japan where Japanese language-specific word processors were huge in the market for years. I’d argue instead that multitasking was the killer feature.

In the DOS era, you ran one program at a time. You might have a lot of software you used, but you’d pick one program to use at a time. If you wanted to switch to something else, you’d have to save whatever you were doing, quit, and open a completely different full-screen app. While competing platforms like the Mac4 had offered multitasking via their GUIs for years, Windows, and especially Windows 3, was what brought it to the wider market.

If you’re going to be using more than one program at the same time, having a wider range of software that’s interoperable becomes more important. I’d argue that multitasking is what nudged the market to consolidate onto a smaller number of platforms. Windows, and especially Windows 95, became very hard for other platforms to compete with because its base of software was just so large. It made far more sense for NEC and Fujitsu to bring Windows to their users, even if it meant losing the lock-in that their unique OSs and platform-specific software had given them.

Shifts in the gaming market

In the 16-bit era, the FM Towns and X68000 were doing great in the computer gaming niche. They had powerful 2D gaming hardware and a lot of very sophisticated action games. Their original games and ports of arcade games compared extremely well against what 16-bit consoles could do, giving them a reputation of being the real gamers' platforms. By 1994 though, they had a problem: the 32-bit consoles were out, which could do 2D games just as well as the FM Towns and X68000, and the consoles could also do 3D that blew away anything those computers could handle. Fujitsu and Sharp, meanwhile, just weren’t releasing new hardware that could keep up. The PC gaming niche had already been shrinking and moving towards consoles for a few years, and this killed off a lot of what was left.

I also suspect that Sony’s marketing for the PlayStation changed things significantly. Home computers had older players than the 16-bit consoles did, but Sony was marketing the PS1 towards those same older audiences. It probably made it easy for computer players to look at the new consoles and decide to move on.

What about the 8-bit platforms?

Japan had a variety of 8-bit computer platforms, some of which (like the MSX) were also well-known in western countries. While in Europe the 8-bit micros held on right into the 90s, and many users upgraded straight from 8-bit micros to Windows PCs, in Japan the 8-bit computers had already been supplanted by native 16-bit computing platforms before the Windows era. In some cases, these were 16-bit computers by the same manufacturers - both Sharp and NEC had been major players in the 8-bit computing era too. The MSX, meanwhile, had failed to produce either a 16-bit evolution of the platform or a 16-bit successor and so many of its users had already moved on by the time Windows 95 came out.

So, in conclusion

None of the 16-bit Japanese computer makers actually died off - they just switched to making standard Windows PCs that were interchangeable with anything else out there. Microsoft took over that market just like they did everywhere else in the world, but at least the companies themselves survived better than the Commodores and Ataris of the world.


  1. Some of the 16-bit competitors, like Amiga and Atari ST, had some market penetration in North America, but they were pretty niche compared to Europe.

  2. There were some others too, like Sony NEWS, but they mostly settled into the “professional workstation market”, which was its own weird thing. Just like SGI, Sun and NeXT workstations elsewhere in the world, they had their own reasons for fading away.

  3. A lot of the earlier Japanese Windows games I have list their system requirements in terms of both PC-98 and IBM PC, even though they’re not using anything specific to either platform.

  4. Outside Japan the Amiga and many others also had high-quality multitasking GUIs for years, but I’m focusing specifically on Japan here.

The Working Archivist's Guide to Enthusiast CD-ROM Archiving Tools

I’ve seen a lot of professional archivists who use flux disc image archiving techniques for their collections—a technique in which a specialized floppy controller captures the raw signal coming from the floppy drive so that it can be preserved and decoded in software. I haven’t, however, seen many archivists using enthusiast-developed low-level reading techniques for CD-ROM. I’ve personally been making use of these techniques and I find them very helpful; I know that many other archivists and institutions could make great use of them. However, information about enthusiast-developed tools is usually deeply embedded in those communities and can be hard to find for others. As someone with a foot in both worlds, I want to try to bridge the gap and make this information available a bit more widely. This post will summarize why archivists might be interested in these tools, what they can do, and how to make use of them.

Redump

People who are familiar with emulation may think of Redump as collections of disc images online, but it’s really a metadata database for CD-ROM preservation focused primarily on games. It collects metadata about disc image transfers but also, crucially for us, it sets standards on how disc images should be created in order to ensure accuracy. Those standards are publicly available and are easy enough to follow by anyone—not just people looking to submit to Redump’s database.

Because Redump’s disc imaging standards are of sufficiently high quality, and their software and guides are freely available, I highly recommend them to anyone looking to preserve CD-ROMs.

What does dumping to Redump’s standards do that typical dumping doesn’t?

Although the end product of Redump’s dumping process is a disc image in the common BIN/CUE format, the actual process is different in some key ways.

Typically, when reading a CD-ROM, the data the computer receives has been processed and transformed by the drive’s firmware. Data on a CD-ROM is stored in a scrambled1 (encoded) format, which the drive’s firmware descrambles into the standard format before the computer receives it. The firmware also performs checksum comparison using CD-ROM’s builtin fixity format and automatically corrects any errors it finds. (The next section will describe the format of CD-ROM in more detail.)

By comparison, analogous to how a raw flux read captures a low-level image of a floppy2 and then processes it using software, Redump’s standards make use of raw reading functions that are available on a certain set of CD drives. These raw reading functions completely disable the processing the firmware would normally apply to data tracks: the data is read in its original scrambled form, with error correction disabled, so that data is returned in as close to its original form as possible. The software then performs descrambling and error correction after it’s read. (For those interested in a more detailed technical summary of exactly what’s being done here, the redumper README goes into extensive detail.)
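
To make “scrambled” a little more concrete, here’s a minimal Python sketch of the software-side descrambling step. It generates the scrambling sequence defined in ECMA-130 Annex B (a 15-bit LFSR with polynomial x^15 + x + 1) and XORs it against everything after a sector’s 12-byte sync pattern. This is only an illustration of the idea; redumper’s real pipeline also handles sector alignment, error correction, and plenty of edge cases.

def make_scramble_table(length=2340):
    """Generate the ECMA-130 Annex B scrambling sequence (15-bit LFSR, x^15 + x + 1)."""
    table = bytearray(length)
    lfsr = 0x0001
    for i in range(length):
        byte = 0
        for bit in range(8):
            byte |= (lfsr & 1) << bit
            carry = (lfsr & 1) ^ ((lfsr >> 1) & 1)
            lfsr = (lfsr >> 1) | (carry << 14)
        table[i] = byte
    return bytes(table)

SCRAMBLE_TABLE = make_scramble_table()

def descramble_sector(raw_sector: bytes) -> bytes:
    """Descramble one 2352-byte raw sector: the 12-byte sync pattern is left
    alone, and the remaining 2340 bytes are XORed with the scrambling sequence."""
    assert len(raw_sector) == 2352
    body = bytes(b ^ s for b, s in zip(raw_sector[12:], SCRAMBLE_TABLE))
    return raw_sector[:12] + body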

The primary benefit of performing rips this way is metadata: it’s possible to log better, more legible information about the descrambling and integrity check processes when they’re performed in software like this. The other benefit is that it becomes easier to reason about discs with unusual formats, discs with mastering errors from when they were produced, and discs with complex copy protection schemes. Strangely-mastered or mis-mastered discs are surprisingly common, and this has been helpful for me in the past with a few discs that would otherwise have been difficult to reason about. Here are two recent examples:

  • One disc contains a mastering error which corrupted the fixity data for a single 2048-byte sector. Using a typical read, this would manifest as a read error and it would be difficult to tell from the logs that this was the result of a mastering error and not disc damage. With a raw read, it became easier to separate out the reading process from the decoding process and thus to get a better understanding of what had happened.
  • One disc contains a mastering error which places 75 sectors (150KB) of data at the start of an audio track. This would otherwise have been very easy to miss, and may not have been properly decoded by the drive’s firmware.

But Why? (aka, why is CD-ROM so weird?)

The CD-ROM format is very complex, and not all software or all disc image formats support its full set of features.

  • CD-ROM’s relationship to the audio disc format means discs can have a complex structure.
  • “ISO” files can only represent the most simple kinds of discs.
  • CD has a builtin metadata format which most disc image formats don’t support.
  • The same CD-ROM disc can have different data when viewed on different operating systems. OS-specific imaging tools may discard data for other OSs.

CD-ROM, CD audio, and multi track support

The CD format wasn’t originally designed for data at all—the original CD standard was purely designed around digital audio. The CD-ROM standard was only finalized later, and it acts as an extension to the CD audio format. The interaction between these two formats is the reason behind much of CD-ROM’s complexity.

CD audio isn’t a file-based format, and instead uses a series of unnamed, numbered tracks. CD-ROM extends this by making it possible for a track on a disc to contain data and a filesystem instead of audio. Since CD-ROM extends CD audio, the two formats aren’t mutually exclusive: a CD-ROM disc can still contain multiple tracks, and it can even contain more than one data track or a mixture of data and audio tracks.

The most commonly used disc image file format, the ISO, doesn’t support any of this advanced structure. An ISO represents a data track, not necessarily a full disc. Producing an ISO from a disc containing multiple tracks means that the rest of the disc is ignored, and only a single data track has been backed up.

The other unique feature of the ISO format compared to other disc image formats is that it omits fixity information. CD contains a builtin form of integrity protection, intended to protect against physical damage to a disc; up to a certain level of read error can be recovered using information in the error correction data. Typical data discs have sectors which are 2352 bytes long, of which 2048 bytes are data and the remaining 304 are sync, header, and error correction data3. ISOs use a “cooked” format which strips everything but the data from each sector. This data is less critical for a disc after it’s been transferred to a disc image, but it does mean that an ISO serves as a less accurate representation of the physical structure of the original disc.
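
As an illustration of the difference, here’s a minimal Python sketch that converts a raw Mode 1 data track (as stored in a BIN file) into a cooked ISO by keeping only the user data portion of each sector. It assumes a plain Mode 1 track; Mode 2/XA discs lay their sectors out differently, so don’t treat this as a general-purpose converter.

SECTOR_RAW = 2352   # full raw sector
SECTOR_DATA = 2048  # user data portion of a Mode 1 sector

def bin_to_iso(bin_path, iso_path):
    """Strip each 2352-byte Mode 1 sector down to its 2048 bytes of user data.
    Mode 1 layout: 12 sync + 4 header + 2048 data + 4 EDC + 8 reserved + 276 ECC."""
    with open(bin_path, "rb") as src, open(iso_path, "wb") as dst:
        while True:
            sector = src.read(SECTOR_RAW)
            if not sector:
                break
            dst.write(sector[16:16 + SECTOR_DATA])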

Subcode - CD’s builtin metadata format

CD defines a sidecar metadata format called the “subcode” or “subchannel”. It allows for small amounts of data to be stored alongside the audio or data on a disc. In most cases, it doesn’t contain anything significant and so most CD disc image formats omit it entirely. However, it’s possible for it to contain interesting or unique data that would be lost if it’s not transferred along with a disc. Examples include CD-Text (track names for CD audio discs); CD graphics (usually used for karaoke graphics on otherwise normal audio discs); and copy protection data for commercial software.

Other builtin metadata that’s not typically preserved is contained in the disc’s leadin and leadout segments. The leadin contains the disc’s table of contents; typically, this information is preserved in a processed form via the drive’s firmware, but not in the raw format direct from the disc. Likewise, the leadout contains finalizing metadata that isn’t otherwise preserved when a CD is backed up.

Multiple filesystems in a single track

The CD-ROM format doesn’t dictate which filesystem is used on a disc, and it’s possible for a single track on a disc to contain more than one filesystem. This also means that the same disc can display drastically different content depending on whether it’s inserted into a Windows, Mac or Linux PC. I’ve personally witnessed a hybrid Mac/PC disc which had completely different contents on both systems, without a single shared file between them. This means that simply backing up a disc by copying the files off the disc is unsafe: you may be missing data from one of the other filesystems. This also means that filesystem-specific backup tools can be unsafe.

I’ve seen some archivists use HFS Explorer to back up Mac CDs, for example, but this tool backs up individual filesystems from a disc—using it for a disc like this one would mean that the Windows contents would be completely lost. Even when a disc is Mac-only, HFS Explorer doesn’t necessarily preserve structural filesystem content in the same format as it was stored on disc.
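
If you want a quick way to check whether an image you already have might be hiding a second filesystem, the well-known signatures are easy to look for. The sketch below (mine, not part of any of the tools discussed here) checks a cooked 2048-byte-per-sector image for ISO 9660 and HFS/HFS+ markers; it’s only a heuristic, since hybrid discs can also be laid out with an Apple partition map.

def detect_filesystems(image_path):
    """Rough heuristic: look for ISO 9660 and HFS/HFS+ signatures in a cooked image."""
    found = []
    with open(image_path, "rb") as f:
        # ISO 9660 volume descriptors start at sector 16 and carry the "CD001" identifier
        f.seek(16 * 2048)
        if f.read(6)[1:6] == b"CD001":
            found.append("ISO 9660")
        # HFS and HFS+ volume headers live 1024 bytes into the volume
        f.seek(1024)
        signature = f.read(2)
        if signature == b"BD":
            found.append("HFS")
        elif signature == b"H+":
            found.append("HFS+")
    return found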

CD disc image formats

There are a wide variety of disc image formats, many of which are specific to the vendor of a particular disc image reading program, and which can represent differing levels of a CD’s features. A few common examples:

  • ISO, as mentioned above, represents a single data track at the start of a disc, and isn’t able to represent the remainder of a disc. It’s stored in a “cooked” format with error correction data removed, and omits subcode data.
  • BIN/CUE, which can represent a full multi-track disc. Stored in a “raw” format, with error correction data retained. Modern versions of the format can include subcode data and can represent complex disc structures. It uses a human-readable metadata format called the “cue sheet”. The software I’ll be talking about later in this post uses the modern extended versions of BIN/CUE.
  • CloneCD, which was originally created to properly back up discs with complex copy protection schemes. It supports the same complex disc structures as BIN/CUE, and preserves subcode information, but differs in that its metadata format is lower level and not intended to be human-readable.

In summary

CD-ROM is a complex format with a wide range of variations, and many disc image formats support only some of the kinds of discs which exist in the real world. Capturing in a complex format ensures nothing is lost, while still leaving the flexibility to convert into a simpler format in the future.

The Hardware

Unlike floppy disk flux archiving, there’s no special enthusiast equipment needed here. Backing up CDs using these techniques relies on certain models of standard off-the-shelf drives manufactured by Plextor. While these drives are no longer manufactured, they’re readily available secondhand from eBay or computer recycling stores, and can frequently be purchased in good working condition for $40 or less. A full list of compatible drives can be found on the Redump wiki: http://wiki.redump.org/index.php?title=Optical_Disc_Drive_Compatibility

This list contains a mixture of internal drives and USB-based external drives. Internal drives can also be converted into external drives using a cheap USB adapter.

The Software

There are a number of different tools available; this post will focus on the most popular ones and the ones with which I have personal experience. Redump’s wiki provides step-by-step usage guides for all of the tools I recommend.

Media Preservation Frontend (Windows only)

For users who prefer GUI tools to commandline tools, Media Preservation Frontend (MPF) provides a graphical interface to the redumper, DiscImageCreator and Aaru tools. (This blog post won’t be discussing Aaru.) Unfortunately, it’s only available for Windows at this time.

It exposes each underlying tool’s feature set to the fullest extent it can, and captures the appropriate metadata. Because it’s oriented around submissions to the Redump database it also contains some data entry fields specific to Redump, but they’re not mandatory and can be easily ignored.

redumper

redumper is a relatively new commandline disc archiving program which has quickly emerged as the Redump community’s new preferred disc backup tool. For archivists interested in using a commandline tool, redumper is my current recommendation.

Its feature set is relatively restricted compared to DiscImageCreator, but its opinionated defaults ensure it just does the right thing without extra configuration. Its focus on simplicity and reliability also extends to its metadata files: while it provides the same metadata as other options, it produces a smaller number of more organized files which I find easier to reason about. It also provides some additional metadata that I find useful.

DiscImageCreator

DiscImageCreator was formerly the tool Redump recommended, but Redump’s standards no longer recommend it. Compared to redumper, whose focus is reliability and simplicity, DiscImageCreator features a vast suite of options but is comparatively less reliable. Its metadata is also less organized and harder to read.

Its large feature set does mean that there are times when DiscImageCreator can come in handy for something specialized, but at the moment I don’t recommend it as a primary tool.

Converting from more complex formats to simpler ones

After capturing in the formats produced by redumper and DiscImageCreator, it’s possible to convert into simpler formats for access. This provides a useful tradeoff: the more complex formats are kept for longterm preservation, while copies in other formats can be temporarily produced for access and compatibility with software that needs plain ISO images.

On Mac and Linux, bchunk is an open source program which can convert BIN/CUE disc images into plain ISO files. For audio CDs or mixed-mode CDs which contain audio tracks, it can also convert audio tracks to WAV files. On Windows, IsoBuster can similarly convert disc images from one format to another.

Both redumper and DiscImageCreator produce their BIN/CUE images in a split format with one BIN file per track. For those who need a unified image with a single BIN for the same disc, binmerge (cross-platform, written in Python) and chdman (cross-platform, written in C) can perform the conversion.
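
If you just need the single-file form and don’t mind doing it by hand, the core of what those tools do is simple: concatenate the track files in the order the cue sheet lists them. Here’s a minimal Python sketch of that step; note that it doesn’t rewrite the cue sheet’s FILE and INDEX entries to point at the merged file, which is the part dedicated tools like binmerge handle for you.

import re
import shutil

def merge_bins(cue_path, merged_bin_path):
    """Concatenate the per-track .bin files named in a cue sheet, in order.
    The cue sheet itself still needs to be rewritten to reference the merged file."""
    with open(cue_path) as cue, open(merged_bin_path, "wb") as merged:
        for line in cue:
            match = re.match(r'\s*FILE\s+"(.+)"\s+BINARY', line)
            if match:
                with open(match.group(1), "rb") as track:
                    shutil.copyfileobj(track, merged)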

Useful metadata

In addition to backing up discs, both redumper and DiscImageCreator produce some very useful metadata after the read is complete. This information isn’t necessarily unique to this dumping technique—other software could do the same things after dumping a disc—but it’s very useful to have this automatically performed for every disc.

Both redumper and DiscImageCreator produce a machine-readable XML file containing metadata about each track on the disc: its size, and hashes in several formats. DiscImageCreator places it in a file with the .dat extension, while redumper places it in the dat: section of its log file.

<rom name="moonlight (Track 1).bin" size="658917504" crc="ec48aea4" md5="ed350360b8f40c9c5fc4a8ce1bc41c99" sha1="8b0022a6b14842678f0beee961720103d6ca5431" />
<rom name="moonlight (Track 2).bin" size="21226800" crc="06284fb2" md5="e97b60b95764212ba4788911e236c349" sha1="8a112d2f60693f6c767d60514c9a35d3855c55b1" />
<rom name="moonlight (Track 3).bin" size="50189328" crc="2358ba07" md5="191b3f4132b862b8f9239cbe0ad22dd9" sha1="cfbb15b6782a482305a90dea00b1bf4288e617b3" />
<rom name="moonlight (Track 4).bin" size="25371024" crc="31a7d363" md5="1a5a08d9c4c4084e1a390ad5b32454bf" sha1="710ee4cb7a85d627ec9bc9c29deb0620a3d67cba" />

For ISO 9660/PC format discs, both programs also extract mastering date information. This comes from the primary volume descriptor (PVD) information, and contains date information pertaining to the disc’s creation. For example, from the logs for the same disc as the one above:

ISO9660 [moonlight (Track 1).bin]:
  volume identifier: CAFFE
  PVD:
0320 : 20 20 20 20 20 20 20 20  20 20 20 20 20 31 39 39                199
0330 : 36 30 36 30 37 31 34 32  39 31 36 30 30 00 31 39   6060714291600.19
0340 : 39 36 30 36 30 37 31 34  32 39 31 36 30 30 00 30   96060714291600.0
0350 : 30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 00   000000000000000.
0360 : 30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30   0000000000000000
0370 : 00 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................

This shows that the disc has the title CAFFE, and four embedded timestamps representing the disc’s creation:

  • Volume creation date and time - 1996060714291600, aka June 7, 1996, at 14:29:16 (UTC)
  • Volume modification date - identical to the above
  • Volume expiration date - date the disc should be considered obsolete; often left with null values, as it is here
  • Volume effective date - date the disc should be used starting from; also often left null
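
Those timestamps live at fixed offsets within the 2048-byte PVD, so they’re easy to pull out of a cooked image yourself if you’re working with discs your tools didn’t dump. Here’s a minimal Python sketch (mine, not something the tools above ship) that reads the volume identifier and the creation and modification dates from an ISO 9660 image with 2048-byte sectors; a raw BIN would need the sector framing stripped first, as shown earlier.

def read_pvd(iso_path):
    """Read the volume identifier and creation/modification dates from the
    primary volume descriptor of a cooked ISO 9660 image."""
    with open(iso_path, "rb") as f:
        f.seek(16 * 2048)          # the PVD is stored at logical sector 16
        pvd = f.read(2048)
    assert pvd[1:6] == b"CD001", "not an ISO 9660 image"
    volume_id = pvd[40:72].decode("ascii", "replace").strip()
    # Dates are 17-byte fields: "YYYYMMDDHHMMSScc" digits plus a one-byte timezone offset
    creation = pvd[813:829].decode("ascii", "replace")
    modification = pvd[830:846].decode("ascii", "replace")
    return volume_id, creation, modification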

Redumper also produces a full file listing for ISO 9660 discs, along with calculating their hashes. An abbreviated example from the same disc:

*** SKELETON (time check: 3s)

excluded areas hashes (SHA-1):
1a7334e9350d06a69f5dbf1e8ec8ca9c98ad89da SYSTEM_AREA
edcae21603e3564acfea07e81c205031101976ea /SAVER/OPENING.MOV
1d73c3b2f53d251a56b61e0b75c6b5184600c4ae /SAVER/TOKIMEKI.MOV
4f89fe21c61e44e1b9dedc85e09b2c1390055f9b /SAVER/ENDING.MOV
091492f54a3a182921d5255ae3560f26d4dc4d11 /SAVER/CAFFES.MOV
c1589aa3e8f55b86d0be614e835127d254eabb54 /README.TXT

What do all these files mean?

Both redumper and DiscImageCreator produce a large number of files, which can be overwhelming at first; this list provides a little guide as to what those files mean, and which ones are most important to retain for longterm preservation.

redumper

A list of files can also be found on the Redump wiki.

  • All .bin files - The disc’s data and audio tracks, one file per track.
  • discname.log - The full set of logs and metadata from the read process.
  • discname.cue - The disc’s table of contents (list of tracks) in a human-readable cuesheet format.
  • discname.toc and discname.fulltoc - The disc’s table of contents, in its original, low-level binary format.
  • discname.state - The disc’s original fixity information, in a binary format.
  • discname.subcode - The subcode metadata, in its original binary format, as stored on the disc.
  • discname.scram - The scrambled version of the disc, as a single file. While this is generally no longer needed after the reading process is complete and the data has been decoded, it contains the leadin and leadout data that is normally omitted when reading a disc; some people may elect to preserve it for that reason.

DiscImageCreator

  • All .bin files - The disc’s data and audio tracks, one file per track.
  • All .txt files - The full set of logs and metadata from the read process. Unlike redumper, these are stored as a large number of separate files.
  • discname.sub - The subcode metadata, in a processed binary format which reorders the data in order to be easier to read.
  • discname.cue - The disc’s table of contents (list of tracks) in a human-readable cuesheet format.
  • discname.ccd - The disc’s table of contents (list of tracks) in the CloneCD format, which is more complex and not designed to be read by humans.
  • discname.toc - The disc’s table of contents, in its original, low-level binary format.
  • discname.dat - XML-format metadata for each track, containing file sizes and hashes/checksums in several formats. The same data is contained in the .log file from redumper.
  • discname.c2 - The disc’s original fixity information, in a binary format.
  • Filenames containing Track 0 and Track AA - The leadin and leadout sections of the disc.
  • discname.img - A single-file copy of the disc’s data. This duplicates exactly the contents of the .bin files, and can be easily recreated by concatenating them in the future, so it’s not important to keep.
  • discname_img.cue - A copy of the cuesheet adjusted for the above file.

Obtaining the tools

All of these tools are open source and can be downloaded from GitHub.

In addition, for Mac users, I package redumper and DiscImageCreator in Homebrew. While my packages aren’t always 100% up to date, I try to ensure that they work. They can be installed via:

  • redumper: brew install mistydemeo/digipres/redumper
  • DiscImageCreator: brew install mistydemeo/digipres/disc-image-creator

Limitations

Certain especially complex types of copy protection are still not fully supported by these tools, although the situation is improving. While Redumper recently added support for the SafeDisc protection format, for example, there are still discs it’s not able to handle properly; closed-source tools such as CloneCD are still needed to handle these discs.

Redumper has plans to add support for ring-based copy protection such as Ring Protech in the future, but it’s poorly-supported at the moment; again, closed-source tools such as Alcohol 120% are necessary to handle these discs.

Conclusion

I hope this guide has been helpful for those who are interested. If readers have any questions or need any other information, please feel free to reach out to me on Mastodon or Bluesky.


  1. Amazingly, this is actually the technical term - see ECMA-130 Annex B.

  2. It’s not quite analogous: a Redump-style disc rip isn’t operating on as low a level as a raw flux read is, but it’s lower-level than standard disc reading software. While the Domesday86 project exists to perform truly low-level raw laser dumps of laserdisc and LD-ROM discs, there isn’t a mature project to apply the same technique to CD.

  3. There are a few alternate sector formats which divide up the 2352 bytes differently; they devote more space to data and less space to error correction, at the risk of making a disc more susceptible to physical damage.

"GitHub" Is Starting to Feel Like Legacy Software

I’ve used a lot of tools over the years, which means I’ve seen a lot of tools hit a plateau. That’s not always a problem; sometimes something is just “done” and won’t need any changes. Often, though, it’s a sign of what’s coming. Every now and then, something will pull back out of it and start improving again, but it’s often an early sign of long-term decline. I can’t always tell if something’s just coasting along or if it’s actually started to get worse; it’s easy to be the boiling frog. That changes for me when something that really matters to me breaks.

To me, one of GitHub’s killer power user features is its blame view. git blame on the commandline is useful but hard to read; it’s not the interface I reach for every day. GitHub’s web UI is not only convenient, but the ease with which I can click through to older versions of the blame view on a line-by-line basis is uniquely powerful. It’s one of those features that anchors me to a product: I stopped using offline graphical git clients because it was just that much nicer.

The other day though, I tried to use the blame view on a large file and ran into an issue I don’t remember seeing before: I just couldn’t find the line of code I was searching for. I threw various keywords from that line into the browser’s command+F search box, and nothing came up. I was stumped until, a moment later, while idly scrolling the page and running the search again, the browser finally found the line I was looking for. That’s when I realized what must have happened.

I’d heard rumblings that GitHub’s in the middle of shipping a frontend rewrite in React, and I realized this must be it. The problem wasn’t that the line I wanted wasn’t on the page—it’s that the whole document wasn’t being rendered at once, so my browser’s builtin search bar just couldn’t find it. On a hunch, I tried disabling JavaScript entirely in the browser, and suddenly it started working again. GitHub is able to send a fully server-side rendered version of the page, which actually works like it should, but doesn’t do so unless JavaScript is completely unavailable.

I’m hardly anti-JavaScript, and I’m not anti-React either. Any tool’s perfectly fine when used in the right place. The problem: this isn’t the right place, and what is to me personally a key feature suddenly doesn’t work right all the time anymore. This isn’t the only GitHub feature that’s felt subtly worse in the past few years—the once-industry-leading status page no longer reports minor availability issues in an even vaguely timely manner; Actions runs randomly drop network connections to GitHub’s own APIs; hitting the merge button sometimes scrolls the page to the wrong position—but this is the first moment where it really hit me that GitHub’s probably not going to get better again from here.

The corporate branding, the new “AI-powered developer platform” slogan, makes it clear that what I think of as “GitHub”—the traditional website, what are to me the core features—simply isn’t Microsoft’s priority at this point in time. I know many talented people at GitHub who care, but the company’s priorities just don’t seem to value what I value about the service. This isn’t an anti-AI statement so much as a recognition that the tool I still need to use every day is past its prime. Copilot isn’t navigating the website for me, replacing my need for the website as it exists today. I’ve had tools hit this phase of decline and turn it around, but I’m not optimistic. It’s still plenty usable now, and probably will be for some years to come, but I’d rather find out what other options I have now than wait until things get worse.

And in the meantime, well… I still need to use GitHub every day, but maybe it’s time to start exploring new platforms—and find a good local blame tool that works as well as the GitHub web interface used to. (Got a fave? Send it to me at misty@digipres.club / @cdrom.ca. Please!)

Unlocking Puyo Puyo Fever for Mac's English Mode

The short, no-clickbait version: to switch the Mac version of Puyo Puyo Fever to English, edit ~/Library/Preferences/PuyoPuyo Fever/PUYOF.BIN and set the byte at 0x266 to 0x01—or just download this pre-patched save game and place it in that directory.

I’ve been a Mac user since 2005, and one of the very first Mac games I bought was the Mac port of Sega’s Puyo Puyo Fever. I’ve always been a Sega fangirl and I’ve always loved puzzle games (even if I’m not that good at Puyo Puyo), so when they actually released a Puyo Puyo game for Mac I knew I had to get it. This was back in the days when very, very few companies released games for Mac, so there weren’t many options. Even Sega usually ignored Mac users; Puyo Puyo Fever only came out as part of a marketing gimmick that saw Sega release a new port every month for most of a year, leading them to target more niche platforms like Mac, Palm Pilot and Pocket PC.

A few of the console versions came out in English, but the Mac port was exclusive to Japan. I didn’t read any Japanese at the time, so I just muddled my way through the menus while wishing I could play it in English. I’d thought that maybe I could try to transplant English game data from the console versions, but I didn’t own any of them so I just resigned myself to playing the game in Japanese.

Recently, though, I came across some information that made me realize there might be more to it. First, I finally got to try the Japan-exclusive Dreamcast port from 2004… and discovered that it was fully bilingual, with an option in the menu to switch between Japanese or English text and voices. I might have just assumed that Dreamcast players were lucky and I was still out of luck, until I ran into the English Puyo Puyo fan community’s mod to enable English support in the Windows version. Their technique, which was discovered by community members Yoshi and nmn around 2009, involves modifying not the game itself but a flag in the save game—the same flag used by the Dreamcast version, which the game is still programmed to respect despite the menu option having been removed.

I wasn’t able to use the Windows save modding tool produced by Puyo Puyo fan community member NickW for a couple of reasons:

  1. It’s hardcoded to open the save file from the Windows save location, %AppData%\SEGA\PuyoF\PUYOF.BIN, and can’t just be given a save file at some other path, and
  2. The Windows version uses compressed save data, while the Mac version always uses uncompressed saves, and so the editor won’t try to open uncompressed saves.

I could have updated the editor to work around this but, knowing that the save was uncompressed and I only had to change a single byte, it seemed like overkill. One byte is easy enough to edit without a specialized tool, so I just pulled out a hex editor. The Windows save editor is source-available, so I didn’t have to reverse engineer the locations of the key flags in the save file myself. I guessed that the language flag offset wouldn’t be different between the uncompressed Windows saves and the Mac saves, so after reading that it’s stored at byte 0x288, I tried changing it from 0x00 to 0x01 and started up the game.
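
If hex editors aren’t your thing, the same one-byte change is trivial to script. Here’s a minimal Python sketch; the path and offset are parameters rather than hardcoded values, so you can point it at whichever save file and flag offset you’re working with.

from pathlib import Path

def set_save_byte(save_path, offset, value=0x01):
    """Flip a single byte in an uncompressed PUYOF.BIN save file."""
    path = Path(save_path)
    data = bytearray(path.read_bytes())
    data[offset] = value
    path.write_bytes(bytes(data))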

…and it just worked! Without any changes, the entire game swapped over to English—menus, dialogue, and even the title screen logo. After 20 years, suddenly I was playing Puyo Puyo Fever for Mac in English.

According to the Windows save editor, the next byte (0x289) controls the voice language. Neither the Windows nor the Mac versions actually shipped with English voices on the disc, however, so setting this value just ends up silencing the game instead. The fan community prepared an English voice pack taken from the other versions, but I didn’t bother trying it on Mac since proper timing data for the English voices is missing.

At this point I figured I’d discovered everything I was going to find until I noticed something at the start of the save data in the hex editor:

I’d only been paying attention to data later in the file, so I’d overlooked the very beginning until now. But now that I looked at it, it was a very regular pattern. It looks suspiciously like an image; uncompressed bitmaps are usually recognizable to the naked eye in a hex editor, and I wondered if that could be what this was. So I dug out the Dreamcast version again, and lo and behold:

A square pixel art image of a sign with the Japanese hiragana symbol "pu"

It’s the Dreamcast version’s save icon, displayed in the Dreamcast save menu and on the portable VMU save device. The Mac version doesn’t have any reason to need this, and has nowhere to display it, but it’s there anyway. Looking at the start of the header made me realize the default save file name from the Dreamcast port is there too—the very first bytes read 「システムファイル」, or “System File”. Grabbing an original Dreamcast save file, I was able to confirm that the Mac save is completely identical to the Dreamcast version, except for rendering multi-byte fields in big-endian format1. I guess by 2004 there was no reason to spend time rewriting the save data format just to save a few hundred bytes, so all the Dreamcast-specific features come along for the ride on Mac and Windows.

Now, you might ask, why would I spend so much time on a Mac port that doesn’t even run on modern computers? (Though I’d be happy to fix that - Sega, email me!) Part of it is just that I love digging into older games like this to find out what makes them tick; it’s as much a hobby as actually playing them. The other part, of course, is that I’ll actually play it. As you might be able to guess from the PowerPC Mac package manager I maintain, I still keep my old Macs around and every now and then I break out the PowerMac G4 for a few rounds of Puyo Puyo Fever. The next time I do, I’ll be able to play it in English.


  1. The byte order, or endianness, of multi-byte data types is different between different kinds of CPUs. The PowerPC processors used by that era of Macs use the big endian format.

That Time I Accidentally Deleted a Game From MAME

A while back, I had the chance to dump a game for MAME. I told myself that if the chance ever came up again, I’d contribute again. Luckily, it turns out I didn’t have to wait too long—but the story didn’t end like I expected it to.

In-game screenshot of Martial Masters

When I bought my PGM arcade motherboard, the #1 game I wanted to own was a one-on-one fighting game called Martial Masters. It’s a deeply underrated, gorgeous game—and judging from the price it goes for these days, I’m not the only one after it. It took quite a bit of hunting to find a copy within my price range, but my usual PGM game dealer in China finally tracked one down for me a few months ago. I was excited to finally play it on the original hardware, but also to see if I had another chance to contribute a game to MAME.

When it arrived, even before I had the chance to check the version number, I was surprised to see it was a Taiwanese region game. All of IGS’s games have simplified Chinese region variants for sale in China; it’s unusual to see a traditional Chinese version from Taiwan show up over there. It could just be a sign that the game was so popular they brought over extra cartridges from Taiwan when there weren’t enough for local arcades. Once I booted the game and made note of its version numbers, I checked MAME and saw that there was a matching game in its database: martmasttw, or a special Taiwanese version of revision 1.02. That also surprised me—IGS typically didn’t produce entirely separate builds for different regions. Instead, each of their games contains the data for every language and region in its ROMs, and the region code in its copy protection chip determines what region it boots up as.

Screenshot of Martial Masters crashing

The other thing I noticed about MAME’s martmasttw was a comment in the source code noting that it might be a bad dump—that is, an invalid read that produced corrupted data. This isn’t that uncommon when dumping these sorts of games. Whether it’s due to dying chips or hardware issues with the reading process, sometimes a read just goes wrong and it gets missed. Once I booted it up in MAME, I confirmed it looked like a bad dump. It instantly crashes with an illegal instruction error, a clear sign of corrupted program code. Now that I owned the game, I had a chance to dump the correct ROMs and fix MAME’s database.

Photo of a game chip being held

As soon as I opened the cartridge, I noticed something interesting: these weren’t the chips I was expecting. Like with The Gladiator, I only needed to remove and dump two socketed chips, but these were a completely different model. Other PGM games using the same hardware typically use 27C322 (4MB) and 27C160 (2MB) chips, which were common EPROMs in their time period. Here, though, I saw something much more exotic: an OKI 27C3202 soldered into a custom adapter. The game board itself is essentially the same one that’s in The Gladiator, so it was clear that the adapter was presenting them as 4MB 27C322 chips.

I haven’t been able to figure out why it was designed this way. It can’t have been cheap to design and manufacture these custom adapters, and other PGM games that were made both before and after this one all use the more common chips without any adapters. I’ve only seen a single other game built this way. Was there a 27C322 shortage at the time this specific game was being made? Were they experimenting with new designs and ended up abandoning this approach? It’s hard to tell.

Photo of a game chip being dumped in an EPROM reader

I only have an EPROM reader adapter for chips in the 27C322 family, so I hoped it would be able to handle reading them just fine. On my first attempt, it rejected the chip; as far as I can tell, it was trying to perform “smart” verification of the chip, which failed since the chip underneath IGS’s adapter isn’t actually the chip it’s trying to query. I ultimately tricked it by inserting a real 27C322 first and reading that before swapping over to the chip I actually wanted to read. Once the reader’s recognized at least one chip, it seems happy to stick in 27C322 mode persistently.

My first read seemed fine, and the dumped data did have a different hash from what MAME recognized. Success! …or so I thought, until I tried actually booting the game, where it crashed again. I went back to the EPROM reader to make sure the chip was seated correctly before doing a new test read. From the physical design of the adapters, I knew that getting it seated might be a challenge.

The reader uses a ZIF socket which usually makes it easy to insert and remove chips. This time, though, there was an interesting complication. Because of how it’s constructed, the socket has a “lip” at the end past the final set of pins. With a normal 27C322, that’s not a problem; the chip ends right at the final set of pins, so nothing hangs over the end of the chip. This adapter has a very different shape from a real 27C322 chip, however—there’s a dangling “head” that contains the actual chip, as seen in the photo above showing the underside of the adapter. On the real board it hangs harmlessly over the end of the socket, but on a ZIF socket it ends up making contact with the end of the socket and keeps the pins from sitting as deeply as they normally would. I haven’t spoken to the person who originally dumped this revision, but I suspect that this is the issue behind the bad dump.

I ended up holding the adapter with one hand to stabilize it and keep all of the pins as even as I could while I locked the ZIF socket’s lever a second time; this time, it seemed as though I’d gotten it sitting as evenly as possible. I then performed several more reads and, before trying to boot it again, compared them against each other. This time, I saw that these new reads were different from the first attempt—and that they were byte-for-byte identical to each other.
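
Comparing repeated reads doesn’t need any special tooling—any hashing utility will do—but here’s the kind of quick Python script I mean (a sketch, not the exact commands I ran): hash every dump file passed on the command line and check that they all match.

import hashlib
import sys

def sha1(path):
    """Hash one dump file in chunks so large ROMs don't need to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

hashes = {path: sha1(path) for path in sys.argv[1:]}
for path, value in hashes.items():
    print(value, path)
print("all reads identical" if len(set(hashes.values())) == 1 else "reads differ - reseat the chip and dump again")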

Screenshot of Martial Masters's title screen

Once I had what seemed like a good dump of both chips, I booted them up in MAME to see if it would work. Unlike MAME’s existing ROMs, it booted right away without issues and worked perfectly. After I played a few rounds without a single crash or unexpected behaviour, I was satisfied that my new dumps were fine. As I was getting ready to submit a pull request to MAME to update the hashes in its database, however, I happened to grep the source for them and noticed something funny—they were already there. In another version of Martial Masters.

I mentioned earlier that I was surprised that MAME had labelled the Taiwanese 1.02 version of Martial Masters as a separate revision from the Chinese 1.02. Well, as it turns out, once the ROMs are dumped correctly it’s not a separate revision. The ROMs are actually byte-for-byte identical; it’s only the bad dump that had made MAME consider martmasttw a separate revision this whole time.

This is the point where I’d intended to open a pull request to MAME just updating a few hashes for the correct dump, but with everything I’d learned the final pull request deleted martmasttw entirely. I had set out to fix a revision of the game in MAME, and make one more version of it playable. Instead, I’d proven it didn’t exist in the first place. This wasn’t where I expected to end up, but it does teach an important lesson: corrupted data can go unnoticed for years if it’s not double and triple checked.

And, more than that, it’s a reminder that databases are an eternal work in progress. MAME’s list of ROMs is as close as there is to a global catalogue of arcade games and their revisions, but it’s still fallible. Databases grow and, sometimes, they shrink; proving a work doesn’t exist can be just as important as uncovering new works.

Fixing Classical Cats; or, How I Got Tricked by 28-year-old Defensive Programming

Every now and then, when working on ScummVM’s Director engine, I run across a disc that charms me so much I just have to get it working right away. That happened when I ran into Classical Cats, a digital art gallery focused on the work of Japanese artist and classical musician Mitsuhiro Amada. I wrote about the disc’s contents in more detail at my CD-ROM blog, but needless to say I was charmed—I wanted to share this with more people.

Screenshot of a cat playing piano next to a cat playing a violin and a cat playing cello

I first found out about Classical Cats when fellow ScummVM developer einstein95 pointed me at it because its music wasn’t working. Like a lot of early Director discs, Classical Cats mostly just worked on the first try. At this point in ScummVM’s development, I’m often more surprised if a disc made in Director 3 or 4 fails to boot right away. The one thing that didn’t work was the music.

Classical Cats uses CD audio for its music, and I’d already written code to support this for an earlier game, the Mac version of Alice: An Interactive Museum. I’d optimistically hoped that Classical Cats might be as easy, but it turned out to present some extra technical complexity. Regardless, for a disc called “Classical” Cats, I knew that getting music working would be important. I could tell that I wasn’t having the full experience.

While many CD-ROMs streamed their music from files on the disc, some discs used CD audio tracks for music instead. (If you’re already familiar with CD audio and mixed-mode CDs, you can skip to the next paragraph.) CD audio is the same format used in audio CDs; these tracks aren’t files in a directory and don’t have names, but are simply numbered tracks like you’d see in a CD player. Data on a CD is actually contained within a track on the disc, just like audio; data tracks are just skipped over by CD players. A mixed mode CD is one that contains a mixture of one or more data tracks and one or more audio tracks on the same disc. This was often used by games and multimedia discs as a simple and convenient way to store their audio.

Director software is written in its own programming language called Lingo; I’ve written about it a few times before. In addition to writing logic in Lingo, developers are able to write modules called XObjects; these can be implemented in another language like C, but expose an interface to Lingo code. It works very similarly to C extensions in languages like Ruby or Python.

While ScummVM is able to run Lingo code directly, it doesn’t emulate the original XObjects. Instead, it contains new clean-room reimplementations embedded into ScummVM that expose the same interfaces as the originals. If a disc tries to call an unimplemented XObject, ScummVM just logs a warning and is able to continue. I’d already implemented one of Director’s builtin audio CD XObjects earlier, which was how I fixed Alice’s music earlier.

ScummVM has builtin support for playing emulated audio CDs by replacing the audio tracks with MP3 or FLAC files. For Alice, I wrote an implementation of Director’s builtin Apple Audio CD XObject. That version was straightforward and easy to implement; it has a minimal API that allows an app to request playback of a CD via track number, which maps perfectly onto ScummVM’s virtual CD backend.

I already knew Classical Cats uses a different XObject, so I’d have to write a new implementation for it; it turns out the API was very different from Alice’s. Alice, along with many other Director games I’ve looked at, uses a fairly high-level, track-oriented API that was simple to implement. ScummVM’s builtin CD audio infrastructure is great at handling requests like “play track 5”, or “play the first 30 seconds of track 7”. What it’s not at all prepared for is requests like “play from position 12:00:42 on the disc”.

You can probably guess what Classical Cats does! Instead of working with tracks, it starts and stops playback based on absolute positions on the disc. This may sound strange, but it’s how the disc itself is set up. On a real CD, the track list is just an index of where each track starts and stops on the disc, and a regular CD player looks up those indices to decide where to seek to when you ask it to play a particular track. In theory, it’s pretty similar to dropping a record player needle on a specific spot on the record.

This might not sound too complex to manage, but there’s actually something that makes it a lot harder: translating a request for an absolute position on the disc into the right audio file and offset. ScummVM isn’t (usually) playing games from a real CD, but emulating a drive using the game data and FLAC or MP3 files replacing the CD audio tracks. ScummVM generally plays games using the data extracted from the CD into a folder on the hard drive, which causes a problem: the data track on a mixed mode CD is usually the first track, which means that the timing of every other track on the disc is offset by the length of the data track. If we’ve extracted the data from the CD, we no longer know how big that track is, and we can’t guess at the layout of the rest of the disc.

“Knowing the disc layout” is a common problem with CD ripping and authoring, and a number of standards exist already. Single-disc data CDs can easily be represented as an ISO file, but anything more complex requires an actual table of contents. When thinking about how to solve this problem for ScummVM, I immediately thought of cuesheets—one of the most popular table of contents formats for CD ripping, and one that’s probably familiar to gamers who have used BIN/CUE rips of 32-bit era video games. Among all the formats available for documenting a disc’s table of contents, cuesheets were attractive for a few reasons: I’ve worked with it before, so I’m already familiar with it; it’s human-readable, so it’s easy to validate that it’s being used properly; and it provides a simple, high-level interface that abstracts away irrelevant details that I wouldn’t need to implement this feature. A sample cuesheet for a mixed mode CD looks something like this:

FILE "CLSSCATS.BIN" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    PREGAP 00:02:00
    INDEX 01 17:41:36
  TRACK 03 AUDIO
    INDEX 01 19:20:46
  TRACK 04 AUDIO
    INDEX 01 22:09:17

Once you understand the format, it’s straightforward to read and makes it clear exactly where every track is located on the disc.
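
To show how a cue sheet bridges the gap, here’s a minimal Python sketch (an illustration, not ScummVM’s actual code) that converts the MM:SS:FF timestamps from the cue sheet above into frame counts and figures out which track contains a given position. It glosses over real-world details like pregaps and the two-second offset between disc time and file time, which a real implementation has to account for.

FRAMES_PER_SECOND = 75  # CD positions are measured in 1/75th-second frames

def msf_to_frames(msf):
    """Convert a cue sheet MM:SS:FF timestamp into a frame count."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames

# INDEX 01 positions from the cue sheet above
track_starts = {
    1: msf_to_frames("00:00:00"),
    2: msf_to_frames("17:41:36"),
    3: msf_to_frames("19:20:46"),
    4: msf_to_frames("22:09:17"),
}

def locate(position_frames):
    """Find which track contains a position, and the offset within that track."""
    track = max(t for t, start in track_starts.items() if start <= position_frames)
    return track, position_frames - track_starts[track]

print(locate(msf_to_frames("20:00:00")))  # somewhere inside track 3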

The main blocker here was simply that ScummVM didn’t have a cuesheet parser yet, and I wasn’t eager to write one myself. Just when I was on the verge of switching to another solution, however, ScummVM project lead Eugene Sandulenko offered to write a new one integrated into ScummVM itself. As soon as that was ready, I was able to get to work.

The XObject Classical Cats uses has a fairly complicated interface that’s meant to support not just CDs, but also media like video cassettes. To keep things simple, I decided to limit myself to implementing just the API that this disc uses and ignore methods it never calls. It’s hard to make sure my implementation’s compatible if I don’t actually see parts of it in use, after all. By watching to see which method stubs are called, I could see that I mainly had to deal with a limited set of methods. Aside from being able to see which methods are called and the arguments passed to them, I was able to consult the official documentation in the Director 4.0 manual.1

Two of the most fundamental methods I began with were mSetInPoint and mSetOutPoint, whose names were pretty self-explanatory. Rather than have a single method to begin playback with start/stop positions, this library uses a cue system. Callers first call mSetInPoint to define the start playback position and mSetOutPoint to set a stop position. These positions are tracked in frames, a unit representing 1/75th of a second.

On a real drive, they can then call mPlayCue to seek to the start position so that the drive is ready to play. Given the slow seek times of early CD-ROM drives, this separation forced developers to account for the fact that the device might not be able to start playback the moment they request it, and to design their app’s interactive features around that. After starting the seek operation, the developer was meant to call mService repeatedly to retrieve a status code indicating whether the drive was still seeking, had finished seeking, or had encountered an error. Since ScummVM is usually acting on an emulated drive with no real seek times, I simplified this: mSetInPoint and mSetOutPoint simply assign instance variables with the appropriate values, and mService always immediately returns the “drive ready” code.
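
Putting those pieces together, a Director movie might drive the cue system along these lines. This is only a hedged sketch built from the calls described above: the handler name and the numeric status code are placeholders of mine, not values from the AppleCD SC documentation, and myXObj stands for an already-created XObject instance.

-- Sketch: cue up a segment of CD audio and wait until the drive is ready.
-- Positions are in frames (1/75th of a second).
on cueAndWait myXObj, startFrame, stopFrame
  myXObj(mSetInPoint, startFrame)   -- where playback should begin
  myXObj(mSetOutPoint, stopFrame)   -- where playback should stop
  myXObj(mPlayCue)                  -- ask the drive to start seeking to the in point
  set stillSeeking = 1              -- placeholder status code, not the real value
  repeat while myXObj(mService) = stillSeeking
    nothing                         -- poll until the drive reports ready (or an error)
  end repeat
end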

At this point, I did what I should have done in the first place and checked the source code. As I mentioned in a previous post, early Director software includes the source code as a part of the binary, and luckily that’s true for Classical Cats. As I checked its CD-ROM helper library, I stumbled on the method that made me realize exactly where I’d gone wrong:

on mGetFirstFrame me, aTrack
  put the pXObj of me into myXObj
  if myXObj(mRespondsTo, "mGetFirstFrame") = 0 then
    return 0
  else
    return  myXObj(mGetFirstFrame, aTrack)
  end if
end

This code might be familiar to Rubyists, since Ruby has a very similar construct in respond_to?. This class wraps the AppleCD SC XObject, which it keeps in its pXObj property (copied into the local variable myXObj above), and calls methods on it. But it’s written defensively: before calling a number of methods, it calls mRespondsTo first to check whether myXObj actually has the requested method. If it doesn’t, it returns a stub value instead of erroring. Since ScummVM implements mRespondsTo correctly, this code was doing exactly what its original authors intended: seeing that my implementation of AppleCD SC didn’t have an mGetFirstFrame method, and quietly returning a stub value. Unfortunately for me, I had lazily chosen which methods to implement based on seeing the disc try to use them—so I let myself be tricked into thinking those methods were never used.

As it turns out, they were actually key to getting the right timing data. Classical Cats asks the CD drive for each track’s timing information and stores it to use later when actually playing the songs. With those methods missing, it had no way of knowing where the songs were or how to play them.

And here I realized the great irony of what I was doing. Internally, Classical Cats thinks about its audio in terms of tracks, and asks the XObject for absolute timing data for each track. It then passes that data back into the XObject to play the songs, where ScummVM intercepts it and translates it back into track-oriented timing so its CD drive emulation knows how to play them. It’s a lot of engineering work just to take it all full circle.

At the end of the day, though, what’s important is that it works. Before this work, it was difficult to play Classical Cats on any modern computer; now, anyone with version 2.8.0 or later of ScummVM can give it a try. Now that it’s more accessible, I hope other people are able to discover it too.

Note: when this was originally posted, CD audio support for this disc was only available in nightly builds of ScummVM; it is included in stable releases starting with 2.8.0.


  1. Schmitz, J., & Essex, J. (1994). Basic device control. In Using Lingo: Director Version 4 (pp. 300–307). Macromedia, Inc.

Cargo-dist: System Dependencies Are Hard (So We Made Them Easier)

My latest blog post is over at my employer’s blog and talks about the work I’ve done to get system dependency management integrated into cargo-dist, an open source release management tool for Rust. The new release lets users specify non-Rust dependencies in Cargo.toml using a Cargo-like syntax, and also provides a detailed report on the resulting binary’s dynamic linkage. Here’s a sample of the dependency syntax:

[workspace.metadata.dist.dependencies.homebrew]
cmake = { targets = ["x86_64-apple-darwin"] }
libcue = { version = "2.2.1", targets = ["x86_64-apple-darwin"] }

[workspace.metadata.dist.dependencies.apt]
cmake = '*'
libcue-dev = { version = "2.2.1-2" }

[workspace.metadata.dist.dependencies.chocolatey]
lftp = '*'
cmake = '3.27.6'

Go read the blog post to find out more!

Untangling Another Lingo Parser Edge Case

I was testing out a new Macromedia Director CD in ScummVM, and I noticed a non-fatal error at startup:

WARNING: ######################  LINGO: syntax error, unexpected tSTRING: expected ')' at line 2 col 70 in ScoreScript id: 2!
WARNING: #   2: set DiskChk = FileIO(mnew,"read"¬"The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")!
WARNING: #                                                                            ^ about here!

It may have been non-fatal, but seeing an error like that makes me uneasy anyway—I’m never sure when it’ll turn out to have ramifications down the line. This comes from the parser for Director’s custom programming language, Lingo, so I opened up the code in question1 to take a look. The whole script turned out to be only three straightforward lines. The part ScummVM complained about came right at the start of the file, and at first glance it looked pretty innocuous.

set DiskChk = FileIO(mnew,"read"¬
"The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")
IF DiskChk = -35 THEN GO TO "No CD"

The symbol at the end of that first line is a continuation marker, which you might remember from a previous blog post where I debugged a different issue with them. The continuation marker is a special kind of escape character with one specific purpose: it escapes newlines to allow statements to extend across more than one line of code, and nothing else.

At first I thought maybe the issue was with the continuation marker itself being misparsed, like in the error I documented in that older blog post; maybe it was failing to be recognized and wasn’t being replaced with whitespace? To figure that out, I started digging around in ScummVM’s Lingo preprocessor. Spoiler: it turned out not to be an issue with the continuation marker, but it pointed me in the right direction anyway.

ScummVM handles the continuation marker in two phases. In a preprocessor phase, it removes the newline after the marker in order to simplify parsing later. Afterwards, in the lexer, it replaces the marker with a space to produce a single normal line of code. The error message above contains a version of the line between those two steps: the preprocessor has combined the two lines of code into one, but the continuation marker hasn’t been replaced with a space yet.

If we do the work of the preprocessor/lexer ourselves, we get this copy of the line:

set DiskChk = FileIO(mnew,"read" "The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")

In this form, the problem is a lot easier to spot than when it was spread across multiple lines: the first two arguments to FileIO are separated by a comma, but the second and third aren’t. The newline between the second and third arguments makes it easy to miss, but once everything is on one line it jumps out.

In the last case I looked at, described in the previous blog post, this was an ambiguous parse: the same line of code was valid whether or not you added the missing comma, but it was interpreted in two totally different ways. This time is different. If you add the missing comma, this is a normal, valid line of code; if you don’t, it’s invalid syntax, and you get the error we’re seeing at the top.

As far as I can tell, the original Director runtime actually accepts this without throwing an error, even though it isn’t documented as correct syntax. The official Director programming manual tells the user to separate arguments with commas, but the runtime is tolerant enough to cope when one is forgotten, as it is here2. ScummVM doesn’t get that same luxury. As I mentioned in the previous blog post, later Director versions tightened up these ambiguous parse cases, and supporting this Director 3 quirk would significantly complicate the parser. Since this is only the second instance of the issue I’ve run into, it doesn’t seem worth supporting in the parser either. ScummVM has built-in support for patching a specific disc’s Lingo source code, so I was able to simply fix this by patching the code to the properly formatted version.
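
For reference, the patched statement is just the original with the missing comma added (the patch ScummVM actually ships may differ in details):

set DiskChk = FileIO(mnew,"read",¬
"The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")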

The disc in question still doesn’t fully work, but I’m putting some time into it. I’m planning on writing a followup on the other fixes necessary to get it running as expected. And for today’s lesson? Old software is weird. Just like new software.


  1. Before version 4, Director software was interpreted from source code at runtime—so, conveniently, that means that you can peek at the source code to any early Director software.

  2. MacroMind Director Version 3.0: Interactivity Manual. (1991). MacroMind, Inc. Page 64.

How I Dumped an Arcade Game for MAME

I recently had the chance to do something that I’ve wanted to do for years: dump an arcade game to add it to MAME.

Screenshot of The Gladiator's title screen

MAME is a massive emulator that covers thousands of arcade games across the history of gaming. It’s one of those projects I’ve used and loved for decades. I’ve always wanted to give back, but it never felt like I had something to contribute—until recently.

You might remember from one of my other posts that I recently got into collecting arcade games. This year I’ve been collecting games for a Taiwanese system called the PolyGame Master (PGM), a cartridge-based system with interchangeable games sold by International Games System (IGS) between 1997 and 2005. It has a lot of great games that aren’t very well-known, in particular some incredibly well-designed beat-em-ups.

A couple months ago, I bought a copy of The Gladiator, a wuxia-themed beat-em-up set in the Song dynasty. My specific copy turned out to be an otherwise-undumped game revision. Many arcade games were released in a number of different versions, including regional variations and bugfix updates, and it can take collectors and MAME developers years to track them all down. In the case of The Gladiator, MAME has the final release, 1.07, and the first few revisions, but it’s missing most of the versions in between. When my copy arrived, I booted it up and found out it was one of those missing versions: 1.04.

Luckily, I already had the hardware on hand to dump it. I own an EPROM burner that I’d bought to write chips so that I could mod games, but EPROM burners can read chips as well, and I have an adapter that, luckily, supports exactly the chips I needed for this game.

Photo of an EPROM burner with a 27C160 chip in it

It’s easy to think of game cartridges as just being a single thing, but arcade game boards typically have a large number of chips. Why’s that? It’s partly technical; specific chips can be connected directly to particular regions of the system’s hardware, like graphics or sound, which means that even though it’s less flexible than an all-in-one ROM, it has some performance advantages too. The two chips I dumped here are program code for two different CPUs: one for the 68000 CPU in the system itself, and one for the ARM7 CPU in the game cartridge.

The other advantage is that using a large number of chips can make it easier to update a game down the line. Making an overseas release? It’s much cheaper to update just a couple of chips than to produce new revisions of everything on the board. Releasing a bugfix update? It’s much quicker and less painful to update existing boards if all of the program code lives on its own chip.

From looking at MAME, I could tell that every other revision of The Gladiator used a single set of chips for almost everything. Only the two program ROM chips are different between versions, which made my life a lot easier. I was also lucky that these chips were easy to get to. Depending on the kind of game, chips might be attached straight to the board, or they might be in sockets where they can be easily removed and reattached. The Gladiator has two game boards, one of which uses two socketed chips. And, thankfully, those were the only two chips I had to remove and dump.

To remove the chips, I used an EPROM removal tool—basically just a little set of pliers on a spring, with a pair of needle-nose tips that get in under the edges of the chip so you can pull it out of its socket. The two chips were both common types that my EPROM burner supports, so once I got them out they weren’t difficult to read. The most important chip, which holds the game’s program code, is an EPROM known as the 27C160—a 2MB chip in a specific form factor. I already own a reader that supports it and the 4MB version of the same chip, which you can see in the photo above. The second chip is a 27C4096DC, which has a much smaller 512KB capacity.
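
(If you’re wondering where those capacities come from: EPROM part numbers roughly encode their size in bits, so the 27C160 is a 16-megabit part, or 2MB, and the 27C4096 is a 4-megabit part, or 512KB.)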

Photo of an open game cartridge showing the boards and ROM chips

Why are there two program ROMs? Many games for the PGM use a fascinating and pretty intense form of copy protection. As I mentioned earlier, the PGM motherboard has a 20MHz 68000 processor, a 16-bit CPU that was very widely used in the 90s. The game cartridge, meanwhile, has a 20MHz ARM7 coprocessor. For early games, that ARM7 was there just for copy protection. Game cartridges would feature an unencrypted 68000 ROM and an encrypted ARM7 ROM; the ARM7 must successfully decrypt and execute code from the encrypted ROM for the main program code to be allowed to boot and run. By the end of the PGM’s life, they’d clearly realized it was silly to be devoting the ARM7 just to copy protection when it was faster than the CPU on the motherboard, so they put it to use for actual game logic. On games like The Gladiator, the unencrypted 68000 ROM and the main CPU are only doing very basic bootstrapping work and then hand off the rest of the work to the ARM7, which runs the game using code on the encrypted ARM7 chip.

I spent a while fumbling around trying to get the dumped ARM7 ROM to work, but it turned out that was because I had read it as the wrong kind of chip. Oops. My reader has a switch that toggles between the 2MB and 4MB versions of the chip… and I had it set to 4MB, even though the chip helpfully says right on the package that it’s a 2MB part. So, after half an hour of fumbling, I realized what I’d done and went back to redump it—and that version worked on the first try. Phew.

Screenshot of The Gladiator's boot screen with the program ROM versions

Once I had the dumps, I was able to figure out that one of the two program ROMs is identical to one that’s already in MAME; only the ARM ROM is unique to this version. That meant adding it to MAME was very easy: I could mostly copy and paste the existing code defining the game cartridge, changing just one line for the new ROM and a few lines of metadata, and I was good to go. I submitted a pull request and, after some discussion, it was merged. For something I’ve wanted to contribute to for years, it felt good and was, honestly, pretty painless. And now, as of MAME 0.249, The Gladiator 1.04 can finally be emulated!

Do You Speak the Lingo?

I’ve been spending some time lately contributing to ScummVM, an open-source reimplementation of many different game engines that makes it possible to play those games on countless modern platforms. They’ve recently added support for Macromedia Director, an engine used by a ton of 90s computer games and multimedia software that I’m really passionate about, so I wanted to get involved and help out.

One of the first games I tried out is Difficult Book Game (Muzukashii Hon wo Yomu to Nemukunaru, or Reading a Difficult Book Makes You Sleepy), a small puzzle game for the Mac by a one-person indie studio called Itachoco Systems that specialized in strange, interesting art games. Players take on the role of a woman named Miss Linli who, after falling asleep reading a complicated book, finds herself in a strange lucid dream where gnomes are crawling all over her table. Players can entice them to climb on her or scoop them up with her hands. If two gnomes walk into each other, they turn into a strange seed that, in turn, grows into other strange things if it comes into contact with another gnome. Players guide her using what feels like an early ancestor to QWOP, with separate keys controlling the joints on each of Linli’s arms. It’s bizarre, difficult to control, and compelling.

A lot of early Director games play fine in ScummVM without any special work, so I was hoping that would be true here too. Unfortunately, it didn’t turn out to be quite that simple. I ended up taking a dive into ScummVM’s implementation of Director to fix it.

Director uses its own programming language, Lingo, which is inspired by languages like Smalltalk and HyperCard. HyperCard was Apple’s hypermedia development environment, released for the Mac in 1987 and known for a simple, English-like programming language that was friendly to non-programmers. Smalltalk, meanwhile, is a programming language developed in the 70s and 80s, known for its simple syntax and powerful object-oriented features, which were still very new at the time; it has also influenced modern languages such as Python and Ruby. Lingo combines a HyperCard-style, English-like way of programming with Smalltalk-style object-oriented features.
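
To give a feel for that English-like style, here are a few illustrative lines of Lingo (my own examples, not code from this game):

put "Hello" into field "Greeting"     -- HyperCard-style: put a value into a named text field
set the visible of sprite 5 to TRUE   -- set a property using an English-like phrase
go to frame "menu"                    -- jump playback to a labeled frame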

Early versions of Director are unusual for having the engine interpret the game logic straight from source code1—which means if you’ve got any copy of the game, you’ve got the source code too. It’s great for debugging and learning how it works, but there’s a downside too. If you’re writing a new interpreter, like ScummVM, it means you have to deal with writing a parser for arbitrary source code. As it turns out, every issue I’d have to deal with to get this game working involved the parser.

I’ll get into the details later, but first some background. To give a simplified view, ScummVM processes Lingo source in a few steps. First, it translates the text from its source encoding to Unicode; since Lingo dates to before Unicode was widely used, each script is stored in a language-specific encoding and needs to be translated in order for modern Unicode-native software to interpret it correctly. Next, there’s a preprocessing stage in which a few transformations are made in order to make the later stages simpler. The output of this stage is still text which carries the same technical meaning, it’s just text that’s easier for the next stages to process. This is followed by the two stages of the actual parser itself: the lexer, in which source code text is converted into a series of tokens, and the parser, which has a definition of the grammar for the language and interprets the tokens from the previous stage in the context of that grammar.

This all sounds complicated, but my changes ended up being pretty small. They did, however, end up getting spread across several of these layers.

1. The fun never ends!

The very first thing I got after launching the game was this parse error:

WARNING: ######################  LINGO: syntax error, unexpected tMETHOD: expected end of file at line 83 col 6 in MovieScript id: 0!

Taking a look at the code in question, there’s nothing that really looks too out of the ordinary:

factory lady
method mNew
    instance rspri,rx,ry,rhenka,rkihoncala,rflag,rhoko,rkasoku
end method
method mInit1 spri
# etc

This is the start of the definition of the game’s first factory. Lingo supports object-oriented features, something that was still pretty new when it was introduced, and allows for user-defined classes called “factories”2. Following the factory lady definition are a number of methods, defined in a block-like format: a method NAME line, an indented set of one or more lines making up the method body, and an end method line.

That last line, it turns out, was the problem. To my surprise, those end method lines are completely optional, even though they’re the documented syntax in the official Director manual. Not only can any text appear there in place of method, it turns out you don’t need an end statement at all. Since ScummVM didn’t recognize it, it seems most games must simply have skipped it.
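
In other words, as far as the interpreter is concerned, the same factory could just as well have been written without it, which is presumably what most titles did:

factory lady
method mNew
    instance rspri,rx,ry,rhenka,rkihoncala,rflag,rhoko,rkasoku
method mInit1 spri
# etc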

Luckily, this was a very easy fix: I added a single line to ScummVM’s Bison-based parser and it was able to handle end statements without breaking support for methods defined without them. I hoped that was all it was going to take for Difficult Book Game to run, but I wasn’t quite so lucky.

2. Language-dependent syntax

Unlike most modern languages, Lingo doesn’t have a general-purpose escape character like \ that can be used to extend a line of code across multiple lines. Instead, it uses a special character called the “continuation marker”, ¬3, which serves that purpose and is used for nothing else in the language4. (Hope you like holding down keys to type special characters!) Here’s an example of how that looks with a couple lines of code from a real application:

global theObjects,retdata1,retdata2,ladytime,selif,daiido,Xzahyo,Yzahyo,StageNum, ¬
           daihoko

Since Lingo was originally written for the Mac, whose default MacRoman character set includes a number of “special” characters and accents outside the normal ASCII range, its designers could get away with a character that other programming languages might not consider safe. But there’s a problem there, and not just that it was annoying to type: what happens if you’re programming in a language that doesn’t use MacRoman? This was before Unicode, so each language used its own encoding, and there’s no guarantee that a given encoding has ¬ in its character set at all.

Which takes me back to Difficult Book Game. I tried running it again after the fix above, only to run into a new parse error. After checking the lines of code it was talking about, I ran into something that looks almost like the code above… almost.

global theObjects,retdata1,retdata2,ladytime,selif,daiido,Xzahyo,Yzahyo,StageNum, ツ
           daihoko

Spot the difference? In the place where the continuation marker should be, there’s something else: ツ, the halfwidth katakana character “tsu”. As it turns out, that’s not random. In MacRoman, ¬ occupies character position 0xC2, and ツ sits at that same position in MacJapanese. That, it seems, is the answer to how the continuation marker is handled across languages: it isn’t really ¬, it’s whatever character happens to live at 0xC2 in a given text encoding.

Complicating things a bit, ScummVM lexes Lingo after translating the code from its source encoding to UTF-8. If it lexed the raw bytes, this would be simple: whatever character sits at 0xC2 is the continuation marker, regardless of what that character “means”. Handling it after it’s been turned into Unicode is a lot harder. Since ScummVM already has a Lingo preprocessor, though, it could get fixed up there: just look for instances of ツ followed by a newline, and treat that as though it’s a “real” continuation marker5. A little crude, but it works, and suddenly ScummVM could parse Difficult Book Game’s code6. Or, almost…

3. What’s in a space?

Now that I could finally get in-game, I could start messing around with the controls and see how it ran. Characters were moving, controls were responding—it was looking good! At least until I pressed a certain key…

Her arms detached—that doesn’t look comfortable. In the console, ScummVM flagged an error that looked relevant:

Incorrect number of arguments for handler mLHizikaraHand (1, expected 3 to 3). Adding extra 2 voids!

The name sounded related, since “hiji” means elbow. I figured it was probably the handler called when rotating her arm around her elbow, which is exactly what broke visually. I took a look at where mLHizikaraHand and the similar handlers were being called, and noticed something weird. In some places, the call looks like this:

locaobject(mLHizikaraHand,(rhenka + 1),dotti)

And in other places, it looked slightly different:

locaobject(mLHizikaraHand (rhenka + 1),dotti)

Can you find the difference? It’s the character immediately after the handler name: instead of a comma, it’s followed by a space. Now that I looked at it, the ScummVM error actually sounded right. It does look like it’s calling mLHizikaraHand with a single argument (rhenka + 1). After talking it over with ScummVM dev djsrv, it sounds like this is just a Lingo parsing oddity. Lingo was designed to be a user-friendly language, and there are plenty of cases where its permissive parser accepts things that most languages would reject. This seems to be one of them.

Unfortunately, this parse case also seems to differ between Lingo versions, and changing how ScummVM interprets it might have knock-on effects on parsing content created for later Director releases. Time to get hacky instead. The good news is that ScummVM has a mechanism for exactly this: it bundles patches for various games, making it possible to fix up weird and ambiguous syntax that its parser can’t handle yet. I added patches to change the ambiguous cases to the syntax used elsewhere, and suddenly Miss Linli’s posture is looking a lot healthier.

This whole thing ended up being much more of a journey than I expected. So much for having it just run! In the end, though, I learned quite a bit—and I was able to get a cool game to run on modern OSs. I’m continuing to work on ScummVM’s Director support and should have more to write about later.

Thanks to ScummVM developers djsrv and sev for their help working on this.


  1. Later versions switched to using a bytecode format, similar to Java or C#. This makes processing a lot easier, since bytecode produced by Director’s own compiler is far more standardized than human-written source code.

  2. Despite the name, it isn’t really implementing the factory pattern.

  3. The mathematical negation operator.

  4. It’s a bit of a weird choice, but Lingo didn’t do it first. It showed up first in Apple’s HyperCard and AppleScript languages.

  5. Tempting as it is to refactor the lexer, I had other things to do, and I really wasn’t familiar enough with its innards to take that on.

  6. As it turns out, this wasn’t the only game with the same issue. Fixing this also fixed several other Japanese games, including The Seven Colors: Legend of Psy・S City and Eriko Tamura’s Oz.