The Future Is Now

What You Might Miss When Backing Up CDs

I’ve written a bit recently about CD-ROM preservation and some of the more niche, easily missed parts of the format. I’ve covered the formats themselves, but I felt it might help to provide some concrete examples of the kind of data that can slip through and never get backed up.

As I mentioned in a previous post, many CD disc image formats don’t include the disc’s subcode data[1]. Most discs don’t use it for any non-structural data, and in the cases where it’s used for copy protection it’s immediately obvious that it’s needed since the backed-up software won’t work. There are cases that are subtler, however, and where actually significant data in the subcode can be missed.

CD+G is an extension to the Compact Disc format that allows displaying simple graphics alongside the audio content of a CD. It comes well before CD-ROM, so it’s designed for CD players that are hooked up to a TV rather than for computers. CD+G stores its graphics in the disc’s subcode data, which means that only backups that include that data actually capture the full content of the disc. Back up a CD+G disc in a format that doesn’t include subcode data, like BIN/CUE, and it just turns into a normal audio CD. These graphics can be used for anything; the first CD+G release, Firesign Theatre’s 1985 comedy album (shown above), features illustrations to accompany the audio. CD+G was never widely used, but it did develop a significant niche in karaoke discs as a way to display lyrics on-screen.
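To make “stored in the subcode” a bit more concrete: each sector on a CD carries 96 bytes of subcode alongside its 2,352 bytes of main data, divided into eight channels named P through W. P and Q hold structural information such as timing and track position, while CD+G uses the remaining six, R through W. The sketch below assumes the raw interleaved layout, in which every subcode byte contributes one bit to each channel (P in the top bit, W in the bottom). The function name and the sample data are made up for illustration, and a given dumping tool may hand over subcode in a different arrangement.

```python
CHANNELS = "PQRSTUVW"


def split_subcode(raw: bytes) -> dict[str, bytes]:
    """Split one sector's 96 interleaved subcode bytes into eight 12-byte channels."""
    if len(raw) != 96:
        raise ValueError("expected 96 bytes of subcode for one sector")
    out = {}
    for index, name in enumerate(CHANNELS):
        shift = 7 - index  # P lives in bit 7, W in bit 0
        channel = bytearray(12)
        for i, byte in enumerate(raw):
            bit = (byte >> shift) & 1
            # Pack the 96 bits of this channel MSB-first into 12 bytes.
            channel[i // 8] |= bit << (7 - i % 8)
        out[name] = bytes(channel)
    return out


# Made-up sector for illustration: some R-W bits are set, P and Q are empty.
sample = bytes([0x09] * 96)
channels = split_subcode(sample)
rw_data_present = any(any(channels[name]) for name in "RSTUVW")
```

A format that keeps only the 2,352 bytes of main data per sector has nowhere to put any of these channels, which is why the graphics silently disappear.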

I want to talk a little more about how easy it can be to miss that a disc has significant CD+G data, so let’s take a look at a few practical examples. A simple example is the Firesign Theatre album mentioned above. The packaging, as seen on Discogs, doesn’t mention the CD+G content at all, aside from a brief reference in the album credits. Most owners of this disc would have no idea the CD+G content existed, and would never have owned a player capable of displaying it. It’s very likely that most people backing up their disc wouldn’t even know they had skipped some of its content.

That’s a little too simple, though. A little too neat and tidy. Let’s take a look at something more fun.

In the 16-bit era, the first CD-based game consoles all had support for playing music CDs as a bonus feature. Many of these consoles also supported CD+G, and for many families these would have been their only CD+G player. The Victor Wondermega, a high-end all-in-one Sega Mega Drive/Mega CD console released in Japan, leaned into CD+G’s popularity as a karaoke format by making karaoke one of its major features, including two microphone ports built right into the console. The system was bundled with a pack-in CD called Wondermega Collection that showed off the full range of its features: it includes several minigames that can be played in Mega CD mode, and two karaoke audio tracks that can be played if the player boots into the system’s CD player instead of the game.

[Images: two screenshots of the same track from Wondermega Collection at the same playback position, as indicated by the CD player UI. One is missing the CD+G imagery; the other shows it.]

Screenshots of two disc images of Wondermega Collection running in the same CD player. The screenshot on the left is played without the subcode information, so it's recognized as audio-only. The screenshot on the right is played with the subcode information, so the CD+G content is correctly identified and rendered during playback.

Those karaoke tracks are coded using CD+G[2], which means that they’re only properly backed up if the disc is ripped in a format that supports subcode data. And, because of the complexity of the disc, there are several reasons it’s easy not to notice that this data is missing:

  • Since the disc contains both Mega CD and audio CD content, the audio CD portion could easily be overlooked when testing the backup, and it’s not obvious that the audio CD tracks carry unique content beyond the audio itself.
  • Not all Mega CD emulators support subcode data, so it may not be clear how to even test that the disc is complete or incomplete.
  • The Redump standard doesn’t include subcode data in the set of data it validates[3], so those backing up their discs to match Redump’s database may discard the subcode data without realizing that it’s significant.

So what’s the lesson here? Well, first of all, it’s simply that it’s difficult to fully audit all of the content on a disc to confirm that a backup is fully functional. The more kinds of distinct content there are on a disc, as in our Wondermega Collection example, the harder that audit becomes. (This is similar to the example of Mac/Windows hybrid discs I gave in my previous post, where by only testing a backup on one operating system an archivist might miss that they had discarded data for the other.) The second lesson is that it’s not always obvious what content even exists on a disc, and it’s easy to throw something away simply by not knowing it existed in the first place.

My personal recommendation, for those creating raw disc backups of physical CDs, is simply to always store the subcode data. At only about 4% of the size of the disc’s primary data, it adds very little extra storage burden in exchange for being sure that nothing is being lost. For the truly storage-starved, it’s worth at least doing a full audit to make sure that no CD+G, CD-TEXT or similar data is present before discarding subcode data.
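As a rough sketch of both points: the 4% figure comes from the 96 bytes of subcode that accompany each 2,352-byte raw sector, and a first-pass audit can be as simple as checking whether channels R through W are ever non-zero. The code below assumes a standalone subcode dump with 96 bytes per sector in a deinterleaved layout (12 bytes each for P, Q, then R through W), which is the arrangement commonly associated with CloneCD-style .sub files. Verify this against your own dumping tool before relying on it. The function name is hypothetical.

```python
SECTOR_BYTES = 2352   # raw main-channel data per sector
SUBCODE_BYTES = 96    # subcode per sector
CHANNEL_BYTES = 12    # bytes per channel once deinterleaved

# Storage cost of keeping subcode, relative to the main data (roughly 0.04).
overhead = SUBCODE_BYTES / SECTOR_BYTES


def sectors_with_rw_data(subcode: bytes) -> list[int]:
    """Return the indexes of sectors whose R-W channels hold any non-zero byte.

    Assumes a deinterleaved layout: P, then Q, then R..W, 12 bytes each.
    """
    if len(subcode) % SUBCODE_BYTES:
        raise ValueError("length is not a multiple of 96 bytes")
    hits = []
    for start in range(0, len(subcode), SUBCODE_BYTES):
        # Skip the P and Q channels; only R-W can carry CD+G-style data.
        rw = subcode[start + 2 * CHANNEL_BYTES : start + SUBCODE_BYTES]
        if any(rw):
            hits.append(start // SUBCODE_BYTES)
    return hits
```

Passing in the contents of a subcode file read in binary mode gives a list of sectors worth a closer look. An empty list is not proof that nothing is there: read errors can corrupt subcode, and CD-TEXT lives in the disc’s lead-in, which many dumps don’t capture at all.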


  1. Also known as subchannel data.

  2. Which, yes, means they do work on any CD player that supports CD+G, including regular karaoke CD players.

  3. This isn’t out of ignorance—there are technical limitations that make it difficult to validate the fixity of subcode data. Redump’s database only includes data that can be reliably reproduced; omitting subcode data doesn’t mean that it’s not significant or that it shouldn’t be backed up along with the rest of the disc’s content, just that it can’t be validated in the same way that the disc’s main contents can be.