<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[The Future Is Now]]></title>
  <link href="http://mistys-internet.website/blog/atom.xml" rel="self"/>
  <link href="http://mistys-internet.website/blog/"/>
  <updated>2026-03-27T17:53:36-07:00</updated>
  <id>http://mistys-internet.website/blog/</id>
  <author>
    <name><![CDATA[Misty De Meo]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Forget Spreadsheets, I Wrote My Own Visual Game Script Editor]]></title>
    <link href="http://mistys-internet.website/blog/blog/2026/03/27/forget-spreadsheets/"/>
    <updated>2026-03-27T17:52:46-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2026/03/27/forget-spreadsheets</id>
    <content type="html"><![CDATA[<p>People who have been following me on social media may know that, for the past few years, I&rsquo;ve been working on a fan translation project for a Sega Saturn RPG called <a href="https://en.wikipedia.org/wiki/Lunar:_Sanposuru_Gakuen"><em>Magic School Lunar</em></a>. With technical and initial playtesting work wrapping up, that project is finally entering the late phases and I&rsquo;m now reaching a point where I&rsquo;m going to be doing a lot of rewriting of the English text. The traditional approach, in both professional localizations and fan translations, involves working in a spreadsheet with access to the original language and translation side by side. (In fan translations, it&rsquo;s not even rare to work by editing raw text files.) It&rsquo;s not the most convenient or <em>fun</em> to be writing entirely within a spreadsheet though, especially when there are technical constraints to be thinking about. This game has a strict character limit per line and a maximum of three lines per text box, and I found myself wishing for a nice way to visualize how text will look in-game so I can make sure what I&rsquo;m writing actually fits and looks good. A spreadsheet just wasn&rsquo;t going to cut it. But then, of course, I realized&hellip; I <em>am</em> a programmer. Why don&rsquo;t I just write my own? So I did.</p>

<p><img src="http://mistys-internet.website/blog/images/msl/editor.png" alt="Screenshot of the script editor, showing a screenshot at the top and a spreadsheet-like view on bottom. The screenshot at the top features the text that's being typed into the actively-selected text box." /></p>

<p>My work is based around CSV files that contain the script and metadata for every line of text in the game, and between the script itself and metadata such as what character is speaking, I had in principle enough data to mock up what any given line should look like in-game. My basic goal was to reproduce a spreadsheet-like workflow but with added screenshot visualization that updates in realtime as I work. Now that I&rsquo;ve had the chance to try it out, I can say it definitely works just as well as I&rsquo;d hoped. It&rsquo;s much easier to catch mistakes should I find myself overflowing lines, and it&rsquo;s very useful for making aesthetic decisions about text formatting too.</p>
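<p>To give a rough idea of the kind of check this makes possible, here is a minimal sketch that is not taken from the editor itself. The column names (<code>speaker</code>, <code>english</code>), the 24-character limit and the function names are all assumptions made up for illustration; only the three-line maximum comes from the game as described above.</p>

```python
import csv
import io

# Hypothetical limit: the game has a strict per-line character limit, but the
# real number is not stated here. The three-line maximum is from the post.
MAX_CHARS_PER_LINE = 24
MAX_LINES_PER_BOX = 3


def check_text_box(text, max_chars=MAX_CHARS_PER_LINE, max_lines=MAX_LINES_PER_BOX):
    """Return a list of human-readable problems for one text box."""
    problems = []
    lines = text.split("\n")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines, limit is {max_lines}")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            problems.append(f"line {number} has {len(line)} characters, limit is {max_chars}")
    return problems


def check_script(csv_file):
    """Map each data record number (starting at 1) to its problems."""
    report = {}
    for record_number, row in enumerate(csv.DictReader(csv_file), start=1):
        problems = check_text_box(row["english"])  # "english" is an assumed column name
        if problems:
            report[record_number] = problems
    return report


# Made-up sample data standing in for a real script file.
sample = io.StringIO(
    'speaker,english\n'
    'Student,"Good morning!\nReady for class?"\n'
    'Teacher,"This line is much too long to fit in the box."\n'
)
print(check_script(sample))
```

<p>A check like this only reports that a line overflows; the screenshot preview is what shows how the text will actually look.</p>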

<p>I decided to write a GUI app rather than a web app, for no particular reason other than because I happen to like desktop apps, and ended up using <a href="https://www.tcl-lang.org">Tk</a> as the windowing toolkit so it would be easy to run on other platforms if someone needs something other than macOS. Although I briefly considered using a compiled language<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>, I ended up deciding on Python. This was partly because Python happens to have very good bindings for Tk, and partly because I&rsquo;ve passed my CSVs through Python-based tooling already so I knew that the CSVs I&rsquo;d output would result in minimal formatting/git repo churn. I&rsquo;ve looked at Tk apps a few times but I&rsquo;ve never written one of my own, so I was pleasantly surprised how easy it was to work with. I spent a few hours on this project over the course of a few days, and quite honestly most of that time was just spent picking up the basics of Tk.</p>

<p>I didn&rsquo;t start out with the GUI though. This was all about the screenshot preview, so I figured I&rsquo;d handle that first. I&rsquo;ve done a little work with Python&rsquo;s <a href="https://pillow.readthedocs.io/en/stable/">Pillow library</a> before, which felt like it&rsquo;d be the right tool for the job, and thankfully it was quite easy to work with. All I needed was basic image compositing, and Pillow handles that very well. There were only really three elements I needed to simulate a proper screenshot:</p>

<ul>
<li>A scene from the game with no text or character portrait</li>
<li>The game&rsquo;s font and positioning data for where text goes</li>
<li>The portraits used when named characters speak and their positioning data</li>
</ul>


<p>I had the raw images for the font and portraits already from the translation process, and the positioning data wasn&rsquo;t too hard to find either. <em>Magic School Lunar</em> uses a fixed-width font<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>, so there&rsquo;s no fancy inter-character spacing to reverse-engineer; a given index on a line always lands in the same place on screen. I simply grabbed real screenshots of the game and worked out exactly where the letters belonged, then I rendered new text on top of the existing text to make sure everything lined up. It was easy to handle the portraits as well, which always go in the same place<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>, so it didn&rsquo;t take me long to add that either. After I was confident I had the positioning completely right, I ran off a build of the game with all the text blanked out so I could grab myself an in-game screenshot of an empty text box with no character speaking.</p>
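<p>Because the font is fixed-width, placing a character comes down to simple arithmetic on its line and column. The sketch below shows that calculation only; the origin, cell width and line height are hypothetical numbers, not the game's real measurements, and <code>glyph_positions</code> is a name invented for this example.</p>

```python
# Hypothetical measurements; real values would come from lining text up
# against actual screenshots of the game.
TEXT_ORIGIN = (64, 168)   # top-left pixel of the first character cell
CELL_WIDTH = 8            # every glyph has the same width in a fixed-width font
LINE_HEIGHT = 16


def glyph_positions(text, origin=TEXT_ORIGIN, cell_width=CELL_WIDTH, line_height=LINE_HEIGHT):
    """Yield (character, x, y) for every visible character in a text box."""
    left, top = origin
    for row, line in enumerate(text.split("\n")):
        for column, character in enumerate(line):
            if character == " ":
                continue  # spaces advance the column but draw nothing
            yield character, left + column * cell_width, top + row * line_height


for character, x, y in glyph_positions("Hi\nA b"):
    # With Pillow, this is roughly where a call such as
    # screenshot.paste(glyph_image, (x, y)) would go.
    print(character, x, y)
```

<p>The portrait can be handled the same way, with a single fixed position instead of a per-character one.</p>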

<p><img src="http://mistys-internet.website/blog/images/msl/basicui.png" alt="Screenshot of the UI with a single text input box and a screenshot rendered using the text that's been typed there." /></p>

<p>Once I had the actual screenshot generation taken care of, I got to work building out the real UI. I&rsquo;d been prototyping with a simple MVP that just rendered the screenshot from a single text input box and was missing the actual script editor, and I was slightly dreading the work of laying out the real thing. I shouldn&rsquo;t have worried though, since it turned out to be quite easy. I used Tk&rsquo;s <a href="https://tkdocs.com/tutorial/grid.html">grid manager</a><sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> to handle the layout. It divides the screen into rows and columns, with each element laid out on a specific column and row. It&rsquo;s easy to think about, but it also just happens to be a perfect match for laying out a bunch of UI elements in a table&mdash;exactly my use case, in other words.</p>
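<p>As an illustration of why the grid manager suits a table, here is a small sketch. It is not the editor's real layout code: <code>table_cells</code> and <code>build_table</code> are hypothetical helpers, and the choice of <code>ttk.Entry</code> for every cell is an assumption.</p>

```python
def table_cells(rows):
    """Yield (row, column, text) for every cell in a list of rows."""
    for row_index, row in enumerate(rows):
        for column_index, text in enumerate(row):
            yield row_index, column_index, text


def build_table(parent, rows):
    """Create one Entry widget per cell and place it with the grid manager."""
    from tkinter import ttk  # imported here so table_cells works without a display

    widgets = []
    for row_index, column_index, text in table_cells(rows):
        entry = ttk.Entry(parent)
        entry.insert(0, text)
        # grid() puts the widget in the given row and column of its parent.
        entry.grid(row=row_index, column=column_index, sticky="ew")
        widgets.append(entry)
    return widgets
```

<p>Calling <code>build_table</code> with a <code>tkinter.Tk()</code> root and a list of rows, then starting the root's <code>mainloop()</code>, should show a small editable table. The row and column numbers map directly onto the script's rows and fields, which is what makes the grid manager a natural fit.</p>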

<p>I was also pleasantly surprised just how easy it was to integrate the generated screenshot into the UI. Tk itself assumes images you&rsquo;re displaying are coming from some kind of bitmap file, but Tk+Pillow seems to be a popular combination because Pillow has built-in Tk integration and can wrap a Pillow-generated image in a Tk-compatible image class that allows you to put a generated image anywhere that Tk accepts an image.</p>

<p>My only big UI issue came when I started loading larger script files. My original plan had been to populate the full UI with the entirety of a script file and implement a scrolling viewport, much like in a traditional spreadsheet app. Unfortunately, it turns out Tk has big issues rendering huge numbers of elements within a frame and it would spin its wheels for an extended period loading any input file with more than a few hundred rows. If this were an app I was making for other people, I might have chosen to try to optimize this by having it dynamically load and unload elements as you scroll, but I didn&rsquo;t really need something that fancy for myself; I chose to instead just paginate the data with a limited number of rows onscreen at once along with back/forward buttons to switch pages. It works well enough for my own needs.</p>
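<p>The pagination itself is just slicing. A minimal sketch, assuming the script rows are already loaded into a list; the function names and the page size are made up for illustration and are not the editor's own:</p>

```python
import math


def page_count(total_rows, page_size):
    """Number of pages needed; an empty script still gets one (empty) page."""
    return max(1, math.ceil(total_rows / page_size))


def page_slice(rows, page, page_size):
    """Return the rows shown on a zero-based page, clamping out-of-range pages."""
    last_page = page_count(len(rows), page_size) - 1
    page = min(max(page, 0), last_page)
    start = page * page_size
    return rows[start:start + page_size]


rows = [f"line {n}" for n in range(1, 8)]   # stand-in for script rows
print(page_count(len(rows), 3))
print(page_slice(rows, 0, 3))               # first page
print(page_slice(rows, 2, 3))               # last page holds the leftover row
```

<p>The back/forward buttons then only need to change the page number and rebuild the visible widgets, so Tk never holds more than one page of rows at a time.</p>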

<p>All in all, I was very happy how fast this all came together. I&rsquo;ve already been using this to do little changes, and I know it&rsquo;s going to come in handy as I get deeper into script editing. If you want to check the source code for yourself, I&rsquo;ve put it <a href="https://codeberg.org/mistydemeo/msl-script-editor">up on Codeberg</a>.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>I actually considered Swift at one point, but I really didn&rsquo;t want to limit this to only working on Mac.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>Fun fact: this game uses the exact same font as the two Sega CD <em>Lunar</em> games in Japanese, so the English version is reusing the font from the English versions of those games for that extra nod to series history.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>The real game sometimes puts these on the right instead of the left, but I didn&rsquo;t bother simulating this since it won&rsquo;t affect how I&rsquo;ll be writing.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
<li id="fn:4">
<p>Tk-knowers will probably recognize that despite the grid manager being around since 1996, it&rsquo;s still the &ldquo;new&rdquo; system to a lot of people and most code examples online are still using the older, fiddlier <code>pack</code> system. I decided to deal with the annoyance of having fewer examples to follow in exchange for getting to use something easier.<a href="#fnref:4" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[It's Not Called Watchy; or, Solving a Minor Video Game Mystery]]></title>
    <link href="http://mistys-internet.website/blog/blog/2025/12/22/watchy/"/>
    <updated>2025-12-22T20:31:43-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2025/12/22/watchy</id>
    <content type="html"><![CDATA[<p>Flexing the brevity muscle today with a short post to set the record straight on a weird video game history quirk I spotted online.</p>

<p>For reasons that aren&rsquo;t important<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>, I was looking up an extremely obscure 90s PC game called <em>Team 47 GoMan</em>. Absolutely no one has heard of this game, and it doesn&rsquo;t seem to be fondly (or at all) remembered, which explains why there wasn&rsquo;t all that much about it online. (&ldquo;From the creators of <a href="https://obscuritory.com/fighting/creep-clash/">Creep Clash</a>&rdquo; didn&rsquo;t sell too many copies.) But I <em>did</em> see something strange. A lot of the search results online had a different title than the one I&rsquo;d seen. Several of the top results called it <em>Watchy</em>&mdash;which I thought was <em>weird</em>, since it didn&rsquo;t seem to have anything to do with the game at all. Even weirder, though, several of these sites had screenshots and videos of the game, but <em>none</em> of them used that &ldquo;Watchy&rdquo; title. So where did it come from?</p>

<p>The Mobygames page gave me a bit of a hint. It used &ldquo;Watchy&rdquo; as the primary title, while specifying that &ldquo;Team 47 GoMan&rdquo; was the title in other territories. Vaguely plausible, but at the same time not terribly satisfying. Is this US-developed game really so obscure in the US that the only screenshots and footage are of the alleged European/Japanese release?</p>



<p><img src="http://mistys-internet.website/blog/images/watchy/61Z109fKk8L._AC_-2.jpg" style="width: 50%; float: left; padding-right: 0.5em;" alt="Watchy: A Team 47 GoMan Adventure"></p>

<p>I did some searching under the &ldquo;Watchy&rdquo; name to see if I could find any copies for sale, and did run into two copies. One is from a Canadian budget publisher, and one <a href="https://aukro.cz/watchy-a-team-47-goman-mini-adventure-cd-pc-6988977528?tab=course-description">seems to be from Europe</a>. Both of them have the Watchy title and what looks like similar characters, but what really caught my eye is the subtitle. The Canadian one calls itself &ldquo;A Team 47 GoMan Adventure&rdquo; - which feels like a weird title if this is meant to <em>be</em> &ldquo;Team 47 GoMan&rdquo;, isn&rsquo;t it? The other box is even more telling though. It reads &ldquo;A Team 47 GoMan Mini Adventure!&rdquo;, which makes it read a lot less like an alternate title and more like a spinoff.</p>



<p><img src="http://mistys-internet.website/blog/images/watchy/YB13.jpg" style="width: 50%; float: right; padding-bottom: 0.5em" alt="Watchy: A Team 47 GoMan Mini Adventure!"></p>

<p>At this point I did what I should have done in the first place and checked the publisher&rsquo;s archived webpage. Right there, on the <a href="https://web.archive.org/web/19981206050102/http://www.47-tek.com/3pack.htm">homepage from 1998</a>, I had the evidence I needed&mdash;a page for their &ldquo;GoMan&trade; 3 Pack&rdquo;, which included <em>Team 47 GoMan</em> itself along with a screensaver and, of course, <a href="https://web.archive.org/web/19980129025158/http://www.47-tek.com/watchy.htm"><em>Watchy</em></a>. This, it turns out, is a spinoff, just as the box subtitles suggested. It&rsquo;s described as a 2D arcade action game, and judging from <a href="https://www.youtube.com/watch?v=1450HriICCM&amp;t=870s">footage</a> that was linked to me when I was <a href="https://bsky.app/profile/maxmantic.bsky.social/post/3m6s3llkvjl2f">talking about it on Bluesky</a>, it&rsquo;s a much simpler game than the actual <em>GoMan</em>.</p>

<p>But <em>why</em> was part of the internet convinced these were the same game? I think this may have accidentally come from Mobygames or another database like it, since many of the pages with the wrong title have text that seems to be copied and pasted from one place. The <a href="https://www.mobygames.com/game/42081/team-47-goman/">Mobygames page was created in 2009</a>, and luckily the Wayback Machine <a href="https://web.archive.org/web/20100503033618/http://www.mobygames.com/game/windows/team-47-goman">has copies going back to 2009-2010</a> that show it using the <em>Team 47 GoMan</em> title. <a href="https://web.archive.org/web/20130430070454/http://www.mobygames.com/game/windows/watchy">By 2013</a> the title had become <em>Watchy</em>, so someone clearly changed the main title somewhere in between.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> Why? I suspect it&rsquo;s that Canadian box I showed earlier, which <a href="https://www.amazon.com/Watchy-Team-47-GOMAN-Adventure-Canada/dp/B002OCD0V8">comes from the only Amazon.com product listing with &ldquo;Team 47 GoMan&rdquo; in it</a>. It sure feels like someone was looking the game up, found the box on Amazon with a different title, and assumed that was the primary game title in the US and so it should be the main game title on Mobygames as well<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>.</p>

<p>Even though this is <a href="https://www.mobygames.com/game/42081/team-47-goman/">correct on Mobygames now</a>, I expect the long tail of this particular mixup to last a while. The other online references to &ldquo;Watchy&rdquo; all seem to be copied from Mobygames, which is a common pattern online; a lot of game history is more or less rumour, copied from site to site without actually going back to check with sources. But these pages also don&rsquo;t tend to update very often once they&rsquo;ve been created, which means that even though Mobygames has been corrected, the rest of the web will be slow to follow.</p>

<p>Which is why I&rsquo;m writing this. If someone online finds one of these abandonware sites and wonders why the page is calling the game they just downloaded &ldquo;Watchy&rdquo;, at least they might find this post in a search engine telling them that it&rsquo;s wrong&mdash;and why.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>But that may eventually bear fruit on <a href="https://cdrom.ca">CD-ROM Journal</a>.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>The observant will notice that the <em>description</em> still reads &ldquo;Team 47 GoMan&rdquo;, which should have been a sign that I should have questioned the primary game title sooner.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>I&rsquo;m not really a fan of Mobygames&rsquo;s policy that the US title for a game is always the default title no matter where it&rsquo;s from or when it came out, but that&rsquo;s another story&hellip;<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Dist 0.28.1 Is Out]]></title>
    <link href="http://mistys-internet.website/blog/blog/2025/07/20/dist-0-dot-28-dot-1-is-out/"/>
    <updated>2025-07-20T14:43:27-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2025/07/20/dist-0-dot-28-dot-1-is-out</id>
    <content type="html"><![CDATA[<p>I&rsquo;m happy to announce that I&rsquo;ve been granted permission to resume development on <a href="https://axodotdev.github.io/cargo-dist/">dist/cargo-dist</a>, the easy-to-use binary packaging and distribution tool. As someone who uses dist a lot in my own projects, I&rsquo;ve been hoping to be able to keep the project going and I hope this will be useful for other people as well.</p>

<p>I&rsquo;ve just released <a href="https://github.com/axodotdev/cargo-dist/releases/tag/v0.28.1">version 0.28.1</a>, a bugfix release based on the last stable version. This fixes the critical issues around GitHub Actions runners and also includes a number of other important bugfixes. Please give it a try and let me know how it goes!</p>

<p>I&rsquo;m planning an 0.29.0 in the near future containing all of the improvements that originated in Astral&rsquo;s fork, and long term I&rsquo;m planning to keep it going as long as there are people interested in using it. I&rsquo;d like this to move forward in a community-supported model with members of the community contributing new features they&rsquo;re interested in.</p>

<p>Got thoughts? Let me know <a href="https://github.com/axodotdev/cargo-dist/issues">on the issue tracker</a> or <a href="https://discord.gg/MnyjrpTceV">on the official discord</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Database Row That Did and Didn't Exist]]></title>
    <link href="http://mistys-internet.website/blog/blog/2025/05/13/the-database-row-that-did-and-didnt-exist/"/>
    <updated>2025-05-13T15:16:48-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2025/05/13/the-database-row-that-did-and-didnt-exist</id>
    <content type="html"><![CDATA[<p>Every now and then, I run into a bug so mystifying on its face I know it&rsquo;s going to be a journey just from the error message. This is one of those stories.</p>

<p>I work at a Python shop right now, and we were upgrading an app to Django 5. There are a few breaking changes, but nothing too unexpected or out of the ordinary. Everything was working great in our development environment, but once we deployed to our QA environment we noticed something <em>very</em> strange&mdash;an <code>IntegrityError</code> exception telling us that an object couldn&rsquo;t be saved because the primary key already existed in the database.</p>

<p>My immediate reaction was that, well, <em>that</em> was unusual. I didn&rsquo;t think we were ever manually assigning an <code>id</code> and we were always relying on the database&rsquo;s autoincrement. I double checked the code in the traceback, and indeed, all I saw was something harmless. An anonymized version looks something like this:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>object = models.Model.objects.filter(property1=value).get()
</span><span class='line'>object.property2 = method_result()
</span><span class='line'>object.save()</span></code></pre></td></tr></table></div></figure>


<p>Not only were we not assigning it a clashing <code>id</code>, we weren&rsquo;t even creating a new row from scratch. This was an existing database record we&rsquo;d just queried for, then updated a single property on and then saved. We shouldn&rsquo;t have been trying to insert a new row into the database at all&mdash;Django should have known to <code>UPDATE</code> the existing row it had itself fetched in the first place. I racked my brain trying to figure out what was going on, and started trying everything I could think of to narrow it down:</p>

<ul>
<li>Did it always happen? (Yes.)</li>
<li>Did it happen with every record, not just that one? (Yes&mdash;or so we thought.)</li>
<li>Did it happen if I simplified the code to the most basic possible version, to isolate out anything else that could be messing with it? (Yes; just fetching the record and immediately saving it without changes triggered the bug.)</li>
<li>Did it happen if I forced Django to perform an <code>UPDATE</code> instead of an <code>INSERT</code>? (Yes, but with new symptoms; calling <code>object.save(force_update=True)</code> failed with a message about how it couldn&rsquo;t do an <code>UPDATE</code> because the row <em>didn&rsquo;t</em> exist.)</li>
<li>Did it happen from a <code>django-admin</code> shell on the QA server? (Yes.)</li>
<li>Could I manually query for and update the row from a <code>psql</code> console? (Yes, so Postgres itself was fine.)</li>
</ul>


<p>By this point several of us were poking at the problem from different angles trying to figure out exactly what was going on. We had a little breakthrough when a coworker idly tried fetching not one of our recent records but the record with <code>id</code> 1&mdash;and Django was perfectly happy to save it back. It was only <em>some</em> of our records that seemed to be cursed, and again only in Django&mdash;Postgres itself would happily write to them.</p>

<p>Each of the three of us then made a discovery, building on the experiments we&rsquo;d done so far, and together those discoveries cracked the problem. First, one person took a look at the queryset that Django used to determine if the primary key existed in the database and discovered that it was completely empty. This determines whether Django thinks the record is newly created or not, and so&mdash;even though the record was fetched from the database&mdash;it determined the record was new and should be inserted into the database with an <code>INSERT</code> instead of using <code>UPDATE</code> on an existing row.</p>

<p>Well, that was the proximate cause, but that didn&rsquo;t yet explain <em>why</em> that was happening. The next clue came in the discovery that the Django model definitions and the schema of the actual database had drifted for historical reasons. Specifically, several tables&mdash;including this one&mdash;were specified as using an <code>int</code> for their primary key in Django, but used <code>bigint</code> in the real database. This was true for both our QA server and production, so it wasn&rsquo;t just a QA-specific drift. We also, at this point, noticed that the records we were having trouble with <em>all had <code>id</code>s outside the <code>int</code> range</em>. Until this point, the only small primary key we&rsquo;d tried was <code>id=1</code>, but we tried a selection of other small <code>id</code>s and found that Django would read and write those just fine. It was only once we entered <code>bigint</code>&rsquo;s range that the Django problem reared its head.</p>

<p>Which, finally, led me to look back at something I&rsquo;d previously noticed but not paid much attention to in the Django 5.0 release notes. In the <a href="https://docs.djangoproject.com/en/5.2/releases/5.0/#miscellaneous">Miscellaneous section</a>, it mentions that</p>

<blockquote><p>Filtering querysets against overflowing integer values now always returns an empty queryset. As a consequence, you may need to use ExpressionWrapper() to explicitly wrap arithmetic against integer fields in such cases.</p></blockquote>

<p>This took on new meaning with the knowledge we now had about the mismatch of primary key types. We <em>were</em> in fact filtering querysets against overflowing integer values, or rather <em>Django itself</em> was because Django had been given a wrong understanding of what the integer type was. In prior versions, Django had let the &ldquo;impossible&rdquo; comparison carry out and it worked by chance; in Django 5, it enforced internal correctness by rejecting a comparison the underlying database would be able to carry out. I certainly can&rsquo;t call it the wrong decision from a correctness standpoint, but it was a hell of an issue to debug!</p>
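<p>The whole chain of events can be modelled in a few lines. To be clear, this is a toy simulation of the behaviour described above and not Django's actual code: <code>ToyTable</code>, <code>filter_pk</code> and the <code>strict</code> flag are all invented for illustration, with <code>strict=True</code> standing in for Django 5 and <code>strict=False</code> for earlier versions.</p>

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1   # range of a 32-bit signed integer


class IntegrityError(Exception):
    """Stand-in for the database's duplicate-key error."""


class ToyTable:
    """Stand-in for a table whose real primary key column is a bigint."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> value

    def filter_pk(self, pk, strict):
        # strict=True imitates the Django 5 behaviour described above: a value
        # outside the range the model declares matches nothing at all.
        if strict and pk not in range(INT_MIN, INT_MAX + 1):
            return []
        return [pk] if pk in self.rows else []

    def save(self, pk, value, strict):
        if self.filter_pk(pk, strict):
            self.rows[pk] = value          # UPDATE path
            return "UPDATE"
        if pk in self.rows:                # the database itself still has the row
            raise IntegrityError(f"duplicate key {pk}")
        self.rows[pk] = value              # INSERT path
        return "INSERT"


table = ToyTable({1: "old", 3_000_000_000: "old"})
print(table.save(1, "new", strict=True))
print(table.save(3_000_000_000, "new", strict=False))
try:
    table.save(3_000_000_000, "newer", strict=True)
except IntegrityError as error:
    print("IntegrityError:", error)
```

<p>Small <code>id</code>s update in both modes, the large <code>id</code> updates only in the lenient mode, and in strict mode the same row falls through to an insert that the database rejects. That matches the symptoms we saw: the row both did and didn't exist, depending on who was asking.</p>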
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[What You Might Miss When Backing Up CDs]]></title>
    <link href="http://mistys-internet.website/blog/blog/2025/01/23/what-you-might-miss-when-backing-up-cds/"/>
    <updated>2025-01-23T16:39:24-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2025/01/23/what-you-might-miss-when-backing-up-cds</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve written a bit recently about CD-ROM preservation and some of the more niche, easily-missed parts of the format. I&rsquo;ve covered the formats themselves, but I felt it might help to provide some concrete examples of the kind of data that can easily be missed and that might not get backed up.</p>

<p>As I mentioned in a <a href="http://mistys-internet.website/blog/blog/2024/09/13/the-working-archivists-guide-to-enthusiast-cd-rom-archiving-tools/">previous post</a>, many CD disc image formats don&rsquo;t include the disc&rsquo;s subcode data<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. Most discs don&rsquo;t use it for any non-structural data, and in the cases where it&rsquo;s used for copy protection it&rsquo;s immediately obvious that it&rsquo;s needed since the backed up software won&rsquo;t work. There are cases that are subtler, however, and where actually significant data in the subcode can be missed.</p>

<center><iframe width="560" height="315" src="https://www.youtube.com/embed/videoseries?list=PL8RnW3nRCF9lYNarjMnxlUmWSa8pXGcb1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></center>


<p><a href="https://en.wikipedia.org/wiki/CD+G">CD+G</a> is an extension to the Compact Disc format that allows displaying simple graphics alongside the audio content of a CD. It comes well before CD-ROM, so it&rsquo;s designed for CD players that are hooked up to a TV rather than computers. CD+G stores its graphics in the disc&rsquo;s subcode data, which means that only backups that include that data actually capture the full content of the disc. Back up a CD+G disc in a format that doesn&rsquo;t include subcode data, like BIN/CUE, and it just turns into a normal audio CD. These graphics can be used for anything; the first CD+G release, Firesign Theatre&rsquo;s 1985 comedy album <em>Eat or Be Eaten</em> (shown above), features illustrations to accompany the audio. CD+G was never widely used, but it did develop a significant niche in karaoke discs as a way to display lyrics on-screen.</p>

<p>I want to talk a little more about how easy it can be to miss that a disc has significant CD+G data, so let&rsquo;s take a look at a few practical examples. A simple example is the Firesign Theatre album mentioned above. The packaging, <a href="https://www.discogs.com/release/843915-The-Firesign-Theatre-Eat-Or-Be-Eaten">as seen on Discogs</a>, doesn&rsquo;t mention the CD+G content at all, aside from a brief reference in the album credits&mdash;most owners of this disc would have no idea the CD+G content existed, and would never have owned a player capable of displaying it. It&rsquo;s very likely that most people backing up their disc wouldn&rsquo;t even know they had skipped some of its content.</p>

<p>That&rsquo;s a little too simple, though. A little too neat and tidy. Let&rsquo;s take a look at something more fun.</p>

<p>In the 16-bit era, the first CD-based game consoles all had support for playing music CDs as a bonus feature. Many of these consoles also supported CD+G, and for many families these would have been their only CD+G player. The Victor Wondermega, a high-end all-in-one Sega Mega Drive/Mega CD console released in Japan, leaned into CD+G&rsquo;s popularity as a karaoke format by making karaoke one of its major features&mdash;including two microphone ports built right into the console. The system was bundled with a pack-in CD called <a href="https://segaretro.org/Wondermega_Collection"><em>Wondermega Collection</em></a> that showed off all aspects of its features: it includes several minigames that can be played in Mega CD mode, and two karaoke audio tracks that can be played if the player boots into the system&rsquo;s CD player instead of the game.</p>

<center style="image-rendering: pixelated;">
<p><img src="http://mistys-internet.website/blog/images/subcode/cdg-missing.png" alt="Screenshot of a track from Wondermega Collection with CD+G imagery missing.">
<img src="http://mistys-internet.website/blog/images/subcode/cdg-present.png" alt="Screenshot of a track from Wondermega Collection with CD+G imagery present. The UI in the CD player indicates that this is the same track on the disc, with the exact same spot in the track being shown."></p></center>



<p style="text-align: center; font-style: italic;">Screenshots of two disc images of Wondermega Collection running in the same CD player. The screenshot on the left is played without the subcode information, so it's recognized as audio-only. The screenshot on the right is played with the subcode information, so the CD+G content is correctly identified and rendered during playback.</p>


<p>Those karaoke tracks are coded using CD+G<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>, which means that they&rsquo;re only properly backed up if the disc is ripped in a format which supports subcode data. And, because of the complexity of the disc, there are several reasons the missing data can easily go unnoticed:</p>

<ul>
<li>Since the disc contains both Mega CD and audio CD content, the audio CD portion could easily be missed when testing the backup. In this case, it&rsquo;s easy to miss that the audio CD tracks actually had unique content beyond the audio itself.</li>
<li>Not all Mega CD emulators support subcode data, so it may not be clear how to even test that the disc is complete or incomplete.</li>
<li>The Redump standard doesn&rsquo;t include subcode data in the set of data it validates<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>, so those backing up their discs to match Redump&rsquo;s database may discard the subcode data without realizing that it&rsquo;s significant.</li>
</ul>


<p>So what&rsquo;s the lesson here? Well, first of all, it&rsquo;s simply that it&rsquo;s difficult to fully audit all of the content on a disc to confirm that a backup is fully functional. The more kinds of distinct content on a disc, as in our <em>Wondermega Collection</em> example, the harder. (This is similar to the example of Mac/Windows hybrid discs I gave in my previous post, where by only testing a backup on one operating system an archivist might miss that they had discarded data for the other.) The second lesson is that it&rsquo;s not always obvious what content even <em>exists</em> on a disc, and it&rsquo;s easy to throw something away simply by not knowing it existed in the first place.</p>

<p>My personal recommendation, for those creating raw disc backups of physical CDs, is simply to always store the subcode data&mdash;at only 4% the size of the disc&rsquo;s primary data, it adds very little extra storage burden in exchange for being sure that nothing is being lost. For the truly storage space-starved, it&rsquo;s worth at least doing a full audit to make sure that no CD+G, CD-TEXT or similar data is present before discarding subcode data.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Also known as subchannel data.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>Which, yes, means they do work on any CD player that supports CD+G, including regular karaoke CD players.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>This isn&rsquo;t out of ignorance&mdash;there are technical limitations that make it difficult to validate the fixity of subcode data. Redump&rsquo;s database only includes data that can be reliably reproduced; omitting subcode data doesn&rsquo;t mean that it&rsquo;s not significant or that it shouldn&rsquo;t be backed up along with the rest of the disc&rsquo;s content, just that it can&rsquo;t be validated in the same way that the disc&rsquo;s main contents can be.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Announcing Cue2ccd: A Tool to Convert BIN/CUE Disc Images to CloneCD]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/12/15/announcing-cue2ccd-a-tool-to-convert-bin-slash-cue-disc-images-to-clonecd/"/>
    <updated>2024-12-15T14:59:56-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/12/15/announcing-cue2ccd-a-tool-to-convert-bin-slash-cue-disc-images-to-clonecd</id>
    <content type="html"><![CDATA[<p>I&rsquo;m releasing a tool I wrote for myself: <a href="https://www.mistys-internet.website/cue2ccd/"><code>cue2ccd</code></a>, a commandline tool to convert CD-ROM disc images from the <a href="https://en.wikipedia.org/wiki/Cue_sheet_(computing)">BIN/CUE</a> format to the <a href="https://en.wikipedia.org/wiki/CloneCD">CloneCD</a> format. For as many disc image conversion tools as there are out there, I hadn&rsquo;t found anything open-source or cross-platform that can handle going between these two specific formats&mdash;so I wrote it myself.</p>

<p>This is a very niche tool, but it solves one specific problem I have. I own a <a href="https://gdemu.wordpress.com/about/">Rhea</a> optical drive emulator for the Sega Saturn, a device which replaces the original CD drive in the console and allows it to load media from disc images on an SD card instead of physical CDs. The Rhea&rsquo;s great in a lot of ways, but it has one specific limitation: it doesn&rsquo;t load games in the BIN/CUE disc image format<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. Since a lot of media online is in that format, I&rsquo;ve really been wanting a convenient way to convert existing BIN/CUE images I have lying around into something I can use. Given how niche this is I don&rsquo;t expect many other people to need it, but I hope it&rsquo;s helpful if there&rsquo;s anyone else in the same situation.</p>

<p>Usage is as simple as possible: just run <code>cue2ccd path_to_cuesheet.cue</code> and it&rsquo;ll produce new <code>.img</code>, <code>.ccd</code> and <code>.sub</code> files in the same directory, ready for use. I&rsquo;ve set up convenient commandline installers for installing it on Mac, Linux, and Windows, which are available from <a href="https://www.mistys-internet.website/cue2ccd/">the website</a>, and it can be installed using Homebrew by running <code>brew install mistydemeo/formulae/cue2ccd</code>.</p>

<p>From here, I&rsquo;d like to take a little dive into the details of what this kind of conversion looks like and what I needed to do. I&rsquo;m not planning to go into my specific implementation, but rather I&rsquo;d like to focus on the details of the formats and the problems I ran into when writing cue2ccd. If you don&rsquo;t care about the technical details, you can skip the rest of the post (but please enjoy the tool, if you use it!). There are three primary things I needed to handle: writing CloneCD control files (<code>.ccd</code>), writing subcode data (<code>.sub</code>), and merging multi-track images.</p>

<h2>Writing CloneCD control files</h2>

<p>Like I mentioned in a <a href="http://mistys-internet.website/blog/blog/2024/09/13/the-working-archivists-guide-to-enthusiast-cd-rom-archiving-tools/">previous post</a>, CloneCD&rsquo;s table of contents format is lower-level and much more complex than the cue sheets used by BIN/CUE disc images. Here&rsquo;s a sample cue sheet for a disc image with one data track and two audio tracks:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>FILE "disc.bin" BINARY
</span><span class='line'>  TRACK 01 MODE1/2352
</span><span class='line'>    INDEX 01 00:00:00
</span><span class='line'>  TRACK 02 AUDIO
</span><span class='line'>    INDEX 00 00:04:16
</span><span class='line'>    INDEX 01 00:06:16
</span><span class='line'>  TRACK 03 AUDIO
</span><span class='line'>    INDEX 00 00:07:16
</span><span class='line'>    INDEX 01 00:09:16</span></code></pre></td></tr></table></div></figure>


<p>These nine lines capture (most of) the essential parts of a CD, without getting into details: they list which tracks exist (and which files those tracks are stored in); what type and mode each of those tracks is; and each track&rsquo;s indices, with their locations on the disc.<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></p>

<p>The equivalent CloneCD file, meanwhile, is 121 lines long and contains entries that look like this:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>[CloneCD]
</span><span class='line'>Version=3
</span><span class='line'>
</span><span class='line'>[Disc]
</span><span class='line'>TocEntries=6
</span><span class='line'>Sessions=1
</span><span class='line'>DataTracksScrambled=0
</span><span class='line'>CDTextLength=0
</span><span class='line'>
</span><span class='line'>[Session 1]
</span><span class='line'>PreGapMode=1
</span><span class='line'>PreGapSubC=0
</span><span class='line'>
</span><span class='line'>[Entry 0]
</span><span class='line'>Session=1
</span><span class='line'>Point=0xa0
</span><span class='line'>ADR=0x01
</span><span class='line'>Control=0x04
</span><span class='line'>TrackNo=0
</span><span class='line'>AMin=0
</span><span class='line'>ASec=0
</span><span class='line'>AFrame=0
</span><span class='line'>ALBA=-150
</span><span class='line'>Zero=0
</span><span class='line'>PMin=1
</span><span class='line'>PSec=0
</span><span class='line'>PFrame=0
</span><span class='line'>PLBA=4350
</span><span class='line'>
</span><span class='line'># and so on</span></code></pre></td></tr></table></div></figure>


<p>And it continues from there&mdash;as you can imagine, it&rsquo;s a much more complex format to generate! At their core, though, both formats represent roughly the same information: the table of contents of a disc, with the tracks and their definitions. All of the information I need to generate the CloneCD files either exists in the cue sheet or can be derived based on information I have access to. This data fits into three categories, one of which is data shared in common between cue sheets and the CloneCD format:</p>

<ul>
<li>Data about each track, including its list of indices and start/stop timestamps<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></li>
<li>Overall data about the disc and the session (missing from the cue sheet)</li>
<li>Data about the disc&rsquo;s lead-in and lead-out sections (missing from the cue sheet)</li>
</ul>


<h3>Track-level metadata</h3>

<p>That&rsquo;s a lot to go over, but this turned out not to be as complex as I thought it might be. I&rsquo;ll gloss over the disc-level metadata (which is fairly brief); let&rsquo;s look at what the two formats share in common instead, the track-level metadata. We&rsquo;ll do a direct comparison of the same track from both the cue sheet and the CloneCD file, starting with the cue sheet:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>TRACK 01 MODE1/2352
</span><span class='line'>  INDEX 01 00:00:00</span></code></pre></td></tr></table></div></figure>


<p>Despite being fairly short, it encodes a few different bits of information that we&rsquo;ll be wanting to reproduce.</p>

<ul>
<li>This is track 1 on the disc;</li>
<li>It&rsquo;s a data track, specifically a mode 1 data track.<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup></li>
<li>That data track is stored in the disc image with &ldquo;raw&rdquo; 2352-byte sectors, meaning error correction is included. This field isn&rsquo;t important for us, since cue2ccd only works with raw disc images.</li>
<li>This track contains a single index, numbered 1, which begins at the timestamp <code>00:00:00</code>&mdash;that is, at the very beginning of the disc image.</li>
</ul>


<p>It&rsquo;s all, in other words, pretty core structural metadata about the track and how it&rsquo;s formed. Now let&rsquo;s take a look at the CloneCD version:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>[Entry 3]
</span><span class='line'>Session=1
</span><span class='line'>Point=0x01
</span><span class='line'>ADR=0x01
</span><span class='line'>Control=0x04
</span><span class='line'>TrackNo=0
</span><span class='line'>AMin=0
</span><span class='line'>ASec=0
</span><span class='line'>AFrame=0
</span><span class='line'>ALBA=-150
</span><span class='line'>Zero=0
</span><span class='line'>PMin=0
</span><span class='line'>PSec=2
</span><span class='line'>PFrame=0
</span><span class='line'>PLBA=0</span></code></pre></td></tr></table></div></figure>


<p>At first glance, it looks pretty overwhelming! It turns out, however, it&rsquo;s not actually as complex as it seems. The field names may seem difficult to understand at first, but the good news is that they&rsquo;re based directly on the table of contents from the lead-in on a real CD, and so all of them (with the same or similar names) are documented in the <a href="https://ecma-international.org/publications-and-standards/standards/ecma-130/">CD spec</a>.</p>

<ul>
<li>The <code>Point</code> (pointer) field is a hex value which means a few different things depending on context. For a standard track, it&rsquo;s the track number. In this case, we know from the cue sheet that this is track 1, so it&rsquo;s set to <code>1</code>.</li>
<li>The <code>Control</code> field is a hex value which indicates information about the track type, along with some other metadata that isn&rsquo;t relevant to us. This is four bits out of a byte in the CD&rsquo;s binary format, but CloneCD lets us just write a number. There are only two values that matter to us: audio (<code>0</code>) or data (<code>4</code>). We&rsquo;ve got a data track, so this uses <code>4</code>.</li>
<li>The track starts at <code>00:00:00</code>, so we mark the same values here. They&rsquo;re just in three separate fields, unlike the cue sheet where they&rsquo;re written as a single timestamp. We get <code>PMin=0</code>, <code>PSec=2</code> and <code>PFrame=0</code>. (If that seems like an off-by-two value to you, well-spotted. The explanation comes later.)</li>
<li>The <code>PLBA</code> field contains essentially the same information as in the Min/Sec/Frame fields, but expressed in terms of the number of sectors since the beginning of the disc&rsquo;s content. In this case, this track begins at the start of the disc, so that&rsquo;s 0.</li>
<li>The <code>AMin</code>, <code>ASec</code> and <code>AFrame</code> values mean something in other contexts, but here are left at zero.</li>
<li>The <code>Zero</code> field always contains a <code>0</code>. What a surprise!</li>
<li>Finally, a few fields aren&rsquo;t relevant to us and get hardcoded, like <code>ADR</code> and <code>TrackNo</code>.</li>
</ul>


<p>Whew! In other words, this is mostly the same data as in the cue sheet, it&rsquo;s just in a more verbose form and using terms that only make sense after reading the CD-ROM spec. Knowing what these fields mean, it wasn&rsquo;t too hard to generate these CloneCD tracks given the equivalent information from the cue sheet.</p>
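<p>To make the mapping concrete, here&rsquo;s a minimal Python sketch of rendering one track entry from cue sheet values. It isn&rsquo;t cue2ccd&rsquo;s actual code (which is written in Rust); the function names and the choice of hardcoded fields are assumptions that simply mirror the sample entry above.</p>

```python
FRAMES_PER_SECOND = 75   # CD timecodes count 75 frames (sectors) per second
PREGAP_OFFSET = 150      # two-second offset explained later in this post


def sectors_to_msf(sectors):
    """Split a sector count into (minutes, seconds, frames)."""
    minutes, remainder = divmod(sectors, 60 * FRAMES_PER_SECOND)
    seconds, frames = divmod(remainder, FRAMES_PER_SECOND)
    return minutes, seconds, frames


def track_entry(entry_index, track_number, is_data, start_sector):
    """Render one CloneCD [Entry] block for an ordinary track.

    start_sector is the position of the track's index 01 within the
    disc image, counted in sectors. All names here are hypothetical.
    """
    pmin, psec, pframe = sectors_to_msf(start_sector + PREGAP_OFFSET)
    fields = [
        ("Session", "1"),
        ("Point", f"0x{track_number:02x}"),   # track number for normal tracks
        ("ADR", "0x01"),                      # hardcoded
        ("Control", "0x04" if is_data else "0x00"),
        ("TrackNo", "0"),                     # hardcoded
        ("AMin", "0"),
        ("ASec", "0"),
        ("AFrame", "0"),
        ("ALBA", "-150"),
        ("Zero", "0"),
        ("PMin", str(pmin)),
        ("PSec", str(psec)),
        ("PFrame", str(pframe)),
        ("PLBA", str(start_sector)),
    ]
    lines = [f"[Entry {entry_index}]"]
    lines += [f"{name}={value}" for name, value in fields]
    return "\n".join(lines)


# Track 1 from the sample cue sheet: a data track starting at 00:00:00.
print(track_entry(3, 1, True, 0))
```

<p>Run as-is, this prints the same fifteen lines as the <code>[Entry 3]</code> sample shown earlier.</p>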

<h3>Lead-in and lead-out</h3>

<p>I mentioned earlier that the CloneCD format includes information about the lead-in and lead-out. These are sections at the beginning and end of the disc that aren&rsquo;t typically stored, in their raw format, in disc images. The lead-in contains the raw, binary table of contents information for the disc while the lead-out contains information about the disc&rsquo;s duration.</p>

<p>This is missing from the cue sheet format, but we <em>can</em> derive the info we need from what&rsquo;s in the cue sheet. These are stored as &ldquo;entries&rdquo; in the CloneCD control file alongside the tracks, and actually look a lot like track data. The fields share names with the ones used for track data, but some of them take on different meanings when used like this.</p>

<p>To give you an idea what this looks like, here&rsquo;s an abbreviated copy of the first/last track information for this disc with only the fields that differ from regular track data.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>[Entry 0]
</span><span class='line'>Point=0xa0
</span><span class='line'>PMin=1
</span><span class='line'>PSec=0
</span><span class='line'>PFrame=0
</span><span class='line'>PLBA=4350
</span><span class='line'>
</span><span class='line'>[Entry 1]
</span><span class='line'>Point=0xa1
</span><span class='line'>PMin=3
</span><span class='line'>PSec=0
</span><span class='line'>PFrame=0
</span><span class='line'>PLBA=13350</span></code></pre></td></tr></table></div></figure>


<p>The <code>Point</code> field is the POINTER field defined in 22.3.4.2 of the CD-ROM spec. Previously, when talking about tracks, we set this to the track number. When set to a value outside the 1-99 track number range, it means something different. Two of those values can be seen above: <code>0xa0</code> means that this entry contains information about the first track on the disc, while <code>0xa1</code> means the last track. When set to these values, it changes the meaning of the remaining fields. Instead of containing timing information, the <code>PMin</code> field is used to specify the <em>track number</em> of the first or last track on the disc, and the other two values are left at zero. These two entries tell the player how many tracks to expect when reading the rest of the disc. The PLBA fields are still here, and still calculated based on the Min/Sec/Frame values, but they&rsquo;re essentially meaningless for these entries since the Min/Sec/Frame aren&rsquo;t real timestamps.</p>
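<p>The odd-looking <code>PLBA</code> values fall straight out of that arithmetic. Here&rsquo;s a small Python sketch, with hypothetical function names, that treats the track number in <code>PMin</code> as if it were minutes, the same way the sample file does:</p>

```python
def msf_to_plba(minutes, seconds, frames):
    """Convert Min/Sec/Frame to sectors, minus the 150-sector offset."""
    return (minutes * 60 + seconds) * 75 + frames - 150


def first_last_entries(first_track, last_track):
    """Build the fields that differ for the 0xa0 and 0xa1 entries."""
    entries = []
    for point, track in ((0xA0, first_track), (0xA1, last_track)):
        entries.append({
            "Point": f"0x{point:02x}",
            "PMin": track,      # a track number here, not a timestamp
            "PSec": 0,
            "PFrame": 0,
            "PLBA": msf_to_plba(track, 0, 0),
        })
    return entries


for entry in first_last_entries(1, 3):
    print(entry["Point"], entry["PMin"], entry["PLBA"])
# 0xa0 1 4350
# 0xa1 3 13350
```

<p>One &ldquo;minute&rdquo; is 4500 sectors, so track 1 yields 4500 - 150 = 4350 and track 3 yields 13500 - 150 = 13350, matching the values above.</p>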

<p>Finally, we get to the lead-out, which looks like this (relevant fields only):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>[Entry 2]
</span><span class='line'>Point=0xa2
</span><span class='line'>PMin=0
</span><span class='line'>PSec=12
</span><span class='line'>PFrame=16</span></code></pre></td></tr></table></div></figure>


<p>A pointer of <code>0xa2</code> indicates that the remaining values are describing the beginning of the disc&rsquo;s lead-out&mdash;or, in other words, describing the end of data. Here, the Min/Sec/Frame values are a timecode again, but instead of describing the <em>start</em> of a section of data, they describe the timestamp marking the <em>end of the disc</em>. (Yes, 12.21 seconds is accurate; this is a small test image containing three seconds-long tracks.) This is actually pretty critical info: it tells the CD player when it should stop seeking at the end of the CD, and makes it possible to tell how long the disc is as a whole.</p>

<h3>Parsing and oddities</h3>

<p>I went for <a href="https://github.com/lipnitsk/libcue">libcue</a> for parsing cue sheets, since it provides a simple and straightforward track-oriented interface which makes it easy to query all of the track definitions. Writing my own parser in Rust felt out of scope. There are a couple of pure-Rust parsers on crates.io, but they&rsquo;re oriented around music files like FLAC and are missing a few features I&rsquo;d need for raw disc images. Instead, I wrote <a href="https://github.com/mistydemeo/libcue.rs">a small crate</a> that acts as a thin binding for libcue while adapting a few bits of its interface to Rust conventions.</p>

<p>One of the more annoying gotchas of the cue sheet format is that it leaves out one important piece of information that&rsquo;s necessary to render the lead-out entry. Let&rsquo;s take another peek at the cue sheet, and see if it jumps out at you.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>FILE "disc.bin" BINARY
</span><span class='line'>  TRACK 01 MODE1/2352
</span><span class='line'>    INDEX 01 00:00:00
</span><span class='line'>  TRACK 02 AUDIO
</span><span class='line'>    INDEX 00 00:04:16
</span><span class='line'>    INDEX 01 00:06:16
</span><span class='line'>  TRACK 03 AUDIO
</span><span class='line'>    INDEX 00 00:07:16
</span><span class='line'>    INDEX 01 00:09:16</span></code></pre></td></tr></table></div></figure>


<p>It lists where tracks and indices start&hellip; but it doesn&rsquo;t show where they <em>end</em>. libcue calculates track ends for every track except the last by checking where the next index starts, and returns that with the rest of the information that&rsquo;s in the file, but the duration and endpoint of the final track are left completely ambiguous. The only way to get that information is to check the file size of the actual underlying disc image file and calculate how many sectors long it is. It&rsquo;s not the end of the world, but it <em>is</em> annoying&mdash;and it&rsquo;s the one and only bit of metadata generation I did that required access to the underlying data files. I would have loved if I could have worked just off of the metadata.</p>
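<p>As a rough Python sketch of that calculation (not the tool&rsquo;s real code): divide the image size by the raw sector size, then add the 150-sector offset discussed below to get the lead-out timestamp. The 766-sector size used here is an assumption, chosen because it&rsquo;s the size that yields the lead-out shown earlier; in practice the byte count would come from something like <code>os.path.getsize</code>.</p>

```python
SECTOR_SIZE = 2352  # bytes per raw sector


def lead_out_msf(image_size_bytes):
    """Derive the lead-out Min/Sec/Frame from the size of a raw BIN file."""
    sectors, leftover = divmod(image_size_bytes, SECTOR_SIZE)
    if leftover:
        raise ValueError("image is not a whole number of 2352-byte sectors")
    absolute = sectors + 150  # image offsets start 150 sectors into the disc
    minutes, rest = divmod(absolute, 60 * 75)
    seconds, frames = divmod(rest, 75)
    return minutes, seconds, frames


# Hypothetical size for the sample image: 766 sectors.
print(lead_out_msf(766 * SECTOR_SIZE))  # (0, 12, 16)
```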

<p>Another interesting gotcha is the timestamps, which have an unusual off-by-150 problem. As I mentioned previously, the lead-in and lead-out sections are usually omitted from the binary content of a disc image. Since the lead-in takes up the first 150 sectors on the disc, this means that standard disc images actually start at sector 150 of the disc, not sector 0. This gives us a conundrum for absolute timestamps. Although the BIN/CUE images appear at first glance to have absolute timestamps that are comparable with the CloneCD file, their definition is slightly different.</p>

<p>With a single BIN file, a cue sheet&rsquo;s indices are absolute indices <em>into the BIN file</em>. Since the first index <em>within the BIN file</em> is actually sector 150 on the disc, it means that the timecodes for that BIN file are offset from the real CD by 150. Let&rsquo;s take another look at some absolute timestamps for the two formats for a practical example:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>TRACK 02 AUDIO
</span><span class='line'>  INDEX 01 00:06:16</span></code></pre></td></tr></table></div></figure>




<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>PMin=0
</span><span class='line'>PSec=8
</span><span class='line'>PFrame=16</span></code></pre></td></tr></table></div></figure>


<p>This track on our sample image begins at 00:06:16 into the BIN/CUE&hellip; which means that, for CloneCD, it has an absolute timestamp of exactly two seconds more, 00:08:16. In practice, applying an offset when translating timestamps wasn&rsquo;t actually that hard, but it <em>was</em> a place where errors seeped in. For a nontrivial part of my tool&rsquo;s life, I had an off-by-one error from sloppy timestamp conversion.</p>
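<p>Here&rsquo;s what that translation can look like as a minimal Python sketch. The function name is made up for illustration; the arithmetic is just the 150-sector (two-second) shift described above, done in sectors so that the frame and second fields carry over correctly.</p>

```python
def cue_to_absolute(timestamp):
    """Convert a cue sheet MM:SS:FF timestamp to an absolute disc timestamp."""
    minutes, seconds, frames = (int(part) for part in timestamp.split(":"))
    sectors = (minutes * 60 + seconds) * 75 + frames  # offset into the BIN
    sectors += 150                                    # shift to disc-absolute
    minutes, rest = divmod(sectors, 60 * 75)
    seconds, frames = divmod(rest, 75)
    return minutes, seconds, frames


print(cue_to_absolute("00:06:16"))  # (0, 8, 16)
print(cue_to_absolute("00:00:00"))  # (0, 2, 0)
```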

<h2>Generating subcode data</h2>

<p>The second thing I needed to create was <a href="https://en.wikipedia.org/wiki/Compact_Disc_subcode">subcode data</a> (aka subchannel data), a form of builtin metadata used on CD. On physical CDs, each 2352-byte sector is accompanied by 98 bytes of subcode data. The subcode data is necessary when reading a physical CD but not typically needed when mounting or burning a disc image, so a number of disc image formats&mdash;including BIN/CUE and plain ISO files&mdash;don&rsquo;t bother reading or saving it at all. The CloneCD format <em>does</em> back it up, however, and the device I&rsquo;m using requires valid subcode data. I knew I&rsquo;d need to generate it myself.</p>

<p>Subcode data is a binary format encoding very similar information to the entries we just saw in the text-based CloneCD control format above. Each 98-byte subcode sector contains two bytes of synchronization words, followed by 96 bytes of data divided into eight channels with lettered names from P to W. In the original CD and CD-ROM specs, only the P and Q channels are specified; channels R through W were set aside for later expansion, and most discs never use them. They were used for standards such as <a href="https://en.wikipedia.org/wiki/CD-Text">CD-TEXT</a>, which allowed encoding human-readable track names on a CD; <a href="https://en.wikipedia.org/wiki/CD+G">CD+G</a>, which allowed encoding simple graphics, such as on karaoke CDs; and various copy protection systems. For my usecase, none of those were relevant, so I only needed to generate data for the P and Q channels.</p>

<h3>P channel</h3>

<p>The P channel was by far the simplest, and took very little work to do. It&rsquo;s used to indicate the boundaries between tracks for very primitive early players which didn&rsquo;t keep track of table of contents information. If a sector is within the first 150 sectors of the start of a track, it&rsquo;s filled with <code>FF</code> bytes. Otherwise, it contains <code>00</code> bytes. There&rsquo;s no other variation, so it was very easy to implement.</p>
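<p>A sketch of that rule in Python, following the description above rather than the spec text or cue2ccd&rsquo;s source. The function name and the idea of passing in a per-track sector counter are assumptions made for illustration.</p>

```python
P_CHANNEL_BYTES = 12  # each channel gets 96 bits (12 bytes) per sector


def p_channel(sectors_into_track):
    """Return the 12 P-channel bytes for a single sector.

    sectors_into_track counts from 0 at the first sector of the track.
    """
    flagged = sectors_into_track < 150
    return bytes([0xFF if flagged else 0x00]) * P_CHANNEL_BYTES


print(p_channel(0).hex())    # ffffffffffffffffffffffff
print(p_channel(150).hex())  # 000000000000000000000000
```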

<h3>Q channel</h3>

<p>The Q channel is slightly more complex. Before getting into the details, let&rsquo;s look at a little sample of what a single Q channel sector looks like. Here are the raw bytes in hex format:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>41010100 00480000 0248F2BB</span></code></pre></td></tr></table></div></figure>


<p>There&rsquo;s a chance you may be able to put together some of this based on the description of the entries in a CloneCD control file earlier, but don&rsquo;t worry, we&rsquo;ll come back to this later.</p>

<p>This channel primarily consists of timing information: it encodes the timestamp of the currently-playing sector, a flag indicating whether this sector is data or audio, and some simple forms of metadata<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup>. It also contains a 16-bit checksum, allowing the data in the rest of the Q channel to be validated. The metadata in question isn&rsquo;t relevant to my usecase, so I only needed to worry about the timestamps, the data flag, and the checksum.</p>

<h4>Control and q-Mode fields</h4>

<p>The first byte is separated into two four-bit fields. That is, it contains fields which are smaller than one byte&mdash;an idea that isn&rsquo;t always obvious to people who haven&rsquo;t worked with binary data. Since a byte contains eight bits, it&rsquo;s possible to fit multiple fields into a single byte if they&rsquo;re smaller than one byte. In this case, instead of using the full byte for one field, we can split that one byte in half and use it to store two four-bit fields.</p>

<p>The first of these fields, the control field, consists of a few different flags, but only one is relevant here: the data flag. When unset, it indicates that this sector contains audio; when set, it indicates that it contains data. In our case, that means taking the first four bits of our byte and setting them to <code>0100</code>.</p>

<p>The second field indicates the type of data being encoded in the following bytes. Since I&rsquo;m ignoring the alternate metadata that could be represented here, I always set it to the value indicating that the bytes to follow will contain timing information. In our case, that means taking the last four bits of our byte and setting them to <code>0001</code>. Putting it all together, we get a byte with the bits:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>01000001</span></code></pre></td></tr></table></div></figure>


<p>Or, read as a single byte:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>41</span></code></pre></td></tr></table></div></figure>
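<p>In code, packing two four-bit fields into one byte is a shift and an OR. This Python sketch uses constant names I made up for illustration:</p>

```python
DATA_FLAG = 0b0100        # control field value for a data sector
Q_MODE_POSITION = 0b0001  # q-Mode value meaning "timing information follows"


def control_mode_byte(is_data, q_mode=Q_MODE_POSITION):
    """Pack the control field (high four bits) and q-Mode (low four bits)."""
    control = DATA_FLAG if is_data else 0b0000
    return (control << 4) | q_mode


print(f"{control_mode_byte(True):08b}")  # 01000001
print(f"{control_mode_byte(True):02X}")  # 41
```

<p>An audio sector leaves the control bits unset, so the same call with <code>False</code> gives <code>01</code>.</p>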


<h4>Timestamps</h4>

<p>As with the CloneCD control file, timestamps are stored as separate minute, second and fraction fields. The Q channel contains two different timestamps and some other timekeeping information:</p>

<ul>
<li>The track number</li>
<li>The index number</li>
<li>The timestamp relative to the current track</li>
<li>The absolute timestamp</li>
</ul>


<p>All of these values are stored in <a href="https://en.wikipedia.org/wiki/Binary-coded_decimal">binary-coded decimal</a> (BCD) format, which has the side bonus that it makes this data easy to read by eye with a hex editor. I made use of that while debugging.</p>
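<p>BCD just means each decimal digit gets its own four bits, so the decimal value 48 is stored as the byte <code>0x48</code>. A minimal Python sketch, with hypothetical helper names:</p>

```python
def to_bcd(value):
    """Encode a number from 0 to 99 as one binary-coded decimal byte."""
    if not 0 <= value <= 99:
        raise ValueError("BCD bytes can only hold 0 through 99")
    tens, ones = divmod(value, 10)
    return (tens << 4) | ones


def msf_to_bcd(minutes, seconds, frames):
    """Encode a Min/Sec/Frame timestamp as three BCD bytes."""
    return bytes(to_bcd(part) for part in (minutes, seconds, frames))


print(f"{to_bcd(48):02X}")          # 48
print(msf_to_bcd(0, 2, 48).hex())   # 000248
```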

<p>For the most part, these timestamp fields are straightforward to implement so long as I pass the right data in. There was one fun gotcha, however. CD audio contains gaps between tracks called <a href="https://en.wikipedia.org/wiki/Pregap">&ldquo;pregaps&rdquo;</a>; they&rsquo;re defined as index 0 within a track, with the track itself beginning at index 1. They throw an interesting edge case for calculating relative timestamps. What does it mean to track the timestamp relative to the start of the track for a time that <em>isn&rsquo;t part of the track</em>? Since this binary-coded decimal format doesn&rsquo;t support negative numbers, the standard uses a slightly strange but appropriate workaround. Within the pregaps, the relative timestamp instead <em>starts</em> at the <em>length</em> of the pregap and then counts down until it hits 0, which marks the beginning of the track, at which point it begins counting up again. Needless to say, this was the source of a few fun off-by-one bugs.</p>

<h4>Checksum</h4>

<p>Finally, it ends with a 16-bit (two-byte) checksum of the remainder of the data. The CRC-16 routine it uses is specified in the CD-ROM spec; I generated a suitable C CRC-16 routine using the Ruby <a href="https://rubygems.org/gems/crc">crc</a> library, then translated it into Rust. I&rsquo;ve published it standalone as the <a href="https://crates.io/crates/cdrom_crc">cdrom_crc</a> crate.</p>
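<p>For illustration, here&rsquo;s a bit-by-bit Python version of the checksum. This is my own sketch rather than the generated routine in <code>cdrom_crc</code>, and it assumes the parameters as I understand them from the spec: the polynomial x<sup>16</sup> + x<sup>12</sup> + x<sup>5</sup> + 1 (<code>0x1021</code>), a zero starting value, and the result stored inverted. Run over the first ten bytes of the sample sector from earlier, it reproduces that sector&rsquo;s last two bytes.</p>

```python
def q_channel_crc(data):
    """CRC-16 over the Q-channel payload, inverted as it is stored."""
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc ^ 0xFFFF


# The first ten bytes of the sample Q-channel sector.
payload = bytes.fromhex("41010100004800000248")
print(f"{q_channel_crc(payload):04X}")  # F2BB
```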

<h4>Putting it all together</h4>

<p>Here&rsquo;s that raw data again, with each byte annotated:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>41 - This one byte is actually two different fields,
</span><span class='line'>     each of which takes up four bits.
</span><span class='line'>     The first four bits are the control field;
</span><span class='line'>     here, 0100 indicates this is a data track.
</span><span class='line'>     The next four bits are the Q-mode field.
</span><span class='line'>     0001 indicates the remainder of the data is time
</span><span class='line'>     information.
</span><span class='line'>01 - This is the track number - track 01.
</span><span class='line'>01 - This is the current index - index 01.
</span><span class='line'>00 - These next three bytes make up the relative
</span><span class='line'>     position of this sector within the track,
</span><span class='line'>     00:00:48.
</span><span class='line'>00
</span><span class='line'>48
</span><span class='line'>00 - This is the zero field. It's always zero.
</span><span class='line'>00 - These next three bytes make up the absolute
</span><span class='line'>     position of this sector on the disc,
</span><span class='line'>     00:02:48.
</span><span class='line'>02
</span><span class='line'>48
</span><span class='line'>F2 - These last two bytes are the 16-bit checksum.
</span><span class='line'>BB</span></code></pre></td></tr></table></div></figure>


<p>Not actually that <em>much</em> information, and not too hard to make sense of after taking the time to assemble everything, but it certainly took some work to get there.</p>
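<p>The annotated layout above translates fairly directly into code. The following Python sketch is an assumption-laden illustration rather than cue2ccd&rsquo;s implementation: the function and field names are invented, and it only handles the time-information layout shown here.</p>

```python
def bcd(byte):
    """Decode one binary-coded decimal byte, e.g. 0x48 -> 48."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def parse_q(q):
    """Split 12 bytes of Q-channel time information into named fields."""
    return {
        "control": q[0] >> 4,        # upper four bits
        "q_mode": q[0] & 0x0F,       # lower four bits
        "track": bcd(q[1]),
        "index": bcd(q[2]),
        "relative": tuple(bcd(b) for b in q[3:6]),   # position within the track
        # q[6] is the zero field
        "absolute": tuple(bcd(b) for b in q[7:10]),  # position on the disc
        "crc": (q[10] << 8) | q[11],
    }


sample = bytes.fromhex("41 01 01 00 00 48 00 00 02 48 F2 BB")
fields = parse_q(sample)
```

<p>For the sample bytes, this gives a control value of 4 (binary 0100), Q-mode 1, track 1, index 1, a relative position of 00:00:48 and an absolute position of 00:02:48.</p>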

<p>Luckily for me, the CloneCD representation of subcode data is simplified in a few ways that made things easier. CloneCD ignores the two sync bytes, storing only the 96 data bytes, which saved me the trouble of handling them. It also reorders the data to be easier to reason about. On a physical CD, the subcode for a sector isn&rsquo;t contiguous. Instead, every 32-byte frame of a data sector is followed by a single byte containing one single bit from each of the eight channels. Assembling a complete byte for the channels requires waiting for eight frames and reordering the bits as they come in. CloneCD, meanwhile, reorders the data into the standard byte order. There may be technical reasons why this is the case when streaming from a CD, but I&rsquo;m just grateful to get to write bytes like a normal person.</p>
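<p>To make the reordering concrete, here is a Python sketch that converts 96 interleaved subcode bytes into eight contiguous 12-byte channels. It assumes the usual convention that the top bit of each interleaved byte belongs to channel P and the bottom bit to channel W, and that bits fill each output byte from the most significant end; the function name is hypothetical.</p>

```python
def deinterleave(raw):
    """Turn 96 interleaved subcode bytes into eight 12-byte channels (P..W)."""
    if len(raw) != 96:
        raise ValueError("expected 96 subcode bytes")
    out = bytearray(96)
    for i, byte in enumerate(raw):
        for channel in range(8):  # 0 = P, 1 = Q, ... 7 = W
            bit = (byte >> (7 - channel)) & 1
            # Bit i of this channel lands in byte i // 8, most significant bit first.
            out[channel * 12 + i // 8] |= bit << (7 - i % 8)
    return bytes(out)
```

<p>After this step the Q channel is simply bytes 12 through 23 of the result, which is the form the rest of this section works with.</p>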

<h2>Merging disc images</h2>

<p>I actually had a version of cue2ccd ready to release about a year ago, but I had one last feature I really wanted and kept putting off: merging disc images.</p>

<p>More specifically, I wanted to handle disc images containing multiple files. A lot of BIN/CUE disc images use a single BIN file containing all tracks, sort of like how a CD itself is structured, and that&rsquo;s what the initial version of cue2ccd was written for. In recent years, however, split images have become more common. These are still raw images, but they use separate raw disc image files for every track on the disc. In theory, doing this is easy; the data is the same, you just need to concatenate the files. No work at all. Unfortunately, the metadata is a bit harder. Let&rsquo;s take a look at the disc from earlier in its original one-file version:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>FILE "disc.bin" BINARY
</span><span class='line'>  TRACK 01 MODE1/2352
</span><span class='line'>    INDEX 01 00:00:00
</span><span class='line'>  TRACK 02 AUDIO
</span><span class='line'>    INDEX 00 00:04:16
</span><span class='line'>    INDEX 01 00:06:16
</span><span class='line'>  TRACK 03 AUDIO
</span><span class='line'>    INDEX 00 00:07:16
</span><span class='line'>    INDEX 01 00:09:16</span></code></pre></td></tr></table></div></figure>


<p>Now let&rsquo;s take a look at the exact same disc, but in a one-file-per-track form:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>FILE "disc (Track 01).bin" BINARY
</span><span class='line'>  TRACK 01 MODE1/2352
</span><span class='line'>    INDEX 01 00:00:00
</span><span class='line'>FILE "disc (Track 02).bin" BINARY
</span><span class='line'>  TRACK 02 AUDIO
</span><span class='line'>    INDEX 00 00:00:00
</span><span class='line'>    INDEX 01 00:02:00
</span><span class='line'>FILE "disc (Track 03).bin" BINARY
</span><span class='line'>  TRACK 03 AUDIO
</span><span class='line'>    INDEX 00 00:00:00
</span><span class='line'>    INDEX 01 00:02:00
</span></code></pre></td></tr></table></div></figure>


<p>It may strike you that those timestamps aren&rsquo;t useful. And you wouldn&rsquo;t be entirely wrong. They&rsquo;re all the same now! What the heck? What happened?</p>

<p>Well, as I (briefly) mentioned earlier, the timestamps in a cue sheet are timestamps <em>into that file</em>, not absolute timestamps <em>into the disc</em>. For a single-file disc image there&rsquo;s almost no difference between the two, except the off-by-150 issue I mentioned previously. But once each binary contains only a single track, it suddenly becomes a lot more obvious that the offsets for each track are specific to each file.</p>

<p>So, in practice, implementing this didn&rsquo;t <em>just</em> mean concatenating the files. It also meant, for each track, keeping track of the size of the disc up until that point so that I could convert each of these relative timestamps into an absolute one. It&rsquo;s not necessarily <em>hard</em> work but it&rsquo;s an easy source of off-by-one errors and other similar mistakes, so I had a few revisions with subtly wrong timing. It also runs into a harsher version of the &ldquo;no duration of the last track&rdquo; problem: since every track is its own file, now <em>every</em> track is the last track in its file, so <em>none</em> of them have durations available from the metadata. I was able to apply what I&rsquo;d already written to calculate the duration based on the filesize, with a fix for a bug that only happened when it wasn&rsquo;t the last track in a larger file, but I&rsquo;d certainly have preferred not to have to do it at all.</p>
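<p>The bookkeeping can be sketched as follows. This Python example is an illustration only, not how cue2ccd is written: the function names and input shape are invented, it assumes every file consists of whole 2352-byte raw sectors, and the file sizes in the usage note are hypothetical values chosen to match the single-file cue sheet above.</p>

```python
SECTOR_SIZE = 2352       # bytes per raw sector
FRAMES_PER_SECOND = 75


def msf_to_frames(msf):
    """Convert an "MM:SS:FF" cue sheet timestamp into a sector count."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def frames_to_msf(frames):
    """Convert a sector count back into "MM:SS:FF"."""
    minutes, rest = divmod(frames, 60 * FRAMES_PER_SECOND)
    seconds, frac = divmod(rest, FRAMES_PER_SECOND)
    return f"{minutes:02d}:{seconds:02d}:{frac:02d}"


def absolute_indices(files):
    """Rebase per-file index timestamps onto the concatenated image.

    `files` holds one (size_in_bytes, [(index_number, "MM:SS:FF"), ...])
    entry per track, in disc order.
    """
    merged = []
    offset = 0  # sectors contributed by all earlier files
    for size, indices in files:
        merged.append([(number, frames_to_msf(offset + msf_to_frames(stamp)))
                       for number, stamp in indices])
        offset += size // SECTOR_SIZE  # the file's length is the track's duration
    return merged
```

<p>Feeding it the split cue sheet above, with file sizes of 316 and 225 sectors for the first two tracks, reproduces the 00:04:16, 00:06:16, 00:07:16 and 00:09:16 timestamps from the single-file version.</p>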

<h2>In conclusion: CD is weird</h2>

<p>Honestly, it&rsquo;s been fun to get to dig deeper into a format not many people still care about these days. I&rsquo;d also like to thank a couple of people whose help with previous projects was very useful for this one: the creator of the Rhea, Phoebe and GDEmu hardware, who was gracious in providing support debugging my earliest attempts at generating files compatible with his hardware; and CyberWarriorX, with whom I worked on an earlier CloneCD-generating project.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>It also supports a few other formats, such as DiscJuggler and Alcohol 120%, but there aren&rsquo;t any open-source tools to convert to those either.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>Each track is divided into one or more indices. Index 1 is the actual start of the track, while index 0 defines a gap that comes before the actual track begins, and indices 2 and beyond are rare. The gap between tracks is typically called a &ldquo;pregap&rdquo;. On a real CD player, when picking a track by number, the player will start straight from that track&rsquo;s index 1. When letting the disc play through from a previous track, however, the disc will play the pregap defined in index 0 first before proceeding to index 1.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>Since CD was originally designed just for music, all indices to locations on the disc are measured in terms of timestamps instead of a more data-oriented index like an address in bytes. These timestamps are stored in three parts: minutes, seconds, and 1/75 fractions of a second. For example, if a track starts at two seconds into the disc, its timestamp is 00:02:00. libcue translates these into a logical block address, i.e. a number of sectors, which would mean the previous example is 150. The CloneCD format reproduces the original CD-ROM spec&rsquo;s timestamps, but additionally stores logical block addresses in some places for convenience.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
<li id="fn:4">
<p>There are a few different modes of data track which have different data layouts. A data sector is always 2352 bytes with a mixture of data and error correction data. The different modes have different ratios of data to error correction. Mode 1, the original and most common mode, uses 2048 bytes out of every sector for data with the remaining 304 bytes serving as error correction.<a href="#fnref:4" rev="footnote">&#8617;</a></p></li>
<li id="fn:5">
<p>It&rsquo;s also used in the disc&rsquo;s lead-in and lead-out, but I&rsquo;m not dealing with those sections of the disc.<a href="#fnref:5" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[What Happened to the Japanese PC Platforms?]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/09/21/what-happened-to-the-japanese-pc-platforms/"/>
    <updated>2024-09-21T14:01:19-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/09/21/what-happened-to-the-japanese-pc-platforms</id>
    <content type="html"><![CDATA[<p><em>(This was originally posted on a social media site; I&rsquo;ve revised and updated it for my blog.)</em></p>

<p>The other day <a href="https://nex-3.com">a friend</a> asked me a pretty interesting question: what <em>happened</em> to all those companies who made those Japanese computer platforms that were never released outside Japan? I thought it&rsquo;d be worth expanding that answer into a full-size post.</p>

<h3>A quick introduction: the players</h3>

<p>It&rsquo;s hard to remember these days, but there used to be an incredible amount of variety in the computer space. There were a <em>lot</em> of different computer platforms, pretty much all of them totally incompatible with each other. North America settled on the IBM PC/Mac duopoly pretty early<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>, but Europe still had plenty of other computers popular well into the 90s, and Japan had its own computers that essentially didn&rsquo;t exist anywhere else.</p>

<p>So who were they? By the 16-bit computer era, there&rsquo;s three I&rsquo;m going to talk about today<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>: NEC&rsquo;s PC-98, Fujitsu&rsquo;s FM Towns, and Sharp&rsquo;s X68000. The PC-98 was far and away the biggest of those platforms, with the other two having a more niche market.</p>

<h3>The PC-98 in a time of transition</h3>

<p>First, a quick digression: what is this DOS thing?</p>

<p>The thing about DOS is that it&rsquo;s a much thinner OS than what we think of in 2024. When you&rsquo;re writing DOS software of any kind of complexity, you&rsquo;re talking straight to the hardware, or to drivers that are specific to particular classes of hardware. When we talk about &ldquo;DOS&rdquo; in the west, we specifically mean &ldquo;DOS on IBM compatible PCs&rdquo;. PC-98 and FM Towns both had DOS-based operating systems, but their hardware was nothing at all like IBM compatible PCs and there was no level of software compatibility between them. The PC-98 was originally a DOS-based computer without a GUI of any kind - just like DOS-based IBM PCs. When we talk about &ldquo;PC-98&rdquo; games and software, what we really mean is DOS-based PC-98 software that only runs on that platform.</p>

<p>Windows software is very different from DOS in one important way: Windows incorporates a hardware abstraction layer. Software written for Windows APIs doesn&rsquo;t need to be specific to particular hardware, and that set the stage for the major transition that was going to come.</p>

<p>NEC and Microsoft teamed up on porting Windows to the PC-98 platform. Both the PC-98 and the IBM PC use the same CPU, even though the rest of their hardware is very different, which made the port technically feasible. The first Windows release for PC-98 came out in 1992, but Windows didn&rsquo;t really take off in a big way until Windows 95 in the mid-90s. And so, suddenly, for the first time software could run on both IBM PCs running Japanese language Windows <em>and</em> PC-98 running Windows.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> Software developers didn&rsquo;t have to do anything special to get that compatibility: it happened by default, so long as they were using the standard Windows software features and didn&rsquo;t talk directly to the hardware.</p>

<p>Around the same time, NEC started making IBM-compatible PCs. As far as I can tell, they made both PC-98s and IBM PCs alongside each other for quite a few years. With Windows software not caring what the underlying hardware was, the distinction between &ldquo;PC-98&rdquo; and &ldquo;PC&rdquo; got a lot fuzzier. If you were buying a PC, you had no reason to buy a PC-98 unless you wanted to run DOS-based PC-98 software. If you just wanted that shiny new Windows software, why not buy the cheaper IBM PC that NEC would also sell you?</p>

<p>So, for the PC-98, the answer isn&rsquo;t really that it <em>died</em> - it sort of faded away and merged into what every other system was becoming.</p>

<h3>The FM Towns</h3>

<p>The FM Towns had a similar transition. While it had a homegrown GUI-based OS called Towns OS, it was relatively primitive compared to Windows 3 and especially Windows 95. The FM Towns also used the same CPU as IBM PCs and the PC-98, which means Microsoft could work with Fujitsu to port their software to the platform. And, just like what happened with the PC-98, the platform became far less relevant and less distinctive when it was just another platform to run Windows software on. If you didn&rsquo;t care about running the older FM Towns-specific software, why would you care about buying an FM Towns instead of any other IBM PC?</p>

<p>Fujitsu, just like NEC, made the transition to making standard Windows PCs and discontinued the FM Towns a few years later.</p>

<h3>The X68000 loses out in the CPU wars</h3>

<p>Unlike the other two platforms, the X68000 had a different CPU and a distinct homegrown OS. It used the 68000 series of processors from Motorola, which were incredibly popular in the 80s and 90s. The same CPU was used by the Mac until the mid 90s, the Amiga, and a huge number of home consoles and arcade boards. It was a powerful CPU, but when every other platform was looking for a way to merge with the Windows platform, they had a big problem: you simply couldn&rsquo;t port Windows to the platform and get it to run regular Windows software because they didn&rsquo;t use the same CPUs. Sharp were locked out. While they also switched to making Windows PCs in the 90s, they had no way to bring their existing users with them by giving them a transition path.</p>

<h3>The lure of multitasking</h3>

<p>Why did Windows win out, though? In the west we often credit Microsoft Office as the killer app, but it wasn&rsquo;t a major player in Japan where Japanese language-specific word processors were huge in the market for years. I&rsquo;d argue instead that multitasking was the killer feature.</p>

<p>In the DOS era, you ran one program at a time. You might have a lot of software you used, but you&rsquo;d pick one program to use at a time. If you wanted to switch to something else, you&rsquo;d have to save whatever you were doing, quit, and open a completely different full-screen app. While competing platforms like the Mac<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup> had multitasking via their GUIs for years, Windows, and especially Windows 3, is what brought it to the wider market.</p>

<p>If you&rsquo;re going to be using more than one program at the same time, having a wider amount of software <em>that&rsquo;s inter-compatible</em> becomes more important. I&rsquo;d argue that multitasking is what nudged market consolidation onto a smaller number of computers. Windows, and especially Windows 95, became <em>very</em> hard for other platforms to compete with because its base of software was just so large. It made far more sense for NEC and Fujitsu to bring Windows to their users even if it meant losing the lock-in that their unique OSs and platform-specific software had gotten them.</p>

<h3>Shifts in the gaming market</h3>

<p>In the 16-bit era, the FM Towns and X68000 were doing great in the computer gaming niche. They had powerful 2D gaming hardware and a lot of very sophisticated action games. Their original games and ports of arcade games compared extremely well against what 16-bit consoles could do, giving them a reputation of being the real gamers' platforms. By 1994 though, they had a problem: the 32-bit consoles were out, which could do 2D games just as well as the FM Towns and X68000, and the consoles could also do 3D that blew away anything those computers could handle. Fujitsu and Sharp, meanwhile, just weren&rsquo;t releasing new hardware that could keep up. The PC gaming niche had already been shrinking and moving towards consoles for a few years, and this killed off a lot of what was left.</p>

<p>I also suspect that Sony&rsquo;s marketing for the PlayStation changed things significantly. Home computers had older players than the 16-bit consoles did, but Sony was marketing the PS1 towards those same older audiences. It probably made it easy for computer players to look at the new consoles and decide to move on.</p>

<h3>What about the 8-bit platforms?</h3>

<p>Japan had a variety of 8-bit computer platforms, some of which (like the MSX) were also well-known in western countries. While in Europe the 8-bit micros held on right into the 90s, and many users upgraded straight from 8-bit micros to Windows PCs, in Japan the 8-bit computers had already been supplanted by native 16-bit computing platforms before the Windows era. In some cases, these were 16-bit computers by the same manufacturers - both Sharp and NEC had been major players in the 8-bit computing era too. The MSX, meanwhile, had failed to produce either a 16-bit evolution of the platform or a 16-bit successor and so many of its users had already moved on by the time Windows 95 came out.</p>

<h3>So, in conclusion</h3>

<p>None of the 16-bit Japanese computer makers actually died off - they just switched to making standard Windows PCs that were interchangeable with anything else out there. Microsoft took over that market just like they did everywhere else in the world, but at least the companies themselves survived better than the Commodores and Ataris of the world.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Some of the 16-bit competitors, like Amiga and Atari ST, had <em>some</em> market penetration in North America, but they were pretty niche compared to Europe.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>There were some others too, like Sony NEWS, but they mostly settled into the &ldquo;professional workstation market&rdquo; that was its own weird thing. Just like the international SGI, Sun and NeXT workstations, they had their own reasons for fading away.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>A lot of the earlier Japanese Windows games I have list their system requirements in terms of both PC-98 and IBM PC, even though they&rsquo;re not using anything specific to either platform.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
<li id="fn:4">
<p>Outside Japan the Amiga and many others also had high-quality multitasking GUIs for years, but I&rsquo;m focusing specifically on Japan here.<a href="#fnref:4" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Working Archivist's Guide to Enthusiast CD-ROM Archiving Tools]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/09/13/the-working-archivists-guide-to-enthusiast-cd-rom-archiving-tools/"/>
    <updated>2024-09-13T16:32:48-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/09/13/the-working-archivists-guide-to-enthusiast-cd-rom-archiving-tools</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve seen a lot of professional archivists who use flux disc image archiving techniques for their collections&mdash;a technique in which a specialized floppy controller captures the raw signal coming from the floppy drive so that it can be preserved and decoded in software. I haven&rsquo;t, however, seen many archivists using enthusiast-developed low-level reading techniques for CD-ROM. I&rsquo;ve personally been making use of these techniques and I find them very helpful; I know that many other archivists and institutions could make great use of them. However, I know that information about enthusiast-developed tools is usually deeply embedded in those communities and can be hard to find for others. As someone with a foot in both worlds, I want to try to bridge the gap and make this information available a bit more widely. This post will summarize why archivists might be interested in these tools, what they can do, and how to make use of them.</p>

<h2>Redump</h2>

<p>People who are familiar with emulation may think of Redump as collections of disc images online, but it&rsquo;s really a <em>metadata</em> database for CD-ROM preservation focused primarily on games. It collects metadata of transfers of disc images but also, crucially for us, it <em>sets standards</em> on how disc images should be created in order to ensure accuracy. Those standards are publicly available and are easy enough to follow by anyone&mdash;not just people looking to submit to Redump&rsquo;s database.</p>

<p>Because Redump&rsquo;s disc imaging standards are of sufficiently high quality, and their software and guides are freely available, I highly recommend them to all people looking to preserve CD-ROMs.</p>

<h2>What does dumping to Redump&rsquo;s standards do that typical dumping doesn&rsquo;t?</h2>

<p>Although the <em>end product</em> of Redump&rsquo;s dumping process is a disc image in the common BIN/CUE format, the actual <em>process</em> is different in some key ways.</p>

<p>Typically, when reading a CD-ROM, the data the computer receives has been processed and transformed by the drive&rsquo;s firmware. Data on a CD-ROM is stored in a scrambled<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> (encoded) format, which the drive&rsquo;s firmware descrambles into the standard format before the computer receives it. The firmware also performs checksum comparison using CD-ROM&rsquo;s builtin fixity format and automatically corrects any errors it finds. (The next section will describe the format of CD-ROM in more detail.)</p>

<p>By comparison, analogous to how a raw flux read performs a low level image of a floppy<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup> and then processes it using software, Redump&rsquo;s standards make use of raw reading functions that are available on a certain set of CD drives. These raw reading functions completely disable the processing the firmware would normally apply to data tracks: the data is read in its original scrambled form, with error correction disabled, so that data is returned in as close to its original form as possible. The software then performs descrambling and error correction after it&rsquo;s read. (For those interested in a more detailed technical summary of exactly what&rsquo;s being done here, <a href="https://github.com/superg/redumper?tab=readme-ov-file#good-drives-technical">the redumper README</a> goes into extensive detail.)</p>
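<p>To give a sense of what software descrambling involves, here is a small Python sketch. It is not taken from redumper or any other tool; it assumes the scrambler described in the CD-ROM specification (ECMA-130, Annex B): a 15-bit shift register with the polynomial x^15 + x + 1, preset to 1, whose output is XORed with every byte of a sector after the 12-byte sync pattern. The names are invented for the example.</p>

```python
SECTOR_SIZE = 2352
SYNC_SIZE = 12  # the sync pattern at the start of each sector is not scrambled


def scramble_table():
    """Generate the 2340-byte XOR sequence from a 15-bit shift register."""
    lfsr = 0x0001
    table = bytearray()
    for _ in range(SECTOR_SIZE - SYNC_SIZE):
        byte = 0
        for bit in range(8):
            byte |= (lfsr & 1) << bit  # output bits fill each byte LSB first
            feedback = (lfsr ^ (lfsr >> 1)) & 1
            lfsr = (lfsr >> 1) | (feedback << 14)
        table.append(byte)
    return bytes(table)


TABLE = scramble_table()


def descramble(sector):
    """XOR bytes 12..2351 with the table; the same call re-scrambles."""
    body = bytes(a ^ b for a, b in zip(sector[SYNC_SIZE:], TABLE))
    return sector[:SYNC_SIZE] + body
```

<p>Because the operation is a plain XOR, applying it twice returns the original bytes, which is why the same routine serves for both scrambling and descrambling.</p>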

<p>The primary benefit to performing rips this way is metadata: it&rsquo;s possible to log better, more legible information about the descrambling and integrity check processes when it&rsquo;s performed in software like this. The other benefit is that it becomes easier to reason about discs with unusual formats, discs with mastering errors from when they were produced, and discs with complex copy protection formats. Strangely-mastered or mis-mastered discs are surprisingly common, and this has been helpful for me in the past with a few discs that would otherwise have been difficult to reason about. Here are two recent examples:</p>

<ul>
<li>One disc contains a mastering error which corrupted the fixity data for a single 2048-byte sector. Using a typical read, this would manifest as a read error and it would be difficult to tell from the logs that this was the result of a mastering error and not disc damage. With a raw read, it became easier to separate out the reading process from the decoding process and thus to get a better understanding of what had happened.</li>
<li>One disc contains a mastering error which places 75 sectors (150KB) of data at the start of an audio track. This would otherwise have been very easy to miss, and may not have been properly decoded by the drive&rsquo;s firmware.</li>
</ul>


<h2>But Why? (aka, why is CD-ROM so weird?)</h2>

<p>The CD-ROM format is very complex, and not all software or all disc image formats support its full set of features.</p>

<ul>
<li>CD-ROM&rsquo;s relationship to the audio disc format means discs can have a complex structure.</li>
<li>&ldquo;ISO&rdquo; files can only represent the most simple kinds of discs.</li>
<li>CD has a builtin metadata format which most disc image formats don&rsquo;t support.</li>
<li>The same CD-ROM disc can have different data when viewed on different operating systems. OS-specific imaging tools may discard data for other OSs.</li>
</ul>


<h3>CD-ROM, CD audio, and multi track support</h3>

<p>The CD format wasn&rsquo;t originally designed for data at all&mdash;the original CD standard was purely designed around digital <em>audio</em>. The CD-ROM standard was only finalized later, and it acts as an extension to the CD audio format. The interaction between these two formats is the reason behind much of CD-ROM&rsquo;s complexity.</p>

<p>CD audio isn&rsquo;t a file-based format, and instead uses a series of unnamed, numbered <em>tracks</em>. CD-ROM extends this by making it possible for a <em>track</em> on a disc to contain data and a filesystem instead of audio. Since CD-ROM extends CD audio, the two formats aren&rsquo;t mutually exclusive: a CD-ROM disc can still contain multiple tracks, and it can even contain more than one data track or a mixture of data and audio tracks.</p>

<p>The most commonly used disc image file format, the ISO, doesn&rsquo;t support any of this advanced structure. An ISO represents a <em>data track</em>, not necessarily a full disc. Producing an ISO from a disc containing multiple tracks means that the rest of the disc is ignored, and only a single data track has been backed up.</p>

<p>The other unique feature of the ISO format compared to other disc image formats is that it omits fixity information. CD contains a builtin form of integrity protection, intended to protect against physical damage to a disc; up to a certain level of read error can be recovered using information in the error correction data. Typical data discs have sectors which are 2352 bytes long, of which 2048 bytes are data and 304 are error correction<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>. ISOs use a &ldquo;cooked&rdquo; format which strips the error correction component of each sector, leaving just the data. This data is less critical for a disc after it&rsquo;s been transferred to a disc image, but it does mean that it serves as a less accurate representation of the physical structure of the original disc.</p>
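<p>The relationship between the two layouts is simple enough to sketch in Python. This is a hedged illustration with made-up names, not any particular tool&rsquo;s code; it assumes Mode 1 sectors, where the 304 non-data bytes are split into a 16-byte header (sync, address and mode) before the data and 288 bytes after it.</p>

```python
RAW_SECTOR = 2352
HEADER = 16        # 12 sync bytes + 3 address bytes + 1 mode byte
USER_DATA = 2048


def raw_to_cooked(raw):
    """Strip a raw Mode 1 track down to its 2048-byte user data areas."""
    if len(raw) % RAW_SECTOR:
        raise ValueError("not a whole number of raw sectors")
    cooked = bytearray()
    for start in range(0, len(raw), RAW_SECTOR):
        sector = raw[start:start + RAW_SECTOR]
        if sector[15] != 1:
            raise ValueError("this sketch only handles Mode 1 sectors")
        cooked += sector[HEADER:HEADER + USER_DATA]
    return bytes(cooked)
```

<p>Going in this direction is lossless for the files themselves, but the discarded bytes can&rsquo;t be read back out of the ISO afterwards, which is why capturing the raw form first keeps more options open.</p>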

<h3>Subcode - CD&rsquo;s builtin metadata format</h3>

<p>CD defines a sidecar metadata format called the &ldquo;subcode&rdquo; or &ldquo;subchannel&rdquo;. It allows for small amounts of data to be stored alongside the audio or data on a disc. In most cases, it doesn&rsquo;t contain anything significant and so most CD disc image formats omit it entirely. However, it&rsquo;s possible for it to contain interesting or unique data that would be lost if it&rsquo;s not transferred along with a disc. Examples include <a href="https://en.wikipedia.org/wiki/CD-Text">CD-Text</a> (track names for CD audio discs); <a href="https://en.wikipedia.org/wiki/CD%2BG">CD graphics</a> (usually used for karaoke graphics on otherwise normal audio discs); and copy protection data for commercial software.</p>

<p>Other builtin metadata that&rsquo;s not typically preserved is contained in the disc&rsquo;s leadin and leadout segments. The leadin contains the disc&rsquo;s table of contents; typically, this information is preserved in a processed form via the drive&rsquo;s firmware, but not in the raw format direct from the disc. Likewise, the leadout contains finalizing metadata that isn&rsquo;t otherwise preserved when a CD is backed up.</p>

<h3>Multiple filesystems in a single track</h3>

<p>The CD-ROM format doesn&rsquo;t dictate which filesystem is used on a disc, and it&rsquo;s possible for a single track on a disc to contain more than one filesystem. This also means that the same disc can display drastically different content depending on whether it&rsquo;s inserted into a Windows, Mac or Linux PC. I&rsquo;ve personally witnessed a hybrid Mac/PC disc which had completely different contents on both systems, without a single shared file between them. This means that simply backing up a disc by copying the files off the disc is unsafe: you may be missing data from one of the other filesystems. This also means that filesystem-specific backup tools can be unsafe.</p>

<p>I&rsquo;ve seen some archivists use HFS Explorer to back up Mac CDs, for example, but this tool backs up individual <em>filesystems</em> from a disc&mdash;using it for a disc like this one would mean that the Windows contents would be completely lost. Even in the case that a disc is only for Mac, HFS Explorer doesn&rsquo;t necessarily preserve structural filesystem content in the same format as it was stored on disc.</p>
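<p>One quick way to check whether a data track carries more than one filesystem is to look for their well-known signatures. The Python sketch below is a rough heuristic with a hypothetical function name, not a substitute for a proper imaging workflow. It assumes a cooked track (2048 bytes per sector, as in an ISO) and checks only fixed offsets: the ISO 9660 volume descriptor in sector 16, an Apple partition map entry at byte 512, and an HFS or HFS+ volume header at byte 1024.</p>

```python
def detect_filesystems(track):
    """Return the filesystem signatures found in a cooked data track."""
    found = []
    if track[0x8001:0x8006] == b"CD001":   # ISO 9660 volume descriptor, sector 16
        found.append("ISO 9660")
    if track[512:514] == b"PM":            # partition map, typically wrapping HFS
        found.append("Apple partition map")
    if track[1024:1026] == b"BD":
        found.append("HFS")
    elif track[1024:1026] == b"H+":
        found.append("HFS+")
    return found
```

<p>A hybrid disc would typically report both an ISO 9660 entry and one of the Apple entries, which is a signal that a file-level or single-filesystem backup would leave something behind.</p>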

<h3>CD disc image formats</h3>

<p>There are a wide variety of disc image formats, many of which are specific to the vendor of a particular disc image reading program, and which can represent differing levels of a CD&rsquo;s features. A few common examples:</p>

<ul>
<li>ISO, as mentioned above, represents a <em>single data track</em> at the start of a disc, and isn&rsquo;t able to represent the remainder of a disc. It&rsquo;s stored in a &ldquo;cooked&rdquo; format with error correction data removed, and omits subcode data.</li>
<li>BIN/CUE, which can represent a full multi-track disc. Stored in a &ldquo;raw&rdquo; format, with error correction data retained. Modern versions of the format can include subcode data and can represent complex disc structures. It uses a human-readable metadata format called the &ldquo;<a href="https://en.wikipedia.org/wiki/Cue_sheet_(computing)">cue sheet</a>&rdquo;. The software I&rsquo;ll be talking about later in this post uses the modern extended versions of BIN/CUE.</li>
<li>CloneCD, which was originally created to properly back up discs with complex copy protection schemes. It supports the same complex disc structures as BIN/CUE, and preserves subcode information, but differs in that its metadata format is lower level and not intended to be human-readable.</li>
</ul>


<h3>In summary</h3>

<p>CD-ROM is a complex format with a large number of variations, and many disc image formats support only some of the kinds of discs which exist in the real world. Capturing in a complex format ensures nothing is lost while still leaving the flexibility to convert into a simpler format in the future.</p>

<h2>The Hardware</h2>

<p>Unlike floppy disk image flux archiving, there&rsquo;s no special enthusiast equipment needed here. Backing up CDs using these techniques uses certain models of standard off-the-shelf drives manufactured by Plextor. While these drives are no longer manufactured, they&rsquo;re readily available secondhand from eBay or computer recycling stores. They can frequently be purchased in good working condition for $40 or less. A full list of compatible drives can be found on the Redump wiki: <a href="http://wiki.redump.org/index.php?title=Optical_Disc_Drive_Compatibility">http://wiki.redump.org/index.php?title=Optical_Disc_Drive_Compatibility</a></p>

<p>This list contains a mixture of internal drives and USB-based external drives. Internal drives can also be converted into external drives using a cheap USB adapter.</p>

<h2>The Software</h2>

<p>There are a number of different tools available; this post will focus on the most popular ones and the ones with which I have personal experience. <a href="http://wiki.redump.org/index.php?title=Dumping_Guides">Redump&rsquo;s wiki</a> provides step-by-step usage guides for all of the tools I recommend.</p>

<h3>Media Preservation Frontend (Windows only)</h3>

<p>For users who prefer GUI tools to commandline tools, <a href="https://github.com/SabreTools/MPF">Media Preservation Frontend</a> (MPF) provides a graphical interface to the redumper, DiscImageCreator and Aaru tools. (This blog post won&rsquo;t be discussing Aaru.) Unfortunately, it&rsquo;s only available for Windows at this time.</p>

<p>It exposes each underlying tool&rsquo;s feature set to the fullest extent it can, and captures the appropriate metadata. Because it&rsquo;s oriented around submissions to the Redump database it also contains some data entry fields specific to Redump, but they&rsquo;re not mandatory and can be easily ignored.</p>

<h3>redumper</h3>

<p><a href="https://github.com/superg/redumper">redumper</a> is a relatively new commandline disc archiving program which has quickly emerged as the Redump community&rsquo;s new preferred disc backup tool. For archivists interested in using a commandline tool, redumper is my current recommendation.</p>

<p>Its feature set is relatively restricted compared to DiscImageCreator, but its opinionated defaults ensure it just does the right thing without extra configuration. Its focus on simplicity and reliability also extends to its metadata files: while it provides the same metadata as other options, it produces a smaller number of more organized files which I find easier to reason about. It also provides some additional metadata that I find useful.</p>

<h3>DiscImageCreator</h3>

<p><a href="https://github.com/saramibreak/DiscImageCreator">DiscImageCreator</a> was formerly the tool Redump recommended, but Redump&rsquo;s standards no longer recommend it. Compared to redumper, whose focus is reliability and simplicity, DiscImageCreator features a vast suite of options but is comparatively less reliable. Its metadata is also less organized and harder to read.</p>

<p>Its large feature set does mean that there are times when DiscImageCreator can come in handy for something specialized, but at the moment I don&rsquo;t recommend it as a primary tool.</p>

<h2>Converting from more complex formats to simpler ones</h2>

<p>After capturing in the formats produced by redumper and DiscImageCreator, it&rsquo;s possible to convert into simpler formats for access. This provides a useful tradeoff: the more complex formats are kept for longterm preservation, while copies in other formats can be temporarily produced for access and compatibility with software that needs plain ISO images.</p>

<p>On Mac and Linux, <a href="http://he.fi/bchunk/">bchunk</a> is an open source program which can convert BIN/CUE disc images into plain ISO files. For audio CDs or mixed-mode CDs which contain audio tracks, it can also convert audio tracks to WAV files. On Windows, <a href="https://www.isobuster.com">IsoBuster</a> can similarly convert disc images from one format to another.</p>
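<p>To make the raw-to-cooked conversion concrete, here is a minimal Python sketch of the core step for a single Mode 1 data track. It is not bchunk&rsquo;s actual code, and the function name <code>raw_mode1_to_iso</code> is made up for illustration. It assumes every raw sector is 2352 bytes with the 2048 bytes of user data starting after a 16-byte sync pattern and header; Mode 2 tracks place their user data at a different offset and are not handled here.</p>

```python
RAW_SECTOR_SIZE = 2352   # one full sector as stored in a raw BIN file
HEADER_SIZE = 16         # 12-byte sync pattern + 4-byte header (Mode 1 assumption)
USER_DATA_SIZE = 2048    # the payload that a cooked ISO keeps


def raw_mode1_to_iso(raw_stream, iso_stream):
    """Copy the user data of each raw Mode 1 sector; return the sector count."""
    sectors = 0
    while True:
        sector = raw_stream.read(RAW_SECTOR_SIZE)
        if not sector:
            break
        if len(sector) != RAW_SECTOR_SIZE:
            raise ValueError("input is not a whole number of 2352-byte sectors")
        # Drop the sync/header in front and the error correction data behind.
        iso_stream.write(sector[HEADER_SIZE:HEADER_SIZE + USER_DATA_SIZE])
        sectors += 1
    return sectors
```

<p>Both arguments are binary file objects, for example a track opened with <code>open(path, "rb")</code> and an output opened with <code>"wb"</code>. For real discs, a tested tool like bchunk remains the safer choice, since it also reads the cue sheet to find out what kind of track it is dealing with.</p>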

<p>Both redumper and DiscImageCreator produce their BIN/CUE images in a split format with one BIN file per track. For those who need a unified image with a single BIN for the same disc, <a href="https://github.com/putnam/binmerge">binmerge</a> (cross-platform, written in Python) and <a href="https://docs.mamedev.org/tools/chdman.html">chdman</a> (cross-platform, written in C) can perform the conversion.</p>

<h2>Useful metadata</h2>

<p>In addition to backing up discs, both redumper and DiscImageCreator produce some very useful metadata after the read is complete. This information isn&rsquo;t necessarily unique to this dumping technique&mdash;other software could do the same things after dumping a disc&mdash;but it&rsquo;s very useful to have this automatically performed for every disc.</p>

<p>Both redumper and DiscImageCreator produce machine-readable XML metadata describing each track on the disc: its size, and hashes in several formats. DiscImageCreator places it in a file named <code>.dat</code>, while redumper places it in the <code>dat:</code> section of its log file.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>&lt;rom name="moonlight (Track 1).bin" size="658917504" crc="ec48aea4" md5="ed350360b8f40c9c5fc4a8ce1bc41c99" sha1="8b0022a6b14842678f0beee961720103d6ca5431" /&gt;
</span><span class='line'>&lt;rom name="moonlight (Track 2).bin" size="21226800" crc="06284fb2" md5="e97b60b95764212ba4788911e236c349" sha1="8a112d2f60693f6c767d60514c9a35d3855c55b1" /&gt;
</span><span class='line'>&lt;rom name="moonlight (Track 3).bin" size="50189328" crc="2358ba07" md5="191b3f4132b862b8f9239cbe0ad22dd9" sha1="cfbb15b6782a482305a90dea00b1bf4288e617b3" /&gt;
</span><span class='line'>&lt;rom name="moonlight (Track 4).bin" size="25371024" crc="31a7d363" md5="1a5a08d9c4c4084e1a390ad5b32454bf" sha1="710ee4cb7a85d627ec9bc9c29deb0620a3d67cba" /&gt;</span></code></pre></td></tr></table></div></figure>
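<p>Entries like these make it easy to re-check a track file later. The following Python sketch shows one way to do it; the helper names are hypothetical, and it assumes the input is a fragment of bare <code>rom</code> elements like the listing above (a complete dat file with an XML declaration could be passed to <code>ET.fromstring</code> directly instead of being wrapped).</p>

```python
import hashlib
import zlib
import xml.etree.ElementTree as ET


def parse_rom_entries(dat_fragment):
    """Return {name: attributes} for each <rom> element in the fragment."""
    # Wrap in a dummy root so a bare list of <rom> elements parses as XML.
    root = ET.fromstring("<root>" + dat_fragment + "</root>")
    return {rom.get("name"): dict(rom.attrib) for rom in root.iter("rom")}


def hash_stream(stream, chunk_size=1 << 20):
    """Compute size, CRC-32, MD5 and SHA-1 of a binary stream in one pass."""
    size, crc = 0, 0
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    while chunk := stream.read(chunk_size):
        size += len(chunk)
        crc = zlib.crc32(chunk, crc)
        md5.update(chunk)
        sha1.update(chunk)
    return {
        "size": str(size),
        "crc": f"{crc:08x}",
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }


def matches(entry, computed):
    """True when every recorded value agrees with the freshly computed one."""
    return all(entry[key].lower() == computed[key]
               for key in ("size", "crc", "md5", "sha1"))
```

<p>To check a track, open its <code>.bin</code> file in binary mode, pass it to <code>hash_stream</code>, and compare the result against the entry with the same <code>name</code>. Reading in chunks keeps memory use flat even for a 650MB data track.</p>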


<p>For ISO 9660/PC format discs, both programs also extract mastering date information. This comes from the primary volume descriptor (PVD) information, and contains date information pertaining to the disc&rsquo;s creation. For example, from the logs for the same disc as the one above:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>ISO9660 [moonlight (Track 1).bin]:
</span><span class='line'>  volume identifier: CAFFE
</span><span class='line'>  PVD:
</span><span class='line'>0320 : 20 20 20 20 20 20 20 20  20 20 20 20 20 31 39 39                199
</span><span class='line'>0330 : 36 30 36 30 37 31 34 32  39 31 36 30 30 00 31 39   6060714291600.19
</span><span class='line'>0340 : 39 36 30 36 30 37 31 34  32 39 31 36 30 30 00 30   96060714291600.0
</span><span class='line'>0350 : 30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 00   000000000000000.
</span><span class='line'>0360 : 30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30   0000000000000000
</span><span class='line'>0370 : 00 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................</span></code></pre></td></tr></table></div></figure>


<p>This shows that the disc has the title <code>CAFFE</code>, and four embedded timestamps representing the disc&rsquo;s creation:</p>

<ul>
<li>Volume creation date and time - <code>1996060714291600</code>, aka June 7, 1996, at 14:29:16 (UTC)</li>
<li>Volume modification date - identical to the above</li>
<li>Volume expiration date - date the disc should be considered obsolete; often left with null values, as it is here</li>
<li>Volume effective date - date the disc should be used starting from; also often left null</li>
</ul>
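<p>These fields can also be decoded by hand. Each one is 17 bytes long: sixteen ASCII digits (year, month, day, hour, minute, second, hundredths of a second) followed by one signed byte giving the offset from GMT in 15-minute steps. The Python sketch below is an illustration rather than anything redumper or DiscImageCreator ships; the function name is made up, and the offsets are the standard ISO 9660 positions counted from the start of the PVD, which line up with the hex dump above (the creation date starts at <code>0x32D</code>).</p>

```python
from datetime import datetime, timedelta, timezone

# Byte offsets of the four timestamps within the 2048-byte PVD.
PVD_DATE_OFFSETS = {
    "creation": 813,
    "modification": 830,
    "expiration": 847,
    "effective": 864,
}


def parse_pvd_datetime(field):
    """Decode one 17-byte PVD timestamp; return None when it is left unset."""
    if len(field) != 17:
        raise ValueError("a PVD timestamp is exactly 17 bytes")
    digits = field[:16].decode("ascii")
    if not digits.strip("0 \x00"):
        return None  # all zeros or blank: no date was recorded
    # Last byte: signed offset from GMT, in 15-minute steps.
    gmt_offset = int.from_bytes(field[16:], "big", signed=True)
    return datetime(
        int(digits[0:4]), int(digits[4:6]), int(digits[6:8]),
        int(digits[8:10]), int(digits[10:12]), int(digits[12:14]),
        int(digits[14:16]) * 10_000,  # hundredths of a second -> microseconds
        tzinfo=timezone(timedelta(minutes=15 * gmt_offset)),
    )
```

<p>Given the 2048 bytes of a PVD as <code>pvd</code>, the creation date would be <code>parse_pvd_datetime(pvd[813:830])</code>. In a cooked ISO the PVD normally sits in sector 16, at byte offset 32768; in a raw BIN the sector size and header have to be accounted for first.</p>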


<p>Redumper also produces a full file listing for ISO 9660 discs, along with calculating their hashes. An abbreviated example from the same disc:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>*** SKELETON (time check: 3s)
</span><span class='line'>
</span><span class='line'>excluded areas hashes (SHA-1):
</span><span class='line'>1a7334e9350d06a69f5dbf1e8ec8ca9c98ad89da SYSTEM_AREA
</span><span class='line'>edcae21603e3564acfea07e81c205031101976ea /SAVER/OPENING.MOV
</span><span class='line'>1d73c3b2f53d251a56b61e0b75c6b5184600c4ae /SAVER/TOKIMEKI.MOV
</span><span class='line'>4f89fe21c61e44e1b9dedc85e09b2c1390055f9b /SAVER/ENDING.MOV
</span><span class='line'>091492f54a3a182921d5255ae3560f26d4dc4d11 /SAVER/CAFFES.MOV
</span><span class='line'>c1589aa3e8f55b86d0be614e835127d254eabb54 /README.TXT</span></code></pre></td></tr></table></div></figure>


<h2>What do all these files mean?</h2>

<p>Both redumper and DiscImageCreator produce a large number of files, which can be overwhelming at first; this list provides a little guide as to what those files mean, and which ones are most important to retain for longterm preservation.</p>

<h3>redumper</h3>

<p>A list of files can also be found on the <a href="http://wiki.redump.org/index.php?title=redumper#Output_Files">Redump wiki</a>.</p>

<ul>
<li>All <code>.bin</code> files - The disc&rsquo;s data and audio tracks, one file per track.</li>
<li><code>discname.log</code> - The full set of logs and metadata from the read process.</li>
<li><code>discname.cue</code> - The disc&rsquo;s table of contents (list of tracks) in a human-readable cuesheet format.</li>
<li><code>discname.toc</code> and <code>discname.fulltoc</code> - The disc&rsquo;s table of contents, in its original, low-level binary format.</li>
<li><code>discname.state</code> - The disc&rsquo;s original fixity information, in a binary format.</li>
<li><code>discname.subcode</code> - The subcode metadata, in its original binary format, as stored on the disc.</li>
<li><code>discname.scram</code> - The scrambled version of the disc, as a single file. While this is generally no longer needed after the reading process is complete and the data has been decoded, it contains the leadin and leadout data that is normally omitted when reading a disc; some people may elect to preserve it for that reason.</li>
</ul>


<h3>DiscImageCreator</h3>

<ul>
<li>All <code>.bin</code> files - The disc&rsquo;s data and audio tracks, one file per track.</li>
<li>All <code>.txt</code> files - The full set of logs and metadata from the read process. Unlike redumper, these are stored as a large number of separate files.</li>
<li><code>discname.sub</code> - The subcode metadata, in a processed binary format which reorders the data in order to be easier to read.</li>
<li><code>discname.cue</code> - The disc&rsquo;s table of contents (list of tracks) in a human-readable cuesheet format.</li>
<li><code>discname.ccd</code> - The disc&rsquo;s table of contents (list of tracks) in the CloneCD format, which is more complex and not designed to be read by humans.</li>
<li><code>discname.toc</code> - The disc&rsquo;s table of contents, in its original, low-level binary format.</li>
<li><code>discname.dat</code> - XML-format metadata for each track, containing file sizes and hashes/checksums in several formats. The same data is contained in the <code>.log</code> file from redumper.</li>
<li><code>discname.c2</code> - The disc&rsquo;s original fixity information, in a binary format.</li>
<li>Filenames containing <code>Track 0</code> and <code>Track AA</code> - The leadin and leadout sections of the disc.</li>
<li><code>discname.img</code> - A single-file copy of the disc&rsquo;s data. This duplicates exactly the contents of the <code>.bin</code> files, and can be easily recreated by concatenating them in the future, so it&rsquo;s not important to keep.</li>
<li><code>discname_img.cue</code> - A copy of the cuesheet adjusted for the above file.</li>
</ul>
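<p>The concatenation mentioned for the <code>.img</code> file needs nothing more than appending the per-track files in track order. A minimal Python sketch, with a hypothetical function name, assuming the caller supplies the paths already in the right order:</p>

```python
import shutil


def concatenate_tracks(track_paths, output_path):
    """Append each per-track BIN, in the order given, to one image file."""
    with open(output_path, "wb") as output:
        for path in track_paths:
            with open(path, "rb") as track:
                # copyfileobj streams in chunks, so large tracks are fine.
                shutil.copyfileobj(track, output)
```

<p>Take care with the ordering: a plain alphabetical sort puts <code>Track 10</code> before <code>Track 2</code> unless the track numbers are zero-padded. This sketch also doesn&rsquo;t produce a matching cuesheet, which is the part that tools like binmerge take care of.</p>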


<h2>Obtaining the tools</h2>

<p>All of these tools are open source and can be downloaded from GitHub.</p>

<ul>
<li>MPF: <a href="https://github.com/SabreTools/MPF/releases">https://github.com/SabreTools/MPF/releases</a></li>
<li>redumper: <a href="https://github.com/superg/redumper/releases">https://github.com/superg/redumper/releases</a></li>
<li>DiscImageCreator: <a href="https://github.com/saramibreak/DiscImageCreator/releases">https://github.com/saramibreak/DiscImageCreator/releases</a></li>
</ul>


<p>In addition, for Mac users, I package redumper and DiscImageCreator in Homebrew. While my packages aren&rsquo;t always 100% up to date, I try to ensure that they work. They can be installed via:</p>

<ul>
<li>redumper: <code>brew install mistydemeo/digipres/redumper</code></li>
<li>DiscImageCreator: <code>brew install mistydemeo/digipres/disc-image-creator</code></li>
</ul>


<h2>Limitations</h2>

<p>Certain especially complex types of copy protection are still not fully supported by these tools, although the situation is improving. While redumper recently added support for the SafeDisc protection format, for example, there are still discs it&rsquo;s not able to handle properly; closed-source tools such as CloneCD are still needed to handle these discs.</p>

<p>Redumper has plans to add support for ring-based copy protection such as Ring Protech in the future, but it&rsquo;s poorly supported at the moment; again, closed-source tools such as Alcohol 120% are necessary to handle these discs.</p>

<h2>Conclusion</h2>

<p>I hope this guide has been helpful for those who are interested. If readers have any questions or need any other information, please feel free to reach out to me <a href="https://digipres.club/@misty">on Mastodon</a> or <a href="https://bsky.app/profile/cdrom.ca">Bluesky</a>.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Amazingly, this is actually the technical term - see <a href="https://ecma-international.org/publications-and-standards/standards/ecma-130/">ECMA-130</a> Annex B.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>It&rsquo;s not quite analogous: a Redump-style disc rip isn&rsquo;t operating on <em>as</em> low a level as a raw flux read is, but it&rsquo;s lower-level than standard disc reading software. While the <a href="https://www.domesday86.com">Domesday86 project</a> exists to perform truly low-level raw laser dumps of laserdisc and LD-ROM discs, there isn&rsquo;t a mature project to apply the same technique to CD.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>There are a few alternate sector formats which divide up the 2352 bytes differently; they devote more space to data and less space to error correction, at the risk of making a disc more susceptible to physical damage.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA["GitHub" Is Starting to Feel Like Legacy Software]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/07/12/github-is-starting-to-feel-like-legacy-software/"/>
    <updated>2024-07-12T12:58:54-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/07/12/github-is-starting-to-feel-like-legacy-software</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve used a lot of tools over the years, which means I&rsquo;ve seen a lot of tools hit a plateau. That&rsquo;s not always a problem; sometimes something is just &ldquo;done&rdquo; and won&rsquo;t need any changes. Often, though, it&rsquo;s a sign of what&rsquo;s coming. Every now and then, something will pull back out of it and start improving again, but it&rsquo;s often an early sign of long-term decline. I can&rsquo;t always tell if something&rsquo;s just coasting along or if it&rsquo;s actually started to get worse; it&rsquo;s easy to be the boiling frog. That changes for me when something that <em>really</em> matters to me breaks.</p>

<p>To me, one of GitHub&rsquo;s killer power user features is its <code>blame</code> view. <code>git blame</code> on the commandline is useful but hard to read; it&rsquo;s not the interface I reach for every day. GitHub&rsquo;s web UI is not only convenient, but the ease by which I can click through to older versions of the blame view on a line by line basis is uniquely powerful. It&rsquo;s one of those features that anchors me to a product: I stopped using offline graphical git clients because it was just that much nicer.</p>

<p>The other day though, I tried to use the blame view on a large file and ran into an issue I don&rsquo;t remember seeing before: I just <em>couldn&rsquo;t find</em> the line of code I was searching for. I threw various keywords from that line into the browser&rsquo;s command+F search box, and nothing came up. I was stumped until, a moment later, I idly scrolled the page while running the search again and it finally found the line I was looking for. I realized what must have happened.</p>

<p>I&rsquo;d heard rumblings that GitHub&rsquo;s in the middle of shipping a frontend rewrite in React, and I realized this must be it. The problem wasn&rsquo;t that the line I wanted wasn&rsquo;t on the page&mdash;it&rsquo;s that the whole document wasn&rsquo;t being rendered at once, so my browser&rsquo;s builtin search bar just <em>couldn&rsquo;t find it</em>. On a hunch, I tried disabling JavaScript entirely in the browser, and suddenly it started working again. GitHub is <em>able</em> to send a fully server-side rendered version of the page, which actually works like it should, but doesn&rsquo;t do so unless JavaScript is completely unavailable.</p>

<p>I&rsquo;m hardly anti-JavaScript, and I&rsquo;m not anti-React either. Any tool&rsquo;s perfectly fine when used in the right place. The problem: this <em>isn&rsquo;t the right place</em>, and what is to me personally a key feature suddenly doesn&rsquo;t work right all the time anymore. This isn&rsquo;t the only GitHub feature that&rsquo;s felt subtly worse in the past few years&mdash;the once-industry-leading status page no longer reports minor availability issues in an even vaguely timely manner; Actions runs randomly drop network connections to GitHub&rsquo;s own APIs; hitting the merge button sometimes scrolls the page to the wrong position&mdash;but this is the first moment where it really hit me that GitHub&rsquo;s probably not going to get better again from here.</p>

<p>The corporate branding, the new &ldquo;AI-powered developer platform&rdquo; slogan, makes it clear that what I think of as &ldquo;GitHub&rdquo;&mdash;the traditional website, what are to me the core features&mdash;simply isn&rsquo;t Microsoft&rsquo;s priority at this point in time. I know many talented people at GitHub who care, but the company&rsquo;s priorities just don&rsquo;t seem to value what I value about the service. This isn&rsquo;t an anti-AI statement so much as a recognition that the tool I still need to use every day is past its prime. Copilot isn&rsquo;t navigating the website for me, replacing my need to use the website as it exists today. I&rsquo;ve had tools hit this phase of decline and turn it around, but I&rsquo;m not optimistic. It&rsquo;s still plenty usable now, and probably will be for some years to come, but I&rsquo;ll want to know what other options I have <em>now</em> rather than when things get worse than this.</p>

<p>And in the meantime, well&hellip; I still need to use GitHub every day, but maybe it&rsquo;s time to start exploring new platforms&mdash;and find a good local <code>blame</code> tool that works as well as the GitHub web interface used to. (Got a fave? Send it to me at <a href="https://digipres.club/@misty">misty@digipres.club</a> / <a href="https://bsky.app/profile/cdrom.ca">@cdrom.ca</a>. Please!)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Unlocking Puyo Puyo Fever for Mac's English Mode]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/04/07/unlocking-puyo-puyo-fever-for-macs-english-mode/"/>
    <updated>2024-04-07T22:31:23-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/04/07/unlocking-puyo-puyo-fever-for-macs-english-mode</id>
    <content type="html"><![CDATA[<p><em>The short, no-clickbait version: to switch the Mac version of Puyo Puyo Fever to English, edit <code>~/Library/Preferences/PuyoPuyo Fever/PUYOF.BIN</code> and set the byte at <code>0x288</code> to <code>0x01</code>&mdash;or just download <a href="http://mistys-internet.website/blog/data/puyofever/PUYOF.BIN">this pre-patched save game</a> and place it in that directory.</em></p>

<p><img src="http://mistys-internet.website/blog/images/puyofever/story.png" title="English Puyo Pop Fever in-game story mode screen" ></p>

<p>I&rsquo;ve been a Mac user since 2005, and one of the very first Mac games I bought was the Mac port of Sega&rsquo;s <a href="https://en.wikipedia.org/wiki/Puyo_Pop_Fever">Puyo Puyo Fever</a>. I&rsquo;ve always been a Sega fangirl and I&rsquo;ve always loved puzzle games (even if I&rsquo;m not that good at Puyo Puyo), so when they actually released a Puyo Puyo game for Mac I knew I had to get it. This was back in the days when very, very few companies released games for Mac, so there weren&rsquo;t many options. Even Sega usually ignored Mac users; Puyo Puyo Fever only came out as part of a marketing gimmick that saw Sega release a new port every month for most of a year, leading them to target more niche platforms like Mac, Palm Pilot and Pocket PC.</p>

<p>A few of the console versions came out in English, but the Mac port was exclusive to Japan. I didn&rsquo;t read any Japanese at the time, so I just muddled my way through the menus while wishing I could play it in English. I&rsquo;d thought that maybe I could try to transplant English game data from the console versions, but I didn&rsquo;t own any of them so I just resigned myself to playing the game in Japanese.</p>

<p>Recently, though, I came across some information that made me realize there might be more to it. First, I finally got to try the Japan-exclusive Dreamcast port from 2004&hellip; and discovered that it was fully bilingual, with an option in the menu to switch between Japanese or English text and voices. I might have just thought that Dreamcast players were lucky and I was still out of luck until I ran into the <a href="https://puyonexus.com/wiki/Guide:PPF_PC_Translation">English Puyo Puyo fan community</a>&rsquo;s mod to enable English support in the Windows version. Their technique, which was discovered by community members Yoshi and nmn around 2009, involves modifying not the game itself but a flag in the save game&mdash;the same flag used by the Dreamcast version, which it&rsquo;s still programmed to respect despite the menu option having been removed.</p>

<p>I wasn&rsquo;t able to use the <a href="https://puyonexus.com/forum/viewtopic.php?f=7&amp;t=1097">Windows save modding tool</a> produced by Puyo Puyo fan community member NickW for a couple of reasons:</p>

<ol type="a">
<li>It&rsquo;s hardcoded to open the save file from the Windows save location, <code>%AppData%\SEGA\PuyoF\PUYOF.BIN</code>, and can&rsquo;t just be given a save file at some other path, and</li>
<li>The Windows version uses compressed save data, while the Mac version always uses uncompressed saves, and so the editor won&rsquo;t try to open uncompressed saves.</li>
</ol>


<p>I could have updated the editor to work around this but, knowing that the save was uncompressed and I only had to change a single byte, it seemed like overkill. One byte is easy enough to edit without a specialized tool, so I just pulled out a hex editor. The Windows save editor is source-available, so I didn&rsquo;t have to reverse engineer the locations of the key flags in the save file myself. I guessed that the language flag offset wouldn&rsquo;t be different between the uncompressed Windows saves and the Mac saves, so after reading that it&rsquo;s stored at byte <code>0x288</code>, I tried changing it from <code>0x00</code> to <code>0x01</code> and started up the game.</p>
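<p>For anyone who would rather script the change than open a hex editor, the same one-byte edit can be sketched in a few lines of Python. This is only an illustration: the function name is made up, it assumes an uncompressed save like the Mac version&rsquo;s, and it overwrites the file in place, so keep a backup copy of the save first.</p>

```python
from pathlib import Path

LANGUAGE_FLAG_OFFSET = 0x288  # text language flag, per the Windows save editor
ENGLISH = 0x01


def set_text_language(save_path, value=ENGLISH):
    """Overwrite the text-language byte in place; return the previous value."""
    path = Path(save_path)
    data = bytearray(path.read_bytes())
    if len(data) <= LANGUAGE_FLAG_OFFSET:
        raise ValueError("save file is too short to contain the language flag")
    previous = data[LANGUAGE_FLAG_OFFSET]
    data[LANGUAGE_FLAG_OFFSET] = value
    path.write_bytes(data)
    return previous
```

<p>With the Mac save location, a call might look like <code>set_text_language(Path("~/Library/Preferences/PuyoPuyo Fever/PUYOF.BIN").expanduser())</code>. Returning the previous value makes it easy to confirm the flag really was <code>0x00</code> beforehand.</p>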

<p><img src="http://mistys-internet.website/blog/images/puyofever/title.png" title="English Puyo Pop Fever title screen" ></p>

<p>&hellip;and it just worked! Without any changes, the entire game swapped over to English&mdash;menus, dialogue, and even the title screen logo. After 20 years, suddenly I was playing Puyo Puyo Fever for Mac in English.</p>

<p>According to the Windows save editor, the next byte (<code>0x289</code>) controls the voice language. Neither the Windows nor the Mac versions actually shipped with English voices on the disc, however, so setting this value just ends up silencing the game instead. The fan community prepared an <a href="https://puyonexus.com/wiki/Guide:PPF_PC_Translation">English voice pack</a> taken from the other versions, but I didn&rsquo;t bother trying it on Mac since proper timing data for the English voices is missing.</p>

<p>At this point I figured I&rsquo;d discovered everything I was going to find until I noticed something at the start of the save data in the hex editor:</p>

<p><img src="http://mistys-internet.website/blog/images/puyofever/hexeditor.png" width="300" title="Screenshot of a hex editor showing an image-like pattern" ></p>

<p>I&rsquo;d only been paying attention to data later in the file, so I&rsquo;d overlooked the very beginning until now. But now that I looked at it, it was a <em>very</em> regular pattern. It looks suspiciously like an image; uncompressed bitmaps are usually recognizable to the naked eye in a hex editor, and I wondered if that could be what this was. So I dug out the Dreamcast version again, and lo and behold:</p>

<p><img src="http://mistys-internet.website/blog/images/puyofever/PUYOFEVERSYS.png" title="A square pixel art image of a sign with the Japanese hiragana symbol &#34;pu&#34;" alt="A square pixel art image of a sign with the Japanese hiragana symbol &#34;pu&#34;"></p>

<p>It&rsquo;s the Dreamcast version&rsquo;s save icon, displayed in the Dreamcast save menu and on the portable VMU save device. The Mac version doesn&rsquo;t have any reason to need this, and has nowhere to display it, but it&rsquo;s there anyway. Looking at the start of the header made me realize the default save file name from the Dreamcast port is there too&mdash;the very first bytes read 「システムファイル」, or &ldquo;System File&rdquo;. Grabbing an original Dreamcast save file, I was able to confirm that the Mac save is <em>completely identical</em> to the Dreamcast version, except for rendering multi-byte fields in big-endian format<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. I guess by 2004 there was no reason to spend time rewriting the save data format just to save a few hundred bytes, so all the Dreamcast-specific features come along for the ride on Mac and Windows.</p>
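<p>As a small illustration of what that byte order difference means in practice, here is a Python sketch that reads the same four bytes both ways. The 32-bit field width and the function name are assumptions chosen for the example; this post doesn&rsquo;t document the actual field sizes in the save format.</p>

```python
import struct


def read_u32(data, offset, big_endian):
    """Read an unsigned 32-bit integer at offset in the chosen byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, data, offset)[0]


raw = bytes.fromhex("12345678")
as_big_endian = read_u32(raw, 0, big_endian=True)      # 0x12345678
as_little_endian = read_u32(raw, 0, big_endian=False)  # 0x78563412
```

<p>Converting a field from one platform&rsquo;s save to the other&rsquo;s comes down to reversing its bytes, which is why the two files can otherwise be identical.</p>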

<p>Now, you might ask, why would I spend so much time on a Mac port that doesn&rsquo;t even run on modern computers? (Though I&rsquo;d be happy to fix that - Sega, <a href="mailto:mistydemeo+puyopuyo@gmail.com">email me!</a>) Part of it is just that I love digging into older games like this to find out what makes them tick; it&rsquo;s as much a hobby as actually playing them. The other part, of course, is that I&rsquo;ll actually play it. As you might be able to guess from the <a href="https://github.com/mistydemeo/tigerbrew">PowerPC Mac package manager</a> I maintain, I still keep my old Macs around and every now and then I break out the PowerMac G4 for a few rounds of Puyo Puyo Fever. The next time I do, I&rsquo;ll be able to play it in English.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>The <a href="https://en.wikipedia.org/wiki/Endianness">byte order</a>, or endianness, of multi-byte data types is different between different kinds of CPUs. The PowerPC processors used by that era of Macs use the big endian format.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[That Time I Accidentally Deleted a Game From MAME]]></title>
    <link href="http://mistys-internet.website/blog/blog/2024/03/01/that-time-i-accidentally-deleted-a-game-from-mame/"/>
    <updated>2024-03-01T03:25:45-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2024/03/01/that-time-i-accidentally-deleted-a-game-from-mame</id>
    <content type="html"><![CDATA[<p>A while back, I had the chance to <a href="https://www.mistys-internet.website/blog/blog/2022/10/27/how-i-dumped-an-arcade-game-for-mame/">dump a game for MAME</a>. I told myself that if the chance ever came up again, I&rsquo;d contribute again. Luckily, it turns out I didn&rsquo;t have to wait too long&mdash;but the story didn&rsquo;t end like I expected it to.</p>

<p><img src="http://mistys-internet.website/blog/images/martmasttw/0013.big.png" alt="In-game screenshot of Martial Masters" /></p>

<p>When I bought my PGM arcade motherboard, the #1 game I wanted to own was a one-on-one fighting game called <a href="https://wiki.supercombo.gg/w/Martial_Masters">Martial Masters</a>. It&rsquo;s a deeply underrated, gorgeous game&mdash;and judging from the price it goes for these days, I&rsquo;m not the only one after it. It took quite a bit of hunting until I found a copy within my price range but my usual PGM game dealer in China finally tracked down a copy to sell me a few months ago. I was excited to finally play it on the original hardware, but also to see if I had another chance to contribute a game to MAME.</p>

<p>When it arrived, even before I had the chance to check the version number, I was surprised to see it was a Taiwanese region game. All of IGS&rsquo;s games have simplified Chinese region variants for sale in China; it&rsquo;s unusual to see a traditional Chinese version from Taiwan show up over there. It could just be a sign that the game was so popular they brought over extra cartridges from Taiwan when there weren&rsquo;t enough for local arcades. Once I booted the game and made note of its version numbers, I checked MAME and saw that there was a matching game in its database: <code>martmasttw</code>, or a special Taiwanese version of revision 1.02. That also surprised me&mdash;IGS typically didn&rsquo;t produce entirely separate builds for different regions. Instead, each of their games contains the data for every language and region in its ROMs, and the region code in its copy protection chip determines what region it boots up as.</p>

<p><img src="http://mistys-internet.website/blog/images/martmasttw/0000.big.png" alt="Screenshot of Martial Masters crashing" /></p>

<p>The other thing I noticed about MAME&rsquo;s <code>martmasttw</code> was a comment in the source code noting that it might be a bad dump&mdash;that is, an invalid read that produced corrupted data. This isn&rsquo;t that uncommon when dumping these sorts of games. Whether it&rsquo;s due to dying chips or hardware issues with the reading process, sometimes a read just goes wrong and it gets missed. Once I booted it up in MAME, I confirmed it looked like a bad dump. It instantly crashes with an illegal instruction error, a clear sign of corrupted program code. Now that I owned the game, I had a chance to dump the correct ROMs and fix MAME&rsquo;s database.</p>

<p><img src="http://mistys-internet.website/blog/images/martmasttw/gamechip.jpeg" alt="Photo of a game chip being held" /></p>

<p>As soon as I opened the cartridge, I noticed something interesting: these weren&rsquo;t the chips I was expecting. Like with The Gladiator, I only needed to remove and dump two socketed chips, but these were a completely different model. Other PGM games using the same hardware typically use 27C322 (4MB) and 27C160 (2MB) chips, which were common EPROMs in their time period. Here, though, I saw something much more exotic: an OKI 27C3202 soldered into a custom adapter. The game board itself is essentially the same one that&rsquo;s in The Gladiator, so it was clear that the adapter was presenting them as 4MB 27C322 chips.</p>

<p>I haven&rsquo;t been able to figure out why it was designed this way. It can&rsquo;t have been cheap to design and manufacture these custom adapters, and other PGM games that were made both before and after this one all use the more common chips without any adapters. I&rsquo;ve only seen a single other game built this way. Was there a 27C322 shortage at the time this specific game was being made? Were they experimenting with new designs and ended up abandoning this approach? It&rsquo;s hard to tell.</p>

<p><img src="http://mistys-internet.website/blog/images/martmasttw/reader.jpeg" alt="Photo of a game chip being dumped in an EPROM reader" /></p>

<p>I only have an EPROM reader adapter for chips in the 27C322 family, so I hoped it would be able to handle reading them just fine. On my first attempt, the reader rejected the chip; as far as I can tell, it was trying to perform &ldquo;smart&rdquo; verification of the chip, which failed since the chip underneath IGS&rsquo;s adapter isn&rsquo;t actually the chip it&rsquo;s trying to query. I ultimately tricked it by inserting a real 27C322 first and reading that before swapping over to the chip I actually wanted to read. Once the reader&rsquo;s recognized at least one chip, it seems happy to stay in 27C322 mode persistently.</p>

<p>My first read seemed fine, and the dumped data did have a different hash from what MAME recognized. Success! &hellip;or so I thought, until I tried actually booting the game, where it crashed again. I went back to the EPROM reader to make sure the chip was seated correctly before doing a new test read. From the physical design of the adapters, I knew that getting it seated might be a challenge.</p>

<p>The reader uses a <a href="https://en.wikipedia.org/wiki/Zero_insertion_force">ZIF socket</a> which usually makes it easy to insert and remove chips. This time, though, there was an interesting complication. Because of how it&rsquo;s constructed, the socket has a &ldquo;lip&rdquo; at the end past the final set of pins. With a normal 27C322, that&rsquo;s not a problem; the chip ends right at the final set of pins, so nothing hangs over the end of the chip. This adapter has a very different shape from a real 27C322 chip, however&mdash;there&rsquo;s a dangling &ldquo;head&rdquo; that contains the actual chip, as seen in the photo above showing the underside of the adapter. On the real board it hangs harmlessly over the end of the socket, but on a ZIF socket it ends up actually making contact with the end of the socket and keeps the pins from being able to sit as deeply as it would normally sit. I haven&rsquo;t spoken to the person who originally dumped this revision, but I suspect that this is the issue behind the bad dump.</p>

<p>I ended up holding the adapter with one hand to stabilize it and keep all of the pins as even as I could while I locked the ZIF socket&rsquo;s lever a second time; this time, it seemed as though I&rsquo;d been able to get it sitting as even as possible. I then performed several more reads and, before trying to boot it again, compared them against each other. This time, I saw that these new reads were different from the first attempt&mdash;<em>and</em> that they were byte-for-byte identical to each other.</p>
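
<p>The comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling I actually used: the byte strings below are made-up stand-ins for real dumps, and the helper names are hypothetical.</p>

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the SHA-1 digest of one dump as a hex string."""
    return hashlib.sha1(data).hexdigest()


def reads_agree(reads: list[bytes]) -> bool:
    """True when every read hashes to the same value as the others."""
    if not reads:
        raise ValueError("need at least one read to compare")
    return len({sha1_of(read) for read in reads}) == 1


# Made-up stand-ins for dumps of the same chip (real dumps are megabytes long).
suspect_first_read = bytes([0x00, 0xFF, 0x12, 0x34])
later_reads = [bytes([0x4E, 0x71, 0x12, 0x34])] * 3

print(reads_agree(later_reads))                         # the reseated reads
print(reads_agree([suspect_first_read] + later_reads))  # mixed with the first read
```

<p>Several identical reads don&rsquo;t prove a dump is good, but a read that disagrees with the others is a strong hint that something went wrong with the seating.</p>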

<p><img src="http://mistys-internet.website/blog/images/martmasttw/0003.big.png" alt="Screenshot of Martial Masters's title screen" /></p>

<p>Once I had what seemed like a good dump of both chips, I booted them up in MAME to see if it would work. Unlike MAME&rsquo;s ROMs, it booted right away without issues and worked perfectly. After I played a few rounds without a single crash or unexpected behaviour, I was satisfied that my new dumps were fine. As I was getting ready to submit a pull request to MAME to update the hashes in its database, however, I happened to grep the source for them and noticed something funny&mdash;they were already there. In another version of Martial Masters.</p>

<p>I mentioned earlier that I was surprised that MAME had labelled the Taiwanese 1.02 version of Martial Masters as a separate revision from the Chinese 1.02. Well, as it turns out, once the ROMs are dumped correctly it&rsquo;s <em>not</em> a separate revision. The ROMs are actually byte-for-byte identical; it&rsquo;s only the bad dump that had made MAME consider <code>martmasttw</code> a separate revision this whole time.</p>

<p>This is the point where I&rsquo;d intended to open a pull request to MAME just updating a few hashes for the correct dump, but with everything I&rsquo;d learned the <a href="https://github.com/mamedev/mame/pull/11883">final pull request</a> deleted <code>martmasttw</code> entirely. I had set out to fix a revision of the game in MAME, and make one more version of it playable. Instead, I&rsquo;d proven <em>it didn&rsquo;t exist in the first place</em>. This wasn&rsquo;t where I expected to end up, but it does teach an important lesson: corrupted data can go unnoticed for years if it&rsquo;s not double and triple checked.</p>

<p>And, more than that, it&rsquo;s a reminder that <em>databases are an eternal work in progress</em>. MAME&rsquo;s list of ROMs is also as close as there is to a global catalogue of arcade games and their revisions, but it&rsquo;s still fallible. Databases grow and, sometimes, they shrink; proving a work <em>doesn&rsquo;t</em> exist can be just as important as uncovering new works.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Fixing Classical Cats; or, How I Got Tricked by 28-year-old Defensive Programming]]></title>
    <link href="http://mistys-internet.website/blog/blog/2023/12/10/fixing-classical-cats-or/"/>
    <updated>2023-12-10T21:49:36-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2023/12/10/fixing-classical-cats-or</id>
    <content type="html"><![CDATA[<p>Every now and then, when working on ScummVM&rsquo;s Director engine, I run across a disc that charms me so much I just have to get it working right away. That happened when I ran into Classical Cats, a digital art gallery focused on the work of Japanese artist and classical musician Mitsuhiro Amada. I <a href="https://cdrom.ca/games/2023/12/06/classical-cats.html">wrote about the disc&rsquo;s contents</a> in more detail at my <a href="https://cdrom.ca">CD-ROM blog</a>, but needless to say I was charmed&mdash;I wanted to share this with more people.</p>

<p><img src="http://mistys-internet.website/blog/images/classicalcats/scummvm-classicalcats-mac-ja-00004.png" alt="Screenshot of a cat playing piano next to a cat playing a violin and a cat playing cello" /></p>

<p>I first found out about Classical Cats when fellow ScummVM developer einstein95 pointed me at it because its music wasn&rsquo;t working. Like a lot of early Director discs, Classical Cats <em>mostly</em> just worked on the first try. At this point in ScummVM&rsquo;s development, I&rsquo;m often more surprised if a disc made in Director 3 or 4 fails to boot right away. The one thing that didn&rsquo;t work was the music.</p>

<p>Classical Cats uses CD audio for its music, and I&rsquo;d already written code to support this in early releases of <a href="https://www.mobygames.com/game/2059/alice-an-interactive-museum/">Alice: An Interactive Museum</a> for Mac. I&rsquo;d optimistically hoped that Classical Cats might be as easy, but it turned out to present some extra technical complexity. Regardless, for a disc called &ldquo;Classical&rdquo; Cats, I knew that getting music working would be important. I could tell that I wasn&rsquo;t having the full experience.</p>

<p>While many CD-ROMs streamed their music from files on the disc, some discs used CD audio tracks for music instead. (If you&rsquo;re already familiar with CD audio and mixed-mode CDs, you can skip to the next paragraph.) CD audio is the same format used in audio CDs; these tracks aren&rsquo;t files in a directory and don&rsquo;t have names, but are simply numbered tracks like you&rsquo;d see in a CD player. Data on a CD is actually contained within a track on the disc, just like audio; data tracks are just skipped over by CD players. A <em>mixed mode</em> CD is one that contains a mixture of one or more data tracks and one or more audio tracks on the same disc. This was often used by games and multimedia discs as a simple and convenient way to store their audio.</p>

<p>Director software is written in its own programming language called Lingo; I&rsquo;ve written about it <a href="https://www.mistys-internet.website/blog/blog/2022/01/06/do-you-speak-the-lingo/">a few</a> <a href="https://www.mistys-internet.website/blog/blog/2023/05/29/untangling-another-lingo-parser-edge-case/">times</a> before. In addition to writing logic in Lingo, developers are able to write modules called XObjects; these can be implemented in another language like C, but expose an interface to Lingo code. It works very similarly to C extensions in languages like Ruby or Python.</p>

<p>While ScummVM is able to run Lingo code directly, it doesn&rsquo;t emulate the original XObjects. Instead, it contains new clean-room reimplementations embedded into ScummVM that expose the same interfaces as the originals. If a disc tries to call an unimplemented XObject, ScummVM just logs a warning and is able to continue. I&rsquo;d already implemented one of Director&rsquo;s builtin audio CD XObjects, which was how I fixed Alice&rsquo;s music earlier.</p>

<p>ScummVM has builtin support for playing emulated audio CDs by replacing the audio tracks with MP3 or FLAC files. For Alice, I <a href="https://github.com/scummvm/scummvm/pull/4231">wrote</a> an implementation of Director&rsquo;s builtin Apple Audio CD XObject. That version was straightforward and easy to implement; it has a minimal API that allows an app to request playback of a CD via track number, which maps perfectly onto ScummVM&rsquo;s virtual CD backend.</p>

<p>I already knew Classical Cats uses a different XObject, and so I&rsquo;d have to write a new implementation for it. It turns out the API was very different from Alice&rsquo;s. Alice, along with many other Director games I&rsquo;ve looked at, uses a fairly high-level, track-oriented API that was simple to implement. ScummVM&rsquo;s builtin CD audio infrastructure is great at handling requests like &ldquo;play track 5&rdquo;, or &ldquo;play the first 30 seconds of track 7&rdquo;. What it&rsquo;s not at all prepared for is requests like &ldquo;play from position 12:00:42 on the disc&rdquo;.</p>

<p>You can probably guess what Classical Cats does! Instead of working with tracks, it starts and stops playback based on absolute positions on a disc. This may sound strange, but it&rsquo;s how the disc itself is set up. On a real CD, tracks themselves are just indices into where tracks start and stop on a disc, and a regular CD player looks up those indices to decide where to seek to when you ask it to play a particular track. In theory, it&rsquo;s pretty similar to dropping a record player needle on a specific spot on the disc.</p>

<p>This might not sound too complex to manage, but there&rsquo;s actually something that makes it a lot harder: translating requests to play an absolute timecode to an audio file on disc. ScummVM isn&rsquo;t (usually) playing games from a real CD, but emulating a drive using the game data and FLAC or MP3 files replacing the CD audio tracks. ScummVM generally plays games using the data extracted from the CD into a folder on the hard drive, which causes a problem: the data track on a mixed mode CD is usually the first track, which means that the timing of every other track on the disc is offset by the length of the data track. We can&rsquo;t guess where anything else is stored without knowing exactly how long the data track is. If we&rsquo;ve extracted the data from the CD, we no longer know how big that track is, and we can&rsquo;t guess at the layout of the rest of the disc.</p>

<p>&ldquo;Knowing the disc layout&rdquo; is a common problem with CD ripping and authoring, and a number of standards exist already. Single-disc data CDs can easily be represented as an ISO file, but anything more complex requires an actual table of contents. When thinking about how to solve this problem for ScummVM, I immediately thought of <a href="https://en.wikipedia.org/wiki/Cue_sheet_(computing)">cuesheets</a>&mdash;one of the most popular table of contents formats for CD ripping, and one that&rsquo;s probably familiar to gamers who have used BIN/CUE rips of 32-bit era video games. Among all the formats available for documenting a disc&rsquo;s table of contents, cuesheets were attractive for a few reasons: I&rsquo;ve worked with them before, so I&rsquo;m already familiar with the format; they&rsquo;re human-readable, so it&rsquo;s easy to validate that they&rsquo;re being used properly; and they provide a simple, high-level interface that abstracts away details I wouldn&rsquo;t need to implement this feature. A sample cuesheet for a mixed mode CD looks something like this:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>FILE "CLSSCATS.BIN" BINARY
</span><span class='line'>  TRACK 01 MODE1/2352
</span><span class='line'>    INDEX 01 00:00:00
</span><span class='line'>  TRACK 02 AUDIO
</span><span class='line'>    PREGAP 00:02:00
</span><span class='line'>    INDEX 01 17:41:36
</span><span class='line'>  TRACK 03 AUDIO
</span><span class='line'>    INDEX 01 19:20:46
</span><span class='line'>  TRACK 04 AUDIO
</span><span class='line'>    INDEX 01 22:09:17</span></code></pre></td></tr></table></div></figure>


<p>Once you understand the format, it&rsquo;s straightforward to read and makes it clear exactly where every track is located on the disc.</p>
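
<p>To show how little is needed to pull track positions out of a sheet like the one above, here&rsquo;s a minimal Python sketch. It is not ScummVM&rsquo;s parser: it only understands <code>TRACK</code> and <code>INDEX 01</code> lines, ignores <code>PREGAP</code> and everything else, and the function name is made up for this example.</p>

```python
FRAMES_PER_SECOND = 75  # cuesheet times are minutes:seconds:frames

SAMPLE = """\
FILE "CLSSCATS.BIN" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    PREGAP 00:02:00
    INDEX 01 17:41:36
  TRACK 03 AUDIO
    INDEX 01 19:20:46
  TRACK 04 AUDIO
    INDEX 01 22:09:17
"""


def track_starts(cuesheet: str) -> dict[int, int]:
    """Map each track number to the frame position of its INDEX 01 entry."""
    starts = {}
    current_track = None
    for line in cuesheet.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "TRACK":
            current_track = int(words[1])
        elif len(words) == 3 and words[:2] == ["INDEX", "01"] and current_track is not None:
            minutes, seconds, frames = (int(part) for part in words[2].split(":"))
            starts[current_track] = (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames
    return starts


print(track_starts(SAMPLE))
```

<p>A real parser has to deal with multiple <code>FILE</code> entries, pregaps and other index numbers, which is a big part of why I wasn&rsquo;t eager to write one myself.</p>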

<p>The main blocker here was simply that ScummVM didn&rsquo;t have a cuesheet parser yet, and I wasn&rsquo;t eager to write one myself. Just when I was on the verge of switching to another solution, however, ScummVM project lead Eugene Sandulenko offered to write a new one integrated into ScummVM itself. As soon as that was ready, I was able to get to work.</p>

<p>The XObject Classical Cats uses has a fairly complicated interface that&rsquo;s meant to support not just CDs, but also media like video cassettes. To keep things simple, I decided to limit myself to implementing just the API that this disc uses and ignore methods it never calls. It&rsquo;s hard to make sure my implementation&rsquo;s compatible if I don&rsquo;t actually see parts of it in use, after all. By watching to see which method stubs are called, I could see that I mainly had to deal with a limited set of methods. Aside from being able to see which methods are called and the arguments passed to them, I was able to consult the official documentation in the Director 4.0 manual.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<p>Two of the most fundamental methods I began with were <code>mSetInPoint</code> and <code>mSetOutPoint</code>, whose names were pretty self-explanatory. Rather than have a single method to begin playback with start/stop positions, this library uses a cue system. Callers first call <code>mSetInPoint</code> to define the start playback position and <code>mSetOutPoint</code> to set a stop position. These positions are tracked in <em>frames</em>, a unit representing 1/75th of a second.</p>
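
<p>Since everything from here on is counted in frames, it helps to see the arithmetic spelled out. The helpers below are a hypothetical Python sketch rather than anything from ScummVM; they convert between a minutes:seconds:frames position and a plain frame count.</p>

```python
FRAMES_PER_SECOND = 75  # one frame is 1/75th of a second


def msf_to_frames(minutes: int, seconds: int, frames: int) -> int:
    """Convert a minutes:seconds:frames position into a total frame count."""
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def frames_to_msf(total: int) -> tuple[int, int, int]:
    """Convert a total frame count back into (minutes, seconds, frames)."""
    whole_seconds, frames = divmod(total, FRAMES_PER_SECOND)
    minutes, seconds = divmod(whole_seconds, 60)
    return minutes, seconds, frames


# The "play from position 12:00:42" request mentioned earlier, as a frame count.
position = msf_to_frames(12, 0, 42)
print(position)                 # 54042
print(frames_to_msf(position))  # (12, 0, 42)
```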

<p>On a real drive, they can then call <code>mPlayCue</code> to seek to the start of the position so that the drive is ready. Given the slow seek times of early CD-ROM drives, this separation forced developers to consider that the device might not actually be able to start playback as soon as they request it and take that into account with their app&rsquo;s interactive features. After starting the seek operation, the developer was meant to repeatedly call <code>mService</code> to retrieve a status code and find out whether the drive was still seeking, had finished seeking, or encountered an error. Since ScummVM is usually acting on an emulated drive without actual seek times, I simplified this. <code>mSetInPoint</code> and <code>mSetOutPoint</code> simply assign instance variables with the appropriate values, and <code>mService</code> always immediately returns the &ldquo;drive ready&rdquo; code.</p>
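
<p>As a rough picture of that simplification, here&rsquo;s a toy Python model of the cue-style API on a drive with no seek time. The real implementation lives in ScummVM&rsquo;s C++ code; the class name, the Python method names and the status value here are all assumptions made for the example.</p>

```python
DRIVE_READY = 0  # hypothetical status value; the real status codes aren't shown here


class EmulatedCueDrive:
    """Toy model of the in-point/out-point API on a drive that never has to seek."""

    def __init__(self):
        self.in_point = None
        self.out_point = None
        self.cued = False

    def set_in_point(self, frame: int) -> None:   # stands in for mSetInPoint
        self.in_point = frame

    def set_out_point(self, frame: int) -> None:  # stands in for mSetOutPoint
        self.out_point = frame

    def play_cue(self) -> None:                   # stands in for mPlayCue
        # A real drive would begin seeking here; the emulated one is ready at once.
        self.cued = True

    def service(self) -> int:                     # stands in for mService
        # There is never a seek in progress, so always report "ready".
        return DRIVE_READY


drive = EmulatedCueDrive()
drive.set_in_point(79611)
drive.set_out_point(87046)
drive.play_cue()
print(drive.service() == DRIVE_READY)
```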

<p>At this point, I did what I should have done in the first place and checked the source code. As I mentioned in a <a href="https://www.mistys-internet.website/blog/blog/2022/01/06/do-you-speak-the-lingo/">previous post</a>, early Director software includes the source code as a part of the binary, and luckily that&rsquo;s true for Classical Cats. As I checked its CD-ROM helper library, I stumbled on the method that made me realize exactly where I&rsquo;d gone wrong:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>on mGetFirstFrame me, aTrack
</span><span class='line'>  put the pXObj of me into myXObj
</span><span class='line'>  if myXObj(mRespondsTo, "mGetFirstFrame") = 0 then
</span><span class='line'>    return 0
</span><span class='line'>  else
</span><span class='line'>    return  myXObj(mGetFirstFrame, aTrack)
</span><span class='line'>  end if
</span><span class='line'>end</span></code></pre></td></tr></table></div></figure>


<p>This code might be familiar to Rubyists, since Ruby has a very similar construct. This class wraps the AppleCD SC XObject, instantiated in the instance variable <code>myXObj</code>, and calls methods on it. But it&rsquo;s written defensively: before calling a number of methods, it calls <code>mRespondsTo</code> first to see if <code>myXObj</code> has the requested method. If it doesn&rsquo;t, it just stubs it out instead of erroring. Since ScummVM implements <code>mRespondsTo</code> correctly, it means this code was doing what the original authors intended: seeing that my implementation of AppleCD SC didn&rsquo;t have an <code>mGetFirstFrame</code> method, and just returning a stub value. Unfortunately for me, I was being lazy and had chosen which methods to implement based on seeing the disc try to use them&mdash;so I let myself be tricked into thinking those methods were never used.</p>
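
<p>The same trap is easy to reproduce in other languages. Here&rsquo;s a small Python analogue, using <code>hasattr</code> in place of <code>mRespondsTo</code>; the class names and the frame table are invented for the example.</p>

```python
class PartialXObject:
    """Hypothetical reimplementation covering only methods seen being called."""

    def play_cue(self):
        return "playing"


class FullerXObject(PartialXObject):
    """The same object once the missing method has been added."""

    def get_first_frame(self, track):
        # Made-up table of track start positions, in frames.
        return {2: 79611, 3: 87046}[track]


def get_first_frame(xobj, track):
    """Defensive wrapper in the style of the disc's Lingo helper."""
    if not hasattr(xobj, "get_first_frame"):
        return 0  # silent fallback: no error, and no call for anyone to observe
    return xobj.get_first_frame(track)


print(get_first_frame(PartialXObject(), 2))  # 0
print(get_first_frame(FullerXObject(), 2))   # 79611
```

<p>From the outside, the first call looks exactly like the method was never needed at all, which is how I ended up fooled.</p>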

<p>As it turns out, they were actually key to getting the right timing data. Classical Cats was trying to ask the CD drive about timing information for tracks, and storing that to use to actually play the songs. With these methods missing, it was stuck without knowing where the songs were and how to play them.</p>

<p>And here I realized the great irony of what I was doing. Internally, Classical Cats thinks about its audio in terms of tracks, and asks the XObject for absolute timing data for each track. It then passes that data back into the XObject to play the songs, where ScummVM intercepts it and translates it back into track-oriented timing so its CD drive emulation knows how to play them. It&rsquo;s a lot of engineering work just to take it all full circle.</p>
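
<p>The &ldquo;translate it back&rdquo; half of that round trip boils down to a lookup like the one below. This is a simplified Python sketch, not ScummVM&rsquo;s code: the start positions are the <code>INDEX 01</code> values from the sample cuesheet above converted to frames, pregaps are ignored, and <code>locate</code> is a made-up name.</p>

```python
from bisect import bisect_right

# (track number, start frame) pairs, sorted by start frame.
TRACK_STARTS = [(1, 0), (2, 79611), (3, 87046), (4, 99692)]


def locate(frame: int) -> tuple[int, int]:
    """Turn an absolute disc position into (track number, offset into that track)."""
    starts = [start for _, start in TRACK_STARTS]
    index = bisect_right(starts, frame) - 1
    if index == -1:
        raise ValueError("position is before the first track")
    track, start = TRACK_STARTS[index]
    return track, frame - start


print(locate(79611))   # (2, 0): exactly the start of track 2
print(locate(87045))   # (2, 7434): the last frame before track 3 begins
```

<p>Once a position has been turned into a track and an offset, it can be handed to a track-oriented backend as an ordinary &ldquo;play track N from this point&rdquo; request.</p>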

<p>At the end of the day, though, what&rsquo;s important is it <em>does</em> work. Before I finished writing this, it was difficult to play Classical Cats on any modern computer; now, anyone with version 2.8.0 or later of ScummVM can give it a try. Now that it&rsquo;s more accessible, I hope other people are able to discover it too.</p>

<p>Note: CD audio support for this disc is available in nightly builds of ScummVM, and will be available in a future stable release.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Schmitz, J., &amp; Essex, J. (1994). Basic device control. In <em>Using Lingo: Director Version 4</em> (pp. 300–307). Macromedia, Inc.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Cargo-dist: System Dependencies Are Hard (So We Made Them Easier)]]></title>
    <link href="http://mistys-internet.website/blog/blog/2023/10/25/cargo-dist-system-dependencies-are-hard-so-we-made-them-easier/"/>
    <updated>2023-10-25T14:26:42-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2023/10/25/cargo-dist-system-dependencies-are-hard-so-we-made-them-easier</id>
    <content type="html"><![CDATA[<p>My latest blog post is over at my employer&rsquo;s blog and talks about the work I&rsquo;ve done to get system dependency management integrated into <a href="https://opensource.axo.dev/cargo-dist/"><code>cargo-dist</code></a>, an open source release management tool for Rust. The new release lets users specify non-Rust dependencies in <code>Cargo.toml</code> using a Cargo-like syntax and also provides a detailed report on the resulting binary&rsquo;s dynamic linkage. Here&rsquo;s a sample of the dependency syntax:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>[workspace.metadata.dist.dependencies.homebrew]
</span><span class='line'>cmake = { targets = ["x86_64-apple-darwin"] }
</span><span class='line'>libcue = { version = "2.2.1", targets = ["x86_64-apple-darwin"] }
</span><span class='line'>
</span><span class='line'>[workspace.metadata.dist.dependencies.apt]
</span><span class='line'>cmake = '*'
</span><span class='line'>libcue-dev = { version = "2.2.1-2" }
</span><span class='line'>
</span><span class='line'>[workspace.metadata.dist.dependencies.chocolatey]
</span><span class='line'>lftp = '*'
</span><span class='line'>cmake = '3.27.6'</span></code></pre></td></tr></table></div></figure>


<p>Go <a href="https://blog.axo.dev/2023/10/dependencies">read the blog post</a> to find out more!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Untangling Another Lingo Parser Edge Case]]></title>
    <link href="http://mistys-internet.website/blog/blog/2023/05/29/untangling-another-lingo-parser-edge-case/"/>
    <updated>2023-05-29T15:53:18-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2023/05/29/untangling-another-lingo-parser-edge-case</id>
    <content type="html"><![CDATA[<p>I was testing out a new Macromedia Director CD in ScummVM, and I noticed a non-fatal error at startup:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>WARNING: ######################  LINGO: syntax error, unexpected tSTRING: expected ')' at line 2 col 70 in ScoreScript id: 2!
</span><span class='line'>WARNING: #   2: set DiskChk = FileIO(mnew,"read"¬"The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")!
</span><span class='line'>WARNING: #                                                                            ^ about here!</span></code></pre></td></tr></table></div></figure>


<p>It may have been non-fatal, but seeing an error like that makes me uneasy anyway&mdash;I&rsquo;m never sure when it&rsquo;ll turn out to have ramifications down the line. This comes from the parser for Director&rsquo;s custom programming language, Lingo, so I opened up the code in question<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> to take a look. The whole script turned out to be only three straightforward lines. The part ScummVM complained about came right at the start of the file, and at first glance it looked pretty innocuous.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>set DiskChk = FileIO(mnew,"read"¬
</span><span class='line'>"The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")
</span><span class='line'>IF DiskChk = -35 THEN GO TO "No CD"</span></code></pre></td></tr></table></div></figure>


<p>The symbol at the end of that first line is a continuation marker, which you might remember from a <a href="https://www.mistys-internet.website/blog/blog/2022/01/06/do-you-speak-the-lingo/">previous blog post</a> where I debugged a different issue with them. The continuation marker is a special kind of escape character with one specific purpose: it escapes newlines to allow statements to extend across more than one line of code, and nothing else.</p>

<p>At first I thought maybe the issue was with the continuation marker itself being misparsed, like in the error I documented in that older blog post; maybe it was failing to be recognized and wasn&rsquo;t being replaced with whitespace? To figure that out, I started digging around in ScummVM&rsquo;s Lingo preprocessor. Spoiler: it turned out <em>not</em> to be an issue with the continuation marker, but it pointed me in the right direction anyway.</p>

<p>ScummVM handles the continuation marker in two phases. In a preprocessor phase, it <a href="https://github.com/scummvm/scummvm/blob/6823514318a29fe9ec34956a97085117514a60dc/engines/director/lingo/lingo-preprocessor.cpp#L69-L73">removes the newline after the marker</a> in order to simplify parsing later. Afterwards, in the lexer, it <a href="https://github.com/scummvm/scummvm/blob/6823514318a29fe9ec34956a97085117514a60dc/engines/director/lingo/lingo-lex.l#L92">replaces the marker with a space</a> to produce a single normal line of code. The error message above contains a version of the line between those two steps: the preprocessor has combined the two lines of code into one, but the continuation marker hasn&rsquo;t been replaced with a space yet.</p>

<p>If we do the work of the preprocessor/lexer ourselves, we get this copy of the line:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>set DiskChk = FileIO(mnew,"read" "The Source:Put Contents on Hard Drive:Journey to the Source:YZ.DATA")</span></code></pre></td></tr></table></div></figure>


<p>In this form, the error is a bit more obvious than when it was spread across multiple lines. The problem is with how the arguments are passed to <code>FileIO</code>: the first two arguments are separated by a comma, but the second and third aren&rsquo;t. The newline between the second and third arguments makes it easy to miss, but as soon as we put it all together it becomes obvious.</p>
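
<p>The two phases are simple enough to imitate with plain string replacement. The Python below is only a sketch of the idea, with a shortened file path; ScummVM&rsquo;s real preprocessor and lexer are C++ and flex code, and the function names here are made up.</p>

```python
CONTINUATION = "\u00ac"  # the continuation marker character


def preprocess(source: str) -> str:
    """Phase 1: drop the newline that follows a continuation marker."""
    return source.replace(CONTINUATION + "\n", CONTINUATION)


def lex_continuations(source: str) -> str:
    """Phase 2: turn each remaining marker into a single space."""
    return source.replace(CONTINUATION, " ")


script = 'set DiskChk = FileIO(mnew,"read"\u00ac\n"YZ.DATA")'

joined = preprocess(script)          # one line, marker still present
final = lex_continuations(joined)    # marker replaced with a space
print(final)
```

<p>The output of the first phase corresponds to what shows up in the warning; the output of the second is what the parser actually has to make sense of, missing comma and all.</p>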

<p>In the last case I looked at, described in the previous blog post, this was an ambiguous parse case: the same line of code was valid if you added the missing comma or not, but it was interpreted two totally different ways. This time is different. If you add the missing comma, this is a normal, valid line of code; if you don&rsquo;t, it&rsquo;s invalid syntax and you get the error we&rsquo;re seeing at the top.</p>

<p>As far as I can tell, the original Director runtime actually accepts this without throwing an error even though this isn&rsquo;t documented as correct syntax. The official Director programming manual tells the user to use commas to separate arguments, but it&rsquo;s tolerant enough to support when they&rsquo;re forgotten like they are here<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>. ScummVM doesn&rsquo;t get that same luxury. As I mentioned in the previous blog post, later Director versions tightened up these ambiguous parse cases, and supporting the weird case in Director 3 would significantly complicate the parser. Since this is only the second case of this issue, though, it&rsquo;s not really <em>necessary</em> to support it either. ScummVM has builtin support for patching a specific disc&rsquo;s Lingo source code, so I was able to simply fix this by patching the code to the properly-formatted version.</p>

<p>The disc in question still doesn&rsquo;t fully work, but I&rsquo;m putting some time into it. I&rsquo;m planning on writing a followup on the other fixes necessary to get it running as expected. And for today&rsquo;s lesson? Old software is weird. Just like new software.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Before version 4, Director software was interpreted from source code at runtime&mdash;so, conveniently, that means that you can peek at the source code to any early Director software.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p><em>MacroMind Director Version 3.0: Interactivity Manual</em>. (1991). MacroMind, Inc. Page 64.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How I Dumped an Arcade Game for MAME]]></title>
    <link href="http://mistys-internet.website/blog/blog/2022/10/27/how-i-dumped-an-arcade-game-for-mame/"/>
    <updated>2022-10-27T21:06:52-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2022/10/27/how-i-dumped-an-arcade-game-for-mame</id>
    <content type="html"><![CDATA[<p>I recently had the chance to do something that I&rsquo;ve wanted to do for years: dump an arcade game to add it to MAME.</p>

<p><img src="http://mistys-internet.website/blog/images/theglad104/0005.big.png" alt="Screenshot of The Gladiator's title screen" /></p>

<p>MAME is a massive emulator that covers thousands of arcade games across the history of gaming. It&rsquo;s one of those projects I&rsquo;ve used and loved for decades. I&rsquo;ve always wanted to give back, but it never felt like I had something to contribute&mdash;until recently.</p>

<p>You might remember from <a href="https://www.mistys-internet.website/blog/blog/2021/08/14/exploring-jvs/">one of my other posts</a> that I recently got into collecting arcade games. This year I&rsquo;ve been collecting games for a Taiwanese system called the PolyGame Master (PGM), a cartridge-based system with interchangeable games sold by International Games System (IGS) between 1997 and 2005. It has a lot of great games that aren&rsquo;t very well-known, in particular some incredibly well-designed beat-em-ups.</p>

<p>A couple months ago, I bought a copy of <em>The Gladiator</em>, a wuxia-themed beat-em-up set in the Song dynasty. My specific copy turns out to be an otherwise-undumped game revision. Many arcade games were released in many different versions, including regional variations and bugfix updates, and it can take collectors and MAME developers years to track them all down. In the case of <em>The Gladiator</em>, MAME has the final release, 1.07, and the first few revisions, but it&rsquo;s missing most of the versions in between. When my copy came here, I booted it up and found out it was one of those versions MAME was missing: 1.04.</p>

<p>Luckily, I already had the hardware on hand to dump it. I own an EPROM burner that I&rsquo;d bought to <em>write</em> chips so that I could mod games I&rsquo;d bought, but EPROM burners can read chips as well. I own an adapter that supports several common chips that, luckily, can handle exactly the chips I needed for this game.</p>

<div style="text-align: center; width: 33%; margin: 1em auto;"><img alt="Photo of an EPROM burner with a 27C160 chip in it" src="http://mistys-internet.website/blog/images/theglad104/IMG_5291.jpeg"></div>


<p>It&rsquo;s easy to think of game cartridges as just being a single thing, but arcade game boards typically have a large number of chips. Why&rsquo;s that? It&rsquo;s partly technical; specific chips can be connected directly to particular regions of the system&rsquo;s hardware, like graphics or sound, which means that even though it&rsquo;s less flexible than an all-in-one ROM, it has some performance advantages too. The two chips I dumped here are program code for two different CPUs: one for the 68000 CPU in the system itself, and one for the ARM7 CPU in the game cartridge.</p>

<p>The other advantage is that using a large number of chips can make it easier to update a game down the line. Making an overseas release? It would be much cheaper to update just a couple of your chips instead of producing new revisions of everything on your board. Releasing a bugfix update? It&rsquo;s much quicker and more painless to update existing games if all your program code is on a single chip.</p>

<p>From looking at MAME, I could tell that every other revision of <em>The Gladiator</em> used a single set of chips for almost everything. Only the two program ROM chips are different between versions, which made my life a lot easier. I was also lucky that these chips were easy to get to. Depending on the kind of game, chips might be attached straight to the board, or they might be in sockets where they can be easily removed and reattached. <em>The Gladiator</em> has two game boards, one of which uses two socketed chips. And, thankfully, those were the only two chips I had to remove and dump.</p>

<p>To remove the chips, I used an EPROM removal tool&mdash;basically just a little set of pliers on a spring, with a pair of needle noses that get in under the edge of the chip in the socket so you can pull it out. The two chips were both common types that my EPROM burner supports, so once I got them out they weren&rsquo;t difficult to read. The most important chip, which has the game&rsquo;s program code, is an EPROM chip known as the 27C160&mdash;a 2MB chip in a specific form factor. I already own a reader that supports that and the 4MB version of the same chip, which you can see in the above photo. The second chip is a 27C4096DC which has a much smaller 512KB capacity.</p>

<div style="text-align: center; width: 33%; margin: 1em auto;"><img alt="Photo of an open game cartridge showing the boards and ROM chips" src="http://mistys-internet.website/blog/images/theglad104/IMG_5293.jpeg"></div>


<p>Why are there two program ROMs? Many games for the PGM use a fascinating and pretty intense form of copy protection. As I mentioned earlier, the PGM motherboard has a 20MHz 68000 processor, a 16-bit CPU that was very widely used in the 90s. The game cartridge, meanwhile, has a 20MHz ARM7 coprocessor. For early games, that ARM7 was there just for copy protection. Game cartridges would feature an unencrypted 68000 ROM and an encrypted ARM7 ROM; the ARM7 must successfully decrypt and execute code from the encrypted ROM for the main program code to be allowed to boot and run. By the end of the PGM&rsquo;s life, they&rsquo;d clearly realized it was silly to be devoting the ARM7 just to copy protection when it was faster than the CPU on the motherboard, so they put it to use for actual game logic. On games like <em>The Gladiator</em>, the unencrypted 68000 ROM and the main CPU are only doing very basic bootstrapping work and then hand off the rest of the work to the ARM7, which runs the game using code on the encrypted ARM7 chip.</p>

<p>I spent a while fumbling around trying to get the dumped ARM7 ROM to work, but it turns out that&rsquo;s because I read it as the wrong kind. Oops. My reader has a switch that switches between the 2MB and 4MB versions of the chip&hellip; and I had it set to 4MB, even though the chip helpfully told me right on the package it&rsquo;s a 2MB chip. So, after I spent a half hour fumbling, I realized what I&rsquo;d done and went back to redump it&mdash;and <em>that</em> version worked first try. Phew.</p>

<p><img src="http://mistys-internet.website/blog/images/theglad104/0004.big.png" alt="Screenshot of The Gladiator's boot screen with the program ROM versions" /></p>

<p>Once I dumped it, I was able to figure out that one of the two program ROMs is identical to one that&rsquo;s already in MAME; only the ARM ROM is unique. That meant adding it to MAME was very easy; I could mostly copy and paste existing code defining the game cartridge, changing just one line with the new ROM and a few lines with some different metadata, and I was good to go. I submitted a <a href="https://github.com/mamedev/mame/pull/10306">pull request</a> and, after some discussion, it was merged. For something I&rsquo;ve wanted to be able to contribute to for <em>years</em>, it feels good and was, honestly, pretty painless. And now, as of <a href="https://www.mamedev.org/?p=518">MAME 0.249</a>, <em>The Gladiator</em> 1.04 can finally be emulated!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Do You Speak the Lingo?]]></title>
    <link href="http://mistys-internet.website/blog/blog/2022/01/06/do-you-speak-the-lingo/"/>
    <updated>2022-01-06T14:58:00-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2022/01/06/do-you-speak-the-lingo</id>
    <content type="html"><![CDATA[<p>I&rsquo;ve been spending some time lately contributing to <a href="https://www.scummvm.org">ScummVM</a>, an open-source reimplementation of many different game engines that makes it possible to play those games on countless modern platforms. They&rsquo;ve recently added support for Macromedia Director, an engine used by a ton of 90s computer games and multimedia software that I&rsquo;m really passionate about, so I wanted to get involved and help out.</p>

<p><img class="crisp" src="http://mistys-internet.website/blog/images/scummvm-henachoco03/scummvm-henachoco03-mac-ja-00003.png"></p>

<p>One of the first games I tried out is <em>Difficult Book Game</em> (<em>Muzukashii Hon wo Yomu to Nemukunaru</em>, or <em>Reading a Difficult Book Makes You Sleepy</em>), a small puzzle game for the Mac by a one-person indie studio called Itachoco Systems that specialized in strange, interesting art games. Players take on the role of a woman named Miss Linli who, after falling asleep reading a complicated book, finds herself in a strange lucid dream where gnomes are crawling all over her table. Players can entice them to climb on her or scoop them up with her hands. If two gnomes walk into each other, they turn into a strange seed that, in turn, grows into other strange things if it comes into contact with another gnome. Players guide her using what feels like an early ancestor to <a href="http://www.foddy.net/Athletics.html">QWOP</a>, with separate keys controlling the joints on each of Linli&rsquo;s arms. It&rsquo;s bizarre, difficult to control, and compelling.</p>

<p>A lot of early Director games play fine in ScummVM without any special work, so I was hoping that would be true here too. Unfortunately, it didn&rsquo;t turn out to be quite that simple. I ended up taking a dive into ScummVM&rsquo;s implementation of Director to fix it.</p>

<p>Director uses its own programming language, Lingo, which is inspired by languages like <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a> and <a href="https://en.wikipedia.org/wiki/HyperCard">HyperCard</a>. HyperCard was Apple&rsquo;s hypermedia development environment, released for Macs in 1987, and was known for its simple, English-like, non-programmer friendly programming language. Smalltalk, meanwhile, is a programming language developed in the 70s and 80s known for its simple syntax and powerful object oriented features, very new at the time; it&rsquo;s also influenced other modern languages such as Python and Ruby. Lingo uses a HyperCard-style English-like way of programming and Smalltalk-style object oriented features.</p>

<p>Early versions of Director are unusual for having the engine interpret the game logic straight from source code<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>&mdash;which means if you&rsquo;ve got any copy of the game, you&rsquo;ve got the source code too. It&rsquo;s great for debugging and learning how it works, but there&rsquo;s a downside too. If you&rsquo;re writing a new interpreter, like ScummVM, it means you have to deal with writing a parser for arbitrary source code. As it turns out, every issue I&rsquo;d have to deal with to get this game working involved the parser.</p>

<p>I&rsquo;ll get into the details later, but first some background. To give a simplified view, ScummVM processes Lingo source in a few steps. First, it translates the text from its source encoding to Unicode; since Lingo dates to before Unicode was widely used, each script is stored in a language-specific encoding and needs to be translated in order for modern Unicode-native software to interpret it correctly. Next, there&rsquo;s a preprocessing stage in which a few transformations are made in order to make the later stages simpler. The output of this stage is still text which carries the same technical meaning, it&rsquo;s just text that&rsquo;s easier for the next stages to process. This is followed by the two stages of the actual parser itself: the lexer, in which source code text is converted into a series of tokens, and the parser, which has a definition of the grammar for the language and interprets the tokens from the previous stage in the context of that grammar.</p>

<p>This all sounds complicated, but my changes ended up being pretty small. They did, however, end up getting spread across several of these layers.</p>

<h3>1. The fun never ends!</h3>

<p>The very first thing I got after launching the game was this parse error:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>WARNING: ######################  LINGO: syntax error, unexpected tMETHOD: expected end of file at line 83 col 6 in MovieScript id: 0!</span></code></pre></td></tr></table></div></figure>


<p>Taking a look at the code in question, there&rsquo;s nothing that really looks too out of the ordinary:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>factory lady
</span><span class='line'>method mNew
</span><span class='line'>    instance rspri,rx,ry,rhenka,rkihoncala,rflag,rhoko,rkasoku
</span><span class='line'>end method
</span><span class='line'>method mInit1 spri
</span><span class='line'># etc</span></code></pre></td></tr></table></div></figure>


<p>This is the start of the definition of the game&rsquo;s first factory. Lingo supports object-oriented features, something that was still pretty new when it was introduced, and allows for user-defined classes called &ldquo;factories&rdquo;<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>. Following the <code>factory lady</code> definition are a number of methods, defined in a block-like format: <code>method NAME</code>, an indented set of one or more lines of method definitions, and an <code>end method</code> line.</p>

<p>That last line, it turns out, was the problem. To my surprise, it turns out those <code>end method</code> lines are totally optional even though they&rsquo;re the documented syntax in the official Director manual. Not only can the line have any text there instead of <code>method</code>, but it turns out you don&rsquo;t need any form of <code>end</code> statement at all. Given that ScummVM hadn&rsquo;t recognized it until now, it seems that many games must have just skipped it.</p>

<p>Luckily, this was a very easy fix: I <a href="https://github.com/scummvm/scummvm/pull/3547">added a single line</a> to ScummVM&rsquo;s <a href="https://en.wikipedia.org/wiki/GNU_Bison">Bison</a>-based parser and it was able to handle <code>end</code> statements without breaking support for methods defined without them. I hoped that was all it was going to take for Difficult Book Game to run, but I wasn&rsquo;t quite so lucky.</p>

<h3>2. Language-dependent syntax</h3>

<p>Unlike most modern languages, Lingo doesn&rsquo;t have a general-purpose escape character like <code>\</code> that can be used to extend a line of code across multiple lines. Instead, it uses a special character called the &ldquo;continuation marker&rdquo;, <code>¬</code><sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>, which serves that purpose and is used for nothing else in the language<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup>. (Hope you like holding down keys to type special characters!) Here&rsquo;s an example of how that looks with a couple lines of code from a real application:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>global theObjects,retdata1,retdata2,ladytime,selif,daiido,Xzahyo,Yzahyo,StageNum, ¬
</span><span class='line'>           daihoko</span></code></pre></td></tr></table></div></figure>


<p>Since Lingo was originally written for the Mac, whose default MacRoman character set supported a number of &ldquo;special&rdquo; characters and accents outside the normal ASCII range, they were able to get away with characters that might not be safe in other programming languages. But there&rsquo;s a problem there, and not just that it was annoying to type: what happens if you&rsquo;re programming in a language that doesn&rsquo;t use MacRoman? This is before Unicode, so each language was using a different encoding, and there&rsquo;s no guarantee that a given language would have <code>¬</code> in its character set.</p>

<p>Which takes me back to Difficult Book Game. I tried running it again after the fix above, only to run into a new parse error. After checking the lines of code it was talking about, I ran into something that looks almost like the code above&hellip; <em>almost</em>.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>global theObjects,retdata1,retdata2,ladytime,selif,daiido,Xzahyo,Yzahyo,StageNum, ﾂ
</span><span class='line'>           daihoko</span></code></pre></td></tr></table></div></figure>


<p>Spot the difference? In the place where the continuation marker should be, there&rsquo;s something else: <code>ﾂ</code>, or the halfwidth katakana character &ldquo;tsu&rdquo;. As it turns out, that&rsquo;s not random. In MacRoman, <code>¬</code> takes up the character position <code>0xC2</code>, and <code>ﾂ</code> is at the same location in MacJapanese. That, it turns out, seems to be the answer of how the continuation marker is handled in different languages. It&rsquo;s not really <code>¬</code>, it&rsquo;s whatever character happens to be at <code>0xC2</code> in a given text encoding.</p>
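<p>Taken at the byte level, that rule is simple to state. Here&rsquo;s a minimal sketch in C, assuming a hypothetical helper called <code>join_continuations</code> that works on raw, still-encoded source bytes. It illustrates the idea only and isn&rsquo;t ScummVM&rsquo;s code:</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an illustration, not ScummVM's implementation).
 * Copies `len` bytes of raw, still-encoded Lingo source from `src` into
 * `dst`, dropping every 0xC2 byte that sits directly before a line ending,
 * along with that line ending. `dst` needs room for `len` bytes.
 * Returns the number of bytes written. */
size_t join_continuations(const unsigned char *src, size_t len,
                          unsigned char *dst)
{
    assert(src != NULL && dst != NULL);

    size_t out = 0;
    size_t i = 0;
    while (i < len) {
        /* 0xC2 is the marker no matter which glyph the encoding shows. */
        int at_marker = src[i] == 0xC2 && i + 1 < len &&
                        (src[i + 1] == '\r' || src[i + 1] == '\n');
        if (at_marker) {
            i += 2; /* skip the marker and the line ending */
            /* Also swallow the second half of a CR LF pair. */
            if (src[i - 1] == '\r' && i < len && src[i] == '\n')
                i++;
            continue;
        }
        dst[out++] = src[i++];
    }
    return out;
}
```

<p>One caveat with a byte-level check like this: MacJapanese is based on Shift-JIS, where <code>0xC2</code> can also be the second byte of a two-byte character, so a real implementation would need to track character boundaries rather than compare single bytes.</p>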

<p>Complicating things a bit, ScummVM handles lexing Lingo <em>after</em> translating the code from its source encoding to UTF-8. If it lexed the raw bytes, it would be one thing: whatever the character is at <code>0xC2</code> is the continuation marker, regardless of what character it &ldquo;means&rdquo;. Handling it after it&rsquo;s been turned into Unicode is a lot harder. Since ScummVM already has a Lingo preprocessor, though, it could get fixed up there: just look for instances of <code>ﾂ</code> followed by a newline, and treat that as though it&rsquo;s a &ldquo;real&rdquo; continuation marker<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup>. A little crude, but it works, and suddenly ScummVM could parse Difficult Book Game&rsquo;s code<sup id="fnref:6"><a href="#fn:6" rel="footnote">6</a></sup>. Or, almost&hellip;</p>

<h3>3. What&rsquo;s in a space?</h3>

<p>Now that I could finally get in-game, I could start messing around with the controls and see how it ran. Characters were moving, controls were responding&mdash;it was looking good! At least until I pressed a certain key&hellip;</p>

<p><img class="crisp" src="http://mistys-internet.website/blog/images/scummvm-henachoco03/scummvm-henachoco03-mac-ja-00000.png"></p>

<p>Her arms detached&mdash;that doesn&rsquo;t look comfortable. In the console, ScummVM flagged an error that looked relevant:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Incorrect number of arguments for handler mLHizikaraHand (1, expected 3 to 3). Adding extra 2 voids!</span></code></pre></td></tr></table></div></figure>


<p>This sounded relevant, since &ldquo;hiji&rdquo; means elbow. I figured it was probably the handler called when rotating her arm around her elbow, which is exactly what visually broke. I took a look at where <code>mLHizikaraHand</code> and the similar handlers were being called, and noticed something weird. In some places, it looks like this:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>locaobject(mLHizikaraHand,(rhenka + 1),dotti)</span></code></pre></td></tr></table></div></figure>


<p>And in other places, it looked slightly different:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>locaobject(mLHizikaraHand (rhenka + 1),dotti)</span></code></pre></td></tr></table></div></figure>


<p>Can you find the difference? It&rsquo;s the character immediately after the handler name: instead of a comma, it&rsquo;s followed by a space. Now that I looked at it, the ScummVM error actually sounded right. It <em>does</em> look like it&rsquo;s calling <code>mLHizikaraHand</code> with a single argument (<code>rhenka + 1</code>). After talking it over with ScummVM dev djsrv, it sounds like this is just a Lingo parsing oddity. Lingo was designed to be a user-friendly language, and there are plenty of cases where its permissive parser accepts things that most languages would reject. This seems to be one of them.</p>

<p>Unfortunately, this parse case also seems to be different between Lingo versions. Fixing how it interprets it might have knock-on effects for parsing things created for later Director releases. Time to get hacky instead. The good news is that ScummVM has a mechanism for exactly this: it bundles patches for various games, making it possible to fix up weird and ambiguous syntax that its parser can&rsquo;t handle yet. I added patches to change the ambiguous cases to the syntax used elsewhere, and suddenly Miss Linli&rsquo;s posture is looking a lot healthier.</p>

<p><img class="crisp" src="http://mistys-internet.website/blog/images/scummvm-henachoco03/scummvm-henachoco03-mac-ja-00001.png"></p>

<p>This whole thing ended up being much more of a journey than I expected. So much for having it just run! In the end, though, I learned quite a bit&mdash;and I was able to get a cool game to run on modern OSs. I&rsquo;m continuing to work on ScummVM&rsquo;s Director support and should have more to write about later.</p>

<p>Thanks to ScummVM developers djsrv and sev for their help working on this.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Later versions switched to using a bytecode format, similar to Java or C#. This makes processing a lot easier, since bytecode produced by Director&rsquo;s own compiler is far more standardized than human-written source code.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>Despite the name, it isn&rsquo;t really implementing the <a href="https://en.wikipedia.org/wiki/Factory_method_pattern">factory pattern</a>.<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>The mathematical <a href="https://en.wikipedia.org/wiki/Negation">negation operator</a>.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
<li id="fn:4">
<p>It&rsquo;s a bit of a weird choice, but Lingo didn&rsquo;t do it first. It showed up first in Apple&rsquo;s HyperCard and AppleScript languages.<a href="#fnref:4" rev="footnote">&#8617;</a></p></li>
<li id="fn:5">
<p>Tempting as it is to refactor the lexer, I had other things to do, and I really wasn&rsquo;t familiar enough with its innards to take that on.<a href="#fnref:5" rev="footnote">&#8617;</a></p></li>
<li id="fn:6">
<p>As it turns out, this wasn&rsquo;t the only game with the same issue. Fixing this also fixed several other Japanese games, including <em>The Seven Colors: Legend of Psy・S City</em> and <em>Eriko Tamura&rsquo;s Oz</em>.<a href="#fnref:6" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Exploring JVS]]></title>
    <link href="http://mistys-internet.website/blog/blog/2021/08/14/exploring-jvs/"/>
    <updated>2021-08-14T13:54:08-07:00</updated>
    <id>http://mistys-internet.website/blog/blog/2021/08/14/exploring-jvs</id>
    <content type="html"><![CDATA[<p>Everyone had their weird pandemic hobby, right? Well, mine is that I bought an arcade video game. Not the whole cabinet - just a board, to hook up to a TV and a game controller. If I can&rsquo;t go to the arcade, at least I can bring a favourite game home, right?</p>

<p>As you might imagine, an arcade board isn&rsquo;t just plug and play; you can&rsquo;t just plug in a USB gamepad and call it a day. Finding out how to actually use it took some research. I looked up the different methods, and found I had more or less two options: a more expensive one involving repurposed arcade hardware, and an open-source one using off-the-shelf parts. While I was figuring out if the more expensive options were worth it, I decided I&rsquo;d try out the open source one, <a href="https://github.com/openjvs/openjvs">OpenJVS</a>. I started out as just a user, but ended up getting involved in development. Over the past few months, I&rsquo;ve contributed a bunch of new features and bugfixes. In the process I learned a lot about the specification&hellip; and quashed my fair share of weird bugs.</p>

<h2>So what is JVS anyway?</h2>

<p>Imagine this: you&rsquo;re an arcade machine manufacturer, and you need to make it easy for a machine operator to swap a new board into one of their cabinets. How do you make sure they don&rsquo;t have to run dozens of wires to get the controls hooked up? Video is easy; you can just use VGA, DVI, or HDMI. Power is easy too. What else is left? Well, there&rsquo;s many different kinds of input so players can actually play the game; coin inputs and money bookkeeping; smart cards to communicate information to a game; in other words, all kinds of input from the player.</p>

<p>The solution? JAMMA Video Standard, or JVS, the standard used by most arcade games since the late 90s.<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> It&rsquo;s a serial protocol that makes arcade hardware close to plug and play: connect a single cable from your cabinet&rsquo;s I/O board to your new arcade board, and you get all of your controls and coin slots connected at once.</p>

<p>If you want to use a JVS game at home, you could always put something together using an official I/O board, but - as it happens, the JVS spec is available<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>. There&rsquo;s nothing stopping you from making your own I/O board, and it turns out a few different people have. There are a few different open source options intended to work in a few different ways. I use OpenJVS, written by Bobby Dilley, which transforms a Raspberry Pi running Linux into a JVS I/O board.</p>

<h2>How does JVS work?</h2>

<p>So that&rsquo;s JVS, but how does it work? I won&rsquo;t go into deep detail, but here&rsquo;s the 10,000 meter view.</p>

<p>JVS defines communication between an I/O board and a video game board. It uses the <a href="https://en.wikipedia.org/wiki/RS-485">RS-485</a> serial standard to communicate data, meaning that it&rsquo;s possible to use a standard USB RS-485 device to send and receive commands.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup> The I/O board handles tracking the state of all of the cabinet&rsquo;s inputs, and it&rsquo;s also responsible for all the coin management: it has its own count of how much money players have spent, and the game just reads that number instead of keeping track of it itself.</p>

<p>The game board communicates with the I/O board by sending commands, then receiving packets of information in return. The game board tells the I/O board how many bytes of data to expect, then sends one or more messages as a set of raw bytes. The I/O board reads those bytes, interprets them, then sends a set of responses for each command.</p>

<p>Open source JVS implementations don&rsquo;t just emulate the things a player would normally touch in the arcade, either. They also simulate features like the service and test buttons, which are tucked away inside a cabinet where only arcade employees can normally use them. The service button allows access to the operator menu, where employees can change hardware and software settings, while the test button does exactly what it sounds like. Arcade players will never get to use these, but for home players it&rsquo;s useful to be able to do things like change how many coins a play costs, turn on free play, or change the game language. OpenJVS supports both of these, and they seemed to work when I tested them. The test button did cause OpenJVS to log a message about an &ldquo;unsupported command&rdquo;, which seemed suspicious, but Bobby and I suspected this was just the board playing fast and loose with the spec so we ignored it. (This will come up again later.)</p>

<p>OpenJVS was already basically complete and implemented all of the common parts of the protocol, so I started out as just a user. As I ran into a few bugs, I started contributing bugfixes and then implementations of more parts of the protocol.</p>

<h2>Let&rsquo;s talk money</h2>

<p>I mentioned coin management earlier. Obviously, in arcades, games are pay-to-play. Most game boards have a free play option so you can play as much as you want without paying, and a lot of home players will just switch their boards into the free play mode. There&rsquo;s nothing stopping you from emulating a coin slot though. OpenJVS lets you map a button on your controller to inserting a coin for something closer to the authentic arcade feeling. It seems like this might not be a common use case, but the option is always there.</p>

<p><img src="http://mistys-internet.website/blog/images/jvs/coins-253.png" title="Screenshot of the arcade board's coin counter with 253 coins" ></p>

<p>Before long, I noticed a really strange bug. I&rsquo;d be playing games, and then notice that somehow I had over 200 credits in the machine. I definitely wasn&rsquo;t mashing the coin button that much, so I knew something had to be wrong. After some experimentation, I eventually figured out it only happened if you did a few specific things in exactly the right order:</p>

<ul>
<li>Insert one or more coins</li>
<li>Enter the service menu and then the input test menu</li>
<li>Exit the service menu</li>
</ul>


<p>At this point, I realized something suspicious was happening. I was always getting 200+ coins&hellip; but it was more specific than that. I was ending up with exactly 256 minus the number of coins I had when entering the service menu. For example, if I started with 3 coins, I&rsquo;d always have 253 coins after leaving the service menu. That was a pretty good sign I was seeing something in particular: integer underflow.</p>

<p><img src="http://mistys-internet.website/blog/images/jvs/io_test.png" title="Screenshot of the IO test from the service menu" ></p>

<p>Experienced programmers, feel free to skip this paragraph. But for those who aren&rsquo;t familiar: languages like C, which OpenJVS is written in, feature something called integer overflow and underflow. Number types have a maximum size which affects how many numbers they can represent. A 16-bit (or 2-byte) integer that can store only positive numbers, for example, can only represent numbers between 0 and 65535. Picture, for a moment, what happens if you ask a program to subtract 1 from a 16-bit integer that&rsquo;s already at 0, or add 1 to a number that&rsquo;s already at 65535. In C, and many other languages, it will overflow or underflow: subtract 1 from 0, and you get 65535; add 1 to 65535, and you get 0.</p>
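<p>To make that concrete, here&rsquo;s a small sketch in C of the 16-bit example. The function names are made up for illustration:</p>

```c
#include <stdint.h>

/* Illustrative helpers with made-up names. Unsigned arithmetic in C is
 * defined to wrap around, so these casts give well-defined results. */

/* Subtracts 1 from a 16-bit unsigned value, wrapping 0 around to 65535. */
uint16_t decrement_wrapping(uint16_t value)
{
    return (uint16_t)(value - 1u);
}

/* Adds 1 to a 16-bit unsigned value, wrapping 65535 around to 0. */
uint16_t increment_wrapping(uint16_t value)
{
    return (uint16_t)(value + 1u);
}
```

<p>Calling <code>decrement_wrapping(0)</code> gives 65535, and <code>increment_wrapping(65535)</code> gives 0. This wraparound is only guaranteed for unsigned types; overflowing a signed integer in C is undefined behaviour.</p>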

<p>Having figured out that I was probably seeing underflow, I took a look at the protocol. Since the JVS I/O board keeps track of the money balance, the protocol provides commands for dealing with that and it seemed like the most likely place I&rsquo;d find the bug. When I dug into OpenJVS&rsquo;s implementation of the &ldquo;decrease number of coins&rdquo; command, it wasn&rsquo;t too hard to find the culprit. It was right here:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='c'><span class='line'><span class="cm">/* Prevent underflow of coins */</span>
</span><span class='line'><span class="k">if</span> <span class="p">(</span><span class="n">coin_decrement</span> <span class="o">&gt;</span> <span class="n">jvsIO</span><span class="o">-&gt;</span><span class="n">state</span><span class="p">.</span><span class="n">coinCount</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</span><span class='line'>    <span class="n">coin_decrement</span> <span class="o">=</span> <span class="n">jvsIO</span><span class="o">-&gt;</span><span class="n">state</span><span class="p">.</span><span class="n">coinCount</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
</span><span class='line'>
</span><span class='line'><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">jvsIO</span><span class="o">-&gt;</span><span class="n">capabilities</span><span class="p">.</span><span class="n">coins</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
</span><span class='line'>    <span class="n">jvsIO</span><span class="o">-&gt;</span><span class="n">state</span><span class="p">.</span><span class="n">coinCount</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-=</span> <span class="n">coin_decrement</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>


<p>When OpenJVS received the command to decrement the number of coins by a certain amount, it tried to prevent underflows. Unfortunately, the underflow protection itself was buggy: it checked the number of coins in the first coin slot, but then decremented the number of coins in <em>every</em> slot. If slot 2 has fewer coins than slot 1, then slot 2 will end up underflowing by whatever the difference is. I still wasn&rsquo;t sure why it was trying to remove <em>256</em> coins, which seemed weird. I figured it must just be trying to clear the slot of all coins and trusting the I/O board to prevent underflows, and moved on.</p>
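
<p>One way to avoid that mismatch is to clamp the decrement against each slot&rsquo;s own balance. The following is only a sketch under assumptions: the function name, the plain array, and the 16-bit counter type are stand-ins chosen for illustration, and it is not necessarily the fix that went into OpenJVS.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Remove up to `amount` coins from every slot, clamping per slot so that
 * no counter can drop below zero. `coin_count` and `slots` are stand-ins
 * for OpenJVS's real state, not its actual field names or types. */
void decrement_all_slots(uint16_t *coin_count, size_t slots, uint16_t amount)
{
    for (size_t i = 0; i < slots; i++) {
        /* Compare against this slot's own balance, not slot 0's. */
        uint16_t step = amount > coin_count[i] ? coin_count[i] : amount;
        coin_count[i] = (uint16_t)(coin_count[i] - step);
    }
}
```

<p>With balances of 5 and 2 and a decrement of 4, this leaves 1 coin in the first slot and 0 in the second, rather than letting the second slot wrap around.</p>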

<p>With that bug fixed, I decided to keep working at improving coin support. While I was working on that command, I noticed that OpenJVS was ignoring a similar command. While the board usually only needs to send commands to reduce the number of coins in the balance, like when the player starts a game, it can also send a command to increase the number of coins. I&rsquo;d noticed that my game board was trying to send that command, but OpenJVS had only implemented a placeholder that logged the request and then moved on without doing anything. The quickest way to figure out what was going on was just to implement the command myself. The actual command in the spec is pretty simple:</p>

<table>
<thead>
<tr>
<th> Purpose </th>
<th> Sample </th>
</tr>
</thead>
<tbody>
<tr>
<td> Command code (always 0x35) </td>
<td> <code>0x35</code> </td>
</tr>
<tr>
<td> Coin slot index </td>
<td> <code>0x01</code> </td>
</tr>
<tr>
<td> Amount (first half) </td>
<td> <code>0x00</code> </td>
</tr>
<tr>
<td> Amount (second half) </td>
<td> <code>0x01</code> </td>
</tr>
</tbody>
</table>


<p>Easy enough: you can identify that it&rsquo;s the command to increase coins by the first byte, <code>0x35</code>, and then it tells you which coin slot to act on and how many coins to add to it. But when I was replacing the old placeholder command, I noticed something funny:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='c'><span class='line'><span class="k">case</span> <span class="nl">CMD_WRITE_COINS</span><span class="p">:</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="n">debug</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&quot;CMD_WRITE_COINS</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
</span><span class='line'>    <span class="n">size</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
</span><span class='line'>    <span class="n">outputPacket</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">outputPacket</span><span class="p">.</span><span class="n">length</span><span class="o">++</span><span class="p">]</span> <span class="o">=</span> <span class="n">REPORT_SUCCESS</span><span class="p">;</span>
</span><span class='line'><span class="p">}</span>
</span><span class='line'><span class="k">break</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>


<p>The placeholder command just reported success without doing anything, then jumped ahead in the stream by the command&rsquo;s length. But it jumped ahead <em>3</em> bytes, and according to the spec this command should be <em>4</em> bytes long. I tested OpenJVS with the command fully implemented, and noticed two things: pressing the test button now inserted 256 coins into the second coin slot, instead of doing nothing; and the &ldquo;unsupported command&rdquo; error I used to see went away. Why? When the buggy placeholder skipped ahead by three bytes, that left one byte in the buffer for OpenJVS to find and mistake for being a command. That byte, <code>0x01</code>, was actually the last byte of the &ldquo;insert coin&rdquo; command that was being sent when the test button was activated. I hadn&rsquo;t even set out to fix the &ldquo;unsupported command&rdquo; bug, but I fixed it anyway.</p>
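
<p>The effect of the wrong length can be reproduced with a toy parser. This is a simplified sketch, not OpenJVS&rsquo;s real command loop: <code>count_unknown_commands</code> is a hypothetical helper that recognizes only <code>CMD_WRITE_COINS</code> and treats every other byte it lands on as an unknown command.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_WRITE_COINS 0x35 /* command code from the table above */

/* Walk a buffer of commands, skipping `write_coins_size` bytes for each
 * CMD_WRITE_COINS, and count the bytes that end up being treated as
 * unknown commands. Only one command is recognized in this toy parser. */
size_t count_unknown_commands(const uint8_t *buf, size_t len,
                              size_t write_coins_size)
{
    size_t unknown = 0;
    size_t pos = 0;

    if (write_coins_size == 0)
        return 0; /* a zero-length command would never advance */

    while (pos < len) {
        if (buf[pos] == CMD_WRITE_COINS) {
            pos += write_coins_size; /* jump past the whole command */
        } else {
            unknown++;               /* stray byte mistaken for a command */
            pos += 1;
        }
    }
    return unknown;
}
```

<p>Feeding it the sample bytes <code>0x35 0x01 0x00 0x01</code> with a size of 3 leaves the final <code>0x01</code> behind to be counted as an unknown command; with a size of 4, nothing is left over.</p>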

<p>At this point I&rsquo;d fixed all the bugs I set out to, but I was still seeing something that just didn&rsquo;t feel right. When exiting the service menu, the game board now withdrew coins from the balance; when pressing the test button, the board now added coins. But the number of coins looked wrong: it was happening in increments of 256, instead of 1, and even for test commands that seemed unlikely. So I took another look at the spec, and realized the answer had been staring me in the face the entire time. Let&rsquo;s take another look at the last part of that table from earlier:</p>

<table>
<thead>
<tr>
<th> Purpose </th>
<th> Sample </th>
</tr>
</thead>
<tbody>
<tr>
<td> Amount (first half) </td>
<td> <code>0x00</code> </td>
</tr>
<tr>
<td> Amount (second half) </td>
<td> <code>0x01</code> </td>
</tr>
</tbody>
</table>


<p>The number of coins to add or remove is a two-byte, or 16-bit, value. Since JVS is communicating through single bytes, any multi-byte values have to be split up into single bytes for transmission. The spec helpfully tells us how to decode that: the numbers are stored in the big-endian, or most-significant byte first, format. But taking a look at OpenJVS&rsquo;s code shows that anywhere it was decoding these values, it was decoding them in little-endian format - in other words, it was reading the bytes backwards. What&rsquo;s 256 in little-endian format? It&rsquo;s the bytes &ldquo;0&rdquo; and &ldquo;1&rdquo;, in that order. What&rsquo;s 1 in big-endian format? It&rsquo;s those same two bytes in that same order. In other words, the game board hadn&rsquo;t been trying to add or subtract 256 coins all this time: it was just trying to add and remove single coins.</p>
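
<p>The difference between the two byte orders fits in a few lines of C. The helper names below are illustrative only and do not come from OpenJVS.</p>

```c
#include <stdint.h>

/* Decode two bytes as the spec describes: most-significant byte first. */
uint16_t decode_be16(uint8_t first, uint8_t second)
{
    return (uint16_t)(((uint16_t)first << 8) | second);
}

/* Decode the same two bytes least-significant byte first, which is the
 * mistake described above. */
uint16_t decode_le16(uint8_t first, uint8_t second)
{
    return (uint16_t)(((uint16_t)second << 8) | first);
}
```

<p>For the sample bytes <code>0x00</code> and <code>0x01</code>, <code>decode_be16</code> returns 1 while <code>decode_le16</code> returns 256.</p>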

<p>Putting it all together, what exactly was happening when I saw coins being added and removed? It actually turns out to be pretty simple. Pressing the test button inserts a coin in the second coin slot. When the operator menu is activated, it can be used to test the coin counter. When the operator leaves the menu, the unit sends commands to remove all of the coins that were added by the test button; this should leave it with the same number of coins as it had when they started. The strange behaviour was a combination of all the bugs working together: the test button didn&rsquo;t do anything because the command wasn&rsquo;t implemented, and the buggy bounds checking and incomplete &ldquo;remove coins&rdquo; command meant it could underflow and leave the player with hundreds of coins.</p>

<p>I originally set out to fix some pretty simple bugs, but every bug I fixed revealed a new one. The experience was quite a fun one. I can&rsquo;t say I ever thought I&rsquo;d be writing software for arcade machines, but not only did I fix my own problems, I also had the chance to learn more about how things work in a domain I might never otherwise have gotten the chance to touch.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>This was actually the second JAMMA standard, following a simpler standard that was used internationally between 1985 and the late 90s.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
<li id="fn:2">
<p>Including an excellent <a href="http://daifukkat.su/files/jvs_wip.pdf">English translation</a> by Alex Marshall!<a href="#fnref:2" rev="footnote">&#8617;</a></p></li>
<li id="fn:3">
<p>Mostly. JVS actually introduces an extra wire, known as the sync line, but I&rsquo;m going to ignore that here to keep the explanation simple.<a href="#fnref:3" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Bundle Install With Homebrew Magic Using Brew Bundle Exec]]></title>
    <link href="http://mistys-internet.website/blog/blog/2020/11/23/bundle-install-with-homebrew-magic-using-brew-bundle-exec/"/>
    <updated>2020-11-23T20:30:00-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2020/11/23/bundle-install-with-homebrew-magic-using-brew-bundle-exec</id>
    <content type="html"><![CDATA[<p>Has this ever happened to you? You&rsquo;re writing a project in Ruby, JavaScript, Go, etc., and you have to build a dependency that uses a system library. So you <code>bundle install</code> and then, a few minutes later, your terminal spits up an ugly set of C compiler errors you don&rsquo;t know how to deal with. After dealing with this enough times I decided to do something about it.</p>

<p><a href="https://brew.sh">Homebrew</a> already has a great tool in its arsenal for dealing with these problems. Homebrew needs to be able to build software reliably and robustly, after all - even if the user&rsquo;s system has weird software installed on it or strange misconfigurations. The <a href="https://docs.brew.sh/Formula-Cookbook#superenv-notes">&ldquo;superenv&rdquo; build environment</a> features intelligent automatic setup of build-related environment variables and <code>PATH</code>s based on just the requested dependencies, which filters out unrequested software and prevents a lot of common build failures that come from interfering software. It also uses shims for many common build tools to enforce just the right arguments passing through to the real tools.</p>

<p>So I thought to myself - we solved that problem for Homebrew builds already, right? Wouldn&rsquo;t it be nice if I could just reuse that work for other things? So that&rsquo;s what I did. Homebrew already provides the <code>Brewfile</code> dependency declaration format and the <code>brew bundle</code> tool to install library dependencies with Homebrew, and as a result there&rsquo;s already a great way to get the dependency information we&rsquo;d need to produce a reliable build environment. Since <code>brew bundle</code> is a Homebrew plugin, it has access to Homebrew&rsquo;s core code - including build environment setup. Putting these together, I wrote a feature called <code>brew bundle exec</code>. It takes the software you specify in your Brewfile and builds a dependency tree out of that, then sets up just the right build flags to let anything you want use them.</p>

<p>For example, say I want to <code>gem install mysql2</code>. Often, you get something like this:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ gem install mysql2
</span><span class='line'>Building native extensions. This could take a while...
</span><span class='line'># several dozen lines later...
</span><span class='line'>linking shared-object mysql2/mysql2.bundle
</span><span class='line'>ld: library not found for -lssl
</span><span class='line'>clang: error: linker command failed with exit code 1 (use -v to see invocation)
</span><span class='line'>make: *** [mysql2.bundle] Error 1</span></code></pre></td></tr></table></div></figure>


<p>Ew, right? Let&rsquo;s make that better.</p>

<p>By creating a <code>Brewfile</code> with the line <code>brew "mysql"</code>, we can specify that we want to build against a Homebrew-installed MySQL <em>and</em> all of its dependencies. Just by running our command prefixed with <code>brew bundle exec --</code>, for example, <code>brew bundle exec -- gem install mysql2</code>, we can run that command in a build environment that knows exactly how to use its dependencies. Suddenly, everything works&mdash;no messing around with flags, no special options passed to <code>gem install</code>, and no fragile <code>bundle config</code> trickery.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ brew bundle exec -- gem install mysql2
</span><span class='line'>Building native extensions. This could take a while...
</span><span class='line'>Successfully installed mysql2-0.5.3
</span><span class='line'>Parsing documentation for mysql2-0.5.3
</span><span class='line'>Installing ri documentation for mysql2-0.5.3
</span><span class='line'>Done installing documentation for mysql2 after 0 seconds
</span><span class='line'>1 gem installed</span></code></pre></td></tr></table></div></figure>


<p>What exactly does <code>brew bundle exec</code> set? It sets a number of environment variables that are useful across a variety of compilers and buildsystems.</p>

<ul>
<li><code>CC</code> and <code>CXX</code>, the compiler specification flags, point to Homebrew&rsquo;s compiler shims which help ensure that the right flags are passed to the real compiler being used.</li>
<li><code>CFLAGS</code>, <code>CXXFLAGS</code>, and <code>CPPFLAGS</code> ensure that C and C++ compilers know about the header and library lookup paths for all of the <code>Brewfile</code> dependencies.</li>
<li><code>PATH</code> ensures that all of the executables installed by <code>Brewfile</code> dependencies will be found first, before any tools of the same name that may be installed elsewhere on your system.</li>
<li><code>PKG_CONFIG_PATH</code> and <code>PKG_CONFIG_LIBDIR</code> ensure that the <code>pkg-config</code> tool finds <code>Brewfile</code> dependencies.</li>
<li>Buildsystem-specific flags, such as <code>CMAKE_PREFIX_PATH</code>, ensure that buildsystems can make use of the <code>Brewfile</code> dependencies.</li>
</ul>


<p>So the next time you&rsquo;re bashing your head against build failures in your project, give <code>brew bundle exec</code> a try. It might just solve your problems for you!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Hunt for the Lost Phoenix Games]]></title>
    <link href="http://mistys-internet.website/blog/blog/2019/02/20/the-hunt-for-the-lost-phoenix-games/"/>
    <updated>2019-02-20T20:22:52-08:00</updated>
    <id>http://mistys-internet.website/blog/blog/2019/02/20/the-hunt-for-the-lost-phoenix-games</id>
    <content type="html"><![CDATA[<p>Among trashgame fans, Phoenix Games are one of the most infamous game publishers of all time. They&rsquo;re famous for making games of the lowest possible quality for the lowest possible price; their most infamous &ldquo;games&rdquo; <a href="https://youtu.be/j4sMwSWoSJY">were just poorly-animated movies on a PS2 disc</a>. They&rsquo;re the kind of fascinating bad you just can&rsquo;t help but love for their own sake.</p>

<p>In their last year of life, Phoenix started publishing Nintendo DS games. They had an aggressive release schedule; in 2008 they announced plans to release 20 games inside of a year, most of which never came out. <em>Which</em> games did or didn&rsquo;t come out is still hard to figure out. Many online release lists take Phoenix Games&rsquo;s estimated release dates at face value, including many games that probably never came out. I&rsquo;ve put together a list of what I&rsquo;ve been able to confirm in the hopes that anyone with more information can help out.</p>

<p>Out of the 20 games Phoenix announced, only five definitely came out under Phoenix themselves. There are confirmed physical copies of these games and ROM dumps on the internet. Three more were licensed to other publishers after Phoenix&rsquo;s bankruptcy. The other twelve are less certain; I&rsquo;ve divided these into three categories with explanations below. Phoenix declared bankruptcy in early 2009, and I haven&rsquo;t been able to confirm that any games were released after the end of 2008. If you have any information, please contact me via email or <a href="https://twitter.com/mistydemeo">on Twitter</a>!</p>

<p>Definitely released:</p>

<ul>
<li><em>12</em> - NorthPole Studio, 2008</li>
<li><em>Adventures of Pinocchio</em> - Unknown, 2008</li>
<li><em>Peter Pan&rsquo;s Playground</em> - Unknown, 2008</li>
<li><em>Polar Rampage</em> - CyberPlanet, 2008</li>
<li><em>Valentines Day</em> - NorthPole Studio, 2008</li>
</ul>


<p>Released by other publishers:</p>

<ul>
<li><em>Coral Savior</em> - NorthPole Studio; released as <em>Bermuda Triangle: Saving the Coral</em> by Storm City Games, in North America, 2010; and as <em>Bermuda Triangle</em> by Funbox Media, in Europe, 2011</li>
<li><em>Hoppie II</em> - CyberPlanet; released as <em>Hoppie</em>, by Maximum Family Games, in North America, 2011</li>
<li><em>Veggy World</em> - CyberPlanet; released by Maximum Family Games, in North America, 2011</li>
</ul>


<p>Possibly released:</p>

<p>These games had finalized cover art and announced release dates. The games in this category have been found in some online store catalogues, suggesting they may have either been released or at least been offered to stores.</p>

<ul>
<li><em>Jungle Gang: Perfect Plans</em> - Listed in an <a href="https://www.mcvuk.com/business/whats-new-october-17th-08">MCV article</a> as having been released October 17, 2008. Listed in a few online store catalogues as sold out.</li>
<li><em>Love Heart</em> - Reviewed by Polish site <a href="https://www.gry-online.pl/gry/love-heart/zb24cb">GRYOnline.pl</a>, who gave it a 2008 release date.</li>
</ul>


<p>Probably not released:</p>

<p>Included in release lists, but with no store pages to indicate they were ever actually released. Some of these games were given multiple release dates, probably because they were delayed and solicited multiple times.</p>

<ul>
<li><em>Dalmatians 4</em> - Listed in a <a href="http://playstationcollecting.com/forum/messageview.cfm?catid=76&amp;threadid=73644&amp;StartRow=26">PlayStation Collecting forums</a> catalogue as having been released on February 13, 2009.</li>
<li><em>Lion and the King 3</em> - Listed in a <a href="http://playstationcollecting.com/forum/messageview.cfm?catid=76&amp;threadid=73644&amp;StartRow=26">PlayStation Collecting forums</a> catalogue as having been released on both October 24, 2008 and February 13, 2009.</li>
<li><em>Monster Egg II</em> - Listed in a <a href="http://playstationcollecting.com/forum/messageview.cfm?catid=76&amp;threadid=73644&amp;StartRow=26">PlayStation Collecting forums</a> catalogue as having been released on both November 28, 2008 and March 13, 2009.</li>
<li><em>Rat-A-Box</em> - Listed in an <a href="https://www.mcvuk.com/business/whats-new-october-17th-08">MCV article</a> as having been released October 17, 2008, but no store catalogue entries.</li>
</ul>


<p>Definitely not released:</p>

<ul>
<li><em>Cinderella&rsquo;s Fairy Tale</em> - Final cover art, but never included in any release lists.</li>
<li><em>Greatest Flood</em> - No cover art or homepage.</li>
<li><em>Iron Chef II</em> - Final cover art, but never included in any release lists.</li>
<li><em>Monster Dessert</em> - No cover art or homepage.</li>
<li><em>Princess Snow White</em> - No cover art or homepage.</li>
<li><em>War in Heaven</em> - Final cover art, but never included in any release lists.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Recover From FileVault Doom]]></title>
    <link href="http://mistys-internet.website/blog/blog/2018/10/18/recover-from-filevault-doom/"/>
    <updated>2018-10-18T19:19:04-03:00</updated>
    <id>http://mistys-internet.website/blog/blog/2018/10/18/recover-from-filevault-doom</id>
    <content type="html"><![CDATA[<p>Just upgraded to Mojave? Getting the 🚫 of doom when you try to boot your Mac? Worried you&rsquo;ve lost all your data? <strong>Don&rsquo;t panic!</strong> Your Mac is recoverable, and this guide will help you get your drive back in shape.</p>

<p>This bug is caused by a new limitation introduced for FileVault encryption. As mentioned in the <a href="https://developer.apple.com/documentation/macos_release_notes/macos_mojave_10_14_release_notes">release notes</a>, any APFS volume encrypted using Mojave&rsquo;s implementation of FileVault becomes invisible to earlier versions of macOS until the encryption or decryption process completes. By itself, this is mostly harmless. However, this is rendered much worse by a bug<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> that causes the Mojave installer to incorrectly trigger reencryption of an already-encrypted FileVault volume during the install process. Because full drive encryption cannot complete in a timely manner, the Mojave installer times out before encryption completes. The result is a macOS 10.13 (or older) install with a macOS 10.14 FileVault volume that hasn&rsquo;t yet finished encrypting. This volume can&rsquo;t be read by the startup manager, and so becomes unbootable.</p>

<p>In order to fix this issue, we&rsquo;ll need to let your drive finish encrypting. This guide will walk you through the process. Before we get started, make sure you have the right supplies on hand. You&rsquo;ll need a USB key with a capacity of at least 8GB and access to a working Mac.</p>

<h3>1. Create a USB Mojave installer</h3>

<p>Download the Mojave installer application on your working Mac using the App Store. Don&rsquo;t worry, you won&rsquo;t be using that to install - just to create a USB installer for your broken Mac. Then follow <a href="https://support.apple.com/en-us/HT201372">these instructions from Apple</a>. This will erase your USB key and turn it into a bootable USB install drive.</p>

<h3>2. Boot into the USB installer</h3>

<p>Turn off the Mac that you&rsquo;re upgrading to Mojave and insert the USB drive you just created. Turn the Mac back on, and immediately hold the option key until the Startup Manager appears. Select the drive you created.</p>

<h3>3. Enter Recovery Mode</h3>

<p>The installer should provide an option to enter Recovery Mode. This is a different version of Recovery Mode than the one that&rsquo;s built into your Mac, and it will be able to see your hard drive.</p>

<h3>4. Unlock and mount your hard drive</h3>

<p>From the Utilities menu in the menu bar at the top of the screen, select Terminal. At the terminal prompt, type <code>diskutil apfs list</code>. In the output, you should see one volume whose FileVault status is listed as <code>FileVault: in progress</code> (or something similar). If it&rsquo;s already listed as &ldquo;unlocked&rdquo;, then you&rsquo;re good - proceed to step 5.</p>

<p>Unlock this drive by running <code>diskutil apfs unlockVolume DISK</code>, where <code>DISK</code> is the identifier listed after <code>APFS Volume Disk</code> for the volume you identified in the previous paragraph. For example, you might type <code>diskutil apfs unlockVolume disk1s1</code>. This command will ask you for a password; enter either your personal login password or a FileVault recovery key.</p>

<p>Once you&rsquo;ve completed this step, exit Terminal. You should be returned to the Recovery Mode main menu.</p>

<h3>5. Create a new APFS volume</h3>

<p>Open Disk Utility from the Recovery Mode main menu. Select your main hard drive, right-click it, and choose &ldquo;Add APFS Volume&rdquo;. Don&rsquo;t worry, this isn&rsquo;t repartitioning your drive - this new volume will live in the same APFS container as your main volume, and you will be able to delete it later. Name it &ldquo;Temporary&rdquo;, then click &ldquo;Add&rdquo;.</p>

<p><img src="http://mistys-internet.website/blog/images/mojave/1_volume.png" width="264" title="Creating a volume" ></p>

<p><img src="http://mistys-internet.website/blog/images/mojave/2_volume.png" width="442" title="Volume options" ></p>

<p>Once you&rsquo;ve completed this step, exit Disk Utility. You should be returned to the Recovery Mode main menu. Return to the main Mojave installer menu.</p>

<h3>6. Install Mojave on the new volume</h3>

<p>Choose to install Mojave on the volume named &ldquo;Temporary&rdquo; that you created in the previous step. This will be a temporary fresh install, so don&rsquo;t worry too much about it - go ahead and keep all the options at their defaults.</p>

<h3>7. Allow drive encryption to finish</h3>

<p>Once your install has finished and you&rsquo;ve logged in, open Disk Utility. Unlock your normal hard drive by clicking on it and clicking the &ldquo;mount&rdquo; button in the toolbar. You will be asked for a password; as in step 4, you will need to provide either your login password (from your old Mac install) or your recovery key.</p>

<p>At this point, drive encryption will start back up again. You will need to wait for that to finish; this could take a few hours. If you want to check on the progress, you can open a terminal and look at the output of <code>diskutil apfs list</code>; encryption is finished when you run that command and see the status read <code>FileVault: Yes</code>.</p>

<h3>8. Upgrade your main drive to Mojave</h3>

<p>Now you&rsquo;re finally (finally!) ready to let the Mojave install complete. Open up your USB drive in the Finder and run the macOS installer app. This time, instead of picking the temporary volume, pick your original Mac volume. Let the installer complete. This time, your Mac should boot up like normal. You&rsquo;re saved!</p>

<h3>9. Clean up</h3>

<p>Now that your main Mac volume is bootable again, you can get rid of the temporary volume we created in step 5. To do that, open Disk Utility and right-click the volume named &ldquo;Temporary&rdquo;. Click &ldquo;Delete APFS Volume&rdquo;, then confirm by clicking &ldquo;Delete&rdquo; again.</p>

<p><img src="http://mistys-internet.website/blog/images/mojave/3_volume.png" width="205" title="Deleting the temporary volume" ></p>

<h3>You&rsquo;re done!</h3>

<p>With any luck, you should be all set at this point. Feel free to reach out to me if you&rsquo;re still having trouble! Big thanks to <a href="https://twitter.com/mikeymikey">@mikeymikey</a> for the suggestion of using a second APFS volume to install the temporary macOS 10.14. Thanks also to my manager, who allowed me to use my internal blog post as the basis for this post.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>I wish I could provide more detail about this bug, but as far as I know no details have been published. I&rsquo;ve filed a radar but haven&rsquo;t received a response. All I know is that this occurs, seemingly at random, to a small percentage of people who try to upgrade from macOS 10.13.6 to macOS 10.14 when FileVault is already enabled.<a href="#fnref:1" rev="footnote">&#8617;</a></p></li>
</ol>
</div>

]]></content>
  </entry>
  
</feed>
