NES Architecture

A practical analysis by Rodrigo Copetti

Classic edition - Last updated: November 20, 2021

Languages available: 🇬🇧 - English, 👋 - Add translation


About this edition

The ‘classic’ edition is an alternative version to the ‘modern’ counterpart. It doesn‘t require Javascript, state-of-the-art CSS or convoluted HTML to work, which makes it ideal for eBooks users, legacy internet browsers or readers who use accessibility tools.

This edition is identical content-wise. However, interactive widgets have been simplified to work with pure HTML, though these will offer an link to the original article in case the reader wants to try the ‘full version’.

As always, this article is available on Github to enable readers to report mistakes or propose changes. There‘s also a supporting reading list available to help understand the series. The author also accepts donations to help improve the quality of current articles and upcoming ones.


Table of Contents

  1. Supporting imagery
  2. A quick introduction
  3. Models and variants
  4. CPU
    1. A bit of context
      1. Ricoh’s licensing enigma
      2. Scrapped functions
    2. Memory
      1. Cartridge/game data
      2. Going beyond existing capabilities
  5. Graphics
    1. Organising the content
    2. Constructing the frame
      1. Tiles
      2. Background Layer
      3. Sprite Layer
      4. Result
    3. Secrets and limitations
      1. Multi-Scrolling
      2. Tile-Swapping
      3. Curious behaviour
  6. Audio
    1. Functionality
      1. Pulse
      2. Triangle
      3. Noise
      4. Sample
    2. Secrets and limitations
      1. Extra Channels
      2. Tremolo
  7. Games
    1. The alternative medium
  8. Anti-piracy and region lock
  9. That’s all folks
  10. Referencing
  11. Sources / Keep Reading
  12. Contributing
  13. Changelog

Supporting imagery

Model

Image
The NES
Released on 18/10/1985 in America and 01/09/1986 in Europe
Image
The Famicom
Released on 15/07/1983 in Japan

Motherboard

Image
Motherboard
Showing the 'NES' variant
Image
Motherboard with important parts labelled

Diagram

Image
Main architecture diagram

A quick introduction

At first glance, the NES appears to be just another 6502 computer, with a sophisticated case and a controller.

And while this is technically true, let me show you why the CPU is not the central part of this system.


Models and variants

Nintendo ended up shipping lots of different variants of the same console across the world [1] and even though they all share the same architecture, many look dramatically different and some may include built-in accessories. So, to keep it simple for this article, I’ll focus on the two most popular revisions:

Because the author grew up with the ‘NES’ name, I’ll default to using that term to refer to the console in general, but I will switch to the ‘Famicom’ name when referring to unique capabilities only found in the Japanese variant.


CPU

The NES’s CPU is a Ricoh 2A03 [3], which is based on the popular 8-bit MOS Technology 6502 and runs at 1.79 MHz (or 1.66 MHz in PAL systems).

A bit of context

The CPU market in the late 70s and early 80s was quite diverse. If a company wanted to build an affordable microcomputer, the following options were available:

As if these options weren’t enough, another company named MOS appeared on the market and offered a redesigned version of the 6800: the 6502. While incompatible with the rest, the new chip was much much less expensive to produce [5][6] and it was only a matter of time before the most famous computer makers (Commodore, Apple, Atari, Acorn and so forth) chose the 6502 to power their machines.

Back in Japan, Nintendo needed something inexpensive but familiar to develop for, so they selected the 6502. Ricoh, their CPU supplier, successfully produced a 6502-compatible CPU.

Ricoh’s licensing enigma

How Ricoh managed to clone the 6502 isn’t clear to this day. One would expect MOS to have licensed the chip design to Ricoh, but there are many contradictions to this:

Scrapped functions

The Ricoh 2A03 omits the Binary-Coded Decimal (BCD) mode originally included in the 6502 [10]. BCD encodes each decimal digit of a number as a separate 4-bit binary. The 6502 uses 8-bit ‘words’ – meaning that each word stores two decimal digits.

As an example for the curious, the decimal number 42 is represented as:

We could go on and on talking about it, but to give an outline: BCD is useful for applications that require treating each decimal place separately (for instance, a digital clock). However, it requires more storage since each word can only encode up to the decimal number 99 (whereas traditional binary can encode up to 255 with a eight-bit word).

In any case, Ricoh deliberately broke BCD mode in its chip by severing the control lines that activate it. This was presumably done to avoid paying royalties to MOS, since BCD was patented by them (and the legislation that enabled to copyright integrated circuit layouts in the United States wasn’t enacted until 1984 [11]).

Memory

Both Ricoh 2A03 and MOS 6502 feature an 8-bit data bus and a 16-bit address bus, which allows them to access up to 64 KB of memory. So, how did Nintendo fill that memory space?

On one side, the motherboard contains a chip providing 2 KB of Static RAM (SRAM) [12]. Nintendo calls this area ‘Work RAM’ (WRAM) and can be used to store:

On the other side, the components of the system are memory-mapped [13], meaning that they are accessed using memory addresses, thus, they occupy part of the CPU’s address space. The CPU’s memory space is therefore filled with addresses pointing to the game cartridge, WRAM, the PPU, the APU and two controllers (don’t worry about each component, as they are explained throughout this article).

Cartridge/game data

Just in case you don’t know, NES games are distributed in the form of cartridges, and the cartridge’s buses connect directly to the CPU.

Nintendo wired up the cartridge lines in a way that only 49120 Bytes (~ 49.97 KB) of cartridge data can be accessed [14]. Now, what do I mean with ‘cartridge data’? Well, any chip connected to those buses, for instance:

The fact there are different combinations comes down to the fact the CPU doesn’t care about what kind of component is reading from, it only sees memory locations. So is up to game studios to choose (or come up with) a feasible layout to fit their game in.

Image
PCB of Super Mario Bros [15].
Image
The same picture with important parts labelled.
The meaning of the ‘Lockout’ chip is explained in the ‘Anti-piracy’ section.

For example, Nintendo’s ‘Super Mario Bros’ used a layout they call NES-NROM-256 and consists of 32 KB of program ROM and 8 KB of ‘Character RAM’ for graphics (we’ll see more about it in the ‘Graphics’ section) [16]. NES-NROM-256 was also prepared to house up to 3 KB of extra WRAM, though the game doesn’t make use of it.

Going beyond existing capabilities

One of the big limitations of 16-bit address buses (affecting 3rd and 4th generation consoles) is their compact address space. Nowadays, 32-bit computers can address up to 4 GB of memory (and 64-bit machines lavishly enjoy up to 16 exabytes) so this is not a concern anymore, but back then the NES only had a 64 KB address space and a great part of it was consumed by the memory-mapped hardware (something competitors avoided).

So, did this mean game studios could only develop games that didn’t exceed the 49.97 KB limit? Absolutely not! If history taught us something, is that there’s always a clever solution to a challenging problem; and this issue was tackled with a Mapper.

Image
Simplified representation of how a mapper extends the addressing capabilities of the CPU.
With the inclusion of a mapper, the CPU can access extra banks (groups of addresses) of a large Program ROM. Although the game/program has the new task of manually switching between banks whenever needed.
Image
The same setup but without a mapper installed.
While simpler and inexpensive, the CPU can only access a finite number of banks.

A mapper is an extra chip included in the cartridge that sits between the memory chips and the console’s address lines. Its main job is to extend the address space so developers may fit more chips. This is done by bank switching: Memory addresses are grouped into banks, and the mapper provides switches (controlled through memory addresses) to alternate between banks. Now, the CPU still sees the same amount of memory, so it’s the game that’s been programmed with a mapper present in charge of operating it. Due to their cost-effectiveness, mappers were the order of the day in 80s-to-early 90s technology.

Image
PCB of Super Mario Bros 3 [17].
Image
The same picture with important parts labelled.
At first, I thought the extra WRAM was for storing saves, but then I realised there’re no saves in this game (and there isn’t a battery either). In reality, that RAM chip is used to store a decompressed level.

Back to the NES, a famous example is ‘Super Mario Bros 3’ which shipped with the ‘MMC3’ mapper (made by Nintendo) in its cartridge. For comparison, MMC3 provided up to 512 KB of space for the Program ROM, up to 256 KB for Character memory and up to 8 KB for extra WRAM [18]. You can now see why ‘Super Mario Bros 3’ differs significantly in quality compared to the first instalment.

All in all, while this console may appear limited while examining its internal features, Nintendo made sure it could adapt as technology evolves. On the other side, while this technique helped to keep the costs down of the console, it shifted part of the burden to the game cartridge. So, game quality and cartridge costs were two concerns game studios had to balance.


Graphics

Graphics are generated by a proprietary chip called the Picture Processing Unit (PPU). This is one of the chips that give the NES an identity. To put it in another way, anyone can pick up a 6502 CPU from the hardware store, so why is the NES any different from, say, an Apple 2 or a Commodore 64? Well, what distinguishes the NES from other machines are the chips that surround the CPU: The PPU and the APU. These make up the NES' unique graphics and audio capabilities, respectively.

That being said, the PPU renders 2D graphics called sprites and backgrounds, outputting the result to the video signal.

Organising the content

Image
Memory architecture of the PPU

To render something on the screen, the PPU must know what graphics to draw, where in the screen to place them; and how to draw them (i.e. which palette to use).

To answer these questions, the PPU came pre-programmed with a different memory map that looks for the following type of data:

Don’t worry about the new terminology, the meaning of these data structures is discussed step-by-step in the following paragraphs.

Constructing the frame

As with its contemporaries, this chip is designed for the behaviour of a CRT display. There is no frame buffer as such: the PPU will render in step with the CRT’s beam, building the image on-the-fly.

The PPU draws frames with a fixed dimension of 256x240 pixels [19]. Alas, due to the discrepancies of analogue video standards across the world, the image will differ in appearance depending on the region of the appliance (NTSC or PAL) from which is displayed. In a nutshell, NTSC televisions will crop the top and bottom edges to accommodate overscan (only ~224 scanlines are visible), so these edges are considered ‘danger zones’ by developers when deciding where to place elements on the game. On the other hand, PAL tellies won’t crop the edges but will show extra black bars to fill the taller signal (PAL uses 288 scanlines).

Behind the scenes, the frame that the PPU outputs is composed of two different layers. For demonstration purposes, let’s use Super Mario Bros. to show how this works:

Tiles

Image
Two pattern tables with multiple tiles squashed together.
Image
A single tile.
Tiles Found in Character ROM.
(For demonstration purposes a default palette is being used).

To begin with, the PPU uses tiles as a basic ingredient for producing sprites and backgrounds.

The NES defines tiles as basic 8x8 pixel maps, these are stored in Character memory (residing in the game cartridge) and organised into a big data structure called Pattern Table [20]. Each tile occupies 16 B and a Pattern table houses 256 tiles [21]. Since the PPU addresses up to 8 KB of Character memory, it can access up to two Pattern tables.

Inside a tile, each of its pixels is encoded using a 2-bit value, which references one of four colours from a palette. Programmers can define up to eight palettes (four for background and the other for sprites). The colours referenced on each palette point to a ‘master palette’ made of 64 colours [22], representing all the colours this console can produce. Palettes are made of four colours, though one is reserved for transparent.

To start drawing something on the screen, games populate a set of tables with references to tiles in Character memory. Each table is responsible for one layer (sprite or background) of the frame. Then, the PPU reads from those tables and composes the scanlines that will be beamed by the CRT gun.

I will now explain how each layer/table works and how do they differ in terms of functionality.

Background Layer

Image
Allocated background map.
Image
Allocated background map with selected area marked.
Background map set up with vertical mirroring, which enables smooth horizontal scrolling. However, only one half can be used.

The background layer is a 512x480 pixel map containing static tiles [23]. You may recall that the viewable frame is much smaller, so the game decides which part of the layer is selected for display. Games can also move the viewable area during gameplay; that’s how the scrolling effect is accomplished.

To save memory, groups of four tiles are combined into 16x16 pixel maps called blocks, within which all tiles share a colour palette.

Nametables (stored in VRAM) specify which tiles to display in the background layer. The PPU looks for four 1024-byte Nametables, each one corresponding to a quadrant of the layer. However, there’s only 2 KB of VRAM available! As a consequence, only two Nametables can be stored without additional hardware from the cartridge. Though the remaining two still have to be addressed somewhere: most games just point the remaining two where the first two are (this is called mirroring).

While this architecture may seem flawed at first, it was designed to keep costs down while providing simple expandability: if games needed a wider background, extra VRAM could be included in the cartridge.

The last bytes of each Nametable store a 64-byte Attribute table that specifies which colour palette is assigned to each block [24].

Sprite Layer

Image
Rendered sprite layer

Sprites are tiles that can move around the screen. They can also overlap each other, or appear behind the background. The viewable graphic will be decided based on its priority value (it’s the same concept as ‘layers’ in traditional graphic design software).

The Object Attribute Memory (OAM) table specifies which tiles will be used as sprites [25]. In addition to the tile index, each entry contains an (x,y) position and multiple attributes (colour palette, a priority and flip flags). This table is stored in a 256-byte DRAM found in the PPU chip.

The OAM table can be filled by the CPU. However, this can be pretty slow in practice (and risks corrupting the frame if not done at the right time), as a consequence, the PPU contains a small component called Direct Memory Access or ‘DMA’ which can be programmed (by altering the PPU’s registers) to fetch the table from WRAM. With DMA, it’s guaranteed that the table will be uploaded when the next frame is drawn, but bear in mind that the CPU will be halted during the transfer!

The PPU is limited to eight sprites per scanline and up to 64 per frame. The scanline limit can be exceeded thanks to the PPU’s multiplexing skills. In other words, the PPU will automatically alternate sprites between scans; however, they will appear to flicker on-screen.

Result

Image
Tada!

Once the frame is finished, it’s time to move on to the next one!

However, the CPU can’t modify any table that’s being used by the PPU, so when all scanlines are completed the vertical blank interrupt is triggered. This allows the game to update the tables without tearing the picture currently displayed. At that moment the CRT’s beam is pointing below the visible area of the screen, into the overscan (or bottom border area).

Secrets and limitations

If you’re thinking that a frame-buffer system with memory allocated to store the full-frame would have been preferable: RAM costs were very high, and the console’s goal was to be affordable. Let me now show you why this design proved to be very efficient and flexible.

Multi-Scrolling

Image
Super Mario Bros. 2
Nametable setup for vertical scrolling (horizontal mirroring)
The character has to climb the mountain. The viewable area is at the bottom, and the PPU has time to render the top.
Image
Super Mario Bros. 3 Mario can run and fly, so the PPU needs to scroll diagonally. Notice the right edge showing the wrong colour palette! The left edge has a mask applied.

Some games require the main character to move vertically – thus the nametable will be set up with horizontal mirroring. Other games need their character to move left and right, and so use vertical mirroring instead.

Either type of mirroring will allow the PPU to update background tiles without the user noticing: there is plenty of space to scroll while new tiles are being rendered at a distance.

But what if the character wants to move diagonally? The PPU can scroll in any direction, but without extra VRAM, the edges are forced to share the same colour palette (remember that tiles are grouped in blocks).

This is why some games like Super Mario Bros. 3 show strange graphics at the right edge of the screen while Mario moves (the game is set up for vertical scrolling) [26]. It’s possible that they needed to minimise the hardware cost per cartridge (as this game has already a powerful mapper installed).

As an interesting fix: the PPU allowed developers to apply a vertical mask on top of tiles, effectively hiding part of the glitchy area.

Tile-Swapping

Image
Hypothetical if it were rendered using tiles available during the first scan lines.
Image
Hypothetical if it were rendered using tiles available during the later scan lines.
Image
Actual frame displayed to the user.

Another speciality of Super Mario Bros. 3 is the amount of graphics it could display.

This game displays more background tiles than is strictly permitted. So how is it doing that? If we take two screen captures at different times while the display is generated, we can see that the final frame is actually composed of two different frames.

This is another wizardry of the MMC3 mapper, which was not only used to access extra space in the Program ROM, but also extends the Character ROM space by connecting two different Character chips. By checking which part of the screen the PPU is requesting, the mapper will redirect to one chip or the other – thus allowing more unique tiles on-screen than was originally supported [27].

Curious behaviour

Throughout my research, I came across many interesting articles that explain unusual behaviour of the PPU, so I thought in mentioning some here:


Audio

A dedicated component called Audio Processing Unit (APU) provides this service [32]. Ricoh embedded it inside the CPU chip presumably to avoid unlicensed cloning of both CPU and APU.

Functionality

This audio chip is a Programmable Sound Generator (PSG), which means that it can only produce pre-defined waveforms. It’s controlled by the CPU.

The APU sequences audio data over five channels of audio – each one reserved for a specific waveform. The music data is found in the Program ROM. Each waveform contains different properties that can be altered to produce a specific note, sound or volume. These five channels are continuously mixed and sent through the audio signal.

Moreover, the Famicom model contains cartridge pins that send the mixed audio signal to the cartridge, so the latter can mix it with extra channels (requiring extra chips) [33].

Let’s now discuss the type of wave-forms synthesised by the APU [34]:

Pulse

Pulse 1 channel.
Pulse 2 channel.
All audio channels.
Mother (1989).

Pulse waves have a very distinct beep sound that is mainly used for melody or sound effects.

The APU reserves two channels for pulse waves. Each can use one of three different voices, produced by varying their pulse widths.

Most games used one pulse channel for melody and the other for accompaniment.

When the game needs to play a sound effect, the accompaniment channel is switched to play the effect and then returns to accompanying. This avoids interrupting the melody during gameplay.

Triangle

Triangle channel.
All audio channels.
Mother (1989)

This waveform serves as a bassline for the melody. Modifying its pitch dramatically can also produce percussion.

The APU has one channel reserved for this type of wave.

The volume of this channel can’t be controlled, possibly because the volume control is used to construct the triangle.

Noise

Noise channel.
All audio channels.
Mother (1989)

Noise is basically a set of random waveforms that sound like white static. One channel is allocated for it.

Games use the noise channel for percussion or ambient effects.

This channel has only 32 presets available. Half (16) of these presets produce clean static, and the other half produce robotic static.

Sample

Sample channel.
All audio channels.
Mother (1989)

Samples are recorded pieces of music that can be replayed. As you can see, samples is not limited to a single waveform, but they weight a lot more space.

The APU has one channel dedicated to samples. With the APU, samples are limited to 7-bit resolution (encoded with values from 0 to 127) and ~15.74 KHz sampling rate [35]. To program this channel, games can either stream 7-bit values (which steals a lot of cycles and storage) or use delta modulation to only encode the variation between the next sample and the previous one.

The delta modulation implementation in the APU only receives 1-bit values, this means games can only tell if the sample increments or decrements by 1 every time the counter kicks in. So, at the cost of fidelity, delta modulation can save games from having to stream continuous values to the APU.

Since programming this channel takes longer space and CPU cycles, games normally store small pieces (like drum samples) that can be played repeatedly. But throughout the lifetime of the NES, many studio have come up with clever uses of this channel.

Secrets and limitations

While the APU was not comparable to the quality of a vinyl, cassette or CD, programmers did find ways of expanding its capability, thanks to the modular architecture of the NES.

Extra Channels

Castlevania III (USA/Europe, 1989)
Akumajō Densetsu (Japan, 1989)

Remember that the Famicom provided exclusive cartridge pins available for sound expansion? Well, games like Castlevania 3 took advantage of this and bundled an extra chip called Konami VRC6, which added two extra pulse waves and a sawtooth wave to the mix.

Take a look at this video which shows the difference between the Japanese version and the American versions the game (the latter runs on the NES, which didn’t provide sound expansion capabilities).

Tremolo

Final Fantasy III (1990)

Instead of incrementing cartridge costs, some games prioritised creativity over technology to add more channels.

In this example, Final Fantasy III came up with the idea of using tremolo effects to ‘simulate’ extra channels.


Games

NES games are mainly written in the 6502 assembly language and reside in the Program ROM, while the game’s graphics (tiles) are stored in Character memory.

Furthermore, games were sold (or rented out) at retail stores under the approval of Nintendo.

The alternative medium

Even though it was only released in Japan, I thought this would be a good opportunity to introduce a short-lived but peculiar add-on that, just like the mappers, brought in more capabilities to this console. This peripheral was called Famicom Disk System (FDS) and shipped in 1986 (~3 years after the Famicom). It had the shape of an external floppy reader and came bundled with an odd-shaped cartridge called ‘RAM adapter’.

Image
The drive, where floppies are inserted (the photo shows a cardboard floppy inserted for protection). It either runs on six C batteries (1.5 V each) or a 3.6 W AC adapter.
Image
The RAM adapter, fitted on the cartridge slot of the Famicom and connected to the drive though a cable.
The two components that make up the Famicom Disk System (FDS).

The Famicom Disk System added the following services to the Famicom:

Image
FDS equipment mounted onto the Famicom.

Since the floppy is a single medium (as opposed to the multi-chip cartridges), all game data needs to be squashed inside, though it’s kept organised with the use of a proprietary file system.

Nonetheless, the Famicom/NES strictly requires segregated Program and Character memory to function, so that’s the job of the ‘RAM adapter’ to sort out. This component houses 32 KB of Program RAM and 8 KB of Character RAM to buffer the game data read from the floppy disc, and in doing so, it enables the console to read from it as if it were a cartridge-based game.

To operate the drive, the RAM adapter embeds an additional 8 KB ROM to store a BIOS [38]. This program performs the following tasks:

Image
Example of two retail games for the FDS.
The blue ‘flavour’ of the disc was also dust proof.
The FDS splash animation, waiting for the user to… uhm… insert a game.

During that era, Nintendo deployed some ‘kiosks’ at retail stores so users could bring their floppies and overwrite them with a new game at a reduced price.

Unfortunately, after a few years of lifespan, the Famicom Disk System was discontinued and subsequent games returned to the cartridge medium. On the bright side, there were new mappers available with similar (or better) capabilities compared to the FDS' functions.


Anti-piracy and region lock

Nintendo was able to block unauthorised publishing thanks to the inclusion of a proprietary lockout chip called the Checking Integrated Circuit (CIC) [39]. It is located in the console and is connected to the reset signal (and thus not easily removed).

This chip runs 10NES, an internal program that checks for the existence of another lockout chip in the game cartridge. If that check fails then the console is sent into an infinite reset.

Both lockout chips are in constant communication during the console’s uptime. This system can be defeated by cutting one of the pins on the console’s lockout, which leaves the chip in an idle state. Alternatively, sending it a -5V signal can freeze it.

The CIC exists as a result of the fear caused by the video game crash of 1983. Nintendo’s then-president Hiroshi Yamauchi decided that in order to enforce good quality games, they would be in charge of approving every single one of them [40].

You’ll notice that the Japanese model of the console, the Famicom, was released before 1983’s crash. That’s why neither game cartridges nor the console includes CIC circuitry [41], instead, the pins are used for optional sound expansion.


That’s all folks


Contributing

This article is part of the Architecture of Consoles series. If you found it interesting then please consider donating. Your contribution will be used to fund the purchase of tools and resources that will help me to improve the quality of existing articles and upcoming ones.

Donate with PayPal Become a Patreon

A list of desirable tools and latest acquisitions for this article are tracked in here:

Interesting hardware to get (ordered by priority)

Alternatively, you can help out by suggesting changes and/or adding translations.


Referencing

This work is licensed under a Creative Commons Attribution 4.0 International License. You may use it for your work at no cost, even for commercial purposes. But I ask that you reference it properly. Please take a look at the following citation guidelines:

Article information

For any referencing style, you can use the following information:

For instance, using the IEEE style:

[1]R. Copetti, “NES Architecture - A Practical Analysis”, Copetti.org, 2019. [Online]. Available: https://classic.copetti.org/writings/consoles/nes/. [Accessed: day- month- year].

and Harvard style:

Copetti, R., 2019. NES Architecture - A Practical Analysis. [online] Copetti.org. Available at: https://classic.copetti.org/writings/consoles/nes/ [Accessed day month year].

Special use in multimedia (Youtube, etc)

I only ask that you include at least the author’s name, title of article and URL of article, using any style of choice.

You don’t have to include all the information in the same place if it’s not feasible. For instance, if you use the article’s imagery in a Youtube video, you should state either the author’s name or URL of article at the bottom of the image, and then include the complete reference in the video description.

Appreciated additions

If this article has significantly contributed to your work, I would appreciate it if you could dedicate an acknowledgement section, just like I do with the people and communities that helped me.

This is of course optional and beyond the requirements of the CC license, but I think it’s a nice detail that makes us, the random authors on the net, feel part of something bigger.


Sources / Keep Reading

General

CPU

Graphics

Audio

Games

Anti-Piracy

Photography


Changelog

It's always nice to keep a record of changes.

2021-11-10

Super revamp:

2021-04-29

2020-08-23

2020-06-13

2020-06-06

2019-09-17

2019-04-06

2019-02-17

2019-01-25