Swooshable

Let's talk about classification of parts

Sorting your collection is hard, but what about sorting every piece that's ever been produced?

I recently had a need to check where a certain piece was available, and in what variations. Could I build with it in LDraw? And LDD? Is it available on Bricklink? This comparison was a very manual process, but it didn't have to be. So I went down the rabbit hole and began indexing all part data from fun sources.

The result is, of course, Swooshable's parts reference.

These are the resources I use most often and was interested in indexing. They're easily available and up to date:

  • Lego Digital Designer
  • LDraw
  • Mecabricks
  • Bricklink
  • Brickset
  • Rebrickable

I only had a basic understanding of part classification before doing this work. The thing is, it turns out all of these resources use different systems to do the job. How do we consolidate them as well as we can? Do we create our own system, or is there one we can use and adjust the others to?

When we look closer we can narrow it all down to two underlying systems. The first one is created by LEGO themselves - let's call it the "official" system. The other one was created by the community before LEGO started to make theirs open. Let's call it the "unofficial" system. I'm of the opinion that it's easier to use the official part categorization system. It's easier to let the guys creating parts name them and use their name, rather than doing double work by naming the parts again.

Two classification systems

The official system

So what does the official system look like? We can see three defining identifiers and a number of properties. For this exercise we're primarily interested in the identifiers, which are:

  • Category
  • Design ID
  • Element ID

The design ID is a specific mold. For instance: 3004 - a 1x2 brick. Design ID says nothing about colors or materials - just the mold.

The element ID is the unique number assigned to a unique piece variant. A variant consists of a design ID, in a specific material, with a specific pattern. If you want a 3004 in medium lilac you're looking for element ID #4224854. A yellow brick with a particular smiley face is #4224854.

The category is intended to group similar design IDs together - the #3004 is commonly grouped as Bricks.

Pictured: the humble 1x2 brick in white; the same 1x2 brick in medium lilac; and another 1x2 brick, yellow with a smiley face print.

This gives us a clear hierarchy. We have Categories of design IDs with variants of element IDs. Pretty neat, right?
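To make the hierarchy concrete, here is a minimal Python sketch of how it could be modelled. The class and field names (`Category`, `Design`, `Element`, `add_element`) are my own assumptions for illustration, not an official LEGO schema. The only real data is the medium lilac 3004 mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """One variant: a design ID in a specific material, optionally with a pattern."""
    element_id: str
    material: str
    pattern: Optional[str] = None


@dataclass
class Design:
    """One mold, identified by its design ID."""
    design_id: str
    elements: list = field(default_factory=list)


@dataclass
class Category:
    """A group of similar design IDs."""
    name: str
    designs: dict = field(default_factory=dict)

    def add_element(self, design_id: str, element: Element) -> None:
        # Create the design on first use, then attach the variant to it.
        design = self.designs.setdefault(design_id, Design(design_id))
        design.elements.append(element)


bricks = Category("Bricks")
bricks.add_element("3004", Element("4224854", "Medium Lilac"))
```

Names and materials sit on the objects as plain properties, while the three identifiers decide where each object lives in the tree.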

The unofficial system

The unofficial system is pretty similar to the official one on the surface. The goal was to have the same separation:

  • Category
  • Design ID
  • Element ID

But it turns out we as a community didn't know (in some cases we still don't know) all design IDs. This made Bricklink, LDraw (and by extension Rebrickable since they use these sources) focus on giving each variation a unique number - a part ID, if you will.

This part ID is in many cases named after a specific pattern. It goes something like this: [design ID][letter][letter ID]. LDraw, for instance, uses the letter p for pattern. You can see another smiley face as #3004p0b.

The 3004 again. Another smiley face, another number.

Many exceptions exist. How do we handle mold differences? And some sources don't use p for pattern, but d for decoration. The differences are especially apparent for older bricks.

There's a lot of interesting history to read up on about the different part systems and how they came to be. It is not the focus of this article, but if you're interested I recommend these links:

Limits and oddities

The perceptive reader might already have noticed a few things that seem weird. What about name and color? Aren't these pretty distinctive identifiers?

Well, no. When it comes to names there's no real standard, which makes them useless for piece categorization. Everyone names their pieces differently. It doesn't matter if two parts are named "Brick 1x2" and "Linus's mother" - they could still be related. Names are a property, not an identifier.

And colors? Well, as it turns out, colors are really a community construct. TLG doesn't really think about colors - they think about materials. They still name a material after a color (what the community calls "Tan" is Brick Yellow) - but they think of it as a material with a specific ID. This lets them distinguish between different mixes of rubber and plastic. But the material is really just another property - not something that is useful enough to identify a piece on its own.

We've also talked a great deal about part IDs, but hardly anything about part categories. I'll make it short: the categories are handled differently everywhere. A challenge in itself.

Combining the systems

We have three primary challenges:

  1. How do we map parts that have different element IDs?
  2. From the sources that only give us a part ID, how can we figure out the element ID?
  3. What category system can we use?

Mapping element IDs

This is by far the hardest, as there is only one way: mapping them by hand. LDraw provides some data - it never removes a part, but indicates if it has been "moved" to another number. Rebrickable and Bricklink are otherwise good bets. There's a very active community at both places mapping parts and part relations. For my part, I decided to leave this task for the time being.
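The "moved" information is the one piece of this that can be followed mechanically. As a rough sketch, assuming the redirects have already been collected into a dictionary of old number to new number, the chain can be resolved like this. The function name and the sample redirects are hypothetical and made up for illustration.

```python
def resolve_moved(part_id: str, moved_to: dict) -> str:
    """Follow "moved to" redirects until reaching a part that has not moved."""
    seen = {part_id}
    while part_id in moved_to:
        part_id = moved_to[part_id]
        if part_id in seen:
            # Guard against bad data that redirects in a circle.
            raise ValueError(f"redirect loop at {part_id!r}")
        seen.add(part_id)
    return part_id


# Hypothetical redirects, not real LDraw data.
moved_to = {"old1": "old2", "old2": "3004"}
print(resolve_moved("old1", moved_to))  # 3004
```

This only covers renumbering within one source. Matching parts across sources still needs the hand-made mappings.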

Turning part IDs into design ID and element ID

This task, however, can be solved. Given a part number from sources using the unofficial system we can attempt to extract the design ID by these rules:

  • If the part ID starts with a number (0-9) and contains a letter (a-z), the design ID is the number before the letter.
  • If the part ID starts with a letter, the whole part ID is the design ID.

This system isn't perfect, but it handles most cases. I wrote a small script that does just this which I use for LDraw, Rebrickable and Bricklink.
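The two rules can be sketched in a few lines of Python. This is my reading of them, not the actual script: the function name is made up, and I assume that a part ID made up only of digits is already a design ID, which the rules don't state explicitly.

```python
import re


def design_id_from_part_id(part_id: str) -> str:
    """Guess the design ID from an unofficial part ID."""
    part_id = part_id.strip()
    leading_digits = re.match(r"\d+", part_id)
    # Rule 1: starts with a number and contains a letter,
    # so the design ID is the number before the letter.
    if leading_digits and re.search(r"[a-zA-Z]", part_id):
        return leading_digits.group(0)
    # Rule 2: starts with a letter, so the part ID is the design ID.
    # Assumption: an all-digit part ID is also returned unchanged.
    return part_id


print(design_id_from_part_id("3004p0b"))  # 3004
print(design_id_from_part_id("3004"))     # 3004
```

The smiley face brick from earlier, 3004p0b, comes out as 3004, so it ends up under the same design ID as the plain brick.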

Working with categories

Now, categories aren't the easiest bunch either. As said, all sources have different categories. My first attempt was to use Brickset's categories - they were imported directly from LEGO. The thing is, they weren't very usable, and the mapping was not the best thanks to multiple spacing and spelling errors. So I built a new system based on that with a few strategies in mind:

  • We want to use as few genres as possible (Fabuland, Belville, Scala) except when they might affect compatibility (Constraction, Duplo, System, Technic)
  • If possible, start broad and expand when needed
  • If one source mixes pieces that another source might have in two categories, we're forced to put them in the same category

This worked quite well. Aside from some necessary sacrifices - we had to put tiles and plates in the same category, for instance - you can see it in action in the parts reference.
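One way to picture the mapping is a lookup table from each source's category to the shared one. The sketch below is an assumption about how it could be done, not the code behind the parts reference. The source names, source category names and the "Uncategorized" fallback are all hypothetical. Only the decision to merge plates and tiles comes from the text above.

```python
# Hypothetical (source, source category) pairs mapped to a shared category.
CATEGORY_MAP = {
    ("source_a", "Plates"): "Plates and Tiles",
    ("source_a", "Tiles"): "Plates and Tiles",
    # source_b never separates the two, which forces the merge for everyone.
    ("source_b", "Plates & Tiles"): "Plates and Tiles",
    ("source_b", "Bricks"): "Bricks",
}


def unified_category(source: str, category: str) -> str:
    """Look up the shared category for one source's category name."""
    # Collapse repeated spaces first, since spacing errors break exact matches.
    cleaned = " ".join(category.split())
    return CATEGORY_MAP.get((source, cleaned), "Uncategorized")


print(unified_category("source_a", "  Tiles "))         # Plates and Tiles
print(unified_category("source_b", "Plates  & Tiles"))  # Plates and Tiles
```

Starting broad keeps this table short. A category only needs to be split later if every source can tell the two halves apart.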
