Armed with a text editor

mu's views on program and recipe! design

December 2005

Developing Mutagen (3): Data-Driven Metaprogramming

Posted 2005.12.06 21:18 PST

Yesterday I described a lot of the complexities of the ID3v2 frame structures. By grouping frames into text frames, URL frames, and other frames, and then subgrouping appropriately, it's possible to avoid some code redundancy. But considering the number of remaining frames, it's really not enough. It would also be a royal pain to test all the copy-pasted implementations. So what can we do?

FrameSpecs

Mutagen solves this with data-driven metaprogramming on a _framespec. (Don't bother looking for this in the ID3 specifications; it's a Mutagen construct.) This member stores a list of specifiers (specs). Each spec is an instance of a class derived from Spec. Although each spec knows only four things, those four things are enough to provide really rich functionality. It knows:

Using specs

To tie this down, let's look at how to implement a text frame class. First we'll examine the following framespec, which corresponds directly to the text frame structure: one byte specifying the encoding, followed by a payload of text data.

_framespec = [
  EncodingSpec('encoding'),
  EncodedTextSpec('text'),
]

The EncodingSpec knows how to read a byte off the stream and return its interpreted integer value. The EncodedTextSpec knows how to read the text data, decode it through the stored encoding, and return it. To load the value of a single frame, the base class Frame exposes a factory function: give it some information about the tag and the frame, pass it the frame's payload data, and it will return an instance filled with the data from the payload bytestream. The factory function iterates over the specs in the class's framespec, calls each spec's read method, and stores each return value on the frame instance as a data member named after the spec.
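Here's a minimal sketch of that loop. It is not Mutagen's real code: I'm assuming that each spec's read method takes the frame and the unread bytes and returns a (value, remaining bytes) pair, that fromData takes only the payload (the tag and frame information is left out), and that the text runs to the end of the payload. The ENCODINGS table simply follows the four encoding bytes defined by ID3v2.4.

```python
# Simplified sketch; signatures and helper names here are assumptions.
ENCODINGS = ['latin-1', 'utf-16', 'utf-16-be', 'utf-8']  # ID3v2.4 bytes 0-3


class Spec:
    def __init__(self, name):
        self.name = name


class EncodingSpec(Spec):
    def read(self, frame, data):
        # Consume one byte; return it as an integer plus the unread bytes.
        return data[0], data[1:]


class EncodedTextSpec(Spec):
    def read(self, frame, data):
        # Decode the rest of the payload with the encoding that an earlier
        # spec already stored on the frame.
        return data.decode(ENCODINGS[frame.encoding]), b''


class Frame:
    _framespec = []

    @classmethod
    def fromData(cls, data):
        frame = cls()
        for spec in cls._framespec:
            value, data = spec.read(frame, data)
            # Store the value under the spec's name.
            setattr(frame, spec.name, value)
        return frame


class TextFrame(Frame):
    _framespec = [
        EncodingSpec('encoding'),
        EncodedTextSpec('text'),
    ]


frame = TextFrame.fromData(b'\x03Hello')
print(frame.encoding, frame.text)
```

Note that the order of the framespec matters: EncodedTextSpec can only decode because EncodingSpec ran first and left frame.encoding behind.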

MultiSpec

Since the spec receives the frame whose data it is reading or writing, you might wonder why it doesn't store the value directly. The reason: flexibility. The framespec above isn't sufficient. While most text frames may have a single value, the ID3v2.4 spec allows for multiple strings separated by NULLs in most cases. In other scenarios it allows for multiple pairs of such strings. Rather than understand the NULLs in client code, or build list behavior into each relevant spec, the indirection allows us to put all the logic in one spec.

MultiSpec is a meta-spec. It doesn't know how to read from or write to a bytestream, though it does have a name. It is the one spec that stores multiple values as a list. When it's asked to read a bytestream, it iterates over its sub-specs calling their read methods as if it were a Frame, as many times as is necessary to exhaust the available data. So using MultiSpec, the real TextFrame._framespec is:

_framespec = [
  EncodingSpec('encoding'),
  MultiSpec('text', EncodedTextSpec('text'), sep=u'\u0000'),
]
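To make the "as many times as is necessary" part concrete, here's a rough sketch of a MultiSpec read loop. Again this is an illustration under assumptions, not Mutagen's implementation: read returns (value, remaining bytes), EncodedTextSpec stops at a NUL terminator, only the single-byte-terminated encodings (Latin-1 and UTF-8) are handled, and the sep argument is accepted but not used when reading.

```python
# Simplified sketch; only encodings with a one-byte NUL terminator.
ENCODINGS = {0: 'latin-1', 3: 'utf-8'}


class Spec:
    def __init__(self, name):
        self.name = name


class EncodingSpec(Spec):
    def read(self, frame, data):
        return data[0], data[1:]


class EncodedTextSpec(Spec):
    def read(self, frame, data):
        # Read up to the next NUL (or the end of the data) and decode it.
        text, _, rest = data.partition(b'\x00')
        return text.decode(ENCODINGS[frame.encoding]), rest


class MultiSpec(Spec):
    def __init__(self, name, *specs, sep=None):
        super().__init__(name)
        self.specs = specs
        self.sep = sep  # stored only; this sketch does not use it for reading

    def read(self, frame, data):
        values = []
        while data:
            # One pass over the sub-specs produces one record.
            record = []
            for spec in self.specs:
                value, data = spec.read(frame, data)
                record.append(value)
            # A single sub-spec yields plain values, several yield lists.
            values.append(record[0] if len(self.specs) == 1 else record)
        return values, data


class Frame:
    _framespec = []

    @classmethod
    def fromData(cls, data):
        frame = cls()
        for spec in cls._framespec:
            value, data = spec.read(frame, data)
            setattr(frame, spec.name, value)
        return frame


class TextFrame(Frame):
    _framespec = [
        EncodingSpec('encoding'),
        MultiSpec('text', EncodedTextSpec('text'), sep=u'\u0000'),
    ]


frame = TextFrame.fromData(b'\x03Rock\x00Pop')
print(frame.text)
```

Because the sub-specs return their values instead of storing them on the frame, MultiSpec is free to collect them into a list, which is exactly the flexibility the indirection buys.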

Reading a frame

Now when SomeTextFrame.fromData is called to read a frame, the process will be as follows:

Writing a frame

Writing a frame has symmetrical steps:
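As a rough sketch of that symmetry, assume each spec also has a write method that takes the frame and the stored value and returns bytes, and that the frame joins those bytes in framespec order. The writeData name and both signatures are hypothetical, chosen only for this illustration, and the text is again assumed to run to the end of the payload.

```python
# Simplified round-trip sketch; method names and signatures are assumptions.
ENCODINGS = ['latin-1', 'utf-16', 'utf-16-be', 'utf-8']  # ID3v2.4 bytes 0-3


class Spec:
    def __init__(self, name):
        self.name = name


class EncodingSpec(Spec):
    def read(self, frame, data):
        return data[0], data[1:]

    def write(self, frame, value):
        # The mirror image of read: one integer becomes one byte.
        return bytes([value])


class EncodedTextSpec(Spec):
    def read(self, frame, data):
        return data.decode(ENCODINGS[frame.encoding]), b''

    def write(self, frame, value):
        # Encode with the same encoding the frame was read with.
        return value.encode(ENCODINGS[frame.encoding])


class Frame:
    _framespec = []

    @classmethod
    def fromData(cls, data):
        frame = cls()
        for spec in cls._framespec:
            value, data = spec.read(frame, data)
            setattr(frame, spec.name, value)
        return frame

    def writeData(self):
        # Walk the same framespec, pulling each value off the frame by name.
        return b''.join(
            spec.write(self, getattr(self, spec.name))
            for spec in self._framespec
        )


class TextFrame(Frame):
    _framespec = [
        EncodingSpec('encoding'),
        EncodedTextSpec('text'),
    ]


payload = b'\x03Caf\xc3\xa9'
frame = TextFrame.fromData(payload)
print(frame.writeData() == payload)
```

The same list drives both directions, so a frame that can be read can be written without any frame-specific code.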

Extensibility

Doing things this way makes it really easy to add additional frames to Mutagen. If the frame structure is composed of already specified pieces, just reuse the specs. If there's a new piece, implement it as a new spec. List out the specs in the order they appear in the frame, and not only does it just work, it's practically documented. For instance the PairedTextFrames (TIPL and TMCL) have

_framespec = [
  EncodingSpec('encoding'),
  MultiSpec('people',
    EncodedTextSpec('involvement'),
    EncodedTextSpec('person'))
]

TXXX has

_framespec = [
  EncodingSpec('encoding'),
  EncodedTextSpec('desc'),
  MultiSpec('text', EncodedTextSpec('text'), sep=u'\u0000')
]

URL frames (other than WXXX, which has an encoded text description) have just

_framespec = [
  Latin1TextSpec('url')
]

Even APIC is almost understandable just by reading its framespec.

_framespec = [
  EncodingSpec('encoding'),
  Latin1TextSpec('mime'),
  ByteSpec('type'),
  EncodedTextSpec('desc'),
  BinaryDataSpec('data')
]

In the next entry I'll cover some more benefits of this structure; data-driven metaprogramming is an exceptional fit for Mutagen.
