Upgrading Applications

  1. Basic Persistence: Application and .tap files
  2. Versioned: New Code Meets Old Data
  3. Rebuild: Loading New Code Without Restarting

Applications must frequently deal with data that lives longer than the programs that create it. Sometimes the structure of that data changes over time, but new versions of a program must be able to accommodate data created by an older version. These versions may change very quickly, especially during development of new code. Sometimes different versions of the same program are running at the same time, sharing data across a network connection. These situations all result in a need for a way to upgrade data structures.

Basic Persistence: Application and .tap files

Simple object persistence (using pickle or jelly) provides the fundamental "save the object to disk" functionality at application shutdown. If you use the Application object, every object referenced by your Application will be saved into the -shutdown.tap file when the program terminates. When you use twistd to launch that new .tap file, the Application object will be restored along with all of its referenced data.

This provides a simple way to have data outlive any particular invocation of your program: simply store it as an attribute of the Application. Note that all Services are referenced by the Application, so their attributes will be stored as well. Ports that have been bound with listenTCP (and the like) are also remembered, and the sockets are created at startup time (when Application.run is called).
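The "everything reachable gets saved" behavior can be sketched with the standard library's pickle module alone. This is not what twistd does internally; `AppState` and `Service` are hypothetical stand-ins for an Application and its Services, used only to show that pickling the root object also saves every object it references.

```python
import pickle


class Service:
    """Hypothetical stand-in for a Service attached to the application."""

    def __init__(self, name):
        self.name = name
        self.hits = 0


class AppState:
    """Hypothetical stand-in for an Application: the root of the saved graph."""

    def __init__(self):
        self.services = [Service("web"), Service("irc")]
        self.motd = "hello"  # data stored directly on the root


def shutdown(app):
    # Serialize the root; pickle follows every reference, so the
    # Service instances (and their attributes) are saved too.
    return pickle.dumps(app)


def startup(blob):
    # Restore the whole object graph from the saved bytes.
    return pickle.loads(blob)


app = AppState()
app.services[0].hits = 42
restored = startup(shutdown(app))
print(restored.motd, [(s.name, s.hits) for s in restored.services])
```

The restored graph is a fresh copy: `restored` is a new object, but its services carry the same attribute values they had at "shutdown".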

To influence the way that the Application is persisted, you can adapt it to twisted.persisted.sob.IPersistable and use the setStyle(style) method with a string like "pickle" or "source". These use different serializers (and different extensions: .tap and .tas respectively) for the saved Application.

You can manually cause the application to be saved by calling its .save method (on the twisted.persisted.sob.IPersistable adapted object).

Versioned: New Code Meets Old Data

So suppose you're running version 1 of some application, and you want to upgrade to version 2. You shut down the program, giving you a .tap file that you could restore with twistd to get back to the same state that you had before. The upgrade process is then to install the new version of the application and use twistd to launch the saved .tap file. The old data will be loaded into classes created with the new code, and now you'll have a program running with the new behavior but the old data.

But what about the data structures that have changed? Since these structures are really just pickled class instances, the real question is: what about the class definitions that have changed? Changes to class methods are easy: nothing about them is saved in the .tap file. The issue is when the data attributes of an instance are added, removed, or their format is changed.
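One way to see that methods are not saved is to pickle an instance and then load it with a custom `pickle.Unpickler.find_class` that substitutes a newer class for the old one. The unpickler only restores the instance `__dict__`, so the old data immediately picks up the new methods. `ThingV1`, `ThingV2` and `NewCodeUnpickler` are hypothetical names used for illustration; in a real upgrade the class keeps its name and the new code simply defines it differently.

```python
import io
import pickle


class ThingV1:
    def __init__(self, length):
        self.length = length

    def describe(self):
        return "length %d" % self.length


class ThingV2:
    # Same data attributes, different behavior: no version bump needed.
    def describe(self):
        return "a line %d units long" % self.length


class NewCodeUnpickler(pickle.Unpickler):
    """Resolve references to the old class to the new class definition."""

    def find_class(self, module, name):
        if name == "ThingV1":
            return ThingV2
        return super().find_class(module, name)


saved = pickle.dumps(ThingV1(5))
loaded = NewCodeUnpickler(io.BytesIO(saved)).load()
print(loaded.__dict__)    # only the instance state was saved
print(loaded.describe())  # the new method runs against the old data
```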

Twisted provides a mechanism called Versioned to ease these upgrades. Each version of the data structure (i.e. each version of the class) gets a version number. This number must change every time you add a data attribute to the class or remove one. It must also change every time you change the format of one of those data attributes: for example, if you use a string in one version and an integer in another, those versions must have different version numbers.

The version number is defined in a class attribute named persistenceVersion. This is an integer which will be stored in the .tap file along with the rest of the instance state. When the object is unserialized, the saved persistenceVersion is compared against the current class's value, and if they differ, special upgrade methods are called. These methods are named upgradeToVersionNN, and there must be one for each intermediate version. These methods are expected to manipulate the instance's state from the previous version's format into that of the new version.

To use this, simply have your class inherit from Versioned. You don't have to do this from the very beginning of time: all objects have an implicit version number of 0 when they don't inherit from Versioned. So when you first make an incompatible data-format change to your class, add Versioned to the inheritance list, and add an upgradeToVersion1 method.
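To make the mechanism concrete, here is a toy mixin that imitates the idea using only pickle's `__getstate__`/`__setstate__` hooks. It is a simplification under stated assumptions, not Twisted's `twisted.persisted.styles.Versioned`: the real class records versions per base class and defers the upgrade calls until the whole object graph has been loaded (`doUpgrade`), whereas this sketch runs them immediately inside `__setstate__`. `ToyVersioned` and the `_saved_version` key are hypothetical names.

```python
class ToyVersioned:
    """Toy imitation of the Versioned idea (not Twisted's implementation)."""

    persistenceVersion = 0  # classes without the mixin count as version 0

    def __getstate__(self):
        # Called by pickle when saving: store the state plus the version
        # of the code that wrote it.
        state = self.__dict__.copy()
        state["_saved_version"] = type(self).persistenceVersion
        return state

    def __setstate__(self, state):
        # Called by pickle when loading.
        state = dict(state)
        # Data written before the mixin was added carries no version: treat as 0.
        saved = state.pop("_saved_version", 0)
        self.__dict__.update(state)
        # Run every intermediate upgrade step, oldest first.
        # In this toy, a missing upgradeToVersionN raises AttributeError.
        for version in range(saved + 1, type(self).persistenceVersion + 1):
            getattr(self, "upgradeToVersion%d" % version)()


class Thing(ToyVersioned):
    persistenceVersion = 1

    def __init__(self, length, units):
        self.length = "%d %s" % (length, units)

    def upgradeToVersion1(self):
        self.length = "%d inches" % self.length


# Simulate unpickling data saved by the old, unversioned Thing:
# the unpickler creates a bare instance and hands it the saved __dict__.
old = Thing.__new__(Thing)
old.__setstate__({"length": 5})
print(old.length)
```

Data saved by the current code round-trips without any upgrade step, because its recorded version already matches `persistenceVersion`.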

For example, suppose the first version of our class saves an integer which measures the size of a line. We release this as version 1.0 of our neat application:

    class Thing:
        def __init__(self, length):
            self.length = length

Then we fix some bugs elsewhere, and release versions 1.1 and 1.2 of the application. Later, we decide that we should add some units to the length, so that people can refer to it in inches or meters. Version 1.3 is shipped with the following code:

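    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 1

        def __init__(self, length, units):
            self.length = "%d %s" % (length, units)

        def upgradeToVersion1(self):
            # Version 0 stored a bare integer; assume it meant inches.
            self.length = "%d inches" % self.length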

Note that we must make an assumption about what the previous value meant: in this case, we assume the number was in inches.

Versions 1.4 and 1.5 ship with other changes. Then in version 1.6 we decide that saving the two values as a string was foolish and that it would be better to save the number and the string separately, using a tuple. We ship 1.6 with the following:

    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 2

        def __init__(self, length, units):
            self.length = (length, units)

        def upgradeToVersion1(self):
            self.length = "%d inches" % self.length

        def upgradeToVersion2(self):
            # "5 inches" -> (5, "inches"); convert the number back to an int
            length, units = self.length.split()
            self.length = (int(length), units)

Note that we must provide both upgradeToVersion1 and upgradeToVersion2. We have to assume that the saved .tap files which will be provided to this class come from a random assortment of old versions: we must be prepared to accept anything ever saved by a released version of our application.
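One way to take that requirement seriously is to keep a sample of the state each released version could have saved and check that all of them upgrade to the current format. The sketch below drives the upgrade methods by hand (a hypothetical `load_as_current` helper standing in for Versioned) so it runs without Twisted; the sample states are made up to match the three formats described above.

```python
class Thing:
    # Version-2 Thing, minus the Versioned base class so the sketch
    # runs stand-alone.
    persistenceVersion = 2

    def __init__(self, length, units):
        self.length = (length, units)

    def upgradeToVersion1(self):
        self.length = "%d inches" % self.length

    def upgradeToVersion2(self):
        length, units = self.length.split()
        self.length = (int(length), units)


def load_as_current(cls, saved_state, saved_version):
    """Hypothetical helper: rebuild an instance and apply each missing upgrade."""
    obj = cls.__new__(cls)
    obj.__dict__.update(saved_state)
    for version in range(saved_version + 1, cls.persistenceVersion + 1):
        getattr(obj, "upgradeToVersion%d" % version)()
    return obj


# Hypothetical states that releases 1.0-1.2 (v0), 1.3-1.5 (v1) and 1.6 (v2)
# could have written.
samples = [
    ({"length": 5}, 0),
    ({"length": "5 inches"}, 1),
    ({"length": (5, "inches")}, 2),
]
for state, version in samples:
    assert load_as_current(Thing, state, version).length == (5, "inches")
```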

Finally, version 2.0 adds multiple dimensions. Instead of merely recording the length of a line, it records the size of an N-dimensional rectangular solid. For backwards compatibility, all 1.x versions of the program are assumed to be dealing with a 1-dimensional line. We change the name of the attribute from .length to .size to reflect the new meaning.

    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 3

        def __init__(self, dimensions):
            # dimensions is a list of tuples, each is (length, units)
            self.size = dimensions
            # one dimension is a line, two a square, and so on
            self.name = ["line", "square", "cube", "hypercube"][len(dimensions) - 1]

        def upgradeToVersion1(self):
            self.length = "%d inches" % self.length

        def upgradeToVersion2(self):
            length, units = self.length.split()
            self.length = (int(length), units)

        def upgradeToVersion3(self):
            self.size = [self.length]
            del self.length
            self.name = "line"

If a .tap file from the earliest version of our program were to be loaded by the latest code, the following sequence would occur for each Thing instance contained inside:

  1. An instance of Thing would be created, with a __dict__ that contained a single attribute .length, which was an integer, like 5.
  2. self.upgradeToVersion1() would be called, changing self.length into a string, like "5 inches".
  3. self.upgradeToVersion2() would be called, changing self.length into a tuple, like (5, "inches").
  4. Finally, self.upgradeToVersion3() would be called, creating self.size as a list holding a single dimension, like [(5, "inches")]. The old .length attribute is deleted, and a new .name is created with the type of shape this instance represents ("line").
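The four steps above can be traced in a stand-alone run. As a simplifying assumption, a plain loop stands in for Versioned, calling each upgradeToVersionN on an instance created from a version-0 `__dict__`, and the state is recorded after every step.

```python
class Thing:
    # Version-3 Thing, without the Versioned base so it runs alone.
    persistenceVersion = 3

    def __init__(self, dimensions):
        # dimensions is a list of (length, units) tuples
        self.size = dimensions
        self.name = ["line", "square", "cube", "hypercube"][len(dimensions) - 1]

    def upgradeToVersion1(self):
        self.length = "%d inches" % self.length

    def upgradeToVersion2(self):
        length, units = self.length.split()
        self.length = (int(length), units)

    def upgradeToVersion3(self):
        self.size = [self.length]
        del self.length
        self.name = "line"


# Step 1: the unpickler builds a bare instance from the version-0 state.
thing = Thing.__new__(Thing)
thing.__dict__.update({"length": 5})
history = [dict(thing.__dict__)]

# Steps 2-4: each upgrade method runs once, oldest first.
for version in range(1, Thing.persistenceVersion + 1):
    getattr(thing, "upgradeToVersion%d" % version)()
    history.append(dict(thing.__dict__))

for state in history:
    print(state)
```

The printed states are `{'length': 5}`, `{'length': '5 inches'}`, `{'length': (5, 'inches')}` and finally `{'size': [(5, 'inches')], 'name': 'line'}`.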

Some hints for the upgradeVersion methods:

For more information, look at the doc strings for styles.Versioned, as well as the app.Application class and the Application HOWTO.

Rebuild: Loading New Code Without Restarting

Versioned is good for handling changes between released versions of your program, where the application state is saved on disk during the upgrade. But while you are developing that code, you often want to change the behavior of the running program, without the slowdown of saving everything out to disk, shutting down, and restarting. Sometimes it will be difficult or time-consuming to get back to the previous state: the running program could include ephemeral objects (like open sockets) which cannot be persisted.

twisted.python.rebuild provides a function called rebuild which helps smooth this cycle. It allows objects in a running program to be upgraded to a new version of the code without shutting down.

To use it, simply call rebuild on the module that holds the classes you want to be upgraded. Through deep gc magic, all instances of classes in that module will be located and upgraded.
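In real code the call is just `rebuild(mymodule)` after `from twisted.python.rebuild import rebuild`. The gc part of the idea can be imitated with the standard library: find every live instance of the old class via `gc.get_objects()` and point its `__class__` at the new definition. This is only a toy illustration, assuming both classes are plain Python classes with compatible layouts; Twisted's rebuild also reloads the module itself and does considerably more bookkeeping. `swap_class`, `Greeter` and `GreeterFixed` are hypothetical names.

```python
import gc


def swap_class(old_cls, new_cls):
    """Toy rebuild step: retarget every live instance of old_cls to new_cls.

    Returns the number of instances updated. Only the class pointer
    changes; each instance keeps its existing __dict__ untouched.
    """
    updated = 0
    for obj in gc.get_objects():
        # type() is exact, so subclasses of old_cls are left alone here.
        if type(obj) is old_cls:
            obj.__class__ = new_cls
            updated += 1
    return updated


class Greeter:  # the "old" code
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


class GreeterFixed:  # the edited code, as if reloaded from disk
    def greet(self):
        return "Hello, %s!" % self.name


live = [Greeter("alice"), Greeter("bob")]
print(swap_class(Greeter, GreeterFixed))
print([g.greet() for g in live])
```

Existing objects keep their identity and data; only the behavior they look up through their class changes.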

Typically, this is done in response to a privileged command sent over a network connection. The usual development cycle is to start the server, get it into an interesting state, see a problem, edit the class definition, then push the "rebuild yourself" button. That button could be a magic web page which, when requested, runs rebuild(mymodule), or a special IRC command, or perhaps just a socket that listens for connections and accepts a password to trigger the rebuild. (You want this to be a privileged operation, to prevent someone from making your server do a rebuild while you're in the middle of editing the code.)
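Whatever transport carries the command, the privileged check itself can be small. The sketch below is transport-agnostic and hypothetical: `make_rebuild_command`, the secret and the reply strings are illustrative assumptions, and the `do_rebuild` callback is where a real server would call `rebuild(mymodule)`. It uses `hmac.compare_digest` so that comparing the password does not leak timing information.

```python
import hmac


def make_rebuild_command(secret, do_rebuild):
    """Return a handler for one line of input that gates do_rebuild on a password.

    secret: the bytes the operator must send.
    do_rebuild: zero-argument callable, e.g. (hypothetically)
        lambda: rebuild(mymodule)
    """

    def handle_line(line):
        # Strip the trailing newline (and other whitespace) before comparing.
        if hmac.compare_digest(line.strip(), secret):
            do_rebuild()
            return b"rebuilt"
        return b"denied"

    return handle_line


calls = []
handle = make_rebuild_command(b"s3cret", lambda: calls.append("rebuild"))
print(handle(b"guess\r\n"))   # b'denied'
print(handle(b"s3cret\r\n"))  # b'rebuilt'
print(calls)
```

Wiring `handle_line` to an actual socket, web page or IRC command is left to whichever transport the server already uses.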

A few useful notes about the rebuild process:

Finally, note that rebuild cannot currently be mixed with Versioned. rebuild does not run any of the classes' methods, whereas Versioned works by running __setstate__ during the load process and doUpgrade afterwards. This means rebuild can only be used to process upgrades that do not change the data attributes of any of the involved classes. Any time attributes are added or removed, the program must be shut down, persisted, and restarted, with upgradeToVersionNN methods used to handle the attributes. (This may change in the future, but for now the implementation is easier and more reliable with this restriction.)
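The restriction follows from how a class swap works: only the class changes, while each instance keeps the `__dict__` it already had, and neither `__init__` nor any upgrade method runs. The toy below (hypothetical class names, using the same kind of `__class__` assignment a gc-based rebuild relies on) shows a method from the new code failing on an attribute that only its `__init__` would have created.

```python
class Counter:  # old code: one data attribute
    def __init__(self):
        self.count = 0


class CounterV2:  # new code adds a data attribute in __init__
    def __init__(self):
        self.count = 0
        self.label = "clicks"

    def report(self):
        return "%d %s" % (self.count, self.label)


c = Counter()
c.count = 3
c.__class__ = CounterV2  # what a class swap does; __init__ is not re-run

try:
    c.report()
    failed = False
except AttributeError as exc:
    failed = True
    print("missing attribute:", exc)

print(c.__dict__)  # still only the old attribute
```

This is the gap that upgradeToVersionNN methods fill during a normal shutdown-and-restart upgrade.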


Version: 10.2.0