Applications must frequently deal with data that lives longer than the programs that create it. Sometimes the structure of that data changes over time, but new versions of a program must be able to accommodate data created by an older version. These versions may change very quickly, especially during development of new code. Sometimes different versions of the same program are running at the same time, sharing data across a network connection. These situations all result in a need for a way to upgrade data structures.
Basic Persistence: Application and .tap files
Simple object persistence (using pickle or jelly) provides the fundamental "save the object to disk" functionality at application shutdown. If you use the Application object, every object referenced by your Application will be saved into the -shutdown.tap file when the program terminates. When you use twistd to launch that new .tap file, the Application object will be restored along with all of its referenced data.
This provides a simple way to have data outlive any particular invocation
of your program: simply store it as an attribute of the Application. Note
that all Services are referenced by the Application, so their attributes
will be stored as well. Ports that have been bound with listenTCP (and the
like) are also remembered, and the sockets are created at startup time (when
Application.run is called).
To influence the way that the Application is persisted, you can adapt it to twisted.persisted.sob.IPersistable and use the setStyle(style) method with a string like "pickle" or "source". These use different serializers (and different extensions: .tap and .tas respectively) for the saved Application.
You can manually cause the application to be saved by calling its .save method (on the twisted.persisted.sob.IPersistable adapted object).
Versioned: New Code Meets Old Data
So suppose you're running version 1 of some application, and you want to upgrade to version 2. You shut down the program, giving you a .tap file that you could restore with twistd to get back to the same state that you had before. The upgrade process is to then install the new version of the application, and then use twistd to launch the saved .tap file. The old data will be loaded into classes created with the new code, and now you'll have a program running with the new behavior but the old data.
But what about the data structures that have changed? Since these structures are really just pickled class instances, the real question is: what about the class definitions that have changed? Changes to class methods are easy: nothing about them is saved in the .tap file. The difficulty arises when data attributes of an instance are added or removed, or when their format changes.
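The claim that method changes are free while attribute changes are not can be seen in a small standard-library sketch. It does not use Twisted or a real .tap file: the Thing class and its describe method are hypothetical, and the restore step imitates by hand what default unpickling does (create the instance without calling __init__, then put the saved attributes back).

```python
class Thing:
    """First release of a hypothetical class."""

    def __init__(self, length):
        self.length = length

    def describe(self):
        return "length=%d" % self.length


# The per-instance state a default pickle records: the attribute dictionary.
saved_state = dict(Thing(5).__dict__)


class Thing:  # the "new release": same attribute, different method body
    def __init__(self, length):
        self.length = length

    def describe(self):
        return "a line %d units long" % self.length


# Restore the way unpickling does by default: skip __init__, then
# put the saved attributes back on the fresh instance.
restored = Thing.__new__(Thing)
restored.__dict__.update(saved_state)

print(saved_state)
print(restored.describe())
```

The saved state holds only the length attribute, so the restored object picks up the new describe method automatically. Had the new release expected an attribute that the old one never stored, the restored object would simply lack it.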
Twisted provides a mechanism called Versioned to ease these upgrades.
Each version of the data structure (i.e. each version of the class) gets a
version number. This number must change every time you add or remove a data
attribute to the class. It must also change every time you modify one of
those data attributes: for example, if you use a string in one version and
an integer in another, those versions must have different version numbers.
The version number is defined in a class attribute named persistenceVersion. This is an integer which will be stored in the .tap file along with the rest of the instance state. When the object is unserialized, the saved persistenceVersion is compared against the current class's value, and if they differ, special upgrade methods are called. These methods are named upgradeToVersionNN, and there must be one for each intermediate version. These methods are expected to manipulate the instance's state from the previous version's format into that of the new version.
To use this, simply have your class inherit from Versioned. You don't have to do this from the very beginning of time: all objects have an implicit version number of 0 when they don't inherit from Versioned. So when you first make an incompatible data-format change to your class, add Versioned to the inheritance list, and add an upgradeToVersion1 method.
For example, suppose the first version of our class saves an integer which measures the size of a line. We release this as version 1.0 of our neat application:
    class Thing:
        def __init__(self, length):
            self.length = length
Then we fix some bugs elsewhere, and release versions 1.1 and 1.2 of the application. Later, we decide that we should add some units to the length, so that people can refer to it in inches or meters. Version 1.3 is shipped with the following code:
    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 1

        def __init__(self, length, units):
            self.length = "%d %s" % (length, units)

        def upgradeToVersion1(self):
            # version 0 stored a bare integer; assume it was in inches
            self.length = "%d inches" % self.length
Note that we must make an assumption about what the previous value meant: in this case, we assume the number was in inches.
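To make the role of the version number concrete, here is a minimal sketch of the idea using only the standard library. MiniVersioned is a hypothetical stand-in written for this illustration, not Twisted's implementation: it stores the version under a plain "persistenceVersion" key and runs the upgrade methods immediately inside __setstate__, both of which are simplifying assumptions.

```python
class MiniVersioned:
    """Hypothetical stand-in for Versioned (not Twisted's implementation)."""

    persistenceVersion = 0

    def __getstate__(self):
        # Save the instance attributes plus the class's current version.
        state = dict(self.__dict__)
        state["persistenceVersion"] = self.persistenceVersion
        return state

    def __setstate__(self, state):
        state = dict(state)
        # State saved by an unversioned class carries no number: treat it as 0.
        saved = state.pop("persistenceVersion", 0)
        self.__dict__.update(state)
        # Run every intermediate upgrade, oldest first.
        for version in range(saved + 1, self.persistenceVersion + 1):
            getattr(self, "upgradeToVersion%d" % version)()


class Thing(MiniVersioned):
    persistenceVersion = 1

    def __init__(self, length, units):
        self.length = "%d %s" % (length, units)

    def upgradeToVersion1(self):
        self.length = "%d inches" % self.length


# State as the unversioned 1.0 class would have saved it.
old_state = {"length": 5}

thing = Thing.__new__(Thing)      # unpickling does not call __init__
thing.__setstate__(old_state)
print(thing.length)
print(thing.__getstate__())
```

State that already carries the current version number passes through untouched, because the upgrade loop has nothing to run.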
Versions 1.4 and 1.5 are shipped with other changes. Then in version 1.6 we decide that saving the two values as a string was foolish and that it would be better to save the number and the string separately, using a tuple. We ship 1.6 with the following:
    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 2

        def __init__(self, length, units):
            self.length = (length, units)

        def upgradeToVersion1(self):
            self.length = "%d inches" % self.length

        def upgradeToVersion2(self):
            (length, units) = self.length.split()
            # split() returns strings, so convert the number back to an integer
            self.length = (int(length), units)
Note that we must provide both upgradeToVersion1 and upgradeToVersion2. We have to assume that the saved .tap files which will be provided to this class come from a random assortment of old versions: we must be prepared to accept anything ever saved by a released version of our application.
Finally, version 2.0 adds multiple dimensions. Instead of merely recording the length of a line, it records the size of an N-dimensional rectangular solid. For backwards compatibility, all 1.X versions of the program are assumed to be dealing with a 1-dimensional line. We change the name of the attribute from .length to .size to reflect the new meaning.
    from twisted.persisted.styles import Versioned

    class Thing(Versioned):
        persistenceVersion = 3

        def __init__(self, dimensions):
            # dimensions is a list of tuples, each is (length, units)
            self.size = dimensions
            # one dimension is a line, two a square, and so on
            self.name = ["line", "square", "cube", "hypercube"][len(dimensions) - 1]

        def upgradeToVersion1(self):
            self.length = "%d inches" % self.length

        def upgradeToVersion2(self):
            (length, units) = self.length.split()
            # split() returns strings, so convert the number back to an integer
            self.length = (int(length), units)

        def upgradeToVersion3(self):
            self.size = [self.length]
            del self.length
            self.name = "line"
If a .tap file from the earliest version of our program were to be loaded by the latest code, the following sequence would occur for each Thing instance contained inside:
- An instance of Thing would be created, with a __dict__ that contained a single attribute .length, which was an integer, like 5.
- self.upgradeToVersion1() would be called, changing self.length into a string, like "5 inches".
- self.upgradeToVersion2() would be called, changing self.length into a tuple, like (5, "inches").
- Finally, self.upgradeToVersion3() would be called, creating self.size as a list holding a single dimension, like [(5, "inches")]. The old .length attribute is deleted, and a new .name is created with the type of shape this instance represents ("line").
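The sequence can be traced without Twisted by applying the three upgrade steps to a plain dictionary that stands in for the instance's __dict__. The function names below are hypothetical, and the second step converts the parsed number to an integer so that the tuple holds 5 rather than the string "5"; this is a sketch of the data transformations only, not of how Versioned dispatches them.

```python
def upgrade_to_version_1(state):
    # bare integer -> "N inches"
    state["length"] = "%d inches" % state["length"]


def upgrade_to_version_2(state):
    # "N inches" -> (N, "inches")
    length, units = state["length"].split()
    state["length"] = (int(length), units)


def upgrade_to_version_3(state):
    # single length -> list of dimensions, plus a shape name
    state["size"] = [state.pop("length")]
    state["name"] = "line"


state = {"length": 5}  # as saved by the earliest release
history = [dict(state)]
for step in (upgrade_to_version_1, upgrade_to_version_2, upgrade_to_version_3):
    step(state)
    history.append(dict(state))

for snapshot in history:
    print(snapshot)
```

Each printed snapshot corresponds to one stage of the sequence described above, ending with a state that has .size and .name but no .length.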
Some hints for the upgradeToVersionNN methods:
- They must do everything the __init__ method would have done, as well as any methods that might have been called during the lifetime of the object.
- If the class has (or used to have) methods which can add attributes that weren't created in __init__, then the saved object may have a haphazard subset of those attributes, depending upon which methods were called. The upgradeToVersionNN methods must be prepared to deal with this. hasattr and .get may be useful.
- Once you have released a class with a given upgradeToVersionNN method, you should never change that method (assuming you care about infinite backwards compatibility).
- You must add a new upgradeToVersionNN method (and bump the persistenceVersion value) for each and every release that has a different set of data attributes than the previous release.
- Versioned works by providing __setstate__ and __getstate__ methods. You probably don't want to override these methods without being very careful to call the Versioned versions at exactly the right time. It also requires a doUpgrade function to be called after all the objects are loaded. This is done automatically by Application.run.
- Depending upon how they are serialized, Versioned objects can probably be sent across a network connection, and the upgrade process can be made to occur upon receipt. (You'll want to look at the requireUpgrade function.) This might be useful in providing compatibility with an older peer. Note, however, that Versioned does not let you go backwards in time; there is no downgradeVersionNN method. This means it is probably only useful for compatibility in one direction: the newer-to-older direction must still be explicitly handled by the application.
- In general, backwards compatibility is handled by pretending that the old code was restricting itself to a narrow subset of the capabilities of the new code. The job of the upgrade routines is then to translate the old representation into a new one.
For more information, look at the doc strings for styles.Versioned, as well as the app.Application class and the Application HOWTO.
Rebuild: Loading New Code Without Restarting
Versioned is good for handling changes between
released versions of your program, where the application state is saved on
disk during the upgrade. But while you are developing that code, you often
want to change the behavior of the running program, without the
slowdown of saving everything out to disk, shutting down, and restarting.
Sometimes it will be difficult or time-consuming to get back to the previous
state: the running program could include ephemeral objects (like open
sockets) which cannot be persisted.
twisted.python.rebuild provides a function called rebuild which helps smooth this cycle. It allows objects in a running program to be upgraded to a new version of the code without shutting down.
To use it, simply call rebuild on the module that holds the classes you want to be upgraded. Through deep gc magic, all instances of classes in that module will be located and upgraded.
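The general idea behind locating instances through the garbage collector can be sketched with the standard gc module. This is an illustration of the technique under simplifying assumptions, not the code in twisted.python.rebuild: the swap_class helper and the Greeter class are hypothetical, and a real rebuild also has to reload the module and deal with functions and other references.

```python
import gc


def swap_class(old_cls, new_cls):
    """Point every live instance of old_cls at new_cls; return how many."""
    count = 0
    # Every object tracked by the collector is examined, which is why
    # this kind of scan can be slow in a large program.
    for obj in gc.get_objects():
        if type(obj) is old_cls:
            obj.__class__ = new_cls
            count += 1
    return count


class Greeter:
    def greet(self):
        return "hello"


OldGreeter = Greeter
a, b = Greeter(), Greeter()


class Greeter:  # the edited definition, as if the module had been reloaded
    def greet(self):
        return "hello, world"


updated = swap_class(OldGreeter, Greeter)
print(updated, a.greet(), b.greet())
```

The existing objects keep their identity and their attributes; only the class they point to changes, so they immediately use the new method bodies.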
Typically, this is done in response to a privileged command sent over a
network connection. The usual development cycle is to start the server, get
it into an interesting state, see a problem, edit the class definition, then
push the "rebuild yourself" button. That button could be a magic web page
which, when requested, runs rebuild(mymodule), or a special IRC command, or
perhaps just a socket that listens for connections and accepts a password to
trigger the rebuild. (You want this to be a privileged operation to prevent
someone from making your server do a rebuild while you're in the middle of
editing the code).
A few useful notes about the rebuild process:
- If the module has a top-level attribute named ALLOW_TWISTED_REBUILD, this attribute must evaluate to True. Should it be false, the rebuild attempt will raise an exception.
- Adapters (from twisted.python.components) use top-level registration function calls. These are handled correctly during rebuilds, and the usual duplicate registration errors are not raised.
- Rebuilds may be slow: every single object known to the interpreter must be examined to see if it is an instance of one of the classes being changed.
Finally, note that rebuild cannot currently be mixed with Versioned. rebuild does not run any of the classes' methods, whereas Versioned works by running __setstate__ during the load process and doUpgrade afterwards. This means rebuild can only be used to process upgrades that do not change the data attributes of any of the involved classes. Any time attributes are added or removed, the program must be shut down, persisted, and restarted, with upgradeToVersionNN methods used to handle the attributes. (This may change in the future, but for now the implementation is easier and more reliable with this restriction.)
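A short sketch shows why an upgrade that runs no methods cannot cope with changed data attributes. Reassigning __class__ is used here as a stand-in for an in-place code upgrade; that choice, along with the Thing and EditedThing names, is an assumption made for illustration and does not use twisted.python.rebuild itself.

```python
class Thing:
    def __init__(self, length):
        self.length = length


live = Thing(5)  # an object already running in the program


class EditedThing:
    """The edited definition: it now expects a .units attribute too."""

    def __init__(self, length, units):
        self.length = length
        self.units = units

    def describe(self):
        return "%d %s" % (self.length, self.units)


# Swap in the new code without running __init__ or any upgrade method.
live.__class__ = EditedThing

try:
    live.describe()
    failed = False
except AttributeError:
    # The new method relies on .units, which the live object never received.
    failed = True

print(failed)
```

The live object has the new methods but the old attributes, so any new code that touches .units fails. Bridging that gap is the job the upgradeToVersionNN methods do during a save and restart.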