[Twisted-Python] twisted.news cleanups and tests
Abe Fettig
abe at fettig.net
Sun Apr 13 22:44:13 MDT 2003
On Sun, 2003-04-13 at 02:16, Glyph Lefkowitz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> On Saturday, April 12, 2003, at 04:50 PM, Jp Calderone wrote:
>
> > ... throughout Twisted wherever email-like messages need to be dealt
> > with ...
>
> Speaking of such message-unifying work:
>
> Moshe and I were recently on IRC, discussing a unification of several
> disparate message representations within Twisted. We're not exactly
> sure where this is going, but in a future release, the .mail, .words,
> and .news packages should all be unified, probably under the heading of
> "words", in order to present a coherent messaging API for things like
> bots which don't care whether you're talking to them through email,
> netnews, IRC, AOL IM, or whatever.
As some of you know, I'm developing a Twisted application called Hep.
Hep is a multiprotocol message server that lets users set up connections
to various messaging systems (currently: RSS, POP3, SMTP, Blogger API,
and local maildir storage) and then read/write to those connections
through various protocols (currently: POP3, SMTP, NNTP). Hep also has a
web server with a basic web interface for managing things.
Anyhow, since I've been working on this project I've been thinking a lot
about abstraction layers for messages. So here are some incomplete
late-night thoughts:
* First, I define 'message' very loosely: a short text communication
written by one person for one or more other persons to read.
* Formats, protocols, and storage are loosely coupled. rfc822 is a
format for messages. So is RSS. So is HTML (if you want to look at it
that way), or plain .txt. SMTP, NNTP, IMAP, or POP3 can all be used to
transfer rfc822 messages. HTTP is typically the protocol used to
transfer RSS and HTML. You can store messages in plain directories,
specialized directory layouts like maildir, file formats like RSS and
mbox that support multiple messages, databases, etc.
* All types of messages, by definition, have a body (although it may be
left blank).
* Almost all messages can be looked at in terms of "from" and "to".
Even personal-use text files on the local system. Even web pages,
although the "from" might be unknown, and the "to" might be "anyone in
the world who happens to be interested in this".
* Most, but not all, message formats support the idea of a
title/subject.
* Messages can contain other messages.
* You can't rely on getting a (correct) mime-type along with the message
- I've had to resort to letting my format handlers try to figure out for
themselves whether they can handle a given file if the mime-type is
unknown.
* For messaging protocols, the basic operations are "scan" (and/or
"fetch"), "post", "edit", and "delete". Fetch is almost universal,
although the way this works varies a lot from system to system. My
messaging library currently has a single function that every protocol
supports, scan(). You run connection.scan(messageHandlerFunction), and
the connection will try to get whatever messages might be available
there, and run messageHandlerFunction on each message. scan() returns a
deferred, which will be called back once the scan has completed; after
that, no more messages will be found unless scan() is called again.
Although I haven't implemented it, my idea has been that for
always-connected systems like IMAP and instant messaging, scan() would
continue to run until the connection was terminated.
* post() is pretty easy to abstract across various systems, and works
well for SMTP, IMAP, NNTP, various blog APIs, instant messaging, and
local message stores. It even works with vanilla HTTP, sort of.
* So far I haven't implemented edit() or delete() on anything except
local message stores, so no comments there.
* Folders and chat rooms are basically the same thing - a place where
messages are posted to and read from.
* I'm using URLs to describe places you can access messages.
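
To make the scan()/post() idea above concrete, here is a minimal sketch
against a folder of text files. It is not the code from the "messaging"
module: DirectoryConnection is a hypothetical name, a standard-library
Future stands in for a Twisted deferred so the example runs without
Twisted, and firing it with the message count is an assumption.

```python
import os
import tempfile
from concurrent.futures import Future


class DirectoryConnection:
    """Hypothetical connection to a folder of plain text messages."""

    def __init__(self, path):
        self.path = path

    def scan(self, handler):
        """Run handler on each stored message, then fire the result."""
        done = Future()  # stand-in for a Twisted deferred (assumption)
        count = 0
        for name in sorted(os.listdir(self.path)):
            with open(os.path.join(self.path, name)) as f:
                handler(f.read())
            count += 1
        # The scan is complete: nothing more arrives until scan() is
        # called again. Passing the count along is an assumption.
        done.set_result(count)
        return done

    def post(self, text):
        """Store a new message as the next numbered file."""
        name = "%04d.txt" % len(os.listdir(self.path))
        with open(os.path.join(self.path, name), "w") as f:
            f.write(text)


folder = tempfile.mkdtemp()
conn = DirectoryConnection(folder)
conn.post("first message")
conn.post("second message")

seen = []
result = conn.scan(seen.append)
```

An always-connected system such as IMAP or instant messaging would
presumably leave the result unfired until the connection was closed.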
Anyhow, I'm not sure what level of support for all this you want to have
in Twisted. If you really wanted to go all out it's possible to create
a layer of abstraction that would let you read/write to a local mbox
file, a NNTP newsgroup, an RSS file accessed through HTTP, and a folder
full of text files, all using the same API. That's the direction I'm
going in at the moment, but it's probably too wide a scope for the core
Twisted APIs.
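
One way to picture "the same API" for every kind of store is a lookup
from URL scheme to connection class. The sketch below is only an
illustration under assumptions: the "memory" scheme, MemoryStore,
HANDLERS, and open_connection are all hypothetical names, not taken
from Hep or Twisted.

```python
from urllib.parse import urlsplit


class MemoryStore:
    """Hypothetical in-memory message store, used only for illustration."""

    def __init__(self, location):
        self.location = location
        self.messages = []

    def post(self, text):
        self.messages.append(text)

    def scan(self, handler):
        for message in self.messages:
            handler(message)


# Maps a URL scheme to the class that knows how to talk to it. Real
# entries (mbox, nntp, http+rss, ...) would be registered the same way.
HANDLERS = {"memory": MemoryStore}


def open_connection(url):
    """Pick a connection class based on the URL scheme."""
    parts = urlsplit(url)
    try:
        factory = HANDLERS[parts.scheme]
    except KeyError:
        raise ValueError("no handler for scheme %r" % parts.scheme)
    return factory(parts.netloc + parts.path)


inbox = open_connection("memory://inbox/personal")
inbox.post("hello")
collected = []
inbox.scan(collected.append)
```

Callers only ever see post() and scan(); what sits behind the URL is
decided entirely by the scheme.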
If anybody here is interested in seeing my attempts at creating a system
that implements these kinds of ideas, you can check out the code from
CVS:
:pserver:anonymous at cvs.sourceforge.net:/cvsroot/hepserver
The two modules are "messaging" (the messaging abstraction layer, which
is usable on its own - see examples/copymessages.py) and "hep". If you
want to run Hep you'll also need the Lupy text indexer from divmod.org.
You might also want to read my weblog entries from last week at
http://www.fettig.net/.
Abe