[Twisted-Python] Surprises in twisted.web.woven

Mon Aug 4 06:13:26 MDT 2003

On Monday, Aug 4, 2003, at 01:37 Australia/Sydney, Alex Levy wrote:
> On Sun, Aug 03, 2003 at 11:05:48PM +1000, Tim Allen wrote:
>> Finally, if I have a template that looks like this:
>>
>>     <p>Have a look at
>>     <a href="cat1.png">these</a>
>>     <a href="cat2.png">three</a>
>>     <a href="cat3.png">cats</a>.</p>
>>
>> then the output HTML looks like:
>>
>>     <p>Have a look at <a href="cat1.png">these</a><a
>>     href="cat2.png">three</a><a href="cat3.png">cats</a>.</p>
>
> This is another issue that I've wrangled with for some time. Feel free to
> make a lot of noise and hope somebody fixes it; I believe the problem lies
> in the XML parser itself. As Wayne so aptly put it, "We fear change."

As it turns out, I suspect the XML parser itself is not to blame.

twisted.web.microdom.MicroDOMParser.shouldPreserveSpace() currently 
looks like:

     def shouldPreserveSpace(self):
         for edx in xrange(len(self.elementstack)):
             el = self.elementstack[-edx]
             if el.tagName == 'pre' or el.getAttribute("xml:space", '') == 'preserve':
                 return 1
         return 0

I dare say that a simple modification of this method would be enough,
without delving into the depths of the XML parser (which doesn't
discard pure-whitespace text nodes at all).
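
To make the check easier to poke at in isolation, here is a rough
standalone sketch of the same stack walk. FakeElement is a hypothetical
stand-in I made up for illustration, not the real microdom element class,
and should_preserve_space() takes the element stack as an argument rather
than reading it from the parser:

```python
class FakeElement:
    """Hypothetical stand-in for a microdom element (illustration only)."""

    def __init__(self, tagName, attributes=None):
        self.tagName = tagName
        self.attributes = attributes or {}

    def getAttribute(self, name, default=None):
        # Return the attribute value, or the default if it is not set.
        return self.attributes.get(name, default)


def should_preserve_space(elementstack):
    # Walk the open elements from innermost to outermost; any enclosing
    # <pre> or xml:space="preserve" turns preservation on.
    for el in reversed(elementstack):
        if el.tagName == 'pre' or el.getAttribute('xml:space', '') == 'preserve':
            return True
    return False
```

With a stack of [FakeElement('p'), FakeElement('a')] this returns False,
which is the situation the cat template above is in.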

Off the top of my head, I can think of three whitespace-handling modes:

  * Preserve all whitespace (as used in the HTML <pre> tag and in elements
    with xml:space='preserve').
  * Collapse redundant whitespace (if not string.strip(): return ' '),
    which most closely matches how HTML user-agents handle whitespace.
  * Strip whitespace (run .strip() over all text nodes; I believe this
    most closely matches how XML processors handle whitespace).
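
As a rough sketch, the three modes could be written as plain functions
over the string content of a text node. The function names are my own
invention, not anything in microdom, and the collapse version goes a
little further than the one-liner above by collapsing every run of
whitespace rather than only pure-whitespace nodes:

```python
import re


def preserve_whitespace(text):
    # Mode 1: leave the text node exactly as parsed.
    return text


def collapse_whitespace(text):
    # Mode 2: replace each run of whitespace with a single space, so a
    # node containing only a newline and indentation becomes ' '.
    return re.sub(r'\s+', ' ', text)


def strip_whitespace(text):
    # Mode 3: drop leading and trailing whitespace; a pure-whitespace
    # node becomes the empty string.
    return text.strip()
```

Applied to the whitespace-only node between two of the links in the cat
template, collapse_whitespace() gives a single space, while
strip_whitespace() gives an empty string, which is what makes the links
run together.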

Is there any reason why whitespace preservation should not always be on?