SAX introduction

How does SAX work?
SAX is an abbreviation of Simple API for XML. It provides very fast and lightweight access to XML data. In contrast to DOM, which reads the whole XML document and builds an in-memory tree representation of it, SAX hands each piece of information to the program as soon as it has been read from the document during parsing. The workflow is as follows: the parser reads the document, and each time it encounters a distinct kind of XML data, such as the start of an element, the end of an element, or a processing instruction, it passes that information to a so-called handler. It is up to the handler to do something with the data, and up to the programmer to write the handler.
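To make the contrast with DOM concrete, here is a minimal sketch that processes the same small document both ways. The inline XML string and the class name EventRecorder are assumptions made for this illustration; they are not part of the examples that follow.

```python
import xml.dom.minidom
import xml.sax

XML = b"<examples><example num='1'><title>Example 1</title></example></examples>"

# DOM: the whole document is parsed into a tree first, then the tree is queried.
dom = xml.dom.minidom.parseString(XML)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# SAX: the parser calls the handler back for each event while it reads.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []  # (kind, element name) pairs in document order

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
xml.sax.parseString(XML, recorder)

print(dom_titles)
print(recorder.events)
```

With DOM the program holds the complete tree before it looks at anything; with SAX it only ever sees the current event, so whatever it wants to remember it has to store itself, as the events list does here.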
There are in fact several kinds of handler, but the most interesting one is the ContentHandler, which takes care of the document content.
The handler we need to create is a class derived from xml.sax.ContentHandler. We pass an instance of it to the parser and let the parser do its job. During parsing, the parser reports the content of the file by calling the handler's methods. The most important of these are startElement, endElement and characters.
The following example shows a simple ContentHandler that dumps the structure of an XML document. The input file, example.xml, is listed first.
<examples>
  <example num="1">
    <title>Example 1</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example 2</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (sax1-1.py)
import xml.sax

class MyHandler(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)

    # called for every opening tag; attrs holds the tag's attributes
    def startElement(self, name, attrs):
        print("Element start: %s (%d attribute(s))" % (name, len(attrs)))

    # called for every closing tag
    def endElement(self, name):
        print("Element end: %s" % name)

    # called for text between tags, including whitespace-only runs
    def characters(self, data):
        print("Characters: %s" % data)

filename = "example.xml"
handler = MyHandler()
xml.sax.parse(filename, handler)
stdout:
Element start: examples (0 attribute(s))
Characters: 

Characters:   
Element start: example (1 attribute(s))
Characters: 

Characters:     
Element start: title (0 attribute(s))
Characters: Example 1
Element end: title
Characters: 

Characters:     
Element start: text (0 attribute(s))
Characters: This is example nr. 1. It shows how an example looks.
Element end: text
Characters: 

Characters:   
Element end: example
Characters: 

Characters: 

Characters:   
Element start: example (1 attribute(s))
Characters: 

Characters:     
Element start: title (0 attribute(s))
Characters: Example 2
Element end: title
Characters: 

Characters:     
Element start: text (0 attribute(s))
Characters: Another example. Imagine some ingenious text here...
Element end: text
Characters: 

Characters:   
Element end: example
Characters: 

Element end: examples
Run time: 74.7 ms
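The output above shows that characters is also called for the whitespace between elements. A parser is additionally allowed to deliver one run of text in several characters calls. A common way to deal with both is to buffer the text while inside the element of interest and join it when the element ends. The following is a minimal sketch of that idea; the class name TitleCollector and the inline XML string are assumptions made for this illustration.

```python
import xml.sax

XML = b"""<examples>
  <example num="1">
    <title>Example 1</title>
  </example>
  <example num="2">
    <title>Example 2</title>
  </example>
</examples>"""


class TitleCollector(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []
        self._buffer = None  # None means we are currently outside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []  # start collecting text chunks

    def characters(self, data):
        # Text outside <title> (mostly indentation) is ignored.
        if self._buffer is not None:
            self._buffer.append(data)

    def endElement(self, name):
        if name == "title":
            # Join the chunks, because the text may have arrived in pieces.
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


collector = TitleCollector()
xml.sax.parseString(XML, collector)
print(collector.titles)
```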
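Because our handler inherits default implementations of all these methods from its parent, xml.sax.ContentHandler, and those defaults simply do nothing, we do not need to provide all of them. In some cases it is sufficient to override just one. The following example counts how many times each element name occurs in the same document.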
<examples>
  <example num="1">
    <title>Example 1</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example 2</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (sax1-2.py)
import xml.sax

class MyHandler(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        # maps element name -> number of occurrences
        self.element_name2count = {}

    def startElement(self, name, attrs):
        self.element_name2count[name] = self.element_name2count.get(name, 0) + 1

filename = "example.xml"
handler = MyHandler()
xml.sax.parse(filename, handler)
# sort elements by count, highest first; ties fall back to reverse name order
to_sort = [(count, name) for name, count in handler.element_name2count.items()]
to_sort.sort(reverse=True)
for count, name in to_sort:
    print("%s: %d" % (name, count))
stdout:
title: 2
text: 2
example: 2
examples: 1
Run time: 73.9 ms
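Both handlers above receive an attrs argument in startElement, but only its length is ever used. As a minimal sketch of reading attribute values, the handler below collects the num attribute of every example element. The class name NumCollector and the shortened inline XML are assumptions made for this illustration.

```python
import xml.sax

XML = b"<examples><example num='1'/><example num='2'/><example/></examples>"


class NumCollector(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.nums = []

    def startElement(self, name, attrs):
        if name == "example":
            # attrs behaves like a read-only mapping of attribute names to
            # string values; get() returns None when the attribute is missing.
            self.nums.append(attrs.get("num"))


collector = NumCollector()
xml.sax.parseString(XML, collector)
print(collector.nums)
```

Attribute values always arrive as strings, so a numeric attribute such as num has to be converted explicitly if it is needed as a number.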