Getting text from an element

How to extract text from an element.
The DOM tree consists mainly of two types of nodes: element nodes and text nodes. An element node never contains text directly; its text is always held in child text nodes. This is very important to keep in mind when writing a program.
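To make the distinction concrete, the sketch below lists the direct children of an element together with their node type. It parses a small inline string with `parseString` only so that it runs without a file on disk; the sample XML and the helper name `describe_children` are made up for this illustration.

```python
import xml.dom.minidom as dom

# Hypothetical inline sample, so the snippet needs no input file.
doc = dom.parseString("<title>Example 1</title>")
title = doc.documentElement

def describe_children(element):
    """Return (kind, content) pairs for the direct children of element."""
    result = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(("text", child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            result.append(("element", child.tagName))
    return result

# The element itself is an element node; its text sits in a child text node.
print(title.nodeType == title.ELEMENT_NODE)  # True
print(describe_children(title))              # [('text', 'Example 1')]
```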
The following input file and program show a very simple way to extract text from an element.
<examples>
  <example num="1">
    <title>Example 1</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example 2</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (dom7-1.py)
import xml.dom.minidom as dom

doc = dom.parse("example.xml")

for title_element in doc.getElementsByTagName("title"):
    # The text is stored in the first child node, not in the element itself.
    text = title_element.childNodes[0].data
    print(text)
stdout:
Example 1
Example 2
Run time: 48.9 ms
The major drawback of the example above is that it assumes the text is the first (and only) child of the element. The example below shows what happens when the title also contains a child element.
<examples>
  <example num="1">
    <title>Example <i>1</i> - introduction</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example <i>2</i> - the second part</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (dom7-2.py)
import xml.dom.minidom as dom

doc = dom.parse("example2.xml")

for title_element in doc.getElementsByTagName("title"):
    # Only the first child is read, so everything from <i> onwards is lost.
    text = title_element.childNodes[0].data
    print(text)
stdout:
Example 
Example 
Run time: 49.7 ms
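The reason for the truncated output becomes visible when the children of a `title` element are listed. A minimal sketch, assuming the first title from the XML above is parsed from an inline string instead of from `example2.xml`:

```python
import xml.dom.minidom as dom

# Inline copy of the first <title> from the input above (assumption: used
# here only so the snippet runs without the file).
title = dom.parseString(
    "<title>Example <i>1</i> - introduction</title>"
).documentElement

# nodeName is "#text" for text nodes and the tag name for elements.
kinds = [child.nodeName for child in title.childNodes]
print(kinds)                           # ['#text', 'i', '#text']

# childNodes[0] is only the text in front of <i>.
print(repr(title.childNodes[0].data))  # 'Example '
```

The title has three children, so reading only `childNodes[0]` returns just the first text fragment.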
The following two examples show different solutions to this problem.
The first one retrieves only the text that belongs directly to the element, skipping any text contained in child elements.
<examples>
  <example num="1">
    <title>Example <i>1</i> - introduction</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example <i>2</i> - the second part</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (dom7-3.py)
import xml.dom.minidom as dom

doc = dom.parse("example2.xml")

for title_element in doc.getElementsByTagName("title"):
    text = ""
    # Concatenate every direct text child; child elements such as <i> are skipped.
    for ch in title_element.childNodes:
        if isinstance(ch, dom.Text):
            text += ch.data
    print(text)
stdout:
Example  - introduction
Example  - the second part
Run time: 49.7 ms
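Because the `<i>` elements are skipped, the output above contains two consecutive spaces where the element used to be. If that matters, the whitespace can be collapsed afterwards. The sketch below wraps the loop in a helper (the name `get_direct_text` is made up here) and parses an inline string in place of `example2.xml`:

```python
import xml.dom.minidom as dom

def get_direct_text(element):
    """Join the data of all text nodes that are direct children of element."""
    parts = [ch.data for ch in element.childNodes if isinstance(ch, dom.Text)]
    return "".join(parts)

# Inline copy of the first <title> from the input above (assumption).
title = dom.parseString(
    "<title>Example <i>1</i> - introduction</title>"
).documentElement

raw = get_direct_text(title)
print(repr(raw))            # 'Example  - introduction' (two spaces)

# split() without arguments splits on any run of whitespace.
collapsed = " ".join(raw.split())
print(collapsed)            # Example - introduction
```

Whether collapsing is appropriate depends on the document; in some formats the whitespace is significant and should be left alone.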
The second one retrieves all the text recursively, even from child elements.
<examples>
  <example num="1">
    <title>Example <i>1</i> - introduction</title>
    <text>This is example nr. 1. It shows how an example looks.</text>
  </example>

  <example num="2">
    <title>Example <i>2</i> - the second part</title>
    <text>Another example. Imagine some ingenious text here...</text>
  </example>
</examples>
Source: (dom7-4.py)
import xml.dom.minidom as dom

def get_all_text_from_element(el):
    """Return the text of el and of all its descendants, in document order."""
    text = ""
    for ch in el.childNodes:
        if isinstance(ch, dom.Element):
            # Descend into the child element and collect its text as well.
            text += get_all_text_from_element(ch)
        elif isinstance(ch, dom.Text):
            text += ch.data
    return text

doc = dom.parse("example2.xml")

for title_element in doc.getElementsByTagName("title"):
    print(get_all_text_from_element(title_element))
stdout:
Example 1 - introduction
Example 2 - the second part
Run time: 51.0 ms
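The recursive function is fine for typical documents. For very deeply nested input, recursion could in principle run into Python's recursion limit, so an iterative variant may be preferable. The following is one possible sketch, not part of the original example; the name `iter_text` and the inline sample strings are assumptions made for illustration.

```python
import xml.dom.minidom as dom

def iter_text(element):
    """Yield the data of every text node below element, in document order."""
    # Explicit stack instead of recursion; reversed so the first child is on top.
    stack = list(reversed(element.childNodes))
    while stack:
        node = stack.pop()
        if isinstance(node, dom.Text):
            yield node.data
        elif isinstance(node, dom.Element):
            # Push the children reversed so the leftmost one is popped first.
            stack.extend(reversed(node.childNodes))

title = dom.parseString(
    "<title>Example <i>1</i> - introduction</title>"
).documentElement
print("".join(iter_text(title)))   # Example 1 - introduction

nested = dom.parseString("<a>x<b>y<c>z</c>w</b>v</a>").documentElement
print("".join(iter_text(nested)))  # xyzwv
```

Collecting the fragments with a generator and joining them once also avoids building many intermediate strings with `+=`.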