Regexp strategies III.

How to use regexps effectively - beware of greedy regexp
Regular expression quantifiers such as * and + are greedy by default: they match as much text as they can. With some kinds of data, and especially when a search spans multiple lines, this can backfire on you.
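As a minimal single-line sketch of the difference (the sample string and the variable names are made up for illustration and are not part of the listing in this section):

```python
import re

# Hypothetical one-line sample, chosen only to show the effect.
sample = '<b>one</b> and <b>two</b>'

# Greedy: .* takes as much as it can, so the match runs to the LAST </b>.
greedy = re.findall(r'<b>.*</b>', sample)

# Non-greedy: .*? stops at the FIRST </b> that lets the match succeed.
lazy = re.findall(r'<b>.*?</b>', sample)

print(greedy)
print(lazy)
```

The greedy pattern returns one long match covering both elements, while the non-greedy pattern returns each element separately.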
The following code demonstrates this together with the importance of multiline searching.
Source: (regexp13-1.py)
import re

text = '''<h1>This is the main title</h1>
<p>some text.
Another piece of text</p>
<p id="x112">some text here</p>
<p>some more text</p>
<h2>Secondary title</h2>
<p class="red">text to show in red
that continues here</p>'''

# find all paragraphs - no multiline: '.' does not match a newline,
# so only paragraphs that fit on a single line are found
for p in re.findall(r'<p.*?>.*</p>', text):
  print(" --start--")
  print(p)
  print(" --end--")

print("--------------------")

# multiline (re.S lets '.' match newlines), but beware of the greedy regexp:
# '.*' runs all the way to the last </p> in the text
for p in re.findall(r'<p.*?>.*</p>', text, re.S):
  print(" --start--")
  print(p)
  print(" --end--")

print("--------------------")

# multiline and non-greedy, this is what we want
for p in re.findall(r'<p.*?>.*?</p>', text, re.S):
  print(" --start--")
  print(p)
  print(" --end--")
stdout:
 --start--
<p id="x112">some text here</p>
 --end--
 --start--
<p>some more text</p>
 --end--
--------------------
 --start--
<p>some text.
Another piece of text</p>
<p id="x112">some text here</p>
<p>some more text</p>
<h2>Secondary title</h2>
<p class="red">text to show in red
that continues here</p>
 --end--
--------------------
 --start--
<p>some text.
Another piece of text</p>
 --end--
 --start--
<p id="x112">some text here</p>
 --end--
 --start--
<p>some more text</p>
 --end--
 --start--
<p class="red">text to show in red
that continues here</p>
 --end--
Run time: 23.1 ms
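Building on the third, non-greedy search, the following sketch shows one way to pull out only the text inside each paragraph by adding a capture group. The names PARAGRAPH and paragraph_bodies are hypothetical helpers introduced here, and the sample text is a shortened copy of the one in the listing. re.DOTALL is the long name of the re.S flag.

```python
import re

# re.S is the short name of re.DOTALL: '.' also matches newlines.
# The group (.*?) captures the paragraph body non-greedily.
PARAGRAPH = re.compile(r'<p.*?>(.*?)</p>', re.DOTALL)

def paragraph_bodies(html):
    """Return the text between each <p ...> tag and its closing </p>."""
    return [m.group(1) for m in PARAGRAPH.finditer(html)]

text = '''<p>some text.
Another piece of text</p>
<p id="x112">some text here</p>'''

bodies = paragraph_bodies(text)
print(bodies)
```

Compiling the pattern once is a convenience when the same search is applied to many strings. The matching behaviour is the same as passing the pattern and flag to re.findall directly.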