More on groups

More fun with groups
Just as a replacement string can refer back to matched groups, the search pattern itself can contain a back-reference: \1 matches exactly the text that group 1 captured earlier in the same match.
Source: (regexp7-1.py)
import re

# I would like to get the content of individual tags
text = "This is <i>italic, <b>bold-italic</b></i> and this only <b>bold</b>."

# greediness problem: .* runs up to the last closing tag in the text
print(re.search(r"<\w+>.*</\w+>", text).group(0))

# non-greedy, but </b> matches before </i> => unbalanced tags
print(re.search(r"<\w+>.*?</\w+>", text).group(0))

# solution: \1 forces the closing tag name to equal the opening one
print(re.search(r"<(\w+)>.*?</\1>", text).group(0))
stdout:
<i>italic, <b>bold-italic</b></i> and this only <b>bold</b>
<i>italic, <b>bold-italic</b>
<i>italic, <b>bold-italic</b></i>
Run time: 23.5 ms
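Back-references in the pattern are useful beyond tag matching. The following is a minimal sketch, not part of the original examples, that uses \1 to find accidentally doubled words; the sample sentence and the variable names are made up for illustration.

```python
import re

# Hypothetical sample sentence containing two accidental repetitions
sentence = "This is is a test of the the back-reference."

# (\w+) captures a word; \1 then requires exactly the same text again
doubled = re.compile(r"\b(\w+)\s+\1\b")

repeated_words = [m.group(1) for m in doubled.finditer(sentence)]
print(repeated_words)  # ['is', 'the']
```

The \b anchors keep the pattern from matching inside longer words, which is why the "is" at the end of "This" is not reported as a repetition.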
Sometimes it is undesirable for a group to appear in the output. The following code shows how to write a non-capturing group, (?:...), which only groups items within the pattern and cannot be retrieved afterwards by its number.
Source: (regexp7-2.py)
import re

text = "There was 1 party of 30 men and 2 parties of 10 women."
# how many parties in total, how many persons?

# I have to use a group to allow both singular and plural forms...
print(re.findall(r"([0-9]+) part(y|ies) of ([0-9]+)", text))
# ...but I am not interested in retrieving that group

# (?: starts a non-capturing (non-retrievable) group
print(re.findall(r"([0-9]+) part(?:y|ies) of ([0-9]+)", text))
stdout:
[('1', 'y', '30'), ('2', 'ies', '10')]
[('1', '30'), ('2', '10')]
Run time: 23.6 ms
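The same distinction matters for re.split, where captured groups are inserted into the result list. This short sketch (the input string is a hypothetical example) compares a capturing and a non-capturing alternation of separators.

```python
import re

# Hypothetical record with two kinds of separators
record = "a,b;c"

# A capturing group makes re.split return the separators as well
with_separators = re.split(r"(,|;)", record)

# A non-capturing group only groups the alternatives; separators are dropped
without_separators = re.split(r"(?:,|;)", record)

print(with_separators)     # ['a', ',', 'b', ';', 'c']
print(without_separators)  # ['a', 'b', 'c']
```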
Source: (regexp7-3.py)
"""I would like to split an internet address into two parts -
'server name' and 'directory'."""

import re

texts = ["bkchem.zirael.org", "python.org/news", "www.python.org/download/releases/2.3.6/"]

# the inner groups appear in the result (only the last match of each repeated group)
for text in texts:
    m = re.match(r"(\w+(\.\w+)+)((/[\w.]+)*)", text)
    if m:
        print(m.groups())

print("")
# the inner groups were converted to non-capturing ones
for text in texts:
    m = re.match(r"(\w+(?:\.\w+)+)((?:/[\w.]+)*)", text)
    if m:
        print(m.groups())
stdout:
('bkchem.zirael.org', '.org', '', None)
('python.org', '.org', '/news', '/news')
('www.python.org', '.org', '/download/releases/2.3.6', '/2.3.6')

('bkchem.zirael.org', '')
('python.org', '/news')
('www.python.org', '/download/releases/2.3.6')
Run time: 21.0 ms
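As the first output shows, a repeated group keeps only its last repetition. If every path component is needed, one possible approach (a sketch, not the original author's method) is a second pass with re.findall over the captured directory part.

```python
import re

address = "www.python.org/download/releases/2.3.6/"

# Same pattern as in the listing: server name, then the directory part
m = re.match(r"(\w+(?:\.\w+)+)((?:/[\w.]+)*)", address)
server, directory = m.groups()

# Second pass: collect each component of the captured directory
components = re.findall(r"/([\w.]+)", directory)

print(server)      # www.python.org
print(components)  # ['download', 'releases', '2.3.6']
```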
In more complex programs it can be useful to name the groups, so that you do not lose track of them when the format of the data changes.
Source: (regexp7-4.py)
import re

text = "There was 1 party of 30 men and 2 parties of 10 women."

# how many parties in total, how many persons?
for m in re.finditer(r"(?P<parties>[0-9]+) part(?:y|ies) of (?P<persons>[0-9]+)", text):
    print("----- Match -----")
    print(m.groupdict())
    print("Parties: %s" % m.group("parties"))
    print("Persons: %s" % m.group("persons"))
    print("Total:   %d" % (int(m.group("parties")) * int(m.group("persons"))))
stdout:
----- Match -----
{'persons': '30', 'parties': '1'}
Parties: 1
Persons: 30
Total:   30
----- Match -----
{'persons': '10', 'parties': '2'}
Parties: 2
Persons: 10
Total:   20
Run time: 23.7 ms
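Named groups can also be used in back-references. The sketch below revisits the tag example: (?P=tag) refers to a named group inside the pattern, and \g<tag> does the same inside a replacement string. The group names tag and content and the bracketed output format are chosen here only for illustration.

```python
import re

text = "This is <i>italic, <b>bold-italic</b></i> and this only <b>bold</b>."

# (?P<tag>...) names the group; (?P=tag) must match the same text again
balanced = re.compile(r"<(?P<tag>\w+)>(?P<content>.*?)</(?P=tag)>")

m = balanced.search(text)
first_tag = m.group("tag")
first_content = m.group("content")
print(first_tag, "->", first_content)  # i -> italic, <b>bold-italic</b>

# \g<name> refers to a named group inside a replacement string
rewritten = balanced.sub(r"[\g<tag>: \g<content>]", "this only <b>bold</b>.")
print(rewritten)  # this only [b: bold].
```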