If you're a regex guru, and you know why you came here, you can go straight to the brief explanation. If not, just keep reading.
I found a workaround for Python bug 1519638. It most definitely will not solve every puzzle out there, but it stops the sub method from breaking when the replacement string uses backreferences.
The problem
If you would like to replace this:
<label for="author"><small>Name
With this:
<label for="author"><small>Naam
And you’re not sure whether the <small> tag is there, you would group the characters “<small>” and add a question mark to make them optional. BTW, running a replace on just “Name” is not an option because it would mess up other parts of the file in question.
Example updated. Thanx dbr!
The solution
Using a compiled regex pattern, a solution might look like this:
import re
data = '<label for="author"><small>Name'
# re.VERBOSE is left out on purpose: it would make the regex ignore the space in 'label for'.
reg = re.compile(r'(<label for="author">)(<small>)?(Name)',
                 re.MULTILINE | re.DOTALL)
replace = r'\g<1>\g<2>Naam'
search = reg.sub(replace, data)
In this case the replacement string uses backreferences to the groups, i.e. the subexpressions within parentheses in the search pattern.
The oops
However, if the “<small>” tag is not there, the sub call raises an exception.
$ python regex.py
Traceback (most recent call last):
  File "regex.py", line 14, in <module>
    search = reg.sub(replace, data)
  File "/usr/lib/python2.5/re.py", line 274, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.5/sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
This happens because the second group, written as “\g<2>” in the replacement string, did not take part in the match and so yields None instead of an empty string. That is (or seems to be) the bug.
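To see the None for yourself, here is a small sketch that inspects the match object directly. The variable names are made up for illustration and the flags are left out since they don't matter here. Note that newer Pythons (3.5 and up) substitute an empty string for an unmatched group in sub, so the traceback above will not show up there, but group() still reports None for a group that did not take part.

```python
import re

# The optional group from the example above.
pattern = re.compile(r'(<label for="author">)(<small>)?(Name)')

with_tag = pattern.search('<label for="author"><small>Name')
without_tag = pattern.search('<label for="author">Name')

print(repr(with_tag.group(2)))     # group 2 took part in the match
print(repr(without_tag.group(2)))  # group 2 did not take part in the match
```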
Solving the oops
This can be resolved by replacing the optional notation “(<small>)?” with an alternation “(|<small>)”. When the “<small>” tag is absent, the group matches on the empty subexpression, so it returns an empty string and the sub call won’t raise the exception.
In other words …
Brief explanation
When doing a search and replace with sub, swap the group written as optional for a group written as an alternation with one empty subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without the double quotes).
If there’s nothing matched by this group the empty subexpression matches. Then an empty string is returned instead of a None and the sub method is executed normally instead of raising the “unmatched group” error.
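A minimal sketch of the workaround applied to the label example, assuming the goal is to turn “Name” into “Naam” whether or not the “<small>” tag is present. The names pattern, replacement and samples are only there for illustration.

```python
import re

# Alternation with an empty first branch instead of "(<small>)?".
pattern = re.compile(r'(<label for="author">)(|<small>)(Name)')
replacement = r'\g<1>\g<2>Naam'

samples = [
    '<label for="author"><small>Name',
    '<label for="author">Name',
]

for data in samples:
    match = pattern.search(data)
    # Group 2 always takes part in the match, so it is '' rather than None.
    print(repr(match.group(2)), '->', pattern.sub(replacement, data))
```

Both samples go through sub without an error, because group 2 is always a string, even if it is an empty one.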
That’s all folks …

Hi:
I’m the original poster of the bug, and I’m writing to thank you for the fix.
It’s been almost two years since I posted that bug, and I’d lost hope that it would be fixed. Your workaround will allow me to fix my scripts to finally avoid the silly hacks I’ve been using in the meantime.
Again, thank you!
Not sure this is really a bug.. You are trying to use a referenced group that might not exist (the “()?” one)..
Not exactly sure of what you are trying to achieve in the end (The example could be done by data.replace(“Blue”,”Red”) ), but the way I’d do it is..
import re
data = “Blue”
reg = re.compile(r’((?:)?)(Blue)’, re.VERBOSE | re.MULTILINE | re.DOTALL)
print reg.sub(“\gBlue”, data)
Err, the comment system messed up the quotes and angle-brackets.. I posted the same comment on reddit: http://www.reddit.com/info/6rbg9/comments/#c04nqd4
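Since the comment system ate parts of the snippet above, here is a guess at the idea behind it, not dbr's exact code: wrap an optional non-capturing group in a capturing group, so the capturing group always takes part in the match. The “<small>” inside the non-capturing group, the label pattern and the replacement string are all assumptions borrowed from the post's own example.

```python
import re

# Assumption: the stripped part of the pattern was an optional "<small>".
pattern = re.compile(r'(<label for="author">)((?:<small>)?)(Name)')
replacement = r'\g<1>\g<2>Naam'

for data in ('<label for="author"><small>Name', '<label for="author">Name'):
    # The outer group always takes part in the match, possibly capturing ''.
    print(pattern.sub(replacement, data))
```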
To “nneonneo”: Very welcome! It gave me a headache for about a week ..
To “dbr”: It is indeed debatable whether it is a bug or not. And the example, now that I read your response, is not that accurate.
The replace I was going for originated from this one:
Where “\g<2>” could be any font-like attribute in HTML tag form. And I could not just do your trick for the “Name” to “Naam” replace, because that would most definitely mess up other parts of the files I was going through.
Anyway, thanx for the repost on the brackets/quotes issue. I was past the point of no return on that ..
Regards,
Gerard.
This is NOT a bug.
In most regex libraries the ()? group will return null if the subexpression does not match. Look at Java, C#, etc.; they all do this.
To “Jon”: I merely mentioned the term bug because it is on the python bug list. Nevertheless, one could debate on whether it is or not.
Thanx for the heads-up!
Gerard.
Gerard,
remind me, are you my ex-colleague from Energis/Enertel?
If so, it’s quite bizarre that I see you meddle with Python at about the same time I have a Python programming job.
Most likely,
Sent you an email …