<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web-In-Sight &#187; regex</title>
	<atom:link href="http://web-in-sight.nl/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://web-in-sight.nl</link>
	<description>Insight into the internet and work</description>
	<lastBuildDate>Mon, 30 Jan 2012 09:00:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Solved python regex raising exception &#8220;unmatched group&#8221;</title>
		<link>http://web-in-sight.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=solved-python-regex-raising-exception-unmatched-group</link>
		<comments>http://web-in-sight.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/#comments</comments>
		<pubDate>Fri, 11 Jul 2008 08:07:40 +0000</pubDate>
		<dc:creator>Gerard</dc:creator>
				<category><![CDATA[All ENGLISH articles]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[backref]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[exception]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.gp-net.nl/?p=51</guid>
		<description><![CDATA[If you&#8217;re a regex guru and you know why you came here, you can go straight to the brief explanation. If not, just keep reading. I found a workaround for python bug 1519638. It most definitely will not solve all &#8230; <a href="http://web-in-sight.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><!--TOC-->If you&#8217;re a regex guru and you know why you came here, you can go straight to the <a href="#toc-brief-explanation">brief explanation</a>. If not, just keep reading.</p>
<p>I found a workaround for python bug <a title="issue1519638" href="http://bugs.python.org/issue1519638" target="_blank">1519638</a>. It certainly will not solve every regex puzzle out there, but it stops the sub method from breaking when the replacement string uses backreferences.</p>
<h3>The problem</h3>
<p>If you would like to replace this:</p>
<pre>&lt;label for="author"&gt;&lt;small&gt;Name</pre>
<p>With this:</p>
<pre>&lt;label for="author"&gt;&lt;small&gt;Naam</pre>
<p>If you&#8217;re not sure whether the &lt;small&gt; tag is present, you would put the characters &#8220;&lt;small&gt;&#8221; in a group and add a question mark to make that group optional. By the way, simply replacing every &#8220;Name&#8221; is not an option, because that would mess up other parts of the file in question.</p>
<p><em>Example updated. Thanks, dbr!</em></p>
<h3>The solution</h3>
<p>Using a compiled regular expression, a solution might look like this:</p>
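<pre>import re

data = '&lt;label for="author"&gt;&lt;small&gt;Name'  # sample input; in practice the file contents
# No re.VERBOSE: in verbose mode the space in "label for" would be ignored.
reg = re.compile(r'(&lt;label for="author"&gt;)(&lt;small&gt;)?(Name)')
replace = r'\g&lt;1&gt;\g&lt;2&gt;Naam'
search = reg.sub(replace, data)</pre>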
<p>Here the replacement string uses backreferences such as \g&lt;1&gt; and \g&lt;2&gt; to refer to the groups, i.e. the subexpressions within the parentheses of the search pattern.</p>
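<p>A side note on the \g&lt;n&gt; syntax, which is standard Python rather than part of this fix: \g&lt;1&gt; means the same as \1, but it stays unambiguous when a literal digit has to follow the reference. The small sketch below, using made-up input, shows the difference.</p>

```python
import re

# \g<1>0 means "group 1, followed by a literal 0".
print(re.sub(r'(x)', r'\g<1>0', 'x'))  # x0

# \10 is read as a reference to group 10, which this pattern does not have,
# so the replacement template is rejected with re.error.
try:
    re.sub(r'(x)', r'\10', 'x')
except re.error as exc:
    print('error:', exc)
```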
<h3>The oops</h3>
<p>However, if the &#8220;&lt;small&gt;&#8221; tag is not there, the sub call raises an exception:</p>
<pre>$ python regex.py
Traceback (most recent call last):
  File "regex.py", line 14, in &lt;module&gt;
    search = reg.sub(replace, data)
  File "/usr/lib/python2.5/re.py", line 274, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.5/sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group</pre>
<p>This happens because the second group, referenced as &#8220;\g&lt;2&gt;&#8221; in the replacement string, did not take part in the match, so its value is None instead of an empty string. That appears to be the bug.</p>
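<p>To see where the None comes from, the sketch below inspects the match object for a hypothetical input without the &lt;small&gt; tag, using the same search pattern compiled without flags. One caveat: since Python 3.5, re.sub replaces unmatched groups with an empty string (the change tracked in the issue linked above), so on a current interpreter the sub call succeeds; the &#8220;unmatched group&#8221; error only shows up on older versions such as the Python 2.5 in the traceback.</p>

```python
import re

# Same search pattern as above, compiled without flags.
reg = re.compile(r'(<label for="author">)(<small>)?(Name)')
data = '<label for="author">Name'  # hypothetical input without <small>

m = reg.search(data)
print(m.group(2))  # None: the optional group did not take part in the match

# Python 3.5+ substitutes '' for the unmatched group here;
# Python 2.x and 3.0-3.4 raise "unmatched group" instead.
result = reg.sub(r'\g<1>\g<2>Naam', data)
print(result)  # <label for="author">Naam
```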
<h3>Solving the oops</h3>
<p>This can be resolved by replacing the optional group &#8220;(&lt;small&gt;)?&#8221; with the alternation &#8220;(|&lt;small&gt;)&#8221;. When the &#8220;&lt;small&gt;&#8221; tag is absent, the group still matches through its empty subexpression, so it yields an empty string and the sub call no longer raises the exception.</p>
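<p>A minimal sketch of this fix, run against two hypothetical inputs (one with the tag, one without). The alternation behaves the same in old and new Python versions; only the unfixed pattern depends on the version.</p>

```python
import re

# The optional group rewritten as an alternation with an empty first branch.
reg = re.compile(r'(<label for="author">)(|<small>)(Name)')
replace = r'\g<1>\g<2>Naam'

with_tag = '<label for="author"><small>Name'  # hypothetical inputs
without_tag = '<label for="author">Name'

print(repr(reg.search(without_tag).group(2)))  # '' rather than None
print(reg.sub(replace, with_tag))     # <label for="author"><small>Naam
print(reg.sub(replace, without_tag))  # <label for="author">Naam
```

<p>Because the empty branch is listed first, the engine tries it before &lt;small&gt; and only falls back to the tag when the rest of the pattern fails. Writing the group as (&lt;small&gt;|) tries the tag first, which mirrors the greedy behaviour of ?; here both orders give the same result because (Name) must follow immediately.</p>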
<p>In other words &#8230;</p>
<h3>Brief explanation</h3>
<p>When doing a search and replace with sub, rewrite an optional group as an alternation that has one empty subexpression. So instead of &#8220;(.+?)?&#8221;, use &#8220;(|.+?)&#8221; (without the double quotes).</p>
<p>If the group&#8217;s other alternative does not match, the empty subexpression does. The group then yields an empty string instead of None, and sub runs normally instead of raising the &#8220;unmatched group&#8221; error.</p>
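<p>For completeness, two other common workarounds, which are not the method described in this post: move the question mark inside the group, as in &#8220;((?:&lt;small&gt;)?)&#8221;, so the group always takes part in the match, or pass a function to sub and turn None into an empty string yourself. Both are sketched below with the same hypothetical inputs; the function names are made up for illustration.</p>

```python
import re

data_with = '<label for="author"><small>Name'  # hypothetical inputs
data_without = '<label for="author">Name'

# Option 1: move the ? inside the group. Group 2 then always takes part
# in the match and captures '' when <small> is absent.
reg_inner = re.compile(r'(<label for="author">)((?:<small>)?)(Name)')

def replace_inner(data):
    return reg_inner.sub(r'\g<1>\g<2>Naam', data)

# Option 2: keep the optional group, but build the replacement in a
# function and turn an unmatched (None) group into ''.
reg_optional = re.compile(r'(<label for="author">)(<small>)?(Name)')

def replace_callback(data):
    return reg_optional.sub(lambda m: m.group(1) + (m.group(2) or '') + 'Naam', data)

for data in (data_with, data_without):
    print(replace_inner(data), replace_callback(data))
```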
<p>That&#8217;s all folks &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://web-in-sight.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A python function to capture an IP address or range</title>
		<link>http://web-in-sight.nl/2008/06/12/a-python-function-to-capture-an-ip-address-or-range/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-python-function-to-capture-an-ip-address-or-range</link>
		<comments>http://web-in-sight.nl/2008/06/12/a-python-function-to-capture-an-ip-address-or-range/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 10:39:02 +0000</pubDate>
		<dc:creator>Gerard</dc:creator>
				<category><![CDATA[All ENGLISH articles]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[ip]]></category>
		<category><![CDATA[netmask]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.gp-net.nl/?p=49</guid>
		<description><![CDATA[I recently needed a function that validates an IP address or network range. Since my python application will pass it as a parameter to iptables, it needs to be correct and not &#8216;close to&#8217;. So I dug in &#8230; Validating &#8230; <a href="http://web-in-sight.nl/2008/06/12/a-python-function-to-capture-an-ip-address-or-range/">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently needed a function that validates an IP address or network range. Since my python application passes it as a parameter to iptables, it needs to be correct and not just &#8216;close to&#8217;. So I dug in &#8230;</p>
<p>Validating an IP address or range with just a regex seems like self-castigation. I looked at the source code of iptables, and it checks whether each octet fits in a byte: a valid octet runs from 0 up to and including 255, so it must fit in 1 byte. That method seems OK, but when you&#8217;re writing interpreted code, the interpreter most likely does a better job than you at checking byte lengths, and it may well do exactly that already when you compare integers like this:</p>
<pre>if not (0 &lt;= int(octet) &lt;= 255):
    return False  # the octet does not fit in one byte</pre>
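<p>To make the comparison concrete, the sketch below puts a &#8220;does it fit in one byte&#8221; test next to the plain range check. The byte test uses int.to_bytes as a stand-in for illustration; it is not how the iptables C code does it, and both helper names are hypothetical.</p>

```python
def fits_in_byte(octet):
    """Byte-length test: an unsigned value fits in one byte if to_bytes(1) succeeds."""
    try:
        int(octet).to_bytes(1, 'big')  # raises OverflowError if < 0 or > 255
        return True
    except OverflowError:
        return False

def in_octet_range(octet):
    """Plain integer comparison, as in the snippet above."""
    return 0 <= int(octet) <= 255

# Compare both tests on every value from -5 to 300.
agree = all(fits_in_byte(str(n)) == in_octet_range(str(n)) for n in range(-5, 301))
print(agree)                                     # True
print(fits_in_byte('255'), fits_in_byte('256'))  # True False
```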
<p>With all that in mind, I wrote the function below, which takes an IP address or range and simply returns True or False.</p>
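<pre>import re

def check_address(address):
    """Return True if address is a valid IPv4 address or CIDR range."""
    # Four groups of 1-3 ASCII digits, optionally followed by /mask.
    # [0-9] avoids matching non-ASCII digits; \Z (unlike $) rejects a trailing newline.
    if not re.match(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(|/[0-9]{1,2})\Z', address):
        return False

    if '/' in address:
        ip, mask = address.split('/')
        if not (0 &lt;= int(mask) &lt;= 32):
            return False
    else:
        ip = address

    # Each octet must fit in one byte.
    for octet in ip.split('.'):
        if not (0 &lt;= int(octet) &lt;= 255):
            return False

    return True</pre>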
<p>IMHO, it&#8217;s very safe and very readable. You know &#8230; <a title="KISS" href="http://en.wikipedia.org/wiki/KISS_principle" target="_blank">KISS</a>.</p>
<p>If you have suggestions on how to do this in a more pythonic way, I&#8217;m very curious to hear them, so please drop me a line.</p>
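<p>One possible answer, assuming Python 3.3 or later (which postdates this post): the standard library&#8217;s ipaddress module parses and validates IPv4 addresses and networks. The wrapper name check_address_stdlib is hypothetical. With strict=False a range with host bits set, such as 192.168.1.1/24, is accepted, just as check_address accepts it.</p>

```python
import ipaddress

def check_address_stdlib(address):
    """Return True if address is a valid IPv4 address or network range."""
    try:
        # strict=False: allow host bits in a range, e.g. 192.168.1.1/24.
        ipaddress.IPv4Network(address, strict=False)
    except ValueError:  # AddressValueError and NetmaskValueError subclass it
        return False
    return True

for candidate in ('10.0.0.1', '10.0.0.0/8', '256.1.1.1', '10.0.0.0/33', '10.0.0'):
    print(candidate, check_address_stdlib(candidate))
```

<p>The two functions do not agree in every edge case: ipaddress also accepts a dotted netmask such as 10.0.0.0/255.0.0.0, and recent Python releases reject octets with leading zeros such as 010.0.0.1, which check_address accepts.</p>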
<p>GrtzG</p>
]]></content:encoded>
			<wfw:commentRss>http://web-in-sight.nl/2008/06/12/a-python-function-to-capture-an-ip-address-or-range/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

