PHP Regular Expressions: Recursion, Named Capture
Here's a PHP regular expression for matching *nested* parentheses (e.g. blocks of code):
((?:[^()]++|\((?1)\))*)
The (?1) is a recursive reference to capture group 1, which here is the subpattern inside the outermost parentheses. It is a feature of PCRE, the regex engine behind PHP's preg_* functions.
See Jeffrey Friedl's Mastering Regular Expressions, 3rd ed., p. 476, "Recursive reference to a set of capturing parentheses".
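To try the recursive pattern out, here is a minimal sketch. The helper name firstBalancedGroup and the sample strings are made up for illustration. The literal \( and \) wrapped around the pattern are an addition of mine, so that the match has to start and end on a real parenthesis.

```php
<?php
// Hypothetical helper: returns the contents of the first balanced
// parenthesized block in $text, or null if there is none.
function firstBalancedGroup(string $text): ?string
{
    // The post's pattern is still capture group 1; the surrounding \( and \)
    // are an addition here, not part of the original expression.
    $pattern = '/\(((?:[^()]++|\((?1)\))*)\)/';

    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(firstBalancedGroup('foo(a, bar(b, c), d) + baz(e)')); // 'a, bar(b, c), d'
var_dump(firstBalancedGroup('foo(a, bar(b'));                  // null (never closed)
```

Each time the engine meets an opening parenthesis inside the group, (?1) re-enters the same subpattern, so the nesting depth is not fixed in advance.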
Another potentially useful regex technique is "named capture":
^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?
Here you can use either $matches[1], $matches[2], $matches[3] or $matches['protocol'], $matches['host'], $matches['port']. ($matches[0] holds the entire match.)
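A small sketch of the named-capture pattern in use. The function name parseUrlPrefix, the ~ delimiter and the example URLs are my own assumptions, not part of the original pattern. Since the port group is optional, the code treats a missing or empty 'port' entry as "no port given" rather than relying on exactly how preg_match reports an unmatched group.

```php
<?php
// Hypothetical helper: splits the start of a URL into protocol, host and port.
// Returns null when the URL does not match the pattern at all.
function parseUrlPrefix(string $url): ?array
{
    // "~" is used as the delimiter so the slashes need no escaping.
    $pattern = '~^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?~';

    if (preg_match($pattern, $url, $matches) !== 1) {
        return null;
    }

    // The optional port group may be absent or empty when no port was given.
    $port = ($matches['port'] ?? '') !== '' ? (int) $matches['port'] : null;

    return [
        'protocol' => $matches['protocol'], // same value as $matches[1]
        'host'     => $matches['host'],     // same value as $matches[2]
        'port'     => $port,
    ];
}

print_r(parseUrlPrefix('http://example.com:8080/index.html'));
print_r(parseUrlPrefix('https://example.org/'));
```

The named keys keep working if another group is later added in front of them, whereas the numeric indexes would shift.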
6 Comments:
I didn't know about ?1. Is it also in other languages like Perl or Ruby?
With things like ?1, perhaps regular expressions should now be called context free expressions instead.
-K
By Kaushik, at 12/14/2007 11:45 p.m.
Hi Kaushik - Some regex engines have recursive matching. There are a couple of entries in Friedl's book about recursive matching -
Perl has a "dynamic regex": (??{perl code})
Java had ?1 as an undocumented feature until 1.4.2, after which it was removed.
.NET uses (?<DEPTH>) to achieve something similar.
By Jonathan, at 12/15/2007 9:54 a.m.
Interesting! I guess I just didn't venture beyond bare-bones regexes.
Thanks for your reply!
By Kaushik, at 12/15/2007 11:22 p.m.
Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. In fact, when I learn any new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.
http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html
Here is a post about Ruby regex. I wrote it when I first learned Ruby regex, so it should be helpful for new coders.
By Demon, at 3/29/2009 11:56 a.m.
Good to know - thanks Wolf!
By Jonathan, at 3/29/2009 1:12 p.m.
Will recursive regex ever take off? They take the potential of regex for confusion and errors to another dimension. There's an example in this recursive regex tutorial that shows a subtle problem in a recursive expression because of atomic grouping. Anyhow, thanks for spreading the regex "Gospel".
By Anonymous, at 12/15/2011 10:16 a.m.