[Israel.pm] run many regexes

Shlomi Fish shlomif at iglu.org.il
Sun May 4 12:54:53 EEST 2008


On Sunday 04 May 2008, Pinkhas Nisanov wrote:
> On Sat, May 3, 2008 at 9:58 AM, Gaal Yahas <gaal at forum2.org> wrote:
> >  Yes, the regular expression engine was improved, and trie optimizations were
> >  introduced. To take advantage of them you'll need to build a single RE
> >  out of your many possibilities:
> >
> >  my $any_expression = "(" . (join "|", @expressions) . ")";
> >  my $any_re = qr/$any_expression/;
> >
> >  for my $input (@inputs) {
> >   print "match: $input" if $input =~ $any_re;
> >  }
> >
> >  The point here is that 5.10 is better at optimizing $any_re than
> >  previous perls; if several expressions share the same prefix you'll
> >  get less backtracking. E.g. if you have "banana" and "bandanna",
> >  internally the matching will be for "ban(?:ana|danna)". This is true
> >  even when only a small number of the various @expressions share
> >  prefixes.
>
> I tried this code with ~700 regexes and it ran 2-3 times faster!
> Does any other programming language have this feature?
>

I think most programming languages that support regexes (e.g. Python, C, Java)
require you to compile the regexes first and only then execute them (though
they may sometimes have convenience functions). In Perl 5 the regex match is
part of the syntax of the language, rather than a library call (for
convenience and Huffmanisation).
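
For comparison, Perl's closest thing to an explicit compile step is qr//,
which gives you a regex object you can reuse. Here is a minimal sketch of the
combined-alternation approach from the quoted message. The pattern and input
lists are made up for illustration, and I am assuming the patterns are literal
strings (hence quotemeta); drop the quotemeta call if they are real regexes.

```perl
use strict;
use warnings;

# Hypothetical patterns and inputs, for illustration only.
my @expressions = ('banana', 'bandanna', 'cherry');
my @inputs      = ('a ripe banana', 'plain text', 'red bandanna');

# Assumption: each entry is a literal string, so escape any metacharacters.
my $alternation = join '|', map { quotemeta($_) } @expressions;

# qr// compiles the pattern once; (?:...) groups without capturing.
my $any_re = qr/(?:$alternation)/;

# Keep only the inputs that match at least one of the alternatives.
my @matched = grep { $_ =~ $any_re } @inputs;
print "match: $_\n" for @matched;
```

The non-capturing group avoids saving a capture that is never used, and the
compiled $any_re can be passed around and reused like any other scalar.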

Regards,

	Shlomi Fish

-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
First stop for Perl beginners - http://perl-begin.org/

The bad thing about hardware is that it sometimes works and sometimes doesn't.
The good thing about software is that it's consistent: it always does not
work, and it always does not work in exactly the same way.

