[Israel.pm] run many regexes

Omer Zak w1 at zak.co.il
Thu May 1 15:44:48 EEST 2008


(Terminology used below:  regexp, data string, and simple match string,
i.e. a plain fixed-substring search.)
Crazy idea:
Is there a way to somehow invert the operation of regexp matching, and
then study the data string before matching it to the regexps - rather
than study all the regexps before matching them to the data string?

Failing that, and assuming that you want to match in order to reject
non-matching data strings rather than to parse them:

1. Determine the percentage of data strings that each regexp rejects.
Then order the regexps so that the first 20% reject roughly 80% of the
strings.  Take into account whether the sets of strings they reject
overlap.
2. Among regexps with similar rejection percentages, order from simple
to complicated.
3. Determine whether, and which, simple match string searches can be
used to eliminate a large percentage of the data strings.
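
A rough sketch of the three steps in Perl, in case it helps.  Everything
in it is made up for illustration: the patterns, the literals paired
with them, the sample strings and the name accept_string.  It assumes
each regexp can only match strings that contain its literal, uses
pattern length as a crude stand-in for "simple", and ranks each regexp
independently, so it does not account for overlaps.

```perl
use strict;
use warnings;

# Hypothetical checks: each regexp is paired with a fixed substring
# that any string it matches must contain (step 3).
my @checks = (
    { literal => 'host=',   regexp => qr/host=[\w.-]+/ },
    { literal => 'error',   regexp => qr/\berror\s+\d+\b/ },
    { literal => 'timeout', regexp => qr/timeout after \d+ ms/ },
);

# Hypothetical sample of representative data strings.
my @samples = (
    'error 42: timeout after 300 ms host=db1.example.org',
    'error 7 host=web2',
    'ok host=web3',
    'warning: disk almost full',
);

# Step 1: count how many samples each regexp rejects.
for my $c (@checks) {
    $c->{rejects} = grep { $_ !~ $c->{regexp} } @samples;
}

# Steps 1-2: most rejections first; ties go to the shorter pattern
# source, a crude approximation of "simpler".
my @ordered = sort {
    $b->{rejects} <=> $a->{rejects}
        || length("$a->{regexp}") <=> length("$b->{regexp}")
} @checks;

# Accept a string only if every check passes; stop at the first failure.
sub accept_string {
    my ($str) = @_;
    for my $c (@ordered) {
        # Cheap fixed-string test first; index() returns -1 when absent.
        return 0 if index($str, $c->{literal}) < 0;
        return 0 unless $str =~ $c->{regexp};
    }
    return 1;
}

print accept_string($_) ? "accept" : "reject", ": $_\n" for @samples;
```

Whether the index() pre-check actually pays off depends on the patterns
and the data, so it is worth benchmarking on real input before keeping
it.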


On Thu, 2008-05-01 at 15:22 +0300, Amir E. Aharoni wrote:
> 2008/5/1 Pinkhas Nisanov <pinkhas at nisanov.com>:
> > Hi,
> >
> >  I need run many ( hundreds ) regexes on some string.
> >  Is there some way to optimize it and get better performance?
> 
> There's study: http://perldoc.perl.org/functions/study.html
> 
> You should run your own benchmarks to see if it actually helps.
> 
> Any other ideas?
-- 
May the holy trinity of  $_, @_ and %_ be hallowed.
My own blog is at http://www.zak.co.il/tddpirate/



