[Israel.pm] run many regexes
Omer Zak
w1 at zak.co.il
Thu May 1 15:44:48 EEST 2008
(Terminology: regexp, data string, simple match string.)
Crazy idea:
Is there a way to somehow invert the operation of regexp matching, and
then study the data string before matching it to the regexps - rather
than study all the regexps before matching them to the data string?
Failing that, and assuming that you want to match in order to reject
non-matching data strings rather than to parse them:
1. Measure what percentage of the data strings each regexp rejects.
Then order the regexps so that the first 20% of them reject 80% of the
strings. Take overlaps (and non-overlaps) between regexps into account.
2. Among regexps with similar rejection percentages, order from simple
to complicated.
3. Determine whether, and which, simple match string searches can be
used to eliminate a large percentage of the data strings.
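
A rough sketch of the three steps above, not a tuned implementation.
Everything in it is made up for illustration: the rule names, the
patterns, the sample strings and the helper subs (rejection_counts,
order_rules, accept_string). It assumes a data string is accepted only
if every regexp matches it, uses pattern length as a crude stand-in for
"simple vs. complicated", and does not try to account for overlaps
between regexps.

```perl
use strict;
use warnings;

# Hypothetical rules. 'literal' is an optional simple match string that
# every data string matching the regexp must contain (step 3).
my @rules = (
    { name => 'date',  re => qr/\b\d{4}-\d{2}-\d{2}\b/ },
    { name => 'error', re => qr/\bERROR\b.*\bcode=\d+/, literal => 'ERROR' },
    { name => 'user',  re => qr/\buser=\w+/,            literal => 'user=' },
);

# Step 1: count how many sample data strings each regexp rejects.
sub rejection_counts {
    my ($rules, $sample) = @_;
    my %rejects;
    for my $rule (@$rules) {
        # grep in scalar context returns the number of rejected strings.
        $rejects{ $rule->{name} } = grep { $_ !~ $rule->{re} } @$sample;
    }
    return \%rejects;
}

# Steps 1 and 2: most rejections first; ties go to the shorter pattern.
sub order_rules {
    my ($rules, $sample) = @_;
    my $rejects = rejection_counts($rules, $sample);
    my @ordered = sort {
        $rejects->{ $b->{name} } <=> $rejects->{ $a->{name} }
            or length("$a->{re}") <=> length("$b->{re}")
    } @$rules;
    return @ordered;
}

# Accept a data string only if every regexp matches; stop at the first
# rejection so the later (less selective) regexps are rarely run.
sub accept_string {
    my ($ordered, $string) = @_;
    for my $rule (@$ordered) {
        # Step 3: cheap literal search before the full regexp.
        return 0 if defined $rule->{literal}
                 && index($string, $rule->{literal}) < 0;
        return 0 if $string !~ $rule->{re};
    }
    return 1;
}

my @sample = (
    '2008-05-01 ERROR code=7 user=omer',
    '2008-05-01 INFO started user=amir',
    'no date here, ERROR code=1',
    '2008-05-02 INFO idle',
);

my @ordered = order_rules(\@rules, \@sample);
print 'order: ', join(', ', map { $_->{name} } @ordered), "\n";

for my $string (@sample) {
    my $verdict = accept_string(\@ordered, $string) ? 'accept' : 'reject';
    print "$verdict: $string\n";
}
```

In real use the rejection counts would come from a representative
sample of your actual data, and it is worth benchmarking whether the
index() prefilter pays off, since the regexp engine already does some
fixed-substring checks of its own.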
On Thu, 2008-05-01 at 15:22 +0300, Amir E. Aharoni wrote:
> 2008/5/1 Pinkhas Nisanov <pinkhas at nisanov.com>:
> > Hi,
> >
> > I need run many ( hundreds ) regexes on some string.
> > Is there some way to optimize it and get better performance?
>
> There's study: http://perldoc.perl.org/functions/study.html
>
> You should run your own benchmarks to see if it actually helps.
>
> Any other ideas?
--
May the holy trinity of $_, @_ and %_ be hallowed.
My own blog is at http://www.zak.co.il/tddpirate/
My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html