[Israel.pm] regexp
Shlomi Fish
shlomif at iglu.org.il
Sun Jun 25 11:46:51 EEST 2006
On Sunday 25 June 2006 11:14, Ernst, Yehuda wrote:
> Hello!
>
>
> I have a text like this
>
> "aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###"
>
> I need to extract the aaa bbb ccc ddd
> between is the same <asd>='asd'/6>
> just the number can be different
>
> i do not know how many <asd>='asd'/6> are there the end is like this ###
>
> any ideas?
>
I don't have a ready-made solution, but given the problem I suggest you split
the string into tokens and then work with the resulting array of tokens. You
can do it in one of these ways:
1. Strip one token at a time off the front of the string, in a loop:
if ($string =~ s{^$regex}{}) { ... }
elsif ($string =~ s{^$regex2}{}) { ... }
.
.
.
2. Alternatively, use a lexer module from CPAN:
http://search.cpan.org/dist/HOP-Lexer/
http://cpan.uwinnipeg.ca/dist/Parse-Flex
There are others as well, which you can find with a CPAN search. Also see:
http://www.shlomifish.org/Vipe/lecture/Sys-Call-Track/Lex-Yacc/
These modules are basically an interface on top of the if ... elsif chain
from #1.
3. There are other ways to write #1 above. One of them is to use the /g
modifier together with the \G anchor.
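As a rough illustration of #3, here is a minimal sketch of a tokeniser built on /gc and \G. It is not a tested production solution. It assumes that every separator looks exactly like <asd>='asd'/N> with N a decimal number, and that the text fields themselves never contain '<' or '#'. The sub name tokenize and the token names TEXT, SEP and EOF are made up for this example.

```perl
use strict;
use warnings;

# Assumptions (not stated in the original question):
#   * separators look exactly like <asd>='asd'/N>, N being a decimal number
#   * text fields contain neither '<' nor '#'
# tokenize() and the token names TEXT/SEP/EOF are hypothetical names.
sub tokenize {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;

    # \G anchors each match at the current pos(); /c keeps pos() in place
    # when a match fails, so the next branch retries from the same offset.
    while (pos($string) < length $string) {
        if ($string =~ m{\G<asd>='asd'/(\d+)>}gc) {
            push @tokens, [ 'SEP', $1 ];
        }
        elsif ($string =~ m{\G###}gc) {
            push @tokens, [ 'EOF', '' ];
            last;
        }
        elsif ($string =~ m{\G([^<#]+)}gc) {
            push @tokens, [ 'TEXT', $1 ];
        }
        else {
            die "Cannot tokenize at offset " . pos($string) . "\n";
        }
    }
    return @tokens;
}

my $input  = q{aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###};
my @fields = map { $_->[1] } grep { $_->[0] eq 'TEXT' } tokenize($input);
print "@fields\n";    # prints: aaa bbb ccc ddd
```

Unlike the s{^$regex}{} variant in #1, this leaves the original string untouched and only advances pos(), which also makes it easy to report the offset where tokenising failed.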
-------------
Parsing HTML with simple regexps (which seems similar to your problem) is a
very old Perl question, and it comes up often in #perl on Freenode.
Regards,
Shlomi Fish
---------------------------------------------------------------------
Shlomi Fish shlomif at iglu.org.il
Homepage: http://www.shlomifish.org/
95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.