[Israel.pm] \w for utf8
Yuval Kogman
nothingmuch at woobling.org
Mon Aug 20 15:44:59 EEST 2007
use utf8;
tells perl that the current source file is encoded in UTF-8, so string
literals in it are decoded as UTF-8 (as opposed to latin1).
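
A minimal sketch of that, assuming the script itself is saved as UTF-8.
The binmode on STDOUT is my addition, so that the characters are
encoded back to bytes when printed:

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded from UTF-8

my $str_utf8 = 'N-Größe';
my @res = ( $str_utf8 =~ /(\w+)/g );

# Assumption: the terminal expects UTF-8 output.
binmode(STDOUT, ':encoding(UTF-8)');
print join( " ++ ", @res ), "\n";
```

With the pragma in place the literal is a character string, so \w+
should capture "N" and "Größe" as two matches.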
Since your string is likely coming from elsewhere, look into
binmode($fh, ":utf8") and open($fh, "<:utf8", $file), and also
Encode::decode.
These are the common ways to get a string marked as unicode in
memory, at which point the regex engine treats \w+ as matching all
alphanumeric characters, not only [a-zA-Z0-9_].
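
A rough sketch of the decoding route, for a string that arrives as raw
bytes. The byte string and the in-memory filehandle are stand-ins I
made up for "data from elsewhere", and ':encoding(UTF-8)' is used here
as a stricter variant of the ':utf8' layer:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes of "N-Größe".
my $bytes = "N-Gr\xC3\xB6\xC3\x9Fe";

# Decode explicitly: bytes in, character string out.
my $chars   = decode('UTF-8', $bytes);
my @decoded = ( $chars =~ /(\w+)/g );

# Or let an I/O layer decode while reading. An in-memory filehandle
# stands in for a real file here.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
my $line = <$fh>;
close($fh);
my @from_fh = ( $line =~ /(\w+)/g );

# Assumption: the terminal expects UTF-8 output.
binmode(STDOUT, ':encoding(UTF-8)');
print join( " ++ ", @decoded ), "\n";
print join( " ++ ", @from_fh ), "\n";
```

Both routes end up with the same 7-character string, so the regex sees
"Größe" as one word in either case.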
There is a tutorial by Juerd somewhere; it's supposed to be pretty
good. Try Google, perhaps.
On Mon, Aug 20, 2007 at 15:39:58 +0300, Pinkhas Nisanov wrote:
> Hi,
>
> I need catch string that may include 'utf8' characters:
> e.g.:
>
> my $str_utf8 = 'N-Größe';
> my @res = ( $str_utf8 =~ /(\w+)/g );
> print join( " ++ ", @res ), "\n";
>
>
> it prints:
>
> N ++ Gr ++ e
>
> but I need:
>
> N ++ Größe
>
>
> thanks
> Pinkhas Nisanov
> _______________________________________________
> Perl mailing list
> Perl at perl.org.il
> http://perl.org.il/mailman/listinfo/perl
--
Yuval Kogman <nothingmuch at woobling.org>
http://nothingmuch.woobling.org 0xEBD27418