[Israel.pm] UTF-8
Gaal Yahas
gaal at forum2.org
Thu May 3 13:10:39 EEST 2007
On 5/3/07, Pinkhas Nisanov <pinkhas at nisanov.com> wrote:
> I need to convert some text to UTF-8.
> The problem is that the text has some characters
> encoded in ISO-8859-1 and some in UTF-8.
> When I run the "encode_utf8" function, it
> converts the ISO characters to UTF-8, but it also
> converts the UTF-8 characters to something unreadable.
> Is there some way to convert ISO -> UTF-8 and
> leave the UTF-8 unchanged?
Not in general. Given a string, you can tell whether it's valid UTF-8
or not, but *all* byte strings are valid ISO-8859-1! (Modulo perhaps one
or two byte patterns that don't appear in UTF-8 either.) So you need
some sort of heuristic: perhaps scan the bytes until you see something
that isn't valid UTF-8, assume it's a single Latin-1 character,
translate it accordingly, and carry on scanning.
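
Here is a minimal sketch of that heuristic, written in Python purely for
illustration (it is not code from this thread). The function name
mixed_to_utf8 and the error-handler name "latin1_fallback" are made up
for the example. It decodes the input as UTF-8 and, whenever the decoder
hits bytes that are not valid UTF-8, treats each offending byte as one
Latin-1 character.

```python
import codecs


def _latin1_fallback(err):
    """Error handler: reinterpret the invalid bytes as Latin-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # err.object is the full input; err.start:err.end is the invalid span.
    bad = err.object[err.start:err.end]
    # Latin-1 maps every byte to the code point with the same value,
    # so this decode can never fail. Resume decoding at err.end.
    return bad.decode("latin-1"), err.end


# "latin1_fallback" is a hypothetical name chosen for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def mixed_to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, assuming stray non-UTF-8 bytes are Latin-1."""
    text = data.decode("utf-8", errors="latin1_fallback")
    return text.encode("utf-8")


if __name__ == "__main__":
    # "caf\xe9" has a Latin-1 e-acute; "na\xc3\xafve" has a UTF-8 i-diaeresis.
    print(mixed_to_utf8(b"caf\xe9 na\xc3\xafve"))
    # prints b'caf\xc3\xa9 na\xc3\xafve'
```

It is still only a heuristic: a Latin-1 byte pair that happens to form a
valid UTF-8 sequence (for example 0xC3 0xA9, which is "Ã©" in Latin-1
but "é" in UTF-8) will be taken as UTF-8 and left alone.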
--
Gaal Yahas <gaal at forum2.org>
http://gaal.livejournal.com/