Review by Gabor Szabo, November 2002
This is the first Manning book I have read.
I am not sure why, but I was looking forward to it with very high expectations
(either because of the author or because of the nice, serious
picture on the cover), and it delivered:
I learned a lot of new things, and in areas where I already had some
knowledge, the book brought order to what I knew.
Manning books are a bit more expensive than other computer books, and
they would have cost me even more, as they are not sold via Bookpool, where
I usually buy books. However, the publisher was kind enough to send a copy
of this book free of charge to be reviewed by the
Israeli Perl Mongers.
Munge, as the book also explains, means:
1. [derogatory] To imperfectly transform information
but the book is about something slightly different: it is about taking
some kind of input data and transforming it into another
structure. Maybe it is actually the other meaning mentioned:
3. To modify data in some way the speaker doesn't need to go
into right now or cannot describe succinctly (compare mumble)
The word is somehow related to 'mangal', the word used in Israel
for barbecue, which of course means to take some raw material and
make it digestible.
The book has four parts:
In the first part, Foundations, Dave gives us the basic background of
data processing and the various sources and forms of data. He then goes
on to give some guidelines on data structure design and
best practices in data munging. This part already
involves some interesting Perl expressions. Chapters 3 and 4 then give us
the required Perl knowledge, which is at about an intermediate level.
They explain various methods for sorting and, in 20 pages, try to teach
pattern matching. This is definitely not enough for beginners, but
an intermediate Perl programmer should already know pattern matching
at this level. I think this chapter about pattern matching is a bit
misplaced, but Dave might know that, as he points us to other sources
where we can get deeply involved with patterns.
The second part, called Data Munging, is the part that gave the
whole book its title. Here Dave gets a bit less generic and shows
us how we can read, transform and write four different data types:
- unstructured data
- record-oriented data (variable-width data)
- fixed-width data
- binary data
There are some nice tricks, like keeping the field names in an array,
using $/ in some unexpected ways and, of course, manipulating dates and
times. The binary section only touches on the wealth of
graphical file formats, but it gives you some directions to start from.
Simple data parsing is the third part of the book. It goes into
more specific data formats, mainly HTML, XML and RSS.
It shows a good example of when NOT to use regular expressions for
parsing data, and then shows various tools and ways to parse these
file formats.
The last chapter of this part is an introduction to Parse::RecDescent, a module
that enables you to define your own grammar and build your own parser
for it.
The Big Picture is the last part of the book. It gives you an
overview of the content of the book and mentions further resources
where you can seek help. Among other things, Dave mentions the
Perl Mongers too. Did he know something?
So the book provides you with various tools and shows quite a number
of CPAN modules. Of course, you'll still need to read the documentation
to use the full power of the modules, but the book gives you a good head
start and points you in some good directions.
The text is quite readable, but Dave will not make it as a
thriller writer: in the section titled "Making use of extra data",
I could guess the end of the story from the title alone.
One really annoying issue is that the book cover started to bend when I
was about five minutes into reading the book. It makes the book
look as if I had been using it very often right from the beginning. You
don't need that trick, guys. I'll pick up the book quite often in any case.
Interestingly, just as I was reading "Data Munging", I taught a Perl class
where some of exactly the questions discussed in this book came up.
I warmly recommended this book.
Another review, with some discussion, can be found on
Slashdot.