Review by Gabor Szabo, November 2002
This is the first Manning book I have read.
I am not sure why, but I was looking forward to it with very high expectations - either because of the author or because of the nice and serious picture on the cover - and it delivered:
I learned a lot of new things, and in areas where I already had some knowledge the book brought order to what I knew.
Manning books are a bit more expensive than other computer books, and they cost me even more because they are not sold via Bookpool, the usual place I buy books. The publisher, however, was nice enough to send a copy of this book free of charge to be reviewed by the Israeli Perl Mongers.
Munge - as also described in the book - means "1. [derogatory] To imperfectly transform information", but the book is about something slightly different: taking some kind of input data and transforming it into another structure. Maybe it is actually the other meaning mentioned: "3. To modify data in some way the speaker doesn't need to go into right now or cannot describe succinctly (compare mumble)".
The word is somehow related to 'mangal', the word used in Israel for barbecue, which of course means to take some raw material and make it digestible.
The book has four parts:
In the first part Dave gives us the Foundations: the basic background of data processing and the various sources and forms of data. He then goes on to give some guidelines on data structure design and best practices in data munging. This part already involves some interesting Perl expressions. Chapters 3 and 4 then give us the required Perl knowledge, which is at about intermediate level. They explain various methods for sorting and, in 20 pages, try to teach pattern matching. This is definitely not enough for beginners, but an intermediate Perl programmer should already know this level of pattern matching. I think the chapter about pattern matching is a bit misplaced, but Dave might know that, as he points us to other sources where we can get deeply involved with patterns.
The second part, called Data Munging, is the part that gave the whole book its title. Here Dave gets a bit less generic and shows us how we can read, transform and write 4 different data types:
- unstructured data
- record-oriented data (variable-width data)
- fixed-width data
- binary data
There are some nice tricks, like keeping the field names in an array, using $/ in some unexpected ways and of course manipulating dates and times. The binary section briefly touches on the wealth of graphical file formats and gives you some directions to start.
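A minimal sketch of the first two tricks, not taken from the book: it assumes records of one field per line, separated by blank lines. The field names, the sample data and the read_records helper are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical field names; their order matches the lines of each record.
my @fields = qw(name language city);

# Made-up sample input: one field per line, records separated by a blank line.
my $input = "Alice\nPerl\nHaifa\n\nBob\nPerl\nLondon\n";

sub read_records {
    my ($text) = @_;

    # Read from an in-memory filehandle so the example needs no external file.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    # Paragraph mode: with an empty $/ each read returns one blank-line-delimited block.
    local $/ = '';

    my @records;
    while (my $block = <$fh>) {
        chomp $block;    # in paragraph mode chomp strips all trailing newlines
        my %record;
        # Hash slice: the array of field names supplies the keys in one step.
        @record{@fields} = split /\n/, $block;
        push @records, \%record;
    }
    close $fh;
    return @records;
}

my @records = read_records($input);
print "$_->{name} lives in $_->{city}\n" for @records;
```

Keeping the names in @fields means that adding or reordering a field is a change in one place only, and the same array can drive the output side as well.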
Simple data parsing is the third part of the book. It goes into more specific data formats, mainly HTML, XML and RSS. It shows a good example of when NOT to use regular expressions for parsing data and then shows various tools and ways to parse these file formats.
This part ends with an introduction to Parse::RecDescent, a module that enables you to define your own grammar and build your own parser for it.
The Big Picture is the last part of the book. It gives you an overview of the content of the book and mentions further resources where you can seek help. Among other things Dave mentions the Perl Mongers too. Did he know something?
So the book provides you with various tools and shows quite a number of CPAN modules. Of course you'll still need to read the documentation to use the full power of the modules but the book gives you a good head start and points you in some of the good directions.
The text is quite readable, but Dave will not make it as a thriller writer. In the section titled "Making use of extra data" I could guess from the title alone how the story would end.
There is one really annoying issue: the book cover started to bend when I was about 5 minutes into reading the book. It certainly makes the book look as if I had used it very often right from the beginning. You don't need that trick, guys. I'll pick up the book quite often in any case.
Interestingly, just as I was reading "Data Munging" I taught a Perl class where some of the very questions discussed in this book came up.
I warmly recommend this book.
Another review, with some discussion, can be found on Slashdot.