I'm new to Hadoop MapReduce (4 days, to be precise) and I've been asked to perform distributed XML parsing on a cluster. Based on my (re)search on the Internet, it should be fairly easy using Mahout's XmlInputFormat, but my task is to make sure that the system works for huge (~5 TB) XML files.
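For context on the size concern, XmlInputFormat's record reader works roughly like this. It scans each input split for a configured start tag, then keeps reading until the matching end tag, even when that end tag lies past the split boundary. A record therefore belongs to the split in which its start tag begins. Below is a minimal standalone sketch of that idea over an in-memory byte array. It is not Mahout's actual class and has no Hadoop dependency. `XmlSplitSketch`, `readSplit`, and the `<rec>` tag are hypothetical names chosen for illustration. The sketch also assumes that records are never nested inside elements with the same tag name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical standalone sketch of start-tag/end-tag record splitting,
 * in the spirit of Mahout's XmlInputFormat (not the real class).
 */
public class XmlSplitSketch {

    /**
     * Returns every record whose start tag begins inside [splitStart, splitEnd).
     * A record may extend past splitEnd: reading continues until its end tag.
     */
    public static List<String> readSplit(byte[] data, int splitStart, int splitEnd,
                                         String startTag, String endTag) {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        while (pos < splitEnd) {
            int s = indexOf(data, start, pos);
            // Only claim records whose start tag begins inside this split;
            // later ones belong to the next split's reader.
            if (s < 0 || s >= splitEnd) break;
            int e = indexOf(data, end, s + start.length);
            if (e < 0) break; // truncated record: no closing tag anywhere after it
            int recordEnd = e + end.length;
            records.add(new String(data, s, recordEnd - s, StandardCharsets.UTF_8));
            pos = recordEnd;
        }
        return records;
    }

    /** Naive byte search: first index >= from where needle occurs, or -1. */
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String xml = "<root><rec>a</rec><rec>bb</rec><rec>ccc</rec></root>";
        byte[] data = xml.getBytes(StandardCharsets.UTF_8);
        int boundary = 20; // hypothetical split boundary inside the second record
        // prints [<rec>a</rec>, <rec>bb</rec>]
        System.out.println(readSplit(data, 0, boundary, "<rec>", "</rec>"));
        // prints [<rec>ccc</rec>]
        System.out.println(readSplit(data, boundary, data.length, "<rec>", "</rec>"));
    }
}
```

In this example, the second record straddles the boundary. It is emitted exactly once, by the first split, because that is where its start tag begins. Each split is scanned independently and reads beyond its end only to finish a single record. As a result, the scheme itself does not depend on the total file size. The naive byte search is used only for clarity.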