Thursday, March 26, 2015

Getting Started: Hadoop 0 to 60 using Cloudera

I like to stay on top of things and try to learn new technologies when I have spare time. My new personal project has been all things Big Data. This includes reading RSS feeds and industry reports and, of course, playing with the tools.
My goal with this blog is to share, with others who are like-minded, the process and resources I used to learn Hadoop and Big Data.
Let's get started.
The first thing you will need is a VM (virtual machine). I personally use Oracle VM VirtualBox. It's free and can be found here (http://www.oracle.com/technetwork/community/developer-vm/index.html), along with many pre-built VMs for learning all things Oracle. I was playing with the OBIEE one, but that would be another blog.
The next thing you will need to do is find a Hadoop VM image.
Since everything is available as open source, you can, depending on your needs, configure your VirtualBox VM from scratch: download and install your Linux version, Hadoop, etc., and try to configure all the software yourself. However, since I see my role as a consultant using a pre-configured running instance, I grabbed a working vendor VM.
Most vendors now supply them. Pick the vendor you need or want to learn and download the VirtualBox version if they have one.
I downloaded my VMs from Cloudera.
The Cloudera QuickStart VM (make sure you choose the correct VM version, or you will download a large file for nothing).
I then stumbled upon the link below, which is a free intro course to MapReduce. It uses another functional VM that comes with the training materials.
And that is where I began. Follow the course videos, start up your instance, and you're in business running MapReduce jobs on your own Cloudera VM server.
Lastly, I'm not a Java guy, and I had some reservations about my ability to write mappers and reducers, so I chose to use Python. The tutorial above also uses Python, so if you have any scripting knowledge you will be fine.
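To give a feel for what a Python mapper and reducer look like, here is a minimal word-count sketch. It is my own illustration, not code from the course, and it assumes Python 3. The names mapper, reducer and run_local are made up for this example. The idea follows the usual Hadoop Streaming pattern: the mapper emits tab-separated key/value records, the framework sorts them by key, and the reducer adds up the values for each key. Here the sort step is simulated locally so you can try it without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, like a streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word. The input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort -> reduce in one process (hypothetical helper)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up sample input, just to exercise the two functions.
    sample = ["the quick brown fox", "the lazy dog"]
    for record in run_local(sample):
        print(record)
```

On a real cluster the mapper and reducer would normally live in separate scripts that read from standard input and print to standard output, but the logic stays about this simple.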
Other nice-to-have software:
Notepad++, a great free tool: http://notepad-plus-plus.org/
If you're writing Java or Python, get Eclipse. It's the tool used on the VMs, and it's nice to have a local copy to use when you don't want your VM up and running, taking up memory and eating your laptop's resources.

