I like to stay on top of things and try to learn new technologies when I have
spare time. My new personal project has been all things BigData. This
includes reading RSS feeds and industry reports and, of course, playing with the
tools.
My goal with this blog is to share my process, and the resources I used to learn
Hadoop and BigData, with others who are like-minded.
Let's get started.
The first thing you will need is a VM (virtual machine). I personally
use Oracle VM VirtualBox; it's free and can be found here (http://www.oracle.com/technetwork/community/developer-vm/index.html),
along with many pre-built VMs for learning all things Oracle. I was
playing with OBIEE, but that would be another blog.
The next thing you will need to do is find a Hadoop VM image.
Since everything is available as open source, you can, depending on your needs,
configure your VirtualBox VM from scratch: download and install your Linux
version, Hadoop, etc., and try to configure all your software yourself.
However, since I see my role as a consultant using a pre-configured
running instance, I grabbed a working vendor VM.
Most vendors now supply them. Pick the vendor you need or want to
learn and download the VirtualBox version if they have one.
I downloaded my VMs from Cloudera.
The Cloudera QuickStart VM (make sure you choose the correct VM version or you will
download a large file for nothing).
I then stumbled upon the link below, which is a free intro course to MapReduce.
It uses another functional VM that comes with the training materials.
And that is where I began. Follow the course videos, start up your instance, and
you're in business running MapReduce jobs on your own Cloudera VM server.
Lastly, I'm not a Java guy and I had some reservations about my ability to write
mappers and reducers, so I chose to use Python. The tutorial above also uses
Python, so if you have any scripting knowledge you will be fine.
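To give a feel for what a Python mapper and reducer look like, here is a minimal word-count sketch. It is not taken from the course; the function names (mapper, reducer, run_locally) are my own for illustration, and the sort step only imitates, in a single process, the sort that Hadoop does between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record for every word seen."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word. Records must already be sorted by word."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Hypothetical helper: map, sort (standing in for the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))
```

With Hadoop Streaming, the mapper and reducer would normally live in two separate scripts that read from standard input and print to standard output. Keeping them as plain functions like this just makes it easy to try the logic on a few lines of text before going near the VM.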
Other nice-to-have software:
Notepad++, a great free tool: http://notepad-plus-plus.org/
If you're writing Java or Python, get Eclipse. It's the tool used on the
VMs, and it's nice to have a local copy to use when you don't want your
VM up and running, taking up memory and eating your laptop's
resources.