New: I have a couple of RA/TA positions, which could be turned into PhD positions depending on satisfactory performance. Contact me for details.

DS 402: Big Data Concepts: Introduction (Even Semester)

Lecture 1: Introduction
Date: 20th Jan, 2015
Lab: Hadoop deployment; running the first example (NCDC weather data)
Assignment:
  • Calculate the average highest temperature per year for morning, afternoon, and night
  • Word count on a large Twitter dataset
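Both Lecture 1 assignments share the same map, shuffle, reduce shape. The sketch below simulates that flow in a single Python process so the logic can be checked on toy data before it is ported to a Hadoop job. Assumptions: the input has already been parsed into (year, day, hour, temperature) tuples (real NCDC records are fixed-width lines that must be parsed first), the hour boundaries for morning/afternoon/night are made up, and "average highest temperature" is read as the daily maximum per period, averaged over each year. `run_mapreduce` and the other helpers are hypothetical names, not Hadoop APIs.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process simulation of map -> shuffle/sort -> reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase: emit (key, value) pairs
            groups[key].append(value)      # shuffle: collect all values per key
    results = []
    for key in sorted(groups):             # reducers receive keys in sorted order
        results.extend(reducer(key, groups[key]))
    return results

def period(hour):
    # Hypothetical boundaries: morning 06-11, afternoon 12-17, night otherwise.
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"

# Job 1: highest temperature per (year, day, period).
def max_mapper(reading):
    year, day, hour, temp = reading
    yield (year, day, period(hour)), temp

def max_reducer(key, temps):
    yield key, max(temps)

# Job 2: average of those daily maxima per (year, period).
def avg_mapper(pair):
    (year, _day, part), day_max = pair
    yield (year, part), day_max

def avg_reducer(key, maxima):
    yield key, sum(maxima) / len(maxima)

readings = [  # made-up (year, day, hour, temperature in deg C) tuples
    (2014, 1, 7, 10.0), (2014, 1, 9, 14.0), (2014, 2, 8, 12.0),
    (2014, 1, 13, 20.0), (2014, 2, 15, 24.0), (2014, 1, 23, 5.0),
]
daily_max = run_mapreduce(readings, max_mapper, max_reducer)
averages = run_mapreduce(daily_max, avg_mapper, avg_reducer)
print(averages)
# [((2014, 'afternoon'), 22.0), ((2014, 'morning'), 13.0), ((2014, 'night'), 5.0)]

# Word count: the same runner with a different mapper/reducer pair.
def wc_mapper(tweet):
    for word in tweet.lower().split():
        yield word, 1

def wc_reducer(word, ones):
    yield word, sum(ones)

counts = run_mapreduce(["big data is big", "Hadoop loves big data"], wc_mapper, wc_reducer)
print(counts)
# [('big', 3), ('data', 2), ('hadoop', 1), ('is', 1), ('loves', 1)]
```

Splitting the temperature task into two chained jobs is one option; a single job keyed by (year, period) that computes the daily maxima inside the reducer would also work, at the cost of a more complex reducer.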

Lecture 2: HDFS, Inside MapReduce, Introduction to Machine Learning
Date: 27th Jan, 2015
Lab: Twitter word count
Assignment: Naive Bayes spam filtering
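A minimal multinomial Naive Bayes spam filter, assuming whitespace tokenization, add-one (Laplace) smoothing, and a tiny made-up training set. Scores are sums of log-probabilities, which avoids floating-point underflow on long messages. The counting in `train` is essentially a word count keyed by (label, word), so on a cluster it could be run as a MapReduce job like the word-count lab. The function names are illustrative and not taken from any library.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(docs):
    """docs: iterable of (text, label) pairs, with label in LABELS."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())  # every word seen in any class
    return word_counts, doc_counts, vocab

def log_score(text, label, model):
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))  # log prior
    total = sum(word_counts[label].values())
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen during training
        # Add-one smoothing stops a word unseen in this class from zeroing the score.
        score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
    return score

def classify(text, model):
    return max(LABELS, key=lambda label: log_score(text, label, model))

training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project meeting notes", "ham"),
]
model = train(training)
print(classify("win cheap money", model))  # spam
print(classify("meeting notes", model))    # ham
```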

Topics to be covered

  • An introduction to Hadoop: Key features, deployment, and uses in the Cloud.
  • Big Data Reference Architecture, Design Patterns.
  • MapReduce Framework: Design patterns, MapReduce in various environments, and integration with traditional data warehouses.
  • Batch Processing: Review of Pig, a scripting language for controlling MapReduce processes on Hadoop, and Hive, a data-warehouse-like system with a fairly complete SQL-like syntax.
  • HBase and Hive: Fault-tolerant ways of storing, moving, and querying large quantities of sparse data.
  • Impala and Flume: Overview and integration of Impala with Flume and Solr.
  • Stream Computing: Examine Apache Flume NG as a collection technique, along with associated tools for complex event processing (CEP) applications.
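HBase's logical data model is commonly described as a sparse, sorted map from row key to column (family:qualifier) to timestamped versions of a value. The toy class below models that idea in plain Python to show why sparse data is cheap (cells that were never written are simply absent) and why scans return rows in row-key order. It is a conceptual sketch only: `ToyTable` and its methods are hypothetical and are not the HBase client API, and real HBase uses byte-array keys and persists data to HDFS instead of holding it in memory.

```python
from bisect import insort
from collections import defaultdict

class ToyTable:
    """Toy model of HBase's logical layout: row -> 'family:qualifier' -> {timestamp: value}.

    Only cells that are actually written are stored, so a sparse row pays
    nothing for the columns it does not have.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # versions kept per cell (assumed >= 1)
        self.rows = {}                    # row key -> {column: {timestamp: value}}
        self.sorted_keys = []             # row keys kept sorted, since HBase orders rows by key

    def put(self, row, column, value, ts):
        if row not in self.rows:
            self.rows[row] = defaultdict(dict)
            insort(self.sorted_keys, row)
        versions = self.rows[row][column]
        versions[ts] = value
        excess = len(versions) - self.max_versions
        for old_ts in sorted(versions)[:max(excess, 0)]:  # evict the oldest versions
            del versions[old_ts]

    def get(self, row, column):
        """Newest value of a cell, or None if the cell was never written."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start, stop):
        """Yield (row, {column: newest value}) for start <= row < stop, in row-key order."""
        for row in self.sorted_keys:
            if start <= row < stop:
                yield row, {col: v[max(v)] for col, v in self.rows[row].items()}

table = ToyTable(max_versions=2)
table.put("user2", "info:name", "Bob", ts=1)
table.put("user1", "info:name", "Alice", ts=1)
table.put("user1", "info:email", "alice@example.com", ts=2)
table.put("user1", "info:name", "Alicia", ts=3)
table.put("user1", "info:name", "Ally", ts=4)  # evicts the ts=1 version
print(table.get("user1", "info:name"))   # Ally
print(table.get("user2", "info:email"))  # None: this cell was never stored
print(list(table.scan("user1", "user3")))
```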

References

Textbooks

  • Hadoop: The Definitive Guide; Author: Tom White; Publisher: O'Reilly Media.
  • Hadoop in Practice; Author: Alex Holmes; Publisher: Manning Publications.

Additional References

  • Hadoop Operations; Author: Eric Sammer; Publisher: O'Reilly Media.
  • Professional Hadoop Solutions; Authors: Boris Lublinsky, Kevin T. Smith, and Alexey Yakubovich; Publisher: Wrox.
  • MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems; Authors: Donald Miner and Adam Shook; Publisher: O'Reilly Media.
  • HBase: The Definitive Guide; Author: Lars George; Publisher: O'Reilly Media.