Introduction to Apache Hadoop HDFS

Description:

This presentation introduces Apache Hadoop HDFS. It describes the HDFS file system in the context of Hadoop and big data, and looks at its architecture and resilience.

Provided by: semtechs

1
Apache Hadoop HDFS
  • What is it?
  • What is it for?
  • Architecture
  • Resilience
  • Administration
  • Data access
  • Future changes?

2
HDFS: What is it?
  • HDFS stands for Hadoop Distributed File System
  • It is a distributed file system
  • Runs on low cost hardware
  • It is open source
  • Written in Java
  • Fault tolerant
  • Designed for very large data sets
  • Tuned for high throughput

3
HDFS: What is it for?
  • Designed for batch processing
  • Streaming access to data
  • Large data sizes, e.g. terabytes
  • Highly reliable through data replication
  • Supports very large node clusters
  • Supports large files
  • Supports millions of files

4
HDFS Architecture

5
HDFS Architecture
  • Has a master / slave architecture
  • A master NameNode
      • Controls file system operations
      • Maps data blocks to DataNodes
      • Logs all changes
  • Slave DataNodes
      • Store file blocks
      • Store replicated data
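
The NameNode / DataNode split above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the Hadoop API: the class name ToyNameNode, the 128 MB block size, the replication factor of 3 and the round-robin placement are all illustrative choices (real HDFS makes block size and replication configurable and uses a rack-aware placement policy).

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes
REPLICATION = 3                 # assumed replication factor


class ToyNameNode:
    """Illustrative in-memory model of NameNode metadata (hypothetical, not Hadoop code)."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.files = {}      # file path -> ordered list of block ids
        self.block_map = {}  # block id -> names of DataNodes holding a replica
        self.edit_log = []   # append-only record of namespace changes
        self._ids = itertools.count()
        self._cursor = 0     # round-robin pointer (assumption: no rack awareness)

    def create_file(self, path, size_bytes):
        """Split a file into blocks, choose DataNodes for each block, log the change."""
        n_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division
        copies = min(REPLICATION, len(self.datanodes))
        block_ids = []
        for _ in range(n_blocks):
            block_id = next(self._ids)
            self.block_map[block_id] = [
                self.datanodes[(self._cursor + i) % len(self.datanodes)]
                for i in range(copies)
            ]
            self._cursor = (self._cursor + 1) % len(self.datanodes)
            block_ids.append(block_id)
        self.files[path] = block_ids
        self.edit_log.append(("create", path, size_bytes))
        return block_ids

    def locate(self, path):
        """Return (block id, replica holders) pairs, as a client would ask for."""
        return [(b, self.block_map[b]) for b in self.files[path]]


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    nn.create_file("/data/log.txt", 300 * 1024 * 1024)
    for block_id, holders in nn.locate("/data/log.txt"):
        print(block_id, holders)
```

The point of the sketch is that the master holds only metadata (which blocks make up a file, and where the replicas live), while the file contents themselves would sit on the DataNodes.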

6
HDFS Resilience
  • Data is replicated across DataNodes
  • Nodes may fail but data is still available
  • DataNodes indicate state via heartbeat reports
  • Single point of failure in master NameNode
  • Data integrity via checksums
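
The heartbeat and checksum ideas above can be sketched in a few lines. This is a simplified illustration, not HDFS code: the function names, the 30 second timeout, the 512 byte checksum chunk and the use of zlib.crc32 are all assumptions made for the example.

```python
import zlib

HEARTBEAT_TIMEOUT = 30.0  # assumed seconds of silence before a node counts as dead
CHUNK = 512               # assumed number of bytes covered by each checksum


def live_nodes(last_heartbeat, now):
    """Return the nodes whose most recent heartbeat is within the timeout."""
    return {n for n, seen in last_heartbeat.items() if now - seen <= HEARTBEAT_TIMEOUT}


def under_replicated(block_map, live, target=3):
    """Map each block with fewer than `target` live replicas to its live replica count."""
    result = {}
    for block_id, holders in block_map.items():
        alive = sum(1 for node in holders if node in live)
        if alive < target:
            result[block_id] = alive
    return result


def chunk_checksums(data):
    """Compute one CRC32 per fixed-size chunk of a block's bytes."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def is_intact(data, expected):
    """Compare freshly computed checksums with the ones stored at write time."""
    return chunk_checksums(data) == expected


if __name__ == "__main__":
    heartbeats = {"dn1": 100.0, "dn2": 95.0, "dn3": 40.0, "dn4": 105.0}
    block_map = {0: ["dn1", "dn2", "dn3"], 1: ["dn1", "dn2", "dn4"]}
    live = live_nodes(heartbeats, now=110.0)
    print(sorted(live))                       # dn3 has been silent too long
    print(under_replicated(block_map, live))  # block 0 needs another replica

    data = bytes(range(256)) * 5
    stored = chunk_checksums(data)
    damaged = bytes([data[0] ^ 0xFF]) + data[1:]
    print(is_intact(data, stored), is_intact(damaged, stored))
```

When a node stops reporting, the blocks it held drop below the target replica count and can be copied again from the surviving replicas; a checksum mismatch on read is the signal to fetch the block from a different replica.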

7
HDFS Administration
  • Access via
      • Java API
      • FS Shell command language
      • HTTP browser
      • C wrapper for Java API
  • Space reclamation
      • Via control of replication factor
      • Deleted files sent to trash folder
      • Trash folder cleaned after configurable time
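
The two space reclamation mechanisms above can be illustrated with a toy model. This is a sketch only, not the Hadoop API: ToyTrash, raw_bytes and the one hour retention period are hypothetical names and values chosen for the example. In a real cluster the corresponding controls are the fs.trash.interval setting and the FS Shell setrep command.

```python
class ToyTrash:
    """Illustrative model of delete-to-trash with timed cleanup (hypothetical)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.files = {}  # path -> size in bytes
        self.trash = {}  # path -> (size in bytes, time of deletion)

    def delete(self, path, now):
        """Move a file to the trash instead of freeing its space immediately."""
        self.trash[path] = (self.files.pop(path), now)

    def restore(self, path):
        """Undo a delete while the file is still in the trash."""
        size, _ = self.trash.pop(path)
        self.files[path] = size

    def expunge(self, now):
        """Permanently remove trash entries older than the retention period."""
        expired = [p for p, (_, t) in self.trash.items()
                   if now - t >= self.retention_seconds]
        for p in expired:
            del self.trash[p]
        return expired


def raw_bytes(logical_bytes, replication):
    """Raw cluster storage used by data held at a given replication factor."""
    return logical_bytes * replication


if __name__ == "__main__":
    fs = ToyTrash(retention_seconds=3600)
    fs.files["/tmp/a"] = 100
    fs.delete("/tmp/a", now=0)
    print(fs.expunge(now=1800))  # too early, nothing is removed
    print(fs.expunge(now=3600))  # retention period reached

    ten_gib = 10 * 2**30
    print(raw_bytes(ten_gib, 3) - raw_bytes(ten_gib, 2))  # bytes freed by going from 3 to 2
```

Lowering the replication factor trades resilience for space, while the trash folder delays reclamation so that an accidental delete can still be undone within the retention period.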

8
HDFS Future changes
  • Things that might be considered for HDFS
      • File append
      • User quotas
      • File links
      • Standby nodes

9
Other Areas
  • Want to know about?
      • Big Data
      • Nutch
      • Solr
  • See my other presentations

10
Contact Us
  • Feel free to contact us at
      • www.semtech-solutions.co.nz
      • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can just pay for those hours that you need to solve your problems