Title: ?????????? High Throughput Computing Technologies and NCHC's Platform Service
1??????????High Throughput Computing Technologies
and NCHC's Platform Service
??????????? ??? <jazz_at_narlabs.org.tw> 2013/09/13
- 2013 Big Data????
2????????, ????????
????????,??? ???????????? ?????? ??????
http://www.pursuantgroup.com/blog/tag/dikw-model/
3???????????, ????????????!
????
????????
????
????
?????
?????
???
?????????
SMAQ
?????
????
???
????????
???
???
Open Data
???
??
4?????????3 Vs of Big Data
???? [1] Laney, Douglas. "3D Data Management: Controlling Data Volume, Velocity and Variety" (6 February 2001)
[2] Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data, June 2011
?????????????????????????????
5???????????(1) Data at Rest: MapReduce Framework
MapReduce Framework
Petabyte File System
Hadoop HPCC
Unstructured
6???????????The SMAQ stack for big data
??????????????SMAQ(Storage, MapReduce and Query)
???????????LAMP
???? The SMAQ stack for big data, Edd Dumbill, 22 September 2010, http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html
???? http://smashingweb.ge6.org/wp-content/uploads/2011/10/apache-php-mysql-ubuntu.png
7????????? Hadoop Key Concept: Data Locality
Hadoop ???????????? ?????????????????? ??????????
????????,?? ???? map ? reduce ??????????
????
??????? ??????????? HDFS?? ????????????????????
????
Map
Reduce
????
????
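The map and reduce steps named on this slide can be sketched in a few lines of plain Python. This is a minimal in-process illustration of the programming model, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) and the word-count task are assumptions chosen for the example. Each input string stands in for one HDFS block, and map_phase is applied to each block on its own, which is the property that lets Hadoop run the map task on the node that already holds the block.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input block."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values that share a key into one total."""
    return key, sum(values)


def word_count(blocks: List[str]) -> Dict[str, int]:
    """Run map over each block independently, then shuffle and reduce."""
    intermediate = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```

Because map_phase never looks outside its own block, the blocks can be processed in any order and on any machine; only the shuffle step moves data between nodes.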
8?????????Processing Time of Batch Jobs
9???????????(2) Data in Motion: In-Memory Processing
HBase / Drill / Impala
Unstructured
10Google????? vs Apache ??
Google                      Apache                     Notes
Dremel (2010)               Apache Drill (2012)        Big Query (JSON, SQL-like)
Percolator (2010)           -                          Incremental Index Update (Caffeine)
Pregel (2009)               Apache Giraph (2011)       Graph Database
BigTable (2006)             Apache HBase (2007)
MapReduce (2004)            Hadoop MapReduce (2006)
Google File System (2003)   HDFS (2006)
11???????????????NoSQL vs NewSQL
http://www.infoq.com/news/2011/04/newsql
12In-Memory Processing??????HBase??
13???????????(3)Streaming Data Collection
Unstructured
Message Queue Storm / Kafka
14????????? Life of Big Data
15Twitter Storm Apache Kafka
http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/
16?????????????Lambda Architecture
HBase Storm
ElephantDB or Voldemort
Source: Lambda Architecture, 8 March 2013, http://www.ymc.ch/en/lambda-architecture-part-1
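The Lambda Architecture combines a batch view, recomputed from the full dataset, with a realtime view that covers only the events the last batch run has not yet seen. The sketch below illustrates that merge with in-memory counters. It is an assumption-laden toy: the class LambdaCounter and its methods are hypothetical names, and plain Python Counters stand in for the batch-view store (ElephantDB or Voldemort) and the realtime store fed by Storm.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy event counter with separate batch, speed, and serving roles."""

    def __init__(self) -> None:
        self.master: List[str] = []        # append-only master dataset
        self.batch_view: Counter = Counter()     # rebuilt by run_batch()
        self.realtime_view: Counter = Counter()  # events since the last batch

    def ingest(self, event: str) -> None:
        """Append to the master dataset and update the realtime view."""
        self.master.append(event)
        self.realtime_view[event] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from scratch and reset the realtime view."""
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        """Serve a query by merging the batch and realtime views."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    counter = LambdaCounter()
    for event in ["a", "b", "a"]:
        counter.ingest(event)
    counter.run_batch()
    counter.ingest("a")
    print(counter.query("a"))
```

Recomputing the batch view from the master dataset on every run keeps the batch layer simple and self-correcting, at the cost of latency that the realtime view then hides.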
17??????????????
18hadoop.nchc.org.tw ??
- 2009-04-13 ????????,12?
- 2010-10-20 ????,21?
- ??2013-09-10,??4012????
- ????6???,15??????
- ???????(??????????)
- 94???
- 33?????
- 3???(??/???/??????)
19??????Current Architecture
20??????Hadoop???? On-Demand Self Service
Powered by ZTerm http://zhouer.org/ZTerm/
21??????????? - Web-based Console
22??????????????? - RStudio ????
http://hadoop.nchc.org.tw/rstudio/
23?????? Lesson Learned
- ??CDH?HDP2?????????????
- ???????
- ?DRBL?????????? /opt/drbl/sbin/drbl-useradd
- ??5000????????LDAP OpenID????
- ??????!(??????,????,???????)
- ?????HDFS???
- ????????,? hadoop fs -mkdir tmp
- ????????HDFS??
- ????????,?hadoop dfs -chown (id) /usr/(id)
- ???hadoop dfs -chmod -R 700 /usr/(id)
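The chown and chmod commands above lend themselves to scripting when many accounts are involved. The sketch below only builds the argument lists; it does not run anything. It assumes the two commands are applied once per account, and the function name hdfs_owner_commands, the base parameter, and the user-name check are hypothetical additions. The default base of /usr follows the slide as written; stock HDFS places home directories under /user, so the path is left configurable.

```python
import re
from typing import List


def hdfs_owner_commands(user: str, base: str = "/usr") -> List[List[str]]:
    """Build the chown/chmod commands from the slide for one account.

    Returns argument lists rather than executing them, so the result can
    be reviewed or logged before anything touches the cluster.
    """
    # Reject names that could escape the per-user directory (assumed policy).
    if not re.fullmatch(r"[A-Za-z0-9_][A-Za-z0-9_.-]*", user):
        raise ValueError(f"unsafe user name: {user!r}")
    home = f"{base}/{user}"
    return [
        ["hadoop", "dfs", "-chown", user, home],         # user owns the directory
        ["hadoop", "dfs", "-chmod", "-R", "700", home],  # private to the owner
    ]


if __name__ == "__main__":
    for cmd in hdfs_owner_commands("alice"):
        print(" ".join(cmd))
```

On a node with the Hadoop client installed, each list can be passed to subprocess.run(cmd, check=True) by an HDFS superuser.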
24?????? Lesson Learned
- ????
- JBOD??,????RAID?
- I/O??HDFS??(??)??,MapReduce????
- ?????
- ???????,???SWAP Partition
- ????In-Memory Processing???,???????
- ??????_at_ 2013
- 1 core 28 GB RAM 2 TB Disk
25?????? Lesson Learned from Users
- ??????????????????
- ???????????,?????????????!
- ?????????????Mobile App??????
- ???NoSQL??NewSQL????I/O??!
- ??Open Data,????????
- Data as a Service ?????????????
- ???????????????,???????????
- ?????
- ??????????????????
- ??????????????????????????!
26????Future Plan