Title: An introduction to Apache Gora
1. Apache Gora
- What is it?
- Gora and Nutch
- What it supports
- Data access
- APIs
www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2. Apache Gora: What is it?
- Provides for big data
  - An in-memory data model
  - Persistence
  - Data store abstraction
- Supports persisting to
  - Column stores
  - Key/value stores
  - Document stores
  - RDBMSs
- Supports use of Hadoop
3. Apache Gora: What is it?
- Released under the Apache License 2.0
- Written in Java
- Offers a persistence framework
- Designed for big data applications
- Used by Nutch 2.x for web crawl data storage
- Used for
  - Persistence
  - Indexing
  - Analytics
4. Apache Gora: Nutch
- Nutch 2.x now uses Gora
- Abstracted storage
- Data store independence
- Handles the mapping of objects to persistent storage
- Can use various NoSQL solutions
5. Apache Gora: Supports
- Gora supports the following
  - Data stores
    - Apache Accumulo
    - Apache Cassandra
    - Apache HBase
    - Amazon DynamoDB
  - Data processing and analysis tools
    - Pig
    - Hive
    - Cascading
    - MapReduce
6. Apache Gora: Data Access
- Java API for data access
- Independent of data location
- Core Gora APIs
  - Store
  - Persistency
  - Query
  - MapReduce
7. Apache Gora: Store API
- Java package org.apache.gora.store
- DataStore handles object persistence
- DataStore methods process objects
  - Persist
  - Fetch
  - Query
  - Delete
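
The four operations listed for DataStore (persist, fetch, query, delete) can be illustrated with a small in-memory stand-in. This is only a sketch: `InMemoryStore`, `StoreSketch` and the method names `put`, `get`, `query` and `delete` are hypothetical names chosen for this example and are not Gora's actual DataStore interface. The keys and values are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Demo driver (hypothetical); shows each of the four operations once.
class StoreSketch {
    public static void main(String[] args) {
        InMemoryStore<String, String> store = new InMemoryStore<>();
        store.put("org.example/a", "page A");
        store.put("org.example/b", "page B");
        store.put("org.example/c", "page C");

        System.out.println(store.get("org.example/b"));                    // page B
        System.out.println(store.query("org.example/a", "org.example/b")); // [page A, page B]
        System.out.println(store.delete("org.example/a"));                 // true
        System.out.println(store.get("org.example/a"));                    // null
    }
}

// Hypothetical stand-in for a data store; NOT Gora's DataStore interface.
class InMemoryStore<K extends Comparable<K>, V> {
    // A sorted map keeps keys ordered so that key-range queries are possible.
    private final TreeMap<K, V> rows = new TreeMap<>();

    // Persist: write (or overwrite) the object stored under the key.
    void put(K key, V value) {
        rows.put(key, value);
    }

    // Fetch: return the object for the key, or null if there is none.
    V get(K key) {
        return rows.get(key);
    }

    // Query: return all values whose keys lie in [startKey, endKey], in key order.
    List<V> query(K startKey, K endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    // Delete: remove the key and report whether anything was removed.
    boolean delete(K key) {
        return rows.remove(key) != null;
    }
}
```

Because callers only see the four methods, the `TreeMap` could be swapped for a column store or key/value store without changing the calling code, which is the idea behind a data store abstraction.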
8. Apache Gora: Persistency API
- Java package org.apache.gora.persistency
- Core classes
  - BeanFactory
    - Constructs keys
  - Persistent
    - Objects that can be persisted
  - State
    - State managed through StateManager
    - NEW, CLEAN (UNMODIFIED), DIRTY (MODIFIED), DELETED
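
The four states named above can be sketched as a small state machine. This is an illustration under stated assumptions: `RecordState`, `TrackedPage`, `markSaved` and `markDeleted` are hypothetical names, and the transition rules shown (a clean object becomes dirty when a field changes, a saved object becomes clean) are assumed for the example, not taken from Gora's StateManager.

```java
// Demo driver (hypothetical); walks one object through all four states.
class StateSketch {
    public static void main(String[] args) {
        TrackedPage page = new TrackedPage();
        System.out.println(page.state()); // NEW

        page.markSaved();                 // pretend the store has written the object
        System.out.println(page.state()); // CLEAN

        page.setTitle("Home");            // a field changed since the last save
        System.out.println(page.state()); // DIRTY

        page.markDeleted();
        System.out.println(page.state()); // DELETED
    }
}

// The four states listed on the slide.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

// Hypothetical persistent object that tracks its own state.
class TrackedPage {
    private RecordState state = RecordState.NEW;
    private String title;

    RecordState state() {
        return state;
    }

    String title() {
        return title;
    }

    // Assumed rule: modifying a CLEAN object makes it DIRTY; a NEW object stays NEW.
    void setTitle(String title) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("object has been deleted");
        }
        this.title = title;
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    // Assumed rule: once written to the store, the object is CLEAN again.
    void markSaved() {
        if (state != RecordState.DELETED) {
            state = RecordState.CLEAN;
        }
    }

    void markDeleted() {
        state = RecordState.DELETED;
    }
}
```

Tracking state this way lets a store skip unmodified objects and write only the ones that are new or dirty.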
9. Apache Gora: Query API
- Java package org.apache.gora.query
- Core classes
  - Query
    - Constructed via DataStore
  - PartitionQuery
    - Divides the results of a Query into partitions
    - Lets queries run on the nodes that hold the data
    - Used to generate Hadoop InputSplits
  - Result
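
The idea of dividing a query's results into partitions can be sketched in plain Java. This is a simplified illustration, not Gora's PartitionQuery: `PartitionSketch` and `partition` are hypothetical names, and splitting a sorted key list into contiguous, near-equal ranges is an assumed strategy chosen for the example. In a Hadoop job each resulting range would correspond to one input split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a query's key range into partitions.
class PartitionSketch {
    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e");
        System.out.println(partition(keys, 2)); // [[a, b, c], [d, e]]
    }

    // Split sorted keys into at most `parts` contiguous, non-empty ranges
    // whose sizes differ by at most one.
    static <K> List<List<K>> partition(List<K> sortedKeys, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<K>> result = new ArrayList<>();
        int n = sortedKeys.size();
        int base = n / parts;   // minimum size of each partition
        int extra = n % parts;  // the first `extra` partitions get one more key
        int from = 0;
        for (int i = 0; i < parts && from < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<>(sortedKeys.subList(from, from + size)));
            from += size;
        }
        return result;
    }
}
```

Keeping each partition contiguous in key order means a partition can be served by whichever node holds that key range, so the query runs next to the data.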
10. Apache Gora: MapReduce API
- Java package org.apache.gora.mapreduce
- GoraMapper
- GoraReducer
- ALL Record Counter
- Reader
- Writer
- Hadoop / Avro serialisation
  - Serialise and de-serialise Persistent objects
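
The roles of a mapper and a reducer over stored records can be shown with a plain-Java sketch. It does not use Hadoop, GoraMapper or GoraReducer. `MapReduceSketch`, `mapToHost` and `countByHost` are hypothetical names, and the URLs are made-up sample data standing in for records read from a data store.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases; no Hadoop involved.
class MapReduceSketch {
    public static void main(String[] args) {
        List<String> urls = List.of(
            "example.org/a", "example.org/b", "example.com/x");
        System.out.println(countByHost(urls)); // {example.com=1, example.org=2}
    }

    // Map phase: derive the grouping key (the host) from one stored record.
    static String mapToHost(String url) {
        int slash = url.indexOf('/');
        return slash < 0 ? url : url.substring(0, slash);
    }

    // Reduce phase: sum the per-record counts emitted for each host.
    static Map<String, Integer> countByHost(List<String> urls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String url : urls) {
            counts.merge(mapToHost(url), 1, Integer::sum);
        }
        return counts;
    }
}
```

In a real job the map step would run in parallel over partitions of the stored data and the reduce step would combine the results per key; here both run in one loop for brevity.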
11. Contact Us
- Feel free to contact us at
  - www.semtech-solutions.co.nz
  - info_at_semtech-solutions.co.nz
- We offer IT project consultancy
- We are happy to hear about your problems
- You pay only for the hours you need to solve your problems