Telegraph Java Experiences - PowerPoint PPT Presentation

Slides: 27
Provided by: just118
Learn more at: https://db.csail.mit.edu

Transcript and Presenter's Notes


1
Telegraph Java Experiences
  • Sam Madden
  • UC Berkeley
  • madden_at_cs.berkeley.edu

2
Telegraph Overview
  • 100% Java
  • In memory database
  • Query engine for alternative sources: Web, sensors
  • Testbed for adaptive query processing

3
Telegraph WWW: FFF
  • Federated Facts and Figures
  • Collect Data on the Election
  • Based on Avnur and Hellerstein's SIGMOD '00 work on eddies
  • Eddies route tuples dynamically based on source loads
    and selectivities
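
The routing idea can be illustrated with a minimal sketch. This is not Telegraph's code: the class `EddySketch`, its methods, and the greedy "lowest observed pass rate first" rule are all assumptions made for illustration. The eddy paper describes a lottery-based scheme, and source load is ignored here.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical eddy sketch: sends each tuple to the most selective filter first. */
final class EddySketch<T> {
    /** A module wraps a predicate and counts the tuples it has seen and passed. */
    private static final class Module<T> {
        final Predicate<T> predicate;
        long seen = 0;
        long passed = 0;

        Module(Predicate<T> predicate) { this.predicate = predicate; }

        /** Observed pass rate; a module with no history is assumed to pass everything. */
        double passRate() { return seen == 0 ? 1.0 : (double) passed / seen; }

        boolean apply(T tuple) {
            seen++;
            boolean ok = predicate.test(tuple);
            if (ok) passed++;
            return ok;
        }
    }

    private final List<Module<T>> modules = new ArrayList<>();

    void addModule(Predicate<T> predicate) { modules.add(new Module<>(predicate)); }

    /** Routes one tuple; returns true only if it survives every module. */
    boolean route(T tuple) {
        BitSet done = new BitSet(modules.size());
        while (done.cardinality() < modules.size()) {
            int best = -1;
            for (int i = 0; i < modules.size(); i++) {
                if (done.get(i)) continue;
                if (best < 0 || modules.get(i).passRate() < modules.get(best).passRate()) {
                    best = i; // greedy choice: lowest pass rate so far
                }
            }
            done.set(best);
            if (!modules.get(best).apply(tuple)) return false; // dropped early
        }
        return true;
    }

    /** Number of tuples the given module has processed so far. */
    long seenBy(int moduleIndex) { return modules.get(moduleIndex).seen; }
}
```

Because the order is recomputed per tuple, a filter that turns out to be selective quickly moves to the front and shields the other modules from most of the input.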

4
fff.cs.berkeley.edu

5
Architecture Overview
  • Query Parser: JLex and CUP
  • Preoptimizer: chooses access paths
  • Eddy: routes tuples to modules

6
Modules
  • Doubly-Pipelined Hash Joins
  • Index Joins: for probing into web pages
  • Aggregates and Group Bys
  • Scans
  • Telegraph Screen Scraper: view web pages as
    relations
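
A doubly-pipelined (symmetric) hash join builds a hash table on both inputs and probes the opposite table as each tuple arrives, so results appear without waiting for either input to finish. The sketch below assumes integer join keys and string payloads; the class and method names are hypothetical, not Telegraph's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical symmetric hash join on integer keys with string payloads. */
final class SymmetricHashJoinSketch {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    /** Stores a left tuple, then probes the right table; returns the new matches. */
    List<String> onLeft(int key, String value) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String r : rightTable.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(value + "|" + r);
        }
        return out;
    }

    /** Stores a right tuple, then probes the left table; returns the new matches. */
    List<String> onRight(int key, String value) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String l : leftTable.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(l + "|" + value);
        }
        return out;
    }
}
```

Each pair is emitted exactly once, by whichever of its two tuples arrives second. That property suits slow or bursty sources such as web pages.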

7
Execution Framework
  • One Thread Per Query
  • Iterator Model for Queries
  • Experimented with thread per module: Linux threads
    are expensive
  • Two memory management models: Java objects and
    home-rolled byte arrays
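
In the iterator model, each operator pulls tuples from its child on demand, so a whole query plan can run in a single thread. A minimal sketch of one such operator follows. It is built on `java.util.Iterator`, and the class name `FilterOperator` is an assumption for illustration, not Telegraph's interface.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical pull-based selection operator that wraps any child iterator. */
final class FilterOperator<T> implements Iterator<T> {
    private final Iterator<T> child;
    private final Predicate<T> predicate;
    private T lookahead;   // next qualifying tuple, if one has been found
    private boolean ready; // true when lookahead holds a tuple not yet returned

    FilterOperator(Iterator<T> child, Predicate<T> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Pull from the child until a tuple qualifies or the child is exhausted.
        while (!ready && child.hasNext()) {
            T candidate = child.next();
            if (predicate.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

Operators compose by nesting: a scan's iterator is passed to a filter, and the filter to a join. The consumer at the top drives the whole pipeline with its calls to `next()`.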

8
Tuples as Java Objects
  • Tuple Data stored as a Java Object
  • Each in separate byte array
  • Tuples copied on joins, aggregates
  • Issues
  • Memory Management between Modules, Queries,
    Garbage collector control
  • Allocation Overhead
  • Performance: 30,000 200-byte tuples/sec (5.9
    MB/sec)

9
Tuples As Byte Array
  • All tuples stored in same byte array / query
  • Surrogate Java Objects

10
Byte Array (cont)
  • Allows explicit control over memory / query (or
    module)
  • Compaction eliminates garbage collection
    randomness
  • Lower throughput: 15,000 tuples/sec
  • No surrogate object reuse
  • Synchronization costs
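
The byte-array approach can be sketched as follows, assuming a fixed (int id, long value) schema. `TupleArena` and `TupleRef` are hypothetical names, not Telegraph classes. Unlike the system described above, this sketch reuses a single surrogate object and does no synchronization.

```java
import java.nio.ByteBuffer;

/** Hypothetical per-query arena: every tuple lives in one shared byte array. */
final class TupleArena {
    // Assumed fixed schema for illustration: (int id, long value).
    static final int TUPLE_BYTES = Integer.BYTES + Long.BYTES;
    private final ByteBuffer buffer;

    TupleArena(int maxTuples) {
        buffer = ByteBuffer.wrap(new byte[maxTuples * TUPLE_BYTES]);
    }

    /** Appends a tuple and returns its byte offset; fails once the budget is used up. */
    int append(int id, long value) {
        if (buffer.remaining() < TUPLE_BYTES) {
            throw new IllegalStateException("query memory budget exhausted");
        }
        int offset = buffer.position();
        buffer.putInt(id).putLong(value);
        return offset;
    }

    int idAt(int offset) { return buffer.getInt(offset); }

    long valueAt(int offset) { return buffer.getLong(offset + Integer.BYTES); }

    int bytesUsed() { return buffer.position(); }
}

/** Surrogate object: a movable view over one tuple rather than a copy of its data. */
final class TupleRef {
    private final TupleArena arena;
    private int offset;

    TupleRef(TupleArena arena) { this.arena = arena; }

    /** Repositions this surrogate on another tuple and returns it for chaining. */
    TupleRef pointAt(int newOffset) { this.offset = newOffset; return this; }

    int id() { return arena.idAt(offset); }

    long value() { return arena.valueAt(offset); }
}
```

The arena makes the memory budget of a query explicit: once the array is full, `append` fails instead of silently growing the heap.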

11
Other System Pieces
  • XML Based Catalog
  • Java Introspection Helps
  • Applet-based Front End
  • JDBC Interface
  • Fault Tolerance / Multiple Servers
  • Via simple UNIX tools

12
RightOrder Questions
  • Performance vs. C
  • JNI Issues
  • Garbage Collection Issues
  • Serialization Costs
  • Lots of Java Objects
  • JDBC vs ODI

13
Performance Vs. C
  • JVM JIT performance encouraging: IBM JIT reaches
    60% of Intel C compiler, faster than MSC for
    low-level benchmarks
  • IBM JIT 2x faster than HotSpot for Telegraph
    scans
  • Stability issues
  • www.javalobby.org/features/jpr

14
JIT Performance vs C
[Chart comparing optimized Intel C, optimized MS C, and IBM JIT]
Source: www.javalobby.org/features/jpr

15
Performance Gotchas
  • Synchronization
  • 2x function call overhead in HotSpot
  • Used in libraries: Vector, StringBuffer
  • String allocation single most intensive operation
    in Telegraph
  • Mercator: 20% initial CPU cost
  • Garbage Collection
  • Java dumb about reuse
  • Mercator: 15% cost
  • OceanStore: 30 ms avg latency, 1 s peak
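
One way to sidestep both the synchronization and the string allocation costs is to build strings in a single unsynchronized buffer. The sketch below uses `StringBuilder`, which was added in Java 5 and so postdates this talk. The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical helper showing an unsynchronized alternative to StringBuffer. */
final class UnsynchronizedAlternatives {
    private UnsynchronizedAlternatives() { }

    /** Joins values with one StringBuilder instead of repeated String concatenation. */
    static String join(List<String> values, char separator) {
        StringBuilder sb = new StringBuilder(); // no per-call locking, unlike StringBuffer
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(separator);
            sb.append(values.get(i));
        }
        return sb.toString(); // a single String allocation for the final result
    }
}
```

The same reasoning applies to collections: `ArrayList` is the unsynchronized counterpart of `Vector` for data that a single thread owns.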

16
More Gotchas
  • Final methods
  • Declaring methods final allows inlining
  • Serialization
  • RMI, JNI use serialization
  • Philippsen and Haumacher show that serialization
    is slow

17
Performance Tools
  • Tools to address some issues
  • JAX, Jopt make bytecode smaller, faster
  • www.alphaworks.ibm.com/tech/JAX
  • www.condensity.com
  • Bytecode optimizer
  • www.optimizeit.com
  • Good profiler, memory allocation and garbage
    collection monitor

18
JNI Issues
  • Not a part of Telegraph
  • JNI overhead quite large (JDK 1.1.8, PII 300 MHz)

Source: Matt Welsh. A System Supporting High-Performance
Communication and I/O in Java. Master's Thesis,
UC Berkeley, 1999.

19
More JNI
  • But, this is being worked on
  • IBM JDK: 100,000-byte copy in 5 ms, vs. 23 ms for
    JDK 1.1.8 (500 MHz PIII)
  • JNI allows synchronization (pin / unpin), thread
    management
  • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
  • GCJ and CNI: access Java objects via C++ classes
  • http://gcc.gnu.org/java/

20
Garbage Collection
  • Performance
  • Big problem: 1 s or longer to GC lots of objects
  • Most Java GCs blocking (not concurrent or
    multi-threaded)
  • Unexpected Latencies
  • OceanStore network file server: 30 ms avg.
    latency for network updates, 1000 ms peak due
    to GC
  • In high-concurrency apps, such delays disastrous

21
Garbage Collection Cont.
  • Limited Control
  • Runtime.gc() only a hint
  • Runtime.freeMemory() unreliable
  • No way to disable
  • No object reuse
  • Lots of unnecessary memory allocations
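
Since the collector cannot be steered, a common workaround is to reuse objects explicitly. The pool below is a minimal sketch of that idea. `BufferPool` is a hypothetical class, not part of Telegraph, and it is not thread-safe as written.

```java
import java.util.ArrayDeque;

/** Hypothetical fixed-size buffer pool; single-threaded use is assumed. */
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    /** Hands out a recycled buffer when one is available, else allocates a new one. */
    byte[] acquire() {
        byte[] buffer = free.poll();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; buffers of the wrong size are ignored. */
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) free.push(buffer);
    }

    /** Number of idle buffers currently held by the pool. */
    int idleCount() { return free.size(); }
}
```

Callers must stop using a buffer once they release it. That discipline is the price of taking the allocation decision away from the garbage collector.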

22
Serialization
  • Not in Telegraph
  • Philippsen and Haumacher, More Efficient Object
    Serialization. International Workshop on Java
    for Parallel and Distributed Computing. San Juan,
    April, 1999.
  • Serialization costs for RMI are 50% of total RMI
    time
  • Discard longevity for 7x speed up
  • Sun Serialization provides versioning
  • Complete class description stored with each
    serialized object
  • Most standard classes forward compatible (JDK
    docs note special cases)
  • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html

23
Lots of Objects
  • GC Issues Serious
  • Memory Management
  • GC makes programmers allocate willy-nilly
  • Hard to partition memory space
  • Telegraph byte-array ugliness due to inability to
    limit usage of concurrent modules, queries

24
Storage Overheads
  • Java Object class is big
  • Integer requires 23 bytes in JDK 1.3
  • int requires 4.3 bytes
  • No way to circumvent object fields
  • Use primitives or hand-written serialization
    whenever possible
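
The gap between default and hand-written serialization can be measured directly. The sketch below serializes one integer both ways and reports the byte counts. The class and method names are hypothetical, and the measured sizes are not the in-memory figures quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical comparison of default and hand-written serialization sizes. */
final class SerializationSizeSketch {
    private SerializationSizeSketch() { }

    /** Bytes produced by default Java serialization of a boxed Integer. */
    static int defaultSerializedSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value)); // includes stream header and class description
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Bytes produced by writing the primitive by hand. */
    static int handWrittenSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value); // exactly four bytes, no metadata
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The hand-written form drops the class description and versioning metadata. It is only safe when reader and writer agree on the layout in advance.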

25
JDBC vs ODI
  • No experience with Oracle
  • JDBC overheads are high, but we don't have specific
    performance numbers

26
Bottom Line
  • Java great for many reasons
  • GC, standard libraries, type safety,
    introspection, etc.
  • Significant reductions in development and
    debugging time.
  • Java performance isn't bad
  • Especially with some tuning
  • Memory Management an Issue
  • Lack of control over JVMs bad
  • When to garbage collect, how to serialize, etc.