Title: Software Defined Acoustic Modems
Slide 1: Software Defined Acoustic Modems
- Ryan Kastner
- Department of Electrical and Computer
Engineering
- University of California, Santa Barbara
- CENG Seminar
- USC
- March 22, 2005
Slide 2: Outline
- Underwater Wireless Communication
- AquaNode
- Software Defined Acoustic Modems
- Application Mapping Design Flow
- Application Specification
- Reconfigurable Device Architecture - BLOBs
- Matching Pursuits Core
- Optimizations: data representation and distribution
- Parameterizable number of symbols, samples,
paths
- Design tradeoffs: area, latency, energy, etc.
Slide 3: Ecological Research Programs
- Santa Barbara Channel Long Term Ecological
Research (SBC LTER)
- Goals
- Focuses on understanding the nearshore ecosystems
of the west coast
- Time/space variation of individual organisms,
populations, and ecological communities
- Moorea LTER
- Goals
- Understanding processes in coral reefs, lagoons
and fore reefs in French Polynesia
- Nature of animal and plant community structure
and diversity
- Responses to environmental change induced either
by human activities or natural cycles
Slide 4: Monitoring in Moorea
- Establish monitoring sites in lagoons and on fore
reefs surrounding Moorea
- Response variables measured:
- Weather
- Tides, Currents and Flows
- Ocean Temperature & Color
- Salinity, Turbidity & pH
- Nutrients
- Recruitment & Settlement
- Size & Age Structure
- Species Abundance
- Community Diversity
Slide 5: Monitoring in the Santa Barbara Channel (SBC)
[Figure: Stearns Wharf]
Slide 6: Typical Instrumentation: SBC Mooring
Slide 7: Existing SBC Moorings
Slide 8: Monitoring Realities
- Repetitive data collection is expensive and
requires massive numbers of person-hours
- One third of the Santa Barbara Coastal LTER budget is allocated
to the collection and management of monitoring data
- An estimated 10,000 scientist hours are needed to conduct a
large-scale monitoring program of a tropical
forest
- Need for an autonomous, adaptive, wireless acoustic
underwater network for eco-surveillance and
adaptive sampling
Slide 9: Scenario for WetNet for Eco-Surveillance
- Deploy an ad hoc wireless (acoustic) network in the
lagoon
- Network consists of AquaNodes with Conductivity,
Temperature, Depth (CTD) sensors
- Ad hoc network allows AquaNodes to relay data to
a dockside collector
- AquaNode requirements:
- Low-cost, low-power wireless modems
- Integral router
- Integral CTD sensor suite
- Additional nitrate and oxygen chemical sensors
- Real-time data from Moorea available on the Web
Slide 10: AquaNode
Float
Cabled transducer
Software Defined Acoustic Modem
Conductivity/Temperature/Depth Sensor
Battery
Slide 11: WetNet using AquaNodes
CTD, currents, nutrient data to Internet.
Adaptive sampling commands to AquaNodes.
Wi-Fi or Wi-Max link
Dockside acoustic/RF comms and signal processing.
Cabled hydrophone array
Dock
AquaNodes with acoustic modems/routers, sensors
Slide 12: Underwater Acoustic Channel
- Severe multipath: 1 to 10 ms delay spread for shallow water
at ranges up to 1 km
- Doppler shifts
- Long latencies: the speed of sound underwater is approx.
1500 m/s
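As a rough illustration of these channel figures, the sketch below converts range into one-way propagation delay at the approximate sound speed of 1500 m/s and expresses a multipath spread in chips. The 5 kcps chip rate comes from the Walsh/m-sequence waveform described later; the function names are hypothetical.

```python
# Rough underwater-acoustic channel arithmetic (illustrative only).
SOUND_SPEED_M_S = 1500.0  # approximate speed of sound underwater
CHIP_RATE_CPS = 5000.0    # 5 kcps chip rate of the Walsh/m-sequence waveform


def propagation_delay_s(range_m: float) -> float:
    """One-way propagation delay in seconds over the given range."""
    return range_m / SOUND_SPEED_M_S


def spread_in_chips(spread_s: float) -> float:
    """Number of chip intervals covered by a multipath delay spread."""
    return spread_s * CHIP_RATE_CPS


if __name__ == "__main__":
    print(f"1 km one-way delay: {propagation_delay_s(1000.0):.3f} s")
    for spread_ms in (1.0, 10.0):
        chips = spread_in_chips(spread_ms / 1000.0)
        print(f"{spread_ms:.0f} ms multipath spread = {chips:.0f} chips")
```

A 1 km hop therefore costs about two thirds of a second one way, and a 10 ms spread smears each chip over roughly 50 chip intervals, which is why the composite symbol is made longer than the maximum multipath spread.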
Slide 13: Hardware Platform
- Hardware is wirelessly updatable: no need to
retrieve equipment to update hardware for
changing communication protocols, sampling, or
sensing strategies
- Software Defined Acoustic Modems: reconfigurable
hardware is known to provide flexible, high-performance
implementations for DSP applications
Sensor
Software Defined Acoustic Modem
Transducer
Slide 14: Software Defined Acoustic Modem
- Ideal: one piece of hardware for all sensor
nodes
- Sensor Interface
- Must develop a common interface to different
sensors (CTD, chemical, optical, etc.) and
communication elements (transducer)
- Wide (constantly changing) variety of sensors and
sampling strategies
- Communication Interface
- Amplifiers, Transducers
- Signal modulation
Transducer
CTD Sensor
Reconfigurable Hardware Platform
Slide 15: Acoustic Modem Requirements
- Complex, computationally intensive communication
protocol
- Limited energy
- Fast design tools to aid mapping of the
communication protocols into hardware
Transducer
Communication Protocol
CTD Sensor
Reconfigurable Hardware Platform
Mapping
Slide 16: Design Considerations for SDAM
- Multipath spread: range of 1 to 10 milliseconds
for shallow water at up to 1 km range
- Larger bandwidths reduce the impact of frequency-dependent
multipath fading
- Transducers
- Size/weight/cost proportional to wavelength
- Acceptable propagation losses at 100 meter
ranges
- Waveform
- M-FSK signaling
- Datasonics/Benthos modems (used in Seaweb,
FRONT)
- Narrowband, thus sensitive to frequency-selective
fading
- Using more tones increases sensitivity to
Doppler spread
- Proposed Walsh/m-sequence signaling
(Direct-sequence)
- Provides frequency diversity due to wide
bandwidth
- Can be detected noncoherently
Slide 17: Walsh/m-Sequence Waveforms
Chip rate 5 kcps, approx. 5 kHz bandwidth, on a 25 kHz carrier.
Each Walsh symbol has 8 binary entries b_i, and each entry is
spread by a 7-chip m-sequence c.
Composite symbol duration is thus T = 8 × 7 chips / 5 kcps = 11.2 ms
(longer than the maximum multipath spread).
Data rate is 266 bps, or 133 bps when an 11.2 ms guard time
is added for channel clearing.
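The composite-symbol construction can be sketched as follows. The 7-chip m-sequence is read off the chip values plotted on the Transmitted Signal slide; generating the 8 x 8 Walsh set with the Sylvester construction, the square-law (noncoherent) detector, and all function names are assumptions for illustration, not the GMHT-MP receiver itself.

```python
import cmath
import math

# 7-chip m-sequence, taken from the chip values plotted on the "Transmitted Signal" slide
M_SEQ = [1, 1, -1, 1, -1, -1, -1]
CHIP_RATE_CPS = 5000.0


def walsh_matrix(n: int) -> list[list[int]]:
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def composite_symbol(walsh_row: list[int], m_seq: list[int] = M_SEQ) -> list[int]:
    """Spread each Walsh entry b_i by the m-sequence c (Kronecker product of b and c)."""
    return [b * c for b in walsh_row for c in m_seq]


def periodic_autocorrelation(seq: list[int], lag: int) -> int:
    """Cyclic autocorrelation; an m-sequence gives len(seq) at lag 0 and -1 elsewhere."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n] for i in range(n))


def detect_noncoherent(received: list[complex], codebook: list[list[int]]) -> int:
    """Square-law detector: pick the codeword with the largest |correlation|^2.
    Taking the magnitude discards any unknown carrier phase."""
    energies = [abs(sum(r * s for r, s in zip(received, code))) ** 2 for code in codebook]
    return max(range(len(codebook)), key=energies.__getitem__)


if __name__ == "__main__":
    codebook = [composite_symbol(row) for row in walsh_matrix(8)]
    chips = len(codebook[0])
    print(f"{chips} chips -> {chips / CHIP_RATE_CPS * 1e3:.1f} ms per symbol")
    print("bits per symbol:", math.log2(len(codebook)))
    # Send symbol 5 through an arbitrary (hypothetical) carrier phase rotation.
    rx = [cmath.exp(1j * 1.3) * x for x in codebook[5]]
    print("detected symbol:", detect_noncoherent(rx, codebook))
```

Because the Walsh rows are orthogonal, the composite codewords are orthogonal too, so the detector works regardless of the phase rotation. If each 8-ary Walsh symbol carries log2 8 = 3 bits, 3 bits per 11.2 ms is roughly 268 bps, close to the quoted 266 bps.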
Slide 18: Transmitted Signal
[Figure: transmitted chip sequence; Walsh entries +1, +1, -1, each spread by the m-sequence c = (1, 1, -1, 1, -1, -1, -1)]
Slide 19: Walsh/m-sequence Signal Parameters
[Figure: Walsh/m-sequence chip sequence (same values as the Transmitted Signal slide)]
Slide 20: UWA Walsh/m-sequence GMHT-MP Modem
Generalized multiple hypothesis test (GMHT)
Slide 21: Acoustic Modem Performance
- True multipath intensity profile (MIP)
- Nf: number of paths assumed by MP estimation
- N?: number of paths present
- MP identifies the major paths using one symbol of
information
Slide 22: Acoustic Modem Performance
- Comparison of rake receiver and matching pursuits
- Symbol Error Rate (SER)
- Nf: number of paths assumed by MP estimation
- N?: number of paths present
Slide 23: UWA Walsh/m-sequence GMHT-MP Modem
- The modem is accurate, but how do we implement it?
Slide 24: System Design Tools
- Goal: map application specification to system
architecture
- Subject to ever-increasing design constraints:
lower energy, smaller, faster, etc.
System Design
Communication Protocol: Walsh/m-sequence GMHT-MP Acoustic Modem
Reconfigurable System
Slide 25: System Design and Architecture
- Problem: take application code and map it to
a system platform (e.g., a reconfigurable
device)
- System platforms are extremely (and increasingly)
complicated, multiprocessing computing systems
- Mix of hardware and software components
- Microprocessors: RISC, DSP, network, etc.
- Logic level (FPGA): reconfigurable logic
- Specs for Xilinx Virtex-II
- 3K to 125K logic cells
- Four PowerPC processor cores
- Complex memory hierarchy: 1,738 KB block RAM,
external memory, local memory in CLBs
- Possibility of soft-core processors, DSP
- Custom hardware: embedded multipliers, fast
carry-chain logic, etc.
- Large performance improvements are possible
IF there is a good mapping
How do we best represent the application for
mapping?
Slide 26: Obligatory Design Flow Slide
[Figure: design flow. Inputs: specification language, application behaviors, device architecture description. Front end (SUIF): syntactic/semantic analysis to an AST, application-specific optimizations. Reconfigurable System Compiler (Machine SUIF): function-level SSA CFG generation, profiling, PDGSSA generation, task-level and instruction-level optimizations. Backends: µProc backend (µProc binary) and Hardware Description Language (HDL) backend (synthesizable HDL). Commercial tools: logic and physical synthesis (logic bitstream), platform programming software. Output: functional reconfigurable system.]
Slide 27: Reconfigurable Device Architecture
- Model a reconfigurable device as an array of
BRAM-level operation blocks (BLOBs)
- Each BLOB consists of:
- A multiplier
- A BRAM
- Adjacent CLBs
- Adjacent interconnects
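The BLOB model can be sketched as a simple data structure. This is a hypothetical sketch: it tiles a device into one BLOB per BRAM using the XC2V3000 resource counts quoted on the MP Core Parameterization slide (96 BRAMs of 18 Kbits, 96 multipliers, 3584 CLBs), splits the CLBs evenly, assumes a 16 x 6 grid, and uses Manhattan distance as a stand-in for routing cost between BLOBs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """One BRAM-level operation block: a BRAM, a multiplier, and adjacent CLBs."""
    row: int
    col: int
    bram_kbits: int
    multipliers: int
    clbs: int


def build_device(rows: int, cols: int, total_clbs: int, bram_kbits: int = 18) -> list[Blob]:
    """Tile the device into rows x cols BLOBs; CLBs are split evenly (remainder unused)."""
    clbs_per_blob = total_clbs // (rows * cols)
    return [Blob(r, c, bram_kbits, 1, clbs_per_blob)
            for r in range(rows) for c in range(cols)]


def hop_distance(a: Blob, b: Blob) -> int:
    """Manhattan distance between BLOBs, a crude proxy for interconnect delay."""
    return abs(a.row - b.row) + abs(a.col - b.col)


if __name__ == "__main__":
    device = build_device(rows=16, cols=6, total_clbs=3584)  # hypothetical 16 x 6 layout
    print(len(device), "BLOBs with", device[0].clbs, "CLBs each")
    print("total BRAM:", sum(b.bram_kbits for b in device), "Kbits")
    print("multipliers:", sum(b.multipliers for b in device))
    print("corner-to-corner hops:", hop_distance(device[0], device[-1]))
```

Treating the BLOB as the unit of placement turns data partitioning into assigning arrays and operations to grid cells, where the distance term approximates the cost of moving data between BRAMs.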
Slide 28: Application Specification
- Can be written in C, SystemC, SystemVerilog,
linear systems, signal flow graphs, CDFGs
- Each input form needs a front end to task graphs
- Focusing first on a C-to-task-graph front end
[Figure: example specifications: signal flow graph, C code, linear systems]
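Whatever the input language, the front end ends in a task graph. The sketch below, with hypothetical task names, stores such a graph as a map from each task to its predecessors and groups tasks into ASAP levels using Kahn's topological sort; tasks within one level have no dependences on each other and could run in parallel.

```python
from collections import defaultdict


def asap_levels(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into ASAP levels; deps maps each task to its predecessor tasks."""
    indegree = {t: len(p) for t, p in deps.items()}
    succs = defaultdict(list)
    for task, preds in deps.items():
        for p in preds:
            succs[p].append(task)
    level = sorted(t for t, d in indegree.items() if d == 0)
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for t in level:
            for s in succs[t]:
                indegree[s] -= 1
                if indegree[s] == 0:  # all predecessors are now scheduled
                    nxt.append(s)
        level = sorted(nxt)
    if sum(len(lv) for lv in levels) != len(deps):
        raise ValueError("task graph has a cycle")
    return levels


if __name__ == "__main__":
    graph = {  # hypothetical receiver task graph
        "read_samples": [],
        "load_training": [],
        "correlate": ["read_samples", "load_training"],
        "estimate_channel": ["correlate"],
        "detect_symbol": ["read_samples", "estimate_channel"],
    }
    for i, tasks in enumerate(asap_levels(graph)):
        print(i, tasks)
```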
Slide 29: Application Specific Optimizations
- Data Representation
- Number of bits
- Representation: fixed, floating, etc.
- Linear System Optimizations
- Convert constant multiplications to shifts and
adds
- Minimize number of operations, latency, area,
etc.
[Figures: transposed form of FIR filters; replacing constant multiplications by a multiplier block]
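One standard way to turn a constant multiplication into shifts and adds is canonical signed-digit (CSD) recoding, sketched below. It is a generic technique chosen for illustration and handles one constant at a time, whereas the multiplier block on this slide replaces several constant multiplications together; the function names are hypothetical.

```python
def csd_digits(c: int) -> list[tuple[int, int]]:
    """Canonical signed-digit (non-adjacent form) recoding of integer c.
    Returns (sign, shift) pairs such that c == sum(sign << shift)."""
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            digits.append((d, shift))
            c -= d  # c is now even, so the shift below is exact
        c >>= 1
        shift += 1
    return digits


def multiply_by_constant(x: int, digits: list[tuple[int, int]]) -> int:
    """Evaluate x * c using only shifts and additions/subtractions."""
    acc = 0
    for sign, shift in digits:
        acc = acc + (x << shift) if sign > 0 else acc - (x << shift)
    return acc


if __name__ == "__main__":
    c = 231
    d = csd_digits(c)
    print(d, "->", len(d) - 1, "adders")
    print(multiply_by_constant(13, d), 13 * c)
```

For c = 231 (binary 11100111, six nonzero bits) the recoding needs only four signed digits, 256 - 32 + 8 - 1, so three adders instead of five.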
Slide 30: Intermediate Representation
- Must exploit fine AND coarse-grain parallelism
- Ideally want automatic mapping
- Need a form that supports synthesis to both
hardware and software
    for (i = 0; i < ...; i++) {
        ...
        if (val > 32767)
            val = 32767;
        else if (val < -32768)
            val = -32768;
    }
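The saturation fragment on this slide is a typical target for if-conversion, one of the control-dependence transformations listed later: both branch conditions become predicates and the assignments become predicated selects, so the control structure no longer serializes the data flow. A minimal sketch, assuming the 16-bit bounds from the fragment; the function names are hypothetical and this is not the PDGSSA construction itself.

```python
MAX_16 = 32767
MIN_16 = -32768


def saturate_branchy(val: int) -> int:
    """Control-flow version, mirroring the if / else-if in the C fragment."""
    if val > MAX_16:
        val = MAX_16
    elif val < MIN_16:
        val = MIN_16
    return val


def saturate_predicated(val: int) -> int:
    """If-converted version: all conditions are evaluated, then predicates select."""
    p_hi = val > MAX_16                 # predicate of the 'then' branch
    p_lo = (not p_hi) and val < MIN_16  # predicate of the 'else if' branch
    p_keep = not (p_hi or p_lo)         # fall-through predicate
    # Exactly one predicate is true, so this sum behaves like a hardware multiplexer.
    return p_hi * MAX_16 + p_lo * MIN_16 + p_keep * val


if __name__ == "__main__":
    for v in (-40000, 0, 40000):
        print(v, saturate_branchy(v), saturate_predicated(v))
```

In hardware the predicated form maps to two comparators feeding a multiplexer, which can be evaluated in parallel with surrounding operations.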
Slide 31: PDGSSA Representation
[Figure: input application (the saturation loop) and its CDFG form]
Slide 32: PDGSSA Representation
[Figure: CDFG form vs. PDGSSA form]
Slide 33: Advantages of PDGSSA
- Exploits parallelism
- Explicitly shows control and data dependences
- Control structures do not limit data parallelism
- Regions are hyperblocks, which allows aggressive
optimizations
- Supports synthesis to both hardware and software
Slide 34: Comparing CDFG and PDG
- Benchmarks: a set of MediaBench functions
- PDG and CDFG are 2-3 times faster than sequential
execution
- PDG is about 7% faster than CDFG
- PDG and CDFG use approx. the same area
Slide 35: Comparing Different Predicated Forms
- Comparison with PSSA and sequential execution
- PSSA: predicated static single assignment
- Used by several other projects: CASH, Sea
Cucumber
- PDGSSA is on average 8% faster than PSSA
Slide 36: Map Application to HW/SW Cores
- Dependence analysis to exploit fine/coarse-grain
parallelism
- Interprocedural dependencies: selective
inlining
- Control dependencies: loop optimizations,
hoisting, if-conversion
- Data dependencies: arrays, aliasing, liveness
- System partitioning
- Cluster into coarser-grained tasks
- Decide how to divide the application onto the platform
Slide 37: System Partitioning
- How do you decide where to map different parts of
the application?
- Hardware or software? Which processor, which
memory, exact location, etc.
- Extremely hard set of problems (NP-hard)
- Must be flexible: different applications/systems
have a wide variety of models
- Fundamental problem: many different heuristic
methods have been developed
- Simulated annealing
- Genetic Algorithms
- Tabu Search
- Kernighan/Lin
- Ant Colony Optimization
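As an example of the first heuristic in the list, the sketch below applies simulated annealing to a two-way HW/SW partition. The cost model (tasks run one after another, plus a penalty for exceeding a logic-area budget), the task profile numbers, and all names are hypothetical; a real flow would use profiled latencies and a richer model of communication and memory.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    sw_time: float  # latency if run on the processor
    hw_time: float  # latency if mapped to reconfigurable logic
    hw_area: int    # logic area (e.g. CLBs) consumed in hardware


def cost(tasks: list[Task], in_hw: list[bool], area_budget: int, penalty: float = 100.0) -> float:
    """Total latency (tasks assumed to run one after another) plus an area-overflow penalty."""
    latency = sum(t.hw_time if hw else t.sw_time for t, hw in zip(tasks, in_hw))
    area = sum(t.hw_area for t, hw in zip(tasks, in_hw) if hw)
    return latency + penalty * max(0, area - area_budget)


def anneal(tasks: list[Task], area_budget: int, steps: int = 5000,
           t0: float = 10.0, alpha: float = 0.999, seed: int = 0) -> tuple[list[bool], float]:
    """Simulated annealing over HW/SW assignments; each move flips one task."""
    rng = random.Random(seed)
    current = [False] * len(tasks)  # start with everything in software
    cur_cost = cost(tasks, current, area_budget)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        cand = current[:]
        i = rng.randrange(len(tasks))
        cand[i] = not cand[i]
        cand_cost = cost(tasks, cand, area_budget)
        # Metropolis rule: accept improvements; accept worse moves with prob. exp(-delta / T)
        if cand_cost <= cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    tasks = [  # hypothetical profile numbers
        Task("correlator", 40.0, 4.0, 900),
        Task("channel_estimate", 30.0, 5.0, 1200),
        Task("detector", 10.0, 2.0, 600),
        Task("control", 3.0, 3.0, 400),
    ]
    assignment, c = anneal(tasks, area_budget=2000)
    for t, hw in zip(tasks, assignment):
        print(f"{t.name:17s} -> {'HW' if hw else 'SW'}")
    print("cost:", c)
```

Swapping in tabu search or Kernighan-Lin would only change the move-acceptance loop; the cost function is where platform-specific knowledge enters.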
Slide 38: Code Generation
- Once the task graph is partitioned:
- Generate code for each task
- Create communication protocols
- Data transfer between BRAMs in BLOBs
- Memory hierarchy: local registers (in CLBs),
local BRAM, remote BRAM, off-chip memory
- Need code generation from every input
specification to every computational core
- Software: use the conventional compiler flow
- Reconfigurable hardware: need a flow from task to
RTL HDL
Slide 39: Data Partitioning and Storage Assignment
- Mapping high-level programs into FPGA-based
reconfigurable computing architectures with
distributed block RAM modules
- Objective: improve utilization of available
storage resources, optimize system performance,
and meet design goals, including area,
latency, and throughput
Slide 40: Data Partitioning Challenges
- Reconfigurable systems are different from
multiprocessor parallel systems
- Different architecture: multiple processors vs.
CLBs
- Different program execution: sequential programs with
ILP vs. fully parallel, concurrent
execution
- Challenges
- Indistinct boundaries between local and remote
memory accesses
- Data partitioning and storage assignment have more
compounded effects on system performance
- Flexible memory port configurations
Slide 41: Additional Optimizations: Port Configurations
- Different port configurations support different
memory bandwidths, but require address generators
of different complexity
- Example (figure): single-port 8-bit block RAM
module vs. dual-port 32-bit block RAM modules
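To see why port configuration matters, the sketch below counts the cycles needed to read one symbol's worth of 8-bit samples (Ns = 88 and 8-bit data, from the MP core slides) from a single-port 8-bit BRAM versus a dual-port 32-bit one. It assumes four samples are packed per 32-bit word and ignores the extra unpacking and address-generation logic that the wider configuration needs; the function name is hypothetical.

```python
import math


def read_cycles(num_samples: int, sample_bits: int, port_bits: int, ports: int) -> int:
    """Cycles to read num_samples from one BRAM, assuming samples are packed
    port_bits // sample_bits per word and each port returns one word per cycle."""
    samples_per_cycle = (port_bits // sample_bits) * ports
    return math.ceil(num_samples / samples_per_cycle)


if __name__ == "__main__":
    ns = 88  # samples per symbol, from the MP core parameterization
    print("single-port, 8-bit :", read_cycles(ns, 8, 8, 1), "cycles")
    print("dual-port, 32-bit  :", read_cycles(ns, 8, 32, 2), "cycles")
```

The eightfold bandwidth gain is what pays for the more complex address generators mentioned above.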
Slide 42: Additional Optimizations: Buffer Insertion
- Similar to software prefetching in
microprocessors, but implemented by inserting
buffers
- Reduces the delay of critical paths and improves
clock frequency
Slide 43: Matching Pursuits Core
- Goal: map matching pursuits to a reconfigurable
device
- Parameterizable number of samples and data
representation
- Tradeoffs: provides designs with various area,
latency, energy, etc.
System Design Tools
Matching Pursuits Algorithm
Reconfigurable System
Slide 44: Matching Pursuit Algorithm
[Figure: MP algorithm flow, annotated "Additional area cost" and "Check for Nf times"]
Slide 45: Matching Pursuit Algorithm
- Accurate, low-complexity approximation to the
Maximum Likelihood (ML) solution
- The redesigned MP is Nf times faster than the
original MP; both MPs are faster than ML
- Insignificant increase in memory requirement
ASE 0.003
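For readers unfamiliar with matching pursuits, a plain greedy MP for estimating a sparse f in r = Sf + n can be sketched as follows. This is a generic MP in pure Python, assuming real-valued data and nonzero columns S_j; it is neither the redesigned variant nor the hardware core, and the function name is hypothetical.

```python
def matching_pursuit(S: list[list[float]], r: list[float], nf: int) -> list[float]:
    """Greedy MP estimate of f in r = S f + n. S is given as a list of columns,
    each assumed to be nonzero; nf is the number of paths to extract."""
    residual = list(r)
    f_hat = [0.0] * len(S)
    energy = [sum(x * x for x in col) for col in S]  # ||S_j||^2, computed once
    for _ in range(nf):
        # Correlate the residual with every column (the correlator bank)
        corr = [sum(a * b for a, b in zip(col, residual)) for col in S]
        # Pick the path whose column explains the most residual energy
        j = max(range(len(S)), key=lambda k: corr[k] ** 2 / energy[k])
        coeff = corr[j] / energy[j]
        f_hat[j] += coeff
        residual = [x - coeff * s for x, s in zip(residual, S[j])]
    return f_hat


if __name__ == "__main__":
    # Toy example: orthogonal Walsh-Hadamard columns and a 2-path channel
    H = [[1]]
    while len(H) < 8:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    cols = [[float(v) for v in col] for col in zip(*H)]
    f_true = [0, 2, 0, 0, -1, 0, 0, 0]
    r = [sum(cols[j][i] * f_true[j] for j in range(8)) for i in range(8)]
    print(matching_pursuit(cols, r, nf=2))
```

Each iteration is dominated by the correlator bank, which is why the data distribution of S across BRAMs matters for the hardware mapping.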
Slide 46: Data Representation
- Floating-point representation
- Large dynamic range and high precision
- Costly
- Fixed-point representation
- Requires fewer resources
- Comparable performance
- Bitwidth analysis trades off estimation
accuracy against the number of fixed-point bits
- 8 bits are sufficient
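A bitwidth sweep of the kind described above can be sketched as follows. The sketch measures only raw quantization error of sample values, not the estimation accuracy analyzed on the slide, and it assumes a Q1.(bits-1) format (one sign bit, the rest fractional) with round-to-nearest and saturation; the slides state only the total width, so this split and the function names are hypothetical.

```python
def to_fixed(x: float, bits: int = 8, frac_bits: int = 7) -> int:
    """Quantize x to a signed two's-complement code with round-to-nearest and saturation."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def from_fixed(code: int, frac_bits: int = 7) -> float:
    """Map an integer code back to the real value it represents."""
    return code / (1 << frac_bits)


def max_error(values: list[float], bits: int) -> float:
    """Worst-case error when all but the sign bit are fractional (Q1.(bits-1))."""
    frac = bits - 1
    return max(abs(v - from_fixed(to_fixed(v, bits, frac), frac)) for v in values)


if __name__ == "__main__":
    samples = [k / 100 - 0.99 for k in range(198)]  # test values in [-0.99, 0.98]
    for bits in (4, 6, 8, 10, 12):
        print(f"{bits:2d} bits: max error {max_error(samples, bits):.5f}")
```

With 8 bits, in-range values are reproduced to within half an LSB (1/256); narrower formats also start to saturate near full scale.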
Slide 47: MP Core Parameterization
- Parameterization of the MP core
- Data representation
- Data distribution
- Tradeoff system parameters
- Parameters in MP
- M: the number of training symbols (M = 1)
- Nf: the number of paths in the sparse channel (Nf = 15)
- Ns: the number of samples per symbol (Ns = 88)
- The amount of hardware resources
- How much parallelism can be supported?
- Xilinx Virtex-II XC2V3000
- 1728 Kbits (96 BRAMs, 18 Kbits/BRAM)
- 96 embedded multipliers (18x18 bits)
- 3584 CLBs
Slide 48: Data Distribution
- Calculating matched filter outputs through
matrix-vector multiplications
- A bank of correlators multiplies each sample of
the received vector r with the corresponding
sample of a column in an S matrix
[Figure: global scheme vs. local scheme; distribute matrix by row vs. distribute matrix by column]
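The two distributions can be contrasted with the behavioral sketch below (Python, not RTL). It assumes that distributing by column keeps whole columns of S next to a broadcast copy of r, that distributing by row gives each BRAM a slice of rows, and that every partial sum leaving a BRAM counts as one global transfer. The `schedule` helper is a hypothetical latency model in which each BRAM processes its columns one after another at one MAC per cycle.

```python
import math


def correlate_by_column(S_cols: list[list[float]], r: list[float]) -> tuple[list[float], int]:
    """Column distribution: each BRAM holds whole columns of S plus a copy of r,
    so every correlator output is finished locally. Returns (outputs, transfers)."""
    outputs = [sum(s * x for s, x in zip(col, r)) for col in S_cols]
    return outputs, 0  # no partial sums leave their BRAM


def correlate_by_row(S_cols: list[list[float]], r: list[float],
                     rows_per_bram: int) -> tuple[list[float], int]:
    """Row distribution: each BRAM holds a slice of rows and produces one partial
    sum per output; the partial sums are combined by a global adder."""
    outputs = [0.0] * len(S_cols)
    transfers = 0
    for start in range(0, len(r), rows_per_bram):
        rows = range(start, min(start + rows_per_bram, len(r)))
        for j, col in enumerate(S_cols):
            outputs[j] += sum(col[i] * r[i] for i in rows)
            transfers += 1  # this partial sum must travel to the global adder
    return outputs, transfers


def schedule(num_items: int, items_per_bram: int, cycles_per_item: int) -> tuple[int, int]:
    """(BRAMs used, latency in cycles) if each BRAM handles its items sequentially."""
    return math.ceil(num_items / items_per_bram), items_per_bram * cycles_per_item


if __name__ == "__main__":
    S_cols = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2]]
    r = [1, 1, 2, 0]
    print(correlate_by_column(S_cols, r))
    print(correlate_by_row(S_cols, r, rows_per_bram=2))
    for k in (1, 2, 4, 8):
        print(k, "columns/BRAM -> (BRAMs, cycles):", schedule(88, k, 88))
```

Both functions compute the same correlator outputs; only the row distribution generates global traffic. In this toy model, packing more columns per BRAM raises latency linearly while using fewer BRAMs.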
Slide 49: Experimental Results: Correlator Bank
- A bank of correlators multiplies each sample of
the received vector r with the corresponding
sample of a column in an S matrix
- It is possible to achieve a communication-free
partition
- A good time/resource tradeoff
Slide 50: Experimental Results: Correlator Bank
- Compared communication-free schemes with row-wise
partitioning at the same granularity
- Row-based partitioning shows a very
similar time/resource tradeoff
- 30-50% slower due to the large amount of global
communication and global control
Slide 51: Data Distribution Results
- Similarities of the two schemes
- As the number of rows/columns distributed into
the same BRAM increases:
- Execution time increases linearly
- Area decreases exponentially
- The local scheme outperforms the global scheme
- Due to local communication
- The MAC scheme runs faster than the pipelined
adder-tree scheme, both before and after physical
layout
- The MAC scheme also needs less time for
synthesis and for placement and routing
Slide 52: Matching Pursuits Mapping
Slide 53: Putting It All Together
- Parameterizable MP core
- Runs 216 times faster than a high-performance
desktop computer with a 2.17 GHz AMD Athlon XP CPU
Slide 54: Conclusions
- AquaNodes
- Wireless communication devices for the underwater
channel
- Software Defined Acoustic Modem (SDAM)
- System Design Flow
- Application Specification
- BLOBs
- Intermediate Representation: PDGSSA
- Reconfigurable Hardware Synthesis Techniques
- Matching Pursuits Core
- Core component in SDAM demodulator
- Parameterized samples, symbols, paths
- Design implementations trade off latency, area,
power, and energy
Slide 55: ExPRESS Lab
- ExPRESS: Extensible, Programmable,
Reconfigurable Embedded SystemS,
http://express.ece.ucsb.edu/
- Students
- PhD students: Andrew Brown, Wenrui Gong, Anup
Hosangadi, Yan Meng, Gang Wang
- Undergrads: Brian DeRenzi, Talayeh Saderi,
Willis Hoang
- Collaborators: Prof. Ronald Iltis, Prof. Hua
Lee, Prof. Volkan Rodoplu, Prof. Timothy
Sherwood
- Sponsors
Slide 56: Extra Slides
Slide 57: Leveraging Existing UCSB Oceanographic
Infrastructure
- Partnership for Interdisciplinary Studies of
Coastal Oceans (PISCO), http://www.piscoweb.org
- - 19 moorings
- - measurements: currents, temperature
- Santa Barbara Coastal Long Term Ecological
Research Project
- http://sbc.lternet.edu/
- - 3 moorings, Stearns Wharf instrument
package
- - measurements: currents, temperature,
conductivity, fluorescence, optical
backscatter, nitrate
- Moorea Coral Reef LTER
- - several moorings will be deployed beginning
in 2005
- - measurements: currents, temperature,
conductivity, fluorescence, optical
backscatter, others to be determined (TBD)
- Southern California Coastal Ocean Observing
System (SCCOOS)
- http://www.sccoos.org/
- - at least one coastal mooring will be in the
Santa Barbara Channel
- - measurements: currents, temperature,
conductivity, spectral fluorescence, optical
backscatter, nitrate, others TBD
Slide 58: Wireless Communication in Multipath Channels
- Ubiquitous wireless applications
- Multipath fading has a strong negative effect
on wireless communication
- Multiple paths arise from scattering
- Received signal consists of multiple delayed and
attenuated versions of the transmitted signal
- Signal corruption due to destructive interference
Slide 59: Multipath Channel Estimation
- Recovers signals corrupted by multipath
propagation
- Improves data rate and reliability
- Enables accurate radiolocation and multiuser detection (MUD)
- Key technique for supporting mobility and high
data rate processing
- Found in both acoustic modem and radiolocation
applications
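Slide 60: Problem Formulation of Multipath Channel Estimation
- $\mathbf{r} = \mathbf{S}\mathbf{f} + \mathbf{n}$
- $\mathbf{r}$: received signal
- $\mathbf{f} = [f_1, f_2, \ldots, f_{N_s}]^T$
- $f_i$: channel coefficient of path $i$
- $\mathbf{S} = [\mathbf{S}_1, \mathbf{S}_2, \ldots, \mathbf{S}_{N_s}]$
- $\mathbf{S}_i$: received signal due to path $i$ if $f_i = 1$
- $\mathbf{S}_i f_i$: received signal due to path $i$
- $\mathbf{n}$: additive white Gaussian noise
- $M$: the number of training symbols
- $N_s$: the number of samples per symbol duration
- Sparse channel ($N_f \ll N_s$)
- Compute an estimate of $\mathbf{f}$ given $\mathbf{S}$ and $\mathbf{r}$,
which contains noise $\mathbf{n}$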
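To make the formulation concrete, the sketch below builds S for a hypothetical training signal by letting column S_i be that signal delayed by i samples (an assumed integer-delay path model), then forms r = Sf + n for a sparse f. The training values, gains, noise level, and function names are illustrative only.

```python
import random


def delay_matrix(training: list[float], ns: int) -> list[list[float]]:
    """Columns S_i: the training signal delayed by i samples, cut or zero-padded to ns."""
    cols = []
    for i in range(ns):
        col = ([0.0] * i + list(training))[:ns]
        col += [0.0] * (ns - len(col))
        cols.append(col)
    return cols


def received(S_cols: list[list[float]], f: list[float], noise_std: float = 0.0,
             seed: int = 0) -> list[float]:
    """r = S f + n with i.i.d. Gaussian noise samples (noise_std = 0 gives r = S f)."""
    rng = random.Random(seed)
    ns = len(S_cols[0])
    r = []
    for k in range(ns):
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        r.append(sum(col[k] * fi for col, fi in zip(S_cols, f)) + noise)
    return r


if __name__ == "__main__":
    S = delay_matrix([1.0, -1.0, 1.0], ns=5)
    f = [1.0, 0.0, 0.5, 0.0, 0.0]  # sparse: two paths, at delays 0 and 2
    print(received(S, f))
    print(received(S, f, noise_std=0.1))
```

Estimating f then amounts to finding which few columns of S, i.e. which delays, explain most of r.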
Slide 61: (No Transcript)
Slide 62: Proposed Approach
- Based on our current efforts in synthesizing C
programs into RTL designs
- Integrate traditional program tests and
transformation techniques from parallelizing
compilation into our system compiler framework
- Overview
- Code analysis: calculate the reference
footprints, analyze the iteration space, and
determine directions of partitioning
- Architectural synthesis: obtain performance
characteristics of the iteration body
- Data partitioning and granularity adjustment
Slide 63: Problem Formulation
Slide 64: Experimental Results: Edge Detection
- Sobel edge detection applies horizontal and
vertical detection masks to an input image;
optimization techniques are applied
- In the smaller design, we achieve the 150 MHz
design goal with a 46x speedup compared to the
original design
- The other design failed to achieve the 150 MHz goal, which
indicates it is extremely important to consider
physical attributes of the problem at higher
levels of the design
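The Sobel operator itself is small. The sketch below is a plain software reference that could serve as a golden model when checking a synthesized design. It uses the standard 3x3 Sobel masks and the |Gx| + |Gy| magnitude approximation; both are assumptions here, since the slide gives no kernel details, and border pixels are simply left at zero.

```python
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal-gradient mask
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical-gradient mask


def sobel(image: list[list[int]]) -> list[list[int]]:
    """|Gx| + |Gy| edge magnitude for interior pixels (border pixels left at 0)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * image[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * image[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            # |gx| + |gy| avoids the square root of the exact gradient magnitude
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    img = [[0, 0, 10, 10] for _ in range(4)]  # a vertical step edge
    for row in sobel(img):
        print(row)
```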