Efficiency from Sparsity

By pruning neural networks to store and process only the necessary components, we reduce power consumption by 100x and memory footprint by 10x, all with minimal effect on model performance.

Dual Sparsity
10x × 10x = 100x

Our hardware can achieve multiplicative benefits in speed and efficiency when both forms of sparsity are present.

Sparse Weights

  • Supports sparsely connected models

  • Only stores and computes on weights that matter

  • 10x improvement in speed, efficiency, and memory
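
To make the weight-sparsity idea concrete, here is an illustrative sketch in plain PyTorch (not the Femtosense SDK): a dense linear layer is pruned to 90% weight sparsity, and only the surviving weights are kept in a compressed format.

```python
# Illustrative sketch in plain PyTorch (not the Femtosense SDK):
# prune a dense linear layer to 90% weight sparsity and store only
# the weights that remain.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)

# Zero out the 90% of weights with the smallest magnitude, then make it permanent.
prune.l1_unstructured(layer, name="weight", amount=0.9)
prune.remove(layer, "weight")

# Keep only the nonzero entries (CSR stores values plus their indices).
sparse_weight = layer.weight.detach().to_sparse_csr()

dense_bytes = layer.weight.numel() * layer.weight.element_size()
sparse_bytes = (
    sparse_weight.values().numel() * sparse_weight.values().element_size()
    + sparse_weight.col_indices().numel() * sparse_weight.col_indices().element_size()
    + sparse_weight.crow_indices().numel() * sparse_weight.crow_indices().element_size()
)
print(f"dense: {dense_bytes} bytes, sparse CSR: {sparse_bytes} bytes")
```

A generic CSR tensor still pays an index overhead; purpose-built sparse memory formats, as described under SPU Architecture below, pack considerably more densely.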


Sparse Activations

  • Supports sparse activations

  • Skips computation when a neuron outputs zero

  • 10x increase in speed and efficiency
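
The activation-sparsity idea can likewise be sketched in a few lines of plain PyTorch (again, not the SDK itself): after a ReLU, many activations are exactly zero, so the next layer's work can be restricted to the inputs that are actually active.

```python
# Illustrative sketch in plain PyTorch (not the Femtosense SDK):
# skip the columns of the next layer that correspond to zero activations.
import torch

torch.manual_seed(0)
x = torch.relu(torch.randn(1024))      # activation vector; negatives become zero
W = torch.randn(256, 1024)             # next layer's weight matrix

active = x.nonzero(as_tuple=True)[0]   # indices of neurons that actually fired
print(f"active neurons: {active.numel()} / {x.numel()}")

y_dense = W @ x                        # full dense product
y_sparse = W[:, active] @ x[active]    # multiply-accumulate only where x != 0
print(torch.allclose(y_dense, y_sparse, atol=1e-4))
```

With a random input roughly half the activations are zero; networks trained for sparsity can push the active fraction far lower.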


Sparsity-First Approach


SPU Architecture


Sparse Acceleration

  • Native hardware accelerates sparse math and compresses sparse data
  • Custom instructions maximize efficiency and speed for algorithms with sparsity
  • Custom memory formats maximize packing density for sparse data
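
As a rough illustration of what a sparsity-aware memory format buys (the layout below is a hypothetical example, not the SPU's actual on-chip format), a sparse row of 8-bit weights can be stored as a presence bitmap plus its nonzero values rather than as a full dense row:

```python
# Hypothetical packed format for a sparse row of int8 weights:
# a 1-bit presence flag per position plus the nonzero values only.
import numpy as np

rng = np.random.default_rng(0)
row = rng.integers(-127, 128, size=64, dtype=np.int8)
row[rng.random(64) < 0.9] = 0            # make ~90% of the entries zero

bitmap = np.packbits(row != 0)           # 64 presence bits -> 8 bytes
values = row[row != 0]                   # only the surviving int8 values

print(f"dense row: {row.nbytes} bytes, packed: {bitmap.nbytes + values.nbytes} bytes")
```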

Near-Memory Compute

  • Disaggregate on-chip memory and disperse it near processing elements to reduce data motion and parallelize data access
  • Retain workloads on-chip to eliminate energy and throughput bottlenecks of off-chip memory access
  • Sparse acceleration maximizes effective on-chip memory capacity

Scalable Core

  • Tile or divide cores to match the needs and constraints of any deployment
  • Cover a wide range of applications and form factors with the same architecture
  • Pure digital design can be easily ported across process nodes to optimize performance vs. cost

Tooling

Build with sparsity

  • Design new neural networks that make the most of the available memory, power, and bandwidth, as sketched below.
  • Optimize existing neural networks for speed, efficiency, and memory footprint with minimal impact on performance.
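
One way to build with sparsity from the start, sketched here in plain PyTorch rather than with the Femtosense tooling, is a layer whose connectivity is sparse by construction:

```python
# Illustrative sketch (plain PyTorch, not the Femtosense tooling):
# a linear layer with a fixed sparse connectivity mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, density=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Fixed connectivity pattern: keep roughly `density` of the connections.
        mask = (torch.rand(out_features, in_features) < density).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

layer = MaskedLinear(512, 256, density=0.1)
y = layer(torch.randn(8, 512))
print(y.shape, f"connections kept: {int(layer.mask.sum())} / {layer.mask.numel()}")
```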

Tune with ease

  • Optimize for different scenarios with intuitive, flexible, and powerful fine-tuning utilities 
  • Use custom layers, sparsity regularization, sparsity loss functions, and quantization aware training to meet your objectives
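
Sparsity regularization itself is simple to picture. The minimal sketch below adds an L1 penalty to the training loss in plain PyTorch; the SDK's own utilities, loss functions, and hyperparameters may differ, and the penalty strength shown is an arbitrary placeholder.

```python
# Minimal sketch of sparsity regularization in plain PyTorch (not the
# Femtosense fine-tuning utilities): an L1 penalty drives weights toward
# zero so more of them can be pruned after fine-tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_strength = 1e-4                          # placeholder; tuned per model in practice

for step in range(100):
    x = torch.randn(32, 64)                 # stand-in for real training data
    target = torch.randint(0, 10, (32,))
    task_loss = F.cross_entropy(model(x), target)
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + l1_strength * l1_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```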

Deploy seamlessly

  • Deploy models to hardware simulation for rapid iteration or to actual hardware with minimal firmware.
  • Build prototypes using one of our pre-built hardware integrations.

Sparsity in Action

Femtosense’s dual-sparsity design consumes far less energy than existing approaches because only a fraction of the network's neurons are active at any time.

Hardware


Introducing SPU-001

  • 1.52 mm × 2.2 mm WLCSP15 package
  • 22 nm ULL process
  • 1 MB of on-chip SRAM
    • 10 MB effective memory with sparsity
    • Unused SRAM can be repurposed
  • SPI for interfacing with the host processor
  • Power gating with sleep mode
  • Sub-mW inference for speech, audio, and other 1-D data
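
The effective-memory figure is straightforward arithmetic; the sketch below assumes 8-bit weights, which is an illustrative assumption rather than a stated SPU-001 specification.

```python
# Back-of-envelope capacity estimate, assuming one byte per (int8) parameter.
sram_bytes = 1 * 1024 * 1024                  # 1 MB of on-chip SRAM
sparsity_gain = 10                            # ~10x compression from sparse weights
effective_bytes = sram_bytes * sparsity_gain  # ~10 MB effective capacity
print(f"~{effective_bytes / 1e6:.0f} M int8 parameters fit on-chip with sparsity")
```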

Core Design

SPU-001 IP Core

4 Core Configuration

  • 512 kB on-chip SRAM per core
  • 5 MB effective SRAM with weight sparsity
  • 1.3 mm² single core (22 nm process)
  • AXI interface

Want to learn more about SPU-001 and our IP design?

Specification Sheet

Software

Femtosense SDK

Our SDK supports PyTorch, TensorFlow, and JAX, so developers can get started with minimal barriers. The SDK provides advanced sparse optimization tools, a model performance simulator, and the Femto compiler. 


Deploying AI Algorithms to the SPU

Start with existing models

Deploy TensorFlow, PyTorch, and JAX models to the SPU without fine-tuning. Achieve high efficiency for dense models, and even higher efficiency for sparse models.

Optimize

Optimize models for the highest performance with sparsity regularization and quantization aware training. Fine-tune existing models or train from scratch.
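
Quantization-aware training is a standard technique; a generic eager-mode sketch in plain PyTorch (not the Femtosense SDK's own API) looks like this:

```python
# Generic eager-mode quantization-aware training in plain PyTorch
# (not the Femtosense SDK): fine-tune with fake-quantized weights and
# activations, then convert to an int8 model.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.ao.quantization as tq

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = tq.QuantStub(), tq.DeQuantStub()
        self.fc1, self.fc2 = nn.Linear(64, 128), nn.Linear(128, 10)

    def forward(self, x):
        x = self.quant(x)
        x = torch.relu(self.fc1(x))
        return self.dequant(self.fc2(x))

model = SmallNet()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(50):                          # stand-in fine-tuning loop
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

quantized = tq.convert(model.eval())         # int8 model ready for export
```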

Simulate

Simulate energy, latency, throughput, and footprint of your model on the SPU.

Deploy

Use the Femto compiler to deploy to the SPU for verification, testing, and production.
