Dual Sparsity
10x × 10x = 100x

Our hardware can achieve multiplicative benefits in speed and efficiency when both forms of sparsity are present.

Sparse Weights

  • Supports sparsely connected models

  • Only stores and computes on weights that matter

  • 10x improvement in speed, efficiency, and memory
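The idea behind "only stores and computes on weights that matter" can be sketched in plain Python. This is an illustrative CSR-style layout, not Femtosense's actual on-chip memory format: only nonzero weights are kept, so storage and compute scale with the number of meaningful connections rather than the dense matrix size.

```python
# Hypothetical sketch of sparse weight storage (not the SPU's real format):
# keep (column, value) pairs for nonzero weights only.

def compress_rows(dense):
    """Store only the nonzero weights of each row."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in dense]

def sparse_matvec(rows, x):
    """Compute y = W @ x touching only the stored weights."""
    return [sum(w * x[j] for j, w in row) for row in rows]

dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, -3.0],
]
rows = compress_rows(dense)
stored = sum(len(r) for r in rows)              # 3 weights stored instead of 12
y = sparse_matvec(rows, [1.0, 1.0, 1.0, 1.0])   # [2.0, 0.0, -2.0]
```

With 10% weight density, this kind of layout stores roughly a tenth of the weights, which is where the claimed 10x memory benefit comes from (minus index overhead).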


Sparse Activations

  • Supports sparse activations

  • Skips computation when a neuron outputs zero

  • 10x increase in speed and efficiency
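The multiplicative benefit of combining both sparsity forms can be shown with toy numbers (an illustration, not a model of the silicon): a zero activation lets the hardware skip an entire column of work, and within each surviving column only the stored nonzero weights are touched.

```python
# Toy illustration of dual sparsity: 10% nonzero activations combined with
# 10% stored weights leaves ~1% of the dense multiply-accumulates.

def dual_sparse_macs(weight_cols, x):
    """Count multiply-accumulates when zero activations are skipped
    entirely and only stored (nonzero) weights are visited."""
    macs = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue                    # activation sparsity: skip whole column
        macs += len(weight_cols[j])     # weight sparsity: only stored weights
    return macs

n = 10
weight_cols = [[(i % n, 0.5)] for i in range(n)]  # 1 of 10 weights stored per column
x = [0.0] * n
x[3] = 1.0                                        # 1 of 10 activations nonzero

dense_macs = n * n                                # 100 for a dense matvec
sparse_macs = dual_sparse_macs(weight_cols, x)    # 1: a ~100x reduction
```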


Sparsity-First Approach


SPU Architecture


Sparse Acceleration

  • Native hardware accelerates sparse math and compresses sparse data
  • Custom instructions maximize efficiency and speed for algorithms with sparsity
  • Custom memory formats maximize packing density for sparse data

Near-Memory Compute

  • Disaggregate on-chip memory and disperse it near processing elements to reduce data motion and parallelize data access
  • Retain workloads on-chip to eliminate energy and throughput bottlenecks of off-chip memory access
  • Sparse acceleration maximizes effective on-chip memory capacity

Scalable Core

  • Tile or divide cores to match the needs and constraints of any deployment
  • Cover a wide range of applications and form factors with the same architecture
  • Pure digital design can be easily ported across process nodes to optimize performance vs. cost


Build with sparsity

  • Design new neural networks that maximize available memory, power, and bandwidth.
  • Optimize existing neural networks for speed, efficiency, and memory footprint with minimal impact on performance.

Tune with ease

  • Optimize for different scenarios with intuitive, flexible, and powerful fine-tuning utilities 
  • Use custom layers, sparsity regularization, sparsity loss functions, and quantization aware training to meet your objectives
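The SDK's actual fine-tuning utilities are not shown here, but the standard L1 "sparsity" regularizer such tools build on is simple to sketch: adding a penalty proportional to the sum of absolute weight values pushes weights toward exactly zero, so they can later be pruned and skipped in hardware. Names and values below are illustrative.

```python
# Illustrative L1 sparsity regularizer (not the Femtosense SDK API):
# total loss = task loss + lam * sum(|w|).

def l1_penalty(weights, lam=1e-3):
    """L1 penalty that encourages weights to become exactly zero."""
    return lam * sum(abs(w) for w in weights)

w = [0.5, -0.25, 0.0, 1.0]
task_loss = 0.42                               # placeholder task loss
loss = task_loss + l1_penalty(w, lam=0.01)     # 0.42 + 0.0175 = 0.4375
```

In a real PyTorch, TensorFlow, or JAX training loop, the same penalty term would simply be added to the loss before backpropagation.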

Deploy seamlessly

  • Deploy models to hardware simulation for rapid iteration or to actual hardware with minimal firmware.
  • Build prototypes using one of our pre-built hardware integrations.

Sparsity in Action

Femtosense’s dual sparsity design consumes far less energy than existing approaches by computing with only a small fraction of neurons active at any time.



Introducing SPU-001

  • 1.52 mm × 2.2 mm WLCSP15 package
  • 22nm ULL Process
  • 1 MB of on-chip SRAM
    • 10 MB effective memory with sparsity
    • Unused SRAM can be repurposed
  • SPI for interfacing with host processor
  • Power gating with sleep mode
  • Sub-mW inference for speech, audio, and other 1-D data
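The "10 MB effective memory" figure follows from the weight-sparsity compression above; a back-of-envelope check (assuming ~10% weight density, and ignoring index overhead) looks like this:

```python
# Rough arithmetic behind "1 MB SRAM -> 10 MB effective memory":
# if only ~10% of weights are nonzero and stored, 1 MB of physical SRAM
# covers a model that would need ~10 MB if stored densely.

SRAM_BYTES = 1 * 1024 * 1024   # 1 MB physical on-chip SRAM
DENSITY = 0.10                 # assumed fraction of nonzero weights

dense_equivalent = SRAM_BYTES / DENSITY   # ~10 MB of dense-model weights
```

In practice the index metadata for sparse formats eats into this, which is why the effective figure depends on the compression scheme as well as the density.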

Core Design


4-Core Configuration

  • 512 kB on-chip SRAM per core
  • 5 MB effective SRAM with weight sparsity
  • 1.3 mm² single core (22 nm process)
  • AXI interface

Want to learn more about SPU-001 and our IP design?

Specification Sheet


Femtosense SDK

Our SDK supports PyTorch, TensorFlow, and JAX, so developers can get started with minimal barriers. The SDK provides advanced sparse optimization tools, a model performance simulator, and the Femto compiler. 


Deploying AI Algorithms to the SPU

Start with existing models: Deploy TensorFlow, PyTorch, and JAX models to the SPU without fine-tuning. Achieve high efficiency for dense models and even higher efficiency for sparse models.


Optimize models for the highest performance with sparsity regularization and quantization aware training. Fine-tune existing models or train from scratch.


Simulate energy, latency, throughput, and footprint of your model on the SPU.


Use the Femto compiler to deploy to the SPU for verification, testing, and production.

Get updates

Be the first to know about new products and features