Software Development Kit (SDK)

We’ve built our software development platform to help companies of all sizes deploy optimal sparse AI models for tomorrow’s applications and form factors.

Our SDK contains advanced sparse model optimization tools, a custom compiler, and a fast performance simulator. It’s everything you need from exploration to deployment.

Widely Used ML Frameworks

Develop and deploy networks from high-level Python frameworks like PyTorch.
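
Any standard PyTorch module can serve as the starting point for the flow below. Here is a minimal sketch of the kind of model the later steps assume (the MyModel name and layer sizes are illustrative, not part of the SDK):

    import torch.nn as nn

    class MyModel(nn.Module):
        """A small sequence model; sizes are illustrative."""
        def __init__(self):
            super().__init__()
            self.rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
            self.head = nn.Linear(128, 10)

        def forward(self, x):
            out, _ = self.rnn(x)   # x: (batch, seq, features)
            return self.head(out)  # per-frame predictions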

Easy Optimization

Easily prune, quantize, and fine-tune sparse models with our model optimization tools.

Rapid Development

Estimate energy, latency, throughput, and model footprint from a Python API.

Get Started

It's a simple 3-step process

1. Quantize

    from fmot import ConvertedModel

    model = MyModel()  # Start from a PyTorch model

    # Convert and quantize
    cmodel = ConvertedModel(model, batch_dim=0, seq_dim=1)
    cmodel.quantize(quantization_inputs)  # Quantize the model on a set of inputs
    fqir_graph = cmodel.trace()  # Serializable IR for the quantized model

Quantize the model and convert it to a serializable intermediate representation
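
The quantization_inputs above are representative example inputs used to calibrate the quantizer. A minimal sketch, assuming the illustrative MyModel above and random tensors standing in for a real calibration set (in practice, use representative data; the exact expected format is described in the SDK documentation):

    import torch

    # Hypothetical calibration set: (batch, seq, features) tensors
    quantization_inputs = [torch.randn(8, 100, 64) for _ in range(10)]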

2. Simulate

    from femtodriver import Femtodriver

    # Simulate the power consumption, memory footprint, and latency of your model
    with Femtodriver() as fd:
        meta_dir, femtofile_path, femtofile_size = fd.compile(fqir_graph)
        metrics = fd.simulate(
            input_period,  # Time between frames, used to estimate leakage
            input_file,    # The input file to run the simulation on
        )

Simulate the energy, latency, throughput, and footprint of your model for rapid model development
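
As a usage sketch, the simulator can be driven end to end with an illustrative frame period and input capture (both values below are hypothetical placeholders, not SDK defaults):

    input_period = 0.016  # hypothetical: 16 ms between input frames
    input_file = "example_inputs.npy"  # hypothetical recorded input file

    with Femtodriver() as fd:
        meta_dir, femtofile_path, femtofile_size = fd.compile(fqir_graph)
        metrics = fd.simulate(input_period, input_file)
    print(metrics)  # inspect the estimated energy, latency, and footprint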

3. Deploy

    from femtodriver import Femtodriver

    # Generate binaries to deploy on the SPU
    with Femtodriver() as fd:
        meta_dir, femtofile_path, femtofile_size = fd.compile(fqir_graph)

Use the Femtocrux compiler to deploy to the SPU for verification, testing, and production

Further Optimizations

Push beyond the limits for unparalleled efficiency

4. Sparsify

    # Optional: sparsify your model during training
    for batch in train_dataloader:
        pruner.prune(model)  # Apply a pruning step each iteration
        optimizer.zero_grad()
        outputs = model(batch)
        loss = loss_fn(outputs, batch)
        loss.backward()
        optimizer.step()

Prune the model to a sparse representation that the SPU is uniquely designed to accelerate
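
The pruner above comes from the SDK's model optimization tools. As an illustration of the underlying idea only (a generic stand-in, not the SDK's pruner API), magnitude pruning can be expressed with plain PyTorch:

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def magnitude_prune(model, amount=0.5):
        """Zero out the smallest-magnitude weights in each Linear layer."""
        for module in model.modules():
            if isinstance(module, nn.Linear):
                prune.l1_unstructured(module, name="weight", amount=amount)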

5. Quantization Aware Training

    # Optional: run quantization-aware training
    for batch in train_dataloader:
        optimizer.zero_grad()
        outputs = cmodel(batch)  # Train the converted (quantized) model directly
        loss = loss_fn(outputs, batch)
        loss.backward()
        optimizer.step()

Run quantization-aware training to maintain high model performance despite the tiny scale
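
After fine-tuning, re-trace the converted model with the trace() call from step 1 so the serialized IR reflects the updated weights:

    fqir_graph = cmodel.trace()  # Re-export the IR after QAT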

Get Started

Ready to start developing?
Contact Us
