Machine Learning (ML) is one of the most transformative fields in modern software development, powering applications like image recognition, recommendation systems, and predictive analytics. Traditionally, languages like Python and R have dominated this space due to their extensive libraries and community support. However, Rust — known for its speed, memory safety, and zero-cost abstractions — is quickly emerging as a powerful alternative for performance-critical machine learning tasks.
Rust combines the efficiency of C/C++ with the safety and expressiveness of a modern language. Its strong type system, thread safety, and high-performance capabilities make it an excellent choice for building fast and reliable ML pipelines — especially when models need to be deployed in production systems where performance and safety matter most.
In this tutorial, you’ll learn the fundamentals of Machine Learning in Rust through a hands-on example. We’ll walk you through setting up your Rust environment, exploring popular ML crates, and building a simple Iris flower classifier using the linfa
ecosystem — a growing collection of Rust libraries inspired by Python’s scikit-learn.
By the end of this tutorial, you will have:
-
Learned how to set up Rust for machine learning development
-
Explored how Rust handles data and modeling using
ndarray
andlinfa
-
Built, trained, and evaluated your first ML model in Rust
Let’s get started and see how Rust can bring both performance and safety to your machine learning workflow.
Prerequisites
Before we dive into coding, let’s make sure your development environment is ready to build and run machine learning applications in Rust. This section covers everything you need to get started smoothly.
Basic Knowledge
You should have:
-
A fundamental understanding of the Rust programming language, including how to create projects, use crates, and run code with Cargo.
-
A general idea of machine learning concepts, such as datasets, training, and evaluation. (Don’t worry if you’re new — we’ll explain the essentials as we go.)
1. Install Rust
If you haven’t installed Rust yet, the easiest way is through rustup, the official Rust installer and version manager.
Run the following command in your terminal or command prompt:
curl https://sh.rustup.rs -sSf | sh
Follow the on-screen instructions, then verify the installation:
rustc --version
This command should display the installed Rust compiler version.
2. Set Up Your Editor
You can use any text editor or IDE you prefer, but the recommended setup is:
-
VS Code with the Rust Analyzer extension for syntax highlighting, autocompletion, and in-editor diagnostics.
-
Alternatively, JetBrains CLion or RustRover also provides great Rust support.
3. Create a Working Directory
Choose a folder where you’ll store your Rust ML projects, then open it in your terminal. We’ll create our first project in the next section.
4. Dataset
We’ll use the well-known Iris dataset, which is commonly used for beginner-level machine learning tasks. The dataset is available directly from the linfa-datasets
crate, so you don’t need to download it manually.
5. Cargo and Crates
Make sure you’re familiar with Cargo, Rust’s package manager and build tool. You’ll use it to:
-
Add dependencies (ML and data crates)
-
Run and build your project
-
Manage versions and configurations
Once you have Rust installed and your editor ready, you’re all set to start coding.
Setting Up the Project
Now that your environment is ready, let’s create a new Rust project and set up the dependencies needed for our machine learning example.
1. Create a New Rust Project
Open your terminal and run the following commands:
cargo new rust-ml-intro
cd rust-ml-intro
This creates a new directory named rust-ml-intro
with the standard Rust project structure:
rust-ml-intro/
├── Cargo.toml
└── src/
└── main.rs
-
Cargo.toml – The configuration file for your project, where you’ll define dependencies and metadata.
-
src/main.rs – The main entry point of your Rust application.
2. Add Dependencies
Next, open the Cargo.toml
file and add the following dependencies under the [dependencies]
section:
[dependencies]
linfa = "0.8.0"
linfa-datasets = "0.8.0"
linfa-trees = "0.8.0"
Here’s what each crate does:
-
linfa: The core machine learning toolkit for Rust, providing traits, data structures, and model-building utilities.
-
linfa-datasets: A companion crate that includes built-in datasets like Iris and utilities for loading your own.
-
linfa-trees: Implements decision tree models, which we’ll use to classify Iris flowers.
-
ndarray: Provides powerful N-dimensional array data structures similar to NumPy arrays in Python.
After saving the file, run:
cargo build
This command downloads and compiles all dependencies. The first build might take a few minutes, depending on your internet connection and system performance.
3. Project Verification
Once the build completes successfully, test that everything works by running:
cargo run
You should see the default “Hello, world!” message. This means your Rust environment is properly configured, and the dependencies are installed correctly.
4. Ready for Machine Learning
At this point, your Rust project is ready to handle datasets and build ML models.
In the next section, we’ll take a closer look at how machine learning works in Rust and the key crates that make it possible.
Understanding Machine Learning in Rust
Before we dive into coding, let’s take a moment to understand how machine learning fits into the Rust ecosystem. While Rust is still relatively new in the data science world, it’s rapidly gaining attention for its performance, safety, and reliability — three critical pillars for production-grade machine learning systems.
Why Machine Learning in Rust?
Rust brings unique strengths to the ML space that set it apart from traditional languages like Python or R:
-
Performance: Rust compiles to native code and delivers C/C++-level speed — ideal for computation-heavy ML workloads.
-
Memory Safety: Rust’s ownership system eliminates entire classes of bugs (like null pointer dereferences or data races) without a garbage collector.
-
Concurrency: Rust makes it easier to write safe, concurrent ML applications that can scale efficiently across multiple threads.
-
Integration: It’s easy to integrate Rust models or computations into existing Python or C++ systems, thanks to excellent FFI (Foreign Function Interface) support.
In short, Rust is not just about writing fast code, but writing safe and maintainable code — a huge advantage in machine learning pipelines that often process massive datasets.
The Rust ML Ecosystem
Although smaller compared to Python’s ecosystem, Rust’s machine learning landscape is growing fast. Here are some of the most notable crates:
-
Linfa: A suite of Rust ML algorithms inspired by scikit-learn. It offers a consistent API for tasks like classification, clustering, and regression.
-
SmartCore: Another powerful ML library implementing algorithms such as logistic regression, random forests, and support vector machines.
-
Burn: A deep learning framework for building neural networks with GPU acceleration, similar to PyTorch.
-
tch-rs: Rust bindings for PyTorch’s C++ API — useful for leveraging existing deep learning models.
-
ndarray: The Rust equivalent of NumPy, providing N-dimensional arrays and numerical operations.
For this tutorial, we’ll focus on Linfa, as it provides a beginner-friendly, modular approach to machine learning in Rust — perfect for learning the basics.
How Rust Handles Data
In Rust, datasets are typically represented using the ndarray
crate. It provides two key structures:
-
Array1
for one-dimensional arrays (vectors) -
Array2
for two-dimensional arrays (matrices)
For example:
use ndarray::array;
let x = array![[1., 2.], [3., 4.], [5., 6.]];
println!("{:?}", x);
This data structure is used internally by ML crates like Linfa to represent features and targets.
Building a Simple ML Model
Now it’s time to get hands-on and build your first machine learning model in Rust. We’ll use the Iris dataset, a classic dataset for beginners in ML, and train a Decision Tree classifier to predict the species of iris flowers based on their sepal and petal measurements.
1. About the Iris Dataset
The Iris dataset contains 150 samples of iris flowers, each with four features:
-
Sepal length
-
Sepal width
-
Petal length
-
Petal width
Each sample belongs to one of three species:
-
Iris setosa
-
Iris versicolor
-
Iris virginica
Our goal is to train a model that can accurately predict the flower’s species from its measurements.
2. Load and Explore the Dataset
Open the src/main.rs
file and replace its content with the following:
use linfa::prelude::*;
use linfa_datasets::iris;
use linfa_trees::DecisionTree;
use std::collections::HashSet;
fn main() {
// Load the Iris dataset
let dataset = iris();
// Display basic dataset information
println!("Number of samples: {}", dataset.nsamples());
println!("Number of features: {}", dataset.nfeatures());
let unique_targets: HashSet<_> = dataset.targets().iter().cloned().collect();
println!("Target names: {:?}", unique_targets);
}
Run the program with:
cargo run
You should see output similar to:
Number of samples: 150
Number of features: 4
Target names: {0, 2, 1}
This confirms that the dataset is successfully loaded and ready for training.
3. Split the Data into Training and Test Sets
To evaluate our model properly, we’ll split the dataset into training and testing portions — typically 80% for training and 20% for testing.
Add this to your code after loading the dataset:
// Split the dataset into 80% training and 20% testing
let (train, test) = dataset.split_with_ratio(0.8);
println!("Training samples: {}", train.nsamples());
println!("Testing samples: {}", test.nsamples());
4. Train the Decision Tree Model
Now we can create and train a decision tree classifier using the training data:
// Train the Decision Tree model
let model = DecisionTree::params().fit(&train).unwrap();
The .fit()
method trains the model based on the provided training dataset.
5. Make Predictions
Once the model is trained, we can use it to make predictions on the unseen test data:
// Predict using the test data
let predictions = model.predict(&test);
6. Evaluate Model Accuracy
Finally, we’ll check how accurate our model is using the confusion matrix, which compares predicted and actual labels:
// Evaluate model accuracy
let cm = predictions.confusion_matrix(&test).unwrap();
let accuracy = cm.accuracy();
println!("Model accuracy: {:.2}%", accuracy * 100.0);
7. Full Code Example
Here’s the complete src/main.rs
file:
use linfa::prelude::*;
use linfa_datasets::iris;
use linfa_trees::DecisionTree;
use std::collections::HashSet;
fn main() {
// Load the Iris dataset
let dataset = iris();
// Display basic dataset information
println!("Number of samples: {}", dataset.nsamples());
println!("Number of features: {}", dataset.nfeatures());
let unique_targets: HashSet<_> = dataset.targets().iter().cloned().collect();
println!("Target names: {:?}", unique_targets);
// Split the dataset into 80% training and 20% testing
let (train, test) = dataset.split_with_ratio(0.8);
println!("Training samples: {}", train.nsamples());
println!("Testing samples: {}", test.nsamples());
// Train the Decision Tree model
let model = DecisionTree::params().fit(&train).unwrap();
// Predict using the test data
let predictions = model.predict(&test);
// Evaluate model accuracy
let cm = predictions.confusion_matrix(&test).unwrap();
let accuracy = cm.accuracy();
println!("Model accuracy: {:.2}%", accuracy * 100.0);
}
With this, you’ve successfully trained and evaluated your first machine learning model in Rust! 🎉
Running and Testing the Machine Learning Model
At this point, your Rust machine learning project is fully set up, and the model code is ready. Now, let’s run and test it to see everything in action.
1. Build the Project
First, compile the project to ensure all dependencies are resolved and there are no syntax issues:
cargo build
If everything goes well, you should see output similar to this:
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.13s
This means the project was built successfully.
2. Run the Application
Now, execute your Rust program using:
cargo run
If you’re following the example from the previous section, the output should look something like:
Number of samples: 150
Number of features: 4
Target names: {0, 2, 1}
Training samples: 120
Testing samples: 30
Model accuracy: 70.00%
✅ What’s happening here:
-
The program loads the Iris dataset using Linfa.
-
It trains a K-Nearest Neighbors (KNN) classifier.
-
Then it evaluates the model’s accuracy and displays the confusion matrix.
You’ve just trained and tested a fully working ML model in Rust — impressive!
3. Try Running in Release Mode
To see how fast Rust can really be, run it in release mode:
cargo run --release
Rust’s optimized compiler will make the training and prediction lightning fast — especially compared to interpreted languages like Python.
4. Debugging Tips
If you encounter any errors or unexpected results:
-
Make sure your dependencies in
Cargo.toml
match exactly:[dependencies] linfa = "0.8.0" linfa-datasets = { version = "0.8.0", features = ["iris"] }
-
Run
cargo clean && cargo build
to rebuild everything from scratch. -
Use
println!
generously to inspect data shapes and contents.
5. Testing Your Model
You can add a simple unit test to verify that the model’s accuracy meets a minimum threshold:
#[cfg(test)]
mod tests {
use super::*;
use linfa_datasets::iris;
use linfa_nn::{ LinearSearch, NearestNeighbour, distance::L2Dist };
use ndarray::Array1;
use std::collections::HashMap;
#[test]
fn test_knn_accuracy() {
let dataset = iris();
let (train, valid) = dataset.split_with_ratio(0.8);
let train_records = train.records().to_owned();
let train_targets = train.targets().to_owned();
let valid_records = valid.records();
let nn_index = LinearSearch::new()
.from_batch(&train_records, L2Dist)
.expect("Failed to build index");
let mut predictions = Vec::with_capacity(valid_records.nrows());
for sample in valid_records.outer_iter() {
let neighbours = nn_index.k_nearest(sample, 3).unwrap();
let mut votes = HashMap::new();
for (_pt, idx) in neighbours {
*votes.entry(train_targets[idx]).or_insert(0) += 1;
}
let predicted = votes
.into_iter()
.max_by_key(|(_, c)| *c)
.unwrap().0;
predictions.push(predicted);
}
let pred_ds = Dataset::new(valid_records.to_owned(), Array1::from(predictions));
let acc = pred_ds.confusion_matrix(&valid).unwrap().accuracy();
assert!(acc > 0.8, "Model accuracy too low: {}", acc);
}
}
Run the test with:
cargo test
If the model performs as expected, you’ll see:
running 1 test
test tests::test_knn_accuracy ... ok
✅ At this point, you’ve:
-
Successfully built and run an ML model in Rust.
-
Evaluated its performance.
-
Written a test to validate it automatically.
Conclusion and Next Steps
In this tutorial, you learned how to implement a K-Nearest Neighbors (KNN) algorithm in Rust using the linfa
ecosystem, specifically the linfa_nn
crate for nearest-neighbor searches. You went through the essential steps — setting up dependencies, preparing and normalizing data, training the model, and making predictions using Euclidean distance (L2Dist
).
By following this guide, you’ve gained insight into:
-
How to structure a machine learning workflow in Rust.
-
How to use
ndarray
for handling datasets efficiently. -
How to perform nearest-neighbor queries with
linfa_nn::LinearSearch
.
The Linfa framework aims to provide a comprehensive suite of machine learning algorithms in Rust, and the KNN example is an excellent first step in exploring its power and simplicity.
Next Steps
To continue learning and expanding your skills, here are a few directions to explore:
-
Use a real dataset — try loading data from CSV files using the
linfa-datasets
crate. -
Experiment with distance metrics — explore
linfa_nn::distance::CosineDist
orlinfa_nn::distance::ManhattanDist
. -
Tune parameters — adjust
k
to find the optimal number of neighbors for better accuracy. -
Integrate with other Linfa modules — such as
linfa-bayes
,linfa-trees
, orlinfa-clustering
to build more complex ML pipelines. -
Visualize results — integrate Rust with Python via
PyO3
or output data for visualization in Jupyter notebooks.
Rust’s safety, concurrency, and performance make it a great choice for machine learning projects that require speed and reliability. With Linfa, Rust is quickly becoming a viable language for data science and AI development.
You can get the full source code on our GitHub.
That's just the basics. If you need more deep learning about Rust, you can take the following cheap course:
- Learn to Code with Rust
- Rust: The Complete Developer's Guide
- Master The Rust Programming Language : Beginner To Advanced
- Embedded Rust Development with STM32: Absolute Beginners
- Build an AutoGPT Code Writing AI Tool With Rust and GPT-4
- Rust Programming Bootcamp - 100 Projects in 100 Days
- Learn Rust by Building Real Applications
- Building web APIs with Rust (advanced)
- Developing P2P Applications with Rust
- Real time web applications in Rust
Thanks!