What is ZKML?
Zero-knowledge machine learning (ZKML) is a fascinating merge of two seemingly unrelated worlds – zero-knowledge proofs and machine learning.
Both research fields stand on their own, but their fusion brings us closer to a decentralized world. It is not just a clash of buzzwords but a meaningful synergy between widely used machine learning algorithms and the somewhat mysterious niche of zero-knowledge proofs.
Machine learning algorithms enable machines to learn model parameters from data and then produce predictions, or even generate new data objects drawn from distributions learned during training. While their usefulness is undeniable, the catch is the hardware resources and the data volume required to train models with many parameters.
To overcome this problem, cloud services emerged, offering a convenient arrangement: the service hosts high-end hardware, while the user submits the data, pays a fee, and receives the model's inference or generation results.
This is an acceptable solution. However, there are no guarantees about which model generated the results, nor whether the results are genuinely correct answers or just random values.
Cloud companies treat their model parameters as a competitive asset and are rarely willing to show them to anyone, so revealing the parameters is out of the question. To make things even more problematic, deep learning models such as deep neural networks may have millions of parameters that are practically impossible to interpret, acting as a black box even for a user who can see them.
How can we ensure that the results received from the API are genuinely the results returned from the claimed (and charged) model?
ZK to the Rescue!
We thought it over quite thoroughly at 3327, and the answer is ZK. Zero-knowledge algorithms prove that a claimed result was generated from a given set of inputs by following a strictly defined algorithm.
That is precisely what we need: proof that the results received from an ML model were generated by the model to which the service provider originally committed.
It may seem like the solution is there, and our work is done, but the reality is that the can of worms just started to open.
The first punch in the face comes from the fact that zero-knowledge proofs operate over (unsigned) integer arithmetic in a finite field, while model parameters are most commonly signed real numbers.
There are multiple ways to convert signed real values to unsigned integers, and each comes with a tradeoff, mainly between numerical precision and the efficiency of arithmetic operations on the chosen representation.
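One common approach is fixed-point encoding: scale each real value by a power of two, round, and reduce modulo the proof system's field. The sketch below illustrates the idea and its cost; the modulus and scale factor are illustrative choices, not the values any particular ZK backend (or SKProof) uses.

```python
# Sketch: fixed-point encoding of signed reals as unsigned integers mod a field.
# FIELD and SCALE are illustrative; a real system uses the proof field's modulus.

FIELD = 2**61 - 1   # example modulus standing in for the proof field
SCALE = 2**16       # 16 fractional bits of fixed-point precision

def encode(x: float) -> int:
    """Map a signed real to an unsigned integer mod FIELD."""
    return round(x * SCALE) % FIELD

def decode(v: int) -> float:
    """Invert the mapping; values above FIELD // 2 represent negatives."""
    signed = v - FIELD if v > FIELD // 2 else v
    return signed / SCALE

def mul(a: int, b: int) -> int:
    """Fixed-point multiply: rescale the product to keep SCALE fractional bits.
    The rescaling step is extra work inside a circuit -- the efficiency tradeoff."""
    sa = a if a <= FIELD // 2 else a - FIELD
    sb = b if b <= FIELD // 2 else b - FIELD
    return ((sa * sb) // SCALE) % FIELD
```

A larger `SCALE` gives more precision but makes the rescaling after every multiplication more expensive in constraints, which is exactly the precision-versus-efficiency tradeoff described above.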
Our Proof of Concepts (PoCs)
The next step is creating a ZK circuit to prove the results of an ML model. We chose to prove the inference of a multilayer perceptron (MLP) neural network, a general-purpose model with an adequate level of complexity for a PoC.
As Python is the most commonly used language in ML, the goal was to enable the direct generation of execution proofs from Python. The result was the SKProof library, compatible with the popular scikit-learn Python ML library, specifically with its MLPClassifier model.
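To see what such a circuit must encode, it helps to spell out MLP inference as plain arithmetic: each layer is a matrix-vector product, a bias addition, and an activation. The pure-Python sketch below (with made-up weights, not a trained model and not SKProof's API) shows the computation that the proof attests to:

```python
# Sketch of the computation an MLP-inference circuit must encode:
# per layer, a matrix-vector product, a bias add, and an activation.

def relu(x):
    return [max(0.0, v) for v in x]

def dense(w, b, x):
    """One fully connected layer: w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def mlp_forward(layers, x):
    """Apply each (weights, biases) pair, with ReLU between hidden layers."""
    for i, (w, b) in enumerate(layers):
        x = dense(w, b, x)
        if i < len(layers) - 1:   # no activation on the final layer here
            x = relu(x)
    return x

# A tiny 2-2-1 network with illustrative weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.1]),                    # output layer
]
print(mlp_forward(layers, [2.0, 1.0]))  # ≈ [2.6]
```

Every multiplication and ReLU comparison above becomes constraints in the circuit, which is why even a modest MLP produces a sizeable proof workload.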
Our PoC libraries proved to us that it is possible to create a user-friendly extension of an existing ML library that supports ZK proofs for a basic ZKML use case. However, work still needs to be done. The next challenge: performance optimization and parallelization.
Our Current Battles
Although we have successfully built the PoC for a general-purpose ML model architecture, the problem lies in its efficiency: generating proofs takes longer than any of us would like.
This problem is not unique to our libraries; other brothers-in-arms face the same issues. We are now exploring the potential of hardware-accelerated proofs, Rust libraries such as Arkworks to fine-tune our constraint systems, and parallelization of proof generation.
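The parallelization idea, in its simplest form, is to prove independent pieces of the computation (for example, individual layers) concurrently and combine the per-layer proofs afterwards. A toy sketch of the pattern, where `prove_layer` is a hypothetical placeholder (here just a hash) standing in for an expensive real prover call:

```python
# Sketch: proving layers concurrently. prove_layer is a placeholder, not a
# real prover; the parallelization pattern is the point, not the primitive.

import hashlib
from concurrent.futures import ThreadPoolExecutor

def prove_layer(layer_index: int, witness: bytes) -> bytes:
    """Stand-in 'proof': a hash of the layer witness. A real backend would
    run the expensive ZK prover here."""
    return hashlib.sha256(bytes([layer_index]) + witness).digest()

def prove_model(witnesses: list[bytes]) -> list[bytes]:
    """Prove all layers concurrently, returning per-layer proofs in order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(prove_layer, range(len(witnesses)), witnesses))
```

A real CPU-bound prover would use process-level parallelism or a backend that releases the GIL; threads are used here only to keep the sketch short and portable.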
We believe tackling the problem from multiple sides will yield the desired results!
Potential use cases
We have given a brief overview of the problem in the ML domain where ZK can come in handy, but so far you may have seen this technology only in theoretical use cases and quirky toy projects. Its true potential lies elsewhere.
ZKML opens the door to decentralizing the ML industry: any node in a network can sell its available resources to perform ML-related tasks, have the results verified, and get paid for its work in a trustless manner.
Businesses with sensitive data, such as healthcare, could sell ML services based on their private data without revealing the raw data to anyone.
The same paradigm can be generalized to selling any computational model. This generalization enables the creation of new marketplaces backed by blockchain smart contracts. The potential is enormous, and we are heading towards it!
We’ve gone deep into the problem of ZKML here at 3327 and have come to understand the fundamentals and intricate details of this fascinating field of research.
We also provide consultancy, system design, and implementation services, so if you have any questions related to ZK or ZKML, feel free to contact us; we will do our best to help you. Check out 3327.io.
This blog was brought to you by Aleksandar Veljkovic, Senior Researcher at our Research and Development Department — 3327.
3327 was funded by the Ethereum Foundation and works heavily with ZK technology. If you are working on a ZK project and need development support, make sure to reach out to us.