Grigori Fursin
Co-designing more efficient and cost-effective AI systems at FlexAI | supporting open science, reproducibility and automation at ACM, IEEE and MLCommons | ex VP of MLOps at OctoAI (now Nvidia) | ex co-director of the Intel Exascale Lab | ex senior tenured scientist at INRIA | ex adjunct professor at the University of Paris-Saclay | PhD from the University of Edinburgh
My name is Grigori Fursin, and I currently live in the suburbs of Paris.
I was among the first researchers to pioneer the use of AI and ML to modernize computer systems—including compilers, run-time systems,
software, and hardware—contributing to more efficient, cost-effective, and scalable solutions for AI, ML, and other emerging workloads while managing
their growing complexity and reducing time to market. I have also been an active advocate for open science, reproducibility, and open-source contributions since 2008,
when I released all my research code, data, models, and experiments for our ML-based self-optimizing compiler to foster collaborative and reproducible R&D in co-designing
more efficient AI and ML systems (see my ACM TechTalk'21).
After serving as a senior tenured research scientist at INRIA, an adjunct
professor at the University of Paris-Saclay, and co-director of the Intel
Exascale Lab, I transitioned my research and open-source tools into
industry. I founded several successful companies in the fields of
performance optimization and knowledge management, the most recent of which
was acquired by OctoAI (now Nvidia), where I served as VP of MLOps.
As part of my community service, I helped establish artifact evaluation and reproducibility initiatives
at ACM and IEEE conferences, and introduced a unified artifact appendix
adopted by ASPLOS, CGO, PPoPP, Supercomputing, MICRO, and other conferences. I also contributed to setting up the Intel Exascale Lab, the non-profit cTuning Foundation,
the educational Collective Knowledge initiative and MLCommons
to accelerate AI innovation for the benefit of all. I was honored to receive the ACM CGO Test of Time Award, multiple Best Paper Awards,
the INRIA Award for Scientific Excellence, and the EU HiPEAC Technology Transfer Award for my research. I'm also very glad that my
open-source technology helps many companies and organizations, including MLCommons.
With an interdisciplinary background in computer systems, compilers, machine learning, physics, and electronics—along with over 20 years of experience in research, development,
and industry—I help companies, startups, universities and non-profits establish R&D labs and launch innovative projects with solid methodologies for collaborative and reproducible R&D.
I also regularly share my knowledge, expertise, and wisdom with students,
researchers, businesses, governments, and investors, helping them navigate
the complex and rapidly evolving deep-tech landscape, avoid common
pitfalls, and drive meaningful progress.
Please check a few recent presentations and publications if you want to learn more about my current and past activities:
ACM TechTalk'21,
Google Scholar,
keynote at ACM REP'23,
arXiv white paper'24,
overview in Philosophical Transactions of the Royal Society'21,
arXiv paper'17 about ML-based compiler auto-tuning,
and my reproducibility initiatives at ML and Systems conferences since 2014.
My current activities:
- head of FlexAI Cloud Services Labs, leading efforts to co-design more efficient and cost-effective systems for AI, ML,
and other emerging workloads.
- organizer of reproducibility initiatives and artifact evaluation for AI, ML and Systems conferences
and MLPerf benchmarks in collaboration with ACM, IEEE and MLCommons since 2013. I am leading the development
of a common interface and automation language to make it easier to rerun and reuse code, data and experiments from published papers -
see my ACM Tech Talk'21,
ACM REP'23 keynote and white paper'24 for more details.
- member of the Program Committee at ACM Conference on Reproducibility and Replicability 2025.
- founder and architect of the Collective Knowledge Playground
- an educational initiative to learn how to co-design software and hardware
to run AI, ML and other emerging workloads in the most efficient and cost-effective way across diverse models, datasets, software and hardware
(trading off performance, power consumption, accuracy, cost and other characteristics) in collaboration with MLCommons, cTuning and other organizations.
Please check this arXiv white paper.
Brief summary of my past activities:
- founder and co-chair of the MLCommons Task Force on Automation and Reproducibility to modularize and automate MLPerf benchmarks using my CM framework (white paper);
- author and tech lead of the Collective Mind workflow automation framework (CM)
adopted by MLCommons and the Autonomous Vehicle Computing Consortium (AVCC)
to modularize MLPerf benchmarks and make it easier to run them across diverse models, data sets, software and hardware from different
vendors using portable, reusable and technology-agnostic automation recipes
(see online catalog of MLOps and MLPerf scripts
and online docs to run MLPerf inference benchmarks).
I donated this open-source technology to MLCommons to benefit everyone
and continue developing it as a community effort.
You can learn more about this project in this white paper; a short Python sketch of the CM interface also follows this list.
In 2025, we split CM development into an extended version (CMX)
and a simplified version of CM for MLPerf.
I thank our great contributors
for their feedback and support.
- vice president of MLOps at OctoML, where I prototyped the first version of CM and CM4MLOps together with the cTuning Foundation before donating it to MLCommons to benefit everyone;
- founder and chief architect of cKnowledge.io acquired by OctoML;
- author of the Collective Knowledge technology (CK)
powering cKnowledge.io;
- author of the Artifact Evaluation and Reproducibility checklist (Unified Artifact Appendix) for ACM/IEEE conferences
(see example of my artifact appendix at the end of this ASPLOS'24 paper "PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation");
- co-founder of the CodeReef platform for universal MLOps with Nicolas Essayan;
- founder in residence at Entrepreneur First;
- co-director of the Intel Exascale Lab and tech lead for performance analysis, optimization and co-design of high-performance
and cost-effective computer systems;
- senior tenured scientist at INRIA developing the foundations to co-design more efficient and cost-effective computer systems using auto-tuning, machine-learning and run-time adaptation;
- research associate at the University of Edinburgh;
- holder of a PhD in computer science from the University of Edinburgh with the Overseas Research Student Award (compilers, run-time systems and software/hardware co-design);
- recipient of the European technology transfer award, ACM CGO test of time award and INRIA award of scientific excellence
for my original research to use AI, ML, federated learning and collective tuning (cTuning)
to automate development of high-performance and cost-effective computer systems
and reduce R&D costs and time to market by an order of magnitude.
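To give a concrete flavour of the CM interface mentioned above, here is a minimal Python sketch. It assumes the "cmind" package (installed with "pip install cmind") and the public mlcommons@cm4mlops script catalog; the repository name and the "detect,os" tags are illustrative examples, and real MLPerf recipes use much longer tag lists and options.

    # Minimal sketch of driving CM automation recipes from Python.
    # Assumptions (not taken from this page): the "cmind" package is installed
    # ("pip install cmind") and the mlcommons@cm4mlops repository is available.
    import cmind

    # Pull the shared repository of portable, technology-agnostic automation recipes.
    r = cmind.access({'action': 'pull',
                      'automation': 'repo',
                      'artifact': 'mlcommons@cm4mlops'})
    if r['return'] > 0:
        raise RuntimeError(r.get('error', 'CM error'))

    # Run one simple recipe (detect the host OS). Composite MLPerf recipes chain
    # many such scripts to fetch models and datasets, build backends and run benchmarks.
    r = cmind.access({'action': 'run',
                      'automation': 'script',
                      'tags': 'detect,os'})
    print(r['return'])

The same calls are normally issued from the command line (for example, "cm run script" with the desired tags), which is how the MLPerf automation is usually driven.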
My timeline (may not be up-to-date):
My community service (collaboration with MLCommons, ACM, IEEE, HiPEAC and other organizations)
Academic research (tenured research scientist at INRIA with PhD in CS from the University of Edinburgh)
- I prepared the foundations to combine machine learning, autotuning, knowledge sharing and federated learning
to automate and accelerate the development of efficient software and hardware by several orders of magnitude
(Google Scholar);
- developed Collective Knowledge and Collective Mind technology
and started educational initiatives
with ACM, IEEE, HiPEAC, Raspberry Pi foundation and MLCommons to bring my research and expertise to the real world to benefit everyone;
- prepared and taught an M.S. course at the University of Paris-Saclay on using ML to co-design efficient software and hardware (self-optimizing computing systems);
- gave 100+ invited talks about my R&D;
- honored to receive the ACM CGO test of time award, several best papers awards, INRIA award of scientific excellence and EU HiPEAC technology transfer award.
Project management and open-source development
- 2024-cur: leading the development of the next generation of my Collective Mind and Collective Knowledge technology to help researchers, engineers and students co-design software and hardware for more efficient and cost-effective AI;
- 2023-2024: led the development of the Collective Knowledge playground to benchmark and optimize AI/ML Systems via reproducible optimization challenges and tournaments;
- 2022-2024: led the development of the Collective Mind automation framework (CM) to modularize AI/ML systems and make it easier to benchmark and optimize them across diverse and rapidly evolving models, data sets, software and hardware from different vendors - I donated CK and CM to MLCommons to benefit everyone and continue developing them as a community effort (see white paper);
- 2014-2019: developed the Collective Knowledge framework to automate and accelerate design space exploration of AI/ML/SW/HW stacks while balancing speed, accuracy, energy and costs;
- 2010-2011: led the development of the cTuning 2 automation framework to benchmark emerging workloads across diverse hardware at Intel Exascale Lab;
- 2007-2009: led the development of the ML-based compiler and the cTuning.org platform across 5 teams to automate and crowdsource the optimization of computer systems - this technology is considered the first of its kind in the world;
- 2007-2009: led the development of the compiler plugin framework in collaboration with Google and Mozilla that was added to the mainline GCC powering all Linux-based computers and helped to convert production compilers into research toolsets for machine learning;
- co-founded an engineering company (dividiti) and led it to $1M+ in revenue with Fortune 50 customers using my CK technology; donated CK technology to MLCommons in 2021;
Entrepreneurship
- 2020: founded and developed the cKnowledge.io platform acquired by OctoML.ai (now Nvidia);
- 2019: prototyped CodeReef platform with Nicolas Essayan;
- 2019: was selected for the 2nd Entrepreneur First cohort in Paris to learn how to create startups and avoid numerous pitfalls.
Professional Career
- 2024-cur.: Head of FCS Labs at FlexAI, leading efforts to co-design more efficient and cost-effective AI systems.
- 2023-cur.: Founder of the Collective Knowledge Playground -
a free, open-source and technology-agnostic platform for collaborative benchmarking, optimization and comparison
of AI and ML systems via open and reproducible challenges
powered by my CK/CM technology.
Our technology was successfully validated by the community and MLCommons members
by automating, unifying and reproducing more than 80% of all MLPerf inference benchmark submissions
(and 98% of power results) with very diverse technology from Neural Magic, Qualcomm, Dell, HPE, Lenovo,
Hugging Face, Nvidia, AMD, Intel and Apple across diverse CPUs, GPUs and DSPs with PyTorch, ONNX, QAIC, TF/TFLite,
TVM and TensorRT using popular cloud providers (GCP, AWS, Azure) and individual servers and edge devices provided
by our volunteers and contributors.
- 2021-2023: Vice President at OctoAI (now Nvidia) leading the development of the 2nd generation of my
open-source CK workflow automation technology (aka Collective Mind)
and connecting it with TVM. Our technology was adopted
by MLCommons (125+ AI software and hardware companies)
to modularize AI/ML Systems and automate their development, optimization and deployment from the cloud to the edge.
- 2019-2021: Founder and developer of the cKnowledge.io platform to organize AI, ML
and Systems knowledge and enable efficient computing based on FAIR principles
(acquired by OctoAI (now Nvidia)).
- 2019: Founder in residence at Entrepreneur First learning how to build deep tech startups and MVPs from scratch while avoiding numerous pitfalls and minimizing all risks. This knowledge and experience helped me to meet many amazing people and create the cKnowledge.io platform acquired by OctoML.ai.
- 2015-2019: Co-founder and CTO of a commercial engineering company (dividiti) testing my CK framework in industry.
- 2016-2018: R&D project partner with General Motors (AI/ML/SW/HW co-design project).
- 2017-2018: R&D project partner with the Raspberry Pi foundation (crowd-tuning and machine learning).
- 2015-2016: Subcontractor for Google (performance autotuning and SW/HW co-design).
- 2014-2015: R&D project partner with Arm (EU H2020 TETRACOM project).
- 2012-2014: Tenured Research Scientist (associate professor) at INRIA.
- 2010-2011: Co-director of the Intel Exascale Lab (France) and head of the software/hardware optimization and co-design group (on sabbatical).
- 2007-2010: Guest lecturer at the University of Paris-Sud.
- 2007-2010: Tenured Research Scientist (assistant professor) at INRIA.
- 1999-2006: Research Associate at the University of Edinburgh.
Awards
- 2017: ACM CGO test of time award for my research on ML-based self-optimizing compilers.
- 2016-cur.: Microsoft Azure Research award to support cKnowledge.org.
- 2015: European technology transfer award for my Collective Knowledge automation technology.
- 2012: INRIA scientific excellence award and personal fellowship.
- 2010: HiPEAC award for PLDI paper.
- 2009: HiPEAC award for MICRO paper.
- 2006: CGO best paper award.
- 2000: Overseas research student award for my Ph.D.
Education
- 2004: PhD in computer science with the ORS award from the University of Edinburgh.
- 1999: MS in computer engineering with a gold medal (summa cum laude) from MIPT.
- 1997: BS in electronics, mathematics and machine learning (summa cum laude) from MIPT.
Main scientific contributions
Prepared the foundations, scientific methodology, and tools to automatically co-design software and hardware
from different vendors that can run emerging workloads in a unified and efficient way
in terms of speed, accuracy, energy and other costs.
Professional memberships
Main software developments
- 2023-cur.:
Developed a prototype of the Collective Knowledge playground
to collaboratively benchmark and optimize AI, ML and other emerging applications
in an automated and reproducible way via open challenges - see white paper
for more details.
I used Streamlit; my CM/CM4MLOps/CM4MLPerf/CM4ABTF automation; Python; MLPerf benchmarks.
- 2022-2024:
Prototyped the Collective Mind automation framework
with CM4MLOps automation scripts, CM4MLPerf interface to run MLPerf benchmarks and
CM4ABTF to run automotive benchmarks across diverse models, data sets, software and hardware
from different vendors in a unified and automated way.
I used Python; loadgen; Docker/Podman; HuggingFace/Transformers/PyTorch/ONNX/TF; Ubuntu/Windows/RHEL/MacOS; Nvidia/Intel/AMD/Qualcomm/Arm; AWS/Azure/Scaleway.
- 2020-cur.:
Developed a prototype of the cKnowledge.io platform to organize all knowledge
about AI, ML, systems, and other innovative technology from my academic and industrial partners
in the form of portable CK workflows, automation actions, and reusable artifacts.
I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks
from data centers and supercomputers to mobile phones and edge devices
in terms of speed, accuracy, energy, and various costs.
I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers
and accelerate their adoption in production.
I collaborate with MLPerf.org to automate and simplify ML&systems benchmarking
and fair comparison based on the CK concept and DevOps/MLOps principles.
I used the following technologies: Linux/Windows/Android; Python/JavaScript/CK; apache2; flask/django; ElasticSearch;
GitHub/GitLab/BitBucket;
REST JSON API;
Travis CI/AppVeyor CI;
DevOps;
CK-based knowledge graph database; TensorFlow; Azure/AWS/Google cloud/IBM cloud.
- 2018-cur.:
Enhanced and stabilized all main CK components
(software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization)
successfully used by dividiti to automate MLPerf benchmark submissions.
I used the following technologies: Linux/Windows/Android; CK/Python/JavaScript/C/C++;
statistical analysis; MatPlotLib/numpy/pandas/jupyter notebooks;
GCC/LLVM; TensorFlow/PyTorch; main AI algorithms, models and data sets for image classification and object detection;
Azure/AWS/Google cloud/IBM cloud;
mobile phones/edge devices/servers;
Nvidia GPU/EdgeTPU/x86/Arm architectures.
- 2017-2018:
Developed CK workflows
and live dashboards for
the 1st open ACM REQUEST tournament
to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs.
We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC;
ImageNet;
MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet;
MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM;
Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.
- 2017-2018:
Developed an example of an autogenerated and reproducible paper
with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
(collaboration with the Raspberry Pi foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran;
MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning.
- 2015-cur.:
Developed the Collective Knowledge framework (CK)
to help the community
automate typical tasks in ML&systems R&D,
provide a common format, APIs, and meta descriptions for shared research projects,
enable portable workflows,
and improve reproducibility and reusability in computational research (a short sketch of this unified API follows this entry).
We now use it to automate benchmarking, optimization and co-design of AI/ML/SW/HW stacks
in terms of speed, accuracy, energy and other costs across diverse platforms
from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/Edge devices;
Python/C/C++/Java;
ICC/GCC/LLVM;
JSON/REST API;
DevOps;
plugins;
apache2;
Azure cloud;
client/server architecture;
noSQL database (ElasticSearch);
GitHub/GitLab/BitBucket;
Travis CI/AppVeyor CI;
main math libraries, DNN frameworks, models, and datasets.
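To illustrate the unified API mentioned above, here is a rough Python sketch. It assumes the legacy "ck" package ("pip install ck"); the "list" action and "repo" module are examples of CK's dictionary-in/dictionary-out convention rather than a specific workflow from this page.

    # Rough illustration of the unified CK API (assumption: the legacy "ck"
    # package is installed with "pip install ck"; names below are examples).
    import ck.kernel as ck

    # Every CK call takes a dictionary and returns a dictionary, which makes it
    # easy to chain automation actions into portable, shareable workflows.
    r = ck.access({'action': 'list',
                   'module_uoa': 'repo',
                   'out': 'con'})
    if r['return'] > 0:
        raise RuntimeError(r.get('error', 'CK error'))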
- 2012-2014:
Prototyped the Collective Mind framework - a prequel to CK.
I focused on web services, but it turned out that my users wanted a basic CLI-based framework.
This feedback motivated me to develop the simple CLI-based CK framework.
- 2010-2011:
Helped to create KDataSets (1000 data sets for CPU benchmarks)
(PLDI paper,
repo).
- 2008-2010:
Developed the machine-learning-based self-optimizing compiler connected with cTuning.org
in collaboration with IBM, ARC (Synopsys), INRIA, and the University of Edinburgh. This technology is considered
the first of its kind in the world.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog;
semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning; plugins; client/server architecture.
- 2008-2009:
Added the function cloning process to GCC to enable run-time adaptation for statically-compiled programs
(report).
- 2008-2009:
Developed the interactive compilation interface
now available in mainline GCC (collaboration with Google and Mozilla).
- 2008-cur.:
Developed the cTuning.org portal
to crowdsource the training of the ML-based MILEPOST compiler
and automate SW/HW co-design similar to SETI@home. See press releases from IBM
and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2;
client/server architecture; KNN/SVM/decision trees; plugins.
- 2009-2010:
Created cBench (collaborative CPU benchmark to support autotuning R&D)
and connected it with my cTuning infrastructure from the MILEPOST project.
- 2005-2009:
Created MiDataSets - multiple datasets for MiBench (20+ datasets per benchmark; 400 in total) to support autotuning R&D.
- 1999-2004:
Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins
to integrate autotuning/benchmarking tools and techniques from other partners.
- 1999-2001:
Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS.
- 1998-1999:
Developed a web-based service to automate the submission and execution of tasks on supercomputers via the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
- 1993-1998:
Developed an analog semiconductor neural network accelerator (Hopfield architecture).
My R&D tasks included the NN design, simulation, development of an electronic board connected to a PC to experiment with the semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for NN implementation; MPI for distributed training; PSpice for electronic circuit simulation;
ADC, DAC, and LPT to measure the semiconductor NN and communicate with a PC; Visual Basic to visualize experiments.
- 1991-1993:
Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for Windows management.
My favorite story about Ernest Rutherford and Niels Bohr