- 2023-cur.:
Developed a prototype of the Collective Knowledge playground
to collaboratively benchmark and optimize AI, ML, and other emerging applications
in an automated and reproducible way via open challenges.
I used the following technologies: Streamlit; PyTorch/ONNX/TF/TFLite/TVM; Nvidia/Intel/AMD/Qualcomm hardware and DSPs;
CK2/CM automation; Python; MLPerf benchmarks (a dashboard sketch follows below).
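A minimal sketch of how such a playground dashboard can be assembled with Streamlit; the file playground_app.py, submissions.csv, and its columns are illustrative assumptions, not the actual playground code:

```python
# playground_app.py - illustrative sketch of a live benchmark leaderboard in
# Streamlit; the file submissions.csv and its columns are hypothetical, not
# the real playground schema.
import pandas as pd
import streamlit as st

st.title("Collective Knowledge playground: open optimization challenges")

# Hypothetical CSV with one row per community submission.
results = pd.read_csv("submissions.csv")  # columns: model, device, latency_ms, accuracy

model = st.selectbox("Model", sorted(results["model"].unique()))
subset = results[results["model"] == model]

# Leaderboard sorted by speed, plus an accuracy-vs-latency trade-off plot.
st.dataframe(subset.sort_values("latency_ms"))
st.scatter_chart(subset, x="latency_ms", y="accuracy")
```

Such an app runs locally with `streamlit run playground_app.py`.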
- 2020-cur.:
Developed a prototype of the cKnowledge.io platform to organize all knowledge
about AI, ML, systems, and other innovative technologies from my academic and industrial partners
in the form of portable CK workflows, automation actions, and reusable artifacts.
I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks
from data centers and supercomputers to mobile phones and edge devices
in terms of speed, accuracy, energy, and various costs.
I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers
and accelerate their adoption in production.
I collaborate with MLPerf.org to automate and simplify ML&systems benchmarking
and fair comparison based on the CK concept and DevOps/MLOps principles.
I used the following technologies: Linux/Windows/Android; Python/JavaScript/CK; apache2; flask/django; ElasticSearch;
GitHub/GitLab/BitBucket; REST JSON API; Travis CI/AppVeyor CI; DevOps; CK-based knowledge graph database;
TensorFlow; Azure/AWS/Google cloud/IBM cloud (the REST JSON convention is sketched below).
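A short sketch of the dict-in/dict-out REST JSON convention behind such a platform; the endpoint URL and payload fields below are hypothetical and only convey the idea:

```python
# Illustrative sketch of a CK-style REST JSON API call; the endpoint and
# payload fields are hypothetical and only convey the dict-in/dict-out
# convention used across CK.
import requests

payload = {
    "action": "search",          # CK-style automation action
    "module_uoa": "experiment",  # type of CK entry to query
    "tags": "mlperf,image-classification",
}

r = requests.post("https://cKnowledge.io/api/v1/ck", json=payload, timeout=30)
r.raise_for_status()

result = r.json()
if result.get("return", 0) > 0:   # CK convention: non-zero "return" is an error
    raise RuntimeError(result.get("error"))
for entry in result.get("lst", []):
    print(entry.get("data_uoa"))
```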
- 2018-cur.:
Enhanced and stabilized all main CK components
(software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization)
successfully used by dividiti to automate MLPerf benchmark submissions.
I used the following technologies: Linux/Windows/Android; CK/Python/JavaScript/C/C++;
statistical analysis; Matplotlib/numpy/pandas/Jupyter notebooks;
GCC/LLVM; TensorFlow/PyTorch; main AI algorithms, models, and data sets for image classification and object detection;
Azure/AWS/Google cloud/IBM cloud; mobile phones/edge devices/servers;
Nvidia GPU/EdgeTPU/x86/Arm architectures (a statistical-analysis example follows below).
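A small example of the kind of statistical analysis applied to repeated benchmark measurements before comparing systems; the data is synthetic and the column names are illustrative:

```python
# Sketch of the statistical post-processing applied to repeated benchmark
# runs; the input data is synthetic and the column names are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Pretend we measured 20 repetitions of the same workload on two devices.
runs = pd.DataFrame({
    "device": ["gpu"] * 20 + ["cpu"] * 20,
    "latency_ms": np.concatenate([
        rng.normal(12.0, 0.4, 20),   # GPU: faster, low variance
        rng.normal(85.0, 3.5, 20),   # CPU: slower, noisier
    ]),
})

# Report robust statistics per device: min, median, and spread, as used to
# detect unstable measurements before comparing systems.
stats = runs.groupby("device")["latency_ms"].agg(["min", "median", "std"])
stats["variation_%"] = 100 * stats["std"] / stats["median"]
print(stats)
```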
- 2017-2018:
Developed CK workflows
and live dashboards for
the 1st open ACM REQUEST tournament
to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs.
We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC;
ImageNet;
MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet;
MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM;
Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud, and Microsoft Azure (a Pareto-filter sketch follows below).
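A minimal sketch of the Pareto filter at the core of such multi-objective co-design tournaments; the stacks and their numbers below are synthetic:

```python
# Minimal sketch of a Pareto filter over multi-objective benchmark results,
# as used to rank SW/HW stacks by speed, accuracy, and cost. Data is synthetic.

def dominates(a, b):
    """True if configuration a is no worse than b everywhere and better
    somewhere. Objectives: (latency_ms, error_rate, cost_usd), all minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only configurations not dominated by any other configuration."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (latency_ms, error_rate, cost_usd) for four hypothetical AI/SW/HW stacks.
stacks = [(20.0, 0.25, 1.0), (35.0, 0.20, 0.8), (22.0, 0.30, 1.2), (50.0, 0.18, 0.5)]
print(pareto_front(stacks))  # (22.0, 0.30, 1.2) is dominated by (20.0, 0.25, 1.0)
```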
- 2017-2018:
Developed an example of an auto-generated and reproducible paper
with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
(a collaboration with the Raspberry Pi Foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran;
MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning (an autotuning-loop sketch follows below).
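A simplified sketch of the autotuning loop behind such experiments, reduced here to random search over GCC flags with a single run-time objective; "susan.c" and "input.pgm" are hypothetical stand-ins for a cBench-style workload:

```python
# Simplified autotuning loop: try random GCC flag combinations and keep the
# fastest binary. Real experiments add multiple objectives and ML to prune
# the search space.
import random
import subprocess
import time

FLAGS = ["-O3", "-funroll-loops", "-ftree-vectorize", "-ffast-math", "-fomit-frame-pointer"]

def measure(flags):
    """Compile with the given flags and return the measured run time."""
    subprocess.run(["gcc", *flags, "susan.c", "-o", "susan"], check=True)
    start = time.perf_counter()
    subprocess.run(["./susan", "input.pgm", "output.pgm", "-s"], check=True)
    return time.perf_counter() - start

best = None
for _ in range(30):  # 30 random points in the flag space
    flags = [f for f in FLAGS if random.random() < 0.5] or ["-O2"]
    elapsed = measure(flags)
    if best is None or elapsed < best[0]:
        best = (elapsed, flags)

print("best time and flags:", best)
```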
- 2015-cur.:
Developed the Collective Knowledge framework (CK)
to help the community
automate typical tasks in ML&systems R&D,
provide a common format, APIs, and meta descriptions for shared research projects,
enable portable workflows,
and improve the reproducibility and reusability of computational research.
We now use it to automate benchmarking, optimization, and co-design of AI/ML/SW/HW stacks
in terms of speed, accuracy, energy and other costs across diverse platforms
from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/edge devices; Python/C/C++/Java; ICC/GCC/LLVM;
JSON/REST API; DevOps; plugins; apache2; Azure cloud; client/server architecture; NoSQL database (ElasticSearch);
GitHub/GitLab/BitBucket; Travis CI/AppVeyor CI;
main math libraries, DNN frameworks, models, and datasets (the CK API is sketched below).
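A minimal sketch of the CK dict-in/dict-out Python API, based on the public CK documentation; the exact actions, parameters, and output fields vary across CK versions and assume the corresponding CK repositories have been pulled:

```python
# Sketch of the CK dict-in/dict-out Python API (pip install ck). Based on the
# public CK documentation; exact actions and output fields may differ across
# CK versions and require the corresponding CK repositories.
import ck.kernel as ck

# Every call takes a dict with an "action" and returns a dict whose "return"
# key is 0 on success - mirroring the "ck <action> <module>:<entry>" CLI.
r = ck.access({"action": "compile",
               "module_uoa": "program",
               "data_uoa": "cbench-automotive-susan"})
if r["return"] > 0:
    raise RuntimeError(r["error"])

r = ck.access({"action": "run",
               "module_uoa": "program",
               "data_uoa": "cbench-automotive-susan"})
if r["return"] > 0:
    raise RuntimeError(r["error"])
print(r.get("misc", {}))  # run statistics; field names depend on the CK version
```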
- 2012-2014:
Prototyped the Collective Mind framework, the prequel to CK.
I focused on web services, but it turned out that my users wanted a basic CLI-based framework.
This feedback motivated me to develop the simpler, CLI-based CK framework.
- 2010-2011:
Helped to create KDataSets (1000 data sets for CPU benchmarks; PLDI paper, repo).
- 2008-2010:
Developed a machine-learning-based self-optimizing compiler connected with cTuning.org
in collaboration with IBM, ARC (Synopsys), Inria, and the University of Edinburgh.
This technology is considered the world's first ML-based self-optimizing compiler.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog;
semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning; plugins; client/server architecture (the core prediction idea is sketched below).
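The core prediction idea, sketched with a nearest-neighbor model: the features and flag labels below are synthetic stand-ins for MILEPOST GCC code features and the best flags found by crowd-tuning:

```python
# Sketch of the MILEPOST idea: predict good optimizations for a new program
# from its nearest neighbors in program-feature space. Features and labels
# are synthetic stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rows: programs; columns: static code features (e.g. counts of basic blocks,
# branches, memory accesses) extracted by the compiler.
features = np.array([
    [120, 30, 400],
    [115, 28, 390],
    [900, 210, 50],
    [880, 205, 60],
])
# Best-found flag combination for each training program.
best_flags = ["-O3 -funroll-loops", "-O3 -funroll-loops", "-O2", "-O2"]

model = KNeighborsClassifier(n_neighbors=1).fit(features, best_flags)

new_program = np.array([[118, 29, 410]])  # features of an unseen program
print(model.predict(new_program))         # -> ['-O3 -funroll-loops']
```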
- 2008-2009:
Added the function cloning process to GCC to enable run-time adaptation for statically compiled programs (report).
- 2008-2009:
Developed the Interactive Compilation Interface,
now available in mainline GCC (a collaboration with Google and Mozilla).
- 2008-cur.:
Developed the cTuning.org portal
to crowdsource the training of the ML-based MILEPOST compiler
and automate SW/HW co-design, similar to SETI@home. See the press releases from IBM
and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2;
client/server architecture; KNN/SVM/decision trees; plugins.
- 2009-2010:
Created cBench (a collaborative CPU benchmark to support autotuning R&D)
and connected it with my cTuning infrastructure from the MILEPOST project.
- 2005-2009:
Created MiDataSets, multiple data sets for MiBench (20+ data sets per benchmark, 400 in total), to support autotuning R&D.
- 1999-2004:
Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins
to integrate autotuning/benchmarking tools and techniques from other partners.
- 1999-2001:
Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC, used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS (the key loop transformation is illustrated below).
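For illustration, the kind of loop tiling such a compiler applies to improve cache locality, sketched in Python (the real tool rewrote C/Fortran sources):

```python
# Tiled matrix multiply: iterate over TILE x TILE blocks so each block stays
# in cache before moving on, instead of streaming whole rows and columns.
N, TILE = 8, 4
A = [[float(i + j) for j in range(N)] for i in range(N)]
B = [[float(i - j) for j in range(N)] for i in range(N)]
C = [[0.0] * N for _ in range(N)]

for ii in range(0, N, TILE):
    for jj in range(0, N, TILE):
        for kk in range(0, N, TILE):
            for i in range(ii, ii + TILE):
                for j in range(jj, jj + TILE):
                    for k in range(kk, kk + TILE):
                        C[i][j] += A[i][k] * B[k][j]
```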
- 1998-1999:
Developed a web-based service to automate the submission and execution of tasks on supercomputers via the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
- 1993-1998:
Developed an analog semiconductor neural network accelerator (Hopfield architecture).
My R&D tasks included the NN design and simulation, the development of an electronic board connected to a PC to experiment with the semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for the NN implementation; MPI for distributed training; PSpice for electronic circuit simulation;
ADC, DAC, and LPT to measure the semiconductor NN and communicate with a PC; Visual Basic to visualize experiments (a minimal Hopfield sketch follows below).
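A minimal software sketch of the Hopfield architecture that the hardware implemented: Hebbian training on bipolar patterns, then iterative recall from a noisy cue (the patterns are synthetic):

```python
# Minimal Hopfield network: Hebbian training on bipolar patterns, then
# synchronous recall from a noisy cue. Patterns are synthetic.
import numpy as np

# Two stored bipolar patterns (orthogonal for clean recall).
patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
])

# Hebbian rule: sum of outer products, zero self-connections.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

# Recall from a noisy cue: the first pattern with one bit flipped.
state = np.array([-1, -1, 1, -1, 1, -1, 1, -1])
for _ in range(10):
    new_state = np.where(W @ state >= 0, 1, -1)  # synchronous update
    if np.array_equal(new_state, state):
        break
    state = new_state

print(state)  # recovers the first stored pattern
```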
- 1991-1993:
Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for window management.