- 2023-cur.:
Developed a prototype of the Collective Knowledge playground
to collaboratively benchmark and optimize AI, ML, and other emerging applications
in an automated and reproducible way via open challenges.
I used the following technologies: Streamlit; PyTorch/ONNX/TF/TFLite/TVM; Nvidia/Intel/AMD/Qualcomm hardware and DSPs;
CK2/CM automation; Python; MLPerf benchmarks (a dashboard sketch follows below).
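A minimal sketch of how such a playground dashboard can be assembled with Streamlit; the file playground_app.py, submissions.csv, and its columns are illustrative assumptions, not the actual playground code:

```python
# playground_app.py - illustrative sketch of a live benchmark leaderboard in
# Streamlit; the file submissions.csv and its columns are hypothetical, not
# the real playground schema.
import pandas as pd
import streamlit as st

st.title("Collective Knowledge playground: open optimization challenges")

# Hypothetical CSV with one row per community submission.
results = pd.read_csv("submissions.csv")  # columns: model, device, latency_ms, accuracy

model = st.selectbox("Model", sorted(results["model"].unique()))
subset = results[results["model"] == model]

# Leaderboard sorted by speed, plus an accuracy-vs-latency trade-off plot.
st.dataframe(subset.sort_values("latency_ms"))
st.scatter_chart(subset, x="latency_ms", y="accuracy")
```

Such an app runs locally with `streamlit run playground_app.py`.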
- 2020-cur.:
Developed a prototype of the cKnowledge.io platform to organize all knowledge
about AI, ML, systems, and other innovative technologies from my academic and industrial partners
in the form of portable CK workflows, automation actions, and reusable artifacts.
I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks
from data centers and supercomputers to mobile phones and edge devices
in terms of speed, accuracy, energy, and various costs.
I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers
and accelerate their adoption in production.
I collaborate with MLPerf.org to automate and simplify ML&systems benchmarking
and fair comparison based on the CK concept and DevOps/MLOps principles.
I used the following technologies: Linux/Windows/Android; Python/JavaScript/CK; apache2; flask/django; ElasticSearch;
GitHub/GitLab/BitBucket; REST JSON API; Travis CI/AppVeyor CI; DevOps; CK-based knowledge graph database;
TensorFlow; Azure/AWS/Google cloud/IBM cloud (the REST JSON convention is sketched below).
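A short sketch of the dict-in/dict-out REST JSON convention behind such a platform; the endpoint URL and payload fields below are hypothetical and only convey the idea:

```python
# Illustrative sketch of a CK-style REST JSON API call; the endpoint and
# payload fields are hypothetical and only convey the dict-in/dict-out
# convention used across CK.
import requests

payload = {
    "action": "search",          # CK-style automation action
    "module_uoa": "experiment",  # type of CK entry to query
    "tags": "mlperf,image-classification",
}

r = requests.post("https://cKnowledge.io/api/v1/ck", json=payload, timeout=30)
r.raise_for_status()

result = r.json()
if result.get("return", 0) > 0:   # CK convention: non-zero "return" is an error
    raise RuntimeError(result.get("error"))
for entry in result.get("lst", []):
    print(entry.get("data_uoa"))
```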
- 2018-cur.:
Enhanced and stabilized all main CK components
(software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization)
successfully used by dividiti to automate MLPerf benchmark submissions.
I used the following technologies: Linux/Windows/Android; CK/Python/JavaScript/C/C++;
statistical analysis; Matplotlib/numpy/pandas/Jupyter notebooks;
GCC/LLVM; TensorFlow/PyTorch; main AI algorithms, models, and data sets for image classification and object detection;
Azure/AWS/Google cloud/IBM cloud; mobile phones/edge devices/servers;
Nvidia GPU/EdgeTPU/x86/Arm architectures (a statistical-analysis example follows below).
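A small example of the kind of statistical analysis applied to repeated benchmark measurements before comparing systems; the data is synthetic and the column names are illustrative:

```python
# Sketch of the statistical post-processing applied to repeated benchmark
# runs; the input data is synthetic and the column names are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Pretend we measured 20 repetitions of the same workload on two devices.
runs = pd.DataFrame({
    "device": ["gpu"] * 20 + ["cpu"] * 20,
    "latency_ms": np.concatenate([
        rng.normal(12.0, 0.4, 20),   # GPU: faster, low variance
        rng.normal(85.0, 3.5, 20),   # CPU: slower, noisier
    ]),
})

# Report robust statistics per device: min, median, and spread, as used to
# detect unstable measurements before comparing systems.
stats = runs.groupby("device")["latency_ms"].agg(["min", "median", "std"])
stats["variation_%"] = 100 * stats["std"] / stats["median"]
print(stats)
```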
- 2017-2018:
Developed CK workflows
and live dashboards for
the 1st open ACM REQUEST tournament
to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs.
We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC;
ImageNet;
MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet;
MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM;
Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud, and Microsoft Azure (a Pareto-filter sketch follows below).
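A minimal sketch of the Pareto filter at the core of such multi-objective co-design tournaments; the stacks and their numbers below are synthetic:

```python
# Minimal sketch of a Pareto filter over multi-objective benchmark results,
# as used to rank SW/HW stacks by speed, accuracy, and cost. Data is synthetic.

def dominates(a, b):
    """True if configuration a is no worse than b everywhere and better
    somewhere. Objectives: (latency_ms, error_rate, cost_usd), all minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only configurations not dominated by any other configuration."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (latency_ms, error_rate, cost_usd) for four hypothetical AI/SW/HW stacks.
stacks = [(20.0, 0.25, 1.0), (35.0, 0.20, 0.8), (22.0, 0.30, 1.2), (50.0, 0.18, 0.5)]
print(pareto_front(stacks))  # (22.0, 0.30, 1.2) is dominated by (20.0, 0.25, 1.0)
```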
- 2017-2018:
Developed an example of an auto-generated and reproducible paper
with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
(a collaboration with the Raspberry Pi Foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran;
MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning (an autotuning-loop sketch follows below).
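A simplified sketch of the autotuning loop behind such experiments, reduced here to random search over GCC flags with a single run-time objective; "susan.c" and "input.pgm" are hypothetical stand-ins for a cBench-style workload:

```python
# Simplified autotuning loop: try random GCC flag combinations and keep the
# fastest binary. Real experiments add multiple objectives and ML to prune
# the search space.
import random
import subprocess
import time

FLAGS = ["-O3", "-funroll-loops", "-ftree-vectorize", "-ffast-math", "-fomit-frame-pointer"]

def measure(flags):
    """Compile with the given flags and return the measured run time."""
    subprocess.run(["gcc", *flags, "susan.c", "-o", "susan"], check=True)
    start = time.perf_counter()
    subprocess.run(["./susan", "input.pgm", "output.pgm", "-s"], check=True)
    return time.perf_counter() - start

best = None
for _ in range(30):  # 30 random points in the flag space
    flags = [f for f in FLAGS if random.random() < 0.5] or ["-O2"]
    elapsed = measure(flags)
    if best is None or elapsed < best[0]:
        best = (elapsed, flags)

print("best time and flags:", best)
```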
- 2015-cur.:
Developed the Collective Knowledge framework (CK)
to help the community
automate typical tasks in ML&systems R&D,
provide a common format, APIs, and meta descriptions for shared research projects,
enable portable workflows,
and improve the reproducibility and reusability of computational research.
We now use it to automate benchmarking, optimization, and co-design of AI/ML/SW/HW stacks
in terms of speed, accuracy, energy and other costs across diverse platforms
from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/edge devices; Python/C/C++/Java; ICC/GCC/LLVM;
JSON/REST API; DevOps; plugins; apache2; Azure cloud; client/server architecture; NoSQL database (ElasticSearch);
GitHub/GitLab/BitBucket; Travis CI/AppVeyor CI;
main math libraries, DNN frameworks, models, and datasets (the CK API is sketched below).
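A minimal sketch of the CK dict-in/dict-out Python API, based on the public CK documentation; the exact actions, parameters, and output fields vary across CK versions and assume the corresponding CK repositories have been pulled:

```python
# Sketch of the CK dict-in/dict-out Python API (pip install ck). Based on the
# public CK documentation; exact actions and output fields may differ across
# CK versions and require the corresponding CK repositories.
import ck.kernel as ck

# Every call takes a dict with an "action" and returns a dict whose "return"
# key is 0 on success - mirroring the "ck <action> <module>:<entry>" CLI.
r = ck.access({"action": "compile",
               "module_uoa": "program",
               "data_uoa": "cbench-automotive-susan"})
if r["return"] > 0:
    raise RuntimeError(r["error"])

r = ck.access({"action": "run",
               "module_uoa": "program",
               "data_uoa": "cbench-automotive-susan"})
if r["return"] > 0:
    raise RuntimeError(r["error"])
print(r.get("misc", {}))  # run statistics; field names depend on the CK version
```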
- 2012-2014:
Prototyped the Collective Mind framework, the prequel to CK.
I focused on web services, but it turned out that my users wanted a basic CLI-based framework.
This feedback motivated me to develop the simpler, CLI-based CK framework.
- 2010-2011:
Helped to create KDataSets (1000 data sets for CPU benchmarks; PLDI paper, repo).
- 2008-2010:
Developed a machine-learning-based self-optimizing compiler connected with cTuning.org
in collaboration with IBM, ARC (Synopsys), Inria, and the University of Edinburgh.
This technology is considered the world's first ML-based self-optimizing compiler.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog;
semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis;
crowd-benchmarking; crowd-tuning; plugins; client/server architecture (the core prediction idea is sketched below).
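The core prediction idea, sketched with a nearest-neighbor model: the features and flag labels below are synthetic stand-ins for MILEPOST GCC code features and the best flags found by crowd-tuning:

```python
# Sketch of the MILEPOST idea: predict good optimizations for a new program
# from its nearest neighbors in program-feature space. Features and labels
# are synthetic stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rows: programs; columns: static code features (e.g. counts of basic blocks,
# branches, memory accesses) extracted by the compiler.
features = np.array([
    [120, 30, 400],
    [115, 28, 390],
    [900, 210, 50],
    [880, 205, 60],
])
# Best-found flag combination for each training program.
best_flags = ["-O3 -funroll-loops", "-O3 -funroll-loops", "-O2", "-O2"]

model = KNeighborsClassifier(n_neighbors=1).fit(features, best_flags)

new_program = np.array([[118, 29, 410]])  # features of an unseen program
print(model.predict(new_program))         # -> ['-O3 -funroll-loops']
```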
- 2008-2009:
Added the function cloning process to GCC to enable run-time adaptation for statically compiled programs (report).
- 2008-2009:
Developed the Interactive Compilation Interface,
now available in mainline GCC (a collaboration with Google and Mozilla).
- 2008-cur.:
Developed the cTuning.org portal
to crowdsource the training of the ML-based MILEPOST compiler
and automate SW/HW co-design, similar to SETI@home. See the press releases from IBM
and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2;
client/server architecture; KNN/SVM/decision trees; plugins.
- 2009-2010:
Created cBench (a collaborative CPU benchmark to support autotuning R&D)
and connected it with my cTuning infrastructure from the MILEPOST project.
- 2005-2009:
Created MiDataSets, multiple data sets for MiBench (20+ data sets per benchmark, 400 in total), to support autotuning R&D.
- 1999-2004:
Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins
to integrate autotuning/benchmarking tools and techniques from other partners.
- 1999-2001:
Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC, used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS (the key loop transformation is illustrated below).
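For illustration, the kind of loop tiling such a compiler applies to improve cache locality, sketched in Python (the real tool rewrote C/Fortran sources):

```python
# Tiled matrix multiply: iterate over TILE x TILE blocks so each block stays
# in cache before moving on, instead of streaming whole rows and columns.
N, TILE = 8, 4
A = [[float(i + j) for j in range(N)] for i in range(N)]
B = [[float(i - j) for j in range(N)] for i in range(N)]
C = [[0.0] * N for _ in range(N)]

for ii in range(0, N, TILE):
    for jj in range(0, N, TILE):
        for kk in range(0, N, TILE):
            for i in range(ii, ii + TILE):
                for j in range(jj, jj + TILE):
                    for k in range(kk, kk + TILE):
                        C[i][j] += A[i][k] * B[k][j]
```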
- 1998-1999:
Developed a web-based service to automate the submission and execution of tasks on supercomputers via the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
- 1993-1998:
Developed an analog semiconductor neural network accelerator (Hopfield architecture).
My R&D tasks included the NN design and simulation, the development of an electronic board connected to a PC to experiment with the semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for the NN implementation; MPI for distributed training; PSpice for electronic circuit simulation;
ADC, DAC, and LPT to measure the semiconductor NN and communicate with a PC; Visual Basic to visualize experiments (a minimal Hopfield sketch follows below).
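A minimal software sketch of the Hopfield architecture that the hardware implemented: Hebbian training on bipolar patterns, then iterative recall from a noisy cue (the patterns are synthetic):

```python
# Minimal Hopfield network: Hebbian training on bipolar patterns, then
# synchronous recall from a noisy cue. Patterns are synthetic.
import numpy as np

# Two stored bipolar patterns (orthogonal for clean recall).
patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
])

# Hebbian rule: sum of outer products, zero self-connections.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

# Recall from a noisy cue: the first pattern with one bit flipped.
state = np.array([-1, -1, 1, -1, 1, -1, 1, -1])
for _ in range(10):
    new_state = np.where(W @ state >= 0, 1, -1)  # synchronous update
    if np.array_equal(new_state, state):
        break
    state = new_state

print(state)  # recovers the first stored pattern
```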
- 1991-1993:
Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for window management.