Title: P2: P0: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
- unique: automates time series classification.
- build upon: catboost, lightgbm, xgboost,
statsmodels, ete3 (trees), scikit-learn, NetworkX, sktime
(time series) #russia #ml #nlp #datascience #opensource #dailyreport
GitHub Actions for a Gleam monorepo
https://crowdhailer.me/2026-04-21/github-actions-for-a-gleam-monorepo/
#Programming #OpenSource #DevOps
Title: P1: P0: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
Russia.
1) ITMO University AIM.CLUB https://github.com/aimclub/
+ FEDOT - Automated modeling and machine learning
framework
- core: part of FEDOT.Industrial. #russia #ml #nlp #datascience #opensource #dailyreport
Title: P11: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
+ ai_toolbox - framework for active learning in NLP
+ eco4cast - reduce carbon footprint of machine learning
models
Main organization: Russia Open Source Foundation
https://nplus1.ru
I don't have a job, fuck you.
#russia #ml #nlp #datascience #opensource #dailyreport
Title: P10: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
in NLP: text classification and sequence tagging. instead
of annotating random samples, you annotate a portion of
the examples that are most useful to improving the model.
+ AriGraph - memory model for LLM agents interacting
with environment and multi-hop question answering tasks.
- https://arxiv.org/abs/2407.04363 #russia #ml #nlp #datascience #opensource #dailyreport
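The active learning idea described above (annotate the examples most useful to the model rather than random ones) can be sketched with entropy-based uncertainty sampling. This is a minimal illustration only, not the API of any library listed here: the function names and the sample probabilities are hypothetical, and entropy is just one common way to score "usefulness".

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool_probs, budget):
    """Return indices of the `budget` most uncertain examples, most uncertain first.

    pool_probs: one list of class probabilities per unlabeled example.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled texts (binary classification).
pool_probs = [
    [0.98, 0.02],  # model is confident: little to gain from a label
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]

chosen = select_for_annotation(pool_probs, budget=2)
print(chosen)  # [3, 1]
```

In a full loop, the chosen examples would be sent to annotators, added to the training set, and the model retrained before the next selection round.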
Title: P9: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
a Python library for bandit algorithms and off-policy
evaluation
8) AIRI Artificial Intelligence Research Institute
https://github.com/AIRI-Institute/
+ pogema - Partially-Observable Grid Environment for
Multiple Agents. grid-based, can generate maps, can be
tailored to a variety of PO-MAPF settings
+ GENA_LM - a framework for active learning annotation #russia #ml #nlp #datascience #opensource #dailyreport
Title: P7: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
+ Py-Boost - Python-based GBDT implementation on
GPU. multiclass/multilabel/multitask training
+ HypEx - framework for automatic Causal Inference.
+ Sim4Rec - Simulator for training and evaluation of
Recommender Systems
+ AutoMLWhitebox (or AutoWoE) - automatic creation of an
interpretable ML model based on feature binning, WoE #russia #ml #nlp #datascience #opensource #dailyreport
Title: P8: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
feature transformation, feature selection and Logistic
Regression.
+ SLAMA - LightAutoML on Spark
+ ESGify - NLP model for multilabel news classification
with respect to 47 ESG risks (company environmental,
social, and governance factors that could cause
reputation or financial harm.)
+ sb-obp - Open Bandit Pipeline for Open Bandit Dataset: #russia #ml #nlp #datascience #opensource #dailyreport
Title: P6: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
7) sb-ai-lab, Sber (“СБЕР”) https://github.com/sb-ai-lab/
+ LightAutoML - Fast and customizable framework for
automatic ML model creation (AutoML)
+ RePlay - Framework for Building End-to-End
Recommendation Systems with State-of-the-Art Models
+ eco2ai - accumulates statistics about power
consumption and CO2 emission during running code. #russia #ml #nlp #datascience #opensource #dailyreport
Title: P5: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
distributed learning.
+ rep - wrapper for popular ML libraries. Tries to extend
scikit-learn.
+ ch-tools, ch-backup - administration, diagnostics,
and backup tools for ClickHouse.
6) ETNA-team, corl-team (old Tinkoff team)
+ etna, https://github.com/etna-team/etna
+ corl, https://github.com/corl-team/CORL
+ reBRAC https://github.com/DT6A/ReBRAC #russia #ml #nlp #datascience #opensource #dailyreport
Title: P4: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
5) Yandex https://github.com/yandex/
+ catboost - Gradient Boosting on Decision Trees
https://github.com/catboost/catboost
+ YaLM-100B is a GPT-like neural network for generating
and processing text.
+ YaFSDP - Sharded Data Parallelism framework, designed
to work well with transformer-like neural network
architectures. Competitor to FSDP of PyTorch for #russia #ml #nlp #datascience #opensource #dailyreport
Title: P3: Survey of Open-Source Machine Learning and Data Science in [2024-10-03 Thu]
+ kmath - Kotlin-based analog to Python's NumPy library.
- https://github.com/SciProgCentre/kmath
4) Skoltech
+ ttpy, https://github.com/oseledets/ttpy
+ h2tools - H^2-matrices, built on NumPy. Efficient for
integral equations or particle-to-particle interactions.
- https://bitbucket.org/muxas/h2tools/
https://pythonhosted.org/h2tools/ #russia #ml #nlp #datascience #opensource #dailyreport