sim2000dg/README.md

Hi everyone, I am Simone Di Gregorio ✌️

About me

I hold a bachelor's degree in Management and Computer Science from LUISS, and I graduated with highest honors from the Data Science Master's Degree @Sapienza. I am now a PhD student in Data Science @Sapienza.

Through university and (a lot of) self-study, I have built a solid background in data science, from simple ETL to modelling. In particular, I have in-depth knowledge of:

  • R for machine learning, modelling, statistics, reporting (R Markdown) and data manipulation, exploiting the tidyverse ecosystem far more than the base language.
  • Python for scripting, manipulation, modelling and web scraping. Specifically, my experience revolves mostly around pandas, NumPy, scikit-learn and TensorFlow. My experience with TensorFlow covers both Keras and the lower-level APIs.
  • KNIME Analytics Platform and KNIME Server, now Business Hub, due to university projects and work experience. Specifically, I am L1, L2 and L3 certified.
  • Relational paradigm for databases and SQL.

I am a former Data Science Intern @KNIME, the software company behind KNIME Analytics Platform (and its enterprise version), a popular and powerful low-code tool for performing data science tasks at every level. As an employee, I developed KNIME-native low-code approaches for the complete Word2Vec pipeline, and I also developed a fast new Python-based Word2Vec node built on TensorFlow, using a mix of low-level APIs (mainly for the pre-processing) and Keras for the modelling steps. The code for the node is publicly available in one of my repositories, at this link.

My research interests are mainly in mathematics: probability theory and statistical inference for stochastic processes (specifically, diffusions), theoretical computer science, algorithmic game theory (AGT) and statistical learning theory applied to AGT topics. I work at the Department of Computer, Control, and Management Engineering @Sapienza in the group led by Prof. Stefano Leonardi. I also help teach a variety of courses at Sapienza, ranging from randomized algorithms to statistics and stochastic processes.

Here is a list of publications and activities related to my research:

  • Neural Drift Estimation for Ergodic Diffusions: Nonparametric Analysis and Numerical Exploration, New Trends in Functional Statistics and Related Fields, with Francesco Iafrate. The work is published in the proceedings of IWFOS 2025 (International Workshop on Functional and Operatorial Statistics), which was held in Novara, Italy.
  • Nearly Tight Regret Bounds for Profit Maximization in Bilateral Trade, FOCS 2025, with Federico Fusco, Chris Schwiegelshohn and Paul Duetting. (See you in Sydney in December!) I also presented the work at EC (Economics and Computation) at Stanford this year.

How to reach me

Popular repositories

  1. omds_project Public

    Repository for Optimization Methods for Data Science Final Project / Data Science @Sapienza

    Python 1

  2. DynamicGeoCells Public

    This package helps you build geocells automatically based on the point density of your dataset.

    Python 1 1

  3. Word2VecPyNodeTF Public

    Python-based KNIME node implementing Word2Vec algorithms with TensorFlow

    Python 3

  4. homework1_ADM Public

    Repository for the first homework of Algorithmic Methods for Data Mining @Sapienza

    Python

  5. adm_hw2-group30 Public

    Repository for the second homework of ADM / Data Science @Sapienza

    HTML 1

  6. hw3_group28_ADM Public

    Repository for the third homework of Algorithmic Methods of Data Mining and Laboratory / Data Science @Sapienza

    Jupyter Notebook 2