Data Scientist (5+ years). PhD in Computational Physics.
Data Science Skills
• Programming Languages: Python, R
• Statistical Analysis: Generalized linear models, multivariate regression, time-series analysis (scikit-learn, statsmodels, pandas, NumPy)
• Machine Learning: Neural networks, support vector machines, random forests, boosting methods (scikit-learn, PyTorch, Keras)
• Data Integration and Management: SQL, handling multi-omic datasets (genomics, proteomics, transcriptomics)
• Data Visualization: ggplot2, Matplotlib, Seaborn
• Big Data and High-Performance Computing: Use of HPC clusters for large-scale data analysis
• Bioinformatics Tools: Bioconductor, Galaxy
• Natural Language Processing: Text mining, sentiment analysis, topic modeling (NLTK, spaCy, BERT)
Analytical Skills
• Data preprocessing, normalization, and transformation
• Predictive modeling and algorithm development
• Network and pathway analysis, differential gene expression analysis
Research and Project Experience
• Developed and implemented predictive models for clinical trial data analysis, improving early-phase trial insights.
• Conducted exploratory data analysis and visualized complex datasets to identify trends and patterns.
• Designed and executed experiments to test hypotheses and validate models.
Soft Skills
• Excellent written and verbal communication skills in English
• Collaboration in interdisciplinary and multicultural teams
• Independent project management and leadership
Additional Skills
• Linux systems, command-line tools
• Version control (Git)
• Deep learning frameworks (TensorFlow, Keras, PyTorch)
• Experience with relational databases and big data technologies