Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions
With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development.
For more about Michael check out these links:
Graduate class at the University of Texas at Austin, from the syllabus:
“You will learn the theory and practice of data analytics and machine learning for subsurface resource modeling”.
By the end of this course, you will be able to:
- perform prerequisite data analytics, data checking, and evaluation, to support machine learning models
- select features, engineer features, and project features to a lower-dimensional space to build the best possible models
- segment datasets with cluster analysis for improved models
- select from, train, and tune a wide variety of predictive machine learning models
- build robust data analytics and machine learning workflows in Python with open source packages
- calculate useful diagnostics and critically evaluate and check your models
- present, communicate, document and deploy your modeling workflows
I like to put content online for anyone to access. Note: My online handle is ‘GeostatsGuy’. These are the online resources for the course:
My GitHub Repositories are here.
This includes the following repositories that may be helpful:
- PythonNumericalDemos – worked-out examples in Python, as Jupyter Notebooks with Markdown, for data analytics (bootstrap, declustering, principal components analysis, decision trees, support vector machines, deep learning, etc.).
- GeoDataSets – synthetic but realistic spatiotemporal, multivariate datasets to support my students and my educational content.
- Geostatsr – workflows in R for linear regression, spatial continuity, kriging, simulation, principal components analysis, and decision trees.
- GeostatLectures – short lectures and posters with concise descriptions of topics in geostatistics.
- GeostatsPy – spatial data analytics Python package that I wrote to support this course; everyone in the class will install it.
- ExcelNumericalDemos – worked-out examples in Microsoft Excel of statistical concepts such as distributions, hypothesis tests, confidence intervals, heterogeneity measures, spatial continuity, kriging, simulation, bootstrap, and decision making in the presence of uncertainty.
To support my students and provide an evergreen resource that outlasts the semester, I record the lectures and post them on my YouTube channel, GeostatsGuy Lectures.
This includes the following playlists:
- Machine Learning - all the lectures for this course.
- Data Analytics and Geostatistics - the lectures for my Data Analytics and Geostatistics course, a useful prerequisite for this course.
- Spatial Data Analytics and Modeling - the lectures for my Spatial Data Analytics and Modeling course, a useful prerequisite for this course.
- Data Science Basics in Python - live code walkthroughs of well-documented data science workflows in Python to assist with machine learning workflow creation.
Follow me on Twitter where I'm the GeostatsGuy!
- I tweet daily about data analytics, geostatistics and machine learning ideas and resources, engineering, and, infrequently, topics unrelated to engineering or science (e.g., outdoor activities and local live music).
READINGS: There is no course textbook. All lectures are posted on YouTube and all in-class demonstrations are available as well-documented workflows on GitHub. The provided notes, slides in PDF, and example workflows are comprehensive and cover all content on examinations, but students interested in additional reading are welcome to refer to:
Machine Learning:
Hastie, T., Tibshirani, R., and Friedman, J., 2012, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.
James et al., 2013, An Introduction to Statistical Learning: with Applications in R, Springer.
Subsurface Data Analytics and Modeling:
Pyrcz, M. and Deutsch, C., 2014, Geostatistical Reservoir Modeling, Oxford University Press, New York.
Also, various journal papers will be posted for reference.
As part of the course, all students complete a machine learning project.
The challenge: build a well-documented, educational machine learning workflow.
Here's the motivation and more details:
- produce a comprehensive, concise, well-documented machine learning workflow in a Jupyter Notebook. This is an opportunity to apply course learnings and demonstrate a high level of proficiency. With permission, I post the workflows online in this GitHub repository and use them to support future classes (with credit).
- open-source contributions to GitHub are recognized by many companies.
- to assist, I have provided a project template.
- the workflows are graded by the following criteria:
Element | Description |
---|---|
Great Executive Summary | Gap, Work to Address Gap, Learnings and Recommendation |
Workflow Steps | All Aligned with Goal |
Concise Workflow | Every Step and Figure Has a Purpose / Consistent with Provided Template / Features Briefly Explained / Features Have Units |
Images / Figures | Excellent Figures / Subplots and Combined Plots for Efficient Displays and Communication / Axes Labeled / Consistent Figure Sizes |
Demonstrated Knowledge | All Modeling Choices Defended / Demonstrated Extension of Knowledge |
Readable Code | Code Documentation / Steps’ Descriptions and Observations between Code Blocks / Only Include Needed Packages / Use Functions for Concise Code |
Citations | All Code from Others Cited |
Creativity / Innovation | Unique, Novel Application of Machine Learning |
I share these to promote the students' work.
- We are teaching novel data analytics, geostatistics and machine learning skills to engineering and science students.
I hope you join us in my PGE 383: Subsurface Modeling class. We have about 40 graduate students from engineering and geoscience participating here at the University of Texas at Austin. The Jackson School of Geosciences offered a new classroom in their building after we outgrew our room in the Petroleum and Geosystems Engineering department. I appreciate the excellent support from both the Hildebrand Department of Petroleum and Geosystems Engineering and the Jackson School of Geosciences.
I hope that this is helpful to those who want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.
- Want to invite me to visit your company for training, mentoring, project review, workflow design and consulting? I'd be happy to drop by and work with you!
- Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
- I can be reached at mpyrcz@austin.utexas.edu.
I'm always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng., Associate Professor, The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin