This repo from the Oracle Cloud Infrastructure (OCI) Data Science service provides demos, tutorials, and code examples that highlight various features of the OCI Data Science service and AI services. We welcome your feedback and would like to know which content is useful and which is missing; open an issue to let us know. We know that many of you are creating great content, and we would like to help you share it. See the contributions document.
- notebook_examples: The Accelerated Data Science (ADS) SDK is a data-scientist-friendly library that helps you speed up common data science tasks and provides an interface to other OCI services. This section contains JupyterLab notebooks with tutorials on how to use ADS. For example, the vault.ipynb notebook shows how easy it is to store your secrets in the OCI Vault service.
- conda_environment_notebooks: The OCI Data Science service uses conda environments to manage the libraries that a notebook can use. The service provides a number of conda environments that are designed to give you best-in-class libraries for common data science tasks. Each family of conda environments has notebooks that demonstrate how to perform different data science tasks. This section is organized around these conda environment families and provides the notebooks you need to get started quickly.
- knowledge_base: Are you struggling with a problem? Check out the knowledge base. It has a growing collection of articles on how to solve common problems that you may encounter.
- labs: Looking to walk through an end-to-end problem? Check out this section. It has examples of how to train machine learning models and then deploy them on the OCI Data Science service.
- model_catalog_examples: The model catalog provides a managed and centralized storage space for models. ADS helps you create the artifacts that you need to use this service. However, you need to provide a score.py file that loads the model and defines a function that makes predictions. The runtime.yaml file provides information about the runtime conda environment if you want to deploy the model. It also allows you to document a comprehensive set of metadata about the provenance of the model. This section of the repo provides examples of how to create your score.py and runtime.yaml files for a variety of common machine learning models and configurations.
- jobs: OCI Data Science Jobs enables you to define and run a repeatable machine learning task on fully managed infrastructure. Because Jobs run custom tasks, you can apply them to almost any use case, such as data preparation, model training, hyperparameter optimization, and batch inference.
- distributed training: Support for distributed training with Jobs using the Dask, Horovod, TensorFlow Distributed, and PyTorch Distributed frameworks.
- pipelines: Oracle Cloud Infrastructure Data Science Pipelines automates and streamlines the process of building and deploying machine learning models.
- data_labeling_examples: The Data Labeling service helps you identify properties (labels) of documents, text, and images (records) and annotate (label) the records with those properties. This section contains Python and Java scripts that annotate records in bulk in the OCI Data Labeling Service (DLS).
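The model_catalog_examples entry describes score.py as a file that loads the model and exposes a prediction function. The following is a minimal, standard-library-only sketch of that shape. It assumes the load_model()/predict() naming that ADS-generated templates commonly use; the file name model.json, the LinearModel class, and the {"rows": ...} payload format are hypothetical placeholders. Real artifacts usually serialize the model with pickle, joblib, or ONNX rather than JSON; JSON is used here only to keep the example dependency-free.

```python
import json
import os
import tempfile

# Hypothetical file name for the serialized model inside the artifact directory.
MODEL_FILE_NAME = "model.json"


class LinearModel:
    """Toy stand-in for a trained model: y = w . x + b."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]


def load_model(model_dir=".", model_file_name=MODEL_FILE_NAME):
    """Rebuild the model from the parameters stored in the artifact directory."""
    with open(os.path.join(model_dir, model_file_name)) as f:
        params = json.load(f)
    return LinearModel(params["weights"], params["bias"])


def predict(data, model=None):
    """Score a JSON string or dict shaped like {"rows": [[x1, x2, ...], ...]}."""
    if model is None:
        model = load_model()
    if isinstance(data, str):
        data = json.loads(data)
    # Return a JSON-serializable dict so it can be sent back as the response body.
    return {"prediction": model.predict(data["rows"])}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as artifact_dir:
        # Simulate an artifact directory holding the serialized model parameters.
        with open(os.path.join(artifact_dir, MODEL_FILE_NAME), "w") as f:
            json.dump({"weights": [2.0, -1.0], "bias": 0.5}, f)
        model = load_model(artifact_dir)
        print(predict('{"rows": [[1, 1], [3, 2]]}', model=model))  # {'prediction': [1.5, 4.5]}
```

Making model an optional argument keeps predict() easy to test with an in-memory model, while a deployed score.py falls back to loading from the artifact directory.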
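The jobs entry stresses repeatable tasks. One common way to keep a job script repeatable is to take every tunable value as a parameter instead of hard-coding it, so a run can be reproduced from its configuration alone. The sketch below resolves parameters from command-line arguments first and environment variables second. The parameter names INPUT_PATH and BATCH_SIZE are hypothetical, and it assumes you pass them through the job's command-line arguments or environment variables.

```python
import argparse
import os


def parse_job_args(argv=None, env=None):
    """Resolve job parameters: command-line arguments win, environment variables are the fallback."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Hypothetical batch-inference job")
    parser.add_argument("--input-path", default=env.get("INPUT_PATH"))
    parser.add_argument("--batch-size", type=int, default=int(env.get("BATCH_SIZE", "32")))
    args = parser.parse_args(argv)
    if args.input_path is None:
        # parser.error prints usage and exits with status 2, failing the run early.
        parser.error("--input-path or the INPUT_PATH environment variable is required")
    if args.batch_size <= 0:
        parser.error("--batch-size must be positive")
    return args


def run(args):
    """Placeholder for the job's real work; returns a summary of what would be processed."""
    return f"processing {args.input_path} in batches of {args.batch_size}"
```

In a job's entry script you would call run(parse_job_args()) at start-up, so rerunning the job with the same arguments or environment repeats the same task.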
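The distributed training entry lists frameworks that split training across several nodes. Whatever framework you pick, each worker needs to know its own rank and the total number of workers, then process its share of the data. The sketch below shows that idea with the standard library only. It assumes the RANK and WORLD_SIZE environment variables used by PyTorch's env:// initialization; whether the distributed-training setup for Jobs populates exactly these variables is an assumption to check against the samples in that folder.

```python
import os


def worker_identity(env=None):
    """Read this worker's rank and the total worker count from environment variables.

    RANK and WORLD_SIZE follow the convention of PyTorch's env:// initialization;
    other frameworks may use different variables.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK {rank} is outside [0, {world_size})")
    return rank, world_size


def shard(items, rank, world_size):
    """Give each worker a disjoint, strided slice of the data."""
    return items[rank::world_size]


if __name__ == "__main__":
    rank, world_size = worker_identity()
    print(rank, world_size, shard(list(range(10)), rank, world_size))
```

A strided split keeps shard sizes within one item of each other and needs no coordination between workers, since each one derives its slice from its rank alone.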
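The pipelines entry describes automating the steps of building and deploying a model. Conceptually, a pipeline is a set of steps with dependencies between them, and those dependencies decide the run order and which steps can run in parallel. The following sketch (Python 3.9+ for graphlib) illustrates that ordering. The step names and the dict format are hypothetical; in the service, steps and their dependencies are defined in the pipeline configuration, not in code like this.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the steps it depends on.
PIPELINE = {
    "ingest": [],
    "prepare": ["ingest"],
    "train": ["prepare"],
    "evaluate": ["train"],
    "deploy": ["evaluate"],
    "report": ["evaluate"],
}


def execution_waves(steps):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(steps)
    try:
        sorter.prepare()
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"pipeline has a dependency cycle: {err.args[1]}") from err
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(PIPELINE)):
        print(i, wave)
```

Here deploy and report land in the same wave because both depend only on evaluate, so they could run concurrently.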
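The data_labeling_examples entry mentions scripts that annotate records in bulk. One simple way to decide labels in bulk is a rule-based, first-match pass over each record's text, sketched below in plain Python. The rules, labels, and record IDs are hypothetical, and the sketch only computes the label assignments; sending the annotations to DLS is left to the scripts in that folder.

```python
import re

# Hypothetical label rules: the first pattern that matches a record's text wins.
LABEL_RULES = [
    ("invoice", re.compile(r"\b(invoice|amount due)\b", re.IGNORECASE)),
    ("receipt", re.compile(r"\b(receipt|paid)\b", re.IGNORECASE)),
]


def label_records(records, rules, default=None):
    """Return {record_id: label} for every record; unmatched records get `default`."""
    labels = {}
    for record_id, text in records.items():
        labels[record_id] = next(
            (label for label, pattern in rules if pattern.search(text)), default
        )
    return labels


if __name__ == "__main__":
    records = {
        "rec-1": "Invoice #42, amount due: $100",
        "rec-2": "Thank you, your order is paid",
        "rec-3": "Meeting notes",
    }
    print(label_records(records, LABEL_RULES, default="other"))
```

Ordering the rules makes conflicts explicit: a record that matches several patterns always receives the label of the earliest rule.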
Check out the following resources for more information about the OCI Data Science and AI services:
- ADS class documentation
- ADS user guide
- AI & Data Science blog
- OCI Data Science service guide
- OCI Data Science service release notes
- YouTube playlist
- OCI Data Labeling Service guide
- OCI DLS DP API
- OCI DLS CP API
- Create a GitHub issue.
This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.
The Security Guide contains information about the security vulnerability disclosure process. If you discover a vulnerability, consider filing an issue.