
Commit

Merge branch 'master' into phi3poc
JessicaXYWang authored Jan 10, 2025
2 parents e1105fd + bab6aed commit e59a981
Showing 152 changed files with 23,634 additions and 186 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -11,10 +11,10 @@ SynapseML requires Scala 2.12, Spark 3.4+, and Python 3.8+.
| Topics | Links |
| :------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Build | [![Build Status](https://msdata.visualstudio.com/A365/_apis/build/status/microsoft.SynapseML?branchName=master)](https://msdata.visualstudio.com/A365/_build/latest?definitionId=17563&branchName=master) [![codecov](https://codecov.io/gh/Microsoft/SynapseML/branch/master/graph/badge.svg)](https://codecov.io/gh/Microsoft/SynapseML) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) |
- | Version | [![Version](https://img.shields.io/badge/version-1.0.8-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
- | Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/1.0.8/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
+ | Version | [![Version](https://img.shields.io/badge/version-1.0.9-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
+ | Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/1.0.9/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/1.0.9/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
| Support | [![Gitter](https://badges.gitter.im/Microsoft/MMLSpark.svg)](https://gitter.im/Microsoft/MMLSpark?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) [![Mail](https://img.shields.io/badge/mail-synapseml--support-brightgreen)](mailto:synapseml-support@microsoft.com) |
- | Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v1.0.8?labpath=notebooks%2Ffeatures) |
+ | Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v1.0.9?labpath=notebooks%2Ffeatures) |
| Usage | [![Downloads](https://static.pepy.tech/badge/synapseml)](https://pepy.tech/project/synapseml) |
<!-- markdownlint-disable MD033 -->
<details open>
@@ -119,7 +119,7 @@ In Azure Synapse notebooks, please place the following in the first cell of your
{
  "name": "synapseml",
  "conf": {
-     "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.8",
+     "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
@@ -155,15 +155,15 @@ cloud](http://community.cloud.databricks.com), create a new [library from Maven
coordinates](https://docs.databricks.com/user-guide/libraries.html#libraries-from-maven-pypi-or-spark-packages)
in your workspace.

- For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.8`
+ For the coordinates use: `com.microsoft.azure:synapseml_2.12:1.0.9`
with the resolver: `https://mmlspark.azureedge.net/maven`. Ensure this library is
attached to your target cluster(s).

Finally, ensure that your Spark cluster has at least Spark 3.2 and Scala 2.12. If you encounter Netty dependency issues, please use DBR 10.1.

You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:

- `https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.8.dbc`
+ `https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.9.dbc`

### Python Standalone

@@ -174,7 +174,7 @@ the above example, or from python:
```python
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
-         .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8") \
+         .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.9") \
          .getOrCreate()
import synapse.ml
```
@@ -185,9 +185,9 @@ SynapseML can be conveniently installed on existing Spark clusters via the
`--packages` option, examples:

```bash
- spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.8
- pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.8
- spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.8 MyApp.jar
+ spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.9
+ pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.9
+ spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.9 MyApp.jar
```

### SBT
@@ -196,7 +196,7 @@ If you are building a Spark application in Scala, add the following lines to
your `build.sbt`:

```scala
- libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.8"
+ libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.9"
```

### Apache Livy and HDInsight
@@ -210,7 +210,7 @@ Excluding certain packages from the library may be necessary due to current issues
{
  "name": "synapseml",
  "conf": {
-     "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.8",
+     "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind"
  }
}
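For context, the session payload above can be assembled programmatically before being submitted to a Livy endpoint. This is a hedged Python sketch: the variable names are illustrative, and submitting via HTTP (e.g. `requests.post` to a Livy `/sessions` URL) is an assumption, not something shown in this commit.

```python
import json

# Build the Livy session configuration shown above (1.0.9 is the version
# introduced by this commit; the excludes list matches the README diff).
session_conf = {
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9",
        "spark.jars.excludes": ",".join([
            "org.scala-lang:scala-reflect",
            "org.apache.spark:spark-tags_2.12",
            "org.scalactic:scalactic_2.12",
            "org.scalatest:scalatest_2.12",
            "com.fasterxml.jackson.core:jackson-databind",
        ]),
    },
}

# Serialize for submission; the POST itself is left as a comment because the
# endpoint URL is environment-specific (hypothetical example):
body = json.dumps(session_conf)
# requests.post(f"{livy_url}/sessions", data=body,
#               headers={"Content-Type": "application/json"})
```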
@@ -279,7 +279,7 @@ trait HasOpenAITextParams extends HasOpenAISharedParams {

// list of shared text parameters. In method getOptionalParams, we will iterate over these parameters
// to compute the optional parameters. Since this list never changes, we can create it once and reuse it.
-   private val sharedTextParams = Seq(
+   private[openai] val sharedTextParams: Seq[ServiceParam[_]] = Seq(
    maxTokens,
    temperature,
    topP,
@@ -5,6 +5,7 @@ package com.microsoft.azure.synapse.ml.services.openai

import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.param.AnyJsonFormat.anyFormat
+ import com.microsoft.azure.synapse.ml.param.ServiceParam
import com.microsoft.azure.synapse.ml.services.{HasCognitiveServiceInput, HasInternalJsonOutputParser}
import org.apache.http.entity.{AbstractHttpEntity, ContentType, StringEntity}
import org.apache.spark.ml.ComplexParamsReadable
@@ -16,10 +17,91 @@ import spray.json._

import scala.language.existentials

object OpenAIResponseFormat extends Enumeration {
  case class ResponseFormat(paylodName: String, prompt: String) extends super.Val(paylodName)

  val TEXT: ResponseFormat = ResponseFormat("text", "Output must be in text format")
  val JSON: ResponseFormat = ResponseFormat("json_object", "Output must be in JSON format")

  def asStringSet: Set[String] =
    OpenAIResponseFormat.values.map(_.asInstanceOf[OpenAIResponseFormat.ResponseFormat].paylodName)

  def fromResponseFormatString(format: String): OpenAIResponseFormat.ResponseFormat = {
    if (TEXT.paylodName == format) {
      TEXT
    } else if (JSON.paylodName == format) {
      JSON
    } else {
      throw new IllegalArgumentException("Response format must be valid for OpenAI API. " +
        "Currently supported formats are " + asStringSet.mkString(", "))
    }
  }
}
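The string-to-format lookup above can be sketched in pure Python. This is illustrative only (the names `ALLOWED_RESPONSE_FORMATS` and `from_response_format_string` are hypothetical, not part of SynapseML's Python API); it mirrors the behavior of `fromResponseFormatString`: accept only the known payload names, otherwise raise with the supported set.

```python
# Hypothetical sketch of OpenAIResponseFormat.fromResponseFormatString.
# Maps each payload name to its prompt string, as in the Scala enumeration.
ALLOWED_RESPONSE_FORMATS = {
    "text": "Output must be in text format",
    "json_object": "Output must be in JSON format",
}

def from_response_format_string(fmt: str) -> str:
    # Return the format unchanged when it is a known payload name;
    # otherwise raise, listing the currently supported formats.
    if fmt in ALLOWED_RESPONSE_FORMATS:
        return fmt
    raise ValueError(
        "Response format must be valid for OpenAI API. Currently supported "
        "formats are " + ", ".join(sorted(ALLOWED_RESPONSE_FORMATS))
    )
```

Keeping the lookup total over a closed set like this is what lets the Scala side expose `asStringSet` for validation elsewhere in the trait.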

trait HasOpenAITextParamsExtended extends HasOpenAITextParams {
  val responseFormat: ServiceParam[Map[String, String]] = new ServiceParam[Map[String, String]](
    this,
    "responseFormat",
    "Response format for the completion. Can be 'json_object' or 'text'.",
    isRequired = false) {
    override val payloadName: String = "response_format"
  }

  def getResponseFormat: Map[String, String] = getScalarParam(responseFormat)

  def setResponseFormat(value: Map[String, String]): this.type = {
    val allowedFormat = OpenAIResponseFormat.asStringSet

    // Validate that value is a properly formed Map('type' -> '<format>')
    if (value == null || value.size != 1 || !value.contains("type") || value("type").isEmpty) {
      throw new IllegalArgumentException("Response format map must be of the form Map('type' -> '<format>')"
        + " where <format> is one of " + allowedFormat.mkString(", "))
    }

    // Validate that the format is one of the allowed formats
    if (!allowedFormat.contains(value("type").toLowerCase)) {
      throw new IllegalArgumentException("Response format must be valid for OpenAI API. " +
        "Currently supported formats are " + allowedFormat.mkString(", "))
    }
    setScalarParam(responseFormat, value)
  }

  def setResponseFormat(value: String): this.type = {
    if (value == null || value.isEmpty) {
      this
    } else {
      setResponseFormat(Map("type" -> value.toLowerCase))
    }
  }

  def setResponseFormat(value: OpenAIResponseFormat.ResponseFormat): this.type = {
    setScalarParam(responseFormat, Map("type" -> value.paylodName))
  }

  // Override this field to include the new responseFormat parameter
  override private[openai] val sharedTextParams: Seq[ServiceParam[_]] = Seq(
    maxTokens,
    temperature,
    topP,
    user,
    n,
    echo,
    stop,
    cacheLevel,
    presencePenalty,
    frequencyPenalty,
    bestOf,
    logProbs,
    responseFormat
  )
}
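The two validation steps in `setResponseFormat` — shape check, then membership check — can be mirrored in a small Python sketch. Names here are illustrative (this is not SynapseML's Python wrapper), but the rules and error messages follow the Scala code above.

```python
# Hypothetical re-implementation of the setResponseFormat validation rules.
ALLOWED = {"text", "json_object"}

def validate_response_format(value):
    # Step 1: the map must have exactly the shape {'type': '<format>'}
    # with a single, non-empty entry keyed by 'type'.
    if value is None or len(value) != 1 or "type" not in value or not value["type"]:
        raise ValueError(
            "Response format map must be of the form Map('type' -> '<format>') "
            "where <format> is one of " + ", ".join(sorted(ALLOWED))
        )
    # Step 2: the (case-insensitive) format must be an allowed payload name.
    fmt = value["type"].lower()
    if fmt not in ALLOWED:
        raise ValueError(
            "Response format must be valid for OpenAI API. "
            "Currently supported formats are " + ", ".join(sorted(ALLOWED))
        )
    # The Scala setter stores the validated map on the ServiceParam;
    # here we simply return the normalized map.
    return {"type": fmt}
```

Note the lowercasing: as in the Scala setter, `"JSON_OBJECT"` is accepted and normalized before being stored.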

object OpenAIChatCompletion extends ComplexParamsReadable[OpenAIChatCompletion]

class OpenAIChatCompletion(override val uid: String) extends OpenAIServicesBase(uid)
-   with HasOpenAITextParams with HasMessagesInput with HasCognitiveServiceInput
+   with HasOpenAITextParamsExtended with HasMessagesInput with HasCognitiveServiceInput
with HasInternalJsonOutputParser with SynapseMLLogging {
logClass(FeatureNames.AiServices.OpenAI)

@@ -54,7 +136,7 @@ class OpenAIChatCompletion(override val uid: String) extends OpenAIServicesBase(

override def responseDataType: DataType = ChatCompletionResponse.schema

-   private[this] def getStringEntity(messages: Seq[Row], optionalParams: Map[String, Any]): StringEntity = {
+   private[openai] def getStringEntity(messages: Seq[Row], optionalParams: Map[String, Any]): StringEntity = {
    val mappedMessages: Seq[Map[String, String]] = messages.map { m =>
      Seq("role", "content", "name").map(n =>
        n -> Option(m.getAs[String](n))
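Although the body of `getStringEntity` is truncated in this diff, the visible mapping logic — keep only the `role`, `content`, and `name` fields of each message and drop absent ones before merging in the optional parameters — can be sketched in Python. Plain dicts stand in for Spark `Row`s here, and the function name is hypothetical.

```python
def to_chat_payload(messages, optional_params):
    # For each message, keep only role/content/name and drop fields that
    # are absent (mirroring the Option(...) handling in getStringEntity).
    mapped = [
        {k: m[k] for k in ("role", "content", "name") if m.get(k) is not None}
        for m in messages
    ]
    # Merge the optional parameters (temperature, response_format, ...)
    # alongside the messages, as the request body does.
    return dict(optional_params, messages=mapped)
```

A quick check: a message with `name` unset serializes without a `name` key, which is what the OpenAI chat endpoint expects.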

