Bandwidth refers to the maximum amount of data that can be transmitted over a network in a given period of time. It is usually measured in:
- Bits per second (bps)
- Kilobits per second (Kbps)
- Megabits per second (Mbps)
- Gigabits per second (Gbps)
More bandwidth = More data transmitted in the same amount of time
- In Kafka, a topic can handle more messages per second with Protobuf than with JSON. Example with 10 Mbps of bandwidth: Protobuf sends ~5x more messages per second than JSON!
JSON (100 KB/message) → ~12.5 messages/sec
Protobuf (20 KB/message) → ~62.5 messages/sec
Convert message sizes to kilobits (Kb):
JSON = 100 KB × 8 = 800 Kb
Protobuf = 20 KB × 8 = 160 Kb
Bandwidth = 10 Mbps = 10,000 Kb/s
JSON: 10,000 Kb/s ÷ 800 Kb = 12.5 messages/sec
Protobuf: 10,000 Kb/s ÷ 160 Kb = 62.5 messages/sec
Higher throughput = More efficiency!
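The same arithmetic as a quick Scala check (values copied from the example above):

// Back-of-the-envelope throughput at 10 Mbps
val bandwidthKbps = 10_000.0  // 10 Mbps in Kb/s
val jsonMsgKb     = 100 * 8.0 // 100 KB/message → 800 Kb
val protoMsgKb    = 20 * 8.0  // 20 KB/message → 160 Kb

val jsonPerSec  = bandwidthKbps / jsonMsgKb  // 12.5 messages/sec
val protoPerSec = bandwidthKbps / protoMsgKb // 62.5 messages/sec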
✅ Microservices & Kafka → Handle more messages per second
✅ Cloud Applications → Reduce cloud costs
✅ Mobile & IoT → Improve performance on weak networks
✅ Streaming Systems (gRPC, FS2, Akka Streams) → Lower latency
Compile / sourceGenerators += (Compile / PB.generate).taskValue
This refers to ScalaPB's Protobuf code generator.
- PB.generate is a task that converts .proto files into Scala case classes; it runs before sbt compiles your project.
- += appends a new generator to sourceGenerators, ensuring that the Protobuf-generated files are compiled.
- .taskValue forces sbt to evaluate PB.generate as a task. Without .taskValue, sbt would treat it as a reference instead of running the Protobuf generation task.
Dependency chain: compile depends on sourceGenerators, which depends on PB.generate.
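For completeness, a minimal sbt wiring for this, assuming the sbt-protoc plugin with ScalaPB's compilerplugin (the version numbers shown are illustrative):

// project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

// build.sbt: generate Scala sources from .proto files into src_managed
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)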
schema-evolution-protobuf-scalapb-fs2grpc
libraryDependencies += groupID % artifactID % revision % configuration
Often during development, you may want to use different dependencies in different situations, even within the same module (called a project in sbt). For example, some dependencies are only needed during testing (a test framework, for example), while others are always needed. It would be wasteful to have a test framework mixed into the JAR file you distribute, and you shouldn't do that.
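For example, a test framework can be scoped to the Test configuration so it never ends up in the artifact you distribute (the version shown is illustrative):

libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.18" % Test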
Running Scala 3 in a Container image on AWS Lambda
I don't want to set AWS credentials directly in GitHub Actions, so I want to use an IAM role.
How Lenses and Other Optics Will Change Programming / How to Skip Modeling Complex Data
How to use, structure, and customize Scribe, an advanced logging library for Scala 3
- Model is Saved and Loaded in the Application (Embedded Model)
Pros & Cons:
✅ Low latency (since the model runs in-memory)
✅ No external dependencies
❌ Hard to update dynamically (requires redeployment)
❌ Not scalable for high-traffic systems
- Model is Hosted as a REST API (Model-as-a-Service)
- Fraud detection systems send requests to this service via HTTP/REST or gRPC, e.g. a fraud-detection-api service (see the client sketch below).
Pros & Cons:
✅ Easy to update models without redeploying the entire system
✅ Scalable (runs on cloud platforms like AWS, GCP)
❌ Slightly higher latency due to network calls
❌ Needs infrastructure for hosting
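A minimal client sketch using Java's built-in HTTP client from Scala; the fraud-detection-api endpoint and the JSON payload are hypothetical, made up for illustration:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val client = HttpClient.newHttpClient()
val request = HttpRequest.newBuilder()
  .uri(URI.create("http://fraud-detection-api/predict")) // hypothetical endpoint
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString("""{"amount": 250.0, "country": "DE"}""")) // hypothetical payload
  .build()

val response = client.send(request, HttpResponse.BodyHandlers.ofString())
// response.body() would contain e.g. a JSON fraud score returned by the model service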
show managedSourceDirectories
# terraform/target/scala-3.3.3/src_managed/main
The Compile / sources setting includes both unmanagedSources (manually written) and managedSources (generated by sbt/plugins).
show Compile / sources
In Scala (and Java), you import classes using their package name, not their file-system location.
//Manually Written Code (src/main/scala/com/example/MyClass.scala)
package com.example
class MyClass {
def greet(): String = "Hello"
}
//Importing it in another file
import com.example.MyClass
val obj = new MyClass()
println(obj.greet())
Even though the file is in src/main/scala/com/example/MyClass.scala, you import it using the package name.
// Generated Code (target/scala-2.13/src_managed/main/com/example/GeneratedClass.scala)
//(Generated by sbt or a plugin)
package com.example
class GeneratedClass {
def message(): String = "Generated code works!"
}
//Importing it in your project:
import com.example.GeneratedClass
val gen = new GeneratedClass()
println(gen.message())
Even though the file is in target/.../src_managed, you import it using its package name.
Pointers are variables that store the memory address of another variable rather than the value itself. They are useful for handling memory efficiently and for passing large amounts of data between functions.
malloc(size) is used to request a block of memory.
/** Converts a tuple `(T1, ..., Tn)` to `(F[T1], ..., F[Tn])` */
type Map[Tup <: Tuple, F[_ <: Union[Tup]]] <: Tuple = Tup match {
  case EmptyTuple => EmptyTuple
  case h *: t => F[h] *: Map[t, F]
}
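For example, a compile-time check against the standard library's version of this match type, Tuple.Map:

// Compiles only because Tuple.Map[(Int, String), Option] reduces to (Option[Int], Option[String])
val ev = summon[Tuple.Map[(Int, String), Option] =:= (Option[Int], Option[String])]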
Mirrors give you access to two tuples: for a sum type, MirroredElemLabels is a tuple with the names of the subtypes, while MirroredElemTypes is a tuple with the actual subtypes.
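A small sketch, using a made-up Shape ADT, showing both tuples as compile-time checks:

import scala.deriving.Mirror

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

// Compiles only because the synthesized mirror carries exactly these tuples
val m = summon[Mirror.SumOf[Shape] {
  type MirroredElemLabels = ("Circle", "Square")
  type MirroredElemTypes  = (Circle, Square)
}]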
-Xprint:postInlining prints the entire Scala code that will end up being compiled, after inlining.
inline copies the implementation to the call site. We can also inline the arguments of a function: instead of being evaluated first, before the function call, they are passed as expressions and spliced into the inlined body.
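A tiny sketch of an inline parameter: the argument expression is copied to each use site instead of being evaluated once before the call:

inline def twice(inline x: Int): Int = x + x

def sideEffect(): Int = { println("evaluated"); 1 }

// Expands to sideEffect() + sideEffect(): "evaluated" prints twice
val r = twice(sideEffect())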
metaprogramming
- inline: guarantees that a definition will be inlined at the point of use
- macros: programmatically generate code, using quoting and splicing
- TASTy reflection: inspection and construction of ASTs
- type class derivation
- match types: compute a type at compile time
parameters
- by value: evaluated before the method call
- by name: evaluated every time they are used
- implicit: by name, but in the implicit zone
inline zone
- inline (methods + arguments, if, match)
- magic methods (erasedValue, constValue, summon*, error*)
- transparent
- reflection and quotes
compiletime.erasedValue
- allows reasoning based on the type alone
- cannot be returned as a value; must be inspected in a compile-time expression
- works well with tuples (see the sketch below)
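A classic sketch combining erasedValue with tuples: computing a tuple type's size at compile time, inspecting the erased value inside an inline match rather than returning it:

import scala.compiletime.erasedValue

inline def tupleSize[T <: Tuple]: Int =
  inline erasedValue[T] match
    case _: EmptyTuple => 0
    case _: (_ *: t)   => 1 + tupleSize[t]

val n = tupleSize[(Int, String, Boolean)] // 3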
ScalaPB file-level options let you:
- specify the name of the Scala package to use (the default is the java package name).
- request that ScalaPB not append the proto file name to the package name.
- specify Scala imports, so custom base traits and custom types (see below) do not require the full class name.
package_name sets the Scala base package name; if this is not defined, it falls back to java_package and then to package.
- Setting flat_package to true (default is false) makes ScalaPB not append the proto file base name to the package name. You can also apply this option globally to all files by adding it to your ScalaPB SBT settings.
- The single_file option makes the generator output all messages and enums to a single Scala file.
- The preamble is a list of strings that is output at the top of the generated Scala file. This option requires single_file to be set. It is commonly used to define sealed traits that are extended using (scalapb.message).extends - see custom base traits below and this example.
- The object_name option lets you customize the name of the generated class that contains various file-level members such as descriptors and a list of companion objects for the generated messages and enums. This is useful in case you are running into issues where the generated class name conflicts with other things in your project.
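A sketch of how these options appear in a .proto file (the package and preamble values are made up for illustration):

syntax = "proto3";

import "scalapb/scalapb.proto";

package com.example;

option (scalapb.options) = {
  package_name: "com.example.protos"  // Scala base package
  flat_package: true                  // don't append the proto file base name
  single_file: true                   // all messages/enums in one Scala file
  preamble: "sealed trait DomainEvent" // emitted at the top of the generated file
};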
scala-3-macros-tips-and-tricks
Programs usually work with data in at least two different representations:
- In-memory representation: in memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for efficient access and manipulation by the CPU.
- Data on file and data over the network: when you want to write data to a file or send it over the network, you have to encode it as a self-contained sequence of bytes.
improving-efficiency-linkedins-transition-from-json-to-protocol-buffers
This is necessary because the data structures used in memory, such as objects or pointers, are specific to the programming language and the runtime environment. Encoding transforms the in-memory representation of data into a format that can be easily and efficiently transmitted or stored as a sequence of bytes.
The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshaling), and the reverse is called decoding (parsing, deserialization, unmarshalling).
We have learned that Protobuf messages can be evolved in a way that enables a consumer which only knows about the new version to consume messages created with the old version and vice versa.
In Avro, this is not possible, as the consumer must always know the schema that was used to serialize the message. There are different levels of compatibility which allow different changes; they are explained here.
Systems using Avro usually employ a schema registry where all versions of a schema are stored. Messages must then be prefixed with the identifier of the schema used by the producer to allow the consumer to decode the message.
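A sketch of the framing used by Confluent's schema registry convention: one magic byte, a 4-byte big-endian schema id, then the Avro payload:

import java.nio.ByteBuffer

def frame(schemaId: Int, avroPayload: Array[Byte]): Array[Byte] =
  ByteBuffer.allocate(5 + avroPayload.length)
    .put(0: Byte)      // magic byte
    .putInt(schemaId)  // schema id the consumer looks up in the registry
    .put(avroPayload)  // the Avro-encoded message itself
    .array()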
In proto3, all fields of a message are optional. If a field is not present in a serialized message, the value will be set to a default value by the receiver, except for fields with another message as type. Only the field number and the value are serialized; the mapping from field numbers to names is done by the receiver. This means that the length of the field names has no impact on the size of the serialized message.
The Protobuf encoding requires a different number of bytes to encode a field number depending on its value. Field numbers 1-15 can be encoded in one byte; the following numbers, up to 2047, require two bytes. If your message contains fields that are not used often, consider assigning them field numbers above 15, so that some one-byte field numbers stay available for new, frequently used fields added in later extensions.
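These cutoffs follow from the varint encoding of the tag, (fieldNumber << 3) | wireType, at 7 payload bits per varint byte; a quick check:

// Bytes needed to encode a Protobuf tag as a varint
def tagSizeInBytes(fieldNumber: Int, wireType: Int = 0): Int =
  val tag = (fieldNumber << 3) | wireType
  (32 - Integer.numberOfLeadingZeros(tag max 1) + 6) / 7 // ceil(bitLength / 7)

val a = tagSizeInBytes(15)   // 1 byte
val b = tagSizeInBytes(16)   // 2 bytes
val c = tagSizeInBytes(2047) // 2 bytes
val d = tagSizeInBytes(2048) // 3 bytes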