Bandwidth refers to the maximum amount of data that can be transmitted over a network in a given period of time. It is usually measured in:
- Bits per second (bps)
- Kilobits per second (Kbps)
- Megabits per second (Mbps)
- Gigabits per second (Gbps)
More bandwidth = More data transmitted in the same amount of time
- In Kafka, a topic can handle more messages per second with Protobuf than with JSON. Example with 10 Mbps of bandwidth: Protobuf sends ~5x more messages per second than JSON!
JSON (100 KB/message) → ~12.5 messages/sec
Protobuf (20 KB/message) → ~62.5 messages/sec
Convert message sizes to kilobits (Kb):
JSON = 100 KB × 8 = 800 Kb
Protobuf = 20 KB × 8 = 160 Kb
Bandwidth = 10 Mbps = 10,000 Kb/s
JSON: 10,000 Kb/s ÷ 800 Kb = 12.5 messages/sec
Protobuf: 10,000 Kb/s ÷ 160 Kb = 62.5 messages/sec
Higher throughput = More efficiency!
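The same arithmetic as a quick Scala check (values copied from the example above):

// Back-of-the-envelope throughput at 10 Mbps
val bandwidthKbps = 10_000.0  // 10 Mbps in Kb/s
val jsonMsgKb     = 100 * 8.0 // 100 KB/message → 800 Kb
val protoMsgKb    = 20 * 8.0  // 20 KB/message → 160 Kb

val jsonPerSec  = bandwidthKbps / jsonMsgKb  // 12.5 messages/sec
val protoPerSec = bandwidthKbps / protoMsgKb // 62.5 messages/sec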
✅ Microservices & Kafka → Handle more messages per second
✅ Cloud Applications → Reduce cloud costs
✅ Mobile & IoT → Improve performance on weak networks
✅ Streaming Systems (gRPC, FS2, Akka Streams) → Lower latency
Compile / sourceGenerators += (Compile / PB.generate).taskValue
This refers to ScalaPB's Protobuf code generator.
- PB.generate is a task that converts .proto files into Scala case classes; it runs before sbt compiles your project.
- += appends a new generator to sourceGenerators, ensuring that the Protobuf-generated files are compiled.
- .taskValue forces sbt to evaluate PB.generate as a task. Without .taskValue, sbt would treat it as a reference instead of running the Protobuf generation task.
Dependency chain: compile depends on sourceGenerators, which depends on PB.generate.
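For completeness, a minimal sbt wiring for this, assuming the sbt-protoc plugin with ScalaPB's compilerplugin (the version numbers shown are illustrative):

// project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

// build.sbt: generate Scala sources from .proto files into src_managed
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)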
schema-evolution-protobuf-scalapb-fs2grpc
libraryDependencies += groupID % artifactID % revision % configuration
Often during development, you may want to use different dependencies in different situations, even within the same module (called a project in sbt). For example, some dependencies are only needed during testing (a test framework, for example), while others are always needed. It would be wasteful to have a test framework mixed into the JAR file you distribute, and you shouldn't do that.
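For example, a test framework can be scoped to the Test configuration so it never ends up in the artifact you distribute (the version shown is illustrative):

libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.18" % Test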
Running Scala 3 in a Container image on AWS Lambda
I don't want to set AWS credentials directly in GitHub Actions, so I want to use an IAM role.
How Lenses and Other Optics Will Change Programming / How to Skip Modeling Complex Data
How to use, structure, and customize Scribe, an advanced logging library for Scala 3
- Model is Saved and Loaded in the Application (Embedded Model)
Pros & Cons:
✅ Low latency (since the model runs in-memory)
✅ No external dependencies
❌ Hard to update dynamically (requires redeployment)
❌ Not scalable for high-traffic systems
- Model is Hosted as a REST API (Model-as-a-Service)
- Fraud detection systems send requests to this service via HTTP/REST or gRPC, e.g. a fraud-detection-api service (see the client sketch below).
Pros & Cons:
✅ Easy to update models without redeploying the entire system
✅ Scalable (runs on cloud platforms like AWS, GCP)
❌ Slightly higher latency due to network calls
❌ Needs infrastructure for hosting
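A minimal client sketch using Java's built-in HTTP client from Scala; the fraud-detection-api endpoint and the JSON payload are hypothetical, made up for illustration:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val client = HttpClient.newHttpClient()
val request = HttpRequest.newBuilder()
  .uri(URI.create("http://fraud-detection-api/predict")) // hypothetical endpoint
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString("""{"amount": 250.0, "country": "DE"}""")) // hypothetical payload
  .build()

val response = client.send(request, HttpResponse.BodyHandlers.ofString())
// response.body() would contain e.g. a JSON fraud score returned by the model service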
show managedSourceDirectories
# terraform/target/scala-3.3.3/src_managed/main
The Compile / sources setting includes both unmanagedSources (manually written) and managedSources (generated by sbt/plugins).
show Compile / sources
In Scala (and Java), you import classes using their package name, not their file-system location.
//Manually Written Code (src/main/scala/com/example/MyClass.scala)
package com.example
class MyClass {
def greet(): String = "Hello"
}
//Importing it in another file
import com.example.MyClass
val obj = new MyClass()
println(obj.greet())
Even though the file is in src/main/scala/com/example/MyClass.scala, you import it using the package name.
// Generated Code (target/scala-2.13/src_managed/main/com/example/GeneratedClass.scala)
//(Generated by sbt or a plugin)
package com.example
class GeneratedClass {
def message(): String = "Generated code works!"
}
//Importing it in your project:
import com.example.GeneratedClass
val gen = new GeneratedClass()
println(gen.message())
Even though the file is in target/.../src_managed, you import it using its package name.
Pointers are variables that store the memory address of another variable rather than the value itself. They are useful for handling memory efficiently and for passing large amounts of data between functions.
malloc(size) is used to request a block of memory.
/** Converts a tuple `(T1, ..., Tn)` to `(F[T1], ..., F[Tn])` */
type Map[Tup <: Tuple, F[_ <: Union[Tup]]] <: Tuple = Tup match {
  case EmptyTuple => EmptyTuple
  case h *: t => F[h] *: Map[t, F]
}
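For example, a compile-time check against the standard library's version of this match type, Tuple.Map:

// Compiles only because Tuple.Map[(Int, String), Option] reduces to (Option[Int], Option[String])
val ev = summon[Tuple.Map[(Int, String), Option] =:= (Option[Int], Option[String])]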
Mirrors give you access to two tuples: for a sum type, MirroredElemLabels is a tuple with the names of the subtypes, while MirroredElemTypes is a tuple with the actual subtypes.
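A small sketch, using a made-up Shape ADT, showing both tuples as compile-time checks:

import scala.deriving.Mirror

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

// Compiles only because the synthesized mirror carries exactly these tuples
val m = summon[Mirror.SumOf[Shape] {
  type MirroredElemLabels = ("Circle", "Square")
  type MirroredElemTypes  = (Circle, Square)
}]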
-Xprint:postInlining prints the entire Scala code that will end up being compiled, after inlining.
inline copies the implementation to the call site. We can also inline the arguments of a function: instead of being evaluated first, before the function call, they are passed as expressions and spliced into the inlined body.
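A tiny sketch of an inline parameter: the argument expression is copied to each use site instead of being evaluated once before the call:

inline def twice(inline x: Int): Int = x + x

def sideEffect(): Int = { println("evaluated"); 1 }

// Expands to sideEffect() + sideEffect(): "evaluated" prints twice
val r = twice(sideEffect())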
metaprogramming
- inline: guarantees that a definition will be inlined at the point of use
- macros: programmatically generate code, using quoting and splicing
- TASTy reflection: inspection and construction of ASTs
- type class derivation
- match types: compute a type at compile time
parameters
- by value: evaluated before the method call
- by name: evaluated every time they are used
- implicit: by name, but in the implicit zone
inline zone
- inline (methods + arguments, if, match)
- magic methods (erasedValue, constValue, summon*, error*)
- transparent
- reflection and quotes
compiletime.erasedValue
- allows reasoning based on the type alone
- cannot be returned as a value; must be inspected in a compile-time expression
- works well with tuples (see the sketch below)
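A classic sketch combining erasedValue with tuples: computing a tuple type's size at compile time, inspecting the erased value inside an inline match rather than returning it:

import scala.compiletime.erasedValue

inline def tupleSize[T <: Tuple]: Int =
  inline erasedValue[T] match
    case _: EmptyTuple => 0
    case _: (_ *: t)   => 1 + tupleSize[t]

val n = tupleSize[(Int, String, Boolean)] // 3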
ScalaPB file-level options let you:
- specify the name of the Scala package to use (the default is the java package name).
- request that ScalaPB not append the proto file name to the package name.
- specify Scala imports, so custom base traits and custom types (see below) do not require the full class name.
package_name sets the Scala base package name; if this is not defined, it falls back to java_package and then to package.
- Setting flat_package to true (default is false) makes ScalaPB not append the proto file base name to the package name. You can also apply this option globally to all files by adding it to your ScalaPB SBT settings.
- The single_file option makes the generator output all messages and enums to a single Scala file.
- The preamble is a list of strings that is output at the top of the generated Scala file. This option requires single_file to be set. It is commonly used to define sealed traits that are extended using (scalapb.message).extends - see custom base traits below and this example.
- The object_name option lets you customize the name of the generated class that contains various file-level members such as descriptors and a list of companion objects for the generated messages and enums. This is useful in case you are running into issues where the generated class name conflicts with other things in your project.
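A sketch of how these options appear in a .proto file (the package and preamble values are made up for illustration):

syntax = "proto3";

import "scalapb/scalapb.proto";

package com.example;

option (scalapb.options) = {
  package_name: "com.example.protos"  // Scala base package
  flat_package: true                  // don't append the proto file base name
  single_file: true                   // all messages/enums in one Scala file
  preamble: "sealed trait DomainEvent" // emitted at the top of the generated file
};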
scala-3-macros-tips-and-tricks
Programs usually work with data in at least two different representations:
- In-memory representation: in memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for efficient access and manipulation by the CPU.
- Data on file and data over the network: when you want to write data to a file or send it over the network, you have to encode it as a self-contained sequence of bytes.
improving-efficiency-linkedins-transition-from-json-to-protocol-buffers
This is necessary because the data structures used in memory, such as objects or pointers, are specific to the programming language and the runtime environment. Encoding transforms the in-memory representation of data into a format that can be easily and efficiently transmitted or stored as a sequence of bytes.
The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshaling), and the reverse is called decoding (parsing, deserialization, unmarshalling).
We have learned that Protobuf messages can be evolved in a way that enables a consumer which only knows about the new version to consume messages created with the old version and vice versa.
In Avro, this is not possible, as the consumer must always know the schema that was used to serialize the message. There are different levels of compatibility which allow different changes; they are explained here.
Systems using Avro usually employ a schema registry where all versions of a schema are stored. Messages must then be prefixed with the identifier of the schema used by the producer to allow the consumer to decode the message.
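A sketch of the framing used by Confluent's schema registry convention: one magic byte, a 4-byte big-endian schema id, then the Avro payload:

import java.nio.ByteBuffer

def frame(schemaId: Int, avroPayload: Array[Byte]): Array[Byte] =
  ByteBuffer.allocate(5 + avroPayload.length)
    .put(0: Byte)      // magic byte
    .putInt(schemaId)  // schema id the consumer looks up in the registry
    .put(avroPayload)  // the Avro-encoded message itself
    .array()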
In proto3, all fields of a message are optional. If a field is not present in a serialized message, the value will be set to a default value by the receiver, except for fields with another message as type. Only the field number and the value are serialized; the mapping from field numbers to names is done by the receiver. This means that the length of the field names has no impact on the size of the serialized message.
The Protobuf encoding requires a different number of bytes to encode a field number depending on its value. Field numbers 1-15 can be encoded in one byte; the following numbers, up to 2047, require two bytes. If your message contains fields that are not used often, consider assigning them field numbers above 15, so that some one-byte field numbers stay available for new, frequently used fields added in later extensions.
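These cutoffs follow from the varint encoding of the tag, (fieldNumber << 3) | wireType, at 7 payload bits per varint byte; a quick check:

// Bytes needed to encode a Protobuf tag as a varint
def tagSizeInBytes(fieldNumber: Int, wireType: Int = 0): Int =
  val tag = (fieldNumber << 3) | wireType
  (32 - Integer.numberOfLeadingZeros(tag max 1) + 6) / 7 // ceil(bitLength / 7)

val a = tagSizeInBytes(15)   // 1 byte
val b = tagSizeInBytes(16)   // 2 bytes
val c = tagSizeInBytes(2047) // 2 bytes
val d = tagSizeInBytes(2048) // 3 bytes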