scala-graphql-caliban-workshop

references

preface

goals of this workshop
- introduction into GraphQL
  - motivation
  - schema
  - query, mutation, subscription
  - data loaders
workshop plan
- task1: implement query for returning all customers (with a size limit)
- task2: implement subscription for deleted customers
- task3: implement getting orders in batch

introduction

BFF - backend for frontend API
- different clients need different sets of data
  - Web, Iphone, Android, Tv
- instead of the frontend application aggregating data by calling multiple sources - create a BFF layer
  - layer does the following:
    - receive request from the client application
    - call multiple backend services
    - format and response according to what is needed by the client
    - respond to the client
- pros
  - simplify the frontend logic
  - avoid over-fetching or under-fetching
  - reduce the number of network calls from the client perspective
real world data representation
- best way: graph-like data structure
  - data model is usually a graph of objects with relations between them
- why think of data in terms of resources (in URLs) or tables?

graphql

GraphQL = Graph Query Language
is a new API standard that provides
- more efficient,
- powerful
- flexible
- alternative to REST
can be written in any programming language
has two major parts
- structure = strongly typed schema
  - schema = graph of fields that have types
    - all possible data objects that can be read (or updated)
  - client uses the schema to know what are the capabilities of API
- behavior = resolver functions
  - each field in a GraphQL schema is backed by a resolver function
  - resolver function defines what data to fetch for its field
  - resolver function represents the instructions on how and where to access raw data
takes the custom endpoint idea to an extreme
- the whole server = single smart endpoint
- multiple round-trip problem
  - client has to communicate with the server multiple times to gather all data
  - API server knows how to answer questions about a single resource
  - solution: GraphQL
5 key characteristics
1. hierarchical: queries = hierarchies of data definitions
2. view-centric: built to satisfy frontend requirements
3. strongly-typed: typed context (schema) + queries are executed within this context
4. introspective: type system (schema) itself is queryable
5. version-free: tools for the continuous evolution
vs REST
- don't need any documentation like swagger
- REST = over-fetching (to many fields) and under-fetching (many calls to get data you need)
  - client cannot specify which fields to select
- REST: each endpoint represents a resource
  - multiple network requests
- REST: problem with versioning
  - usually means new endpoints
  - GraphQL: add new fields and types without removing the old ones
    - clients can continue to use older features
      - can also incrementally update their code to use new features
    - important for mobile clients
      - you cannot control the version of the API they are using
    - GraphQL offers a way to deprecate (and hide)
- REST: turn into a mix of regular REST endpoints plus custom endpoints crafted for performance reasons
summary
- pros
  - decouples clients from servers
    - allows both of them to evolve and scale independently
  - efficiency
  - simplify client logic: client communicate with the GraphQL service
    - GraphQL service communicates with the different services
- cons
  - malicious queries
    - complex queries
      - solution: analyze AST complexity
    - large result size
      - solution: limit depth
    - long execution time
      - solution: limit the execution time
      - solution: stream the results
  - n+1 problem
    - solution: data loader
  - caching is no longer simple
    - network layer - unsuitable as there is a common URL for all operations
    - solution: granularity (per field)
  - hard to return simple Map

schema

something like swagger: graphql/graphiql
- https://api.spacex.land/graphql/

example

type Starship {
  id: ID!
  name: String! // non nullable
  appearsIn: [Episode!]! // list of objects
  length(unit: LengthUnit = METER): Float // argument
}

good practices
- usually make the types of fields non-null
- however, make all root fields nullable
  - in this case, nullability means that something went wrong but we’re allowing it to show other fields
scalar
- don't have any sub-fields
- predefined: ID, Boolean, Int, String

operations

three types of operations
- queries (READ operations)
- mutations (WRITE-then-READ operations)
  - queries that have side effects
- subscriptions
  - stream of responses
  - used for real-time data monitoring
  - require the use of a data-transport channel that supports continuous pushing of data
    - usually done with WebSockets

query

example

{
  hero {
    name
    friends {
      name
    }
  }
}

steps
1. validate the request against its schema
2. traverse the tree of fields and invoke the resolver functions
fragment
- example
```
{
  leftComparison: hero(episode: EMPIRE) {
    ...comparisonFields // spread that fragment
  }
  rightComparison: hero(episode: JEDI) {
    ...comparisonFields // spread that fragment
  }
}

fragment comparisonFields on Character {
  name
  appearsIn
  friends {
    name
  }
}
```
- are the composition units of the language
- are the reusable piece of any GraphQL operation
- split operations into smaller parts
- data required by an application = sum of the data required by individual components
  - makes a fragment the perfect match for a component
  - represent the data requirements for a single component and then compose them

mutation

is always a WRITE operation followed by a READ operation
vs queries
- queries will be done in parallel
- mutations sequentially
  - if an API consumer sends two mutation fields, the first is guaranteed to finish before the second begins

example

mutation CreateReviewForEpisode($ep: Episode!, $review: ReviewInput!) {
  createReview(episode: $ep, review: $review) {
    stars
    commentary
  }
}

{ // variables
  "ep": "JEDI",
  "review": {
    "stars": 5,
    "commentary": "This is a great movie!"
  }
}

subscription

client should NOT use subscriptions to stay up to date with backend
- use poll intermittently with queries
- re-execute queries on demand when a user performs a relevant action (such as clicking a button)
use subscriptions for
- small, incremental changes to large objects
  - polling for a large object is expensive
    - especially when most of the object's fields rarely change
  - fetch the object's initial state with a query
    - server can proactively push updates to individual fields
- low-latency, real-time updates
  - example: a chat application

data loaders

problem

querying for authors and books
authors "has many" books

we would like to achieve two SQL calls

SELECT *
FROM authors;
-- pretend this returns 3 authors
SELECT *
FROM books
WHERE author_id in (1, 2, 3); -- an array of the author's ids

query

{
  query { // 1 call
    authors {
      name
      books { // each book resolver would only get it’s own parent author: N calls
        title
      }
    }
  }
}

in REST: ORM will help

in graphQl: each resolver function really only knows about its own parent object

ORM won’t have the luxury of a list of author IDs anymore

result: N+1 calls

SELECT *
FROM authors;

SELECT *
FROM books
WHERE author_id in (1);

SELECT *
FROM books
WHERE author_id in (2);

SELECT *
FROM books
WHERE author_id in (3);

solutions
- batching
  - delay asking the database until we will have all appropriate IDs
- caching
  - no application-level caching shared among requests
  - rather simple memoization in the context of a single request
- example library: DataLoader (JavaScript utility library)

client cache

responses from REST are easy to cache
- dictionary nature
  - specific URL gives certain data
  - use the URL itself as the cache key

in graphQl: graph cache

no URL-like primitive (globally unique identifier for a given object)

best practice: expose such an identifier for clients to use

{
  starship(id:"3003") { // id field provides a globally unique key
    id
    name
  }
  droid(id:"2001") {
    id
    name
    friends {
      id
      name
    }
  }
}

pagination

two scenarios
- UX concern: too many items to display
  - mental overload for the user to see them all at once
- performance concern: too many items to load
  - it would overload our backend, the connection, or the client to load all of the items at once
types of pagination from the UX perspective
- numbered pages
  - example: book, Google search
  - expect it to be consistent over some period of time
  - sql
```
SELECT * FROM posts ORDER BY created_at LIMIT 10 OFFSET 20; // page 3, with a page size of 10
SELECT COUNT(*) FROM posts; // total number of entries or pages in the results
```
  - drawbacks
    - only for mostly static content
    - however, usually items are added and removed while the user is navigating
      - leads to skipping items
      - or displaying the same item twice
        
        new item was added at the top of the list
        
        skip and limit approach to show the item at the boundary between pages twice
- sequential pages like Reddit
  - aren’t numbered
  - content changes so rapidly - no point in page numbers at all
  - specify the place in the list we want to begin, and how many items we want to fetch
    - it doesn’t matter how many items were added to the top of the list
    - we have a constant pointer to the specific spot where we left off
      - pointer is called a cursor
      - cursor is a piece of data
        
        generally some kind of ID
        
        represents a location in a paginated list
  - example
```
SELECT * FROM posts
WHERE created_at < $after
ORDER BY created_at LIMIT $page_size;

https://www.reddit.com/?count=25&after=t3_49i88b
```
  - good practice: encoded cursor with some metadata or a timestamp (instead of a row ID)
    - resilient to row deletion
    - we don’t want the query to fail if a specific item is removed
- infinite scroll like Twitter
  - illusion of one very long page
- modern apps today use either the second or third approach
  - app’s content is constantly changing
  - doesn’t make sense to create the illusion of numbered pages

relay cursor connections

generic specification for how a GraphQL server should expose paginated data
generalized concepts we were talking about above
- friends(first:2 after:$opaqueCursor) // vs friends(first:2 after:$friendId)
  - cursors are opaque and their format should not be relied upon
    - suggestion: base64 encoding
  - additional flexibility for pagination model changes
    - user just uses opaque cursors

example

request

{
  user {
    id
    name
    friends(first: 10, after: "opaqueCursor") {
      edges { // each edge has a reference to the user object of the friend, and a cursor
        cursor // every item in the paginated list has its own cursor
        node {
          id
          name
        }
      }
      pageInfo {
        hasNextPage
      }
    }
  }
}

notice that if we want to, we can ask for 10 friends starting from the middle of the list we last fetched

response

{
  "data": {
    "products": {
      "pageInfo": {
        "hasNextPage": true,
        "hasPreviousPage": false
      },
      "edges": [
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc4ODg3NiwibGFzdF92YWx1ZSI6IjEwMDc5Nzg4ODc2In0=",
          "node": {
            "id": "1",
            "name": "Michal"
          }
        },
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5MzQyMCwibGFzdF92YWx1ZSI6IjEwMDc5NzkzNDIwIn0=",
          "node": {
            "id": "2",
            "name": "Marcin"
          }
        },
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5NDM4MCwibGFzdF92YWx1ZSI6IjEwMDc5Nzk0MzgwIn0=",
          "node": {
            "id": "3",
            "name": "Anna"
          }
        }
      ]
    }
  },
  ...
}

glossary
- connection - paginated field on an object
  - example: friends field on a user
- edge - metadata about one object in the paginated list
  - includes a cursor to allow pagination starting from that object
- node - actual object
- pageInfo - info about more pages of data to fetch

security

critical threat: resource-exhaustion attacks (DOS attacks) with overly complex queries
- are not specific to GraphQL
- example: query for deeply nested relationships
  - (user –> friends –> friends –> friends …)
- example use field aliases to ask for the same field many times
```
{
  empireHero: hero(episode: EMPIRE) {
    name
  }
  jediHero: hero(episode: JEDI) {
    name
  }
}
```
solution
- cost analysis on the query
- enforce limits on the amount of data
- timeouts

caliban

features
- minimize boilerplate
- purely functional (strongly typed, explicit errors)
- user friendly
- schema / resolver separation
vs sangria
- sangria: lots of boilerplate (macros to the rescue)
- sangria: future based (effects are better)
- sangria: schema and resolved tied together
schema is derived automatically from the case classes
- mangolia - used for create schema for traits and case classes
  - provide a schema for custom types
```
implicit val nesSchema: Schema[NonEmptyString] = Schema.stringSchema.contramap(_.value)
```
    - many useful types: https://github.com/niqdev/caliban-extras
```
val api = graphQL(resolver)
println(api.render) // prints derived schema
```

schema deriving examples

case classes

case class Pug(name: String, nicknames: List[String], pictureUrl: Option[String])

is transformed into

type: Pug {
    name: String,
    nicknames: [String!]!,
    pictureUrl: String // optionality -> option
}

enums

sealed trait Color
case object FAWN extends Color

is transformed into

type: enum Color { FAWN }

arguments

case class PugName(name: String)
case class Queries(pug: PugName => Option[Pug])

is transformed into

type: Queries { pug(name: String!): Pug }

n+1 problem
- solution: ZQuery
  - parallelize queries
  - cache identical queries
  - batch queries if batching function provided

builtin wrappers

val api = graphQL(...) @@
  maxDdepth(30) @@
  maxFields(200) @@
  timeout(10 seconds) @@
  printSlowQueries(1 second)

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
project		project
src		src
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scala-graphql-caliban-workshop

preface

introduction

graphql

schema

operations

query

mutation

subscription

data loaders

client cache

pagination

relay cursor connections

security

caliban

About

Releases

Packages

Languages

License

mtumilowicz/scala-graphql-caliban-workshop

Folders and files

Latest commit

History

Repository files navigation

scala-graphql-caliban-workshop

preface

introduction

graphql

schema

operations

query

mutation

subscription

data loaders

client cache

pagination

relay cursor connections

security

caliban

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages