- references
- GraphQL - when REST API is not enough - lessons learned - Marcin Stachniuk
- GraphQL as an alternative approach to REST with Luis Weir
- Moving beyond REST: GraphQL and Java - Pratik Patel
- 2019 - Marcin Stachniuk - GraphQL - gdy api RESTowe to za mało
- GraphQL in Java World, let's go for a dive - Vladimir Dejanović
- Introducing and Scaling a GraphQL BFF
- Migrate your APIs to GraphQL: how? and why! by Guillaume Scheibel
- Exploring GraphQL-Braid: Leaving RESTish world and building a distributed GraphQL system
- Your GraphQL field guide by Bojan Tomić
- 4Developers Katowice: GraphQL - when REST API is not enough (...) cz. II, Marcin Stachniuk
- Zymposium - Caliban
- Functional Scala - Caliban: Designing a Functional GraphQL Library by Pierre Ricadat
- Caliban: Functional GraphQL Library for Scala by Pierre Ricadat
- SF Scala–A Tour of Caliban: FP GraphQL in Scala & Understanding Scala's Type System
- Spring Boot GraphQL Tutorial - Full Course
- https://ghostdogpr.github.io/caliban
- https://www.manning.com/books/graphql-in-action
- https://medium.com/@ghostdogpr/graphql-in-scala-with-caliban-part-1-8ceb6099c3c2
- https://medium.com/@ghostdogpr/graphql-in-scala-with-caliban-part-2-c7762110c0f9
- https://medium.com/@ghostdogpr/graphql-in-scala-with-caliban-part-3-8962a02d5d64
- https://www.peerislands.io/how-to-use-graphql-to-build-bffs/
- https://graphql.org/
- https://www.apollographql.com/docs
- https://medium.com/the-marcy-lab-school/what-is-the-n-1-problem-in-graphql-dd4921cb3c1a
- https://www.apollographql.com/blog/graphql/explaining-graphql-connections/
- https://www.apollographql.com/blog/graphql/pagination/understanding-pagination-rest-graphql-and-relay
- https://www.sitepoint.com/paginating-real-time-data-cursor-based-pagination/
- https://shopify.dev/api/usage/pagination-graphql
- goals of this workshop
- introduction into GraphQL
- motivation
- schema
- query, mutation, subscription
- data loaders
- introduction into GraphQL
- workshop plan
- task1: implement query for returning all customers (with a size limit)
- task2: implement subscription for deleted customers
- task3: implement getting orders in batch
- BFF - backend for frontend API
- different clients need different sets of data
- Web, Iphone, Android, Tv
- instead of the frontend application aggregating data by calling multiple sources - create a BFF layer
- layer does the following:
- receive request from the client application
- call multiple backend services
- format and response according to what is needed by the client
- respond to the client
- layer does the following:
- pros
- simplify the frontend logic
- avoid over-fetching or under-fetching
- reduce the number of network calls from the client perspective
- different clients need different sets of data
- real world data representation
- best way: graph-like data structure
- data model is usually a graph of objects with relations between them
- why think of data in terms of resources (in URLs) or tables?
- best way: graph-like data structure
- GraphQL = Graph Query Language
- is a new API standard that provides
- more efficient,
- powerful
- flexible
- alternative to REST
- can be written in any programming language
- has two major parts
- structure = strongly typed schema
- schema = graph of fields that have types
- all possible data objects that can be read (or updated)
- client uses the schema to know what are the capabilities of API
- schema = graph of fields that have types
- behavior = resolver functions
- each field in a GraphQL schema is backed by a resolver function
- resolver function defines what data to fetch for its field
- resolver function represents the instructions on how and where to access raw data
- structure = strongly typed schema
- takes the custom endpoint idea to an extreme
- the whole server = single smart endpoint
- multiple round-trip problem
- client has to communicate with the server multiple times to gather all data
- API server knows how to answer questions about a single resource
- solution: GraphQL
- 5 key characteristics
- hierarchical: queries = hierarchies of data definitions
- view-centric: built to satisfy frontend requirements
- strongly-typed: typed context (schema) + queries are executed within this context
- introspective: type system (schema) itself is queryable
- version-free: tools for the continuous evolution
- vs REST
- don't need any documentation like swagger
- REST = over-fetching (to many fields) and under-fetching (many calls to get data you need)
- client cannot specify which fields to select
- REST: each endpoint represents a resource
- multiple network requests
- REST: problem with versioning
- usually means new endpoints
- GraphQL: add new fields and types without removing the old ones
- clients can continue to use older features
- can also incrementally update their code to use new features
- important for mobile clients
- you cannot control the version of the API they are using
- GraphQL offers a way to deprecate (and hide)
- clients can continue to use older features
- REST: turn into a mix of regular REST endpoints plus custom endpoints crafted for performance reasons
- summary
- pros
- decouples clients from servers
- allows both of them to evolve and scale independently
- efficiency
- simplify client logic: client communicate with the GraphQL service
- GraphQL service communicates with the different services
- decouples clients from servers
- cons
- malicious queries
- complex queries
- solution: analyze AST complexity
- large result size
- solution: limit depth
- long execution time
- solution: limit the execution time
- solution: stream the results
- complex queries
- n+1 problem
- solution: data loader
- caching is no longer simple
- network layer - unsuitable as there is a common URL for all operations
- solution: granularity (per field)
- hard to return simple Map
- malicious queries
- pros
- something like swagger: graphql/graphiql
- example
type Starship { id: ID! name: String! // non nullable appearsIn: [Episode!]! // list of objects length(unit: LengthUnit = METER): Float // argument }
- good practices
- usually make the types of fields non-null
- however, make all root fields nullable
- in this case, nullability means that something went wrong but we’re allowing it to show other fields
- scalar
- don't have any sub-fields
- predefined: ID, Boolean, Int, String
- three types of operations
- queries (READ operations)
- mutations (WRITE-then-READ operations)
- queries that have side effects
- subscriptions
- stream of responses
- used for real-time data monitoring
- require the use of a data-transport channel that supports continuous pushing
of data
- usually done with WebSockets
- example
{ hero { name friends { name } } }
- steps
- validate the request against its schema
- traverse the tree of fields and invoke the resolver functions
- fragment
- example
{ leftComparison: hero(episode: EMPIRE) { ...comparisonFields // spread that fragment } rightComparison: hero(episode: JEDI) { ...comparisonFields // spread that fragment } } fragment comparisonFields on Character { name appearsIn friends { name } }
- are the composition units of the language
- are the reusable piece of any GraphQL operation
- split operations into smaller parts
- data required by an application = sum of the data required by individual components
- makes a fragment the perfect match for a component
- represent the data requirements for a single component and then compose them
- example
- is always a WRITE operation followed by a READ operation
- vs queries
- queries will be done in parallel
- mutations sequentially
- if an API consumer sends two mutation fields, the first is guaranteed to finish before the second begins
- example
mutation CreateReviewForEpisode($ep: Episode!, $review: ReviewInput!) { createReview(episode: $ep, review: $review) { stars commentary } } { // variables "ep": "JEDI", "review": { "stars": 5, "commentary": "This is a great movie!" } }
- client should NOT use subscriptions to stay up to date with backend
- use poll intermittently with queries
- re-execute queries on demand when a user performs a relevant action (such as clicking a button)
- use subscriptions for
- small, incremental changes to large objects
- polling for a large object is expensive
- especially when most of the object's fields rarely change
- fetch the object's initial state with a query
- server can proactively push updates to individual fields
- polling for a large object is expensive
- low-latency, real-time updates
- example: a chat application
- small, incremental changes to large objects
- problem
- querying for authors and books
- authors "has many" books
- we would like to achieve two SQL calls
SELECT * FROM authors; -- pretend this returns 3 authors SELECT * FROM books WHERE author_id in (1, 2, 3); -- an array of the author's ids
- query
{ query { // 1 call authors { name books { // each book resolver would only get it’s own parent author: N calls title } } } }
- in REST: ORM will help
- in graphQl: each resolver function really only knows about its own parent object
- ORM won’t have the luxury of a list of author IDs anymore
- result: N+1 calls
SELECT * FROM authors; SELECT * FROM books WHERE author_id in (1); SELECT * FROM books WHERE author_id in (2); SELECT * FROM books WHERE author_id in (3);
- solutions
- batching
- delay asking the database until we will have all appropriate IDs
- caching
- no application-level caching shared among requests
- rather simple memoization in the context of a single request
- example library: DataLoader (JavaScript utility library)
- batching
- responses from REST are easy to cache
- dictionary nature
- specific URL gives certain data
- use the URL itself as the cache key
- dictionary nature
- in graphQl: graph cache
- no URL-like primitive (globally unique identifier for a given object)
- best practice: expose such an identifier for clients to use
{ starship(id:"3003") { // id field provides a globally unique key id name } droid(id:"2001") { id name friends { id name } } }
- two scenarios
- UX concern: too many items to display
- mental overload for the user to see them all at once
- performance concern: too many items to load
- it would overload our backend, the connection, or the client to load all of the items at once
- UX concern: too many items to display
- types of pagination from the UX perspective
- numbered pages
- example: book, Google search
- expect it to be consistent over some period of time
- sql
SELECT * FROM posts ORDER BY created_at LIMIT 10 OFFSET 20; // page 3, with a page size of 10 SELECT COUNT(*) FROM posts; // total number of entries or pages in the results
- drawbacks
- only for mostly static content
- however, usually items are added and removed while the user is navigating
- leads to skipping items
- or displaying the same item twice
- new item was added at the top of the list
- skip and limit approach to show the item at the boundary between pages twice
- new item was added at the top of the list
- sequential pages like Reddit
- aren’t numbered
- content changes so rapidly - no point in page numbers at all
- specify the place in the list we want to begin, and how many items we want to fetch
- it doesn’t matter how many items were added to the top of the list
- we have a constant pointer to the specific spot where we left off
- pointer is called a cursor
- cursor is a piece of data
- generally some kind of ID
- represents a location in a paginated list
- example
SELECT * FROM posts WHERE created_at < $after ORDER BY created_at LIMIT $page_size; https://www.reddit.com/?count=25&after=t3_49i88b
- good practice: encoded cursor with some metadata or a timestamp (instead of a row ID)
- resilient to row deletion
- we don’t want the query to fail if a specific item is removed
- infinite scroll like Twitter
- illusion of one very long page
- modern apps today use either the second or third approach
- app’s content is constantly changing
- doesn’t make sense to create the illusion of numbered pages
- numbered pages
- generic specification for how a GraphQL server should expose paginated data
- generalized concepts we were talking about above
friends(first:2 after:$opaqueCursor) // vs friends(first:2 after:$friendId)
- cursors are opaque and their format should not be relied upon
- suggestion: base64 encoding
- additional flexibility for pagination model changes
- user just uses opaque cursors
- cursors are opaque and their format should not be relied upon
- example
- request
{ user { id name friends(first: 10, after: "opaqueCursor") { edges { // each edge has a reference to the user object of the friend, and a cursor cursor // every item in the paginated list has its own cursor node { id name } } pageInfo { hasNextPage } } } }
- notice that if we want to, we can ask for 10 friends starting from the middle of the list we last fetched
- response
{ "data": { "products": { "pageInfo": { "hasNextPage": true, "hasPreviousPage": false }, "edges": [ { "cursor": "eyJsYXN0X2lkIjoxMDA3OTc4ODg3NiwibGFzdF92YWx1ZSI6IjEwMDc5Nzg4ODc2In0=", "node": { "id": "1", "name": "Michal" } }, { "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5MzQyMCwibGFzdF92YWx1ZSI6IjEwMDc5NzkzNDIwIn0=", "node": { "id": "2", "name": "Marcin" } }, { "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5NDM4MCwibGFzdF92YWx1ZSI6IjEwMDc5Nzk0MzgwIn0=", "node": { "id": "3", "name": "Anna" } } ] } }, ... }
- request
- glossary
- connection - paginated field on an object
- example: friends field on a user
- edge - metadata about one object in the paginated list
- includes a cursor to allow pagination starting from that object
- node - actual object
- pageInfo - info about more pages of data to fetch
- connection - paginated field on an object
- critical threat: resource-exhaustion attacks (DOS attacks) with overly complex queries
- are not specific to GraphQL
- example: query for deeply nested relationships
- (user –> friends –> friends –> friends …)
- example use field aliases to ask for the same field many times
{ empireHero: hero(episode: EMPIRE) { name } jediHero: hero(episode: JEDI) { name } }
- solution
- cost analysis on the query
- enforce limits on the amount of data
- timeouts
- features
- minimize boilerplate
- purely functional (strongly typed, explicit errors)
- user friendly
- schema / resolver separation
- vs sangria
- sangria: lots of boilerplate (macros to the rescue)
- sangria: future based (effects are better)
- sangria: schema and resolved tied together
- schema is derived automatically from the case classes
- mangolia - used for create schema for traits and case classes
- provide a schema for custom types
implicit val nesSchema: Schema[NonEmptyString] = Schema.stringSchema.contramap(_.value)
- many useful types: https://github.com/niqdev/caliban-extras
- provide a schema for custom types
val api = graphQL(resolver) println(api.render) // prints derived schema
- mangolia - used for create schema for traits and case classes
- schema deriving examples
- case classes
is transformed into
case class Pug(name: String, nicknames: List[String], pictureUrl: Option[String])
type: Pug { name: String, nicknames: [String!]!, pictureUrl: String // optionality -> option }
- enums
is transformed into
sealed trait Color case object FAWN extends Color
type: enum Color { FAWN }
- arguments
is transformed into
case class PugName(name: String) case class Queries(pug: PugName => Option[Pug])
type: Queries { pug(name: String!): Pug }
- case classes
- n+1 problem
- solution:
ZQuery
- parallelize queries
- cache identical queries
- batch queries if batching function provided
- solution:
- builtin wrappers
val api = graphQL(...) @@ maxDdepth(30) @@ maxFields(200) @@ timeout(10 seconds) @@ printSlowQueries(1 second)