DesignInterviewQuestions.txt
--------------------------------------------------------------------------------
How to Approach Design Interview Questions
1. First ask clarifying questions
a. Functional Questions
i. MVP
ii. P1 Requirements / Features
b. Scalability Questions
i. TPS
ii. Amount of data that needs to be stored
c. Load and Storage Estimation
i. Cost Analysis
2. Add TODOs for:
a. Consistency vs Availability
b. Read-heavy system
c. Write-heavy system
3. Main Part
3a. Things I should take care of / First build a system that will work
a. A basic flow of a non-scalable System
b. How APIs would be
c. How entries would be stored in DB
3b. API Design
STATEFUL vs STATELESS
a. REST/RPC
b. GraphQL etc
c. Websockets if needed
- In Idempotent Way
3c. Backend Design
a. Data Structures that will be involved
b. PUSH vs PULL model
c. Transactions where needed for Consistency
3d. Database
- Store Metadata where needed
a. Is the system READ HEAVY or WRITE HEAVY
b. DB choice
c. DB Schema
d. Sharding
e. Replication
f. Master, Slave and Analytics DB
4. Things to consider for Scalability
AVOID SINGLE POINT OF FAILURE
a. Fault Tolerance
b. Load Balancers to handle load
c. Cache where necessary
d. Handle race conditions
e. Handle synchronization
f. Push vs Pull based model
g. Server vs Serverless Architecture
h. Zookeeper for Configuration management
i. Message Queues and Pub-Sub
j. Background Jobs and Pre-computation
k. Asynchronous processing where needed
l. CDN
1. Cache static content
m. Rate limiting
5. Failure Handling
a. API failure,
b. DB failure,
c. Node failure etc
6. Monitoring
a. Metrics and Dashboards
b. Alarms, Monitors
c. Monitor host health status and automatic spin-up of new hosts
7. Retry Mechanism
8. HTTP Error Codes
9. Encryption / Hashing where needed for critical info
10. Privacy when storing customer data
11. JARGON
Reliable,
Scalable,
Maintainable
Fault Tolerant
Metrics, Dashboards, Monitors and Alarms
Sticky Session and Session Replication
CDN for static content
Heart Beat
Gossip
SPOF Single Point of Failure
--------------------------------------------------------------------------------
System Design Interview Questions
1. How to Design Twitter
1b. How to Design a Trending Algorithm for Twitter
2. Create a TinyURL System
3. Design a Distributed Cache System like Redis or Memcache
4. Design a Key-Value Store
4b. Design a distributed hash table DHT
5. Design a Recommendation System
6. Design a Chat messenger (Whatsapp, FB, Slack)
7. Random ID Generator
8. Design a Garbage Collection System
9. Design Hit Counter
10. Build a Web Crawler
12. Design eCommerce Website
12b. Design Shopping Cart
13. Design a Chess Game
14. Design Tic Tac Toe
15. Design Parking Lot
16. Design Log Storage System
17. Design Mint
18. Design Tinder
19. Design Tiktok
20. Design Instagram
21. Design a Stock Exchange / Trading System like Robinhood
22. Design a Job Scheduler
23. Design Google Doc / Shared whiteboarding tool
--------------------------------------------------------------------------------
Key References:
https://roadtoarchitect.com/2018/09/04/useful-technology-and-company-architecture/ - Contains architecture of various companies
https://roadtoarchitect.com/category/system-design/
https://github.com/prasadgujar/low-level-design-primer/blob/master/solutions.md
https://igotanoffer.com/blogs/tech/system-design-interviews
https://www.algoexpert.io/systems/questions
https://www.educative.io/courses/grokking-the-system-design-interview
https://www.interviewbit.com/courses/system-design/
--------------------------------------------------------------------------------
System Design Interview Questions
1. How to Design Twitter
1. Things to be taken care of
- Have multiple users,
- Each user will post a tweet
- A tweet can be followed by users.
2. You will have,
- Frontend that take care of
- User publishing a tweet and other GUI parts
- Backend that stores all the information
- User details
- Tweet details
3. So basically two objects:
- User object
- Can be owner of multiple tweets
- Can follow tweets
- Tweet Object
- Can be followed by multiple users
4. For any user,
- you will have to fetch the top N tweets posted by the people the user follows
- Arrange the feed by latest time first
Questions:
1. When a user follows a lot of people, fetching and rendering all their feeds can be costly. How to improve this?
- You will fetch only the latest N feeds and publish it and keep a cursor
- On further scroll, you will fetch the next set of feeds
- Keep the latest feeds in Cache
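The cursor idea above can be sketched as keyset pagination. This is a minimal Python sketch; the function name and the ts/id field names are illustrative, and a real system would push the cursor down into the DB query rather than scan in memory:

```python
def fetch_feed(tweets, cursor=None, limit=20):
    """Keyset pagination over a feed sorted newest-first by (ts, id).

    cursor is the (ts, id) pair of the last tweet the client saw;
    None means start from the top of the feed.
    """
    if cursor is None:
        start = 0
    else:
        # Resume right after the last-seen tweet.
        start = next((i + 1 for i, t in enumerate(tweets)
                      if (t["ts"], t["id"]) == cursor), len(tweets))
    page = tweets[start:start + limit]
    # Return the cursor the client sends back for the next scroll.
    next_cursor = (page[-1]["ts"], page[-1]["id"]) if page else None
    return page, next_cursor
```

In a real backend the same idea becomes `WHERE (ts, id) < :cursor ORDER BY ts DESC LIMIT :n`, which is why a (ts, id) cursor scales better than page numbers.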
2. How to detect fake users?
- Use machine learning algorithms and do analysis based on date, followers etc
3. Can we order feed by other algorithms? i.e. Based on user interest instead of time
- Also use other factors to order instead of just time
4. How to implement the @ feature and retweet feature?
@feature:
- Have a list of all followers who the user can tag.
- Prefix tree data structure to get it
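The prefix-tree idea for the @ feature can be sketched like this (a minimal Python sketch; class and method names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # True if a handle ends at this node

class Trie:
    """Prefix tree over a user's followers, for @-mention autocomplete."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def complete(self, prefix, limit=10):
        """Return up to `limit` handles starting with `prefix`."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []

        def dfs(n, path):
            if len(out) >= limit:
                return
            if n.is_end:
                out.append(prefix + path)
            for ch in sorted(n.children):
                dfs(n.children[ch], path + ch)

        dfs(node, "")
        return out
```

Insert each follower's handle once; `complete(prefix)` then drives the typeahead list as the user types after the `@`.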
---------------------------------------------
1b. How to Design a Trending Algorithm for Twitter
---------------------------------------------
2. Create a TinyURL System
http://blog.gainlo.co/index.php/2016/03/08/system-design-interview-question-create-tinyurl-system/?utm_source=email&utm_medium=email&utm_campaign=email
https://leetcode.com/discuss/interview-question/124658/Design-a-URL-Shortener-(-TinyURL-)-System/
https://medium.com/@sandeep4.verma/system-design-scalable-url-shortener-service-like-tinyurl-106f30f23a82 - Good one
APPROACH:
1. Questions to Interviewer:
1. Mention the basic functionality as primary goals
2. Secondary goals like or Open questions that I will focus later
- Ability to provide custom url
- Ability to get analytics
- Can the tiny url expire
- Can we return same / different url for multiple users
3. Scalability questions:
- How many queries per second can the system get?
Things to Calculate:
0. Estimate the number of characters for the Tiny URL
1. What will be the total DB size
API:
POST https://tinyurl.com/shortenservice/api/tinyurl
Request Body: {url=long_url}
GET: /{short_url}
Returns an HTTP redirect response (302)
Database:
User DB
- User Id
PK
- Name
- Email
- Account Creation Time
Tiny Url DB
- User Id
FK
- Short url (Index)
- Long url (Index)
1. Open Questions:
1. If you use a unique integer from a DB table to generate a short URL, how will you guarantee the same short URL gets generated for the same long URL?
a. Should you add any check sum / salt so that users can't predict the tiny URL you would give?
2. Should two users get same short url for a long url or not?
a. Can we have multiple long urls map to same short url?
3. What if your DB crashes?
2. Important Points:
- Avoid confusing characters
- Avoid swear words
https://stackoverflow.com/questions/1562367/how-do-short-urls-services-work
https://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
https://stackoverflow.com/questions/11326598/how-do-url-shorteners-guarantee-unique-urls-when-they-dont-expire
Others have answered how the redirects work but you should also know how they generate their tiny urls. You'll mistakenly hear that they create a hash of the URL in order to generate that unique code for the shortened URL. This is incorrect in most cases, they aren't using a hashing algorithm (where you could potentially have collisions).
Most of the popular URL shortening services simply take the ID in the database of the URL and then convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 (case sensitive).
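The ID-to-Base-62 conversion described above can be sketched as follows (a minimal Python sketch; the alphabet ordering is a choice, not a standard):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n):
    """Convert a non-negative integer DB ID into a Base-62 short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(s):
    """Convert a Base-62 short code back into the integer DB ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

The DB hands out a unique auto-increment ID; `encode(id)` is the short code, and `decode(code)` recovers the ID for the redirect lookup, so no collision handling is needed.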
a. Read Dynamo: Amazons Highly Available Key-value Store
Problem:
- Given a smaller URL you should find the corresponding bigger URL
- Given a bigger URL you should convert it into a smaller URL
- Hash bigger URL into a smaller URL of just few chars
- A-Z, a-z, 0-9 = 62chars.
- With size 7, 62^7 ≈ 3,500 billion combinations
Solution:
- We don't need two hash tables
Hash1: Key: smallUrl, Val: bigUrl
- If someone gives the same bigURL again, we hash it and check if we already have
that key. If so we don't add it.
Problem:
1. You have so many millions of URLs that you can't store the hash table in memory.
2. You want to store the hash table in multiple files so that you can spread the load
Now how will you decide which URL goes to which file?
Solution 1:
- Instead of a random hash, use a sequentially increasing ID
- This way we can say, 1 - 50 in file1, 51 - 100 in file2 etc
Problem:
- What if IDs get deleted from the system
- If you keep removing multiple entries from say file 1, how will you add entries back in file1
Solution 2:
- Store entries in distributed key-value store
- Very similar to database sharding.
- A system in front to determine where each key, value pair will be stored.
- Say keys that end with a particular number go to a shard
Further Questions:
1. Replication to prevent disasters?
How to keep read and writes consistent?
2. Concurrency?
Multiple users inserting the same URL
A better way than global lock?
---------------------------------------------
3. Design a Distributed Cache System like Redis or Memcache
https://stackoverflow.com/questions/27268182/how-does-redis-achieve-the-high-throughput-and-performance
https://stackoverflow.com/questions/10489298/redis-is-single-threaded-then-how-does-it-do-concurrent-i-o
https://stackoverflow.com/questions/48035646/if-redis-is-single-threaded-how-can-it-be-so-fast
https://stackoverflow.com/questions/45364256/why-redis-is-single-threadedevent-driven
https://stackoverflow.com/questions/21304947/redis-performance-on-a-multi-core-cpu
https://www.quora.com/Why-isnt-Redis-designed-to-benefit-from-multi-threading
REDIS - Single threaded approach
1.
There can be multiple reasons to do so.
Ease of programming - Writing a multi-threaded program can be trickier. Sometimes multi-threading may not work, locks can block other threads.
Concurrency helps - Concurrency can be achieved on single threaded system. See [1]
CPU is not bottleneck - Usually network is bottleneck. CPUs are very fast. If application is designed right, i.e. avoiding blocking IO, threading will be near the bottom of the list to worry about.
Cost effective deployment - Such applications can work on any machine having at least a single CPU core.
2.
I'm currently trying to understand some basic implementation details of Redis. I know that Redis is single-threaded and I have already stumbled upon the following question: Redis is single-threaded, then how does it do concurrent I/O?
But I still think I didn't understand it right. AFAIK Redis uses the reactor pattern with one single thread. So if I understood this right, there is a watcher (which handles FDs / incoming / outgoing connections) who delegates the work to its registered event handlers. They do the actual work and set e.g. their responses as an event to the watcher, who transfers the response back to the clients. But what happens if a request (R1) of a client takes, say, about 1 minute? Another client creates another (fast) request (R2). Then - since Redis is single-threaded - R2 cannot be delegated to the right handler until R1 is finished, right? In a multithreaded environment you could just start each handler in its own thread, so the "main" thread is just accepting and responding to I/O connections and all other work is carried out in other threads.
3.
Redis is single-threaded with epoll/kqueue and scales indefinitely in terms of I/O concurrency. --@antirez (creator of Redis)
A reason for choosing an event-driven approach is that synchronization between threads comes at a cost in both the software (code complexity) and the hardware level (context switching). Add to this that the bottleneck of Redis is usually the network, not the CPU. On the other hand, a single-threaded architecture has its own benefits (for example the guarantee of atomicity).
Therefore event loops seem like a good design for an efficient & scalable system like Redis.
Also, if yes, how can we make 100% utilization of CPU resources with Redis on multi-core CPUs?
The Redis approach to scale over multiple cores is sharding, mostly together with Twemproxy.
https://github.com/memcached/memcached/wiki/Overview
MEMCACHE:
Memcached servers are indeed independent of each other. Memcached server is just an efficient key-value store implemented as in-memory hash table. What makes memcached distributed is the client, which in most implementations can connect to a pool of servers. Typical implementations use consistent hashing, which means that when you add or remove server to/from a pool of N servers, you only have to remap 1/N keys. Typically keys are not duplicated on various hosts, as memcached is not meant to be persistent store and gives no guarantees that your value will persist (for example when running out of assigned memory, memcached server drops least recently used (LRU) elements). Thus it's assumed that your application should handle missing keys.
if a cache goes down, requests go to DB
If you happen to use memcached to cache results of DB queries. Which is one of many possible uses. Memcached is just generic key-value store, can be used for other purposes.
Memcached supports multithreaded access to the store. It controls access to the resources via bare POSIX thread mutexes. Operations with hash table buckets are guarded with one of the pthread_mutex objects in the power-of-two sized array. Size of this array couldn't be smaller than hash table size. Index of the mutex for the bucket is determined as bucket index % bucket mutex array size. I. e. each mutex is responsible for hash table size / bucket mutex array size buckets.
Concurrent access management
While the bucket lock is well striped, the other three locks (memory allocator, LRU cache and statistics) are single, which limits scalability to only about 5 threads.
At very least, having allocator and LRU cache locks per slabclass should help in some cases, and should be easy to implement, because slabclasses don't share state.
Mutex locks are slow, because they cause two context switches. Context switch cost estimates vary in a wide range, depending on if this is just a in-process thread switch, or an OS-level process switch (requiring much more kernel operations and TLB cache flush), CPU model, workload, if the server is dedicated for Memcached or have other intensively-working processes, and so on.
The broadest estimate, is that context switch takes from 1 to 50 microseconds, i. e. from 1000 to 50 000 nanoseconds.
The process of the entry insertion
Compute hash code of the key
Acquire corresponding bucket lock
Try to find the entry with the searched key in the table; if it is present, updates the entry correspondingly.
Otherwise compute the total entry size (key + value + overhead).
Determine closest ceiling entry size class (slabclass) for our entry size
Walk through the 5 (hard-coded constant) last entries in the LRU chain for this size class, searching for expired entries. After each unsuccessful visit, try to allocate the new entry via the ordinary process, through the slab allocator. Of course, the search is continued only if we failed to allocate the entry, i.e. we are running out of available memory.
If an expired entry is taken, it is unlinked from old hash table bucket and repositioned in the LRU list
The newly allocated one is unlinked from the slabclass free entry list and inserted in front of the LRU chain
Entry contents are written and it becomes the head of the needed hash table bucket chain.
Like memory allocation operations, all operation with per-size-class LRU lists are guarded by a single pthread_mutex (cache_lock).
http://qnimate.com/overview-of-redis-architecture/
Redis Replication
Replication is a technique involving many computers to enable fault tolerance and data accessibility. In a replication environment many computers share the same data with each other, so that even if a few computers go down, all the data will still be available.
Master and slaves are redis servers configured as such.
All the slaves contain exactly the same data as the master. There can be many slaves per master server. When a new slave is added to the environment, the master automatically syncs all data to the slave.
All the queries are redirected to the master server, which then executes the operations. When a write operation occurs, the master replicates the newly written data to all slaves. When a large number of sort or read operations are made, the master distributes them to the slaves so that many read and sort operations can be executed at a time.
If a slave fails, the environment continues working. When the slave starts working again, the master sends the updated data to the slave.
If there is a crash in the master server and it loses all data, you should convert a slave to master instead of bringing in a new computer as master. If we make a new computer the master, all data in the environment will be lost, because the new master will have no data and will make the slaves resync to zero data as well. If the master fails but the data is persistent (disk not crashed), then starting the same master server up again will bring the whole environment back to running mode.
Replication protects us from disk failures and other kinds of hardware failures. It also helps execute multiple read/sort queries at a time.
https://redis.io/presentation/Redis_Cluster.pdf
https://redis.io/topics/cluster-tutorial
https://redis.io/topics/replication
Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot.
There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:
Node A contains hash slots from 0 to 5500.
Node B contains hash slots from 5501 to 11000.
Node C contains hash slots from 11001 to 16383.
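The slot computation quoted above (CRC16 of the key modulo 16384) can be sketched as below. Note this is a sketch of the plain case only: the real Redis implementation also honors hash tags (`{...}` in keys) so related keys can be forced into the same slot.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-XMODEM (poly 0x1021, init 0), the CRC16 variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left; XOR the polynomial in when the top bit falls out.
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of the 16384 Redis Cluster hash slots."""
    return crc16_xmodem(key.encode()) % 16384
```

Each node then serves a contiguous range of slot numbers, as in the A/B/C example above.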
Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves nodes).
In our example cluster with nodes A, B, C and no replicas, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000.
However, if when the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of master nodes A, B, C and slave nodes A1, B1, C1, the system is able to continue if node B fails.
Since node B1 replicates B, if B fails the cluster will promote node B1 as the new master and will continue to operate correctly.
However, note that if nodes B and B1 fail at the same time, Redis Cluster is not able to continue to operate.
Redis Cluster consistency guarantees
Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
The first reason why Redis Cluster can lose writes is because it uses asynchronous replication. This means that during writes the following happens:
Your client writes to the master B.
The master B replies OK to your client.
The master B propagates the write to its slaves B1, B2 and B3.
As you can see B does not wait for an acknowledge from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis, so if your client writes something, B acknowledges the write, but crashes before being able to send the write to its slaves, one of the slaves (that did not receive the write) can be promoted to master, losing the write forever.
Replication
At the base of Redis replication there is a very simple to use and configure master-slave replication that allows slave Redis servers to be exact copies of master servers. The slave will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
When a master and a slave instance are well-connected, the master keeps the slave updated by sending a stream of commands in order to replicate the effects on the dataset happening in the master dataset: client writes, keys expiring or evicted, and so forth.
When the link between the master and the slave breaks, for network issues or because a timeout is sensed in the master or the slave, the slave reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the slave will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the slave, and then continue sending the stream of commands as the dataset changes.
Redis uses asynchronous replication by default, which, being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases. However, Redis slaves asynchronously acknowledge the amount of data they have received periodically with the master.
MEMCACHE:
Memcached servers do not communicate with each other and in fact, a Memcached server is completely blind to which objects are stored on it, not to mention other servers. This simple architecture enables Memcached to be very fast and effective, but comes with poor reliability, which is unacceptable by most web applications today.
There is no master node, all nodes are equal, there is no replication, and node selection is done by the client hashing algorithm.
Smarter clients use consistent hashing to avoid losing the entire data while scaling. So if you scale out (i.e. adding nodes) you lose "only" 1/(N+1) of the objects, where N is the number of nodes after you scaled-out.
What memcache normally does is a simple, yet very effective load-balance trick: for each key that gets stored or fetched, it will create a hash (you might see it as md5(key), but in fact it's a more specialized - quicker - hash method). Now, the hashes we create are pretty much evenly distributed, so we can use a modulus function to find out which server to store the object on.
In PHP-ish code, it would do something like this:
$server_id = hashfunc($key) % $servercount;
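The modulus trick above remaps almost every key when the server count changes; the consistent hashing used by smarter clients limits remapping to roughly the removed node's share. A minimal Python sketch (class name and the vnode count are illustrative; virtual nodes smooth the key distribution):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to servers on a hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []
        for node in nodes:
            for i in range(vnodes):
                # Place `vnodes` points per server on the ring.
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        """A key belongs to the first ring point clockwise from its hash."""
        h = self._hash(key)
        idx = bisect.bisect(self.keys, h) % len(self.ring)  # wrap around
        return self.ring[idx][1]
```

Because a server's departure removes only its own ring points, every key whose owner is still present keeps the same owner - that is the 1/N remapping property the text describes.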
---------------------------------------------
4. Design a Key-Value Store
http://blog.gainlo.co/index.php/2016/06/14/design-a-key-value-store-part-i/?utm_source=email&utm_medium=email&utm_campaign=email
Consistency when you have replica for disaster recovery:
1. Have a commit log
---------------------------------------------
4b. Design a distributed hash table DHT
https://stackoverflow.com/questions/144360/simple-basic-explanation-of-a-distributed-hash-table-dht
http://www.eecs.harvard.edu/~mema/courses/cs264/cs264.html
Ok, they're fundamentally a pretty simple idea. A DHT gives you a dictionary-like interface, but the nodes are distributed across the network. The trick with DHTs is that the node that gets to store a particular key is found by hashing that key, so in effect your hash-table buckets are now independent nodes in a network.
This gives a lot of fault-tolerance and reliability, and possibly some performance benefit, but it also throws up a lot of headaches. For example, what happens when a node leaves the network, by failing or otherwise? And how do you redistribute keys when a node joins so that the load is roughly balanced. Come to think of it, how do you evenly distribute keys anyhow? And when a node joins, how do you avoid rehashing everything? (Remember you'd have to do this in a normal hash table if you increase the number of buckets).
One example DHT that tackles some of these problems is a logical ring of n nodes, each taking responsibility for 1/n of the keyspace. Once you add a node to the network, it finds a place on the ring to sit between two other nodes, and takes responsibility for some of the keys in its sibling nodes. The beauty of this approach is that none of the other nodes in the ring are affected; only the two sibling nodes have to redistribute keys.
For example, say in a three node ring the first node has keys 0-10, the second 11-20 and the third 21-30. If a fourth node comes along and inserts itself between nodes 3 and 0 (remember, they're in a ring), it can take responsibility for say half of 3's keyspace, so now it deals with 26-30 and node 3 deals with 21-25.
There are many other overlay structures such as this that use content-based routing to find the right node on which to store a key. Locating a key in a ring requires searching round the ring one node at a time (unless you keep a local look-up table, problematic in a DHT of thousands of nodes), which is O(n)-hop routing. Other structures - including augmented rings - guarantee O(log n)-hop routing, and some claim to O(1)-hop routing at the cost of more maintenance.
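The ring walk-through above can be made concrete with a toy ring where a key is owned by the first node clockwise from its position (the positions and keyspace size are illustrative):

```python
import bisect

SPACE = 2 ** 32  # size of the circular keyspace (illustrative)

class Ring:
    """Toy DHT ring: a key is owned by the first node at or after its position."""
    def __init__(self, node_positions):
        self.nodes = sorted(node_positions)

    def owner(self, key_position):
        i = bisect.bisect_left(self.nodes, key_position % SPACE)
        return self.nodes[i % len(self.nodes)]  # wrap around past the last node

    def join(self, position):
        # Only keys in (predecessor, position] move, and they come from
        # the successor node - no other node is affected.
        bisect.insort(self.nodes, position)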
---------------------------------------------
5. Design a Recommendation System
---------------------------------------------
6. Design a Chat messenger (Whatsapp, FB, Slack)
https://towardsdatascience.com/ace-the-system-interview-design-a-chat-application-3f34fd5b85d0
https://medium.com/double-pointer/system-design-interview-facebook-messenger-whatsapp-slack-discord-or-a-similar-applications-47ecbf2f723d
https://sookocheff.com/post/networking/how-do-websockets-work/
https://stackoverflow.com/questions/12526265/loadbalancing-web-sockets
Features:
Direct messaging: two users can chat with each other
Group chat: users can participate in group conversations
Join/leave groups, (add/delete friends not important for Slack)
Typing indicator: when typing, the recipient gets notified
User status: whether you are online or offline
When the user is offline, try to send a notification to the user's mobile device if a new message arrives.
Database Choice:
- No JOIN is needed between the tables
Data Access Patterns
READ
- For two users (A, B) retrieve all recent messages
- Given group G, retrieve all messages after a certain timestamp
- Given group G, find all member ID
WRITE
- For two users (A, B) save the chat message
- Group Chat
- Save a new message by user A in group G
- User updates
- Add/delete user A to/from group G
Schema
Every single message will be stored as a separate entry
- Private Chat Table
  User Id Src | User Id Dst | Message Id | Timestamp | Message Status
  Partition Key: User Id Src, User Id Dst
  Sort Key: Message Id
- Group Chat
  Group Id | User Id Src | Message Id | Timestamp | Message Status
  Partition Key: Group Id
  Sort Key: Message Id
- Message DB
  Message Id | Message
  Partition Key: Group Id
- User Members Table
  User Id | Group Id
- Group Members Table
  Group Id | List of User Ids
- User Online Status
  User Id | Status | Last Active Timestamp
MESSAGE ID
- message ID is used to determine the ordering. The message Id is not globally unique, as its scope is determined by the partition key. The system will never retrieve an item by its message ID alone.
REASON for two tables of USER TABLE and GROUP TABLE
The Group Membership table is used for message broadcasting: we need to figure out who gets the message. The User Membership table is for listing all the groups a user has joined. We could use a single table with a secondary index, but the cardinality of group ID / user ID is too large for a secondary index. The two-table approach isn't without problems either: if we mutate one, the other must be modified to maintain consistency, which requires distributed transactions.
- User Profile Table
- User Id | Status | Profile Pic | Name | List of Friends, etc.
API Design
send_message(user_id, receiver_id, channel_type, message)
get_messages(user_id, user_id2, channel_type, earliest_message_id)
join_group(user_id, group_id)
leave_group(user_id, group_id)
get_all_group(user_id)
How to order the messages
- To solve the ordering issue, we can annotate every message with a prevMsgID field. The recipient checks his local log and initiates history catch-ups when an inconsistency is found.
Optimizations
- To reduce the number of messages exchanged between servers, we can implement a buffering algorithm on the WebSocket servers: send accumulated messages every ~50 ms, with randomized offsets (preventing everyone from sending messages at the same time).
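The ~50 ms buffering idea can be sketched with asyncio (a minimal sketch; the class name, interval and jitter scheme are illustrative assumptions):

```python
import asyncio
import random

class MessageBuffer:
    """Accumulate outbound messages and flush them as one batch every
    ~interval_ms, with random jitter so servers don't all flush at once."""
    def __init__(self, send_batch, interval_ms=50):
        self.send_batch = send_batch   # async callable taking a list of messages
        self.interval_ms = interval_ms
        self.pending = []

    def add(self, msg):
        self.pending.append(msg)

    async def run(self):
        while True:
            # Randomized offset spreads the flushes across servers.
            jitter = random.uniform(0, self.interval_ms / 2)
            await asyncio.sleep((self.interval_ms + jitter) / 1000)
            if self.pending:
                batch, self.pending = self.pending, []
                await self.send_batch(batch)
```

Each WebSocket server would run one such buffer per peer server, trading up to ~50 ms of delivery latency for far fewer inter-server messages.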
Group Messages Push vs Pull vs Hybrid
- Push
- The problem with pushing is that it converts one external request (one message) into many internal messages. This is called Write Amplification. If the group is large and active, pushing group messages will take up a tremendous amount of bandwidth.
- Pull
- The problem with pulling is that one message is read over and over again by different clients (Read Amplification). Going along with this approach will surely overwhelm the database.
- Hybrid
- My idea is that we can build a hybrid system with the above logic. For smaller or inactive groups, it is okay to push, since write amplification won't stress the servers. For very active and large groups, clients must query the HTTP server regularly for messages.
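The hybrid decision can be as simple as a threshold check per group. The cutoff values below are made-up knobs; real values would come from load testing.

```python
# Assumed tunables, not from any real system
PUSH_MAX_MEMBERS = 500        # small groups: eager fan-out is cheap
PUSH_MAX_MSGS_PER_MIN = 100   # quiet groups: write amplification is low

def delivery_mode(member_count, msgs_per_min):
    """Return 'push' or 'pull' for a group based on its size and activity."""
    if member_count <= PUSH_MAX_MEMBERS and msgs_per_min <= PUSH_MAX_MSGS_PER_MIN:
        return "push"
    return "pull"
```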
DB Optimizations
- Use BATCH Writes
---------------------------------------------
7. Random ID Generator
http://blog.gainlo.co/index.php/2016/06/07/random-id-generator/?utm_source=email&utm_medium=email&utm_campaign=email
https://medium.com/@varuntayal/what-does-it-take-to-generate-cluster-wide-unique-ids-in-a-distributed-system-d505b9eaa46e
a. See how Twitter's Snowflake works?
b. Check Flickr's ticket servers
Problem:
You have to design an ID generation engine
- Is the ID an integer-like number or a random ID?
- If it is an integer-like number then
- Similar to generating a unique number among 4 billion ints
- Else
- Combine timestamp with some unique identifier of the machine that sends the request.
- Final ID = timestamp + serverID + counter
We can also allow multiple requests within a single timestamp on a single server.
We can keep a counter on each server, which indicates how many IDs have been generated in the current timestamp.
So the final ID is a combination of timestamp, serverID and the counter.
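A sketch of the timestamp + serverID + counter scheme. The bit widths (41/10/12, roughly Snowflake-like) are one reasonable choice, not a fixed spec, and the code assumes the clock never moves backwards.

```python
import threading
import time

class IdGenerator:
    """IDs packed as: 41-bit ms timestamp | 10-bit server id | 12-bit counter.
    Bit widths are illustrative; assumes a monotonically advancing clock."""
    SERVER_BITS = 10
    COUNTER_BITS = 12

    def __init__(self, server_id):
        assert 0 <= server_id < (1 << self.SERVER_BITS)
        self.server_id = server_id
        self.counter = 0
        self.last_ts = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ts = int(time.time() * 1000)
            if ts == self.last_ts:
                self.counter = (self.counter + 1) % (1 << self.COUNTER_BITS)
                if self.counter == 0:
                    # counter exhausted in this millisecond: spin to the next one
                    while ts <= self.last_ts:
                        ts = int(time.time() * 1000)
            else:
                self.counter = 0
            self.last_ts = ts
            return (ts << (self.SERVER_BITS + self.COUNTER_BITS)) \
                   | (self.server_id << self.COUNTER_BITS) \
                   | self.counter
```

Because the timestamp sits in the most significant bits, IDs from one server sort by generation time.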
Questions:
1. What if you get multiple requests (millions of requests)
2. How to scale the machine?
3. How to tackle Clock Synchronization?
https://stackoverflow.com/questions/2671858/distributed-sequence-number-generation
OK, this is a very old question, which I'm first seeing now.
You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criteria (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.
Unique IDs are another matter, there are several good ways of generating unique IDs in a decentralized manner:
a) You could use Twitter's Snowflake ID network service. Snowflake is a:
Networked service, i.e. you make a network call to get a unique ID;
which produces 64 bit unique IDs that are ordered by generation time;
and the service is highly scalable and (potentially) highly available; each instance can generate many thousand IDs per second, and you can run multiple instances on your LAN/WAN;
written in Scala, runs on the JVM.
b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of:
The most significant 40 or so bits: A timestamp; the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sort-able by generation time.)
The next 14 or so bits: A per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamps) do not overlap.
The last 10 or so bits: A unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
c) You could generate the IDs on the clients, using just a timestamp and random value. This avoids the need to know all generators, and assign each generator a unique value. On the flip side, such IDs are not guaranteed to be globally unique, they're only very highly likely to be unique. (To collide, one or more generators would have to create the same random value at the exact same time.) Something along the lines of:
The most significant 32 bits: Timestamp, the generation time of the ID.
The least significant 32 bits: 32-bits of randomness, generated anew for each ID.
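Option (c) is tiny to implement. A sketch; note the IDs are only probabilistically unique, as the answer says.

```python
import secrets
import time

def random_time_id():
    """64-bit id: high 32 bits = seconds-resolution timestamp,
    low 32 bits = fresh randomness. Unique only with high probability."""
    return (int(time.time()) << 32) | secrets.randbits(32)
```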
d) The easy way out, use UUIDs / GUIDs.
---------------------------------------------
8. Design a Garbage Collection System
---------------------------------------------
9. Design Hit Counter
---------------------------------------------
10. Build a Web Crawler
Questions:
0. Purpose of crawler
1. Where to begin
a. Do we have seed URLs
2. Content Type
High Level Design
1. DFS vs BFS
a. DFS:
Go too deep
b. BFS
Can't give priority
2. Scalability
a. Message Queue per host
b. Distributed Crawl
c. Cache DNS
d. Locality / Geographic partitioning
e. Early timeout if servers fail to respond
Deep Design:
1. Forward queue
- manage priority
2. Backward queue
- manage politeness
Things for Future:
1. URL Politeness
1 page at a time
Robots.txt
2. URL Freshness
3. URL Prioritization
P1:
a. Spider Trap
b. Usefulness of a page
c. Duplicate content
Side components: DNS (can be cached), Content Storage DB

Seed URLs ---> URL Feeder Service ---> URL Message Queue ---> Page Lookup Service (set of workers) ---> Web Page Crawler ---> Content Validator (checks duplicates) ---> Link Extractor
                      ^                                                                                                                                                     |
                      |                                                                                                                                                     v
                      +------------------------------------ URL DB <------------------------------ URL Validator <------------------------------------------------- URL Filter
URL Feeder Service :
- Inserts into the queue
- Can determine the priority etc
IMP:
- Add delay for requeries
- Robots.txt
Page Lookup Service
- Do we need to store the entire web page data?
- Why not just extracted metadata
Content Validator
- Checksum, Fingerprint etc
---------------------------------------------
---------------------------------------------
12. Design eCommerce Website
---------------------------------------------
12b. Design Shopping Cart
---------------------------------------------
13. Design a Chess Game
https://www.geeksforgeeks.org/design-a-chess-game/
https://codereview.stackexchange.com/questions/71790/design-a-chess-game-using-object-oriented-principles
https://medium.com/system-designing-interviews/design-a-chess-game-dddd7ba11bc0
https://stackoverflow.com/questions/4168002/object-oriented-design-for-a-chess-game
---------------------------------------------
14. Design Tic Tac Toe
https://codereview.stackexchange.com/questions/31769/tic-tac-toe-design
https://medium.com/system-designing-interviews/design-tic-tac-toe-game-1b912bba64cf
https://leetcode.com/problems/design-tic-tac-toe/
---------------------------------------------
15. Design Parking Lot
https://stackoverflow.com/questions/764933/amazon-interview-question-design-an-oo-parking-lot
https://www.geeksforgeeks.org/design-parking-lot-using-object-oriented-principles/
https://leetcode.com/discuss/interview-question/124739/Parking-Lot-Design-Using-OO-Design
---------------------------------------------
16. Design Log Storage System
http://highscalability.com/product-scribe-facebooks-scalable-logging-system
https://bravenewgeek.com/building-a-distributed-log-from-scratch-part-1-storage-mechanics/
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2015/building-distributedlog-twitter-s-high-performance-replicated-log-servic
https://hackernoon.com/part-1-building-a-centralized-logging-application-5a537033da0a
https://www.learnsteps.com/logging-infrastructure-system-design/
https://www.youtube.com/watch?v=dZ3swmtR1As - Naren L
https://www.youtube.com/watch?v=JaCA_pVS_1Y
https://www.youtube.com/watch?v=DphnpWVYeG8
https://www.elastic.co/blog/small-medium-or-large-scaling-elasticsearch-and-evolving-the-elastic-stack-to-fit
https://leetcode.com/discuss/interview-question/system-design/622704/Design-a-system-to-store-and-retrieve-logs-for-all-of-eBay
Main Technologies
1. Logstash / Filebeat
2. Kafka
3. Elastic Search
4. S3 / Cassandra
Other components outside of our system:
1. Log file rotation
Key:
0. Important Log attributes
a. Request or Trace ID
b. Log Level (Info, Error, etc)
c. Timestamp
d. Service name
e. API name
f. Exception trace if any
g. File name
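The attributes above map naturally to one structured log line per event, e.g. a JSON document (field names here are our own choice, not a standard schema):

```python
import json
import time

# One log event carrying the attributes listed above; values are illustrative.
record = {
    "trace_id": "a1b2c3d4",                 # request / trace id
    "level": "ERROR",                       # log level
    "timestamp": int(time.time() * 1000),   # epoch millis
    "service": "checkout-service",          # service name
    "api": "POST /orders",                  # API name
    "exception": "NullPointerException",    # exception trace if any
    "file": "OrderController.java",         # file name
    "message": "order creation failed",
}
line = json.dumps(record)  # what the shipper sends to the queue
```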
1. LOG AGGREGATION: Use filebeats to collect logs and ship them
a. Take care of crash of a system
i. Record last log uploaded
b. Include all types of logs (application, wire, system etc)
c. LOG SHIPPER TOOLS
i. Fluentd, Logstash etc
2. Pass the logs to a Message Queue
3. Take each log from the message queue and use "syslogger" to parse the logs
4a. TARGET SYSTEM 1: Store the logs in Document store like Cassandra
a. Store logs such that they can be distinguished by the following attributes
4b. TARGET SYSTEM 2: Elastic Search so that the logs can be passed to Kibana
5. Log Structure
Using (3), a syslog-style parser parses the content and creates a document in the DB with various primary and secondary indexes
IMAGINE ATOCHA
i. Team information - So that logs are separate for each team
ii. Timestamp - Epoch time
iii. Log level
6. Visualize the data
a. Metrics, and Dashboard reporting from the logs
i. Use Kibana to visualize the data
FB's Scribe
1. API for clients
2. Data Aggregator / Collection
3. Data Transport
- Scribe, NSQ, Kafka, Logstash, Flume, Fluentd
Push logs somewhere and it should be processed there.
When you save the log objects in Kafka, they are consumed by a worker that processes them for metrics data, fixes the log object format, and then either pushes them to the next Kafka queue for further processing or saves them somewhere like S3.
4. Data Storage
S3, AWS Glacier, HDFS, Cassandra, MongoDB, ElasticSearch
5. Data Usefullizer (Metrics, Dashboard etc)
Kibana, Hive, Pig, Graylog2
6. Alerting
Sentry, HoneyBadger
Rate limiter for the number of messages per second
See if the client can temporarily store logs when the server is unavailable for some reason
Different Log Levels, Tag Id, Unique App / Request Id
Custom structure for companies to pick from
The idea here is you need a distributed file system to handle the continual stream of log messages. And once you have all the data stored safely away you'll need to use map-reduce to do anything with such a large amount of data.
---------------------------------------------
17. Design Mint
https://github.com/donnemartin/system-design-primer/tree/master/solutions/system_design/mint#design-mintcom
---------------------------------------------
18. Design Tinder
---------------------------------------------
19. Design Tiktok
---------------------------------------------
20. Design Instagram
---------------------------------------------
21. Design a Stock Exchange / Trading System like Robinhood
https://github.com/tssovi/grokking-the-object-oriented-design-interview/blob/master/object-oriented-design-case-studies/design-an-online-stock-brokerage-system.md - Has Class Diagram, UML etc
1. Requirements
1. Buy Sell stocks
2. Real time price
3. Order History
4. Analyze Risks
2. Load
a. Geo specific
b. 100K+ orders/sec
c. Thousands of stocks
d. Low Latency
e. Highly available
f. Reliable
3. Backend
a. Matching Engine
i. Matches Buyers with Sellers
ii. Features
1. Add an order
2. Cancel an order
3. Split an Order
4. Reliable
iii. Algorithm
1. Buy Priority Queue (Desc): 10$/5 ; 9$/10; 8$/5; 8$/5
2. Sell Priority Queue (Asc): 8$/5 ; 9$/5; 9$/5; 9$/5
First buy and sell can be easily matched
1. Buy Priority Queue (Desc): 9$/10; 8$/5; 8$/5
2. Sell Priority Queue (Asc): 9$/5; 9$/5; 9$/5
1. Buy Priority Queue (Desc): 8$/5; 8$/5
2. Sell Priority Queue (Asc): 9$/5
b. Order book table
Buy Qty | Buy Price | Sell Price | Sell Qty
    500 |   5250.00 |    5252.00 |     1650
    750 |   5249.20 |    5252.10 |       50
---------------------------------------------
22. Design a Job Scheduler
https://leetcode.com/discuss/general-discussion/1082786/System-Design%3A-Designing-a-distributed-Job-Scheduler-or-Many-interesting-concepts-to-learn
https://dropbox.tech/infrastructure/asynchronous-task-scheduling-at-dropbox
https://www.youtube.com/watch?v=s3GfXTnzG_Y
https://stackoverflow.com/questions/26094969/design-a-generic-job-scheduler
New ----> Enqueued ----> Claimed ----> Processing ----> Success / Fatal Failure
             ^  ^           |              |
             |  |           |              |
             |  +-----------+              |
             +-----------------------------+
          (retriable failures move the task back to Enqueued)
Requirements:
1. Delivery Guarantee
- 95% of tasks begin execution within five seconds from their scheduled execution time.
2. Availability
- 99.9% available to accept task scheduling requests from any client.
Guarantees:
1. At-least-once task execution
- All ATF system errors are implicitly considered retriable failures
2. No concurrent task execution
- A task can be claimed only if its existing task state is Enqueued
- First, tasks are explicitly claimed through an exclusive task state (Claimed) before starting execution
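The claim step is a compare-and-swap: move the task to Claimed only if it is still Enqueued, so exactly one worker wins. In a real store this would be a conditional write (e.g. a SQL UPDATE ... WHERE state = 'Enqueued', or a conditional put in a key-value store); here the in-memory dict and lock just stand in for that atomicity.

```python
import threading

ENQUEUED, CLAIMED = "Enqueued", "Claimed"

class TaskStore:
    """In-memory stand-in for the task state table."""

    def __init__(self):
        self.state = {}               # task_id -> state
        self.lock = threading.Lock()  # stands in for the DB's atomicity

    def claim(self, task_id, worker_id):
        """Atomically transition Enqueued -> Claimed; returns True if we won."""
        with self.lock:
            if self.state.get(task_id) != ENQUEUED:
                return False          # already claimed, processing, or unknown
            self.state[task_id] = CLAIMED
            return True
```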
Features:
1. Immediate Execution of task
2. Delayed Execution of task
3. Cron Job
4. Rate limiter / Task Gating
5. Q: Can concurrent tasks run at the same time?
6. Retry behaviors on tasks
Assumptions:
1. No concurrent tasks
- At most one instance of a task will be actively executing at any given point in time. This helps users write their callbacks without designing for concurrent execution of the same task from different locations.
2. Idempotence
- A single task on a lambda can be executed multiple times within the ATF system. Developers should ensure that their lambda logic and correctness of task execution in clients are not affected by this.
More Features:
Cancellation - you often want to kill a long-running job, or prevent one from running.
Priority - you often want high-priority jobs to run in preference to low-priority jobs. But implementing this so that low-priority jobs don't wait forever, in a system where lots of jobs are generated, is "non-trivial".
Resources - some jobs may only be schedulable on systems which have certain resources. E.g. some will require large amounts of memory, or fast local disk, or fast network access. Allocating these efficiently is tricky.
Dependencies - some jobs may only be runnable once other jobs have completed, and thus cannot be scheduled before a given time.
Deadlines - some jobs need to be completed by a given time. (or at least be started by a given time.)
Permissions - some users may only be able to submit jobs to certain resource groups, or with certain properties, or a certain number of jobs, etc.
Quotas - some systems give users a specified amount of system time, and running a job subtracts from that. This could make a significant difference to the numbers in your example.
Suspension - some systems allow jobs to be check-pointed and suspended and the resumed later.
--------------------------------------------------------------------------------
23. Design Google Doc / Shared whiteboarding tool
IMP:
- We always want JUST ONE VERSION of the doc which is the source of truth across all users
- Concurrency: Multiple users trying to access the same thing
- Latency: Should be super low latency across geographic regions as docs will be shared globally
- Can use Locks
Need a lock-free architecture
Optimistic Locks
- Store version of an entity
- Use version to compare if an entity is updated
- Best when there are FEW CONFLICTS
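A sketch of the version-check idea: each write carries the version the writer read, and the store rejects the write if the document has moved on. The class and method names are our own, in-memory illustration.

```python
class VersionConflict(Exception):
    pass

class DocStore:
    """Optimistic concurrency control: writes succeed only if the
    writer's version still matches the stored version."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, content)

    def read(self, doc_id):
        return self.docs.get(doc_id, (0, ""))

    def write(self, doc_id, expected_version, new_content):
        version, _ = self.docs.get(doc_id, (0, ""))
        if version != expected_version:
            # someone else updated the doc since we read it
            raise VersionConflict(f"doc at v{version}, writer had v{expected_version}")
        self.docs[doc_id] = (version + 1, new_content)
        return version + 1
```

On conflict, the client re-reads, re-applies its change, and retries; no locks are held in the meantime.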
How to sync:
- Event based syncing
- Character by character or
- Even Font change etc
- Capture every bit and piece of the change
- Line by line
- Diff syncing / Differential sync
- Like Git diff, take the copy and keep sending it back and forth
Operational Transformation (OT):
- Optimistic concurrency control mechanism
- Allows two editors to modify the same section of a document at the same time without conflict
- Or rather, provides a way for SANELY resolving the conflicts so that neither user intervention nor locking become necessary
Every change to the doc is captured as an event
- Insert
- Delete
- Update
- Retain
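A toy slice of OT for the insert/insert case: when two users insert concurrently, the later-applied operation's position is shifted past the earlier one. Real OT transforms full insert/delete/retain operation streams; this sketch (with a tie broken in the already-applied op's favor) just shows the core position adjustment.

```python
def transform_insert(pos_a, pos_b, len_b):
    """Shift op A's insert position to account for a concurrent insert B
    (at pos_b, of length len_b) that has already been applied."""
    if pos_a >= pos_b:
        return pos_a + len_b
    return pos_a

def apply_insert(doc, pos, text):
    return doc[:pos] + text + doc[pos:]
```

Example: doc is "HELO"; user B inserts "!" at 4 and user A concurrently inserts "L" at 3. The server applies B ("HELO!"), transforms A's position (3 stays 3, since 3 < 4), and applies it to get "HELLO!", with no lock and no user intervention.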
Differential Synchronization (DS):
- Client
- Client Copy
- Server
- Server Copy
Conflict-free replicated data type (CRDT)