From 9712697680779d1418192b11421ebe59577e7d26 Mon Sep 17 00:00:00 2001
From: Anton
Date: Tue, 5 Nov 2024 22:14:30 +0100
Subject: [PATCH] add content

---
 .../arc42/01_introduction_and_goals.adoc    | 60 ---------------
 .../arc42/02_architecture_constraints.adoc  | 23 ++++--
 .../arc42/10_quality_requirements.adoc      | 76 ++++++++++++++++++-
 documentation/arc42/11_technical_risks.adoc | 66 +++++++++++++++-
 documentation/arc42/12_glossary.adoc        | 60 +++++++++++++--
 5 files changed, 210 insertions(+), 75 deletions(-)

diff --git a/documentation/arc42/01_introduction_and_goals.adoc b/documentation/arc42/01_introduction_and_goals.adoc
index 5268415..461b1de 100644
--- a/documentation/arc42/01_introduction_and_goals.adoc
+++ b/documentation/arc42/01_introduction_and_goals.adoc
@@ -53,15 +53,6 @@ a|
 * Hundreds of millions of visits/day (100-500 Million)
 |

-| QGS2
-| Scalability
-| Users activity
-a|
-.Each user uploads ~1 image/day
-* Each image size ~2MB
-* Data Processing Volume: ~1PB/day
-|
-
 | QGS3
 | Scalability
 | Dynamic Scaling
@@ -80,18 +71,6 @@ a|
 | The system must be designed to ensure minimal downtime, providing a 99.9% availability SLA.
 |

-| QGR2
-| Reliability
-| Fault Tolerance
-| Maintains service continuity despite failures. In case of a node failure, the system should redistribute load seamlessly, avoiding disruptions.
-|
-
-| QGR3
-| Reliability
-| Minimizes downtime
-| In the event of a regional disaster, the system should recover instantly with zero downtime.
-|
-
 | QGP1
 | Performance
 | Low Latency
@@ -105,33 +84,6 @@ a|
 * Uploads and downloads of images must be fast and efficient, with minimal latency, even under heavy loads.
 |

-| QGP2
-| Performance
-| High Throughput
-| The system should support up to 10,000 concurrent requests per second, especially during peak hours.
-|
-
-| QGSEC1
-| Security
-| Access Control
-a|
-* Limits access for functionality to authorized users only.
-* Only authenticated and authorized users can access specific features like posting content, following other users.
-* Un authorized users can view limited timeline.
-|
-
-| QGS2
-| Security
-| Data Encryption
-| Protects sensitive data both in transit and at rest. User credentials and personal information should be encrypted to ensure privacy and comply with data regulations.
-|
-
-| QGP1
-| Portability
-| User Interfaces
-| Support only web interface
-|
-
 | QGM1
 | Maintainability
 | Modular Design
@@ -141,24 +93,12 @@ a|
 * Supports isolated updates and bug fixes. Changes, such as bug fixes, feature enhancements, or performance improvements, should be implemented quickly with minimal impact on other components.
 |

-| QGM2
-| Maintainability
-| Automated Testing
-| Validates system functionality with each deployment. Automated integration and regression tests should run for each deployment to reduce the risk of introducing errors or breaking functionality.
-|
-
 | QGM3
 | Maintainability
 | Teams Scalability
 | Enables multiple teams to work on the system simultaneously without dependencies that cause delays. The architecture should support decoupled microservices, allowing teams to deploy and update their services independently.
 |

-| QGC1
-| Compatibility
-| Backward Compatibility
-| Supports multiple API versions and maintains backward compatibility with existing clients. The system should allow existing clients to function without breaking when new API versions are introduced.
-|
-
 | QGCE1
 | Cost Efficiency
 | Storage Optimization
diff --git a/documentation/arc42/02_architecture_constraints.adoc b/documentation/arc42/02_architecture_constraints.adoc
index 207ed8c..791c31c 100644
--- a/documentation/arc42/02_architecture_constraints.adoc
+++ b/documentation/arc42/02_architecture_constraints.adoc
@@ -1,9 +1,22 @@
 [[section-architecture-constraints]]
 == Architecture Constraints

-- Con1: The platform must comply with data protection laws (e.g., GDPR, CCPA) and intellectual property regulations. This includes implementing necessary data privacy measures, consent mechanisms, and content moderation practices to avoid legal liabilities.
-- Con2: The solution must adhere to budgetary constraints for development, deployment, and ongoing operational costs.
-- Con3: .NET technologies
-- Con4: Use Google Authentication,
-- Con5: Use Azure Cloud
+[cols="2*", options="header"]
+|===
+| ID | Constraint
+| Con1
+| The platform must comply with data protection laws (e.g., GDPR, CCPA) and intellectual property regulations. This includes implementing necessary data privacy measures, consent mechanisms, and content moderation practices to avoid legal liabilities.
+| Con2
+| The solution must adhere to budgetary constraints for development, deployment, and ongoing operational costs.
+
+| Con3
+| Use .NET technologies
+
+| Con4
+| Use Google Authentication
+
+| Con5
+| Use Azure Cloud
+
+|===

diff --git a/documentation/arc42/10_quality_requirements.adoc b/documentation/arc42/10_quality_requirements.adoc
index e1135b0..8ab302a 100644
--- a/documentation/arc42/10_quality_requirements.adoc
+++ b/documentation/arc42/10_quality_requirements.adoc
@@ -2,10 +2,84 @@
 == Quality Requirements


+[cols="5*", options="header"]
+|===
+| ID | Quality Category | Quality | Description | Scenario
+| QGS2
+| Scalability
+| Users activity
+a|
+.Each user uploads ~1 image/day
+* Each image size ~2 MB
+* Data Processing Volume: ~1 PB/day
+|

-=== Quality Tree
+| QGR2
+| Reliability
+| Fault Tolerance
+| Maintains service continuity despite failures. In case of a failure, the system should redistribute load seamlessly, avoiding disruptions.
+|

+| QGP1
+| Performance
+| Low Latency
+a|
+.Response time
+* Get images, search user: < 500 ms at the 99th percentile
+* Download a 2 MB image: ~2000 ms
+* Timeline/newsfeed load time: < 1000 ms at the 99th percentile
+
+.Out of scope
+* Uploads and downloads of images must be fast and efficient, with minimal latency, even under heavy loads.
+|
+
+| QGP2
+| Performance
+| High Throughput
+| The system should support up to 10,000 concurrent requests per second, especially during peak hours.
+|
+
+| QGSEC1
+| Security
+| Access Control
+a|
+* Limits access to functionality to authorized users only.
+* Only authenticated and authorized users can access specific features such as posting content and following other users.
+* Unauthorized users can view only a limited timeline.
+|
+
+| QGSEC2
+| Security
+| Data Encryption
+| Protects sensitive data both in transit and at rest. User credentials and personal information should be encrypted to ensure privacy and comply with data regulations.
+|
+
+| QGP1
+| Portability
+| User Interfaces
+| Supports only a web interface.
+|
+
+| QGM2
+| Maintainability
+| Automated Testing
+| Validates system functionality with each deployment. Automated integration and regression tests should run for each deployment to reduce the risk of introducing errors or breaking functionality.
+|
+
+| QGM3
+| Maintainability
+| Teams Scalability
+| Enables multiple teams to work on the system simultaneously without dependencies that cause delays. The architecture should support decoupled microservices, allowing teams to deploy and update their services independently.
+|
+
+| QGC1
+| Compatibility
+| Backward Compatibility
+| Supports multiple API versions and maintains backward compatibility with existing clients. The system should allow existing clients to function without breaking when new API versions are introduced.
+|
+
+|===


 === Quality Scenarios
diff --git a/documentation/arc42/11_technical_risks.adoc b/documentation/arc42/11_technical_risks.adoc
index 9680417..e7ef262 100644
--- a/documentation/arc42/11_technical_risks.adoc
+++ b/documentation/arc42/11_technical_risks.adoc
@@ -1,5 +1,69 @@
 [[section-technical-risks]]
-== Risks and Technical Debts
+== Risks

+[cols="1,3,4", options="header"]
+|===
+| Risk ID | Risk Description | Mitigation Strategy
+
+| R1
+| High load from influencers causing performance degradation due to large-scale timeline updates.
+| Implement a separate materialized view for influencer posts to reduce load on the main timeline. Use caching and batch processing.
+
+| R2
+| Dependency on external APIs, leading to potential service disruptions if APIs are down.
+| Use local caching and circuit breakers to handle temporary outages and minimize the impact on users.
+
+| R3
+| Cost overruns due to high storage and compute needs as the system scales.
+| Optimize storage through data compression and archiving, and implement auto-scaling to manage resources dynamically.
+
+| R4
+| Inconsistent data updates due to eventual consistency in microservices.
+| Use idempotent operations and reconcile data periodically to ensure consistency across services.
+
+| R5
+| Risk of unauthorized access and data breaches.
+| Enforce strong access controls, data encryption, and multi-factor authentication for enhanced security.
+
+| R6
+| Outdated data in influencer timelines due to cache staleness.
+| Set appropriate cache expiration times and implement cache invalidation strategies to keep data fresh.
+|===
+
+== Technical Debts
+
+[cols="1,3,4", options="header"]
+|===
+| Debt ID | Technical Debt Description | Impact and Remediation Strategy
+
+| TD1
+| Lack of a centralized logging system for monitoring and troubleshooting.
+| Makes debugging across services difficult. Remediate by implementing a centralized logging solution such as Azure Application Insights or the ELK Stack.
+
+| TD2
+| Incomplete API versioning, leading to backward compatibility issues.
+| Causes disruptions for clients using older API versions. Remediate by adopting consistent API versioning and supporting multiple versions.
+
+| TD3
+| Insufficient automated tests for microservices.
+| Leads to increased risk of errors during deployment. Remediate by developing automated test suites covering unit, integration, and end-to-end tests.
+
+| TD4
+| Hard-coded configurations across services.
+| Reduces flexibility in deployment and configuration management. Remediate by using a centralized configuration management tool, such as Azure App Configuration.
+
+| TD5
+| Inconsistent error handling and retry mechanisms in services.
+| Leads to unpredictable behavior during failures. Remediate by standardizing error handling and retry policies across services.
+
+| TD6
+| Limited resilience testing for failure scenarios (e.g., network partitions).
+| Increases the risk of unexpected downtime. Remediate by conducting regular chaos engineering exercises to test service resilience.
+
+| TD7
+| No clear deprecation policy for obsolete services or APIs.
+| Results in a bloated codebase and confusion among developers. Remediate by establishing a deprecation policy with clear timelines for phasing out outdated services or versions.
+|===
+


diff --git a/documentation/arc42/12_glossary.adoc b/documentation/arc42/12_glossary.adoc
index b95c26a..51ad0b5 100644
--- a/documentation/arc42/12_glossary.adoc
+++ b/documentation/arc42/12_glossary.adoc
@@ -1,15 +1,59 @@
 [[section-glossary]]
 == Glossary

+[cols="1,3", options="header"]
+|===
+| Term | Definition
+| API Gateway
+| A server that acts as an entry point for clients, managing requests to multiple backend services in a microservices architecture.


-[cols="e,2e" options="header"]
-|===
-|Term |Definition
+| Active-Active Strategy
+| A high availability approach where multiple regions or instances are active simultaneously, ensuring seamless failover and load balancing.

-|
-|
+| Microservices
+| An architectural style that structures an application as a collection of loosely coupled services, each responsible for a specific business capability.

-|
-|
-|===
+| API Versioning
+| The practice of managing different versions of an API to maintain compatibility and support for clients using older versions.
+
+| Eventual Consistency
+| A consistency model in distributed systems where updates are not immediately reflected across all nodes but will eventually converge to the same state.
+
+| High Availability
+| A system design approach that ensures minimal downtime and continuous operation, often achieved through redundancy and failover mechanisms.
+
+| CI/CD Pipeline
+| A set of automated processes for continuous integration and continuous deployment, enabling rapid and reliable software delivery.
+
+| Cache
+| A temporary data storage layer that stores frequently accessed data to reduce retrieval times and improve performance.
+
+| Chaos Engineering
+| The practice of testing a system's resilience by intentionally introducing failures to observe how it responds and identify areas for improvement.
+
+| Centralized Logging
+| A logging approach where logs from various services are collected and stored in a central location for monitoring and troubleshooting.
+
+| CRUD Operations
+| Basic data operations: Create, Read, Update, and Delete, commonly used in database management.
+
+| Fault Tolerance
+| The ability of a system to continue operating properly in the event of a failure of some of its components.
+
+| Load Balancer
+| A component that distributes incoming network traffic across multiple servers to ensure availability and reliability.
+
+| Namespace
+| A logical grouping used to organize resources, often used in Kubernetes to isolate applications or environments.
+
+| Scalability
+| The capability of a system to handle increased load by adding resources, such as compute power or storage.
+
+| Service Bus
+| A messaging infrastructure that allows applications to communicate with each other in a decoupled way, commonly used in distributed systems.
+
+| SLA (Service Level Agreement)
+| A commitment between a service provider and a client that defines the expected level of service performance, availability, and support.
+
+|===