Skip to content

Latest commit

 

History

History
90 lines (43 loc) · 9.15 KB

the-modern-web.md

File metadata and controls

90 lines (43 loc) · 9.15 KB

The Modern Web

Client-Server Architecture

The architecture developed by Alice had a server and a client, this simply is called client-server architecture. The server was responsible for handling the client's request, processing it ( making decisions & fetching data), and serving a response. And the client was responsible for sending a request, handling the response, and presenting the received data in a human-readable format.

Today there are more than one client-server architecture, each architecture performs the basic functions performed by Alice's architecture, however on the basis of segregation of these functions we have classified the architectures. Let's first discuss these functions.

The presentation tier is where the user interacts with the application, consider it the code visible on the browser.

The middle tier (business logic) is what the presentation layer and the data layer use to communicate with each other, it contains the code to process client's requests and if needed query the data layer.

The data layer is where the data is stored, it may additionally store the data processing code.

Monolithic/ Single-Process applications such as desktop applications, services, etc, usually do not involve communication over a network, as the entire application (the presentation layer and the data layer) is self-contained. It may communicate with other apps and services on the system but the core of the application is a single unit.

Two-tier application, the original Client-Server applications commonly referred as a two-tier application, as these applications involve two components, a presentation layer and a data layer (excluding the communication channel). The business logic usually resides in the data layer, however, it may as well reside in the presentation layer or both layers.

N-tier application, an N-tier application has a presentation layer, middle layer, and a data layer, simply put the data logic (database/ data processing) is separated from the business layer, traditionally both are included in the same (server in client-server) layer.

Communication channel, though not termed in the architecture name, is an important part of the architecture as it is of any other network.

N-tier architecture

The above figure depicts an example of a web application architecture involving all three tiers and some additional components. A pentester should at least know the functionality of some of these components (the figure is not an exhaustive one, thus we would go beyond the figure).

Let's go back to the narrative to understand these components

Cache

With each new paper published, Alex took more time to write the next one, thus the website was not updated quite often. And the content posted on the website remained the same for long stretches of time. However, his work became more popular and people from different universities (located in different regions of the world) started following Alex and accessed his content on the website.

Each time Bob or someone else wanted to read his paper they had to load the entire website again which was resource consuming. To overcome this Alex implemented Caching, which would store a copy of requested data on Bob's end. Caching can be implemented anywhere between the client and the server, however, Alex explicitly implemented caching of the website at the browser.

Caching is storing a copy of requested resources and serving it when requested again. In this case, the cache stored a copy of the requested papers and served them back to Bob when requested.

The Cache (in case of the web, HTTP Cache) will store a copy when a resource is requested the first time, and keeps a copy for a decided period of time, post that, the request is made again to the server and a new copy is served and cached. Web Cache uses the HTTP Headers for additional features such as the validity time of the stored data, private/ public storage of data, etc.

Cache-Control Header

An HTTP header, which helps define the caching policy, the header has multiple directives depending on the cache is managed.

Cache-Control: no-store: No data from the request and response is stored at the cache.

Cache-Control: no-cache: Cache stores response but verifies with the origin server before releasing a copy to a client.

Cache-Control: max-age=xxx: Cache stores responses for the given period of time, post which a new request is sent to the origin server.

Cache-Control: must-validate: Cache stores responses but verifies the status of stale resources, if the origin server responds with a 304 Not-Modfied, the cache servers the copy it had.

Figure depicting a cache behaviour, with fresh and stale resources

We use multiple forms of cache, browser cache, CDNs, Proxies, etc. These serve different purposes and are located at different point in the web application architecture. One differentiating factor amongst all these is the nature of cache i.e. if the cache is shared or private. A cache implemented at browser end is a private cache, however a Proxy set up at a university gateway is a shared cache. Let's consider some popular use cases for Cache.

Web Proxy: As Alice hosted the website on a university server which exposed university's network to anybody visiting the website, putting the entire infrastructure at a security risk. Alice employed a proxy server in front of his website and outside the critical network segment. The proxy hosted a cached form of website and kept the content updated by communicating with Alice's server. This allowed the non university traffic to be served at the proxy and restricted access to the university infrastructure.

CDN: Content Delivery Networks, as Alice's work gained popularity, people from different region of the world started visiting his website. As Alice's site was hosted in USA, users from far away faced speed issues in loading the content. Alice proposed a network of servers hosted in different region of the world which would work as a cache and serve content to nearby client. The client now, instead of visiting Alice's server in USA visits a much closer server (aka edge server). This group of servers is called a CDN. The CDNs keep connecting to Alice's server to keep the content updated.

Browser Cache: Browser cache, the browser too keeps a copy of visited sites in its cache and serves the user with the saved site depending upon the cache-control headers.

Proxy

We saw one usage of Proxy server as a cache, which is not the only way a proxy is used in web architectures. There are various kinds of proxies each named in accordance with their usage. Lets go through some more use case of proxy.

Proxy: As the name suggests it is an intermediary, in our case, in between the server and client. It accepts requests from the client, perform decision making and some other functions and forwards the request to the server. For complex functions a proxy might need to unpack and pack the data packets before sending them to the server (this brings in some more complications in case of SSL\TLS proxies, more on that later)

Proxies provide advantages with data filtering, security, caching, load balancing, bandwidth monitoring, etc. Based on the need, proxies can be employed in the architecture to play multiple roles and types. Two popular types of proxy are forward and reverse proxies. When the traffic flow is outward from the client to the server, forward proxy is used, the client is aware of the proxy used. When the traffic flow is inward (which is usually at the server end) the proxy used is termed reverse proxy, the client is usually not aware about the presence of proxy.

Cookies

Alice's work was seen in different geographical locations, and followed by professors, students, & technicians. English, which is not a dominant language in many countries, presented a problem to users visiting the website. To serve the community better, Alice had to serve the content in different languages, units, and matrices. He used Cookies, a concept in which the website stores some additional information (aka Cookies) at the client site and uses this information for decision making. The cookies were stored at the browser on user's first visit, which were then sent to the website along with subsequent requests. Alice used this information to decide the language of presentation and units of measurements such as Time Zones. Additionally, Alice used cookies to analyze demographic on the site.

Cookies have three major usage:

Personalization: deciding the language of content in above example.

Tracking: analysis of demographic in above example.

Session Management: let's understand this one in detail

Session Management

Stateless/ Stateful protocol

Why each request had to contain the cookies for decision making? This was because HTTP is a stateless protocol, i.e. there is no link between two subsequent request sent in the same connection. Thus the website used the cookies with each request to decide the language of the served content etc.