---
output:
  bookdown::pdf_document2:
    template: templates/brief_template.tex
  bookdown::word_document2: default
  bookdown::html_document2: default
documentclass: book
bibliography: references.bib
editor_options:
  chunk_output_type: console
---
```{r echo=FALSE}
library(knitr)
library(kableExtra)  # table styling used in later chunks
library(tidyverse)   # tibble, dplyr, stringr, forcats and ggplot2 used in later chunks
```
<!-- Needed for leaving space to the quote, * is for no indentation after title -->
<!-- \titlespacing*{\chapter}{0pt}{80px}{35pt} -->
# Non-Life Insurance Pricing {#chap:nlip}
<!-- \chaptermark{Non-Life Insurance Pricing} -->
In this chapter we are going to provide an overview of how non-life insurance works from an actuarial point of view, with a specific focus on the retail pricing process. For more details on the mathematics and statistics behind the concepts introduced in this chapter, we refer to [@wuthrich-non-life-insurance-math-stats], [@wuthrich-data-analytics] and [@gigante2010tariffazione].
## Non-Life Insurance {#chap:non-life-ins}
The Italian Civil Code [@italian-civil-code] provides the following definition of insurance contract:
```{definition, ins-contr, name = "Insurance Contract, Art. 1882, Italian Civil Code"}
Insurance is the contract by which an insurer, in exchange for the payment of a certain premium, obliges itself, within the agreed limits:
\setlist{nolistsep}
\begin{enumerate}[noitemsep]
\item to pay an indemnity to the insured equivalent to the damage caused by an accident;
\item or to pay an income or a capital if a life-related event occurs.
\end{enumerate}
```
This definition identifies two parties: the _Insurer_ and the _Policyholder_. The policyholder pays the insurer a certain _Premium_ at the beginning of the insurance coverage and the insurer pays a benefit if a certain event (_Claim_) occurs. This event can happen zero, one or more times, so more than one claim is possible.
Usually, in non-life insurance, the benefit is the payment of a sum. This sum can be predetermined (e.g. in motor theft insurance, where the benefit is usually the value of the insured vehicle) or determined by the size of the claim (e.g. in \ac{mtpl} insurance, it depends on the damage the policyholder has caused to a third party). Regarding the "agreed limits", another peculiarity of non-life insurance is that the coverage period is a fixed amount of time, usually one year.
Starting from this legal definition, we can formalize a non-life insurance contract as follows.
Let:
* $\left]t_1, t_2\right]$, with $t_1<t_2$, be the coverage period;
* $P>0$ be the premium paid by the policyholder to the insurer;
* $N\in\mathbb{N}$ be the number of claims that occurred during the coverage period (_claims count_);
* $\tau_1, \tau_2, \dots, \tau_N$, with $t_1<\tau_1< \tau_2 < \dots < \tau_N<t_2$, be the timing of each claim;
* $Z_1, Z_2, \dots, Z_N > 0$ be the amount of each claim (_claims severities_ or _claims sizes_).
The total cost of claims for the insurer is
$$
S =
\begin{cases}
0 & \text{if } N=0 \\
\sum_{i=1}^{N}{Z_i} & \text{if } N>0
\end{cases}
$$
For simplicity, in the following we will simply write $S = \sum_{i=1}^{N}{Z_i}$, with the convention that $S = 0$ if $N=0$.
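To make the definition concrete, the total cost of claims can be simulated. The sketch below is illustrative only, with an arbitrary Poisson claims count and Gamma severities (both distributions are discussed later in this chapter) and arbitrary parameter values.

```{r sim-total-cost-claims}
# Simulate the total cost of claims S = Z_1 + ... + Z_N for one coverage
# period: draw the claims count N first, then one severity per claim.
set.seed(1)

simulate_total_cost <- function(lambda = 0.1, shape = 2, rate = 1 / 1500) {
  n <- rpois(1, lambda)                        # claims count N
  if (n == 0) return(0)                        # S = 0 when no claims occur
  sum(rgamma(n, shape = shape, rate = rate))   # S = sum of the severities
}

S <- replicate(10000, simulate_total_cost())
mean(S)   # Monte Carlo estimate of the expected total cost of claims
```

Note that most simulated periods produce $S=0$, since with a claim frequency of $0.1$ the large majority of policies experience no claims at all.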
Figure \@ref(fig:ins-cashflow) shows the cash flows corresponding to the insurance contract. From this representation we can interpret the policyholder's entering into an insurance contract as a way to exchange the uncertain negative cash flows $-Z_1, -Z_2, \dots, -Z_N$ for one single negative cash flow $-P$. Conversely, the insurer undertakes the negative cash flows $-Z_1, -Z_2, \dots, -Z_N$ in exchange for a positive cash flow $+P$.
```{tikz, ins-cashflow, fig.cap = "Insurance Contract cash flows.", fig.ext = 'pdf', fig.pos = "hbtp", cache = TRUE, echo = FALSE}
\newcommand{\ImageWidth}{11cm}
\usetikzlibrary{decorations.pathreplacing, positioning, arrows.meta}
\begin{tikzpicture}
% draw horizontal line
\draw[thick, -Triangle] (0, 0) -- (\ImageWidth, 0) node[font = \scriptsize, below left = 3pt and -8pt]{$t$};
\draw[very thick] (1cm, 0) -- (9cm, 0);
% draw vertical lines and times
\draw (1cm, -3pt) -- (1cm, 3pt) node[anchor = south] {$t_{1}$};
\draw (9cm, -3pt) -- (9cm, 3pt) node[anchor = south] {$t_{2}$};
\draw (2.5cm, -3pt) -- (2.5cm, 3pt) node[anchor = south] {$\tau_{1}$};
\draw (3.5cm, -3pt) -- (3.5cm, 3pt) node[anchor = south] {$\tau_{2}$};
\path (5.15cm, -3pt) -- (5.15cm, 3pt) node[anchor = south] {$\dots$};
\draw (6.8cm, -3pt) -- (6.8cm, 3pt) node[anchor = south] {$\tau_{N-1}$};
\draw (8.0cm, -3pt) -- (8.0cm, 3pt) node[anchor = south] {$\tau_{N}$};
% draw Policyholder cash flows
\node at (-1cm, -14pt) {Policyholder};
\node at (1cm, -14pt) {$-P$};
% draw Insurer cash flows
\node at (-1cm, -28pt) {Insurer};
\node at (1cm, -28pt) {$P$};
\node at (2.5cm, -28pt) {$-Z_1$};
\node at (3.5cm, -28pt) {$-Z_2$};
\node at (6.8cm, -28pt) {$-Z_{N-1}$};
\node at (8.0cm, -28pt) {$-Z_N$};
% \foreach \x in {0, 1, ..., 10}
% \draw (\x cm, -3pt) -- (\x cm, 3pt)
% node[anchor = south] {$t_{\x}$}
% ;
;
\end{tikzpicture}
```
The major difference between these cash flows is that $P$ is a certain amount, while $Z_1, Z_2, \dots, Z_N$, at time $t_1$, are uncertain in amount, in count ($N$) and in timing ($\tau_1, \tau_2, \dots, \tau_N$). So the policyholder, by paying a premium $P$, transfers his risk to the insurer.
This representation points out the inversion of the production cycle that is typical of the insurance business. From the insurer's point of view, the revenue emerges at the beginning of the economic activity, in $t_1$, while the costs emerge later. In most other economic activities the costs emerge before the product is sold, so the seller can choose a price that takes into account how much the product cost him. In insurance, the insurer, when selling its product (the insurance coverage), does not know the amount of claims it is going to pay for that product. It is therefore crucial to properly predict the future costs in order to determine an adequate premium.
From a statistical point of view, we can express this uncertainty by saying that $N$ and $Z_1, Z_2, \dots, Z_N$ are random variables, so that $\left\{N, Z_1, Z_2, \dots \right\}$ is a stochastic process. Usually, in non-life insurance pricing, the variables $\tau_1, \tau_2, \dots, \tau_N$ are not taken into account, because the coverage span is short and, from a financial point of view, the timing of the claim occurrences has a negligible effect.
Previously we said that $Z_1, Z_2, \dots, Z_N$ are all positive. This assumption corresponds to excluding the null claims, i.e. the claims that have been opened but result in no payment due by the insurer. For the values of $Z_i$ with $N<i$ we adopt the convention $\{N<i\} \Rightarrow \{Z_i = 0\}$, so $Z_{N+1}=0, \, Z_{N+2}=0, \, \dots$. Combining this convention with the positivity of the first $N$ severities, we can say that:
$$
\{N<i \} \Longleftrightarrow \{Z_i = 0\}
$$
## Non-Life Insurance Pricing {#chap:nlip-details}
In insurance, the premium that the insurer offers to the policyholder in exchange for the insurance coverage is not the same for every policyholder. The insurer evaluates the risk related to the policy and determines a "proper" premium taking into account both risk-related and commercial factors. The process of _pricing_ consists in defining the set of rules that determine this "proper" premium $P_i$ for a specific policyholder $i$, given the known information on him. In the next sections we are going to better explain what "proper" means.
### Compound Distribution hypotheses
The first step for evaluating the stochastic process $\left\{N, Z_1, Z_2, \dots \right\}$ is to introduce some probabilistic hypotheses. The usual hypotheses assumed are the following:
```{definition, comp-dist, name = "Compound distribution"}
Let's assume that:
1. for each $n>0$, the variables $Z_1|N=n,\ Z_2|N=n,\ \dots,\ Z_n|N=n$ are stochastically independent and identically distributed;
2. the probability distribution of $Z_i|N=n, \ i\le n$ does not depend on $n$.
Under these hypotheses we say that:
$$
S = \sum_{i=1}^{N}{Z_i}
$$
has a compound distribution.
```
The variable $Z_i|N=n$ used in this definition can be interpreted as the _claim severity for the $i$^th^ claim under the hypothesis that $n$ claims occurred_. The two hypotheses provided in definition \@ref(def:comp-dist) imply that the distribution of $Z_i|N=n, \ i\le n$ does not depend on $i$ nor on $n$. For this reason, in the following, we are going to use the notation $Z$ to represent a random variable with the distribution of $Z_i|N=n, \ i\le n$ and $F_Z(\cdot)$ for its cumulative distribution function (i.e. $F_Z(z) = P(Z\le z)$).
Let's consider the variable $Z_i|N\ge i$. We can interpret it as the _claim severity for the $i$^th^ claim under the hypothesis that the $i$^th^ claim occurred_. From the hypotheses provided in definition \@ref(def:comp-dist) it follows that $Z_i|N\ge i$ also has the same distribution as $Z_i|N=n, \ i\le n$, that is:
\begin{equation}
\label{eq:z}
P\left(Z_i \le z \middle| N\ge i \right) = F_Z(z)
\end{equation}
This result says that $Z$ can be considered as the _claim severity for a claim under the hypothesis that that claim occurred_. The proof of equation \@ref(eq:z) is reported in the appendix in section \@ref(chap:appendix-notes-on-compound-distribution).
### Distribution of the Total Cost of Claims {#chap:tcc-dist}
Under the hypotheses of definition \@ref(def:comp-dist), it is possible to obtain the full distribution of $S$ given the distribution of $N$ and $Z$. In this chapter we are going to provide only the formula of the expected value $E(S)$, but, with the same approach one can obtain all the moments.
The expected value of the total cost of claims $E(S)$ can be obtained from the expected value of the claims count $E(N)$ and the expected value of the claim severity $E(Z)$ as:
\begin{equation}
\label{eq:s}
E(S) = E(N)E(Z)
\end{equation}
This result tells us that, under the compound distribution hypotheses, $E(S)$ is easily obtained from $E(N)$ and $E(Z)$. This means we can model $E(N)$ and $E(Z)$ separately and, from them, obtain $E(S)$. The result is particularly useful in personalization (section \@ref(chap:personalization)) because, for each individual $i$, given the information we have on him, we can estimate his expected claims count $E(N_i)$ and his expected claim severity $E(Z_i)$ and obtain his expected total cost of claims as $E(S_i) = E(N_i) E(Z_i)$. The proof of equation \@ref(eq:s) is reported in the appendix in section \@ref(chap:appendix-notes-on-compound-distribution).
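Equation \@ref(eq:s) can also be checked numerically. The chunk below is a sketch with arbitrary Poisson and Gamma choices, comparing the Monte Carlo mean of $S$ with the product $E(N)E(Z)$.

```{r check-compound-mean}
# Check E(S) = E(N) E(Z) by simulation for a compound Poisson-Gamma model.
set.seed(123)
lambda <- 0.08                 # E(N) for the Poisson claims count
shape <- 2; rate <- 1 / 1000   # Gamma severity, so E(Z) = shape / rate

N <- rpois(50000, lambda)
S <- vapply(N, function(n) sum(rgamma(n, shape = shape, rate = rate)),
            numeric(1))

c(simulated = mean(S), theoretical = lambda * shape / rate)
```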
<!-- $x_i=(x_{i1}, x_{i2}, \dots, x_{ip})$ -->
### Risk Premium and Technical Price {#chap:risk-prem-tech-price}
The expected cost of claims $E(S)$ is important because it gives us a first interpretation of what "proper" premium means.
```{definition, risk-premium, name = "Risk Premium"}
Let $S$ be the total cost of claims of a policyholder; his _Risk Premium_ is given by:
$$
P^{(risk)} = E(S)
$$
```
The _Risk Premium_ is the premium that on average covers the total cost of claims. As mentioned above, since coverage spans are usually short, we do not take into account the timing of the claims, so we do not discount for the fact that the claims occur later than the premium payment.
It is clear that this premium, which only covers the cost of claims, is not "proper" in practice.
First of all, the insurer also has to cover the expenses related to the policy (commissions on sales and expenses related to claim settlement) and the general expenses of the company. Adding the expenses, we obtain the _Technical Price_.
```{definition, technical-price, name = "Technical Price"}
Let $S$ be the total cost of claims of a policyholder and $E$ the expenses related to his policy; his _Technical Price_ is given by:
$$
P^{(tech)} \ = \ E(S) + E \ = \ P^{(risk)} + E
$$
```
Secondly, even if the policyholder paid a premium that on average covers claims and expenses, undertaking that risk with nothing in return would not make sense for the insurer. So some further loadings must be added to the technical price, such as the loadings for the cost of capital, the risk margin and the profit margin.
The amount of the technical price with these loadings can be further modified based on business logic, as we are going to discuss in section \@ref(beyond-technical-pricing).
## Modeling and Personalization {#chap:personalization}
<!--
Often the insurance premium for a specific policyholder $i$ is represented as the product of a reference premium $P$ and a relative coefficient $\alpha_i$ as follows:
$$
P_i = \alpha_i P
$$
The coefficient $\alpha_i$ and the reference premium $P$ can be estimated separately. The process of defining the function for obtaining $\alpha_i$ is called personalization.
-->
In this section we are going to better explain how pricing based on policyholder information works.
### Pricing variables {#chap:pricing-variables}
Usually, for every policyholder, we have a certain amount of information that is considered relevant for his risk evaluation. This information must be reliable and observable at the moment of the underwriting of the policy.
In motor insurance, this information could be:
* Information on the insured vehicle: make, model, engine power, vehicle mass, age of the vehicle;
* General information on the policyholder: age, sex, address (region, city, postcode), ownership of a private garage where he parks the car;
* Insurance specific information of the policyholder: number of claims caused in the previous years, how long he has been covered, bonus-malus class;
* Policy options: amount of the maximum coverage, presence and amount of a deductible, presence of other insurance guarantees, how many drivers will drive the vehicle;
* Customer information on the policyholder: how many years he has been a customer of the insurer, how many other policies he owns;
* Telematic data: how many kilometers per year the policyholder traveled in the previous years, which kind of roads the policyholder traveled on, the speed maintained during the trips, how many times the policyholder exceeded the speed limit, how many sharp accelerations and decelerations per kilometer the policyholder performed.
These pieces of information are usually called _pricing variables_.
We must observe that some of these variables are available for every potential customer (such as age and address), while others are available only for policyholders that are already customers (such as telematic data, which is available only if the policyholder agreed to install in his car the device that collects it).
Moreover, even considering the variables that are available for every customer, it is important to be aware of how reliable they are. Some come from official documents (such as customer age and address or the bonus-malus class), but others may be declared by the customer, and his statements are not easily verifiable by the insurer (such as the ownership of a private garage or how many drivers will drive the vehicle).
The topic of variable reliability fits in the wider framework of fraud detection. Insurance companies put a lot of effort into preventing fraud, both with active measures, such as document checks and inspections, and with predictive fraud-detection models. The two most common categories of fraud are underwriting fraud (such as false declarations on insurance-related data) and settlement fraud (such as faking an accident). The customer information on the policyholder is usually important for predicting both. Customers that have a longer relationship with the company and own many policies are usually less likely to commit fraud.
Regarding variable reliability, the Italian national association of insurance companies ([ANIA](https://www.ania.it/)) has made big steps forward in recent years by collecting in its databases a lot of information about policyholders and vehicles and making it available to insurance companies. For example, by querying these databases at the moment of the quote request, it is possible to retrieve useful insurance-specific information, such as the number of claims caused by the customer in the previous years or how long he has been covered, and useful information on his vehicle, such as when it was registered or how many changes of ownership it has experienced.
One of the roles of the actuary is to understand how reliable the information on the policyholder is and to decide how to use that information.
### Pricing variables encoding {#chap:pricing-variables-encoding}
Formally, the pricing variables can be encoded as a vector of real numbers $\boldsymbol{x}_i=(x_{i1}, x_{i2}, \dots, x_{ip})\in\mathcal{X}\subseteq\mathbb{R}^p$. In the modeling framework they are also called explanatory variables, covariates, predictors or features.
The pricing variables can be of two types:
1. _Quantitative variables_: variables, like policyholder age or vehicle mass, that can be easily represented as a number;
2. _Qualitative variables_: variables, like policyholder sex or vehicle make, that represent a category and are usually represented with strings.
The quantitative variables, possibly transformed, are already suitable to be used.
To facilitate the use of the qualitative variables, they are usually encoded as sets of binary variables.
If a variable $x$ has only 2 possible modalities, it can easily be encoded as a binary variable $x'$ that assigns $0$ to one modality and $1$ to the other. For example, if $x = \text{sex}$, it can be encoded this way:
$$
x' = \begin{cases}
1 & \text{if } \text{sex } = \text{ `Male'} \\
0 & \text{if } \text{sex } = \text{ `Female'}
\end{cases}
$$
In general, if a variable $x$ has $K$ modalities, it can be encoded in $K-1$ binary variables $x'_1, x'_2, \dots, x'_{K-1}$. For example, if $x = \text{make}$ has 4 possible modalities ('Fiat', 'Alfa-Romeo', 'Lancia', 'Ferrari'), it can be encoded this way:
\begin{align*}
x'_1 & = \begin{cases}
1 & \text{if } \text{make } = \text{ `Fiat'} \\
0 & \text{otherwise} \\
\end{cases}
\\
x'_2 & = \begin{cases}
1 & \text{if } \text{make } = \text{ `Alfa-Romeo'} \\
0 & \text{otherwise} \\
\end{cases}
\\
x'_3 & = \begin{cases}
1 & \text{if } \text{make } = \text{ `Lancia'} \\
0 & \text{otherwise} \\
\end{cases}
\\
\end{align*}
The variables $x'_1$, $x'_2$, $x'_3$ are called dummy variables. We can observe that all the information about the make is embedded in just these 3 variables, so a fourth dummy variable indicating the modality 'Ferrari' is not needed. Indeed:
$$
\text{make } = \text{`Ferrari'} \ \Longleftrightarrow \ x'_1=x'_2=x'_3=0
$$
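In R, this $K-1$ dummy encoding is what `model.matrix()` produces by default for a factor. The sketch below reproduces the encoding above, setting 'Ferrari' as the reference modality.

```{r dummy-encoding-example}
# Dummy-variable encoding of a qualitative variable with K = 4 modalities.
# The first factor level becomes the reference modality and gets no column.
make <- factor(
  c("Fiat", "Alfa-Romeo", "Lancia", "Ferrari"),
  levels = c("Ferrari", "Fiat", "Alfa-Romeo", "Lancia")
)
X <- model.matrix(~ make)   # intercept plus K - 1 = 3 dummy columns
X[, -1]                     # the dummies x'_1, x'_2, x'_3
```

The 'Ferrari' row has all three dummies equal to zero, matching the equivalence above.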
<!-- In table \@ref(tab:dummy-variables) the dummy variable encoding is illustrated. -->
<!--
```{r, dummy-variables-table, echo = FALSE, cache = TRUE}
table <- tibble(
  Make = c("Fiat", "Alfa-Romeo", "Lancia", "Ferrari"),
  `$x'_1$` = c(1, 0, 0, 0),
  `$x'_2$` = c(0, 1, 0, 0),
  `$x'_3$` = c(0, 0, 1, 0)
)
table %>%
  kable(
    # format = "latex",
    booktabs = T,
    align = "lccc",
    vline = "",
    # toprule = "", midrule = "\\hline",
    # linesep = "", bottomrule = "",
    toprule = "", midrule = "\\midrule\\addlinespace",
    linesep = "", bottomrule = "",
    caption = "Dummy variables encoding.",
    label = "dummy-variables",
    escape = FALSE
  ) %>%
  kable_styling(
    position = "center",
    latex_options = "hold_position",
    full_width = FALSE
  ) %>%
  row_spec(0, bold = T)
```
-->
For some models it is suggested to also use the dummy variable that indicates the $K$^th^ modality. This encoding is called one-hot encoding and is mainly used with Neural Networks. For the models considered in this paper, the $K-1$ dummy variables encoding is preferred, so we will always use it.
In the following, when we use the notation $\boldsymbol{x}_i=(x_{i1}, x_{i2}, \dots, x_{ip})$, we assume that the qualitative variables have already been encoded as dummy variables, so that $(x_{i1}, x_{i2}, \dots, x_{ip})\in \mathcal{X} \subseteq \mathbb{R}^p$.
### Pricing Rule and Modeling
The pricing variables are used as input of a _Pricing Rule_.
```{definition, pricing-rule, name = "Pricing Rule"}
A _Pricing Rule_ is a function $f(\cdot)$ that from an instance of a set of pricing variables $\boldsymbol{x}_i\in\mathcal{X}$ returns a price:
$$
\begin{array}{rccl}
f: & \mathcal{X} & \longrightarrow & R_+ \\
& \boldsymbol{x}_i & \longmapsto & P_i \\
\end{array}
$$
```
The process of pricing consists in defining a Pricing Rule based on data observed in the past and assumptions on the future.
The first step in defining a Pricing Rule is to model the total cost of claims $S$ and obtain a pricing rule for the risk premium $P^{(risk)}$.
```{definition, modeling, name = "Modeling"}
Modeling a _response variable_ $Y$ means estimating a function
$$r:\mathcal{X}\rightarrow \mathcal{C}$$
that, given a set of explanatory variables $\boldsymbol{x}_i=(x_{i1}, x_{i2}, \dots, x_{ip})\in \mathcal{X} \subseteq \mathbb{R}^p$, returns the expected value of the response variable $E(Y)$ and possibly other moments of $Y$ or even the full distribution of $Y$.
```
In definition \@ref(def:modeling) we used a generic $\mathcal{C}$ as codomain of the function $r(\cdot)$ to not specify whether the model describes just $E(Y)$ (and so $\mathcal{C}=\mathbb{R}$) or something more, such as the couple $\left( E(Y), Var(Y) \right)$ or the full distribution of $Y$.
As we observed in section \@ref(chap:tcc-dist), under the compound distribution hypotheses we do not have to model the total cost of claims $S$ directly: we can model $N$ and $Z$ separately.
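As a sketch of this separation, the chunk below fits a frequency model on hypothetical simulated data and combines it with a mean severity to obtain a risk premium estimate. The data-generating process, the variable names and all parameter values are illustrative assumptions only.

```{r freq-sev-sketch}
# Sketch: model the claims count N and the severity Z separately, then
# combine the two estimates into E(S_i) = E(N_i) * E(Z_i).
set.seed(42)
n_pol <- 5000
age <- sample(18:80, n_pol, replace = TRUE)

# Hypothetical data-generating process: claim frequency decreases with age.
N <- rpois(n_pol, exp(-1.5 - 0.01 * age))
policies <- data.frame(age = age, N = N)

freq_fit <- glm(N ~ age, family = poisson(), data = policies)
mean_severity <- 2000   # assumed E(Z), e.g. from a separate severity model

# Expected claims count for a 30-year-old, times the expected severity:
E_N_30 <- predict(freq_fit, newdata = data.frame(age = 30), type = "response")
risk_premium_30 <- unname(E_N_30 * mean_severity)
risk_premium_30
```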
### Response variables and distributions {#chap:response-variables-and-distributions}
Usually in statistical modeling, the response variables are seen as random variables with a distribution belonging to a specified family.
#### Distribution for the Claims Count $N$ {#chap:dist-n}
The claims count $N$ is a discrete variable with values in $\{0, 1, 2, 3,\dots\}$. Even if in practice the number of claims can't be arbitrarily high, $N$ is usually modeled with distributions that assign a positive probability to all natural numbers. One of the most common distributions used for $N$ is the Poisson distribution.
```{definition, def-poisson, name = "Poisson Distribution"}
A random variable $N$ with support $\{0,1,2,3,\dots \}$ has a Poisson distribution if its probability function is:
$$
p_N(n) = P\left( N = n \right) = e^{-\lambda}\frac{\lambda^n}{n!}, \quad \lambda>0
$$
We will indicate it with the notation $N \sim Poisson(\lambda)$.
```
```{r, plot-poisson, echo = FALSE, fig.cap = "Poisson distribution for some values of $\\lambda$.", fig.align = "center", out.width = "90%", fig.width = 6, fig.height = 4, cache = TRUE}
tibble(x = 0:22) %>%
  crossing(
    tibble(
      lambda = c(.8, 1, 2.5, 10)
    )
  ) %>%
  mutate(
    y = dpois(x = x, lambda = lambda),
    lambda_label = str_c("lambda == ", lambda) %>%
      fct_inorder()
  ) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  geom_segment(
    aes(xend = x),
    yend = 0
  ) +
  facet_wrap(
    ~lambda_label,
    labeller = label_parsed
  ) +
  coord_cartesian(
    xlim = c(0, 20)
  ) +
  labs(
    x = "n", y = "p(n)"
  )
```
The Poisson distribution is a parametric distribution that depends only on the parameter $\lambda$. Figure \@ref(fig:plot-poisson) shows the distribution for different values of $\lambda$. These plots show how, for larger values of $\lambda$, the distribution shifts towards larger values and becomes more spread out.
Indeed, the first two moments are:
\begin{align*}
E(N) & = \lambda \\
Var(N) & = \lambda
\end{align*}
Thus, increasing $\lambda$, both $E(N)$ and $Var(N)$ increase.
Looking at the distribution shape, we can see that:
* if $\lambda<1$, the mode is in $n=0$;
* if $\lambda=1$, $p(0)=p(1)=\frac{1}{e}$;
* if $\lambda>1$, the mode is in a value greater than $0$ and, as $\lambda$ increases, the distribution assumes a bell shape similar to the Normal distribution shape. The convergence to the Normal distribution can be obtained with the _Central Limit Theorem_.
In non-life insurance we are usually in the case $\lambda<1$: for example, the average claims frequency for motor third party liability insurance in Italy in 2017 was 5.68% [@ania-claim-freq].
<!-- ^[[ANIA yearly statistical report for motor third party liability](https://www.ania.it/ricerca-avanzata/-/asset_publisher/XIyLeujL9irt/content/id/113283)]. -->
The property $Var(N) = E(N)$ is an important constraint when the distribution is used in practice, as the observed data may show a different pattern. Often the observed data exhibits $Var(N) > E(N)$; this phenomenon is called _overdispersion_.
To address this issue it is possible to use more flexible distributions, such as the Negative-Binomial distribution, or to adopt fewer assumptions on the response variable distribution. A common technique is the Quasi-Poisson model, which we will describe in chapter \@ref(chap:models).
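A quick practical check for overdispersion is to compare the sample mean and variance of the observed counts, or to fit a quasi-Poisson model and inspect its estimated dispersion parameter. The sketch below uses simulated Negative-Binomial counts, for which $Var(N) > E(N)$ holds by construction; all parameter values are illustrative.

```{r overdispersion-check}
# Negative-Binomial counts: Var(N) = mu + mu^2 / size > mu = E(N).
set.seed(7)
N <- rnbinom(10000, size = 2, mu = 0.5)

c(mean = mean(N), variance = var(N))   # the variance exceeds the mean

# The quasi-Poisson fit estimates a dispersion parameter; values above 1
# signal overdispersion relative to the Poisson assumption.
fit <- glm(N ~ 1, family = quasipoisson())
summary(fit)$dispersion
```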
#### Exposure {#chap:exposure}
In section \@ref(chap:non-life-ins) we said that non-life insurances usually have a fixed coverage period, typically one year. Often, however, we work with portfolios of policies with different coverage periods. For example, this can be due to policies issued with shorter coverage periods or to policies that were terminated early. Moreover, in company data, insurance records are often organized by accounting year. This means that, if an insurance coverage $c$ spans two consecutive accounting years $a$ and $a+1$, it is stored as two records: the pair $(c, a)$ and the pair $(c, a+1)$. This situation is quite common, as coverages usually start during the year and not all on the first of January.
The coverage span of an insurance coverage is called _exposure_ and is usually measured in years-at-risk. For instance, if an insurance coverage spans 3 months, it corresponds to a quarter of a year, so the exposure, measured in years-at-risk, is $v=\frac{1}{4}$. The term year-at-risk comes from the fact that the exposure is the period in which the insurer is exposed to the risk of paying claims.
It is natural to assume that a policyholder with a longer exposure is expected to experience more claims. Since we have to work with policies with different exposures, the usual assumption that takes this aspect into account is the following: let $M$ be the number of claims the policyholder will experience during his exposure period of length $v$ (measured in years) and $N$ the number of claims he would experience during one year; then we assume $E(M) = v E(N)$.
This assumption can be further extended if we assume that the claims come from a _Poisson process_.
```{definition, def-process-count, name = "Counting Process"}
A stochastic process $\{N(t), t\ge0\}$ is called a \textit{counting process} if:
\begin{enumerate}
\item The values of $N(t)$ are natural numbers \\
$N(t) \in \{ 0, 1, 2, \dots \} \quad \forall t\ge 0$
\item The process is non-decreasing \\
$s < t \Rightarrow N(s) \le N(t)$
\end{enumerate}
```
In a counting process $\{N(t), t\ge0\}$:
* $N(t)$ can be interpreted as the number of events or arrivals that occur in the period $[0, t]$;
* $N(t) - N(s), \ s\le t$ can be interpreted as the number of events or arrivals that occur in the period $]s, t]$. $N(t) - N(s)$ is also called _increment_ of the process.
The counting process can be used to model the number of claims that occur to a specific policy.
```{definition, def-process-poisson, name = "Poisson Process"}
A counting process $\{N(t), t\ge0\}$ is a \textit{Poisson process} with intensity $\lambda$ if:
\begin{enumerate}
\item The increments of the process are stocastically independent \\
$\forall n\ge0, \forall s_1 < t_1 \le \dots \le s_n < t_n$ \\
$\Rightarrow \ N(t_1)-N(s_1), \dots, N(t_n)-N(s_n)$ are stocastically independent;
\item The probability of arrival in an interval is proportional to the size of the interval \\
$\forall t\ge 0, \forall \Delta t >0 \ \Rightarrow \ P\left( N(t + \Delta t) - N(t) = 1 \right) = \lambda \Delta t + \omicron (\Delta t)$ \\
where $\lim_{\Delta t \to 0}{\frac{\omicron(\Delta t)}{\Delta t}} = 0$
\item Multiple arrivals are excluded \\
$\forall t\ge 0, \forall \Delta t >0 \ \Rightarrow \ P\left( N(t + \Delta t) - N(t) \ge 2 \right) = \omicron (\Delta t)$
\item Almost surely there are no arrivals at time $0$ \\
$P\left( N(0) = 0 \right) = 1$
\end{enumerate}
```
Under these hypotheses we obtain the following result:
```{theorem, th-process-poisson, name = "Poisson Process"}
If $\{N(t), t\ge 0 \}$ is a Poisson process with intensity $\lambda$, then:
$$\forall t\ge 0, \forall \Delta t >0, \ \Rightarrow \ N(t + \Delta t) - N(t) \sim Poisson(\lambda \Delta t)$$
```
This result tells us that the distribution of the number of events in any interval $]t, t+\Delta t]$ depends only on the size of the interval $\Delta t$. Moreover, by the property of the Poisson distribution we saw in section \@ref(chap:dist-n), we get:
$$E(N(t + \Delta t) - N(t)) = \lambda \Delta t$$
So, the expected number of arrivals is proportional to the size of the interval $\Delta t$. The intensity of the process $\lambda$ can be also interpreted as the expected number of claims in a unit period.
If we assume that the claims occurring to a policy come from a Poisson process with intensity $\lambda$, and we observe that policy for the period $]t, t+v]$, the claims count $M$ in that exposure period is distributed as:
$$ M\sim Poisson(v \lambda) $$
In particular, if the observed period spans 1 year, we get:
$$ M = N \sim Poisson(\lambda) $$
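This relationship can be checked with a quick simulation in base R, using an illustrative intensity $\lambda = 0.1$ claims per year-at-risk and an exposure of one quarter:

```{r}
# Yearly counts N ~ Poisson(lambda), quarterly counts M ~ Poisson(v * lambda).
# lambda and the sample size are purely illustrative.
set.seed(1)
lambda <- 0.1
v <- 1 / 4

N <- rpois(1e5, lambda)
M <- rpois(1e5, v * lambda)

mean(N)  # close to lambda
mean(M)  # close to v * lambda, i.e. E(M) = v * E(N)
```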
<!--
Poisson distribution
- Definition
- Properties
- Other distributions
- Quasi-Poisson
Exposure (years-at-risk)
Poisson process
-->
#### Distribution for the Claim Severity $Z$
The claim severity $Z$ is a continuous variable with values in $[0, +\infty[$. As for the claims count $N$, even if in practice it cannot be arbitrarily high, it is usually modeled with distributions that assign a positive density to all the numbers in $]0, +\infty[$. As null claims are excluded, it is natural to assume $P\left( Z=0 \right) = 0$. One of the most common distributions used for $Z$ is the Gamma distribution.
```{definition, def-gamma, name = "Gamma Distribution"}
A random variable $Z$ with support $[0, +\infty[$ has a Gamma distribution, if its probability density function is:
$$
f_Z(z) = \frac{\rho^\alpha}{\Gamma(\alpha)}z^{\alpha-1}e^{-\rho z}, \quad \alpha > 0, \ \rho > 0
$$
where $\Gamma(\alpha) = \int_{0}^{+\infty}{z^{\alpha - 1} e^{-z} \mathrm{d} z}$.
We will indicate it with the notation $Z \sim Gamma(\alpha, \rho)$.
```
```{r, plot-gamma, echo = FALSE, fig.cap = "Gamma distribution for some values of $\\alpha$ and $\\rho$.", fig.align = "center", out.width = "90%", fig.width = 6, fig.height = 4, cache = TRUE}
tibble(x = seq(from = 0, to = 25, by = .01)) %>%
crossing(
tibble(
alpha = c(.8, 1, 2, 16),
rho = c(.2, .25, .5, 2)
)
) %>%
mutate(
y = dgamma(x = x, shape = alpha, rate = rho),
label = str_c("list(alpha == ", alpha, ", ", "rho == ", rho, ")") %>%
fct_inorder()
) %>%
ggplot(aes(x = x, y = y)) +
geom_line() +
facet_wrap(
~label,
labeller = label_parsed
) +
coord_cartesian(
ylim = c(0, .3),
xlim = c(0, 20)
) +
labs(
x = "z", y = "f(z)"
)
```
The Gamma distribution is a parametric distribution that depends on two parameters:
* $\alpha > 0$, called shape parameter
* $\rho > 0$, called rate parameter (the inverse of the scale parameter)
The first two moments of the Gamma distribution are:
\begin{align*}
E(Z) & = \frac{\alpha}{\rho} \\
Var(Z) & = \frac{\alpha}{\rho^2}
\end{align*}
In figure \@ref(fig:plot-gamma) the distribution is represented for different levels of $\alpha$ and $\rho$. These plots show how the shape changes as the values of $\alpha$ and $\rho$ change. We can see that:
* if $\alpha < 1$, $f_Z(\cdot)$ is not defined in $0$ and it has a vertical asymptote in $z = 0$. In $]0, +\infty[$ it is monotonically decreasing.
* If $\alpha = 1$, $f_Z(\cdot)$ starts from $f_Z(0) = \rho$ and then decreases monotonically. In this case, the density function becomes $f_Z(z) = \rho e^{-\rho z}$ and the distribution is also called exponential distribution.
* If $\alpha > 1$, $f_Z(\cdot)$ starts from $f_Z(0) = 0$, increases until the mode and then decreases.
In figure \@ref(fig:plot-gamma) the first three distributions represented have the same expected value $E(Z)=\frac{\alpha}{\rho} = 4$, but different shapes, while the fourth has a larger expected value $E(Z) = 8$. As the shape parameter $\alpha$ increases, the distribution assumes a bell shape similar to the Normal distribution one. The convergence to the Normal distribution can be obtained with the _Central Limit Theorem_.
Another parametrization often used for Gamma distribution is obtained by using the mean $\mu$ as a parameter:
$$
\mu = \frac{\alpha}{\rho}
$$
With this parametrization, the density function becomes:
$$
f_Z(z) = \frac{\left(\frac{\alpha}{\mu}\right)^\alpha}{\Gamma(\alpha)}z^{\alpha-1}e^{-\frac{\alpha}{\mu} z}, \quad \alpha > 0, \ \mu > 0
$$
The advantage of using the parameters $(\alpha, \mu)$ is that the link between $E(Z)$ and $Var(Z)$ becomes clearer:
\begin{align*}
E(Z) & = \mu \\
Var(Z) & = \frac{1}{\alpha}\mu^2
\end{align*}
Computing the coefficient of variation we then obtain:
$$CV(Z) = \frac{\sqrt{Var(Z)}}{E(Z)} = \frac{1}{\sqrt{\alpha}}$$
This result means that the coefficient of variation is constant (given the shape parameter $\alpha$). As we saw for the Poisson distribution, it is possible that observed data shows a different pattern. In chapter \@ref(chap:models), for the Gamma distribution, we will use the parametrization based on $(\alpha, \mu)$ instead of the one based on $(\alpha, \rho)$.
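In base R, a Gamma with the $(\alpha, \mu)$ parametrization can be sampled by setting `shape = alpha` and `rate = alpha / mu`; a short sketch with illustrative values $\alpha = 2$, $\mu = 4$:

```{r}
# Gamma in the (alpha, mu) parametrization: shape = alpha, rate = alpha / mu.
# The parameter values are purely illustrative.
set.seed(1)
alpha <- 2
mu <- 4

z <- rgamma(1e5, shape = alpha, rate = alpha / mu)

mean(z)          # close to mu
var(z)           # close to mu^2 / alpha
sd(z) / mean(z)  # close to 1 / sqrt(alpha), the constant CV
```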
Another characteristic of the Gamma distribution that could be problematic in modeling claims severity is that it has a light tail. This means that, as $z$ goes to $+\infty$, $f_Z(z)$ approaches $0$ quite fast. This could lead to a poor fitting for _large claims_. Other distributions with heavier tails are for example the _log-Normal_ and the _Pareto_.
#### Large Claims {#chap:large-claims}
Modeling large claims is quite difficult in practice because usually there is not much observed data on them, so it is hard to understand whether they are related to some risk factors (identifiable by the pricing variables) or they happen just by chance due to a distribution with heavy tails.
First of all, to model large claims, we must define what a large claim is. What is usually done in practice is to choose a threshold $\bar{z}$ and consider large all the claims whose size exceeds that threshold. The value $\bar{z}$ must be big enough that the claims above it can reasonably be considered large, but not so big that too few observed claims exceed it. One common choice for Motor Third Party Liability in European markets could be $\bar{z} = 100\, 000 \text{\euro}$.
```{definition, def-large-claim, name = "Large and Attritional Claims"}
Given a predetermined threshold $\bar{z}$, we say that:
\begin{itemize}
\item a claim $Z$ is a \textit{large claim} if $Z > \bar{z}$
\item a claim $Z$ is an \textit{attritional claim} if $Z \le \bar{z}$
\end{itemize}
For each claim $Z$ we call:
\begin{itemize}
\item \textit{Capped Claim Size} \\
$Z' = \min(Z, \bar{z})$;
\item \textit{Excess Over the Threshold} \\
$Z'' = \max(Z - \bar{z}, 0)$.
\end{itemize}
```
In figure \@ref(fig:large-claim) the _Capped Claim Size_ and the _Excess Over the Threshold_ are shown. It is easy to show that $Z$ can be decomposed as:
$$Z = Z' + Z''$$
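The decomposition can be sketched in base R with `pmin()` and `pmax()`, using an illustrative threshold and made-up claim sizes:

```{r}
# Capped claim size and excess over the threshold; the threshold and the
# claim sizes below are made up for illustration.
z_bar <- 100000
z <- c(2500, 40000, 180000, 95000, 350000)

z_capped <- pmin(z, z_bar)      # Z'  = min(Z, z_bar)
z_excess <- pmax(z - z_bar, 0)  # Z'' = max(Z - z_bar, 0)

all(z_capped + z_excess == z)   # TRUE: the decomposition Z = Z' + Z'' holds
```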
```{tikz, large-claim, fig.cap = "Large claims.", fig.ext = 'pdf', cache = TRUE, echo = FALSE}
\newcommand{\ImageWidth}{11cm}
\usetikzlibrary{decorations.pathreplacing, positioning, arrows.meta}
\begin{tikzpicture}
% draw horizontal lines
\draw[thick, -Triangle] (-0.5cm, 0) -- (\ImageWidth, 0);
\draw[thick, -Triangle] (0, -0.25cm) -- (0, 2/3*7cm);
\draw (-3pt, 2/3*4cm) node[anchor = east] {$\bar{z}$} -- (\ImageWidth, 2/3*4cm);
% draw vertical lines
\draw [thick] (1.5cm, 0cm) -- (1.5cm, 2/3*3cm) node[anchor = south] {$Z_{1}$};
\draw [thick] (1.5cm - 3pt, 2/3*3cm) -- (1.5cm + 3pt, 2/3*3cm);
\draw [thick] (3cm, 0cm) -- (3cm, 2/3*5cm) node[anchor = south] {$Z_{2}$};
\draw [thick] (3cm - 3pt, 2/3*5cm) -- (3cm + 3pt, 2/3*5cm);
\draw [thick] (4.5cm, 0cm) -- (4.5cm, 2/3*2cm) node[anchor = south] {$Z_{3}$};
\draw [thick] (4.5cm - 3pt, 2/3*2cm) -- (4.5cm + 3pt, 2/3*2cm);
\draw [thick] (6cm, 0cm) -- (6cm, 2/3*6cm) node[anchor = south] {$Z_{4}$};
\draw [thick] (6cm - 3pt, 2/3*6cm) -- (6cm + 3pt, 2/3*6cm);
\draw [thick] (7.5cm, 0cm) -- (7.5cm, 2/3*4.5cm) node[anchor = south] {$Z_{5}$};
\draw [thick] (7.5cm - 3pt, 2/3*4.5cm) -- (7.5cm + 3pt, 2/3*4.5cm);
% draw curly brackets
\draw [decorate, decoration = {brace, amplitude = 5pt}, xshift = -4pt, yshift = 0pt]
(4.5cm, 0cm) -- (4.5cm, 2/3*2cm) node [black, midway, xshift = -10pt]
{\footnotesize $Z'_3$};
\draw [decorate, decoration = {brace, amplitude = 5pt}, xshift = -4pt, yshift = 0pt]
(6cm, 0cm) -- (6cm, 2/3*4cm) node [black, midway, xshift = -10pt]
{\footnotesize $Z'_4$};
\draw [decorate, decoration = {brace, amplitude = 5pt}, xshift = -4pt, yshift = 0pt]
(6cm, 2/3*4cm) -- (6cm, 2/3*6cm) node [black, midway, xshift = -10pt]
{\footnotesize $Z''_4$};
% draw dots
\node at (9.5cm, 2/3*1.5) {\huge $\dots$};
;
\end{tikzpicture}
```
Given the total number of claims $N$, it can be decomposed as:
$$N = N^{(a)} + N^{(l)}$$
where
* $N^{(a)}$ is the attritional claims count, i.e. the number of claims with size $Z \le \bar{z}$;
* $N^{(l)}$ is the large claims count, i.e. the number of claims with size $Z > \bar{z}$;
Let's indicate with $Z_{(i)}$ the $i$^th^ claim in increasing order of size. Sorting the claims, we can separate the attritional claims from the large claims as follows:
$$
\underbrace{Z_{(1)}, Z_{(2)}, \dots, Z_{(N^{(a)})}}_{\text{Attritional Claims}},
\underbrace{Z_{(N^{(a)} + 1)}, Z_{(N^{(a)} + 2)}, \dots Z_{(N^{(a)} + N^{(l)})}}_{\text{Large Claims}}
$$
In order to model the large claims it is possible to use the following three decompositions of the total cost of claims $S$:
\begin{align}
\nonumber
S & = \underbrace{Z_{(1)} + Z_{(2)} + \dots + Z_{(N^{(a)})}}_{\text{Attritional Claims}} +
\underbrace{Z_{(N^{(a)} + 1)} + Z_{(N^{(a)} + 2)} + \dots Z_{(N^{(a)} + N^{(l)})}}_{\text{Large Claims}} \\
\label{large-claim-decomposition-1}
& = \underbrace{\sum_{i=1}^{N^{(a)}}{Z_{(i)}}}_{=S^{(a)}} +
\underbrace{\sum_{i = N^{(a)} + 1}^{N^{(a)} + N^{(l)}}{Z_{(i)}}}_{=S^{(l)}}
\ = \ S^{(a)} + S^{(l)} \\[12pt]
\label{large-claim-decomposition-2}
S & = \sum_{i=1}^{N}{Z_i}
\ = \ \sum_{i=1}^{N}{\left(
%\{Z_i|Z_i>\bar{z}\} I_{Z_i>\bar{z}} +
%\{Z_i|Z_i\le\bar{z}\} I_{Z_i\le\bar{z}}
Z_i I_{Z_i>\bar{z}} +
Z_i I_{Z_i\le\bar{z}}
\right)} \\[12pt]
\label{large-claim-decomposition-3}
S & = \sum_{i=1}^{N}{Z_i}
\ = \ \sum_{i=1}^{N}{\left(Z'_i + Z''_i\right)}
\ = \ \sum_{i=1}^{N}{\left(Z'_i + Z''_i I_{Z_i > \bar{z}}\right)}
\end{align}
These three decompositions of $S$ are useful because they provide three decompositions of $E(S)$:
\begin{align}
\nonumber
E(S) & = E(S^{(a)}) + E(S^{(l)}) \\
\label{large-claim-decomposition-expected-1}
& = E(N^{(a)}) E(Z|Z\le\bar{z}) + E(N^{(l)}) E(Z|Z>\bar{z}) \\[12pt]
\nonumber
E(S) & = E(N) E(Z) \\
\nonumber
& = E(N) \left[P(Z\le\bar{z}) E(Z|Z\le\bar{z}) + P(Z>\bar{z}) E(Z|Z > \bar{z}) \right] \\
\label{large-claim-decomposition-expected-2}
& = E(N) \left[\left( 1 - P(Z>\bar{z}) \right) E(Z|Z\le\bar{z}) + P(Z>\bar{z}) E(Z|Z > \bar{z})\right] \\[12pt]
\nonumber
E(S) & = E(N) E(Z) \\
\label{large-claim-decomposition-expected-3}
& = E(N) \left[E(Z') + P(Z>\bar{z}) E(Z'')\right]
\end{align}
<!-- ```{=latex} -->
<!-- \begin{equation} -->
<!-- \label{large-claim-decomposition-expected-1} -->
<!-- \begin{split} -->
<!-- E(S) & = E(S^{(a)}) + E(S^{(l)}) \\ -->
<!-- & = E(N^{(a)}) E(Z|Z\le\bar{z}) + E(N^{(l)}) E(Z|Z>\bar{z}) -->
<!-- \end{split} -->
<!-- \end{equation} -->
<!-- ``` -->
<!-- ```{=latex} -->
<!-- \begin{equation} -->
<!-- \label{large-claim-decomposition-expected-2} -->
<!-- \begin{split} -->
<!-- E(S) & = E(N) E(Z) \\ -->
<!-- & = E(N) \left[P(Z\le\bar{z}) E(Z|Z\le\bar{z}) + P(Z>\bar{z}) E(Z|Z > \bar{z}) \right] \\ -->
<!-- & = E(N) \left[\left( 1 - P(Z>\bar{z}) \right) E(Z|Z\le\bar{z}) + P(Z>\bar{z}) E(Z|Z > \bar{z})\right] -->
<!-- \end{split} -->
<!-- \end{equation} -->
<!-- ``` -->
<!-- ```{=latex} -->
<!-- \begin{equation} -->
<!-- \label{large-claim-decomposition-expected-3} -->
<!-- \begin{split} -->
<!-- E(S) & = E(N) E(Z) \\ -->
<!-- & = E(N) \left[E(Z') + P(Z>\bar{z}) E(Z'')\right] -->
<!-- \end{split} -->
<!-- \end{equation} -->
<!-- ``` -->
Equations \@ref(large-claim-decomposition-expected-1), \@ref(large-claim-decomposition-expected-2) and \@ref(large-claim-decomposition-expected-3) provide three approaches to model attritional and large claims:
1. Looking at \@ref(large-claim-decomposition-expected-1), we can model attritional claims and large claims separately. Modeling $N^{(a)}$ and $Z|Z\le\bar{z}$ we estimate the total cost of claims for the attritional part $S^{(a)}$; modeling $N^{(l)}$ and $Z|Z>\bar{z}$ we estimate the total cost of claims for the large part $S^{(l)}$.
2. Looking at \@ref(large-claim-decomposition-expected-2), we can model the overall claim count $N$, and then model the cost of the attritional claims $Z|Z\le\bar{z}$, the cost of the large claims $Z|Z>\bar{z}$ and the probability of exceeding the threshold $P(Z>\bar{z})$.
3. Looking at \@ref(large-claim-decomposition-expected-3), we can model the overall claim count $N$, and then model the capped claim size $Z'$, the excess over the threshold $Z''$ and the probability of exceeding the threshold $P(Z>\bar{z})$.
If the large claims component weighs a lot on the total cost of claims, these approaches could lead to quite different estimates of $E(S)$. In particular, if the number of large claims in the observed data is small, it will be hard to model both $N^{(l)}$ and $P(Z>\bar{z})$, so for these components the modeling process could lead to a flat model (i.e. a model without any explanatory variable) or an almost flat one (i.e. a model with just a few explanatory variables and with mild effects). However, with the first approach, a flat model for $N^{(l)}$ distributes the observed total cost of large claims proportionally to all the policies, while with the second and the third, a flat model for $P(Z>\bar{z})$ distributes the observed total cost of large claims proportionally to the expected number of claims $E(N)$. So, with the first approach, a flat model leads to more solidarity between policies, while, with the second and third, a flat model could exacerbate the differences identified by modeling $N$.
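On observed (or, here, simulated) data the three decompositions of $S$ coincide by construction; a quick numerical check in base R, with illustrative log-normal severities and threshold:

```{r}
# The three ways of splitting the total cost of claims S give the same total.
# The severity distribution and its parameters are purely illustrative.
set.seed(42)
z_bar <- 100000
z <- rlnorm(400, meanlog = 7.5, sdlog = 1.6)  # simulated claim sizes

is_large <- z > z_bar

S1 <- sum(z[!is_large]) + sum(z[is_large])          # S^(a) + S^(l)
S2 <- sum(z * (z <= z_bar)) + sum(z * (z > z_bar))  # indicator split
S3 <- sum(pmin(z, z_bar) + pmax(z - z_bar, 0))      # capped + excess

all.equal(S1, S2)  # TRUE
all.equal(S1, S3)  # TRUE
```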
For the second approach we must also introduce a distribution suitable for modeling $P(Z>\bar{z})$.
#### Binomial distribution
The _binomial distribution_ is used to model the count of events that occur (successes) in a fixed number of trials $n$. For example, we can use it to model the number of large claims within a fixed number $n$ of claims.
```{definition, def-binomial, name = "Binomial Distribution"}
A random variable $Y$ with support $\{0,1,2, \dots, n \}$ has a Binomial distribution, if its probability function is:
$$
p_Y(y) = P\left( Y = y \right) = \binom{n}{y} p^y (1-p)^{n-y}, \quad p \in [0, 1]
$$
We will indicate it with the notation $Y \sim Binom(n, p)$.
```
```{r, plot-binomial, echo = FALSE, fig.cap = "Binomial distribution for some values of $n$ and $p$.", fig.align = "center", out.width = "90%", fig.width = 6, fig.height = 4, cache = TRUE}
tibble(x = seq(from = 0, to = 10, by = 1)) %>%
crossing(
tibble(
n = c(1, 4, 10, 10),
p = c(.45, .2, .2, .5)
)
) %>%
filter(x <= n) %>%
mutate(
y = dbinom(x = x, size = n, prob = p),
# label = str_c("alpha = ", alpha, ", rho = ", rho) %>%
# fct_inorder()
label = str_c("list(italic(n) == ", n, ", italic(p) == ", p, ")") %>%
fct_inorder()
) %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
geom_segment(aes(xend = x),
yend = 0) +
facet_wrap(
~label,
labeller = label_parsed
) +
coord_cartesian(
# ylim = c(0, .3),
xlim = c(0, 10)
) +
labs(
x = "y", y = "p(y)"
) +
scale_x_continuous(breaks = seq(from = 0, to = 10, by = 2))
```
The binomial distribution is a parametric distribution that depends on the parameters $n$ and $p$: $n$ represents the number of trials, while $p$ represents the probability for a trial to succeed. The assumption is that the $n$ trials are independent and identical, so they all have the same probability $p$ of success.
The first two moments of the binomial distribution are:
\begin{align*}
E(Y) & = np \\
Var(Y) & = np(1-p)
\end{align*}
In figure \@ref(fig:plot-binomial) the distribution is represented for different levels of $n$ and $p$. From the plots we can see that:
* if $n = 1$, the binomial distribution assumes only the values $1$ (with probability $p$) and $0$ (with probability $1-p$). In this case it is also called _Bernoulli distribution_ and it can be used to model the indicator of an event $I_E$.
* If $n>1$, the binomial distribution assumes a shape centered on its expected value $E(Y)=np$ and fading for values of $y$ that move away from $E(Y)$. As $n$ increases, the distribution assumes a bell shape similar to the Normal distribution one. The convergence to the Normal distribution can be obtained with the _Central Limit Theorem_.
From the binomial distribution it is also possible to define the scaled binomial distribution by dividing its value by $n$.
```{definition, def-scaled-binomial, name = "Scaled Binomial Distribution"}
If $Y\sim Binom(n, p)$, and $Y' = \frac{Y}{n}$, we will say that $Y'$ has a \textit{Scaled Binomial Distribution} and we will indicate it with the notation $Y' \sim Binom(n, p)/n$.
The support of $Y'$ is $\{0, \frac{1}{n}, \frac{2}{n}, \dots, 1 \}$ and its probability function is:
$$
p_{Y'}(y') = P\left( Y' = y' \right) = \binom{n}{ny'} p^{ny'} (1-p)^{n-ny'}, \quad p \in [0, 1]
$$
```
In chapter \@ref(chap:models) we will use the Scaled Binomial Distribution.
In non-life insurance pricing, the binomial distribution can be used to model the probability for a claim to have specific characteristics. For example we can use it to model the probability that a certain claim is a large one, $P(Z>\bar{z})$, in order to model separately attritional claims severity $\{Z|Z\le\bar{z}\}$ and large claims severity $\{Z|Z>\bar{z}\}$, as we have seen in section \@ref(chap:large-claims).
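As a minimal sketch, the probability $P(Z>\bar{z})$ can be estimated with a flat (intercept-only) binomial model; the simulated claim sizes and all parameters below are purely illustrative:

```{r}
# Flat binomial model for the probability that a claim is large.
# Claim sizes are simulated; threshold and parameters are illustrative.
set.seed(7)
z_bar <- 100000
z <- rlnorm(2000, meanlog = 8, sdlog = 1.8)

y <- as.integer(z > z_bar)          # Bernoulli indicator: 1 if large claim

fit <- glm(y ~ 1, family = binomial)
p_hat <- unname(plogis(coef(fit)))  # fitted probability of a large claim

p_hat  # equals the empirical proportion mean(y)
```

With no explanatory variables, the fitted probability coincides with the observed proportion of large claims; adding pricing variables to the right-hand side of the formula would turn this into a non-flat model.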
Another example is the decomposition between claims with only material damages and claims with also bodily injuries. Modeling separately these two components is useful because they usually have a different distribution for the claim size.
As for large claims we can decompose $S$ in the following two ways:
\begin{align}
\nonumber
E(S) & = E(S^{(\text{things})}) + E(S^{(\text{inj})}) \\
\label{inj-claim-decomposition-expected-1}
& = E(N^{(\text{things})}) E(Z|\bar{J}) + E(N^{(\text{inj})}) E(Z|J) \\[12pt]
\nonumber
E(S) & = E(N) E(Z) \\
\nonumber
& = E(N) \left[P(\bar{J}) E(Z|\bar{J}) + P(J) E(Z|J) \right] \\
\label{inj-claim-decomposition-expected-2}
& = E(N) \left[\left( 1 - P(J) \right) E(Z|\bar{J}) + P(J) E(Z|J)\right]
\end{align}
where:
* $N^{(\text{things})}$ is the number of claims with only material damages;
* $N^{(\text{inj})}$ is the number of claims with injuries;
* $J$ is the event that a specific claim presents injuries; just as $Z$ is a representative for $Z_1, Z_2, \dots, Z_N$, $J$ is a representative for $J_1, J_2, \dots, J_N$.
Combining this decomposition with the large claims decomposition, we can go further and take into account both the presence or absence of injuries and whether or not a claim is large. One example could be:
\begin{align*}
E(S) & = E(N) \left[\left( 1 - P(J) \right) E(Z|\bar{J}) + P(J) E(Z|J) \right] \\[4pt]
& = E(N) \left\{ \right. \\
& \qquad \left( 1 - P(J) \right) E(Z|\bar{J}) \\
& \qquad + P(J) \left[ P(Z \le \bar{z} | J) E\left( Z \mid Z\le \bar{z} \land J \right) \right. \\
& \qquad \qquad + \left. P(Z > \bar{z} | J) E\left( Z \mid Z > \bar{z} \land J \right) \right] \\
& \qquad \left. \right\}
\end{align*}
This way, only the claims with injuries are decomposed between attritional and large. That makes sense because claims that don't produce injuries usually have small severities.
### Model fitting and available data {#chap:model-fitting-and-data-available}
Once we have chosen how to decompose $S$, we have to model the response variables needed for that decomposition ($N$, $Z$, $I_J$, ...) with the explanatory variables. Thus we have to estimate a function $r:\mathcal{X}\rightarrow \mathcal{C}$ as defined in definition \@ref(def:modeling).
In order to estimate $r(\cdot)$ we also have to make some assumptions on the distribution of the response variable and on the shape of $r(\cdot)$. We call _model_ a set of assumptions on the response variable and on the shape of $r(\cdot)$. We will discuss some of the most widespread models for claims count and claims severity in chapter \@ref(chap:models).
Once the model is defined, we have to estimate it using observed data. In general, to model a response variable $Y_i$ with the explanatory variables $\boldsymbol{x}_i=(x_{i1}, x_{i2}, \dots, x_{ip})\in \mathcal{X} \subseteq \mathbb{R}^p$, the observed data is in the form:
$$
\mathcal{D} = \left\{(\boldsymbol{x}_1, w_1, y_1), \ (\boldsymbol{x}_2, w_2, y_2), \ \dots, \ (\boldsymbol{x}_i, w_i, y_i), \ \dots, \ (\boldsymbol{x}_n, w_n, y_n)\right\}
$$
where:
* $n$ is the number of observations in the dataset;
* $\boldsymbol{x}_i\in \mathcal{X} \subseteq \mathbb{R}^p$ is the set of explanatory variables for the observation $i$;
* $w_i$ is the weight for the observation $i$;
* $y_i\in \mathcal{Y}\ \subseteq \mathbb{R}$ is the realization of the response variable $Y_i$ for the observation $i$.
What an observation is depends on the variable we are modeling. For instance:
* If we are modeling the yearly claim count $N_i$, each observation could be a policy (or a (policy, accounting year) pair), the weights could be the exposures $v_i$ and the realizations of the response variables could be the number of observed claims for that policy (or pair).
* If we are modeling the claim severity $Z_j$, each observation could be a claim $j$, the weights could all be $1$ and the realizations of the response variable could be the observed cost for the claim $j$. It is also possible to model the claim severity taking into account the total cost of claims for the policy $S_i = \sum_{j=1}^{N_i}{Z_j}$. In this case, each observation would be a policy $i$, the weights would be the number of claims for each policy $n_i$ and the realizations of response variables would be the total observed cost for the claims of the policy $i$.
* If we are modeling the occurrence of injuries in a claim $I_{Jj}$, each observation could be a claim $j$, the weights could be all $1$ and the realizations of response variables could be an indicator that assume the value $1$ if the claim $j$ caused injuries and $0$ otherwise. As for the claim severity, we can also aggregate data for policy, so each observation would be a policy $i$, the weights would be the number of claims $n_i$ for the policy $i$ and the realizations of response variables would be the number of claims that caused injuries among the claims of the policy $i$.
In each of these cases, $y_i$ is seen as a realization of the random variable $Y_i$. With an inferential process we obtain estimates of the distribution of $Y_i$ based on the observed realizations $y_i$.
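As an illustration, a claim-frequency dataset in the $(\boldsymbol{x}_i, w_i, y_i)$ layout could look like this (all variable names and values are made up):

```{r}
# A minimal frequency dataset: explanatory variables x_i, exposure as
# weight w_i, observed claim count y_i. All values are made up.
claims_data <- data.frame(
  age_band = c("18-25", "26-40", "41-65", "26-40"),  # x_i
  zone     = c("urban", "urban", "rural", "rural"),  # x_i
  exposure = c(1, 0.5, 1, 0.25),                     # w_i = v_i, years-at-risk
  n_claims = c(1, 0, 0, 1)                           # y_i, realization of N_i
)

claims_data
```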
#### Settlement process and IBNR claims
One of the challenges in non-life insurance pricing is that obtaining the observed data is not so straightforward. In many insurance coverages, such as \ac{mtpl}, the settlement process could last many years, so, if we want to develop models using data from recent years, not all the information is available. To better understand this aspect we have to discuss how the settlement process works.
In figure \@ref(fig:settlement-process) the settlement process for a claim is represented. At time $t_1$ the insured event (e.g. an accident) occurs. From this moment a liability for the insurer emerges, even if the insurer has not been notified yet. This liability is called _Outstanding Loss Liability_. In $t_2$ the claim is reported and the insurer is notified about the occurrence of the event. From this moment the settlement process starts. This process consists in evaluating the event and understanding the responsibilities of the parties and the extent of the damage. During this process, controversies between the parties can emerge and, particularly if injuries occurred, the damage evaluation can take a lot of time. When the situation is clear and everything is defined, the claim is settled and the liabilities are paid. In $t_3$ we have the settlement and in $t_4$ the claim is closed. It is possible that $t_4=t_3$, but this is not always the case. If the settlement process takes a long time and the insurer already knows he will have to pay something, he can make partial payments during the period $[t_2, t_3]$. These intermediate payments are made at times $\tau_1, \tau_2, \dots, \tau_n \in [t_2, t_3]$. It is also possible that a claim is opened and then gets closed without any payment. After the closing ($t_4$) it is also possible that a claim gets reopened and that more payments emerge.
```{tikz, settlement-process, fig.cap = "Claim timeline.", fig.ext = 'pdf', cache = TRUE, echo = FALSE}
\newcommand{\ImageWidth}{11cm}
\usetikzlibrary{decorations.pathreplacing, positioning, arrows.meta}
\begin{tikzpicture}
% draw horizontal line
\draw[thick, -Triangle] (0, 0) -- (\ImageWidth, 0) node[font = \scriptsize, below left = 3pt and -8pt]{$t$};
\draw[very thick] (0.5cm, 0) -- (9.5cm, 0);
% draw vertical lines and times
\draw (0.5cm, -3pt) -- (0.5cm, 3pt) node[anchor = south] {$t_{1}$};
\draw (2.5cm, -3pt) -- (2.5cm, 3pt) node[anchor = south] {$t_{2}$};
\draw (4.0cm, -3pt) -- (4.0cm, 3pt) node[anchor = south] {$\tau_{1}$};
\draw (4.5cm, -3pt) -- (4.5cm, 3pt) node[anchor = south] {$\tau_{2}$};
\path (5.25cm, -3pt) -- (5.255cm, 3pt) node[anchor = south] {$\dots$};
\draw (6cm, -3pt) -- (6cm, 3pt) node[anchor = south] {$\tau_{n}$};
\draw (7.5cm, -3pt) -- (7.5cm, 3pt) node[anchor = south] {$t_{3}$};
\draw (9.5cm, -3pt) -- (9.5cm, 3pt) node[anchor = south] {$t_{4}$};
% draw events names
\node[align=center] at (0.5cm, -16pt) {\small occurrence};
\node[align=center] at (2.5cm, -16pt) {\small reporting};
\node[align=center] at (5cm, -16pt) {\small intermediate \\ payments};
\node[align=center] at (7.5cm, -16pt) {\small settlement};
\node[align=center] at (9.5cm, -16pt) {\small closing};
% draw curly bracket
\draw [decorate, decoration = {brace, mirror, amplitude = 10pt}, xshift = 0pt, yshift = -10pt]
(2.5cm, -20pt) -- (7.5cm, -20pt) node [black, midway, xshift = 0pt, yshift = -15pt]
{\small settlement process};
\end{tikzpicture}
```
From the moment the claim is reported ($t_2$), the insurer estimates how much he is going to pay for that claim and allocates that sum in a reserve, called _case reserve_. As new information emerges and some payments are settled, the case reserve is updated. The aim of this reserve is to provide a best estimate of the future payments for the claims already emerged. As the claim gets settled, the sum of the paid and the reserved amounts converges to the final cost of the claim.
From this description it emerges that:
* In the period $]t_1, t_2[$ the insurer has an outstanding loss liability for an event that has not been reported yet. In this case we talk about an \ac{ibnyr}.
* In the period $[t_2, t_3[$ the insurer has an outstanding loss liability for an event that has been reported, but has not been fully settled yet, so this liability is just an estimate. In this case, if the case reserve is not large enough to cover all the future payments that will emerge for that claim, we talk about an \ac{ibner}.
#### Model fitting with available data
The \ac{ibnyr} and \ac{ibner} issue is particularly challenging when we have to perform a risk evaluation at a specific time $t$. In general $t_1, t_2,\dots$ are not known a priori, so we don't know whether more claims for accidents that occurred in the past will be reported in the future, and we don't know whether the ones already reported will experience a revaluation. This means that, in general, when we model $N$ and $Z$ at a specific time $t$, we can't observe the total number of claims that occurred for each policy, $n_i$, or the payments for each claim, $z_j$. What we can use is:
* $n_i^{(t)} = n_i^{(\text{reported in } t)}$
where:
* $n_i^{(\text{reported in } t)}$ is the number of reported claims in $t$ for the policy $i$;
* $z_j^{(t)} = z_j^{(\text{paid in }t)} + z_j^{(\text{reserved in } t)}$
where:
* $z_j^{(\text{paid in }t)}$ is the amount already paid in $t$ for the claim $j$;
* $z_j^{(\text{reserved in } t)}$ is the amount reserved in $t$ for the claim $j$.
When we use this data for modeling the total cost of claims, we must be particularly aware of what we are using. In general:
$$
n_i^{(t)} \ne n_i, \qquad z_j^{(t)} \ne z_j
$$
The common case is that $n_i^{(t)} < n_i$ and $z_j^{(t)} < z_j$. If we used $n_i^{(t)}$ and $z_j^{(t)}$ without any correction, we would underestimate both $E(N)$ and $E(Z)$, obtaining a biased estimate for $E(S)$.
To tackle these problems, what is usually done is to fit the models for $S_i$ with $n_i^{(t)}$ and $z_j^{(t)}$ and then apply a flat corrective coefficient $\alpha$ to $\widehat{E(S_i)}$, based on an aggregated estimate of $E(S)$ that takes the long settlement process into account.
An estimate for the expected total cost of claims for a generic policy in the portfolio $E(S)$ can be obtained with techniques based on runoff triangles, such as the _Chain Ladder_. These techniques are based on projecting the cost of claims already emerged to the final total cost of claims. We are not going to discuss these techniques in this thesis. For more details on them we refer to [@wuthrich-non-life-insurance-math-stats]. For our dissertation, we just have to know that these techniques provide us with an estimate for $E(S)$. Let's call it $\widehat{E(S)}^{CL}$. This estimate does not depend on explanatory variables; it is a sort of average total cost of claims for the policies in the portfolio.
Meanwhile, with the available data $n_i^{(t)}$ and $z_j^{(t)}$, the fitting of all the models needed in the decomposition of $S$ is performed and, for each policy $i\in\{1, 2, \dots, n\}$, an estimate of $E(S_i)$ is obtained. Let's call it $\widehat{E(S_i)}'$. As we used the data available in $t$, which comes from claims not fully settled, $\widehat{E(S_i)}'$ is a biased estimate of $E(S_i)$.
We can then balance the estimates $\widehat{E(S_i)}'$ with $\widehat{E(S)}^{CL}$ by computing:
$$
\alpha = \frac{n}{\sum_{i=1}^{n}{\widehat{E(S_i)}'}} \widehat{E(S)}^{CL}
$$
and applying it to the estimates as follows:
$$
\widehat{E(S_i)} = \alpha \ \widehat{E(S_i)}'