rcrologit

DISCLAIMER: If the math below does not render correctly, please use the README.pdf file available in the repo.

The package provides estimation and inference procedures for rank-ordered logit models in which agents have heterogeneous taste preferences.

Setup

We have $n$ i.i.d. random draws

$$ \mathcal{D}:=(Y_i,X_i)_{i=1}^{n} $$

where $Y_i:=(Y_{i0},Y_{i1},\ldots,Y_{iJ})^\top$, with $Y_{i\ell}\in\lbrace 0,1,\ldots,J\rbrace$, is a vector of ranks and $X_i:=(X_{i0}, X_{i1},\ldots, X_{iJ})^\top \in \mathbb{R}^{(J+1)\cdot K}$, with $X_{i\ell} \in\mathbb{R}^K$, collects the covariates. Let the latent utility model (McFadden, 1974) be

$$ U_{ij}^\star = u_{ij} + \epsilon_{ij},\qquad \epsilon_{ij}\overset{\mathtt{iid}}{\sim}\mathsf{Gu}(0,1). $$

For notational convenience, define the functions $r_i:\lbrace 0,1,\ldots,J\rbrace\to\lbrace 0,1,\ldots,J\rbrace$, $i=1,\ldots,n$. Each function maps a rank $j\in\lbrace 0,1,\ldots,J\rbrace$ into the corresponding item $r_i(j)\in\lbrace 0,1,\ldots,J\rbrace$ according to individual $i$'s preferences. To clarify,

$$ Y_{ij} = k\quad\iff\quad r_i(k)=j $$
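The correspondence between the rank vector $Y_i$ and the ordering $r_i$ can be checked numerically. The snippet below is a minimal base-R sketch with a made-up rank vector; the variable names are illustrative and are not part of the package.

```r
# Hypothetical rank vector for one respondent with J + 1 = 3 items (items 0, 1, 2).
# Y[j + 1] stores the rank of item j (R vectors are 1-indexed; ranks and items are 0-based).
Y <- c(2, 0, 1)   # item 0 has rank 2, item 1 has rank 0, item 2 has rank 1

# r[k + 1] is the item holding rank k, i.e. r_i(k) = j  <=>  Y_ij = k.
r <- order(Y) - 1
print(r)          # 1 2 0

# Round trip: the item placed at rank k must itself carry rank k.
ranks_of_ordered_items <- Y[r + 1]
```

Reading the output, item 1 is the most preferred ($r_i(0)=1$), followed by item 2 and then item 0.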

An observed ranking for a respondent implies a complete ordering of the underlying utilities. An individual will prefer an item with a higher utility over an item with a lower utility. If we observe a full ranking $r_i:=(r_i(0),r_i(1),\ldots,r_i(J))^\top$, we know that

$$ U_{ir_i(0)}^\star> U_{ir_i(1)}^\star>\cdots> U_{ir_i(J)}^\star. $$
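To make the link between latent utilities and an observed ranking concrete, the sketch below simulates one respondent in base R. The systematic utilities `u` are made-up numbers, and the Gumbel shocks are drawn with the inverse-CDF method; none of this is package code.

```r
set.seed(123)

J <- 3
u <- c(0.0, 0.5, -0.3, 1.2)            # hypothetical systematic utilities for items 0..J

# Standard Gumbel(0, 1) draws: if P ~ Uniform(0, 1), then -log(-log(P)) ~ Gu(0, 1).
eps <- -log(-log(runif(J + 1)))
U <- u + eps                           # latent utilities U*_ij

# Ordering r_i: items sorted from highest to lowest latent utility (0-based labels).
r <- order(U, decreasing = TRUE) - 1

# Rank vector Y_i: Y[j + 1] is the rank of item j.
Y <- integer(J + 1)
Y[r + 1] <- 0:J
```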

Therefore, the probability of observing a particular ranking $r_i$ is given by

$$ \mathbb{P}\left[r_i\mid \mathcal{D}\right] =\mathbb{P}\left[U_{ir_i(0)}^\star> U_{ir_i(1)}^\star>\cdots> U_{ir_i(J)}^\star\mid \mathcal{D}\right] =\prod_{j=0}^{J-1} \frac{\exp \left(u_{i r_{i}(j)}\right)}{\sum\limits_{j\leq \ell\leq J} \exp \left(u_{i r_{i}(\ell)}\right)}. $$

In light of this, the rank-ordered logit is nothing more than a sequence of multinomial logit (MNL) models: the $j=0$ term is an MNL for the most preferred item among all $J+1$ items; the $j=1$ term is an MNL for the second-ranked item among all items except the top-ranked one; and so on. The probability of a complete ranking is the product of these separate MNL probabilities. The product contains only $J$ factors because, once all other items have been ranked, the least preferred item is placed last with probability 1.
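The product-of-MNLs formula can be written in a few lines of base R. This is a sketch for illustration only (the function name `rologit_prob` and the utilities are assumptions, not package code); as a sanity check, the probabilities of all $(J+1)!$ rankings should add up to one.

```r
# Probability of a full ranking under the rank-ordered logit.
# u: systematic utilities of items 0..J (stored at positions 1..J+1)
# r: items ordered from most to least preferred (0-based labels)
rologit_prob <- function(u, r) {
  v <- u[r + 1]                  # utilities rearranged in ranking order
  m <- length(v)
  p <- 1
  for (j in seq_len(m - 1)) {    # J factors; the last item contributes probability 1
    p <- p * exp(v[j]) / sum(exp(v[j:m]))
  }
  p
}

u <- c(0.5, -0.2, 1.0)           # hypothetical utilities for three items

# All 3! = 6 possible rankings of items 0, 1, 2.
rankings <- list(c(0, 1, 2), c(0, 2, 1), c(1, 0, 2),
                 c(1, 2, 0), c(2, 0, 1), c(2, 1, 0))
probs <- sapply(rankings, function(r) rologit_prob(u, r))
total <- sum(probs)
```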

Modelling

In its most general form, we allow the user to model $u_{i\ell}$ in the latent utility model as

$$ u_{i\ell} = X_{i\ell}^{\top} \boldsymbol{\beta}_{\mathtt{F}} + Z_i^{\top} \boldsymbol{\alpha}_{\ell,\mathtt{F}} + W_{i\ell}^{\top} \boldsymbol{\beta}_i + V_i^{\top} \boldsymbol{\alpha}_{i\ell} + \delta_{\ell} $$

A handier, equivalent way to write the model above is to define $Z_{i\ell}:=\sum\limits_{j=1}^JZ_i\times\mathbf{1}(j=\ell)$ and $V_{i\ell}:=\sum\limits_{j=1}^JV_i\times\mathbf{1}(j=\ell),\ell=1,2,\ldots,J$, that is, to interact the unit-level covariates with alternative indicators, and consider

$$ u_{i\ell}=X_{i\ell}^\top\boldsymbol{\beta}_{\mathtt{F}} + Z_{i\ell}^\top\boldsymbol{\alpha}_{\mathtt{F}} + W_{i\ell}^\top\boldsymbol{\beta}_i + V_{i\ell}^\top\boldsymbol{\alpha}_{i} + \delta_\ell, $$

where:

  • $X_{i\ell}$ are covariates varying at the unit-alternative level whose coefficients are modelled as fixed
  • $Z_{i}$ are covariates varying at the unit level whose coefficients are modelled as fixed
  • $W_{i\ell}$ are covariates varying at the unit-alternative level whose coefficients are modelled as random
  • $V_{i}$ are covariates varying at the unit level whose coefficients are modelled as random
  • The heterogeneous taste coefficients $(\boldsymbol{\alpha}_i,\boldsymbol{\beta}_i)$ are modelled as jointly multivariate normal and i.i.d. across units, with mean $\left[\boldsymbol{\alpha_{\mathtt{R}}}^\top,\boldsymbol{\beta_{\mathtt{R}}}^\top\right]^\top$ and variance $\boldsymbol{\Sigma}$.
  • $\delta_\ell$ are alternative-specific fixed effects
  • $\epsilon_{i\ell}\sim\mathsf{Gu}(0,1)$ are idiosyncratic i.i.d. shocks
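The interaction of a unit-level covariate with alternative indicators, which is what $Z_{i\ell}$ and $V_{i\ell}$ encode, can be sketched in base R as follows. The toy data frame and column names are hypothetical, and this is not necessarily how the package builds its design matrices internally.

```r
# Hypothetical long-format data: 2 units, 3 alternatives (0, 1, 2), one unit-level covariate.
df <- data.frame(
  id     = rep(1:2, each = 3),
  alt    = factor(rep(0:2, times = 2)),
  gender = rep(c(1, 0), each = 3)
)

# Indicators for alternatives 1..J; alternative 0 is the omitted baseline.
D <- model.matrix(~ alt, data = df)[, -1, drop = FALSE]

# Interact the unit-level covariate with each alternative indicator.
# Each resulting column gets its own coefficient, i.e. an alternative-specific effect.
Z <- D * df$gender
colnames(Z) <- paste0("gender_x_alt", 1:2)
```

Each column of `Z` is nonzero only on the rows of one alternative, so a single covariate such as `gender` yields $J$ alternative-specific coefficients.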

Note that whenever $W_{i\ell}$ and $V_i$ are not specified, the package estimates a standard rank-ordered logit with no heterogeneous preferences, and the conditional choice probabilities are given by

$$ \mathbb{P}\left[r_i\mid \mathcal{D}\right] =\prod_{j=0}^{J-1} \frac{\exp \left(u_{i r_{i}(j)}\right)}{\sum\limits_{j\leq\ell\leq J} \exp \left(u_{i r_{i}(\ell)}\right)}. $$

If instead agents are allowed to have heterogeneous tastes, the random coefficients must be integrated out, and

$$ \mathbb{P}[r_i\mid \mathcal{D}] = \int \prod_{j=0}^{J-1} \frac{\exp \left(u_{ir_i(j)}(\beta_i)\right)}{\sum\limits_{j\leq \ell\leq J} \exp \left(u_{ir_i(\ell)}(\beta_i)\right)} \phi(\beta_i;\beta_{\mathtt{R}},\Sigma) \mathrm{d} \beta_i. $$

The parameter vector to be estimated is thus

$$ \theta = \left(\boldsymbol{\beta_\mathtt{F}}^\top,\boldsymbol{\beta_\mathtt{R}}^\top,\boldsymbol{\alpha_\mathtt{F}}^\top,\boldsymbol{\alpha_\mathtt{R}}^\top, \mathrm{vech}(\boldsymbol{\Sigma})^\top,\lbrace\delta_j\rbrace_{j=1}^J\right)^\top. $$

Estimation

The ideal maximum likelihood estimator is defined as

$$ \widehat{\theta}_{\mathtt{ML}}:=\mathrm{arg}\max_{\theta} \sum_{i=1}^n\log\int \prod_{j=0}^{J-1} \frac{\exp \left(u_{ir_i(j)}(\theta)\right)}{\sum\limits_{j\leq \ell\leq J} \exp \left(u_{ir_i(\ell)}(\theta)\right)} \phi(\beta_i;\beta_{\mathtt{R}},\Sigma) \mathrm{d} \beta_i. $$
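In the special case without random coefficients the integral drops out and the objective is an ordinary log-likelihood. The sketch below simulates data with a single unit-alternative covariate and a made-up true coefficient of 1, then maximizes that log-likelihood with base R's `optimize`. It only illustrates the estimator's definition; it is not the package's optimizer, and all names are assumptions.

```r
set.seed(42)
n <- 300; J <- 2; beta_true <- 1

# Log-probability of one ranking r (0-based items, best to worst) given utilities u.
log_rank_prob <- function(u, r) {
  v <- u[r + 1]
  m <- length(v)
  lp <- 0
  for (j in seq_len(m - 1)) lp <- lp + v[j] - log(sum(exp(v[j:m])))
  lp
}

# Simulated data: one covariate per unit-alternative and standard Gumbel shocks.
X   <- matrix(rnorm(n * (J + 1)), nrow = n)
eps <- matrix(-log(-log(runif(n * (J + 1)))), nrow = n)
U   <- beta_true * X + eps

# Row i of R holds unit i's items ordered from most to least preferred.
R <- t(apply(U, 1, function(Ui) order(Ui, decreasing = TRUE) - 1))

# Negative log-likelihood as a function of the scalar coefficient.
negloglik <- function(beta) {
  -sum(sapply(seq_len(n), function(i) log_rank_prob(beta * X[i, ], R[i, ])))
}

fit <- optimize(negloglik, interval = c(-5, 5))
beta_hat <- fit$minimum
```

With several parameters one would typically switch to a multivariate optimizer such as `optim`; the one-dimensional search here just keeps the sketch short.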

We approximate the integral via Monte Carlo simulation, averaging over $S$ draws of the random coefficients:

$$ \widehat{\mathbb{P}}_{(\widehat{\beta},\widehat{\Sigma})}[r_i\mid \mathcal{D}]=\frac{1}{S}\sum_{s=1}^S \prod_{j=0}^{J-1} \frac{\exp \left(u_{ir_i(j)}(\theta,\beta_i^{(s)})\right)}{\sum\limits_{j\leq \ell\leq J} \exp \left(u_{ir_i(\ell)}(\theta,\beta_i^{(s)})\right)}, $$

where $\boldsymbol{\beta}_i^{(s)}\overset{\mathtt{iid}}{\sim}\mathsf{N}(\widehat{\boldsymbol\beta}_{\mathtt{R}},\widehat{\boldsymbol{\Sigma}})$, $s=1,\ldots,S$.
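The simulated probability can be sketched in base R as follows. For brevity the utility contains only the random-coefficient term $W_{i\ell}^\top\boldsymbol{\beta}_i$; the covariates, mean, and covariance are made-up numbers, and the function names are assumptions rather than package internals.

```r
# Closed-form probability of a ranking r (0-based items, best to worst) given utilities u.
rologit_prob <- function(u, r) {
  v <- u[r + 1]
  m <- length(v)
  p <- 1
  for (j in seq_len(m - 1)) p <- p * exp(v[j]) / sum(exp(v[j:m]))
  p
}

# Monte Carlo average of the ranking probability over S draws beta^(s) ~ N(beta_R, Sigma).
sim_rank_prob <- function(W, r, beta_R, Sigma, S = 1000) {
  K <- length(beta_R)
  L <- t(chol(Sigma))                                     # Sigma = L %*% t(L)
  draws <- beta_R + L %*% matrix(rnorm(K * S), nrow = K)  # K x S, column s is beta^(s)
  mean(apply(draws, 2, function(b) rologit_prob(as.vector(W %*% b), r)))
}

set.seed(1)
W <- matrix(c( 0.2, -0.1,
               1.0,  0.5,
              -0.4,  0.3), nrow = 3, byrow = TRUE)        # 3 items, K = 2 covariates
beta_R <- c(1, -0.5)
Sigma  <- matrix(c(1, 0.3, 0.3, 0.5), nrow = 2)
r      <- c(1, 0, 2)

p_hat <- sim_rank_prob(W, r, beta_R, Sigma)
```

The simulation error of `p_hat` shrinks at rate $1/\sqrt{S}$, so the choice of $S$ trades accuracy against computing time.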

Installation

You can install the development version of rcrologit from GitHub with:

# install.packages("devtools")
devtools::install_github("filippopalomba/rcrologit")

Basic Usage

library(rcrologit)

data <- rcrologit_data

# Standard rank-ordered logit (all coefficients fixed)
dataprep <- dataPrep(data, idVar = "Worker_ID", rankVar = "rank",
                    altVar = "alternative",
                    covsInt.fix = list("Gender"),
                    covs.fix = list("log_Wage"), FE = c("Firm_ID"))
    
rologitEst <- rcrologit(dataprep)

# Random-coefficient rank-ordered logit (heterogeneous coefficient on Gender)
dataprep <- dataPrep(data, idVar = "Worker_ID", rankVar = "rank",
                    altVar = "alternative",
                    covsInt.het = list("Gender"),
                    covs.fix = list("log_Wage"), FE = c("Firm_ID"))
    
rologitEst <- rcrologit(dataprep, stdErr="skip")