main.tex

\documentclass[a4paper,man,natbib]{apa6}

\usepackage[english]{babel}
\usepackage[utf8x]{inputenc}
\usepackage{graphicx}
\usepackage{graphics}
\usepackage[colorinlistoftodos]{todonotes}
\usepackage{amsmath,amsfonts,mathabx}
\usepackage{epigraph}
\usepackage[figuresright]{rotating}
%\usepackage[ansinew]{inputenc}
\usepackage{multirow}
\usepackage{nameref}
\usepackage{hyperref}
\usepackage{makecell}
\usepackage{floatrow}
\floatplacement{figure}{!ht}
\usepackage{float}
\floatstyle{plaintop}
\restylefloat{table}
\usepackage{pdfpages}
%\renewcommand{\cellalign/theadalign}{cl}
% Keywords command

\newcommand{\cmmnt}[1]{\ignorespaces}

\providecommand{\keywords}[1]
{
  \textbf{\textit{Keywords---}} #1
}
\usepackage[doublespacing]{setspace}

\setlength\epigraphwidth{.8\textwidth}
%\setlength{\parskip}{0}

\title{The Discrete Metric in Categorization Under Time Pressure}
\shorttitle{Discrete Metric Under Time Pressure}
\author{Florian I. Seitz}
\affiliation{University of Basel}

\abstract{The generalized context model \citep{nosofsky1986attention}, which excels at explaining human categorization behavior, assumes the Minkowski metric to represent psychological distance between stimuli. In the present pre--registered study, I tested whether people with low cognitive capacities due to time pressure compare stimuli heuristically with the discrete metric which counts the number of non--identical feature values for a given stimulus pair. 
Sixty--one psychology students from the University of Basel took part in a categorization experiment that manipulated time pressure between subjects. Participants first learned without time pressure to categorize eight stimuli under trial--by--trial supervision in the learning phase. Then, participants categorized six novel and four old stimuli without feedback in the test phase. Time pressure was applied to a random half of participants in the test phase.
Inferential tests showed that the generalized context model with the Minkowski metric accounted for participants' categorization behavior both with time pressure and without time pressure, $p$ < .001, respectively. Computational cognitive modeling yielded that, without time pressure, the model with the discrete metric and attention on all features and, with time pressure, a random choice model outperformed the other models both on the aggregate and on the individual level.
The present findings suggest that under time pressure, people do not use the discrete metric, but respond more randomly. These findings imply that people with low cognitive capacities do not simplify psychological distance computation, but rather increase choice inconsistency. 
} 

\keywords{categorization, cognitive capacities, time pressure, Minkowski metric, discrete metric, psychological distance}

\begin{document}
%\setmainfont{Arial} 
%\setsansfont{Arial} 
%\setmonofont{Arial}
%\linespread{1.25}
\includepdf[pages={1-4}]{thesis_begin.pdf}
\pagenumbering{arabic}
\maketitle

%\vspace*{\fill}
\epigraph{There is nothing more basic than categorization to our thought, perception, action, and speech. [...] An understanding of how we categorize is central to any understanding of how we think and how we function, and therefore central to an understanding of what makes us human.}{George Lakoff, 1987, pp.5--6}

Humans frequently engage and excel in categorization---the generation and identification of groups from a bigger amount of objects. Therefore, categorization is considered one of the most fundamental cognitive phenomena overall \citep{ashby2001categorization, bruner1956study, cohen2005bridging, lakoff1987women, goldstone2003concepts}. Past research has shed much light on the cognitive processes during categorization learning and generalization \citep[for an overview over the diverse cognitive models of categorization, see][]{kruschke2008models,wills2013models}. For instance, research 
showed that people with low cognitive capacities due to time pressure base their categorization behavior only on part of the available stimulus information rendering the categorization process thus computationally less extensive \citep{lamberts1995categorization, lamberts1998time, lamberts1999building, lamberts1999categorization, lamberts1997fast}. In a similar vein, research investigated different computationally simple categorization processes such as basing categorization on only one single feature (i.e., a unidimensional categorization process, \citealp{johansen2002there}) or using a central tendency (i.e., a prototype) instead of multiple actual instances to represent a category \citep{smith1998prototypes}. 
% simple categorization processes: \citep{gluck2002people, meeter2006strategies, meeter2008probabilistic, johansen2002there, smith1998prototypes}

While unidimensional categorization and prototype--based categorization reduce the amount of information that enters the categorization process, another, remarkably different way to simplify categorization lies in reducing the complexity of the categorization process itself. This kind of simplification is the focus of the current study: Within the framework of the generalized context model \citep{nosofsky1986attention}, which categorizes an object (i.e., the probe) by comparing it with 
%based on its similarity and ultimately on its distance to 
category instances (i.e., exemplars), I investigate whether people simplify the way in which they compare probe and exemplars. 
More specifically, I examine a heuristic at the level of the psychological distance function which measures how far away probe and exemplars are in a multidimensional space. The heuristic I investigate checks whether probe and exemplars have the same value on a given feature leading thus to binary feature differences which in turn steer categorization. In contrast to unidimensional categorization and prototype--based categorization, all information of the objects enters the categorization process---how this information is processed, however, is heuristic. 

I assume that the heuristic stimulus comparison is computationally simple as it requires people only to check whether two feature values are identical or not. In case of visually presented stimuli this may be done by superposing the two stimuli and readily inferring from the overlap which features have identical values and which not. Determining the magnitude of feature differences between stimuli, however, requires additional processing of the feature values and is thus computationally more extensive. In sum, I assume that checking the identity of feature values which leads to binary feature differences is a computationally less extensive process than actually subtracting feature values which leads to metric feature differences. 
In accordance with these assumptions, the present thesis tests whether people with low cognitive capacities base categorization on binary feature differences and people with high cognitive capacities base categorization on metric feature differences as is traditionally assumed in the generalized context model \citep{nosofsky1986attention}. I test this hypothesis in a categorization task where low cognitive capacities are operationalized with time pressure using an experimental design that optimally discriminates between the heuristic and the computationally more extensive stimulus comparison process while allowing as well to test for unidimensional categorization. 

% From research on decision making it is well known that people use simple strategies (i.e., heuristics) when they have low cognitive capacities such as limited time for decision \citep{gigerenzer1996reasoning, finucane2000affect, kahneman2003perspective, gilovich2002heuristics, simon1957models, simon1955behavioral}. 

\subsection{Theoretical Background: Categorization Under Time Pressure}
Categorization often has to be carried out quickly. Imagine an emergency medical physician who has to quickly make a diagnosis in order to provide her patient with the right treatment or a traveler in the woods who has to decide whether a long, brown, and slightly curved object is a dangerous snake or a harmless branch. Fast categorization is required in both situations and in order to cope with this time pressure research indicates that people categorize based on only a subset of all available stimulus information \citep{lamberts1995categorization}. 

One line of research assumes that people sequentially sample and process the probe's features in order of their salience; in other words, people sequentially construct a representation of the probe \citep{lamberts1995categorization, lamberts2000information, lamberts2002feature}. At any time point, people compare the features of the probe which they have already sampled with the corresponding features of all stored exemplars in memory leading to a momentary prediction of category membership. Features which have not yet been processed do not affect the categorization prediction at this time point. With every further processed feature, people adjust their categorization prediction. The more evidence people sample in favor of one category, the higher the probability that people will end the categorization process and assign the probe to this very category. In case all features of the probe have been sampled, the categorization process ends with certainty and people initiate their response based on the evidence they sampled for the different categories. Within this feature--sampling process time pressure reduces the amount of features a person is able to process before responding, making thus categorization dependent on only a subset of all features. 

Sequential sampling of stimulus features was able account for categorization under time pressure in a series of studies \citep{lamberts1995categorization, lamberts1998time, lamberts1999building, lamberts1999categorization, lamberts1997fast}. In the first study \citep{lamberts1995categorization}, people learned to categorize stimuli with four binary features (i.e., features with each two possible values) into two categories. After learning, participants categorized all 16 possible stimuli with different response deadlines of 600, 1100, or 1600 ms or with no response deadline at all. Response deadlines were assumed to stop the feature--sampling process and thus to shift category assignments. The idea of feature--sampling could well describe the data and especially account for the fact that the response deadlines had differential effects across stimuli. More specifically, the response deadlines strongly affected the categorization of stimuli which people could classify correctly only if they processed a large amount of features. In turn, the response deadlines only weakly altered the categorization of stimuli which people could readily classify correctly on the basis of a small amount of features. The studies of \cite{lamberts1998time} and \cite{lamberts1999categorization} corroborated these findings implementing time pressure with unpredictable response signals instead of fixed, predictable response deadlines as in \cite{lamberts1995categorization}. In a further study, \cite{lamberts1997fast} used stimuli with multivalued features and found that the feature--sampling process can account for categorization behavior in two experiments with fixed time limits of 400 and 700 ms per trial. Finally, \cite{lamberts1999building} tested the feature--sampling process explicitly by letting participants categorize partial stimuli (i.e., stimuli consisting of only a subset of features) without time pressure and the complete stimuli with time pressure. If people sequentially sample stimulus features and the response deadline interrupts this process, then the categorization of the complete stimuli with time pressure should reflect the categorization of the partial stimuli without time pressure. The results were in line with the feature--sampling process showing that the category responses to the partial stimuli could well predict responses to the complete stimuli.

While in the feature--sampling process time pressure reduces the amount of features on which people compare probe and exemplars, the present study investigates whether people simplify the stimulus comparison itself. Specifically, I test a heuristic that checks the identity of feature values and leads to binary feature differences against a more extensive stimulus comparison that subtracts feature values and leads to metric feature differences. The heuristic stimulus comparison cannot be tested readily with the studies mentioned above, as all studies except for one (i.e., \citealp{lamberts1997fast}) used environments with binary features. In such environments, stimuli differ on a given feature by 0 or 1. Computing metric feature differences or binary feature differences leads to the same categorization predictions and the two processes thus cannot be discriminated. In other words, comparing stimuli with binary features cannot be simplified as binary features already have the minimal number of values possible to be informative. The results from the studies described above using binary environments \citep{lamberts1995categorization, lamberts1998time, lamberts1999building, lamberts1999categorization} are thus not applicable to the research question of this thesis, namely, whether people under time pressure simplify the stimulus comparison. In contrast, \cite{lamberts1997fast} used multivalued stimuli---an environment in which the heuristic stimulus comparison I investigate may simplify the categorization process. Testing the heuristic stimulus comparison on the data of \citeauthor{lamberts1997fast}, however, may not suffice due to the experiments' small sample sizes (n = 4 and n = 3 in experiment 1 and 2, respectively). I therefore present in the following an experiment with a larger sample size and with an environment that optimally discriminates between the heuristic and the computationally extensive stimulus comparison. I hypothesize that under time pressure people compare probe and exemplars heuristically by means of binary feature differences, while without time pressure they compare probe and exemplars more thoroughly by means of metric feature differences. 
%The extensive comparison of probe and exemplars is implemented with the Minkowski metric (see below for a formal description) which is, to my knowledge, the only measure of psychological distance used to date in categorization models. The heuristic comparison that checks whether probe and exemplars have identical feature values is formalized by the discrete metric (see also below for a formal description) and might replace the Minkowski metric in categorization under time pressure.

\subsubsection{Summary}
In sum, past research has shown that people sequentially process stimulus features and that time pressure stops this process making thus categorization dependent on only a subset of all available features \citep{lamberts1995categorization, lamberts1998time, lamberts1999building, lamberts1999categorization, lamberts1997fast}. I test an alternative process through which people might cope with time pressure, namely, a heuristic comparison of probe and exemplars. I will now continue by formally describing the framework in which I investigate this heuristic comparison process---namely, the generalized context model \citep{nosofsky1986attention}.

\subsection{Formal Models: The Generalized Context Model With Two Metrics}
One of the most prominent formal models for human categorization is the generalized context model \citep{nosofsky1984choice, nosofsky1986attention, nosofsky2011generalized}, an exemplar--based model, in which people retrieve instances from memory (i.e., exemplars) and compare them with an instance to categorize \citep[i.e., the probe; see also][]{medin1978context}. People are assumed to represent each exemplar as a point in a multidimensional space where the exemplar's feature values are the point's coordinates. Comparisons between probe and exemplars are expressed as distances, which can be interpreted as how far away the probe is spatially positioned from the exemplar of comparison. Distances are transformed into similarities, such that high distances correspond to low similarities and low distances to high similarities. Finally, the aggregate similarity between the probe and the exemplars of one category relative to the aggregate similarity between the probe and all exemplars determines the probabilities with which the probe will be assigned to the different categories. The higher the similarity of the probe to the exemplars of one category the higher the model's prediction of assigning the probe into this very category. 

In the following, I will describe the three computational steps of distance, similarity, and categorization probability which the generalized context model assumes in a multidimensional, multivalued stimulus space with two categories. The formalization of the generalized context model is equivalent to \citeauthor{nosofsky1989further} (\citeyear{nosofsky1989further}, pp. 281--282), except that it has been generalized to more than two features. 

\subsubsection{Categorization probability}
The generalized context model \citep{nosofsky1989further} predicts a probe's category membership by means of the similarity between the probe and the two categories. Specifically, the probability with which probe $i$ is categorized into category $A$ is defined as 
\begin{equation}
P(R_{A}|i) = \frac{b_{A}\sum\limits_{j \in A} s_{ij}}{b_{A}\sum\limits_{j \in A} s_{ij} + (1 - b_{A})\sum\limits_{k \in B} s_{ik}},
\label{eq:probability}
\end{equation}
where $P(R_{A}|i)$ is the probability of assigning probe $i$ to category $A$, $s_{ij}$ is the similarity between probe $i$ and exemplar $j$, and $b_{A}$ is the response bias for category $A$ (with $0 \leq b_{A} \leq 1$). The function assumes probabilistic categorization in accordance with the relative aggregate similarity that is attributable to one of the two categories. The function is often referred to as Luce's choice axiom \citep{luce1959individual} and is derived from the similarity--choice model for stimulus identification \citep{luce1963detection, shepard1957stimulus}. 

\subsubsection{Similarity}
The similarity $s_{ij}$ between probe $i$ and exemplar $j$, which is needed to determine the categorization probabilities, is itself computed from the distance between $i$ and $j$ in a multidimensional space. The transformation of distance into similarity is given by Shepard's universal law of generalization \citep{shepard1987toward}
\begin{equation}
s_{ij} = \exp\left(-c*d_{ij}^p\right),
\label{eq:similarity}
\end{equation}
where $d_{ij}$ is the distance between probe $i$ and exemplar $j$, $c$ (with $0 \leq c$) is an overall sensitivity parameter, and $p$ is a parameter that determines how similarity relates to psychological distance. Well--established versions of the similarity function are the exponential decay function ($p = 1$) for discriminable stimuli and the Gaussian function ($p = 2$) for confusable stimuli \citep{ennis1988confusable, nosofsky1985luce}. The sensitivity parameter $c$ indicates how fast similarity declines with increasing distance and can be understood as a person's sensitivity to psychological distance. For high values of $c$, similarities are already low at small distances and only few, spatially close exemplars influence categorization. For low values of $c$, similarities decline only slowly with increasing distance and numerous exemplars thus influence categorization \citep{nosofsky2011generalized}.

\subsubsection{Distance}
To determine similarity, the generalized context model computes distances between probe and exemplars in a multidimensional space. A vast number of metrics exist that could represent the psychological distance of two objects \citep{deza2009encyclopedia}. The only prerequisites that metrics need to meet are (a) the non--negativity axiom (all distances are greater than or equal to 0), (b) the identity of indiscernibles (distances between identical objects are 0), (c) symmetry (the distance between two objects is independent of the order of the objects), and (d) the triangle inequality (the distance between two objects is at least as small as the distance of these two objects via a third object; \citealp{restle1959metric}). The generalized context model and related models of categorization assume the Minkowski metric which computes metric feature differences \citep{nosofsky1989further}. I propose that under time pressure people use the discrete metric which computes binary feature differences.

\paragraph{Minkowski metric}
The Minkowski metric is formally implemented in the generalized context model as
\begin{equation}
d_{ij} = \left[\sum\limits_{m=1}^M w_{m}*\mid x_{im} - x_{jm}\mid ^r\right]^\frac{1}{r},
\end{equation}
where $d_{ij}$ is the distance between probe $i$ and exemplar $j$, $x_{im}$ is the value of probe $i$ on feature $m$, $w_{m}$ is the attention weight attributed to feature $m$ (with $0 \leq w_{m} \leq 1$ and $\sum w_{m} = 1$),\footnote{Note that the attention weights $w$s are a psychological extension of the Minkowski metric that represents differential allocation of attention across features. In its mathematical definition the Minkowski metric does not include attention weights.} $M$ is the number of features, and $r$ describes the form of the metric (with $r \geq 1$). Well--established versions of the Minkowski metric are the Manhattan metric ($r = 1$) for highly separable features and the Euclidean metric ($r = 2$) for integral features \citep{shepard1964attention, nosofsky1986attention, garner1974processing}. Values of $r$ are constrained to be equal to or higher than 1 for adherence with the triangle inequality \citep{jakel2008similarity,francois2007concentration,tversky1982similarity,beals1968foundations, kress1989linear}. More specifically, for $r < 1$ the Minkowski metric may lead to distances between probe $i$ and exemplar $j$ that are larger than the summed distances between probe $i$ and exemplar $k$ and between exemplar $k$ and exemplar $j$ which violates the triangle inequality. Note that the Minkowski metric computes metric feature differences by subtracting the exemplar's feature values $x_{jm}$ from the probe's feature values $x_{im}$.

\paragraph{Discrete metric}
The discrete metric is formally implemented in the generalized context model as 
\begin{equation}
d_{ij} = \left[\sum\limits_{m=1}^M w_{m}* \rho_{m}(x_{im}, x_{jm}) ^r\right]^\frac{1}{r},
\label{eq:distance}
\end{equation}

where $\rho_{m}(x_{im}, x_{jm})$ is the discrete distance function that checks whether probe $i$ and exemplar $j$ have an identical value on feature $m$. Checking the identity of feature values results in binary feature differences: the feature difference for feature $m$ equals 1, if probe $i$ and exemplar $j$ are not identical on $m$, and equals 0, if $i$ and $j$ are identical on $m$. The discrete distance function is given by

\begin{equation}
\rho_{m}(x_{im}, x_{jm}) = 
\begin{cases}
	1 & x_{im} \neq x_{jm} \\
	0 & else 
\end{cases}.
\end{equation}

All parameters of the generalized context model are preserved in the version with the discrete metric; the only difference to the model version with the Minkowski metric lies in the way how distances between objects are computed.

\subsubsection{Relation to the unidimensional generalized context model}
This section contrasts the heuristic discrete metric with the unidimensional generalized context model, because attending to only a subset of all features is another, prominent way that reduces the complexity of the categorization process under time pressure \citep{lamberts1995categorization, lamberts1998time, lamberts1999building, lamberts1999categorization, lamberts1997fast}. 
%Similarly to the take--the--best heuristic in judgmental research \citep{gigerenzer1999betting}, participants might attend to the one feature which seems most relevant for categorization and classify a probe only based on the value of this feature. 
The generalized context model \citep{nosofsky1989further} partitions attention across features by means of the attention weights $w$s. Whereas in the multidimensional generalized context model attention weights are only constrained to be positive and to sum up to 1, the unidimensional generalized context model sets one of the attention weights to 1 and the remaining to 0. In any environment, there exist as many unidimensional generalized context model versions as the stimuli contain features. 
%An important distinction between the unidimensional and the multidimensional generalized context model is that the unidimensional model version cannot model feature interactions. 

In the present experiment, people had to learn a non--linearly separable category structure which requires people to attend to multiple features in order to correctly categorize the stimuli. In other words, the unidimensional generalized context model cannot learn the category structure of this experiment. As people cannot use the unidimensional generalized context model during learning, I hypothesize that also after learning people will be reluctant to use the unidimensional generalized context model. Thus, I hypothesize that under time pressure people do not simplify categorization by attending to only one feature, but rather by computing psychological distance heuristically with the discrete metric. Still, assuming that people need to simplify their categorization process under time pressure, I hypothesize that people rather use the generalized context model with the unidimensional Minkowski metric than with the multidimensional Minkowski metric. 

In the present study, I also implemented the generalized context model with the unidimensional discrete metric which needs the least computational power, as it uses both a heuristic metric and attends to only one feature. In case this model accounts for categorization behavior this is counted as evidence for the discrete metric as the primary aim of this thesis is to analyze whether people use different metrics depending on the amount of cognitive capacities.

\subsubsection{Relation to prototype--based models}
This section contrasts the generalized context model, which bases categorization on the similarity between probe and exemplars (i.e., actual instances experienced in the past; \citealp{medin1978context, nosofsky1986attention}), with prototype--based models, which base categorization on the similarity between probe and prototypes (i.e., abstract central tendencies; \citealp{posner1968genesis}). 
% Iin comparison to the generalized context model, prototype models cannot depict influences from individual exemplars directly, but only indirectly via prototypes \citep{nosofsky2011generalized, nosofsky1992exemplars, medin1978context}.
% Due to the very fine--grained representation of categories using actual experienced instances, the categorization predictions of the generalized context model are sensitive to influences from individual exemplars---influences which are absent in prototype models \citep{nosofsky2011generalized, nosofsky1992exemplars, medin1978context}. 
%Following a long debate of whether people represent categories using prototypes or exemplars, research indicates that exemplar-based categorization models using a non--linear similarity rule, such as the generalized context model, outperform prototype models in explaining participants' categorization behavior (\citealp{nosofsky1992exemplars}, see also \citealp{scholkopf2002learning}). However, further evidence indicates that people shift from a prototype--based model during early categorization to a computationally more complex exemplar--based model as the number of experienced exemplars increases (\citealp{smith1998prototypes}, but for an alternative view, see \citealp{nosofsky2002exemplar}). 
On the computational level, prototype--based models require people to abstract a central tendency of a category, but, in turn, people do not have to retrieve the category's individual exemplars. Hence, compared to the generalized context model, prototype--based models consider quantitatively little information, that summarizes the categories under consideration. Therefore, research views prototype--based models as being computationally simple \citep{smith1998prototypes} and people with low cognitive capacities might use a prototype--based model to simplify categorization. 

Still, I did not integrate a prototype--based model in the present thesis, as design optimization (see below) revealed that no design simultaneously may recover and discriminate the two metrics (i.e., the Minkowski metric and the discrete metric) and the two types of category representation (i.e., prototypes and exemplars). Furthermore, in the present study, participants learned to categorize only a few stimuli in a non--linearly separable category structure---two design properties for which literature suggests that people use exemplar--based categorization processes \citep{smith1998prototypes, smith2000thirty}. Hence, I assume that people use an exemplar--based model in the categorization task of the present study, namely the well--established generalized context model \citep{nosofsky1986attention}.

\subsection{Hypotheses}
The present pre--registered study investigates whether people compute psychological distance heuristically in categorization with limited cognitive resources. Psychological distance is implemented within the framework of the generalized context model \citep{nosofsky1989further} with two different metrics---the Minkowski metric and the discrete metric. Allowing both model versions of the generalized context model to attend to several features (i.e., multidimensional model versions) or to only one feature (i.e., unidimensional model versions) further enables to examine the amount of stimulus information processed in categorization with low cognitive capacities. In the present study, low cognitive capacities are operationalized with time pressure, as time is a fundamental cognitive resource in decision--making. In the baseline condition without time pressure, I expect people to use the multidimensional Minkowski metric that computes metric feature differences; in the condition with time pressure, I expect people to use the heuristic discrete metric (either unidimensional or multidimensional) that computes binary feature differences. Based on this main hypothesis, I pre--registered the following specific hypotheses which all presuppose the use of the generalized context model \citep{nosofsky1989further}: 

Under time pressure, people use the discrete metric (H1a), else a unidimensional Minkowski metric (H1b), else the multidimensional Minkowski metric (H1c). Across participants, the multidimensional discrete metric and the unidimensional discrete metric outperform the remaining models in predicting participant behavior, followed by the unidimensional Minkowski metric on the second rank, and the multidimensional Minkowski metric on the third rank (H1d). At the individual level, the multidimensional discrete metric and the unidimensional discrete metric describe more participants than any of the remaining models (H1e). Furthermore, at the individual level, the count of participants best described by each model follows the rank order that H1d postulates (H1f). 

Without time pressure, people use the multidimensional Minkowski metric (H2a). Across participants, the multidimensional Minkowski metric outperforms the remaining models in predicting participant behavior (H2b). I have no hypotheses about the relative performance amongst the unidimensional Minkowski metric, the unidimensional discrete metric, and the multidimensional discrete metric. At the individual level, the multidimensional Minkowski metric describes more participants than any of the remaining models and the count of participants best described by each model follows the rank order that H2b postulates (H2c).

Finally, the multidimensional discrete metric describes more participants with time pressure than without time pressure (H3a), whereas the multidimensional Minkowski metric describes more participants without time pressure than with time pressure (H3b). Both with time pressure and without time pressure, any sensible model is expected to outperform a random choice model. 

The hypotheses are shown together with their behavioral predictions in Table \ref{tab:hypotheses}. The pre--registration of the present study is available on the Open Science Framework (OSF) website at \href{https://osf.io/94e6u/}{https://osf.io/94e6u/}.

\begin{sidewaystable}
\begin{center}
\begin{threeparttable}
\caption{Hypotheses and behavioral predictions}
\label{tab:hypotheses}
\begin{tabular*}{\textwidth}{lp{115mm}p{110mm}}
\toprule
\multicolumn{1}{l}{Index} & \multicolumn{1}{l}{Hypothesis} & \multicolumn{1}{l}{Prediction}\\
\midrule
\addlinespace
\multicolumn{1}{l}{\emph{H1}} & \multicolumn{1}{l}{\emph{With time pressure, people use...}} \\
\addlinespace
H1a & MULTI-DISC or UNI-DISC, ... & stimulus 100: class B, the others: class A\\
\addlinespace
H1b & else UNI-MINK, ...  & stimulus 003: class A, the others: class B ($w_1$ = 1) \newline stimulus 003: class B, the others: class A ($w_3$ = 1)\\
\addlinespace
H1c & else MULTI-MINK. & stimulus 100: class A, the others: class B\\
\addlinespace
H1d & rank order of model fits at the aggregate level: & MULTI-DISC = UNI-DISC > UNI-MINK > MULTI-MINK\\
\addlinespace
H1e & MULTI-DISC and UNI-DISC describe the most participants & $N_{DISC} > N_{j}$ for all models $j \neq$ DISC\\
\addlinespace
H1f & rank order of model fits at the individual level: & MULTI-DISC = UNI-DISC > UNI-MINK > MULTI-MINK\\
\midrule
\multicolumn{1}{l}{\emph{H2}} & \multicolumn{1}{l}{\emph{Without time pressure, people use...}} \\
\addlinespace
H2a & MULTI-MINK. & stimulus 100: class A, the others: class B\\
\addlinespace
H2b & rank order of model fits at the aggregate level: & MULTI-MINK > {MULTI-DISC, UNI-DISC, UNI-MINK}\\
\addlinespace
H2c & rank order of model fits at the individual level: & MULTI-MINK > {MULTI-DISC, UNI-DISC, UNI-MINK}\\
\midrule
\multicolumn{1}{l}{\emph{H3}} & \multicolumn{1}{l}{\emph{Across time pressure conditions}} \\
\addlinespace
H3a & more people use MULTI-DISC with than without time pressure & $N_{MULTI-DISC}$: with time pressure > without time pressure\\
\addlinespace
H3b & more people use MULTI-MINK without than with time pressure & $N_{MULTI-MINK}$: with time pressure < without time pressure\\
\bottomrule
\addlinespace
\end{tabular*}
\begin{tablenotes}[para]
\textit{Note.} DISC and MINK denote the discrete metric and the Minkowski metric, respectively. MULTI and UNI denote multidimensional and unidimensional model versions of the generalized context model \citep{nosofsky1989further}, respectively. Any sensible model is expected to outperform a random choice model. Behavioral predictions for H1a, H1b, H1c, and H2a are based on Figure \ref{fig:pred_obs_agg}.
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{sidewaystable}

\section{Method}

\subsection{Optimal Experimental Design}
I designed the experimental task using optimal experimental design \citep{myung2009optimal}. An optimal experimental design is defined as a design with ``the greatest likelihood of differentiating the models under consideration'' \cite[][p. 500]{myung2009optimal}. Being able to discriminate between the discrete metric and the Minkowski metric means that each model version is associated with unique expected behavioral predictions in the experimental design. The models under consideration can thus be discriminated well by the data and can be associated more exclusively with the participants, in the expectation. Furthermore, through the maximization of model prediction differences, design optimization increases model recovery (i.e., the best--fitting model is also the model underlying the data; however, each scientific model is only an approximation of the participant's cognitive model, see \citealp{myung2009optimal}). 
%The advantages of design optimization are thus two--fold: A participant's behavior can be associated more exclusively with one of the models under consideration and, given that the participant used one of the models under consideration, the probability is higher that this best--fitting model is the data--generating model. 
Design optimization maximizes thus the experiment's degree of informativeness and cost-effectiveness while keeping the necessary sample size and trial size at a low level \citep{cavagnaro2009better, raffert2012optimally, atkinson2007optimum, nelson2005finding}. 

The advantages of design optimization are most pronounced when the different models differ quantitatively instead of qualitatively as well as when several variables have to be considered simultaneously during the designing process of the experiment making thus a good design hardly visible to the naked eye \citep{myung2009optimal}. Both criteria apply to the present study.
The potential of optimal experimental design is shown in a reanalysis of the designs of the first two experiments of \cite{smith1998prototypes} which aimed at distinguishing the generalized context model \citep{nosofsky1986attention} from the multiplicative prototype model \citep{smith1998prototypes} by using stimuli with six binary features \citep{myung2009optimal}. While the experimental designs of \citeauthor{smith1998prototypes} had model recovery rates of 72.9\% and of 88.8\%, the optimal design that was possible yielded a model recovery rate of 96.3\%, recovering thus in more than 19 of 20 cases the model underlying the data.  

In light of these insights, I ran simulations to find a categorization environment that, in the expectation, optimally discriminates the discrete metric and the Minkowski metric, both implemented in the generalized context model, given the responses during the test phase.\footnote{In the first simulations I further wanted to discriminate the generalized context model from the multiplicative prototype model. However, as simultaneously discriminating the two models and the two metrics led to insufficient model recovery rates, I dropped the multiplicative prototype model from the remaining simulation analyses.} Finding the best possible categorization environment included determining which stimuli belong to which category in the learning phase and which stimuli are presented in the test phase. The stimuli consisted of three features with four values each resulting in 64 possible stimuli. The learning phase included eight of these 64 stimuli constrained to differed from each other maximally by 1 unit per feature. This constraint ensures that the discrete metric and the Minkowski metric yield identical distances and thus none is favored by the learning phase design. The simulations were chronologically iterated over (a) the design (i.e., all possible category structures with equal category base rates for all subsets of eight of 64 stimuli in the learning phase), (b) the true model (i.e., the generalized context model with either the Minkowski metric or the discrete metric), (c) the true parameter combination (i.e., attention weights $w$s ranging from $0$ to $1$ in steps of $1/3$ and the overall sensitivity parameter $c$ ranging from $0.1$ to $4.1$ in steps of $1$), and (d) the fitting model (i.e., the generalized context model with either the Minkowski metric or the discrete metric).

For a given design, true model, and true parameter combination I simulated binary participant responses (i.e., category A or B) for the learning set which was replicated 20 times and for the test set (i.e., the stimuli that were not part of the learning stimuli) which was replicated 10 times. Simulated participant responses were based on the true model's predictions. Then, the free parameters (i.e., attention weights $w$s and the overall sensitivity parameter $c$) of both models were fit to the simulated data of the learning set (excluding the first eight trials) and fixed to the resulting optimal parameters for each participant. Next, I computed predictions for the test set with both fitted models and calculated the log likelihood of the participants' test phase responses using the predictions of each fitted model. The fitting model with the higher log likelihood was defined as the winning model. This procedure allowed conducting model recoveries for every design across all permissible parameter combinations. The optimal design was the design for which the winning model recovered the true data--simulating model best across the permissible model parameters. In the optimal design, the learning criterion was met by both models after adding a softmax choice rule. From the 56 stimuli of the best design that were not included in the learning set, the six stimuli that discriminated models best were selected. Figure \ref{fig:environment} shows the stimuli in the resulting design.

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_environment.png}
\caption{Learning stimuli (in black and white) including their category membership and test stimuli (in grey). Each axis of the graph represents one feature, and the coordinates of each sphere represent the feature values of the respective stimulus. Stimuli in the learning phase include all learning stimuli, stimuli in the test phase include all test stimuli as well as the learning stimuli 002, 012, 101, and 111.}
\label{fig:environment}
\end{figure}

\subsection{Materials and Design}
The experiment was programmed in Expyriment \citep{krause2014expyriment} based on prior work by \cite{albrechtxxxunstacking} and conducted on Fujitsu computers (i.e., ESPRIMO C720) using Windows 7 with 1920 x 1080 BenQ screens. Participants classified different stimuli into two categories by pressing the left and right arrow keys. The stimuli were based on the material of \cite{albrechtxxxunstacking}. Each stimulus consisted of three features represented by grey beams in which feature values ranging from one to four were represented by colored squares. Figure \ref{fig:material} illustrates the material. Participants learned the category structure of eight stimuli in the learning phase, and categorized four of them and six novel stimuli in the test phase (see Figure \ref{fig:environment}). Table \ref{tab:environment} shows the median model predictions for all stimuli of the learning set and the test set. Category labels, the key--label association, the color--feature association, and the visual mapping of features to positions on the screen were randomized across participants.

\begin{figure}
\centering
\includegraphics{fig_material.PNG}
\caption{Stimuli used in the experiment. Each grey beam illustrates a feature and the number of colored squares represents the feature value.}
\label{fig:material}
\end{figure}

\begin{sidewaystable}
\begin{center}
\begin{threeparttable}
\caption{Stimulus environment, median model predictions, and participant responses}
\label{tab:environment}
\begin{tabular}{ccccccccccc}
\toprule
 &  & \multicolumn{4}{c}{Discrete} & \multicolumn{4}{c}{Minkowski} \\
\cmidrule(r){3-6} \cmidrule(r){7-10}
 &  & \multicolumn{1}{c}{Multidimensional} & \multicolumn{3}{c}{Unidimensional} & \multicolumn{1}{c}{Multidimensional} & \multicolumn{3}{c}{Unidimensional} \\
\cmidrule(r){4-6} \cmidrule(r){8-10}
Stimulus & \multicolumn{1}{c}{Category} &  & \multicolumn{1}{c}{$w$_1 = 1} & \multicolumn{1}{c}{$w$_2 = 1} & \multicolumn{1}{c}{$w$_3 = 1} &  & \multicolumn{1}{c}{$w$_1 = 1} & \multicolumn{1}{c}{$w$_2 = 1} & \multicolumn{1}{c}{$w$_3 = 1} & \multicolumn{1}{p{20mm}}{Responses}\\
\midrule
\addlinespace
\multicolumn{2}{c}{\emph{Learning phase}} \\
\addlinespace
001 & B & 0.97 & 0.74 & 0.50 & 0.74 & 0.97 & 0.74 & 0.50 & 0.74 & 0.85\\
002\makebox[0pt][l]{$^{\ast}$} & A & 0.09 & 0.74 & 0.50 & 0.26 & 0.09 & 0.74 & 0.50 & 0.26 & 0.08\\
011 & B & 0.99 & 0.74 & 0.50 & 0.74 & 0.99 & 0.74 & 0.50 & 0.74 & 1.00\\
012\makebox[0pt][l]{$^{\ast}$} & B & 0.93 & 0.74 & 0.50 & 0.26 & 0.93 & 0.74 & 0.50 & 0.26 & 0.77\\
101\makebox[0pt][l]{$^{\ast}$} & B & 0.91 & 0.26 & 0.50 & 0.74 & 0.91 & 0.26 & 0.50 & 0.74 & 0.85\\
102 & A & 0.03 & 0.26 & 0.50 & 0.26 & 0.03 & 0.26 & 0.50 & 0.26 & 0.16\\
111\makebox[0pt][l]{$^{\ast}$} & A & 0.07 & 0.26 & 0.50 & 0.74 & 0.07 & 0.26 & 0.50 & 0.74 & 0.00\\
112 & A & 0.01 & 0.26 & 0.50 & 0.26 & 0.01 & 0.26 & 0.50 & 0.26 & 0.00\\
\midrule
\addlinespace
\multicolumn{2}{c}{\emph{Test phase}} \\
\addlinespace
003 & - & 0.63 & 0.74 & 0.50 & 0.50 & 0.09 & 0.74 & 0.50 & 0.26 & 0.86\\
100 & - & 0.37 & 0.26 & 0.50 & 0.50 & 0.91 & 0.26 & 0.50 & 0.74 & 0.07\\
221 & - & 0.85 & 0.50 & 0.50 & 0.74 & 0.07 & 0.26 & 0.50 & 0.74 & 0.54\\
231 & - & 0.85 & 0.50 & 0.50 & 0.74 & 0.07 & 0.26 & 0.50 & 0.74 & 0.86\\
321 & - & 0.85 & 0.50 & 0.50 & 0.74 & 0.07 & 0.26 & 0.50 & 0.74 & 0.67\\
331 & - & 0.85 & 0.50 & 0.50 & 0.74 & 0.07 & 0.26 & 0.50 & 0.74 & 0.50\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\textit{Note.} Median model predictions for the stimuli used in the learning phase and the test phase. The four stimuli from the learning phase marked with an asterisk also appeared in the test phase. Participant responses are the mean responses for each participant and stimulus aggregated over participants using the median. Participant responses for the learning phase stimuli were computed using the last 100 trials for each participant.
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{sidewaystable}

Participants' main task was to categorize stimuli into two categories. The dependent variable was the categorization decision. The independent variable was the time pressure induced to every second participant in the test phase. It allowed a participant 400 milliseconds plus 30\% of the median decision time needed by this participant across the final 100 learning trials. The study had thus a repeated--measures 2x1 between--subjects design. Further variables assessed were the response time for each trial and the time limit if any. The stimuli were described to the participants as products and the features as ingredients with the feature value showing the quantity of the respective ingredient. The two categories were operationalized as brands (i.e., brand L and brand R). 

\subsection{Participants}
In total, 61 psychology students from the University of Basel (43 females, $M_{age}$ = 24.13 years, $SD_{age}$ = 6.39 years, age range: 19--50 years), recruited over an online platform of the Faculty of Psychology, completed the experiment between February 2019 and May 2019 in exchange for course credit. In the test phase, every second participant faced time pressure (n = 30); the remaining participants had no time pressure (n = 31).
%All participants were asked about color blindness, visual impairment, and in case of impaired vision whether they carried a visual aid during the experiment. 
% 20.02. - 02.05.
Participants selected themselves into the sample and there were no inclusion criteria. Up to six participants could show up to the same experimental session. Ten additional participants showed up to the experiment, but their data were not used for statistics and data analysis, as they failed to reach the accuracy criterion in the learning phase within an hour (n = 2; see also below) or reported that the task was somewhat or absolutely unclear to them (n = 8). Two further participants were used for pretesting purposes. 
% No participant met the exclusion criterion of exceeding with time pressure the time limit in more than 50\% of the test trials for a given test stimulus or having without time pressure a log transformed reaction time more than three standard deviations below the mean of the log transformed reaction times of the learning phase in more than 50\% of the test trials for a given test stimulus.

Sample size was predetermined by a model--based power simulation \citep{gluth2019importance}. Across participants the introduction of time pressure reduced the true probability of choosing the Minkowski metric from 70\% to 30\% and increased the true probability of choosing the discrete metric from 30\% to 70\%. For every participant, a random combination of parameters that met the accuracy criterion was sampled for the two model versions, respectively, and a mixing probability was sampled from a normal distribution truncated between 0 and 1 with the mean being equal to the share of participants using the discrete metric in the respective time pressure condition and the standard deviation being  0.3. For any given test phase trial the mixing probability indicated how much the predictions of the model with the discrete metric influenced the probabilities underlying simulated participant responses relative to the predictions of the model with the Minkowski metric. The log likelihood of the simulated responses was calculated for both models on the participant level and transformed into Akaike weights \citep{wagenmakers2004aic}. A given model was accepted if its Akaike weight exceeded .95. One--sided two proportion z--tests were calculated for each sample size comparing the relative frequency with which the model with the discrete metric was accepted with time pressure and without time pressure. Figure \ref{fig:power} illustrates the proportion of significant results for every sample size across 1000 iterations. Given an aspired power of 80\%, a total sample size of N = 60 (n = 30 in each condition) was necessary to achieve 89\% power. The final sample included 31 participants without time pressure and 30 participants with time pressure.

\begin{figure}
\centering
\includegraphics{fig_power.png}
\caption{Power in dependence of sample size N. Power was estimated in a model--based power simulation \citep{gluth2019importance} where the true probability of using the Minkowski metric changed from 70\% to 30\% and the true probability of using the discrete metric changed from 30\% to 70\% when time pressure was introduced. To achieve the aspired power of .8, 60 participants (n=30 in each condition) were needed.}
\label{fig:power}
\end{figure}

\subsection{Procedure}
Participants were welcomed to the laboratory of the Center of Economic Psychology, seated within a cubicle, and provided with an informed consent which they could sign in case of agreement. Upon signature, the experimenter started the experiment and the participant was presented a series of instructions on the computer (for the exact instructions, see the Appendix): Participants first read, that they had to learn to assign different products to two different brands (brand L and brand R) and that each product consisted of the same three ingredients, but differed from other products in the specific ingredient quantities. They then saw a randomly chosen product (see Figure \ref{fig:material} for an example), read that each grey beam represented one ingredient and the number of colored squares in the beams indicated the quantity of the respective ingredient. Participants had to explore all possible feature values by clicking on each feature at least ten times. Each click changed the value of the respective feature by one unit. Participants then read that in each trial their task was to correctly guess the brand of a randomly chosen product by pressing the left and right arrow keys. Next, participants read that in the first phase of the experiment they would learn the correct brand of each product by receiving feedback. After consistently assigning the products to the correct brand, they would in the second phase assign again products to brands, but without receiving feedback. Participants in the time pressure condition furthermore read that they would have a time limit for their response in the second phase which they should not exceed.

\subsubsection{Learning phase}
In the learning phase participants learned the category structure of eight products (see Figure \ref{fig:environment}). Each block of the learning phase included all eight products; the sequence within each block was randomized. In every trial participants could assign the product without any time limit to brand L or R by pressing the left and right arrow key, respectively. As feedback, participants were shown a happy, green smiley and the exclamation ''Richtig!'' (English: ''Correct!'') in case of a correct response and a sad, red smiley and the exclamation ''Falsch!'' (English: ''False!'') in case of a incorrect response. The product and the feedback remained visible until the participant proceeded to the next trial by pressing the upper arrow key. 
Invalid key presses led to a reminder, which keys were to press to assign the product to a brand and to continue with the next trial. 
Figure \ref{fig:timeline} illustrates two trials of the learning phase.

After the first 100 learning trials participants received every 50 trials feedback on their accuracy in the last 100 trials. To continue with the test phase, participants needed to correctly classify the last three occurrences of each stimulus and reach 80\% accuracy overall in the last 100 trials. If participants did not meet this accuracy criterion within an hour, the experiment was discontinued and the participants received appropriate course credit.

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_timeline.png}
\caption{Timeline of the learning phase. Participants could look at at a given product for as long as they wished (first and third image) and enter their response via the left and right arrow keys. In case of a correct response, a happy, green smiley as well as ''Richtig!'' (English: ''Correct!'') appeared (second image). In case of an incorrect response, a sad, red smiley as well as ''Falsch!'' (English: ''False!'') appeared (forth image). In both cases, people could continue to look at the feedback for as long as they wished before proceeding to the next trial via the upper arrow key.}
\label{fig:timeline}
\end{figure}

\subsubsection{Test phase}
Participants were reminded that the second phase did not include feedback anymore and in case of time pressure were provided with the lower and upper integer of the exact time they had per trial. Without time pressure, participants faced no response deadlines. Participants first performed 32 practice trials in which the products of the learning phase were presented in four blocks with a randomized sequence per block. After practice, 14 blocks consisting each of six novel and four familiar stimuli were presented resulting in 140 test trials (see Figure \ref{fig:environment} for the exact stimuli used). The sequence within each block was again randomized. The procedure in the test phase was equal to the one in the learning phase except for the following criteria: No feedback was provided, the following trial started 500 ms after response entry, and every second participant had time pressure. If participants in the time pressure condition exceeded the time limit in a given trial they were informed that they were too slow and continued with the next trial after a 500 ms inter--trial interval. After the test phase, participants filled out a questionnaire assessing key demographic variables (i.e., age, gender, education, and profession), vision (i.e., color blindness, impaired and corrected vision), and task--related characteristics (i.e., clearness of task and strategy used in the second phase). Participant then received appropriate course credit which marked the end of the experimental session.

\section{Results}
In the following, the investigated versions of the generalized context model \citep{nosofsky1989further} will be referred to as the multidimensional discrete metric, the multidimensional Minkowski metric, the unidimensional discrete metric, and the unidimensional Minkowski metric in accordance with the distribution of attention across features and the metric used by the specific model version. Figure \ref{fig:pred_obs_agg} illustrates for the six novel stimuli in the test phase the responses of participants separately for each time pressure condition as well as the predictions of the multidimensional discrete metric, the multidimensional Minkowski metric, the unidimensional discrete metric, and the unidimensional Minkowski metric across all parameter combinations reaching the accuracy criterion. Table \ref{tab:environment} further shows for each learning and test stimulus the aggregate participant response as well as median model predictions.

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_pred_obs_agg.png}
\caption{Predicted and observed responses for the novel stimuli of the test phase. Points and errorbars on the left denote the mean response and the standard error across participants for each stimulus separately for each time pressure condition. The four violins on the right denote model predictions for all parameter combinations reaching the accuracy criterion. The shapes within the violins denote the median model predictions. Numbers next to the shapes within the violins denote the feature that receives full attention in the unidimensional models. DISC and MINK denote the discrete metric and the Minkowski metric, respectively. MULTI and UNI denote multidimensional and unidimensional model versions of the generalized context model \citep{nosofsky1989further}, respectively.}
\label{fig:pred_obs_agg}
\end{figure}

For all statistical tests, an alpha level of .05 was used, except when multiple comparison corrections are indicated. Analyses included inferential tests at the aggregate level and cognitive modeling at the aggregate and individual levels. For all analyses the randomized category labels were derandomized and, based on Figure \ref{fig:environment}, category A was coded as 1 and category B as 0.

\subsection{Inferential Tests at the Aggregate Level}
Participants needed about 200 trials to reach the accuracy criterion in the condition with time pressure ($M$ = 256.97, $Md$ = 189, $SD$ = 193.00) and in the condition without time pressure ($M$ = 214.58, $Md$ = 140, $SD$ = 151.86). At the end of the learning phase, participants reached similar accuracy in the conditions with time pressure ($M$ = .85, $Md$ = .84, $SEM$ = .04) and without time pressure ($M$ = .86, $Md$ = .87, $SEM$ = .04). In the 32 practice trials with the stimuli from the learning phase, the onset of time pressure was noticeable: Participants in the condition with time pressure were less accurate than at the end of the learning phase ($M$ = .65, $Md$ = .67, $SEM$ = .11). In contrast, participants in the condition without time pressure further increased their accuracy ($M$ = .96, $Md$ = .97, $SEM$ = .06). Reaction times in milliseconds in the learning phase were similar for the participants with time pressure ($M$ = 1888.51, $Md$ = 1481, $SD$ = 1738.42) and without time pressure ($M$ = 1782.69, $Md$ = 1395.50, $SD$ = 1469.03). In the test phase, time limits in milliseconds for the participants in the condition with time pressure were in general somewhat lower than 1000 ms ($M$ = 901.83, $Md$ = 868, $SD$ = 127.30). Congruent with the onset of time pressure, reaction times in milliseconds in the test phase including the practice trials were lower in the condition with time pressure ($M$ = 602.27, $Md$ = 594, $SD$ = 196.74)\footnote{Note that in the condition with time pressure only trials where the time limit was not exceeded were included.} than in the condition without time pressure ($M$ = 1789.51, $Md$ = 1228, $SD$ = 1801.19). 

A linear mixed model with logit link was fit to the observed responses for the novel stimuli in the test phase using the lme4 R package. As the model did not converge in its pre--registered form, I diminished the number of estimated parameters. Specifically, I aggregated the stimuli 221, 231, 321, and 331 for which the model predictions were similar (see Table \ref{tab:environment}). Furthermore, I fit a random intercept, but no random slope per participant instead of the pre--registered random slope, but no random intercept. In the final model the criterion was the participant response, the fixed effects were time pressure condition, stimulus (with three levels), and the interaction thereof, and a random intercept was fit by participant. Table \ref{tab:estimates} shows the regression coefficients of the final model calculated using sum--to--zero contrasts \citep{singmann2017introduction}. 

If people use the discrete metric with time pressure and the multidimensional Minkowski metric without time pressure, then they should classify the stimuli differently in dependence of the time pressure condition. To investigate this effect, I tested whether the interaction between time pressure and stimulus was necessary using relative AIC weights \citep[][p. 194]{wagenmakers2004aic}. The full model with the interaction term had $5.36 * 10^{30}$ higher AIC weights than a restricted model without the interaction term indicating that the interaction between time pressure and stimulus is necessary. A supplementary comparison with a likelihood ratio test corroborated the superiority of the model with the interaction term, $\chi^{2}(2)$ = 145.51, $p$ < .001. 

\begin{center}
\begin{threeparttable}
\caption{Fixed effects estimates for the linear mixed model with logit link}
\label{tab:estimates}
\begin{tabular*}{\textwidth}{c @{\extracolsep{\fill}} ccccc}
\toprule
\multicolumn{1}{c}{Parameter} & \multicolumn{1}{c}{Estimate} & \multicolumn{1}{c}{$SE$} & \multicolumn{1}{c}{$z$} & \multicolumn{1}{c}{$p$}\\
\midrule
\addlinespace
Intercept & -0.03 & 0.18 & -0.18 & 0.86\\
Time Pressure & -0.04 & 0.18 & -0.21 & 0.83\\
Stimulus 100 & 1.65 & 0.08 & 20.56 & <.001\\
Stimulus 003 & -2.10 & 0.09 & -24.28 & <.001\\
Time Pressure x Stimulus 100 & -0.86 & 0.08 & -10.70 & <.001\\
Time Pressure x Stimulus 003 & 0.93 & 0.09 & 10.81 & <.001\\
\bottomrule
\addlinespace
\end{tabular*}
\begin{tablenotes}[para]
\textit{Note.} Coefficients were calculated using sum--to--zero contrasts \citep{singmann2017introduction}. Hence, the coefficient for the intercept is the mean of all means and the coefficients for the fixed effects are the differences between the mean of all means and the mean of the respective effect.
\end{tablenotes}
\end{threeparttable}
\end{center}
\vspace{\baselineskip}

The following tests refer to the coefficients of the full model. I pre--registered to accept H1a, H1b, H1c, and H2a, respectively, in case the coefficient of a stimulus of interest differed from the coefficients of the remaining novel stimuli of the test phase by the algebraic sign in the correct direction or by a significant one-sided contrast. Tests of contrasts used the Holm--Bonferroni alpha--level correction \citep{holm1979simple}. For an overview of the hypotheses and the behavioral predictions, see Table \ref{tab:hypotheses} and Figure \ref{fig:pred_obs_agg}. 

H1a (people use the discrete metric under time pressure) predicted that stimulus 100 is mostly classified into class B, whereas the remaining stimuli are mostly classified into class A (see Figure \ref{fig:pred_obs_agg}). However, the linear mixed model results show that stimulus 100 was mostly classified into class A with $b_{100}$ = 0.73 > $ b_{221,231,321,331}$ = 0.30 > $b_{003}$ = -1.24 ($b$s are logit least--square means and higher $b$s indicate more category A responses). The difference between $b_{100}$ and the remaining coefficients was reliable in a planned post--hoc contrast, but in the opposite direction as hypothesized, $OR$ = 10.9, $SE$ = 2.99, $z$(Inf) = 8.69, $p$ = 1. H1a was thus rejected. 

H1b (people use a unidimensional Minkowski metric under time pressure) predicted stimulus 003 is classified into another class than the remaining stimuli (see Figure \ref{fig:pred_obs_agg}). If the first feature is attended to, stimulus 003 is mostly classified into class A and the remaining stimuli into class B. If the third feature is attended to, stimulus 003 is mostly classified into class B and the remaining stimuli into class A. The linear mixed model results showed that stimulus 003 was mostly classified into class B and the remaining stimuli into class A. The difference between $b_{003}$ and the remaining coefficients was reliable in a planned post--hoc contrast, $OR$ = 0.03, $SE$ = 0.01, $z$(Inf) = -12.44, $p$ < .001. H1b was thus accepted.

H1c (people use the multidimensional Minkowski metric under time pressure) predicted the reverse of H1a, namely that stimulus 100 is mostly classified into class A and the remaining stimuli into class B (see Figure \ref{fig:pred_obs_agg}). The results of the linear mixed model support these findings and the difference between $b_{100}$ and the remaining coefficients was reliable in a planned post--hoc contrast (see H1a above). H1c was thus accepted.

H2a (people use the multidimensional Minkowski metric without time pressure) predicted that stimulus 100 is rather classified into class A and the remaining stimuli into class B (see Figure \ref{fig:pred_obs_agg}). The results of the linear mixed model were in line with this prediction with $b_{100}$ = 2.52 > $ b_{221,231,321,331}$ = 0.52 > $b_{003}$ = -3.02 (again, higher $b$s indicate more category A responses). The difference between $b_{100}$ and the remaining coefficients was reliable in a planned post--hoc contrast, $OR$ = 1870, $SE$ = 738.90, $z$(Inf) = 19.03, $p$ < .001. H2a was thus accepted.

\subsection{Cognitive Modeling}
To gain deeper insight into the cognitive processes underlying categorization, cognitive modeling was employed. During the modeling all fitting used maximum likelihood. The parameters for the different model versions were as follows: The multidimensional Minkowski metric and the multidimensional discrete metric had both five parameters, namely three attention weights $w$s, the sensitivity parameter $c$ (with $0 \leq c \leq 5$), and the softmax choice-rule parameter temperature $\tau$ (with $0.1 \leq \tau \leq 10$). The unidimensional Minkowski metric and the unidimensional discrete metric had both three parameters (which attention weight $w$ is set to 1, $c$, and $\tau$ as defined before). The decay parameter $p$ and the exponent of the metric $r$ were fixed to 1 for all model versions, as the experiment used discriminable stimuli with separable features \citep{ennis1988confusable, nosofsky1985luce, shepard1964attention, garner1974processing}. The response bias $b_A$ was not included in the analyses, as both categories had equal base rates and, hence, there was no reason that one category is chosen predominantly over the other. 

The parameters of the multidimensional Minkowski metric and the multidimensional discrete metric were fit to individual participants’ learning phase data. After fitting, the parameters of both model versions were fixed to the resulting optimal parameters for each participant. The attention weight parameter of the unidimensional Minkowski metric and the unidimensional discrete metric was fit to individual participants’ test phase data. To avoid over--fitting, the sensitivity parameter $c$ and temperature $\tau$ of the unidimensional models were not fit, but fixed to the corresponding multidimensional model’s parameter estimates for each participant. The distribution of fitted parameters across participants is shown in Figure \ref{fig:par_multidim} for the multidimensional model versions and in Figure \ref{fig:par_unidim} for the unidimensional model versions. 

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_par_multidim.png}
\caption{Distribution of parameter estimates for the multidimensional generalized context model. Grey shaded points within the violins indicate individual parameter estimates, the large point within each violin indicates the median parameter estimate across participants.}
\label{fig:par_multidim}
\end{figure}

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_par_unidim.png}
\caption{Distribution of attention weight estimates for the unidimensional generalized context model. Bars indicate separately for the Minkowski metric and the discrete metric how many people are best described by full attention to the first feature and the third feature, respectively. Full attention to the second feature was not included as this model could not be discriminated from a random choice model.}
\label{fig:par_unidim}
\end{figure}

Participants paid much attention to the second feature and had high sensitivity $c$ and low temperature $\tau$ at the boundary of the valid range of the respective parameters which points towards many corner point solutions. For the unidimensional discrete metric, 32 of the 61 participants were best fit by the model with full attention on the first feature, the remaining 29 participants by full attention on the third feature, $\chi^{2}(1)$ = 0.15, $p$ = .70. For the unidimensional Minkowski metric, 21 of the 61 participants were best fit by the model with full attention on the first feature, the remaining 40 participants by full attention on the third feature, $\chi^{2}(1)$ = 5.92, $p$ = .01.

In the following I compare the multidimensional discrete metric, the multidimensional Minkowski metric, the unidimensional discrete metric, and the unidimensional Minkowski metric. All these versions of the generalized context model were expected to outperform a baseline random choice model. If a unidimensional model attending to the second feature described a participant best, I assumed they followed the random choice model, because in my design these two models could not be distinguished. 

\subsubsection{Model comparison at the aggregate level} \label{sec:res_agg}
For each model, using each participant's optimal parameters, the median log likelihood of the test phase data (i.e., the hold--out data) was computed across participants separately for the condition with and without time pressure. The median log likelihoods were transformed into evidence strengths \citep[Akaike weights,][]{wagenmakers2004aic}, and pairwise comparisons between models were conducted \citep[as in][p. 194]{wagenmakers2004aic}. To determine the rank order of models in predicting behavior across participants, a model was accepted as superior to another model if its evidence ratio\footnote{The evidence ratio is the normalized probability of one model over the other \citep[see][p. 194]{wagenmakers2004aic}.} in the paired comparison exceeded .90, rejected if it was inferior to .10; otherwise, inconclusive evidence resulted. 

I first present the results with time pressure. H1d stated the following rank order of models in predicting behavior across participants: multidimensional discrete metric = unidimensional discrete metric > unidimensional Minkowski metric > multidimensional Minkowski metric > random choice model. The observed results contradict this prediction as the random model outperformed the remaining models. The following rank order (with median log likelihoods in brackets) was observed: random choice model (-93.23) > multidimensional discrete metric (-110.25) > multidimensional Minkowski metric (-120.97) > unidimensional Minkowski metric (-149.03) = unidimensional discrete metric (-150.89). H1d was thus rejected. Table 2 shows the median log likelihood as well as the mean and the standard deviation of the log likelihood, the mean absolute prediction error ($MAPE$), the mean accuracy based on the arg max choice rule, and the mean--square error ($MSE$) for the different models computed on the test phase data. Figure \ref{fig:log_lik} further illustrates the negative log likelihoods of the test phase data for the different models. 

In the condition without time pressure, interestingly, participants were best described by a multidimensional discrete metric. This is contrary to hypothesis H2b which stated that the multidimensional Minkowski metric outperforms the remaining models in predicting participants' behavior on the aggregate level and goes against central categorization literature which assumes a Minkowski metric (\citealp{nosofsky1986attention, nosofsky1989further, nosofsky1994rule, nosofsky1984choice}; however, this finding might be explained by the fact that the discrete metric exploits all information from the learning environment and is thus a ecologically rational strategy during learning \citealp{todd2007environments}, see also the Discussion).
%As the stimuli from the learning phase differed from each other maximally by one value on each feature, both the discrete metric and the Minkowski metric lead to equal distances given a set of attention weights. In this case, the use of the discrete metric rather than the Minkowski metric can be seen as ecologically rational, as it exploits all information from the environment equally well as the Minkowski metric, but is more computationally more frugal \citep{todd2007environments}. 
The following rank order of models (with median log likelihoods in brackets) was observed: multidimensional discrete metric (-79.60) > multidimensional Minkowski metric (-94.25) > random choice model (-97.04) > unidimensional Minkowski metric (-132.58) > unidimensional discrete metric (-151.16). H2b was thus rejected. For additional fit indices, see Table \ref{tab:fitmeasures}, and for an illustration of the negative log likelihoods, see Figure \ref{fig:log_lik}.

\begin{sidewaystable}
\begin{center}
\begin{threeparttable}
\caption{Descriptive model fit measures}
\label{tab:fitmeasures}
\begin{tabular*}{\textwidth}{ll@{\extracolsep{\fill}}ccc@{\extracolsep{\fill}}c@{\extracolsep{\fill}}c@{\extracolsep{\fill}}c@{\extracolsep{\fill}}ccc@{\extracolsep{\fill}}c@{\extracolsep{\fill}}c@{\extracolsep{\fill}}c@{\extracolsep{\fill}}}
\toprule
 &  & \multicolumn{6}{c}{Time pressure} & \multicolumn{4}{c}{No time pressure}\\
\cmidrule(r){3-8} \cmidrule(r){9-14}
 &  & \multicolumn{3}{c}{Log likelihood} & & & & \multicolumn{3}{c}{Log likelihood} & & & \\
\cmidrule(r){3-5} \cmidrule(r){9-11}
Attention & \multicolumn{1}{l}{Metric} & \multicolumn{1}{c}{$M$} & \multicolumn{1}{c}{$Md$} & \multicolumn{1}{c}{$SD$} & \multicolumn{1}{c}{$MAPE$} & \multicolumn{1}{c}{Arg max} & \multicolumn{1}{c}{$MSE$} & \multicolumn{1}{c}{$M$} & \multicolumn{1}{c}{$Md$} & \multicolumn{1}{c}{$SD$} & \multicolumn{1}{c}{$MAPE$} & \multicolumn{1}{c}{Arg max} & \multicolumn{1}{c}{$MSE$}\\
\midrule
\addlinespace
\multicolumn{2}{l}{\emph{Generalized Context Model}} \\
\addlinespace
\multirow{2}{*}{Multidim} & Discrete & -115.53 & -110.25 & 28.52 & 0.47 & 0.51 & 0.31 & -84.58 & -79.60 & 29.85 & 0.38 & 0.63 & 0.21\\
\cmidrule(r){2-14}
 & Minkowski & -137.07 & -120.97 & 60.07 & 0.47 & 0.53 & 0.34 & -106.25 & -94.25 & 54.72 & 0.36 & 0.69 & 0.24\\
\cmidrule(r){1-14}
\multirow{2}{*}{Unidim} & Discrete & -160.29 & -150.89 & 66.27 & 0.42 & 0.51 & 0.32 & -181.95 & -151.16 & 83.10 & 0.46 & 0.52 & 0.34\\
\cmidrule(r){2-14}
 & Minkowski & -163.57 & -149.03 & 83.47 & 0.34 & 0.69 & 0.29 & -163.04 & -132.58 & 86.10 & 0.34 & 0.68 & 0.29\\
\cmidrule(r){1-14}
\multicolumn{2}{l}{\emph{Random Choice Model}} & -89.79 & -93.23 & 7.76 & 0.50 & 0.00 & 0.25 & -97.04 & -97.04 & 0.00 & 0.50 & 0.00 & 0.25\\
\bottomrule
\addlinespace
\end{tabular*}
\begin{tablenotes}[para]
\textit{Note.} Unidim and multidim refer to unidimensional models (i.e., attention on one feature) and the multidimensional model (i.e., attention on several features), respectively. Fit measures are the following: $M$ = mean log likelihood, $Md$ = median log likelihood, $SD$ = standard deviation of the log likelihood, $MAPE$ = mean absolute percentage error, Argmax = mean accuracy based on the argmax choice rule, $MSE$ = mean--square error.
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{sidewaystable}

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_log_lik.png}
\caption{Negative log likelihood of the test phase data given model predictions for both time pressure conditions. Lower values indicate a higher probability that the respective model underlies the test phase data. Points illustrate the individual negative log likelihoods. DISC and MINK denote the discrete metric and the Minkowski metric, respectively. MULTI and UNI denote multidimensional and unidimensional model versions, respectively. RANDOM = random choice model.}
\label{fig:log_lik}
\end{figure}

\subsubsection{Model comparison at the individual level} \label{sec:res_ind}
While the aggregate analyses above test the performance of models across participants, additional insights can be gained from individual analyses, as different people may use different cognitive models. I will thus analyze how many individual participants each model can best describe and ultimately test through model comparisons which model is able to describe most participants. 
% In order to see how well the models describe the behavior of the individual participants and thus ultimately to shed light on the cognitive processes in categorization of individuals, I conducted model comparisons at the individual level. 
The following procedure was used: For each model and participant, using the respective participant's optimal parameters, the log likelihood of the test phase data (i.e., the hold--out data) was computed. Individual log likelihoods were transformed into Akaike weights \citep{wagenmakers2004aic} and individual strategy classification was conducted on these Akaike weights. Specifically, if any model’s Akaike weight exceeded .90 the person was assigned to that model; else the person was classified as not described by any model. Figure \ref{fig:model_selection} illustrates how many participants each model was able to describe. Figure \ref{fig:aic} further shows for each participant the Akaike weights of the different models.

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_model_selection.png}
\caption{Number of participants described by the different models under consideration separately for both time pressure conditions. DISC and MINK denote the discrete metric and the Minkowski metric, respectively. MULTI and UNI denote multidimensional and unidimensional model versions, respectively. RANDOM = random choice model.}
\label{fig:model_selection}
\end{figure}

\begin{figure}
\centering
\includegraphics[width = \textwidth]{fig_aic.png}
\caption{Akaike weights of the models for each participant. Each bar represents one participant. The Akaike weights of the different models are stacked upon each other per participant and always sum up to 1. DISC and MINK denote the discrete metric and the Minkowski metric, respectively. MULTI and UNI denote multidimensional and unidimensional model versions, respectively. RANDOM = random choice model.}
\label{fig:aic}
\end{figure}

With time pressure the majority of the 30 participants were described by the random choice model (n = 17, 56.66\%). In contrast, only few participants were described by the unidimensional Minkowski metric (n = 3, 10\%), the multidimensional discrete metric (n = 3, 10\%), and the multidimensional Minkowski metric (n = 2, 6.67\%), and no participant was described by the unidimensional discrete metric. The remaining participants could not be described by any model (n = 5, 16.67\%). The number of described participants varied across models, $\chi^{2}(4)$ = 37.2, $p$ < .001. However, contrary to H1e, which stated that a cognitive model using the discrete metric describes more people than any other model, only few participants were best described by the multidimensional discrete metric or the unidimensional discrete metric (n = 3). The random choice model (n = 17) described more than three times as many people than any model using the discrete metric, $OR$ = 11.77, $SE$ = 0.71, 95\% CI = [2.92, 47.46], Holm--Bonferroni corrected $p$ = .99. 

The rank order of models at the individual level was established in the same way as described for the aggregate level above, namely by means of evidence ratios \citep[see also][p. 194]{wagenmakers2004aic}. For the condition with time pressure, H1f stated the following rank order of models in predicting individual participant behavior: multidimensional discrete metric = unidimensional discrete metric > unidimensional Minkowski metric > multidimensional Minkowski metric > random choice model. Contrary to this prediction, the random choice model outperformed the remaining models. More specifically, the following rank order was observed (with the number of described participants in brackets): random choice model (n = 17) > unidimensional Minkowski metric (n = 3) > multidimensional discrete metric (n = 3) > unidimensional discrete metric (n = 0) > multidimensional Minkowski metric (n = 2). Due to these results, both H1e and H1f were rejected.

Without time pressure almost half of the 31 participants were described by the multidimensional discrete metric (n = 15, 48.39\%) and about a third by the multidimensional Minkowski metric (n = 9, 29.03\%). Only few participants were described by the unidimensional Minkowski metric (n = 3, 9.68\%), by the random choice model (n = 2, 6.45\%), and no participant was described by the unidimensional discrete metric. The remaining participants could not be described by any model (n = 2, 6.45\%). Again, the number of described participants varied across models, $\chi^{2}(4)$ = 26.0, $p$ < .001. Interestingly and contrary to H2c, which held that the multidimensional Minkowski metric describes more people than any other model, the multidimensional discrete metric described 6 people more than the multidimensional Minkowski metric, $OR$ = 2.29, $SE$ = 0.53, 95\% CI = [0.80, 6.53], Holm--Bonferroni corrected $p$ = 0.62. 

Also the rank order of models at the individual level yielded that the multidimensional discrete metric excelled without time pressure. This is in line with the findings from the aggregate model comparisons above and goes again against previous, influential categorization literature \citep{nosofsky1986attention, nosofsky1989further, smith1998prototypes, nosofsky1994rule, nosofsky1984choice}. This finding contrasts also with H2c which stated that the multidimensional Minkowski metric predicts the behavior of more individuals than any other model. Rather, in the data the following rank order was observed (with the number of described participants in brackets): multidimensional discrete metric (n = 15) > multidimensional Minkowski metric (n = 9) > unidimensional Minkowski metric (n = 3) > random choice model (n = 2) > unidimensional discrete metric (n = 0). Due to these results, H2c was rejected.

Finally, I investigated whether some models describe more participants in one of the two experimental conditions. To this end, I compared how many participants the different models described between the conditions with time pressure and without time pressure. Indeed, models differed in the number of participants they described across the experimental conditions as showed a Fisher's Exact Test for Count Data, $p$ < .001. With time pressure the multidimensional discrete metric described only 3 of 30 people (10\%), but without time pressure it described 15 of 31 people (48.39\%). This goes against H3a, which states that more people are described by the multidimensional discrete metric in the condition with time pressure than without time pressure. In a pairwise post hoc comparison, these proportions revealed to differ robustly, but in the opposite direction as hypothesized, $OR$ = 0.12, $SE$ = 0.71, 95\% CI = [0.03, 0.47], Holm-Bonferroni-corrected $p$ = 1. H3a was thus rejected.

Furthermore, the multidimensional Minkowski metric described only 2 of 30 people (6.67\%) with time pressure, but 9 of 31 people (29.03\%) without time pressure. This difference in proportions is in line with H3b, which states that more people are described by the multidimensional Minkowski metric in the condition without time pressure than with time pressure. However, the proportions did not differ robustly in a pairwise post hoc comparison, $OR$ = 5.73, $SE$ = 0.83, 95\% CI = [1.12, 29.25], Holm-Bonferroni-corrected $p$ = .08. H3b was thus rejected.

A pairwise comparison between the multidimensional discrete metric and the multidimensional Minkowski metric for both time pressure conditions showed that the multidimensional discrete metric and the multidimensional Minkowski metric described more participants without time pressure (n = 15 and n = 9, respectively) than with time pressure (n = 3 and n = 2, respectively). Which of the two models described a participant was thus not correlated with the time pressure condition, OR = 1.11, 95\% CI = [0.08, 11.75], $p$ = 1.

\subsection{Summary}
The inferential tests on the aggregate level suggested that people use the Minkowski metric independent of whether they face time pressure or not, which is partially in line with the hypotheses. As hypothesized, without time pressure participants' choices were in line with the multidimensional Minkowski metric, which is the cognitively more complex metric than the discrete metric. However, with time pressure the test phase data were in line with the unidimensional Minkowski metric with attention on the third feature and with the multidimensional Minkowski metric, whereas I hypothesized that people use the discrete metric. These results suggest that people under time pressure do not use a different metric than people without time pressure, but possibly reduce the number of features they attend to. This is in line with the feature--sampling process discussed in the introduction, which assumes that time pressure limits the number of features that a person is able to sample and process and that can be used for categorization \citep{lamberts1995categorization}.  
%the unidimensional Minkowski metric with attention on the third feature fulfills both the acceptance criteria of inverse algebraic signs and a significant contrast, and the multidimensional version fulfills the acceptance criterion of a significant contrast. 

In contrast with the inferential tests and with my hypotheses, cognitive modeling yielded that the multidimensional discrete metric outperformed the remaining models without time pressure and a random choice model was best able to describe participants with time pressure. Due to the binary environment during learning (i.e., all stimuli of the learning phase differed by maximally 1 on each feature) the multidimensional discrete metric and the multidimensional Minkowski metric process all information that is available in the environment. Psychological distances thus did not differ for the discrete metric and the Minkowski metric, given a set of attention weights. Because both metrics do not differ in predictions, participants were ecologically rational if they used the heuristic multidimensional discrete metric rather than the cognitively more complex multidimensional Minkowski metric during learning and they might have continued using this model in the test phase if no time pressure was induced \citep{todd2007environments}. In contrast, if time pressure was induced, the random choice model was best able to describe participants. This suggests that time pressure increases choice inconsistency which is in line with literature on the effect of diminished cognitive capacities on preferential choice \citep[][see also the Discussion]{olschewski2018taxing, burks2009cognitive}. 

\subsection{Explorative Analyses}
In addition to the inferential tests and the cognitive modeling, I conducted a set of pre--registered explorative analyses. These analyses included on the one hand a linear rule--based model to analyze whether a linear rule can describe the data. On the other hand, I implemented a superordinate version of the discrete metric, namely the discrete--threshold metric, which investigates for every participant by how many units two feature values must differ to be also perceived as different.

\subsubsection{Linear rule--based model}
In contrast to exemplar--based models, such as the generalized context model \citep{nosofsky1986attention}, different theoretical approaches predict category membership based on a set of rules. Literature suggests that people often use rules in categorization \citep{restle1962selection, tom1968attention, rouder2006comparing} and theory sharply differs between rule--based and exemplar--based categorization (\citealp{rouder2006comparing}; although models that join exemplar--based and rule--based approaches have been proposed such as ATRIUM, \citealp{erickson1998rules}, and COVIS, \citealp{ashby2011covis}). 

In the present study, members of category A have predominantly a value of 0 on the first feature and a value of 1 on the third feature; reversely, members of category B have predominantly a value of 1 on the first feature and a value of 2 on the third feature (see Figure \ref{fig:environment}). Consequently, a linear rule where people categorize stimuli with lower values on the first or third feature rather into category A and stimuli with higher values on the first or third feature rather into category B may be an appropriate categorization strategy. To test such a strategy, I fit a linear regression model with logit link, participant response as criterion, and features as predictors to the test phase data at the participant level using sum--to--zero contrasts \citep{singmann2017introduction}. In accordance with the pre--registration the complete test phase data was fit and predicted by the regression model yielding a large data set to fit but also risking over--fitting. Prediction used the individual participant's estimated regression coefficients. Across participants, the mean estimated regression coefficients for the three features (i.e., $b_1$. $b_2$, $b_3$) and the intercept (i.e., $b_0$) were $b_1$ = 0.71 ($SD$ = 10.94), $b_2$ = 0.06 ($SD$ = 2.82), $b_3$ = -1.40 ($SD$ = 2.53), and $b_0$ = 0.56 ($SD$ = 11.74). The mean $R_{McFadden}^2$ aggregated over participants was .27 (SD = .20). 
The coefficient for the second feature was close to 0, which is in line with the fact, that both categories are equally likely for any value on the second feature in the learning environment (see Figure \ref{fig:environment}). However, the mean coefficient is positive for the first feature, but negative for the third feature, indicating that people classify a stimulus with high values on the first feature rather into category A and a stimulus with high values on the third feature rather into category B. Yet, participants learned in the learning phase that high values on the first feature or on the third feature are both associated rather with category B (see Figure \ref{fig:environment}) which thus contrasts the results from the linear regression model.

To analyze how many participants the linear rule--based model described compared to the other models, I used the model classification procedure by Akaike weights detailed in the individual model comparison above. The results show that in the condition with time pressure, all 30 participants were described by the rule--based model. In the condition without time pressure, a substantial amount of the 31 participants could be described by the linear rule--based model (n = 11, 35.48\%) and by the multidimensional discrete metric (n = 10, 32.26\%). Another few participants were described by the multidimensional Minkowski metric (n = 5, 16.13\%) or could not be described by any of the models (n = 5, 16.13\%). 
The superiority of the linear rule--based model is also reflected in the rank order of models in describing individual participants' behavior that was computed using evidence ratios (for a description of how rank orders were established, see the aggregate analyses above and \citealp{wagenmakers2004aic}). With time pressure the following rank order of models was observed (with the number of described participants in brackets): linear rule--based model (n = 30) > random choice model (n = 0) > unidimensional discrete metric (n = 0) > unidimensional Minkowski metric (n = 0) = multidimensional Minkowski metric (n = 0) = multidimensional discrete metric (n = 0). Without time pressure the following rank order of models was observed: linear rule--based model (n = 11) > multidimensional discrete metric (n = 10) > multidimensional Minkowski metric (n = 5) > random choice model (n = 0) = unidimensional Minkowski metric (n = 0) = unidimensional discrete metric (n = 0).

Across both time pressure conditions, the linear rule--based model was thus superior to the remaining models in describing individual participants---however, the fact that the same data were used for fitting and predicting potentially overestimates the predictive capacity of the linear rule--based model. To test this possibility, I ran another, not pre--registered analysis: I fit the linear rule--based model to the last 100 trials of the learning phase at the participant level using sum--to--zero contrasts and predicted the test phase data (i.e., the hold--out sample). Across participants, the mean estimated regression coefficients for the three features (i.e., $b_1$. $b_2$, $b_3$) and the intercept (i.e., $b_0$) were $b_1$ = -3.04 ($SD$ = 4.01), $b_2$ = -0.12 ($SD$ = 0.71), $b_3$ = -3.06 ($SD$ = 4.17), and $b_0$ = 6.03 ($SD$ = 8.25). The mean $R_{McFadden}^2$ for the training set aggregated over participants was .28 ($SD$ = .12). The mean $R^2$ for the hold--out sample aggregated over participants was .10 ($SD$ = .11).
Again, the coefficient for the second feature was close to 0. More importantly, the coefficients for both the first and the third feature were now negative, indicating that people classify a stimulus with high values on the first or the third feature rather into category B, which is in line with the learning environment (see Figure \ref{fig:environment}).

When the linear rule--based model was fit to the end of the learning phase, but predicted the test phase data, it described only few participants best: With time pressure, a substantial amount of the 30 participants were described by the random choice model (n = 12, 40\%), the remaining participants were described by the linear--rule based model (n = 4, 13.33\%), by the unidimensional Minkowski metric (n = 3, 10\%), by the multidimensional discrete metric (n = 2, 6.67\%), by the multidimensional Minkowski metric (n = 1, 3.33\%), or were not described by any model (n = 8, 26.67\%). The following rank order of models in describing individual participants' behavior was observed in the condition with time pressure using evidence ratios (the number of described participants are in brackets): random choice model (n = 12) > linear rule--based model (n = 4) > unidimensional Minkowski metric (n = 3) > unidimensional discrete metric (n = 0) > multidimensional discrete metric (n = 2) > multidimensional Minkowski metric (n = 1). 

Without time pressure, the linear rule--based model was not able to describe any of the 31 participants anymore. Instead, about half of participants were described by the multidimensional discrete metric (n = 15, 48.39\%), about a third by the multidimensional Minkowski metric (n = 9, 29.03\%), and few were described by the unidimensional Minkowski metric (n = 2, 6.45\%), by the random choice model (n = 2, 6.45\%), or were not described by any model (n = 3, 9.68\%). The following rank order of models in describing individual participants' behavior was observed in the condition without time pressure (with the number of described participants in brackets): multidimensional discrete metric (n = 15) > multidimensional Minkowski metric (n = 9) > unidimensional Minkowski metric (n = 2) > linear rule--based model (n = 0) > random choice model (n = 2) > unidimensional discrete metric (n = 0). 

In sum, when conducting predictive model comparison on the test phase data, the linear rule--based model could not account for individual participants' behavior and was outperformed by the random choice model in the condition with time pressure and by both the multidimensional discrete metric and the multidimensional Minkowski metric in the condition without time pressure.

%The results show that in the condition with time pressure, almost all 30 participants were described by the rule--based model (n = 28). Only 1 participant was described by the unidimensional Minkowski model, and 1 participant could not be described by any of the models. In the condition without time pressure, a substantial amount of the 31 participants could be described by the linear model (n = 10), by the multidimensional discrete model (n = 8), and by the multidimensional Minkowski model (n = 7). The remaining 6 participants could not be described by any of the models. Across both time pressure conditions, the linear rule--based model was thus the model that could describe the highest number of participants---however, the fact that the same data were used for fitting and predicting potentially overestimates the predictive capacity of the linear regression model. 

\subsubsection{Discrete--threshold metric}
The high number of participants with time pressure that were best described by the random choice model indicates that people categorize differently with time pressure, but not in line with the discrete metric. However, the assumption of the discrete metric that all non--identical feature values are also perceived as being different might be too deterministic. Rather, it might be that some people under time pressure still perceive feature values as identical, if they differ only by a small amount. Only if feature values differ substantially, people would also perceive them as different. 
% vary with respect to the difference two feature values need to have in order to be perceived as being not identical. 

To test this idea, a superordinate version of the discrete metric (i.e., the discrete--threshold metric) was implemented. According to the discrete--threshold metric two feature values are identical if they differ by maximally $\gamma$ and different otherwise, with the threshold $\gamma$ being a free parameter of the model. In this experiment with four--valued features, $\gamma$ could take the values 0, 1, and 2. If $\gamma$ equals 0, the discrete-threshold metric corresponds to the discrete metric above. Formally, the discrete--threshold metric is defined as

\begin{equation}
\rho_{m}(x_{im}, x_{jm}) = 
\begin{cases}
	1 & \mid x_{im} - x_{jm} \mid > \gamma \\
	0 & else 
\end{cases},
\end{equation}

where $\rho_{m}(x_{im}, x_{jm})$ is the distance function that checks whether probe $i$ and exemplar $j$ differ by maximally $\gamma$ on dimension $m$. The reminder of the model is given by Equations \ref{eq:probability}, \ref{eq:similarity}, and \ref{eq:distance}. 

Within the generalized context model \citep{nosofsky1989further} the discrete--threshold metric was fit to the test phase data on the participant level. Learning phase data could not be fit, as the stimuli during learning diverged on each feature my maximally 1 rendering the estimation of $\gamma$ impossible. Optimal parameters for each participant were used to compute model predictions and log likelihoods of the same test phase data which was used during fitting. In the condition with time pressure, the discrete--threshold metric fit the majority of the 30 participants best with $\gamma$ = 0 (which corresponds to the discrete metric; n = 20, 66.67\%) and few participants with $\gamma$ = 1 (n = 6, 20\%) and $\gamma$ = 2 (n = 4, 13.33\%). In the condition without time pressure, the discrete--threshold metric fit even a higher proportion of the 31 participants best with $\gamma$ = 0 (n = 24, 77.42\%) and only few participants with $\gamma$ = 1 (n = 6, 19.35\%) and $\gamma$ = 2 (n = 1, 3.26\%). Relative to the condition without time pressure, more people were best fit by a $\gamma$ > 0 in the condition with time pressure suggesting that under time pressure the feature values of two stimuli must differ by a greater amount in order to be perceived as different. However, the differences between the time pressure conditions were not robust in a Fisher's Exact Test for Count Data, $p$ = .37. In order to reduce the bias caused by fitting to and predicting the same data set, the model comparison analysis retook for all 44 participants best fit by the discrete--threshold metric with $\gamma$ = 0 the log likelihood from the equivalent model with the discrete metric that was fit to the learning phase. As the discrete metric is nested in the discrete--threshold metric, model comparison did not include the multidimensional discrete metric nor the unidimensional discrete metric, but the multidimensional discrete--threshold metric and the unidimensional discrete--threshold metric. 

I first present the results on the aggregate level using evidence ratios (see above and \citealp{wagenmakers2004aic}). In the condition with time pressure, the multidimensional discrete--threshold model scored on the second rank ($M(LL)$ = -101.78, $Md(LL)$ = -96.12, $SD(LL)$ = 28.54, $MAPE$ = 0.45, argmax = 0.46, $MSE$ = 0.27) and was still outperformed by the random choice model. The unidimensional discrete--threshold model scored on the third rank ($M(LL)$ = -124.27, $Md(LL)$ = -104.16, $SD(LL)$ = 52.33, $MAPE$ = 0.42, argmax = 0.45, $MSE$ = 0.27). For the model fit coefficients of the remaining models, see Table \ref{tab:fitmeasures}. The following rank order of models in describing the aggregate data was observed with time pressure (with median log likelihoods in brackets): random choice model (-93.23) > multidimensional discrete--threshold metric (-96.12) > unidimensional discrete--threshold metric (-104.16) > multidimensional Minkowski metric (-120.97) > unidimensional Minkowski metric (-149.03).

Without time pressure, the multidimensional discrete--threshold metric outperformed the remaining models ($M(LL)$ = -79.09, $Md(LL)$ = -77.14, $SD(LL)$ = 25.50, $MAPE$ = 0.37, argmax = 0.63, $MSE$ = 0.19). The unidimensional discrete--threshold metric scored on the penultimate rank ($M(LL)$ = -149.15, $Md(LL)$ = -124.07, $SD(LL)$ = 79.80, $MAPE$ = 0.43, argmax = 0.54, $MSE$ = 0.30). For the model fit coefficients of the remaining models, see Table \ref{tab:fitmeasures}. The following rank order of models in describing the aggregate data was observed without time pressure (with median log likelihoods in brackets): multidimensional discrete--threshold metric (-77.14) > multidimensional Minkowski metric (-94.25) > random choice model (-97.04) > unidimensional discrete--threshold metric (-124.07) > unidimensional Minkowski metric (-132.58).

I now present the results on the individual level, using the procedure with Akaike weights detailed above. In the condition with time pressure, about a third of the 30 participants were best described by the random choice model (n = 11, 36.67\%). A few participants were described by the multidimensional discrete--threshold metric (n = 4, 13.33\%), by the unidimensional Minkowski metric (n = 3, 10\%), and some could not be described by any of the models (n = 12, 40\%). The following rank order of models in describing individual participants' behavior was established in the condition with time pressure using evidence ratios (see above and \citealp{wagenmakers2004aic}): random choice model (n = 11) > multidimensional discrete--threshold metric (n = 4) > unidimensional discrete--threshold metric (n = 0) > unidimensional Minkowski metric (n = 3) > multidimensional Minkowski metric (n = 0). 

Without time pressure, almost half of the 31 participants were best described by the multidimensional discrete--threshold metric (n = 14, 45.16\%) and a third by the multidimensional Minkowski metric (n = 10, 32.26\%). The remaining participants were described by the unidimensional Minkowski metric (n = 4, 12.90\%), by the random choice model (n = 2, 6.45\%), or could not be described by any of the models (n = 1, 3.26\%). The following rank order of models in describing individual participants' behavior was observed in the condition without time pressure: multidimensional discrete--threshold metric (n = 14) > multidimensional Minkowski metric (n = 10) > unidimensional Minkowski metric (n = 4) > random choice model (n = 2) > unidimensional discrete--threshold metric (n = 0).

\subsubsection{Summary}
To contrast the exemplar--based generalized context model with a rule--based model, as is often done in categorization research, \citep{restle1962selection, tom1968attention, rouder2006comparing}, I implemented a linear regression model with logit link, where the features were the predictors and the participant response was the criterion. 
%Mean coefficients across participants showed that higher values on the first feature were associated with more category A responses, while higher values on the third feature were associated with more category B responses. This is at odds with the learning environment, where stimuli with higher values on the first feature or the third feature were both rather associated with category B. 
If, as pre--registered, the test phase data were used for fitting and prediction, the linear rule--based model excelled especially at describing participants with time pressure, but also outperformed the remaining models without time pressure. In turn, if in a exploratory way the final 100 trials of the learning phase were used for fitting and the test phase data were predicted, the linear rule--based model was outperformed by the random choice model in the condition with time pressure and by the multidimensional discrete metric and the multidimensional Minkowski metric in the condition without time pressure. Given fair evaluation standards that minimize overfitting the linear rule--based model is thus inferior to some of the competing models in explaining participant behavior.

To test whether some participants under time pressure perceive feature values with small differences still as identical, I implemented the discrete--threshold metric, a superordinate version of the discrete metric. Parameter estimates for $\gamma$, however, indicate that most participants already perceive feature values that differ by 1 as not identical (n = 20 of 30 participants with time pressure and n = 24 of 31 participants without time pressure). In this case, the discrete-threshold metric corresponds with the discrete metric. Similar to the discrete metric, the discrete--threshold metric excelled in the condition without time pressure, but was outperformed by the random choice model in the condition with time pressure. 
%Similar to the discrete metric, the discrete--threshold metric described the most participants without time pressure (n = 14 of 31), but was outperformed by the random choice model in the condition without time pressure (discrete--threshold metric: n = 9; random choice model: n = 11). 
Hence, the results with the discrete--threshold metric did not qualitatively differ from the results with the discrete metric.

\section{Discussion}
The present thesis examined in a categorization task whether people, given they use the generalized context model \citep{nosofsky1986attention}, apply the heuristic discrete metric instead of the cognitively more extensive Minkowski metric when put under time pressure. Evidence from inferential statistics was not in line with this hypothesis as both with time pressure and without time pressure a cognitive model using the Minkowski metric was able to predict participant behavior. Cognitive modeling, in contrast, yielded that the multidimensional discrete metric outperformed the multidimensional Minkowski metric in both time pressure conditions. Contrary to expectations, the multidimensional discrete metric excelled in the condition without time pressure, but was outperformed by a random choice model in the condition with time pressure. More people with time pressure than without time pressure were described by the multidimensional discrete metric, which is against the hypotheses, and by the multidimensional Minkowski metric, which is in line with the hypotheses, but not statistically significant. In sum, these findings suggest that people do not use the discrete metric under time pressure, but rather the Minkowski metric (inferential tests) or respond more randomly (cognitive modeling). Cognitive modeling, however, also suggests that the discrete metric competes the Minkowski metric in describing participants independent of time pressure. 

\subsection{Implications for Theory and Research}
\subsubsection{Choice inconsistency with time pressure}
The present results indicate that people do not use different metrics in conditions with time pressure and without time pressure. Rather, there is an increase of random categorization when people have reduced cognitive capacities due to time pressure. Similar findings stem from the domain of preferential choice, where cognitive load produced by a secondary task increased participants' inconsistency in preferences \citep{olschewski2018taxing, burks2009cognitive}. 
\cite{olschewski2018taxing} modeled choice inconsistency with a probit choice model, where the utility of any given option is variable, but people always choose the option with the momentarily highest utility, and with the trembling hand error, where the utility of any given option is fixed, but people choose with a certain probability the option with the smaller utility. 

Similarly, choice inconsistency may be integrated into the generalized context model: The analogy of the probit choice model is that people do not consistently assign the correct category to the retrieved exemplars. Specifically, each exemplar which is compared to the probe is falsely remembered as a member of the opposite category with a certain probability $p$, which mimics the variability of utility in the preferential choice domain. The analogy of the trembling hand error lies to some extent in Luce's choice axiom within the generalized context model which states that people choose the less likely category with a probability equal to the model's predictions for this category. Note that Luce's choice axiom may be extended with a response--scaling parameter which tunes how deterministically responses are rendered \citep{nosofsky2002exemplar, nosofsky2011generalized, ashby1993relations}. One key difference to the trembling hand error, however, is that in Luce's choice axiom even with the response--scaling parameter the response probabilities are not stable within participants, but stimulus--dependent. Specifically, a stimulus $i$ with extreme categorization probabilities (e.g., $P(R_{A}|i)$ = .90) is classified less often into the less likely category than a stimulus with categorization probabilities close to random choice (e.g., $P(R_{A}|i)$ = .60). If one wanted to implement the trembling hand error in the generalized context model, then Luce's choice axiom needs to be replaced first by the argmax choice rule, which makes probabilistic category predictions deterministic, and then by the trembling hand error, which is a participant--wise constant probability of assigning the probe to the less likely category. However, given that Luce's choice axiom accurately predicts participants' choice proportions \citep{nosofsky1987attention, mckinley1995investigations, lamberts2000information}, it is unclear whether this alternative choice rule may outperform Luce's choice axiom. 

In the present study, choice inconsistency was modeled with softmax, which makes categorization responses less deterministic with increasing temperature $\tau$. Fitting the test phase data exploratively showed that $\tau$ was higher under time pressure than without time pressure when using the multidimensional Minkowski metric (with time pressure: $M$ = 0.38, $Md$ = 0.17, $SD$ = 0.45; without time pressure: $M$ = 0.14, $Md$ = 0.10, $SD$ = 0.09; $t$(31.05) = -2.80, $p$ = .004) and when using the multidimensional discrete metric (with time pressure: $M$ = 0.98, $Md$ = 0.22, $SD$ = 2.48; without time pressure: $M$ = 0.13, $Md$ = 0.10, $SD$ = 0.07; $t$(29.05) = -1.88, $p$ = .03). Thus, independent of the metric used to fit $\tau$, the results support the insight that people respond more inconsistently under time pressure than without time pressure.

Given the increase of inconsistent responses under low cognitive capacities, further research should try to identify the cognitive underpinnings of choice inconsistency in categorization (i.e., whether responses are inconsistently computed such as with a probit choice model or inconsistently executed such as with a trembling hand error). Which factors drive choice inconsistency is debated as well in other psychological domains \citep{blavatskyy2010models}. To that end, experimental design optimization \citep{myung2004model} could reveal which assumptions have to be met such that different models of stochastic choice can be discriminated from each other and which environments maximize discrimination and are thus potential candidates to test stochastic choice models against each other.

\subsubsection{The discrete metric without time pressure}
Interestingly, the results of the present study suggest that people use the multidimensional discrete metric rather without time pressure than with time pressure. However, these results may be explained by how I designed the stimulus environment. More specifically, the environment of the present study has been generated such that the discrete metric and the Minkowski metric lead to equal distances between learning stimuli. As already mentioned, in such an environment it is ecologically rational to use the heuristic discrete metric in the learning phase, as it exploits all available information from the environment \citep{todd2007environments}. Participants thus may have used the multidimensional discrete metric during learning and responded faster than if they would have used the computationally more extensive multidimensional Minkowski metric. Establishing a time limit from the response times of the learning phase might have rendered the use of the multidimensional discrete metric in the time pressure condition computationally too extensive. As a result, people reduced their choice consistency. In the condition without time pressure, participants were still able to use the discrete metric, which is in line with the data, as 15 of 31 participants without time pressure were best fit by the multidimensional discrete metric. Further research should investigate this idea in more detail with different heuristic categorization processes (e.g., unidimensional models and prototype--based models) that are ecologically rational given the stimulus environment in the learning phase.

\subsection{Alternative Cognitive Processes: Rule--Based Decision--Making}
Parameter estimates in Figure \ref{fig:par_multidim} show that participants in the aggregate attended primarily to the second feature. However, in my design, the second feature alone was uninformative for participants, as both categories were equally distributed across the values of the second feature in the learning phase (see Figure \ref{fig:environment}). In other words, there were both two members of category A and two members of category B that had a value of 0 and a value of 1 on the second feature. Knowing the value of the second feature gives thus by itself no information of category membership. In the generalized context model attending to the second feature shifts the predictions to the random choice level of .50 (see also Table \ref{tab:environment}). 
While a random choice model or the, in the experimental task, equivalent unidimensional generalized context model attending to the second feature cannot achieve the accuracy criterion in the learning phase, a sequential rule--based model could potentially explain these results. Specifically, given a value of 0 on the second feature, the third dimension discriminates the categories perfectly, and given a value of 1 on the second feature, the first dimension discriminates the categories perfectly. Instead of retrieving exemplars, participants could thus have learned the category structure by using a two--step rule--based model, where the first step is to always attend to the second feature and the second step is to attend to one of the remaining features contingent on the result from the first step.

Literature suggests that people use rules predominantly for well--defined categories \citep{restle1962selection, tom1968attention}, when the stimuli are confusable \citep{rouder2006comparing} and at the beginning of the categorization process \citep{rouder2006comparing}. While the categories in the present study were well--defined and the models were fit to the first phase of the experiment, the stimuli were discriminable rather than confusable, as feature values were indicated with colored unit squares. However, features differed visually only by the color of the squares such that people might have confounded different features with each other making stimuli thus potentially confusable. 
%Thus, participants may have confused learning stimuli such as 012 and 102 where equal feature values belong to different features. 
Past research on rule--based categorization thus does not exclude that people used a rule model in the present study. 

Furthermore the data of the test phase indicate that people without time pressure might have also used a rule to classify the novel stimuli as their responses were very deterministic in comparison to the participants with time pressure. Specifically, the variance of responses for the novel stimuli of the test phase calculated by stimulus and participant was higher with time pressure (aggregated over stimuli and participants: $M$ = 0.16, $Md$ = 0.18, $SD$ = 0.09) than without time pressure (aggregated over stimuli and participants: $M$ = 0.06, $Md$ = 0.00, $SD$ = 0.08). Possibly, participants without time pressure generalized the rule with which they learned the category structure in the learning phase to the novel stimuli from the test phase, whereas participants with time pressure lacked the temporal resources to execute the rule and behaved more randomly.

\subsection{Generalization to Unfamiliar Feature Values}
The present results from the test phase further indicate that people might respond differently to novel stimuli with a higher or lower number of familiar feature values. Specifically, the two novel stimuli with two familiar feature values out of three (i.e., 100 and 003) were classified predominantly in line with the multidimensional Minkowski metric, whereas the four novel stimuli with one familiar feature value out of three (i.e., 221, 231, 321, and 331) were classified at least partially in line with the multidimensional discrete metric. 
An extension of the rule--based model outlined above might account for why some novel stimuli of the test phase match the predictions of the multidimensional Minkowski metric, whereas others match the predictions of the multidimensional discrete metric. The rule--based model could first check whether the stimulus' value on the second feature is familiar from the learning phase, as the value of the second feature determined during learning which feature one needs to attend to in the second step. If the value on the second feature is familiar (i.e., for the stimuli 100 and 003), the rule continues as mentioned above with attending to one of the remaining features contingent on the value of the second feature. If the value on the second feature is not familiar (i.e., for the stimuli 221, 231, 321, and 331), the model checks whether one of the remaining features has a familiar value, which in this case would always be the third feature with the value 1. Given that more members of category A have a value of 1 on the third feature, the model predicts to classify the stimuli 221, 231, 321, and 331 predominantly into category A. 

Another possible explanation for the fact that people classify the stimuli 100 and 003 predominantly in line with the multidimensional Minkowski metric and the stimuli 221, 231, 321, and 331 rather in line with the multidimensional discrete metric is that people simplify their distance computation when facing novel stimuli with many unfamiliar feature values. In other words, a person would use different metrics for different stimuli and refer to the discrete metric to categorize novel stimuli with many unfamiliar feature values and to the Minkowski metric to categorize novel stimuli with many familiar feature values. However, past research on generalization does not support this hypothesis, since the predictions of the discrete metric go into the opposite direction of participants' responses to stimuli which lie outside of the range of familiar feature values (see, for instance \citealp{erickson2002rule, denton2008rule}).

\subsection{Limitations}
The present study goes against the hypothesis that people compute distance heuristically under low cognitive capacities. Rather, participants responded more inconsistently under time pressure while in the condition without time pressure, interestingly, the multidimensional discrete metric competed with the multidimensional Minkowski metric in predicting participant behavior. An issue of the present study is however, that due to the between--subjects design it is unknown whether people actually switch their metric in dependence of the amount of time pressure or whether different people use different metrics consistently. A within--subject design might have shed light on this issue; however, this would have implied to set up two different stimulus environments, one for each time pressure condition. This procedure, in turn, might engender learning and training effects, is more extensive to conduct, and is problematic, as the two stimulus environments probably do not discriminate equally well between the different models under consideration.

Another issue that I did not address in this study is the possibility that people after the onset of time pressure do use the discrete metric, but also allocate their attention differently across the three features. In the present analyses, predictive model comparison was used; that is attention weights were fit to the learning phase data and these estimates were used for prediction of the test phase data. I implemented a set of unidimensional models where attention is reduced to one feature; hence, I was able to test whether people switch from a multidimensional metric to a unidimensional metric after the onset of time pressure. I did not analyze more fine--grained attention shifts that occur after the onset of time pressure, because reallocating attention to all features of the stimuli does not reduce the complexity of categorization and may thus be less plausible under low cognitive capacities than reducing attention to one feature. 

A third issue of the present design is that feature values are discrete integers and visually well discriminable. In such a case, using the Minkowski metric might not be very cognitively demanding and as a result the relative advantage of the discrete metric concerning its computational simplicity is diminished. In environments with continuous features where the calculation of metric feature differences is computationally more extensive the cognitive simplicity of the discrete metric relative to the Minkowski metric may be more pronounced. In such cases, it is possible that people use the heuristic discrete metric more often under low cognitive capacities than in the present study.

\subsection{Conclusion}
The present study analyzed within the framework of the generalized context model \citep{nosofsky1986attention} whether people with low cognitive capacities simplify the categorization process by using the heuristic discrete metric instead of the Minkowski metric. Results indicate that people do not use different metrics in dependence of the time available for categorization: Inferential statistics found that the Minkowski metric could predict responses for the test phase both with and without time pressure. Cognitive modeling found that the discrete metric well described participants without time pressure. In return, time pressure led to an increase of random categorization, which is in line with research on the effects of cognitive load on choice inconsistency \citep{olschewski2018taxing}. These results suggest that people do not adapt their strategy to the available cognitive capacities, but rather display cognitive overload which is manifested by choice inconsistency. Further categorization research may implement different existing models of stochastic choice \citep{blavatskyy2010models, becker1963stochastic} to investigate the cognitive processes underlying choice inconsistency under low cognitive capacities.

\bibliography{example}

\newpage
\section{Appendix}
The following instructions are the original instructions in German used in the experiment. The parts in italic highlight the instructions that were shown only to participants in the condition with time pressure. The $x$ and $y$ in the test phase instructions for participants with time pressure refer to the lower and upper integer of the exact time the given participant had per trial.

\subsection{General Instructions}
Herzlich Willkommen zu dieser Studie, in der wir untersuchen möchten, wie Menschen Kategorisierungen vornehmen.
Bitte lesen Sie die Instruktionen aufmerksam durch. Sollten die Instruktionen unklar sein oder sollte das Experiment nicht richtig funktionieren, geben Sie bitte umgehend dem Studienleiter Bescheid.

In dieser Studie geht es darum zu lernen, hypothetische Produkte zu zwei Marken (Marke L oder Marke R) zuzuordnen.
Die Pfeiltaste nach links steht für Marke L. Die Pfeiltaste nach rechts steht für Marke R.

Jedes Produkt besteht aus drei Zutaten. Jedes Produkt hat eine bestimmte Menge von jeder Zutat und unterscheidet sich somit von den anderen Produkten durch eine einzigartige Kombination der Zutaten.
Alle Produkte bestehen aus denselben drei Zutaten, d.h. die Zutaten ändern sich nicht von Produkt zu Produkt.

Hier sehen Sie ein Produkt. Jede Zutat ist schematisch durch einen grauen Balken dargestellt. Die Anzahl farbiger Quadrate innerhalb der Balken gibt die Menge der jeweiligen Zutat an.
Da Sie das Experiment nicht erfolgreich beenden können, ohne die Produkte mit allen möglichen Zutaten zu kennen, sollten Sie sich jetzt die Produkte und Zutaten in Ruhe anschauen.
Damit Sie sich mit den Produkten vertraut machen können, klicken Sie bitte auf jede Zutat 10 mal.

In jedem Durchgang wird Ihnen ein zufällig ausgewähltes Produkt mit der dazugehörigen Menge von jeder Zutat gezeigt. Ihre Aufgabe ist es, richtig zu erraten, welcher Marke (Marke L oder Marke R) das jeweilige Produkt angehört, indem Sie entweder auf die linke oder die rechte Pfeiltaste drücken.

Das Experiment gliedert sich in zwei Phasen:
In der ersten Phase sehen Sie einige Produkte mehrere Male und lernen sie der richtigen Marke (Marke L oder Marke R) zuzuordnen. Nach jedem Durchgang erhalten Sie eine Rückmeldung, ob sie richtig getippt haben. Ihre Aufgabe ist zu lernen, zu welcher Marke die einzelnen Produkte gehören, und die Produkte konsistent der richtigen Marke (R oder L) zuzuordnen. Sobald Sie dies geschafft haben, gelangen Sie zur zweiten Phase.

In der zweiten Phase sehen Sie erneut einige Produkte. Ihre Aufgabe ist es jedes Produkt der Marke zuzuordnen, zu der es am wahrscheinlichsten gehört (Marke L oder Marke R). In dieser Phase gibt es allerdings keine richtige Antwort mehr - Sie erhalten deswegen auch keine Rückmeldung mehr zu Ihrer Antwort. \textit{Des Weiteren gibt es in der zweiten Phase ein Zeitlimit für Ihre Antwort, das Sie nicht überschreiten sollten.}

\subsection{Learning Phase Instructions}
Sie können nun mit der ersten Phase beginnen.
Drücken Sie den Pfeil nach links um ein Produkt der Marke L zuzuordnen und den Pfeil nach rechts um ein Produkt der Marke R zuzuordnen.
Haben Sie richtig getippt, erscheint ein freundliches Gesicht. Haben Sie falsch getippt, erscheint ein trauriges Gesicht.
Sie können Sich weiterhin das Produkt und die Rückmeldung solange Sie möchten ansehen.  Durch Drücken der Pfeiltaste nach oben gelangen Sie zum nächsten Durchgang.
Es dauert in der Regel mehrere hundert Durchgänge, bis man anfängt, etwas über die verschiedenen Produkte zu lernen - haben Sie also bitte etwas Geduld.

\subsection{Test Phase Instructions}
Nun beginnt die zweite Phase des Experiments.
Da es in dieser Phase des Experiments keine richtigen Antworten mehr gibt, erhalten Sie nun keine Rückmeldung mehr. Weisen Sie bitte jedes Produkt derjenigen Marke zu, der das Produkt für Sie am wahrscheinlichsten angehört.

\textit{In dieser Phase haben Sie nun zusätzlich ein Zeitlimit für jeden Durchgang.
Sie haben zwischen x und y Sekunden pro Durchgang Zeit.
Versuchen Sie bitte, dieses Zeitlimit nicht zu überschreiten!}

Zuerst bearbeiten Sie einige Übungsdurchgänge, in denen Sie Ihnen schon bekannte Produkte sehen.
Drücken Sie weiterhin den Pfeil nach links um ein Produkt der Marke L zuzuordnen und den Pfeil nach rechts um ein Produkt der Marke R zuzuordnen.

\vspace{\baselineskip}

Sie haben die Übungsdurchgänge abgeschlossen und werden nun weitere Produkte sehen.
Drücken Sie weiterhin den Pfeil nach links um ein Produkt der Marke L zuzuordnen und den Pfeil nach rechts um ein Produkt der Marke R zuzuordnen.

\vspace{\baselineskip}

Sie haben die zweite Phase nun abgeschlossen.
Durch Drücken der Leertaste gelangen Sie zum demographischen Fragebogen.
Bitte füllen Sie diesen zum Schluss noch aus.

\end{document}

% KLEINSCHREIBUNG von Modellen (z.B. generalized context model anstatt Generalized Context Model)
% Questions: do I have to use the template? nope, should be okay, Jana checks this
% Use Wilcoxon sign test to check whether observed rank orders differ from hypothesized ones? nope
% Table: Do differences between MAPE and M(LL) make sense? could make sense as MAPE is severe when predictions and obs are small
% MINK-MULTI or MMM or GCM with blabla in figures, tables etc.? spell it out!
% check threshold model. what about chisq.multcomp? what about fisher.bintest? Ask again about regression model (half data). 
% Write discrete metric as formula? yes, perhaps begin with degree of belief (evidence strength) = probability for category 1, choice rule and go to distance computation; softmax am Ende der Modellierung --> for both model versions I use the soft max
% Discrete-threshold model fitted to learning phase? 
% without (vs. with) okay? don't use brackets, write with time pressure than without time pressure
% do simulation studies with the MPM have to be in thesis? no

% To do: look for start and end date, which computers were used
% Jäkel, Schölkopf, Wichman: Generalization and similarity 
% search: environment complexity --> metric changes? no feedback --> metric changes? consequences (false categorizations give minus points)
%
% Please see the package documentation for more information
% on the APA6 document class:
%
% http://www.ctan.org/pkg/apa6
%