<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="brief-report">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2024.1477514</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A boundedly rational model for category learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Houser</surname> <given-names>Troy M.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/881896/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Psychology, University of Oregon</institution>, <addr-line>Eugene, OR</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Institute of Neuroscience, University of Oregon</institution>, <addr-line>Eugene, OR</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Ulrich Hoffrage, Universit&#x000E9; de Lausanne, Switzerland</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kenneth Kurtz, Binghamton University, United States</p>
<p>Yoshihisa Fujita, Kyoto University, Japan</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Troy M. Houser <email>thouser&#x00040;uoregon.edu</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>12</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>15</volume>
<elocation-id>1477514</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>08</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>11</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2024 Houser.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Houser</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The computational modeling of category learning is typically evaluated in terms of the model&#x00027;s accuracy. For a model to accurately infer category membership of stimuli, it has to have sufficient representational precision. Thus, many category learning models infer category representations that guide decision-making and the model&#x00027;s fitness is evaluated by its ability to accurately choose. Substantial decision-making research, however, indicates that noise plays an important role. Specifically, noisy representations are assumed to introduce an element of stochasticity to decision-making. Noise can be minimized at the cost of cognitive resource expenditure. Thus, a more biologically plausible model of category learning should balance representational precision with costs. Here, we tested an autoencoder model that learns categories (the six category structures introduced by Roger Shepard and colleagues) by balancing the minimization of error with minimization of resource usage. By incorporating the goal of reducing category complexity, the currently proposed model biases category decisions toward previously learned central tendencies. We show that this model is still able to account for category learning performance in a traditional category learning benchmark. The currently proposed model additionally makes some novel predictions about category learning that future studies can test empirically. The goal of this paper is to make progress toward development of an ecologically and neurobiologically plausible model of category learning that can guide future studies and theoretical frameworks.</p></abstract>
<kwd-group>
<kwd>category learning</kwd>
<kwd>autoencoder (AE) neural networks</kwd>
<kwd>concept learning</kwd>
<kwd>generalization (psychology)</kwd>
<kwd>RULEX</kwd>
<kwd>rate distortion theory</kwd>
<kwd>efficient coding theory</kwd>
</kwd-group>
<counts>
<fig-count count="4"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="65"/>
<page-count count="10"/>
<word-count count="8073"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Cognition</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Conceptual knowledge is a defining characteristic of human intelligence. A powerful way that conceptual knowledge is used is by generalizing it to novel situations, enabling efficient and adaptive behavior (Shepard, <xref ref-type="bibr" rid="B55">1957</xref>, <xref ref-type="bibr" rid="B56">1987</xref>, <xref ref-type="bibr" rid="B57">1994</xref>). For example, when we go to a new grocery store, we can generalize previously acquired knowledge about grocery store layouts to infer that the cheese will be close to the milk. A concept is a mental representation of a category (Goldstone et al., <xref ref-type="bibr" rid="B19">2018</xref>). Thus, the concept of a snake refers to the mental representation of a subjectively constructed category labeled <italic>snake</italic>. Given that categories are constructed by individuals to organize their personal experiences, there are numerous possibilities for <italic>how one</italic> might categorize. Despite considerable advancements in the field, there remains a lack of consensus among researchers regarding the psychological nature of categories. In what follows, we introduce a boundedly rational theoretical framework and novel extension of a previously posited process-level computational model that can capture key aspects of human category learning and memory. The guiding notion is that concepts are boundedly rational representations of categories.</p>
<sec>
<title>Bounded rationality when acquiring category knowledge</title>
<p>Humans make decisions based on internal representations of external variables (Gershman and Daw, <xref ref-type="bibr" rid="B18">2017</xref>; Niv, <xref ref-type="bibr" rid="B39">2019</xref>), but how such variables are encoded and subsequently decoded to make a decision remains an open question. In real-world decision making, biological systems often have to infer latent states (e.g., categories). Many cognitive models of categorization decisions assume veridical internal representations of categories (Nosofsky, <xref ref-type="bibr" rid="B40">1986</xref>; Nosofsky et al., <xref ref-type="bibr" rid="B43">1994a</xref>,<xref ref-type="bibr" rid="B44">b</xref>). Substantial work in reinforcement learning and magnitude discrimination suggests that some amount of noise is inevitable in internal representations (Azeredo da Silveira et al., <xref ref-type="bibr" rid="B3">2021</xref>; Barretto-Garc&#x000ED;a et al., <xref ref-type="bibr" rid="B5">2023</xref>; Li et al., <xref ref-type="bibr" rid="B31">2017</xref>; Prat-Carrabin and Woodford, <xref ref-type="bibr" rid="B47">2022</xref>, <xref ref-type="bibr" rid="B48">2024</xref>; Spitzer et al., <xref ref-type="bibr" rid="B62">2017</xref>). This is to say that it is likely infeasible for biological systems to encode and decode information without error. According to the principle of efficient coding (Barlow, <xref ref-type="bibr" rid="B4">2013</xref>), biological systems should seek to maximize representational precision <italic>while minimizing resource consumption</italic>.</p>
<p>The category learning model proposed by Kurtz (<xref ref-type="bibr" rid="B27">2007</xref>), called the DIVergent Autoencoder (DIVA), has made important advances in making models of category judgments more biologically realistic. DIVA is a neural network model that utilizes an autoencoder architecture. Autoencoders traditionally learn stimulus mappings in an unsupervised fashion. They have three main components: (1) an encoder, (2) a bottleneck, and (3) a decoder. The encoder takes input data and transforms it to a low-dimensional space (the bottleneck). The bottleneck is a form of data compression, or dimensionality reduction, of the kind performed by statistical methods such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE). It forces the model to extract statistical regularities in the data, effectively shedding the irrelevant information and therefore minimizing resource expenditure. These compressed representations are then decompressed by the decoder, which transforms them back into their original dimensions so as to reconstruct the input. Decoding is not trivial, as it is decoding <italic>from the bottleneck</italic>. In other words, the decoder attempts to reconstruct the input after some of its original signal has been discarded, consistent with the notion from efficient coding theory that biological systems have to balance representational precision with resource expenditure. Low reconstruction error indicates that the bottleneck extracted regularities well. Given that an autoencoder&#x00027;s function is to reconstruct the original input, it is typically not an architecture used to model supervised learning, which attempts to make discrete decisions. However, DIVA makes use of a divergent output layer that enables it to make categorical decisions. We discuss this feature below.</p>
<p>However, the traditional autoencoder can have trouble with generalizing because it can overfit to the data (Monshizadeh et al., <xref ref-type="bibr" rid="B38">2021</xref>), by simply reconstructing learned exemplars rather than a category&#x00027;s central tendency (Bozkurt et al., <xref ref-type="bibr" rid="B9">2021</xref>). Reconstructing a category&#x00027;s central tendency should facilitate broader generalization abilities. To circumvent this issue, we use a variational autoencoder (VAE; Kingma and Welling, <xref ref-type="bibr" rid="B24">2019</xref>).</p>
<p>Rather than deterministically mapping inputs to the bottleneck component, VAEs map inputs to probability distributions, thereby adding a stochastic element and enabling generation of diverse outputs. Moreover, rather than sampling directly from these learned distributions [<inline-formula><mml:math id="M1"><mml:mi>z</mml:mi><mml:mo>&#x0007E;</mml:mo><mml:mrow><mml:mstyle mathvariant="script"><mml:mi>N</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>], which would prevent gradients from passing through the sampling step, VAEs use the &#x0201C;reparameterization trick&#x0201D; (Kingma et al., <xref ref-type="bibr" rid="B23">2015</xref>). The reparameterization trick expresses a sample from each latent distribution as a deterministic function of its mean and standard deviation plus independent noise: <italic>z</italic> &#x0003D; &#x003BC;&#x0002B;&#x003C3;&#x000B7;&#x003B5;, where &#x003B5; is noise (a random sample from a zero-mean Gaussian with unit variance; see Kingma et al., <xref ref-type="bibr" rid="B23">2015</xref>; Kingma and Welling, <xref ref-type="bibr" rid="B24">2019</xref>). This trick makes the sampling procedure differentiable, which in turn allows the model parameters (&#x003BC; and &#x003C3;) to be updated through gradient descent optimization. The loss function that gets optimized is also distinctive for VAEs, and it is the key theoretical contribution of making DIVA variational. The loss function for VAEs is the sum of reconstruction error and the discrepancy between prior and posterior distributions for a sampled latent variable <italic>z</italic>. Reconstruction error corresponds to distortion in rate distortion theory. 
It is a measure proportional to the mean squared error between the input and the reconstruction of the input produced by the decoder. The discrepancy between prior and posterior distributions is known as the Kullback-Leibler divergence (Cover and Thomas, <xref ref-type="bibr" rid="B14">1991</xref>) and it functions as a regularizer, constraining decoded representations to be biased toward their prior distribution. This is a desirable property as it entails that, for example, a category representation acquired across numerous experiences cannot be substantially altered from a single outlier exemplar. In other words, the Kullback-Leibler divergence minimizes resources spent on encoding specific exemplars by penalizing higher discrepancies between the input and the central tendency of previous inputs.</p>
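<p>The two ingredients described above can be illustrated with a minimal Python sketch. It assumes a standard normal prior over the latent units, which the text does not specify, and the function names <italic>reparameterize</italic> and <italic>kl_to_standard_normal</italic> are hypothetical labels chosen for illustration rather than part of the catlearn implementation.</p>
<preformat>
```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps, with eps sampled from N(0, 1)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent units.

    Assumption: the prior is a standard normal distribution.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))

rng = np.random.default_rng(0)
mu = np.array([0.3, -0.1])      # illustrative means for two latent units
sigma = np.array([0.5, 0.8])    # illustrative standard deviations
z = reparameterize(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)
```
</preformat>
<p>Because the randomness enters only through the noise term, <italic>z</italic> remains a smooth function of the mean and standard deviation, and the divergence is zero only when the encoded distribution matches the assumed prior.</p>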
<p>It is known that the cognitive resources people allocate differ between individuals and can even fluctuate from moment to moment. Therefore, we made use of the &#x003B2;-VAE, which incorporates a non-negative parameter (&#x003B2;) that scales the Kullback-Leibler divergence (Higgins et al., <xref ref-type="bibr" rid="B20">2017</xref>). By scaling the Kullback-Leibler divergence, the bias toward the central tendency of experience can be made more or less prominent. This parameter is conceptually related to cognitive capacity (Bates and Jacobs, <xref ref-type="bibr" rid="B6">2020</xref>), given that less reliance on priors means one can efficiently encode more specific information. Autoencoders by their very nature try to reconstruct an input, which may make them susceptible to overfitting to the identity of a stimulus (Steck, <xref ref-type="bibr" rid="B63">2020</xref>). In the extreme case that an autoencoder learns to memorize every training stimulus, it would resemble the famous exemplar model (Nosofsky, <xref ref-type="bibr" rid="B40">1986</xref>, <xref ref-type="bibr" rid="B41">1987</xref>) of categorization. However, in the case of categories with many exemplars, this becomes computationally infeasible and thus a tradeoff must be maintained between precision of memories and resource expenditure. Because the Kullback-Leibler divergence functions as a regularizer, constraining representations to resemble prior representations, the VAE additionally minimizes resource expenditure. Thus, by scaling the Kullback-Leibler divergence, &#x003B2; induces more or less reliance on the prior, effectively tilting the balance of precision and complexity toward one or the other. The relationship between &#x003B2;-VAEs and rate distortion theory has previously been made mathematically concrete (Alemi et al., <xref ref-type="bibr" rid="B1">2017a</xref>,<xref ref-type="bibr" rid="B2">b</xref>).</p>
<p>Finally, we make the &#x003B2;-VAE divergent, as in DIVA, for reasons that we expound upon next. Traditional autoencoder approaches use a single decoder for all <italic>n</italic> categories, or a separate autoencoder for each category (Oja, <xref ref-type="bibr" rid="B45">1989</xref>). Such approaches to category learning do not capture differences in category learning driven by learning conditions, such as the nature and number of contrasting categories. In the former case, this is because a single decoder is difficult to apply to supervised learning; in the latter case, it is because each category is modeled independently (Kurtz, <xref ref-type="bibr" rid="B27">2007</xref>). To solve this issue, Kurtz (<xref ref-type="bibr" rid="B27">2007</xref>) proposed a single (shared) hidden layer of units and <italic>n</italic> decoders, or <italic>category channels</italic>, in DIVA in order to obtain reconstruction errors for each category. Comparing reconstruction errors then allows one to test whether the current stimulus is reconstructed better from the model&#x00027;s low-dimensional representation of one category than from its representation of another category. 
Moreover, by maintaining a shared hidden layer, DIVA and the extension proposed here are plausible models of multitask learning (Ben-David and Schuller, <xref ref-type="bibr" rid="B7">2003</xref>; Caruana, <xref ref-type="bibr" rid="B11">1996</xref>, <xref ref-type="bibr" rid="B12">1997</xref>, <xref ref-type="bibr" rid="B10">1994</xref>), which has recently been revealed to naturally facilitate generalization and abstraction (Driscoll et al., <xref ref-type="bibr" rid="B16">2024</xref>; Garner and Dux, <xref ref-type="bibr" rid="B17">2023</xref>; Sanh et al., <xref ref-type="bibr" rid="B54">2022</xref>; Wards et al., <xref ref-type="bibr" rid="B64">2023</xref>) and may be related to mixed selectivity in the brain (Jeffrey et al., <xref ref-type="bibr" rid="B21">2020</xref>; Kaufman et al., <xref ref-type="bibr" rid="B22">2022</xref>; Rigotti et al., <xref ref-type="bibr" rid="B52">2013</xref>), including the hippocampus (Bernardi et al., <xref ref-type="bibr" rid="B8">2020</xref>; Kira et al., <xref ref-type="bibr" rid="B25">2023</xref>) and the prefrontal cortex (Dang et al., <xref ref-type="bibr" rid="B15">2021</xref>; Parthasarathy et al., <xref ref-type="bibr" rid="B46">2017</xref>), both of which are involved in concept learning. Given the shared layer, the current model claims that the bottleneck component constitutes a space of multiple psychological spaces superimposed upon each other, which is distinct from predictions made by autoencoder models with a single decoder. This means that the current model will yield different reconstructions under different learning conditions (i.e., it utilizes interdependent encoding techniques). By allocating a unique output channel for each category, divergent autoencoder architectures can model supervised learning by obtaining reconstruction errors for each category. For a schematic and relevant terms of the model proposed here see <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
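<p>The comparison of category channels can be sketched as follows. This is a minimal illustration that assumes a deterministic choice of the channel with the smallest summed squared error, as described later in the Method; the function name <italic>choose_category</italic> and the example reconstructions are hypothetical.</p>
<preformat>
```python
import numpy as np

def choose_category(stimulus, reconstructions):
    """Return the index of the channel that reconstructs the stimulus best.

    reconstructions: one reconstructed vector per category channel.
    """
    errors = [float(np.sum((np.asarray(r) - stimulus) ** 2))
              for r in reconstructions]
    return int(np.argmin(errors)), errors

stimulus = np.array([1.0, 0.0, 1.0])
channel_a = [0.9, 0.1, 0.8]   # hypothetical reconstruction from channel A
channel_b = [0.4, 0.6, 0.5]   # hypothetical reconstruction from channel B
choice, errors = choose_category(stimulus, [channel_a, channel_b])
```
</preformat>
<p>In this example channel A reproduces the stimulus more closely than channel B, so the sketch selects the first channel.</p>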
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>BR-DIVA model architecture. From the far left side, the model begins by taking in an input vector and projecting it onto a hidden layer (bottleneck). Then the decoders for each category sample from the hidden layer space to reconstruct the input. The relevant terms show the loss function that gets optimized, which is a sum of reconstruction error and capacity-weighted bias. Capacity is simply a freely estimated parameter and bias is the Kullback-Leibler divergence between prior and posterior distributions at the hidden layer. Reconstruction error is the squared difference between input and reconstructed representations.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-15-1477514-g0001.tif"/>
</fig>
<p>To test the viability of the currently proposed model, which we call BR-DIVA (for <italic>Boundedly-Rational-</italic>DIVA, see below), we compare its ability to capture a classic benchmark of category learning to the original DIVA model and consider unique predictions by making DIVA variational. The aim of the current paper is to guide future research by positing a few category learning predictions that follow logically from computational principles.</p>
</sec>
<sec>
<title>Model features</title>
<p>The VAE model proposed here is a neural network model with three layers composed of three, two, and six neuron-like units, respectively. The number of units per layer was selected based on the stimulus set used in the current study. Because the stimuli are three-dimensional, the input layer is composed of three units and the output layer is composed of 3 &#x000D7; <italic>n-categories</italic> units. To be comparable to DIVA, which used two hidden layer units, we fixed the number of units in the hidden layer, or bottleneck, to two. More details on the stimulus set are provided below. Input and output layers are fully connected with the hidden layer (i.e., the bottleneck). These connections denote the associations between input stimuli, internal cognitive representations, and reconstructions and are learned by iterative updating of weights that scale each connection strength. Unit weights are learned via standard backpropagation (Rumelhart et al., <xref ref-type="bibr" rid="B53">1986</xref>) and activations are passed through a sigmoid function. Weights are updated in proportion to the learning rate. Unit weights are initialized with random values within the default range of &#x000B1;0.5, which is conventional in neural network research (Kolen and Pollack, <xref ref-type="bibr" rid="B26">1990</xref>) and was used in the paper introducing DIVA (Kurtz, <xref ref-type="bibr" rid="B27">2007</xref>).</p>
<p>Activations spread from input to hidden layer units. The hidden layer is comprised of two neuron-like units, which is what gives it its status as a bottleneck. That is, by projecting three-dimensional inputs (see below) onto a two-dimensional space, the encoder is forced to reduce the input&#x00027;s dimensionality. Then the hidden layer projects to the output layer, which has dimensionality equal to the dimensionality of the input stimulus for each channel, which is why the output layer has 6 units (3 units for each category; see below for explanation of the stimuli).</p>
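<p>The layer sizes and initialization described above can be sketched in Python as a deterministic forward pass. Several details here are assumptions made for illustration: bias terms are omitted, a sigmoid is applied at both the hidden and output layers, and the stochastic sampling at the bottleneck is left out. The variable names are hypothetical.</p>
<preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_in, n_hidden, n_categories = 3, 2, 2

# Weights drawn uniformly from the +/-0.5 range mentioned in the text.
w_in = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
w_out = rng.uniform(-0.5, 0.5, size=(n_categories, n_hidden, n_in))

def forward(stimulus):
    """Project a 3-feature stimulus to 2 hidden units, then to 2 x 3 outputs."""
    hidden = sigmoid(stimulus @ w_in)                           # shape (2,)
    outputs = sigmoid(np.einsum("h,chi->ci", hidden, w_out))    # shape (2, 3)
    return hidden, outputs

hidden, outputs = forward(np.array([1.0, 0.0, 1.0]))
```
</preformat>
<p>Each row of <italic>outputs</italic> is one category channel&#x00027;s reconstruction of the three input features, which gives the six output units in total.</p>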
<p>To optimize model fit, a loss function gets minimized. The loss function is the sum of two terms: (1) reconstruction error, and (2) weighted Kullback-Leibler divergence. To obtain the measure of reconstruction error, squared differences between each category channel&#x00027;s output node activations and the input are calculated and scaled with a sensitivity parameter that controls the amount of attention paid to each feature. Summing these differences within each category channel yields a reconstruction error for each category. These measures are then added to the Kullback-Leibler divergence, which is itself scaled by the regularization parameter &#x003B2;. For additional details on how parameter settings relate to category learning, see Kurtz (<xref ref-type="bibr" rid="B27">2007</xref>, <xref ref-type="bibr" rid="B28">2015</xref>). Here, we fix the sensitivity and learning rate parameters to 1, both for brevity [as was done in the original DIVA simulations (Kurtz, <xref ref-type="bibr" rid="B27">2007</xref>)] and to elucidate the differences between the DIVA and BR-DIVA models. DIVA also makes use of an attention breadth parameter that specifies how much attention is allocated to specific dimensions vs. all dimensions; however, to facilitate ease of comparison, this parameter was also fixed to 1 for both models.</p>
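<p>A minimal Python sketch of this loss follows. It assumes that a single Kullback-Leibler value from the shared hidden layer is added to every channel&#x00027;s reconstruction error; the function name <italic>br_diva_loss</italic> and the numeric values are hypothetical and are not taken from the simulations.</p>
<preformat>
```python
import numpy as np

def br_diva_loss(stimulus, outputs, kl, beta, sensitivity=1.0):
    """Per-channel reconstruction error plus the beta-weighted KL term.

    outputs: array of shape (n_categories, n_features).
    kl: Kullback-Leibler divergence at the hidden layer (a scalar).
    """
    squared = sensitivity * (np.asarray(outputs) - stimulus) ** 2
    channel_errors = squared.sum(axis=1)
    return channel_errors, channel_errors + beta * kl

stimulus = np.array([1.0, 0.0, 1.0])
outputs = np.array([[0.9, 0.1, 0.8],
                    [0.4, 0.6, 0.5]])
errors, totals = br_diva_loss(stimulus, outputs, kl=0.5, beta=2.0)
```
</preformat>
<p>Under this assumption the weighted divergence shifts both channel totals by the same amount on a given trial, so in the sketch it would shape behavior through learning rather than through the immediate comparison between channels.</p>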
<p>To demonstrate the plausibility of the current model&#x00027;s ability to capture human category learning, we test its ability to simulate category learning on the seminal &#x0201C;Six Problems&#x0201D; introduced by Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>).</p></sec>
</sec>
<sec id="s2">
<title>The Six Problems</title>
<p>Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>) tested the difficulty of categorization judgments depending on how the same 8 stimuli were grouped. Specifically, participants were shown three-dimensional stimuli, where each dimension denotes a binary feature (e.g., color, size, and shape). These eight stimuli can be grouped into two groups in 70 different ways, but only six of these are structurally distinct. By &#x0201C;structurally distinct,&#x0201D; we mean that a grouping is not different simply by swapping out features. For example, a Type 1 grouping assigns all four stimuli with one color value (say, black) to Category A and all four stimuli with the other color value (say, white) to Category B. Grouping the stimuli using the same kind of unidimensional rule, simply for a different dimension (i.e., grouping all small stimuli into A and all large stimuli into B) is a technically unique grouping but not structurally distinct.</p>
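<p>The counts given above can be checked by enumeration. The following Python sketch treats two groupings as structurally identical when one can be turned into the other by reordering dimensions, swapping the two values of any feature, or swapping the category labels; the helper names <italic>relabel</italic> and <italic>signature</italic> are hypothetical.</p>
<preformat>
```python
from itertools import combinations, permutations, product

# The eight stimuli: every combination of three binary features.
STIMULI = list(product((0, 1), repeat=3))

def relabel(stimulus, order, flips):
    """Reorder the dimensions, then swap feature values where flips is 1."""
    return tuple(stimulus[order[i]] ^ flips[i] for i in range(3))

def signature(category_a):
    """Smallest relabelled form of a grouping, shared by all equivalent groupings."""
    variants = []
    for order in permutations(range(3)):
        for flips in product((0, 1), repeat=3):
            image = {relabel(s, order, flips) for s in category_a}
            other = set(STIMULI) - image          # swapping category labels
            variants.append(tuple(sorted(image)))
            variants.append(tuple(sorted(other)))
    return min(variants)

groupings = list(combinations(STIMULI, 4))        # choices of Category A
structures = {signature(g) for g in groupings}
print(len(groupings), len(structures))
```
</preformat>
<p>The first count is the number of ways to choose four of the eight stimuli for Category A, and the second is the number of distinct signatures, which corresponds to the six problem types.</p>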
<p>The six types of groupings differ in the number of dimensions one must attend to in order to achieve optimal performance (Type 1: one dimension, Type 2: two dimensions, and Types 3&#x02013;6: three dimensions). Type 1 adheres to a unidimensional rule-based structure, such that all stimuli with one value on a dimension (e.g., color in <xref ref-type="fig" rid="F2">Figure 2</xref>) belong to Category A while all stimuli with the other value on the same dimension belong to Category B. Type 2 is an exclusive-OR (XOR) problem, where two dimensions are relevant. In <xref ref-type="fig" rid="F2">Figure 2</xref>, Category A stimuli can be white and square or orange and triangle. Types 3, 4, and 5 can all be characterized as rule-plus-exception structures, where a single dimension defines category assignments for three of the category&#x00027;s four stimuli and thus the fourth stimulus for each category must be memorized. Type 6 is the most difficult because it lacks any within-category similarity structure, meaning one must memorize each of the eight stimulus-response associations to perform optimally. <xref ref-type="fig" rid="F2">Figure 2</xref> shows an example for each of the six types.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Six Problems. Every category structure implemented in the seminal paper by Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>). Within each panel, all the stimuli on the left belong to one category and all the stimuli on the right belong to another category. Below the top panels is a 3-dimensional representation of each category structure.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-15-1477514-g0002.tif"/>
</fig>
<p>The main findings (i.e., that performance follows difficulty level; Type 1 &#x0003E; Type 2 &#x0003E; Types 3&#x02013;5 &#x0003E; Type 6) from the Six Problems introduced in Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>) have been replicated many times, with larger sample sizes, diverse stimulus sets, and across species (Kurtz et al., <xref ref-type="bibr" rid="B29">2013</xref>; Nosofsky et al., <xref ref-type="bibr" rid="B43">1994a</xref>; Smith et al., <xref ref-type="bibr" rid="B61">2004</xref>).</p></sec>
<sec id="s3">
<title>Method</title>
<p>We simulated <italic>n</italic> = 100 participants that performed each of the Six Problems. The model was constructed as a stateful list processor (see Wills et al., <xref ref-type="bibr" rid="B65">2017</xref>) and used the <italic>slpDIVA</italic> function (DIVA model) from the R package <italic>catlearn</italic> (Wills et al., <xref ref-type="bibr" rid="B65">2017</xref>) as a starting point. The current model begins each simulation with randomly initialized weights. A binary three-dimensional input, representing one of the eight stimuli from the Six Problems (i.e., a trial), serves as the first layer and is mapped to two probability distributions (i.e., the bottleneck) via matrix multiplication with a set of input weights. These distributions are reparameterized via the reparameterization trick (Kingma et al., <xref ref-type="bibr" rid="B23">2015</xref>). Reparameterized means of these distributions are the hidden unit activation levels, which are then projected to two three-dimensional output layers via a set of output weights for each category. The output weights represent input reconstructions. A category judgment, which gets a 1 or 0 for accuracy, is whichever category has lower reconstruction error. One simulation is 20 blocks of category learning, where a single block is one iteration through all eight stimuli, presented to the model in random order. We tested both the BR-DIVA and original DIVA models in order to test for any additional benefit of making DIVA variational.</p>
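<p>The trial structure of one simulated participant can be sketched in Python as follows. The sketch only builds the presentation order (20 blocks, each a random permutation of the eight stimuli); the seed and variable names are hypothetical, and the learning step itself is omitted.</p>
<preformat>
```python
import numpy as np
from itertools import product

# The eight binary three-dimensional stimuli.
stimuli = np.array(list(product((0, 1), repeat=3)), dtype=float)

rng = np.random.default_rng(2)   # arbitrary seed for reproducibility
n_blocks = 20

# Each block presents every stimulus once, in a fresh random order.
schedule = [rng.permutation(len(stimuli)) for _ in range(n_blocks)]
n_trials = sum(len(block) for block in schedule)
```
</preformat>
<p>Block-level accuracy would then be the mean of the 1 or 0 scores over the eight trials of a block.</p>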
<p>We ran the above procedure for each of 50 different &#x003B2; values, from 0.01 to 100 in evenly spaced increments on a logarithmic scale. By fixing the parameters common to both the original DIVA model and the currently proposed BR-DIVA model, we can succinctly evaluate the contribution that bounded rationality makes to the divergent autoencoding architecture of category learning.</p>
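<p>A grid of this kind can be generated in Python with NumPy, as in the short sketch below. The variable name <italic>betas</italic> is hypothetical, and the original simulations were run in R, so this is an illustration of the spacing rather than the code that was used.</p>
<preformat>
```python
import numpy as np

# 50 values from 0.01 to 100, evenly spaced on a log10 scale.
betas = np.logspace(-2, 2, num=50)

# Consecutive values differ by a constant ratio rather than a constant step.
ratios = betas[1:] / betas[:-1]
```
</preformat>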
<p>We conducted statistical analyses on the simulated performances from both BR-DIVA and DIVA. All analyses were done on accuracy (proportion correct), though plots show error rate (proportion incorrect) to facilitate easy comparison with previous studies of the Six Problems. To test the extent to which BR-DIVA&#x00027;s category learning reflects the order of difficulty observed in the Six Problems, we ran a simple linear regression, predicting aggregated performance (overall mean accuracy) from problem type and &#x003B2; parameter value. We ran the same tests for performance from the DIVA model (without the &#x003B2; parameter predictor). We ran <italic>post-hoc</italic> paired samples <italic>t</italic>-tests when necessary. To compare performance to empirical data, we obtained public datasets deposited in the R package <italic>sixproblems</italic>. These datasets are from Nosofsky et al. (<xref ref-type="bibr" rid="B43">1994a</xref>) and Lewandowsky (<xref ref-type="bibr" rid="B30">2011</xref>), and we will refer to these datasets as nosofsky94 and lewandowsky11 for simplicity. We briefly describe these datasets below.</p>
<p>After comparing overall performance, we evaluated differences in performance over time (learning curves) between BR-DIVA and DIVA. We conducted simple linear regression models predicting accuracies from block and type for both models.</p>
<p>Nosofsky94 is comprised of 120 participants. Each participant performed two problem types and each problem type was administered an equal number of times. Thus, there were 40 participants assigned to each problem type. The order of problem type assignment to each participant was counterbalanced. The first two blocks comprised one showing of each of the eight stimuli and all subsequent blocks comprised two showings of each of the eight stimuli. Participants continued the task until reaching a criterion of four consecutive sub-blocks of eight stimuli with perfect accuracy or for a maximum of 25 blocks.</p>
<p>Lewandowsky11 is comprised of 113 participants, who each did all six problem types in counterbalanced order. Each problem type was studied for a maximum of 12 blocks, where each block featured 2 showings of each of the eight stimuli. Study was terminated if accuracy was perfect for two consecutive blocks.</p>
<p>To compare learning curves predicted by BR-DIVA with observed data, we ran a mixed effects linear regression model using the <italic>lmer</italic> function from R&#x00027;s lmerTest package. This model predicted accuracy from problem type (3&#x02013;5), block, and their interaction. We also included subject ID and source dataset (Nosofsky94 or Lewandowsky11) as random effects. For effects of problem type, Type 5 was entered into the model as the reference group. Thus, positive coefficients for Types 3 and 4 indicate higher accuracy than Type 5, and vice versa.</p></sec>
<sec sec-type="results" id="s4">
<title>Results</title>
<sec>
<title>Order of difficulty</title>
<p>The relative ease of acquisition of category knowledge across the Six Problems introduced in Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>) was tested in the boundedly rational model proposed here. We first ran a simple linear regression, predicting average proportion of correct responses (across simulated subjects and blocks) from type (1&#x02013;6) and &#x003B2;. Please note that all &#x003B2;<italic>s</italic> with associated <italic>p</italic>-values below are referring to regression coefficients and not the model parameter. This reveals significant main effects of all types (&#x003B2;<sub>1 &#x02212; 2</sub> = &#x02212;0.09, <italic>p</italic> &#x0003C; 0.001; &#x003B2;<sub>1 &#x02212; 3</sub> = &#x02212;0.09, <italic>p</italic> &#x0003C; 0.001; &#x003B2;<sub>1 &#x02212; 4</sub> = &#x02212;0.08, <italic>p</italic> &#x0003C; 0.001; &#x003B2;<sub>1 &#x02212; 5</sub> = &#x02212;0.1, <italic>p</italic> &#x0003C; 0.001; &#x003B2;<sub>1 &#x02212; 6</sub> = &#x02212;0.4, <italic>p</italic> &#x0003C; 0.001). Moreover, visual inspection of <xref ref-type="fig" rid="F3">Figure 3A</xref> tells us that performance follows the order of difficulty typically observed. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 1</xref> additionally shows that BR-DIVA, like the original DIVA, can capture the revised ordering of the Six Problems, as elucidated in Kurtz et al. (<xref ref-type="bibr" rid="B29">2013</xref>). Further, the BR-DIVA model performance remains relatively stable across all tested &#x003B2; values, at least at the aggregated level (<xref ref-type="fig" rid="F3">Figure 3A</xref>). 
Paired-samples <italic>t</italic>-tests showed that BR-DIVA predicts lower accuracy than DIVA on Type 2 [<italic>t</italic><sub>(99)</sub> = &#x02212;3.17, <italic>p</italic> = 0.002] and, more prominently, Type 5 [<italic>t</italic><sub>(99)</sub> = &#x02212;5.76, <italic>p</italic> &#x0003C; 0.001], and predicts significantly higher accuracy than DIVA on Type 4 [<italic>t</italic><sub>(99)</sub> = 4.20, <italic>p</italic> &#x0003C; 0.001]. All other <italic>ps</italic> &#x0003E; 0.402. Overall sums of squared differences between the error probabilities observed in Nosofsky94/Lewandowsky11 and those predicted by BR-DIVA and DIVA are reported in <xref ref-type="supplementary-material" rid="SM1">Supplementary Table 1</xref>.</p>
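<p>As a minimal sketch of the paired-samples comparison described above, the following Python snippet computes a paired <italic>t</italic> statistic from matched accuracies. It assumes that each BR-DIVA run is paired with one DIVA run (the 99 degrees of freedom are consistent with 100 such pairs). The function name <monospace>paired_t</monospace> and the example values are hypothetical and are not the simulated data analyzed here.</p>
<preformat>
```python
import math
import statistics


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation (n - 1 denominator) of the pairwise differences.
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies for four matched simulation runs (not the paper's data).
br_diva = [0.80, 0.78, 0.83, 0.79]
diva = [0.84, 0.83, 0.85, 0.84]
t_stat, df = paired_t(br_diva, diva)
print(f"t({df}) = {t_stat:.2f}")
```
</preformat>
<p>A negative statistic here indicates lower accuracy for the first model than for the second; the <italic>p</italic>-value would then be read from a <italic>t</italic> distribution with the returned degrees of freedom.</p>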
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Overall model performance on the Six Problems. <bold>(A)</bold> The x-axis denotes problem type and the y-axis denotes the overall mean performance. Colored lines are BR-DIVA predictions assuming &#x003B2; values ranging from 0.01 (light blue) to 100 (orange). The dashed line shows predictions made by DIVA. Xs are empirically observed performances from Lewandowsky (<xref ref-type="bibr" rid="B30">2011</xref>) and &#x0002B;s are empirically observed performances from Nosofsky et al. (<xref ref-type="bibr" rid="B43">1994a</xref>). <bold>(B)</bold> Overall mean accuracy predicted by both BR-DIVA (blue) and DIVA (pink) and observed performance from participants from both Lewandowsky (<xref ref-type="bibr" rid="B30">2011</xref>) and Nosofsky et al. (<xref ref-type="bibr" rid="B43">1994a</xref>). Dots represent individual participants or simulated participants. Error bars are &#x000B1;SEM. <sup>&#x0002A;</sup><italic>p</italic> &#x0003C; 0.05. NS, <italic>p</italic> &#x0003E; 0.05.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-15-1477514-g0003.tif"/>
</fig>
<p>To determine whether these unique predictions made by BR-DIVA reflect empirical performance better than DIVA&#x00027;s, we compared performance to that reported in Nosofsky et al. (<xref ref-type="bibr" rid="B43">1994a</xref>) and Lewandowsky (<xref ref-type="bibr" rid="B30">2011</xref>). We ran six two-sample <italic>t</italic>-tests, comparing simulated performances by BR-DIVA and DIVA on Types 2, 4, and 5 with subject averages from Nosofsky et al. and Lewandowsky on the same problems. We collapsed across both datasets, but running the analyses on each dataset separately supports the same conclusions. Both models predict significantly better accuracy on Type 4 than is actually observed [BR-DIVA: <italic>t</italic><sub>(251)</sub> = 5.63, <italic>p</italic> &#x0003C; 0.001; DIVA: <italic>t</italic><sub>(251)</sub> = 4.18, <italic>p</italic> &#x0003C; 0.001]. Intriguingly, however, while DIVA predicts significantly higher categorization accuracy than is actually observed for both Types 2 [<italic>t</italic><sub>(251)</sub> = 2.49, <italic>p</italic> = 0.014] and 5 [<italic>t</italic><sub>(251)</sub> = 2.80, <italic>p</italic> = 0.005], BR-DIVA&#x00027;s predictions do not differ significantly from observed performance [Type 2: <italic>t</italic><sub>(251)</sub> = 1.50, <italic>p</italic> = 0.135; Type 5: <italic>t</italic><sub>(251)</sub> = &#x02212;1.38, <italic>p</italic> = 0.170]. Thus, at the aggregate level, BR-DIVA makes many of the same predictions as DIVA with respect to the Six Problems, as is to be expected given that BR-DIVA is a variational version of DIVA. However, BR-DIVA&#x00027;s aggregate predictions for Types 2 and 5 do not differ significantly from what is empirically observed in people, whereas DIVA&#x00027;s do (assuming all shared parameters are the same across models; <xref ref-type="fig" rid="F3">Figure 3B</xref>).</p>
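<p>The two-sample comparisons can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used here: it assumes Student&#x00027;s pooled-variance <italic>t</italic>-test (suggested by the integer degrees of freedom) and group sizes of 100 simulated learners and 113 plus 40 human participants per problem type, which would be consistent with 251 degrees of freedom. The function name <monospace>pooled_t</monospace> and the accuracy values are hypothetical.</p>
<preformat>
```python
import math
import statistics


def pooled_t(a, b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Assumed group sizes: 100 simulated learners; 113 (Lewandowsky11) plus
# 40 (Nosofsky94) human participants contributing to each problem type.
n_model, n_human = 100, 113 + 40
print(n_model + n_human - 2)  # prints 251 under these assumptions

# Hypothetical accuracies (not the paper's data).
model_acc = [0.90, 0.85, 0.88, 0.92]
human_acc = [0.80, 0.82, 0.78, 0.84, 0.81]
t_stat, df = pooled_t(model_acc, human_acc)
print(f"t({df}) = {t_stat:.2f}")
```
</preformat>
<p>With the model entered first, a positive statistic indicates that the model predicts higher accuracy than is observed.</p>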
</sec>
<sec>
<title>Learning curves</title>
<p>To obtain a finer-grained perspective of category learning, we next looked at the learning curves for BR-DIVA. We found that BR-DIVA learns at a similar rate to DIVA for Types 1, 2, 3, and 4, and that learning is relatively stable across different values for &#x003B2;. For Type 5, BR-DIVA and DIVA clearly make different predictions (by the final block, BR-DIVA&#x00027;s best performance, across &#x003B2;<italic>s</italic>, was 96% accuracy, which DIVA surpasses on the 13th block; <xref ref-type="fig" rid="F4">Figure 4A</xref>). Moreover, DIVA&#x00027;s learning curve for Type 6 appears to fluctuate more erratically than BR-DIVA&#x00027;s. To follow up on these observations, we ran two linear regression models, predicting model accuracy on either Type 5 or Type 6 from block (1&#x02013;20), model (BR-DIVA, DIVA), and their interaction. Please note that all &#x003B2;<italic>s</italic> with associated <italic>p</italic>-values below refer to regression coefficients and not the model parameter.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>BR-DIVA and observed data suggest Type 5 is harder to learn than Types 3 and 4. <bold>(A)</bold> Learning curves predicted by BR-DIVA (colored lines) and DIVA (dashed line). Colored lines are BR-DIVA predictions assuming &#x003B2; values ranging from 0.01 (light blue) to 100 (orange). <bold>(B)</bold> Coefficients for predictors denoted along the x-axis from linear regression model predicting accuracy across blocks for the BR-DIVA model. Both circled coefficients are significantly different from 0 (to reiterate, for the effects of problem type, Type 5 was entered into the model as the reference group). <bold>(C)</bold> Coefficients for predictors denoted along the x-axis from a mixed effects model predicting accuracy across blocks for data obtained from Lewandowsky (<xref ref-type="bibr" rid="B30">2011</xref>) and Nosofsky et al. (<xref ref-type="bibr" rid="B43">1994a</xref>). Regression coefficients for Type 3 and Type 4 are significantly different from zero. Error bars are &#x000B1;SEM. <bold>(D)</bold> Conceptual schematic explaining why BR-DIVA predicts worse performance on Type 5 than Types 3 and 4 (which also explains its better Type 4 than Type 3 prediction). The black distribution on the left represents a prior distribution for a category representation held in the bottleneck layer. This distribution is assumed to be learned by rule acquisition, as Types 3&#x02013;5 all adhere to a rule-plus-exception category structure. Given that this distribution is a category representation, learning the exception stimulus for each of these types will require this distribution to expand to incorporate the exception. As such, learning the exception stimulus should be a function of its distance from the prior distribution (in the plot, distance along the x-axis). 
Type 4&#x00027;s exception is closest to its rule-followers, Type 3&#x00027;s exception is second-closest to its rule-followers, and Type 5&#x00027;s exception is furthest from its rule-followers.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-15-1477514-g0004.tif"/>
</fig>
<p>The regression model predicting Type 5 performance showed only a main effect of block (&#x003B2; = 0.02, <italic>p</italic> &#x0003C; 0.001; all other <italic>ps</italic> &#x0003E; 0.177), meaning both models successfully learned the category structure over time. Similarly, the regression model predicting Type 6 performance showed a main effect of block (&#x003B2; = 0.02, <italic>p</italic> &#x0003C; 0.001), but also a marginal effect of model [&#x003B2;(<italic>BRDIVA</italic>&#x02212;<italic>DIVA</italic>) = 0.02, <italic>p</italic> = 0.064]. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 2</xref> shows learning curve predictions for BR-DIVA at all tested &#x003B2;s and DIVA.</p>
<p>Given the consistent differences in Type 5 performance between BR-DIVA and DIVA (<xref ref-type="fig" rid="F3">Figures 3B</xref>, <xref ref-type="fig" rid="F4">4A</xref>), we ran an additional test to try to formulate a specific prediction that could guide future empirical research. Many studies on the Six Problems focus on Types 1, 2, 4, and 6 only (Kurtz et al., <xref ref-type="bibr" rid="B29">2013</xref>; Love, <xref ref-type="bibr" rid="B32">2002</xref>; Love and Markman, <xref ref-type="bibr" rid="B33">2003</xref>; Minda et al., <xref ref-type="bibr" rid="B36">2008</xref>; Rabi and Minda, <xref ref-type="bibr" rid="B49">2016</xref>; Rehder and Hoffman, <xref ref-type="bibr" rid="B50">2005a</xref>), likely because Types 3, 4, and 5 tend to be lumped together due to similar performance on these problems (Nosofsky et al., <xref ref-type="bibr" rid="B43">1994a</xref>; Shepard et al., <xref ref-type="bibr" rid="B58">1961</xref>). Against this background, it is perhaps notable that BR-DIVA predicted worse performance on Type 5 than DIVA and that BR-DIVA captured the empirical data for this category structure better. Therefore, we ran an additional linear regression model, predicting BR-DIVA accuracies on Types 3, 4, and 5 from block (1&#x02013;20), type (3&#x02013;5), &#x003B2;<italic>s</italic>, and all interactions. Indeed, this model showed that Type 5 accuracy was significantly lower than accuracy on both Type 3 (&#x003B2;<sub>3 &#x02212; 5</sub> = 0.14, <italic>p</italic> &#x0003C; 0.001) and Type 4 (&#x003B2;<sub>4 &#x02212; 5</sub> = 0.14, <italic>p</italic> &#x0003C; 0.001). This model also revealed significant Type 3 &#x000D7; block (&#x003B2;<sub>3, <italic>block</italic>&#x02212;5, <italic>block</italic></sub> = &#x02212;0.006, <italic>p</italic> &#x0003C; 0.001) and Type 4 &#x000D7; block (&#x003B2;<sub>4, <italic>block</italic>&#x02212;5, <italic>block</italic></sub> = &#x02212;0.003, <italic>p</italic> &#x0003C; 0.001) interactions, such that learning curves were steeper for Type 5. 
See <xref ref-type="fig" rid="F4">Figure 4B</xref> for all model predictor effects.</p>
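<p>To illustrate how coefficients relative to a reference group are read, the following Python sketch uses treatment (dummy) coding with Type 5 as the reference. It is a simplification under stated assumptions: with problem type as the only predictor, the least-squares intercept equals the reference-group mean and each remaining coefficient equals that group&#x00027;s mean minus the reference mean. The regression reported here also includes block, &#x003B2;, and interactions, which this sketch omits. The function name <monospace>treatment_contrasts</monospace> and the accuracy values are hypothetical.</p>
<preformat>
```python
import statistics

# Hypothetical per-run accuracies by problem type (not the paper's data).
accuracy = {
    3: [0.90, 0.88, 0.92],
    4: [0.91, 0.93, 0.89],
    5: [0.76, 0.78, 0.74],
}
REFERENCE = 5


def treatment_contrasts(groups, reference):
    """Intercept and coefficients of an intercept-plus-dummies OLS fit.

    With a single categorical predictor, the least-squares solution is the
    reference-group mean (intercept) and each other group's mean minus it.
    """
    intercept = statistics.mean(groups[reference])
    coefs = {
        level: statistics.mean(values) - intercept
        for level, values in groups.items()
        if level != reference
    }
    return intercept, coefs


intercept, coefs = treatment_contrasts(accuracy, REFERENCE)
for level, coef in sorted(coefs.items()):
    print(f"Type {level} - Type {REFERENCE}: {coef:+.2f}")
```
</preformat>
<p>Positive values indicate higher accuracy than the reference group, so changing the reference type flips the sign of the corresponding contrast.</p>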
<p>To test the extent to which these unique predictions made by BR-DIVA are reflected in the real world, we ran a linear mixed effects model, predicting correct responses by participants from two previously collected datasets (Lewandowsky, <xref ref-type="bibr" rid="B30">2011</xref>; Nosofsky et al., <xref ref-type="bibr" rid="B43">1994a</xref>) from type (3&#x02013;5), block, and their interaction. We also included subject IDs and which dataset the data came from as random effects. As was expected, there was a main effect of block (&#x003B2;<sub><italic>block</italic></sub> = 0.02, <italic>p</italic> &#x0003C; 0.001); however, consistent with the predictions made by BR-DIVA, there were also main effects of Type 3 (&#x003B2;<sub>3 &#x02212; 5</sub> = 0.04, <italic>p</italic> &#x0003C; 0.001) and Type 4 (&#x003B2;<sub>4 &#x02212; 5</sub> = 0.02, <italic>p</italic> = 0.034). Interactions between block and Types 3 and 4 were not statistically significant (both |&#x003B2;s| &#x0003C; 0.001, both <italic>ps</italic> &#x0003E; 0.604). See <xref ref-type="fig" rid="F4">Figure 4C</xref> for all model predictor effects. <xref ref-type="fig" rid="F4">Figure 4D</xref> shows a schematic meant to visualize a plausible explanation for these results, which is further expounded upon in the discussion. Additionally, <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 3</xref> shows the low-dimensional representations of each category for BR-DIVA, as well as inter-item distances in the low-dimensional space, which reveals that BR-DIVA represents Type 5 exception stimuli as further from rule-following stimuli than for Types 3 and 4 exception stimuli. <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 4</xref> provides further evidence for this notion that Type 5 difficulty is a function of its inter-item distances by visualizing error rates across blocks split into rule-following and exception stimuli. 
Whereas exception stimuli for Types 3 and 4 are learned at a pace similar to that of their rule-following stimuli, error rates for Type 5 exception stimuli remain higher than rule-following error rates until roughly the 15th block. Notably, however, this interpretation is incomplete, as <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 3</xref> shows that low-dimensional representations of Type 4 exception stimuli are further from rule-following stimuli than Type 3&#x00027;s exception stimuli.</p></sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>Discussion</title>
<p>In this brief report, we simulated performance on the canonical Six Problems known to elucidate general category learning behavior (Shepard et al., <xref ref-type="bibr" rid="B58">1961</xref>) using an autoencoder model that applies principles of efficient coding (Barlow, <xref ref-type="bibr" rid="B4">2013</xref>) to encode information in a boundedly rational manner. We showed that this model&#x02014;BR-DIVA&#x02014;captures the classical order of difficulty observed on the Six Problems (Nosofsky et al., <xref ref-type="bibr" rid="B43">1994a</xref>; Shepard et al., <xref ref-type="bibr" rid="B58">1961</xref>). Beyond these findings, the boundedly rational model proposed here predicted lower accuracy on Type 5 than what is predicted by the autoencoding model it is based on. Importantly, we found that this unique prediction is more closely aligned with empirical data than that of the base model. We discuss and speculate on this finding next.</p>
<sec>
<title>Type 5 is more difficult than Types 3 and 4</title>
<p>The classical Six Problems of category learning introduced in Shepard et al. (<xref ref-type="bibr" rid="B58">1961</xref>) produced substantial excitement about Types 1, 2, 4, and sometimes 6. Many studies that use the Six Problems focus only on this subset (Kurtz et al., <xref ref-type="bibr" rid="B29">2013</xref>; Love, <xref ref-type="bibr" rid="B32">2002</xref>; Minda et al., <xref ref-type="bibr" rid="B36">2008</xref>; Rabi and Minda, <xref ref-type="bibr" rid="B49">2016</xref>; Rehder and Hoffman, <xref ref-type="bibr" rid="B51">2005b</xref>). Since the findings from Shepard and colleagues, there has been a tendency to lump performance on Types 3&#x02013;5 together, as if they were the same category structures. Indeed, they do all adhere to a rule-plus-exception design (Nosofsky et al., <xref ref-type="bibr" rid="B44">1994b</xref>); however, it is perhaps notable that the boundedly rational model put forth in the current paper consistently predicted worse performance on Type 5 than Types 3 and 4. This prediction did not reach statistical significance in the model on which the boundedly rational model is based (i.e., DIVA). When comparing BR-DIVA and DIVA to empirically observed performance differences between Type 5 and Types 3 and 4, we found that the data are more consistent with BR-DIVA&#x00027;s predictions.</p>
<p>One possible explanation for this discrepancy is in terms of information gain, which expresses the amount of information gained about a signal by observing another variable (Mathy, <xref ref-type="bibr" rid="B35">2010</xref>). For example, by learning the weather one is likely better able to gauge what clothes a random person will be wearing. Thus, knowing the weather reduces one&#x00027;s uncertainty about what clothes people will be wearing. In terms of the Six Problems, information gain is relevant because it denotes the amount of information a given stimulus supplies about the categories. This notion is particularly important for rule-plus-exception category structures because it is assumed that people will learn the unidimensional rule first (<xref ref-type="fig" rid="F4">Figure 4D</xref>, black distribution), in which case learning of the exception stimulus (<xref ref-type="fig" rid="F4">Figure 4D</xref>, colored distributions) is a function of how distinct it is from the rule-following stimuli (<xref ref-type="fig" rid="F4">Figure 4D</xref>, distance between black and colored distributions). In other words, learning a rule first to categorize stimuli will induce a bias toward the rule-following stimuli. As such, the more distinct (i.e., the more informative or the further from the bias) the exception stimulus is, the harder it will be to learn. Consistent with this interpretation, the exception stimulus in Type 5 has a larger average distance from Type 5&#x00027;s rule-following stimuli than do the exception stimuli in Types 3 or 4. This within-category distance measure is proportional to a commonly used metric known as <italic>structure ratios</italic> (Conaway and Kurtz, <xref ref-type="bibr" rid="B13">2017</xref>). This interpretation is also in line with the RULEX model of Nosofsky et al. (<xref ref-type="bibr" rid="B44">1994b</xref>), which suggests that people test simple rules first and gradually hypothesize more complex rules if the simpler ones fail. 
In <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 3</xref>, the hidden unit activations for each of the eight stimuli in Type 5 from a representative simulation are plotted and visualized based on both category and whether the stimulus adhered to a unidimensional rule or not. Interestingly, this figure shows that exception stimuli are represented as distant from rule-following stimuli within the same category (e.g., compare inter-item distances between red triangles and red circle, and between blue triangles and blue circle). <xref ref-type="supplementary-material" rid="SM1">Supplementary Figure 3</xref> also shows that these distances are significantly greater than rule-to-exception stimulus distances for Types 3 and 4. Thus, the low-dimensional representations of stimuli are consistent with the interpretation visualized in <xref ref-type="fig" rid="F4">Figure 4D</xref>. Together, these observations highlight the importance of priors during the process of learning categories and suggest that the difficulty of at least some category structures is a function of balancing representational precision with complexity.</p>
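<p>The distance argument can be made concrete with a small Python sketch that computes the average Hamming distance in stimulus space between the exception and its rule-following stimuli for one category of each type. The binary stimulus codes below are an assumption: they use a common three-bit coding of the Six Problems with a rule on the first dimension, and they may differ from the coding used in the simulations. The distances are computed on the input features, not on BR-DIVA&#x00027;s learned representations, and all names are hypothetical.</p>
<preformat>
```python
from statistics import mean

# Assumed binary coding of one category per problem type: three stimuli that
# follow a rule on the first dimension (first bit 0) plus one exception.
STRUCTURES = {
    3: {"rule": ["000", "001", "010"], "exception": "101"},
    4: {"rule": ["000", "001", "010"], "exception": "100"},
    5: {"rule": ["000", "001", "010"], "exception": "111"},
}


def hamming(a, b):
    """Number of feature dimensions on which two stimuli differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_exception_distance(structure):
    """Average distance from the exception to its rule-following stimuli."""
    return mean(hamming(structure["exception"], s) for s in structure["rule"])


distances = {t: mean_exception_distance(s) for t, s in STRUCTURES.items()}
# Report the types from the closest to the furthest exception.
for problem_type, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"Type {problem_type}: {d:.2f}")
```
</preformat>
<p>Under this assumed coding, the exception is closest to its rule-followers for Type 4, intermediate for Type 3, and furthest for Type 5, which matches the ordering depicted in <xref ref-type="fig" rid="F4">Figure 4D</xref>.</p>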
</sec>
<sec>
<title>Limitations</title>
<p>The current work is not meant to encompass all categorization phenomena. Indeed, the current work tested only one category learning paradigm (i.e., classification), composed of relatively simple stimuli. The simplicity of the stimuli limits the amount of dimensionality reduction that could be performed by BR-DIVA in the current work, given that stimuli were three-dimensional and the bottleneck layer was two-dimensional. This could also be why there was no difference across simulations with different &#x003B2;<italic>s</italic>. Future work will need to test BR-DIVA&#x00027;s applicability to higher dimensional, naturalistic, and continuous stimuli, in addition to other paradigms, such as inference training and function learning. The current work was meant to take the first steps toward broader applications, and thus, we generated BR-DIVA predictions and compared them with empirical data. Future studies will need to pit BR-DIVA against leading computational models of categorization such as SUSTAIN (Love et al., <xref ref-type="bibr" rid="B34">2004</xref>), and prototype and exemplar models (Minda and Smith, <xref ref-type="bibr" rid="B37">2002</xref>; Nosofsky, <xref ref-type="bibr" rid="B40">1986</xref>, <xref ref-type="bibr" rid="B41">1987</xref>, <xref ref-type="bibr" rid="B42">1992</xref>; Smith and Minda, <xref ref-type="bibr" rid="B59">2000</xref>, <xref ref-type="bibr" rid="B60">2002</xref>). Moreover, while we did observe differences between Type 5 and Types 3 and 4 in empirical data, the analysis revealing this difference was targeted, using only a subset of the overall dataset (Types 3&#x02013;5). As such, and in combination with many previous studies showing minimal performance discrepancies between these category structures, it is likely that this effect is quite subtle, and future studies will need to test this prediction explicitly before any conclusive interpretations can be made.</p></sec>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="author-contributions" id="s7">
<title>Author contributions</title>
<p>TH: Writing &#x02013; review &#x00026; editing, Writing &#x02013; original draft.</p>
</sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s10">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2024.1477514/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpsyg.2024.1477514/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.DOCX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/></sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alemi</surname> <given-names>A. A.</given-names></name> <name><surname>Fischer</surname> <given-names>I.</given-names></name> <name><surname>Dillon</surname> <given-names>J. V.</given-names></name> <name><surname>Murphy</surname> <given-names>K.</given-names></name></person-group> (<year>2017a</year>). <article-title>&#x0201C;Deep variational information bottleneck,&#x0201D;</article-title> in <source>5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings</source>. arXiv:1612.00410v2.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alemi</surname> <given-names>A. A.</given-names></name> <name><surname>Poole</surname> <given-names>B.</given-names></name> <name><surname>Fischer</surname> <given-names>I.</given-names></name> <name><surname>Dillon</surname> <given-names>J. V.</given-names></name> <name><surname>Saurous</surname> <given-names>R. A.</given-names></name> <name><surname>Murphy</surname> <given-names>K.</given-names></name></person-group> (<year>2017b</year>). <source>Information Theoretic Analysis of Deep Latent Variable Models. arXiv [Preprint]</source>. arXiv:1711.00464v3.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Azeredo da Silveira</surname> <given-names>R.</given-names></name> <name><surname>Sung</surname> <given-names>Y.</given-names></name> <name><surname>Woodford</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Optimally imprecise memory and biased forecasts</article-title>. <source>SSRN Electr. J</source>. <volume>2021</volume>:<fpage>3731244</fpage>. <pub-id pub-id-type="doi">10.2139/ssrn.3731244</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barlow</surname> <given-names>H. B.</given-names></name></person-group> (<year>2013</year>). <article-title>Possible principles underlying the transformations of sensory messages</article-title>. <source>Sens. Commun</source>. <volume>3</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.7551/mitpress/9780262518420.003.0013</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barretto-Garc&#x000ED;a</surname> <given-names>M.</given-names></name> <name><surname>de Hollander</surname> <given-names>G.</given-names></name> <name><surname>Grueschow</surname> <given-names>M.</given-names></name> <name><surname>Polan&#x000ED;a</surname> <given-names>R.</given-names></name> <name><surname>Woodford</surname> <given-names>M.</given-names></name> <name><surname>Ruff</surname> <given-names>C. C.</given-names></name></person-group> (<year>2023</year>). <article-title>Individual risk attitudes arise from noise in neurocognitive magnitude representations</article-title>. <source>Nat. Hum. Behav.</source> <volume>7</volume>:<fpage>4</fpage>. <pub-id pub-id-type="doi">10.1038/s41562-023-01643-4</pub-id><pub-id pub-id-type="pmid">37460762</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bates</surname> <given-names>C. J.</given-names></name> <name><surname>Jacobs</surname> <given-names>R. A.</given-names></name></person-group> (<year>2020</year>). <article-title>Efficient data compression in perception and perceptual memory</article-title>. <source>Psychol. Rev</source>. <volume>2020</volume>:<fpage>rev0000197</fpage>. <pub-id pub-id-type="doi">10.1037/rev0000197</pub-id><pub-id pub-id-type="pmid">32324016</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ben-David</surname> <given-names>S.</given-names></name> <name><surname>Schuller</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>&#x0201C;Exploiting task relatedness for multiple task learning,&#x0201D;</article-title> in <source>Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science). Vol</source>. <italic>2777</italic> (Berlin; Heidelberg: Springer).</citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bernardi</surname> <given-names>S.</given-names></name> <name><surname>Benna</surname> <given-names>M. K.</given-names></name> <name><surname>Rigotti</surname> <given-names>M.</given-names></name> <name><surname>Munuera</surname> <given-names>J.</given-names></name> <name><surname>Fusi</surname> <given-names>S.</given-names></name> <name><surname>Daniel Salzman</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>The geometry of abstraction in the hippocampus and prefrontal cortex</article-title>. <source>Cell</source> <volume>183</volume>:<fpage>31</fpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2020.09.031</pub-id><pub-id pub-id-type="pmid">33058757</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bozkurt</surname> <given-names>A.</given-names></name> <name><surname>Esmaeili</surname> <given-names>B.</given-names></name> <name><surname>Tristan</surname> <given-names>J. -B.</given-names></name> <name><surname>Brooks</surname> <given-names>D.</given-names></name> <name><surname>Dy</surname> <given-names>J.</given-names></name> <name><surname>van de Meent</surname> <given-names>J.-W.</given-names></name></person-group> (<year>2021</year>). <article-title>&#x0201C;Rate-regularization and generalization in variational autoencoders,&#x0201D;</article-title> in <source>Proceedings of the 24th International Conference on Artificial Intelligence and Statistics</source>, 130. Retrieved from: <ext-link ext-link-type="uri" xlink:href="https://par.nsf.gov/biblio/10280434">https://par.nsf.gov/biblio/10280434</ext-link></citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Caruana</surname> <given-names>R.</given-names></name></person-group> (<year>1994</year>). <article-title>&#x0201C;Learning many related tasks at the same time with backpropagation,&#x0201D;</article-title> in <source>NIPS 1994: Proceedings of the 7th International Conference on Neural Information Processing Systems</source> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>).</citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Caruana</surname> <given-names>R.</given-names></name></person-group> (<year>1996</year>). <article-title>&#x0201C;Algorithms and applications for multitask learning,&#x0201D;</article-title> in <source>Conference on Machine Learning</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>Morgan Kaufmann Publishers Inc.</publisher-name>), 12.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caruana</surname> <given-names>R.</given-names></name></person-group> (<year>1997</year>). <article-title>Multitask learning</article-title>. <source>Machine Learn.</source> <volume>28</volume>:<fpage>34</fpage>. <pub-id pub-id-type="doi">10.1023/A:1007379606734</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Conaway</surname> <given-names>N.</given-names></name> <name><surname>Kurtz</surname> <given-names>K. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Similar to the category, but not the exemplars: a study of generalization</article-title>. <source>Psychon. Bullet. Rev.</source> <volume>24</volume>:<fpage>1</fpage>. <pub-id pub-id-type="doi">10.3758/s13423-016-1208-1</pub-id><pub-id pub-id-type="pmid">27981437</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Cover</surname> <given-names>T. M.</given-names></name> <name><surname>Thomas</surname> <given-names>J. A.</given-names></name></person-group> (<year>1991</year>). <source>Elements of Information Theory</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dang</surname> <given-names>W.</given-names></name> <name><surname>Jaffe</surname> <given-names>R. J.</given-names></name> <name><surname>Qi</surname> <given-names>X. L.</given-names></name> <name><surname>Constantinidis</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Emergence of non-linear mixed selectivity in prefrontal cortex after training</article-title>. <source>J. Neurosci.</source> <volume>41</volume>:<fpage>20</fpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.2814-20.2021</pub-id><pub-id pub-id-type="pmid">34301827</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Driscoll</surname> <given-names>L. N.</given-names></name> <name><surname>Shenoy</surname> <given-names>K.</given-names></name> <name><surname>Sussillo</surname> <given-names>D.</given-names></name></person-group> (<year>2024</year>). <article-title>Flexible multitask computation in recurrent networks utilizes shared dynamical motifs</article-title>. <source>Nat. Neurosci.</source> <volume>27</volume>:<fpage>6</fpage>. <pub-id pub-id-type="doi">10.1038/s41593-024-01668-6</pub-id><pub-id pub-id-type="pmid">38982201</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garner</surname> <given-names>K. G.</given-names></name> <name><surname>Dux</surname> <given-names>P. E.</given-names></name></person-group> (<year>2023</year>). <article-title>Knowledge generalization and the costs of multitasking</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>24</volume>:<fpage>653</fpage>. <pub-id pub-id-type="doi">10.1038/s41583-022-00653-x</pub-id><pub-id pub-id-type="pmid">36347942</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gershman</surname> <given-names>S. J.</given-names></name> <name><surname>Daw</surname> <given-names>N. D.</given-names></name></person-group> (<year>2017</year>). <article-title>Reinforcement learning and episodic memory in humans and animals: an integrative framework</article-title>. <source>Ann. Rev. Psychol.</source> <volume>68</volume>:<fpage>33625</fpage>. <pub-id pub-id-type="doi">10.1146/annurev-psych-122414-033625</pub-id><pub-id pub-id-type="pmid">27618944</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Goldstone</surname> <given-names>R. L.</given-names></name> <name><surname>Kersten</surname> <given-names>A.</given-names></name> <name><surname>Carvalho</surname> <given-names>P. F.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Categorization and concepts,&#x0201D;</article-title> in <source>Stevens&#x00027; Handbook of Experimental Psychology and Cognitive Neuroscience</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Wiley</publisher-name>).</citation>
</ref>
<ref id="B20">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Higgins</surname> <given-names>I.</given-names></name> <name><surname>Matthey</surname> <given-names>L.</given-names></name> <name><surname>Pal</surname> <given-names>A.</given-names></name> <name><surname>Burgess</surname> <given-names>C.</given-names></name> <name><surname>Glorot</surname> <given-names>X.</given-names></name> <name><surname>Botvinick</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>&#x0201C;&#x003B2;-VAE: learning basic visual concepts with a constrained variational framework,&#x0201D;</article-title> in <source>5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings</source>. Available at: <ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=Sy2fzU9gl">https://openreview.net/forum?id=Sy2fzU9gl</ext-link></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jeffrey</surname> <given-names>J. W.</given-names></name> <name><surname>Palmer</surname> <given-names>S. E.</given-names></name> <name><surname>Freedman</surname> <given-names>D. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Nonlinear mixed selectivity supports reliable neural computation</article-title>. <source>PLoS Comput. Biol.</source> <volume>16</volume>:<fpage>1007544</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1007544</pub-id><pub-id pub-id-type="pmid">32069273</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaufman</surname> <given-names>M. T.</given-names></name> <name><surname>Benna</surname> <given-names>M. K.</given-names></name> <name><surname>Rigotti</surname> <given-names>M.</given-names></name> <name><surname>Stefanini</surname> <given-names>F.</given-names></name> <name><surname>Fusi</surname> <given-names>S.</given-names></name> <name><surname>Churchland</surname> <given-names>A. K.</given-names></name></person-group> (<year>2022</year>). <article-title>The implications of categorical and category-free mixed selectivity on representational geometries</article-title>. <source>Curr. Opin. Neurobiol.</source> <volume>77</volume>:<fpage>102644</fpage>. <pub-id pub-id-type="doi">10.1016/j.conb.2022.102644</pub-id><pub-id pub-id-type="pmid">36332415</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Salimans</surname> <given-names>T.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Variational dropout and the local reparameterization trick,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>).</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Welling</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>An introduction to variational autoencoders</article-title>. <source>Found. Trends Machine Learn.</source> <volume>12</volume>:<fpage>56</fpage>. <pub-id pub-id-type="doi">10.1561/2200000056</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kira</surname> <given-names>S.</given-names></name> <name><surname>Safaai</surname> <given-names>H.</given-names></name> <name><surname>Morcos</surname> <given-names>A. S.</given-names></name> <name><surname>Panzeri</surname> <given-names>S.</given-names></name> <name><surname>Harvey</surname> <given-names>C. D.</given-names></name></person-group> (<year>2023</year>). <article-title>A distributed and efficient population code of mixed selectivity neurons for flexible navigation decisions</article-title>. <source>Nat. Commun.</source> <volume>14</volume>:<fpage>2</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-023-37804-2</pub-id><pub-id pub-id-type="pmid">37055431</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolen</surname> <given-names>J. F.</given-names></name> <name><surname>Pollack</surname> <given-names>J. B.</given-names></name></person-group> (<year>1990</year>). <article-title>Back propagation is sensitive to initial conditions</article-title>. <source>Compl. Syst.</source> <volume>4</volume>, <fpage>269</fpage>&#x02013;<lpage>280</lpage>.</citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kurtz</surname> <given-names>K. J.</given-names></name></person-group> (<year>2007</year>). <article-title>The Divergent Autoencoder (DIVA) model of category learning</article-title>. <source>Psychon. Bullet. Rev.</source> <volume>14</volume>:<fpage>560</fpage>&#x02013;<lpage>576</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196806</pub-id><pub-id pub-id-type="pmid">17972718</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kurtz</surname> <given-names>K. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Human category learning: toward a broader explanatory account</article-title>. <source>Psychol. Learn. Motivat.</source> <volume>63</volume>:<fpage>77</fpage>&#x02013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1016/bs.plm.2015.03.001</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kurtz</surname> <given-names>K. J.</given-names></name> <name><surname>Levering</surname> <given-names>K. R.</given-names></name> <name><surname>Stanton</surname> <given-names>R. D.</given-names></name> <name><surname>Romero</surname> <given-names>J.</given-names></name> <name><surname>Morris</surname> <given-names>S. N.</given-names></name></person-group> (<year>2013</year>). <article-title>Human learning of elemental category structures: revising the classic result of Shepard, Hovland, and Jenkins (1961)</article-title>. <source>J. Exp. Psychol.</source> <volume>39</volume>:<fpage>a0029178</fpage>. <pub-id pub-id-type="doi">10.1037/a0029178</pub-id><pub-id pub-id-type="pmid">22799282</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewandowsky</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Working memory capacity and categorization: individual differences and modeling</article-title>. <source>J. Exp. Psychol.</source> <volume>37</volume>:<fpage>a0022639</fpage>. <pub-id pub-id-type="doi">10.1037/a0022639</pub-id><pub-id pub-id-type="pmid">21417512</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>V.</given-names></name> <name><surname>Casta&#x000F1;&#x000F3;n</surname> <given-names>S. H.</given-names></name> <name><surname>Solomon</surname> <given-names>J. A.</given-names></name> <name><surname>Vandormael</surname> <given-names>H.</given-names></name> <name><surname>Summerfield</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Robust averaging protects decisions from noise in neural computations</article-title>. <source>PLoS Comput. Biol.</source> <volume>13</volume>:<fpage>e1005723</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005723</pub-id><pub-id pub-id-type="pmid">28841644</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Love</surname> <given-names>B. C.</given-names></name></person-group> (<year>2002</year>). <article-title>Comparing supervised and unsupervised category learning</article-title>. <source>Psychon. Bullet. Rev.</source> <volume>9</volume>:<fpage>829</fpage>&#x02013;<lpage>835</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196342</pub-id><pub-id pub-id-type="pmid">12613690</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Love</surname> <given-names>B. C.</given-names></name> <name><surname>Markman</surname> <given-names>A. B.</given-names></name></person-group> (<year>2003</year>). <article-title>The nonindependence of stimulus properties in human category learning</article-title>. <source>Mem. Cogn.</source> <volume>31</volume>:<fpage>790</fpage>&#x02013;<lpage>799</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196117</pub-id><pub-id pub-id-type="pmid">12956243</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Love</surname> <given-names>B. C.</given-names></name> <name><surname>Medin</surname> <given-names>D. L.</given-names></name> <name><surname>Gureckis</surname> <given-names>T. M.</given-names></name></person-group> (<year>2004</year>). <article-title>SUSTAIN: a network model of category learning</article-title>. <source>Psychol. Rev.</source> <volume>111</volume>:<fpage>309</fpage>&#x02013;<lpage>332</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.111.2.309</pub-id><pub-id pub-id-type="pmid">15065912</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathy</surname> <given-names>F.</given-names></name></person-group> (<year>2010</year>). <article-title>Assessing conceptual complexity and compressibility using information gain and mutual information</article-title>. <source>Tutor. Quant. Methods Psychol.</source> <volume>6</volume>, <fpage>16</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.20982/tqmp.06.1.p016</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Minda</surname> <given-names>J. P.</given-names></name> <name><surname>Desroches</surname> <given-names>A. S.</given-names></name> <name><surname>Church</surname> <given-names>B. A.</given-names></name></person-group> (<year>2008</year>). <article-title>Learning rule-described and non-rule-described categories: a comparison of children and adults</article-title>. <source>J. Exp. Psychol.</source> <volume>34</volume>:<fpage>a0013355</fpage>. <pub-id pub-id-type="doi">10.1037/a0013355</pub-id><pub-id pub-id-type="pmid">18980411</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Minda</surname> <given-names>J. P.</given-names></name> <name><surname>Smith</surname> <given-names>J. D.</given-names></name></person-group> (<year>2002</year>). <article-title>Comparing prototype-based and exemplar-based accounts of category learning and attentional allocation</article-title>. <source>J. Exp. Psychol.</source> <volume>28</volume>, <fpage>275</fpage>&#x02013;<lpage>292</lpage>. <pub-id pub-id-type="doi">10.1037//0278-7393.28.2.275</pub-id><pub-id pub-id-type="pmid">11911384</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Monshizadeh</surname> <given-names>M.</given-names></name> <name><surname>Khatri</surname> <given-names>V.</given-names></name> <name><surname>Gamdou</surname> <given-names>M.</given-names></name> <name><surname>Kantola</surname> <given-names>R.</given-names></name> <name><surname>Yan</surname> <given-names>Z.</given-names></name></person-group> (<year>2021</year>). <article-title>Improving data generalization with variational autoencoders for network traffic anomaly detection</article-title>. <source>IEEE Access</source> <volume>9</volume>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3072126</pub-id></citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Niv</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning task-state representations</article-title>. <source>Nat. Neurosci.</source> <volume>22</volume>, <fpage>1544</fpage>&#x02013;<lpage>1553</lpage>. <pub-id pub-id-type="doi">10.1038/s41593-019-0470-8</pub-id><pub-id pub-id-type="pmid">31551597</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nosofsky</surname> <given-names>R. M.</given-names></name></person-group> (<year>1986</year>). <article-title>Attention, similarity, and the identification-categorization relationship</article-title>. <source>J. Exp. Psychol.</source> <volume>115</volume>, <fpage>39</fpage>&#x02013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1037//0096-3445.115.1.39</pub-id><pub-id pub-id-type="pmid">2937873</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nosofsky</surname> <given-names>R. M.</given-names></name></person-group> (<year>1987</year>). <article-title>Attention and learning processes in the identification and categorization of integral stimuli</article-title>. <source>J. Exp. Psychol.</source> <volume>13</volume>, <fpage>87</fpage>&#x02013;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1037//0278-7393.13.1.87</pub-id><pub-id pub-id-type="pmid">2949055</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Nosofsky</surname> <given-names>R. M.</given-names></name></person-group> (<year>1992</year>). <article-title>&#x0201C;Exemplars, prototypes, and similarity rules,&#x0201D;</article-title> in <source>From Learning Theory to Connectionist Theory: Essays in Honor of William K. Estes, Vol. 1</source> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates, Inc.</publisher-name>).</citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nosofsky</surname> <given-names>R. M.</given-names></name> <name><surname>Gluck</surname> <given-names>M. A.</given-names></name> <name><surname>Palmeri</surname> <given-names>T. J.</given-names></name> <name><surname>McKinley</surname> <given-names>S. C.</given-names></name> <name><surname>Glauthier</surname> <given-names>P.</given-names></name></person-group> (<year>1994a</year>). <article-title>Comparing modes of rule-based classification learning: a replication and extension of Shepard, Hovland, and Jenkins (1961)</article-title>. <source>Mem. Cogn.</source> <volume>22</volume>, <fpage>352</fpage>&#x02013;<lpage>362</lpage>. <pub-id pub-id-type="doi">10.3758/BF03200862</pub-id><pub-id pub-id-type="pmid">8007837</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nosofsky</surname> <given-names>R. M.</given-names></name> <name><surname>Palmeri</surname> <given-names>T. J.</given-names></name> <name><surname>McKinley</surname> <given-names>S. C.</given-names></name></person-group> (<year>1994b</year>). <article-title>Rule-plus-exception model of classification learning</article-title>. <source>Psychol. Rev.</source> <volume>101</volume>, <fpage>53</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.101.1.53</pub-id><pub-id pub-id-type="pmid">8121960</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oja</surname> <given-names>E.</given-names></name></person-group> (<year>1989</year>). <article-title>Neural networks, principal components, and subspaces</article-title>. <source>Int. J. Neural Syst.</source> <volume>18</volume>:<fpage>475</fpage>. <pub-id pub-id-type="doi">10.1142/S0129065789000475</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parthasarathy</surname> <given-names>A.</given-names></name> <name><surname>Herikstad</surname> <given-names>R.</given-names></name> <name><surname>Bong</surname> <given-names>J. H.</given-names></name> <name><surname>Medina</surname> <given-names>F. S.</given-names></name> <name><surname>Libedinsky</surname> <given-names>C.</given-names></name> <name><surname>Yen</surname> <given-names>S. C.</given-names></name></person-group> (<year>2017</year>). <article-title>Mixed selectivity morphs population codes in prefrontal cortex</article-title>. <source>Nat. Neurosci.</source> <volume>20</volume>, <fpage>1770</fpage>&#x02013;<lpage>1779</lpage>. <pub-id pub-id-type="doi">10.1038/s41593-017-0003-2</pub-id><pub-id pub-id-type="pmid">29184197</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prat-Carrabin</surname> <given-names>A.</given-names></name> <name><surname>Woodford</surname> <given-names>M.</given-names></name></person-group> (<year>2022</year>). <article-title>Efficient coding of numbers explains decision bias and noise</article-title>. <source>Nat. Hum. Behav.</source> <volume>6</volume>, <fpage>1142</fpage>&#x02013;<lpage>1152</lpage>. <pub-id pub-id-type="doi">10.1038/s41562-022-01352-4</pub-id><pub-id pub-id-type="pmid">35637295</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prat-Carrabin</surname> <given-names>A.</given-names></name> <name><surname>Woodford</surname> <given-names>M.</given-names></name></person-group> (<year>2024</year>). <article-title>Imprecise probabilistic inference from sequential data</article-title>. <source>Psychol. Rev</source>. <volume>131</volume>, <fpage>1161</fpage>&#x02013;<lpage>1207</lpage>. <pub-id pub-id-type="doi">10.1037/rev0000469</pub-id><pub-id pub-id-type="pmid">38635157</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rabi</surname> <given-names>R.</given-names></name> <name><surname>Minda</surname> <given-names>J. P.</given-names></name></person-group> (<year>2016</year>). <article-title>Category learning in older adulthood: a study of the Shepard, Hovland, and Jenkins (1961) Tasks</article-title>. <source>Psychol. Aging</source> <volume>31</volume>, <fpage>185</fpage>&#x02013;<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1037/pag0000071</pub-id><pub-id pub-id-type="pmid">26765750</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rehder</surname> <given-names>B.</given-names></name> <name><surname>Hoffman</surname> <given-names>A. B.</given-names></name></person-group> (<year>2005a</year>). <article-title>Eyetracking and selective attention in category learning</article-title>. <source>Cogn. Psychol.</source> <volume>51</volume>, <fpage>1</fpage>&#x02013;<lpage>41</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogpsych.2004.11.001</pub-id><pub-id pub-id-type="pmid">16039934</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rehder</surname> <given-names>B.</given-names></name> <name><surname>Hoffman</surname> <given-names>A. B.</given-names></name></person-group> (<year>2005b</year>). <article-title>Thirty-something categorization results explained: selective attention, eyetracking, and models of category learning</article-title>. <source>J. Exp. Psychol.</source> <volume>31</volume>, <fpage>811</fpage>&#x02013;<lpage>829</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.31.5.811</pub-id><pub-id pub-id-type="pmid">16248736</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rigotti</surname> <given-names>M.</given-names></name> <name><surname>Barak</surname> <given-names>O.</given-names></name> <name><surname>Warden</surname> <given-names>M. R.</given-names></name> <name><surname>Wang</surname> <given-names>X. J.</given-names></name> <name><surname>Daw</surname> <given-names>N. D.</given-names></name> <name><surname>Miller</surname> <given-names>E. K.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>The importance of mixed selectivity in complex cognitive tasks</article-title>. <source>Nature</source> <volume>497</volume>, <fpage>585</fpage>&#x02013;<lpage>590</lpage>. <pub-id pub-id-type="doi">10.1038/nature12160</pub-id><pub-id pub-id-type="pmid">23685452</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rumelhart</surname> <given-names>D. E.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Williams</surname> <given-names>R. J.</given-names></name></person-group> (<year>1986</year>). <article-title>Learning representations by back-propagating errors</article-title>. <source>Nature</source> <volume>323</volume>, <fpage>533</fpage>&#x02013;<lpage>536</lpage>. <pub-id pub-id-type="doi">10.1038/323533a0</pub-id><pub-id pub-id-type="pmid">37022259</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanh</surname> <given-names>V.</given-names></name> <name><surname>Webson</surname> <given-names>A.</given-names></name> <name><surname>Raffel</surname> <given-names>C.</given-names></name> <name><surname>Bach</surname> <given-names>S. H.</given-names></name> <name><surname>Sutawika</surname> <given-names>L.</given-names></name> <name><surname>Alyafeai</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2022</year>). <article-title>&#x0201C;Multitask prompted training enables zero-shot task generalization,&#x0201D;</article-title> in <source>ICLR 2022 - 10th International Conference on Learning Representations</source>. arXiv:2110.08207v3.</citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shepard</surname> <given-names>R. N.</given-names></name></person-group> (<year>1957</year>). <article-title>Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space</article-title>. <source>Psychometrika</source> <volume>22</volume>, <fpage>325</fpage>&#x02013;<lpage>345</lpage>. <pub-id pub-id-type="doi">10.1007/BF02288967</pub-id><pub-id pub-id-type="pmid">13563763</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shepard</surname> <given-names>R. N.</given-names></name></person-group> (<year>1987</year>). <article-title>Toward a universal law of generalization for psychological science</article-title>. <source>Science</source> <volume>237</volume>, <fpage>1317</fpage>&#x02013;<lpage>1323</lpage>. <pub-id pub-id-type="doi">10.1126/science.3629243</pub-id><pub-id pub-id-type="pmid">3629243</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shepard</surname> <given-names>R. N.</given-names></name></person-group> (<year>1994</year>). <article-title>Perceptual-cognitive universals as reflections of the world</article-title>. <source>Psychon. Bullet. Rev.</source> <volume>1</volume>, <fpage>2</fpage>&#x02013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.3758/BF03200759</pub-id><pub-id pub-id-type="pmid">24203412</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shepard</surname> <given-names>R. N.</given-names></name> <name><surname>Hovland</surname> <given-names>C. I.</given-names></name> <name><surname>Jenkins</surname> <given-names>H. M.</given-names></name></person-group> (<year>1961</year>). <article-title>Learning and memorization of classifications</article-title>. <source>Psychol. Monogr.</source> <volume>75</volume>, <fpage>1</fpage>&#x02013;<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1037/h0093825</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>J. D.</given-names></name> <name><surname>Minda</surname> <given-names>J. P.</given-names></name></person-group> (<year>2000</year>). <article-title>Thirty categorization results in search of a model</article-title>. <source>J. Exp. Psychol.</source> <volume>26</volume>, <fpage>3</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1037//0278-7393.26.1.3</pub-id><pub-id pub-id-type="pmid">10682288</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>J. D.</given-names></name> <name><surname>Minda</surname> <given-names>J. P.</given-names></name></person-group> (<year>2002</year>). <article-title>Distinguishing prototype-based and exemplar-based processes in dot-pattern category learning</article-title>. <source>J. Exp. Psychol.</source> <volume>28</volume>, <fpage>800</fpage>&#x02013;<lpage>811</lpage>. <pub-id pub-id-type="doi">10.1037//0278-7393.28.4.800</pub-id><pub-id pub-id-type="pmid">12109770</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>J. D.</given-names></name> <name><surname>Minda</surname> <given-names>J. P.</given-names></name> <name><surname>Washburn</surname> <given-names>D. A.</given-names></name></person-group> (<year>2004</year>). <article-title>Category learning in rhesus monkeys: a study of the Shepard, Hovland, and Jenkins (1961) tasks</article-title>. <source>J. Exp. Psychol.</source> <volume>133</volume>, <fpage>398</fpage>&#x02013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1037/0096-3445.133.3.398</pub-id><pub-id pub-id-type="pmid">15355146</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spitzer</surname> <given-names>B.</given-names></name> <name><surname>Waschke</surname> <given-names>L.</given-names></name> <name><surname>Summerfield</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Selective overweighting of larger magnitudes during noisy numerical comparison</article-title>. <source>Nat. Hum. Behav.</source> <volume>1</volume>:<fpage>e0145</fpage>. <pub-id pub-id-type="doi">10.1038/s41562-017-0145</pub-id><pub-id pub-id-type="pmid">32340412</pub-id></citation></ref>
<ref id="B63">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Steck</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Autoencoders that don&#x00027;t overfit towards the identity,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>).</citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wards</surname> <given-names>Y.</given-names></name> <name><surname>Ehrhardt</surname> <given-names>S. E.</given-names></name> <name><surname>Filmer</surname> <given-names>H. L.</given-names></name> <name><surname>Mattingley</surname> <given-names>J. B.</given-names></name> <name><surname>Garner</surname> <given-names>K. G.</given-names></name> <name><surname>Dux</surname> <given-names>P. E.</given-names></name></person-group> (<year>2023</year>). <article-title>Neural substrates of individual differences in learning generalization via combined brain stimulation and multitasking training</article-title>. <source>Cerebr. Cortex</source> <volume>33</volume>, <fpage>11679</fpage>&#x02013;<lpage>11694</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/bhad406</pub-id><pub-id pub-id-type="pmid">37930735</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wills</surname> <given-names>A. J.</given-names></name> <name><surname>O&#x00027;Connell</surname> <given-names>G.</given-names></name> <name><surname>Edmunds</surname> <given-names>C. E. R.</given-names></name> <name><surname>Inkster</surname> <given-names>A. B.</given-names></name></person-group> (<year>2017</year>). <article-title>Progress in modeling through distributed collaboration: concepts, tools and category-learning examples</article-title>. <source>Psychol. Learn. Motivat.</source> <volume>66</volume>, <fpage>79</fpage>&#x02013;<lpage>115</lpage>. <pub-id pub-id-type="doi">10.1016/bs.plm.2016.11.007</pub-id></citation>
</ref>
</ref-list>
</back>
</article>