<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="brief-report" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1531334</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2024.1531334</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Perspective</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>AI foundation models for experimental fusion tasks</article-title>
<alt-title alt-title-type="left-running-head">Churchill</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fphy.2024.1531334">10.3389/fphy.2024.1531334</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Churchill</surname>
<given-names>R. Michael</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/2871826/overview"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-original-draft/"/>
<role content-type="https://credit.niso.org/contributor-roles/writing-review-editing/"/>
</contrib>
</contrib-group>
<aff>
<institution>Princeton Plasma Physics Laboratory</institution>, <addr-line>Princeton</addr-line>, <addr-line>NJ</addr-line>, <country>United States</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1921765/overview">Alessandro Maffini</ext-link>, Polytechnic University of Milan, Italy</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/2126340/overview">Riccardo Rossi</ext-link>, University of Rome Tor Vergata, Italy</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: R. Michael Churchill, <email>rchurchi@pppl.gov</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>10</day>
<month>02</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>12</volume>
<elocation-id>1531334</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>11</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>12</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2025 Churchill.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Churchill</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Artificial Intelligence (AI) foundation models, while successful in various domains of language, speech, and vision, have not been adopted in production for fusion energy experiments. This brief paper presents how AI foundation models can be used for fusion energy diagnostics, enabling, for example, visual automated logbooks to provide greater insights into chains of plasma events in a discharge, in time for between-shot analysis.</p>
</abstract>
<kwd-group>
<kwd>fusion energy</kwd>
<kwd>artificial intelligence</kwd>
<kwd>machine learning</kwd>
<kwd>foundation models</kwd>
<kwd>diagnostic</kwd>
</kwd-group>
<contract-sponsor id="cn001">U.S. Department of Energy<named-content content-type="fundref-id">10.13039/100000015</named-content>
</contract-sponsor>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Fusion Plasma Physics</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>AI foundation models [<xref ref-type="bibr" rid="B1">1</xref>] embody a simple concept: an AI model is pre-trained, in an unsupervised or self-supervised manner, on a fundamental task (for example, predicting the next word in a sentence) over a wide range of data, and the trained model then serves as a foundation that can be fine-tuned for more specific downstream tasks, such as sentence generation, text summarization, or machine translation. Essentially, instead of being narrow experts, these models are generalists. Although the concept gained popularity with large language models (LLMs), such as those underlying ChatGPT [<xref ref-type="bibr" rid="B2">2</xref>], similar techniques can in principle be applied across a range of modalities, for example, images, audio, video, and unstructured meshes. Given the plethora of data modalities in experimental magnetic confinement fusion devices and the wide variety of tasks experimental fusion scientists must perform, a natural question arises: can AI foundation models be created for experimental fusion data to enhance and accelerate fusion science? This paper explains at a conceptual level how such foundation models could be created and how they could be used effectively in experimental fusion settings.</p>
</sec>
<sec id="s2">
<title>2 Foundation models for fusion energy experiments</title>
<p>Currently, when AI/machine learning (ML) is used for tasks within fusion energy experiments, the focus is most often on bespoke solutions for a particular task. These bespoke solutions require substantial work from the practitioner: gathering data, cleaning data, often performing data reduction (i.e., feature engineering), labeling data for classification problems, etc. The targeted tasks range widely, including models created specifically for anomaly detection [<xref ref-type="bibr" rid="B3">3</xref>], classification of plasma events [<xref ref-type="bibr" rid="B4">4</xref>&#x2013;<xref ref-type="bibr" rid="B7">7</xref>], and time&#x2013;series semantic search [<xref ref-type="bibr" rid="B8">8</xref>]. <xref ref-type="fig" rid="F1">Figure 1</xref> shows a representation of a foundation model that would instead serve as the basis for these many tasks and more, substantially reducing the burden of repeating many of the steps needed for custom bespoke solutions. The question remains, however, of what such a foundation model is and how it is achieved.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Foundation models for fusion energy enable many downstream tasks to be accomplished by a single model, including classification of plasma phenomena from fast diagnostics, making predictions from few examples, combining multiple diagnostics (modalities), and extracting physics model parameters from diagnostic data, anomaly detection, and more.</p>
</caption>
<graphic xlink:href="fphy-12-1531334-g001.tif"/>
</fig>
<p>For LLMs, one of the more popular foundation models is the generative pre-trained transformer (GPT) [<xref ref-type="bibr" rid="B9">9</xref>]. This is a decoder-only transformer [<xref ref-type="bibr" rid="B10">10</xref>], pre-trained for next-token prediction (where tokens are created by splitting the text into a fixed-size vocabulary of subwords, usually of order <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mo>&#x223c;</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>k). Mathematically, GPT next-token prediction training maximizes the log-likelihood:<disp-formula id="equ1">
<mml:math id="m2">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mi>log</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.3333em"/>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf2">
<mml:math id="m3">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the probability density represented by the transformer neural network with parameters <inline-formula id="inf3">
<mml:math id="m4">
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the token represented by <inline-formula id="inf4">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, with <inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> being the context length of tokens provided to the model to make the next-token prediction of <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. After training this model with gradient descent, the trained model can be used for various tasks, for example, by adding a learnable layer at the end of the model and fine-tuning for supervised classification problems [<xref ref-type="bibr" rid="B9">9</xref>]. One of the most impactful findings with these trained models is &#x201c;in-context learning&#x201d; [<xref ref-type="bibr" rid="B11">11</xref>], in which a few examples are input to the model and, without any additional training or fine-tuning, the model completes a similar task (for example, provide pairs of word translations from English -<inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> Spanish and an empty word to translate; e.g., given the input &#x201c;dog -<inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> perro, cat -<inline-formula id="inf9">
<mml:math id="m10">
<mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> gato, bird -<inline-formula id="inf10">
<mml:math id="m11">
<mml:mrow>
<mml:mo>&#x3e;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>&#x201d;, the model outputs &#x201c;pajaro&#x201d;). The improvement of in-context learning performance with model size (and the size of the training data) has enabled GPT models to be general-purpose (able to perform many different tasks) and led directly to the success of ChatGPT.</p>
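As an illustrative sketch (not from the original work), the next-token objective above can be made concrete by substituting a toy bigram logit table for the transformer; the vocabulary, sequence, and parameter values below are all hypothetical:

```python
import numpy as np

# Toy illustration of the GPT pre-training objective:
# maximize sum_i log p_theta(s_i | s_{i-1}, ..., s_{i-k}).
# A real model parameterizes p_theta with a transformer over k context
# tokens; here a bigram logit table (context length k = 1) stands in,
# purely to make the loss concrete.

rng = np.random.default_rng(0)
VOCAB = 5                                # real GPT vocabularies are ~100k subwords
theta = rng.normal(size=(VOCAB, VOCAB))  # logits[prev_token, next_token]

def log_softmax(logits):
    logits = logits - logits.max()       # subtract max for numerical stability
    return logits - np.log(np.exp(logits).sum())

def neg_log_likelihood(tokens, theta):
    """Negative autoregressive log-likelihood; training minimizes this."""
    nll = 0.0
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        nll -= log_softmax(theta[prev])[nxt]
    return nll

seq = [0, 1, 2, 1, 0]
print(neg_log_likelihood(seq, theta))    # gradient descent would lower this
```

A real GPT maximizes the same log-likelihood, but with the probability density represented by a transformer conditioned on up to k previous tokens and optimized by stochastic gradient descent.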
<p>In experimental fusion energy science, the data are fundamentally different from text: they are continuous rather than discrete, and they comprise hundreds of different diagnostic modalities, ranging from simple time series to more complex multi-channel, line-integrated 2D spatial videos. The time&#x2013;series nature of the data maps well onto foundation models created for audio or music [<xref ref-type="bibr" rid="B12">12</xref>], whose typical downstream tasks are speaker identification, automatic speech recognition, music generation, etc. To train these models, the self-supervised learning objective typically differs from the discrete language case, since the continuous nature of the time series presents too large a space for direct next-token prediction. Instead, contrastive learning is often used for self-supervised training: a time-series sequence is partially masked, and the model learns to predict the masked portion by discerning the true sequence from a set that also contains many negative (false) sequence samples:<disp-formula id="equ2">
<mml:math id="m12">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="normal">l</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">g</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">x</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2f;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mo>&#x2061;</mml:mo>
<mml:mi mathvariant="normal">exp</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2f;</mml:mo>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>where sim<inline-formula id="inf11">
<mml:math id="m13">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mi>b</mml:mi>
<mml:mo>/</mml:mo>
<mml:mo>&#x2225;</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>&#x2225;</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mo>&#x2225;</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>&#x2225;</mml:mo>
</mml:math>
</inline-formula> is the cosine similarity, <inline-formula id="inf12">
<mml:math id="m14">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the model's predicted output for the masked portion, <inline-formula id="inf13">
<mml:math id="m15">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the true sequence (quantized to ease learning), <inline-formula id="inf14">
<mml:math id="m16">
<mml:mrow>
<mml:mi>&#x3c4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a modifiable temperature parameter, and <inline-formula id="inf15">
<mml:math id="m17">
<mml:mrow>
<mml:mi>Q</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a set containing the true sequence and a number of false sequences to discern between. Contrastive losses are better suited to continuous-valued sequences. With the pre-trained model, a path similar to that of LLMs can then be followed, fine-tuning the model with a few specific labeled examples for supervised learning tasks such as classification.</p>
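As a minimal sketch (illustrative only, with random stand-in vectors for the model's context output, the true quantized target, and the negatives), the contrastive loss above can be written directly:

```python
import numpy as np

# Sketch of the contrastive (InfoNCE-style) loss in the text:
# L = -log( exp(sim(c_t, q_t+)/tau) / sum_{q in Q} exp(sim(c_t, q)/tau) ),
# with sim(a, b) the cosine similarity, tau a temperature, and Q the set
# of the true sample plus negatives. All vectors here are random stand-ins.

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(c_t, q_pos, negatives, tau=0.1):
    candidates = negatives + [q_pos]          # the set Q; true sample last
    sims = np.array([cosine_sim(c_t, q) for q in candidates]) / tau
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[-1])                 # -log prob of the true sample

rng = np.random.default_rng(1)
c_t = rng.normal(size=16)
q_pos = c_t + 0.1 * rng.normal(size=16)       # target close to the prediction
negs = [rng.normal(size=16) for _ in range(7)]
print(contrastive_loss(c_t, q_pos, negs))     # small when c_t matches q_t+
```

The loss is minimized when the model's output is most similar to the true masked sequence and dissimilar to the negatives.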
<p>While LLMs based on a next-token prediction loss have been useful for both generative and discriminative downstream tasks, foundation models for audio or time series have often focused on one set or the other. <xref ref-type="fig" rid="F1">Figure 1</xref> focuses on discriminative downstream tasks (e.g., classifying plasma modes in diagnostic data), but there are also generative tasks that can be useful in fusion energy, such as scenario planning. Many foundation models for modalities like audio that focus on generative tasks use diffusion or flow-matching models [<xref ref-type="bibr" rid="B13">13</xref>], although these are not studied here.</p>
<p>AI foundation models can be created for single diagnostics; however, AI model architectures exist that incorporate multiple modalities [<xref ref-type="bibr" rid="B14">14</xref>&#x2013;<xref ref-type="bibr" rid="B16">16</xref>], taking advantage of the correlations between them. This is particularly useful for fusion experiments: diagnostics such as electron cyclotron emission imaging (ECEI) and beam emission spectroscopy (BES) measure different physical phenomena, and combining their data for predictions can potentially provide more information than the sum of the parts.</p>
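One simple way to combine modalities is late fusion, sketched below with hypothetical shapes and a stand-in encoder (real multimodal architectures often use cross-attention between modality streams instead):

```python
import numpy as np

# Hypothetical sketch of late-fusion multimodal combination: embeddings
# from two separately encoded diagnostics (e.g., ECEI and BES windows)
# are concatenated into a joint representation that feeds a shared
# downstream prediction head. Shapes and the encoder are illustrative.

rng = np.random.default_rng(3)
ecei_window = rng.normal(size=(20, 8, 100))   # toy ECEI channels x time
bes_window = rng.normal(size=(8, 8, 100))     # toy BES channels x time

def encode(x, dim=32):
    """Stand-in encoder: flatten and project to a fixed-size embedding."""
    flat = x.reshape(-1)
    W = np.random.default_rng(flat.size).normal(size=(flat.size, dim))
    return np.tanh(flat @ W / np.sqrt(flat.size))

joint = np.concatenate([encode(ecei_window), encode(bes_window)])
print(joint.shape)  # (64,) -- fed to a shared downstream prediction head
```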
<p>Because AI foundation models are pre-trained to effectively learn the underlying data distribution, models with more parameters pre-trained on larger amounts of unlabeled data are observed to perform better [<xref ref-type="bibr" rid="B11">11</xref>]. The consequence is that large high-performance computing (HPC) resources with many GPUs are needed to train these models. With the popularity of deep learning and foundation models, many good frameworks and tools are available to make this easier, including PyTorch, Hugging Face Accelerate, and the MetaFAIR library.</p>
</sec>
<sec id="s3">
<title>3 Automated logbook</title>
<p>One relevant example of how such an AI foundation model could be used in fusion energy experiments is the automated logbook shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. Fusion energy researchers have a deluge of experimental data to process and understand, on short timescales between experimental discharges (usually 10&#x2013;20 min) and on longer timescales of months to years for understanding campaign-level data. Insights, if recorded, are normally formulated as text in personal or online logbooks. This manual analysis can be laborious. An AI foundation model could instead be used to automatically tag plasma events of interest in the diagnostic data, creating a metadata database and enabling fast visualization of plasma event sequences between plasma discharges.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Workflow for the automated logbook, enriched by few-shot learning with large neural networks. A CNN &#x2b; Transformer foundation model is pre-trained on unlabeled data and then fine-tuned with a small labeled dataset. With the fine-tuned network, fast inference can be done between shots on diagnostic data, to quickly identify plasma events of interest.</p>
</caption>
<graphic xlink:href="fphy-12-1531334-g002.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, first, a large dataset of raw diagnostic data from many plasma discharges is gathered, without having to label specific plasma events in the data. The AI foundation model is pre-trained on this dataset, passing in sequences of data and using a contrastive loss to learn to predict masked portions of each sequence (the model shown is based on the wav2vec 2.0 model [<xref ref-type="bibr" rid="B17">17</xref>], with a convolutional neural network (CNN) encoder that reduces the data to a latent-space representation, followed by a transformer model [<xref ref-type="bibr" rid="B10">10</xref>]). In the second step, a small dataset is gathered and labeled at time slices with specific plasma events or modes, for example, neoclassical tearing modes (NTMs), Alfven eigenmodes (AEs), edge harmonic oscillations (EHOs), etc. The model can be fine-tuned to predict a single type of plasma event or several types. This labeled dataset is smaller than would be required when training a model directly in a traditional supervised learning fashion, since the pre-trained model has already learnt good representations of the underlying data distribution. In principle, the labeled dataset can be as small as one or a few examples, but in practice more may be required, and the amount is problem-dependent. A decoder layer with learnable parameters is added onto the end of the pre-trained model, and with the labeled dataset, the model is fine-tuned to output predicted labels for an input sequence. This fine-tuning can either update only the decoder layer's learnable parameters, keeping the rest of the pre-trained model frozen, or unfreeze various layers of the pre-trained model so that those parameters are also updated by the learning process. The fine-tuning only needs to be done once; the model is then used for inference (in machine-learning parlance, prediction <italic>versus</italic> learning). As new plasma discharges are completed, the fine-tuned model takes in the new diagnostic data and predicts labels for the various plasma events. In the final step shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, these predictions can be visualized alongside the data in the automated logbook, providing fast feedback to fusion researchers between plasma discharges and supporting further investigation later. Detected modes can also trigger further analysis, for example, bandpass filtering at the detected mode frequencies and visualizing the resulting spatial mode structure in different diagnostics.</p>
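The fine-tuning step, with the pre-trained encoder frozen and only a small decoder head trained on a handful of labels, can be sketched as follows (a toy stand-in: the "encoder" is a fixed random projection and the labels are synthetic, not real plasma-event labels):

```python
import numpy as np

# Sketch of fine-tuning with a frozen pre-trained encoder: only a small
# logistic-regression decoder head is trained on a few labeled examples.
# The encoder below is a fixed random projection standing in for the
# pre-trained CNN + transformer; labels mark whether a diagnostic window
# contains a (hypothetical) plasma mode.

rng = np.random.default_rng(2)

def frozen_encoder(x, W):
    return np.tanh(x @ W)                 # frozen: W is never updated

W_enc = rng.normal(size=(32, 8))          # "pre-trained" weights (frozen)
X = rng.normal(size=(20, 32))             # 20 labeled diagnostic windows
y = (X[:, 0] > 0).astype(float)           # toy event labels

feats = frozen_encoder(X, W_enc)
w, b = np.zeros(8), 0.0                   # trainable decoder head
for _ in range(500):                      # gradient descent on the head only
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    grad = p - y                          # gradient of the cross-entropy loss
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((p > 0.5) == y).mean()
print(f"training accuracy of the fine-tuned head: {acc:.2f}")
```

Unfreezing encoder layers would correspond to also updating `W_enc` during the gradient-descent loop, at the cost of more computation and a greater risk of overfitting the small labeled dataset.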
<p>Although bespoke AI models could be created for each diagnostic or each plasma event, the traditional supervised learning route would almost surely require thousands of labeled examples gathered by researchers, a long, tedious process that is often avoided. An AI foundation model offers a route where fewer labeled examples are needed, and the same foundation model can be fine-tuned for different plasma events. This enables identification of the chains of events that are often important for understanding phenomena such as disruptions [<xref ref-type="bibr" rid="B18">18</xref>, <xref ref-type="bibr" rid="B19">19</xref>].</p>
<p>Foundation models do require a large unlabeled dataset, and there are no well-defined rules for its size (it depends on the variety of the data and the information content per sample, which may be hard to quantify). For many fusion energy experiments, substantial data are available, depending on the device and diagnostic. The largest diagnostic datasets on the DIII-D tokamak are listed in <xref ref-type="table" rid="T1">Table 1</xref> (there are 60 different diagnostic systems on DIII-D in total); this amount of data can reasonably be expected to be sufficient for training an AI foundation model.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Diagnostics on the DIII-D tokamak with the largest dataset sizes. Note that not all of these data are for overlapping plasma discharges (i.e., some plasma discharges will not have all of these diagnostics available).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Diagnostic</th>
<th align="center">Spatial</th>
<th align="center">Temporal</th>
<th align="center">Total size [TB]</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Fast camera</td>
<td align="center">512 &#xd7; 368</td>
<td align="center">10 kHz</td>
<td align="center">30.8</td>
</tr>
<tr>
<td align="left">Helicon antenna cameras</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">8.6</td>
</tr>
<tr>
<td align="left">IR TV camera</td>
<td align="center">3 &#xd7; (640 &#xd7; 512; 464 &#xd7; 4)</td>
<td align="center">125 Hz; 12 kHz</td>
<td align="center">10.1</td>
</tr>
<tr>
<td align="left">Tangential viewing visible light camera</td>
<td align="center">512 &#xd7; 512</td>
<td align="center">31 Hz</td>
<td align="center">2.7</td>
</tr>
<tr>
<td align="left">Beam emission spectroscopy</td>
<td align="center">8 &#xd7; 8</td>
<td align="center">1 MHz</td>
<td align="center">14.2</td>
</tr>
<tr>
<td align="left">Electron cyclotron emission imaging</td>
<td align="center">2 &#xd7; 20 &#xd7; 8</td>
<td align="center">1 MHz</td>
<td align="center">60.0</td>
</tr>
<tr>
<td align="left">Ultra-fast charge exchange recombination spectroscopy (UF-CHERS)</td>
<td align="center">16</td>
<td align="center">1 MHz</td>
<td align="center">1.7</td>
</tr>
<tr>
<td align="left">Fast soft X-ray imaging</td>
<td align="center">100</td>
<td align="center">1 MHz</td>
<td align="center">10.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In addition to the need for sizeable, information-rich data to train on, out-of-distribution (OOD) data during inference need to be considered. Fusion experiments often push the boundaries into new regimes, producing diagnostic data that may be far from anything seen previously. Various works have approached this topic with bespoke AI models for fusion energy, seeking to enable models to adapt to new datasets [<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B20">20</xref>, <xref ref-type="bibr" rid="B21">21</xref>]. In the context of AI foundation models, there are indications from other fields, such as medical imaging, that foundation models are more robust to data distribution shift [<xref ref-type="bibr" rid="B22">22</xref>] and are even useful for discriminating OOD data [<xref ref-type="bibr" rid="B23">23</xref>]. However, this needs to be researched in the specific context of AI foundation models for fusion energy diagnostic data.</p>
</sec>
<sec sec-type="discussion" id="s4">
<title>4 Discussion</title>
<p>AI foundation models could serve to simplify and greatly expand the use of AI in experimental fusion energy. The ability to create good latent-space representations of diagnostic data can aid in a number of downstream tasks for experimental fusion scientists, such as identification of plasma phenomena across multiple diagnostics, anomaly detection, extraction of physics parameters from data, and use in control systems. Automating these tasks creates remarkable opportunities to gain further insights across many plasma discharges and uncover hidden relationships. Foundation models also ease the burden on scientists, from identifying and labeling thousands of examples for AI models to a much more manageable level. Some work toward foundation models for fusion energy diagnostics has begun, for example, through the ExaLearn project, which was part of the Exascale Computing Project [Rodriguez et al., 2024 (unpublished study)], EUROfusion projects [<xref ref-type="bibr" rid="B24">24</xref>], and multi-modal bespoke models [<xref ref-type="bibr" rid="B25">25</xref>, <xref ref-type="bibr" rid="B26">26</xref>], but AI foundation models have not yet been fully realized as a production-ready tool in experimental fusion science.</p>
<p>Although the focus of this paper has been foundation models for multi-modal, time-series-based diagnostics, the advent of reasoning models such as the OpenAI o1 model [<xref ref-type="bibr" rid="B27">27</xref>] presents an opportunity to combine them in a hybrid system of AI agents that leverages multi-modal time-series foundation models as tools to further automate discovery and increase the return on the investment in these experimental devices, including coupling with simulation. Creating such flexible building blocks of multi-modal time-series foundation models, from which advanced workflows can be built, could greatly aid fusion energy scientists and, ultimately, the realization of fusion energy as a clean and sustainable energy source.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec sec-type="author-contributions" id="s6">
<title>Author contributions</title>
<p>RC: writing&#x2013;original draft and writing&#x2013;review and editing.</p>
</sec>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the US Department of Energy under DE-AC02-09CH11466.</p>
</sec>
<ack>
<p>The author gratefully acknowledges stimulating conversations and collaboration with colleagues in the ExaLearn project and with attendees at the Visualizing Offline and Live Data with AI (VOLDA) workshop where the work was presented.</p>
</ack>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="ai-statement" id="s9">
<title>Generative AI statement</title>
<p>The author(s) declare that no Generative AI was used in the creation of this manuscript.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Bommasani</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hudson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Adeli</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Arora</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Arx</surname>
<given-names>S</given-names>
</name>
<etal/>
</person-group> <article-title>On the opportunities and risks of foundation models</article-title>. <comment>arXiv [Preprint] <italic>arXiv:2108.07258</italic>
</comment> (<year>2021</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2108.07258">http://arxiv.org/abs/2108.07258</ext-link> (Accessed August 27, 2021)</comment>
</citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="web">
<collab>OpenAI</collab>
<person-group person-group-type="author">
<name>
<surname>Achiam</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Adler</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ahmad</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Akkaya</surname>
<given-names>I</given-names>
</name>
<etal/>
</person-group> <article-title>GPT-4 technical report</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2303.08774</italic>
</comment> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2303.08774">http://arxiv.org/abs/2303.08774</ext-link> (Accessed November 19, 2024)</comment>
</citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anand</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Sammuli</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Olofsson</surname>
<given-names>KEJ</given-names>
</name>
<name>
<surname>Humphreys</surname>
<given-names>DA</given-names>
</name>
</person-group>. <article-title>Real-time magnetic sensor anomaly detection using autoencoder neural networks on the DIII-D tokamak</article-title>. <source>IEEE Trans. Plasma Sci.</source> (<year>2022</year>) <volume>50</volume>:<fpage>4126</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1109/TPS.2022.3181548</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Churchill</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Tobias</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Deep convolutional neural networks for multi-scale time-series classification and application to tokamak disruption prediction using raw, high temporal resolution diagnostic data</article-title>. <source>Phys. Plasmas</source> (<year>2020</year>) <volume>27</volume>(<issue>6</issue>):<comment>062510</comment>. <pub-id pub-id-type="doi">10.1063/1.5144458</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kates-Harbeck</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Svyatkovskiy</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>Predicting disruptive instabilities in controlled fusion plasmas through deep learning</article-title>. <source>Nature</source> (<year>2019</year>) <volume>568</volume>:<fpage>526</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1116-4</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rea</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Granetz</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Montes</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tinguely</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Eidietis</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Hanson</surname>
<given-names>JM</given-names>
</name>
<etal/>
</person-group> <article-title>Disruption prediction investigations using Machine Learning tools on DIII-D and Alcator C-Mod</article-title>. <source>Plasma Phys. Control. Fusion</source> (<year>2018</year>) <volume>60</volume>(<issue>8</issue>). <pub-id pub-id-type="doi">10.1088/1361-6587/aac7fe</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#x160;kv&#xe1;ra</surname>
<given-names>V</given-names>
</name>
<name>
<surname>&#x160;m&#xed;dl</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Pevn&#xfd;</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Seidl</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Havr&#xe1;nek</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tskhakaya</surname>
<given-names>D</given-names>
</name>
<etal/>
</person-group> <article-title>Detection of Alfv&#xe9;n eigenmodes on COMPASS with generative neural networks</article-title>. <source>Fusion Sci. Technol.</source> (<year>2020</year>) <volume>76</volume>(<issue>8</issue>):<fpage>962</fpage>&#x2013;<lpage>71</lpage>. <pub-id pub-id-type="doi">10.1080/15361055.2020.1820805</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Montes</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Rea</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tinguely</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Sweeney</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Granetz</surname>
<given-names>RS</given-names>
</name>
</person-group>. <article-title>A semi-supervised machine learning detector for physics events in tokamak discharges</article-title>. <source>Nucl. Fusion</source> (<year>2021</year>) <volume>61</volume>(<issue>2</issue>):<fpage>026022</fpage>. <pub-id pub-id-type="doi">10.1088/1741-4326/abcdb9</pub-id>
</citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Radford</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Child</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Luan</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Amodei</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I</given-names>
</name>
</person-group> <article-title>Language models are unsupervised multitask learners</article-title>. <source>OpenAI Tech Rep</source> (<year>2019</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/codelucas/newspaper">https://github.com/codelucas/newspaper</ext-link> (Accessed July 17, 2019).</comment>
</citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Vaswani</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Shazeer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Parmar</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Uszkoreit</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gomez</surname>
<given-names>AN</given-names>
</name>
<etal/>
</person-group> <article-title>Attention is all you need</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:1706.03762</italic>
</comment>. (<year>2017</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1706.03762">http://arxiv.org/abs/1706.03762</ext-link> (Accessed July 17, 2019).</comment>
</citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>TB</given-names>
</name>
<name>
<surname>Mann</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ryder</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Subbiah</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kaplan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dhariwal</surname>
<given-names>P</given-names>
</name>
<etal/>
</person-group> <article-title>Language models are few-shot learners</article-title> <comment>arXiv [Preprint]. <italic>arXiv:2005.14165</italic>
</comment> (<year>2020</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2005.14165">http://arxiv.org/abs/2005.14165</ext-link> (Accessed June 17, 2020).</comment>
</citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>&#xd8;land</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ragni</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Del Sette</surname>
<given-names>BM</given-names>
</name>
<name>
<surname>Saitis</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Donahue</surname>
<given-names>C</given-names>
</name>
<etal/>
</person-group> <article-title>Foundation Models for Music: A Survey</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2408.14340</italic>
</comment> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2408.14340">http://arxiv.org/abs/2408.14340</ext-link> (Accessed November 20, 2024)</comment>
</citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Lipman</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>RTQ</given-names>
</name>
<name>
<surname>Ben-Hamu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nickel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Flow Matching for Generative Modeling</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2210.02747</italic>
</comment> (<year>2023</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2210.02747">http://arxiv.org/abs/2210.02747</ext-link> (Accessed October 2, 2024)</comment>
</citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Akbari</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chuang</surname>
<given-names>W-H</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>Y</given-names>
</name>
<etal/>
</person-group> <article-title>VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2104.11178</italic>
</comment> (<year>2021</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2104.11178">http://arxiv.org/abs/2104.11178</ext-link> (Accessed November 13, 2024)</comment>
</citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Jaegle</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Borgeaud</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Alayrac</surname>
<given-names>J-B</given-names>
</name>
<name>
<surname>Doersch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ionescu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>D</given-names>
</name>
<etal/>
</person-group> <article-title>Perceiver IO: a general architecture for structured inputs and outputs</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2107.14795</italic>
</comment> (<year>2021</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2107.14795">http://arxiv.org/abs/2107.14795</ext-link> (Accessed January 18, 2022).</comment>
</citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Alayrac</surname>
<given-names>J-B</given-names>
</name>
<name>
<surname>Donahue</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Luc</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Miech</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Barr</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Hasson</surname>
<given-names>Y</given-names>
</name>
<etal/>
</person-group> <article-title>Flamingo: a Visual Language Model for Few-Shot Learning</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2204.14198</italic>
</comment> (<year>2022</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2204.14198">http://arxiv.org/abs/2204.14198</ext-link> (Accessed November 20, 2024)</comment>
</citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Baevski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mohamed</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Auli</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>wav2vec 2.0: a framework for self-supervised learning of speech representations</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2006.11477</italic>
</comment> (<year>2020</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2006.11477">http://arxiv.org/abs/2006.11477</ext-link> (Accessed June 25, 2020)</comment>
</citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Vries</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Alper</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Buratti</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hender</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Koslowski</surname>
<given-names>HR</given-names>
</name>
<etal/>
</person-group> <article-title>Survey of disruption causes at JET</article-title>. <source>Nucl. Fusion</source> (<year>2011</year>) <volume>51</volume>(<issue>5</issue>):<fpage>053018</fpage>. <pub-id pub-id-type="doi">10.1088/0029-5515/51/5/053018</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sabbagh</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Berkery</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>YS</given-names>
</name>
<name>
<surname>Ahn</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Riquezes</surname>
<given-names>JD</given-names>
</name>
<etal/>
</person-group> <article-title>Disruption event characterization and forecasting in tokamaks</article-title>. <source>Phys. Plasmas</source> (<year>2023</year>) <volume>30</volume>:<fpage>032506</fpage>. <pub-id pub-id-type="doi">10.1063/5.0133825</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murari</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Peluso</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lungaroni</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gaudio</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gelfusa</surname>
<given-names>M</given-names>
</name>
<etal/>
</person-group> <article-title>On the transfer of adaptive predictors between different devices for both mitigation and prevention of disruptions</article-title>. <source>Nucl. Fusion</source> (<year>2020</year>) <volume>60</volume>(<issue>5</issue>):<fpage>056003</fpage>. <pub-id pub-id-type="doi">10.1088/1741-4326/ab77a6</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murari</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Craciunescu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Vega</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mailloux</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Abid</surname>
<given-names>N</given-names>
</name>
<etal/>
</person-group> <article-title>A control oriented strategy of disruption prediction to avoid the configuration collapse of tokamak reactors</article-title>. <source>Nat Commun</source> (<year>2024</year>) <volume>15</volume>(<issue>1</issue>):<fpage>2424</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-024-46242-7</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Duy</surname>
<given-names>MHN</given-names>
</name>
<name>
<surname>Tan Ngoc</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Nghiem</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Nghi</surname>
<given-names>QP</given-names>
</name>
<name>
<surname>Quang</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Vinh</surname>
<given-names>T</given-names>
</name>
<etal/>
</person-group> <article-title>On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2311.11096</italic>
</comment> (<year>2023</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2311.11096">http://arxiv.org/abs/2311.11096</ext-link> (Accessed December 10, 2024)</comment>
</citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>X</given-names>
</name>
</person-group>. <article-title>Can OOD Object Detectors Learn from Foundation Models?</article-title> <comment>arXiv [Preprint]. <italic>arXiv:2409.05162</italic>
</comment> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2409.05162">http://arxiv.org/abs/2409.05162</ext-link> (Accessed December 10, 2024)</comment>
</citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>de Vries</surname>
<given-names>G</given-names>
</name>
</person-group>. <article-title>EUROfusion spearheads advances in Artificial Intelligence and Machine Learning to unlock fusion energy</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://euro-fusion.org/eurofusion-news/eurofusion-spearheads-advances-in-artificial-intelligence-and-machine-learning-to-unlock-fusion-energy/">https://euro-fusion.org/eurofusion-news/eurofusion-spearheads-advances-in-artificial-intelligence-and-machine-learning-to-unlock-fusion-energy/</ext-link> (Accessed December 10, 2024)</comment>
</citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zheng</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>C</given-names>
</name>
<etal/>
</person-group> <article-title>Disruption prediction for future tokamaks using parameter-based transfer learning</article-title>. <source>Commun. Phys.</source> (<year>2023</year>) <volume>6</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1038/s42005-023-01296-9</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Jalalvand</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Seo</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Curie</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Steiner</surname>
<given-names>P</given-names>
</name>
<etal/>
</person-group> <article-title>Multimodal Super-Resolution: Discovering hidden physics and its application to fusion plasmas</article-title>. <comment>arXiv [Preprint]. <italic>arXiv:2405.05908</italic>
</comment> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2405.05908">http://arxiv.org/abs/2405.05908</ext-link> (Accessed November 19, 2024)</comment>
</citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="web">
<collab>OpenAI</collab>. <article-title>Learning to Reason with LLMs</article-title> (<year>2024</year>). <comment>Available from: <ext-link ext-link-type="uri" xlink:href="https://openai.com/index/learning-to-reason-with-llms/">https://openai.com/index/learning-to-reason-with-llms/</ext-link> (Accessed November 20, 2024)</comment>
</citation>
</ref>
</ref-list>
</back>
</article>