<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="editorial">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2022.838097</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Editorial</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Editorial: Towards Exascale Solutions for Big Data Computing</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Talia</surname> <given-names>Domenico</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1008713/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Trunfio</surname> <given-names>Paolo</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/147542/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Carretero</surname> <given-names>Jesus</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Garcia-Blas</surname> <given-names>Javier</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/552047/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>DIMES, University of Calabria</institution>, <addr-line>Rende</addr-line>, <country>Italy</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Computer Science and Engineering, Universidad Carlos III de Madrid</institution>, <addr-line>Legan&#x000E9;s</addr-line>, <country>Spain</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited and reviewed by: Huan Liu, Arizona State University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Domenico Talia <email>talia&#x00040;dimes.unical.it</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data</p></fn></author-notes>
<pub-date pub-type="epub">
<day>11</day>
<month>02</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>838097</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>12</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>01</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Talia, Trunfio, Carretero and Garcia-Blas.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Talia, Trunfio, Carretero and Garcia-Blas</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<related-article id="RA1" related-article-type="commentary-article" xlink:href="https://www.frontiersin.org/research-topics/15044/towards-exascale-solutions-for-big-data-computing" ext-link-type="uri">Editorial on the Research Topic <article-title>Towards Exascale Solutions for Big Data Computing</article-title></related-article>
<kwd-group>
<kwd>big data</kwd>
<kwd>high performance computing</kwd>
<kwd>exascale systems</kwd>
<kwd>machine learning</kwd>
<kwd>parallel programming</kwd>
</kwd-group>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="4"/>
<page-count count="2"/>
<word-count count="1390"/>
</counts>
</article-meta>
</front>
<body>
<p>The design and implementation of Big Data management and analysis solutions have benefited greatly from the use of high-performance computing (HPC) systems. Today, complex processing and analysis of massive real-world data sources in AI, machine learning, and large simulations require HPC infrastructures such as highly parallel clusters, supercomputers, and clouds (Talia, <xref ref-type="bibr" rid="B3">2019</xref>). As research and technologies in parallel computing advance, exascale computing systems will be used in the next few years to implement scalable Big Data analysis solutions in science and business (Reed and Dongarra, <xref ref-type="bibr" rid="B2">2015</xref>). To reach this goal, new design and implementation challenges must be addressed in order to exploit the computing power of new HPC systems when running Big Data and machine learning applications.</p>
<p>Exascale supercomputers are computing systems capable of at least one exaflop, that is, a quintillion (10<sup>18</sup>) floating-point operations per second. Although they will support very large and very complex applications, exascale systems are becoming increasingly hard to use efficiently (Talia et al., <xref ref-type="bibr" rid="B4">2020</xref>). In Big Data analysis in particular, new solutions are needed to build scalable software systems that run fast on exascale platforms. Extreme data refers to massive amounts of Big Data that must be queried, communicated, and analyzed in (near) real time using a very large number of memory and computing elements. Large repositories and continuous streams of data will soon be processed and analyzed by exascale computing systems that are currently under development (Gropp and Snir, <xref ref-type="bibr" rid="B1">2013</xref>). Significant examples include scientific data produced at rates of hundreds of gigabits per second that must be stored, filtered, and analyzed; millions of images per day that must be analyzed in parallel; and billions of social media posts queried in real time on an in-memory database. Traditional disks and commercial storage systems cannot handle the extreme data scale of such applications, and a very large number of cores is needed to process the data. Motivated by the need to improve current concepts and technologies, this Research Topic focuses on data-intensive algorithms, systems, and applications running on the massively parallel systems, composed of up to millions of computing elements, on which exascale systems are based.</p>
<p>Key scientific fields discussed in the papers that have been selected for this Research Topic include:</p>
<list list-type="bullet">
<list-item><p>Studies of parallel hardware and software systems for Big Data storage, processing, and analysis.</p></list-item>
<list-item><p>Methods, techniques, and prototypes designed and used to implement Big Data solutions on massive HPC and exascale systems.</p></list-item>
<list-item><p>Massively parallel algorithms and applications for machine learning solutions.</p></list-item>
<list-item><p>New programming paradigms, APIs, runtime tools, and methodologies for expressing data-intensive tasks on exascale systems.</p></list-item>
<list-item><p>Innovative applications of Big Data computing.</p></list-item>
<list-item><p>Big Data analysis use cases in large-scale parallel systems.</p></list-item>
</list>
<p>In particular, the Research Topic includes four papers. In the paper titled &#x0201C;<italic>HPTMT Parallel Operators for High-Performance Data Science &#x00026; Data Engineering</italic>,&#x0201D; <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2021.756041">Fox et al.</ext-link> introduce and illustrate HPTMT, an architecture developed for creating rich data applications that efficiently link all aspects of data engineering and data science. The paper illustrates the architecture with an end-to-end application in which deep learning and data engineering components work together.</p>
<p>In the paper &#x0201C;<italic>BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data</italic>,&#x0201D; the authors present BigFiRSt, a Hadoop-based software system that analyzes Simple Sequence Repeats (SSRs) of nucleotide sequences using cutting-edge Big Data technology (<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2021.727216">Chen et al.</ext-link>). BigFiRSt consists of two major modules, BigFLASH and BigPERF, which address the merging of short read pairs and the mining of SSRs, respectively, in a Big Data fashion. Comprehensive benchmarking experiments show that BigFiRSt can reduce the execution times of read-pair merging and SSR mining on very large-scale DNA sequence data.</p>
<p>In the paper titled &#x0201C;<italic>The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?</italic>&#x0201D; <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2021.669097">Markidis</ext-link> discusses Physics-Informed Neural Networks (PINNs), which are neural networks that encode the governing equations of a problem, such as partial differential equations (PDEs), as part of the network. PINNs have emerged as a new tool for solving challenging problems such as the linear systems arising from PDEs. The paper first evaluates the potential of PINNs as linear solvers for the Poisson equation and then characterizes PINN linear solvers in terms of accuracy and performance under different network configurations (depth, activation functions, and input data set distribution).</p>
<p>Finally, the contribution by <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2021.657218">Kimovski et al.</ext-link> focuses on the &#x0201C;<italic>Autotuning of Exascale Applications With Anomalies Detection</italic>.&#x0201D; Autotuning automates the identification of the most desirable application implementation in terms of code variants and run-time parameters. The complexity and size of exascale systems make autotuning very difficult, especially given the number of parameter variations that must be explored. The authors introduce a novel autotuning approach for exascale applications that is based on a genetic multi-objective optimization algorithm and integrated within the ASPIDE exascale computing framework. The approach explores a multi-dimensional search space, supports pluggable objective functions such as execution time and energy requirements, and includes a machine-learning-based event detection component that detects events and anomalies during application execution, such as hardware failures or communication bottlenecks.</p>
<p>These contributions provide novel insights and solutions for exploiting massive parallelism when processing very large data repositories. They describe methods and mechanisms that foster high performance and efficiency and that offer powerful operations and tools for processing extreme data sources at high speed and/or in real time on highly parallel computing systems, in line with the high-performance data analytics (HPDA) approach.</p>
<sec id="s1">
<title>Author Contributions</title>
<p>All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.</p>
</sec>
<sec sec-type="funding-information" id="s2">
<title>Funding</title>
<p>This work was supported by the ASPIDE Project funded by the European Union&#x00027;s Horizon 2020 Research and Innovation Programme under grant agreement No 801091.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s3">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gropp</surname> <given-names>W.</given-names></name> <name><surname>Snir</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Programming for exascale computers</article-title>. <source>Comput. Sci. Eng.</source> <volume>15</volume>, <fpage>27</fpage>&#x02013;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1109/MCSE.2013.96</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reed</surname> <given-names>D. A.</given-names></name> <name><surname>Dongarra</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Exascale computing and big data</article-title>. <source>Commun. ACM</source> <volume>58</volume>, <fpage>56</fpage>&#x02013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1145/2699414</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Talia</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>A view of programming scalable data analysis: from clouds to exascale</article-title>. <source>J. Cloud Comput.</source> <volume>8</volume>, <fpage>4</fpage>. <pub-id pub-id-type="doi">10.1186/s13677-019-0127-x</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Talia</surname> <given-names>D.</given-names></name> <name><surname>Trunfio</surname> <given-names>P.</given-names></name> <name><surname>Marozzo</surname> <given-names>F.</given-names></name> <name><surname>Belcastro</surname> <given-names>L.</given-names></name> <name><surname>Blas</surname> <given-names>J. G.</given-names></name></person-group> (<year>2020</year>). &#x0201C;<article-title>A novel data-centric programming model for large-scale parallel systems</article-title>,&#x0201D; in <source>Euro-Par 2019 Parallel Processing Workshops Revised Selected Papers</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>452</fpage>&#x02013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-48340-1_35</pub-id></citation>
</ref>
</ref-list> 
</back>
</article>