<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="discussion">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2016.00154</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Opinion</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Against Dataism and for Data Sharing of Big Biomedical and Clinical Data with Research Parasites</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Emmert-Streib</surname> <given-names>Frank</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/37376/overview"/></contrib>
<contrib contrib-type="author">
<name><surname>Dehmer</surname> <given-names>Matthias</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/37754/overview"/></contrib>
<contrib contrib-type="author">
<name><surname>Yli-Harja</surname> <given-names>Olli</given-names></name>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology</institution> <country>Tampere, Finland</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Mechatronics and Biomedical Computer Science, UMIT</institution> <country>Hall in Tyrol, Austria</country></aff>
<aff id="aff3"><sup>3</sup><institution>College of Computer and Control Engineering, Nankai University</institution> <country>Tianjin, China</country></aff>
<aff id="aff4"><sup>4</sup><institution>Computational Systems Biology, Department of Signal Processing, Tampere University of Technology</institution> <country>Tampere, Finland</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Mogens Fenger, The Capital Region of Denmark, Denmark</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Angela Re, University of Trento, Italy</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Frank Emmert-Streib <email>frank.emmert-streib&#x00040;tut.fi</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Statistical Genetics and Methodology, a section of the journal Frontiers in Genetics</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>31</day>
<month>08</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>7</volume>
<elocation-id>154</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>06</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>08</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2016 Emmert-Streib, Dehmer and Yli-Harja.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Emmert-Streib, Dehmer and Yli-Harja</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<kwd-group>
<kwd>data sharing</kwd>
<kwd>clinical data</kwd>
<kwd>biomedical data</kwd>
<kwd>genomics</kwd>
<kwd>computational biology</kwd>
</kwd-group>
<contract-num rid="cn001">project - P26142</contract-num>
<contract-sponsor id="cn001">Austrian Science Fund<named-content content-type="fundref-id">10.13039/501100002428</named-content></contract-sponsor>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="17"/>
<page-count count="3"/>
<word-count count="1815"/>
</counts>
</article-meta>
</front>
<body>
<p>According to the Oxford Dictionaries Online, Medicine is &#x0201C;The science or practice of the diagnosis, treatment, and prevention of disease.&#x0201D; This implies that the patient is the central focus of the profession and that all relevant specializations and subareas are concerned with benefiting a patient&#x00027;s health. In recent years, the analysis of clinical and biomedical data, including data from high-throughput experiments, has been added to the list of specializations that contribute to the greater good. However, the analysis and reuse of such data are in general difficult and for this reason have been under scrutiny (Ioannidis, <xref ref-type="bibr" rid="B8">2005</xref>; Chalmers and Glasziou, <xref ref-type="bibr" rid="B3">2009</xref>; Ioannidis and Khoury, <xref ref-type="bibr" rid="B9">2011</xref>; Rung and Brazma, <xref ref-type="bibr" rid="B16">2013</xref>; Ioannidis et al., <xref ref-type="bibr" rid="B7">2015</xref>).</p>
<p>With breakthroughs in data production, the integration of unprecedentedly rich data is expected to have an enormous impact on basic research and to translate to healthcare, but it comes with significant challenges for the practices of analysis, data sharing, and the evaluation of results (Marx, <xref ref-type="bibr" rid="B13">2013</xref>; Fan et al., <xref ref-type="bibr" rid="B6">2014</xref>; Emmert-Streib et al., <xref ref-type="bibr" rid="B5">2016</xref>). Improvements in these areas would undoubtedly make the research process more efficient and its results more reliable. An important case is offered by Baggerly and Coombes (<xref ref-type="bibr" rid="B1">2009</xref>) who, through the <italic>re-analysis</italic> of various data sets from Potti et al. (<xref ref-type="bibr" rid="B15">2011</xref>), found fundamental flaws that ultimately led to the discontinuation of three clinical cancer trials. This became known as the Duke Saga (Kolata, <xref ref-type="bibr" rid="B10">2011</xref>). It is difficult to quantify their impact on the health of patients, but given that they identified erroneous therapeutic interventions based on the work of Dr Potti, it is fair to assume that their work even helped to save the lives of patients. Given this contribution and its clearly beneficial impact on patients, it is stunning that, according to a recent publication by Longo and Drazen (<xref ref-type="bibr" rid="B12">2016</xref>), scientists like Keith Baggerly and Kevin Coombes have been pejoratively characterized as &#x0201C;research parasites.&#x0201D;</p>
<p>Regarding regulations for data sharing, two major points made in a series of papers published in the New England Journal of Medicine (NEJM; Drazen, <xref ref-type="bibr" rid="B4">2016</xref>; Longo and Drazen, <xref ref-type="bibr" rid="B12">2016</xref>; Taichman et al., <xref ref-type="bibr" rid="B17">2016</xref>) were that
<list list-type="simple">
<list-item><p>1. &#x0201C;Those using data collected by others should seek collaboration with those who collected the data&#x0201D; (Taichman et al., <xref ref-type="bibr" rid="B17">2016</xref>)</p></list-item>
</list>
and
<list list-type="simple">
<list-item><p>2. &#x0201C;Report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested&#x0201D; (Longo and Drazen, <xref ref-type="bibr" rid="B12">2016</xref>).</p></list-item>
</list></p>
<p>The initial reaction of the computational research community has not been positive (Berger et al., <xref ref-type="bibr" rid="B2">2016</xref>; McNutt, <xref ref-type="bibr" rid="B14">2016</xref>).</p>
<p>We are of the opinion that both suggestions are reasonable as &#x0201C;can rules&#x0201D; if circumstances allow it; however, we think that neither should be mandatory. The reason for this is simple. Let&#x00027;s say a published data set is re-analyzed, where by this we mean a data set that had to be made publicly available in order to publish major findings in a journal or because of an obligation imposed by a funding agency. In the following we call the scientists generating the data the &#x0201C;experimental party&#x0201D; and the scientists re-analyzing the data the &#x0201C;computational party.&#x0201D; There are three possible outcomes. First, no results are found, which means nothing needs to be published. Second, results are found and both parties are happy with the conclusions. In this case the results can be published and the experimental party could be offered coauthorship, but only if the usual criteria for authorship are met, which require a significant contribution <italic>beyond</italic> merely providing the data. Third, results are found but the parties disagree about the conclusions. This is certainly the most interesting outcome, one that deserves attention, and it is also the case in the Duke Saga. The problem with requiring that the experimental party be named as coauthors is that a conflict of interest could prevent a paper from even being submitted to a journal for review. Hence, such a requirement would give these authors leverage to delay a submission indefinitely. For instance, we could ask ourselves: at what point after the accusation made by Keith Baggerly and Kevin Coombes would Anil Potti have agreed to be a coauthor on the paper by Baggerly and Coombes (<xref ref-type="bibr" rid="B1">2009</xref>)?
The answer to this question is unknown; however, it is not difficult to see that such a &#x0201C;must&#x0201D; rule implies problems that are clearly not beneficial for the patients enrolled in clinical trials based on flawed results.</p>
<p>From the outline of these problems, we suggest the following rules for data sharing:
<list list-type="simple">
<list-item><p>Mandatory rules:</p></list-item>
<list-item><p>M1 In the publication of an article re-analyzing published data, add a citation to the original publication(s) of the data.</p></list-item>
<list-item><p>M2 Any communication with the experimental party should be acknowledged in the published article.</p></list-item>
<list-item><p>M3 The code used for re-analyzing the data should be made publicly available.</p></list-item>
<list-item><p>Optional rule:</p></list-item>
<list-item><p>O1 If the computational and the experimental parties agree on the research findings and declare no conflict of interest, and the experimental party contributes significantly to the re-analysis, both parties should receive authorship.</p></list-item>
</list></p>
<p>In addition to this, we consider it obligatory for journals that have published articles which turn out to be erroneous to also publish the articles revealing these issues. For instance, Anil Potti had to retract papers published in Nature and Science but the paper by Keith Baggerly and Kevin Coombes was not accepted there; instead, it appeared in the Annals of Applied Statistics (Baggerly and Coombes, <xref ref-type="bibr" rid="B1">2009</xref>). This is not acceptable!</p>
<p>The above rules M1&#x02013;M3 will ensure that the re-analysis of data can &#x0201C;disprove what the original investigators had posited&#x0201D; (Longo and Drazen, <xref ref-type="bibr" rid="B12">2016</xref>), because if the initial analysis is wrong, this needs to be revealed without any hesitation or qualification.</p>
<p>From a more fundamental point of view, the above question of data sharing has an analogy with capitalism. The reason for this is that in capitalism the capital (money) can generate more capital without labor by means of interest. In our case the new capital is data which, according to the rules suggested by Longo and Drazen (<xref ref-type="bibr" rid="B12">2016</xref>), Drazen (<xref ref-type="bibr" rid="B4">2016</xref>), and Taichman et al. (<xref ref-type="bibr" rid="B17">2016</xref>), can generate authorship(s) ad infinitum without any contribution to the re-analysis of the data. As such it would completely change science as we know it. That means the question we need to ask ourselves is whether we want a dataism (Lohr, <xref ref-type="bibr" rid="B11">2015</xref>) in science that allows such a monopoly. We are strictly against such a monopoly based on data and for this reason have suggested publication rules that prevent this from happening, and we plead for data sharing with &#x0201C;research parasites&#x0201D; in the interest of the patients from whom the data originate.</p>
<sec id="s1">
<title>Author contributions</title>
<p>FE conceived the study. FE, MD, and OY wrote the paper.</p>
</sec>
<sec id="s2">
<title>Funding</title>
<p>FE would like to thank TUT for financial support. MD thanks the Austrian Science Fund for supporting this work (project P26142).</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack><p>We would like to thank Galina Glazko and Benjamin Haibe-Kains for fruitful discussions and suggestions on the manuscript. For professional proofreading of the manuscript we would like to thank B&#x000E1;rbara Mac&#x000ED;as Sol&#x000ED;s.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baggerly</surname> <given-names>K. A.</given-names></name> <name><surname>Coombes</surname> <given-names>K. R.</given-names></name></person-group> (<year>2009</year>). <article-title>Deriving chemosensitivity from cell lines: forensic bioinformatics and reproducible research in high-throughput biology</article-title>. <source>Ann. Appl. Stat.</source> <volume>3</volume>, <fpage>1309</fpage>&#x02013;<lpage>1334</lpage>. <pub-id pub-id-type="doi">10.1214/09-AOAS291</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berger</surname> <given-names>B.</given-names></name> <name><surname>Gaasterland</surname> <given-names>T.</given-names></name> <name><surname>Lengauer</surname> <given-names>T.</given-names></name> <name><surname>Orengo</surname> <given-names>C.</given-names></name> <name><surname>Gaeta</surname> <given-names>B.</given-names></name> <name><surname>Markel</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>ISCB&#x00027;s initial reaction to <italic>The New England Journal of Medicine</italic> editorial on data sharing</article-title>. <source>PLoS Comput. Biol.</source> <volume>12</volume>:<fpage>e1004816</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004816</pub-id><pub-id pub-id-type="pmid">27010398</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chalmers</surname> <given-names>I.</given-names></name> <name><surname>Glasziou</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Avoidable waste in the production and reporting of research evidence</article-title>. <source>Obstet. Gynecol.</source> <volume>114</volume>, <fpage>1341</fpage>&#x02013;<lpage>1345</lpage>. <pub-id pub-id-type="doi">10.1097/AOG.0b013e3181c3020d</pub-id><pub-id pub-id-type="pmid">19935040</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Drazen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Data sharing and the <italic>journal</italic></article-title>. <source>N. Engl. J. Med.</source> <volume>374</volume>:<fpage>e24</fpage>. <pub-id pub-id-type="doi">10.1056/NEJMe1601087</pub-id><pub-id pub-id-type="pmid">26808582</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Emmert-Streib</surname> <given-names>F.</given-names></name> <name><surname>Moutari</surname> <given-names>S.</given-names></name> <name><surname>Dehmer</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>The process of analyzing data is the emergent feature of data science</article-title>. <source>Front. Genet.</source> <volume>7</volume>:<issue>12</issue>. <pub-id pub-id-type="doi">10.3389/fgene.2016.00012</pub-id><pub-id pub-id-type="pmid">26904100</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fan</surname> <given-names>J.</given-names></name> <name><surname>Han</surname> <given-names>F.</given-names></name> <name><surname>Liu</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Challenges of big data analysis</article-title>. <source>Natl. Sci. Rev.</source> <volume>1</volume>, <fpage>293</fpage>&#x02013;<lpage>314</lpage>. <pub-id pub-id-type="doi">10.1093/nsr/nwt032</pub-id><pub-id pub-id-type="pmid">25419469</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioannidis</surname> <given-names>J. P.</given-names></name> <name><surname>Fanelli</surname> <given-names>D.</given-names></name> <name><surname>Dunne</surname> <given-names>D. D.</given-names></name> <name><surname>Goodman</surname> <given-names>S. N.</given-names></name></person-group> (<year>2015</year>). <article-title>Meta-research: evaluation and improvement of research methods and practices</article-title>. <source>PLoS Biol.</source> <volume>13</volume>:<fpage>e1002264</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pbio.1002264</pub-id><pub-id pub-id-type="pmid">26431313</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioannidis</surname> <given-names>J. P. A.</given-names></name></person-group> (<year>2005</year>). <article-title>Why most published research findings are false</article-title>. <source>PLoS Med.</source> <volume>2</volume>:<fpage>e124</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pmed.0020124</pub-id><pub-id pub-id-type="pmid">16060722</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ioannidis</surname> <given-names>J. P. A.</given-names></name> <name><surname>Khoury</surname> <given-names>M. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Improving validation practices in &#x0201C;omics&#x0201D; research</article-title>. <source>Science</source> <volume>334</volume>, <fpage>1230</fpage>&#x02013;<lpage>1232</lpage>. <pub-id pub-id-type="doi">10.1126/science.1211811</pub-id><pub-id pub-id-type="pmid">22144616</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Kolata</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>How bright promise in cancer testing fell apart</article-title>. <source>The New York Times</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.nytimes.com/2011/07/08/health/research/08genes.html?_r=0">http://www.nytimes.com/2011/07/08/health/research/08genes.html?_r=0</ext-link></citation>
</ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lohr</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <source>Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else.</source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>HarperCollins</publisher-name>.</citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Longo</surname> <given-names>D. L.</given-names></name> <name><surname>Drazen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2016</year>). <article-title>Data sharing</article-title>. <source>N. Engl. J. Med.</source> <volume>374</volume>, <fpage>276</fpage>&#x02013;<lpage>277</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMe1516564</pub-id><pub-id pub-id-type="pmid">27168446</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marx</surname> <given-names>V.</given-names></name></person-group> (<year>2013</year>). <article-title>Biology: the big challenges of big data</article-title>. <source>Nature</source> <volume>498</volume>, <fpage>255</fpage>&#x02013;<lpage>260</lpage>. <pub-id pub-id-type="doi">10.1038/498255a</pub-id><pub-id pub-id-type="pmid">23765498</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McNutt</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x00023;IAmAResearchParasite</article-title>. <source>Science</source> <volume>351</volume>, <fpage>1005</fpage>&#x02013;<lpage>1005</lpage>. <pub-id pub-id-type="doi">10.1126/science.aaf4701</pub-id><pub-id pub-id-type="pmid">26941292</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Potti</surname> <given-names>A.</given-names></name> <name><surname>Dressman</surname> <given-names>H. K.</given-names></name> <name><surname>Bild</surname> <given-names>A.</given-names></name> <name><surname>Riedel</surname> <given-names>R. F.</given-names></name> <name><surname>Chan</surname> <given-names>G.</given-names></name> <name><surname>Sayer</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>Retraction: genomic signatures to guide the use of chemotherapeutics</article-title>. <source>Nat. Med.</source> <volume>17</volume>:<fpage>135</fpage>. <pub-id pub-id-type="doi">10.1038/nm0111-135</pub-id><pub-id pub-id-type="pmid">21217686</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rung</surname> <given-names>J.</given-names></name> <name><surname>Brazma</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Reuse of public genome-wide gene expression data</article-title>. <source>Nat. Rev. Genet.</source> <volume>14</volume>, <fpage>89</fpage>&#x02013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3394</pub-id><pub-id pub-id-type="pmid">23269463</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taichman</surname> <given-names>D. B.</given-names></name> <name><surname>Backus</surname> <given-names>J.</given-names></name> <name><surname>Baethge</surname> <given-names>C.</given-names></name> <name><surname>Bauchner</surname> <given-names>H.</given-names></name> <name><surname>de Leeuw</surname> <given-names>P. W.</given-names></name> <name><surname>Drazen</surname> <given-names>J. M.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors</article-title>. <source>N. Engl. J. Med.</source> <volume>374</volume>, <fpage>384</fpage>&#x02013;<lpage>386</lpage>. <pub-id pub-id-type="doi">10.1056/NEJMe1515172</pub-id><pub-id pub-id-type="pmid">26786954</pub-id></citation>
</ref>
</ref-list>
</back>
</article>
