<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Appl. Math. Stat.</journal-id>
<journal-title>Frontiers in Applied Mathematics and Statistics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Appl. Math. Stat.</abbrev-journal-title>
<issn pub-type="epub">2297-4687</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">668082</article-id>
<article-id pub-id-type="doi">10.3389/fams.2021.668082</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Applied Mathematics and Statistics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Topology-Inspired Method Recovers Obfuscated Term Information From Induced Software Call-Stacks</article-title>
<alt-title alt-title-type="left-running-head">Maggs and Robins</alt-title>
<alt-title alt-title-type="right-running-head">Topology of Call Stacks</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Maggs</surname>
<given-names>Kelly</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1302589/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Robins</surname>
<given-names>Vanessa</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1063092/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Mathematical Sciences Institute, Australian National University, <addr-line>Canberra</addr-line>, <addr-line>ACT</addr-line>, <country>Australia</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Research School of Physics, Australian National University, <addr-line>Canberra</addr-line>, <addr-line>ACT</addr-line>, <country>Australia</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/982034/overview">Fr&#xe9;d&#xe9;ric Chazal</ext-link>, Inria Saclay-&#xce;le-de-France Research Center, France</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1274012/overview">Andr&#xe9; Lieutier</ext-link>, Dassault Syst&#xe8;mes, France</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1290096/overview">Vincent Rouvreau</ext-link>, Inria Saclay-&#xce;le-de-France Research Center, France</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Vanessa Robins, <email>vanessa.robins@anu.edu.au</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>05</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>7</volume>
<elocation-id>668082</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>02</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>03</day>
<month>05</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Maggs and Robins.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Maggs and Robins</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Fuzzing is a systematic large-scale search for software vulnerabilities achieved by feeding a sequence of randomly mutated input files to the program of interest with the goal of inducing a crash. The information about inputs, software execution traces, and induced call stacks (crashes) can be used to pinpoint and fix errors in the code, or can be exploited as a means to damage an adversary&#x2019;s computer software. In black-box fuzzing, the primary unit of information is the call stack: a list of nested function calls and line numbers that report what the code was executing at the time it crashed. The source code is not always available in practice, and in some situations even the function names are deliberately obfuscated (i.e.,&#x20;removed or given generic names). We define a topological object called the call-stack topology to capture the relationships between module names, function names and line numbers in a set of call stacks obtained via black-box fuzzing. In a proof-of-concept study, we show that structural properties of this object in combination with two elementary heuristics allow us to build a logistic regression model to predict the locations of distinct function names over a set of call stacks. We show that this model can extract function name locations with around 80% precision in data obtained from fuzzing studies of various Linux programs. This has the potential to benefit software vulnerability experts by helping them read and compare call stacks more efficiently.</p>
</abstract>
<kwd-group>
<kwd>fuzzing</kwd>
<kwd>crash-triage</kwd>
<kwd>software vulnerability research</kwd>
<kwd>call-stack analysis</kwd>
<kwd>topology</kwd>
<kwd>TDA</kwd>
<kwd>specialization pre-order</kwd>
</kwd-group>
<contract-sponsor id="cn001">Australian Research Council 10.13039/501100000923</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>A black-box fuzzing campaign is one conducted without explicit knowledge of the source code or its intermediate representations. Generally, methods in this area require brute-force generation of inputs. This can produce a large number of crashes, many of which are duplicates of one another. For practitioners, untangling the output of a black-box fuzzing campaign is a time-consuming task. The goal of this article is to investigate methods that alleviate the difficulty of comprehending such results.</p>
<sec id="s1-1">
<title>1.1 Call Stacks</title>
<p>When a program crashes, the error text it returns to the user is referred to as the call-stack (see the example in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>). The call-stack is a record of the nested functions traced out by the program in its final moments and is one of the few pieces of information available to us when analyzing the results of black-box fuzzing. The lines in the call-stack are called frames and, although their exact form depends on the operating system&#x2019;s debugging syntax, they decompose roughly into three columns: 1) the module (or filename), 2) the function and 3) the line number. We will refer to the set of all constituent modules, functions and line numbers in a set of call-stacks <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> as the <italic>terms</italic> in&#x20;<inline-formula id="inf2">
<mml:math id="m2">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>A simplified call-stack in our data-set with and without function terms.</p>
</caption>
<graphic xlink:href="fams-07-668082-g001.tif"/>
</fig>
<p>Further complicating matters, call-stacks may have partial information excluded, depending on whether the source code is available. In particular, when fuzzing programs without possession of the source code or full knowledge of the terms, the partially obscured call-stack may be the only source of information available.</p>
</sec>
<sec id="s1-2">
<title>1.2 Goals</title>
<p>ANU researchers [<xref ref-type="bibr" rid="B1">1</xref>] provided us with a data set of call stacks generated by fuzzing several Linux programs with the afl fuzzing algorithm (see [<xref ref-type="bibr" rid="B2">2</xref>]). They were interested in answering two key research questions:<list list-type="simple">
<list-item>
<p>1. Clustering and Deduplication: determining the extent to which there are discernible clusters in the set of call-stacks.</p>
</list-item>
<list-item>
<p>2. Term Removal: quantifying how much information about function terms can be recovered given they are obscured (for example, as in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>).</p>
</list-item>
</list>
</p>
<p>While the first question has been studied in a number of contexts [<xref ref-type="bibr" rid="B3">3</xref>, <xref ref-type="bibr" rid="B4">4</xref>], to the best of our knowledge no attempt has been made at the second. In this paper, we show that once the data has been suitably whitelisted, the set of crashes contains a high number of exact duplicate call-stacks. This observation highlights a fundamental lack of diversity in the data generated by fuzzing, and alone is enough to answer the first question to a large extent.</p>
<p>To address the question of function term removal, we introduce a model of call-stack information using finite topological spaces, posets and the theory of [<xref ref-type="bibr" rid="B5">5</xref>]. This not only helps to quantify the significance of removing function terms, but also provides a useful object for capturing the dependencies between terms in the set of call-stacks.</p>
</sec>
</sec>
<sec id="s2">
<title>2 Data-Set Overview</title>
<p>Six common Linux programs were fuzzed using the program afl. Key aspects of each program (the binary name, file extension and version) are presented in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. Call-stacks were generated within the framework of the GDB debugger using the AddressSanitizer (ASAN) [<xref ref-type="bibr" rid="B6">6</xref>]&#x20;tool.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>The number of crashes generated by fuzzing each program, and the number of unique crashes after whitelisting and removing exact duplicates. Within the set of call-stacks, some files were either blank or could not be opened. We discarded such files; their counts appear in the second column from the&#x20;right.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Binary</th>
<th align="center">Extension</th>
<th align="center">Version</th>
<th align="center">Call stacks</th>
<th align="center">Discarded (per 1,000)</th>
<th align="center">Distinct</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">SoX</td>
<td align="center">mp3</td>
<td align="char" char=".">14.4.2</td>
<td align="char" char=".">40,017</td>
<td align="char" char="(">1 (0.02)</td>
<td align="char" char=".">12</td>
</tr>
<tr>
<td align="left">Librsvg</td>
<td align="center">svg</td>
<td align="char" char=".">2.40.20</td>
<td align="char" char=".">6,276</td>
<td align="char" char="(">94 (15)</td>
<td align="char" char=".">68</td>
</tr>
<tr>
<td align="left">Libtiff</td>
<td align="center">tiff</td>
<td align="char" char=".">4.0.9</td>
<td align="char" char=".">5,486</td>
<td align="char" char="(">2 (0.36)</td>
<td align="char" char=".">9</td>
</tr>
<tr>
<td align="left">Freetype</td>
<td align="center">ttf</td>
<td align="char" char=".">2.5.3</td>
<td align="char" char=".">17,034</td>
<td align="char" char="(">1 (0.06)</td>
<td align="char" char=".">51</td>
</tr>
<tr>
<td align="left">SoX</td>
<td align="center">wav</td>
<td align="char" char=".">14.4.2</td>
<td align="char" char=".">30,856</td>
<td align="char" char="(">1 (0.03)</td>
<td align="char" char=".">11</td>
</tr>
<tr>
<td align="left">Libxml2</td>
<td align="center">xml</td>
<td align="char" char=".">2.9.0</td>
<td align="char" char=".">240,821</td>
<td align="char" char="(">7,467 (31)</td>
<td align="char" char=".">3,175</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>On the recommendation of the ANU cyber-security researchers, we applied several whitelisting pre-processing steps to each call-stack text file. First, frames appearing up to and including the ASAN error frame were considered superfluous and hence deleted. In crashes that did not call the ASAN module, frames up to assert_fail were deleted. For every file, the two final generic end-of-file frames were deleted. Finally, we extracted three salient features from each frame (the module, the function and the intra-module line number) and discarded the remaining information in the call-stack&#x20;file.</p>
<p>Unlike the afl protocol, where crashes are de-duplicated based on a hashing scheme, we labeled two call-stacks as duplicates whenever their text files were identical after the pre-processing described above. A striking result of frequency analysis is that there are dramatically fewer distinct crashes relative to total crashes (see <xref ref-type="table" rid="T1">Table&#x20;1</xref>). Further, the frequency of distinct crashes is unevenly distributed. Across programs, the call-stack data displayed largely the same pattern: most of the weight was distributed among a few crashes, with the rest rarely occurring (see <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>).</p>
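<p>The exact-duplicate criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: each pre-processed call-stack is assumed to have been reduced to a tuple of (module, function, line) frames, and the frame values and the helper name <monospace>frequency_table</monospace> are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter

# Hypothetical pre-processed call-stacks: each one is a tuple of
# (module, function, line) frames. All values are invented for illustration.
stacks = [
    (("parser.c", "parse_node", 120), ("main.c", "main", 42)),
    (("parser.c", "parse_node", 120), ("main.c", "main", 42)),
    (("tree.c", "free_node", 77), ("main.c", "main", 42)),
]


def frequency_table(call_stacks):
    """Count exact duplicates: two stacks match only if every frame agrees."""
    return Counter(call_stacks)


counts = frequency_table(stacks)

# List distinct call-stacks from most to least frequent.
for stack, occurrences in counts.most_common():
    print(occurrences, stack[0])
```
]]></preformat>
<p>Sorting the resulting counts by frequency gives the kind of distribution summarized per binary, with a few distinct call-stacks accounting for most crashes.</p>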
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Frequency (logarithmic scale) distribution of crashes in each binary, where the crash ID is with respect to distinct call-stacks.</p>
</caption>
<graphic xlink:href="fams-07-668082-g002.tif"/>
</fig>
</sec>
<sec id="s3">
<title>3 Topological Model</title>
<p>In this section, we propose a model to frame the complex dependency relationships between the terms <inline-formula id="inf3">
<mml:math id="m3">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula> appearing across a set of call-stacks <inline-formula id="inf4">
<mml:math id="m4">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>. Our model is inspired by the work of [<xref ref-type="bibr" rid="B5">5</xref>] on finite topological spaces, where pre-orders, equivalence classes and posets capture certain topological interactions between points. Our applications will primarily use the poset representation of the data, but we have included the topological perspective that motivated the original idea in the hope that future work may further incorporate the topological characteristics of the&#x20;model.</p>
<p>Recall that in any topological space <italic>X</italic>, the open neighborhood <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of a point <inline-formula id="inf6">
<mml:math id="m6">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the set of open sets containing <italic>x</italic>. A rough intuition for point-set topologies over finite sets is that elements are considered close when they have similar open neighborhoods. Our goal is to create a topology over the set of terms <inline-formula id="inf7">
<mml:math id="m7">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula> in which terms that occur in a similar set of call-stacks are&#x20;close.</p>
<sec id="s3-1">
<title>3.1 Call-Stack Topology</title>
<p>Given a set of call-stacks <inline-formula id="inf8">
<mml:math id="m8">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> comprised of terms <inline-formula id="inf9">
<mml:math id="m9">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula>, we define the call-stack topology <inline-formula id="inf10">
<mml:math id="m10">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> on <inline-formula id="inf11">
<mml:math id="m11">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula> to be that generated by treating each call-stack <inline-formula id="inf12">
<mml:math id="m12">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as an open set of terms. The complete collection of open sets in <inline-formula id="inf13">
<mml:math id="m13">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is then formed by taking all possible intersections and unions of call-stacks from&#x20;<inline-formula id="inf14">
<mml:math id="m14">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>.</p>
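<p>For a small set of call-stacks, the open sets of this topology can be enumerated directly. The sketch below is illustrative only: the function name <monospace>generate_topology</monospace> and the toy terms are assumptions, each call-stack is treated as an unordered set of terms, and the empty intersection is taken to be the full set of terms.</p>
<preformat><![CDATA[
```python
from itertools import combinations


def generate_topology(call_stacks):
    """Return all open sets of the topology generated by the given call-stacks."""
    subbasis = {frozenset(stack) for stack in call_stacks}
    universe = frozenset().union(*subbasis)

    # Basis: every finite intersection of call-stacks (the empty
    # intersection is taken to be the whole set of terms).
    basis = {universe}
    for size in range(1, len(subbasis) + 1):
        for group in combinations(subbasis, size):
            basis.add(frozenset.intersection(*group))

    # Open sets: every union of basis elements (the empty union is the empty set).
    open_sets = {frozenset()}
    for element in basis:
        open_sets |= {opened | element for opened in open_sets}
    return open_sets


# Two hypothetical call-stacks sharing a module term and a function term.
stacks = [{"modA", "f", "g"}, {"modA", "f", "h"}]
topology = generate_topology(stacks)
for open_set in sorted(topology, key=lambda s: (len(s), sorted(s))):
    print(sorted(open_set))
```
]]></preformat>
<p>The enumeration grows exponentially with the number of call-stacks, so it is only practical for toy examples such as this one, which yields five open sets.</p>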
<p>Unlike many topologies we are familiar with, e.g., topologies generated by open balls in a metric space, the call-stack topology is seldom Hausdorff. In our context, the looser criterion of a <inline-formula id="inf15">
<mml:math id="m15">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> space is a more useful notion of point separation. Recall that a <inline-formula id="inf16">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> space <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is one where points may be distinguished by their open neighborhoods; explicitly, for each pair of distinct points <inline-formula id="inf18">
<mml:math id="m18">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> there exists either an open set in <inline-formula id="inf19">
<mml:math id="m19">
<mml:mi mathvariant="script">N</mml:mi>
</mml:math>
</inline-formula> containing <italic>x</italic> without <italic>y</italic> or <italic>y</italic> without&#x20;<italic>x</italic>.</p>
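<p>The separation condition is easy to test on a finite space. The following sketch assumes the open sets are available as an explicit collection; the function name <monospace>is_t0</monospace> and both example spaces are illustrative choices rather than data from the study.</p>
<preformat><![CDATA[
```python
from itertools import combinations


def is_t0(points, open_sets):
    """True if every pair of distinct points is separated by some open set."""
    for x, y in combinations(list(points), 2):
        # An open set separates x and y if it contains exactly one of them.
        separated = any((x in u) != (y in u) for u in open_sets)
        if not separated:
            return False
    return True


# Sierpinski space: T0 but not Hausdorff.
sierpinski = [set(), {1}, {0, 1}]
print(is_t0({0, 1}, sierpinski))  # True

# Hypothetical call-stack topology in which "modA" and "f" always co-occur,
# so no open set contains one of them without the other.
terms = {"modA", "f", "g", "h"}
opens = [set(), {"modA", "f"}, {"modA", "f", "g"}, {"modA", "f", "h"}, terms]
print(is_t0(terms, opens))  # False
```
]]></preformat>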
<p>Since distinct terms can appear in the same subset of call-stacks, <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is not even <inline-formula id="inf21">
<mml:math id="m21">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. However, we can transform <inline-formula id="inf22">
<mml:math id="m22">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> into a <inline-formula id="inf23">
<mml:math id="m23">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> space by taking the Kolmogorov quotient (see [<xref ref-type="bibr" rid="B7">7</xref>]). The Kolmogorov quotient <inline-formula id="inf24">
<mml:math id="m24">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is obtained from <inline-formula id="inf25">
<mml:math id="m25">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> by the equivalence relation <inline-formula id="inf26">
<mml:math id="m26">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, which holds whenever <italic>x</italic> and <italic>y</italic> have the same open neighborhood, <inline-formula id="inf27">
<mml:math id="m27">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. It is known that <inline-formula id="inf28">
<mml:math id="m28">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>X</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf29">
<mml:math id="m29">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> have the same homotopy type. By taking the Kolmogorov quotient of the call-stack topology, one reduces the object of study from a potentially large set of terms <inline-formula id="inf30">
<mml:math id="m30">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula> into a more manageable set of equivalence classes of terms <inline-formula id="inf31">
<mml:math id="m31">
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The following simple lemma shows that one may characterize equivalence classes in the Kolmogorov quotient of the call-stack topology by examining the set of call-stacks directly rather than the topology. For <inline-formula id="inf32">
<mml:math id="m32">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, we refer to the set of call-stacks in <inline-formula id="inf33">
<mml:math id="m33">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> which contain <italic>t</italic> as the call-stack neighborhood, using the notation&#x20;<inline-formula id="inf34">
<mml:math id="m34">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
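<p>Call-stack neighborhoods can be computed in a single pass over the data, and terms can then be grouped by neighborhood. The sketch below is a minimal illustration: the function names and toy terms are assumptions, call-stacks are identified by their position in a de-duplicated list, and grouping by equal neighborhoods relies on the lemma that follows.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def call_stack_neighborhoods(call_stacks):
    """Map each term t to C(t): the indices of the call-stacks containing t."""
    neighborhoods = defaultdict(set)
    for index, stack in enumerate(call_stacks):
        for term in stack:
            neighborhoods[term].add(index)
    return {term: frozenset(ids) for term, ids in neighborhoods.items()}


def equivalence_classes(call_stacks):
    """Group terms that share the same call-stack neighborhood."""
    classes = defaultdict(set)
    for term, ids in call_stack_neighborhoods(call_stacks).items():
        classes[ids].add(term)
    return [frozenset(terms) for terms in classes.values()]


# Two hypothetical call-stacks: "modA" and "f" occur in both of them.
stacks = [{"modA", "f", "g"}, {"modA", "f", "h"}]
neighborhoods = call_stack_neighborhoods(stacks)
classes = equivalence_classes(stacks)
for terms in sorted(classes, key=lambda s: sorted(s)):
    print(sorted(terms))
```
]]></preformat>
<p>In this toy example the four terms collapse to three classes, since the two terms that appear in both call-stacks cannot be told apart.</p>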
<p>L<sc>emma</sc> 1. For a set of call-stacks <inline-formula id="inf35">
<mml:math id="m35">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> comprised of terms <inline-formula id="inf36">
<mml:math id="m36">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula>, two terms <inline-formula id="inf37">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are equivalent if and only if <inline-formula id="inf38">
<mml:math id="m38">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>P<sc>ROOF</sc>. The definition of the equivalence relation is <inline-formula id="inf39">
<mml:math id="m39">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> whenever <inline-formula id="inf40">
<mml:math id="m40">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the call-stack topology. Hence, we need to show that the open neighborhoods <inline-formula id="inf41">
<mml:math id="m41">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are equal if and only if the call-stack neighborhoods <inline-formula id="inf42">
<mml:math id="m42">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are&#x20;equal.</p>
<p>Suppose that <inline-formula id="inf43">
<mml:math id="m43">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Without loss of generality, suppose there exists <inline-formula id="inf44">
<mml:math id="m44">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> such that <inline-formula id="inf45">
<mml:math id="m45">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf46">
<mml:math id="m46">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. By the definition of the call-stack topology, <italic>c</italic> is open and hence a member of <inline-formula id="inf47">
<mml:math id="m47">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Since <inline-formula id="inf48">
<mml:math id="m48">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, it follows that <inline-formula id="inf49">
<mml:math id="m49">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, proving one direction of the statement.</p>
<p>Conversely, suppose that <inline-formula id="inf50">
<mml:math id="m50">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, and further suppose that <inline-formula id="inf51">
<mml:math id="m51">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. All open sets in the topology generated by a set <inline-formula id="inf52">
<mml:math id="m52">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> may be expressed in the form<disp-formula id="e1">
<mml:math id="m53">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x222a;</mml:mo>
<mml:mi>j</mml:mi>
</mml:munder>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where each <inline-formula id="inf53">
<mml:math id="m54">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a generating set. The assumption <inline-formula id="inf54">
<mml:math id="m55">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> implies that <inline-formula id="inf55">
<mml:math id="m56">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>U</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and further that there exists <italic>j</italic> such that <inline-formula id="inf56">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> for all <italic>i</italic>. Since we have assumed that <inline-formula id="inf57">
<mml:math id="m58">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf58">
<mml:math id="m59">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula> implies that <inline-formula id="inf59">
<mml:math id="m60">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
<mml:mo>&#x2286;</mml:mo>
<mml:mi>U</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as well. This implies that <inline-formula id="inf60">
<mml:math id="m61">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. By the same argument <inline-formula id="inf61">
<mml:math id="m62">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, thus <inline-formula id="inf62">
<mml:math id="m63">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, finishing the&#x20;proof.</p>
<p>According to the above lemma, equivalence classes in the Kolmogorov quotient of the call-stack topology consist of terms that occur in the same set of call-stacks. The intuition is that by taking the Kolmogorov quotient, we only consider terms up to the information of which call-stacks they appear in. The composition of equivalence classes in such a quotient will be a key feature for analysis in our application.</p>
<p>In principle, computing these equivalence classes requires knowledge of the open neighbourhoods and hence of every open set in the call-stack topology. Besides providing useful intuition, the above lemma lets us avoid this computationally expensive task: equivalence classes can be obtained indirectly by comparing the call-stack neighbourhoods of pairs of&#x20;terms.</p>
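<p>The comparison of call-stack neighbourhoods described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names and the toy call-stacks are hypothetical, and each call-stack is assumed to have already been reduced to its set of terms.</p>
<preformat>
```python
from collections import defaultdict


def call_stack_neighbourhood(term, call_stacks):
    """Return the indices of the call-stacks that contain `term`."""
    return frozenset(i for i, stack in enumerate(call_stacks) if term in stack)


def equivalence_classes(call_stacks):
    """Group terms that occur in exactly the same set of call-stacks."""
    terms = set().union(*call_stacks)
    classes = defaultdict(set)
    for term in terms:
        # Terms with identical neighbourhoods land in the same bucket.
        classes[call_stack_neighbourhood(term, call_stacks)].add(term)
    return dict(classes)


# Hypothetical toy data: three call-stacks given as sets of terms.
stacks = [
    {"main", "parse", "lex", "read"},
    {"main", "parse", "lex", "write"},
    {"main", "log"},
]
classes = equivalence_classes(stacks)
```
</preformat>
<p>In this toy example the terms parse and lex fall into one class because they occur in the same two call-stacks, and no open set of the topology is ever enumerated.</p>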
<p>Example 1. In <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>, we depict a set of three call-stacks. In the center of the figure, each of the three circles represents a generating set for the call-stack topology <inline-formula id="inf63">
<mml:math id="m64">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> over the constituent terms <inline-formula id="inf64">
<mml:math id="m65">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula> of <inline-formula id="inf65">
<mml:math id="m66">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>. The coloring of the terms represents their partition into equivalence classes under the Kolmogorov quotient operation. Following Lemma 1, equivalence classes consist of terms sharing identical call-stack neighbourhoods. This example also highlights that the model disregards both the ordering of terms in a call-stack and the frequency of each term within&#x20;it.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Three call-stacks <inline-formula id="inf66">
<mml:math id="m67">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> and their corresponding generating sets over their constituent terms <inline-formula id="inf67">
<mml:math id="m68">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula>.</p>
</caption>
<graphic xlink:href="fams-07-668082-g003.tif"/>
</fig>
</sec>
<sec id="s3-2">
<title>3.2 Call-Stack Partial Order</title>
<p>In this section we equip the set of call-stack terms with the additional structures of a pre-order and a partial order. Our approach in later sections is to use this structure to examine relations between terms in different equivalence classes. For any topological space, one may use the structure of the open sets to define a pre-order over its points called the specialization pre-order. It may be defined by either of the following equivalent statements.</p>
<p>D<sc>efinition</sc> 1. <italic>For a topological space X, the</italic> specialization pre-order <inline-formula id="inf68">
<mml:math id="m69">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>&#x2264;</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> over X is given by either<disp-formula id="equ1">
<mml:math id="m70">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mtext>&#x2009;whenever&#x2009;</mml:mtext>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>or equivalently<disp-formula id="equ2">
<mml:math id="m71">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mtext>&#x2009;whenever&#x2009;</mml:mtext>
<mml:mi>x</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mi>U</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>The specialization pre-order forms a partial order over <italic>X</italic> precisely when <italic>X</italic> is a <inline-formula id="inf69">
<mml:math id="m72">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> space, with the <inline-formula id="inf70">
<mml:math id="m73">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> condition ensuring that the order relation satisfies the anti-symmetry condition: <inline-formula id="inf71">
<mml:math id="m74">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf72">
<mml:math id="m75">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> implies <inline-formula id="inf73">
<mml:math id="m76">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>D<sc>efinition</sc> 2. The call-stack pre-order on a set of call-stacks <inline-formula id="inf74">
<mml:math id="m77">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2264;</mml:mo>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the specialization pre-order over the call-stack topology <inline-formula id="inf75">
<mml:math id="m78">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>
D<sc>efinition</sc> 3. The call-stack poset on a set of call-stacks <inline-formula id="inf76">
<mml:math id="m79">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the specialization pre-order over the Kolmogorov quotient <inline-formula id="inf77">
<mml:math id="m80">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the call-stack topology.</p>
<p>Unlike the call-stack topology <inline-formula id="inf78">
<mml:math id="m81">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in general, the Kolmogorov quotient <inline-formula id="inf79">
<mml:math id="m82">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of the call-stack topology is guaranteed to be a <inline-formula id="inf80">
<mml:math id="m83">
<mml:mrow>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> space (see [<xref ref-type="bibr" rid="B7">7</xref>] for a full survey of Kolmogorov quotients). Note here that the call-stack poset is defined over equivalence classes of terms within the call-stacks, rather than over the individual terms themselves. In moving to this construction, we reduce the space of information we are working with; order-theoretic notions are considered between blocks of terms rather than individual&#x20;ones.</p>
<p>As is the case for equivalence classes, the call-stack partial order can be computed without explicitly calculating the open sets in the call-stack topology.</p>
<p>L<sc>emma</sc> 2. <italic>Two classes of terms</italic> <inline-formula id="inf81">
<mml:math id="m84">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <italic>satisfy an order relation</italic> <inline-formula id="inf82">
<mml:math id="m85">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <italic>in the call-stack partial order if and only if</italic> <inline-formula id="inf83">
<mml:math id="m86">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
<italic>.</italic>
</p>
<p>P<sc>roof</sc>. Suppose first that <inline-formula id="inf84">
<mml:math id="m87">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Let <inline-formula id="inf85">
<mml:math id="m88">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be an open neighbourhood of <inline-formula id="inf86">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> in the call-stack topology. As in Equation 1, U may be expressed in the form<disp-formula id="equ3">
<mml:math id="m90">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x222a;</mml:mo>
<mml:mi>j</mml:mi>
</mml:munder>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>where there exists j such that <inline-formula id="inf87">
<mml:math id="m91">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula>. Since <inline-formula id="inf88">
<mml:math id="m92">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, it follows that <inline-formula id="inf89">
<mml:math id="m93">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mstyle displaystyle="true">
<mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</inline-formula> also, implying that <inline-formula id="inf90">
<mml:math id="m94">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf91">
<mml:math id="m95">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Thus, <inline-formula id="inf92">
<mml:math id="m96">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> as required.</p>
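<p>In the other direction, suppose that <inline-formula id="inf93">
<mml:math id="m97">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is not a subset of <inline-formula id="inf94">
<mml:math id="m98">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Then there exists <inline-formula id="inf95">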
<mml:math id="m99">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> such that <inline-formula id="inf96">
<mml:math id="m100">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2209;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Since c is open in the call-stack topology, <inline-formula id="inf97">
<mml:math id="m101">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf98">
<mml:math id="m102">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2209;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, implying that <inline-formula id="inf99">
<mml:math id="m103">
<mml:mrow>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2288;</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and thus <inline-formula id="inf100">
<mml:math id="m104">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2270;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The above lemma suggests how one should interpret the call-stack partial order: two sets of terms <inline-formula id="inf101">
<mml:math id="m105">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> satisfy an order relation <inline-formula id="inf102">
<mml:math id="m106">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> when every call-stack containing the <inline-formula id="inf103">
<mml:math id="m107">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> terms also contains the <inline-formula id="inf104">
<mml:math id="m108">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> terms. In this sense, witnessing the terms in <inline-formula id="inf105">
<mml:math id="m109">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> depends on witnessing the terms in <inline-formula id="inf106">
<mml:math id="m110">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> across the call-stacks in&#x20;<inline-formula id="inf107">
<mml:math id="m111">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>.</p>
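<p>The criterion of Lemma 2 translates directly into a subset test on call-stack neighbourhoods. The following Python sketch is illustrative only: the function names and toy call-stacks are hypothetical, call-stacks are assumed to be given as sets of terms, and each class is represented by the set of terms it contains.</p>
<preformat>
```python
def neighbourhood(term, call_stacks):
    """Indices of the call-stacks containing `term`, i.e. C(term)."""
    return frozenset(i for i, stack in enumerate(call_stacks) if term in stack)


def precedes(t1, t2, call_stacks):
    """True when [t1] is below or equal to [t2]: C(t2) is a subset of C(t1)."""
    return neighbourhood(t2, call_stacks).issubset(neighbourhood(t1, call_stacks))


def poset_relations(call_stacks):
    """Return all strict relations (lower class, upper class) between classes."""
    terms = set().union(*call_stacks)
    by_nbhd = {}
    for term in terms:
        by_nbhd.setdefault(neighbourhood(term, call_stacks), set()).add(term)
    relations = set()
    for upper_nbhd, upper in by_nbhd.items():
        for lower_nbhd, lower in by_nbhd.items():
            # Strict relation: distinct classes with C(upper) inside C(lower).
            if upper_nbhd != lower_nbhd and upper_nbhd.issubset(lower_nbhd):
                relations.add((frozenset(lower), frozenset(upper)))
    return relations


# Hypothetical toy data: three call-stacks given as sets of terms.
stacks = [
    {"main", "parse", "lex", "read"},
    {"main", "parse", "lex", "write"},
    {"main", "log"},
]
relations = poset_relations(stacks)
```
</preformat>
<p>Here the class of main lies below every other class, since every call-stack containing any other term also contains main, whereas read and write are incomparable.</p>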
<p>Example 2. In <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, we depict the call-stack poset corresponding to the example call-stack topology provided in Example 1. Each circle contains an equivalence class of terms that have identical call-stack neighbourhoods. Lemma 2 tells us that an order relation <inline-formula id="inf108">
<mml:math id="m112">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> between classes occurs when <inline-formula id="inf109">
<mml:math id="m113">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>; namely, when all call-stacks containing <inline-formula id="inf110">
<mml:math id="m114">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> also contain&#x20;<inline-formula id="inf111">
<mml:math id="m115">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>The call-stack poset corresponding to the call-stack topology defined in Example 1.</p>
</caption>
<graphic xlink:href="fams-07-668082-g004.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>3.3 Function Term Obfuscation</title>
<p>One of the key research questions is how much information can be extracted from the call-stack when terms are removed. Let <inline-formula id="inf112">
<mml:math id="m116">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula> be a collection of call-stacks, tokenized into a set of terms <inline-formula id="inf113">
<mml:math id="m117">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula>. To consider the effect on the model of removing a single term <inline-formula id="inf114">
<mml:math id="m118">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, let <inline-formula id="inf115">
<mml:math id="m119">
<mml:mrow>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mtext>&#x27;</mml:mtext>
<mml:mo>&#x225c;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2216;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and<disp-formula id="equ4">
<mml:math id="m120">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mtext>&#x27;</mml:mtext>
<mml:mo>&#x225c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi mathvariant="normal">&#x2216;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>be the set of call-stacks with the term <italic>t</italic> removed. Note here that each <inline-formula id="inf116">
<mml:math id="m121">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is a set of terms <inline-formula id="inf117">
<mml:math id="m122">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, so the set notation <inline-formula id="inf118">
<mml:math id="m123">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi mathvariant="normal">&#x2216;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>t</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> makes sense. The following lemma describes the effect on the call-stack poset when <italic>t</italic> is removed.</p>
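<p>The removal operation defined above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this work: the function names and toy data are hypothetical, and the call-stacks are kept in a list so that each obfuscated call-stack can be matched to its original by index, whereas the definition above forms a set.</p>
<preformat>
```python
def remove_term(call_stacks, term):
    """Return the call-stacks with `term` deleted from each one."""
    return [stack - {term} for stack in call_stacks]


def neighbourhood(term, call_stacks):
    """Indices of the call-stacks containing `term`."""
    return frozenset(i for i, stack in enumerate(call_stacks) if term in stack)


# Hypothetical toy data: three call-stacks given as sets of terms.
stacks = [
    {"main", "parse", "lex", "read"},
    {"main", "parse", "lex", "write"},
    {"main", "log"},
]
obfuscated = remove_term(stacks, "parse")

# Neighbourhoods of the remaining terms before and after the removal.
remaining = set().union(*obfuscated)
unchanged = all(
    neighbourhood(t, stacks) == neighbourhood(t, obfuscated) for t in remaining
)
```
</preformat>
<p>In this toy example the neighbourhood of every remaining term is unaffected by the removal, so comparisons between the remaining terms give the same answers before and after obfuscation.</p>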
<p>L<sc>emma</sc> 3. Suppose <inline-formula id="inf119">
<mml:math id="m124">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. Then<list list-type="simple">
<list-item>
<p>1. <inline-formula id="inf120">
<mml:math id="m125">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;in&#x2009;</mml:mtext>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;if&#xa0;and&#xa0;only&#xa0;if&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;in&#x2009;</mml:mtext>
</mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi>T</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>2. <inline-formula id="inf121">
<mml:math id="m126">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;in&#x2009;</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;if&#xa0;and&#xa0;only&#xa0;if&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;in&#x2009;</mml:mtext>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi>T</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
<p>P<sc>roof</sc>. For <inline-formula id="inf122">
<mml:math id="m127">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf123">
<mml:math id="m128">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in <inline-formula id="inf124">
<mml:math id="m129">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> if and only if there exists <inline-formula id="inf125">
<mml:math id="m130">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> such that either <inline-formula id="inf126">
<mml:math id="m131">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf127">
<mml:math id="m132">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf128">
<mml:math id="m133">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf129">
<mml:math id="m134">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. This occurs if and only if there exists <inline-formula id="inf130">
<mml:math id="m135">
<mml:mrow>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> such that either <inline-formula id="inf131">
<mml:math id="m136">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf132">
<mml:math id="m137">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf133">
<mml:math id="m138">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2209;</mml:mo>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf134">
<mml:math id="m139">
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi>c</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, which is equivalent to <inline-formula id="inf135">
<mml:math id="m140">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2260;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in <inline-formula id="inf136">
<mml:math id="m141">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. For <inline-formula id="inf137">
<mml:math id="m142">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf138">
<mml:math id="m143">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in <inline-formula id="inf139">
<mml:math id="m144">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf140">
<mml:math id="m145">
<mml:mo>&#x21d4;</mml:mo>
</mml:math>
</inline-formula> <inline-formula id="inf141">
<mml:math id="m146">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x21d4;</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x21d4;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in <inline-formula id="inf142">
<mml:math id="m147">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>One can summarize the above result as the fact that <inline-formula id="inf143">
<mml:math id="m148">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is isomorphic to <inline-formula id="inf144">
<mml:math id="m149">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> whenever the singleton <inline-formula id="inf145">
<mml:math id="m150">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is not an equivalence class. When it is an equivalence class, it is the only difference between the two call-stack posets <inline-formula id="inf146">
<mml:math id="m151">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf147">
<mml:math id="m152">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. By inductively removing all of the function terms, <inline-formula id="inf148">
<mml:math id="m153">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mo>&#x2282;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and applying the lemma at each step, we obtain the following corollary.</p>
<p>C<sc>orollary</sc> 1. Let <inline-formula id="inf149">
<mml:math id="m154">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> be the call-stack poset over <inline-formula id="inf150">
<mml:math id="m155">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mo>&#x225c;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2216;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> generated by<disp-formula id="equ5">
<mml:math id="m156">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mo>&#x225c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2216;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>Then <inline-formula id="inf151">
<mml:math id="m157">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the sub-poset of <inline-formula id="inf152">
<mml:math id="m158">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> spanned by equivalence classes<disp-formula id="equ6">
<mml:math id="m159">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2288;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>In other words, removing function terms from the model leaves the structure of the call-stack poset unchanged except at classes consisting solely of function terms. When a function term <italic>t</italic> shares an equivalence class with non-function terms, those terms can be used to recover its structural dependency information even after <italic>t</italic> is removed. Together, the lemma and Corollary 1 motivate the idea that many attributes of the call-stack poset are retained when some terms are missing.</p>
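<p>The statement of Corollary 1 can be checked on a small example. The following Python sketch is illustrative only: the helper names <monospace>equivalence_classes</monospace> and <monospace>remove_terms</monospace>, the term names, and the toy call stacks are hypothetical and are not taken from the data set. It assumes that each call stack is represented as a set of terms and that two terms are equivalent when they occur in exactly the same call stacks.</p>
<preformat>
```python
from collections import defaultdict


def equivalence_classes(call_stacks):
    """Group terms that occur in exactly the same set of call stacks."""
    stacks_of = defaultdict(set)
    for index, stack in enumerate(call_stacks):
        for term in stack:
            stacks_of[term].add(index)
    classes = defaultdict(set)
    for term, indices in stacks_of.items():
        classes[frozenset(indices)].add(term)
    return {frozenset(members) for members in classes.values()}


def remove_terms(call_stacks, removed):
    """Drop every removed term from every call stack."""
    return [set(stack) - set(removed) for stack in call_stacks]


# Hypothetical toy data: each call stack is a set of terms.
functions = {"f_main", "f_parse", "f_free"}
call_stacks = [
    {"mod_a", "f_main", "line_10", "f_parse", "line_42"},
    {"mod_a", "f_main", "line_10", "f_free"},
    {"mod_a", "f_main", "line_10", "f_parse", "line_42", "f_free"},
]

full = equivalence_classes(call_stacks)
reduced = equivalence_classes(remove_terms(call_stacks, functions))

# Classes of the full data that are not contained in the function set,
# with their function terms stripped out.
expected = {
    frozenset(cls - functions) for cls in full if not cls.issubset(functions)
}
```
</preformat>
<p>In this toy example the only class lost is the one containing <monospace>f_free</monospace> alone; the remaining classes of the reduced data coincide with the surviving classes of the full data.</p>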
</sec>
</sec>
<sec id="s4">
<title>4 Function Term Reconstruction</title>
<p>The goal of this paper is to reconstruct information about function terms from call-stacks in which they are obscured. In this section, we present a small-scale experiment on our Linux data set using features extracted from the call-stack topology&#x20;model.</p>
<p>Accordingly, we must first define what we mean by &#x2018;function information.&#x2019; When the function names are missing, it is not possible to recover them explicitly from the call-stack data. The next best option, and the focus of this paper, is to recover the set of positions within the call-stacks that share a common function name. This notion is captured in the following definition.</p>
<p>D<sc>efinition</sc> 4. For a term <inline-formula id="inf153">
<mml:math id="m160">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> within a set of call stacks <inline-formula id="inf154">
<mml:math id="m161">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>, define the frame trace <inline-formula id="inf155">
<mml:math id="m162">
<mml:mrow>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of <italic>t</italic> to be the set of pairs<disp-formula id="equ7">
<mml:math id="m163">
<mml:mrow>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x225c;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf156">
<mml:math id="m164">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the <italic>n</italic>th frame of call stack&#x20;<italic>c</italic>.</p>
<p>If <italic>t</italic> appears in multiple frames <inline-formula id="inf157">
<mml:math id="m165">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf158">
<mml:math id="m166">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>n</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> within the same crash <inline-formula id="inf159">
<mml:math id="m167">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, then both <inline-formula id="inf160">
<mml:math id="m168">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf161">
<mml:math id="m169">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>n</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> are elements of <inline-formula id="inf162">
<mml:math id="m170">
<mml:mrow>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mi mathvariant="script">C</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. For any pair of distinct terms of the same type in a set of call stacks, the frame traces must be disjoint. It is impossible to guess the explicit names of obscured function terms. However, if one can recover the frame trace of every function, then one can generate call stacks that are equivalent to the originals up to renaming of function&#x20;terms.</p>
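<p>Definition 4 translates directly into a set comprehension. The following Python sketch is a minimal illustration under stated assumptions: the function <monospace>frame_trace</monospace>, the stack identifiers, and the toy frames are hypothetical; each call stack is assumed to be a list of frames, each frame a tuple of (module, function, line number) terms; frames are indexed from 0; and each pair records the stack identifier rather than the stack itself.</p>
<preformat>
```python
def frame_trace(call_stacks, term):
    """Return all pairs (stack id, frame index) whose frame contains `term`."""
    return {
        (stack_id, n)
        for stack_id, frames in call_stacks.items()
        for n, frame in enumerate(frames)
        if term in frame
    }


# Hypothetical toy data: a stack is a list of frames,
# and a frame is a tuple (module, function, line number).
call_stacks = {
    "c1": [
        ("libA", "parse", "parse.c:10"),
        ("libA", "read", "read.c:7"),
        ("libA", "parse", "parse.c:31"),
    ],
    "c2": [
        ("libA", "read", "read.c:7"),
        ("libB", "main", "main.c:3"),
    ],
}

parse_trace = frame_trace(call_stacks, "parse")
read_trace = frame_trace(call_stacks, "read")
```
</preformat>
<p>Here <monospace>parse</monospace> occurs in two frames of the same stack, so both pairs belong to its frame trace, and the traces of the two distinct function terms are disjoint.</p>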
<p>By performing logistic regression over features extracted from the call-stack model, we will show that a surprising number of function frame traces can be recovered without any explicit knowledge of function names. This is particularly striking given that the user also knows nothing about the internal structure of the program. Additionally, we provide an algorithm for generating fake function names based on the guessed frame traces, making sets of call stacks more human-readable in the setting where function names are missing.</p>
<sec id="s4-1">
<title>4.1 Preliminary Analysis of Call-Stack Equivalence Classes and Poset Structure</title>
<p>To motivate the use of our novel tools in the task of recovering function frame traces, we first present a basic analysis of the data through the lens of the call-stack topology and poset. In particular, we study the characteristics of equivalence classes (their size and the types of terms they comprise) as well as the order relations and dependencies among them.</p>
<sec id="s4-1-1">
<title>4.1.1 Basic Statistics</title>
<p>Recall that the equivalence classes in the call-stack topology consist of terms that occur in the same set of call stacks. <xref ref-type="table" rid="T2">Table&#x20;2</xref> shows the extent of reduction from the number of terms to their equivalence classes under the quotient operation.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Call-stack model and term statistics for each Linux program.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">SoX (m)</th>
<th align="center">Librsvg</th>
<th align="center">Libtiff</th>
<th align="center">Freetype</th>
<th align="center">SoX (w)</th>
<th align="center">Libxml</th>
<th align="center">Mean</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Modules</td>
<td align="char" char=".">15</td>
<td align="char" char=".">15</td>
<td align="char" char=".">8</td>
<td align="char" char=".">23</td>
<td align="char" char=".">14</td>
<td align="char" char=".">17</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Functions</td>
<td align="char" char=".">36</td>
<td align="char" char=".">92</td>
<td align="char" char=".">17</td>
<td align="char" char=".">48</td>
<td align="char" char=".">34</td>
<td align="char" char=".">151</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Line numbers</td>
<td align="char" char=".">43</td>
<td align="char" char=".">99</td>
<td align="char" char=".">21</td>
<td align="char" char=".">81</td>
<td align="char" char=".">40</td>
<td align="char" char=".">361</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Total terms</td>
<td align="char" char=".">94</td>
<td align="char" char=".">206</td>
<td align="char" char=".">46</td>
<td align="char" char=".">152</td>
<td align="char" char=".">88</td>
<td align="char" char=".">529</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Classes</td>
<td align="char" char=".">33</td>
<td align="char" char=".">113</td>
<td align="char" char=".">14</td>
<td align="char" char=".">66</td>
<td align="char" char=".">27</td>
<td align="char" char=".">343</td>
<td align="left"/>
</tr>
<tr>
<td align="left">Reduction %</td>
<td align="char" char=".">65%</td>
<td align="char" char=".">45%</td>
<td align="char" char=".">70%</td>
<td align="char" char=".">56%</td>
<td align="char" char=".">69%</td>
<td align="char" char=".">35%</td>
<td align="char" char=".">57%</td>
</tr>
<tr>
<td align="left">Order relations</td>
<td align="char" char=".">108</td>
<td align="char" char=".">1,024</td>
<td align="char" char=".">22</td>
<td align="char" char=".">352</td>
<td align="char" char=".">89</td>
<td align="char" char=".">2,220</td>
<td align="left"/>
</tr>
<tr>
<td align="left">F-loss</td>
<td align="char" char=".">4</td>
<td align="char" char=".">32</td>
<td align="char" char=".">0</td>
<td align="char" char=".">6</td>
<td align="char" char=".">2</td>
<td align="char" char=".">29</td>
<td align="left"/>
</tr>
<tr>
<td align="left">F-retention %</td>
<td align="char" char=".">89%</td>
<td align="char" char=".">65%</td>
<td align="char" char=".">100%</td>
<td align="char" char=".">88%</td>
<td align="char" char=".">94%</td>
<td align="char" char=".">81%</td>
<td align="char" char=".">86%</td>
</tr>
</tbody>
</table>
</table-wrap>
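<p>The percentage rows of Table 2 can be recomputed from the count rows. The following Python sketch assumes, as an inference from the tabulated numbers rather than a stated definition, that the reduction is the share of terms eliminated by passing to equivalence classes and that the retention is the share of function terms not counted under F-loss. The names <monospace>TABLE_2</monospace>, <monospace>percent</monospace>, <monospace>reduction</monospace>, and <monospace>retention</monospace> are introduced only for this illustration.</p>
<preformat>
```python
# Counts transcribed from Table 2: (total terms, classes, functions, F-loss).
TABLE_2 = {
    "SoX (m)": (94, 33, 36, 4),
    "Librsvg": (206, 113, 92, 32),
    "Libtiff": (46, 14, 17, 0),
    "Freetype": (152, 66, 48, 6),
    "SoX (w)": (88, 27, 34, 2),
    "Libxml": (529, 343, 151, 29),
}


def percent(part, whole):
    """Percentage of `whole` made up by `part`, rounded to an integer."""
    return round(100 * part / whole)


# Assumed definition: share of terms eliminated by the quotient operation.
reduction = {
    name: percent(terms - classes, terms)
    for name, (terms, classes, _, _) in TABLE_2.items()
}

# Assumed definition: share of function terms not counted under F-loss.
retention = {
    name: percent(functions - lost, functions)
    for name, (_, _, functions, lost) in TABLE_2.items()
}

mean_retention = round(sum(retention.values()) / len(retention))
```
</preformat>
<p>Under these assumptions the recomputed values agree with the tabulated percentages to within one percentage point, and the mean retention evaluates to 86.</p>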
<p>Our primary interest is to understand the effect of obscuring function terms. Corollary 1 states that removing the function terms alters the model&#x2019;s structure only at equivalence classes consisting of function terms alone. Accordingly, we say a function term <italic>f</italic> is <italic>retained</italic> under the quotient operation when it is equivalent to a non-function term <italic>t</italic>. Notably, when <inline-formula id="inf163">
<mml:math id="m171">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is retained, there exists a term <inline-formula id="inf164">
<mml:math id="m172">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2216;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> whose call-stack set is identical to that of&#x20;<italic>f</italic>.</p>
<p>
<xref ref-type="table" rid="T2">Table&#x20;2</xref> shows that, on average, <inline-formula id="inf165">
<mml:math id="m173">
<mml:mrow>
<mml:mn>86</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> of function terms are retained. Extensive term equivalences in the call-stack topology mean that a dramatic reduction in the available terms has little effect on the call-stack poset structure. The main takeaway from this analysis is that function terms rarely occur in an equivalence class on their&#x20;own.</p>
</sec>
<sec id="s4-1-2">
<title>4.1.2 Patterns Relating Line Numbers and Function Terms</title>
<p>When two terms are in the same equivalence class, they occur in the same set of call-stacks. However, our topological model encodes no information about the frames in which they occur. Our toy example (Example 1) suggested that line numbers and functions in the same equivalence class tend to occupy similar frames in the call-stack. In a thorough examination of the data, we observed two patterns, each illustrated by the example call stacks in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>.<list list-type="simple">
<list-item>
<p>&#x2022; Pattern 1<italic>:</italic> When multiple line numbers belong to an equivalence class, they are usually paired with functions in the same frame except for the line number in the bottom frame. The lowest line number appears to act as a switch point between blocks of terms, instigating a run of function calls that are either seen in only one call-stack or always together. In <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>, this occurs in the brown and green equivalence classes of the example on the&#x20;left.</p>
</list-item>
<list-item>
<p>&#x2022; Pattern 2<italic>:</italic> When a single line number is in an equivalence class with a function, it is likely paired with a function one frame above. In <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>, this occurs in the purple and orange boxes of the left call-stack, and the purple, orange and green boxes of the&#x20;right.</p>
</list-item>
</list>
</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Example call stacks from the Linux data exhibiting the behavior described by the two patterns. The left and right examples are taken from two separate libraries, where each of the seven colors represents an equivalence class in the call-stack topology; that is, a set of terms which have identical call-stack neighbourhoods.</p>
</caption>
<graphic xlink:href="fams-07-668082-g005.tif"/>
</fig>
<p>It is important to state that neither pattern reflects an underlying mathematical truth. Rather, they seem to be a symptom of programming convention. Namely, as source code tends to decompose into many different simple functions nested within one another, runs of frames in the call-stack tend to cycle through distinct function names. Further, these patterns only apply in the case that a line number occurs in the same set of crashes as a function, in which case we assume that they describe how the frames of a function and line number are related.</p>
</sec>
</sec>
<sec id="s4-2">
<title>4.2 Method</title>
<p>Our method for frame trace recovery leverages the structure of the call-stack topology and poset. We first generate the call-stack equivalence classes <inline-formula id="inf166">
<mml:math id="m174">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi>T</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and poset <inline-formula id="inf167">
<mml:math id="m175">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> from the incomplete data <inline-formula id="inf168">
<mml:math id="m176">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, i.e.,&#x20;the set of terms with function names omitted. Corollary 1, together with the empirical observations of <xref ref-type="table" rid="T2">Table&#x20;2</xref>, suggests that these objects should be relatively similar to their counterparts <inline-formula id="inf169">
<mml:math id="m177">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> generated from the full data that we aim to partially reconstruct.</p>
<p>Once such objects are constructed, our approach consists of the following two steps.<list list-type="simple">
<list-item>
<p>1. Classifying Equivalence Classes: Within the incomplete data model, the terms of an equivalence class <inline-formula id="inf170">
<mml:math id="m178">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi>T</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> consist only of line numbers and module names. However, in the complete data model, there may exist function terms that are also in the corresponding class. The first step of our method is to estimate the likelihood that an equivalence class <inline-formula id="inf171">
<mml:math id="m179">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:msup>
<mml:mi>T</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the incomplete data corresponds to an equivalence class containing a function term in the full data <inline-formula id="inf172">
<mml:math id="m180">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>2. Pairing Frame Traces: The second step of our method is to apply the two patterns of Section 4.1.2 to obtain a heuristic for predicting the frame trace locations of function terms. This is done by selecting line numbers within a given equivalence class whose frame traces are likely to be paired with a function term frame trace in the complete data. The frame traces of these line numbers serve as our set of predictions for function frame traces and enable us to partially reconstruct the data&#x20;set.</p>
</list-item>
</list>
</p>
<sec id="s4-2-1">
<title>4.2.1 Classifying Equivalence Classes Within Libxml</title>
<p>The first step in our method of frame trace recovery is learning to detect when an equivalence class contains a function term. Before tackling the task of classifying equivalence classes in the incomplete data, we restrict our focus to a small study of libxml, which offers the largest base of terms and equivalence classes from which to garner information. We outline our method here to examine the relationship between the structure of an equivalence class within the libxml call-stack poset and the types of terms that it contains.</p>
<p>Consider the following three binary classification problems over the equivalence classes <inline-formula id="inf173">
<mml:math id="m181">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in the call-stack topology.<list list-type="simple">
<list-item>
<p>1. Modules: each equivalence class <inline-formula id="inf174">
<mml:math id="m182">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is labeled 1 if there exists a module <inline-formula id="inf175">
<mml:math id="m183">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x2133;</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and 0 otherwise.</p>
</list-item>
<list-item>
<p>2. Functions: each equivalence class <inline-formula id="inf176">
<mml:math id="m184">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is labeled 1 if there exists a function <inline-formula id="inf177">
<mml:math id="m185">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="normal">&#x2131;</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and 0 otherwise.</p>
</list-item>
<list-item>
<p>3. Line Numbers: each equivalence class <inline-formula id="inf178">
<mml:math id="m186">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is labeled 1 if there exists a line number <inline-formula id="inf179">
<mml:math id="m187">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
<mml:mo>&#x2229;</mml:mo>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and 0 otherwise.</p>
</list-item>
</list>
</p>
<p>We address each of the above by performing a simple logistic regression based on four features in the call-stack model. For an equivalence class <inline-formula id="inf180">
<mml:math id="m188">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>T</mml:mi>
<mml:mo>&#x223c;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, these are as follows.<list list-type="simple">
<list-item>
<p>1. The size <inline-formula id="inf181">
<mml:math id="m189">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> of an equivalence&#x20;class.</p>
</list-item>
<list-item>
<p>2. The frequency (number of call-stacks) of the class <inline-formula id="inf182">
<mml:math id="m190">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>3. The <italic>weighted in-degree</italic></p>
</list-item>
</list>
<disp-formula id="equ8">
<mml:math id="m191">
<mml:mrow>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:munder>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>of the class within the order graph of the call-graph poset <inline-formula id="inf183">
<mml:math id="m192">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">C</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.<list list-type="simple">
<list-item>
<p>4. The <italic>weighted out-degree</italic>
</p>
</list-item>
</list>
<disp-formula id="equ9">
<mml:math id="m193">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:munder>
<mml:mi>&#x3d5;</mml:mi>
</mml:mstyle>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>of the class within the order graph of the call-graph poset <inline-formula id="inf184">
<mml:math id="m194">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">C</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi mathvariant="fraktur">T</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The names given to features 3 and 4 reference the fact that a poset <italic>P</italic> can be represented as a directed graph whose nodes are the elements of <italic>P</italic> and whose edges are the order relations <inline-formula id="inf185">
<mml:math id="m195">
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. The in- and out-degree of <inline-formula id="inf186">
<mml:math id="m196">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> are the number of equivalence classes that <inline-formula id="inf187">
<mml:math id="m197">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> depends on and that depend on <inline-formula id="inf188">
<mml:math id="m198">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, respectively. To incorporate the magnitude of such dependencies, each order relation is weighted by the function<disp-formula id="equ10">
<mml:math id="m199">
<mml:mrow>
<mml:mi>&#x3d5;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>Lastly, each of the four variables is min-max scaled to lie within <inline-formula id="inf189">
<mml:math id="m200">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, making the logistic regression weights comparable across variables.</p>
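<p>As an illustration, the four features and their min-max scaling can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used for the reported results: equivalence classes and call-stacks are modeled as sets of term strings, the order relation is passed in as an explicit set of strict (lower, upper) pairs, every class is assumed to occur in at least one call-stack, and all function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from fractions import Fraction


def class_frequency(cls, call_stacks):
    """Feature 2: number of call-stacks that contain every term of the class."""
    return sum(1 for stack in call_stacks if cls.issubset(stack))


def phi(lower, upper, call_stacks):
    """Weight of the relation 'lower precedes upper': freq(upper) / freq(lower)."""
    return Fraction(class_frequency(upper, call_stacks),
                    class_frequency(lower, call_stacks))


def class_features(cls, order, call_stacks):
    """Return [size, frequency, weighted in-degree, weighted out-degree]."""
    size = len(cls)
    frequency = class_frequency(cls, call_stacks)
    # In-degree sums phi over relations ending at cls, out-degree over those starting at cls.
    w_in = sum(phi(lo, up, call_stacks) for lo, up in order if up == cls)
    w_out = sum(phi(lo, up, call_stacks) for lo, up in order if lo == cls)
    return [size, frequency, w_in, w_out]


def min_max_scale(rows):
    """Scale each column to [0, 1]; a constant column maps to 0.0 (an assumption)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [[float((value - low) / span) if span else 0.0
             for value, low, span in zip(row, lows, spans)]
            for row in rows]


# Toy data (hypothetical): one module-like class below two function/line classes.
main = frozenset({"main"})
f_cls = frozenset({"f", "10"})
g_cls = frozenset({"g", "20"})
call_stacks = [{"main", "f", "10"}, {"main", "g", "20"},
               {"main", "f", "10", "g", "20"}]
order = {(main, f_cls), (main, g_cls)}
rows = [class_features(c, order, call_stacks) for c in (main, f_cls, g_cls)]
scaled = min_max_scale(rows)
```
</preformat>
<p>Exact fractions are used for the weights so that the toy values can be checked by hand; floating-point arithmetic would serve equally well in practice.</p>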
<p>Since the classification labels are unbalanced, the classes were re-weighted according to the scikit-learn class re-weighting scheme. To guard against over-fitting, the data was randomly split into an <inline-formula id="inf190">
<mml:math id="m201">
<mml:mrow>
<mml:mn>80</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> training set and <inline-formula id="inf191">
<mml:math id="m202">
<mml:mrow>
<mml:mn>20</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> testing set. To measure results, we use the F1 score and the area under the precision-recall curve (AUC), which are suggested to be the most sensible measurements when predicting heavily imbalanced classes in binary classification (see&#x20;[<xref ref-type="bibr" rid="B8">8</xref>]).</p>
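<p>The evaluation protocol can be sketched in plain Python as follows. We assume here that the re-weighting scheme is scikit-learn's "balanced" mode, whose documented formula is n_samples / (n_classes * class_count); the text does not state the mode explicitly. The helper names, the seed and the rounding of the test-set size are hypothetical choices made for illustration.</p>
<preformat preformat-type="code">
```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {label: len(labels) / (len(counts) * count)
            for label, count in counts.items()}


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the indices and hold out the last test_fraction of them."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[:n_samples - n_test], indices[n_samples - n_test:]


def f1_score(y_true, y_pred):
    """F1 of the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```
</preformat>
<p>With eight positive and two negative labels, for instance, the rare class receives weight 2.5 and the common class 0.625, so that both classes contribute equally to the loss.</p>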
<p>As a baseline against which to assess our method, we propose the following binary classification null-model. First, we empirically derive three probabilities, one per term type, as the ratio of the number of equivalence classes containing that term type to the total number of equivalence classes. Then, for each term type, the null-model randomly guesses whether each class contains that term type with the empirically derived probability.</p>
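<p>A minimal sketch of such a null-model is given below. It assumes that each guess is drawn independently per equivalence class; the function names and the fixed seed are hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def empirical_probability(labels):
    """Fraction of equivalence classes labeled 1 for a given term type."""
    return sum(labels) / len(labels)


def null_model_predictions(labels, seed=0):
    """Guess 1 for each class independently with the empirical probability."""
    prob = empirical_probability(labels)
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so prob = 1 always guesses 1 and prob = 0 never does.
    return [1 if prob > rng.random() else 0 for _ in labels]
```
</preformat>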
</sec>
<sec id="s4-2-2">
<title>4.2.2 Classifying Incomplete-Data Equivalence Classes</title>
<p>Once we have obtained logistic regression weights for the libxml data, we apply them to other programs. An important point of this stage is that, unlike in the libxml experiment, we withhold the full-data with function names as a validation set. This means that the call-stack topology and poset are generated for each program from the call-stacks <inline-formula id="inf192">
<mml:math id="m203">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> with function terms obscured <inline-formula id="inf193">
<mml:math id="m204">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>From each of these objects, we extract the same four features as above, normalizing in the same way to ensure that the learned libxml weights scale appropriately. The goal of this stage is to predict whether an equivalence class in the incomplete data is likely to contain a function in the full-data, thus predicting a set of call-stacks which share a common missing function&#x20;term.</p>
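<p>Applying the learned weights to a new program amounts to evaluating the logistic function on the scaled features of each equivalence class. The sketch below illustrates this step; the weights, bias and class names in the usage example are hypothetical and are not the fitted libxml values.</p>
<preformat preformat-type="code">
```python
import math


def predict_probability(features, weights, bias):
    """Logistic model: sigmoid of the weighted sum of the scaled features."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def likely_function_classes(class_features, weights, bias, cutoff):
    """Names of the classes whose predicted probability reaches the cut-off."""
    return [name for name, features in class_features.items()
            if predict_probability(features, weights, bias) >= cutoff]


# Hypothetical weights for [size, frequency, weighted in-degree, weighted out-degree].
weights, bias = [1.0, 0.0, -1.0, 2.0], -1.0
toy_classes = {"a": [1.0, 0.0, 0.0, 1.0], "b": [0.0, 0.0, 1.0, 0.0]}
selected = likely_function_classes(toy_classes, weights, bias, cutoff=0.5)
```
</preformat>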
</sec>
<sec id="s4-2-3">
<title>4.2.3 Pairing Line Numbers With Function Frame Traces</title>
<p>Our approach to predicting function frame traces is to 1) guess when a line number in the incomplete-data model was likely to have been in an equivalence class with a function name and 2) generate predicted frame traces for functions from line number frame traces using our two heuristics. Algorithm 1 ties these two steps together, taking in the set of obscured call-stacks <inline-formula id="inf194">
<mml:math id="m205">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and their frame traces <inline-formula id="inf195">
<mml:math id="m206">
<mml:mrow>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and returning a set of predicted frame traces. The value <italic>p</italic> is a cut-off probability: a pattern is applied to predict a frame trace only when the logistic regression assigns the class a probability of at least <italic>p</italic> of containing a function.</p>
<p>
<statement content-type="algorithm" id="alq1">
<label>Algorithm 1</label>
<p>PredictFTs<inline-formula id="inf196">
<mml:math id="m207">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>&#x2033;</mml:mo>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</statement>
</p>
<p>
<statement>
<p>
<inline-formula id="inf197">
<mml:math id="m208">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</inline-formula>
<inline-formula id="inf198">
<mml:math id="m209">
<mml:mi mathvariant="fraktur">I</mml:mi>
</mml:math>
</inline-formula>
<list list-type="simple">
<list-item>
<p>1. <inline-formula id="inf199">
<mml:math id="m210">
<mml:mrow>
<mml:mi mathvariant="sans-serif">predicted</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">fts</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>2. <bold>for</bold> <inline-formula id="inf200">
<mml:math id="m211">
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi mathvariant="fraktur">I</mml:mi>
<mml:mo>&#x2dc;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mtext>&#x27;&#x27;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mtext>&#x27;&#x27;</mml:mtext>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold>
</p>
</list-item>
<list-item>
<p>3. &#x2003;<bold>if</bold> <inline-formula id="inf201">
<mml:math id="m212">
<mml:mrow>
<mml:mi mathvariant="normal">&#x2119;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>&#x2203;</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2265;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <bold>then</bold> <inline-formula id="inf202">
<mml:math id="m213">
<mml:mi mathvariant="normal">&#x25b7;</mml:mi>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>4. &#x2003;&#x2003;<bold>if</bold> <inline-formula id="inf203">
<mml:math id="m214">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3e;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> <bold>then</bold> <inline-formula id="inf204">
<mml:math id="m215">
<mml:mi mathvariant="normal">&#x25b7;</mml:mi>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>5. &#x2003;&#x2003;&#x2003;<inline-formula id="inf205">
<mml:math id="m216">
<mml:mrow>
<mml:mi mathvariant="sans-serif">lines</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
<mml:mi mathvariant="sans-serif">drop</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">bottom</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">linenum</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>6. &#x2003;&#x2003;&#x2003;<bold>for</bold> <inline-formula id="inf206">
<mml:math id="m217">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="sans-serif">lines</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> <bold>do</bold>
</p>
</list-item>
<list-item>
<p>7. &#x2003;&#x2003;&#x2003;&#x2003;<inline-formula id="inf207">
<mml:math id="m218">
<mml:mrow>
<mml:mi mathvariant="sans-serif">predicted</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">fts</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="sans-serif">predicted</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">fts</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mtext>&#x27;&#x27;</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#xa0;for&#xa0;</mml:mtext>
<mml:mi>l</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="sans-serif">lines</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>8. &#x2003;&#x2003;<bold>else if</bold> <inline-formula id="inf208">
<mml:math id="m219">
<mml:mrow>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> <bold>then</bold> <inline-formula id="inf209">
<mml:math id="m220">
<mml:mi mathvariant="normal">&#x25b7;</mml:mi>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>9. &#x2003;&#x2003;&#x2003;<inline-formula id="inf300">
<mml:math id="m221">
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2229;</mml:mo>
<mml:mi mathvariant="normal">&#x2112;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>10. &#x2003;&#x2003;&#x2003;<inline-formula id="inf301">
<mml:math id="m222">
<mml:mrow>
<mml:mi mathvariant="sans-serif">predicted</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">fts</mml:mi>
<mml:mo>.</mml:mo>
<mml:mi mathvariant="sans-serif">append</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>min</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="sans-serif">F</mml:mi>
<mml:msub>
<mml:mi mathvariant="sans-serif">T</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mtext>&#x27;&#x27;</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
<list-item>
<p>11. <bold>return</bold> <inline-formula id="inf302">
<mml:math id="m223">
<mml:mrow>
<mml:mi mathvariant="sans-serif">predicted</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="sans-serif">fts</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>
</p>
</list-item>
</list>
</p>
</statement>
</p>
<p>In our algorithm, the logistic regression weights in Line 3 learned over libxml serve as the basis for detecting whether there exists a function in each line number&#x2019;s equivalence class. Using only the libxml weights ensures that when we predict function frame traces, we require no information about function names in other programs beforehand.</p>
<p>The logistic regression is learned over the full-term model of libxml and then applied to the call-stack topology models generated without function terms in other programs. There are two significant obstacles that the model must overcome to be successful. Firstly, the model must exhibit <italic>transference</italic> if the regression weights from libxml are to work for other programs. Secondly, the model must be <italic>robust</italic> to term removal given that it classifies over a model without function terms. On the second point, Corollary 1 states that the call-stack model retains much of its structure when function terms are removed, which suggests that the logistic regression weights have a chance of still being applicable.</p>
<p>The role of the logistic regression model is primarily to act as a gate-keeper, probabilistically determining when a given line number is <italic>not</italic> in the same class as any function. This prevents the model from over-predicting instances where a function&#x2019;s frame trace should be paired to that of a line number. The method drop_bottom_linenum in Line 5 removes the line number with the lowest average frame trace from the set, which is necessary to apply pattern&#x20;1.</p>
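<p>The control flow of the algorithm can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: a frame trace is modeled as a set of (call-stack identifier, frame index) pairs, the "lowest average frame trace" is read as the lowest mean frame index, the probability model is passed in as a hypothetical callable prob_has_function, and the clamp min(n - 1, 0) is reproduced exactly as printed in Algorithm 1.</p>
<preformat preformat-type="code">
```python
def drop_bottom_linenum(lines, frame_traces):
    """Remove the line number whose frame trace has the lowest mean frame index."""
    def mean_index(line):
        trace = frame_traces[line]
        return sum(n for _, n in trace) / len(trace)

    bottom = min(lines, key=mean_index)
    return [line for line in lines if line != bottom]


def predict_fts(classes, frame_traces, line_numbers, prob_has_function, p):
    """Sketch of PredictFTs over equivalence classes given as sets of terms."""
    predicted_fts = []
    for cls in classes:
        if p > prob_has_function(cls):
            continue  # gate-keeper: skip classes unlikely to contain a function
        lines = sorted(cls.intersection(line_numbers))
        if len(lines) > 1:
            # Pattern 1: keep the traces of every line number except the bottom one.
            kept = drop_bottom_linenum(lines, frame_traces)
            predicted_fts += [frame_traces[line] for line in kept]
        elif len(lines) == 1:
            # Single line number: shift its trace, using min(n - 1, 0) as printed.
            trace = frame_traces[lines[0]]
            predicted_fts.append({(c, min(n - 1, 0)) for c, n in trace})
    return predicted_fts
```
</preformat>
<p>Classes containing no line number contribute nothing, and raising <italic>p</italic> can only shrink the returned list, since it only tightens the gate-keeping test.</p>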
</sec>
</sec>
<sec id="s4-3">
<title>4.3 Results</title>
<sec id="s4-3-1">
<title>4.3.1 Learning Libxml Logistic Regression Weights</title>
<p>In <xref ref-type="table" rid="T3">Table&#x20;3</xref> we present the results of our binary classification experiment within the libxml data described in Subsection 4.2.1. The results show that the inclusion of call-stack topology features significantly improves the quality of prediction across terms when compared with the null model. To quantify the effect of each feature in classification, we plot the logistic regression weights in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>. In all cases, the frequency of a term has little effect on classification. For each term type, we make the following observations about the model variables that indicate its presence in an equivalence class.<list list-type="simple">
<list-item>
<p>&#x2022; Modules are likely to be in smaller equivalence classes with lower weighted in-degree and higher weighted out-degree. This means more terms depend on them than they depend&#x20;on.</p>
</list-item>
<list-item>
<p>&#x2022; Functions are likely to be in large equivalence classes, with high out-degree. This means many terms are likely to depend on&#x20;them.</p>
</list-item>
<list-item>
<p>&#x2022; Line Numbers are likely to be in larger equivalence classes, with a low weighted out-degree. This means terms are unlikely to depend on line numbers.</p>
</list-item>
</list>
</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Proportion of equivalence classes containing each term type, and logistic regression F1 score and AUC improvements on the null-model for the libxml call-stacks.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">Module</th>
<th align="center">Function</th>
<th align="center">Line number</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">% Equivalence Classes</td>
<td align="center">
<inline-formula id="inf224">
<mml:math id="m224">
<mml:mrow>
<mml:mn>4.76</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf225">
<mml:math id="m225">
<mml:mrow>
<mml:mn>65.01</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
<td align="center">
<inline-formula id="inf226">
<mml:math id="m226">
<mml:mrow>
<mml:mn>88.06</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">F1</td>
<td align="char" char=".">0.40</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">0.92</td>
</tr>
<tr>
<td align="left">F1 (null model)</td>
<td align="char" char=".">0.00</td>
<td align="char" char=".">0.40</td>
<td align="char" char=".">0.90</td>
</tr>
<tr>
<td align="left">AUC</td>
<td align="char" char=".">0.20</td>
<td align="char" char=".">0.94</td>
<td align="char" char=".">0.97</td>
</tr>
<tr>
<td align="left">AUC (null model)</td>
<td align="char" char=".">0.04</td>
<td align="char" char=".">0.43</td>
<td align="char" char=".">0.93</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Logistic regression weights for libxml fitted to each term&#x20;type.</p>
</caption>
<graphic xlink:href="fams-07-668082-g006.tif"/>
</fig>
<p>Each observation agrees with the structure of library dependencies, where line numbers depend on functions, and functions depend on modules. The logistic regression model is notably adept at detecting the presence of function terms within a given equivalence&#x20;class.</p>
</sec>
<sec id="s4-3-2">
<title>4.3.2 Frame Trace Recovery</title>
<p>The <sc>PredictFts</sc> algorithm is run over each Linux program. Since the function term information in libxml was used to generate the logistic regression weights, we exclude it from the analysis. To measure the results, we compare the set of predicted function frame traces generated by the algorithm against the set of actual function frame traces in each set of call stacks.</p>
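<p>The comparison can be sketched as a set computation. The sketch assumes that a predicted frame trace counts as correct only when it coincides exactly with an actual function frame trace, which is one possible reading of the comparison; the function name and the convention of returning 0.0 for an empty set are hypothetical.</p>
<preformat preformat-type="code">
```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted frame traces against the actual ones."""
    predicted = {frozenset(trace) for trace in predicted}
    actual = {frozenset(trace) for trace in actual}
    hits = len(predicted.intersection(actual))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```
</preformat>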
<p>
<xref ref-type="table" rid="T4">Table&#x20;4</xref> contains the results of each experiment with three different cut-off probabilities: 0.4, 0.5 and 0.6. Despite the heavy reliance on fairly na&#xef;ve heuristics, our model has a reasonable mean precision of 0.75 or above in each case. Notably, both precision and recall of frame traces are relatively stable across programs. This suggests that the libxml logistic regression weights and the heuristics both exhibit some degree of transference across programs.</p>
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Precision and recall of <sc>PredictFts</sc> algorithm at cut-off probabilities 0.4, 0.5 and 0.6.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left"/>
<th align="center">SoX (m)</th>
<th align="center">Librsvg</th>
<th align="center">Libtiff</th>
<th align="center">Freetype</th>
<th align="center">SoX (w)</th>
<th align="center">Mean</th>
<th align="center">Cut-off probability</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Precision</td>
<td align="char" char=".">0.71</td>
<td align="char" char=".">0.57</td>
<td align="char" char=".">1.0</td>
<td align="char" char=".">0.78</td>
<td align="char" char=".">0.67</td>
<td align="char" char=".">0.75</td>
<td align="char" char=".">0.4</td>
</tr>
<tr>
<td align="left">Recall</td>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.29</td>
<td align="char" char=".">0.71</td>
<td align="char" char=".">0.6</td>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.50</td>
<td align="char" char=".">0.4</td>
</tr>
<tr>
<td align="left">Precision</td>
<td align="char" char=".">0.71</td>
<td align="char" char=".">0.71</td>
<td align="char" char=".">1.0</td>
<td align="char" char=".">0.76</td>
<td align="char" char=".">0.66</td>
<td align="char" char=".">0.77</td>
<td align="char" char=".">0.5</td>
</tr>
<tr>
<td align="left">Recall</td>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.22</td>
<td align="char" char=".">0.71</td>
<td align="char" char=".">0.52</td>
<td align="char" char=".">0.47</td>
<td align="char" char=".">0.48</td>
<td align="char" char=".">0.5</td>
</tr>
<tr>
<td align="left">Precision</td>
<td align="char" char=".">0.73</td>
<td align="char" char=".">0.89</td>
<td align="char" char=".">1.0</td>
<td align="char" char=".">0.86</td>
<td align="char" char=".">0.74</td>
<td align="char" char=".">0.84</td>
<td align="char" char=".">0.6</td>
</tr>
<tr>
<td align="left">Recall</td>
<td align="char" char=".">0.44</td>
<td align="char" char=".">0.18</td>
<td align="char" char=".">0.53</td>
<td align="char" char=".">0.52</td>
<td align="char" char=".">0.41</td>
<td align="char" char=".">0.42</td>
<td align="char" char=".">0.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, we analyze the effect of the cut-off probability parameter in detail. When this parameter is high, the algorithm requires a large degree of confidence that an equivalence class contains a function term before predicting a frame trace. This is reflected in an increasing precision and decreasing recall as the cut-off probability increases.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Precision and recall for the PredictFTs algorithm on each program indexed by the cut-off probability parameter.</p>
</caption>
<graphic xlink:href="fams-07-668082-g007.tif"/>
</fig>
<p>The cut-off probability parameter indirectly allows the user to dictate the importance of precision at the expense of recall. Given that the purpose of our experiment is to reliably reconstruct what function names we can, the importance of precision outweighs that of recall. Indeed, there exist function names in the data that could not possibly be recalled from the module and line number information alone. For example, large swathes of function names are hidden behind the repeated line number 0 in the librsvg data (<xref ref-type="fig" rid="F8">Figure&#x20;8</xref>), rendering their recall impossible by our method.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>The top eight frames from an example librsvg call-stack.</p>
</caption>
<graphic xlink:href="fams-07-668082-g008.tif"/>
</fig>
<p>Example 3. To make call stacks without function terms more readable, we insert random words into each predicted frame trace. In keeping with the afl theme, we sample random words from the surnames of champion players from the Richmond Tigers Australian Football League (AFL) team. These words are consistent across the set of call stacks, making it easier for the user to visually compare different call stacks.</p>
<p>
<xref ref-type="fig" rid="F9">Figure&#x20;9</xref> demonstrates two reconstructed call stacks from the SoX program. In the original call-stacks, the coloring represents equivalence classes in the model. We color the reconstructed call-stacks the same, noting that when we attempt to reconstruct we do not know what equivalence classes will contain functions a priori.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Two call stacks from the SoX data set, along with their reconstructions using the PredictFTs algorithm.</p>
</caption>
<graphic xlink:href="fams-07-668082-g009.tif"/>
</fig>
<p>As is evident, words representing functions are consistent across the set of call-stacks when frame traces are correctly predicted. The ???? terms in the reconstructed call-stacks represent function frame traces that the algorithm did not attempt to predict.</p>
<p>In <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>, we explain which fake function names were paired with which line numbers. We describe how each fake function name pairs with an original in the case that it was a correct prediction, as well as which of the two heuristics was used to pair it with the frame trace of a given line number. The only incorrect prediction was the fake function name DELEDIO, which erroneously predicted that the line number 219 was paired with a function in the original call-stack set via heuristic&#x20;2.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>The list of functions whose frame trace was recovered, and the fake function names and line numbers that were paired.</p>
</caption>
<graphic xlink:href="fams-07-668082-g010.tif"/>
</fig>
</sec>
</sec>
</sec>
<sec id="s5">
<title>5 Related Work</title>
<p>There is a large body of research centered on crash triage, in particular on crash de-duplication. The most common tasks are either 1) to automatically de-duplicate full bug reports submitted to open-source software or 2) to bucket crashes by dissimilarity. In contrast to the setting considered in this paper, most research concerns full bug reports, of which call-stacks are only one part. In particular, the language in user-reported comments is incorporated, and is in some cases the central object&#x20;[<xref ref-type="bibr" rid="B9">9</xref>].</p>
<p>Some of the highest rates of expert-validated crash duplicate recall are attained when program execution traces are also recorded [<xref ref-type="bibr" rid="B10">10</xref>]. Including call stacks in bug report data has been shown to increase de-duplication recall of full bug reports significantly [<xref ref-type="bibr" rid="B11">11</xref>], validating that they are an important object of study in crash triage. We refrained from using the common methods outlined in the above research, showing that a reasonable whitelisting and de-duplication was enough to significantly reduce the number of call-stacks.</p>
<p>Other models of call-stacks exist, albeit with slightly different machinery. For example, the crash-graph defined in [<xref ref-type="bibr" rid="B12">12</xref>] serves as a way to graphically compare the similarity between call-stacks. The use of such a model for function frame trace recovery could be an avenue for future research.</p>
</sec>
<sec id="s6">
<title>6 Limitations</title>
<p>One of the main limitations of our work is that it is performed on a relatively small data-set. Indeed, there are fewer than 100 distinct call-stacks in each Linux program we have tested, barring the libxml data used to generate the logistic regression weights. When the set of call-stacks is not very diverse, function terms may tend to appear with only a single line number, making them easier to recover using our method.</p>
<p>A second limitation of our model is that it relies on two fairly na&#xef;ve patterns. It is not clear 1) whether such patterns yield similar results on larger data-sets or 2) whether they could be improved upon or replaced with a more principled approach. At present, the heuristic method means that our model can only predict function terms that are consistently associated with a single line number across the call-stack set. Our hope is that more sophisticated pattern recognition techniques applied to our topological model could accommodate cases such as frames that pair a particular function with various line numbers. In particular, since the call-stack poset can be thought of as a graph, we expect that techniques from the graph-learning literature could be leveraged 1) in lieu of our logistic regression model and 2) to derive better heuristics and push recall beyond <inline-formula id="inf227">
<mml:math id="m227">
<mml:mrow>
<mml:mn>40</mml:mn>
<mml:mo>&#x2013;</mml:mo>
<mml:mn>50</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</inline-formula> while preserving precision.</p>
</sec>
<sec id="s7">
<title>7 Conclusion</title>
<p>In summary, our main contribution has been to present a novel topological model to address the problem of function term reconstruction in call-stack data. We performed a small-scale experiment, providing an algorithm to predict the frame traces of function terms that have been obscured in the call-stack data. Despite the limitations, the performance of the model is encouraging, showing that more information about obscured function terms can be recovered than one might initially suspect. We envision that further research within this framework could improve the recall of the <sc>PredictFTs</sc> algorithm.</p>
<p>We also showed that there is a fundamental lack of diversity in our call-stack data, and we hypothesize that the brute-force nature of fuzzing means that this will probably occur in most data-sets generated by a fuzzer. It is an open question whether our method will work on larger, more diverse call-stack data-sets. Given that some level of dependence between terms is required to form equivalence classes, there is no guarantee that similar results will be achieved.</p>
<p>Lastly, the topological model used here is an example of a larger framework defined in [<xref ref-type="bibr" rid="B13">13</xref>]. The extended model is used to tackle applications in gray-box fuzzing, with the goal of guiding fuzzing campaigns to generate more diverse call-stack data. These models of dependency relations may also be applicable in broader contexts outside of fuzzing, such as analyzing dependencies between genes in medical&#x20;data.</p>
</sec>
</body>
<back>
<sec id="s8">
<title>Data Availability Statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s12">Supplementary Material</xref>; further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s9">
<title>Author Contributions</title>
<p>VR conceived the project and helped source the data. KM conducted the research and wrote the manuscript with assistance and guidance from VR. Both authors discussed the results and analysis at length.</p>
</sec>
<sec id="s10">
<title>Funding</title>
<p>KM received funding from the Australian Commonwealth Department of Defence under the project title &#x201c;Mathematical methods for analysis and classification of call-stack data sets&#x201d;. VR was supported by ARC Future Fellowship FT140100604 in the early stages of the project.</p>
</sec>
<sec sec-type="COI-statement" id="s11">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>The authors are indebted to W.P. Malcolm for suggesting that topological data analysis might be usefully applied to study call-stacks, for assistance in sourcing the data, and for many fascinating conversations about probability theory. The paper would not exist without the data provided by Adrian Herrera (AH). The authors gratefully acknowledge many helpful discussions with AH about the art of fuzzing, software compilers, and call-stack jargon.</p>
</ack>
<sec id="s12">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fams.2021.668082/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fams.2021.668082/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet1.ZIP" id="SM1" mimetype="application/ZIP" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gunadi</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Herrera</surname>
<given-names>A</given-names>
</name>
</person-group>. <source>Experiment Data for MoonLight: Effective Fuzzing with Near-Optimal Corpus Distillation</source> (<year>2019</year>). <pub-id pub-id-type="doi">10.1109/icicos48119.2019.8982513</pub-id> </citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Zalewski</surname>
<given-names>M</given-names>
</name>
</person-group>. <source>Technical Whitepaper for AFL-Fuzz</source> (<year>2014</year>). <pub-id pub-id-type="doi">10.3726/978-3-653-03549-0</pub-id> <ext-link ext-link-type="uri" xlink:href="https://lcamtuf.coredump.cx/afl/technical_details.txt">https://lcamtuf.coredump.cx/afl/technical_details.txt</ext-link>. </citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bartz</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Stokes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Platt</surname>
<given-names>J</given-names>
</name>
</person-group>. <source>Finding Similar Failures Using Callstack Similarity</source>. <publisher-loc>Redmond WA</publisher-loc>: <publisher-name>Microsoft Corporation</publisher-name> (<year>2009</year>). </citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Modani</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gupta</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lohman</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Syeda-Mahmood</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Mignet</surname>
<given-names>L</given-names>
</name>
</person-group>. <article-title>Automatically Identifying Known Software Problems</article-title>. In: <conf-name>Proceedings - International Conference on Data Engineering</conf-name> (<year>2007</year>). <conf-loc>Istanbul, Turkey</conf-loc>. <pub-id pub-id-type="doi">10.1109/ICDEW.2007.4401026</pub-id> </citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCord</surname>
<given-names>MC</given-names>
</name>
</person-group>. <article-title>Singular Homology Groups and Homotopy Groups of Finite Topological Spaces</article-title>. <source>Duke Math J</source> (<year>1966</year>). <volume>33</volume>:<fpage>465</fpage>&#x2013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1215/S0012-7094-66-03352-7</pub-id> </citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Serebryany</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Bruening</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Potapenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vyukov</surname>
<given-names>D</given-names>
</name>
</person-group>. <article-title>AddressSanitizer: A Fast Address Sanity Checker</article-title>. In: <conf-name>Proceedings of the 2012 USENIX Conference on Annual Technical Conference</conf-name>. <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>USENIX Association, USENIX ATC&#x2019;12</publisher-name> (<year>2012</year>). p. <fpage>28</fpage>. </citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Pirttim&#xe4;ki</surname>
<given-names>T</given-names>
</name>
</person-group>. <source>A Survey of Kolmogorov Quotients</source> (<year>2019</year>). <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1905.01157">https://arxiv.org/abs/1905.01157</ext-link>.</citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>H</given-names>
</name>
</person-group>. <source>Imbalanced Learning</source>. <publisher-name>John Wiley &#x26; Sons</publisher-name> (<year>2011</year>). p. <fpage>44</fpage>&#x2013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1002/9781118025604.ch3</pub-id> </citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Runeson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Alexandersson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nyholm</surname>
<given-names>O</given-names>
</name>
</person-group>. <article-title>Detection of Duplicate Defect Reports Using Natural Language Processing</article-title>. In: <conf-name>Proceedings - International Conference on Software Engineering</conf-name> (<year>2007</year>). <publisher-loc>Barcelona, Spain</publisher-loc>. <pub-id pub-id-type="doi">10.1109/ICSE.2007.32</pub-id> </citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Anvik</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information</article-title>. In: <conf-name>Proceedings - International Conference on Software Engineering</conf-name> (<year>2008</year>). <publisher-loc>Germany</publisher-loc>. <pub-id pub-id-type="doi">10.1145/1368088.1368151</pub-id> </citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Lerch</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mezini</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Finding Duplicates of Your Yet Unwritten Bug Report</article-title>. In: <conf-name>Proceedings of the European Conference on Software Maintenance and Reengineering</conf-name>. <publisher-loc>Genova, Italy</publisher-loc>: <publisher-name>CSMR</publisher-name> (<year>2013</year>). <pub-id pub-id-type="doi">10.1109/CSMR.2013.17</pub-id> </citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zimmermann</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Nagappan</surname>
<given-names>N</given-names>
</name>
</person-group>. <article-title>Crash Graphs: An Aggregated View of Multiple Crashes to Improve Crash Triage</article-title>. In: <conf-name>2011 IEEE/IFIP 41st International Conference on Dependable Systems Networks</conf-name>. <publisher-loc>Hong Kong</publisher-loc>: <publisher-name>DSN</publisher-name> (<year>2011</year>). p. <fpage>486</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1109/DSN.2011.5958261</pub-id> </citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Maggs</surname>
<given-names>K</given-names>
</name>
</person-group>. <source>A Topological Model For Applications In Fuzzing</source>. <publisher-loc>Canberra</publisher-loc>: <publisher-name>Mathematical Science Institute, ANU College of Science, The Australian National University</publisher-name> (<year>2021</year>). <comment>Master&#x2019;s thesis</comment>.</citation>
</ref>
</ref-list>
</back>
</article>