<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Phys.</journal-id>
<journal-title>Frontiers in Physics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Phys.</abbrev-journal-title>
<issn pub-type="epub">2296-424X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">768181</article-id>
<article-id pub-id-type="doi">10.3389/fphy.2021.768181</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Physics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Efficient Targeted Influence Maximization Based on Multidimensional Selection in Social Networks</article-title>
<alt-title alt-title-type="left-running-head">Jing and Liu</alt-title>
<alt-title alt-title-type="right-running-head">Efficient Targeted Influence Maximization</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Jing</surname>
<given-names>Dong</given-names>
</name>
<uri xlink:href="https://loop.frontiersin.org/people/1390890/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Liu</surname>
<given-names>Ting</given-names>
</name>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
</contrib>
</contrib-group>
<aff>School of Computer Science and Technology, Harbin Institute of Technology, <addr-line>Harbin</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/101109/overview">Chengyi Xia</ext-link>, Tianjin University of Technology, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1024765/overview">Peican Zhu</ext-link>, Northwestern Polytechnical University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/942114/overview">Zhen Wang</ext-link>, Hangzhou Dianzi University, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Ting Liu, <email>tliu@ir.hit.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Social Physics, a section of the journal Frontiers in Physics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>9</volume>
<elocation-id>768181</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>11</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Jing and Liu.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Jing and Liu</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>The influence maximization problem over social networks has become a popular research problem because of its many important practical applications, such as online advertising and viral marketing. The general influence maximization problem is defined over the whole network; intuitively, its aim is to find a seed node set of size at most <italic>k</italic> that influences as many nodes in the network as possible. In real applications, however, it is commonly required that only special nodes (targets) in the network be influenced, so that the same cost of placing seed nodes influences more of the nodes that are actually needed. Some research efforts have provided solutions for the corresponding targeted influence maximization problem (TIM for short). However, previous works on the TIM problem have two main drawbacks. First, works in which the targets are given arbitrarily make it hard to achieve the efficiency required by real applications. Second, works that specify the target set in a probabilistic way are not suitable when an exact target set is required. In this paper, we study the Multidimensional Selection based Targeted Influence Maximization problem, MSTIM for short. First, the formal definition of the problem is given based on a brief and expressive fragment of general multi-dimensional queries. Then, a formal theoretical analysis of the computational hardness of the MSTIM problem shows that, even in the very simple case where the specified target set is only 1 larger than the seed node set, the MSTIM problem is still NP-hard. Next, the basic framework of RIS (short for Reverse Influence Sampling) is extended and shown to have a 1 &#x2212; 1/<italic>e</italic>&#x20;&#x2212; <italic>&#x3f5;</italic> approximation ratio when a sample size requirement is satisfied.
To satisfy the efficiency requirements, an index-based method for the MSTIM problem is proposed, which reuses previous results and exploits the covering relationship between queries to achieve an efficient solution for MSTIM. Finally, experimental results on real datasets show that the proposed method is indeed efficient.</p>
</abstract>
<kwd-group>
<kwd>targeted</kwd>
<kwd>influence maximization</kwd>
<kwd>index</kwd>
<kwd>sampling</kwd>
<kwd>multidimensional selection</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>As social applications and graph-structured data become more and more popular, many fundamental research problems over social networks have attracted growing interest from researchers. The influence maximization problem is a typical one, which aims to find a set of nodes with strong influence over the whole network. One typical application of the influence maximization problem is viral marketing, which pushes advertisements to selected users and encourages them to propagate the advertisements to more users through their social relationships. Recently, the problem has been the focus of many research works spanning several areas, such as networking and information management. In addition, variants of the influence maximization problem have been proposed and studied for different applications.</p>
<p>One important variant of the general influence maximization problem is called targeted influence maximization, TIM for short. In the general definition, given a network <italic>G</italic>, the influence maximization problem is to compute a set <italic>S</italic> of <italic>k</italic> seed nodes such that <italic>S</italic> influences the most nodes in <italic>G</italic>. In contrast, the aim of the targeted influence maximization problem is to influence as many nodes as possible in a special subset <italic>T</italic>&#x20;&#x2286; <italic>V</italic>
<sub>
<italic>G</italic>
</sub> (the target set) rather than in the whole node set of <italic>G</italic>. Obviously, how to define the target set <italic>T</italic> is a key step in the definition of TIM. In [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>], the target set is chosen arbitrarily; this definition is the most general one and is independent of the application setting. In Li et&#x20;al. (2015), it is given in a topic-aware way, where each node is associated with several topics and the target is specified by a topic list. Given the topic list (query), a measure of the closeness between each node and the query can be computed. As a consequence, the optimization goal of TIM can be defined as a weighted sum over all nodes in <italic>G</italic>. In fact, the definition used by [<xref ref-type="bibr" rid="B3">3</xref>] assigns each node a probability of appearing in the target set and solves the corresponding influence maximization problem using a modified optimization&#x20;goal.</p>
<p>Previous works on the TIM problem have two main drawbacks. First, providing quick feedback is very important for influence maximization applications [<xref ref-type="bibr" rid="B2">2</xref>,<xref ref-type="bibr" rid="B3">3</xref>]. However, the general definition of TIM adopted by previous works such as [<xref ref-type="bibr" rid="B1">1</xref>] makes it hard to improve the performance of TIM algorithms by reusing earlier computations for other target sets, since the targets are usually given randomly and independently, and caching the related information incurs huge costs. Second, previous works such as [<xref ref-type="bibr" rid="B3">3</xref>] study the TIM problem by specifying the target set with topic-aware queries. Query-based specification of target sets makes it possible to index relatively little information and to answer an arbitrary TIM problem defined on topics efficiently by reusing the indexed information. However, as shown by the following example, in many applications users may expect to specify the target set in a more exact way, for which the definition used in [<xref ref-type="bibr" rid="B3">3</xref>] is not suitable.</p>
<p>Example 1. As shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, there is a social network whose relationships can be represented by a graph. The node labelled &#x201c;<italic>a</italic>&#x201d; maintains the information about a man aged 20 living in &#x201c;NY&#x201d; whose salary is 5,000 per month. The information associated with the other nodes in the graph can be explained similarly. Each directed edge from a node <italic>u</italic> to a node <italic>v</italic> means that <italic>u</italic> can influence <italic>v</italic>. The edge between <italic>a</italic> and <italic>f</italic> is labelled 0.8, which means that when <italic>f</italic> receives a message from <italic>a</italic>, the probability that <italic>f</italic> accepts and transmits the message is 0.8. That is, the value in the middle of each edge is the corresponding influence probability.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>A motivating example of multidimensional selection based targeted influence maximization.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g001.tif"/>
</fig>
<p>Let us consider a simple example. Suppose that <italic>b</italic> is the only seed node during the information propagation. Since there is only one path between <italic>b</italic> and <italic>d</italic>, the probability that <italic>d</italic> is eventually influenced is <italic>p</italic>
<sub>
<italic>b</italic>,<italic>d</italic>
</sub> &#x3d; 0.5 &#xd7; 0.7 &#x3d; 0.35. The general influence maximization problem is to find a seed node set such that the expected number of nodes influenced after the information propagation procedure is maximized.</p>
<p>Now, consider the case where a user wants to select some nodes to help advertise an expensive razor. In this application, the target set is naturally expected to be the male persons with a high salary (no less than 15,000). That is, only seed nodes with high influence on the nodes in {<italic>d</italic>, <italic>e</italic>, <italic>f</italic>, <italic>h</italic>} should be considered. Moreover, to specify the target set exactly and briefly, a multi-dimensional range query is a proper choice, which can express the above requirements by a statement <italic>q</italic>: (<italic>gender</italic> &#x3d; &#x201c;M&#x201d;) &#x2227; (<italic>income</italic> &#x2265; 15000).</p>
<p>To the best of our knowledge, no previous work focuses on the targeted influence maximization problem based on multi-dimensional queries. Providing quick responses to this problem poses at least two challenges. 1) Different from previous methods, since the target set is specified in a non-trivial way, efficient techniques for collecting the exact target set for an ad-hoc query must be developed. 2) To return the seed node set efficiently, the idea of using previously cached results should be well exploited. Although [<xref ref-type="bibr" rid="B3">3</xref>] has investigated such a method in the area of topic-aware influence maximization, it is not suitable when the target set is specified by multi-dimensional queries.</p>
<p>Therefore, in this paper, we address the problem of Multidimensional Selection based Targeted Influence Maximization, MSTIM for short. To support efficient evaluation of the queries specifying the target set, an index-based solution is used to reuse previous query results. To solve the corresponding influence maximization problem efficiently, a sampling-based method is developed based on previous works and then extended to an index-based solution, which reuses previously obtained samples and improves the performance significantly. The main contributions of this paper can be summarized as follows.<list list-type="simple">
<list-item>
<p>1. We identify the effects of multi-dimensional queries to specify the target set in the influence maximization problem, and propose the formal problem definition of Multidimensional Selection based Targeted Influence Maximization (MSTIM for short) based on a brief and expressive fragment of general multi-dimensional queries.</p>
</list-item>
<list-item>
<p>2. We show the computational hardness of the MSTIM problem. In fact, even in the very simple case where the specified target set is only 1 larger than the seed node set, the MSTIM problem is still NP-hard.</p>
</list-item>
<list-item>
<p>3. For the MSTIM problem, the basic framework of the previously proposed Reverse Influence Sampling (RIS for short) method is extended and shown to have a 1 &#x2212; 1/<italic>e</italic>&#x20;&#x2212; <italic>&#x3f5;</italic> approximation ratio when a sample size requirement is satisfied.</p>
</list-item>
<list-item>
<p>4. An index-based solution for the MSTIM problem is proposed. Using previously maintained indexes of queries, the performance of evaluating multi-dimensional queries is improved by reusing results computed before. Sophisticated techniques for searching query predicates are designed and studied. With the help of indexes of previous samples and an inverted index between nodes and samples, the MSTIM problem can be solved efficiently.</p>
</list-item>
<list-item>
<p>5. The experimental results on real datasets show that the proposed method is indeed rather efficient.</p>
</list-item>
</list>
</p>
<p>The rest of the paper is organized as follows. In <xref ref-type="sec" rid="s2">section 2</xref>, some background information is introduced. Then, in <xref ref-type="sec" rid="s3">section 3</xref>, the theoretical analysis of the MSTIM problem is given. <xref ref-type="sec" rid="s4">Section 4</xref> provides the basic framework of the sampling-based approximation solution for the MSTIM problem. In <xref ref-type="sec" rid="s5">section 5</xref>, the index-based version is proposed and introduced in detail. <xref ref-type="sec" rid="s6">Section 6</xref> shows the experimental results. Related work is discussed in <xref ref-type="sec" rid="s7">section 7</xref>, and the final part is the conclusion.</p>
</sec>
<sec id="s2">
<title>2 Related Work</title>
<p>Influence maximization is an important and classical problem in the research area of online social networks, with many applications such as viral marketing and computational advertising. It was first studied by Domingos and Richardson [<xref ref-type="bibr" rid="B4">4</xref>,<xref ref-type="bibr" rid="B5">5</xref>], and formal definitions and a comprehensive theoretical analysis are given in [<xref ref-type="bibr" rid="B6">6</xref>]. Different models have been formally defined to simulate information propagation processes with different characteristics; the two most popular are the Independent Cascade (IC for short) and Linear Threshold (LT for short) models. In [<xref ref-type="bibr" rid="B6">6</xref>], the influence maximization problems under both the IC and LT models are shown to be NP-hard, and the problem of computing the exact influence of a given node set is shown to be <italic>&#x266f;</italic>P-hard in&#x20;[<xref ref-type="bibr" rid="B7">7</xref>].</p>
<p>After the problem was proposed, many research efforts have been made to find the node set with maximum influence. The work in [<xref ref-type="bibr" rid="B6">6</xref>] proposed a greedy algorithm for influence maximization with a constant approximation ratio (1 &#x2212; 1/<italic>e</italic>), whose time cost is usually high for large networks. To overcome the shortcomings of greedy algorithms, [<xref ref-type="bibr" rid="B8">8</xref>] proposed the CELF (Cost-Effective Lazy-Forward) algorithm, which improves the performance of greedy algorithms for influence maximization by reducing the number of evaluations of the influence set of a given seed set. The work in [<xref ref-type="bibr" rid="B9">9</xref>] proposed the SIMPATH algorithm, which improves the performance of the greedy influence maximization algorithm under the LT model. Similar works focusing on improving the performance of influence maximization algorithms include [<xref ref-type="bibr" rid="B10">10</xref>&#x2013;<xref ref-type="bibr" rid="B12">12</xref>]. Recently, a series of sampling-based influence maximization algorithms, such as [<xref ref-type="bibr" rid="B13">13</xref>&#x2013;<xref ref-type="bibr" rid="B15">15</xref>], have been proposed and well developed, which greatly improve the practical performance at the cost of a tiny loss in the approximation ratio. However, as shown by [<xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B17">17</xref>], efficiency remains challenging when applying influence maximization algorithms in real applications.</p>
<p>The work most related to ours is [<xref ref-type="bibr" rid="B3">3</xref>], which focuses on the topic-aware targeted influence maximization problem. In the topic-aware setting, each node is associated with a list of topics of interest and the query is specified as a topic list. After the similarities between users and the query are calculated, a weight can be assigned to each node, so that the general optimization goal of the IM problem is extended to the weighted case. A sampling-based solution with the help of indexes is given in [<xref ref-type="bibr" rid="B3">3</xref>]. Indexes are built for each keyword referred to in the topic setting. When a random query is given, the topic list associated with the query is weighted, the computation of the similarities is assigned to each keyword, and finally the samples are collected by combining the samples maintained for each keyword. Different from this paper, that work is not suitable for the case where an exact subset of the whole network is expected to be the target set. Considering the case where the target set can be specified in an arbitrary way, [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>] study the most general targeted influence maximization problems and provide efficient solutions. However, the general method they adopt makes it almost impossible to provide efficient solutions that reuse previous results with acceptable space cost. In contrast, this paper considers the more specific case where the target set can be described by a multi-dimensional query and utilizes the characteristics of those queries to develop sophisticated index-based solutions. Therefore, this paper studies a variant of the targeted influence maximization problem that is different from previous&#x20;works.</p>
<p>There are also many works that try to extend the classic influence maximization methods to other application settings. The work in [<xref ref-type="bibr" rid="B18">18</xref>] studies the problem of influence maximization in location-based social networks. In those networks, one node can be influenced by another node if and only if they are neighbours according to their location information, and [<xref ref-type="bibr" rid="B18">18</xref>] focuses on the problem of finding <italic>k</italic> users who can affect the maximum number of users in the location-based social network. The work in [<xref ref-type="bibr" rid="B19">19</xref>] identifies the relation types involved in propagating information and formally defines the problem of influence maximization by considering different types of relationships between nodes. A key idea is that, given certain information to be propagated, the influence set of a node set can be computed more efficiently by removing the edges belonging to certain types. The work in [<xref ref-type="bibr" rid="B20">20</xref>] studies the problem of influence maximization in topic-aware applications. As shown by [<xref ref-type="bibr" rid="B21">21</xref>], the influence probabilities between users with special triangle structures are obviously higher than others. Compared with this paper, the above research efforts focus on totally different problems, but their ideas on developing efficient influence maximization algorithms are helpful for&#x20;us.</p>
</sec>
<sec id="s3">
<title>3 Preliminary</title>
<sec id="s3-1">
<title>3.1 Classical Influence Maximization</title>
<p>Information diffusion can be described as the propagation of information over a network. A network is denoted by a graph <italic>G</italic>(<italic>V</italic>, <italic>E</italic>). An information diffusion model <italic>M</italic> describes how the nodes in the network influence each other. At any instant, each node in the network is labelled either active or inactive. According to the rules defined by <italic>M</italic>, inactive nodes may become active because of the existence of certain active neighbours.</p>
<p>There are two classical information propagation models: the linear threshold model and the independent cascade model. This paper focuses on the independent cascade model (IC for short). In this model, for each edge (<italic>u</italic>, <italic>v</italic>) a probability <italic>p</italic>
<sub>
<italic>uv</italic>
</sub> is given to describe that <italic>u</italic> can activate <italic>v</italic> with probability <italic>p</italic>
<sub>
<italic>uv</italic>
</sub>. After initializing an active node set <italic>S</italic>
<sub>0</sub>, in the <italic>i</italic>th step, every active node tries to activate its neighbours. In detail, for each node <italic>u</italic>&#x20;&#x2208; <italic>S</italic>
<sub>
<italic>i</italic>&#x2212;1</sub> and node <italic>v</italic>&#x20;&#x2208; <italic>V</italic> \ <italic>S</italic>
<sub>
<italic>i</italic>&#x2212;1</sub>, if (<italic>u</italic>, <italic>v</italic>) &#x2208; <italic>E</italic>, <italic>u</italic> gets one attempt to activate <italic>v</italic>, which succeeds with probability <italic>p</italic>
<sub>
<italic>uv</italic>
</sub>. If <italic>v</italic> indeed becomes active, it will be added to <italic>S</italic>
<sub>
<italic>i</italic>
</sub> and will not be considered further in the current step. This procedure is repeated until no new nodes are added. Obviously, under a specific information propagation model, given an initial active set <italic>S</italic> over a network <italic>G</italic>, we can obtain a node set <italic>I</italic>
<sub>
<italic>S</italic>
</sub> consisting of the nodes that are active when the propagation procedure finishes. Therefore, an expected value <inline-formula id="inf1">
<mml:math id="m1">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> can be defined according to the probabilistic distributions of the possible information propagation graphs, whose details can be found in&#x20;[<xref ref-type="bibr" rid="B6">6</xref>].</p>
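<p>As an illustration only, the cascade process described above can be sketched as a short Monte Carlo simulation. The sketch assumes the standard reading of the IC model in [<xref ref-type="bibr" rid="B6">6</xref>], in which each newly activated node gets one attempt per out-neighbour. The function names <monospace>simulate_ic</monospace> and <monospace>estimate_spread</monospace>, the intermediate node name <monospace>x</monospace>, and the toy chain with probabilities 0.5 and 0.7 are hypothetical choices made for this sketch and are not taken from the paper.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def simulate_ic(edges, seeds, rng):
    """Run one independent cascade and return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in edges.get(u, []):
                # u gets a single attempt to activate each inactive neighbour v
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(edges, seeds, runs=20000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(edges, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy chain: b -> x (0.5), x -> d (0.7)
toy_edges = {"b": [("x", 0.5)], "x": [("d", 0.7)]}
spread = estimate_spread(toy_edges, {"b"})
```
]]></preformat>
<p>On this toy chain the exact expected number of active nodes is 1 &#x2b; 0.5 &#x2b; 0.5 &#xd7; 0.7 &#x3d; 1.85, so the estimate should be close to that value. The number of runs trades accuracy for running time.</p>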
<p>Definition 1 (Influence Maximization, IM for short). Given a propagation graph <italic>G</italic>&#x20;&#x3d; (<italic>V</italic>, <italic>E</italic>) such that there is an associated probability <italic>p</italic>
<sub>
<italic>uv</italic>
</sub> for each edge (<italic>u</italic>, <italic>v</italic>) &#x2208; <italic>E</italic>, and an integer <italic>k</italic>&#x20;&#x3e; 0, the goal is to compute a node set <italic>S</italic> with |<italic>S</italic>| &#x2264; <italic>k</italic> such that <inline-formula id="inf2">
<mml:math id="m2">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is maximized.</p>
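<p>To make Definition 1 concrete, the following sketch solves a tiny instance exactly by brute force. It relies on the known equivalence between the IC model and sampling a "live-edge" subgraph in which each edge is kept independently with its probability, so it enumerates every such subgraph and every seed set of size <italic>k</italic>. This is feasible only for very small graphs and is not the method of this paper. All names (<monospace>expected_spread</monospace>, <monospace>best_seed_set</monospace>) and the toy graph are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations, product


def reachable(live_edges, seeds):
    """Nodes reachable from the seeds using only the live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for a, b in live_edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def expected_spread(edges, seeds):
    """Exact expected number of active nodes, by enumerating live-edge outcomes."""
    items = list(edges.items())  # [((u, v), p_uv), ...]
    total = 0.0
    for mask in product([False, True], repeat=len(items)):
        prob, live = 1.0, []
        for keep, (edge, p) in zip(mask, items):
            prob *= p if keep else 1.0 - p
            if keep:
                live.append(edge)
        total += prob * len(reachable(live, seeds))
    return total


def best_seed_set(nodes, edges, k):
    """Try every seed set of size k and return one with maximum expected spread."""
    return max(combinations(sorted(nodes), k),
               key=lambda s: expected_spread(edges, s))


# Hypothetical toy instance
toy_nodes = {"a", "b", "c", "d"}
toy_edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "d"): 0.2}
best = best_seed_set(toy_nodes, toy_edges, 1)
```
]]></preformat>
<p>Because the expected spread never decreases when a seed is added, searching only sets of size exactly <italic>k</italic> is enough whenever <italic>k</italic> does not exceed the number of nodes. The exponential cost of this enumeration is the reason greedy and sampling-based methods are used in practice.</p>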
</sec>
<sec id="s3-2">
<title>3.2 Multidimensional Selection</title>
<p>To support multidimensional selections over social networks, it is necessary to consider an extended model of the general network for information propagation.</p>
<p>For each node <italic>v</italic>&#x20;&#x2208; <italic>V</italic>
<sub>
<italic>G</italic>
</sub>, there are <italic>m</italic> attributes <inline-formula id="inf3">
<mml:math id="m3">
<mml:mi>A</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mspace width="0.3333em"/>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="double-struck">N</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> associated. Let <italic>N</italic>
<sub>
<italic>i</italic>
</sub> be the number of distinct values in the attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub>. For ordered attributes, without loss of generality, it can be assumed that the domain of <italic>A</italic>
<sub>
<italic>i</italic>
</sub> is <italic>Dom</italic>(<italic>A</italic>
<sub>
<italic>i</italic>
</sub>) &#x3d; {1, 2, <italic>&#x2026;</italic> , <italic>N</italic>
<sub>
<italic>i</italic>
</sub>}. For general attributes, we can represent the values of <italic>A</italic>
<sub>
<italic>i</italic>
</sub> as <italic>Dom</italic>(<italic>A</italic>
<sub>
<italic>i</italic>
</sub>) &#x3d; {<italic>a</italic>
<sub>1</sub>, <italic>a</italic>
<sub>2</sub>, <italic>&#x2026;</italic> , <italic>a</italic>
<sub>
<italic>N</italic><sub><italic>i</italic></sub>
</sub>}. In practical applications, most of the semantic information related to nodes in the network can be represented by the associated attributes. For example, in social networks, vertex-related information such as age, birthplace, and interests can be represented by the attributes associated with vertices. Specifically, we use <italic>v</italic>.<italic>A</italic>
<sub>
<italic>i</italic>
</sub> to represent the value of node <italic>v</italic> on attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub>. Formally, such a network can be represented by <italic>G</italic>&#x20;&#x3d; (<italic>V</italic>, <italic>E</italic>, <italic>A</italic>), where for each node <italic>v</italic> there is <italic>v</italic>.<italic>A</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>Dom</italic>(<italic>A</italic>
<sub>
<italic>i</italic>
</sub>) for every attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub> &#x2208;&#x20;<italic>A</italic>.</p>
<p>Then, we can define some basic concepts of multidimensional selection queries.</p>
<p>Definition 2 (1-Dimension Set). A 1-dimension set of a general attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub> is a set <italic>s</italic>&#x20;&#x3d; {<italic>v</italic>
<sub>1</sub>, <italic>v</italic>
<sub>2</sub>, <italic>&#x2026;</italic> , <italic>v</italic>
<sub>
<italic>l</italic>
</sub>} satisfying <italic>v</italic>
<sub>
<italic>j</italic>
</sub> &#x2208; <italic>Dom</italic>(<italic>A</italic>
<sub>
<italic>i</italic>
</sub>) for each <italic>j</italic> between 1 and&#x20;<italic>l</italic>.</p>
<p>Definition 3 (1-Dimension Range). A 1-dimensional range <inline-formula id="inf4">
<mml:math id="m4">
<mml:mi mathvariant="sans-serif">r</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> satisfying the constraint <italic>l</italic>
<sub>
<italic>i</italic>
</sub> &#x3c; <italic>u</italic>
<sub>
<italic>i</italic>
</sub> of an ordered attribute <italic>A</italic>
<sub>
<italic>i</italic>
</sub> defines a 1-dimensional set [<italic>l</italic>
<sub>
<italic>i</italic>
</sub>, <italic>u</italic>
<sub>
<italic>i</italic>
</sub>] &#x3d; {<italic>l</italic>
<sub>
<italic>i</italic>
</sub>, <italic>l</italic>
<sub>
<italic>i</italic>
</sub> &#x2b; 1, <italic>&#x2026;</italic> ,&#x20;<italic>u</italic>
<sub>
<italic>i</italic>
</sub>}.</p>
<p>Similarly, we can define the 1-dimension range (<italic>l</italic>
<sub>
<italic>i</italic>
</sub>, <italic>u</italic>
<sub>
<italic>i</italic>
</sub>), (<italic>l</italic>
<sub>
<italic>i</italic>
</sub>, <italic>u</italic>
<sub>
<italic>i</italic>
</sub>] and [<italic>l</italic>
<sub>
<italic>i</italic>
</sub>, <italic>u</italic>
<sub>
<italic>i</italic>
</sub>), where the round bracket means that it excludes the boundary&#x20;value.</p>
<p>Then, a 1-dimensional selection query <italic>q</italic> can be represented by (<italic>A</italic>
<sub>
<italic>i</italic>
</sub>, <italic>p</italic>) where <italic>p</italic> is a 1-dimension set <italic>s</italic> or a 1-dimension range <inline-formula id="inf5">
<mml:math id="m5">
<mml:mi mathvariant="sans-serif">r</mml:mi>
</mml:math>
</inline-formula>. The predicate <italic>p</italic> essentially defines a function <italic>p</italic>: <italic>Dom</italic>(<italic>A</italic>
<sub>
<italic>i</italic>
</sub>)&#x21a6;{0, 1}. For a node <italic>v</italic>&#x20;&#x2208; <italic>V</italic>, <italic>v</italic> satisfies a 1-dimensional query <italic>q</italic>&#x20;&#x3d; (<italic>A</italic>
<sub>
<italic>i</italic>
</sub>, <italic>s</italic>) or <inline-formula id="inf6">
<mml:math id="m6">
<mml:mi>q</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="sans-serif">r</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, denoted by <italic>v</italic>&#x20;&#x2208; <italic>q</italic>(<italic>V</italic>), if and only if <italic>v</italic>.<italic>A</italic>
<sub>
<italic>i</italic>
</sub> &#x2208; <italic>s</italic> or <inline-formula id="inf7">
<mml:math id="m7">
<mml:mi>v</mml:mi>
<mml:mo>.</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2208;</mml:mo>
<mml:mi mathvariant="sans-serif">r</mml:mi>
</mml:math>
</inline-formula>. It should be noted that a 1-dimensional selection query <italic>q</italic> defines a function <italic>q</italic>: <italic>V</italic>&#x21a6;{0,&#x20;1}.</p>
<p>More generally, we can define a <italic>k</italic>-dimensional selection query based on the definitions&#x20;above.</p>
<p>Definition 4 (<italic>k</italic>-Dimensional Selection Query). A <italic>k</italic>-dimensional selection query <italic>Q</italic>, which defines a function <italic>V</italic>&#x21a6;{0, 1}, consists of a set of <italic>k</italic> 1-dimensional queries {<italic>q</italic>
<sub>1</sub>, <italic>&#x2026;</italic> , <italic>q</italic>
<sub>
<italic>k</italic>
</sub>}. For each node <italic>v</italic>&#x20;&#x2208; <italic>V</italic>, <italic>Q</italic>(<italic>v</italic>) &#x3d; 1 (equivalently, <italic>v</italic>&#x20;&#x2208; <italic>Q</italic>(<italic>V</italic>)) if and only if <italic>q</italic>
<sub>
<italic>i</italic>
</sub>(<italic>v</italic>) &#x3d; 1 for all <italic>q</italic>
<sub>
<italic>i</italic>
</sub> (1 &#x2264; <italic>i</italic>&#x20;&#x2264;&#x20;<italic>k</italic>).</p>
<p>Here, given a <italic>k</italic>-dimensional selection query <italic>Q</italic>, let <inline-formula id="inf8">
<mml:math id="m8">
<mml:mi mathvariant="double-struck">Q</mml:mi>
</mml:math>
</inline-formula> be the vector that contains all 1-dimensional selection queries of <italic>Q</italic>, where for each <italic>i</italic>&#x20;&#x2208; [1, <italic>k</italic>] the query <italic>q</italic>
<sub>
<italic>i</italic>
</sub> is stored in <inline-formula id="inf9">
<mml:math id="m9">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</inline-formula>. Then, the condition <italic>Q</italic>(<italic>v</italic>) &#x3d; 1 is equivalent to <inline-formula id="inf10">
<mml:math id="m10">
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="double-struck">Q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula> for all <italic>i</italic>&#x20;&#x2208; [1, <italic>k</italic>]. In the following, for convenience, we use <italic>k</italic>-dimensional query and 1-dimensional query to denote <italic>k</italic>-dimensional selection query and 1-dimensional selection query, respectively.</p>
<p>Then, based on the concepts above, for a specific node set <italic>V</italic>, we can formally define the selection result of query <italic>Q</italic> as <italic>Q</italic>(<italic>V</italic>) &#x3d; {<italic>v</italic>&#x7c;<italic>v</italic>&#x20;&#x2208; <italic>V</italic> and <italic>Q</italic>(<italic>v</italic>) &#x3d; 1}, a notation that was used informally above.</p>
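<p>The definitions above can be illustrated with a minimal Python sketch. The data layout (a dictionary per node) and the helper names <italic>in_set</italic>, <italic>in_range</italic>, <italic>satisfies</italic> and <italic>select</italic> are assumptions made for illustration only and are not part of the original method; the toy attribute values are not those of Figure&#x20;2.</p>
<preformat><![CDATA[
```python
from typing import Callable, Dict, Iterable, List

Node = Dict[str, object]              # attribute name -> value (hypothetical layout)
Predicate = Callable[[object], bool]  # p: Dom(A_i) -> {0, 1}


def in_set(values: Iterable[object]) -> Predicate:
    """Predicate for a 1-dimensional set s."""
    allowed = set(values)
    return lambda x: x in allowed


def in_range(low, high, closed_low=True, closed_high=True) -> Predicate:
    """Predicate for a 1-dimensional range; an open end excludes its boundary."""
    def check(x) -> bool:
        above = x >= low if closed_low else x > low
        below = x <= high if closed_high else x < high
        return above and below
    return check


def satisfies(node: Node, query: Dict[str, Predicate]) -> bool:
    """Q(v) = 1 iff every 1-dimensional query q_i holds for v."""
    return all(pred(node[attr]) for attr, pred in query.items())


def select(nodes: Dict[str, Node], query: Dict[str, Predicate]) -> List[str]:
    """Q(V), computed with one linear scan over V."""
    return [name for name, node in nodes.items() if satisfies(node, query)]


# Toy data for illustration only.
nodes = {
    "a": {"gender": "M", "age": 25},
    "b": {"gender": "F", "age": 30},
    "c": {"gender": "M", "age": 40},
}
query = {"gender": in_set(["M"]), "age": in_range(20, 40, False, False)}
result = select(nodes, query)
print(result)
```
]]></preformat>
<p>In this toy instance only node <italic>a</italic> is selected, because the open range (20, 40) excludes the boundary value 40 of node <italic>c</italic>.</p>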
<p>Example 2. Continuing with Example 1, the network shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> can be transformed into the form shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>, where each node in the network is associated with four attributes, which are listed on the left. For each node, the corresponding values are shown on the right in the form of a table. Given a 2-dimensional query <italic>Q</italic>: [<italic>A</italic>
<sub>1</sub> &#x3d; <italic>M</italic>, <italic>A</italic>
<sub>2</sub> &#x2208; (20, 40)], the query result <italic>Q</italic>(<italic>V</italic>) will include the nodes <italic>a</italic>, <italic>d</italic>, <italic>e</italic>, and&#x20;<italic>f</italic>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>An example of network with attributes.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g002.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>3.3 Multidimensional Selection Based Targeted Influence Maximization</title>
<p>Given a <italic>k</italic>-dimensional query <italic>Q</italic> representing the target users, we can determine whether a node <italic>v</italic> is of interest by checking whether <italic>Q</italic>(<italic>v</italic>) &#x3d; 1. For a node set <italic>S</italic>&#x20;&#x2286; <italic>V</italic>, after a specific information propagation procedure, only the activated nodes that belong to the result of query <italic>Q</italic> are of real interest to the user. We can therefore define the selection query based targeted influence as follows.</p>
<p>Definition 5 (Multidimensional Selection based Targeted Influence). Given a network graph <italic>G</italic>&#x20;&#x3d; (<italic>V</italic>, <italic>E</italic>, <italic>A</italic>), a <italic>k</italic>-dimensional query <italic>Q</italic> and a node set <italic>S</italic>&#x20;&#x2286; <italic>V</italic>, if the influence of <italic>S</italic> in the classic influence maximization model is denoted by <italic>I</italic>
<sub>
<italic>S</italic>
</sub>, the targeted influence based on <italic>Q</italic> can be represented by <italic>F</italic>
<sub>
<italic>S</italic>
</sub> &#x3d; <italic>I</italic>
<sub>
<italic>S</italic>
</sub> &#x2229;&#x20;<italic>Q</italic>(<italic>V</italic>).</p>
<p>Similarly, we can define the expected targeted influence <inline-formula id="inf11">
<mml:math id="m11">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> based on the probability distribution induced by the information diffusion process, and the targeted influence maximization problem can be formally defined as follows.</p>
<p>Definition 6 (Multidimensional Selection based Targeted Influence Maximization, MSTIM for short). Given a network graph <italic>G</italic>&#x20;&#x3d; (<italic>V</italic>, <italic>E</italic>, <italic>A</italic>), a <italic>k</italic>-dimensional query <italic>Q</italic> and an integer <italic>k</italic>, the problem is to find a subset <italic>S</italic>&#x20;&#x2286; <italic>V</italic> satisfying &#x7c;<italic>S</italic>&#x7c; &#x2264; <italic>k</italic> such that the expected targeted influence <inline-formula id="inf12">
<mml:math id="m12">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is maximized.</p>
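<p>As an illustration of Definitions 5 and 6, the expected targeted influence of a fixed seed set can be estimated by repeated simulation. The sketch below assumes an independent-cascade style diffusion in which each edge fires independently with its activation probability; the function names and the adjacency-list layout are hypothetical and serve only as an example, not as the algorithm proposed in this paper.</p>
<preformat><![CDATA[
```python
import random
from typing import Dict, List, Set, Tuple

# node -> list of (out-neighbour, activation probability); assumed layout
Graph = Dict[str, List[Tuple[str, float]]]


def simulate_cascade(graph: Graph, seeds: Set[str], rng: random.Random) -> Set[str]:
    """One diffusion run; returns the set I_S of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each edge gets a single chance to activate its target.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_targeted_influence(graph: Graph, seeds: Set[str], targets: Set[str],
                                runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[|I_S intersected with Q(V)|]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += len(simulate_cascade(graph, seeds, rng) & targets)
    return total / runs


# Toy graph with deterministic edges: s activates t1 and x, and x activates t2.
graph = {"s": [("t1", 1.0), ("x", 1.0)], "x": [("t2", 1.0)]}
estimate = expected_targeted_influence(graph, {"s"}, {"t1", "t2"})
print(estimate)
```
]]></preformat>
<p>Here the set <italic>targets</italic> plays the role of <italic>Q</italic>(<italic>V</italic>): node <italic>x</italic> is activated but does not count toward the targeted influence.</p>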
</sec>
</sec>
<sec id="s4">
<title>4 The Computational Complexity of MSTIM</title>
<p>The general influence maximization problem is a special case of the MSTIM problem: it is obtained when the query <italic>Q</italic> simply returns all nodes in <italic>V</italic>. Since the general problem is NP-<italic>hard</italic>, the following result holds without a detailed proof.</p>
<p>Proposition 4.1. The MSTIM problem is NP-hard.</p>
<p>Of course, this general case is rather special; the more interesting question is whether the MSTIM problem can be solved more efficiently for a practical query <italic>Q</italic>.</p>
<p>Intuitively, MSTIM restricts the influenced nodes that count to the range defined by the query <italic>Q</italic>. An extreme case is that the result <italic>Q</italic>(<italic>V</italic>) is very small, and it is interesting to study whether an efficient algorithm exists for such cases. When &#x7c;<italic>Q</italic>(<italic>V</italic>)&#x7c; &#x2264; <italic>k</italic>, the MSTIM problem can be solved efficiently by simply choosing all nodes of <italic>Q</italic>(<italic>V</italic>) as <italic>S</italic>, since <italic>Q</italic>(<italic>V</italic>) can be computed in linear time by scanning every node <italic>v</italic>&#x20;&#x2208; <italic>V</italic> and checking the predicate of each dimension. It is therefore meaningful to study whether such an algorithm can be extended to more cases of the MSTIM problem.</p>
<p>Next, we prove that even for very restricted but nontrivial cases, it is still hard to solve the MSTIM problem efficiently.</p>
<p>Theorem 1. Given a network graph <italic>G</italic>&#x20;&#x3d; (<italic>V</italic>, <italic>E</italic>, <italic>A</italic>)<italic>, a</italic> <italic>k</italic>
<italic>-</italic>dimensional query <italic>Q</italic> and an integer <italic>k</italic>, the MSTIM problem is still NP-hard even for the case &#x7c;<italic>Q</italic>(<italic>V</italic>)&#x7c; &#x3d; <italic>k</italic>&#x20;&#x2b;&#x20;1<italic>.</italic>
</p>
<p>Proof. Consider an instance of the Set Cover problem, which is a well-known NP-complete problem, whose input includes a collection of subsets <italic>S</italic>
<sub>
<italic>1</italic>
</sub>, <italic>S</italic>
<sub>
<italic>2</italic>
</sub>, <italic>&#x2026;</italic>, <italic>S</italic>
<sub>
<italic>m</italic>
</sub> of a ground set <italic>U &#x3d;</italic> {<italic>u</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, u</italic>
<sub>
<italic>2</italic>
</sub>
<italic>, &#x2026; , u</italic>
<sub>
<italic>n</italic>
</sub>}<italic>.</italic> The question is whether there exist <italic>k</italic> of the subsets whose union is equal to&#x20;<italic>U</italic>
<italic>.</italic>
</p>
<p>In [<xref ref-type="bibr" rid="B6">6</xref>], the Set Cover problem is shown to be a special case of the classical influence maximization problem, which implies that the classical influence maximization problem is NP-hard. In this paper, the following reduction from Set Cover shows that the MSTIM problem is NP-hard even for the case <italic>&#x7c;Q(V)&#x7c; &#x3d; k &#x2b;&#x20;1</italic>
<italic>.</italic>
</p>
<p>Given an arbitrary instance of the Set Cover problem, we first define a corresponding directed bipartite graph with <italic>n &#x2b; m</italic> nodes, as done in the proof in [<xref ref-type="bibr" rid="B6">6</xref>]. Suppose the bipartite graph constructed is <italic>G</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x3d;</italic> (<italic>V</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, E</italic>
<sub>
<italic>1</italic>
</sub>). For each set <italic>S</italic>
<sub>
<italic>i</italic>
</sub>, there is a corresponding node <italic>v</italic>
<sub>
<italic>i</italic>
</sub>, and there is a node <inline-formula id="inf13">
<mml:math id="m13">
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> for each element <italic>u</italic>
<sub>
<italic>j</italic>
</sub>. If <italic>u</italic>
<sub>
<italic>j</italic>
</sub> <italic>&#x2208; S</italic>
<sub>
<italic>i</italic>
</sub>, there is an edge <inline-formula id="inf14">
<mml:math id="m14">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with activation probability <inline-formula id="inf15">
<mml:math id="m15">
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula>. Then, <italic>G</italic>
<sub>
<italic>2</italic>
</sub> is built based on <italic>G</italic>
<sub>
<italic>1</italic>
</sub> by adding <italic>k</italic> nodes {<italic>w</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, w</italic>
<sub>
<italic>2</italic>
</sub>
<italic>, &#x2026; , w</italic>
<sub>
<italic>k</italic>
</sub>} into <italic>G</italic>
<sub>
<italic>1</italic>
</sub> and building an edge (<italic>v</italic>
<sub>
<italic>i</italic>
</sub>
<italic>, w</italic>
<sub>
<italic>j</italic>
</sub>) with activation probability <inline-formula id="inf16">
<mml:math id="m16">
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>c</mml:mi>
</mml:math>
</inline-formula> for each <italic>i &#x2208;</italic> [<italic>1, m</italic>] and <italic>j &#x2208;</italic> [<italic>1, k</italic>], where <italic>c</italic> is a constant between 0 and 1. Then, <italic>G</italic>
<sub>
<italic>3</italic>
</sub> is built based on <italic>G</italic>
<sub>
<italic>2</italic>
</sub> by adding one node <italic>w</italic>
<sub>
<italic>0</italic>
</sub> and inserting an edge <inline-formula id="inf17">
<mml:math id="m17">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> with activation probability <italic>c</italic> for each <italic>j &#x2208;</italic> [<italic>1, n</italic>]. Then, we define the associated attributes for the graph <italic>G</italic>
<sub>
<italic>3</italic>
</sub> as follows. Let the attribute set <italic>A &#x3d;</italic> (<italic>A</italic>
<sub>
<italic>1</italic>
</sub>). For each node <italic>v</italic> of <italic>G</italic>
<sub>
<italic>3</italic>
</sub>, if <italic>v &#x2208;</italic> {<italic>w</italic>
<sub>
<italic>0</italic>
</sub>
<italic>, w</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, &#x2026; , w</italic>
<sub>
<italic>k</italic>
</sub>}, set <italic>v.A</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x3d; 1</italic>, otherwise, set <italic>v.A</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x3d; 2</italic>. Let <italic>Q</italic> <italic>&#x3d;</italic> (<italic>q</italic>
<sub>
<italic>1</italic>
</sub>) where <italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>: A</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x3d; 1</italic>. Finally, we require that the constant <italic>c</italic> used above is larger than <inline-formula id="inf18">
<mml:math id="m18">
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>. An example of the reduction can be found in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>An example of the reduction, where the edge with activation probability 1 is in red and the other edges have probability <italic>c</italic>
<italic>.</italic>
</p>
</caption>
<graphic xlink:href="fphy-09-768181-g003.tif"/>
</fig>
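<p>The construction of <italic>G</italic><sub><italic>3</italic></sub> described above can be sketched in Python as follows. The function name <italic>build_reduction</italic>, the string labels of the nodes and the edge-list representation are assumptions made for illustration; the strict upper bound <italic>c</italic> &#x3c; 1 is likewise one reading of "a constant between 0 and 1".</p>
<preformat><![CDATA[
```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (source, target, activation probability)


def build_reduction(subsets: List[Set[str]], universe: List[str],
                    k: int, c: float) -> Tuple[List[Edge], Dict[str, int]]:
    """Build the edges of G_3 and the attribute A_1 from a Set Cover instance."""
    if not (k - 1) / k < c < 1:
        raise ValueError("c must satisfy (k - 1) / k < c < 1")
    m, n = len(subsets), len(universe)
    edges: List[Edge] = []
    # G_1: edge (v_i, v'_j) with probability 1 whenever u_j belongs to S_i.
    for i, subset in enumerate(subsets, start=1):
        for j, element in enumerate(universe, start=1):
            if element in subset:
                edges.append((f"v{i}", f"v'{j}", 1.0))
    # G_2: add w_1..w_k and an edge (v_i, w_j) with probability c.
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            edges.append((f"v{i}", f"w{j}", c))
    # G_3: add w_0 and an edge (v'_j, w_0) with probability c.
    for j in range(1, n + 1):
        edges.append((f"v'{j}", "w0", c))
    # Attribute A_1: value 1 for w_0..w_k, value 2 for every other node.
    attrs = {f"v{i}": 2 for i in range(1, m + 1)}
    attrs.update({f"v'{j}": 2 for j in range(1, n + 1)})
    attrs.update({f"w{j}": 1 for j in range(0, k + 1)})
    return edges, attrs


# Toy Set Cover instance: two subsets over three elements, k = 2.
edges, attrs = build_reduction(
    [{"u1", "u2"}, {"u2", "u3"}], ["u1", "u2", "u3"], k=2, c=0.9)
targets = sorted(name for name, value in attrs.items() if value == 1)
print(len(edges), targets)
```
]]></preformat>
<p>The nodes with attribute value 1 form the query result, so the toy instance yields <italic>k</italic> &#x2b; 1 target nodes, as required by the reduction.</p>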
<p>The following facts are essential for verifying the correctness of the reduction above.<list list-type="simple">
<list-item>
<p>&#x2022; For the query <italic>Q</italic>, we have <italic>&#x7c;Q(V)&#x7c; &#x3d; k &#x2b; 1</italic>, and we can obtain a simple lower bound by selecting any <italic>k</italic> nodes from {<italic>w</italic>
<sub>
<italic>0</italic>
</sub>
<italic>, &#x2026; , w</italic>
<sub>
<italic>k</italic>
</sub>}. Since the nodes in {<italic>w</italic>
<sub>
<italic>0</italic>
</sub>
<italic>, &#x2026; , w</italic>
<sub>
<italic>k</italic>
</sub>} have no outgoing edges, such a selection produces a seed set whose expected influence is exactly <italic>k</italic>.</p>
</list-item>
<list-item>
<p>&#x2022; <italic>S</italic> need not contain nodes in <inline-formula id="inf19">
<mml:math id="m19">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
<mml:mspace width="0.3333em"/>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. Any such node can be replaced by a suitable node in {<italic>v</italic>
<sub>
<italic>i</italic>
</sub>} (<italic>i &#x2208;</italic> [<italic>1, m</italic>]) without decreasing the influence.</p>
</list-item>
<list-item>
<p>&#x2022; Suppose <italic>S</italic> contains nodes in both (<italic>w</italic>
<sub>
<italic>i</italic>
</sub>) and (<italic>v</italic>
<sub>
<italic>j</italic>
</sub>) and there exists a set cover (<italic>S</italic>
<sub>
<italic>t</italic>
</sub>) of size <italic>k</italic>. Then we can always obtain at least the same influence by using nodes in (<italic>v</italic>
<sub>
<italic>j</italic>
</sub>) to replace the nodes in (<italic>w</italic>
<sub>
<italic>i</italic>
</sub>). Assume that there are <italic>x</italic> nodes of {<italic>v</italic>
<sub>
<italic>j</italic>
</sub>} and <italic>y &#x3e; 0</italic> nodes of {<italic>w</italic>
<sub>
<italic>i</italic>
</sub>} in <italic>S</italic>, and the <italic>x</italic> nodes can cover <italic>n &#x2212; 1</italic> nodes in <inline-formula id="inf20">
<mml:math id="m20">
<mml:mrow>
<mml:mo stretchy="false">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> which is the best situation. The expected influence in (<italic>w</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, &#x2026; , w</italic>
<sub>
<italic>k</italic>
</sub>) obtained by <italic>S</italic> can be calculated by the following formula.</p>
</list-item>
</list>
<disp-formula id="e1">
<mml:math id="m21">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(1)</label>
</disp-formula>
</p>
<p>Let <italic>S&#x2032;</italic> be the nodes of {<italic>v</italic>
<sub>
<italic>i</italic>
</sub>} corresponding to the cover {<italic>S</italic>
<sub>
<italic>t</italic>
</sub>}<italic>.</italic>
<disp-formula id="e2">
<mml:math id="m22">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(2)</label>
</disp-formula>
</p>
<p>Then, we have the following results.<disp-formula id="e3">
<mml:math id="m23">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m24">
<mml:mo>&#x21d4;</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(4)</label>
</disp-formula>
<disp-formula id="e5">
<mml:math id="m25">
<mml:mo>&#x21d4;</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2265;</mml:mo>
<mml:mi>k</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
<label>(5)</label>
</disp-formula>
<disp-formula id="e6">
<mml:math id="m26">
<mml:mo>&#x21d4;</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(6)</label>
</disp-formula>
<disp-formula id="e7">
<mml:math id="m27">
<mml:mo>&#x21d0;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(7)</label>
</disp-formula>
<disp-formula id="e8">
<mml:math id="m28">
<mml:mo>&#x21d4;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2265;</mml:mo>
<mml:mi>y</mml:mi>
</mml:math>
<label>(8)</label>
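</disp-formula>Since &#x7c;<italic>S</italic>&#x7c; &#x2264; <italic>k</italic> and <italic>S</italic> contains at least one node of {<italic>v</italic><sub><italic>j</italic></sub>}, we have <italic>y</italic> &#x2264; <italic>k</italic> &#x2212; 1, and therefore <inline-formula id="inf21">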
<mml:math id="m29">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> always. Moreover, the probability of activating the node <italic>w</italic>
<sub>
<italic>0</italic>
</sub> can only increase when more nodes in {<italic>v</italic>
<sub>
<italic>i</italic>
</sub>}. Therefore, if there exists a set cover {<italic>S</italic>
<sub>
<italic>t</italic>
</sub>} of size <italic>k</italic>, an optimal solution can be built by choosing the corresponding <italic>k</italic> nodes in&#x20;{<italic>v</italic>
<sub>
<italic>i</italic>
</sub>}.</p>
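<p>The chain of inequalities above can also be checked numerically. The following sketch takes the right-hand sides of Eq. 1 and Eq. 2 as given and verifies, over a small grid of parameters, that the mixed seed set never beats the cover-based seed set when <italic>c</italic> exceeds (<italic>k</italic> &#x2212; 1)/<italic>k</italic>. The function names and the chosen grid are illustrative assumptions, and such a check is of course not a substitute for the derivation.</p>
<preformat><![CDATA[
```python
def mixed_seed_influence(k: int, y: int, c: float, n: int) -> float:
    """Right-hand side of Eq. 1, taken as given."""
    return y + (k - y) * (1 - (1 - c) ** (n - 1))


def cover_seed_influence(k: int, c: float, n: int) -> float:
    """Right-hand side of Eq. 2, taken as given."""
    return k * (1 - (1 - c) ** n)


violations = []
for k in range(2, 7):
    c = (k - 0.5) / k  # one constant with (k - 1) / k < c < 1
    for n in range(2, 7):
        for y in range(1, k):  # 0 < y <= k - 1
            if mixed_seed_influence(k, y, c, n) > cover_seed_influence(k, c, n):
                violations.append((k, n, y))
print(violations)
```
]]></preformat>
<p>The difference between the two sides equals (1 &#x2212; <italic>c</italic>)<sup><italic>n</italic>&#x2212;1</sup>(<italic>kc</italic> &#x2212; <italic>y</italic>), which is nonnegative exactly when <italic>c</italic> &#x2265; <italic>y</italic>/<italic>k</italic>, in agreement with Eq. 6.</p>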
<p>Then, according to the facts above, it is easy to check that there exists a solution of the set cover problem if and only if we can find <italic>k</italic> seeding nodes such that the influence obtained is at least <italic>k &#x2b; </italic>1<italic>&#x20;&#x2212; k(<italic>1</italic> &#x2212; c)</italic>
<sup>
<italic>n</italic>
</sup> <italic>&#x2212; (<italic>1</italic> &#x2212;&#x20;c)</italic>
<sup>
<italic>k</italic>
</sup>.</p>
<p>Therefore, the MSTIM problem is NP-hard even for the case <italic>&#x7c;Q(V)&#x7c; &#x3d; k &#x2b;&#x20;</italic>1.</p>
</sec>
<sec id="s5">
<title>5 The Basic Sampling Algorithm for MSTIM</title>
<sec id="s5-1">
<title>5.1 Reverse Influence Sampling</title>
<p>Reverse Influence Sampling (RIS) based methods are the state-of-the-art techniques for approximately solving the influence maximization problem. We first introduce this family of methods, starting with the concepts of Reverse Reachable (RR) Set and Random RR&#x20;Set.</p>
<p>Definition 7 (Reverse Reachable Set, [<xref ref-type="bibr" rid="B22">22</xref>]). Let <italic>v</italic> be a node in <italic>G</italic>, and <inline-formula id="inf22">
<mml:math id="m30">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> be a graph obtained by removing each edge <italic>e</italic> in <italic>G</italic> with probability <italic>1 &#x2212; p(e)</italic>. The reverse reachable (RR) set for <italic>v</italic> in <inline-formula id="inf23">
<mml:math id="m31">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the set of nodes in <inline-formula id="inf24">
<mml:math id="m32">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> that can reach <italic>v</italic>. (That is, for each node <italic>u</italic> in the RR set, there is a directed path from <italic>u</italic> to <italic>v</italic> in <inline-formula id="inf25">
<mml:math id="m33">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>).</p>
<p>Definition 8 (Random RR Set, [<xref ref-type="bibr" rid="B22">22</xref>]). Let <inline-formula id="inf26">
<mml:math id="m34">
<mml:mi mathvariant="script">G</mml:mi>
</mml:math>
</inline-formula> be the distribution of <inline-formula id="inf27">
<mml:math id="m35">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> induced by the randomness in edge removals from <italic>G</italic>. A random RR set is an RR set generated on an instance of <inline-formula id="inf28">
<mml:math id="m36">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> randomly sampled from <inline-formula id="inf29">
<mml:math id="m37">
<mml:mi mathvariant="script">G</mml:mi>
</mml:math>
</inline-formula>, for a node selected uniformly at random from <inline-formula id="inf30">
<mml:math id="m38">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Based on the two definitions above, a typical RIS based method can be described as follows.<list list-type="simple">
<list-item>
<p>&#x2022; Generate multiple random RR sets based on&#x20;<italic>G</italic>
<italic>.</italic>
</p>
</list-item>
<list-item>
<p>&#x2022; Apply the greedy algorithm for the max-coverage problem shown in [<xref ref-type="bibr" rid="B23">23</xref>] to find a node set <italic>A</italic> satisfying <italic>&#x7c;A&#x7c; &#x2264; k</italic> that covers as many random RR sets as possible. The solution is a <italic>(1 &#x2212; 1/e)</italic>-approximation result.</p>
</list-item>
</list>
</p>
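<p>The two steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of [22]: the reverse adjacency-list layout and the function names are hypothetical, edges are sampled lazily during a reverse traversal instead of materializing the sampled graph, and ties in the greedy step are broken alphabetically.</p>
<preformat><![CDATA[
```python
import random
from typing import Dict, List, Set, Tuple

# node -> list of (in-neighbour, edge probability); assumed layout
InEdges = Dict[str, List[Tuple[str, float]]]


def random_rr_set(in_edges: InEdges, root: str, rng: random.Random) -> Set[str]:
    """Reverse traversal from root, keeping each edge with its probability."""
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_max_coverage(rr_sets: List[Set[str]], k: int) -> List[str]:
    """Pick up to k nodes, each covering the most still-uncovered RR sets."""
    chosen: List[str] = []
    uncovered = list(range(len(rr_sets)))
    for _ in range(k):
        counts: Dict[str, int] = {}
        for idx in uncovered:
            for u in rr_sets[idx]:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = [idx for idx in uncovered if best not in rr_sets[idx]]
    return chosen


# Toy graph: a points to b and c with probability 1.
nodes = ["a", "b", "c"]
in_edges = {"b": [("a", 1.0)], "c": [("a", 1.0)]}
rng = random.Random(0)
rr_sets = [random_rr_set(in_edges, rng.choice(nodes), rng) for _ in range(20)]
seeds = greedy_max_coverage(rr_sets, k=1)
print(seeds)
```
]]></preformat>
<p>In the toy graph every RR set contains node <italic>a</italic>, so the greedy step selects it as the single seed.</p>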
<p>Since the second step is just a standard method for solving the maximum coverage problem, enough samples must be gathered in the first step of the algorithm to guarantee the <italic>(1 &#x2212; 1/e)</italic> approximation ratio. As shown in [<xref ref-type="bibr" rid="B24">24</xref>], several research works provide such sampling based influence maximization algorithms with a <italic>1 &#x2212; 1/e &#x2212; &#x3f5;</italic> approximation ratio within time cost <inline-formula id="inf31">
<mml:math id="m39">
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x2b;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Here, denoting the sample size by <italic>&#x3b8;</italic>, we use a result shown in [<xref ref-type="bibr" rid="B22">22</xref>] to explain the principles of our method. Since research efforts continue to reduce the required sample size, it is worth noting that such improved bounds can be easily applied to the method proposed by&#x20;us.</p>
<p>Theorem 2 [<xref ref-type="bibr" rid="B22">22</xref>]. If <italic>&#x3b8;</italic> is at least <inline-formula id="inf32">
<mml:math id="m40">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>8</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mfrac linethickness="0.0pt">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, the typical RIS method returns a solution with a <italic>1 &#x2212; 1/e &#x2212; &#x3f5;</italic> approximation ratio with probability at least <italic>1 &#x2212; &#x3b4;</italic>.</p>
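<p>The bound of Theorem 2 can be evaluated numerically. The following is a minimal sketch, not part of the original method: the function names <monospace>log_binomial</monospace> and <monospace>ris_sample_size</monospace> are hypothetical, the stacked term |V| over k is read as a binomial coefficient, and OPT<sub>k</sub> is assumed to be supplied by the caller.</p>
<preformat>
```python
import math


def log_binomial(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def ris_sample_size(n: int, k: int, eps: float, delta: float, opt_k: float) -> int:
    """Smallest integer theta that satisfies the bound of Theorem 2.

    n      -- |V|, the number of nodes
    k      -- seed set size
    eps    -- approximation slack (epsilon)
    delta  -- allowed failure probability
    opt_k  -- OPT_k, assumed to be given (hypothetical input)
    """
    log_terms = math.log(1.0 / delta) + log_binomial(n, k) + math.log(2.0)
    bound = (8.0 + 2.0 * eps) * n * log_terms / (opt_k * eps ** 2)
    return math.ceil(bound)
```
</preformat>
<p>Because the bound is inversely proportional to OPT<sub>k</sub>, passing a lower bound on OPT<sub>k</sub> yields a larger, and therefore still sufficient, sample size.</p>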
</sec>
<sec id="s5-2">
<title>5.2 RIS Based Algorithm for MSTIM</title>
<p>As shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>, a conceptual algorithm is given to solve the MSTIM problem. Unlike the original RIS method, each random RR set is generated by starting not from an arbitrary node in <italic>G</italic> but from a node in <italic>Q(V)</italic>, where <italic>Q(V)</italic> is the set of nodes selected by query&#x20;<italic>Q</italic>.</p>
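<p>The two steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors&#x2019; implementation: it assumes the independent cascade model with a propagation probability on every edge, and the names <monospace>random_rr_set</monospace>, <monospace>greedy_max_coverage</monospace> and <monospace>ris_mstim</monospace> are hypothetical.</p>
<preformat>
```python
import random
from collections import defaultdict


def random_rr_set(in_edges, targets, rng):
    """Sample one RR set rooted at a uniformly random node of Q(V).

    in_edges -- dict: node -> list of (in_neighbor, propagation_probability)
    targets  -- list of the nodes in Q(V)
    """
    root = rng.choice(targets)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            # Assumption: each incoming edge is live independently with probability p.
            if u not in rr and p > rng.random():
                rr.add(u)
                stack.append(u)
    return rr


def greedy_max_coverage(rr_sets, k):
    """Greedily pick up to k nodes that cover as many RR sets as possible."""
    member_of = defaultdict(set)  # node -> indices of the RR sets containing it
    for idx, rr in enumerate(rr_sets):
        for u in rr:
            member_of[u].add(idx)
    covered, seeds = set(), []
    for _ in range(k):
        best = max(member_of, key=lambda u: len(member_of[u] - covered), default=None)
        if best is None or not (member_of[best] - covered):
            break  # nothing left to gain
        seeds.append(best)
        covered |= member_of[best]
    return seeds, len(covered)


def ris_mstim(in_edges, targets, k, theta, seed=0):
    """Generate theta targeted RR sets, then select k seeds by greedy coverage."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(in_edges, targets, rng) for _ in range(theta)]
    seeds, covered = greedy_max_coverage(rr_sets, k)
    # Estimated targeted influence: |Q(V)| times the fraction of covered RR sets.
    return seeds, len(targets) * covered / theta
```
</preformat>
<p>The only change with respect to a plain RIS sketch is the choice of the root in <monospace>random_rr_set</monospace>, which is drawn from <italic>Q(V)</italic> instead of from all nodes.</p>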
<p>
<statement content-type="algorithm" id="alg1">
<label>Algorithm 1</label>
<p>RIS-MSTIM.</p>
<p>
<inline-graphic xlink:href="fphy-09-768181-fx1.tif"/>
</p>
<p>To verify the correctness of the above algorithm, it suffices to show that a random RR set yields a proper estimator of <inline-formula id="inf33">
<mml:math id="m41">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> and that <italic>&#x3b8;</italic> can be chosen large enough to guarantee the quality of the final seed set.</p>
<p>Theorem 3. Given a set <italic>S &#x2286; V</italic>, for a random RR set <italic>e</italic> of <italic>Q(V)</italic>, we have <inline-formula id="inf34">
<mml:math id="m42">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>.</p>
<p>Proof. According to the definition of <inline-formula id="inf35">
<mml:math id="m43">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, given a specific possible graph <inline-formula id="inf36">
<mml:math id="m44">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>, let <inline-formula id="inf37">
<mml:math id="m45">
<mml:msubsup>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> be the number of nodes influenced by <italic>S</italic>; then we have<disp-formula id="e9">
<mml:math id="m46">
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">G</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(9)</label>
</disp-formula>
<disp-formula id="e10">
<mml:math id="m47">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">G</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x22c5;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
<mml:mi mathvariant="bold">I</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(10)</label>
</disp-formula>
<disp-formula id="e11">
<mml:math id="m48">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">G</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x22c5;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
<mml:mi mathvariant="bold">I</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(11)</label>
</disp-formula>
<disp-formula id="e12">
<mml:math id="m49">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>&#x223c;</mml:mo>
<mml:mi mathvariant="script">G</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">&#x302;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi mathvariant="bold">I</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(12)</label>
</disp-formula>
<disp-formula id="e13">
<mml:math id="m50">
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<label>(13)</label>
</disp-formula>
<disp-formula id="e14">
<mml:math id="m51">
<mml:mo>&#x3d;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:math>
<label>(14)</label>
</disp-formula>Since the starting node <italic>v</italic> is selected uniformly at random from <italic>Q(V)</italic>, each node is selected with probability <inline-formula id="inf38">
<mml:math id="m52">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>. In addition, the choice of <italic>v</italic> is independent of <italic>S</italic>. Therefore, we have <inline-formula id="inf39">
<mml:math id="m53">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo movablelimits="false" form="prefix">&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
<mml:mfrac>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>. Finally, we have <inline-formula id="inf40">
<mml:math id="m54">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2229;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>&#x2260;</mml:mo>
<mml:mi>&#x2205;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>.</p>
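<p>The identity of Theorem 3 can be checked exactly on a small graph by enumerating every possible graph. The sketch below is illustrative only: it assumes independent edge probabilities, and the helper names <monospace>reachable</monospace> and <monospace>check_theorem3</monospace> are hypothetical. It computes the expected targeted influence by forward reachability from <italic>S</italic>, and the hitting probability by reverse reachability from each root in <italic>Q(V)</italic>.</p>
<preformat>
```python
from itertools import product


def reachable(start_nodes, adj):
    """Nodes reachable from start_nodes along the adjacency lists in adj."""
    seen, stack = set(start_nodes), list(start_nodes)
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def check_theorem3(edges, targets, seeds):
    """Return (E[F_S], |Q(V)| * Pr[RR set intersects S]) by full enumeration.

    edges -- list of (u, v, p): directed edge u -> v, live with probability p
    """
    expected_influence, hit_probability = 0.0, 0.0
    for keep in product([False, True], repeat=len(edges)):
        prob, forward, backward = 1.0, {}, {}
        for (u, v, p), kept in zip(edges, keep):
            prob *= p if kept else 1.0 - p
            if kept:
                forward.setdefault(u, []).append(v)
                backward.setdefault(v, []).append(u)
        # Forward view: targeted nodes influenced by S in this possible graph.
        influenced = reachable(seeds, forward)
        expected_influence += prob * sum(1 for v in targets if v in influenced)
        # Reverse view: the RR set of each root v, chosen with probability 1/|Q(V)|.
        for v in targets:
            if reachable([v], backward).intersection(seeds):
                hit_probability += prob / len(targets)
    return expected_influence, len(targets) * hit_probability
```
</preformat>
<p>The enumeration is exponential in the number of edges, so it is only meant as a sanity check on toy inputs.</p>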
<p>Theorem 4. If <italic>&#x3b8;</italic> is at least <inline-formula id="inf41">
<mml:math id="m55">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>8</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mfrac linethickness="0.0pt">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, the RIS-MSTIM method returns a solution with a <italic>1 &#x2212; 1/e &#x2212; &#x3f5;</italic> approximation ratio with probability at least <italic>1 &#x2212; &#x3b4;</italic>, where <inline-formula id="inf42">
<mml:math id="m56">
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is the largest expected influence of any seed set of size <italic>k</italic> in the MSTIM problem.</p>
<p>Proof. Let <italic>&#x3c1;</italic> be the probability <italic>Pr[e &#x2229; S &#x2260; &#x2205;]</italic>, and let <italic>X</italic>
<sub>
<italic>i</italic>
</sub> be a Bernoulli variable with success probability <italic>&#x3c1;</italic> that indicates whether the <italic>i</italic>-th generated RR set intersects <italic>S</italic>. Considering the sum of all <italic>X</italic>
<sub>
<italic>i</italic>
</sub> over the <italic>&#x3b8;</italic> generated RR sets, we only need to guarantee the following condition:<disp-formula id="e15">
<mml:math id="m57">
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
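<mml:mfrac>
<mml:mrow>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:mfrac>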
<mml:mo>&#x22c5;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x2212;</mml:mo>
<mml:mi mathvariant="double-struck">E</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mfrac linethickness="0.0pt">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(15)</label>
</disp-formula>
<disp-formula id="e16">
<mml:math id="m58">
<mml:mo>&#x21d4;</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfenced open="|" close="|">
<mml:mrow>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo>&#x2211;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2265;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x22c5;</mml:mo>
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b8;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x2264;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mfrac linethickness="0.0pt">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:math>
<label>(16)</label>
</disp-formula>Then, similarly to the analysis in [<xref ref-type="bibr" rid="B22">22</xref>], it can be shown that when <inline-formula id="inf43">
<mml:math id="m59">
<mml:mi>&#x3b8;</mml:mi>
<mml:mo>&#x2265;</mml:mo>
<mml:mspace width="0.3333em"/>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mn>8</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>Q</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mo>&#x22c5;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mfrac linethickness="0.0pt">
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mfenced>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mi>P</mml:mi>
<mml:msubsup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>Q</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>&#x3f5;</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, the RIS-MSTIM method returns a solution with a <italic>1 &#x2212; 1/e &#x2212; &#x3f5;</italic> approximation ratio with probability at least <italic>1 &#x2212; &#x3b4;</italic>.</p>
</statement>
</p>
</sec>
</sec>
<sec id="s6">
<title>6&#x20;Index-Based Solution for MSTIM</title>
<sec id="s6-1">
<title>6.1 Indexing for the Range Query</title>
<p>Since the required query <italic>Q</italic> is holistic, its result <italic>Q(V)</italic> cannot be predicted well in advance. In step 2 of the conceptual algorithm RIS-MSTIM, the query result <italic>T &#x3d; Q(V)</italic> is extracted to support the subsequent sampling steps. However, evaluating <italic>Q</italic> may be expensive because of its multiple selection predicates&#x20;[<xref ref-type="bibr" rid="B25">25</xref>].</p>
<p>One possible solution is to use a sophisticated index, such as an R-Tree [<xref ref-type="bibr" rid="B26">26</xref>] or a k-d Tree [<xref ref-type="bibr" rid="B27">27</xref>], to improve query performance. However, such indexes incur a large storage overhead, and the selection time using the index usually grows with the storage cost, especially when the data distribution is heavily skewed. In the worst case, even a plain scan may be more time- and space-efficient&#x20;[<xref ref-type="bibr" rid="B28">28</xref>].</p>
<p>The evaluation of range queries is a preprocessing step of the whole RIS-MSTIM algorithm, and most of the space budget is expected to be spent on improving the sampling performance, as shown later. Therefore, inspired by the method of [<xref ref-type="bibr" rid="B28">28</xref>], an adaptive indexing method is used here to improve the performance of evaluating range queries.</p>
<p>Two main index structures, resultPool and queryPool, are used to improve the performance of range query evaluation.</p>
<sec id="s6-1-1">
<title>6.1.1 The ResultPool Index Structure</title>
<p>The resultPool index maintains the results of previously processed queries. It serves as a physical cache, so that a query selecting a subset of the result of some previous query can be processed efficiently. Assuming that each query is identified by a unique queryID, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, for each <italic>q</italic>
<sub>
<italic>i</italic>
</sub>, four parts of information are maintained in resultPool.<list list-type="simple">
<list-item>
<p>a) The query statement is the formal representation of the query&#x20;<italic>q</italic>
<sub>
<italic>i</italic>
</sub>
<italic>.</italic>
</p>
</list-item>
<list-item>
<p>b) The query results are the IDs of the nodes in <italic>V</italic> which belong to the set <italic>q</italic>
<sub>
<italic>i</italic>
</sub>
<italic>(V)</italic>, and the nodes are listed in ascending order of their IDs. Furthermore, the node set <italic>V</italic> can be stored in an array indexed by nodeID, so that the attribute values of any node <italic>v</italic> can be accessed in constant time using the ID of&#x20;<italic>v</italic>.</p>
</list-item>
<list-item>
<p>c) The dimension size <italic>k</italic> represents how many predicates are utilized in&#x20;<italic>q</italic>
<sub>
<italic>i</italic>
</sub>.</p>
</list-item>
<list-item>
<p>d) The result size is the size of <italic>q</italic>
<sub>
<italic>i</italic>
</sub>
<italic>(V)</italic>.</p>
</list-item>
</list>
</p>
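<p>The four parts listed above can be pictured as one record per cached query. The following sketch is only an illustration: the class <monospace>ResultPoolEntry</monospace>, the function <monospace>evaluate_within</monospace>, and the encoding of a predicate as an attribute mapped to a closed range are assumptions, not the authors&#x2019; data layout.</p>
<preformat>
```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed encoding: attribute -> (low, high); an equality predicate uses low == high.
Predicates = Dict[str, Tuple[object, object]]


@dataclass
class ResultPoolEntry:
    statement: Predicates                  # (a) formal representation of q_i
    result: List[int]                      # (b) node IDs in q_i(V), ascending
    dimension: int = field(init=False)     # (c) number of predicates in q_i
    result_size: int = field(init=False)   # (d) size of q_i(V)

    def __post_init__(self):
        self.result = sorted(self.result)
        self.dimension = len(self.statement)
        self.result_size = len(self.result)


def evaluate_within(query, cached, nodes):
    """Evaluate `query` by scanning only the cached result of a containing query.

    nodes -- array-like indexed by node ID; each item maps attribute -> value
    """
    hits = []
    for node_id in cached.result:
        attrs = nodes[node_id]  # constant-time access by node ID
        if all(hi >= attrs[a] >= lo for a, (lo, hi) in query.items()):
            hits.append(node_id)
    return ResultPoolEntry(statement=query, result=hits)
```
</preformat>
<p>Scanning the cached result instead of the whole node array is what makes a small cached superset useful; the caller is responsible for passing a cached query that actually contains the new one.</p>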
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Index structure <sans-serif>resultPool</sans-serif> of queries.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g004.tif"/>
</fig>
<p>The queries stored in <sans-serif>resultPool</sans-serif> come from two sources. One consists of the queries processed before, and the other consists of queries maintained in advance; they are denoted by <italic>q</italic>
<sub>
<italic>i</italic>
</sub>
<italic>s</italic> and <inline-formula id="inf44">
<mml:math id="m60">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>s, respectively. Intuitively, since the queries appear in an ad-hoc way, if only <italic>q</italic>
<sub>
<italic>i</italic>
</sub>
<italic>s</italic> are maintained, a totally new query will perform poorly. Therefore, some typical queries that can improve the performance of more general queries are also maintained. The details of how to choose the queries <inline-formula id="inf45">
<mml:math id="m61">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>s will be introduced in the following.</p>
<p>Example 3. Continuing with Example 2, consider the graph shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>. As shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>, the index structure for a specific query history {<italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, q</italic>
<sub>
<italic>2</italic>
</sub>
<italic>, &#x2026; </italic>} contains two parts, <italic>q</italic>
<sub>
<italic>i</italic>
</sub>s and <inline-formula id="inf46">
<mml:math id="m62">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>s. The query <italic>q</italic>
<sub>
<italic>1</italic>
</sub> was requested by users before. According to <sans-serif>
<italic>resultPool</italic>
</sans-serif>, it can be seen that 1) <italic>q</italic>
<sub>
<italic>1</italic>
</sub> is a 2-dimensional range query whose statement is <italic>(A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY) &#x2227; (20 &#x2264; A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2264; 30)</italic>, and 2) the result is <italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>(V) &#x3d; {a}</italic>. The query <inline-formula id="inf47">
<mml:math id="m63">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> need not have been requested by previous users, but its information is still maintained. According to <sans-serif>resultPool</sans-serif>, it can be seen that 1) <inline-formula id="inf48">
<mml:math id="m64">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> is a 1-dimensional range query with statement <italic>0 &#x2264; A</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x2264; 20</italic>, and 2) the result set <inline-formula id="inf49">
<mml:math id="m65">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is of size 1 and only contains the node&#x20;<italic>a</italic>.</p>
</sec>
<sec id="s6-1-2">
<title>6.1.2 The QueryPool Index Structure</title>
<p>
<statement content-type="algorithm" id="alg2">
<label>Algorithm 2</label>
<p>IndexOperations.</p>
<p>
<inline-graphic xlink:href="fphy-09-768181-fx2.tif"/>
</p>
<p>The <sans-serif>resultPool</sans-serif> makes it possible to use previous query answers to accelerate the evaluation of the current query. For this to work, given a specific query <italic>q</italic>, the queries maintained by <sans-serif>resultPool</sans-serif> that are helpful for <italic>q</italic> must also be extracted efficiently. Before introducing the index structure <sans-serif>queryPool</sans-serif>, which provides this ability, the concepts of query covering and redundant queries are explained&#x20;first.</p>
<p>Given two range queries <italic>q</italic>
<sub>
<italic>1</italic>
</sub> and <italic>q</italic>
<sub>
<italic>2</italic>
</sub>, <italic>q</italic>
<sub>
<italic>1</italic>
</sub> is contained by <italic>q</italic>
<sub>
<italic>2</italic>
</sub>, denoted by <italic>q</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x2286; q</italic>
<sub>
<italic>2</italic>
</sub>, if for every data instance <italic>D</italic> there is <italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>(D) &#x2286; q</italic>
<sub>
<italic>2</italic>
</sub>
<italic>(D)</italic>. In particular, if <italic>q</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x2286; q</italic>
<sub>
<italic>2</italic>
</sub> and <italic>q</italic>
<sub>
<italic>2</italic>
</sub> is a 1-dimensional range query, a.k.a. a predicate, we say <italic>q</italic>
<sub>
<italic>2</italic>
</sub> covers <italic>q</italic>
<sub>
<italic>1</italic>
</sub>. For a set of queries {<italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, &#x2026; , q</italic>
<sub>
<italic>n</italic>
</sub>}, if there is <italic>q</italic>
<sub>
<italic>i</italic>
</sub> <italic>&#x2286; q</italic>
<sub>
<italic>j</italic>
</sub> <italic>(i &#x2260; j)</italic>, <italic>q</italic>
<sub>
<italic>j</italic>
</sub> is called redundant in the following.</p>
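<p>For conjunctive range queries, containment can be tested predicate by predicate. The sketch below is a sufficient syntactic test under assumptions that are not stated in the text: every predicate is a closed range encoded as attribute mapped to (low, high), and the function names are hypothetical.</p>
<preformat>
```python
def predicate_covers(outer, inner):
    """True if the range `inner` = (low, high) lies inside the range `outer`."""
    return inner[0] >= outer[0] and outer[1] >= inner[1]


def contained_in(q1, q2):
    """Sufficient test for q1 being contained in q2.

    Every predicate of q2 must be implied by a predicate of q1 on the same
    attribute, so any node selected by q1 is also selected by q2.
    """
    return all(a in q1 and predicate_covers(rng, q1[a]) for a, rng in q2.items())


def redundant_queries(queries):
    """IDs of the queries q_j that contain some other stored query q_i."""
    return sorted(
        j for j, qj in queries.items()
        if any(i != j and contained_in(qi, qj) for i, qi in queries.items())
    )
```
</preformat>
<p>With this encoding, a 1-dimensional query that passes <monospace>contained_in</monospace> as the second argument is a predicate that covers the first query in the sense defined above.</p>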
<p>The <sans-serif>queryPool</sans-serif> index maintains the relations between keys and queries, where a key is a predicate used in the queries. For all queries <italic>q</italic>
<sub>
<italic>i</italic>
</sub>s previously processed, <sans-serif>queryPool</sans-serif> organizes their predicates into groups by dimension size. Each key is assigned a unique keyid. For each key <italic>p</italic>, a qlist maintains the queryIDs of the queries that contain <italic>p</italic>. Moreover, the queryIDs are stored in ascending order, which facilitates searching for queries. The following example explains how to search for related queries using <sans-serif>queryPool</sans-serif>.</p>
<p>Example 4. Continuing with Example 3, the corresponding <sans-serif>queryPool</sans-serif> index is shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>. The rows with <italic>k &#x3d; 2</italic> maintain the information about the 2-dimensional queries previously used. For the predicate with statement <italic>A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY</italic>, its keyid is <italic>k2</italic> and the corresponding <italic>qlist</italic> contains two queries <italic>q</italic>
<sub>
<italic>1</italic>
</sub> and <italic>q</italic>
<sub>
<italic>3</italic>
</sub>, which are sorted in the ascending order. For the predicate <italic>A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208;</italic> [<italic>20, 30</italic>], although it appears in both <italic>q</italic>
<sub>
<italic>1</italic>
</sub> and <italic>q</italic>
<sub>
<italic>5</italic>
</sub>, its corresponding qlist only contains <italic>q</italic>
<sub>
<italic>1</italic>
</sub>, because <italic>q</italic>
<sub>
<italic>1</italic>
</sub> is a 2-dimensional query but <italic>q</italic>
<sub>
<italic>5</italic>
</sub> is not. The related information for the queries <inline-formula id="inf50">
<mml:math id="m66">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>s are stored in the group with <italic>k &#x3d; 0</italic> of <sans-serif>queryPool</sans-serif>.</p>
</statement>
</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Index structures for the range queries.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g005.tif"/>
</fig>
</sec>
<sec id="s6-1-3">
<title>6.1.3 The SearchQueryPool Procedure</title>
<p>Next, the principles of using the two index structures to improve the performance of range query evaluation are introduced.</p>
<p>The <sc>SearchQueryPool</sc> function is shown in <xref ref-type="other" rid="alg2">Algorithm 2</xref>. For a given query <italic>Q</italic>, it returns a query <italic>Q&#x2032;</italic> such that the result <italic>Q(V)</italic> can be obtained efficiently based on <italic>Q&#x2032;(V)</italic>. First, a set candidate is initialized to be empty (line 2), which will be used to store the candidates of <italic>Q&#x2032;</italic>. Then, an iteration from the group with <italic>k &#x3d; &#x7c;Q&#x7c;</italic> to the one with <italic>k &#x3d; 1</italic> is performed to generate the queries in candidate (lines 3&#x2013;21). For each fixed <italic>i &#x2208; </italic>[<italic>1, &#x7c;Q&#x7c;</italic>], a merge-style method is used to extract the queries containing <italic>Q</italic> efficiently by the following steps. (a) The keys covering some predicate of <italic>Q</italic> are collected, and a pointer is initialized at the front of the corresponding qlist for each key found (lines 4&#x2013;7). The details of obtaining the keys covering some predicate will be introduced later. (b) Using the pointers initialized above, the merge-style method counts the occurrences of the current smallest queryID and inserts every query appearing at least <italic>i</italic> times into the candidate set (lines 8&#x2013;19). At the end of each iteration, if the candidate set is not empty, the iterations stop. Finally, after the iterations, if the candidate set is not empty, an arbitrary non-redundant query in candidate is returned (lines 22&#x2013;24). Otherwise, <sans-serif>null</sans-serif> is returned (line&#x20;26).</p>
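<p>Step (b) can be illustrated with a short sketch. It is a simplified reading of the merge-style counting rather than a transcription of Algorithm 2: the name <italic>merge_count</italic> and its arguments are assumptions, each qlist is assumed to be sorted in ascending order without duplicates, and the smallest front element is found by a linear scan instead of a heap.</p>
<preformat>
```python
def merge_count(qlists, need):
    """Return the queryIDs that occur in at least `need` of the sorted qlists."""
    pointers = [0] * len(qlists)
    candidates = []
    while True:
        # Lists whose pointer has not yet run past the end.
        active = [j for j, lst in enumerate(qlists) if len(lst) > pointers[j]]
        if not active:
            break
        smallest = min(qlists[j][pointers[j]] for j in active)
        hits = [j for j in active if qlists[j][pointers[j]] == smallest]
        if len(hits) >= need:
            candidates.append(smallest)
        for j in hits:                  # advance every pointer sitting on `smallest`
            pointers[j] += 1
    return candidates


# Three qlists holding the queryIDs {1, 3}, {2} and {3, 4}; need = 2.
print(merge_count([[1, 3], [2], [3, 4]], 2))   # prints [3]
```
</preformat>
<p>Each queryID is visited once per list that contains it, so the cost of one pass is proportional to the total length of the qlists times the number of lists in this naive version.</p>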
<p>As shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>, to efficiently search the keys covering some predicate, a tree-based search structure can be used for each numerical attribute, which achieves an <italic>O(log&#x2009; n)</italic> search cost, where <italic>n</italic> is the number of distinct values appearing in the query predicates, and a hash table can be used for each categorical attribute, which achieves an expected <italic>O(1)</italic> search cost.<list list-type="simple">
<list-item>
<p>&#x2022; For each numerical attribute <italic>A</italic>, a standard binary search tree is built. The search keys are chosen from the set of all distinct values appearing in the queries. In each leaf node with search key <italic>m</italic>, the keyID of each predicate <italic>q</italic>
<sub>
<italic>A</italic>
</sub> over <italic>A</italic> satisfying <italic>m &#x2208; q</italic>
<sub>
<italic>A</italic>
</sub> is stored. For example, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>, the leaf node with search key 25 includes K4 and K5 whose corresponding predicates in the <sans-serif>
<italic>queryPool</italic>
</sans-serif> are <italic>A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208; [20, 30]</italic> and <italic>A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208; </italic>[<italic>25, 50</italic>] respectively. Then, given a predicate with range <italic>[lbound, rbound]</italic>, the leaf node <italic>lnode</italic> is obtained by finding the smallest search key no less than <italic>lbound</italic>, and the leaf node <italic>rnode</italic> is obtained by finding the largest search key no larger than <italic>rbound</italic>. Finally, the intersection of the two keyID sets associated with <italic>lnode</italic> and <italic>rnode</italic> is returned.</p>
</list-item>
<list-item>
<p>&#x2022; For each categorical attribute <italic>A</italic>, a standard hash map is built by using the values as the hashing keys. Similarly, each item <italic>x</italic> obtained by the hash map is associated with the keyIDs of the predicates in the form <italic>A &#x3d; x</italic>. For example, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>, the hashed item of &#x201c;NY&#x201d; is associated with K2 and K3, which are keyIDs of predicates <italic>A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY</italic> and <italic>A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x2208; </italic>{<italic>NY,&#x20;GA</italic>}.</p>
</list-item>
</list>
</p>
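<p>The two lookups can be sketched as follows, under several assumptions that are not taken from the paper: a sorted array with binary search stands in for the search tree, the function names are hypothetical, and the search keys are only the bounds of the stored predicates. The sketch also rounds outward (largest search key no larger than <italic>lbound</italic>, smallest search key no less than <italic>rbound</italic>); this coincides with the lookup described above whenever both bounds are themselves search keys, and it is chosen here so that no non-covering predicate is returned when they are not.</p>
<preformat>
```python
import bisect


def build_numeric_index(predicates):
    """predicates: dict keyID -> (low, high), a closed range on one attribute."""
    search_keys = sorted({b for rng in predicates.values() for b in rng})
    leaves = {
        m: {kid for kid, (low, high) in predicates.items() if high >= m >= low}
        for m in search_keys
    }
    return search_keys, leaves


def covering_numeric_keys(search_keys, leaves, lbound, rbound):
    """KeyIDs of the stored ranges that contain [lbound, rbound]."""
    left = bisect.bisect_right(search_keys, lbound) - 1   # largest key no larger than lbound
    right = bisect.bisect_left(search_keys, rbound)       # smallest key no less than rbound
    if left == -1 or right == len(search_keys):
        return set()
    return leaves[search_keys[left]].intersection(leaves[search_keys[right]])


def build_category_index(predicates):
    """predicates: dict keyID -> set of allowed values of one attribute."""
    table = {}
    for kid, values in predicates.items():
        for value in values:
            table.setdefault(value, set()).add(kid)
    return table


# K4 and K5 are the A3 ranges, K2 and K3 the A2 predicates of the running example.
search_keys, leaves = build_numeric_index({"K4": (20, 30), "K5": (25, 50)})
categories = build_category_index({"K2": {"NY"}, "K3": {"NY", "GA"}})
print(sorted(leaves[25]))                                            # prints ['K4', 'K5']
print(sorted(covering_numeric_keys(search_keys, leaves, 25, 40)))    # prints ['K5']
print(sorted(categories["NY"]))                                      # prints ['K2', 'K3']
```
</preformat>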
<p>Example 5. Continuing with Example 4, given a query <italic>Q: (A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY) &#x2227; </italic>(<italic>25 &#x2264; A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2264; 40</italic>), the candidate queries which contain <italic>Q</italic> can be found as follows. First, <italic>Q</italic> can be divided into two predicates <italic>q</italic>
<sub>
<italic>1</italic>
</sub>
<italic>: A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208; [25, 40]</italic> and <italic>q</italic>
<sub>
<italic>2</italic>
</sub>
<italic>: A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY</italic>. Then, for the group of keys with <italic>k &#x3d; 2</italic> in the <sans-serif>queryPool</sans-serif> index, as shown in <xref ref-type="fig" rid="F5">Figure&#x20;5B</xref>, <italic>q</italic>
<sub>
<italic>1</italic>
</sub> can be processed in the search tree structure with keys 25 and&#x20;40, and the key set obtained is {<italic>K4,K5</italic>}<italic>&#x2229;</italic>{<italic>K5</italic>}<italic> &#x3d; </italic>{<italic>K5</italic>}. Similarly, the keys obtained for <italic>q</italic>
<sub>
<italic>2</italic>
</sub> are {<italic>K2,K3</italic>} using the hash map structure. After that, the corresponding qlists of {<italic>K2,K3,K5</italic>} in <sans-serif>queryPool</sans-serif> are extracted and merged in the following steps.<disp-formula id="e17">
<mml:math id="m67">
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x21d2;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x21d2;</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:mfenced>
<mml:mfenced open="{" close="}">
</mml:mfenced>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x21d2;</mml:mo>
<mml:mo>{</mml:mo>
<mml:mo>}</mml:mo>
<mml:mo>{</mml:mo>
<mml:mo>}</mml:mo>
<mml:mfenced open="{" close="}">
<mml:mrow>
<mml:munder accentunder="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">&#x332;</mml:mo>
</mml:munder>
</mml:mrow>
</mml:mfenced>
<mml:mo>&#x21d2;</mml:mo>
<mml:mo>{</mml:mo>
<mml:mo>}</mml:mo>
<mml:mo>{</mml:mo>
<mml:mo>}</mml:mo>
<mml:mo>{</mml:mo>
<mml:mo>}</mml:mo>
</mml:math>
<label>(17)</label>
</disp-formula>Here, the underline indicates the position of the related pointer for each list. During the merge procedure, the query <italic>q</italic>
<sub>
<italic>3</italic>
</sub> will be inserted into the candidate set, since in the third step <italic>q</italic>
<sub>
<italic>3</italic>
</sub> appears at the front of two lists. To verify the correctness, it can be found that the statement of <italic>q</italic>
<sub>
<italic>3</italic>
</sub> is <italic>(A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208; </italic>[25, 50]) &#x2227; (<italic>A</italic>
<sub>
<italic>2</italic>
</sub> <italic>&#x3d; NY)</italic> and obviously we have <italic>Q &#x2286;&#x20;q</italic>
<sub>
<italic>3</italic>
</sub>.</p>
</sec>
<sec id="s6-1-4">
<title>6.1.4 Index Based Range Query Evaluation</title>
<p>
<statement content-type="algorithm" id="alg3">
<label>Algorithm 3</label>
<p>I<italic>ndexRQE</italic> (Index Based Range Query Evaluation).</p>
<p>
<inline-graphic xlink:href="fphy-09-768181-fx3.tif"/>
</p>
<p>Now, we will show how to obtain the exact result of a given query based on the techniques shown above. As shown in <xref ref-type="other" rid="alg3">Algorithm 3</xref>, given a <italic>k</italic>-dimensional query <italic>Q</italic>, the function <sc>SearchQueryPool</sc> is first invoked to search for a candidate query <italic>q</italic> that contains the query <italic>Q</italic> (line 2). The structure <italic>qres</italic> is utilized to maintain a superset of <italic>Q(V)</italic> and is initialized to be empty (line 3). If <italic>q</italic> is not <sans-serif>null</sans-serif>, the results of <italic>q</italic> will be collected into <italic>qres</italic> (line 5); otherwise, <italic>qres</italic> will be built based on the queries <inline-formula id="inf51">
<mml:math id="m68">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula>s as follows (line 7&#x2013;12). The selectivity <italic>&#x3b4;</italic>
<sub>
<italic>A</italic>
</sub> of each attribute <italic>A</italic> involved in <italic>Q</italic> is calculated under the assumption that the values of <italic>A</italic> are distributed uniformly at random. Here, <italic>&#x3b4;</italic>
<sub>
<italic>A</italic>
</sub> is defined to be <inline-formula id="inf52">
<mml:math id="m69">
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>r</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>l</mml:mi>
<mml:mi>b</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>D</mml:mi>
<mml:mi>o</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula>, where <italic>lbound</italic> and <italic>rbound</italic> are the bounds of the range of <italic>A</italic> in <italic>Q</italic> and <italic>Dom</italic>
<sub>
<italic>A</italic>
</sub> is the domain size of <italic>A</italic>. Intuitively, <italic>&#x3b4;</italic>
<sub>
<italic>A</italic>
</sub> can tell us how many items can be filtered using the predicate of <italic>A</italic> in <italic>Q</italic>. The attribute <italic>X</italic> with the smallest <italic>&#x3b4;</italic>
<sub>
<italic>X</italic>
</sub> will be chosen (line 8), since it is expected to filter the largest part of the data. Then, the predicates of <italic>X</italic> maintained in the group with <italic>k &#x3d; 0</italic> in <sans-serif>queryPool</sans-serif> will be scanned, and the results of the predicates having intersections with <italic>Q</italic> will be collected together into <italic>qres</italic> (line 9&#x2013;12). Then, the items in <italic>qres</italic> will be checked one by one to obtain <italic>S &#x3d; Q(V)</italic> (line 13&#x2013;16). Finally, if <inline-formula id="inf53">
<mml:math id="m70">
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>q</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> is smaller than a predefined threshold <italic>&#x3b1;</italic>, the query <italic>Q</italic> will be inserted into the index <sans-serif>resultPool</sans-serif> and <sans-serif>queryPool</sans-serif> (line 17&#x2013;18), where the details are omitted here since it can be implemented trivially. Essentially, the value <inline-formula id="inf54">
<mml:math id="m71">
<mml:mfrac>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">&#x7c;</mml:mo>
<mml:mi>q</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">&#x7c;</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
</inline-formula> can represent the ratio of truly useful items in the set <italic>qres</italic>, and a smaller value means that more useless items are collected in the previous&#x20;step.</p>
<p>Example 6. Continuing with Example 5, suppose the query <italic>Q&#x2032;: A</italic>
<sub>
<italic>3</italic>
</sub> <italic>&#x2208; </italic>[<italic>10, 40</italic>] is given. Obviously, there are no predicates covering <italic>Q&#x2032;</italic>; therefore, the function <sc>SearchQueryPool</sc><italic>(Q&#x2032;)</italic> will return <sans-serif>null</sans-serif>. Then, according to the function <sc>IndexRQE</sc>, the results of <inline-formula id="inf55">
<mml:math id="m72">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> and <inline-formula id="inf56">
<mml:math id="m73">
<mml:msubsup>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>&#x2a;</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
</inline-formula> maintained in <sans-serif>resultPool</sans-serif> will be collected into <italic>qres</italic>. Finally, only the nodes <italic>a, c, d, e</italic> and <italic>f</italic> will be checked, but not all nodes in&#x20;<italic>V</italic>.</p>
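<p>The fallback branch of IndexRQE (no containing query found) can be sketched as below. This is a hedged illustration only: it handles numerical range predicates, the function names and the layout of <italic>base_results</italic> (the cached results of the single-attribute queries kept in the group with <italic>k = 0</italic>) are assumptions, and the node data are invented for the example rather than taken from Figure 5.</p>
<preformat>
```python
def selectivity(lbound, rbound, domain_size):
    """delta_A = |rbound - lbound| / |Dom_A| under the uniformity assumption."""
    return abs(rbound - lbound) / domain_size


def index_rqe_fallback(query, domains, base_results, nodes, alpha):
    """query: attribute -> (lbound, rbound); domains: attribute -> domain size;
    base_results: attribute -> list of ((low, high), set of nodeIDs);
    nodes: nodeID -> {attribute: value}.
    Returns (chosen attribute, qres, S, whether Q should be cached)."""
    x = min(query, key=lambda a: selectivity(query[a][0], query[a][1], domains[a]))
    lb, rb = query[x]
    qres = set()
    for (low, high), ids in base_results[x]:
        if high >= lb and rb >= low:            # cached range intersects Q on x
            qres.update(ids)
    s = {v for v in qres
         if all(hi >= nodes[v][a] >= lo for a, (lo, hi) in query.items())}
    should_cache = bool(qres) and alpha > len(s) / len(qres)
    return x, qres, s, should_cache


# Hypothetical data: four nodes with two numerical attributes.
nodes = {"a": {"A3": 12, "A4": 5}, "b": {"A3": 70, "A4": 1},
         "c": {"A3": 35, "A4": 9}, "d": {"A3": 45, "A4": 2}}
domains = {"A3": 100, "A4": 10}
base_results = {
    "A3": [((0, 30), {"a"}), ((31, 60), {"c", "d"}), ((61, 100), {"b"})],
    "A4": [((0, 5), {"a", "b", "d"}), ((6, 10), {"c"})],
}
query = {"A3": (10, 40), "A4": (0, 6)}
x, qres, s, should_cache = index_rqe_fallback(query, domains, base_results, nodes, 0.5)
print(x, sorted(qres), sorted(s), should_cache)   # prints A3 ['a', 'c', 'd'] ['a'] True
```
</preformat>
<p>Here the selectivities are 0.3 for A3 and 0.6 for A4, so A3 is chosen; only three of the four nodes are checked, and since one third of them qualify, which is below the assumed threshold of 0.5, the query would be inserted into the indexes.</p>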
</statement>
</p>
</sec>
</sec>
<sec id="s6-2">
<title>6.2 Indexing for the Random RR Sets</title>
<sec id="s6-2-1">
<title>6.2.1 The Indexing Structures</title>
<p>To maintain the related information of random RR sets, three index structures are utilized, <sans-serif>
<italic>NodeToRR</italic>
</sans-serif>, <sans-serif>
<italic>RRSet</italic>
</sans-serif>, and <sans-serif>
<italic>RRInvertedList</italic>
</sans-serif>.</p>
<p>In the <sans-serif>RRSet</sans-serif>, for each random RR set <italic>rr</italic>
<sub>
<italic>i</italic>
</sub>, there is a unique RRid, and all nodes contained in <italic>rr</italic>
<sub>
<italic>i</italic>
</sub> are stored as nodeList. Within <sans-serif>RRSet</sans-serif>, the items are maintained in the ascending order of RRid. In the <sans-serif>NodeToRR</sans-serif>, for each node <italic>v &#x2208; V</italic>, the nodeID of <italic>v</italic> is stored and the corresponding rrList includes the list of RRids of the random RR sets which are obtained by sampling from node <italic>v</italic>. In the <sans-serif>RRInvertedList</sans-serif>, for each node <italic>v &#x2208; V</italic>, the nodeID of <italic>v</italic> is stored and the corresponding rrCoverList maintains the list of RRids of the random RR sets which cover the node&#x20;<italic>v</italic>
<italic>.</italic>
</p>
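<p>As a rough sketch of how the two derived structures relate to <sans-serif>RRSet</sans-serif>, the following code rebuilds them from the stored node lists. The function name is hypothetical, the first node of each nodeList is assumed to be the node the sample was started from, and the nine RR sets are invented so that they agree with the facts stated in Example 7.</p>
<preformat>
```python
def build_rr_indexes(rr_set):
    """rr_set: dict RRid -> nodeList, where nodeList[0] is assumed to be the root.

    Returns (node_to_rr, rr_inverted): the rrList of each root and the
    rrCoverList of each node, both in ascending order of RRid.
    """
    node_to_rr, rr_inverted = {}, {}
    for rrid in sorted(rr_set):
        node_list = rr_set[rrid]
        node_to_rr.setdefault(node_list[0], []).append(rrid)
        for v in node_list:
            rr_inverted.setdefault(v, []).append(rrid)
    return node_to_rr, rr_inverted


# Hypothetical RRSet contents (only rr1 = {a, b} is given in the text).
rr_set = {1: ["a", "b"], 2: ["a"], 3: ["b", "c"], 4: ["c", "a"], 5: ["d"],
          6: ["e", "d"], 7: ["f", "a"], 8: ["f"], 9: ["f", "e", "a"]}
node_to_rr, rr_inverted = build_rr_indexes(rr_set)
print(node_to_rr["a"])    # prints [1, 2]
print(rr_inverted["a"])   # prints [1, 2, 4, 7, 9]
```
</preformat>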
<p>Example 7. As shown in <xref ref-type="fig" rid="F6">Figure&#x20;6B</xref>, a set of samples is listed in the <sans-serif>RRSet</sans-serif>, where there are 9 samples in total and the first random RR set is <italic>rr</italic>
<sub>
<italic>1</italic>
</sub> <italic>&#x3d; {a, b}</italic>. According to the information in <sans-serif>RRSet</sans-serif>, as shown in <xref ref-type="fig" rid="F6">Figure&#x20;6A</xref>, the <sans-serif>NodeToRR</sans-serif> structure maintains, for each node, the list of samples starting from that node. There are in total two samples <italic>rr</italic>
<sub>
<italic>1</italic>
</sub> and <italic>rr</italic>
<sub>
<italic>2</italic>
</sub> in <sans-serif>RRSet</sans-serif> starting from the node <italic>a</italic>; thus, the corresponding rrList of nodeID <italic>a</italic> includes <italic>rr</italic>
<sub>
<italic>1</italic>
</sub> and <italic>rr</italic>
<sub>
<italic>2</italic>
</sub>. In the <sans-serif>RRInvertedList</sans-serif> structure, the rrCoverList of nodeID <italic>a</italic> includes 5 items <italic>rr</italic>
<sub>
<italic>1</italic>
</sub>
<italic>, rr</italic>
<sub>
<italic>2</italic>
</sub>
<italic>, rr</italic>
<sub>
<italic>4</italic>
</sub>
<italic>, rr</italic>
<sub>
<italic>7</italic>
</sub> and <italic>rr</italic>
<sub>
<italic>9</italic>
</sub>
<italic>,</italic> each of which contains the nodeID <italic>a</italic> in the corresponding nodeList.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Index structures for the random RR&#x20;sets.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g006.tif"/>
</fig>
</sec>
<sec id="s6-2-2">
<title>6.2.2 Adaptive Sampling Using Index</title>
<p>
<statement content-type="algorithm" id="alg4">
<label>Algorithm 4</label>
<p>AdaptiveSampling.</p>
<p>
<inline-graphic xlink:href="fphy-09-768181-fx4.tif"/>
</p>
<p>In this part, we will explain how to sample the random RR sets adaptively with the help of the indexes introduced above. As shown in <xref ref-type="other" rid="alg4">Algorithm 4</xref>, the inputs of <sc>AdaptiveSampling</sc> include the network graph <italic>G</italic>, a target node set <italic>T</italic> and a sampling size <italic>&#x3b8;</italic>, and the output <italic>R</italic> is a sample set of random RR sets. For each node <italic>v &#x2208; T</italic>, a variable <italic>cnt</italic>
<sub>
<italic>v</italic>
</sub> is used to record the number of samples needed starting from <italic>v</italic> and is initialized to 0 (lines 3&#x2013;4). Then, <italic>&#x3b8;</italic> nodes are randomly selected from <italic>T</italic> with replacement. Different from the commonly used sampling methods, this step does not start a random walk but only increases the counter variable <italic>cnt</italic>
<sub>
<italic>v</italic>
</sub> to remember that one sample is needed (lines 5&#x2013;7). After that, the number of samples needed for each node is known, and all that remains is to reuse the samples maintained in <sans-serif>RRSet</sans-serif> to build <italic>R</italic> and to collect more samples adaptively by random walks when there are not enough samples in <sans-serif>RRSet</sans-serif>. The details are as follows. For each node <italic>v &#x2208; T</italic>, if <sans-serif>NodeToRR</sans-serif>
<italic>[v]</italic>.rrList.size() is at least as large as <italic>cnt</italic>
<sub>
<italic>v</italic>
</sub>, the samples maintained in <sans-serif>RRSet</sans-serif> are sufficient and no further real sampling operations are needed (lines 9&#x2013;10). Otherwise, we use <italic>&#x394;</italic>
<sub>
<italic>v</italic>
</sub> to represent the difference between <sans-serif>NodeToRR</sans-serif>
<italic>[v]</italic>.rrList.size() and <italic>cnt</italic>
<sub>
<italic>v</italic>
</sub> (line 12), and <italic>&#x394;</italic>
<sub>
<italic>v</italic>
</sub> random walks will be executed to collect the extra samples needed, which will be inserted into <sans-serif>NodeToRR</sans-serif> at the same time (line 13). After updating the <sans-serif>RRInvertedList</sans-serif> (line 14), those new samples will be inserted into <italic>R</italic> (line 15). Since the <sans-serif>RRInvertedList</sans-serif> is a commonly used inverted list, the update can be implemented in a natural way, and its details are omitted&#x20;here.</p>
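<p>A minimal sketch of this procedure is given below, under stated assumptions: the function names, the callback <italic>sample_rr</italic> (a stand-in for one reverse random walk on <italic>G</italic>), the rule that new RRids continue the numbering 1, 2, &#x2026;, and the choice of reusing the first stored samples of a node are all hypothetical, and the index contents are invented to be consistent with Examples 7 and 8.</p>
<preformat>
```python
import random
from collections import Counter


def draw_counts(targets, theta, rng):
    """Pick theta roots from T with replacement; only count them, no walks yet."""
    return Counter(rng.choice(targets) for _ in range(theta))


def fill_samples(cnt, rr_set, node_to_rr, rr_inverted, sample_rr, rng):
    """Reuse indexed RR sets where possible and sample only the shortfall."""
    result = []
    for v, need in cnt.items():
        have = node_to_rr.setdefault(v, [])
        for _ in range(max(0, need - len(have))):   # delta_v extra random walks
            rrid = len(rr_set) + 1                  # assumes RRids are 1..len(rr_set)
            node_list = sample_rr(v, rng)
            rr_set[rrid] = node_list
            have.append(rrid)
            for u in node_list:                     # keep the inverted list in sync
                rr_inverted.setdefault(u, []).append(rrid)
        result.extend(have[:need])                  # reuse the first `need` samples
    return result


# Hypothetical index contents; the first node of each list is the root.
rr_set = {1: ["a", "b"], 2: ["a"], 3: ["b", "c"], 4: ["c", "a"], 5: ["d"],
          6: ["e", "d"], 7: ["f", "a"], 8: ["f"], 9: ["f", "e", "a"]}
node_to_rr, rr_inverted = {}, {}
for rrid, node_list in rr_set.items():
    node_to_rr.setdefault(node_list[0], []).append(rrid)
    for u in node_list:
        rr_inverted.setdefault(u, []).append(rrid)


def stub_walk(v, rng):
    """Placeholder sampler: a real one would walk the reversed edges of G."""
    return [v]


R = fill_samples({"g": 1, "f": 2}, rr_set, node_to_rr, rr_inverted,
                 stub_walk, random.Random(0))
print(R)   # prints [10, 7, 8]
```
</preformat>
<p>With the counters of Example 8, only one new walk is performed (for node <italic>g</italic>), while the two samples required for node <italic>f</italic> are served from the index.</p>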
<p>Example 8. Let us consider the case that <italic>cnt</italic>
<sub>
<italic>g</italic>
</sub> and <italic>cnt</italic>
<sub>
<italic>f</italic>
</sub> are assigned to be 1 and 2, respectively, when running AdaptiveSampling. As shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>, the information shown in black is the index structures before invoking AdaptiveSampling. Since <italic>cnt</italic>
<sub>
<italic>g</italic>
</sub> <italic>&#x3d; 1</italic> and there are no items with <italic>nodeID &#x3d; g</italic>, one random walk will be performed to obtain a new sample <italic>rr</italic>
<sub>
<italic>10</italic>
</sub>. Then, the parts in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref> shown in red will be added. For the node <italic>f</italic>, since <italic>cnt</italic>
<sub>
<italic>f</italic>
</sub> <italic>&#x3d; 2</italic> and there are three <italic>RRid</italic>s in the rrList of the item with <italic>nodeID &#x3d; f</italic> in <sans-serif>NodeToRR</sans-serif>, no more random walks are needed, and AdaptiveSampling will choose two samples from {<italic>rr</italic>
<sub>
<italic>7</italic>
</sub>
<italic>, rr</italic>
<sub>
<italic>8</italic>
</sub>
<italic>,&#x20;rr</italic>
<sub>
<italic>9</italic>
</sub>}.</p>
</statement>
</p>
</sec>
</sec>
<sec id="s6-3">
<title>6.3 RIS-MSTIM-Index Algorithm</title>
<p>Finally, we introduce the main procedure of the index-based solution for the MSTIM problem. As shown in <xref ref-type="other" rid="alg5">Algorithm 5</xref>, the <sc>RIS-MSTIM-Index</sc> function is the index version of <xref ref-type="other" rid="alg1">Algorithm 1</xref>. Different from <xref ref-type="other" rid="alg1">Algorithm 1</xref>, <sc>RIS-MSTIM-Index</sc> utilizes the <sc>IndexRQE</sc> method to collect the results of <italic>Q(V)</italic> (line 1). Then, during the sampling procedure, <sc>AdaptiveSampling</sc> is invoked to obtain the sample set with enough random RR sets (line 4). After that, to utilize the <sans-serif>RRInvertedList</sans-serif> structure to compute the greedy approximate solution of the optimal <italic>k</italic> seeds, for each node <italic>v</italic> maintained in <sans-serif>RRInvertedList</sans-serif>(nodeID), a list structure <italic>rrTmpList</italic>
<sub>
<italic>v</italic>
</sub> is used to store the random RR sets during the computation (lines 7&#x2013;13). There are <italic>k</italic> rounds in total (lines 7&#x2013;12), each of which chooses the node currently covering the largest number of <italic>RR</italic> sets. Within one round, for each node <italic>v</italic>, the size of its <italic>rrCoverList</italic> measures how many of the remaining RR sets it can cover; therefore, the node with the largest <italic>rrCoverList.size</italic>() is chosen to be the seed node in that round (lines 8, 12). Since the <italic>rrCoverList</italic> needs to be maintained dynamically to show the number of RR sets each node can cover, within each round, if some node <italic>v</italic>
<sub>
<italic>i</italic>
</sub> is chosen, the RR sets it covers should be removed from the <italic>rrCoverList</italic> and maintained temporarily in <italic>rrTmpList</italic> (lines 9&#x2013;11). Finally, before returning the final seed set (line 14), the <sans-serif>RRInvertedList</sans-serif> is restored by merging with <italic>rrTmpList</italic> (line&#x20;13).</p>
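<p>The greedy rounds and the final restoration can be sketched as follows. This is a simplified illustration and not Algorithm 5 itself: the function name is hypothetical, every round rescans all nodes instead of using a priority structure, and the inverted list is assumed to contain only the RR sets of the current sample set. The example data are the hypothetical RR sets that agree with Example 7.</p>
<preformat>
```python
def greedy_seeds(rr_inverted, k):
    """Pick up to k seeds by repeatedly taking the largest rrCoverList.

    rr_inverted: dict nodeID -> list of RRids; it is modified during the
    rounds and restored from the temporary lists before returning.
    """
    tmp = {v: [] for v in rr_inverted}              # rrTmpList of each node
    seeds = []
    for _ in range(k):
        if not rr_inverted:
            break
        best = max(rr_inverted, key=lambda v: len(rr_inverted[v]))
        if not rr_inverted[best]:
            break                                   # every RR set is already covered
        seeds.append(best)
        covered = set(rr_inverted[best])
        for v in list(rr_inverted):
            cover = rr_inverted[v]
            tmp[v].extend(r for r in cover if r in covered)
            rr_inverted[v] = [r for r in cover if r not in covered]
    for v in rr_inverted:                           # restore the index
        rr_inverted[v] = sorted(rr_inverted[v] + tmp[v])
    return seeds


rr_inverted = {"a": [1, 2, 4, 7, 9], "b": [1, 3], "c": [3, 4],
               "d": [5, 6], "e": [6, 9], "f": [7, 8, 9]}
original = {v: list(ids) for v, ids in rr_inverted.items()}
print(greedy_seeds(rr_inverted, 2))   # prints ['a', 'd']
print(rr_inverted == original)        # prints True
```
</preformat>
<p>After node <italic>a</italic> is chosen, the RR sets 1, 2, 4, 7 and 9 are moved to the temporary lists, so node <italic>d</italic>, which still covers two RR sets, wins the second round.</p>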
<p>
<statement content-type="algorithm" id="alg5">
<label>Algorithm 5</label>
<p>RIS-MSTIM-Index.</p>
<p>
<inline-graphic xlink:href="fphy-09-768181-fx5.tif"/>
</p>
</statement>
</p>
</sec>
</sec>
<sec id="s7">
<title>7 Experimental Evaluation</title>
<p>In this part, experiments on real datasets are conducted to evaluate the efficiency and performance of the targeted influence maximization algorithm defined based on multidimensional queries.</p>
<sec id="s7-1">
<title>7.1 Experiment Setup</title>
<p>We ran our experiments on four real datasets, Google, Youtube, Twitter, and WikiVote, which are collected from Konect (<ext-link ext-link-type="uri" xlink:href="http://konect.uni-koblenz.de/">http://konect.uni-koblenz.de/</ext-link>) and SNAP (<ext-link ext-link-type="uri" xlink:href="http://snap.stanford.edu/data/">http://snap.stanford.edu/data/</ext-link>). All of them are social network datasets. The statistics of the four datasets are shown in <xref ref-type="table" rid="T1">Table&#x20;1</xref>. In these four social networks, nodes represent users and edges represent friendships between users. All experiments were executed on a PC with a 3.40&#xa0;GHz Intel Core i7 CPU and 32&#xa0;GB of DDR3 RAM, running Ubuntu&#x20;20.04.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Statistics of graph datasets.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Type</th>
<th align="center">&#x23;Vertices</th>
<th align="center">&#x23;Edges</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">google</td>
<td align="left">Social network</td>
<td align="center">23,628</td>
<td align="center">39,242</td>
</tr>
<tr>
<td align="left">wikivote</td>
<td align="left">Social network</td>
<td align="center">7,115</td>
<td align="center">103,689</td>
</tr>
<tr>
<td align="left">twitter</td>
<td align="left">Social network</td>
<td align="center">465,017</td>
<td align="center">834,797</td>
</tr>
<tr>
<td align="left">youtube</td>
<td align="left">Social network</td>
<td align="center">1,138,499</td>
<td align="center">4,942,297</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We compare two versions of the proposed influence maximization algorithm. RIS denotes the baseline method shown in <xref ref-type="other" rid="alg1">Algorithm 1</xref>, where a trivial method is utilized to obtain the query result first and the commonly used RIS method is then utilized to solve the corresponding targeted influence maximization problem. RIS-MSTIM-Index denotes the index version of <xref ref-type="other" rid="alg1">Algorithm 1</xref>, which utilizes the <sans-serif>resultPool</sans-serif> to improve the performance of query evaluation and the <sans-serif>RRSet</sans-serif> index to save the sampling costs. There have been several sophisticated influence maximization algorithms (such as [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B13">13</xref>,<xref ref-type="bibr" rid="B22">22</xref>,<xref ref-type="bibr" rid="B29">29</xref>]) within the framework of RIS. In this paper, we implement the method in [<xref ref-type="bibr" rid="B13">13</xref>] and use it as the standard RIS algorithm. The method proposed by [<xref ref-type="bibr" rid="B1">1</xref>], denoted by BCT here, is the state of the art. There are two reasons that we choose [<xref ref-type="bibr" rid="B13">13</xref>] as the standard RIS method. First, the method in [<xref ref-type="bibr" rid="B13">13</xref>] is simple and easy to integrate into the framework of selection based targeted influence maximization. Second, the cost of computing seed nodes for targeted influence maximization is highly dominated by the cost of evaluating multi-dimensional queries, so choosing different RIS methods does not produce a significant impact on the total cost, as verified by the following experimental results.</p>
<p>For each dataset, the dimensions and values of the nodes are generated randomly. Each node is associated with 5 categorical dimensions and 10 numerical dimensions, and the corresponding values are selected uniformly at random within the value domain. To evaluate the query based targeted influence maximization algorithm, multidimensional range queries are generated in the following way. For each dataset, 10 groups of queries are generated randomly, and each group contains 50 queries. The dimension number of each query is chosen between 1 and 5 according to a normal distribution, so that most of the queries have three or four predicates. A query is only allowed to contain at most one predicate on each dimension. The range predicates are also randomly generated by controlling the selectivities to be about 20%. For simplicity, we use the number of queries and random RR sets to control the index size. It should be remarked that this controlling method is not accurate, since the size of each query or sample is not known, but it avoids computing the exact index size, which may affect the performance of the main algorithm.</p>
</sec>
<sec id="s7-2">
<title>7.2 Experimental Results and Analysis</title>
<p>Experiments are conducted to verify the efficiency of the proposed influence maximization algorithm from several aspects.</p>
<sec id="s7-2-1">
<title>7.2.1 Efficiency of RIS-MSTIM Algorithm</title>
<p>To study the efficiency of the proposed RIS-MSTIM algorithm, for each real dataset, both the RIS-MSTIM algorithms with and without indexes are executed to solve the targeted influence maximization problem. To measure the performance of RIS-MSTIM fairly when considering different queries, 10 range query groups, each of which contains 50 queries, are generated randomly. For each group of queries, the RIS-MSTIM algorithm is invoked to solve the corresponding targeted influence maximization problem. Within each group, the performance of RIS-MSTIM improves as the number of processed queries increases, assuming that there is enough space to maintain the corresponding indexes. For each setting, the algorithm is run 5&#x20;times and the average time costs are recorded.</p>
<p>As shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, the RIS-MSTIM algorithm is executed with and without indexes over the four datasets, and the average time costs for evaluating each group of generated queries are reported as the seed size <italic>k</italic> is increased from 5 to 80. For the first three datasets, the average time cost over 10 groups of queries is reported, while for the youtube dataset the average over 3 groups of queries is reported, since it takes much longer than the other datasets. Based on these results, we make two observations. First, both the index and no-index versions of RIS-MSTIM scale well as the seed size grows exponentially; note that the seed size is doubled at each step. Second, RIS-MSTIM performs much better with the indexes, which reduce the time costs to about 50% for all datasets. This verifies that the index-based idea for improving the performance of RIS-MSTIM is effective and can be used in rather diverse settings, given that the experiments cover about 500 queries and five different seed sizes. Moreover, since the datasets used here have different characteristics, the results also verify that our index-based method is suitable for different types of&#x20;data.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>The average time cost of the targeted influence maximization algorithm for evaluating the randomly generated query groups over the four datasets.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g007.tif"/>
</fig>
<p>It is also verified that the index-based solution can be applied to other, more sophisticated methods within the RIS framework. As shown in <xref ref-type="fig" rid="F8">Figure&#x20;8</xref>, using the state-of-the-art BCT method in [<xref ref-type="bibr" rid="B1">1</xref>] under the same setting, the average time costs of the BCT method with and without indexes are compared on the wiki-vote and google datasets. It can be observed that the index solution proposed in this paper also improves the performance of the BCT-based method significantly. A second key observation is that, although BCT is better than the standard RIS method, the total time costs of the BCT and RIS methods are nearly the same. The reason is that the cost of performing targeted influence maximization is dominated by the cost of evaluating multidimensional queries.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>The average time cost of the BCT-based targeted influence maximization algorithm for evaluating the randomly generated query groups over two datasets.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g008.tif"/>
</fig>
<p><xref ref-type="fig" rid="F9">Figure&#x20;9</xref> reports the time costs for evaluating each query group with the seed size fixed at 20. <xref ref-type="fig" rid="F9">Figure&#x20;9A</xref> shows the results for the 10 groups of queries over the wiki-vote dataset, where, as discussed above, each group contains 50 queries and the label <italic>n</italic> on the <italic>x</italic>-axis denotes the group id. Similarly, the experimental results over the google dataset are shown in <xref ref-type="fig" rid="F9">Figure&#x20;9B</xref>. It can be observed that, over both datasets, the performance of the RIS-MSTIM method is improved by using indexes for all of the randomly generated query groups.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>The time costs of each query group over the wiki-vote and google datasets.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g009.tif"/>
</fig>
</sec>
<sec id="s7-2-2">
<title>7.2.2 Effectiveness of Adjusting Index Sizes</title>
<p>To evaluate the effect of the index size, the memory used by the indexes is controlled by limiting the total number of queries and random RR sets cached by the indexes. The total size is varied between 1 and 1000&#xa0;K, increasing tenfold at each step. For each dataset, the average time costs of running the targeted influence maximization algorithm on the 10 generated groups of queries are recorded. As shown in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>, the time costs over all four datasets decrease as the index size increases. This is as expected, since a larger index raises the probability that a randomly chosen query shares random RR sets with previous queries. Therefore, the indexes used by the algorithm are an essential part of improving the performance of RIS-MSTIM, and, when memory allows, the threshold controlling the index size should be set as large as possible.</p>
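<p>The count-based control of the index size described above can be illustrated with a small sketch. The class and method names are hypothetical, and two simplifying assumptions are made for illustration only: new items are simply not cached once the budget is exhausted (no eviction), and a cached RR set is treated as reusable for a later query when its root node belongs to that query's target set. The sharing rule actually used by RIS-MSTIM may differ.</p>
<preformat><![CDATA[
```python
class CountBoundedIndex:
    """Caches query results and RR sets up to a fixed total item count."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.query_results = {}   # query key -> frozenset of matching nodes
        self.rr_sets = []         # list of (root node, frozenset of nodes)

    def size(self):
        # The budget counts cached items, not bytes, so it only
        # approximates the memory actually used.
        return len(self.query_results) + len(self.rr_sets)

    def cache_query(self, key, nodes):
        """Cache a query result; return False if the budget is exhausted."""
        if key in self.query_results:
            return True
        if self.size() >= self.max_items:
            return False
        self.query_results[key] = frozenset(nodes)
        return True

    def cache_rr_set(self, root, nodes):
        """Cache one RR set; return False if the budget is exhausted."""
        if self.size() >= self.max_items:
            return False
        self.rr_sets.append((root, frozenset(nodes)))
        return True

    def reusable_rr_sets(self, targets):
        """Cached RR sets whose root lies in the target set (assumed rule)."""
        return [(root, rr) for root, rr in self.rr_sets if root in targets]


index = CountBoundedIndex(max_items=3)
index.cache_query("q1", {1, 2, 3})
index.cache_rr_set(1, {1, 4})
index.cache_rr_set(5, {5, 2})
rejected = not index.cache_rr_set(3, {3, 1})   # a fourth item exceeds the budget
shared = index.reusable_rr_sets({1, 2, 3})
```
]]></preformat>
<p>A larger <sans-serif>max_items</sans-serif> value lets more RR sets survive between queries, which is the effect the experiment in this subsection measures.</p>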
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>The effects of adjusting the index&#x20;sizes.</p>
</caption>
<graphic xlink:href="fphy-09-768181-g010.tif"/>
</fig>
</sec>
<sec id="s7-2-3">
<title>7.2.3 The Space Cost of Indexes</title>
<p>To make the index-based method more general and usable, the space cost of the indexes should be well controlled. Intuitively, if the memory is assumed to be large enough to hold all possible index items, and the maintenance and search costs of the indexes are ignored, the performance of the influence maximization algorithms can only improve as more items are indexed. Although the maintenance and search costs are relatively small compared with the total time cost of RIS-MSTIM, a large amount of space is still needed to store the sampled random RR sets temporarily. If the indexes consume too much space, little space may remain for storing the new samples needed, and the algorithm will perform very poorly for lack of memory. In this part, fixing the index control size at 1000&#xa0;K, we run the RIS-MSTIM algorithm over the four datasets for the parameter settings <italic>k &#x2208; </italic>{5, 10, 20, 40, 80}. Then, the sizes of the indexes for query results and random RR sets are reported. As shown in <xref ref-type="table" rid="T2">Table&#x20;2</xref>, the space cost of RIS-MSTIM with indexes is the largest over the youtube dataset and the smallest over the wiki-vote dataset, which is expected given the observations about the execution time costs. Generally speaking, the space cost increases as the seed size becomes larger. For the google and wiki-vote datasets, when the seed size is small, the space cost of all generated samples is still below the threshold, so the cost labelled <sans-serif>random-RR</sans-serif> is significantly smaller than in the cases with larger <italic>k</italic> values. Moreover, the cost of <sans-serif>query-result</sans-serif> is much smaller than the cost of <sans-serif>random-RR</sans-serif>, which is as expected since each random RR set is essentially a node&#x20;set.</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Space costs of indexes.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Index type</th>
<th align="center">
<italic>k</italic>&#x20;&#x3d; 5</th>
<th align="center">
<italic>k</italic>&#x20;&#x3d; 10</th>
<th align="center">
<italic>k</italic>&#x20;&#x3d; 20</th>
<th align="center">
<italic>k</italic>&#x20;&#x3d; 40</th>
<th align="center">
<italic>k</italic>&#x20;&#x3d; 80</th>
<th align="center">Dataset</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">
<sans-serif>query-result</sans-serif>
</td>
<td align="left">1.54&#xa0;MB</td>
<td align="left">1.96&#xa0;MB</td>
<td align="left">1.54&#xa0;MB</td>
<td align="left">1.68&#xa0;MB</td>
<td align="left">1.69&#xa0;MB</td>
<td rowspan="2" align="left">google</td>
</tr>
<tr>
<td align="left">
<sans-serif>random-RR</sans-serif>
</td>
<td align="left">134&#xa0;MB</td>
<td align="left">163&#xa0;MB</td>
<td align="left">182&#xa0;MB</td>
<td align="left">183&#xa0;MB</td>
<td align="left">184&#xa0;MB</td>
</tr>
<tr>
<td align="left">
<sans-serif>query-result</sans-serif>
</td>
<td align="left">448&#xa0;KB</td>
<td align="left">565&#xa0;KB</td>
<td align="left">639&#xa0;KB</td>
<td align="left">538&#xa0;KB</td>
<td align="left">537&#xa0;KB</td>
<td rowspan="2" align="left">wiki-vote</td>
</tr>
<tr>
<td align="left">
<sans-serif>random-RR</sans-serif>
</td>
<td align="left">29.5&#xa0;MB</td>
<td align="left">64.5&#xa0;MB</td>
<td align="left">164&#xa0;MB</td>
<td align="left">163&#xa0;MB</td>
<td align="left">169&#xa0;MB</td>
</tr>
<tr>
<td align="left">
<sans-serif>query-result</sans-serif>
</td>
<td align="left">101&#xa0;MB</td>
<td align="left">104&#xa0;MB</td>
<td align="left">105&#xa0;MB</td>
<td align="left">106&#xa0;MB</td>
<td align="left">115&#xa0;MB</td>
<td rowspan="2" align="left">twitter</td>
</tr>
<tr>
<td align="left">
<sans-serif>random-RR</sans-serif>
</td>
<td align="left">434&#xa0;MB</td>
<td align="left">437&#xa0;MB</td>
<td align="left">432&#xa0;MB</td>
<td align="left">445&#xa0;MB</td>
<td align="left">434&#xa0;MB</td>
</tr>
<tr>
<td align="left">
<sans-serif>query-result</sans-serif>
</td>
<td align="left">307&#xa0;MB</td>
<td align="left">312&#xa0;MB</td>
<td align="left">310&#xa0;MB</td>
<td align="left">340&#xa0;MB</td>
<td align="left">320&#xa0;MB</td>
<td rowspan="2" align="left">youtube</td>
</tr>
<tr>
<td align="left">
<sans-serif>random-RR</sans-serif>
</td>
<td align="left">6.0&#xa0;GB</td>
<td align="left">5.8&#xa0;GB</td>
<td align="left">5.9&#xa0;GB</td>
<td align="left">5.9&#xa0;GB</td>
<td align="left">6.1&#xa0;GB</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s7-2-4">
<title>7.2.4 Comparison of Influence Obtained</title>
<p>Since the two versions of the RIS-MSTIM algorithm are the same in principle, their effectiveness in solving the targeted influence maximization problem should also be the same. However, the key idea of the index version is to reuse previously obtained samples, so the detailed executions of the two RIS-MSTIM algorithms may differ. In this part, we verify that this difference is small enough to be ignored when considering the quality of the solutions obtained.</p>
<p>Since 50 queries in total are run for each parameter setting, we randomly select one query for each setting, record the seed node sets, and evaluate and compare the corresponding influence obtained. The results are shown in <xref ref-type="table" rid="T3">Table&#x20;3</xref>. It can be observed that there are only tiny differences between the influence values obtained by RIS-MSTIM with and without indexes. This is as expected, since the algorithm is a randomized one that only guarantees a <italic>(1 &#x2212; 1/e &#x2212; &#x3f5;)</italic>-approximate solution with probability at least <italic>(1 &#x2212; &#x3b4;)</italic>. Also, the differences are acceptable for each dataset when considering the total dataset&#x20;sizes.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Influence of the&#x20;seeds.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="center">Seed size</th>
<th align="center">RIS-MSTIM &#x2b; index</th>
<th align="center">RIS-MSTIM</th>
<th align="left">Query ID</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="5" align="left">wiki-vote</td>
<td align="char" char=".">5</td>
<td align="char" char=".">20</td>
<td align="char" char=".">20</td>
<td align="center">
<italic>q</italic>
<sub>26</sub>
</td>
</tr>
<tr>
<td align="char" char=".">10</td>
<td align="char" char=".">22</td>
<td align="char" char=".">22</td>
<td align="center">
<italic>q</italic>
<sub>24</sub>
</td>
</tr>
<tr>
<td align="char" char=".">20</td>
<td align="char" char=".">133</td>
<td align="char" char=".">134</td>
<td align="center">
<italic>q</italic>
<sub>44</sub>
</td>
</tr>
<tr>
<td align="char" char=".">40</td>
<td align="char" char=".">78</td>
<td align="char" char=".">79</td>
<td align="center">
<italic>q</italic>
<sub>17</sub>
</td>
</tr>
<tr>
<td align="char" char=".">80</td>
<td align="char" char=".">195</td>
<td align="char" char=".">193</td>
<td align="center">
<italic>q</italic>
<sub>16</sub>
</td>
</tr>
<tr>
<td rowspan="5" align="left">google</td>
<td align="char" char=".">5</td>
<td align="char" char=".">2302</td>
<td align="char" char=".">2305</td>
<td align="center">
<italic>q</italic>
<sub>21</sub>
</td>
</tr>
<tr>
<td align="char" char=".">10</td>
<td align="char" char=".">3275</td>
<td align="char" char=".">3273</td>
<td align="center">
<italic>q</italic>
<sub>6</sub>
</td>
</tr>
<tr>
<td align="char" char=".">20</td>
<td align="char" char=".">2609</td>
<td align="char" char=".">2618</td>
<td align="center">
<italic>q</italic>
<sub>35</sub>
</td>
</tr>
<tr>
<td align="char" char=".">40</td>
<td align="char" char=".">4050</td>
<td align="char" char=".">4063</td>
<td align="center">
<italic>q</italic>
<sub>45</sub>
</td>
</tr>
<tr>
<td align="char" char=".">80</td>
<td align="char" char=".">4075</td>
<td align="char" char=".">4039</td>
<td align="center">
<italic>q</italic>
<sub>9</sub>
</td>
</tr>
<tr>
<td rowspan="5" align="left">twitter</td>
<td align="char" char=".">5</td>
<td align="char" char=".">17832</td>
<td align="char" char=".">17824</td>
<td align="center">
<italic>q</italic>
<sub>1</sub>
</td>
</tr>
<tr>
<td align="char" char=".">10</td>
<td align="char" char=".">18484</td>
<td align="char" char=".">18456</td>
<td align="center">
<italic>q</italic>
<sub>18</sub>
</td>
</tr>
<tr>
<td align="char" char=".">20</td>
<td align="char" char=".">3856</td>
<td align="char" char=".">3883</td>
<td align="center">
<italic>q</italic>
<sub>14</sub>
</td>
</tr>
<tr>
<td align="char" char=".">40</td>
<td align="char" char=".">53494</td>
<td align="char" char=".">53430</td>
<td align="center">
<italic>q</italic>
<sub>1</sub>
</td>
</tr>
<tr>
<td align="char" char=".">80</td>
<td align="char" char=".">23690</td>
<td align="char" char=".">23708</td>
<td align="center">
<italic>q</italic>
<sub>47</sub>
</td>
</tr>
<tr>
<td rowspan="5" align="left">youtube</td>
<td align="char" char=".">5</td>
<td align="char" char=".">4833</td>
<td align="char" char=".">4824</td>
<td align="center">
<italic>q</italic>
<sub>9</sub>
</td>
</tr>
<tr>
<td align="char" char=".">10</td>
<td align="char" char=".">11246</td>
<td align="char" char=".">11263</td>
<td align="center">
<italic>q</italic>
<sub>26</sub>
</td>
</tr>
<tr>
<td align="char" char=".">20</td>
<td align="char" char=".">6038</td>
<td align="char" char=".">6097</td>
<td align="center">
<italic>q</italic>
<sub>32</sub>
</td>
</tr>
<tr>
<td align="char" char=".">40</td>
<td align="char" char=".">24179</td>
<td align="char" char=".">24185</td>
<td align="center">
<italic>q</italic>
<sub>4</sub>
</td>
</tr>
<tr>
<td align="char" char=".">80</td>
<td align="char" char=".">8816</td>
<td align="char" char=".">8834</td>
<td align="center">
<italic>q</italic>
<sub>4</sub>
</td>
</tr>
</tbody>
</table>
</table-wrap>
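<p>The size of the differences in <xref ref-type="table" rid="T3">Table&#x20;3</xref> can be quantified with a short calculation. The sketch below copies the influence values from the table and computes, per dataset, the largest relative gap between the two versions. The choice of the no-index value as the denominator and the names used are illustrative assumptions, not part of the original evaluation.</p>
<preformat><![CDATA[
```python
# (seed size, influence with index, influence without index), from Table 3.
TABLE_3 = {
    "wiki-vote": [(5, 20, 20), (10, 22, 22), (20, 133, 134),
                  (40, 78, 79), (80, 195, 193)],
    "google": [(5, 2302, 2305), (10, 3275, 3273), (20, 2609, 2618),
               (40, 4050, 4063), (80, 4075, 4039)],
    "twitter": [(5, 17832, 17824), (10, 18484, 18456), (20, 3856, 3883),
                (40, 53494, 53430), (80, 23690, 23708)],
    "youtube": [(5, 4833, 4824), (10, 11246, 11263), (20, 6038, 6097),
                (40, 24179, 24185), (80, 8816, 8834)],
}


def relative_gap(with_index, without_index):
    """Absolute difference relative to the no-index influence value."""
    return abs(with_index - without_index) / without_index


# Largest relative gap per dataset over the five seed sizes.
max_gap = {
    name: max(relative_gap(a, b) for _, a, b in rows)
    for name, rows in TABLE_3.items()
}

for name, gap in max_gap.items():
    print(f"{name}: largest relative gap = {gap:.2%}")
```
]]></preformat>
<p>With the values in the table, every relative gap computed this way is below 1.3%.</p>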
</sec>
</sec>
</sec>
<sec id="s8">
<title>8 Conclusion</title>
<p>In this paper, the problem of multidimensional selection based Targeted Influence Maximization (MSTIM) is studied. The MSTIM problem is shown to be NP-hard even when the target set is rather small. The RIS framework is extended to the MSTIM case and, based on a careful analysis of the sampling size, it is shown that the MSTIM problem admits a <italic>(1 &#x2212; 1/e &#x2212; &#x3f5;)</italic>-approximation algorithm based on reverse influence sampling. To answer the MSTIM problem efficiently, an index-based solution is proposed. To improve the performance of evaluating multi-selection queries, an inverted-list-style index for query predicates is presented, and an efficient index-based query evaluation method is developed. To improve the performance of the sampling procedure, an adaptive index-based sampling strategy that shares samples as much as possible is introduced, and the corresponding influence maximization algorithm is designed. The experimental results show that the proposed method is efficient.</p>
</sec>
</body>
<back>
<sec id="s9">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. These data can be found here: <ext-link ext-link-type="uri" xlink:href="http://snap.stanford.edu/data/">http://snap.stanford.edu/data/</ext-link> and <ext-link ext-link-type="uri" xlink:href="http://konect.uni-koblenz.de/">http://konect.uni-koblenz.de/</ext-link>.</p>
</sec>
<sec id="s10">
<title>Author Contributions</title>
<p>DJ completed the main work and updated the manuscript. TL proposed the main idea of the key method, designed the study, and helped to draft the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec id="s11">
<title>Funding</title>
<p>This work is supported by the National Key Research and Development Program of China (2018AAA0101901) and the National Natural Science Foundation of China (NSFC) <italic>via</italic> Grants 61976073 and 61702137.</p>
</sec>
<sec sec-type="COI-statement" id="s12">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s13">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nguyen</surname>
<given-names>HT</given-names>
</name>
<name>
<surname>Thai</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Dinh</surname>
<given-names>TN</given-names>
</name>
</person-group>. <article-title>A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing</article-title>. <source>IEEE/ACM Trans Networking</source> (<year>2017</year>) <volume>25</volume>:<fpage>2419</fpage>&#x2013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1109/tnet.2017.2691544</pub-id> </citation>
</ref>
<ref id="B2">
<label>2.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Epasto</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mahmoody</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Upfal</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Real-time Targeted-Influence Queries over Large Graphs</article-title>. In: <conf-name>Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017</conf-name>; <conf-date>July 31 - August 03, 2017</conf-date>; <conf-loc>Sydney, Australia</conf-loc> (<year>2017</year>). p. <fpage>224</fpage>&#x2013;<lpage>31</lpage>. <pub-id pub-id-type="doi">10.1145/3110025.3110105</pub-id> </citation>
</ref>
<ref id="B3">
<label>3.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K-L</given-names>
</name>
</person-group>. <article-title>Real-time Targeted Influence Maximization for Online Advertisements</article-title>. <source>Proc VLDB Endow</source> (<year>2015</year>) <volume>8</volume>:<fpage>1070</fpage>&#x2013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.14778/2794367.2794376</pub-id> </citation>
</ref>
<ref id="B4">
<label>4.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Domingos</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>M</given-names>
</name>
</person-group>. <article-title>Mining the Network Value of Customers</article-title>. In: <conf-name>KDD &#x2019;01</conf-name> (<year>2001</year>). p. <fpage>57</fpage>&#x2013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1145/502512.502525</pub-id> </citation>
</ref>
<ref id="B5">
<label>5.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Richardson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Domingos</surname>
<given-names>P</given-names>
</name>
</person-group>. <article-title>Mining Knowledge-Sharing Sites for Viral Marketing</article-title>. In: <conf-name>Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2002</year>). p. <fpage>61</fpage>&#x2013;<lpage>70</lpage>. <comment>KDD &#x2019;02</comment>. <pub-id pub-id-type="doi">10.1145/775047.775057</pub-id> </citation>
</ref>
<ref id="B6">
<label>6.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kempe</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kleinberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tardos</surname>
<given-names>E</given-names>
</name>
</person-group>. <article-title>Maximizing the Spread of Influence through a Social Network</article-title>. In: <conf-name>Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2003</year>). p. <fpage>137</fpage>&#x2013;<lpage>46</lpage>. <comment>KDD &#x2019;03</comment>. <pub-id pub-id-type="doi">10.1145/956750.956769</pub-id> </citation>
</ref>
<ref id="B7">
<label>7.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks</article-title>. In: <conf-name>KDD &#x2019;10</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2010</year>). p. <fpage>1029</fpage>&#x2013;<lpage>38</lpage>. <pub-id pub-id-type="doi">10.1145/1835804.1835934</pub-id> </citation>
</ref>
<ref id="B8">
<label>8.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Leskovec</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Krause</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Guestrin</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Faloutsos</surname>
<given-names>C</given-names>
</name>
<name>
<surname>VanBriesen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Glance</surname>
<given-names>N</given-names>
</name>
</person-group>. <article-title>Cost-effective Outbreak Detection in Networks</article-title>. In: <conf-name>Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2007</year>). p. <fpage>420</fpage>&#x2013;<lpage>9</lpage>. <comment>KDD &#x2019;07</comment>. <pub-id pub-id-type="doi">10.1145/1281192.1281239</pub-id> </citation>
</ref>
<ref id="B9">
<label>9.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Goyal</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lakshmanan</surname>
<given-names>LVS</given-names>
</name>
</person-group>. <article-title>Simpath: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model</article-title>. In: <conf-name>ICDM &#x2019;11</conf-name>; <conf-loc>Washington, DC, USA</conf-loc> (<year>2011</year>). p. <fpage>211</fpage>&#x2013;<lpage>20</lpage>. </citation>
</ref>
<ref id="B10">
<label>10.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Efficient Influence Maximization in Social Networks</article-title>. In: <conf-name>Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2009</year>). p. <fpage>199</fpage>&#x2013;<lpage>208</lpage>. <comment>KDD &#x2019;09</comment>. <pub-id pub-id-type="doi">10.1145/1557019.1557047</pub-id> </citation>
</ref>
<ref id="B11">
<label>11.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Y-C</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>W-C</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S-Y</given-names>
</name>
</person-group>. <article-title>Efficient Algorithms for Influence Maximization in Social Networks</article-title>. <source>Knowl Inf Syst</source> (<year>2012</year>) <volume>33</volume>:<fpage>577</fpage>&#x2013;<lpage>601</lpage>. <pub-id pub-id-type="doi">10.1007/s10115-012-0540-7</pub-id> </citation>
</ref>
<ref id="B12">
<label>12.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Cong</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Si</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>K</given-names>
</name>
</person-group>. <article-title>Simulated Annealing Based Influence Maximization in Social Networks</article-title>. In: <conf-name>Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence</conf-name>. <publisher-loc>Palo Alto, CA</publisher-loc>: <publisher-name>AAAI Press</publisher-name> (<year>2011</year>). p. <fpage>127</fpage>&#x2013;<lpage>32</lpage>. <comment>AAAI&#x2019;11</comment>. </citation>
</ref>
<ref id="B13">
<label>13.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>X</given-names>
</name>
</person-group>. <article-title>Influence Maximization in Near-Linear Time</article-title>. In: <person-group person-group-type="editor">
<name>
<surname>Sellis</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Ives</surname>
<given-names>ZG</given-names>
</name>
</person-group>, editors. <conf-name>Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne</conf-name>; <conf-date>May 31 - June 4, 2015</conf-date>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2015</year>). p. <fpage>1539</fpage>&#x2013;<lpage>54</lpage>. <pub-id pub-id-type="doi">10.1145/2723372.2723734</pub-id> </citation>
</ref>
<ref id="B14">
<label>14.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bevilacqua</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lakshmanan</surname>
<given-names>LVS</given-names>
</name>
</person-group>. <article-title>Revisiting the Stop-And-Stare Algorithms for Influence Maximization</article-title>. <source>Proc VLDB Endow</source> (<year>2017</year>) <volume>10</volume>:<fpage>913</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.14778/3099622.3099623</pub-id> </citation>
</ref>
<ref id="B15">
<label>15.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>A</given-names>
</name>
<etal/>
</person-group> <article-title>Efficient Approximation Algorithms for Adaptive Influence Maximization</article-title>. <source>VLDB J</source> (<year>2020</year>) <volume>29</volume>:<fpage>1385</fpage>&#x2013;<lpage>406</lpage>. <pub-id pub-id-type="doi">10.1007/s00778-020-00615-8</pub-id> </citation>
</ref>
<ref id="B16">
<label>16.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Arora</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Galhotra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ranu</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Influence Maximization Revisited: The State of the Art and the Gaps that Remain</article-title>. In: <person-group person-group-type="editor">
<name>
<surname>Herschel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Galhardas</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Reinwald</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Fundulaki</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Binnig</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kaoudi</surname>
<given-names>Z</given-names>
</name>
</person-group>, editors. <conf-name>Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019</conf-name>; <conf-date>March 26-29, 2019</conf-date>; <conf-loc>Lisbon, Portugal</conf-loc>. <publisher-loc>Konstanz, Germany</publisher-loc>: <publisher-name>OpenProceedings.org</publisher-name> (<year>2019</year>). p. <fpage>440</fpage>&#x2013;<lpage>3</lpage>. <pub-id pub-id-type="doi">10.5441/002/edbt.2019.40</pub-id> </citation>
</ref>
<ref id="B17">
<label>17.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Arora</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Galhotra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ranu</surname>
<given-names>S</given-names>
</name>
</person-group>. <article-title>Debunking the Myths of Influence Maximization</article-title>. In: <person-group person-group-type="editor">
<name>
<surname>Salihoglu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chirkova</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Suciu</surname>
<given-names>D</given-names>
</name>
</person-group>, editors. <conf-name>Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017</conf-name>; <conf-date>May 14-19, 2017</conf-date>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2017</year>). p. <fpage>651</fpage>&#x2013;<lpage>66</lpage>. <pub-id pub-id-type="doi">10.1145/3035918.3035924</pub-id> </citation>
</ref>
<ref id="B18">
<label>18.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
</person-group>. <article-title>Efficient Location-Aware Influence Maximization</article-title>. In: <conf-name>Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data</conf-name>; <publisher-loc>New York, USA</publisher-loc>: <publisher-name>ACM</publisher-name> (<year>2014</year>). p. <fpage>87</fpage>&#x2013;<lpage>98</lpage>. <comment>SIGMOD &#x2019;14</comment>. <pub-id pub-id-type="doi">10.1145/2588555.2588561</pub-id> </citation>
</ref>
<ref id="B19">
<label>19.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>G</given-names>
</name>
</person-group>. <article-title>Relationship Classification in Large Scale Online Social Networks and its Impact on Information Propagation</article-title>. In: <conf-name>INFOCOM 2011</conf-name> (<year>2011</year>). p. <fpage>2291</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1109/infcom.2011.5935046</pub-id> </citation>
</ref>
<ref id="B20">
<label>20.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K-L</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>Online Topic-Aware Influence Maximization</article-title>. <source>Proc VLDB Endow</source> (<year>2015</year>) <volume>8</volume>:<fpage>666</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.14778/2735703.2735706</pub-id> </citation>
</ref>
<ref id="B21">
<label>21.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Mo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>G</given-names>
</name>
<etal/>
</person-group> <article-title>StructInf: Mining Structural Influence from Social Streams</article-title>. In: <conf-name>AAAI&#x2019;17</conf-name> (<year>2017</year>). p. <fpage>73</fpage>&#x2013;<lpage>80</lpage>. </citation>
</ref>
<ref id="B22">
<label>22.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
</person-group>. <article-title>Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency</article-title>. In: <conf-name>SIGMOD 2014</conf-name> (<year>2014</year>). p. <fpage>75</fpage>&#x2013;<lpage>86</lpage>. </citation>
</ref>
<ref id="B23">
<label>23.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khuller</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Moss</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Naor</surname>
<given-names>J</given-names>
</name>
</person-group>. <article-title>The Budgeted Maximum Coverage Problem</article-title>. <source>Inf Process Lett</source> (<year>1999</year>) <volume>70</volume>:<fpage>39</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1016/s0020-0190(99)00031-9</pub-id> </citation>
</ref>
<ref id="B24">
<label>24.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K-L</given-names>
</name>
</person-group>. <article-title>Influence Maximization on Social Graphs: A Survey</article-title>. <source>IEEE Trans Knowl Data Eng</source> (<year>2018</year>) <volume>30</volume>:<fpage>1852</fpage>&#x2013;<lpage>72</lpage>. <pub-id pub-id-type="doi">10.1109/tkde.2018.2807843</pub-id> </citation>
</ref>
<ref id="B25">
<label>25.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gaede</surname>
<given-names>V</given-names>
</name>
<name>
<surname>G&#xfc;nther</surname>
<given-names>O</given-names>
</name>
</person-group>. <article-title>Multidimensional Access Methods</article-title>. <source>ACM Comput Surv</source> (<year>1998</year>) <volume>30</volume>:<fpage>170</fpage>&#x2013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1145/280277.280279</pub-id> </citation>
</ref>
<ref id="B26">
<label>26.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Guttman</surname>
<given-names>A</given-names>
</name>
</person-group>. <article-title>R-trees: A Dynamic Index Structure for Spatial Searching</article-title>. In: <conf-name>SIGMOD&#x2019;84, Proceedings of Annual Meeting</conf-name>; <conf-date>June 18-21, 1984</conf-date>; <publisher-loc>Boston, Massachusetts, USA</publisher-loc> (<year>1984</year>). p. <fpage>47</fpage>&#x2013;<lpage>57</lpage>. </citation>
</ref>
<ref id="B27">
<label>27.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bentley</surname>
<given-names>JL</given-names>
</name>
</person-group>. <article-title>Multidimensional Binary Search Trees Used for Associative Searching</article-title>. <source>Commun ACM</source> (<year>1975</year>) <volume>18</volume>:<fpage>509</fpage>&#x2013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1145/361002.361007</pub-id> </citation>
</ref>
<ref id="B28">
<label>28.</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lamb</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fuller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Varadarajan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Vandiver</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Doshi</surname>
<given-names>L</given-names>
</name>
<etal/>
</person-group> <article-title>The Vertica Analytic Database</article-title>. <source>Proc VLDB Endow</source> (<year>2012</year>) <volume>5</volume>:<fpage>1790</fpage>&#x2013;<lpage>801</lpage>. <pub-id pub-id-type="doi">10.14778/2367502.2367518</pub-id> </citation>
</ref>
<ref id="B29">
<label>29.</label>
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Borgs</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Brautbar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chayes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lucier</surname>
<given-names>B</given-names>
</name>
</person-group>. <article-title>Maximizing Social Influence in Nearly Optimal Time</article-title>. In: <conf-name>Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms</conf-name>; <publisher-loc>Philadelphia, PA, USA</publisher-loc>: <publisher-name>Society for Industrial and Applied Mathematics</publisher-name> (<year>2014</year>). p. <fpage>946</fpage>&#x2013;<lpage>57</lpage>. <comment>SODA &#x2019;14</comment>. <pub-id pub-id-type="doi">10.1137/1.9781611973402.70</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>