<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Energy Res.</journal-id>
<journal-title>Frontiers in Energy Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Energy Res.</abbrev-journal-title>
<issn pub-type="epub">2296-598X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">848754</article-id>
<article-id pub-id-type="doi">10.3389/fenrg.2022.848754</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Energy Research</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Multi-Scale Video Flame Detection for Early Fire Warning Based on Deep Learning</article-title>
<alt-title alt-title-type="left-running-head">Dai et&#x20;al.</alt-title>
<alt-title alt-title-type="right-running-head">Flame-Detection Based on Deep Learning</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Dai</surname>
<given-names>Peiwen</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1613646/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Zhang</surname>
<given-names>Qixing</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1615254/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lin</surname>
<given-names>Gaohua</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shafique</surname>
<given-names>Muhammad Masoom</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Huo</surname>
<given-names>Yinuo</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1623233/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tu</surname>
<given-names>Ran</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Yongming</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>State Key Laboratory of Fire Science</institution>, <institution>University of Science and Technology of China</institution>, <addr-line>Hefei</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>College of Mechanical Engineering and Automation</institution>, <institution>Huaqiao University</institution>, <addr-line>Xiamen</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1538236/overview">Weiguang An</ext-link>, China University of Mining and Technology, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1639514/overview">Xueming Shu</ext-link>, Tsinghua University, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1133529/overview">Chenqiang Gao</ext-link>, Chongqing University of Posts and Telecommunications, China</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Qixing Zhang, <email>qixing@ustc.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Sustainable Energy Systems and Policies, a section of the journal Frontiers in Energy Research</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>03</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>10</volume>
<elocation-id>848754</elocation-id>
<history>
<date date-type="received">
<day>05</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>02</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Dai, Zhang, Lin, Shafique, Huo, Tu and Zhang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Dai, Zhang, Lin, Shafique, Huo, Tu and Zhang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>The widespread use of renewable energy resources requires more immediate and effective fire alarms as a preventive measure. A fire is usually weak in its initial stage, which hinders detection and identification. This paper addresses that problem with a flame detection algorithm that is more sensitive to small flames. Based on Yolov3, the parallel convolution structure of Inception is used to obtain multi-size image information. In addition, the receptive field of the convolution kernel is increased with dilated convolution so that each convolution output contains a larger range of information, avoiding the omission of tiny flames. Model accuracy is improved by introducing a Feature Pyramid Network in the feature extraction stage, which enhances the feature fusion capability of the model. At the same time, a flame detection database for early fire has been established, which contains more than 30 fire scenarios and is suitable for flame detection under various challenging scenes. Experiments validate that the proposed method not only improves the performance of the original algorithm but is also advantageous in comparison with other state-of-the-art object detection networks; its false positive rate reaches 1.2% on the test&#x20;set.</p>
</abstract>
<kwd-group>
<kwd>flame detection</kwd>
<kwd>video fire detection</kwd>
<kwd>multi-scale</kwd>
<kwd>deep learning</kwd>
<kwd>early fire</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Renewable energy sources are playing an increasingly important role in industry (<xref ref-type="bibr" rid="B24">Qazi et&#x20;al., 2019</xref>), so their safety problems are attracting widespread attention. There are numerous studies on hydrogen safety, lithium-ion battery safety, and photovoltaic safety (<xref ref-type="bibr" rid="B37">Yang et&#x20;al., 2018</xref>; <xref ref-type="bibr" rid="B23">Ould Ely et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B1">Abohamzeh et&#x20;al., 2021</xref>; <xref ref-type="bibr" rid="B8">Fang et&#x20;al., 2021</xref>), but few studies address efficient flame detection in fire accidents involving renewable energy sources. Because of the unique characteristics of renewable energy resources, their fire situations are complicated, and both immediacy in fire detection and accuracy of fire alarms are necessary to reduce fire hazards. Traditional fire detection technologies detect fire according to its characteristic signals, such as temperature, combustion gases, and aerosols (<xref ref-type="bibr" rid="B35">Xu, 2020</xref>). However, such characteristic signals weaken gradually as they propagate, so traditional contact detectors are restricted by the height and area of the detection space. With the development of digital image processing, video fire detection technology has been proposed and researched. Because it does not depend on contact with characteristic parameters, video fire detection has become more advantageous in the fire detection domain, offering fast response, visualization, and a broader detection space. Vision-based fire detection systems can therefore play a decisive role in flame detection for renewable energy sources.</p>
<p>The traditional video fire detection method is based on the characteristics of the flame. The static characteristics of the flame include color, shape, number of sharp angles, and circularity, while the dynamic characteristics include flicker frequency and flame-area change rate. Yamagishi et&#x20;al. (<xref ref-type="bibr" rid="B36">Yamagishi and Yamaguchi, 1999</xref>) innovatively processed the HSV color space and extracted the flame area by exploiting the changing characteristics of color and saturation in the flame region. Borges and Izquierdo (<xref ref-type="bibr" rid="B3">Borges and Izquierdo, 2010</xref>) realized fire detection with a Bayes classifier using changes in the shape, boundary, and area of the flame region along with other additional features. Dimitropoulos and Barmpoutis (<xref ref-type="bibr" rid="B5">Dimitropoulos et&#x20;al., 2014</xref>) proposed a fire detection method based on multi-feature extraction, which simultaneously established a fire model based on flame flicker, dynamic texture, color, and spatiotemporal energy features. However, traditional video flame detection methods based on flame features have their limitations. These algorithms mostly use static images, lack dynamic feature extraction, and are susceptible to interference from shadows, brightness, energy, and other factors. Their false positive rate is high, and their detection sensitivity is overly dependent on the algorithm parameters.</p>
<p>Since the rise of deep learning in 2012 (<xref ref-type="bibr" rid="B10">Ghali et&#x20;al., 2020</xref>), it has made outstanding achievements in image classification and object detection, causing a new upsurge in the fields of artificial intelligence and computer vision. Among deep learning models, the convolutional neural network is the most outstanding at image data processing. A convolutional neural network (CNN) is a feedforward neural network with a deep structure that includes convolution computation (<xref ref-type="bibr" rid="B16">Lecun et&#x20;al., 1998</xref>), and it is a research hotspot in the fields of semantic analysis and image recognition. A CNN has a weight-sharing network structure similar to a biological neural network, which reduces the complexity of the network model by reducing the number of weights; this not only reduces the training parameters but also greatly improves the training&#x20;speed.</p>
<p>As a branch of computer vision, video fire detection has also begun to introduce deep learning. Frizzi (<xref ref-type="bibr" rid="B9">Frizzi et&#x20;al., 2016</xref>) used a 9-layer convolutional neural network to extract features from images and realized the classification of smoke and flame through a sliding-window search, which is very fast. Compared with traditional video fire detection methods, this method has better classification performance, indicating that using CNNs to detect fire in video is promising. Young-Jin Kim (<xref ref-type="bibr" rid="B38">Young-Jin and Eun-Gyung, 2017</xref>) applied Faster RCNN to flame detection, and Shen (<xref ref-type="bibr" rid="B28">Shen et&#x20;al., 2018</xref>) simplified the Yolo (You Only Look Once) network for flame detection; both achieved good results, indicating that flame detection methods based on deep learning outperform traditional video fire detection methods.</p>
<p>The early stage of a fire is the best stage at which to extinguish it, so fire detection and alarm at this stage are particularly important. However, the early flame of a fire is weak and therefore easily missed by a detection model. To solve this problem, this paper proposes a fire detection and identification method based on an improved Yolov3. Yolov3 (You Only Look Once v3) is an excellent object detector with good performance in both accuracy and speed. Building on it, we introduce multi-scale convolution and dilated convolution into the backbone network to improve its ability to identify flames at different scales, and thereby small objects. At the same time, the idea of FPN (Feature Pyramid Networks) is used to improve the feature extraction network of Yolov3. The proposed method strengthens feature fusion and reuses high-level features to improve accuracy. In addition, this paper establishes a flame database for early fires covering a variety of fire scenarios, laying a foundation for future flame detection research.</p>
</sec>
<sec id="s2">
<title>Related Work</title>
<p>There are many applications of deep learning methods in flame detection. Some studies combine the traditional video flame detection method with deep learning, first carrying out feature extraction and then using a convolutional neural network for recognition. Chen et&#x20;al. (<xref ref-type="bibr" rid="B41">Zhong et&#x20;al., 2020</xref>) designed a flame detection method based on a multi-channel convolutional neural network: the OTSU algorithm was used to extract the flame color contour and dynamic features, and the three features were then input into a three-channel convolutional neural network for detection and recognition. Compared with traditional methods, the accuracy is improved, but training on specific features raises over-fitting problems. Otabek Khudayberdiev et&#x20;al. (<xref ref-type="bibr" rid="B14">Khudayberdiev and Butt, 2020</xref>) combined PCA (principal component analysis) with a CNN: PCA extracted data features, and the CNN performed detection and classification. MobileNet was selected as the backbone to reduce the model size, but accuracy was lacking.</p>
<p>Some researchers choose to carry out transfer learning, applying pre-trained deep CNN architectures to the development of fire detection systems. Mohit et&#x20;al. (<xref ref-type="bibr" rid="B6">Dua et&#x20;al., 2020</xref>) argued that carrying out flame detection with balanced data sets, as in traditional CNN work, does not reflect the real situation, so they proposed using unbalanced data sets containing more non-fire pictures. They used two models, VGG (Visual Geometry Group) and MobileNet, for flame detection, and the experimental results were superior to the traditional CNN method. Jivitesh (<xref ref-type="bibr" rid="B27">Sharma et&#x20;al., 2017</xref>) also trained on unbalanced data sets, and in his experiments Resnet50 outperformed VGG16.</p>
<p>Some researchers believe that the detection of static frames has its limitations, so deep learning methods are considered for identifying the dynamic characteristics of flames in video. Lin et&#x20;al. (<xref ref-type="bibr" rid="B18">Lin et&#x20;al., 2019</xref>) proposed a joint detection framework based on Faster RCNN (Faster Regions with CNN features) (<xref ref-type="bibr" rid="B26">Ren et&#x20;al., 2016</xref>) and 3D CNN (<xref ref-type="bibr" rid="B32">Tran et&#x20;al., 2015</xref>), where the RCNN is mainly used to select the suspected fire area for preliminary identification, while the 3D CNN is used to extract temporal information and combine the static and temporal features of smoke. Kim et&#x20;al. (<xref ref-type="bibr" rid="B15">Kim and Lee, 2019</xref>) first used Faster RCNN to detect the suspected fire area, and then used an LSTM (Long Short-Term Memory) network to judge whether there was a flame from the spatiotemporal characteristics. Although such methods improve the accuracy of fire detection compared with image-based methods, the large structure of the models limits their practical application.</p>
<p>Fire detection based on deep learning mainly revolves around detection accuracy and model size. At present, most of the data sets in the literature consist of flame images with clear texture and large size, and only a small proportion of research addresses early flame detection, which is mandatory for real-world applications. Therefore, based on the characteristics of early flames, this paper proposes a detection method for small flames while striking a balance between accuracy and model size. Pu Li et&#x20;al. (<xref ref-type="bibr" rid="B17">Li and Zhao, 2020</xref>) summarized current advanced object detection algorithms and selected four representative models, Faster-RCNN, R-FCN (<xref ref-type="bibr" rid="B4">Dai et&#x20;al., 2016</xref>), SSD, and Yolov3, to test on a fire data set. The results showed that Yolov3 had the best performance in flame detection. Therefore, this paper builds on Yolov3 to research small flame detection.</p>
</sec>
<sec id="s3">
<title>The Proposed Method</title>
<p>Object detection is a common method used in fire detection by computer vision technology. Yolov3 (<xref ref-type="bibr" rid="B25">Redmon and Farhadi, 2018</xref>) is an excellent object detection network that balances speed and accuracy: it detects three times faster than SSD (<xref ref-type="bibr" rid="B22">Liu et&#x20;al., 2016</xref>) while achieving the same accuracy. Many experiments have shown that Yolov3 (You Only Look Once v3) is a state-of-the-art object detector with good performance in both accuracy and speed (<xref ref-type="bibr" rid="B40">Zhang et&#x20;al., 2020</xref>). Based on the results of Pu Li&#x2019;s study (<xref ref-type="bibr" rid="B17">Li and Zhao, 2020</xref>), we chose Yolov3 as the basis for improving early flame detection.</p>
<p>The overall structure of the Yolov3 algorithm is shown in <xref ref-type="fig" rid="F1">Figure&#x20;1</xref>; it can be divided into three parts: the backbone, the multi-scale feature extraction structure, and the output. Our model uses a parallel convolution structure to obtain semantic information at different sizes and uses dilated convolution to increase the receptive field. A Feature Pyramid Network is used in the feature extraction structure to strengthen the utilization of information from different feature layers. Our model has a total of 45,774,941 parameters and a size of 174&#xa0;MB, and the overall structure of the network is shown in <xref ref-type="fig" rid="F2">Figure&#x20;2</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Structure of Yolov3 which includes backbone and feature extraction structure.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g001.tif"/>
</fig>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Structure of the proposed method which includes backbone and feature extraction structure.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g002.tif"/>
</fig>
<sec id="s3-1">
<title>Backbone</title>
<p>In <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> we can see that there are many residual modules in Darknet53, the backbone network of Yolov3. The structure of the residual module is shown in <xref ref-type="fig" rid="F3">Figure&#x20;3</xref>. In the residual module, a 3&#x20;&#xd7; 3 convolution followed by an activation function is first applied, and the resulting layer is temporarily saved. This layer is then convolved twice, with kernel sizes of 1&#x20;&#xd7; 1 and 3&#x20;&#xd7; 3, respectively. Finally, this convolutional layer is merged with the previously saved layer through a skip connection and output. The convolution scale and convolution method in the backbone network of Yolov3 are thus relatively uniform, which makes Yolov3 slightly weak in multi-scale recognition. Therefore, the residual module should be improved in these two directions, and the grouped-convolution idea of Inception and the dilated convolution method are introduced&#x20;here.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Cyclic structure of residual module.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g003.tif"/>
</fig>
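The data flow of the residual module described above can be sketched numerically. Below is an illustrative single-channel NumPy version, not the authors' implementation (which operates on multi-channel tensors via DarknetConv2D with batch normalization); kernel values are random placeholders:

```python
import numpy as np

def leaky_relu(v, alpha=0.1):
    # LeakyReLU activation used throughout Darknet53
    return np.where(v > 0, v, alpha * v)

def conv2d(x, kernel, dilation=1):
    """Naive single-channel 2D convolution with 'same' zero padding.
    Illustrative only: x is (H, W), kernel is (k, k)."""
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            for a in range(k):
                for b in range(k):
                    out[i, j] += kernel[a, b] * xp[i + a * dilation, j + b * dilation]
    return out

def residual_block(x, k3a, k1, k3b):
    saved = leaky_relu(conv2d(x, k3a))   # 3x3 conv + activation, temporarily saved
    y = leaky_relu(conv2d(saved, k1))    # 1x1 conv
    y = leaky_relu(conv2d(y, k3b))       # 3x3 conv
    return saved + y                     # skip connection merges the saved layer back in

x = np.random.rand(8, 8)
out = residual_block(x, np.random.rand(3, 3), np.random.rand(1, 1), np.random.rand(3, 3))
assert out.shape == x.shape              # 'same' padding keeps the spatial size
```

The skip connection requires that the convolutions preserve the spatial size, which the 'same' padding guarantees here.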
<p>Inception (<xref ref-type="bibr" rid="B30">Szegedy et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B29">Szegedy et&#x20;al., 2017</xref>) is a module in GoogLeNet with a local topological structure, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref>. Inception performs multiple parallel convolution or pooling operations on the input image and concatenates all the results into a very deep feature map. Using convolution kernels of different sizes in parallel, it obtains different information from the input image, which not only increases the width of the network but also increases its adaptability to scale. The Inception structure extracts information at different scales from the input image, enriches the feature information, and improves recognition accuracy. It also retains the high-performance characteristics of dense matrices while maintaining the sparse structure of the network, reducing the computational cost of the convolution operations. Therefore, without increasing the complexity of the network, the network can capture more information, retain more of the original details of objects, perceive more small-scale feature maps through the sparse structure, and improve the recognition accuracy of small objects.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Structure of the Inception module.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g004.tif"/>
</fig>
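The concatenate-along-depth idea above can be illustrated with a shape-level sketch. The branch channel widths below are hypothetical, not taken from the paper; the point is that with stride 1 and 'same' padding, every parallel branch preserves the spatial size, so the outputs stack along the channel axis:

```python
import numpy as np

def inception_branch(h, w, c_out):
    """Stand-in for one parallel branch (1x1, 3x3, or 5x5 conv, or pooling).
    With stride 1 and 'same' padding p = (k - 1) // 2, every branch keeps
    the H x W spatial size regardless of its kernel size."""
    return np.zeros((c_out, h, w))

h, w = 52, 52
# Hypothetical output-channel widths for four parallel branches
branches = [inception_branch(h, w, c) for c in (64, 128, 32, 32)]
# Concatenate along the channel axis into one deep feature map
feature_map = np.concatenate(branches, axis=0)
assert feature_map.shape == (64 + 128 + 32 + 32, 52, 52)
```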
<p>Dilated convolution (<xref ref-type="bibr" rid="B39">Yu and Koltun, 2016</xref>) increases the receptive field by injecting voids into the convolution map of a standard convolution. It adds a hyper-parameter called the dilation rate, which refers to the spacing between kernel elements. Dilated convolution increases the receptive field of the convolution kernel while keeping the number of parameters unchanged, so that each convolution output contains a larger range of information, and it can also keep the size of the output feature map unchanged. As shown in <xref ref-type="fig" rid="F5">Figure&#x20;5</xref>, a convolution kernel with a dilation rate of 1 remains unchanged, while a 3&#x20;&#xd7; 3 kernel with a dilation rate of 2 has the same receptive field as a 5&#x20;&#xd7; 5 standard kernel, yet uses only 9 parameters, 36% of the 25 parameters of the 5&#x20;&#xd7; 5 kernel. Compared with traditional convolution, dilated convolution preserves the internal structure of the data and captures context information without reducing the spatial resolution (<xref ref-type="bibr" rid="B34">Wang and Ji, 2018</xref>). Dilated convolution is also used in WaveNet (<xref ref-type="bibr" rid="B33">van den Oord et&#x20;al., 2016</xref>), ByteNet (<xref ref-type="bibr" rid="B13">Kalchbrenner et&#x20;al., 2016</xref>), and other networks to improve performance.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Schematic diagram of dilated convolution with different dilation rates. <bold>(A)</bold> Dilation rate &#x3d;1&#x20;<bold>(B)</bold> dilation rate &#x3d;2&#x20;<bold>(C)</bold> dilation rate &#x3d;4.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g005.tif"/>
</fig>
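The receptive-field and parameter arithmetic above can be checked directly with the standard convolution output-size formula:

```python
def effective_kernel(k, d):
    """Effective (receptive-field) size of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def conv_out_size(n, k, d, p, s=1):
    """Output spatial size of a dilated convolution (standard formula)."""
    return (n + 2 * p - effective_kernel(k, d)) // s + 1

# A 3x3 kernel with dilation rate 2 covers the same field as a standard 5x5 kernel,
# with 9 parameters instead of 25 (36%):
assert effective_kernel(3, 2) == 5
assert 3 * 3 / (5 * 5) == 0.36
# With padding p = d * (k - 1) // 2, the feature-map size is preserved:
assert conv_out_size(52, 3, 2, p=2) == 52
```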
<p>Using the ideas of Inception and dilated convolution, we propose a multi-scale convolution module based on the residual module. As shown in <xref ref-type="fig" rid="F6">Figure&#x20;6</xref>, the convolution in the module is divided into four groups, and convolution kernels of different scales are added. After an image enters the backbone network, features of different scales and depths are extracted, so compared with the original single convolution method, the possibility of flame features being ignored by the convolution layers is greatly reduced. At the same time, dilated convolution is added to the standard convolution in the multi-scale convolution module, which not only reduces the number of weight parameters but also improves the feature extraction ability of the network by enlarging the receptive field. A multi-size convolution structure alone may reduce the resolution, which is not conducive to recognizing small objects; dilated convolution, however, does not reduce the spatial resolution and thus strengthens the recognition of small objects. We convert the residual modules in the backbone into multi-scale convolution modules, and the rewritten module retains the DarknetConv2D and LeakyReLU of the original residual module.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Cyclic structure of multi-scale convolution module.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g006.tif"/>
</fig>
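The four-group structure can again be sketched at the shape level. The kernel sizes, dilation rates, and channel widths below are hypothetical stand-ins (the paper's exact configuration is given in Figure 6); the sketch shows why mixing kernel sizes and dilation rates still allows an Inception-style concatenation followed by a residual skip connection:

```python
import numpy as np

def same_pad(k, d=1):
    # Padding that preserves spatial size for an odd kernel k with dilation d
    return d * (k - 1) // 2

def branch(hw, c, k, d=1):
    """Stand-in for one DarknetConv2D + LeakyReLU branch of the module."""
    h, w = hw
    out = h + 2 * same_pad(k, d) - (k + (k - 1) * (d - 1)) + 1
    assert out == h  # spatial size preserved for any kernel size / dilation rate
    return np.zeros((c, h, w))

hw = (26, 26)
# Hypothetical four-branch configuration: mixed kernel sizes and dilation rates
outs = [branch(hw, 64, 1), branch(hw, 64, 3), branch(hw, 64, 3, d=2), branch(hw, 64, 5)]
fused = np.concatenate(outs, axis=0)   # channel-wise concat, Inception-style
skip = np.zeros_like(fused)            # identity path of the residual structure
y = fused + skip                       # skip connection, as in the original module
assert y.shape == (256, 26, 26)
```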
</sec>
<sec id="s3-2">
<title>Feature Extraction Structure</title>
<p>Yolov3 extracts three feature layers for object detection, with output scales of 52&#x20;&#xd7; 52, 26&#x20;&#xd7; 26, and 13&#x20;&#xd7; 13, respectively; the corresponding feature layers are located at the middle, middle-lower, and bottom depths of the backbone network. The feature fusion in Yolov3 is a bottom-up, one-way path. In this process, the semantic information in the 13&#x20;&#xd7; 13 feature map is fully utilized after two rounds of up-sampling and feature fusion, but the feature layer with a scale of 52&#x20;&#xd7; 52 contributes only to the feature output at its own scale, so some information is lost in extraction. To reduce this loss and ensure the effective extraction of small-scale flame features, the idea of FPN is introduced to improve the feature extraction structure.</p>
<p>FPN (Feature Pyramid Networks) (<xref ref-type="bibr" rid="B19">Lin et&#x20;al., 2017a</xref>) is a feature extraction structure. As shown in <xref ref-type="fig" rid="F7">Figure&#x20;7</xref>, FPN carries out multiple feature fusions at different scales. First, features are extracted from bottom to top, with the scale of the feature map gradually reduced. After the top level is reached, a top-down fusion path follows: the top-level features are up-sampled and gradually merged with the lower-level features, helping to reinforce the features of the lower layers. The feature-fusion idea of FPN has been embodied in many networks.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Structure of feature pyramid networks.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g007.tif"/>
</fig>
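The top-down pathway can be sketched with nearest-neighbour up-sampling over the three Yolov3 scales. This is an illustrative simplification: real FPN applies 1&#xd7;1 lateral convolutions before merging, which are omitted here, and the channel count (8) is an arbitrary placeholder:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Bottom-up outputs at the three Yolov3 scales (placeholder channel count 8)
p13 = np.random.rand(8, 13, 13)
p26 = np.random.rand(8, 26, 26)
p52 = np.random.rand(8, 52, 52)

# Top-down path: up-sample the coarser, more semantic map and merge it with
# the finer one (addition stands in for FPN's lateral-conv-then-add merge).
m26 = p26 + upsample2x(p13)
m52 = p52 + upsample2x(m26)
assert m26.shape == (8, 26, 26) and m52.shape == (8, 52, 52)
```

Each Yolov3 scale doubles cleanly (13 &#x2192; 26 &#x2192; 52), so the up-sampled maps align with the bottom-up features without cropping.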
<p>Inspired by FPN, we expand the one-way feature fusion path in Yolov3 into a two-way feature fusion path. The top-down path is added based on the bottom-up path, which enriches the high-level semantic information and helps detect small flames.</p>
<p>
<xref ref-type="fig" rid="F2">Figure&#x20;2</xref> shows the structure of the proposed method, in which some residual modules in the backbone network are replaced with multi-scale convolution modules. In the feature extraction structure, a bottom-up feature fusion process is carried out first, outputting a feature map with a scale of 52&#x20;&#xd7; 52. This feature layer is then down-sampled and fused with the fusion layer of the 13&#x20;&#xd7; 13 and 26&#x20;&#xd7; 26 feature layers, and the result is used as the feature output at the 26&#x20;&#xd7; 26 scale. Finally, that output layer is fused with the bottom feature layer to give the feature output at the 13&#x20;&#xd7; 13 scale. In this way, the semantic information of the middle-layer feature maps is enhanced, and the model's performance in detecting small flames is&#x20;optimized.</p>
</sec>
</sec>
<sec id="s4">
<title>Experiment</title>
<sec id="s4-1">
<title>Data Set Production</title>
<p>Obtaining real data sets is not easy for researchers; currently, most studies draw on a handful of open data sets from the internet. However, there is no standard flame data set for comparison in the field of flame detection (<xref ref-type="bibr" rid="B10">Ghali et&#x20;al., 2020</xref>). Many existing flame data sets on the internet suffer from problems such as image distortion and overly large flames, which are not conducive to training a model for early flame detection. To better realize flame detection and address the specifics of renewable-energy fires, we built our own flame data set, which includes a variety of combustion conditions under different disturbances in different scenarios.</p>
<p>The flame data set is divided into two types of scene, indoor and outdoor. For the indoor scene, a standard combustion chamber was selected as the environment for shooting the flame videos. A standard combustion chamber is a large facility commonly used in the field of fire detection, generally for research on fuel combustion, combustion products, detectors, and so on. The current utilization forms of renewable energy are mainly renewable-energy batteries and new-energy vehicles. Batteries carry a certain fire risk during production, storage, and transportation, and the warehouse is an important scene in this process; therefore, for the indoor scene, we constructed a warehouse scene in the standard combustion chamber to obtain a similar background. Fires in renewable-energy vehicles have occurred with high incidence in recent years, so common parking spots on campus were selected for the outdoor scene, with trees, buildings, cars, and other objects as the background. Considering the richness of the data set and the robustness of the model, a variety of combustibles and oil plates of different sizes were used to photograph the flames; sunlight, lamplight, people, and other interference items were added to the shooting background; and multiple combustibles were used to enrich the types of flames. <xref ref-type="table" rid="T1">Table 1</xref> shows the working conditions involved in this data&#x20;set.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Flame dataset conditions.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Combustible</th>
<th align="center">Fuel plate size</th>
<th align="center">Interference items</th>
<th align="center">Indoor/outdoor</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Polyurethane</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Polyurethane</td>
<td align="center">&#x2014;</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Polyurethane</td>
<td align="center">&#x2014;</td>
<td align="left">People</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Polyurethane</td>
<td align="center">&#x2014;</td>
<td align="left">Lamplight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Polyurethane</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">Cardboard</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Cardboard</td>
<td align="center">&#x2014;</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Cardboard</td>
<td align="center">&#x2014;</td>
<td align="left">People</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Cardboard</td>
<td align="center">&#x2014;</td>
<td align="left">Lamplight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Cardboard</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">Straw</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Straw</td>
<td align="center">&#x2014;</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Igniter</td>
<td align="center">&#x2014;</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">People</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Lamplight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">15&#xa0;cm &#xd7; 15&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Ethanol</td>
<td align="center">15&#xa0;cm &#xd7; 15&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">People</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Lamplight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">15&#xa0;cm &#xd7; 15&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">n-heptane</td>
<td align="center">15&#xa0;cm &#xd7; 15&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Outdoor</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Sunlight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">People</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="center">7&#xa0;cm &#xd7; 7&#xa0;cm</td>
<td align="left">Lamplight</td>
<td align="left">Indoor</td>
</tr>
<tr>
<td align="left">Toluene</td>
<td align="center">15&#xa0;cm &#xd7; 15&#xa0;cm</td>
<td align="left">&#x2014;</td>
<td align="left">Indoor</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>A total of 7,254 images were selected from the filmed videos to form the dataset. Some of the images are shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. The open-source LabelImg annotation tool was used to label the flame regions of each image, and the annotations were stored in the PASCAL VOC 2007 (<xref ref-type="bibr" rid="B7">Everingham et&#x20;al., 2006</xref>) sample-set format.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Some images of the flame dataset.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g008.tif"/>
</fig>
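As a minimal illustration of the PASCAL VOC annotation format that LabelImg produces, the following sketch parses one annotation file and extracts the labeled bounding boxes. The "flame" class name and the file handling are illustrative assumptions, not the authors' actual pipeline:

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse a PASCAL VOC annotation file and return (class, box) pairs."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # e.g. "flame" (assumed class label)
        bb = obj.find("bndbox")
        # VOC boxes are stored as pixel corner coordinates
        box = tuple(int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes
```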
</sec>
<sec id="s4-2">
<title>Training</title>
<p>The experiment was carried out on a Windows 10 system with a GeForce GTX 1080 GPU, an Intel(R) Core(TM) i7-3960X CPU, and 32&#xa0;GB of memory. The model was implemented in the Keras deep learning framework, and Mosaic augmentation was adopted during training. The 7,254 dataset images were divided into a training set of 5,558 images and a test set of 1,696 images used for testing and validation. The initial learning rate was set to&#x20;0.001.</p>
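The 5,558/1,696 split above can be sketched as follows; the shuffle and the fixed seed are illustrative assumptions, since the paper does not state how the split was performed:

```python
import random

def split_dataset(image_paths, n_train=5558, seed=0):
    """Shuffle image paths deterministically and split into train/test lists."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    return paths[:n_train], paths[n_train:]
```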
</sec>
<sec id="s4-3">
<title>Evaluation Index</title>
<p>To test the detection performance of the model, we introduced the following indicators: Precision Rate (PR), Recall Rate (RR), Accuracy Rate (AR), and False Alarm Rate (FAR).</p>
<p>The calculation formula of Precision Rate is shown below. In the formula, TP (true positive) is the number of correctly identified flame images in the test set, and FP (false positive) is the number of negative samples in the test set wrongly identified as flame. In the test, we judge the results on the test set against the ground truth. We expect the proposed algorithm to quickly identify all flame objects in the monitoring picture. Therefore, if the intersection over union (IoU) between a detection box and the ground-truth box of a flame object is greater than 0.5, the object is deemed correctly detected; otherwise, it counts as a missed detection. A positive sample is classified as <italic>TP</italic> only when all objects in the image are detected; even if several objects are detected successfully, we strictly classify the sample as <italic>FN</italic> as long as one object is missed, because such a result does not meet our requirements. For a negative sample, if no detection box is produced it is classified as <italic>TN</italic>, and if a detection box is produced it is classified as <italic>FP</italic>. Precision Rate is the proportion of correct detections among all detection results, reflecting the credibility of flame detection.<disp-formula id="equ1">
<mml:math id="m1">
<mml:mrow>
<mml:mtext>PR</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
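The IoU &gt; 0.5 matching criterion used above can be sketched as a standard intersection-over-union computation on axis-aligned boxes in (xmin, ymin, xmax, ymax) form:

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection would then be accepted when `iou(detection, ground_truth) > 0.5`.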
<p>The formula for Recall Rate is defined as follows. In the formula, FN (false negative) is the number of flame images in the test set that are not recognized. Recall Rate is the proportion of correctly detected flames among all fires that should have been detected, reflecting the model&#x2019;s ability to detect flames.<disp-formula id="equ2">
<mml:math id="m2">
<mml:mrow>
<mml:mtext>RR</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The calculation formula for Accuracy Rate is shown below, in which TN (true negative) is the number of negative samples correctly identified as containing no flame. Accuracy Rate is the ratio of correctly predicted samples to all samples, reflecting the comprehensive detection ability of the model.<disp-formula id="equ3">
<mml:math id="m3">
<mml:mrow>
<mml:mtext>AR</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The formula for False Alarm Rate is defined as follows. False Alarm Rate is a standard evaluation index in the field of fire detection: in fire-detection application scenarios, the monitored scene is in the non-flame (negative) state most of the time, so controlling false positives is critical.<disp-formula id="equ4">
<mml:math id="m4">
<mml:mrow>
<mml:mtext>FAR</mml:mtext>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mtext>%</mml:mtext>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
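The four indices above can be computed together from the confusion counts; a minimal sketch, with all values returned in percent:

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute PR, RR, AR, and FAR (percent) from confusion-matrix counts."""
    pr = 100.0 * tp / (tp + fp)                 # precision rate
    rr = 100.0 * tp / (tp + fn)                 # recall rate
    ar = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # accuracy rate
    far = 100.0 * fp / (fp + tn)                # false alarm rate
    return {"PR": pr, "RR": rr, "AR": ar, "FAR": far}
```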
</sec>
<sec id="s4-4">
<title>Test With Test Set</title>
<p>The trained model was evaluated on the test set; part of the results are shown in <xref ref-type="fig" rid="F9">Figure&#x20;9</xref>. The proposed method works well in different scenarios. It can be seen that the model identifies and locates small flames with high accuracy, which helps detection and alarm in the early stage of&#x20;fire.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Partial results identified by the proposed&#x20;model.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g009.tif"/>
</fig>
<p>To better evaluate the performance of the proposed model, other common one-stage object detection models were introduced for comparative testing in addition to Yolov3. These models were trained on the same training set in the same environment and evaluated on the same test set. The results show that the proposed method is more sensitive to small flames. As shown in <xref ref-type="fig" rid="F10">Figure&#x20;10</xref>, the other models fail to identify image frames containing a small fire that the proposed model identifies successfully. More detailed results are given in <xref ref-type="table" rid="T2">Table&#x20;2</xref>; both the training set and the test set consist of small-flame images. Compared with Yolov3, the proposed method improves on all four indicators, including a significant increase in Precision Rate and a significant decrease in False Alarm Rate, showing that the improved model indeed performs better at detecting small flames. Compared with the other models, the proposed method also ranks first on all four indexes, reflecting a clear advantage in early flame detection; in particular, its False Alarm Rate is as low as 1.2%, indicating the stability of the method. In addition, two advanced two-stage models were included in the comparison. A two-stage model first generates a series of candidate boxes as samples and then classifies these samples with a convolutional neural network, so it typically has higher accuracy but slower speed. It can be seen that the proposed method achieves higher accuracy while having a smaller size and faster computation&#x20;speed.</p>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Recognition results of small flame image frames in different models. <bold>(A)</bold> The image frame identified as a negative sample in other models <bold>(B)</bold> the image frame identified as a positive sample in the proposed&#x20;model.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g010.tif"/>
</fig>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Test results.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Model</th>
<th align="center">PR (%)</th>
<th align="center">RR (%)</th>
<th align="center">FAR (%)</th>
<th align="center">AR (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Proposed method</td>
<td align="char" char=".">98.7</td>
<td align="char" char=".">93.7</td>
<td align="char" char=".">1.2</td>
<td align="char" char=".">96.3</td>
</tr>
<tr>
<td align="left">Yolov3 (<xref ref-type="bibr" rid="B25">Redmon and Farhadi, 2018</xref>)</td>
<td align="char" char=".">93.4</td>
<td align="char" char=".">92.5</td>
<td align="char" char=".">6.2</td>
<td align="char" char=".">93.2</td>
</tr>
<tr>
<td align="left">SSD (<xref ref-type="bibr" rid="B22">Liu et&#x20;al., 2016</xref>)</td>
<td align="char" char=".">94.5</td>
<td align="char" char=".">51.9</td>
<td align="char" char=".">2.8</td>
<td align="char" char=".">74.9</td>
</tr>
<tr>
<td align="left">RFBnet (<xref ref-type="bibr" rid="B21">Liu et&#x20;al., 2018</xref>)</td>
<td align="char" char=".">90.4</td>
<td align="char" char=".">86.1</td>
<td align="char" char=".">8.8</td>
<td align="char" char=".">88.7</td>
</tr>
<tr>
<td align="left">Efficientdet (<xref ref-type="bibr" rid="B31">Tan et&#x20;al., 2019</xref>)</td>
<td align="char" char=".">95.9</td>
<td align="char" char=".">93.3</td>
<td align="char" char=".">3.8</td>
<td align="char" char=".">94.8</td>
</tr>
<tr>
<td align="left">Yolov4 (<xref ref-type="bibr" rid="B2">Bochkovskiy et&#x20;al., 2020</xref>)</td>
<td align="char" char=".">93.1</td>
<td align="char" char=".">92.7</td>
<td align="char" char=".">6.6</td>
<td align="char" char=".">87.1</td>
</tr>
<tr>
<td align="left">Retinanet (<xref ref-type="bibr" rid="B20">Lin et&#x20;al., 2017b</xref>)</td>
<td align="char" char=".">96.5</td>
<td align="char" char=".">88.7</td>
<td align="char" char=".">3.1</td>
<td align="char" char=".">92.9</td>
</tr>
<tr>
<td align="left">Faster R-CNN (<xref ref-type="bibr" rid="B26">Ren et&#x20;al., 2016</xref>)</td>
<td align="char" char=".">97.3</td>
<td align="char" char=".">93.1</td>
<td align="char" char=".">2.3</td>
<td align="char" char=".">95.2</td>
</tr>
<tr>
<td align="left">Fast R-CNN (<xref ref-type="bibr" rid="B11">Girshick, 2015</xref>)</td>
<td align="char" char=".">94.3</td>
<td align="char" char=".">85.2</td>
<td align="char" char=".">3.5</td>
<td align="char" char=".">90.1</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4-5">
<title>Real-Time Test of Fire Scenarios</title>
<p>In order to test the effect of the model we proposed in practical application, we used the monitoring cameras installed in the laboratory building to carry out real-time flame detection, which were similarly divided into an indoor scene and an outdoor&#x20;scene.</p>
<p>As shown in <xref ref-type="fig" rid="F11">Figure&#x20;11</xref>, the outdoor scenes include the rooftop and the outdoor scene on the first floor. The rooftop is equipped with three surveillance cameras at different angles, while the first-floor outdoor scene is equipped with one surveillance camera. The indoor scene is a standard combustion chamber with a surveillance camera installed inside. Since the test object is a small flame, we chose an oil pan with a size of 7&#xa0;cm &#xd7; 7&#xa0;cm and ignited n-heptane for the&#x20;test.</p>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Views of monitoring cameras. <bold>(A)</bold> View of rooftop camera1&#x20;<bold>(B)</bold> view of rooftop camera2&#x20;<bold>(C)</bold> View of rooftop camera3&#x20;<bold>(D)</bold> view of indoor camera <bold>(E)</bold> view of first floor camera.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g011.tif"/>
</fig>
<p>We tested the non-fire scenario in each environment before ignition, and no false positives were generated. Therefore, PR and FAR were not included in the analysis of the results, and only the recall rate (RR) was analyzed. The total number of frames processed during real-time detection and the number of fire frames detected were counted by a script, and the recall rate was calculated accordingly. For real-time detection we also report frames per second (FPS), the number of image frames processed per second, which reflects the detection speed of the model. The test results are shown in <xref ref-type="table" rid="T3">Table&#x20;3</xref>. The model performs well in real-time detection. As shown in <xref ref-type="fig" rid="F12">Figure&#x20;12</xref>, the majority of small flames are successfully identified, and the FPS value is stable at around 11, reflecting the sensitivity of the proposed model to small flames. However, the recall rate in real-time detection still falls short of the recall rate on the test&#x20;set.</p>
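The frame counting described above can be sketched as follows; `frames` and `detect` are hypothetical placeholders for the camera stream and the trained detector, not the authors' actual script:

```python
import time

def measure_detection(frames, detect):
    """Run a detector over labeled frames; return recall rate (%) and FPS.

    `frames` yields (image, is_fire) pairs; `detect(image)` returns True
    when fire is found. Both are placeholder interfaces for illustration.
    """
    fire_total = fire_hits = n = 0
    start = time.perf_counter()
    for image, is_fire in frames:
        found = detect(image)
        n += 1
        if is_fire:
            fire_total += 1
            fire_hits += found  # bool counts as 0/1
    elapsed = time.perf_counter() - start
    rr = 100.0 * fire_hits / fire_total if fire_total else 0.0
    fps = n / elapsed if elapsed > 0 else float("inf")
    return rr, fps
```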
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Real-time detection results.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Camera</th>
<th align="center">Fuel</th>
<th align="center">RR (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Indoor camera</td>
<td align="center">n-heptane</td>
<td align="char" char=".">78.4</td>
</tr>
<tr>
<td align="left">Rooftop camera1</td>
<td align="center">n-heptane</td>
<td align="char" char=".">70.2</td>
</tr>
<tr>
<td align="left">Rooftop camera2</td>
<td align="center">n-heptane</td>
<td align="char" char=".">65.6</td>
</tr>
<tr>
<td align="left">Rooftop camera3</td>
<td align="center">n-heptane</td>
<td align="char" char=".">66.3</td>
</tr>
<tr>
<td align="left">First floor camera</td>
<td align="center">n-heptane</td>
<td align="char" char=".">76.3</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>Part of the flame frames that were detected.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g012.tif"/>
</fig>
<p>Therefore, we analyzed the flame frames that were not detected. As shown in <xref ref-type="fig" rid="F13">Figure&#x20;13</xref>, undetected flame frames fall into two categories. The first is that the flame is difficult to identify because of the complex background. This situation mainly occurs in two of the rooftop monitoring views, whose perspectives lead to a cluttered picture; this also explains why real-time detection performs better in the relatively empty standard combustion chamber. In this case, the undetected flame frames can be labeled and used for iterative training. The second is that, under the influence of light, the flame is blurred or even becomes invisible and therefore cannot be detected. For flame frames whose flame profile is still recognizable under the influence of light, we can label them and carry out iterative training of the model. For flame frames in which the flame is not visible at all, the problem is difficult to solve with a single visible-light channel; in the future, we will seek a solution that combines infrared and visible-light channels.</p>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>Part of the flame frames that were not detected. <bold>(A)</bold> Part of the flame frames that were not detected due to complex background <bold>(B)</bold> part of the flame frames that were not detected due to the influence of&#x20;light.</p>
</caption>
<graphic xlink:href="fenrg-10-848754-g013.tif"/>
</fig>
<p>In the training process of a neural network, iteration refers to updating the parameters of the model with a batch of data. In practical engineering applications, because the detection environment stays in a stable state for a long time, we collect the flame frames missed during detection and add them to the existing dataset. Training then continues with the new dataset to update the model parameters, so that the model learns to recognize the previously unrecognized flame frames and achieves a better detection effect in this environment. We call this training method iterative training. After iterative training, the detection rate of flame frames improved significantly when the same scene was detected in real time. The results are shown in <xref ref-type="table" rid="T4">Table&#x20;4</xref>; the recall rate now approaches the value obtained on the test set. This shows that, in practical applications, iterating the model with images of the actual scene allows it to achieve its best detection&#x20;performance.</p>
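The iterative-training loop described above can be sketched as follows; `train_fn`, `detect_fn`, the frame objects, and the fixed round count are hypothetical placeholders standing in for the actual Keras training and inference steps:

```python
def iterative_training(train_fn, detect_fn, dataset, scene_frames, rounds=3):
    """Sketch of iterative training: after each round, flame frames the
    current model misses in the deployment scene are labeled, appended to
    the dataset, and the model is retrained on the enlarged set."""
    model = train_fn(dataset)
    for _ in range(rounds):
        missed = [f for f in scene_frames
                  if f.has_fire and not detect_fn(model, f)]
        if not missed:
            break                       # nothing left to learn from this scene
        dataset.extend(missed)          # label and add the missed flame frames
        model = train_fn(dataset)       # update parameters with the new set
    return model
```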
<table-wrap id="T4" position="float">
<label>TABLE 4</label>
<caption>
<p>Real-time detection results (after iterative training).</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Camera</th>
<th align="center">Fuel</th>
<th align="center">RR (%)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Indoor camera</td>
<td align="center">n-heptane</td>
<td align="char" char=".">91.3</td>
</tr>
<tr>
<td align="left">Rooftop camera1</td>
<td align="center">n-heptane</td>
<td align="char" char=".">89.1</td>
</tr>
<tr>
<td align="left">Rooftop camera2</td>
<td align="center">n-heptane</td>
<td align="char" char=".">88.7</td>
</tr>
<tr>
<td align="left">Rooftop camera3</td>
<td align="center">n-heptane</td>
<td align="char" char=".">88.3</td>
</tr>
<tr>
<td align="left">First floor camera</td>
<td align="center">n-heptane</td>
<td align="char" char=".">90.5</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<title>Conclusion</title>
<p>In order to solve the problem that small flames in early fires are prone to missed detections and false positives, this paper proposes an improved model based on Yolov3. Multi-scale convolution and an enlarged receptive field were used to improve the sensitivity of the model to small flames, and an FPN structure was used to enhance feature extraction. The experimental results show that, compared both with the original Yolov3 model and with other commonly used object detection models, the proposed model performs better in flame recognition and fulfills its design goal of small-flame recognition. The proposed model was also applied to real scenes with good performance, and we found that iterative training plays a key role in practical detection. In addition, this paper establishes a flame dataset for early fires covering indoor and outdoor conditions, which provides a basis for future flame-detection research.</p>
<p>In the process of establishing the dataset, we also found a deficiency of the current model. Under direct interference from some strong light sources, the flame and the light merge and cannot be distinguished by the naked eye, so the problem cannot be addressed at the dataset-annotation level. How to solve such problems will be the research target of the next&#x20;stage.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>QZ and PD conceived of the presented idea. PD and YH developed the theory and performed the experiments. GL and RT verified the analytical methods. YZ encouraged PD and MS to investigate real-time detection and supervised the findings of this work. All authors discussed the results and contributed to the final manuscript.</p>
</sec>
<sec id="s8">
<title>Funding</title>
<p>This work was financially supported by the National Key Research and Development Plan under Grant No. 2018YFC0809502, the Open Project Program of State Key Laboratory of Fire Science under Grant No. HZ2020-KF07, and the Research Plan of Fire and Rescue Department, Ministry of Emergency Management under Grant No. 2018XFGG12. National Natural Science Foundation of China (No. 52076084).</p>
</sec>
<sec sec-type="COI-statement" id="s9">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abohamzeh</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Salehi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Sheikholeslami</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Review of Hydrogen Safety during Storage, Transmission, and Applications Processes</article-title>. <source>J.&#x20;Loss Prev. Process Industries</source> <volume>72</volume>, <fpage>72</fpage>. <pub-id pub-id-type="doi">10.1016/j.jlp.2021.104569</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Bochkovskiy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C-Y.</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>H-Y. M.</given-names>
</name>
</person-group> (<year>2020</year>). <source>YOLOv4: Optimal Speed and Accuracy of Object Detection</source>. <comment>arXiv:2004.10934</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2020arXiv200410934B">https://ui.adsabs.harvard.edu/abs/2020arXiv200410934B</ext-link>
</comment> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borges</surname>
<given-names>P. V. K.</given-names>
</name>
<name>
<surname>Izquierdo</surname>
<given-names>E.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>A Probabilistic Approach for Vision-Based Fire Detection in Videos</article-title>. <source>IEEE Trans. Circuits Syst. Video Technol.</source> <volume>20</volume>, <fpage>721</fpage>&#x2013;<lpage>731</lpage>. <pub-id pub-id-type="doi">10.1109/tcsvt.2010.2045813</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dai</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>R-FCN: Object Detection <italic>via</italic> Region-Based Fully Convolutional Networks</article-title>,&#x201d; in <conf-name>Proceedings of the 30th International Conference on Neural Information Processing Systems</conf-name>, <conf-loc>Barcelona, Spain</conf-loc>, <conf-date>2016</conf-date> (<publisher-loc>Barcelona, Spain</publisher-loc>: <publisher-name>Curran Associates Inc.</publisher-name>), <fpage>379</fpage>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dimitropoulos</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Barmpoutis</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Grammalidis</surname>
<given-names>N.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Spatio-Temporal Flame Modeling and Dynamic Texture Analysis for Automatic Video-Based Fire Detection</article-title>. <source>IEEE Trans. Circuits Syst. Video Techn.</source> <volume>2015</volume> (<issue>25</issue>), <fpage>339</fpage>&#x2013;<lpage>351</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Dua</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Charan</surname>
<given-names>G. S.</given-names>
</name>
</person-group> (<year>2020</year>). &#x201c;<article-title>An Improved Approach for Fire Detection Using Deep Learning Models</article-title>,&#x201d; in <conf-name>2020 International Conference on Industry 4.0 Technology (I4Tech)</conf-name>, <conf-loc>Pune, India</conf-loc>, <conf-date>February 13-15, 2020</conf-date>. </citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Everingham</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zisserman</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>C. K. I.</given-names>
</name>
</person-group> (<year>2006</year>). &#x201c;<article-title>The 2005 PASCAL Visual Object Classes challenge</article-title>,&#x201d; in <source>Machine Learning Challenges&#x2014;Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment</source> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>). </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>J.&#x20;N.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>X. Z.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Experimental Study on the Vertical thermal Runaway Propagation in Cylindrical Lithium-Ion Batteries: Effects of Spacing and State of Charge</article-title>. <source>Appl. Therm. Eng.</source> <volume>197</volume>, <fpage>197</fpage>. <pub-id pub-id-type="doi">10.1016/j.applthermaleng.2021.117399</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Frizzi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kaabi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Bouchouicha</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>Convolutional Neural Network for Video Fire and Smoke Detection</article-title>,&#x201d; in <conf-name>IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Society</conf-name>, <conf-loc>Florence, Italy</conf-loc>, <conf-date>October 23-26, 2016</conf-date>, <fpage>877</fpage>&#x2013;<lpage>882</lpage>. <pub-id pub-id-type="doi">10.1109/iecon.2016.7793196</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ghali</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jmal</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Souidene Mseddi</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Recent Advances in Fire Detection and Monitoring Systems: A Review</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>. </citation>
</ref>
<ref id="B11">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). <source>Fast R-CNN</source>. <comment>arXiv: 1504.08083</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2015arXiv150408083G">https://ui.adsabs.harvard.edu/abs/2015arXiv150408083G</ext-link>
</comment>.</citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kalchbrenner</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Espeholt</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Simonyan</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Neural Machine Translation in Linear Time</source>. <comment>arXiv: 1610.10099</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2016arXiv161010099K">https://ui.adsabs.harvard.edu/abs/2016arXiv161010099K</ext-link>
</comment> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khudayberdiev</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Butt</surname>
<given-names>M. H. F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Fire Detection in Surveillance Videos Using a Combination with PCA and CNN</article-title>. <source>Acad. J.&#x20;Comput. Inf. Sci.</source> <volume>3</volume>, <fpage>3</fpage>. <pub-id pub-id-type="doi">10.25236/AJCIS.030304</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>J.&#x20;A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Video-Based Fire Detection Using Deep Learning Models</article-title>. <source>Appl. Sci.</source>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lecun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bottou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Haffner</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>1998</year>). <article-title>Gradient-based Learning Applied to Document Recognition</article-title>. <source>Proc. IEEE</source> <volume>86</volume>, <fpage>2278</fpage>&#x2013;<lpage>2324</lpage>. <pub-id pub-id-type="doi">10.1109/5.726791</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Image Fire Detection Algorithms Based on Convolutional Neural Networks</article-title>. <source>Case Stud. Therm. Eng.</source> <volume>19</volume>, <fpage>100625</fpage>. <pub-id pub-id-type="doi">10.1016/j.csite.2020.100625</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Smoke Detection on Video Sequences Using 3D Convolutional Neural Networks</article-title>. <source>Fire Technol.</source> <volume>55</volume>, <fpage>1827</fpage>&#x2013;<lpage>1847</lpage>. <pub-id pub-id-type="doi">10.1007/s10694-019-00832-w</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Goyal</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Focal Loss for Dense Object Detection</article-title>,&#x201d; in <conf-name>2017 IEEE International Conference on Computer Vision (ICCV)</conf-name>, <conf-loc>Venice, Italy</conf-loc>, <conf-date>October 22-29, 2017</conf-date>. </citation>
</ref>
<ref id="B20">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Doll&#xe1;r</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Feature Pyramid Networks for Object Detection</article-title>,&#x201d; in <conf-name>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Honolulu, HI</conf-loc>, <conf-date>July 21-26, 2017</conf-date>. </citation>
</ref>
<ref id="B21">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Receptive Field Block Net for Accurate and Fast Object Detection</article-title>,&#x201d; in <source>Computer Vision&#x2014;ECCV 2018</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>). <pub-id pub-id-type="doi">10.1007/978-3-030-01252-6_24</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Anguelov</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Erhan</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). &#x201c;<article-title>SSD: Single Shot MultiBox Detector</article-title>,&#x201d; in <source>Computer Vision&#x2014;ECCV 2016</source> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>). <pub-id pub-id-type="doi">10.1007/978-3-319-46448-0_2</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ould Ely</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kamzabek</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chakraborty</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Batteries Safety: Recent Progress and Current Challenges</article-title>. <source>Front. Energ. Res.</source> <volume>7</volume>, <fpage>71</fpage>. <pub-id pub-id-type="doi">10.3389/fenrg.2019.00071</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qazi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hussain</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Rahim</surname>
<given-names>N. A.</given-names>
</name>
<name>
<surname>Hardaker</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Alghazzawi</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Shaban</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Towards Sustainable Energy: A Systematic Review of Renewable Energy Sources, Technologies, and Public Opinions</article-title>. <source>IEEE Access</source> <volume>7</volume>, <fpage>63837</fpage>&#x2013;<lpage>63851</lpage>. <pub-id pub-id-type="doi">10.1109/access.2019.2906402</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Redmon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Farhadi</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2018</year>). <source>YOLOv3: An Incremental Improvement</source>. <comment>arXiv: 1804.02767</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2018arXiv180402767R">https://ui.adsabs.harvard.edu/abs/2018arXiv180402767R</ext-link>
</comment> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>He</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Girshick</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>39</volume>, <fpage>1137</fpage>&#x2013;<lpage>1149</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2577031</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Granmo</surname>
<given-names>O-C.</given-names>
</name>
<name>
<surname>Goodwin</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Deep Convolutional Neural Networks for Fire Detection in Images</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>. </citation>
</ref>
<ref id="B28">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Shen</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Flame Detection Using Deep Learning</article-title>,&#x201d; in <conf-name>2018 4th International Conference on Control, Automation and Robotics (ICCAR)</conf-name>, <conf-loc>Auckland, New Zealand</conf-loc>, <conf-date>April 20-23, 2018</conf-date>. </citation>
</ref>
<ref id="B29">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Ioffe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Vanhoucke</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning</article-title>,&#x201d; in <source>Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence</source> (<publisher-loc>San Francisco, CA</publisher-loc>: <publisher-name>AAAI Press</publisher-name>). </citation>
</ref>
<ref id="B30">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Szegedy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jia</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Going Deeper with Convolutions</article-title>,&#x201d; in <conf-name>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Boston, MA</conf-loc>, <conf-date>June 7-12, 2015</conf-date>. </citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>Q. V.</given-names>
</name>
</person-group> (<year>2019</year>). <source>EfficientDet: Scalable and Efficient Object Detection</source>. <comment>arXiv: 1911.09070</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2019arXiv191109070T">https://ui.adsabs.harvard.edu/abs/2019arXiv191109070T</ext-link>
</comment> </citation>
</ref>
<ref id="B32">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Tran</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bourdev</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Fergus</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2015</year>). &#x201c;<article-title>Learning Spatiotemporal Features with 3D Convolutional Networks</article-title>,&#x201d; in <conf-name>2015 IEEE International Conference on Computer Vision (ICCV)</conf-name>, <conf-loc>Santiago, Chile</conf-loc>, <conf-date>December, 11-18, 2015</conf-date>. </citation>
</ref>
<ref id="B33">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>van den Oord</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Dieleman</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zen</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2016</year>). <source>WaveNet: A Generative Model for Raw Audio</source>. <comment>arXiv: 1609.03499</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2016arXiv160903499V">https://ui.adsabs.harvard.edu/abs/2016arXiv160903499V</ext-link>
</comment> </citation>
</ref>
<ref id="B34">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z. Y.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>S. W.</given-names>
</name>
</person-group> (<year>2018</year>). &#x201c;<article-title>Smoothed Dilated Convolutions for Improved Dense Prediction</article-title>,&#x201d; in <conf-name>Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &#x26; Data Mining (KDD)</conf-name>, <conf-loc>London, United Kingdom</conf-loc>, <conf-date>August 19-23, 2018</conf-date>. </citation>
</ref>
<ref id="B35">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Research on Deep Domain Adaptation and Saliency Detection in Fire Smoke Image Recognition</source>. <publisher-name>University of Science and Technology of China</publisher-name>. </citation>
</ref>
<ref id="B36">
<citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yamagishi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yamaguchi</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1999</year>). &#x201c;<article-title>Fire Flame Detection Algorithm Using a Color Camera</article-title>,&#x201d; in <conf-name>MHS&#x27;99: Proceedings of 1999 International Symposium on Micromechatronics and Human Science (Cat. No.99TH8478)</conf-name>, <conf-loc>Nagoya, Japan</conf-loc>, <conf-date>November 23-26, 1999</conf-date>. </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Leow</surname>
<given-names>W. R.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Thermal-responsive Polymers for Enhancing Safety of Electrochemical Storage Devices</article-title>. <source>Adv. Mater.</source> <volume>30</volume>, <fpage>e1704347</fpage>. <pub-id pub-id-type="doi">10.1002/adma.201704347</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>Y.-J.</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>E.-G.</given-names>
</name>
</person-group> (<year>2017</year>). <source>Fire Detection System Using Faster R-CNN</source>. <publisher-loc>Seoul</publisher-loc>: <publisher-name>Korea Information and Communication Society</publisher-name>. </citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Koltun</surname>
<given-names>V.</given-names>
</name>
</person-group> (<year>2016</year>). <source>Multi-Scale Context Aggregation by Dilated Convolutions</source>. <comment>arXiv: 1511.07122</comment>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://ui.adsabs.harvard.edu/abs/2015arXiv151107122Y">https://ui.adsabs.harvard.edu/abs/2015arXiv151107122Y</ext-link>
</comment> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improve YOLOv3 Using Dilated Spatial Pyramid Module for Multi-Scale Object Detection</article-title>. <source>Int. J.&#x20;Adv. Robotic Syst.</source> <volume>17</volume>, <fpage>1729881420936062</fpage>. <pub-id pub-id-type="doi">10.1177/1729881420936062</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhong</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Video Fire Recognition Based on Multi-Channel Convolutional Neural Network</article-title>. <source>J.&#x20;Phys. Conf. Ser.</source> <volume>1634</volume>, <fpage>012020</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/1634/1/012020</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>