Frontiers in Tropical Diseases (Front. Trop. Dis), ISSN 2673-7515, Frontiers Media S.A.

DOI: 10.3389/fitd.2021.769968

Tropical Diseases

Original Research

A Comparative Study of Machine Learning Techniques for Multi-Class Classification of Arboviral Diseases

Thomás Tabosa de Oliveira ¹, Sebastião Rogério da Silva Neto ¹, Igor Vitor Teixeira ¹, Samuel Benjamin Aguiar de Oliveira ² ³, Maria Gabriela de Almeida Rodrigues ² ³, Vanderson Souza Sampaio ² ³ ⁴, Patricia Takako Endo ¹ ^*

¹ Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife, Brazil
² Programa de Pós-Graduação em Medicina Tropical, Universidade do Estado do Amazonas, Manaus, Brazil
³ Fundação de Medicina Tropical Dr. Heitor Vieira Dourado, Manaus, Brazil
⁴ Fundação de Vigilância em Saúde Dra. Rosemary Costa Pinto, Manaus, Brazil

Edited by: Manoel Barral-Netto, Gonçalo Moniz Institute (IGM), Brazil

Reviewed by: Rajnikant Dixit, National Institute of Malaria Research (ICMR), India; Ricardo Khouri, Oswaldo Cruz Foundation (Fiocruz), Brazil

*Correspondence: Patricia Takako Endo, patricia.endo@upe.br

This article was submitted to Major Tropical Diseases, a section of the journal Frontiers in Tropical Diseases

Received: 02 September 2021; Accepted: 29 December 2021; Published: 18 February 2022

Article 769968 (2021)

Copyright © 2022 Tabosa de Oliveira, da Silva Neto, Teixeira, Aguiar de Oliveira, de Almeida Rodrigues, Sampaio and Endo

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Among the neglected tropical diseases (NTDs), arboviral diseases present a significant number of cases worldwide. Their correct classification is a complex process due to the similarity of symptoms, and the lack of tests in the Brazilian countryside is a major challenge to be overcome. Given this context, this paper proposes a comparative study of machine learning techniques for multi-class classification of arboviral diseases, which considers three classes (DENGUE, CHIKUNGUNYA and OTHERS) and uses clinical and socio-demographic data from patients. Feature selection techniques were also used to select the best subset of attributes for each model. Gradient boosting machines presented the best results across the metrics, together with a subset of attributes suitable for daily use by physicians, and achieved 76.58% recall on the CHIKUNGUNYA class.

Keywords: arboviral diseases, neglected tropical disease (NTD), machine learning, multi-class classification, dengue (DENV), Chikungunya (CHIKV)

Funding: Fundação de Amparo à Pesquisa do Estado do Amazonas (10.13039/501100004916), award 062.00249/2020

1 Introduction

In 2015, the 2030 Agenda^¹ was conceived by representatives of the member states of the United Nations (UN); its main purpose is to eradicate poverty in all its forms and dimensions through sustainable development around the world. To achieve this major objective, 17 sustainable development goals (SDGs) were developed. Among them, Goal 3 (health and well-being) seeks to promote well-being for all, at all ages. Target 3.3 aims to end epidemics of AIDS, tuberculosis, malaria, and neglected tropical diseases (NTDs), as well as to combat hepatitis, waterborne diseases and other communicable diseases by the year 2030.

Arboviral diseases are NTDs caused by viruses that are transmitted by mosquito vectors. Currently, there are about 545 known species of arboviruses, of which about 150 cause diseases in humans (1). In addition to Dengue virus (DENV), the last 10 years have seen the emergence of other arboviruses, such as Chikungunya virus (CHIKV), Zika virus (ZIKV) and West Nile virus (WNV). According to Lima-Camara (2016), disorganised urban growth and the modification of the environment by human actions are some of the reasons for the increase in this type of disease (2).

According to reports released by the Pan American Health Organization (PAHO)^²,^³, in 2020 Dengue and Chikungunya together accounted for a total of 2,402,128 cases in the Americas. However, most of these cases were classified as suspected cases due to the difficulty involved in their confirmation. For example, only 43.81% of reported Dengue cases (1,007,939 cases) were actually confirmed, and for Chikungunya, as few as 39% (39,619 cases) were confirmed. The low proportion of confirmed cases is due to the high complexity in the classification of these diseases in terms of their signs and symptoms. According to the Health Library of Primary Health Care (from the Portuguese Biblioteca Virtual em Saúde da Atenção Primária à Saúde, BVS APS)^⁴, diagnosis in most cases is limited to the patients’ signs and symptoms and the local epidemiological status. In addition, rapid tests available at primary healthcare centers have low accuracy. Although (3) states that “cross-reactions with DENV or ZIKV infections are unlikely, because CHIKV is an alphavirus, while DENV and ZIKV are antigenically unrelated flaviviruses”, cross-reactivity can still be a concern. Indeed, it is one of the barriers to the correct diagnosis of all arboviral diseases at low-level health units. The lack of tests is also a major issue in the Amazon countryside. Moreover, accurate testing requires specific equipment and time, which also entails operational costs.

As a tropical country, Brazil has a huge diversity of both flora and fauna, and this includes mosquitoes, which play an important role as vectors of illnesses such as arboviral diseases (4). According to PAHO, Brazil had the highest number of Dengue cases in the Americas in 2020, with 1,040,481 cases (65% of the total). Clinical classification of an arboviral disease is a particularly complex task in Brazil because of the concomitant circulation of other arboviruses, such as Mayaro virus (MAYV), Venezuelan equine encephalitis virus (VEEV), Eastern equine encephalitis virus (EEEV), and Rocio virus (ROCV), which present a similar clinical profile (2). Besides the difficulty in clinical classification, cross-reaction is an issue for the rapid tests that are currently available, and it reduces their accuracy (2). Although high lethality has not been evidenced so far, the occurrence of coinfection with several arboviruses or their concomitant circulation is cause for concern.

The Brazilian Unified Health System (SUS, from the Portuguese Sistema Único de Saúde) has suffered over the years from a reduction in funding, which imposes an additional barrier to expanding quality diagnostic testing and presents a major public health challenge, highlighting the need for a low-cost diagnostic approach. The use of Machine Learning (ML) techniques is an interesting alternative, as they are able to learn patterns and produce a classification without the need for immediate laboratory tests. This would avoid the costs of collecting samples and running these tests. As stated by Bulbul and Unsal, “compared to classical methods, the process of obtaining information is much more accurate and faster with data mining and ML” (5). ML models estimate results by learning from previously entered information. In addition, once trained, these models require little computational power and can be executed on tablets or cell phones.

Most studies that deal with this problem have proposed models for diagnosing Dengue (6, 7), Chikungunya (8), or Zika (9) individually. To the best of our knowledge, only one study has provided a model for distinguishing between two arboviral diseases (Dengue and Chikungunya) (10); however, that study also used laboratory data to perform the classification. Although laboratory data improve the results, we do not employ them because, in addition to requiring adequate equipment, they would prevent the ML model from being used for a quick diagnosis at the time of the patient’s arrival at the health unit. Furthermore, most of the existing works did not present a clear methodology that describes the pre-processing of data, hyperparameter optimization techniques, or feature selection. In our work, the entire data pre-processing and balancing are systematically presented, as well as a comparison of feature selection techniques with grid search. We present not only the best attributes for each model, but also the best configuration for each scenario. We also provide a discussion comparing the model trained with the best features selected by the sequential feature algorithm (SFA) techniques with a model designed with features selected by health specialists.

The present work proposes different ML models and compares them for multi-class classification of Dengue, Chikungunya and other diseases, using the clinical and socio-demographic data of the patients. The objective is to assist the physician in a rapid diagnosis at the time of arrival of the patient at the health unit by providing an auxiliary tool for decision making.

2 Materials and Methods

2.1 Feature Selection

Feature selection is a technique that is used to reduce the dimensionality of the data set, which leads to better learning performance and/or lower computational cost. This technique selects the most relevant attributes in the data set by removing noisy, irrelevant and redundant features (11). Different feature selection techniques can be found in the literature, and can be categorised according to the search strategy. There are three main approaches: filter, wrapper, and embedded (11).

In this work, the wrapper approach is used, since it makes use of a learning algorithm to determine the best subset of attributes (also called features), where candidate subsets are usually evaluated in terms of predictive accuracy. Because it depends on a learning model, this type of approach can become computationally expensive, though it is less likely to select irrelevant features (12). Among the wrapper techniques, we used the SFA. This technique has four variants, which differ in the way they add features to or remove features from the data set: sequential forward selection (SFS), sequential backward selection (SBS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS).
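As a rough illustration of the wrapper idea, the sketch below implements plain sequential forward selection (SFS) only; the backward and floating variants are not shown. It is not the implementation used in this study. The `toy_score` function and its gain values are invented for the example and stand in for the cross-validated accuracy that a real learning model would return for a given subset.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> Tuple[List[str], float]:
    """Greedy SFS: start from the empty set and repeatedly add the single
    feature whose inclusion yields the highest score, until k are selected."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < k:
        # Score every candidate subset obtained by adding one more feature.
        scored = [(score(frozenset(selected + [f])), f) for f in remaining]
        best_score, best_feature = max(scored, key=lambda pair: pair[0])
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, best_score


# Hypothetical per-feature gains (made up for illustration only).
TOY_GAIN = {"ARTRALGIA": 0.20, "FEBRE": 0.10, "DIAS": 0.05, "CS_SEXO": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    # Stand-in for the cross-validated accuracy of a model trained on `subset`.
    return 0.33 + sum(TOY_GAIN[f] for f in subset)


chosen, final_score = sequential_forward_selection(list(TOY_GAIN), toy_score, k=2)
print(chosen, round(final_score, 2))
```

In a real wrapper, each call to the scoring function retrains and validates the model, which is why this family of methods becomes expensive as the number of attributes grows.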

2.2 Grid Search

Grid search is an exhaustive search technique for setting the hyperparameters of a given model: every combination of the candidate values is evaluated. With it, it is possible to analyse the results of an ML model under each configuration, and then decide which configuration best fits the target problem. According to Bergstra and Bengio (13), despite having limitations, this technique is widely used along with the manual search technique.

2.3 Machine Learning Techniques

ML is a branch of artificial intelligence that is composed of several techniques that have been widely used for pattern learning (8, 14–18). The ML models used in this work are Random Forest (RF), Adaptive Boosting (Adaboost), Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (Xgboost), k-Nearest Neighbours (KNN), Naive Bayes (NB) and Multilayer Perceptron (MLP).

2.4 Evaluation Metrics

The following metrics are used: accuracy, precision, sensitivity (recall) and F1-score. For every metric except accuracy, both the per-class value and the macro average across classes are analysed.
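To make the per-class and macro-averaged metrics concrete, the following sketch computes them from a three-class confusion matrix. The matrix values are hypothetical and are not results from this study; the helper names are likewise illustrative.

```python
from typing import Dict, List


def accuracy(confusion: List[List[int]]) -> float:
    """Share of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_metrics(confusion: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                    # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp     # predicted k, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def macro_average(per_class: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of a metric over the classes."""
    return sum(m[metric] for m in per_class.values()) / len(per_class)


labels = ["DENGUE", "CHIKUNGUNYA", "OTHERS"]
toy_confusion = [  # hypothetical counts, rows = true class, columns = predicted class
    [60, 10, 30],
    [10, 80, 10],
    [30, 10, 60],
]
metrics = per_class_metrics(toy_confusion, labels)
macro_recall = macro_average(metrics, "recall")
overall_accuracy = accuracy(toy_confusion)
```

Because the macro average weights every class equally, it equals the accuracy only when the classes are the same size, as they are in this toy matrix.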

2.5 Data Set

In this work, data regarding Dengue and Chikungunya notifications from the state of Amazonas and the city of Recife, Pernambuco from 2015 to 2020 are used. Regarding the state of Amazonas, data were retrieved from the Health Problem and Notification Information System, from Portuguese Sistema de Informação de Agravo de Notificação (SINAN)^⁵. SINAN is the official system for disease reporting in Brazil. Diseases from the national list of compulsory notification must be reported, and this list includes Dengue and Chikungunya. This data set contains 57,445 entries and 146 variables and hereafter is referred to as “SINAN-db”.

The data set for Recife was retrieved from an open data set named Portal de Dados Abertos do Recife (19), maintained by the Recife Health Department, whose primary source is also the SINAN, and therefore it follows the same dictionary pattern, and allows integration without further issues. This data set contains 83,073 registers and 124 variables and is referred to as “Recife-db” in this work.

Figure 1 illustrates the steps taken during the pre-processing of the data set. First, both data sets were integrated. Variables available in only one of the data sets were disregarded. The resulting data set from the integration of SINAN-db and Recife-db has 140,518 registers and 120 variables.

Figure 1

Data set pre-processing steps.

The output classes were grouped into three distinct classes:

DENGUE: Patients with confirmed Dengue;

CHIKUNGUNYA: Patients with confirmed Chikungunya; and

OTHERS: Patients classified as “inconclusive” or “negative” for both Dengue and Chikungunya.

Only records confirmed or ruled out by clinical diagnosis were selected. Records that did not report any signs or symptoms were discarded, since signs and symptoms are the most important information for the classification models. Moreover, variables with more than 50% of data missing were also removed. Besides the original variables, a new one (DIAS) was created so that the time (in days) from the onset of symptoms to the date of notification could be added to the models. For the selection of attributes, specialists were consulted. After coding variables as numbers, duplicates were removed, and missing values were replaced by “not informed” for each variable. Records with missing values for all variables were also removed. Finally, the clean data set consisted of 17,948 records in the DENGUE class, 5,724 in the CHIKUNGUNYA class and 16,704 in the OTHERS class, totalling 40,376 records with 27 variables. In data science, a substantially higher number of records in one class than in another within the same data set is known as imbalance, and it can bias the ML model towards the class with the largest number of records (20).
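Three of the pre-processing steps described above (deriving DIAS, dropping variables with more than 50% missing data, and filling the remaining gaps) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the date field names `DT_SIN_PRI` and `DT_NOTIFIC`, the column `HYPOTHETICAL_SPARSE`, and the toy records are all assumptions made for the example.

```python
from datetime import date


def add_dias(row, onset_key="DT_SIN_PRI", notification_key="DT_NOTIFIC"):
    """Return a copy of the record with DIAS = days from symptom onset to notification.
    The two date field names are assumed, not taken from the text."""
    out = dict(row)
    out["DIAS"] = (row[notification_key] - row[onset_key]).days
    return out


def drop_sparse_columns(rows, threshold=0.5):
    """Remove every column whose share of missing (None) values exceeds the threshold."""
    columns = sorted({c for r in rows for c in r})
    keep = [
        c for c in columns
        if sum(r.get(c) is None for r in rows) / len(rows) <= threshold
    ]
    return [{c: r.get(c) for c in keep} for r in rows]


def fill_missing(rows, placeholder="not informed"):
    """Replace the remaining missing values with an explicit placeholder."""
    return [{c: (placeholder if v is None else v) for c, v in r.items()} for r in rows]


raw = [  # toy records, invented for illustration
    {"DT_SIN_PRI": date(2019, 3, 1), "DT_NOTIFIC": date(2019, 3, 4),
     "FEBRE": 1, "HYPOTHETICAL_SPARSE": None},
    {"DT_SIN_PRI": date(2019, 3, 10), "DT_NOTIFIC": date(2019, 3, 10),
     "FEBRE": None, "HYPOTHETICAL_SPARSE": None},
    {"DT_SIN_PRI": date(2019, 5, 2), "DT_NOTIFIC": date(2019, 5, 9),
     "FEBRE": 2, "HYPOTHETICAL_SPARSE": 1},
]

cleaned = fill_missing(drop_sparse_columns([add_dias(r) for r in raw]))
```

Here `HYPOTHETICAL_SPARSE` is missing in two of three records and is dropped, while `FEBRE` is missing in only one and is kept with the placeholder filled in.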

In order to balance the data set, the random undersampling technique was performed. In this technique, the class with the fewest records defines the size of the other classes, so that all classes end up with the same number of records. After balancing, the data set still had 27 attributes and 17,172 records, with 5,724 for each of the three classes. The 27 variables resulting from the pre-processing are described in Table 1 . The data set can be accessed in Mendeley Data (21).
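The balancing step can be reproduced in a few lines. The sketch below is a generic random undersampler rather than the code used in the study; only the class sizes come from the text, and the seed is an arbitrary choice.

```python
import random
from collections import Counter, defaultdict


def random_undersample(labels, seed=0):
    """Return sorted indices of a balanced subset: every class is randomly
    reduced to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))  # sampling without replacement
    return sorted(kept)


# Class sizes of the clean data set before balancing, as reported in the text.
labels = ["DENGUE"] * 17948 + ["CHIKUNGUNYA"] * 5724 + ["OTHERS"] * 16704
kept = random_undersample(labels)
balanced_counts = Counter(labels[i] for i in kept)
print(len(kept), dict(balanced_counts))
```

The minority class (CHIKUNGUNYA) is kept in full, so the balanced set has 3 × 5,724 = 17,172 records, at the cost of discarding 23,204 records from the two larger classes.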

Table 1

Database attributes after pre-processing.

Attribute	Description
NU_IDADE_N	Patient age
CS_SEXO	Patient sex
CS_GESTANT	Gestational age of the patient (trimester), in case CS_SEXO=F
CS_RACA	Patient Race
CS_ZONA	Residence area
FEBRE	Symptom - Fever
MIALGIA	Symptom - Myalgia
CEFALEIA	Symptom - Headache
EXANTEMA	Symptom - Rash
VOMITO	Symptom - Vomiting
NAUSEA	Symptom - Nausea
DOR_COSTAS	Symptom - Back Pain
CONJUNTVIT	Symptom - Conjunctivitis
ARTRITE	Symptom - Arthritis
ARTRALGIA	Symptom - Arthralgia
PETEQUIA_N	Symptom - Petechiae
LACO	Symptom - Tourniquet test
DOR_RETRO	Symptom - Retro-orbital (eye) pain
DIABETES	Pre-existing disease - Diabetes
HEMATOLOG	Pre-existing disease - Haematological diseases
HEPATOPAT	Pre-existing disease - Liver diseases
RENAL	Pre-existing disease - Kidney disease
HIPERTENSA	Pre-existing disease - Hypertension
ACIDO_PEPT	Pre-existing disease - Peptic acid disease
AUTO_IMUNE	Pre-existing disease - autoimmune disease
DIAS	Days from symptom onset to notification
CLASSI_FIN	Final patient classification

2.6 Experiments

The experiment is divided into three main steps: (a) optimisation of hyperparameters and attribute selection, using Grid Search and SFA; (b) evaluation of model performance; and (c) specialist evaluation.

2.6.1 Optimisation of Hyperparameters and Attribute Selection

The grid search technique was performed for each model individually. For each model, not only were the combinations of hyperparameters tested, but we also determined which SFA technique offers the best subset of attributes.

Figure 2 illustrates how the grid search process was executed considering the model’s hyperparameters together with the SFA techniques. We used the Python library sklearn GridSearchCV^⁶, using the training set (70% of the data set). The cross-validation technique (22) with k=10 was used. At the end of the grid search of each model, the result was the best combination of model hyperparameters and the best subset of data set attributes for the same configuration.

Figure 2

Grid Search flowchart with SFA.
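The data partitioning behind this procedure (a 70% training set, then 10-fold cross-validation inside it) can be sketched with index lists alone. This is a simplified, non-stratified illustration and not the `GridSearchCV` internals; whether the original split was stratified or which seed was used is not stated, so both are assumptions here.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n-1 and cut them into training and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; every index is used for
    validation in exactly one of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 17,172 is the size of the balanced data set reported in Section 2.5.
train_idx, test_idx = train_test_split_indices(17172)
splits = list(k_fold(train_idx, k=10))
print(len(train_idx), len(test_idx), len(splits[0][1]))
```

Each candidate configuration is scored on the ten validation folds only, so the held-out 30% is never seen during the search.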

Table 2 shows the hyperparameters of each model that were tested in the grid search and their respective value ranges. All models, except Xgboost, were executed using the Python library sklearn.

Table 2

Parameters used in Grid Search.

Model	Parameters	Values
Adaboost	learning_rate	[0.36, 1, 1.5]
Adaboost	n_estimators	[25, 50, 100]
RF	criterion	[gini, entropy]
RF	n_estimators	[50, 100, 200]
GBM	max_depth	[1, 3, 5]
GBM	n_estimators	[50, 100, 200]
Xgboost	eta	[0.3, 0.5]
Xgboost	max_depth	[2, 6]
KNN	metric	[euclidean, manhattan]
KNN	n_neighbors	[2, 5, 10]
KNN	weights	[uniform, distance]
MLP	hidden_layer_sizes	[(100), (100,100), (100,100,100)]
MLP	learning_rate_init	[0.001, 0.01, 0.1]
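To give a sense of the size of the search, the sketch below enumerates the GBM grid from Table 2 and crosses it with the four SFA variants. Crossing every hyperparameter combination with every SFA technique is our reading of the procedure and should be treated as an assumption; the helper name `expand_grid` is illustrative.

```python
from itertools import product

# GBM value ranges as listed in Table 2.
gbm_grid = {"max_depth": [1, 3, 5], "n_estimators": [50, 100, 200]}
sfa_techniques = ["SFS", "SBS", "SFFS", "SBFS"]


def expand_grid(grid):
    """Return one dict per combination of the candidate values (exhaustive search)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


# Assumption: each hyperparameter combination is paired with each SFA variant.
configs = [
    {"sfa": sfa, **params}
    for sfa in sfa_techniques
    for params in expand_grid(gbm_grid)
]
print(len(expand_grid(gbm_grid)), len(configs))
```

Under this assumption GBM alone has 9 hyperparameter combinations and 36 configurations, each of which is cross-validated, and every SFA run additionally trains the model many times while it searches for a subset.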

The Adaboost was executed with the AdaBoostClassifier^⁷ and two hyperparameters were tested: learning_rate and n_estimators. n_estimators is the maximum number of stumps that the model produces during training, and learning_rate is a weight applied to each stump at each iteration: the higher the learning_rate, the greater the contribution of each stump. Low values decrease correct classification, while high values are associated with model instability (23).

The RF was executed with the RandomForestClassifier^⁸ and two hyperparameters were tested: criterion and n_estimators. n_estimators is the number of decision trees (DTs) that the model builds, and criterion is the function that determines the best split at each node.

The GBM was executed with the GradientBoostingClassifier^⁹, and two hyperparameters were tested: max_depth and n_estimators. max_depth is the maximum depth of each DT within the model; the deeper the tree, the more nodes it can have. n_estimators, as in Adaboost and RF, is the number of DTs that the model produces.

The Xgboost was executed with the Python library XGBoost^¹⁰ and two hyperparameters were tested: max_depth and eta. max_depth, as in GBM, is the maximum depth of each DT within the model, and eta, also known as the learning rate, is the step-size shrinkage applied at each update to prevent overfitting.

The KNN was executed with the KNeighborsClassifier^¹¹ and three hyperparameters were tested, namely, metric, n_neighbors and weights. n_neighbors is the number of neighbours consulted for each prediction, weights is the function that determines how much each neighbour contributes to the prediction, and metric is the function used to calculate the distance to each neighbour.

The MLP was executed with the MLPClassifier^¹² and two hyperparameters were tested, in this case, hidden_layer_sizes and learning_rate_init. hidden_layer_sizes defines the number of hidden layers and the number of neurons in each layer. learning_rate_init is the initial learning rate, which controls the step size of the weight updates during training.

Lastly, the NB^¹³ was executed with GaussianNB. As no hyperparameters were tuned for NB, the Grid Search of this model was executed only with the SFA techniques.

2.6.2 Evaluation of Models

After the execution of the grid search, the models were evaluated using the remaining 30% of the data set that was not part of the training, which was called the test set. The models were evaluated using the metrics described in subsection 2.4. The tests were executed 30 times and the metrics were averaged in order to be compared. The model chosen was the one that best fitted the needs of the experiment. After that, the model was submitted to specialists so that the application in the health care routine could be assessed.
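The repeat-and-average protocol can be sketched as below. What varied between the 30 runs (for example the random seed) is not stated, so passing a seed to the evaluation function is an assumption, and `toy_evaluate` is a hypothetical placeholder that returns made-up numbers instead of training and testing a real model.

```python
from statistics import mean


def average_over_runs(evaluate, runs=30):
    """Call evaluate(seed) for seeds 0..runs-1 and average each metric."""
    results = [evaluate(seed) for seed in range(runs)]
    return {name: mean(r[name] for r in results) for name in results[0]}


def toy_evaluate(seed):
    # Hypothetical placeholder: a real version would fit the tuned model
    # and score it on the test set for the given seed.
    return {"accuracy": 0.60 + 0.01 * (seed % 3), "recall": 0.61}


averaged = average_over_runs(toy_evaluate)
```

Averaging over repeated runs reduces the influence of any single favourable or unfavourable random initialisation on the reported figures.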

3 Results

In low-income countries and limited-resource settings, physicians often have to make a diagnosis using only clinical parameters, without laboratory data support. ML techniques can aid in the classification of arboviral diseases using only these clinical parameters. Therefore, this work evaluated seven ML techniques using only clinical and socio-demographic features.

Overall and per-disease baseline characteristics are presented in Table 3 . They show a mean age over 30 years and a predominance of men and of urban residents in every class. Fever (85.3%), headache (60.6%), myalgia (58.4%), and arthralgia (51.1%) were the most frequent symptoms.

Table 3

Clinical and socio-demographic findings of patients at baseline.

Variables	Total (N=17172)	Dengue (N=5724)	Chikungunya (N=5724)	Others (N=5724)
Gender: women, %	7267/17172 (42.3)	2540/5724 (44.4)	2200/5724 (38.4)	2527/5724 (44.1)
Age, Mean (SD)	32.6 (20.1)	31.0 (19.8)	36.6 (20.0)	30.1 (19.9)
Race, %
White	690/17172 (4.0)	223/5724 (3.9)	203/5724 (3.5)	264/5724 (4.6)
Black	156/17172 (0.9)	53/5724 (0.9)	56/5724 (1.0)	47/5724 (0.8)
Yellow	34/17172 (0.2)	10/5724 (0.2)	11/5724 (0.2)	13/5724 (0.2)
Admixed	5292/17172 (30.8)	1806/5724 (31.6)	954/5724 (16.7)	2532/5724 (44.2)
Indigenous	176/17172 (1.0)	104/5724 (1.8)	22/5724 (0.4)	50/5724 (0.9)
Missing	10824/17172 (63.0)	3528/5724 (61.6)	4478/5724 (78.2)	2818/5724 (49.2)
Pregnant, %
1st trimester	53/17172 (0.3)	9/5724 (0.2)	13/5724 (0.2)	31/5724 (0.5)
2nd trimester	77/17172 (0.4)	25/5724 (0.4)	22/5724 (0.4)	30/5724 (0.5)
3rd trimester	75/17172 (0.4)	17/5724 (0.3)	27/5724 (0.5)	31/5724 (0.5)
Ignored gestational age	19/17172 (0.1)	4/5724 (0.1)	7/5724 (0.1)	8/5724 (0.1)
Missing	16948/17172 (98.7)	5669/5724 (99.0)	5655/5724 (98.8)	5624/5724 (98.3)
Residence area, %
Urban	14658/17172 (85.4)	4775/5724 (83.4)	5187/5724 (90.6)	4696/5724 (82.0)
Rural	175/17172 (1.0)	27/5724 (0.5)	9/5724 (0.2)	139/5724 (2.4)
Periurban	5/17172 (0.0)	2/5724 (0.0)	2/5724 (0.0)	1/5724 (0.0)
Missing	2334/17172 (13.6)	920/5724 (16.1)	526/5724 (9.2)	888/5724 (15.5)
Fever, %	14647/17172 (85.3)	5190/5724 (90.7)	5300/5724 (92.6)	4157/5724 (72.6)
Myalgia, %	10029/17172 (58.4)	3948/5724 (69.0)	3364/5724 (58.8)	2717/5724 (47.5)
Headache, %	10406/17172 (60.6)	4020/5724 (70.2)	3316/5724 (57.9)	3070/5724 (53.6)
Rash, %	4395/17172 (25.6)	1765/5724 (30.8)	1637/5724 (28.6)	993/5724 (17.3)
Vomit, %	3312/17172 (19.3)	1440/5724 (25.2)	992/5724 (17.3)	880/5724 (15.4)
Nausea, %	3517/17172 (20.5)	1610/5724 (28.1)	1076/5724 (18.8)	831/5724 (14.5)
Back pain, %	2612/17172 (15.2)	1088/5724 (19.0)	877/5724 (15.3)	647/5724 (11.3)
Conjunctivitis, %	678/17172 (3.9)	297/5724 (5.2)	222/5724 (3.9)	159/5724 (2.8)
Arthritis, %	1641/17172 (9.6)	638/5724 (11.1)	715/5724 (12.5)	288/5724 (5.0)
Arthralgia, %	8770/17172 (51.1)	2394/5724 (41.8)	4890/5724 (85.4)	1486/5724 (26.0)
Petechiae, %	802/17172 (4.7)	421/5724 (7.4)	211/5724 (3.7)	170/5724 (3.0)
Tourniquet test, %	290/17172 (1.7)	207/5724 (3.6)	38/5724 (0.7)	45/5724 (0.8)
Retroorbital pain, %	2555/17172 (14.9)	1407/5724 (24.6)	622/5724 (10.9)	526/5724 (9.2)
Diabetes, %	216/17172 (1.3)	57/5724 (1.0)	103/5724 (1.8)	56/5724 (1.0)
Haematological diseases, %	58/17172 (0.3)	22/5724 (0.4)	16/5724 (0.3)	20/5724 (0.3)
Liver diseases, %	72/17172 (0.4)	21/5724 (0.4)	25/5724 (0.4)	26/5724 (0.5)
Kidney disease, %	50/17172 (0.3)	10/5724 (0.2)	20/5724 (0.3)	20/5724 (0.3)
Hypertension, %	454/17172 (2.6)	128/5724 (2.2)	191/5724 (3.3)	135/5724 (2.4)
Peptic acid disease, %	97/17172 (0.6)	27/5724 (0.5)	28/5724 (0.5)	42/5724 (0.7)
Autoimmune disease, %	42/17172 (0.2)	10/5724 (0.2)	16/5724 (0.3)	16/5724 (0.3)
Symptom time in days, Mean (SD)	21.0 (217.3)	17.0 (32.8)	22.6 (58.2)	23.3 (370.5)

Our results are presented in three parts: (a) the results obtained from each model using grid search; (b) evaluation of the models using the configurations found by the grid search; and (c) comparison of the best model with a model designed with features selected by health specialists.

3.1 Grid Search

Table 4 presents the results from the Grid Search technique of the seven models: Adaboost, RF, GBM, Xgboost, KNN, MLP and NB.

Table 4

Results from Grid Search.

Model	Hyperparameters	No. of attributes	SFA	Accuracy
Adaboost	learning_rate: 0.36	10	SBS	0.5972
Adaboost	n_estimators: 25	10	SBS	0.5972
RF	criterion: gini	16	SFFS	0.6061
RF	n_estimators: 200	16	SFFS	0.6061
GBM	max_depth: 3	18	SFFS	0.6218
GBM	n_estimators: 200	18	SFFS	0.6218
Xgboost	eta: 0.3	20	SFFS	0.6230
Xgboost	max_depth: 2	20	SFFS	0.6230
KNN	metric: euclidean	19	SBS	0.5739
KNN	n_neighbors: 2	19	SBS	0.5739
KNN	weights: uniform	19	SBS	0.5739
MLP	hidden_layer_sizes: (100)	15	SFFS	0.6153
MLP	learning_rate_init: 0.1	15	SFFS	0.6153
NB	–	10	SBFS	0.585

Regarding SFA, the techniques selected most often were SFFS (four models) and SBS (two models). The size of the subset of attributes ranged between 10 and 20 attributes. The most common attributes were CS_RACA, FEBRE, NAUSEA, ARTRALGIA and DOR_RETRO, which appeared in all seven subsets, followed by CS_ZONA and EXANTEMA, which appeared in six. Table 5 shows the attributes selected by the SFA techniques for each model.

Table 5

Attributes selected by the SFA techniques for each model.

Model	Attributes
Adaboost	NU_IDADE_N, CS_RACA, CS_ZONA, FEBRE, CEFALEIA, EXANTEMA, NAUSEA, ARTRALGIA, LACO, DOR_RETRO
RF	CS_RACA, CS_ZONA, FEBRE, MIALGIA, CEFALEIA, EXANTEMA, NAUSEA, ARTRITE, ARTRALGIA, PETEQUIA_N, DOR_RETRO, DIABETES, HEMATOLOG, HEPATOPAT, RENAL, AUTO_IMUNE
GBM	CS_RACA, CS_ZONA, FEBRE, MIALGIA, CEFALEIA, EXANTEMA, NAUSEA, DOR_COSTAS, CONJUNTVIT, ARTRITE, ARTRALGIA, PETEQUIA_N, DOR_RETRO, DIABETES, HIPERTENSA, ACIDO_PEPT, AUTO_IMUNE, DIAS
Xgboost	NU_IDADE_N, CS_RACA, CS_ZONA, FEBRE, MIALGIA, CEFALEIA, EXANTEMA, VOMITO, NAUSEA, DOR_COSTAS, CONJUNTVIT, ARTRITE, ARTRALGIA, PETEQUIA_N, DOR_RETRO, DIABETES, HEMATOLOG, HIPERTENSA, ACIDO_PEPT, DIAS
KNN	CS_GESTANT, CS_RACA, CS_ZONA, FEBRE, MIALGIA, CEFALEIA, VOMITO, NAUSEA, DOR_COSTAS, CONJUNTVIT, ARTRITE, ARTRALGIA, PETEQUIA_N, LACO, DOR_RETRO, DIABETES, HEMATOLOG, HIPERTENSA, ACIDO_PEPT
MLP	CS_SEXO, CS_RACA, FEBRE, MIALGIA, CEFALEIA, EXANTEMA, VOMITO, NAUSEA, ARTRALGIA, PETEQUIA_N, LACO, DOR_RETRO, DIABETES, HEMATOLOG, HEPATOPAT
NB	CS_RACA, CS_ZONA, FEBRE, MIALGIA, EXANTEMA, NAUSEA, ARTRALGIA, LACO, DOR_RETRO, ACIDO_PEPT

The best-performing model in the grid search was Xgboost, using the SFFS technique with 20 attributes (the largest subset size in this experiment), eta = 0.3 and max_depth = 2, which obtained 62.3% accuracy. On the other hand, the KNN model with 19 attributes, selected by the SBS technique, metric = euclidean, n_neighbors = 2 and weights = uniform, was the worst model in the grid search, with 57.39% accuracy.

3.2 Evaluation of Models

Table 6 presents the accuracy and the macro medians of recall, precision and F1-score. The GBM model outperformed all the other models. It is interesting to note that the MLP model showed poor performance in comparison with the result it presented in the grid search. This difference may indicate that the MLP model failed to generalize during training and that underfitting probably occurred; as a consequence, the MLP model did not perform well on the test set.

Table 6

Accuracy and macro median of recall, precision, and F1-score.

Model	Accuracy	Recall	Precision	F1-score
Adaboost	0.5879	0.5903	0.5837	0.5782
RF	0.6011	0.6033	0.5965	0.5949
GBM	0.6240	0.6257	0.6205	0.6196
Xgboost	0.6153	0.6173	0.6116	0.6093
KNN	0.5411	0.5410	0.5519	0.5222
MLP	0.5380	0.5424	0.5569	0.4967
NB	0.5798	0.5833	0.5782	0.5704