Using topological data analysis for diagnosis pulmonary embolism

Matteo Rucco, Emanuela Merelli, Damir Herman, Devi Ramanan, Tanya Petrossian, Lorenzo Falsetti, Cinzia Nitti, Aldo Salvi



M. Rucco, E. Merelli, D. Herman, D. Ramanan, T. Petrossian, L. Falsetti, C. Nitti, A. Salvi, "Using topological data analysis for diagnosis pulmonary embolism", Journal of Theoretical and Applied Computer Science, vol. 9, no. 1, pp. 41-55, 2015.



Clinical Prediction Rule (CPR), Pulmonary Embolism, Topological Data Analysis, Artificial Neural Network (ANN), Computer Aided Diagnosis (CAD)


Pulmonary Embolism (PE) is a common and potentially lethal condition. Most deaths occur within the first few hours after the event. Despite diagnostic advances, delays and underdiagnosis in PE are common. Moreover, many investigations pursued on suspicion of PE turn out negative, and no more than 10% of the pulmonary angio-CT scans performed for suspected PE confirm the diagnosis. To increase diagnostic performance in PE, the current diagnostic work-up of patients with suspected acute pulmonary embolism usually starts with the assessment of clinical pretest probability using plasma D-dimer measurement and clinical prediction rules. The most validated and widely used clinical decision rules are the Wells and revised Geneva scores. However, both indices have limitations. We aimed to develop a new clinical prediction rule (CPR) for PE using a new approach to feature selection that combines topological concepts with an artificial neural network. Filter and wrapper methods for feature reduction cannot be applied to our dataset, because these algorithms require datasets without missing data. Alternatively, eliminating the rows with null values would reduce the sample size significantly and result in a singular covariance matrix. Instead, we applied topological data analysis (TDA) to overcome the hurdle of processing datasets with missing data. A topological network was developed using the Ayasdi-Iris software (Ayasdi, Inc., Palo Alto). The PE patient topology identified two flares in the pathological group and hence two distinct clusters of PE patient populations. Additionally, the topological network detected several sub-groups among the non-PE patients, who are likely affected by other diseases and still need to be diagnosed properly even though they do not have PE. In a future study we will also introduce survival curves for the patients.
TDA was further used to identify the key features most strongly associated with a diagnosis of PE, and this information was used to define the input space for a back-propagation artificial neural network (BP-ANN). The area under the curve (AUC) of the BP-ANN is greater than the AUCs of the scores (Wells and revised Geneva) used by physicians. The results demonstrate that topological data analysis and the BP-ANN, when used in combination, can produce better predictive models than the Wells or revised Geneva scoring systems for the analyzed cohort. The new CPR can help physicians predict the probability of PE.
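The AUC comparison mentioned above can be illustrated with a minimal sketch. It assumes the AUC is the area under the ROC curve and computes it as the probability that a randomly chosen PE case is ranked above a randomly chosen non-PE case (the Mann-Whitney formulation). The function name `auc` and all numbers below are hypothetical toy values chosen for illustration; they are not data or results from the study, and this is not the authors' implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    labels: 1 for a confirmed case, 0 otherwise.
    scores: higher values mean higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy values, NOT taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
rule_scores = [4.5, 1.5, 3.0, 3.0, 1.0, 0.0]   # e.g. a point-based clinical score
model_probs = [0.9, 0.6, 0.7, 0.4, 0.2, 0.1]   # e.g. a classifier's output

print("AUC of point-based score:", auc(labels, rule_scores))
print("AUC of classifier output:", auc(labels, model_probs))
```

Because the pairwise formulation depends only on the ranking of the scores, a point-based rule and a probability output can be compared on the same cohort without rescaling either one.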