List of Contributors
Series Preface
Preface

1 Intelligent Data Analysis: Black Box Versus White Box Modeling
Sarthak Gupta, Siddhant Bagga, and Deepak Kumar Sharma
1.1 Introduction
1.1.1 Intelligent Data Analysis
1.1.2 Applications of IDA and Machine Learning
1.1.3 White Box Models Versus Black Box Models
1.1.4 Model Interpretability
1.2 Interpretation of White Box Models
1.2.1 Linear Regression
1.2.2 Decision Tree
1.3 Interpretation of Black Box Models
1.3.1 Partial Dependence Plot
1.3.2 Individual Conditional Expectation
1.3.3 Accumulated Local Effects
1.3.4 Global Surrogate Models
1.3.5 Local Interpretable Model-Agnostic Explanations
1.3.6 Feature Importance
1.4 Issues and Further Challenges
1.5 Summary
References

2 Data: Its Nature and Modern Data Analytical Tools
Ravinder Ahuja, Shikhar Asthana, Ayush Ahuja, and Manu Agarwal
2.1 Introduction
2.2 Data Types and Various File Formats
2.2.1 Structured Data
2.2.2 Semi-Structured Data
2.2.3 Unstructured Data
2.2.4 Need for File Formats
2.2.5 Various Types of File Formats
2.2.5.1 Comma Separated Values (CSV)
2.2.5.2 ZIP
2.2.5.3 Plain Text (txt)
2.2.5.4 JSON
2.2.5.5 XML
2.2.5.6 Image Files
2.2.5.7 HTML
2.3 Overview of Big Data
2.3.1 Sources of Big Data
2.3.1.1 Media
2.3.1.2 The Web
2.3.1.3 Cloud
2.3.1.4 Internet of Things
2.3.1.5 Databases
2.3.1.6 Archives
2.3.2 Big Data Analytics
2.3.2.1 Descriptive Analytics
2.3.2.2 Predictive Analytics
2.3.2.3 Prescriptive Analytics
2.4 Data Analytics Phases
2.5 Data Analytical Tools
2.5.1 Microsoft Excel
2.5.2 Apache Spark
2.5.3 Open Refine
2.5.4 R Programming
2.5.4.1 Advantages of R
2.5.4.2 Disadvantages of R
2.5.5 Tableau
2.5.5.1 How Tableau Works
2.5.5.2 Tableau Feature
2.5.5.3 Advantages
2.5.5.4 Disadvantages
2.5.6 Hadoop
2.5.6.1 Basic Components of Hadoop
2.5.6.2 Benefits
2.6 Database Management System for Big Data Analytics
2.6.1 Hadoop Distributed File System
2.6.2 NoSQL
2.6.2.1 Categories of NoSQL
2.7 Challenges in Big Data Analytics
2.7.1 Storage of Data
2.7.2 Synchronization of Data
2.7.3 Security of Data
2.7.4 Fewer Professionals
2.8 Conclusion
References

3 Statistical Methods for Intelligent Data Analysis: Introduction and Various Concepts
Shubham Kumaram, Samarth Chugh, and Deepak Kumar Sharma
3.1 Introduction
3.2 Probability
3.2.1 Definitions
3.2.1.1 Random Experiments
3.2.1.2 Probability
3.2.1.3 Probability Axioms
3.2.1.4 Conditional Probability
3.2.1.5 Independence
3.2.1.6 Random Variable
3.2.1.7 Probability Distribution
3.2.1.8 Expectation
3.2.1.9 Variance and Standard Deviation
3.2.2 Bayes’ Rule
3.3 Descriptive Statistics
3.3.1 Picture Representation
3.3.1.1 Frequency Distribution
3.3.1.2 Simple Frequency Distribution
3.3.1.3 Grouped Frequency Distribution
3.3.1.4 Stem and Leaf Display
3.3.1.5 Histogram and Bar Chart
3.3.2 Measures of Central Tendency
3.3.2.1 Mean
3.3.2.2 Median
3.3.2.3 Mode
3.3.3 Measures of Variability
3.3.3.1 Range
3.3.3.2 Box Plot
3.3.3.3 Variance and Standard Deviation
3.3.4 Skewness and Kurtosis
3.4 Inferential Statistics
3.4.1 Frequentist Inference
3.4.1.1 Point Estimation
3.4.1.2 Interval Estimation
3.4.2 Hypothesis Testing
3.4.3 Statistical Significance
3.5 Statistical Methods
3.5.1 Regression
3.5.1.1 Linear Model
3.5.1.2 Nonlinear Models
3.5.1.3 Generalized Linear Models
3.5.1.4 Analysis of Variance
3.5.1.5 Multivariate Analysis of Variance
3.5.1.6 Log-Linear Models
3.5.1.7 Logistic Regression
3.5.1.8 Random Effects Model
3.5.1.9 Overdispersion
3.5.1.10 Hierarchical Models
3.5.2 Analysis of Survival Data
3.5.3 Principal Component Analysis
3.6 Errors
3.6.1 Error in Regression
3.6.2 Error in Classification
3.7 Conclusion
References

4 Intelligent Data Analysis with Data Mining: Theory and Applications
Shivam Bachhety, Ramneek Singhal, and Rachna Jain
Objective
4.1 Introduction to Data Mining
4.1.1 Importance of Intelligent Data Analytics in Business
4.1.2 Importance of Intelligent Data Analytics in Health Care
4.2 Data and Knowledge
4.3 Discovering Knowledge in Data Mining
4.3.1 Process Mining
4.3.2 Process of Knowledge Discovery
4.4 Data Analysis and Data Mining
4.5 Data Mining: Issues
4.6 Data Mining: Systems and Query Language
4.6.1 Data Mining Systems
4.6.2 Data Mining Query Language
4.7 Data Mining Methods
4.7.1 Classification
4.7.2 Cluster Analysis
4.7.3 Association
4.7.4 Decision Tree Induction
4.8 Data Exploration
4.9 Data Visualization
4.10 Probability Concepts for Intelligent Data Analysis (IDA)
Reference

5 Intelligent Data Analysis: Deep Learning and Visualization
Than D. Le and Huy V. Pham
5.1 Introduction
5.2 Deep Learning and Visualization
5.2.1 Linear and Logistic Regression and Visualization
5.2.2 CNN Architecture
5.2.2.1 Vanishing Gradient Problem
5.2.2.2 Convolutional Neural Networks (CNNs)
5.2.3 Reinforcement Learning
5.2.4 Inception and ResNet Networks
5.2.5 Softmax
5.3 Data Processing and Visualization
5.3.1 Regularization for Deep Learning and Visualization
5.3.1.1 Regularization for Linear Regression
5.4 Experiments and Results
5.4.1 Mask RCNN Based on Object Detection and Segmentation
5.4.2 Deep Matrix Factorization
5.4.2.1 Network Visualization
5.4.3 Deep Learning and Reinforcement Learning
5.5 Conclusion
References

6 A Systematic Review on the Evolution of Dental Caries Detection Methods and Its Significance in Data Analysis Perspective
Soma Datta, Nabendu Chaki, and Biswajit Modak
6.1 Introduction
6.1.1 Analysis of Dental Caries
6.2 Different Caries Lesion Detection Methods and Data Characterization
6.2.1 Point Detection Method
6.2.2 Visible Light Property Method
6.2.3 Radiographs
6.2.4 Light-Emitting Devices
6.2.5 Optical Coherent Tomography (OCT)
6.2.6 Software Tools
6.3 Technical Challenges with the Existing Methods
6.3.1 Challenges in Data Analysis Perspective
6.4 Result Analysis
6.5 Conclusion
Acknowledgment
References

7 Intelligent Data Analysis Using Hadoop Cluster-Inspired MapReduce Framework and Association Rule Mining on Educational Domain
Pratiyush Guleria and Manu Sood
7.1 Introduction
7.1.1 Research Areas of IDA
7.1.2 The Need for IDA in Education
7.2 Learning Analytics in Education
7.2.1 Role of Web-Enabled and Mobile Computing in Education
7.2.2 Benefits of Learning Analytics
7.2.3 Future Research Directions of IDA
7.3 Motivation
7.4 Literature Review
7.4.1 Association Rule Mining and Big Data
7.5 Intelligent Data Analytical Tools
7.6 Intelligent Data Analytics Using MapReduce Framework in an Educational Domain
7.6.1 Data Description
7.6.2 Objective
7.6.3 Proposed Methodology
7.6.3.1 Stage 1 Map Reduce Algorithm
7.6.3.2 Stage 2 Apriori Algorithm
7.7 Results
7.8 Conclusion and Future Scope
References

8 Influence of Green Space on Global Air Quality Monitoring: Data Analysis Using K-Means Clustering Algorithm
Gihan S. Pathirana and Malka N. Halgamuge
8.1 Introduction
8.2 Material and Methods
8.2.1 Data Collection
8.2.2 Data Inclusion Criteria
8.2.3 Data Preprocessing
8.2.4 Data Analysis
8.3 Results
8.4 Quantitative Analysis
8.4.1 K-Means Clustering
8.4.2 Level of Difference of Green Area
8.5 Discussion
8.6 Conclusion
References

9 IDA with Space Technology and Geographic Information System
Bright Keswani, Tarini Ch. Mishra, Ambarish G. Mohapatra, Poonam Keswani, Priyatosh Sahu, and Anish Kumar Sarangi
9.1 Introduction
9.1.1 Real-Time in Space
9.1.2 Generating Programming Triggers
9.1.3 Analytical Architecture
9.1.4 Remote Sensing Big Data Acquisition Unit (RSDU)
9.1.5 Data Processing Unit
9.1.6 Data Analysis and Decision Unit
9.1.7 Analysis
9.1.8 Incorporating Machine Learning and Artificial Intelligence
9.1.8.1 Methodologies Applicable
9.1.8.2 Support Vector Machines (SVM) and Cross-Validation
9.1.8.3 Massively Parallel Computing and I/O
9.1.8.4 Data Architecture and Governance
9.1.9 Real-Time Spacecraft Detection
9.1.9.1 Active Phased Array
9.1.9.2 Relay Communication
9.1.9.3 Low-Latency Random Access
9.1.9.4 Channel Modeling and Prediction
9.2 Geospatial Techniques
9.2.1 The Big-GIS
9.2.2 Technologies Applied
9.2.2.1 Internet of Things and Sensor Web
9.2.2.2 Cloud Computing
9.2.2.3 Stream Processing
9.2.2.4 Big Data Analytics
9.2.2.5 Coordinated Observation
9.2.2.6 Big Geospatial Data Management
9.2.2.7 Parallel Geocomputation Framework
9.2.3 Data Collection Using GIS
9.2.3.1 NoSQL Databases
9.2.3.2 Parallel Processing
9.2.3.3 Knowledge Discovery and Intelligent Service
9.2.3.4 Data Analysis
9.3 Comparative Analysis
9.4 Conclusion
References

10 Application of Intelligent Data Analysis in Intelligent Transportation System Using IoT
Rakesh Roshan and Om Prakash Rishi
10.1 Introduction to Intelligent Transportation System (ITS)
10.1.1 Working of Intelligent Transportation System
10.1.2 Services of Intelligent Transportation System
10.1.3 Advantages of Intelligent Transportation System
10.2 Issues and Challenges of Intelligent Transportation System (ITS)
10.2.1 Communication Technology Used Currently in ITS
10.2.2 Challenges in the Implementation of ITS
10.2.3 Opportunity for Popularity of Automated/Autonomous/Self-Driving Car or Vehicle
10.3 Intelligent Data Analysis Makes an IoT-Based Transportation System Intelligent
10.3.1 Introduction to Intelligent Data Analysis
10.3.2 How IDA Makes IoT-Based Transportation Systems Intelligent
10.3.2.1 Traffic Management Through IoT and Intelligent Data Analysis
10.3.2.2 Tracking of Multiple Vehicles
10.4 Intelligent Data Analysis for Security in Intelligent Transportation System
10.5 Tools to Support IDA in an Intelligent Transportation System
References

11 Applying Big Data Analytics on Motor Vehicle Collision Predictions in New York City
Dhanushka Abeyratne and Malka N. Halgamuge
11.1 Introduction
11.1.1 Overview of Big Data Analytics on Motor Vehicle Collision Predictions
11.2 Materials and Methods
11.2.1 Collection of Raw Data
11.2.2 Data Inclusion Criteria
11.2.3 Data Preprocessing
11.2.4 Data Analysis
11.3 Classification Algorithms and K-Fold Validation Using Data Set Obtained from NYPD (2012–2017)
11.3.1 Classification Algorithms
11.3.1.1 k-Fold Cross-Validation
11.3.2 Statistical Analysis
11.4 Results
11.4.1 Measured Processing Time and Accuracy of Each Classifier
11.4.2 Measured p-Value in Each Vehicle Group Using K-Means Clustering/One-Way ANOVA
11.4.3 Identified High Collision Concentration Locations of Each Vehicle Group
11.4.4 Measured Different Criteria for Further Analysis of NYPD Data Set (2012–2017)
11.5 Discussion
11.6 Conclusion
References

12 A Smart and Promising Neurological Disorder Diagnostic System: An Amalgamation of Big Data, IoT, and Emerging Computing Techniques
Prableen Kaur and Manik Sharma
12.1 Introduction
12.1.1 Difference Between Neurological and Psychological Disorders
12.2 Statistics of Neurological Disorders
12.3 Emerging Computing Techniques
12.3.1 Internet of Things
12.3.2 Big Data
12.3.3 Soft Computing Techniques
12.4 Related Works and Publication Trends of Articles
12.5 The Need for Neurological Disorders Diagnostic System
12.5.1 Design of Smart and Intelligent Neurological Disorders Diagnostic System
12.6 Conclusion
References

13 Comments-Based Analysis of a Bug Report Collection System and Its Applications
Arvinder Kaur and Shubhra Goyal
13.1 Introduction
13.2 Background
13.2.1 Issue Tracking System
13.2.2 Bug Report Statistics
13.3 Related Work
13.3.1 Data Extraction Process
13.3.2 Applications of Bug Report Comments
13.3.2.1 Bug Summarization
13.3.2.2 Emotion Mining
13.4 Data Collection Process
13.4.1 Steps of Data Extraction
13.4.2 Block Diagram for Data Extraction
13.4.3 Reports Generated
13.4.3.1 Bug Attribute Report
13.4.3.2 Long Description Report
13.4.3.3 Bug Comments Reports
13.4.3.4 Error Report
13.5 Analysis of Bug Reports
13.5.1 Research Question 1: Is the Performance of Software Affected by Open Bugs that are Critical in Nature?
13.5.2 Research Question 2: How Can Test Leads Improve the Performance of Software Systems?
13.5.3 Research Question 3: Which Are the Most Error-Prone Areas that Can Cause System Failure?
13.5.4 Research Question 4: Which Are the Most Frequent Words and Keywords to Predict Most Critical Bugs?
13.5.5 Research Question 5: What is the Importance of Frequent Words Mined from Bug Reports?
13.6 Threats to Validity
13.7 Conclusion
References

14 Sarcasm Detection Algorithms Based on Sentiment Strength
Pragya Katyayan and Nisheeth Joshi
14.1 Introduction
14.2 Literature Survey
14.3 Experiment
14.3.1 Data Collection
14.3.2 Finding SentiStrengths
14.3.3 Proposed Algorithm
14.3.4 Explanation of the Algorithms
14.3.5 Classification
14.3.5.1 Explanation
14.3.6 Evaluation
14.4 Results and Evaluation
14.5 Conclusion
References

15 SNAP: Social Network Analysis Using Predictive Modeling
Samridhi Seth and Rahul Johari
15.1 Introduction
15.1.1 Types of Predictive Analytics Models
15.1.2 Predictive Analytics Techniques
15.1.2.1 Regression Techniques
15.1.2.2 Machine Learning Techniques
15.2 Literature Survey
15.3 Comparative Study
15.4 Simulation and Analysis
15.4.1 Few Analyses Made on the Data Set Are Given Below
15.4.1.1 Duration of Each Contact Was Found
15.4.1.2 Total Number of Contacts of Source Node with Destination Node Was Found for all Nodes
15.4.1.3 Total Duration of Contact of Source Node with Each Node Was Found
15.4.1.4 Mobility Pattern Describes Direction of Contact and Relation Between Number of Contacts and Duration of Contact
15.4.1.5 Unidirectional Contact, that is, Only 1 Node is Contacting Second Node but Vice Versa is Not There
15.4.1.6 Graphical Representation for the Duration of Contacts with Each Node is Given below
15.4.1.7 Rank and Percentile for Number of Contacts with Each Node
15.4.1.8 Data Set is Described for Three Days Where Time is Calculated in Seconds. Data Set can be Divided Into Three Days. Some of the Analyses Conducted on the Data set Day Wise Are Given Below
15.5 Conclusion and Future Work
References

16 Intelligent Data Analysis for Medical Applications
Moolchand Sharma, Vikas Chaudhary, Prerna Sharma, and R. S. Bhatia
16.1 Introduction
16.1.1 IDA (Intelligent Data Analysis)
16.1.1.1 Elicitation of Background Knowledge
16.1.2 Medical Applications
16.2 IDA Needs in Medical Applications
16.2.1 Public Health
16.2.2 Electronic Health Record
16.2.3 Patient Profile Analytics
16.2.3.1 Patient’s Profile
16.3 IDA Methods Classifications
16.3.1 Data Abstraction
16.3.2 Data Mining Method
16.3.3 Temporal Data Mining
16.4 Intelligent Decision Support System in Medical Applications
16.4.1 Need for Intelligent Decision System (IDS)
16.4.2 Understanding Intelligent Decision Support: Some Definitions
16.4.3 Advantages/Disadvantages of IDS
16.5 Conclusion
References

17 Bruxism Detection Using Single-Channel C4-A1 on Human Sleep S2 Stage Recording
Md Belal Bin Heyat, Dakun Lai, Faijan Akhtar, Mohd Ammar Bin Hayat, Shafan Azad, Shadab Azad, and Shajan Azad
17.1 Introduction
17.1.1 Side Effect of Poor Snooze
17.2 History of Sleep Disorder
17.2.1 Classification of Sleep Disorder
17.2.2 Sleep Stages of the Human
17.3 Electroencephalogram Signal
17.3.1 Electroencephalogram Generation
17.3.1.1 Classification of Electroencephalogram Signal
17.4 EEG Data Measurement Technique
17.4.1 10–20 Electrode Positioning System
17.4.1.1 Procedure of Electrode Placement
17.5 Literature Review
17.6 Subjects and Methodology
17.6.1 Data Collection
17.6.2 Low Pass Filter
17.6.3 Hanning Window
17.6.4 Welch Method
17.7 Data Analysis of the Bruxism and Normal Data Using EEG Signal
17.8 Result
17.9 Conclusions
Acknowledgments
References

18 Handwriting Analysis for Early Detection of Alzheimer’s Disease
Rajib Saha, Anirban Mukherjee, Aniruddha Sadhukhan, Anisha Roy, and Manashi De
18.1 Introduction and Background
18.2 Proposed Work and Methodology
18.3 Results and Discussions
18.3.1 Character Segmentation
18.4 Conclusion
References

Index