Big Data and Deep Learning

DiagSoftfailure: Automated Soft-Failure Diagnostic Tool Using Machine Learning for Network Users

As more individuals and organizations move their activities and services online, network performance problems that slow data communication have become a significant obstacle to a satisfactory user experience. Currently, no fully automated tool helps network users find the complex network problems that degrade the performance of their applications. DiagSoftfailure (Automated Soft-Failure Diagnostic Tool) fills this gap by applying machine learning to help network users infer the location and root cause of the network failures behind performance degradation.

Upon the user's request via a web browser, the DiagSoftfailure server, deployed at the border router between the campus/enterprise LAN and the backbone network, collects and analyzes the packet trace of the target application for soft-failure diagnosis. It first uses open-source packet capture and TCP trace analysis software (libpcap and tcptrace) to obtain raw features of the network behavior. These raw features are then processed into network signatures that distinguish failure conditions clearly enough for effective and reliable diagnosis. Based on the network signature, automated classifiers trained by combining supervised and semi-supervised machine learning identify both known and unknown soft-failures in the network. Finally, the DiagSoftfailure server sends the user a diagnosis report that lets both novice and expert users view and understand the network condition and its problems.
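The final classification step can be illustrated with a minimal open-set classifier. This is a sketch under stated assumptions, not DiagSoftfailure's actual model: it replaces the supervised/semi-supervised classifiers with a nearest-centroid rule plus a rejection radius, so that a signature far from every known class is reported as "unknown". The three-value signature (retransmission rate, mean RTT in units of 100 ms, zero-window ratio), the class names, the training values, and the radius are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical network signature: (retransmission rate, mean RTT / 100 ms, zero-window ratio)
Signature = tuple[float, ...]


def train_centroids(samples: list[tuple[Signature, str]]) -> dict[str, Signature]:
    """Average the labeled signatures of each known soft-failure class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for sig, label in samples:
        acc = sums.setdefault(label, [0.0] * len(sig))
        for i, value in enumerate(sig):
            acc[i] += value
        counts[label] += 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def diagnose(sig: Signature, centroids: dict[str, Signature], reject_radius: float) -> str:
    """Return the nearest known soft-failure, or 'unknown' if none is within reject_radius."""
    label, dist = min(
        ((name, math.dist(sig, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= reject_radius else "unknown"


# Hypothetical labeled signatures for three known soft-failures.
TRAINING = [
    ((0.10, 0.5, 0.0), "packet_loss"),
    ((0.12, 0.6, 0.0), "packet_loss"),
    ((0.00, 3.0, 0.0), "high_latency"),
    ((0.01, 2.8, 0.0), "high_latency"),
    ((0.00, 0.5, 0.4), "receiver_window_limited"),
    ((0.00, 0.4, 0.5), "receiver_window_limited"),
]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(diagnose((0.11, 0.55, 0.0), centroids, reject_radius=0.5))  # packet_loss
    print(diagnose((0.90, 0.50, 0.9), centroids, reject_radius=0.5))  # unknown
```

The rejection radius is what lets a purely supervised model say "I have not seen this before"; in practice it would be tuned on held-out data rather than fixed by hand.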

DiagSoftfailure provides automated network soft-failure diagnosis with the following features:

User-focused diagnosis that requires no cooperation from the network manager
An adaptive network signature that is robust against data inconsistency and the high dimensionality of network behavior data, ensuring high diagnosis accuracy
Identification of unknown faults by combining supervised and unsupervised machine learning
No changes required to the OS kernel, allowing flexible implementation
A diagnosis report that groups test results into categories in a comprehensive format that novice users can understand

Together, these features make it quick and easy to identify the specific conditions that impact network performance and to find the root cause of performance degradation. They help users and network administrators resolve network problems rapidly, improving connection speeds and reducing user dissatisfaction while lowering network administration costs.

DB&EL: Threat Classification and Reasoning by Dynamic Bayesian Network, Description Logic, and Intuitionistic Fuzzy Sets

Missile defense faces new challenges due to a spectrum of increasingly capable air and missile delivery systems and the means to counter them. Complex and integrated attacks are possible using a set of guided rockets, artillery, and mortars; unmanned aerial vehicles (UAVs); a range of land attack and anti-ship ballistic and cruise missiles; increasingly maneuverable ballistic missile reentry vehicles; hypersonic glide vehicles; anti-satellite weapons; and active air and missile defense interceptors. Consequently, quickly understanding and classifying sophisticated inbound weapons provides the critical information needed for optimal defense through precise assignment and targeting of the highest-priority objects. However, current defense systems cannot jointly handle the contextual uncertainty imposed by target proximity features (e.g., time to impact), capability features (e.g., flight envelope), human judgment (e.g., a target's intent), and relevant social features to yield an optimal response across all stages of missile defense.

DB&EL (Threat Classification and Reasoning by Dynamic Bayesian Network, Description Logic, and Intuitionistic Fuzzy Sets) is a dynamic reasoning technology that refines threat classification with advanced contextual deep learning capabilities. It takes all sensing data, text messages, and human judgments into account for optimal defense decision-making throughout the full life cycle of a threat. Advanced contextual reasoning is achieved by leveraging Bayesian theory, dynamic graphs, Description Logic (DL), text mining, and fuzzy technologies. A Dynamic Bayesian Network (DBN) derives threat classifications or degrees of threat confidence that support probabilistic event reasoning (e.g., cause and consequence probabilities) while fusing the spatially and temporally dependent features of observations. An Influence Diagram (ID) combined with DL enables sophisticated contextual reasoning, such as the likelihood that a consequence of a threat holds given the context, and the optimistic and pessimistic outcomes of a response strategy. DB&EL then performs a risk assessment to rank threats by severity when multiple threats appear, so that defensive resources can be allocated optimally.
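The DBN filtering and risk-ranking steps can be sketched with the simplest two-slice DBN: a hidden threat class that evolves over time and emits a discretized sensor cue at each step. This is a minimal illustration under assumed numbers, not DB&EL's model. The two classes, the transition and emission probabilities, the cue names, and the expected-risk formula (P(hostile) times the value of the defended asset) are all hypothetical, and the Description Logic, influence diagram, and fuzzy components are not modeled.

```python
# Hypothetical two-slice DBN: hidden threat class -> observed sensor cue at each time step.
STATES = ("hostile", "benign")
TRANSITION = {  # P(class at t+1 | class at t)
    "hostile": {"hostile": 0.95, "benign": 0.05},
    "benign": {"hostile": 0.05, "benign": 0.95},
}
EMISSION = {  # P(cue | class); cue names are illustrative only
    "hostile": {"closing_fast": 0.7, "maneuvering": 0.2, "steady": 0.1},
    "benign": {"closing_fast": 0.1, "maneuvering": 0.2, "steady": 0.7},
}


def filter_track(cues: list[str], prior: dict[str, float]) -> dict[str, float]:
    """Forward filtering: P(class at the last step | all cues so far)."""
    belief = dict(prior)
    for cue in cues:
        # Predict: push the belief through the transition model.
        predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES}
        # Update: weight by the cue likelihood and renormalize.
        weighted = {s: predicted[s] * EMISSION[s][cue] for s in STATES}
        total = sum(weighted.values())
        belief = {s: w / total for s, w in weighted.items()}
    return belief


def rank_threats(tracks: dict[str, list[str]], prior: dict[str, float],
                 asset_value: dict[str, float]) -> list[str]:
    """Order track IDs by expected risk = P(hostile) * value of the defended asset."""
    risk = {tid: filter_track(cues, prior)["hostile"] * asset_value[tid]
            for tid, cues in tracks.items()}
    return sorted(risk, key=risk.get, reverse=True)


if __name__ == "__main__":
    prior = {"hostile": 0.5, "benign": 0.5}
    tracks = {"T1": ["steady", "steady"], "T2": ["closing_fast", "maneuvering", "closing_fast"]}
    print(rank_threats(tracks, prior, {"T1": 1.0, "T2": 1.0}))  # ['T2', 'T1']
```

Weighting by asset value means a moderately likely threat to a critical asset can outrank a near-certain threat to a minor one, which is the kind of trade-off the risk assessment is meant to expose.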

DB&EL aligns well with missile defense applications such as the Ground-based Midcourse Defense (GMD) system, offering value in both classification refinement and strategy optimization. DB&EL can also support clinical reasoning and diagnosis of depression, bipolar disorder, schizophrenia, dementia, post-traumatic stress disorder, and other diseases whose symptoms are uncertain. Beyond these, it can be used for terrorist identification, cybersecurity risk assessment, disaster management, and other applications in which contextual information must be incorporated into reasoning under uncertainty.

COMAND: Context-Aware Machine Learning Middleware for Real-Time Distributed Streaming Big Data Analysis

Emerging applications and instrumentation systems produce massive amounts of streaming data every day. This big streaming data must be processed and analyzed quickly and accurately to extract useful information for timely decision-making. Although substantial work has been proposed for streaming data processing and analysis (e.g., SAMOA and Jubatus), many shortcomings remain unaddressed. InfoBeyond proposes COMAND (Context-aware Machine Learning Middleware for Real-time Distributed Streaming Big Data Analysis) to address the challenges of distributed massive streaming data analysis. COMAND includes machine learning (ML) based algorithms designed for easy implementation and integration with existing infrastructures and algorithms. It is implemented via a two-stage operational architecture for efficient and accurate processing of massive streaming data. We will first design and develop a middleware architecture that provides efficient and scalable capabilities in distributed environments. Next, we will analyze and develop three key algorithms for the middleware: Distributed Streaming Data Clustering (DISDC), Distributed Optimal Context-Aware Data Classification (DOCDC), and Real-time Publish/Subscribe Data Model Update (RPSDMU). The system is wrapped by APIs and adapters that facilitate implementing and integrating COMAND with existing systems.
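As a rough illustration of distributed streaming clustering, each worker below runs sequential (online) k-means on its own slice of the stream, and a coordinator merges the workers' centroids by count-weighted averaging. This is a generic stand-in, not the DISDC algorithm. It assumes all workers start from the same initial centroids so that centroid j refers to the same cluster everywhere; the class and function names and the synthetic two-cluster stream are hypothetical.

```python
import random


def sq_dist(a, b) -> float:
    """Squared Euclidean distance between two equal-length coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineKMeans:
    """Sequential k-means: each arriving point moves its nearest centroid toward it."""

    def __init__(self, init_centroids: list[tuple[float, ...]]):
        self.centroids = [list(c) for c in init_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, point: tuple[float, ...]) -> None:
        j = min(range(len(self.centroids)), key=lambda i: sq_dist(point, self.centroids[i]))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # keeps each centroid equal to the mean of its points
        self.centroids[j] = [c + step * (x - c) for c, x in zip(self.centroids[j], point)]


def merge(workers: list[OnlineKMeans]) -> list[list[float]]:
    """Coordinator: count-weighted average of centroid j across all workers."""
    merged = []
    for j in range(len(workers[0].centroids)):
        total = sum(w.counts[j] for w in workers)
        if total == 0:  # no worker has seen this cluster yet
            merged.append(list(workers[0].centroids[j]))
            continue
        dim = len(workers[0].centroids[j])
        merged.append([sum(w.counts[j] * w.centroids[j][d] for w in workers) / total
                       for d in range(dim)])
    return merged


def demo(n_points: int = 400, n_workers: int = 2, seed: int = 0) -> list[list[float]]:
    """Simulate a stream from two hypothetical clusters, round-robined across workers."""
    rng = random.Random(seed)
    init = [(1.0, 1.0), (9.0, 9.0)]
    workers = [OnlineKMeans(init) for _ in range(n_workers)]
    for i in range(n_points):
        cx, cy = (0.0, 0.0) if rng.random() < 0.5 else (10.0, 10.0)
        point = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        workers[i % n_workers].update(point)
    return merge(workers)


if __name__ == "__main__":
    print(demo())  # two centroids close to (0, 0) and (10, 10)
```

Only centroids and counts cross the network in this design, never the raw stream, which is the property that makes the coordinator step cheap.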

AnomLoc: perfSONAR-Based Distributed Network Anomaly Detection and Localization

Data-intensive scientific applications rely on high-performance computing, which is becoming increasingly widespread in supercomputing centers, research laboratories, and universities. The DOE and many other organizations need an automatic and adaptive network analysis tool for effective anomaly detection and localization in high-speed networks. Current approaches such as Pythia and APD cannot provide this function effectively, accurately, and in a user-friendly way. InfoBeyond proposes AnomLoc, a perfSONAR-based distributed network anomaly detection and localization scheme, to address the challenges of diagnosing performance problems in distributed data-intensive networks by relying on the existing perfSONAR measurement infrastructure. AnomLoc includes Q-statistics and convex-optimization-based algorithms designed as pluggable tools for easy implementation and integration with existing infrastructures and algorithms, enabling real-time distributed anomaly detection and localization. In the first step, we acquire perfSONAR measurement data and purge it to filter out misleading data. In the second step, we perform anomaly detection and localization on the pruned data using four key algorithms: Sparse Principal Component Analysis (SPCA), Graph-based SPCA (GSPCA), Karhunen-Loève-based SPCA (KLSPCA), and Graph-based KLSPCA (GKLSPCA). Additionally, we develop APIs for the AnomLoc modules for easy implementation and smooth integration with the existing perfSONAR infrastructure.
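The Q-statistic idea behind the detection step can be illustrated with ordinary PCA: fit a low-dimensional principal subspace to normal measurements, then flag a new measurement vector whose squared prediction error (the Q-statistic, or SPE) exceeds a threshold. The sketch below is a simplification, not AnomLoc's SPCA-family algorithms. It keeps a single principal component found by power iteration, uses the largest training Q value as the threshold instead of an analytical control limit, and localizes the anomaly by picking the measurement with the largest squared residual. The three-path latency data are synthetic and the function names are hypothetical.

```python
import math
import random
from typing import Optional

Vector = list[float]


def fit(rows: list[Vector], iters: int = 200) -> tuple[Vector, Vector, float]:
    """Learn the mean, leading principal direction, and a Q threshold from normal data."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - mu[d] for d in range(dim)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration for the leading eigenvector of the covariance
    # (assumes the all-ones start vector is not orthogonal to it).
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Simplification: empirical maximum instead of an analytical control limit.
    threshold = max(q_statistic(residual(r, mu, v)) for r in rows)
    return mu, v, threshold


def residual(x: Vector, mu: Vector, v: Vector) -> Vector:
    """Component of the centered vector not explained by the principal direction v."""
    c = [a - m for a, m in zip(x, mu)]
    score = sum(a * b for a, b in zip(c, v))
    return [a - score * b for a, b in zip(c, v)]


def q_statistic(res: Vector) -> float:
    """Squared prediction error (SPE), i.e. the Q-statistic."""
    return sum(r * r for r in res)


def detect_and_localize(x: Vector, model: tuple[Vector, Vector, float]) -> Optional[int]:
    """Return the index of the most suspicious measurement, or None if x looks normal."""
    mu, v, threshold = model
    res = residual(x, mu, v)
    if q_statistic(res) <= threshold:
        return None
    return max(range(len(res)), key=lambda i: res[i] ** 2)


def demo(seed: int = 0) -> tuple[Optional[int], Optional[int]]:
    """Synthetic latencies on three paths that normally rise and fall together."""
    rng = random.Random(seed)
    rows = []
    for _ in range(200):
        load = rng.uniform(10.0, 50.0)
        rows.append([load + rng.gauss(0.0, 0.5) for _ in range(3)])
    model = fit(rows)
    anomalous = detect_and_localize([30.0, 30.0, 45.0], model)  # path 2 deviates
    normal = detect_and_localize([30.0, 30.0, 30.0], model)
    return anomalous, normal


if __name__ == "__main__":
    print(demo())  # (2, None)
```

A sparse PCA variant would replace the dense direction `v` with loadings that touch only a few paths, which is what makes localization more direct than the residual-argmax heuristic used here.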

Embedded Space Analytics

The U.S. Navy sought a real-time graph embedding tool for analyzing huge graphs (millions of nodes and billions of edges) from diverse sources. However, current approaches cannot provide the dynamic and scalable graph analytics needed to reveal the military value of tactical data. In this project, InfoBeyond proposes EStreaming (Embedding & Streaming) for scalable and efficient graph streaming. EStreaming applies big data streaming technology so that unsupervised and semi-supervised machine learning algorithms can run over the streaming platform.

EStreaming can split a huge graph into small subgraphs so that graph embedding can be performed in parallel across a set of processors, and the resulting embeddings can then be effectively merged and visualized. Given the diversity of Navy applications, EStreaming is an open platform that can implement many graph embedding algorithms, such as Isomap, LLE, Laplacian eigenmaps, and graph factorization. We have demonstrated implementations of Distributed LINE (DLINE) and Distributed MVE (DMVE). DLINE supports classification based on the internal relations and similarity among persons in soldier, enemy, and other social networks. DMVE, by contrast, can analyze spatial-temporal data, such as data from ISR sensor networks (e.g., Navy ISR geographic traffic monitoring).
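The split, embed, and merge workflow can be sketched in three steps: partition the graph into bounded-size chunks by breadth-first search, compute a one-dimensional Laplacian eigenmap (the Fiedler vector of each chunk's induced subgraph) in parallel, and merge the per-chunk results. This is a toy illustration, not DLINE or DMVE. The BFS partitioner, the power-iteration solver, and the naive merge (which ignores edges cut between chunks) are assumptions chosen for brevity, and threads stand in for the separate processors a real deployment would use.

```python
import math
from collections import deque
from concurrent.futures import ThreadPoolExecutor

Graph = dict[int, list[int]]  # adjacency list of an undirected graph


def bfs_partition(adj: Graph, max_size: int) -> list[list[int]]:
    """Split nodes into BFS-ordered chunks of at most max_size nodes."""
    seen: set[int] = set()
    parts: list[list[int]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, part = deque([start]), []
        while queue:
            node = queue.popleft()
            part.append(node)
            if len(part) == max_size:
                parts.append(part)
                part = []
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if part:
            parts.append(part)
    return parts


def fiedler_embedding(nodes: list[int], adj: Graph, iters: int = 500) -> dict[int, float]:
    """1-D Laplacian eigenmap of the induced subgraph via power iteration on shift*I - L."""
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    index = {node: i for i, node in enumerate(nodes)}
    nbrs = [[index[m] for m in adj[node] if m in index] for node in nodes]
    shift = 2.0 * max(len(nb) for nb in nbrs) or 1.0  # >= largest Laplacian eigenvalue
    x = [i - (n - 1) / 2 for i in range(n)]  # non-constant start vector
    for _ in range(iters):
        # y = (shift*I - L) x, with (L x)_i = deg_i * x_i - sum of neighbor values
        y = [shift * x[i] - (len(nbrs[i]) * x[i] - sum(x[j] for j in nbrs[i])) for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]  # project out the constant eigenvector (eigenvalue 0)
        norm = math.sqrt(sum(val * val for val in y)) or 1.0
        x = [val / norm for val in y]
    return {node: x[i] for node, i in index.items()}


def distributed_embedding(adj: Graph, max_size: int, workers: int = 4):
    """Embed each chunk in parallel, then merge (edges between chunks are ignored)."""
    parts = bfs_partition(adj, max_size)
    merged: dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for emb in pool.map(lambda part: fiedler_embedding(part, adj), parts):
            merged.update(emb)
    return parts, merged


def demo():
    """Path graph 0-1-2-...-7 split into chunks of four nodes."""
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
    return distributed_embedding(path, max_size=4)


if __name__ == "__main__":
    parts, emb = demo()
    print(parts)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print({node: round(value, 3) for node, value in emb.items()})
```

On each four-node path chunk the embedding values are monotone along the path, which is the ordering a Laplacian eigenmap is expected to recover; a production merge step would also have to reconcile sign and scale across chunks and account for the cut edges.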
