• Big data streaming
• Context-aware collaborative learning
• Graph embedding and learning
• Attack detection and network reconfiguration
• Network Abnormal Detection and Localization by Learning

Embedded Space Analytics

The Navy needs a real-time graph embedding tool for analyzing huge graphs (millions of nodes and billions of edges) drawn from diverse sources. Current approaches, however, cannot provide the dynamic, scalable graph analytics needed to bring out the military value of tactical data. In this project, InfoBeyond advocates EStreaming (Embedding & Streaming) for scalable and efficient graph streaming. EStreaming builds on big data streaming technology so that unsupervised and semi-supervised machine learning algorithms can run over the streaming platform.

EStreaming can split a huge graph into small subgraphs so that distributed graph embedding can be carried out in parallel across a set of processors. The resulting embeddings can then be merged and visualized. Given the diversity of Navy applications, EStreaming is an open platform that can host many graph embedding algorithms, such as IsoMap, LLE, Laplacian eigenmaps, and graph factorization. We have demonstrated implementations of Distributed LINE (DLINE) and Distributed MVE (DMVE). Compared with other algorithms, DLINE can perform classification based on the internal relations and the similarity among persons in soldier, enemy, and other social networks. DMVE, in contrast, can be used for analyzing spatial-temporal data, such as data from ISR sensor networks, e.g., Navy ISR geographic traffic monitoring.
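The split, embed, and merge flow described above can be illustrated with a minimal Python sketch. It is not the EStreaming, DLINE, or DMVE implementation: it assumes NumPy is available, uses plain Laplacian eigenmaps as a stand-in embedding, and the names `partition_nodes`, `laplacian_eigenmap`, and `embed_distributed` are hypothetical. Edges that cross partitions are simply dropped, and the per-partition coordinate systems are not aligned with each other, so the merged table keeps the partition id next to each vector.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def partition_nodes(nodes, num_parts):
    """Hypothetical partitioner: contiguous chunks of the sorted node list."""
    ordered = sorted(nodes)
    size = max(1, -(-len(ordered) // num_parts))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def laplacian_eigenmap(nodes, edges, dim=2):
    """Embed the subgraph induced by `nodes` with unnormalized Laplacian eigenmaps."""
    index = {n: i for i, n in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        # Edges with an endpoint outside this partition are ignored (a simplification).
        if u in index and v in index:
            adjacency[index[u], index[v]] = adjacency[index[v], index[u]] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # For a connected subgraph the first eigenvector is constant, so skip it.
    coords = vectors[:, 1:dim + 1]
    if coords.shape[1] < dim:  # tiny subgraph: pad missing dimensions with zeros
        coords = np.pad(coords, ((0, 0), (0, dim - coords.shape[1])))
    return {n: coords[index[n]] for n in nodes}


def embed_distributed(nodes, edges, num_parts=2, dim=2):
    """Embed each partition in parallel, then merge into one lookup table."""
    parts = partition_nodes(nodes, num_parts)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(lambda p: laplacian_eigenmap(p, edges, dim), parts))
    merged = {}
    for part_id, embedding in enumerate(results):
        for node, vector in embedding.items():
            merged[node] = (part_id, vector)
    return merged
```

A production system would replace the chunking with a real graph partitioner that minimizes cut edges and would add an alignment step before visualization; both are outside the scope of this sketch.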

COMAND: Context-aware Machine Learning Middleware for Real-time Distributed Streaming Big Data Analysis

Emerging applications and instrumental systems produce massive streaming data every day. It is essential that this big streaming data be processed and analyzed quickly and accurately to extract useful information for timely decision making. Although substantial work has been proposed for streaming data processing and analysis (e.g., SAMOA and Jubatus), many shortcomings have not been well addressed. InfoBeyond advocates Context-aware Machine Learning Middleware for Real-time Distributed Streaming Big Data Analysis (COMAND) to address the challenges of distributed massive streaming data analysis. COMAND includes machine learning (ML) based algorithms that are designed to facilitate its implementation and integration with existing infrastructures and algorithms. It operates through a two-stage architecture for efficient and accurate processing of massive streaming data. We will first design and develop a middleware architecture that provides efficient and scalable capabilities in distributed environments. Next, we will analyze and develop three key algorithms for the middleware, namely Distributed Streaming Data Clustering (DISDC), Distributed Optimal Context-aware Data Classification (DOCDC), and Real-time Publish/Subscribe Data Model Update (RPSDMU). The system is wrapped by APIs and adapters that facilitate the implementation and integration of COMAND with existing systems.
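To give a feel for what clustering over a stream involves, here is a minimal Python sketch of sequential (online) k-means, where each arriving point updates a model in constant time and is then discarded. This is a generic textbook technique used as a stand-in; it is not DISDC, whose design is not described here, and the class name `OnlineKMeans` is hypothetical.

```python
import math


class OnlineKMeans:
    """Sequential k-means: each arriving point nudges its nearest centroid."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one coordinate list per cluster
        self.counts = []     # number of points absorbed by each cluster

    def update(self, point):
        """Consume one stream element and return the cluster it was assigned to."""
        point = [float(x) for x in point]
        # Assumption: the first k points seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(point)
            self.counts.append(1)
            return len(self.centroids) - 1
        nearest = min(range(self.k),
                      key=lambda i: math.dist(point, self.centroids[i]))
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]  # keeps the centroid equal to a running mean
        self.centroids[nearest] = [
            c + rate * (x - c) for c, x in zip(self.centroids[nearest], point)
        ]
        return nearest
```

For example, feeding the points of a stream one at a time through `model.update(point)` returns a cluster label per point while the memory footprint stays fixed at k centroids. Seeding from the first k points is fragile when early points come from the same cluster, which is one reason a real system would use a more careful initialization.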

AnomLoc: A perfSONAR-based Distributed Network Anomaly Detection and Localization

Data-intensive scientific applications increasingly rely on high-performance computing, which is becoming more and more widespread in supercomputing centers, research laboratories, and universities. DoE and many other organizations need an automatic and adaptive network analysis tool for effective anomaly detection and localization in high-speed networks. Current approaches, such as Pythia and APD, cannot provide this function effectively, accurately, and in a user-friendly way. InfoBeyond advocates a perfSONAR-based distributed network anomaly detection and localization scheme (AnomLoc) to address the challenges of diagnosing performance problems in distributed data-intensive networks by relying on the legacy perfSONAR measurement infrastructure. AnomLoc includes Q-statistic and convex-optimization-based algorithms and is to be developed as a pluggable tool for real-time distributed anomaly detection and localization, which eases its implementation and integration with existing infrastructures and algorithms. In the first step, we perform data acquisition and purging on perfSONAR measurement data to filter out misleading data. In the second step, we carry out anomaly detection and localization on the purged data by relying on four key algorithms, namely Sparse Principal Component Analysis (SPCA), Graph-based SPCA (GSPCA), Karhunen-Loeve-based SPCA (KLSPCA), and Graph-based KLSPCA (GKLSPCA). We also develop APIs for the AnomLoc modules for easy implementation and smooth integration with the existing perfSONAR infrastructure.
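The two steps above (purge, then detect and localize) can be sketched in Python with NumPy. This is only an illustration under several assumptions: it uses ordinary PCA rather than the sparse or graph-based variants named above, an empirical quantile of the training Q-statistics as the alarm threshold, and the largest residual component as a crude localization rule in place of convex optimization. The validity rule in `purge` and all function names are hypothetical, and each column is assumed to hold one measured path.

```python
import numpy as np


def purge(samples):
    """Step 1 (assumed rule): drop rows with missing or negative measurements."""
    X = np.asarray(samples, dtype=float)
    keep = np.isfinite(X).all(axis=1) & (X >= 0).all(axis=1)
    return X[keep]


def fit_pca(X_train, n_components):
    """Return the training mean and the top principal directions (as columns)."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def residuals(X, mean, P):
    """Part of each sample that lies outside the principal subspace."""
    centered = X - mean
    return centered - centered @ P @ P.T


def detect(X_train, X_new, n_components=1, quantile=0.99):
    """Step 2: flag samples whose Q-statistic exceeds a training quantile."""
    mean, P = fit_pca(X_train, n_components)
    q_train = (residuals(X_train, mean, P) ** 2).sum(axis=1)
    threshold = np.quantile(q_train, quantile)
    res_new = residuals(X_new, mean, P)
    q_new = (res_new ** 2).sum(axis=1)  # Q-statistic: squared residual norm
    culprit = np.abs(res_new).argmax(axis=1)  # column with the largest residual
    return q_new > threshold, q_new, culprit
```

The idea is that normal traffic varies mostly along a few principal directions, so a sample with a large residual outside that subspace is suspicious, and the column contributing most to the residual hints at where the problem lies.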

InfoBeyond promotes big data research for efficient and accurate knowledge discovery using graph embedding, streaming, AI, machine learning and visualization.