Skills

Programming

Python, Java, Bash, R, Git

NLP Frameworks

spaCy, NLTK, OpenNLP, CoreNLP, AllenNLP, Gensim

ML Frameworks

PyTorch, TensorFlow, Hugging Face, scikit-learn, Keras

Storage

Elasticsearch, FAISS, MongoDB, SQL

Projects

*

My main PhD project. Stay tuned for the upcoming preprint.

Designed and developed an intelligence analysis toolkit that supports multi-source data ingestion, fusion, and analysis of unstructured data. Implemented various data exploration components in IPM, such as spatial-temporal analysis, topic extraction/monitoring, sentiment analysis, faceted search, text summarization, and automated report generation. Developed a multi-stage, scalable data ingestion framework called DIGEST that serves as the backend data processing pipeline for IPM.

Developed a vulnerability analysis toolkit called Attack Path Analyzer, built on the NetBeans Rich Client Platform, that addresses the growing need to analyse cyber-physical system vulnerabilities and identify their critical resources. Implemented structural and parametric analysis components that support the acquisition, representation, storage, mapping, vulnerability, and dependency analysis of all information that links cyber and physical resources in a system.

Developed the Secure Netcentric Information Assurance Classifier for automatically classifying and downgrading information across domains, to be used by the intelligence community at Rome Labs. Designed and developed a SPARQL generator that converts free-text user input into queries over RDF generated from unstructured text. Developed OSR Studio, a rich client application for managing the Ontology Semantic Resource used in semantic analysis.

Developed a classification model for predicting post-surgical complications in patients, achieving an F1-score of 90%. Performed data visualization using Tableau and Spotfire. Developed mining structures using SSAS. Developed and tuned SQL queries to fetch patient data from SQL databases for analysis.

Recent Publications

Wikipedia is a critical platform for organizing and disseminating knowledge. One of the key principles of Wikipedia is neutral point of view (NPOV), so that bias is not injected into objective treatment of subject matter. As part of our research vision to develop resilient bias detection models that can self-adapt over time, we present in this paper our initial investigation of the potential of a cross-domain transfer learning approach to improve Wikipedia bias detection.

Practical data analysis scenarios involve more than just the interpretation of data through visual and algorithmic analysis. Many real-world analysis environments involve multiple types of experts and analysts working together to solve problems and make decisions, adding organizational and social requirements to the mix. We aim to provide new knowledge about the role of provenance for practical problems in a variety of analysis scenarios central to national security. We present the findings from interviews with data analysts from domains such as intelligence analysis, cyber-security, and geospatial intelligence.

There continues to be growing pressure to sell off spectrum currently allocated for defense purposes in favor of private sector applications. These pressures come at a time when Department of Defense (DoD) spectrum needs are growing at an exponential pace, raising concerns that we will soon reach a point where they can no longer be met. In response, the Range Commanders Council (RCC) Frequency Management Group (FMG) developed a baseline set of standard metrics to measure spectrum utilization, demand, efficiency, and operational effectiveness.

Sidewalk infrastructure is a critical element of urban transportation networks, providing support and connectivity to other modes of transportation. Cities have as many sidewalks as they have streets, but most municipalities find it difficult to devote the same amount of resources to the management and maintenance of sidewalks as they do to streets. In this paper we introduce MySidewalk™ (Research 2015), a cost-effective crowdsource-based approach to data collection for sidewalk inventory and condition.

Because of its economic value, there has been growing pressure to sell off spectrum currently allocated for defense purposes. These pressures come at a time when Department of Defense (DoD) spectrum needs are growing at an exponential pace, thus prompting heightened efforts to clearly demonstrate both the need and the responsible, efficient use of electromagnetic spectrum. In response, the DoD has developed a baseline set of standard metrics to measure spectrum utilization, demand, efficiency, and operational effectiveness.

This paper describes the motivations, solution concepts, and architecture of a framework for a Rapid Information Discovery System (RAID), to support semantic enterprise search and knowledge discovery from large volumes of multi-source text data. First, the overall solution concept is summarized. An ontology-driven approach to natural language processing (NLP) is described. Then the RAID architecture for semantic indexing and semantic search is summarized.

The goal of the Data Integration and Predictive Analysis System (IPAS) is to enable prediction, analysis, and response management for incidents of infectious diseases. IPAS collects and integrates comprehensive datasets of previous disease incidents and potential influencing factors to facilitate multivariate, predictive analytics of disease patterns, intensity, and timing.

This paper describes an ontology-driven framework for collaborative information analysis and knowledge discovery. The framework, called Semantic Technology for Evidence Exploration and Learning (STEEL), includes a collection of methods and an architecture that implements them.

Generating an all-source information report involves collecting data from various heterogeneous data sources, analyzing the content, and fusing it in a topic-sensitive fashion. This paper describes a framework for extracting information from multiple text and social media sources and fusing the information so as to facilitate enhanced situational awareness and decision making. The central innovative aspect of the proposed solution approach is the ability to identify and analyze nuggets of relevant and actionable knowledge from highly unstructured social-media data, recognizing the fast-evolving situations affording limited lag time for emergency indicator, response and mitigation.

A recent trend in popular health news is reporting the dangers of prolonged inactivity in one's daily routine. The claims are wide in variety and aggressive in nature, linking a sedentary lifestyle with obesity and shortened lifespans [25]. Rather than forcing an individual to perform a physical exercise for a predefined interval of time, we propose the design, implementation, and evaluation of a context-aware health assistant system (called Step Up Life) that encourages a user to adopt a healthy lifestyle by performing simple and contextually suitable physical exercises.

This paper describes the motivations, methods, and automation architecture of a framework for multisource Semantic Information extraction & Fusion for collaborative Threat assessment (SIFT). First, the technical and pragmatic challenges that motivate the research ideas are summarized. Next, a characterization of the activities for generating decision-enabling information from multi-source data is provided. This characterization, called the ‘SIFT Method,’ specifies the SIFT automation support requirements.