I/O Optimization for HydraGNN Ensemble Training
Engineered a Rust-based object-centric data store to replace ADIOS for materials-science data, providing unified NDArray storage (multiple NDArrays per data object) and improving overall I/O throughput by 54% in a PyTorch-based materials-science ensemble-training scenario.
Designed and implemented a DataLoader coordinator around the PyTorch Sampler, resolving I/O contention and achieving a 48% reduction in I/O wait time during concurrent ensemble GNN training.
Built an elastic scaling framework that automated DDP/NCCL port management for 512+ GPU clusters across 64-1024-node configurations.
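The contention-avoiding Sampler idea above can be sketched as follows. All names here are illustrative, not the actual HydraGNN code, and a real version would subclass torch.utils.data.Sampler; the point is that each rank visits shards in a rotated order, so concurrent ranks do not read the same file at the same time:

```python
class StaggeredShardSampler:
    """Yield sample indices grouped by on-disk shard, with each rank
    starting at a different shard so concurrent workers avoid hitting
    the same file simultaneously. (Sketch only; names are hypothetical.)"""

    def __init__(self, num_samples, shard_size, rank, world_size):
        self.num_samples = num_samples
        self.shard_size = shard_size
        self.rank = rank
        self.world_size = world_size

    def __iter__(self):
        num_shards = -(-self.num_samples // self.shard_size)  # ceil division
        # Rotate the shard visit order by rank so ranks start on
        # disjoint shards and spread their I/O across the store.
        offset = self.rank * num_shards // self.world_size
        order = [(s + offset) % num_shards for s in range(num_shards)]
        for shard in order:
            start = shard * self.shard_size
            end = min(start + self.shard_size, self.num_samples)
            yield from range(start, end)

    def __len__(self):
        return self.num_samples
```

Every rank still visits every sample exactly once per epoch; only the visit order changes.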
IDIOMS Distributed Metadata Indexing and Querying Engine (CCGrid’24 First Author)
Designed IDIOMS, an index-powered, distributed metadata search system that enables high-performance affix-based metadata querying in object-centric data management (ODM) systems.
Engineered double-layered trie-based distributed index, supporting prefix, suffix, infix, and exact metadata searches, achieving 407× faster independent queries and 300× faster collective queries than SQLite-based alternatives.
Integrated DART (Distributed Adaptive Radix Tree) for deterministic query routing, reducing query communication overhead by 92%, enabling scalable metadata indexing across HPC clusters.
Validated IDIOMS on NERSC’s Perlmutter supercomputer across 128 nodes and 16k CPU cores with 1M+ objects and 10M+ metadata attributes, demonstrating at least 370× better performance than state-of-the-art methods with ≤52.57% memory overhead.
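The core of the affix-based index can be illustrated with a toy single-node version: one trie over the keys and one over the reversed keys makes prefix and suffix lookups both O(len(affix)). This is a sketch with hypothetical names; infix and exact search, and the distributed DART routing layer, are omitted:

```python
class _TrieNode:
    def __init__(self):
        self.children = {}
        self.ids = set()  # object IDs reachable through this node

class AffixIndex:
    """Toy in-memory affix index: a forward trie answers prefix queries,
    a trie over reversed keys answers suffix queries."""

    def __init__(self):
        self.forward = _TrieNode()
        self.backward = _TrieNode()

    def insert(self, key, obj_id):
        for root, word in ((self.forward, key), (self.backward, key[::-1])):
            node = root
            for ch in word:
                node = node.children.setdefault(ch, _TrieNode())
                node.ids.add(obj_id)

    def _walk(self, root, affix):
        node = root
        for ch in affix:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.ids)

    def prefix(self, p):
        return self._walk(self.forward, p)

    def suffix(self, s):
        # A suffix of the key is a prefix of the reversed key.
        return self._walk(self.backward, s[::-1])
```

Storing the ID sets on every node trades memory for constant-depth lookups, which is the same trade-off the memory-overhead numbers above quantify.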
BULKI - Binary Unified Layout for Key-value Interchange
Designed BULKI, a next-generation binary data format for scientific workflows, achieving 50% smaller serialized size and 100× faster parsing compared to MessagePack in metadata-intensive scenarios (1,000+ attributes).
Nested Data Structures - Support for recursive embedding of scalar values, arrays, and nested entities
Self-Describing Metadata - VLE-encoded metadata enables machine-readable parsing without schema predefinition
Compact Storage - Optimized binary layout reduces overhead by 37% for multi-dimensional astronomy datasets
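As an illustration of the VLE (variable-length encoding) idea above, here is a minimal LEB128-style varint codec; this is a generic sketch of the technique, not BULKI's actual wire format:

```python
def encode_varint(n):
    """Encode a non-negative int with 7 payload bits per byte; the high
    bit marks continuation, so small values (the common case for
    metadata lengths) cost a single byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Decode a varint from buf starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Because each length field carries its own terminator, a reader can parse nested entities without any external schema.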
AI-powered Data Discovery for Gray Graph Engine (Collaborative Research with Hewlett Packard Enterprise (HPE))
Led a cross-institutional team across LBNL, OSU, and HPE to optimize AI inference workflows in HPE’s Cray Graph Engine (CGE), mentoring a Ph.D. student in Dr. Suren Byna’s group and achieving 63% faster query latency for scientific datasets through -
Dynamic UDF Filtering - Optimized the CV inference workflow in both animal-taxonomy and facial-recognition use cases, enabling real-time feature extraction over 90k+ images from the OpenImages dataset;
Three-Stage Caching Framework - Designed object/target/feature caching strategies, reducing redundant AI computations by 82% across 128-node clusters on Perlmutter;
Feature caching reduced AI inference query time from 74.1s → 0.42s (176× speedup);
Data ingestion overhead limited to less than 5%;
Paper published at IEEE BigData 2024 (acceptance rate - 19.7%)
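The feature-stage cache above can be sketched as a content-addressed memo table; every name here is hypothetical ("run_model" stands in for the real CV inference call inside CGE's UDF pipeline), and the actual three-stage design is more involved:

```python
import hashlib

_feature_cache = {}  # (content hash, model tag) -> extracted features

def cached_features(obj_bytes, model_tag, run_model):
    """Feature-stage cache sketch: key on a hash of the raw object plus
    the model tag, so repeated queries over the same object reuse the
    extracted features instead of re-running inference."""
    key = (hashlib.sha256(obj_bytes).hexdigest(), model_tag)
    if key not in _feature_cache:
        _feature_cache[key] = run_model(obj_bytes)
    return _feature_cache[key]
```

Keying on content rather than file path lets the cache survive object renames, and including the model tag keeps stale features from one model version leaking into another.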
Semantic Query over Scientific Datasets
Mentored a Ph.D. student in Dr. Yong Chen’s group on the intersection of natural language processing, semantic querying, LLMs, RAG, and scientific data discovery. Deliverables include -
Kv2vec - LSTM-based vector embedding for key-value pairs in scientific metadata, reducing semantic search errors from 17.3% → 3.1% (by 80%) vs. traditional methods (published in IEEE HPEC’22).
PSQS - a parallel semantic metadata querying service over self-describing data formats, achieving a 20% improvement in query hit rate and 15% higher recall (published in IEEE BigData 2023).
ICEAGE - an LLM-powered metadata search engine that outperforms traditional keyword-based and code-generation methods, achieving 98% query accuracy and 5.43× higher throughput in CPU-based environments and 29.52× in GPU-accelerated settings for scientific data retrieval (under review).
Metastore Integration with OCI Data Catalog
Owned and led the integration of OCI Data Catalog (DCAT) into OCI Big Data Service, ensuring seamless metadata federation across 12+ components (HDFS, Spark, Hive, etc.).
Designed and implemented automated DCAT enablement & disablement, ensuring zero-touch configuration for customers deploying big data clusters on OCI cloud.
Engineered a distributed metadata orchestration framework, reducing sync latency by 67% and enhancing data lineage tracking across 50+ OCI regions.
Developed a region-aware service provisioning mechanism, enabling 5× faster onboarding of new OCI regions, reducing manual operational overhead by 80%.
Security & Identity Management for OCI Big Data Service
Owned and implemented the Active Directory (AD) integration, enabling enterprise-grade authentication for all 12+ big data components, reducing enterprise onboarding time by 65%.
Standardized UID/GID registration and access control frameworks, ensuring 85% fewer identity conflicts in multi-tenant cloud environments.
Designed policy-driven user access enforcement, ensuring SOC 2, ISO 27001, and FedRAMP compliance for big data deployments across public and private OCI regions.
Developed role-based security automation, reducing manual security enforcement overhead by 40%, ensuring zero-trust compliance at scale.
External Service Integration Framework for Customizable Cluster
Designed and implemented a dynamic cluster profile switching mechanism, allowing users to select and provision tailored big data environments, reducing cluster deployment time by 60%.
Developed a pluggable integration model, ensuring that external services (DCAT, Object Storage, etc.) dynamically adapt to different component combinations, providing future-proof extensibility as new profiles (e.g., Iceberg, Delta Lake) are introduced.
Architected a multi-region-aware deployment strategy, ensuring low-latency interoperability across OCI’s 50+ public regions and private cloud environments.
Ensured robust cross-component compatibility, allowing customers to seamlessly mix-and-match big data components without service disruption, enhancing cluster flexibility by 4×.
AI-driven Data Retention Framework (published in SC ‘21)
Led cross-institutional collaboration involving 2 national labs and an R1 university, resulting in -
Pioneered ActiveDR, an AI-driven data retention system designed for HPC environments, optimizing storage efficiency by prioritizing file retention for active users while purging inactive ones.
Evaluated on Titan supercomputer traces, ActiveDR reduces file misses by up to 37% and retains 213% more data for active users at the same level of space utilization as traditional fixed-lifetime retention methods.
Metadata Indexing and High-Performance Data Search Engine for HPC Data Management
Designed and implemented MIQS, a Metadata Indexing and Querying Service for high-performance metadata search over self-describing data formats such as HDF5 and NetCDF. MIQS enables direct metadata indexing over data, eliminating the need to set up external databases. It achieves 172k× faster metadata searches than MongoDB-based solutions, a 99% reduction in index construction time, and a 75% lower memory footprint, making it an efficient and portable alternative for HPC scientific data management (Published in SC ‘19).
Designed and implemented DART, a Distributed Adaptive Radix Tree for high-performance distributed affix-based keyword search, achieving 55× higher throughput than Distributed Hash Tables (DHTs) for prefix and suffix searches while ensuring balanced keyword distribution, reducing query contention and improving scalability (Published in PACT ‘18).
Graph Partitioning Algorithms for Distributed Graph Databases
Designed and implemented AKIN, a similarity-based streaming graph partitioning algorithm that improves data locality and reduces the edge-cut ratio in distributed graph storage systems. By leveraging vertex similarity, AKIN reduces the edge-cut ratio by up to 20% compared to FENNEL while maintaining balanced partitions with minimal overhead, making it a superior alternative to IOGP for real-time graph partitioning (Published in CCGrid ‘18).
Initiated and shaped the core idea of IOGP, the first multi-stage, online graph partitioning algorithm designed for distributed graph databases, dynamically adjusting partitions as data evolves. By leveraging vertex connectivity and degree changes, IOGP improves query performance by 2× while maintaining balanced partitions with less than 10% overhead compared to FENNEL (Published in HPDC ‘17).
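The similarity-driven streaming placement behind these partitioners can be sketched as a greedy score that trades neighbor overlap against partition load. This is a simplified illustration with made-up names; the published algorithms use more refined scoring:

```python
def assign_partition(neighbors, partitions, capacity):
    """Greedy streaming assignment in the spirit of AKIN/FENNEL: place
    an incoming vertex where it shares the most neighbors (locality),
    penalized by how full that partition already is (balance).
    `partitions` is a list of sets of already-placed vertex IDs."""
    best, best_score = 0, float("-inf")
    for i, part in enumerate(partitions):
        overlap = len(neighbors & part)   # locality term: shared neighbors
        balance = len(part) / capacity    # load term in [0, 1]
        score = overlap - balance
        if score > best_score:
            best, best_score = i, score
    return best
```

Each vertex is placed in one pass as it streams in, which is what makes this family of algorithms suitable for online partitioning of evolving graphs.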
Other Achievements
Secured NSF research funding by leading the successful development and submission of a competitive proposal, demonstrating expertise in grant writing and research leadership.
Mentored a Ph.D. student, guiding their research direction, publication strategy, and technical growth, contributing to the advancement of the research group.
Released two open-source software tools, enabling broader adoption of HPC research innovations.
Scalable Geospatial Data Mining & Visualization
Designed and deployed a distributed geospatial data mining infrastructure using Apache Spark, HBase, and HDFS, reducing overall large-scale geospatial analytics time from 12 months to 3 months.
Optimized big data compression and indexing strategies, reducing storage overhead by 40% and improving query latency by 60% for spatial datasets.
Developed a real-time geospatial visualization system, leveraging GDAL, Redis, and NodeJS, enabling high-resolution mapping of social media demographics across 10+ regions.
Conducted advanced geospatial data mining and demographic analysis on 500M+ geo-tagged Twitter records across 5 years, providing insights into geospatial demographic patterns and socioeconomic trends.
Scalable Data API & Messaging Architecture
Architected & developed Meshwork, a high-performance graph-based data access API supporting MySQL & Redis, increasing query efficiency by 5× in distributed systems.
Designed & implemented BrookSide, a low-latency AMQP-based messaging framework using RabbitMQ, improving inter-service communication by 40%.
Led the development of Webshot-rest-amqp-service, a Node.js-based snapshot automation tool, enabling real-time monitoring & web scraping at 10× faster capture rates.
Developed PCVF, a Parameter Constraining & Validation Framework for secure REST API governance, reducing API failures by 60% in mission-critical applications.
Integrated DevOps best practices with Maven, Jenkins, and JUnit, streamlining RESTful API deployment and improving CI/CD efficiency by 3×.
Weibo Platform API & Scalable Data Services
Designed & Unified Weibo’s REST API, setting the foundation for its scalable platform (Patented: CN103049271B), improving API consistency & usability for 100M+ users.
Owned & led the development of T.cn, Weibo’s high-availability URL shortening service, processing millions of daily requests with sub-ms latency.
Owned & led Weibo’s User Data Service, a critical data infrastructure handling billions of user interactions, ensuring 99.99% uptime & horizontal scalability.
Optimized API query pipelines, reducing data retrieval latency by 50% and enhancing real-time content recommendations.
Business Data Management & Operational Systems
Developed & optimized a business data management system, improving operational efficiency by 3×.
Designed & implemented a batch processing framework, optimizing data transformation pipelines for higher throughput.
Developed & maintained an operation management system, ensuring seamless feature integration & data consistency.
Implemented a business reporting module, enabling automated, real-time business insights.