Scalable I/O Optimization for Large-scale Parallel GNN Training
Led the architectural design of a Rust-based NDArray data store 💾, improving I/O throughput by 31–135× in PyTorch-based graph neural network (GNN) ensemble training workflows.
Engineered an I/O-aware PyTorch DataLoader, significantly reducing contention and cutting I/O wait time by 99% under high-concurrency training scenarios.
Delivered a high-performance, production-ready system that bridges AI model scalability and I/O performance across modern HPC clusters.
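The contention-reduction idea above can be sketched in a few lines. This is an illustrative toy, not the Rust store or the production DataLoader: it assumes a hypothetical `MmapNdarrayStore` that packs samples into one memory-mapped file, so concurrent training workers share the OS page cache instead of issuing many small per-sample reads.

```python
# Hypothetical sketch: serve training batches from one memory-mapped file
# instead of per-sample file reads, so concurrent workers avoid contending
# on small random I/O. Names here are illustrative, not the real system.
import os
import tempfile
import numpy as np

class MmapNdarrayStore:
    """Minimal stand-in for an NDArray data store backed by one large file."""
    def __init__(self, path, shape, dtype=np.float32):
        self.path, self.shape, self.dtype = path, shape, dtype

    @classmethod
    def create(cls, path, array):
        mm = np.memmap(path, dtype=array.dtype, mode="w+", shape=array.shape)
        mm[:] = array          # one sequential write instead of many small ones
        mm.flush()
        return cls(path, array.shape, array.dtype)

    def batch(self, indices):
        # Readers share the OS page cache; no per-item open()/read() calls.
        mm = np.memmap(self.path, dtype=self.dtype, mode="r", shape=self.shape)
        return np.asarray(mm[indices])

# Usage: write 1000 feature vectors once, then fetch a batch by index,
# as a DataLoader's __getitem__/collate path might.
path = os.path.join(tempfile.mkdtemp(), "features.bin")
data = np.arange(1000 * 4, dtype=np.float32).reshape(1000, 4)
store = MmapNdarrayStore.create(path, data)
batch = store.batch([0, 5, 999])
```

A real I/O-aware loader would layer prefetching and worker-affinity on top, but the core win is the same: amortize file opens and let the page cache absorb concurrency.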
Exascale End-to-End Object-Centric Data Management
Spearheaded the evolution of the team’s software release management standard, unifying the developer experience, adopting DevOps best practices, and improving the software release review process. These efforts gave the team better control over the pace of software development and research execution.
Architected IDIOMS 💾, a high-performance, trie-based distributed metadata search system, achieving 407× faster independent queries and 300× faster collective queries than SQLite (published in CCGrid 2024).
Designed a schema-less binary format for data serialization, achieving up to 10% space reduction compared to MessagePack with a better API for self-guided data parsing (presented at SC’24).
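The trie-based search idea behind IDIOMS can be illustrated with a minimal sketch. This is not the IDIOMS implementation (which is distributed and far more involved); it only shows how a prefix trie maps metadata keywords to object IDs without an external database, with all names being hypothetical:

```python
# Illustrative sketch (not IDIOMS itself): a prefix trie mapping metadata
# keywords to object IDs, supporting exact and prefix queries in-memory.
class MetadataTrie:
    def __init__(self):
        self.root = {}  # char -> child node; "$" -> set of object IDs

    def insert(self, keyword, obj_id):
        node = self.root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node.setdefault("$", set()).add(obj_id)

    def query_prefix(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return set()
            node = node[ch]
        # Collect IDs from the whole subtree under the prefix.
        out, stack = set(), [node]
        while stack:
            n = stack.pop()
            out |= n.get("$", set())
            stack.extend(c for k, c in n.items() if k != "$")
        return out

idx = MetadataTrie()
idx.insert("temperature", "obj1")
idx.insert("temp_max", "obj2")
idx.insert("pressure", "obj3")
# idx.query_prefix("temp") now resolves to both matching objects.
```

Sharding such a trie across servers (so each handles a keyspace partition) is what turns this single-node structure into a distributed search service.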
AI-powered Data Discovery for Gray Graph Engine (Collaborative Research with Hewlett Packard Enterprise (HPE))
Led a cross-institutional team across LBNL, OSU, and HPE to optimize AI inference workflows in HPE’s Cray Graph Engine (CGE), mentoring the Ph.D. student in Dr. Suren Byna’s group and achieving 63% faster query latency for scientific datasets through -
Dynamic UDF Filtering - Optimized the CV inference workflow in both animal taxonomy and facial recognition use cases, enabling real-time feature extraction over 90k+ OpenImages images;
Three-Stage Caching Framework - Designed object/target/feature caching strategies, reducing redundant AI computations by 82% across 128-node clusters on Perlmutter;
Feature caching reduced AI inference query time from 74.1s → 0.42s (176× speedup);
Data ingestion overhead limited to less than 5%;
Paper published at IEEE BigData 2024 (acceptance rate - 19.7%)
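The feature-caching stage above can be sketched with stdlib memoization. This is a deliberately simplified, hypothetical stand-in (names like `extract_features` are illustrative, not the CGE API), showing why repeated queries skip the expensive inference call entirely:

```python
# Hypothetical sketch of the feature-caching idea: memoize per-object
# inference results so repeated queries never re-run the model.
# The real framework layers object/target/feature caches; this shows
# only the innermost (feature) stage.
from functools import lru_cache

calls = {"n": 0}  # counts how often the "model" actually runs

@lru_cache(maxsize=4096)  # feature cache: object ID -> feature vector
def extract_features(obj_id):
    calls["n"] += 1       # stands in for a costly CV inference pass
    return tuple(float(ord(c)) for c in obj_id)

# First query pays the inference cost; the repeats are cache hits.
for _ in range(3):
    feats = extract_features("img_001")
```

In a distributed setting the cache key/value store would be shared across nodes, but the accounting is the same: redundant computations collapse to lookups.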
Semantic Query over Scientific Datasets
Mentored the Ph.D. student in Dr. Yong Chen’s group on exploring the intersection of natural language processing, semantic query, LLMs, RAG, and scientific data discovery.
Kv2vec - an LSTM-based vector embedding for key-value pairs in scientific metadata, reducing semantic search errors from 17.3% to 3.1% (an 82% reduction) vs. traditional methods (published in IEEE HPEC’22).
PSQS - a parallel semantic metadata querying service over self-describing data formats, achieving a 20% improvement in query hit rate and 15% higher recall (published in IEEE BigData 2023).
ICEAGE - an LLM-powered metadata search engine that outperforms traditional keyword-based and code-generation methods, achieving 98% query accuracy and 5.43× higher throughput in CPU-based environments and 29.52× in GPU-accelerated settings for scientific data retrieval (under review).
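The common thread of these systems is embedding-and-rank semantic search. The sketch below is a toy stand-in: a hashing "embedding" replaces the learned LSTM/LLM embeddings of Kv2vec/ICEAGE, and the corpus entries are invented, but the ranking-by-cosine-similarity mechanics are the same:

```python
# Illustrative sketch of embedding-based semantic metadata search:
# represent key-value metadata as vectors and rank by cosine similarity.
# A toy hashing embedding stands in for the learned models; the corpus
# entries below are hypothetical.
import numpy as np

def embed(text, dim=256):
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0   # bag-of-words hashing "embedding"
    return vec / (np.linalg.norm(vec) or 1.0)

corpus = {
    "run42": "variable temperature units kelvin",
    "run43": "variable pressure units pascal",
}
vecs = {k: embed(v) for k, v in corpus.items()}

def search(query):
    q = embed(query)
    # Rank datasets by cosine similarity (vectors are unit-normalized).
    return max(vecs, key=lambda k: float(q @ vecs[k]))
```

Swapping `embed` for a trained model is what moves this from keyword overlap to true semantic matching ("heat" finding "temperature"), which is where the reported error reductions come from.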
Metastore Integration with OCI Data Catalog
Owned and led the integration of OCI Data Catalog (DCAT) into OCI Big Data Service, ensuring seamless metadata federation across 12+ components (HDFS, Spark, Hive, etc.).
Designed and implemented automated DCAT enablement & disablement, ensuring zero-touch configuration for customers deploying big data clusters on OCI cloud.
Engineered a distributed metadata orchestration framework, reducing sync latency by 67% and enhancing data lineage tracking across 50+ OCI regions.
Security & Identity Management for OCI Big Data Service
Owned and implemented the Active Directory (AD) integration, enabling enterprise-grade authentication for all 12+ big data components, reducing enterprise onboarding time by 65%.
Standardized UID/GID registration and access control frameworks, ensuring 85% fewer identity conflicts in multi-tenant cloud environments.
External Service Integration Framework for Customizable Cluster
Designed and implemented a dynamic cluster profile switching mechanism, allowing users to select and provision tailored big data environments, reducing cluster deployment time by 60%.
Developed a pluggable integration model, ensuring that external services (DCAT, Object Storage, etc.) dynamically adapt to different component combinations, providing future-proof extensibility as new profiles (e.g., Iceberg, Delta Lake) are introduced.
Architected a multi-region-aware deployment strategy, ensuring low-latency interoperability across OCI’s 50+ public regions and private cloud environments.
AI-driven Data Retention Framework (published in SC ‘21)
Led a cross-institutional collaboration involving 2 national labs and an R1 university, resulting in -
Pioneered ActiveDR, an AI-driven data retention system designed for HPC environments, optimizing storage efficiency by prioritizing file retention for active users while purging inactive ones.
Evaluated on Titan supercomputer traces, ActiveDR reduced file misses by up to 37% and retained 213% more data for active users at the same level of space utilization as traditional fixed-lifetime retention methods.
Metadata Indexing and High-Performance Data Search Engine for HPC Data Management
Designed and implemented MIQS - a Metadata Indexing and Querying Service for high-performance metadata search over self-describing data formats such as HDF5 and NetCDF. MIQS enables direct metadata indexing over the data itself, eliminating the need to set up external databases. It achieves 172k× faster metadata searches than MongoDB-based solutions, a 99% reduction in index construction time, and a 75% lower memory footprint, making it an efficient and portable alternative for HPC scientific data management (published in SC ‘19).
Designed and implemented DART - a Distributed Adaptive Radix Tree for high-performance distributed affix-based keyword search, achieving 55× higher throughput than distributed hash tables (DHTs) for prefix and suffix searches while ensuring balanced keyword distribution, reducing query contention and improving scalability (published in PACT ‘18).
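The affix-search idea behind DART can be shown with a minimal single-node sketch: index each keyword twice, forward for prefix queries and reversed for suffix queries, so both become ordered-range scans. This is only the affix trick under simplified assumptions; the real DART distributes an adaptive radix tree across servers.

```python
# Illustrative sketch of affix-based keyword search (not DART itself):
# store keywords forward and reversed, so prefix AND suffix queries
# both reduce to a sorted-range scan.
import bisect

class AffixIndex:
    def __init__(self, keywords):
        self.fwd = sorted(keywords)
        self.rev = sorted(w[::-1] for w in keywords)

    @staticmethod
    def _range(arr, prefix):
        lo = bisect.bisect_left(arr, prefix)
        hi = bisect.bisect_left(arr, prefix + "\uffff")  # end of prefix range
        return arr[lo:hi]

    def prefix(self, p):
        return self._range(self.fwd, p)

    def suffix(self, s):
        # A suffix query is a prefix query over the reversed keywords.
        return [w[::-1] for w in self._range(self.rev, s[::-1])]

idx = AffixIndex(["energy", "entropy", "density", "velocity"])
```

Hash-based partitioning destroys this key ordering (hence DHTs' poor affix performance); keeping radix/ordered structure while still balancing load is exactly the problem DART solves.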
Graph Partitioning Algorithms for Distributed Graph Databases
Designed and implemented AKIN - a similarity-based streaming graph partitioning algorithm that improves data locality and reduces the edge-cut ratio in distributed graph storage systems. By leveraging vertex similarity, AKIN reduces the edge-cut ratio by up to 20% compared to FENNEL while maintaining balanced partitions with minimal overhead, making it a superior alternative to IOGP for real-time graph partitioning (published in CCGrid ‘18).
Initiated and shaped the core idea of IOGP - the first multi-stage, online graph partitioning algorithm designed for distributed graph databases, dynamically adjusting partitions as data evolves. By leveraging vertex connectivity and degree changes, IOGP improves query performance by 2× while maintaining balanced partitions with less than 10% overhead compared to FENNEL (published in HPDC ‘17).
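The streaming-partitioning family these algorithms belong to can be sketched in a few lines. This toy is AKIN-flavored but heavily simplified (real AKIN uses a richer vertex-similarity measure; the scoring below is a generic FENNEL-style locality-minus-balance heuristic, and the example graph is invented):

```python
# Toy sketch of similarity-driven streaming partitioning (simplified,
# not the published AKIN/IOGP algorithms): each arriving vertex goes to
# the partition holding the most of its neighbors, discounted by a
# balance penalty so partitions stay even.
def stream_partition(edges_per_vertex, k=2, gamma=0.5):
    parts = [set() for _ in range(k)]
    assign = {}
    for v, nbrs in edges_per_vertex.items():
        def score(i):
            locality = len(parts[i] & set(nbrs))      # neighbors already there
            return locality - gamma * len(parts[i])    # balance penalty
        best = max(range(k), key=score)
        parts[best].add(v)
        assign[v] = best
    return assign

graph = {  # two triangles joined by the single edge c-d
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
assign = stream_partition(graph)
# Each triangle lands in its own partition; only edge c-d is cut.
```

The edge-cut ratio this heuristic minimizes is exactly the cross-partition traffic a distributed graph database pays on traversal queries, which is why the 20% edge-cut reduction translates into query-time wins.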
Other Achievements
Secured NSF research funding by leading the successful development and submission of a competitive proposal, demonstrating expertise in grant writing and research leadership.
Mentored a Ph.D. student, guiding their research direction, publication strategy, and technical growth, contributing to the advancement of the research group.
Released two open-source software tools, enabling broader adoption of HPC research innovations.
Scalable Geospatial Data Mining & Visualization
Designed and deployed a distributed geospatial data mining infrastructure using Apache Spark, HBase, and HDFS, reducing overall large-scale geospatial analytics time from 12 months to 3 months.
Optimized big data compression and indexing strategies, reducing storage overhead by 40% and improving query latency by 60% for spatial datasets.
Developed a real-time geospatial visualization system, leveraging GDAL, Redis, and NodeJS, enabling high-resolution mapping of social media demographics across 10+ regions.
Conducted advanced geospatial data mining and demographic analysis on 500M+ geo-tagged Twitter records across 5 years, providing insights into geospatial demographic patterns and socioeconomic trends.
Scalable Data API & Messaging Architecture
Architected & developed Meshwork, a high-performance graph-based data access API supporting MySQL & Redis, increasing query efficiency by 5× in distributed systems.
Designed & implemented BrookSide, a low-latency AMQP-based messaging framework using RabbitMQ, improving inter-service communication by 40%.
Led the development of Webshot-rest-amqp-service, a Node.js-based snapshot automation tool, enabling real-time monitoring & web scraping at 10× faster capture rates.
Developed PCVF, a Parameter Constraining & Validation Framework for secure REST API governance, reducing API failures by 60% in mission-critical applications.
Integrated DevOps best practices with Maven, Jenkins, and JUnit, streamlining RESTful API deployment and improving CI/CD efficiency by 3×.
Weibo Platform API & Scalable Data Services
Designed & unified Weibo’s REST API, setting the foundation for its scalable platform (Patented: CN103049271B), improving API consistency & usability for 500M+ users.
Owned & led the development of T.cn, Weibo’s high-availability URL shortening service, processing millions of daily requests with sub-ms latency.
Owned & led Weibo’s User Data Service, a critical data infrastructure handling billions of user interactions, ensuring 99.9% uptime & horizontal scalability.
Optimized API query pipelines, reducing data retrieval latency by 20% and enhancing real-time content recommendations.
Business Data Management & Operational Systems
Developed & optimized a business data management system, improving operational efficiency by 3×.
Designed & implemented a batch processing framework, optimizing data transformation pipelines for higher throughput.
Developed & maintained an operation management system, ensuring seamless feature integration & data consistency.
Implemented a business reporting module, enabling automated, real-time business insights.