Postdoctoral Researcher
Lawrence Berkeley National Laboratory
Mar 2023 - Now
- Led the IDIOMS project, addressing technical challenges in distributed metadata indexing to enhance data management capabilities while building expertise in distributed computing and leadership, in line with the Laboratory's mission to advance scientific data management.
- Led the exploration and implementation of data I/O optimization techniques for Graph Neural Networks (GNNs) in scientific applications. This research improves AI training efficiency by refining data access patterns and storage techniques, contributing to more scalable AI solutions.
- Spearheaded research initiatives and mentored Ph.D. students in AI for Scientific Data Discovery, which aims to transform how researchers interact with and derive insights from complex data. This work spans domain-specific models, large language models (LLMs), and retrieval-augmented generation (RAG) techniques.
- Developed BULKI, a novel data format designed to address limitations of traditional data serialization methods. BULKI's flexibility and efficiency make it well suited to advanced data management and AI-driven applications, particularly in environments that require rapid, adaptable data processing.
- Led the evaluation and optimization of scientific data management techniques, including enhancing metadata indexing and querying within the Proactive Data Container (PDC) project and advancing multi-dimensional data stitching for lattice light-sheet microscopy. These efforts significantly improved data retrieval and processing capabilities in scientific applications.
Visiting Researcher
The Ohio State University
Mar 2023 - Now
- Research collaboration with Prof. Suren Byna's group on scientific data management, with a focus on data discovery, including AI-powered data discovery.
- Mentoring junior Ph.D. students in scientific data management research.
Adjunct Research Scientist
Texas Tech University
Jun 2021 - Now
- Research collaboration with Prof. Yong Chen's group on scientific data management, particularly focusing on data discovery, provenance, and semantic scientific data discovery.
- Mentoring junior Ph.D. students in semantic scientific data discovery, exploring the intersection of natural language processing, semantic queries, LLMs, RAG, and scientific data discovery.
Senior Member of Technical Staff
Oracle Corporation
Jul 2021 - Feb 2023
- Directed critical initiatives within Oracle’s OCI Big Data Service, including the integration of the OCI Data Catalog Metastore and the optimization of Active Directory. These projects enhanced platform security, reliability, and scalability, and established best practices that drove operational excellence across the organization.
- Spearheaded the development and implementation of key coordination and security services, including the UID/GID Coordinating Service and Security Management Module. These efforts significantly strengthened system efficiency and resilience, ensuring robust performance and adaptability in a high-demand environment.
- Led the design and execution of the External Service Integration Framework in the Cluster Profile Project, focusing on metadata-driven access control. This initiative was pivotal in enhancing system flexibility and supporting the platform’s scalability to meet evolving business needs.
Research Assistant
Data-Intensive Scalable Computing Laboratory (DISCL), Texas Tech University
Aug 2017 - May 2021
- Pioneered the development of advanced data management solutions for HPC systems, including a Distributed Adaptive Radix Tree for affix-based keyword search and a Metadata Indexing and Querying Service for self-describing data formats, significantly enhancing search and data retention capabilities. This work was published at PACT '18, SC '19, and HiPC '20.
- Spearheaded the development of new approaches to graph partitioning and storage resource management, including a Similarity-based Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems, published at HPDC '17 and CCGrid '18. These contributions helped optimize storage and computational efficiency in distributed systems.
- Led a collaborative effort bringing together researchers from two national laboratories and an R1 university to develop a data retention solution for data centers. This research, published at SC '21, highlights how analyzing user activeness can improve data management efficiency in high-performance computing systems.
- Led the successful development and submission of an NSF funding proposal, demonstrating strong expertise in grant writing and research project management.
- Mentored and guided a junior Ph.D. student, fostering their academic and research growth, while contributing to the advancement of the research group.
- Two software releases:
Research Assistant
STARLab, Texas Tech University
Jan 2016 - Dec 2016
- Led the development and deployment of a comprehensive data mining infrastructure built on Apache Spark, HBase, and HDFS, including the setup of a distributed big data cluster designed for efficient geospatial data mining.
- Directed strategic optimizations across the big data software stack, with a focus on implementing unified data compression techniques. These optimizations significantly enhanced the performance and scalability of geospatial data processing workflows.
- Pioneered scalable geospatial visualization by deploying a visualization system for the geographic distribution of social media users. Built with GDAL, NodeJS, Python, and Redis, it gave geoscientists tools to analyze and interpret complex spatial data.
- Executed advanced geospatial data mining and demographic analysis using technologies such as Apache Spark, HBase, and Hadoop. This work included extracting and analyzing demographic information from Twitter data, offering new insights into population dynamics and social behavior.
- Initiated and led innovative research projects, including a comparative study titled "Remote Sensing and Social Sensing for Socioeconomic Systems," which examined the differences between Nighttime Lights and Location-based Social Media data. This research highlighted the potential of integrating social media data with traditional remote sensing for enhanced socioeconomic analysis.
Senior System R&D Engineer
Beijing Serious Technology Co., Ltd.
Jan 2014 - Jan 2016
- Architected and developed Meshwork, a sophisticated graph-like data access API supporting both MySQL and Redis for seamless and optimized data retrieval. This API significantly improved data access efficiency across distributed systems.
- Designed and implemented BrookSide, a high-performance message processing framework built on the AMQP protocol with RabbitMQ as the message broker, enabling efficient and reliable communication across microservices.
- Led the development of the Webshot-rest-amqp-service project, a cutting-edge NodeJS application that automates the capture of website snapshots based on AMQP messages from RabbitMQ. This tool enhanced automated monitoring and web scraping capabilities.
- Spearheaded the creation of PCVF, a robust Parameter Constraining and Validating Framework for RESTful Web Service APIs, developed as part of a confidential project. This framework ensured API reliability and security, aligning with stringent project requirements.
- Guided the adoption of advanced DevOps practices, including Maven, Jenkins, unit testing, and a custom document generator. These practices supported the development of RESTful Web Service APIs, ensuring compatibility with the PCVF framework and improving the development workflow.
System R&D Engineer
Sina.com Technology (China) Co., Ltd.
Jul 2010 - May 2013
- Spearheaded the unification and optimization of the Weibo REST API, laying the foundation for the platform's scalable growth by standardizing design, development, documentation, and testing processes. This initiative, which resulted in patent CN103049271B, enabled Weibo to evolve efficiently and meet the demands of its rapidly expanding user base.
- Owned and led the development of T.cn, Weibo's URL shortening service, overseeing all critical aspects including design, implementation, and tracking infrastructure. This project was pivotal in enhancing content management and analytics across the platform.
- Led the User Data Service team at Weibo's data platform, driving the development of new features, advancing technical capabilities, and ensuring the scalability and stability of the platform’s critical data services.
Senior Software Developer
Beijing JustMusic Co., Ltd.
Feb 2009 - Jun 2010
- Spearheaded the end-to-end development of a sophisticated business data management system, leveraging software engineering expertise to ensure efficient data handling, storage, and retrieval.
- Designed and implemented a streamlined batch processing framework, enabling seamless execution of data processing tasks and optimizing system performance for enhanced productivity.
Software Developer
Beijing Datuu.com Technology Co., Ltd.
Jan 2008 - Jan 2009
- Developed an operation management system, taking charge of routine feature development, data maintenance, and ensuring seamless integration of essential functionalities.
- Implemented a robust business reporting module within the operation management system, enabling accurate and timely generation of business reports to support informed decision-making.