Undergraduate Student Research

Undergraduate Research Symposium Sub-Competition

The Mississippi State University (MSU) Data Science Program Undergraduate Research Symposium Sub-Competition recognizes excellence in undergraduate research relevant to data science.

Development of Parameter Dependent conditional Generative Adversarial Network (PDcGAN) Model for Multi-Phase Flow Prediction

Overall Winner: Mississippi State University (MSU) Data Science Program Undergraduate Research Symposium Sub-Competition. Minjae Cho's elegant and well-composed poster presented a project that included data acquisition and storage, the application of artificial intelligence methods to predict the complex spray and air-fuel mixing in gasoline direct-injection (GDI) engines, and a sophisticated interpretation of results. Minjae's confident verbal presentation gave clear explanations of complex concepts and demonstrated his deep engagement with both the subject matter and the data science aspects of the project. Minjae Cho is a Mechanical Engineering major in MSU's Bagley College of Engineering. His faculty advisor is Sungkwang Mun of the MSU Center for Advanced Vehicular Systems.

Machine learning and artificial intelligence (ML/AI) techniques have had a significant impact on various scientific and engineering fields by uncovering new insights from data or predicting previously untested properties. Deep learning, particularly in the form of artificial neural networks, has been instrumental in the successes of ML/AI. Our research focuses on developing a deep learning framework using a conditional Generative Adversarial Network (cGAN) to predict the complex spray and air-fuel mixing in gasoline direct-injection (GDI) engines. This has been a major challenge due to the complicated two-phase flow dynamics involved. Our Parameter Dependent cGAN (PDcGAN) predicts the fuel's 3D shape based solely on its spray condition and fuel properties. Our training dataset consists of nine different fuels sprayed through multi-hole injectors 100 times each and projected into a combustion chamber. The projected images are acquired from three different camera angles and later used for 3D reconstruction as post-processing. These images are converted to pixel intensity data based on physically quantitative projected liquid volume (PLV) and separated into a training and a validation set before training with the PDcGAN algorithm implemented in MATLAB. After extensive training on a GPU, the model can predict the morphology of fuels not included in the training data. Our research has identified optimal parameters and a network architecture that yield an average test set error of around 10-15%. While this error rate may be considered high for practical use, it is important to note that engineers typically spend three hours using computational fluid dynamics (CFD) to predict the shape of a single fuel, which requires significant computational resources. In contrast, our trained algorithm, PDcGAN, provides results in seconds, making it a more efficient solution for engineers without access to supercomputing resources. We anticipate that the model's accuracy will continue to improve as we expand the dataset to include a wider range of fuels.

Student Scientists: Minjae Cho
Advisor: Sungkwang Mun
Department: Department of Mechanical Engineering

ODBPlotter: An Open Source Data Processing and Visualization Tool for Wire Arc Directed Energy Deposition

Wire arc directed energy deposition (WA-DED) is a metal-based additive manufacturing process that produces a component in a layer-wise fashion as defined by a computer-aided design (CAD) geometry. Finite element analysis tools, such as the Abaqus finite element software, simulate the heat transfer through the deposition process. However, Abaqus uses Python 2 scripting to analyze the data generated by the finite element solver. This means that engineers working with Abaqus must either interface with Python 2, which is no longer supported and is considered insecure, or manually run their simulations in Abaqus. Earlier tools have tried to address this shortcoming by ensuring that Abaqus Python 2 code is written to the same standard as modern Python 3 code or by providing more intuitive Python 2 interfaces. This work has developed an open-source program, named ODBPlotter, to interface directly with Abaqus' output database (.odb) file format, allowing for Python 3 scripting, post-processing, and visualization of data generated with Abaqus. ODBPlotter implements modern data storage via the .hdf5 file format and 3-dimensional plotting techniques to improve efficiency and reliability when interfacing with the .odb format. ODBPlotter's modern interface allows its users to efficiently extract data from .odb files and store that data in a portable, cross-platform .hdf5 format that can be used in many environments and shared with collaborators.

*Winner in Category 3: Applied Data Science Research

Student Scientists: Clark Hensley
Advisor: Dr. Matthew Priddy
Department: Department of Computer Science and Engineering
Co-Authors: Logan Betts, CJ Nguyen, Matthew W. Priddy

Predicting the Antibacterial Effectiveness of Nanotextured Surfaces Using Transfer-Learning

The goal of this study is to predict the effectiveness of various antibacterial surfaces using a Machine Learning (ML) method. Antibacterial surfaces are nanotextures that prevent bacterial adhesion and thus suppress biofilm formation. In order to build correlations between many types of nanotextures and various bacterial strains, and to develop an optimized design framework, we employed ML, which has grown rapidly in recent years and has been affecting many engineering and scientific fields in the context of data analysis tasks such as classification and regression. First, through transfer learning, we used a pre-trained convolution-based neural network (CNN), such as ResNet18 composed of 152 various computational layers trained over 1 million images with 1,000 categories, to learn the new task by fine-tuning the parameters (weights and biases) of the final fully connected layer of the network, as transfer learning can quickly transfer learned features to the new dataset. For the training database, we used two different antibacterial effectiveness levels, 5% and 65%, with 256x256 cropped scanning electron microscope (SEM) images taken at 10,000x and 20,000x magnification. We used PyTorch for transfer learning to train on the new datasets on a GPU. After evaluating various combinations, we found that a total of 100 images (80 for training and 20 for validation) is sufficient to retrain the pre-trained CNN model. The average computational time for training on 100 images is 34 seconds, with an average training accuracy of 90.12%. Finally, we ran ten more effectiveness levels with different nanotextures, which resulted in higher accuracy in the CNN.
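The 80/20 division of the 100 images described above could be sketched as follows. This is only an illustration: the abstract does not say how the split was made, so the stratified approach, the even 50/50 class balance, the `stratified_split` name, and the fixed seed are all assumptions.

```python
import random

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, val_idx), keeping each class's share equal in both sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx += indices[:cut]
        val_idx += indices[cut:]
    return sorted(train_idx), sorted(val_idx)

# Hypothetical labels: 50 SEM crops per effectiveness level (assumed balance).
labels = ["5%"] * 50 + ["65%"] * 50
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting per class keeps both effectiveness levels represented in the small validation set, which matters when only 20 images are held out.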

*Honorable Mention in Category 2: Use-Inspired Data Science Research

Student Scientists: Zijie Chen
Advisor: Sungkwang Mun
Departments: Department of Computer Science

Developing a Computer Vision Algorithm for Monitoring Colony Strength in Honeybees

Monitoring colony strength is essential to beekeeping as well as to pollination and apiculture research. The number of foraging honeybees returning to the colony with pollen and the total number of returning foragers are both accurate indicators of colony strength. While maintaining their colonies, beekeepers often glance at or around the entrances of their hives to gain a relative understanding of colony health. However, a cursory check is not rigorous, and its accuracy varies even within the same day. For research, scientists often set up a camera at a colony entrance to record the incoming foragers. Once videos have been collected, they typically spend hundreds of hours manually analyzing the data. Thus, rigorously estimating the indicators is very time-consuming and labor-intensive for humans. To overcome these difficulties, strategies ranging from simply standing near a hive and counting foragers as they arrive to installing forager traps and AI-enabled cameras at colony entrances have been tested. Among the most promising methods for health monitoring are those that involve computer vision, the branch of AI related to visual problems. Proofs of concept have already established that computer vision algorithms are an effective strategy for colony monitoring. However, these proofs of concept do not differentiate between castes or detect the presence of pollen. They also do not account for changing light and environmental conditions throughout the day. For this project, we seek to develop an algorithm capable of distinguishing between castes and detecting pollen presence with high accuracy across changing light conditions. A significant aspect of this development is creating a representative and clean dataset. This presentation described the methods used and the current state of the dataset. Early AI testing, discussion of its implications, and future directions were also included.

*Received Special Recognition for Contributions to Data Acquisition, Data Wrangling, and Data Labeling 

Student Scientists: Conner Foley
Advisor: Dr. Priyadarshini Chakrabarti Basu
Department: Department of Biochemistry
Co-Author: Dawson Boes

Identification of Seafloor Gas Seeps in Sonar Data to Develop a Machine Learning Detection Database

Seafloor gas seeps, which discharge methane gas into the ocean, are found on continental margins globally. They are an important component of the global marine biogeochemical cycle, but their quantity and distribution are not well understood. Notably, seeps contribute to ocean acidification and deoxygenation. Additionally, they are biodiversity hotspots for benthic ecosystems, a demonstrated energy production resource, and a potential marine geohazard. Hence, it is important to identify these seeps, but the current method used to discover them is manual visual detection of seep bubble plumes in sonar data by trained individuals, which is costly and time-consuming. Here we develop a database of identified seeps to train machine learning algorithms to automatically detect gas seeps in sonar data. We developed MATLAB code to process and display multibeam sonar fan beam water column imagery as well as to label rectangular portions of the images containing seep plumes. So far, we have labeled and classified the presence or absence of seeps in over 160,000 sonar images. Additionally, we have collaborated with computer engineering colleagues to develop a machine learning framework for seep identification from the identified and labeled seep database. Machine learning algorithms enabled by this database will create a broadly applicable ocean exploration technology that will increase the efficiency and accuracy of seep discovery while also decreasing cost and personnel requirements. Furthermore, it will improve our understanding of these dynamic ocean features and, subsequently, of the associated seafloor environmental and ecological processes.

*Received Special Recognition for Contributions to Data Acquisition, Data Wrangling, and Data Labeling

Student Scientists: Surabhi Gupta
Advisor: Adam Skarke
Departments: Department of Wildlife, Fisheries, & Aquaculture

Adrift in Time: Correcting Time Drift in Animal-borne Accelerometer and Magnetometer Dataloggers Using Animal Behavior

Throughout the United States, the invasive wild pig (Sus scrofa) costs billions of dollars in damage and control efforts. In recent years, accelerometer and magnetometer dataloggers (AM loggers) have been used to study the behavior of animals, offering opportunities for enhanced wildlife management. In this project, we used data from AM loggers and video recordings of captive wild pigs collected at Mississippi State University in 2016. These dataloggers measure fine-scale three-dimensional movements of animals and produce unique signals for each behavior. Generally, to determine which signal patterns correspond to which behavior, we align AM signals to the recorded behavioral observations by their shared timestamps. However, the dataloggers accumulated "time drift," creating a discrepancy between when a behavior was observed and where that behavior occurred in the recorded time series of movement signals. Thus, we looked for distinctive patterns in the data caused by abrupt changes from the flat signal characteristic of resting to the highly dynamic signals characteristic of walking to determine the time drift between the time of the observed behavior and the corresponding AM signal. We calculated the offsets for each day of data, fit linear models to the time drift of each pig, and applied these models to the data to correct the timing of behaviors we did not observe. This resulted in alignment between the datalogger signals and the video-recorded behavior for most of the behavioral observations. This method can be used to correct time drift in AM signal data, enhancing predictive performance and increasing the effectiveness of AM loggers for future studies of animal behavior.
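The correction step described above (daily offsets, a linear model per pig, then applying the model) might look roughly like the following minimal sketch. It assumes the offset is defined as logger time minus video time in seconds and that drift grows linearly with the day of deployment; the function names and the example offsets are hypothetical and not taken from the study.

```python
from statistics import mean

def fit_drift(days, offsets):
    """Ordinary least-squares fit of offset = slope * day + intercept."""
    d_bar, o_bar = mean(days), mean(offsets)
    sxx = sum((d - d_bar) ** 2 for d in days)
    sxy = sum((d - d_bar) * (o - o_bar) for d, o in zip(days, offsets))
    slope = sxy / sxx
    return slope, o_bar - slope * d_bar

def correct_timestamp(logger_time_s, day, slope, intercept):
    """Subtract the modelled drift for that day from a logger timestamp."""
    return logger_time_s - (slope * day + intercept)

# Hypothetical daily offsets (seconds) measured for one pig from
# rest-to-walk transitions seen in both the video and the AM signal.
days = [0, 1, 2, 3, 4]
offsets = [0.0, 2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_drift(days, offsets)
corrected = correct_timestamp(100.0, 3, slope, intercept)
print(slope, intercept, corrected)
```

Fitting one line per animal lets the model estimate drift on days or at times where no matching transition was observed, which is the case the abstract highlights.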

*Honorable Mention for Category 3: Applied Data Science Research 

Student Scientists: Curtis Coleman
Advisor: Dr. Garrett Street
Department: Department of Wildlife, Fisheries & Aquaculture 
Co-Authors: Jane Dentinger, Bronson Strickland

Developing a Prototype of Cost-Effective Artificial Intelligence System for Real-Time Cotton Weed Detection

Weeds pose a substantial threat to cotton yield. To eradicate weeds with site- and species-specific herbicides, it is essential to precisely classify and localize weeds. Deep learning machine vision technology has emerged as an efficient solution to address this issue. However, the proper implementation of machine vision technology requires multiple phases, including data collection, model development, training, and deployment. While research regarding the first three phases has been extensive, more investigation into the fourth phase, transferring the trained model onto a resource-constrained device such as a single-board computer (SBC), is warranted. In this study, we report the implementation details of a pre-trained YOLOv5 model on an NVIDIA Jetson Nano SBC for cotton weed detection. The implemented weed detector can detect cotton weeds in real time, with an associated confidence score. This implementation (http://github.com/mcPython04/weed_detection) will serve as a resource for the development of prototypes and, eventually, industrial-scale field-deployable equipment.

*Honorable Mention for Category 3: Applied Data Science Research 

Student Scientists: Meng Xiang Chen
Advisor: Haifeng Wang
Departments: Department of Industrial and Systems Engineering
Co-Authors: Abdur Rahman, Yuzhen Lu

Computational Fluid Dynamics in a Perfusion Bioreactor

Mechanical bioreactors have shown much promise for studying cell growth in 3D bone structures via perfusion. Fluid flow is extremely important in these studies because media perfuses through the porous bone, providing nutrients to the cells and stimulating growth. However, media flow is inconsistent throughout the bioreactor due to the placement of the inlet/outlet of the chamber. This work examines flow in the chamber and perfusion through bone from the femoral head with computational fluid dynamics (CFD) simulations. It utilizes Ansys Fluent to mesh the bioreactor and simulate fluid flow through the device to calculate shear stresses, strain rates, and changes in flow throughout the model. The desired media flow through the bioreactor is 1 mL per minute, and expectations are to show 1500-3500 microstrain along the bone to promote bone growth. Ansys provides a visual representation of streamlines, velocity gradients, and stress or strain contours throughout the bioreactor. These data will be used to validate the physical properties required for bone growth in physical experiments.

*Honorable Mention for Category 2: Use-Inspired Data Science Research 

Student Scientists: Darrock Flynn
Advisor: Dr. Matthew Priddy
Department: Department of Mechanical Engineering
Co-Authors: Alexis Graham, Amirtaha Taebi, Lauren Priddy

Digital Twin Creation in Off-Road Environments from LiDAR Scans

Digital twins are digital representations of real objects whose purpose is to allow simulation and testing in a virtual interface. The recreation of mapped and structured environments such as roads and buildings, driven by the development and integration of autonomous vehicles and by city planning, has already seen extensive research. However, the use of digital twins for off-road environments such as forests, farms, and mountainous areas is poorly studied. This research project seeks to create and study digital twins of off-road environments with a focus on modeling terrain and vegetation. Point cloud maps of the environment are constructed from Velodyne LiDAR scans taken from a Clearpath Husky UGV using Simultaneous Localization and Mapping (SLAM). The point clouds are processed by ground segmentation, which extracts the ground terrain, and by Euclidean clustering, which groups the remaining points into clusters of trees and other vegetation. Ground points are used to build an elevation map modeling sloping terrain, ditches, and gullies. Poisson meshing is used to convert the tree and vegetation clusters into meshes stored in OBJ format. The terrain and vegetation data are stored in a .json file containing the position and orientation of the objects and are loaded into the Mississippi State University Autonomous Vehicle Simulator (MAVS) to be reconstructed in a digital environment. The proposed project has a wide range of applications, including virtual autonomous vehicle testing, synthetic testing generation, and training of AI models.
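The Euclidean clustering step mentioned above can be illustrated with a minimal sketch. It assumes ground points have already been removed and uses a brute-force neighbor search for clarity; a production pipeline would normally use a k-d tree. The function name, the tolerance value, and the sample points are hypothetical and are not taken from the project.

```python
from collections import deque
from math import dist

def euclidean_clusters(points, tolerance, min_size=1):
    """Group point indices connected by chains of neighbors within `tolerance`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:  # breadth-first growth of the current cluster
            i = queue.popleft()
            near = [j for j in unvisited
                    if dist(points[i], points[j]) <= tolerance]
            for j in near:
                unvisited.remove(j)
            queue.extend(near)
            cluster.extend(near)
        if len(cluster) >= min_size:  # drop clusters too small to be an object
            clusters.append(sorted(cluster))
    return sorted(clusters)

# Hypothetical non-ground points (metres): two separated groups.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
          (5.0, 5.0, 0.0), (5.1, 5.0, 0.0)]
clusters = euclidean_clusters(points, tolerance=0.15)
print(clusters)
```

Each resulting cluster is a list of point indices that could then be passed to a meshing step; the `min_size` filter is one simple way to discard stray points that do not form an object.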

*Honorable Mention for Category 3: Applied Data Science Research 

Student Scientists: Justin Yee
Advisor: Dr. Jingdao Chen
Departments: Department of Computer Science and Engineering
Co-Authors: Prabesh Khanal, Amanuel Tesfaye

Food Insecurity: Paying the Price During the COVID-19 Pandemic

The COVID-19 pandemic and its aftermath have impacted the lives of millions of people, disrupting not only public health but also food supply chains. Consumer food prices increase over time, but recent increases have been faster than usual. In December 2022, the U.S. Consumer Price Index for food was 24% higher than in January 2019, before the pandemic. Aggregate prices for eggs, poultry, meat, and seafood rose 29% over this period, with egg prices alone rising by 71%. Prices for fruits and vegetables increased too, albeit more slowly than other food categories, with prices for processed fruits and vegetables rising more than those for fresh produce. Changes in food prices may affect the number of low- and no-income individuals participating in food-purchasing assistance programs, such as the federal Supplemental Nutrition Assistance Program (SNAP). Our research examines changes in SNAP participation resulting from changes in food prices during the COVID-19 pandemic and changes in SNAP participation relative to pre-pandemic levels at the national and regional levels. At the national level, we find that a 1% increase in food prices is associated with an increase in SNAP participation of 2.08 million people over the period between March 2020 (when the World Health Organization declared COVID-19 a pandemic) and November 2020 (when the Food & Drug Administration authorized emergency use of a COVID-19 vaccine in the United States). Our models also examine differences in SNAP participation and food prices across U.S. regions.

*Honorable Mention for Category 3: Applied Data Science Research 

Student Scientists: Josie Nasekos
Advisor: Dr. Alba J. Collart
Department: Department of Agricultural Economics