Date of Defense
18-6-2025 7:00 PM
Location
F3-40
Document Type
Thesis Defense
Degree Name
Master of Science in Chemical Engineering (MSChE)
College
COE
Department
Chemical and Petroleum Engineering
First Advisor
Dr. Kaushik Sivaramakrishnan
Keywords
Machine Learning, Predictive Modeling, Bayesian Optimization, Fourier Transform Infrared Spectroscopy, Thermogravimetric Analysis, Rate of Penetration, Petrochemical, Ecofriendly, Bitumen, Medium-density Fiberboard, Gradient Boosted Regression, Random Forest
Abstract
Traditional experimental approaches in industrial processes, such as Fourier Transform Infrared (FTIR) spectroscopy, thermogravimetric analysis (TGA), and well-drilling operations, are often constrained by time, cost, and operational limitations. This research explores the application of data-driven Machine Learning (ML)-based predictive modeling to improve efficiency and reduce dependency on resource-intensive experimentation. The study develops ML models for three distinct processes: FTIR intensity prediction of bitumen thermal cracking products, thermal degradation of Medium-Density Fiberboard (MDF) using TGA data, and Rate of Penetration (ROP) prediction in the petrochemical industry. Six algorithms were evaluated across multiple scenarios: Linear Regression (LinReg), Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Random Forest (RF), and K-Nearest Neighbors (KNN). The models were assessed using metrics such as the coefficient of determination (R²) and Root Mean Squared Error (RMSE) to gauge both accuracy and generalization capability. All computational modeling, including data cleaning, feature engineering, ML modeling, and Bayesian Optimization (BO), was performed using Python.
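For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how R² and RMSE are conventionally computed. It is not the thesis code: the function names and the four sample values are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r2_score(y_true, y_pred):.4f}")
print(f"RMSE = {rmse(y_true, y_pred):.4f}")
```

R² is unitless and describes the fraction of variance explained, while RMSE is in the units of the predicted quantity, which is why the two are usually reported together.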
Results show that ensemble models, particularly GBR and RF, consistently outperformed the other techniques in predictive accuracy and generalizability. In the FTIR analysis, GBR achieved 99.65% accuracy under an 80/20 data split, while RF yielded 94.37% accuracy when trained on lower temperatures and tested on unseen high temperatures. For the TGA data, RF achieved 100% test accuracy for both oxidation and pyrolysis under full-dataset splits, while GBR maintained strong performance in extrapolative scenarios, achieving 98.91% accuracy for oxidation and 99.67% for pyrolysis when trained on lower heating rates and tested on higher ones. In ROP prediction, the GBR model reached 96.2% accuracy, outperforming empirical models such as the Bourgoyne and Young (BY) and Bingham models. The findings emphasize the importance of data distribution in training/testing splits, particularly when extrapolating to high-temperature conditions.
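The difference between the two kinds of split described above (a random 80/20 split versus training on low values and testing on unseen high ones) can be sketched as follows. This is an illustrative assumption, not the thesis procedure: the helper names, the `temperature_C` field, the 350 threshold, and the synthetic rows are all hypothetical.

```python
import random

def random_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction for testing (interpolative)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def extrapolation_split(rows, key, threshold):
    """Train on rows at or below the threshold, test on rows above it."""
    train = [r for r in rows if r[key] <= threshold]
    test = [r for r in rows if r[key] > threshold]
    return train, test

# Synthetic rows for illustration only: temperatures 300, 310, ..., 390.
rows = [{"temperature_C": t, "signal": 0.01 * t} for t in range(300, 400, 10)]

train_r, test_r = random_split(rows)
train_x, test_x = extrapolation_split(rows, "temperature_C", 350)

print(len(train_r), len(test_r))   # sizes of the random split
print(len(train_x), len(test_x))   # sizes of the extrapolative split
```

In the random split, test temperatures fall within the range seen during training; in the extrapolative split, every test temperature lies above every training temperature, which is the harder case for any model.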
This study demonstrates the transformative potential of ML in enhancing predictive accuracy across various industrial systems. The integration of Python-based modeling, scenario-driven analysis, and advanced hyperparameter tuning through BO establishes a versatile framework for data-driven optimization. These outcomes support the broader adoption of ML in petrochemical and environmentally focused industries, offering pathways toward more sustainable, efficient, and intelligent process management.
Included in
DATA-DRIVEN MACHINE LEARNING APPLICATIONS FOR PREDICTIVE MODELING OF PETROCHEMICAL AND ECOFRIENDLY SYSTEMS