dc.description.abstract |
The difficult period following COVID-19, marked by inflation, fertilizer shortages
and supply chain issues, has turned attention towards better, more affordable,
faster and organic solutions in almost every field of science and beyond.
This project was inspired by the BioSPRINT project, where the target reaction is
the simultaneous dehydration of multiple C5 and C6 sugars to produce
5-hydroxymethylfurfural (5-HMF) and furfural (FUR). The objective was to find
machine learning (ML) models that would speed up the discovery of catalysts using
high-throughput (HTP) screening techniques. The aim is maximum activity for the
conversion of complex sugar combinations, with the best selectivity for the major
products of interest.
The three additional models used are generalised boosted regression modelling,
extreme gradient boosting (XGBoost) and boosted generalised additive models for
location, scale and shape.
The results show that XGBoost has the best performance overall. All the models
performed poorly for the selectivity response. One possible remedy is to apply a
transformation to this response variable. The performance of these models could
also potentially be improved by adding new “catalytic-informed” features,
engineered from expert knowledge of the problem. |
en_US |