HELLO, I AM SIMON M. THOMAS

SCIENTIST WORKING IN MACHINE LEARNING, BIOINFORMATICS & SOFTWARE DEVELOPMENT.

I began my scientific training with a Bachelor of Science at the Queensland University of Technology in Brisbane, Australia (2015). During this time I developed an interest in the power of computational methods to answer biological questions, which ultimately led to my focus on bioinformatics. Upon graduation I worked at the Australian Centre for Robotic Vision (2016), which served as my foray into Machine Learning, working on plant species classification. Later I began studying a Master of Bioinformatics at the University of Queensland, and this coincided with employment as a laboratory scientist at a histopathology laboratory. Seeing the potential of combining these two areas, my master's thesis culminated in a demonstration of the effectiveness of machine learning for skin cancer diagnosis (2018).

The application of deep learning techniques to biomedical imaging naturally extended to a PhD program, where I focused my interests on the problem of machine learning interpretability, in particular as it relates to medical diagnosis such as skin cancer. I completed my PhD (thesis) under the supervision of Dr. Nicholas A. Hamilton, at the Institute for Molecular Bioscience (IMB) at the University of Queensland (2022).

In 2022, I became a CERC Postdoctoral Fellow at CSIRO, working on applied machine learning in the domain of precision medicine. It was here that I developed a working example of how generative modelling can pave the way for highly interpretable machine learning systems in histopathology. See my interactive exposition on this work here.

I am currently a Clinical AI Associate at Franklin.ai, working as a technical lead on machine learning applications for decision support in histopathology.

My scientific interests are primarily in the areas of machine learning interpretability applied to problems in medicine, as well as the philosophy of science and knowledge creation in machine learning systems.

Publications

Peer-Reviewed

Non-melanoma skin cancer segmentation for histopathology dataset

Densely labelled segmentation data for digital pathology images is costly to produce but is invaluable to training effective machine learning models. We make available 290 hand-annotated histopathology tissue sections of the 3 most common skin cancers: basal cell carcinoma (BCC), squamous cell carcinoma (SCC) and intraepidermal carcinoma (IEC). These non-melanoma skin cancers constitute over 90% of all skin cancer diagnoses and hence this dataset gives an opportunity to the scientific community to benchmark analytic methodologies on a significant portion of the dermatopathology workflow. The data represents typical cases of the three cancer types (not requiring a differential diagnosis) across shave, punch and excision biopsy contexts. Each image is accompanied by a segmentation mask which characterizes the section into 12 tissue types, specifically: keratin, epidermis, papillary dermis, reticular dermis, hypodermis, inflammation, glands, hair follicles and background, as well as BCC, SCC and IEC. Included also are cancer margin measurements to work towards automated assessment of surgical margin clearance and tumour invasion. This leaves open many opportunities for researchers to utilize or extend the dataset, building upon recent work on image analysis problems in skin cancer (Thomas et al., 2021).

Ref: Thomas, S. M., Lefevre, J. G., Baxter, G., & Hamilton, N. A. (2021). Non-melanoma skin cancer segmentation for histopathology dataset. Data in brief, 39, 107587.

Characterization of tissue types in basal cell carcinoma images via generative modeling and concept vectors

The promise of machine learning methods to act as decision support systems for pathologists continues to grow. However, central to their successful adoption must be interpretable implementations so that people can trust and learn from them effectively. Generative modeling, most notably in the form of adversarial generative models, is a naturally interpretable technique because the quality of the model is explicit from the quality of images it generates. Such a model can be further assessed by exploring its latent space, using human-meaningful concepts by defining concept vectors. Motivated by these ideas, we apply for the first time generative methods to histological images of basal cell carcinoma (BCC). By simultaneously learning to generate and encode realistic image patches, we extract feature-rich latent vectors that correspond to various tissue morphologies, namely BCC, epidermis, keratin, papillary dermis and inflammation. We show that a logistic regression model trained on these latent vectors can achieve high classification accuracies across 6 binary tasks (86–98%). Further, by projecting the latent vectors onto learned concept vectors we can generate a score for the absence or degree of presence for a given concept, providing semantically accurate “conceptual summaries” of the various tissue types within a patch. This can be extended to generate multi-dimensional heat maps for whole-image specimens, which characterize the tissue in a similar way to a pathologist. We additionally find that accurate concept vectors can be defined using a small labeled dataset.

Ref: Thomas, S. M., Lefevre, J. G., Baxter, G., & Hamilton, N. A. (2021). Characterization of tissue types in basal cell carcinoma images via generative modeling and concept vectors. Computerized Medical Imaging and Graphics, 94, 101998.
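
The concept-vector scoring described in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the toy 3-dimensional latent vectors are made up, the function names are hypothetical, and the concept vector is assumed here to be the normalised difference between class means, which is only one simple way to define it (the paper refers to learned concept vectors).

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def concept_vector(positive, negative):
    """Unit vector from the mean of the negative examples to the mean
    of the positive examples (an assumed, simplified definition)."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    diff = [p - q for p, q in zip(mean_pos, mean_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def concept_score(latent, concept):
    """Project a latent vector onto a concept vector (dot product)."""
    return sum(l * c for l, c in zip(latent, concept))

# Made-up latent vectors for patches with and without the concept "BCC".
bcc_latents = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
other_latents = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]

bcc_concept = concept_vector(bcc_latents, other_latents)
score_bcc = concept_score([1.9, 0.0, 0.1], bcc_concept)
score_other = concept_score([0.1, 0.0, 0.1], bcc_concept)
print(score_bcc > score_other)
```

A higher score indicates a stronger presence of the concept in the patch. Computing one such score per concept and per patch gives the kind of multi-dimensional summary the abstract describes.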

Interpretable deep learning systems for multi-class segmentation and classification of non-melanoma skin cancer

We apply for the first time interpretable deep learning methods simultaneously to the most common skin cancers (basal cell carcinoma, squamous cell carcinoma and intraepidermal carcinoma) in a histological setting. As these three cancer types constitute more than 90% of diagnoses, we demonstrate that the majority of dermatopathology work is amenable to automatic machine analysis. A major feature of this work is characterising the tissue by classifying it into 12 meaningful dermatological classes, including hair follicles and sweat glands, as well as identifying the well-defined stratified layers of the skin. These provide highly interpretable outputs as the network is trained to represent the problem domain in the same way a pathologist would. While this enables a high accuracy of whole image classification (93.6–97.9%), by characterising the full context of the tissue we can also work towards performing routine pathologist tasks, for instance, orientating sections and automatically assessing and measuring surgical margins. This work seeks to inform ways in which future computer aided diagnosis systems could be applied usefully in a clinical setting with human interpretable outcomes.

Ref: Thomas, S. M., Lefevre, J. G., Baxter, G., & Hamilton, N. A. (2021). Interpretable deep learning systems for multi-class segmentation and classification of non-melanoma skin cancer. Medical Image Analysis, 68, 101915.

General Interest

Pathologist Versus Artificial Pathologist: What Do We Really Want (Need) From Machine Learning?

The burgeoning development of machine learning systems for digital pathology places us in an exciting time. One often reads that the complexities of anatomical pathology are now, or are soon to be unraveled by the latest machine learning technologies. Such incredible claims are bolstered by the experience of seeing a system classify histology images (or better, training one’s own). It truly is remarkable that this is even possible. Yet, as this becomes a more common experience for the pathology community, it is likely that our current expectations and ambitions will be tempered by the constraints of reality...

Ref: Thomas, S. M. (2020). Pathologist Versus Artificial Pathologist. Digital Pathology Association - Blog, Online.

Algebra With Python

Algebra With Python introduces a revolutionary approach to learning algebra from scratch using SymPy, the symbolic manipulation library written for Python. This book advocates the use of computers for learning mathematics from the beginning. It does for algebra what calculators have done for arithmetic. Instead of waiting until university to use this technology, Algebra With Python brings the speed, intuition and power of symbolic manipulation to the fingertips of the beginner mathematician.

Format: eBook, Pages: 420+, Published: March 2020, Cost: Free.

Introduction To Machine Learning (Course)

I developed this course to improve the introductory knowledge of deep learning within the University of Queensland's research community. Part 1 starts from the basics, introducing optimization and linear models from first principles. Part 2 steps into deep learning, utilising the TensorFlow 2.x library to perform regression and multi-class classification. Part 3 applies the same principles in the context of image processing, discussing convolutions, as well as how to validate models. By the end of the course you will understand the principles of modern machine learning and the components that form the backbone of state-of-the-art models, and you will have some experience implementing networks in a deep learning framework.

Contents: 10 min video, 3 Jupyter notebooks, training data and saved weights
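
To give a flavour of the first-principles starting point, here is a minimal sketch of fitting a linear model by gradient descent. It is not taken from the course notebooks: it uses plain Python rather than TensorFlow, and the function name, toy data, learning rate and step count are all illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

The same loop of computing a loss, taking its gradient and updating the parameters is what a deep learning framework automates for much larger models.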

Projects

Writing Web-Native Papers for Expressive Scientific Storytelling

An ongoing project to develop a workflow for people who are interested in using the web as an alternative publishing platform for scientific papers, utilising the flexible and interactive features of modern web development.

Status: In progress.