KSM Tozammel Hossain

Tozammel's personal website

GitHub

About Me

My name is Tozammel Hossain. I am currently an assistant professor in the Department of Information Science at the University of North Texas. Previously, I was an assistant research professor at the University of Missouri - Institute for Data Science and Informatics. I received my PhD in computer science from Virginia Tech and spent three years as a postdoctoral associate at the University of Southern California-Information Sciences Institute (USC-ISI).

My research interests lie in applied machine learning and data science, with an emphasis on bioinformatics, health informatics, social science, and cybersecurity. My recent research includes devising methods for anticipating societal population-level events (e.g., flu outbreaks and civil unrest), disease incidence, and cyber-attacks; inferring co-evolving entities and their interactions in complex systems; and modeling state and behavior in social systems (e.g., ideological leanings, polarization, and equity).

My career goal is to solve challenging problems in machine learning and data science.

Education

Virginia Tech

I earned my doctoral degree in the Department of Computer Science at Virginia Tech. I conducted my research in the Discovery Analytics Center led by Dr. Naren Ramakrishnan. My thesis was in the area of bioinformatics, specifically on modeling evolutionary constraints and improving multiple sequence alignments using couplings. I received a master's degree from the same department in 2014.

Bangladesh University of Engineering & Technology

I obtained my B.Sc. degree in Computer Science & Engineering in the Department of CSE at Bangladesh University of Engineering & Technology in 2007. I conducted my undergraduate thesis under the supervision of Dr. Saidur Rahman, and my topic was Applications of Graphs in Bioinformatics.

Experience

Publications

News Coverage

[Health ’16] Anonymous. “Study Data from Virginia Tech Update Understanding of Epidemiology”. In: Health & Medicine Week (2016). Atlanta, 29 Apr 2016:5537.
[Conv ’16] Mohammad R Islam, KSM Tozammel Hossain, Siddharth Krishnan, and Naren Ramakrishnan. What AI Can Tell Us About the U.S. Supreme Court. In: The Conversation (2016). [This news is also featured on ACM Technews and the Daily Mail, UK.] [link]

Invited Talk

[INFORMS ’17] KSM Tozammel Hossain, Greg Ver Steeg, and Aram Galstyan. Identifying latent structures in human performance data using CorEx. INFORMS 2017.

Refereed Journal Papers

[AS ‘21] Prayitno, Chi-Ren Shyu, Karisma Trinanda Putra, Hsing-Chung Chen, Yuan-Yu Tsai, KSM Tozammel Hossain, Wei Jiang, and Zon-Yin Shae. “A Systematic Review of Federated Learning in the Healthcare Area: from the perspective of Data Properties and Applications”. In: Applied Science (2021).
[JBA ‘21] Abu S M Mosa, Chalermpon Thongmotai, Humayera Islam, Tanmoy Pal, KSM Tozammel Hossain, and Vasanthi Mandhadi. “Evaluation of Machine Learning Applications using Real-World EHR Data for Predicting Diabetes-Related Long-Term Complications”. In: Journal of Business Analytics (2021), pp. 1–11.
[JDMS ‘18] KSM Tozammel Hossain, Shuyang Gao, Brendan Kennedy, Aram Galstyan, and Prem Natarajan. “Forecasting violent events in the Middle East and North Africa using HMM and regularized autoregressive models”. In: Journal of Defense Modeling and Simulation (2018). [Accepted].
[DAMI ‘15] Liangzhe Chen, KSM Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, and B Aditya Prakash. Syndromic Surveillance of Flu on Twitter using Weakly Supervised Temporal Topic Models. In: Data Mining and Knowledge Discovery (DAMI) (2015), pp. 1–30. [Link]
[TCBB ‘13] KSM Tozammel Hossain, Debprakash Patnaik, Srivatsan Laxman, Prateek Jain, Chris Bailey-Kellogg, and Naren Ramakrishnan. Improved Multiple Sequence Alignments using Coupled Pattern Mining. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics 10.5 (Sept. 2013), pp. 1098–1112. [Link]
[JAMC ’09] Muhammad N Yanhaona, KSM Tozammel Hossain, and M Saidur Rahman. Pairwise Compatibility Graphs. In: Journal of Applied Mathematics and Computing (JAMC) 30.1-2 (2009), pp. 479–503. [Link]

Refereed Conference Papers

[AMIA ‘22] Lokesh Karanam, Kun Yi, Zon-Yin Shae, David Chang, Chi-Ren Shyu, and K S M Tozammel Hossain. “Continuous Anticipation of AKI in the ICU using Time-Gated LSTMs”. In: AMIA 2022 Clinical Informatics Conference. 2022.
[ICML WS ’19] Mehrnoosh Mirtaheri, Sami Abu-El-Haija, KSM Tozammel Hossain, Fred Morstatter, and Aram Galstyan. “Tensor-based Method for Temporal Geopolitical Event Forecasting”. In: ICML Workshop. 2019.
[IJCAI Demo ’19] Fred Morstatter, Aram Galstyan, Gleb Satyukov, et al. “SAGE: A Hybrid Geopolitical Event Forecasting System”. In: IJCAI Demo. 2019.
[epiDAMIK ’18] Huijuan Shao, KSM Tozammel Hossain, Hao Wu, Maleq Khan, Anil Vullikanti, B Aditya Prakash, Madhav Marathe, and Naren Ramakrishnan. “Forecasting the Flu: Designing Social Network Sensors for Epidemics”. In: Proceedings of the ACM SIGKDD Workshop on Epidemiology meets Data Mining and Knowledge Discovery. 2018.
[AAAI ‘16] Mohammad R Islam, KSM Tozammel Hossain, Siddharth Krishnan, and Naren Ramakrishnan. Inferring Multi-dimensional Ideal Points for US Supreme Court Justices. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI). 2016. [Acceptance Rate = 26%]
[ICDM ’14] Liangzhe Chen, KSM Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, and B Aditya Prakash. Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models. In: Proceedings of the 2014 IEEE International Conference on Data Mining. IEEE Computer Society, 2014, pp. 755–760. [Acceptance Rate 142/727 = 19.53%]. [Link]
[BCB ’12] KSM Tozammel Hossain, Debprakash Patnaik, Srivatsan Laxman, Prateek Jain, Chris Bailey-Kellogg, and Naren Ramakrishnan. Improved Multiple Sequence Alignments using Coupled Pattern Mining. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. Orlando, Florida: ACM, 2012, pp. 28–35. [Acceptance Rate 33/159 = 20.75%]. [Link]
[BIOKDD ’11] KSM Tozammel Hossain, Chris Bailey-Kellogg, Alan M Friedman, Michael J Bradley, Nathan Baker, and Naren Ramakrishnan. Using Physicochemical Properties of Amino Acids to Induce Graphical Models of Residue Couplings. In: Proceedings of the Tenth International Workshop on Data Mining in Bioinformatics. San Diego, California: ACM, 2011, 3:1–3:10. [Link]
[KDD ’10] Naren Sundaravaradan, KSM Tozammel Hossain, Vandana Sreedharan, Douglas J Slotta, John Paul Vergara, Lenwood S Heath, and Naren Ramakrishnan. Extracting Temporal Signatures for Comprehending Systems Biology Models. In: Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA: ACM, 2010, pp. 453–462. [Acceptance Rate 77/578 = 13.32%]. [Link]
[WALCOM ’08] Muhammad N Yanhaona, KSM Tozammel Hossain, and M Saidur Rahman. Pairwise Compatibility Graphs. In: Proceedings of the Second International Workshop on Algorithms and Computation (WALCOM). Springer, 2008, pp. 222–233. [Acceptance Rate 19/57 = 33.3%]. [Link]

Under Review

[AMIA ‘22] Lokesh Karanam, Kun Yi, Zon-Yin Shae, David Chang, Chi-Ren Shyu, and K S M Tozammel Hossain. “Continuous Anticipation of AKI in the ICU using Time-Gated LSTMs”. In: 2022. [Under review in the AMIA 2022 Clinical Informatics Conference]
[FAI ‘22] KSM Tozammel Hossain, Hrayr Harutyunyan, Yue Ning, Brendan Kennedy, Naren Ramakrishnan, and Aram Galstyan. “Identifying Event Precursors using Attention-based LSTMs”. In: (2022). [Under Review in Frontiers in Artificial Intelligence]
[WCA ‘22] Bhuwan Thapa, Sarah Lovell, Zhen Cai, K S M Tozammel Hossain, and Mi Young Kwon. “Agroforestry for Climate Risk Management: Effectiveness of Windbreaks in Reducing Crop Loss in the Midwest, USA”. In: 2022. [Under review in the 5th World Congress on Agroforestry].

Archived

[arXiv ‘18] Florian Quinkert, Thorsten Holz, KSM Tozammel Hossain, Emilio Ferrara, and Kristina Lerman. “RAPTOR: Ransomware Attack PredicTOR”. In: arXiv preprint arXiv:1803.01598 (2018).
[arXiv ‘16] KSM Tozammel Hossain†, Huijuan Shao†, Hao Wu, Maleq Khan, Anil Vullikanti, B Aditya Prakash, Madhav Marathe, and Naren Ramakrishnan. Forecasting the Flu: Designing Social Network Sensors for Epidemics. In: arXiv:1602.06866 (2016). (Published in the ACM SIGKDD Workshop epiDAMIK 2018.) [Link]

Research Projects

Selected projects that I am working on or have worked on are listed below.

Modeling Evolutionary Constraints in Proteins

This project aims to model evolutionary constraints in proteins. Evolutionary constraints shape the sequences, structures, and functions of protein families. We are interested in one type of evolutionary constraint, residue coupling (or correlated mutation), which is an important indicator for predicting protein structures and revealing functional insights into proteins. We focus on modeling a rich set of residue couplings, both pairwise and higher-order, with an emphasis on providing a mechanistic explanation for couplings and decomposing couplings of various orders. We also investigate a method for mining frequent episodes, called coupled patterns, in an alignment produced by a classical algorithm for both proteins and RNAs, and for exploiting the coupled patterns to improve alignment quality with respect to exposing couplings. NSF supports this project through the proposal "Integration, Prediction, and Generation of Mixed Mode Information", a collaboration between Carnegie Mellon University, Dartmouth College, Purdue University, and PNNL.

Modeling Ailment State of Users using Social Network Data

Contagions arise in many settings, both biological (such as the flu) and social (such as memes and hashtags propagating on Twitter). While epidemiological research has inspired researchers modeling social contagion, recent work has shown that there are key aspects along which social contagions differ from biological contagions. In this project, we reconcile the apparently contrasting behaviors with a finer-grained modeling of biological phases as inferred from tweets. We propose a temporal topic model for inferring hidden biological states for users. Our work can be seen as a stepping stone to a better understanding of contagions that occur in both biological and social spheres.

Designing Social Network Sensors for Epidemics

Early detection and modeling of a contagious epidemic can provide important guidance for quelling the contagion, controlling its spread, or designing effective countermeasures. The goal of this project is to design social network sensors: a small set of people who can be monitored to provide insight into the emergence of an epidemic in a larger population. Using the graph-theoretic notion of dominators, we develop an efficient and effective heuristic for lead-time detection. Using city-scale datasets generated by extensive microscopic epidemiological simulations involving millions of individuals, we illustrate the practical applicability of our methods and show significant benefits (up to 22 days more lead time) compared to competing approaches.
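
The notion of a dominator can be illustrated with a small sketch. In a directed graph with a designated source, a node d dominates a node n if every path from the source to n passes through d, so monitoring d gives advance notice of anything that spreads from the source toward n. The Python code below computes dominator sets with a simple fixed-point iteration. It only illustrates the graph-theoretic definition and is not the heuristic developed in the project; the adjacency-dict format and the function name `dominators` are assumptions made for this example.

```python
from collections import deque


def dominators(graph, source):
    """Map each node reachable from `source` to the set of nodes dominating it.

    `graph` is a dict mapping a node to a list of its successors
    (a hypothetical format chosen for this sketch).
    """
    # Breadth-first search: collect reachable nodes and their predecessors.
    order = [source]
    seen = {source}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            preds.setdefault(nxt, []).append(node)
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)

    # Start with every node dominated by all nodes, then shrink to a fixed point.
    dom = {node: set(order) for node in order}
    dom[source] = {source}
    changed = True
    while changed:
        changed = False
        for node in order[1:]:
            # A node is dominated by itself plus whatever dominates all its predecessors.
            new = set.intersection(*(dom[p] for p in preds[node])) | {node}
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


# Toy contact graph: two routes from "s" merge at "c" before reaching "d".
toy = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
print(sorted(dominators(toy, "s")["d"]))  # ['c', 'd', 's']
```

In the toy graph, "c" dominates "d" while neither "a" nor "b" does, so "c" is the only intermediate node guaranteed to be reached before "d".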

Inferring Ideal Points for US Supreme Court Justices

In Supreme Court parlance and the political science literature, an ideal point positions a justice in a continuous space and can be interpreted as a quantification of the justice's policy preferences. We present an automated approach to infer such ideal points for justices of the US Supreme Court. This approach combines topic modeling over case opinions with the voting (and endorsing) behavior of justices. Furthermore, given a topic of interest, say the Fourth Amendment, the topic model can be optionally seeded with supervised information to steer the inference of ideal points. Application of this methodology to five years of cases provides interesting perspectives on the leanings of justices on crucial issues, coalitions underlying specific topics, and the role of swing justices in deciding the outcomes of cases.

Feature Selection in Rank-Order Spaces

In some important financial and scientific domains, features are best interpreted via ordinal comparisons with other features, rather than as absolute values. For example, biological cells go through various phases, and the abundance of cell products may vary or remain constant across these cycles. It is more realistic to compare the abundances of cell products in terms of their ranks. We propose algorithms for extracting temporal signatures from multivariate time series data, where the signatures are composed of ordinal comparisons between time series components.
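
To make the idea of an ordinal comparison concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it simply lists which "a is lower than b" relations hold at a time step and which of them hold throughout a time window. The function names and the toy data are assumptions made for this example.

```python
from itertools import combinations


def ordinal_comparisons(snapshot):
    """Return the set of (low, high) pairs of components with low < high.

    `snapshot` maps a component name to its value at one time step.
    Ties produce no pair.
    """
    pairs = set()
    for a, b in combinations(sorted(snapshot), 2):
        if snapshot[a] < snapshot[b]:
            pairs.add((a, b))
        elif snapshot[b] < snapshot[a]:
            pairs.add((b, a))
    return pairs


def stable_orderings(series, start, end):
    """Ordinal comparisons that hold at every time step in [start, end)."""
    common = None
    for t in range(start, end):
        snapshot = {name: values[t] for name, values in series.items()}
        pairs = ordinal_comparisons(snapshot)
        common = pairs if common is None else common & pairs
    return common or set()


# Toy multivariate time series with three components (made-up values).
series = {
    "x": [1, 2, 5, 6],
    "y": [3, 4, 4, 3],
    "z": [9, 8, 7, 7],
}
print(sorted(stable_orderings(series, 0, 2)))  # [('x', 'y'), ('x', 'z'), ('y', 'z')]
print(sorted(stable_orderings(series, 0, 4)))  # [('x', 'z'), ('y', 'z')]
```

In the toy data, the relation between "x" and "y" flips halfway through, so it distinguishes the first two time steps from the last two, while "z" stays on top throughout.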

Computational Modeling of Gene Silencing

I worked on the project Computational Modeling of Gene Silencing for one year (Aug 2009 to Jun 2010). The goal of the project was to construct models of the gene silencing phenomenon for the microscopic worm Caenorhabditis elegans. My responsibility was to improve the data integration and annotation part of the model. Moreover, I performed data analysis by applying compositional data mining, a new tool developed in the computer science department, to retrieve interesting and informative patterns from the data. The project ran for six years and ended successfully in June 2010. Dr. Lenwood S. Heath and Dr. Naren Ramakrishnan supervised me on this project.

Contact