About Me
My name is Tozammel Hossain. I am currently an assistant professor in the Department of Information Science at the University of North Texas. Previously, I was an assistant research professor at the Institute for Data Science and Informatics at the University of Missouri. I received my PhD in computer science from Virginia Tech and spent three years as a postdoctoral associate at the University of Southern California Information Sciences Institute (USC-ISI).
My research interests lie in applied machine learning and data science, with an emphasis on bioinformatics, health informatics, social science, and cybersecurity. My recent research includes devising methods for anticipating population-level societal events (e.g., flu outbreaks and civil unrest), disease incidence, and cyber-attacks; inferring co-evolving entities and their interactions in complex systems; and modeling state and behavior in social systems (e.g., ideological leanings, polarization, and equity).
My career goal is to solve challenging problems in machine learning and data science.
Education
Virginia Tech
I earned my doctoral degree in the Department of Computer Science at Virginia Tech. I conducted my research in the Discovery Analytics Center led by Dr. Naren Ramakrishnan. My thesis was in the area of bioinformatics, specifically on modeling evolutionary constraints and improving multiple sequence alignments using couplings. I received a master's degree from the same department in 2014.
Bangladesh University of Engineering & Technology
I obtained my B.Sc. degree in Computer Science & Engineering in the Department of CSE at Bangladesh University of Engineering & Technology in 2007. I conducted my undergraduate thesis under the supervision of Dr. Saidur Rahman, and my topic was Applications of Graphs in Bioinformatics.
Experience
- Assistant Professor, Dept. of Information Science, University of North Texas, Aug 2022–Present
- Assistant Research Professor, Institute for Data Science & Informatics, University of Missouri, Oct 2019–Aug 2022
- Postdoctoral Research Associate, Information Sciences Institute, University of Southern California, Oct 2016–Sep 2019
- Research Assistant, Discovery Analytics Center, Virginia Tech, Jun 2013–May 2014; Jun 2015–Aug 2016
- Research Assistant, SoftLab, Virginia Tech, Aug 2010–May 2013
- Research Assistant, Bioinformatics Lab, Virginia Tech, Jan 2010–Aug 2010
- Teaching Assistant, Dept. of CS, Virginia Tech, Aug 2009–Dec 2009; Aug 2014–May 2015
- Software Engineer, Commlink Info Tech Ltd., Dhaka, Bangladesh, Jun 2007–Jul 2009
Publications
News Coverage
- Asif Razzaq. “OPEN-RAG: A Novel AI Framework Designed to Enhance Reasoning Capabilities in RAG with Open-Source LLMs.” In: Marktechpost (2024).
- Anonymous. “Study Data from Virginia Tech Update Understanding of Epidemiology.” In: Health & Medicine Week (2016). Atlanta, 29 Apr 2016:5537.
- Mohammad R Islam, KSM Tozammel Hossain, Siddharth Krishnan, and Naren Ramakrishnan. What AI Can Tell Us About the U.S. Supreme Court. In: The Conversation (2016). [This news is also featured on ACM Technews and the Daily Mail, UK.] [link]
Invited Talk
- Bhuwan Thapa, Sarah Lovell, Zhen Cai, K S M Tozammel Hossain, and Mi Young Kwon. “Agroforestry for Climate Risk Management: Effectiveness of Windbreaks in Reducing Crop Loss in the Midwest, USA”. In: 5th World Congress on Agroforestry. 2022.
- KSM Tozammel Hossain, Greg Ver Steeg, and Aram Galstyan. “Identifying latent structures in human performance data using CorEx.” In: INFORMS 2017.
Refereed Journal Papers
- Yulia I. Nussbaum, K.S.M. Tozammel Hossain, Jussuf Kaifi, Wesley C. Warren, Chi-Ren Shyu, and Jonathan B. Mitchem. “Identifying gene expression programs in single-cell RNA-seq data using linear correlation explanation.” In: Journal of Biomedical Informatics 154 (2024), p. 104644. ISSN: 1532-0464.
- D. Benjamin, F. Morstatter, A. Abbas, A. Abeliuk, P. Atanasov, S. Bennett, A. Beger, S. Birari, D. Budescu, M. Catasta, E. Ferrara, L. Haravitch, M. Himmelstein, K. Hossain, Y. Huang, R. Joseph, J. Leskovec, A. Matsui, M. Mirtaheri, G. Satyukov, R. Sethi, A. Singh, R. Sosic, M. Steyvers, P. Szekely, M. Ward, and A. Galstyan. “Hybrid Forecasting of Geopolitical Events”. In: AI Magazine (2023).
- K. S. M. Tozammel Hossain, Hrayr Harutyunyan, Yue Ning, Brendan Kennedy, Naren Ramakrishnan, and Aram Galstyan. “Identifying geopolitical event precursors using attention-based LSTMs.” In: Frontiers in Artificial Intelligence 5 (2022). ISSN: 2624-8212. DOI: 10.3389/frai.2022.893875.
- Prayitno, Chi-Ren Shyu, Karisma Trinanda Putra, Hsing-Chung Chen, Yuan-Yu Tsai, KSM Tozammel Hossain, Wei Jiang, and Zon-Yin Shae. “A Systematic Review of Federated Learning in the Healthcare Area: From the Perspective of Data Properties and Applications.” In: Applied Sciences (2021).
- Abu S M Mosa, Chalermpon Thongmotai, Humayera Islam, Tanmoy Pal, KSM Tozammel Hossain, and Vasanthi Mandhadi. “Evaluation of Machine Learning Applications using Real-World EHR Data for Predicting Diabetes-Related Long-Term Complications.” In: Journal of Business Analytics (2021), pp. 1–11.
- KSM Tozammel Hossain, Shuyang Gao, Brendan Kennedy, Aram Galstyan, and Prem Natarajan. “Forecasting violent events in the Middle East and North Africa using HMM and regularized autoregressive models.” In: Journal of Defense Modeling and Simulation (2018). [Accepted].
- Liangzhe Chen, KSM Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, and B Aditya Prakash. Syndromic Surveillance of Flu on Twitter using Weakly Supervised Temporal Topic Models. In: Data Mining and Knowledge Discovery (DAMI) (2015), pp. 1–30. [Link]
- KSM Tozammel Hossain, Debprakash Patnaik, Srivatsan Laxman, Prateek Jain, Chris Bailey-Kellogg, and Naren Ramakrishnan. Improved Multiple Sequence Alignments using Coupled Pattern Mining. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics 10.5 (Sept. 2013), pp. 1098–1112. [Link]
- Muhammad N Yanhaona, KSM Tozammel Hossain, and M Saidur Rahman. Pairwise Compatibility Graphs. In: Journal of Applied Mathematics and Computing (JAMC) 30.1-2 (2009), pp. 479–503. [Link]
Refereed Conference Papers
- Islam, S.B., Rahman, M.A., Hossain, K.S.M., Hoque, E., Joty, S. and Parvez, M.R., 2024. OPEN-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models. EMNLP Findings, 2024. arXiv preprint arXiv:2410.01782.
- Md Maruf Hossain Shuvo, Twisha Titirsha, K S M Tozammel Hossain, Guido Lastra Gonzalez, and Syed Kamrul Islam. “Enhancing Personalization and Mitigating Inter-Patient Variability in Continuous Blood Glucose Prediction Using Multi-Task Deep LSTMs.” In: IEEE International Symposium on Medical Measurements and Applications (MeMeA). 2024.
- Huyen Nguyen, Haihua Chen, Roopesh Maganti, K S M Tozammel Hossain, and Junhua Ding. “Identifying High-quality Informative Comments for Software Review Summarization”. In: IEEE AITest 2023. 2023.
- Lokesh Karanam, Kun Yi, Zon-Yin Shae, David Chang, Chi-Ren Shyu, and K S M Tozammel Hossain. “Continuous Anticipation of AKI in the ICU using Time-Gated LSTMs.” In: AMIA 2022 Clinical Informatics Conference. 2022.
- Mehrnoosh Mirtaheri, Sami Abu-El-Haija, KSM Tozammel Hossain, Fred Morstatter, and Aram Galstyan. “Tensor-based Method for Temporal Geopolitical Event Forecasting.” In: ICML Workshop. 2019.
- Fred Morstatter, Aram Galstyan, Gleb Satyukov, et al. “SAGE: A Hybrid Geopolitical Event Forecasting System.” In: IJCAI Demo. 2019.
- Huijuan Shao†, KSM Tozammel Hossain†, Hao Wu, Maleq Khan, Anil Vullikanti, B Aditya Prakash, Madhav Marathe, and Naren Ramakrishnan. “Forecasting the Flu: Designing Social Network Sensors for Epidemics.” In: Proceedings of the ACM SIGKDD Workshop on Epidemiology meets Data Mining and Knowledge Discovery. 2018.
- Mohammad R Islam†, KSM Tozammel Hossain†, Siddharth Krishnan, and Naren Ramakrishnan. Inferring Multi-dimensional Ideal Points for US Supreme Court Justices. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI). 2016. [Acceptance Rate = 26%]
- Liangzhe Chen†, KSM Tozammel Hossain†, Patrick Butler, Naren Ramakrishnan, and B Aditya Prakash. Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models. In: Proceedings of the 2014 IEEE International Conference on Data Mining. IEEE Computer Society, 2014, pp. 755–760. [Acceptance Rate 142/727 = 19.53%]. [Link]
- KSM Tozammel Hossain, Debprakash Patnaik, Srivatsan Laxman, Prateek Jain, Chris Bailey-Kellogg, and Naren Ramakrishnan. Improved Multiple Sequence Alignments using Coupled Pattern Mining. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. Orlando, Florida: ACM, 2012, pp. 28–35. [Acceptance Rate 33/159 = 20.75%]. [Link]
- KSM Tozammel Hossain, Chris Bailey-Kellogg, Alan M Friedman, Michael J Bradley, Nathan Baker, and Naren Ramakrishnan. Using Physicochemical Properties of Amino Acids to Induce Graphical Models of Residue Couplings. In: Proceedings of the Tenth International Workshop on Data Mining in Bioinformatics. San Diego, California: ACM, 2011, 3:1–3:10. [Link]
- Naren Sundaravaradan, KSM Tozammel Hossain, Vandana Sreedharan, Douglas J Slotta, John Paul Vergara, Lenwood S Heath, and Naren Ramakrishnan. Extracting Temporal Signatures for Comprehending Systems Biology Models. In: Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA: ACM, 2010, pp. 453–462. [Acceptance Rate 77/578 = 13.32%]. [Link]
- Muhammad N Yanhaona, KSM Tozammel Hossain, and M Saidur Rahman. Pairwise Compatibility Graphs. In: Proceedings of the Second International Workshop on Algorithms and Computation (WALCOM). Springer, 2008, pp. 222–233. [Acceptance Rate 19/57 = 33.3%]. [Link]
Under Review
- Lokesh Karanam, Kun Yi, Zon-Yin Shae, David Chang, Chi-Ren Shyu, and K S M Tozammel Hossain. “Continuous Anticipation of AKI in the ICU using Time-Gated LSTMs.” In: 2022. [Under review in the AMIA 2022 Clinical Informatics Conference]
- KSM Tozammel Hossain, Hrayr Harutyunyan, Yue Ning, Brendan Kennedy, Naren Ramakrishnan, and Aram Galstyan. “Identifying Event Precursors using Attention-based LSTMs.” In: (2022). [Under Review in Frontiers in Artificial Intelligence]
- Bhuwan Thapa, Sarah Lovell, Zhen Cai, K S M Tozammel Hossain, and Mi Young Kwon. “Agroforestry for Climate Risk Management: Effectiveness of Windbreaks in Reducing Crop Loss in the Midwest, USA”. In: 2022. [Under review in the 5th World Congress on Agroforestry].
Archived
[arXiv ’18] Florian Quinkert, Thorsten Holz, KSM Tozammel Hossain, Emilio Ferrara, and Kristina Lerman. “RAPTOR: Ransomware Attack PredicTOR”. In: arXiv preprint arXiv:1803.01598 (2018).
[arXiv ’16] KSM Tozammel Hossain†, Huijuan Shao†, Hao Wu, Maleq Khan, Anil Vullikanti, B Aditya Prakash, Madhav Marathe, and Naren Ramakrishnan. Forecasting the Flu: Designing Social Network Sensors for Epidemics. In: arXiv preprint arXiv:1602.06866 (2016). (Published in the ACM SIGKDD Workshop epiDAMIK 2018.) [Link]
Research Projects
Below are selected projects that I am working on or have previously worked on.
Modeling Evolutionary Constraints in Proteins
This project aims to model evolutionary constraints in proteins. Evolutionary constraints shape the sequences, structures, and functions of protein families. We are interested in one type of evolutionary constraint, residue coupling (also called correlated mutation), which is an important indicator for predicting protein structures and revealing functional insights into proteins. We focus on modeling a rich set of pairwise and higher-order residue couplings, with an emphasis on providing mechanistic explanations for couplings and decomposing couplings of various orders. We also investigate a method for mining frequent episodes, called coupled patterns, in an alignment of proteins or RNAs produced by a classical algorithm, and we exploit these coupled patterns to improve how well the alignment exposes couplings. NSF supports this project through a proposal on integrating, predicting, and generating mixed-mode information, a collaboration among Carnegie Mellon University, Dartmouth College, Purdue University, and PNNL.
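To make the idea of a pairwise residue coupling concrete, the sketch below scores two alignment columns by their mutual information, a common textbook measure of correlated mutation. It is only an illustration: it is not the graphical-model or coupled-pattern method developed in this project, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    A high value means the residues at the two positions tend to change
    together, which is one simple signal of a pairwise coupling.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 always mutate together,
# column 2 is fully conserved, and column 3 varies independently of column 0.
toy_alignment = [
    "AKLD",
    "AKLE",
    "GRLD",
    "GRLE",
]

coupled = mutual_information(toy_alignment, 0, 1)      # perfectly coupled pair
independent = mutual_information(toy_alignment, 0, 3)  # independent pair
conserved = mutual_information(toy_alignment, 0, 2)    # conserved column
```

On this toy input the coupled pair scores one bit, while the independent pair and the conserved column score zero. Real alignments would additionally need gap handling and corrections for small sample sizes and phylogenetic bias, which this sketch omits.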
Modeling Ailment State of Users using Social Network Data
Contagions arise in many settings, both biological (e.g., flu) and social (e.g., memes and hashtags propagating on Twitter). While epidemiological research has inspired researchers modeling social contagion, recent work has shown that there are key aspects along which social contagions differ from biological contagions. In this project, we reconcile these apparently contrasting behaviors with finer-grained modeling of biological phases as inferred from tweets. We propose a temporal topic model for inferring hidden biological states for users. Our work can be seen as a stepping stone to a better understanding of contagions in both biological and social spheres.
Designing Social Network Sensors for Epidemics
Early detection and modeling of a contagious epidemic can provide important guidance for quelling the contagion, controlling its spread, and designing effective countermeasures. This project aims to design social network sensors, a small set of people who can be monitored to provide insight into the emergence of an epidemic in a larger population. Using the graph-theoretic notion of dominators, we develop an efficient and effective heuristic for lead-time detection. Using city-scale datasets generated by extensive microscopic epidemiological simulations involving millions of individuals, we illustrate the practical applicability of our methods and show significant benefits (up to 22 days more lead time) compared to competing methods.
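The graph-theoretic notion of dominators can be illustrated with a small sketch. In a directed graph with a designated root, node d dominates node v if every path from the root to v passes through d, so an infection spreading from the root must reach d before it reaches v. The code below computes dominator sets with the standard iterative algorithm and ranks nodes by how many others they dominate. It is an assumption-laden illustration rather than the heuristic used in this project; the root node "s", the toy contact graph, and the function names are hypothetical.

```python
def dominators(graph, root):
    """Compute dominator sets for all nodes reachable from root.

    graph maps each node to a list of its successors.
    Returns a dict mapping each reachable node to the set of its dominators.
    """
    # Collect the nodes reachable from the root with a depth-first search.
    reachable, stack = set(), [root]
    while stack:
        u = stack.pop()
        if u in reachable:
            continue
        reachable.add(u)
        stack.extend(graph.get(u, []))

    # Build predecessor sets restricted to reachable nodes.
    preds = {v: set() for v in reachable}
    for u in reachable:
        for v in graph.get(u, []):
            preds[v].add(u)

    # Standard iterative refinement: start from "everything dominates
    # everything" and shrink until a fixed point is reached.
    dom = {v: set(reachable) for v in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in reachable:
            if v == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom


def rank_sensors(graph, root, k):
    """Return the k non-root nodes that dominate the most other nodes."""
    dom = dominators(graph, root)
    counts = {v: 0 for v in dom}
    for v, ds in dom.items():
        for d in ds:
            if d != v:
                counts[d] += 1
    candidates = [v for v in counts if v != root]
    return sorted(candidates, key=lambda v: (-counts[v], str(v)))[:k]


# Hypothetical contact graph: "s" stands in for the source of infection.
toy_graph = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": ["e", "f"],
}

sensors = rank_sensors(toy_graph, "s", 2)
```

In this toy graph every path from "s" to "d", "e", and "f" passes through "c", so "c" ranks first and "d" second. Monitoring such nodes is the intuition behind gaining lead time over the nodes they dominate.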
Inferring Ideal Points for US Supreme Court Justices
In Supreme Court parlance and the political science literature, an ideal point positions a justice in a continuous space. It can be interpreted as a quantification of the justice’s policy preferences. We present an automated approach to infer such ideal points for justices of the US Supreme Court. This approach combines topic modeling over case opinions with the justices’ voting (and endorsing) behavior. Furthermore, given a topic of interest, say the Fourth Amendment, the topic model can optionally be seeded with supervised information to steer the inference of ideal points. Applying this methodology to five years of cases provides interesting perspectives on the leanings of justices on crucial issues, the coalitions underlying specific topics, and the role of swing justices in deciding the outcomes of cases.