Led by Prof. Dr. Rafet Sifa, the Applied Machine Learning Lab focuses on addressing the challenges of implementing machine learning models in real-world settings while developing novel methods for pattern analysis and representation learning. The lab's primary area of investigation is the construction of hybrid, interpretable, and resource-aware learning systems with practical applications in text mining, behavioral analytics, and medical informatics.
Prof. Dr. Rafet Sifa is a Machine Learning professor at University of Bonn and the head of the Media Engineering Department at Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). His current research focus is based on statistical data mining in the context of representation learning for a variety of industry applications involving behavioral analytics, medical informatics, accounting, digital forensics and text mining.
Dr. Lorenz Sparrenberg is a post-doctoral researcher and holds a PhD in natural sciences from RWTH Aachen University. At Fraunhofer FIT, Dr. Sparrenberg focused on cancer markers and multi-resistant germs, using single molecule detection methods as well as statistical approaches and machine learning. His interests now lie in research on large language models and the analysis of medical data. He also works as an independent data scientist and has substantial experience in industry.
Tobias Deußer is a Machine Learning and Natural Language Processing researcher at the University of Bonn and a Senior Data Scientist at Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). He is pursuing his PhD in Machine Learning at the University of Bonn. Prior to starting his PhD, he worked as a Data Scientist at Ernst & Young, where he developed and deployed various machine learning solutions in a Finance context. His research focuses on developing new methods to leverage large language models (LLMs) to improve downstream tasks that are typically unsuited to be solved by such models and how we can use LLMs to solve real-world problems.
Armin Berger is a Data Scientist at Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and a PhD Research Fellow in Machine Learning at the University of Bonn. His research is centered on Model Distillation, particularly its application in Natural Language Processing (NLP) and Large Language Models (LLMs). This focus on model compression facilitates the deployment of LLMs in resource-constrained environments and enhances data privacy. Before his tenure at Fraunhofer and the University of Bonn, he finished his Master of Data Science at Monash University in Melbourne, Australia, and gained experience in Data Science and Management Consulting at various firms, including KPMG and Porsche Consulting.
Maren Pielka is a Data Scientist at Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and a PhD Research Fellow in Machine Learning at the University of Bonn. Her research focus lies in Natural Language Processing (NLP) and Large Language Models (LLMs), with a particular interest in model compression and efficiency. For her PhD thesis, she studies different methods for integrating linguistic knowledge into LLM training. She finished her Master's in Computer Science at the University of Bonn in 2019. Since 2020, she has been a full-time Data Scientist and has worked on several NLP-related industry projects, for example on an AI-based tool for automated auditing.
Natural Language Processing (NLP) is a critical area of artificial intelligence and machine learning that focuses on the interaction between computers and human languages. This field enables machines to read, decipher, and respond to human languages in meaningful ways. Our lab is dedicated to pushing the boundaries of NLP through innovative research and practical applications in various domains, of which we highlight four below. We are committed to addressing the unique challenges in each field to harness the full potential of NLP in transforming our interaction with a diverse set of languages across various sectors.
Representation Learning is a fundamental aspect of NLP, involving the development of models that can represent linguistic data for machine understanding. A focus lies on the research of language models, which capture the context, semantics, and syntax of language, and how these models function and can be improved upon.
We advance Representation Learning in NLP by focusing on developing sophisticated language models, ensuring data diversity and quality, and fostering interdisciplinary collaborations. We emphasize low-resource settings and adaptability, enhancing model efficiency across various tasks. We also engage with the broader research community through open-source contributions, which we see as pivotal to driving innovation in the field.
- Ramamurthy, R., Ammanabrolu, P., Brantley, K., Hessel, J., Sifa, R., Bauckhage, C., Hajishirzi, H., & Choi, Y. (2023). Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization. The Eleventh International Conference on Learning Representations (ICLR).
- Deußer, T., Hillebrand, L., Bauckhage, C., & Sifa, R. (2023). Informed Named Entity Recognition Decoding for Generative Language Models.
- Wahab, A., & Sifa, R. (2021, December). DIBERT: Dependency injected bidirectional encoder representations from transformers. 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8.
Natural Language Processing (NLP) is transforming the landscape of financial reporting and accounting. In these fields, where the accuracy and timeliness of information are paramount, NLP stands out as a pivotal technology. Our lab is committed to leveraging NLP to enhance the efficiency and reliability of financial reporting and accounting processes, focusing on automating data extraction, enhancing report generation, and improving the financial auditing process.
In financial reporting and accounting, professionals often grapple with vast amounts of unstructured data, including invoices, receipts, and financial statements. One of the primary challenges lies in extracting and interpreting relevant financial information accurately from these diverse documents. Additionally, the financial sector is subject to rigorous compliance and regulatory standards, making it crucial for reports to be precise and in line with current laws and regulations. The opportunity for NLP is significant: automating data extraction and analysis can lead to faster, less error-prone reporting and accounting processes and aid in real-time decision-making based on financial insights.
Our lab is dedicated to developing advanced NLP solutions specifically tailored to financial reporting and accounting needs. We focus on creating algorithms that can seamlessly parse and interpret complex financial documents, extracting key data points with high accuracy. Our models are designed to understand and categorize financial information, facilitating swift and automated report generation. Furthermore, we are working on integrating regulatory compliance checks within these models, ensuring that all reports adhere to the latest financial regulations and standards. By incorporating NLP into financial reporting and accounting, we aim to revolutionize these fields, making them more efficient, compliant, and reliable.
- Deußer, T., Leonhard, D., Hillebrand, L. P., Berger, A., Khaled, M., Heiden, S., Dilmaghani, T., Kliem, B., Loitz, R., Bauckhage, C., & Sifa, R. (2023). Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models. International Conference on Big Data (BigData).
- Hillebrand, L., Deußer, T., Dilmaghani, T., Kliem, B., Loitz, R., Bauckhage, C., & Sifa, R. (2022). KPI-BERT: A joint named entity recognition and relation extraction model for financial reports. 2022 26th International Conference on Pattern Recognition (ICPR), pp. 606-612.
- Sifa, R., Ladi, A., Pielka, M., Ramamurthy, R., Hillebrand, L., Kirsch, B., ... & Loitz, R. (2019). Towards automated auditing with machine learning. Proceedings of the ACM Symposium on Document Engineering (DocEng) 2019, pp. 1-4.
Natural Language Processing (NLP) is increasingly becoming a transformative technology in the legal sector. Its applications range from document analysis to automated contract review, enhancing the efficiency and accuracy of legal processes. Our lab is deeply invested in harnessing NLP to revolutionize legal operations, focusing on automating legal document analysis, facilitating legal research, and streamlining compliance.
The legal field presents unique challenges for NLP due to the complex, nuanced nature of legal language. Legal documents such as contracts, legislation, and case law are often lengthy, dense, and laden with specialized terminology. The challenge for NLP in this domain is to interpret and analyze these texts accurately, ensuring a deep understanding of legal jargon and context. On the opportunity side, NLP can significantly automate and simplify tasks such as document review, case prediction, and compliance monitoring. This can lead to substantial time and cost savings, allowing legal professionals to focus on more strategic aspects of their work.
At AML Lab, we are focused on developing NLP technologies tailored for legal applications. Our research and development efforts are geared towards creating sophisticated algorithms capable of understanding and processing complex legal language. We are working on tools that can automatically extract relevant information from legal documents, predict legal outcomes, and ensure compliance with laws and regulations. Additionally, we are exploring ways to facilitate legal research by enabling efficient searching and categorization of legal texts. By applying NLP to legal challenges, we aim to support legal professionals in delivering more efficient, accurate, and accessible legal services.
Topic ideas: AI-guided medicine / Intelligent diagnostics / Cognitive medicine
The field of modern medicine faces significant challenges. On one hand, demographic changes in our society are leading to a shortage of skilled professionals in the medical sector. On the other hand, rapid advancements in medicine are generating an overwhelming amount of new knowledge, published in specialized literature. Medical professionals are thus confronted with an increase in workload due to the shortage of skilled workers, as well as the need to stay updated with the latest scientific and technological developments through continuous education, whether through seminars or studying current literature.
The "Applied Machine Learning Lab" (AML Lab) rises to meet these challenges in medicine. We are dedicated to developing and implementing artificial intelligence (AI) to enhance efficiency and precision in diagnostics. Our algorithms assist medical professionals by aiding in the analysis and interpretation of findings, thereby accelerating the diagnosis process. Our AI models analyze imaging data, identify patterns, and provide valuable insights that serve as decision-making aids for medical staff. However, we take the view that the final decision is always the responsibility of the healthcare professional and that the AI tools provide assistance.
At AML Lab, we are committed to bridging the gap between advanced technology and everyday medical practices, ensuring that healthcare professionals are equipped with the best tools to make informed decisions.
- Biesner, D., Schneider, H., Wulff, B., Attenberger, U., & Sifa, R. (2022). Improving Chest X-Ray Classification by RNN-based Patient Monitoring. 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Nassau, Bahamas, 2022, pp. 946-950.
- Schneider, H., Lübbering, M., Kador, R., Broß, M., Priya, P., Biesner, D., Wulff, B., de Oliveira, T.B., Layer, Y.C., Attenberger, U. & Sifa, R. (2023). Symmetry-Aware Siamese Network: Exploiting Pathological Asymmetry for Chest X-Ray Analysis. Artificial Neural Networks and Machine Learning – ICANN 2023. Lecture Notes in Computer Science, 14257, Springer.
- Luetkens, J.A., Nowak, S., Mesropyan, N., et al. (2022). Deep learning supports the differentiation of alcoholic and other-than-alcoholic cirrhosis based on MRI. Scientific Reports, 12, 8297.
Behavioral analytics is a sophisticated field dedicated to deciphering and modeling human behavior, personality traits and decision-making processes. Its aim is to analyze and understand the reasons for and manner of individual actions. In this area, elements from psychology, data analysis and user experience design are combined to obtain a comprehensive picture of human behavior, with a focus on reactions to different stimuli and environments. This fusion leads to more strategic and informed decision making. By using behavioral analytics, companies can not only anticipate future trends and behaviors, but also adapt their services and products to the individual needs and preferences of their customers. This customization is the key to improving performance and maintaining a competitive advantage.
Modeling human behavior is a complex task due to the multivariate nature of personalities. Furthermore, human behavior is highly dependent on the social, cultural and environmental context. Therefore, controlled environments in which the variables can be closely observed and analyzed are of great value. Video games provide such a controlled setting and are therefore ideal for pioneering studies in the field of behavior analysis. Their defined rules and interactive elements provide a microcosm of real-life social interactions and behaviors. Research in gaming behavior allows the development of models with potential applicability to broader real-life scenarios. However, this involves intricate modeling that might not always directly translate to non-gaming contexts, given the unique aspects of gaming motivations and environments. Additionally, the anonymity prevalent in these settings might not fully reflect real-world social dynamics.
In recognition of these challenges, our team is dedicated to extending the reach of behavioral analytics from gaming to wider real-life applications. We strive to use the controlled environment of gaming to build foundational models of personality and behavior that can enhance our understanding of real-world human interactions.
Our strategy involves refining analytical methods to capture complex human behaviors and personalities in more diverse settings. By examining player motivations and social dynamics within games, we aim to identify parallels and contrasts with behaviors in everyday life. This approach is intended to improve the predictive accuracy and relevance of our models, fostering a deeper understanding of human behavior across various contexts.
- Advanced Methods for Text Mining
- Theory of Deep Learning
- Explainable AI & Applications