Development of Open Source EEG Foundation model #39

ItshMoh · 2024-03-05T18:07:35Z

Hello @zeydabadi!
I am Mohan, a student at IIT BHU, and I am interested in contributing to this project. It could help doctors analyze EEG faster, which in turn could help in addressing many psychological disorders.
I have experience with Python and frameworks like PyTorch and TensorFlow. I have completed an internship on large language models, specifically AI security: integrating safeguards to stop prompt injection. I have experience with fine-tuning, including fine-tuning Stable Diffusion to produce icons (link). I have experience with self-supervised learning methods like SimCLR, BYOL, etc. I also have some experience with signal processing, which I think will be very important for preprocessing EEG data.
I have done some research on the project and found several papers that are very useful for it. Self-supervised learning has proved capable of learning the intricacies hidden in data, so it will play a key role in the foundation model.

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model: This is the latest paper on EEG foundation models. It describes a self-supervised learning approach based on vector quantisation and mainly focuses on learning temporal patterns in multi-channel EEG data. The vector quantiser looks up the nearest neighbour in a codebook. The model uses a Transformer encoder-decoder architecture, although the decoder is very shallow. When fine-tuned, it outperforms other models such as EEGNet and GraphS4mer (a supervised model). The paper also discusses parameter settings such as the codebook size and the encoder layers (which play a crucial role). By analysing the codebook, we can see how each univariate time-series chunk relates to its nearest codebook vector.
Neuro-GPT: Towards A Foundation Model for EEG: It describes a foundation model consisting of an EEG encoder that extracts spatio-temporal features from EEG data, and a GPT model that uses self-supervision to predict masked chunks. The encoder includes both convolutional and self-attention modules. The GPT model uses a decoder architecture made of stacked self-attention and feed-forward layers to capture dependencies between tokens. It outperforms BENDR (a self-supervised model fine-tuned on BCI classification data). Here the encoder plays a crucial role, and accuracy could be improved by making the encoder channel-independent.
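The codebook lookup that EEGFormer uses for vector quantisation can be illustrated with a minimal sketch in plain Python. It shows only the nearest-neighbour step: each encoded univariate chunk is replaced by the index of its closest codebook vector, which is what makes the codebook inspectable. The encoder, the learned codebook, its size, and the function names `quantize` and `tokenize_chunks` are hypothetical stand-ins, not EEGFormer's code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return (index, vector) of the codebook entry nearest to `embedding`."""
    best = min(range(len(codebook)), key=lambda i: squared_distance(embedding, codebook[i]))
    return best, codebook[best]


def tokenize_chunks(chunk_embeddings: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each encoded univariate chunk to a discrete token (its codebook index)."""
    return [quantize(e, codebook)[0] for e in chunk_embeddings]


# Toy 2-D codebook and chunk embeddings (hypothetical values, not from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
chunks = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.7]]
print(tokenize_chunks(chunks, codebook))  # [0, 1, 2]
```

Counting how often each index appears across a dataset is one simple way to start the codebook analysis described above.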

The main requirements of our project to make a foundation model are as follows:

Preprocessing the data with known techniques such as the Fourier transform, the continuous wavelet transform and common spatial patterns. This step plays a crucial role in the accuracy of our model on downstream tasks.
An architecture for learning the hidden patterns in EEG data. It could be a convolutional network that learns temporal features, spatial features, or both, combined with self-supervised learning and a Transformer architecture.
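As one concrete instance of the Fourier-transform preprocessing mentioned above, the following sketch computes spectral power in conventional EEG frequency bands. The band edges, the sampling rate and the helper names are assumptions for illustration; the naive DFT keeps the example dependency-free, whereas a real pipeline would use `numpy.fft.rfft` or `scipy.signal.welch`.

```python
import cmath
import math
from typing import Dict, List, Tuple

# Conventional EEG band edges in Hz (an assumption; exact edges vary between studies).
BANDS: Dict[str, Tuple[float, float]] = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def dft_power(signal: List[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N//2 via a naive O(N^2) DFT (use an FFT in practice)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2)
    return power


def band_powers(signal: List[float], fs: float, bands=BANDS) -> Dict[str, float]:
    """Sum DFT power over the frequency bins that fall inside each band [lo, hi)."""
    power = dft_power(signal)
    freqs = [k * fs / len(signal) for k in range(len(power))]
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }


# One second of a synthetic 10 Hz sine sampled at 128 Hz: power should land in alpha.
fs = 128
sine = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
powers = band_powers(sine, fs)
print(max(powers, key=powers.get))  # alpha
```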

This is my research on the topic so far. I am open to suggestions and discussion in the comments, and I am looking forward to collaborating with amazing folks and contributing to this project.

Mohan (mohan.kumar.min22@itbhu.ac.in)

aliceheiman · 2024-03-06T04:50:43Z

Hello @zeydabadi and everyone!

My name is Alice and I am a Computer Science undergrad at Stanford University pursuing the Computational Biology track.

I am very interested in joining this project as I am passionate about contributing to the intersection between AI and health.

The course PSYC221: Machine Learning for Neuroimaging introduced me to the exciting possibilities of combining neuroscience and AI. I used scikit-learn and PyTorch to apply and build machine-learning models for brain analysis. For example, I recreated a U-Net for MRI segmentation and built an encoder-decoder model to generate artificial brain images.
The course CS173A: A Computational Tour of the Human Genome gave me skills in working with genomic data and the genome browser using Python, Bedtools, and Unix.

I have a strong foundation in Python and HTML/CSS, and also have experience in C, PHP, and JavaScript.

I tried working on the Kaggle competition for EEG recognition. However, my studies took up most of my attention, so I have not been able to dive deeply into it. It would be so interesting and fulfilling to get that opportunity this summer!

I look forward to discussing this project further with you!

Warm regards,
Alice :)

AaryanSahu · 2024-03-07T19:58:54Z

Hey @zeydabadi and everyone else,

This is Aryan, a pre-final year undergraduate student majoring in Computer Science at BITS Pilani. Your project on developing an open-source EEG Foundation Model has sparked my interest immensely. The prospect of contributing to such a noble endeavor resonates strongly with both my academic pursuits and personal aspirations.

Coming to my experience, I have worked on semi-supervised and self-supervised learning methods, temporal data (financial data in my case), LLMs, retrieval-augmented generation, building small-scale LLMs and playing around with embeddings ;)

My journey in the field of machine learning and deep learning has been both enriching and inspiring. I am profoundly fascinated with the potential of Large Language Models (LLMs) and their applications across various domains. My experience spans working with frameworks like PyTorch, TensorFlow, and Keras, where I've worked on advanced algorithms to tackle diverse challenges. My research revolves around harnessing the power of LLMs, with a focus on improving model interpretability and explainability—a crucial aspect in ensuring the reliability and trustworthiness of AI systems.

I have a solid understanding of the complexities of handling temporal data, which is crucial given the inherently temporal nature of EEG data. My previous work with semi-supervised learning and with medical data (using U-Net) has given me insights that I believe will be valuable in developing the EEG Foundation Model.

I am eager to bring my skills and knowledge to this project and collaborate with like-minded individuals to contribute to EEG research. Thank you for considering my interest, and I look forward to participating in this innovative endeavor.

I looked into some existing papers on foundation models for EEG. Here is what I found so far:

NEURO-GPT: DEVELOPING A FOUNDATION MODEL FOR EEG: This paper presents Neuro-GPT, a foundation model that combines an EEG encoder with a GPT model. Pre-trained on extensive EEG datasets, it learns to reconstruct masked segments, capturing complex EEG patterns. The EEG encoder extracts features, while the GPT model predicts the next masked chunk.
The GPT model performs the self-supervised task and helps the EEG encoder extract expressive features from raw EEG during pre-training. One limitation, or rather a direction for further development, is to make the model channel-agnostic so that it can adapt to a varying number of channels. Another idea would be to incorporate other modalities (fMRI, ECG, etc.).
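Neuro-GPT's pre-training task, predicting a hidden chunk from the chunks before it, can be sketched with the scaffolding below: chunking a single channel, hiding the last chunk, building the causal attention mask a GPT-style decoder applies, and scoring a prediction with mean-squared error. This is a simplified illustration under stated assumptions: there is no encoder or GPT model here, the loss is computed on raw samples rather than learned features, and the prediction `[4.0, 6.0]` is a made-up stand-in for a model output.

```python
from typing import List, Tuple

Chunk = List[float]


def chunk_signal(signal: List[float], chunk_len: int) -> List[Chunk]:
    """Split one channel into consecutive non-overlapping chunks; a trailing remainder is dropped."""
    n_chunks = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def causal_mask(n_tokens: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. only to itself and earlier tokens."""
    return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]


def split_context_target(chunks: List[Chunk]) -> Tuple[List[Chunk], Chunk]:
    """Hide the last chunk: the model sees the context and must predict the target."""
    return chunks[:-1], chunks[-1]


def mse(predicted: Chunk, target: Chunk) -> float:
    """Mean-squared reconstruction error between a predicted and the true (hidden) chunk."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chunks = chunk_signal(signal, 2)          # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]; 6.0 is dropped
context, target = split_context_target(chunks)
print(causal_mask(3)[0])                  # [True, False, False]
print(mse([4.0, 6.0], target))            # 0.5 for the hypothetical prediction [4.0, 6.0]
```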

LARGE BRAIN MODEL FOR LEARNING GENERIC REPRESENTATIONS WITH TREMENDOUS EEG DATA IN BCI: The study introduces the Large Brain Model (LaBraM) as a comprehensive framework for EEG analysis. LaBraM enables cross-dataset learning by dividing EEG signals into channel patches and uses vector-quantized neural spectrum prediction to encode these patches into compact neural representations, drawing inspiration from image patch embeddings. To help the model learn generic representations from extensive EEG data, the authors propose a masked EEG modeling approach with a symmetric masking strategy that improves training efficiency. LaBraM is then evaluated on several downstream tasks, such as abnormal detection (TUAB) and event type classification (TUEV), where it surpasses state-of-the-art (SOTA) methods. The paper notes that data scaling remains an open problem.
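LaBraM's patching and symmetric masking can be sketched as follows, assuming a recording stored as a list of channels. Each channel is cut into fixed-length patches, and a random mask is paired with its complement, so every patch is hidden in exactly one of the two masked views; as I understand the paper, this pairing is what improves training efficiency. The patch length, the 0.5 mask ratio and the helper names are assumptions for illustration, and the vector-quantized neural tokenizer is not shown.

```python
import random
from typing import List, Set, Tuple

Patch = Tuple[int, int, List[float]]  # (channel index, time index, samples)


def patchify(recording: List[List[float]], patch_len: int) -> List[Patch]:
    """Cut every channel into non-overlapping patches of `patch_len` samples."""
    patches = []
    for ch, samples in enumerate(recording):
        for t in range(len(samples) // patch_len):
            patches.append((ch, t, samples[t * patch_len:(t + 1) * patch_len]))
    return patches


def symmetric_masks(n_patches: int, mask_ratio: float, rng: random.Random) -> Tuple[Set[int], Set[int]]:
    """Sample a random set of masked patch indices and return it with its complement."""
    n_masked = int(round(mask_ratio * n_patches))
    mask = set(rng.sample(range(n_patches), n_masked))
    inverse = set(range(n_patches)) - mask
    return mask, inverse


recording = [[float(i) for i in range(8)] for _ in range(3)]  # 3 channels x 8 samples (toy data)
patches = patchify(recording, patch_len=4)                     # 3 channels x 2 patches = 6 patches
mask, inverse = symmetric_masks(len(patches), mask_ratio=0.5, rng=random.Random(0))
print(len(patches), len(mask), len(inverse))                   # 6 3 3
```

Because patches carry their channel index, the same code works for recordings with any number of channels, which is the property that enables LaBraM's cross-dataset learning.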

These are my findings so far, and I am open to suggestions and discussion on them.
I hope to collaborate on this and contribute to this research project.

Thank you!
Regards ;)
Aryan
(aryansahu010103@gmail.com)

carminoplata · 2024-03-23T19:22:38Z

Development of an Open-Source EEG Foundation Model

Abstract

The aim of the project is to develop an open-source foundation model that helps with the analysis of electroencephalography (EEG) data. The main activities will be investigating and adopting algorithms for data extraction, pattern recognition and deep learning on a public EEG dataset.

Contributor

Carmine Sacco
email: carmine.sacco91@icloud.com
github: carminoplata
personal website: https://csacco.eu

Potential Mentors

Mahmoud Zeydabadinezhad (@zeydabadi)
Babak Mahmoud

My Background
I'm a software engineer with a Master's degree in Computer Engineering from the Polytechnic of Turin. During my degree, my friends and I launched a healthcare startup whose purpose was to provide reliable information about the services and therapies offered by public and private Italian hospitals. Through this experience, I learned to communicate with doctors and discovered the DICOM format, for which we developed a proof of concept for sharing medical images between patient and doctor.
Unfortunately, we did not believe strongly enough in our idea, so we moved on to other opportunities.
For my part, I was hired by a group of researchers at my university to join a European research program as a Research Assistant. I developed a mobile application that identifies users by the fingerprint of their smartphone's camera, and a web application for a search engine over images taken by the same camera device.
After this fantastic experience, I decided to move into industry, in particular the automotive and aerospace sectors. I worked there for several years on many projects, using a variety of programming languages and learning different methods for organizing work.
Nowadays I'm developing my own methodology and processes for organizing activities in any field, and I'm confident that it works well in research too, since I have already tried it during a research project developing a live-streaming platform for 4K videos.

Last but not least, I took some Coursera courses from Andrew Ng about strategies for developing ML models, in particular when the dataset is small, such as transfer learning, in which a model already trained on a larger dataset for a similar task (e.g., pattern recognition in images) is fine-tuned on the smaller one.

In conclusion, I have gained expertise across several sectors and environments (large companies, startups and universities), and I would like to join Google Summer of Code because it is a great opportunity to gain hands-on experience with real-world problems and to start switching my career.

Project Goals

Understand the state of the art of ML models for electroencephalography data and/or radiology images
Benchmark the candidate models and find the one with the best precision
Improve the model by adopting several techniques
Publish a paper
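For the benchmarking goal, "best precision" can be made concrete: precision is TP / (TP + FP), the fraction of positive predictions that are correct. A minimal sketch, assuming binary labels and hypothetical model outputs (the model names and predictions are invented for illustration, not results):

```python
from typing import Dict, List


def precision(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when the model predicts no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


def best_by_precision(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Return the name of the model with the highest precision on the same labels."""
    return max(predictions, key=lambda name: precision(y_true, predictions[name]))


y_true = [1, 0, 1, 1, 0, 0]
predictions = {                       # hypothetical model outputs, not real results
    "model_a": [1, 1, 1, 0, 0, 0],    # TP=2, FP=1 -> 0.667
    "model_b": [1, 0, 0, 1, 0, 0],    # TP=2, FP=0 -> 1.0
}
print(best_by_precision(y_true, predictions))  # model_b
```

Because precision ignores missed positives, recall or F1 would normally be reported alongside it when comparing models.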

Major Contributions

Knowledge of how to move from a notebook to a production environment
Use of my personal laptop, a MacBook Pro with an M2 chip, an 11-core GPU and a 16-core Neural Engine
Access to the Italian Machine Learning Community, a network of university professors and AI experts who can provide advice

Project Schedule

Since the project is 350 hours at an expected 35 hours per week, it spans 10 weeks in total, which I would plan as follows:

1st - 2nd Weeks: Collect public EEG datasets and the papers on state-of-the-art ML models for detecting diseases from medical images and/or EEG

3rd - 4th Weeks: EEG data analysis (cleaning, extraction and transformation), plus implementing the most promising models in a notebook

5th - 6th Weeks: Run tests on the EEG dataset and report the results of the different models. Start thinking about how to improve the models' precision and training.

7th - 8th Weeks: Implement the improvement ideas and report the results.

9th - 10th Weeks: Clean up the code and write a paper.
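The cleaning step planned for weeks 3 to 4 could begin with two simple, widely used operations, shown here as a sketch: subtracting each channel's mean (DC offset) and rejecting epochs whose peak-to-peak amplitude exceeds a threshold, a common heuristic for artifacts such as blinks. The 100 µV threshold, the toy epochs and the helper names are assumptions, not part of the proposal; MNE-Python's `mne.Epochs` offers the same kind of rejection through its `reject` parameter.

```python
from typing import List

Epoch = List[List[float]]  # channels x samples


def demean(epoch: Epoch) -> Epoch:
    """Remove each channel's DC offset by subtracting its mean."""
    out = []
    for ch in epoch:
        mean = sum(ch) / len(ch)
        out.append([x - mean for x in ch])
    return out


def peak_to_peak_ok(epoch: Epoch, threshold: float) -> bool:
    """Keep an epoch only if every channel's max - min amplitude stays below `threshold`."""
    return all(max(ch) - min(ch) < threshold for ch in epoch)


def clean_epochs(epochs: List[Epoch], threshold: float) -> List[Epoch]:
    """Demean every epoch, then drop those whose amplitude suggests an artifact."""
    demeaned = [demean(e) for e in epochs]
    return [e for e in demeaned if peak_to_peak_ok(e, threshold)]


# Two toy 2-channel epochs in microvolts (hypothetical values); the second contains a large spike.
epochs = [
    [[10.0, 12.0, 11.0], [5.0, 6.0, 4.0]],
    [[10.0, 250.0, 11.0], [5.0, 6.0, 4.0]],
]
kept = clean_epochs(epochs, threshold=100.0)
print(len(kept))   # 1
print(kept[0][0])  # [-1.0, 1.0, 0.0]
```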

Planned GSoC work hours

This is a full-time project, so at least 35 hours per week. If I can work fully remotely from Italy, I would like to work from 10 am to 2 pm and from 3 pm to 7 pm CEST, but I am flexible and can shift half of my day to your time zone.

Vacation days

I may take one week off in mid-August, depending on my progress and achievements during these phases.

Skills set
Rather than an endless list of bullet points: I am proficient in Python and C++, and I have demos of a recommender system and a beer classifier in private GitHub repositories. If you have time, we can have a call where I show them, or I can provide access.
In addition, over the last five years I have worked mostly alone or fully remotely, asking colleagues for support only a couple of times, when I was blocked or when someone with prior experience could speed up my work.
If you would like to know more about my skill set, please see my resume here

Conclusion
I know it is a little late, but I discovered this awesome program just a couple of days ago, so I am looking for projects that can give me the opportunity to move my career into AI.
I look forward to hearing from you.

Best regards,

Carmine Sacco
