SkimLit (Sequence Classification in Medical Abstracts)

This repository contains code for training and evaluating deep learning models for sequence classification in medical abstracts. The models classify each sentence of an abstract into a predefined category based on its content.

Overview

The code implements several deep learning models in TensorFlow and Keras. The models encode the text with token embeddings, character embeddings, or a hybrid of the two, and then predict a label for each sentence.
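To make the difference between token-level and character-level input concrete, here is a minimal sketch in plain Python. The helper names `split_tokens` and `split_chars` are hypothetical and not taken from the notebook, and splitting on whitespace is a simplifying assumption about tokenization.

```python
from typing import List


def split_tokens(sentence: str) -> List[str]:
    """Token-level view: split the sentence on whitespace (assumed tokenizer)."""
    return sentence.split()


def split_chars(sentence: str) -> str:
    """Character-level view: separate every character with a space,
    so that a text vectorizer can treat each character as one unit."""
    return " ".join(list(sentence))


sentence = "Pain scores improved"
tokens = split_tokens(sentence)   # three word-level units
chars = split_chars("Pain")       # "P a i n"
```

A token embedding layer would then map each word to a vector, while a character embedding layer would map each character to a vector.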

The following models are included:

Conv1D with Token Embeddings: This model uses token embeddings generated by a pre-trained TensorFlow Hub Universal Sentence Encoder. It employs a 1D convolutional layer for sequence processing.
Feature Extraction with Pretrained Token Embeddings: This model utilizes the Universal Sentence Encoder (USE) from TensorFlow Hub as a feature extractor. It extracts high-level features from the text to make predictions.
Conv1D with Character Embeddings: This model employs character-level embeddings and a 1D convolutional layer. It processes the sequences at a character level to capture finer-grained information.
Combining Pretrained Token Embeddings and Character Embeddings (Hybrid Embedding Layer): This model combines token embeddings from the Universal Sentence Encoder with character-level embeddings using a hybrid token embedding layer. It aims to leverage both token and character-level information for improved performance.
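As an illustration of the hybrid idea, the following is a minimal Keras sketch, not the notebook's actual architecture. It assumes the text has already been converted to integer token ids and character ids, and it uses a small trainable `Embedding` layer as a stand-in for the pretrained Universal Sentence Encoder so that the example runs without downloading anything. The function name `build_hybrid_model`, the vocabulary sizes, and the layer widths are all hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5  # number of sentence labels in the dataset


def build_hybrid_model(token_vocab_size: int = 1000, char_vocab_size: int = 30) -> tf.keras.Model:
    # Token branch. ASSUMPTION: a trainable Embedding replaces the
    # pretrained Universal Sentence Encoder used in the repository.
    token_ids = layers.Input(shape=(None,), dtype="int32", name="token_ids")
    x = layers.Embedding(token_vocab_size, 64)(token_ids)
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(x)
    token_vec = layers.GlobalMaxPooling1D()(x)

    # Character branch: character ids -> embedding -> Conv1D -> pooled vector.
    char_ids = layers.Input(shape=(None,), dtype="int32", name="char_ids")
    y = layers.Embedding(char_vocab_size, 25)(char_ids)
    y = layers.Conv1D(32, kernel_size=5, padding="same", activation="relu")(y)
    char_vec = layers.GlobalMaxPooling1D()(y)

    # Hybrid step: concatenate both representations, then classify.
    merged = layers.Concatenate()([token_vec, char_vec])
    merged = layers.Dense(128, activation="relu")(merged)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

    model = tf.keras.Model(inputs=[token_ids, char_ids], outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model = build_hybrid_model()

# Random ids standing in for two vectorized sentences (illustrative only).
tokens = np.random.randint(0, 1000, size=(2, 20)).astype("int32")
chars = np.random.randint(0, 30, size=(2, 120)).astype("int32")
probs = model.predict([tokens, chars], verbose=0)  # one probability row per sentence
```

Each row of `probs` holds one probability per label. Swapping the token branch for a TensorFlow Hub encoder would change only the first half of the function.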

Usage

Clone the repository:

git clone https://github.com/your_username/sequence-classification-medical-abstracts.git

Install the necessary dependencies.

This step is not needed if you run the notebook on Google Colaboratory.

PubMed RCT Dataset

The PubMed RCT (Randomized Controlled Trial) dataset is a collection of medical abstracts obtained from PubMed. It consists of abstracts from randomized controlled trials and has been pre-processed for natural language processing tasks, including sentence classification.

Dataset Overview

Objective: The dataset aims to facilitate research in the domain of biomedical natural language processing by providing a standardized collection of abstracts from randomized controlled trials.
Content: The dataset includes abstracts from various medical research papers, with each abstract containing information about the study's objective, methods, results, and conclusion.
Labels: Each sentence in an abstract is assigned one of five categories:
1. Background
2. Objective
3. Methods
4. Results
5. Conclusions
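The sketch below shows one way to turn labeled abstract text into per-sentence records. It assumes the file layout used by the public dataset: a line starting with `###` opens an abstract, each following line holds a label and a sentence separated by a tab, and a blank line closes the abstract. The function name `parse_rct_lines`, the record field names, and the sample lines are invented for illustration and are not real dataset entries.

```python
from typing import Dict, List


def parse_rct_lines(lines: List[str]) -> List[Dict]:
    """Convert raw dataset lines into one record per labeled sentence."""
    samples: List[Dict] = []
    abstract_id = None
    current = []  # (label, text) pairs of the abstract being read

    def flush() -> None:
        # Emit the sentences collected so far, with their position in the abstract.
        total = len(current)
        for i, (label, text) in enumerate(current):
            samples.append({
                "abstract_id": abstract_id,
                "target": label,
                "text": text,
                "line_number": i,
                "total_lines": total,
            })
        current.clear()

    for raw in lines:
        line = raw.strip()
        if line.startswith("###"):      # start of a new abstract
            flush()
            abstract_id = line[3:]
        elif not line:                  # blank line ends the abstract
            flush()
        elif "\t" in line:              # "LABEL<TAB>sentence"
            label, text = line.split("\t", 1)
            current.append((label, text))
    flush()  # handle a file that does not end with a blank line
    return samples


# Made-up sample in the assumed format.
sample = [
    "###12345",
    "OBJECTIVE\tTo test whether the treatment reduces pain .",
    "METHODS\tPatients were randomized to treatment or placebo .",
    "RESULTS\tPain scores improved in the treatment group .",
    "",
]
records = parse_rct_lines(sample)
```

Keeping `line_number` and `total_lines` in each record makes the position of a sentence within its abstract available as an extra feature.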

Accessing the Data

The dataset is available on GitHub at the following link: PubMed RCT Dataset.

Researchers and practitioners interested in biomedical natural language processing can utilize this dataset for various tasks such as text classification, summarization, information extraction, and more.

kunal-kumar-chaudhary/SkimLit
