Information extraction from PubMed abstracts sentences on polyphenols anticancer activity

This repository contains the files and information for step 2 of the Kaphta Architecture: Information Extraction. In this stage, the PubMed abstracts classified as positive in the previous stage (Text Classification step) were used to extract information. Information was extracted from abstract sentences that contain associations between recognized entities. The files used in the NER (Named entity recognition) and AR (Association recognition) tasks, and their respective results, are listed below.

For more information about this and other steps of the Kaphta Architecture, see the sections of the Kaphta Web Tool, available at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.

NER (Named entity recognition)

ner-pubmed-abstracts-gh.R: R script for named entity recognition (NER) in the PubMed abstracts classified as positive in the previous stage (Text Classification step), using the PubTator API.
functions.R: R script with auxiliary functions. Save this file in the same folder as the ner-pubmed-abstracts-gh.R and association-recognition-pubmed-abstracts-gh.R scripts, because both scripts need it to run.
db_total_project.db: SQLite database needed to run all R scripts of the Kaphta Architecture steps. It contains tables with the entity dictionary, the total textual corpus of PubMed abstracts, and the PubMed abstracts classified as positive in text classification. Save this file in the same folder as the ner-pubmed-abstracts-gh.R script, because the script needs it to run.
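
The table names inside db_total_project.db are not listed here, so it can help to inspect the database before running the scripts. The following is a minimal sketch, assuming the DBI and RSQLite packages are installed (the scripts themselves may access SQLite differently). The helper name list_db_tables is hypothetical and is not defined in functions.R.

```r
# Hypothetical helper (not part of functions.R): list the tables in a SQLite file.
# Assumes the DBI and RSQLite packages are installed.
library(DBI)

list_db_tables <- function(db_path) {
  # Fail early: dbConnect() would otherwise create an empty database file.
  stopifnot(file.exists(db_path))
  con <- dbConnect(RSQLite::SQLite(), db_path)
  # Always close the connection, even if listing the tables fails.
  on.exit(dbDisconnect(con), add = TRUE)
  dbListTables(con)
}

# Example call, with the database saved next to the scripts:
# list_db_tables("db_total_project.db")
```

The returned character vector should include the tables holding the entity dictionary, the total corpus, and the positively classified abstracts described above.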

AR (Association recognition)

association-recognition-pubmed-abstracts-gh.R: R script for association recognition (AR) in the PubMed abstracts classified as positive in the previous stage (Text Classification step), using regular expressions from the rules dictionary (see the sequential-pattern-mining-in-pubmed-abstracts-sentences repository).
- To run this R script, you must first download the entities-associations-sentences-recognized and entities-recognized folders.
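
To illustrate the general idea of recognizing an association with a regular expression, here is a minimal sketch in base R. The sentences and the rule below are invented for this example and are not taken from the rules dictionary or from the script itself; the real rules come from the sequential-pattern-mining-in-pubmed-abstracts-sentences repository.

```r
# Illustrative only: these sentences and this rule are made up for the example.
sentences <- c(
  "Resveratrol inhibits proliferation of breast cancer cells.",
  "Quercetin was dissolved in ethanol before use.",
  "Curcumin induces apoptosis in colon cancer cells."
)

# Hypothetical rule: a word starting with "inhibit" or "induc",
# followed later in the sentence by the word "cancer".
rule <- "\\b(inhibit|induc)\\w*\\b.*\\bcancer\\b"

# grepl() returns TRUE for every sentence in which the rule is recognized.
matched <- grepl(rule, sentences, ignore.case = TRUE, perl = TRUE)

# Keep only the sentences with at least one recognized rule.
sentences[matched]
```

In this toy input the first and third sentences match the rule, while the second does not.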

Results of the NER and AR tasks

entities-recognized: folder with the files resulting from the NER task, containing the named entities (polyphenols, cancers and genes) recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step). Save this folder and its files in the same folder as the association-recognition-pubmed-abstracts-gh.R script, because the script needs it to run the Association recognition task.
entities-associations-sentences-recognized: folder with the files resulting from the NER task, containing the sentences in which associations between entities (polyphenols, cancers and genes) were recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step). Save this folder and its files in the same folder as the association-recognition-pubmed-abstracts-gh.R script, because the script needs it to run the Association recognition task.
ner-frequency: folder with files giving the frequency of the polyphenol, cancer and/or gene entities recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step).
Rule_associations_recognized.rar: compressed archive resulting from the AR task, containing the PubMed abstract sentences in which at least one rule from the rules dictionary was recognized.
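
The file placement described in the sections above can be checked before running the scripts. The following is a minimal sketch in base R, assuming both scripts are kept in one folder; the helper name missing_inputs is hypothetical and is not defined in functions.R.

```r
# Files and folders that must sit in the same folder as the scripts.
required_files <- c("functions.R", "db_total_project.db")
required_dirs  <- c("entities-recognized",
                    "entities-associations-sentences-recognized")

# Hypothetical helper: return the names of required items missing from script_dir.
missing_inputs <- function(script_dir = ".") {
  files_ok <- file.exists(file.path(script_dir, required_files))
  dirs_ok  <- dir.exists(file.path(script_dir, required_dirs))
  c(required_files[!files_ok], required_dirs[!dirs_ok])
}

# Example call from the folder that holds the scripts:
# missing_inputs()   # character(0) means everything is in place
```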

Result of AR task

The table below presents the results of the Association Recognition task, broken down by category, rule, and sentence type (PC, PG, and P).

Table with the total number of recognized sentence associations for each sentence type
