NLP Profiler

A simple NLP library that allows profiling datasets with one or more text columns.

When given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.

In short: think of it as calling the pandas DataFrame.describe() method or running Pandas Profiling on your dataframe, but for datasets containing text columns rather than the usual columnar datasets.

What do you get from the library?

Pass a Pandas dataframe and the name of its text column as input parameters.
You get back a new dataframe with various features about the parsed text, one row per input row:
- high-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, etc.
- low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.
Descriptive statistics can then be drawn from the numerical data in the resulting dataframe by calling describe() on it.
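As a point of reference for what describe() reports, the sketch below computes the same summary rows in plain Python for a single numeric feature column. It assumes pandas' defaults (sample standard deviation with ddof=1, and linear interpolation between data points for percentiles). The word counts are made-up sample values, and the helper names `describe` and `_quantile` are hypothetical, not part of NLP Profiler or pandas.

```python
import math
from typing import Dict, List


def _quantile(sorted_values: List[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring sorted values."""
    if not sorted_values:
        return math.nan
    position = (len(sorted_values) - 1) * q
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def describe(values: List[float]) -> Dict[str, float]:
    """Summarise one numeric feature column with the rows DataFrame.describe() shows."""
    data = sorted(values)
    count = len(data)
    mean = sum(data) / count if count else math.nan
    # Sample standard deviation (ddof=1), matching the pandas default.
    std = (
        math.sqrt(sum((x - mean) ** 2 for x in data) / (count - 1))
        if count > 1
        else math.nan
    )
    return {
        "count": float(count),
        "mean": mean,
        "std": std,
        "min": data[0] if data else math.nan,
        "25%": _quantile(data, 0.25),
        "50%": _quantile(data, 0.50),
        "75%": _quantile(data, 0.75),
        "max": data[-1] if data else math.nan,
    }


# Made-up per-row word counts, standing in for one granular feature column.
words_per_row = [3, 7, 5, 12, 8]
for stat, value in describe(words_per_row).items():
    print(f"{stat:>5}: {value:.2f}")
```

In practice, calling describe() on the dataframe returned by NLP Profiler produces these statistics for every numeric column at once; the sketch only spells out what each row means.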

See screenshots under the Jupyter section and also under Screenshots for further illustrations.

Under the hood, it uses a number of libraries that are popular in the AI and ML communities, and its functionality can be extended by replacing them or adding other libraries.

A simple notebook has been provided to illustrate the usage of the library.

Note: this is a new endeavour and it may have rough edges, i.e. it is probably NOT capable of doing many things at the moment. Many of these gaps are opportunities we can work on and plug as we go along using it. Please provide constructive feedback to help improve this library. We recently achieved this with scaling to larger datasets.

Requirements

- Python 3.6.x or higher.
- Dependencies listed in requirements.txt.
- High-level profiling (including grammar checks) benefits from:
  - a faster processor
  - more RAM
- (Optional)
  - Jupyter Lab (on your local machine).
  - Google Colab account.
  - Kaggle account.
  - Grammar check functionality:
    - Internet access
    - Java 8 or higher
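The Python and Java requirements above can be checked up front with a small script like the following. It is a hypothetical helper (`parse_java_major`, `java_major_version` and `check_requirements` are not part of NLP Profiler), and it assumes `java -version` prints the usual banner, such as `openjdk version "17.0.1"`, to stderr, as common JDK builds do. It does not check internet access, CPU or RAM.

```python
import re
import shutil
import subprocess
import sys
from typing import Dict, Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier use the legacy "1.8.0_292" scheme; Java 9+ use "11.0.2", "17", etc.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: "1.8" means Java 8
    return major


def java_major_version() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr; PIPE + universal_newlines works on Python 3.6.
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_major(result.stderr)


def check_requirements() -> Dict[str, bool]:
    """Report which of the interpreter/Java requirements are met."""
    java_major = java_major_version()
    return {
        "python_3_6_or_higher": sys.version_info >= (3, 6),
        "java_8_or_higher": java_major is not None and java_major >= 8,
    }


if __name__ == "__main__":
    print(check_requirements())
```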

Getting started

Demo

Watch this short demo of the NLP Profiler library by clicking on the image below, or find the rest of the talk here.

Installation

From PyPI:

pip install nlp_profiler

From the GitHub repo:

pip install git+https://github.com/neomatrix369/nlp_profiler.git@master

From the source (only for development purposes):

git clone https://github.com/neomatrix369/nlp_profiler
cd nlp_profiler

python setup.py install

or

pip install -e .

or

pip install --prefix <install_dir> .

(replace <install_dir> with the target installation prefix)
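After any of the installation methods above, a quick way to confirm that the current interpreter (for example, a restarted Jupyter kernel) can see the package is a check like this sketch. It assumes the distribution is named nlp_profiler, as in the pip command above; `is_importable` and `installed_version` are hypothetical helpers, and the version lookup relies on importlib.metadata, which needs Python 3.8 or later, so on older interpreters it returns None.

```python
import importlib.util
from typing import Optional


def is_importable(module_name: str) -> bool:
    """True if the top-level module can be found on this interpreter's import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if unknown or not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8 has no importlib.metadata
        return None
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("nlp_profiler importable:", is_importable("nlp_profiler"))
    print("nlp_profiler version:", installed_version("nlp_profiler"))
```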

Usage

import nlp_profiler.core as nlpprof

new_text_column_dataset = nlpprof.apply_text_profiling(dataset, 'text_column')

or

from nlp_profiler.core import apply_text_profiling

new_text_column_dataset = apply_text_profiling(dataset, 'text_column')
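To make the shape of that call concrete without depending on pandas, here is a stdlib sketch of the same idea: take a dataset and a text column name, and return new rows with per-row features added. This is not NLP Profiler's implementation. `profile_text_column`, `granular_features` and the `n_*` output names are hypothetical (not the library's API or column names), rows are plain dicts standing in for a dataframe, missing or non-string values are treated as empty text, and emoji detection is a rough code-point-range approximation.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for a dataframe: a list of row dicts.
Row = Dict[str, object]

# Rough emoji detection by code-point ranges (an approximation, not exhaustive).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def granular_features(text: str) -> Dict[str, int]:
    """A few low-level features of the kind listed under low-level/granular."""
    return {
        "n_characters": len(text),
        "n_words": len(text.split()),
        "n_emojis": len(EMOJI_PATTERN.findall(text)),
    }


def profile_text_column(rows: List[Row], column: str) -> List[Row]:
    """Return new rows: each original row plus features of its `column` text."""
    profiled = []
    for row in rows:
        text = row.get(column)
        if not isinstance(text, str):
            text = ""  # assumption: missing/non-string values count as empty text
        # Build a new dict so the input rows are left unchanged.
        profiled.append({**row, **granular_features(text)})
    return profiled


dataset = [
    {"text_column": "I love this library 😀"},
    {"text_column": "Short text"},
]
for row in profile_text_column(dataset, "text_column"):
    print(row)
```

The library itself works on a pandas dataframe and computes many more features (including the high-level ones), so treat this only as an illustration of the input/output shape.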

See the Notebooks section for further illustrations.

Notebooks

After successful installation of the library, RESTART Jupyter kernels or Google Colab runtimes for the changes to take effect.

Jupyter

See Jupyter Notebook

Google Colab

You can open these notebooks directly in Google Colab

Kaggle kernels

Notebook/Kernel | Script | Other related links

Screenshots

Credits and supporters

See CREDITS_AND_SUPPORTERS.md

Changes

See CHANGELOG.md

License

Refer to the licensing (and warranty) policy.

Contributing

Contributions are welcome!

Please have a look at the CONTRIBUTING guidelines.

Please share it with the wider community (and get credited for it)!

Go to the NLP page
