Natural language processing - Changes in Algorithm name in movie datasets #1128

neerja198 · 2023-09-12T18:53:58Z

Related issues: [please specify]

Description:

What are you up to? Fill us in :)

I solemnly swear that:

I ran the hugo server and looked at my changes in the browser with my own eyes
I ran the linter and there were no errors
My code follows the style guidelines of this project
I have performed a self-review of my own code

Samantha-Hampton

Hi Neerja,

In addition to the other comments made, it's also good to include instructions on how we wish learners to set up their project directories.

Look at Aminat's PR here:
Cross validation syllabus updates #1125

If you look at the index file she has updated for this project, you'll see she shows how we expect learners to set up their file structures (and makes it clear they need to include a requirements file, etc.; we try to make it clear that learners need to set up their work to be reproducible).
I hope this makes sense?

Samantha-Hampton · 2023-10-05T11:16:02Z

content/projects/data-science-specific/natural-language-processing/_index.md

+This dataset contains a collection of movie reviews, each labeled with its corresponding sentiment (positive or negative). This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using Logistic Regression or other classification algorithms.
+For more dataset information, please go through the following [link](http://ai.stanford.edu/~amaas/data/sentiment/)


I think the wording of this could be changed slightly

Instead of this:

We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using Logistic Regression or other classification algorithms

This:

The dataset contains 25,000 highly polarised movie reviews for training, and 25,000 for testing. Predict the number of positive and negative reviews using Logistic Regression or other classification algorithms
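
To make the task concrete for learners, here is a minimal sketch of what "predict the number of positive and negative reviews" could look like. It is an illustration only, not the project's reference solution: the six training reviews and three test reviews are made-up stand-ins for the real 25,000/25,000 split, the function names (`tokenize`, `train`, `predict`) are hypothetical, and the logistic regression is hand-rolled with the standard library so that it runs without extra dependencies. In a real submission, a library implementation such as scikit-learn's `LogisticRegression` would be the more usual choice.

```python
import math
from collections import Counter

# Made-up stand-in data (assumption): 1 = positive, 0 = negative.
TRAIN = [
    ("a wonderful and moving film", 1),
    ("brilliant acting and a great story", 1),
    ("great fun, wonderful cast", 1),
    ("a dull and boring film", 0),
    ("terrible acting and a boring story", 0),
    ("awful plot, terrible pacing", 0),
]
TEST = ["wonderful story, great acting", "boring and awful", "a great film"]


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, tokens):
    """Linear score over bag-of-words counts."""
    return bias + sum(weights.get(t, 0.0) * c for t, c in Counter(tokens).items())


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = sigmoid(score(weights, bias, tokens)) - label
            for t, c in Counter(tokens).items():
                weights[t] = weights.get(t, 0.0) - lr * error * c
            bias -= lr * error
    return weights, bias


def predict(weights, bias, text):
    """Return 1 (positive) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(score(weights, bias, tokenize(text))) >= 0.5 else 0


weights, bias = train(TRAIN)
predictions = [predict(weights, bias, t) for t in TEST]
counts = Counter(predictions)
print(f"positive: {counts[1]}, negative: {counts[0]}")
```

The final count is the step the dataset description asks for: classify each test review, then tally the predicted labels.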

Samantha-Hampton · 2023-10-05T11:25:34Z

content/projects/data-science-specific/natural-language-processing/_index.md

+## Dataset 2: The contents of the State of the Nation Address (SONA) for every year dating back to 1990 is available on the [South African Government website](https://www.gov.za/state-nation-address).

+### Description:
+This gives us a great opportunity to look at the priorities and challenges have faced over time, and the focus points for the various presidents over this time.


This doesn't match the format of the first dataset, so rather:

Dataset 2: Each State of the Nation Address (SONA) since 2000 ([South African Government website](https://www.gov.za/state-nation-address))

Description:

The link provided contains the contents of the State of the Nation Address (SONA) for every year dating back to 1990. NLP techniques will be used to understand how the priorities and challenges faced by South Africa's presidents have changed over time since the year 2000.

neerja198 added 7 commits August 31, 2023 18:03

_index.md (ffd275c)
_index.md (e34fd2a)
_index.md (4078f13)
_index.md (5d8b7a4)
Update _index.md (6650bc2)
_index.md (f98db6e)
_index.md (6348e15)

Samantha-Hampton requested changes Oct 5, 2023

Samantha-Hampton closed this Oct 5, 2023
