Natural language processing - Changes in Algorithm name in movie datasets #1128
Hi Neerja,
In addition to the other comments, it would also be good to include instructions on how we want learners to set up their project directories.
Take a look at Aminat's PR here:
Cross validation syllabus updates #1125
If you look at the index file she has updated for this project, you'll see she shows how we expect learners to set up their file structures (and makes it clear they need to include a requirements file, etc.). We try to make it clear that learners need to set up their work to be reproducible.
I hope this makes sense?
This dataset contains a collection of movie reviews, each labeled with its corresponding sentiment (positive or negative). This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using Logistic Regression or other classification algorithms.
For more dataset information, please go through the following [link](http://ai.stanford.edu/~amaas/data/sentiment/)
I think the wording of this could be changed slightly
Instead of this:
We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. So, predict the number of positive and negative reviews using Logistic Regression or other classification algorithms
This:
The dataset contains 25,000 highly polarised movie reviews for training, and 25,000 for testing. Predict the number of positive and negative reviews using Logistic Regression or other classification algorithms
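In case it helps to make the task concrete for learners, here is a minimal sketch of what "predict the number of positive and negative reviews" could look like. It is only an illustration, not the project's expected solution: the six reviews are made up, the names `TRAIN`, `tokenize`, `train` and `predict` are hypothetical, and the logistic regression is hand-rolled over bag-of-words counts using only the standard library. Learners would presumably use a library implementation on the real 25,000-review dataset instead.

```python
import math
from collections import Counter

# Made-up toy reviews standing in for the real dataset
# (1 = positive, 0 = negative). These are NOT taken from the actual data.
TRAIN = [
    ("a wonderful moving film", 1),
    ("great acting and a great story", 1),
    ("loved every minute", 1),
    ("terrible plot and awful acting", 0),
    ("boring and far too long", 0),
    ("awful film , hated it", 0),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression on bag-of-words counts with plain SGD."""
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            counts = Counter(tokenize(text))
            # Linear score, then the sigmoid gives P(review is positive).
            z = bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient of the log loss with respect to z is (p - label).
            error = p - label
            bias -= lr * error
            for w, c in counts.items():
                weights[w] = weights.get(w, 0.0) - lr * error * c
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (positive) if the linear score is non-negative, else 0 (negative)."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())
    return 1 if z >= 0 else 0


if __name__ == "__main__":
    weights, bias = train(TRAIN)
    # Hypothetical unseen reviews, again made up for illustration.
    unseen = ["a great story", "awful and boring", "loved the acting"]
    tally = Counter(predict(text, weights, bias) for text in unseen)
    print("positive:", tally[1], "negative:", tally[0])
```

The final tally is the "number of positive and negative reviews" the description asks for; on the real data the same counting step would be applied to predictions on the 25,000 test reviews.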
## Dataset 2: The contents of the State of the Nation Address (SONA) for every year dating back to 1990 is available on the [South African Government website](https://www.gov.za/state-nation-address).

### Description:
This gives us a great opportunity to look at the priorities and challenges have faced over time, and the focus points for the various presidents over this time.
This doesn't match the format of the first dataset, so rather:
Dataset 2: Each State of the Nation Address (SONA) since 2000 South African Government website
Description:
The link provided contains the contents of the State of the Nation Address (SONA) for every year dating back to 1990. NLP techniques will be used to understand how the priorities and challenges faced by South Africa's presidents have changed over time since the year 2000.