- Manipulate and analyze data using Pandas DataFrame objects
- Read Excel data into a DataFrame
- Use hierarchical indexing to sort and slice data
- Process data according to specifications
- Find values in a dataset
- Operate on data in Pandas
The City of Calgary tracks information on the most popular licensed dog breeds in the city. You are being asked to design a terminal-based application for computing and printing statistics based on a provided breed input. The provided dataset has been modified from the City of Calgary data: https://data.calgary.ca/Demographics/Top-100-Licensed-Dog-Breeds/wi52-6nx6
Your application must meet the following design specifications:
- Your final output should match the given examples (screenshots included in the rubric)
- Stage 1: DataFrame Creation
- Import the provided data into a Pandas DataFrame.
- You may not change or sort the Excel file.
- You may not hard-code/copy-paste any information into your program except for the expected years and column header names.
- You may index the information in any way.
- Stage 2: User Entry
- Prompt the user to enter a dog breed.
- If the name does not exist within the given data, raise a KeyError to print “Dog breed not found in the data. Please try again.”
- Users should be prompted continually until a correct input is provided.
- As long as the entry is spelled correctly, your program should accept entries in uppercase, lowercase, camel case, and mixed case.
- You may assume that the user knows how the breed names are identified in the database.
- After successful data entry and analysis, your program should end.
- Stage 3: Data Analysis
- Find and print all years where the selected breed was listed in the top breeds.
- Calculate and print the total number of registrations of the selected breed found in the dataset.
- Calculate and print the percentage of selected breed registrations out of the total percentage for each year (2021, 2022, 2023).
- Calculate and print the percentage of selected breed registrations out of the total three-year percentage.
- Find and print the months that were most popular for the selected breed registrations. Print all months that tie.
- Your terminal output must match the provided example interface, including the wording (not just printing the values).
- Your code should include and use at least one multi-index Pandas DataFrame, at least one IndexSlice object, at least one masking operation, at least one grouping operation, and at least one built-in Pandas or NumPy computational method.
- Your code must follow the conventions discussed so far in the course (names_with_underscores, ClassNames, four spaces for indentations, spaces between variables/operators, comments throughout, etc.)
- Any user-defined classes, methods, and functions must contain docstring documentation.
- For each class, include a functionality summary and describe any class and/or instance variables (do not include a separate docstring for __init__)
- For each method/function, include a functionality summary and describe parameters and return values (or specify if there are none)
- Main functions do not need a docstring but should be well-commented
- Your code will be run by the TAs as your end user.
- To facilitate grading, please leave the Excel spreadsheet unmodified within your directory. Do not place it within a separate subfolder.
- Please do not provide a path to the Excel file when importing the data. You should simply use
'CalgaryDogBreeds.xlsx'
as it is contained within the same directory. - FAQs about the assignment will be answered on the D2L discussion boards. Please check the boards for any clarifications before submitting.
- The grading rubric will be posted to D2L.
- Make sure to watch video lessons 21 - 28, and review the corresponding labs and Jupyter Notebooks.
- Clone this repository to your local computer.
- Open VSCode and start a new terminal. Make sure that your
ensf592
environment is activated. calgary_dogs.py
is provided as a starting point. Fill in the header with your own information.- Remember to test your program execution via the terminal:
python calgary_dogs.py
- Take a screenshot of your successful program run and upload it to your assignment repository.
- Commit your screenshot and code.
- Push your local git history to GitHub.
- Submit your repository HTTPS link to the Assignment 4 D2L dropbox.
- Tip: If you want to learn more about a specific aspect of a Python object or the functionality of Pandas, remember to take a look at the official documentation!