2024.10.16
When predicting how many orders of which items will be sold in the coming months, should I generalize the goal? For instance, only predict how much will sell overall, rather than how much will sell in each country?
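A minimal sketch of the two granularities being weighed here. It assumes a transactions DataFrame with `InvoiceDate`, `Country` and `Quantity` columns (the same names the plotting code below uses). The sample rows are made up for illustration. The generalized target is one number per day. The per-country target is one column per country, and its rows sum back to the generalized one.

```python
import pandas as pd

# Made-up sample transactions (hypothetical data, for illustration only)
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-01-03 09:00", "2011-01-03 14:30", "2011-01-04 10:15",
        "2011-01-04 11:00", "2011-01-05 16:45",
    ]),
    "Country": ["United Kingdom", "France", "United Kingdom", "Germany", "France"],
    "Quantity": [10, 4, 6, 3, 2],
})

# Generalized target: total quantity per day, ignoring country
daily_total = df.groupby(pd.Grouper(key="InvoiceDate", freq="D"))["Quantity"].sum()

# Detailed target: quantity per day per country (one column per country),
# with 0 where a country had no orders that day
daily_by_country = (
    df.groupby([pd.Grouper(key="InvoiceDate", freq="D"), "Country"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
)

print(daily_total)
print(daily_by_country)
```

The detailed version has many more values to predict, and most of them are small or zero. That is the trade-off the question is about.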
Actually, I think a bigger problem is deciding what the output is supposed to be. Is it a large sheet with the potential orders for the month? A daily prediction? Hourly?
Let's go with daily.
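For a daily prediction, the target would be one value per calendar day, including days with no orders. A minimal sketch, assuming the same `InvoiceDate`/`Quantity` columns and made-up sample rows. `resample('D')` creates a bin for every day in the range, so empty days appear as 0. A plain groupby on the date only returns days that actually have orders.

```python
import pandas as pd

# Hypothetical sample: orders only on 3 Jan, 5 Jan and 8 Jan
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-01-03 09:00", "2011-01-05 12:00",
        "2011-01-05 15:30", "2011-01-08 10:00",
    ]),
    "Quantity": [10, 4, 6, 3],
})

# One row per calendar day; sum() of an empty bin is 0
daily_target = df.set_index("InvoiceDate")["Quantity"].resample("D").sum()

# Grouping on the date only yields the days that appear in the data
observed_only = df.groupby(df["InvoiceDate"].dt.normalize())["Quantity"].sum()

print(daily_target)   # 6 rows, 3 Jan to 8 Jan
print(observed_only)  # 3 rows
```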
I am trying to make a chart that shows how orders fluctuate over time, but it's quite difficult because so many of the countries' orders show up as zero.
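One way to check how sparse each country's series is would be to count the share of weeks in which a country ordered nothing. A minimal sketch under that assumption, using the same column names and hypothetical sample rows. Countries with a high share are the ones whose lines sit at zero on the chart.

```python
import pandas as pd

# Hypothetical sample: the UK orders every week, Iceland only in two of them
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-01-03", "2011-01-10", "2011-01-17", "2011-01-24",
        "2011-01-04", "2011-01-19",
    ]),
    "Country": ["United Kingdom"] * 4 + ["Iceland", "Iceland"],
    "Quantity": [50, 40, 60, 55, 5, 7],
})

# Weekly quantity per country: rows = weeks (ending Sunday), columns = countries.
# unstack fills missing country/week pairs with 0; asfreq adds back any week
# in which no country ordered anything.
weekly = (
    df.groupby([pd.Grouper(key="InvoiceDate", freq="W"), "Country"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
    .asfreq("W", fill_value=0)
)

# Fraction of weeks with zero quantity for each country, sparsest first
zero_share = (weekly == 0).mean().sort_values(ascending=False)
print(zero_share)
```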
Produced a graph of the total quantity ordered each week for the top ten countries:
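import pandas as pd
import matplotlib.pyplot as plt

# df is assumed to be indexed by 'InvoiceDate' (a DatetimeIndex): pd.Grouper without a key groups on the index
# Resample the data by week ('W' = weeks ending on Sunday) and group by both the week and 'Country'
# Then sum the 'Quantity' for each week and country
grouped_df = df.groupby([pd.Grouper(freq='W'), 'Country'])['Quantity'].sum().reset_index()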
# Calculate the total quantity for each country
total_quantity_per_country = grouped_df.groupby('Country')['Quantity'].sum()
# Sort the countries by total quantity and get the top 10
top_10_countries = total_quantity_per_country.nlargest(10).index
# Filter the grouped DataFrame to only include the top 10 countries
filtered_df = grouped_df[grouped_df['Country'].isin(top_10_countries)]
# Exclude the first week: find the earliest week in the data
# and keep only the rows that come after it
first_week = filtered_df['InvoiceDate'].min()
filtered_df = filtered_df[filtered_df['InvoiceDate'] > first_week]
# Pivot the data to have countries as separate columns, indexed by the week
pivot_df = filtered_df.pivot(index='InvoiceDate', columns='Country', values='Quantity')
# Fill missing weeks/countries with 0 if needed
pivot_df = pivot_df.fillna(0)
# Plotting with larger figure size
plt.figure(figsize=(15, 8)) # Increase the figure size
# Plotting the pivot DataFrame
pivot_df.plot(kind='line', marker='o', ax=plt.gca())
# Customize the plot
plt.title('Total Quantity Ordered by Top 10 Countries Over Time')
plt.xlabel('Date')
plt.ylabel('Quantity Ordered')
plt.grid(True)
plt.legend(title='Country')
# Adjust x-axis ticks and rotate labels
plt.gca().xaxis.set_major_locator(plt.MaxNLocator(nbins=10)) # Reduce the number of x-axis ticks
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels
# Show the plot
plt.tight_layout()
plt.show()
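The snippet above relies on `df` being indexed by `InvoiceDate`. If `InvoiceDate` is instead a regular column, `pd.Grouper(key='InvoiceDate', freq='W')` does the same weekly grouping. A minimal sketch that wraps the same steps in a reusable function follows. The function name `weekly_top_countries` and the sample rows are hypothetical.

```python
import pandas as pd


def weekly_top_countries(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Weekly quantity per country for the n countries with the largest total.

    Hypothetical helper: same steps as the journal snippet, but expects
    'InvoiceDate' as a regular column rather than as the index.
    """
    weekly = (
        df.groupby([pd.Grouper(key="InvoiceDate", freq="W"), "Country"])["Quantity"]
        .sum()
        .reset_index()
    )
    # Rank countries by total quantity over the whole period, keep the top n
    top = weekly.groupby("Country")["Quantity"].sum().nlargest(n).index
    weekly = weekly[weekly["Country"].isin(top)]
    # Drop the earliest week, as the journal code does
    weekly = weekly[weekly["InvoiceDate"] > weekly["InvoiceDate"].min()]
    # Weeks as rows, countries as columns; missing week/country pairs become 0
    return weekly.pivot(index="InvoiceDate", columns="Country", values="Quantity").fillna(0)


# Usage with made-up data (top 2 countries)
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2011-01-03", "2011-01-10", "2011-01-11", "2011-01-17", "2011-01-18"]
    ),
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom", "Spain"],
    "Quantity": [20, 30, 5, 25, 1],
})
result = weekly_top_countries(sample, n=2)
print(result)
```

The returned frame has the same shape as `pivot_df`, so it can be passed to `.plot(...)` in the same way.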