region sizes #11
https://www.gstatic.com/ipranges/cloud.json

from collections import defaultdict
import requests

prefixes = requests.get('https://www.gstatic.com/ipranges/cloud.json').json()['prefixes']
regions = defaultdict(lambda: 0)
sum = 0
for prefix in prefixes:
    if 'ipv4Prefix' not in prefix:  # entries without an ipv4Prefix are IPv6-only; skip them
        continue
    mask = prefix['ipv4Prefix'].split('/')[1]
    regions[prefix['scope']] += 2**(32-int(mask))
    sum += 2**(32-int(mask))
for region in regions:
    print(region + ": " + str(round(regions[region] / sum, 2)))
print('total:', sum//1000000, 'million')
https://www.microsoft.com/en-us/download/details.aspx?id=41653

from collections import defaultdict
from xml.etree import ElementTree
import requests

regions = defaultdict(lambda: 0)
sum = 0
for region in ElementTree.fromstring(requests.get('https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20200824.xml').text):
    for cidr in region:
        mask = cidr.attrib['Subnet'].split('/')[1]
        regions[region.attrib['Name']] += 2**(32-int(mask))
        sum += 2**(32-int(mask))
for region in regions:
    print(region + ": " + str(round(regions[region] / sum, 2)))
print('total:', sum//1000000, 'million')
I should consider switching the Azure source: https://twitter.com/0xdabbad00/status/1275821557785309184 https://www.microsoft.com/en-us/download/details.aspx?id=56519

from collections import defaultdict
import requests

prefixes = requests.get('https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json').json()['values']
regions = defaultdict(lambda: 0)
sum = 0
for prefixList in prefixes:
    for prefix in prefixList['properties']['addressPrefixes']:
        if ':' in prefix:  # skip IPv6 ranges
            continue
        mask = prefix.split('/')[1]
        try:
            # only count "Service.Region" tags; service-only tags have no region after the dot
            regions[prefixList['name'].split('.')[1]] += 2**(32-int(mask))
            sum += 2**(32-int(mask))
        except IndexError:
            pass
        # sum += 2**(32-int(mask))
for region in regions:
    print(region + ": " + str(round(regions[region] / sum, 2)))
print('total:', sum//1000000, 'million')
@0xdabbad00 most CIDRs seem to be listed multiple times in that newer Azure IP range file: once by service and then again by service + region. Confirmed there are no duplicates in GCP's data, found some duplicates in AWS's data that need to be handled too, and there are still some duplicate CIDR ranges in Azure's data even after only counting the service + region entries.
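As a rough illustration of that first-pass fix (my sketch, not code from the thread), one could collect each IPv4 CIDR string at most once per region by only counting the "Service.Region" tags and de-duplicating with sets before summing. This only removes exact duplicates; partially overlapping ranges still slip through, which is what the netaddr approach below addresses.

from collections import defaultdict
import requests

values = requests.get('https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json').json()['values']
seen = defaultdict(set)                      # region -> set of unique IPv4 CIDR strings
for tag in values:
    parts = tag['name'].split('.')
    if len(parts) < 2:                       # skip service-only tags; they repeat the regional CIDRs
        continue
    for prefix in tag['properties']['addressPrefixes']:
        if ':' in prefix:                    # skip IPv6 ranges
            continue
        seen[parts[1]].add(prefix)

regions = {region: sum(2**(32 - int(cidr.split('/')[1])) for cidr in cidrs)
           for region, cidrs in seen.items()}
total = sum(regions.values())
for region in regions:
    print(region + ": " + str(round(regions[region] / total, 2)))
print('total:', total // 1000000, 'million')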
Yep, there's a lot of overlap in AWS's ip-ranges, and you need to account for it. For instance, both of these entries exist in ip-ranges.json, but they're clearly for the same exact IP addresses.
There is also some overlap that's not quite as obvious; one larger example is:
All of the ranges in that list are either wholly or partly included in the first range in the list. To account for this, the code in my stuff uses netaddr:

from collections import defaultdict
import requests
from netaddr import IPSet, IPNetwork

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']

# Just output a few random demo regions
demo_regions = ["us-west-2", "us-east-1", "ap-southeast-1"]

def patmyron_method(prefixes):
    regions = defaultdict(lambda: 0)
    sum = 0
    for prefix in prefixes:
        mask = prefix['ip_prefix'].split('/')[1]
        regions[prefix['region']] += 2**(32-int(mask))
        sum += 2**(32-int(mask))
    for region in demo_regions:
        print(region + ": " + str(round(regions[region] / sum, 2)))
    print('total:', sum//1000000, 'million')

def seligman_method(prefixes):
    regions = defaultdict(list)
    for prefix in prefixes:
        cur_network = IPNetwork(prefix['ip_prefix'])
        regions[prefix['region']].append(cur_network)
        regions["_all_"].append(cur_network)
    all_ips_set = IPSet(regions["_all_"])  # IPSet merges duplicate and overlapping ranges
    for region in demo_regions:
        region_set = IPSet(regions[region])
        print(f"{region}: {len(region_set) / len(all_ips_set) : 0.2f}")
    print(f'total: {len(all_ips_set)//1000000} million')

for x in ["patmyron_method", "seligman_method"]:
    print(f"{'-'*10} {x} {'-'*50}")
    globals()[x](prefixes)

which outputs:
Care must be taken when dealing with netaddr, since it can quickly turn into an O(N^2) problem if you're not careful, doubly so with IPv6 addresses, but it's manageable if you prepare the lists so you only go through them once. I've gotten in the habit of doing this sort of logic for all of the cloud providers, though I think it's really only important for AWS. Truth be told, I'm not sure; it's safer to assume they're all a mess.
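A small sketch of that pattern as I read it (mine, not code from the comment): collect plain networks into lists and build each IPSet in one call, rather than unioning IPSet objects inside the loop, since every union re-normalizes the accumulated set. The exact cost depends on netaddr internals, so treat the "slow" variant as illustrative.

import requests
from netaddr import IPNetwork, IPSet

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']

# Pattern to avoid: growing an IPSet by repeated unions inside the loop
slow = IPSet()
for p in prefixes:
    slow |= IPSet([p['ip_prefix']])          # each union rebuilds and compacts the whole set

# Pattern used above: gather first, construct the IPSet once
fast = IPSet(IPNetwork(p['ip_prefix']) for p in prefixes)

assert len(slow) == len(fast)                # same merged address space either way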
Glad the relative region percentages still look similar; that's the main data I was investigating, and I was hoping duplicates were roughly even between regions until I handled them. Wild.
Yep, I looked at how much it impacts the final charts:
Most of the regions are below 0.1% off, with a few outliers around 0.3% to 0.5%. Certainly good enough to convey the sizes.
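Those per-region deltas can also be computed directly; here is a quick sketch (mine, not from the thread) that compares the raw byte-count share against the IPSet-deduplicated share for every region. Cross-region overlap is ignored here, so the numbers are approximate.

from collections import defaultdict
import requests
from netaddr import IPNetwork, IPSet

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']

raw = defaultdict(int)                       # region -> address count, duplicates included
nets = defaultdict(list)                     # region -> list of IPNetwork objects
for p in prefixes:
    mask = int(p['ip_prefix'].split('/')[1])
    raw[p['region']] += 2**(32 - mask)
    nets[p['region']].append(IPNetwork(p['ip_prefix']))

raw_total = sum(raw.values())
dedup = {region: len(IPSet(networks)) for region, networks in nets.items()}
dedup_total = sum(dedup.values())            # ignores overlap across regions

for region in sorted(raw):
    delta = raw[region] / raw_total - dedup[region] / dedup_total
    print(f"{region}: {delta:+.3%}")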
Just wanted to leave a quick thank you here (and here). Your repo has given me the idea to also calculate the IP addresses for Google regions. You can see the result here: https://gcloud-compute.com/regions.html
@Cyclenerd GoogleCloudPlatform/region-picker#10 would be another great add, but GCP is the one provider I've never been able to automate scraping that information for:
https://ip-ranges.amazonaws.com/ip-ranges.json
https://old.reddit.com/r/aws/comments/j3luvy/can_anyone_tell_me_or_send_me_documentation_on/g7dl4ip/