region sizes #11

Open
PatMyron opened this issue Jun 30, 2021 · 10 comments
@PatMyron
Owner

PatMyron commented Jun 30, 2021

https://ip-ranges.amazonaws.com/ip-ranges.json

https://old.reddit.com/r/aws/comments/j3luvy/can_anyone_tell_me_or_send_me_documentation_on/g7dl4ip/

from collections import defaultdict
import requests

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']
regions = defaultdict(int)  # region -> number of IPv4 addresses
total = 0  # named total to avoid shadowing the built-in sum()
for prefix in prefixes:
  mask = prefix['ip_prefix'].split('/')[1]
  # a /mask IPv4 prefix contains 2**(32 - mask) addresses
  regions[prefix['region']] += 2**(32-int(mask))
  total += 2**(32-int(mask))
for region in regions:
  print(region + ": " + str(round(regions[region] / total, 2)))
print('total:', total//1000000, 'million')
us-east-1: 28%
us-west-2: 15%
eu-west-1: 9%
us-east-2: 7%
ap-northeast-1: 6%
eu-central-1: 6%
GLOBAL: 4%

total: 124 million
@PatMyron
Owner Author

PatMyron commented Sep 20, 2021

https://www.gstatic.com/ipranges/cloud.json

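from collections import defaultdict
import requests

prefixes = requests.get('https://www.gstatic.com/ipranges/cloud.json').json()['prefixes']
regions = defaultdict(int)  # scope (region) -> number of IPv4 addresses
total = 0  # named total to avoid shadowing the built-in sum()
for prefix in prefixes:
  # skip entries without 'ipv4Prefix' (IPv6 ranges) instead of
  # silently reusing the mask left over from the previous entry
  if 'ipv4Prefix' not in prefix:
    continue
  mask = prefix['ipv4Prefix'].split('/')[1]
  regions[prefix['scope']] += 2**(32-int(mask))
  total += 2**(32-int(mask))
for region in regions:
  print(region + ": " + str(round(regions[region] / total, 2)))
print('total:', total//1000000, 'million')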
us-central1: 25%
us-east1: 10%
europe-west1: 9%
asia-east1: 5%
us-west1: 5%
asia-northeast1: 4%
us-east4: 4%
global: 5%

total: 9 million

@PatMyron
Owner Author

PatMyron commented Sep 21, 2021

https://www.microsoft.com/en-us/download/details.aspx?id=41653

https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20200824.xml

from collections import defaultdict
from xml.etree import ElementTree
import requests

url = 'https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20200824.xml'
regions = defaultdict(int)  # region name -> number of IPv4 addresses
total = 0  # named total to avoid shadowing the built-in sum()
# each child of the root is a region; each child of a region is one CIDR
for region in ElementTree.fromstring(requests.get(url).text):
  for cidr in region:
    mask = cidr.attrib['Subnet'].split('/')[1]
    regions[region.attrib['Name']] += 2**(32-int(mask))
    total += 2**(32-int(mask))
for region in regions:
  print(region + ": " + str(round(regions[region] / total, 2)))
print('total:', total//1000000, 'million')
useast: 13%
europewest: 11%
uswest: 10%
useast2: 7%
europenorth: 7%
uscentral: 6%
ussouth: 6%
asiasoutheast: 4%
uswest2: 4%

total: 16 million

PatMyron added a commit that referenced this issue May 28, 2022
@PatMyron
Owner Author

PatMyron commented May 30, 2022

I should consider switching the Azure source:

https://twitter.com/0xdabbad00/status/1275821557785309184

https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json

https://www.microsoft.com/en-us/download/details.aspx?id=56519


from collections import defaultdict
import requests

prefixes = requests.get('https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json').json()['values']
regions = defaultdict(int)  # region -> number of IPv4 addresses
total = 0  # named total to avoid shadowing the built-in sum()
for prefixList in prefixes:
  # only "service.region" names carry a region; skip service-only entries
  nameParts = prefixList['name'].split('.')
  if len(nameParts) < 2:
    continue
  for prefix in prefixList['properties']['addressPrefixes']:
    if ':' in prefix:
      continue  # skip IPv6 prefixes; the size formula below is IPv4-only
    mask = prefix.split('/')[1]
    regions[nameParts[1]] += 2**(32-int(mask))
    total += 2**(32-int(mask))
for region in regions:
  print(region + ": " + str(round(regions[region] / total, 2)))
print('total:', total//1000000, 'million')

@PatMyron
Owner Author

PatMyron commented May 30, 2022

@0xdabbad00 most CIDRs seem to be listed multiple times in that newer Azure IP range file: once by service, and then again by service + region

Confirmed there are no duplicates in GCP's data. Found some duplicates in AWS' data that need to be handled too. There are still some duplicates in Azure's CIDR ranges even after counting only the service + region data
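
A rough sketch of how exact duplicates like these could be counted. It is not the code actually used for the check; the `duplicated_cidrs` helper and the sample tags are made up for illustration, and the prefixes are reserved documentation ranges rather than real Azure ones.

```python
from collections import Counter

def duplicated_cidrs(cidrs):
    """Return the CIDR strings that appear more than once, with their counts."""
    counts = Counter(cidrs)
    return {cidr: n for cidr, n in counts.items() if n > 1}

# Hypothetical (tag name, prefix) pairs shaped like the Azure service tag file:
# the same prefix appears under a service tag and under service + region tags
sample = [
    ("Storage", "192.0.2.0/24"),
    ("Storage.WestEurope", "192.0.2.0/24"),
    ("AzureCloud.westeurope", "192.0.2.0/24"),
    ("AzureCloud.westeurope", "198.51.100.0/24"),
]

all_dupes = duplicated_cidrs(cidr for _, cidr in sample)
# Keeping only "service.region" tags drops one copy, but the prefix is still listed twice
regional_dupes = duplicated_cidrs(cidr for name, cidr in sample if "." in name)
print(all_dupes, regional_dupes)
```

This only catches identical CIDR strings. A smaller range nested inside a larger one would not show up here.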

@PatMyron PatMyron reopened this May 30, 2022
@PatMyron PatMyron self-assigned this May 30, 2022
@PatMyron
Owner Author

PatMyron commented May 31, 2022

I'm assuming @seligman dealt with duplicates, since his repos arrive at similar GCP/Azure totals now that I've handled most Azure duplicates. But @seligman has just over half as many AWS IP addresses as my total with AWS duplicates unhandled

@seligman

Yep, there's a lot of overlap in AWS's ip-ranges.json, and you need to account for it. For instance, both of these entries exist in ip-ranges.json, but they clearly cover the exact same IP addresses.

{"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "AMAZON", "network_border_group": "eu-central-1"}
{"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "S3", "network_border_group": "eu-central-1"}

There is also some overlap that's not quite as obvious. One larger example is:

{"ip_prefix": "35.180.0.0/16", "region": "eu-west-3", "service": "AMAZON", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.0.0/16", "region": "eu-west-3", "service": "EC2", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.16/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.24/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.32/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.40/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.48/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.56/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.8/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.128/27", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.160/27", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.80/29", "region": "eu-west-3", "service": "EC2_INSTANCE_CONNECT", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.244.0/23", "region": "eu-west-3", "service": "AMAZON", "network_border_group": "eu-west-3"}

All of the ranges in that list are either whole or in part included in the first range in the list.
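
As a small illustration of that overlap, here is a sketch using Python's standard `ipaddress` module on a few of the entries quoted above. It is not the code from either repo, and the `entries` list and variable names are only there for the example.

```python
import ipaddress

# A few of the prefixes quoted above (service shown as a comment)
entries = [
    "35.180.0.0/16",      # AMAZON
    "35.180.0.0/16",      # EC2, exact duplicate of the first entry
    "35.180.1.16/29",     # ROUTE53_RESOLVER
    "35.180.112.128/27",  # ROUTE53_RESOLVER
    "35.180.244.0/23",    # AMAZON
]
networks = [ipaddress.ip_network(cidr) for cidr in entries]

# Naive count: every entry contributes its full size, so overlaps are counted repeatedly
naive_total = sum(net.num_addresses for net in networks)

# Check that every entry sits inside the first /16
all_covered = all(net.subnet_of(networks[0]) for net in networks)

# collapse_addresses() merges duplicate and nested networks into a minimal list
collapsed = list(ipaddress.collapse_addresses(networks))
unique_total = sum(net.num_addresses for net in collapsed)

print(naive_total, unique_total, all_covered)
```

For these five entries the naive sum is roughly double the number of distinct addresses, since everything collapses into the single /16.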

To account for this, the code in my stuff uses netaddr:

from collections import defaultdict
import requests
from netaddr import IPSet, IPNetwork

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']
# Just output a few random demo regions
demo_regions = ["us-west-2", "us-east-1", "ap-southeast-1"]

def patmyron_method(prefixes):
    regions = defaultdict(lambda: 0)
    sum = 0
    for prefix in prefixes:
        mask = prefix['ip_prefix'].split('/')[1]
        regions[prefix['region']] += 2**(32-int(mask))
        sum += 2**(32-int(mask))
    for region in demo_regions:
        print(region + ": " + str(round(regions[region] / sum, 2)))
    print('total:', sum//1000000, 'million')

def seligman_method(prefixes):
    regions = defaultdict(list)
    for prefix in prefixes:
        cur_network = IPNetwork(prefix['ip_prefix'])
        regions[prefix['region']].append(cur_network)
        regions["_all_"].append(cur_network)
    all_ips_set = IPSet(regions["_all_"])
    for region in demo_regions:
        region_set = IPSet(regions[region])
        print(f"{region}: {len(region_set) / len(all_ips_set) : 0.2f}")
    print(f'total: {len(all_ips_set)//1000000} million')

for x in ["patmyron_method", "seligman_method"]:
    print(f"{'-'*10} {x} {'-'*50}")
    globals()[x](prefixes)

which outputs:

---------- patmyron_method --------------------------------------------------
us-west-2: 0.14
us-east-1: 0.26
ap-southeast-1: 0.03
total: 127 million
---------- seligman_method --------------------------------------------------
us-west-2:  0.15
us-east-1:  0.26
ap-southeast-1:  0.03
total: 66 million

Care must be taken when dealing with netaddr, since it can quickly turn into an O(N^2) problem if you're not careful, doubly so with IPv6 addresses. But if you prepare lists so you only go through IPSet() work a handful of times, it shouldn't be too bad.
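
The same batching idea can be sketched with the standard library, with `ipaddress.collapse_addresses` swapped in for netaddr's `IPSet` so the example has no third-party dependency. The `region_sizes` helper is a made-up name and this is only an illustration under those assumptions: collect plain lists first, then merge once per region.

```python
import ipaddress
from collections import defaultdict

def region_sizes(prefixes):
    """Return ({region: unique IPv4 address count}, overall unique count)."""
    by_region = defaultdict(list)
    everything = []
    # First pass: only append to plain lists, which is cheap
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix["ip_prefix"])
        by_region[prefix["region"]].append(net)
        everything.append(net)
    # Second pass: one merge per region, plus one merge for the whole data set
    sizes = {
        region: sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
        for region, nets in by_region.items()
    }
    total = sum(n.num_addresses for n in ipaddress.collapse_addresses(everything))
    return sizes, total

# Entries taken from the examples earlier in this comment
sample = [
    {"ip_prefix": "52.219.170.0/23", "region": "eu-central-1"},
    {"ip_prefix": "52.219.170.0/23", "region": "eu-central-1"},
    {"ip_prefix": "35.180.0.0/16", "region": "eu-west-3"},
    {"ip_prefix": "35.180.244.0/23", "region": "eu-west-3"},
]
sizes, total = region_sizes(sample)
print(sizes, total)
```

The number of merge calls grows with the number of regions rather than the number of prefixes, which is the point of preparing the lists up front.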

I've gotten in the habit of doing this sort of logic for all of the cloud providers, though I think it's really only important for AWS. Truth be told, I'm not sure, so it's safer to assume they're all a mess.

@PatMyron
Owner Author

PatMyron commented Jun 1, 2022

Glad the relative region percentages still look similar. That's the main data I was investigating, and I was hoping duplicates were roughly even between regions until I handled them

Wild that us-east-1 alone falls between GCP and Azure in number of IP addresses

@seligman

seligman commented Jun 1, 2022

Yep, I looked at how much it impacts the final charts:

Change:  0.570%: 25.792% -> 26.362%: us-east-1
Change:  0.568%:  6.449% ->  5.881%: eu-central-1
Change:  0.383%:  5.049% ->  5.432%: GLOBAL
Change:  0.296%:  3.000% ->  2.704%: ap-south-1
Change:  0.243%: 14.640% -> 14.397%: us-west-2
Change:  0.218%:  8.500% ->  8.718%: eu-west-1
Change:  0.152%:  1.404% ->  1.252%: eu-west-3
Change:  0.147%:  6.760% ->  6.613%: us-east-2
Change:  0.143%:  1.603% ->  1.460%: ca-central-1
Change:  0.129%:  3.233% ->  3.361%: ap-southeast-1

Most of the regions are below 0.1% off, with a few outliers around .3 to .5%. Certainly good enough to convey the sizes.

( Same basic code tweaked to show changes )
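
This is not the linked code, just a minimal sketch of how a comparison like the table above could be produced from two sets of percentages. The `share_changes` helper and the `before`/`after` names are assumptions for the example; the numbers are the left and right columns of the first three rows.

```python
# Shares in percent, copied from the first three rows of the table above
before = {"us-east-1": 25.792, "eu-central-1": 6.449, "GLOBAL": 5.049}
after = {"us-east-1": 26.362, "eu-central-1": 5.881, "GLOBAL": 5.432}

def share_changes(before, after):
    """Return (region, absolute change) pairs, largest change first."""
    changes = [
        (region, round(abs(after[region] - before[region]), 3))
        for region in before
    ]
    return sorted(changes, key=lambda item: item[1], reverse=True)

for region, change in share_changes(before, after):
    print(f"Change: {change:6.3f}%: {before[region]:6.3f}% -> {after[region]:6.3f}%: {region}")
```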

@Cyclenerd

Just wanted to leave a quick thank you here (and here). Your repo has given me the idea to also calculate the IP addresses for Google regions. You can see the result here: https://gcloud-compute.com/regions.html

@PatMyron
Owner Author

PatMyron commented Jul 21, 2022

@Cyclenerd GoogleCloudPlatform/region-picker#10 would be another great addition, but GCP is the one provider I've never been able to automate scraping that information for:
https://github.com/PatMyron/cloud#product--feature-regional-availability
