- Open data catalogs from various governments and NGOs:
    - NYC Open Data
    - DC Open Data Catalog / OpenDataDC
    - DataLA
    - data.gov (see also: Project Open Data Dashboard)
    - data.gov.uk
    - US Census Bureau
    - World Bank Open Data
    - Humanitarian Data Exchange
    - Sunlight Foundation: government-focused data
    - ProPublica Data Store
- Datasets hosted by academic institutions:
    - UC Irvine Machine Learning Repository: datasets specifically designed for machine learning
    - Stanford Large Network Dataset Collection: graph data
    - Inter-university Consortium for Political and Social Research
    - Pittsburgh Science of Learning Center's DataShop
    - Academic Torrents: distributed network for sharing large research datasets
    - Dataverse Project: searchable archive of research data
- Datasets hosted by private companies:
    - Quandl: over 10 million financial, economic, and social datasets
    - Amazon Web Services Public Data Sets
    - Kaggle: datasets provided with its competitions; each competition sets its own rules on whether the data can be used outside that competition
- Big lists of datasets:
    - Awesome Public Datasets: well-organized and frequently updated
    - Rdatasets: collection of 700+ datasets originally distributed with R packages
    - RDataMining.com
    - KDnuggets
    - inside-R
    - 100+ Interesting Data Sets for Statistics
    - 20 Free Big Data Sources
    - Sebastian Raschka: datasets categorized by format and topic
- APIs:
    - Apigee: explore dozens of popular APIs
    - Mashape: explore hundreds of APIs
    - Python APIs: Python wrappers for many APIs
- Other interesting datasets:
    - FiveThirtyEight: data and code related to their articles
    - The Upshot: data related to their articles
    - Yelp Dataset Challenge: Yelp reviews, business attributes, users, and more from 10 cities
    - Donors Choose: data related to their projects
    - 200,000+ Jeopardy questions
    - CrowdFlower: interesting datasets created or enhanced by their contributors
    - UFO reports: geolocated and time-standardized UFO reports for close to a century
    - Reddit Top 2.5 Million: all-time top 1,000 posts from each of the top 2,500 subreddits
- Other resources:
    - Datasets subreddit: ask for help finding a specific dataset, or post your own
    - Center for Data Innovation: blog posts about interesting, recently released datasets
    - Awesome Public Datasets: a GitHub repo with links to many interesting datasets