Merge pull request #479 from nachoalonsoportillo/patch-1
Update README.md
leestott authored Nov 2, 2023
2 parents 12535d2 + 6a3c941 commit d92c02b
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions 1-Introduction/01-defining-data-science/README.md
@@ -104,9 +104,9 @@ The first step is to collect the data. While in many cases it can be a straight
</dd>
<dt>2) Data Storage</dt>
<dd>
-Storing data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would to query the data in the future. There are several ways data can be stored:
+Storing data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would like to query the data in the future. There are several ways data can be stored:
<ul>
-<li>A relational database stores a collection of tables, and uses a special language called SQL to query them. Typically, tables are organized into different groups called schemas. In many cases we need to convert the data from original form to fit the schema.</li>
+<li>A relational database stores a collection of tables, and uses a special language called SQL to query them. Typically, tables are organized into different groups called schemas. In many cases we need to convert the data from original form to fit the schema.</li>
<li><a href="https://en.wikipedia.org/wiki/NoSQL">A NoSQL</a> database, such as <a href="https://azure.microsoft.com/services/cosmos-db/?WT.mc_id=academic-77958-bethanycheum">CosmosDB</a>, does not enforce schemas on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL databases do not have the rich querying capabilities of SQL, and cannot enforce referential integrity, i.e. rules on how the data is structured in tables and governing the relationships between tables.</li>
<li><a href="https://en.wikipedia.org/wiki/Data_lake">Data Lake</a> storage is used for large collections of data in raw, unstructured form. Data lakes are often used with big data, where all data cannot fit on one machine, and has to be stored and processed by a cluster of servers. <a href="https://en.wikipedia.org/wiki/Apache_Parquet">Parquet</a> is the data format that is often used in conjunction with big data.</li>
</ul>
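
The contrast between the first two storage options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `students` records, the `student` and `enrollment` table names, and the variable names are invented here and do not come from the lesson, and SQLite plus plain JSON strings stand in for a real relational server and a real NoSQL service such as CosmosDB.

```python
import json
import sqlite3

# Hypothetical sample records, invented purely for illustration.
students = [
    {"id": 1, "name": "Ada", "courses": ["Math", "Physics"]},
    {"id": 2, "name": "Linus", "courses": ["Math"]},
]

# Relational storage: the nested "courses" list has to be flattened into
# rows so that the data fits a fixed two-table layout.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE enrollment ("
    "student_id INTEGER NOT NULL REFERENCES student(id), "
    "course TEXT NOT NULL)"
)
for s in students:
    conn.execute("INSERT INTO student VALUES (?, ?)", (s["id"], s["name"]))
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?)",
        [(s["id"], course) for course in s["courses"]],
    )

# SQL lets us ask a question that spans both tables with a join.
math_students = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM student s "
        "JOIN enrollment e ON e.student_id = s.id "
        "WHERE e.course = ? ORDER BY s.name",
        ("Math",),
    )
]

# Referential integrity: an enrollment that points at a missing student is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (?, ?)", (99, "Math"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Document-style storage: each record is kept whole as a hierarchical JSON
# document, and the filtering happens in application code instead of SQL.
documents = [json.dumps(s) for s in students]
math_from_docs = sorted(
    doc["name"] for doc in map(json.loads, documents) if "Math" in doc["courses"]
)

print(math_students, orphan_rejected, math_from_docs)
```

Both approaches find the same students here; the difference is where the structure lives. The relational version fixes it up front and lets the database guard it, while the document version keeps the original shape and leaves consistency to the application.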