diff --git a/category/r/index.xml b/category/r/index.xml
index 5a262fd1..a6c80058 100644
--- a/category/r/index.xml
+++ b/category/r/index.xml
@@ -242,6 +242,21 @@ install_github(&quot;rkward-community/rk.Teaching&quot;)
 <iframe src="https://www.youtube.com/embed/SB1oER6HbEs" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe>
 </div>
+</li>
+</ol>
+<h4 id="instalación-en-macos-mediante-wmware-fusion">Instalación en MacOs mediante WMware Fusion</h4>
+<p>Si el procedimiento anterior no funciona, es posible instalar una máquina virtual con RKward ya instalado mediante el software WMware Fusion. Para ello deben seguirse los siguientes pasos:</p>
+<ol>
+<li>
+<p><strong>Instalar WMware Fusion</strong>. WMware Fusion es un software de virtualización que permite instalar sistemas operativos Windows o Linux en un Mac.
+<a href="https://www.techspot.com/downloads/2755-vmware-fusion-mac.html" target="_blank" rel="noopener">Descargar WMware Fusion</a> y seguir las instrucciones de instalación.</p>
+</li>
+<li>
+<p>Descargar la máquina virtual con RKWard ya instalado.
+<a href="https://ceu365-my.sharepoint.com/:u:/g/personal/asalber_ceu_es/EcK0P2-es1pOl-3HtRYHn4sBswcpiil5q6QwVp01i0o0yA?e=fWtpe0" target="_blank" rel="noopener">Descargar máquina virtual</a></p>
+</li>
+<li>
+<p>Arrancar WMware Fusion y abrir la máquina virtual descargada. La máquina virtual arrancará con RKWard ya instalado y listo para su uso.</p>
 </li>
 </ol>
 <h3 id="instalación-en-linux">Instalación en Linux</h3>
diff --git a/en/author/alfredo-sanchez-alberca/index.html b/en/author/alfredo-sanchez-alberca/index.html
index 3af9ba36..8aefdf39 100644
--- a/en/author/alfredo-sanchez-alberca/index.html
+++ b/en/author/alfredo-sanchez-alberca/index.html
@@ -278,11 +278,11 @@
-
+
-
+
@@ -299,7 +299,7 @@
- Alfredo Sánchez Alberca | Aprende con Alf
+ Alfredo Sánchez-Alberca | Aprende con Alf
@@ -661,186 +661,89 @@

Search

+
+

Alfredo Sánchez-Alberca

+
+
- - - - - - - - - - - - - - - - - - - -
-
- -
- - - - -

Father and environmental activist, I work as a teacher of Mathematics and Statistics at the Applied Maths and Statistics department of the CEU San Pablo University. I do my research in Data Science, including Biostatistics and Machine Learning. I master the programming languages R, Python and LaTeX.

- - -
- - -
-

Interests

-
    - -
  • Statistics
  • - -
  • Applied Maths
  • - -
  • Machine Learning
  • - -
  • Artificial Intelligence
  • - -
  • Data Science
  • - -
-
- - - -
-

Education

-
    - -
  • - -
    -

    PhD in Artificial Intelligence, 2016

    -

    Technical University of Madrid

    -
    -
  • - -
  • - -
    -

    Degree in Mathematics (Computational Sciences), 1993

    -

    Complutense University of Madrid

    -
    -
  • - -
-
- - -
-
-
- - - - -
diff --git a/en/author/alfredo-sanchez-alberca/index.xml b/en/author/alfredo-sanchez-alberca/index.xml
index 9723b461..e6e28a0c 100644
--- a/en/author/alfredo-sanchez-alberca/index.xml
+++ b/en/author/alfredo-sanchez-alberca/index.xml
@@ -1,144 +1,16 @@
- Alfredo Sánchez-Alberca | Aprende con Alf
+ Alfredo Sánchez Alberca | Aprende con Alf
 /en/author/alfredo-sanchez-alberca/
- Alfredo Sánchez-Alberca
- Source Themes Academic (https://sourcethemes.com/academic/)en-usMon, 01 Jan 2018 00:00:00 +0000
+ Alfredo Sánchez Alberca
+ Source Themes Academic (https://sourcethemes.com/academic/)en-us
 /images/logo_hude38443eeb2faa5fa84365aba7d86a77_3514_300x300_fit_lanczos_3.png
- Alfredo Sánchez-Alberca
+ Alfredo Sánchez Alberca
 /en/author/alfredo-sanchez-alberca/
-
- Una nueva taxonomía de colecciones y de funciones de similitud para su comparación
- /en/publication/nueva-2018-2/
- Mon, 01 Jan 2018 00:00:00 +0000
- /en/publication/nueva-2018-2/
-
-
-
-
- Una nueva taxonomía de colecciones y de funciones de similitud para su comparación
- /en/publication/nueva-2018/
- Mon, 01 Jan 2018 00:00:00 +0000
- /en/publication/nueva-2018/
-
-
-
-
- Innovación en la docencia de Estadística con R y rk.Teaching
- /en/publication/innovacion-2016-2/
- Fri, 01 Jan 2016 00:00:00 +0000
- /en/publication/innovacion-2016-2/
-
-
-
-
- Innovación en la docencia de Estadística con R y rk.Teaching
- /en/publication/innovacion-2016/
- Fri, 01 Jan 2016 00:00:00 +0000
- /en/publication/innovacion-2016/
-
-
-
-
- Bringing R to non-expert users with the package RKTeaching
- /en/publication/bringing-2015/
- Thu, 01 Jan 2015 00:00:00 +0000
- /en/publication/bringing-2015/
-
-
-
-
- Bioestadística Aplicada con SPSS
- /en/publication/bioestadistica-2014/
- Wed, 01 Jan 2014 00:00:00 +0000
- /en/publication/bioestadistica-2014/
-
-
-
-
- Towards a Semanctic Catalog of Similarity Measures
- /en/publication/towards-2014-1/
- Wed, 01 Jan 2014 00:00:00 +0000
- /en/publication/towards-2014-1/
-
-
-
-
- Towards a Semantic Catalog of Similarity Measures
- /en/publication/towards-2014/
- Wed, 01 Jan 2014 00:00:00 +0000
- /en/publication/towards-2014/
-
-
-
-
- RKTeaching: a new R package for teaching Statistics .
- /en/publication/rkteaching-2013/
- Tue, 01 Jan 2013 00:00:00 +0000
- /en/publication/rkteaching-2013/
-
-
-
-
- RKTeaching: Un paquete de R para la enseñanza de la Estadística
- /en/publication/rkteaching-2012/
- Sun, 01 Jan 2012 00:00:00 +0000
- /en/publication/rkteaching-2012/
-
-
-
-
- Evolution of neuroendocrine cell population and peptidergic innervation, assessed by discriminant analysis, during postnatal development of the rat prostate
- /en/publication/evolution-2007/
- Mon, 01 Jan 2007 00:00:00 +0000
- /en/publication/evolution-2007/
-
-
-
-
- AMON: A software system for automatic generation of ontology mappings
- /en/publication/amon-2005/
- Sat, 01 Jan 2005 00:00:00 +0000
- /en/publication/amon-2005/
-
-
-
-
- Framework for automatic generation of ontology mappings
- /en/publication/framework-2004/
- Thu, 01 Jan 2004 00:00:00 +0000
- /en/publication/framework-2004/
-
-
-
-
- Herramientas de trabajo cooperativo
- /en/publication/herramientas-2004/
- Thu, 01 Jan 2004 00:00:00 +0000
- /en/publication/herramientas-2004/
-
-
-
-
- Aspectos técnicos de la comunidad virtual de usuarios FARMATOXI
- /en/publication/aspectos-2002/
- Tue, 01 Jan 2002 00:00:00 +0000
- /en/publication/aspectos-2002/
-
-
-
-
- FARMATOXI: Red temática de farmacología y toxicología de RedIris
- /en/publication/farmatoxi-2000/
- Sat, 01 Jan 2000 00:00:00 +0000
- /en/publication/farmatoxi-2000/
-
-
-
diff --git a/en/category/biostatistics/index.xml b/en/category/biostatistics/index.xml
index a38a4b93..2f5d0df2 100644
--- a/en/category/biostatistics/index.xml
+++ b/en/category/biostatistics/index.xml
@@ -477,7 +477,7 @@ For each value or category of the variable, a bar is draw to the height of its f
 <p><strong>Example</strong>. 
 The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.</p>
-<div id="chart-142863579" class="chart pb-3" style="max-width: 100%; margin: auto;"></div>
+<div id="chart-762543189" class="chart pb-3" style="max-width: 100%; margin: auto;"></div>
 <script>
 (function() {
 let a = setInterval( function() {
@@ -487,7 +487,7 @@ For each value or category of the variable, a bar is draw to the height of its f
 clearInterval( a );
 Plotly.d3.json("./img/absolute-barchart.json", function(chart) {
- Plotly.plot('chart-142863579', chart.data, chart.layout, {responsive: true});
+ Plotly.plot('chart-762543189', chart.data, chart.layout, {responsive: true});
 });
 }, 500 );
 })();
diff --git a/en/category/statistics/index.xml b/en/category/statistics/index.xml
index 0d43865a..19a4d5ef 100644
--- a/en/category/statistics/index.xml
+++ b/en/category/statistics/index.xml
@@ -477,7 +477,7 @@ For each value or category of the variable, a bar is draw to the height of its f
 <p><strong>Example</strong>. 
 The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.</p>
-<div id="chart-142863579" class="chart pb-3" style="max-width: 100%; margin: auto;"></div>
+<div id="chart-762543189" class="chart pb-3" style="max-width: 100%; margin: auto;"></div>
 <script>
 (function() {
 let a = setInterval( function() {
@@ -487,7 +487,7 @@ For each value or category of the variable, a bar is draw to the height of its f
 clearInterval( a );
 Plotly.d3.json("./img/absolute-barchart.json", function(chart) {
- Plotly.plot('chart-142863579', chart.data, chart.layout, {responsive: true});
+ Plotly.plot('chart-762543189', chart.data, chart.layout, {responsive: true});
 });
 }, 500 );
 })();
diff --git a/en/index.json b/en/index.json
index 6889865e..aa6b3701 100644
--- a/en/index.json
+++ b/en/index.json
@@ -1 +1 @@
-[{"authors":["asalber"],"categories":null,"content":"Father and environmental activist, I work as a teacher of Mathematics and Statistics at the Applied Maths and Statistics department of the CEU San Pablo University. I do my research in Data Science, including Biostatistics and Machine Learning. I master the programming languages R, Python and LaTeX.\n","date":-62135596800,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1600206500,"objectID":"4c1a52ed2fdb89e37dda671db5e7b383","permalink":"/en/author/alfredo-sanchez-alberca/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/author/alfredo-sanchez-alberca/","section":"authors","summary":"Father and environmental activist, I work as a teacher of Mathematics and Statistics at the Applied Maths and Statistics department of the CEU San Pablo University. 
I do my research in Data Science, including Biostatistics and Machine Learning.","tags":null,"title":"Alfredo Sánchez Alberca","type":"authors"},{"authors":null,"categories":["Calculus","One Variable Calculus","Several Variables Calculus"],"content":" Descargar\nThis Calculus manual has been conceived to ease the learning of Calculus in first years of university studies. It explain in a clear and simplified manner the most important concepts with a lot of examples that ease their understanding.\nThe manual is mainly focused on Health Sciences and most examples are applied to this field. However, the concepts and procedures presented are valid for any scope.\nTable of Contents Analytic Geometry One variable differential calculus Integral calculus Ordinary differential equations Several variables differentiable calculus ","date":1461110400,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"189c5ff675788efb52fff13012b2c1ea","permalink":"/en/teaching/calculus/manual/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/","section":"teaching","summary":"Explanation of the most important concepts in one variable and several variables Calculus with applied examples.","tags":null,"title":"Calculus Manual","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Descargar\nThis Manual of Statistics has been conceived to ease the learning of Statistics. It contains simple explanations of the most important concepts in Statistics with examples. It also explains the the most common statistical procedures for data analysis.\nThe manual is mainly aimed at Biostatistics, and therefore most of the examples are applied to health sciences. 
However, the concepts and statistical methods presented are valid for any scope.\nTable of Contents Introduction Descriptive Statistics Regression Probability Discrete Random Variables Continuous Random Variables Study flash cards There is an Anki deck of flash cards to study an remember the main concepts of this manual.\nIf you don\u0026rsquo;t know what is Anki, please visit the Anki web site. There is also a excellent tutorial about Anki essentials.\n","date":1461110400,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"e61b7a0a352f5ba615722fcec6c07083","permalink":"/en/teaching/statistics/manual/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/","section":"teaching","summary":"Explanation of the most important concepts in Statistics and Probability with examples.","tags":["Statistics"],"title":"Statistics Manual","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":" Descargar\nThis is a basic manual of Excel, the Microsoft Office spreadsheet. The version of Excel used in this manual is Excel 2010, but some parts of this manual are also valid for other versions.\nThis manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. 
However, the manual also serves for learning a basic management of Excel, no matter the field of application.\nTable of Contents Introduction Formatting and Data Printing Formulas Plotting Charts Database Management ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"1f8a65a2485c96f4aa2aa6a90d06495f","permalink":"/en/teaching/excel/manual/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/","section":"teaching","summary":"A basic introduction to Excel for Economics with examples.","tags":["Excel"],"title":"Excel Manual","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Derivatives ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"99e2b20656785fc76fa5f07071839347","permalink":"/en/teaching/calculus/problems/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/problems/","section":"teaching","summary":"Statistics problems with solutions.","tags":["Problems"],"title":"Calculus Problems","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"\n\n","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"3ed71473b71b9ac7e72598691fa90654","permalink":"/en/teaching/excel/exercises/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/exercises/","section":"teaching","summary":"Excel problems with solutions.","tags":["Excel","Problems"],"title":"Excel Exercises","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Frequency Tables and Charts Descriptive Statistics Linear Regression Non Linear Regression Probability Diagnostic Tests Discrete Random Variables Continuous Random Variables 
","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1609665891,"objectID":"4734df1d5e0d7fd295f3f7a54a0584c4","permalink":"/en/teaching/statistics/problems/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/","section":"teaching","summary":"Statistics problems with solutions.","tags":["Problems"],"title":"Statistics Problems","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Calculus exams of the last courses with solutions.\nPharmacy exam 2022-01-17 Pharmacy exam 2021-01-18 Pharmacy exam 2019-12-16 Pharmacy exam 2018-12-17 Pharmacy exam 2018-01-19 Pharmacy exam 2017-11-06 Pharmacy exam 2016-01-10 Pharmacy exam 2016-11-07 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"d2efe47f363828dae16b00de59f4b783","permalink":"/en/teaching/calculus/exams/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/","section":"teaching","summary":"Calculus exams of the last courses with solutions.","tags":["Exams"],"title":"Calculus Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics exams of the last courses with solutions.\nPharmacy Statistics Exams Physiotherapy Statistics Exams ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"505c6dae1d0e0627c3510b060dd6e42f","permalink":"/en/teaching/statistics/exams/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/","section":"teaching","summary":"Statistics exams of the last courses with solutions.","tags":["Exams"],"title":"Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy"],"content":"List of exams Pharmacy exam 2022-01-17 Pharmacy exam 2021-11-22 Pharmacy exam 2021-10-25 Pharmacy exam 2021-01-18 Pharmacy exam 2020-11-23 Pharmacy exam 2020-10-26 Pharmacy exam 
2019-12-16 Pharmacy exam 2019-11-18 Pharmacy exam 2019-10-14 Pharmacy exam 2018-12-17 Pharmacy exam 2018-11-19 Pharmacy exam 2018-10-29 Pharmacy exam 2018-01-19 Pharmacy exam 2017-11-27 Pharmacy exam 2017-01-10 Pharmacy exam 2016-11-28 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600206500,"objectID":"c6437d9dc217cdb1190e9310d6a79db0","permalink":"/en/teaching/statistics/exams/pharmacy/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Pharmacy Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"List of exams Physiotherapy exam 2022-06-06 Physiotherapy exam 2022-05-06 Physiotherapy exam 2022-03-11 Physiotherapy exam 2021-06-07 Physiotherapy exam 2021-05-05 Physiotherapy exam 2021-03-17 Physiotherapy exam 2020-06-19 Physiotherapy exam 2020-05-25 Physiotherapy exam 2019-06-18 Physiotherapy exam 2019-05-27 Physiotherapy exam 2019-03-26 Physiotherapy exam 2018-05-31 Physiotherapy exam 2018-04-09 Physiotherapy exam 2017-06-02 Physiotherapy exam 2017-05-19 Physiotherapy exam 2017-03-31 Physiotherapy exam 2016-06-23 Physiotherapy exam 2016-05-19 Physiotherapy exam 2016-05-13 Physiotherapy exam 2016-04-01 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600206500,"objectID":"1580d126132989916d94dd543a563c7d","permalink":"/en/teaching/statistics/exams/physiotherapy/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Physiotherapy Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics as a scientific tool What is Statistics? Definition - Statistics. 
Statistics is a branch of Mathematics that deals with data collection, summary, analysis and interpretation. The role of Statistics is to extract information from data in order to gain knowledge for taking decisions.\nStatistics is essential in any scientific or technical discipline which requires data handling, especially with large volumes of data, such as Physics, Chemistry, Medicine, Psychology, Economics or Social Sciences.\nBut, why is Statistics necessary?\nA changing World Scientists try to study the World. A World with a high variability that makes difficult determining the behaviour of things.\nStatistics provides a bridge between the real world and the mathematical models that attempt to explain it, providing a methodology to assess the discrepancies between reality and theoretical models.\nThis makes Statistics an indispensable tool in applied sciences that require design of experiments and data analysis.\nPopulation and sample Statistical population Definition - Population. A population is a set of elements defined by an or more features that has all the elements and only them. Every element of the population is called individual.\nDefinition - Population size. The number of individuals in a population is known as the population size and is represented by $N$.\nSometimes not all the individuals are accessible to study. Then we distinguish between:\nTheoretical population: Individuals to which we want to extrapolate the study conclusions. Studied population: Individuals truly accessible in the study. Example. In a study about a particular disease, the theoretical population would be all the persons that suffered the disease in some moment, even if they were not born yet. 
While the studied population will be the set o persons that have suffered the disease and that we can really study (observe that this exclude people with the disease but that we do not have any mean to get information about them).\nDrawbacks in the population study Scientists study a phenomenon in a population to understand it, to get knowledge about it, and so to control it.\nBut, for a complete knowledge of the population it is necessary to study all its individuals.\nHowever, this is not always possible for several reasons:\nThe population size is infinite or too large to study all its individuals. The operations that individuals undergo are destructive. The cost, both in money and time, that would require study all the individuals in the population is not affordable. Statistics Sample When it is not possible or convenient to study all the individuals in a population, we study only a subset of them.\nDefinition - Sample. A sample is a subset of the population.\nDefinition - Sample size. The number of individuals of the sample is called sample size and is represented by $n$.\nUsually, the population study is conducted on samples drawn from it.\nThe sample study only gives an approximate knowledge of the population. But in most cases it is enough.\nSample size determination One of the most interesting questions that arise:\nHow many individuals are required to sample to have an approximate but enough knowledge of the population?\nThe answer depends of several factors, as the population variability or the desired reliability for extrapolations to the population.\nUnfortunately we can not answer that question until the end of the course, but in general, the most individuals the sample has, the more reliable will the conclusions be on the population, but also the study will be longer and more expensive.\nExample. To understand what a sufficient sample size means we can use a picture example. 
A digital photography consist of a lot of small points called pixels disposed in an big array layout with rows and columns (the more rows and columns, the more resolution the picture has). Here the picture is the population and every pixel is an individual. Every pixel has a colour and it is the variability of colours what forms the picture motif.\nHow many pixels must we take in a sample in order to know the motif of a picture?\nThe answer depends on the variability of colours in the picture. If all the pixels in the picture are of the same colour, only one pixel is required to know the motif. But, if there is a lot of variability in the colours, a large sample size will be required.\nThe image below contains a small sample of the pixels of a picture. Could you find out the motif of the picture?\nWith a small sample size it is difficult to find out the picture motif!\nSurely you has not been able to guess the motif because the number of pixels picked in the sample is too small to understand the variability of colours in the picture.\nThe image below contains a larger sample of pixels. Could you find out the motif of the picture now?\nWith a large sample is easier to find out the picture motif!\nAnd here is the whole population.\nIt is not required to know all the pixels of a picture to find out its motif!\nTypes of reasoning Deduction properties: If the premises are true, it guarantees the certainty of the conclusions (that is, if something is true in the population, it is also true in the sample). However,\nInduction properties: It does not guarantee the certainty of the conclusions (if something is true in the sample, it may not be true in the population, so be careful with the extrapolations!). But, it is the only way to generate new knowledge!\nStatistics is fundamentally based on inductive reasoning, because it uses the information obtained from samples to draw conclusions about populations.\nSampling Definition - Sampling. 
The process of selecting the elements included in a sample is known as sampling. To reflect reliable information about the whole population, the sample must be representative of the population. That means that the sample should reproduce on a smaller scale the population variability.\nThe goal is to get a representative sample of the population.\nTypes of sampling There exist a lot of sampling methods but all of them can be grouped in two categories:\nRandom sampling: The sample individuals are selected randomly. All the population individuals have the same likelihood of being selected (equiprobability).\nNon random sampling: The sample individuals are not selected randomly. Some population individuals have a higher likelihood of being selected than others.\nOnly random sampling methods avoid the selection bias and guarantee the representativeness of the sample, and therefore, the validity of conclusions.\nNon random sampling methods are not suitable to make generalizations because they do not guarantee the representativeness of the sample. Nevertheless, usually they are less expensive and can be used in exploratory studies.\nSimple random sampling The most popular random sampling method is the simple random sampling, that has the following properties:\nAll the population individuals have the same likelihood of being selected in the sample. The individual selection is performed with replacement, that is, each selected individual is returned to the population before selecting the next one. In this way the population does not change. Each individual selection is independent of the others. The only way of doing a random sampling is to assign a unique identity number to each population individual (conducting a census) and performing a random drawing.\nStatistical variables In every statistical study we are interested in some properties or characteristics of individuals.\nDefinition - Statistical variable. 
A statistical variable is a property or characteristic measured in the population individuals. The data is the actual values or outcomes recorded on a statistical variable.\nTypes of statistical variables According to the nature of their values and their scale, they can be:\nQualitative variables. They measure non-numeric qualities. They can be:\nNominals: There is no natural order between its categories. Example: The hair colour or the gender.\nOrdinals: There is a natural order between its categories. Example: The education level.\nQuantitative variables: They measure numeric quantities. They can be:\nDiscrete: Their values are isolated numbers (usually integers). Example: The number of children or cars in a family.\nContinuous: They can take any value in a real interval. Example: The height, weight or age of a person.\nQualitative and discrete variables are also called categorical variables and their values categories.\nChoosing the appropriate type of variable Sometimes a characteristic could be measured in variables of different types.\nExample. Whether a person smokes or not could be measure in several ways:\nSmokes: yes/no. (Nominal)\nSmoking level: No smoking/unusual/moderate/quite/heavy. (Ordinal)\nNumber of cigarettes per day: 0,1,2,\u0026hellip;(Discrete)\nIn those cases quantitative variables are preferable to qualitative, continuous variables are preferable to discrete variables and ordinal variables are preferable to nominal, as they give more information.\nAccording to their role in the study:\nIndependent variables: Variables that do not depend on other variables in the study. Usually they are manipulate in an experiment in order to observe their effect on a dependent variable. They are also known as predictor variables.\nDependent variables: Variables that depend on other variables in the study. They are not manipulated in an experiment and are also known as outcome variables.\nExample. 
In a study on the performance of students in a course, the intelligence of students and the daily study time are independent variables, while the course grade is a dependent variable.\nTypes of statistical studies According to their role in the study:\nExperimental: When the independent variables are manipulated in order to see the effect that that change has on the dependent variables.\nExample. In a study on the performance of students in a test, the teacher manipulates the methodology and creates two or more groups following different methodologies.\nNon-experimental: When the independent variables are not manipulated. That does not mean that it is impossible to do so, but it will either be impractical or unethical to do so.\nExample. In a study a researcher could be interested in the effect of smoking over the lung cancer. However, whilst possible, it would be unethical to ask individuals to smoke in order to study what effect this had on their lungs. In this case, the researcher could study two groups of people, one with lung cancer and other without, an observe in each group how many persons smoke or not.\nExperimental studies allow to identify a cause and effect between variables while non-experimental studies only allow to identify association or relationship between variables.\nThe data table The variables of a study will be measured in each individual of the sample. This will give a data set that usually is arranged in a tabular form known as data table.\nIn this table each column contains the information of a variable and each row contains the information of an individual.\nExample. 
The table below contains data about the variables Name, Age, Gender, Weight and Height of a sample of 6 persons.\nName Age Gender Weight(Kg) Height(cm) José Luis Martínez 18 H 85 179 Rosa Díaz 32 M 65 173 Javier García 24 H 71 181 Carmen López 35 M 65 170 Marisa López 46 M 51 158 Antonio Ruiz 68 H 66 174 Phases of a statistical study Usually a statistical study goes through the following phases:\nThe study begins with a previous design in which the study goals, the population, the variables to measure and the required sample size are set.\nNext, the sample is selected from the population and the variables are measured in the individuals of the sample (getting the data table). This is accomplished by Sampling.\nThe next step consists in describing and summarizing the information of the sample. This is the job of Descriptive Statistics.\nThen, the information obtained is projected on a mathematical model that intend to explain what happens in population, and the model is validated. This is accomplished by Inferential Statistics.\nFinally, the validated model is used to perform predictions and to draw conclusions on the population.\nThe statistical cycle ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"25b11930afc1ea0d302f6d13ef3201a2","permalink":"/en/teaching/statistics/manual/introduction/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/introduction/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Introduction","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Descriptive Statistics is the part of Statistics in charge of representing, analysing and summarizing the information contained in the sample.\nAfter the sampling process, this is the next step in every statistical study and usually consists of:\nTo classify, group and sort the data of the sample.\nTo tabulate and plot data according to their frequencies.\nTo 
calculate numerical measures that summarize the information contained in the sample (sample statistics).\nIt has no inferential power, so do not generalize to the population from the measures computed by Descriptive Statistics!.\nFrequency distribution The study of a statistical variable starts by measuring the variable in the individuals of the sample and classifying the values.\nThere are two ways of classifying data:\nNon-grouping: Sorting values from lowest to highest value (if there is an order). Used with qualitative variables and discrete variables with few distinct values.\nGrouping: Grouping values into intervals (classes) and sort them from lowest to highest intervals. Used with continuous variables and discrete variables with many distinct values.\nSample classification It consists in grouping the values that are the same and sorting them if there is an order among them.\nExample. $X=$Height\nFrequency count It consists in counting the number of times that every value appears in the sample.\nExample. $X=$Height\nSample frequencies Definition - Sample frequencies. Given a sample of $n$ values of a variable $X$, for every value $x_i$ of the variable we define\nAbsolute Frequency $n_i$: The number of times that value $x_i$ appears in the sample.\nRelative Frequency $f_i$: The proportion of times that value $x_i$ appears in the sample.\n$$f_i = \\frac{n_i}{n}$$\nCumulative Absolute Frequency $N_i$: The number of values in the sample less than or equal to $x_i$. $$N_i = n_1 + \\cdots + n_i = N_{i-1}+n_i$$\nCumulative Relative Frequency $F_i$: The proportion of values in the sample less than or equal to $x_i$. 
$$F_i = \\frac{N_i}{n}$$\nFrequency table The set of values of a variable with their respective frequencies is called frequency distribution of the variable in the sample, and it is usually represented as a frequency table.\n$X$ values Absolute frequency Relative frequency Cumulative absolute frequency Cumulative relative frequency $x_1$ $n_1$ $f_1$ $N_1$ $F_1$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $x_i$ $n_i$ $f_i$ $N_i$ $F_i$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $x_k$ $n_k$ $f_k$ $N_k$ $F_k$ Example - Quantitative variable and non-grouped data. The number of children in 25 families are:\n1, 2, 4, 2, 2, 2, 3, 2, 1, 1, 0, 2, 2, 0, 2, 2, 1, 2, 2, 3, 1, 2, 2, 1, 2 The frequency table for the number of children in this sample is\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 2 \u0026amp; 0.08\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 8 \u0026amp; 0.32\\newline 2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 22 \u0026amp; 0.88\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 24 \u0026amp; 0.96\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\end{array} $$\nExample - Quantitative variable and grouped data. The heights (in cm) of 30 students are:\n179, 173, 181, 170, 158, 174, 172, 166, 194, 185,\n162, 187, 198, 177, 178, 165, 154, 188, 166, 171,\n175, 182, 167, 169, 172, 186, 172, 176, 168, 187. 
The frequency table for the height in this sample is\n$$ \\begin{array}{crrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline (150,160] \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 2 \u0026amp; 0.07\\newline (160,170] \u0026amp; 8 \u0026amp; 0.27 \u0026amp; 10 \u0026amp; 0.34\\newline (170,180] \u0026amp; 11 \u0026amp; 0.36 \u0026amp; 21 \u0026amp; 0.70\\newline (180,190] \u0026amp; 7 \u0026amp; 0.23 \u0026amp; 28 \u0026amp; 0.93\\newline (190,200] \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\end{array} $$\nClasses construction Intervals are known as classes and the center of intervals as class marks.\nWhen grouping data into intervals, the following rules must be taken into account:\nThe number of intervals should not be too big nor too small. A usual rule of thumb is to take a number of intervals approximately $\\sqrt{n}$ or $\\log_2(n)$. The intervals must not overlap and must cover the entire range of values. It does not matter if intervals are left-open and right-closed or vice versa. The minimum value must fall in the first interval and the maximum value in the last. Example - Qualitative variable. The blood types of 30 people are:\nA, B, B, A, AB, 0, 0, A, B, B, A, A, A, A, AB, A, A, A, B, 0, B, B, B, A, A, A, 0, A, AB, 0. The frequency table of the blood type in this sample is\n$$ \\begin{array}{crr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \\newline \\hline \\mbox{0} \u0026amp; 5 \u0026amp; 0.16 \\newline \\mbox{A} \u0026amp; 14 \u0026amp; 0.47 \\newline \\mbox{B} \u0026amp; 8 \u0026amp; 0.27 \\newline \\mbox{AB} \u0026amp; 3 \u0026amp; 0.10 \\newline \\hline \\sum \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\end{array} $$\nObserve that in this case cumulative frequencies are nonsense as there is no order in the variable.\nFrequency distribution graphs Usually the frequency distribution is also displayed graphically. 
Depending on the type of variable and whether data has been grouped or not, there are different types of charts:\nBar chart\nHistogram\nLine or polygon chart.\nPie chart\nBar chart A bar chart consists of a set of bars, one for every value or category of the variable, plotted on a coordinate system.\nUsually the values or categories of the variable are represented on the $x$-axis, and the frequencies on the $y$-axis. For each value or category of the variable, a bar is drawn to the height of its frequency. The width of the bar is not important but bars should be clearly separated from each other.\nDepending on the type of frequency represented on the $y$-axis we get different types of bar charts.\nSometimes a polygon, known as frequency polygon, is plotted joining the top of every bar with straight lines.\nExample. The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.\nThe bar chart below shows the relative frequency distribution of the number of children with the frequency polygon.\nThe bar chart below shows the cumulative absolute frequency distribution of the number of children.\nAnd the bar chart below shows the cumulative relative frequency distribution of the number of children with the frequency polygon.\nHistogram A histogram is similar to a bar chart but for grouped data.\nUsually the classes or grouping intervals are represented on the $x$-axis, and the frequencies on the $y$-axis. For each class, a bar is drawn to the height of its frequency. Contrary to bar charts, the width of bars coincides with the width of classes, and there is no space between two consecutive bars.\nDepending on the type of frequency represented on the $y$-axis we get different types of histograms.\nAs with the bar chart, the frequency polygon can be drawn joining the top centre of every bar with straight lines.\nExample. 
The histogram below shows the absolute frequency distribution of heights.\nThe histogram below shows the relative frequency distribution of heights with the frequency polygon.\nThe cumulative frequency polygon (for absolute or relative frequencies) is known as ogive.\nExample. The histogram and the ogive below show the cumulative relative distribution of heights.\nObserve that in the ogive we join the top right corner of bars with straight lines, instead of the top center, because we do not reach the accumulated frequency of the class until the end of the interval.\nPie chart A pie chart consists of a circle divided into slices, one for every value or category of the variable. Each slice is called a sector and its angle or area is proportional to the frequency of the corresponding value or category.\nPie charts can represent absolute or relative frequencies, but not cumulative frequencies, and are used with nominal qualitative variables. For ordinal qualitative or quantitative variables it is better to use bar charts, because it is easier to perceive differences in one dimension (length of bars) than in two dimensions (areas of sectors).\nExample. The pie chart below shows the relative frequency distribution of blood types.\nThe normal distribution Distributions with different properties will show different shapes.\nOutliers One of the main problems in samples is the presence of outliers, values very different from the rest of the values of the sample.\nExample. The last height of the following sample of heights is an outlier.\nIt is important to detect outliers before doing any analysis, because outliers usually distort the results.\nThey always appear at the ends of the distribution, and can be detected easily with a box and whiskers chart (as will be shown later).\nOutliers management With big samples outliers have less importance and can be left in the sample.\nWith small samples we have several options:\nRemove the outlier if it is an error. 
Replace the outlier with the lowest or highest value in the distribution that is not an outlier, if it is not an error and the outlier does not fit the theoretical distribution. Leave the outlier if it is not an error, and change the theoretical model to fit the outliers. Sample statistics The frequency table and charts summarize and give an overview of the distribution of values of the studied variable in the sample, but it is difficult to describe some aspects of the distribution from them, for example, which values are the most representative of the distribution, how spread out the data are, which data could be considered outliers, or how symmetric the distribution is.\nTo describe those aspects of the sample distribution more specific numerical measures, called sample statistics, are used.\nAccording to the aspect of the distribution that they study, there are different types of statistics:\nMeasures of location: They measure the values where data are concentrated or that divide the distribution into equal parts.\nMeasures of dispersion: They measure the spread of data.\nMeasures of shape: They measure aspects related to the shape of the distribution, such as the symmetry and the concentration of data around the mean.\nLocation statistics There are two groups:\nCentral location measures: They measure the values where data are concentrated, usually at the centre of the distribution. These values are the values that best represent the sample data. The most important are:\nArithmetic mean Median Mode Non-central location measures: They divide the sample data into equal parts. The most important are:\nQuartiles. Deciles. Percentiles. Arithmetic mean Definition - Sample arithmetic mean $\\bar{x}$. 
The sample arithmetic mean of a variable $X$ is the sum of observed values in the sample divided by the sample size\n$$\\bar{x} = \\frac{\\sum x_i}{n}$$\nIt can be calculated from the frequency table with the formula\n$$\\bar{x} = \\frac{\\sum x_in_i}{n} = \\sum x_i f_i$$\nIn most cases the arithmetic mean is the value that best represents the observed values in the sample.\nWatch out! It cannot be calculated with qualitative variables.\nExample - Non-grouped data. Using the data of the sample with the number of children of families, the arithmetic mean is\n$$ \\begin{aligned} \\bar{x} \u0026amp;= \\frac{1+2+4+2+2+2+3+2+1+1+0+2+2}{25}+\\newline\\newline \u0026amp;+\\frac{0+2+2+1+2+2+3+1+2+2+1+2}{25} = \\frac{44}{25} = 1.76 \\mbox{ children}. \\end{aligned} $$\nor using the frequency table\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; x_in_i \u0026amp; x_if_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 0 \u0026amp; 0\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 6 \u0026amp; 0.24\\newline 2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 28 \u0026amp; 1.12\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 6 \u0026amp; 0.24\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 4 \u0026amp; 0.16 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \u0026amp; 44 \u0026amp; 1.76 \\newline \\hline \\end{array} $$\n$$ \\bar{x} = \\frac{\\sum x_in_i}{n} = \\frac{44}{25}= 1.76 \\mbox{ children}\\qquad \\bar{x}=\\sum{x_if_i} = 1.76 \\mbox{ children}. $$\nThat means that the value that best represents the number of children in the families of the sample is 1.76 children.\nExample - Grouped data. 
Using the data of the sample of student heights, the arithmetic mean is\n$$\\bar{x} = \\frac{179+173+\\cdots+187}{30} = 175.07 \\mbox{ cm}.$$\nor using the frequency table and taking the class marks as $x_i$,\n$$ \\begin{array}{crrrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; x_in_i \u0026amp; x_if_i\\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 310 \u0026amp; 10.33\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; 0.27 \u0026amp; 1320 \u0026amp; 44.00\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.36 \u0026amp; 1925 \u0026amp; 64.17\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 0.23 \u0026amp; 1295 \u0026amp; 43.17\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 390 \u0026amp; 13 \\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; 1 \u0026amp; 5240 \u0026amp; 174.67 \\newline \\hline \\end{array} $$\n$$ \\bar{x} = \\frac{\\sum x_in_i}{n} = \\frac{5240}{30}= 174.67 \\mbox{ cm} \\qquad \\bar{x}=\\sum{x_if_i} = 174.67 \\mbox{ cm}. $$\nObserve that when the mean is calculated from the table the result differs a little from the real value, because the values used in the calculations are the class marks instead of the actual values.\nWeighted mean In some cases the values of the sample have different importance. In that case the importance or weight of each value of the sample must be taken into account when calculating the mean.\nDefinition - Sample weighted mean $\\bar{x}_w$. Given a sample of values $x_1,\\ldots,x_n$ where every value $x_i$ has a weight $w_i$, the sample weighted mean of variable $X$ is the sum of the product of each value by its weight, divided by the sum of weights\n$$\\bar{x}_w = \\frac{\\sum x_iw_i}{\\sum w_i}$$\nFrom the frequency table it can be calculated with the formula\n$$\\bar{x}_w = \\frac{\\sum x_iw_in_i}{\\sum w_in_i}$$\nExample. 
Assume that a student wants to calculate a representative measure of his/her performance in a course. The grade and the credits of every subject are\nSubject Credits Grade Maths 6 5 Economics 4 3 Chemistry 8 6 The arithmetic mean is\n$$\\bar{x} = \\frac{\\sum x_i}{n} = \\frac{5+3+6}{3}= 4.67 \\text{ points}.$$\nHowever, this measure does not represent well the performance of the student, as not all the subjects have the same importance and require the same effort to pass. Subjects with more credits require more work and must have more weight in the calculation of the mean.\nIn this case it is better to use the weighted mean, using the credits as the weights of grades, as a representative measure of the student's effort\n$$ \\bar{x}_w = \\frac{\\sum x_iw_i}{\\sum w_i} = \\frac{5\\cdot 6+3\\cdot 4+6\\cdot 8}{6+4+8}= \\frac{90}{18} = 5 \\text{ points}. $$\nMedian Definition - Sample median $Me$. The sample median of a variable $X$ is the value that is in the middle of the ordered sample. The median divides the sample distribution into two equal parts, that is, there are the same number of values above and below the median. Therefore, it has cumulative frequencies $N_{Me}= n/2$ and $F_{Me}= 0.5$.\nWatch out! It cannot be calculated for nominal variables.\nWith non-grouped data, there are two possibilities:\nOdd sample size: The median is the value in the position $\\frac{n+1}{2}$. Even sample size: The median is the average of values in positions $\\frac{n}{2}$ and $\\frac{n}{2}+1$. Example. 
Using the data of the sample with the number of children of families, the sample size is 25, which is odd, and the median is the value in the position $\\frac{25+1}{2} = 13$ of the sorted sample.\n$$0,0,1,1,1,1,1,1,2,2,2,2,\\fbox{2},2,2,2,2,2,2,2,2,2,3,3,4$$\nAnd the median is 2 children.\nWith the frequency table, the median is the lowest value with a cumulative absolute frequency greater than or equal to $13$, or with a cumulative relative frequency greater than or equal to $0.5$.\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 2 \u0026amp; 0.08\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 8 \u0026amp; 0.32\\newline \\color{red}2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 22 \u0026amp; 0.88\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 24 \u0026amp; 0.96\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\end{array} $$\nMedian calculation for grouped data For grouped data the median is calculated from the ogive, interpolating in the class with cumulative relative frequency 0.5.\nBoth expressions are equal as the angle $\\alpha$ is the same, and solving the equation we get that the formula for the median, where $l_{i-1}$ and $l_i$ are the lower and upper limits of the class and $a_i$ is its width, is\n$$ Me=l_{i-1}+\\frac{0.5-F_{i-1}}{F_i-F_{i-1}}(l_i-l_{i-1})=l_{i-1}+\\frac{0.5-F_{i-1}}{f_i}a_i $$\nExample - Grouped data. Using the data of the sample of student heights, the median falls in class (170,180].\nAnd interpolating in interval (170,180] we get\nEquating both expressions and solving the equation, we get\n$$ Me= 170+\\frac{0.5-0.34}{0.7-0.34}(180-170)=170+\\frac{0.16}{0.36}10=174.54 \\mbox{ cm}. $$\nThis means that half of the students in the sample have a height lower than or equal to 174.54 cm and the other half have a height greater than or equal to it.\nMode Definition - Sample Mode $Mo$. The sample mode of a variable $X$ is the most frequent value in the sample. 
With grouped data the modal class is the class with the highest frequency.\nIt can be calculated for all types of variables (qualitative and quantitative).\nDistributions can have more than one mode.\nExample. Using the data of the sample with the number of children of families, the value with the highest frequency is $2$, which is the mode $Mo = 2$ children.\n$$ \\begin{array}{rr} \\hline x_i \u0026amp; n_i \\newline \\hline 0 \u0026amp; 2 \\newline 1 \u0026amp; 6 \\newline \\color{red} 2 \u0026amp; 14 \\newline 3 \u0026amp; 2 \\newline 4 \u0026amp; 1 \\newline \\hline \\end{array} $$\nUsing the data of the sample of student heights, the class with the highest frequency is $(170,180]$, which is the modal class $Mo=(170,180]$.\n$$ \\begin{array}{cr} \\hline X \u0026amp; n_i \\newline \\hline (150,160] \u0026amp; 2 \\newline (160,170] \u0026amp; 8 \\newline \\color{red}{(170,180]} \u0026amp; 11 \\newline (180,190] \u0026amp; 7 \\newline (190,200] \u0026amp; 2 \\newline \\hline \\end{array} $$\nWhich central tendency statistic should I use? In general, when all the central tendency statistics can be calculated, it is advisable to use them as representative values in the following order:\nThe mean. The mean takes more information from the sample than the others, as it takes into account the magnitude of data.\nThe median. The median takes less information than the mean but more than the mode, as it takes into account the order of data.\nThe mode. The mode is the measure that takes the least information from the sample, as it only takes into account the absolute frequency of values.\nBut be careful with outliers, as the mean can be distorted by them. In that case it is better to use the median as the most representative value.\nExample. 
If a sample of the number of children of 7 families is\n0, 0, 1, 1, 2, 2, 15, then, $\\bar{x}=3$ children and $Me=1$ child.\nWhich measure represents better the number of children in the sample?\nNon-central location measures The non-central location measures or quantiles divide the sample distribution into equal parts.\nThe most used are:\nQuartiles: Divide the distribution into 4 equal parts. There are 3 quartiles: $Q_1$ (25% accumulated), $Q_2$ (50% accumulated), $Q_3$ (75% accumulated).\nDeciles: Divide the distribution into 10 equal parts. There are 9 deciles: $D_1$ (10% accumulated),…, $D_9$ (90% accumulated).\nPercentiles: Divide the distribution into 100 equal parts. There are 99 percentiles: $P_1$ (1% accumulated),…, $P_{99}$ (99% accumulated).\nObserve that there is a correspondence between quartiles, deciles and percentiles. For example, the first quartile coincides with percentile 25, and the fourth decile coincides with percentile 40.\nQuantiles are calculated in a similar way to the median. The only difference lies in the cumulative relative frequency that corresponds to every quantile.\nExample. Using the data of the sample with the number of children of families, the cumulative relative frequencies were\n$$ \\begin{array}{rr} \\hline x_i \u0026amp; F_i \\newline \\hline 0 \u0026amp; 0.08\\newline 1 \u0026amp; 0.32\\newline 2 \u0026amp; 0.88\\newline 3 \u0026amp; 0.96\\newline 4 \u0026amp; 1\\newline \\hline \\end{array} $$\n$$ \\begin{aligned} F_{Q_1}=0.25 \u0026amp;\\Rightarrow Q_1 = 1 \\text{ child},\\newline F_{Q_2}=0.5 \u0026amp;\\Rightarrow Q_2 = 2 \\text{ children},\\newline F_{Q_3}=0.75 \u0026amp;\\Rightarrow Q_3 = 2 \\text{ children},\\newline F_{D_4}=0.4 \u0026amp;\\Rightarrow D_4 = 2 \\text{ children},\\newline F_{P_{92}}=0.92 \u0026amp;\\Rightarrow P_{92} = 3 \\text{ children}. \\end{aligned}$$\nDispersion statistics Dispersion or spread refers to the variability of data. 
So, dispersion statistics measure how the data values are scattered in general, or with respect to a central location measure.\nFor quantitative variables, the most important are:\nRange Interquartile range Variance Standard deviation Coefficient of variation Range Definition - Sample range. The sample range of a variable $X$ is the difference between the maximum and the minimum values in the sample.\n$$\\text{Range} = \\max_{x_i} -\\min_{x_i}$$\nThe range measures the largest variation among the sample data. However, it is very sensitive to outliers, as they appear at the ends of the distribution, and for that reason it is rarely used.\nInterquartile range The following measure avoids the problem of outliers and is more widely used.\nDefinition - Sample interquartile range. The sample interquartile range of a variable $X$ is the difference between the third and the first sample quartiles.\n$$\\text{IQR} = Q_3-Q_1$$\nThe interquartile range measures the spread of the 50% central data.\nBox plot The dispersion of a variable in a sample can be graphically represented with a box plot, which represents five descriptive statistics (minimum, quartiles and maximum) known as the five-number summary. It consists of a box, drawn from the lower to the upper quartile, that represents the interquartile range, and two segments, known as the lower and the upper whiskers. Usually the box is split in two by the median.\nThis chart is very helpful as it serves many purposes:\nIt serves to measure the spread of data as it represents the range and the interquartile range. It serves to detect outliers, which are the values outside the interval defined by the whiskers. It serves to measure the symmetry of the distribution, comparing the length of the boxes and whiskers above and below the median. Example. 
The chart below shows a box plot of newborn weights.\nTo create a box plot follow the steps below:\nCalculate the quartiles.\nDraw a box from the lower to the upper quartile.\nSplit the box with the median or second quartile.\nFor the whiskers first calculate two values called fences $f_1$ and $f_2$. The lower fence is the lower quartile minus one and a half times the interquartile range, and the upper fence is the upper quartile plus one and a half times the interquartile range:\n$$\\begin{aligned} f_1\u0026amp;=Q_1-1.5\\,\\text{IQR}\\newline f_2\u0026amp;=Q_3+1.5\\,\\text{IQR} \\end{aligned}$$\nThe fences define the interval where data are considered normal. Any value outside that interval is considered an outlier.\nFor the lower whisker draw a segment from the lower quartile to the lowest value in the sample greater than or equal to $f_1$, and for the upper whisker draw a segment from the upper quartile to the highest value in the sample lower than or equal to $f_2$.\nThe whiskers are not the fences. Finally, if there are outliers, draw a dot at every outlier. Example. The box plot for the sample with the number of children is shown below.\nDeviations from the mean Another way of measuring the spread of data is with respect to a central tendency measure, for example the mean.\nIn that case, the distance from every value in the sample to the mean is measured, which is called the deviation from the mean.\nIf deviations are big, the mean is less representative than when they are small.\nExample. 
The grades of 3 students in a course with subjects $A$, $B$ and $C$ are shown below.\n$$ \\begin{array}{cccc} \\hline A \u0026amp; B \u0026amp; C \u0026amp; \\bar x\\newline 0 \u0026amp; 5 \u0026amp; 10 \u0026amp; 5\\newline 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 5\\newline 5 \u0026amp; 5 \u0026amp; 5 \u0026amp; 5\\newline \\hline \\end{array} $$\nAll the students have the same mean, but, in which case does the mean represent better the course performance?\nVariance and standard deviation Definition \u0026ndash; Sample variance $s^2$. The sample variance of a variable $X$ is the average of the squared deviations from the mean.\n$$s^2 = \\frac{\\sum (x_i-\\bar x)^2n_i}{n} = \\sum (x_i-\\bar x)^2f_i$$\nIt can also be calculated with the formula\n$$s^2 = \\frac{\\sum x_i^2n_i}{n} -\\bar x^2= \\sum (x_i^2f_i)-\\bar x^2$$\nThe variance has the units of the variable squared, and to ease its interpretation it is common to calculate its square root.\nDefinition - Sample standard deviation $s$. The sample standard deviation of a variable $X$ is the square root of the variance.\n$$s = +\\sqrt{s^2}$$\nBoth variance and standard deviation measure the spread of data around the mean. When the variance or the standard deviation are small, the sample data are concentrated around the mean, and the mean is a good representative measure. In contrast, when variance or the standard deviation are high, the sample data are far from the mean, and the mean does not represent so well.\nStandard deviation small $\\Rightarrow$ Mean is representative Standard deviation big $\\Rightarrow$ Mean is unrepresentative Example. The following samples contains the grades of 2 students in 2 subjects\nWhich mean is more representative?\nExample - Non-grouped data. 
Using the data of the sample with the number of children of families, with mean $\\bar x= 1.76$ children, and adding a new column to the frequency table with the squared values,\n$$ \\begin{array}{rrr} \\hline x_i \u0026amp; n_i \u0026amp; x_i^2n_i \\newline \\hline 0 \u0026amp; 2 \u0026amp; 0 \\newline 1 \u0026amp; 6 \u0026amp; 6 \\newline 2 \u0026amp; 14 \u0026amp; 56\\newline 3 \u0026amp; 2 \u0026amp; 18\\newline 4 \u0026amp; 1 \u0026amp; 16 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 96 \\newline \\hline \\end{array}$$\n$$s^2 = \\frac{\\sum x_i^2n_i}{n}-\\bar x^2 = \\frac{96}{25}-1.76^2= 0.7424 \\mbox{ children}^2.$$\nand the standard deviation is $s=\\sqrt{0.7424} = 0.8616$ children.\nCompared to the range, that is 4 children, the standard deviation is not very large, so we can conclude that the dispersion of the distribution is small and consequently the mean, $\\bar x=1.76$ children, represents quite well the number of children of families of the sample.\nExample - Grouped data. Using the data of the sample with the heights of students and grouping heights in classes, we got a mean $\\bar x=174.67$ cm. 
The calculation of variance is the same as for non-grouped data but using the class marks.\n$$ \\begin{array}{crrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i^2n_i \\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; 48050\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; 217800\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 336875\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 239575\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 76050\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; 918350 \\newline \\hline \\end{array} $$\n$$s^2 = \\frac{\\sum x_i^2n_i}{n}-\\bar x^2 = \\frac{918350}{30}-174.67^2= 102.06 \\mbox{ cm}^2,$$\nand the standard deviation is $s=\\sqrt{102.06} = 10.1$ cm.\nThis value is quite small compared to the range of the variable, which goes from 150 to 200 cm, therefore the distribution of heights has little dispersion and the mean is very representative.\nCoefficient of variation Both variance and standard deviation have units, and that makes it difficult to interpret them, especially when comparing distributions of variables with different units.\nFor that reason it is also common to use the following dispersion measure that has no units.\nDefinition - Sample coefficient of variation $cv$. The sample coefficient of variation of a variable $X$ is the quotient between the sample standard deviation and the absolute value of the sample mean.\n$$cv = \\frac{s}{|\\bar x|}$$\nThe coefficient of variation measures the relative dispersion of data around the sample mean.\nAs it has no units, it is easier to interpret: The higher the coefficient of variation is, the higher the relative dispersion with respect to the mean and the less representative the mean is.\nThe coefficient of variation is very helpful to compare dispersion in distributions of different variables, even if variables have different units.\nExample. 
In the sample of the number of children, where the mean was $\\bar x=1.76$ children and the standard deviation was $s=0.8616$ children, the coefficient of variation is\n$$cv = \\frac{s}{|\\bar x|} = \\frac{0.8616}{|1.76|} = 0.49.$$\nIn the sample of heights, where the mean was $\\bar x=174.67$ cm and the standard deviation was $s=10.1$ cm, the coefficient of variation is\n$$cv = \\frac{s}{|\\bar x|} = \\frac{10.1}{|174.67|} = 0.06.$$\nThis means that the relative dispersion in the heights distribution is lower than in the number of children distribution, and consequently the mean of height is more representative than the mean of number of children.\nShape statistics They are measures that describe the shape of the distribution.\nIn particular, the most important aspects are:\nSymmetry: It measures the symmetry of the distribution with respect to the mean. The most used statistic is the coefficient of skewness.\nKurtosis: It measures the concentration of data around the mean of the distribution. The most used statistic is the coefficient of kurtosis.\nCoefficient of skewness Definition - Sample coefficient of skewness $g_1$. The sample coefficient of skewness of a variable $X$ is the average of the cubed deviations of values from the sample mean, divided by the cube of the standard deviation.\n$$g_1 = \\frac{\\sum (x_i-\\bar x)^3 n_i/n}{s^3} = \\frac{\\sum (x_i-\\bar x)^3 f_i}{s^3}$$\nThe coefficient of skewness measures the symmetry or skewness of the distribution, that is, how many values in the sample are above or below the mean and how far from it.\n$g_1=0$ indicates that there are the same number of values in the sample above and below the mean and equally deviated from it, and the distribution is symmetrical. $g_1\u0026lt;0$ indicates that there are more values above the mean than below it, but the values below are further from it, and the distribution is left-skewed (it has a longer tail to the left). 
$g_1\u0026gt;0$ indicates that there are more values below the mean than above it, but the values above are further from it, and the distribution is right-skewed (it has longer tail to the right). Example - Grouped data. Using the frequency table of the sample with the heights of students and adding a new column with the deviations from the mean $\\bar x = 174.67$ cm to cube, we get\n$$ \\begin{array}{crrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i-\\bar x \u0026amp; (x_i-\\bar x)^3 n_i \\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; -19.67 \u0026amp; -15221.00\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; -9.67 \u0026amp; -7233.85\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.33 \u0026amp; 0.40\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 10.33 \u0026amp; 7716.12\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 20.33 \u0026amp; 16805.14\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; \u0026amp; 2066.81 \\newline \\hline \\end{array} $$\n$$g_1 = \\frac{\\sum (x_i-\\bar x)^3n_i/n}{s^3} = \\frac{2066.81/30}{10.1^3} = 0.07.$$\nAs it is close to 0, that means that the distribution of heights is fairly symmetrical.\nCoefficient of kurtosis Definition - Sample coefficient of kurtosis $g_2$ The sample coefficient of kurtosis of a variable $X$ is the average of the deviations of values from the sample mean to the fourth power, divided by the standard deviation to the fourth power and minus 3.\n$$g_2 = \\frac{\\sum (x_i-\\bar x)^4 n_i/n}{s^4}-3 = \\frac{\\sum (x_i-\\bar x)^4 f_i}{s^4}-3$$\nThe coefficient of kurtosis measures the concentration of data around the mean and the length of tails of distribution. The normal (Gaussian bell-shaped) distribution is taken as a reference.\n$g_2=0$ indicates that the kurtosis is normal, that is, the concentration of values around the mean is the same than in a Gaussian bell-shaped distribution (mesokurtic). 
$g_2\u0026lt;0$ indicates that the kurtosis is less than normal, that is, the concentration of values around the mean is less than in a Gaussian bell-shaped distribution (platykurtic). $g_2\u0026gt;0$ indicates that the kurtosis is greater than normal, that is, the concentration of values around the mean is greater than in a Gaussian bell-shaped distribution (leptokurtic). Example - Grouped data. Using the frequency table of the sample with the heights of students and adding a new column with the deviations from the mean $\\bar x = 174.67$ cm to the fourth power, we get\n$$ \\begin{array}{rrrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i-\\bar x \u0026amp; (x_i-\\bar x)^4 n_i\\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; -19.67 \u0026amp; 299396.99\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; -9.67 \u0026amp; 69951.31\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.33 \u0026amp; 0.13\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 10.33 \u0026amp; 79707.53\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 20.33 \u0026amp; 341648.49\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; \u0026amp; 790704.45 \\newline \\hline \\end{array} $$\n$$g_2 = \\frac{\\sum (x_i-\\bar x)^4n_i/n}{s^4} - 3 = \\frac{790704.45/30}{10.1^4}-3 = -0.47.$$\nAs it is a negative value but not too far from 0, that means that the distribution of heights is a little bit platykurtic.\nAs we will see in the chapters of inferential statistics, many of the statistical tests can only be applied to normal (bell-shaped) populations.\nNormal distributions are symmetrical and mesokurtic, and therefore, their coefficients of skewness and kurtosis are equal to 0. So, a way of checking if a sample comes from a normal population is checking how far the coefficients of skewness and kurtosis are from 0.\nIn general, the normality of the population is rejected when $g_1$ or $g_2$ are outside the interval $[-2,2]$. 
In that case, it is common to apply a transformation to the variable to correct non-normality.\nNon-normal distributions Non-normal right-skewed distribution An example of right-skewed distribution is the household income.\nNon-normal left-skewed distribution An example of left-skewed distribution is the age at death.\nNon-normal bimodal distribution An example of left-skewed distribution is the age at death.\nVariable transformations In many cases, the raw sample data are transformed to correct non-normality of distribution or just to get a more appropriate scale.\nFor example, if we are working with heights in metres and a sample contains the following values:\n$$ 1.75 \\mbox{ m}, 1.65 \\mbox{ m}, 1.80 \\mbox{ m}, $$\nit is possible to avoid decimals by multiplying by 100, that is, changing from metres to centimetres:\n$$ 175 \\mbox{ cm}, 165 \\mbox{ cm}, 180 \\mbox{ cm}, $$\nAnd it is also possible to reduce the magnitude of data by subtracting the minimum value in the sample, in this case 165 cm:\n$$ 10 \\mbox{ cm}, 0 \\mbox{ cm}, 15 \\mbox{ cm}. $$\nIt is obvious that these data are easier to work with than the original ones. In essence, what has been done is to apply the following transformation to the data:\n$$Y= 100X-165$$\nLinear transformations One of the most common transformations is the linear transformation:\n$$Y=a+bX.$$\nFor a linear transformation, the mean and the standard deviation of the transformed variable are\n$$ \\begin{aligned} \\bar y \u0026amp;= a+ b\\bar x,\\newline s_{y} \u0026amp;= |b|s_{x} \\end{aligned} $$\nAdditionally, the coefficient of kurtosis does not change and the coefficient of skewness changes only its sign if $b$ is negative.\nStandardization and standard scores One of the most common linear transformations is the standardization.\nDefinition - Standardized variable and standard scores. 
The standardized variable of a variable $X$ is the variable that results from subtracting the mean from $X$ and dividing it by the standard deviation\n$$Z=\\frac{X-\\bar x}{s_{x}}.$$\nFor each value $x_i$ of the sample, the standard score is the value that results of applying the standardization transformation\n$$z_i=\\frac{x_i-\\bar x}{s_{x}}.$$\nThe standard score is the number of standard deviations a value is above or below the mean, and it is useful to avoid the dependency of the variable from its measurement units. This helps, for instance, to compare values from different variables or samples. The standardized variable always has mean 0 and standard deviation 1.\n$$\\bar z = 0 \\qquad s_{z} = 1$$\nExample. The grades of 5 students in 2 subjects are\n$$ \\begin{array}{rccccccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; \\newline \\hline X: \u0026amp; 2 \u0026amp; 5 \u0026amp; 4 \u0026amp; \\color{red} 8 \u0026amp; 6 \u0026amp; \\qquad \u0026amp; \\bar x = 5 \u0026amp; \\quad s_x = 2\\newline Y: \u0026amp; 1 \u0026amp; 9 \u0026amp; \\color{red} 8 \u0026amp; 5 \u0026amp; 2 \u0026amp; \\qquad \u0026amp; \\bar y = 5 \u0026amp; \\quad s_y = 3.16\\newline \\hline \\end{array} $$\nDid the fourth student get the same performance in subject $X$ than the third student in subject $Y$?\nIt might seem that both students had the same performance in every subject because they have the same grade, but in order to get the performance of every student relative to the group of students, the dispersion of grades in every subject must be considered. 
For that reason it is better to use the standard score as a measure of relative performance.\n$$ \\begin{array}{cccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; -1.50 \u0026amp; 0.00 \u0026amp; -0.50 \u0026amp; \\color{red}{1.50} \u0026amp; 0.50 \\newline Y: \u0026amp; -1.26 \u0026amp; 1.26 \u0026amp; \\color{red}{0.95} \u0026amp; 0.00 \u0026amp; -0.95\\newline \\hline \\end{array} $$\nThat is, the student with an 8 in $X$ is $1.5$ times the standard deviation above the mean of $X$, while the student with an 8 in $Y$ is only $0.95$ times the standard deviation above the mean of $Y$. Therefore, the first student had a higher performance in $X$ than the second in $Y$.\nFollowing with this example and considering both subjects, which is the best student?\nIf we only consider the sum of grades\n$$\\begin{array}{rccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; 2 \u0026amp; 5 \u0026amp; 4 \u0026amp; 8 \u0026amp; 6 \\newline Y: \u0026amp; 1 \u0026amp; 9 \u0026amp; 8 \u0026amp; 5 \u0026amp; 2 \\newline \\hline \\sum \u0026amp; 3 \u0026amp; \\color{red}{14} \u0026amp; 12 \u0026amp; 13 \u0026amp; 8 \\end{array} $$\nthe best student is the second one.\nBut if the relative performance is considered, taking the standard scores\n$$ \\begin{array}{rccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; -1.50 \u0026amp; 0.00 \u0026amp; -0.50 \u0026amp; 1.50 \u0026amp; 0.50 \\newline Y: \u0026amp; -1.26 \u0026amp; 1.26 \u0026amp; 0.95 \u0026amp; 0.00 \u0026amp; -0.95\\newline \\hline \\sum \u0026amp; -2.76 \u0026amp; 1.26 \u0026amp; 0.45 \u0026amp; \\color{red}{1.5} \u0026amp; -0.45 \\end{array} $$\nthe best student is the fourth one.\nNon-linear transformations Non-linear transformations are also common to correct non-normality of distributions.\nThe square transformation 
$Y=X^2$ compresses small values and expands large values. So, it is used to correct left-skewed distributions.\nThe square root transformation $Y=\\sqrt X$, the logarithmic transformation $Y= \\log X$ and the inverse transformation $Y=1/X$ compress large values and expand small values. So, they are used to correct right-skewed distributions.\nFactors Sometimes it is interesting to describe the frequency distribution of the main variable for different subsamples corresponding to the categories of another variable known as classificatory variable or factor.\nExample. Dividing the sample of heights by gender we get two subsamples\n$$ \\begin{array}{lll} \\hline \\mbox{Females} \u0026amp; \u0026amp; 173, 158, 174, 166, 162, 177, 165, 154, 166, 182, 169, 172, 170, 168. \\newline \\mbox{Males} \u0026amp; \u0026amp; 179, 181, 172, 194, 185, 187, 198, 178, 188, 171, 175, 167, 186, 172, 176, 187. \\newline \\hline \\end{array} $$\nComparing distributions for the levels of a factor Usually factors allow us to compare the distribution of the main variable for every category of the factor.\nExample. 
The following charts allow to compare the distribution of heights according to the gender.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"2891575ebbe9e976e5448211d8b3c292","permalink":"/en/teaching/statistics/manual/descriptive-statistics/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/descriptive-statistics/","section":"teaching","summary":"Descriptive Statistics is the part of Statistics in charge of representing, analysing and summarizing the information contained in the sample.\nAfter the sampling process, this is the next step in every statistical study and usually consists of:","tags":["Statistics","Biostatistics","Descriptive-Statistics"],"title":"Descriptive Statistics","type":"book"},{"authors":null,"categories":["Calculus","Geometry"],"content":"Scalars and Vectors Scalars Some phenomena of Nature can be described by a number and a unit of measurement.\nDefinition - Scalar. A scalar is a number that expresses a magnitude without direction. Example. The height or weight of a person, the temperature of a gas or the time it takes a vehicle to travel a distance.\nHowever, there are other phenomena that cannot be described adequately by a scalar. If, for instance, a sailor wants to head for seaport and only knows the intensity of wind, he will not know what direction to take. The description of wind requires two elements: intensity and direction.\nVectors Definition - Vector. A vector is a number that expresses a magnitude and has associated an orientation and a sense. Example. The velocity of a vehicle or the force applied to an object.\nGeometrically, a vector is represented by an directed line segment, that is, an arrow.\nVector representation An oriented segment can be located in different places in a Cartesian space. 
However, regardless of where it is located, if the length and the direction of the segment do not change, the segment always represents the same vector.\nThis allows us to represent all vectors with the same origin, the origin of the Cartesian coordinate system. Thus, a vector can be represented by the Cartesian coordinates of its final end in any Euclidean space.\nVector from two points Given two points $P$ and $Q$ of a Cartesian space, the vector that starts at $P$ and ends at $Q$ has coordinates $\\vec{PQ}=Q-P$.\nExample. Given the points $P=(1,1)$ and $Q=(3,4)$ in the real plane $\\mathbb{R}^2$, the coordinates of the vector that starts at $P$ and ends at $Q$ are $$\\vec{PQ} = Q-P = (3,4)-(1,1) = (3-1,4-1) = (2,3).$$\nModule of a vector Definition - Module of a vector. Given a vector $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the module of $\\mathbf{v}$ is $$|\\mathbf{v}| = \\sqrt{v_1^2+ \\cdots + v_n^2}.$$ The module of a vector coincides with the length of the segment that represents the vector.\nExamples. Let $\\mathbf{u}=(3,4)$ be a vector in $\\mathbb{R}^2$, then its module is $$|\\mathbf{u}| = \\sqrt{3^2+4^2} = \\sqrt{25} = 5$$\nLet $\\mathbf{v}=(4,7,4)$ be a vector in $\\mathbb{R}^3$, then its module is $$|\\mathbf{v}| = \\sqrt{4^2+7^2+4^2} = \\sqrt{81} = 9$$\nUnit vectors Definition - Unit vector. A vector $\\mathbf{v}$ in $\\mathbb{R}^n$ is a unit vector if its module is one, that is, $\\vert \\mathbf{v}\\vert=1$. The unit vectors with the direction of the coordinate axes are of special importance and they form the standard basis.\nIn $\\mathbb{R}^2$ the standard basis is formed by two vectors $\\mathbf{i}=(1,0)$ and $\\mathbf{j}=(0,1)$.\nIn $\\mathbb{R}^3$ the standard basis is formed by three vectors $\\mathbf{i}=(1,0,0)$, $\\mathbf{j}=(0,1,0)$ and $\\mathbf{k}=(0,0,1)$.\nSum of two vectors Definition - Sum of two vectors. 
Given two vectors $\\mathbf{u}=(u_1,\\cdots,u_n)$ and $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the sum of $\\mathbf{u}$ and $\\mathbf{v}$ is\n$$\\mathbf{u}+\\mathbf{v} = (u_1+v_1,\\ldots, u_n+v_n).$$\nExample. Let $\\mathbf{u}=(3,1)$ and $\\mathbf{v}=(2,3)$ be two vectors in $\\mathbb{R}^2$, then the sum of them is $$\\mathbf{u}+\\mathbf{v} = (3+2,1+3) = (5,4).$$\nProduct of a vector by a scalar Definition - Product of a vector by a scalar. Given a vector $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, and a scalar $a\\in \\mathbb{R}$, the product of $\\mathbf{v}$ by $a$ is\n$$a\\mathbf{v} = (av_1,\\ldots, av_n).$$\nExample. Let $\\mathbf{v}=(2,1)$ be a vector in $\\mathbb{R}^2$ and $a=2$ a scalar, then the product of $a$ by $\\mathbf{v}$ is $$a\\mathbf{v} = 2(2,1) = (4,2).$$\nExpressing a vector as a linear combination of the standard basis The sum of vectors and the product of a vector by a scalar allow us to express any vector as a linear combination of the standard basis.\nIn $\\mathbb{R}^3$, for instance, a vector with coordinates $\\mathbf{v}=(v_1,v_2,v_3)$ can be expressed as the linear combination $$\\mathbf{v}=(v_1,v_2,v_3) = v_1\\mathbf{i}+v_2\\mathbf{j}+v_3\\mathbf{k}.$$\nDot product of two vectors Definition - Dot product of two vectors. Given the vectors $\\mathbf{u}=(u_1,\\cdots,u_n)$ and $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the dot product of $\\mathbf{u}$ and $\\mathbf{v}$ is\n$$\\mathbf{u}\\cdot \\mathbf{v} = u_1v_1 + \\cdots + u_nv_n.$$\nExample. Let $\\mathbf{u}=(3,1)$ and $\\mathbf{v}=(2,3)$ be two vectors in $\\mathbb{R}^2$, then the dot product of them is\n$$\\mathbf{u}\\cdot\\mathbf{v} = 3\\cdot 2 +1\\cdot 3 = 9.$$\nTheorem - Dot product. Given two vectors $\\mathbf{u}$ and $\\mathbf{v}$ in $\\mathbb{R}^n$, it holds that\n$$\\mathbf{u}\\cdot\\mathbf{v} = |\\mathbf{u}||\\mathbf{v}|\\cos\\alpha$$\nwhere $\\alpha$ is the angle between the vectors.\nParallel vectors Definition - Parallel vectors. 
Two vectors $\\mathbf{u}$ and $\\mathbf{v}$ are parallel if there is a scalar $a\\in\\mathbb{R}$ such that\n$$\\mathbf{u} = a\\mathbf{v}.$$\nExample. The vectors $\\mathbf{u}=(-4,2)$ and $\\mathbf{v}=(2,-1)$ in $\\mathbb{R}^2$ are parallel, as there is a scalar $-2$ such that $$\\mathbf{u}= (-4,2) = -2(2,-1) = -2\\mathbf{v}.$$\nOrthogonal and orthonormal vectors Definition - Orthogonal and orthonormal vectors. Two vectors $\\mathbf{u}$ and $\\mathbf{v}$ are orthogonal if their dot product is zero,\n$$\\mathbf{u}\\cdot \\mathbf{v} = 0.$$\nIf in addition both vectors are unit vectors, $\\vert\\mathbf{u}\\vert=\\vert\\mathbf{v}\\vert=1$, then the vectors are orthonormal.\nOrthogonal vectors are perpendicular, that is, the angle between them is a right angle. Examples. The vectors $\\mathbf{u}=(2,1)$ and $\\mathbf{v}=(-2,4)$ in $\\mathbb{R}^2$ are orthogonal, as $$\\mathbf{u}\\cdot\\mathbf{v} = 2\\cdot (-2) +1\\cdot 4 = 0,$$ but they are not orthonormal since $|\\mathbf{u}| = \\sqrt{2^2+1^2} \\neq 1$ and $|\\mathbf{v}| = \\sqrt{(-2)^2+4^2} \\neq 1$.\nThe vectors $\\mathbf{i}=(1,0)$ and $\\mathbf{j}=(0,1)$ in $\\mathbb{R}^2$ are orthonormal, as $$\\mathbf{i}\\cdot\\mathbf{j} = 1\\cdot 0 +0\\cdot 1 = 0, \\quad |\\mathbf{i}| = \\sqrt{1^2+0^2} = 1, \\quad |\\mathbf j| = \\sqrt{0^2+1^2} = 1.$$\nLines Vectorial equation of a straight line Definition - Vectorial equation of a straight line. Given a point $P=(p_1,\\ldots,p_n)$ and a vector $\\mathbf{v}=(v_1,\\ldots,v_n)$ of $\\mathbb{R}^n$, the vectorial equation of the line $l$ that passes through the point $P$ with the direction of $\\mathbf{v}$ is\n$$l: X= P + t\\mathbf{v} = (p_1,\\ldots,p_n)+t(v_1,\\ldots,v_n) = (p_1+tv_1,\\ldots,p_n+tv_n)$$\nwith $t\\in\\mathbb{R}.$\nExample. Let $l$ be the line of $\\mathbb{R}^3$ that goes through $P=(1,1,2)$ with the direction of $\\mathbf{v}=(3,1,2)$, then the vectorial equation of $l$ is $$ l : X= P + t\\mathbf{v} = (1,1,2)+t(3,1,2) = (1+3t,1+t,2+2t)\\quad t\\in\\mathbb{R}. 
$$\nParametric and Cartesian equations of a line From the vectorial equation of a line $l: X=P + t\\mathbf{v}=(p_1+tv_1,\\ldots,p_n+tv_n)$ it is easy to obtain the coordinates of the points of the line with $n$ parametric equations\n$$x_1(t)=p_1+tv_1, \\ldots, x_n(t)=p_n+tv_n$$\nfrom where, if $\\mathbf{v}$ is a vector with non-null coordinates ($v_i\\neq 0$ $\\forall i$), we can solve for $t$ and equate the equations, getting the Cartesian equations\n$$\\frac{x_1-p_1}{v_1}=\\cdots = \\frac{x_n-p_n}{v_n}$$\nExample. Given a line with vectorial equation $l: X=(1,1,2)+t(3,1,2) =(1+3t,1+t,2+2t)$ in $\\mathbb{R}^3$, its parametric equations are\n$$x(t) = 1+3t, \\quad y(t)=1+t, \\quad z(t)=2+2t,$$ and the Cartesian equations are $$\\frac{x-1}{3}=\\frac{y-1}{1}=\\frac{z-2}{2}$$\nPoint-slope equation of a line in the plane In the particular case of the real plane $\\mathbb{R}^2$, if we have a line with vectorial equation $l: X=P+t\\mathbf{v}=(x_0,y_0)+t(a,b) = (x_0+ta,y_0+tb)$, its parametric equations are\n$$x(t)=x_0+ta,\\quad y(t)=y_0+tb$$\nand its Cartesian equation is\n$$\\frac{x-x_0}{a} = \\frac{y-y_0}{b}.$$\nFrom this, moving $b$ to the other side of the equation, we get $$y-y_0 = \\frac{b}{a}(x-x_0),$$ or renaming $m=b/a$,\n$$y-y_0=m(x-x_0).$$\nThis equation is known as the point-slope equation of the line.\nSlope of a line in the plane Definition - Slope of a line in the plane. Given a line $l: X=P+t\\mathbf{v}$ in the real plane $\\mathbb{R}^2$, with direction vector $\\mathbf{v}=(a,b)$, the slope of $l$ is $b/a$. Recall that given two points $P=(x_1,y_1)$ and $Q=(x_2,y_2)$ on the line $l$, we can take as a direction vector the vector from $P$ to $Q$, with coordinates $\\vec{PQ}=Q-P=(x_2-x_1,y_2-y_1)$. 
Thus, the slope of $l$ is $\\dfrac{y_2-y_1}{x_2-x_1}$, that is, the ratio between the changes in the vertical and horizontal axes.\nPlanes Vector equation of a plane in space To get the equation of a plane in the real space $\\mathbb{R}^3$ we can take a point of the plane $P=(x_0,y_0,z_0)$ and an orthogonal vector to the plane $\\mathbf{v}=(a,b,c)$. Then, any point $Q=(x,y,z)$ of the plane satisfies that the vector $\\vec{PQ} = (x-x_0,y-y_0,z-z_0)$ is orthogonal to $\\mathbf{v}$, and therefore their dot product is zero.\nDefinition - Vector equation of a plane in space. Given a point $P=(x_0,y_0,z_0)$ and a vector $\\mathbf{v}=(a,b,c)$ in the real space $\\mathbb{R}^3$, the vector equation of the plane that passes through $P$ orthogonal to $\\mathbf{v}=(a,b,c)$ is\n$$ \\begin{align*} \\vec{PQ}\\cdot\\mathbf{v} \u0026amp;= (x-x_0,y-y_0,z-z_0)\\cdot(a,b,c) =\\newline \u0026amp;= a(x-x_0)+b(y-y_0)+c(z-z_0) = 0. \\end{align*} $$\nScalar equation of a plane in space From the vector equation of the plane we can get\n$$a(x-x_0)+b(y-y_0)+c(z-z_0) = 0 \\Leftrightarrow ax+by+cz=ax_0+by_0+cz_0,$$\nthat, renaming $d=ax_0+by_0+cz_0$, can be written as\n$$ax+by+cz=d,$$\nand is known as the scalar equation of the plane.\nExample. Given the point $P=(2,1,1)$ and the vector $\\mathbf{v}=(2,1,2)$, the vector equation of the plane that passes through $P$ and is orthogonal to $\\mathbf{v}$ is\n$$(x-2,y-1,z-1)\\cdot(2,1,2)=2(x-2)+(y-1)+2(z-1)=0,$$\nand its scalar equation is\n$$2x+y+2z=7.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"22fd1e7f4350c0e3490590a757816c43","permalink":"/en/teaching/calculus/manual/analytic-geometry/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/analytic-geometry/","section":"teaching","summary":"Scalars and Vectors Scalars Some phenomena of Nature can be described by a number and a unit of measurement.\nDefinition - Scalar. 
A scalar is a number that expresses a magnitude without direction.","tags":null,"title":"Analytic Geometry","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Classify the following variables\nDaily hours of exercise. Nationality. Blood pressure. Severity of illness. Number of sport injuries in a year. Daily calorie intake. Size of clothing. Subjects passed in a course. Solution Quantitative continuous. Qualitative nominal. Quantitative continuous. Qualitative ordinal. Quantitative discrete. Quantitative continuous. Qualitative ordinal. Quantitative discrete. Exercise 2 The number of injuries suffered by the members of a soccer team in a league were\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Compute:\nConstruct the frequency distribution table of the sample. Draw the bar chart of the sample and the polygon. Draw the cumulative frequency bar chart and polygon. Solution Injuries $n_i$ $f_i$ $N_i$ $F_i$ 0 6 0.2500 6 0.2500 1 11 0.4583 17 0.7083 2 5 0.2083 22 0.9167 3 2 0.0833 24 1.0000 3. Exercise 3 A survey about the daily number of medicines consumed by people over 70 shows the following results:\n3 1 2 2 0 1 4 2 3 5 1 3 2 3 1 4 2 4 3 2 3 5 0 1 2 0 2 3 0 1 1 5 3 4 2 3 0 1 2 3 Construct the frequency distribution table of the sample. Draw the bar chart of the sample and the polygon. Draw the cumulative relative frequency bar chart and polygon. Solution Medicines $n_i$ $f_i$ $N_i$ $F_i$ 1 8 0.200 13 0.325 2 10 0.250 23 0.575 3 10 0.250 33 0.825 4 4 0.100 37 0.925 5 3 0.075 40 1.000 3. Exercise 4 In a survey about the dependency of older people, 23 persons over 75 years were asked about the help they need in daily life. The answers were\nB D A B C C B C D E A B C E A B C D B B A A B where the meanings of letters are:\nA No help. B Help climbing stairs. C Help climbing stairs and getting up from a chair or bed. D Help climbing stairs, getting up and dressing. 
E Help for almost everything.\nConstruct the frequency distribution table and a suitable chart.\nSolution Help $n_i$ $f_i$ $N_i$ $F_i$ A 5 0.2174 5 0.2174 B 8 0.3478 13 0.5652 C 5 0.2174 18 0.7826 D 3 0.1304 21 0.9130 E 2 0.0870 23 1.0000 Exercise 5 The number of people treated in the emergency service of a hospital every day of November was\n15 23 12 10 28 7 12 17 20 21 18 13 11 12 26 30 6 16 19 22 14 17 21 28 9 16 13 11 16 20 Construct the frequency distribution table of the sample. Draw a suitable chart for the frequency distribution. Draw a suitable chart for the cumulative frequency distribution. Solution People $n_i$ $f_i$ $N_i$ $F_i$ [5,10] 4 0.1333 4 0.1333 (10,15] 9 0.3000 13 0.4333 (15,20] 9 0.3000 22 0.7333 (20,25] 4 0.1333 26 0.8667 (25,30] 4 0.1333 30 1.0000 3. Exercise 6 The following frequency distribution table represents the distribution of time (in min) required by people attended in a medical dispensary.\n$$ \\begin{array}{|c|c|c|c|c|} \\hline \\mbox{Time} \u0026amp; n_{i} \u0026amp; f_{i} \u0026amp; N_{i} \u0026amp; F_{i}\\newline \\hline \\left[ 0,5\\right) \u0026amp; 2 \u0026amp; \u0026amp; \u0026amp; \\newline \\hline \\left[ 5,10\\right) \u0026amp; \u0026amp; \u0026amp; 8 \u0026amp; \\newline \\hline \\left[ 10,15\\right) \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.7 \\newline \\hline \\left[ 15,20\\right) \u0026amp; 6 \u0026amp; \u0026amp; \u0026amp;\\newline \\hline \\end{array} $$\nComplete the table. Draw the ogive. 
Solution $$ \\begin{array}{|c|c|c|c|c|} \\hline \\mbox{Time} \u0026amp; n_{i} \u0026amp; f_{i} \u0026amp; N_{i} \u0026amp; F_{i}\\newline \\hline \\left[ 0,5\\right) \u0026amp; 2 \u0026amp; 0.1 \u0026amp; 2 \u0026amp; 0.1 \\newline \\hline \\left[ 5,10\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 8 \u0026amp; 0.4 \\newline \\hline \\left[ 10,15\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 14 \u0026amp; 0.7 \\newline \\hline \\left[ 15,20\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 20 \u0026amp; 1\\newline \\hline \\end{array} $$\nExercise 7 The following table represents the frequency distribution of the yearly uses of a health insurance in a sample of clients of a insurance company.\nuses clients 0 4 1 8 2 6 3 3 4 2 5 1 7 1 Draw the box plot. Study the symmetry of the distribution.\nSolution Exercise 8 The box plots below correspond to the age of a sample of people by marital status.\nWhich group has higher ages? Which group has lower central dispersion? Which groups have outliers? At which group is the age distribution more asymmetric? Solution Widowers. Divorced. Widowers and divorced. Divorced. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"053ef795366cc6d9468a03875df23d5a","permalink":"/en/teaching/statistics/problems/frequency_charts/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/frequency_charts/","section":"teaching","summary":"Exercise 1 Classify the following variables\nDaily hours of exercise. Nationality. Blood pressure. Severity of illness. Number of sport injuries in a year. Daily calorie intake. Size of clothing. Subjects passed in a course.","tags":["Frequencies","Charts"],"title":"Problems of Frequency Tables and Charts","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Excel is a spreadsheet application that is part of the Microsoft Office suite.\nWhat is a spreadsheet? 
A spreadsheet is a program that allows the user to enter data and make calculations with them in a grid layout.\nThere are a lot of programs for managing spreadsheets but the best-known are Excel, in the Microsoft Office suite, and Calc, in the LibreOffice suite. Although Calc is opensource, with all the advantages associated therewith, Excel is by far the most widespread and mature spreadsheet, thus this manual covers Excel 2010. However, some of the procedures and methods explained in this manual are also valid for Calc.\nExcel 2010 main window The figure below shows a screenshot of the Excel 2010 main window where the different parts of the window have been highlighted.\nExcel 2010 ribbon The top ribbon of Excel 2010 contains a lot of buttons that perform different actions. These buttons are arranged in panels, and the panels are arranged in tabs. The main ribbon tabs are:\nFile – Performs file management tasks (new file, open file, save file, print file, etc.). It also contains general configuration options and help.\nHome – Common tools (clipboard, fonts, alignment, numbers format, insert rows and columns, etc.)\nInsert – Insert objects in the sheet (tables, illustrations, charts, hyperlinks, text, equations, etc.)\nPage Layout – Configure the printing (page setup, scale, themes, etc. )\nFormulas – Functions arranged in categories and formula auditing.\nData – Working with databases (import data, connection with databases, sort and filter data, data validation, etc.)\nReview – Spelling, commenting, protecting and sharing sheets.\nView – How Excel appears on screen (custom windows, grids lines, zoom, windows, etc. 
Does not affect printing).\nContextual tabs These tabs only appear in some contexts, as for example, when creating a chart or a picture.\nChart design Allows to select the type of chart.\nChart layout Allows the user to insert and configure some parts of charts (title, axis, leyend, gridlines, etc.)\nChart format Allows the user to change the aspect of charts (height, width, font, colors, background, etc.)\nPicture Allows to modify images (borders, rotation, crop, color, filters, special effects, etc.)\nIn addition to these tabs, users can create their own tabs and customise them with buttons at their convenience.\nThere is also a quick access toolbar just above the ribbon that can be customised with the most common buttons.\nAccess dialogs When you click the right bottom corner of any panel, the corresponding dialog is shown where all the related options are available.\nExample. Figure below shows the font dialog with all the options related to fonts (font family, font style, font size, etc.)\nContextual menu Clicking the right button of the mouse (right-clicking) a contextual menu is shown with some buttons or options to perform actions in that context. This menu has different options depending on the part of the windows that is clicked.\nExample. Figure below shows the contextual menu showed right-clicking any cell.\nWorkbooks, worksheets, rows, columns and cells An Excel file is a workbook with several worksheets that are two dimensional tables divided in columns and rows. The intersection of a column with a row is a cell that is where data are entered. Sheets have a maximum of 16,384 columns and 1,048,576 rows.\nEach worksheet has a name and they are arranged in tabs at the bottom. Columns and rows also have names; columns are named with letters at the top of the column and rows with numbers to the left of the row. 
This way each cell is identified by the name of the worksheet, the name of the column and the name of the row where it is located, and cell names follow the pattern: name-of-worksheet ! column-name row-name. However, to refer to any cell in the active worksheet, the worksheet name may be omitted.\nExample. The name of the selected cell in the figure below is Sheet1!C4.\nThe names of rows and columns cannot be changed, but worksheet names can be changed by double-clicking on the name and typing the new name.\nRanges of cells A range of cells is a rectangular block of adjacent cells that is identified by the top-left cell and the bottom-right cell separated by a colon, following the pattern top-left-cell-name:bottom-right-cell-name.\nExample. In the figure below the range B3:E5 is selected.\nSelecting cells, rows, columns, ranges and worksheets To select a cell just click it. To select a row click the header of the row or press the keys Shift+Spacebar. To select a column click the header of the column or press the keys Ctrl+Spacebar. To select a range click one corner cell and drag the cursor over the desired cells. To select the whole worksheet click the top-left corner of the worksheet or press the keys Ctrl+A.\nExample. The animation below shows how to select cell C3, then row 3, then column C, then range B3:D7 and finally the whole worksheet.\nData edition Insert data Data are entered into the cells by activating the cell (clicking it) and typing directly in the cell or in the input bar.\nExample. The animation below shows how to enter the text \u0026lsquo;Excel\u0026rsquo; in cell B2 and the number 2010 in cell C2, and then how to change the number of cell C2 to 2013.\nExcel has a smart autocomplete feature that proposes some options for completing the typed data.\nDelete data To delete the content of a cell or a range of cells simply select it and press the Delete key. 
It is also possible to delete the cell contents with the button Clear All.\nRemove cells, rows, columns and worksheets To remove a whole cell (not only the content), right-click the cell and select the option Delete.... In the dialog that appears select Shift cells left if you want the cells to the left of the removed cell to move to the left to fill the gap, or Shift cells up if you want the cells below the removed cell to move up to fill the gap.\nTo remove a whole row, right-click the header of the row and select the option Delete....\nTo remove a whole column, right-click the header of the column and select the option Delete....\nTo remove a worksheet, right-click the tab with the name of the worksheet and select the option Delete.... Warning: Removing worksheets cannot be undone!\nExample. This shows how to remove a cell, a row, a column and a worksheet.\nInsert cells, rows, columns and worksheets To insert a new cell in a position, right-click the current cell in that position and select the option Insert.... In the dialog that appears select Shift cells right if you want to move the cells to the right to make a gap for the new cell, or Shift cells down if you want to move the cells down to make a gap for the new cell.\nTo insert a new row, right-click the header of the row above which you want to insert the new row and select Insert.\nTo insert a new column, right-click the header of the column to the left of which you want to insert the new column and select Insert.\nTo insert a new worksheet, right-click the tab with the name of the worksheet to the left of which you want to insert the new worksheet and select Insert. In the dialog that appears select `Worksheet\u0026rsquo;.\nExample. 
The animation below shows how to insert a cell, a row, a column and a worksheet.\nCut, copy and paste Like in many other Windows applications, you can use the clipboard to cut, copy and paste cells, rows, columns and ranges contents.\nTo cut or copy a cell, row, column or range, right-click it and select the option Cut or Copy respectively, or press the keys Ctrl+x or Ctrl+c respectively. Both options copy the content of the cell, row, column or range to the clipboard, but the difference between cut and copy is that cut deletes the content from the current cell, row, column or range, while copy does not.\nTo paste the content of the clipboard in a new cell, row, column or range, select the cell or the first cell of the row, column or range and click the button Paste or press the keys Ctrl+v.\nExample. The animation below shows how to copy and paste the content of a cell, a row, a column and a range and a worksheet.\nAutofill A useful feature of Excel is the autofill of cells following a series or pattern. In some cases, like for example dates, it is enough to write the content of the first cell and then click the bottom-right corner of the cell and drag the cursor over the column or row to fill the cells with the subsequent dates.\nFor numbers or text, this action replicates the content of the first cell in the others. To autofill with a series of numbers it is necessary to enter the first two numbers of the series in two consecutive cells, then select both cells, click the bottom-right corner and drag the cursor over the column or row to fill the cells with the numbers following in the series.\nExample. The animation below shows how to replicate the content of cell A1 to range A2:A10, then how to auto fill the range B1:B10 with the dates following date in cell B1, and finally how to auto fill the range C1:C10 with the series of even numbers.\nUndo and redo In the quick access toolbar there are the buttons Undo and Redo. 
The Undo button undoes the last data editing action performed and the Redo button reverses the last undone action. If you press the Undo button $n$ times, it undoes the last $n$ actions, and the same applies to the Redo button.\nExample. The animation below shows how to remove the content of cell B2, then change the content of cell C2 two times, then undo those actions and finally redo the same actions.\nColumn and row sizing Column width and row height can be easily changed. To change the width of a column click the line between the column you want to resize and the next column in the column header, and then drag the mouse pointer to increase or reduce the column width. If you double-click this line the column width will auto resize to the width of the widest cell content in the column.\nIn a similar way, to change the height of a row click the line between the row you want to resize and the next row in the row header, and then drag the mouse pointer to increase or reduce the row height. If you double-click this line the row height will auto resize to the height of the tallest cell content in the row.\nExample. The animation below shows how to resize the width of column C and the height of row 3 to fit the content of cell C3.\nFile management Data of workbooks are stored in files. Although Excel makes backup copies of your work regularly, it is good practice to save your work to a file frequently.\nSave a file To save the content of a workbook in a file press the tab File and select the option Save. In the dialog that appears type the file name and select the storage unit and folder where you want to save the file. The default extension for Excel 2010 file names is xlsx.\nOpen a file To open an Excel file press the tab File and select the option Open. 
In the dialog that appears select the storage unit and folder where the file is saved and the file to open, and press the button Open.\nCreate a new workbook To create a new workbook press the tab File and select the option New. In the dialog that appears select Blank workbook. It is also possible to create new workbooks from predefined templates.\nClose a workbook To close an open workbook press the tab File and select the option Close. If the last changes in the workbook haven\u0026rsquo;t been saved, a warning will appear allowing you to save the file before closing it.\nExporting and importing data Excel can export and import data in many formats. One of the most common formats is csv (comma separated values). In this format data is saved in a plain text file, one row per line, with columns separated by commas or semicolons.\nExport to csv format To export a worksheet to a csv format file, click the option Save as of the ribbon\u0026rsquo;s File tab. In the dialog that appears select the option CSV (Comma delimited) (*.csv) from the drop-down list Save as type, give a name to the file, select the folder where to save it and click OK.\nExample. The animation below shows how to export a worksheet with a students database to a csv format file.\nImport from csv format To import a csv format file click the option Open of the ribbon\u0026rsquo;s File tab. In the dialog that appears click the button to the right of the File name box and select the option Text Files (*.prn;*.txt;*.csv), select the csv format file and click OK.\nIf you want more control over the import process, click the From Text button of the Get External Data group in the ribbon\u0026rsquo;s Data tab. In the dialog that appears select the csv format file and click the Import button. 
This brings up another dialog where you can select whether fields are delimited by a special character or have a fixed number of characters, the delimiter character (Tab, Semicolon, Comma, Space or other), and the data format of every column (General, Text or Date). After that click the Finish button and in the dialog that appears select the cell where to put the imported data and click OK.\nExample. The animation below shows how to import the csv format file with the students database of the previous example.\nGetting help One of the most useful features of Microsoft Office programs is their built-in help system. To get help about any issue in Excel click the option Help in the Help tab of the ribbon, and then click Microsoft Office Help. This shows a browser where you can enter some keywords and Excel will search topics related to these words and present the search results in a list. Clicking the desired topic will show you help info about that topic.\nExample. The figure below shows the help search results for the word \u0026ldquo;cell\u0026rdquo;.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"0ef4310edc3a4ddaec8751dec5ce4428","permalink":"/en/teaching/excel/manual/introduction/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/introduction/","section":"teaching","summary":" ","tags":["Excel"],"title":"Introduction","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"In the last chapter we saw how to describe the distribution of a single variable in a sample. However, in most cases, studies require describing several variables that are often related. For instance, a nutritional study should consider all the variables that could be related to the weight, such as height, age, gender, smoking, diet, physical exercise, etc.\nTo understand a phenomenon that involves several variables, it is not enough to study every variable on its own. 
We have to study all the variables together to describe how they interact and the type of relation among them.\nUsually in a dependency study there is a dependent variable $Y$ that is supposed to be influenced by a set of variables $X_1,\\ldots,X_n$ known as independent variables. The simplest case is a simple dependency study, where there is only one independent variable; that is the case covered in this chapter.\nJoint distribution Joint frequencies To study the relation between two variables $X$ and $Y$, we have to study the joint distribution of the two-dimensional variable $(X,Y)$, whose values are pairs $(x_i,y_j)$ where the first element is a value of $X$ and the second a value of $Y$.\nDefinition - Joint sample frequencies. Given a sample of $n$ values and a two-dimensional variable $(X,Y)$, for every value $(x_i,y_j)$ of the variable we define:\nAbsolute frequency $n_{ij}$: Is the number of times that the pair $(x_i,y_j)$ appears in the sample. Relative frequency $f_{ij}$: Is the proportion of times that the pair $(x_i,y_j)$ appears in the sample. $$f_{ij}=\\frac{n_{ij}}{n}.$$\nFor two-dimensional variables, cumulative frequencies make no sense. 
Joint frequency distribution The values of the two-dimensional variable together with their frequencies are known as the joint frequency distribution, and are represented in a joint frequency table.\n$$\\begin{array}{|c|ccccc|} \\hline X\\backslash Y \u0026amp; y_1 \u0026amp; \\cdots \u0026amp; y_j \u0026amp; \\cdots \u0026amp; y_q \\newline \\hline x_1 \u0026amp; n_{11} \u0026amp; \\cdots \u0026amp; n_{1j} \u0026amp; \\cdots \u0026amp; n_{1q} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \\newline x_i \u0026amp; n_{i1} \u0026amp; \\cdots \u0026amp; n_{ij} \u0026amp; \\cdots \u0026amp; n_{iq} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \\newline x_p \u0026amp; n_{p1} \u0026amp; \\cdots \u0026amp; n_{pj} \u0026amp; \\cdots \u0026amp; n_{pq} \\newline \\hline \\end{array}$$\nExample (grouped data). The heights (in cm) and weights (in kg) of a sample of 30 students are:\n(179,85), (173,65), (181,71), (170,65), (158,51), (174,66), (172,62), (166,60), (194,90), (185,75), (162,55), (187,78), (198,109), (177,61), (178,70), (165,58), (154,50), (183,93), (166,51), (171,65), (175,70), (182,60), (167,59), (169,62), (172,70), (186,71), (172,54), (176,68), (168,67), (187,80). 
The joint frequency table is\n$$\\begin{array}{|c||c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \\ \\newline \\hline \\end{array}$$\nScatter plot The joint frequency distribution can be represented graphically with a scatter plot, where data is displayed as a collection of points on an $XY$ coordinate system.\nUsually the independent variable is represented on the $X$ axis and the dependent variable on the $Y$ axis. For every data pair $(x_i,y_j)$ in the sample a dot is drawn on the plane with those coordinates.\nThe result is a set of points that is usually known as a point cloud.\nExample. 
The scatter plot below represent the distribution of heights and weights of the previous sample.\nThe shape of the point cloud in a scatter plot gives information about the type of relation between the variables.\nMarginal frequency distributions The frequency distributions of each variable of the two-dimensional variable are known as marginal frequency distributions.\nWe can get the marginal frequency distributions from the joint frequency table by adding frequencies by rows and columns.\n$$\\begin{array}{|c|ccccc|c|} \\hline X\\backslash Y \u0026amp; y_1 \u0026amp; \\cdots \u0026amp; y_j \u0026amp; \\cdots \u0026amp; y_q \u0026amp; \\color{red}{n_x} \\newline \\hline x_1 \u0026amp; n_{11} \u0026amp; \\cdots \u0026amp; n_{1j} \u0026amp; \\cdots \u0026amp; n_{1q} \u0026amp; \\color{red}{n_{x_1}} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\downarrow + \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\color{red}{\\vdots} \\newline x_i \u0026amp; n_{i1} \u0026amp; \\stackrel{+}{\\rightarrow} \u0026amp; n_{ij} \u0026amp; \\stackrel{+}{\\rightarrow} \u0026amp; n_{iq} \u0026amp; \\color{red}{n_{x_i}} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\downarrow + \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\color{red}{\\vdots} \\newline x_p \u0026amp; n_{p1} \u0026amp; \\cdots \u0026amp; n_{pj} \u0026amp; \\cdots \u0026amp; n_{pq} \u0026amp; \\color{red}{n_{x_p}} \\newline \\hline \\color{red}{n_y} \u0026amp; \\color{red}{n_{y_1}} \u0026amp; \\color{red}{\\cdots} \u0026amp; \\color{red}{n_{y_j}} \u0026amp; \\color{red}{\\cdots} \u0026amp; \\color{red}{n_{y_q}} \u0026amp; n \\newline \\hline \\end{array}$$\nExample. 
The marginal frequency distributions for the previous sample of heights and weights are\n$$ \\begin{array}{|c||c|c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \u0026amp; \\color{red}{n_x}\\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{2}\\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{8}\\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{11} \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \u0026amp; \\color{red}{7} \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \u0026amp; \\color{red}{2}\\ \\newline \\hline \\color{red}{n_y} \u0026amp; \\color{red}{7} \u0026amp; \\color{red}{11} \u0026amp; \\color{red}{7} \u0026amp; \\color{red}{2} \u0026amp; \\color{red}{2} \u0026amp; \\color{red}{1} \u0026amp; 30\\ \\newline \\hline \\end{array} $$\nand the corresponding statistics are\n$$ \\begin{array}{lllll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \u0026amp; \\quad \u0026amp; s_x = 10.1 \\mbox{ cm} \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \u0026amp; \u0026amp; s_y = 12.82 \\mbox{ Kg} \\end{array} $$\nCovariance To study the relation between two variables, we have to analyze the joint variation of them.\nDividing the point cloud of the scatter plot in 4 quadrants centered in the mean point $(\\bar x, \\bar y)$, the sign of deviations from the mean is:\nQuadrant $(x_i-\\bar x)$ $(y_j-\\bar y)$ $(x_i-\\bar x)(y_j-\\bar y)$ 1 $+$ $+$ $+$ 2 $-$ $+$ $-$ 3 $-$ $-$ $+$ 4 $+$ $-$ $-$ If there is an increasing linear 
relationship between the variables, most of the points will fall in quadrants 1 and 3, and the sum of the products of deviations from the mean will be positive.\n$$\\sum(x_i-\\bar x)(y_j-\\bar y) \u0026gt; 0$$\nIf there is a decreasing linear relationship between the variables, most of the points will fall in quadrants 2 and 4, and the sum of the products of deviations from the mean will be negative.\n$$\\sum(x_i-\\bar x)(y_j-\\bar y) \u0026lt; 0$$\nUsing the products of deviations from the means we get the following statistic.\nDefinition - Sample covariance. The sample covariance of a two-dimensional variable $(X,Y)$ is the average of the products of deviations from the respective means.$$s_{xy}=\\frac{\\sum (x_i-\\bar x)(y_j-\\bar y)n_{ij}}{n}$$ It can also be calculated using the formula\n$$s_{xy}=\\frac{\\sum x_iy_jn_{ij}}{n}-\\bar x\\bar y.$$\nThe covariance measures the linear relation between two variables:\nIf $s_{xy}\u0026gt;0$ there exists an increasing linear relation. If $s_{xy}\u0026lt;0$ there exists a decreasing linear relation. If $s_{xy}=0$ there is no linear relation. Example. 
Using the joint frequency table of the sample of heights and weights\n$$ \\begin{array}{|c||c|c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \u0026amp; n_x\\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 2\\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 8\\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \u0026amp; 11 \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \u0026amp; 7 \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \u0026amp; 2\\ \\newline \\hline n_y \u0026amp; 7 \u0026amp; 11 \u0026amp; 7 \u0026amp; 2 \u0026amp; 2 \u0026amp; 1 \u0026amp; 30\\ \\newline \\hline \\end{array} $$\n$$\\bar x = 174.67 \\mbox{ cm} \\qquad \\bar y = 69.67 \\mbox{ Kg}$$\nwe get that the covariance is equal to\n$$ \\begin{aligned} s_{xy} \u0026amp;=\\frac{\\sum x_iy_jn_{ij}}{n}-\\bar x\\bar y = \\frac{155\\cdot 55\\cdot 2 + 165\\cdot 55\\cdot 4 + \\cdots + 195\\cdot 105\\cdot 1}{30}-174.67\\cdot 69.67 = \\newline \u0026amp; = \\frac{368200}{30}-12169.26 = 104.07 \\mbox{ cm$\\cdot$ Kg}. \\end{aligned} $$\nThis means that there is a increasing linear relation between the weight and the height.\nRegression In most cases the goal of a dependency study is not only to detect a relation between two variables, but also to express that relation with a mathematical function, $$y=f(x)$$ in order to predict the dependent variable for every value of the independent one. 
The part of Statistics in charge of constructing such a function is called regression, and the function is known as regression function or regression model.\nSimple regression models There are many types of regression models. The most common models are shown in the table below.\nModel Equation Linear $y=a+bx$ Quadratic $y=a+bx+cx^2$ Cubic $y=a+bx+cx^2+dx^3$ Potential $y=a\\cdot x^b$ Exponential $y=e^{a+bx}$ Logarithmic $y=a+b\\log x$ Inverse $y=a+\\frac{b}{x}$ Sigmoidal $y=e^{a+\\frac{b}{x}}$ The model choice depends on the shape of the point cloud in the scatter plot.\nResiduals or predictive errors Once the type of regression model has been chosen, we have to determine which function of that family best explains the relation between the dependent and the independent variables, that is, the function that best predicts the dependent variable.\nThat function is the one that minimizes the distances from the observed values for $Y$ in the sample to the predicted values of the regression function. These distances are known as residuals or predictive errors.\nDefinition - Residuals or predictive errors. 
Given a regression model $y=f(x)$ for a two-dimensional variable $(X,Y)$, the residual or predictive error for every pair $(x_i,y_j)$ of the sample is the difference between the observed value of the dependent variable $y_j$ and the predicted value of the regression function for $x_i$,$$e_{ij} = y_j-f(x_i).$$ Least squares fitting A way to get the regression function is the least squares method, which determines the function that minimizes the sum of the squared residuals.\n$$\\sum e_{ij}^2.$$\nFor a linear model $f(x) = a + bx$, the sum depends on two parameters, the intercept $a$ and the slope $b$ of the straight line,\n$$\\theta(a,b) = \\sum e_{ij}^2 =\\sum (y_j - f(x_i))^2 =\\sum (y_j-a-bx_i)^2.$$\nThis reduces the problem to determining the values of $a$ and $b$ that minimize this sum.\nTo solve the minimization problem, we have to set to zero the partial derivatives with respect to $a$ and $b$.\n$$ \\begin{aligned} \\frac{\\partial \\theta(a,b)}{\\partial a} \u0026amp;= \\frac{\\partial \\sum (y_j-a-bx_i)^2 }{\\partial a} =0 \\newline \\frac{\\partial \\theta(a,b)}{\\partial b} \u0026amp;= \\frac{\\partial \\sum (y_j-a-bx_i)^2 }{\\partial b} =0 \\end{aligned} $$\nAnd solving the equation system, we get\n$$a= \\bar y - \\frac{s_{xy}}{s_x^2}\\bar x \\qquad b=\\frac{s_{xy}}{s_x^2}$$\nThese values minimize the residuals on $Y$ and give us the optimal linear model.\nRegression line Definition - Regression line. Given a sample of a two-dimensional variable $(X,Y)$, the regression line of $Y$ on $X$ is$$y = \\bar y +\\frac{s_{xy}}{s_x^2}(x-\\bar x).$$ The regression line of $Y$ on $X$ is the straight line that minimizes the predictive errors on $Y$, therefore it is the linear regression model that gives the best predictions of $Y$. Example. 
Using the previous sample of heights ($X$) and weights ($Y$) with the following statistics\n$$ \\begin{array}{lllll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \u0026amp; \\quad \u0026amp; s_x = 10.1 \\mbox{ cm} \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \u0026amp; \u0026amp; s_y = 12.82 \\mbox{ Kg} \\newline \u0026amp; \u0026amp; s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \u0026amp; \u0026amp; \\end{array} $$\nthe regression line of weight on height is\n$$y = \\bar y +\\frac{s_{xy}}{s_x^2}(x-\\bar x) = 69.67+\\frac{104.07}{102.06}(x-174.67) = -108.49 +1.02 x$$\nAnd the regression line of height on weight is\n$$x = \\bar x +\\frac{s_{xy}}{s_y^2}(y-\\bar y) = 174.67+\\frac{104.07}{164.42}(y-69.67) = 130.78 + 0.63 y$$\nObserve that the regression lines are different! Relative position of the regression lines Usually, the regression line of $Y$ on $X$ and the regression line of $X$ on $Y$ are not the same, but they always intersect in the mean point $(\\bar x,\\bar y)$.\nIf there is a perfect linear relation between the variables, then both regression lines are the same, as that line makes both $X$-residuals and $Y$-residuals zero.\nIf there is no linear relation between the variables, then both regression lines are constant and equals to the respective means,\n$$y = \\bar y,\\quad x = \\bar x.$$\nSo, they intersect perpendicularly.\nRegression coefficient The most important parameter of a regression line is the slope.\nDefinition - Regression coefficient $b_{yx}$. Given a sample of a two-dimensional variable $(X,Y)$, the regression coefficient of the regression line of $Y$ on $X$ is its slope,$$b_{yx} = \\frac{s_{xy}}{s_x^2}$$ The regression coefficient has always the same sign as the covariance. It measures how the dependent variable changes in relation to the independent one according to the regression line. 
In particular, it gives the number of units that the dependent variable increases or decreases for every unit that the independent variable increases. Example. In the sample of heights and weights, the regression line of weight on height was\n$$y=-108.49 +1.02 x.$$\nThus, the regression coefficient of weight on height is\n$$b_{yx}= 1.02 \\mbox{ Kg/cm}.$$\nThat means that, according to the regression line of weight on height, the weight will increase $1.02$ Kg for every cm that the height increases.\nRegression predictions Usually the regression models are used to predict the dependent variable for some values of the independent variable.\nExample. In the sample of heights and weights, to predict the weight of a person with a height of 180 cm, we have to use the regression line of weight on height,\n$$y = -108.49 + 1.02 \\cdot 180 = 75.11 \\mbox{ Kg}.$$\nBut to predict the height of a person with a weight of 79 Kg, we have to use the regression line of height on weight,\n$$x = 130.78 + 0.63\\cdot 79 = 180.55 \\mbox{ cm}.$$\nHowever, how reliable are these predictions?\nCorrelation Once we have a regression model, in order to see if it is a good predictive model we have to assess the goodness of fit of the model and the strength of the relation set by it. The part of Statistics in charge of this is correlation.\nCorrelation studies the residuals of a regression model: the smaller the residuals, the greater the goodness of fit, and the stronger the relation set by the model.\nResidual variance To measure the goodness of fit of a regression model it is common to use the residual variance.\nDefinition - Sample residual variance $s_{ry}^2$. 
Given a regression model $y=f(x)$ of a two-dimensional variable $(X,Y)$, its sample residual variance is the average of the squared residuals,\n$$s_{ry}^2 = \\frac{\\sum e_{ij}^2n_{ij}}{n} = \\frac{\\sum (y_j - f(x_i))^2n_{ij}}{n}.$$\nThe greater the residuals, the greater the residual variance and the smaller the goodness of fit.\nWhen the linear relation is perfect, the residuals are zero and the residual variance is zero. Conversely, when there is no relation, the residuals coincide with the deviations from the mean, and the residual variance is equal to the variance of the dependent variable.\n$$0\\leq s_{ry}^2\\leq s_y^2$$\nExplained and non-explained variation Coefficient of determination From the residual variance it is possible to define another correlation statistic that is easier to interpret.\nDefinition - Sample coefficient of determination $r^2$. Given a regression model $y=f(x)$ of a two-dimensional variable $(X,Y)$, its coefficient of determination is$$r^2 = 1- \\frac{s_{ry}^2}{s_y^2}$$ As the residual variance ranges from 0 to $s_y^2$, we have\n$$0\\leq r^2\\leq 1$$\nThe greater $r^2$ is, the greater the goodness of fit of the regression model, and the more reliable its predictions will be. In particular,\nIf $r^2 =0$ then there is no relation as set by the regression model. If $r^2=1$ then the relation set by the model is perfect. 
When the regression model is linear, the coefficient of determination can be computed with this formula\n$$ r^2 = \\frac{s_{xy}^2}{s_x^2s_y^2}.$$\nProof When the fitted model is the regression line, the residual variance is\n$$ \\begin{aligned} s_{ry}^2 \u0026amp; = \\sum e_{ij}^2f_{ij} = \\sum (y_j - f(x_i))^2f_{ij} = \\sum \\left(y_j - \\bar y -\\frac{s_{xy}}{s_x^2}(x_i-\\bar x) \\right)^2f_{ij}= \\newline \u0026amp; = \\sum \\left((y_j - \\bar y)^2 +\\frac{s_{xy}^2}{s_x^4}(x_i-\\bar x)^2 - 2\\frac{s_{xy}}{s_x^2}(x_i-\\bar x)(y_j -\\bar y)\\right)f_{ij} = \\newline \u0026amp; = \\sum (y_j - \\bar y)^2f_{ij} +\\frac{s_{xy}^2}{s_x^4}\\sum (x_i-\\bar x)^2f_{ij}- 2\\frac{s_{xy}}{s_x^2}\\sum (x_i-\\bar x)(y_j -\\bar y)f_{ij}= \\newline \u0026amp; = s_y^2 + \\frac{s_{xy}^2}{s_x^4}s_x^2 - 2 \\frac{s_{xy}}{s_x^2}s_{xy} = s_y^2 - \\frac{s_{xy}^2}{s_x^2}. \\end{aligned} $$\nand the coefficient of determination is\n$$ \\begin{aligned} r^2 \u0026amp;= 1- \\frac{s_{ry}^2}{s_y^2} = 1- \\frac{s_y^2 - \\frac{s_{xy}^2}{s_x^2}}{s_y^2} = 1 - 1 + \\frac{s_{xy}^2}{s_x^2s_y^2} = \\frac{s_{xy}^2}{s_x^2s_y^2}. \\end{aligned} $$\nExample. In the sample of heights and weights, we had\n$$ \\begin{array}{lll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \\newline s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \\end{array} $$\nThus, the linear coefficient of determination is\n$$r^2 = \\frac{s_{xy}^2}{s_x^2s_y^2} = \\frac{(104.07 \\mbox{ cm$\\cdot$ Kg})^2}{102.06 \\mbox{ cm}^2 \\cdot 164.42 \\mbox{ Kg}^2} = 0.65.$$\nThis means that the linear model of weight on height explains 65% of the variation of weight, and the linear model of height on weight also explains 65% of the variation of height.\nCorrelation coefficient Definition - Sample correlation coefficient $r$. 
Given a sample of a two-dimensional variable $(X,Y)$, the sample correlation coefficient is the square root of the linear coefficient of determination, with the sign of the covariance,$$r = \\dfrac{s_{xy}}{s_xs_y}.$$ As $r^2$ ranges from 0 to 1, $r$ ranges from -1 to 1,\n$$-1\\leq r\\leq 1.$$\nThe correlation coefficient measures not only the strength of the linear association but also its direction (increasing or decreasing):\nIf $r=0$ then there is no linear relation. If $r=1$ then there is a perfect increasing linear relation. If $r=-1$ then there is a perfect decreasing linear relation. Example. In the sample of heights and weights, we had\n$$\\begin{array}{lll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \\newline s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \\end{array} $$\nThus, the correlation coefficient is\n$$r = \\frac{s_{xy}}{s_xs_y} = \\frac{104.07 \\mbox{ cm$\\cdot$ Kg}}{10.1 \\mbox{ cm} \\cdot 12.82 \\mbox{ Kg}} = +0.8.$$\nThis means that there is a rather strong increasing linear relation between height and weight.\nDifferent linear correlations The scatter plots below show linear regression models with different correlations.\nReliability of regression predictions The coefficient of determination explains the goodness of fit of a regression model, but there are other factors that influence the reliability of regression predictions:\nThe coefficient of determination: The greater $r^2$, the greater the goodness of fit and the more reliable the predictions are.\nThe variability of the population distribution: The greater the variation, the more difficult to predict and the less reliable the predictions are.\nThe sample size: The greater the sample size, the more information we have and the more reliable the predictions are.\nIn addition, we have to take into account that a regression model is only valid for the range of values 
observed in the sample. That means that, as we don’t have any information outside that range, we should not make predictions for values far from that range. Non-linear regression The fit of a non-linear regression can also be done by the least squares fitting method.\nHowever, in some cases the fitting of a non-linear model can be reduced to the fitting of a linear model by applying a simple transformation to the variables of the model.\nTransformations of non-linear regression models Logarithmic: A logarithmic model $y = a+b \\log x$ can be transformed into a linear model with the change $t=\\log x$:\n$$y=a+b\\log x = a+bt.$$\nExponential: An exponential model $y = e^{a+bx}$ can be transformed into a linear model with the change $z = \\log y$:\n$$z = \\log y = \\log(e^{a+bx}) = a+bx.$$\nPotential: A potential model $y = ax^b$ can be transformed into a linear model with the changes $t=\\log x$ and $z=\\log y$:\n$$z = \\log y = \\log(ax^b) = \\log a + b \\log x = a^\\prime+bt.$$\nInverse: An inverse model $y = a+b/x$ can be transformed into a linear model with the change $t=1/x$:\n$$y = a + b(1/x) = a+bt.$$\nSigmoidal: A sigmoidal model $y = e^{a+b/x}$ can be transformed into a linear model with the changes $t=1/x$ and $z=\\log y$:\n$$z = \\log y = \\log (e^{a+b/x}) = a+b(1/x) = a+bt.$$\nExponential relation Example. 
The number of bacteria in a culture evolves with time according to the table below.\n$$\\begin{array}{c|c} \\mbox{Hours} \u0026amp; \\mbox{Bacteria} \\newline \\hline 0 \u0026amp; 25 \\newline 1 \u0026amp; 28 \\newline 2 \u0026amp; 47 \\newline 3 \u0026amp; 65 \\newline 4 \u0026amp; 86 \\newline 5 \u0026amp; 121 \\newline 6 \u0026amp; 190 \\newline 7 \u0026amp; 290 \\newline 8 \u0026amp; 362 \\end{array} $$\nThe scatter plot of the sample is shown below.\nFitting a linear model we get\n$$\\mbox{Bacteria} = -30.18+41.27\\,\\mbox{Hours, with } r^2=0.85.$$\nIs it a good model?\nAlthough the linear model is not bad, according to the shape of the point cloud of the scatter plot, an exponential model looks more suitable.\nTo construct an exponential model $y = e^{a+bx}$ we can apply the transformation $z=\\log y$, that is, a logarithmic transformation of the dependent variable.\n$$\\begin{array}{c|c|c} \\mbox{Hours} \u0026amp; \\mbox{Bacteria} \u0026amp; \\mbox{$\\log$(Bacteria)} \\newline \\hline 0 \u0026amp; 25 \u0026amp; 3.22 \\newline 1 \u0026amp; 28 \u0026amp; 3.33 \\newline 2 \u0026amp; 47 \u0026amp; 3.85 \\newline 3 \u0026amp; 65 \u0026amp; 4.17 \\newline 4 \u0026amp; 86 \u0026amp; 4.45 \\newline 5 \u0026amp; 121 \u0026amp; 4.80 \\newline 6 \u0026amp; 190 \u0026amp; 5.25 \\newline 7 \u0026amp; 290 \u0026amp; 5.67 \\newline 8 \u0026amp; 362 \u0026amp; 5.89 \\end{array} $$\nNow it only remains to compute the regression line of the logarithm of bacteria on hours,\n$$\\mbox{$\\log$(Bacteria)} = 3.107 + 0.352\\,\\mbox{Hours},$$\nand, undoing the change of variable,\n$$\\mbox{Bacteria} = e^{3.107+0.352\\,\\mbox{Hours}}, \\mbox{ with } r^2=0.99.$$\nThus, the exponential model fits much better than the linear model.\nRegression risks Lack of fit does not mean independence It is important to note that every regression model has its own coefficient of determination.\nThus, a coefficient of determination near zero means that there is no relation as set by the model, but 
that does not mean that the variables are independent, because there could be a different type of relation. Outliers influence in regression Outliers in regression studies are points that clearly do not follow the tendency of the rest of points, even if the values of the pair are not outliers for every variable separately.\nOutliers in regression studies can provoke drastic changes in the regression models.\nThe Simpson\u0026rsquo;s paradox Sometimes a trend can disappear or even reverse when we split the sample into groups according to a qualitative variable that is related to the dependent variable. This is known as Simpson\u0026rsquo;s paradox.\nExample. The scatter plot below shows an inverse relation between the study hours and the score in an exam.\nBut if we split the sample into two groups (good and bad students) we get different trends and now the relation is direct, which makes more sense.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1633627012,"objectID":"a459deb83268bdce67f7ac69652daa44","permalink":"/en/teaching/statistics/manual/regression/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/regression/","section":"teaching","summary":"In the last chapter we saw how to describe the distribution of a single variable in a sample. However, in most cases, studies require describing several variables that are often related.","tags":["Statistics","Biostatistics","Regression"],"title":"Regression","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 The number of injuries suffered by the members of a soccer team in a league were\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Calculate the following statistics and interpret them.\nMean. Median. Mode. Quartiles. Percentile 32. Solution $\\bar x=1.125$ injuries. $Me=1$ injury. $Mo=1$ injury. $Q_1=1$ injury, $Q_2=1$ injury and $Q_3=2$ injuries. $P_{32}=1$ injury. 
Exercise 2 The chart below shows the cumulative distribution of the time (in min) required by 66 students to do an exam.\nAt what time have half of the students finished? And 90% of students? What percentage of students have finished after 100 minutes? What is the time that best represents the time required by students in the sample to finish the exam? Is this value representative or not? Solution $Me=94.62$ min. $P_{90}=132$ min. $57.08%$ of students. $\\bar x=85.9091$ min, $s=37.5268$ min and $cv=0.4368$. Exercise 3 In a study about children\u0026rsquo;s growth, two samples were drawn, one for newborn babies and the other for one year old infants. The heights in cm of children in each of the samples were\nNewborn children: 51 50 51 53 49 50 53 50 47 50 One year old children: 62 65 69 71 65 66 68 69 In which group is the mean more representative? Justify your answer.\nSolution Newborn children: $\\bar x=50.4$ cm, $s_x=1.6852$ cm and $cv_x=0.0334$.\nOne year old children: $\\bar y=66.875$ cm, $s_y=2.7128$ cm and $cv_y=0.0406$. Exercise 4 To determine the accuracy of a method for measuring hematocrit in blood, the measurement was repeated 8 times on the same blood sample. The results of hematocrit in plasma, in percentage, were\n42.2 42.1 41.9 41.8 42 42.1 41.9 42 What do you think about the accuracy of the method?\nSolution $\\bar x=42$ %, $s=0.1225$ % and $cv=0.0029$. Exercise 5 The histogram below shows the frequency distribution of the body mass index (BMI) of a group of people by gender.\nDraw the pie chart for the gender. In which group is the mean of the BMI more representative? Calculate the mean for the whole sample. Use the following sums Females: $\\sum x_i=1160$ kg/m$^2$ $\\sum x_i^2=29050$ kg$^2$/m$^4$ Males: $\\sum x_i=1002.5$ kg/m$^2$ $\\sum x_i^2=22781.25$ kg$^2$/m$^4$\nSolution Females: $\\bar x=24.1667$ kg/m$^2$, $s_x=4.6022$ kg/m$^2$ and $cv_x=0.1904$.\nMales: $\\bar y=22.2778$ kg/m$^2$, $s_y=3.1545$ kg/m$^2$ and $cv_y=0.1416$. $\\bar z=23.2527$ kg/m$^2$. 
Exercise 6 The following table represents the frequency distribution of ages at which a group of people suffered a heart attack.\nage persons [40,50) 6 [50,60) 12 [60,70) 23 [70,80) 19 [80,90) 5 Could we assume that the sample comes from a normal population?\nUse the following sums: $\\sum x_i=4275$ years, $\\sum(x_i-\\bar x)^2=7461.5385$ years$^2$, $\\sum (x_i-\\bar x)^3=-18248.5207$ years$^3$, $\\sum (x_i-\\bar x)^4=2099635.8671$ years$^4$.\nSolution $g_1=-0.2283$ and $g_2=-0.5487$. Exercise 7 To compare two rehabilitation treatments $A$ and $B$ for an injury, every treatment was applied to a different group of people. The number of days required to cure the injury in each group is shown in the following table:\nDays A B 20-40 5 8 40-60 20 15 60-80 18 20 80-100 7 7 In which treatment is more representative the mean? In which treatment the distribution of days is more skew? In which treatment the distribution is more peaked? Use the following sums: $A$: $\\sum x_i=3040$ days, $\\sum (x_i-\\bar x)^2=14568$ days$^2$, $\\sum (x_i-\\bar x)^3=17011.2$ days$^3$, $\\sum (x_i-\\bar x)^4=9989602.56$ days$^4$ $B$: $\\sum y_j=3020$ days, $\\sum (y_j-\\bar y)^2=16992$ days$^2$, $\\sum (y_j-\\bar y)^3=-42393.6$ days$^3$, $\\sum (y_j-\\bar y)^4=12551516.16$ days$^4$\nSolution $A$: $\\bar a=60.8$ days, $s_a=17.0693$ days and $cv_a=0.2807$.\n$B$: $\\bar b=60.4$ days, $s_b=18.4347$ days and $cv_b=0.3052$. $g_{1a}=0.0684$ and $g_{1b}=-0.1353$. $g_{2a}=-0.6465$ and $g_{2b}=-0.8264$, so the distribution of treatment $A$ is more peaked than the one of treatment $B$ as $g_{2a} \u0026gt; g_{2b}$. Exercise 8 The systolic blood pressure (in mmHg) of a sample of persons is\n135 128 137 110 154 142 121 127 114 103 Calculate the central tendency statistics. How is the relative dispersion with respect to the mean? How is the skewness of the sample distribution? How is the kurtosis of the sample distribution? 
If we know that the method used for measuring the blood pressure is biased, and, in order to get the right values, we have to apply the linear transformation $y=1.2x-5$, what are the statistics values of parts (a) to (d) for the new, corrected distribution? Use the following sums: $\\sum x_i=1271$ mmHg, $\\sum (x_i-\\bar x)^2=2188.9$ mmHg$^2$, $\\sum (x_i-\\bar x)^3=2764.32$ mmHg$^3$, $\\sum (x_i-\\bar x)^4=1040079.937$ mmHg$^4$.\nSolution $\\bar x=127.1$ mmHg, $Me=127.5$ mmHg, $Mo$ all the values. $s=14.7949$ mmHg and $cv=0.1164$. $g_1=0.0854$. $g_2=-0.8292$. $\\bar x=147.52$ mmHg, $Me=148$ mmHg, $Mo=157$ mmHg, $s=17.7539$ mmHg, $cv=0.1203$, $g_1=0.0854$ and $g_2=-0.8292$. Exercise 9 The table below contains the frequency of pregnancies, abortions and births of a sample of 999 women in a city.\nNum Pregnancies Abortions Births 0 61 751 67 1 64 183 80 2 328 51 400 3 301 10 300 4 122 2 90 5 81 2 62 6 29 0 0 7 11 0 0 8 2 0 0 How many birth outliers are in the sample? Which variable has lower spread with respect to the mean? Which value is relatively higher, 7 pregnancies or 4 abortions? Justify your answer. Use the following sums: Pregnancies: $\\sum x_i=2783$, $\\sum x_i^2=9773$. Abortions: $\\sum y_j=333$, $\\sum y_j^2=559$. Births: $\\sum z_k=2450$, $\\sum z_k^2=7370$.\nSolution $129$ outliers. Pregnancies: $\\bar x=2.7858$, $s_x=1.422$ and $cv_x=0.5105$.\nAbortions: $\\bar y=0.3333$, $s_y=0.6697$ and $cv_y=2.009$.\nBirths: $\\bar z=2.4525$, $s_z=1.1674$ and $cv_z=0.476$. Standard score of $7$ pregnancies is $2.9635$, and standard score of $4$ abortions is $5.4754$. 
","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616018106,"objectID":"cf36b557c37d44162ad677200a352a36","permalink":"/en/teaching/statistics/problems/statistics/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/statistics/","section":"teaching","summary":"Exercise 1 The number of injuries suffered by the members of a soccer team in a league were\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Calculate the following statistics and interpret them.","tags":["Descriptive Statistics"],"title":"Problems of Descriptive Statistics","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Content of cells can be formatted in many ways: changing the data type, the font family, the alignment, the color, the border, etc. Most formatting options are grouped in the Format Cells dialog. To show this dialog click the bottom right corner of the Font panel in the ribbon\u0026rsquo;s Home tab.\nData types Excel manages several data types. The most common are numbers, dates and times, and text. All available data types are in the Number tab of the Format Cells dialog.\nFormatting numbers By default cells with numeric content are of type Number, but there are other numeric types like Currency and Accounting. Number is used for general display of numbers, while Currency and Accounting are used for monetary values. In all cases you can specify the number of decimal places. For monetary values you can also specify the symbol for the currency (€ by default).\nExample. The table in the animation below shows the price of fruits during several months and the average price. 
The animation shows how to change the format of prices to currency type with 3 decimal places.\nFormatting dates and times By default cells with content following the pattern day/month/year are of type Date, but there are a lot of ways of formatting dates, like for example, year-month-day or day-month_name-year etc.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to change the format of dates following the pattern Month-Year, with the three first letters of months and the two last digits of years.\nBy default cells with content following the pattern hours:minutes:seconds are of type Time, but there are a several ways of formatting times.\nFormatting text By default cells with non numeric content are of type Text. It\u0026rsquo;s possible to apply this type even to numbers, like for example phone numbers.\nText entered in a cell spreads to adjacent cells to the right if these cells have no content. To confine text to a certain width in the cell, select the cell and click the button Wrap Text in the Alignment section in the ribbon\u0026rsquo;s Home tab.\nAlign cell contents By default numbers are aligned to the right and text to the left, but it\u0026rsquo;s possible to change the alignment of cell contents in the Alignment tab of the Format Cells dialog.\nHorizontal alignment To change the horizontal alignment select Left, Right, Center or Justify in the Horizontal drop down list of the Alignment tab. You can also align the cell contents with the buttons of the Alignment panel in the ribbon\u0026rsquo;s Home tab.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to align the average prices centered.\nVertical alignment To change the vertical alignment select Top, Bottom, Center or Justify in the Vertical drop down list of the Alignment tab. 
You can also align the cell contents with the buttons of the Alignment panel in the ribbon\u0026rsquo;s Home tab.\nFont properties To format the font of cell contents select the font family, font style, font size and font color from the Font tab of the Format Cells dialog. You can also apply some effects like underline, superscript and subscript.\nIt\u0026rsquo;s also possible to change the font family, style, size and color from the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to change the font family of all table to Arial, size 10 pt.\nThe animation below shows how to change the font style of average prices to bold and the color of fruits names to blue.\nBorders and background To format the borders of cells select the line style and color, and click the borders where to apply that line in the table of the Borders tab in the Format Cells dialog.\nIt\u0026rsquo;s also possible to change the border of cells with the Border button of the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to put lines to some cell borders.\nTo format the background of cells select the background color and pattern style in the Fill tab of the Format Cells dialog.\nIt\u0026rsquo;s also possible to change the background color of cells with the Background colour button of the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. 
The animation shows how to set the background colour of some cells.\nMerge cells To merge several cells into one, select the range of cells and click the button Merge \u0026amp; Center in the Alignment section in the ribbon\u0026rsquo;s Home tab. If there is more than one cell with content in the range, merging will keep the content of the upper-left cell only. By default content of merged cells is centered.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to merge the cells of the first row and center the title.\nCopy and paste format To apply the format of a cell to others, select the cell and click the Format painter button to copy the cell format. Then select the range of cells to paste that format.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply the same format of the fruit rows to a new row for pineapples.\nConditional formatting Excel allows applying a format to a cell depending on its value and according to some rules. To set a new rule click the Conditional Formatting button and select New Rule. There are different types of rules:\nFormat all cells based on their value Applies a format style based on the value of the cell. There are 4 types of styles:\n2-Color Scale Applies a colour in a continuous scale ranging from one colour for the minimum value or percentage to another colour for the maximum value or percentage.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices a colour background in a continuous scale from green (the minimum price) to red (the maximum price).\n3-Color Scale The same as 2-Color Scale but with a third intermediate colour in the scale.\nData bar Plots a horizontal bar in each cell with a length proportional to the value of the cell.\nExample. 
The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices a data bar format.\nIcon Sets Divide the distribution of selected cell values in several parts according to intervals or percentiles, assign an different icon to each part, and plot the corresponding icon in each cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices an icon set format. The icon set has three icons: red is applied to values under the 33 percentile, yellow is applied to values between 33 and 67 percentiles, and green is applied to values over 67 percentile.\nFormat only cells that contain Applies a format to the cell if satisfies a logical condition.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices higher than 2 € a red colour.\nFormat only top or bottom ranked values Applies a format to a number or percentage of top or bottom values. Example. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to the three top higher prices a red colour.\nFormat only values that are above or below average Applies a format to cells with values above or below the average of selected cells. Example. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply a red colour to prices above the average and a green colour to prices below the average.\nPredefined styles Excel has a lot of predefined styles for formatting cells and tables. To apply a predefined cell style click Cell Styles button and select the desired style. It\u0026rsquo;s possible to define new cell styles. 
For that select the cell with the format to define as a style, click Cell Styles button and select New Cell Style option. In the dialog that appears just give a name to the new style, press OK, and the new cell style will appear in the cell styles menu.\nTo apply a predefined table style click Format as Table button and select the desired style. It\u0026rsquo;s also possible to define new table styles. For that click Format as Table button and select New Table Style option. In the dialog that appears just give a name to the new style, define the table format (font, borders and fill), press OK, and the new table style will appear in the table styles menu.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"c84a6dcd76af8594181a43299ad083c8","permalink":"/en/teaching/excel/manual/formatting/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/formatting/","section":"teaching","summary":" ","tags":["Excel"],"title":"Formatting and Data Printing","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Descriptive Statistics provides methods to describe the variables measured in the sample and their relations, but it does not allow to draw any conclusion about the population.\nNow it is time to take the leap from the sample to the population and the bridge for that is Probability Theory.\nRemember that the sample has a limited information about the population, and in order to draw valid conclusions for the population the sample must be representative of it. For that reason, to guarantee the representativeness of the sample, this must be drawn randomly. 
This means that the choice of individuals in the sample is made by chance.\nProbability Theory will provide us with the tools to control the randomness in the sampling and to determine the level of reliability of the conclusions drawn from the sample.\nRandom experiments and events Random experiments The study of a characteristic of the population is conducted through random experiments.\nDefinition - Random experiment. A random experiment is an experiment that meets two conditions:\nThe set of possible outcomes is known. It is impossible to predict the outcome with absolute certainty. Example. Games of chance are typical examples of random experiments. The roll of a dice, for example, is a random experiment because\nThe set of possible outcomes is known: $\\{1,2,3,4,5,6\\}$. Before rolling the dice, it is impossible to predict the outcome with absolute certainty. Another non-gambling example is the random choice of an individual of a human population and the determination of its blood type.\nGenerally, the draw of a sample by a random method is a random experiment.\nSample space Definition - Sample space. The set $\\Omega$ of the possible outcomes of a random experiment is known as the sample space. Example. Some examples of sample spaces are:\nFor the toss of a coin $\\Omega=\\{\\mbox{heads},\\mbox{tails}\\}$. For the roll of a dice $\\Omega=\\{1,2,3,4,5,6\\}$. For the blood type of an individual drawn by chance $\\Omega=\\{\\mbox{A},\\mbox{B},\\mbox{AB},\\mbox{0}\\}$. For the height of an individual drawn by chance $\\Omega=\\mathbb{R}^+$. Tree diagrams In experiments where more than one variable is measured, the determination of the sample space can be difficult. In such cases, it is advisable to use a tree diagram to construct the sample space.\nIn a tree diagram every variable is represented in a level of the tree and every possible outcome of the variable as a branch.\nExample. 
The tree diagram below represents the sample space of a random experiment where the gender and the blood type are measured in a random individual.\nRandom events Definition - Random event. A random event is any subset of the sample space $\\Omega$ of a random experiment. There are different types of events:\nImpossible event: Is the event with no elements $\\emptyset$. It has no chance of occurring. Elemental events: Are events with only one element, that is, a singleton. Composed events: Are events with two or more elements. Sure event: Is the event that contains the whole sample space $\\Omega$. It always happens. Set theory Event space Definition - Event space. Given a sample space $\\Omega$ of a random experiment, the event space of $\\Omega$ is the set of all possible events of $\\Omega$, and is denoted by $\\mathcal{P}(\\Omega).$ Example. Given the sample space $\\Omega=\\{a,b,c\\}$, its event space is\n$$\\mathcal{P}(\\Omega)=\\{\\emptyset, \\{a\\},\\{b\\},\\{c\\},\\{a,b\\},\\{a,c\\},\\{b,c\\},\\{a,b,c\\}\\}$$\nAs events are subsets of the sample space, using the set theory we have the following operations on events:\nUnion Intersection Complement Difference Union of events Definition - Union event. Given two events $A,B\\subseteq \\Omega$, the union of $A$ and $B$, denoted by $A\\cup B$, is the event of all elements that are members of $A$ or $B$ or both.\n$$A\\cup B = \\{x\\,|\\, x\\in A\\textrm{ or }x\\in B\\}.$$\nThe union event $A\\cup B$ happens when $A$ or $B$ happen.\nIntersection of events Definition - Intersection event. Given two events $A,B\\subseteq \\Omega$, the intersection of $A$ and $B$, denoted by $A\\cap B$, is the event of all elements that are members of both $A$ and $B$.\n$$A\\cap B = \\{x\\,|\\, x\\in A\\textrm{ and }x\\in B\\}.$$\nThe intersection event $A\\cap B$ happens when $A$ and $B$ happen.\nTwo events are incompatible if their intersection is empty.\nComplement of an event Definition - Complementary event. 
Given an event $A\\subseteq \\Omega$, the complementary or contrary event of $A$, denoted by $\\bar A$, is the event of all elements of $\\Omega$ except the elements that are members of $A$.\n$$\\bar A = \\{x\\,|\\, x\\not\\in A\\}.$$\nThe complementary event $\\bar A$ happens when $A$ does not happen.\nDifference of events Definition - Difference event. Given two events $A,B\\subseteq \\Omega$, the difference of $A$ and $B$, denoted by $A-B$, is the event of all elements that are members of $A$ but are not members of $B$.\n$$A-B = \\{x\\,|\\, x\\in A\\textrm{ and }x\\not\\in B\\} = A \\cap \\bar B.$$\nThe difference event $A-B$ happens when $A$ happens but $B$ does not.\nExample. Given the sample space of rolling a dice $\\Omega=\\{1,2,3,4,5,6\\}$ and the events $A=\\{2,4,6\\}$ and $B=\\{1,2,3,4\\}$,\nThe union of $A$ and $B$ is $A\\cup B=\\{1,2,3,4,6\\}$. The intersection of $A$ and $B$ is $A\\cap B=\\{2,4\\}$. The complement of $A$ is $\\bar A=\\{1,3,5\\}$. The events $A$ and $\\bar A$ are incompatible. The difference of $A$ and $B$ is $A-B=\\{6\\}$, and the difference of $B$ and $A$ is $B-A=\\{1,3\\}$. Algebra of events Given the events $A,B,C\\subseteq \\Omega$, the following properties are met.\n$A\\cup A=A$, $\\quad A\\cap A=A$ (idempotency). $A\\cup B=B\\cup A$, $\\quad A\\cap B = B\\cap A$ (commutative). $(A\\cup B)\\cup C = A\\cup (B\\cup C)$, $\\quad (A\\cap B)\\cap C = A\\cap (B\\cap C)$ (associative). $(A\\cup B)\\cap C = (A\\cap C)\\cup (B\\cap C)$, $\\quad (A\\cap B)\\cup C = (A\\cup C)\\cap (B\\cup C)$ (distributive). $A\\cup \\emptyset=A$, $\\quad A\\cap \\Omega=A$ (neutral element). $A\\cup \\Omega=\\Omega$, $\\quad A\\cap \\emptyset=\\emptyset$ (absorbing element). $A\\cup \\overline A = \\Omega$, $\\quad A\\cap \\overline A= \\emptyset$ (complementary symmetric element). $\\overline{\\overline A} = A$ (double contrary). 
$\\overline{A\\cup B} = \\overline A\\cap \\overline B$, $\\quad \\overline{A\\cap B} = \\overline A\\cup \\overline B$ (De Morgan’s laws). $A\\cap B\\subseteq A\\cup B$. Probability definition Classical definition of probability Definition - Probability (Laplace). Given a sample space $\\Omega$ of a random experiment where all elements of $\\Omega$ are equally likely, the probability of an event $A\\subseteq \\Omega$ is the quotient between the number of elements of $A$ and the number of elements of $\\Omega$\n$$P(A) = \\frac{|A|}{|\\Omega|} = \\frac{\\mbox{number of favorable outcomes}}{\\mbox{number of possible outcomes}}$$\nThis definition is well known, but it has important restrictions:\nIt is required that all the elements of the sample space are equally likely (equiprobability). It cannot be used with infinite sample spaces. Example. Given the sample space of rolling a dice $\\Omega=\\{1,2,3,4,5,6\\}$ and the event $A=\\{2,4,6\\}$, the probability of $A$ is\n$$P(A) = \\frac{|A|}{|\\Omega|} = \\frac{3}{6} = 0.5.$$\nHowever, given the sample space of the blood type of a random individual $\\Omega=\\{O,A,B,AB\\}$, it is not possible to use the classical definition to compute the probability of having group $A$,\n$$P(A) \\neq \\frac{|A|}{|\\Omega|} = \\frac{1}{4} = 0.25,$$\nbecause the blood types are not equally likely in human populations.\nFrequency definition of probability Theorem - Law of large numbers. When a random experiment is repeated a large number of times, the relative frequency of an event tends to the probability of the event. The following definition of probability uses this theorem.\nDefinition - Frequency probability. 
Given a sample space $\\Omega$ of a replicable random experiment, the probability of an event $A\\subseteq \\Omega$ is the relative frequency of the event $A$ in an infinite number of repetitions of the experiment\n$$P(A) = \\lim_{n\\rightarrow \\infty}\\frac{n_A}{n}$$\nAlthough frequency probability avoids the restrictions of the classical definition, it also has some drawbacks:\nIt computes an estimation of the real probability (more accurate the higher the sample size). The repetition of the experiment must be in identical conditions. Example. Given the sample space of tossing a coin $\\Omega=\\{H,T\\}$, if after tossing the coin 100 times we got 54 heads, then the probability of $H$ is\n$$P(H) = \\frac{n_H}{n} = \\frac{54}{100} = 0.54.$$\nGiven the sample space of the blood type of a random individual $\\Omega=\\{O,A,B,AB\\}$, if after drawing a random sample of 1000 persons we got 412 with blood type $A$, then the probability of $A$ is\n$$P(A) = \\frac{n_A}{n} = \\frac{412}{1000} = 0.412.$$\nAxiomatic definition of probability Definition - Probability (Kolmogórov). Given a sample space $\\Omega$ of a random experiment, a probability function is a function that maps every event $A\\subseteq \\Omega$ to a real number $P(A)$, known as the probability of $A$, that meets the following axioms:\nThe probability of any event is nonnegative,\n$$P(A)\\geq 0.$$\nThe probability of the sure event is 1,\n$$P(\\Omega)=1$$\nThe probability of the union of two incompatible events ($A\\cap B=\\emptyset$) is the sum of their probabilities\n$$P(A\\cup B) = P(A)+P(B).$$\nFrom the previous axioms it is possible to deduce some important properties of a probability function.\nGiven a sample space $\\Omega$ of a random experiment and the events $A,B\\subseteq \\Omega$, the following properties are met:\n$P(\\bar A) = 1-P(A)$.\n$P(\\emptyset)= 0$.\nIf $A\\subseteq B$ then $P(A)\\leq P(B)$.\n$P(A) \\leq 1$. 
This means that $P(A)\\in [0,1]$.\n$P(A-B)=P(A)-P(A\\cap B)$.\n$P(A\\cup B)= P(A) + P(B) - P(A\\cap B)$.\nIf $A=\\{e_1,\\ldots,e_n\\}$, where $e_i$, $i=1,\\ldots,n$, are elemental events, then\n$$P(A)=\\sum_{i=1}^n P(e_i).$$\nProof $A\\cup \\bar A = \\Omega \\Rightarrow P(A\\cup \\bar A) = P(\\Omega) \\Rightarrow P(A)+P(\\bar A) = 1 \\Rightarrow P(\\bar A)=1-P(A)$.\n$\\emptyset = \\bar \\Omega \\Rightarrow P(\\emptyset) = P(\\bar \\Omega) = 1-P(\\Omega) = 1-1 = 0.$\n$B = A\\cup (B-A)$. As $A$ and $B-A$ are incompatible, $P(B) = P(A\\cup (B-A)) = P(A)+P(B-A) \\geq P(A).$\nIf we think of probabilities as areas, it is easy to see graphically,\n$A\\subseteq \\Omega \\Rightarrow P(A)\\leq P(\\Omega)=1.$\n$A=(A-B)\\cup (A\\cap B)$. As $A-B$ and $A\\cap B$ are incompatible, $P(A)=P(A-B)+P(A\\cap B) \\Rightarrow P(A-B)=P(A)-P(A\\cap B)$.\nIf we think of probabilities as areas, it is easy to see graphically,\n$A\\cup B= (A-B) \\cup (B-A) \\cup (A\\cap B)$. As $A-B$, $B-A$ and $A\\cap B$ are incompatible, $P(A\\cup B)=P(A-B)+P(B-A)+P(A\\cap B) =P(A)-P(A\\cap B)+P(B)-P(A\\cap B)+P(A\\cap B)$ $=P(A)+P(B)-P(A\\cap B)$.\nIf we think again of probabilities as areas, it is easy to see graphically because the area of $A\\cap B$ is added twice (once for $A$ and once for $B$), so it must be subtracted once.\n$A=\\{e_1,\\cdots,e_n\\} = \\{e_1\\}\\cup \\cdots \\cup \\{e_n\\} \\Rightarrow P(A)=P(\\{e_1\\}\\cup \\cdots \\cup \\{e_n\\}) = P(\\{e_1\\})+ \\cdots + P(\\{e_n\\}).$\nProbability interpretation As set by the previous axioms, the probability of an event $A$ is a real number $P(A)$ that always ranges from 0 to 1.\nIn a certain way, this number expresses the plausibility of the event, that is, the chances that the event $A$ occurs in the experiment. Therefore, it also gives a measure of the uncertainty about the event.\nThe maximum uncertainty corresponds to probability $P(A)=0.5$ ($A$ and $\\bar A$ have the same chances of happening). 
The minimum uncertainty correspond to probability $P(A)=1$ ($A$ will happen with absolute certainty) and $P(A)=0$ ($A$ won’t happen with absolute certainty) When $P(A)$ is closer to 0 than to 1, the chances of not happening $A$ are greater than the chances of happening $A$. On the contrary, when $P(A)$ is closer to 1 than to 0, the chances of happening $A$ are greater than the chances of not happening $A$.\nConditional probability Conditional experiments Occasionally, we can get some information about the experiment before its realization. Usually that information is given as an event $B$ of the same sample space that we know that is true before we conduct the experiment.\nIn such a case, we will say that $B$ is a conditioning event and the probability of another event $A$ is known as a conditional probability and expressed $P(A\\vert B)$. This must be read as probability of $A$ given $B$ or probability of $A$ under the condition $B$.\nUsually, conditioning events change the sample space and therefore the probabilities of events.\nExample. 
Assume that we have a sample of 100 women and 100 men with the following frequencies\n$$ \\begin{array}{|c|c|c|} \\hline \u0026amp; \\mbox{Non-smokers} \u0026amp; \\mbox{Smokers} \\newline \\hline \\mbox{Females} \u0026amp; 80 \u0026amp; 20 \\newline \\hline \\mbox{Males} \u0026amp; 60 \u0026amp; 40 \\newline \\hline \\end{array} $$\nThen, using the frequency definition of probability, the probability of being a smoker is\n$$P(\\mbox{Smoker})= \\frac{60}{200}=0.3.$$\nHowever, if we know that the person is a woman, then the sample is reduced to the first row, and the probability of being a smoker is\n$$P(\\mbox{Smoker}\\mid\\mbox{Female})=\\frac{20}{100}=0.2.$$\nConditional probability Definition - Conditional probability Given a sample space $\\Omega$ of a random experiment, and two events $A,B\\subseteq \\Omega$, the probability of $A$ conditional on $B$ occurring is\n$$P(A|B) = \\frac{P(A\\cap B)}{P(B)},$$ as long as $P(B)\\neq 0$.\nThis definition allows us to calculate conditional probabilities without changing the original sample space.\nExample. In the previous example\n$$P(\\mbox{Smoker}\\mid\\mbox{Female})= \\frac{P(\\mbox{Smoker}\\cap \\mbox{Female})}{P(\\mbox{Female})} = \\frac{20/200}{100/200}=\\frac{20}{100}=0.2.$$\nProbability of the intersection event From the definition of conditional probability it is possible to derive the formula for the probability of the intersection of two events.\n$$P(A\\cap B) = P(A)P(B|A) = P(B)P(A|B).$$\nExample. In a population, 30% of people are smokers and we know that 40% of smokers have breast cancer. The probability of a random person being a smoker and having breast cancer is\n$$P(\\mbox{Smoker}\\cap \\mbox{Cancer})= P(\\mbox{Smoker})P(\\mbox{Cancer}\\mid\\mbox{Smoker}) = 0.3\\times 0.4 = 0.12.$$\nIndependence of events Sometimes, the conditioning event does not change the original probability of the main event.\nDefinition - Independent events. 
Given a sample space $\\Omega$ of a random experiment, two events $A,B\\subseteq \\Omega$ are independent if the probability of $A$ does not change when conditioning on $B$, and vice-versa, that is,\n$$P(A|B) = P(A) \\quad \\mbox{and} \\quad P(B|A)=P(B),$$\nif $P(A)\\neq 0$ and $P(B)\\neq 0$.\nThis means that the occurrence of one event does not give relevant information to change the uncertainty of the other.\nWhen two events are independent, the probability of their intersection is equal to the product of their probabilities,\n$$P(A\\cap B) = P(A)P(B).$$\nExample. The sample space of tossing a coin twice is $\\Omega=\\{(H,H),(H,T),(T,H),(T,T)\\}$ and all the elements are equiprobable if the coin is fair. Thus, applying the classical definition of probability we have\n$$P((H,H)) = \\frac{1}{4} = 0.25.$$\nIf we name $H_1=\\{(H,H),(H,T)\\}$, that is, having heads in the first toss, and $H_2=\\{(H,H),(T,H)\\}$, that is, having heads in the second toss, we can get the same result assuming that these events are independent,\n$$P((H,H))= P(H_1\\cap H_2) = P(H_1)P(H_2) = \\frac{2}{4}\\frac{2}{4}=\\frac{1}{4}=0.25.$$\nProbability Space Definition - Probability space. A probability space of a random experiment is a triplet $(\\Omega,\\mathcal{F},P)$ where\n$\\Omega$ is the sample space of the experiment. $\\mathcal{F}$ is a set of events of the experiment. $P$ is a probability function. If we know the probabilities of all the elements of $\\Omega$, then we can calculate the probability of every event in $\\mathcal{F}$ and we can easily construct the probability space.\nProbability space construction In order to determine the probability of every elemental event we can use a tree diagram, using the following rules:\nFor every node of the tree, label the incoming edge with the probability of the variable in that level having the value of the node, conditioned by events corresponding to its ancestor nodes in the tree. 
The probability of every elemental event in the leaves is the product of the probabilities on the edges that go from the root to the leaf. Probability tree with dependent variables In a probability tree with dependent variables, the probabilities of every level of the tree are different depending on the outcome of the previous levels.\nExample. In a population 30% of people are smokers and we know that 40% of smokers have breast cancer, while only 10% of non-smokers have breast cancer. The probability tree of the probability space of the random experiment consisting of picking a random person and measuring the variables smoking and breast cancer is shown below.\nProbability tree with independent variables In a probability tree with independent variables, the probabilities of every level of the tree are the same no matter the outcome of the previous levels.\nExample. The probability tree of the random experiment of tossing two coins is shown below.\nExample. In a population with 40% males and 60% females, the probability tree of drawing a random sample of three persons is shown below.\nTotal probability theorem Partition of the sample space Definition - Partition of the sample space. A collection of events $A_1,A_2,\\ldots,A_n$ of the same sample space $\\Omega$ is a partition of the sample space if it satisfies the following conditions\nThe union of the events is the sample space, that is, $A_1\\cup \\cdots\\cup A_n =\\Omega$. All the events are mutually incompatible, that is, $A_i\\cap A_j = \\emptyset$ $\\forall i\\neq j$. Usually it is easy to get a partition of the sample space by splitting a population according to some categorical variable, such as gender, blood type, etc.\nTotal probability theorem If we have a partition of a sample space, we can use it to calculate the probabilities of other events in the same sample space.\nTheorem - Total probability. 
Given a partition $A_1,\\ldots,A_n$ of a sample space $\\Omega$, the probability of any other event $B$ of the same sample space can be calculated with the formula\n$$P(B) = \\sum_{i=1}^n P(A_i\\cap B) = \\sum_{i=1}^n P(A_i)P(B|A_i).$$\nProof The proof of the theorem is quite simple. As $A_1,\\ldots,A_n$ is a partition of $\\Omega$, we have\n$$B = B\\cap \\Omega = B\\cap (A_1\\cup \\cdots \\cup A_n) = (B\\cap A_1)\\cup \\cdots \\cup (B\\cap A_n).$$\nAnd all the events of this union are mutually incompatible as $A_1,\\ldots,A_n$ are, thus\n$$ \\begin{aligned} P(B) \u0026amp;= P((B\\cap A_1)\\cup \\cdots \\cup (B\\cap A_n)) = P(B\\cap A_1)+\\cdots + P(B\\cap A_n) =\\newline \u0026amp;= P(A_1)P(B|A_1)+\\cdots + P(A_n)P(B|A_n) = \\sum_{i=1}^n P(A_i)P(B|A_i). \\end{aligned} $$\nExample. A symptom $S$ can be caused by a disease $D$, but it can also be present in persons without the disease. In a population, the proportion of people with the disease is $0.2$. We also know that $90\\%$ of persons with the disease have the symptom, while only $40\\%$ of persons without the disease have it.\nWhat is the probability that a random person of the population has the symptom?\nTo answer the question we can apply the total probability theorem using the partition $\\{D,\\bar D\\}$:\n$$P(S) = P(D)P(S|D)+P(\\bar D)P(S|\\bar D) = 0.2\\cdot 0.9 + 0.8\\cdot 0.4 = 0.5.$$\nThat is, half of the population has the symptom.\nIndeed, it is a weighted mean of probabilities!\nThe answer to the previous question is even clearer with the tree diagram of the probability space.\n$$ \\begin{aligned} P(S) \u0026amp;= P(D,S) + P(\\bar D,S) = P(D)P(S|D)+P(\\bar D)P(S|\\bar D)\\newline \u0026amp; = 0.2\\cdot 0.9+ 0.8\\cdot 0.4 = 0.18 + 0.32 = 0.5. 
\\end{aligned} $$\nBayes theorem A partition of a sample space $A_1,\\ldots,A_n$ may also be interpreted as a set of feasible hypotheses for a fact $B$.\nIn such cases it may be helpful to calculate the posterior probability $P(A_i\\vert B)$ of every hypothesis.\nTheorem - Bayes. Given a partition $A_1,\\ldots,A_n$ of a sample space $\\Omega$ and another event $B$ of the same sample space, the conditional probability of every event $A_i$, $i=1,\\ldots,n$, on $B$ can be calculated with the following formula\n$$P(A_i|B) = \\frac{P(A_i\\cap B)}{P(B)} = \\frac{P(A_i)P(B|A_i)}{\\sum_{j=1}^n P(A_j)P(B|A_j)}.$$\nExample. In the previous example, a more interesting question is about the diagnosis for a person with the symptom.\nIn this case we can interpret $D$ and $\\overline{D}$ as the two feasible hypotheses for the symptom $S$. The prior probabilities for them are $P(D)=0.2$ and $P(\\overline{D})=0.8$. That means that if we do not have information about the symptom, the diagnosis would be that the person does not have the disease.\nHowever, if after examining the person we observe the symptom, that information changes the uncertainty about the hypotheses, and we need to calculate the posterior probabilities to diagnose, that is, $P(D\\vert S)$ and $P(\\overline{D}\\vert S)$.\nTo calculate the posterior probabilities we can use Bayes theorem.\n$$ \\begin{aligned} P(D|S) \u0026amp;= \\frac{P(D)P(S|D)}{P(D)P(S|D)+P(\\overline{D})P(S|\\overline{D})} = \\frac{0.2\\cdot 0.9}{0.2\\cdot 0.9 + 0.8\\cdot 0.4} = \\frac{0.18}{0.5}=0.36,\\newline P(\\overline{D}|S) \u0026amp;= \\frac{P(\\overline{D})P(S|\\overline{D})}{P(D)P(S|D)+P(\\overline{D})P(S|\\overline{D})} = \\frac{0.8\\cdot 0.4}{0.2\\cdot 0.9 + 0.8\\cdot 0.4} = \\frac{0.32}{0.5}=0.64. \\end{aligned} $$\nAs we can see, the probability of having the disease has increased from $0.2$ to $0.36$. 
Nevertheless, the probability of not having the disease is still greater than the probability of having it, and for that reason, the diagnosis is not having the disease.\nIn this case it is said that the symptom $S$ is not decisive in order to diagnose the disease.\nEpidemiology One of the branches of Medicine that makes an intensive use of probability is Epidemiology, which studies the distribution and causes of diseases in populations, identifying risk factors for disease and targets for preventive healthcare.\nIn Epidemiology we are interested in how often a medical event $D$ occurs (typically a disease like flu, a risk factor like smoking or a protection factor like a vaccine), which is measured as a nominal variable with two categories (occurrence or not of the event).\nThere are different measures related to the frequency of a medical event. The most important are:\nPrevalence Incidence Relative risk Odds ratio Prevalence Definition - Prevalence. The prevalence of a medical event $D$ is the proportion of a particular population that is affected by the medical event.\n$$\\mbox{Prevalence}(D) = \\frac{\\mbox{Num people affected by $D$}}{\\mbox{Population size}}$$\nOften, the prevalence is estimated from a sample as the relative frequency of people affected by the event in the sample. It is also common to express that frequency as a percentage.\nExample. To estimate the prevalence of flu a sample of 1000 persons has been studied and 150 of them had flu. Thus, the prevalence of flu is approximately 150/1000=0.15, that is, 15%.\nIncidence Incidence measures the probability of occurrence of a medical event in a population within a given period of time. Incidence can be measured as a cumulative proportion or as a rate.\nDefinition - Cumulative incidence. 
The cumulative incidence of a medical event $D$ is the proportion of people that experience the event in a period of time, that is, the number of new cases with the event in the period of time divided by the size of the population at risk.\n$$R(D)=\\frac{\\mbox{Num of new cases with $D$}}{\\mbox{Population at risk size}}$$\nExample. A population initially contains 1000 persons without flu and after two years of observation 160 of them got the flu. The incidence proportion of flu is 160 cases per 1000 persons per two years, i.e. 16% per two years.\nIncidence rate or Absolute risk Definition - Incidence rate. The incidence rate or absolute risk of a medical event $D$ is the number of new cases with the event divided by the size of the population at risk and by the number of units of time in a given period.\n$$R(D)=\\frac{\\mbox{Num of new cases with $D$}}{\\mbox{Population at risk size}\\times \\mbox{Num of unit time intervals}}$$\nExample. A population initially contains $1000$ persons without flu and after two years of observation 160 of them got the flu. If we consider the year as the unit of time, the incidence rate of flu is 160 cases per $1000$ persons divided by two years, i.e. 80 cases per 1000 persons-year or 8% persons per year.\nPrevalence vs Incidence Prevalence must not be confused with incidence. 
Prevalence indicates how widespread the medical event is, and is more a measure of the burden of the event on society with no regard to time at risk or when subjects may have been exposed to a possible risk factor, whereas incidence conveys information about the risk of being affected by the event.\nPrevalence can be measured in cross-sectional studies at a particular time, while in order to measure incidence we need a longitudinal study observing the individuals during a period of time.\nIncidence is usually more useful than prevalence in understanding the event etiology: for example, if the incidence of a disease in a population increases, then there is a risk factor that promotes it.\nWhen the incidence is approximately constant for the duration of the event, prevalence is approximately the product of event incidence and average event duration, so\n$$\\mbox{prevalence} = \\mbox{incidence} \\times \\mbox{duration}$$\nComparing risks In order to determine if a factor or characteristic is associated with the medical event we need to compare the risk of the medical event in two populations, one exposed to the factor and the other not exposed. The group of people exposed to the factor is known as the treatment group or experimental group and the group of people unexposed as the control group.\nUsually the cases observed for each group are represented in a 2$\\times$2 table like the one below.\nEvent $D$ No event $\\overline D$ Treatment group (exposed) $a$ $b$ Control group(unexposed) $c$ $d$ Attributable risk or Risk difference $RD$ Definition - Attributable risk. The attributable risk or risk difference of a medical event $D$ for people exposed to a factor is the difference between the absolute risks of the treatment group and the control group.\n$$\\begin{aligned}RD(D) \u0026amp;= \\mbox{Risk in treatment group}-\\mbox{Risk in control group}=\\newline \u0026amp;= R_T(D)-R_C(D)=\\frac{a}{a+b}-\\frac{c}{c+d}. 
\\end{aligned} $$\nThe attributable risk is the risk of an event that is specifically due to the factor of interest.\nObserve that the attributable risk can be positive, when the risk of the treatment group is greater than the risk of the control group, and negative otherwise.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.\nFlu $D$ No flu $\\overline D$ Treatment group (vaccinated) 20 480 Control group (unvaccinated) 80 420\nThe attributable risk of getting the flu for people vaccinated is\n$$RD(D) = \\frac{20}{20+480}-\\frac{80}{80+420} = 0.04 - 0.16 = -0.12.$$\nThis means that the risk of getting flu in vaccinated people is 12 percentage points lower than in unvaccinated people.\nRelative risk $RR$ Definition - Relative risk. The relative risk of a medical event $D$ for people exposed to a factor is the quotient between the proportions of people that acquired the event in a period of time in the treatment and control groups. That is, the quotient between the incidences of the treatment and the control groups.\n$$RR(D)=\\frac{\\mbox{Risk in treatment group}}{\\mbox{Risk in control group}}=\\frac{R_T(D)}{R_C(D)}=\\frac{a/(a+b)}{c/(c+d)}$$\nRelative risk compares the risk of a medical event between the treatment and the control groups.\n$RR=1$ $\\Rightarrow$ There is no association between the event and the exposure to the factor. $RR\u0026lt;1$ $\\Rightarrow$ Exposure to the factor decreases the risk of the event. $RR\u0026gt;1$ $\\Rightarrow$ Exposure to the factor increases the risk of the event. The further from 1, the stronger the association.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. 
Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.\nFlu $D$ No flu $\\overline D$ Treatment group (vaccinated) 20 480 Control group (unvaccinated) 80 420\nThe relative risk of getting the flu for people vaccinated is\n$$RR(D) = \\frac{20/(20+480)}{80/(80+420)} = \\frac{0.04}{0.16} = 0.25.$$\nThis means that vaccinated people were only one-fourth as likely to develop flu as were unvaccinated people, i.e. the vaccine reduces the risk of flu by 75%.\nOdds An alternative way of measuring the risk of a medical event is the odds.\nDefinition - Odds. The odds of a medical event $D$ in a population is the quotient between the number of people that acquired the event and the number of people that did not in a period of time. Unlike incidence or absolute risk, which is a proportion between 0 and 1, the odds can be greater than 1. However, it is possible to convert odds into a probability with the formula\n$$P(D) = \\frac{\\mbox{ODDS}(D)}{\\mbox{ODDS}(D)+1}$$\nExample. A population initially contains $1000$ persons without flu and after a year 160 of them got the flu. The odds of flu is 160/840.\nObserve that the incidence is 160/1000.\nOdds ratio $OR$ Definition - Odds ratio. The odds ratio of a medical event $D$ for people exposed to a factor is the quotient between the odds of acquiring the event in a period of time in the treatment and control groups.\n$$OR(D)=\\frac{\\mbox{Odds in treatment group}}{\\mbox{Odds in control group}}=\\frac{a/b}{c/d}=\\frac{ad}{bc}$$\nOdds ratio compares the odds of a medical event between the treatment and the control groups. The interpretation is similar to the relative risk.\n$OR=1$ $\\Rightarrow$ There is no association between the event and the exposure to the factor. $OR\u0026lt;1$ $\\Rightarrow$ Exposure to the factor decreases the risk of the event. $OR\u0026gt;1$ $\\Rightarrow$ Exposure to the factor increases the risk of the event. 
The further from 1, the stronger the association.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 person without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other received a placebo (control group). The table below summarize the results at the end of the year.\nFlu $D$ No flu $\\overline D$ Treatment group(vaccinated) 20 480 Control group(Unvaccinated) 80 420 The odds ratio of getting the flu for people vaccinated is\n$$OR(D) = \\frac{20/480}{80/420} = 0.21875.$$\nThis means that the odds of getting the flu versus not getting the flu in vaccinated individuals is almost one fifth of that in unvaccinated, i.e. approximately for every 22 persons vaccinated with flu there will be 100 persons unvaccinated with flu.\nRelative risk vs Odds ratio Relative risk and odds ratio are two measures of association but their interpretation is slightly different. While the relative risk expresses a comparison of risks between the treatment and control groups, the odds ratio expresses a comparison of odds, that is not the same than the risk. Thus, an odds ratio of 2 does not mean that the treatment group has the double of risk of acquire the medical event.\nThe interpretation of the odds ratio is trickier because is counterfactual, and give us how many times is more frequent the event in the treatment group in comparison with the control group, assuming that in the control group the event is as frequent as the non-event.\nThe advantage of the odds ratio is that it does not depend on the prevalence or the incidence of the event, and must be used necessarily when the number of people with the medical event is selected arbitrarily in both groups, like in the case-control studies.\nExample. 
In order to determine the association between lung cancer and smoking two samples were selected (the second one with the double of non-cancer individuals) getting the following results:\nSample 1\nCancer No cancer Smokers 60 80 Non-smokers 40 320 $$ \\begin{aligned} RR(D) \u0026amp;= \\frac{60/(60+80)}{40/(40+320)} = 3.86.\\newline OR(D) \u0026amp;= \\frac{60/80}{40/320} = 6. \\end{aligned} $$\nSample 2\nCancer No cancer Smokers 60 160 Non-smokers 40 640 $$ \\begin{aligned} RR(D) \u0026amp;= \\frac{60/(60+160)}{40/(40+640)} = 4.64.\\newline OR(D) \u0026amp;= \\frac{60/160}{40/640} = 6. \\end{aligned} $$\nThus, when we change the incidence or the prevalence of the event (lung cancer) the relative risk changes, while the odds ratio not.\nThe relation between the relative risk and the odds ratio is given by the following formula\n$$RR = \\frac{OR}{1-R_0+R_0OR} = OR\\frac{1-R_1}{1-R_0},$$\nwhere $R_0$ and $R_1$ are the prevalence or the incidence in control and treatment groups respectively.\nThe odds ratio always overestimate the relative risk when it is greater than 1 and underestimate it when it is less than 1. 
However, with rare medical events (with very small prevalence or incidence) the relative risk and the odds ratio are almost the same.\nDiagnostic tests In Epidemiology it is common to use diagnostic test to diagnose diseases.\nIn general, diagnostic tests are not fully reliable and have some risk of misdiagnosis as it is represented in the table below.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Presence of disease }D \u0026amp; \\mbox{Absence of disease }\\bar D\\newline \\hline \\mbox{Test outcome positive } + \u0026amp; \\color{green}{ \\mbox{True Positive } TP} \u0026amp; \\color{red}{\\mbox{False Positive } FP}\\newline \\hline \\mbox{Test outcome negative } - \u0026amp; \\color{red}{\\mbox{False Negative } FN} \u0026amp; \\color{green}{\\mbox{True Negative } TN}\\newline \\hline \\end{array} $$\nSensitivity and specificity of a diagnostic test The performance of a diagnostic test depends on the following two probabilities.\nDefinition - Sensitivity. The sensitivity of a diagnostic test is the proportion of positive outcomes in persons with the disease$$P(+|D)=\\frac{TP}{TP+FN}$$ Definition - Specificity. The specificity of a diagnostic test is the proportion of negative outcomes in persons without the disease$$P(-|\\overline{D})=\\frac{TN}{TN+FP}$$ Sensitivity and specificity interpretation Usually, there is a trade-off between sensitivity and specificity.\nA test with high sensitivity will detect the disease in most sick persons, but it will produce also more false positives than a less sensitive test. This way, a positive outcome in a test with high sensitivity is not useful for confirming the disease, but a negative outcome is useful for ruling out the disease, since it rarely misdiagnoses those who have the disease.\nOn the other hand, a test with a high specificity will rule out the disease in most healthy persons, but it will produce also more false negatives than a less specific test. 
Thus, a negative outcome in a test with high specificity is not useful for ruling out the disease, but a positive outcome is useful to confirm the disease, since the test rarely gives positive outcomes in healthy people.\nDeciding on a test with greater sensitivity or a test with greater specificity depends on the type of disease and the goal of the test. In general, we will use a sensitive test when:\nThe disease is serious and it is important to detect it. The disease is curable. The false positives do not provoke serious traumas. And we will use a specific test when:\nThe disease is important but difficult or impossible to cure. The false positives provoke serious traumas. The treatment of false positives can have dangerous consequences. Predictive values of a diagnostic test But the most important aspect of a diagnostic test is its predictive power, which is measured with the following two posterior probabilities.\nDefinition - Positive predictive value $PPV$. The positive predictive value of a diagnostic test is the proportion of persons with the disease among persons with a positive outcome$$P(D|+) = \\frac{TP}{TP+FP}$$ Definition - Negative predictive value $NPV$. The negative predictive value of a diagnostic test is the proportion of persons without the disease among persons with a negative outcome$$P(\\overline{D}|-) = \\frac{TN}{TN+FN}$$ Positive and negative predictive values allow us to confirm or to rule out the disease, respectively, if they exceed a threshold of $0.5$.\n$$ \\begin{array}{rcl} PPV\u0026gt;0.5 \u0026amp; \\Rightarrow \u0026amp; \\mbox{Disease diagnostic}\\newline NPV\u0026gt;0.5 \u0026amp; \\Rightarrow \u0026amp; \\mbox{Not disease diagnostic} \\end{array} $$\nHowever, these probabilities depend on the proportion of persons with the disease in the population $P(D)$, which is known as the prevalence of the disease. 
They can be calculated from the sensitivity and the specificity of the diagnostic test using the Bayes theorem.\n$$ \\begin{aligned} PPV=P(D|+) \u0026amp;= \\frac{P(D)P(+|D)}{P(D)P(+|D)+P(\\overline{D})P(+|\\overline{D})}\\newline NPV=P(\\overline{D}|-) \u0026amp;= \\frac{P(\\overline{D})P(-|\\overline{D})}{P(D)P(-|D)+P(\\overline{D})P(-|\\overline{D})} \\end{aligned} $$\nThus, with frequent diseases, the positive predictive value increases, and with rare diseases, the negative predictive value increases.\nExample. A diagnostic test for the flu has been tried in a random sample of 1000 persons. The results are summarized in the table below.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Presence of flu } D \u0026amp; \\mbox{Absence of flu } \\bar D\\newline \\hline \\mbox{Test outcome } + \u0026amp; 95 \u0026amp; 90 \\newline \\hline \\mbox{Test outcome }- \u0026amp; 5 \u0026amp; 810 \\newline \\hline \\end{array} $$\nAccording to this sample, the prevalence of the flu can be estimated as\n$$P(D) = \\frac{95+5}{1000} = 0.1.$$\nThe sensitivity of this diagnostic test is\n$$P(+|D) = \\frac{95}{95+5}= 0.95.$$\nAnd the specificity is\n$$P(-|\\overline{D}) = \\frac{810}{90+810}=0.9.$$\nThe predictive positive value of the diagnostic test is\n$$PPV = P(D|+) = \\frac{95}{95+90} = 0.5135.$$\nAs this value is over $0.5$, this means that we will diagnose the flu if the outcome of the test is positive. 
However, the confidence in the diagnostic will be low, as this value is pretty close to $0.5$.\nOn the other hand, the predictive negative value is\n$$NPV = P(\\overline{D}|-) = \\frac{810}{5+810} = 0.9939.$$\nAs this value is almost 1, that means that is almost sure that a person does not have the flu if he or she gets a negative outcome in the test.\nThus, this test is a powerful test to rule out the flu, but not so powerful to confirm it.\nLikelihood ratios of a diagnostic test The following measures are usually derived from sensitivity and specificity.\nDefinition - Positive likelihood ratio $LR+$. The positive likelihood ratio of a diagnostic test is the ratio between the probability of positive outcomes in persons with the disease and healthy persons respectively,\n$$LR+=\\frac{P(+|D)}{P(+|\\overline{D})} = \\frac{\\mbox{Sensitivity}}{1-\\mbox{Specificity}}$$\nDefinition - Negative likelihood ratio $LR-$. The negative likelihood ratio of a diagnostic test is the ratio between the probability of negative outcomes in persons with the disease and healthy persons respectively, $$LR-=\\frac{P(-|D)}{P(-|\\overline{D})} = \\frac{1-\\mbox{Sensitivity}}{\\mbox{Specificity}}$$ Positive likelihood ratio can be interpreted as the number of times that a positive outcome is more probable in people with the disease than in people without it.\nOn the other hand, negative likelihood ratio can be interpreted as the number of times that a negative outcome is more probable in people with the disease than in people without it.\nPost-test probabilities can be calculated from pre-test probabilities through likelihood ratios.\n$$P(D|+) = \\frac{P(D)P(+|D)}{P(D)P(+|D)+P(\\overline{D})P(+|\\overline{D})} = \\frac{P(D)LR+}{1-P(D)+P(D)LR+}$$\nThus,\nA likelihood ratio greater than 1 increases the probability of disease. A likelihood ratio less than 1 decreases the probability of disease. A likelihood ratio 1 does not change the pre-test probability. 
","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1615158565,"objectID":"dc3b86f5c99c3bb3d06c28a98d3a21e5","permalink":"/en/teaching/statistics/manual/probability/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/probability/","section":"teaching","summary":"Descriptive Statistics provides methods to describe the variables measured in the sample and their relations, but it does not allow to draw any conclusion about the population.\nNow it is time to take the leap from the sample to the population and the bridge for that is Probability Theory.","tags":["Statistics","Biostatistics","Descriptive-Statistics"],"title":"Probability","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Spreadsheets are used mainly for doing calculations and one of the most powerful features of spreadsheets are calculation formulas. In this section we will see how to use them.\nEnter formulas To enter a formula in a cell always start typing an equal sign = and then the formula expression.\nFormula expressions can contain arithmetic operators: addition +, subtraction -, multiplication *, division / and powers ^ and named predefined functions like SUM, EXP, SIN, etc. This allow to use Excel as a calculator. When Excel evaluates expressions first evaluate named functions, then powers, then products and quotients, and finally additions and subtractions, but it\u0026rsquo;s possible to use parenthesis to force the evaluation of a subexpression before.\nExample Assuming that cells A1, B1 and C1 contain the values 6,3 and 2 respectively, the next table shows some formulas and their respective results.\nFormula Result A1+B1-C1 7 A1+B1*C1 12 (A1+B1)*C1 18 A1/B1-C1 0 A1/(B1-C1) 6 A1+B1^C1 15 (A1+B1)^C1 81 Example. 
The animation below shows how to enter the formula 4+2 in cell A1, the formula 4-2 in cell B1, the formula 4*2 in cell C1, the formula 4/2 in cell D1, the formula 4^2 in cell E1 and the formula ((4+1)*2)^3 in cell F1.\nUsing relative and absolute cell references in formulas Formula expressions can contain references to cells. When Excel evaluates formulas it replaces every cell reference by its content before doing the calculation.\nExample. The animation below shows how to use the formula =A1+B1 to add up the content of cells A1 and B1 in cell C1.\nReferences that are formed by the name of the cell or range are known as relative references, because the referenced cells change when you copy a cell with a formula and paste it in another cell. In general, when you copy a formula $n$ columns to the right and $m$ rows down, the referenced cells in the formula will be updated to the cells $n$ columns to the right and $m$ rows down, and the same applies if you copy the cell to the left or up.\nExample. The animation below shows how to copy the formula =A1+B1 in cell C1, with relative references to A1 and B1, to the cell E4, that is 2 columns to the right and 3 rows down. Observe how the formula in cell E4 is updated to =C4+D4.\nA common way of copying the formula of a cell to adjacent cells is clicking the bottom-right corner of the cell and dragging the cursor to the desired range of cells.\nExample. The animation below shows how to generate the first ten numbers of the Fibonacci sequence. Cells A1 and B1 contain the first two numbers of the series and cell C1 the formula =A1+B1 that adds the first two numbers up and gives the third number of the series. For generating the rest of the series it is enough to copy the formula of cell C1 to the range D1:J1. 
Observe how references in formulas of these cells are updated.\nAlthough relative references are very helpful in many cases, sometimes we need the references in a formula to remain fixed when copied elsewhere.\nIn that case we need to use absolute references, which are like relative references but with the column name or the row name preceded by a $ sign to fix either the row, the column or both on any cell reference.\nExample. The animation below shows how to calculate the IVA of a list of prices. Cells A2 to A5 contain the prices and cell F4 contains the IVA percentage. For calculating the IVA of the first price we use the formula =A2*F$4/100 where we fix the row of cell F4 because we want it to remain fixed when copying the formula down. Observe how the reference to cell F4 doesn\u0026rsquo;t change when copying the formula down.\nExample. The animation below shows how to calculate the multiplication table using absolute references.\nIn general, if you want to fix a reference in a formula that you intend to copy horizontally, you must precede the column name with a $ sign; and if you intend to copy the formula vertically, you must precede the row name with a $ sign.\nNaming cells and ranges Cell references are somewhat abstract, and don\u0026rsquo;t really communicate anything about the data they contain. This makes formulas that involve multiple references difficult to understand. To overcome this difficulty Excel allows you to give names to cells or ranges. To define a cell or range name, select the cell or range and click the Define Name button of the Defined Names panel in the ribbon\u0026rsquo;s Formulas tab. In the dialog that appears give a name to the cell and click OK. Cell or range names must begin with a letter and can\u0026rsquo;t include spaces.\nYou can also set the name of a cell or range in the name box of the input bar.\nAfter that you can use that cell or range name in any formula. Observe that references with names are always absolute.\nExample. 
The animation below shows how to calculate the IVA of a list of prices using a cell name for the cell that contains the IVA percentage.\nFunctions Excel has a huge library of predefined functions that perform different calculations, organised by categories. There are three ways to enter a function in a formula expression:\nType it directly if you know its name and syntax. Select it from the buttons of the Function Library panel in the ribbon\u0026rsquo;s Formulas tab. Click the Insert Function button from the input bar. This will show you a dialog where you can type some key words to search for the desired function and select it. This dialog also shows help about the function and its syntax. Numeric functions Numeric functions work with numbers or cells that contain numbers. They are the most frequently used.\nSUM function The most common function is SUM, which calculates the sum of several numbers. Its syntax is SUM(number1,number2,...) where number1, number2, etc. are the numbers or cell ranges that you want to sum.\nExample The animation below shows how to calculate the sum of the subject grades for every student in a course.\nSUMIF function The SUMIF function is similar to the SUM function but only sums the numbers that satisfy a given criterion. Its syntax is SUMIF(range,criterion,sum-range), where range is the cell range to check the criterion, criterion is the condition expression of the criterion, and sum-range is the range with the values to sum (if this argument is not provided, the sum is calculated over the values of the range argument that meet the criterion).\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a pattern text with wildcards like the question mark ? 
(that matches any single character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the sum of the grades greater than or equal to 5 for every student in a course.\nCOUNT function The COUNT function counts the number of cells with numbers in a range. Its syntax is COUNT(value1,value2,...) where value1, value2, etc. are the values or cell ranges to count.\nExample The animation below shows how to calculate the number of subject grades for every student in a course.\nCOUNTIF function The COUNTIF function is similar to the COUNT function but only counts the cells that satisfy a given criterion. Its syntax is COUNTIF(range,criterion), where range is the cell range to check the criterion and criterion is the condition expression of the criterion.\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a pattern text with wildcards like the question mark ? (that matches any single character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the number of passed subjects (grade greater than or equal to 5).\nMIN function The MIN function calculates the minimum value of several numbers. Its syntax is MIN(number1,number2,...) where number1, number2, etc. are numbers or cell ranges for which you want the minimum.\nExample The animation below shows how to calculate the minimum grade for every student in a course.\nMAX function The MAX function calculates the maximum value of several numbers. Its syntax is MAX(number1,number2,...) where number1, number2, etc. 
are numbers or cell ranges for which you want the maximum.\nExample The animation below shows how to calculate the maximum grade for every student in a course.\nISNUMBER function The ISNUMBER function checks whether a value is a number and returns the logical value TRUE if it is and FALSE if it is not. Its syntax is ISNUMBER(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain numbers or not. Observe that in the example cells with numbers are aligned to the right and that dates are numbers.\nLogical functions Logical functions are very useful for making decisions.\nIF function The most important logical function is the IF function, which checks whether a condition is met and returns one value if it is true and another value if it is false. Its syntax is IF(condition,true_value,false_value), where condition is the logical condition to test, true_value is the returned value if the condition is true, and false_value is the returned value if the condition is false.\nIn the logical condition expression you can use logical operators like equal =, not equal \u0026lt;\u0026gt;, greater \u0026gt;, less \u0026lt;, greater than or equal to \u0026gt;=, less than or equal to \u0026lt;=, etc. In the true or false value you can put numbers, text in double quotes, dates, cell references or other formulas.\nExample The animation below shows how to use the IF function to decide if students pass or don\u0026rsquo;t pass a course depending on whether the average grade is greater than or equal to 5.\nAND function The AND function will return TRUE if all its arguments are true and FALSE if at least one argument is false. 
Its syntax is AND(condition1,condition2,...), where condition1, condition2, etc. are logical conditions.\nThe following table, known as a truth table, shows the value returned by the AND function according to the corresponding values of its arguments.\nA B AND(A,B) TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE Example. The animation below shows how to use the AND function to see which students have passed all the subjects of a course with a grade greater than or equal to 5. Observe that conditions that involve blank cells are always false.\nOR function The OR function will return TRUE if one or more of its arguments are true and FALSE if all its arguments are false. Its syntax is OR(condition1,condition2,...), where condition1, condition2, etc. are logical conditions.\nThe following truth table shows the value returned by the OR function according to the corresponding values of its arguments.\nA B OR(A,B) TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE Example. The animation below shows how to use the OR function to see which students have failed at least one subject of a course (a subject is passed with a grade greater than or equal to 5).\nNOT function The NOT function will return TRUE if its argument is FALSE, and FALSE if its argument is TRUE. Its syntax is NOT(condition), where condition is a logical condition.\nThe following truth table shows the value returned by the NOT function according to the corresponding values of its argument.\nA NOT(A) TRUE FALSE FALSE TRUE Date and time functions Date and time functions perform operations with dates and times respectively.\nExcel automatically converts any entry with a date or time format into a serial number. For dates, this serial number represents the number of days that have elapsed since the beginning of the twentieth century (so that January 1, 1900, is serial number 1; January 2, 1900, is serial number 2; and so on). 
For times, this serial number is a fraction that represents the number of hours, minutes, and seconds that have elapsed since midnight (so that 00:00:00 is serial number 0.00000000, 12:00:00 p.m. (noon) is serial number 0.50000000; 11:00:00 p.m. is 0.95833333; and so on).\nTime elapsed between two dates or times. To calculate the time elapsed between two dates or times, just enter a formula that subtracts the earlier date or time from the later date or time. In the case of dates, Excel will return the number of days between these dates. If you want to express it in years, just divide the number of days by 365.25. In the case of times, Excel will return the number of hours between these times. If you want to express it in days, just change the cell format to General.\nExample. The animation below shows how to calculate the time elapsed between two dates and two times.\nTODAY function The function TODAY returns the system date (usually the current date). Its syntax is TODAY() and this function doesn\u0026rsquo;t have arguments.\nExample. The animation below shows how to calculate the current age of a person using the TODAY function.\nDATE function The function DATE returns a date serial number for the date specified by the year, month, and day arguments. Its syntax is DATE(year,month,day), where year is the year, month is the month (as a number) and day is the day.\nExample. The animation below shows how to calculate the date given the year, month and day.\nDAY, WEEKDAY, MONTH and YEAR functions The DAY function returns the day of the month of a date. Its syntax is DAY(date), where date is the serial number of the date.\nThe WEEKDAY function returns the day of the week of a date. 
Its syntax is WEEKDAY(date,type), where date is the serial number of the date and type has three possible values (1: 1 equals Sunday and 7 equals Saturday; 2: 1 equals Monday and 7 equals Sunday; 3: 0 equals Monday and 6 equals Sunday).\nThe MONTH function returns the number of the month of a date. Its syntax is MONTH(date), where date is the serial number of the date.\nThe YEAR function returns the year of a date. Its syntax is YEAR(date), where date is the serial number of the date.\nExample. The animation below shows how to calculate the day, week day, month and year of a date.\nNOW function The function NOW returns the system time (usually the current time). Its syntax is NOW() and this function doesn\u0026rsquo;t have arguments.\nExample. The animation below shows how to calculate the current age of a person using the TODAY function.\nTIME function The function TIME returns a time serial number for the time specified by the hours, minutes and seconds arguments. Its syntax is TIME(hours,minutes,seconds), where hours is the hour, minutes is the minute and seconds is the second.\nExample. The animation below shows how to calculate the time given the hours, minutes and seconds.\nHOUR, MINUTE and SECOND functions The HOUR function returns the hour of a time. Its syntax is HOUR(time), where time is the serial number of the time.\nThe MINUTE function returns the minute of a time. Its syntax is MINUTE(time), where time is the serial number of the time.\nThe SECOND function returns the second of a time. Its syntax is SECOND(time), where time is the serial number of the time.\nExample. The animation below shows how to calculate the hour, minute and second of a time.\nText functions Text functions perform different actions on text data.\nTEXT function The TEXT function converts a number into text using a format specified by the user. 
Its syntax is TEXT(number,format) where number is a number or a cell reference that you want to convert to text, and format is the format pattern for the text in double quotes. In that pattern you can use a 0 for numbers, . for decimal separator, d for days, m for months, y years, h for hours, m for minutes and s for seconds. Also you can use currency signs and the percentage sign %.\nExample The animation below shows how to convert different numbers, dates and times to text.\nVALUE function The VALUE function converts a text string into a number. Its syntax is VALUE(text) where text is a text or a cell reference with text that represents a number.\nExample The animation below shows how to convert different text strings representing numbers, times and percentages to numbers.\nT function The T function checks if a value is text and if so, returns the text; Otherwise, the function returns an empty text string. Its syntax is T(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain text or not. Observe that in the example cells with text are aligned to the left.\nISTEXT function The ISTEXT function checks if a value is text or not and returns the logical value TRUE in the first case and FALSE in the second. Its syntax is ISTEXT(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain text or not. Observe that in the example cells with text are aligned to the left.\nLEN function The LEN function counts the number of characters of a text string. Its syntax is LEN(text) where text is a text string or a cell reference with text.\nExample The animation below shows how to count the number of characters of several words. Observe that numbers are previously converted to text, and that blank cells have 0 characters.\nCONCATENATE function The CONCATENATE function joins together two or more text strings into a combined text string. 
Its syntax is CONCATENATE(text1,text2,...) where text1, text2, \u0026hellip; are text strings or cell ranges with text to join.\nExample The animation below shows how to concatenate the first name and the last name of some persons with a blank space between them.\nFIND and SEARCH functions The FIND function returns the position of a specified character or sub-string within a given text string. Its syntax is FIND(find_text,within_text,[start_num]) where find_text is the sub-string to find, within_text is the text in which to find the sub-string, and start_num is an optional argument that specifies the position in the within_text string from which the search should begin (if omitted, the search starts from the first character). The search is case-sensitive.\nThe SEARCH function works the same as the FIND function except that it is not case-sensitive.\nExample The animation below shows how to calculate the position of some text sub-strings in a text with the FIND and the SEARCH functions.\nSUBSTITUTE function The SUBSTITUTE function replaces one or more instances of a specified text sub-string with another one within a given text string. Its syntax is SUBSTITUTE(text, old_text, new_text, [instance_num]) where text is the text in which to perform the substitution, old_text is the sub-string to replace, new_text is the new text string that replaces the old_text string, and instance_num is an optional argument that specifies which occurrence of old_text should be replaced by new_text (if this argument is not specified, all instances of old_text are replaced with new_text). The search is case-sensitive.\nExample The animation below shows how to replace some sub-strings in some texts by other text strings.\nLOWER and UPPER functions The LOWER function converts all characters in a text string to lower case. 
Its syntax is LOWER(text) where text is the text to convert to lower case.\nThe UPPER function works like the LOWER function but it converts text to upper case.\nExample The animation below shows how to convert to lower case some text strings.\nDatabase functions See the Database functions section.\nMathematical functions Some common mathematical functions included in the function library are the exponential, logarithmic and trigonometric functions.\nSQRT function The SQRT function calculates the square root of a number. Its syntax is SQRT(number) where number is a number or a cell reference for which you want the square root.\nExample The animation below shows how to calculate the square root of grades in a course.\nEXP function The EXP function calculates the exponential of a number. Its syntax is EXP(number) where number is a number or a cell reference for which you want the exponential.\nExample The animation below shows how to calculate the exponential of grades in a course.\nLN and LOG functions The LN function calculates the natural logarithm of a number (that is, with base $e$). Its syntax is LN(number) where number is a number or a cell reference for which you want the natural logarithm.\nThe LOG function calculates the logarithm of a number in a given base. Its syntax is LOG(number,[base]) where number is a number or a cell reference for which you want the logarithm and base is the base of the logarithm (if this argument is omitted, then base 10 is taken).\nExample The animation below shows how to calculate the natural logarithm and the base 10 logarithm of grades in a course.\nPI function The PI function returns the constant value of $\\pi$. Its syntax is PI() without arguments.\nSIN, COS and TAN functions The SIN function calculates the sine of an angle in radians. Its syntax is SIN(angle) where angle is a number or a cell reference with the radians for which you want the sine.\nThe COS function calculates the cosine of an angle in radians. 
Its syntax is COS(angle) where angle is a number or a cell reference with the radians for which you want the cosine.\nThe TAN function calculates the tangent of an angle in radians. Its syntax is TAN(angle) where angle is a number or a cell reference with the radians for which you want the tangent.\nIf angles are in degrees, they must first be converted to radians with the function RADIANS(degrees) where degrees is a number or a cell reference with the degrees that you want to convert to radians.\nExample The animation below shows how to calculate the sine, cosine and tangent of several angles. Observe that the sine of an angle of 180 degrees is not exactly 0 because the RADIANS function does not calculate the radians corresponding to a number of degrees with total accuracy.\nROUND function The ROUND function rounds a number to a specified number of digits. Its syntax is ROUND(number,digits) where number is a number or a cell reference that you want to round and digits is the number of digits to which you want to round the number.\nExample The animation below shows how to round the grades in a course.\nABS function The ABS function calculates the absolute value of a number. Its syntax is ABS(number) where number is a number or a cell reference for which you want the absolute value.\nStatistical functions Excel provides functions to calculate the main descriptive statistics, probability distributions and also to make inferences about the population. For an introduction to Statistics, visit the Statistics manual page.\nAVERAGE function The AVERAGE function calculates the arithmetic mean of several numbers. Its syntax is AVERAGE(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the average.\nExample The animation below shows how to calculate the average grade for every student in a course. 
Observe that the average grade is correctly calculated even when there are blank cells in the range.\nAVERAGEIF function The AVERAGEIF function calculates the arithmetic mean of numbers in a cell range that meet a given criterion. Its syntax is AVERAGEIF(range,criterion,[average-range]) where range is the cell range to check the criterion, criterion is the condition expression of the criterion, and average-range is the range with the values to average (if this argument is not provided, the average is calculated over the values of the range argument that meet the criterion).\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a pattern text with wildcards like the question mark ? (that matches any character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the average grade of students with a grade greater than or equal to 5 for every subject in a course.\nMEDIAN function The MEDIAN function calculates the median of several numbers. Its syntax is MEDIAN(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the median.\nExample The animation below shows how to calculate the median grade for every student in a course. Observe that the median grade is correctly calculated even when there are blank cells in the range.\nMODE function The MODE function calculates the mode of several numbers. Its syntax is MODE(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the mode.\nExample The animation below shows how to calculate the mode grade for every student in a course. Observe that the mode grade is not calculated when there are no repeated values.\nPERCENTILE.EXC function The PERCENTILE.EXC function calculates the k-th percentile of numbers in a cell range. 
Its syntax is PERCENTILE.EXC(range,k) where range is the cell range with the values for which you want the percentile, and k is the relative frequency (between 0 and 1) of the percentile.\nExample The animation below shows how to calculate the quartiles (percentiles 25, 50 and 75) of grades for every student in a course. Observe that if we use a cell reference for the k argument, putting a relative frequency in that cell (0.25 for first quartile, 0.5 for second quartile and 0.75 for third quartile) we get the correspondent percentile.\nVAR.P function The VAR.P function calculates the variance of several numbers. Its syntax is VAR.P(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the variance.\nExample The animation below shows how to calculate the variance of grades for every student in a course. Observe that the variance is well calculated even when there are blank cells in the range.\nSTDEV.P function The STDEV.P function calculates the standard deviation of several numbers. Its syntax is STDEV.P(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the standard deviation.\nExample The animation below shows how to calculate the standard deviation of grades for every student in a course. Observe that you can also calculate the standard deviation applying the square root to the variance.\nSKEW function The SKEW function calculates the skewness coefficient of several numbers. Its syntax is SKEW(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the skewness coefficient. 
Excel 2010 uses the following formula to calculate skewness:\n$$g_1=\\frac{n}{(n-1)(n-2)}\\sum \\left(\\frac{x_i-\\bar x}{s}\\right)^3,$$\nwhere $\\bar x$ is the mean and $s$ is the standard deviation.\nExample The animation below shows how to calculate the skewness coefficient of grades for every subject in a course.\nKURT function The KURT function calculates the kurtosis coefficient of several numbers. Its syntax is KURT(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the kurtosis coefficient. Excel 2010 uses the following formula to calculate kurtosis:\n$$g_2=\\frac{n(n+1)}{(n-1)(n-2)(n-3)}\\sum \\left(\\frac{x_i-\\bar x}{s}\\right)^4 - \\frac{3(n-1)^2}{(n-2)(n-3)},$$\nwhere $\\bar x$ is the mean and $s$ is the standard deviation.\nExample The animation below shows how to calculate the kurtosis coefficient of grades for every subject in a course.\nOther functions Other common functions are the following.\nISBLANK function The ISBLANK function checks if a value is null or a cell is blank. Its syntax is ISBLANK(value) where value is a value or a cell reference.\nExample The animation below shows how to check if some cells are blank or not. Observe that cell A3 is not blank because it contains a blank space.\nISERROR function The ISERROR function checks if a value or cell is an error. Its syntax is ISERROR(value) where value is a value or a cell reference.\nExample The animation below shows how to check if some cells have errors.\nAuditing formulas When Excel cannot perform an operation or when there is an error in a formula, it shows an error. Some common errors are\n#NAME? error. Occurs when Excel does not recognize text in a formula. Usually happens when you misspell the name of a function. #VALUE! error. Occurs when a formula has the wrong type of argument. Usually happens when you try to perform mathematical operations with cells that do not contain numbers. #DIV/0! error. 
Occurs when a formula tries to divide a number by 0 or an empty cell. #REF! error. Occurs when a formula refers to a cell that is not valid. Usually happens when a formula refers to a deleted cell. #NUM! error. Occurs when a formula or function contains invalid numeric values. For example when trying to calculate the square root of a negative number. #N/A error. Occurs when a value is not available to a function or formula. In complex formulas it can be difficult to detect the error. Fortunately, Excel provides some tools for tracking down errors.\nTracing formulas The simplest procedure to trace formulas is to double-click a cell with a formula. This will show the cells referenced by the formula marked in different colours.\nAnother possibility is to trace precedent or dependent references. If you select a cell with a formula and click the Trace Precedents button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab, Excel will show arrows to the cells that affect the value of the selected cell. And if you click the Trace Dependents button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab, Excel will show arrows to the cells that are affected by the selected cell. To remove the arrows, simply click the Remove Arrows button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab.\nExample The animation below shows how to trace a formula to calculate the price of a product without discount, with discount but without taxes and with discount and taxes.\nError checking If a formula has an error, you can check where the error comes from by selecting the cell with the error and clicking the Error Checking button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab. This will display a dialog with the formula expression, an explanation of the error and several options. If the error is in the selected cell you can click the option Show Calculation Steps to evaluate the formula (see the section Formula evaluation). 
But if the error is in a cell that affects the selected cell you can click the option Trace Error. This will show red arrows to the cells where the error comes from.\nExample The animation below shows how to check an error in a formula to calculate the price of a product without discount, with discount but without taxes and with discount and taxes.\nFormula evaluation In general, you can evaluate any formula, even if it has no error, by selecting the cell with the formula and clicking the Evaluate Formula button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab. This will display a dialog where you can evaluate the formula step by step.\nExample The animation below shows how to check an error in a formula to calculate the price of a product without discount, with discount but without taxes and with discount and taxes.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"90b31b8635dd3d2cb0c0a2711c78a68c","permalink":"/en/teaching/excel/manual/formulas/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/formulas/","section":"teaching","summary":" ","tags":["Excel"],"title":"Formulas","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Give some examples of:\nNon related variables. Variables that are increasingly related. Variables that are decreasingly related. Solution The daily average temperature and the daily number of births in a city. The hours preparing an exam and the score. The weight of a person and the time required to run 100 meters. 
Exercise 2 In a study about the effect of different doses of a drug, 2 patients got 2 mg and took 5 days to cure, 4 patients got 2 mg and took 6 days to cure, 2 patients got 3 mg and took 3 days to cure, 4 patients got 3 mg and took 5 days to cure, 1 patient got 3 mg and took 6 days to cure, 5 patients got 4 mg and took 3 days to cure and 2 patients got 4 mg and took 5 days to cure.\nConstruct the joint frequency table. Get the marginal frequency distributions and compute the main statistics for each variable. Compute the covariance and interpret it. Solution $$ \\begin{array}{c|c|c|c} \\hline \\mbox{dose/days} \u0026amp; 3 \u0026amp; 5 \u0026amp; 6\\newline \\hline 2 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4\\newline \\hline 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 1\\newline \\hline 4 \u0026amp; 5 \u0026amp; 2 \u0026amp; 0\\newline \\hline \\end{array} $$\n$$ \\begin{array}{c|c|c|c|c} \\hline \\mbox{dose/days} \u0026amp; 3 \u0026amp; 5 \u0026amp; 6 \u0026amp; \\mbox{Sum}\\newline \\hline 2 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4 \u0026amp; 6\\newline \\hline 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 1 \u0026amp; 7\\newline \\hline 4 \u0026amp; 5 \u0026amp; 2 \u0026amp; 0 \u0026amp; 7\\newline \\hline \\mbox{Sum} \u0026amp; 7 \u0026amp; 8 \u0026amp; 5 \u0026amp; 20\\newline \\hline \\end{array} $$\nDose: $\\bar x=3.05$ mg, $s_x^2=0.6475$ mg$^2$, $s_x=0.8047$ mg. Days: $\\bar y=4.55$ days, $s_y^2=1.4475$ days$^2$, $s_y=1.2031$ days. 3. 
$s_{xy}=-0.6775$ mg$\\cdot$days.\nExercise 3 The table below shows the two-dimensional frequency distribution of a sample of 80 persons in a study about the relation between the blood cholesterol ($X$) in mg/dl and the high blood pressure ($Y$).\n$$ \\begin{array}{|c||c|c|c||c|} \\hline X\\setminus Y \u0026amp; [110,130) \u0026amp; [130,150) \u0026amp; [150,170) \u0026amp; n_x \\newline \\hline\\hline [170,190) \u0026amp; \u0026amp; 4 \u0026amp; \u0026amp; 12\\newline \\hline [190,210) \u0026amp; 10 \u0026amp; 12 \u0026amp; 4 \u0026amp; \\newline \\hline [210,230) \u0026amp; 7 \u0026amp; \u0026amp; 8 \u0026amp; \\newline \\hline [230,250) \u0026amp; 1 \u0026amp; \u0026amp; \u0026amp; 18\\newline \\hline\\hline n_y \u0026amp; \u0026amp; 30 \u0026amp; 24 \u0026amp; \\newline \\hline \\end{array} $$\nComplete the table. Construct the linear regression model of cholesterol on pressure. Use the linear model to calculate the expected cholesterol for a person with pressure 160 mmHg. According to the linear model, what is the expected pressure for a person with cholesterol 270 mg/dl? Use the following sums: $\\sum x_i=16960$ mg/dl, $\\sum y_j=11160$ mmHg, $\\sum x_i^2=3627200$ (mg/dl)$^2$, $\\sum y_j^2=1576800$ mmHg$^2$ and $\\sum x_iy_j=2378800$ mg/dl$\\cdot$mmHg.\nSolution $$ \\begin{array}{|c||c|c|c||c|} \\hline X\\setminus Y \u0026amp; [110,130) \u0026amp; [130,150) \u0026amp; [150,170) \u0026amp; n_x \\newline \\hline\\hline [170,190) \u0026amp; 8 \u0026amp; 4 \u0026amp; 0 \u0026amp; 12\\newline \\hline [190,210) \u0026amp; 10 \u0026amp; 12 \u0026amp; 4 \u0026amp; 26 \\newline \\hline [210,230) \u0026amp; 7 \u0026amp; 9 \u0026amp; 8 \u0026amp; 24 \\newline \\hline [230,250) \u0026amp; 1 \u0026amp; 5 \u0026amp; 12 \u0026amp; 18\\newline \\hline\\hline n_y \u0026amp; 26 \u0026amp; 30 \u0026amp; 24 \u0026amp; 80\\newline \\hline \\end{array} $$\n$\\bar x=212$ mg/dl, $s_x^2=396$ (mg/dl)$^2$. $\\bar y=139.5$ mmHg, $s_y^2=249.75$ mmHg$^2$. $s_{xy}=161$ mg/dl$\\cdot$mmHg. 
Regression line of cholesterol on blood pressure: $x=122.0721 + 0.6446y$. 3. $x(160)=225.2152$ mg/dl. 4.\nRegression line of blood pressure on cholesterol: $y=53.3081 + 0.4066x$. $y(270)=163.0808$ mmHg.\nExercise 4 A research study has been conducted to determine the loss of activity of a drug. The table below shows the results of the experiment.\n$$ \\begin{array}{lrrrrr} \\hline \\mbox{Time (in years)} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \\newline \\mbox{Activity (%)} \u0026amp; 96 \u0026amp; 84 \u0026amp; 70 \u0026amp; 58 \u0026amp; 52 \\newline \\hline \\end{array} $$\nConstruct the linear regression model of activity on time. According to the linear model, when will the activity be 80%? When will the drug have lost all activity? Solution $\\bar x=3$ years, $s_x^2=2$ years$^2$. $\\bar y=72$ %, $s_y^2=264$ %$^2$. $s_{xy}=-22.8$ years$\\cdot$%. Regression line of activity on time: $y=106.2 - 11.4x$. Regression line of time on activity: $x=9.2182 - 0.0864y$. $x(80)=2.3091$ years and $x(0)=9.2182$ years.\nExercise 5 A basketball team is testing a new stretching program to reduce the injuries during the league. The data below show the daily number of minutes doing stretching exercises and the number of injuries along the league.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Stretching minutes} \u0026amp; 0 \u0026amp; 30 \u0026amp; 10 \u0026amp; 15 \u0026amp; 5 \u0026amp; 25 \u0026amp; 35 \u0026amp; 40\\newline \\mbox{Injuries} \u0026amp; 4 \u0026amp; 1 \u0026amp; 2 \u0026amp; 2 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 1\\newline \\hline \\end{array} $$\nConstruct the regression line of the number of injuries on the time of stretching. How much is the reduction of injuries for every minute of stretching? How many minutes of stretching are required for having no injuries? Is this prediction reliable? 
Use the following sums ($X$=Number of minutes stretching, and $Y$=Number of injuries): $\\sum x_i =160$ min, $\\sum y_j=14$ injuries, $\\sum x_i^2=4700$ min$^2$, $\\sum y_j^2=36$ injuries$^2$ and $\\sum x_iy_j=160$ min$\\cdot$injuries.\nSolution $\\bar x=20$ min, $s_x^2=187.5$ min$^2$. $\\bar y=1.75$ injuries, $s_y^2=1.4375$ injuries$^2$. $s_{xy}=-15$ min$\\cdot$injuries. Regression line of injuries on time of stretching: $y=3.35 - 0.08x$. $0.08$ injuries/min. Regression line of time of stretching on injuries: $x=38.2609 - 10.4348y$. $x(0)=38.2609$ min. $r^2=0.8348$.\nExercise 6 For two variables $X$ and $Y$ we have\nThe regression line of $Y$ on $X$ is $y-x-2=0$. The regression line of $X$ on $Y$ is $y-4x+22=0$. Calculate:\nThe means $\\bar x$ and $\\bar y$. The correlation coefficient. Solution $\\bar x=8$ and $\\bar y=10$. $r=0.5$. Exercise 7 The means of two variables $X$ and $Y$ are $\\bar x=2$ and $\\bar y=1$, and the correlation coefficient is 0.\nPredict the value of $Y$ for $x=10$. Predict the value of $X$ for $y=5$. Plot both regression lines. Solution $y(10)=1$. $x(5)=2$. Exercise 8 A study to determine the relation between the age and the physical strength gave the scatter plot below. Calculate the linear coefficient of determination for the whole sample. Calculate the linear coefficient of determination for the sample of people younger than 25 years old. Calculate the linear coefficient of determination for the sample of people older than 25 years old. For which age group is the relation between age and strength stronger? 
Use the following sums ($X$=Age and $Y=$Weight lifted).\nWhole sample: $\\sum x_i=431$ years, $\\sum y_j=769$ Kg, $\\sum x_i^2=13173$ years$^2$, $\\sum y_j^2=39675$ Kg$^2$ and $\\sum x_iy_j=21792$ years$\\cdot$Kg.\nYoung people: $\\sum x_i=123$ years, $\\sum y_j=294$ Kg, $\\sum x_i^2=2339$ years$^2$, $\\sum y_j^2=14418$ Kg$^2$ and $\\sum x_iy_j=5766$ years$\\cdot$Kg.\nOld people: $\\sum x_i=308$ years, $\\sum y_j=475$ Kg, $\\sum x_i^2=10834$ years$^2$, $\\sum y_j^2=25257$ Kg$^2$ and $\\sum x_iy_j=16026$ years$\\cdot$Kg.\nSolution $\\bar x=26.9375$ years, $s_x^2=97.6836$ years$^2$. $\\bar y=48.0625$ kg, $s_y^2=169.6836$ kg$^2$. $s_{xy}=67.3164$ years$\\cdot$kg. $r^2=0.2734$. $\\bar x=17.5714$ years, $s_x^2=25.3878$ years$^2$. $\\bar y=42$ kg, $s_y^2=295.7143$ kg$^2$. $s_{xy}=85.7143$ years$\\cdot$kg. $r^2=0.9786$. $\\bar x=34.2222$ years, $s_x^2=32.6173$ years$^2$. $\\bar y=52.7778$ kg, $s_y^2=20.8395$ kg$^2$. $s_{xy}=-25.5062$ years$\\cdot$kg. $r^2=0.9571$. The linear relation between the age and the physical strength is a little bit stronger in the group of young people. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1601555270,"objectID":"8f518bade28c9dd4b2f3818225d824e9","permalink":"/en/teaching/statistics/problems/linear_regression/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/linear_regression/","section":"teaching","summary":"Exercise 1 Give some examples of:\nNon related variables. Variables that are increasingly related. Variables that are decreasingly related. 
Solution The daily average temperature and the daily number of births in a city.","tags":["Regression","Linear Regression"],"title":"Problems of Linear Regression","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Random variables The process of drawing a sample randomly is a random experiment and any variable measured in the sample is a random variable because the values taken by the variable in the individuals of the sample are a matter of chance.\nDefinition - Random variable. A random variable $X$ is a function that maps every element of the sample space of a random experiment to a real number.\n$$X:\\Omega \\rightarrow \\mathbb{R}$$\nThe set of values that the variable can assume is called the range and is represented by $\\mbox{Ran}(X)$.\nIn essence, a random variable is a variable whose values come from a random experiment, and every value has a probability of occurrence.\nExample. The variable $X$ that measures the outcome of rolling a die is a random variable and its range is $\\mbox{Ran}(X)=\\{1,2,3,4,5,6\\}$.\nTypes of random variables There are two types of random variables:\nDiscrete. They take isolated values, and their range is countable. Example. Number of children of a family, number of smoked cigarettes, number of subjects passed, etc.\nContinuous. They can take any value in a real interval, and their range is uncountable. Example. Weight, height, age, cholesterol level, etc.\nThe way of modelling each type of variable is different. In this chapter we are going to study how to model discrete variables.\nProbability distribution of a discrete random variable As values of a discrete random variable are linked to the elementary events of a random experiment, every value has a probability.\nDefinition - Probability function. 
The probability function of a discrete random variable $X$ is the function $f(x)$ that maps every value $x_i$ of the variable to its probability$$f(x_i) = P(X=x_i).$$ We can also accumulate probabilities the same way that we accumulated sample frequencies.\nDefinition - Distribution function. The distribution function of a discrete random variable $X$ is the function $F(x)$ that maps every value $x_i$ of the variable to the probability of having a value less than or equal to $x_i$$$F(x_i) = P(X\\leq x_i) = f(x_1)+\\cdots +f(x_i).$$ The range of a discrete random variable and its probability function is known as probability distribution of the variable, and it is usually presented in a table\n$$ \\begin{array}{|c|cccc|c|} \\hline X \u0026amp; x_1 \u0026amp; x_2 \u0026amp; \\cdots \u0026amp; x_n \u0026amp; \\sum\\newline \\hline f(x) \u0026amp; f(x_1) \u0026amp; f(x_2) \u0026amp; \\cdots \u0026amp; f(x_n) \u0026amp; 1\\newline \\hline F(x) \u0026amp; F(x_1) \u0026amp; F(x_2) \u0026amp; \\cdots \u0026amp; F(x_n) =1 \u0026amp; \\newline \\hline \\end{array} $$\nThe same way that the sample frequency table shows the distribution of values of a variable in the sample, the probability distribution of a discrete random variable shows the distribution of values in the whole population.\nExample. Let $X$ be the discrete random variable that measures the number of heads after tossing two coins. 
The probability tree of the random experiment is\nAccording to this, the probability distribution of $X$ is\n$$\\begin{array}{|c|ccc|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2\\newline \\hline f(x) \u0026amp; 0.25 \u0026amp; 0.5 \u0026amp; 0.25\\newline \\hline F(x) \u0026amp; 0.25 \u0026amp; 0.75 \u0026amp; 1 \\newline \\hline \\end{array} \\qquad F(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$}\\newline 0.25 \u0026amp; \\mbox{if $0\\leq x\u0026lt; 1$}\\newline 0.75 \u0026amp; \\mbox{if $1\\leq x\u0026lt; 2$}\\newline 1 \u0026amp; \\mbox{if $x\\geq 2$} \\end{cases} $$\nPopulation statistics The same way we use sample statistics to describe the sample frequency distribution of a variable, we use population statistics to describe the probability distribution of a random variable in the whole population.\nThe population statistics definition is analogous to the sample statistics definition, but using probabilities instead of relative frequencies.\nThe most important are 1:\nDefinition - Discrete random variable mean The mean or the expected value of a discrete random variable $X$ is the sum of the products of its values and its probabilities:\n$$\\mu = E(X) = \\sum_{i=1}^n x_i f(x_i)$$\nDefinition - Discrete random variable variance and standard deviation The variance of a discrete random variable $X$ is the sum of the products of its squared values and its probabilities, minus the squared mean:\n$$\\sigma^2 = Var(X) = \\sum_{i=1}^n x_i^2 f(x_i) -\\mu^2$$\nThe standard deviation of a random variable $X$ is the square root of the variance:\n$$\\sigma = +\\sqrt{\\sigma^2}$$\nExample. 
In the random experiment of tossing two coins the probability distribution is\n$$ \\begin{array}{|c|ccc|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2\\newline \\hline f(x) \u0026amp; 0.25 \u0026amp; 0.5 \u0026amp; 0.25\\newline \\hline F(x) \u0026amp; 0.25 \u0026amp; 0.75 \u0026amp; 1 \\newline \\hline \\end{array} $$\nThe main population statistics are\n$$ \\begin{aligned} \\mu \u0026amp;= \\sum_{i=1}^n x_i f(x_i) = 0\\cdot 0.25 + 1\\cdot 0.5 + 2\\cdot 0.25 = 1 \\mbox{ heads},\\newline \\sigma^2 \u0026amp;= \\sum_{i=1}^n x_i^2 f(x_i) -\\mu^2 = (0^2\\cdot 0.25 + 1^2\\cdot 0.5 + 2^2\\cdot 0.25) - 1^2 = 0.5 \\mbox{ heads}^2,\\newline \\sigma \u0026amp;= +\\sqrt{0.5} = 0.71 \\mbox{ heads}. \\end{aligned} $$\nDiscrete probability distribution models According to the type of experiment where the random variable is measured, there are different probability distribution models. The most common are\nDiscrete uniform Binomial Poisson Discrete uniform distribution $U(a,b)$ When all the values of a random variable $X$ have equal probability, the probability distribution of $X$ is uniform.\nDefinition - Discrete uniform distribution $U(a,b)$. A discrete random variable $X$ follows a discrete uniform distribution model with parameters $a$ and $b$, noted $X\\sim U(a,b)$, if its range is $\\mbox{Ran}(X) = \\{a, a+1, \\ldots,b\\}$ and its probability function is\n$$f(x)=\\frac{1}{b-a+1}.$$\nObserve that $a$ and $b$ are the minimum and the maximum of the range respectively.\nThe mean and the variance are\n$$\\mu = \\sum_{i=0}^{b-a}\\frac{a+i}{b-a+1}=\\frac{a+b}{2} \\qquad \\sigma^2 =\\sum_{i=0}^{b-a}\\frac{(a+i-\\mu)^2}{b-a+1}=\\frac{(b-a+1)^2-1}{12}$$\nExample. 
The variable that measures the outcome of rolling a die follows a discrete uniform distribution model $U(1,6)$.\nBinomial distribution $B(n,p)$ Usually the binomial distribution corresponds to a variable measured in a random experiment with the following features:\nThe experiment consists of a sequence of $n$ repetitions of the same trial. Each trial is repeated in identical conditions and produces two possible outcomes known as Success or Failure. The trials are independent. The probability of Success is the same in all the trials and is $P(\\mbox{Success})=p$. Under these conditions, the discrete random variable $X$ that measures the number of successes in the $n$ trials follows a binomial distribution model with parameters $n$ and $p$.\nDefinition - Binomial distribution $B(n,p)$. A discrete random variable $X$ follows a binomial distribution model with parameters $n$ and $p$, noted $X\\sim B(n,p)$, if its range is $\\mbox{Ran}(X) = \\{0,1,\\ldots,n\\}$ and its probability function is\n$$f(x) = \\binom{n}{x}p^x(1-p)^{n-x} = \\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}.$$\nObserve that $n$ is known as the number of repetitions of a trial and $p$ is known as the probability of Success in every repetition.\nThe mean and the variance are\n$$\\mu = n\\cdot p \\qquad \\sigma^2 = n\\cdot p\\cdot (1-p).$$\nExample. 
The variable that measures the number of heads after tossing 10 coins follows a binomial distribution model $B(10,0.5)$.\nAccording to this,\nThe probability of getting 4 heads is $$f(4) = \\binom{10}{4}0.5^4 (1-0.5)^{10-4} = \\frac{10!}{4!6!}0.5^40.5^6 = 210\\cdot 0.5^{10} = 0.2051.$$\nThe probability of getting 2 or fewer heads is $$\\begin{aligned} F(2) \u0026amp;= f(0) +f(1) + f(2) =\\newline \u0026amp;= \\binom{10}{0}0.5^0 (1-0.5)^{10-0} + \\binom{10}{1}0.5^1 (1-0.5)^{10-1} + \\binom{10}{2}0.5^2 (1-0.5)^{10-2} =\\newline \u0026amp;= 0.0547.\\end{aligned} $$\nAnd the expected number of heads is $$\\mu = 10\\cdot 0.5 = 5 \\mbox{ heads}.$$\nExample. 
In a population, 40% of people are smokers. The variable $X$ that measures the number of smokers in a random sample with replacement of 3 persons follows a binomial distribution model $X\\sim B(3,0.4)$.\n$$ \\begin{align*} f(0)\u0026amp;=\\displaystyle\\binom{3}{0}0.4^0(1-0.4)^{3-0}= 0.6^3,\\newline f(1)\u0026amp;=\\displaystyle\\binom{3}{1}0.4^1(1-0.4)^{3-1}= 3\\cdot 0.4\\cdot 0.6^2,\\newline f(2)\u0026amp;=\\displaystyle\\binom{3}{2}0.4^2(1-0.4)^{3-2}= 3\\cdot 0.4^2\\cdot 0.6,\\newline f(3)\u0026amp;=\\displaystyle\\binom{3}{3}0.4^3(1-0.4)^{3-3}= 0.4^3. \\end{align*} $$\nPoisson distribution $P(\\lambda)$ Usually the Poisson distribution corresponds to a variable measured in a random experiment with the following features:\nThe experiment consists of observing the number of events occurring in a fixed interval of time or space. For instance, number of births in a month, number of emails in one hour, number of red blood cells in a volume of blood, etc. The events occur independently. The experiment produces the same average rate of events $\\lambda$ for every interval unit. Under these conditions, the discrete random variable $X$ that measures the number of events in an interval unit follows a Poisson distribution model with parameter $\\lambda$.\nDefinition - Poisson distribution $P(\\lambda)$. A discrete random variable $X$ follows a Poisson distribution model with parameter $\\lambda$, noted $X\\sim P(\\lambda)$, if its range is $\\mbox{Ran}(X) = \\{0,1,2,\\ldots\\}$ and its probability function is\n$$f(x) = e^{-\\lambda}\\frac{\\lambda^x}{x!}.$$\nObserve that $\\lambda$ is the average rate of events for an interval unit, and it will change if the interval changes.\nThe mean and the variance are\n$$\\mu = \\lambda \\qquad \\sigma^2 = \\lambda.$$\nExample. In a city there is an average of 4 births every day. 
The random variable $X$ that measures the number of births in a day in the city follows a Poisson distribution model $X\\sim P(4)$.\nAccording to this,\nThe probability that there are 5 births in a day is $$f(5) = e^{-4}\\frac{4^5}{5!} = 0.1563.$$\nThe probability that there are fewer than 2 births in a day is $$F(1) = f(0)+f(1) = e^{-4}\\frac{4^0}{0!} + e^{-4}\\frac{4^1}{1!} = 5e^{-4} = 0.0916.$$\nThe probability that there is more than 1 birth in a day is $$P(X\u0026gt;1) = 1-P(X\\leq 1) = 1-F(1) = 1-0.0916 = 0.9084.$$\nApproximation of Binomial by Poisson distribution The Poisson distribution can be obtained from the Binomial distribution when the number of trial repetitions tends to infinity and the probability of Success tends to zero.\nLaw of rare events. The Binomial distribution $X\\sim B(n,p)$ tends to the Poisson distribution $P(\\lambda)$, with $\\lambda=n\\cdot p$, when $n$ tends to infinity and $p$ tends to zero, that is,\n$$\\lim_{n\\rightarrow \\infty, p\\rightarrow 0}\\binom{n}{x}p^x(1-p)^{n-x} = e^{-\\lambda}\\frac{\\lambda^x}{x!}.$$\nIn practice, this approximation can be used for $n\\geq 30$ and $p\\leq 0.1$.\nExample. A vaccine produces an adverse reaction in 4% of cases. 
If a sample of 50 persons are vaccinated, what is the probability of having more than 2 persons with an adverse reaction?\nThe variable that measures the number of persons with an adverse reaction in the sample follows a Binomial distribution model $X\\sim B(50,0.04)$, but as $n=50\u0026gt;30$ and $p=0.04\u0026lt;0.1$, we can apply the law of rare events and use the Poisson distribution model $P(50\\cdot 0.04)=P(2)$ to do the calculations.\n$$ \\begin{aligned} P(X\u0026gt;2) \u0026amp;= 1-P(X\\leq 2) = 1-f(0)-f(1)-f(2) =\\newline \u0026amp;= 1-e^{-2}\\frac{2^0}{0!}-e^{-2}\\frac{2^1}{1!}-e^{-2}\\frac{2^2}{2!} =\\newline \u0026amp;= 1-5e^{-2} = 0.3233.\\end{aligned} $$\nTo distinguish population statistics from sample statistics we use Greek letters.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ba9fad09ad6c5312ddf502710f334f63","permalink":"/en/teaching/statistics/manual/discrete-random-variables/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/discrete-random-variables/","section":"teaching","summary":"Random variables The process of drawing a sample randomly is a random experiment and any variable measured in the sample is a random variable because the values taken by the variable in the individuals of the sample are a matter of chance.","tags":["Statistics","Biostatistics","Random Variables"],"title":"Discrete Random Variables","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Calculus formulas Main Calculus formulas Derivatives Integrals ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"85066dc0c1cbf4c700bd9b4a270786a4","permalink":"/en/teaching/calculus/cheatsheets/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/cheatsheets/","section":"teaching","summary":"Everything you have to know at a glance","tags":["Cheat 
sheet"],"title":"Calculus Cheat Sheets","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Exercise 1 Compute the derivative function of $f(x)=x^3-2x^2+1$ at the points $x=-1$, $x=0$ and $x=1$. Explain your result. Find an equation of the tangent line to the graph of $f$ at each of the three given points.\nSolution $f\u0026rsquo;(-1)=7$, $f\u0026rsquo;(0)=0$ and $f\u0026rsquo;(1)=-1$.\nTangent line at $x=-1$: $y=-2+7(x+1)$.\nTangent line at $x=0$: $y=1$.\nTangent line at $x=1$: $y=-(x-1)$. Exercise 2 The pH measures the concentration of hydrogen ions H$^+$ in an aqueous solution. It is defined by $$ \\mbox{pH} = -\\log_{10}(\\mbox{H}^+). $$ Compute the derivative of the pH as a function of the concentration of H$^+$. Study the growth of the pH function.\nSolution The pH decreases as the concentration of hydrogen ions H$^+$ increases. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"5d7a39f96e69b5aea4b51f6c60707876","permalink":"/en/teaching/calculus/problems/derivatives-1/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/problems/derivatives-1/","section":"teaching","summary":"Exercise 1 Compute the derivative function of $f(x)=x^3-2x^2+1$ at the points $x=-1$, $x=0$ and $x=1$. Explain your result. Find an equation of the tangent line to the graph of $f$ at each of the three given points.","tags":["Derivatives"],"title":"Problems of Derivatives","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 A dietary center is testing a new diet in a sample of 12 persons. The data below are the number of days of diet and the weight loss (in kg) so far for each person.\n(33,3.9) (51,5.9) (30,3.2) (55,6) (38,4.9) (62,6.2) (35,4.5) (60,6.1) (44,5.6) (69,6.2) (47,5.8) (40,5.3) Draw the scatter plot. According to the point cloud, what type of regression model better explains the relation between the weight loss and the days of diet? 
Construct the linear regression model and the logarithmic regression model of the weight loss on the number of days of diet. Use the best model to predict the weight that a person will lose after 40 and 100 days of diet. Are these predictions reliable? Use the following sums ($X$=days of diet and $Y$=weight loss): $\\sum x_i=564$ days, $\\sum \\log(x_i)=45.8086$ $\\log(\\mbox{days})$, $\\sum y_j=63.6$ kg, $\\sum x_i^2=28234$ days$^2$, $\\sum \\log(x_i)^2=175.6603$ $\\log(\\mbox{days})^2$, $\\sum y_j^2=347.7$ kg$^2$, $\\sum x_iy_j=3108.5$ days$\\cdot$kg, $\\sum \\log(x_i)y_j=245.4738$ $\\log(\\mbox{days})\\cdot$kg.\nSolution 2. Linear model $\\bar x=47$ days, $s_x^2=143.8333$ days$^2$. $\\bar y=5.3$ kg, $s_y^2=0.885$ kg$^2$. $s_{xy}=9.9417$ days$\\cdot$kg. Regression line of weight loss on days of diet: $y=2.0514 + 0.0691x$. $r^2=0.7765$. Logarithmic model $\\overline{\\log(x)}=3.8174$ log(days), $s_{\\log(x)}^2=0.0659$ log(days)$^2$. $s_{\\log(x)y}=0.224$ log(days)$\\cdot$kg. Logarithmic model of weight loss on days of diet: $y=-7.6678 + 3.397\\log(x)$. $r^2=0.8599$. 3. $y(40)=4.8635$ kg and $y(100)=7.9761$ kg. The predictions are reliable because the coefficient of determination is close to 1, but the last one is less reliable as 100 is far from the observed range of values in the sample.\nExercise 2 The concentration of a drug in blood, in mg/dl, depends on time, in hours, according to the data below.\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Time} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8\\newline \\mbox{Drug concentration} \u0026amp; 25 \u0026amp; 36 \u0026amp; 48 \u0026amp; 64 \u0026amp; 86 \u0026amp; 114 \u0026amp; 168\\newline \\hline \\end{array} $$\nConstruct the linear regression model of drug concentration on time. Construct the exponential regression model of drug concentration on time. Use the best regression model to predict the drug concentration after $4.8$ hours. Is this prediction reliable? 
Justify your answer. Use the following sums ($C$=Drug concentration and $T$=time): $\\sum t_i=35$ h, $\\sum \\log(t_i)=10.6046$ $\\log(\\mbox{h})$, $\\sum c_j=541$ mg/dl, $\\sum \\log(c_j)= 29.147$ $\\log(\\mbox{mg/dl})$, $\\sum t_i^2=203$ h$^2$, $\\sum \\log(t_i)^2=17.5206$ $\\log(\\mbox{h})^2$, $\\sum c_j^2=56937$ (mg/dl)$^2$, $\\sum \\log(c_j)^2=124.0131$ $\\log(\\mbox{mg/dl})^2$, $\\sum t_ic_j=3328$ h$\\cdot$mg/dl, $\\sum t_i\\log(c_j)=154.3387$ h$\\cdot\\log(\\mbox{mg/dl})$, $\\sum \\log(t_i)c_j=951.6961$ $\\log(\\mbox{h})\\cdot$mg/dl, $\\sum\\log(t_i)\\log(c_j)=46.08046$ $\\log(\\mbox{h})\\cdot\\log(\\mbox{mg/dl})$.\nSolution $\\bar x=5$ hours, $s_x^2=4$ hours$^2$. $\\bar y=77.2857$ mg/dl, $s_y^2=2160.7755$ (mg/dl)$^2$. $s_{xy}=89$ hours$\\cdot$mg/dl. Regression line of drug concentration on time: $y=-33.9643 + 22.25x$. $r^2=0.9165$. $\\overline{\\log(y)}=4.1639$ log(mg/dl), $s_{\\log(y)}^2=0.3785$ log(mg/dl)$^2$. $s_{x\\log(y)}=1.2291$ hours$\\cdot$log(mg/dl). Exponential model of drug concentration on time: $y=e^{2.6275 + 0.3073x}$. $r^2=0.9979$. 3. $y(4.8)=60.4853$ mg/dl.\nExercise 3 A researcher is studying the relation between the obesity and the response to pain. The obesity is measured as the percentage over the ideal weight, and the response to pain as the nociceptive flexion pain threshold. The results of the study appears in the table below.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Obesity} \u0026amp; 89 \u0026amp; 90 \u0026amp; 77 \u0026amp; 30 \u0026amp; 51 \u0026amp; 75 \u0026amp; 62 \u0026amp; 45 \u0026amp; 90 \u0026amp; 20\\newline \\mbox{Pain threshold} \u0026amp; 10 \u0026amp; 12 \u0026amp; 11.5 \u0026amp; 4.5 \u0026amp; 5.5 \u0026amp; 7 \u0026amp; 9 \u0026amp; 8 \u0026amp; 15 \u0026amp; 3\\newline \\hline \\end{array} $$\nAccording to the scatter plot, what model explains better the relation of the response to pain on the obesity? 
According to the best regression model, what is the response to pain expected for a person with an obesity of 50%? Is this prediction reliable? According to the best regression model, what is the expected obesity for a person with a pain threshold of 10? Is this prediction reliable? Use the following sums ($X$=Obesity and $Y$=Pain threshold): $\\sum x_i=629$, $\\sum \\log(x_i)=40.4121$, $\\sum y_j=92.2$, $\\sum \\log(y_j)=21.339$, $\\sum x_i^2=45445$, $\\sum \\log(x_i)^2=165.6795$, $\\sum y_j^2=960.14$, $\\sum \\log(y_j)^2=47.6231$, $\\sum x_iy_j=6537.7$, $\\sum x_i\\log(y_j)=1443.1275$, $\\sum \\log(x_i)y_j=387.5728$, $\\sum \\log(x_i)\\log(y_j)=88.3696$.\nSolution 2. Linear model $\\bar x=62.9$, $s_x^2=588.09$. $\\bar y=9.22$, $s_y^2=11.0056$. $s_{xy}=82.0356$. Regression line of pain threshold on obesity: $y=1.3232 + 0.1255x$. $r^2=0.8422$. Logarithmic model $\\overline{\\log(x)}=4.0412$, $s_{\\log(x)}^2=0.2366$. $s_{\\log(x)y}=1.4973$. Logarithmic model of pain threshold on obesity: $y=-16.3578 + 6.3293\\log(x)$. $r^2=0.8611$. $y(50)=8.4023$. 3.\nExponential model of obesity on pain threshold: $x=e^{2.7868 + 0.1361y}$. $x(10)=63.2648$.\nExercise 4 A blood bank keeps plasma at a temperature of 0ºF. When it is required for a blood transfusion, it is heated in an oven at a constant temperature of 120ºF. In an experiment, the temperature of the plasma was measured at different times during the heating. 
The results are in the table below.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Time (min)}\t\u0026amp; 5 \u0026amp; 8 \u0026amp; 15 \u0026amp; 25 \u0026amp; 30 \u0026amp; 37 \u0026amp; 45 \u0026amp; 60\\newline \\mbox{Temperature (ºF)} \u0026amp; 25 \u0026amp; 50 \u0026amp; 86 \u0026amp; 102 \u0026amp; 110 \u0026amp; 114 \u0026amp; 118 \u0026amp; 120\\newline \\hline \\end{array} $$\nPlot the scatter plot. Which type of regression model do you think better explains the relationship between temperature and time? 
Which transformation should we apply to the variables to have a linear relationship? Compute the logarithmic regression of the temperature on time. According to the logarithmic model, what will the temperature of the plasma be after 15 minutes of heating? Is this prediction reliable? Justify your answer. Use the following sums ($X$=Time and $Y$=Temperature): $\\sum x_i=225$ min, $\\sum \\log(x_i)=24.5289$ log(min), $\\sum y_j=725$ ºF, $\\sum \\log(y_j)=35.2051$ log(ºF), $\\sum x_i^2=8833$ min², $\\sum \\log(x_i)^2=80.4703$ log²(min), $\\sum y_j^2=74345$ ºF², $\\sum \\log(y_j)^2=157.1023$ log²(ºF), $\\sum x_iy_j=24393$ min⋅ºF, $\\sum x_i\\log(y_j)=1048.0142$ min⋅log(ºF), $\\sum \\log(x_i)y_j=2431.7096$ log(min)⋅ºF, $\\sum \\log(x_i)\\log(y_j)=111.1165$ log(min)log(ºF).\nSolution A logarithmic model. 2. Apply a logarithmic transformation to time $z=\\log(x)$. $\\bar z=3.0661$ log(min), $s_z^2=0.6577$ log²(min). $\\bar y=90.625$ ºF, $s_y^2=1080.2344$ ºF². $s_{zy}=26.0969$ log(min)ºF. Logarithmic model of temperature on time: $y=-31.0325 + 39.6781\\log(x)$. $y(15)=76.4176$ ºF. $r^2=0.9586$, that is close to 1, so the prediction is reliable. Exercise 5 The activity of a radioactive substance depends on time according to the data in the table below.\n$$ \\begin{array}{lrrrrrrrr} \\hline t\\mbox{ (hours)} \u0026amp; 0 \u0026amp; 10 \u0026amp; 20 \u0026amp; 30 \u0026amp; 40 \u0026amp; 50 \u0026amp; 60 \u0026amp; 70 \\newline A\\mbox{ ($10^7$ disintegrations/s)} \u0026amp; 25.9 \u0026amp; 8.16 \u0026amp; 2.57 \u0026amp; 0.81 \u0026amp; 0.25 \u0026amp; 0.08 \u0026amp; 0.03 \u0026amp; 0.01\\newline \\hline \\end{array} $$\nRepresent graphically the data of radioactivity as a function of time. Which type of regression model better explains the relationship between radioactivity and time? Represent graphically the data of radioactivity as a function of time on semi-logarithmic paper. Compute the regression line of the logarithm of radioactivity on time. 
Taking into account that radioactive decay follows the formula $$ A(t) = A_0 e^{-\\lambda t} $$ where $A_0$ is the number of disintegrations at the beginning and $\\lambda$ is a disintegration constant, different for each radioactive substance, use the slope of the previous regression line to compute the disintegration constant for the substance. Use the following sums ($X$=Time and $Y$=Radioactivity): $\\sum x_i=280$ hours, $\\sum y_j=37.81$ 10⁷ disintegrations/s, $\\sum \\log(y_j)=-5.9371$ log(10⁷ disintegrations/s), $\\sum x_i^2=14000$ hours², $\\sum y_j^2=744.7265$ 10⁷ disintegrations/s², $\\sum \\log(y_j)^2=57.7369$ log²(10⁷ disintegrations/s), $\\sum x_iy_j=173.8$ hours⋅10⁷ disintegrations/s, $\\sum x_i\\log(y_j)=-680.9447$ hours⋅log(10⁷ disintegrations/s).\nSolution 2. $\\bar x=35$ hours, $s_x^2=525$ hours². $\\bar z=-0.7421$ log(10⁷ disintegrations/s), $s_z^2=6.6664$ log²(10⁷ disintegrations/s). $s_{xz}=-59.1434$ hours⋅log(10⁷ disintegrations/s). Regression line of logarithm of radioactivity on time: $z=3.2008 - 0.1127x$. $\\lambda=0.1127$. Exercise 6 For oscillations of small amplitude, the oscillation period $T$ of a pendulum is given by the formula $$ T = 2\\pi\\sqrt{\\frac{L}{g}} $$ where $L$ is the length of the pendulum and $g$ is the acceleration due to gravity. In order to check whether the previous formula is satisfied, an experiment has been conducted in which the oscillation period was measured for different lengths of the pendulum. The measurements are shown in the table below.\n$$ \\begin{array}{lrrrrr} \\hline L\\text{ (cm)} \u0026amp; 52.5 \u0026amp; 68.0 \u0026amp; 99.0 \u0026amp; 116.0 \u0026amp; 146.0 \\newline T\\text{ (s)} \u0026amp; 1.449 \u0026amp; 1.639 \u0026amp; 1.999 \u0026amp; 2.153 \u0026amp; 2.408\\newline \\hline \\end{array} $$\nRepresent graphically the data of the period versus the length of the pendulum.\nDoes a linear model fit the point cloud well? 
Represent graphically the data of the period versus the length on logarithmic paper. Which type of model fits the point cloud better? Compute the regression line of the logarithm of period on the logarithm of length. Taking into account the independent term of the previous regression line, compute the value of $g$. Solution The linear model fits the point cloud well. 2. The model that best fits the point cloud is linear. 3. Let $X$ be the logarithm of length and $Y$ the logarithm of period, $\\bar x=4.5025$ log(cm), $s_x^2=0.1353$ log(cm)². $\\bar y=0.6407$ log(s), $s_y^2=0.0339$ log(s)². $s_{xy}=0.0677$ log(cm)log(s)\nRegression line of Y on X: $y=-1.6132 + 0.5006x$. 4. $g=994.4579$ cm/s².\nExercise 7 A study tries to determine the relationship between two substances $X$ and $Y$ in blood. The concentrations of these substances have been measured in seven individuals (in $\\mu$g/dl) and the results are shown in the table below.\n$$ \\begin{array}{rrrrrrrr} \\hline X \u0026amp; 2.1 \u0026amp; 4.9 \u0026amp; 9.8 \u0026amp; 11.7 \u0026amp; 5.9 \u0026amp; 8.4 \u0026amp; 9.2 \\newline Y \u0026amp; 1.3 \u0026amp; 1.5 \u0026amp; 1.7 \u0026amp; 1.8 \u0026amp; 1.5 \u0026amp; 1.7 \u0026amp; 1.7 \\newline \\hline \\end{array} $$\nAre $Y$ and $X$ linearly related? Are $Y$ and $X$ related by a power model? Use the best of the previous regression models to predict the concentration in blood of $Y$ for $x=8$ $\\mu$g/dl. Is this prediction reliable? Justify your answer. 
Use the following sums: $\\sum x_i=52$ μg/dl, $\\sum \\log(x_i)=13.1955$ log(μg/dl), $\\sum y_j=11.2$ μg/dl, $\\sum \\log(y_j)=3.253$ log(μg/dl), $\\sum x_i^2=451.36$ (μg/dl)², $\\sum \\log(x_i)^2=26.9397$ log(μg/dl)², $\\sum y_j^2=18.1$ (μg/dl)², $\\sum \\log(y_j)^2=1.5878$ log(μg/dl)², $\\sum x_iy_j=86.57$ (μg/dl)², $\\sum x_i\\log(y_j)=26.3463$ μg/dl⋅log(μg/dl), $\\sum \\log(x_i)y_j=21.7087$ log(μg/dl)⋅μg/dl, $\\sum \\log(x_i)\\log(y_j)=6.5224$ log(μg/dl)².\nSolution $\\bar x=7.4286$ μg/dl, $s_x^2=9.2963$ (μg/dl)². $\\bar y=1.6$ μg/dl, $s_y^2=0.0257$ (μg/dl)². $s_{xy}=0.4814$ (μg/dl)²\nLinear relation: $r^2=0.9696$, that is close to 1, so there is a strong linear relation.\n2. Naming $u=\\log(x)$ and $v=\\log(y)$,\n$\\bar u=1.8851$ log(μg/dl), $s_u^2=0.295$ log(μg/dl)². $\\bar v=0.4647$ log(μg/dl), $s_v^2=0.0109$ log(μg/dl)². $s_{uv}=0.0558$ log(μg/dl)²\nPower relation: $r^2=0.9688$, that is close to 1, so there is a strong power relation, although the linear relation is a little bit stronger.\n3. Regression line of $Y$ on $X$: $y=1.2153 + 0.0518x$. $y(8)=1.6296$ μg/dl. The prediction is reliable since the linear coefficient of determination is close to 1.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1601555270,"objectID":"240d83e0159a0490570775dba8ffaa8d","permalink":"/en/teaching/statistics/problems/non_linear_regression/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/non_linear_regression/","section":"teaching","summary":"Exercise 1 A dietary center is testing a new diet in a sample of 12 persons. 
The data below are the number of days of diet and the weight loss (in kg) so far for each person.","tags":["Regression","Non-linear Regression"],"title":"Problems of Non Linear Regression","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Concept of derivative Increment Definition - Increment of a variable. 
An increment of a variable $x$ is a change in the value of the variable; it is denoted $\\Delta x$. The increment of a variable $x$ along an interval $[a,b]$ is given by $$\\Delta x = b-a.$$ Definition - Increment of a function. The increment of a function $y=f(x)$ along an interval $[a,b]\\subseteq Dom(f)$ is given by $$\\Delta y = f(b)-f(a).$$ Example. The increment of $x$ along the interval $[2,5]$ is $\\Delta x=5-2=3$, and the increment of the function $y=x^2$ along the same interval is $\\Delta y=5^2-2^2=21$.\nAverage rate of change The study of a function $y=f(x)$ requires to understand how the function changes, that is, how the dependent variable $y$ changes when we change the independent variable $x$.\nDefinition - Average rate of change. The average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]\\subseteq Dom(f)$, is the quotient between the increment of $y$ and the increment of $x$ in that interval; it is denoted by $$\\mbox{ARC}\\;f[a,a+\\Delta x]=\\frac{\\Delta y}{\\Delta x}=\\frac{f(a+\\Delta x)-f(a)}{\\Delta x}.$$ Example - Area of a square. Let $y=x^2$ be the function that measures the area of a metallic square of side length $x$.\nIf at any given time the side of the square is $a$, and we heat the square uniformly increasing the side by dilatation a quantity $\\Delta x$, how much will increase the area of the square?\n$$ \\Delta y = f(a+\\Delta x)-f(a)=(a+\\Delta x)^2-a^2= a^2+2a\\Delta x+\\Delta x^2-a^2=2a\\Delta x+\\Delta x^2. $$\nWhat is the average rate of change in the interval $[a,a+\\Delta x]$? 
$$\\mbox{ARC}\\;f[a,a+\\Delta x]=\\frac{\\Delta y}{\\Delta x}=\\frac{2a\\Delta x+\\Delta x^2}{\\Delta x}=2a+\\Delta x.$$\nGeometric interpretation of the average rate of change The average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]$ is the slope of the secant line to the graph of $f$ through the points $(a,f(a))$ and $(a+\\Delta x,f(a+\\Delta x))$.\nInstantaneous rate of change Often it is interesting to study the rate of change of a function, not in an interval, but in a point.\nKnowing the tendency of change of a function in an instant can be used to predict the value of the function in nearby instants.\nDefinition - Instantaneous rate of change and derivative. The instantaneous rate of change of a function $f$ in a point $a$, is the limit of the average rate of change of $f$ in the interval $[a,a+\\Delta x]$, when $\\Delta x$ approaches 0; it is denoted by\n$$ \\begin{aligned} \\textrm{IRC}\\;f (a) \u0026amp;= \\lim_{\\Delta x\\rightarrow 0} \\textrm{ARC}\\; f[a,a+\\Delta x]=\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta y}{\\Delta x}=\\newline \u0026amp;= \\lim_{\\Delta x\\rightarrow 0}\\frac{f(a+\\Delta x)-f(a)}{\\Delta x}. \\end{aligned} $$\nWhen this limit exists, the function $f$ is said to be differentiable at the point $a$, and its value is called the derivative of $f$ at $a$, and it is denoted $f\u0026rsquo;(a)$ (Lagrange’s notation) or $\\frac{df}{dx}(a)$ (Leibniz’s notation).\nExample - Area of a square. 
Let us take again the function $y=x^2$ that measures the area of a metallic square of side $x$.\nIf at any given time the side of the square is $a$, and we heat the square uniformly increasing the side, what is the tendency of change of the area in that moment?\n$$\\begin{aligned} \\textrm{IRC}\\;f(a)\u0026amp;=\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta y}{\\Delta x} = \\lim_{\\Delta x\\rightarrow 0}\\frac{f(a+\\Delta x)-f(a)}{\\Delta x} =\\newline \u0026amp;= \\lim_{\\Delta x\\rightarrow 0}\\frac{2a\\Delta x+\\Delta x^2}{\\Delta x}=\\lim_{\\Delta x\\rightarrow 0} 2a+\\Delta x= 2a. \\end{aligned} $$\nThus, $$f\u0026rsquo;(a)=\\frac{df}{dx}(a)=2a,$$ indicating that the area of the square tends to increase the double of the side.\nInterpretation of the derivative The derivative of a function $f\u0026rsquo;(a)$ shows the growth rate of $f$ at point $a$:\n$f\u0026rsquo;(a)\u0026gt;0$ indicates an increasing tendency ($y$ increases as $x$ increases). $f\u0026rsquo;(a)\u0026lt;0$ indicates a decreasing tendency ($y$ decreases as $x$ increases). Example. A derivative $f\u0026rsquo;(a)=3$ indicates that $y$ tends to increase triple of $x$ at point $a$. A derivative $f\u0026rsquo;(a)=-0.5$ indicates that $y$ tends to decrease half of $x$ at point $a$.\nGeometric interpretation of the derivative We have seen that the average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]$ is the slope of the secant line, but when $\\Delta x$ approaches $0$, the secant line becomes the tangent line.\nThe instantaneous rate of change or derivative of a function $y=f(x)$ at $x=a$ is the slope of the tangent line to the graph of $f$ at point $(a,f(a))$. Thus, the equation of the tangent line to the graph of $f$ at the point $(a,f(a))$ is $$y-f(a) = f\u0026rsquo;(a)(x-a) \\Leftrightarrow y = f(a)+f\u0026rsquo;(a)(x-a)$$\nKinematic applications: Linear motion Assume that the function $y=f(t)$ describes the position of an object moving in the real line at time $t$. 
Taking as reference the coordinates origin $O$ and the unitary vector $\\mathbf{i}=(1)$, we can represent the position of the moving object $P$ at every moment $t$ with a vector $\\vec{OP}=x\\mathbf{i}$ where $x=f(t)$.\nRemark. It also makes sense when $f$ measures other magnitudes as the temperature of a body, the concentration of a gas, or the quantity of substance in a chemical reaction at every moment $t$.\nKinematic interpretation of the average rate of change In this context, if we take the instants $t=t_0$ and $t=t_0+\\Delta t$, both in $\\mbox{Dom}(f)$, the vector $$\\mathbf{v}_m=\\frac{f(t_0+\\Delta t)-f(t_0)}{\\Delta t}$$ is known as the average velocity of the trajectory $f$ in the interval $[t_0, t_0+\\Delta t]$.\nExample. A vehicle makes a trip from Madrid to Barcelona. Let $f(t)$ be the function that determine the position of the vehicle at every moment $t$. If the vehicle departs from Madrid (km 0) at 8:00 and arrives at Barcelona (km 600) at 14:00, then the average velocity of the vehicle in the path is $$\\mathbf{v}_m=\\frac{f(14)-f(8)}{14-8}=\\frac{600-0}{6} = 100 km/h.$$\nKinematic interpretation of the derivative In the same context of the linear motion, the derivative of the function $f(t)$ at the moment $t_0$ is the vector\n$$\\mathbf{v}=f\u0026rsquo;(t_0)=\\lim_{\\Delta t\\rightarrow 0}\\frac{f(t_0+\\Delta t)-f(t_0)}{\\Delta t},$$\nthat is known, as long as the limit exists, as the instantaneous velocity or simply velocity of the trajectory $f$ at moment $t_0$.\nThat is, the derivative of the object position with respect to time is a vector field that is called velocity along the trajectory $f$.\nExample. 
Continuing the previous example, what the speedometer indicates at any instant is the modulus of the instantaneous velocity vector at that moment.\nAlgebra of derivatives Properties of the derivative If $y=c$ is a constant function, then $y\u0026rsquo;=0$ at any point.\nIf $y=x$ is the identity function, then $y\u0026rsquo;=1$ at any point.\nIf $u=f(x)$ and $v=g(x)$ are two differentiable functions, then\n$(u+v)\u0026rsquo;=u\u0026rsquo;+v'$ $(u-v)\u0026rsquo;=u\u0026rsquo;-v'$ $(u\\cdot v)\u0026rsquo;=u\u0026rsquo;\\cdot v+ u\\cdot v'$ $\\left(\\dfrac{u}{v}\\right)\u0026rsquo;=\\dfrac{u\u0026rsquo;\\cdot v-u\\cdot v\u0026rsquo;}{v^2}$ Derivative of a composite function Theorem - Chain rule. If the function $y=f\\circ g$ is the composition of two functions $y=f(z)$ and $z=g(x)$, then $$(f\\circ g)\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x).$$ Proof It is easy to prove this fact using the Leibniz notation $$\\frac{dy}{dx}=\\frac{dy}{dz}\\frac{dz}{dx}=f\u0026rsquo;(z)g\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x).$$ Example. If $f(z)=\\sin z$ and $g(x)=x^2$, then $f\\circ g(x)=\\sin(x^2)$. Applying the chain rule, the derivative of the composite function is $$(f\\circ g)\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x) = \\cos(g(x)) 2x = \\cos(x^2)2x.$$\nOn the other hand, $g\\circ f(z)= (\\sin z)^2$, and applying the chain rule again, its derivative is $$(g\\circ f)\u0026rsquo;(z)=g\u0026rsquo;(f(z))f\u0026rsquo;(z) = 2f(z)\\cos z = 2\\sin z\\cos z.$$\nDerivative of the inverse of a function Theorem - Derivative of the inverse function. Given a function $y=f(x)$ with inverse $x=f^{-1}(y)$, then $$\\left(f^{-1}\\right)\u0026rsquo;(y)=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{f\u0026rsquo;(f^{-1}(y))},$$ provided that $f$ is differentiable at $f^{-1}(y)$ and $f\u0026rsquo;(f^{-1}(y))\\neq 0$. 
Proof It is easy to prove this equality using the Leibniz notation $$\\frac{dx}{dy}=\\frac{1}{dy/dx}=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{f\u0026rsquo;(f^{-1}(y))}$$ Example. The inverse of the exponential function $y=f(x)=e^x$ is the natural logarithm $x=f^{-1}(y)=\\ln y$, so we can compute the derivative of the natural logarithm using the previous theorem and we get $$\\left(f^{-1}\\right)\u0026rsquo;(y)=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{e^x}=\\frac{1}{e^{\\ln y}}=\\frac{1}{y}.$$\nSometimes it is easier to apply the chain rule to compute the derivative of the inverse of a function. In this example, as $\\ln x$ is the inverse of $e^x$, we know that $e^{\\ln x}=x$, so differentiating both sides and applying the chain rule to the left side we get $$(e^{\\ln x})\u0026rsquo;=x\u0026rsquo; \\Leftrightarrow e^{\\ln x}(\\ln(x))\u0026rsquo; = 1 \\Leftrightarrow (\\ln(x))\u0026rsquo;=\\frac{1}{e^{\\ln x}}=\\frac{1}{x}.$$\nAnalysis of functions Analysis of functions: increase and decrease The main application of derivatives is to determine the variation (increase or decrease) of functions. For that we use the sign of the first derivative.\nTheorem. Let $f(x)$ be a function with first derivative in an interval $I\\subseteq \\mathbb{R}$.\nIf $\\forall x\\in I\\ f\u0026rsquo;(x)\u0026gt; 0$ then $f$ is increasing on $I$. If $\\forall x\\in I\\ f\u0026rsquo;(x)\u0026lt; 0$ then $f$ is decreasing on $I$. If $f\u0026rsquo;(x_0)=0$ then $x_0$ is known as a critical point or stationary point. At this point the function can be increasing, decreasing or neither increasing nor decreasing.\nExample The function $f(x)=x^2$ has derivative $f\u0026rsquo;(x)=2x$; it is decreasing on $\\mathbb{R}^-$ as $f\u0026rsquo;(x)\u0026lt; 0$ $\\forall x\\in \\mathbb{R}^-$ and increasing on $\\mathbb{R}^+$ as $f\u0026rsquo;(x)\u0026gt; 0$ $\\forall x\\in \\mathbb{R}^+$. 
It has a critical point at $x=0$, as $f\u0026rsquo;(0)=0$; at this point the function is neither increasing nor decreasing.\nA function can be increasing or decreasing on an interval and not have a first derivative. Example. Let us analyze the increase and decrease of the function $f(x)=x^4-2x^2+1$. Its first derivative is $f\u0026rsquo;(x)=4x^3-4x$.\nAnalysis of functions: relative extrema As a consequence of the previous result we can also use the first derivative to determine the relative extrema of a function.\nTheorem - First derivative test. Let $f(x)$ be a function with first derivative in an interval $I\\subseteq \\mathbb{R}$ and let $x_0\\in I$ be a critical point of $f$ ($f\u0026rsquo;(x_0)=0$).\nIf $f\u0026rsquo;(x)\u0026gt;0$ on an open interval extending left from $x_0$ and $f\u0026rsquo;(x)\u0026lt;0$ on an open interval extending right from $x_0$, then $f$ has a relative maximum at $x_0$. If $f\u0026rsquo;(x)\u0026lt;0$ on an open interval extending left from $x_0$ and $f\u0026rsquo;(x)\u0026gt;0$ on an open interval extending right from $x_0$, then $f$ has a relative minimum at $x_0$. If $f\u0026rsquo;(x)$ has the same sign on both an open interval extending left from $x_0$ and an open interval extending right from $x_0$, then $f$ has an inflection point at $x_0$. A vanishing derivative is a necessary but not sufficient condition for the function to have a relative extremum at a point. Example. The function $f(x)=x^3$ has derivative $f\u0026rsquo;(x)=3x^2$; it has a critical point at $x=0$. However, it does not have a relative extremum at that point, but an inflection point.\nExample. Consider again the function $f(x)=x^4-2x^2+1$ and let us analyze its relative extrema now. Its first derivative is $f\u0026rsquo;(x)=4x^3-4x$.\nAnalysis of functions: concavity The concavity of a function can be determined by the second derivative.\nTheorem. 
Let $f(x)$ be a function with second derivative in an interval $I\\subseteq \\mathbb{R}$.\nIf $\\forall x\\in I\\ f\u0026rsquo;\u0026rsquo;(x)\u0026gt; 0$ then $f$ is concave up (convex) on $I$. If $\\forall x\\in I\\ f\u0026rsquo;\u0026rsquo;(x)\u0026lt; 0$ then $f$ is concave down (concave) on $I$. Example. The function $f(x)=x^2$ has second derivative $f\u0026rsquo;\u0026rsquo;(x)=2\u0026gt;0$ $\\forall x\\in \\mathbb{R}$, so it is concave up in all $\\mathbb{R}$.\nA function can be concave up or down and not have second derivative. Example. Let us analyze the concavity of the same function of previous examples $f(x)=x^4-2x^2+1$. Its second derivative is $f\u0026rsquo;\u0026rsquo;(x)=12x^2-4$.\nFunction approximation Approximating a function with the derivative The tangent line to the graph of a function $f(x)$ at $x=a$ can be used to approximate $f$ in a neighbourhood of $a$.\nThus, the increment of a function $f(x)$ in an interval $[a,a+\\Delta x]$ can be approximated multiplying the derivative of $f$ at $a$ by the increment of $x$ $$\\Delta y \\approx f\u0026rsquo;(a)\\Delta x$$\nExample - Area of a square. In the previous example of the function $y=x^2$ that measures the area of a metallic square of side $x$, if the side of the square is $a$ and we increment it by a quantity $\\Delta x$, then the increment on the area will be approximately $$\\Delta y \\approx f\u0026rsquo;(a)\\Delta x = 2a\\Delta x.$$ In the figure below we can see that the error of this approximation is $\\Delta x^2$, which is smaller than $\\Delta x$ when $\\Delta x$ approaches to 0.\nApproximating a function by a polynomial Another useful application of the derivative is the approximation of functions by polynomials.\nPolynomials are functions easy to calculate (sums and products) with very good properties:\nDefined in all the real numbers. Continuous. Differentiable of all orders with continuous derivatives. 
Goal Approximate a function $f(x)$ by a polynomial $p(x)$ near a point $x=a$.\nApproximating a function by a polynomial of order 0 A polynomial of degree 0 has equation $$p(x) = c_0,$$ where $c_0$ is a constant.\nAs the polynomial should coincide with the function at $a$, it must satisfy $$p(a) = c_0 = f(a).$$\nTherefore, the polynomial of degree 0 that best approximate $f$ near $a$ is $$p(x) = f(a).$$\nApproximating a function by a polynomial of order 1 A polynomial of order 1 has equation $$p(x) = c_0+c_1x,$$ but it can also be written as $$p(x) = c_0+c_1(x-a).$$\nAmong all the polynomials of degree 1, the one that best approximates $f(x)$ near $a$ is that which meets the following conditions\n$p$ and $f$ coincide at $a$: $p(a) = f(a)$, $p$ and $f$ have the same rate of change at $a$: $p\u0026rsquo;(a) = f\u0026rsquo;(a)$. The last condition guarantees that $p$ and $f$ have approximately the same tendency, but it requires the function $f$ to be differentiable at $a$.\nImposing the previous conditions we have\n$p(x)=c_0+c_1(x-a) \\Rightarrow p(a)=c_0+c_1(a-a)=c_0=f(a)$, $p\u0026rsquo;(x)=c_1 \\Rightarrow p\u0026rsquo;(a)=c_1=f\u0026rsquo;(a)$. Therefore, the polynomial of degree 1 that best approximates $f$ near $a$ is $$p(x) = f(a)+f \u0026lsquo;(a)(x-a),$$ which turns out to be the tangent line to $f$ at $(a,f(a))$.\nApproximating a function by a polynomial of order 2 A polynomial of order 2 is a parabola with equation $$p(x) = c_0+c_1x+c_2x^2,$$ but it can also be written as $$p(x) = c_0+c_1(x-a)+c_2(x-a)^2.$$\nAmong all the polynomials of degree 2, the one that best approximate $f(x)$ near $a$ is that which meets the following conditions\n$p$ and $f$ coincide at $a$: $p(a) = f(a)$, $p$ and $f$ have the same rate of change at $a$: $p\u0026rsquo;(a) = f\u0026rsquo;(a)$. $p$ and $f$ have the same concavity at $a$: $p\u0026rsquo;\u0026rsquo;(a)=f\u0026rsquo;\u0026rsquo;(a)$. 
The last condition requires the function $f$ to be differentiable twice at $a$.\nImposing the previous conditions we have\n$p(x)=c_0+c_1(x-a)+c_2(x-a)^2 \\Rightarrow p(a)=c_0+c_1(a-a)+c_2(a-a)^2=c_0=f(a)$, $p\u0026rsquo;(x)=c_1+2c_2(x-a) \\Rightarrow p\u0026rsquo;(a)=c_1+2c_2(a-a)=c_1=f\u0026rsquo;(a)$. $p\u0026rsquo;\u0026rsquo;(x)=2c_2 \\Rightarrow p\u0026rsquo;\u0026rsquo;(a)=2c_2=f\u0026rsquo;\u0026rsquo;(a) \\Rightarrow c_2=\\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}$. Therefore, the polynomial of degree 2 that best approximates $f$ near $a$ is $$p(x) = f(a)+f\u0026rsquo;(a)(x-a)+\\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}(x-a)^2.$$\nApproximating a function by a polynomial of order $n$ A polynomial of order $n$ has equation $$p(x) = c_0+c_1x+c_2x^2+\\cdots +c_nx^n,$$ but it can also be written as $$p(x) = c_0+c_1(x-a)+c_2(x-a)^2+\\cdots +c_n(x-a)^n.$$\nAmong all the polynomials of degree $n$, the one that best approximates $f(x)$ near $a$ is that which meets the following $n+1$ conditions:\n$p(a) = f(a)$, $p\u0026rsquo;(a) = f\u0026rsquo;(a)$, $p\u0026rsquo;\u0026rsquo;(a)=f\u0026rsquo;\u0026rsquo;(a)$, $\\cdots$ $p^{(n)}(a)=f^{(n)}(a)$. The successive derivatives of $p$ are\n$$ \\begin{aligned} p(x) \u0026amp;= c_0+c_1(x-a)+c_2(x-a)^2+\\cdots +c_n(x-a)^n,\\newline p\u0026rsquo;(x)\u0026amp; = c_1+2c_2(x-a)+\\cdots +nc_n(x-a)^{n-1},\\newline p\u0026rsquo;\u0026rsquo;(x)\u0026amp; = 2c_2+\\cdots +n(n-1)c_n(x-a)^{n-2},\\newline \\vdots \\newline p^{(n)}(x)\u0026amp;= n(n-1)(n-2)\\cdots 1 c_n=n!c_n. \\end{aligned} $$\nImposing the previous conditions we have\n$p(a) = c_0+c_1(a-a)+c_2(a-a)^2+\\cdots +c_n(a-a)^n=c_0=f(a)$, $p\u0026rsquo;(a) = c_1+2c_2(a-a)+\\cdots +nc_n(a-a)^{n-1}=c_1=f\u0026rsquo;(a)$, $p\u0026rsquo;\u0026rsquo;(a) = 2c_2+\\cdots +n(n-1)c_n(a-a)^{n-2}=2c_2=f\u0026rsquo;\u0026rsquo;(a)\\Rightarrow c_2=f\u0026rsquo;\u0026rsquo;(a)/2$, $\\cdots$ $p^{(n)}(a)=n!c_n=f^{(n)}(a)\\Rightarrow c_n=\\frac{f^{(n)}(a)}{n!}$. Taylor polynomial of order $n$ Definition - Taylor polynomial. 
Given a function $f(x)$ differentiable $n$ times at $x=a$, the Taylor polynomial of order $n$ of $f$ at $a$ is the polynomial with equation\n$$ \\begin{aligned} p_{f,a}^n(x) \u0026amp;= f(a) + f\u0026rsquo;(a)(x-a) + \\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}(x-a)^2 + \\cdots + \\frac{f^{(n)}(a)}{n!}(x-a)^n = \\newline \u0026amp;= \\sum_{i=0}^{n}\\frac{f^{(i)}(a)}{i!}(x-a)^i. \\end{aligned} $$\nThe Taylor polynomial of order $n$ of $f$ at $a$ is the $n$th degree polynomial that best approximates $f$ near $a$, as it is the only one that meets the previous conditions. Example. Let us approximate the function $f(x)=\\log x$ near the value $1$ by a polynomial of order $3$.\nThe equation of the Taylor polynomial of order $3$ of $f$ at $a=1$ is $$p_{f,1}^3(x)=f(1)+f\u0026rsquo;(1)(x-1)+\\frac{f\u0026rsquo;\u0026rsquo;(1)}{2}(x-1)^2+\\frac{f\u0026rsquo;\u0026rsquo;\u0026rsquo;(1)}{3!}(x-1)^3.$$ The derivatives of $f$ at $1$ up to order $3$ are\n$$ \\begin{array}{lll} f(x)=\\log x \u0026amp; \\quad \u0026amp; f(1)=\\log 1 =0,\\newline f\u0026rsquo;(x)=1/x \u0026amp; \u0026amp; f\u0026rsquo;(1)=1/1=1,\\newline f\u0026rsquo;\u0026rsquo;(x)=-1/x^2 \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;(1)=-1/1^2=-1,\\newline f\u0026rsquo;\u0026rsquo;\u0026rsquo;(x)=2/x^3 \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;\u0026rsquo;(1)=2/1^3=2. \\end{array} $$\nAnd substituting into the polynomial equation we get $$p_{f,1}^3(x)=0+1(x-1)+\\frac{-1}{2}(x-1)^2+\\frac{2}{3!}(x-1)^3= \\frac{1}{3}x^3-\\frac{3}{2}x^2+3x-\\frac{11}{6}.$$\nMaclaurin polynomial of order $n$ The Taylor polynomial equation has a simpler form when the polynomial is calculated at $0$. This special case of Taylor polynomial at $0$ is known as the Maclaurin polynomial.\nDefinition - Maclaurin polynomial. 
Given a function $f(x)$ differentiable $n$ times at $0$, the Maclaurin polynomial of order $n$ of $f$ is the polynomial with equation\n$$ \\begin{aligned} p_{f,0}^n(x)\u0026amp;=f(0)+f\u0026rsquo;(0)x+\\frac{f\u0026rsquo;\u0026rsquo;(0)}{2}x^2+\\cdots +\\frac{f^{(n)}(0)}{n!}x^n = \\newline \u0026amp;=\\sum_{i=0}^{n}\\frac{f^{(i)}(0)}{i!}x^i. \\end{aligned} $$\nExample. Let us approximate the function $f(x)=\\sin x$ near the value $0$ by a polynomial of order $3$.\nThe Maclaurin polynomial equation of order $3$ of $f$ is $$p_{f,0}^3(x)=f(0)+f\u0026rsquo;(0)x+\\frac{f\u0026rsquo;\u0026rsquo;(0)}{2}x^2+\\frac{f\u0026rsquo;\u0026rsquo;\u0026rsquo;(0)}{3!}x^3.$$ The derivatives of $f$ at $0$ up to order $3$ are\n$$\\begin{array}{lll} f(x)=\\sin x \u0026amp; \\quad \u0026amp; f(0)=\\sin 0 =0,\\newline f\u0026rsquo;(x)=\\cos x \u0026amp; \u0026amp; f\u0026rsquo;(0)=\\cos 0=1,\\newline f\u0026rsquo;\u0026rsquo;(x)=-\\sin x \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;(0)=-\\sin 0=0,\\newline f\u0026rsquo;\u0026rsquo;\u0026rsquo;(x)=-\\cos x \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;\u0026rsquo;(0)=-\\cos 0=-1. 
\\end{array} $$\nAnd substituting into the polynomial equation we get $$p_{f,0}^3(x)=0+1\\cdot x+\\frac{0}{2}x^2+\\frac{-1}{3!}x^3= x-\\frac{x^3}{6}.$$\nMaclaurin polynomials of elementary functions $$ \\renewcommand{\\arraystretch}{2.5} \\begin{array}{cc} \\hline f(x) \u0026amp; p_{f,0}^n(x) \\newline \\hline \\sin x \u0026amp; \\displaystyle x - \\frac{x^3}{3!} + \\frac{x^5}{5!} - \\cdots + (-1)^k\\frac{x^{2k-1}}{(2k-1)!} \\mbox{ if $n=2k$ or $n=2k-1$}\\newline \\cos x \u0026amp; \\displaystyle 1 - \\frac{x^2}{2!} + \\frac{x^4}{4!} - \\cdots + (-1)^k\\frac{x^{2k}}{(2k)!} \\mbox{ if $n=2k$ or $n=2k+1$}\\newline \\arctan x \u0026amp; \\displaystyle x - \\frac{x^3}{3} + \\frac{x^5}{5} - \\cdots + (-1)^k\\frac{x^{2k-1}}{(2k-1)} \\mbox{ if $n=2k$ or $n=2k-1$}\\newline e^x \u0026amp; \\displaystyle 1 + x + \\frac{x^2}{2!} + \\frac{x^3}{3!} + \\cdots + \\frac{x^n}{n!}\\newline \\log(1+x) \u0026amp; \\displaystyle x - \\frac{x^2}{2} + \\frac{x^3}{3} - \\cdots + (-1)^{n-1}\\frac{x^n}{n}\\newline \\hline \\end{array} $$\nTaylor remainder and Taylor formula Taylor polynomials allow to approximate a function in a neighborhood of a value $a$, but most of the times there is an error in the approximation.\nDefinition - Taylor remainder. 
Given a function $f(x)$ and its Taylor polynomial of order $n$ at $a$, $p_{f,a}^n(x)$, the Taylor remainder of order $n$ of $f$ at $a$ is the difference between the function and the polynomial,\n$$r_{f,a}^n(x)=f(x)-p_{f,a}^n(x).$$\nThe Taylor remainder measures the error in the approximation of $f(x)$ by the Taylor polynomial and allows us to express the function as the Taylor polynomial plus the Taylor remainder\n$$f(x)=p_{f,a}^n(x) + r_{f,a}^n(x).$$\nThis expression is known as the Taylor formula of order $n$ of $f$ at $a$.\nIt can be proved that\n$$\\lim_{h\\rightarrow 0}\\frac{r_{f,a}^n(a+h)}{h^n}=0,$$\nwhich means that the remainder $r_{f,a}^n(a+h)$ is much smaller than $h^n$.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1606295051,"objectID":"c2c1680f6068dcb308d49e1be4b37a9b","permalink":"/en/teaching/calculus/manual/derivatives-one-variable/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/derivatives-one-variable/","section":"teaching","summary":"Concept of derivative Increment Definition - Increment of a variable. An increment of a variable $x$ is a change in the value of the variable; it is denoted $\\Delta x$. The increment of a variable $x$ along an interval $[a,b]$ is given by $$\\Delta x = b-a.","tags":["Derivative","Tangent Line"],"title":"One variable differential calculus","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"A picture is worth a thousand words. That\u0026rsquo;s why data is usually presented in a graphical form, and for that reason spreadsheets provide different types of charts. This section presents the main chart types and how to plot them in Excel 2010.\nCharts creation Regardless of the chart type, the steps to create a chart are:\nSelect the range that contains the data to plot. 
Data should be arranged in series (vertically or horizontally) following these rules:\nDo not leave empty rows or columns within the data range or between data labels and data. Only one row and/or one column should be used for data labels. Each data label should be unique. Select the type of chart from the Charts panel on the ribbon\u0026rsquo;s Insert tab.\nSet the chart design (data series to plot, order, etc.). You can use the ribbon\u0026rsquo;s Design tab.\nApply a layout (title, axis, legend, grids, data labels, etc.). You can use the ribbon\u0026rsquo;s Layout tab.\nApply a style format (text, line and background colours). You can use the ribbon\u0026rsquo;s Format tab.\nCharts are embedded by default in the same worksheet as the data, but it\u0026rsquo;s possible to put them on a separate worksheet. To do that, right-click the chart background and select Move chart. In the dialog that appears select New sheet, give a name to the worksheet and click OK.\nCharts are linked to the data from which they come. This means that any change in the data will be immediately reflected in any derived chart.\nTypes of charts There are eleven major chart types (Column, Line, Pie, Bar, Area, Scatter, Stock, Surface, Doughnut, Bubble and Radar) and each has many subtypes.\nEach chart type has a purpose and requires data to be arranged in a particular way. So choosing the right chart is probably the most important decision. The main chart types and their purpose are presented below.\nColumn and bar charts A column or bar chart is a set of bars (usually rectangles) graphed over a horizontal and a vertical axis (also known as XY axes). Each bar is graphed over the corresponding category with a length proportional to the value of the category in the data series. Usually more than one data series is plotted and bars corresponding to different series are differentiated with colours. 
In a column chart, categories appear horizontally and values appear vertically, whereas in a bar chart, categories appear vertically. Column charts, unlike bar charts, are suitable for emphasizing data variations over a period of time.\nExample. The next figure shows a column chart showing the evolution of fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (longest bars) and apples the cheapest (shortest bars) over time. Also that the prices of strawberries and bananas are decreasing, the prices of oranges are increasing and the prices of apples are almost stable.\nExcel offers a lot of shapes for the bars (rectangles, cylinders, cones, pyramids) in 2-D and 3-D, and allows you to stack bars. It is also possible to add error bars to the bars.\nExample. The animation below shows how to create a column chart for the apple prices evolution (one data series).\nAnd the animation below shows how to create a column chart for the fruit prices evolution (several data series).\nLine charts A line chart displays a series of data points called markers connected by straight line segments. Each marker is graphed over the corresponding category at a height proportional to the value of the category in the data series. It\u0026rsquo;s similar to a column chart but using markers at the end of bars instead of bars, and joining them with straight line segments. Line charts are suitable for displaying and comparing trends over a period of time.\nExample. The next figure shows a line chart showing the evolution of fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (highest markers) and apples the cheapest (lowest markers) over time. 
Also that the prices of strawberries and bananas are decreasing (lines with negative slope), the prices of oranges are increasing (line with positive slope) and the prices of apples first decrease and then increase.\nExcel offers different subtypes of line charts, with or without data points in 2-D and 3-D, and also allows you to stack lines.\nExample. The animation below shows how to create a line chart for the fruit prices evolution. Looking at the chart you can quickly realize which prices are increasing and which prices are decreasing.\nArea charts An area chart is similar to a line chart but filling the area between the line and the horizontal axis. Area charts are suitable for displaying the relative importance of values over time. It\u0026rsquo;s similar to a line chart, but because the area between lines is filled in, the area chart puts greater emphasis on the magnitude of values and less emphasis on the flow of change over time.\nExample. The next figure shows an area chart showing the evolution of accumulated fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (the largest area) and that accumulated prices are decreasing.\nExcel allows you to plot areas in 2-D or 3-D and also to stack areas.\nExample. The animation below shows how to create an area chart for the evolution of accumulated fruit prices.\nPie charts A pie chart is a circle divided into slices called sectors. Each sector represents a category of the data series and has an angle or area proportional to the quantity that corresponds to the category.\nPie charts are suitable for displaying the parts of a whole. Unlike the other charts presented so far, which can graph multiple data series, pie charts can graph just one data series.\nExample. The next figure shows a pie chart comparing fruit prices. 
Looking at the chart you can quickly realize that strawberries are the most expensive (biggest sector) and apples are the cheapest (smallest sector).\nAgain Excel has several subtypes that allows you to emphasize a part of the whole in 2-D or 3-D.\nExample. The animation below shows how to create a pie chart comparing the fruit prices of January.\nDoughnut charts Doughnut charts are similar to pie charts except for its ability to display more than one data series.\nExample. The next figure shows a doughnut chart comparing fruit prices in January and April. The inner doughnut correspond to prices of January and the outer to prices of April. Looking at the chart you can quickly realize that, although the price of apples were smaller in April than in January, it was relatively higher in April than in January, compared to the rest of fruit prices.\nExample. The animation below shows how to create a doughnut chart comparing the fruit prices in January and April.\nXY Scatter charts An XY scatter chart is a point cloud graphed using Cartesian coordinates. Each point correspond to a pair of values. The first value of the pair determines the position on the horizontal axis and the second value of the pair determines the position on the vertical axis. XY Scatter charts are suitable for displaying correlation among the data pairs of two numeric variables.\nExample. The next figure shows an XY Scatter chart relating banana and strawberry prices. Looking at the chart you can quickly realize that there is a positive correlation (when banana price increase, strawberry price increase too).\nExample. The animation below shows how to create an XY Scatter chart relating banana and strawberry prices.\nHistograms A histogram is a graphical representation of the distribution of numerical data. It\u0026rsquo;s similar to a column chart but data values are grouped into interval classes and each bar represents a class. 
Histograms are suitable for displaying the frequency of data values of one numeric variable.\nTo plot a histogram, the Analysis ToolPak add-in must be loaded first.\nExample. The animation below shows how to create a histogram of the grades in a course.\nChart design Changing the data source You can change the data range graphed in a chart at any time by clicking the Select Data button of the Data panel on the ribbon\u0026rsquo;s Design tab. This brings up a dialog where you can select the new data range, switch rows and columns series, add new data series to graph and their labels, remove or edit existing data series or change the order in which they are graphed in the chart.\nObserve that it is possible to plot data from separate ranges in the same chart.\nExample. The animation below shows how to add the orange prices data series to a column chart for the apple prices evolution.\nSwitching rows and columns When Excel creates a new chart with x and y axis, it automatically graphs the data by rows in the selected range so that the column headings appear along the horizontal axis and the row headings appear in the legend. If you want to switch from row series to column series, that is, that row headings appear on the horizontal axis and the column headings appear in the legend, click the Switch Row/Column button of the Data panel on the ribbon\u0026rsquo;s Design tab.\nExample. The animation below shows how to switch from row series to column series in a column chart for the fruit prices evolution.\nChart layout After creating a chart you can add new layout elements like chart titles, axis titles, legends, data labels, grids, trend lines, error bars, etc. or modify the existing ones.\nTo format any element of a chart right-click the element (bar, line, title, axis, legend, etc) and select the corresponding option at the bottom of the contextual menu. 
This will open a dialog where you can perform the desired changes for the selected element.\nTitles You can add a title to the chart by selecting the chart and clicking the Chart Title button of the Labels panel on the ribbon\u0026rsquo;s Layout tab. That will show a drop down menu that lets you choose between a centered overlay title (inside the chart area) or an above-chart title (outside the chart area).\nExample. The animation below shows how to add a title to a column chart for the fruit prices evolution and how to change the font colour.\nAxes You can add a title to the horizontal or vertical axes by selecting the chart and clicking the Axis Title button of the Labels panel on the ribbon\u0026rsquo;s Layout tab.\nExample. The animation below shows how to add a title to the horizontal and vertical axes of a column chart for the fruit prices evolution. The vertical axis title is rotated 90 degrees.\nAxis scales are one of the most important parts of a chart. Excel allows you to configure the axis scale by setting the minimum and maximum shown in the axis, the major and minor units, the format of tick marks (small lines intersecting the axis that indicate categories, scale units or chart data series) and their labels, or even the scale type (linear by default or logarithmic). To configure an axis right-click any label of the axis (not the axis title) and select the Format Axis option from the contextual menu. This will open a dialog with a lot of axis options. Change whatever you want and click Close.\nExample. The animation below shows how to change the scales of the horizontal and vertical axes of a column chart for the apple prices evolution. Observe that in the original chart the minimum value of the vertical axis scale is 1.26, which magnifies the differences between monthly prices. To avoid that, the minimum value of the vertical scale is set to €0, and the major unit is set to €0.1. Also the format of the tick mark labels is changed to currency with two decimal places. 
On the other hand, the tick mark labels of the horizontal axis are rotated 30 degrees counterclockwise.\nGrid A grid is composed of horizontal or vertical lines (usually equally spaced) over the axes. Grids are helpful to mark out more precisely the position of markers, bars, lines or other chart elements in the axis scales.\nExcel allows you to plot both horizontal and vertical grid lines for major and minor tick marks. To plot vertical grid lines right-click any label of the horizontal axis and select the Add Major Gridlines option for drawing lines over the major tick marks, or Add Minor Gridlines for drawing lines over the minor tick marks. To plot horizontal grid lines do the same but right-clicking any label of the vertical axis. Once the grid line is plotted you can change its format by right-clicking any label of the axis and selecting the Format Major Gridlines or Format Minor Gridlines option.\nExample. The animation below shows how to add vertical major grid lines and horizontal minor grid lines. It also shows how to change the line style of minor grid lines.\nLegends A legend is a key that identifies patterns, colors, or symbols associated with the markers of a chart data series. The legend shows the data series name corresponding to each data marker.\nExcel usually plots a legend to the right of the chart but it\u0026rsquo;s possible to move the legend to another position or to remove it. To plot the legend of a chart click the Legend button of the Labels panel on the ribbon\u0026rsquo;s Layout tab. This shows a drop down menu with different positions for the legend. After plotting the legend, if you want to format it right-click it and select Format Legend. This will open a dialog where you can choose the legend position, the frame and background colours and many other legend aspects. Finally if you want to remove a legend, just select it and press the Delete key.\nExample. 
The animation below shows how add a legend for the fruits to the right of a column chart with the fruit prices evolution. Also it shows how to plot a frame around the legend and how to move the legend to the top.\nData series The aspect of any graphic element used to represent a data serie in a chart (bars, markers, lines, sectors, etc) can be easily changed. To format the graphic element corresponding to a data serie right-click it and select the Format Data Series option. This will open a dialog where you can change the shape, border and background colours, space between elements, and many other aspects. It\u0026rsquo;s also possible to format only one element of the serie. For that you need to click it two times (not double-clicking), then right-click it and select the Format Data Point option.\nExample. The animation below shows how to change the background colour of orange bars in a column chart for the fruits prices evolution. It also shows how to add a glow effect over the highest bar.\nData labels Sometime is useful to plot the values for a data serie next to their bars, markers, lines, sectors or other chart elements. To plot the values of a data serie right-click the chart element (bar, marker, line, sector, etc) corresponding to the data serie and select the Data Labels option. This will plot the value corresponding to each bar, marker, sector, etc. close to it.\nExample. The animation below shows how add a legend for the fruits to the right of a column chart with the fruit prices evolution. Also it shows how to plot a frame around the legend and how to move the legend to the top.\nChart styles Finally, the Chart styles panel on the ribbon\u0026rsquo;s Design tab has many predefined chart styles that combine different colours for graphics elements and backgrounds. 
Apply one of those styles is as easy as select the chart an click the desired style.\nAlso, the Shape styles panel on the ribbon\u0026rsquo;s Format tab have predefined styles for the background area and frame of the chart.\nExample. The animation below shows how to apply some chart and shape styles to a column chart with the fruit prices evolution.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"73bbf06356c8b605d786a699a05bfcb7","permalink":"/en/teaching/excel/manual/charts/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/charts/","section":"teaching","summary":" ","tags":["Excel"],"title":"Plotting Charts","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Probability distribution of a continuous random variable Continuous random variables, unlike discrete random variables, can take any value in a real interval. Thus the range of a continuous random variables is infinite and uncountable.\nSuch a density of values makes impossible to compute the probability for each one of them, and therefore, it’s not possible to define a probabilistic model trough a probability function like with discrete random variables.\nBesides, usually the measurement of continuous random variable is limited by the precision of the measuring instrument. For instance, when somebody says that is 1.68 meters tall, his or her true height is no exactly 1.68 meters, because the precision of the measuring instrument is only cm (two decimal places). This means that the true height of that person is between 1.675 y 1.685 meters.\nHence, for continuous variables, it makes no sense to calculate the probability of an isolated value, and we will calculate probabilities for intervals.\nProbability density function To model the probability distribution of a continuous random variable we use a probability density function.\nDefinition - Probability density function. 
The probability density function of a continuous random variable $X$ is a function $f(x)$ that meets the following conditions:\nIt is non-negative: $f(x)\\geq 0$ $\\forall x\\in \\mathbb{R}$,\nThe area bounded by the curve of the density function and the x-axis is equal to 1, that is,\n$$\\int_{-\\infty}^{\\infty} f(x)\\; dx = 1.$$\nThe probability that $X$ assumes a value between $a$ and $b$ is equal to the area bounded by the density function and the x-axis from $a$ to $b$, that is,\n$$P(a\\leq X\\leq b) = \\int_a^b f(x)\\; dx$$\nThe probability density function measures the relative likelihood of every value, but $f(x)$ is not the probability of $x$, because $P(X=x)=0$ for every value $x$ by definition. Distribution function As with discrete random variables, it also makes sense to calculate cumulative probabilities for continuous random variables.\nDefinition - Distribution function. The distribution function of a continuous random variable $X$ is a function $F(x)$ that maps every value $a$ to the probability that $X$ takes on a value less than or equal to $a$, that is,\n$$F(a) = P(X\\leq a) = \\int_{-\\infty}^{a} f(x)\\; dx.$$\nProbabilities as areas To calculate probabilities with a continuous random variable we measure the area bounded by the probability density function and the x-axis in an interval.\nThis area can be calculated by integrating the density function or, more easily, by subtracting two values of the distribution function,\n$$P(a\\leq X\\leq b) = \\int_a^b f(x)\\; dx = F(b)-F(a)$$\nExample. 
Given the following function\n$$ f(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$} \\newline e^{-x} \u0026amp; \\mbox{if $x\\geq 0$}, \\end{cases} $$\nlet’s check that is a density function.\nAs this function is clearly non-negative, we have to check that total area bounded by the curve and the x-axis is 1.\n$$ \\begin{align*} \\int_{-\\infty}^\\infty f(x)\\;dx \u0026amp;= \\int_{-\\infty}^0 f(x)\\;dx +\\int_0^\\infty f(x)\\;dx = \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty e^{-x}\\;dx =\\newline \u0026amp;= \\left[-e^{-x}\\right]^{\\infty}_0 = -e^{-\\infty}+e^0 = 1. \\end{align*} $$\nNow, let’s calculate the probability of $X$ having a value between 0 and 2.\n$$ \\begin{align*} P(0\\leq X\\leq 2) \u0026amp;= \\int_0^2 f(x)\\;dx = \\int_0^2 e^{-x}\\;dx = \\left[-e^{-x}\\right]^2_0 = -e^{-2}+e^0 = 0.8646. \\end{align*} $$\nPopulation statistics The calculation of the population statistics is similar to the case of discrete variables, but using the density function instead of the probability function, and extending the discrete sum to the integral.\nThe most important are:\nDefinition - Continuous random variable mean The mean or the expectec value of a continuous random variable $X$ is the integral of the products of its values and its probabilities:\n$$\\mu = E(X) = \\int_{-\\infty}^\\infty x f(x)\\; dx$$\nDefinition - Continuous random variable variance and standard deviation The variance of a continuous random variable $X$ is the integral of the products of its squared values and its probabilities, minus the squared mean:\n$$\\sigma^2 = Var(X) = \\int_{-\\infty}^\\infty x^2f(x)\\; dx -\\mu^2$$\nThe standard deviation of a random variable $X$ is the square root of the variance:\n$$\\sigma = +\\sqrt{\\sigma^2}$$\nExample. 
Let $X$ be a variable with the following probability density function\n$$ f(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$}\\newline e^{-x} \u0026amp; \\mbox{if $x\\geq 0$} \\end{cases} $$\nThe mean is\n$$ \\begin{aligned} \\mu \u0026amp;= \\int_{-\\infty}^\\infty xf(x)\\;dx = \\int_{-\\infty}^0 xf(x)\\;dx +\\int_0^\\infty xf(x)\\;dx = \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty xe^{-x}\\;dx =\\newline \u0026amp;= \\left[-e^{-x}(1+x)\\right]_0^{\\infty} = 1. \\end{aligned} $$\nand the variance is\n$$ \\begin{aligned} \\sigma^2 \u0026amp;= \\int_{-\\infty}^\\infty x^2f(x)\\;dx -\\mu^2 = \\int_{-\\infty}^0 x^2f(x)\\;dx +\\int_0^\\infty x^2f(x)\\;dx -\\mu^2 = \\newline \u0026amp;= \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty x^2e^{-x}\\;dx -\\mu^2= \\left[-e^{-x}(x^2+2x+2)\\right]^{\\infty}_0 - 1^2 = \\newline \u0026amp;= 2e^0-1 = 1. \\end{aligned} $$\nContinuous probability distribution models According to the type of experiment where the random variable is measured, there are different probability distribution models. The most common are\nContinuous uniform. Normal. Student’s T. Chi-square. Fisher-Snedecor’s F. Continuous uniform distribution When all the values of a random variable $X$ have equal probability, the probability distribution of $X$ is uniform.\nDefinition \u0026ndash; Continuous uniform distribution $U(a,b)$. A continuous random variable $X$ follows a uniform probability distribution model with parameters $a$ and $b$, denoted $X\\sim U(a,b)$, if its range is $\\mbox{Ran}(X) = [a,b]$ and its density function is\n$$f(x)= \\frac{1}{b-a}\\quad \\forall x\\in [a,b]$$\nObserve that $a$ and $b$ are the minimum and the maximum of the range respectively, and that the density function is constant.\nThe mean and the variance are $$\\mu = \\frac{a+b}{2}$$ and $$\\sigma^2=\\frac{(b-a)^2}{12}.$$\nExample. 
The generation of a random number between 0 and 1 follows a continuous uniform distribution $U(0,1)$.\nAs the density function is constant, the distribution function grows linearly.\nExample. A bus runs every 15 minutes. Assuming that a person can arrive at the bus station at any time, what is the probability of waiting for the bus between 5 and 10 minutes?\nIn this case, the variable $X$ that measures the waiting time follows a continuous uniform distribution $U(0,15)$ as any waiting time between 0 and 15 is equally likely.\nThen, the probability of waiting between 5 and 10 minutes is\n$$ \\begin{aligned} P(5\\leq X\\leq 10) \u0026amp;= \\int_{5}^{10} \\frac{1}{15}\\;dx = \\left[\\frac{x}{15}\\right]^{10}_5 = \\newline \u0026amp;= \\frac{10}{15}-\\frac{5}{15} =\\frac{1}{3}. \\end{aligned} $$\nAnd the expected waiting time (the mean) is $\\mu=\\frac{0+15}{2}=7.5$ minutes.\nNormal distribution The normal distribution model is, without a doubt, the most important continuous distribution model as it is the most common in Nature.\nDefinition - Normal distribution $N(\\mu,\\sigma)$. A continuous random variable $X$ follows a normal probability distribution model with parameters $\\mu$ and $\\sigma$, denoted $X\\sim N(\\mu,\\sigma)$, if its range is $\\mbox{Ran}(X) = (-\\infty,\\infty)$ and its density function is\n$$f(x)= \\frac{1}{\\sigma\\sqrt{2\\pi}}e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}}.$$\nThe two parameters $\\mu$ and $\\sigma$ are the mean and the standard deviation of the population respectively.\nThe plot of the probability density function of a normal distribution $N(\\mu,\\sigma)$ is bell shaped and it is known as a Gauss bell.\nThe bell shape depends on the mean $\\mu$ and the standard deviation $\\sigma$,\nThe mean $\\mu$ sets the center of the bell. The standard deviation $\\sigma$ sets the width of the bell. 
The plot of the distribution function of a normal distribution is S shaped.\nNormal distribution properties It is symmetric with respect to the mean, and therefore, the coefficient of skewness is zero, $g_1=0$. It is mesokurtic, as the density function is bell shaped, and so, the coefficient of kurtosis is zero, $g_2=0$. The mean, median and mode are the same $$\\mu = Me = Mo.$$ It asymptotically approaches 0 when $x$ tends to $\\pm \\infty$. $P(\\mu-\\sigma \\leq X \\leq \\mu+\\sigma) \\approx 0.68$ $P(\\mu-2\\sigma \\leq X \\leq \\mu+2\\sigma) \\approx 0.95$ $P(\\mu-3\\sigma \\leq X \\leq \\mu+3\\sigma) \\approx 0.99$ Example. It is known that the cholesterol level in females of age between 40 and 50 follows a normal distribution with mean 210 mg/dl and standard deviation 20 mg/dl.\nAccording to the Gauss bell properties, this means that\nAbout 68% of females have a cholesterol level between $210\\pm 20$ mg/dl, i.e., between 190 and 230 mg/dl. About 95% of females have a cholesterol level between $210\\pm 2\\cdot 20$ mg/dl, i.e., between 170 and 250 mg/dl. About 99% of females have a cholesterol level between $210\\pm 3\\cdot 20$ mg/dl, i.e., between 150 and 270 mg/dl. Example of blood analysis. In blood analysis it is common to use the interval $\\mu\\pm 2\\sigma$ to detect possible pathologies. In the case of cholesterol, this interval is $[170\\text{ mg/dl}, 250\\text{ mg/dl}]$.\nThus, when a woman between 40 and 50 years of age has a cholesterol level outside this interval, it’s common to think about some pathology. 
However, this person could be healthy, although the likelihood of that happening is only 5%.\nThe central limit theorem This behavior is common in many physical and biological variables in Nature.\nIf you think about the distribution of the height, for instance, you can check that most people in the population have a height around the mean, but as the heights move away from the mean, both below and above it, there are fewer and fewer people with such heights.\nThe explanation for this behavior is the central limit theorem, which we will see in the next chapter; it states that a continuous random variable whose values depend on a huge number of independent factors adding their effects approximately follows a normal distribution.\nThe standard normal distribution $N(0,1)$ The most important normal distribution has mean zero, $\\mu=0$, and standard deviation one, $\\sigma=1$. It is known as the standard normal distribution and is usually represented as $Z\\sim N(0,1)$.\nCalculation of probabilities with the normal distribution To avoid integrating the normal density function to compute probabilities it’s common to use the distribution function, which is given in a tabular format like the one below. For instance, to calculate $P(Z\\leq 0.52)$\n$z$ 0.00 0.01 0.02 \u0026hellip;\n0.0 0.5000 0.5040 0.5080 \u0026hellip;\n0.1 0.5398 0.5438 0.5478 \u0026hellip;\n0.2 0.5793 0.5832 0.5871 \u0026hellip;\n0.3 0.6179 0.6217 0.6255 \u0026hellip;\n0.4 0.6554 0.6591 0.6628 \u0026hellip;\n0.5 0.6915 0.6950 0.6985 \u0026hellip;\n⋮ ⋮ ⋮ ⋮ ⋮\n$$0.52 \\rightarrow \\mbox{row }0.5 + \\mbox{column }0.02$$\nTo compute cumulative probabilities to the right of a value, we can apply the rule for the complement event. 
For instance,\n$$P(Z\u0026gt;0.52) =1-P(Z\\leq 0.52) = 1-F(0.52) = 1 - 0.6985 = 0.3015.$$\nStandardization We have seen how to use the table of the standard normal distribution function to compute probabilities, but what can we do when the normal distribution is not the standard one?\nIn that case we can use standardization to transform any normal distribution into the standard normal distribution.\nTheorem - Standardization. If $X$ is a continuous random variable that follows a normal probability distribution model with mean $\\mu$ and standard deviation $\\sigma$, $X\\sim N(\\mu,\\sigma)$, then the variable that results from subtracting $\\mu$ from $X$ and dividing by $\\sigma$ follows a standard normal probability distribution,\n$$X\\sim N(\\mu,\\sigma) \\Rightarrow Z=\\frac{X-\\mu}{\\sigma}\\sim N(0,1).$$\nThus, to compute probabilities with a non-standard normal distribution, we first have to standardize the variable and then use the table of the standard normal distribution function.\nExample. Assume that the grade of an exam $X$ follows a normal probability distribution model $N(\\mu=6,\\sigma=1.5)$. What percentage of students didn’t pass the exam, that is, got a grade below 5?\nAs $X$ follows a non-standard normal distribution model, we have to apply standardization first, $Z=\\displaystyle \\frac{X-\\mu}{\\sigma} = \\frac{X-6}{1.5}$,\n$$ P(X\u0026lt;5) = P\\left(\\frac{X-6}{1.5}\u0026lt;\\frac{5-6}{1.5}\\right) = P(Z\u0026lt;-0.67). $$\nThen we can use the table of the standard normal distribution function,\n$$P(Z\u0026lt;-0.67) = F(-0.67) = 0.2514.$$\nTherefore, $25.14\\%$ of students didn’t pass the exam.\nChi-square distribution Definition - Chi-square distribution $\\chi^2(n)$. 
Given $n$ independent random variables $Z_1,\\ldots,Z_n$, all of them following a standard normal probability distribution, then the variable\n$$\\chi^2(n) = Z_1^2+\\cdots +Z_n^2,$$\nfollows a chi-square probability distribution with $n$ degrees of freedom.\nIts range is $\\mathbb{R}^+$ and its mean and variance are $\\mu = n$ and $\\sigma^2 = 2n$.\nExample. The density functions of some chi-square distribution models are plotted below.\nChi-square distribution properties The range is non-negative. If $X\\sim \\chi^2(n)$ and $Y\\sim \\chi^2(m)$ are independent, then $$X+Y \\sim \\chi^2(n+m).$$ It asymptotically approaches a normal distribution as the degrees of freedom increase. As we will see in the next chapter, the chi-square distribution plays an important role in the estimation of the population variance and in the study of relations between qualitative variables.\nStudent’s t distribution Definition - Student’s t distribution $T(n)$. Given a variable $Z$ following a standard normal distribution model, $Z\\sim N(0,1)$, and a variable $X$ following a chi-square distribution model with $n$ degrees of freedom, $X\\sim \\chi^2(n)$, both independent, the variable\n$$T = \\frac{Z}{\\sqrt{X/n}},$$\nfollows a Student’s t probability distribution model with $n$ degrees of freedom.\nIts range is $\\mathbb{R}$ and its mean and variance are $$\\mu = 0$$ and $$\\sigma^2 = \\frac{n}{n-2}$$ if $n\u0026gt;2$.\nExample. The density functions of some Student\u0026rsquo;s t distribution models are plotted below.\nStudent’s t distribution properties The mean, the median and the mode are the same, $\\mu=Me=Mo$. It is symmetric, $g_1=0$. It asymptotically approaches the standard normal distribution as the degrees of freedom increase. In practice for $n\\geq 30$ both distributions are approximately the same. 
$$T(n)\\stackrel{n\\rightarrow \\infty}{\\approx}N(0,1).$$ As we will see in the next chapter, the Student’s t distribution plays an important role in the estimation of the population mean.\nFisher-Snedecor’s F distribution Definition - Fisher-Snedecor’s F distribution $F(m,n)$. Given two independent variables $X$ and $Y$ both following a chi-square probability distribution model with $m$ and $n$ degrees of freedom respectively, $X\\sim \\chi^2(m)$ and $Y\\sim \\chi^2(n)$, then the variable\n$$F = \\frac{X/m}{Y/n},$$\nfollows a Fisher-Snedecor’s F probability distribution model with $m$ and $n$ degrees of freedom.\nIts range is $\\mathbb{R}^+$ and its mean and variance are $$\\mu = \\frac{n}{n-2}$$ and $$\\sigma^2 =\\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}$$ if $n\u0026gt;4$.\nExample. The density functions of some Fisher-Snedecor\u0026rsquo;s F distribution models are plotted below.\nFisher-Snedecor’s F distribution properties The range is non-negative. It satisfies $$F(m,n) =\\frac{1}{F(n,m)}.$$ Thus, if we name $f(m,n)_p$ the value that satisfies $P(F(m,n)\\leq f(m,n)_p)=p$, then $$f(m,n)_p =\\frac{1}{f(n,m)_{1-p}}$$ which is helpful in order to compute probabilities from the table of the distribution function. As we will see in the next chapter, the Fisher-Snedecor’s F distribution plays an important role in the comparison of population variances and in the analysis of variance test (ANOVA).\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"f27b27994d49369e7085870ab298be38","permalink":"/en/teaching/statistics/manual/continuous-random-variables/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/continuous-random-variables/","section":"teaching","summary":"Probability distribution of a continuous random variable Continuous random variables, unlike discrete random variables, can take any value in a real interval. 
Thus the range of a continuous random variables is infinite and uncountable.","tags":["Statistics","Biostatistics","Random Variables"],"title":"Continuous Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"A database is an organised collection of data. Usually databases are composed of records that contains information about the same object (person, company, product, etc), and records are composed of fields that contains every piece of information (name, address, phone number, price, etc.).\nExample The next table show a students database with fields First name, Last name, Address, City, Birth date, Average grade and Passed credits.\nFirst name Last name Address City Birth date Average grade Passed credits María Sánchez García c. Estrella, 9 Madrid 23/10/1994 5,8 78 Carlos Pérez López c. Bravo Murillo, 34 3º-D Madrid 16/08/1993 7,9 123 Luis González Roca c. Antonio López, 67 1º-A Madrid 07/07/1995 8,2 45 Camen Aguirre Jordán c. Espada, 12 4º-C Sevilla 06/03/1994 4,2 28 Luisa Martín Garrido c. Cervantes, 14 Albacete 22/01/1994 6,7 54 Alberto Pintado Marín c. Arroyo, 27 2º-C Sevilla 10/03/1995 4,1 12 Marina Gómez Gómez c. Velázquez 28 4º-A Madrid 12/04/1994 7,7 62 Javier Yagüe Pinzón c. Rosales, 76 8º-B Madrid 18/12/1993 6,1 82 Lucas Guerrero Monzón c. Isaac Peral, 30 Bajo Albacete 12/01/1995 5,4 32 Database creation in Excel Excel allows to define databases as tables where fields are defined in columns and records in rows. The first row of the table contains labels for each field. This tables are also called data lists.\nTo create a data list first enter the name of the fields in the first row of the table, each in one column. This first row with the field names is the headers row. Field names must be unique and there musn\u0026rsquo;t be blank cells in the headers row. After creating the fields enter first record data in the appropriate columns of the row immediately below the one containing the field names. 
To Excel recognise this table as a data list, click the Format as Table button on the ribbon’s Home tab and then click a thumbnail of one of the table styles in the drop-down gallery.\nAfter that you can enter the remaining records, one by row. After entering the data of a field press the Tab key to go to the next field of the same record, or to the first field of the next record if you are in the last field of a record.\nExample. The animation below shows how create a data list of students with the fields First name, Last name, Address, City, Birth date, Average grade and Passed credits.\nAfter creating a data list Excel will give a name to it, but is advisable to give it a descriptive name (see the Naming cells and ranges section).\nData validation When entering data to a data list is important to validate data to maintain database integrity. Data validation allows to specify which type and range of data are accepted by a cell or field (column). To apply a validation rule to a field, select the field column of the data list and click Data validation button of the Data tools panel on the ribbon\u0026rsquo;s Data tab. In the dialog that appears, select the validation criteria from the drop-down list of the Setting:\nWhole number allows only integers numbers between a specified minimum and a maximum or greater o less than a specified number. Decimal allows decimal numbers between a specified minimum and maximum or greater or less than a specified number. List allows a list of defined entries. Date allows dates between two specified dates or before or after a specified date. Time allows times between two specified times or before of after a specified time. Text length allows text with a restricted length. After selecting the validating criteria, enter the correspondent parameters (minimum or maximum numbers, dates, times or range with the entries of the list). 
You can also define an input message in the Input Message tab and an error message in the Error Alert tab that will be shown if an invalid entry is entered in the field.\nExample. The animation below shows how to create a validation rule for the Average grade field in a data list of students.\nImporting databases Excel offers the possibility to import data from diverse sources like csv text files, XML files, relational databases like Access or web data sources.\nImporting data from csv text files To see how to import data from a csv text file visit the section Import from csv format.\nImporting from web data sources There are many web pages that offer open data in a format suitable for importing into Excel. To import data from a web data source click the From Web button of the Get External Data panel on the ribbon\u0026rsquo;s Data tab. This opens a web browser where you must enter the URL of the page with the data source. When the browser shows the data table some yellow arrows appear that allow you to select the rows and columns of the table to import.\nExample The animation below shows how to import the IBEX 35 series from Yahoo Finance.\nImporting data from Quandl Quandl is a finance and economic data repository with hundreds of open data series. It\u0026rsquo;s possible to import data from Quandl to Excel easily, but you need the Quandl add-in for Excel. To install the Quandl add-in for Excel follow these instructions.\nAfter installing the add-in a new tab labelled Quandl appears in the ribbon. To import a data series from Quandl, first search for the data series by clicking the Search button on the ribbon\u0026rsquo;s Quandl tab, enter some keywords for the search and click the Show Results button, select the desired data series from the search results, click the Insert Selected Codes button and click the Close button. This will insert the Quandl code of the data series (if you know the Quandl code of the data series you can skip the search and enter it directly in a cell). 
Finally, select the cell with the Quandl code and click the Download button on the ribbon\u0026rsquo;s Quandl tab. This will download the data serie and put it in a range below the cell that contais the Quandl code.\nExample The animation below shows how to import the IBEX 35 serie from Quandl.\nData sorting To sort the data list records on a single field, you simply click that field’s AutoFilter button (the button with the triangle that appears to the right of the header) and then click the appropriate sort option on its drop-down list:\nSort A to Z or Sort Z to A in a text field. Sort Smallest to Largest or Sort Largest to Smallest in a number field. Sort Oldest to Newest or Sort Newest to Oldest in a date field. Other option to sort a data list on a field is to select a cell of the field column an click the Sort A to Z button of the Sort \u0026amp; Filter panel on the ribbon’s Data tab, to sort ascending, or the Sort Z to A button to sort descending.\nExcel then will reorder all the records in the data list according to the ascending or descending order selected.\nExample. The animation below shows how to sort a students database. First ascending on the Birth date field, next descending on the Average degree field, and finally ascending on the Last name field.\nIf you need to sort a data list on more than one field, select a cell of the data list and click the Sort button of the Sort \u0026amp; Filter panel on the ribbon\u0026rsquo;s Data tab. Then, in the dialog that appears, select the first sorting field column and the sorting order (ascending or descending), next the second sorting field column an the sorting order, and so on.\nExample. The animation below shows how to sort a students database on the fields City ascending and Average grade descending.\nYou can also sort a range of cells in general indicating the name of the columns instead of the field names.\nSummarizing data With large tables or data lists is difficult to extract relevant information. 
For that purpose, Excel provides several methods for summarizing data.\nTotaling and subtotaling fields A common operation is to apply a function to a whole field in a data list, as for instance the SUM function for summarizing or the AVERAGE function for averaging all the values in a field column. This could be done activating the Total row check box of the Table Style Options panel on the ribbon\u0026rsquo;s Table Options tab. This will add a total row at the bottom of the table. Clicking any cell of this row you can choose which function to apply to the whole field.\nExample The animation below shows how to sum the passed credits of students in a students database. It also shows how to average the average grade.\nExcel also allows subtotaling a field by categories of other field. This procedure only works with data lists formatted like tables, so if a data list have been formatted like a table first it has to be converted to a range selecting any cell of the table and clicking the Convert to Range button of the Tool panel on the ribbon\u0026rsquo;s Table Tools - Design tab. After that, you have to sort the data list by the field with the categories to summarize (see the Data sorting section). Finally, to subtotaling a data list click the Subtotal button of the Outline panel on the ribbons\u0026rsquo; Data tab. This will display a dialog where you have to select the field with the categories in the At each change in drop-down menu, the function to apply (sum, count, average, etc.) in the Use function drop-down menu, check the fields to with apply the subtotaling function in the Add subtotal to list, and click OK.\nExample The animation below shows how to subtotaling the passed credits of students in a students database by the city where they live.\nPivot tables A pivot table is a powerful tool for exploring data. 
It help you organise and summarize the raw data in your data list, revealing patterns or relationships that might not be obvious at first glance.\nTo create a pivot table click on any cell of a data list and then click the PivotTable button on the ribbon’s Insert tab. This display a dialog where you can select the range for the pivot table (by default Excel select the whole data list) and choose between placing the pivot table in a new workbook (default) or in the same workbook (in this case you have to indicate in which cell). After click OK, a pane appears on the right side of the pane:\nReport Filter for the fields that enable you to page through the data summaries shown in the actual pivot table by filtering out sets of data — they act as the filters for the report. So, for example, if you designate the Year Field from a data list as a Report Filter, you can display data summaries in the pivot table for individual years or for all years represented in the data list. Column Labels for the fields that determine the arrangement of data shown in the columns of the pivot table. Row Labels for the fields that determine the arrangement of data shown in the rows of the pivot table. Values for the fields whose data are presented and summarized in the body cells of the pivot table. By default Excel will use the SUM function to summarize values. To use another function click the field and select the Value Field Settings option in the menu that appears. In the dialog that appears just select the function that you want to use for summarizing and click OK. Example The animation below shows how to create a pivot table for a students database. 
The pivot table shows and summarizes the passed credits by degrees on rows and by cities on columns.\nThe animation below shows how to arrange the previous pivot table to show the passed credits summarized first by city and then by degree and vice versa, both on rows.\nThe animation below shows how to arrange the previous pivot table to show, in addition to the passed credits, the average grade of students. The passed credits are summarized using the SUM function while the average grade is summarized using the AVERAGE function.\nThe animation below shows how to filter the previous pivot table to show only the values of course year 2014 and not to show the physics degree.\nTo change the format of a pivot table you can use the Layout panel on ribbon\u0026rsquo;s PivotTable Tools - Design tab. This panel has four buttons:\nSubtotals Allows to show subtotals at top of groups, at bottom of groups or not to show subtotals. Grand Totals Allows to show grand totals for rows, for columns, for both rows and columns, or not to show grand totals. Report Layout Allows to show the groups in compact form (all the grouping fields in the same column), in outline form (every grouping field in a different column) or in tabular form (like the outline form but adding extra rows for the subtotals). Blank rows Allow to insert or not a blank row after each group. It\u0026rsquo;s also possible to apply a predefined style to a pivot table just selecting the desired style from the PivotTable Styles panel on ribbon\u0026rsquo;s PivotTable Tools - Design tab.\nExample The animation below shows how to format and how to apply a style to the previous pivot table.\nPivot chart Pivot tables can be accompanied by pivot charts, that is an interactive chart where you can present and summarize data grouped by some fields like a in a pivot table. 
To create a pivot chart from a pivot table, in the worksheet with the pivot table click the PivotChart button of the Tools panel on the ribbon\u0026rsquo;s PivotTable Tools - Options tab. This will show a dialog with the charts types. Select the desired chart type and click OK. After that Excel inserts a chart in the same worksheet of the pivot table reflecting the same information of the pivot table. Fron now on, any change in the pivot table will be reflected in the pivot chart.\nExample The animation below shows how to create a pivot chart from a pivot table for a students database.\nOf course, you can change the pivot chart layout as any other chart (see section Chart layout).\nData filtering With huge databases it\u0026rsquo;s difficult to find the desired information. To overcome this problem Excel provide several methods to filter the database. Filtering is the procedure for specifying the data that you want displayed in an Excel data list.\nApply a simple filter The easiest way to perform this basic type of filtering on a field is to click the AutoFilter button (the button with the triangle that appears to the right of the header). This display a drop-down menu that contains at the end a list box with a complete listing of all entries made in that column, each with its own check box. In this list click the check box in front of the (Select All) option at the top of the field’s list box to clear the check boxes, then click each of the check boxes corresponding to the entries for the records you do want displayed in the filtered data list, and finally click OK. Excel then hides rows in the data list for all records except for those that contain the entries you just selected.\nExample The animation below shows how to filter the students of Sevilla and Albacete in a students database.\nTo perform more sophisticated filters you can use the other filter options of the AutoFiller button. 
These filter options depend on the type of entries in the field:\nIf the column only contains dates, the menu contains a Date Filters option with a submenu that allows you to filter dates equal to, before or after a given date; dates between two given dates; dates of today, yesterday and tomorrow; dates of this week, last week and next week; dates of this month, last month and next month; dates of this quarter, last quarter and next quarter; dates of this year, last year and next year; and dates in a specific period (quarter or month).\nIf the column contains only numbers or a mixture of dates with numbers, the menu contains a Number Filters option with a submenu that allows you to filter numbers equal or not equal to a given number; numbers greater than, greater than or equal to, less than, less than or equal to a given number; numbers between two given numbers; top 10 numbers; numbers above the average and numbers below the average.\nIf the column contains only text or a mixture of text, dates and numbers, the menu contains a Text Filters option with a submenu that allows you to filter text equal or not equal to a given text; text that begins or ends with a given text; and text that contains or does not contain a given text.\nIf the filter selected requires some parameter (date, number or text), a dialog appears where you must enter that data and click OK.\nExample The animation below shows how to filter the students born before 1/1/1995, with an average grade greater than or equal to 5, and whose name begins with M, in a students database.\nApply a complex filter Simple filters are enough in most cases, but sometimes you need to filter data according to more complex criteria. Fortunately, Excel provides a method to perform filters based on calculated criteria with formulas.\nTo perform a filter with calculated criteria, you first have to specify the criteria somewhere in the worksheet that contains the data list. 
The criteria must have a cell header and a logical formula in the cell just below. In the logical formula you can use functions and references to the cells, but it\u0026rsquo;s important to note that all references must be to cells in the first row of the data list. After that, to apply the filter you need to select a cell in the data list and click the Advanced button of the Sort \u0026amp; Filter panel on the ribbons\u0026rsquo;s Data tab. This shows a dialog where you have to enter the range of the data list (usually Excel auto recognise it), the range of the filter criteria and click OK. Excel will apply the logical formula to every row of the data list and show only the records where the formula returns TRUE.\nExample The animation below shows how to filter the students with an average grade greater than or equal to 5, and a number of passed credits over the average, in a students database, using a calculated criteria. Observe how is used the data list name and the field name to reference the column of passed credits in the average calculation.\nClear a filter To clear an active filter in a data list click the AutoFilter button of the column with the active filter and select the option Clear Filter. After that Excel will show all the records hidden by the removed filter, but the rest of filters will continue active. To clear all the filters in a data list, select a cell of the data list and then click the Clear button of the Sort \u0026amp; Filter panel on the ribbons\u0026rsquo;s Data tab. This will show all the records of the data list.\nDatabase functions Excel have some predefined functions that can be applied to data list. 
Some of them apply other function only to records in a data list that match a criteria you specify.\nDefine a criteria The criteria must be defined in a range and must include at least one header with a field name that indicates the field whose values are to be evaluated and one cell just below with the value or expression to be used in the evaluation. The expression with the condition is a text string starting with a logical comparator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) or a pattern text with wildcards like the question mark ? (that matches any character) or the asterisk * (that matches any character string). You can specify multiple conditions in different columns. If you want to apply the function to all the records of the data list, just leave the cell with the criteria conditions blank.\nDSUM function The DSUM function sums the values in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DSUM(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to sum the passed credits of students from Madrid born in 1994 or after with an average grade greater or equal to 6, in a students database.\nDCOUNT function The DCOUNT function counts the values in a numeric field (column) of records in a data list that match the criteria you specify. 
Its syntax is DCOUNT(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to count the students with an average grade greater than or equal to 6 whose name begins with L, in a students database.\nDMIN function The DMIN function returns the minimum in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DMIN(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nDMAX function The DMAX function returns the maximum in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DMAX(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to calculate the minimum and the maximum average grade of students from Madrid born before 1995, in a students database.\nDAVERAGE function The DAVERAGE function averages the values in a numeric field (column) of records in a data list that match the criteria you specify. 
Its syntax is DAVERAGE(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to average the average grades of students from Madrid born in 1994 or after with an average grade greater or equal to 6, in a students database.\nDSTDEVP function The DSTDEVP function calculates the standard deviation the values in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DSTDEVP(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to calculate the standard deviation of average grades of students from Madrid born in Madrid before 1995, in a students database.\nDGET function The DGET function returns the value of field (column) in the record of a data list that match the criteria you specify. Its syntax is DGET(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to return enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nIf no record satisfy the criteria, the function returns a #VALUE! error, and if more than one records satisfy the criteria the functions return a #NUM! error.\nExample The animation below shows how to find the student with the highest grade in a student database.\nOther functions allow to search values in a list or table.\nVLOOKUP and HLOOKUP functions The VLOOKUP function finds things in a table or list by row. 
Its syntax is VLOOKUP (value, table, col-index, [approx-match]), where value is the value you want to look up, table is the range of the table or list in which to perform the search, col-index is the column number (starting with 1 for the left-most column of table range) that contains the return value, and approx-match is an optional logical argument that specifies whether to find an approximate match (TRUE by default) or an exact match (FALSE). The function looks the value argument up in the first column of the table argument. If the approx-match argument is TRUE, the table should be sorted in ascending order by the first column (the column where the value is looked up) and the function will return the value of the col-index column in the row whose first-column entry is the closest match to value, that is, the largest entry less than or equal to value. If approx-match is FALSE, the function will search for the exact value in the first column and it will return the value of the col-index column in the same row as the first matched value in the first column. If no value in the first column matches the value argument, the function will return a #N/A error.\nExample The animation below shows how to look up the phone number of a student in a students database.\nThe HLOOKUP function works like the VLOOKUP function but it performs a search by columns. 
Its syntax is HLOOKUP (value, table, row-index, [approx-match]), where value is the value you want to look up, table is the range of the table or list in which to perform the search, row-index is the the row number (starting with 1 for the top-most row of table range) that contains the return value, and approx-match is an optional logical argument that specifies whether to find an approximate match (TRUE by default) or an exact match (FALSE).\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"0bf14acea21eaa00b95813c4c2dc6e25","permalink":"/en/teaching/excel/manual/databases/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/databases/","section":"teaching","summary":" ","tags":["Excel"],"title":"Database Management","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Antiderivative of a function Definition - Antiderivative of a function. Given a function $f(x)$, the function $F(X)$ is an antiderivative or primitive function of $f$ if it satisfies that $F\u0026rsquo;(x)=f(x)$ $\\forall x \\in \\mathop{\\rm Dom}(f)$. Example. The function $F(x)=x^2$ is an antiderivative of the function $f(x)=2x$ as $F\u0026rsquo;(x)=2x$ on $\\mathbb{R}$.\nRoughly speaking, the calculus of antiderivatives is the reverse process of differentiation, and that is the reason for the name of antiderivative.\nIndefinite integral of a function As two functions that differs in a constant term have the same derivative, if $F(x)$ is an antiderivative of $f(x)$, so will be any function of the form $F(x)+k$ $\\forall k \\in \\mathbb{R}$. This means that, when a function has an antiderivative, it has an infinite number of antiderivatives.\nDefinition - Indefinite integral. 
The indefinite integral of a function $f(x)$ is the set of all its antiderivatives; it is denoted by\n$$\\int{f(x)}\\,dx=F(x)+C$$ where $F(x)$ is an antiderivative of $f(x)$ and $C$ is a constant.\nExample. The indefinite integral of the function $f(x)=2x$ is $$\\int 2x\\, dx = x^2+C.$$\nInterpretation of the integral We have seen in a previous chapter that the derivative of a function is the instantaneous rate of change of the function. Thus, if we know the instantaneous rate of change of the function at any point, we can compute the change of the function.\nExample. What is the space covered by a free-falling object?\nAssume that the only force acting upon a dropped object is gravity, with an acceleration of $9.8$ m/s$^2$. As acceleration is the rate of change of the speed, and it is constant at every moment, its antiderivative is the speed of the object (taking the initial speed to be zero),\n$$v(t) = 9.8t \\mbox{ m/s}$$\nAnd as the speed is the rate of change of the space covered by the object during the fall, the antiderivative of the speed is the space covered by the object,\n$$s(t) = \\int 9.8t\\, dt = 9.8\\frac{t^2}{2}.$$\nThus, for instance, after 2 seconds, the covered space is $s(2) = 9.8\\frac{2^2}{2} = 19.6$ m.\nLinearity of integration Given two integrable functions $f(x)$ and $g(x)$ and a constant $k \\in \\mathbb{R}$, it holds that\n$\\int{(f(x)+g(x))}\\,dx=\\int{f(x)}\\,dx+\\int{g(x)}\\,dx$, $\\int{kf(x)}\\,dx=k\\int{f(x)}\\,dx$. This means that the integral of any linear combination of functions equals the same linear combination of the integrals of the functions.\nElementary integrals $\\int a\\,dx=ax+C$, with $a$ constant. $\\int x^n\\,dx=\\dfrac{x^{n+1}}{n+1}+C$ if $n\\neq -1$. $\\int \\dfrac{1}{x}\\, dx=\\ln\\vert x\\vert+C$. $\\int e^x\\,dx=e^x+C$. $\\int a^x\\,dx=\\dfrac{a^x}{\\ln a}+C$. $\\int \\sin x\\, dx=-\\cos x+C$. $\\int \\cos x\\, dx=\\sin x+C$. $\\int \\tan x\\, dx=\\ln\\vert\\sec x\\vert+C$. $\\int \\sec x\\, dx = \\ln\\vert\\sec x + \\tan x\\vert+C$. 
$\\int \\csc x\\, dx= \\ln\\vert\\csc x-\\cot x\\vert+C$. $\\int \\cot x \\, dx= \\ln\\vert\\sin x\\vert+C$. $\\int \\sec^2 x\\, dx= \\tan x+ C$. $\\int \\csc^2 x\\, dx= -\\cot x+ C$. $\\int \\sec x \\tan x\\, dx= \\sec x+ C$. $\\int \\csc x \\cot x\\, dx = -\\csc x +C$. $\\int \\dfrac{dx}{\\sqrt{a^2-x^2}}=\\arcsin\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{a^2+x^2}=\\dfrac{1}{a}\\arctan\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{x\\sqrt{x^2-a^2}}=\\dfrac{1}{a}\\sec^{-1}\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{a^2-x^2}=\\dfrac{1}{2a}\\ln\\left\\vert\\dfrac{x+a}{x-a}\\right\\vert+C$. Techniques of integration Unfortunately, unlike in differential calculus, there is no foolproof procedure to compute the antiderivative of a function. However, there are some techniques that allow us to integrate some types of functions. The most common methods of integration are\nIntegration by parts Integration by reduction Integration by substitution Integration of rational functions Integration of trigonometric functions Integration by parts Theorem - Integration by parts. Given two differentiable functions $u(x)$ and $v(x)$,\n$$\\int{u(x)v\u0026rsquo;(x)}\\,dx=u(x)v(x)-\\int{u\u0026rsquo;(x)v(x)}\\,dx,$$\nor, writing $u\u0026rsquo;(x)dx=du$ and $v\u0026rsquo;(x)dx=dv$,\n$$\\int{u}\\,dv=uv-\\int{v}\\,du.$$\nProof From the rule for differentiating a product we have\n$$ (uv)\u0026rsquo; = u\u0026rsquo;v + uv\u0026rsquo; $$\nand integrating both sides we get\n$$ \\begin{gathered} \\int (uv)\u0026rsquo; \\, dx = \\int u\u0026rsquo;v \\, dx + \\int uv\u0026rsquo;\\, dx \\Rightarrow\\newline uv = \\int v\\,du + \\int u\\, dv \\Rightarrow\\newline \\int{u}\\,dv=uv-\\int{v}\\,du. \\end{gathered} $$\nTo apply this method we have to choose the functions $u$ and $dv$ in such a way that the final integral is easier to compute than the original one.\nExample. 
To integrate $\\int{x \\sin x}\\,dx$ we can choose $u=x$ and $dv=\\sin x\\, dx$, so $du=dx$ and $v=-\\cos x$, getting $$\\int{x \\sin x}\\,dx=-x\\cos x-\\int (-\\cos x)\\,dx = -x\\cos x +\\sin x+C.$$ If we had chosen $u=\\sin x$ and $dv=x\\,dx$, we would have got a more difficult integral.\nIntegration by reduction The reduction technique is used when we have to apply integration by parts several times.\nIf we want to compute the antiderivative $I_{n}$ that depends on a natural number $n$, the reduction formulas allow us to write $I_{n}$ as a function of $I_{n-1}$, that is, we have a recurrence relation $$\\ I_{n}=f(I_{n-1},x,n)$$ so by computing the first antiderivative $I_0$ we should be able to compute the others.\nExample. To compute $I_{n}=\\int{x^ne^x}\\,dx$ applying integration by parts, we have to choose $u=x^n$ and $dv=e^x\\,dx$, so $du=nx^{n-1}\\,dx$ and $v=e^{x}$, getting\n$$\\ I_{n}=\\int{x^ne^x}\\,dx=x^ne^x-n\\int{x^{n-1}e^x}\\,dx=x^ne^x-nI_{n-1}.$$\nThus, for instance, for $n=3$ we have\n$$ \\begin{aligned} \\int x^3 e^x\\, dx \u0026amp;= I_3 = x^3e^x-3I_2 = x^3e^x-3(x^2e^x-2I_1) =\\newline \u0026amp;= x^3e^x-3(x^2e^x-2(xe^x-I_0)) = x^3e^x-3(x^2e^x-2(xe^x-e^x)) =\\newline \u0026amp;= e^x(x^3-3x^2+6x-6)+C. \\end{aligned} $$\nIntegration by substitution From the chain rule for differentiating the composition of two functions\n$$f(g(x))\u0026rsquo; = f\u0026rsquo;(g(x))g\u0026rsquo;(x),$$\nwe can make a variable change $u=g(x)$, so $du=g\u0026rsquo;(x)dx$, and get\n$$\\int f\u0026rsquo;(g(x))g\u0026rsquo;(x)\\, dx = \\int f\u0026rsquo;(u)\\, du = f(u)+C = f(g(x))+C.$$\nExample. 
To compute the integral $\\int{\\dfrac{1}{x\\log x}}\\, dx$ we can make the substitution $u=\\log x$, so $du=\\frac{1}{x}dx$, and we have\n$$\\int \\frac{dx}{x\\log x}=\\int \\frac{1}{\\log x}\\frac{1}{x}\\,dx = \\int \\frac{1}{u}\\,du = \\log \\vert u\\vert+ C.$$\nFinally, undoing the substitution we get\n$$\\int \\frac{1}{x\\log x}\\,dx= \\log \\vert\\log x\\vert + C.$$\nIntegration of rational functions Partial fractions decomposition A rational function can be written as the sum of a polynomial (with an immediate antiderivative) plus a proper rational function, that is, a rational function in which the degree of the numerator is less than the degree of the denominator.\nOn the other hand, depending on the factorization of the denominator, a proper rational function can be expressed as a sum of simpler fractions of the following types\nDenominator with a single linear factor: $\\dfrac{A}{(x-a)}$ Denominator with a linear factor repeated $n$ times : $\\dfrac{A}{(x-a)^{n}}$ Denominator with a single quadratic factor: $\\dfrac{Ax+B}{x^2+cx+d}$ Denominator with a quadratic factor repeated $n$ times: $\\dfrac{Ax+B}{(x^2+cx+d)^n}$ Antiderivatives of partial fractions Using the linearity of integration, we can compute the antiderivative of a rational function from the antiderivative of these partial fractions\n$$ \\begin{aligned} \\int \\frac{A}{x-a}\\,dx \u0026amp;= A\\log\\vert x-a\\vert+C,\\newline \\int \\frac{A}{(x-a)^n}\\,dx \u0026amp;= \\frac{-A}{(n-1)(x-a)^{n-1}}+C \\textrm{ if $n\\neq 1$},\\newline \\int \\frac{Ax+B}{x^2+cx+d}\\,dx \u0026amp;= \\frac{A}{2}\\log\\vert x^2+cx+d\\vert + \\frac{2B-Ac}{\\sqrt{4d-c^2}}\\arctan \\frac{2x+c}{\\sqrt{4d-c^2}}+C. \\end{aligned} $$\nIntegration of a rational function with a denominator with linear factors Example. Consider the function $f(x)=\\dfrac{x^2+3x-5}{x^3-3x+2}$.\nThe factorization of the denominator is $x^3-3x+2=(x-1)^2(x+2)$; it has a single linear factor $(x+2)$ and a linear factor $(x-1)$, repeated two times. 
In this case the decomposition in partial fractions is:\n$$ \\begin{aligned} \\frac{x^2+3x-5}{x^3-3x+2}\u0026amp;=\\frac{A}{x-1}+\\frac{B}{(x-1)^2}+\\frac{C}{x+2} = \\newline \u0026amp;= \\frac{A(x-1)(x+2)+ B(x+2)+C(x-1)^2}{(x-1)^2(x+2)} = \\newline \u0026amp;= \\frac{(A+C)x^2+(A+B-2C)x+(-2A+2B+C)}{(x-1)^2(x+2)} \\end{aligned} $$\nand equating the numerators we get $A=16/9$, $B=-1/3$ and $C=-7/9$, so\n$$\\frac{x^2+3x-5}{x^3-3x+2}= \\frac{16/9}{x-1}+\\frac{-1/3}{(x-1)^2}+\\frac{-7/9}{x+2}.$$\nFinally, integrating each partial fraction we have\n$$ \\begin{aligned} \\int \\frac{x^2+3x-5}{x^3-3x+2}\\, dx \u0026amp;= \\int \\frac{16/9}{x-1}\\,dx+\\int \\frac{-1/3}{(x-1)^2}\\,dx+\\int \\frac{-7/9}{x+2}\\,dx = \\newline \u0026amp;= \\frac{16}{9}\\int\\frac{1}{x-1}\\,dx-\\frac{1}{3}\\int(x-1)^{-2}\\,dx- \\frac{7}{9}\\int \\frac{1}{x+2}\\,dx = \\newline \u0026amp;= \\frac{16}{9}\\ln\\vert x-1\\vert+\\frac{1}{3(x-1)}-\\frac{7}{9}\\ln\\vert x+2\\vert+C. \\end{aligned} $$\nIntegration of a rational function with a denominator with simple quadratic factors Example. Consider the function $f(x)=\\dfrac{x+1}{x^2-4x+8}$.\nIn this case the denominator cannot be factorised as a product of linear factors, but we can write\n$$x^2-4x+8 = (x-2)^2+4,$$\nso\n$$ \\begin{aligned} \\int \\dfrac{x+1}{x^2-4x+8}\\, dx \u0026amp;= \\int \\dfrac{x-2+3}{(x-2)^2+4}\\,dx = \\newline \u0026amp;= \\int \\dfrac{x-2}{(x-2)^2+4}\\,dx + \\int \\dfrac{3}{(x-2)^2+4}\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\ln\\vert(x-2)^2+4\\vert + \\dfrac{3}{2}\\arctan\\left(\\frac{x-2}{2}\\right)+C. \\end{aligned} $$\nIntegration of trigonometric functions Integration of $\\sin^n x\\cos^m x$ with $n$ or $m$ odd If $f(x)=\\sin^n x\\cos^m x$ with $n$ or $m$ odd, then we can make the substitution $t=\\sin x$ or $t=\\cos x$, to convert the function into a polynomial. 
Example.\n$$\\int \\sin^2 x\\cos^3 x\\, dx = \\int \\sin^2 x\\cos^2 x\\cos x\\, dx = \\int \\sin^2 x(1-\\sin^2 x)\\cos x\\, dx,$$\nand making the substitution $t=\\sin x$, so $dt = \\cos x\\, dx$, we have\n$$\\int \\sin^2 x(1-\\sin^2 x)\\cos x\\, dx = \\int t^2(1-t^2)\\, dt = \\int (t^2-t^4) \\, dt = \\frac{t^3}{3}-\\frac{t^5}{5}+C.$$\nFinally, undoing the substitution we have\n$$\\int \\sin^2 x\\cos^3 x\\, dx = \\frac{\\sin^3 x}{3}-\\frac{\\sin^5 x}{5}+C.$$\nIntegration of $\\sin^n x\\cos^m x$ with $n$ and $m$ even If $f(x)=\\sin^n x\\cos^m x$ with $n$ and $m$ even, then we can make the following substitutions to simplify the integration\n$$ \\begin{aligned} \\sin^2 x \u0026amp;= \\frac{1}{2}(1-\\cos(2x))\\newline \\cos^2 x \u0026amp;= \\frac{1}{2}(1+\\cos(2x))\\newline \\sin x\\cos x \u0026amp;= \\frac{1}{2}\\sin(2x) \\end{aligned} $$\nExample.\n$$ \\begin{aligned} \\int \\sin^2 x\\cos^4 x\\, dx \u0026amp;= \\int (\\sin x\\cos x)^2\\cos^2 x\\, dx = \\int \\left(\\frac{1}{2}\\sin(2x)\\right)^2\\frac{1}{2}(1+\\cos(2x))\\,dx =\\newline \u0026amp;= \\frac{1}{8}\\int \\sin^2(2x)\\,dx+\\frac{1}{8}\\int \\sin^2(2x) \\cos(2x)\\,dx, \\end{aligned} $$\nthe first integral is of the same type and the second one of the previous type, so $$\\int \\sin^2 x\\cos^4 x\\, dx = \\frac{1}{16}x-\\frac{1}{64}\\sin(4x)+\\frac{1}{48}\\sin^3(2x)+C.$$\nProducts of sines and cosines The equalities\n$$ \\begin{aligned} \\sin x\\cos y \u0026amp;= \\frac{1}{2}(\\sin(x-y)+\\sin(x+y))\\newline \\sin x\\sin y \u0026amp;= \\frac{1}{2}(\\cos(x-y)-\\cos(x+y))\\newline \\cos x\\cos y \u0026amp;= \\frac{1}{2}(\\cos(x-y)+\\cos(x+y)) \\end{aligned} $$\ntransform products into sums, simplifying the integration.\nExample.\n$$ \\begin{aligned} \\int \\sin x\\cos 2x\\, dx \u0026amp;= \\int \\frac{1}{2}(\\sin(x-2x)+\\sin(x+2x))\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\int \\sin (-x)\\,dx +\\frac{1}{2}\\int \\sin 3x\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\cos(-x)- \\frac{1}{6}\\cos 3x +C. 
\\end{aligned} $$\nRational functions of sines and cosines If $f(x,y)$ is a rational function then the function $f(\\sin x,\\cos x)$ can be transformed in an rational function of $t$ with the following substitutions\n$$\\tan \\frac{x}{2}=t \\quad \\sin x=\\frac{2t}{1+t^2} \\quad \\cos x = \\frac{1-t^2}{1+t^2} \\quad dx = \\frac{2}{1+t^2}dt.$$\nExample.\n$$\\int \\frac{1}{\\sin x}\\,dx = \\int \\frac{1}{\\frac{2t}{1+t^2}}\\frac{2}{1+t^2}\\,dt = \\int \\frac{1}{t}\\,dt = \\log\\vert t\\vert+C = \\log\\vert\\tan\\frac{x}{2}\\vert+C.$$\nDefinite integral Definition - Definite integral. Let $f(x)$ be a function which is continuous on an interval $[a, b]$. Divide this interval into $n$ subintervals of equal width $\\Delta x$ and choose an arbitrary point $x_i$ from each subinterval. The definite integral of $f$ from $a$ to $b$ is defined to be the limit\n$$\\int_a^b f(x)\\,dx = \\lim_{n\\rightarrow \\infty}\\sum_{i=1}^n f(x_i)\\Delta x.$$\nTheorem - First fundamental theorem of Calculus. If $f(x)$ is continuous on the interval $[a,b]$ and $F(x)$ is an antiderivative of $f$ on $[a,b]$, then\n$$\\int_a^b f(x)\\,dx = F(b)-F(a)$$\nExample. 
Given the function $f(x)=x^2$, we have\n$$\\int_1^2 x^2\\,dx = \\left[\\frac{x^3}{3}\\right]_1^2 = \\frac{2^3}{3}-\\frac{1^3}{3} = \\frac{7}{3}.$$\nProperties of the definite integral Given two functions $f(x)$ and $g(x)$ integrable on $[a,b]$ and $k \\in \\mathbb{R}$ the following properties are satisfied:\n$\\int_{a}^{b}(f(x)+g(x))\\,dx=\\int_{a}^{b}f(x)\\,dx+\\int_{a}^{b}g(x)\\,dx$ (linearity)\n$\\int_{a}^{b}{kf(x)}\\,dx=k\\int_{a}^{b}{f(x)}\\,dx$ (linearity)\n$\\int_{a}^{b}{f(x)\\,dx} \\leq \\int_{a}^{b}{g(x)\\,dx}$ if $f(x)\\leq g(x)\\ \\forall x \\in [a,b]$ (monotonicity)\n$\\int_{a}^{b}{f(x)\\,dx} = \\int_{a}^{c}{f(x)\\,dx}+\\int_{c}^{b}{f(x)\\,dx}$ for any $c\\in(a,b)$ (additivity)\n$\\int_a^b f(x)\\,dx = -\\int_b^a f(x)\\,dx$\nArea calculation Area between a positive function and the $x$ axis If $f(x)$ is an integrable function on the interval $[a,b]$ and $f(x)\\geq 0\\ \\forall x\\in[a,b]$, then the definite integral\n$$\\int_a^b f(x)\\,dx$$\nmeasures the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$.\nArea between a negative function and the $x$ axis If $f(x)$ is an integrable function on the interval $[a,b]$ and $f(x)\\leq 0\\ \\forall x\\in[a,b]$, then the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$ is\n$$-\\int_a^b f(x)\\,dx.$$\nArea between a function and the $x$ axis In general, if $f(x)$ is an integrable function on the interval $[a,b]$, no matter the sign of $f$ on $[a,b]$, the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$ is\n$$\\int_a^b \\vert f(x)\\vert\\,dx.$$\nArea between two functions If $f(x)$ and $g(x)$ are two integrable functions on the interval $[a,b]$, then the area between the graphs of $f$ and $g$ on the interval $[a,b]$ is $$\\int_{a}^{b}{\\vert f(x)- 
g(x)\\vert\\,dx}.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600986575,"objectID":"534b57a451c94fee658cb2589add7cce","permalink":"/en/teaching/calculus/manual/integrals/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/integrals/","section":"teaching","summary":"Antiderivative of a function Definition - Antiderivative of a function. Given a function $f(x)$, the function $F(X)$ is an antiderivative or primitive function of $f$ if it satisfies that $F\u0026rsquo;(x)=f(x)$ $\\forall x \\in \\mathop{\\rm Dom}(f)$.","tags":["Integral","Area"],"title":"Integral calculus","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Ordinary Differential Equations Often in Physics, Chemistry, Biology, Geometry, etc., there arise equations that relate a function with its derivative, or successive derivatives.\nDefinition - Ordinary differential equation. An ordinary differential equation (O.D.E.) is an equation that relates an independent variable $x$, a function $y(x)$ that depends on $x$, and the successive derivatives of $y$, $y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)}$; it can be written as\n$$F(x, y, y\u0026rsquo;, y\u0026rsquo;\u0026rsquo;,\\ldots, y^{(n)})=0.$$\nThe order of a differential equation is the greatest order of the derivatives in the equation.\nExample. The equation $y\u0026rsquo;\u0026rsquo;\u0026rsquo;+\\sin(x)y\u0026rsquo;=2x$ is a differential equation of order 3.\nDeducing a differential equation To deduce a differential equation that explains a natural phenomenon, it is essential to understand what a derivative is and how to interpret it.\nExample. 
Newton’s law of cooling states\n“The rate of change of the temperature of a body in a surrounding medium is proportional to the difference between the temperature of the body $T$ and the temperature of the medium $T_a$.”\nThe rate of change of the temperature is the derivative of temperature with respect to time $dT/dt$. Thus, Newton’s law of cooling can be expressed by the differential equation\n$$\\frac{dT}{dt}=k(T-T_a),$$\nwhere $k$ is a proportionality constant.\nSolution of an ordinary differential equation Definition - Solution of an ordinary differential equation. Given an ordinary differential equation $F(x,y,y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)})=0$, the function $y=f(x)$ is a solution of the ordinary differential equation if it satisfies the equation, that is, if\n$$F(x,f(x), f\u0026rsquo;(x), f\u0026rsquo;\u0026rsquo;(x),\\ldots, f^{(n)}(x))=0.$$\nThe graph of a solution of the ordinary differential equation is known as an integral curve.\nSolving an ordinary differential equation consists in finding all its solutions in a given domain. For that, integral calculus is required.\nIn the same way that the indefinite integral is a family of antiderivatives that differ by a constant term, after integrating an ordinary differential equation we get a family of solutions that differ in a constant. We can get particular solutions by giving values to this constant.\nGeneral solution of an ordinary differential equation Definition - General solution of an ordinary differential equation. Given an ordinary differential equation $F(x,y,y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)})=0$ of order $n$, the general solution of the differential equation is a family of functions\n$$y =f (x,C_1,\\ldots,C_n),$$\ndepending on $n$ constants, such that for any value of $C_1,\\ldots,C_n$ we get a solution of the differential equation.\nFor every choice of values of the constants we get a particular solution of the differential equation. 
Thus, when a differential equation can be solved, it has infinite solutions.\nGeometrically, the general solution of a differential equation corresponds to a family of integral curves of the differential equation.\nOften, it is common to impose conditions to the solutions of a differential equation to reduce the number of solutions. In many cases, these conditions allow to determine the values of the constants in the general solution to get a particular solution.\nFirst order differential equations In this chapter we are going to study first order differential equations\n$$F(x,y,y\u0026rsquo;)=0.$$\nThe general solution of a first order differential equation is\n$$y = f (x,C),$$\nso to get a particular solution from the general one, it is enough to set the value of the constant $C$, and for that we only need to impose one initial condition.\nDefinition - Initial value problem. The group consisting of a first order differential equation and an initial condition is known as initial value problem:\n$$ \\begin{cases} F(x,y,y\u0026rsquo;)=0, \u0026amp; \\mbox{First order differential equation;} \\newline y(x_0)=y_0, \u0026amp; \\mbox{Initial condition.} \\end{cases} $$\nSolving an initial value problem consists in finding a solution of the first order differential equation that satisfies the initial condition.\nExample. Recall the first order differential equation of the Newton’s law of cooling, $$\\frac{dT}{dt}=k(T-T_a),$$ where $T$ is the temperature of the body and $T_a$ is the temperature of the surrounding medium.\nIt is easy to check that the general solution of this equation is\n$$T(t) = Ce^{kt}+T_a.$$\nIf we impose the initial condition that the temperature of the body at the initial instant is $5$ ºC, that is, $T(0)=5$, we have\n$$T(0) = Ce^{k\\cdot0}+T_a = C+T_a = 5,$$\nfrom where we get $C=5-T_a$, and this give us the particular solution\n$$T(t) = (5-T_a)e^{kt}+T_a.$$\nIntegral curve of an initial value problem Example. 
If we assume in the previous example that the temperature of the surrounding medium is $T_a=0$ ºC and the cooling constant of the body is $k=1$, the general solution of the differential equation is $$T(t)=Ce^t,$$ which is a family of integral curves. Among all of them, only the one that passes through the point $(0,5)$ corresponds to the particular solution of the previous initial value problem.\nExistence and uniqueness of solutions Theorem - Existence and uniqueness of solutions of a first order ODE. Given an initial value problem\n$$\\begin{cases} y\u0026rsquo;=F(x,y);\\newline y(x_0)=y_0; \\end{cases} $$\nif $F(x,y)$ is continuous in an open rectangle around the point $(x_0,y_0)$, then a solution of the initial value problem exists. If, in addition, $\\frac{\\partial F}{\\partial y}$ is continuous in an open rectangle around $(x_0,y_0)$, the solution is unique.\nAlthough this theorem guarantees the existence and uniqueness of a solution of a first order differential equation, it does not provide a method to compute it. In fact, there is no general method to solve first order differential equations, but we will see how to solve some types: separable differential equations, homogeneous differential equations and linear differential equations.\nSeparable differential equations Definition - Separable differential equation. A separable differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;g(y)=f(x),$$\nor what is the same,\n$$g(y)dy=f(x)dx,$$\nso the different variables are on different sides of the equality (the variables are separated).\nThe general solution of a separable differential equation is obtained by integrating both sides of the equation\n$$\\int g(y)\\,dy = \\int f(x)\\,dx+C.$$\nExample. 
The differential equation of the Newton’s law of cooling\n$$\\frac{dT}{dt}=k(T-T_a),$$\nis a separable differential equation since it can be written as\n$$\\frac{1}{T-T_a}dT=k\\,dt.$$\nIntegrating both sides of the equation we have\n$$\\int \\frac{1}{T-T_a}\\,dT=\\int k\\,dt\\Leftrightarrow \\log(T-T_a)=kt+C,$$\nand solving for $T$ we get the general solution of the equation\n$$T(t)=e^{kt+C}+T_a=e^Ce^{kt}+T_a=Ce^{kt}+T_a,$$\nrewriting $C=e^C$ as an arbitrary constant.\nHomogeneous differential equations Definition - Homogeneous function. A function $f(x,y)$ is homogeneous of degree $n$, if it satisfies\n$$f(kx,ky)= k^nf(x,y),$$\nfor any value $k\\in \\mathbb{R}$.\nIn particular, a homogeneous function of degree $0$ always satisfies\n$$f(kx,ky)=f(x,y).$$\nSetting $k=1/x$ we have\n$$f(x,y)=f\\left(\\frac{1}{x}x,\\frac{1}{x}y\\right)=f\\left(1,\\frac{y}{x}\\right)=g\\left(\\frac{y}{x}\\right).$$\nThis way, a homogeneous function of degree $0$ always can be written as a function of $u=y/x$:\n$$f(x,y)=g\\left(\\frac{y}{x}\\right)=g(u).$$\nDefinition - Homogeneous differential equation. A homogeneous differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;=f(x,y),$$\nwhere $f(x,y)$ is a homogeneous function of degree $0$.\nWe can solve a homogeneous differential equation by making the substitution\n$$u=\\frac{y}{x}\\Leftrightarrow y=ux,$$\nso the equation becomes\n$$u\u0026rsquo;x+u=f(u),$$\nthat is a separable differential equation.\nOnce solved the separable differential equation, the substitution must be undone.\nExample. 
Let us consider the following differential equation $$4x-3y+y\u0026rsquo;(2y-3x)=0.$$\nRewriting the equation in this way\n$$y\u0026rsquo;=\\frac{3y-4x}{2y-3x}$$\nwe can easily check that it is a homogeneous differential equation.\nTo solve this equation we make the substitution $y=ux$, and we get\n$$u\u0026rsquo;x+u=\\frac{3ux-4x}{2ux-3x}=\\frac{3u-4}{2u-3}$$\nwhich is a separable differential equation.\nSeparating the variables we have\n$$u\u0026rsquo;x=\\frac{3u-4}{2u-3}-u=\\frac{-2u^2+6u-4}{2u-3}\\Leftrightarrow \\frac{2u-3}{-2u^2+6u-4}\\,du=\\frac{1}{x}\\,dx.$$\nNow, integrating both sides of the equation we have\n$$ \\renewcommand{\\arraystretch}{2} \\begin{array}{c} \\displaystyle \\int \\frac{2u-3}{-2u^2+6u-4}\\,du=\\int \\frac{1}{x}\\,dx \\Leftrightarrow -\\frac{1}{2}\\log|u^2-3u+2|=\\log|x|+C \\Leftrightarrow\\newline \\Leftrightarrow \\log|u^2-3u+2|=-2\\log|x|-2C, \\end{array} $$\nthen, applying the exponential function to both sides and simplifying we get the general solution\n$$u^2-3u+2=e^{-2\\log|x|-2C}=\\frac{e^{-2C}}{e^{\\log|x|^2}}=\\frac{K}{x^2},$$\nwhere we have renamed the constant $K=e^{-2C}$.\nFinally, undoing the initial substitution $u=y/x$, we arrive at the general solution of the homogeneous differential equation\n$$\\left(\\frac{y}{x}\\right)^2-3\\frac{y}{x}+2=\\frac{K}{x^2}\\Leftrightarrow y^2-3xy+2x^2=K.$$\nLinear differential equations Definition - Linear differential equation. A linear differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;+g(x)y = h(x).$$\nTo solve a linear differential equation we try to write the left side of the equation as the derivative of a product. 
For that we multiply both sides by a function $f(x)$ such that\n$$f\u0026rsquo;(x)=g(x)f(x).$$\nThus, we get\n$$ \\begin{array}{c} y\u0026rsquo;f(x)+g(x)f(x)y=h(x)f(x)\\newline \\Updownarrow\\newline y\u0026rsquo;f(x)+f\u0026rsquo;(x)y=h(x)f(x)\\newline \\Updownarrow\\newline \\dfrac{d}{dx}(yf(x))=h(x)f(x) \\end{array} $$\nIntegrating both sides of the previous equation we get the solution\n$$yf(x)=\\int h(x)f(x)\\,dx+C.$$\nOn the other hand, a function that satisfies $f\u0026rsquo;(x)=g(x)f(x)$ is\n$$f(x)=e^{\\int g(x)\\,dx},$$\nso, substituting this function in the previous solution we arrive at the solution of the linear differential equation\n$$ye^{\\int g(x)\\,dx}=\\int h(x) e^{\\int g(x)\\,dx}\\,dx+C,$$\nor what is the same\nSolution of a linear differential equation.\n$$y=e^{-\\int g(x)\\,dx}\\left(\\int h(x)e^{\\int g(x)\\,dx}\\,dx+C\\right).$$\nExample. If in the differential equation of Newton’s law of cooling the temperature of the surrounding medium is a function of time $T_a(t)$, then the differential equation\n$$\\frac{dT}{dt}=k(T-T_a(t)),$$\nis a linear differential equation since it can be written as\n$$T\u0026rsquo;-kT=-kT_a(t),$$\nwhere the independent term is $-kT_a(t)$ and the coefficient of $T$ is $-k$.\nSubstituting in the formula of the general solution of a linear differential equation, with $y=T$, we have\n$$y=e^{-\\int -k\\,dt}\\left(\\int -kT_a(t)e^{\\int -k\\,dt}\\,dt+C\\right)= e^{kt}\\left(-\\int kT_a(t)e^{-kt}\\,dt+C\\right).$$\nIn the particular case that $T_a(t)=t$, and the proportionality constant $k=1$, the general solution of the linear differential equation is\n$$y=e^{t}\\left(-\\int te^{-t}\\,dt+C\\right)=e^t(e^{-t}(t+1)+C)=Ce^t+t+1.$$\nIf, in addition, we know that the temperature of the body at time $t=0$ is $5$ ºC, that is, we have the initial condition $T(0)=5$, then we can compute the value of the constant $C$,\n$$y(0)=Ce^0+0+1=5 \\Leftrightarrow C+1=5 \\Leftrightarrow C=4,$$ and we get the particular 
solution\n$$y(t)=4e^t+t+1.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600986575,"objectID":"a0eba6affdd1b66e27df2ea84c83aa50","permalink":"/en/teaching/calculus/manual/ordinary-differential-equations/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/ordinary-differential-equations/","section":"teaching","summary":"Ordinary Differential Equations Often in Physics, Chemistry, Biology, Geometry, etc there arise equations that relate a function with its derivative, or successive derivatives.\nDefinition - Ordinary differential equation. An ordinary differential equation (O.","tags":["Ordinary Differential Equation"],"title":"Ordinary differential equations","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Construct the sample space of the following random experiments:\nPick a random person and record the gender and whether she or he is smoker or not. Pick a random person and record the blood type and whether she or he is smoker or not. Pick a random person and record the gender, the blood type and whether she or he is smoker or not. Exercise 2 There are two boxes with balls of different colors. The first box contains 3 white balls and 2 black balls, and the second one contains 2 blue balls, 1 red ball and 1 green ball. Construct the sample space of the following random experiments:\nPick a random ball from every box. Pick two random balls from every box. Exercise 3 De Morgan’s laws state that given two events $A$ and $B$ from the same sample space, $\\overline{A\\cup B}=\\bar A \\cap \\bar B$ and $\\overline{A\\cap B}=\\bar A \\cup \\bar B$. Prove both statements graphically using Venn diagrams.\nExercise 4 Compute the probability of the following events of the random experiment consisting in tossing 3 coins:\nGet exactly 1 head. Get exactly 2 tails. Get two or more heads. Get some tails. Solution $P(\\mbox{1 head})=0.375$. 
$P(\\mbox{2 tails})=0.375$. $P(\\mbox{2 or more heads})=0.5$. $P(\\mbox{some tails})=0.875$. Exercise 5 In a laboratory there are 4 flasks with sulfuric acid and 2 with nitric acid, and in another laboratory there is 1 flask with sulfuric acid and 3 with nitric acid. A random experiment consists in picking two flasks, one from every laboratory. Compute the probability of the following events:\nThe two picked flasks are of sulfuric acid. The two picked flasks are of nitric acid. The two picked flasks contain different acids. Compute again the above probabilities if the flask picked in the first laboratory is put in the second laboratory before picking the flask from it.\nSolution $P(\\mbox{Two flasks of sulfuric acid})=4/24$. $P(\\mbox{Two flasks of nitric acid})=6/24$. $P(\\mbox{One flask of each})=14/24$. Putting the first flask in the second laboratory: $P(\\mbox{Two flasks of sulfuric acid})=8/30$. $P(\\mbox{Two flasks of nitric acid})=8/30$. $P(\\mbox{One flask of each})=14/30$. Exercise 6 Let $A$ and $B$ be two events of the same sample space, such that $P(A)=3/8$, $P(B)=1/2$, $P(A\\cap B)=1/4$. Compute the following probabilities:\n$P(A\\cup B)$. $P(\\bar A)$ and $P(\\bar B)$. $P(\\bar A\\cap \\bar B)$. $P(A\\cap \\bar B)$. $P(A\\vert B)$. $P(A\\vert \\bar B)$. Solution $P(A\\cup B)=5/8$. $P(\\bar A)=5/8$ and $P(\\bar B)=1/2$. $P(\\bar A\\cap \\bar B)=3/8$. $P(A\\cap \\bar B)=1/8$. $P(A\\vert B)=1/2$. $P(A\\vert \\bar B)=1/4$. Exercise 7 In a hospital the probability of getting hepatitis in a blood transfusion from a unit of blood is $0.01$. A patient gets two units of blood while staying at the hospital. What is the probability of getting hepatitis?\nSolution $P(\\mbox{Hepatitis})=0.0199$. Exercise 8 Let $A$ and $B$ be two events of the same sample space, such that $P(A)=0.6$ and $P(A\\cup B)=0.9.$ Compute $P(B)$ under the following assumptions:\n$A$ and $B$ are incompatible. $A$ and $B$ are independent. Solution $P(B)=0.3$. $P(B)=0.75$. 
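Worked check for Exercise 8 (a sketch that uses only the data of the exercise, not necessarily the intended derivation): if $A$ and $B$ are incompatible, then $P(A\\cup B)=P(A)+P(B)$, so $P(B)=0.9-0.6=0.3$; if they are independent, then $P(A\\cap B)=P(A)P(B)$, so\n$$0.9=0.6+P(B)-0.6P(B) \\Leftrightarrow 0.4P(B)=0.3 \\Leftrightarrow P(B)=0.75.$$\n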
Exercise 9 A study about smoking has found that 40% of smokers have a smoker father, 25% have a smoker mother and 52% have at least one smoker parent. We pick a random person from this population. Answer the following questions:\nWhat is the probability of having a smoker mother if the father smokes? What is the probability of having a smoker mother if the father does not smoke? Are the events having a smoker father and having a smoker mother independent? Solution Denoting by $SF$ the event of having a smoker father and by $SM$ the event of having a smoker mother,\n$P(SM/SF)=0.325$. $P(SM/\\overline{SF})=0.2$. The events aren\u0026rsquo;t independent. Exercise 10 The probability that an injury $A$ is repeated is $4/5$, the probability that another injury $B$ is repeated is $1/2$, and the probability that both injuries are repeated is $1/3$. Compute the probability of the following events:\nOnly injury $B$ is repeated. At least one injury is repeated. Injury $B$ is repeated if injury $A$ has been repeated. Injury $B$ is repeated if injury $A$ has not been repeated. Solution $P(B\\cap\\overline A)=1/6$. $P(A\\cup B)=29/30$. $P(B\\vert A)=5/12$. $P(B\\vert \\overline A)=5/6$. Exercise 11 In a digestive clinic, from every 1000 patients that arrive with stomach pain, 700 have gastritis, 200 have an ulcer and 100 have cancer. After analyzing the gastric symptoms, it is known that the probability of vomiting is $0.3$ in case of gastritis, $0.6$ in case of ulcer and $0.9$ in case of cancer. What is the diagnosis for a new patient with stomach pain that suffers from vomiting?\nNote: Assume that the only diseases are gastritis, ulcer and cancer and that they are mutually incompatible.\nSolution Let $G$, $U$ and $C$ be the events of having gastritis, ulcer and cancer respectively, and let $V$ be the event of vomiting, $P(G/V)=0.5$, $P(U/V)=0.286$ and $P(C/V)=0.214$, so the diagnosis is gastritis. 
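Worked check for Exercise 11 (a sketch that assumes, as in the note, that the three diseases are exhaustive and mutually incompatible): by the law of total probability\n$$P(V)=0.7\\cdot 0.3+0.2\\cdot 0.6+0.1\\cdot 0.9=0.21+0.12+0.09=0.42,$$\nand by Bayes\u0026rsquo; theorem\n$$P(G/V)=\\frac{0.21}{0.42}=0.5,\\quad P(U/V)=\\frac{0.12}{0.42}\\approx 0.286,\\quad P(C/V)=\\frac{0.09}{0.42}\\approx 0.214.$$\n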
Exercise 12 A severe pain without effusion in a particular zone of the knee joint is a symptom of sprained lateral collateral ligament (SLCL). If the sprains in that ligament are classified into grade 1, when there is only distension (60% of cases), grade 2 when there is a partial tearing (30% of cases) and grade 3 when there is a complete tearing (10% of cases). Taking into account that the symptom appears in 80% of cases of grade 1 sprains, 90% of cases of grade 2 and 100% of cases of grade 3, answer the following questions:\nIf a person has SLCL what is the probability that he or she present severe pain without effusion? What is the diagnosis for a person with severe pain without effusion? From a total of 10000 people with severe pain without effusion, how many are expected to have a grade 1 sprain? How many are expected to have a grade 2 sprain? And a grade 3 sprain? Solution Naming $S$ to the event of presenting severe pain without effusion, and $G1$, $G2$ and $G3$ to the events of having a SLCL of grade 1, 2 and 3 respectively,\n$P(S)=0.85$. $P(G1\\vert S)=0.5647$, $P(G2\\vert S)=0.3176$ and $P(G3\\vert S)=0.1176$, so the diagnosis is a SLCL of grade 1. $5647.0588$ will have a grade 1 sprain, $3176.4706$ will have a grade 2 sprain and $1176.4706$ will have a grade 3 sprain. Exercise 13 A physiotherapist uses two techniques $A$ and $B$ to cure an injury. It is known that the injury is 3 times more frequent in people over 30 than in people under 30. It is also known that in people over 30 technique $A$ works in 30% of cases and technique $B$ in 60%, while in people under 30 technique $A$ works in 50% of cases and technique $B$ in 70%. If both techniques are applied with the same probability, no matter the age,\nWhat is the probability that a random person under 30 is cured? And for a people over 30? What is the probability that a random person is cured? 
If after applying a technique to a person over 30, the person does not cure, what is the probability that the technique applied was $A$? Solution Naming $J$ to the event of being under 30, $C$ to the event of being cured, and $A$ and $B$ to the events of applying techniques $A$ and $B$ respectively,\n$P(C\\vert J)=0.45.$ and $P(C\\vert \\bar J)=0.6$. $P(C)=0.5625$.\n$P(A/\\bar J\\cap \\bar C)=0.625$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"926fe34432677779f405a3d111214b9f","permalink":"/en/teaching/statistics/problems/probability/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/probability/","section":"teaching","summary":"Exercise 1 Construct the sample space of the following random experiments:\nPick a random person and record the gender and whether she or he is smoker or not. Pick a random person and record the blood type and whether she or he is smoker or not.","tags":["Probability"],"title":"Problems of Probability","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Epidemiology"],"content":"Exercise 1 A test was applied to a sample of people in order to evaluate its effectiveness; the results are as follows:\n$$ \\begin{array}{l|cc} \u0026amp; \\mbox{Test }+ \u0026amp; \\mbox{Test }- \\newline \\hline \\mbox{Sick} \u0026amp; 2020 \u0026amp; 140 \\newline\n\\mbox{Healthy} \u0026amp; 80 \u0026amp; 7760 \\newline \\end{array} $$\nCalculate for this test:\nThe sensitivity and the specificity. The positive and negative predictive value. The probability of a correct diagnosis. Solution Naming $S$ and $H$ to the events of being sick and healthy respectively,\nSensitivity $P(+\\vert S)=0.9352$ and specificity $P(-\\vert H)=0.9898$. PPV $P(S\\vert +)=0.9619$ and NPV $P(H\\vert -)=0.9823$. $P((S\\cap +)\\cup (H\\cap -)) = P(S\\cap +) + P(H\\cap -) = 0.978$. 
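Worked check for Exercise 1 (a sketch that reads the probabilities as relative frequencies in the table, whose total is $2020+140+80+7760=10000$):\n$$P(+\\vert S)=\\frac{2020}{2160}\\approx 0.9352,\\quad P(-\\vert H)=\\frac{7760}{7840}\\approx 0.9898,\\quad P(S\\vert +)=\\frac{2020}{2100}\\approx 0.9619,\\quad P(H\\vert -)=\\frac{7760}{7900}\\approx 0.9823,$$\nand the probability of a correct diagnosis is $(2020+7760)/10000=0.978$.\n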
Exercise 2 We know, from a research study, that 10% of people over 50 suffer from a particular type of arthritis. We have developed a new method to detect the disease and after clinical trials we observe that if we apply the method to people with arthritis we get a positive result in 85% of cases, while if we apply the method to people without arthritis, we get a positive result in 4% of cases. Answer the following questions:\nWhat is the probability of getting a positive result after applying the method to a random person? If the result of applying the method to one person has been positive, what is the probability of having arthritis? Solution Denoting by $A$ the event of having arthritis,\n$P(+)=0.121$. $P(A\\vert +) = 0.7025$. Exercise 3 We have two different tests $A$ and $B$ to diagnose a disease. Test $A$ has a sensitivity of 98% and a specificity of 80%, while test $B$ has a sensitivity of 75% and a specificity of 99%.\nWhich test is better to confirm the disease? Which test is better to rule out the disease? Often a test is used to discard the presence of the disease in a large number of apparently healthy people. This type of test is known as a screening test. Which test will work better as a screening test? What are the predictive values of this test if the prevalence of the disease is 0.01? And if the prevalence of the disease is 0.2? The positive predictive value of a screening test is usually not very high. How can we combine the tests $A$ and $B$ to have a higher confidence in the diagnosis of the disease? Calculate the post-test probability of having the disease with the combination of both tests, if the outcome of both tests is positive for a prevalence of 0.01. Solution Test $B$ because it has a greater specificity. Test $A$ because it has a greater sensitivity. 
Test $A$ will perform better as a screening test.\nFor a prevalence of $0.01$ the PPV is $P(D\\vert +)=0.0472$ and the NPV is $P(\\bar D\\vert -)=0.9997$.\nFor a prevalence of $0.2$ the PPV is $P(D\\vert +)=0.5506$ and the NPV is $P(\\bar D\\vert -)=0.9938$. First applying test $A$ to everybody and then applying test $B$ to positive cases of test $A$.\n$P(D\\vert +_A\\cap +_B)=0.7878$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"d9f103bcc7581ebf8ef06c9e4bd3de2a","permalink":"/en/teaching/statistics/problems/diagnostic_tests/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/diagnostic_tests/","section":"teaching","summary":"Exercise 1 A test was applied to a sample of people in order to evaluate its effectiveness; the results are as follows:\n$$ \\begin{array}{l|cc} \u0026amp; \\mbox{Test }+ \u0026amp; \\mbox{Test }- \\newline \\hline \\mbox{Sick} \u0026amp; 2020 \u0026amp; 140 \\newline","tags":["Probability","Diagnostic Tests"],"title":"Problems Diagnostic Tests","type":"book"},{"authors":null,"categories":["Calculus","Several Variables Calculus"],"content":"Vector functions of a single real variable Definition - Vector function of a single real variable. 
A vector function of a single real variable or vector field of a scalar variable is a function that maps every scalar value $t\\in D\\subseteq \\mathbb{R}$ into a vector $(x_1(t),\\ldots,x_n(t))$ in $\\mathbb{R}^n$:\n$$ \\begin{array}{rccl} f: \u0026amp; \\mathbb{R} \u0026amp; \\longrightarrow \u0026amp; \\mathbb{R}^n\\newline \u0026amp; t \u0026amp; \\longrightarrow \u0026amp; (x_1(t),\\ldots, x_n(t)) \\end{array} $$\nwhere $x_i(t)$, $i=1,\\ldots,n$, are real functions of a single real variable known as coordinate functions.\nThe most common vector fields of a scalar variable are in the real plane $\\mathbb{R}^2$, where usually they are represented as\n$$f(t)=x(t)\\mathbf{i}+y(t)\\mathbf{j},$$\nand in the real space $\\mathbb{R}^3$, where usually they are represented as\n$$f(t)=x(t)\\mathbf{i}+y(t)\\mathbf{j}+z(t)\\mathbf{k}.$$\nGraphic representation of vector fields The graphic representation of a vector field in $\\mathbb{R}^2$ is a trajectory in the real plane.\nThe graphic representation of a vector field in $\\mathbb{R}^3$ is a trajectory in the real space.\nDerivative of a vector field The concept of derivative as the limit of the average rate of change of a function can be extended easily to vector fields.\nDefinition - Derivative of a vector field. A vector field $f(t)=(x_1(t),\\ldots,x_n(t))$ is differentiable at a point $t=a$ if the limit\n$$\\lim_{\\Delta t\\rightarrow 0} \\frac{f(a+\\Delta t)-f(a)}{\\Delta t}$$\nexists. In such a case, the value of the limit is known as the derivative of the vector field at $a$, and it is written $f\u0026rsquo;(a)$.\nMany properties of real functions of a single real variable can be extended to vector fields through their component functions. Thus, for instance, the derivative of a vector field can be computed from the derivatives of its component functions.\nTheorem. 
Given a vector field $f(t)=(x_1(t),\\ldots,x_n(t))$, if $x_i(t)$ is differentiable at $t=a$ for all $i=1,\\ldots,n$, then $f$ is differentiable at $a$ and its derivative is\n$$f\u0026rsquo;(a)=(x_1\u0026rsquo;(a),\\ldots,x_n\u0026rsquo;(a)).$$\nProof The proof for a vector field in $\\mathbb{R}^2$ is easy.\n$$\\begin{aligned} f\u0026rsquo;(a)\u0026amp;=\\lim_{\\Delta t\\rightarrow 0} \\frac{f(a+\\Delta t)-f(a)}{\\Delta t} = \\lim_{\\Delta t\\rightarrow 0} \\frac{(x(a+\\Delta t),y(a+\\Delta t))-(x(a),y(a))}{\\Delta t} =\\newline \u0026amp;= \\lim_{\\Delta t\\rightarrow 0} \\left(\\frac{x(a+\\Delta t)-x(a)}{\\Delta t},\\frac{y(a+\\Delta t)-y(a)}{\\Delta t}\\right) =\\newline \u0026amp;= \\left(\\lim_{\\Delta t\\rightarrow 0}\\frac{x(a+\\Delta t)-x(a)}{\\Delta t},\\lim_{\\Delta t\\rightarrow 0}\\frac{y(a+\\Delta t)-y(a)}{\\Delta t}\\right) = (x\u0026rsquo;(a),y\u0026rsquo;(a)). \\end{aligned} $$\nKinematics: Curvilinear motion The notion of derivative as a velocity along a trajectory in the real line can be generalized to a trajectory in any Euclidean space $\\mathbb{R}^n$.\nIn the case of the two-dimensional space $\\mathbb{R}^2$, if $f(t)$ describes the position of a moving object in the real plane at any time $t$, taking as reference the coordinate origin $O$ and the unit vectors $\\{\\mathbf{i}=(1,0),\\mathbf{j}=(0,1)\\}$, we can represent the position of the moving object $P$ at every moment $t$ with a vector $\\vec{OP}=x(t)\\mathbf{i}+y(t)\\mathbf{j}$, where the coordinates\n$$ \\begin{cases} x=x(t)\\newline y=y(t) \\end{cases} \\quad t\\in \\mbox{Dom}(f) $$\nare the coordinate functions of $f$.\nIn this context the derivative of a trajectory $f\u0026rsquo;(a)=(x_1\u0026rsquo;(a),\\ldots,x_n\u0026rsquo;(a))$ is the velocity vector of the trajectory $f$ at moment $t=a$. Example. 
Given the trajectory $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, whose image is the unit circumference centred at the coordinate origin, its coordinate functions are $x(t) = \\cos t$, $y(t) = \\sin t$, $t\\in \\mathbb{R}$, and its velocity is\n$$\\mathbf{v}=f\u0026rsquo;(t)=(x\u0026rsquo;(t),y\u0026rsquo;(t))=(-\\sin t, \\cos t).$$\nAt the moment $t=\\pi/4$, the object is in position $f(\\pi/4) = (\\cos(\\pi/4),\\sin(\\pi/4)) =(\\sqrt{2}/2,\\sqrt{2}/2)$ and it is moving with a velocity $\\mathbf{v}=f\u0026rsquo;(\\pi/4)=(-\\sin(\\pi/4),\\cos(\\pi/4))=(-\\sqrt{2}/2,\\sqrt{2}/2)$.\nObserve that the modulus of the velocity vector is always 1 as $\\vert\\mathbf{v}\\vert=\\sqrt{(-\\sin t)^2+(\\cos t)^2}=1$.\nTangent line to a trajectory Tangent line to a trajectory in the plane Vectorial equation Given a trajectory $f(t)$ in the real plane, the vectors that are parallel to the velocity $\\mathbf{v}$ at a moment $a$ are called tangent vectors to the trajectory $f$ at the moment $a$, and the line passing through $P=f(a)$ directed by $\\mathbf{v}$ is the tangent line to the graph of $f$ at the moment $a$.\nDefinition - Tangent line to a trajectory. Given a trajectory $f(t)$ in the real plane $\\mathbb{R}^2$, the tangent line to the graph of $f$ at $a$ is the line with equation\n$$ \\begin{aligned} l:(x,y) \u0026amp;= f(a)+tf\u0026rsquo;(a) = (x(a),y(a))+t(x\u0026rsquo;(a),y\u0026rsquo;(a))\\newline \u0026amp; = (x(a)+tx\u0026rsquo;(a),y(a)+ty\u0026rsquo;(a)). \\end{aligned} $$\nExample. We have seen that for the trajectory $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, whose image is the unit circumference centred at the coordinate origin, the object position at the moment $t=\\pi/4$ is $f(\\pi/4)=(\\sqrt{2}/2,\\sqrt{2}/2)$ and its velocity $\\mathbf{v}=(-\\sqrt{2}/2,\\sqrt{2}/2)$. 
Thus the equation of the tangent line to $f$ at that moment is\n$$ \\begin{aligned} l: (x,y) \u0026amp; = f(\\pi/4)+t\\mathbf{v} = \\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right)+t\\left(\\frac{-\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right) =\\newline \u0026amp; =\\left(\\frac{\\sqrt{2}}{2}-t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right). \\end{aligned} $$\nCartesian and point-slope equations From the vectorial equation of the tangent to a trajectory $f(t)$ at the moment $t=a$ we can get the coordinate functions\n$$ \\begin{cases} x=x(a)+tx\u0026rsquo;(a)\\newline y=y(a)+ty\u0026rsquo;(a) \\end{cases} \\quad t\\in \\mathbb{R}, $$\nand solving for $t$ and equalling both equations we get the Cartesian equation of the tangent\n$$\\frac{x-x(a)}{x\u0026rsquo;(a)}=\\frac{y-y(a)}{y\u0026rsquo;(a)},$$\nif $x\u0026rsquo;(a)\\neq 0$ and $y\u0026rsquo;(a)\\neq 0$.\nFrom this equation it is easy to get the point-slope equation of the tangent\n$$y-y(a)=\\frac{y\u0026rsquo;(a)}{x\u0026rsquo;(a)}(x-x(a)).$$\nExample. Using the vectorial equation of the tangent of the previous example\n$$l: (x,y)=\\left(\\frac{\\sqrt{2}}{2}-t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right),$$\nits Cartesian equation is $$\\frac{x-\\sqrt{2}/2}{-\\sqrt{2}/2} = \\frac{y-\\sqrt{2}/2}{\\sqrt{2}/2}$$ and the point-slope equation is\n$$y-\\sqrt{2}/2 = \\frac{-\\sqrt{2}/2}{\\sqrt{2}/2}(x-\\sqrt{2}/2) \\Rightarrow y=-x+\\sqrt{2}.$$\nNormal line to a trajectory in the plane We have seen that the tangent line to a trajectory $f(t)$ at $a$ is the line passing through the point $P=f(a)$ directed by the velocity vector $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a))$. If we take as direction vector a vector orthogonal to $\\mathbf{v}$, we get another line that is known as normal line to the trajectory.\nDefinition - Normal line to a trajectory. 
Given a trajectory $f(t)$ in the real plane $\\mathbb{R}^2$, the normal line to the graph of $f$ at moment $t=a$ is the line with equation\n$$l: (x,y)=(x(a),y(a))+t(y\u0026rsquo;(a),-x\u0026rsquo;(a)) = (x(a)+ty\u0026rsquo;(a),y(a)-tx\u0026rsquo;(a)).$$\nThe Cartesian equation is\n$$\\frac{x-x(a)}{y\u0026rsquo;(a)} = \\frac{y-y(a)}{-x\u0026rsquo;(a)},$$\nand the point-slope equation is\n$$y-y(a) = \\frac{-x\u0026rsquo;(a)}{y\u0026rsquo;(a)}(x-x(a)).$$\nThe normal line is always perpendicular to the tangent line as their direction vectors are orthogonal. Example. Considering again the trajectory of the unit circumference $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, the normal line to the graph of $f$ at moment $t=\\pi/4$ is\n$$ \\begin{aligned} l: (x,y)\u0026amp;=(\\cos(\\pi/4),\\sin(\\pi/4))+t(\\cos(\\pi/4),\\sin(\\pi/4)) =\\newline \u0026amp;= \\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right)+t\\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right) =\\left(\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right), \\end{aligned} $$\nthe Cartesian equation is\n$$\\frac{x-\\sqrt{2}/2}{\\sqrt{2}/2} = \\frac{y-\\sqrt{2}/2}{\\sqrt{2}/2},$$ and the point-slope equation is $$y-\\sqrt{2}/2 = \\frac{\\sqrt{2}/2}{\\sqrt{2}/2}(x-\\sqrt{2}/2) \\Rightarrow y=x.$$\nTangent and normal lines to a function A particular case of tangent and normal lines to a trajectory are the tangent and normal lines to a function of one real variable. For every function $y=f(x)$, the trajectory that traces its graph is\n$$g(x) = (x,f(x)) \\quad x\\in \\mathbb{R},$$\nand its velocity is\n$$g\u0026rsquo;(x) = (1,f\u0026rsquo;(x)),$$\nso that the tangent line to $g$ at the moment $a$ is\n$$\\frac{x-a}{1} = \\frac{y-f(a)}{f\u0026rsquo;(a)} \\Rightarrow y-f(a) = f\u0026rsquo;(a)(x-a),$$\nand the normal line is\n$$\\frac{x-a}{f\u0026rsquo;(a)} = \\frac{y-f(a)}{-1} \\Rightarrow y-f(a) = \\frac{-1}{f\u0026rsquo;(a)}(x-a).$$\nExample. 
Given the function $y=x^2$, the trajectory that traces its graph is $g(x)=(x,x^2)$ and its velocity is $g\u0026rsquo;(x)=(1,2x)$. At the moment $x=1$ the trajectory passes through the point $(1,1)$ with a velocity $(1,2)$. Thus, the tangent line at that moment is\n$$\\frac{x-1}{1} = \\frac{y-1}{2} \\Rightarrow y-1 = 2(x-1) \\Rightarrow y = 2x-1,$$\nand the normal line is\n$$\\frac{x-1}{2} = \\frac{y-1}{-1} \\Rightarrow y-1 = \\frac{-1}{2}(x-1) \\Rightarrow y = \\frac{-x}{2}+\\frac{3}{2}.$$\nTangent line to a trajectory in the space The concept of tangent line to a trajectory can be easily extended from the real plane to the three-dimensional space $\\mathbb{R}^3$.\nIf $f(t)=(x(t),y(t),z(t))$, $t\\in \\mathbb{R}$, is a trajectory in the real space $\\mathbb{R}^3$, then at the moment $a$, the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with a velocity $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a))$. Thus, the tangent line to $f$ at this moment has the following vectorial equation\n$$ \\begin{aligned} l\u0026amp;: (x,y,z)=(x(a),y(a),z(a))+t(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a)) =\\newline \u0026amp;= (x(a)+tx\u0026rsquo;(a),y(a)+ty\u0026rsquo;(a),z(a)+tz\u0026rsquo;(a)), \\end{aligned} $$\nand the Cartesian equations are $$\\frac{x-x(a)}{x\u0026rsquo;(a)}=\\frac{y-y(a)}{y\u0026rsquo;(a)}=\\frac{z-z(a)}{z\u0026rsquo;(a)},$$ provided that $x\u0026rsquo;(a)\\neq 0$, $y\u0026rsquo;(a)\\neq 0$ and $z\u0026rsquo;(a)\\neq 0$.\nExample. 
Given the trajectory $f(t)=(\\cos t, \\sin t, t)$, $t\\in \\mathbb{R}$ in the real space, at the moment $t=\\pi/2$ the trajectory passes through the point\n$$f(\\pi/2)=(\\cos(\\pi/2),\\sin(\\pi/2),\\pi/2)=(0,1,\\pi/2),$$\nwith velocity\n$$\\mathbf{v}=f\u0026rsquo;(\\pi/2)=(-\\sin(\\pi/2),\\cos(\\pi/2), 1)=(-1,0,1),$$\nand the tangent line to the graph of $f$ at that moment is\n$$l:(x,y,z)=(0,1,\\pi/2)+t(-1,0,1) = (-t,1,t+\\pi/2).$$\nInteractive Example\nNormal plane to a trajectory in the space In the three-dimensional space $\\mathbb{R}^3$, the normal line to a trajectory is not unique. There are an infinite number of normal lines and all of them are in the normal plane.\nIf $f(t)=(x(t),y(t),z(t))$, $t\\in \\mathbb{R}$, is a trajectory in the real space $\\mathbb{R}^3$, then at the moment $a$, the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with a velocity $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a))$. Thus, using the velocity vector as normal vector, the normal plane to $f$ at this moment has the following vectorial equation\n$$ \\begin{aligned} \\Pi \u0026amp;: (x-x(a),y-y(a),z-z(a))\\cdot(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a)) = 0\\newline \u0026amp;\\Leftrightarrow x\u0026rsquo;(a)(x-x(a))+y\u0026rsquo;(a)(y-y(a))+z\u0026rsquo;(a)(z-z(a))=0. \\end{aligned} $$\nExample. For the trajectory of the previous example $f(t)=(\\cos t, \\sin t, t)$, $t\\in \\mathbb{R}$, at the moment $t=\\pi/2$ the trajectory passes through the point\n$$f(\\pi/2)=(\\cos(\\pi/2),\\sin(\\pi/2),\\pi/2)=(0,1,\\pi/2),$$\nwith velocity\n$$\\mathbf{v}=f\u0026rsquo;(\\pi/2)=(-\\sin(\\pi/2),\\cos(\\pi/2), 1)=(-1,0,1),$$ and the normal plane to the graph of $f$ at that moment is\n$$\\Pi:\\left(x-0,y-1,z-\\frac{\\pi}{2}\\right)\\cdot(-1,0,1) =0 \\Leftrightarrow -x+z-\\frac{\\pi}{2}=0.$$\nInteractive Example\nFunctions of several variables A lot of problems in Geometry, Physics, Chemistry, Biology, etc. 
involve a variable that depends on two or more variables:\nThe area of a triangle depends on two variables that are the base and height lengths. The volume of a perfect gas depends on two variables that are the pressure and the temperature. The distance travelled by an object in free fall depends on a lot of variables: the time, the area of the cross section of the object, the latitude and longitude of the object, the height above the sea level, the air pressure, the air temperature, the speed of wind, etc. These dependencies are expressed with functions of several variables.\nDefinition - Functions of several real variables. A function of $n$ real variables or a scalar field from a set $A_1\\times \\cdots \\times A_n\\subseteq \\mathbb{R}^n$ to a set $B\\subseteq \\mathbb{R}$, is a relation that maps any tuple $(a_1,\\ldots,a_n)\\in A_1\\times \\cdots\\times A_n$ into a unique element of $B$, denoted by $f(a_1,\\ldots,a_n)$, that is known as the image of $(a_1,\\ldots,a_n)$ by $f$.\n$$ \\begin{array}{lccc} f: \u0026amp; A_1\\times\\cdots\\times A_n \u0026amp; \\longrightarrow \u0026amp; B\\newline \u0026amp;(a_1,\\ldots,a_n) \u0026amp; \\longrightarrow \u0026amp; f(a_1,\\ldots,a_n) \\end{array} $$\nThe area of a triangle is a real function of two real variables $$f(x,y)=\\frac{xy}{2}.$$\nThe volume of a perfect gas is a real function of two real variables $$v=f(t,p)=\\frac{nRt}{p},\\quad \\mbox{with $n$ and $R$ constants.}$$\nGraph of a function of two variables The graph of a function of two variables $f(x,y)$ is a surface in the real space $\\mathbb{R}^3$ where every point of the surface has coordinates $(x,y,z)$, with $z=f(x,y)$.\nExample. 
The function $f(x,y)=\\dfrac{xy}{2}$ that measures the area of a triangle of base $x$ and height $y$ has the graph below.\nThe function $\\displaystyle f(x,y)=\\frac{\\sin(x^2+y^2)}{\\sqrt{x^2+y^2}}$ has the peculiar graph below.\nLevel set of a scalar field Definition - Level set. Given a scalar field $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$, the level set $c$ of $f$ is the set\n$$C_{f,c}=\\{(x_1,\\ldots,x_n): f(x_1,\\ldots,x_n)=c\\},$$\nthat is, the set where the function takes on the constant value $c$.\nExample. Given the scalar field $f(x,y)=x^2+y^2$ and the point $P=(1,1)$, the level set of $f$ that includes $P$ is\n$$C_{f,2} = \\{(x,y): f(x,y)=f(1,1)=2\\} = \\{(x,y): x^2+y^2=2\\},$$\nthat is the circle of radius $\\sqrt{2}$ centred at the origin.\nLevel sets are common in applications like topographic maps, where the level curves correspond to points with the same height above the sea level,\nand weather maps (isobars), where level curves correspond to points with the same atmospheric pressure.\nPartial functions Definition - Partial function. Given a scalar field $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$, an $i$-th partial function of $f$ is any function $f_i:\\mathbb{R}\\rightarrow \\mathbb{R}$ that results from replacing all the variables of $f$ with constants, except the $i$-th variable, that is:\n$$f_i(x)=f(c_1,\\ldots,c_{i-1},x,c_{i+1},\\ldots,c_{n}),$$\nwith $c_j$ $(j=1,\\ldots, n,\\ j\\neq i)$ constants.\nExample. 
If we take the function that measures the area of a triangle\n$$f(x,y)=\\frac{xy}{2},$$ and set the value of the base to $x=c$, then the area of the triangle depends only on the height, and $f$ becomes a function of one variable, that is the partial function\n$$f_2(y)=f(c,y)=\\frac{cy}{2},\\quad \\mbox{with $c$ constant}.$$\nPartial derivative notion Variation of a function with respect to a variable We can measure the variation of a scalar field with respect to each of its variables in the same way that we measured the variation of a one-variable function.\nLet $z=f(x,y)$ be a scalar field of $\\mathbb{R}^2$. If we are at point $(x_0,y_0)$ and we increase the value of $x$ by a quantity $\\Delta x$, then we move in the direction of the $x$-axis from the point $(x_0,y_0)$ to the point $(x_0+\\Delta x,y_0)$, and the variation of the function is $$\\Delta z=f(x_0+\\Delta x,y_0)-f (x_0,y_0).$$\nThus, the rate of change of the function with respect to $x$ along the interval $[x_0,x_0+\\Delta x]$ is given by the quotient\n$$\\frac{\\Delta z}{\\Delta x}=\\frac{f(x_0+\\Delta x,y_0)-f(x_0,y_0)}{\\Delta x}.$$\nInstantaneous rate of change of a scalar field with respect to a variable If instead of measuring the rate of change in an interval, we measure the rate of change at a point, that is, when $\\Delta x$ approaches 0, then we get the instantaneous rate of change, that is the partial derivative with respect to $x$.\n$$\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta z}{\\Delta x}=\\lim_{\\Delta x \\rightarrow 0}\\frac{f(x_0+\\Delta x,y_0)-f(x_0,y_0)}{\\Delta x}.$$\nThe value of this limit, if it exists, is known as the partial derivative of $f$ with respect to the variable $x$ at the point $(x_0,y_0)$; it is written as $$\\frac{\\partial f}{\\partial x}(x_0,y_0).$$\nThis partial derivative measures the instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ when $P$ moves in the $x$-axis direction.\nGeometric interpretation of partial derivatives Geometrically, a 
two-variable function $z=f(x,y)$ defines a surface. If we cut this surface with a plane of equation $y=y_0$ (that is, the plane where $y$ is the constant $y_0$) the intersection is a curve, and the partial derivative of $f$ with respect to $x$ at $(x_0,y_0)$ is the slope of the tangent line to that curve at $x=x_0$.\nInteractive Example\nPartial derivative The concept of partial derivative can be extended easily from two-variable functions to $n$-variable functions.\nDefinition - Partial derivative. Given an $n$-variable function $f(x_1,\\ldots,x_n)$, $f$ is partially differentiable with respect to the variable $x_i$ at the point $a=(a_1,\\ldots,a_n)$ if the following limit exists\n$$\\lim_{\\Delta x_i\\rightarrow 0} \\frac{f(a_1,\\ldots,a_{i-1},a_i+\\Delta x_i,a_{i+1},\\ldots,a_n)-f(a_1,\\ldots,a_{i-1},a_i,a_{i+1},\\ldots,a_n)} {\\Delta x_i}.$$\nIn such a case, the value of the limit is known as the partial derivative of $f$ with respect to $x_i$ at $a$; it is denoted\n$$f\u0026rsquo;_{x_i}(a)=\\frac{\\partial f}{\\partial x_i}(a).$$\nRemark. The definition of derivative for one-variable functions is a particular case of this definition for $n=1$.\nPartial derivatives computation When we measure the variation of $f$ with respect to a variable $x_i$ at the point $a=(a_1,\\ldots,a_n)$, the other variables remain constant. Thus, if we consider the $i$-th partial function $$f_i(x_i)=f(a_1,\\ldots,a_{i-1},x_i,a_{i+1},\\ldots,a_n),$$\nthe partial derivative of $f$ with respect to $x_i$ can be computed by differentiating this function:\n$$\\frac{\\partial f}{\\partial x_i}(a)=f_i\u0026rsquo;(a_i).$$\nTo differentiate partially $f(x_1,\\ldots,x_n)$ with respect to the variable $x_i$, you have to differentiate $f$ as a function of the variable $x_i$, considering the other variables as constants. Example of a perfect gas. 
Consider the function that measures the volume of a perfect gas $$v(t,p)=\\frac{nRt}{p},$$ where $t$ is the temperature, $p$ the pressure and $n$ and $R$ are constants.\nThe instantaneous rate of change of the volume with respect to the pressure is the partial derivative of $v$ with respect to $p$. To compute this derivative we have to think of $t$ as a constant and differentiate $v$ as if the only variable were $p$:\n$$\\frac{\\partial v}{\\partial p}(t,p)=\\frac{d}{dp}\\left(\\frac{nRt}{p}\\right)_{\\mbox{$t=$cst}}=\\frac{-nRt}{p^2}.$$\nIn the same way, the instantaneous rate of change of the volume with respect to the temperature is the partial derivative of $v$ with respect to $t$:\n$$\\frac{\\partial v}{\\partial t}(t,p)=\\frac{d}{dt}\\left(\\frac{nRt}{p}\\right)_{\\mbox{$p=$cst}}=\\frac{nR}{p}.$$\nGradient Definition - Gradient. Given a scalar field $f(x_1,\\ldots,x_n)$, the gradient of $f$, denoted by $\\nabla f$, is a function that maps every point $a=(a_1,\\ldots,a_n)$ to the vector whose coordinates are the partial derivatives of $f$ at $a$,\n$$\\nabla f(a)=\\left(\\frac{\\partial f}{\\partial x_1}(a),\\ldots,\\frac{\\partial f}{\\partial x_n}(a)\\right).$$\nLater we will show that the gradient at a point is a vector with the magnitude and direction of the maximum rate of change of the function at that point. Thus, $\\nabla f(a)$ points in the direction of maximum increase of $f$ at $a$, while $-\\nabla f(a)$ points in the direction of maximum decrease of $f$ at $a$. Example. After heating a surface, the temperature $t$ (in $^\\circ$C) at each point $(x,y,z)$ (in m) of the surface is given by the function\n$$t(x,y,z)=\\frac{x}{y}+z^2.$$\nIn what direction will the temperature increase fastest at the point $(2,1,1)$ of the surface? 
What magnitude will the maximum increase of temperature have?\nThe direction of maximum increase of the temperature is given by the gradient\n$$\\nabla t(x,y,z)=\\left(\\frac{\\partial t}{\\partial x}(x,y,z),\\frac{\\partial t}{\\partial y}(x,y,z),\\frac{\\partial t}{\\partial z}(x,y,z)\\right)=\\left(\\frac{1}{y},\\frac{-x}{y^2},2z\\right).$$\nAt the point $(2,1,1)$ the direction is given by the vector\n$$\\nabla t(2,1,1)=\\left(\\frac{1}{1},\\frac{-2}{1^2},2\\cdot 1\\right)=(1,-2,2),$$\nand its magnitude is\n$$|\\nabla t(2,1,1)|=\\sqrt{1^2+(-2)^2+2^2}=\\sqrt{9}=3 \\mbox{ $^\\circ$C/m}.$$\nComposition of a vectorial field with a scalar field Multivariate chain rule If $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$ is a scalar field and $g:\\mathbb{R}\\rightarrow \\mathbb{R}^n$ is a vectorial function, then it is possible to compose $g$ with $f$, so that $f\\circ g:\\mathbb{R}\\rightarrow \\mathbb{R}$ is a one-variable function.\nTheorem - Chain rule. If $g(t)=(x_1(t),\\ldots,x_n(t))$ is a vectorial function differentiable at $t$ and $f(x_1,\\ldots,x_n)$ is a scalar field differentiable at the point $g(t)$, then $f\\circ g(t)$ is differentiable at $t$ and\n$$(f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t)=\\frac{\\partial f}{\\partial x_1}\\frac{dx_1}{dt}+ \\cdots + \\frac{\\partial f}{\\partial x_n}\\frac{dx_n}{dt}$$\nExample. Let us consider the scalar field $f(x,y)=x^2y$ and the vectorial function $g(t)=(\\cos t,\\sin t)$ $t\\in [0,2\\pi]$ in the real plane, then\n$$\\nabla f(x,y) = (2xy, x^2) \\quad \\mbox{and} \\quad g\u0026rsquo;(t) = (-\\sin t, \\cos t),$$\nand\n$$ \\begin{aligned} (f\\circ g)\u0026rsquo;(t) \u0026amp;= \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = (2\\cos t\\sin t,\\cos^2 t)\\cdot (-\\sin t,\\cos t) =\\newline \u0026amp;= -2\\cos t\\sin^2 t+\\cos^3 t. 
\\end{aligned} $$\nWe can get the same result differentiating the composed function directly\n$$(f\\circ g)(t) = f(g(t)) = f(\\cos t, \\sin t) = \\cos^2 t\\sin t,$$\nand its derivative is\n$$(f\\circ g)\u0026rsquo;(t) = 2\\cos t(-\\sin t)\\sin t+\\cos^2 t \\cos t = -2\\cos t\\sin^2 t+\\cos^3 t.$$\nThe chain rule for the composition of a vectorial function with a scalar field allow us to get the algebra of derivatives for one-variable functions easily:\n$$ \\begin{aligned} (u+v)\u0026rsquo; \u0026amp;= u\u0026rsquo;+v\u0026rsquo;\\newline (uv)\u0026rsquo; \u0026amp;= u\u0026rsquo;v+uv\u0026rsquo;\\newline \\left(\\frac{u}{v}\\right)\u0026rsquo; \u0026amp;= \\frac{u\u0026rsquo;v-uv\u0026rsquo;}{v^2}\\newline (u\\circ v)\u0026rsquo; \u0026amp;= u\u0026rsquo;(v)v' \\end{aligned} $$\nTo infer the derivative of the sum of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x+y$ and the vectorial function $g(t)=(u(t),v(t))$. Applying the chain rule we get\n$$(u+v)\u0026rsquo;(t) = (f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = (1,1)\\cdot (u\u0026rsquo;,v\u0026rsquo;) = u\u0026rsquo;+v\u0026rsquo;.$$\nTo infer the derivative of the quotient of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x/y$ and the vectorial function $g(t)=(u(t),v(t))$.\n$$\\left(\\frac{u}{v}\\right)\u0026rsquo;(t) = (f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = \\left(\\frac{1}{v},-\\frac{u}{v^2}\\right)\\cdot (u\u0026rsquo;,v\u0026rsquo;) = \\frac{u\u0026rsquo;v-uv\u0026rsquo;}{v^2}.$$\nTangent plane and normal line to a surface Let $C$ be the level set of a scalar field $f$ that includes a point $P$. 
If $\\mathbf{v}$ is the velocity at $P$ of a trajectory following $C$, then\n$$\\nabla f(P) \\cdot \\mathbf{v} = 0.$$\nProof If we take the trajectory $g(t)$ that follows the level set $C$ and passes through $P$ at time $t=t_0$, that is $P=g(t_0)$, so $\\mathbf{v}=g\u0026rsquo;(t_0)$, then\n$$(f\\circ g)(t) = f(g(t)) = f(P),$$\nthat is constant at any $t$. Thus, applying the chain rule we have\n$$(f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = 0,$$\nand, particularly, at $t=t_0$, we have\n$$\\nabla f(P)\\cdot \\mathbf{v} = 0.$$\nThat means that the gradient of $f$ at $P$ is normal to $C$ at $P$, provided that the gradient is not zero.\nNormal and tangent line to a curve in the plane Normal line to a curve in the plane. According to the previous result, the normal line to a curve with equation $f(x,y)=0$ at the point $P=(x_0,y_0)$, has equation\n$$P+t\\nabla f(P) = (x_0,y_0)+t\\nabla f(x_0,y_0).$$\nExample. Given the scalar field $f(x,y)=x^2+y^2-25$, and the point $P=(3,4)$, the level set of $f$ that passes through $P$, that satisfies $f(x,y)=f(P)=0$, is the circle with radius 5 centred at the origin of coordinates. Thus, taking as a normal vector the gradient of $f$\n$$\\nabla f(x,y) = (2x,2y),$$\nwhich at the point $P=(3,4)$ is $\\nabla f(3,4) = (6,8)$, the normal line to the circle at $P$ is\n$$P+t\\nabla f(P) = (3,4)+t(6,8) = (3+6t,4+8t).$$\nOn the other hand, the tangent line to the circle at $P$ is\n$$((x,y)-P)\\cdot \\nabla f(P) = ((x,y)-(3,4))\\cdot (6,8) = (x-3,y-4)\\cdot(6,8) = 0 \\Leftrightarrow 6x+8y=50.$$\nNormal line and tangent plane to a surface in the space Normal line to a surface in the space. If we have a surface with equation $f(x,y,z)=0$, at the point $P=(x_0,y_0,z_0)$ the normal line has equation\n$$P+t\\nabla f(P) = (x_0,y_0,z_0)+t\\nabla f(x_0,y_0,z_0).$$\nExample. Given the scalar field $f(x,y,z)=x^2+y^2-z$, and the point $P=(1,1,2)$, the level set of $f$ that passes through $P$, that satisfies $f(x,y,z)=f(P)=0$, is the paraboloid $z=x^2+y^2$. 
Thus, taking as a normal vector the gradient of $f$\n$$\\nabla f(x,y,z) = (2x,2y,-1),$$\nat the point $P=(1,1,2)$ is $\\nabla f(1,1,2) = (2,2,-1)$, and the normal line to the paraboloid at $P$ is\n$$ \\begin{aligned} P+t\\nabla f(P)\u0026amp;= (1,1,2)+t\\nabla f(1,1,2) = (1,1,2)+t(2,2,-1)\\newline \u0026amp;= (1+2t,1+2t,2-t). \\end{aligned} $$\nOn the other hand, the tangent plane to the paraboloid at $P$ is\n$$\\begin{aligned} ((x,y,z)-P)\\cdot \\nabla f(P) \u0026amp;= ((x,y,z)-(1,1,2))(2,2,-1) = (x-1,y-1,z-2)(2,2,-1)=\\newline \u0026amp;= 2(x-1)+2(y-1)-(z-2) = 2x+2y-z-2= 0. \\end{aligned}$$\nThe graph of the paraboloid $f(x,y,z)=x^2+y^2-z=0$ and the normal line and the tangent plane to the graph of $f$ at the point $P=(1,1,2)$ are below.\nInteractive Example\nDirectional derivative For a scalar field $f(x,y)$, we have seen that the partial derivative $\\dfrac{\\partial f}{\\partial x}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $x$ at point $P=(x_0,y_0)$, that is, when we move along the $x$-axis.\nIn the same way, $\\dfrac{\\partial f}{\\partial y}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $y$ at the point $P=(x_0,y_0)$, that is, when we move along the $y$-axis.\nBut, what happens if we move along any other direction?\nThe instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ along the direction of a unitary vector $u$ is known as directional derivative.\nDefinition - Directional derivative. Given a scalar field $f$ of $\\mathbb{R}^n$, a point $P$ and a unitary vector $\\mathbf{u}$ in that space, we say that $f$ is differentiable at $P$ along the direction of $\\mathbf{u}$ if exists the limit\n$$f^\\prime_{\\mathbf{u}}(P) = \\lim_{h\\rightarrow 0}\\frac{f(P+h\\mathbf{u})-f(P)}{h}.$$\nIn such a case, the value of the limit is known as directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$.\nTheorem - Directional derivative . 
Given a scalar field $f$ of $\\mathbb{R}^n$, a point $P$ and a unitary vector $\\mathbf{u}$ in that space, the directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$ can be computed as the dot product of the gradient of $f$ at $P$ and the unitary vector $\\mathbf{u}$:\n$$f^\\prime_{\\mathbf{u}}(P) = \\nabla f(P)\\cdot \\mathbf{u}.$$\nProof If we consider a unitary vector $\\mathbf{u}$, the trajectory that passes through $P$, following the direction of $\\mathbf{u}$, has equation\n$$g(t)=P+t\\mathbf{u},\\ t\\in\\mathbb{R}.$$\nFor $t=0$, this trajectory passes through the point $P=g(0)$ with velocity $\\mathbf{u}=g\u0026rsquo;(0)$.\nThus, the directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$ is\n$$(f\\circ g)\u0026rsquo;(0) = \\nabla f(g(0))\\cdot g\u0026rsquo;(0) = \\nabla f(P)\\cdot \\mathbf{u}.$$\nThe partial derivatives are the directional derivatives along the vectors of the canonical basis. Example. Given the function $f(x,y) = x^2+y^2$, its gradient is\n$$\\nabla f(x,y) = (2x,2y).$$\nThe directional derivative of $f$ at the point $P=(1,1)$, along the unit vector $\\mathbf{u}=(1/\\sqrt{2},1/\\sqrt{2})$ is\n$$f_{\\mathbf{u}}\u0026rsquo;(P) = \\nabla f(P)\\cdot \\mathbf{u} = (2,2)\\cdot(1/\\sqrt{2},1/\\sqrt{2}) = \\frac{2}{\\sqrt{2}}+\\frac{2}{\\sqrt{2}} = \\frac{4}{\\sqrt{2}}.$$\nTo compute the directional derivative along a non-unitary vector $\\mathbf{v}$, we have to use the unitary vector that results from normalizing $v$ with the transformation\n$$\\mathbf{v\u0026rsquo;}=\\frac{\\mathbf{v}}{|\\mathbf{v}|}.$$\nGeometric interpretation of the directional derivative Geometrically, a two-variable function $z=f(x,y)$ defines a surface. 
If we cut this surface with a plane of equation $a(y-y_0)=b(x-x_0)$ (that is, the vertical plane that passes through the point $P=(x_0,y_0)$ with the direction of vector $\\mathbf{u}=(a,b)$) the intersection is a curve, and the directional derivative of $f$ at $P$ along the direction of $\\mathbf{u}$ is the slope of the tangent line to that curve at point $P$.\nInteractive Example\nGrowth of scalar field along the gradient We have seen that for any unit vector $\\mathbf{u}$\n$$f^\\prime_{\\mathbf{u}}(P) = \\nabla f(P)\\cdot \\mathbf{u} = |\\nabla f(P)|\\cos \\theta,$$\nwhere $\\theta$ is the angle between $\\mathbf{u}$ and the gradient $\\nabla f(P)$.\nTaking into account that $-1\\leq \\cos\\theta\\leq 1$, for any unit vector $\\mathbf{u}$ it is satisfied that\n$$-|\\nabla f(P)|\\leq f\u0026rsquo;_{\\mathbf{u}}(P)\\leq |\\nabla f(P)| .$$\nFurthermore, if $\\mathbf{u}$ has the same direction and sense as the gradient, we have $f\u0026rsquo;_{\\mathbf{u}}(P)=\\vert\\nabla f(P)\\vert\\cos 0=\\vert\\nabla f(P)\\vert$. Therefore, the maximum increase of a scalar field at a point $P$ is along the direction of the gradient at that point.\nIn the same manner, if $\\mathbf{u}$ has the same direction as the gradient but the opposite sense, we have $f_{\\mathbf{u}}\u0026rsquo;(P)=\\vert\\nabla f(P)\\vert\\cos \\pi=-\\vert\\nabla f(P)\\vert$. 
Therefore, the maximum decrease of a scalar field at a point $P$ is along the opposite direction of the gradient at that point.\nImplicit derivation When we have a relation $f(x,y)=0$, sometimes we can consider $y$ as an implicit function of $x$, at least in a neighbourhood of a point $(x_0,y_0)$.\nThe equation $x^2+y^2=25$, whose graph is the circle of radius 5 centred at the origin of coordinates, is not a function, because if we solve the equation for $y$, we have two images for some values of $x$,\n$$y=\\pm \\sqrt{25-x^2}$$\nHowever, near the point $(3,4)$ we can represent the relation as the function $y=\\sqrt{25-x^2}$, and near the point $(3,-4)$ we can represent the relation as the function $y=-\\sqrt{25-x^2}$.\nIf an equation $f(x,y)=0$ defines $y$ as an implicit function of $x$, $y=h(x)$, in a neighbourhood of $(x_0,y_0)$, then we can compute the derivative of $y$, $h\u0026rsquo;(x)$, even if we do not know the explicit formula for $h$.\nTheorem - Implicit derivation. Let $f(x,y):\\mathbb{R}^2\\longrightarrow \\mathbb{R}$ be a two-variable function and let $(x_0,y_0)$ be a point in $\\mathbb{R}^2$ such that $f(x_0,y_0)=0$. If $f$ has continuous partial derivatives at $(x_0,y_0)$ and $\\frac{\\partial f}{\\partial y}(x_0,y_0)\\neq 0$, then there is an open interval $I\\subset \\mathbb{R}$ with $x_0\\in I$ and a function $h(x): I\\longrightarrow \\mathbb{R}$ such that\n$y_0=h(x_0)$. $f(x,h(x))=0$ for all $x\\in I$. $h$ is differentiable on $I$, and $y\u0026rsquo;=h\u0026rsquo;(x)=\\frac{-\\dfrac{\\partial f}{\\partial x}}{\\dfrac{\\partial f}{\\partial y}}$ Proof. To prove the last result, take the trajectory $g(x)=(x,h(x))$ on the interval $I$. 
Then\n$$(f\\circ g)(x) = f(g(x)) = f(x,h(x))=0.$$\nThus, using the chain rule we have\n$$ \\begin{aligned} (f\\circ g)\u0026rsquo;(x) \u0026amp;= \\nabla f(g(x))\\cdot g\u0026rsquo;(x) = \\left(\\frac{\\partial f}{\\partial x}, \\frac{\\partial f}{\\partial y}\\right)\\cdot (1,h\u0026rsquo;(x)) = \\newline \u0026amp;= \\frac{\\partial f}{\\partial x}+\\frac{\\partial f}{\\partial y}h\u0026rsquo;(x) = 0, \\end{aligned} $$\nfrom where we can deduce\n$$y\u0026rsquo;=h\u0026rsquo;(x)=\\frac{-\\dfrac{\\partial f}{\\partial x}}{\\dfrac{\\partial f}{\\partial y}}.$$\nThis technique, which allows us to compute $y\u0026rsquo;$ in a neighbourhood of $x_0$ without the explicit formula of $y=h(x)$, is known as implicit derivation.\nExample. Consider the equation of the circle of radius 5 centred at the origin $x^2+y^2=25$. It can also be written as\n$$f(x,y) = x^2+y^2-25 = 0.$$ Take the point $(3,4)$ that satisfies the equation, $f(3,4)=0$.\nAs $f$ has partial derivatives $\\frac{\\partial f}{\\partial x}=2x$ and $\\frac{\\partial f}{\\partial y}=2y$, that are continuous at $(3,4)$, and $\\frac{\\partial f}{\\partial y}(3,4)=8\\neq 0$, then $y$ can be expressed as a function of $x$ in a neighbourhood of $(3,4)$ and its derivative is\n$$y\u0026rsquo;=\\frac{-\\frac{\\partial f}{\\partial x}}{\\frac{\\partial f}{\\partial y}} = \\frac{-2x}{2y}=\\frac{-x}{y} \\quad \\mbox{and} \\quad y\u0026rsquo;(3)=\\frac{-3}{4}.$$\nIn this particular case, where we know the explicit formula $y=\\sqrt{25-x^2}$, we can get the same result computing the derivative as usual\n$$y\u0026rsquo; = \\frac{1}{2\\sqrt{25-x^2}}(-2x) = \\frac{-x}{\\sqrt{25-x^2}},$$\nso that $y\u0026rsquo;(3)=\\frac{-3}{\\sqrt{16}}=\\frac{-3}{4}$.\nThe implicit function theorem can be generalized to functions with several variables.\nTheorem - Implicit derivation. Let $f(x_1,\\ldots,x_n,y):\\mathbb{R}^{n+1}\\longrightarrow \\mathbb{R}$ be an $(n+1)$-variable function and let $(a_1,\\ldots, a_n,b)$ be a point in $\\mathbb{R}^{n+1}$ such that $f(a_1,\\ldots,a_n,b)=0$. 
If $f$ has continuous partial derivatives at $(a_1,\\ldots,a_n,b)$ and $\\frac{\\partial f}{\\partial y}(a_1,\\ldots,a_n,b)\\neq 0$, then there is a region $I\\subset \\mathbb{R}^n$ with $(a_1,\\ldots,a_n)\\in I$ and a function $h(x_1,\\ldots, x_n): I\\longrightarrow \\mathbb{R}$ such that\n$b=h(a_1,\\ldots,a_n)$. $f(x_1,\\ldots,x_n,h(x_1,\\ldots,x_n))=0$ for all $(x_1,\\ldots,x_n)\\in I$. $h$ is differentiable on $I$, and $\\dfrac{\\partial y}{\\partial x_i}=\\frac{-\\dfrac{\\partial f}{\\partial x_i}}{\\dfrac{\\partial f}{\\partial y}}$ Second order partial derivatives As the partial derivatives of a function are also functions of several variables we can differentiate partially each of them.\nIf a function $f(x_1,\\ldots,x_n)$ has a partial derivative $f^\\prime_{x_i}(x_1,\\ldots,x_n)$ with respect to the variable $x_i$ in a set $A$, then we can differentiate partially again $f_{x_i}^\\prime$ with respect to the variable $x_j$. This second derivative, when it exists, is known as the second order partial derivative of $f$ with respect to the variables $x_i$ and $x_j$; it is written as\n$$\\frac{\\partial ^2 f}{\\partial x_j \\partial x_i}= \\frac{\\partial}{\\partial x_j}\\left(\\frac{\\partial f}{\\partial x_i}\\right).$$\nIn the same way we can define higher order partial derivatives.\nExample. 
The two-variables function $$f(x,y)=x^y$$ has 4 second order partial derivatives:\n$$ \\begin{aligned} \\frac{\\partial^2 f}{\\partial x^2}(x,y) \u0026amp;= \\frac{\\partial}{\\partial x}\\left(\\frac{\\partial f}{\\partial x}(x,y)\\right) = \\frac{\\partial}{\\partial x}\\left(yx^{y-1}\\right) = y(y-1)x^{y-2},\\newline \\frac{\\partial^2 f}{\\partial y \\partial x}(x,y) \u0026amp;= \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial f}{\\partial x}(x,y)\\right) = \\frac{\\partial}{\\partial y}\\left(yx^{y-1}\\right) = x^{y-1}+yx^{y-1}\\log x,\\newline \\frac{\\partial^2 f}{\\partial x \\partial y}(x,y) \u0026amp;= \\frac{\\partial}{\\partial x}\\left(\\frac{\\partial f}{\\partial y}(x,y)\\right) = \\frac{\\partial}{\\partial x}\\left(x^y\\log x \\right) = yx^{y-1}\\log x+x^y\\frac{1}{x},\\newline \\frac{\\partial^2 f}{\\partial y^2}(x,y) \u0026amp;= \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial f}{\\partial y}(x,y)\\right) = \\frac{\\partial}{\\partial y}\\left(x^y\\log x \\right) = x^y(\\log x)^2. \\end{aligned} $$\nHessian matrix and Hessian Definition - Hessian matrix. 
Given a scalar field $f(x_1,\\ldots,x_n)$, with second order partial derivatives at the point $a=(a_1,\\ldots,a_n)$, the Hessian matrix of $f$ at $a$, denoted by $\\nabla^2f(a)$, is the matrix\n$$ \\nabla^2f(a)=\\left( \\begin{array}{cccc} \\dfrac{\\partial^2 f}{\\partial x_1^2}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_1 \\partial x_2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_1 \\partial x_n}(a)\\newline \\dfrac{\\partial^2 f}{\\partial x_2 \\partial x_1}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_2^2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_2 \\partial x_n}(a)\\newline \\vdots \u0026amp; \\vdots \u0026amp; \\ddots \u0026amp; \\vdots \\newline \\dfrac{\\partial^2 f}{\\partial x_n \\partial x_1}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_n \\partial x_2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_n^2}(a) \\end{array} \\right) $$\nThe determinant of this matrix is known as Hessian of $f$ at $a$; it is denoted $Hf(a)=\\vert\\nabla^2f(a)\\vert$.\nExample. Consider again the two-variables function\n$$f(x,y)=x^y.$$\nIts Hessian matrix is\n$$ \\nabla^2f(x,y) = \\left( \\begin{array}{cc} \\dfrac{\\partial^2 f}{\\partial x^2} \u0026amp; \\dfrac{\\partial^2 f}{\\partial x \\partial y}\\newline \\dfrac{\\partial^2 f}{\\partial y \\partial x} \u0026amp; \\dfrac{\\partial^2 f}{\\partial y^2} \\end{array} \\right) = \\left(\\begin{array}{cc} y(y-1)x^{y-2} \u0026amp; x^{y-1}(y\\log x+1) \\newline x^{y-1}(y\\log x+1) \u0026amp; x^y(\\log x)^2 \\end{array} \\right). $$\nAt point $(1,2)$ is\n$$ \\nabla^2 f(1,2) = \\left( \\begin{array}{cc} 2(2-1)1^{2-2} \u0026amp; 1^{2-1}(2\\log 1+1) \\newline 1^{2-1}(2\\log 1+1) \u0026amp; 1^2(\\log 1)^2 \\end{array} \\right) = \\left( \\begin{array}{cc} 2 \u0026amp; 1 \\newline 1 \u0026amp; 0 \\end{array} \\right). 
$$\nAnd its Hessian is\n$$ Hf(1,2)=\\left| \\begin{array}{cc} 2 \u0026amp; 1 \\newline 1 \u0026amp; 0 \\end{array} \\right|= 2\\cdot 0-1\\cdot1= -1. $$\nSymmetry of second partial derivatives In the previous example we can observe that the mixed derivatives of second order $\\frac{\\partial^2 f}{\\partial y\\partial x}$ and $\\frac{\\partial^2 f}{\\partial x\\partial y}$ are the same. This fact is due to the following result.\nTheorem - Symmetry of second partial derivatives. If $f(x_1,\\ldots,x_n)$ is a scalar field with second order partial derivatives $\\frac{\\partial^2 f}{\\partial x_i\\partial x_j}$ and $\\frac{\\partial^2 f}{\\partial x_j\\partial x_i}$ continuous at a point $(a_1,\\ldots,a_n)$, then\n$$\\frac{\\partial^2 f}{\\partial x_i\\partial x_j}(a_1,\\ldots,a_n)=\\frac{\\partial^2 f}{\\partial x_j\\partial x_i}(a_1,\\ldots,a_n).$$\nThis means that when computing a mixed second order partial derivative the order of differentiation does not matter.\nAs a consequence, if the function satisfies the requirements of the theorem for all the second order partial derivatives, the Hessian matrix is symmetric.\nTaylor polynomials Linear approximation of a scalar field In a previous chapter we saw how to approximate a one-variable function with a Taylor polynomial. This can be generalized to functions of several variables.\nIf $P$ is a point in the domain of a scalar field $f$ and $\\mathbf{v}$ is a vector, the first degree Taylor formula of $f$ around $P$ is\n$$f(P+\\mathbf{v}) = f(P) + \\nabla f(P)\\cdot \\mathbf{v} +R^1_{f,P}(\\mathbf{v}),$$\nwhere\n$$P^1_{f,P}(\\mathbf{v}) = f(P)+\\nabla f(P)\\cdot\\mathbf{v}$$\nis the first degree Taylor polynomial of $f$ at $P$, and $R^1_{f,P}(\\mathbf{v})$ is the Taylor remainder for the vector $\\mathbf{v}$, that is, the error in the approximation.\nThe remainder satisfies\n$$\\lim_{|\\mathbf{v}|\\rightarrow 0} \\frac{R^1_{f,P}(\\mathbf{v})}{|\\mathbf{v}|} = 0.$$\nFor a function of two variables, the graph of the first degree Taylor polynomial is the tangent plane to the graph of $f$ at $P$. 
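Example. A quick check of the remainder condition on a simple field chosen here only for illustration, $f(x,y)=x^2+y^2$ at $P=(1,1)$: taking $\\mathbf{v}=(v_1,v_2)$, we have $f(P)=2$ and $\\nabla f(P)=(2,2)$, so\n$$f(P+\\mathbf{v}) = (1+v_1)^2+(1+v_2)^2 = 2+2v_1+2v_2+v_1^2+v_2^2 = f(P)+\\nabla f(P)\\cdot \\mathbf{v}+|\\mathbf{v}|^2.$$\nThus, in this case $R^1_{f,P}(\\mathbf{v})=|\\mathbf{v}|^2$ and\n$$\\lim_{|\\mathbf{v}|\\rightarrow 0} \\frac{R^1_{f,P}(\\mathbf{v})}{|\\mathbf{v}|} = \\lim_{|\\mathbf{v}|\\rightarrow 0} |\\mathbf{v}| = 0,$$\nas the Taylor formula requires.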
Linear approximation of a two-variable function If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, as for any point $Q=(x,y)$ we can take the vector $\\mathbf{v}=\\vec{PQ}=(x-x_0,y-y_0)$, then the first degree Taylor polynomial of $f$ at $P$ can be written as\n$$ \\begin{aligned} P^1_{f,P}(x,y) \u0026amp;= f(x_0,y_0)+\\nabla f(x_0,y_0)\\cdot(x-x_0,y-y_0) =\\newline \u0026amp;= f(x_0,y_0)+\\frac{\\partial f}{\\partial x}(x_0,y_0)(x-x_0)+\\frac{\\partial f}{\\partial y}(x_0,y_0)(y-y_0). \\end{aligned} $$\nExample. Given the scalar field $f(x,y)=\\log(xy)$, its gradient is\n$$\\nabla f(x,y) = \\left(\\frac{1}{x},\\frac{1}{y}\\right),$$\nand the first degree Taylor polynomial at the point $P=(1,1)$ is\n$$\\begin{aligned} P^1_{f,P}(x,y) \u0026amp;= f(1,1) +\\nabla f(1,1)\\cdot (x-1,y-1) = \\newline \u0026amp;= \\log 1+(1,1)\\cdot(x-1,y-1) = x-1+y-1 = x+y-2. \\end{aligned}$$\nThis polynomial approximates $f$ near the point $P$. For instance,\n$$f(1.01,1.01) \\approx P^1_{f,P}(1.01,1.01) = 1.01+1.01-2 = 0.02.$$\nThe graphs of the scalar field $f(x,y)=\\log(xy)$ and the first degree Taylor polynomial of $f$ at the point $P=(1,1)$ are below.\nQuadratic approximation of a scalar field If $P$ is a point in the domain of a scalar field $f$ and $\\mathbf{v}$ is a vector, the second degree Taylor formula of $f$ around $P$ is\n$$f(P+\\mathbf{v}) = f(P) + \\nabla f(P)\\cdot \\mathbf{v} + \\frac{1}{2}\\left(\\mathbf{v}\\nabla^2f(P)\\mathbf{v}\\right) + R^2_{f,P}(\\mathbf{v}),$$\nwhere\n$$P^2_{f,P}(\\mathbf{v}) = f(P)+\\nabla f(P)\\cdot\\mathbf{v}+\\frac{1}{2}\\left(\\mathbf{v}\\nabla^2f(P)\\mathbf{v}\\right)$$\nis the second degree Taylor polynomial of $f$ at the point $P$, and $R^2_{f,P}(\\mathbf{v})$ is the Taylor remainder for the vector $\\mathbf{v}$, that is, the error in the approximation.\nThe remainder satisfies\n$$\\lim_{|\\mathbf{v}|\\rightarrow 0} \\frac{R^2_{f,P}(\\mathbf{v})}{|\\mathbf{v}|^2} = 0.$$\nThis means that the remainder tends to zero faster than the square of the 
norm of $\\mathbf{v}$.\nQuadratic approximation of a two-variable function If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, then the second degree Taylor polynomial of $f$ at $P$ can be written as\n$$ \\begin{aligned} P^2_{f,P}(x,y) \u0026amp;= f(x_0,y_0)+\\nabla f(x_0,y_0)\\cdot(x-x_0,y-y_0) + \\newline \u0026amp; + \\frac{1}{2}(x-x_0,y-y_0)\\nabla^2f(x_0,y_0)(x-x_0,y-y_0)= \\newline \u0026amp; = f(x_0,y_0)+\\frac{\\partial f}{\\partial x}(x_0,y_0)(x-x_0)+\\frac{\\partial f}{\\partial y}(x_0,y_0)(y-y_0)+ \\newline \u0026amp; + \\frac{1}{2}(\\frac{\\partial^2 f}{\\partial x^2}(x_0,y_0) (x-x_0)^2 + 2\\frac{\\partial^2 f}{\\partial y\\partial x}(x_0,y_0)(x-x_0)(y-y_0) + \\newline \u0026amp; + \\frac{\\partial^2 f}{\\partial y^2}(x_0,y_0)(y-y_0)^2) \\end{aligned} $$\nExample. Given the scalar field $f(x,y)=\\log(xy)$, its gradient is\n$$\\nabla f(x,y) = \\left(\\frac{1}{x},\\frac{1}{y}\\right),$$\nits Hessian matrix is\n$$\\nabla^2 f(x,y) = \\left( \\begin{array}{cc} \\frac{-1}{x^2} \u0026amp; 0\\newline 0 \u0026amp; \\frac{-1}{y^2} \\end{array} \\right)$$\nand the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ is\n$$\\begin{aligned} P^2_{f,P}(x,y) \u0026amp;= f(1,1) +\\nabla f(1,1)\\cdot (x-1,y-1) +\\newline \u0026amp;+ \\frac{1}{2}(x-1,y-1)\\nabla^2f(1,1)\\cdot(x-1,y-1)=\\newline \u0026amp;= \\log 1+(1,1)\\cdot(x-1,y-1) +\\newline \u0026amp;+ \\frac{1}{2}(x-1,y-1) \\left( \\begin{array}{cc} -1 \u0026amp; 0\\newline 0 \u0026amp; -1 \\end{array} \\right) \\left( \\begin{array}{c} x-1\\newline y-1 \\end{array} \\right) = \\newline \u0026amp;= x-1+y-1+\\frac{-x^2-y^2+2x+2y-2}{2} =\\newline \u0026amp;= \\frac{-x^2-y^2+4x+4y-6}{2}. \\end{aligned}$$\nThus, $$ \\begin{aligned} f(1.01,1.01) \\approx P^2_{f,P}(1.01,1.01) \u0026amp;= \\frac{-1.01^2-1.01^2+4\\cdot 1.01+4\\cdot 1.01-6}{2} \\newline \u0026amp;= 0.0199. 
\\end{aligned} $$\nThe graph of the scalar field $f(x,y)=\\log(xy)$ and the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ is shown below.\nInteractive Example\nRelative extrema Definition - Relative extrema. A scalar field $f$ in $\\mathbb{R}^n$ has a relative maximum at a point $P$ if there is a value $\\epsilon\u0026gt;0$ such that\n$$f(P)\\geq f(X)\\ \\forall X, |\\vec{PX}|\u0026lt;\\epsilon.$$\n$f$ has a relative minimum at $P$ if there is a value $\\epsilon\u0026gt;0$ such that\n$$f(P)\\leq f(X)\\ \\forall X, |\\vec{PX}|\u0026lt;\\epsilon.$$\nBoth relative maxima and minima are known as relative extrema of $f$.\nCritical points Theorem - Critical points. If a scalar field $f$ in $\\mathbb{R}^n$ has a relative maximum or minimum at a point $P$, then $P$ is a critical or stationary point of $f$, that is, a point where the gradient vanishes\n$$\\nabla f(P) = 0.$$\nProof Taking the trajectory that passes through $P$ with the direction of the gradient at that point $$g(t)=P+t\\nabla f(P),$$ the function $h=(f\\circ g)(t)$ does not decrease at $t=0$ since\n$$h\u0026rsquo;(0)= (f\\circ g)\u0026rsquo;(0) = \\nabla f(g(0))\\cdot g\u0026rsquo;(0) = \\nabla f(P)\\cdot \\nabla f(P) = |\\nabla f(P)|^2\\geq 0,$$\nand it only vanishes if $\\nabla f(P)=0$.\nThus, if $\\nabla f(P)\\neq 0$, $f$ cannot have a relative maximum at $P$ since following the trajectory of $g$ from $P$ there are points where $f$ has an image greater than the image at $P$. In the same way, following the trajectory of $g$ in the opposite direction there are points where $f$ has an image less than the image at $P$, so $f$ cannot have a relative minimum at $P$.\nExample. 
Given the scalar field $f(x,y)=x^2+y^2$, it is obvious that $f$ only has a relative minimum at $(0,0)$ since\n$$f(0,0)=0 \\leq f(x,y)=x^2+y^2,\\ \\forall x,y\\in \\mathbb{R}.$$\nIt is easy to check that $f$ has a critical point at $(0,0)$, that is $\\nabla f(0,0) = 0$.\nSaddle points Not all the critical points of a scalar field are points where the scalar field has relative extrema. If we take, for instance, the scalar field $f(x,y)=x^2-y^2$, its gradient is\n$$\\nabla f(x,y) = (2x,-2y),$$\nwhich only vanishes at $(0,0)$. However, this point is not a relative maximum since the points $(x,0)$ in the $x$-axis have images $f(x,0)=x^2\\geq 0=f(0,0)$, nor a relative minimum since the points $(0,y)$ in the $y$-axis have images $f(0,y)=-y^2\\leq 0=f(0,0)$. Critical points of this type, which are not relative extrema, are known as saddle points.\nAnalysis of the relative extrema From the second degree Taylor’s formula of a scalar field $f$ at a point $P$ we have\n$$f(P+\\mathbf{v})-f(P)\\approx \\nabla f(P)\\mathbf{v}+\\frac{1}{2}\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}.$$\nThus, if $P$ is a critical point of $f$, as $\\nabla f(P)=0$, we have\n$$f(P+\\mathbf{v})-f(P)\\approx \\frac{1}{2}\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}.$$\nTherefore, the sign of $f(P+\\mathbf{v})-f(P)$ is the sign of the second degree term $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}$.\nThere are four possibilities:\nPositive definite: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026gt;0$ $\\forall \\mathbf{v}\\neq 0$.\nNegative definite: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026lt;0$ $\\forall \\mathbf{v}\\neq 0$.\nIndefinite: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026gt;0$ for some $\\mathbf{v}\\neq 0$ and $\\nabla^2f(P)\\mathbf{u}\\cdot\\mathbf{u}\u0026lt;0$ for some $\\mathbf{u}\\neq 0$.\nSemidefinite: In any other case.\nThus, depending on the sign of $\\nabla^2 f(P)\\mathbf{v}\\cdot\\mathbf{v}$, we have\nTheorem. 
Given a critical point $P$ of a scalar field $f$, it holds that\nIf $\\nabla^2f(P)$ is positive definite then $f$ has a relative minimum at $P$. If $\\nabla^2f(P)$ is negative definite then $f$ has a relative maximum at $P$. If $\\nabla^2f(P)$ is indefinite then $f$ has a saddle point at $P$. When $\\nabla^2f(P)$ is semidefinite we cannot draw any conclusion and we need higher order partial derivatives to classify the critical point.\nAnalysis of the relative extrema of a scalar field in $\\mathbb{R}^2$ In the particular case of a scalar field of two variables, we have\nTheorem. Given a critical point $P=(x_0,y_0)$ of a scalar field $f(x,y)$, and denoting by $Hf(P)$ the determinant of the Hessian matrix $\\nabla^2f(P)$, it holds that\nIf $Hf(P)\u0026gt;0$ and $\\dfrac{\\partial^2 f}{\\partial x^2}(x_0,y_0)\u0026gt;0$ then $f$ has a relative minimum at $P$. If $Hf(P)\u0026gt;0$ and $\\dfrac{\\partial^2 f}{\\partial x^2}(x_0,y_0)\u0026lt;0$ then $f$ has a relative maximum at $P$. If $Hf(P)\u0026lt;0$ then $f$ has a saddle point at $P$. Example. Given the scalar field $f(x,y)=\\dfrac{x^3}{3}-\\dfrac{y^3}{3}-x+y$, its gradient is\n$$\\nabla f(x,y)= (x^2-1,-y^2+1),$$\nand it has critical points at $(1,1)$, $(1,-1)$, $(-1,1)$ and $(-1,-1)$.\nThe Hessian matrix is\n$$\\nabla^2f(x,y) = \\left( \\begin{array}{cc} 2x \u0026amp; 0\\newline 0 \u0026amp; -2y \\end{array} \\right)$$\nand the Hessian is\n$$Hf(x,y) = -4xy.$$\nThus, we have\nPoint $(1,1)$: $Hf(1,1)=-4\u0026lt;0 \\Rightarrow$ Saddle point.\nPoint $(1,-1)$: $Hf(1,-1)=4\u0026gt;0$ and $\\frac{\\partial^2 f}{\\partial x^2}(1,-1)=2\u0026gt;0 \\Rightarrow$ Relative min.\nPoint $(-1,1)$: $Hf(-1,1)=4\u0026gt;0$ and $\\frac{\\partial^2 f}{\\partial x^2}(-1,1)=-2\u0026lt;0 \\Rightarrow$ Relative max.\nPoint $(-1,-1)$: $Hf(-1,-1)=-4\u0026lt;0 \\Rightarrow$ Saddle point.\nThe graph of the function $f(x,y)=\\dfrac{x^3}{3}-\\dfrac{y^3}{3}-x+y$ and its relative extrema and saddle points are shown 
below.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1631523540,"objectID":"b82b8cd53b4f3645f020f7b3a65480d0","permalink":"/en/teaching/calculus/manual/derivatives-n-variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/derivatives-n-variables/","section":"teaching","summary":"Vector functions of a single real variable Definition - Vector function of a single real variable. A vector function of a single real variable or vector field of a scalar variable is a function that maps every scalar value $t\\in D\\subseteq \\mathbb{R}$ into a vector $(x_1(t),\\ldots,x_n(t))$ in $\\mathbb{R}^n$:","tags":["Partial Derivative","Gradient","Tangent Line","Normal Line","Tangent Plane","Normal Plane","Hessian Matrix","Extrema"],"title":"Several variables differentiable calculus","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Let $X$ be a discrete random variable with the following probability distribution\n$$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \\newline \\hline f(x) \u0026amp; 0.15 \u0026amp; 0.35 \u0026amp; 0.10 \u0026amp; 0.25 \u0026amp; 0.15 \\newline \\hline \\end{array} $$\nCalculate and represent graphically the distribution function. Calculate the following probabilities a. $P(X\u0026lt;7.5)$. b. $P(X\u0026gt;8)$. c. $P(4\\leq X\\leq 6.5)$. d. $P(5\u0026lt;X\u0026lt;6)$. Solution $$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;4$,}\\newline 0.15 \u0026amp; \\text{if $4\\leq x\u0026lt;5$,}\\newline 0.5 \u0026amp; \\text{if $5\\leq x\u0026lt;6$,}\\newline 0.6 \u0026amp; \\text{if $6\\leq x\u0026lt;7$,}\\newline 0.85 \u0026amp; \\text{if $7\\leq x\u0026lt;8$,}\\newline 1 \u0026amp; \\text{if $8\\leq x$.} \\end{cases} $$ $P(X\u0026lt;7.5)=0.85$, $P(X\u0026gt;8)=0$, $P(4\\leq x\\leq 6.5)=0.6$ and $P(5\u0026lt;X\u0026lt;6)=0$. 
Exercise 2 Let $X$ be a discrete random variable with the following distribution function\n$$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;1$,} \\newline 1/5 \u0026amp; \\text{if $1\\leq x\u0026lt; 4$,} \\newline 3/4 \u0026amp; \\text{if $4\\leq x\u0026lt;6$,} \\newline 1 \u0026amp; \\text{if $6\\leq x$.} \\end{cases} $$\nCalculate the probability function. Calculate the following probabilities a. $P(X=6)$. b. $P(X=5)$. c. $P(2\u0026lt;X\u0026lt;5.5)$. d. $P(0\\leq X\u0026lt;4)$. Calculate the mean. Calculate the standard deviation. Solution $$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 1 \u0026amp; 4 \u0026amp; 6 \\newline \\hline f(x) \u0026amp; 0.2 \u0026amp; 0.55 \u0026amp; 0.25 \\newline \\hline \\end{array} $$\n$P(X=6)= 0.25$, $P(X=5)=0$, $P(2\u0026lt;X\u0026lt;5.5)=0.55$ and $P(0\\leq X\u0026lt;4)=0.2$.\n$\\mu=3.9$.\n$\\sigma=1.6703$.\nExercise 3 An experiment consists of injecting a virus into three rats and checking if they survive or not. It is known that the probability of surviving is $0.5$ for the first rat, $0.4$ for the second and $0.3$ for the third.\nCalculate the probability function of the variable $X$ that measures the number of surviving rats. Calculate the distribution function. Calculate $P(X\\leq 1)$, $P(X\\geq 2)$ and $P(X=1.5)$. Calculate the mean and the standard deviation. Is the mean representative? Solution $$ \\begin{array}{|c|c|c|c|c|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \\newline \\hline f(x) \u0026amp; 0.21 \u0026amp; 0.44 \u0026amp; 0.29 \u0026amp; 0.06\\newline \\hline \\end{array} $$ 2.$$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;0$,}\\newline 0.21 \u0026amp; \\text{if $0\\leq x\u0026lt;1$,}\\newline 0.65 \u0026amp; \\text{if $1\\leq x\u0026lt;2$,}\\newline 0.94 \u0026amp; \\text{if $2\\leq x\u0026lt;3$,}\\newline 1 \u0026amp; \\text{if $3\\leq x$.} \\end{cases} $$\n$P(X\\leq 1)=0.65$, $P(X\\geq 2)=0.35$ and $P(X=1.5)=0$. 
$\\mu=1.2$ rats, $\\sigma^2=0.7$ rats$^2$ and $\\sigma=0.84$ rats. Exercise 4 The chance of being cured with a certain treatment is 0.85. If we apply the treatment to 6 patients,\nWhat is the probability that half of them get cured? What is the probability that at least 4 of them get cured? Solution Let $X$ be the number of cured patients,\n$P(X=3) = 0.0415$. $P(X\\geq 4)= 0.9527$. Exercise 5 Ten persons came into contact with a person infected with tuberculosis. The probability of being infected after contacting a person with tuberculosis is 0.1.\nWhat is the probability that nobody is infected? What is the probability that at least 2 persons are infected? What is the expected number of infected persons? Solution Let $X$ be the number of persons infected,\n$P(X=0) = 0.3487$. $P(X\\geq 2)= 0.2639$. $\\mu=1$. Exercise 6 The probability of suffering an adverse reaction to a vaccine is 0.001. If 2000 persons are vaccinated, what is the probability of suffering some adverse reaction?\nSolution Let $X$ be the number of adverse reactions, $P(X\\geq 1)=0.8648$. Exercise 7 The average number of calls per minute received by a telephone switchboard is 120.\nWhat is the probability of receiving less than 4 calls in 2 seconds? What is the probability of receiving at least 3 calls in 3 seconds? Solution Let $X$ be the number of calls in 2 seconds, $P(X\u0026lt;4)=0.4335$. Let $Y$ be the number of calls in 3 seconds, $P(Y\\geq 3)= 0.938$. Exercise 8 A test contains 10 questions with 3 possible options each. For every question you get a point if you give the right answer and lose half a point if the answer is wrong. A student knows the right answer for 3 of the 10 questions and answers the rest randomly. What is the probability of passing the exam?\nSolution Let $X$ be the number of correct answers in questions randomly answered, $P(X\\geq 4)=0.1733$. 
Exercise 9 It has been observed experimentally that 1 of every 20 trillion cells exposed to radiation mutates and becomes carcinogenic. We know that the human body has approximately 1 trillion cells per kilogram of tissue. Calculate the probability that a 60 kg person exposed to radiation develops cancer. If the radiation affects 3 persons weighing 60 kg, what is the probability that at least one of them develops cancer?\nSolution Let $X$ be the number of cells mutated, $P(X\u0026gt;0)=0.9502$.\nLet $Y$ be the number of persons developing cancer, $P(Y\\geq 1) = 0.9999$. Exercise 10 A diagnostic test for a disease returns 1% of positive outcomes, and the positive and negative predictive values are 0.95 and 0.98 respectively.\nCalculate the prevalence of the disease. Calculate the sensitivity and the specificity of the test. If the test is applied to 12 sick persons, what is the probability of getting at least one wrong diagnosis? If the test is applied to 12 persons, what is the probability of getting a right diagnosis for all of them? Solution $P(D)=0.0293$. Sensitivity $P(+\\vert D)=0.3242$ and specificity $P(-\\vert \\bar D)=0.9995$. Let $X$ be the number of wrong diagnoses in 12 sick persons, $P(X\\geq 1)=1$. Let $Y$ be the number of right diagnoses in 12 persons, $P(Y=12)=0.7818$. Exercise 11 In a study about a parasite that attacks the kidney of rats it is known that the average number of parasites per kidney is 3.\nCalculate the probability that a rat has more than 3 parasites. Calculate the probability of having at least 9 rats infected in a sample of 10 rats. Solution Let $X$ be the number of parasites in a rat, $P(X\u0026gt;3)=0.8488$. Let $Y$ be the number of rats with parasites in a sample of 10 rats, $P(Y\\geq 9)=0.9997$. Exercise 12 In a physiotherapy course there are 60% of females and 40% of males.\nIf 6 random students have to go to a hospital for practical training, what is the probability that more males than females go? 
In 5 samples of 6 students, what is the probability of having some sample without males? Solution Let $X$ be the number of females in a group of 6 students, $P(X\u0026lt;2)=0.1792$. Let $Y$ be the number of groups of 6 students without males in a sample of 5 groups, $P(Y\u0026gt;0) =0.2125$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a1636b6696a9581dbd7c28189956fd1d","permalink":"/en/teaching/statistics/problems/discrete_random_variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/discrete_random_variables/","section":"teaching","summary":"Exercise 1 Let $X$ be a discrete random variable with the following probability distribution\n$$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \\newline \\hline f(x) \u0026amp; 0.","tags":["Random Variables","Discrete Random Variables"],"title":"Problems of Discrete Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Given the continuous random variable $X$ with the following probability density function chart, Check that $f(x)$ is a probability density function. Calculate the following probabilities a. $P(X\u0026lt;1)$ b. $P(X\u0026gt;0)$ c. $P(X=1/4)$ d. $P(1/2\\leq X\\leq 3/2)$ Calculate the distribution function. Solution $P(X\u0026lt;1)=0.5$, $P(X\u0026gt;0)=1$, $P(X=1/4)=0$ and $P(1/2\\leq X\\leq 3/2)=0.875$. $$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;0$,} \\newline x^2/2 \u0026amp; \\text{if $0\\leq x\u0026lt; 1$,} \\newline x-0.5 \u0026amp; \\text{if $1\\leq x\u0026lt;1.5$,} \\newline 1 \u0026amp; \\text{if $1.5\\leq x$.} \\end{cases} $$\nExercise 2 A worker can arrive at the workplace at any moment between 6 and 7 in the morning with the same likelihood.\nCompute and plot the probability density function of the variable that measures the arrival time. 
compute and plot the distribution function. Compute the probability of arriving before quarter past six and after half past six. What is the expected arrival time? Solution $P(X\u0026lt;6.25)=0.25$ and $P(X\u0026gt;6.5)=0.5$. $\\mu=6.5$. Exercise 3 Let $Z$ be a random variable following a standard normal distribution model. Calculate the following probabilities using the table of the distribution function:\n$P(Z\u0026lt;1.24)$ $P(Z\u0026gt;-0.68)$ $P(-1.35\\leq Z\\leq 0.44)$ Solution $P(Z\u0026lt;1.24)=0.8925$. $P(Z\u0026gt;-0.68)=0.7517$. $P(-1.35\\leq Z\\leq 0.44)=0.5815$. Exercise 4 Let $Z$ be a random variable following a standard normal distribution model. Determine the value of $x$ in the following cases using the table of the distribution function:\n$P(Z\u0026lt;x)=0.6406$. $P(Z\u0026gt;x)=0.0606$. $P(0\\leq Z\\leq x)=0.4783$. $P(-1.5\\leq Z\\leq x)=0.2313$. $P(-x\\leq Z\\leq x)=0.5467$. Solution $x=0.3601$. $x=1.5498$. $x=2.0198$. $x=-0.5299$. $x=0.7499$. Exercise 5 Let $X$ be a random variable following a normal distribution model $N(10,2)$.\nCalculate $P(X\\leq 10)$. Calculate $P(8\\leq X\\leq 14)$. Calculate the interquartile range. Calculate the third decile. Solution $P(X\\leq 10)=0.5$. $P(8\\leq X\\leq 14)=0.8186$. $IQR=2.698$. $D_3=8.9512$. Exercise 6 It is known that the glucose level in blood of diabetic persons follows a normal distribution model with mean 106 mg/100 ml and standard deviation 8 mg/100 ml.\nCalculate the probability of a random diabetic person having a glucose level less than 120 mg/100 ml. What percentage of persons have a glucose level between 90 and 120 mg/100 ml? Calculate and interpret the first quartile of the glucose level. Solution $P(X\\leq 120)=0.9599$. $P(90\\leq X\\leq 120)=0.9372 \\Rightarrow 93.72%$. $Q_1=100.6041$ mg/100 ml. Exercise 7 It is known that the cholesterol level in males 30 years old follows a normal distribution with mean 220 mg/dl and standard deviation 30 mg/dl. 
If there are 20000 males 30 years old in the population,\nhow many of them have a cholesterol level between 210 and 240 mg/dl? If a cholesterol level greater than 250 mg/dl can provoke a thrombosis, how many of them are at risk of thrombosis? Calculate the cholesterol level above which 20% of the males are. Solution $P(210\\leq X\\leq 240)=0.3781 \\Rightarrow 7561.3$ persons. $P(X\u0026gt; 250)=0.1587 \\Rightarrow 3173.1$ persons. $P_{80}=245.2486$ mg/dl. Exercise 8 In an exam done by 100 students, the average grade is 4.2 and only 32 students pass. Assuming that the grade follows a normal distribution model, how many students got a grade greater than 7?\nSolution $P(X\u0026gt;7)=0.0508 \\Rightarrow 5.1$ students. Exercise 9 In a population with 40000 persons, 2276 have between 0.8 and 0.84 milligrams of bilirubin per deciliter of blood, and 11508 have more than 0.84. Assuming that the level of bilirubin in blood follows a normal distribution model,\nCalculate the mean and the standard deviation. How many persons have more than 1 mg of bilirubin per dl of blood? Solution $\\mu=0.7001$ mg/dl and $s=0.2497$ mg/dl. $P(X\u0026gt;1)=0.1149 \\Rightarrow 4596$ persons. Exercise 10 It is known that the blood pressure of people in a population with 20000 persons follows a normal distribution model with mean 13 mm Hg and interquartile range 4 mm Hg.\nHow many persons have a blood pressure above 16 mm Hg? By how much must the blood pressure of a person with 16 mm Hg decrease in order to be below the 40% of people with lowest blood pressure? Solution $P(X\u0026gt;16)=0.1587 \\Rightarrow 3174$ persons. $D_4 = 12.25$ mm Hg, so it must decrease at least $3.75$ mm Hg. Exercise 11 A study tries to determine the effect of a low fat diet in the lifetime of rats. The rats were divided into two groups, one with a normal diet and another with a low fat diet. It is assumed that the lifetimes of both groups are normally distributed with the same variance but different mean. 
If 20% of rats with normal diet lived more than 12 months, 5% less than 8 months, and 85% of rats with low fat diet lived more than 11 months,\nwhat is the mean and the standard deviation of the lifetime of rats following a low fat diet? If 40% of the rats were under a normal diet, and 60% of rats under a low fat diet, what is the probability that a random rat die before 9 months? Solution Naming $X_1$ and $X_2$ to the lifetime of rats with a normal diet and a low fat diet respectively,\n$\\mu_2=12.6673$ months and $s=1.6087$ months. $P(X\u0026lt;9)=0.068$. Exercise 12 A diagnostic test to determine doping of athletes returns a positive outcome when the concentration of a substance in blood is greater than 4 $\\mu$g/ml. If the distribution of the substance concentration in doped athletes follows a normal distribution model with mean 4.5 $\\mu$g/ml and standard deviation 0.2 $\\mu$g/ml, and in non-doped athletes is normally distributed with mean 3 $\\mu$g/ml and standard deviation 0.3 $\\mu$g/ml,\nwhat is the sensitivity and specificity of the test? If there is a 10% of doped athletes in a competition, what is the positive predicted value? Solution Naming $D$ to the event of being doped, $X$ to the concentration in doped athletes and $Y$ to the concentration in non-doped athletes,\nSensitivity $P(+\\vert D) = P(X\u0026gt;4)=0.9938$ and specificity $P(-\\vert \\bar D)=P(Y\u0026lt;4)=0.9996$ PPV $P(D\\vert +) = 0.9961$. Exercise 13 According to the central limit theorem, for big samples ($n\\geq 30$) the sample mean $\\bar x$ follows a normal distribution model $N(\\mu,\\sigma/\\sqrt{n})$, where $\\mu$ is the population mean and $\\sigma$ the population standard deviation.\nIt is known that in a population the sural triceps elongation follows has mean 60 cm and standard deviation 15 cm. If you draw a sample of 30 individuals from this population, what is the probability of having a sample mean greater than 62 cm? 
If a sample is atypical if its mean is below the 5th percentile, is atypical a sample of 60 individuals with $\\bar x=57$?\nSolution $P(\\bar x\u0026gt;62) = 0.2326$.\n$P_{5}=56.8148$, so, the sample is non-atypical. Exercise 14 The curing time of a knee injury in soccer players follows a normal distribution model with mean 50 days and standard deviation 10 days. If there is a final match in 65 days, what is the probability that a player that has just injured his knee will miss the final? If the semifinal match is in 40 days, and 4 players has just injured the knee, what is the probability that some of them can play the semifinal?\nSolution Let $X$ be the curing time, $P(X\u0026gt;65)=0.0668$.\nLet $Y$ be the number of injured players that could play the semifinal, $P(Y\\geq 1)=0.4989$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ecd113b6c57caed03f719da29b66a46b","permalink":"/en/teaching/statistics/problems/continuous_random_variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/continuous_random_variables/","section":"teaching","summary":"Exercise 1 Given the continuous random variable $X$ with the following probability density function chart, Check that $f(x)$ is a probability density function. Calculate the following probabilities a. $P(X\u0026lt;1)$ b. $P(X\u0026gt;0)$ c.","tags":["Random Variables","Continuous Random Variables"],"title":"Problems of Continuous Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 6, 2022\nQuestion 1 The patients of a physiotherapy clinic were asked to assess their satisfaction in a scale from 0 to 10. 
The assessments are summarized in the table below.\n$$\\begin{array}{lr} \\hline \\mbox{Assessment} \u0026amp; \\mbox{Patients}\\newline 0 - 2 \u0026amp; 3 \\newline 2 - 4 \u0026amp; 12 \\newline 4 - 6 \u0026amp; 9 \\newline 6 - 8 \u0026amp; 18 \\newline 8 - 10 \u0026amp; 22 \\newline \\hline \\end{array} $$\nCompute the interquartile range of the assessment and interpret it.\nIf it is required an assessment greater than 5 in more than 50% of patients for the clinic to remain open, will the clinic remain open?\nIs the assessment mean representative?\nCompute the coefficient of kurtosis of the assessment and interpret it. Is the kurtosis normal?\nIf the assessment mean of another clinic is 6.8 and the standard deviation is 2.6, which assessment is relatively higher 6 in the first clinic or 6.2 in the second?\nUse the following sums for the computations: $\\sum x_in_i=408$, $\\sum x_i^2n_i=3000$, $\\sum (x_i-\\bar x)^3n_i=-548.25$ and $\\sum (x_i-\\bar x)^4n_i=5140.45$.\nShow solution Let $X$ be the patient assessment.\n$Q_1= 4.2203$, $Q_3=8.5457$ and $IQR = 4.3254$, so the central dispersion is moderate.\n$F(5)=0.305$, and the percentage of patients with an assessment greater than 5 is $69.5\\%$.\n$\\bar x = 6.375$, $s_x^2 = 6.2344$, $s_x=2.4969$ and $cv=0.3917$, thus the representativity of the mean is moderate.\n$g_2 = -0.9335$ and the distribution is flatter than a Gauss bell, but normal, as $g_2$ is between -2 and 2.\nFirst clinic: $z(6)=-0.1502$\nSecond clinic: $z(6.2)=-0.3077$.\nThus, an assessment of 6 in the first clinic is relatively higher as its standard score is greater.\nQuestion 2 A study tries to determine the effectiveness a training program to increase the grip strength. 
The table below shows the grip strength in Kg in some weeks of the training program.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Week} \u0026amp; 1 \u0026amp; 3 \u0026amp; 6 \u0026amp; 9 \u0026amp; 14 \u0026amp; 17 \u0026amp; 21 \u0026amp; 24 \\newline \\mbox{Grip strength} \u0026amp; 15 \u0026amp; 22 \u0026amp; 29 \u0026amp; 34 \u0026amp; 36 \u0026amp; 39 \u0026amp; 40 \u0026amp; 41 \\newline \\hline \\end{array} $$\nCompute the regression coefficient of the grip strength on the weeks and interpret it.\nAccording to the logarithmic regression model, what is the expected grip strength after 5 and 25 weeks. Are these predictions reliable? Would these predictions be more reliable with the linear regression model?\nAccording to the exponential regression model, how many weeks are required to have a grip strength of 25 Kg?\nWhat percentage of the total variability of the weeks is explained by the exponential model?\nUse the following sums ($X$=Weeks and $Y$=Grip strength):\n$\\sum x_i=95$, $\\sum \\log(x_i)=16.7824$, $\\sum y_j=256$, $\\sum \\log(y_j)=27.3423$,\n$\\sum x_i^2=1629$, $\\sum \\log(x_i)^2=43.606$, $\\sum y_j^2=8804$, $\\sum \\log(y_j)^2=94.3237$,\n$\\sum x_iy_j=3552$, $\\sum x_i\\log(y_j)=342.9642$, $\\sum \\log(x_i)y_j=608.4186$, $\\sum \\log(x_i)\\log(y_j)=60.047$.\nShow solution $\\bar x=11.875$ weeks, $s_x^2=62.6094$ weeks$^2$.\n$\\bar y=32$ Kg, $s_y^2=76.5$ Kg$^2$.\n$s_{xy}=64$ weeks$\\cdot$Kg.\nRegression coefficient of $Y$ on $X$: $b_{yx} = 1.0222$ Kg/week. The grip strength increases $1.0222$ Kg per week.\n$\\overline{\\ln(x)} = 2.0978$ ln(weeks), $s_{\\ln(x)}^2 = 1.05$ ln(weeks)$^2$ and $s_{\\ln(x)y} = 8.9226$ ln(weeks)Kg.\nLogarithmic regression model of $Y$ on $X$: $y = 14.1729 + 8.498 \\ln(x)$.\nPredictions: $y(5) = 27.8499$ Kg and $y(25) = 41.5268$ Kg.\nLogarithmic coefficient of determination: $r^2 = 0.9912$. 
The predictions are not reliable because the sample size is small.\nLinear coefficient of determination: $r^2 = 0.8552$.\nAs the linear coefficient of determination is less than the logarithmic one, the predictions with the logarithmic model are more reliable.\nExponential regression model of $X$ on $Y$: $x = e^{-1.6345 + 0.1166y}$.\nPrediction: $x(25)=3.6015$ Weeks.\nAs $r^2 = 0.9912$, the exponential models explains $99.12$% of the variability of the weeks.\nQuestion 3 A diagnostic test for a cervical injury has a 99% of sensitivity and produces 80% of right diagnosis. Assuming that the prevalence of the injury is 10%:\nCompute the specificity of the test.\nCan we rule out the injury with a negative outcome of the test?\nCan we diagnose the injury with a positive outcome of the test? What must the minimum prevalence of the injury be to diagnose the injury with a positive outcome of the test?\nShow solution Specificity = $P(-|\\overline D) = 0.7789$.\nNegative predictive value = $P(\\overline D|-) = 0.9986 \u0026gt; 0.5$, so we can rule out the injury with a negative outcome.\nPositive predictive value = $P(D|+) = 0.3322 \u0026lt; 0.5$, so we can not diagnose the injury with a positive outcome. The minimum prevalence required to be able to diagnose the injury with a positive outcome is $P(D)=0.1825$.\nQuestion 4 A pharmacy sells two vaccines $A$ and $B$ against a virus. The $A$ vaccine produces 5% of side effects, while the $B$ vaccine produces 2% of side effects. 
The pharmacy has sold 10 units of the $A$ vaccine and 100 units of the $B$ vaccine.\nCompute the probability of having less than 2 side effects with the $A$ vaccine.\nCompute the probability of having more than 3 side effects with the $B$ vaccine.\nIf we apply both vaccines to the same person at different moments, and assuming that the production of side effects of the vaccines are independent, what is the probability that this person will have any side effect?\nShow solution Let $X$ be the number of side effects in 10 applications of A vaccine. Then, $X\\sim B(10, 0.05)$ and $P(X\u0026lt;2) = 0.9139$.\nLet $Y$ be the number of side effects in 100 applications of B vaccine. Then, $Y\\sim B(100, 0.02)\\approx P(2)$ and $P(Y\u0026gt;3) = 0.1429$.\nLet $A$ and $B$ the events of having side effects with vaccines A and B respectively. $P(A\\cup B) = 0.069$.\nQuestion 5 The length of the femur bone is normally distributed in both men and women with a standard deviation of 4 cm. It is also known that the first quartile in women is 42.3 cm, while the third quartile in men is 50.7 cm.\nWhat is the difference between the means of the femur length of women and men?\nRemark: If you do not know how to compute the means, use a mean 44 cm for women and a mean 47 cm for men in the following parts.\nCompute the 60th percentile of the femur length in women. What percentage of men have a femur length less than the 60th percentile of women?\nIf we pick a woman and man at random, what is the probability that neither of them has a femur length less than 45 cm?\nShow solution Let $X$ and $Y$ be the femur length of women and men respectively. 
Then $X\\sim N(\\mu_x, 4)$ and $Y\\sim N(\\mu_y,4)$.\n$\\mu_x = 44.91$ cm, $\\mu_y = 48.02$ cm and $\\mu_x - \\mu_y = -3.11$ cm.\n60th percentile in women $P_{60}=45.9234$ cm, and $P(Y\u0026lt;45.9234) = 0.3001$, that is, a $30.01\\%$ of men have a femur length less than the 60th percentile of women.\n$P(X\\geq 45 \\cap Y\\geq 45) = 0.3805$.\n","date":1654473600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1655134020,"objectID":"f5f50c8bc7726b0c5fe86666dfba940f","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-06-06/","publishdate":"2022-06-06T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-06-06/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 6, 2022\nQuestion 1 The patients of a physiotherapy clinic were asked to assess their satisfaction in a scale from 0 to 10. The assessments are summarized in the table below.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-06-06","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 6, 2022\nQuestion 1 A basketball player scores 12 points per game on average.\nWhat is the probability that the player scores more than 4 points in a quarter?\nIf the player plays 10 games in a league, what is the probability of scoring less than 6 points in some game?\nShow solution Let $X$ be the points scored in a quarter by the player. Then $X\\sim P(3)$, and $P(X\u0026gt;4)=0.1847$.\nLet $Y$ be the number of points scored in a game by the player. Then $Y\\sim P(12)$ and $P(Y\u0026lt;6)=0.0203$.\nLet $Z$ be the number of games with less than 6 points scored by the player. Then $Z\\sim B(10, 0.0203)$, and $P(Z\u0026gt;0)=0.1858$.\nQuestion 2 8% of people in a population consume cocaine. 
It is also known that 4% of people who consume cocaine have a heart attack and 10% of people who have a heart attack consume cocaine.\nConstruct the probability tree for the random experiment of drawing a random person from the population and measuring if he or she consumes cocaine and if he or she has a heart attack.\nCompute the probability that a random person of the population does not consume cocaine and does not have a heart attack.\nAre the events of consuming cocaine and having a heart attack dependent?\nCompute the relative risk and the odds ratio of suffering a heart attack consuming cocaine. Which association measure is more suitable for this study? Interpret it.\nShow solution Let $C$ the event of consuming cocaine and $H$ the event of having a heart attack.\n$P(\\overline C\\cap \\overline H)=0.8912$.\nThe events are dependent as $P(C)=0.08\\neq P(C|H)=0.1$.\n$RR(H)=1.2778$ and $OR(H)=1.2894$. The odds ratio is more suitable as the study is retrospective. That means that the odds of having a heart attack is $1.2894$ times greater if a person consumes cocaine.\nQuestion 3 The creatine phosphokinase (CPK3) is an enzyme in the body that causes the phosphorylation of creatine. This enzyme is found in the skeletal muscle and can be measured in a blood analysis. The concentration of CPK3 in blood is normally distributed, and the interval centred at the mean with the reference values, that accumulates 99% of the population, ranges from 40 to 308 IU/L in healthy adult males.\nCompute the mean and the standard deviation of the concentration of CPK3 in healthy males.\nA diagnostic test to detect muscular dystrophy gives a negative outcome when the concentration of CPK3 is below 300 UI/L. 
Compute the specificity of the test.\nIf the concentration of CPK3 in people with muscular dystrophy also follows a normal distribution with mean 350 IU/L and the same standard deviation, what is the sensitivity of the test?\nCompute the predictive values of the test and interpret them assuming that the muscular dystrophy prevalence is 8%.\nShow solution $\\mu = 174$ IU/L and $\\sigma = 51.938$ IU/L.\nSpecificity = $0.9924$.\nSensitivity = $0.8321$.\nThe test is better to confirm the disease as the specificity is greater than the sensitivity.\nPPV = $0.9046$. Thus, we can diagnose the disease with a positive outcome.\nNPV = $0.9855$. Thus, we can rule out the disease with a negative outcome.\n","date":1651795200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652390973,"objectID":"035e2e68234e69067ec8995b0251f08d","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-05-06/","publishdate":"2022-05-06T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-05-06/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 6, 2022\nQuestion 1 A basketball player scores 12 points per game on average.\nWhat is the probability that the player scores more than 4 points in a quarter?","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-05-06","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: March 11, 2022\nQuestion 1 The table below shows the number of credits obtained by the students of the first year of the physiotherapy grade.\n$$48, 52, 60, 60, 24, 48, 48, 36, 39, 54, 54, 60, 12, 46$$\nCompute the median and the mode and interpret them.\nDraw the box and whiskers plot and interpret it. 
Are there outliers in the sample?\nCan we assume that the sample comes from a normal population?\nIf in the second year the mean of credits obtained is $102$ and the standard deviation is $12.5$, which year has a higher relative dispersion?\nWhich number of credits is relatively higher, 50 in the first year, or 105 in the second year?\nUse the following sums for the computations:\n$\\sum x_i=641$ credits, $\\sum x_i^2=31901$ credits$^2$, $\\sum (x_i-\\bar x)^3=-40158.06$ credits$^3$ and $\\sum (x_i-\\bar x)^4=1672652.57$ credits$^4$.\nShow solution $Me = 48$ credits and $Mo = 48$ and $60$ credits.\n$Q_1= 39$ credits, $Q_3= 54$ credits, $IQR=15$ credits, $f_1= 16.5$ credits and $f_2= 76.5$ credits. 12 credits is an outlier.\n$\\bar x=45.7857$ credits, $s^2=182.3112$ credits$^2$, $s=13.5023$ credits.\n$g_1=-1.1653$ and $g_2=0.5946$. Thus, we can assume that the sample comes from a normal distribution as the coef. of skewness and the coef. of kurtosis fall between -2 and 2.\nFirst year: $cv=0.2949$. Second year: $cv=0.1225$. Thus, the first year has a higher relative dispersion as the coef. of variation is greater.\nStandard score for the first year: $z(50)=0.3121$\nStandard score for the second year: $z(105)=0.24$\nAs the standard score of $50$ in the first year is greater than the standard score of $105$ in the second year, 50 credits in the first year is relatively higher than 105 credits in the second year.\nQuestion 2 The Regional Ministry of Health of the Community of Madrid suspects a possible relationship between the level of air pollution and the number of cases of pneumonia in the population in the first 10 weeks of the year. 
To verify this, the variable $X$ registers the number of pollution meters that exceed the pollution limits each week, and the variable $Y$ indicates the number of people affected by pneumonia in each week.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline\nX \u0026amp; 3 \u0026amp; 3 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \u0026amp; 3 \u0026amp; 4 \u0026amp; 2 \u0026amp; 3 \\newline Y \u0026amp; 2 \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 6 \u0026amp; 6 \u0026amp; 2 \u0026amp; 2 \u0026amp; 1 \u0026amp; 1 \\newline \\hline \\end{array} $$\nAre the number of people affected by pneumonia and the number of meters that exceed the pollution limits two linearly independent variables?\nAccording to the linear model, how does the number of people affected by pneumonia change in relation to the number of meters that exceed the pollution limits?\nJustify whether or not the linear relationship between the two variables is well explained and in what proportion.\nAccording to the exponential regression model, how many people are expected to be affected by pneumonia a week with 5 meters exceeding the pollution limits?\nWhich of the following diagrams best represents the regression lines? Justify the answer.\nUse the following sums for the computations:\n$\\sum x_i=44$ meters, $\\sum \\log(x_i)=13.9004$ log(meters), $\\sum y_j=26$ persons, $\\sum \\log(y_j)=7.4547$ log(persons),\n$\\sum x_i^2=230$ meters$^2$, $\\sum \\log(x_i)^2=21.1414$ log(meters)$^2$, $\\sum y_j^2=100$ persons$^2$, $\\sum \\log(y_j)^2=9.5496$ log(persons)$^2$,\n$\\sum x_iy_j=146$ meters$\\cdot$persons, $\\sum x_i\\log(y_j)=43.8653$ meters$\\cdot$log(persons), $\\sum \\log(x_i)y_j=42.8037$ log(meters)$\\cdot$persons, $\\sum \\log(x_i)\\log(y_j)=12.7804$ log(meters)$\\cdot$log(persons).\nShow solution $\\bar x = 4.4$ meters, $s_x^2=3.64$ meters$^2$.\n$\\bar y = 2.6$ persons, $s_y^2=3.24$ persons$^2$.\n$s_{xy}=3.16$ meters$\\cdot$persons. 
That means that there is a direct linear relation between the meters that exceed pollution limits and the people affected by pneumonia.\n$b_{yx}=0.8681$ persons/meter. Thus, the number of people affected by pneumonia increases by $0.8681$ persons for each additional meter that exceeds the pollution limits.\nLinear coefficient of determination $r^2=0.8467$. Therefore, the linear regression model explains $84.67$ % of the variability of the number of people affected by pneumonia.\n$\\overline{\\log(y)}=0.7455$ log(persons), $s_{x\\log(y)}=1.1065$ meters*log(persons).\nExponential regression model: $y=e^{-0.592 + 0.304x}$, and $y(5)\\approx 2.53$ persons.\nDiagram $A$ because the relation is direct and very strong according to the linear coefficient of determination.\n","date":1646956800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1650733145,"objectID":"40392b70da564e0b81e3910a9f83a155","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-03-11/","publishdate":"2022-03-11T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-03-11/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 11, 2022\nQuestion 1 The table below shows the number of credits obtained by the students of the first year of the physiotherapy grade.\n$$48, 52, 60, 60, 24, 48, 48, 36, 39, 54, 54, 60, 12, 46$$","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-03-11","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 17, 2022\nQuestion 1 To analyze the hypoxemia tolerance of mammals, in a laboratory some rats are exposed to extreme conditions with variable levels of oxygen. 
The rats are in a room whose oxygen level (in %) at any position $(x,y)$ (in meters), is\n$$O(x,y)=\\frac{1}{10}x^2y^2e^{x-y}$$\nFor the rats to survive, they must reach positions where the oxygen level is above 18%.\nA rat $A$ is at position $(3,2)$. If the rat stays in that position, will it survive?\nWhat direction should rat $A$ take in order to increase the oxygen level as quickly as possible? What is the instantaneous rate of change of the oxygen level following that direction?\nAnother rat $B$ is at position $(2,2)$. If it starts to move in such a way that $y$ decreases the double of the increment of $x$, how will the oxygen level change?\nSolution $O(3,2)=9.7858$%, therefore the rat will not survive.\nThe direction of the gradient $\\nabla(3,2) = (6e,0)$. Following this direction, the instantaneous rate of change of the oxygen level is $|\\nabla(3,2)|=6e$%/m.\nDirectional derivative along the direction of the vector $\\mathbf{v}=(1,-2)$: $f\u0026rsquo;_{\\mathbf{v}}(2,2)=1.4311$%/m.\nQuestion 2 The ozone ($O_3$) in the atmosphere is transformed into oxygen ($O_2$) through the following chemical reaction:\n$$2O_3 \\rightarrow 3O_2$$\nIt was experimentally observed that the speed at which the amount of oxygen varies is inversely proportional to the amount of oxygen present. If there is initially 10 g of oxygen in a place, and after one hour this amount of oxygen doubles,\nWhat will the amount of oxygen be after 5 hours?\nHow long will it take to have 1 kg of oxygen?\nSolution Let $t$ the time and $o(t)$ the amount of oxygen at time $t$. Differential equation $o\u0026rsquo;=k/o$.\nParticular solution: $o(t)=\\sqrt{300t+100}$.\n$o(5)=40$ g.\n$3333$ hours.\nQuestion 3 Two insects start moving from the same point following perpendicular directions.\nIf the first insect moves at a speed of 3 cm/s and the second at a speed of 4 cm/s, at what instantaneous speed does the distance between them change 2 seconds after they start moving? 
And at 3 seconds?\nIf 4 seconds after they start moving the second insect stops and the first continues moving with the same direction and speed, at what instantaneous speed does the distance between the two insects change at that moment?\nRemark: The distance between the two insects is the length of the hypotenuse of the right triangle whose sides are the distance travelled by them.\nSolution Let $h(t)$ the length of the hypotenuse of the right triangle whose sides are the distance travelled by the insects at time $t$.\n$h\u0026rsquo;(2)=5$ cm/s and $h\u0026rsquo;(3)=5$ cm/s.\n$h\u0026rsquo;(4)=1.8$ cm/s.\n","date":1642377600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642789234,"objectID":"43c4ecd6f56eb0b8b0a7887afa991d54","permalink":"/en/teaching/calculus/exams/pharmacy-2022-01-17/","publishdate":"2022-01-17T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2022-01-17/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 17, 2022\nQuestion 1 To analyze the hypoxemia tolerance of mammals, in a laboratory some rats are exposed to extreme conditions with variable levels of oxygen.","tags":["Exam"],"title":"Pharmacy exam 2022-01-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 17, 2022\nQuestion 1 A diagnostic test for a disease with a prevalence of 10% has a positive predictive value of 40% and negative predictive value of 95%.\nCompute the sensitivity and the specificity of the test.\nCompute the probability of a right diagnose.\nWhat must be the minimum sensitivity of the test to be able to diagnose the disease?\nSolution Sensitivity $P(+|D)=0.571$ and specificity $P(-|\\overline D)=0.9048$.\n$P(\\mbox{Right diagnose}) = P(D \\cap +) + P(\\overline D \\cap -) = 0.8714$.\nMinimum sensitivity to diagnose the disease $P(+|D)=0.857$.\nQuestion 2 To study the effectiveness of two antigen tests 
for COVID, both tests have been applied to a sample of 100 persons. The table below shows the results:\n$$ \\begin{array}{ccr} \\hline \\mbox{Test $A$} \u0026amp; \\mbox{Test $B$} \u0026amp; \\mbox{Num persons}\\newline \\mbox{+} \u0026amp; \\mbox{+} \u0026amp; 8\\newline \\mbox{+} \u0026amp; \\mbox{-} \u0026amp; 2\\newline \\mbox{-} \u0026amp; \\mbox{+} \u0026amp; 3\\newline \\mbox{-} \u0026amp; \\mbox{-} \u0026amp; 87\\newline \\hline \\end{array} $$\nDefine the following events and compute their probabilities:\nGet a $+$ in the test $A$.\nGet a $+$ in the test $A$ and a $-$ in the test $B$.\nGet a $+$ in some of the two tests.\nGet different results in the two tests.\nGet the same result in the two tests.\nGet a $+$ in the test $B$ if we got a $+$ in the test $A$.\nAre the outcomes of the two tests independent?\nSolution Let $A$ and $B$ be the events of getting positive outcomes in the tests $A$ and $B$ respectively.\n$P(A)=0.1$.\n$P(A\\cap \\overline B)=0.02$.\n$P(A\\cup B) = 0.13$.\n$P(A\\cap \\overline B) + P(\\overline A \\cap B) = 0.05$.\n$P(A\\cap B) + P(\\overline A \\cap \\overline B)= 0.95$.\n$P(B|A) = 0.8$.\nAs $P(B|A)\\neq P(B)$ the events are dependent.\nQuestion 3 It is known that the life of a battery for a pacemaker follows a normal distribution. It has been observed that 20% of the batteries last more than 15 years, while 10% last less than 12 years.\nCompute the mean and the standard deviation of the battery life.\nCompute the fourth decile of the battery life.\nIf we take a sample of 5 batteries, what is the probability that more than half of them last between 13 and 14 years?\nIf we take a sample of 100 batteries, what is the probability that some of them last less than 11 years?\nSolution Let $X$ be the duration of a battery. Then $X\\sim N(\\mu,\\sigma)$.\n$\\mu = 13.8108$ years and $\\sigma = 1.413$ years.\n$D_4 = 13.4528$ years.\nLet $Y$ be the number of batteries lasting between 13 and 14 years in a sample of 5 batteries. 
Then $Y\\sim B(5,0.2702)$ and $P(Y\u0026gt;2.5)=0.0209$.\nLet $U$ be the number of batteries lasting less than 11 years in a sample of 100 batteries. Then $U\\sim B(100, 0.0233)\\approx P(2.3335)$ and $P(U\\geq 1)=0.903$.\n","date":1642377600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642925317,"objectID":"7af3f40b063c6570e1321845c462b81e","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2022-01-17/","publishdate":"2022-01-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2022-01-17/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 17, 2022\nQuestion 1 A diagnostic test for a disease with a prevalence of 10% has a positive predictive value of 40% and negative predictive value of 95%.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2022-01-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 22, 2021\nQuestion 1 The cranial capacity (in dm$^3$) of a primate population follows a normal probability distribution $X\\sim N(\\mu,\\sigma)$. The chart below shows the Gauss bell of $X$. Observe that the chart shows the area below the bell between 1 and 3.\nWhat is the mean of the cranial capacity distribution?\nIs the mean of the cranial capacity representative of the population?\nWhat are the coefficients of skewness and kurtosis?\nWhat is the interquartile range of the cranial capacity?\nIf a cranial capacity outside of the interval $(Q_1-1.5IQR, Q_3+1.5IQR)$ is considered an outlier, what is the probability of observing an outlier in the cranial capacity?\nSolution Let $X$ be the cranial capacity of a primate. Then, $X\\sim N(\\mu, \\sigma)$.\n$\\mu=2$ dm$^3$\n$\\sigma=0.5$ dm$^3$ and $cv=0.25$. As the coef. 
of variation is small, the mean is representative.\nAs $X$ follows a normal distribution, $g_1=0$ and $g_2=0$.\n$Q_1 = 1.6628$ dm$^3$, $Q_3=2.3372$ dm$^3$ and $IQR=0.6745$ dm$^3$.\nFences: $f_1=0.651$ dm$^3$ and $f_2=3.349$.\n$P(X \u0026lt; 0.651) + P(X \u0026gt; 3.349) = 0.007$.\nQuestion 2 A pharmaceutical company produces the same drug in 5 different laboratories. It has been observed that each laboratory produces, on average, one non-marketable defective batch every three months.\nWhat is the probability that a laboratory produce more than 3 defective batches in one year?\nWhat is the probability that at least 2 laboratories produce no defective batches in one year?\nSolution Let $X$ be the number of defective batches in a year then $X\\sim P(4)$, and $P(X\u0026gt;3) = 0.5665$.\nLet $Y$ be the number of laboratories that produce no defective batches in one year, then $Y\\sim B(5, 0.0183)$ and $P(Y\\geq 2) = 0.0032$.\nQuestion 3 The table below shows the frequencies observed in a random sample from a population for the blood type and SARS-CoV-2 infection:\n$$ \\begin{array}{llr} \\hline \\mbox{Blood type} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{Persons}\\newline \\mbox{O} \u0026amp; \\mbox{No} \u0026amp; 1800\\newline \\mbox{O} \u0026amp; \\mbox{Yes} \u0026amp; 100\\newline \\mbox{A} \u0026amp; \\mbox{No} \u0026amp; 4200\\newline \\mbox{A} \u0026amp; \\mbox{Yes} \u0026amp; 400\\newline \\mbox{B} \u0026amp; \\mbox{No} \u0026amp; 2500\\newline \\mbox{B} \u0026amp; \\mbox{Yes} \u0026amp; 150\\newline \\mbox{AB} \u0026amp; \\mbox{No} \u0026amp; 800\\newline \\mbox{AB} \u0026amp; \\mbox{Yes} \u0026amp; 50\\newline \\hline \\end{array} $$\nCompute the probability of SARS-CoV-2 infection for a random person.\nCompute the probability of having a blood type A and being infected by SARS-CoV-2 for a random person.\nCompute the probability of having a blood type A or being infected by SARS-CoV-2 for a random person.\nCompute the probability of being infected by 
SARS-CoV-2 for a person with blood type O.\nCompute the probability of having a blood type different from A and B for a person infected by SARS-CoV-2.\nDoes the SARS-CoV-2 infection depend on the blood type?\nSolution Let $I$ be the event of being infected by SARS-CoV-2.\n$P(I) = 0.07$.\n$P(A\\cap I) = 0.04$.\n$P(A\\cup I) = 0.49$.\n$P(I|O) = 0.0526$.\n$P(\\overline A \\cap \\overline B|I) = 0.2143$.\nThe infection depends on the blood type as, for instance, $P(I)\\neq P(I|O)$.\nQuestion 4 To study the relation between the blood Rh and the SARS-CoV-2 infection a random sample of non-infected people was drawn from a population. The table below shows the number of people infected after one year.\n$$ \\begin{array}{llr} \\hline \\mbox{Blood Rh} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{Persons}\\newline \\mbox{-} \u0026amp; \\mbox{Yes} \u0026amp; 520\\newline \\mbox{-} \u0026amp; \\mbox{No} \u0026amp; 6380\\newline \\mbox{+} \u0026amp; \\mbox{Yes} \u0026amp; 780\\newline \\mbox{+} \u0026amp; \\mbox{No} \u0026amp; 6200\\newline \\hline \\end{array} $$\nCompute the relative risk and the odds ratio to study the association between the SARS-CoV-2 infection and the blood Rh. Which association measure is more suitable to explain the relation between the SARS-CoV-2 infection and the blood Rh? Interpret it.\nA diagnostic test for the SARS-CoV-2 has been developed with a 95% of specificity and a 60% of sensitivity, regardless of blood Rh. In which blood Rh will it produce more errors? Which diagnosis will we make if we apply the test to a person with blood Rh- and we get a positive outcome? Which diagnosis will we make if we apply the test to a person with blood Rh+ and we get a negative outcome?\nSolution Let $I$ be the event of being infected by SARS-CoV-2.\n$RR(I) = R_+(I) / R_-(I) = 1.4828$ and $OR(I) = O_+(I) / O_-(I) = 1.5435$.\nThe relative risk is more suitable as this is a prospective study and the incidence of infection can be estimated. 
Thus, the risk of infection with Rh+ is almost one and a half the risk with Rh-.\n$P(\\mbox{Error}|\\mbox{Rh-}) = 0.0764$ and $P(\\mbox{Error}|\\mbox{Rh+}) = 0.0891$. Thus, the test will produce more errors in people with Rh+.\nPositive predictive value for Rh-: $p(I|+)=0.4945$. Therefore, we will diagnose no infection.\nNegative predictive value for Rh+: $p(\\overline I|-)=0.9497$. Therefore, we will predict no infection.\n","date":1637539200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1638051573,"objectID":"10872ade46fe0189bda4d1cb55780dab","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-11-22/","publishdate":"2021-11-22T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-11-22/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 22, 2021\nQuestion 1 The cranial capacity (in dm$^3$) of a primate population follows a normal probability distribution $X\\sim N(\\mu,\\sigma)$. The chart below shows the Gauss bell of $X$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-11-22","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 25, 2021\nQuestion 1 The table below shows the number of daily sugary drinks drunk by a sample of 16-years-old people.\n$$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; \u0026amp; 0.1 \u0026amp; \u0026amp; \\newline \\hline 1 \u0026amp; \u0026amp; \u0026amp; 48 \u0026amp; \\newline \\hline 2 \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.725\\newline \\hline 3 \u0026amp; 24 \u0026amp; \u0026amp; \u0026amp; \\newline \\hline 4 \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.975\\newline \\hline 5 \u0026amp; \u0026amp; \u0026amp; 120 \u0026amp; \\newline \\hline \\end{array} $$\nComplete the table explaining how.\nPlot the 
cumulative frequency polygon.\nAre there outliers?\nStudy the normality of the distribution.\nIf another sample of 18-years-old people has a mean $2.1$ drinks and a variance $1.5$ drinks$^2$, in which distribution is more representative the mean?\nWho consumes a higher relative amount of sugary drinks, a 16-years-old who consumes 3 drinks a day or a 18-years-old who consumes 4?\nUse the following sums for the computations: $\\sum x_i= 225$ drinks, $\\sum x_i^2=579$ drinks$^2$, $\\sum (x_i-\\bar x)^3=80.16$ drinks$^3$ and $\\sum (x_i-\\bar x)^4=616.32$ drinks$^4$.\nSolution $$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i \\newline \\hline 0 \u0026amp; 12 \u0026amp; 0.100 \u0026amp; 12 \u0026amp; 0.100 \\newline \\hline 1 \u0026amp; 36 \u0026amp; 0.300 \u0026amp; 48 \u0026amp; 0.400 \\newline \\hline 2 \u0026amp; 39 \u0026amp; 0.325 \u0026amp; 87 \u0026amp; 0.725 \\newline \\hline 3 \u0026amp; 24 \u0026amp; 0.200 \u0026amp; 111 \u0026amp; 0.925 \\newline \\hline 4 \u0026amp; 6 \u0026amp; 0.050 \u0026amp; 117 \u0026amp; 0.975 \\newline \\hline 5 \u0026amp; 3 \u0026amp; 0.025 \u0026amp; 120 \u0026amp; 1.000 \\newline \\hline \\end{array} $$\nQuartiles: $Q_1=1$ drinks, $Q_2=2$ drinks, $Q_3=3$ drinks\n$IQR = 2$ drinks.\nFences: $f_1=-2$ drinks and $f_2=6$ drinks. Thus, there are no outliers.\n$\\bar x=1.875$ drinks, $s^2=1.3094$ drinks$^2$, $s=1.1443$ drinks, $g_1=0.4458$ and $g_2=-0.0043$. As the coefficient of skewness and the coefficient of kurtosis are between -2 and 2 we can assume that the sample comes from a normal population.\nLet $Y$ be the daily sugary drinks drunk by 18-year-old people. Then, $cv_x=0.6103$ and $cv_y=0.5832$. 
As the coefficient of variation of 18-year-old is a little bit smaller than the one of 16-year-old, the mean of the 18-year-old is a little bit more representative.\nStandard score for 16-year-old: $z(3)=0.9832$\nStandard score for 18-year-old: $z(4)=1.5513$\nAs the standard score of 4 for a 18-years-old is greater than the standard score of 3 for a 16-years-old, 4 drinks for a 18-year-old is relatively higher than 3 drinks for a 16-years-old.\nQuestion 2 The rowan is a species of tree that grows at different altitudes. In order to study how the rowan adapts to different habitats, we have collected a sample of branches of 12 trees at different altitudes in Scotland. In the laboratory, the respiration rate of each branch was observed during the night. The following table shows the altitude (in meters) of each branch and the respiration rate (in nl of O$_2$ per hour per mg of weight).\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Altitude} \u0026amp; 90 \u0026amp; 230 \u0026amp; 240 \u0026amp; 260 \u0026amp; 330 \u0026amp; 400 \u0026amp; 410 \u0026amp; 550 \u0026amp; 590 \u0026amp; 610 \u0026amp; 700 \u0026amp; 790 \\newline \\mbox{Respiration rate} \u0026amp; 110 \u0026amp; 200 \u0026amp; 130 \u0026amp; 150 \u0026amp; 180 \u0026amp; 160 \u0026amp; 230 \u0026amp; 180 \u0026amp; 230 \u0026amp; 260 \u0026amp; 320 \u0026amp; 370 \\newline \\hline \\end{array} $$\nIs there a linear relationship between altitude and respiration rate of rowan. How is this relationship?\nHow much increases the respiration rate per each increment of 100 meters in the altitude?\nWhat respiration rate is expected for a rowan at 500 meters of altitude? 
And for a rowan at the sea level?\nAre these predictions reliable?\nUse the following sums for the computations ($X$=Altitude and $Y$=Respiration rate): $\\sum x_i=5200$ m, $\\sum y_i=2520$ nl/(mg$\\cdot$ h), $\\sum x_i^2=2760000$ (m)$^2$, $\\sum y_i^2=594600$ nl/(mg$\\cdot$ h)$^2$ and $\\sum x_iy_j=1253400$ m$\\cdot$ nl/(mg$\\cdot$ h).\nSolution $\\bar x=433.3333$ m, $s_x^2=42222.2222$ (m)$^2$,\n$\\bar y=210$ nl/(mg$\\cdot$ h), $s_y^2=5450$ nl/(mg$\\cdot$ h)$^2$,\n$s_{xy}=13450$ m $\\cdot$ nl/(mg$\\cdot$ h).\nAs the covariance is positive, there is a direct linear relation between the altitude and the respiration rate.\nThe respiration rate increases $b_{yx} = 0.3186$ nl/(mg$\\cdot$h) per meter, or what is the same, $31.8553$ nl/(mg$\\cdot$h) per 100 meters.\nRegression line of the respiration rate on the altitude: $y=71.9605 + 0.3186x$.\nPredictions: $y(500) = 231.2368$ nl/(mg$\\cdot$ h) and $y(0) = 71.9605$ nl/(mg$\\cdot$ h).\n$r^2 = 0.7862$. As the coefficient of determination is not far from 1, the regression line fits well, but the sample size is too small to have reliable predictions. 
In addition, the prediction for the sea level is less reliable because it falls outside the range of values of the sample.\nQuestion 3 The relationship between basal metabolic rate and age is being studied in a sample of healthy men and the following regression lines have been obtained\nCompute the means of the basal metabolic rate and the age.\nHow is the fit of the two lines?\nSolution Let $X$ be the age and $Y$ the basal metabolic rate.\n$\\bar x=40$ and $\\bar y=40$.\n$b_{yx}=-0.1$, $b_{xy}=-5$ and $r^2 = 0.5$, thus the fit of the regression lines is moderate.\n","date":1635120000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1635284889,"objectID":"11b6ecf127b258e5427cf2d1a9d17ad2","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-10-25/","publishdate":"2021-10-25T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-10-25/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 25, 2021\nQuestion 1 The table below shows the number of daily sugary drinks drunk by a sample of 16-years-old people.\n$$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; \u0026amp; 0.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-10-25","type":"book"},{"authors":null,"categories":["R"],"content":" Application access Table of Contents Application access What is Rubrics? How to cite Rubrics? What is Rubrics? Rubrics is a Shiny web application for assessment with rubrics.\nThe application allows:\nCreate a rubric for an exam or test. Load the list of students from a csv file and generate a template for the assessment. Load the assessment from the template. Generate a list with the students grades. Generate a descriptive summary of the distribution of grades. Generate a personalized report with the assessment of each student. 
The video below contains a more detailed presentation of this application (in Spanish):\nHow to cite Rubrics? Anemone, Gloria., Sánchez-Alberca, Alfredo. (2021). Rubrics (version 1.0) [software]. Obtained from: https://aprendeconalf.es/en/project/rubrics.\n","date":1630454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1634551798,"objectID":"f9a37edad77f43f8a27a71e21ea67b2c","permalink":"/en/project/rubrics/","publishdate":"2021-09-01T00:00:00Z","relpermalink":"/en/project/rubrics/","section":"project","summary":"A web app for assessment with rubrics.","tags":["Rybrics","Software","Shiny"],"title":"Rubrics","type":"project"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 7, 2021\nDescriptive Statistics and Regression Question 1 To study the effectiveness of a new treatment for the polymyalgia rheumatica a sample of patients with polymyalgia was drawn and they were divided into two groups. The first group received the new treatment while the second one received a placebo. After a year following the treatment they filled out a survey. 
The chart below shows the distribution of the survey score of the two groups of patients (the greater the score the better the treatment).\nConstruct the frequency table of the scores for the placebo group and plot the ogive.\nCompute the interquartile range of the scores for the placebo group.\nAre there outliers in the placebo group?\nIn which group the score mean represents better?\nWhich distribution is more normal regarding the kurtosis?\nWhich score is relatively better, a score of 5 in the placebo group or a score of 6 in the treatment group?\nUse the following sums for the computations:\nPlacebo: $\\sum x_i=125.5$, $\\sum x_i^2=680.25$, $\\sum (x_i-\\bar x)^3=27.11$ and $\\sum (x_i-\\bar x)^4=253.27$.\nTreatment: $\\sum x_i=131$, $\\sum x_i^2=887$ $\\sum (x_i-\\bar x)^3=2.66$ and $\\sum (x_i-\\bar x)^4=88.03$.\nShow solution $$\\begin{array}{lrrrr} \\mbox{Score} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i \\newline \\hline [2,3] \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 1 \u0026amp; 0.0 \\newline (3,4] \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 7 \u0026amp; 0.3 \\newline (4,5] \u0026amp; 7 \u0026amp; 0.28 \u0026amp; 14 \u0026amp; 0.6 \\newline (5,6] \u0026amp; 3 \u0026amp; 0.12 \u0026amp; 17 \u0026amp; 0.7 \\newline (6,7] \u0026amp; 7 \u0026amp; 0.28 \u0026amp; 24 \u0026amp; 1.0 \\newline (7,8] \u0026amp; 0 \u0026amp; 0.00 \u0026amp; 24 \u0026amp; 1.0 \\newline (8,9] \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1.0 \\newline \\hline \\end{array} $$ $Q_1= 3.875$, $Q_3= 6.25$ and $IQR=2.375$.\n$f_1 = 0.3125$ and $f_2=9.8125$. Thus, there are no outliers in the placebo sample because all the values fall between the fences.\nPlacebo: $\\bar x=5.02$, $s^2=2.0096$, $s=1.4176$ and $cv=0.2824$.\nTreatment: $\\bar x=6.55$, $s^2=1.4475$, $s=1.2031$ and $cv=0.1837$.\nPlacebo: $g_2=-0.4914$. Treatment: $g_2=-0.8992$. Thus, the distribution of the placebo group is more normal as the coef. 
of kurtosis is closer to 0.\nStandard score for the placebo: $z(5)=-0.0141$.\nStandard score for the treatment: $z(6)=-0.4571$.\nAs the standard score of $5$ in the placebo group is greater than the standard score of $6$ in the treatment group, a score of 5 in the placebo group is better.\nQuestion 2 We have applied different doses of an antibiotic to a culture of bacteria. The table below shows the number of residual bacteria corresponding to the different doses.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Dose ($\\mu$g)} \u0026amp; 0.2 \u0026amp; 0.7 \u0026amp; 1 \u0026amp; 1.5 \u0026amp; 2 \u0026amp; 2.4 \u0026amp; 2.8 \u0026amp; 3 \\newline \\mbox{Bacteria} \u0026amp; 40 \u0026amp; 32 \u0026amp; 28 \u0026amp; 20 \u0026amp; 18 \u0026amp; 15 \u0026amp; 12 \u0026amp; 11 \\newline \\hline \\end{array} $$\nWhich regression model explains better the number of residual bacteria as a function of the antibiotic dose, the linear or the exponential?\nUse the best of the two previous regression models to predict the number of residual bacteria for an antibiotic dose of 3.5 $\\mu$g. 
Is this prediction reliable?\nAccording to the linear regression model, what is the expected decrease in the number of residual bacteria per each $\\mu$g more of antibiotic?\nUse the following sums for the computations ($X$=Antibiotic dose and $Y$=Number of bacteria):\n$\\sum x_i=13.6$ $\\mu$g, $\\sum \\log(x_i)=2.1362$ $\\log(\\mbox{$\\mu$g})$, $\\sum y_j=176$ bacteria, $\\sum \\log(y_j)=23.9638$ $\\log(\\mbox{bacteria})$,\n$\\sum x_i^2=30.38$ $\\mu$g$^2$, $\\sum \\log(x_i)^2=6.3959$ $\\log(\\mbox{$\\mu$g})^2$, $\\sum y_j^2=4622$ bacteria$^2$, $\\sum \\log(y_j)^2=73.3096$ $\\log(\\mbox{bacteria})^2$,\n$\\sum x_iy_j=227$ $\\mu$g$\\cdot$bacteria, $\\sum x_i\\log(y_j)=37.4211$ $\\mu$g$\\cdot\\log(\\mbox{bacteria})$, $\\sum \\log(x_i)y_j=-17.633$ $\\log(\\mbox{$\\mu$g})$bacteria, $\\sum \\log(x_i)\\log(y_j)=3.6086$ $\\log(\\mbox{$\\mu$g})\\log(\\mbox{bacteria})$.\nShow solution $\\overline{x}=1.7$ $\\mu$g, $s_x^2=0.9075$ $\\mu$g$^2$.\n$\\bar y=22$ bacteria, $s_y^2=93.75$ bacteria$^2$.\n$s_{xy}=-9.025$ $\\mu$g$\\cdot$bacteria.\nLinear coefficient of determination $r^2 = 0.9574$.\n$\\overline{\\log(y)}=2.9955$ log(bacteria), $s_{\\log(y)}^2=0.1908$ log(bacteria)$^2$.\n$s_{x\\log(y)}=-0.4147$ $\\mu$g$\\cdot$ log(bacteria).\nExponential coefficient of determination $r^2 = 0.9928$.\nThus, the exponential model explains better the number of residual bacteria as a function of the antibiotic dose because the exponential coef. of determination is greater.\nExponential regression model: $y=e^{3.7723-0.4569x}$.\nPrediction: $y(3.5)=8.7845$ bacteria.\nAlthough the coef. 
of determination is close to 1, this prediction is not reliable because the sample size is very small.\n$b_{yx}=-9.9449$, therefore the number of bacteria decreases $9.9449$ per each $\\mu$g more of antibiotic.\nProbability and Random Variables Question 3 In women, the shoulder circumference follows a normal distribution with mean 98 cm and standard deviation 5 cm.\nCompute the percentage of women in the population with a shoulder circumference between 95 and 105 cm.\nAbove what value are the 5% of women with the highest shoulder circumference?\nCompute the probability that in a sample of 50 women there are at least 2 with a shoulder circumference less than 90 cm.\nShow solution Let $X$ be the shoulder circumference, then $X\\sim N(98, 5)$.\n$P(95\\leq X\\leq 105) = 0.645$, that is $64.5%$.\n$P_{95} = 106.22$ cm.\nLet $Y$ be the number of women with a shoulder circumference less than 90 cm in a sample of 50 women. Then, $Y\\sim B(50, 0.0548) \\approx P(2.74)$, and $P(Y\\geq 2) = 0.7585$.\nQuestion 4 It has been observed that a company of components for physiotherapy machines produces 12 defective components every 300 hours on average.\nWhat is the probability of producing more than 2 defective components in 100 hours?\nWhat is the probability of producing at most one defective component in 50 hours?\nIf there are 7 companies in Spain that produce these components, and assuming that all of them produce the same number of defective components on average, compute the probability that at least one company produces more than 3 defective components in 50 hours.\nShow solution Let $X$ be the number of defective components in 100 hours, then $X\\sim P(4)$, and $P(X\u0026gt;2) = 0.7619$.\nLet $Y$ be the number of defective components in 50 hours, then $Y\\sim P(2)$, and $P(Y\\leq 1) = 0.406$.\nLet $Z$ be the number of companies that produce more than 3 defective components in 50 hours in a sample of 7 companies, then $Z\\sim B(7, 0.1429)$, and $P(Z\\geq 1) = 
0.6601$.\nQuestion 5 We want to study the risk for a new vaccine to cause thrombi compared with a traditional vaccine. After applying the new vaccine to 1000 persons and the traditional vaccine to 3000 persons, we observed 30 persons with thrombi in the new vaccine group and 42 persons with thrombi in the traditional vaccine group.\nCompute the relative risk of suffering thrombi with the new vaccine and interpret it.\nCompute the odds ratio of suffering thrombi with the new vaccine and interpret it.\nWhich association measure is more reliable?\nIn a random experiment we applied both vaccines (in different moments) to a sample and we observed that 4% of persons suffered some thrombi (due to the new vaccine or to the traditional vaccine). Compute the probability of suffering thrombi with the new vaccine and no with the traditional one.\nAre the events corresponding to suffering thrombi with the new vaccine and the traditional vaccine independent?\nShow solution Let $T$ be the event of suffering thrombi.\n$RR(T)=2.1429$. Thus, the risk of suffering thrombi with the new vaccine is more than the double that with traditional vaccine.\n$OR(T)=2.1782$. Thus, the odds of suffering thrombi with the new vaccine is more than the double that with traditional vaccine.\nBoth measures are reliable because the study is prospective and we can estimate the incidence, but the relative risk is easier to interpret.\nLet $T_n$ and $T_t$ the events of suffering thrombi with the new and the traditional vaccines, respectively. 
$P(T_n\\cap \\overline{T_t}) = 0.026$.\n$P(T_t|T_n) = 0.1333 \\neq P(T_t) = 0.014$, thus the events are dependent.\n","date":1623024000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1623953052,"objectID":"7448cde3ef53e43e21e1233c2c8dd253","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-06-07/","publishdate":"2021-06-07T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-06-07/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 7, 2021\nDescriptive Statistics and Regression Question 1 To study the effectiveness of a new treatment for the polymyalgia rheumatica a sample of patients with polymyalgia was drawn and they were divided into two groups.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-06-07","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 5, 2021\nProbability and random variables Question 1 The average number of injuries in an international tennis tournament is 2.\nCompute the probability that in an international tennis tournament there are more than 2 injuries.\nIf a tennis circuit has 6 international tournaments, what is the probability that there are no injuries in some of them?\nShow solution Let $X$ be the number of injuries in a tournament, then $X\\sim P(2)$ and $P(X\u0026gt;2)=0.3233$.\nLet $Y$ be the number of tournaments in the tennis circuit with no injuries, then $Y\\sim B(6,0.1353)$ and $P(Y\u0026gt;0)=0.5821$.\nQuestion 2 The tables below corresponds to two tests $A$ and $B$ to detect an injury that have been applied to the same sample.\n$$ \\begin{array}{lcc} \\hline \\mbox{Test A} \u0026amp; \\mbox{Injury} \u0026amp; \\mbox{No injury} \\newline \\mbox{Outcome } + \u0026amp; 87 \u0026amp; 14 \\newline \\mbox{Outcome }- \u0026amp; 33 \u0026amp; 866 \\newline \\hline \\end{array} \\qquad \\begin{array}{lcc} \\hline \\mbox{Test 
B}\u0026amp; \\mbox{Injury} \u0026amp; \\mbox{No injury} \\newline \\mbox{Outcome }+ \u0026amp; 104 \u0026amp; 115 \\newline \\mbox{Outcome }- \u0026amp; 16 \u0026amp; 765 \\newline \\hline \\end{array} $$\nWhich test is more sensitive? Which one is more specific?\nAccording to the predictive values, which test is better to diagnose the injury? Which one is better to rule out the injury?\nAssuming that both tests are independent, what is the probability of getting a correct diagnosis with both tests if we apply both tests to a healthy person?\nAssuming that both tests are independent, what is the probability of getting at least one positive outcome if we apply both tests to a random person?\nShow solution Let $D$ be the event of suffering the injury, and $+$ and $-$ the events of getting a positive and a negative outcome in the test, respectively.\nTest $A$: sen = $0.725$ and spe = $0.9841$.\nTest $B$: sen = $0.8667$ and spe = $0.8693$.\nThus, test $A$ is more specific and test $B$ is more sensitive.\nTest $A$: PPV = $0.8614$ and NPV = $0.9633$.\nTest $B$: PPV = $0.4749$ and NPV = $0.9795$.\nThus, test $A$ is better to diagnose the injury and test $B$ is better to rule out the injury.\n$P(-_A\\cap -_B | \\overline{D}) = 0.8555$.\n$P(+_A\\cup +_B) = 0.2979$.\nQuestion 3 A study tries to determine the effect of a low fat diet on the lifetime of rats. The rats were divided into two groups, one with a normal diet and another with a low fat diet. It is assumed that the lifetimes of both groups are normally distributed with the same variance but different means. 
If 20% of rats with normal diet lived more than 12 months, 5% less than 8 months, and 85% of rats with low fat diet lived more than 11 months,\nCompute the means and the standard deviation of the lifetime of rats following a normal diet and a low fat diet?\nIf 40% of the rats were under a normal diet, and 60% of rats under a low fat diet, what is the probability that a random rat die before 9 months?\nShow solution Let $X$ be the life time of a random rat, and let $X_1$ and $X_2$ be the lifetime of rats with a normal diet and a low fat diet respectively,\n$\\mu_1=10.6461$ months, $\\mu_2=12.6673$ months and $s=1.6087$ months.\n$P(X\u0026lt;9)=0.068$.\n","date":1620172800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1620723858,"objectID":"82950554557ef075738b8ca9b4d2516b","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-05-05/","publishdate":"2021-05-05T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-05-05/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 5, 2021\nProbability and random variables Question 1 The average number of injuries in an international tennis tournament is 2.\nCompute the probability that in an international tennis tournament there are more than 2 injuries.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-05-05","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: March 17, 2021\nDescriptive Statistics and Regression Question 1 The chart below shows the distribution of the number of subjects passed in a sample of first year students of a degree.\nDraw the box and whiskers plot and interpret it.\nCompute the central tendency statistics and interpret them.\nHow is the asymmetry of the distribution? And the kurtosis? 
Can we assume that the sample comes from a normal population?\nIf the mean of subjects passed in the second year was 5.5 and the variance was 2, is the mean of the subjects passed in the first year more or less representative than the one of the second year?\nWhich student is better, a first year student who passes 7 subjects or a second year student who passes 6 subjects?\nUse the following sums for the computations: $\\sum x_i=478$ subjects, $\\sum x_i^2=3036$ subjects$^2$, $\\sum (x_i-\\bar x)^3=29.5$ subjects$^3$ and $\\sum (x_i-\\bar x)^4=1226.27$ subjects$^4$.\nShow solution Quartiles: $Q_1=5$ subjects, $Q_2=6$ subjects, $Q_3=7$ subjects. $IQR = 2$ subjects. Fences: $f_1=2$ subjects and $f_2=10$ subjects. 50% of central data fall between 5 and 7 subjects, that is a moderate dispersion. There are no outliers and the right whisker is a little bit longer than the left one, so the distribution is a little bit right-skewed but almost normal.\n$\\bar x=5.975$ subjects, $Me=6$ subjects and $Mo=6$ subjects. They are very close, and that means that the distribution is normal.\n$s^2=2.2494$ (subjects)$^2$, $s=1.4998$ subjects and $g_1=0.1093$, so that the distribution is slightly skewed to the right.\n$g_2=0.0295$, so that the distribution is a little bit more peaked than a Gauss bell.\nWe can assume that the sample comes from a normal population as both the coefficient of skewness and the coefficient of kurtosis are between -2 and 2.\nLet $Y$ be the number of subjects passed in the second year. Then, $cv_x=0.251$ and $cv_y=0.2571$. 
As the coefficient of variation of the first year is a little bit smaller than the one of the second year, the mean of the first year is a little bit more representative.\nStandard score for the first year: $z(7)=0.6834$.\nStandard score for the second year: $z(6)=0.3536$.\nAs the standard score of $7$ in the first year is greater than the standard score of $6$ in the second year, the first year student is better.\nQuestion 2 The table below shows the number of days of rehabilitation for a knee injury, and the knee flexion angle in degrees after those days.\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Days} \u0026amp; 10 \u0026amp; 15 \u0026amp; 20 \u0026amp; 25 \u0026amp; 30 \u0026amp; 35 \u0026amp; 40 \u0026amp; 45 \u0026amp; 50 \\newline \\mbox{Angle} \u0026amp; 45 \u0026amp; 58 \u0026amp; 65 \u0026amp; 75 \u0026amp; 82 \u0026amp; 88 \u0026amp; 91 \u0026amp; 93 \u0026amp; 94 \\newline \\hline \\end{array} $$\nCompute the covariance of the number of days of rehabilitation and the knee flexion angle, and interpret it.\nAccording to the regression line, by how many degrees does the knee flexion angle increase or decrease per day of rehabilitation?\nAccording to the logarithmic model, what is the expected number of degrees of the knee flexion angle after 32 days? Is this prediction more or less reliable than the prediction of the linear model?\nAccording to the exponential model, how many days of rehabilitation are required to get a knee flexion angle of 120 degrees? 
Is this prediction reliable?\nUse the following sums for the computations ($X$=Days of rehabilitation and $Y$=knee flexion angle):\n$\\sum x_i=270$ days, $\\sum \\log(x_i)=29.5894$ $\\log(\\mbox{days})$,$\\sum y_j=691$ degrees, $\\sum \\log(y_j)=38.8298$$\\log(\\mbox{degrees})$,\n$\\sum x_i^2=9600$ days$^2$, $\\sum \\log(x_i)^2=99.5821$$\\log(\\mbox{days})^2$, $\\sum y_j^2=55473$ degrees$^2$,$\\sum \\log(y_j)^2=168.0436$ $\\log(\\mbox{degrees})^2$,\n$\\sum x_iy_j=22560$ days$\\cdot$degrees,$\\sum x_i\\log(y_j)=1190.8727$ days$\\cdot\\log(\\mbox{degrees})$,$\\sum \\log(x_i)y_j=2346.0281$ $\\log(\\mbox{days})$degrees,$\\sum \\log(x_i)\\log(y_j)=128.738$$\\log(\\mbox{days})\\log(\\mbox{degrees})$.\nShow solution $\\overline{x}=30$ days, $s_x^2=166.6667$ days$^2$.\n$\\bar y=76.7778$ degrees, $s_y^2=268.8395$ degrees$^2$.\n$s_{xy}=203.3333$ days$\\cdot$degrees.\nAs the covariance is positive, there is a direct linear relation between the number of days of rehabilitation and the knee flexion angle.\n$b_{yx}=1.22$ degrees/day, therefore the knee flexion angle will increase$1.22$ degrees per day of rehabilitation.\n$\\overline{\\log(x)}=3.2877$ log(days), $s_{\\log(x)}^2=0.2557$log(days)$^2$ and $s_{\\log(x)y}=8.247$ log(days)degrees.\nLogarithmic regression model: $y=-29.2741+32.2571\\log(x)$.\nPrediction: $y(32)=82.5205$ degrees.\nThe logarithmic coefficient of determination is $0.9895$ and the linear coefficient of determination is $0.9227$. 
Thus, the prediction with the logarithmic model is more reliable as the coefficient of determination of the logarithmic model is greater.\nExponential regression model: $x=e^{0.9324+0.0307y}$.\nPrediction: $x(120)=100.8475$ days.\nThis prediction is not reliable as 120 degrees falls far away of the range of values observed in the sample for the knee flexion angle.\n","date":1615939200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616317723,"objectID":"8d60a06bc0b6ed3414eda4e966b2d502","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-03-17/","publishdate":"2021-03-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-03-17/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 17, 2021\nDescriptive Statistics and Regression Question 1 The chart below shows the distribution of the number of subjects passed in a sample of first year students of a degree.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-03-17","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 18, 2021\nQuestion 1 A drug is administered intravenously at a speed of 15 mg/hour. At the same time, the body methabolizes the drug at a rate of 80% of the amount in the body per hour.\nIf the drug is administered continuously, what will the maximum amount of drug in the body be? Assume that there was no drug in the body at the beginning of the process.\nIf administration is stopped when the amount administered is 150 mg, how long from that point will it take for the patient to have only 10 mg of drug in the body?\nSolution Let $x(t)$ be the amount of drug in the body at any time $t$.\nDifferential equation: $x\u0026rsquo;=15-0.8x$. Initial condition $x(0)=0$. 
Particular solution: $x(t)=18.75-18.75e^{-0.8t}$ and the maximum amount of drug in the body will be 18.75 mg.\nDifferential equation: $x\u0026rsquo;=-0.8x$. Initial condition $x(0)=18.74$. Particular solution: $x(t)=18.74e^{-0.8t}$ and the time required to have 10 mg of drug in the body will be $0.7851$ hours.\nResolución Question 2 The function $T(x,y)=\\ln(3xy+2x^2-y)$ gives the temperature of the surface of a mountain at latitude $x$ and longitude $y$. Some mountaineers are lost at position $(1,2)$ and are at risk of freezing to death.\nIn which direction should they move to avoid the risk of freezing as fast as possible?\nIf they are in the wrong direction and move so that the longitude decreases half of the increase of the latitude, will the risk of hypothermia increase or decrease?\nIn which direction should they move to keep the temperature constant?\nSolution $\\nabla T(1,2)=\\frac{1}{3}(5,1)$.\nLet $\\mathbf{u}$ be the vector $(1,-1/2)$, then $T\u0026rsquo;_{\\mathbf{u}}(1,2) = \\frac{3}{\\sqrt{5}}$ ºC.\nAlong the direction of the vector $(1,-5)$.\nResolución Question 3 A beach ball has a volume of 50 dm$^3$ at the time when we start to pump air into it at a rate of 2 dm$^3$/min.\nWhat is the speed at which the radius is changing?\nAbout when will the surface of the ball be twice its initial value?\nRemark: The volume of a sphere is $V(r)=\\frac{4}{3}\\pi r^3$ and the surface is $S(r)=4\\pi r^2$.\nSolution $\\dfrac{dr}{dt}=0.0305$ dm/min.\nUsing the linear approximation $dt = dS/S\u0026rsquo;=37.5013$ minutes approximately.\nResolución ","date":1610928000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1612727901,"objectID":"c844d2814ed646f5b268eee41227a75a","permalink":"/en/teaching/calculus/exams/pharmacy-2021-01-18/","publishdate":"2021-01-18T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2021-01-18/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 18, 2021\nQuestion 1 A drug is administered 
intravenously at a speed of 15 mg/hour. At the same time, the body methabolizes the drug at a rate of 80% of the amount in the body per hour.","tags":["Exam"],"title":"Pharmacy exam 2021-01-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 18, 2021\nQuestion 1 The table below contains the differences between the grades in the final school exam and the entrance exam in a sample of public high schools ($X$) and private high schools ($Y$):\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Public schools} \u0026amp; -1.2 \u0026amp; -0.7 \u0026amp; -0.4 \u0026amp; -0.9 \u0026amp; -1.6 \u0026amp; 0.5 \u0026amp; 0.2 \u0026amp; -1.8 \u0026amp; 0.8\\newline\n\\mbox{Private schools} \u0026amp; -2.1 \u0026amp; -0.5 \u0026amp; -0.7 \u0026amp; -1.9 \u0026amp; 0.2 \u0026amp; -2.8 \u0026amp; -1\\newline\n\\hline \\end{array} $$\nWhich of the following box plots corresponds to each variable? Compare the central dispersion of the two variables according to the box plots. In which variable is smaller the median? In which type of schools is more representative the mean of grades?\nIn which type of schools is more symmetric the distribution of grades?\nIn which type of schools is more peaked the distribution of grades?\nWhich difference is relatively smaller, $-0.5$ points in a public high school or $-1$ points in a private high school?\nUse the following sums for the computations:\nPublic: $\\sum x_i=-5.1$, $\\sum x_i^2=9.63$, $\\sum (x_i-\\bar x)^3=0.95$ and $\\sum (x_i-\\bar x)^4=8.76$.\nPrivate: $\\sum y_i=-8.8$, $\\sum y_i^2=17.64$, $\\sum (y_i-\\bar y)^3=-0.82$ and $\\sum (y_i-\\bar y)^4=11.28$.\nSolution The box plot 1 corresponds to private schools and the box plot 2 to public schools. The central dispersion is pretty similar in both variables. 
The median is smaller in private schools.\nPublic schools: $\\bar x=-0.5667$ , $s^2=0.7489$ , $s=0.8654$ and $cv=1.5271$.\nPrivate schools: $\\bar y=-1.2571$ , $s^2=0.9396$ , $s=0.9693$ and $cv=0.7711$.\nThus, the mean of the grade is more representative in private schools.\n$g_{1x}=0.1626$ and $g_{1y}=-0.1285$. Thus, the distribution of grades in private schools is more symmetric as the coefficient of skewness is closer to 0.\n$g_{2x}=-1.2651$ and $g_{2y}=-1.1748$. Thus, the distribution of grades in private schools is more peaked.\nPublic schools: $z(-0.5)=0.077$.\nPrivate schools: $z(-1)=0.2653$.\nThus, a difference of grades -0.5 in a public schools is relatively smaller than a difference of -1 in a private school.\nQuestion 2 An auditor is studying the relationship between the salary and the number of absences of a hospital warden. The following table shows the salary in thousands of euros ($X$) and the annual average of absences with that salary ($Y$).\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Salary} \u0026amp; 20.0 \u0026amp; 22.5 \u0026amp; 25 \u0026amp; 27.5 \u0026amp; 30.0 \u0026amp; 32.5 \u0026amp; 35.0 \u0026amp; 37.5 \u0026amp; 40.0 \\newline \\mbox{Absences} \u0026amp; 2.3 \u0026amp; 2.0 \u0026amp; 2 \u0026amp; 1.8 \u0026amp; 2.2 \u0026amp; 1.5 \u0026amp; 1.2 \u0026amp; 1.3 \u0026amp; 0.6 \\newline \\hline \\end{array} $$\nCompute the regression line that best explains the absences as a function of the salary.\nWhat is the expected number of absences that will have a warden with a salary of 29000€? 
Is this prediction reliable?\nHow much will the number of absences increase or decrease for every increment of 1000€ in the salary?\nUse the following sums for the computations:\n$\\sum x_i=270$ $10^3$€, $\\sum y_i=14.9$ absences,\n$\\sum x_i^2=8475$ ($10^3$€)$^2$, $\\sum y_i^2=27.11$ absences$^2$,\n$\\sum x_iy_j=420$ $10^3$€ absences.\nSolution $\\bar x=30$ $10^3$€, $s_x^2=41.6667$ ($10^3$€)$^2$,\n$\\bar y=1.6556$ absences, $s_y^2=0.2714$ absences$^2$,\n$s_{xy}=-3$ $10^3$€ absences.\nRegression line of absences on salary: $y=3.8156-0.072x$.\n$y(29) = 1.7276$ absences.\n$r^2 = 0.796$, thus the model fits well as the coefficient of determination is not far from 1, but the sample size is too small to be reliable the prediction.\nThe number of absences will decrease 0.072 for every increment of 1000€ in the salary.\nQuestion 3 In a regression study it is known that the regression line of $Y$ on $X$ is $y+2x-10=0$ and the regression line of $X$ on $Y$ is $y+3x-14=0$.\nCompute the means of $X$ and $Y$.\nCompute the linear correlation coefficient and interpret it.\nSolution $\\bar x=4$ and $\\bar y=2$.\n$r=-0.8165$. The linear correlation coefficient is near -1 so there is a strong inverse relation between $X$ and $Y$.\nQuestion 4 A test to detect prostate cancer produces 1% of false positives and 0.2% false negatives. 
It is known that 1 in 400 males suffer this type of cancer.\nCompute the sensitivity and the specificity of the test.\nIf a male got a positive outcome in the test, what is the chance of developing cancer?\nCompute and interpret the negative predictive value.\nIs this test better to predict or to rule out the cancer?\nTo study whether there is an association between the practice of sports and this type of cancer, a sample of 1000 males was drawn, of which 700 practised sports, and it was observed that there were 2 males with cancer in the group of males who practised sports, and there were 3 males with cancer in the group of males who did not practice sports. Compute the relative risk and the odds ratio and interpret them.\nSolution Let $D$ the event corresponding to suffering prostate cancer and $+$ and $-$ the events corresponding to get a positive and a negative outcome respectively.\nThe sensitivity is $P(+|D) = 0.2$ and specificity $P(-|\\overline D) = 0.99$.\nPositive predictive value $P(D|+) = 0.0476$.\nNegative predictive value $P(\\overline D|-) = 0.998$.\nAs the positive predictive value is smaller than the negative predictive value, this test is better to rule out the disease. In fact, we can not use this test to detect the prostate cancer because the positive predictive value is less than 0.5.\n$RR(D)=0.2857$ and $OR(D)=0.2837$. Thus, there is an association between the practice of sports and the prostate cancer and the risks and the odds of developing cancer is almost one fourth smaller if the male practice sports.\nResolución Question 5 The probability that a child of a mother with the color-blind gene and a father without the color-blind gene is a color-blind male is $0.25$. 
It is also known that in a population there is one color-blind male for every 5000 males.\nIf this couple has 5 children, what is the probability that at most 2 of them are color-blind males?\nIf this couple has 5 children, and the gender of the children is equiprobable, what is the probability that 3 or more are females?\nIn a random sample of 10000 males of this population, what is the probability that more than 3 are color-blind males?\nSolution Let $X$ be the number of color-blind sons in a sample of 5 children, then $X\\sim B(5, 0.25)$ and $P(X\\leq 2)=0.8965$.\nLet $Y$ be the number of girls in a sample of 5 children, then $Y\\sim B(5, 0.5)$ and $P(Y\\geq 3)=0.5$.\nLet $Z$ be the number of color-blind males in a sample of 10000 males, then $Z\\sim B(10000, 0.0002)\\approx P(2)$ and $P(Z\u0026gt;3)=0.1429$.\nResolución Question 6 The primate cranial capacity follows a normal distribution with mean 1200 cm$^3$ and standard deviation 140 cm$^3$.\nCompute the probability that the cranial capacity of a primate is greater than 1400 cm$^3$.\nCompute the probability that the cranial capacity of a primate is exactly than 1400 cm$^3$.\nAbove what cranial capacity will 20% of primates be?\nCompute the interquartile range of the cranial capacity of primates and interpret it.\nSolution Let $X$ be the primate cranial capacity. Then $X\\sim N(1200,140)$.\n$P(X\u0026gt;1400) = 0.0766$.\n$P(X=1400) = 0$.\n$P_{80} = 1317.827$ cm$^3$.\n$Q_1 = 1105.5714$ cm$^3$, $Q_3 = 1294.4286$ cm$^3$ and $IQR = 188.8571$ cm$^3$. 
Thus the 50% of central data will be concentranted in an interval of width $188.8571$ cm$^3$, that is a small spread.\nResolución ","date":1610928000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1612657121,"objectID":"4c570671d341494644ab5f9fa875dc09","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-01-18/","publishdate":"2021-01-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-01-18/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 18, 2021\nQuestion 1 The table below contains the differences between the grades in the final school exam and the entrance exam in a sample of public high schools ($X$) and private high schools ($Y$):","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-01-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 23, 2020\nQuestion 1 A test to detect the COVID19 was applied to 850 persons infected by COVID19 with a positive outcome in 800 of them, and it was also applied to 9150 non-infected persons with a positive outcome in 10% of them.\nCompute the sensitivity and the specificity of the test.\nCompute the positive and the negative predictive values and interpret them.\nCompute the probability of a correct diagnostic.\nSolution Let $D$ the event corresponding to suffering COVID19 and $+$ and $-$the events corresponding to get a positive and a negative outcome respectively.\nThe sensitivity is $P(+|D) = 0.9412$ and specificity $P(-|\\overline D) = 0.9$.\nPositive predictive value $P(D|+) = 0.4665$ and negative predictive value $P(\\overline D|-) = 0.994$. 
As the positive predictive value is less than 0.5 we cannot use this test to confirm COVID19, but we can use it to rule it out with a strong confidence since the negative predictive value is pretty close to 1.\n$P(D\\cap +) + P(\\overline D\\cap -) = 0.9035$.\nQuestion 2 A newborn baby affected by Moebius syndrome blinks, on average, twice a minute.\nCompute the probability that a newborn blinks twice in half a minute.\nIn a hospital five children have been born with Moebius syndrome. Compute the probability that at least 3 of them blink in their first minute of life.\nIn which distribution is the mean more representative, in the number of times that a newborn blinks in a minute or in the number of times that a newborn blinks in half a minute?\nSolution Let $X$ be the number of times that a newborn blinks in half a minute, then $X\\sim P(1)$ and $P(X=2)=0.1839$. Let $Y$ be the number of newborns that blink in their first minute of life in a sample of 5 newborns, then $Y\\sim B(5,0.8647)$ and $P(Y\\geq 3)=0.98$. Let $Z$ be the number of times that a newborn blinks in a minute, then $cv_z = 0.7071$ and $cv_x = 1$. Thus, the mean of $Z$ is more representative since its coefficient of variation is smaller. Question 3 The prolactin level in pregnant and non-pregnant females follows a normal distribution with different means but with the same variance. When the prolactin levels exceed 15 ng/ml, females secrete milk through their mammary glands. 
It is known that 95% of pregnant females secrete milk but only 1% of non-pregnant females secret milk.\nIf the median of the prolactin level in pregnant females is 16 ng/ml, what are the means and the standard deviation of the prolactin level in both populations?\nCompute the percentage of pregnant females with a prolactin level between 15.5 and 17 ng/ml.\nCompute the prolactin level such that 20% of pregnant females are above that level.\nSolution Let $X$ and $Y$ be the prolactin levels in pregnant and non-pregnant females respectively.\n$\\mu_x=16$ ng/ml, $\\mu_y=13.5857$ ng/ml and $\\sigma=0.608$ ng/ml.\n$P(15.5\u0026lt;X\u0026lt;17) = 0.7446$, so 74.4583% of pregnant females.\n$P_{80} = 16.5117$ ng/ml.\nQuestion 4 An organism has the same chance of being infected by a virus and a bacteria. At the same time, the probability of being infected by a virus doubles when the organism has been previously infected by a bacteria. On the other hand, the probability of being infected by no pathogen (neither virus nor bacteria) is $0.52$.\nWhat is the probability of being infected by a virus and a bacteria at the same time?\nWhat is the probability of being infected by a bacteria if it has been infected by a virus?\nWhat is the probability of being infected only by a virus?\nAre the events of being infected by a virus an a bacteria independent?\nSolution Let $V$ and $B$ the events corresponding to be infected by a virus and a bacteria respectively.\n$P(V\\cap B) = 0.32$.\n$P(B|V) = 0.8$.\n$P(V\\cap \\overline B) = 0.08$.\nThe events are dependents since $P(V) = 0.4 \\neq 0.8 = P(V|B)$.\n","date":1606089600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1611391437,"objectID":"96467a52325f88fdb9dbe65d260306cc","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-11-23/","publishdate":"2020-11-23T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-11-23/","section":"teaching","summary":"Degrees: Pharmacy, 
Biotechnology\nDate: November 23, 2020\nQuestion 1 A test to detect the COVID19 was applied to 850 persons infected by COVID19 with a positive outcome in 800 of them, and it was also applied to 9150 non-infected persons with a positive outcome in 10% of them.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2020-11-23","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 26, 2020\nQuestion 1 The table below shows the daily number of patients hospitalized in a hospital during the month of September.\n$$ \\begin{array}{cr} \\mbox{Patients} \u0026amp; \\mbox{Frequency} \\newline \\hline (10,14] \u0026amp; 6 \\newline (14,18] \u0026amp; 10 \\newline (18,22] \u0026amp; 7 \\newline (22,26] \u0026amp; 6 \\newline (26,30] \u0026amp; 1 \\newline \\hline \\end{array}$$\nStudy the spread of the 50% of central data.\nCompute the mean and study the dispersion with respect to it.\nStudy the normality of the patients distribution.\nIf the mean was 35 patients and the variance 40 patients$^2$ during the month of April, which month had a higher relative variability?\nWhich number of people hospitalized was greater, 20 persons in September or 40 in April?\nUse the following sums for the computations:\n$\\sum x_in_i=544$ patients, $\\sum x_i^2n_i=10464$ patients$^2$, $\\sum (x_i-\\bar x)^3n_i=736.14$ patients$^3$ and $\\sum (x_i-\\bar x)^4n_i = 25367.44$ patients$^4$.\nSolution $Q_1=16$ patients, $Q_3=20$ patients and $IQR=4$ patients. Thus the central dispersion is small.\n$\\bar x=18.1333$ patients, $s^2=19.9822$ patients$^2$, $s=4.4701$ patients and $cv=0.2465$. Thus, the dispersion with respect to the mean is small and the mean represents well.\n$g_1=0.2747$ and $g_2=-0.8823$. 
As the coefficient of skewness and the coefficient of kurtosis fall between -2 and 2, we can assume that the sample comes from a normal population.\nLet $Y$ be the daily number of patients hospitalized during April. Then, $cv_y=0.1807$. Since the coefficient of variation in September is greater than the one in April, there is a relative higher variability in September.\nSeptember: $z(20)=0.4176$.\nApril: $z(40)=0.7906$.\nThus, 40 patients hospitalized in April is relatively higher than 20 in September as its standard score is greater.\nQuestion 2 The chart below shows the distribution of scores in three subjects.\nWhich subject is more difficult?\nWhich subject has more central dispersion?\nWhich subjects have outliers?\nWhich subject is more asymmetric?\nSolution Subject $Y$ because its scores are smaller.\nSubject $X$ because the box is wider.\nSubject $Z$ because there is a score out of the whiskers.\nSubject $Z$ because the distance from the first quartile to the median (left side of the box) is greater than the distance from the third quartile to the median (right side of the box).\nQuestion 3 In a sample of 10 families with a son older than 20 it has been measured the height of the father ($X$), the mother ($Y$) and the son ($Z$) in centimetres, getting the following results:\n$\\sum x_i=1774$ cm, $\\sum y_i=1630$ cm, $\\sum z_i=1795$ cm,\n$\\sum x_i^2=315300$ cm$^2$, $\\sum y_i^2=266150$ cm$^2$, $\\sum z_i^2=322737$ cm$^2$,\n$\\sum x_iy_j=289364$ cm$^2$, $\\sum x_iz_j=318958$ cm$^2$, $\\sum y_iz_j=292757$ cm$^2$.\nOn which height does the height of the son depend more linearly, the height of the father or the mother?\nUsing the best linear regression model, predict the height of a son with a father 181 cm tall and a mother 163 cm tall.\nAccording to the linear model, how much will increase the height of the son for each centimetre that increases the height of the father? 
And for each centimetre that increases the height of the mother?\nHow would the reliability of the prediction be if the heights were measured in inches? (An inch is 2.54 cm).\nSolution $\\bar x=177.4$ cm, $s_x^2=59.24$ cm$^2$,\n$\\bar y=163$ cm, $s_y^2=46$ cm$^2$,\n$\\bar z=179.5$ cm, $s_z^2=53.45$ cm$^2$,\n$s_{xz}=52.5$ cm$^2$ and $s_{yz}=17.2$ cm$^2$.\n$r^2_{xz}=0.8705$ and $r^2_{yz}=0.1203$, thus the height of the son depends linearly more on the height of the father since the $r^2_{xz}\u0026gt;r^2_{yz}$.\nRegression line of $Z$ on $X$: $z=22.2836 + 0.8862x$ and $z(181)=182.6904$ cm.\nThe height of the son will increase $0.8862$ cm per cm of the height of the father and $0.3739$ cm per cm of the height of the mother.\nThe reliability of the prediction will be the same, as after applying the same linear transformation to $X$ and $Z$, the variances are multiplied by the square of the slope and the covariance is also multiplied by the square of the slope.\n","date":1603670400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1611391437,"objectID":"0912b77d9377cae59002d5676b9dded0","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-10-26/","publishdate":"2020-10-26T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-10-26/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 26, 2020\nQuestion 1 The table below shows the daily number of patients hospitalized in a hospital during the month of September.\n$$ \\begin{array}{cr} \\mbox{Patients} \u0026amp; \\mbox{Frequency} \\newline \\hline (10,14] \u0026amp; 6 \\newline (14,18] \u0026amp; 10 \\newline (18,22] \u0026amp; 7 \\newline (22,26] \u0026amp; 6 \\newline (26,30] \u0026amp; 1 \\newline \\hline \\end{array}$$","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2020-10-26","type":"book"},{"authors":null,"categories":["R"],"content":"Table of Contents What is rkTeaching? 
Installation Installation on Windows Installation on Mac OS Installation on Linux Statistical procedures Functionality How to cite rkTeaching? What is rkTeaching? rkTeaching is an R package that provides a plugin for the graphical user interface RKWard adding new menus and dialogs specially designed for teaching and learning Statistics.\nThis package has been developed and is maintained by Alfredo Sánchez Alberca asalber@ceu.es in the Department of Applied Math and Statistics of the San Pablo CEU of Madrid.\nIf you find an error or have a suggestion, please let me know by email or by opening an issue on GitHub.\nInstallation Installation on Windows For Windows users there is a bundle that includes R, RKWard and rkTeaching.\nDownload the latest version (R version 4.3, RKWard version 0.8, rkTeaching version 1.3.0)\nDownload the previous version (R version 3.6.2, RKWard version 0.7.1b, rkTeaching version 1.3.0)\nOnce the file is downloaded, all you have to do is to execute it. It will ask for the installation drive and directory. It is recommended to install it on the root of drive C, that is C:\\. The installation creates a folder RKWard in the installation directory. There, in the bin folder you have to execute the rkward.exe file to start the program.\nThe following video tutorial shows the installation process (in Spanish).\nInstallation on Mac OS To install the software on Mac OS systems, you must take the following steps:\nInstall R. R can be downloaded from the following link https://cran.r-project.org/.\nIt is recommended to install the version 4.3 of R for MacOs. Depending on the computer processor, you must select the arm64 version for computers with a silicon chip (M1-3) or the x86_64 version for computers with an Intel chip.\nR version 4.3 for MacOs with silicon chip (M1-3) R version 4.3 for MacOs with Intel chip (x86) Install RKWard. 
RKWard can be downloaded from the web https://rkward.kde.org/.\nYou must select the distribution corresponding to Mac Os ( https://rkward.kde.org/RKWard_on_Mac.html).\nAfter downloading it follow the installation instructions\nIt is important to have a version of Mac OS X 10.15 or higher, because RKWard does not work with previous versions.\nIf you get some errors during the installation process, check for possible solutions at ( http://rkward.sourceforge.net/wiki/RKWard_on_Mac#Troubleshooting)\nInstall the packages that rkTeaching depends on. The rkTeaching package depends on several packages that should be installed first. To install these packages you must run RKWard, open the R console and type the following commands:\ninstall.packages(c(\u0026quot;R2HTML\u0026quot;,\u0026quot;car\u0026quot;,\u0026quot;e1071\u0026quot;,\u0026quot;Hmisc\u0026quot;, \u0026quot;ez\u0026quot;, \u0026quot;multcomp\u0026quot;, \u0026quot;psych\u0026quot;, \u0026quot;probs\u0026quot;, \u0026quot;tidyverse\u0026quot;, \u0026quot;knitr\u0026quot;, \u0026quot;kableExtra\u0026quot;, \u0026quot;remotes\u0026quot;)) Install rkTeaching. To install the rkTeaching package you must type the following commands in the R console:\nlibrary(remotes) install_github(\u0026quot;rkward-community/rk.Teaching\u0026quot;) The following video tutorial shows the installation process (only for RKWard version 0.7.0).\nInstallation on Linux To install the software in Linux systems, you must take the following steps:\nInstall R. R can be downloaded from the web https://cran.r-project.org/. You have to select the Linux distribution and follow the instructions there. R version 3.4 or higher is required.\nWith Debian based distributions like Ubuntu, you can install R from the command line by typing the command:\nsudo apt-get install r-base Install RKWard. RKWard can be downloaded from the web https://rkward.kde.org/. 
You have to select the Linux distribution and follow the instructions there.\nWith Debian based distributions like Ubuntu, you can install RKWard from the command line by typing the command:\nsudo apt-get install rkward Install the packages that rkTeaching depends on. The rkTeaching package depends on several packages that should be installed first. To install these packages you must run RKWard, open the R console and type the following commands:\ninstall.packages(c(\u0026quot;R2HTML\u0026quot;,\u0026quot;car\u0026quot;,\u0026quot;e1071\u0026quot;,\u0026quot;Hmisc\u0026quot;, \u0026quot;ez\u0026quot;, \u0026quot;multcomp\u0026quot;, \u0026quot;psych\u0026quot;, \u0026quot;probs\u0026quot;, \u0026quot;tidyverse\u0026quot;, \u0026quot;knitr\u0026quot;, \u0026quot;kableExtra\u0026quot;, \u0026quot;remotes\u0026quot;)) Install rkTeaching. To install the rkTeaching package you must type the following commands in the R console:\nlibrary(remotes) install_github(\u0026quot;rkward-community/rk.Teaching\u0026quot;) The following video tutorial shows the installation process (in Spanish).\nStatistical procedures Once installed, a new menu Teaching will appear in RKWard with the following statistical procedures:\nData manipulation: Filter data Calculate variable Recoding variable Weight data Frequency distributions: Frequency tabulation Bidimensional frequency tabulation Plots: Bar chart Histogram Pie chart Box plot Means chart Interaction chart Line chart Scatterplot Scatterplot matrix Descriptive statistics Statistics Regression: Correlation Linear Regression Non linear regression Regression model comparison Regression prediction Parametric tests: Means: T test for one sample T test for two independent samples T test for two paired samples ANOVA Sample size calculation for mean estimation Variances: Fisher test for two samples Levene test for multiple samples Proportions: Test for one proportion Test for two proportions Sample size calculation for proportion estimation Non parametric 
tests: Normality tests: Shapiro-Wilk, Kolmogorov U Mann-Whitney test Wilcoxon test Friedman test Kruskal-Wallis test Chi-square test Concordance Intraclass correlation coefficient Cohen\u0026rsquo;s kappa Probability: Random games: Coins Dice Cards Urn Build probability space Combine probability spaces Repeat probability space Calculate probability Probability distributions Discrete: Binomial Geometric Hypergeometric Poisson Continuous: Uniform Normal Chi-square Student\u0026rsquo;s T Fisher\u0026rsquo;s F Simulations: Law of rare events Functionality Menus and dialogs specially designed to ease learning, leaving out uncommon options to get a simplified and intuitive interface.\nAll the dialogs have a wizard that guides the user step by step through the statistical procedure. HTML output that presents the results of the analysis in a clear and concise way. Charts based on the modern ggplot2 package. Computation formulas and details available for some statistical procedures. rkTeaching is maintained by asalber.\nHow to cite rkTeaching? Sánchez-Alberca, A. (2024). rkTeaching (version 1.4) [software]. 
Get from: http://aprendeconalf.es/projects/rkteaching.\n","date":1598918400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1645683840,"objectID":"91a88147a2d56cb2aead87000b360f2e","permalink":"/en/project/rkteaching/","publishdate":"2020-09-01T00:00:00Z","relpermalink":"/en/project/rkteaching/","section":"project","summary":"An R package for teaching and learning Statistics","tags":["RKWard","rkTeaching","Software"],"title":"rkTeaching","type":"project"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 19, 2020\nDescriptive Statistics and Regression Question 1 To see if the confinement due to COVID-19 has influenced the performance of a course, the number of failed subjects of each student in the current course and in the previous year course has been counted, obtaining the table below.\n$$ \\begin{array}{crr} \\mbox{Failed subjects} \u0026amp; \\mbox{Previous year course} \u0026amp; \\mbox{Current course} \\newline \\hline 0 \u0026amp; 7 \u0026amp; 8 \\newline 1 \u0026amp; 15 \u0026amp; 12 \\newline 2 \u0026amp; 11 \u0026amp; 8 \\newline 3 \u0026amp; 5 \u0026amp; 7 \\newline 4 \u0026amp; 4 \u0026amp; 3 \\newline 5 \u0026amp; 2 \u0026amp; 2 \\newline 6 \u0026amp; 1 \u0026amp; 2 \\newline 8 \u0026amp; 0 \u0026amp; 1 \\newline \\hline \\end{array}$$\nDraw the box plots of the failed subjects in the current and the previous year courses and compare them. Can we assume that both samples come from a normal population? In which sample the mean is more representative? Which number of failed subjects is greater, 7 in the current course or 6 in the previous year course? 
Use the following sums for the computations:\nPrevious year course: $\\sum x_in_i=84$, $\\sum x_i^2n_i=254$, $\\sum (x_i-\\bar x)^3n_i=122.99$ and $\\sum (x_i-\\bar x)^4n_i=669.21$.\nCurrent course: $\\sum y_in_i=91$, $\\sum y_i^2n_i=341$, $\\sum (y_i-\\bar y)^3n_i=301.16$ and $\\sum (y_i-\\bar y)^4n_i=2012.88$.\nShow solution Both distributions are pretty similar. The central dispersion is the same and both are right skewed. The only difference is that there is an outlier in the current year distribution. 2. Previous year course: $\\bar x=1.8667$, $s^2=2.16$, $s=1.4697$, $g_1=0.8609$ and $g_2=0.1874$. Current course: $\\bar y=2.1163$, $s^2=3.4516$, $s=1.8578$, $g_1=1.0922$ and $g_2=0.9292$. As the coefficients of skewness and kurtosis are between -2 and 2, we can assume that both distributions come from a normal distribution. 3. Previous year course: $cv=0.7873$. Current year: $cv=0.8779$. Thus, the mean is more representative in the previous year course, since the coefficient of variation is smaller. 4. Previous year course: $z(6)=2.8124$. Current course: $z(7)=2.6287$. Thus, 7 failed subjects in the current course is relatively less than 6 in the previous year course, since the standard score is smaller.\nQuestion 2 A study tries to develop a new technique for detecting a certain antibody. For this, a piezoelectric immunosensor is used, which allows measuring the change in the signal in Hz by varying the concentration of the antibody ($\\mu$g/ml). 
The table below presents the data collected.\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Concentration ($\\mu$g/ml)} \u0026amp; 5 \u0026amp; 8 \u0026amp; 20 \u0026amp; 35 \u0026amp; 50 \u0026amp; 80 \u0026amp; 110 \\newline \\mbox{Signal (Hz)} \u0026amp; 50 \u0026amp; 70 \u0026amp; 100 \u0026amp; 150 \u0026amp; 170 \u0026amp; 190 \u0026amp; 200 \\newline \\hline \\end{array}$$\nCompute the logarithmic model of the change in the signal on the concentration of the antibodies.\nIt was observed that at a concentration of 100 $\\mu$g/ml the change in signal tends to stabilize. Predict the value of the signal corresponding to such concentration using the logarithmic model.\nPredict the antibody concentration that corresponds to a change in the signal of 120 Hz using the exponential model.\nUse the following sums for the computations ($X$=Concentration and $Y$=Signal):\n$\\sum x_i=308$ $\\mu$g/ml, $\\sum \\log(x_i)=23.2345$ $\\log(\\mbox{$\\mu$g/ml})$, $\\sum y_j=930$ Hz, $\\sum \\log(y_j)=33.4575$ $\\log(\\mbox{Hz})$,\n$\\sum x_i^2=22714$ $\\mu$g/ml$^2$, $\\sum \\log(x_i)^2=85.1299$ $\\log(\\mbox{$\\mu$g/ml})^2$, $\\sum y_j^2=144900$ Hz$^2$, $\\sum \\log(y_j)^2=161.6475$ $\\log(\\mbox{Hz})^2$,\n$\\sum x_iy_j=53760$ $\\mu$g/ml$\\cdot$Hz, $\\sum x_i\\log(y_j)=1580.3905$ $\\mu$g/ml$\\cdot\\log(\\mbox{Hz})$, $\\sum \\log(x_i)y_j=3496.6333$ $\\log(\\mbox{$\\mu$g/ml})$Hz, $\\sum \\log(x_i)\\log(y_j)=114.7297$ $\\log(\\mbox{$\\mu$g/ml})\\log(\\mbox{Hz})$.\nShow solution $\\overline{\\log(x)}=3.3192$ log($\\mu$g/ml), $s_{\\log(x)}^2=1.1442$ log($\\mu$g/ml)$^2$. $\\bar y=132.8571$ Hz, $s_y^2=3048.9796$ Hz$^2$. $s_{\\log(x)y}=58.5379$ log($\\mu$g/ml)Hz. Logarithmic regression model: $y=-36.9501+51.1589\\log(x)$. Prediction: $y(100)=198.6453$ Hz. Exponential regression model: $x=e^{0.7685+0.0192y}$. Prediction: $x(120)=21.5929$ $\\mu$g/ml. Probability and Random Variables Question 3 Two symptoms of COVID-19 are fever and cough. 
We know that 30% of people with COVID-19 cough and 20% have fever and cough. Also, if somebody with COVID-19 has fever then the probability of coughing is 0.5.\nConstruct the probability tree for the sample space of the random experiment consisting in picking a random person with COVID-19 and measuring the symptoms that he or she has.\nCalculate the probability of having any of the symptoms.\nCalculate the probability of having only cough.\nCalculate the probability of having only fever.\nCalculate the probability of having neither fever nor cough.\nAre the symptoms dependent or independent?\nShow solution Let $C$ and $F$ be the events of having cough and fever respectively. According to the statement $P(C)=0.3$, $P(C\\cap F)=0.2$ and $P(C|F)=0.5$. 2. $P(C\\cup F) = 0.5$. 3. $P(C\\cap \\overline F) = 0.1$. 4. $P(\\overline C \\cap F) = 0.2$. 5. $P(\\overline C \\cap \\overline F) = 0.5$. 6. The events are dependent since $P(C)\\neq P(C|F)$. Question 4 The sensitivity and specificity of a diagnostic test are 0.58 and 0.01, respectively, and the probability of a true positive is 0.02.\nCalculate the prevalence of the disease.\nCalculate predictive values.\nIs the test more useful to rule out or confirm the disease?\nIf we have 10 non-sick patients, what is the probability that more than 9 have a misdiagnosis?\nIf we have 60 patients, what is the probability that at least two of them have a correct diagnosis?\nShow solution $P(D) = 0.0345$. $PPV = P(D|+) = 0.0205$ and $NPV = P(\\overline D|-) = 0.4$. The test is not helpful to confirm nor to rule out the disease, since both the positive and the negative predictive values are below 0.5. Let $X$ be the number of non-sick patients with a positive outcome, then $X\\sim B(10, 0.99)$, and $P(X\\geq 9)=0.9957$. Let $Y$ be the number of patients with a correct diagnosis, then $Y\\sim B(60, 0.0297)\\approx P(1.7793)$, and $P(Y\\geq 2)=0.531$. 
Question 5 The time required to cure a basketball injury with a rehabilitation technique follows a normal distribution with quartiles $Q_1 = 22$ days and $Q_2 = 25$ days.\nCalculate the mean and standard deviation of the curation time.\nIf a player has just been injured and has to play a match in 30 days, what is the probability that he will miss it?\nCalculate the interquartile range of the curation time distribution.\nShow solution Let $X$ be the time required to cure the injury, then $X\\sim N(25, 4.4478)$. $P(X \u0026gt; 30) = 0.1305$. $IQR = 6$ days. ","date":1592524800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4355f28ec946456c86aa32dfd51f95bd","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-06-19/","publishdate":"2020-06-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-06-19/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 19, 2020\nDescriptive Statistics and Regression Question 1 To see if the confinement due to COVID-19 has influenced the performance of a course, the number of failed subjects of each student in the current course and in the previous year course has been counted, obtaining the table below.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2020-06-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 25, 2020\nDescriptive Statistics and Regression Question 1 In a course there are 150 students, of which 50 are working students and the other 100 non-working students. 
The table below shows the frequency distribution of the grade in an exam of these two groups:\n$$ \\begin{array}{crr} \\mbox{Grade} \u0026amp; \\mbox{Num non-working students} \u0026amp; \\mbox{Num working students} \\newline \\hline 0-2 \u0026amp; 8 \u0026amp; 2 \\newline 2-4 \u0026amp; 15 \u0026amp; 9 \\newline 4-6 \u0026amp; 25 \u0026amp; 19 \\newline 6-8 \u0026amp; 38 \u0026amp; 11 \\newline 8-10 \u0026amp; 14 \u0026amp; 9 \\newline \\hline \\end{array} $$\nCompute the percentage of students that passed the exam (a grade 5 or above) in both groups, working and non-working students.\nIn which group is there a higher relative dispersion of the grade with respect to the mean?\nWhich grade distribution is more asymmetric, the distribution of working students, or the non-working students one?\nTo apply for a scholarship to go abroad, the grade must be transformed applying the linear transformation $Y = 0.5 + X * 1.45$. Compute the mean of Y for the two groups. How does the asymmetry of the two groups change?\nWhich grade is relatively higher, 6 in the working students group, or 7 in the non-working students group?\nUse the following sums for the computations:\nNon-working students: $\\sum x_in_i=570$, $\\sum x_i^2n_i=3764$, $\\sum (x_i-\\bar x)^3n_i=-547.8$ and $\\sum (x_i-\\bar x)^4n_i=6475.73$.\nWorking students: $\\sum y_in_i=282$, $\\sum y_i^2n_i=1826$, $\\sum (y_i-\\bar y)^3n_i=-1.31$ and $\\sum (y_i-\\bar y)^4n_i=2552.14$.\nSolution 35.5% of non-working students scored below 5, so 64.5% passed; 41% of working students scored below 5, so 59% passed. Non-working students: $\\bar x=5.7$, $s^2=5.15$, $s=2.2694$ and $cv=0.3981$. Working students: $\\bar y=5.64$, $s^2=4.7104$, $s=2.1703$ and $cv=0.3848$. The sample of non-working students has a slightly higher relative dispersion with respect to the mean as the coefficient of variation is greater. Non-working students: $g_1=-0.4687$. Working students: $g_1=-0.0026$. 
Thus, the sample of non-working students is more asymmetric as the coefficient of skewness is further from 0. Non-working students: $\\bar y=8.765$. Working students: $\\bar x=8.678$. The coefficient of skewness does not change as the slope of the linear transformation is positive. Non-working students: $z(7)=0.5728$. Working students: $z(6)=0.1659$. Thus, a 7 in the sample of non-working students is relatively higher than a 6 in the sample of working students, as its standard score is greater. Question 2 The effect of a doping substance on the response time to a given stimulus was analyzed in a group of patients. The same amount of substance was administered in successive doses, from 10 to 80 mg, to all the patients. The table below shows the average response time to the stimulus, expressed in hundredths of a second:\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Dose (mg)} \u0026amp; 10 \u0026amp; 20 \u0026amp; 30 \u0026amp; 40 \u0026amp; 50 \u0026amp; 60 \u0026amp; 70 \u0026amp; 80 \\newline \\mbox{Response time ($10^{-2}$ s)} \u0026amp; 28 \u0026amp; 46 \u0026amp; 62 \u0026amp; 81 \u0026amp; 100 \u0026amp; 132 \u0026amp; 195 \u0026amp; 302 \\newline \\hline \\end{array} $$\nAccording to the linear regression model, how much will the response time increase or decrease for each mg we increase the dose?\nBased on the exponential model, what will be the expected response time for a 75 mg dose?\nIf a response time greater than one second is considered dangerous for health, from what level should the administration of the doping substance be regulated, or even prohibited, according to the logarithmic model?\nUse the following sums for the computations:\n$\\sum x_i=360$ mg, $\\sum \\log(x_i)=29.0253$ $\\log(\\mbox{mg})$, $\\sum y_j=946$ $10^{-2}$ s, $\\sum \\log(y_j)=36.1538$ $\\log(\\mbox{$10^{-2}$ s})$,\n$\\sum x_i^2=20400$ mg$^2$, $\\sum \\log(x_i)^2=108.7717$ $\\log(\\mbox{mg})^2$, $\\sum y_j^2=169958$ $10^{-2}$ s$^2$, $\\sum \\log(y_j)^2=167.5694$ 
$\\log(\\mbox{$10^{-2}$ s})^2$,\n$\\sum x_iy_j=57030$ mg$\\cdot 10^{-2}$ s, $\\sum x_i\\log(y_j)=1758.6576$ mg$\\cdot\\log(\\mbox{$10^{-2}$ s})$, $\\sum \\log(x_i)y_j=3795.4339$ $\\log(\\mbox{mg})10^{-2}$ s, $\\sum \\log(x_i)\\log(y_j)=134.823$ $\\log(\\mbox{mg})\\log(\\mbox{$10^{-2}$ s})$.\nSolution $\\bar x=45$ mg, $s_x^2=525$ mg$^2$. $\\bar y=118.25$ $10^{-2}$ s, $s_y^2=7261.6875$ $10^{-4}$ s$^2$. $s_{xy}=1807.5$ mg$\\cdot 10^{-2}$ s. $b_{yx} = 3.4429$ $10^{-2}$ s/mg. Therefore, the response time increases $3.4429$ hundredths of a second for each mg the dose is increased. $\\overline{\\log(y)}=4.5192$ log($10^{-2}$ s), $s_{\\log(y)}^2=0.5227$ log($10^{-2}$ s)$^2$. $s_{x\\log(y)}=16.4669$ mg$\\cdot\\log(10^{-2}$ s). Exponential regression model: $y=e^{3.1078+0.0314x}$. Prediction: $y(75)=235.1434$ $10^{-2}$ s. Exponential coefficient of determination: $r^2=0.988$ Thus, the exponential model fits almost perfectly to the cloud of points of the scatter plot, but the sample is too small to get reliable predictions. Logarithmic regression model: $x=-97.3603+31.501\\ln(y)$. Prediction: $x(100)=47.7072$ mg. Probability and Random Variables Question 3 A hospital orders a DNA compatibility test to three labs A, B and C. Lab A performs 40 test a day, lab B 50, and lab C 60. It is known that the probability of a wrong diagnose is 20% in lab A, 18% in lab B and 22% in lab C. If we select a random test of the hospital,\nCompute the probability of wrong diagnose in that test.\nIf the test is wrong, what is the probability that it has been performed by lab B?\nIf the test is right, which lab is more likely to have performed the test?\nSolution Let $A$, $B$ and $C$ be the events of performing the test in labs $A$, $B$ and $C$ respectively, and $R$ the event of getting a right diagnose. According to the statement $P(A)=0.2667$, $P(B)=0.3333$, $P(C)=0.4$, $P(R|A)=0.8$, $P(R|B)=0.82$ and $P(R|C)=0.78$.\n$P(\\overline R) = 0.2013$. $P(B|\\overline R) = 0.298$. 
$P(A|R) = 0.2671$, $P(B|R) = 0.3422$ and $P(C|R) = 0.3907$, thus, it is more likely that it has been performed in lab $C$. Question 4 An epidemiological study tries to determine the effectiveness of face masks to prevent the COVID19. In a sample 4000 persons without the virus and 1000 persons with it were selected. It was observed that in the group of infected people 120 had used face masks in the two previous weeks, while in the non-infected group, 1250 had used face masks in the two previous weeks.\nCompute the relative risk of being infected with face masks.\nCompute the odds ratio of being infected with face masks.\nWhich association measure is more reliable?\nSolution Let $D$ be the event of being infected.\n$RR(D)=0.3613$. Thus, the risk of being infected with a face mask is about one third of the risk of being infected without a face mask. $OR(D)=0.3$. Thus, the odds of being infected with a face mask are less than one third of the odds of being infected without a face mask. As we cannot compute the prevalence of $D$, the odds ratio is more reliable. Question 5 During the COVID19 quarantine a telephone exchange with 4 telephone operators received an average of 12 calls per day. Assuming that the calls are equally distributed among the operators,\nCompute the probability that an operator received more than 3 calls a day.\nCompute the probability that all the operators received some call a day.\nSolution Let $X$ be the number of calls that arrive to one operator, then $X\\sim P(3)$, and $P(X\u0026gt;3)=0.3528$. Let $Y$ be the number of operators that receive some call, then $Y\\sim B(4, 0.9502)$, and $P(Y=4)=0.8152$. Question 6 In a course with 200 students the score of a test to measure the intelligence quotient follows a normal distribution. 
After applying the test to the students 10 of them got a score above 130 and 30 of them a score below 60.\nCompute the mean and the standard deviation of the score.\nHow many students will have a score between 90 and 95?\nCompute the limits of the interval centered at the mean that accumulates 95% of the scores.\nSolution Let $X$ be the score of the test then $X\\sim N(87.058, 26.1069)$ $P(90\\leq X \\leq 95) = 0.0747$, that is, around $14.9309$ students. Interval with 95% of probability $(35.8895, 138.2265)$. ","date":1590364800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1646900374,"objectID":"07181cbc4126ee87dcf255d682414dbc","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-05-25/","publishdate":"2020-05-25T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-05-25/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 25, 2020\nDescriptive Statistics and Regression Question 1 In a course there are 150 students, of which 50 are working students and the other 100 non-working students.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2020-05-25","type":"book"},{"authors":["Parrab-Blesa, Alfonso; Sanchez-Alberca, Alfredo; Garcia-Medina, Jose Javier"],"categories":[],"content":"","date":1577836800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"c28983392176b78c2c324f2c97d13ef2","permalink":"/en/publication/clinical-2020/","publishdate":"2020-09-16T21:26:02.134179Z","relpermalink":"/en/publication/clinical-2020/","section":"publication","summary":"Primary open-angle glaucoma (POAG) is considered one of the main causes of blindness. Detection of POAG at early stages and classification into evolutionary stages is crucial to blindness prevention. Methods: 1001 patients were enrolled, of whom 766 were healthy subjects and 235 were ocular hypertensive or glaucomatous patients in different stages of the disease. 
Spectral domain optical coherence tomography (SD-OCT) was used to determine Bruch’s membrane opening-minimum rim width (BMO-MRW) and the thicknesses of peripapillary retinal nerve fibre layer (RNFL) rings with diameters of 3.0, 4.1 and 4.7 mm centred on the optic nerve. The BMO-MRW rim and RNFL rings were divided into seven sectors (G-T-TS-TI-N-NS-NI). The k-means algorithm and linear discriminant analysis were used to classify patients into disease stages. Results: We defined four glaucoma stages and provided a new model for classifying eyes into these stages, with an overall accuracy greater than 92% (88% when including healthy eyes). An online application was also implemented to predict the probability of glaucoma stage for any given eye. Conclusions: We propose a new objective algorithm for classifying POAG into clinical-evolutionary stages using SD-OCT.","tags":[],"title":"Clinical-Evolutionary Staging System of Primary Open-Angle Glaucoma Using Optical Coherence Tomography","type":"publication"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Dec 16, 2019\nQuestion 1 A lagoon contaminated with nitrates contains 1000 tons of nitrates dissolved in 6 million cubic meters of water. To decontaminate the lagoon, we start to introduce pure water into the lagoon at a rate of 100000 cubic meters per day, and we take out the same amount of contaminated water. Assuming that the concentration of nitrates remains uniform in the lagoon, what amount of nitrates will be in the lagoon after two weeks? If the maximum concentration of nitrates for water to be considered uncontaminated is $0.1$ kg/m$^3$, when will the lagoon be decontaminated?\nSolution Let $n(t)$ be the amount of nitrates in the lagoon at time $t$.\nDifferential equation: $n\u0026rsquo;=-n/60$.\nSolution: $n(t)=10^6 e^{-t/60}$.\n$n(14)=791889.6$ kg.\nThe lagoon will be decontaminated after $30.6495$ days. 
Question 2 The temperature $T$ of a chemical reaction depends on the concentrations of two substances $x$ and $y$ according to the function $T(x,y)=-x^3+4x^2y-3y^2$.\nIf the concentrations of $x$ and $y$ are 2 gr/dl and 1 gr/dl respectively, how must the two concentrations be changed to increase the temperature as much as possible? What is the rate of change of the temperature if we change the two concentrations in that direction?\nHow must the two concentrations be changed to increase the temperature at a rate of 10 ºC (gr/dl)$^{-1}$?\nSolution $x$ and $y$ must be changed along the direction of the gradient $\\nabla T(2,1) = (4, 10)$. Along this direction the rate of change of the temperature is $|\\nabla T(2,1)|=10.77$ ºC (gr/dl)$^{-1}$. $x$ and $y$ must be changed along the direction of the unit vector $(0, 1)$, that is $x$ must be kept constant. Question 3 It is known that the concentration in blood of the active ingredient of a drug $t$ hours after applying the drug is given by the function $c(t) = t^2e^{-t/2}$ mg/ml.\nCompute the maximum value for the concentration of the active ingredient and give the time when the maximum is reached. Study the concavity and compute the inflection points of the concentration. Solution The maximum is reached at $t=4$ hours and $c(4)=16e^{-2}$ mg/ml. There are two inflection points at $t=1.1716$ and $t=6.8284$.\nThe function is concave up in $(-\\infty, 1.1716) \\cup (6.8284, \\infty)$ and concave down in $(1.1716, 6.8284)$. ","date":1576454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6d5b5e97873b2f8778658cb293c56c3c","permalink":"/en/teaching/calculus/exams/pharmacy-2019-12-16/","publishdate":"2019-12-16T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2019-12-16/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Dec 16, 2019\nQuestion 1 A lagoon contaminated with nitrates contains 1000 tons of nitrates dissolved in 6 million cubic meters of water. 
To decontaminate the lagoon, we start to introduce pure water into the lagoon at a rate of 100000 cubic meters per day, and we take out the same amount of contaminated water.","tags":["Exam"],"title":"Pharmacy exam 2019-12-16","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: December 16, 2019\nQuestion 1 The table below summarizes the time (in minutes) required to remove anesthesia after a surgery in a sample of 50 patients.\n$$ \\begin{array}{cr} \\mbox{Time} \u0026amp; \\mbox{Patients} \\newline \\hline 10-30 \u0026amp; 2 \\newline 30-45 \u0026amp; 11 \\newline 45-60 \u0026amp; 18 \\newline 60-90 \u0026amp; 9 \\newline 90-120 \u0026amp; 8 \\newline 120-180 \u0026amp; 2 \\newline \\hline \\end{array} $$\nAre there some outliers in the sample?\nCompute the mean. Is it representative?\nIf according to a postoperative protocol the 15% of patients that require more time to remove the anesthesia must be monitored, above what time should a patient be monitored?\nIf we apply a drug that is anesthesia antagonist, it is known that the time required to remove the anesthesia decreases a 25%. How will the time decrease affect the representativeness of the mean?\nIf it is known that another type of anesthesia $B$ has mean 50 minutes and standard deviation 15 minutes, what time is relatively greater, 70 minutes with this type of anesthesia or 60 minutes with the type $B$?.\nUse the following sums for the computations:\n$\\sum x_in_i=3212.5$ min, $\\sum x_i^2n_i=249706.25$ min$^2$,\n$\\sum (x_i-\\bar x)^3n_i=1400531.25$ min$^3$ y\n$\\sum (x_i-\\bar x)^4n_i=143958437.7$ min$^4$.\nSolution $Q_1=44.3182$, $Q_3=81.6667$, $IQR=37.3485$, $f_1=-11.7045$ and $f_2=137.6894$. Since the last class contains values above the upper fence, there could be outliers. $\\bar x=64.25$ min, $s^2=866.0625$ min$^2$, $s=29.4289$ min and $cv=0.458$. Thus the representativity of the mean is moderate. 
$P_{85}=99.375$ min. Applying the linear transformation $y=0.75x$, $\\bar y=48.1875$ min, $s_y=22.0717$ min and $cv=0.458$. Thus the representativity of the mean is the same. Standard score in first anesthesia: $z(70)=0.1954$. Standard score in anesthesia $B$: $z(60)=0.6667$. Thus, 60 min with anesthesia $B$ is relatively greater. Question 2 The table below summarizes the scores of a group of 10 students in three practical exams of Maths.\n$$ \\begin{array}{rrr} \\mbox{Exam 1} (X) \u0026amp; \\mbox{Exam 2} (Y) \u0026amp; \\mbox{Exam 3} (Z) \\newline \\hline 5.5 \u0026amp; 3.2 \u0026amp; 5.0 \\newline 7.5 \u0026amp; 6.5 \u0026amp; 2.0 \\newline 2.5 \u0026amp; 4.0 \u0026amp; 1.0 \\newline 6.0 \u0026amp; 4.0 \u0026amp; 6.0 \\newline 8.0 \u0026amp; 7.5 \u0026amp; 6.0 \\newline 4.0 \u0026amp; 3.5 \u0026amp; 1.0 \\newline 7.0 \u0026amp; 5.5 \u0026amp; 4.0 \\newline 9.5 \u0026amp; 10.0 \u0026amp; 9.0 \\newline 10.0 \u0026amp; 9.5 \u0026amp; 8.0 \\newline 1.0 \u0026amp; 3.0 \u0026amp; 0.5 \\newline \\hline \\end{array} $$\nWhich two scores are more linearly correlated?\nUsing linear models, what are the expected scores of the second and third exams for a student with a score $6.5$ in the first exam?\nUse the following sums for the computations:\n$\\sum x_i=61$, $\\sum y_i=56.7$, $\\sum z_i=42.5$,\n$\\sum x_i^2=449$, $\\sum y_i^2=382.49$, $\\sum z_i^2=264.25$,\n$\\sum x_iy_j=405.85$, $\\sum x_iz_j=327$, $\\sum y_jz_j=295$.\nSolution $\\bar x=6.1$, $s_x^2=7.69$, $\\bar y=5.67$, $s_y^2=6.1001$, $\\bar z=4.25$, $s_z^2=8.3625$, $s_{xy}=5.998$, $s_{xz}=6.775$, $s_{yz}=5.4025$, $r^2_{xy}=0.7669$, $r^2_{xz}=0.7138$ and $r^2_{yz}=0.5722$. Thus, the two variables more linearly related are $X$ and $Y$, since their coefficient of determination is greater. Regression line of $Y$ on $X$: $y=0.9122 + 0.78x$ and $y(6.5)=5.982$. Regression line of $Z$ on $X$: $z=-1.1242 + 0.881x$ and $z(6.5)=4.6024$. 
Question 3 To study the association between osteoporosis and gender, a random sample of people between 65 and 70 years old was taken. The following table summarizes the results\n$$ \\begin{array}{lcc} \\hline \u0026amp; \\mbox{Osteoporosis} \u0026amp; \\mbox{Not osteoporosis}\\newline \\mbox{Women} \u0026amp; 480 \u0026amp; 2320\\newline \\mbox{Men} \u0026amp; 255 \u0026amp; 1505\\newline \\hline \\end{array} $$\nCompute the prevalence of osteoporosis in the population.\nCompute the relative risk of osteoporosis in females with respect to males and interpret it.\nCompute the odds ratio of osteoporosis in females with respect to males and interpret it.\nWhich of the two measures is most suitable to study the association between osteoporosis and gender?\nSolution Let $D$ be the event of suffering osteoporosis.\nPrevalence: $P(D)=0.1612$. $RR(D)=1.1832$. Thus, the risk of suffering osteoporosis in women is higher than in men, but not by much. There is no strong association between osteoporosis and gender. $OR(D)=1.2211$. Thus, the odds of suffering osteoporosis in women are higher than in men, but not by much. Since we can compute the prevalence of $D$, both statistics are suitable, but relative risk is easier to interpret. 
Question 4 The risks of getting the flu in two cities $A$ and $B$ with the same population size are 14% and 8% respectively.\nCompute the probability of having more than 2 persons getting the flu in a random sample of 10 persons of the city $A$.\nCompute the probability of having more than 2 and less than 5 persons getting the flu in a random sample of 50 persons of the city $B$.\nCompute the probability of having 2 persons getting the flu in a random sample of 8 persons of the two cities.\nCompute the probability of having some person getting the flu in a random sample of 5 persons that have been living in both cities.\nSolution Let $X$ be the number of persons with flu in a sample of 10 persons from $A$, then $X\\sim B(10, 0.14)$ and $P(X\u0026gt;2)=0.1545$. Let $Y$ be the number of persons with flu in a sample of 50 persons from $B$, then $Y\\sim B(50, 0.08)\\approx P(4)$ and $P(2 \u0026lt; Y \u0026lt; 5) = 0.3907$. Let $Z$ be the number of persons with flu in a sample of 8 persons from $A$ and $B$, then $Z\\sim B(8, 0.11)$ and $P(Z = 2) = 0.1684$. Let $U$ be the number of persons with flu in a sample of 5 persons living in both cities, then $U\\sim B(5, 0.2088)$ and $P(U\u0026gt;0)=0.69$. Question 5 In a study about the cholesterol two samples of 10000 males and 10000 females was taken. It was observed that 3420 males and 1234 females had a cholesterol level above 230 mg/dl, and that 4936 males had a cholesterol level between 210 and 230 mg/dl. 
Assuming that the cholesterol level in males and females follows a normal distribution with the same standard deviation, compute:\nThe means and the standard deviation of the distributions of cholesterol level in males and females.\nThe percentage of males with cholesterol level between 200 and 240 mg/dl.\nThe interquartile range of the cholesterol level of females.\nSolution Let $X$ be the cholesterol level in males and $Y$ the cholesterol level in females, then $X\\sim N(224.1164, 14.4556)$ and $Y\\sim N(213.2581, 14.4556)$. $P(200\\leq X \\leq 240) = 0.8164$. $IQR = 19.5003$ mg/dl. ","date":1576454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6150ba40285e943123193991f81d26bf","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-12-16/","publishdate":"2019-12-16T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-12-16/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: December 16, 2019\nQuestion 1 The table below summarizes the time (in minutes) required to remove anesthesia after a surgery in a sample of 50 patients.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-12-16","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 18, 2019\nQuestion 1 In a population where the prevalence of a disease is 10% we apply a diagnostic test with a sensitivity 85%. What must be the minimum specificity of the test to diagnose the disease when the outcome of the test is positive?\nSolution The specificity must be at least $0.9056$. 
Question 2 In a stretch of a road there is an average of 2 accidents per day.\nCompute the probability of having more than 2 accidents on a random day.\nCompute the probability of having more than 2 accidents on a random day, knowing that there is at least one accident that day.\nCompute the probability of having 14 accidents in a random week.\nSolution Let $X$ be the number of accidents in a day. $X\\sim P(2)$ and $P(X\u0026gt;2)=0.3233$. $P(X\u0026gt;2|X\\geq 1)=0.3739$. Let $Y$ be the number of accidents in a week. $Y\\sim P(14)$ and $P(Y=14)=0.106$. Question 3 In a study about the effectiveness of two flu drugs $A$ and $B$ it has been observed in a clinical trial that in 12% of cases only drug $A$ is effective, in 24% of cases only drug $B$ is effective and in 80% of cases where drug $A$ was effective, drug $B$ was also effective.\nWhat is the probability that both drugs are effective at the same time?\nWhat is the probability that only one of the drugs is effective?\nWhat is the probability that none of the drugs are effective?\nIs the effectiveness of the two drugs independent?\nSolution According to the problem statement, $P(A\\cap \\overline B) = 0.12$, $P(\\overline A\\cap B)=0.24$ and $P(B|A)=0.8$.\n$P(A\\cap B)=0.48$. $P(A\\cap \\overline B) + P(\\overline A\\cap B) =0.36$. $P(\\overline A \\cap \\overline B) = 0.16$. The events are dependent because $P(B)=0.72 \\neq P(B|A)=0.8$. Question 4 It is known that the annual rainfall in a region follows a normal probability distribution. If the statistics show that 15% of the years the annual rainfall has been greater than 45 cm and 3% of the years less than 30 cm,\nCompute the mean and the standard deviation of the annual rainfall.\nWhat is the probability that in the next 5 years at least one year the annual rainfall was above 50 cm?\nSolution Let $X$ be the annual rainfall. $X\\sim N(\\mu, \\sigma)$, and according to the statement $P(X\u0026gt;45)=0.15$ and $P(X\u0026lt;30)=0.03$. 
$\\mu=39.6708$ cm and $\\sigma=5.1419$ cm. Let $Y$ be the number of years in the next 5 years with annual rainfall above 50 cm. Then $Y\\sim B(5, 0.0223)$, and $P(Y\\geq 1)=0.1065$. ","date":1574035200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"415bf266af33209796f13d8e0d1df047","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-11-18/","publishdate":"2019-11-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-11-18/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 18, 2019\nQuestion 1 In a population where the prevalence of a disease is 10% we apply a diagnostic test with a sensitivity 85%. What must be the minimum specificity of the test to diagnose the disease when the outcome of the test is positive?","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-11-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 14, 2019\nQuestion 1 It has been measured the systolic blood pressure (in mmHg) in two groups of 100 persons of two populations $A$ and $B$. The table below summarizes the results.\n$$ \\begin{array}{lrr} \\mbox{Systolic blood pressure} \u0026amp; \\mbox{Num persons $A$} \u0026amp; \\mbox{Num persons $B$} \\newline \\hline (80, 90] \u0026amp; 4 \u0026amp; 6 \\newline (90, 100] \u0026amp; 10 \u0026amp; 18 \\newline (100, 110] \u0026amp; 28 \u0026amp; 30 \\newline (110, 120] \u0026amp; 24 \u0026amp; 26 \\newline (120, 130] \u0026amp; 16 \u0026amp; 10 \\newline (130, 140] \u0026amp; 10 \u0026amp; 7 \\newline (140, 150] \u0026amp; 6 \u0026amp; 2 \\newline (150, 160] \u0026amp; 2 \u0026amp; 1 \\newline \\hline \\end{array} $$\nWhich of the two systolic blood pressure distributions is less asymmetric? Which one has a higher kurtosis? 
According to skewness and kurtosis can we assume that populations $A$ and $B$ are normal?\nIn which group is more representative the mean of the systolic blood pressure?\nCompute the value of the systolic blood pressure such that 30% of persons of the group of population $A$ are above it?\nWhich systolic blood pressure is relatively higher, 132 mmHg in the group of population $A$, or 130 mmHg in the group of population $B$?\nIf we measure the systolic blood pressure of the group of population $A$ with another tensiometer, and the new pressure obtained ($Y$) is related with the first one ($X$) according to the equation $y=0.98x-1.4$, in which distribution, $X$ or $Y$, is more representative the mean?\nUse the following sums for the computations:\nGroup $A$: $\\sum x_in_i=11520$ mmHg, $\\sum x_i^2n_i=1351700$ mmHg$^2$, $\\sum (x_i-\\bar x)^3n_i=155241.6$ mmHg$^3$ and $\\sum (x_i-\\bar x)^4n_i=16729903.52$ mmHg$^4$.\nGroup $B$: $\\sum x_in_i=11000$ mmHg, $\\sum x_i^2n_i=1230300$ mmHg$^2$, $\\sum (x_i-\\bar x)^3n_i=165000$ mmHg$^3$ and $\\sum (x_i-\\bar x)^4n_i=13632500$ mmHg$^4$.\nSolution Group $A$: $\\bar x=115.2$ mmHg, $s^2=245.96$ mmHg$^2$, $s=15.6831$ mmHg, $g_{1A}=0.4024$ and $g_{2A}=-0.2346$. Group $B$: $\\bar x=110$ mmHg, $s^2=203$ mmHg$^2$, $s=14.2478$ mmHg, $g_{1B}=0.5705$ and $g_{2B}=0.3081$. Thus the distribution of the population $A$ group is less asymmetric since $g_{1A}$ is closer to 0 than $g_{1B}$ and the populaton $B$ group has a higher kurtosis since $g_{2B}\u0026gt;g_{2A}$. Both populations can be cosidered normal since $g_1$ and $g_2$ are between -2 and 2. $cv_A=0.1361$ and $cv_B=0.1295$, thus, the mean of group $B$ is a little bit more representative since its coef. of variation is smaller than the one of group $A$. $P_{70}\\approx 125$ mmHg. The standard scores are $z_A(132)=1.0712$ and $z_B(130)=1.4037$. Thus, 130 mmHg in group $B$ is relatively higher than 132 mmHg in group $A$. $\\bar y=111.496$, $s_y=15.3694$ and $cv_y=0.1378$. 
Thus the mean of $X$ is more representative than the mean of $Y$ since $cv_x\u0026lt;cv_y$. Question 2 In a symmetric distribution the mean is 15, the first quartile 12 and the maximum value is 25.\nDraw the box and whiskers plot. Could a hypothetical value of 2 be considered an outlier in this distribution? Solution $Q_1=12$, $Q_2=15$, $Q_3=18$, $IQR=6$, $f_1=3$, $f_2=27$, $Min=5$ and $Max=25$. Yes, because $2\u0026lt;f_1$. Question 3 A pharmaceutical company is trying three different analgesics to determine if there is a relation among the time required for them to take effect. The three analgesics were administered to a sample of 20 patients and the time it took for them to take effect was recorded. The following sums summarize the results, where $X$, $Y$ and $Z$ are the times for the three analgesics.\n$\\sum x_i=668$ min, $\\sum y_i=855$ min, $\\sum z_i=1466$ min,\n$\\sum x_i^2=25056$ min$^2$, $\\sum y_i^2=42161$ min$^2$, $\\sum z_i^2=123904$ min$^2$,\n$\\sum x_iy_j=31522$ min$^2$, $\\sum y_jz_j=54895$ min$^2$.\nIs there a linear relation between the times $X$ and $Y$? And between $Y$ and $Z$? How are these linear relationships?\nAccording to the regression line, how much will the time $X$ increase for every minute that time $Y$ increases?\nIf we want to predict the time $Y$ using a linear regression model, which of the two times $X$ or $Z$ is the most suitable? Why?\nUsing the chosen linear regression model in the previous question, predict the value of $Y$ for a value of $X$ or $Z$ of 40 minutes.\nIf the correlation coefficient between the times $X$ and $Z$ is $r=-0.69$, compute the regression line of $X$ on $Z$.\nSolution $\\bar x=33.4$ min, $s_x^2=137.24$ min$^2$, $\\bar y=42.75$ min, $s_y^2=280.4875$ min$^2$, $\\bar z=73.3$ min, $s_z^2=822.31$ min$^2$, $s_{xy}=148.25$ min$^2$ and $s_{yz}=-388.825$ min$^2$. Thus, there is a direct linear relation between $X$ and $Y$ and an inverse linear relation between $Y$ and $Z$. $b_{xy}=0.5285$ min. 
$r^2_{xy}=0.5709$ and $r^2_{yz}=0.6555$, thus the regression line of $Y$ on $Z$ explains better $Y$ than the regression line of $Y$ on $X$ since $r^2_{yz}\u0026gt;r^2_{xy}$. Regression line of $Y$ on $Z$: $y=77.4095 - 0.4728z$ and $y(40)=58.4957$. $s_{xz}=-231.7967$ and the regression line of $X$ on $Z$ is $x=54.0622 - 0.2819z$. ","date":1571011200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"84914b3cccbde96cecb28bd06c7b2549","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-10-14/","publishdate":"2019-10-14T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-10-14/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 14, 2019\nQuestion 1 It has been measured the systolic blood pressure (in mmHg) in two groups of 100 persons of two populations $A$ and $B$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-10-14","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: June 18, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require to be sit a lot of hours. A sample of individuals between 40 and 50 years that spent more than 5 hours sitting were drawn. It was observed if they followed or not the occupational risk prevention program and the number of spinal injuries after 10 years. 
The results are shown in the table below.\n$$ \\begin{array}{lrrrrrrrrrrrrrrr} \\hline \\mbox{With prevention program} \u0026amp; 1 \u0026amp; 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4 \u0026amp; 2 \u0026amp; 2 \u0026amp; 5 \u0026amp; 2 \u0026amp; 3 \u0026amp; 2 \u0026amp; 0 \\newline \\mbox{Wihtout prevention program} \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 3 \u0026amp; 7 \u0026amp; 6 \u0026amp; 5 \u0026amp; 5 \u0026amp; 9 \u0026amp; 5 \u0026amp; 5 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; \\newline \\hline \\end{array}$$\nPlot the polygon of cumulative relative frequencies of the total sample.\nAccording to the interquartile range, which sample has more central spread of the spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich sample has a greater relative spread with respect to the mean of the spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich sample has a more normal kurtosis of the number of spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich number of spinal injuries is relatively greater, 2 injuries of a person following the prevention program or 4 injuries of a person not following the prevention program?\nUse the following sums for the computations:\nWith prevention program: $\\sum x_i=36$ injuries, $\\sum x_i^2=116$ injuries$^2$, $\\sum (x_i-\\bar x)^3=-0.48$ injuries$^3$ and $\\sum (x_i-\\bar x)^4=135.97$ injuries$^4$.\nWithout prevention program: $\\sum y_i=66$ injuries, $\\sum y_i^2=362$ injuries$^2$, $\\sum (y_i-\\bar y)^3=27.92$ injuries$^3$ and $\\sum (y_i-\\bar y)^4=586.9$ injuries$^4$.\nSolution With prevention program: $Q_1=2$ injuries, $Q_3=4$ injuries, $IQR=2$ injuries.\nWithout prevention program: $Q_1=3$ injuries, $Q_3=6$ 
injuries, $IQR=3$ injuries.\nThe sample not following the prevention program has more central spread since the interquartile range is greater.\nWith prevention program: $\\bar x=2.4$ injuries, $s^2=1.9733$ injuries$^2$, $s=1.4048$ injuries and $cv=0.5853$.\nWithout prevention program: $\\bar y=4.7143$ injuries, $s^2=3.6327$ injuries$^2$, $s=1.906$ injuries and $cv=0.4043$.\nThe sample following the prevention program has a greater relative spread with respect to the mean since the coef. of variation is greater.\nWith prevention program: $g_2=-0.6722$.\nWithout prevention program: $g_2=0.1768$.\nThus the sample not following the prevention program has a more normal kurtosis, since the coeff. of kurtosis is closer to 0.\nWith prevention program: $z(2)=-0.2847$.\nWithout prevention program: $z(4)=-0.3748$.\nThus 4 injuries in the sample not following the prevention program is relatively smaller, since its standard score is smaller.\nQuestion 2 The evolution of the price of a muscle relaxant between 2015 and 2019 is shown in the table below.\n$$ \\begin{array}{lrrrrr} \\hline \\mbox{Year} \u0026amp; 2015 \u0026amp; 2016 \u0026amp; 2017 \u0026amp; 2018 \u0026amp; 2019 \\newline \\mbox{Price (€)} \u0026amp; 1.40 \u0026amp; 1.60 \u0026amp; 1.92 \u0026amp; 2.30 \u0026amp; 2.91 \\newline \\hline \\end{array}$$\nWhich regression model is better to predict the price, the linear or the exponential?\nUse the best of the two previous models to predict the price in 2020.\nSolution $\\bar x=2017$ years, $s_x^2=2$ years$^2$.\n$\\bar y=2.026$ €, $s_y^2=0.2882$ €$^2$.\n$\\overline{\\log(y)}=0.672$ log(€), $s_{\\log(y)}^2=0.0673$ log(€)$^2$.\n$s_{xy}=0.744$ years$\\cdot$€, $s_{x\\log(y)}=0.3653$ years$\\cdot\\log(€)$\nLinear coef. determination: $r^2=0.9603$ Exponential coef. determination: $r^2=0.9909$\nThus the exponential regression model is better to predict the price since the coef. of determination is greater. 
Exponential regression model: $y=e^{-367.6861+0.1826x}$.\nPrediction: $y(2020)=3.3867$ €. Question 3 In a linear regression study between two variables $X$ and $Y$ we know $\\bar x = 3$, $s_x^2=2$, $s_y^2=10.8$ and the regression line of $Y$ on $X$ is $y=90.9-2.3x$.\nCompute the mean of $Y$.\nCompute and interpret the linear correlation coefficient.\nSolution $\\bar y = 84$. $r=-0.9898$. Probability and Random Variables Question 4 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require to be sit a lot of hours. A sample of 500 individuals between 40 and 50 years that spent more than 5 hours sitting was drawn. Half of the individuals followed the prevention program (treatment group) and the other half not (control group). After 5 years it was observed that 12 individuals suffered spinal injuries in the group following the prevention program while 32 individuals suffered spinal injuries in the other group. In the following 5 years it was observed that 21 individuals suffered spinal injuries in the group following the prevention program while 48 individuals suffered spinal injuries in the other group.\nCompute the cumulative incidence of spinal injuries in the total sample after 5 years and after 10 years.\nCompute the absolute risk of suffering spinal injuries in 10 years in the treatment and control groups.\nCompute the relative risk of suffering spinal injuries in 10 years in the treatment group compared to the control group. Interpret it.\nCompute the odds ratio of suffering spinal injuries in 10 years in the treatment group compared to the control group. Interpret it.\nWhich statistics, the relative risk or the odds ratio, is more suitable in this study? Justify the answer.\nSolution Let $D$ be the event of suffering spinal injuries.\nCumulative incidence after 5 years: $R(D)=0.088$. Cumulative incidence after 10 years: $R(D)=0.226$.\nRisk in the treatment group: $R_T(D)=0.132$. 
Risk in the control group: $R_C(D)=0.32$.\n$RR(D)=0.4125$. Thus, the risk of suffering spinal injuries is less than half following the prevention program.\n$OR(D)=0.3232$. Thus, the odds of suffering spinal injuries are less than one third following the prevention program.\nSince the study is prospective and we can estimate the prevalence of $D$, both statistics are suitable, but relative risk is easier to interpret.\nQuestion 5 The table below shows the results of a study to evaluate the usefulness of a reactive strip to diagnose a urinary infection.\n$$ \\begin{array}{ccc} \\hline \\mbox{Outcome} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{No infection}\\newline \\mbox{Positive} \u0026amp; 60 \u0026amp; 80\\newline \\mbox{Negative} \u0026amp; 10 \u0026amp; 200\\newline \\hline \\end{array} $$\nCompute the sensitivity and the specificity of the test.\nCompute the positive and the negative predictive values.\nIs this test better to confirm or to rule out the infection?\nIf another study has determined that the true prevalence of the infection is 2%, how does this affect the predictive values?\nSolution Let $D$ be the event corresponding to suffering the urinary infection and $+$ and $-$ the events corresponding to getting a positive and negative outcome in the test respectively.\nSensitivity = $0.8571$ and Specificity = $0.7143$.\n$PPV=0.4286$ and $NPV=0.9524$. Since $PPV\u0026lt;NPV$ the test is better to rule out the infection.\n$PPV=0.0577$ and $NPV=0.9959$. 
The positive predictive value decreases a lot while the negative predictive value increases a little bit.\nQuestion 6 The time required to recover from an injury follows a normal distribution with variance 64 days$^2$.\nIt is also known that 10% of people with this injury require more than 80 days to recover.\nWhat is the expected time required to recover from the injury?\nWhat percentage of individuals will require between 60 and 75 days to recover?\nIf we draw a random sample of 12 individuals with this injury, what is the probability of having between 9 and 11 individuals, both included, requiring less than 80 days to recover?\nIf we draw a random sample of 500 individuals with this injury, what is the probability of having fewer than 4 requiring a time above the 99th percentile to recover?\nSolution Let $X$ be the time required to recover from the injury. Then $X\\sim N(\\mu, 8)$.\n$\\mu=69.7476$ days.\n$P(60\u0026lt;X\u0026lt;75) = 0.6327$.\nLet $Y$ be the number of individuals with the injury requiring less than 80 days to recover in a sample of 12. Then $Y\\sim B(12, 0.9)$ and $P(9\\leq Y\\leq 11)=0.6919$.\nLet $Z$ be the number of individuals with the injury requiring a time above the 99th percentile to recover in a sample of 500. 
Then $Z\\sim B(500, 0.01)\\approx P(5)$ and $P(Z\u0026lt;4)=0.265$.\n","date":1560816000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1647070234,"objectID":"c221fea84ca6e9626829bc11271943dc","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-06-18/","publishdate":"2019-06-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-06-18/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 18, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require to be sit a lot of hours.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-06-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 27, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effect of smoking during the pregnancy in the weight of newborns. 
The table below shows the daily number of cigarretes smoked by mothers ($X$) and the weight of the newborn (all of them are males) ($Y$).\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Daily num cigarettes} \u0026amp; 10.00 \u0026amp; 14.00 \u0026amp; 8.00 \u0026amp; 11.00 \u0026amp; 7.00 \u0026amp; 6.00 \\newline \\mbox{Weight (kg)} \u0026amp; 2.55 \u0026amp; 2.44 \u0026amp; 2.68 \u0026amp; 2.65 \u0026amp; 2.71 \u0026amp; 2.85 \\newline \\hline \\end{array} $$\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Daily num cigarettes} \u0026amp; 2.00 \u0026amp; 5.00 \u0026amp; 9.00 \u0026amp; 9.00 \u0026amp; 4.00 \u0026amp; 6.00 \\newline \\mbox{Weight (kg)} \u0026amp; 3.45 \u0026amp; 2.93 \u0026amp; 2.67 \u0026amp; 2.59 \u0026amp; 3.02 \u0026amp; 2.72 \\newline \\hline \\end{array} $$\nGive the equation of the regression line of the weight of newborns on the daily number of cigarettes and interpret the slope.\nWhich regression model is better to predict the weight of newborns, the logarithmic or the exponential?\nUse the best of the two previous regression models to predict the weight of a newborn whose mother smokes 12 cigarettes a day. Is this prediction reliable?\nUse the following sums for the computations:\n$\\sum x_i=91$ cigarettes, $\\sum \\log(x_i)=23.0317$ $\\log(\\mbox{cigarettes})$, $\\sum y_j=33.26$ kg, $\\sum \\log(y_j)=12.1857$ $\\log(\\mbox{kg})$,\n$\\sum x_i^2=809$ cigarettes$^2$, $\\sum \\log(x_i)^2=47.196$ $\\log(\\mbox{cigarettes})^2$, $\\sum y_j^2=92.9708$ kg$^2$, $\\sum \\log(y_j)^2=12.4665$ $\\log(\\mbox{kg})^2$,\n$\\sum x_iy_j=243.61$ cigarettes$\\cdot$kg, $\\sum x_i\\log(y_j)=89.3984$ cigarettes$\\cdot\\log(\\mbox{kg})$, $\\sum \\log(x_i)y_j=62.3428$ $\\log(\\mbox{cigarettes})$kg, $\\sum \\log(x_i)\\log(y_j)=22.8753$ $\\log(\\mbox{cigarettes})\\log(\\mbox{kg})$.\nSolution $\\bar x=7.5833$ cigarettes, $s_x^2=9.9097$ cigarettes$^2$. $\\bar y=2.7717$ kg, $s_y^2=0.0654$ kg$^2$. 
$s_{xy}=-0.7176$ cigarettes$\\cdot$kg. Regression line: $y=-0.0724x + 3.3208$. The slope of the regression line is $b_{yx}=-0.0724$. That means that the weight of the newborn will decrease 0.0724 kg per daily cigarette smoked by the mother.\n$\\overline{\\log(x)}=1.9193$ log(cigarettes), $s_{\\log(x)}^2=0.2492$ log(cigarettes)$^2$. $\\overline{\\log(y)}=1.0155$ log(kg), $s_{\\log(y)}^2=0.0077$ log(kg)$^2$. $s_{x\\log(y)}=-0.2508$ cigarettes$\\cdot$log(kg), $s_{\\log(x)y}=-0.1245$ log(cigarettes)$\\cdot$kg Logarithmic coef. determination: $r^2=0.9499$ Exponential coef. determination: $r^2=0.8268$ Therefore, the logarithmic model fits the data better and is better to predict the weight.\nLogarithmic regression model: $y=3.7301-0.4994\\log(x)$. Prediction: $y(12)=2.4892$ kg. The coefficient of determination is high but the sample size is small, so the prediction is not entirely reliable.\nQuestion 2 The table below summarizes the time it took the runners to reach the finish in a long-distance race in Madrid:\n$$ \\begin{array}{lr} \\mbox{Time (min)} \u0026amp; \\mbox{Num runners}\\newline \\hline (30,35] \u0026amp; 15\\newline (35,40] \u0026amp; 35\\newline (40,45] \u0026amp; 40\\newline (45,50] \u0026amp; 10\\newline \\hline \\end{array}$$\nIn another race in Paris, the mean of time was 40 minutes, the standard deviation 5 minutes and the coefficient of skewness $0.75$.\nWhat percentage of runners took less than 42 minutes to reach the finish in Madrid?\nCompute and interpret the interquartile range of the time for Madrid race.\nIn which race is the mean of the time more representative?\nIn which race does the time have a more symmetric distribution?\nIn which race is a time of 39 minutes to reach the finish relatively smaller?\nUse the following sums for the computations: $\\sum x_i=3975$ min, $\\sum x_i^2=159875$ min$^2$, $\\sum (x_i-\\bar x)^3=-628.12$ min$^3$ and $\\sum (x_i-\\bar x)^4=80701.95$ min$^4$.\nSolution $F(42)=0.66$, thus approximately $66%$ of runners 
finished before 42 minutes.\n$Q_1=36.4286$ min, $Q_3=43.125$ min and $IQR=6.6964$ min. The central 50% of times fall in a range of $6.6964$ minutes.\nMadrid statistics: $\\bar x=39.75$ min, $s^2=18.6875$ min$^2$, $s=4.3229$ min and $cv=0.1088$. Paris statistics: $cv=0.125$. Thus, the mean of time in Madrid is a little bit more representative since the coef. of variation is smaller.\n$g_1=-0.0778$, that is closer to 0 than the distribution of times in Paris, thus the distribution of times in Madrid is more symmetric.\nThe standard score of the Madrid sample is $z(39)=-0.1735$ and the standard score of the Paris one $z(39)=-0.2$, thus a time of 39 min is relatively smaller in the sample of Paris.\nProbability and Random Variables Question 1 It has been observed that the concentration of a metabolite in urine can be used as a diagnostic test for a disease. The concentration (in mg/dl) in healthy individuals follows a normal distribution with mean 90 and standard deviation 8, while in sick individuals follows a normal distribution with mean 120 and standard deviation 10.\nIf the cut-off point is set at 105 mg/dl (positive above and negative below), what is the sensitivity and the specificity of the test?\nIf the cut-off point is set at 105 mg/dl and we assume a prevalence of 10%, what is the probability of a correct diagnostic?\nIf we want a sensitivity of 95%, where must we set the cut-off point? What would the specificity of the test be?\nSolution Let $X$ and $Y$ be the distributions of the concentration of metabolite in healthy and sick individuals respectively.\nSensitivity: $P(+|D) = P(Y\u0026gt;105) = 0.9332$. Specificity: $P(-|\\overline D) = P(X\u0026lt;105) = 0.9696$.\n$P(\\mbox{correct diagnostic}) = P(D\\cap +) + P(\\overline D \\cap -) = 0.966$.\nCut-off point $103.5515$ mg/dl. 
Specificity: $P(-|\\overline D) = P(X\u0026lt;103.5515) = 0.9549$.\nQuestion 2 Let $A$ and $B$ be two events of a random experiment, such that $A$ is three times as likely as $B$, $P(A\\cup B)=0.8$ and $P(A\\cap B)=0.2$.\nCompute $P(A)$ and $P(B)$.\nCompute $P(A-B)$ and $P(B-A)$.\nCompute $P(\\bar A \\cup \\bar B)$ and $P(\\bar A \\cap \\bar B)$.\nCompute $P(A|B)$ and $P(B|A)$.\nAre $A$ and $B$ independent?\nSolution $P(A) = 0.75$ and $P(B) = 0.25$.\n$P(A-B) = 0.55$ and $P(B-A) = 0.05$.\n$P(\\bar A \\cup \\bar B) = 0.8$ and $P(\\bar A \\cap \\bar B) = 0.2$.\n$P(A|B) = 0.8$ and $P(B|A) = 0.2667$.\nNo, they are dependent since $P(A|B)\\neq P(A)$.\nQuestion 3 The employees of a courier company send an average of $246.2$ messages in a period of 12 hours. It is also known that the mean of messages sent by males is $256.2$ and by females is $237.4$ in the same period.\nCompute the probability that a random person of the company sends 5 messages in a period of half an hour.\nIf we draw randomly 10 women of this company, what is the probability that at least 3 of them send more than one message in a period of one hour?\nIf we draw randomly 100 men of this company, what is the probability that none of them sends less than 2 messages in a period of a quarter of an hour?\nSolution Let $X$ be the number of messages sent in half an hour. Then $X\\sim P(10.2583)$ and $P(X=5)=0.0332$.\nLet $Y$ be the number of women in a sample of 10 that sent more than 1 message in 1 hour. Then $Y\\sim B(10, 1)$ and $P(Y\\geq 3)=1$.\nLet $Z$ be the number of men in a sample of 100 that sent less than 2 messages in a quarter of an hour. 
Then $Z\\sim B(100, 0.0305)$ and $P(Z=0)=0.0166$.\n","date":1558915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1646900374,"objectID":"5ab4f4415cc715de5fb5e8c1aae2eeaf","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-05-27/","publishdate":"2019-05-27T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-05-27/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 27, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effect of smoking during the pregnancy in the weight of newborns. The table below shows the daily number of cigarretes smoked by mothers ($X$) and the weight of the newborn (all of them are males) ($Y$).","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-05-27","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: March 26, 2019\nQuestion 1 The time required by a drug $A$ to be effective has been measured (in minutes) in a sample of 150 patients. The table below summarize the results.\n$$ \\begin{array}{lr} \\mbox{Response time} \u0026amp; \\mbox{Patients} \\newline \\hline (0,5] \u0026amp; 5 \\newline (5,10] \u0026amp; 15 \\newline (10,15] \u0026amp; 32 \\newline (15,20] \u0026amp; 36 \\newline (20,30] \u0026amp; 42 \\newline (30,60] \u0026amp; 20 \\newline \\hline \\end{array} $$\nAre there outliers in the sample? Justify the answer.\nWhat is the minimum time for the 20% of patients with highest response time?\nWhat is the average response time? 
Is the mean representative?\nCan we assume that the sample comes from a normal population?\nIf we take another sample of patients with mean 18 min and standard deviation 15 min, in which group is a response time of 25 min relatively greater?\nUse the following sums for the computations: $\\sum x_i=3105$ min, $\\sum x_i^2=83650$ min$^2$, $\\sum (x_i-\\bar x)^3=206851.65$ min$^3$ and $\\sum (x_i-\\bar x)^4=8140374.96$ min$^4$.\nSolution $Q_1=12.7344$ min, $Q_3=25.8333$ min, $IQR=13.099$ min, $f_1=-6.9141$ min and $f_2=45.4818$ min. Therefore there are outliers in the sample since the upper limit of the last interval is above the upper fence. $P_{80}=27.619$ min. $\\bar x=20.7$ min, $s^2=129.1767$ min$^2$, $s=11.3656$ min and $cv=0.5491$. The mean is moderately representative since the $cv\\approx 0.5$. $g_1=0.9393$ and $g_2=0.2523$. Since $g_1$ and $g_2$ are between -2 and 2, we can assume that the sample comes from a normal (bell-shaped) population. The standard score of the first sample is $z(25)=0.3783$ and the standard score of the second one is $z(25)=0.4667$, thus a time of 25 min is relatively greater in the second sample. Question 2 In a regression study about the relation between two variables $X$ and $Y$ we got $\\bar x=7$ and $r^2=0.9$. If the equation of the regression line of $Y$ on $X$ is $y-x=1$, compute\nThe mean of $Y$.\nThe equation of the regression line of $X$ on $Y$.\nWhat value does this regression model predict for $x=6$? And for $y=10$?\nSolution $\\bar y=8$. Regression line of $X$ on $Y$: $x=0.9y-0.2$. $y(6)=7$ and $x(10)=8.8$. 
Question 3 In a tennis club the age ($X$) and the height ($Y$) of the ten players making up the female youth team have been measured.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Age (years)} \u0026amp; 9 \u0026amp; 10 \u0026amp; 11 \u0026amp; 12 \u0026amp; 13 \u0026amp; 14 \u0026amp; 15 \u0026amp; 16 \u0026amp; 17 \u0026amp; 18 \\newline \\mbox{Height (cm)} \u0026amp; 128 \u0026amp; 144 \u0026amp; 148 \u0026amp; 154 \u0026amp; 158 \u0026amp; 161 \u0026amp; 165 \u0026amp; 164 \u0026amp; 166 \u0026amp; 167 \\newline \\hline \\end{array} $$\nPlot the scatter plot (Height on Age).\nWhich regression model best fits these data, the linear or the logarithmic?\nWhat is the expected height of a player 12.5 years old according to the better of the two previous models?\nUse the following sums for the computations:\n$\\sum x_i=135$ years, $\\sum \\log(x_i)=25.7908$ $\\log(\\mbox{years})$, $\\sum y_j=1555$ cm, $\\sum \\log(y_j)=50.4358$ $\\log(\\mbox{cm})$,\n$\\sum x_i^2=1905$ years$^2$, $\\sum \\log(x_i)^2=67.0001$ $\\log(\\mbox{years})^2$, $\\sum y_j^2=243191$ cm$^2$, $\\sum \\log(y_j)^2=254.4404$ $\\log(\\mbox{cm})^2$,\n$\\sum x_iy_j=21303$ years$\\cdot$cm, $\\sum x_i\\log(y_j)=682.9473$ years$\\cdot\\log(\\mbox{cm})$, $\\sum \\log(x_i)y_j=4035.0697$ $\\log(\\mbox{years})$cm, $\\sum \\log(x_i)\\log(y_j)=130.2422$ $\\log(\\mbox{years})\\log(\\mbox{cm})$.\nSolution 2.$\\bar x=13.5$ years, $s_x^2=8.25$ years$^2$, $\\overline{\\log(x)}=2.5791$ log(years), $s_{\\log(x)}^2=0.0483$ log(years)$^2$.\n$\\bar y=155.5$ cm, $s_y^2=138.85$ cm$^2$. $s_{xy}=31.05$ years$\\cdot$cm, $s_{\\log(x)y}=2.4594$ log(years)cm Linear coef. determination: $r^2=0.8416$ Logarithmic coef. determination: $r^2=0.9013$ Therefore, both models fit pretty well, but the logarithmic model fits a little bit better. 3. Logarithmic regression model: $y=24.2639+50.8848\\log(x)$. 
Prediction: $y(12.5)=152.785$ cm.\n","date":1553558400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a26227c769d9eb5dda80d4c6cd4b9b77","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-03-26/","publishdate":"2019-03-26T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-03-26/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 26, 2019\nQuestion 1 The time required by a drug $A$ to be effective has been measured (in minutes) in a sample of 150 patients. The table below summarize the results.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-03-26","type":"book"},{"authors":["Alfredo Sánchez Alberca, María Luisa Sánchez Rodríguez, Manuel Camacho Sampelayo, José Miguel Camacho Sampelayo, José Javier García Medina Alfonso Parra Blesa"],"categories":[],"content":"","date":1546300800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"b92a4a34decc05a181d033ea99c9c0da","permalink":"/en/publication/clasificacion-2019/","publishdate":"2020-09-16T21:26:03.60199Z","relpermalink":"/en/publication/clasificacion-2019/","section":"publication","summary":"","tags":[],"title":"Clasificación por estadios clínico evolutivos del glaucoma primario de ángulo abierto (GPAA) usando valores normalizados obtenidos mediante tomografía de coherencia óptica","type":"publication"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Dec 17, 2018\nQuestion 1 An organism metabolizes alcohol at a rate half of the present amount per minute. 
If initially there is no alcohol and we start to introduce alcohol in the organism at a constant rate of 2 ml/min, how much alcohol will there be in the organism after 5 minutes?\nSolution Let $a$ be the alcohol in the organism and $t$ the time.\nDifferential equation: $a\u0026rsquo;=2-a/2$.\nSolution: $a(t)=4-4e^{-t/2}$.\n$a(5)=3.6717$ ml. Question 2 The amount $y$ of bacteria of type $B$ (in thousands) in a culture is related to the amount $x$ of bacteria of type $A$ (also in thousands) according to the function $y=f(x)$. Knowing that the equation $x^2y^3-6x^3y^2+2xy=1$ is satisfied in this culture and that $f(1/2)=2$, study if $f$ could have a local maximum at $x=1/2$.\nSolution Implicit derivative: $y\u0026rsquo;= \\dfrac{-2xy^3+18x^2y^2+2y}{3x^2y^2-12x^3y+2x}$.\n$y\u0026rsquo;(1/2)=6\\neq 0$, so $f$ has no local maximum at $x=1/2$. Question 3 A capsule has pyramidal shape with base a rectangle of sides $a=3$ cm, $b=4$ cm, and height $h=6$ cm.\nHow must the dimensions of the capsule change to increase the volume the most? What would be the rate of change of the volume if we changed the dimensions in such a way? If we start to change the dimensions of the capsule such that the largest side of the rectangle decreases half of the increase of the smaller side, and the height increases twice the increase of the smaller side, what will the rate of change of the volume be? Remark: The volume of a pyramid is $1/3$ of the base area times the height.\nSolution $\\nabla V(3,4,6)=(8,6,4)$ and the volume will increase $|\\nabla V(3,4,6)|=10.7703$ cm$^3$/s if we change the dimensions of the capsule following this direction. Directional derivative of $V$ in $(3,4,6)$ along the vector $\\mathbf{u}=(1,-1/2,2)$: $V\u0026rsquo;_{\\mathbf{u}}(3,4,6)=5.6737$ cm$^3$/s. 
Question 4 The yield of a crop $y$ depends on the concentrations of nitrogen $n$ and phosphorus $p$ according to the function $$y(n,p)=npe^{-(n+p)}.$$ Compute the amounts of $n$ and $p$ that maximize the yield of the crop.\nSolution $n=1$ and $p=1$. ","date":1545004800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"9165fcc76313b9241b8413b9fe25ad1a","permalink":"/en/teaching/calculus/exams/pharmacy-2018-12-17/","publishdate":"2018-12-17T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2018-12-17/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Dec 17, 2018\nQuestion 1 An organism metabolizes alcohol at a rate half of the present amount per minute. If initially there is no alcohol and we start to introduce alcohol in the organism at a constant rate of 2 ml/min, how much alcohol will there be in the organism after 5 minutes?","tags":["Exam"],"title":"Pharmacy exam 2018-12-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: December 17, 2018\nQuestion 1 The chart below represents the cumulative distribution of the number of daily defective drugs produced by a machine in a sample of 40 days.\nConstruct the frequency table of the number of defective drugs. Draw the box and whiskers plot of the number of defective drugs. Study the symmetry of the distribution of the number of defective drugs. If the number of defective drugs produced by a second machine follows the equation $y=3x+2$, where $x$ and $y$ are the number of defective drugs with the first and the second machines respectively, in which machine is the mean of the number of defective drugs more representative? Which number of defective drugs is relatively smaller, 3 drugs in the first machine or 9 in the second one? 
Solution $$\\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Defective drugs} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 1 \u0026amp; 0.025\\newline 1 \u0026amp; 3 \u0026amp; 0.075 \u0026amp; 4 \u0026amp; 0.100\\newline 2 \u0026amp; 6 \u0026amp; 0.150 \u0026amp; 10 \u0026amp; 0.250\\newline 3 \u0026amp; 7 \u0026amp; 0.175 \u0026amp; 17 \u0026amp; 0.425\\newline 4 \u0026amp; 8 \u0026amp; 0.200 \u0026amp; 25 \u0026amp; 0.625\\newline 5 \u0026amp; 6 \u0026amp; 0.150 \u0026amp; 31 \u0026amp; 0.775\\newline 6 \u0026amp; 5 \u0026amp; 0.125 \u0026amp; 36 \u0026amp; 0.900\\newline 7 \u0026amp; 2 \u0026amp; 0.050 \u0026amp; 38 \u0026amp; 0.950\\newline 8 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 39 \u0026amp; 0.975\\newline 9 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 40 \u0026amp; 1.000\\newline \\hline \\end{array} $$ $\\bar x=3.975$ drugs, $s_x=1.9936$ drugs and $g_1=0.3184$. Thus the distribution is a little bit right-skewed. $cv_x=0.5015$, $\\bar y=13.925$ drugs, $s_y=5.9808$ drugs and $cv_y=0.4295$. Thus, the mean of $y$ is more representative than the mean of $x$ since its coef. of variation is smaller. $z_x=-0.4891$ and $z_y=-0.8235$, therefore 9 defective drugs in the $y$ machine is relatively smaller. Question 2 A pharmaceutical laboratory produces two models of blood pressure monitor, one for the arm and the other for the wrist. 
To compare the accuracy of both blood pressure monitors, a quality control has been conducted with a sample of 20 patients, getting the following results:\n$\\sum x_i=265.4$ mmHg, $\\sum y_i=262.5$ mmHg , $\\sum z_i=262.4$ mmHg,\n$\\sum x_i^2=3701.14$ mmHg$^2$, $\\sum y_i^2=3629.41$ mmHg$^2$, $\\sum z_i^2=3615.38$ mmHg$^2$,\n$\\sum x_iy_j=3658.28$ mmHg$^2$, $\\sum x_iz_j=3655.95$ mmHg$^2$, $\\sum y_jz_j=3613.97$ mmHg$^2$.\nWhere $X$ is the blood pressure with the arm monitor, $Y$ with the wrist monitor and $Z$ the real blood pressure.\nWhich blood pressure monitor predicts better the real blood pressure with a linear regression model? If a patient has a real blood pressure of $13.5$ mmHg, what is the expected blood pressure given by the arm blood pressure monitor? Solution Blood pressure with the arm monitor: $\\bar x=13.27$ mmHg, $s^2_x=8.9641$ mmHg².\nBlood pressure with the wrist monitor: $\\bar y=13.125$ mmHg, $s^2_y=9.2049$ mmHg².\nReal blood pressure: $\\bar z=13.12$ mmHg, $s^2_z=8.6346$ mmHg². $s_{xz}=8.6951$ mmHg², $s_{yz}=8.4985$ mmHg², $r^2_{xz}=0.9768$ and $r^2_{yz}=0.9087$.\nThus, the arm monitor predicts better the real pressure with a linear regression model since its linear coef. of determination is greater. Regression line of $X$ on $Z$: $x=0.0581+1.007z$.\nPrediction: $x(13.5)=13.6527$ mmHg. Question 3 The regression line of $Y$ on $X$ is $y=1.2x-0.6$.\nWhich of the following lines can not be the regression line of $X$ on $Y$. Justify the answer. $x=0.9y-0.6$ $x=-0.7y+0.4$ $x=0.8y-0.7$ $x=-0.6y-0.5$ $x=0.4y-0.6$ $x=-0.5y+0.9$ Considering only the ones that can be the regression line of $X$ on $Y$, which one will give better predictions? Justify the answer. Solution (b), (d) and (f) are not possible because the slope is negative, and (a) is not possible because the coef. of determination is greater than 1. (c) gives better predictions because its coef. of determination is greater. 
Question 4 In an epidemiological study a sample of 400 persons with breast cancer was drawn and another sample of 1200 persons without breast cancer. In the sample of persons with breast cancer there were 180 smokers, while in the sample of persons without breast cancer there were 1140 non-smokers.\nCompute the relative risk of developing cancer smoking and interpret it. Compute the odds ratio of developing cancer smoking and interpret it. Solution Let $C$ be the event of having cancer.\n$RR(C)=4.6364$. That means that the probability of having cancer smoking is $4.6364$ times higher than non-smoking. $OR(C)=15.5455$. As it is greater than 1, there is a direct association between smoking and having cancer. The odds of having cancer smoking is more than 15 times greater than non-smoking. Question 5 We want to develop a diagnostic test to rule out a disease when the outcome of the test is negative (negative predictive value) with a probability 90% at least. It is known that the prevalence of the disease in the population is 15% and the sensitivity of the test is set to 80%.\nWhat must be the minimum specificity of the test? Using the previous specificity, compute the probability of a correct diagnosis. If we apply the same test two times to the same patient with negative outcomes, what is the probability of ruling out the disease? Solution Let $D$ be the event of having the disease and $+$ and $-$ the events of getting a positive and a negative outcome in the diagnostic test respectively.\nMinimum specificity $P(-|\\overline{D})=0.3176$. $P(TP) + P(TN) = P(D\\cap +) + P(\\overline{D}\\cap -) = 0.12+0.27 = 0.39$. $P(\\overline{D}| -_1\\cap -_2)=0.9346$. Question 6 It is known that in a city one out of 20 persons, on average, has blood type $AB$.\nIf we draw randomly 200 blood donors, what is the probability of having at least 5 with blood type $AB$? If we draw randomly 10 blood donors, what is the probability of having more than 8 with blood type different from $AB$? 
Solution Let $X$ be the number of donors with blood type $AB$ in a sample of 200 blood donors. Then $X\\sim B(200,1/20)\\approx P(10)$, and $P(X\\geq 5)=0.9707$. Let $Y$ be the number of donors with no blood type $AB$ in a sample of 10 blood donors. Then $Y\\sim B(10,19/20)$, and $P(Y\u0026gt;8)=0.9139$. Question 7 In a course there are 150 females and 80 males. It is known that the distribution of scores of females and males are normal with the same standard deviation. It is also known that there are 120 females and 56 males with a score greater than 5, and 36 males with a score between 5 and 7.\nCompute the means and standard deviations of the distributions of scores of females and males. How many females will have a score between 4.5 and 8? Above what score will be 10% of females? Solution Let $X$ be score of a random male in the course and $Y$ the score of a random female in the course. Then $X\\sim N(\\mu_x,\\sigma)$ and $Y\\sim N(\\mu_y,\\sigma)$.\n$\\mu_x=5.87$, $\\mu_y=6.41$ and $\\sigma=1.68$. $P(4.5\\leq Y\\leq 8) = 0.7018$, that is, $105.27$ females. $P_{90}=8.8$. 
","date":1545004800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1615158565,"objectID":"e9e0cdf3b80324768c6aa76bd9d50ebc","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-12-17/","publishdate":"2018-12-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-12-17/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: December 17, 2018\nQuestion 1 The chart below represents the cumulative distribution of the number of daily defective drugs produced by a machine in a sample of 40 days.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-12-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 19, 2018\nQuestion 1 In a population that is exposed to two viruses strains $A$ and $B$ it is known that 2% of persons are immune only to virus $A$ and 4% are immune only to virus $B$. On the other hand it is known tha 91% of the population would be infected by some of the two viruses.\nWhat is the probability that a person is immune to the two viruses? What is the probability that a person immune to virus $A$ is infected by virus $B$? Are dependent the events of being immune to the two viruses? Solution Let $A$ and $B$ the events of being inmune to virus $A$ and $B$ respectively.\n$P(A\\cap B)=0.09$ $P(\\overline B|A)=0.1818$. The events are dependent. Question 2 In a study about the blood pressure the systolic pressure of 2400 males older than 18 was measured. It was observed that 640 had a pressure greater than 14 mmHg and 1450 had between 10 and 14 mmHg. Assuming that the systolic pressure in males older than 18 is normally distributed,\nCompute the mean and the standard deviation. Compute how many males had a systolic pressure between 11 and 13 mmHg. Compute the value of the systolic pressure such that there was 300 males with a systolic pressure above it. 
Solution Let $X$ be the systolic pressure, $X\\sim N(12.5788, 2.2815)$. $P(11\\leq X\\leq 13)=0.3288$ and there are $789.0501$ persons with a systolic pressure between 11 and 13 mmHg. 300 males have a systolic pressure above 15.2 mmHg. Question 3 The average number of people that enters the intensive care unit of a hospital in an 8-hours shift is $1.4$.\nCompute the probability that more than 3 persons enter the ICU in a day. Compute the probability that in a week there is more than one day with less than 3 persons entering the ICU. Solution Let $X$ be the number of persons that enter in the ICU in a day. $X\\sim P(4.2)$ and $P(X\u0026gt;3)=0.6046$. Let $Y$ be the number of days in a week with less than 3 persons entering the ICU. $Y\\sim B(7,0.2102)$ and $P(Y\u0026gt;1)=0.4513$. Question 4 Two hospitals use different tests $A$ and $B$ to detect a streptococcal infection. The tables below show the results of applying these tests in each hospital during the last year.\n$$ \\begin{array}{ccc} \\mbox{First hospital} (A) \u0026amp; \\quad \u0026amp; \\mbox{Second hospital} (B) \\newline \\begin{array}{|l|r|r|} \\hline \u0026amp; \\mbox{Test} + \u0026amp; \\mbox{Test} - \\newline \\hline \\mbox{Infected} \u0026amp; 705 \u0026amp; 65 \\newline \\hline \\mbox{Non infected} \u0026amp; 120 \u0026amp; 4110 \\newline \\hline \\end{array} \u0026amp; \u0026amp; \\begin{array}{|l|r|r|} \\hline \u0026amp; \\mbox{Test} + \u0026amp; \\mbox{Test} - \\newline \\hline \\mbox{Infected} \u0026amp; 1710 \u0026amp; 70 \\newline \\hline \\mbox{Non infected} \u0026amp; 415 \u0026amp; 7805 \\newline \\hline \\end{array} \\end{array} $$\nCompute the probability of a correct diagnosis with test $A$. Compute the positive predictive value of test $A$. Compute the negative predictive value of test $B$. How can these tests be combined to reduce the risk of wrong diagnosis? Solution $P(\\mbox{Correct diagnosis})=0.963$. $PPV_A=0.8545$. $NPV_B=0.9911$. $NPV_A=0.9844$ and $PPV_B=0.8047$. 
Since $B$ has the higher negative predictive value and $A$ the higher positive predictive value, it is better to use test $B$ first to rule out the infection and then apply test $A$ only to individuals with a positive outcome in test $B$, to confirm the infection. ","date":1542585600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a9d96c57ed3c123a92228bb40a986d97","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-11-19/","publishdate":"2018-11-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-11-19/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 19, 2018\nQuestion 1 In a population that is exposed to two viruses strains $A$ and $B$ it is known that 2% of persons are immune only to virus $A$ and 4% are immune only to virus $B$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-11-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 29, 2018\nQuestion 1 A study about obesity in a city has measured the body mass index (BMI) in a sample. The collected data is shown in the table below.\n$$ \\begin{array}{lr} \\mbox{BMI} \u0026amp; \\mbox{Persons} \\newline \\hline 15-18 \u0026amp; 5 \\newline 18-21 \u0026amp; 62 \\newline 21-24 \u0026amp; 72 \\newline 24-27 \u0026amp; 45 \\newline 27-30 \u0026amp; 12 \\newline 30-33 \u0026amp; 2 \\newline 33-36 \u0026amp; 1 \\newline 36-39 \u0026amp; 1 \\newline \\hline \\end{array} $$\nCompute the percentage of people with a BMI between 19 and 25. Which is the BMI with a 20% of persons above it? Are there outliers in the sample? Give the outliers if there are some. Solution Non interpolating:\n$F(19)\\approx 0.335$ and $F(25)\\approx 0.920$, so the percentage of people between 19 and 25 is 58.5% approximately. $P_{80}\\approx 25.5$. 
$Q_1\\approx 19.5$, $Q_3\\approx 25.5$, $IQR\\approx 6$, $f_1\\approx 10.5$ and $f_2\\approx 34.5$. Thus there is at least one outlier in the interval (36-39). Interpolating: $F(19)=0.1283$ and $F(25)=0.77$, so the percentage of people between 19 and 25 is 64.17%. $P_{80}=25.4$. $Q_1=20.1774$, $Q_3=24.7333$, $IQR=4.5559$, $f_1=13.3435$ and $f_2=31.5671$. Thus there are at least two outliers in the intervals (33-36) and (36-39). Question 2 A gene of a rat species has been modified to help the metabolization of cholesterol in blood. To check the effectiveness of this genetic modification two samples of 20 rats were drawn, one with the modified gene and the other without it, and they were fed with the same diet with different concentrations of palm oil during one month. The following sums summarize the results:\nPalm oil quantity in gr (the same in both samples)\n$\\sum x_i=640.6467$ gr, $\\sum x_i^2=23508.6387$ gr², $\\sum(x_i-\\bar x)^3=-5527.08$ gr³, $\\sum(x_i-\\bar x)^4=792910$ gr⁴\nCholesterol level in blood in mg/dl of non genetically modified rats $\\sum y_j=2945.8545$ mg/dl, $\\sum y_j^2=439517.5975$ (mg/dl)², $\\sum(y_j-\\bar y)^3=604.08$ (mg/dl)³, $\\sum(y_j-\\bar y)^4=3717331.07$ (mg/dl)⁴\n$\\sum x_iy_j=98156.0658$ gr$\\cdot$mg/dl.\nCholesterol level in blood in mg/dl of genetically modified rats\n$\\sum y_j=2126.5899$ mg/dl, $\\sum y_j^2=226824.5373$ (mg/dl)², $\\sum(y_j-\\bar y)^3=-629.4$ (mg/dl)³, $\\sum(y_j-\\bar y)^4=48248.29$ (mg/dl)⁴ $\\sum x_iy_j=69517.3648$ gr$\\cdot$mg/dl.\nIn which sample does the cholesterol have a more representative mean, genetically modified or non modified rats? In which sample is the distribution of cholesterol more skewed? In which sample is the kurtosis of the distribution of cholesterol less normal? Which rat has a relatively higher cholesterol level, a genetically modified rat with a cholesterol level of 130 mg/dl, or a non genetically modified rat with a cholesterol level of 145 mg/dl? 
In which sample does the regression line of cholesterol on the palm oil quantity fit better? According to the regression line, what level of cholesterol is expected for a genetically modified rat with a diet of 25 gr of palm oil? And for a non genetically modified rat? What amount of palm oil must be supplied to a non genetically modified rat to have a cholesterol level smaller than 150 mg/dl? Is this prediction reliable? Solution Non genetically modified rats: $\\bar y=147.2927$ mg/dl, $s^2_y=280.7332$ (mg/dl)², $s=16.7551$ mg/dl and $cv_y=0.1138$. Genetically modified rats: $\\bar y=106.3295$ mg/dl, $s^2_y=35.265$ (mg/dl)², $s=5.9384$ mg/dl and $cv_y=0.0558$. Thus, the mean of genetically modified rats is more representative since the coef. of variation is smaller. Non genetically modified rats: $g_1=0.0064$. Genetically modified rats: $g_1=-0.1503$. Thus, the distribution of genetically modified rats is more skewed since the coef. of skewness is further from 0. Non genetically modified rats: $g_2=-0.6416$. Genetically modified rats: $g_2=-1.0602$. Thus, the kurtosis of the distribution of genetically modified rats is less normal since the coef. of kurtosis is further from 0. Non genetically modified rats: $z(145)=-0.1368$. Genetically modified rats: $z(130)=3.986$. Thus, a cholesterol level of 130 mg/dl in genetically modified rats is relatively greater than 145 mg/dl in non genetically modified rats. $\\bar x=32.0323$ gr, $s^2_x=149.3614$ gr². Non genetically modified rats: $s_{xy}=189.6733$ gr$\\cdot$mg/dl and $r^2=0.858$. Genetically modified rats: $s_{xy}=69.8861$ gr$\\cdot$mg/dl and $r^2=0.9273$. Thus, the regression line fits better in genetically modified rats since the coef. of determination is greater. Regression line of $Y$ on $X$ in non genetically modified rats: $y=106.615+1.2699x$. Prediction: $y(25)=138.3624$ Regression line of $Y$ on $X$ in genetically modified rats: $y=91.3416+0.4679x$. 
Prediction: $y(25)=103.0391$ Regression line of $X$ on $Y$ in non genetically modified rats: $x=-67.4838+0.6756y$. Prediction: $x(150)=33.8615$. The prediction is very reliable since the coef. of determination is close to 1. Question 3 It is known that the regression line of $Y$ on $X$ has equation $3x+2y-4=0$ and it explains half of the variability of $Y$. According to the linear regression model, how much will $X$ change for each unit that increases $Y$?\nSolution $r^2=0.5$ and $b_{xy}=-\\frac{1}{3}$, so $X$ decreases 1/3 of the increase of $Y$. ","date":1540771200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1609746010,"objectID":"99f2fc30cabfe509c3a18fd72f648889","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-10-29/","publishdate":"2018-10-29T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-10-29/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 29, 2018\nQuestion 1 A study about obesity in a city has measured the body mass index (BMI) in a sample. The collected data is shown in the table below.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-10-29","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 31, 2018\nQuestion 1 The ages of a sample of patients of a physical therapy clinic are:\n25, 30, 44, 44, 51, 51, 53, 56, 57, 58, 58, 58, 59, 59, 61, 63, 63, 63, 66, 68, 70, 71, 72, 74, 82, 85\nCompute the quartiles.\nDraw the box plot and identify outliers (do not group data into intervals).\nSplit the sample into two groups, patients younger and older than 65. In which group is the mean more representative. 
Justify the answer.\nWhich distribution is less symmetric, the one of patients younger than 65 or the one of patients older?\nWhich age is relatively smaller with respect to its group, 50 years in the group of patients younger than 65 or 72 years in the group of patients older than 65?\nUse the following sums for the computations.\nYounger than 65: $\\sum x_i=953$ years, $\\sum x_i^2=52475$ years$^2$, $\\sum (x_i-\\bar x)^3=-30846.51$ years$^3$ and $\\sum (x_i-\\bar x)^4=939658.83$ years$^4$.\nOlder than 65: $\\sum x_i=588$ years, $\\sum x_i^2=43530$ years$^2$, $\\sum (x_i-\\bar x)^3=1485$ years$^3$ and $\\sum (x_i-\\bar x)^4=26983.5$ years$^4$.\nSolution $Q_1=53$ years, $Q_2=59$ years and $Q_3=68$ years. There are 2 outliers: 25, 30. Let $x$ be the age in patients younger than 65 and $y$ the age in patients older than 65.\n$\\bar x=52.9444$ years, $s_x^2=112.1636$ years$^2$, $s_x=10.5907$ years and $cv_x=0.2$.\n$\\bar y=73.5$ years, $s_y^2=39$ years$^2$, $s_y=6.245$ years and $cv_y=0.085$.\nThe mean is more representative in patients older than 65 since the coefficient of variation is smaller. $g_{1x}=-1.4426$ and $g_{1y}=0.7621$, thus the distribution of ages of people younger than 65 is less symmetric. The standard scores are $z_x(50)=-0.278$ and $z_y(72)=-0.2402$, thus 50 years is relatively smaller in the group of people younger than 65. 
Question 2 The table below shows the number of injuries of several teams during a league and the average warm-up time of its players.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Warm-up time} \u0026amp; 15 \u0026amp; 35 \u0026amp; 22 \u0026amp; 28 \u0026amp; 21 \u0026amp; 18 \u0026amp; 25 \u0026amp; 30 \u0026amp; 23 \u0026amp; 20 \\newline \\mbox{Injuries} \u0026amp; 42 \u0026amp; 2 \u0026amp; 16 \u0026amp; 6 \u0026amp; 17 \u0026amp; 29 \u0026amp; 10 \u0026amp; 3 \u0026amp; 12 \u0026amp; 20 \\newline \\hline \\end{array} $$\nDraw the scatter plot.\nWhich regression model is more suitable to predict the number of injuries as a function of the warm-up time, the logarithmic or the exponential? Use that regression model to predict the expected number of injuries for a team whose players warm up 20 minutes a day.\nWhich regression model is more suitable to predict the warm-up time as a function of the number of injuries, the logarithmic or the exponential? Use that regression model to predict the warm-up time required to have no more than 10 injuries in a league.\nAre these predictions reliable? Which one is more reliable?\nUse the following sums for the computations ($X$ warm-up time and $Y$ number of injuries):\n$\\sum x_i=237$, $\\sum \\log(x_i)=31.3728$, $\\sum y_j=157$, $\\sum \\log(y_j)=24.0775$,\n$\\sum x_i^2=5937$, $\\sum \\log(x_i)^2=98.9906$, $\\sum y_j^2=3843$, $\\sum \\log(y_j)^2=66.3721$,\n$\\sum x_iy_j=3115$, $\\sum x_i\\log(y_j)=519.1907$, $\\sum \\log(x_i)y_j=465.8093$, $\\sum \\log(x_i)\\log(y_j)=73.3995$.\nSolution $\\bar x=23.7$ min, $s_x^2=32.01$ min$^2$. $\\bar \\log(x)=3.1373$ log(min), $s_{\\log(x)}^2=0.0565$ log(min)$^2$. $\\bar y=15.7$ injuries, $s_y^2=137.81$ injuries$^2$. $\\bar \\log(y)=2.4078$ log(injuries), $s_{\\log(y)}^2=0.8399$ log(injuries)$^2$. $s_{x\\log(y)}=-5.1446$, $s_{\\log(x)y}=-2.6744$. Exponential determination coefficient: $r^2=0.9844$. Logarithmic determination coefficient: $r^2=0.9185$. 
So the exponential regression model is better to predict the number of injuries as a function of the warm-up time. Exponential regression model: $y=e^{6.2168-0.1607x}$.\nPrediction: $y(20)=20.1341$ injuries.\nThe logarithmic model is better to predict the warm-up time as a function of the number of injuries. Logarithmic regression model: $x=164.1851-47.3292\\log(y)$. Prediction: $x(10)=55.2056$ min.\nBoth predictions are very reliable since the determination coefficient is very high, but the last one is a little less reliable as it is for a value further from the data range.\nQuestion 3 An ultrasonic technique is used to diagnose a disease with a sensitivity of 91% and a specificity of 98%. The prevalence of the disease is 20%.\nIf we apply the technique to an individual and the outcome is positive, what is the probability of having the disease for that individual?\nIf the outcome was negative, what is the probability of not having the disease?\nIs this technique more reliable to confirm or to rule out the disease? Justify the answer.\nCompute the probability of having a correct diagnosis with this technique.\nSolution Let $D$ be the event of having the disease, and + and - the events of getting a positive and a negative outcome respectively in the test.\n$PPV=0.9192$. $NPV=0.9776$. It is more reliable to rule out the disease since the NPV is greater than the PPV. $P(D\\cap +)+P(\\overline D\\cap -) = 0.966$. 
Question 4 It is known that the femur length of a fetus with 25 weeks of pregnancy follows a normal distribution with mean 44 mm and standard deviation 2 mm.\nCompute the probability that the femur length of a fetus with 25 weeks is greater than 46 mm.\nCompute the probability that the femur length of a fetus with 25 weeks is between 46 and 49 mm.\nCompute an interval $(a,b)$ centered at the mean, such that it contains 80% of the femur lengths of fetuses with 25 weeks.\nSolution Let $X\\sim N(44,2)$ be the femur length of fetuses with 25 weeks of pregnancy.\n$P(X\u0026gt;46)=0.1587$. $P(46\u0026lt;X\u0026lt;49)=0.1524$. The interval centered at $44$ that contains 80% of the femur lengths of fetuses with 25 weeks is $(41.4369,46.5631)$. Question 5 The probability that an injury $A$ is repeated is $4/5$, the probability that another injury $B$ is repeated is $1/2$, and the probability that neither of them is repeated is $1/20$. Compute the probability of the following events:\nAt least one injury is repeated.\nOnly injury $B$ is repeated.\nInjury $B$ is repeated if injury $A$ has been repeated.\nInjury $B$ is repeated if injury $A$ has not been repeated.\nSolution $P(A\\cup B)=19/20$. $P(B\\cap\\overline{A})=3/20$. $P(B/A)=7/16$. $P(B/\\overline{A})=3/4$. Question 6 A physical therapy clinic opens 6 hours a day and the average number of patients that arrive at the clinic is 12 a day.\nCompute the probability that more than 4 patients arrive in 1 hour.\nIf the clinic has 4 physiotherapists and each of them can treat one patient per hour, what is the probability that in a day there is some hour in which some patient cannot be attended? How many physiotherapists must be in the clinic to guarantee that this probability is less than 10%?\nSolution Let $X$ be the number of patients that arrive in 1 hour. $X\\sim P(2)$ and $P(X\u0026gt;4)=0.0527$. Let $Y$ be the number of hours in a day in which some patient cannot be treated. 
$Y\\sim B(6, 0.0527)$ and $P(Y\u0026gt;0)=0.2771$.\nThe clinic requires 5 physiotherapists, since $P(X\u0026gt;5)=0.0166$ and $P(Y\u0026gt;0)=0.0954$, with $Y\\sim B(6, 0.0166)$ now. ","date":1527724800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"c366dd731d9f6a89fe137304002a8cee","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-05-31/","publishdate":"2018-05-31T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-05-31/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 31, 2018\nQuestion 1 The ages of a sample of patients of a physical therapy clinic are:\n25, 30, 44, 44, 51, 51, 53, 56, 57, 58, 58, 58, 59, 59, 61, 63, 63, 63, 66, 68, 70, 71, 72, 74, 82, 85","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2018-05-31","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: April 9, 2018\nQuestion 1 The chart below describes the distribution of the head arc of rotation (in degrees) in people working with and without computers.\nPlot the ogive of the head arc of rotation for people working with computers. If a person with a head arc of rotation less than or equal to 115 degrees is considered a person with reduced mobility, what percentage of people working with computers has reduced mobility? Which distribution has a more representative mean of the head arc of rotation, people working with computers or people not working with computers? Compute the global mean of the head arc of rotation. Which distribution is more asymmetric, people working with computers or people not working with computers? Which value of the head arc of rotation is relatively less, 150 degrees in people working with computers or 170 in people not working with computers? 
Use the following sums for the computations.\nWith computer: $\\sum x_i=3970$ degrees, $\\sum x_i^2=534750$ degrees$^2$, $\\sum (x_i-\\bar x)^3=103662.22$ degrees$^3$ and $\\sum (x_i-\\bar x)^4=7903715.56$ degrees$^4$.\nWithout computers: $\\sum x_i=4230$ degrees, $\\sum x_i^2=645900$ degrees$^2$, $\\sum (x_i-\\bar x)^3=-42359.69$ degrees$^3$ and $\\sum (x_i-\\bar x)^4=4101700.53$ degrees$^4$.\nSolution $F(115)=0.1667 \\rightarrow 16.67%$ of people working with computers have reduced mobility. With computer: $\\bar x=132.3333$ degrees, $s_x^2=312.8889$ degrees², $s_x=17.6887$ degrees and $cv_x=0.1337$ Without computer: $\\bar x=151.0714$ degrees, $s_x^2=245.2806$ degrees², $s_x=15.6614$ degrees and $cv_x=0.1037$ The mean of people working without computer is more representative than the mean of people working with computers since its coefficient of variation is smaller. $\\bar x=141.3793$. With computer $g_1=0.6243$ and without computer $g_1=-0.3938$. Therefore, the distribution of people working with computers is more asymmetric. Standard scores: $z(150)=0.9988$ and $z(170)=1.2086$. Therefore, an arc of rotation of 150 degrees in people working with computers is relatively smaller than an arc of rotation of 170 in people working without computers. Question 2 The concentration of a drug in blood $C$, in mg/dl, depends on time $t$, in hours, according to the following table:\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Time} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8\\newline \\mbox{Concentration} \u0026amp; 25 \u0026amp; 36 \u0026amp; 48 \u0026amp; 64 \u0026amp; 86 \u0026amp; 114 \u0026amp; 168\\newline \\hline \\end{array} $$\nWhich regression model, the linear or the exponential, is more reliable to predict the concentration of the drug as a function of time? Use the best model to predict the concentration of drug in blood after $4.8$ hours. 
Use the following sums for the computations:\n$\\sum x_i=35$, $\\sum \\log(x_i)=10.6046$, $\\sum y_j=541$, $\\sum \\log(y_j)=29.147$,\n$\\sum x_i^2=203$, $\\sum \\log(x_i)^2=17.5205$, $\\sum y_j^2=56937$, $\\sum \\log(y_j)^2=124.0131$,\n$\\sum x_iy_j=3328$, $\\sum x_i\\log(y_j)=154.3387$, $\\sum \\log(x_i)y_j=951.6961$, $\\sum \\log(x_i)\\log(y_j)=46.0805$.\nSolution Linear model of Concentration on Time: $\\bar x=5$ hours, $s_x^2=4$ hours² . $\\bar y=77.2857$ mg/dl, $s_y^2=2160.7755$ (mg/dl)². $s_{xy}=89$ hours⋅mg/dl.\nLinear coefficient of determination of Concentration on Time $r^2=0.9165$.\nExponential model of Concentration on Time: $\\overline{\\log(y)}=4.1639$ log(mg/dl), $s_{\\log(y)}^2=0.3785$ log(mg/dl)². $s_{x\\log(y)}=1.2291$ hours⋅log(mg/dl).\nExponential coefficient of determination of Concentration on Time $r^2=0.9979$.\nTherefore, the exponential model explains better than the linear one the relation between the concentration and time, since its coefficient of determination is greater.\nExponential model of Concentration on Time: $y=e^{2.6275 + 0.3073x}$. 
$y(4.8)=60.4853$ mg/dl.\n","date":1523232000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"f59a5652dc07feee9c0ee716219e2bb6","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-04-09/","publishdate":"2018-04-09T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-04-09/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: April 9, 2018\nQuestion 1 The chart below describes the distribution of the head arc of rotation (in degrees) in people working with and without computers.\nPlot the ogive of the head arc of rotation for people working with computers.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2018-04-09","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 19, 2018\nQuestion 1 Find an equation of the tangent plane to the surface $S: e^xy-zy^2+\\frac{x^4}{z}=-1$ at the point $P=(0,1,2)$. Find the tangent line to the curve obtained by the intersection of $S$ and the plane $z=2$ at the given point $P$. Solution Tangent plane: $x-3y-z+5=0$. Tangent line: $(3t,1+t)$ or $y=\\frac{x}{3}+1$. Question 2 An organism metabolizes (eliminates) alcohol at a rate of three times the amount of alcohol present in the organism per hour. If the organism does not have alcohol at initial time and it starts to get alcohol at a constant rate of 12 cl per hour; how much alcohol will be in the organism after 5 hours? What will be the maximum amount of alcohol in the organism? When will that maximum amount be achieved?\nSolution Let $y$ be the alcohol in the organism and $t$ the time.\nDifferential equation: $y\u0026rsquo;=12-3y$.\nSolution: $y(t)=4-4e^{-3t}$.\n$y(5)=3.99$ cl.\nThe maximum amount of alcohol will be 4 cl and it will be achieved at $t=\\infty$. 
Question 3 Three alleles (alternative versions of a gene) $A$, $B$ and $O$ determine the four blood types $A$ ($AA$ or $AO$), $B$ ($BB$ or $BO$), $O$ ($OO$) and $AB$. The Hardy-Weinberg Law states that the proportion of individuals in a population who carry two different alleles is\n$$ p(x,y,z)=2xy+2xz+2yz $$\nwhere $x$, $y$ and $z$ are the proportions of $A$, $B$ and $O$ in the population. Use the fact that $x+y+z=1$ to compute the maximum value of $p$.\nSolution There is a local maximum at $(\\frac{1}{3},\\frac{1}{3})$ and $f(\\frac{1}{3},\\frac{1}{3})=\\frac{2}{3}$. Question 4 Three substances interact in a chemical process in quantities $x$, $y$ and $z$. At equilibrium, the three quantities are related by the following equation:\n$$ \\ln z - \\frac{x^2y}{z}=-1 $$\nAssume $z$ is an implicit function of $x$ and $y$; compute the variation of $z$ when $x=y=z=1$ and $y$ decreases at the same rate as $x$ increases.\nSolution Directional derivative of $z$ in $(1,1,1)$ along $\\mathbf{v}=(1,-1)$: $z\u0026rsquo;_\\mathbf{v}(1,1,1)=\\frac{1}{2\\sqrt{2}}$. ","date":1516320000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"db8e1450fc62ef751b48ebd4ae3e9902","permalink":"/en/teaching/calculus/exams/pharmacy-2018-01-19/","publishdate":"2018-01-19T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2018-01-19/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 19, 2018\nQuestion 1 Find an equation of the tangent plane to the surface $S: e^xy-zy^2+\\frac{x^4}{z}=-1$ at the point $P=(0,1,2)$. 
Find the tangent line to the curve obtained by the intersection of $S$ and the plane $z=2$ at the given point $P$.","tags":["Exam"],"title":"Pharmacy exam 2018-01-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 19, 2018\nQuestion 1 A study done on a group of senior people to determine the relation between age $X$, and the number of visits to the doctor $Y$, shows the following results:\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Age} \u0026amp; 62 \u0026amp; 65 \u0026amp; 71 \u0026amp; 79 \u0026amp; 83 \u0026amp; 88 \u0026amp; 90 \u0026amp; 95\\newline \\mbox{No. of Visits} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 6 \u0026amp; 6 \u0026amp; 8 \u0026amp; 10 \u0026amp; 14\\newline \\hline \\end{array} $$\nDo the following:\nEstimate the number of times a 70-year-old patient will go to the doctor, according to a linear regression model. What will be the estimate equal to if you consider an exponential model instead of the linear one? Which of the two estimates is more reliable? A potential model has equation of the type $Y=aX^b$, where $a$ and $b$ are constants to be determined; what transformation should you apply to the variables $X$ and $Y$ to change a potential model into a linear one? Use the following sums for the computations: $\\sum x_i=633$, $\\sum \\log(x_i)=34.8835$, $\\sum y_j=53$, $\\sum \\log(y_j)=13.7827$, $\\sum x_i^2=51109$, $\\sum \\log(x_i)^2=152.28$, $\\sum y_j^2=461$, $\\sum \\log(y_j)^2=26.6206$, $\\sum x_iy_j=4509$, $\\sum x_i\\log(y_j)=1144.0108$, $\\sum \\log(x_i)y_j=235.1289$, $\\sum \\log(x_i)\\log(y_j)=60.7921$.\nSolution Linear model of Visits on Age: $\\bar x=79.125$ years, $s_x^2=127.8594$ years² . $\\bar y=6.625$ visits, $s_y^2=13.7344$ visits². $s_{xy}=39.4219$ years⋅visits. Regression line of Visits on Age: $y=-17.771 + 0.3083x$. 
$y(70) =3.8116$ visits.\n$\\overline{\\log(y)}=1.7228$ log(visits), $s_{\\log(y)}^2=0.3594$ log(visits)². $s_{x\\log(y)}=6.6823$ years⋅log(visits). Exponential model of Visits on Age: $y=e^{-2.4124 + 0.0523x}$. $y(70)=3.4762$ visits.\nLinear coefficient of determination of Visits on Age $r^2=0.885$. Exponential coefficient of determination of Visits on Age $r^2=0.9716$. Thus, the exponential model explains a little bit better the number of visits to the doctor with respect to the age.\nWe must apply the logarithm to both Visits and Age: $\\log(Y)=\\log(aX^b)\\Rightarrow \\log(Y)=\\log(a)+\\log(X^b)=\\log(a)+b\\log(X)=a\u0026rsquo;+b\\log(X)$.\nQuestion 2 The grass pollen concentration in the center of a city in grains/m$^3$ of air, during the last year, is given in the following table:\n$$ \\begin{array}{cr} \\hline \\mbox{Pollen concentration} \u0026amp; \\mbox{Num days}\\newline 0-300 \u0026amp; 51\\newline 300-500 \u0026amp; 60\\newline 500-600 \u0026amp; 79\\newline 600-800 \u0026amp; 91\\newline 800-1000 \u0026amp; 60\\newline 1000-1300 \u0026amp; 24\\newline \\hline \\end{array} $$\nHealth authorities have determined that the level of pollen did not pose a risk for 75% of the days in the year; what is the minimum level of pollen that is consider a health hazard? On days with pollen level between 575 and 860 health authorities issue a warning to citizens; on how many days of the last year there were warnings issued? Are there outliers in the above sample? Platanaceae has a pollen cycle similar to grass: if $X$ are the pollen levels of grass, and $Y$ are the levels of the platanaceae, it is known that $Y=0.5X-100$. What will be the average pollen level for platanaceae? Which of the two averages is more representative? Can one say that the level of grass pollen comes from a population that is normally distributed? 
Use the following sums for the computations: $\\sum x_i=220400$ grains/m$^3$, $\\sum x_i^2=159575000$ (grains/m$^3$)$^2$, $\\sum (x_i-\\bar x)^3=261917220.867$ (grains/m$^3$)$^3$ and $\\sum (x_i-\\bar x)^4=4872705679772.61$ (grains/m$^3$)$^4$.\nSolution $P_{75}=784.0417$ grains/m³. $F(575)=0.4664$ and $F(860)=0.8192$, so the frequency of days with a warning is $0.3528$, which corresponds to $128.77$ days. $Q_1=434.1849$ grains/m³, $Q_3=784.0417$ grains/m³ and $IQR=349.8568$ grains/m³. Fences: $F_1=-90.6001$ grains/m³ and $F_2=1308.8269$ grains/m³. Since all the values fall within the fences there are no outliers. $\\bar x=603.8356$ grains/m³, $s_x^2=72574.3291$ (grains/m³)², $s_x=269.3962$ grains/m³ and $cv_x=0.4461$. $\\bar y=201.9178$ grains/m³, $s_y=134.6981$ grains/m³ and $cv_y=0.6671$. The mean of $X$ is more representative than the mean of $Y$ as $cv_x\u0026lt;cv_y$. $g_1=0.0367$ and $g_2=-0.4654$. As both of them are between -2 and 2, we can assume that the pollen concentrations are normally distributed. Question 3 Pollen level in Madrid in the year 2017 is normally distributed with mean equal to 90 particles per cubic meter. In 42 days of 2017, the level was above 120 particles per cubic meter. Do the following:\nCompute the standard deviation of the pollen level in the year 2017. On how many days did the pollen level not go over 50 particles per cubic meter of air? On 20% of the days the level of pollen was high enough to pose a health risk for allergic people; what is the level of pollen that triggers this high risk situation? Solution Let $X$ be the pollen level in Madrid in 2017. $X\\sim N(90,\\sigma)$.\n$\\sigma=25$ grains/m³. $P(X\\leq 50)=0.0548$, which corresponds to $20.0017$ days. $P_{80}=111.0405$ grains/m³. Question 4 A study on two drugs to reduce the cholesterol levels in blood shows that drug $A$ is effective in 75% of the people, and drug $B$ is effective in 85% of the cases. 
For 5% of people neither of the two drugs works.\nCompute the percentage of the population for which only drug $A$ works. Assume that drug $A$ works on a person; what is the probability that drug $B$ will also work in that person? On the other hand, if drug $B$ has not worked for a person, what is the probability that drug $A$ will actually work? Are the effects of the two drugs independent events? Solution $P(A\\cap \\overline B)=0.1$, that is, $10%$. $P(B|A)=0.8667$. $P(A|\\overline B)=0.6667$. $P(B|A)\\neq P(B)$, thus the events are dependent. Question 5 The weekly average number of births in a hospital is equal to 14.\nCompute the probability that on a given day more than 2 births take place. Compute the probability that during a week there is more than one day without births. Solution Let $X$ be the number of births in a day. $X\\sim P(2)$. $P(X\u0026gt;2)=0.3233.$ Let $Y$ be the number of days without births in a week. $Y\\sim B(7,0.1353)$. $P(Y\u0026gt;1)=0.2427$. Question 6 A trial to develop a diagnostic test for a disease is run on 250 people, of which 50 suffer the disease and 200 are healthy. The medical team in charge of the trial wants the test to have a positive predictive value of $0.7$, and a negative predictive value of $0.9$.\nIn order to get the values given above, how many of the healthy people should get a positive outcome in the test? And how many of the sick people should get a negative outcome in the test? What is the probability that a person with two positive outcomes in the test has the disease? Solution Let $D$ be the event of having the disease.\n$P(+|\\overline{D})=0.0625\\Rightarrow 12.5$ persons. $P(-|D)=0.4165\\Rightarrow 20.825$ persons. $P(D|+\\cap +)=0.9561$. 
","date":1516320000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6176c3ec77be3d921871660423823b51","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-01-19/","publishdate":"2018-01-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-01-19/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 19, 2018\nQuestion 1 A study done on a group of senior people to determine the relation between age $X$, and the number of visits to the doctor $Y$, shows the following results:","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-01-19","type":"book"},{"authors":["Alfonso; Sánchez-Alberca, Alfredo; Sanchez-Rodríguez, María Luisa; García-Medina, José Javier Parra-Blesa"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"32a70383266fe72dd001692ee75901b5","permalink":"/en/publication/analisis-2018/","publishdate":"2020-09-16T21:26:03.202933Z","relpermalink":"/en/publication/analisis-2018/","section":"publication","summary":"","tags":[],"title":"Análisis epidemiológico evolutivo del daño sectorizado en la papila y retina papilar a través de OCT. 
Nueva clasificación de grados de GCAA.","type":"publication"},{"authors":["Alfonso Parra Blesa y Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"8640a1da22ff6713a8127e31da65afaf","permalink":"/en/publication/analisis-2018-2/","publishdate":"2020-09-16T21:26:03.500896Z","relpermalink":"/en/publication/analisis-2018-2/","section":"publication","summary":"","tags":[],"title":"Análisis estadístico inferencial y descriptivo de las capas retinianas; glaucoma versus no glaucoma","type":"publication"},{"authors":["Alfonso Parra Blesa y Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"5685a40d87b8cfe9d30a2750e895478b","permalink":"/en/publication/clasificacion-2018/","publishdate":"2020-09-16T21:26:03.396092Z","relpermalink":"/en/publication/clasificacion-2018/","section":"publication","summary":"","tags":[],"title":"Clasificación por estadios del glaucoma primario de ángulo abierto usando valores normalizados del anillo BMO","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"51e29658e8323bebe5e43b0869705ba6","permalink":"/en/publication/nueva-2018-2/","publishdate":"2020-09-16T21:26:03.299049Z","relpermalink":"/en/publication/nueva-2018-2/","section":"publication","summary":"","tags":[],"title":"Una nueva taxonomía de colecciones y de funciones de similitud para su comparación","type":"publication"},{"authors":["Alfredo 
Sánchez-Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"6135f07167be641ec45b3f1793a049e7","permalink":"/en/publication/nueva-2018/","publishdate":"2020-09-16T21:26:01.940216Z","relpermalink":"/en/publication/nueva-2018/","section":"publication","summary":"","tags":[],"title":"Una nueva taxonomía de colecciones y de funciones de similitud para su comparación","type":"publication"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 27, 2017\nQuestion 1 The following diagram show the NO₂ emissions (𝜇g/m³) in Madrid during the month of October, 2017.\nThe European Standards on Air Quality establish that the average monthly value cannot be over 40 𝜇g/m³ for a healthy environment. Was this requirement met during the month of October? Is the value computed representative of the measurements taken during the month of October? The Local Government of Madrid has set speed limits on those days with emissions measurements over 72 𝜇g/m³; furthermore, there will be additional parking restrictions if the level is over 92 𝜇g/m³. What percentage of days in October had only speed restrictions? According to the October sample shown, can we say that the distribution of the NO₂ emissions in the city of Madrid is normally distributed? Besides the NO₂ level, the Municipal Corporation also checks the level of SO₂, and it has found out that the average level of this substance during October was 2.85 𝜇g/m³, with a standard deviation equa to 0.42 𝜇g/m³. On a day with an NO₂ level of 46, and an SO₂ level of 2.24, which level should be considered higher? The Air Quality Index (AQI) is computed by multiplying the NO₂ level by 0.95, and adding 30 to the result. What was the average AQI in Madrid during the month of October? Is this value more or less representative than the average NO₂ level? 
Are there outliers in the NO₂ emissions in October? Justify your answer. Use the following data for your computations: $\\sum x_i=1945$ 𝜇g/m³, $\\sum x_i^2=131575$ (𝜇g/m³)$^2$, $\\sum (x_i-\\bar x)^3=93995.838$ (𝜇g/m³)³ and $\\sum (x_i-\\bar x)^4=7766271.021$ (𝜇g/m³)⁴.\nSolution $\\bar x=62.7419$ 𝜇g/m³, so the requirement was not met. $s^2=307.8044$ (𝜇g/m³)², $s=17.5444$ 𝜇g/m³, $cv=0.2796$. As the coefficient of variation is less than 0.3 there is a low variability and the mean is quite representative. $F(72)=0.7097$ and $F(92)=0.9161$, so the percentage of days with only speed restrictions is $20.64%$. $g_1=0.5615$ and $g_2=-0.3558$. As both of them are between -2 and 2, we can assume that the emissions are normally distributed. NO₂: $z(46)=-0.9543$. SO₂: $z(2.24)=-1.4524$. Thus, the NO₂ emission is relatively higher. Let $y=0.95x+30$ be the AQI. $\\bar y=89.6048$, $s_y=16.6671$, $cv=0.186$. As the coefficient of variation is lower, the AQI mean is more representative. $Q_1=49.5816$ 𝜇g/m³, $Q_3=74.0093$ 𝜇g/m³ and $IQR=24.4277$ 𝜇g/m³. Fences: $F_1=12.94$ 𝜇g/m³ and $F_2=110.65$ 𝜇g/m³. Thus, there are outliers. Question 2 The table below shows the flu incidence rate (per 100,000 people) registered after a number of days from the beginning of the study.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Days} \u0026amp; 1 \u0026amp; 5 \u0026amp; 8 \u0026amp; 12 \u0026amp; 20 \u0026amp; 26 \u0026amp; 38 \u0026amp; 44\\newline \\mbox{Flu rate} \u0026amp; 60 \u0026amp; 66 \u0026amp; 71 \u0026amp; 80 \u0026amp; 106 \u0026amp; 132 \u0026amp; 194 \u0026amp; 235\\newline \\hline \\end{array} $$\nEstimate the flu incidence rate 50 days after the beginning of the study with a linear regression model. What is the daily rate of change of the flu incidence rate, according to the linear model computed? Estimate the incidence rate 50 days after the beginning of the study with an exponential regression model. Which of the two estimates is more reliable? Why? 
Use the following data for your computations ($X=$Days and $Y=$Flu rate): $\\sum x_i=154$, $\\sum \\log(x_i)=19.8494$, $\\sum y_j=944$, $\\sum \\log(y_j)=37.2024$, $\\sum x_i^2=4690$, $\\sum \\log(x_i)^2=60.2309$, $\\sum y_j^2=140918$, $\\sum \\log(y_j)^2=174.8363$, $\\sum x_iy_j=25182$, $\\sum \\log(x_i)y_j=2795.2484$, $\\sum x_i\\log(y_j)=772.3504$, $\\sum \\log(x_i)\\log(y_j)=96.1974$.\nSolution Linear model of flu rate on days: $\\bar x=19.25$ days, $s_x^2=215.6875$ days² . $\\bar y=118$ people, $s_y^2=3690.75$ people². $s_{xy}=876.25$ days⋅people. Regression line of flu rate on days: $y=39.7951 + 4.0626x$. $y(50) =242.9247$.\n$4.0626$ persons per day.\n$\\overline{\\log(y)}=4.6503$ log(people), $s_{\\log(y)}^2=0.2293$ log(people)². $s_{x\\log(y)}=7.0255$ days⋅log(people). Exponential model of flu rate on days: $y=e^{4.0233 + 0.0326x}$. $y(50)=284.8357$.\nLinear coefficient of determination of flu rate on days $r^2=0.9645$. Exponential coefficient of determination of flu rate on days $r^2=0.9982$. 
Thus, the exponential model explains a little bit better the evolution of the flu rate with respect to the number of days.\n","date":1511740800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"74b1e63b5e7bd2e8817622bd9d83ab3a","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-11-27/","publishdate":"2017-11-27T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-11-27/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 27, 2017\nQuestion 1 The following diagram show the NO₂ emissions (𝜇g/m³) in Madrid during the month of October, 2017.\nThe European Standards on Air Quality establish that the average monthly value cannot be over 40 𝜇g/m³ for a healthy environment.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2017-11-27","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Nov 6, 2017\nQuestion 1 Adenoma is a benign tumor, which grows usually in spherical shape. Suppose the rate of growth of the radius of a certain adenoma is equal to half the size of the radius per second; compute the rate of growth of the volume of the tumor when the radius is 5mm.\nIf the measurement of the radius has a possible error of $\\pm 0.01$mm, what will be the error in the measurement of the volume?\nNote: The volume of a sphere of radius $r$ is equal to $\\frac{4}{3}\\pi r^3$.\nSolution Rate of growth of the volume: $250\\pi$ mm³/s.\nError in the volume: $\\pi$ mm³. Question 2 The weight of a baby during the first few months of life grows at a rate proportional to the reciprocal of the weight. Suppose a baby\u0026rsquo;s weight was 3.3 kg at birth, and 4.3 kg a month later.\nWhat will be the weight of the baby one year after birth? When will the weight be equal to 8 kg? Is this model of the weight good to determine the weight of a person during his whole life? 
Solution Let $t$ the time and $w(t)$ the weight of the baby at time $t$.\nDifferential equation: $w\u0026rsquo;=\\dfrac{k}{w}$\nParticular solution: $w(t)=\\sqrt{7.6t+10.89}$.\n$w(12)=10.1$ kg. At 7 months. No, because the function is always increasing. Question 3 The function $f(x,y)=ye^{-x^2-\\frac{1}{2}y^2}$ gives the quantity $z=f(x,y)$ of a substance during a chemical process, depending on the quantities $x$ and $y$ of two other substances.\nCompute the maximum value of $z$ assuming that $x\\geq 0$ and $y\\geq 0$. What will be the variation of $z$ at $x=1$ and $y=0$ when $x$ increases twice as much as $y$? Compute the second degree Taylor polynomial of $f$ at the point $(1,0)$. Solution $f$ has a local maximum at $(0,1)$ and the maximum value is $z=f(0,1)=1/\\sqrt{e}$. Directional derivative of $f$ at $(1,0)$ along the direction of $v=(2,1)$: $f\u0026rsquo;_v(1,0)=\\frac{1}{e\\sqrt{5}}$. $P^2_{f,(1,0)}(x,y)=\\displaystyle\\frac{-2xy+3y}{e}$. Question 4 Given $h(t)=(t\\cos(t), \\cos(t), \\ln(t^2+1)),$ compute the tangent line and normal plane to the trajectory determined by $h$ at the point $(0,1,0)$.\nSolution Tangent line: $(t,1,0)$. Normal plane: $x=0$. ","date":1509926400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"7c1f54c779298d9a2f22ac02e09e6e40","permalink":"/en/teaching/calculus/exams/pharmacy-2017-11-06/","publishdate":"2017-11-06T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2017-11-06/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Nov 6, 2017\nQuestion 1 Adenoma is a benign tumor, which grows usually in spherical shape. 
Suppose the rate of growth of the radius of a certain adenoma is equal to half the size of the radius per second; compute the rate of growth of the volume of the tumor when the radius is 5mm.","tags":["Exam"],"title":"Pharmacy exam 2017-11-06","type":"book"},{"authors":null,"categories":null,"content":"Cheat sheets for Calculus and Statistics are now available. These cheat sheets contain a summary of the main formulas used in Calculus and Statistics.\nThe cheat sheets can be downloaded from the following links:\nCalculus cheat sheets Statistics cheat sheets I would appreciate it if you informed me about any mistake that you detect in these sheets.\n","date":1509618110,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"0eaeafaa3a3d783c1ed1b386110fed16","permalink":"/en/post/cheat-sheet/","publishdate":"2017-11-02T10:21:50Z","relpermalink":"/en/post/cheat-sheet/","section":"post","summary":"Cheat sheets for Calculus and Statistics are now available. These cheat sheets contain a summary of the main formulas used in Calculus and Statistics.\n","tags":null,"title":"Calculus and Statistics cheat sheets","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 02, 2017\nQuestion 1 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. In the sample of persons studied, the average daily time using some of these devices was measured, along with whether or not the person had a cervical disc herniation (CDH). 
The table below summarizes the results.\n$$ \\begin{array}{crrr} \\hline \\mbox{Time (in min)} \u0026amp; \\mbox{People with CDH} \u0026amp; \\mbox{People without CDH} \u0026amp; \\mbox{Total}\\newline 0-60 \u0026amp; 2 \u0026amp; 32 \u0026amp; 34\\newline 60-120 \u0026amp; 5 \u0026amp; 86 \u0026amp; 91\\newline 120-180\t\u0026amp; 14 \u0026amp; 136 \u0026amp; 150\\newline 180-240\t\u0026amp; 21 \u0026amp; 127 \u0026amp; 148\\newline 240-300\t\u0026amp; 16 \u0026amp; 68 \u0026amp; 84\\newline 300-360\t\u0026amp; 10 \u0026amp; 12 \u0026amp; 22\\newline \\mbox{Total} \u0026amp; 68 \u0026amp; 461 \u0026amp; 529\\newline \\hline \\end{array} $$\nPlot the ogive of the global distribution of time (including people with CDH and without CDH). Plot the box plot of the global distribution of time and interpret it. In which sample is there less relative dispersion with respect to the mean, in people with CDH or in people without CDH? Which distribution is less symmetric, people with CDH or without CDH? Compute the standard score of a person with CDH that uses those devices 200 minutes a day and the same for a person without CDH. Interpret them. Use the following sums for the computations:\nPeople with CDH: $\\sum x_i=14640$, $\\sum x_i^2=3538800$, $\\sum(x_i-\\bar x)^3=-8746878.8927$.\nPeople without CDH: $\\sum x_i=78090$, $\\sum x_i^2=15650100$, $\\sum(x_i-\\bar x)^3=-3234289.0161$.\nSolution 2. People with CDH: $\\bar x=215.2941$ min, $s=75.4296$ min, $cv=0.3504$. People without CDH: $\\bar x=169.3926$ min, $s=72.4865$ min, $cv=0.4279$. Since the coefficient of variation of people with CDH is less than that of people without CDH, there is less relative spread with respect to the mean in the distribution of people with CDH. 
People with CDH: $g_1=-0.2997$.\nPeople without CDH: $g_1=-0.0184$.\nSince the coefficient of skewness of people with CDH is further from zero, the distribution is less symmetric.\nPerson with CDH: $z(200)=-0.2028$.\nPerson without CDH: $z(200)=0.4222$\nThe person with CDH has a value less than the mean but relatively closer to the mean than the person without CDH. Question 2 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. One goal of the research is determining if there is some relation between the average daily time using some of those devices and the number of cervical vertigo attacks in the last year. The table below shows the collected information in a sample of 12 persons.\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Time (min)} \u0026amp; 344 \u0026amp; 68 \u0026amp; 24 \u0026amp; 178 \u0026amp; 218 \u0026amp; 315 \u0026amp; 262 \u0026amp; 77 \u0026amp; 152 \u0026amp; 186 \u0026amp; 144 \u0026amp; 103\\newline \\mbox{Vertigo attacks} \u0026amp; 42 \u0026amp; 3 \u0026amp; 2 \u0026amp; 6 \u0026amp; 14 \u0026amp; 31 \u0026amp; 22 \u0026amp; 3 \u0026amp; 7 \u0026amp; 9 \u0026amp; 3 \u0026amp; 4\\newline \\hline \\end{array} $$\nWhich regression model is better to predict the number of vertigo attacks given the time using these devices, the linear or the exponential? Justify the answer. Use the best regression model (the exponential or the linear) to predict the number of vertigo attacks expected for a person that uses those devices 200 minutes every day. Which regression model would you use to predict the time using those devices required to have a number of vertigo attacks, the linear, the exponential or the logarithmic? Justify the answer. 
Use the following sums for the computations ($X$=Time and $Y$=Vertigo attacks):\n$\\sum x_i=2071$, $\\sum \\log(x_i)=59.3234$, $\\sum y_j=146$, $\\sum \\log(y_j)=24.2119$,\n$\\sum x_i^2=465587$, $\\sum \\log(x_i)^2=299.5558$, $\\sum y_j^2=3618$, $\\sum \\log(y_j)^2=60.1295$,\n$\\sum x_iy_j=38162$, $\\sum x_i\\log(y_j)=5252.95$, $\\sum \\log(x_i)y_j=800.3072$, $\\sum \\log(x_i)\\log(y_j)=127.0449$.\nSolution Linear regression model of vertigo attacks on time: $\\bar x=172.5833$ min, $s_x^2=9013.9097$ min².\n$\\bar y=12.1667$ attacks, $s_y^2=153.4722$ attacks².\n$s_{xy}=1080.4028$ min⋅attacks.\n$r^2 = 0.8438$.\nExponential regression model of vertigo attacks on time: $\\overline{\\log(y)}=2.0177$ log(attacks), $s_{\\log(y)}^2=0.9398$ log(attacks)². $s_{x\\log(y)}=89.5312$ min⋅log(attacks). $r^2 = 0.9462$.\nTherefore, the exponential regression model is better since its coefficient of determination is higher.\nExponential regression model of vertigo attacks on time: $y=e^{0.3035 + 0.0099x}$.\nNumber of vertigo attacks expected for 200 min using electronic gadgets: $y(200)=9.8747$.\nSince the exponential regression model is better than the linear one to predict the number of vertigo attacks as a function of time using electronic gadgets, to predict the time as a function of the number of vertigo attacks it is better to use the inverse of the exponential regression model, that is, the logarithmic regression model.\nQuestion 3 Cervical radiculopathy occurs in 0.35% of men. The Spurling test is a test to diagnose cervical radiculopathy with a sensitivity of 95% and a specificity of 93%.\nCompute the positive and negative predictive values of the test and interpret them. Is this test a good test as a screening test (to rule out the disease)? Compute the minimum specificity of the test to be able to diagnose the cervical radiculopathy with a positive outcome. Solution $PPV=P(D|+)=0.0455$. $NPV=P(\\overline D|-)=0.9998$. 
It is a good screening test as the post test probability of not having the cervical radiculopathy for a negative outcome is very high. Minimum specificity $P(-|\\overline D)=0.9967$. Question 4 The haematocrit concentration in blood of healthy males follows a normal distribution with mean and standard deviation not known. However, it is known that the first quartile of haematocrit is 38.5% and the third quartile is 52%.\nCompute the mean and the standard deviation of haematocrit in healthy males. Compute the percentage of healthy males with more than 64 of haematocrit. Solution Naming $X$ to the haematocrit level in healthy males,\n$\\mu=45.25$ and $\\sigma=10.07$, thus, $X\\sim N(45.25, 10.07)$.\n$P(X\u0026gt;64)=0.0313$, thus, a $3.13$% of healthy males. Question 5 It is known that 20% of professional cyclists use Erythropoietin (EPO) to improve their physical performance, and 99% of the cyclists that use EPO, also use other forbidden substances to mask the use of EPO.\nIf there are 10 professional cyclists in a team, what is the probability that more than 2 are doped with EPO? If there are 100 professional cyclists doped with EPO in a competition, what is the probability that at least 98 of them had taken some substances to mask the use of EPO? If there are 2000 professional cyclists in a country, what is the probability that some of them have taken EPO without masking it? Solution Naming $X$ to the number of cyclists doped with EPO in a team with 10 cyclists, $X\\sim B(10,0.2)$ and $P(X\u0026gt;2)=0.3222$. Naming $Y$ to the number of cyclists that have taken some substances to mask the EPO in 100 cyclists doped with EPO, $Y\\sim B(100,0.99)$ and $P(Y\\geq 98)=0.9206$. Naming $Z$ to the number of cyclists that have taken EPO without masking it in 2000 cyclists, $Z\\sim B(2000,0.002)\\approx P(4)$ and $P(Z\u0026gt;0)=0.9817$. 
Question 6 The probability that an injury $A$ is repeated is 4/5, the probability that another injury $B$ is repeated is 1/2, and the probability that both injuries are repeated is 1/3. Compute the probability of the following events:\nOnly injury $B$ is repeated. At least one injury is repeated. Injury $B$ is repeated if injury $A$ has been repeated. Injury $B$ is repeated if injury $A$ has not been repeated. Are the injuries independent? Solution $P(B\\cap\\overline A)=1/6$. $P(A\\cup B)=29/30$. $P(B\\vert A)=5/12$. $P(B\\vert \\overline A)=5/6$. The injuries are dependent as $P(B|A)\\neq P(B)$. ","date":1496361600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"25d9a4ba8e47ff15b3867e60e0629e2f","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-06-02/","publishdate":"2017-06-02T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-06-02/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 02, 2017\nQuestion 1 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. In the sample of people studied, the average daily time using some of these devices was measured, along with whether or not the person had a cervical disc herniation (CDH).","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2017-06-02","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 19, 2017\nProbability and random variables Question 1 The prevalence of sciatica in a population is 3%. The Lasegue\u0026rsquo;s test is a neurotension test that is used to diagnose the sciatica with a sensitivity of 91% and a specificity of 26%. On the other hand, there is an alternative test with a sensitivity of 80% and a specificity of 90%.\nCompute the positive predictive value for the Lasegue\u0026rsquo;s test. 
Assuming that the tests are independent, compute the probability of having a positive outcome in both tests. Compute the probability of getting a wrong diagnosis in the Lasegue\u0026rsquo;s test or in the alternative test. Which test is better as a screening test (to rule out the sciatica)? Solution $PPV=P(D|+)=0.0366$. It is not a good test to confirm the sciatica as the post test probability of having the sciatica for a positive outcome is very low. Naming $L^+$ to the event of having a positive outcome in Lasegue\u0026rsquo;s test and $A^+$ to the event of having a positive outcome in the alternative test: $P(L^+\\cap A^+)=P(L^+)P(A^+)=0.7451\\cdot 0.121 = 0.0902$. Naming $WL$ to the event of having a wrong diagnosis with Lasegue\u0026rsquo;s test and $WA$ to the event of having a wrong diagnosis with the alternative test: $P(WL\\cup WA)=P(WL)+P(WA)-P(WL\\cap WA)=0.7205+ 0.103-0.7205\\cdot0.103=0.7493$. Lasegue test: $NPV=P(\\overline D|-)=0.9894$. Alternative test: $NPV=P(\\overline D|-)=0.9932$. Thus, the alternative test is better to rule out the sciatica. Question 2 A physiotherapist opens a clinic and uses social networks to advertise it. In particular, he sends a friend request to 20 contacts on Facebook. If the probability that a Facebook user accepts the friend request is 80%, what is the probability that more than 18 accept the friend request? What is the expected number of friend requests accepted?\nSolution Naming $X$ to the number of accepted friend requests, $X\\sim B(20,0.8)$ and $P(X\u0026gt;18)=0.0692$. The expected number of accepted friend requests is $16$. Question 3 According to a study of the Information Society of Spain in 2013, Spaniards check their mobile phone 150 times a day on average. What is the probability that a Spanish person checks the mobile phone more than 2 times an hour?\nSolution Naming $X$ to the number of times that a Spanish person checks the phone in an hour, $X\\sim P(6.25)$ and $P(X\u0026gt;2)=0.9483$. 
Question 4 The cervical rotation in a population follows a normal probability distribution model with mean 58º and standard deviation 6º.\nBetween what values is the cervical rotation of the central 50% of the population? Taking into account the precision of the measurement instrument, a goniometer, a rotation less than 53º is considered a mobility limitation. If we take a random sample of 100 persons from this population, what is the expected number of persons with mobility limitation in the sample? Solution Naming $X$ to the cervical rotation, $X\\sim N(58, 6)$.\n$(Q1,Q3)=(53.9531, 62.0469)$. $P(X\u0026lt;53)=0.2023$ and the expected number of persons with mobility limitation in a sample of 100 persons is $20.2328$. ","date":1495152000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"80f5e75bcf897805a50ba29cbb3590f3","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-05-19/","publishdate":"2017-05-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-05-19/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 19, 2017\nProbability and random variables Question 1 The prevalence of sciatica in a population is 3%. 
The Lasegue\u0026rsquo;s test is a neurotension test that is used to diagnose the sciatica with a sensitivity of 91% and a specificity of 26%.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2017-05-19","type":"book"},{"authors":null,"categories":null,"content":"In this post I offer some advice or tips that can help undergraduate students, especially during their first years, to be successful making the most of classes and learning to maximize their work.\nI give these pieces of advice from an extensive teaching background and from my own experience as a student.\nMost of these tips may seem obvious, but many students fail to put them into practice.\nSelf Motivation\nMotivation is the key to success in every project, not only in your academic education, but also in your professional life. Without motivation you are likely to give up when you face the first learning difficulties or troubles, and there will certainly be some. Therefore, you should bear in mind the reasons that made you start down this path (especially when you have difficulties). That is, try to walk with one eye on the goal and the other on the path.\nA degree requires a lot of effort, but you must remember that many people, even with fewer capabilities than you, have finished it successfully.\nBe proactive This is the big challenge, because in primary or secondary school students get used to following the steps of the teacher, and it is the teacher who usually drives the learning process. But higher education is not compulsory, and here it is the student who has to take the initiative in driving his/her learning.\nWhat does it mean to be proactive Explore the subject by yourself. Prepare classes in advance. Try to solve problems by yourself. Expand the information given by the teacher from other sources. Try to apply what you learn to your life or context. Let your curiosity and creativity loose. 
The teacher is your ally Unfortunately, students sometimes think of the teacher as a judge or an enemy. But nothing could be further from the truth, because the main purpose of the teacher is to help you in your learning process. You have to change your mind and think of your teacher as your ally. Thus, do not hesitate to ask him or her for help any time you need it.\nMake the most of your classes Classroom sessions are not the only way to learn, but they are one of the most effective. Attending class just to sign the attendance list could be counterproductive. Thus, if you attend a class, forget about other subjects and try to focus all your attention on the topic that the class is about.\nAlso, you need to know how to make proper use of every type of class. A master class tries to give you a general idea about a topic, highlighting the most important concepts. So, try to retain the key ideas and do not worry about the details.\nA seminar, on the other hand, tries to develop a topic in depth. So it is a good idea to prepare the class in advance by having a prior look at the topic. This way you can focus on the most difficult aspects of the topic during the class.\nFinally, in a problem-solving workshop it is important to try to solve the problems beforehand in order to find out the main difficulties or bottlenecks.\nAlso, you should ask your teacher to flip the class.\nWhat is a flipped classroom? The flipped classroom describes a reversal of traditional teaching where students gain first exposure to new material outside of class, usually via reading or lecture videos, and then class time is used to do the harder work of assimilating that knowledge through strategies such as problem-solving, discussion or debates.\nDo not be afraid to ask Related to the previous item, the interaction among the students and between the teacher and the students is essential to take advantage of classes. 
The teacher cannot guess whether you have understood a concept or are lost unless you give him or her some feedback. So, try to overcome your shyness and ask without any fear of ridicule, because most of the time your doubts are going to be the same as those of your classmates, and even if not, remember that the only stupid question is the one that you don\u0026rsquo;t ask.\nFeedback is an essential aspect to ensure solid progress in any learning process.\nTake advantage of tutoring One of the main advantages of a private university like this is the availability and closeness of the teachers. If it is clear that the teacher is your ally, you should not hesitate to use tutorials whenever you have a problem.\nEach student has a personal tutor whose function is to inform and advise him or her and to resolve any academic question that arises, from the course registration until the exams. If you do not know who your tutor is, ask for him or her at the secretariat and arrange an interview with him or her as soon as possible.\nAny time you have a problem that you do not know how to solve, ask your tutor for help. Even if you do not have any problems, it is good practice to have regular meetings with your tutor to check the course progress.\nOn the other hand, every subject has its own tutoring hours. Those tutorials are usually held at the office of the subject teacher.\nUse tutoring hours for \u0026hellip;\nGetting advice from the teacher about how to face the subject or study a topic. Reviewing concepts that you do not understand (especially if you missed a class). Getting help to solve difficult problems. Reviewing your tests or exams. But \u0026hellip;\nTutoring hours are not private classes. This means that you have to go to tutorials with a specific doubt or problem, which requires some previous work on the question. Respect the tutoring schedule. 
Most teachers promote tutorials not only to help students, but also because they are an effective way to interact with the student and to get to know each other better. However, teachers usually have other occupations in addition to teaching. So, in order not to interrupt their work, try not to go to tutorials outside their schedule. If you cannot go to a tutorial during those hours, ask the teacher for an appointment. Read and write Another key to success is to have good documentation that complements what is seen in class. Many students believe that the only important information to consider is that provided by the teacher. But this is a mistake, because the information provided by the teacher is limited, incomplete, and sometimes wrong (unfortunately teachers also make mistakes). So, try to complement the class notes with recommended readings, because good documentation can help you not only to contrast what is seen in class with other sources, but also to understand better what has been explained, and to expand it with new examples, discovering new applications, etc.\nFor each subject you should have two or three reference books. You have some recommendations in the bibliography of the course guides. Have a look at those books (most of them are available at the university library), but if you do not feel comfortable with any of them, ask the teacher for other books or try to find them by yourself on the Internet, because each student has to find his or her own book!\nOn the other hand, writing about a subject is very important for structuring and settling its main ideas. There is clear evidence that writing about a subject helps you to organize, clarify and fix the ideas in your mind.\nIt is helpful to write summaries and schemas with the key ideas of a class or a topic. 
Another good practice is to make small presentations for every chapter or topic.\nFinally, you should take into account that most assessment tests are written, so writing about the subject is good training for exams.\nWork in groups Group work is important, not only from the point of view of learning, but also because it helps to develop social habits. In most jobs it is usual to work in teams, since some tasks or problems are simplified and easier to solve working cooperatively. Something similar happens with teaching, since collaborative learning is usually faster and better (and more fun).\nIn this way, working in groups has many advantages:\nHelps you to develop social skills. Reinforces the motivation of being part of a team. Enriches the learning with different points of view. Helps to develop critical reasoning. But working in a group is not easy (especially when somebody does not assume his/her responsibility). Acquiring the skills to work in a group requires time and practice, so the sooner you start the better.\nGet organized Review regularly the progress of the course It is important during the course to set aside some moments to analyze the progress of your work and to review the path travelled and the path ahead. These reviews are intended as a small self-assessment of the learning process: to what extent the objectives set are being met and whether we are being faithful to the guidelines set by this survival manual. Thereby, we can identify the main difficulties in learning and what is failing in order to correct the course in time. In the worst case, if we fail to pass any subject, it is very important to make a final review trying to identify the causes, to learn from mistakes and not repeat them in the future. Do not be discouraged, and remember that even if you have not passed the subject, you have surely not completely wasted your time. 
Be positive and think about everything you have learned that will undoubtedly provide you with a valuable experience for your life.\nReview regularly these tips Finally, do not forget to review and refresh these tips.\n","date":1493018510,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"7178c1061630403e50e1fc59e98c026d","permalink":"/en/post/survival-guide-for-degrees/","publishdate":"2017-04-24T09:21:50+02:00","relpermalink":"/en/post/survival-guide-for-degrees/","section":"post","summary":"In this post I offer some advice or tips that can help undergraduate students, especially during their first years, to be successful making the most of classes and learning to maximize their work.\n","tags":null,"title":"Survival guide for degrees","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy","Medicine"],"content":"Degrees: Physiotherapy, Medicine\nDate: March 31, 2017\nQuestion 1 The table below gives the distribution of points obtained by students in a physiotherapy public competition this year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.50 \u0026amp; 638140781.25\\newline (40,80] \u0026amp; 185 \u0026amp; 11100 \u0026amp; 666000 \u0026amp; -361328.13 \u0026amp; 4516601.56\\newline (80,120] \u0026amp; 72 \u0026amp; 7200 \u0026amp; 720000 \u0026amp; 1497375.00 \u0026amp; 41177812.50\\newline (120,160] \u0026amp; 40 \u0026amp; 5600 \u0026amp; 784000 \u0026amp; 12301875.00 \u0026amp; 830376562.50\\newline (160,200] \u0026amp; 19 \u0026amp; 3420 \u0026amp; 615600 \u0026amp; 23603640.63 \u0026amp; 2537391367.19\\newline \\hline \\sum \u0026amp; 400 \u0026amp; 29000 \u0026amp; 2819200 \u0026amp; 24886500.00 \u0026amp; 4051603125.00\\newline \\hline \\end{array} $$\nCompute the interquartile range and 
explain your result. Are there outliers in the sample? The minimum number of points to pass the exam is 150; what percentage of students passed the exam? If the mean of the score of the previous year exam was 80 points and the standard deviation was 52 points, in which year is the mean more representative? Justify the answer. According to the values of skewness and kurtosis, can we assume that the sample has been taken from a normally distributed population? What score is relatively higher, 150 points in this year exam or 160 in the previous year exam? Justify the answer. Solution $Q_1=43.48$ points, $Q_3=97.78$ points and $IQR=54.3$ points. Fences: $F_1=-37.97$ points and $F_2=179.23$ points. Thus, there are outliers. $F_{150}=0.925$, so the percentage of students that passed the exam is $7.5$%. This year: $\\bar x=72.5$ points, $s^2=1791.75$ points², $s=42.3291$ points, $cv=0.5838$. Previous year: $\\bar x=80$ points, $s=52$ points, $cv=0.65$. As the coefficient of variation of this year is less than the one of the previous year, there is less relative spread this year and the mean is more representative. $g_1=0.8203$, so the distribution is right-skewed. $g_2=0.1551$, so the distribution is a little bit more peaked than a bell curve (leptokurtic). As $g_1$ and $g_2$ are between -2 and 2 we can assume that the sample has been taken from a normally distributed population. This year standard score: $z(150)=1.83$. Previous year standard score: $z(160)=1.53$. As the standard score of 150 this year is greater than the standard score of 160 the previous year, 150 points this year is relatively higher than 160 points the previous year. Question 2 A study tries to determine the relation between obesity and the response to pain. The obesity is measured as the percentage over the ideal weight ($X$), and the response to pain with a measure of the twinge sensation. 
For a sample of 10 individuals we got the following sums:\n$\\sum x_i=737$, $\\sum y_j=77$, $\\sum x_i^2=55589$, $\\sum y_j^2=799.5$, $\\sum x_iy_j=6056.5$\nCompute the linear regression model of the response to pain on the obesity. What is the change in the response to pain for an increment of one point in the weight? What percentage of the variability of the response to pain does not explain the linear regression model? Taking into account the parameters of the exponential model given in the table below, give the equation of the exponential model. Which transformation is required to convert this model into a linear one? $$ \\begin{array}{lr} \\hline \\mbox{Coefficient} \u0026amp; \\mbox{Estimation}\\newline \\mbox{Intercept} \u0026amp; -1.772\\newline x \u0026amp; 0.049\\newline \\hline \\end{array} \\qquad \\begin{array}{r} \\hline R^2\\newline 0.72\\newline \\hline \\end{array} $$\nWhat is the expected response to pain for an obesity of 50% according to the linear model? And according to the exponential model? Which prediction is more reliable? Solution Linear model of response to pain on obesity: $\\bar x=73.7$, $s_x^2=127.21$. $\\bar y=7.7$, $s_y^2=20.66$. $s_{xy}=38.16$ Regression line of response to pain on obesity: $y=-14.41+0.3x$. For each increment of one unit in the obesity the response to pain will increase 0.3 units. Linear coefficient of determination: $r^2=0.554$. So, the linear model explains the 55.4% of the variability of the response to pain and it does not explain the remaining 44.6%. Exponential regression model: $y=e^{-1.772+0.049x}$. To compute this model you have to apply the logarithm to the dependent variable, that is, the response to pain and then compute the regression line of the logarithm of the response to pain on obesity. 
Prediction with the linear model: $y(50)=0.59$ Prediction with the exponential model: $y(50)=1.9699$ The prediction with the exponential model is better as the exponential coefficient of determination is greater than the linear one. ","date":1490918400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"3ac241dbb67a86fa4f23770cd0210b3d","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-03-31/","publishdate":"2017-03-31T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-03-31/","section":"teaching","summary":"Degrees: Physiotherapy, Medicine\nDate: March 31, 2017\nQuestion 1 The table below gives the distribution of points obtained by students in a physiotherapy public competition this year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2017-03-31","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 10, 2017\nQuestion 1 The rate of growth of certain bacteria population is the square root of the number of bacteria in the population. How much will the population have increased after 1 hour from the beginning of the growth? How long will it take until the population is four times the population at the beginning?\nSolution Naming $x$ to the number of bacteria and $t$ to time, $x(t)=(\\frac{t}{2}+C)^2$.\nThe number of bacteria has increased $\\frac{1}{4}+C$ after 1 hour from the beginning.\nThe number of bacteria is four times the population at the beginning at time $t=2C$. 
Question 2 The temperature of a chemical process depends on the amounts $x$ and $y$ of two substances, according to the function $T(x,y)=4x^3+y^3-3xy$. Determine the local extrema and saddle points of the temperature function (recall that the amounts $x$ and $y$ cannot be negative).\nSolution $T$ has a saddle point at $(0,0)$ and a local minimum at $(\\frac{\\sqrt[3]{4}}{4},\\frac{\\sqrt[3]{2}}{2})$. Question 3 An ecological model explains the number of individuals in a population through the function $$f(x,t)=\\dfrac{e^t}{x},$$ where $t$ is the time and $x$ the number of predators in the area. Give an approximation of the number of individuals at $t=0.1$ and $x=0.9$ using the second order Taylor polynomial of the function at the point $(1,0)$.\nSolution Second order Taylor polynomial of $f$ at point $(1,0)$: $P^2_{f,(1,0)}(x,t)=3-3x+2t+x^2+\\frac{t^2}{2}-xt$.\n$P^2_{f,(1,0)}(0.9,0.1)=1.225$. Question 4 The position of a moving object in space is given by the function $f(t)=(e^{t/2}, \\sin^2(t), \\sqrt[3]{1-t})$.\nCompute the velocity and acceleration vectors at time $t=0$.\nRemark: velocity is the variation of space with respect to time, and acceleration is the variation of velocity with respect to time. Compute an equation of the plane normal to the trajectory at time $t=0$. Solution $f\u0026rsquo;(t)=(\\frac{e^{t/2}}{2},2\\sin t \\cos t, \\frac{-(1-t)^{-2/3}}{3})$ and $f\u0026rsquo;(0)=(\\frac{1}{2},0,-\\frac{1}{3})$. $f\u0026rsquo;\u0026rsquo;(t)=(\\frac{e^{t/2}}{4},2(\\cos^2 t-\\sin^2 t), \\frac{-2(1-t)^{-5/3}}{9})$ and $f\u0026rsquo;\u0026rsquo;(0)=(\\frac{1}{4},2,-\\frac{2}{9})$. Normal plane to the trajectory at time $t=0$: $3x-2z=1$. 
","date":1484006400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"b0db71b6201932c9f355cf651505de25","permalink":"/en/teaching/calculus/exams/pharmacy-2017-01-10/","publishdate":"2017-01-10T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2017-01-10/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 10, 2017\nQuestion 1 The rate of growth of certain bacteria population is the square root of the number of bacteria in the population. How much will the population have increased after 1 hour from the beginning of the growth?","tags":["Exam"],"title":"Pharmacy exam 2016-01-10","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 10, 2017\nDescriptive Statistics and Regression Question 1 The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.\n$$ \\begin{array}{cr} \\hline \\mbox{Time} \u0026amp; \\mbox{Patients}\n(0,10] \u0026amp; 22\\newline (10,20] \u0026amp; 43\\newline (20,30] \u0026amp; 33\\newline (30,40] \u0026amp; 12\\newline (40,50] \u0026amp; 6\\newline (50,60] \u0026amp; 4\\newline \\hline \\end{array} $$\nPlot the ogive of the waiting time. Compute the median of the distribution, and explain its meaning. What percentage of patients have waited for longer than 38 minutes? Solution 2. $Me=18.89$ min. 3. 10% of patients have waited for longer than 38 minutes. Question 2 To study fertility in two different populations $A$ and $B$, a sample of each population was taken and the number of pregnancies for each woman was recorded. 
The results of such records are shown below.\n$$ \\begin{array}{ccccccccccccccccc} \\hline A \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; 2 \u0026amp; 6 \u0026amp; 1 \u0026amp; 5 \u0026amp; 3 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; 2 \u0026amp; 5 \u0026amp; 0\\newline B \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 0 \u0026amp; 3 \u0026amp; 0 \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 5 \u0026amp; 1 \u0026amp; 1 \u0026amp; 1\\newline \\hline \\end{array} $$\nDraw the box diagram of each sample and compare them. In which of the two samples is the mean more representative? Justify your answer. Compute the skewness coefficient for both samples; which one is more skewed? What is relatively bigger, a case of 5 pregnancies in sample $A$, or a case of 3 pregnancies in sample $B$? Consider the following sums for your computations:\n$\\sum a_i=51$, $\\sum a_i^2=199$, $\\sum (a_i-\\bar a)^3=-11.6016$, $\\sum (a_i-\\bar a)^4=217.9954$,\n$\\sum b_i=20$, $\\sum b_i^2=52$, $\\sum (b_i-\\bar b)^3=49.5$, $\\sum (b_i-\\bar b)^4=220.3125$.\nSolution 2. $\\bar a=3.1875$ pregnancies, $s_a^2=2.2773$ pregnancies², $s_a=1.5091$ pregnancies, $cv_a=0.4734$. $\\bar b=1.25$ pregnancies, $s_b^2=1.6875$ pregnancies², $s_b=1.299$ pregnancies, $cv_b=1.0392$. As the coefficient of variation of $A$ is less than the coefficient of variation of $B$, the mean of population $A$ is more representative than the mean of population $B$. 3. $g_{1,a}=-0.211$ and $g_{1,b}=1.4113$, so the distribution of $B$ is more skewed than the distribution of $A$. 5. $z_a(5)=1.2011$ and $z_b(3)=1.3472$, so 3 pregnancies is relatively bigger in population $B$ than 5 pregnancies in population $A$. Question 3 A study to find the relation between the reduction in cholesterol levels in blood and exercise has been carried out. 
The results are shown in the table below.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Minutes of exercise} \u0026amp; 96 \u0026amp; 106 \u0026amp; 163 \u0026amp; 207 \u0026amp; 227 \u0026amp; 244 \u0026amp; 261 \u0026amp; 271 \u0026amp; 272 \u0026amp; 301\\newline \\mbox{Cholesterol reduction (mg/dl)} \u0026amp; 4 \u0026amp; 5 \u0026amp; 8 \u0026amp; 13 \u0026amp; 15 \u0026amp; 17 \u0026amp; 22 \u0026amp; 39 \u0026amp; 31 \u0026amp; 45\\newline \\hline \\end{array} $$\nWhich regression model better explains the reduction of cholesterol as a function of the exercise time, the linear or the exponential? Justify the answer. According to the linear regression model, how much will be the reduction in cholesterol when the exercise time is increased by one minute? According to the logarithmic model, how much exercise time is needed to get a reduction of cholesterol of 100 mg/dl? Is this estimation reliable? Justify your answer. Consider the following values for your computations, where $X$=exercise time in minutes, and $Y$=cholesterol reduction: $\\sum x_i=2148$, $\\sum \\log(x_i)=53.0559$, $\\sum y_j=199$, $\\sum \\log(y_j)=27.1766$,\n$\\sum x_i^2=507082$, $\\sum \\log(x_i)^2=282.9578$, $\\sum y_j^2=5779$, $\\sum \\log(y_j)^2=80.035$,\n$\\sum x_iy_j=50750$, $\\sum x_i\\log(y_j)=6359.0468$, $\\sum \\log(x_i)y_j=1097.978$, $\\sum \\log(x_i)\\log(y_j)=147.0682$.\nSolution 1. Linear regression model of cholesterol reduction on exercise time: $\\bar x=214.8$ min, $s_x^2=4569.16$ min². $\\bar y=19.9$ mg/dl, $s_y^2=181.89$ (mg/dl)². $s_{xy}=800.48$ min⋅mg/dl. $r^2 = 0.771$. Exponential regression model of cholesterol reduction on exercise time: $\\overline{\\log(y)}=2.7177$ log(mg/dl), $s_{\\log(y)}^2=0.6178$ log(mg/dl)². $s_{x\\log(y)}=52.1504$ min⋅log(mg/dl). $r^2 = 0.9635$. Therefore, the exponential regression model is better since its coefficient of determination is higher. 2. Regression line of cholesterol reduction on exercise time: $y=-17.7312 + 0.1752x$. 
The cholesterol reduction increases 0.1752 mg/dl when the exercise time is increased by one minute. 3. Logarithmic regression model of exercise time on cholesterol reduction: $x=-14.6075 + 84.4135\\log(y)$. $x(100)=374.131$ min. Although the coefficient of determination is pretty close to 1, the estimation is not reliable since 100 mg/dl is far away from the range of values in the sample. Probability and random variables Question 4 The medical emergency service of a town gets 6 requests per day on average. This service is staffed with three shifts of 8 hours each.\nCompute the probability of getting more than 3 requests in an 8-hour shift. Compute the probability that in at least one of the three shifts there are no requests. Solution Letting $X$ be the number of requests in an 8-hour shift, $X\\sim P(2)$ and $P(X\u0026gt;3)=0.1429$. Letting $Y$ be the number of shifts with no requests, $Y\\sim B(3,0.1353)$ and $P(Y\u0026gt;0)=0.3535$. Question 5 The prevalence of a certain disease in a population is 10%. A diagnostic test for that disease has a sensitivity of 95% and a specificity of 85%.\nCompute the positive and negative predictive values and explain the result obtained. What is the test more useful for, to detect the disease or to rule it out? What should be the specificity of the test so that the test has a positive predictive value equal to 80%? Solution $PPV=P(D|+)=0.413$ and $NPV=P(\\overline D|-)=0.9935$. The specificity should be $97.37%$. Question 6 In a study of blood pressure on 8000 individuals, it has been recorded that 2254 people show readings of blood pressure above 130 mmHg, and 3126 individuals show readings between 110 and 130 mmHg. Assume that blood pressure is normally distributed.\nCompute the mean and standard deviation (of blood pressure). Readings above 140 mmHg are considered to be a high pressure problem. How many people in the group have such a pressure problem? 
A test will flag a blood pressure problem if a patient's pressure reading is in the bottom 5% or in the top 5% of the results for the population. For what values of the blood pressure is an individual in the population considered normal? Solution Letting $X$ be the blood pressure, $X\\sim N(118.723, 19.5221)$. $P(X\u0026gt;140)=0.1379$ and there are $1103.0473$ persons with high pressure. The blood pressure is normal in the interval $(86.612, 150.8341)$. Question 7 Students in a Chemistry class need to take two exams in order to pass the subject. The percentage of students that passed was 60% for the first exam, and 68% for the second. We also have that 80% of the students that passed the first exam also passed the second exam. A student from the class is picked randomly.\nCompute the probability that the student has failed both exams. Compute the probability that the student has passed the first exam if we know that she has failed the second exam. Solution Letting $E_1$ be the event of passing the first exam and $E_2$ the event of passing the second exam:\n$P(\\overline E_1\\cap \\overline E_2)=0.2$. $P(E_1|\\overline E_2)=0.375$. 
","date":1484006400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"248fdb079a6a3e75e0a16c9db7579175","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-01-10/","publishdate":"2017-01-10T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-01-10/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 10, 2017\nDescriptive Statistics and Regression Question 1 The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2017-01-10","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 28, 2016\nQuestion 1 The table below gives the distribution of points obtained by students in the MIR exam last year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.50 \u0026amp; 638140781.25\\newline (40,80] \u0026amp; 185 \u0026amp; 11100 \u0026amp; 666000 \u0026amp; -361328.13 \u0026amp; 4516601.56\\newline (80,120] \u0026amp; 72 \u0026amp; 7200 \u0026amp; 720000 \u0026amp; 1497375.00 \u0026amp; 41177812.50\\newline (120,160] \u0026amp; 40 \u0026amp; 5600 \u0026amp; 784000 \u0026amp; 12301875.00 \u0026amp; 830376562.50\\newline (160,200] \u0026amp; 19 \u0026amp; 3420 \u0026amp; 615600 \u0026amp; 23603640.63 \u0026amp; 2537391367.19\\newline \\hline \\sum \u0026amp; 400 \u0026amp; 29000 \u0026amp; 2819200 \u0026amp; 24886500.00 \u0026amp; 4051603125.00\\newline \\hline \\end{array} $$\nCompute the interquartile range and explain your result. Are there outliers in the sample? 
The minimum number of points to pass the exam is 150; what percentage of students passed the exam? Study the representativity of the mean. According to the values of skewness and kurtosis, can we assume that the sample has been taken from a normally distributed population? Compute the standardized points of a student that got 150 points in the MIR. Solution $Q_1=43.48$ points, $Q_3=97.78$ points and $IQR=54.3$ points. Fences: $F_1=-37.97$ points and $F_2=179.23$ points. Thus, there are outliers. $F_{150}=0.925$, so the percentage of students that passed the exam is $7.5%$. $\\bar x=72.5$ points, $s^2=1791.75$ points², $s=42.3291$ points, $cv=0.5838$. As the coefficient of variation is greater than 0.5 but not too much, there is a moderate variability and the mean is moderately representative. $g_1=0.8203$, so the distribution is right-skewed. $g_2=0.1551$, so the distribution is a little bit more peaked than a bell curve (leptokurtic). As $g_1$ and $g_2$ are between -2 and 2 we can assume that the sample has been taken from a normally distributed population. $z(150)=1.83$. Question 2 The table shows the data of the GDP (Gross Domestic Product) per capita (thousands of euros) and infant mortality (children per thousand) from 1993 till 2000.\nYear GDP Mortality 1993 17 6.0 1994 17 5.6 1995 18 5.2 1996 18 4.9 1997 19 4.6 1998 20 4.3 1999 21 4.1 2000 22 4.0 Estimate the value of the GDP for an infant mortality of 3.8 children per thousand using the linear regression model. Which regression model better explains the GDP as a function of the infant mortality, a linear model or an exponential one? If we assume that the GDP per capita in year 2001 was 23 thousand €, what will be the expected infant mortality, according to the exponential regression model? Consider the linear models of GDP on infant mortality, and infant mortality on GDP; which of the two is more reliable? 
Use the following sums for the computations ($X$=GDP and $Y$=Infant mortality): $\\sum x_i=152$, $\\sum \\log(x_i)=23.5229$, $\\sum y_j=38.7$, $\\sum \\log(y_j)=12.5344$, $\\sum x_i^2=2912$, $\\sum \\log(x_i)^2=69.2305$, $\\sum y_j^2=190.87$, $\\sum \\log(y_j)^2=19.7912$, $\\sum x_iy_j=726.5$, $\\sum x_i\\log(y_j)=236.3256$, $\\sum \\log(x_i)y_j=113.3308$, $\\sum \\log(x_i)\\log(y_j)=36.76$. Solution Linear model of GDP on infant mortality: $\\bar x=19$ 10³€, $s_x^2=3$ 10⁶€². $\\bar y=4.8375$ children per thousand, $s_y^2=0.4573$ (children per thousand)². $s_{xy}=-1.1$ 10³€⋅children per thousand. Regression line of GDP on infant mortality: $x=30.6351 - 2.4052y$. $x(3.8) =21.4954$.\n$\\overline{\\log(x)}=2.9404$ log(10³€), $s_{\\log(x)}^2=0.0081$ log(10³€)². $s_{\\log(x)y}=-0.0577$ log(10³€)⋅children per thousand. Linear coefficient of determination of GDP on infant mortality $r^2=0.8819$. Exponential coefficient of determination of GDP on infant mortality $r^2=0.9002$. Thus, the exponential model explains the relation between the GDP and the infant mortality a little better.\n$\\overline{\\log(y)}=1.5668$ log(children per thousand), $s_{\\log(y)}^2=0.019$ log(children per thousand)². $s_{x\\log(y)}=-0.2284$ 10³€⋅log(children per thousand). Exponential model of infant mortality on GDP: $y=e^{3.0135 - 0.0761x}$. $y(23)=3.5332$.\nThe reliability of both models is the same as they have the same coefficient of determination.\nQuestion 3 Consider two variables $X$ and $Y$. Assume that the regression lines of the linear models intersect at the point $(2,3)$, and that, according to the appropriate linear model, the expected value of $Y$ for $x=3$ is $y=1$. How much will $Y$ change, according to the linear model, when $X$ increases by one unit?\nIf the coefficient of linear correlation is $-0.8$, how much will $X$ change, according to the linear model, when $Y$ increases by one unit?\nSolution $\\bar x=2$ and $\\bar y=3$. 
$b_{yx}=-2$, so $Y$ decreases 2 units when $X$ increases by one unit. $b_{xy}=-0.32$, so $X$ decreases 0.32 units when $Y$ increases by one unit. ","date":1480291200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"5b5f6b16255621541eab98497d33e33c","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2016-11-28/","publishdate":"2016-11-28T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2016-11-28/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 28, 2016\nQuestion 1 The table below gives the distribution of points obtained by students in the MIR exam last year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2016-11-28","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Nov 7, 2016\nQuestion 1 The amounts $x$ and $y$ in mg of two compounds in a certain chemical reaction are related by the following equation: $$ \\log(\\sqrt{x^2+y^2}) = y. $$\nCompute the equations of the tangent and normal lines to the graph of $y$ as a function of $x$ at the point $(1,0)$. Compute the approximate change of the amount $y$ if $x$ changes by 2mg, from the same point $(1,0)$. Solution Tangent line: $y=x-1$.\nNormal line: $y=-x+1$. $\\Delta y\\approx 2$ mg. Question 2 The temperature at a point $(x,y,z)$ in three-dimensional space is given by the following function: $$ T(x,y,z)= \\frac{e^{xy}}{z} $$\nSuppose we are positioned at $(1,1,1)$.\nIn which direction will the temperature decrease the fastest? What will be the rate of that decrease? What is the meaning of your result? 
Compute the directional derivative in the direction where $y$ increases twice as much as $x$, and $z$ increases half of $x$. What is the meaning of your result? Solution $-\\nabla T(1,1,1)=(-e,-e,e)$. The rate of decrease is $\\sqrt{3}e$. Taking the vector $\\mathbf{u}=(1,2,1/2)$, $T_{\\mathbf{u}}\u0026rsquo;(1,1,1)=5e/\\sqrt{21}$. This means that for each unit in the direction of the vector $(1,2,1/2)$ the temperature will increase $5e/\\sqrt{21}$ units. Question 3 Allometric growth refers to relationships between sizes of different parts of an organism. Suppose $x(t)$ and $y(t)$ are the sizes of two organs in an organism of age $t$; then the allometric relationship is given by the equation: $$ \\frac{1}{y}\\frac{dy}{dt} = k \\frac{1}{x}\\frac{dx}{dt}, $$ where $k$ is a positive constant.\nCompute the differential equation that explains $y$ as a function of $x$ (that is, take $x$ as the independent variable and $y$ as the dependent one). Solve the equation for $y$. Assume $y$ denotes the mass of a cell, and $x$ its volume, with $k=0.0794$, compute $y$ as a function of $x$ if $x=1000\\ \\mu$m$^3$ at the age at which $y$ is equal to 1 ng. Solution Differential equation: $y\u0026rsquo;=k\\dfrac{y}{x}$.\nGeneral solution: $y=cx^k$. Particular solution: $y=0.5778 x^{0.0794}$. Question 4 Find the local extrema and saddle points of the function $f(x,y)=e^y(y^2-x^2)$.\nSolution $f$ has a saddle point at $(0,0)$ and a local maximum at $(0,-2)$. 
","date":1478476800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ee35cae8cf6a9ad706be7e4918064a0b","permalink":"/en/teaching/calculus/exams/pharmacy-2016-11-07/","publishdate":"2016-11-07T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2016-11-07/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Nov 7, 2016\nQuestion 1 The amount $x$ and $y$ in mg of two compounds in a certain chemical reaction are related by the following equation: $$ \\log(\\sqrt{x^2+y^2}) = y.","tags":["Exam"],"title":"Pharmacy exam 2016-11-07","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: June 23, 2016\nQuestion 1 It is believed that the age at which a person finish their first marathon depends on gender. To check it, a sample of 180 marathon runners was drawn. For every runner it was recorded the gender, the age (in years) when they finish the first marathon and if they finish with tendinitis. The data are summarized in the table below.\nMales\u0026nbsp; Females Age Finished With tendinitis \u0026nbsp; Finished Width tendinitis (10,20] 7 2 \u0026nbsp; 3 1 (20,30] 35 12 \u0026nbsp; 22 5 (30,40] 30 6 \u0026nbsp; 29 4 (40,50] 15 2 \u0026nbsp; 22 3 (50,60] 9 1 \u0026nbsp; 3 0 (60,70] 4 0 \u0026nbsp; 1 0 Calculate the average age at which it is finished the first marathon, both of males and females. Which mean is more representative? Justify the answer.\nCalculate the interquartile range of the age for the joint distribution (joining males and females) and interpret it.\nWhat age distribution is more asymmetric, males or females distribution. Justify the answer.\nTaking into account the relative spread in each group, who finished a marathon before, a man that finished his first marathon at the age of 48 or a woman that finished her first marathon at the age of 47? 
Justify the answer.\nUsing frequencies to approximate probabilities, compute the following probabilities:\nProbability that a runner finishes their first marathon with tendinitis. Probability that a man aged 40 or younger finishes his first marathon with tendinitis. Probability that a woman who finishes her first marathon with tendinitis is between 20 and 30 years old. Use the following sums for the calculations: Males: $\\sum n_i = 100$, $\\sum x_i n_i = 3460$, $\\sum x_i^2 n_i= 134700$, $\\sum(x_i-\\bar x)^3 n_i =121987$, $\\sum(x_i-\\bar x)^4 n_i =6480792$ Females: $\\sum n_i = 80$, $\\sum x_i n_i = 2830$, $\\sum x_i^2 n_i= 107800$, $\\sum(x_i-\\bar x)^3 n_i =18346$, $\\sum(x_i-\\bar x)^4 n_i =2175992$\nSolution Males: $\\bar x_m = 34.6$ years, $s_m=12.2409$ years, $cv_m=0.3538$. Females: $\\bar x_f = 35.375$ years, $s_f=9.8035$ years, $cv_f=0.2771$. The mean of females is more representative as the coefficient of variation is lower. $IQR=16.292$ years. The spread of central data is low. Coeff. of skewness of males $g_{1m}=0.2434$ and coeff. of skewness of females $g_{1f}=0.8378$, thus the males distribution of ages is less asymmetric. Standard score for a man of 48 years $z_m(48)=1.0947$ and standard score for a woman of 47 years $z_f(47)=1.1858$, thus the man finished his first marathon relatively earlier. Letting $T$ be the event of finishing the first marathon with tendinitis, $M$ the event of being male and $F$ the event of being female, $P(T)=0.2$, $P(T|M\\cap \\mbox{Age}\\leq 40) = 0.2778$, $P(\\mbox{Age}\\in (20,30]|T\\cap F) = 0.3846$. Question 2 A study tries to determine if the number of muscular injuries of professional athletes depends on stress. The study lasted four years and measured the average level of stress and the number of muscular injuries suffered by a group of athletes. 
The collected data is shown in the table below.\nStress ($X$) 2.3 3.8 5.1 1.4 6.9 7.2 3.2 8.3 Injuries ($Y$) 3 6 7 2 6 8 4 8 Calculate the linear regression model of the number of injuries on stress.\nAccording to the most appropriate linear model, what stress level is expected for an athlete that suffered 4 injuries in that period?\nCalculate the logarithmic regression model of the number of injuries on stress.\nWhich regression model is better, the linear or the logarithmic? Justify the answer.\nUse the following sums for the calculations: $\\sum x_i = 38.2$, $\\sum y_j=44$, $\\sum \\log(x_i)=11.3186$, $\\sum \\log(y_j)=12.8664$, $\\sum x_i^2 = 226.28$, $\\sum y_j^2=278$, $\\sum \\log^2(x_i)=18.7028$, $\\sum \\log^2(y_j)=22.4647$, $\\sum x_iy_j = 246.4$, $\\sum x_i\\log(y_j)=69.2607$, $\\sum \\log(x_i)y_j=71.5508$, $\\sum \\log(x_i)\\log(y_j)=20.2895$.\nSolution $\\bar x=4.775$ points, $s_x^2=5.4844$ points$^2$. $\\bar y=5.5$ injuries, $s_y^2=4.5$ injuries$^2$. $s_{xy}=4.5375$ points$\\cdot$injuries. Regression line of injuries on stress: $y=1.5494 + 0.8274x$.\n$x(4)=3.2625$.\n$\\overline{\\log(x)}=1.4148$ log(points), $s_{\\log(x)}^2=0.3361$ log(points)$^2$. $s_{\\log(x)y}=1.1623$ log(points)$\\cdot$injuries. Logarithmic model of injuries on stress: $y=0.6075 + 3.458\\log(x)$.\nLinear coefficient of determination $r^2=0.8342$. Logarithmic coefficient of determination $r^2=0.8932$. Thus, the logarithmic model fits better.\nQuestion 3 A diagnostic test with a sensitivity of 96% and a specificity of 93% is used to determine a disease with a prevalence of 10%.\nWhat are the positive and negative predictive values of the test?\nIf the test is applied to 15 persons, what is the probability of having more than one positive outcome?\nIf the test is applied to 50 persons, what is the probability of having a wrong diagnosis in more than two persons?\nSolution $PPV = P(D\\vert +) = 0.6038$ and $NPV=P(\\bar D\\vert -)=0.9952$. 
Letting $X$ be the number of positive outcomes after applying the test to a sample of 15 persons, $P(X\u0026gt;1)=0.7144$. Letting $Y$ be the number of wrong diagnoses after applying the test to a sample of 50 persons, $P(Y\u0026gt;2)=0.6505$. Question 4 It is known from previous studies that the hours of study of Statistics for students that pass the subject follow a normal distribution with mean 50 hours and standard deviation unknown; while for students that fail the subject they follow a normal distribution with mean unknown and standard deviation 10 hours. If 20% of students that pass study more than 70 hours and 30% of students that fail study less than 25 hours,\nCalculate the standard deviation of the hours of study distribution for students that pass and the mean of the distribution for students that fail.\nIf in a given year there are 200 students enrolled in the subject and 150 of them pass, how many of the total students have studied more than 55 hours?\nSolution Letting $H_p$ and $H_f$ be the number of hours of study for students that pass and fail respectively,\n$\\sigma_p=23.7637$ hours and $\\mu_f=30.141$ hours. $62.8244$ students. ","date":1466640000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4f497265697b1b6777e07aff2740f6f8","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-06-23/","publishdate":"2016-06-23T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-06-23/","section":"teaching","summary":"Grade: Physiotherapy\nDate: June 23, 2016\nQuestion 1 It is believed that the age at which a person finish their first marathon depends on gender. To check it, a sample of 180 marathon runners was drawn.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-06-23","type":"book"},{"authors":null,"categories":["rkTeaching","R"],"content":"The 1.3.0 version of the R package rkTeaching for learning Statistics is available to install. 
This version is updated with the 3.2.3 version of R and the 0.6.5 version of RKWard.\nThis is a transitional version towards a major update that will arrive shortly and will incorporate the internationalization of the package.\nTo install it visit the page of rkTeaching.\n","date":1464566400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"c080d61a8a40cf8fea7390fb0d1ca8b9","permalink":"/en/post/rkteaching-version1.3/","publishdate":"2016-05-30T00:00:00Z","relpermalink":"/en/post/rkteaching-version1.3/","section":"post","summary":"The 1.3.0 version of the R package rkTeaching for learning Statistics is available to install. This version is updated with the 3.2.3 version of R and the 0.6.5 version of RKWard.\n","tags":["rkTeaching","RKWard"],"title":"Released version 1.3.0 of the rkTeaching package","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: May 19, 2016\nQuestion 1 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on gender, a sample of 390 patients (210 males and 180 females) was drawn and the recovery time was measured for every patient. The table below shows the frequencies of times.\n$$ \\begin{array}{ccc} \\hline \\mbox{Time (days)} \u0026amp; \\mbox{Males} \u0026amp; \\mbox{Females}\\newline 20-30 \u0026amp; 50 \u0026amp; 73\\newline 30-40 \u0026amp; 61 \u0026amp; 42\\newline 40-50 \u0026amp; 26 \u0026amp; 31\\newline 50-60 \u0026amp; 32 \u0026amp; 20\\newline 60-70 \u0026amp; 20 \u0026amp; 12\\newline 70-80 \u0026amp; 11 \u0026amp; 2\\newline 80-90 \u0026amp; 10 \u0026amp; 0\\newline \\hline \\end{array} $$\nCalculate the mean of recovery time for males, females and for the whole sample. Which mean is more representative, the mean recovery time of males or that of females? Justify the answer. 
Which distribution is more symmetric, the distribution of recovery time of males or the one of females? Compare the kurtosis of the recovery time of males and females. Calculate the 80th percentile of the recovery time of males. What percentage of females will have a recovery time greater than 63 days? Use the following sums for the calculations, Males: $\\sum x_in_i = 9290$ days, $\\sum x_i^2n_i=474050$ days$^2$, $\\sum(x_i-\\bar x)^3n_i = 812271.3832$ days$^3$ and $\\sum(x_i-\\bar x)^4n_i = 48895722.3971$ days$^4$. Females: $\\sum x_in_i = 6720$ days, $\\sum x_i^2n_i=282300$ days$^2$, $\\sum(x_i-\\bar x)^3n_i = 347773.3333$ days$^3$ and $\\sum(x_i-\\bar x)^4n_i = 14802393.3333$ days$^4$.\nSolution Males: $\\bar x_m=44.2381$ days, $s^2_m=300.3719$ days$^2$, $s_m=17.3312$ days and $cv_m=0.3918$. Females: $\\bar x_f=37.3333$ days, $s^2_f=174.5556$ days$^2$, $s_f=13.2119$ days and $cv_f=0.3539$. Thus, the mean of females is more representative. $g_{1m}=0.743$ and $g_{1f}=0.8378$. Thus, both distributions are right-skewed but the distribution of males is more symmetric. $g_{2m}=-0.4193$ and $g_{2f}=-0.3011$. Thus, both distributions are platykurtic, but the distribution of males is flatter. $P_{80}=59.7041$ days. $16.68%$. Question 2 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on age, a sample of 8 patients was drawn and the recovery time $Y$ (in days) and ages $X$ (in years) were measured for every patient. The table below shows the results.\nAge (years) Recovery time (days) 32 20 38 25 48 32 51 40 57 55 61 75 68 102 71 130 Calculate the regression line of the recovery time on the age. According to the linear regression model, what is the expected age for a patient with a recovery time of 100 days? Calculate the exponential regression model of the recovery time on age. Which regression model better explains the relation between the recovery time and the age, the exponential or the linear? Justify the answer. 
Use the following sums for the calculations: $\\sum x_i=426$, $\\sum \\log(x_i)=31.5425$, $\\sum y_j=479$, $\\sum \\log(y_j)=31.1866$, $\\sum x_i^2=24008$, $\\sum \\log(x_i)^2=124.909$, $\\sum y_j^2=39603$, $\\sum \\log(y_j)^2=124.7374$, $\\sum x_iy_j=29042$, $\\sum x_i\\log(y_j)=1724.5468$, $\\sum \\log(x_i)y_j=1956.6274$, $\\sum \\log(x_i)\\log(y_j)=124.2263$. Solution Linear model $\\bar x=53.25$ years, $s_x^2=165.4375$ years$^2$. $\\bar y=59.875$ days, $s_y^2=1365.3594$ days$^2$. $s_{xy}=441.9062$ years$\\cdot$days. Regression line of recovery time on age: $y=-82.3631 + 2.6711x$.\n$66.2367$ years.\nExponential model $\\overline{\\log(y)}=3.8983$ log(days), $s_{\\log(y)}^2=0.3953$ log(days)$^2$. $s_{x\\log(y)}=7.9829$ years$\\cdot$log(days). Exponential model of recovery time on age: $y=e^{1.3288 + 0.0483x}$.\nLinear coefficient of determination $r^2=0.8645$. Exponential coefficient of determination $r^2=0.9745$. So the exponential model fits better.\nQuestion 3 In a random sample of 500 people drawn from a population there are 20 persons with an injury $A$, 40 persons with another injury $B$ and 450 persons with none of the injuries. Use relative frequencies to estimate probabilities in the following questions:\nCalculate the probability that a person has both injuries. Calculate the probability that a person has some injury. Calculate the probability that a person has injury $A$ but not $B$. Calculate the probability that a person has injury $A$ if he or she has injury $B$. Calculate the probability that a person has injury $B$ if he or she doesn\u0026rsquo;t have injury $A$. Are the injuries $A$ and $B$ dependent? Solution $P(A\\cap B) = 0.02$. $P(A\\cup B) = 0.1$. $P(A-B) = 0.02$. $P(A|B) = 0.25$. $P(B|\\bar A) = 0.0625$. The injuries are dependent. Question 4 The level of severity $X$ of an injury is classified in a scale from 1 to 5, from low to high severity. 
The probability distribution of $X$ in a population is plotted below.\nCalculate and plot the distribution function. Calculate the following probabilities: $P(X\\leq 2)$, $P(X\u0026gt;3)$, $P(X=4.2)$ and $P(1\u0026lt;X\\leq 4.2)$. Calculate the mean and the standard deviation of $X$. Is the mean representative? If a level of severity of 0.05 is considered incurable, what is the probability of having some person with an incurable injury in a sample of 10 persons with the injury? If there are 6 persons injured per month on average, what is the probability of having more than 2 persons injured? What is the probability of having more than 1 person injured with an incurable injury? Solution $$F(x) = \\begin{cases} 0 \u0026amp; \\mbox{if } x\u0026lt;1\\newline 0.2 \u0026amp; \\mbox{if } 1\\leq x\u0026lt; 2\\newline 0.6 \u0026amp; \\mbox{if } 2\\leq x\u0026lt; 3\\newline 0.85 \u0026amp; \\mbox{if } 3\\leq x\u0026lt; 4\\newline 0.95 \u0026amp; \\mbox{if } 4\\leq x\u0026lt; 5\\newline 1 \u0026amp; \\mbox{if } x\\geq 5 \\end{cases} $$ $P(X\\leq 2)=0.6$, $P(X\u0026gt;3)=0.15$, $P(X=4.2)=0$, $P(1\u0026lt;X\\leq 4.2)=0.75$\n$\\mu = 2.4$ and $s=1.0677$. The mean is moderately representative because $cv=0.4449$.\nLetting $X$ be the number of persons having an incurable injury in a sample of 10 persons with the injury, $P(X\\geq 1)=0.4013$.\nLetting $Y$ be the number of persons injured in a month, $P(Y\u0026gt;2)=0.938$. Letting $Z$ be the number of persons injured with an incurable injury in a month, $P(Z\u0026gt;1)=0.0369$.\nQuestion 5 A diagnostic test to determine doping of athletes returns a positive outcome when the concentration of a substance in blood is greater than 4 $\\mu$g/ml. 
If the distribution of the substance concentration in doped athletes follows a normal distribution model with mean 4.5 $\\mu$g/ml and standard deviation 0.2 $\\mu$g/ml, and in non-doped athletes follows a normal distribution model with mean 3 $\\mu$g/ml and standard deviation 0.3 $\\mu$g/ml,\nwhat are the sensitivity and specificity of the test? If 10% of the athletes in a competition are doped, what are the predictive values? Solution Letting $D$ be the event of being doped, $X$ the concentration in doped athletes and $Y$ the concentration in non-doped athletes,\nSensitivity $P(+\\vert D) = P(X\u0026gt;4)=0.9938$ and specificity $P(-\\vert \\bar D)=P(Y\u0026lt;4)=0.9996$ PPV $P(D\\vert +) = 0.9961$ and NPV $P(\\bar D\\vert -) = 0.9993$ ","date":1463616000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"3ef3260185c5eb8d184e5901deb0f762","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-19/","publishdate":"2016-05-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-19/","section":"teaching","summary":"Grade: Physiotherapy\nDate: May 19, 2016\nQuestion 1 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on gender, a sample of 390 patients (210 males and 180 females) was drawn and the recovery time was measured for every patient.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-05-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: May 13, 2016\nQuestion 1 Of all the anterior cruciate ligament injuries of the knee, the rupture occurs in 20% of cases, and to detect it there are three different tests:\nThe drawer test that analyzes the stability of the tibia. It has a sensitivity of 80% and a specificity of 99%. A radiologic study in 2 planes, that allows ruling out bone avulsion. 
It has a sensitivity of 85% and a specificity of 90%. A magnetic resonance, which is the most appropriate when there is hematoma. It has a sensitivity and a specificity of 98%. Assuming that the tests are independent,\nCompute the predictive values of the drawer test. If an individual has an anterior cruciate ligament injury, what is the probability that the radiologic study and the magnetic resonance return a positive outcome? If an individual has an anterior cruciate ligament injury, what is the probability that the radiologic study or the magnetic resonance give a wrong diagnosis? Solution $PPV_1 = P(D\\vert +_1) = 0.9524$ and $NPV_1=P(\\bar D\\vert -_1)=0.9519$. $P(+_2)=0.25$, $P(+_3)=0.212$ and $P(+_2\\cap +_3)=0.053$. $P(\\mbox{Error}_2)=0.11$, $P(\\mbox{Error}_3)=0.02$ and $P(\\mbox{Error}_2\\cup \\mbox{Error}_3)=0.1278$. Question 2 It is known that 10% of professional soccer players have a cruciate ligament injury during the league. It is also known that the ligament rupture occurs in 20% of cruciate ligament injuries.\nCalculate the probability that in a team with 20 players more than 3 have a cruciate ligament injury during the league. Calculate the probability that in a league with 200 players more than 3 have a ligament rupture. Solution Naming $X$ to the number of players in a team with a cruciate ligament injury, $P(X\u0026gt;3)=0.133$. Naming $Y$ to the number of players in a league with a ligament rupture, $P(Y\u0026gt;3)= 0.5665$. Question 3 In a blood analysis the LDL cholesterol level reference interval for a particular population is $(42,155)$ mg/dl. 
(The reference interval contains 95% of the population and is centered at the mean).\nAssuming that the LDL cholesterol level follows a normal distribution,\nCalculate the mean and the standard deviation of the LDL cholesterol level.\nAccording to the LDL cholesterol level, patients are classified into three categories of infarct risk:\nLDL cholesterol level Infarct risk Less than 100 mg/dl Low Between 100 and 160 mg/dl Medium Greater than 160 mg/dl High Calculate the percentage of people in the population that falls into every category of infarct risk.\nThe probability of having an infarct with a high risk is twice the probability of having an infarct with a medium risk, and this is twice the probability of having an infarct with a low risk. What is the probability of having an infarct in the whole population if the probability of having an infarct with a low risk is 0.01?\nSolution Naming $C$ to the LDL cholesterol level,\n$\\mu=98.5$ mg/dl and $\\sigma=28.25$ mg/dl. $P(\\mbox{Low})=P(C\u0026lt;100)=0.5199$, $P(\\mbox{Medium})=P(100\\leq C\\leq 160)=0.4654$ and $P(\\mbox{High})=P(C\u0026gt;160)=0.0146$. Thus, there are 51.99% of persons with low risk, 46.54% of persons with medium risk and 1.46% of persons with high risk. Naming $I$ to the event of having an infarct, $P(I)=0.0151$. 
","date":1463097600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ac4bdefcce517aeee1316f2f97006ad6","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-13/","publishdate":"2016-05-13T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-13/","section":"teaching","summary":"Grade: Physiotherapy\nDate: May 13, 2016\nQuestion 1 Of all the anterior cruciate ligament of the knee injuries, the rupture occurs in 20% of cases, and to detect it there are three different tests:","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-05-13","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Grade: Physiotherapy\nDate: April 01, 2016\nQuestion 1 The chart below shows the cumulative frequency distribution the maximum angle of knee deflection after a replacement of the knee cap in a group of patients.\nCalculate the quartiles and interpret them. Are there outliers in the sample? What percentage of patients have a maximum angle of knee deflection of 90 degrees? Solution $Q_1=64$, $Q_2=83.3333$, $Q_3=100$. Fences: $F_1=10$ and $F_2=154$. There are no outliers. $F_{90}=60%$. Question 2 The waiting times in a physiotherapy clinic of a sample of patiens are\n18, 8, 27, 6, 13, 26, 14, 23, 14, 31, 27, 19, 15, 20, 11, 30, 25, 23, 20, 15 Calculate the mean. Is representative? Justify the answer. Calculate the coefficient of skewness and interpret it. Calculate the coefficient of kurtosis and interpret it. Use the following sums for the calculations: $\\sum x_i=385$ min, $\\sum(x_i-\\bar x)^2=983.75$ min$^2$, $\\sum (x_i-\\bar x)^3=-601.125$ min$^3$, $\\sum (x_i-\\bar x)^4=98369.1406$ min$^4$.\nSolution $\\bar x=19.25$ min, $s^2=49.1875$ min$^2$, $s=7.0134$ min, $cv=0.3643$. As the $cv\u0026lt;0.5$ there is a low variability and the mean is representative. $g_1=-0.0871$. The distribution is almost symmetrical. $g_2=-0.9671$. 
The distribution is flatter than a bell curve (platykurtic). Question 3 A study tries to determine if there is a relation between the recovery time $Y$ (in days) of an injury and the age of the person $X$ (in years). For that purpose a sample of 15 persons with the injury was drawn with the following values:\nAge (years) Recovery time (days) 21 20 26 26 30 27 34 32 39 36 45 37 51 38 54 41 59 42 63 45 71 44 76 43 80 45 84 46 88 44 Compute the regression line of the recovery time on the age. How much does the recovery time increase for each year of age? Compute the logarithmic regression model of the recovery time on the age. Which of the previous models explains better the relation between the recovery time and the age? Justify the answer. Use the best of the previous models to predict the recovery time of a person 50 years old. Is the prediction reliable? Use the following sums for the calculations: $\\sum x_i=821$, $\\sum \\log(x_i)=58.7255$, $\\sum y_j=566$, $\\sum \\log(y_j)=54.0702$, $\\sum x_i^2=51703$, $\\sum \\log(x_i)^2=232.7697$, $\\sum y_j^2=22270$, $\\sum \\log(y_j)^2=195.7633$, $\\sum x_iy_j=33256$, $\\sum x_i\\log(y_j)=3026.6478$, $\\sum \\log(x_i)y_j=2265.458$, $\\sum \\log(x_i)\\log(y_j)=213.1763$.\nSolution Linear model $\\bar x=54.7333$ years, $s_x^2=451.1289$ years$^2$. $\\bar y=37.7333$ days, $s_y^2=60.8622$ days$^2$. $s_{xy}=151.7956$ years$\\cdot$days. Regression line of recovery time on age: $y=19.3167 + 0.3365x$. Every year of age the recovery time increases 0.3365 days.\nLogarithmic model $\\overline{\\log(x)}=3.915$ log(years), $s_{\\log(x)}^2=0.1905$ log(years)$^2$. $s_{\\log(x)y}=3.3033$ log(years)$\\cdot$days. Logarithmic model of recovery time on age: $y=-30.1526 + 17.3398\\log(x)$.\nLinear coefficient of determination $r^2=0.8392$. Logarithmic coefficient of determination $r^2=0.9411$. 
So the logarithmic model fits better.\n$y(50)=-30.1526 + 17.3398\\log(50) = 37.6812$.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616018106,"objectID":"fd2ee0cb4df8d68c89d890403cdbedb8","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-04-01/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-04-01/","section":"teaching","summary":"Grade: Physiotherapy\nDate: April 01, 2016\nQuestion 1 The chart below shows the cumulative frequency distribution the maximum angle of knee deflection after a replacement of the knee cap in a group of patients.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2016-04-01","type":"book"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1451606400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"2414f1cf796086a6cb4c1e40f2fc1ff0","permalink":"/en/publication/innovacion-2016-2/","publishdate":"2020-09-16T21:26:03.10618Z","relpermalink":"/en/publication/innovacion-2016-2/","section":"publication","summary":"","tags":[],"title":"Innovación en la docencia de Estadística con R y rk.Teaching","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1451606400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"39bad87ab5875006e593268ed2565eaf","permalink":"/en/publication/innovacion-2016/","publishdate":"2020-09-16T21:26:01.841858Z","relpermalink":"/en/publication/innovacion-2016/","section":"publication","summary":"","tags":[],"title":"Innovación en la docencia de Estadística con R y rk.Teaching","type":"publication"},{"authors":null,"categories":null,"content":"I\u0026rsquo;m glad to offer a basic manual of Excel, the famous Microsoft Office spreadsheet. 
This manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. However, the manual also serves for learning a basic management of Excel, no matter the field of application.\nThe version of Excel used in this manual is Excel 2010, but some parts of this manual are also valid for other versions.\nThis is my first manual in English and so there is likely to be some grammatical errors. I apologize by that and I would like to ask you to correct me in the forum below. I hope you enjoy it.\n","date":1441843200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4b5f7ad46462bd7df6d375917af9d55e","permalink":"/en/post/excel-manual/","publishdate":"2015-09-10T00:00:00Z","relpermalink":"/en/post/excel-manual/","section":"post","summary":"I\u0026rsquo;m glad to offer a basic manual of Excel, the famous Microsoft Office spreadsheet. This manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. 
However, the manual also serves for learning a basic management of Excel, no matter the field of application.\n","tags":null,"title":"New Excel manual","type":"post"},{"authors":["Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1420070400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"6fd719886c7cb7d4e40b67df952eb8a7","permalink":"/en/publication/bringing-2015/","publishdate":"2020-09-16T21:26:02.037032Z","relpermalink":"/en/publication/bringing-2015/","section":"publication","summary":"","tags":[],"title":"Bringing R to non-expert users with the package RKTeaching","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"e4bec646020b6be1cad0ffbe8a72bfba","permalink":"/en/publication/bioestadistica-2014/","publishdate":"2020-09-16T21:26:02.230139Z","relpermalink":"/en/publication/bioestadistica-2014/","section":"publication","summary":"","tags":[],"title":"Bioestadística Aplicada con SPSS","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"aa51cd44895df4c2492b5da16f359322","permalink":"/en/publication/towards-2014-1/","publishdate":"2020-09-16T21:26:02.426838Z","relpermalink":"/en/publication/towards-2014-1/","section":"publication","summary":"","tags":[],"title":"Towards a Semanctic Catalog of Similarity Measures","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"ef9a9eb14999b77d107482c74d6c3dbd","permalink":"/en/publication/towards-2014/","publishdate":"2020-09-16T21:26:01.646422Z","relpermalink":"/en/publication/towards-2014/","section":"publication","summary":"","tags":[],"title":"Towards a 
Semantic Catalog of Similarity Measures","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1356998400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"5f25873cb050b635b9d7ac567a3583fa","permalink":"/en/publication/rkteaching-2013/","publishdate":"2020-09-16T21:26:02.327055Z","relpermalink":"/en/publication/rkteaching-2013/","section":"publication","summary":"","tags":[],"title":"RKTeaching: a new R package for teaching Statistics .","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1325376000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"eff321037b1ca9bd9606d2292bb57de4","permalink":"/en/publication/rkteaching-2012/","publishdate":"2020-09-16T21:26:02.816682Z","relpermalink":"/en/publication/rkteaching-2012/","section":"publication","summary":"","tags":[],"title":"RKTeaching: Un paquete de R para la enseñanza de la Estadística","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1167609600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"99b05fedcabac3a3c427f3d002dbee36","permalink":"/en/publication/evolution-2007/","publishdate":"2020-09-16T21:26:01.746253Z","relpermalink":"/en/publication/evolution-2007/","section":"publication","summary":"","tags":[],"title":"Evolution of neuroendocrine cell population and peptidergic innervation, assessed by discriminant analysis, during postnatal development of the rat prostate","type":"publication"},{"authors":["Alfredo 
Sánchez-Alberca"],"categories":[],"content":"","date":1104537600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"1978ae6e507b7e05ad3b7c46ff392638","permalink":"/en/publication/amon-2005/","publishdate":"2020-09-16T21:26:02.916684Z","relpermalink":"/en/publication/amon-2005/","section":"publication","summary":"","tags":[],"title":"AMON: A software system for automatic generation of ontology mappings","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1072915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"199ff9472d40e6d884fe9391ee05c343","permalink":"/en/publication/framework-2004/","publishdate":"2020-09-16T21:26:02.625162Z","relpermalink":"/en/publication/framework-2004/","section":"publication","summary":"","tags":[],"title":"Framework for automatic generation of ontology mappings","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1072915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"e58d1e5c9a8550c9c583ec27c0e329ca","permalink":"/en/publication/herramientas-2004/","publishdate":"2020-09-16T21:26:02.523121Z","relpermalink":"/en/publication/herramientas-2004/","section":"publication","summary":"","tags":[],"title":"Herramientas de trabajo cooperativo","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1009843200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"3bdc4ec32bc413161b88aaf91a72f877","permalink":"/en/publication/aspectos-2002/","publishdate":"2020-09-16T21:26:03.009331Z","relpermalink":"/en/publication/aspectos-2002/","section":"publication","summary":"","tags":[],"title":"Aspectos técnicos de la comunidad virtual de usuarios FARMATOXI","type":"publication"},{"authors":["G; Rimbau, V; Sanchez-Alberca, A; Reverte, M; Alguacil, L F 
Repetto"],"categories":[],"content":"","date":978307200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"1787620af867a98e635721f0278eca4d","permalink":"/en/publication/farmatoxi-2001/","publishdate":"2020-09-16T21:26:03.700427Z","relpermalink":"/en/publication/farmatoxi-2001/","section":"publication","summary":"","tags":[],"title":"FARMATOXI, a new virtual community of pharmacology and toxicology","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":946684800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"d3e2a26e98a746b19fdefca4b7030590","permalink":"/en/publication/farmatoxi-2000/","publishdate":"2020-09-16T21:26:02.719337Z","relpermalink":"/en/publication/farmatoxi-2000/","section":"publication","summary":"","tags":[],"title":"FARMATOXI: Red temática de farmacología y toxicología de RedIris","type":"publication"},{"authors":null,"categories":["Calculus","Derive"],"content":"","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"998b467c38242c175495515600ab7700","permalink":"/en/teaching/derive/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/derive/","section":"teaching","summary":"","tags":["Problems"],"title":"Calculus with Derive","type":"teaching"},{"authors":null,"categories":["Calculus","Geogebra"],"content":"Geogebra GeoGebra is an open source interactive software intended for learning Mathematics in secondary and higher education. 
Below we present a Calculus manual with GeoGebra, focused mainly on the analytical resolution of calculus problems in one and several variables with the CAS view (symbolic calculus) of GeoGebra.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"16fb83370cf0bce0cb1290c5f834b6ff","permalink":"/en/teaching/geogebra/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/geogebra/","section":"teaching","summary":"Geogebra GeoGebra is an open source interactive software intended for learning Mathematics in secondary and higher education. Below we present a Calculus manual with GeoGebra, focused mainly on the analytical resolution of calculus problems in one and several variables with the CAS view (symbolic calculus) of GeoGebra.","tags":["Problems"],"title":"Calculus with Geogebra","type":"teaching"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics formulas Statistics and Probability formulas Excel formulas Standard normal probability distribution table ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"27938b28aaa052fd4512812b0f2e896f","permalink":"/en/teaching/statistics/cheatsheets/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/cheatsheets/","section":"teaching","summary":"Everything you have to know at a glance","tags":["Cheat sheet"],"title":"Statistics Cheat Sheets","type":"book"}] \ No newline at end of file +[{"authors":["asalber"],"categories":null,"content":"Father and environmental activist, I work as a teacher of Mathematics and Statistics at the Applied Maths and Statistics department of the CEU San Pablo University. I do my research in Data Science, including Biostatistics and Machine Learning. 
I master the programming languages R, Python and LaTeX.\n","date":-62135596800,"expirydate":-62135596800,"kind":"term","lang":"en","lastmod":1600206500,"objectID":"4c1a52ed2fdb89e37dda671db5e7b383","permalink":"/en/author/alfredo-sanchez-alberca/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/author/alfredo-sanchez-alberca/","section":"authors","summary":"Father and environmental activist, I work as a teacher of Mathematics and Statistics at the Applied Maths and Statistics department of the CEU San Pablo University. I do my research in Data Science, including Biostatistics and Machine Learning.","tags":null,"title":"Alfredo Sánchez Alberca","type":"authors"},{"authors":null,"categories":["Calculus","One Variable Calculus","Several Variables Calculus"],"content":" Download\nThis Calculus manual has been conceived to ease the learning of Calculus in the first years of university studies. It explains in a clear and simplified manner the most important concepts with a lot of examples that ease their understanding.\nThe manual is mainly focused on Health Sciences and most examples are applied to this field. However, the concepts and procedures presented are valid for any scope.\nTable of Contents Analytic Geometry One variable differential calculus Integral calculus Ordinary differential equations Several variables differentiable calculus ","date":1461110400,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"189c5ff675788efb52fff13012b2c1ea","permalink":"/en/teaching/calculus/manual/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/","section":"teaching","summary":"Explanation of the most important concepts in one variable and several variables Calculus with applied examples.","tags":null,"title":"Calculus Manual","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Download\nThis Manual of Statistics has been conceived to ease the learning of Statistics. 
It contains simple explanations of the most important concepts in Statistics with examples. It also explains the most common statistical procedures for data analysis.\nThe manual is mainly aimed at Biostatistics, and therefore most of the examples are applied to health sciences. However, the concepts and statistical methods presented are valid for any scope.\nTable of Contents Introduction Descriptive Statistics Regression Probability Discrete Random Variables Continuous Random Variables Study flash cards There is an Anki deck of flash cards to study and remember the main concepts of this manual.\nIf you don\u0026rsquo;t know what Anki is, please visit the Anki web site. There is also an excellent tutorial about Anki essentials.\n","date":1461110400,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"e61b7a0a352f5ba615722fcec6c07083","permalink":"/en/teaching/statistics/manual/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/","section":"teaching","summary":"Explanation of the most important concepts in Statistics and Probability with examples.","tags":["Statistics"],"title":"Statistics Manual","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":" Download\nThis is a basic manual of Excel, the Microsoft Office spreadsheet. The version of Excel used in this manual is Excel 2010, but some parts of this manual are also valid for other versions.\nThis manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. 
However, the manual also serves for learning a basic management of Excel, no matter the field of application.\nTable of Contents Introduction Formatting and Data Printing Formulas Plotting Charts Database Management ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"1f8a65a2485c96f4aa2aa6a90d06495f","permalink":"/en/teaching/excel/manual/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/","section":"teaching","summary":"A basic introduction to Excel for Economics with examples.","tags":["Excel"],"title":"Excel Manual","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Derivatives ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"99e2b20656785fc76fa5f07071839347","permalink":"/en/teaching/calculus/problems/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/problems/","section":"teaching","summary":"Statistics problems with solutions.","tags":["Problems"],"title":"Calculus Problems","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"\n\n","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"3ed71473b71b9ac7e72598691fa90654","permalink":"/en/teaching/excel/exercises/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/exercises/","section":"teaching","summary":"Excel problems with solutions.","tags":["Excel","Problems"],"title":"Excel Exercises","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":" Frequency Tables and Charts Descriptive Statistics Linear Regression Non Linear Regression Probability Diagnostic Tests Discrete Random Variables Continuous Random Variables 
","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1609665891,"objectID":"4734df1d5e0d7fd295f3f7a54a0584c4","permalink":"/en/teaching/statistics/problems/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/","section":"teaching","summary":"Statistics problems with solutions.","tags":["Problems"],"title":"Statistics Problems","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Calculus exams of the last courses with solutions.\nPharmacy exam 2022-01-17 Pharmacy exam 2021-01-18 Pharmacy exam 2019-12-16 Pharmacy exam 2018-12-17 Pharmacy exam 2018-01-19 Pharmacy exam 2017-11-06 Pharmacy exam 2016-01-10 Pharmacy exam 2016-11-07 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"d2efe47f363828dae16b00de59f4b783","permalink":"/en/teaching/calculus/exams/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/","section":"teaching","summary":"Calculus exams of the last courses with solutions.","tags":["Exams"],"title":"Calculus Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics exams of the last courses with solutions.\nPharmacy Statistics Exams Physiotherapy Statistics Exams ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600725294,"objectID":"505c6dae1d0e0627c3510b060dd6e42f","permalink":"/en/teaching/statistics/exams/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/","section":"teaching","summary":"Statistics exams of the last courses with solutions.","tags":["Exams"],"title":"Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy"],"content":"List of exams Pharmacy exam 2022-01-17 Pharmacy exam 2021-11-22 Pharmacy exam 2021-10-25 Pharmacy exam 2021-01-18 Pharmacy exam 2020-11-23 Pharmacy exam 2020-10-26 Pharmacy exam 
2019-12-16 Pharmacy exam 2019-11-18 Pharmacy exam 2019-10-14 Pharmacy exam 2018-12-17 Pharmacy exam 2018-11-19 Pharmacy exam 2018-10-29 Pharmacy exam 2018-01-19 Pharmacy exam 2017-11-27 Pharmacy exam 2017-01-10 Pharmacy exam 2016-11-28 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600206500,"objectID":"c6437d9dc217cdb1190e9310d6a79db0","permalink":"/en/teaching/statistics/exams/pharmacy/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Pharmacy Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"List of exams Physiotherapy exam 2022-06-06 Physiotherapy exam 2022-05-06 Physiotherapy exam 2022-03-11 Physiotherapy exam 2021-06-07 Physiotherapy exam 2021-05-05 Physiotherapy exam 2021-03-17 Physiotherapy exam 2020-06-19 Physiotherapy exam 2020-05-25 Physiotherapy exam 2019-06-18 Physiotherapy exam 2019-05-27 Physiotherapy exam 2019-03-26 Physiotherapy exam 2018-05-31 Physiotherapy exam 2018-04-09 Physiotherapy exam 2017-06-02 Physiotherapy exam 2017-05-19 Physiotherapy exam 2017-03-31 Physiotherapy exam 2016-06-23 Physiotherapy exam 2016-05-19 Physiotherapy exam 2016-05-13 Physiotherapy exam 2016-04-01 ","date":-62135596800,"expirydate":-62135596800,"kind":"section","lang":"en","lastmod":1600206500,"objectID":"1580d126132989916d94dd543a563c7d","permalink":"/en/teaching/statistics/exams/physiotherapy/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Physiotherapy Statistics Exams","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics as a scientific tool What is Statistics? Definition - Statistics. 
Statistics is a branch of Mathematics that deals with data collection, summary, analysis and interpretation. The role of Statistics is to extract information from data in order to gain knowledge for taking decisions.\nStatistics is essential in any scientific or technical discipline which requires data handling, especially with large volumes of data, such as Physics, Chemistry, Medicine, Psychology, Economics or Social Sciences.\nBut, why is Statistics necessary?\nA changing World Scientists try to study the World. A World with a high variability that makes it difficult to determine the behaviour of things.\nStatistics provides a bridge between the real world and the mathematical models that attempt to explain it, providing a methodology to assess the discrepancies between reality and theoretical models.\nThis makes Statistics an indispensable tool in applied sciences that require design of experiments and data analysis.\nPopulation and sample Statistical population Definition - Population. A population is a set of elements defined by one or more features that all its elements, and only they, have. Every element of the population is called an individual.\nDefinition - Population size. The number of individuals in a population is known as the population size and is represented by $N$.\nSometimes not all the individuals are accessible to study. Then we distinguish between:\nTheoretical population: Individuals to which we want to extrapolate the study conclusions. Studied population: Individuals truly accessible in the study. Example. In a study about a particular disease, the theoretical population would be all the persons that suffered the disease at some moment, even if they were not born yet. 
In contrast, the studied population would be the set of persons that have suffered the disease and that we can really study (observe that this excludes people with the disease about whom we have no means of getting information).\nDrawbacks in the population study Scientists study a phenomenon in a population to understand it, to get knowledge about it, and so to control it.\nBut, for a complete knowledge of the population it is necessary to study all its individuals.\nHowever, this is not always possible for several reasons:\nThe population size is infinite or too large to study all its individuals. The operations that individuals undergo are destructive. The cost, both in money and time, that studying all the individuals in the population would require is not affordable. Statistics Sample When it is not possible or convenient to study all the individuals in a population, we study only a subset of them.\nDefinition - Sample. A sample is a subset of the population.\nDefinition - Sample size. The number of individuals of the sample is called the sample size and is represented by $n$.\nUsually, the population study is conducted on samples drawn from it.\nThe sample study only gives an approximate knowledge of the population. But in most cases it is enough.\nSample size determination One of the most interesting questions that arise is:\nHow many individuals must be sampled to have an approximate but sufficient knowledge of the population?\nThe answer depends on several factors, such as the population variability or the desired reliability for extrapolations to the population.\nUnfortunately we cannot answer that question until the end of the course, but in general, the more individuals the sample has, the more reliable the conclusions on the population will be, but also the longer and more expensive the study will be.\nExample. To understand what a sufficient sample size means we can use a picture example. 
A digital photograph consists of a lot of small points called pixels arranged in a big array layout with rows and columns (the more rows and columns, the more resolution the picture has). Here the picture is the population and every pixel is an individual. Every pixel has a colour and it is the variability of colours that forms the picture motif.\nHow many pixels must we take in a sample in order to know the motif of a picture?\nThe answer depends on the variability of colours in the picture. If all the pixels in the picture are of the same colour, only one pixel is required to know the motif. But, if there is a lot of variability in the colours, a large sample size will be required.\nThe image below contains a small sample of the pixels of a picture. Could you find out the motif of the picture?\nWith a small sample size it is difficult to find out the picture motif!\nSurely you have not been able to guess the motif because the number of pixels picked in the sample is too small to understand the variability of colours in the picture.\nThe image below contains a larger sample of pixels. Could you find out the motif of the picture now?\nWith a large sample it is easier to find out the picture motif!\nAnd here is the whole population.\nIt is not required to know all the pixels of a picture to find out its motif!\nTypes of reasoning Deduction properties: If the premises are true, it guarantees the certainty of the conclusions (that is, if something is true in the population, it is also true in the sample). However,\nInduction properties: It does not guarantee the certainty of the conclusions (if something is true in the sample, it may not be true in the population, so be careful with the extrapolations!). But, it is the only way to generate new knowledge!\nStatistics is fundamentally based on inductive reasoning, because it uses the information obtained from samples to draw conclusions about populations.\nSampling Definition - Sampling. 
The process of selecting the elements included in a sample is known as sampling. To reflect reliable information about the whole population, the sample must be representative of the population. That means that the sample should reproduce the population variability on a smaller scale.\nThe goal is to get a representative sample of the population.\nTypes of sampling There exist a lot of sampling methods but all of them can be grouped into two categories:\nRandom sampling: The sample individuals are selected randomly. All the population individuals have the same likelihood of being selected (equiprobability).\nNon-random sampling: The sample individuals are not selected randomly. Some population individuals have a higher likelihood of being selected than others.\nOnly random sampling methods avoid the selection bias and guarantee the representativeness of the sample, and therefore, the validity of conclusions.\nNon-random sampling methods are not suitable for making generalizations because they do not guarantee the representativeness of the sample. Nevertheless, they are usually less expensive and can be used in exploratory studies.\nSimple random sampling The most popular random sampling method is simple random sampling, which has the following properties:\nAll the population individuals have the same likelihood of being selected in the sample. The individual selection is performed with replacement, that is, each selected individual is returned to the population before selecting the next one. In this way the population does not change. Each individual selection is independent of the others. The only way of doing a random sampling is to assign a unique identity number to each population individual (conducting a census) and perform a random draw.\nStatistical variables In every statistical study we are interested in some properties or characteristics of individuals.\nDefinition - Statistical variable. 
A statistical variable is a property or characteristic measured in the population individuals. The data are the actual values or outcomes recorded for a statistical variable.\nTypes of statistical variables According to the nature of their values and their scale, they can be:\nQualitative variables: They measure non-numeric qualities. They can be:\nNominal: There is no natural order between their categories. Example: The hair colour or the gender.\nOrdinal: There is a natural order between their categories. Example: The education level.\nQuantitative variables: They measure numeric quantities. They can be:\nDiscrete: Their values are isolated numbers (usually integers). Example: The number of children or cars in a family.\nContinuous: They can take any value in a real interval. Example: The height, weight or age of a person.\nQualitative and discrete variables are also called categorical variables and their values categories.\nChoosing the appropriate type of variable Sometimes a characteristic could be measured with variables of different types.\nExample. Whether a person smokes or not could be measured in several ways:\nSmokes: yes/no. (Nominal)\nSmoking level: No smoking/occasional/moderate/quite a lot/heavy. (Ordinal)\nNumber of cigarettes per day: 0,1,2,\u0026hellip; (Discrete)\nIn those cases quantitative variables are preferable to qualitative, continuous variables are preferable to discrete variables and ordinal variables are preferable to nominal, as they give more information.\nAccording to their role in the study, variables can be:\nIndependent variables: Variables that do not depend on other variables in the study. Usually they are manipulated in an experiment in order to observe their effect on a dependent variable. They are also known as predictor variables.\nDependent variables: Variables that depend on other variables in the study. They are not manipulated in an experiment and are also known as outcome variables.\nExample. 
In a study on the performance of students in a course, the intelligence of students and the daily study time are independent variables, while the course grade is a dependent variable.\nTypes of statistical studies Depending on whether the independent variables are manipulated, statistical studies can be:\nExperimental: When the independent variables are manipulated in order to see the effect that the change has on the dependent variables.\nExample. In a study on the performance of students in a test, the teacher manipulates the methodology and creates two or more groups following different methodologies.\nNon-experimental: When the independent variables are not manipulated. That does not mean that it is impossible to manipulate them, but doing so would be either impractical or unethical.\nExample. In a study a researcher could be interested in the effect of smoking on lung cancer. However, whilst possible, it would be unethical to ask individuals to smoke in order to study what effect this had on their lungs. In this case, the researcher could study two groups of people, one with lung cancer and the other without, and observe in each group how many persons smoke or not.\nExperimental studies make it possible to identify cause and effect relationships between variables, while non-experimental studies only make it possible to identify associations between variables.\nThe data table The variables of a study will be measured in each individual of the sample. This will give a data set that usually is arranged in a tabular form known as a data table.\nIn this table each column contains the information of a variable and each row contains the information of an individual.\nExample. 
The table below contains data about the variables Name, Age, Gender, Weight and Height of a sample of 6 persons.\nName Age Gender Weight (kg) Height (cm)\nJosé Luis Martínez 18 M 85 179\nRosa Díaz 32 F 65 173\nJavier García 24 M 71 181\nCarmen López 35 F 65 170\nMarisa López 46 F 51 158\nAntonio Ruiz 68 M 66 174\nPhases of a statistical study Usually a statistical study goes through the following phases:\nThe study begins with a previous design in which the study goals, the population, the variables to measure and the required sample size are set.\nNext, the sample is selected from the population and the variables are measured in the individuals of the sample (getting the data table). This is accomplished by Sampling.\nThe next step consists in describing and summarizing the information of the sample. This is the job of Descriptive Statistics.\nThen, the information obtained is projected on a mathematical model that intends to explain what happens in the population, and the model is validated. This is accomplished by Inferential Statistics.\nFinally, the validated model is used to make predictions and to draw conclusions about the population.\nThe statistical cycle ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"25b11930afc1ea0d302f6d13ef3201a2","permalink":"/en/teaching/statistics/manual/introduction/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/introduction/","section":"teaching","summary":" ","tags":["Statistics"],"title":"Introduction","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Descriptive Statistics is the part of Statistics in charge of representing, analysing and summarizing the information contained in the sample.\nAfter the sampling process, this is the next step in every statistical study and usually consists of:\nTo classify, group and sort the data of the sample.\nTo tabulate and plot data according to their frequencies.\nTo 
calculate numerical measures that summarize the information contained in the sample (sample statistics).\nIt has no inferential power, so do not generalize to the population from the measures computed by Descriptive Statistics!\nFrequency distribution The study of a statistical variable starts by measuring the variable in the individuals of the sample and classifying the values.\nThere are two ways of classifying data:\nNon-grouping: Sorting values from lowest to highest value (if there is an order). Used with qualitative variables and discrete variables with few distinct values.\nGrouping: Grouping values into intervals (classes) and sorting them from the lowest to the highest interval. Used with continuous variables and discrete variables with many distinct values.\nSample classification It consists in grouping the values that are the same and sorting them if there is an order among them.\nExample. $X=$Height\nFrequency count It consists in counting the number of times that every value appears in the sample.\nExample. $X=$Height\nSample frequencies Definition - Sample frequencies. Given a sample of $n$ values of a variable $X$, for every value $x_i$ of the variable we define\nAbsolute Frequency $n_i$: The number of times that value $x_i$ appears in the sample.\nRelative Frequency $f_i$: The proportion of times that value $x_i$ appears in the sample.\n$$f_i = \\frac{n_i}{n}$$\nCumulative Absolute Frequency $N_i$: The number of values in the sample less than or equal to $x_i$. $$N_i = n_1 + \\cdots + n_i = N_{i-1}+n_i$$\nCumulative Relative Frequency $F_i$: The proportion of values in the sample less than or equal to $x_i$. 
$$F_i = \\frac{N_i}{n}$$\nFrequency table The set of values of a variable with their respective frequencies is called frequency distribution of the variable in the sample, and it is usually represented as a frequency table.\n$X$ values Absolute frequency Relative frequency Cumulative absolute frequency Cumulative relative frequency $x_1$ $n_1$ $f_1$ $N_1$ $F_1$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $x_i$ $n_i$ $f_i$ $N_i$ $F_i$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $\\vdots$ $x_k$ $n_k$ $f_k$ $N_k$ $F_k$ Example - Quantitative variable and non-grouped data. The number of children in 25 families are:\n1, 2, 4, 2, 2, 2, 3, 2, 1, 1, 0, 2, 2, 0, 2, 2, 1, 2, 2, 3, 1, 2, 2, 1, 2 The frequency table for the number of children in this sample is\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 2 \u0026amp; 0.08\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 8 \u0026amp; 0.32\\newline 2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 22 \u0026amp; 0.88\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 24 \u0026amp; 0.96\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\end{array} $$\nExample - Quantitative variable and grouped data. The heights (in cm) of 30 students are:\n179, 173, 181, 170, 158, 174, 172, 166, 194, 185,\n162, 187, 198, 177, 178, 165, 154, 188, 166, 171,\n175, 182, 167, 169, 172, 186, 172, 176, 168, 187. 
The frequency table for the height in this sample is\n$$ \\begin{array}{crrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline (150,160] \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 2 \u0026amp; 0.07\\newline (160,170] \u0026amp; 8 \u0026amp; 0.27 \u0026amp; 10 \u0026amp; 0.34\\newline (170,180] \u0026amp; 11 \u0026amp; 0.36 \u0026amp; 21 \u0026amp; 0.70\\newline (180,190] \u0026amp; 7 \u0026amp; 0.23 \u0026amp; 28 \u0026amp; 0.93\\newline (190,200] \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\end{array} $$\nClasses construction Intervals are known as classes and the centres of the intervals as class marks.\nWhen grouping data into intervals, the following rules must be taken into account:\nThe number of intervals should be neither too big nor too small. A usual rule of thumb is to take a number of intervals approximately $\\sqrt{n}$ or $\\log_2(n)$. The intervals must not overlap and must cover the entire range of values. It does not matter if intervals are left-open and right-closed or vice versa. The minimum value must fall in the first interval and the maximum value in the last. Example - Qualitative variable. The blood types of 30 people are:\nA, B, B, A, AB, 0, 0, A, B, B, A, A, A, A, AB, A, A, A, B, 0, B, B, B, A, A, A, 0, A, AB, 0. The frequency table of the blood type in this sample is\n$$ \\begin{array}{crr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \\newline \\hline \\mbox{0} \u0026amp; 5 \u0026amp; 0.16 \\newline \\mbox{A} \u0026amp; 14 \u0026amp; 0.47 \\newline \\mbox{B} \u0026amp; 8 \u0026amp; 0.27 \\newline \\mbox{AB} \u0026amp; 3 \u0026amp; 0.10 \\newline \\hline \\sum \u0026amp; 30 \u0026amp; 1 \\newline \\hline \\end{array} $$\nObserve that in this case cumulative frequencies make no sense as there is no order in the variable.\nFrequency distribution graphs Usually the frequency distribution is also displayed graphically. 
Depending on the type of variable and whether data have been grouped or not, there are different types of charts:\nBar chart\nHistogram\nLine or polygon chart.\nPie chart\nBar chart A bar chart consists of a set of bars, one for every value or category of the variable, plotted on a coordinate system.\nUsually the values or categories of the variable are represented on the $x$-axis, and the frequencies on the $y$-axis. For each value or category of the variable, a bar is drawn to the height of its frequency. The width of the bar is not important but bars should be clearly separated from each other.\nDepending on the type of frequency represented on the $y$-axis we get different types of bar charts.\nSometimes a polygon, known as the frequency polygon, is plotted joining the top of every bar with straight lines.\nExample. The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.\nThe bar chart below shows the relative frequency distribution of the number of children with the frequency polygon.\nThe bar chart below shows the cumulative absolute frequency distribution of the number of children.\nAnd the bar chart below shows the cumulative relative frequency distribution of the number of children with the frequency polygon.\nHistogram A histogram is similar to a bar chart but for grouped data.\nUsually the classes or grouping intervals are represented on the $x$-axis, and the frequencies on the $y$-axis. For each class, a bar is drawn to the height of its frequency. Contrary to bar charts, the width of bars coincides with the width of classes, and there is no space between two consecutive bars.\nDepending on the type of frequency represented on the $y$-axis we get different types of histograms.\nAs with the bar chart, the frequency polygon can be drawn joining the top centre of every bar with straight lines.\nExample. 
The histogram below shows the absolute frequency distribution of heights.\nThe histogram below shows the relative frequency distribution of heights with the frequency polygon.\nThe cumulative frequency polygon (for absolute or relative frequencies) is known as an ogive.\nExample. The histogram and the ogive below show the cumulative relative distribution of heights.\nObserve that in the ogive we join the top right corner of bars with straight lines, instead of the top centre, because we do not reach the accumulated frequency of the class until the end of the interval.\nPie chart A pie chart consists of a circle divided into slices, one for every value or category of the variable. Each slice is called a sector and its angle or area is proportional to the frequency of the corresponding value or category.\nPie charts can represent absolute or relative frequencies, but not cumulative frequencies, and are used with nominal qualitative variables. For ordinal qualitative or quantitative variables it is better to use bar charts, because it is easier to perceive differences in one dimension (length of bars) than in two dimensions (areas of sectors).\nExample. The pie chart below shows the relative frequency distribution of blood types.\nThe normal distribution Distributions with different properties will show different shapes.\nOutliers One of the main problems in samples is outliers, values very different from the rest of the values of the sample.\nExample. The last height of the following sample of heights is an outlier.\nIt is important to detect outliers before doing any analysis, because outliers usually distort the results.\nThey always appear at the ends of the distribution, and can be detected easily with a box and whiskers chart (as will be shown later).\nOutliers management With big samples outliers have less importance and can be left in the sample.\nWith small samples we have several options:\nRemove the outlier if it is an error. 
Replace the outlier by the lowest or highest value in the distribution that is not an outlier if it is not an error and the outlier does not fit the theoretical distribution. Leave the outlier if it is not an error, and change the theoretical model to fit it to outliers. Sample statistics The frequency table and charts summarize and give an overview of the distribution of values of the studied variable in the sample, but it is difficult to describe some aspects of the distribution from them, such as which values are the most representative of the distribution, how spread out the data are, which data could be considered outliers, or how symmetric the distribution is.\nTo describe those aspects of the sample distribution more specific numerical measures, called sample statistics, are used.\nAccording to the aspect of the distribution that they study, there are different types of statistics:\nMeasures of location: They measure the values where data are concentrated or that divide the distribution into equal parts.\nMeasures of dispersion: They measure the spread of data.\nMeasures of shape: They measure aspects related to the shape of the distribution, such as the symmetry and the concentration of data around the mean.\nLocation statistics There are two groups:\nCentral location measures: They measure the values where data are concentrated, usually at the centre of the distribution. These values are the values that best represent the sample data. The most important are:\nArithmetic mean Median Mode Non-central location measures: They divide the sample data into equal parts. The most important are:\nQuartiles. Deciles. Percentiles. Arithmetic mean Definition - Sample arithmetic mean $\\bar{x}$. 
The sample arithmetic mean of a variable $X$ is the sum of observed values in the sample divided by the sample size\n$$\\bar{x} = \\frac{\\sum x_i}{n}$$\nIt can be calculated from the frequency table with the formula\n$$\\bar{x} = \\frac{\\sum x_in_i}{n} = \\sum x_i f_i$$\nIn most cases the arithmetic mean is the value that best represents the observed values in the sample.\nWatch out! It cannot be calculated for qualitative variables.\nExample - Non-grouped data. Using the data of the sample with the number of children of families, the arithmetic mean is\n$$ \\begin{aligned} \\bar{x} \u0026amp;= \\frac{1+2+4+2+2+2+3+2+1+1+0+2+2}{25}+\\newline\\newline \u0026amp;+\\frac{0+2+2+1+2+2+3+1+2+2+1+2}{25} = \\frac{44}{25} = 1.76 \\mbox{ children}. \\end{aligned} $$\nor using the frequency table\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; x_in_i \u0026amp; x_if_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 0 \u0026amp; 0\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 6 \u0026amp; 0.24\\newline 2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 28 \u0026amp; 1.12\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 6 \u0026amp; 0.24\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 4 \u0026amp; 0.16 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \u0026amp; 44 \u0026amp; 1.76 \\newline \\hline \\end{array} $$\n$$ \\bar{x} = \\frac{\\sum x_in_i}{n} = \\frac{44}{25}= 1.76 \\mbox{ children}\\qquad \\bar{x}=\\sum{x_if_i} = 1.76 \\mbox{ children}. $$\nThat means that the value that best represents the number of children in the families of the sample is 1.76 children.\nExample - Grouped data. 
Using the data of the sample of student heights, the arithmetic mean is\n$$\\bar{x} = \\frac{179+173+\\cdots+187}{30} = 175.07 \\mbox{ cm}.$$\nor using the frequency table and taking the class marks as $x_i$,\n$$ \\begin{array}{crrrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; x_in_i \u0026amp; x_if_i\\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 310 \u0026amp; 10.33\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; 0.27 \u0026amp; 1320 \u0026amp; 44.00\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.36 \u0026amp; 1925 \u0026amp; 64.17\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 0.23 \u0026amp; 1295 \u0026amp; 43.17\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 0.07 \u0026amp; 390 \u0026amp; 13 \\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; 1 \u0026amp; 5240 \u0026amp; 174.67 \\newline \\hline \\end{array} $$\n$$ \\bar{x} = \\frac{\\sum x_in_i}{n} = \\frac{5240}{30}= 174.67 \\mbox{ cm} \\qquad \\bar{x}=\\sum{x_if_i} = 174.67 \\mbox{ cm}. $$\nObserve that when the mean is calculated from the table the result differs a little from the real value, because the values used in the calculations are the class marks instead of the actual values.\nWeighted mean In some cases the values of the sample have different importance. In that case the importance or weight of each value of the sample must be taken into account when calculating the mean.\nDefinition - Sample weighted mean $\\bar{x}_w$. Given a sample of values $x_1,\\ldots,x_n$ where every value $x_i$ has a weight $w_i$, the sample weighted mean of variable $X$ is the sum of the products of each value by its weight, divided by the sum of the weights\n$$\\bar{x}_w = \\frac{\\sum x_iw_i}{\\sum w_i}$$\nFrom the frequency table it can be calculated with the formula\n$$\\bar{x}_w = \\frac{\\sum x_iw_in_i}{\\sum w_in_i}$$\nExample. 
Assume that a student wants to calculate a representative measure of his/her performance in a course. The grade and the credits of every subject are\nSubject Credits Grade\nMaths 6 5\nEconomics 4 3\nChemistry 8 6\nThe arithmetic mean is\n$$\\bar{x} = \\frac{\\sum x_i}{n} = \\frac{5+3+6}{3}= 4.67 \\text{ points}.$$\nHowever, this measure does not represent well the performance of the student, as not all the subjects have the same importance and require the same effort to pass. Subjects with more credits require more work and must have more weight in the calculation of the mean.\nIn this case it is better to use the weighted mean, using the credits as the weights of grades, as a representative measure of the student effort\n$$ \\bar{x}_w = \\frac{\\sum x_iw_i}{\\sum w_i} = \\frac{5\\cdot 6+3\\cdot 4+6\\cdot 8}{6+4+8}= \\frac{90}{18} = 5 \\text{ points}. $$\nMedian Definition - Sample median $Me$. The sample median of a variable $X$ is the value that is in the middle of the ordered sample. The median divides the sample distribution into two equal parts, that is, there is the same number of values above and below the median. Therefore, it has cumulative frequencies $N_{Me}= n/2$ and $F_{Me}= 0.5$.\nWatch out! It cannot be calculated for nominal variables.\nWith non-grouped data, there are two possibilities:\nOdd sample size: The median is the value in the position $\\frac{n+1}{2}$. Even sample size: The median is the average of the values in positions $\\frac{n}{2}$ and $\\frac{n}{2}+1$. Example. 
Using the data of the sample with the number of children of families, the sample size is 25, which is odd, and the median is the value in the position $\\frac{25+1}{2} = 13$ of the sorted sample.\n$$0,0,1,1,1,1,1,1,2,2,2,2,\\fbox{2},2,2,2,2,2,2,2,2,2,3,3,4$$\nAnd the median is 2 children.\nWith the frequency table, the median is the lowest value with a cumulative absolute frequency greater than or equal to $13$, or with a cumulative relative frequency greater than or equal to $0.5$.\n$$ \\begin{array}{rrrrr} \\hline x_i \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 2 \u0026amp; 0.08\\newline 1 \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 8 \u0026amp; 0.32\\newline \\color{red}2 \u0026amp; 14 \u0026amp; 0.56 \u0026amp; 22 \u0026amp; 0.88\\newline 3 \u0026amp; 2 \u0026amp; 0.08 \u0026amp; 24 \u0026amp; 0.96\\newline 4 \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 1 \\newline \\hline \\end{array} $$\nMedian calculation for grouped data For grouped data the median is calculated from the ogive, interpolating in the class with cumulative relative frequency 0.5.\nBoth expressions are equal as the angle $\\alpha$ is the same, and solving the equation we get that the formula for the median is\n$$ Me=l_{i-1}+\\frac{0.5-F_{i-1}}{F_i-F_{i-1}}(l_i-l_{i-1})=l_{i-1}+\\frac{0.5-F_{i-1}}{f_i}a_i $$\nwhere $l_{i-1}$ and $l_i$ are the limits of the class that contains the median and $a_i=l_i-l_{i-1}$ is its width.\nExample - Grouped data. Using the data of the sample of student heights, the median falls in class (170,180].\nAnd interpolating in interval (170,180] we get\nEquating both expressions and solving the equation, and using the exact cumulative relative frequencies $10/30$ and $21/30$ instead of the rounded values in the table, we get\n$$ Me= 170+\\frac{0.5-10/30}{21/30-10/30}(180-170)=170+\\frac{5}{11}\\cdot 10=174.55 \\mbox{ cm}. $$\nThis means that half of the students in the sample have a height lower than or equal to 174.55 cm and the other half have a height greater than or equal to it.\nMode Definition - Sample Mode $Mo$. The sample mode of a variable $X$ is the most frequent value in the sample. 
With grouped data the modal class is the class with the highest frequency.\nIt can be calculated for all types of variables (qualitative and quantitative).\nDistributions can have more than one mode.\nExample. Using the data of the sample with the number of children of families, the value with the highest frequency is $2$, so the mode is $Mo = 2$ children.\n$$ \\begin{array}{rr} \\hline x_i \u0026amp; n_i \\newline \\hline 0 \u0026amp; 2 \\newline 1 \u0026amp; 6 \\newline \\color{red} 2 \u0026amp; 14 \\newline 3 \u0026amp; 2 \\newline 4 \u0026amp; 1 \\newline \\hline \\end{array} $$\nUsing the data of the sample of student heights, the class with the highest frequency is $(170,180]$, so the modal class is $Mo=(170,180]$.\n$$ \\begin{array}{cr} \\hline X \u0026amp; n_i \\newline \\hline (150,160] \u0026amp; 2 \\newline (160,170] \u0026amp; 8 \\newline \\color{red}{(170,180]} \u0026amp; 11 \\newline (180,190] \u0026amp; 7 \\newline (190,200] \u0026amp; 2 \\newline \\hline \\end{array} $$\nWhich central tendency statistic should I use? In general, when all the central tendency statistics can be calculated, it is advisable to use them as representative values in the following order:\nThe mean. The mean takes more information from the sample than the others, as it takes into account the magnitude of data.\nThe median. The median takes less information than the mean but more than the mode, as it takes into account the order of data.\nThe mode. The mode is the measure that takes the least information from the sample, as it only takes into account the absolute frequency of values.\nBut, be careful with outliers, as the mean can be distorted by them. In that case it is better to use the median as the most representative value.\nExample. 
If a sample of the number of children of 7 families is\n0, 0, 1, 1, 2, 2, 15, then, $\\bar{x}=3$ children and $Me=1$ child.\nWhich measure represents better the number of children in the sample?\nNon-central location measures The non-central location measures or quantiles divide the sample distribution into equal parts.\nThe most used are:\nQuartiles: Divide the distribution into 4 equal parts. There are 3 quartiles: $Q_1$ (25% accumulated), $Q_2$ (50% accumulated), $Q_3$ (75% accumulated).\nDeciles: Divide the distribution into 10 equal parts. There are 9 deciles: $D_1$ (10% accumulated),…, $D_9$ (90% accumulated).\nPercentiles: Divide the distribution into 100 equal parts. There are 99 percentiles: $P_1$ (1% accumulated),…, $P_{99}$ (99% accumulated).\nObserve that there is a correspondence between quartiles, deciles and percentiles. For example, the first quartile coincides with percentile 25, and the fourth decile coincides with percentile 40.\nQuantiles are calculated in a similar way to the median. The only difference lies in the cumulative relative frequency that corresponds to every quantile.\nExample. Using the data of the sample with the number of children of families, the cumulative relative frequencies were\n$$ \\begin{array}{rr} \\hline x_i \u0026amp; F_i \\newline \\hline 0 \u0026amp; 0.08\\newline 1 \u0026amp; 0.32\\newline 2 \u0026amp; 0.88\\newline 3 \u0026amp; 0.96\\newline 4 \u0026amp; 1\\newline \\hline \\end{array} $$\n$$ \\begin{aligned} F_{Q_1}=0.25 \u0026amp;\\Rightarrow Q_1 = 1 \\text{ child},\\newline F_{Q_2}=0.5 \u0026amp;\\Rightarrow Q_2 = 2 \\text{ children},\\newline F_{Q_3}=0.75 \u0026amp;\\Rightarrow Q_3 = 2 \\text{ children},\\newline F_{D_4}=0.4 \u0026amp;\\Rightarrow D_4 = 2 \\text{ children},\\newline F_{P_{92}}=0.92 \u0026amp;\\Rightarrow P_{92} = 3 \\text{ children}. \\end{aligned}$$\nDispersion statistics Dispersion or spread refers to the variability of data. 
So, dispersion statistics measure how the data values are scattered in general, or with respect to a central location measure.\nFor quantitative variables, the most important are:\nRange Interquartile range Variance Standard deviation Coefficient of variation Range Definition - Sample range. The sample range of a variable $X$ is the difference between the maximum and the minimum values in the sample.\n$$\\text{Range} = \\max x_i -\\min x_i$$\nThe range measures the largest variation among the sample data. However, it is very sensitive to outliers, as they appear at the ends of the distribution, and for that reason it is rarely used.\nInterquartile range The following measure avoids the problem of outliers and is much more widely used.\nDefinition - Sample interquartile range. The sample interquartile range of a variable $X$ is the difference between the third and the first sample quartiles.\n$$\\text{IQR} = Q_3-Q_1$$\nThe interquartile range measures the spread of the central 50% of the data.\nBox plot The dispersion of a variable in a sample can be graphically represented with a box plot, which represents five descriptive statistics (minimum, quartiles and maximum) known as the five-number summary. It consists of a box, drawn from the lower to the upper quartile, that represents the interquartile range, and two segments, known as the lower and the upper whiskers. Usually the box is split in two by the median.\nThis chart is very helpful as it serves many purposes:\nIt serves to measure the spread of data as it represents the range and the interquartile range. It serves to detect outliers, which are the values outside the interval defined by the whiskers. It serves to measure the symmetry of the distribution, comparing the length of the boxes and whiskers above and below the median. Example. 
The chart below shows a box plot of newborn weights.\nTo create a box plot follow the steps below:\nCalculate the quartiles.\nDraw a box from the lower to the upper quartile.\nSplit the box with the median or second quartile.\nFor the whiskers calculate first two values called fences $f_1$ and $f_2$. The lower fence is the lower quartile minus one and a half times the interquartile range, and the upper fence is the upper quartile plus one and a half times the interquartile range:\n$$\\begin{aligned} f_1\u0026amp;=Q_1-1.5\\,\\text{IQR}\\newline f_2\u0026amp;=Q_3+1.5\\,\\text{IQR} \\end{aligned}$$\nThe fences define the interval where data are considered normal. Any value outside that interval is considered an outlier.\nFor the lower whisker draw a segment from the lower quartile to the lowest value in the sample greater than or equal to $f_1$, and for the upper whisker draw a segment from the upper quartile to the highest value in the sample lower than or equal to $f_2$.\nThe whiskers are not the fences. Finally, if there are outliers, draw a dot at every outlier. Example. The box plot for the sample with the number of children is shown below.\nDeviations from the mean Another way of measuring the spread of data is with respect to a central tendency measure, such as the mean.\nIn that case, the distance from every value in the sample to the mean is measured, which is called the deviation from the mean.\nIf deviations are big, the mean is less representative than when they are small.\nExample. 
The grades of 3 students in a course with subjects $A$, $B$ and $C$ are shown below.\n$$ \\begin{array}{cccc} \\hline A \u0026amp; B \u0026amp; C \u0026amp; \\bar x\\newline 0 \u0026amp; 5 \u0026amp; 10 \u0026amp; 5\\newline 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 5\\newline 5 \u0026amp; 5 \u0026amp; 5 \u0026amp; 5\\newline \\hline \\end{array} $$\nAll the students have the same mean, but, in which case does the mean represent better the course performance?\nVariance and standard deviation Definition \u0026ndash; Sample variance $s^2$. The sample variance of a variable $X$ is the average of the squared deviations from the mean.\n$$s^2 = \\frac{\\sum (x_i-\\bar x)^2n_i}{n} = \\sum (x_i-\\bar x)^2f_i$$\nIt can also be calculated with the formula\n$$s^2 = \\frac{\\sum x_i^2n_i}{n} -\\bar x^2= \\sum (x_i^2f_i)-\\bar x^2$$\nThe variance has the units of the variable squared, and to ease its interpretation it is common to calculate its square root.\nDefinition - Sample standard deviation $s$. The sample standard deviation of a variable $X$ is the square root of the variance.\n$$s = +\\sqrt{s^2}$$\nBoth variance and standard deviation measure the spread of data around the mean. When the variance or the standard deviation are small, the sample data are concentrated around the mean, and the mean is a good representative measure. In contrast, when variance or the standard deviation are high, the sample data are far from the mean, and the mean does not represent so well.\nStandard deviation small $\\Rightarrow$ Mean is representative Standard deviation big $\\Rightarrow$ Mean is unrepresentative Example. The following samples contains the grades of 2 students in 2 subjects\nWhich mean is more representative?\nExample - Non-grouped data. 
Using the data of the sample with the number of children of families, with mean $\\bar x= 1.76$ children, and adding a new column to the frequency table with the squared values,\n$$ \\begin{array}{rrr} \\hline x_i \u0026amp; n_i \u0026amp; x_i^2n_i \\newline \\hline 0 \u0026amp; 2 \u0026amp; 0 \\newline 1 \u0026amp; 6 \u0026amp; 6 \\newline 2 \u0026amp; 14 \u0026amp; 56\\newline 3 \u0026amp; 2 \u0026amp; 18\\newline 4 \u0026amp; 1 \u0026amp; 16 \\newline \\hline \\sum \u0026amp; 25 \u0026amp; 96 \\newline \\hline \\end{array}$$\n$$s^2 = \\frac{\\sum x_i^2n_i}{n}-\\bar x^2 = \\frac{96}{25}-1.76^2= 0.7424 \\mbox{ children}^2.$$\nand the standard deviation is $s=\\sqrt{0.7424} = 0.8616$ children.\nCompared to the range, that is 4 children, the standard deviation is not very large, so we can conclude that the dispersion of the distribution is small and consequently the mean, $\\bar x=1.76$ children, represents quite well the number of children of families of the sample.\nExample - Grouped data. Using the data of the sample with the heights of students and grouping heights in classes, we got a mean $\\bar x=174.67$ cm. 
The calculation of variance is the same as for non-grouped data but using the class marks.\n$$ \\begin{array}{crrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i^2n_i \\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; 48050\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; 217800\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 336875\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 239575\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 76050\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; 918350 \\newline \\hline \\end{array} $$\n$$s^2 = \\frac{\\sum x_i^2n_i}{n}-\\bar x^2 = \\frac{918350}{30}-174.67^2= 102.06 \\mbox{ cm}^2,$$\nand the standard deviation is $s=\\sqrt{102.06} = 10.1$ cm.\nThis value is quite small compared to the range of the variable, which goes from 150 to 200 cm, therefore the distribution of heights has little dispersion and the mean is very representative.\nCoefficient of variation Both variance and standard deviation have units, which makes them difficult to interpret, especially when comparing distributions of variables with different units.\nFor that reason it is also common to use the following dispersion measure that has no units.\nDefinition - Sample coefficient of variation $cv$. The sample coefficient of variation of a variable $X$ is the quotient between the sample standard deviation and the absolute value of the sample mean.\n$$cv = \\frac{s}{|\\bar x|}$$\nThe coefficient of variation measures the relative dispersion of data around the sample mean.\nAs it has no units, it is easier to interpret: The higher the coefficient of variation is, the higher the relative dispersion with respect to the mean and the less representative the mean is.\nThe coefficient of variation is very helpful for comparing dispersion in distributions of different variables, even if the variables have different units.\nExample. 
In the sample of the number of children, where the mean was $\\bar x=1.76$ and the standard deviation was $s=0.8616$ children, the coefficient of variation is\n$$cv = \\frac{s}{|\\bar x|} = \\frac{0.8616}{|1.76|} = 0.49.$$\nIn the sample of heights, where the mean was $\\bar x=174.67$ cm and the standard deviation was $s=10.1$ cm, the coefficient of variation is\n$$cv = \\frac{s}{|\\bar x|} = \\frac{10.1}{|174.67|} = 0.06.$$\nThis means that the relative dispersion in the heights distribution is lower than in the number of children distribution, and consequently the mean of height is more representative than the mean of number of children.\nShape statistics They are measures that describe the shape of the distribution.\nIn particular, the most important aspects are:\nSymmetry: It measures the symmetry of the distribution with respect to the mean. The most used statistic is the coefficient of skewness.\nKurtosis: It measures the concentration of data around the mean of the distribution. The most used statistic is the coefficient of kurtosis.\nCoefficient of skewness Definition - Sample coefficient of skewness $g_1$. The sample coefficient of skewness of a variable $X$ is the average of the cubed deviations of values from the sample mean, divided by the cube of the standard deviation.\n$$g_1 = \\frac{\\sum (x_i-\\bar x)^3 n_i/n}{s^3} = \\frac{\\sum (x_i-\\bar x)^3 f_i}{s^3}$$\nThe coefficient of skewness measures the symmetry or skewness of the distribution, that is, how many values in the sample are above or below the mean and how far from it.\n$g_1=0$ indicates that there are the same number of values in the sample above and below the mean and equally deviated from it, and the distribution is symmetrical. $g_1\u0026lt;0$ indicates that there are more values above the mean than below it, but the values below are further from it, and the distribution is left-skewed (it has a longer tail to the left). 
$g_1\u0026gt;0$ indicates that there are more values below the mean than above it, but the values above are further from it, and the distribution is right-skewed (it has longer tail to the right). Example - Grouped data. Using the frequency table of the sample with the heights of students and adding a new column with the deviations from the mean $\\bar x = 174.67$ cm to cube, we get\n$$ \\begin{array}{crrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i-\\bar x \u0026amp; (x_i-\\bar x)^3 n_i \\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; -19.67 \u0026amp; -15221.00\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; -9.67 \u0026amp; -7233.85\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.33 \u0026amp; 0.40\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 10.33 \u0026amp; 7716.12\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 20.33 \u0026amp; 16805.14\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; \u0026amp; 2066.81 \\newline \\hline \\end{array} $$\n$$g_1 = \\frac{\\sum (x_i-\\bar x)^3n_i/n}{s^3} = \\frac{2066.81/30}{10.1^3} = 0.07.$$\nAs it is close to 0, that means that the distribution of heights is fairly symmetrical.\nCoefficient of kurtosis Definition - Sample coefficient of kurtosis $g_2$ The sample coefficient of kurtosis of a variable $X$ is the average of the deviations of values from the sample mean to the fourth power, divided by the standard deviation to the fourth power and minus 3.\n$$g_2 = \\frac{\\sum (x_i-\\bar x)^4 n_i/n}{s^4}-3 = \\frac{\\sum (x_i-\\bar x)^4 f_i}{s^4}-3$$\nThe coefficient of kurtosis measures the concentration of data around the mean and the length of tails of distribution. The normal (Gaussian bell-shaped) distribution is taken as a reference.\n$g_2=0$ indicates that the kurtosis is normal, that is, the concentration of values around the mean is the same than in a Gaussian bell-shaped distribution (mesokurtic). 
$g_2\u0026lt;0$ indicates that the kurtosis is less than normal, that is, the concentration of values around the mean is less than in a Gaussian bell-shaped distribution (platykurtic). $g_2\u0026gt;0$ indicates that the kurtosis is greater than normal, that is, the concentration of values around the mean is greater than in a Gaussian bell-shaped distribution (leptokurtic). Example - Grouped data. Using the frequency table of the sample with the heights of students and adding a new column with the deviations from the mean $\\bar x = 174.67$ cm to the fourth power, we get\n$$ \\begin{array}{rrrrr} \\hline X \u0026amp; x_i \u0026amp; n_i \u0026amp; x_i-\\bar x \u0026amp; (x_i-\\bar x)^4 n_i\\newline \\hline (150,160] \u0026amp; 155 \u0026amp; 2 \u0026amp; -19.67 \u0026amp; 299396.99\\newline (160,170] \u0026amp; 165 \u0026amp; 8 \u0026amp; -9.67 \u0026amp; 69951.31\\newline (170,180] \u0026amp; 175 \u0026amp; 11 \u0026amp; 0.33 \u0026amp; 0.13\\newline (180,190] \u0026amp; 185 \u0026amp; 7 \u0026amp; 10.33 \u0026amp; 79707.53\\newline (190,200] \u0026amp; 195 \u0026amp; 2 \u0026amp; 20.33 \u0026amp; 341648.49\\newline \\hline \\sum \u0026amp; \u0026amp; 30 \u0026amp; \u0026amp; 790704.45 \\newline \\hline \\end{array} $$\n$$g_2 = \\frac{\\sum (x_i-\\bar x)^4n_i/n}{s^4} - 3 = \\frac{790704.45/30}{10.1^4}-3 = -0.47.$$\nAs it is a negative value but not too far from 0, that means that the distribution of heights is a little bit platykurtic.\nAs we will see in the chapters of inferential statistics, many of the statistical tests can only be applied to normal (bell-shaped) populations.\nNormal distributions are symmetrical and mesokurtic, and therefore, their coefficients of skewness and kurtosis are equal to 0. So, a way of checking if a sample comes from a normal population is to look at how far the coefficients of skewness and kurtosis are from 0.\nIn general, the normality of the population is rejected when $g_1$ or $g_2$ are outside the interval $[-2,2]$. 
In that case, it is common to apply a transformation to the variable to correct non-normality.\nNon-normal distributions Non-normal right-skewed distribution An example of right-skewed distribution is the household income.\nNon-normal left-skewed distribution An example of left-skewed distribution is the age at death.\nNon-normal bimodal distribution\nVariable transformations In many cases, the raw sample data are transformed to correct non-normality of distribution or just to get a more appropriate scale.\nFor example, if we are working with heights in metres and a sample contains the following values:\n$$ 1.75 \\mbox{ m}, 1.65 \\mbox{ m}, 1.80 \\mbox{ m}, $$\nit is possible to avoid decimals by multiplying by 100, that is, changing from metres to centimetres:\n$$ 175 \\mbox{ cm}, 165 \\mbox{ cm}, 180 \\mbox{ cm}, $$\nAnd it is also possible to reduce the magnitude of data by subtracting the minimum value in the sample, in this case 165 cm:\n$$ 10 \\mbox{ cm}, 0 \\mbox{ cm}, 15 \\mbox{ cm}. $$\nIt is obvious that these data are easier to work with than the original ones. In essence, what has been done is to apply the following transformation to the data:\n$$Y= 100X-165$$\nLinear transformations One of the most common transformations is the linear transformation:\n$$Y=a+bX.$$\nFor a linear transformation, the mean and the standard deviation of the transformed variable are\n$$ \\begin{aligned} \\bar y \u0026amp;= a+ b\\bar x,\\newline s_{y} \u0026amp;= |b|s_{x} \\end{aligned} $$\nAdditionally, the coefficient of kurtosis does not change and the coefficient of skewness changes only its sign if $b$ is negative.\nStandardization and standard scores One of the most common linear transformations is the standardization.\nDefinition - Standardized variable and standard scores. 
The standardized variable of a variable $X$ is the variable that results from subtracting the mean from $X$ and dividing it by the standard deviation\n$$Z=\\frac{X-\\bar x}{s_{x}}.$$\nFor each value $x_i$ of the sample, the standard score is the value that results of applying the standardization transformation\n$$z_i=\\frac{x_i-\\bar x}{s_{x}}.$$\nThe standard score is the number of standard deviations a value is above or below the mean, and it is useful to avoid the dependency of the variable from its measurement units. This helps, for instance, to compare values from different variables or samples. The standardized variable always has mean 0 and standard deviation 1.\n$$\\bar z = 0 \\qquad s_{z} = 1$$\nExample. The grades of 5 students in 2 subjects are\n$$ \\begin{array}{rccccccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; \\newline \\hline X: \u0026amp; 2 \u0026amp; 5 \u0026amp; 4 \u0026amp; \\color{red} 8 \u0026amp; 6 \u0026amp; \\qquad \u0026amp; \\bar x = 5 \u0026amp; \\quad s_x = 2\\newline Y: \u0026amp; 1 \u0026amp; 9 \u0026amp; \\color{red} 8 \u0026amp; 5 \u0026amp; 2 \u0026amp; \\qquad \u0026amp; \\bar y = 5 \u0026amp; \\quad s_y = 3.16\\newline \\hline \\end{array} $$\nDid the fourth student get the same performance in subject $X$ than the third student in subject $Y$?\nIt might seem that both students had the same performance in every subject because they have the same grade, but in order to get the performance of every student relative to the group of students, the dispersion of grades in every subject must be considered. 
For that reason it is better to use the standard score as a measure of relative performance.\n$$ \\begin{array}{cccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; -1.50 \u0026amp; 0.00 \u0026amp; -0.50 \u0026amp; \\color{red}{1.50} \u0026amp; 0.50 \\newline Y: \u0026amp; -1.26 \u0026amp; 1.26 \u0026amp; \\color{red}{0.95} \u0026amp; 0.00 \u0026amp; -0.95\\newline \\hline \\end{array} $$\nThat is, the student with an 8 in $X$ is $1.5$ times the standard deviation above the mean of $X$, while the student with an 8 in $Y$ is only $0.95$ times the standard deviation above the mean of $Y$. Therefore, the first student had a higher performance in $X$ than the second in $Y$.\nFollowing with this example and considering both subjects, which is the best student?\nIf we only consider the sum of grades\n$$\\begin{array}{rccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; 2 \u0026amp; 5 \u0026amp; 4 \u0026amp; 8 \u0026amp; 6 \\newline Y: \u0026amp; 1 \u0026amp; 9 \u0026amp; 8 \u0026amp; 5 \u0026amp; 2 \\newline \\hline \\sum \u0026amp; 3 \u0026amp; \\color{red}{14} \u0026amp; 12 \u0026amp; 13 \u0026amp; 8 \\end{array} $$\nthe best student is the second one.\nBut if the relative performance is considered, taking the standard scores\n$$ \\begin{array}{rccccc} \\mbox{Student:} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5\\newline \\hline X: \u0026amp; -1.50 \u0026amp; 0.00 \u0026amp; -0.50 \u0026amp; 1.50 \u0026amp; 0.50 \\newline Y: \u0026amp; -1.26 \u0026amp; 1.26 \u0026amp; 0.95 \u0026amp; 0.00 \u0026amp; -0.95\\newline \\hline \\sum \u0026amp; -2.76 \u0026amp; 1.26 \u0026amp; 0.45 \u0026amp; \\color{red}{1.5} \u0026amp; -0.45 \\end{array} $$\nthe best student is the fourth one.\nNon-linear transformations Non-linear transformations are also common to correct non-normality of distributions.\nThe square transformation 
$Y=X^2$ compresses small values and expands large values. So, it is used to correct left-skewed distributions.\nThe square root transformation $Y=\\sqrt X$, the logarithmic transformation $Y= \\log X$ and the inverse transformation $Y=1/X$ compress large values and expand small values. So, they are used to correct right-skewed distributions.\nFactors Sometimes it is interesting to describe the frequency distribution of the main variable for different subsamples corresponding to the categories of another variable known as a classificatory variable or factor.\nExample. Dividing the sample of heights by gender we get two subsamples\n$$ \\begin{array}{lll} \\hline \\mbox{Females} \u0026amp; \u0026amp; 173, 158, 174, 166, 162, 177, 165, 154, 166, 182, 169, 172, 170, 168. \\newline \\mbox{Males} \u0026amp; \u0026amp; 179, 181, 172, 194, 185, 187, 198, 178, 188, 171, 175, 167, 186, 172, 176, 187. \\newline \\hline \\end{array} $$\nComparing distributions for the levels of a factor Usually factors allow us to compare the distribution of the main variable for every category of the factor.\nExample. 
The following charts allow to compare the distribution of heights according to the gender.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"2891575ebbe9e976e5448211d8b3c292","permalink":"/en/teaching/statistics/manual/descriptive-statistics/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/descriptive-statistics/","section":"teaching","summary":"Descriptive Statistics is the part of Statistics in charge of representing, analysing and summarizing the information contained in the sample.\nAfter the sampling process, this is the next step in every statistical study and usually consists of:","tags":["Statistics","Biostatistics","Descriptive-Statistics"],"title":"Descriptive Statistics","type":"book"},{"authors":null,"categories":["Calculus","Geometry"],"content":"Scalars and Vectors Scalars Some phenomena of Nature can be described by a number and a unit of measurement.\nDefinition - Scalar. A scalar is a number that expresses a magnitude without direction. Example. The height or weight of a person, the temperature of a gas or the time it takes a vehicle to travel a distance.\nHowever, there are other phenomena that cannot be described adequately by a scalar. If, for instance, a sailor wants to head for seaport and only knows the intensity of wind, he will not know what direction to take. The description of wind requires two elements: intensity and direction.\nVectors Definition - Vector. A vector is a number that expresses a magnitude and has associated an orientation and a sense. Example. The velocity of a vehicle or the force applied to an object.\nGeometrically, a vector is represented by an directed line segment, that is, an arrow.\nVector representation An oriented segment can be located in different places in a Cartesian space. 
However, regardless of where it is located, if the length and the direction of the segment do not change, the segment always represents the same vector.\nThis allows us to represent all vectors with the same origin, the origin of the Cartesian coordinate system. Thus, a vector can be represented by the Cartesian coordinates of its final end in any Euclidean space.\nVector from two points Given two points $P$ and $Q$ of a Cartesian space, the vector that starts at $P$ and ends at $Q$ has coordinates $\\vec{PQ}=Q-P$.\nExample. Given the points $P=(1,1)$ and $Q=(3,4)$ in the real plane $\\mathbb{R}^2$, the coordinates of the vector that starts at $P$ and ends at $Q$ are $$\\vec{PQ} = Q-P = (3,4)-(1,1) = (3-1,4-1) = (2,3).$$\nModulus of a vector Definition - Modulus of a vector. Given a vector $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the modulus of $\\mathbf{v}$ is $$|\\mathbf{v}| = \\sqrt{v_1^2+ \\cdots + v_n^2}.$$ The modulus of a vector coincides with the length of the segment that represents the vector.\nExamples. Let $\\mathbf{u}=(3,4)$ be a vector in $\\mathbb{R}^2$, then its modulus is $$|\\mathbf{u}| = \\sqrt{3^2+4^2} = \\sqrt{25} = 5$$\nLet $\\mathbf{v}=(4,7,4)$ be a vector in $\\mathbb{R}^3$, then its modulus is $$|\\mathbf{v}| = \\sqrt{4^2+7^2+4^2} = \\sqrt{81} = 9$$\nUnit vectors Definition - Unit vector. A vector $\\mathbf{v}$ in $\\mathbb{R}^n$ is a unit vector if its modulus is one, that is, $\\vert\\mathbf{v}\\vert=1$. The unit vectors with the direction of the coordinate axes are of special importance and they form the standard basis.\nIn $\\mathbb{R}^2$ the standard basis is formed by two vectors $\\mathbf{i}=(1,0)$ and $\\mathbf{j}=(0,1)$.\nIn $\\mathbb{R}^3$ the standard basis is formed by three vectors $\\mathbf{i}=(1,0,0)$, $\\mathbf{j}=(0,1,0)$ and $\\mathbf{k}=(0,0,1)$.\nSum of two vectors Definition - Sum of two vectors. 
Given two vectors $\\mathbf{u}=(u_1,\\cdots,u_n)$ and $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the sum of $\\mathbf{u}$ and $\\mathbf{v}$ is\n$$\\mathbf{u}+\\mathbf{v} = (u_1+v_1,\\ldots, u_n+v_n).$$\nExample. Let $\\mathbf{u}=(3,1)$ and $\\mathbf{v}=(2,3)$ be two vectors in $\\mathbb{R}^2$, then their sum is $$\\mathbf{u}+\\mathbf{v} = (3+2,1+3) = (5,4).$$\nProduct of a vector by a scalar Definition - Product of a vector by a scalar. Given a vector $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, and a scalar $a\\in \\mathbb{R}$, the product of $\\mathbf{v}$ by $a$ is\n$$a\\mathbf{v} = (av_1,\\ldots, av_n).$$\nExample. Let $\\mathbf{v}=(2,1)$ be a vector in $\\mathbb{R}^2$ and let $a=2$ be a scalar, then the product of $a$ by $\\mathbf{v}$ is $$a\\mathbf{v} = 2(2,1) = (4,2).$$\nExpressing a vector as a linear combination of the standard basis The sum of vectors and the product of a vector by a scalar allow us to express any vector as a linear combination of the standard basis.\nIn $\\mathbb{R}^3$, for instance, a vector with coordinates $\\mathbf{v}=(v_1,v_2,v_3)$ can be expressed as the linear combination $$\\mathbf{v}=(v_1,v_2,v_3) = v_1\\mathbf{i}+v_2\\mathbf{j}+v_3\\mathbf{k}.$$\nDot product of two vectors Definition - Dot product of two vectors. Given the vectors $\\mathbf{u}=(u_1,\\cdots,u_n)$ and $\\mathbf{v}=(v_1,\\cdots,v_n)$ in $\\mathbb{R}^n$, the dot product of $\\mathbf{u}$ and $\\mathbf{v}$ is\n$$\\mathbf{u}\\cdot \\mathbf{v} = u_1v_1 + \\cdots + u_nv_n.$$\nExample. Let $\\mathbf{u}=(3,1)$ and $\\mathbf{v}=(2,3)$ be two vectors in $\\mathbb{R}^2$, then their dot product is\n$$\\mathbf{u}\\cdot\\mathbf{v} = 3\\cdot 2 +1\\cdot 3 = 9.$$\nTheorem - Dot product. Given two vectors $\\mathbf{u}$ and $\\mathbf{v}$ in $\\mathbb{R}^n$, it holds that\n$$\\mathbf{u}\\cdot\\mathbf{v} = |\\mathbf{u}||\\mathbf{v}|\\cos\\alpha$$\nwhere $\\alpha$ is the angle between the vectors.\nParallel vectors Definition - Parallel vectors. 
Two vectors $\\mathbf{u}$ and $\\mathbf{v}$ are parallel if there is a scalar $a\\in\\mathbb{R}$ such that\n$$\\mathbf{u} = a\\mathbf{v}.$$\nExample. The vectors $\\mathbf{u}=(-4,2)$ and $\\mathbf{v}=(2,-1)$ in $\\mathbb{R}^2$ are parallel, as there is a scalar $-2$ such that $$\\mathbf{u}= (-4,2) = -2(2,-1) = -2\\mathbf{v}.$$\nOrthogonal and orthonormal vectors Definition - Orthogonal and orthonormal vectors. Two vectors $\\mathbf{u}$ and $\\mathbf{v}$ are orthogonal if their dot product is zero,\n$$\\mathbf{u}\\cdot \\mathbf{v} = 0.$$\nIf in addition both vectors are unit vectors, $\\vert\\mathbf{u}\\vert=\\vert\\mathbf{v}\\vert=1$, then the vectors are orthonormal.\nOrthogonal vectors are perpendicular, that is, the angle between them is a right angle. Examples. The vectors $\\mathbf{u}=(2,1)$ and $\\mathbf{v}=(-2,4)$ in $\\mathbb{R}^2$ are orthogonal, as $$\\mathbf{u}\\cdot\\mathbf{v} = 2\\cdot (-2) +1\\cdot 4 = 0,$$ but they are not orthonormal since $|\\mathbf{u}| = \\sqrt{2^2+1^2} \\neq 1$ and $|\\mathbf{v}| = \\sqrt{(-2)^2+4^2} \\neq 1$.\nThe vectors $\\mathbf{i}=(1,0)$ and $\\mathbf{j}=(0,1)$ in $\\mathbb{R}^2$ are orthonormal, as $$\\mathbf{i}\\cdot\\mathbf{j} = 1\\cdot 0 +0\\cdot 1 = 0, \\quad |\\mathbf{i}| = \\sqrt{1^2+0^2} = 1, \\quad |\\mathbf{j}| = \\sqrt{0^2+1^2} = 1.$$\nLines Vectorial equation of a straight line Definition - Vectorial equation of a straight line. Given a point $P=(p_1,\\ldots,p_n)$ and a vector $\\mathbf{v}=(v_1,\\ldots,v_n)$ of $\\mathbb{R}^n$, the vectorial equation of the line $l$ that passes through the point $P$ with the direction of $\\mathbf{v}$ is\n$$l: X= P + t\\mathbf{v} = (p_1,\\ldots,p_n)+t(v_1,\\ldots,v_n) = (p_1+tv_1,\\ldots,p_n+tv_n)$$\nwith $t\\in\\mathbb{R}.$\nExample. Let $l$ be the line of $\\mathbb{R}^3$ that goes through $P=(1,1,2)$ with the direction of $\\mathbf{v}=(3,1,2)$, then the vectorial equation of $l$ is $$ l : X= P + t\\mathbf{v} = (1,1,2)+t(3,1,2) = (1+3t,1+t,2+2t)\\quad t\\in\\mathbb{R}. 
$$\nParametric and Cartesian equations of a line From the vectorial equation of a line $l: X=P + t\\mathbf{v}=(p_1+tv_1,\\ldots,p_n+tv_n)$ it is easy to obtain the coordinates of the points of the line with $n$ parametric equations\n$$x_1(t)=p_1+tv_1, \\ldots, x_n(t)=p_n+tv_n$$\nfrom which, if $\\mathbf{v}$ is a vector with non-zero coordinates ($v_i\\neq 0$ $\\forall i$), we can solve for $t$ and equate the resulting expressions to get the Cartesian equations\n$$\\frac{x_1-p_1}{v_1}=\\cdots = \\frac{x_n-p_n}{v_n}$$\nExample. Given a line with vectorial equation $l: X=(1,1,2)+t(3,1,2) =(1+3t,1+t,2+2t)$ in $\\mathbb{R}^3$, its parametric equations are\n$$x(t) = 1+3t, \\quad y(t)=1+t, \\quad z(t)=2+2t,$$ and the Cartesian equations are $$\\frac{x-1}{3}=\\frac{y-1}{1}=\\frac{z-2}{2}$$\nPoint-slope equation of a line in the plane In the particular case of the real plane $\\mathbb{R}^2$, if we have a line with vectorial equation $l: X=P+t\\mathbf{v}=(x_0,y_0)+t(a,b) = (x_0+ta,y_0+tb)$, its parametric equations are\n$$x(t)=x_0+ta,\\quad y(t)=y_0+tb$$\nand its Cartesian equation is\n$$\\frac{x-x_0}{a} = \\frac{y-y_0}{b}.$$\nFrom this, moving $b$ to the other side of the equation, we get $$y-y_0 = \\frac{b}{a}(x-x_0),$$ or renaming $m=b/a$,\n$$y-y_0=m(x-x_0).$$\nThis equation is known as the point-slope equation of the line.\nSlope of a line in the plane Definition - Slope of a line in the plane. Given a line $l: X=P+t\\mathbf{v}$ in the real plane $\\mathbb{R}^2$, with direction vector $\\mathbf{v}=(a,b)$ and $a\\neq 0$, the slope of $l$ is $b/a$. Recall that given two points $P=(x_1,y_1)$ and $Q=(x_2,y_2)$ on the line $l$, we can take as a direction vector the vector from $P$ to $Q$, with coordinates $\\vec{PQ}=Q-P=(x_2-x_1,y_2-y_1)$. 
Thus, the slope of $l$ is $\\dfrac{y_2-y_1}{x_2-x_1}$, that is, the ratio between the changes in the vertical and horizontal axes.\nPlanes Vector equation of a plane in space To get the equation of a plane in the real space $\\mathbb{R}^3$ we can take a point of the plane $P=(x_0,y_0,z_0)$ and an orthogonal vector to the plane $\\mathbf{v}=(a,b,c)$. Then, any point $Q=(x,y,z)$ of the plane satisfies that the vector $\\vec{PQ} = (x-x_0,y-y_0,z-z_0)$ is orthogonal to $\\mathbf{v}$, and therefore their dot product is zero.\nDefinition - Vector equation of a plane in space. Given a point $P=(x_0,y_0,z_0)$ and a vector $\\mathbf{v}=(a,b,c)$ in the real space $\\mathbb{R}^3$, the vector equation of the plane that passes through $P$ orthogonal to $\\mathbf{v}=(a,b,c)$ is\n$$ \\begin{align*} \\vec{PQ}\\cdot\\mathbf{v} \u0026amp;= (x-x_0,y-y_0,z-z_0)\\cdot(a,b,c) \\newline \u0026amp;= a(x-x_0)+b(y-y_0)+c(z-z_0) = 0. \\end{align*} $$\nScalar equation of a plane in space From the vector equation of the plane we can get\n$$a(x-x_0)+b(y-y_0)+c(z-z_0) = 0 \\Leftrightarrow ax+by+cz=ax_0+by_0+cz_0,$$\nwhich, renaming $d=ax_0+by_0+cz_0$, can be written as\n$$ax+by+cz=d,$$\nand is known as the scalar equation of the plane.\nExample. Given the point $P=(2,1,1)$ and the vector $\\mathbf{v}=(2,1,2)$, the vector equation of the plane that passes through $P$ and is orthogonal to $\\mathbf{v}$ is\n$$(x-2,y-1,z-1)\\cdot(2,1,2)=2(x-2)+(y-1)+2(z-1)=0,$$\nand its scalar equation is\n$$2x+y+2z=7.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"22fd1e7f4350c0e3490590a757816c43","permalink":"/en/teaching/calculus/manual/analytic-geometry/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/analytic-geometry/","section":"teaching","summary":"Scalars and Vectors Scalars Some phenomena of Nature can be described by a number and a unit of measurement.\nDefinition - Scalar. 
A scalar is a number that expresses a magnitude without direction.","tags":null,"title":"Analytic Geometry","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Classify the following variables\nDaily hours of exercise. Nationality. Blood pressure. Severity of illness. Number of sport injuries in a year. Daily calorie intake. Size of clothing. Subjects passed in a course. Solution Quantitative continuous. Qualitative nominal. Quantitative continuous. Qualitative ordinal. Quantitative discrete. Quantitative continuous. Qualitative ordinal. Quantitative discrete. Exercise 2 The number of injuries suffered by the members of a soccer team in a league was\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Compute:\nConstruct the frequency distribution table of the sample. Draw the bar chart of the sample and the polygon. Draw the cumulative frequency bar chart and polygon. Solution Injuries $n_i$ $f_i$ $N_i$ $F_i$ 0 6 0.2500 6 0.2500 1 11 0.4583 17 0.7083 2 5 0.2083 22 0.9167 3 2 0.0833 24 1.0000 3. Exercise 3 A survey about the daily number of medicines consumed by people over 70 shows the following results:\n3 1 2 2 0 1 4 2 3 5 1 3 2 3 1 4 2 4 3 2 3 5 0 1 2 0 2 3 0 1 1 5 3 4 2 3 0 1 2 3 Construct the frequency distribution table of the sample. Draw the bar chart of the sample and the polygon. Draw the cumulative relative frequency bar chart and polygon. Solution Medicines $n_i$ $f_i$ $N_i$ $F_i$ 0 5 0.125 5 0.125 1 8 0.200 13 0.325 2 10 0.250 23 0.575 3 10 0.250 33 0.825 4 4 0.100 37 0.925 5 3 0.075 40 1.000 3. Exercise 4 In a survey about the dependency of older people, 23 persons over 75 years were asked about the help they need in daily life. The answers were\nB D A B C C B C D E A B C E A B C D B B A A B where the meanings of letters are:\nA No help. B Help climbing stairs. C Help climbing stairs and getting up from a chair or bed. D Help climbing stairs, getting up and dressing. 
E Help for almost everything.\nConstruct the frequency distribution table and a suitable chart.\nSolution Help $n_i$ $f_i$ $N_i$ $F_i$ A 5 0.2174 5 0.2174 B 8 0.3478 13 0.5652 C 5 0.2174 18 0.7826 D 3 0.1304 21 0.9130 E 2 0.0870 23 1.0000 Exercise 5 The number of people treated in the emergency service of a hospital every day of November was\n15 23 12 10 28 7 12 17 20 21 18 13 11 12 26 30 6 16 19 22 14 17 21 28 9 16 13 11 16 20 Construct the frequency distribution table of the sample. Draw a suitable chart for the frequency distribution. Draw a suitable chart for the cumulative frequency distribution. Solution People $n_i$ $f_i$ $N_i$ $F_i$ [5,10] 4 0.1333 4 0.1333 (10,15] 9 0.3000 13 0.4333 (15,20] 9 0.3000 22 0.7333 (20,25] 4 0.1333 26 0.8667 (25,30] 4 0.1333 30 1.0000 3. Exercise 6 The following frequency distribution table represents the distribution of time (in min) required by people attended in a medical dispensary.\n$$ \\begin{array}{|c|c|c|c|c|} \\hline \\mbox{Time} \u0026amp; n_{i} \u0026amp; f_{i} \u0026amp; N_{i} \u0026amp; F_{i}\\newline \\hline \\left[ 0,5\\right) \u0026amp; 2 \u0026amp; \u0026amp; \u0026amp; \\newline \\hline \\left[ 5,10\\right) \u0026amp; \u0026amp; \u0026amp; 8 \u0026amp; \\newline \\hline \\left[ 10,15\\right) \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.7 \\newline \\hline \\left[ 15,20\\right) \u0026amp; 6 \u0026amp; \u0026amp; \u0026amp;\\newline \\hline \\end{array} $$\nComplete the table. Draw the ogive. 
Solution $$ \\begin{array}{|c|c|c|c|c|} \\hline \\mbox{Time} \u0026amp; n_{i} \u0026amp; f_{i} \u0026amp; N_{i} \u0026amp; F_{i}\\newline \\hline \\left[ 0,5\\right) \u0026amp; 2 \u0026amp; 0.1 \u0026amp; 2 \u0026amp; 0.1 \\newline \\hline \\left[ 5,10\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 8 \u0026amp; 0.4 \\newline \\hline \\left[ 10,15\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 14 \u0026amp; 0.7 \\newline \\hline \\left[ 15,20\\right) \u0026amp; 6 \u0026amp; 0.3 \u0026amp; 20 \u0026amp; 1\\newline \\hline \\end{array} $$\nExercise 7 The following table represents the frequency distribution of the yearly uses of a health insurance in a sample of clients of a insurance company.\nuses clients 0 4 1 8 2 6 3 3 4 2 5 1 7 1 Draw the box plot. Study the symmetry of the distribution.\nSolution Exercise 8 The box plots below correspond to the age of a sample of people by marital status.\nWhich group has higher ages? Which group has lower central dispersion? Which groups have outliers? At which group is the age distribution more asymmetric? Solution Widowers. Divorced. Widowers and divorced. Divorced. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"053ef795366cc6d9468a03875df23d5a","permalink":"/en/teaching/statistics/problems/frequency_charts/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/frequency_charts/","section":"teaching","summary":"Exercise 1 Classify the following variables\nDaily hours of exercise. Nationality. Blood pressure. Severity of illness. Number of sport injuries in a year. Daily calorie intake. Size of clothing. Subjects passed in a course.","tags":["Frequencies","Charts"],"title":"Problems of Frequency Tables and Charts","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Excel is a spreadsheet application that is part of the Microsoft Office suite.\nWhat is a spreadsheet? 
A spreadsheet is a program that allows the user to enter data and make calculations with them in a grid layout.\nThere are a lot of programs for managing spreadsheets but the best-known are Excel, in the Microsoft Office suite, and Calc, in the LibreOffice suite. Although Calc is opensource, with all the advantages associated therewith, Excel is by far the most widespread and mature spreadsheet, thus this manual covers Excel 2010. However, some of the procedures and methods explained in this manual are also valid for Calc.\nExcel 2010 main window The figure below shows a screenshot of the Excel 2010 main window where the different parts of the window have been highlighted.\nExcel 2010 ribbon The top ribbon of Excel 2010 contains a lot of buttons that perform different actions. These buttons are arranged in panels, and the panels are arranged in tabs. The main ribbon tabs are:\nFile – Performs file management tasks (new file, open file, save file, print file, etc.). It also contains general configuration options and help.\nHome – Common tools (clipboard, fonts, alignment, numbers format, insert rows and columns, etc.)\nInsert – Insert objects in the sheet (tables, illustrations, charts, hyperlinks, text, equations, etc.)\nPage Layout – Configure the printing (page setup, scale, themes, etc. )\nFormulas – Functions arranged in categories and formula auditing.\nData – Working with databases (import data, connection with databases, sort and filter data, data validation, etc.)\nReview – Spelling, commenting, protecting and sharing sheets.\nView – How Excel appears on screen (custom windows, grids lines, zoom, windows, etc. 
Does not affect printing).\nContextual tabs These tabs only appear in some contexts, for example, when creating a chart or a picture.\nChart design Allows the user to select the type of chart.\nChart layout Allows the user to insert and configure some parts of charts (title, axes, legend, gridlines, etc.)\nChart format Allows the user to change the appearance of charts (height, width, font, colors, background, etc.)\nPicture Allows the user to modify images (borders, rotation, crop, color, filters, special effects, etc.)\nIn addition to these tabs, users can create their own tabs and customise them with buttons at their convenience.\nThere is also a quick access toolbar just above the ribbon that can be customised with the most common buttons.\nAccess dialogs When you click the bottom-right corner of any panel, the corresponding dialog is shown, where all the related options are available.\nExample. The figure below shows the font dialog with all the options related to fonts (font family, font style, font size, etc.)\nContextual menu Clicking the right button of the mouse (right-clicking) shows a contextual menu with some buttons or options to perform actions in that context. This menu has different options depending on the part of the window that is clicked.\nExample. The figure below shows the contextual menu shown when right-clicking any cell.\nWorkbooks, worksheets, rows, columns and cells An Excel file is a workbook with several worksheets that are two-dimensional tables divided into columns and rows. The intersection of a column with a row is a cell, which is where data are entered. Sheets have a maximum of 16,384 columns and 1,048,576 rows.\nEach worksheet has a name and they are arranged in tabs at the bottom. Columns and rows also have names; columns are named with letters at the top of the column and rows with numbers to the left of the row. 
This way each cell is identified by the name of the worksheet, the name of the column and the name of the row where it is located, and cell names follow the pattern: name-of-worksheet ! column-name row-name. However, to refer to any cell in the active worksheet, the worksheet name may be omitted.\nExample. The name of the selected cell in the figure below is Sheet1!C4.\nThe names of rows and columns cannot be changed, but worksheet names can be changed by double-clicking on the name and typing the new name.\nRanges of cells A range of cells is a rectangular block of adjacent cells that is identified by the top-left cell and the bottom-right cell separated by a colon, following the pattern top-left-cell-name:bottom-right-cell-name.\nExample. In the figure below the range B3:E5 is selected.\nSelecting cells, rows, columns, ranges and worksheets To select a cell just click it. To select a row click the header of the row or press the keys Shift+Spacebar. To select a column click the header of the column or press the keys Ctrl+Spacebar. To select a range click one corner cell and drag the cursor over the desired cells. To select the whole worksheet click the top-left corner of the worksheet or press the keys Ctrl+A.\nExample. The animation below shows how to select cell C3, then row 3, then column C, then range B3:D7 and finally the whole worksheet.\nData edition Insert data Data are entered into the cells by activating the cell (clicking it) and typing directly in the cell or in the input bar.\nExample. The animation below shows how to enter the text \u0026lsquo;Excel\u0026rsquo; in cell B2 and the number 2010 in cell C2, and then how to change the number of cell C2 to 2013.\nExcel has a smart autocomplete feature that proposes some options for completing the typed data.\nDelete data To delete the content of a cell or a range of cells simply select it and press the Delete key. 
It is also possible to delete the cell contents with the button Clear All.\nRemove cells, rows, columns and worksheets To remove a whole cell (not only the content), right-click the cell and select the option Delete.... In the dialog that appears select Shift cells left if you want the cells to the left of the removed cell to move to the left to fill the gap, or Shift cells up if you want the cells below the removed cell to move up to fill the gap.\nTo remove a whole row, right-click the header of the row and select the option Delete....\nTo remove a whole column, right-click the header of the column and select the option Delete....\nTo remove a worksheet, right-click the tab with the name of the worksheet and select the option Delete.... Warning: Removing worksheets cannot be undone!\nExample. This shows how to remove a cell, a row, a column and a worksheet.\nInsert cells, rows, columns and worksheets To insert a new cell in a position, right-click the current cell in that position and select the option Insert.... In the dialog that appears select Shift cells right if you want to move the cells to the right to make a gap for the new cell, or Shift cells down if you want to move the cells down to make a gap for the new cell.\nTo insert a new row, right-click the header of the row above which you want to insert the new row and select Insert.\nTo insert a new column, right-click the header of the column to the left of which you want to insert the new column and select Insert.\nTo insert a new worksheet, right-click the tab with the name of the worksheet to the left of which you want to insert the new worksheet and select Insert. In the dialog that appears select `Worksheet\u0026rsquo;.\nExample. 
The animation below shows how to insert a cell, a row, a column and a worksheet.\nCut, copy and paste As in many other Windows applications, you can use the clipboard to cut, copy and paste the contents of cells, rows, columns and ranges.\nTo cut or copy a cell, row, column or range, right-click it and select the option Cut or Copy respectively, or press the keys Ctrl+x or Ctrl+c respectively. Both options copy the content of the cell, row, column or range to the clipboard, but the difference between cut and copy is that cut deletes the content from the current cell, row, column or range, while copy does not.\nTo paste the content of the clipboard in a new cell, row, column or range, select the cell or the first cell of the row, column or range and click the button Paste or press the keys Ctrl+v.\nExample. The animation below shows how to copy and paste the content of a cell, a row, a column, a range and a worksheet.\nAutofill A useful feature of Excel is the autofill of cells following a series or pattern. In some cases, such as dates, it is enough to write the content of the first cell and then click the bottom-right corner of the cell and drag the cursor over the column or row to fill the cells with the subsequent dates.\nFor numbers or text, this action replicates the content of the first cell in the others. To autofill with a series of numbers it is necessary to enter the first two numbers of the series in two consecutive cells, then select both cells, click the bottom-right corner of the selection and drag the cursor over the column or row to fill the cells with the numbers following in the series.\nExample. The animation below shows how to replicate the content of cell A1 to range A2:A10, then how to autofill the range B1:B10 with the dates following the date in cell B1, and finally how to autofill the range C1:C10 with the series of even numbers.\nUndo and redo In the quick access toolbar there are buttons Undo and Redo. 
The Undo button undoes the last data editing action performed and the Redo button restores the last undone action. If you press the Undo button $n$ times, it undoes the last $n$ actions, and the same happens with the Redo button.\nExample. The animation below shows how to remove the content of cell B2, then change the content of cell C2 two times, then undo those actions and finally redo the same actions.\nColumn and row sizing Column width and row height can be easily changed. To change the width of a column click the line between the column you want to resize and the next column in the column header, and then drag the mouse pointer to increase or reduce the column width. If you double-click this line the column width will automatically resize to the width of the widest cell content in the column.\nIn a similar way, to change the height of a row click the line between the row you want to resize and the next row in the row header, and then drag the mouse pointer to increase or reduce the row height. If you double-click this line the row height will automatically resize to the height of the tallest cell content in the row.\nExample. The animation below shows how to resize the width of column C and the height of row 3 to fit the content of cell C3.\nFile management Workbook data are stored in files. Although Excel makes backup copies of your work regularly, it is good practice to save your work in files regularly.\nSave a file To save the content of a workbook in a file click the File tab and select the option Save. In the dialog that appears type the file name and select the drive and folder where you want to save the file. The default extension for Excel 2010 file names is xlsx.\nOpen a file To open an Excel file click the File tab and select the option Open. 
In the dialog that appears select the drive and folder where the file is saved and the file to open, and press the button Open.\nCreate a new workbook To create a new workbook click the File tab and select the option New. In the dialog that appears select Blank workbook. It is also possible to create new workbooks from predefined templates.\nClose a workbook To close an open workbook click the File tab and select the option Close. If the last changes in the workbook haven\u0026rsquo;t been saved, a warning will appear allowing you to save the file before closing it.\nExporting and importing data Excel can export and import data in many formats. One of the most common formats is csv (comma separated values). In this format data are saved in a plain text file, with one row per line and columns separated by commas or semicolons.\nExport to csv format To export a worksheet to a csv format file, click the option Save as of the ribbon\u0026rsquo;s File tab. In the dialog that appears select the option CSV (Comma delimited) (*.csv) from the drop-down list Save as type, give a name to the file, select the folder where to save it and click OK.\nExample. The animation below shows how to export a worksheet with a students database to a csv format file.\nImport from csv format To import a csv format file click the option Open of the ribbon\u0026rsquo;s File tab. In the dialog that appears click the button to the right of the File name box and select the option Text Files (*.prn;*.txt;*.csv), select the csv format file and click OK.\nIf you want more control over the import process, click the From Text button of the Get External Data panel in the ribbon\u0026rsquo;s Data tab. In the dialog that appears select the csv format file and click the Import button. 
This brings up another dialog where you can select whether fields are delimited by a special character or have a fixed number of characters, the delimiter character (Tab, Semicolon, Comma, Space or other), and the data format of every column (General, Text or Date). After that click the Finish button and in the dialog that appears select the cell where to put the imported data and click OK.\nExample. The animation below shows how to import the csv format file with the students database of the previous example.\nGetting help One of the most useful features of Microsoft Office programs is their help system. To get help about any issue in Excel click the option Help in the File tab of the ribbon, and then click Microsoft Office Help. This shows a browser where you can enter some keywords and Excel will search topics related to these words and present the search results in a list. Clicking the desired topic will show you help information about that topic.\nExample. The figure below shows the help search results for the word \u0026ldquo;cell\u0026rdquo;.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"0ef4310edc3a4ddaec8751dec5ce4428","permalink":"/en/teaching/excel/manual/introduction/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/introduction/","section":"teaching","summary":" ","tags":["Excel"],"title":"Introduction","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"In the last chapter we saw how to describe the distribution of a single variable in a sample. However, in most cases, studies need to describe several variables that are often related. For instance, a nutritional study should consider all the variables that could be related to weight, such as height, age, gender, smoking, diet, physical exercise, etc.\nTo understand a phenomenon that involves several variables, it is not enough to study every variable on its own. 
We have to study all the variables together to describe how they interact and the type of relation among them.\nUsually in a dependency study there is a dependent variable $Y$ that is assumed to be influenced by a set of variables $X_1,\\ldots,X_n$ known as independent variables. The simplest case is a simple dependency study, where there is only one independent variable; this is the case covered in this chapter.\nJoint distribution Joint frequencies To study the relation between two variables $X$ and $Y$, we have to study the joint distribution of the two-dimensional variable $(X,Y)$, whose values are pairs $(x_i,y_j)$ where the first element is a value of $X$ and the second a value of $Y$.\nDefinition - Joint sample frequencies. Given a sample of $n$ values and a two-dimensional variable $(X,Y)$, for every value $(x_i,y_j)$ of the variable we define:\nAbsolute frequency $n_{ij}$: The number of times that the pair $(x_i,y_j)$ appears in the sample. Relative frequency $f_{ij}$: The proportion of times that the pair $(x_i,y_j)$ appears in the sample. $$f_{ij}=\\frac{n_{ij}}{n}.$$\nFor two-dimensional variables cumulative frequencies make no sense. 
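The joint frequencies defined above can be sketched in a few lines of Python. This is only an illustration, not part of the original material: the function name joint_frequencies and the small sample are hypothetical, and the pairs are counted as they are, without grouping into classes.

```python
from collections import Counter


def joint_frequencies(pairs):
    # Absolute frequency n_ij: number of times each pair appears in the sample.
    n = len(pairs)
    absolute = Counter(pairs)
    # Relative frequency f_ij = n_ij / n.
    relative = {pair: count / n for pair, count in absolute.items()}
    return absolute, relative


# Hypothetical sample of four (x, y) pairs, made up for this sketch.
sample = [(1, 'a'), (1, 'a'), (2, 'b'), (1, 'b')]
absolute, relative = joint_frequencies(sample)
print(absolute[(1, 'a')], relative[(1, 'a')])
```

The relative frequencies always add up to 1, which is a quick sanity check on any joint frequency table.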
Joint frequency distribution The values of the two-dimensional variable with their frequencies is known as joint frequency distribution, and is represented in a joint frequency table.\n$$\\begin{array}{|c|ccccc|} \\hline X\\backslash Y \u0026amp; y_1 \u0026amp; \\cdots \u0026amp; y_j \u0026amp; \\cdots \u0026amp; y_q \\newline \\hline x_1 \u0026amp; n_{11} \u0026amp; \\cdots \u0026amp; n_{1j} \u0026amp; \\cdots \u0026amp; n_{1q} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \\newline x_i \u0026amp; n_{i1} \u0026amp; \\cdots \u0026amp; n_{ij} \u0026amp; \\cdots \u0026amp; n_{iq} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \\newline x_p \u0026amp; n_{p1} \u0026amp; \\cdots \u0026amp; n_{pj} \u0026amp; \\cdots \u0026amp; n_{pq} \\newline \\hline \\end{array}$$\nExample (grouped data). The height (in cm) and weight (in kg) of a sample of 30 students is:\n(179,85), (173,65), (181,71), (170,65), (158,51), (174,66), (172,62), (166,60), (194,90), (185,75), (162,55), (187,78), (198,109), (177,61), (178,70), (165,58), (154,50), (183,93), (166,51), (171,65), (175,70), (182,60), (167,59), (169,62), (172,70), (186,71), (172,54), (176,68),(168,67), (187,80). 
The joint frequency table is\n$$\\begin{array}{|c||c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \\ \\newline \\hline \\end{array}$$\nScatter plot The joint frequency distribution can be represented graphically with a scatter plot, where data are displayed as a collection of points on an $XY$ coordinate system.\nUsually the independent variable is represented on the $X$ axis and the dependent variable on the $Y$ axis. For every data pair $(x_i,y_j)$ in the sample a dot is drawn on the plane with those coordinates.\nThe result is a set of points that is usually known as a point cloud.\nExample. 
The scatter plot below represent the distribution of heights and weights of the previous sample.\nThe shape of the point cloud in a scatter plot gives information about the type of relation between the variables.\nMarginal frequency distributions The frequency distributions of each variable of the two-dimensional variable are known as marginal frequency distributions.\nWe can get the marginal frequency distributions from the joint frequency table by adding frequencies by rows and columns.\n$$\\begin{array}{|c|ccccc|c|} \\hline X\\backslash Y \u0026amp; y_1 \u0026amp; \\cdots \u0026amp; y_j \u0026amp; \\cdots \u0026amp; y_q \u0026amp; \\color{red}{n_x} \\newline \\hline x_1 \u0026amp; n_{11} \u0026amp; \\cdots \u0026amp; n_{1j} \u0026amp; \\cdots \u0026amp; n_{1q} \u0026amp; \\color{red}{n_{x_1}} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\downarrow + \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\color{red}{\\vdots} \\newline x_i \u0026amp; n_{i1} \u0026amp; \\stackrel{+}{\\rightarrow} \u0026amp; n_{ij} \u0026amp; \\stackrel{+}{\\rightarrow} \u0026amp; n_{iq} \u0026amp; \\color{red}{n_{x_i}} \\newline \\vdots \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\downarrow + \u0026amp; \\vdots \u0026amp; \\vdots \u0026amp; \\color{red}{\\vdots} \\newline x_p \u0026amp; n_{p1} \u0026amp; \\cdots \u0026amp; n_{pj} \u0026amp; \\cdots \u0026amp; n_{pq} \u0026amp; \\color{red}{n_{x_p}} \\newline \\hline \\color{red}{n_y} \u0026amp; \\color{red}{n_{y_1}} \u0026amp; \\color{red}{\\cdots} \u0026amp; \\color{red}{n_{y_j}} \u0026amp; \\color{red}{\\cdots} \u0026amp; \\color{red}{n_{y_q}} \u0026amp; n \\newline \\hline \\end{array}$$\nExample. 
The marginal frequency distributions for the previous sample of heights and weights are\n$$ \\begin{array}{|c||c|c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \u0026amp; \\color{red}{n_x}\\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{2}\\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{8}\\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \u0026amp; \\color{red}{11} \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \u0026amp; \\color{red}{7} \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \u0026amp; \\color{red}{2}\\ \\newline \\hline \\color{red}{n_y} \u0026amp; \\color{red}{7} \u0026amp; \\color{red}{11} \u0026amp; \\color{red}{7} \u0026amp; \\color{red}{2} \u0026amp; \\color{red}{2} \u0026amp; \\color{red}{1} \u0026amp; 30\\ \\newline \\hline \\end{array} $$\nand the corresponding statistics are\n$$ \\begin{array}{lllll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \u0026amp; \\quad \u0026amp; s_x = 10.1 \\mbox{ cm} \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \u0026amp; \u0026amp; s_y = 12.82 \\mbox{ Kg} \\end{array} $$\nCovariance To study the relation between two variables, we have to analyze the joint variation of them.\nDividing the point cloud of the scatter plot in 4 quadrants centered in the mean point $(\\bar x, \\bar y)$, the sign of deviations from the mean is:\nQuadrant $(x_i-\\bar x)$ $(y_j-\\bar y)$ $(x_i-\\bar x)(y_j-\\bar y)$ 1 $+$ $+$ $+$ 2 $-$ $+$ $-$ 3 $-$ $-$ $+$ 4 $+$ $-$ $-$ If there is an increasing linear 
relationship between the variables, most of the points will fall in quadrants 1 and 3, and the sum of the products of deviations from the mean will be positive.\n$$\\sum(x_i-\\bar x)(y_j-\\bar y) \u0026gt; 0$$\nIf there is a decreasing linear relationship between the variables, most of the points will fall in quadrants 2 and 4, and the sum of the products of deviations from the mean will be negative.\n$$\\sum(x_i-\\bar x)(y_j-\\bar y) \u0026lt; 0$$\nUsing the products of deviations from the means we get the following statistic.\nDefinition - Sample covariance. The sample covariance of a two-dimensional variable $(X,Y)$ is the average of the products of deviations from the respective means.$$s_{xy}=\\frac{\\sum (x_i-\\bar x)(y_j-\\bar y)n_{ij}}{n}$$ It can also be calculated using the formula\n$$s_{xy}=\\frac{\\sum x_iy_jn_{ij}}{n}-\\bar x\\bar y.$$\nThe covariance measures the linear relation between two variables:\nIf $s_{xy}\u0026gt;0$ there exists an increasing linear relation. If $s_{xy}\u0026lt;0$ there exists a decreasing linear relation. If $s_{xy}=0$ there is no linear relation. Example. 
Using the joint frequency table of the sample of heights and weights\n$$ \\begin{array}{|c||c|c|c|c|c|c|c|} \\hline X/Y \u0026amp; [50,60) \u0026amp; [60,70) \u0026amp; [70,80) \u0026amp; [80,90) \u0026amp; [90,100) \u0026amp; [100,110) \u0026amp; n_x\\ \\newline \\hline\\hline (150,160] \u0026amp; 2 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 2\\ \\newline \\hline (160,170] \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 8\\ \\newline \\hline (170,180] \u0026amp; 1 \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 0 \u0026amp; 11 \\ \\newline \\hline (180,190] \u0026amp; 0 \u0026amp; 1 \u0026amp; 4 \u0026amp; 1 \u0026amp; 1 \u0026amp; 0 \u0026amp; 7 \\ \\newline \\hline (190,200] \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 0 \u0026amp; 1 \u0026amp; 1 \u0026amp; 2\\ \\newline \\hline n_y \u0026amp; 7 \u0026amp; 11 \u0026amp; 7 \u0026amp; 2 \u0026amp; 2 \u0026amp; 1 \u0026amp; 30\\ \\newline \\hline \\end{array} $$\n$$\\bar x = 174.67 \\mbox{ cm} \\qquad \\bar y = 69.67 \\mbox{ Kg}$$\nwe get that the covariance is equal to\n$$ \\begin{aligned} s_{xy} \u0026amp;=\\frac{\\sum x_iy_jn_{ij}}{n}-\\bar x\\bar y = \\frac{155\\cdot 55\\cdot 2 + 165\\cdot 55\\cdot 4 + \\cdots + 195\\cdot 105\\cdot 1}{30}-174.67\\cdot 69.67 = \\newline \u0026amp; = \\frac{368200}{30}-12169.26 = 104.07 \\mbox{ cm$\\cdot$ Kg}. \\end{aligned} $$\nThis means that there is an increasing linear relation between the weight and the height.\nRegression In most cases the goal of a dependency study is not only to detect a relation between two variables, but also to express that relation with a mathematical function, $$y=f(x)$$ in order to predict the dependent variable for every value of the independent one. 
The part of Statistics in charge of constructing such a function is called regression, and the function is known as the regression function or regression model.\nSimple regression models There are many types of regression models. The most common models are shown in the table below.\nModel Equation Linear $y=a+bx$ Quadratic $y=a+bx+cx^2$ Cubic $y=a+bx+cx^2+dx^3$ Potential $y=a\\cdot x^b$ Exponential $y=e^{a+bx}$ Logarithmic $y=a+b\\log x$ Inverse $y=a+\\frac{b}{x}$ Sigmoidal $y=e^{a+\\frac{b}{x}}$ The model choice depends on the shape of the point cloud in the scatter plot.\nResiduals or predictive errors Once the type of regression model has been chosen, we have to determine which function of that family best explains the relation between the dependent and the independent variables, that is, the function that best predicts the dependent variable.\nThat function is the one that minimizes the distances from the observed values for $Y$ in the sample to the predicted values of the regression function. These distances are known as residuals or predictive errors.\nDefinition - Residuals or predictive errors. 
Given a regression model $y=f(x)$ for a two-dimensional variable $(X,Y)$, the residual or predictive error for every pair $(x_i,y_j)$ of the sample is the difference between the observed value of the dependent variable $y_j$ and the predicted value of the regression function for $x_i$,$$e_{ij} = y_j-f(x_i).$$ Least squares fitting A way to get the regression function is the least squares method, which determines the function that minimizes the sum of the squared residuals.\n$$\\sum e_{ij}^2.$$\nFor a linear model $f(x) = a + bx$, the sum depends on two parameters, the intercept $a$ and the slope $b$ of the straight line,\n$$\\theta(a,b) = \\sum e_{ij}^2 =\\sum (y_j - f(x_i))^2 =\\sum (y_j-a-bx_i)^2.$$\nThis reduces the problem to determining the values of $a$ and $b$ that minimize this sum.\nTo solve the minimization problem, we have to set to zero the partial derivatives with respect to $a$ and $b$.\n$$ \\begin{aligned} \\frac{\\partial \\theta(a,b)}{\\partial a} \u0026amp;= \\frac{\\partial \\sum (y_j-a-bx_i)^2 }{\\partial a} =0 \\newline \\frac{\\partial \\theta(a,b)}{\\partial b} \u0026amp;= \\frac{\\partial \\sum (y_j-a-bx_i)^2 }{\\partial b} =0 \\end{aligned} $$\nAnd solving the equation system, we get\n$$a= \\bar y - \\frac{s_{xy}}{s_x^2}\\bar x \\qquad b=\\frac{s_{xy}}{s_x^2}$$\nThese values minimize the residuals on $Y$ and give us the optimal linear model.\nRegression line Definition - Regression line. Given a sample of a two-dimensional variable $(X,Y)$, the regression line of $Y$ on $X$ is$$y = \\bar y +\\frac{s_{xy}}{s_x^2}(x-\\bar x).$$ The regression line of $Y$ on $X$ is the straight line that minimizes the predictive errors on $Y$; therefore it is the linear regression model that gives the best predictions of $Y$. Example. 
Using the previous sample of heights ($X$) and weights ($Y$) with the following statistics\n$$ \\begin{array}{lllll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \u0026amp; \\quad \u0026amp; s_x = 10.1 \\mbox{ cm} \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \u0026amp; \u0026amp; s_y = 12.82 \\mbox{ Kg} \\newline \u0026amp; \u0026amp; s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \u0026amp; \u0026amp; \\end{array} $$\nthe regression line of weight on height is\n$$y = \\bar y +\\frac{s_{xy}}{s_x^2}(x-\\bar x) = 69.67+\\frac{104.07}{102.06}(x-174.67) = -108.49 +1.02 x$$\nAnd the regression line of height on weight is\n$$x = \\bar x +\\frac{s_{xy}}{s_y^2}(y-\\bar y) = 174.67+\\frac{104.07}{164.42}(y-69.67) = 130.78 + 0.63 y$$\nObserve that the regression lines are different! Relative position of the regression lines Usually, the regression line of $Y$ on $X$ and the regression line of $X$ on $Y$ are not the same, but they always intersect at the mean point $(\\bar x,\\bar y)$.\nIf there is a perfect linear relation between the variables, then both regression lines are the same, as that line makes both $X$-residuals and $Y$-residuals zero.\nIf there is no linear relation between the variables, then both regression lines are constant and equal to the respective means,\n$$y = \\bar y,\\quad x = \\bar x.$$\nSo, they intersect perpendicularly.\nRegression coefficient The most important parameter of a regression line is the slope.\nDefinition - Regression coefficient $b_{yx}$. Given a sample of a two-dimensional variable $(X,Y)$, the regression coefficient of the regression line of $Y$ on $X$ is its slope,$$b_{yx} = \\frac{s_{xy}}{s_x^2}$$ The regression coefficient always has the same sign as the covariance. It measures how the dependent variable changes in relation to the independent one according to the regression line. 
In particular, it gives the number of units that the dependent variable increases or decreases for every unit that the independent variable increases. Example. In the sample of heights and weights, the regression line of weight on height was\n$$y=-108.49 +1.02 x.$$\nThus, the regression coefficient of weight on height is\n$$b_{yx}= 1.02 \\mbox{ Kg/cm}.$$\nThat means that, according to the regression line of weight on height, the weight will increase $1.02$ Kg for every cm that the height increases.\nRegression predictions Usually the regression models are used to predict the dependent variable for some values of the independent variable.\nExample. In the sample of heights and weights, to predict the weight of a person with a height of 180 cm, we have to use the regression line of weight on height,\n$$y = -108.49 + 1.02 \\cdot 180 = 75.11 \\mbox{ Kg}.$$\nBut to predict the height of a person with a weight of 79 Kg, we have to use the regression line of height on weight,\n$$x = 130.78 + 0.63\\cdot 79 = 180.55 \\mbox{ cm}.$$\nHowever, how reliable are these predictions?\nCorrelation Once we have a regression model, in order to see if it is a good predictive model we have to assess the goodness of fit of the model and the strength of the relation set by it. The part of Statistics in charge of this is correlation.\nCorrelation studies the residuals of a regression model: the smaller the residuals, the greater the goodness of fit, and the stronger the relation set by the model.\nResidual variance To measure the goodness of fit of a regression model it is common to use the residual variance.\nDefinition - Sample residual variance $s_{ry}^2$. 
Given a regression model $y=f(x)$ of a two-dimensional variable $(X,Y)$, its sample residual variance is the average of the squared residuals,\n$$s_{ry}^2 = \\frac{\\sum e_{ij}^2n_{ij}}{n} = \\frac{\\sum (y_j - f(x_i))^2n_{ij}}{n}.$$\nThe greater the residuals, the greater the residual variance and the smaller the goodness of fit.\nWhen the linear relation is perfect, the residuals are zero and the residual variance is zero. Conversely, when there is no relation, the residuals coincide with deviations from the mean, and the residual variance is equal to the variance of the dependent variable.\n$$0\\leq s_{ry}^2\\leq s_y^2$$\nExplained and non-explained variation Coefficient of determination From the residual variance it is possible to define another correlation statistic that is easier to interpret.\nDefinition - Sample coefficient of determination $r^2$. Given a regression model $y=f(x)$ of a two-dimensional variable $(X,Y)$, its coefficient of determination is$$r^2 = 1- \\frac{s_{ry}^2}{s_y^2}$$ As the residual variance ranges from 0 to $s_y^2$, we have\n$$0\\leq r^2\\leq 1$$\nThe greater $r^2$ is, the greater the goodness of fit of the regression model, and the more reliable its predictions will be. In particular,\nIf $r^2 =0$ then there is no relation as set by the regression model. If $r^2=1$ then the relation set by the model is perfect. 
When the regression model is linear, the coefficient of determination can be computed with this formula\n$$ r^2 = \\frac{s_{xy}^2}{s_x^2s_y^2}.$$\nProof When the fitted model is the regression line, the residual variance is\n$$ \\begin{aligned} s_{ry}^2 \u0026amp; = \\sum e_{ij}^2f_{ij} = \\sum (y_j - f(x_i))^2f_{ij} = \\sum \\left(y_j - \\bar y -\\frac{s_{xy}}{s_x^2}(x_i-\\bar x) \\right)^2f_{ij}= \\newline \u0026amp; = \\sum \\left((y_j - \\bar y)^2 +\\frac{s_{xy}^2}{s_x^4}(x_i-\\bar x)^2 - 2\\frac{s_{xy}}{s_x^2}(x_i-\\bar x)(y_j -\\bar y)\\right)f_{ij} = \\newline \u0026amp; = \\sum (y_j - \\bar y)^2f_{ij} +\\frac{s_{xy}^2}{s_x^4}\\sum (x_i-\\bar x)^2f_{ij}- 2\\frac{s_{xy}}{s_x^2}\\sum (x_i-\\bar x)(y_j -\\bar y)f_{ij}= \\newline \u0026amp; = s_y^2 + \\frac{s_{xy}^2}{s_x^4}s_x^2 - 2 \\frac{s_{xy}}{s_x^2}s_{xy} = s_y^2 - \\frac{s_{xy}^2}{s_x^2}. \\end{aligned} $$\nand the coefficient of determination is\n$$ \\begin{aligned} r^2 \u0026amp;= 1- \\frac{s_{ry}^2}{s_y^2} = 1- \\frac{s_y^2 - \\frac{s_{xy}^2}{s_x^2}}{s_y^2} = 1 - 1 + \\frac{s_{xy}^2}{s_x^2s_y^2} = \\frac{s_{xy}^2}{s_x^2s_y^2}. \\end{aligned} $$\nExample. In the sample of heights and weights, we had\n$$ \\begin{array}{lll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \\newline s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \\end{array} $$\nThus, the linear coefficient of determination is\n$$r^2 = \\frac{s_{xy}^2}{s_x^2s_y^2} = \\frac{(104.07 \\mbox{ cm$\\cdot$ Kg})^2}{102.06 \\mbox{ cm}^2 \\cdot 164.42 \\mbox{ Kg}^2} = 0.65.$$\nThis means that the linear model of weight on height explains 65% of the variation of weight, and the linear model of height on weight also explains 65% of the variation of height.\nCorrelation coefficient Definition - Sample correlation coefficient $r$. 
Given a sample of a two-dimensional variable $(X,Y)$, the sample correlation coefficient is the square root of the linear coefficient of determination, with the sign of the covariance,$$r = \\dfrac{s_{xy}}{s_xs_y}.$$ As $r^2$ ranges from 0 to 1, $r$ ranges from -1 to 1,\n$$-1\\leq r\\leq 1.$$\nThe correlation coefficient measures not only the strength of the linear association but also its direction (increasing or decreasing):\nIf $r=0$ then there is no linear relation. If $r=1$ then there is a perfect increasing linear relation. If $r=-1$ then there is a perfect decreasing linear relation. Example. In the sample of heights and weights, we had\n$$\\begin{array}{lll} \\bar x = 174.67 \\mbox{ cm} \u0026amp; \\quad \u0026amp; s^2_x = 102.06 \\mbox{ cm}^2 \\newline \\bar y = 69.67 \\mbox{ Kg} \u0026amp; \u0026amp; s^2_y = 164.42 \\mbox{ Kg}^2 \\newline s_{xy} = 104.07 \\mbox{ cm$\\cdot$ Kg} \\end{array} $$\nThus, the correlation coefficient is\n$$r = \\frac{s_{xy}}{s_xs_y} = \\frac{104.07 \\mbox{ cm$\\cdot$ Kg}}{10.1 \\mbox{ cm} \\cdot 12.82 \\mbox{ Kg}} = +0.8.$$\nThis means that there is a rather strong, increasing linear relation between height and weight.\nDifferent linear correlations The scatter plots below show linear regression models with different correlations.\nReliability of regression predictions The coefficient of determination explains the goodness of fit of a regression model, but there are other factors that influence the reliability of regression predictions:\nThe coefficient of determination: The greater $r^2$, the greater the goodness of fit and the more reliable the predictions are.\nThe variability of the population distribution: The greater the variation, the more difficult to predict and the less reliable the predictions are.\nThe sample size: The greater the sample size, the more information we have and the more reliable the predictions are.\nIn addition, we have to take into account that a regression model is only valid for the range of values 
observed in the sample. That means that, as we don’t have any information outside that range, we must not make predictions for values far from that range. Non-linear regression The fit of a non-linear regression can also be done by the least squares fitting method.\nHowever, in some cases the fitting of a non-linear model can be reduced to the fitting of a linear model by applying a simple transformation to the variables of the model.\nTransformations of non-linear regression models Logarithmic: A logarithmic model $y = a+b \\log x$ can be transformed into a linear model with the change $t=\\log x$:\n$$y=a+b\\log x = a+bt.$$\nExponential: An exponential model $y = e^{a+bx}$ can be transformed into a linear model with the change $z = \\log y$:\n$$z = \\log y = \\log(e^{a+bx}) = a+bx.$$\nPotential: A potential model $y = ax^b$ can be transformed into a linear model with the changes $t=\\log x$ and $z=\\log y$:\n$$z = \\log y = \\log(ax^b) = \\log a + b \\log x = a^\\prime+bt.$$\nInverse: An inverse model $y = a+b/x$ can be transformed into a linear model with the change $t=1/x$:\n$$y = a + b(1/x) = a+bt.$$\nSigmoidal: A sigmoidal model $y = e^{a+b/x}$ can be transformed into a linear model with the changes $t=1/x$ and $z=\\log y$:\n$$z = \\log y = \\log (e^{a+b/x}) = a+b(1/x) = a+bt.$$\nExponential relation Example. 
The number of bacteria in a culture evolves with time according to the table below.\n$$\\begin{array}{c|c} \\mbox{Hours} \u0026amp; \\mbox{Bacteria} \\newline \\hline 0 \u0026amp; 25 \\newline 1 \u0026amp; 28 \\newline 2 \u0026amp; 47 \\newline 3 \u0026amp; 65 \\newline 4 \u0026amp; 86 \\newline 5 \u0026amp; 121 \\newline 6 \u0026amp; 190 \\newline 7 \u0026amp; 290 \\newline 8 \u0026amp; 362 \\end{array} $$\nThe scatter plot of the sample is shown below.\nFitting a linear model we get\n$$\\mbox{Bacteria} = -30.18+41.27\\,\\mbox{Hours}, \\mbox{ with } r^2=0.85.$$\nIs it a good model?\nAlthough the linear model is not bad, according to the shape of the point cloud of the scatter plot, an exponential model looks more suitable.\nTo construct an exponential model $y = e^{a+bx}$ we can apply the transformation $z=\\log y$, that is, apply a logarithmic transformation to the dependent variable.\n$$\\begin{array}{c|c|c} \\mbox{Hours} \u0026amp; \\mbox{Bacteria} \u0026amp; \\mbox{$\\log$(Bacteria)} \\newline \\hline 0 \u0026amp; 25 \u0026amp; 3.22 \\newline 1 \u0026amp; 28 \u0026amp; 3.33 \\newline 2 \u0026amp; 47 \u0026amp; 3.85 \\newline 3 \u0026amp; 65 \u0026amp; 4.17 \\newline 4 \u0026amp; 86 \u0026amp; 4.45 \\newline 5 \u0026amp; 121 \u0026amp; 4.80 \\newline 6 \u0026amp; 190 \u0026amp; 5.25 \\newline 7 \u0026amp; 290 \u0026amp; 5.67 \\newline 8 \u0026amp; 362 \u0026amp; 5.89 \\end{array} $$\nNow it only remains to compute the regression line of the logarithm of bacteria on hours,\n$$\\mbox{$\\log$(Bacteria)} = 3.107 + 0.352\\,\\mbox{Hours},$$\nand, undoing the change of variable,\n$$\\mbox{Bacteria} = e^{3.107+0.352\\,\\mbox{Hours}}, \\mbox{ with } r^2=0.99.$$\nThus, the exponential model fits much better than the linear model.\nRegression risks Lack of fit does not mean independence It is important to note that every regression model has its own coefficient of determination.\nThus, a coefficient of determination near zero means that there is no relation as set by the model, but 
that does not mean that the variables are independent, because there could be a different type of relation. Outliers influence in regression Outliers in regression studies are points that clearly do not follow the tendency of the rest of the points, even if the values of the pair are not outliers for every variable separately.\nOutliers in regression studies can provoke drastic changes in the regression models.\nThe Simpson\u0026rsquo;s paradox Sometimes a trend can disappear or even reverse when we split the sample into groups according to a qualitative variable that is related to the dependent variable. This is known as Simpson\u0026rsquo;s paradox.\nExample. The scatterplot below shows an inverse relation between the study hours and the score in an exam.\nBut if we split the sample in two groups (good and bad students) we get different trends and now the relation is direct, which makes more sense.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1633627012,"objectID":"a459deb83268bdce67f7ac69652daa44","permalink":"/en/teaching/statistics/manual/regression/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/regression/","section":"teaching","summary":"In the last chapter we saw how to describe the distribution of a single variable in a sample. However, in most cases, studies require to describe several variables that are often related.","tags":["Statistics","Biostatistics","Regression"],"title":"Regression","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 The number of injuries suffered by the members of a soccer team in a league were\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Calculate the following statistics and interpret them.\nMean. Median. Mode. Quartiles. Percentile 32. Solution $\\bar x=1.125$ injuries. $Me=1$ injury. $Mo=1$ injury. $Q_1=1$ injury, $Q_2=1$ injury and $Q_3=2$ injuries. $P_{32}=1$ injury. 
Exercise 2 The chart below shows the cumulative distribution of the time (in min) required by 66 students to do an exam.\nAt what time have half of the students finished? And 90% of students? What percentage of students have finished after 100 minutes? What is the time that best represents the time required by students in the sample to finish the exam? Is this value representative or not? Solution $Me=94.62$ min. $P_{90}=132$ min. $57.08\\%$ of students. $\\bar x=85.9091$ min, $s=37.5268$ min and $cv=0.4368$. Exercise 3 In a study about children\u0026rsquo;s growth, two samples were drawn, one for newborn babies and the other for one year old infants. The heights in cm of children in each of the samples were\nNewborn children: 51 50 51 53 49 50 53 50 47 50 One year old children: 62 65 69 71 65 66 68 69 In which group is the mean more representative? Justify your answer.\nSolution Newborn children: $\\bar x=50.4$ cm, $s_x=1.6852$ cm and $cv_x=0.0334$.\nOne year old children: $\\bar y=66.875$ cm, $s_y=2.7128$ cm and $cv_y=0.0406$. Exercise 4 To determine the accuracy of a method for measuring hematocrit in blood, the measurement was repeated 8 times on the same blood sample. The results of hematocrit in plasma, in percentage, were\n42.2 42.1 41.9 41.8 42 42.1 41.9 42 What do you think about the accuracy of the method?\nSolution $\\bar x=42$ %, $s=0.1225$ % and $cv=0.0029$. Exercise 5 The histogram below shows the frequency distribution of the body mass index (BMI) of a group of people by gender.\nDraw the pie chart for the gender. In which group is the mean of the BMI more representative? Calculate the mean for the whole sample. Use the following sums Females: $\\sum x_i=1160$ kg/m$^2$ $\\sum x_i^2=29050$ kg$^2$/m$^4$ Males: $\\sum x_i=1002.5$ kg/m$^2$ $\\sum x_i^2=22781.25$ kg$^2$/m$^4$\nSolution Females: $\\bar x=24.1667$ kg/m$^2$, $s_x=4.6022$ kg/m$^2$ and $cv_x=0.1904$.\nMales: $\\bar y=22.2778$ kg/m$^2$, $s_y=3.1545$ kg/m$^2$ and $cv_y=0.1416$. $\\bar z=23.2527$ kg/m$^2$. 
Exercise 6 The following table represents the frequency distribution of ages at which a group of people suffered a heart attack.\nage persons [40,50) 6 [50,60) 12 [60,70) 23 [70,80) 19 [80,90) 5 Could we assume that the sample comes from a normal population?\nUse the following sums: $\\sum x_i=4275$ years, $\\sum(x_i-\\bar x)^2=7461.5385$ years$^2$, $\\sum (x_i-\\bar x)^3=-18248.5207$ years$^3$, $\\sum (x_i-\\bar x)^4=2099635.8671$ years$^4$.\nSolution $g_1=-0.2283$ and $g_2=-0.5487$. Exercise 7 To compare two rehabilitation treatments $A$ and $B$ for an injury, every treatment was applied to a different group of people. The number of days required to cure the injury in each group is shown in the following table:\nDays A B 20-40 5 8 40-60 20 15 60-80 18 20 80-100 7 7 In which treatment is more representative the mean? In which treatment the distribution of days is more skew? In which treatment the distribution is more peaked? Use the following sums: $A$: $\\sum x_i=3040$ days, $\\sum (x_i-\\bar x)^2=14568$ days$^2$, $\\sum (x_i-\\bar x)^3=17011.2$ days$^3$, $\\sum (x_i-\\bar x)^4=9989602.56$ days$^4$ $B$: $\\sum y_j=3020$ days, $\\sum (y_j-\\bar y)^2=16992$ days$^2$, $\\sum (y_j-\\bar y)^3=-42393.6$ days$^3$, $\\sum (y_j-\\bar y)^4=12551516.16$ days$^4$\nSolution $A$: $\\bar a=60.8$ days, $s_a=17.0693$ days and $cv_a=0.2807$.\n$B$: $\\bar b=60.4$ days, $s_b=18.4347$ days and $cv_b=0.3052$. $g_{1a}=0.0684$ and $g_{1b}=-0.1353$. $g_{2a}=-0.6465$ and $g_{2b}=-0.8264$, so the distribution of treatment $A$ is more peaked than the one of treatment $B$ as $g_{2a} \u0026gt; g_{2b}$. Exercise 8 The systolic blood pressure (in mmHg) of a sample of persons is\n135 128 137 110 154 142 121 127 114 103 Calculate the central tendency statistics. How is the relative dispersion with respect to the mean? How is the skewness of the sample distribution? How is the kurtosis of the sample distribution? 
If we know that the method used for measuring the blood pressure is biased, and, in order to get the right values, we have to apply the linear transformation $y=1.2x-5$, what are the statistics values of parts (a) to (d) for the new, corrected distribution? Use the following sums: $\\sum x_i=1271$ mmHg, $\\sum (x_i-\\bar x)^2=2188.9$ mmHg$^2$, $\\sum (x_i-\\bar x)^3=2764.32$ mmHg$^3$, $\\sum (x_i-\\bar x)^4=1040079.937$ mmHg$^4$.\nSolution $\\bar x=127.1$ mmHg, $Me=127.5$ mmHg, $Mo$ all the values. $s=14.7949$ mmHg and $cv=0.1164$. $g_1=0.0854$. $g_2=-0.8292$. $\\bar x=147.52$ mmHg, $Me=148$ mmHg, $Mo=157$ mmHg, $s=17.7539$ mmHg, $cv=0.1203$, $g_1=0.0854$ and $g_2=-0.8292$. Exercise 9 The table below contains the frequency of pregnancies, abortions and births of a sample of 999 women in a city.\nNum Pregnancies Abortions Births 0 61 751 67 1 64 183 80 2 328 51 400 3 301 10 300 4 122 2 90 5 81 2 62 6 29 0 0 7 11 0 0 8 2 0 0 How many birth outliers are in the sample? Which variable has lower spread with respect to the mean? Which value is relatively higher, 7 pregnancies or 4 abortions? Justify your answer. Use the following sums: Pregnancies: $\\sum x_i=2783$, $\\sum x_i^2=9773$. Abortions: $\\sum y_j=333$, $\\sum y_j^2=559$. Births: $\\sum z_k=2450$, $\\sum z_k^2=7370$.\nSolution $129$ outliers. Pregnancies: $\\bar x=2.7858$, $s_x=1.422$ and $cv_x=0.5105$.\nAbortions: $\\bar y=0.3333$, $s_y=0.6697$ and $cv_y=2.009$.\nBirths: $\\bar z=2.4525$, $s_z=1.1674$ and $cv_z=0.476$. Standard score of $7$ pregnancies is $2.9635$, and standard score of $4$ abortions is $5.4754$. 
","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616018106,"objectID":"cf36b557c37d44162ad677200a352a36","permalink":"/en/teaching/statistics/problems/statistics/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/statistics/","section":"teaching","summary":"Exercise 1 The number of injuries suffered by the members of a soccer team in a league were\n0 1 2 1 3 0 1 0 1 2 0 1 1 1 2 0 1 3 2 1 2 1 0 1 Calculate the following statistics and interpret them.","tags":["Descriptive Statistics"],"title":"Problems of Descriptive Statistics","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Content of cells can be formatted in many ways: changing the data type, the font family, the alignment, the color, the border, etc. Most formatting options are grouped in the Format Cells dialog. To show this dialog click the bottom right corner of the Font panel in the ribbon\u0026rsquo;s Home tab.\nData types Excel manages several data types. The most common are numbers, dates and times, and text. All available data types are in the Number tab of the Format Cells dialog.\nFormatting numbers By default cells with numeric content are of type Number, but there are other numeric types like Currency and Accounting. Number is used for general display of numbers, while Currency and Accounting are used for monetary values. In all cases you can specify the number of decimal places. For monetary values you can also specify the symbol for the currency (€ by default).\nExample. The table in the animation below shows the price of fruits during several months and the average price. 
The animation shows how to change the format of prices to currency type with 3 decimal places.\nFormatting dates and times By default cells with content following the pattern day/month/year are of type Date, but there are a lot of ways of formatting dates, like for example, year-month-day or day-month_name-year etc.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to change the format of dates following the pattern Month-Year, with the three first letters of months and the two last digits of years.\nBy default cells with content following the pattern hours:minutes:seconds are of type Time, but there are a several ways of formatting times.\nFormatting text By default cells with non numeric content are of type Text. It\u0026rsquo;s possible to apply this type even to numbers, like for example phone numbers.\nText entered in a cell spreads to adjacent cells to the right if these cells have no content. To confine text to a certain width in the cell, select the cell and click the button Wrap Text in the Alignment section in the ribbon\u0026rsquo;s Home tab.\nAlign cell contents By default numbers are aligned to the right and text to the left, but it\u0026rsquo;s possible to change the alignment of cell contents in the Alignment tab of the Format Cells dialog.\nHorizontal alignment To change the horizontal alignment select Left, Right, Center or Justify in the Horizontal drop down list of the Alignment tab. You can also align the cell contents with the buttons of the Alignment panel in the ribbon\u0026rsquo;s Home tab.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to align the average prices centered.\nVertical alignment To change the vertical alignment select Top, Bottom, Center or Justify in the Vertical drop down list of the Alignment tab. 
You can also align the cell contents with the buttons of the Alignment panel in the ribbon\u0026rsquo;s Home tab.\nFont properties To format the font of cell contents select the font family, font style, font size and font color from the Font tab of the Format Cells dialog. You can also apply some effects like underline, superscript and subscript.\nIt\u0026rsquo;s also possible to change the font family, style, size and color from the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to change the font family of all table to Arial, size 10 pt.\nThe animation below shows how to change the font style of average prices to bold and the color of fruits names to blue.\nBorders and background To format the borders of cells select the line style and color, and click the borders where to apply that line in the table of the Borders tab in the Format Cells dialog.\nIt\u0026rsquo;s also possible to change the border of cells with the Border button of the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to put lines to some cell borders.\nTo format the background of cells select the background color and pattern style in the Fill tab of the Format Cells dialog.\nIt\u0026rsquo;s also possible to change the background color of cells with the Background colour button of the Font panel in the ribbon\u0026rsquo;s Home tab, and also with the contextual toolbar that appears right-clicking the cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. 
The animation shows how to set the background colour of some cells.\nMerge cells To merge several cells into one, select the range of cells and click the button Merge \u0026amp; Center in the Alignment section in the ribbon\u0026rsquo;s Home tab. If there is more than one cell with content in the range, merging will keep the content of the upper-left cell only. By default content of merged cells is centered.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to merge the cells of the first row and center the title.\nCopy and paste format To apply the format of a cell to others, select the cell and click the Format painter button to copy the cell format. Then select the range of cells to paste that format.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply the same format of the fruit rows to a new row for pineapples.\nConditional formatting Excel allows applying a format to a cell depending on its value and according to some rules. To set a new rule click the Conditional Formatting button and select New Rule. There are different types of rules:\nFormat all cells based on their value Applies a format style based on the value of the cell. There are 4 types of styles:\n2-Color Scale Applies a colour in a continuous scale ranging from one colour for the minimum value or percentage to another colour for the maximum value or percentage.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices a colour background in a continuous scale from green (the minimum price) to red (the maximum price).\n3-Color Scale The same as 2-Color Scale but with a third intermediate colour in the scale.\nData bar Plots a horizontal bar in each cell with a length proportional to the value of the cell.\nExample. 
The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices a data bar format.\nIcon Sets Divide the distribution of selected cell values in several parts according to intervals or percentiles, assign an different icon to each part, and plot the corresponding icon in each cell.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices an icon set format. The icon set has three icons: red is applied to values under the 33 percentile, yellow is applied to values between 33 and 67 percentiles, and green is applied to values over 67 percentile.\nFormat only cells that contain Applies a format to the cell if satisfies a logical condition.\nExample. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to prices higher than 2 € a red colour.\nFormat only top or bottom ranked values Applies a format to a number or percentage of top or bottom values. Example. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply to the three top higher prices a red colour.\nFormat only values that are above or below average Applies a format to cells with values above or below the average of selected cells. Example. The table in the animation below shows the price of fruits during several months and the average price. The animation shows how to apply a red colour to prices above the average and a green colour to prices below the average.\nPredefined styles Excel has a lot of predefined styles for formatting cells and tables. To apply a predefined cell style click Cell Styles button and select the desired style. It\u0026rsquo;s possible to define new cell styles. 
For that select the cell with the format to define as a style, click Cell Styles button and select New Cell Style option. In the dialog that appears just give a name to the new style, press OK, and the new cell style will appear in the cell styles menu.\nTo apply a predefined table style click Format as Table button and select the desired style. It\u0026rsquo;s also possible to define new table styles. For that click Format as Table button and select New Table Style option. In the dialog that appears just give a name to the new style, define the table format (font, borders and fill), press OK, and the new table style will appear in the table styles menu.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"c84a6dcd76af8594181a43299ad083c8","permalink":"/en/teaching/excel/manual/formatting/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/formatting/","section":"teaching","summary":" ","tags":["Excel"],"title":"Formatting and Data Printing","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Descriptive Statistics provides methods to describe the variables measured in the sample and their relations, but it does not allow to draw any conclusion about the population.\nNow it is time to take the leap from the sample to the population and the bridge for that is Probability Theory.\nRemember that the sample has a limited information about the population, and in order to draw valid conclusions for the population the sample must be representative of it. For that reason, to guarantee the representativeness of the sample, this must be drawn randomly. 
This means that the choice of individuals in the sample is by chance.\nProbability Theory will provide us the tools to control the randomness in the sampling and to determine the level of reliability of the conclusions drawn from the sample.\nRandom experiments and events Random experiments The study of a characteristic of the population is conducted through random experiments.\nDefinition - Random experiment. A random experiment is an experiment that meets two conditions:\nThe set of possible outcomes is known. It is impossible to predict the outcome with absolute certainty. Example. Games of chance are typical examples of random experiments. The roll of a dice, for example, is a random experiment because\nThe set of possible outcomes is known: $\\{1,2,3,4,5,6\\}$. Before rolling the dice, it is impossible to predict the outcome with absolute certainty. Another non-gambling example is the random choice of an individual of a human population and the determination of its blood type.\nGenerally, the draw of a sample by a random method is a random experiment.\nSample space Definition - Sample space. The set $\\Omega$ of the possible outcomes of a random experiment is known as the sample space. Example. Some examples of sample spaces are:\nFor the toss of a coin $\\Omega=\\{\\mbox{heads},\\mbox{tails}\\}$. For the roll of a dice $\\Omega=\\{1,2,3,4,5,6\\}$. For the blood type of an individual drawn by chance $\\Omega=\\{\\mbox{A},\\mbox{B},\\mbox{AB},\\mbox{0}\\}$. For the height of an individual drawn by chance $\\Omega=\\mathbb{R}^+$. Tree diagrams In experiments where more than one variable is measured, the determination of the sample space can be difficult. In such cases, it is advisable to use a tree diagram to construct the sample space.\nIn a tree diagram every variable is represented in a level of the tree and every possible outcome of the variable as a branch.\nExample. 
The tree diagram below represents the sample space of a random experiment where the gender and the blood type are measured in a random individual.\nRandom events Definition - Random event. A random event is any subset of the sample space $\\Omega$ of a random experiment. There are different types of events:\nImpossible event: Is the event with no elements $\\emptyset$. It has no chance of occurring. Elemental events: Are events with only one element, that is, a singleton. Composed events: Are events with two or more elements. Sure event: Is the event that contains the whole sample space $\\Omega$. It always happens. Set theory Event space Definition - Event space. Given a sample space $\\Omega$ of a random experiment, the event space of $\\Omega$ is the set of all possible events of $\\Omega$, and is denoted by $\\mathcal{P}(\\Omega).$ Example. Given the sample space $\\Omega=\\{a,b,c\\}$, its event space is\n$$\\mathcal{P}(\\Omega)=\\{\\emptyset, \\{a\\},\\{b\\},\\{c\\},\\{a,b\\},\\{a,c\\},\\{b,c\\},\\{a,b,c\\}\\}$$\nAs events are subsets of the sample space, using the set theory we have the following operations on events:\nUnion Intersection Complement Difference Union of events Definition - Union event. Given two events $A,B\\subseteq \\Omega$, the union of $A$ and $B$, denoted by $A\\cup B$, is the event of all elements that are members of $A$ or $B$ or both.\n$$A\\cup B = \\{x\\,|\\, x\\in A\\textrm{ or }x\\in B\\}.$$\nThe union event $A\\cup B$ happens when $A$ or $B$ happen.\nIntersection of events Definition - Intersection event. Given two events $A,B\\subseteq \\Omega$, the intersection of $A$ and $B$, denoted by $A\\cap B$, is the event of all elements that are members of both $A$ and $B$.\n$$A\\cap B = \\{x\\,|\\, x\\in A\\textrm{ and }x\\in B\\}.$$\nThe intersection event $A\\cap B$ happens when $A$ and $B$ happen.\nTwo events are incompatible if their intersection is empty.\nComplement of an event Definition - Complementary event. 
Given an event $A\\subseteq \\Omega$, the complementary or contrary event of $A$, denoted by $\\bar A$, is the event of all elements of $\\Omega$ except the elements that are members of $A$.\n$$\\bar A = \\{x\\,|\\, x\\not\\in A\\}.$$\nThe complementary event $\\bar A$ happens when $A$ does not happen.\nDifference of events Definition - Difference event. Given two events $A,B\\subseteq \\Omega$, the difference of $A$ and $B$, denoted by $A-B$, is the event of all elements that are members of $A$ but are not members of $B$.\n$$A-B = \\{x\\,|\\, x\\in A\\textrm{ and }x\\not\\in B\\} = A \\cap \\bar B.$$\nThe difference event $A-B$ happens when $A$ happens but $B$ does not.\nExample. Given the sample space of rolling a dice $\\Omega=\\{1,2,3,4,5,6\\}$ and the events $A=\\{2,4,6\\}$ and $B=\\{1,2,3,4\\}$,\nThe union of $A$ and $B$ is $A\\cup B=\\{1,2,3,4,6\\}$. The intersection of $A$ and $B$ is $A\\cap B=\\{2,4\\}$. The complement of $A$ is $\\bar A=\\{1,3,5\\}$. The events $A$ and $\\bar A$ are incompatible. The difference of $A$ and $B$ is $A-B=\\{6\\}$, and the difference of $B$ and $A$ is $B-A=\\{1,3\\}$. Algebra of events Given the events $A,B,C\\subseteq \\Omega$, the following properties are met.\n$A\\cup A=A$, $\\quad A\\cap A=A$ (idempotency). $A\\cup B=B\\cup A$, $\\quad A\\cap B = B\\cap A$ (commutative). $(A\\cup B)\\cup C = A\\cup (B\\cup C)$, $\\quad (A\\cap B)\\cap C = A\\cap (B\\cap C)$ (associative). $(A\\cup B)\\cap C = (A\\cap C)\\cup (B\\cap C)$, $\\quad (A\\cap B)\\cup C = (A\\cup C)\\cap (B\\cup C)$ (distributive). $A\\cup \\emptyset=A$, $\\quad A\\cap \\Omega=A$ (neutral element). $A\\cup \\Omega=\\Omega$, $\\quad A\\cap \\emptyset=\\emptyset$ (absorbing element). $A\\cup \\overline A = \\Omega$, $\\quad A\\cap \\overline A= \\emptyset$ (complementary symmetric element). $\\overline{\\overline A} = A$ (double contrary). 
$\\overline{A\\cup B} = \\overline A\\cap \\overline B$, $\\quad \\overline{A\\cap B} = \\overline A\\cup \\overline B$ (De Morgan’s laws). $A\\cap B\\subseteq A\\cup B$. Probability definition Classical definition of probability Definition - Probability (Laplace). Given a sample space $\\Omega$ of a random experiment where all elements of $\\Omega$ are equally likely, the probability of an event $A\\subseteq \\Omega$ is the quotient between the number of elements of $A$ and the number of elements of $\\Omega$\n$$P(A) = \\frac{|A|}{|\\Omega|} = \\frac{\\mbox{number of favorable outcomes}}{\\mbox{number of possible outcomes}}$$\nThis definition is well known, but it has important restrictions:\nIt requires that all the elements of the sample space are equally likely (equiprobability). It cannot be used with infinite sample spaces. Example. Given the sample space of rolling a die $\\Omega=\\{1,2,3,4,5,6\\}$ and the event $A=\\{2,4,6\\}$, the probability of $A$ is\n$$P(A) = \\frac{|A|}{|\\Omega|} = \\frac{3}{6} = 0.5.$$\nHowever, given the sample space of the blood type of a random individual $\\Omega=\\{O,A,B,AB\\}$, it is not possible to use the classical definition to compute the probability of having group $A$,\n$$P(A) \\neq \\frac{|A|}{|\\Omega|} = \\frac{1}{4} = 0.25,$$\nbecause the blood types are not equally likely in human populations.\nFrequency definition of probability Theorem - Law of large numbers. When a random experiment is repeated a large number of times, the relative frequency of an event tends to the probability of the event. The following definition of probability uses this theorem.\nDefinition - Frequency probability. 
Given a sample space $\\Omega$ of a replicable random experiment, the probability of an event $A\\subseteq \\Omega$ is the relative frequency of the event $A$ in an infinite number of repetitions of the experiment\n$$P(A) = \\lim_{n\\rightarrow \\infty}\\frac{n_A}{n}$$\nAlthough frequency probability avoids the restrictions of the classical definition, it also has some drawbacks:\nIn practice it gives only an estimate of the real probability (the larger the sample size, the more accurate the estimate). The repetitions of the experiment must take place under identical conditions. Example. Given the sample space of tossing a coin $\\Omega=\\{H,T\\}$, if after tossing the coin 100 times we got 54 heads, then the probability of $H$ is estimated as\n$$P(H) = \\frac{n_H}{n} = \\frac{54}{100} = 0.54.$$\nGiven the sample space of the blood type of a random individual $\\Omega=\\{O,A,B,AB\\}$, if after drawing a random sample of 1000 persons we got 412 with blood type $A$, then the probability of $A$ is estimated as\n$$P(A) = \\frac{n_A}{n} = \\frac{412}{1000} = 0.412.$$\nAxiomatic definition of probability Definition - Probability (Kolmogórov). Given a sample space $\\Omega$ of a random experiment, a probability function is a function that maps every event $A\\subseteq \\Omega$ to a real number $P(A)$, known as the probability of $A$, that meets the following axioms:\nThe probability of any event is nonnegative,\n$$P(A)\\geq 0.$$\nThe probability of the sure event is 1,\n$$P(\\Omega)=1$$\nThe probability of the union of two incompatible events ($A\\cap B=\\emptyset$) is the sum of their probabilities\n$$P(A\\cup B) = P(A)+P(B).$$\nFrom the previous axioms it is possible to deduce some important properties of a probability function.\nGiven a sample space $\\Omega$ of a random experiment and the events $A,B\\subseteq \\Omega$, the following properties hold:\n$P(\\bar A) = 1-P(A)$.\n$P(\\emptyset)= 0$.\nIf $A\\subseteq B$ then $P(A)\\leq P(B)$.\n$P(A) \\leq 1$. 
This means that $P(A)\\in [0,1]$.\n$P(A-B)=P(A)-P(A\\cap B)$.\n$P(A\\cup B)= P(A) + P(B) - P(A\\cap B)$.\nIf $A=\\{e_1,\\ldots,e_n\\}$, where $e_i$ $i=1,\\ldots,n$ are elemental events, then\n$$P(A)=\\sum_{i=1}^n P(e_i).$$\nProof $A\\cup \\bar A = \\Omega \\Rightarrow P(A\\cup \\bar A) = P(\\Omega) \\Rightarrow P(A)+P(\\bar A) = 1 \\Rightarrow P(\\bar A)=1-P(A)$.\n$\\emptyset = \\bar \\Omega \\Rightarrow P(\\emptyset) = P(\\bar \\Omega) = 1-P(\\Omega) = 1-1 = 0.$\n$B = A\\cup (B-A)$. As $A$ and $B-A$ are incompatible, $P(B) = P(A\\cup (B-A)) = P(A)+P(B-A) \\geq P(A).$\nIf we think of probabilities as areas, it is easy to see graphically,\n$A\\subseteq \\Omega \\Rightarrow P(A)\\leq P(\\Omega)=1.$\n$A=(A-B)\\cup (A\\cap B)$. As $A-B$ and $A\\cap B$ are incompatible, $P(A)=P(A-B)+P(A\\cap B) \\Rightarrow P(A-B)=P(A)-P(A\\cap B)$.\nIf we think of probabilities as areas, it is easy to see graphically,\n$A\\cup B= (A-B) \\cup (B-A) \\cup (A\\cap B)$. As $A-B$, $B-A$ and $A\\cap B$ are incompatible, $P(A\\cup B)=P(A-B)+P(B-A)+P(A\\cap B) =P(A)-P(A\\cap B)+P(B)-P(A\\cap B)+P(A\\cap B)$ $=P(A)+P(B)-P(A\\cap B)$.\nIf we think again of probabilities as areas, it is easy to see graphically because the area of $A\\cap B$ is added twice (once for $A$ and once for $B$), so it must be subtracted once.\n$A=\\{e_1,\\cdots,e_n\\} = \\{e_1\\}\\cup \\cdots \\cup \\{e_n\\} \\Rightarrow P(A)=P(\\{e_1\\}\\cup \\cdots \\cup \\{e_n\\}) = P(\\{e_1\\})+ \\cdots + P(\\{e_n\\}).$\nProbability interpretation As set by the previous axioms, the probability of an event $A$ is a real number $P(A)$ that always ranges from 0 to 1.\nIn a certain way, this number expresses the plausibility of the event, that is, the chances that the event $A$ occurs in the experiment. Therefore, it also gives a measure of the uncertainty about the event.\nThe maximum uncertainty corresponds to probability $P(A)=0.5$ ($A$ and $\\bar A$ have the same chances of happening). 
The minimum uncertainty corresponds to probability $P(A)=1$ ($A$ will happen with absolute certainty) and $P(A)=0$ ($A$ won’t happen with absolute certainty). When $P(A)$ is closer to 0 than to 1, the chances of $A$ not happening are greater than the chances of $A$ happening. On the contrary, when $P(A)$ is closer to 1 than to 0, the chances of $A$ happening are greater than the chances of $A$ not happening.\nConditional probability Conditional experiments Occasionally, we can get some information about the experiment before its realization. Usually that information is given as an event $B$ of the same sample space that we know to be true before we conduct the experiment.\nIn such a case, we will say that $B$ is a conditioning event and the probability of another event $A$ is known as a conditional probability and expressed $P(A\\vert B)$. This must be read as probability of $A$ given $B$ or probability of $A$ under the condition $B$.\nUsually, conditioning events change the sample space and therefore the probabilities of events.\nExample. 
Assume that we have a sample of 100 women and 100 men with the following frequencies\n$$ \\begin{array}{|c|c|c|} \\hline \u0026amp; \\mbox{Non-smokers} \u0026amp; \\mbox{Smokers} \\newline \\hline \\mbox{Females} \u0026amp; 80 \u0026amp; 20 \\newline \\hline \\mbox{Males} \u0026amp; 60 \u0026amp; 40 \\newline \\hline \\end{array} $$\nThen, using the frequency definition of probability, the probability of being a smoker is\n$$P(\\mbox{Smoker})= \\frac{60}{200}=0.3.$$\nHowever, if we know that the person is a woman, then the sample is reduced to the first row, and the probability of being a smoker is\n$$P(\\mbox{Smoker}\\mid\\mbox{Female})=\\frac{20}{100}=0.2.$$\nConditional probability Definition - Conditional probability. Given a sample space $\\Omega$ of a random experiment, and two events $A,B\\subseteq \\Omega$, the probability of $A$ conditional on $B$ occurring is\n$$P(A|B) = \\frac{P(A\\cap B)}{P(B)},$$as long as $P(B)\\neq 0$.\nThis definition allows us to calculate conditional probabilities without changing the original sample space.\nExample. In the previous example\n$$P(\\mbox{Smoker}\\mid\\mbox{Female})= \\frac{P(\\mbox{Smoker}\\cap \\mbox{Female})}{P(\\mbox{Female})} = \\frac{20/200}{100/200}=\\frac{20}{100}=0.2.$$\nProbability of the intersection event From the definition of conditional probability it is possible to derive the formula for the probability of the intersection of two events.\n$$P(A\\cap B) = P(A)P(B|A) = P(B)P(A|B).$$\nExample. In a population 30% of people are smokers and we know that 40% of smokers have breast cancer. The probability of a random person being a smoker and having breast cancer is\n$$P(\\mbox{Smoker}\\cap \\mbox{Cancer})= P(\\mbox{Smoker})P(\\mbox{Cancer}\\mid\\mbox{Smoker}) = 0.3\\times 0.4 = 0.12.$$\nIndependence of events Sometimes, the conditioning event does not change the original probability of the main event.\nDefinition - Independent events. 
Given a sample space $\\Omega$ of a random experiment, two events $A,B\\subseteq \\Omega$ are independent if the probability of $A$ does not change when conditioning on $B$, and vice-versa, that is,\n$$P(A|B) = P(A) \\quad \\mbox{and} \\quad P(B|A)=P(B),$$\nif $P(A)\\neq 0$ and $P(B)\\neq 0$.\nThis means that the occurrence of one event does not give relevant information to change the uncertainty of the other.\nWhen two events are independent, the probability of their intersection is equal to the product of their probabilities,\n$$P(A\\cap B) = P(A)P(B).$$\nExample. The sample space of tossing a coin twice is $\\Omega=\\{(H,H),(H,T),(T,H),(T,T)\\}$ and all the elements are equiprobable if the coin is fair. Thus, applying the classical definition of probability we have\n$$P((H,H)) = \\frac{1}{4} = 0.25.$$\nIf we name $H_1=\\{(H,H),(H,T)\\}$, that is, having heads in the first toss, and $H_2=\\{(H,H),(T,H)\\}$, that is, having heads in the second toss, we can get the same result assuming that these events are independent,\n$$P((H,H))= P(H_1\\cap H_2) = P(H_1)P(H_2) = \\frac{2}{4}\\frac{2}{4}=\\frac{1}{4}=0.25.$$\nProbability Space Definition - Probability space. A probability space of a random experiment is a triplet $(\\Omega,\\mathcal{F},P)$ where\n$\\Omega$ is the sample space of the experiment. $\\mathcal{F}$ is a set of events of the experiment. $P$ is a probability function. If we know the probabilities of all the elements of $\\Omega$, then we can calculate the probability of every event in $\\mathcal{F}$ and we can easily construct the probability space.\nProbability space construction In order to determine the probability of every elemental event we can use a tree diagram, using the following rules:\nFor every node of the tree, label the incoming edge with the probability of the variable in that level having the value of the node, conditioned by events corresponding to its ancestor nodes in the tree. 
The probability of every elemental event in the leaves is the product of the probabilities on the edges that go from the root to the leaf. Probability tree with dependent variables In a probability tree with dependent variables, the probabilities of every level of the tree are different depending on the outcome of the previous levels.\nExample. In a population 30% of people are smokers and we know that 40% of smokers have breast cancer, while only 10% of non-smokers have breast cancer. The probability tree of the probability space of the random experiment consisting of picking a random person and measuring the variables smoking and breast cancer is shown below.\nProbability tree with independent variables In a probability tree with independent variables, the probabilities of every level of the tree are the same no matter the outcome of the previous levels.\nExample. The probability tree of the random experiment of tossing two coins is shown below.\nExample. In a population there are 40% of males and 60% of females. The probability tree of drawing a random sample of three persons is shown below.\nTotal probability theorem Partition of the sample space Definition - Partition of the sample space. A collection of events $A_1,A_2,\\ldots,A_n$ of the same sample space $\\Omega$ is a partition of the sample space if it satisfies the following conditions\nThe union of the events is the sample space, that is, $A_1\\cup \\cdots\\cup A_n =\\Omega$. All the events are mutually incompatible, that is, $A_i\\cap A_j = \\emptyset$ $\\forall i\\neq j$. Usually it is easy to get a partition of the sample space by splitting a population according to some categorical variable, for example gender, blood type, etc.\nTotal probability theorem If we have a partition of a sample space, we can use it to calculate the probabilities of other events in the same sample space.\nTheorem - Total probability. 
Given a partition $A_1,\\ldots,A_n$ of a sample space $\\Omega$, the probability of any other event $B$ of the same sample space can be calculated with the formula\n$$P(B) = \\sum_{i=1}^n P(A_i\\cap B) = \\sum_{i=1}^n P(A_i)P(B|A_i).$$\nProof The proof of the theorem is quite simple. As $A_1,\\ldots,A_n$ is a partition of $\\Omega$, we have\n$$B = B\\cap \\Omega = B\\cap (A_1\\cup \\cdots \\cup A_n) = (B\\cap A_1)\\cup \\cdots \\cup (B\\cap A_n).$$\nAnd all the events of this union are mutually incompatible as $A_1,\\ldots,A_n$ are, thus\n$$ \\begin{aligned} P(B) \u0026amp;= P((B\\cap A_1)\\cup \\cdots \\cup (B\\cap A_n)) = P(B\\cap A_1)+\\cdots + P(B\\cap A_n) =\\newline \u0026amp;= P(A_1)P(B|A_1)+\\cdots + P(A_n)P(B|A_n) = \\sum_{i=1}^n P(A_i)P(B|A_i). \\end{aligned} $$\nExample. A symptom $S$ can be caused by a disease $D$, but it can also be present in persons without the disease. In a population, the rate of people with the disease is $0.2$. We know also that $90\\%$ of persons with the disease have the symptom, while only $40\\%$ of persons without the disease have it.\nWhat is the probability that a random person of the population has the symptom?\nTo answer the question we can apply the total probability theorem using the partition $\\{D,\\bar D\\}$:\n$$P(S) = P(D)P(S|D)+P(\\bar D)P(S|\\bar D) = 0.2\\cdot 0.9 + 0.8\\cdot 0.4 = 0.5.$$\nThat is, half of the population has the symptom.\nIndeed, it is a weighted mean of probabilities!\nThe answer to the previous question is even clearer with the tree diagram of the probability space.\n$$ \\begin{aligned} P(S) \u0026amp;= P(D,S) + P(\\bar D,S) = P(D)P(S|D)+P(\\bar D)P(S|\\bar D)\\newline \u0026amp; = 0.2\\cdot 0.9+ 0.8\\cdot 0.4 = 0.18 + 0.32 = 0.5. 
\\end{aligned} $$\nBayes theorem A partition of a sample space $A_1,\\cdots,A_n$ may also be interpreted as a set of feasible hypotheses for a fact $B$.\nIn such cases it may be helpful to calculate the posterior probability $P(A_i\\vert B)$ of every hypothesis.\nTheorem - Bayes. Given a partition $A_1,\\ldots,A_n$ of a sample space $\\Omega$ and another event $B$ of the same sample space, the conditional probability of every event $A_i$ $i=1,\\ldots,n$ on $B$ can be calculated with the following formula\n$$P(A_i|B) = \\frac{P(A_i\\cap B)}{P(B)} = \\frac{P(A_i)P(B|A_i)}{\\sum_{j=1}^n P(A_j)P(B|A_j)}.$$\nExample. In the previous example, a more interesting question is about the diagnosis for a person with the symptom.\nIn this case we can interpret $D$ and $\\overline{D}$ as the two feasible hypotheses for the symptom $S$. The prior probabilities for them are $P(D)=0.2$ and $P(\\overline{D})=0.8$. That means that if we do not have information about the symptom, the diagnosis would be that the person does not have the disease.\nHowever, if after examining the person we observe the symptom, that information changes the uncertainty about the hypotheses, and we need to calculate the posterior probabilities to diagnose, that is, $P(D\\vert S)$ and $P(\\overline{D}\\vert S)$.\nTo calculate the posterior probabilities we can use the Bayes theorem.\n$$ \\begin{aligned} P(D|S) \u0026amp;= \\frac{P(D)P(S|D)}{P(D)P(S|D)+P(\\overline{D})P(S|\\overline{D})} = \\frac{0.2\\cdot 0.9}{0.2\\cdot 0.9 + 0.8\\cdot 0.4} = \\frac{0.18}{0.5}=0.36,\\newline P(\\overline{D}|S) \u0026amp;= \\frac{P(\\overline{D})P(S|\\overline{D})}{P(D)P(S|D)+P(\\overline{D})P(S|\\overline{D})} = \\frac{0.8\\cdot 0.4}{0.2\\cdot 0.9 + 0.8\\cdot 0.4} = \\frac{0.32}{0.5}=0.64. \\end{aligned} $$\nAs we can see the probability of having the disease has increased. 
Nevertheless, the probability of not having the disease is still greater than the probability of having it, and for that reason, the diagnosis is not having the disease.\nIn this case it is said that the symptom $S$ is not decisive in order to diagnose the disease.\nEpidemiology One of the branches of Medicine that makes intensive use of probability is Epidemiology, which studies the distribution and causes of diseases in populations, identifying risk factors for disease and targets for preventive healthcare.\nIn Epidemiology we are interested in how often a medical event $D$ occurs (typically a disease like flu, a risk factor like smoking or a protection factor like a vaccine), measured as a nominal variable with two categories (occurrence or not of the event).\nThere are different measures related to the frequency of a medical event. The most important are:\nPrevalence Incidence Relative risk Odds ratio Prevalence Definition - Prevalence. The prevalence of a medical event $D$ is the proportion of a particular population that is affected by the medical event.\n$$\\mbox{Prevalence}(D) = \\frac{\\mbox{Num people affected by $D$}}{\\mbox{Population size}}$$\nOften, the prevalence is estimated from a sample as the relative frequency of people affected by the event in the sample. It is also common to express that frequency as a percentage.\nExample. To estimate the prevalence of flu a sample of 1000 persons has been studied and 150 of them had flu. Thus, the prevalence of flu is approximately 150/1000=0.15, that is, 15%.\nIncidence Incidence measures the probability of occurrence of a medical event in a population within a given period of time. Incidence can be measured as a cumulative proportion or as a rate.\nDefinition - Cumulative incidence. 
The cumulative incidence of a medical event $D$ is the proportion of people that experience the event in a period of time, that is, the number of new cases with the event in the period of time divided by the size of the population at risk.\n$$R(D)=\\frac{\\mbox{Num of new cases with $D$}}{\\mbox{Population at risk size}}$$\nExample. A population initially contains 1000 persons without flu and after two years of observation 160 of them got the flu. The incidence proportion of flu is 160 cases per 1000 persons per two years, i.e. 16% per two years.\nIncidence rate or Absolute risk Definition - Incidence rate. The incidence rate or absolute risk of a medical event $D$ is the number of new cases with the event divided by the size of the population at risk and by the number of units of time in a given period.\n$$R(D)=\\frac{\\mbox{Num of new cases with $D$}}{\\mbox{Population at risk size}\\times \\mbox{Num of unit time intervals}}$$\nExample. A population initially contains $1000$ persons without flu and after two years of observation 160 of them got the flu. If we consider the year as the unit of time, the incidence rate of flu is 160 cases per $1000$ persons divided by two years, i.e. 80 cases per 1000 person-years, or 8% per year.\nPrevalence vs Incidence Prevalence must not be confused with incidence. 
Prevalence indicates how widespread the medical event is, and is more a measure of the burden of the event on society with no regard to time at risk or when subjects may have been exposed to a possible risk factor, whereas incidence conveys information about the risk of being affected by the event.\nPrevalence can be measured in cross-sectional studies at a particular time, while in order to measure incidence we need a longitudinal study observing the individuals during a period of time.\nIncidence is usually more useful than prevalence in understanding the event etiology: for example, if the incidence of a disease in a population increases, this suggests that there is a risk factor that promotes it.\nWhen the incidence is approximately constant for the duration of the event, prevalence is approximately the product of event incidence and average event duration, so\n$$\\mbox{prevalence} \\approx \\mbox{incidence} \\times \\mbox{duration}$$\nComparing risks In order to determine if a factor or characteristic is associated with the medical event we need to compare the risk of the medical event in two populations, one exposed to the factor and the other not exposed. The group of people exposed to the factor is known as the treatment group or experimental group and the group of people unexposed as the control group.\nUsually the cases observed for each group are represented in a 2$\\times$2 table like the one below.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Event } D \u0026amp; \\mbox{No event } \\overline D \\newline \\hline \\mbox{Treatment group (exposed)} \u0026amp; a \u0026amp; b \\newline \\hline \\mbox{Control group (unexposed)} \u0026amp; c \u0026amp; d \\newline \\hline \\end{array} $$\nAttributable risk or Risk difference $RD$ Definition - Attributable risk. The attributable risk or risk difference of a medical event $D$ for people exposed to a factor is the difference between the absolute risks of the treatment group and the control group.\n$$\\begin{aligned}RD(D) \u0026amp;= \\mbox{Risk in treatment group}-\\mbox{Risk in control group}=\\newline \u0026amp;= R_T(D)-R_C(D)=\\frac{a}{a+b}-\\frac{c}{c+d}. 
\\end{aligned} $$\nThe attributable risk is the risk of an event that is specifically due to the factor of interest.\nObserve that the attributable risk can be positive, when the risk of the treatment group is greater than the risk of the control group, and negative otherwise.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Flu } D \u0026amp; \\mbox{No flu } \\overline D \\newline \\hline \\mbox{Treatment group (vaccinated)} \u0026amp; 20 \u0026amp; 480 \\newline \\hline \\mbox{Control group (unvaccinated)} \u0026amp; 80 \u0026amp; 420 \\newline \\hline \\end{array} $$\nThe attributable risk of getting the flu for people vaccinated is\n$$RD(D) = \\frac{20}{20+480}-\\frac{80}{80+420} = -0.12.$$\nThis means that the risk of getting flu in vaccinated people is 12 percentage points lower than in unvaccinated people.\nRelative risk $RR$ Definition - Relative risk. The relative risk of a medical event $D$ for people exposed to a factor is the quotient between the proportions of people that acquired the event in a period of time in the treatment and control groups. That is, the quotient between the incidences of the treatment and the control groups.\n$$RR(D)=\\frac{\\mbox{Risk in treatment group}}{\\mbox{Risk in control group}}=\\frac{R_T(D)}{R_C(D)}=\\frac{a/(a+b)}{c/(c+d)}$$\nRelative risk compares the risk of a medical event between the treatment and the control groups.\n$RR=1$ $\\Rightarrow$ There is no association between the event and the exposure to the factor. $RR\u0026lt;1$ $\\Rightarrow$ Exposure to the factor decreases the risk of the event. $RR\u0026gt;1$ $\\Rightarrow$ Exposure to the factor increases the risk of the event. The further from 1, the stronger the association.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. 
Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Flu } D \u0026amp; \\mbox{No flu } \\overline D \\newline \\hline \\mbox{Treatment group (vaccinated)} \u0026amp; 20 \u0026amp; 480 \\newline \\hline \\mbox{Control group (unvaccinated)} \u0026amp; 80 \u0026amp; 420 \\newline \\hline \\end{array} $$\nThe relative risk of getting the flu for people vaccinated is\n$$RR(D) = \\frac{20/(20+480)}{80/(80+420)} = 0.25.$$\nThis means that vaccinated people were only one-fourth as likely to develop flu as unvaccinated people, i.e. the vaccine reduces the risk of flu by 75%.\nOdds An alternative way of measuring the risk of a medical event is the odds.\nDefinition - Odds. The odds of a medical event $D$ in a population is the quotient between the number of people that acquired the event and the number of people that did not in a period of time. Unlike incidence or absolute risk, which is a proportion no greater than 1, the odds can be greater than 1. However, it is possible to convert odds into a probability with the formula\n$$P(D) = \\frac{\\mbox{ODDS}(D)}{\\mbox{ODDS}(D)+1}$$\nExample. A population initially contains $1000$ persons without flu and after a year 160 of them got the flu. The odds of flu is 160/840.\nObserve that the incidence is 160/1000.\nOdds ratio $OR$ Definition - Odds ratio. The odds ratio of a medical event $D$ for people exposed to a factor is the quotient between the odds of people that acquired the event in a period of time in the treatment and control groups.\n$$OR(D)=\\frac{\\mbox{Odds in treatment group}}{\\mbox{Odds in control group}}=\\frac{a/b}{c/d}=\\frac{ad}{bc}$$\nOdds ratio compares the odds of a medical event between the treatment and the control groups. The interpretation is similar to the relative risk.\n$OR=1$ $\\Rightarrow$ There is no association between the event and the exposure to the factor. $OR\u0026lt;1$ $\\Rightarrow$ Exposure to the factor decreases the risk of the event. $OR\u0026gt;1$ $\\Rightarrow$ Exposure to the factor increases the risk of the event. 
The further from 1, the stronger the association.\nExample. To determine the effectiveness of a vaccine against the flu, a sample of 1000 persons without flu was selected at the beginning of the year. Half of them were vaccinated (treatment group) and the other half received a placebo (control group). The table below summarizes the results at the end of the year.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Flu } D \u0026amp; \\mbox{No flu } \\overline D \\newline \\hline \\mbox{Treatment group (vaccinated)} \u0026amp; 20 \u0026amp; 480 \\newline \\hline \\mbox{Control group (unvaccinated)} \u0026amp; 80 \u0026amp; 420 \\newline \\hline \\end{array} $$\nThe odds ratio of getting the flu for people vaccinated is\n$$OR(D) = \\frac{20/480}{80/420} = 0.21875.$$\nThis means that the odds of getting the flu versus not getting the flu in vaccinated individuals are almost one fifth of those in unvaccinated individuals, i.e. the odds of flu among the vaccinated are approximately 22% of the odds among the unvaccinated.\nRelative risk vs Odds ratio Relative risk and odds ratio are two measures of association but their interpretation is slightly different. While the relative risk expresses a comparison of risks between the treatment and control groups, the odds ratio expresses a comparison of odds, which are not the same as risks. Thus, an odds ratio of 2 does not mean that the treatment group has twice the risk of acquiring the medical event.\nThe interpretation of the odds ratio is trickier because it is counterfactual: it tells us how many times more frequent the event is in the treatment group than in the control group, assuming that in the control group the event is as frequent as the non-event.\nThe advantage of the odds ratio is that it does not depend on the prevalence or the incidence of the event, and it must necessarily be used when the number of people with the medical event is selected arbitrarily in both groups, as in case-control studies.\nExample. 
In order to determine the association between lung cancer and smoking two samples were selected (the second one with twice as many non-cancer individuals), getting the following results:\nSample 1\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Cancer} \u0026amp; \\mbox{No cancer} \\newline \\hline \\mbox{Smokers} \u0026amp; 60 \u0026amp; 80 \\newline \\hline \\mbox{Non-smokers} \u0026amp; 40 \u0026amp; 320 \\newline \\hline \\end{array} $$\n$$ \\begin{aligned} RR(D) \u0026amp;= \\frac{60/(60+80)}{40/(40+320)} = 3.86.\\newline OR(D) \u0026amp;= \\frac{60/80}{40/320} = 6. \\end{aligned} $$\nSample 2\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Cancer} \u0026amp; \\mbox{No cancer} \\newline \\hline \\mbox{Smokers} \u0026amp; 60 \u0026amp; 160 \\newline \\hline \\mbox{Non-smokers} \u0026amp; 40 \u0026amp; 640 \\newline \\hline \\end{array} $$\n$$ \\begin{aligned} RR(D) \u0026amp;= \\frac{60/(60+160)}{40/(40+640)} = 4.64.\\newline OR(D) \u0026amp;= \\frac{60/160}{40/640} = 6. \\end{aligned} $$\nThus, when we change the incidence or the prevalence of the event (lung cancer) the relative risk changes, while the odds ratio does not.\nThe relation between the relative risk and the odds ratio is given by the following formula\n$$RR = \\frac{OR}{1-R_0+R_0OR} = OR\\frac{1-R_1}{1-R_0},$$\nwhere $R_0$ and $R_1$ are the prevalence or the incidence in control and treatment groups respectively.\nThe odds ratio always overestimates the relative risk when it is greater than 1 and underestimates it when it is less than 1. 
However, with rare medical events (with very small prevalence or incidence) the relative risk and the odds ratio are almost the same.\nDiagnostic tests In Epidemiology it is common to use diagnostic tests to diagnose diseases.\nIn general, diagnostic tests are not fully reliable and have some risk of misdiagnosis, as represented in the table below.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Presence of disease }D \u0026amp; \\mbox{Absence of disease }\\bar D\\newline \\hline \\mbox{Test outcome positive } + \u0026amp; \\color{green}{ \\mbox{True Positive } TP} \u0026amp; \\color{red}{\\mbox{False Positive } FP}\\newline \\hline \\mbox{Test outcome negative } - \u0026amp; \\color{red}{\\mbox{False Negative } FN} \u0026amp; \\color{green}{\\mbox{True Negative } TN}\\newline \\hline \\end{array} $$\nSensitivity and specificity of a diagnostic test The performance of a diagnostic test depends on the following two probabilities.\nDefinition - Sensitivity. The sensitivity of a diagnostic test is the proportion of positive outcomes in persons with the disease\n$$P(+|D)=\\frac{TP}{TP+FN}$$\nDefinition - Specificity. The specificity of a diagnostic test is the proportion of negative outcomes in persons without the disease\n$$P(-|\\overline{D})=\\frac{TN}{TN+FP}$$\nSensitivity and specificity interpretation Usually, there is a trade-off between sensitivity and specificity.\nA test with high sensitivity will detect the disease in most sick persons, but it will also produce more false positives than a less sensitive test. This way, a positive outcome in a test with high sensitivity is not useful for confirming the disease, but a negative outcome is useful for ruling out the disease, since it rarely misdiagnoses those who have the disease.\nOn the other hand, a test with a high specificity will rule out the disease in most healthy persons, but it will also produce more false negatives than a less specific test. 
Thus, a negative outcome in a test with high specificity is not useful for ruling out the disease, but a positive outcome is useful to confirm the disease, since it rarely gives positive outcomes in healthy people.\nDeciding on a test with greater sensitivity or a test with greater specificity depends on the type of disease and the goal of the test. In general, we will use a sensitive test when:\nThe disease is serious and it is important to detect it. The disease is curable. The false positives do not provoke serious traumas. And we will use a specific test when:\nThe disease is important but difficult or impossible to cure. The false positives provoke serious traumas. The treatment of false positives can have dangerous consequences. Predictive values of a diagnostic test But the most important aspect of a diagnostic test is its predictive power, which is measured with the following two posterior probabilities.\nDefinition - Positive predictive value $PPV$. The positive predictive value of a diagnostic test is the proportion of persons with the disease among persons with a positive outcome\n$$P(D|+) = \\frac{TP}{TP+FP}$$\nDefinition - Negative predictive value $NPV$. The negative predictive value of a diagnostic test is the proportion of persons without the disease among persons with a negative outcome\n$$P(\\overline{D}|-) = \\frac{TN}{TN+FN}$$\nPositive and negative predictive values allow us to confirm or to rule out the disease, respectively, if they exceed a threshold of $0.5$.\n$$ \\begin{array}{rcl} PPV\u0026gt;0.5 \u0026amp; \\Rightarrow \u0026amp; \\mbox{Disease diagnostic}\\newline NPV\u0026gt;0.5 \u0026amp; \\Rightarrow \u0026amp; \\mbox{Not disease diagnostic} \\end{array} $$\nHowever, these probabilities depend on the proportion of persons with the disease in the population $P(D)$, which is known as the prevalence of the disease. 
They can be calculated from the sensitivity and the specificity of the diagnostic test using the Bayes theorem.\n$$ \\begin{aligned} PPV=P(D|+) \u0026amp;= \\frac{P(D)P(+|D)}{P(D)P(+|D)+P(\\overline{D})P(+|\\overline{D})}\\newline NPV=P(\\overline{D}|-) \u0026amp;= \\frac{P(\\overline{D})P(-|\\overline{D})}{P(D)P(-|D)+P(\\overline{D})P(-|\\overline{D})} \\end{aligned} $$\nThus, with frequent diseases, the positive predictive value increases, and with rare diseases, the negative predictive value increases.\nExample. A diagnostic test for the flu has been tried in a random sample of 1000 persons. The results are summarized in the table below.\n$$ \\begin{array}{|l|c|c|} \\hline \u0026amp; \\mbox{Presence of flu } D \u0026amp; \\mbox{Absence of flu } \\bar D\\newline \\hline \\mbox{Test outcome } + \u0026amp; 95 \u0026amp; 90 \\newline \\hline \\mbox{Test outcome }- \u0026amp; 5 \u0026amp; 810 \\newline \\hline \\end{array} $$\nAccording to this sample, the prevalence of the flu can be estimated as\n$$P(D) = \\frac{95+5}{1000} = 0.1.$$\nThe sensitivity of this diagnostic test is\n$$P(+|D) = \\frac{95}{95+5}= 0.95.$$\nAnd the specificity is\n$$P(-|\\overline{D}) = \\frac{810}{90+810}=0.9.$$\nThe positive predictive value of the diagnostic test is\n$$PPV = P(D|+) = \\frac{95}{95+90} = 0.5135.$$\nAs this value is over $0.5$, we will diagnose the flu if the outcome of the test is positive. 
However, the confidence in the diagnostic will be low, as this value is pretty close to $0.5$.\nOn the other hand, the predictive negative value is\n$$NPV = P(\\overline{D}|-) = \\frac{810}{5+810} = 0.9939.$$\nAs this value is almost 1, that means that is almost sure that a person does not have the flu if he or she gets a negative outcome in the test.\nThus, this test is a powerful test to rule out the flu, but not so powerful to confirm it.\nLikelihood ratios of a diagnostic test The following measures are usually derived from sensitivity and specificity.\nDefinition - Positive likelihood ratio $LR+$. The positive likelihood ratio of a diagnostic test is the ratio between the probability of positive outcomes in persons with the disease and healthy persons respectively,\n$$LR+=\\frac{P(+|D)}{P(+|\\overline{D})} = \\frac{\\mbox{Sensitivity}}{1-\\mbox{Specificity}}$$\nDefinition - Negative likelihood ratio $LR-$. The negative likelihood ratio of a diagnostic test is the ratio between the probability of negative outcomes in persons with the disease and healthy persons respectively, $$LR-=\\frac{P(-|D)}{P(-|\\overline{D})} = \\frac{1-\\mbox{Sensitivity}}{\\mbox{Specificity}}$$ Positive likelihood ratio can be interpreted as the number of times that a positive outcome is more probable in people with the disease than in people without it.\nOn the other hand, negative likelihood ratio can be interpreted as the number of times that a negative outcome is more probable in people with the disease than in people without it.\nPost-test probabilities can be calculated from pre-test probabilities through likelihood ratios.\n$$P(D|+) = \\frac{P(D)P(+|D)}{P(D)P(+|D)+P(\\overline{D})P(+|\\overline{D})} = \\frac{P(D)LR+}{1-P(D)+P(D)LR+}$$\nThus,\nA likelihood ratio greater than 1 increases the probability of disease. A likelihood ratio less than 1 decreases the probability of disease. A likelihood ratio 1 does not change the pre-test probability. 
","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1615158565,"objectID":"dc3b86f5c99c3bb3d06c28a98d3a21e5","permalink":"/en/teaching/statistics/manual/probability/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/probability/","section":"teaching","summary":"Descriptive Statistics provides methods to describe the variables measured in the sample and their relations, but it does not allow to draw any conclusion about the population.\nNow it is time to take the leap from the sample to the population and the bridge for that is Probability Theory.","tags":["Statistics","Biostatistics","Descriptive-Statistics"],"title":"Probability","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"Spreadsheets are used mainly for doing calculations and one of the most powerful features of spreadsheets are calculation formulas. In this section we will see how to use them.\nEnter formulas To enter a formula in a cell always start typing an equal sign = and then the formula expression.\nFormula expressions can contain arithmetic operators: addition +, subtraction -, multiplication *, division / and powers ^ and named predefined functions like SUM, EXP, SIN, etc. This allow to use Excel as a calculator. When Excel evaluates expressions first evaluate named functions, then powers, then products and quotients, and finally additions and subtractions, but it\u0026rsquo;s possible to use parenthesis to force the evaluation of a subexpression before.\nExample Assuming that cells A1, B1 and C1 contain the values 6,3 and 2 respectively, the next table shows some formulas and their respective results.\nFormula Result A1+B1-C1 7 A1+B1*C1 12 (A1+B1)*C1 18 A1/B1-C1 0 A1/(B1-C1) 6 A1+B1^C1 15 (A1+B1)^C1 81 Example. 
The animation below shows how to enter the formula 4+2 in cell A1, the formula 4-2 in cell B1, the formula 4*2 in cell C1, the formula 4/2 in cell D1, the formula 4^2 in cell E1 and the formula ((4+1)*2)^3 in cell F1.\nUsing relative and absolute cell references in formulas Formula expressions can contain references to cells. When Excel evaluates a formula, it replaces every cell reference with its content before doing the calculation.\nExample. The animation below shows how to use the formula =A1+B1 to add up the content of cells A1 and B1 in cell C1.\nReferences that are formed by the name of the cell or range are known as relative references, because the referenced cells change when you copy a cell with a formula and paste it in another cell. In general, when you copy a formula $n$ columns to the right and $m$ rows down, the referenced cells in the formula are replaced by the cells $n$ columns to the right and $m$ rows down, and the same happens if you copy the cell to the left or up.\nExample. The animation below shows how to copy the formula =A1+B1 in cell C1, with relative references to A1 and B1, to the cell E4, that is, 2 columns to the right and 3 rows down. Observe how the formula in cell E4 is updated to =C4+D4.\nA common way of copying the formula of a cell to adjacent cells is clicking the bottom-right corner of the cell and dragging the cursor to the desired range of cells.\nExample. The animation below shows how to generate the first ten numbers of the Fibonacci sequence. Cells A1 and B1 contain the first two numbers of the series and cell C1 contains the formula =A1+B1, which adds up the first two numbers and gives the third number of the series. To generate the rest of the series it is enough to copy the formula of cell C1 to the range D1:J1. 
Observe how references in formulas of these cells are updated.\nAlthough relative references are very helpful in many cases, sometimes we need the references in a formula to remain fixed when copied elsewhere.\nIn that case we need to use absolute references, which are like relative references but with a $ sign preceding the column name or the row name to fix either the row, the column or both in any cell reference.\nExample. The animation below shows how to calculate the IVA of a list of prices. Cells A2 to A5 contain the prices and cell F4 contains the IVA percentage. To calculate the IVA of the first price we use the formula A2*F$4/100, where we fix the row of cell F4 because we want it to remain fixed when copying the formula down. Observe how the reference to cell F4 doesn\u0026rsquo;t change when copying the formula down.\nExample. The animation below shows how to calculate the multiplication table using absolute references.\nIn general, if you want to fix a reference in a formula that you intend to copy horizontally, you must precede the column name with a $ sign; and if you intend to copy the formula vertically, you must precede the row name with a $ sign.\nNaming cells and ranges Cell references are somewhat abstract, and don\u0026rsquo;t really communicate anything about the data they contain. This makes formulas that involve multiple references difficult to understand. To overcome this difficulty Excel allows you to give names to cells or ranges. To define a cell or range name, select the cell or cell range and click the Define Name button of the Defined Names panel in the ribbon\u0026rsquo;s Formulas tab. In the dialog that appears give a name to the cell and click OK. Cell or range names must begin with a letter and can\u0026rsquo;t include spaces.\nYou can also set the name of a cell or range in the name box of the input bar.\nAfter that you can use that cell or range name in any formula. Observe that references with names are always absolute.\nExample. 
The animation below shows how to calculate the IVA of a list of prices using a cell name for the cell that contains the IVA percentage.\nFunctions Excel has a huge library of predefined functions that perform different calculations, organised by categories. There are three ways to enter a function in a formula expression:\nType it directly if you know its name and syntax. Select it from the buttons of the Functions Library panel in the ribbon\u0026rsquo;s Formulas tab. Click the Insert Function button from the input bar. This will show you a dialog where you can type some keywords to look for the desired function and select it. This dialog also shows help about the function and its syntax. Numeric functions Numeric functions work with numbers or cells that contain numbers. They are the most frequently used.\nSUM function The most common function is SUM, which calculates the sum of several numbers. Its syntax is SUM(number1,number2,...) where number1, number2, etc. are the numbers or cell ranges that you want to sum.\nExample The animation below shows how to calculate the sum of the subject grades for every student in a course.\nSUMIF function The SUMIF function is similar to the SUM function but only sums the numbers that satisfy a given criterion. Its syntax is SUMIF(range,criterion,sum-range) where range is the cell range to check the criterion, criterion is the condition expression of the criterion, and sum-range is the range with the values to sum (if this argument is not provided, the sum is calculated over the values of the range argument that meet the criterion).\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a text pattern with wildcards like the question mark ? 
(that matches any single character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the sum of the grades greater than or equal to 5 for every student in a course.\nCOUNT function The COUNT function counts the number of cells with numbers in a range. Its syntax is COUNT(value1,value2,...) where value1, value2, etc. are the values or cell ranges to count.\nExample The animation below shows how to calculate the number of subject grades for every student in a course.\nCOUNTIF function The COUNTIF function is similar to the COUNT function but only counts the cells that satisfy a given criterion. Its syntax is COUNTIF(range,criterion) where range is the cell range to check the criterion and criterion is the condition expression of the criterion.\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a text pattern with wildcards like the question mark ? (that matches any single character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the number of passed subjects (grade greater than or equal to 5).\nMIN function The MIN function calculates the minimum value of several numbers. Its syntax is MIN(number1,number2,...) where number1, number2, etc. are numbers or cell ranges for which you want the minimum.\nExample The animation below shows how to calculate the minimum grade for every student in a course.\nMAX function The MAX function calculates the maximum value of several numbers. Its syntax is MAX(number1,number2,...) where number1, number2, etc. 
are numbers or cell ranges for which you want the maximum.\nExample The animation below shows how to calculate the maximum grade for every student in a course.\nISNUMBER function The ISNUMBER function checks if a value is number or not and returns the logical value TRUE in the first case and FALSE in the second. Its syntax is ISNUMBER(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain numbers or not. Observe that in the example cells with numbers are aligned to the right and that dates are numbers.\nLogical functions Logical functions are very useful to take decisions.\nIF function The most important logical function is the IF function, that checks whether a condition is met and returns a value if is true or another value if is false. Its syntax is IF(condition,true_value,false_value), where condition is the logical condition to test, true_value is the returned value if the condition is true, and false_value is the returned value if the condition is false.\nIn the logical condition expression you use logical operators like equal =, not equal \u0026lt;\u0026gt;, greater \u0026gt;, less \u0026lt;, greater than or equal to \u0026gt;=, less than or equal to \u0026lt;=, etc. In the true or false value you can put numbers, text in double quotes, dates, cell references or other formulas.\nExample The animation below shows how to use the IF function to decide if students pass or don\u0026rsquo;t pass a course depending on whether the average grade is greater than or equal to 5.\nAND function The AND function will return TRUE if all its arguments are true and FALSE if at least one argument is false. 
Its syntax is AND(condition1,condition2,...), where condition1, condition2, etc. are logical conditions.\nThe following table, known as a truth table, shows the value returned by the AND function according to the corresponding values of its arguments.\nA B AND(A,B) TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE Example. The animation below shows how to use the AND function to see which students have passed all the subjects of a course with a grade greater than or equal to 5. Observe that conditions that involve blank cells are always false.\nOR function The OR function will return TRUE if one or more of its arguments are true and FALSE if all its arguments are false. Its syntax is OR(condition1,condition2,...), where condition1, condition2, etc. are logical conditions.\nThe following truth table shows the value returned by the OR function according to the corresponding values of its arguments.\nA B OR(A,B) TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE Example. The animation below shows how to use the OR function to see which students have failed at least one subject of a course, where passing requires a grade greater than or equal to 5.\nNOT function The NOT function will return TRUE if its argument is FALSE, and FALSE if its argument is TRUE. Its syntax is NOT(condition), where condition is a logical condition.\nThe following truth table shows the value returned by the NOT function according to the corresponding values of its argument.\nA NOT(A) TRUE FALSE FALSE TRUE Date and time functions Date and time functions perform operations with dates and times respectively.\nExcel automatically converts any entry with a date or time format into a serial number. For dates, this serial number represents the number of days that have elapsed since the beginning of the twentieth century (so that January 1, 1900, is serial number 1; January 2, 1900, is serial number 2; and so on). 
For times, this serial number is a fraction that represents the number of hours, minutes, and seconds that have elapsed since midnight (so that 00:00:00 is serial number 0.00000000; 12:00:00 p.m. (noon) is serial number 0.50000000; 11:00:00 p.m. is 0.95833333; and so on).\nTime elapsed between two dates or times. To calculate the time elapsed between two dates or times, just enter a formula that subtracts the earlier date or time from the later date or time. In the case of dates, Excel will return the number of days between these dates. If you want to express it in years, just divide the number of days by 365.25. In the case of times, Excel will return the number of hours between these times. If you want to express it in days, just change the cell format to General.\nExample. The animation below shows how to calculate the time elapsed between two dates and two times.\nTODAY function The function TODAY returns the system date (usually the current date). Its syntax is TODAY() and this function doesn\u0026rsquo;t have arguments.\nExample. The animation below shows how to calculate the current age of a person using the TODAY function.\nDATE function The function DATE returns a date serial number for the date specified by the year, month, and day arguments. Its syntax is DATE(year,month,day), where year is the year, month is the month (as a number) and day is the day.\nExample. The animation below shows how to calculate the date given the year, month and day.\nDAY, WEEKDAY, MONTH and YEAR functions The DAY function returns the day of the month of a date. Its syntax is DAY(date), where date is the serial number of the date.\nThe WEEKDAY function returns the day of the week of a date. 
Its syntax is WEEKDAY(date,type), where date is the serial number of the date and type has three possible values (1: 1 equals Sunday and 7 equals Saturday; 2: 1 equals Monday and 7 equals Sunday; 3: 0 equals Monday and 6 equals Sunday).\nThe MONTH function returns the number of the month of a date. Its syntax is MONTH(date), where date is the serial number of the date.\nThe YEAR function returns the year of a date. Its syntax is YEAR(date), where date is the serial number of the date.\nExample. The animation below shows how to calculate the day, week day, month and year of a date.\nNOW function The function NOW returns the system date and time (usually the current date and time). Its syntax is NOW() and this function doesn\u0026rsquo;t have arguments.\nExample. The animation below shows how to calculate the current age of a person using the TODAY function.\nTIME function The function TIME returns a time serial number for the time specified by the hours, minutes and seconds arguments. Its syntax is TIME(hours,minutes,seconds), where hours is the hour, minutes is the minute and seconds is the second.\nExample. The animation below shows how to calculate the time given the hours, minutes and seconds.\nHOUR, MINUTE and SECOND functions The HOUR function returns the hour of a time. Its syntax is HOUR(time), where time is the serial number of the time.\nThe MINUTE function returns the minute of a time. Its syntax is MINUTE(time), where time is the serial number of the time.\nThe SECOND function returns the second of a time. Its syntax is SECOND(time), where time is the serial number of the time.\nExample. The animation below shows how to calculate the hour, minute and second of a time.\nText functions Text functions perform different actions on text data.\nTEXT function The TEXT function converts a number into text using a format specified by the user. 
Its syntax is TEXT(number,format) where number is a number or a cell reference that you want to convert to text, and format is the format pattern for the text in double quotes. In that pattern you can use a 0 for numbers, . for decimal separator, d for days, m for months, y years, h for hours, m for minutes and s for seconds. Also you can use currency signs and the percentage sign %.\nExample The animation below shows how to convert different numbers, dates and times to text.\nVALUE function The VALUE function converts a text string into a number. Its syntax is VALUE(text) where text is a text or a cell reference with text that represents a number.\nExample The animation below shows how to convert different text strings representing numbers, times and percentages to numbers.\nT function The T function checks if a value is text and if so, returns the text; Otherwise, the function returns an empty text string. Its syntax is T(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain text or not. Observe that in the example cells with text are aligned to the left.\nISTEXT function The ISTEXT function checks if a value is text or not and returns the logical value TRUE in the first case and FALSE in the second. Its syntax is ISTEXT(value) where value is a value or a cell reference.\nExample The animation below shows how to check if the cells of a range contain text or not. Observe that in the example cells with text are aligned to the left.\nLEN function The LEN function counts the number of characters of a text string. Its syntax is LEN(text) where text is a text string or a cell reference with text.\nExample The animation below shows how to count the number of characters of several words. Observe that numbers are previously converted to text, and that blank cells have 0 characters.\nCONCATENATE function The CONCATENATE function joins together two or more text strings into a combined text string. 
Its syntax is CONCATENATE(text1,text2,...) where text1, text2, \u0026hellip; are text strings or cell ranges with text to join.\nExample The animation below shows how to concatenate the first name and the last name of some persons with a blank space between them.\nFIND and SEARCH functions The FIND function returns the position of a specified character or sub-string within a given text string. Its syntax is FIND(find_text,within_text,[start_num]) where find_text is the sub-string to find, within_text is text where to find the sub-string, and start_num is an optional argument that specifies the position in the within_text string, from which the search should begin (if omitted the search starts from the first character). The search is case-sensitive.\nThe SEARCH functions works the same that the FIND function except that is not case-sensitive.\nExample The animation below shows how to calculate the position of some text sub-strings in a text with the FIND and the SEARCH functions.\nSUBSTITUTE functions The SUBSTITUTE function replaces one or more instances of a specified text sub-string with another one supplied within a given text string. Its syntax is SUBSTITUTE(text, old_text, new_text, [instance_num]) where text is the text where to perform the substitution, old_text is the sub-string to replace, new_text is the new text string that it is used to replace the old_text string, and instance_num is an optional argument that specifies which occurrence of the old_text should be replaced by the new_text (if this argument is not specified all instances of old_text are replaced with the new_text). The search is case-sensitive.\nExample The animation below shows how to replace some sub-strings in some texts by other text strings.\nLOWER and UPPER functions The LOWER function converts all characters in a text string to lower case. 
Its syntax is LOWER(text) where text is the text to convert to lower case.\nThe UPPER functions works like the LOWER function but it converts text to upper case.\nExample The animation below shows how to convert to lower case some text strings.\nDatabase functions See the Database functions section.\nMathematical functions Some common mathematical functions included in the function library are exponentials, logarithmic and trigonometric.\nSQRT function The SQRT function calculates the root square of a number. Its syntax is SQRT(number) where number is a number or a cell reference for which you want the square root.\nExample The animation below shows how to calculate the square root of grades in a course.\nEXP function The EXP function calculates the exponential of a number. Its syntax is EXP(number) where number is a number or a cell reference for which you want the exponential.\nExample The animation below shows how to calculate the exponential of grades in a course.\nLN and LOG functions The LN function calculates the natural logarithm of a number (that is with base $e$). Its syntax is LN(number) where number is a number or a cell reference for which you want the natural logarithm.\nThe LOG function calculates the logarithm of a number in a given base. Its syntax is LOG(number,[base]) where number is a number or a cell reference for which you want the logarithm and base is the base of the logarithm (if this argument is omitted, then base 10 is taken).\nExample The animation below shows how to calculate the natural logarithm and the base 10 logarithm of grades in a course.\nPI function The PI function returns the constant value of $\\pi$. Its syntax is PI() without arguments.\nSIN, COS and TAN functions The SIN function calculates the sine of an angle in radians. Its syntax is SIN(angle) where angle is a number or a cell reference with the radians for which you want the sine.\nThe COS function calculates the cosine of an angle in radians. 
Its syntax is COS(angle) where angle is a number or a cell reference with the radians for which you want the cosine.\nThe TAN function calculates the tangent of an angle in radians. Its syntax is TAN(angle) where angle is a number or a cell reference with the radians for which you want the tangent.\nIf angles are in degrees, they have to be converted to radians before with the function RADIANS(degrees) where degrees is a number or a cell reference with the degrees that you want to convert to radians.\nExample The animation below shows how to calculate the sine, cosine and tangent of several angles. Observe that the sine of an angle o 180 degrees is not exactly 0 because the RADIANS function does not calculate the radians corresponding to a number of degrees with total accuracy.\nROUND function The ROUND function rounds a number to a specified number of digits. Its syntax is ROUND(number,digits) where number is a number or a cell reference that you want to round and digits is the number of digits to which you want to round the number.\nExample The animation below shows how to round the grades in a course.\nABS function The ABS function calculates the absolute value of a number. Its syntax is ABS(number) where number is a number or a cell reference for which you want the absolute value.\nStatistical functions Excel provides functions to calculate the main descriptive statistics, probability distributions and also to make inferences about the population. For an introductory text to Statistics visit the Statistic manual page.\nAVERAGE function The AVERAGE function calculates the arithmetic mean of several numbers. Its syntax is AVERAGE(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the average.\nExample The animation below shows how to calculate the average grade for every student in a course. 
Observe that the average grade is well calculated even when there are blank cells in the range.\nAVERAGEIF function The AVERAGEIF function calculates the arithmetic mean of numbers in a cell range that meet a given criterion. Its syntax is AVERAGEIF\t(range,criterion,[average-range]) where range is the cell range to check the criterion, criterion is the condition expression of the criterion, average-range is the range with the values to average (if this argument is not provided, the average is calculated over the values of the range argument that meet the criterion).\nThe expression with the condition can be a number, a cell reference, a logical expression starting with a logical operator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) in double quotes, or a pattern text with wildcards like the question mark ? (that matches any character) or the asterisk * (that matches any character string) in double quotes.\nExample The animation below shows how to calculate the average grade of students with a grade greater than or equal to 5 for every subject in a course.\nMEDIAN function The MEDIAN function calculates the median of several numbers. Its syntax is MEDIAN(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the median.\nExample The animation below shows how to calculate the median grade for every student in a course. Observe that the median grade is well calculated even when there are blank cells in the range.\nMODE function The MODE function calculates the mode of several numbers. Its syntax is MODE(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the mode.\nExample The animation below shows how to calculate the mode grade for every student in a course. Observe that the mode grade is not calculated when there are not repetitions of values.\nPERCENTILE.EXC function The PERCENTILE.EXC function calculates the k-th percentile of numbers in a cell range. 
Its syntax is PERCENTILE.EXC(range,k) where range is the cell range with the values for which you want the percentile, and k is the relative frequency (between 0 and 1) of the percentile.\nExample The animation below shows how to calculate the quartiles (percentiles 25, 50 and 75) of grades for every student in a course. Observe that if we use a cell reference for the k argument, putting a relative frequency in that cell (0.25 for first quartile, 0.5 for second quartile and 0.75 for third quartile) we get the correspondent percentile.\nVAR.P function The VAR.P function calculates the variance of several numbers. Its syntax is VAR.P(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the variance.\nExample The animation below shows how to calculate the variance of grades for every student in a course. Observe that the variance is well calculated even when there are blank cells in the range.\nSTDEV.P function The STDEV.P function calculates the standard deviation of several numbers. Its syntax is STDEV.P(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the standard deviation.\nExample The animation below shows how to calculate the standard deviation of grades for every student in a course. Observe that you can also calculate the standard deviation applying the square root to the variance.\nSKEW function The SKEW function calculates the skewness coefficient of several numbers. Its syntax is SKEW(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the skewness coefficient. 
Excel 2010 uses the following formula to calculate skewness:\n$$g_1=\\frac{n}{(n-1)(n-2)}\\sum \\left(\\frac{x_i-\\bar x}{s}\\right)^3,$$\nwhere $\\bar x$ is the mean and $s$ is the sample standard deviation.\nExample The animation below shows how to calculate the skewness coefficient of grades for every subject in a course.\nKURT function The KURT function calculates the kurtosis coefficient of several numbers. Its syntax is KURT(number1,number2,...) where number1,number2, etc. are the numbers or cell ranges for which you want the kurtosis coefficient. Excel 2010 uses the following formula to calculate kurtosis:\n$$g_2=\\frac{n(n+1)}{(n-1)(n-2)(n-3)}\\sum \\left(\\frac{x_i-\\bar x}{s}\\right)^4 - \\frac{3(n-1)^2}{(n-2)(n-3)},$$\nwhere $\\bar x$ is the mean and $s$ is the sample standard deviation.\nExample The animation below shows how to calculate the kurtosis coefficient of grades for every subject in a course.\nOther functions Other common functions are the following.\nISBLANK function The ISBLANK function checks if a value is null or a cell is blank. Its syntax is ISBLANK(value) where value is a value or a cell reference.\nExample The animation below shows how to check if some cells are blank or not. Observe that cell A3 is not blank because it contains a blank space.\nISERROR function The ISERROR function checks if a value or cell is an error. Its syntax is ISERROR(value) where value is a value or a cell reference.\nExample The animation below shows how to check if some cells have errors.\nAuditing formulas When Excel cannot perform an operation or when there is an error in a formula, it shows an error. Some common errors are\n#NAME? error. Occurs when Excel does not recognize text in a formula. Usually happens when you misspell the name of a function. #VALUE! error. Occurs when a formula has the wrong type of argument. Usually happens when you try to perform mathematical operations with cells that do not contain numbers. #DIV/0! error. 
Occurs when a formula tries to divide a number by 0 or an empty cell. #REF! error. Occurs when a formula refers to a cell that is not valid. Usually happens when a formula refers to a deleted cell. #NUM! error. Occurs when a formula or function contains invalid numeric values. For example when trying to calculate the square root of a negative number. #N/A error Occurs when a value is not available to a function or formula. In complex formulas it could be difficult to detect the error. Fortunately, Excel provide some tools for tracking down errors.\nTracing formulas The simplest procedure to trace formulas is double click a cell with a formula. This will show the cells referenced by the formula marked in different colours.\nAnother possibility is to trace precedents or dependents references. If you select a cell with a formula and click the Trace Precedents button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab, Excel will show arrows to the cells that affect the value of the selected cell. And if click the Trace Dependents button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab, Excel will show arrows to the cells that are affected by selected cell. To remove the arrow simply click the Remove Arrows button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab.\nExample The animation below shows how to trace a formula to calculate the price of product without discount, with discount but without taxes and with discount and taxes.\nError checking If some formula have an error, you can check where the error come from selecting the cell with the error and clicking the Error Checking button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab. This will display a dialog with the formula expression, an explanation of the error and several options. If the error is in the selected cell you can click the option Show Calculation Steps to evaluate the formula (see the section Formula evaluation). 
But if the error is in a cell that affects the selected cell you can click the option Trace Error. This will show red arrows to the cells where the error comes from.\nExample The animation below shows how to check an error in a formula to calculate the price of a product without discount, with discount but without taxes and with discount and taxes.\nFormula evaluation In general, you can evaluate any formula, even if it has no error, by selecting the cell with the formula and clicking the Evaluate Formula button of the Formula Auditing panel on the ribbon\u0026rsquo;s Formulas tab. This will display a dialog where you can evaluate the formula step by step.\nExample The animation below shows how to check an error in a formula to calculate the price of a product without discount, with discount but without taxes and with discount and taxes.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"90b31b8635dd3d2cb0c0a2711c78a68c","permalink":"/en/teaching/excel/manual/formulas/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/formulas/","section":"teaching","summary":" ","tags":["Excel"],"title":"Formulas","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Give some examples of:\nNon related variables. Variables that are increasingly related. Variables that are decreasingly related. Solution The daily average temperature and the daily number of births in a city. The hours spent preparing for an exam and the score. The weight of a person and the time required to run 100 meters. 
Exercise 2 In a study about the effect of different doses of a drug, 2 patients got 2 mg and took 5 days to cure, 4 patients got 2 mg and took 6 days to cure, 2 patients got 3 mg and took 3 days to cure, 4 patients got 3 mg and took 5 days to cure, 1 patient got 3 mg and took 6 days to cure, 5 patients got 4 mg and took 3 days to cure and 2 patients got 4 mg and took 5 days to cure.\nConstruct the joint frequency table. Get the marginal frequency distributions and compute the main statistics for each variable. Compute the covariance and interpret it. Solution $$ \\begin{array}{c|c|c|c} \\hline \\mbox{dose/days} \u0026amp; 3 \u0026amp; 5 \u0026amp; 6\\newline \\hline 2 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4\\newline \\hline 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 1\\newline \\hline 4 \u0026amp; 5 \u0026amp; 2 \u0026amp; 0\\newline \\hline \\end{array} $$\n$$ \\begin{array}{c|c|c|c|c} \\hline \\mbox{dose/days} \u0026amp; 3 \u0026amp; 5 \u0026amp; 6 \u0026amp; \\mbox{Sum}\\newline \\hline 2 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4 \u0026amp; 6\\newline \\hline 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 1 \u0026amp; 7\\newline \\hline 4 \u0026amp; 5 \u0026amp; 2 \u0026amp; 0 \u0026amp; 7\\newline \\hline \\mbox{Sum} \u0026amp; 7 \u0026amp; 8 \u0026amp; 5 \u0026amp; 20\\newline \\hline \\end{array} $$\nDose: $\\bar x=3.05$ mg, $s_x^2=0.6475$ mg$^2$, $s_x=0.8047$ mg. Days: $\\bar y=4.55$ days, $s_y^2=1.4475$ days$^2$, $s_y=1.2031$ days. 3. 
$s_{xy}=-0.6775$ mg$\\cdot$days.\nExercise 3 The table below shows the two-dimensional frequency distribution of a sample of 80 persons in a study about the relation between the blood cholesterol ($X$) in mg/dl and the high blood pressure ($Y$).\n$$ \\begin{array}{|c||c|c|c||c|} \\hline X\\setminus Y \u0026amp; [110,130) \u0026amp; [130,150) \u0026amp; [150,170) \u0026amp; n_x \\newline \\hline\\hline [170,190) \u0026amp; \u0026amp; 4 \u0026amp; \u0026amp; 12\\newline \\hline [190,210) \u0026amp; 10 \u0026amp; 12 \u0026amp; 4 \u0026amp; \\newline \\hline [210,230) \u0026amp; 7 \u0026amp; \u0026amp; 8 \u0026amp; \\newline \\hline [230,250) \u0026amp; 1 \u0026amp; \u0026amp; \u0026amp; 18\\newline \\hline\\hline n_y \u0026amp; \u0026amp; 30 \u0026amp; 24 \u0026amp; \\newline \\hline \\end{array} $$\nComplete the table. Construct the linear regression model of cholesterol on pressure. Use the linear model to calculate the expected cholesterol for a person with pressure 160 mmHg. According to the linear model, what is the expected pressure for a person with cholesterol 270 mg/dl? Use the following sums: $\\sum x_i=16960$ mg/dl, $\\sum y_j=11160$ mmHg, $\\sum x_i^2=3627200$ (mg/dl)$^2$, $\\sum y_j^2=1576800$ mmHg$^2$ and $\\sum x_iy_j=2378800$ mg/dl$\\cdot$mmHg.\nSolution $$ \\begin{array}{|c||c|c|c||c|} \\hline X\\setminus Y \u0026amp; [110,130) \u0026amp; [130,150) \u0026amp; [150,170) \u0026amp; n_x \\newline \\hline\\hline [170,190) \u0026amp; 8 \u0026amp; 4 \u0026amp; 0 \u0026amp; 12\\newline \\hline [190,210) \u0026amp; 10 \u0026amp; 12 \u0026amp; 4 \u0026amp; 26 \\newline \\hline [210,230) \u0026amp; 7 \u0026amp; 9 \u0026amp; 8 \u0026amp; 24 \\newline \\hline [230,250) \u0026amp; 1 \u0026amp; 5 \u0026amp; 12 \u0026amp; 18\\newline \\hline\\hline n_y \u0026amp; 26 \u0026amp; 30 \u0026amp; 24 \u0026amp; 80\\newline \\hline \\end{array} $$\n$\\bar x=212$ mg/dl, $s_x^2=396$ (mg/dl)$^2$. $\\bar y=139.5$ mmHg, $s_y^2=249.75$ mmHg$^2$. $s_{xy}=161$ mg/dl$\\cdot$mmHg. 
Regression line of cholesterol on blood pressure: $x=122.0721 + 0.6446y$. 3. $x(160)=225.2152$ mg/dl. 4.\nRegression line of blood pressure on cholesterol: $y=53.3081 + 0.4066x$. $y(270)=163.0808$ mmHg.\nExercise 4 A research study has been conducted to determine the loss of activity of a drug. The table below shows the results of the experiment.\n$$ \\begin{array}{lrrrrr} \\hline \\mbox{Time (in years)} \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \\newline \\mbox{Activity (%)} \u0026amp; 96 \u0026amp; 84 \u0026amp; 70 \u0026amp; 58 \u0026amp; 52 \\newline \\hline \\end{array} $$\nConstruct the linear regression model of activity on time. According to the linear model, when will the activity be 80%? When will the drug have lost all activity? Solution $\\bar x=3$ years, $s_x^2=2$ years$^2$. $\\bar y=72$ %, $s_y^2=264$ %$^2$. $s_{xy}=-22.8$ years$\\cdot$%. Regression line of activity on time: $y=106.2 - 11.4x$. Regression line of time on activity: $x=9.2182 - 0.0864y$. $x(80)=2.3091$ years and $x(0)=9.2182$ years.\nExercise 5 A basketball team is testing a new stretching program to reduce injuries during the league. The data below show the daily number of minutes doing stretching exercises and the number of injuries along the league.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Stretching minutes} \u0026amp; 0 \u0026amp; 30 \u0026amp; 10 \u0026amp; 15 \u0026amp; 5 \u0026amp; 25 \u0026amp; 35 \u0026amp; 40\\newline \\mbox{Injuries} \u0026amp; 4 \u0026amp; 1 \u0026amp; 2 \u0026amp; 2 \u0026amp; 3 \u0026amp; 1 \u0026amp; 0 \u0026amp; 1\\newline \\hline \\end{array} $$\nConstruct the regression line of the number of injuries on the time of stretching. How much is the reduction of injuries for every minute of stretching? How many minutes of stretching are required for having no injuries? Is this prediction reliable? 
Use the following sums ($X$=Number of minutes stretching, and $Y$=Number of injuries): $\\sum x_i =160$ min, $\\sum y_j=14$ injuries, $\\sum x_i^2=4700$ min$^2$, $\\sum y_j^2=36$ injuries$^2$ and $\\sum x_iy_j=160$ min$\\cdot$injuries.\nSolution $\\bar x=20$ min, $s_x^2=187.5$ min$^2$. $\\bar y=1.75$ injuries, $s_y^2=1.4375$ injuries$^2$. $s_{xy}=-15$ min$\\cdot$injuries. Regression line of injuries on time of stretching: $y=3.35 - 0.08x$. $0.08$ injuries/min. Regression line of time of stretching on injuries: $x=38.2609 - 10.4348y$. $x(0)=38.2609$ min. $r^2=0.8348$.\nExercise 6 For two variables $X$ and $Y$ we have\nThe regression line of $Y$ on $X$ is $y-x-2=0$. The regression line of $X$ on $Y$ is $y-4x+22=0$. Calculate:\nThe means $\\bar x$ and $\\bar y$. The correlation coefficient. Solution $\\bar x=8$ and $\\bar y=10$. $r=0.5$. Exercise 7 The means of two variables $X$ and $Y$ are $\\bar x=2$ and $\\bar y=1$, and the correlation coefficient is 0.\nPredict the value of $Y$ for $x=10$. Predict the value of $X$ for $y=5$. Plot both regression lines. Solution $y(10)=1$. $x(5)=2$. Exercise 8 A study to determine the relation between the age and the physical strength gave the scatter plot below. Calculate the linear coefficient of determination for the whole sample. Calculate the linear coefficient of determination for the sample of people younger than 25 years old. Calculate the linear coefficient of determination for the sample of people older than 25 years old. For which age group is the relation between age and strength stronger? 
Use the following sums ($X$=Age and $Y=$Weight lifted).\nWhole sample: $\\sum x_i=431$ years, $\\sum y_j=769$ Kg, $\\sum x_i^2=13173$ years$^2$, $\\sum y_j^2=39675$ Kg$^2$ and $\\sum x_iy_j=21792$ years$\\cdot$Kg.\nYoung people: $\\sum x_i=123$ years, $\\sum y_j=294$ Kg, $\\sum x_i^2=2339$ years$^2$, $\\sum y_j^2=14418$ Kg$^2$ and $\\sum x_iy_j=5766$ years$\\cdot$Kg.\nOld people: $\\sum x_i=308$ years, $\\sum y_j=475$ Kg, $\\sum x_i^2=10834$ years$^2$, $\\sum y_j^2=25257$ Kg$^2$ and $\\sum x_iy_j=16026$ years$\\cdot$Kg.\nSolution $\\bar x=26.9375$ years, $s_x^2=97.6836$ years$^2$. $\\bar y=48.0625$ kg, $s_y^2=169.6836$ kg$^2$. $s_{xy}=67.3164$ years$\\cdot$kg. $r^2=0.2734$. $\\bar x=17.5714$ years, $s_x^2=25.3878$ years$^2$. $\\bar y=42$ kg, $s_y^2=295.7143$ kg$^2$. $s_{xy}=85.7143$ years$\\cdot$kg. $r^2=0.9786$. $\\bar x=34.2222$ years, $s_x^2=32.6173$ years$^2$. $\\bar y=52.7778$ kg, $s_y^2=20.8395$ kg$^2$. $s_{xy}=-25.5062$ years$\\cdot$kg. $r^2=0.9571$. The linear relation between the age and the physical strength is a little bit stronger in the group of young people. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1601555270,"objectID":"8f518bade28c9dd4b2f3818225d824e9","permalink":"/en/teaching/statistics/problems/linear_regression/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/linear_regression/","section":"teaching","summary":"Exercise 1 Give some examples of:\nNon related variables. Variables that are increasingly related. Variables that are decreasingly related. 
Solution The daily average temperature and the daily number of births in a city.","tags":["Regression","Linear Regression"],"title":"Problems of Linear Regression","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Random variables The process of drawing a sample randomly is a random experiment and any variable measured in the sample is a random variable because the values taken by the variable in the individuals of the sample are a matter of chance.\nDefinition - Random variable. A random variable $X$ is a function that maps every element of the sample space of a random experiment to a real number.\n$$X:\\Omega \\rightarrow \\mathbb{R}$$\nThe set of values that the variable can assume is called the range and is represented by $\\mbox{Ran}(X)$.\nIn essence, a random variable is a variable whose values come from a random experiment, and every value has a probability of occurrence.\nExample. The variable $X$ that measures the outcome of rolling a die is a random variable and its range is $\\mbox{Ran}(X)=\\{1,2,3,4,5,6\\}$.\nTypes of random variables There are two types of random variables:\nDiscrete. They take isolated values, and their range is countable. Example. Number of children of a family, number of smoked cigarettes, number of subjects passed, etc.\nContinuous. They can take any value in a real interval, and their range is uncountable. Example. Weight, height, age, cholesterol level, etc.\nThe way of modelling each type of variable is different. In this chapter we are going to study how to model discrete variables.\nProbability distribution of a discrete random variable As values of a discrete random variable are linked to the elementary events of a random experiment, every value has a probability.\nDefinition - Probability function. 
The probability function of a discrete random variable $X$ is the function $f(x)$ that maps every value $x_i$ of the variable to its probability$$f(x_i) = P(X=x_i).$$ We can also accumulate probabilities the same way that we accumulated sample frequencies.\nDefinition - Distribution function. The distribution function of a discrete random variable $X$ is the function $F(x)$ that maps every value $x_i$ of the variable to the probability of having a value less than or equal to $x_i$$$F(x_i) = P(X\\leq x_i) = f(x_1)+\\cdots +f(x_i).$$ The range of a discrete random variable and its probability function is known as probability distribution of the variable, and it is usually presented in a table\n$$ \\begin{array}{|c|cccc|c|} \\hline X \u0026amp; x_1 \u0026amp; x_2 \u0026amp; \\cdots \u0026amp; x_n \u0026amp; \\sum\\newline \\hline f(x) \u0026amp; f(x_1) \u0026amp; f(x_2) \u0026amp; \\cdots \u0026amp; f(x_n) \u0026amp; 1\\newline \\hline F(x) \u0026amp; F(x_1) \u0026amp; F(x_2) \u0026amp; \\cdots \u0026amp; F(x_n) =1 \u0026amp; \\newline \\hline \\end{array} $$\nThe same way that the sample frequency table shows the distribution of values of a variable in the sample, the probability distribution of a discrete random variable shows the distribution of values in the whole population.\nExample. Let $X$ be the discrete random variable that measures the number of heads after tossing two coins. 
The probability tree of the random experiment is\nAccording to this, the probability distribution of $X$ is\n$$\\begin{array}{|c|ccc|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2\\newline \\hline f(x) \u0026amp; 0.25 \u0026amp; 0.5 \u0026amp; 0.25\\newline \\hline F(x) \u0026amp; 0.25 \u0026amp; 0.75 \u0026amp; 1 \\newline \\hline \\end{array} \\qquad F(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$}\\newline 0.25 \u0026amp; \\mbox{if $0\\leq x\u0026lt; 1$}\\newline 0.75 \u0026amp; \\mbox{if $1\\leq x\u0026lt; 2$}\\newline 1 \u0026amp; \\mbox{if $x\\geq 2$} \\end{cases} $$\nPopulation statistics The same way we use sample statistics to describe the sample frequency distribution of a variable, we use population statistics to describe the probability distribution of a random variable in the whole population.\nThe population statistics definition is analogous to the sample statistics definition, but using probabilities instead of relative frequencies.\nThe most important are 1:\nDefinition - Discrete random variable mean The mean or the expected value of a discrete random variable $X$ is the sum of the products of its values and its probabilities:\n$$\\mu = E(X) = \\sum_{i=1}^n x_i f(x_i)$$\nDefinition - Discrete random variable variance and standard deviation The variance of a discrete random variable $X$ is the sum of the products of its squared values and its probabilities, minus the squared mean:\n$$\\sigma^2 = Var(X) = \\sum_{i=1}^n x_i^2 f(x_i) -\\mu^2$$\nThe standard deviation of a random variable $X$ is the square root of the variance:\n$$\\sigma = +\\sqrt{\\sigma^2}$$\nExample. 
In the random experiment of tossing two coins the probability distribution is\n$$ \\begin{array}{|c|ccc|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2\\newline \\hline f(x) \u0026amp; 0.25 \u0026amp; 0.5 \u0026amp; 0.25\\newline \\hline F(x) \u0026amp; 0.25 \u0026amp; 0.75 \u0026amp; 1 \\newline \\hline \\end{array} $$\nThe main population statistics are\n$$ \\begin{aligned} \\mu \u0026amp;= \\sum_{i=1}^n x_i f(x_i) = 0\\cdot 0.25 + 1\\cdot 0.5 + 2\\cdot 0.25 = 1 \\mbox{ heads},\\newline \\sigma^2 \u0026amp;= \\sum_{i=1}^n x_i^2 f(x_i) -\\mu^2 = (0^2\\cdot 0.25 + 1^2\\cdot 0.5 + 2^2\\cdot 0.25) - 1^2 = 0.5 \\mbox{ heads}^2,\\newline \\sigma \u0026amp;= +\\sqrt{0.5} = 0.71 \\mbox{ heads}. \\end{aligned} $$\nDiscrete probability distribution models According to the type of experiment where the random variable is measured, there are different probability distribution models. The most common are\nDiscrete uniform Binomial Poisson Discrete uniform distribution $U(a,b)$ When all the values of a random variable $X$ have equal probability, the probability distribution of $X$ is uniform.\nDefinition - Discrete uniform distribution $U(a,b)$. A discrete random variable $X$ follows a discrete uniform distribution model with parameters $a$ and $b$, noted $X\\sim U(a,b)$, if its range is $\\mbox{Ran}(X) = \\{a, a+1, \\ldots,b\\}$ and its probability function is\n$$f(x)=\\frac{1}{b-a+1}.$$\nObserve that $a$ and $b$ are the minimum and the maximum of the range respectively.\nThe mean and the variance are\n$$\\mu = \\sum_{i=0}^{b-a}\\frac{a+i}{b-a+1}=\\frac{a+b}{2} \\qquad \\sigma^2 =\\sum_{i=0}^{b-a}\\frac{(a+i-\\mu)^2}{b-a+1}=\\frac{(b-a+1)^2-1}{12}$$\nExample. 
The variable that measures the outcome of rolling a die follows a discrete uniform distribution model $U(1,6)$.\nBinomial distribution $B(n,p)$ Usually the binomial distribution corresponds to a variable measured in a random experiment with the following features:\nThe experiment consists of a sequence of $n$ repetitions of the same trial. Each trial is repeated in identical conditions and produces two possible outcomes known as Success or Failure. The trials are independent. The probability of Success is the same in all the trials and is $P(\\mbox{Success})=p$. Under these conditions, the discrete random variable $X$ that measures the number of successes in the $n$ trials follows a binomial distribution model with parameters $n$ and $p$.\nDefinition - Binomial distribution $B(n,p)$. A discrete random variable $X$ follows a binomial distribution model with parameters $n$ and $p$, noted $X\\sim B(n,p)$, if its range is $\\mbox{Ran}(X) = \\{0,1,\\ldots,n\\}$ and its probability function is\n$$f(x) = \\binom{n}{x}p^x(1-p)^{n-x} = \\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}.$$\nObserve that $n$ is known as the number of repetitions of a trial and $p$ is known as the probability of Success in every repetition.\nThe mean and the variance are\n$$\\mu = n\\cdot p \\qquad \\sigma^2 = n\\cdot p\\cdot (1-p).$$\nExample. The variable that measures the number of heads after tossing 10 coins follows a binomial distribution model $B(10,0.5)$.\nAccording to this,\nThe probability of getting 4 heads is $$f(4) = \\binom{10}{4}0.5^4 (1-0.5)^{10-4} = \\frac{10!}{4!6!}0.5^40.5^6 = 210\\cdot 0.5^{10} = 0.2051.$$\nThe probability of getting 2 or fewer heads is $$\\begin{aligned} F(2) \u0026amp;= f(0) +f(1) + f(2) =\\newline \u0026amp;= \\binom{10}{0}0.5^0 (1-0.5)^{10-0} + \\binom{10}{1}0.5^1 (1-0.5)^{10-1} + \\binom{10}{2}0.5^2 (1-0.5)^{10-2} =\\newline \u0026amp;= 0.0547.\\end{aligned} $$\nAnd the expected number of heads is $$\\mu = 10\\cdot 0.5 = 5 \\mbox{ heads}.$$\nExample. 
In a population 40% of people are smokers. The variable $X$ that measures the number of smokers in a random sample with replacement of 3 persons follows a binomial distribution model $X\\sim B(3,0.4)$.\n$$ \\begin{align*} f(0)\u0026amp;=\\displaystyle\\binom{3}{0}0.4^0(1-0.4)^{3-0}= 0.6^3,\\newline f(1)\u0026amp;=\\displaystyle\\binom{3}{1}0.4^1(1-0.4)^{3-1}= 3\\cdot 0.4\\cdot 0.6^2,\\newline f(2)\u0026amp;=\\displaystyle\\binom{3}{2}0.4^2(1-0.4)^{3-2}= 3\\cdot 0.4^2\\cdot 0.6,\\newline f(3)\u0026amp;=\\displaystyle\\binom{3}{3}0.4^3(1-0.4)^{3-3}= 0.4^3. \\end{align*} $$\nPoisson distribution $P(\\lambda)$ Usually the Poisson distribution corresponds to a variable measured in a random experiment with the following features:\nThe experiment consists of observing the number of events occurring in a fixed interval of time or space. For instance, number of births in a month, number of emails in one hour, number of red blood cells in a volume of blood, etc. The events occur independently. The experiment produces the same average rate of events $\\lambda$ for every interval unit. Under these conditions, the discrete random variable $X$ that measures the number of events in an interval unit follows a Poisson distribution model with parameter $\\lambda$.\nDefinition - Poisson distribution $P(\\lambda)$. A discrete random variable $X$ follows a Poisson distribution model with parameter $\\lambda$, noted $X\\sim P(\\lambda)$, if its range is $\\mbox{Ran}(X) = \\{0,1,2,\\ldots\\}$ and its probability function is\n$$f(x) = e^{-\\lambda}\\frac{\\lambda^x}{x!}.$$\nObserve that $\\lambda$ is the average rate of events for an interval unit, and it will change if the interval changes.\nThe mean and the variance are\n$$\\mu = \\lambda \\qquad \\sigma^2 = \\lambda.$$\nExample. In a city there is an average of 4 births every day. 
The random variable $X$ that measures the number of births in a day in the city follows a Poisson distribution model $X\\sim P(4)$.\nAccording to this,\nThe probability that there are 5 births in a day is $$f(5) = e^{-4}\\frac{4^5}{5!} = 0.1563.$$\nThe probability that there are fewer than 2 births in a day is $$F(1) = f(0)+f(1) = e^{-4}\\frac{4^0}{0!} + e^{-4}\\frac{4^1}{1!} = 5e^{-4} = 0.0916.$$\nThe probability that there is more than 1 birth in a day is $$P(X\u0026gt;1) = 1-P(X\\leq 1) = 1-F(1) = 1-0.0916 = 0.9084.$$\nApproximation of Binomial by Poisson distribution The Poisson distribution can be obtained from the Binomial distribution when the number of trials tends to infinity and the probability of Success tends to zero.\nLaw of rare events. The Binomial distribution $X\\sim B(n,p)$ tends to the Poisson distribution $P(\\lambda)$, with $\\lambda=n\\cdot p$, when $n$ tends to infinity and $p$ tends to zero, that is,\n$$\\lim_{n\\rightarrow \\infty, p\\rightarrow 0}\\binom{n}{x}p^x(1-p)^{n-x} = e^{-\\lambda}\\frac{\\lambda^x}{x!}.$$\nIn practice, this approximation can be used for $n\\geq 30$ and $p\\leq 0.1$.\nExample. A vaccine produces an adverse reaction in 4% of cases. 
If a sample of 50 persons are vaccinated, what is the probability of having more than 2 persons with an adverse reaction?\nThe variable that measures the number of persons with an adverse reaction in the sample follows a Binomial distribution model $X\\sim B(50,0.04)$, but as $n=50\u0026gt;30$ and $p=0.04\u0026lt;0.1$, we can apply the law of rare events and use the Poisson distribution model $P(50\\cdot 0.04)=P(2)$ to do the calculations.\n$$ \\begin{aligned} P(X\u0026gt;2) \u0026amp;= 1-P(X\\leq 2) = 1-f(0)-f(1)-f(2) =\\newline \u0026amp;= 1-e^{-2}\\frac{2^0}{0!}-e^{-2}\\frac{2^1}{1!}-e^{-2}\\frac{2^2}{2!} =\\newline \u0026amp;= 1-5e^{-2} = 0.3233.\\end{aligned} $$\nTo distinguish population statistics from sample statistics we use Greek letters.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ba9fad09ad6c5312ddf502710f334f63","permalink":"/en/teaching/statistics/manual/discrete-random-variables/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/discrete-random-variables/","section":"teaching","summary":"Random variables The process of drawing a sample randomly is a random experiment and any variable measured in the sample is a random variable because the values taken by the variable in the individuals of the sample are a matter of chance.","tags":["Statistics","Biostatistics","Random Variables"],"title":"Discrete Random Variables","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Calculus formulas Main Calculus formulas Derivatives Integrals ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"85066dc0c1cbf4c700bd9b4a270786a4","permalink":"/en/teaching/calculus/cheatsheets/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/cheatsheets/","section":"teaching","summary":"Everything you have to know at a glance","tags":["Cheat 
sheet"],"title":"Calculus Cheat Sheets","type":"book"},{"authors":null,"categories":["Calculus"],"content":"Exercise 1 Compute the derivative function of $f(x)=x^3-2x^2+1$ at the points $x=-1$, $x=0$ and $x=1$. Explain your result. Find an equation of the tangent line to the graph of $f$ at each of the three given points.\nSolution $f\u0026rsquo;(-1)=7$, $f\u0026rsquo;(0)=0$ and $f\u0026rsquo;(1)=-1$.\nTangent line at $x=-1$: $y=-2+7(x+1)$.\nTangent line at $x=0$: $y=1$.\nTangent line at $x=1$: $y=-(x-1)$. Exercise 2 The pH measures the concentration of hydrogen ions H$^+$ in an aqueous solution. It is defined by $$ \\mbox{pH} = -\\log_{10}(\\mbox{H}^+). $$ Compute the derivative of the pH as a function of the concentration of H$^+$. Study the growth of the pH function.\nSolution The pH decreases as the concentration of hydrogen ions H$^+$ increases. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"5d7a39f96e69b5aea4b51f6c60707876","permalink":"/en/teaching/calculus/problems/derivatives-1/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/problems/derivatives-1/","section":"teaching","summary":"Exercise 1 Compute the derivative function of $f(x)=x^3-2x^2+1$ at the points $x=-1$, $x=0$ and $x=1$. Explain your result. Find an equation of the tangent line to the graph of $f$ at each of the three given points.","tags":["Derivatives"],"title":"Problems of Derivatives","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 A dietary center is testing a new diet in a sample of 12 persons. The data below are the number of days on the diet and the weight lost (in kg) over that period for each person.\n(33,3.9) (51,5.9) (30,3.2) (55,6) (38,4.9) (62,6.2) (35,4.5) (60,6.1) (44,5.6) (69,6.2) (47,5.8) (40,5.3) Draw the scatter plot. According to the point cloud, what type of regression model explains better the relation between the weight loss and the days of diet? 
Construct the linear regression model and the logarithmic regression model of the weight loss on the number of days of diet. Use the best model to predict the weight that a person will lose after 40 and 100 days of diet. Are these predictions reliable? Use the following sums ($X$=days of diet and $Y$=weight loss): $\\sum x_i=564$ days, $\\sum \\log(x_i)=45.8086$ $\\log(\\mbox{days})$, $\\sum y_j=63.6$ kg, $\\sum x_i^2=28234$ days$^2$, $\\sum \\log(x_i)^2=175.6603$ $\\log(\\mbox{days})^2$, $\\sum y_j^2=347.7$ kg$^2$, $\\sum x_iy_j=3108.5$ days$\\cdot$kg, $\\sum \\log(x_i)y_j=245.4738$ $\\log(\\mbox{days})\\cdot$kg.\nSolution 2. Linear model $\\bar x=47$ days, $s_x^2=143.8333$ days$^2$. $\\bar y=5.3$ kg, $s_y^2=0.885$ kg$^2$. $s_{xy}=9.9417$ days$\\cdot$kg. Regression line of weight loss on days of diet: $y=2.0514 + 0.0691x$. $r^2=0.7765$. Logarithmic model $\\overline{\\log(x)}=3.8174$ log(days), $s_{\\log(x)}^2=0.0659$ log(days)$^2$. $s_{\\log(x)y}=0.224$ log(days)$\\cdot$kg. Logarithmic model of weight loss on days of diet: $y=-7.6678 + 3.397\\log(x)$. $r^2=0.8599$. 3. $y(40)=4.8635$ kg and $y(100)=7.9761$ kg. The predictions are reliable because the coefficient of determination is close to 1, but the last one is less reliable as 100 is far from the observed range of values in the sample.\nExercise 2 The concentration of a drug in blood, in mg/dl, depends on time, in hours, according to the data below.\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Time} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8\\newline \\mbox{Drug concentration} \u0026amp; 25 \u0026amp; 36 \u0026amp; 48 \u0026amp; 64 \u0026amp; 86 \u0026amp; 114 \u0026amp; 168\\newline \\hline \\end{array} $$\nConstruct the linear regression model of drug concentration on time. Construct the exponential regression model of drug concentration on time. Use the best regression model to predict the drug concentration after $4.8$ hours. Is this prediction reliable? 
Justify your answer. Use the following sums ($C$=Drug concentration and $T$=time): $\\sum t_i=35$ h, $\\sum \\log(t_i)=10.6046$ $\\log(\\mbox{h})$, $\\sum c_j=541$ mg/dl, $\\sum \\log(c_j)= 29.147$ $\\log(\\mbox{mg/dl})$, $\\sum t_i^2=203$ h$^2$, $\\sum \\log(t_i)^2=17.5206$ $\\log(\\mbox{h})^2$, $\\sum c_j^2=56937$ (mg/dl)$^2$, $\\sum \\log(c_j)^2=124.0131$ $\\log(\\mbox{mg/dl})^2$, $\\sum t_ic_j=3328$ h$\\cdot$mg/dl, $\\sum t_i\\log(c_j)=154.3387$ h$\\cdot\\log(\\mbox{mg/dl})$, $\\sum \\log(t_i)c_j=951.6961$ $\\log(\\mbox{h})\\cdot$mg/dl, $\\sum\\log(t_i)\\log(c_j)=46.08046$ $\\log(\\mbox{h})\\cdot\\log(\\mbox{mg/dl})$.\nSolution $\\bar x=5$ hours, $s_x^2=4$ hours$^2$. $\\bar y=77.2857$ mg/dl, $s_y^2=2160.7755$ (mg/dl)$^2$. $s_{xy}=89$ hours$\\cdot$mg/dl. Regression line of drug concentration on time: $y=-33.9643 + 22.25x$. $r^2=0.9165$. $\\overline{\\log(y)}=4.1639$ log(mg/dl), $s_{\\log(y)}^2=0.3785$ log(mg/dl)$^2$. $s_{x\\log(y)}=1.2291$ hours$\\cdot$log(mg/dl). Exponential model of drug concentration on time: $y=e^{2.6275 + 0.3073x}$. $r^2=0.9979$. 3. $y(4.8)=60.4853$ mg/dl.\nExercise 3 A researcher is studying the relation between the obesity and the response to pain. The obesity is measured as the percentage over the ideal weight, and the response to pain as the nociceptive flexion pain threshold. The results of the study appears in the table below.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Obesity} \u0026amp; 89 \u0026amp; 90 \u0026amp; 77 \u0026amp; 30 \u0026amp; 51 \u0026amp; 75 \u0026amp; 62 \u0026amp; 45 \u0026amp; 90 \u0026amp; 20\\newline \\mbox{Pain threshold} \u0026amp; 10 \u0026amp; 12 \u0026amp; 11.5 \u0026amp; 4.5 \u0026amp; 5.5 \u0026amp; 7 \u0026amp; 9 \u0026amp; 8 \u0026amp; 15 \u0026amp; 3\\newline \\hline \\end{array} $$\nAccording to the scatter plot, what model explains better the relation of the response to pain on the obesity? 
According to the best regression model, what is the response to pain expected for a person with an obesity of 50%? Is this prediction reliable? According to the best regression model, what is the expected obesity for a person with a pain threshold of 10? Is this prediction reliable? Use the following sums ($X$=Obesity and $Y$=Pain threshold): $\\sum x_i=629$, $\\sum \\log(x_i)=40.4121$, $\\sum y_j=92.2$, $\\sum \\log(y_j)=21.339$, $\\sum x_i^2=45445$, $\\sum \\log(x_i)^2=165.6795$, $\\sum y_j^2=960.14$, $\\sum \\log(y_j)^2=47.6231$, $\\sum x_iy_j=6537.7$, $\\sum x_i\\log(y_j)=1443.1275$, $\\sum \\log(x_i)y_j=387.5728$, $\\sum \\log(x_i)\\log(y_j)=88.3696$.\nSolution 2. Linear model $\\bar x=62.9$, $s_x^2=588.09$. $\\bar y=9.22$, $s_y^2=11.0056$. $s_{xy}=73.832$. Regression line of pain threshold on obesity: $y=1.3232 + 0.1255x$. $r^2=0.8422$. Logarithmic model $\\overline{\\log(x)}=4.0412$, $s_{\\log(x)}^2=0.2366$. $s_{\\log(x)y}=1.4973$. Logarithmic model of pain threshold on obesity: $y=-16.3578 + 6.3293\\log(x)$. $r^2=0.8611$. $y(50)=8.4023$. 3.\nExponential model of obesity on pain threshold: $x=e^{2.7868 + 0.1361y}$. $x(10)=63.2648$.\nExercise 4 A blood bank keeps plasma at a temperature of 0ºF. When it is required for a blood transfusion, it is heated in an oven at a constant temperature of 120ºF. In an experiment the temperature of the plasma was measured at different times during the heating. The results are in the table below.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Time (min)}\t\u0026amp; 5 \u0026amp; 8 \u0026amp; 15 \u0026amp; 25 \u0026amp; 30 \u0026amp; 37 \u0026amp; 45 \u0026amp; 60\\newline \\mbox{Temperature (ºF)} \u0026amp; 25 \u0026amp; 50 \u0026amp; 86 \u0026amp; 102 \u0026amp; 110 \u0026amp; 114 \u0026amp; 118 \u0026amp; 120\\newline \\hline \\end{array} $$\nPlot the scatter plot. Which type of regression model do you think better explains the relationship between temperature and time? 
Which transformation should we apply to the variables to have a linear relationship? Compute the logarithmic regression of the temperature on time. According to the logarithmic model, what will the temperature of the plasma be after 15 minutes of heating? Is this prediction reliable? Justify your answer. Use the following sums ($X$=Time and $Y$=Temperature): $\\sum x_i=225$ min, $\\sum \\log(x_i)=24.5289$ log(min), $\\sum y_j=725$ ºF, $\\sum \\log(y_j)=35.2051$ log(ºF), $\\sum x_i^2=8833$ min², $\\sum \\log(x_i)^2=80.4703$ log²(min), $\\sum y_j^2=74345$ ºF², $\\sum \\log(y_j)^2=157.1023$ log²(ºF), $\\sum x_iy_j=24393$ min⋅ºF, $\\sum x_i\\log(y_j)=1048.0142$ min⋅log(ºF), $\\sum \\log(x_i)y_j=2431.7096$ log(min)⋅ºF, $\\sum \\log(x_i)\\log(y_j)=111.1165$ log(min)log(ºF).\nSolution A logarithmic model. 2. Apply a logarithmic transformation to time $z=\\log(x)$. $\\bar z=3.0661$ log(min), $s_z^2=0.6577$ log²(min). $\\bar y=90.625$ ºF, $s_y^2=1080.2344$ ºF². $s_{zy}=26.0969$ log(min)ºF. Logarithmic model of temperature on time: $y=-31.0325 + 39.6781\\log(x)$. $y(15)=76.4176$ ºF. $r^2=0.9586$, that is close to 1, so the prediction is reliable. Exercise 5 The activity of a radioactive substance depends on time according to the data in the table below.\n$$ \\begin{array}{lrrrrrrrr} \\hline t\\mbox{ (hours)} \u0026amp; 0 \u0026amp; 10 \u0026amp; 20 \u0026amp; 30 \u0026amp; 40 \u0026amp; 50 \u0026amp; 60 \u0026amp; 70 \\newline A\\mbox{ ($10^7$ disintegrations/s)} \u0026amp; 25.9 \u0026amp; 8.16 \u0026amp; 2.57 \u0026amp; 0.81 \u0026amp; 0.25 \u0026amp; 0.08 \u0026amp; 0.03 \u0026amp; 0.01\\newline \\hline \\end{array} $$\nRepresent graphically the data of radioactivity as a function of time. Which type of regression model explains better the relationship between radioactivity and time? Represent graphically the data of radioactivity as a function of time in a semi-logarithmic paper. Compute the regression line of the logarithm of radioactivity on time. 
Taking into account that radioactive decay follows the formula $$ A(t) = A_0 e^{-\\lambda t} $$ where $A_0$ is the number of disintegrations at the beginning and $\\lambda$ is a disintegration constant, different for each radioactive substance, use the slope of the previous regression line to compute the disintegration constant for the substance. Use the following sums ($X$=Time and $Y$=Radioactivity): $\\sum x_i=280$ hours, $\\sum y_j=37.81$ 10⁷ disintegrations/s, $\\sum \\log(y_j)=-5.9371$ log(10⁷ disintegrations/s), $\\sum x_i^2=14000$ hours², $\\sum y_j^2=744.7265$ (10⁷ disintegrations/s)², $\\sum \\log(y_j)^2=57.7369$ log²(10⁷ disintegrations/s), $\\sum x_iy_j=173.8$ hours⋅10⁷ disintegrations/s, $\\sum x_i\\log(y_j)=-680.9447$ hours⋅log(10⁷ disintegrations/s).\nSolution 2. $\\bar x=35$ hours, $s_x^2=525$ hours². $\\bar z=-0.7421$ log(10⁷ disintegrations/s), $s_z^2=6.6664$ log²(10⁷ disintegrations/s). $s_{xz}=-59.1434$ hours⋅log(10⁷ disintegrations/s). Regression line of logarithm of radioactivity on time: $z=3.2008 - 0.1127x$. $\\lambda=0.1127$ hours⁻¹. Exercise 6 For oscillations of small amplitude, the oscillation period $T$ of a pendulum is given by the formula $$ T = 2\\pi\\sqrt{\\frac{L}{g}} $$ where $L$ is the length of the pendulum and $g$ is the gravitational acceleration. In order to check if the previous formula is satisfied, an experiment has been conducted in which the oscillation period has been measured for different lengths of the pendulum. The measurements are shown in the table below.\n$$ \\begin{array}{lrrrrr} \\hline L\\text{ (cm)} \u0026amp; 52.5 \u0026amp; 68.0 \u0026amp; 99.0 \u0026amp; 116.0 \u0026amp; 146.0 \\newline T\\text{ (s)} \u0026amp; 1.449 \u0026amp; 1.639 \u0026amp; 1.999 \u0026amp; 2.153 \u0026amp; 2.408\\newline \\hline \\end{array} $$\nRepresent graphically the data of the period versus the length of the pendulum.\nDoes a linear model fit the point cloud well? 
Represent graphically the data of the period versus the length on logarithmic paper. Which type of model fits the point cloud better? Compute the regression line of the logarithm of the period on the logarithm of the length. Taking into account the independent term of the previous regression line, compute the value of $g$. Solution The linear model fits the point cloud well. 2. The model that best fits the point cloud is linear. 3. Let $X$ be the logarithm of the length and $Y$ the logarithm of the period, $\\bar x=4.5025$ log(cm), $s_x^2=0.1353$ log(cm)². $\\bar y=0.6407$ log(s), $s_y^2=0.0339$ log(s)². $s_{xy}=0.0677$ log(cm)log(s)\nRegression line of $Y$ on $X$: $y=-1.6132 + 0.5006x$. 4. $g=994.4579$ cm/s².\nExercise 7 A study tries to determine the relationship between two substances $X$ and $Y$ in blood. The concentrations of these substances have been measured in seven individuals (in $\\mu$g/dl) and the results are shown in the table below.\n$$ \\begin{array}{rrrrrrrr} \\hline X \u0026amp; 2.1 \u0026amp; 4.9 \u0026amp; 9.8 \u0026amp; 11.7 \u0026amp; 5.9 \u0026amp; 8.4 \u0026amp; 9.2 \\newline Y \u0026amp; 1.3 \u0026amp; 1.5 \u0026amp; 1.7 \u0026amp; 1.8 \u0026amp; 1.5 \u0026amp; 1.7 \u0026amp; 1.7 \\newline \\hline \\end{array} $$\nAre $Y$ and $X$ linearly related? Are $Y$ and $X$ potentially related? Use the best of the previous regression models to predict the concentration in blood of $Y$ for $x=8$ $\\mu$g/dl. Is this prediction reliable? Justify your answer. 
Use the following sums: $\\sum x_i=52$ μg/dl, $\\sum \\log(x_i)=13.1955$ log(μg/dl), $\\sum y_j=11.2$ μg/dl, $\\sum \\log(y_j)=3.253$ log(μg/dl), $\\sum x_i^2=451.36$ (μg/dl)², $\\sum \\log(x_i)^2=26.9397$ log(μg/dl)², $\\sum y_j^2=18.1$ (μg/dl)², $\\sum \\log(y_j)^2=1.5878$ log(μg/dl)², $\\sum x_iy_j=86.57$ (μg/dl)², $\\sum x_i\\log(y_j)=26.3463$ μg/dl⋅log(μg/dl), $\\sum \\log(x_i)y_j=21.7087$ log(μg/dl)⋅μg/dl, $\\sum \\log(x_i)\\log(y_j)=6.5224$ log(μg/dl)².\nSolution $\\bar x=7.4286$ μg/dl, $s_x^2=9.2963$ (μg/dl)². $\\bar y=1.6$ μg/dl, $s_y^2=0.0257$ (μg/dl)². $s_{xy}=0.4814$ (μg/dl)²\nLinear relation: $r^2=0.9696$, which is close to 1, so there is a strong linear relation.\n2. Naming $u=\\log(x)$ and $v=\\log(y)$,\n$\\bar u=1.8851$ log(μg/dl), $s_u^2=0.295$ log(μg/dl)². $\\bar v=0.4647$ log(μg/dl), $s_v^2=0.0109$ log(μg/dl)². $s_{uv}=0.0558$ log(μg/dl)²\nPotential relation: $r^2=0.9688$, which is close to 1, so there is a strong potential relation, although the linear relation is slightly stronger.\n3. Regression line of $Y$ on $X$: $y=1.2153 + 0.0518x$. $y(8)=1.6296$ μg/dl. The prediction is reliable since the linear coefficient of determination is close to 1.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1601555270,"objectID":"240d83e0159a0490570775dba8ffaa8d","permalink":"/en/teaching/statistics/problems/non_linear_regression/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/non_linear_regression/","section":"teaching","summary":"Exercise 1 A dietary center is testing a new diet in sample of 12 persons. The data below are the number of days of diet and the weight loss (in kg) until them for every person.","tags":["Regression","Non-linear Regression"],"title":"Problems of Non Linear Regression","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Concept of derivative Increment Definition - Increment of a variable. 
An increment of a variable $x$ is a change in the value of the variable; it is denoted $\\Delta x$. The increment of a variable $x$ along an interval $[a,b]$ is given by $$\\Delta x = b-a.$$ Definition - Increment of a function. The increment of a function $y=f(x)$ along an interval $[a,b]\\subseteq Dom(f)$ is given by $$\\Delta y = f(b)-f(a).$$ Example. The increment of $x$ along the interval $[2,5]$ is $\\Delta x=5-2=3$, and the increment of the function $y=x^2$ along the same interval is $\\Delta y=5^2-2^2=21$.\nAverage rate of change The study of a function $y=f(x)$ requires to understand how the function changes, that is, how the dependent variable $y$ changes when we change the independent variable $x$.\nDefinition - Average rate of change. The average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]\\subseteq Dom(f)$, is the quotient between the increment of $y$ and the increment of $x$ in that interval; it is denoted by $$\\mbox{ARC}\\;f[a,a+\\Delta x]=\\frac{\\Delta y}{\\Delta x}=\\frac{f(a+\\Delta x)-f(a)}{\\Delta x}.$$ Example - Area of a square. Let $y=x^2$ be the function that measures the area of a metallic square of side length $x$.\nIf at any given time the side of the square is $a$, and we heat the square uniformly increasing the side by dilatation a quantity $\\Delta x$, how much will increase the area of the square?\n$$ \\Delta y = f(a+\\Delta x)-f(a)=(a+\\Delta x)^2-a^2= a^2+2a\\Delta x+\\Delta x^2-a^2=2a\\Delta x+\\Delta x^2. $$\nWhat is the average rate of change in the interval $[a,a+\\Delta x]$? 
$$\\mbox{ARC}\\;f[a,a+\\Delta x]=\\frac{\\Delta y}{\\Delta x}=\\frac{2a\\Delta x+\\Delta x^2}{\\Delta x}=2a+\\Delta x.$$\nGeometric interpretation of the average rate of change The average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]$ is the slope of the secant line to the graph of $f$ through the points $(a,f(a))$ and $(a+\\Delta x,f(a+\\Delta x))$.\nInstantaneous rate of change Often it is interesting to study the rate of change of a function, not in an interval, but in a point.\nKnowing the tendency of change of a function in an instant can be used to predict the value of the function in nearby instants.\nDefinition - Instantaneous rate of change and derivative. The instantaneous rate of change of a function $f$ in a point $a$, is the limit of the average rate of change of $f$ in the interval $[a,a+\\Delta x]$, when $\\Delta x$ approaches 0; it is denoted by\n$$ \\begin{aligned} \\textrm{IRC}\\;f (a) \u0026amp;= \\lim_{\\Delta x\\rightarrow 0} \\textrm{ARC}\\; f[a,a+\\Delta x]=\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta y}{\\Delta x}=\\newline \u0026amp;= \\lim_{\\Delta x\\rightarrow 0}\\frac{f(a+\\Delta x)-f(a)}{\\Delta x}. \\end{aligned} $$\nWhen this limit exists, the function $f$ is said to be differentiable at the point $a$, and its value is called the derivative of $f$ at $a$, and it is denoted $f\u0026rsquo;(a)$ (Lagrange’s notation) or $\\frac{df}{dx}(a)$ (Leibniz’s notation).\nExample - Area of a square. 
Let us take again the function $y=x^2$ that measures the area of a metallic square of side $x$.\nIf at any given time the side of the square is $a$, and we heat the square uniformly increasing the side, what is the tendency of change of the area in that moment?\n$$\\begin{aligned} \\textrm{IRC}\\;f(a)\u0026amp;=\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta y}{\\Delta x} = \\lim_{\\Delta x\\rightarrow 0}\\frac{f(a+\\Delta x)-f(a)}{\\Delta x} =\\newline \u0026amp;= \\lim_{\\Delta x\\rightarrow 0}\\frac{2a\\Delta x+\\Delta x^2}{\\Delta x}=\\lim_{\\Delta x\\rightarrow 0} 2a+\\Delta x= 2a. \\end{aligned} $$\nThus, $$f\u0026rsquo;(a)=\\frac{df}{dx}(a)=2a,$$ indicating that the area of the square tends to increase the double of the side.\nInterpretation of the derivative The derivative of a function $f\u0026rsquo;(a)$ shows the growth rate of $f$ at point $a$:\n$f\u0026rsquo;(a)\u0026gt;0$ indicates an increasing tendency ($y$ increases as $x$ increases). $f\u0026rsquo;(a)\u0026lt;0$ indicates a decreasing tendency ($y$ decreases as $x$ increases). Example. A derivative $f\u0026rsquo;(a)=3$ indicates that $y$ tends to increase triple of $x$ at point $a$. A derivative $f\u0026rsquo;(a)=-0.5$ indicates that $y$ tends to decrease half of $x$ at point $a$.\nGeometric interpretation of the derivative We have seen that the average rate of change of a function $y=f(x)$ in an interval $[a,a+\\Delta x]$ is the slope of the secant line, but when $\\Delta x$ approaches $0$, the secant line becomes the tangent line.\nThe instantaneous rate of change or derivative of a function $y=f(x)$ at $x=a$ is the slope of the tangent line to the graph of $f$ at point $(a,f(a))$. Thus, the equation of the tangent line to the graph of $f$ at the point $(a,f(a))$ is $$y-f(a) = f\u0026rsquo;(a)(x-a) \\Leftrightarrow y = f(a)+f\u0026rsquo;(a)(x-a)$$\nKinematic applications: Linear motion Assume that the function $y=f(t)$ describes the position of an object moving in the real line at time $t$. 
Taking as reference the coordinates origin $O$ and the unitary vector $\\mathbf{i}=(1)$, we can represent the position of the moving object $P$ at every moment $t$ with a vector $\\vec{OP}=x\\mathbf{i}$ where $x=f(t)$.\nRemark. It also makes sense when $f$ measures other magnitudes as the temperature of a body, the concentration of a gas, or the quantity of substance in a chemical reaction at every moment $t$.\nKinematic interpretation of the average rate of change In this context, if we take the instants $t=t_0$ and $t=t_0+\\Delta t$, both in $\\mbox{Dom}(f)$, the vector $$\\mathbf{v}_m=\\frac{f(t_0+\\Delta t)-f(t_0)}{\\Delta t}$$ is known as the average velocity of the trajectory $f$ in the interval $[t_0, t_0+\\Delta t]$.\nExample. A vehicle makes a trip from Madrid to Barcelona. Let $f(t)$ be the function that determine the position of the vehicle at every moment $t$. If the vehicle departs from Madrid (km 0) at 8:00 and arrives at Barcelona (km 600) at 14:00, then the average velocity of the vehicle in the path is $$\\mathbf{v}_m=\\frac{f(14)-f(8)}{14-8}=\\frac{600-0}{6} = 100 km/h.$$\nKinematic interpretation of the derivative In the same context of the linear motion, the derivative of the function $f(t)$ at the moment $t_0$ is the vector\n$$\\mathbf{v}=f\u0026rsquo;(t_0)=\\lim_{\\Delta t\\rightarrow 0}\\frac{f(t_0+\\Delta t)-f(t_0)}{\\Delta t},$$\nthat is known, as long as the limit exists, as the instantaneous velocity or simply velocity of the trajectory $f$ at moment $t_0$.\nThat is, the derivative of the object position with respect to time is a vector field that is called velocity along the trajectory $f$.\nExample. 
Following with the previous example, what indicates the speedometer at any instant is the modulus of the instantaneous velocity vector at that moment.\nAlgebra of derivatives Properties of the derivative If $y=c$, is a constant function, then $y\u0026rsquo;=0$ at any point.\nIf $y=x$, is the identity function, then $y\u0026rsquo;=1$ at any point.\nIf $u=f(x)$ and $v=g(x)$ are two differentiable functions, then\n$(u+v)\u0026rsquo;=u\u0026rsquo;+v'$ $(u-v)\u0026rsquo;=u\u0026rsquo;-v'$ $(u\\cdot v)\u0026rsquo;=u\u0026rsquo;\\cdot v+ u\\cdot v'$ $\\left(\\dfrac{u}{v}\\right)\u0026rsquo;=\\dfrac{u\u0026rsquo;\\cdot v-u\\cdot v\u0026rsquo;}{v^2}$ Derivative of a composite function Theorem - Chain rule. If the function $y=f\\circ g$ is the composition of two functions $y=f(z)$ and $z=g(x)$, then $$(f\\circ g)\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x).$$ Proof It is easy to proof this fact using the Leibniz notation $$\\frac{dy}{dx}=\\frac{dy}{dz}\\frac{dz}{dx}=f\u0026rsquo;(z)g\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x).$$ Example. If $f(z)=\\sin z$ and $g(x)=x^2$, then $f\\circ g(x)=\\sin(x^2)$. Applying the chain rule the derivative of the composite function is $$(f\\circ g)\u0026rsquo;(x)=f\u0026rsquo;(g(x))g\u0026rsquo;(x) = \\cos(g(x)) 2x = \\cos(x^2)2x.$$\nOn the other hand, $g\\circ f(z)= (\\sin z)^2$, and applying the chain rule again, its derivative is $$(g\\circ f)\u0026rsquo;(z)=g\u0026rsquo;(f(z))f\u0026rsquo;(z) = 2f(z)\\cos z = 2\\sin z\\cos z.$$\nDerivative of the inverse of a function Theorem - Derivative of the inverse function. Given a function $y=f(x)$ with inverse $x=f^{-1}(y)$, then $$\\left(f^{-1}\\right)\u0026rsquo;(y)=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{f\u0026rsquo;(f^{-1}(y))},$$ provided that $f$ is differentiable at $f^{-1}(y)$ and $f\u0026rsquo;(f^{-1}(y))\\neq 0$. 
Proof It is easy to prove this equality using the Leibniz notation $$\\frac{dx}{dy}=\\frac{1}{dy/dx}=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{f\u0026rsquo;(f^{-1}(y))}$$ Example. The inverse of the exponential function $y=f(x)=e^x$ is the natural logarithm $x=f^{-1}(y)=\\ln y$, so we can compute the derivative of the natural logarithm using the previous theorem and we get $$\\left(f^{-1}\\right)\u0026rsquo;(y)=\\frac{1}{f\u0026rsquo;(x)}=\\frac{1}{e^x}=\\frac{1}{e^{\\ln y}}=\\frac{1}{y}.$$\nSometimes it is easier to apply the chain rule to compute the derivative of the inverse of a function. In this example, as $\\ln x$ is the inverse of $e^x$, we know that $e^{\\ln x}=x$, so differentiating both sides and applying the chain rule to the left side we get $$(e^{\\ln x})\u0026rsquo;=x\u0026rsquo; \\Leftrightarrow e^{\\ln x}(\\ln(x))\u0026rsquo; = 1 \\Leftrightarrow (\\ln(x))\u0026rsquo;=\\frac{1}{e^{\\ln x}}=\\frac{1}{x}.$$\nAnalysis of functions Analysis of functions: increase and decrease The main application of derivatives is to determine the variation (increase or decrease) of functions. For that we use the sign of the first derivative.\nTheorem. Let $f(x)$ be a function with first derivative in an interval $I\\subseteq \\mathbb{R}$.\nIf $\\forall x\\in I\\ f\u0026rsquo;(x)\u0026gt; 0$ then $f$ is increasing on $I$. If $\\forall x\\in I\\ f\u0026rsquo;(x)\u0026lt; 0$ then $f$ is decreasing on $I$. If $f\u0026rsquo;(x_0)=0$ then $x_0$ is known as a critical point or stationary point. At this point the function can be increasing, decreasing or neither increasing nor decreasing.\nExample The function $f(x)=x^2$ has derivative $f\u0026rsquo;(x)=2x$; it is decreasing on $\\mathbb{R}^-$ as $f\u0026rsquo;(x)\u0026lt; 0$ $\\forall x\\in \\mathbb{R}^-$ and increasing on $\\mathbb{R}^+$ as $f\u0026rsquo;(x)\u0026gt; 0$ $\\forall x\\in \\mathbb{R}^+$. 
It has a critical point at $x=0$, as $f\u0026rsquo;(0)=0$; at this point the function is neither increasing nor decreasing.\nA function can be increasing or decreasing on an interval and not have a first derivative. Example. Let us analyze the increase and decrease of the function $f(x)=x^4-2x^2+1$. Its first derivative is $f\u0026rsquo;(x)=4x^3-4x$.\nAnalysis of functions: relative extrema As a consequence of the previous result we can also use the first derivative to determine the relative extrema of a function.\nTheorem - First derivative test. Let $f(x)$ be a function with first derivative in an interval $I\\subseteq \\mathbb{R}$ and let $x_0\\in I$ be a critical point of $f$ ($f\u0026rsquo;(x_0)=0$).\nIf $f\u0026rsquo;(x)\u0026gt;0$ on an open interval extending left from $x_0$ and $f\u0026rsquo;(x)\u0026lt;0$ on an open interval extending right from $x_0$, then $f$ has a relative maximum at $x_0$. If $f\u0026rsquo;(x)\u0026lt;0$ on an open interval extending left from $x_0$ and $f\u0026rsquo;(x)\u0026gt;0$ on an open interval extending right from $x_0$, then $f$ has a relative minimum at $x_0$. If $f\u0026rsquo;(x)$ has the same sign on both an open interval extending left from $x_0$ and an open interval extending right from $x_0$, then $f$ has an inflection point at $x_0$. For a differentiable function, a vanishing derivative is a necessary but not sufficient condition for the function to have a relative extremum at a point. Example. The function $f(x)=x^3$ has derivative $f\u0026rsquo;(x)=3x^2$; it has a critical point at $x=0$. However, it does not have a relative extremum at that point, but an inflection point.\nExample. Consider again the function $f(x)=x^4-2x^2+1$ and let us analyze its relative extrema now. Its first derivative is $f\u0026rsquo;(x)=4x^3-4x$.\nAnalysis of functions: concavity The concavity of a function can be determined by the second derivative.\nTheorem. 
Let $f(x)$ be a function with second derivative in an interval $I\\subseteq \\mathbb{R}$.\nIf $\\forall x\\in I\\ f\u0026rsquo;\u0026rsquo;(x)\u0026gt; 0$ then $f$ is concave up (convex) on $I$. If $\\forall x\\in I\\ f\u0026rsquo;\u0026rsquo;(x)\u0026lt; 0$ then $f$ is concave down (concave) on $I$. Example. The function $f(x)=x^2$ has second derivative $f\u0026rsquo;\u0026rsquo;(x)=2\u0026gt;0$ $\\forall x\\in \\mathbb{R}$, so it is concave up in all $\\mathbb{R}$.\nA function can be concave up or down and not have second derivative. Example. Let us analyze the concavity of the same function of previous examples $f(x)=x^4-2x^2+1$. Its second derivative is $f\u0026rsquo;\u0026rsquo;(x)=12x^2-4$.\nFunction approximation Approximating a function with the derivative The tangent line to the graph of a function $f(x)$ at $x=a$ can be used to approximate $f$ in a neighbourhood of $a$.\nThus, the increment of a function $f(x)$ in an interval $[a,a+\\Delta x]$ can be approximated multiplying the derivative of $f$ at $a$ by the increment of $x$ $$\\Delta y \\approx f\u0026rsquo;(a)\\Delta x$$\nExample - Area of a square. In the previous example of the function $y=x^2$ that measures the area of a metallic square of side $x$, if the side of the square is $a$ and we increment it by a quantity $\\Delta x$, then the increment on the area will be approximately $$\\Delta y \\approx f\u0026rsquo;(a)\\Delta x = 2a\\Delta x.$$ In the figure below we can see that the error of this approximation is $\\Delta x^2$, which is smaller than $\\Delta x$ when $\\Delta x$ approaches to 0.\nApproximating a function by a polynomial Another useful application of the derivative is the approximation of functions by polynomials.\nPolynomials are functions easy to calculate (sums and products) with very good properties:\nDefined in all the real numbers. Continuous. Differentiable of all orders with continuous derivatives. 
Goal Approximate a function $f(x)$ by a polynomial $p(x)$ near a point $x=a$.\nApproximating a function by a polynomial of order 0 A polynomial of degree 0 has equation $$p(x) = c_0,$$ where $c_0$ is a constant.\nAs the polynomial should coincide with the function at $a$, it must satisfy $$p(a) = c_0 = f(a).$$\nTherefore, the polynomial of degree 0 that best approximate $f$ near $a$ is $$p(x) = f(a).$$\nApproximating a function by a polynomial of order 1 A polynomial of order 1 has equation $$p(x) = c_0+c_1x,$$ but it can also be written as $$p(x) = c_0+c_1(x-a).$$\nAmong all the polynomials of degree 1, the one that best approximates $f(x)$ near $a$ is that which meets the following conditions\n$p$ and $f$ coincide at $a$: $p(a) = f(a)$, $p$ and $f$ have the same rate of change at $a$: $p\u0026rsquo;(a) = f\u0026rsquo;(a)$. The last condition guarantees that $p$ and $f$ have approximately the same tendency, but it requires the function $f$ to be differentiable at $a$.\nImposing the previous conditions we have\n$p(x)=c_0+c_1(x-a) \\Rightarrow p(a)=c_0+c_1(a-a)=c_0=f(a)$, $p\u0026rsquo;(x)=c_1 \\Rightarrow p\u0026rsquo;(a)=c_1=f\u0026rsquo;(a)$. Therefore, the polynomial of degree 1 that best approximates $f$ near $a$ is $$p(x) = f(a)+f \u0026lsquo;(a)(x-a),$$ which turns out to be the tangent line to $f$ at $(a,f(a))$.\nApproximating a function by a polynomial of order 2 A polynomial of order 2 is a parabola with equation $$p(x) = c_0+c_1x+c_2x^2,$$ but it can also be written as $$p(x) = c_0+c_1(x-a)+c_2(x-a)^2.$$\nAmong all the polynomials of degree 2, the one that best approximate $f(x)$ near $a$ is that which meets the following conditions\n$p$ and $f$ coincide at $a$: $p(a) = f(a)$, $p$ and $f$ have the same rate of change at $a$: $p\u0026rsquo;(a) = f\u0026rsquo;(a)$. $p$ and $f$ have the same concavity at $a$: $p\u0026rsquo;\u0026rsquo;(a)=f\u0026rsquo;\u0026rsquo;(a)$. 
The last condition requires the function $f$ to be differentiable twice at $a$.\nImposing the previous conditions we have\n$p(x)=c_0+c_1(x-a)+c_2(x-a)^2 \\Rightarrow p(a)=c_0+c_1(a-a)+c_2(a-a)^2=c_0=f(a)$, $p\u0026rsquo;(x)=c_1+2c_2(x-a) \\Rightarrow p\u0026rsquo;(a)=c_1+2c_2(a-a)=c_1=f\u0026rsquo;(a)$. $p\u0026rsquo;\u0026rsquo;(x)=2c_2 \\Rightarrow p\u0026rsquo;\u0026rsquo;(a)=2c_2=f\u0026rsquo;\u0026rsquo;(a) \\Rightarrow c_2=\\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}$. Therefore, the polynomial of degree 2 that best approximates $f$ near $a$ is $$p(x) = f(a)+f\u0026rsquo;(a)(x-a)+\\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}(x-a)^2.$$\nApproximating a function by a polynomial of order $n$ A polynomial of order $n$ has equation $$p(x) = c_0+c_1x+c_2x^2+\\cdots +c_nx^n,$$ but it can also be written as $$p(x) = c_0+c_1(x-a)+c_2(x-a)^2+\\cdots +c_n(x-a)^n.$$\nAmong all the polynomials of degree $n$, the one that best approximates $f(x)$ near $a$ is that which meets the following $n+1$ conditions:\n$p(a) = f(a)$, $p\u0026rsquo;(a) = f\u0026rsquo;(a)$, $p\u0026rsquo;\u0026rsquo;(a)=f\u0026rsquo;\u0026rsquo;(a)$, $\\cdots$ $p^{(n)}(a)=f^{(n)}(a)$. The successive derivatives of $p$ are\n$$ \\begin{aligned} p(x) \u0026amp;= c_0+c_1(x-a)+c_2(x-a)^2+\\cdots +c_n(x-a)^n,\\newline p\u0026rsquo;(x)\u0026amp; = c_1+2c_2(x-a)+\\cdots +nc_n(x-a)^{n-1},\\newline p\u0026rsquo;\u0026rsquo;(x)\u0026amp; = 2c_2+\\cdots +n(n-1)c_n(x-a)^{n-2},\\newline \\vdots \\newline p^{(n)}(x)\u0026amp;= n(n-1)(n-2)\\cdots 1 c_n=n!c_n. \\end{aligned} $$\nImposing the previous conditions we have\n$p(a) = c_0+c_1(a-a)+c_2(a-a)^2+\\cdots +c_n(a-a)^n=c_0=f(a)$, $p\u0026rsquo;(a) = c_1+2c_2(a-a)+\\cdots +nc_n(a-a)^{n-1}=c_1=f\u0026rsquo;(a)$, $p\u0026rsquo;\u0026rsquo;(a) = 2c_2+\\cdots +n(n-1)c_n(a-a)^{n-2}=2c_2=f\u0026rsquo;\u0026rsquo;(a)\\Rightarrow c_2=f\u0026rsquo;\u0026rsquo;(a)/2$, $\\cdots$ $p^{(n)}(a)=n!c_n=f^{(n)}(a)\\Rightarrow c_n=\\frac{f^{(n)}(a)}{n!}$. Taylor polynomial of order $n$ Definition - Taylor polynomial. 
Given a function $f(x)$ differentiable $n$ times at $x=a$, the Taylor polynomial of order $n$ of $f$ at $a$ is the polynomial with equation\n$$ \\begin{aligned} p_{f,a}^n(x) \u0026amp;= f(a) + f\u0026rsquo;(a)(x-a) + \\frac{f\u0026rsquo;\u0026rsquo;(a)}{2}(x-a)^2 + \\cdots + \\frac{f^{(n)}(a)}{n!}(x-a)^n = \\newline \u0026amp;= \\sum_{i=0}^{n}\\frac{f^{(i)}(a)}{i!}(x-a)^i. \\end{aligned} $$\nThe Taylor polynomial of order $n$ of $f$ at $a$ is the $n$th degree polynomial that best approximates $f$ near $a$, as it is the only one that meets the previous conditions. Example. Let us approximate the function $f(x)=\\log x$ near the value $1$ by a polynomial of order $3$.\nThe equation of the Taylor polynomial of order $3$ of $f$ at $a=1$ is $$p_{f,1}^3(x)=f(1)+f\u0026rsquo;(1)(x-1)+\\frac{f\u0026rsquo;\u0026rsquo;(1)}{2}(x-1)^2+\\frac{f\u0026rsquo;\u0026rsquo;\u0026rsquo;(1)}{3!}(x-1)^3.$$ The derivatives of $f$ at $1$ up to order $3$ are\n$$ \\begin{array}{lll} f(x)=\\log x \u0026amp; \\quad \u0026amp; f(1)=\\log 1 =0,\\newline f\u0026rsquo;(x)=1/x \u0026amp; \u0026amp; f\u0026rsquo;(1)=1/1=1,\\newline f\u0026rsquo;\u0026rsquo;(x)=-1/x^2 \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;(1)=-1/1^2=-1,\\newline f\u0026rsquo;\u0026rsquo;\u0026rsquo;(x)=2/x^3 \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;\u0026rsquo;(1)=2/1^3=2. \\end{array} $$\nAnd substituting into the polynomial equation we get $$p_{f,1}^3(x)=0+1(x-1)+\\frac{-1}{2}(x-1)^2+\\frac{2}{3!}(x-1)^3= \\frac{1}{3}x^3-\\frac{3}{2}x^2+3x-\\frac{11}{6}.$$\nMaclaurin polynomial of order $n$ The Taylor polynomial equation has a simpler form when the polynomial is calculated at $0$. This special case of Taylor polynomial at $0$ is known as the Maclaurin polynomial.\nDefinition - Maclaurin polynomial. 
Given a function $f(x)$ differentiable $n$ times at $0$, the Maclaurin polynomial of order $n$ of $f$ is the polynomial with equation\n$$ \\begin{aligned} p_{f,0}^n(x)\u0026amp;=f(0)+f\u0026rsquo;(0)x+\\frac{f\u0026rsquo;\u0026rsquo;(0)}{2}x^2+\\cdots +\\frac{f^{(n)}(0)}{n!}x^n = \\newline \u0026amp;=\\sum_{i=0}^{n}\\frac{f^{(i)}(0)}{i!}x^i. \\end{aligned} $$\nExample. Let us approximate the function $f(x)=\\sin x$ near the value $0$ by a polynomial of order $3$.\nThe Maclaurin polynomial equation of order $3$ of $f$ is $$p_{f,0}^3(x)=f(0)+f\u0026rsquo;(0)x+\\frac{f\u0026rsquo;\u0026rsquo;(0)}{2}x^2+\\frac{f\u0026rsquo;\u0026rsquo;\u0026rsquo;(0)}{3!}x^3.$$ The derivatives of $f$ at $0$ up to order $3$ are\n$$\\begin{array}{lll} f(x)=\\sin x \u0026amp; \\quad \u0026amp; f(0)=\\sin 0 =0,\\newline f\u0026rsquo;(x)=\\cos x \u0026amp; \u0026amp; f\u0026rsquo;(0)=\\cos 0=1,\\newline f\u0026rsquo;\u0026rsquo;(x)=-\\sin x \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;(0)=-\\sin 0=0,\\newline f\u0026rsquo;\u0026rsquo;\u0026rsquo;(x)=-\\cos x \u0026amp; \u0026amp; f\u0026rsquo;\u0026rsquo;\u0026rsquo;(0)=-\\cos 0=-1. 
\\end{array} $$\nAnd substituting into the polynomial equation we get $$p_{f,0}^3(x)=0+1\\cdot x+\\frac{0}{2}x^2+\\frac{-1}{3!}x^3= x-\\frac{x^3}{6}.$$\nMaclaurin polynomials of elementary functions $$ \\renewcommand{\\arraystretch}{2.5} \\begin{array}{cc} \\hline f(x) \u0026amp; p_{f,0}^n(x) \\newline \\hline \\sin x \u0026amp; \\displaystyle x - \\frac{x^3}{3!} + \\frac{x^5}{5!} - \\cdots + (-1)^{k-1}\\frac{x^{2k-1}}{(2k-1)!} \\mbox{ if $n=2k$ or $n=2k-1$}\\newline \\cos x \u0026amp; \\displaystyle 1 - \\frac{x^2}{2!} + \\frac{x^4}{4!} - \\cdots + (-1)^k\\frac{x^{2k}}{(2k)!} \\mbox{ if $n=2k$ or $n=2k+1$}\\newline \\arctan x \u0026amp; \\displaystyle x - \\frac{x^3}{3} + \\frac{x^5}{5} - \\cdots + (-1)^{k-1}\\frac{x^{2k-1}}{(2k-1)} \\mbox{ if $n=2k$ or $n=2k-1$}\\newline e^x \u0026amp; \\displaystyle 1 + x + \\frac{x^2}{2!} + \\frac{x^3}{3!} + \\cdots + \\frac{x^n}{n!}\\newline \\log(1+x) \u0026amp; \\displaystyle x - \\frac{x^2}{2} + \\frac{x^3}{3} - \\cdots + (-1)^{n-1}\\frac{x^n}{n}\\newline \\hline \\end{array} $$\nTaylor remainder and Taylor formula Taylor polynomials allow us to approximate a function in a neighborhood of a value $a$, but most of the time there is an error in the approximation.\nDefinition - Taylor remainder. 
Given a function $f(x)$ and its Taylor polynomial of order $n$ at $a$, $p_{f,a}^n(x)$, the Taylor remainder of order $n$ of $f$ at $a$ is the difference between the function and the polynomial,\n$$r_{f,a}^n(x)=f(x)-p_{f,a}^n(x).$$\nThe Taylor remainder measures the error in the approximation of $f(x)$ by the Taylor polynomial and allows us to express the function as the Taylor polynomial plus the Taylor remainder\n$$f(x)=p_{f,a}^n(x) + r_{f,a}^n(x).$$\nThis expression is known as the Taylor formula of order $n$ of $f$ at $a$.\nIt can be proved that\n$$\\lim_{h\\rightarrow 0}\\frac{r_{f,a}^n(a+h)}{h^n}=0,$$\nwhich means that the remainder $r_{f,a}^n(a+h)$ is much smaller than $h^n$ when $h$ is close to 0.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1606295051,"objectID":"c2c1680f6068dcb308d49e1be4b37a9b","permalink":"/en/teaching/calculus/manual/derivatives-one-variable/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/derivatives-one-variable/","section":"teaching","summary":"Concept of derivative Increment Definition - Increment of a variable. An increment of a variable $x$ is a change in the value of the variable; it is denoted $\\Delta x$. The increment of a variable $x$ along an interval $[a,b]$ is given by $$\\Delta x = b-a.","tags":["Derivative","Tangent Line"],"title":"One variable differential calculus","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"A picture is worth a thousand words. That\u0026rsquo;s why data is usually presented in a graphical form, and for that reason spreadsheets provide different types of charts. This section presents the main chart types and how to plot them in Excel 2010.\nCharts creation Regardless of the chart type, the steps to create a chart are:\nSelect the range that contains the data to plot. 
Data should be arranged in series (vertically or horizontally) following these rules:\nDo not leave empty rows or columns within the data range or between data labels and data. Only one row and/or one column should be used for data labels. Each data label should be unique. Select the type of chart from the Charts panel on the ribbon\u0026rsquo;s Insert tab.\nSet the chart design (data series to plot, order, etc.). You can use the ribbon\u0026rsquo;s Design tab.\nApply a layout (title, axis, legend, grids, data labels, etc.). You can use the ribbon\u0026rsquo;s Layout tab.\nApply a style format (text, line and background colours). You can use the ribbon\u0026rsquo;s Format tab.\nCharts are embedded in the same worksheet as the data by default, but it\u0026rsquo;s possible to put them on a separate worksheet. To do so, right-click the chart background and select Move chart. In the dialog that appears, select New sheet, give a name to the worksheet and click OK.\nCharts are linked to the data from which they come. This means that any change in the data will be immediately reflected in any derived chart.\nTypes of charts There are eleven major chart types (Column, Line, Pie, Bar, Area, Scatter, Stock, Surface, Doughnut, Bubble and Radar) and each has many subtypes.\nEach chart type has a purpose and requires data to be arranged in a particular way. So choosing the right chart is probably the most important decision. The main chart types and their purpose are presented below.\nColumn and bar charts A column or bar chart is a set of bars (usually rectangles) graphed over a horizontal and a vertical axis (also known as XY axes). Each bar is graphed over the corresponding category with a length proportional to the value of the category in the data series. Usually more than one data series is plotted, and bars corresponding to different series are differentiated with colours. 
In a column chart, categories appear horizontally and values appear vertically, whereas in a bar chart, categories appear vertically. Column charts, unlike bar charts, are suitable for emphasizing data variations over a period of time.\nExample. The next figure shows a column chart showing the evolution of fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (longest bars) and apples the cheapest (shortest bars) over time. You can also see that the prices of strawberries and bananas are decreasing, the prices of oranges are increasing and the prices of apples are almost stable.\nExcel offers a lot of shapes for the bars (rectangles, cylinders, cones, pyramids) in 2-D and 3-D, and allows you to stack bars. It is also possible to add error bars to the bars.\nExample. The animation below shows how to create a column chart for the apple prices evolution (one data series).\nAnd the animation below shows how to create a column chart for the fruit prices evolution (several data series).\nLine charts A line chart displays a series of data points called markers connected by straight line segments. Each marker is graphed over the corresponding category at a height proportional to the value of the category in the data series. It\u0026rsquo;s similar to a column chart but using markers at the end of bars instead of bars, and joining them with straight line segments. Line charts are suitable for displaying and comparing trends over a period of time.\nExample. The next figure shows a line chart showing the evolution of fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (highest markers) and apples the cheapest (lowest markers) over time. 
Also that the prices of strawberries and bananas are decreasing (lines with negative slope), the prices of oranges are increasing (line with positive slope) and the prices of apples first decrease and then increase.\nExcel offers different subtypes of line charts, with or without data points in 2-D and 3-D, and also allows you to stack lines.\nExample. The animation below shows how to create a line chart for the fruit prices evolution. Looking at the chart you can quickly realize which prices are increasing and which prices are decreasing.\nArea charts An area chart is similar to a line chart but fills the area between the line and the horizontal axis. Area charts are suitable for displaying the relative importance of values over time. Because the area between lines is filled in, the area chart puts greater emphasis on the magnitude of values and less emphasis on the flow of change over time.\nExample. The next figure shows an area chart showing the evolution of accumulated fruit prices. Looking at the chart you can quickly realize that strawberries are the most expensive (the largest area) and that accumulated prices are decreasing.\nExcel allows you to plot areas in 2-D or 3-D and also to stack areas.\nExample. The animation below shows how to create an area chart for the evolution of accumulated fruit prices.\nPie charts A pie chart is a circle divided into slices called sectors. Each sector represents a category of the data series and has an angle or area proportional to the quantity that corresponds to the category.\nPie charts are suitable for displaying the parts of a whole. Unlike the other charts presented so far, which can graph multiple data series, pie charts can graph just one data series.\nExample. The next figure shows a pie chart comparing fruit prices. 
Looking at the chart you can quickly realize that strawberries are the most expensive (biggest sector) and apples are the cheapest (smallest sector).\nAgain Excel has several subtypes that allow you to emphasize a part of the whole in 2-D or 3-D.\nExample. The animation below shows how to create a pie chart comparing the fruit prices of January.\nDoughnut charts Doughnut charts are similar to pie charts except for their ability to display more than one data series.\nExample. The next figure shows a doughnut chart comparing fruit prices in January and April. The inner doughnut corresponds to the prices of January and the outer to the prices of April. Looking at the chart you can quickly realize that, although the price of apples was lower in April than in January, it was relatively higher in April than in January, compared to the rest of the fruit prices.\nExample. The animation below shows how to create a doughnut chart comparing the fruit prices in January and April.\nXY Scatter charts An XY scatter chart is a point cloud graphed using Cartesian coordinates. Each point corresponds to a pair of values. The first value of the pair determines the position on the horizontal axis and the second value of the pair determines the position on the vertical axis. XY Scatter charts are suitable for displaying correlation among the data pairs of two numeric variables.\nExample. The next figure shows an XY Scatter chart relating banana and strawberry prices. Looking at the chart you can quickly realize that there is a positive correlation (when the banana price increases, the strawberry price increases too).\nExample. The animation below shows how to create an XY Scatter chart relating banana and strawberry prices.\nHistograms A histogram is a graphical representation of the distribution of numerical data. It\u0026rsquo;s similar to a column chart but data values are grouped into interval classes and each bar represents a class. 
Histogram charts are suitable for displaying the frequency of data values in one numeric variable.\nTo plot a histogram you must first load the Analysis ToolPak add-in.\nExample. The animation below shows how to create a histogram of the grades in a course.\nChart design Changing the data source You can change the data range graphed in a chart at any time by clicking the Select Data button of the Data panel on the ribbon\u0026rsquo;s Design tab. This brings up a dialog where you can select the new data range, switch row and column series, add new data series to graph and their labels, remove or edit existing data series or change the order in which they are graphed in the chart.\nObserve that it is possible to plot data from separate ranges in the same chart.\nExample. The animation below shows how to add the orange prices data series to a column chart for the apple prices evolution.\nSwitching rows and columns When Excel creates a new chart with x and y axes, it automatically graphs the data by rows in the selected range so that the column headings appear along the horizontal axis and the row headings appear in the legend. If you want to switch from row series to column series, that is, to make the row headings appear on the horizontal axis and the column headings appear in the legend, click the Switch Row/Column button of the Data panel on the ribbon\u0026rsquo;s Design tab.\nExample. The animation below shows how to switch from row series to column series in a column chart for the fruit prices evolution.\nChart layout After creating a chart you can add new layout elements like chart titles, axis titles, legends, data labels, grids, trend lines, error bars, etc. or modify the existing ones.\nTo format any element of a chart, right-click the element (bar, line, title, axis, legend, etc.) and select the corresponding option at the bottom of the contextual menu. 
This will open a dialog where you can perform the desired changes for the selected element.\nTitles You can add a title to the chart by selecting the chart and clicking the Chart Title button of the Labels panel on the ribbon\u0026rsquo;s Layout tab. That will show a drop-down menu that lets you choose between a centered overlay title (inside the chart area) or a title above the chart (outside the chart area).\nExample. The animation below shows how to add a title to a column chart for the fruit prices evolution and how to change the font colour.\nAxes You can add a title to the horizontal or vertical axes by selecting the chart and clicking the Axis Title button of the Labels panel on the ribbon\u0026rsquo;s Layout tab.\nExample. The animation below shows how to add a title to the horizontal and vertical axes of a column chart for the fruit prices evolution. The vertical axis title is rotated 90 degrees.\nAxis scales are one of the most important parts of a chart. Excel allows you to configure the axis scale by setting the minimum and maximum shown in the axis, the major and minor units, the format of tick marks (small lines intersecting the axis that indicate categories, scale units or chart data series) and their labels, or even the scale type (linear by default or logarithmic). To configure an axis, right-click any label of the axis (not the axis title) and select the Format Axis option from the contextual menu. This will open a dialog with a lot of axis options. Change whatever you want and click Close.\nExample. The animation below shows how to change the scales of the horizontal and vertical axes of a column chart for the apple prices evolution. Observe that in the original chart the minimum value of the vertical axis scale is 1.26, which magnifies the differences between monthly prices. To avoid that, the minimum value of the vertical scale is set to €0, and the major unit is set to €0.1. Also the format of the tick mark labels is changed to currency with two decimal places. 
On the other hand, the tick mark labels of the horizontal axis are rotated 30 degrees counterclockwise.\nGrid A grid is composed of horizontal or vertical lines (usually equally spaced) over the axes. Grids are helpful to mark out more precisely the position of markers, bars, lines or other chart elements in the axis scales.\nExcel allows you to plot both horizontal and vertical grid lines for major and minor tick marks. To plot vertical grid lines, right-click any label of the horizontal axis and select the Add Major Gridlines option for drawing lines over the major tick marks, or Add Minor Gridlines for drawing lines over the minor tick marks. To plot horizontal grid lines do the same but right-click any label of the vertical axis. Once the grid lines are plotted you can change their format by right-clicking any label of the axis and selecting the Format Major Gridlines or Format Minor Gridlines option.\nExample. The animation below shows how to add vertical major grid lines and horizontal minor grid lines. It also shows how to change the line style of minor grid lines.\nLegends A legend is a key that identifies patterns, colors, or symbols associated with the markers of a chart data series. The legend shows the data series name corresponding to each data marker.\nExcel usually plots a legend to the right of the chart but it\u0026rsquo;s possible to move the legend to another position or to remove it. To plot the legend of a chart click the Legend button of the Labels panel on the ribbon\u0026rsquo;s Layout tab. This shows a drop-down menu with different positions for the legend. After plotting the legend, if you want to format it, right-click it and select Format Legend. This will open a dialog where you can choose the legend position, the frame and background colours and many other legend aspects. Finally, if you want to remove a legend, just select it and press the Delete key.\nExample. 
The animation below shows how to add a legend for the fruits to the right of a column chart with the fruit prices evolution. It also shows how to plot a frame around the legend and how to move the legend to the top.\nData series The appearance of any graphic element used to represent a data series in a chart (bars, markers, lines, sectors, etc.) can be easily changed. To format the graphic element corresponding to a data series, right-click it and select the Format Data Series option. This will open a dialog where you can change the shape, border and background colours, space between elements, and many other aspects. It\u0026rsquo;s also possible to format only one element of the series. To do that, click it twice (two separate clicks, not a double-click), then right-click it and select the Format Data Point option.\nExample. The animation below shows how to change the background colour of the orange bars in a column chart for the fruit prices evolution. It also shows how to add a glow effect over the highest bar.\nData labels Sometimes it is useful to plot the values of a data series next to their bars, markers, lines, sectors or other chart elements. To plot the values of a data series, right-click the chart element (bar, marker, line, sector, etc.) corresponding to the data series and select the Data Labels option. This will plot the value corresponding to each bar, marker, sector, etc. close to it.\nExample. The animation below shows how to add a legend for the fruits to the right of a column chart with the fruit prices evolution. It also shows how to plot a frame around the legend and how to move the legend to the top.\nChart styles Finally, the Chart styles panel on the ribbon\u0026rsquo;s Design tab has many predefined chart styles that combine different colours for graphic elements and backgrounds. 
Applying one of those styles is as easy as selecting the chart and clicking the desired style.\nAlso, the Shape styles panel on the ribbon\u0026rsquo;s Format tab has predefined styles for the background area and frame of the chart.\nExample. The animation below shows how to apply some chart and shape styles to a column chart with the fruit prices evolution.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"73bbf06356c8b605d786a699a05bfcb7","permalink":"/en/teaching/excel/manual/charts/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/charts/","section":"teaching","summary":" ","tags":["Excel"],"title":"Plotting Charts","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Probability distribution of a continuous random variable Continuous random variables, unlike discrete random variables, can take any value in a real interval. Thus the range of a continuous random variable is infinite and uncountable.\nSuch a density of values makes it impossible to compute the probability for each one of them, and therefore, it’s not possible to define a probabilistic model through a probability function as with discrete random variables.\nBesides, the measurement of a continuous random variable is usually limited by the precision of the measuring instrument. For instance, when somebody says that he or she is 1.68 meters tall, his or her true height is not exactly 1.68 meters, because the precision of the measuring instrument is only centimeters (two decimal places). This means that the true height of that person is between 1.675 and 1.685 meters.\nHence, for continuous variables, it makes no sense to calculate the probability of an isolated value, and we will calculate probabilities for intervals.\nProbability density function To model the probability distribution of a continuous random variable we use a probability density function.\nDefinition - Probability density function. 
The probability density function of a continuous random variable $X$ is a function $f(x)$ that meets the following conditions:\nIt is non-negative: $f(x)\\geq 0$ $\\forall x\\in \\mathbb{R}$,\nThe area bounded by the curve of the density function and the x-axis is equal to 1, that is,\n$$\\int_{-\\infty}^{\\infty} f(x)\\; dx = 1.$$\nThe probability that $X$ assumes a value between $a$ and $b$ is equal to the area bounded by the density function and the x-axis from $a$ to $b$, that is,\n$$P(a\\leq X\\leq b) = \\int_a^b f(x)\\; dx$$\nThe probability density function measures the relative likelihood of every value, but $f(x)$ is not the probability of $x$, because $P(X=x)=0$ for every value $x$ by definition. Distribution function As with discrete random variables, for continuous random variables it makes sense to calculate cumulative probabilities.\nDefinition - Distribution function. The distribution function of a continuous random variable $X$ is a function $F(x)$ that maps every value $a$ to the probability that $X$ takes on a value less than or equal to $a$, that is,\n$$F(a) = P(X\\leq a) = \\int_{-\\infty}^{a} f(x)\\; dx.$$\nProbabilities as areas To calculate probabilities with a continuous random variable we measure the area bounded by the probability density function and the x-axis in an interval.\nThis area can be calculated by integrating the density function or by subtracting values of the distribution function, which is easier,\n$$P(a\\leq X\\leq b) = \\int_a^b f(x)\\; dx = F(b)-F(a)$$\nExample. 
Given the following function\n$$ f(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$} \\newline e^{-x} \u0026amp; \\mbox{if $x\\geq 0$}, \\end{cases} $$\nlet’s check that it is a density function.\nAs this function is clearly non-negative, we have to check that the total area bounded by the curve and the x-axis is 1.\n$$ \\begin{align*} \\int_{-\\infty}^\\infty f(x)\\;dx \u0026amp;= \\int_{-\\infty}^0 f(x)\\;dx +\\int_0^\\infty f(x)\\;dx = \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty e^{-x}\\;dx =\\newline \u0026amp;= \\left[-e^{-x}\\right]^{\\infty}_0 = -e^{-\\infty}+e^0 = 1. \\end{align*} $$\nNow, let’s calculate the probability of $X$ having a value between 0 and 2.\n$$ \\begin{align*} P(0\\leq X\\leq 2) \u0026amp;= \\int_0^2 f(x)\\;dx = \\int_0^2 e^{-x}\\;dx = \\left[-e^{-x}\\right]^2_0 = -e^{-2}+e^0 = 0.8646. \\end{align*} $$\nPopulation statistics The calculation of the population statistics is similar to the case of discrete variables, but using the density function instead of the probability function, and replacing the discrete sum with an integral.\nThe most important are:\nDefinition - Continuous random variable mean The mean or the expected value of a continuous random variable $X$ is the integral of the product of its values and its density function:\n$$\\mu = E(X) = \\int_{-\\infty}^\\infty x f(x)\\; dx$$\nDefinition - Continuous random variable variance and standard deviation The variance of a continuous random variable $X$ is the integral of the product of its squared values and its density function, minus the squared mean:\n$$\\sigma^2 = Var(X) = \\int_{-\\infty}^\\infty x^2f(x)\\; dx -\\mu^2$$\nThe standard deviation of a random variable $X$ is the square root of the variance:\n$$\\sigma = +\\sqrt{\\sigma^2}$$\nExample. 
Let $X$ be a variable with the following probability density function\n$$ f(x) = \\begin{cases} 0 \u0026amp; \\mbox{if $x\u0026lt;0$}\\newline e^{-x} \u0026amp; \\mbox{if $x\\geq 0$} \\end{cases} $$\nThe mean is\n$$ \\begin{aligned} \\mu \u0026amp;= \\int_{-\\infty}^\\infty xf(x)\\;dx = \\int_{-\\infty}^0 xf(x)\\;dx +\\int_0^\\infty xf(x)\\;dx = \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty xe^{-x}\\;dx =\\newline \u0026amp;= \\left[-e^{-x}(1+x)\\right]_0^{\\infty} = 1. \\end{aligned} $$\nand the variance is\n$$ \\begin{aligned} \\sigma^2 \u0026amp;= \\int_{-\\infty}^\\infty x^2f(x)\\;dx -\\mu^2 = \\int_{-\\infty}^0 x^2f(x)\\;dx +\\int_0^\\infty x^2f(x)\\;dx -\\mu^2 = \\newline \u0026amp;= \\int_{-\\infty}^0 0\\;dx +\\int_0^\\infty x^2e^{-x}\\;dx -\\mu^2= \\left[-e^{-x}(x^2+2x+2)\\right]^{\\infty}_0 - 1^2 = \\newline \u0026amp;= 2e^0-1 = 1. \\end{aligned} $$\nContinuous probability distribution models According to the type of experiment where the random variable is measured, there are different probability distribution models. The most common are\nContinuous uniform. Normal. Student’s T. Chi-square. Fisher-Snedecor’s F. Continuous uniform distribution When all the values of a random variable $X$ have equal probability, the probability distribution of $X$ is uniform.\nDefinition \u0026ndash; Continuous uniform distribution $U(a,b)$. A continuous random variable $X$ follows a uniform probability distribution model with parameters $a$ and $b$, denoted $X\\sim U(a,b)$, if its range is $\\mbox{Ran}(X) = [a,b]$ and its density function is\n$$f(x)= \\frac{1}{b-a}\\quad \\forall x\\in [a,b]$$\nObserve that $a$ and $b$ are the minimum and the maximum of the range respectively, and that the density function is constant.\nThe mean and the variance are $$\\mu = \\frac{a+b}{2}$$ and $$\\sigma^2=\\frac{(b-a)^2}{12}.$$\nExample. 
The generation of a random number between 0 and 1 follows a continuous uniform distribution $U(0,1)$.\nAs the density function is constant, the distribution function has a linear growth.\nExample. A bus has a frequency of 15 minutes. Assuming that a person can arrive at the bus station at any time, what is the probability of waiting for the bus between 5 and 10 minutes?\nIn this case, the variable $X$ that measures the waiting time follows a continuous uniform distribution $U(0,15)$ as any waiting time between 0 and 15 is equally likely.\nThen, the probability of waiting between 5 and 10 minutes is\n$$ \\begin{aligned} P(5\\leq X\\leq 10) \u0026amp;= \\int_{5}^{10} \\frac{1}{15}\\;dx = \\left[\\frac{x}{15}\\right]^{10}_5 = \\newline \u0026amp;= \\frac{10}{15}-\\frac{5}{15} =\\frac{1}{3}. \\end{aligned} $$\nAnd the expected waiting time (the mean) is $\\mu=\\frac{0+15}{2}=7.5$ minutes.\nNormal distribution The normal distribution model is, without a doubt, the most important continuous distribution model as it is the most common in nature.\nDefinition - Normal distribution $N(\\mu,\\sigma)$. A continuous random variable $X$ follows a normal probability distribution model with parameters $\\mu$ and $\\sigma$, denoted $X\\sim N(\\mu,\\sigma)$, if its range is $\\mbox{Ran}(X) = (-\\infty,\\infty)$ and its density function is\n$$f(x)= \\frac{1}{\\sigma\\sqrt{2\\pi}}e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}}.$$\nThe two parameters $\\mu$ and $\\sigma$ are the mean and the standard deviation of the population respectively.\nThe plot of the probability density function of a normal distribution $N(\\mu,\\sigma)$ is bell shaped and it is known as a Gauss bell.\nThe bell shape depends on the mean $\\mu$ and the standard deviation $\\sigma$:\nThe mean $\\mu$ sets the center of the bell. The standard deviation $\\sigma$ sets the width of the bell. 
The plot of the distribution function of a normal distribution is S shaped.\nNormal distribution properties It is symmetric with respect to the mean, and therefore, the coefficient of skewness is zero, $g_1=0$. It is mesokurtic, as the density function is bell shaped, and so, the coefficient of kurtosis is zero, $g_2=0$. The mean, median and mode are the same $$\\mu = Me = Mo.$$ It asymptotically approaches 0 when $x$ tends to $\\pm \\infty$. $P(\\mu-\\sigma \\leq X \\leq \\mu+\\sigma) \\approx 0.68$ $P(\\mu-2\\sigma \\leq X \\leq \\mu+2\\sigma) \\approx 0.95$ $P(\\mu-3\\sigma \\leq X \\leq \\mu+3\\sigma) \\approx 0.997$ Example. It is known that the cholesterol level in females of age between 40 and 50 follows a normal distribution with mean 210 mg/dl and standard deviation 20 mg/dl.\nAccording to the Gauss bell properties, this means that\nAbout 68% of females have a cholesterol level between $210\\pm 20$ mg/dl, i.e., between 190 and 230 mg/dl. About 95% of females have a cholesterol level between $210\\pm 2\\cdot 20$ mg/dl, i.e., between 170 and 250 mg/dl. About 99.7% of females have a cholesterol level between $210\\pm 3\\cdot 20$ mg/dl, i.e., between 150 and 270 mg/dl. Example of blood analysis. In blood analysis it is common to use the interval $\\mu\\pm 2\\sigma$ to detect possible pathologies. In the case of cholesterol, this interval is $[170\\text{ mg/dl}, 250\\text{ mg/dl}]$.\nThus, when a woman between 40 and 50 years of age has a cholesterol level outside this interval, it’s common to suspect some pathology. 
However, this person could be healthy, since about 5% of healthy women have a cholesterol level outside this interval.\nThe central limit theorem This behavior is common in many physical and biological variables in nature.\nIf you think about the distribution of height, for instance, you can check that most people in the population have a height around the mean, but as the heights move away from the mean, both below and above it, there are fewer and fewer people with such heights.\nThe explanation for this behavior is the central limit theorem, that we will see in the next chapter; it states that a continuous random variable whose values depend on a huge number of independent factors adding their effects approximately follows a normal distribution.\nThe standard normal distribution $N(0,1)$ The most important normal distribution has mean zero, $\\mu=0$, and standard deviation one, $\\sigma=1$. It is known as the standard normal distribution and is usually represented as $Z\\sim N(0,1)$.\nCalculation of probabilities with the normal distribution To avoid integrating the normal density function to compute probabilities, it’s common to use the distribution function, which is given in a tabular format like the one below. For instance, to calculate $P(Z\\leq 0.52)$\n0.00 0.01 0.02 \u0026hellip; 0.0 0.5000 0.5040 0.5080 \u0026hellip; 0.1 0.5398 0.5438 0.5478 \u0026hellip; 0.2 0.5793 0.5832 0.5871 \u0026hellip; 0.3 0.6179 0.6217 0.6255 \u0026hellip; 0.4 0.6554 0.6591 0.6628 \u0026hellip; 0.5 0.6915 0.6950 0.6985 \u0026hellip; ⋮ ⋮ ⋮ ⋮ ⋮ $$0.52 \\rightarrow \\mbox{row }0.5 + \\mbox{column }0.02$$\nTo compute cumulative probabilities to the right of a value, we can apply the rule for the complement event. 
For instance,\n$$P(Z\u0026gt;0.52) =1-P(Z\\leq 0.52) = 1-F(0.52) = 1 - 0.6985 = 0.3015.$$\nStandardization We have seen how to use the table of the standard normal distribution function to compute probabilities, but what should we do when the normal distribution is not the standard one?\nIn that case we can use standardization to transform any normal distribution into the standard normal distribution.\nTheorem - Standardization. If $X$ is a continuous random variable that follows a normal probability distribution model with mean $\\mu$ and standard deviation $\\sigma$, $X\\sim N(\\mu,\\sigma)$, then the variable that results from subtracting $\\mu$ from $X$ and dividing by $\\sigma$ follows a standard normal probability distribution,\n$$X\\sim N(\\mu,\\sigma) \\Rightarrow Z=\\frac{X-\\mu}{\\sigma}\\sim N(0,1).$$\nThus, to compute probabilities with a non-standard normal distribution, we first have to standardize the variable before using the table of the standard normal distribution function.\nExample. Assume that the grade of an exam $X$ follows a normal probability distribution model $N(\\mu=6,\\sigma=1.5)$. What percentage of students didn’t pass the exam, that is, got a grade below 5?\nAs $X$ follows a non-standard normal distribution model, we have to apply standardization first, $Z=\\displaystyle \\frac{X-\\mu}{\\sigma} = \\frac{X-6}{1.5}$,\n$$ P(X\u0026lt;5) = P\\left(\\frac{X-6}{1.5}\u0026lt;\\frac{5-6}{1.5}\\right) = P(Z\u0026lt;-0.67). $$\nThen we can use the table of the standard normal distribution function,\n$$P(Z\u0026lt;-0.67) = F(-0.67) = 0.2514.$$\nTherefore, $25.14\\%$ of students didn’t pass the exam.\nChi-square distribution Definition - Chi-square distribution $\\chi^2(n)$. 
Given $n$ independent random variables $Z_1,\\ldots,Z_n$, all of them following a standard normal probability distribution, then the variable\n$$\\chi^2(n) = Z_1^2+\\cdots +Z_n^2,$$\nfollows a chi-square probability distribution with $n$ degrees of freedom.\nIts range is $\\mathbb{R}^+$ and its mean and variance are $\\mu = n$ and $\\sigma^2 = 2n$.\nExample. Below are plotted the density functions of some chi-square distribution models.\nChi-square distribution properties The range is non-negative. If $X\\sim \\chi^2(n)$ and $Y\\sim \\chi^2(m)$ are independent, then $$X+Y \\sim \\chi^2(n+m).$$ It asymptotically approaches a normal distribution as the degrees of freedom increase. As we will see in the next chapter, the chi-square distribution plays an important role in the estimation of the population variance and in the study of relations between qualitative variables.\nStudent’s t distribution Definition - Student’s t distribution $T(n)$. Given a variable $Z$ following a standard normal distribution model, $Z\\sim N(0,1)$, and a variable $X$ following a chi-square distribution model with $n$ degrees of freedom, $X\\sim \\chi^2(n)$, independent of each other, the variable\n$$T = \\frac{Z}{\\sqrt{X/n}},$$\nfollows a Student’s t probability distribution model with $n$ degrees of freedom.\nIts range is $\\mathbb{R}$ and its mean and variance are $$\\mu = 0$$ and $$\\sigma^2 = \\frac{n}{n-2}$$ if $n\u0026gt;2$.\nExample. Below are plotted the density functions of some Student\u0026rsquo;s t distribution models.\nStudent’s t distribution properties The mean, the median and the mode are the same, $\\mu=Me=Mo$. It is symmetric, $g_1=0$. It asymptotically approaches the standard normal distribution as the degrees of freedom increase. In practice for $n\\geq 30$ both distributions are approximately the same. 
$$T(n)\\stackrel{n\\rightarrow \\infty}{\\approx}N(0,1).$$ As we will see in the next chapter, the Student’s t distribution plays an important role in the estimation of the population mean.\nFisher-Snedecor’s F distribution Definition - Fisher-Snedecor’s F distribution $F(m,n)$. Given two independent variables $X$ and $Y$ both following a chi-square probability distribution model with $m$ and $n$ degrees of freedom respectively, $X\\sim \\chi^2(m)$ and $Y\\sim \\chi^2(n)$, then the variable\n$$F = \\frac{X/m}{Y/n},$$\nfollows a Fisher-Snedecor’s F probability distribution model with $m$ and $n$ degrees of freedom.\nIts range is $\\mathbb{R}^+$ and its mean and variance are $$\\mu = \\frac{n}{n-2}$$ if $n\u0026gt;2$ and $$\\sigma^2 =\\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}$$ if $n\u0026gt;4$.\nExample. Below are plotted the density functions of some Fisher-Snedecor\u0026rsquo;s F distribution models.\nFisher-Snedecor’s F distribution properties The range is non-negative. It satisfies $$F(m,n) =\\frac{1}{F(n,m)}.$$ Thus, if we name $f(m,n)_p$ the value that satisfies $P(F(m,n)\\leq f(m,n)_p)=p$, then $$f(m,n)_p =\\frac{1}{f(n,m)_{1-p}}$$ which is helpful in order to compute probabilities from the table of the distribution function. As we will see in the next chapter, the Fisher-Snedecor’s F distribution plays an important role in the comparison of population variances and in the analysis of variance test (ANOVA).\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"f27b27994d49369e7085870ab298be38","permalink":"/en/teaching/statistics/manual/continuous-random-variables/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/manual/continuous-random-variables/","section":"teaching","summary":"Probability distribution of a continuous random variable Continuous random variables, unlike discrete random variables, can take any value in a real interval. 
Thus the range of a continuous random variable is infinite and uncountable.","tags":["Statistics","Biostatistics","Random Variables"],"title":"Continuous Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Economics"],"content":"A database is an organised collection of data. Usually databases are composed of records that contain information about the same object (person, company, product, etc.), and records are composed of fields that contain each piece of information (name, address, phone number, price, etc.).\nExample The next table shows a student database with the fields First name, Last name, Address, City, Birth date, Average grade and Passed credits.\nFirst name Last name Address City Birth date Average grade Passed credits María Sánchez García c. Estrella, 9 Madrid 23/10/1994 5,8 78 Carlos Pérez López c. Bravo Murillo, 34 3º-D Madrid 16/08/1993 7,9 123 Luis González Roca c. Antonio López, 67 1º-A Madrid 07/07/1995 8,2 45 Camen Aguirre Jordán c. Espada, 12 4º-C Sevilla 06/03/1994 4,2 28 Luisa Martín Garrido c. Cervantes, 14 Albacete 22/01/1994 6,7 54 Alberto Pintado Marín c. Arroyo, 27 2º-C Sevilla 10/03/1995 4,1 12 Marina Gómez Gómez c. Velázquez 28 4º-A Madrid 12/04/1994 7,7 62 Javier Yagüe Pinzón c. Rosales, 76 8º-B Madrid 18/12/1993 6,1 82 Lucas Guerrero Monzón c. Isaac Peral, 30 Bajo Albacete 12/01/1995 5,4 32 Database creation in Excel Excel allows you to define databases as tables where fields are defined in columns and records in rows. The first row of the table contains labels for each field. These tables are also called data lists.\nTo create a data list, first enter the names of the fields in the first row of the table, each in one column. This first row with the field names is the header row. Field names must be unique and there mustn\u0026rsquo;t be blank cells in the header row. After creating the fields, enter the data of the first record in the appropriate columns of the row immediately below the one containing the field names. 
For Excel to recognise this table as a data list, click the Format as Table button on the ribbon’s Home tab and then click a thumbnail of one of the table styles in the drop-down gallery.\nAfter that you can enter the remaining records, one per row. After entering the data of a field, press the Tab key to go to the next field of the same record, or to the first field of the next record if you are in the last field of a record.\nExample. The animation below shows how to create a data list of students with the fields First name, Last name, Address, City, Birth date, Average grade and Passed credits.\nAfter creating a data list Excel will give a name to it, but it is advisable to give it a descriptive name (see the Naming cells and ranges section).\nData validation When entering data into a data list, it is important to validate the data to maintain database integrity. Data validation allows you to specify which type and range of data are accepted by a cell or field (column). To apply a validation rule to a field, select the field column of the data list and click the Data validation button of the Data tools panel on the ribbon\u0026rsquo;s Data tab. In the dialog that appears, select the validation criteria from the drop-down list of the Settings tab:\nWhole number allows only integer numbers between a specified minimum and maximum or greater or less than a specified number. Decimal allows decimal numbers between a specified minimum and maximum or greater or less than a specified number. List allows a list of defined entries. Date allows dates between two specified dates or before or after a specified date. Time allows times between two specified times or before or after a specified time. Text length allows text with a restricted length. After selecting the validation criteria, enter the corresponding parameters (minimum or maximum numbers, dates, times or range with the entries of the list). 
You can also define an input message in the Input Message tab and an error message in the Error Alert tab that will be shown if an invalid entry is entered in the field.\nExample. The animation below shows how to create a validation rule for the Average grade field in a data list of students.\nImporting databases Excel offers the possibility to import data from diverse sources like csv text files, XML files, relational databases like Access or web data sources.\nImporting data from csv text files To see how to import data from a csv text file visit the section Import from csv format.\nImporting from web data sources There are many web pages that offer open data in a format suitable for importing into Excel. To import data from a web data source click the From Web button of the Get External Data panel on the ribbon\u0026rsquo;s Data tab. This opens a web browser where you must enter the URL of the page with the data source. When the browser shows the data table, some yellow arrows appear that allow you to select the rows and columns of the table to import.\nExample The animation below shows how to import the IBEX 35 series from Yahoo Finance.\nImporting data from Quandl Quandl is a finance and economic data repository with hundreds of open data series. It\u0026rsquo;s possible to import data from Quandl to Excel easily, but you need the Quandl add-in for Excel. To install the Quandl add-in for Excel follow these instructions.\nAfter installing the add-in a new tab labelled Quandl appears in the ribbon. To import a data series from Quandl, first search for the data series by clicking the Search button on the ribbon\u0026rsquo;s Quandl tab, enter some keywords for the search and click the Show Results button, select the desired data series from the search results, click the Insert Selected Codes button and click the Close button. This will insert the Quandl code of the data series (if you know the Quandl code of the data series you can skip the search and enter it directly in a cell). 
Finally, select the cell with the Quandl code and click the Download button on the ribbon\u0026rsquo;s Quandl tab. This will download the data series and put it in a range below the cell that contains the Quandl code.\nExample The animation below shows how to import the IBEX 35 series from Quandl.\nData sorting To sort the data list records on a single field, you simply click that field’s AutoFilter button (the button with the triangle that appears to the right of the header) and then click the appropriate sort option on its drop-down list:\nSort A to Z or Sort Z to A in a text field. Sort Smallest to Largest or Sort Largest to Smallest in a number field. Sort Oldest to Newest or Sort Newest to Oldest in a date field. Another option to sort a data list on a field is to select a cell of the field column and click the Sort A to Z button of the Sort \u0026amp; Filter panel on the ribbon’s Data tab, to sort ascending, or the Sort Z to A button to sort descending.\nExcel will then reorder all the records in the data list according to the ascending or descending order selected.\nExample. The animation below shows how to sort a students database. First ascending on the Birth date field, next descending on the Average grade field, and finally ascending on the Last name field.\nIf you need to sort a data list on more than one field, select a cell of the data list and click the Sort button of the Sort \u0026amp; Filter panel on the ribbon\u0026rsquo;s Data tab. Then, in the dialog that appears, select the first sorting field column and the sorting order (ascending or descending), next the second sorting field column and the sorting order, and so on.\nExample. The animation below shows how to sort a students database on the fields City ascending and Average grade descending.\nYou can also sort a general range of cells by indicating the names of the columns instead of the field names.\nSummarizing data With large tables or data lists it is difficult to extract relevant information. 
For that purpose, Excel provides several methods for summarizing data.\nTotaling and subtotaling fields A common operation is to apply a function to a whole field in a data list, for instance the SUM function to add up or the AVERAGE function to average all the values in a field column. This can be done by activating the Total row check box of the Table Style Options panel on the ribbon\u0026rsquo;s Table Options tab. This will add a total row at the bottom of the table. By clicking any cell of this row you can choose which function to apply to the whole field.\nExample The animation below shows how to sum the passed credits of students in a students database. It also shows how to average the average grade.\nExcel also allows subtotaling a field by the categories of another field. This procedure does not work with data lists formatted as tables, so if a data list has been formatted as a table it first has to be converted to a range by selecting any cell of the table and clicking the Convert to Range button of the Tools panel on the ribbon\u0026rsquo;s Table Tools - Design tab. After that, you have to sort the data list by the field with the categories to summarize (see the Data sorting section). Finally, to subtotal a data list click the Subtotal button of the Outline panel on the ribbon\u0026rsquo;s Data tab. This will display a dialog where you have to select the field with the categories in the At each change in drop-down menu, the function to apply (sum, count, average, etc.) in the Use function drop-down menu, check the fields to which to apply the subtotaling function in the Add subtotal to list, and click OK.\nExample The animation below shows how to subtotal the passed credits of students in a students database by the city where they live.\nPivot tables A pivot table is a powerful tool for exploring data. 
It helps you organise and summarize the raw data in your data list, revealing patterns or relationships that might not be obvious at first glance.\nTo create a pivot table click on any cell of a data list and then click the PivotTable button on the ribbon’s Insert tab. This displays a dialog where you can select the range for the pivot table (by default Excel selects the whole data list) and choose between placing the pivot table in a new worksheet (default) or in an existing worksheet (in this case you have to indicate in which cell). After clicking OK, a pane appears on the right side of the window with the following areas:\nReport Filter for the fields that enable you to page through the data summaries shown in the actual pivot table by filtering out sets of data; they act as the filters for the report. So, for example, if you designate the Year Field from a data list as a Report Filter, you can display data summaries in the pivot table for individual years or for all years represented in the data list. Column Labels for the fields that determine the arrangement of data shown in the columns of the pivot table. Row Labels for the fields that determine the arrangement of data shown in the rows of the pivot table. Values for the fields whose data are presented and summarized in the body cells of the pivot table. By default Excel will use the SUM function to summarize values. To use another function click the field and select the Value Field Settings option in the menu that appears. In the dialog that appears just select the function that you want to use for summarizing and click OK. Example The animation below shows how to create a pivot table for a students database. 
The pivot table shows and summarizes the passed credits by degrees on rows and by cities on columns.\nThe animation below shows how to arrange the previous pivot table to show the passed credits summarized first by city and then by degree and vice versa, both on rows.\nThe animation below shows how to arrange the previous pivot table to show, in addition to the passed credits, the average grade of students. The passed credits are summarized using the SUM function while the average grade is summarized using the AVERAGE function.\nThe animation below shows how to filter the previous pivot table to show only the values of course year 2014 and not to show the physics degree.\nTo change the format of a pivot table you can use the Layout panel on the ribbon\u0026rsquo;s PivotTable Tools - Design tab. This panel has four buttons:\nSubtotals Allows you to show subtotals at the top of groups, at the bottom of groups or not to show subtotals. Grand Totals Allows you to show grand totals for rows, for columns, for both rows and columns, or not to show grand totals. Report Layout Allows you to show the groups in compact form (all the grouping fields in the same column), in outline form (every grouping field in a different column) or in tabular form (like the outline form but adding extra rows for the subtotals). Blank rows Allows you to insert or not a blank row after each group. It\u0026rsquo;s also possible to apply a predefined style to a pivot table just by selecting the desired style from the PivotTable Styles panel on the ribbon\u0026rsquo;s PivotTable Tools - Design tab.\nExample The animation below shows how to format and how to apply a style to the previous pivot table.\nPivot chart Pivot tables can be accompanied by pivot charts, which are interactive charts where you can present and summarize data grouped by some fields like in a pivot table. 
To create a pivot chart from a pivot table, in the worksheet with the pivot table click the PivotChart button of the Tools panel on the ribbon\u0026rsquo;s PivotTable Tools - Options tab. This will show a dialog with the chart types. Select the desired chart type and click OK. After that Excel inserts a chart in the same worksheet as the pivot table reflecting the same information as the pivot table. From now on, any change in the pivot table will be reflected in the pivot chart.\nExample The animation below shows how to create a pivot chart from a pivot table for a students database.\nOf course, you can change the pivot chart layout like that of any other chart (see section Chart layout).\nData filtering With huge databases it\u0026rsquo;s difficult to find the desired information. To overcome this problem Excel provides several methods to filter the database. Filtering is the procedure for specifying the data that you want displayed in an Excel data list.\nApply a simple filter The easiest way to perform this basic type of filtering on a field is to click the AutoFilter button (the button with the triangle that appears to the right of the header). This displays a drop-down menu that contains at the end a list box with a complete listing of all entries made in that column, each with its own check box. In this list click the check box in front of the (Select All) option at the top of the field’s list box to clear the check boxes, then click each of the check boxes corresponding to the entries for the records you do want displayed in the filtered data list, and finally click OK. Excel then hides rows in the data list for all records except for those that contain the entries you just selected.\nExample The animation below shows how to filter the students of Sevilla and Albacete in a students database.\nTo perform more sophisticated filters you can use the other filter options of the AutoFilter button. 
These filter options depend on the type of entries in the field:\nIf the column only contains dates, the menu contains a Date Filters option with a submenu that allows you to filter dates equal to, before or after a given date; dates between two given dates; dates of today, yesterday and tomorrow; dates of this week, last week and next week; dates of this month, last month and next month; dates of this quarter, last quarter and next quarter; dates of this year, last year and next year; and dates in a specific period (quarter or month).\nIf the column contains only numbers or a mixture of dates with numbers, the menu contains a Number Filters option with a submenu that allows you to filter numbers equal or not equal to a given number; numbers greater than, greater than or equal to, less than, less than or equal to a given number; numbers between two given numbers; top 10 numbers; numbers above the average and numbers below the average.\nIf the column contains only text or a mixture of text, dates and numbers, the menu contains a Text Filters option with a submenu that allows you to filter text equal or not equal to a given text; text that begins or ends with a given text; and text that contains or does not contain a given text.\nIf the filter selected requires some parameter (date, number or text), a dialog appears where you must enter that data and click OK.\nExample The animation below shows how to filter the students born before 1/1/1995, with an average grade greater than or equal to 5, and whose name begins with M, in a students database.\nApply a complex filter Simple filters are enough in most cases, but sometimes you need to filter data according to more complex criteria. Fortunately Excel provides a method to perform filters based on calculated criteria with formulas.\nTo perform a filter with calculated criteria first you have to specify the criteria somewhere in the worksheet that contains the data list. 
The criteria must have a cell header and a logical formula in the cell just below. In the logical formula you can use functions and references to the cells, but it\u0026rsquo;s important to note that all references must be to cells in the first row of the data list. After that, to apply the filter you need to select a cell in the data list and click the Advanced button of the Sort \u0026amp; Filter panel on the ribbon\u0026rsquo;s Data tab. This shows a dialog where you have to enter the range of the data list (usually Excel recognises it automatically), the range of the filter criteria and click OK. Excel will apply the logical formula to every row of the data list and show only the records where the formula returns TRUE.\nExample The animation below shows how to filter the students with an average grade greater than or equal to 5, and a number of passed credits over the average, in a students database, using calculated criteria. Observe how the data list name and the field name are used to reference the column of passed credits in the average calculation.\nClear a filter To clear an active filter in a data list click the AutoFilter button of the column with the active filter and select the option Clear Filter. After that Excel will show all the records hidden by the removed filter, but the rest of the filters will remain active. To clear all the filters in a data list, select a cell of the data list and then click the Clear button of the Sort \u0026amp; Filter panel on the ribbon\u0026rsquo;s Data tab. This will show all the records of the data list.\nDatabase functions Excel has some predefined functions that can be applied to data lists. 
Some of them apply another function only to the records in a data list that match the criteria you specify.\nDefine a criteria The criteria must be defined in a range and must include at least one header with a field name that indicates the field whose values are to be evaluated and one cell just below with the value or expression to be used in the evaluation. The expression with the condition is a text string starting with a logical comparator (=,\u0026gt;,\u0026lt;,\u0026gt;=,\u0026lt;=,\u0026lt;\u0026gt;) or a text pattern with wildcards like the question mark ? (that matches any single character) or the asterisk * (that matches any character string). You can specify multiple conditions in different columns. If you want to apply the function to all the records of the data list, just leave the cell with the criteria conditions blank.\nDSUM function The DSUM function sums the values in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DSUM(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to add up (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to sum the passed credits of students from Madrid born in 1994 or after with an average grade greater than or equal to 6, in a students database.\nDCOUNT function The DCOUNT function counts the values in a numeric field (column) of records in a data list that match the criteria you specify. 
Its syntax is DCOUNT(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to count (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to count the students with an average grade greater than or equal to 6 whose name begins with L, in a students database.\nDMIN function The DMIN function returns the minimum in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DMIN(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to evaluate (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nDMAX function The DMAX function returns the maximum in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DMAX(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to evaluate (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to calculate the minimum and the maximum average grade of students from Madrid born before 1995, in a students database.\nDAVERAGE function The DAVERAGE function averages the values in a numeric field (column) of records in a data list that match the criteria you specify. 
Its syntax is DAVERAGE(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values to average (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to average the average grades of students from Madrid born in 1994 or after with an average grade greater than or equal to 6, in a students database.\nDSTDEVP function The DSTDEVP function calculates the standard deviation of the values in a numeric field (column) of records in a data list that match the criteria you specify. Its syntax is DSTDEVP(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the values whose standard deviation is calculated (it must be a numeric column) enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nExample The animation below shows how to calculate the standard deviation of the average grades of students from Madrid born before 1995, in a students database.\nDGET function The DGET function returns the value of a field (column) in the single record of a data list that matches the criteria you specify. Its syntax is DGET(database,field,criteria), where database is the range of the data list, field is the name of the field that contains the value to return enclosed in double quotes, and criteria is the range that contains the criteria with the conditions you specify.\nIf no record satisfies the criteria, the function returns a #VALUE! error, and if more than one record satisfies the criteria the function returns a #NUM! error.\nExample The animation below shows how to find the student with the highest grade in a student database.\nOther functions allow you to search for values in a list or table.\nVLOOKUP and HLOOKUP functions The VLOOKUP function finds things in a table or list by row. 
Its syntax is VLOOKUP (value, table, col-index, [approx-match]), where value is the value you want to look up, table is the range of the table or list in which to perform the search, col-index is the column number (starting with 1 for the left-most column of the table range) that contains the return value, and approx-match is an optional logical argument that specifies whether to find an approximate match (TRUE by default) or an exact match (FALSE). The function looks the value argument up in the first column of the table argument. If the approx-match argument is TRUE, the table should be sorted in ascending order by the first column (the column where the value is looked up) and the function will return the value of the col-index column in the row of the closest value that is less than or equal to value in the first column of the table range. If approx-match is FALSE, the function will search for the exact value in the first column and it will return the value of the col-index column in the row of the first matching value in the first column. If no value in the first column matches the value argument, the function will return a #N/A error.\nExample The animation below shows how to look up the phone number of a student in a students database.\nThe HLOOKUP function works like the VLOOKUP function but it performs a search by columns. 
Its syntax is HLOOKUP (value, table, row-index, [approx-match]), where value is the value you want to look up, table is the range of the table or list in which to perform the search, row-index is the row number (starting with 1 for the top-most row of the table range) that contains the return value, and approx-match is an optional logical argument that specifies whether to find an approximate match (TRUE by default) or an exact match (FALSE).\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"0bf14acea21eaa00b95813c4c2dc6e25","permalink":"/en/teaching/excel/manual/databases/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/excel/manual/databases/","section":"teaching","summary":" ","tags":["Excel"],"title":"Database Management","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Antiderivative of a function Definition - Antiderivative of a function. Given a function $f(x)$, the function $F(x)$ is an antiderivative or primitive function of $f$ if it satisfies that $F\u0026rsquo;(x)=f(x)$ $\\forall x \\in \\mathop{\\rm Dom}(f)$. Example. The function $F(x)=x^2$ is an antiderivative of the function $f(x)=2x$ as $F\u0026rsquo;(x)=2x$ on $\\mathbb{R}$.\nRoughly speaking, the calculation of antiderivatives is the reverse process of differentiation, and that is the reason for the name antiderivative.\nIndefinite integral of a function As two functions that differ by a constant term have the same derivative, if $F(x)$ is an antiderivative of $f(x)$, so will be any function of the form $F(x)+k$ $\\forall k \\in \\mathbb{R}$. This means that, when a function has an antiderivative, it has an infinite number of antiderivatives.\nDefinition - Indefinite integral. 
The indefinite integral of a function $f(x)$ is the set of all its antiderivatives; it is denoted by\n$$\\int{f(x)}\\,dx=F(x)+C$$ where $F(x)$ is an antiderivative of $f(x)$ and $C$ is a constant.\nExample. The indefinite integral of the function $f(x)=2x$ is $$\\int 2x\\, dx = x^2+C.$$\nInterpretation of the integral We have seen in a previous chapter that the derivative of a function is the instantaneous rate of change of the function. Thus, if we know the instantaneous rate of change of the function at any point, we can compute the change of the function.\nExample. What is the distance covered by a free-falling object?\nAssume that the only force acting upon an object dropped from rest is gravity, with an acceleration of $9.8$ m/s$^2$. As acceleration is the rate of change of the speed, and it is constant at every moment, the antiderivative of the acceleration is the speed of the object,\n$$v(t) = 9.8t \\mbox{ m/s}$$\nAnd as the speed is the rate of change of the distance covered by the object during the fall, the antiderivative of the speed is the distance covered by the object,\n$$s(t) = \\int 9.8t\\, dt = 9.8\\frac{t^2}{2}.$$\nThus, for instance, after 2 seconds, the distance covered is $s(2) = 9.8\\frac{2^2}{2} = 19.6$ m.\nLinearity of integration Given two integrable functions $f(x)$ and $g(x)$ and a constant $k \\in \\mathbb{R}$, it is satisfied that\n$\\int{(f(x)+g(x))}\\,dx=\\int{f(x)}\\,dx+\\int{g(x)}\\,dx$, $\\int{kf(x)}\\,dx=k\\int{f(x)}\\,dx$. This means that the integral of any linear combination of functions equals the same linear combination of the integrals of the functions.\nElementary integrals $\\int a\\,dx=ax+C$, with $a$ constant. $\\int x^n\\,dx=\\dfrac{x^{n+1}}{n+1}+C$ if $n\\neq -1$. $\\int \\dfrac{1}{x}\\, dx=\\ln\\vert x\\vert+C$. $\\int e^x\\,dx=e^x+C$. $\\int a^x\\,dx=\\dfrac{a^x}{\\ln a}+C$. $\\int \\sin x\\, dx=-\\cos x+C$. $\\int \\cos x\\, dx=\\sin x+C$. $\\int \\tan x\\, dx=\\ln\\vert\\sec x\\vert+C$. $\\int \\sec x\\, dx = \\ln\\vert\\sec x + \\tan x\\vert+C$. 
$\\int \\csc x\\, dx= \\ln\\vert\\csc x-\\cot x\\vert+C$. $\\int \\cot x \\, dx= \\ln\\vert\\sin x\\vert+C$. $\\int \\sec^2 x\\, dx= \\tan x+ C$. $\\int \\csc^2 x\\, dx= -\\cot x+ C$. $\\int \\sec x \\tan x\\, dx= \\sec x+ C$. $\\int \\csc x \\cot x\\, dx = -\\csc x +C$. $\\int \\dfrac{dx}{\\sqrt{a^2-x^2}}=\\arcsin\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{a^2+x^2}=\\dfrac{1}{a}\\arctan\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{x\\sqrt{x^2-a^2}}=\\dfrac{1}{a}\\sec^{-1}\\dfrac{x}{a}+C$. $\\int \\dfrac{dx}{a^2-x^2}=\\dfrac{1}{2a}\\ln\\left\\vert\\dfrac{x+a}{x-a}\\right\\vert+C$. Techniques of integration Unfortunately, unlike differential calculus, there is no foolproof procedure to compute the antiderivative of a function. However, there are some techniques that allow us to integrate some types of functions. The most common methods of integration are\nIntegration by parts Integration by reduction Integration by substitution Integration of rational functions Integration of trigonometric functions Integration by parts Theorem - Integration by parts. Given two differentiable functions $u(x)$ and $v(x)$,\n$$\\int{u(x)v\u0026rsquo;(x)}\\,dx=u(x)v(x)-\\int{u\u0026rsquo;(x)v(x)}\\,dx,$$\nor, writing $u\u0026rsquo;(x)dx=du$ and $v\u0026rsquo;(x)dx=dv$,\n$$\\int{u}\\,dv=uv-\\int{v}\\,du.$$\nProof From the rule for differentiating a product we have\n$$ (uv)\u0026rsquo; = u\u0026rsquo;v + uv\u0026rsquo; $$\nand integrating both sides we get\n$$ \\begin{gathered} \\int (uv)\u0026rsquo; \\, dx = \\int u\u0026rsquo;v \\, dx + \\int uv\u0026rsquo;\\, dx \\Rightarrow\\newline uv = \\int v\\,du + \\int u\\, dv \\Rightarrow\\newline \\int{u}\\,dv=uv-\\int{v}\\,du. \\end{gathered} $$\nTo apply this method we have to choose the functions $u$ and $dv$ in such a way that the final integral is easier to compute than the original one.\nExample. 
To integrate $\\int{x \\sin x}\\,dx$ we choose $u=x$ and $dv=\\sin x\\, dx$, so $du=dx$ and $v=-\\cos x$, getting $$\\int{x \\sin x}\\,dx=-x\\cos x-\\int (-\\cos x)\\,dx = -x\\cos x +\\sin x+C.$$ If we had chosen $u=\\sin x$ and $dv=x\\,dx$, we would have got a more difficult integral.\nIntegration by reduction The reduction technique is used when we have to apply integration by parts several times.\nIf we want to compute an antiderivative $I_{n}$ that depends on a natural number $n$, the reduction formulas allow us to write $I_{n}$ as a function of $I_{n-1}$, that is, we have a recurrence relation $$\\ I_{n}=f(I_{n-1},x,n)$$ so by computing the first antiderivative $I_0$ we should be able to compute the others.\nExample. To compute $I_{n}=\\int{x^ne^x}\\,dx$ applying integration by parts, we choose $u=x^n$ and $dv=e^x\\,dx$, so $du=nx^{n-1}\\,dx$ and $v=e^{x}$, getting\n$$\\ I_{n}=\\int{x^ne^x}\\,dx=x^ne^x-n\\int{x^{n-1}e^x}\\,dx=x^ne^x-nI_{n-1}.$$\nThus, for instance, for $n=3$ we have\n$$ \\begin{aligned} \\int x^3 e^x\\, dx \u0026amp;= I_3 = x^3e^x-3I_2 = x^3e^x-3(x^2e^x-2I_1) =\\newline \u0026amp;= x^3e^x-3(x^2e^x-2(xe^x-I_0)) = x^3e^x-3(x^2e^x-2(xe^x-e^x)) =\\newline \u0026amp;= e^x(x^3-3x^2+6x-6)+C. \\end{aligned} $$\nIntegration by substitution From the chain rule for differentiating the composition of two functions\n$$(f(g(x)))\u0026rsquo; = f\u0026rsquo;(g(x))g\u0026rsquo;(x),$$\nwe can make a change of variable $u=g(x)$, so $du=g\u0026rsquo;(x)dx$, and get\n$$\\int f\u0026rsquo;(g(x))g\u0026rsquo;(x)\\, dx = \\int f\u0026rsquo;(u)\\, du = f(u)+C = f(g(x))+C.$$\nExample. 
To compute the integral $\\int{\\dfrac{1}{x\\log x}}\\, dx$ we can make the substitution $u=\\log x$, so $du=\\frac{1}{x}dx$, and we have\n$$\\int \\frac{dx}{x\\log x}=\\int \\frac{1}{\\log x}\\frac{1}{x}\\,dx = \\int \\frac{1}{u}\\,du = \\log \\vert u\\vert+ C.$$\nFinally, undoing the substitution we get\n$$\\int \\frac{1}{x\\log x}\\,dx= \\log \\vert\\log x\\vert + C.$$\nIntegration of rational functions Partial fractions decomposition A rational function can be written as the sum of a polynomial (with an immediate antiderivative) plus a proper rational function, that is, a rational function in which the degree of the numerator is less than the degree of the denominator.\nOn the other hand, depending on the factorization of the denominator, a proper rational function can be expressed as a sum of simpler fractions of the following types\nDenominator with a single linear factor: $\\dfrac{A}{(x-a)}$ Denominator with a linear factor repeated $n$ times : $\\dfrac{A}{(x-a)^{n}}$ Denominator with a single quadratic factor: $\\dfrac{Ax+B}{x^2+cx+d}$ Denominator with a quadratic factor repeated $n$ times: $\\dfrac{Ax+B}{(x^2+cx+d)^n}$ Antiderivatives of partial fractions Using the linearity of integration, we can compute the antiderivative of a rational function from the antiderivatives of these partial fractions\n$$ \\begin{aligned} \\int \\frac{A}{x-a}\\,dx \u0026amp;= A\\log\\vert x-a\\vert+C,\\newline \\int \\frac{A}{(x-a)^n}\\,dx \u0026amp;= \\frac{-A}{(n-1)(x-a)^{n-1}}+C \\textrm{ if $n\\neq 1$},\\newline \\int \\frac{Ax+B}{x^2+cx+d}\\,dx \u0026amp;= \\frac{A}{2}\\log\\vert x^2+cx+d\\vert + \\frac{2B-Ac}{\\sqrt{4d-c^2}}\\arctan \\frac{2x+c}{\\sqrt{4d-c^2}}+C. \\end{aligned} $$\nIntegration of a rational function with a denominator with linear factors Example. Consider the function $f(x)=\\dfrac{x^2+3x-5}{x^3-3x+2}$.\nThe factorization of the denominator is $x^3-3x+2=(x-1)^2(x+2)$; it has a single linear factor $(x+2)$ and a linear factor $(x-1)$, repeated two times. 
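As a quick check of this factorization (a worked expansion added here for illustration), multiplying the factors back gives $(x-1)^2(x+2) = (x^2-2x+1)(x+2) = x^3+2x^2-2x^2-4x+x+2 = x^3-3x+2$. 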
In this case the decomposition in partial fractions is:\n$$ \\begin{aligned} \\frac{x^2+3x-5}{x^3-3x+2}\u0026amp;=\\frac{A}{x-1}+\\frac{B}{(x-1)^2}+\\frac{C}{x+2} = \\newline \u0026amp;= \\frac{A(x-1)(x+2)+ B(x+2)+C(x-1)^2}{(x-1)^2(x+2)} = \\newline \u0026amp;= \\frac{(A+C)x^2+(A+B-2C)x+(-2A+2B+C)}{(x-1)^2(x+2)} \\end{aligned} $$\nand equating the numerators we get $A=16/9$, $B=-1/3$ and $C=-7/9$, so\n$$\\frac{x^2+3x-5}{x^3-3x+2}= \\frac{16/9}{x-1}+\\frac{-1/3}{(x-1)^2}+\\frac{-7/9}{x+2}.$$\nFinally, integrating each partial fraction we have\n$$ \\begin{aligned} \\int \\frac{x^2+3x-5}{x^3-3x+2}\\, dx \u0026amp;= \\int \\frac{16/9}{x-1}\\,dx+\\int \\frac{-1/3}{(x-1)^2}\\,dx+\\int \\frac{-7/9}{x+2}\\,dx = \\newline \u0026amp;= \\frac{16}{9}\\int\\frac{1}{x-1}\\,dx-\\frac{1}{3}\\int(x-1)^{-2}\\,dx- \\frac{7}{9}\\int \\frac{1}{x+2}\\,dx = \\newline \u0026amp;= \\frac{16}{9}\\ln\\vert x-1\\vert+\\frac{1}{3(x-1)}-\\frac{7}{9}\\ln\\vert x+2\\vert+C. \\end{aligned} $$\nIntegration of a rational function with a denominator with simple quadratic factors Example. Consider the function $f(x)=\\dfrac{x+1}{x^2-4x+8}$.\nIn this case the denominator cannot be factorised as a product of linear factors, but we can write\n$$x^2-4x+8 = (x-2)^2+4,$$\nso\n$$ \\begin{aligned} \\int \\dfrac{x+1}{x^2-4x+8}\\, dx \u0026amp;= \\int \\dfrac{x-2+3}{(x-2)^2+4}\\,dx = \\newline \u0026amp;= \\int \\dfrac{x-2}{(x-2)^2+4}\\,dx + \\int \\dfrac{3}{(x-2)^2+4}\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\ln\\vert(x-2)^2+4\\vert + \\dfrac{3}{2}\\arctan\\left(\\frac{x-2}{2}\\right)+C. \\end{aligned} $$\nIntegration of trigonometric functions Integration of $\\sin^n x\\cos^m x$ with $n$ or $m$ odd If $f(x)=\\sin^n x\\cos^m x$ with $n$ or $m$ odd, then we can make the substitution $t=\\sin x$ or $t=\\cos x$, to convert the function into a polynomial. 
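As a sketch of why this works, assume $m=2k+1$ is odd, where $k$ is a non-negative integer introduced here only for illustration. One factor $\\cos x$ can be split off and the rest rewritten with $\\cos^2 x = 1-\\sin^2 x$, so that the substitution $t=\\sin x$, $dt=\\cos x\\,dx$ gives\n$$\\int \\sin^n x\\cos^{2k+1} x\\, dx = \\int \\sin^n x\\,(1-\\sin^2 x)^k\\cos x\\, dx = \\int t^n(1-t^2)^k\\, dt,$$\nwhich is the integral of a polynomial in $t$. The case of $n$ odd is analogous, with $t=\\cos x$ and $dt=-\\sin x\\,dx$.\n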
Example.\n$$\\int \\sin^2 x\\cos^3 x\\, dx = \\int \\sin^2 x\\cos^2 x\\cos x\\, dx = \\int \\sin^2 x(1-\\sin^2 x)\\cos x\\, dx,$$\nand making the substitution $t=\\sin x$, so $dt = \\cos x dx$, we have\n$$\\int \\sin^2 x(1-\\sin^2 x)\\cos x\\, dx = \\int t^2(1-t^2)\\, dt = \\int t^2-t^4 \\, dt = \\frac{t^3}{3}-\\frac{t^5}{5}+C.$$\nFinally, undoing the substitution we have\n$$\\int \\sin^2 x\\cos^3 x\\, dx = \\frac{\\sin^3 x}{3}-\\frac{\\sin^5 x}{5}+C.$$\nIntegration of $\\sin^n x\\cos^m x$ with $n$ and $m$ even If $f(x)=\\sin^n x\\cos^m x$ with $n$ and $m$ even, then we can make the following substitutions to simplify the integration\n$$ \\begin{aligned} \\sin^2 x \u0026amp;= \\frac{1}{2}(1-\\cos(2x))\\newline \\cos^2 x \u0026amp;= \\frac{1}{2}(1+\\cos(2x))\\newline \\sin x\\cos x \u0026amp;= \\frac{1}{2}\\sin(2x) \\end{aligned} $$\nExample.\n$$ \\begin{aligned} \\int \\sin^2 x\\cos^4 x\\, dx \u0026amp;= \\int (\\sin x\\cos x)^2\\cos^2 x\\, dx = \\int \\left(\\frac{1}{2}\\sin(2x)\\right)^2\\frac{1}{2}(1+\\cos(2x))\\,dx =\\newline \u0026amp;= \\frac{1}{8}\\int \\sin^2(2x)\\,dx+\\frac{1}{8}\\int \\sin^2(2x) \\cos(2x)\\,dx, \\end{aligned} $$\nthe first integral is of the same type and the second one of the previous type, so $$\\int \\sin^2 x\\cos^4 x\\, dx = \\frac{1}{32}x-\\frac{1}{32}\\sin(2x)+\\frac{1}{24}\\sin^3(2x).$$\nProducts of sines and cosines The equalities\n$$ \\begin{aligned} \\sin x\\cos y \u0026amp;= \\frac{1}{2}(\\sin(x-y)+\\sin(x+y))\\newline \\sin x\\sin y \u0026amp;= \\frac{1}{2}(\\cos(x-y)-\\cos(x+y))\\newline \\cos x\\cos y \u0026amp;= \\frac{1}{2}(\\cos(x-y)+\\cos(x+y)) \\end{aligned} $$\ntransform products in sums, simplifying the integration.\nExample.\n$$ \\begin{aligned} \\int \\sin x\\cos 2x\\, dx \u0026amp;= \\int \\frac{1}{2}(\\sin(x-2x)+\\sin(x+2x))\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\int \\sin (-x)\\,dx +\\frac{1}{2}\\int \\sin 3x\\,dx = \\newline \u0026amp;= \\frac{1}{2}\\cos(-x)- \\frac{1}{6}\\cos 3x +C. 
\\end{aligned} $$\nRational functions of sines and cosines If $f(x,y)$ is a rational function then the function $f(\\sin x,\\cos x)$ can be transformed in an rational function of $t$ with the following substitutions\n$$\\tan \\frac{x}{2}=t \\quad \\sin x=\\frac{2t}{1+t^2} \\quad \\cos x = \\frac{1-t^2}{1+t^2} \\quad dx = \\frac{2}{1+t^2}dt.$$\nExample.\n$$\\int \\frac{1}{\\sin x}\\,dx = \\int \\frac{1}{\\frac{2t}{1+t^2}}\\frac{2}{1+t^2}\\,dt = \\int \\frac{1}{t}\\,dt = \\log\\vert t\\vert+C = \\log\\vert\\tan\\frac{x}{2}\\vert+C.$$\nDefinite integral Definition - Definite integral. Let $f(x)$ be a function which is continuous on an interval $[a, b]$. Divide this interval into $n$ subintervals of equal width $\\Delta x$ and choose an arbitrary point $x_i$ from each subinterval. The definite integral of $f$ from $a$ to $b$ is defined to be the limit\n$$\\int_a^b f(x)\\,dx = \\lim_{n\\rightarrow \\infty}\\sum_{i=1}^n f(x_i)\\Delta x.$$\nTheorem - First fundamental theorem of Calculus. If $f(x)$ is continuous on the interval $[a,b]$ and $F(x)$ is an antiderivative of $f$ on $[a,b]$, then\n$$\\int_a^b f(x)\\,dx = F(b)-F(a)$$\nExample. 
Given the function $f(x)=x^2$, we have\n$$\\int_1^2 x^2\\,dx = \\left[\\frac{x^3}{3}\\right]_1^2 = \\frac{2^3}{3}-\\frac{1^3}{3} = \\frac{7}{3}.$$\nProperties of the definite integral Given two functions $f(x)$ and $g(x)$ integrable on $[a,b]$ and $k \\in \\mathbb{R}$ the following properties are satisfied:\n$\\int_{a}^{b}(f(x)+g(x))\\,dx=\\int_{a}^{b}f(x)\\,dx+\\int_{a}^{b}g(x)\\,dx$ (linearity)\n$\\int_{a}^{b}{kf(x)}\\,dx=k\\int_{a}^{b}{f(x)}\\,dx$ (linearity)\n$\\int_{a}^{b}{f(x)\\,dx} \\leq \\int_{a}^{b}{g(x)\\,dx}$ if $f(x)\\leq g(x)\\ \\forall x \\in [a,b]$ (monotonicity)\n$\\int_{a}^{b}{f(x)\\,dx} = \\int_{a}^{c}{f(x)\\,dx}+\\int_{c}^{b}{f(x)\\,dx}$ for any $c\\in(a,b)$ (additivity)\n$\\int_a^b f(x)\\,dx = -\\int_b^a f(x)\\,dx$\nArea calculation Area between a positive function and the $x$ axis If $f(x)$ is an integrable function on the interval $[a,b]$ and $f(x)\\geq 0\\ \\forall x\\in[a,b]$, then the definite integral\n$$\\int_a^b f(x)\\,dx$$\nmeasures the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$.\nArea between a negative function and the $x$ axis If $f(x)$ is an integrable function on the interval $[a,b]$ and $f(x)\\leq 0\\ \\forall x\\in[a,b]$, then the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$ is\n$$-\\int_a^b f(x)\\,dx.$$\nArea between a function and the $x$ axis In general, if $f(x)$ is an integrable function on the interval $[a,b]$, no matter the sign of $f$ on $[a,b]$, the area between the graph of $f$ and the $x$ axis on the interval $[a,b]$ is\n$$\\int_a^b \\vert f(x)\\vert\\,dx.$$\nArea between two functions If $f(x)$ and $g(x)$ are two integrable functions on the interval $[a,b]$, then the area between the graphs of $f$ and $g$ on the interval $[a,b]$ is $$\\int_{a}^{b}{\\vert f(x)- 
g(x)\\vert\\,dx}.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600986575,"objectID":"534b57a451c94fee658cb2589add7cce","permalink":"/en/teaching/calculus/manual/integrals/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/integrals/","section":"teaching","summary":"Antiderivative of a function Definition - Antiderivative of a function. Given a function $f(x)$, the function $F(x)$ is an antiderivative or primitive function of $f$ if it satisfies that $F\u0026rsquo;(x)=f(x)$ $\\forall x \\in \\mathop{\\rm Dom}(f)$.","tags":["Integral","Area"],"title":"Integral calculus","type":"book"},{"authors":null,"categories":["Calculus","One Variable Calculus"],"content":"Ordinary Differential Equations Often in Physics, Chemistry, Biology, Geometry, etc there arise equations that relate a function with its derivative, or successive derivatives.\nDefinition - Ordinary differential equation. An ordinary differential equation (O.D.E.) is an equation that relates an independent variable $x$, a function $y(x)$ that depends on $x$, and the successive derivatives of $y$, $y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)}$; it can be written as\n$$F(x, y, y\u0026rsquo;, y\u0026rsquo;\u0026rsquo;,\\ldots, y^{(n)})=0.$$\nThe order of a differential equation is the greatest order of the derivatives in the equation.\nExample. The equation $y\u0026rsquo;\u0026rsquo;\u0026rsquo;+\\sin(x)y\u0026rsquo;=2x$ is a differential equation of order 3.\nDeducing a differential equation To deduce a differential equation that explains a natural phenomenon it is essential to understand what a derivative is and how to interpret it.\nExample. 
Newton’s law of cooling states\n“The rate of change of the temperature of a body in a surrounding medium is proportional to the difference between the temperature of the body $T$ and the temperature of the medium $T_a$.”\nThe rate of change of the temperature is the derivative of temperature with respect to time $dT/dt$. Thus, Newton’s law of cooling can be expressed by the differential equation\n$$\\frac{dT}{dt}=k(T-T_a),$$\nwhere $k$ is a proportionality constant.\nSolution of an ordinary differential equation Definition - Solution of an ordinary differential equation. Given an ordinary differential equation $F(x,y,y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)})=0$, the function $y=f(x)$ is a solution of the ordinary differential equation if it satisfies the equation, that is, if\n$$F(x,f(x), f\u0026rsquo;(x), f\u0026rsquo;\u0026rsquo;(x),\\ldots, f^{(n)}(x))=0.$$\nThe graph of a solution of the ordinary differential equation is known as an integral curve.\nSolving an ordinary differential equation consists in finding all its solutions in a given domain. For that, integral calculus is required.\nIn the same way that the indefinite integral is a family of antiderivatives that differ in a constant term, after integrating an ordinary differential equation we get a family of solutions that differ in a constant. We can get particular solutions by giving values to this constant.\nGeneral solution of an ordinary differential equation Definition - General solution of an ordinary differential equation. 
Given an ordinary differential equation $F(x,y,y\u0026rsquo;,y\u0026rsquo;\u0026rsquo;,\\ldots,y^{(n)})=0$ of order $n$, the general solution of the differential equation is a family of functions\n$$y =f (x,C_1,\\ldots,C_n),$$\ndepending on $n$ constants, such that for any value of $C_1,\\ldots,C_n$ we get a solution of the differential equation.\nFor every value of the constants we get a particular solution of the differential equation. 
Thus, when a differential equation can be solved, it has infinitely many solutions.\nGeometrically, the general solution of a differential equation corresponds to a family of integral curves of the differential equation.\nIt is common to impose conditions on the solutions of a differential equation to reduce the number of solutions. In many cases, these conditions allow us to determine the values of the constants in the general solution to get a particular solution.\nFirst order differential equations In this chapter we are going to study first order differential equations\n$$F(x,y,y\u0026rsquo;)=0.$$\nThe general solution of a first order differential equation is\n$$y = f (x,C),$$\nso to get a particular solution from the general one, it is enough to set the value of the constant $C$, and for that we only need to impose one initial condition.\nDefinition - Initial value problem. The group consisting of a first order differential equation and an initial condition is known as an initial value problem:\n$$ \\begin{cases} F(x,y,y\u0026rsquo;)=0, \u0026amp; \\mbox{First order differential equation;} \\newline y(x_0)=y_0, \u0026amp; \\mbox{Initial condition.} \\end{cases} $$\nSolving an initial value problem consists in finding a solution of the first order differential equation that satisfies the initial condition.\nExample. 
Recall the first order differential equation of Newton’s law of cooling, $$\\frac{dT}{dt}=k(T-T_a),$$ where $T$ is the temperature of the body and $T_a$ is the temperature of the surrounding medium.\nIt is easy to check that the general solution of this equation is\n$$T(t) = Ce^{kt}+T_a.$$\nIf we impose the initial condition that the temperature of the body at the initial instant is $5$ ºC, that is, $T(0)=5$, we have\n$$T(0) = Ce^{k\\cdot0}+T_a = C+T_a = 5,$$\nfrom where we get $C=5-T_a$, and this gives us the particular solution\n$$T(t) = (5-T_a)e^{kt}+T_a.$$\nIntegral curve of an initial value problem Example. 
If we assume in the previous example that the temperature of the surrounding medium is $T_a=0$ ºC and the cooling constant of the body is $k=1$, the general solution of the differential equation is $$T(t)=Ce^t,$$ that is a family of integral curves. Among all of them, only the one that passes through the point $(0,5)$ corresponds to the particular solution of the previous initial value problem.\nExistence and uniqueness of solutions Theorem - Existence and uniqueness of solutions of a first order ODE. Given an initial value problem\n$$\\begin{cases} y\u0026rsquo;=F(x,y);\\newline y(x_0)=y_0; \\end{cases} $$\nif $F(x,y)$ is continuous on an open region containing the point $(x_0,y_0)$, then a solution of the initial value problem exists. If, in addition, $\\frac{\\partial F}{\\partial y}$ is continuous on an open region containing $(x_0,y_0)$, the solution is unique.\nAlthough this theorem guarantees the existence and uniqueness of a solution of a first order differential equation, it does not provide a method to compute it. In fact, there is no general method to solve first order differential equations, but we will see how to solve some types:\nSeparable differential equations Homogeneous differential equations Linear differential equations Separable differential equations Definition - Separable differential equation. A separable differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;g(y)=f(x),$$\nor what is the same,\n$$g(y)dy=f(x)dx,$$\nso the different variables are on different sides of the equality (the variables are separated).\nThe general solution for a separable differential equation comes after integrating both sides of the equation\n$$\\int g(y)\\,dy = \\int f(x)\\,dx+C.$$\nExample. 
The differential equation of Newton’s law of cooling\n$$\\frac{dT}{dt}=k(T-T_a),$$\nis a separable differential equation since it can be written as\n$$\\frac{1}{T-T_a}dT=k\\,dt.$$\nIntegrating both sides of the equation we have\n$$\\int \\frac{1}{T-T_a}\\,dT=\\int k\\,dt\\Leftrightarrow \\log(T-T_a)=kt+C,$$\nand solving for $T$ we get the general solution of the equation\n$$T(t)=e^{kt+C}+T_a=e^Ce^{kt}+T_a=Ce^{kt}+T_a,$$\nwhere in the last step the constant $e^C$ has been renamed $C$, an arbitrary constant.\nHomogeneous differential equations Definition - Homogeneous function. A function $f(x,y)$ is homogeneous of degree $n$, if it satisfies\n$$f(kx,ky)= k^nf(x,y),$$\nfor any value $k\\in \\mathbb{R}$.\nIn particular, a homogeneous function of degree $0$ always satisfies\n$$f(kx,ky)=f(x,y).$$\nSetting $k=1/x$ we have\n$$f(x,y)=f\\left(\\frac{1}{x}x,\\frac{1}{x}y\\right)=f\\left(1,\\frac{y}{x}\\right)=g\\left(\\frac{y}{x}\\right).$$\nThis way, a homogeneous function of degree $0$ can always be written as a function of $u=y/x$:\n$$f(x,y)=g\\left(\\frac{y}{x}\\right)=g(u).$$\nDefinition - Homogeneous differential equation. A homogeneous differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;=f(x,y),$$\nwhere $f(x,y)$ is a homogeneous function of degree $0$.\nWe can solve a homogeneous differential equation by making the substitution\n$$u=\\frac{y}{x}\\Leftrightarrow y=ux,$$\nso the equation becomes\n$$u\u0026rsquo;x+u=f(u),$$\nthat is a separable differential equation.\nOnce the separable differential equation is solved, the substitution must be undone.\nExample. 
Let us consider the following differential equation $$4x-3y+y\u0026rsquo;(2y-3x)=0.$$\nRewriting the equation in this way\n$$y\u0026rsquo;=\\frac{3y-4x}{2y-3x}$$\nwe can easily check that it is a homogeneous differential equation.\nTo solve this equation we make the substitution $y=ux$, and we get\n$$u\u0026rsquo;x+u=\\frac{3ux-4x}{2ux-3x}=\\frac{3u-4}{2u-3}$$\nthat is a separable differential equation.\nSeparating the variables we have\n$$u\u0026rsquo;x=\\frac{3u-4}{2u-3}-u=\\frac{-2u^2+6u-4}{2u-3}\\Leftrightarrow \\frac{2u-3}{-2u^2+6u-4}\\,du=\\frac{1}{x}\\,dx.$$\nNow, integrating both sides of the equation we have\n$$ \\renewcommand{\\arraystretch}{2} \\begin{array}{c} \\displaystyle \\int \\frac{2u-3}{-2u^2+6u-4}\\,du=\\int \\frac{1}{x}\\,dx \\Leftrightarrow -\\frac{1}{2}\\log|u^2-3u+2|=\\log|x|+C \\Leftrightarrow\\newline \\Leftrightarrow \\log|u^2-3u+2|=-2\\log|x|-2C, \\end{array} $$\nthen, applying the exponential function to both sides and simplifying we get the general solution\n$$u^2-3u+2=e^{-2\\log|x|-2C}=\\frac{e^{-2C}}{e^{\\log|x|^2}}=\\frac{K}{x^2},$$\nrewriting the constant $K=e^{-2C}$.\nFinally, undoing the initial substitution $u=y/x$, we arrive at the general solution of the homogeneous differential equation\n$$\\left(\\frac{y}{x}\\right)^2-3\\frac{y}{x}+2=\\frac{K}{x^2}\\Leftrightarrow y^2-3xy+2x^2=K.$$\nLinear differential equations Definition - Linear differential equation. A linear differential equation is a first order differential equation that can be written as\n$$y\u0026rsquo;+g(x)y = h(x).$$\nTo solve a linear differential equation we try to write the left side of the equation as the derivative of a product. 
For that we multiply both sides by a function $f(x)$ such that\n$$f\u0026rsquo;(x)=g(x)f(x).$$\nThus, we get\n$$ \\begin{array}{c} y\u0026rsquo;f(x)+g(x)f(x)y=h(x)f(x)\\newline \\Updownarrow\\newline y\u0026rsquo;f(x)+f\u0026rsquo;(x)y=h(x)f(x)\\newline \\Updownarrow\\newline \\dfrac{d}{dx}(yf(x))=h(x)f(x) \\end{array} $$\nIntegrating both sides of the previous equation we get the solution\n$$yf(x)=\\int h(x)f(x)\\,dx+C.$$\nOn the other hand, a function that satisfies $f\u0026rsquo;(x)=g(x)f(x)$ is\n$$f(x)=e^{\\int g(x)\\,dx},$$\nso, substituting this function in the previous solution we arrive at the solution of the linear differential equation\n$$ye^{\\int g(x)\\,dx}=\\int h(x) e^{\\int g(x)\\,dx}\\,dx+C,$$\nor what is the same\nSolution of a linear differential equation.\n$$y=e^{-\\int g(x)\\,dx}\\left(\\int h(x)e^{\\int g(x)\\,dx}\\,dx+C\\right).$$\nExample. If in the differential equation of Newton’s law of cooling the temperature of the surrounding medium is a function of time $T_a(t)$, then the differential equation\n$$\\frac{dT}{dt}=k(T-T_a(t)),$$\nis a linear differential equation since it can be written as\n$$T\u0026rsquo;-kT=-kT_a(t),$$\nwhere the independent term is $-kT_a(t)$ and the coefficient of $T$ is $-k$.\nSubstituting in the formula of the general solution of a linear differential equation we have\n$$y=e^{-\\int -k\\,dt}\\left(\\int -kT_a(t)e^{\\int -k\\,dt}\\,dt+C\\right)= e^{kt}\\left(-\\int kT_a(t)e^{-kt}\\,dt+C\\right).$$\nIn the particular case that $T_a(t)=t$, and the proportionality constant $k=1$, the general solution of the linear differential equation is\n$$y=e^{t}\\left(-\\int te^{-t}\\,dt+C\\right)=e^t(e^{-t}(t+1)+C)=Ce^t+t+1.$$\nIf, in addition, we know that the temperature of the body at time $t=0$ is $5$ ºC, that is, we have the initial condition $T(0)=5$, then we can compute the value of the constant $C$,\n$$y(0)=Ce^0+0+1=5 \\Leftrightarrow C+1=5 \\Leftrightarrow C=4,$$ and we get the particular 
solution\n$$y(t)=4e^t+t+1.$$\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600986575,"objectID":"a0eba6affdd1b66e27df2ea84c83aa50","permalink":"/en/teaching/calculus/manual/ordinary-differential-equations/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/ordinary-differential-equations/","section":"teaching","summary":"Ordinary Differential Equations Often in Physics, Chemistry, Biology, Geometry, etc there arise equations that relate a function with its derivative, or successive derivatives.\nDefinition - Ordinary differential equation. An ordinary differential equation (O.","tags":["Ordinary Differential Equation"],"title":"Ordinary differential equations","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Construct the sample space of the following random experiments:\nPick a random person and record the gender and whether she or he is smoker or not. Pick a random person and record the blood type and whether she or he is smoker or not. Pick a random person and record the gender, the blood type and whether she or he is smoker or not. Exercise 2 There are two boxes with balls of different colors. The first box contains 3 white balls and 2 black balls, and the second one contains 2 blue balls, 1 red ball and 1 green ball. Construct the sample space of the following random experiments:\nPick a random ball from every box. Pick two random balls from every box. Exercise 3 The Morgan’s laws state that given two events $A$ and $B$ from the same sample space, $\\overline{A\\cup B}=\\bar A \\cap \\bar B$ and $\\overline{A\\cap B}=\\bar A \\cup \\bar B$. Proof both statements graphically using Venn diagrams.\nExercise 4 Compute the probability of the following events of the random experiment consisting in tossing 3 coins:\nGet exactly 1 head. Get exactly 2 tails. Get two or more heads. Get some tails. Solution $P(\\mbox{1 head})=0.375$. 
$P(\\mbox{2 tails})=0.375$. $P(\\mbox{2 or more heads})=0.5$. $P(\\mbox{some tails})=0.875$. Exercise 5 In a laboratory there are 4 flasks with sulfuric acid and 2 with nitric acid, and in another laboratory there is 1 flask with sulfuric acid and 3 with nitric acid. A random experiment consists in picking two flasks, one from every laboratory. Compute the probability of the following events:\nThe two picked flasks are of sulfuric acid. The two picked flasks are of nitric acid. The two picked flasks contain different acids. Compute again the above probabilities if the flask picked in the first laboratory is put in the second laboratory before picking the flask from it.\nSolution $P(\\mbox{Two flasks of sulfuric acid})=4/24$. $P(\\mbox{Two flasks of nitric acid})=6/24$. $P(\\mbox{One flask of each})=14/24$. Putting the first flask in the second laboratory: $P(\\mbox{Two flasks of sulfuric acid})=8/30$. $P(\\mbox{Two flasks of nitric acid})=8/30$. $P(\\mbox{One flask of each})=14/30$. Exercise 6 Let $A$ and $B$ be two events of a same sample space, such that $P(A)=3/8$, $P(B)=1/2$, $P(A\\cap B)=1/4$. Compute the following probabilities:\n$P(A\\cup B)$. $P(\\bar A)$ and $P(\\bar B)$. $P(\\bar A\\cap \\bar B)$. $P(A\\cap \\bar B)$. $P(A\\vert B)$. $P(A\\vert \\bar B)$. Solution $P(A\\cup B)=5/8$. $P(\\bar A)=5/8$ and $P(\\bar B)=1/2$. $P(\\bar A\\cap \\bar B)=3/8$. $P(A\\cap \\bar B)=1/8$. $P(A\\vert B)=1/2$. $P(A\\vert \\bar B)=1/4$. Exercise 7 In a hospital the probability of getting hepatitis in a blood transfusion from a unit of blood is $0.01$. A patient gets two units of blood while staying at the hospital. What is the probability of getting hepatitis?\nSolution $P(\\mbox{Hepatitis})=0.0199$. Exercise 8 Let $A$ and $B$ be two events of a same sample space, such that $P(A)=0.6$ and $P(A\\cup B)=0.9.$ Compute $P(B)$ under the following assumptions:\n$A$ and $B$ are incompatible. $A$ and $B$ are independent. Solution $P(B)=0.3$. $P(B)=0.75$. 
Exercise 9 A study about smoking has found that 40% of smokers have a smoker father, 25% have a smoker mother and 52% have at least one parent who smokes. We pick a random person from this population. Answer the following questions:\nWhat is the probability of having a smoker mother if the father smokes? What is the probability of having a smoker mother if the father does not smoke? Are the events having a smoker father and having a smoker mother independent? Solution Denoting by $SF$ the event of having a smoker father and by $SM$ the event of having a smoker mother,\n$P(SM/SF)=0.325$. $P(SM/\\bar SF)=0.2$. The events aren\u0026rsquo;t independent. Exercise 10 The probability that an injury $A$ is repeated is $4/5$, the probability that another injury $B$ is repeated is $1/2$, and the probability that both injuries are repeated is $1/3$. Compute the probability of the following events:\nOnly injury $B$ is repeated. At least one injury is repeated. Injury $B$ is repeated if injury $A$ has been repeated. Injury $B$ is repeated if injury $A$ has not been repeated. Solution $P(B\\cap\\overline A)=1/6$. $P(A\\cup B)=29/30$. $P(B\\vert A)=5/12$. $P(B\\vert \\overline A)=5/6$. Exercise 11 In a digestive clinic, from every 1000 patients that arrive with stomach pain, 700 have gastritis, 200 have an ulcer and 100 have cancer. After analyzing the gastric symptoms, it is known that the probability of vomiting is $0.3$ in case of gastritis, $0.6$ in case of ulcer and $0.9$ in case of cancer. What is the diagnosis for a new patient with stomach pain that suffers from vomiting?\nNote: Assume that the only diseases are gastritis, ulcer and cancer and that they are mutually exclusive.\nSolution Let $G$, $U$ and $C$ be the events of having gastritis, ulcer and cancer respectively, and let $V$ be the event of vomiting, $P(G/V)=0.5$, $P(U/V)=0.286$ and $P(C/V)=0.214$, so, the diagnosis is gastritis. 
Exercise 12 A severe pain without effusion in a particular zone of the knee joint is a symptom of sprained lateral collateral ligament (SLCL). The sprains in that ligament are classified into grade 1, when there is only distension (60% of cases), grade 2 when there is a partial tearing (30% of cases) and grade 3 when there is a complete tearing (10% of cases). Taking into account that the symptom appears in 80% of cases of grade 1 sprains, 90% of cases of grade 2 and 100% of cases of grade 3, answer the following questions:\nIf a person has SLCL what is the probability that he or she presents severe pain without effusion? What is the diagnosis for a person with severe pain without effusion? From a total of 10000 people with severe pain without effusion, how many are expected to have a grade 1 sprain? How many are expected to have a grade 2 sprain? And a grade 3 sprain? Solution Denoting by $S$ the event of presenting severe pain without effusion, and by $G1$, $G2$ and $G3$ the events of having an SLCL of grade 1, 2 and 3 respectively,\n$P(S)=0.85$. $P(G1\\vert S)=0.5647$, $P(G2\\vert S)=0.3176$ and $P(G3\\vert S)=0.1176$, so the diagnosis is an SLCL of grade 1. $5647.0588$ will have a grade 1 sprain, $3176.4706$ will have a grade 2 sprain and $1176.4706$ will have a grade 3 sprain. Exercise 13 A physiotherapist uses two techniques $A$ and $B$ to cure an injury. It is known that the injury is 3 times more frequent in people over 30 than in people under 30. It is also known that in people over 30 technique $A$ works in 30% of cases and technique $B$ in 60%, while in people under 30 technique $A$ works in 50% of cases and technique $B$ in 70%. If both techniques are applied with the same probability, no matter the age,\nWhat is the probability that a random person under 30 is cured? And a person over 30? What is the probability that a random person is cured? 
If after applying a technique to a person over 30, the person is not cured, what is the probability that the technique applied was $A$? Solution Denoting by $J$ the event of being under 30, by $C$ the event of being cured, and by $A$ and $B$ the events of applying techniques $A$ and $B$ respectively,\n$P(C\\vert J)=0.6$ and $P(C\\vert \\bar J)=0.45$. $P(C)=0.4875$.\n$P(A/\\bar J\\cap \\bar C)=0.6364$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"926fe34432677779f405a3d111214b9f","permalink":"/en/teaching/statistics/problems/probability/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/probability/","section":"teaching","summary":"Exercise 1 Construct the sample space of the following random experiments:\nPick a random person and record the gender and whether she or he is a smoker or not. Pick a random person and record the blood type and whether she or he is a smoker or not.","tags":["Probability"],"title":"Problems of Probability","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Epidemiology"],"content":"Exercise 1 A test was applied to a sample of people in order to evaluate its effectiveness; the results are as follows:\n$$ \\begin{array}{l|cc} \u0026amp; \\mbox{Test }+ \u0026amp; \\mbox{Test }- \\newline \\hline \\mbox{Sick} \u0026amp; 2020 \u0026amp; 140 \\newline\n\\mbox{Healthy} \u0026amp; 80 \u0026amp; 7760 \\newline \\end{array} $$\nCalculate for this test:\nThe sensitivity and the specificity. The positive and negative predictive value. The probability of a correct diagnosis. Solution Denoting by $S$ and $H$ the events of being sick and healthy respectively,\nSensitivity $P(+\\vert S)=0.9352$ and specificity $P(-\\vert H)=0.9898$. PPV $P(S\\vert +)=0.9619$ and NPV $P(H\\vert -)=0.9823$. $P((S\\cap +)\\cup (H\\cap -)) = P(S\\cap +) + P(H\\cap -) = 0.978$. 
Exercise 2 We know, from a research study, that 10% of people over 50 suffer a particular type of arthritis. We have developed a new method to detect the disease and after clinical trials we observe that if we apply the method to people with arthritis we get a positive result in 85% of cases, while if we apply the method to people without arthritis, we get a positive result in 4% of cases. Answer the following questions:\nWhat is the probability of getting a positive result after applying the method to a random person? If the result of applying the method to one person has been positive, what is the probability of having arthritis? Solution Denoting by $A$ the event of having arthritis,\n$P(+)=0.121$. $P(A\\vert +) = 0.7025$. Exercise 3 We have two different tests $A$ and $B$ to diagnose a disease. Test $A$ has a sensitivity of 98% and a specificity of 80%, while test $B$ has a sensitivity of 75% and a specificity of 99%.\nWhich test is better to confirm the disease? Which test is better to rule out the disease? Often a test is used to discard the presence of the disease in a large number of apparently healthy people. This type of test is known as a screening test. Which test will work better as a screening test? What are the predictive values of this test if the prevalence of the disease is 0.01? And if the prevalence of the disease is 0.2? The positive predictive value of a screening test is usually not very high. How can we combine the tests $A$ and $B$ to have a higher confidence in the diagnosis of the disease? Calculate the post-test probability of having the disease with the combination of both tests, if the outcome of both tests is positive for a prevalence of 0.01. Solution Test $B$ because it has a greater specificity. Test $A$ because it has a greater sensitivity. 
Test $A$ will perform better as a screening test.\nFor a prevalence of $0.01$ the PPV is $P(D\\vert +)=0.0472$ and the NPV is $P(\\bar D\\vert -)=0.9997$.\nFor a prevalence of $0.2$ the PPV is $P(D\\vert +)=0.5506$ and the NPV is $P(\\bar D\\vert -)=0.9938$. First applying test $A$ to everybody and then applying test $B$ to positive cases of test $A$.\n$P(D\\vert +_A\\cap +_B)=0.7878$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"d9f103bcc7581ebf8ef06c9e4bd3de2a","permalink":"/en/teaching/statistics/problems/diagnostic_tests/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/diagnostic_tests/","section":"teaching","summary":"Exercise 1 A test was applied to a sample of people in order to evaluate its effectiveness; the results are as follows:\n$$ \\begin{array}{l|cc} \u0026amp; \\mbox{Test }+ \u0026amp; \\mbox{Test }- \\newline \\hline \\mbox{Sick} \u0026amp; 2020 \u0026amp; 140 \\newline","tags":["Probability","Diagnostic Tests"],"title":"Problems Diagnostic Tests","type":"book"},{"authors":null,"categories":["Calculus","Several Variables Calculus"],"content":"Vector functions of a single real variable Definition - Vector function of a single real variable. 
A vector function of a single real variable or vector field of a scalar variable is a function that maps every scalar value $t\\in D\\subseteq \\mathbb{R}$ into a vector $(x_1(t),\\ldots,x_n(t))$ in $\\mathbb{R}^n$:\n$$ \\begin{array}{rccl} f: \u0026amp; \\mathbb{R} \u0026amp; \\longrightarrow \u0026amp; \\mathbb{R}^n\\newline \u0026amp; t \u0026amp; \\longrightarrow \u0026amp; (x_1(t),\\ldots, x_n(t)) \\end{array} $$\nwhere $x_i(t)$, $i=1,\\ldots,n$, are real functions of a single real variable known as coordinate functions.\nThe most common vector fields of a scalar variable are in the real plane $\\mathbb{R}^2$, where usually they are represented as\n$$f(t)=x(t)\\mathbf{i}+y(t)\\mathbf{j},$$\nand in the real space $\\mathbb{R}^3$, where usually they are represented as\n$$f(t)=x(t)\\mathbf{i}+y(t)\\mathbf{j}+z(t)\\mathbf{k}.$$\nGraphic representation of vector fields The graphic representation of a vector field in $\\mathbb{R}^2$ is a trajectory in the real plane.\nThe graphic representation of a vector field in $\\mathbb{R}^3$ is a trajectory in the real space.\nDerivative of a vector field The concept of derivative as the limit of the average rate of change of a function can be extended easily to vector fields.\nDefinition - Derivative of a vector field. A vector field $f(t)=(x_1(t),\\ldots,x_n(t))$ is differentiable at a point $t=a$ if the limit\n$$\\lim_{\\Delta t\\rightarrow 0} \\frac{f(a+\\Delta t)-f(a)}{\\Delta t}$$\nexists. In such a case, the value of the limit is known as the derivative of the vector field at $a$, and it is written $f\u0026rsquo;(a)$.\nMany properties of real functions of a single real variable can be extended to vector fields through their component functions. Thus, for instance, the derivative of a vector field can be computed from the derivatives of its component functions.\nTheorem. 
Given a vector field $f(t)=(x_1(t),\\ldots,x_n(t))$, if $x_i(t)$ is differentiable at $t=a$ for all $i=1,\\ldots,n$, then $f$ is differentiable at $a$ and its derivative is\n$$f\u0026rsquo;(a)=(x_1\u0026rsquo;(a),\\ldots,x_n\u0026rsquo;(a))$$\nProof The proof for a vectorial field in $\\mathbb{R}^2$ is easy.\n$$\\begin{aligned} f\u0026rsquo;(a)\u0026amp;=\\lim_{\\Delta t\\rightarrow 0} \\frac{f(a+\\Delta t)-f(a)}{\\Delta t} = \\lim_{\\Delta t\\rightarrow 0} \\frac{(x(a+\\Delta t),y(a+\\Delta t))-(x(a),y(a))}{\\Delta t} =\\newline \u0026amp;= \\lim_{\\Delta t\\rightarrow 0} \\left(\\frac{x(a+\\Delta t)-x(a)}{\\Delta t},\\frac{y(a+\\Delta t)-y(a)}{\\Delta t}\\right) =\\newline \u0026amp;= \\left(\\lim_{\\Delta t\\rightarrow 0}\\frac{x(a+\\Delta t)-x(a)}{\\Delta t},\\lim_{\\Delta t\\rightarrow 0}\\frac{y(a+\\Delta t)-y(a)}{\\Delta t}\\right) = (x\u0026rsquo;(a),y\u0026rsquo;(a)). \\end{aligned} $$\nKinematics: Curvilinear motion The notion of derivative as a velocity along a trajectory in the real line can be generalized to a trajectory in any euclidean space $\\mathbb{R}^n$.\nIn case of a two dimensional space $\\mathbb{R}^2$, if $f(t)$ describes the position of a moving object in the real plane at any time $t$, taking as reference the coordinates origin $O$ and the unitary vectors ${\\mathbf{i}=(1,0),\\mathbf{j}=(0,1)}$, we can represent the position of the moving object $P$ at every moment $t$ with a vector $\\vec{OP}=x(t)\\mathbf{i}+y(t)\\mathbf{j}$, where the coordinates\n$$ \\begin{cases} x=x(t)\\newline y=y(t) \\end{cases} \\quad t\\in \\mbox{Dom}(f) $$\nare the coordinate functions of $f$.\nIn this context the derivative of a trajectory $f\u0026rsquo;(a)=(x_1\u0026rsquo;(a),\\ldots,x_n\u0026rsquo;(a))$ is the velocity vector of the trajectory $f$ at moment $t=a$. Example. 
Given the trajectory $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, whose image is the unit circumference centred in the coordinate origin, its coordinate functions are $x(t) = \\cos t$, $y(t) = \\sin t$, $t\\in \\mathbb{R}$, and its velocity is\n$$\\mathbf{v}=f\u0026rsquo;(t)=(x\u0026rsquo;(t),y\u0026rsquo;(t))=(-\\sin t, \\cos t).$$\nIn the moment $t=\\pi/4$, the object is in position $f(\\pi/4) = (\\cos(\\pi/4),\\sin(\\pi/4)) =(\\sqrt{2}/2,\\sqrt{2}/2)$ and it is moving with a velocity $\\mathbf{v}=f\u0026rsquo;(\\pi/4)=(-\\sin(\\pi/4),\\cos(\\pi/4))=(-\\sqrt{2}/2,\\sqrt{2}/2)$.\nObserve that the module of the velocity vector is always 1 as $\\vert\\mathbf{v}\\vert=\\sqrt{(-\\sin t)^2+(\\cos t)^2}=1$.\nTangent line to a trajectory Tangent line to a trajectory in the plane Vectorial equation Given a trajectory $f(t)$ in the real plane, the vectors that are parallel to the velocity $\\mathbf{v}$ at a moment $a$ are called tangent vectors to the trajectory $f$ at the moment $a$, and the line passing through $P=f(a)$ directed by $\\mathbf{v}$ is the tangent line to the graph of $f$ at the moment $a$.\nDefinition - Tangent line to a trajectory. Given a trajectory $f(t)$ in the real plane $\\mathbb{R}^2$, the tangent line to to the graph of $f$ at $a$ is the line with equation\n$$ \\begin{aligned} l:(x,y) \u0026amp;= f(a)+tf\u0026rsquo;(a) = (x(a),y(a))+t(x\u0026rsquo;(a),y\u0026rsquo;(a))\\newline \u0026amp; = (x(a)+tx\u0026rsquo;(a),y(a)+ty\u0026rsquo;(a)). \\end{aligned} $$\nExample. We have seen that for the trajectory $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, whose image is the unit circumference centred at the coordinate origin, the object position at the moment $t=\\pi/4$ is $f(\\pi/4)=(\\sqrt{2}/2,\\sqrt{2}/2)$ and its velocity $\\mathbf{v}=(-\\sqrt{2}/2,\\sqrt{2}/2)$. 
Thus the equation of the tangent line to $f$ at that moment is\n$$ \\begin{aligned} l: (x,y) \u0026amp; = f(\\pi/4)+t\\mathbf{v} = \\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right)+t\\left(\\frac{-\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right) =\\newline \u0026amp; =\\left(\\frac{\\sqrt{2}}{2}-t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right). \\end{aligned} $$\nCartesian and point-slope equations From the vectorial equation of the tangent to a trajectory $f(t)$ at the moment $t=a$ we can get the coordinate functions\n$$ \\begin{cases} x=x(a)+tx\u0026rsquo;(a)\\newline y=y(a)+ty\u0026rsquo;(a) \\end{cases} \\quad t\\in \\mathbb{R}, $$\nand solving for $t$ and equalling both equations we get the Cartesian equation of the tangent\n$$\\frac{x-x(a)}{x\u0026rsquo;(a)}=\\frac{y-y(a)}{y\u0026rsquo;(a)},$$\nif $x\u0026rsquo;(a)\\neq 0$ and $y\u0026rsquo;(a)\\neq 0$.\nFrom this equation it is easy to get the point-slope equation of the tangent\n$$y-y(a)=\\frac{y\u0026rsquo;(a)}{x\u0026rsquo;(a)}(x-x(a)).$$\nExample. Using the vectorial equation of the tangent of the previous example\n$$l: (x,y)=\\left(\\frac{\\sqrt{2}}{2}-t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right),$$\nits Cartesian equation is $$\\frac{x-\\sqrt{2}/2}{-\\sqrt{2}/2} = \\frac{y-\\sqrt{2}/2}{\\sqrt{2}/2}$$ and the point-slope equation is\n$$y-\\sqrt{2}/2 = \\frac{-\\sqrt{2}/2}{\\sqrt{2}/2}(x-\\sqrt{2}/2) \\Rightarrow y=-x+\\sqrt{2}.$$\nNormal line to a trajectory in the plane We have seen that the tangent line to a trajectory $f(t)$ at $a$ is the line passing through the point $P=f(a)$ directed by the velocity vector $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a))$. If we take as direction vector a vector orthogonal to $\\mathbf{v}$, we get another line that is known as normal line to the trajectory.\nDefinition - Normal line to a trajectory. 
Given a trajectory $f(t)$ in the real plane $\\mathbb{R}^2$, the normal line to the graph of $f$ at moment $t=a$ is the line with equation\n$$l: (x,y)=(x(a),y(a))+t(y\u0026rsquo;(a),-x\u0026rsquo;(a)) = (x(a)+ty\u0026rsquo;(a),y(a)-tx\u0026rsquo;(a)).$$\nThe Cartesian equation is\n$$\\frac{x-x(a)}{y\u0026rsquo;(a)} = \\frac{y-y(a)}{-x\u0026rsquo;(a)},$$\nand the point-slope equation is\n$$y-y(a) = \\frac{-x\u0026rsquo;(a)}{y\u0026rsquo;(a)}(x-x(a)).$$\nThe normal line is always perpendicular to the tangent line as their direction vectors are orthogonal. Example. Considering again the trajectory of the unit circumference $f(t) = (\\cos t,\\sin t)$, $t\\in \\mathbb{R}$, the normal line to the graph of $f$ at moment $t=\\pi/4$ is\n$$ \\begin{aligned} l: (x,y)\u0026amp;=(\\cos(\\pi/4),\\sin(\\pi/4))+t(\\cos(\\pi/4),\\sin(\\pi/4)) =\\newline \u0026amp;= \\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right)+t\\left(\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}\\right) =\\left(\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2},\\frac{\\sqrt{2}}{2}+t\\frac{\\sqrt{2}}{2}\\right), \\end{aligned} $$\nthe Cartesian equation is\n$$\\frac{x-\\sqrt{2}/2}{\\sqrt{2}/2} = \\frac{y-\\sqrt{2}/2}{\\sqrt{2}/2},$$ and the point-slope equation is $$y-\\sqrt{2}/2 = \\frac{\\sqrt{2}/2}{\\sqrt{2}/2}(x-\\sqrt{2}/2) \\Rightarrow y=x.$$\nTangent and normal lines to a function A particular case of tangent and normal lines to a trajectory are the tangent and normal lines to a function of one real variable. For every function $y=f(x)$, the trajectory that traces its graph is\n$$g(x) = (x,f(x)) \\quad x\\in \\mathbb{R},$$\nand its velocity is\n$$g\u0026rsquo;(x) = (1,f\u0026rsquo;(x)),$$\nso that the tangent line to $g$ at the moment $a$ is\n$$\\frac{x-a}{1} = \\frac{y-f(a)}{f\u0026rsquo;(a)} \\Rightarrow y-f(a) = f\u0026rsquo;(a)(x-a),$$\nand the normal line is\n$$\\frac{x-a}{f\u0026rsquo;(a)} = \\frac{y-f(a)}{-1} \\Rightarrow y-f(a) = \\frac{-1}{f\u0026rsquo;(a)}(x-a).$$\nExample. 
Given the function $y=x^2$, the trajectory that traces its graph is $g(x)=(x,x^2)$ and its velocity is $g\u0026rsquo;(x)=(1,2x)$. At the moment $x=1$ the trajectory passes through the point $(1,1)$ with a velocity $(1,2)$. Thus, the tangent line at that moment is\n$$\\frac{x-1}{1} = \\frac{y-1}{2} \\Rightarrow y-1 = 2(x-1) \\Rightarrow y = 2x-1,$$\nand the normal line is\n$$\\frac{x-1}{2} = \\frac{y-1}{-1} \\Rightarrow y-1 = \\frac{-1}{2}(x-1) \\Rightarrow y = \\frac{-x}{2}+\\frac{3}{2}.$$\nTangent line to a trajectory in the space The concept of tangent line to a trajectory can be easily extended from the real plane to the three-dimensional space $\\mathbb{R}^3$.\nIf $f(t)=(x(t),y(t),z(t))$, $t\\in \\mathbb{R}$, is a trajectory in the real space $\\mathbb{R}^3$, then at the moment $a$, the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with a velocity $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a))$. Thus, the tangent line to $f$ at this moment has the following vectorial equation\n$$ \\begin{aligned} l\u0026amp;: (x,y,z)=(x(a),y(a),z(a))+t(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a)) =\\newline \u0026amp;= (x(a)+tx\u0026rsquo;(a),y(a)+ty\u0026rsquo;(a),z(a)+tz\u0026rsquo;(a)), \\end{aligned} $$\nand the Cartesian equations are $$\\frac{x-x(a)}{x\u0026rsquo;(a)}=\\frac{y-y(a)}{y\u0026rsquo;(a)}=\\frac{z-z(a)}{z\u0026rsquo;(a)},$$ provided that $x\u0026rsquo;(a)\\neq 0$, $y\u0026rsquo;(a)\\neq 0$ and $z\u0026rsquo;(a)\\neq 0$.\nExample. 
Given the trajectory $f(t)=(\\cos t, \\sin t, t)$, $t\\in \\mathbb{R}$ in the real space, at the moment $t=\\pi/2$ the trajectory passes through the point\n$$f(\\pi/2)=(\\cos(\\pi/2),\\sin(\\pi/2),\\pi/2)=(0,1,\\pi/2),$$\nwith velocity\n$$\\mathbf{v}=f\u0026rsquo;(\\pi/2)=(-\\sin(\\pi/2),\\cos(\\pi/2), 1)=(-1,0,1),$$\nand the tangent line to the graph of $f$ at that moment is\n$$l:(x,y,z)=(0,1,\\pi/2)+t(-1,0,1) = (-t,1,t+\\pi/2).$$\nInteractive Example\nNormal plane to a trajectory in the space In the three-dimensional space $\\mathbb{R}^3$, the normal line to a trajectory is not unique. There is an infinite number of normal lines, and all of them lie in the normal plane.\nIf $f(t)=(x(t),y(t),z(t))$, $t\\in \\mathbb{R}$, is a trajectory in the real space $\\mathbb{R}^3$, then at the moment $a$, the moving object that follows this trajectory will be at the position $P=(x(a),y(a),z(a))$ with a velocity $\\mathbf{v}=f\u0026rsquo;(a)=(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a))$. Thus, using the velocity vector as normal vector, the normal plane to $f$ at this moment has the following equation\n$$ \\begin{aligned} \\Pi \u0026amp;: (x-x(a),y-y(a),z-z(a))(x\u0026rsquo;(a),y\u0026rsquo;(a),z\u0026rsquo;(a)) = 0\\newline \u0026amp;\\Leftrightarrow x\u0026rsquo;(a)(x-x(a))+y\u0026rsquo;(a)(y-y(a))+z\u0026rsquo;(a)(z-z(a))=0. \\end{aligned} $$\nExample. For the trajectory of the previous example $f(t)=(\\cos t, \\sin t, t)$, $t\\in \\mathbb{R}$, at the moment $t=\\pi/2$ the trajectory passes through the point\n$$f(\\pi/2)=(\\cos(\\pi/2),\\sin(\\pi/2),\\pi/2)=(0,1,\\pi/2),$$\nwith velocity\n$$\\mathbf{v}=f\u0026rsquo;(\\pi/2)=(-\\sin(\\pi/2),\\cos(\\pi/2), 1)=(-1,0,1),$$ and the normal plane to the graph of $f$ at that moment is\n$$\\Pi:\\left(x-0,y-1,z-\\frac{\\pi}{2}\\right)(-1,0,1) =0 \\Leftrightarrow -x+z-\\frac{\\pi}{2}=0.$$\nInteractive Example\nFunctions of several variables A lot of problems in Geometry, Physics, Chemistry, Biology, etc. 
involve a variable that depends on two or more variables:\nThe area of a triangle depends on two variables that are the base and height lengths. The volume of a perfect gas depends on two variables that are the pressure and the temperature. The distance travelled by a free-falling object depends on a lot of variables: the time, the area of the cross section of the object, the latitude and longitude of the object, the height above the sea level, the air pressure, the air temperature, the speed of wind, etc. These dependencies are expressed with functions of several variables.\nDefinition - Functions of several real variables. A function of $n$ real variables or a scalar field from a set $A_1\\times \\cdots \\times A_n\\subseteq \\mathbb{R}^n$ in a set $B\\subseteq \\mathbb{R}$, is a relation that maps any tuple $(a_1,\\ldots,a_n)\\in A_1\\times \\cdots\\times A_n$ into a unique element of $B$, denoted by $f(a_1,\\ldots,a_n)$, that is known as the image of $(a_1,\\ldots,a_n)$ by $f$.\n$$ \\begin{array}{lccc} f: \u0026amp; A_1\\times\\cdots\\times A_n \u0026amp; \\longrightarrow \u0026amp; B\\newline \u0026amp;(a_1,\\ldots,a_n) \u0026amp; \\longrightarrow \u0026amp; f(a_1,\\ldots,a_n) \\end{array} $$\nThe area of a triangle is a real function of two real variables $$f(x,y)=\\frac{xy}{2}.$$\nThe volume of a perfect gas is a real function of two real variables $$v=f(t,p)=\\frac{nRt}{p},\\quad \\mbox{with $n$ and $R$ constants.}$$\nGraph of a function of two variables The graph of a function of two variables $f(x,y)$ is a surface in the real space $\\mathbb{R}^3$ where every point of the surface has coordinates $(x,y,z)$, with $z=f(x,y)$.\nExample. 
The function $f(x,y)=\\dfrac{xy}{2}$ that measures the area of a triangle of base $x$ and height $y$ has the graph below.\nThe function $\\displaystyle f(x,y)=\\frac{\\sin(x^2+y^2)}{\\sqrt{x^2+y^2}}$ has the peculiar graph below.\nLevel set of a scalar field Definition - Level set Given a scalar field $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$, the level set $c$ of $f$ is the set\n$$C_{f,c}=\\{(x_1,\\ldots,x_n): f(x_1,\\ldots,x_n)=c\\},$$\nthat is, a set where the function takes on the constant value $c$.\nExample. Given the scalar field $f(x,y)=x^2+y^2$ and the point $P=(1,1)$, the level set of $f$ that includes $P$ is\n$$C_{f,2} = \\{(x,y): f(x,y)=f(1,1)=2\\} = \\{(x,y): x^2+y^2=2\\},$$\nthat is the circumference of radius $\\sqrt{2}$ centred at the origin.\nLevel sets are common in applications like topographic maps, where the level curves correspond to points with the same height above the sea level,\nand weather maps (isobars), where level curves correspond to points with the same atmospheric pressure.\nPartial functions Definition - Partial function. Given a scalar field $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$, an $i$-th partial function of $f$ is any function $f_i:\\mathbb{R}\\rightarrow \\mathbb{R}$ that results from substituting all the variables of $f$ by constants, except the $i$-th variable, that is:\n$$f_i(x)=f(c_1,\\ldots,c_{i-1},x,c_{i+1},\\ldots,c_{n}),$$\nwith $c_j$ $(j=1,\\ldots, n,\\ j\\neq i)$ constants.\nExample. 
If we take the function that measures the area of a triangle\n$$f(x,y)=\\frac{xy}{2},$$ and set the value of the base to $x=c$, then the area of the triangle depends only on the height, and $f$ becomes a function of one variable, that is the partial function\n$$f_1(y)=f(c,y)=\\frac{cy}{2},\\quad \\mbox{with $c$ constant}.$$\nPartial derivative notion Variation of a function with respect to a variable We can measure the variation of a scalar field with respect to each of its variables in the same way that we measured the variation of a one-variable function.\nLet $z=f(x,y)$ be a scalar field of $\\mathbb{R}^2$. If we are at point $(x_0,y_0)$ and we increase the value of $x$ a quantity $\\Delta x$, then we move in the direction of the $x$-axis from the point $(x_0,y_0)$ to the point $(x_0+\\Delta x,y_0)$, and the variation of the function is $$\\Delta z=f(x_0+\\Delta x,y_0)-f (x_0,y_0).$$\nThus, the rate of change of the function with respect to $x$ along the interval $[x_0,x_0+\\Delta x]$ is given by the quotient\n$$\\frac{\\Delta z}{\\Delta x}=\\frac{f(x_0+\\Delta x,y_0)-f(x_0,y_0)}{\\Delta x}.$$\nInstantaneous rate of change of a scalar field with respect to a variable If instead of measuring the rate of change in an interval, we measure the rate of change in a point, that is, when $\\Delta x$ approaches 0, then we get the instantaneous rate of change that is the partial derivative with respect to $x$.\n$$\\lim_{\\Delta x\\rightarrow 0}\\frac{\\Delta z}{\\Delta x}=\\lim_{\\Delta x \\rightarrow 0}\\frac{f(x_0+\\Delta x,y_0)-f(x_0,y_0)}{\\Delta x}.$$\nThe value of this limit, if it exists, is known as the partial derivative of $f$ with respect to the variable $x$ at the point $(x_0,y_0)$; it is written as $$\\frac{\\partial f}{\\partial x}(x_0,y_0).$$\nThis partial derivative measures the instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ when $P$ moves in the $x$-axis direction.\nGeometric interpretation of partial derivatives Geometrically, a 
two-variable function $z=f(x,y)$ defines a surface. If we cut this surface with a plane of equation $y=y_0$ (that is, the plane where $y$ is the constant $y_0$) the intersection is a curve, and the partial derivative of $f$ with respect to $x$ at $(x_0,y_0)$ is the slope of the tangent line to that curve at $x=x_0$.\nInteractive Example\nPartial derivative The concept of partial derivative can be extended easily from two-variable functions to $n$-variable functions.\nDefinition - Partial derivative. Given an $n$-variable function $f(x_1,\\ldots,x_n)$, $f$ is partially differentiable with respect to the variable $x_i$ at the point $a=(a_1,\\ldots,a_n)$ if the following limit exists\n$$\\lim_{\\Delta x_i\\rightarrow 0} \\frac{f(a_1,\\ldots,a_{i-1},a_i+\\Delta x_i,a_{i+1},\\ldots,a_n)-f(a_1,\\ldots,a_{i-1},a_i,a_{i+1},\\ldots,a_n)} {\\Delta x_i}.$$\nIn such a case, the value of the limit is known as partial derivative of $f$ with respect to $x_i$ at $a$; it is denoted\n$$f\u0026rsquo;_{x_i}(a)=\\frac{\\partial f}{\\partial x_i}(a).$$\nRemark. The definition of derivative for one-variable functions is a particular case of this definition for $n=1$.\nPartial derivatives computation When we measure the variation of $f$ with respect to a variable $x_i$ at the point $a=(a_1,\\ldots,a_n)$, the other variables remain constant. Thus, if we consider the $i$-th partial function $$f_i(x_i)=f(a_1,\\ldots,a_{i-1},x_i,a_{i+1},\\ldots,a_n),$$\nthe partial derivative of $f$ with respect to $x_i$ can be computed differentiating this function:\n$$\\frac{\\partial f}{\\partial x_i}(a)=f_i\u0026rsquo;(a_i).$$\nTo differentiate partially $f(x_1,\\ldots,x_n)$ with respect to the variable $x_i$, you have to differentiate $f$ as a function of the variable $x_i$, considering the other variables as constants. Example of a perfect gas. 
Consider the function that measures the volume of a perfect gas $$v(t,p)=\\frac{nRt}{p},$$ where $t$ is the temperature, $p$ the pressure and $n$ and $R$ are constants.\nThe instantaneous rate of change of the volume with respect to the pressure is the partial derivative of $v$ with respect to $p$. To compute this derivative we have to think in $t$ as a constant and differentiate $v$ as if the unique variable was $p$:\n$$\\frac{\\partial v}{\\partial p}(t,p)=\\frac{d}{dp}\\left(\\frac{nRt}{p}\\right)_{\\mbox{$t=$cst}}=\\frac{-nRt}{p^2}.$$\nIn the same way, the instantaneous rate of change of the volume with respect to the temperature is the partial derivative of $v$ with respect to $t$:\n$$\\frac{\\partial v}{\\partial t}(t,p)=\\frac{d}{dt}\\left(\\frac{nRt}{p}\\right)_{\\mbox{$p=$cst}}=\\frac{nR}{p}.$$\nGradient Definition - Gradient. Given a scalar field $f(x_1,\\ldots,x_n)$, the gradient of $f$, denoted by $\\nabla f$, is a function that maps every point $a=(a_1,\\ldots,a_n)$ to a vector with coordinates the partial derivatives of $f$ at $a$,\n$$\\nabla f(a)=\\left(\\frac{\\partial f}{\\partial x_1}(a),\\ldots,\\frac{\\partial f}{\\partial x_n}(a)\\right).$$\nLater we will show that the gradient in a point is a vector with the magnitude and direction of the maximum rate of change of the function in that point. Thus, $\\nabla f(a)$ points to direction of maximum increase of $f$ at $a$, while $-\\nabla f(a)$ points to the direction of maximum decrease of $f$ at $a$. Example. After heating a surface, the temperature $t$ (in $^\\circ$C) at each point $(x,y,z)$ (in m) of the surface is given by the function\n$$t(x,y,z)=\\frac{x}{y}+z^2.$$\nIn what direction will increase the temperature faster at point $(2,1,1)$ of the surface? 
What magnitude will the maximum increase of temperature have?\nThe direction of maximum increase of the temperature is given by the gradient\n$$\\nabla t(x,y,z)=\\left(\\frac{\\partial t}{\\partial x}(x,y,z),\\frac{\\partial t}{\\partial y}(x,y,z),\\frac{\\partial t}{\\partial z}(x,y,z)\\right)=\\left(\\frac{1}{y},\\frac{-x}{y^2},2z\\right).$$\nAt point $(2,1,1)$ the direction is given by the vector\n$$\\nabla t(2,1,1)=\\left(\\frac{1}{1},\\frac{-2}{1^2},2\\cdot 1\\right)=(1,-2,2),$$\nand its magnitude is\n$$|\\nabla t(2,1,1)|=\\sqrt{1^2+(-2)^2+2^2}=\\sqrt{9}=3 \\mbox{ $^\\circ$C/m}.$$\nComposition of a vectorial field with a scalar field Multivariate chain rule If $f:\\mathbb{R}^n\\rightarrow \\mathbb{R}$ is a scalar field and $g:\\mathbb{R}\\rightarrow \\mathbb{R}^n$ is a vectorial function, then it is possible to compose $g$ with $f$, so that $f\\circ g:\\mathbb{R}\\rightarrow \\mathbb{R}$ is a one-variable function.\nTheorem - Chain rule. If $g(t)=(x_1(t),\\ldots,x_n(t))$ is a vectorial function differentiable at $t$ and $f(x_1,\\ldots,x_n)$ is a scalar field differentiable at the point $g(t)$, then $f\\circ g(t)$ is differentiable at $t$ and\n$$(f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t)=\\frac{\\partial f}{\\partial x_1}\\frac{dx_1}{dt}+ \\cdots + \\frac{\\partial f}{\\partial x_n}\\frac{dx_n}{dt}$$\nExample. Let us consider the scalar field $f(x,y)=x^2y$ and the vectorial function $g(t)=(\\cos t,\\sin t)$ $t\\in [0,2\\pi]$ in the real plane, then\n$$\\nabla f(x,y) = (2xy, x^2) \\quad \\mbox{and} \\quad g\u0026rsquo;(t) = (-\\sin t, \\cos t),$$\nand\n$$ \\begin{aligned} (f\\circ g)\u0026rsquo;(t) \u0026amp;= \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = (2\\cos t\\sin t,\\cos^2 t)\\cdot (-\\sin t,\\cos t) =\\newline \u0026amp;= -2\\cos t\\sin^2 t+\\cos^3 t. 
\\end{aligned} $$\nWe can get the same result differentiating the composed function directly\n$$(f\\circ g)(t) = f(g(t)) = f(\\cos t, \\sin t) = \\cos^2 t\\sin t,$$\nand its derivative is\n$$(f\\circ g)\u0026rsquo;(t) = 2\\cos t(-\\sin t)\\sin t+\\cos^2 t \\cos t = -2\\cos t\\sin^2 t+\\cos^3 t.$$\nThe chain rule for the composition of a vectorial function with a scalar field allow us to get the algebra of derivatives for one-variable functions easily:\n$$ \\begin{aligned} (u+v)\u0026rsquo; \u0026amp;= u\u0026rsquo;+v\u0026rsquo;\\newline (uv)\u0026rsquo; \u0026amp;= u\u0026rsquo;v+uv\u0026rsquo;\\newline \\left(\\frac{u}{v}\\right)\u0026rsquo; \u0026amp;= \\frac{u\u0026rsquo;v-uv\u0026rsquo;}{v^2}\\newline (u\\circ v)\u0026rsquo; \u0026amp;= u\u0026rsquo;(v)v' \\end{aligned} $$\nTo infer the derivative of the sum of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x+y$ and the vectorial function $g(t)=(u(t),v(t))$. Applying the chain rule we get\n$$(u+v)\u0026rsquo;(t) = (f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = (1,1)\\cdot (u\u0026rsquo;,v\u0026rsquo;) = u\u0026rsquo;+v\u0026rsquo;.$$\nTo infer the derivative of the quotient of two functions $u$ and $v$, we can take the scalar field $f(x,y)=x/y$ and the vectorial function $g(t)=(u(t),v(t))$.\n$$\\left(\\frac{u}{v}\\right)\u0026rsquo;(t) = (f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = \\left(\\frac{1}{v},-\\frac{u}{v^2}\\right)\\cdot (u\u0026rsquo;,v\u0026rsquo;) = \\frac{u\u0026rsquo;v-uv\u0026rsquo;}{v^2}.$$\nTangent plane and normal line to a surface Let $C$ be the level set of a scalar field $f$ that includes a point $P$. 
If $\\mathbf{v}$ is the velocity at $P$ of a trajectory following $C$, then\n$$\\nabla f(P) \\cdot \\mathbf{v} = 0.$$\nProof If we take the trajectory $g(t)$ that follows the level set $C$ and passes through $P$ at time $t=t_0$, that is $P=g(t_0)$, so $\\mathbf{v}=g\u0026rsquo;(t_0)$, then\n$$(f\\circ g)(t) = f(g(t)) = f(P),$$\nthat is constant at any $t$. Thus, applying the chain rule we have\n$$(f\\circ g)\u0026rsquo;(t) = \\nabla f(g(t))\\cdot g\u0026rsquo;(t) = 0,$$\nand, particularly, at $t=t_0$, we have\n$$\\nabla f(P)\\cdot \\mathbf{v} = 0.$$\nThat means that the gradient of $f$ at $P$ is normal to $C$ at $P$, provided that the gradient is not zero.\nNormal and tangent line to curve in the plane Normal line to a curve in the plane. According to the previous result, the normal line to a curve with equation $f(x,y)=0$ at point $P=(x_0,y_0)$, has equation\n$$P+t\\nabla f(P) = (x_0,y_0)+t\\nabla f(x_0,y_0).$$\nExample. Given the scalar field $f(x,y)=x^2+y^2-25$, and the point $P=(3,4)$, the level set of $f$ that passes through $P$, that satisfies $f(x,y)=f(P)=0$, is the circle with radius 5 centred at the origin of coordinates. Thus, taking as a normal vector the gradient of $f$\n$$\\nabla f(x,y) = (2x,2y),$$\nwhich at the point $P=(3,4)$ is $\\nabla f(3,4) = (6,8)$, the normal line to the circle at $P$ is\n$$P+t\\nabla f(P) = (3,4)+t(6,8) = (3+6t,4+8t).$$\nOn the other hand, the tangent line to the circle at $P$ is\n$$((x,y)-P)\\cdot \\nabla f(P) = ((x,y)-(3,4))\\cdot (6,8) = (x-3,y-4)\\cdot(6,8) = 6x+8y-50=0 \\Leftrightarrow 6x+8y=50.$$\nNormal line and tangent plane to a surface in the space Normal line to a surface in the space. If we have a surface with equation $f(x,y,z)=0$, at the point $P=(x_0,y_0,z_0)$ the normal line has equation\n$$P+t\\nabla f(P) = (x_0,y_0,z_0)+t\\nabla f(x_0,y_0,z_0).$$\nExample. Given the scalar field $f(x,y,z)=x^2+y^2-z$, and the point $P=(1,1,2)$, the level set of $f$ that passes through $P$, that satisfies $f(x,y,z)=f(P)=0$, is the paraboloid $z=x^2+y^2$. 
Thus, taking as a normal vector the gradient of $f$\n$$\\nabla f(x,y,z) = (2x,2y,-1),$$\nat the point $P=(1,1,2)$ is $\\nabla f(1,1,2) = (2,2,-1)$, and the normal line to the paraboloid at $P$ is\n$$ \\begin{aligned} P+t\\nabla f(P)\u0026amp;= (1,1,2)+t\\nabla f(1,1,2) = (1,1,2)+t(2,2,-1)\\newline \u0026amp;= (1+2t,1+2t,2-t). \\end{aligned} $$\nOn the other hand, the tangent plane to the paraboloid at $P$ is\n$$\\begin{aligned} ((x,y,z)-P)\\cdot \\nabla f(P) \u0026amp;= ((x,y,z)-(1,1,2))(2,2,-1) = (x-1,y-1,z-2)(2,2,-1)=\\newline \u0026amp;= 2(x-1)+2(y-1)-(z-2) = 2x+2y-z-2= 0. \\end{aligned}$$\nThe graph of the paraboloid $f(x,y,z)=x^2+y^2-z=0$ and the normal line and the tangent plane to the graph of $f$ at the point $P=(1,1,2)$ are below.\nInteractive Example\nDirectional derivative For a scalar field $f(x,y)$, we have seen that the partial derivative $\\dfrac{\\partial f}{\\partial x}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $x$ at point $P=(x_0,y_0)$, that is, when we move along the $x$-axis.\nIn the same way, $\\dfrac{\\partial f}{\\partial y}(x_0,y_0)$ is the instantaneous rate of change of $f$ with respect to $y$ at the point $P=(x_0,y_0)$, that is, when we move along the $y$-axis.\nBut, what happens if we move along any other direction?\nThe instantaneous rate of change of $f$ at the point $P=(x_0,y_0)$ along the direction of a unitary vector $u$ is known as directional derivative.\nDefinition - Directional derivative. Given a scalar field $f$ of $\\mathbb{R}^n$, a point $P$ and a unitary vector $\\mathbf{u}$ in that space, we say that $f$ is differentiable at $P$ along the direction of $\\mathbf{u}$ if exists the limit\n$$f^\\prime_{\\mathbf{u}}(P) = \\lim_{h\\rightarrow 0}\\frac{f(P+h\\mathbf{u})-f(P)}{h}.$$\nIn such a case, the value of the limit is known as directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$.\nTheorem - Directional derivative . 
Given a scalar field $f$ of $\\mathbb{R}^n$, a point $P$ and a unitary vector $\\mathbf{u}$ in that space, the directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$ can be computed as the dot product of the gradient of $f$ at $P$ and the unitary vector $\\mathbf{u}$:\n$$f^\\prime_{\\mathbf{u}}(P) = \\nabla f(P)\\cdot \\mathbf{u}.$$\nProof If we consider a unitary vector $\\mathbf{u}$, the trajectory that passes through $P$, following the direction of $\\mathbf{u}$, has equation\n$$g(t)=P+t\\mathbf{u},\\ t\\in\\mathbb{R}.$$\nFor $t=0$, this trajectory passes through the point $P=g(0)$ with velocity $\\mathbf{u}=g\u0026rsquo;(0)$.\nThus, the directional derivative of $f$ at the point $P$ along the direction of $\\mathbf{u}$ is\n$$(f\\circ g)\u0026rsquo;(0) = \\nabla f(g(0))\\cdot g\u0026rsquo;(0) = \\nabla f(P)\\cdot \\mathbf{u}.$$\nThe partial derivatives are the directional derivatives along the vectors of the canonical basis. Example. Given the function $f(x,y) = x^2+y^2$, its gradient is\n$$\\nabla f(x,y) = (2x,2y).$$\nThe directional derivative of $f$ at the point $P=(1,1)$, along the unit vector $\\mathbf{u}=(1/\\sqrt{2},1/\\sqrt{2})$ is\n$$f_{\\mathbf{u}}\u0026rsquo;(P) = \\nabla f(P)\\cdot \\mathbf{u} = (2,2)\\cdot(1/\\sqrt{2},1/\\sqrt{2}) = \\frac{2}{\\sqrt{2}}+\\frac{2}{\\sqrt{2}} = \\frac{4}{\\sqrt{2}}.$$\nTo compute the directional derivative along a non-unitary vector $\\mathbf{v}$, we have to use the unitary vector that results from normalizing $v$ with the transformation\n$$\\mathbf{v\u0026rsquo;}=\\frac{\\mathbf{v}}{|\\mathbf{v}|}.$$\nGeometric interpretation of the directional derivative Geometrically, a two-variable function $z=f(x,y)$ defines a surface. 
If we cut this surface with a plane of equation $a(y-y_0)=b(x-x_0)$ (that is, the vertical plane that passes through the point $P=(x_0,y_0)$ with the direction of vector $\\mathbf{u}=(a,b)$) the intersection is a curve, and the directional derivative of $f$ at $P$ along the direction of $\\mathbf{u}$ is the slope of the tangent line to that curve at point $P$.\nInteractive Example\nGrowth of scalar field along the gradient We have seen that for any vector $\\mathbf{u}$\n$$f^\\prime_{\\mathbf{u}}(P) = \\nabla f(P)\\cdot \\mathbf{u} = |\\nabla f(P)|\\cos \\theta,$$\nwhere $\\theta$ is the angle between $\\mathbf{u}$ and the gradient $\\nabla f(P)$.\nTaking into account that $-1\\leq \\cos\\theta\\leq 1$, for any vector $\\mathbf{u}$ it is satisfied that\n$$-|\\nabla f(P)|\\leq f\u0026rsquo;_{\\mathbf{u}}(P)\\leq |\\nabla f(P)| .$$\nFurthermore, if $\\mathbf{u}$ has the same direction and sense than the gradient, we have $f\u0026rsquo;_{\\mathbf{u}}(P)=\\vert\\nabla f(P)\\vert\\cos 0=\\vert\\nabla f(P)\\vert$. Therefore, the maximum increase of a scalar field at a point $P$ is along the direction of the gradient at that point.\nIn the same manner, if $\\mathbf{u}$ has the same direction but opposite sense than the gradient, we have $f_{\\mathbf{u}}\u0026rsquo;(P)=\\vert\\nabla f(P)\\vert\\cos \\pi=-\\vert\\nabla f(P)\\vert$. 
Therefore, the maximum decrease of a scalar field at a point $P$ is along the opposite direction of the gradient at that point.\nImplicit derivation When we have a relation $f(x,y)=0$, sometimes we can consider $y$ as an implicit function of $x$, at least in a neighbourhood of a point $(x_0,y_0)$.\nThe equation $x^2+y^2=25$, whose graph is the circle of radius 5 centred at the origin of coordinates, is not a function, because if we solve the equation for $y$, we have two images for some values of $x$,\n$$y=\\pm \\sqrt{25-x^2}$$\nHowever, near the point $(3,4)$ we can represent the relation as the function $y=\\sqrt{25-x^2}$, and near the point $(3,-4)$ we can represent the relation as the function $y=-\\sqrt{25-x^2}$.\nIf an equation $f(x,y)=0$ defines $y$ as an implicit function of $x$, $y=h(x)$, in a neighbourhood of $(x_0,y_0)$, then we can compute the derivative of $y$, $h\u0026rsquo;(x)$, even if we do not know the explicit formula for $h$.\nTheorem - Implicit derivation. Let $f(x,y):\\mathbb{R}^2\\longrightarrow \\mathbb{R}$ be a two-variable function and let $(x_0,y_0)$ be a point in $\\mathbb{R}^2$ such that $f(x_0,y_0)=0$. If $f$ has continuous partial derivatives at $(x_0,y_0)$ and $\\frac{\\partial f}{\\partial y}(x_0,y_0)\\neq 0$, then there is an open interval $I\\subset \\mathbb{R}$ with $x_0\\in I$ and a function $h(x): I\\longrightarrow \\mathbb{R}$ such that\n$y_0=h(x_0)$. $f(x,h(x))=0$ for all $x\\in I$. $h$ is differentiable on $I$, and $y\u0026rsquo;=h\u0026rsquo;(x)=\\frac{-\\dfrac{\\partial f}{\\partial x}}{\\dfrac{\\partial f}{\\partial y}}$ Proof. To prove the last result, take the trajectory $g(x)=(x,h(x))$ on the interval $I$. 
Then\n$$(f\\circ g)(x) = f(g(x)) = f(x,h(x))=0.$$\nThus, using the chain rule we have\n$$ \\begin{aligned} (f\\circ g)\u0026rsquo;(x) \u0026amp;= \\nabla f(g(x))\\cdot g\u0026rsquo;(x) = \\left(\\frac{\\partial f}{\\partial x}, \\frac{\\partial f}{\\partial y}\\right)\\cdot (1,h\u0026rsquo;(x)) = \\newline \u0026amp;= \\frac{\\partial f}{\\partial x}+\\frac{\\partial f}{\\partial y}h\u0026rsquo;(x) = 0, \\end{aligned} $$\nfrom where we can deduce\n$$y\u0026rsquo;=h\u0026rsquo;(x)=\\frac{-\\dfrac{\\partial f}{\\partial x}}{\\dfrac{\\partial f}{\\partial y}}.$$\nThis technique that allows us to compute $y\u0026rsquo;$ in a neighbourhood of $x_0$ without the explicit formula of $y=h(x)$, it is known as implicit derivation.\nExample. Consider the equation of the circle of radius 5 centred at the origin $x^2+y^2=25$. It can also be written as\n$$f(x,y) = x^2+y^2-25 = 0.$$ Take the point $(3,4)$ that satisfies the equation, $f(3,4)=0$.\nAs $f$ have partial derivatives $\\frac{\\partial f}{\\partial x}=2x$ and $\\frac{\\partial f}{\\partial y}=2y$, that are continuous at $(3,4)$, and $\\frac{\\partial f}{\\partial y}(3,4)=8\\neq 0$, then $y$ can be expressed as a function of $x$ in a neighbourhood of $(3,4)$ and its derivative is\n$$y\u0026rsquo;=\\frac{-\\frac{\\partial f}{\\partial x}}{\\frac{\\partial f}{\\partial y}} = \\frac{-2x}{2y}=\\frac{-x}{y} \\quad \\mbox{and} \\quad y\u0026rsquo;(3)=\\frac{-3}{4}.$$\nIn this particular case, that we know the explicit formula of $y=\\sqrt{1-x^2}$, we can get the same result computing the derivative as usual\n$$y\u0026rsquo; = \\frac{1}{2\\sqrt{1-x^2}}(-2x) = \\frac{-x}{\\sqrt{1-x^2}}.$$\nThe implicit function theorem can be generalized to functions with several variables.\nTheorem - Implicit derivation. Let $f(x_1,\\ldots,x_n,y):\\mathbb{R}^{n+1}\\longrightarrow \\mathbb{R}$ a $n+1$-variables function and let $(a_1,\\ldots, a_n,b)$ be a point in $\\mathbb{R}^{n+1}$ such that $f(a_1,\\ldots,a_n,b)=0$. 
If $f$ has partial derivatives continuous at $(a_1,\\ldots,a_n,b)$ and $\\frac{\\partial f}{\\partial y}(a_1,\\ldots,a_n,b)\\neq 0$, then there is a region $I\\subset \\mathbb{R}^n$ with $(a_1,\\ldots,a_n)\\in I$ and a function $h(x_1,\\ldots, x_n): I\\longrightarrow \\mathbb{R}$ such that\n$b=h(a_1,\\ldots,a_n)$. $f(x_1,\\ldots,x_n,h(x_1,\\ldots,x_n))=0$ for all $(x_1,\\ldots,x_n)\\in I$. $h$ is differentiable on $I$, and $\\dfrac{\\partial y}{\\partial x_i}=\\frac{-\\dfrac{\\partial f}{\\partial x_i}}{\\dfrac{\\partial f}{\\partial y}}$ Second order partial derivatives As the partial derivatives of a function are also functions of several variables, we can partially differentiate each of them.\nIf a function $f(x_1,\\ldots,x_n)$ has a partial derivative $f^\\prime_{x_i}(x_1,\\ldots,x_n)$ with respect to the variable $x_i$ in a set $A$, then we can partially differentiate $f_{x_i}^\\prime$ again with respect to the variable $x_j$. This second derivative, when it exists, is known as the second order partial derivative of $f$ with respect to the variables $x_i$ and $x_j$; it is written as\n$$\\frac{\\partial ^2 f}{\\partial x_j \\partial x_i}= \\frac{\\partial}{\\partial x_j}\\left(\\frac{\\partial f}{\\partial x_i}\\right).$$\nIn the same way we can define higher order partial derivatives.\nExample. 
The two-variables function $$f(x,y)=x^y$$ has 4 second order partial derivatives:\n$$ \\begin{aligned} \\frac{\\partial^2 f}{\\partial x^2}(x,y) \u0026amp;= \\frac{\\partial}{\\partial x}\\left(\\frac{\\partial f}{\\partial x}(x,y)\\right) = \\frac{\\partial}{\\partial x}\\left(yx^{y-1}\\right) = y(y-1)x^{y-2},\\newline \\frac{\\partial^2 f}{\\partial y \\partial x}(x,y) \u0026amp;= \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial f}{\\partial x}(x,y)\\right) = \\frac{\\partial}{\\partial y}\\left(yx^{y-1}\\right) = x^{y-1}+yx^{y-1}\\log x,\\newline \\frac{\\partial^2 f}{\\partial x \\partial y}(x,y) \u0026amp;= \\frac{\\partial}{\\partial x}\\left(\\frac{\\partial f}{\\partial y}(x,y)\\right) = \\frac{\\partial}{\\partial x}\\left(x^y\\log x \\right) = yx^{y-1}\\log x+x^y\\frac{1}{x},\\newline \\frac{\\partial^2 f}{\\partial y^2}(x,y) \u0026amp;= \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial f}{\\partial y}(x,y)\\right) = \\frac{\\partial}{\\partial y}\\left(x^y\\log x \\right) = x^y(\\log x)^2. \\end{aligned} $$\nHessian matrix and Hessian Definition - Hessian matrix. 
Given a scalar field $f(x_1,\\ldots,x_n)$, with second order partial derivatives at the point $a=(a_1,\\ldots,a_n)$, the Hessian matrix of $f$ at $a$, denoted by $\\nabla^2f(a)$, is the matrix\n$$ \\nabla^2f(a)=\\left( \\begin{array}{cccc} \\dfrac{\\partial^2 f}{\\partial x_1^2}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_1 \\partial x_2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_1 \\partial x_n}(a)\\newline \\dfrac{\\partial^2 f}{\\partial x_2 \\partial x_1}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_2^2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_2 \\partial x_n}(a)\\newline \\vdots \u0026amp; \\vdots \u0026amp; \\ddots \u0026amp; \\vdots \\newline \\dfrac{\\partial^2 f}{\\partial x_n \\partial x_1}(a) \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_n \\partial x_2}(a) \u0026amp; \\cdots \u0026amp; \\dfrac{\\partial^2 f}{\\partial x_n^2}(a) \\end{array} \\right) $$\nThe determinant of this matrix is known as Hessian of $f$ at $a$; it is denoted $Hf(a)=\\vert\\nabla^2f(a)\\vert$.\nExample. Consider again the two-variables function\n$$f(x,y)=x^y.$$\nIts Hessian matrix is\n$$ \\nabla^2f(x,y) = \\left( \\begin{array}{cc} \\dfrac{\\partial^2 f}{\\partial x^2} \u0026amp; \\dfrac{\\partial^2 f}{\\partial x \\partial y}\\newline \\dfrac{\\partial^2 f}{\\partial y \\partial x} \u0026amp; \\dfrac{\\partial^2 f}{\\partial y^2} \\end{array} \\right) = \\left(\\begin{array}{cc} y(y-1)x^{y-2} \u0026amp; x^{y-1}(y\\log x+1) \\newline x^{y-1}(y\\log x+1) \u0026amp; x^y(\\log x)^2 \\end{array} \\right). $$\nAt point $(1,2)$ is\n$$ \\nabla^2 f(1,2) = \\left( \\begin{array}{cc} 2(2-1)1^{2-2} \u0026amp; 1^{2-1}(2\\log 1+1) \\newline 1^{2-1}(2\\log 1+1) \u0026amp; 1^2(\\log 1)^2 \\end{array} \\right) = \\left( \\begin{array}{cc} 2 \u0026amp; 1 \\newline 1 \u0026amp; 0 \\end{array} \\right). 
$$\nAnd its Hessian is\n$$ Hf(1,2)=\\left| \\begin{array}{cc} 2 \u0026amp; 1 \\newline 1 \u0026amp; 0 \\end{array} \\right|= 2\\cdot 0-1\\cdot1= -1. $$\nSymmetry of second partial derivatives In the previous example we can observe that the mixed derivatives of second order $\\frac{\\partial^2 f}{\\partial y\\partial x}$ and $\\frac{\\partial^2 f}{\\partial x\\partial y}$ are the same. This fact is due to the following result.\nTheorem - Symmetry of second partial derivatives. If $f(x_1,\\ldots,x_n)$ is a scalar field with second order partial derivatives $\\frac{\\partial^2 f}{\\partial x_i\\partial x_j}$ and $\\frac{\\partial^2 f}{\\partial x_j\\partial x_i}$ continuous at a point $(a_1,\\ldots,a_n)$, then\n$$\\frac{\\partial^2 f}{\\partial x_i\\partial x_j}(a_1,\\ldots,a_n)=\\frac{\\partial^2 f}{\\partial x_j\\partial x_i}(a_1,\\ldots,a_n).$$\nThis means that when computing a mixed second order partial derivative, the order of differentiation does not matter.\nAs a consequence, if the function satisfies the requirements of the theorem for all the second order partial derivatives, the Hessian matrix is symmetric.\nTaylor polynomials Linear approximation of a scalar field In a previous chapter we saw how to approximate a one-variable function with a Taylor polynomial. This can be generalized to functions of several variables.\nIf $P$ is a point in the domain of a scalar field $f$ and $\\mathbf{v}$ is a vector, the first degree Taylor formula of $f$ around $P$ is\n$$f(P+\\mathbf{v}) = f(P) + \\nabla f(P)\\cdot \\mathbf{v} +R^1_{f,P}(\\mathbf{v}),$$\nwhere\n$$P^1_{f,P}(\\mathbf{v}) = f(P)+\\nabla f(P)\\cdot\\mathbf{v}$$\nis the first degree Taylor polynomial of $f$ at $P$, and $R^1_{f,P}(\\mathbf{v})$ is the Taylor remainder for the vector $\\mathbf{v}$, that is, the error in the approximation.\nThe remainder satisfies\n$$\\lim_{|\\mathbf{v}|\\rightarrow 0} \\frac{R^1_{f,P}(\\mathbf{v})}{|\\mathbf{v}|} = 0$$\nFor a function of two variables, the graph of the first degree Taylor polynomial is the tangent plane to the graph of $f$ at $P$. 
Linear approximation of a two-variable function If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, since for any point $Q=(x,y)$ we can take the vector $\\mathbf{v}=\\vec{PQ}=(x-x_0,y-y_0)$, the first degree Taylor polynomial of $f$ at $P$ can be written as\n$$ \\begin{aligned} P^1_{f,P}(x,y) \u0026amp;= f(x_0,y_0)+\\nabla f(x_0,y_0)(x-x_0,y-y_0) =\\newline \u0026amp;= f(x_0,y_0)+\\frac{\\partial f}{\\partial x}(x_0,y_0)(x-x_0)+\\frac{\\partial f}{\\partial y}(x_0,y_0)(y-y_0). \\end{aligned} $$\nExample. Given the scalar field $f(x,y)=\\log(xy)$, its gradient is\n$$\\nabla f(x,y) = \\left(\\frac{1}{x},\\frac{1}{y}\\right),$$\nand the first degree Taylor polynomial at the point $P=(1,1)$ is\n$$\\begin{aligned} P^1_{f,P}(x,y) \u0026amp;= f(1,1) +\\nabla f(1,1)\\cdot (x-1,y-1) = \\newline \u0026amp;= \\log 1+(1,1)\\cdot(x-1,y-1) = x-1+y-1 = x+y-2. \\end{aligned}$$\nThis polynomial approximates $f$ near the point $P$. For instance,\n$$f(1.01,1.01) \\approx P^1_{f,P}(1.01,1.01) = 1.01+1.01-2 = 0.02.$$\nThe graph of the scalar field $f(x,y)=\\log(xy)$ and the first degree Taylor polynomial of $f$ at the point $P=(1,1)$ is below.\nQuadratic approximation of a scalar field If $P$ is a point in the domain of a scalar field $f$ and $\\mathbf{v}$ is a vector, the second degree Taylor formula of $f$ around $P$ is\n$$f(P+\\mathbf{v}) = f(P) + \\nabla f(P)\\cdot \\mathbf{v} + \\frac{1}{2}\\left(\\mathbf{v}\\nabla^2f(P)\\mathbf{v}\\right) + R^2_{f,P}(\\mathbf{v}),$$\nwhere\n$$P^2_{f,P}(\\mathbf{v}) = f(P)+\\nabla f(P)\\cdot\\mathbf{v}+\\frac{1}{2}\\left(\\mathbf{v}\\nabla^2f(P)\\mathbf{v}\\right)$$\nis the second degree Taylor polynomial of $f$ at the point $P$, and $R^2_{f,P}(\\mathbf{v})$ is the Taylor remainder for the vector $\\mathbf{v}$, that is, the error in the approximation.\nThe remainder satisfies\n$$\\lim_{|\\mathbf{v}|\\rightarrow 0} \\frac{R^2_{f,P}(\\mathbf{v})}{|\\mathbf{v}|^2} = 0.$$\nThis means that the remainder is smaller than the square of the 
norm of $\\mathbf{v}$.\nQuadratic approximation of a two-variable function If $f$ is a scalar field of two variables $f(x,y)$ and $P=(x_0,y_0)$, then the second degree Taylor polynomial of $f$ at $P$ can be written as\n$$ \\begin{aligned} P^2_{f,P}(x,y) \u0026amp;= f(x_0,y_0)+\\nabla f(x_0,y_0)(x-x_0,y-y_0) + \\newline \u0026amp; + \\frac{1}{2}(x-x_0,y-y_0)\\nabla^2f(x_0,y_0)(x-x_0,y-y_0)= \\newline \u0026amp; = f(x_0,y_0)+\\frac{\\partial f}{\\partial x}(x_0,y_0)(x-x_0)+\\frac{\\partial f}{\\partial y}(x_0,y_0)(y-y_0)+ \\newline \u0026amp; + \\frac{1}{2}\\left(\\frac{\\partial^2 f}{\\partial x^2}(x_0,y_0) (x-x_0)^2 + 2\\frac{\\partial^2 f}{\\partial y\\partial x}(x_0,y_0)(x-x_0)(y-y_0) + \\right. \\newline \u0026amp; + \\left. \\frac{\\partial^2 f}{\\partial y^2}(x_0,y_0)(y-y_0)^2\\right) \\end{aligned} $$\nExample. Given the scalar field $f(x,y)=\\log(xy)$, its gradient is\n$$\\nabla f(x,y) = \\left(\\frac{1}{x},\\frac{1}{y}\\right),$$\nits Hessian matrix is\n$$\\nabla^2f(x,y) = \\left( \\begin{array}{cc} \\frac{-1}{x^2} \u0026amp; 0\\newline 0 \u0026amp; \\frac{-1}{y^2} \\end{array} \\right)$$\nand the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ is\n$$\\begin{aligned} P^2_{f,P}(x,y) \u0026amp;= f(1,1) +\\nabla f(1,1)\\cdot (x-1,y-1) +\\newline \u0026amp;+ \\frac{1}{2}(x-1,y-1)\\nabla^2f(1,1)\\cdot(x-1,y-1)=\\newline \u0026amp;= \\log 1+(1,1)\\cdot(x-1,y-1) +\\newline \u0026amp;+ \\frac{1}{2}(x-1,y-1) \\left( \\begin{array}{cc} -1 \u0026amp; 0\\newline 0 \u0026amp; -1 \\end{array} \\right) \\left( \\begin{array}{c} x-1\\newline y-1 \\end{array} \\right) = \\newline \u0026amp;= x-1+y-1+\\frac{-x^2-y^2+2x+2y-2}{2} =\\newline \u0026amp;= \\frac{-x^2-y^2+4x+4y-6}{2}. \\end{aligned}$$\nThus, $$ \\begin{aligned} f(1.01,1.01) \\approx P^2_{f,P}(1.01,1.01) \u0026amp;= \\frac{-1.01^2-1.01^2+4\\cdot 1.01+4\\cdot 1.01-6}{2} \\newline \u0026amp;= 0.0199. 
\\end{aligned} $$\nThe graph of the scalar field $f(x,y)=\\log(xy)$ and the second degree Taylor polynomial of $f$ at the point $P=(1,1)$ is below.\nInteractive Example\nRelative extrema Definition - Relative extrema. A scalar field $f$ in $\\mathbb{R}^n$ has a relative maximum at a point $P$ if there is a value $\\epsilon\u0026gt;0$ such that\n$$f(P)\\geq f(X)\\ \\forall X, |\\vec{PX}|\u0026lt;\\epsilon.$$\n$f$ has a relative minimum at $P$ if there is a value $\\epsilon\u0026gt;0$ such that\n$$f(P)\\leq f(X)\\ \\forall X, |\\vec{PX}|\u0026lt;\\epsilon.$$\nBoth relative maxima and minima are known as relative extrema of $f$.\nCritical points Theorem - Critical points. If a scalar field $f$ in $\\mathbb{R}^n$ has a relative maximum or minimum at a point $P$, then $P$ is a critical or stationary point of $f$, that is, a point where the gradient vanishes\n$$\\nabla f(P) = 0.$$\nProof. Taking the trajectory that passes through $P$ with the direction of the gradient at that point $$g(t)=P+t\\nabla f(P),$$ the function $h=(f\\circ g)(t)$ does not decrease at $t=0$ since\n$$h\u0026rsquo;(0)= (f\\circ g)\u0026rsquo;(0) = \\nabla f(g(0))\\cdot g\u0026rsquo;(0) = \\nabla f(P)\\cdot \\nabla f(P) = |\\nabla f(P)|^2\\geq 0,$$\nand it only vanishes if $\\nabla f(P)=0$.\nThus, if $\\nabla f(P)\\neq 0$, $f$ cannot have a relative maximum at $P$ since following the trajectory of $g$ from $P$ there are points where $f$ has an image greater than the image at $P$. In the same way, following the trajectory of $g$ in the opposite direction there are points where $f$ has an image less than the image at $P$, so $f$ cannot have a relative minimum at $P$.\nExample. 
Given the scalar field $f(x,y)=x^2+y^2$, it is obvious that $f$ only has a relative minimum at $(0,0)$ since\n$$f(0,0)=0 \\leq f(x,y)=x^2+y^2,\\ \\forall x,y\\in \\mathbb{R}.$$\nIt is easy to check that $f$ has a critical point at $(0,0)$, that is $\\nabla f(0,0) = 0$.\nSaddle points Not all the critical points of a scalar field are points where the scalar field has relative extrema. If we take, for instance, the scalar field $f(x,y)=x^2-y^2$, its gradient is\n$$\\nabla f(x,y) = (2x,-2y),$$\nthat only vanishes at $(0,0)$. However, this point is not a relative maximum since the points $(x,0)$ in the $x$-axis have images $f(x,0)=x^2\\geq 0=f(0,0)$, nor a relative minimum since the points $(0,y)$ in the $y$-axis have images $f(0,y)=-y^2\\leq 0=f(0,0)$. Critical points of this type, which are not relative extrema, are known as saddle points.\nAnalysis of the relative extrema From the second degree Taylor’s formula of a scalar field $f$ at a point $P$ we have\n$$f(P+\\mathbf{v})-f(P)\\approx \\nabla f(P)\\mathbf{v}+\\frac{1}{2}\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}.$$\nThus, if $P$ is a critical point of $f$, as $\\nabla f(P)=0$, we have\n$$f(P+\\mathbf{v})-f(P)\\approx \\frac{1}{2}\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}.$$\nTherefore, the sign of $f(P+\\mathbf{v})-f(P)$ is the sign of the second degree term $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}$.\nThere are four possibilities:\nDefinite positive: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026gt;0$ $\\forall \\mathbf{v}\\neq 0$.\nDefinite negative: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026lt;0$ $\\forall \\mathbf{v}\\neq 0$.\nIndefinite: $\\nabla^2f(P)\\mathbf{v}\\cdot\\mathbf{v}\u0026gt;0$ for some $\\mathbf{v}\\neq 0$ and $\\nabla^2f(P)\\mathbf{u}\\cdot\\mathbf{u}\u0026lt;0$ for some $\\mathbf{u}\\neq 0$.\nSemidefinite: In any other case.\nThus, depending on the sign of $\\nabla^2 f(P)\\mathbf{v}\\cdot\\mathbf{v}$, we have\nTheorem. 
Given a critical point $P$ of a scalar field $f$, it holds that\nIf $\\nabla^2f(P)$ is definite positive then $f$ has a relative minimum at $P$. If $\\nabla^2f(P)$ is definite negative then $f$ has a relative maximum at $P$. If $\\nabla^2f(P)$ is indefinite then $f$ has a saddle point at $P$. When $\\nabla^2f(P)$ is semidefinite we cannot draw any conclusion and we need higher order partial derivatives to classify the critical point.\nAnalysis of the relative extrema of a scalar field in $\\mathbb{R}^2$ In the particular case of a scalar field of two variables, we have\nTheorem. Given a critical point $P=(x_0,y_0)$ of a scalar field $f(x,y)$, it holds that\nIf $Hf(P)\u0026gt;0$ and $\\dfrac{\\partial^2 f}{\\partial x^2}(x_0,y_0)\u0026gt;0$ then $f$ has a relative minimum at $P$. If $Hf(P)\u0026gt;0$ and $\\dfrac{\\partial^2 f}{\\partial x^2}(x_0,y_0)\u0026lt;0$ then $f$ has a relative maximum at $P$. If $Hf(P)\u0026lt;0$ then $f$ has a saddle point at $P$. Example. Given the scalar field $f(x,y)=\\dfrac{x^3}{3}-\\dfrac{y^3}{3}-x+y$, its gradient is\n$$\\nabla f(x,y)= (x^2-1,-y^2+1),$$\nand it has critical points at $(1,1)$, $(1,-1)$, $(-1,1)$ and $(-1,-1)$.\nThe Hessian matrix is\n$$\\nabla^2f(x,y) = \\left( \\begin{array}{cc} 2x \u0026amp; 0\\newline 0 \u0026amp; -2y \\end{array} \\right)$$\nand the Hessian is\n$$Hf(x,y) = -4xy.$$\nThus, we have\nPoint $(1,1)$: $Hf(1,1)=-4\u0026lt;0 \\Rightarrow$ Saddle point.\nPoint $(1,-1)$: $Hf(1,-1)=4\u0026gt;0$ and $\\frac{\\partial^2 f}{\\partial x^2}(1,-1)=2\u0026gt;0 \\Rightarrow$ Relative min.\nPoint $(-1,1)$: $Hf(-1,1)=4\u0026gt;0$ and $\\frac{\\partial^2 f}{\\partial x^2}(-1,1)=-2\u0026lt;0 \\Rightarrow$ Relative max.\nPoint $(-1,-1)$: $Hf(-1,-1)=-4\u0026lt;0 \\Rightarrow$ Saddle point.\nThe graph of the function $f(x,y)=\\dfrac{x^3}{3}-\\dfrac{y^3}{3}-x+y$ and its relative extrema and saddle points are shown 
below.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1631523540,"objectID":"b82b8cd53b4f3645f020f7b3a65480d0","permalink":"/en/teaching/calculus/manual/derivatives-n-variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/calculus/manual/derivatives-n-variables/","section":"teaching","summary":"Vector functions of a single real variable Definition - Vector function of a single real variable. A vector function of a single real variable or vector field of a scalar variable is a function that maps every scalar value $t\\in D\\subseteq \\mathbb{R}$ into a vector $(x_1(t),\\ldots,x_n(t))$ in $\\mathbb{R}^n$:","tags":["Partial Derivative","Gradient","Tangent Line","Normal Line","Tangent Plane","Normal Plane","Hessian Matrix","Extrema"],"title":"Several variables differentiable calculus","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Let $X$ be a discrete random variable with the following probability distribution\n$$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \\newline \\hline f(x) \u0026amp; 0.15 \u0026amp; 0.35 \u0026amp; 0.10 \u0026amp; 0.25 \u0026amp; 0.15 \\newline \\hline \\end{array} $$\nCalculate and represent graphically the distribution function. Calculate the following probabilities a. $P(X\u0026lt;7.5)$. b. $P(X\u0026gt;8)$. c. $P(4\\leq X\\leq 6.5)$. d. $P(5\u0026lt;X\u0026lt;6)$. Solution $$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;4$,}\\newline 0.15 \u0026amp; \\text{if $4\\leq x\u0026lt;5$,}\\newline 0.5 \u0026amp; \\text{if $5\\leq x\u0026lt;6$,}\\newline 0.6 \u0026amp; \\text{if $6\\leq x\u0026lt;7$,}\\newline 0.85 \u0026amp; \\text{if $7\\leq x\u0026lt;8$,}\\newline 1 \u0026amp; \\text{if $8\\leq x$.} \\end{cases} $$ $P(X\u0026lt;7.5)=0.85$, $P(X\u0026gt;8)=0$, $P(4\\leq x\\leq 6.5)=0.6$ and $P(5\u0026lt;X\u0026lt;6)=0$. 
Exercise 2 Let $X$ be a discrete random variable with the following distribution function\n$$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;1$,} \\newline 1/5 \u0026amp; \\text{if $1\\leq x\u0026lt; 4$,} \\newline 3/4 \u0026amp; \\text{if $4\\leq x\u0026lt;6$,} \\newline 1 \u0026amp; \\text{if $6\\leq x$.} \\end{cases} $$\nCalculate the probability function. Calculate the following probabilities a. $P(X=6)$. b. $P(X=5)$. c. $P(2\u0026lt;X\u0026lt;5.5)$. d. $P(0\\leq X\u0026lt;4)$. Calculate the mean. Calculate the standard deviation. Solution $$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 1 \u0026amp; 4 \u0026amp; 6 \\newline \\hline f(x) \u0026amp; 0.2 \u0026amp; 0.55 \u0026amp; 0.25 \\newline \\hline \\end{array} $$\n$P(X=6)= 0.25$, $P(X=5)=0$, $P(2\u0026lt;X\u0026lt;5.5)=0.55$ and $P(0\\leq X\u0026lt;4)=0.2$.\n$\\mu=3.9$.\n$\\sigma=1.6703$.\nExercise 3 An experiment consists in injecting a virus into three rats and checking whether they survive or not. It is known that the probability of surviving is $0.5$ for the first rat, $0.4$ for the second and $0.3$ for the third.\nCalculate the probability function of the variable $X$ that measures the number of surviving rats. Calculate the distribution function. Calculate $P(X\\leq 1)$, $P(X\\geq 2)$ and $P(X=1.5)$. Calculate the mean and the standard deviation. Is the mean representative? Solution $$ \\begin{array}{|c|c|c|c|c|} \\hline X \u0026amp; 0 \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \\newline \\hline f(x) \u0026amp; 0.21 \u0026amp; 0.44 \u0026amp; 0.29 \u0026amp; 0.06\\newline \\hline \\end{array} $$ $$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;0$,}\\newline 0.21 \u0026amp; \\text{if $0\\leq x\u0026lt;1$,}\\newline 0.65 \u0026amp; \\text{if $1\\leq x\u0026lt;2$,}\\newline 0.94 \u0026amp; \\text{if $2\\leq x\u0026lt;3$,}\\newline 1 \u0026amp; \\text{if $3\\leq x$.} \\end{cases} $$\n$P(X\\leq 1)=0.65$, $P(X\\geq 2)=0.35$ and $P(X=1.5)=0$. 
$\\mu=1.2$ rats, $\\sigma^2=0.7$ rats$^2$ and $\\sigma=0.84$ rats. Exercise 4 The chance of being cured with a certain treatment is 0.85. If we apply the treatment to 6 patients,\nWhat is the probability that half of them get cured? What is the probability that at least 4 of them get cured? Solution Let $X$ be the number of cured patients,\n$P(X=3) = 0.0415$. $P(X\\geq 4)= 0.9527$. Exercise 5 Ten persons came into contact with a person infected with tuberculosis. The probability of being infected after contacting a person with tuberculosis is 0.1.\nWhat is the probability that nobody is infected? What is the probability that at least 2 persons are infected? What is the expected number of infected persons? Solution Let $X$ be the number of persons infected,\n$P(X=0) = 0.3487$. $P(X\\geq 2)= 0.2639$. $\\mu=1$. Exercise 6 The probability of suffering an adverse reaction to a vaccine is 0.001. If 2000 persons are vaccinated, what is the probability of suffering some adverse reaction?\nSolution Let $X$ be the number of adverse reactions, $P(X\\geq 1)=0.8648$. Exercise 7 The average number of calls per minute received by a telephone switchboard is 120.\nWhat is the probability of receiving less than 4 calls in 2 seconds? What is the probability of receiving at least 3 calls in 3 seconds? Solution Let $X$ be the number of calls in 2 seconds, $P(X\u0026lt;4)=0.4335$. Let $Y$ be the number of calls in 3 seconds, $P(Y\\geq 3)= 0.938$. Exercise 8 A test contains 10 questions with 3 possible options each. For every question you get a point if you give the right answer and lose half a point if the answer is wrong. A student knows the right answer for 3 of the 10 questions and answers the rest randomly. What is the probability of passing the exam?\nSolution Let $X$ be the number of correct answers in questions randomly answered, $P(X\\geq 4)=0.1733$. 
Exercise 9 It has been observed experimentally that 1 of every 20 trillion cells exposed to radiation mutates, becoming carcinogenic. We know that the human body has approximately 1 trillion cells per kilogram of tissue. Calculate the probability that a 60 kg person exposed to radiation develops cancer. If the radiation affects 3 persons weighing 60 kg, what is the probability that at least one of them develops cancer?\nSolution Let $X$ be the number of cells mutated, $P(X\u0026gt;0)=0.9502$.\nLet $Y$ be the number of persons developing cancer, $P(Y\\geq 1) = 0.9999$. Exercise 10 A diagnostic test for a disease returns 1% of positive outcomes, and the positive and negative predictive values are 0.95 and 0.98 respectively.\nCalculate the prevalence of the disease. Calculate the sensitivity and the specificity of the test. If the test is applied to 12 sick persons, what is the probability of getting at least one wrong diagnosis? If the test is applied to 12 persons, what is the probability of getting a right diagnosis for all of them? Solution $P(D)=0.0293$. Sensitivity $P(+\\vert D)=0.3242$ and specificity $P(-\\vert \\bar D)=0.9995$. Let $X$ be the number of wrong diagnoses in 12 sick persons, $P(X\\geq 1)=1$. Let $Y$ be the number of right diagnoses in 12 persons, $P(Y=12)=0.7818$. Exercise 11 In a study about a parasite that attacks the kidney of rats it is known that the average number of parasites per kidney is 3.\nCalculate the probability that a rat has more than 3 parasites. Calculate the probability of having at least 9 rats infected in a sample of 10 rats. Solution Let $X$ be the number of parasites in a rat, $P(X\u0026gt;3)=0.8488$. Let $Y$ be the number of rats with parasites in a sample of 10 rats, $P(Y\\geq 9)=0.9997$. Exercise 12 In a physiotherapy course there are 60% of females and 40% of males.\nIf 6 random students have to go to a hospital for practical training, what is the probability that more males than females go? 
In 5 samples of 6 students, what is the probability of having some sample without males? Solution Let $X$ be the number of females in a group of 6 students, $P(X\u0026lt;3)=0.1792$. Let $Y$ be the number of groups of 6 students without males in a sample of 5 groups, $P(Y\u0026gt;0) =0.2125$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a1636b6696a9581dbd7c28189956fd1d","permalink":"/en/teaching/statistics/problems/discrete_random_variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/discrete_random_variables/","section":"teaching","summary":"Exercise 1 Let $X$ be a discrete random variable with the following probability distribution\n$$ \\begin{array}{|c|c|c|c|c|c|} \\hline X \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \\newline \\hline f(x) \u0026amp; 0.","tags":["Random Variables","Discrete Random Variables"],"title":"Problems of Discrete Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Exercise 1 Given the continuous random variable $X$ with the following probability density function chart, Check that $f(x)$ is a probability density function. Calculate the following probabilities a. $P(X\u0026lt;1)$ b. $P(X\u0026gt;0)$ c. $P(X=1/4)$ d. $P(1/2\\leq X\\leq 3/2)$ Calculate the distribution function. Solution $P(X\u0026lt;1)=0.5$, $P(X\u0026gt;0)=1$, $P(X=1/4)=0$ and $P(1/2\\leq X\\leq 3/2)=0.875$. $$ F(x)= \\begin{cases} 0 \u0026amp; \\text{if $x\u0026lt;0$,} \\newline x^2/2 \u0026amp; \\text{if $0\\leq x\u0026lt; 1$,} \\newline x-0.5 \u0026amp; \\text{if $1\\leq x\u0026lt;1.5$,} \\newline 1 \u0026amp; \\text{if $1.5\\leq x$.} \\end{cases} $$\nExercise 2 A worker can arrive to the workplace at any moment between 6 and 7 in the morning with the same likelihood.\nCompute and plot the probability density function of the variable that measures the arrival time. 
compute and plot the distribution function. Compute the probability of arriving before quarter past six and after half past six. What is the expected arrival time? Solution $P(X\u0026lt;6.25)=0.25$ and $P(X\u0026gt;6.5)=0.5$. $\\mu=6.5$. Exercise 3 Let $Z$ be a random variable following a standard normal distribution model. Calculate the following probabilities using the table of the distribution function:\n$P(Z\u0026lt;1.24)$ $P(Z\u0026gt;-0.68)$ $P(-1.35\\leq Z\\leq 0.44)$ Solution $P(Z\u0026lt;1.24)=0.8925$. $P(Z\u0026gt;-0.68)=0.7517$. $P(-1.35\\leq Z\\leq 0.44)=0.5815$. Exercise 4 Let $Z$ be a random variable following a standard normal distribution model. Determine the value of $x$ in the following cases using the table of the distribution function:\n$P(Z\u0026lt;x)=0.6406$. $P(Z\u0026gt;x)=0.0606$. $P(0\\leq Z\\leq x)=0.4783$. $P(-1.5\\leq Z\\leq x)=0.2313$. $P(-x\\leq Z\\leq x)=0.5467$. Solution $x=0.3601$. $x=1.5498$. $x=2.0198$. $x=-0.5299$. $x=0.7499$. Exercise 5 Let $X$ be a random variable following a normal distribution model $N(10,2)$.\nCalculate $P(X\\leq 10)$. Calculate $P(8\\leq X\\leq 14)$. Calculate the interquartile range. Calculate the third decile. Solution $P(X\\leq 10)=0.5$. $P(8\\leq X\\leq 14)=0.8186$. $IQR=2.698$. $D_3=8.9512$. Exercise 6 It is known that the glucose level in blood of diabetic persons follows a normal distribution model with mean 106 mg/100 ml and standard deviation 8 mg/100 ml.\nCalculate the probability of a random diabetic person having a glucose level less than 120 mg/100 ml. What percentage of persons have a glucose level between 90 and 120 mg/100 ml? Calculate and interpret the first quartile of the glucose level. Solution $P(X\\leq 120)=0.9599$. $P(90\\leq X\\leq 120)=0.9372 \\Rightarrow 93.72%$. $Q_1=100.6041$ mg/100 ml. Exercise 7 It is known that the cholesterol level in males 30 years old follows a normal distribution with mean 220 mg/dl and standard deviation 30 mg/dl. 
If there are 20000 males 30 years old in the population,\nhow many of them have a cholesterol level between 210 and 240 mg/dl? If a cholesterol level greater than 250 mg/dl can provoke a thrombosis, how many of them are at risk of thrombosis? Calculate the cholesterol level above which 20% of the males are? Solution $P(210\\leq X\\leq 240)=0.3781 \\Rightarrow 7561.3$ persons. $P(X\u0026gt; 250)=0.1587 \\Rightarrow 3173.1$ persons. $P_{80}=245.2486$ mg/dl. Exercise 8 In an exam done by 100 students, the average grade is 4.2 and only 32 students pass. Assuming that the grade follows a normal distribution model, how many students got a grade greater than 7?\nSolution $P(X\u0026gt;7)=0.0508 \\Rightarrow 5.1$ students. Exercise 9 In a population with 40000 persons, 2276 have between 0.8 and 0.84 milligrams of bilirubin per deciliter of blood, and 11508 have more than 0.84. Assuming that the level of bilirubin in blood follows a normal distribution model,\nCalculate the mean and the standard deviation. How many persons have more than 1 mg of bilirubin per dl of blood? Solution $\\mu=0.7001$ mg/dl and $s=0.2497$ mg/dl. $P(X\u0026gt;1)=0.1149 \\Rightarrow 11.5$ persons. Exercise 10 It is known that the blood pressure of people in a population with 20000 persons follows a normal distribution model with mean 13 mm Hg and interquartile range 4 mm Hg.\nHow many persons have a blood pressure above 16 mm Hg? How much have to decrease the blood pressure of a person with 16 mm Hg in order to be below the 40% of people with lowest blood pressure? Solution $P(X\u0026gt;16)=0.1587 \\Rightarrow 3174$ persons. $D_4 = 12.25$ mm Hg, so, must decrease a least $3.75$ mm Hg. Exercise 11 A study tries to determine the effect of a low fat diet in the lifetime of rats. The rats where divided into two groups, one with a normal diet and another with a low fat diet. It is assumed that the lifetimes of both groups are normally distributed with the same variance but different mean. 
If 20% of rats with normal diet lived more than 12 months, 5% less than 8 months, and 85% of rats with low fat diet lived more than 11 months,\nwhat is the mean and the standard deviation of the lifetime of rats following a low fat diet? If 40% of the rats were under a normal diet, and 60% of rats under a low fat diet, what is the probability that a random rat die before 9 months? Solution Naming $X_1$ and $X_2$ to the lifetime of rats with a normal diet and a low fat diet respectively,\n$\\mu_2=12.6673$ months and $s=1.6087$ months. $P(X\u0026lt;9)=0.068$. Exercise 12 A diagnostic test to determine doping of athletes returns a positive outcome when the concentration of a substance in blood is greater than 4 $\\mu$g/ml. If the distribution of the substance concentration in doped athletes follows a normal distribution model with mean 4.5 $\\mu$g/ml and standard deviation 0.2 $\\mu$g/ml, and in non-doped athletes is normally distributed with mean 3 $\\mu$g/ml and standard deviation 0.3 $\\mu$g/ml,\nwhat is the sensitivity and specificity of the test? If there is a 10% of doped athletes in a competition, what is the positive predicted value? Solution Naming $D$ to the event of being doped, $X$ to the concentration in doped athletes and $Y$ to the concentration in non-doped athletes,\nSensitivity $P(+\\vert D) = P(X\u0026gt;4)=0.9938$ and specificity $P(-\\vert \\bar D)=P(Y\u0026lt;4)=0.9996$ PPV $P(D\\vert +) = 0.9961$. Exercise 13 According to the central limit theorem, for big samples ($n\\geq 30$) the sample mean $\\bar x$ follows a normal distribution model $N(\\mu,\\sigma/\\sqrt{n})$, where $\\mu$ is the population mean and $\\sigma$ the population standard deviation.\nIt is known that in a population the sural triceps elongation follows has mean 60 cm and standard deviation 15 cm. If you draw a sample of 30 individuals from this population, what is the probability of having a sample mean greater than 62 cm? 
If a sample is atypical if its mean is below the 5th percentile, is atypical a sample of 60 individuals with $\\bar x=57$?\nSolution $P(\\bar x\u0026gt;62) = 0.2326$.\n$P_{5}=56.8148$, so, the sample is non-atypical. Exercise 14 The curing time of a knee injury in soccer players follows a normal distribution model with mean 50 days and standard deviation 10 days. If there is a final match in 65 days, what is the probability that a player that has just injured his knee will miss the final? If the semifinal match is in 40 days, and 4 players has just injured the knee, what is the probability that some of them can play the semifinal?\nSolution Let $X$ be the curing time, $P(X\u0026gt;65)=0.0668$.\nLet $Y$ be the number of injured players that could play the semifinal, $P(Y\\geq 1)=0.4989$. ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ecd113b6c57caed03f719da29b66a46b","permalink":"/en/teaching/statistics/problems/continuous_random_variables/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/problems/continuous_random_variables/","section":"teaching","summary":"Exercise 1 Given the continuous random variable $X$ with the following probability density function chart, Check that $f(x)$ is a probability density function. Calculate the following probabilities a. $P(X\u0026lt;1)$ b. $P(X\u0026gt;0)$ c.","tags":["Random Variables","Continuous Random Variables"],"title":"Problems of Continuous Random Variables","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 6, 2022\nQuestion 1 The patients of a physiotherapy clinic were asked to assess their satisfaction in a scale from 0 to 10. 
The assessments are summarized in the table below.\n$$\\begin{array}{lr} \\hline \\mbox{Assessment} \u0026amp; \\mbox{Patients}\\newline 0 - 2 \u0026amp; 3 \\newline 2 - 4 \u0026amp; 12 \\newline 4 - 6 \u0026amp; 9 \\newline 6 - 8 \u0026amp; 18 \\newline 8 - 10 \u0026amp; 22 \\newline \\hline \\end{array} $$\nCompute the interquartile range of the assessment and interpret it.\nIf an assessment greater than 5 from more than 50% of patients is required for the clinic to remain open, will the clinic remain open?\nIs the assessment mean representative?\nCompute the coefficient of kurtosis of the assessment and interpret it. Is the kurtosis normal?\nIf the assessment mean of another clinic is 6.8 and the standard deviation is 2.6, which assessment is relatively higher, 6 in the first clinic or 6.2 in the second?\nUse the following sums for the computations: $\\sum x_in_i=408$, $\\sum x_i^2n_i=3000$, $\\sum (x_i-\\bar x)^3n_i=-548.25$ and $\\sum (x_i-\\bar x)^4n_i=5140.45$.\nShow solution Let $X$ be the patient assessment.\n$Q_1= 4.2203$, $Q_3=8.5457$ and $IQR = 4.3254$, so the central dispersion is moderate.\n$F(5)=0.305$, and the percentage of patients with an assessment greater than 5 is $69.5\\%$.\n$\\bar x = 6.375$, $s_x^2 = 6.2344$, $s_x=2.4969$ and $cv=0.3917$, thus the representativeness of the mean is moderate.\n$g_2 = -0.9335$ and the distribution is flatter than a Gauss bell, but normal, as $g_2$ is between -2 and 2.\nFirst clinic: $z(6)=-0.1502$.\nSecond clinic: $z(6.2)=-0.3077$.\nThus, an assessment of 6 in the first clinic is relatively higher as its standard score is greater.\nQuestion 2 A study tries to determine the effectiveness of a training program to increase the grip strength. 
The table below shows the grip strength in Kg in some weeks of the training program.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Week} \u0026amp; 1 \u0026amp; 3 \u0026amp; 6 \u0026amp; 9 \u0026amp; 14 \u0026amp; 17 \u0026amp; 21 \u0026amp; 24 \\newline \\mbox{Grip strength} \u0026amp; 15 \u0026amp; 22 \u0026amp; 29 \u0026amp; 34 \u0026amp; 36 \u0026amp; 39 \u0026amp; 40 \u0026amp; 41 \\newline \\hline \\end{array} $$\nCompute the regression coefficient of the grip strength on the weeks and interpret it.\nAccording to the logarithmic regression model, what is the expected grip strength after 5 and 25 weeks. Are these predictions reliable? Would these predictions be more reliable with the linear regression model?\nAccording to the exponential regression model, how many weeks are required to have a grip strength of 25 Kg?\nWhat percentage of the total variability of the weeks is explained by the exponential model?\nUse the following sums ($X$=Weeks and $Y$=Grip strength):\n$\\sum x_i=95$, $\\sum \\log(x_i)=16.7824$, $\\sum y_j=256$, $\\sum \\log(y_j)=27.3423$,\n$\\sum x_i^2=1629$, $\\sum \\log(x_i)^2=43.606$, $\\sum y_j^2=8804$, $\\sum \\log(y_j)^2=94.3237$,\n$\\sum x_iy_j=3552$, $\\sum x_i\\log(y_j)=342.9642$, $\\sum \\log(x_i)y_j=608.4186$, $\\sum \\log(x_i)\\log(y_j)=60.047$.\nShow solution $\\bar x=11.875$ weeks, $s_x^2=62.6094$ weeks$^2$.\n$\\bar y=32$ Kg, $s_y^2=76.5$ Kg$^2$.\n$s_{xy}=64$ weeks$\\cdot$Kg.\nRegression coefficient of $Y$ on $X$: $b_{yx} = 1.0222$ Kg/week. The grip strength increases $1.0222$ Kg per week.\n$\\overline{\\ln(x)} = 2.0978$ ln(weeks), $s_{\\ln(x)}^2 = 1.05$ ln(weeks)$^2$ and $s_{\\ln(x)y} = 8.9226$ ln(weeks)Kg.\nLogarithmic regression model of $Y$ on $X$: $y = 14.1729 + 8.498 \\ln(x)$.\nPredictions: $y(5) = 27.8499$ Kg and $y(25) = 41.5268$ Kg.\nLogarithmic coefficient of determination: $r^2 = 0.9912$. 
The predictions are not reliable because the sample size is small.\nLinear coefficient of determination: $r^2 = 0.8552$.\nAs the linear coefficient of determination is less than the logarithmic one, the predictions with the logarithmic model are more reliable.\nExponential regression model of $X$ on $Y$: $x = e^{-1.6345 + 0.1166y}$.\nPrediction: $x(25)=3.6015$ weeks.\nAs $r^2 = 0.9912$, the exponential model explains $99.12$% of the variability of the weeks.\nQuestion 3 A diagnostic test for a cervical injury has a sensitivity of 99% and produces 80% of correct diagnoses. Assuming that the prevalence of the injury is 10%:\nCompute the specificity of the test.\nCan we rule out the injury with a negative outcome of the test?\nCan we diagnose the injury with a positive outcome of the test? What must the minimum prevalence of the injury be to diagnose the injury with a positive outcome of the test?\nShow solution Specificity = $P(-|\\overline D) = 0.7789$.\nNegative predictive value = $P(\\overline D|-) = 0.9986 \u0026gt; 0.5$, so we can rule out the injury with a negative outcome.\nPositive predictive value = $P(D|+) = 0.3322 \u0026lt; 0.5$, so we cannot diagnose the injury with a positive outcome. The minimum prevalence required to be able to diagnose the injury with a positive outcome is $P(D)=0.1825$.\nQuestion 4 A pharmacy sells two vaccines $A$ and $B$ against a virus. The $A$ vaccine produces side effects in 5% of cases, while the $B$ vaccine produces side effects in 2% of cases. 
The pharmacy has sold 10 units of the $A$ vaccine and 100 units of the $B$ vaccine.\nCompute the probability of having less than 2 side effects with the $A$ vaccine.\nCompute the probability of having more than 3 side effects with the $B$ vaccine.\nIf we apply both vaccines to the same person at different moments, and assuming that the production of side effects of the vaccines are independent, what is the probability that this person will have any side effect?\nShow solution Let $X$ be the number of side effects in 10 applications of A vaccine. Then, $X\\sim B(10, 0.05)$ and $P(X\u0026lt;2) = 0.9139$.\nLet $Y$ be the number of side effects in 100 applications of B vaccine. Then, $Y\\sim B(100, 0.02)\\approx P(2)$ and $P(Y\u0026gt;3) = 0.1429$.\nLet $A$ and $B$ the events of having side effects with vaccines A and B respectively. $P(A\\cup B) = 0.069$.\nQuestion 5 The length of the femur bone is normally distributed in both men and women with a standard deviation of 4 cm. It is also known that the first quartile in women is 42.3 cm, while the third quartile in men is 50.7 cm.\nWhat is the difference between the means of the femur length of women and men?\nRemark: If you do not know how to compute the means, use a mean 44 cm for women and a mean 47 cm for men in the following parts.\nCompute the 60th percentile of the femur length in women. What percentage of men have a femur length less than the 60th percentile of women?\nIf we pick a woman and man at random, what is the probability that neither of them has a femur length less than 45 cm?\nShow solution Let $X$ and $Y$ be the femur length of women and men respectively. 
Then $X\\sim N(\\mu_x, 4)$ and $Y\\sim N(\\mu_y,4)$.\n$\\mu_x = 44.91$ cm, $\\mu_y = 48.02$ cm and $\\mu_x - \\mu_y = -3.11$ cm.\n60th percentile in women $P_{60}=45.9234$ cm, and $P(Y\u0026lt;45.9234) = 0.3001$, that is, a $30.01\\%$ of men have a femur length less than the 60th percentile of women.\n$P(X\\geq 45 \\cap Y\\geq 45) = 0.3805$.\n","date":1654473600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1655134020,"objectID":"f5f50c8bc7726b0c5fe86666dfba940f","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-06-06/","publishdate":"2022-06-06T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-06-06/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 6, 2022\nQuestion 1 The patients of a physiotherapy clinic were asked to assess their satisfaction in a scale from 0 to 10. The assessments are summarized in the table below.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-06-06","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 6, 2022\nQuestion 1 A basketball player scores 12 points per game on average.\nWhat is the probability that the player scores more than 4 points in a quarter?\nIf the player plays 10 games in a league, what is the probability of scoring less than 6 points in some game?\nShow solution Let $X$ be the points scored in a quarter by the player. Then $X\\sim P(3)$, and $P(X\u0026gt;4)=0.1847$.\nLet $Y$ be the number of points scored in a game by the player. Then $Y\\sim P(12)$ and $P(Y\u0026lt;6)=0.0203$.\nLet $Z$ be the number of games with less than 6 points scored by the player. Then $Z\\sim B(10, 0.0203)$, and $P(Z\u0026gt;0)=0.1858$.\nQuestion 2 8% of people in a population consume cocaine. 
It is also known that 4% of people who consume cocaine have a heart attack and 10% of people who have a heart attack consume cocaine.\nConstruct the probability tree for the random experiment of drawing a random person from the population and recording whether he or she consumes cocaine and whether he or she has a heart attack.\nCompute the probability that a random person of the population does not consume cocaine and does not have a heart attack.\nAre the events of consuming cocaine and having a heart attack dependent?\nCompute the relative risk and the odds ratio of suffering a heart attack when consuming cocaine. Which association measure is more suitable for this study? Interpret it.\nShow solution Let $C$ be the event of consuming cocaine and $H$ the event of having a heart attack.\n$P(\\overline C\\cap \\overline H)=0.8912$.\nThe events are dependent as $P(C)=0.08\\neq P(C|H)=0.1$.\n$RR(H)=1.2778$ and $OR(H)=1.2894$. The odds ratio is more suitable as the study is retrospective. That means that the odds of having a heart attack are $1.2894$ times greater if a person consumes cocaine.\nQuestion 3 The creatine phosphokinase (CPK3) is an enzyme in the body that causes the phosphorylation of creatine. This enzyme is found in the skeletal muscle and can be measured in a blood analysis. The concentration of CPK3 in blood is normally distributed, and the interval of reference values centred at the mean, which contains 99% of the population, ranges from 40 to 308 IU/L in healthy adult males.\nCompute the mean and the standard deviation of the concentration of CPK3 in healthy males.\nA diagnostic test to detect muscular dystrophy gives a negative outcome when the concentration of CPK3 is below 300 IU/L. 
Compute the specificity of the test.\nIf the concentration of CPK3 in people with muscular dystrophy also follows a normal distribution with mean 350 IU/L and the same standard deviation, what is the sensitivity of the test?\nCompute the predictive values of the test and interpret them assuming that the muscular dystrophy prevalence is 8%.\nShow solution $\\mu = 174$ IU/L and $\\sigma = 51.938$ IU/L.\nSpecificity = $0.9924$.\nSensitivity = $0.8321$.\nThe test is better to confirm the disease as the specificity is greater than the sensitivity.\nPPV = $0.9046$. Thus, we can diagnose the disease with a positive outcome.\nNPV = $0.9855$. Thus, we can rule out the disease with a negative outcome.\n","date":1651795200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652390973,"objectID":"035e2e68234e69067ec8995b0251f08d","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-05-06/","publishdate":"2022-05-06T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-05-06/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 6, 2022\nQuestion 1 A basketball player scores 12 points per game on average.\nWhat is the probability that the player scores more than 4 points in a quarter?","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-05-06","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: March 11, 2022\nQuestion 1 The table below shows the number of credits obtained by the students of the first year of the physiotherapy grade.\n$$48, 52, 60, 60, 24, 48, 48, 36, 39, 54, 54, 60, 12, 46$$\nCompute the median and the mode and interpret them.\nDraw the box and whiskers plot and interpret it. 
Are there outliers in the sample?\nCan we assume that the sample comes from a normal population?\nIf in the second year the mean of credits obtained is $102$ and the standard deviation is $12.5$, which year has a higher relative dispersion?\nWhich number of credits is relatively higher, 50 in the first year, or 105 in the second year?\nUse the following sums for the computations:\n$\\sum x_i=641$ credits, $\\sum x_i^2=31901$ credits$^2$, $\\sum (x_i-\\bar x)^3=-40158.06$ credits$^3$ and $\\sum (x_i-\\bar x)^4=1672652.57$ credits$^4$.\nShow solution $Me = 48$ credits and $Mo = 48$ and $60$ credits.\n$Q_1= 39$ credits, $Q_3= 54$ credits, $IQR=15$ credits, $f_1= 16.5$ credits and $f_2= 76.5$ credits. 12 credits is an outlier.\n$\\bar x=45.7857$ credits, $s^2=182.3112$ credits$^2$, $s=13.5023$ credits.\n$g_1=-1.1653$ and $g_2=0.5946$. Thus, we can assume that the sample comes from a normal distribution as the coefficient of skewness and the coefficient of kurtosis fall between -2 and 2.\nFirst year: $cv=0.2949$. Second year: $cv=0.1225$. Thus, the first year has a higher relative dispersion as its coefficient of variation is greater.\nStandard score for the first year: $z(50)=0.3121$.\nStandard score for the second year: $z(105)=0.24$.\nAs the standard score of $50$ in the first year is greater than the standard score of $105$ in the second year, 50 credits in the first year is relatively higher than 105 credits in the second year.\nQuestion 2 The Regional Ministry of Health of the Community of Madrid has noticed a possible relationship between the level of air pollution and the number of cases of pneumonia in the population in the first 10 weeks of the year. 
To verify this, the variable $X$ registers the number of pollution meters that exceed the pollution limits each week, and the variable $Y$ indicates the number of people affected by pneumonia in each week.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline\nX \u0026amp; 3 \u0026amp; 3 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8 \u0026amp; 3 \u0026amp; 4 \u0026amp; 2 \u0026amp; 3 \\newline Y \u0026amp; 2 \u0026amp; 1 \u0026amp; 2 \u0026amp; 3 \u0026amp; 6 \u0026amp; 6 \u0026amp; 2 \u0026amp; 2 \u0026amp; 1 \u0026amp; 1 \\newline \\hline \\end{array} $$\nAre the number of people affected by pneumonia and the number of meters that exceed the pollution limits two linearly independent variables?\nAccording to the linear model, how does the number of people affected by pneumonia change in relation to the number of meters that exceed the pollution limits?\nJustify whether or not the linear relationship between the two variables is well explained and in what proportion.\nAccording to the exponential regression model, how many people are expected to be affected by pneumonia a week with 5 meters exceeding the pollution limits?\nWhich of the following diagrams best represents the regression lines? Justify the answer.\nUse the following sums for the computations:\n$\\sum x_i=44$ meters, $\\sum \\log(x_i)=13.9004$ log(meters), $\\sum y_j=26$ persons, $\\sum \\log(y_j)=7.4547$ log(persons),\n$\\sum x_i^2=230$ meters$^2$, $\\sum \\log(x_i)^2=21.1414$ log(meters)$^2$, $\\sum y_j^2=100$ persons$^2$, $\\sum \\log(y_j)^2=9.5496$ log(persons)$^2$,\n$\\sum x_iy_j=146$ meters$\\cdot$persons, $\\sum x_i\\log(y_j)=43.8653$ meters$\\cdot$log(persons), $\\sum \\log(x_i)y_j=42.8037$ log(meters)$\\cdot$persons, $\\sum \\log(x_i)\\log(y_j)=12.7804$ log(meters)$\\cdot$log(persons).\nShow solution $\\bar x = 4.4$ meters, $s_x^2=3.64$ meters$^2$.\n$\\bar y = 2.6$ persons, $s_y^2=3.24$ persons$^2$.\n$s_{xy}=3.16$ meters$\\cdot$persons. 
That means that there is a direct linear relation between the meters that exceed pollution limits and the people affected by pneumonia.\n$b_{yx}=0.8681$ persons/meter. Thus, the number of people affected by pneumonia increases by $0.8681$ persons for each additional meter that exceeds the pollution limits.\nLinear coefficient of determination $r^2=0.8467$. Therefore, the linear regression model explains $84.67$% of the variability of the number of people affected by pneumonia.\n$\\overline{\\log(y)}=0.7455$ log(persons), $s_{x\\log(y)}=1.1065$ meters$\\cdot$log(persons).\nExponential regression model: $y=e^{-0.592 + 0.304x}$, and $y(5)=2.529$ persons.\nDiagram $A$ because the relation is direct and very strong according to the linear coefficient of determination.\n","date":1646956800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1650733145,"objectID":"40392b70da564e0b81e3910a9f83a155","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-03-11/","publishdate":"2022-03-11T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2022-03-11/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 11, 2022\nQuestion 1 The table below shows the number of credits obtained by the students of the first year of the physiotherapy grade.\n$$48, 52, 60, 60, 24, 48, 48, 36, 39, 54, 54, 60, 12, 46$$","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2022-03-11","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 17, 2022\nQuestion 1 To analyze the hypoxemia tolerance of mammals, in a laboratory some rats are exposed to extreme conditions with variable levels of oxygen. 
The rats are in a room whose oxygen level (in %) at any position $(x,y)$ (in meters), is\n$$O(x,y)=\\frac{1}{10}x^2y^2e^{x-y}$$\nFor the rats to survive, they must reach positions where the oxygen level is above 18%.\nA rat $A$ is at position $(3,2)$. If the rat stays in that position, will it survive?\nWhat direction should rat $A$ take in order to increase the oxygen level as quickly as possible? What is the instantaneous rate of change of the oxygen level following that direction?\nAnother rat $B$ is at position $(2,2)$. If it starts to move in such a way that $y$ decreases the double of the increment of $x$, how will the oxygen level change?\nSolution $O(3,2)=9.7858$%, therefore the rat will not survive.\nThe direction of the gradient $\\nabla(3,2) = (6e,0)$. Following this direction, the instantaneous rate of change of the oxygen level is $|\\nabla(3,2)|=6e$%/m.\nDirectional derivative along the direction of the vector $\\mathbf{v}=(1,-2)$: $f\u0026rsquo;_{\\mathbf{v}}(2,2)=1.4311$%/m.\nQuestion 2 The ozone ($O_3$) in the atmosphere is transformed into oxygen ($O_2$) through the following chemical reaction:\n$$2O_3 \\rightarrow 3O_2$$\nIt was experimentally observed that the speed at which the amount of oxygen varies is inversely proportional to the amount of oxygen present. If there is initially 10 g of oxygen in a place, and after one hour this amount of oxygen doubles,\nWhat will the amount of oxygen be after 5 hours?\nHow long will it take to have 1 kg of oxygen?\nSolution Let $t$ the time and $o(t)$ the amount of oxygen at time $t$. Differential equation $o\u0026rsquo;=k/o$.\nParticular solution: $o(t)=\\sqrt{300t+100}$.\n$o(5)=40$ g.\n$3333$ hours.\nQuestion 3 Two insects start moving from the same point following perpendicular directions.\nIf the first insect moves at a speed of 3 cm/s and the second at a speed of 4 cm/s, at what instantaneous speed does the distance between them change 2 seconds after they start moving? 
And at 3 seconds?\nIf 4 seconds after they start moving the second insect stops and the first continues moving with the same direction and speed, at what instantaneous speed does the distance between the two insects change at that moment?\nRemark: The distance between the two insects is the length of the hypotenuse of the right triangle whose sides are the distance travelled by them.\nSolution Let $h(t)$ the length of the hypotenuse of the right triangle whose sides are the distance travelled by the insects at time $t$.\n$h\u0026rsquo;(2)=5$ cm/s and $h\u0026rsquo;(3)=5$ cm/s.\n$h\u0026rsquo;(4)=1.8$ cm/s.\n","date":1642377600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642789234,"objectID":"43c4ecd6f56eb0b8b0a7887afa991d54","permalink":"/en/teaching/calculus/exams/pharmacy-2022-01-17/","publishdate":"2022-01-17T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2022-01-17/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 17, 2022\nQuestion 1 To analyze the hypoxemia tolerance of mammals, in a laboratory some rats are exposed to extreme conditions with variable levels of oxygen.","tags":["Exam"],"title":"Pharmacy exam 2022-01-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 17, 2022\nQuestion 1 A diagnostic test for a disease with a prevalence of 10% has a positive predictive value of 40% and negative predictive value of 95%.\nCompute the sensitivity and the specificity of the test.\nCompute the probability of a right diagnose.\nWhat must be the minimum sensitivity of the test to be able to diagnose the disease?\nSolution Sensitivity $P(+|D)=0.571$ and specificity $P(-|\\overline D)=0.9048$.\n$P(\\mbox{Right diagnose}) = P(D \\cap +) + P(\\overline D \\cap -) = 0.8714$.\nMinimum sensitivity to diagnose the disease $P(+|D)=0.857$.\nQuestion 2 To study the effectiveness of two antigen tests 
for COVID, both tests have been applied to a sample of 100 people. The table below shows the results:\n$$ \\begin{array}{ccr} \\hline \\mbox{Test $A$} \u0026amp; \\mbox{Test $B$} \u0026amp; \\mbox{Num persons}\\newline \\mbox{+} \u0026amp; \\mbox{+} \u0026amp; 8\\newline \\mbox{+} \u0026amp; \\mbox{-} \u0026amp; 2\\newline \\mbox{-} \u0026amp; \\mbox{+} \u0026amp; 3\\newline \\mbox{-} \u0026amp; \\mbox{-} \u0026amp; 87\\newline \\hline \\end{array} $$\nDefine the following events and compute their probabilities:\nGet a $+$ in the test $A$.\nGet a $+$ in the test $A$ and a $-$ in the test $B$.\nGet a $+$ in at least one of the two tests.\nGet different results in the two tests.\nGet the same result in the two tests.\nGet a $+$ in the test $B$ if we got a $+$ in the test $A$.\nAre the outcomes of the two tests independent?\nSolution Let $A$ and $B$ be the events of getting positive outcomes in the tests $A$ and $B$ respectively.\n$P(A)=0.1$.\n$P(A\\cap \\overline B)=0.02$.\n$P(A\\cup B) = 0.13$.\n$P(A\\cap \\overline B) + P(\\overline A \\cap B) = 0.05$.\n$P(A\\cap B) + P(\\overline A \\cap \\overline B)= 0.95$.\n$P(B|A) = 0.8$.\nAs $P(B|A)=0.8\\neq P(B)=0.11$, the events are dependent.\nQuestion 3 It is known that the life of a battery for a pacemaker follows a normal distribution. It has been observed that 20% of the batteries last more than 15 years, while 10% last less than 12 years.\nCompute the mean and the standard deviation of the battery life.\nCompute the fourth decile of the battery life.\nIf we take a sample of 5 batteries, what is the probability that more than half of them last between 13 and 14 years?\nIf we take a sample of 100 batteries, what is the probability that at least one of them lasts less than 11 years?\nSolution Let $X$ be the duration of a battery. Then $X\\sim N(\\mu,\\sigma)$.\n$\\mu = 13.8108$ years and $\\sigma = 1.413$ years.\n$D_4 = 13.4528$ years.\nLet $Y$ be the number of batteries lasting between 13 and 14 years in a sample of 5 batteries. 
Then $Y\\sim B(5,0.2702)$ and $P(Y\u0026gt;2.5)=0.0209$.\nLet $U$ be the number of batteries lasting less than 11 years in a sample of 100 batteries. Then $U\\sim B(100, 0.0233)\\approx P(2.3335)$ and $P(U\\geq 1)=0.903$.\n","date":1642377600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642925317,"objectID":"7af3f40b063c6570e1321845c462b81e","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2022-01-17/","publishdate":"2022-01-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2022-01-17/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 17, 2022\nQuestion 1 A diagnostic test for a disease with a prevalence of 10% has a positive predictive value of 40% and negative predictive value of 95%.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2022-01-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 22, 2021\nQuestion 1 The cranial capacity (in dm$^3$) of a primate population follows a normal probability distribution $X\\sim N(\\mu,\\sigma)$. The chart below shows the Gauss bell of $X$. Observe that the chart shows the area below the bell between 1 and 3.\nWhat is the mean of the cranial capacity distribution?\nIs the mean of the cranial capacity representative of the population?\nWhat are the coefficients of skewness and kurtosis?\nWhat is the interquartile range of the cranial capacity?\nIf a cranial capacity outside of the interval $(Q_1-1.5IQR, Q_3+1.5IQR)$ is considered an outlier, what is the probability of observing an outlier in the cranial capacity?\nSolution Let $X$ be the cranial capacity of a primate. Then, $X\\sim N(\\mu, \\sigma)$.\n$\\mu=2$ dm$^3$.\n$\\sigma=0.5$ dm$^3$ and $cv=0.25$. As the coef. 
of variation is small, the mean is representative.\nAs $X$ follows a normal distribution, $g_1=0$ and $g_2=0$.\n$Q_1 = 1.6628$ dm$^3$, $Q_3=2.3372$ dm$^3$ and $IQR=0.6745$ dm$^3$.\nFences: $f_1=0.651$ dm$^3$ and $f_2=3.349$.\n$P(X \u0026lt; 0.651) + P(X \u0026gt; 3.349) = 0.007$.\nQuestion 2 A pharmaceutical company produces the same drug in 5 different laboratories. It has been observed that each laboratory produces, on average, one non-marketable defective batch every three months.\nWhat is the probability that a laboratory produce more than 3 defective batches in one year?\nWhat is the probability that at least 2 laboratories produce no defective batches in one year?\nSolution Let $X$ be the number of defective batches in a year then $X\\sim P(4)$, and $P(X\u0026gt;3) = 0.5665$.\nLet $Y$ be the number of laboratories that produce no defective batches in one year, then $Y\\sim B(5, 0.0183)$ and $P(Y\\geq 2) = 0.0032$.\nQuestion 3 The table below shows the frequencies observed in a random sample from a population for the blood type and SARS-CoV-2 infection:\n$$ \\begin{array}{llr} \\hline \\mbox{Blood type} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{Persons}\\newline \\mbox{O} \u0026amp; \\mbox{No} \u0026amp; 1800\\newline \\mbox{O} \u0026amp; \\mbox{Yes} \u0026amp; 100\\newline \\mbox{A} \u0026amp; \\mbox{No} \u0026amp; 4200\\newline \\mbox{A} \u0026amp; \\mbox{Yes} \u0026amp; 400\\newline \\mbox{B} \u0026amp; \\mbox{No} \u0026amp; 2500\\newline \\mbox{B} \u0026amp; \\mbox{Yes} \u0026amp; 150\\newline \\mbox{AB} \u0026amp; \\mbox{No} \u0026amp; 800\\newline \\mbox{AB} \u0026amp; \\mbox{Yes} \u0026amp; 50\\newline \\hline \\end{array} $$\nCompute the probability of SARS-CoV-2 infection for a random person.\nCompute the probability of having a blood type A and being infected by SARS-CoV-2 for a random person.\nCompute the probability of having a blood type A or being infected by SARS-CoV-2 for a random person.\nCompute the probability of being infected by 
SARS-CoV-2 for a person with blood type O.\nCompute the probability of having a blood type different from A and B for a person infected by SARS-CoV-2.\nDoes the SARS-CoV-2 infection depend on the blood type?\nSolution Let $I$ be the event of being infected by SARS-CoV-2.\n$P(I) = 0.07$.\n$P(A\\cap I) = 0.04$.\n$P(A\\cup I) = 0.49$.\n$P(I|O) = 0.0526$.\n$P(\\overline A \\cap \\overline B|I) = 0.2143$.\nThe infection depends on the blood type as, for instance, $P(I)\\neq P(I|O)$.\nQuestion 4 To study the relation between the blood Rh and the SARS-CoV-2 infection, a random sample of non-infected people was drawn from a population. The table below shows the number of people infected after one year.\n$$ \\begin{array}{llr} \\hline \\mbox{Blood Rh} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{Persons}\\newline \\mbox{-} \u0026amp; \\mbox{Yes} \u0026amp; 520\\newline \\mbox{-} \u0026amp; \\mbox{No} \u0026amp; 6380\\newline \\mbox{+} \u0026amp; \\mbox{Yes} \u0026amp; 780\\newline \\mbox{+} \u0026amp; \\mbox{No} \u0026amp; 6200\\newline \\hline \\end{array} $$\nCompute the relative risk and the odds ratio to study the association between the SARS-CoV-2 infection and the blood Rh. Which association measure is more suitable to explain the relation between the SARS-CoV-2 infection and the blood Rh? Interpret it.\nA diagnostic test for SARS-CoV-2 has been developed with 95% specificity and 60% sensitivity, regardless of blood Rh. For which blood Rh will it produce more errors? Which diagnosis will we make if we apply the test to a person with blood Rh- and we get a positive outcome? Which diagnosis will we make if we apply the test to a person with blood Rh+ and we get a negative outcome?\nSolution Let $I$ be the event of being infected by SARS-CoV-2.\n$RR(I) = R_+(I) / R_-(I) = 1.4828$ and $OR(I) = O_+(I) / O_-(I) = 1.5435$.\nThe relative risk is more suitable as this is a prospective study and the incidence of infection can be estimated. 
Thus, the risk of infection with Rh+ is almost one and a half the risk with Rh-.\n$P(\\mbox{Error}|\\mbox{Rh-}) = 0.0764$ and $P(\\mbox{Error}|\\mbox{Rh+}) = 0.0891$. Thus, the test will produce more errors in people with Rh+.\nPositive predictive value for Rh-: $p(I|+)=0.4945$. Therefore, we will diagnose no infection.\nNegative predictive value for Rh+: $p(\\overline I|-)=0.9497$. Therefore, we will predict no infection.\n","date":1637539200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1638051573,"objectID":"10872ade46fe0189bda4d1cb55780dab","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-11-22/","publishdate":"2021-11-22T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-11-22/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 22, 2021\nQuestion 1 The cranial capacity (in dm$^3$) of a primate population follows a normal probability distribution $X\\sim N(\\mu,\\sigma)$. The chart below shows the Gauss bell of $X$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-11-22","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 25, 2021\nQuestion 1 The table below shows the number of daily sugary drinks drunk by a sample of 16-years-old people.\n$$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; \u0026amp; 0.1 \u0026amp; \u0026amp; \\newline \\hline 1 \u0026amp; \u0026amp; \u0026amp; 48 \u0026amp; \\newline \\hline 2 \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.725\\newline \\hline 3 \u0026amp; 24 \u0026amp; \u0026amp; \u0026amp; \\newline \\hline 4 \u0026amp; \u0026amp; \u0026amp; \u0026amp; 0.975\\newline \\hline 5 \u0026amp; \u0026amp; \u0026amp; 120 \u0026amp; \\newline \\hline \\end{array} $$\nComplete the table explaining how.\nPlot the 
cumulative frequency polygon.\nAre there outliers?\nStudy the normality of the distribution.\nIf another sample of 18-years-old people has a mean $2.1$ drinks and a variance $1.5$ drinks$^2$, in which distribution is more representative the mean?\nWho consumes a higher relative amount of sugary drinks, a 16-years-old who consumes 3 drinks a day or a 18-years-old who consumes 4?\nUse the following sums for the computations: $\\sum x_i= 225$ drinks, $\\sum x_i^2=579$ drinks$^2$, $\\sum (x_i-\\bar x)^3=80.16$ drinks$^3$ and $\\sum (x_i-\\bar x)^4=616.32$ drinks$^4$.\nSolution $$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i \\newline \\hline 0 \u0026amp; 12 \u0026amp; 0.100 \u0026amp; 12 \u0026amp; 0.100 \\newline \\hline 1 \u0026amp; 36 \u0026amp; 0.300 \u0026amp; 48 \u0026amp; 0.400 \\newline \\hline 2 \u0026amp; 39 \u0026amp; 0.325 \u0026amp; 87 \u0026amp; 0.725 \\newline \\hline 3 \u0026amp; 24 \u0026amp; 0.200 \u0026amp; 111 \u0026amp; 0.925 \\newline \\hline 4 \u0026amp; 6 \u0026amp; 0.050 \u0026amp; 117 \u0026amp; 0.975 \\newline \\hline 5 \u0026amp; 3 \u0026amp; 0.025 \u0026amp; 120 \u0026amp; 1.000 \\newline \\hline \\end{array} $$\nQuartiles: $Q_1=1$ drinks, $Q_2=2$ drinks, $Q_3=3$ drinks\n$IQR = 2$ drinks.\nFences: $f_1=-2$ drinks and $f_2=6$ drinks. Thus, there are no outliers.\n$\\bar x=1.875$ drinks, $s^2=1.3094$ drinks$^2$, $s=1.1443$ drinks, $g_1=0.4458$ and $g_2=-0.0043$. As the coefficient of skewness and the coefficient of kurtosis are between -2 and 2 we can assume that the sample comes from a normal population.\nLet $Y$ be the daily sugary drinks drunk by 18-year-old people. Then, $cv_x=0.6103$ and $cv_y=0.5832$. 
As the coefficient of variation of 18-year-old is a little bit smaller than the one of 16-year-old, the mean of the 18-year-old is a little bit more representative.\nStandard score for 16-year-old: $z(3)=0.9832$\nStandard score for 18-year-old: $z(4)=1.5513$\nAs the standard score of 4 for a 18-years-old is greater than the standard score of 3 for a 16-years-old, 4 drinks for a 18-year-old is relatively higher than 3 drinks for a 16-years-old.\nQuestion 2 The rowan is a species of tree that grows at different altitudes. In order to study how the rowan adapts to different habitats, we have collected a sample of branches of 12 trees at different altitudes in Scotland. In the laboratory, the respiration rate of each branch was observed during the night. The following table shows the altitude (in meters) of each branch and the respiration rate (in nl of O$_2$ per hour per mg of weight).\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Altitude} \u0026amp; 90 \u0026amp; 230 \u0026amp; 240 \u0026amp; 260 \u0026amp; 330 \u0026amp; 400 \u0026amp; 410 \u0026amp; 550 \u0026amp; 590 \u0026amp; 610 \u0026amp; 700 \u0026amp; 790 \\newline \\mbox{Respiration rate} \u0026amp; 110 \u0026amp; 200 \u0026amp; 130 \u0026amp; 150 \u0026amp; 180 \u0026amp; 160 \u0026amp; 230 \u0026amp; 180 \u0026amp; 230 \u0026amp; 260 \u0026amp; 320 \u0026amp; 370 \\newline \\hline \\end{array} $$\nIs there a linear relationship between altitude and respiration rate of rowan. How is this relationship?\nHow much increases the respiration rate per each increment of 100 meters in the altitude?\nWhat respiration rate is expected for a rowan at 500 meters of altitude? 
And for a rowan at the sea level?\nAre these predictions reliable?\nUse the following sums for the computations ($X$=Altitude and $Y$=Respiration rate): $\\sum x_i=5200$ m, $\\sum y_i=2520$ nl/(mg$\\cdot$ h), $\\sum x_i^2=2760000$ (m)$^2$, $\\sum y_i^2=594600$ nl/(mg$\\cdot$ h)$^2$ and $\\sum x_iy_j=1253400$ m$\\cdot$ nl/(mg$\\cdot$ h).\nSolution $\\bar x=433.3333$ m, $s_x^2=42222.2222$ (m)$^2$,\n$\\bar y=210$ nl/(mg$\\cdot$ h), $s_y^2=5450$ nl/(mg$\\cdot$ h)$^2$,\n$s_{xy}=13450$ m $\\cdot$ nl/(mg$\\cdot$ h).\nAs the covariance is positive, there is a direct linear relation between the altitude and the respiration rate.\nThe respiration rate increases $b_{yx} = 0.3186$ nl/(mg$\\cdot$h) per meter, or what is the same, $31.8553$ nl/(mg$\\cdot$h) per 100 meters.\nRegression line of the respiration rate on the altitude: $y=71.9605 + 0.3186x$.\nPredictions: $y(500) = 231.2368$ nl/(mg$\\cdot$ h) and $y(0) = 71.9605$ nl/(mg$\\cdot$ h).\n$r^2 = 0.7862$. As the coefficient of determination is not far from 1, the regression line fits well, but the sample size is too small to have reliable predictions. 
In addition, the prediction for the sea level is less reliable because it falls outside the range of values of the sample.\nQuestion 3 The relationship between basal metabolic rate and age is being studied in a sample of healthy men and the following regression lines have been obtained\nCompute the means of the basal metabolic rate and the age.\nHow is the fit of the two lines?\nSolution Let $X$ be the age and $Y$ the basal metabolic rate.\n$\\bar x=40$ and $\\bar y=40$.\n$b_{yx}=-0.1$, $b_{xy}=-5$ and $r^2 = 0.5$, thus the fit of the regression lines is moderate.\n","date":1635120000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1635284889,"objectID":"11b6ecf127b258e5427cf2d1a9d17ad2","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-10-25/","publishdate":"2021-10-25T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-10-25/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 25, 2021\nQuestion 1 The table below shows the number of daily sugary drinks drunk by a sample of 16-years-old people.\n$$ \\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Drinks} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; \u0026amp; 0.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-10-25","type":"book"},{"authors":null,"categories":["R"],"content":" Application access What is Rubrics? Rubrics is a Shiny web application for assessment with rubrics.\nThe application allows you to:\nCreate a rubric for an exam or test. Load the list of students from a CSV file and generate a template for the assessment. Load the assessment from the template. Generate a list with the students' grades. Generate a descriptive summary of the distribution of grades. Generate a personalized report with the assessment of each student. 
The video below contains a more detailed presentation of this application (in Spanish):\nHow to cite Rubrics? Anemone, Gloria., Sánchez-Alberca, Alfredo. (2021). Rubrics (version 1.0) [software]. Obtained from: https://aprendeconalf.es/en/project/rubrics.\n","date":1630454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1634551798,"objectID":"f9a37edad77f43f8a27a71e21ea67b2c","permalink":"/en/project/rubrics/","publishdate":"2021-09-01T00:00:00Z","relpermalink":"/en/project/rubrics/","section":"project","summary":"A web app for assessment with rubrics.","tags":["Rybrics","Software","Shiny"],"title":"Rubrics","type":"project"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 7, 2021\nDescriptive Statistics and Regression Question 1 To study the effectiveness of a new treatment for the polymyalgia rheumatica a sample of patients with polymyalgia was drawn and they were divided into two groups. The first group received the new treatment while the second one received a placebo. After a year following the treatment they filled out a survey. 
The chart below shows the distribution of the survey score of the two groups of patients (the greater the score the better the treatment).\nConstruct the frequency table of the scores for the placebo group and plot the ogive.\nCompute the interquartile range of the scores for the placebo group.\nAre there outliers in the placebo group?\nIn which group is the mean score more representative?\nWhich distribution is more normal regarding the kurtosis?\nWhich score is relatively better, a score of 5 in the placebo group or a score of 6 in the treatment group?\nUse the following sums for the computations:\nPlacebo: $\\sum x_i=125.5$, $\\sum x_i^2=680.25$, $\\sum (x_i-\\bar x)^3=27.11$ and $\\sum (x_i-\\bar x)^4=253.27$.\nTreatment: $\\sum x_i=131$, $\\sum x_i^2=887$, $\\sum (x_i-\\bar x)^3=2.66$ and $\\sum (x_i-\\bar x)^4=88.03$.\nShow solution $$\\begin{array}{lrrrr} \\mbox{Score} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i \\newline \\hline [2,3] \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 1 \u0026amp; 0.04 \\newline (3,4] \u0026amp; 6 \u0026amp; 0.24 \u0026amp; 7 \u0026amp; 0.28 \\newline (4,5] \u0026amp; 7 \u0026amp; 0.28 \u0026amp; 14 \u0026amp; 0.56 \\newline (5,6] \u0026amp; 3 \u0026amp; 0.12 \u0026amp; 17 \u0026amp; 0.68 \\newline (6,7] \u0026amp; 7 \u0026amp; 0.28 \u0026amp; 24 \u0026amp; 0.96 \\newline (7,8] \u0026amp; 0 \u0026amp; 0.00 \u0026amp; 24 \u0026amp; 0.96 \\newline (8,9] \u0026amp; 1 \u0026amp; 0.04 \u0026amp; 25 \u0026amp; 1.00 \\newline \\hline \\end{array} $$ $Q_1= 3.875$, $Q_3= 6.25$ and $IQR=2.375$.\n$f_1 = 0.3125$ and $f_2=9.8125$. Thus, there are no outliers in the placebo sample because all the values fall between the fences.\nPlacebo: $\\bar x=5.02$, $s^2=2.0096$, $s=1.4176$ and $cv=0.2824$.\nTreatment: $\\bar x=6.55$, $s^2=1.4475$, $s=1.2031$ and $cv=0.1837$.\nPlacebo: $g_2=-0.4914$. Treatment: $g_2=-0.8992$. Thus, the distribution of the placebo group is more normal as the coef. 
of kurtosis is closer to 0.\nStandard score for the placebo: $z(5)=-0.0141$.\nStandard score for the treatment: $z(6)=-0.4571$.\nAs the standard score of $5$ in the placebo group is greater than the standard score of $6$ in the treatment group, a score of 5 in the placebo group is better.\nQuestion 2 We have applied different doses of an antibiotic to a culture of bacteria. The table below shows the number of residual bacteria corresponding to the different doses.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Dose ($\\mu$g)} \u0026amp; 0.2 \u0026amp; 0.7 \u0026amp; 1 \u0026amp; 1.5 \u0026amp; 2 \u0026amp; 2.4 \u0026amp; 2.8 \u0026amp; 3 \\newline \\mbox{Bacteria} \u0026amp; 40 \u0026amp; 32 \u0026amp; 28 \u0026amp; 20 \u0026amp; 18 \u0026amp; 15 \u0026amp; 12 \u0026amp; 11 \\newline \\hline \\end{array} $$\nWhich regression model explains better the number of residual bacteria as a function of the antibiotic dose, the linear or the exponential?\nUse the best of the two previous regression models to predict the number of residual bacteria for an antibiotic dose of 3.5 $\\mu$g. 
Is this prediction reliable?\nAccording to the linear regression model, what is the expected decrease in the number of residual bacteria per each $\\mu$g more of antibiotic?\nUse the following sums for the computations ($X$=Antibiotic dose and $Y$=Number of bacteria):\n$\\sum x_i=13.6$ $\\mu$g, $\\sum \\log(x_i)=2.1362$ $\\log(\\mbox{$\\mu$g})$, $\\sum y_j=176$ bacteria, $\\sum \\log(y_j)=23.9638$ $\\log(\\mbox{bacteria})$,\n$\\sum x_i^2=30.38$ $\\mu$g$^2$, $\\sum \\log(x_i)^2=6.3959$ $\\log(\\mbox{$\\mu$g})^2$, $\\sum y_j^2=4622$ bacteria$^2$, $\\sum \\log(y_j)^2=73.3096$ $\\log(\\mbox{bacteria})^2$,\n$\\sum x_iy_j=227$ $\\mu$g$\\cdot$bacteria, $\\sum x_i\\log(y_j)=37.4211$ $\\mu$g$\\cdot\\log(\\mbox{bacteria})$, $\\sum \\log(x_i)y_j=-17.633$ $\\log(\\mbox{$\\mu$g})$bacteria, $\\sum \\log(x_i)\\log(y_j)=3.6086$ $\\log(\\mbox{$\\mu$g})\\log(\\mbox{bacteria})$.\nShow solution $\\overline{x}=1.7$ $\\mu$g, $s_x^2=0.9075$ $\\mu$g$^2$.\n$\\bar y=22$ bacteria, $s_y^2=93.75$ bacteria$^2$.\n$s_{xy}=-9.025$ $\\mu$g$\\cdot$bacteria.\nLinear coefficient of determination $r^2 = 0.9574$.\n$\\overline{\\log(y)}=2.9955$ log(bacteria), $s_{\\log(y)}^2=0.1908$ log(bacteria)$^2$.\n$s_{x\\log(y)}=-0.4147$ $\\mu$g$\\cdot$ log(bacteria).\nExponential coefficient of determination $r^2 = 0.9928$.\nThus, the exponential model explains better the number of residual bacteria as a function of the antibiotic dose because the exponential coef. of determination is greater.\nExponential regression model: $y=e^{3.7723-0.4569x}$.\nPrediction: $y(3.5)=8.7845$ bacteria.\nAlthough the coef. 
of determination is close to 1, this prediction is not reliable because the sample size is very small.\n$b_{yx}=-9.9449$, therefore the number of bacteria decreases $9.9449$ per each $\\mu$g more of antibiotic.\nProbability and Random Variables Question 3 In women, the shoulder circumference follows a normal distribution with mean 98 cm and standard deviation 5 cm.\nCompute the percentage of women in the population with a shoulder circumference between 95 and 105 cm.\nAbove what value are the 5% of women with the highest shoulder circumference?\nCompute the probability that in a sample of 50 women there are at least 2 with a shoulder circumference less than 90 cm.\nShow solution Let $X$ be the shoulder circumference, then $X\\sim N(98, 5)$.\n$P(95\\leq X\\leq 105) = 0.645$, that is $64.5%$.\n$P_{95} = 106.22$ cm.\nLet $Y$ be the number of women with a shoulder circumference less than 90 cm in a sample of 50 women. Then, $Y\\sim B(50, 0.0548) \\approx P(2.74)$, and $P(Y\\geq 2) = 0.7585$.\nQuestion 4 It has been observed that a company of components for physiotherapy machines produces 12 defective components every 300 hours on average.\nWhat is the probability of producing more than 2 defective components in 100 hours?\nWhat is the probability of producing at most one defective component in 50 hours?\nIf there are 7 companies in Spain that produce these components, and assuming that all of them produce the same number of defective components on average, compute the probability that at least one company produces more than 3 defective components in 50 hours.\nShow solution Let $X$ be the number of defective components in 100 hours, then $X\\sim P(4)$, and $P(X\u0026gt;2) = 0.7619$.\nLet $Y$ be the number of defective components in 50 hours, then $Y\\sim P(2)$, and $P(Y\\leq 1) = 0.406$.\nLet $Z$ be the number of companies that produce more than 3 defective components in 50 hours in a sample of 7 companies, then $Z\\sim B(7, 0.1429)$, and $P(Z\\geq 1) = 
0.6601$.\nQuestion 5 We want to study the risk for a new vaccine to cause thrombi compared with a traditional vaccine. After applying the new vaccine to 1000 persons and the traditional vaccine to 3000 persons, we observed 30 persons with thrombi in the new vaccine group and 42 persons with thrombi in the traditional vaccine group.\nCompute the relative risk of suffering thrombi with the new vaccine and interpret it.\nCompute the odds ratio of suffering thrombi with the new vaccine and interpret it.\nWhich association measure is more reliable?\nIn a random experiment we applied both vaccines (at different times) to a sample and we observed that 4% of persons suffered some thrombi (due to the new vaccine or to the traditional vaccine). Compute the probability of suffering thrombi with the new vaccine and not with the traditional one.\nAre the events corresponding to suffering thrombi with the new vaccine and the traditional vaccine independent?\nShow solution Let $T$ be the event of suffering thrombi.\n$RR(T)=2.1429$. Thus, the risk of suffering thrombi with the new vaccine is more than double that with the traditional vaccine.\n$OR(T)=2.1782$. Thus, the odds of suffering thrombi with the new vaccine are more than double those with the traditional vaccine.\nBoth measures are reliable because the study is prospective and we can estimate the incidence, but the relative risk is easier to interpret.\nLet $T_n$ and $T_t$ be the events of suffering thrombi with the new and the traditional vaccines, respectively. 
$P(T_n\\cap \\overline{T_t}) = 0.026$.\n$P(T_t|T_n) = 0.1333 \\neq P(T_t) = 0.014$, thus the events are dependent.\n","date":1623024000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1623953052,"objectID":"7448cde3ef53e43e21e1233c2c8dd253","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-06-07/","publishdate":"2021-06-07T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-06-07/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 7, 2021\nDescriptive Statistics and Regression Question 1 To study the effectiveness of a new treatment for the polymyalgia rheumatica a sample of patients with polymyalgia was drawn and they were divided into two groups.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-06-07","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 5, 2021\nProbability and random variables Question 1 The average number of injuries in an international tennis tournament is 2.\nCompute the probability that in an international tennis tournament there are more than 2 injuries.\nIf a tennis circuit has 6 international tournaments, what is the probability that there are no injuries in at least one of them?\nShow solution Let $X$ be the number of injuries in a tournament, then $X\\sim P(2)$ and $P(X\u0026gt;2)=0.3233$.\nLet $Y$ be the number of tournaments in the tennis circuit with no injuries, then $Y\\sim B(6,0.1353)$ and $P(Y\u0026gt;0)=0.5821$.\nQuestion 2 The tables below correspond to two tests $A$ and $B$ to detect an injury that have been applied to the same sample.\n$$ \\begin{array}{lcc} \\hline \\mbox{Test A} \u0026amp; \\mbox{Injury} \u0026amp; \\mbox{No injury} \\newline \\mbox{Outcome } + \u0026amp; 87 \u0026amp; 14 \\newline \\mbox{Outcome }- \u0026amp; 33 \u0026amp; 866 \\newline \\hline \\end{array} \\qquad \\begin{array}{lcc} \\hline \\mbox{Test 
B}\u0026amp; \\mbox{Injury} \u0026amp; \\mbox{No injury} \\newline \\mbox{Outcome }+ \u0026amp; 104 \u0026amp; 115 \\newline \\mbox{Outcome }- \u0026amp; 16 \u0026amp; 765 \\newline \\hline \\end{array} $$\nWhich test is more sensitive? Which one is more specific?\nAccording to the predictive values, which test is better to diagnose the injury? Which one is better to rule out the injury?\nAssuming that both tests are independent, what is the probability of getting a correct diagnosis with both tests if we apply both tests to a healthy person?\nAssuming that both tests are independent, what is the probability of getting at least one positive outcome if we apply both tests to a random person?\nShow solution Let $D$ be the event of suffering the injury, and $+$ and $-$ the events of getting a positive and a negative outcome in the test, respectively.\nTest $A$: sen = $0.725$ and spe = $0.9841$.\nTest $B$: sen = $0.8667$ and spe = $0.8693$.\nThus, test $A$ is more specific and test $B$ is more sensitive.\nTest $A$: PPV = $0.8614$ and NPV = $0.9633$.\nTest $B$: PPV = $0.4749$ and NPV = $0.9795$.\nThus, test $A$ is better to diagnose the injury and test $B$ is better to rule out the injury.\n$P(-_A\\cap -_B | \\overline{D}) = 0.8555$.\n$P(+_A\\cup +_B) = 0.2979$.\nQuestion 3 A study tries to determine the effect of a low fat diet in the lifetime of rats. The rats were divided into two groups, one with a normal diet and another with a low fat diet. It is assumed that the lifetimes of both groups are normally distributed with the same variance but different mean. 
If 20% of rats with normal diet lived more than 12 months, 5% less than 8 months, and 85% of rats with low fat diet lived more than 11 months,\nCompute the means and the standard deviation of the lifetime of rats following a normal diet and a low fat diet?\nIf 40% of the rats were under a normal diet, and 60% of rats under a low fat diet, what is the probability that a random rat die before 9 months?\nShow solution Let $X$ be the life time of a random rat, and let $X_1$ and $X_2$ be the lifetime of rats with a normal diet and a low fat diet respectively,\n$\\mu_1=10.6461$ months, $\\mu_2=12.6673$ months and $s=1.6087$ months.\n$P(X\u0026lt;9)=0.068$.\n","date":1620172800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1620723858,"objectID":"82950554557ef075738b8ca9b4d2516b","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-05-05/","publishdate":"2021-05-05T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-05-05/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 5, 2021\nProbability and random variables Question 1 The average number of injuries in an international tennis tournament is 2.\nCompute the probability that in an international tennis tournament there are more than 2 injuries.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-05-05","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: March 17, 2021\nDescriptive Statistics and Regression Question 1 The chart below shows the distribution of the number of subjects passed in a sample of first year students of a degree.\nDraw the box and whiskers plot and interpret it.\nCompute the central tendency statistics and interpret them.\nHow is the asymmetry of the distribution? And the kurtosis? 
Can we assume that the sample comes from a normal population?\nIf the mean of subjects passed in the second year was 5.5 and the variance was 2, is the mean of the subjects passed in the first year more or less representative than the one of the second year?\nWhich student is better, a first year student who passes 7 subjects or a second year student who passes 6 subjects?\nUse the following sums for the computations: $\\sum x_i=478$ subjects, $\\sum x_i^2=3036$ subjects$^2$, $\\sum (x_i-\\bar x)^3=29.5$ subjects$^3$ and $\\sum (x_i-\\bar x)^4=1226.27$ subjects$^4$.\nShow solution Quartiles: $Q_1=5$ subjects, $Q_2=6$ subjects, $Q_3=7$ subjects. $IQR = 2$ subjects. Fences: $f_1=2$ subjects and $f_2=10$ subjects. The central 50% of the data fall between 5 and 7 subjects, which is a moderate dispersion. There are no outliers and the right whisker is a little bit longer than the left one, so the distribution is a little bit right-skewed but almost normal.\n$\\bar x=5.975$ subjects, $Me=6$ subjects and $Mo=6$ subjects. They are very close, which suggests that the distribution is almost symmetric.\n$s^2=2.2494$ (subjects)$^2$, $s=1.4998$ subjects and $g_1=0.1093$, so that the distribution is slightly skewed to the right.\n$g_2=0.0295$, so that the distribution is a little bit more peaked than a Gauss bell.\nWe can assume that the sample comes from a normal population as both the coefficient of skewness and the coefficient of kurtosis are between -2 and 2.\nLet $Y$ be the number of subjects passed in the second year. Then, $cv_x=0.251$ and $cv_y=0.2571$. 
As the coefficient of variation of the first year is a little bit smaller than the one of the second year, the mean of the first year is a little bit more representative.\nStandard score for the first year: $z(7)=0.6834$.\nStandard score for the second year: $z(6)=0.3536$.\nAs the standard score of $7$ in the first year is greater than the standard score of $6$ in the second year, the first year student is better.\nQuestion 2 The table below shows the number of days of rehabilitation for a knee injury, and the knee flexion angle in degrees after those days.\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Days} \u0026amp; 10 \u0026amp; 15 \u0026amp; 20 \u0026amp; 25 \u0026amp; 30 \u0026amp; 35 \u0026amp; 40 \u0026amp; 45 \u0026amp; 50 \\newline \\mbox{Angle} \u0026amp; 45 \u0026amp; 58 \u0026amp; 65 \u0026amp; 75 \u0026amp; 82 \u0026amp; 88 \u0026amp; 91 \u0026amp; 93 \u0026amp; 94 \\newline \\hline \\end{array} $$\nCompute the covariance of the number of days of rehabilitation and the knee flexion angle, and interpret it.\nAccording to the regression line, by how many degrees does the knee flexion angle increase or decrease per day of rehabilitation?\nAccording to the logarithmic model, what is the expected number of degrees of the knee flexion angle after 32 days? Is this prediction more or less reliable than the prediction of the linear model?\nAccording to the exponential model, how many days of rehabilitation are required to get a knee flexion angle of 120 degrees? 
Is this prediction reliable?\nUse the following sums for the computations ($X$=Days of rehabilitation and $Y$=knee flexion angle):\n$\\sum x_i=270$ days, $\\sum \\log(x_i)=29.5894$ $\\log(\\mbox{days})$, $\\sum y_j=691$ degrees, $\\sum \\log(y_j)=38.8298$ $\\log(\\mbox{degrees})$,\n$\\sum x_i^2=9600$ days$^2$, $\\sum \\log(x_i)^2=99.5821$ $\\log(\\mbox{days})^2$, $\\sum y_j^2=55473$ degrees$^2$, $\\sum \\log(y_j)^2=168.0436$ $\\log(\\mbox{degrees})^2$,\n$\\sum x_iy_j=22560$ days$\\cdot$degrees, $\\sum x_i\\log(y_j)=1190.8727$ days$\\cdot\\log(\\mbox{degrees})$, $\\sum \\log(x_i)y_j=2346.0281$ $\\log(\\mbox{days})$degrees, $\\sum \\log(x_i)\\log(y_j)=128.738$ $\\log(\\mbox{days})\\log(\\mbox{degrees})$.\nShow solution $\\overline{x}=30$ days, $s_x^2=166.6667$ days$^2$.\n$\\bar y=76.7778$ degrees, $s_y^2=268.8395$ degrees$^2$.\n$s_{xy}=203.3333$ days$\\cdot$degrees.\nAs the covariance is positive, there is a direct linear relation between the number of days of rehabilitation and the knee flexion angle.\n$b_{yx}=1.22$ degrees/day, therefore the knee flexion angle will increase $1.22$ degrees per day of rehabilitation.\n$\\overline{\\log(x)}=3.2877$ log(days), $s_{\\log(x)}^2=0.2557$ log(days)$^2$ and $s_{\\log(x)y}=8.247$ log(days)degrees.\nLogarithmic regression model: $y=-29.2741+32.2571\\log(x)$.\nPrediction: $y(32)=82.5205$ degrees.\nThe logarithmic coefficient of determination is $0.9895$ and the linear coefficient of determination is $0.9227$. 
Thus, the prediction with the logarithmic model is more reliable as the coefficient of determination of the logarithmic model is greater.\nExponential regression model: $x=e^{0.9324+0.0307y}$.\nPrediction: $x(120)=100.8475$ days.\nThis prediction is not reliable as 120 degrees falls far outside the range of values observed in the sample for the knee flexion angle.\n","date":1615939200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616317723,"objectID":"8d60a06bc0b6ed3414eda4e966b2d502","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-03-17/","publishdate":"2021-03-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2021-03-17/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 17, 2021\nDescriptive Statistics and Regression Question 1 The chart below shows the distribution of the number of subjects passed in a sample of first year students of a degree.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2021-03-17","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 18, 2021\nQuestion 1 A drug is administered intravenously at a speed of 15 mg/hour. At the same time, the body metabolizes the drug at a rate of 80% of the amount in the body per hour.\nIf the drug is administered continuously, what will the maximum amount of drug in the body be? Assume that there was no drug in the body at the beginning of the process.\nIf administration is stopped when the amount administered is 150 mg, how long from that point will it take for the patient to have only 10 mg of drug in the body?\nSolution Let $x(t)$ be the amount of drug in the body at any time $t$.\nDifferential equation: $x\u0026rsquo;=15-0.8x$. Initial condition $x(0)=0$. 
Particular solution: $x(t)=18.75-18.75e^{-0.8t}$ and the maximum amount of drug in the body will be 18.75 mg.\nDifferential equation: $x\u0026rsquo;=-0.8x$. Initial condition $x(0)=18.74$. Particular solution: $x(t)=18.74e^{-0.8t}$ and the time required to have 10 mg of drug in the body will be $0.7851$ hours.\nWorked solution Question 2 The function $T(x,y)=\\ln(3xy+2x^2-y)$ gives the temperature of the surface of a mountain at latitude $x$ and longitude $y$. Some mountaineers are lost at position $(1,2)$ and are at risk of freezing to death.\nIn which direction should they move to avoid the risk of freezing as fast as possible?\nIf they are in the wrong direction and move so that the longitude decreases half of the increase of the latitude, will the risk of hypothermia increase or decrease?\nIn which direction should they move to keep the temperature constant?\nSolution $\\nabla T(1,2)=\\frac{1}{3}(5,1)$.\nLet $\\mathbf{u}$ be the vector $(1,-1/2)$, then $T\u0026rsquo;_{\\mathbf{u}}(1,2) = \\frac{3}{\\sqrt{5}}$ ºC. As the directional derivative is positive, the temperature increases and the risk of hypothermia decreases.\nAlong the direction of the vector $(1,-5)$.\nWorked solution Question 3 A beach ball has a volume of 50 dm$^3$ at the time when we start to pump air into it at a rate of 2 dm$^3$/min.\nWhat is the speed at which the radius is changing?\nAbout when will the surface of the ball be twice its initial value?\nRemark: The volume of a sphere is $V(r)=\\frac{4}{3}\\pi r^3$ and the surface is $S(r)=4\\pi r^2$.\nSolution $\\dfrac{dr}{dt}=0.0305$ dm/min.\nUsing the linear approximation $dt = dS/S\u0026rsquo;=37.5013$ minutes approximately.\nWorked solution ","date":1610928000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1612727901,"objectID":"c844d2814ed646f5b268eee41227a75a","permalink":"/en/teaching/calculus/exams/pharmacy-2021-01-18/","publishdate":"2021-01-18T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2021-01-18/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 18, 2021\nQuestion 1 A drug is administered 
intravenously at a speed of 15 mg/hour. At the same time, the body metabolizes the drug at a rate of 80% of the amount in the body per hour.","tags":["Exam"],"title":"Pharmacy exam 2021-01-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 18, 2021\nQuestion 1 The table below contains the differences between the grades in the final school exam and the entrance exam in a sample of public high schools ($X$) and private high schools ($Y$):\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Public schools} \u0026amp; -1.2 \u0026amp; -0.7 \u0026amp; -0.4 \u0026amp; -0.9 \u0026amp; -1.6 \u0026amp; 0.5 \u0026amp; 0.2 \u0026amp; -1.8 \u0026amp; 0.8\\newline\n\\mbox{Private schools} \u0026amp; -2.1 \u0026amp; -0.5 \u0026amp; -0.7 \u0026amp; -1.9 \u0026amp; 0.2 \u0026amp; -2.8 \u0026amp; -1\\newline\n\\hline \\end{array} $$\nWhich of the following box plots corresponds to each variable? Compare the central dispersion of the two variables according to the box plots. In which variable is the median smaller? In which type of school is the mean of the grades more representative?\nIn which type of school is the distribution of grades more symmetric?\nIn which type of school is the distribution of grades more peaked?\nWhich difference is relatively smaller, $-0.5$ points in a public high school or $-1$ points in a private high school?\nUse the following sums for the computations:\nPublic: $\\sum x_i=-5.1$, $\\sum x_i^2=9.63$, $\\sum (x_i-\\bar x)^3=0.95$ and $\\sum (x_i-\\bar x)^4=8.76$.\nPrivate: $\\sum y_i=-8.8$, $\\sum y_i^2=17.64$, $\\sum (y_i-\\bar y)^3=-0.82$ and $\\sum (y_i-\\bar y)^4=11.28$.\nSolution The box plot 1 corresponds to private schools and the box plot 2 to public schools. The central dispersion is pretty similar in both variables. 
The median is smaller in private schools.\nPublic schools: $\\bar x=-0.5667$, $s^2=0.7489$, $s=0.8654$ and $cv=1.5271$.\nPrivate schools: $\\bar y=-1.2571$, $s^2=0.9396$, $s=0.9693$ and $cv=0.7711$.\nThus, the mean of the grade is more representative in private schools.\n$g_{1x}=0.1626$ and $g_{1y}=-0.1285$. Thus, the distribution of grades in private schools is more symmetric as the coefficient of skewness is closer to 0.\n$g_{2x}=-1.2651$ and $g_{2y}=-1.1748$. Thus, the distribution of grades in private schools is more peaked.\nPublic schools: $z(-0.5)=0.077$.\nPrivate schools: $z(-1)=0.2653$.\nThus, a grade difference of $-0.5$ in a public school is relatively smaller than a difference of $-1$ in a private school.\nQuestion 2 An auditor is studying the relationship between the salary and the number of absences of a hospital warden. The following table shows the salary in thousands of euros ($X$) and the annual average of absences with that salary ($Y$).\n$$ \\begin{array}{lrrrrrrrrr} \\hline \\mbox{Salary} \u0026amp; 20.0 \u0026amp; 22.5 \u0026amp; 25 \u0026amp; 27.5 \u0026amp; 30.0 \u0026amp; 32.5 \u0026amp; 35.0 \u0026amp; 37.5 \u0026amp; 40.0 \\newline \\mbox{Absences} \u0026amp; 2.3 \u0026amp; 2.0 \u0026amp; 2 \u0026amp; 1.8 \u0026amp; 2.2 \u0026amp; 1.5 \u0026amp; 1.2 \u0026amp; 1.3 \u0026amp; 0.6 \\newline \\hline \\end{array} $$\nCompute the regression line that best explains the absences as a function of the salary.\nWhat is the expected number of absences for a warden with a salary of 29000€? 
Is this prediction reliable?\nHow much will the number of absences increase or decrease for every increment of 1000€ in the salary?\nUse the following sums for the computations:\n$\\sum x_i=270$ $10^3$€, $\\sum y_i=14.9$ absences,\n$\\sum x_i^2=8475$ ($10^3$€)$^2$, $\\sum y_i^2=27.11$ absences$^2$,\n$\\sum x_iy_j=420$ $10^3$€ absences.\nSolution $\\bar x=30$ $10^3$€, $s_x^2=41.6667$ ($10^3$€)$^2$,\n$\\bar y=1.6556$ absences, $s_y^2=0.2714$ absences$^2$,\n$s_{xy}=-3$ $10^3$€ absences.\nRegression line of absences on salary: $y=3.8156-0.072x$.\n$y(29) = 1.7276$ absences.\n$r^2 = 0.796$, thus the model fits well as the coefficient of determination is not far from 1, but the sample size is too small for the prediction to be reliable.\nThe number of absences will decrease 0.072 for every increment of 1000€ in the salary.\nQuestion 3 In a regression study it is known that the regression line of $Y$ on $X$ is $y+2x-10=0$ and the regression line of $X$ on $Y$ is $y+3x-14=0$.\nCompute the means of $X$ and $Y$.\nCompute the linear correlation coefficient and interpret it.\nSolution $\\bar x=4$ and $\\bar y=2$.\n$r=-0.8165$. The linear correlation coefficient is near -1 so there is a strong inverse relation between $X$ and $Y$.\nQuestion 4 A test to detect prostate cancer produces 1% of false positives and 0.2% false negatives. 
It is known that 1 in 400 males suffer this type of cancer.\nCompute the sensitivity and the specificity of the test.\nIf a male got a positive outcome in the test, what is the chance of developing cancer?\nCompute and interpret the negative predictive value.\nIs this test better to predict or to rule out the cancer?\nTo study whether there is an association between the practice of sports and this type of cancer, a sample of 1000 males was drawn, of which 700 practised sports, and it was observed that there were 2 males with cancer in the group of males who practised sports, and there were 3 males with cancer in the group of males who did not practice sports. Compute the relative risk and the odds ratio and interpret them.\nSolution Let $D$ the event corresponding to suffering prostate cancer and $+$ and $-$ the events corresponding to get a positive and a negative outcome respectively.\nThe sensitivity is $P(+|D) = 0.2$ and specificity $P(-|\\overline D) = 0.99$.\nPositive predictive value $P(D|+) = 0.0476$.\nNegative predictive value $P(\\overline D|-) = 0.998$.\nAs the positive predictive value is smaller than the negative predictive value, this test is better to rule out the disease. In fact, we can not use this test to detect the prostate cancer because the positive predictive value is less than 0.5.\n$RR(D)=0.2857$ and $OR(D)=0.2837$. Thus, there is an association between the practice of sports and the prostate cancer and the risks and the odds of developing cancer is almost one fourth smaller if the male practice sports.\nResolución Question 5 The probability that a child of a mother with the color-blind gene and a father without the color-blind gene is a color-blind male is $0.25$. 
It is also known that in a population there is one color-blind male for every 5000 males.\nIf this couple has 5 children, what is the probability that at most 2 of them are color-blind males?\nIf this couple has 5 children, and the gender of the children is equiprobable, what is the probability that 3 or more are females?\nIn a random sample of 10000 males of this population, what is the probability that more than 3 are color-blind males?\nSolution Let $X$ be the number of color-blind sons in a sample of 5 children, then $X\\sim B(5, 0.25)$ and $P(X\\leq 2)=0.8965$.\nLet $Y$ be the number of girls in a sample of 5 children, then $Y\\sim B(5, 0.5)$ and $P(Y\\geq 3)=0.5$.\nLet $Z$ be the number of color-blind males in a sample of 10000 males, then $Z\\sim B(10000, 0.0002)\\approx P(2)$ and $P(Z\u0026gt;3)=0.1429$.\nResolución Question 6 The primate cranial capacity follows a normal distribution with mean 1200 cm$^3$ and standard deviation 140 cm$^3$.\nCompute the probability that the cranial capacity of a primate is greater than 1400 cm$^3$.\nCompute the probability that the cranial capacity of a primate is exactly 1400 cm$^3$.\nAbove what cranial capacity will 20% of primates be?\nCompute the interquartile range of the cranial capacity of primates and interpret it.\nSolution Let $X$ be the primate cranial capacity. Then $X\\sim N(1200,140)$.\n$P(X\u0026gt;1400) = 0.0766$.\n$P(X=1400) = 0$.\n$P_{80} = 1317.827$ cm$^3$.\n$Q_1 = 1105.5714$ cm$^3$, $Q_3 = 1294.4286$ cm$^3$ and $IQR = 188.8571$ cm$^3$. 
Thus the central 50% of the data will be concentrated in an interval of width $188.8571$ cm$^3$, which is a small spread.\nResolución ","date":1610928000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1612657121,"objectID":"4c570671d341494644ab5f9fa875dc09","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-01-18/","publishdate":"2021-01-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2021-01-18/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 18, 2021\nQuestion 1 The table below contains the differences between the grades in the final school exam and the entrance exam in a sample of public high schools ($X$) and private high schools ($Y$):","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2021-01-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 23, 2020\nQuestion 1 A test to detect the COVID19 was applied to 850 persons infected by COVID19 with a positive outcome in 800 of them, and it was also applied to 9150 non-infected persons with a positive outcome in 10% of them.\nCompute the sensitivity and the specificity of the test.\nCompute the positive and the negative predictive values and interpret them.\nCompute the probability of a correct diagnosis.\nSolution Let $D$ be the event corresponding to suffering from COVID19 and $+$ and $-$ the events corresponding to getting a positive and a negative outcome respectively.\nThe sensitivity is $P(+|D) = 0.9412$ and specificity $P(-|\\overline D) = 0.9$.\nPositive predictive value $P(D|+) = 0.4665$ and negative predictive value $P(\\overline D|-) = 0.994$. 
As the positive predictive value is less than 0.5 we cannot use this test to confirm COVID19, but we can use it to rule it out with strong confidence since the negative predictive value is pretty close to 1.\n$P(D\\cap +) + P(\\overline D\\cap -) = 0.9035$.\nQuestion 2 A newborn baby affected by Moebius syndrome blinks, on average, twice a minute.\nCompute the probability that a newborn blinks twice in half a minute.\nIn a hospital five children have been born with Moebius syndrome. Compute the probability that at least 3 of them blink in their first minute of life.\nIn which distribution is the mean more representative, in the number of times that a newborn blinks in a minute or in the number of times that a newborn blinks in half a minute?\nSolution Let $X$ be the number of times that a newborn blinks in half a minute, then $X\\sim P(1)$ and $P(X=2)=0.1839$. Let $Y$ be the number of newborns that blink in their first minute of life in a sample of 5 newborns, then $Y\\sim B(5,0.8647)$ and $P(Y\\geq 3)=0.98$. Let $Z$ be the number of times that a newborn blinks in a minute, then $cv_z = 0.7071$ and $cv_x = 1$. Thus, the mean of $Z$ is more representative since its coefficient of variation is smaller. Question 3 The prolactin level in pregnant and non-pregnant females follows a normal distribution with different means but with the same variance. When the prolactin levels exceed 15 ng/ml, females secrete milk through their mammary glands. 
It is known that 95% of pregnant females secrete milk but only 1% of non-pregnant females secrete milk.\nIf the median of the prolactin level in pregnant females is 16 ng/ml, what are the means and the standard deviation of the prolactin level in both populations?\nCompute the percentage of pregnant females with a prolactin level between 15.5 and 17 ng/ml.\nCompute the prolactin level such that 20% of pregnant females are above that level.\nSolution Let $X$ and $Y$ be the prolactin levels in pregnant and non-pregnant females respectively.\n$\\mu_x=16$ ng/ml, $\\mu_y=13.5857$ ng/ml and $\\sigma=0.608$ ng/ml.\n$P(15.5\u0026lt;X\u0026lt;17) = 0.7446$, so 74.4583% of pregnant females.\n$P_{80} = 16.5117$ ng/ml.\nQuestion 4 An organism has the same chance of being infected by a virus and a bacteria. At the same time, the probability of being infected by a virus doubles when the organism has been previously infected by a bacteria. On the other hand, the probability of being infected by no pathogen (neither virus nor bacteria) is $0.52$.\nWhat is the probability of being infected by a virus and a bacteria at the same time?\nWhat is the probability of being infected by a bacteria if it has been infected by a virus?\nWhat is the probability of being infected only by a virus?\nAre the events of being infected by a virus and a bacteria independent?\nSolution Let $V$ and $B$ be the events corresponding to being infected by a virus and a bacteria respectively.\n$P(V\\cap B) = 0.32$.\n$P(B|V) = 0.8$.\n$P(V\\cap \\overline B) = 0.08$.\nThe events are dependent since $P(V) = 0.4 \\neq 0.8 = P(V|B)$.\n","date":1606089600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1611391437,"objectID":"96467a52325f88fdb9dbe65d260306cc","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-11-23/","publishdate":"2020-11-23T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-11-23/","section":"teaching","summary":"Degrees: Pharmacy, 
Biotechnology\nDate: November 23, 2020\nQuestion 1 A test to detect the COVID19 was applied to 850 persons infected by COVID19 with a positive outcome in 800 of them, and it was also applied to 9150 non-infected persons with a positive outcome in 10% of them.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2020-11-23","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 26, 2020\nQuestion 1 The table below shows the daily number of patients hospitalized in a hospital during the month of September.\n$$ \\begin{array}{cr} \\mbox{Patients} \u0026amp; \\mbox{Frequency} \\newline \\hline (10,14] \u0026amp; 6 \\newline (14,18] \u0026amp; 10 \\newline (18,22] \u0026amp; 7 \\newline (22,26] \u0026amp; 6 \\newline (26,30] \u0026amp; 1 \\newline \\hline \\end{array}$$\nStudy the spread of the 50% of central data.\nCompute the mean and study the dispersion with respect to it.\nStudy the normality of the patients distribution.\nIf the mean was 35 patients and the variance 40 patients$^2$ during the month of April, which month had a higher relative variability?\nWhich number of people hospitalized was greater, 20 persons in September or 40 in April?\nUse the following sums for the computations:\n$\\sum x_in_i=544$ patients, $\\sum x_i^2n_i=10464$ patients$^2$, $\\sum (x_i-\\bar x)^3n_i=736.14$ patients$^3$ and $\\sum (x_i-\\bar x)^4n_i = 25367.44$ patients$^4$.\nSolution $Q_1=16$ patients, $Q_3=20$ patients and $IQR=4$ patients. Thus the central dispersion is small.\n$\\bar x=18.1333$ patients, $s^2=19.9822$ patients$^2$, $s=4.4701$ patients and $cv=0.2465$. Thus, the dispersion with respect to the mean is small and the mean represents well.\n$g_1=0.2747$ and $g_2=-0.8823$. 
As the coefficient of skewness and the coefficient of kurtosis fall between -2 and 2, we can assume that the sample comes from a normal population.\nLet $Y$ be the daily number of patients hospitalized during April. Then, $cv_y=0.1807$. Since the coefficient of variation in September is greater than the one in April, there is a relative higher variability in September.\nSeptember: $z(20)=0.4176$.\nApril: $z(40)=0.7906$.\nThus, 40 patients hospitalized in April is relatively higher than 20 in September as its standard score is greater.\nQuestion 2 The chart below shows the distribution of scores in three subjects.\nWhich subject is more difficult?\nWhich subject has more central dispersion?\nWhich subjects have outliers?\nWhich subject is more asymmetric?\nSolution Subject $Y$ because its scores are smaller.\nSubject $X$ because the box is wider.\nSubject $Z$ because there is a score out of the whiskers.\nSubject $Z$ because the distance from the first quartile to the median (left side of the box) is greater than the distance from the third quartile to the median (right side of the box).\nQuestion 3 In a sample of 10 families with a son older than 20 it has been measured the height of the father ($X$), the mother ($Y$) and the son ($Z$) in centimetres, getting the following results:\n$\\sum x_i=1774$ cm, $\\sum y_i=1630$ cm, $\\sum z_i=1795$ cm,\n$\\sum x_i^2=315300$ cm$^2$, $\\sum y_i^2=266150$ cm$^2$, $\\sum z_i^2=322737$ cm$^2$,\n$\\sum x_iy_j=289364$ cm$^2$, $\\sum x_iz_j=318958$ cm$^2$, $\\sum y_iz_j=292757$ cm$^2$.\nOn which height does the height of the son depend more linearly, the height of the father or the mother?\nUsing the best linear regression model, predict the height of a son with a father 181 cm tall and a mother 163 cm tall.\nAccording to the linear model, how much will increase the height of the son for each centimetre that increases the height of the father? 
And for each centimetre that increases the height of the mother?\nHow would the reliability of the prediction be if the heights were measured in inches? (An inch is 2.54 cm).\nSolution $\\bar x=177.4$ cm, $s_x^2=59.24$ cm$^2$,\n$\\bar y=163$ cm, $s_y^2=46$ cm$^2$,\n$\\bar z=179.5$ cm, $s_z^2=53.45$ cm$^2$,\n$s_{xz}=52.5$ cm$^2$ and $s_{yz}=17.2$ cm$^2$.\n$r^2_{xz}=0.8705$ and $r^2_{yz}=0.1203$, thus the height of the son depends linearly more on the height of the father since the $r^2_{xz}\u0026gt;r^2_{yz}$.\nRegression line of $Z$ on $X$: $z=22.2836 + 0.8862x$ and $z(181)=182.6904$ cm.\nThe height of the son will increase $0.8862$ cm per cm of the height of the father and $0.3739$ cm per cm of the height of the mother.\nThe reliability of the prediction will be the same, as after applying the same linear transformation to $X$ and $Z$, the variances are multiplied by the square of the slope and the covariance is also multiplied by the square of the slope.\n","date":1603670400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1611391437,"objectID":"0912b77d9377cae59002d5676b9dded0","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-10-26/","publishdate":"2020-10-26T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2020-10-26/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 26, 2020\nQuestion 1 The table below shows the daily number of patients hospitalized in a hospital during the month of September.\n$$ \\begin{array}{cr} \\mbox{Patients} \u0026amp; \\mbox{Frequency} \\newline \\hline (10,14] \u0026amp; 6 \\newline (14,18] \u0026amp; 10 \\newline (18,22] \u0026amp; 7 \\newline (22,26] \u0026amp; 6 \\newline (26,30] \u0026amp; 1 \\newline \\hline \\end{array}$$","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2020-10-26","type":"book"},{"authors":null,"categories":["R"],"content":"Table of Contents What is rkTeaching? 
Installation Installation on Windows Installation on Mac OS Installation on Linux Statistical procedures Functionality How to cite rkTeaching? What is rkTeaching? rkTeaching is an R package that provides a plugin for the graphical user interface RKWard, adding new menus and dialogs specially designed for teaching and learning Statistics.\nThis package has been developed and is maintained by Alfredo Sánchez Alberca asalber@ceu.es in the Department of Applied Math and Statistics of the San Pablo CEU of Madrid.\nIf you find an error or have a suggestion, please let me know by email or by opening an issue on GitHub.\nInstallation Installation on Windows For Windows users there is a bundle that includes R, RKWard and rkTeaching.\nDownload the latest version (R version 4.3, RKWard version 0.8, rkTeaching version 1.3.0)\nDownload the previous version (R version 3.6.2, RKWard version 0.7.1b, rkTeaching version 1.3.0)\nOnce the file is downloaded, all you have to do is execute it. It will ask for the installation drive and directory. It is recommended to install it in the root of drive C, that is C:\\. The installation creates a folder RKWard in the installation directory. There, in the bin folder, you have to execute the rkward.exe file to start the program.\nThe following video tutorial shows the installation process (in Spanish).\nInstallation on Mac OS To install the software on Mac OS systems, you must take the following steps:\nInstall R. R can be downloaded from the following link https://cran.r-project.org/.\nIt is recommended to install version 4.3 of R for MacOs. Depending on the computer processor you must select the arm64 version for computers with a silicon chip (M1-3) or the x86_64 version for computers with an Intel chip.\nR version 4.3 for MacOs with silicon chip (M1-3) R version 4.3 for MacOs with Intel chip (x86) Install RKWard. 
RKWard can be downloaded from the web https://rkward.kde.org/.\nYou must select the distribution corresponding to Mac Os ( https://rkward.kde.org/RKWard_on_Mac.html).\nAfter downloading it, follow the installation instructions.\nIt is important to have Mac OS X 10.15 or higher, because RKWard does not work with previous versions.\nIf you get some errors during the installation process, check for possible solutions at ( http://rkward.sourceforge.net/wiki/RKWard_on_Mac#Troubleshooting)\nInstall the packages that rkTeaching depends on. The rkTeaching package depends on several packages that should be installed first. To install these packages you must run RKWard, open the R console and type the following commands:\ninstall.packages(c(\u0026quot;R2HTML\u0026quot;,\u0026quot;car\u0026quot;,\u0026quot;e1071\u0026quot;,\u0026quot;Hmisc\u0026quot;, \u0026quot;ez\u0026quot;, \u0026quot;multcomp\u0026quot;, \u0026quot;psych\u0026quot;, \u0026quot;probs\u0026quot;, \u0026quot;tidyverse\u0026quot;, \u0026quot;knitr\u0026quot;, \u0026quot;kableExtra\u0026quot;, \u0026quot;remotes\u0026quot;)) Install rkTeaching. To install the rkTeaching package you must type the following commands in the R console:\nlibrary(remotes) install_github(\u0026quot;rkward-community/rk.Teaching\u0026quot;) The following video tutorial shows the installation process (only for RKWard version 0.7.0).\nInstallation on Linux To install the software on Linux systems, you must take the following steps:\nInstall R. R can be downloaded from the web https://cran.r-project.org/. You have to select the Linux distribution and follow the instructions there. R version 3.4 or higher is required.\nWith Debian based distributions like Ubuntu, you can install R from the command line typing the command:\nsudo apt-get install r-base Install RKWard. RKWard can be downloaded from the web https://rkward.kde.org/. 
You have to select the Linux distribution and follow the instructions there.\nWith Debian based distributions like Ubuntu, you can install RKWard from the command line typing the command:\nsudo apt-get install rkward Install the packages that rkTeaching depends on. The rkTeaching package depends on several packages that should be installed first. To install these packages you must run RKWard, open the R console and type the following commands:\ninstall.packages(c(\u0026quot;R2HTML\u0026quot;,\u0026quot;car\u0026quot;,\u0026quot;e1071\u0026quot;,\u0026quot;Hmisc\u0026quot;, \u0026quot;ez\u0026quot;, \u0026quot;multcomp\u0026quot;, \u0026quot;psych\u0026quot;, \u0026quot;probs\u0026quot;, \u0026quot;tidyverse\u0026quot;, \u0026quot;knitr\u0026quot;, \u0026quot;kableExtra\u0026quot;, \u0026quot;remotes\u0026quot;)) Install rkTeaching. To install the rkTeaching package you must type the following commands in the R console:\nlibrary(remotes) install_github(\u0026quot;rkward-community/rk.Teaching\u0026quot;) The following video tutorial shows the installation process (in Spanish).\nStatistical procedures Once installed, a new menu Teaching will appear in RKWard with the following statistical procedures:\nData manipulation: Filter data Calculate variable Recoding variable Weight data Frequency distributions: Frequency tabulation Bidimensional frequency tabulation Plots: Bar chart Histogram Pie chart Box plot Means chart Interaction chart Line chart Scatterplot Scatterplot matrix Descriptive statistics Statistics Regression: Correlation Linear Regression Non linear regression Regression model comparison Regression prediction Parametric tests: Means: T test for one sample T test for two independent samples T test for two paired samples ANOVA Sample size calculation for mean estimation Variances: Fisher test for two samples Levene test for multiple samples Proportions: Test for one proportion Test for two proportions Sample size calculation for proportion estimation Non parametric 
tests: Normality tests: Shapiro-Wilk, Kolmogorov Mann-Whitney U test Wilcoxon test Friedman test Kruskal-Wallis test Chi-square test Concordance Intraclass correlation coefficient Cohen\u0026rsquo;s kappa Probability: Random games: Coins Dice Cards Urn Build probability space Combine probability spaces Repeat probability space Calculate probability Probability distributions Discrete: Binomial Geometric Hypergeometric Poisson Continuous: Uniform Normal Chi-square Student\u0026rsquo;s T Fisher\u0026rsquo;s F Simulations: Law of rare events Functionality Menus and dialogs specially designed to ease learning, leaving out uncommon options to get a simplified and intuitive interface.\nAll the dialogs have a wizard that guides the user step by step through the statistical procedure. HTML output that presents the results of the analysis in a clear and concise way. Charts based on the modern ggplot2 package. Computation formulas and details available for some statistical procedures. rkTeaching is maintained by asalber.\nHow to cite rkTeaching? Sánchez-Alberca, A. (2024). rkTeaching (version 1.4) [software]. 
Get from: http://aprendeconalf.es/projects/rkteaching.\n","date":1598918400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1729805100,"objectID":"91a88147a2d56cb2aead87000b360f2e","permalink":"/en/project/rkteaching/","publishdate":"2020-09-01T00:00:00Z","relpermalink":"/en/project/rkteaching/","section":"project","summary":"An R package for teaching and learning Statistics","tags":["RKWard","rkTeaching","Software"],"title":"rkTeaching","type":"project"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 19, 2020\nDescriptive Statistics and Regression Question 1 To see if the confinement due to COVID-19 has influenced the performance of a course, the number of failed subjects of each student in the current course and in the previous year course has been counted, obtaining the table below.\n$$ \\begin{array}{crr} \\mbox{Failed subjects} \u0026amp; \\mbox{Previous year course} \u0026amp; \\mbox{Current course} \\newline \\hline 0 \u0026amp; 7 \u0026amp; 8 \\newline 1 \u0026amp; 15 \u0026amp; 12 \\newline 2 \u0026amp; 11 \u0026amp; 8 \\newline 3 \u0026amp; 5 \u0026amp; 7 \\newline 4 \u0026amp; 4 \u0026amp; 3 \\newline 5 \u0026amp; 2 \u0026amp; 2 \\newline 6 \u0026amp; 1 \u0026amp; 2 \\newline 8 \u0026amp; 0 \u0026amp; 1 \\newline \\hline \\end{array}$$\nDraw the box plots of the failed subjects in the current and the previous year courses and compare them. Can we assume that both samples come from a normal population? In which sample the mean is more representative? Which number of failed subjects is greater, 7 in the current course or 6 in the previous year course? 
Use the following sums for the computations:\nPrevious year course: $\\sum x_in_i=84$, $\\sum x_i^2n_i=254$, $\\sum (x_i-\\bar x)^3n_i=122.99$ and $\\sum (x_i-\\bar x)^4n_i=669.21$.\nCurrent course: $\\sum y_in_i=91$, $\\sum y_i^2n_i=341$, $\\sum (y_i-\\bar y)^3n_i=301.16$ and $\\sum (y_i-\\bar y)^4n_i=2012.88$.\nShow solution Both distributions are pretty similar. The central dispersion is the same and both are right skewed. The only difference is that there is an outlier in the current year distribution. 2. Previous year course: $\\bar x=1.8667$, $s^2=2.16$, $s=1.4697$, $g_1=0.8609$ and $g_2=0.1874$. Current course: $\\bar y=2.1163$, $s^2=3.4516$, $s=1.8578$, $g_1=1.0922$ and $g_2=0.9292$. As the coefficients of skewness and kurtosis are between -2 and 2, we can assume that both distributions come from a normal distribution. 3. Previous year course: $cv=0.7873$. Current year: $cv=0.8779$. Thus, the mean is more representative in the previous year course, since the coefficient of variation is smaller. 4. Previous year course: $z(6)=2.8124$. Current course: $z(7)=2.6287$. Thus, 7 failed subjects in the current course is relatively less than 6 in the previous year course, since the standard score is smaller.\nQuestion 2 A study tries to develop a new technique for detecting a certain antibody. For this, a piezoelectric immunosensor is used, which makes it possible to measure the change in the signal in Hz by varying the concentration of the antibody ($\\mu$g/ml). 
The table below presents the data collected.\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Concentration ($\\mu$g/ml)} \u0026amp; 5 \u0026amp; 8 \u0026amp; 20 \u0026amp; 35 \u0026amp; 50 \u0026amp; 80 \u0026amp; 110 \\newline \\mbox{Signal (Hz)} \u0026amp; 50 \u0026amp; 70 \u0026amp; 100 \u0026amp; 150 \u0026amp; 170 \u0026amp; 190 \u0026amp; 200 \\newline \\hline \\end{array}$$\nCompute the logarithmic model of the change in the signal on the concentration of the antibodies.\nIt was observed that at a concentration of 100 $\\mu$g/ml the change in signal tends to stabilize. Predict the value of the signal corresponding to such concentration using the logarithmic model.\nPredict the antibody concentration that corresponds to a change in the signal of 120 using the exponential model.\nUse the following sums for the computations ($X$=Concentration and $Y$=Signal):\n$\\sum x_i=308$ $\\mu$g/ml, $\\sum \\log(x_i)=23.2345$ $\\log(\\mbox{$\\mu$g/ml})$, $\\sum y_j=930$ Hz, $\\sum \\log(y_j)=33.4575$ $\\log(\\mbox{Hz})$,\n$\\sum x_i^2=22714$ ($\\mu$g/ml)$^2$, $\\sum \\log(x_i)^2=85.1299$ $\\log(\\mbox{$\\mu$g/ml})^2$, $\\sum y_j^2=144900$ Hz$^2$, $\\sum \\log(y_j)^2=161.6475$ $\\log(\\mbox{Hz})^2$,\n$\\sum x_iy_j=53760$ $\\mu$g/ml$\\cdot$Hz, $\\sum x_i\\log(y_j)=1580.3905$ $\\mu$g/ml$\\cdot\\log(\\mbox{Hz})$, $\\sum \\log(x_i)y_j=3496.6333$ $\\log(\\mbox{$\\mu$g/ml})$Hz, $\\sum \\log(x_i)\\log(y_j)=114.7297$ $\\log(\\mbox{$\\mu$g/ml})\\log(\\mbox{Hz})$.\nShow solution $\\overline{\\log(x)}=3.3192$ log($\\mu$g/ml), $s_{\\log(x)}^2=1.1442$ log($\\mu$g/ml)$^2$. $\\bar y=132.8571$ Hz, $s_y^2=3048.9796$ Hz$^2$. $s_{\\log(x)y}=58.5379$ log($\\mu$g/ml)Hz. Logarithmic regression model: $y=-36.9501+51.1589\\log(x)$. Prediction: $y(100)=198.6453$ Hz. Exponential regression model: $x=e^{0.7685+0.0192y}$. Prediction: $x(120)=21.5929$ $\\mu$g/ml. Probability and Random Variables Question 3 Two symptoms of COVID-19 are fever and cough. 
We know that 30% of people with COVID-19 cough and 20% have fever and cough. Also, if somebody with COVID-19 has fever then the probability of coughing is 0.5.\nConstruct the probability tree for the sample space of the random experiment consisting of picking a random person with COVID-19 and measuring the symptoms that he or she has.\nCalculate the probability of having any of the symptoms.\nCalculate the probability of having only cough.\nCalculate the probability of having only fever.\nCalculate the probability of having neither fever nor cough.\nAre the symptoms dependent or independent?\nShow solution Let $C$ and $F$ be the events of having cough and fever respectively. According to the statement $P(C)=0.3$, $P(C\\cap F)=0.2$ and $P(C|F)=0.5$. 2. $P(C\\cup F) = 0.5$. 3. $P(C\\cap \\overline F) = 0.1$. 4. $P(\\overline C \\cap F) = 0.2$. 5. $P(\\overline C \\cap \\overline F) = 0.5$. 6. The events are dependent since $P(C)\\neq P(C|F)$. Question 4 The sensitivity and specificity of a diagnostic test are 0.58 and 0.01, respectively, and the probability of a true positive is 0.02.\nCalculate the prevalence of the disease.\nCalculate the predictive values.\nIs the test more useful to rule out or confirm the disease?\nIf we have 10 non-sick patients, what is the probability that more than 9 have a misdiagnosis?\nIf we have 60 patients, what is the probability that at least two of them have a correct diagnosis?\nShow solution $P(D) = 0.0345$. $PPV = P(D|+) = 0.0205$ and $NPV = P(\\overline D|-) = 0.4$. The test is helpful neither to confirm nor to rule out the disease, since both the positive and the negative predictive values are below 0.5. Let $X$ be the number of non-sick patients with a positive outcome, then $X\\sim B(10, 0.99)$, and $P(X\\geq 9)=0.9957$. Let $Y$ be the number of patients with a correct diagnosis, then $Y\\sim B(60, 0.0297)\\approx P(1.7793)$, and $P(Y\\geq 2)=0.531$. 
Question 5 The time required to cure a basketball injury with a rehabilitation technique follows a normal distribution with quartiles $Q_1 = 22$ days and $Q_2 = 25$ days.\nCalculate the mean and standard deviation of the curation time.\nIf a player has just been injured and has to play a match in 30 days, what is the probability that he will miss it?\nCalculate the interquartile range of the curation time distribution.\nShow solution Let $X$ be the time required to cure the injury, then $X\\sim N(25, 4.4478)$. $P(X \u0026gt; 30) = 0.1305$. $IQR = 6$ days. ","date":1592524800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4355f28ec946456c86aa32dfd51f95bd","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-06-19/","publishdate":"2020-06-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-06-19/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 19, 2020\nDescriptive Statistics and Regression Question 1 To see if the confinement due to COVID-19 has influenced the performance of a course, the number of failed subjects of each student in the current course and in the previous year course has been counted, obtaining the table below.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2020-06-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 25, 2020\nDescriptive Statistics and Regression Question 1 In a course there are 150 students, of which 50 are working students and the other 100 non-working students. 
The table below shows the frequency distribution of the grade in an exam of these two groups:\n$$ \\begin{array}{crr} \\mbox{Grade} \u0026amp; \\mbox{Num non-working students} \u0026amp; \\mbox{Num working students} \\newline \\hline 0-2 \u0026amp; 8 \u0026amp; 2 \\newline 2-4 \u0026amp; 15 \u0026amp; 9 \\newline 4-6 \u0026amp; 25 \u0026amp; 19 \\newline 6-8 \u0026amp; 38 \u0026amp; 11 \\newline 8-10 \u0026amp; 14 \u0026amp; 9 \\newline \\hline \\end{array} $$\nCompute the percentage of students that passed the exam (a grade 5 or above) in both groups, working and non-working students.\nIn which group is there a higher relative dispersion of the grade with respect to the mean?\nWhich grade distribution is more asymmetric, the distribution of working students, or the non-working students one?\nTo apply for a scholarship to go abroad, the grade must be transformed applying the linear transformation $Y = 0.5 + X * 1.45$. Compute the mean of Y for the two groups. How changes the asymmetry of the two groups?\nWhich grade is relatively higher, 6 in the working students group, or 7 in the non-working students group?\nUse the following sums for the computations:\nNon-working students: $\\sum x_in_i=570$, $\\sum x_i^2n_i=3764$, $\\sum (x_i-\\bar x)^3n_i=-547.8$ and $\\sum (x_i-\\bar x)^4n_i=6475.73$.\nWorking students: $\\sum y_in_i=282$, $\\sum y_i^2n_i=1826$, $\\sum (y_i-\\bar y)^3n_i=-1.31$ and $\\sum (y_i-\\bar y)^4n_i=2552.14$.\nSolution 35.5% of non-working students passed and 41% of working students passed. Non-working students: $\\bar x=5.7$, $s^2=5.15$, $s=2.2694$ and $cv=0.3981$. Working students: $\\bar y=5.64$, $s^2=4.7104$, $s=2.1703$ and $cv=0.3848$. The sample of non-working students has a slightly higher relative dispersion with respect to the mean as the coefficient of variation is greater. Non-working students: $g_1=-0.4687$. Working students: $g_1=-0.0026$. 
Thus, the sample of non-working students is more asymmetric as the coefficient of skewness is further from 0. Non-working students: $\\bar y=8.765$. Working students: $\\bar x=8.678$. The coefficient of skewness does not change as the slope of the linear transformation is positive. Non-working students: $z(7)=0.5728$. Working students: $z(6)=0.1659$. Thus, a 7 in the sample of non-working students is relatively higher than a 6 in the sample of working students, as its standard score is greater. Question 2 The effect of a doping substance on the response time to a given stimulus was analyzed in a group of patients. The same amount of substance was administered in successive doses, from 10 to 80 mg, to all the patients. The table below shows the average response time to the stimulus, expressed in hundredths of a second:\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Dose (mg)} \u0026amp; 10 \u0026amp; 20 \u0026amp; 30 \u0026amp; 40 \u0026amp; 50 \u0026amp; 60 \u0026amp; 70 \u0026amp; 80 \\newline \\mbox{Response time ($10^{-2}$ s)} \u0026amp; 28 \u0026amp; 46 \u0026amp; 62 \u0026amp; 81 \u0026amp; 100 \u0026amp; 132 \u0026amp; 195 \u0026amp; 302 \\newline \\hline \\end{array} $$\nAccording to the linear regression model, how much will the response time increase or decrease for each mg we increase the dose?\nBased on the exponential model, what will be the expected response time for a 75 mg dose?\nIf a response time greater than one second is considered dangerous for health, from what level should the administration of the doping substance be regulated, or even prohibited, according to the logarithmic model?\nUse the following sums for the computations:\n$\\sum x_i=360$ mg, $\\sum \\log(x_i)=29.0253$ $\\log(\\mbox{mg})$, $\\sum y_j=946$ $10^{-2}$ s, $\\sum \\log(y_j)=36.1538$ $\\log(\\mbox{$10^{-2}$ s})$,\n$\\sum x_i^2=20400$ mg$^2$, $\\sum \\log(x_i)^2=108.7717$ $\\log(\\mbox{mg})^2$, $\\sum y_j^2=169958$ $10^{-2}$ s$^2$, $\\sum \\log(y_j)^2=167.5694$ 
$\\log(\\mbox{$10^{-2}$ s})^2$,\n$\\sum x_iy_j=57030$ mg$\\cdot 10^{-2}$ s, $\\sum x_i\\log(y_j)=1758.6576$ mg$\\cdot\\log(\\mbox{$10^{-2}$ s})$, $\\sum \\log(x_i)y_j=3795.4339$ $\\log(\\mbox{mg})10^{-2}$ s, $\\sum \\log(x_i)\\log(y_j)=134.823$ $\\log(\\mbox{mg})\\log(\\mbox{$10^{-2}$ s})$.\nSolution $\\bar x=45$ mg, $s_x^2=525$ mg$^2$. $\\bar y=118.25$ $10^{-2}$ s, $s_y^2=7261.6875$ $10^{-4}$ s$^2$. $s_{xy}=1807.5$ mg$\\cdot 10^{-2}$ s. $b_{yx} = 3.4429$ $10^{-2}$ s/mg. Therefore, the response time increases $3.4429$ hundredths of a second for each mg the dose is increased. $\\overline{\\log(y)}=4.5192$ log($10^{-2}$ s), $s_{\\log(y)}^2=0.5227$ log($10^{-2}$ s)$^2$. $s_{x\\log(y)}=16.4669$ mg$\\cdot\\log(10^{-2}$ s). Exponential regression model: $y=e^{3.1078+0.0314x}$. Prediction: $y(75)=235.1434$ $10^{-2}$ s. Exponential coefficient of determination: $r^2=0.988$. Thus, the exponential model fits almost perfectly to the cloud of points of the scatter plot, but the sample is too small to get reliable predictions. Logarithmic regression model: $x=-97.3603+31.501\\ln(y)$. Prediction: $x(100)=47.7072$ mg. Probability and Random Variables Question 3 A hospital orders a DNA compatibility test from three labs A, B and C. Lab A performs 40 tests a day, lab B 50, and lab C 60. It is known that the probability of a wrong diagnosis is 20% in lab A, 18% in lab B and 22% in lab C. If we select a random test of the hospital,\nCompute the probability of a wrong diagnosis in that test.\nIf the test is wrong, what is the probability that it has been performed by lab B?\nIf the test is right, which lab is more likely to have performed the test?\nSolution Let $A$, $B$ and $C$ be the events of performing the test in labs $A$, $B$ and $C$ respectively, and $R$ the event of getting a right diagnosis. According to the statement $P(A)=0.2667$, $P(B)=0.3333$, $P(C)=0.4$, $P(R|A)=0.8$, $P(R|B)=0.82$ and $P(R|C)=0.78$.\n$P(\\overline R) = 0.2013$. $P(B|\\overline R) = 0.298$. 
$P(A|R) = 0.2671$, $P(B|R) = 0.3422$ and $P(C|R) = 0.3907$, thus, it is more likely that it has been performed in lab $C$. Question 4 An epidemiological study tries to determine the effectiveness of face masks to prevent COVID19. In a sample, 4000 persons without the virus and 1000 persons with it were selected. It was observed that in the group of infected people 120 had used face masks in the two previous weeks, while in the non-infected group, 1250 had used face masks in the two previous weeks.\nCompute the relative risk of being infected with face masks.\nCompute the odds ratio of being infected with face masks.\nWhich association measure is more reliable?\nSolution Let $D$ be the event of being infected.\n$RR(D)=0.3613$. Thus, the risk of being infected with face mask is about one third of the risk of being infected without face mask. $OR(D)=0.3$. Thus, the odds of being infected with face mask are less than one third of the odds of being infected without face mask. As we cannot compute the prevalence of $D$, the odds ratio is more reliable. Question 5 During the COVID19 quarantine a telephone exchange with 4 telephone operators received an average of 12 calls per day. Assuming that the calls are equally distributed among the operators,\nCompute the probability that an operator received more than 3 calls a day.\nCompute the probability that all the operators received some call a day.\nSolution Let $X$ be the number of calls that arrive to one operator, then $X\\sim P(3)$, and $P(X\u0026gt;3)=0.3528$. Let $Y$ be the number of operators that receive some call, then $Y\\sim B(4, 0.9502)$, and $P(Y=4)=0.8152$. Question 6 In a course with 200 students the score of a test to measure the intelligence quotient follows a normal distribution. 
After applying the test to the students 10 of them got a score above 130 and 30 of them a score below 60.\nCompute the mean and the standard deviation of the score.\nHow many students will have a score between 90 and 95?\nCompute the limits of the interval centered at the mean that accumulates 95% of the scores.\nSolution Let $X$ be the score of the test then $X\\sim N(87.058, 26.1069)$ $P(90\\leq X \\leq 95) = 0.0747$, that is, around $14.9309$ students. Interval with 95% of probability $(35.8895, 138.2265)$. ","date":1590364800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1646900374,"objectID":"07181cbc4126ee87dcf255d682414dbc","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-05-25/","publishdate":"2020-05-25T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2020-05-25/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 25, 2020\nDescriptive Statistics and Regression Question 1 In a course there are 150 students, of which 50 are working students and the other 100 non-working students.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2020-05-25","type":"book"},{"authors":["Parrab-Blesa, Alfonso; Sanchez-Alberca, Alfredo; Garcia-Medina, Jose Javier"],"categories":[],"content":"","date":1577836800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"c28983392176b78c2c324f2c97d13ef2","permalink":"/en/publication/clinical-2020/","publishdate":"2020-09-16T21:26:02.134179Z","relpermalink":"/en/publication/clinical-2020/","section":"publication","summary":"Primary open-angle glaucoma (POAG) is considered one of the main causes of blindness. Detection of POAG at early stages and classification into evolutionary stages is crucial to blindness prevention. Methods: 1001 patients were enrolled, of whom 766 were healthy subjects and 235 were ocular hypertensive or glaucomatous patients in different stages of the disease. 
Spectral domain optical coherence tomography (SD-OCT) was used to determine Bruch’s membrane opening-minimum rim width (BMO-MRW) and the thicknesses of peripapillary retinal nerve fibre layer (RNFL) rings with diameters of 3.0, 4.1 and 4.7 mm centred on the optic nerve. The BMO-MRW rim and RNFL rings were divided into seven sectors (G-T-TS-TI-N-NS-NI). The k-means algorithm and linear discriminant analysis were used to classify patients into disease stages. Results: We defined four glaucoma stages and provided a new model for classifying eyes into these stages, with an overall accuracy greater than 92% (88% when including healthy eyes). An online application was also implemented to predict the probability of glaucoma stage for any given eye. Conclusions: We propose a new objective algorithm for classifying POAG into clinical-evolutionary stages using SD-OCT.","tags":[],"title":"Clinical-Evolutionary Staging System of Primary Open-Angle Glaucoma Using Optical Coherence Tomography","type":"publication"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Dec 16, 2019\nQuestion 1 A lagoon contaminated with nitrates contains 1000 tons of nitrates dissolved in 6 million cubic meters of water. To decontaminate the lagoon, we start to introduce pure water into the lagoon at a rate of 100000 cubic meters per day, and we take out the same amount of contaminated water. Assuming that the concentration of nitrates remains uniform in the lagoon, what amount of nitrates will be in the lagoon after two weeks? If the maximum concentration of nitrates to consider a water not contaminated is $0.1$ kg/m$^3$, when will the lagoon be decontaminated?\nSolution Let $n(t)$ be the amount of nitrates in the lagoon at time $t$.\nDifferential equation: $n\u0026rsquo;=-n/60$.\nSolution: $n(t)=10^6 e^{-t/60}$.\n$n(14)=791889.6$ kg.\nThe lagoon will be decontaminated after $30.6495$ days. 
Question 2 The temperature $T$ of a chemical reaction depends on the concentrations of two substances $x$ and $y$ according to the function $T(x,y)=-x^3+4x^2y-3y^2$.\nIf the concentrations of $x$ and $y$ are 2 gr/dl and 1 gr/dl respectively, how must the two concentrations be changed to increase the temperature as fast as possible? What is the rate of change of the temperature if we change the two concentrations in that direction?\nHow must the two concentrations be changed to increase the temperature at a rate of 10 ºC (gr/dl)$^{-1}$?\nSolution $x$ and $y$ must be changed along the direction of the gradient $\\nabla T(2,1) = (4, 10)$. Along this direction the rate of change of the temperature is $|\\nabla T(2,1)|=10.77$ ºC (gr/dl)$^{-1}$. $x$ and $y$ must be changed along the direction of the unit vector $(0, 1)$, that is $x$ must be kept constant. Question 3 It is known that the concentration in blood of the active ingredient of a drug $t$ hours after applying the drug is given by the function $c(t) = t^2e^{-t/2}$ mg/ml.\nCompute the maximum value for the concentration of the active ingredient and give the time when the maximum is reached. Study the concavity and compute the inflection points of the concentration. Solution The maximum is reached at $t=4$ hours and $c(4)=16e^{-2}$ mg/ml. There are two inflection points at $t=1.1716$ and $t=6.8284$.\nThe function is concave up in $(-\\infty, 1.1716) \\cup (6.8284, \\infty)$ and concave down in $(1.1716, 6.8284)$. ","date":1576454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6d5b5e97873b2f8778658cb293c56c3c","permalink":"/en/teaching/calculus/exams/pharmacy-2019-12-16/","publishdate":"2019-12-16T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2019-12-16/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Dec 16, 2019\nQuestion 1 A lagoon contaminated with nitrates contains 1000 tons of nitrates dissolved in 6 million cubic meters of water. 
To decontaminate the lagoon, we start to introduce pure water into the lagoon at a rate of 100000 cubic meters per day, and we take out the same amount of contaminated water.","tags":["Exam"],"title":"Pharmacy exam 2019-12-16","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: December 16, 2019\nQuestion 1 The table below summarizes the time (in minutes) required to remove anesthesia after a surgery in a sample of 50 patients.\n$$ \\begin{array}{cr} \\mbox{Time} \u0026amp; \\mbox{Patients} \\newline \\hline 10-30 \u0026amp; 2 \\newline 30-45 \u0026amp; 11 \\newline 45-60 \u0026amp; 18 \\newline 60-90 \u0026amp; 9 \\newline 90-120 \u0026amp; 8 \\newline 120-180 \u0026amp; 2 \\newline \\hline \\end{array} $$\nAre there some outliers in the sample?\nCompute the mean. Is it representative?\nIf according to a postoperative protocol the 15% of patients that require more time to remove the anesthesia must be monitored, above what time should a patient be monitored?\nIf we apply a drug that is an anesthesia antagonist, it is known that the time required to remove the anesthesia decreases by 25%. How will the time decrease affect the representativeness of the mean?\nIf it is known that another type of anesthesia $B$ has mean 50 minutes and standard deviation 15 minutes, what time is relatively greater, 70 minutes with this type of anesthesia or 60 minutes with the type $B$?\nUse the following sums for the computations:\n$\\sum x_in_i=3212.5$ min, $\\sum x_i^2n_i=249706.25$ min$^2$,\n$\\sum (x_i-\\bar x)^3n_i=1400531.25$ min$^3$ and\n$\\sum (x_i-\\bar x)^4n_i=143958437.7$ min$^4$.\nSolution $Q_1=44.3182$, $Q_3=81.6667$, $IQR=37.3485$, $f_1=-11.7045$ and $f_2=137.6894$. Since the last class contains values above the upper fence, there could be outliers. $\\bar x=64.25$ min, $s^2=866.0625$ min$^2$, $s=29.4289$ min and $cv=0.458$. Thus the representativity of the mean is moderate. 
$P_{85}=99.375$ min. Applying the linear transformation $y=0.75x$, $\\bar y=48.1875$ min, $s_y=22.0717$ min and $cv=0.458$. Thus the representativity of the mean is the same. Standard score in first anesthesia: $z(70)=0.1954$. Standard score in anesthesia $B$: $z(60)=0.6667$. Thus, 60 min with anesthesia $B$ is relatively greater. Question 2 The table below summarizes the scores of a group of 10 students in three practical exams of Maths.\n$$ \\begin{array}{rrr} \\mbox{Exam 1} (X) \u0026amp; \\mbox{Exam 2} (Y) \u0026amp; \\mbox{Exam 3} (Z) \\newline \\hline 5.5 \u0026amp; 3.2 \u0026amp; 5.0 \\newline 7.5 \u0026amp; 6.5 \u0026amp; 2.0 \\newline 2.5 \u0026amp; 4.0 \u0026amp; 1.0 \\newline 6.0 \u0026amp; 4.0 \u0026amp; 6.0 \\newline 8.0 \u0026amp; 7.5 \u0026amp; 6.0 \\newline 4.0 \u0026amp; 3.5 \u0026amp; 1.0 \\newline 7.0 \u0026amp; 5.5 \u0026amp; 4.0 \\newline 9.5 \u0026amp; 10.0 \u0026amp; 9.0 \\newline 10.0 \u0026amp; 9.5 \u0026amp; 8.0 \\newline 1.0 \u0026amp; 3.0 \u0026amp; 0.5 \\newline \\hline \\end{array} $$\nWhich two scores are more linearly correlated?\nUsing linear models, what are the expected scores of the second and third exams for a student with a score $6.5$ in the first exam?\nUse the following sums for the computations:\n$\\sum x_i=61$, $\\sum y_i=56.7$, $\\sum z_i=42.5$,\n$\\sum x_i^2=449$, $\\sum y_i^2=382.49$, $\\sum z_i^2=264.25$,\n$\\sum x_iy_j=405.85$, $\\sum x_iz_j=327$, $\\sum y_jz_j=295$.\nSolution $\\bar x=6.1$, $s_x^2=7.69$, $\\bar y=5.67$, $s_y^2=6.1001$, $\\bar z=4.25$, $s_z^2=8.3625$, $s_{xy}=5.998$, $s_{xz}=6.775$, $s_{yz}=5.4025$, $r^2_{xy}=0.7669$, $r^2_{xz}=0.7138$ and $r^2_{yz}=0.5722$. Thus, the two variables more linearly related are $X$ and $Y$, since their coefficient of determination is greater. Regression line of $Y$ on $X$: $y=0.9122 + 0.78x$ and $y(6.5)=5.982$. Regression line of $Z$ on $X$: $z=-1.1242 + 0.881x$ and $z(6.5)=4.6024$. 
Question 3 To study the association between the osteoporosis and the gender a random sample of people between 65 and 70 years old was taken. The following table summarizes the results\n$$ \\begin{array}{lcc} \\hline \u0026amp; \\mbox{Osteoporosis} \u0026amp; \\mbox{Not osteoporosis}\\newline \\mbox{Women} \u0026amp; 480 \u0026amp; 2320\\newline \\mbox{Men} \u0026amp; 255 \u0026amp; 1505\\newline \\hline \\end{array} $$\nCompute the prevalence of the osteoporosis in the population.\nCompute the relative risk of osteoporosis in females with respect to males and interpret it.\nCompute the odds ratio of osteoporosis in females with respect to males and interpret it.\nWhich of the two measures is most suitable to study the association between the osteoporosis and the gender?\nSolution Let $D$ be the event of suffering osteoporosis.\nPrevalence: $P(D)=0.1612$. $RR(D)=1.1832$. Thus, the risk of suffering osteoporosis in women is higher than in men but not too much. There is no strong association between the osteoporosis and the gender. $OR(D)=1.2211$. Thus, the odds of suffering osteoporosis in women are higher than in men but not too much. Since we can compute the prevalence of $D$, both statistics are suitable, but relative risk is easier to interpret. 
Question 4 The risks of getting the flu in two cities $A$ and $B$ with the same population size are 14% and 8% respectively.\nCompute the probability of having more than 2 persons getting the flu in a random sample of 10 persons of the city $A$.\nCompute the probability of having more than 2 and less than 5 persons getting the flu in a random sample of 50 persons of the city $B$.\nCompute the probability of having 2 persons getting the flu in a random sample of 8 persons of the two cities.\nCompute the probability of having some person getting the flu in a random sample of 5 persons that have been living in both cities.\nSolution Let $X$ be the number of persons with flu in a sample of 10 persons from $A$, then $X\\sim B(10, 0.14)$ and $P(X\u0026gt;2)=0.1545$. Let $Y$ be the number of persons with flu in a sample of 50 persons from $B$, then $Y\\sim B(50, 0.08)\\approx P(4)$ and $P(2 \u0026lt; Y \u0026lt; 5) = 0.3907$. Let $Z$ be the number of persons with flu in a sample of 8 persons from $A$ and $B$, then $Z\\sim B(8, 0.11)$ and $P(Z = 2) = 0.1684$. Let $U$ be the number of persons with flu in a sample of 5 persons living in both cities, then $U\\sim B(5, 0.2088)$ and $P(U\u0026gt;0)=0.69$. Question 5 In a study about the cholesterol two samples of 10000 males and 10000 females were taken. It was observed that 3420 males and 1234 females had a cholesterol level above 230 mg/dl, and that 4936 males had a cholesterol level between 210 and 230 mg/dl. 
Assuming that the cholesterol level in males and females follows a normal distribution with the same standard deviation, compute:\nThe means and the standard deviation of the distributions of cholesterol level in males and females.\nThe percentage of males with cholesterol level between 200 and 240 mg/dl.\nThe interquartile range of the cholesterol level of females.\nSolution Let $X$ be cholesterol level in males and $Y$ the cholesterol level in females, then $X\\sim N(224.1164, 14.4556)$ and $Y\\sim N(213.2581, 14.4556)$. $P(200\\leq X \\leq 240) = 0.8164$. $IQR = 19.5003$ mg/dl. ","date":1576454400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6150ba40285e943123193991f81d26bf","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-12-16/","publishdate":"2019-12-16T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-12-16/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: December 16, 2019\nQuestion 1 The table below summarizes the time (in minutes) required to remove anesthesia after a surgery in a sample of 50 patients.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-12-16","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 18, 2019\nQuestion 1 In a population where the prevalence of a disease is 10% we apply a diagnostic test with a sensitivity 85%. What must be the minimum specificity of the test to diagnose the disease when the outcome of the test is positive?\nSolution The specificity must be at least $0.9056$. 
Question 2 In a stretch of a road there is an average of 2 accidents per day.\nCompute the probability of having more than 2 accidents on a random day.\nCompute the probability of having more than 2 accidents on a random day, knowing that there is at least one accident that day.\nCompute the probability of having 14 accidents in a random week.\nSolution Let $X$ be the number of accidents in a day. $X\\sim P(2)$ and $P(X\u0026gt;2)=0.3233$. $P(X\u0026gt;2|X\\geq 1)=0.3739$. Let $Y$ be the number of accidents in a week. $Y\\sim P(14)$ and $P(Y=14)=0.106$. Question 3 In a study about the effectiveness of two flu drugs $A$ and $B$ it has been observed in a clinical trial that in 12% of cases only drug $A$ is effective, in 24% of cases only drug $B$ is effective and in 80% of cases where drug $A$ was effective, drug $B$ was also effective.\nWhat is the probability that both drugs are effective at the same time?\nWhat is the probability that only one of the drugs is effective?\nWhat is the probability that none of the drugs are effective?\nIs the effectiveness of the two drugs independent?\nSolution According to the problem statement, $P(A\\cap \\overline B) = 0.12$, $P(\\overline A\\cap B)=0.24$ and $P(B|A)=0.8$.\n$P(A\\cap B)=0.48$. $P(A\\cap \\overline B) + P(\\overline A\\cap B) =0.36$. $P(\\overline A \\cap \\overline B) = 0.16$. The events are dependent because $P(B)=0.72 \\neq P(B|A)=0.8$. Question 4 It is known that the annual rainfall in a region follows a normal probability distribution. If the statistics show that 15% of the years the annual rainfall has been greater than 45 cm and 3% of the years less than 30 cm,\nCompute the mean and the standard deviation of the annual rainfall.\nWhat is the probability that in at least one of the next 5 years the annual rainfall will be above 50 cm?\nSolution Let $X$ be the annual rainfall. $X\\sim N(\\mu, \\sigma)$, and according to the statement $P(X\u0026gt;45)=0.15$ and $P(X\u0026lt;30)=0.03$. 
$\\mu=39.6708$ cm and $\\sigma=5.1419$ cm. Let $Y$ be the number of years in the next 5 years with annual rainfall above 50 cm. Then $Y\\sim B(5, 0.0223)$, and $P(Y\\geq 1)=0.1065$. ","date":1574035200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"415bf266af33209796f13d8e0d1df047","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-11-18/","publishdate":"2019-11-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-11-18/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 18, 2019\nQuestion 1 In a population where the prevalence of a disease is 10% we apply a diagnostic test with a sensitivity 85%. What must be the minimum specificity of the test to diagnose the disease when the outcome of the test is positive?","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-11-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 14, 2019\nQuestion 1 The systolic blood pressure (in mmHg) has been measured in two groups of 100 persons of two populations $A$ and $B$. The table below summarizes the results.\n$$ \\begin{array}{lrr} \\mbox{Systolic blood pressure} \u0026amp; \\mbox{Num persons $A$} \u0026amp; \\mbox{Num persons $B$} \\newline \\hline (80, 90] \u0026amp; 4 \u0026amp; 6 \\newline (90, 100] \u0026amp; 10 \u0026amp; 18 \\newline (100, 110] \u0026amp; 28 \u0026amp; 30 \\newline (110, 120] \u0026amp; 24 \u0026amp; 26 \\newline (120, 130] \u0026amp; 16 \u0026amp; 10 \\newline (130, 140] \u0026amp; 10 \u0026amp; 7 \\newline (140, 150] \u0026amp; 6 \u0026amp; 2 \\newline (150, 160] \u0026amp; 2 \u0026amp; 1 \\newline \\hline \\end{array} $$\nWhich of the two systolic blood pressure distributions is less asymmetric? Which one has a higher kurtosis? 
According to skewness and kurtosis can we assume that populations $A$ and $B$ are normal?\nIn which group is the mean of the systolic blood pressure more representative?\nCompute the value of the systolic blood pressure such that 30% of persons of the group of population $A$ are above it.\nWhich systolic blood pressure is relatively higher, 132 mmHg in the group of population $A$, or 130 mmHg in the group of population $B$?\nIf we measure the systolic blood pressure of the group of population $A$ with another tensiometer, and the new pressure obtained ($Y$) is related with the first one ($X$) according to the equation $y=0.98x-1.4$, in which distribution, $X$ or $Y$, is the mean more representative?\nUse the following sums for the computations:\nGroup $A$: $\\sum x_in_i=11520$ mmHg, $\\sum x_i^2n_i=1351700$ mmHg$^2$, $\\sum (x_i-\\bar x)^3n_i=155241.6$ mmHg$^3$ and $\\sum (x_i-\\bar x)^4n_i=16729903.52$ mmHg$^4$.\nGroup $B$: $\\sum x_in_i=11000$ mmHg, $\\sum x_i^2n_i=1230300$ mmHg$^2$, $\\sum (x_i-\\bar x)^3n_i=165000$ mmHg$^3$ and $\\sum (x_i-\\bar x)^4n_i=13632500$ mmHg$^4$.\nSolution Group $A$: $\\bar x=115.2$ mmHg, $s^2=245.96$ mmHg$^2$, $s=15.6831$ mmHg, $g_{1A}=0.4024$ and $g_{2A}=-0.2346$. Group $B$: $\\bar x=110$ mmHg, $s^2=203$ mmHg$^2$, $s=14.2478$ mmHg, $g_{1B}=0.5705$ and $g_{2B}=0.3081$. Thus the distribution of the population $A$ group is less asymmetric since $g_{1A}$ is closer to 0 than $g_{1B}$ and the population $B$ group has a higher kurtosis since $g_{2B}\u0026gt;g_{2A}$. Both populations can be considered normal since $g_1$ and $g_2$ are between -2 and 2. $cv_A=0.1361$ and $cv_B=0.1295$, thus, the mean of group $B$ is a little bit more representative since its coef. of variation is smaller than the one of group $A$. $P_{70}\\approx 125$ mmHg. The standard scores are $z_A(132)=1.0712$ and $z_B(130)=1.4037$. Thus, 130 mmHg in group $B$ is relatively higher than 132 mmHg in group $A$. $\\bar y=111.496$, $s_y=15.3694$ and $cv_y=0.1378$. 
Thus the mean of $X$ is more representative than the mean of $Y$ since $cv_x\u0026lt;cv_y$. Question 2 In a symmetric distribution the mean is 15, the first quartile 12 and the maximum value is 25.\nDraw the box and whiskers plot. Could a hypothetical value of 2 be considered an outlier in this distribution? Solution $Q_1=12$, $Q_2=15$, $Q_3=18$, $IQR=6$, $f_1=3$, $f_2=27$, $Min=5$ and $Max=25$. Yes, because $2\u0026lt;f_1$. Question 3 A pharmaceutical company is trying three different analgesics to determine if there is a relation among the time required for them to take effect. The three analgesics were administered to a sample of 20 patients and the time it took for them to take effect was recorded. The following sums summarize the results, where $X$, $Y$ and $Z$ are the times for the three analgesics.\n$\\sum x_i=668$ min, $\\sum y_i=855$ min, $\\sum z_i=1466$ min,\n$\\sum x_i^2=25056$ min$^2$, $\\sum y_i^2=42161$ min$^2$, $\\sum z_i^2=123904$ min$^2$,\n$\\sum x_iy_j=31522$ min$^2$, $\\sum y_jz_j=54895$ min$^2$.\nIs there a linear relation between the times $X$ and $Y$? And between $Y$ and $Z$? How are these linear relationships?\nAccording to the regression line, how much will the time $X$ increase for every minute that time $Y$ increases?\nIf we want to predict the time $Y$ using a linear regression model, which of the two times $X$ or $Z$ is the most suitable? Why?\nUsing the chosen linear regression model in the previous question, predict the value of $Y$ for a value of $X$ or $Z$ of 40 minutes.\nIf the correlation coefficient between the times $X$ and $Z$ is $r=-0.69$, compute the regression line of $X$ on $Z$.\nSolution $\\bar x=33.4$ min, $s_x^2=137.24$ min$^2$, $\\bar y=42.75$ min, $s_y^2=280.4875$ min$^2$, $\\bar z=73.3$ min, $s_z^2=822.31$ min$^2$, $s_{xy}=148.25$ min$^2$ and $s_{yz}=-388.825$ min$^2$. Thus, there is a direct linear relation between $X$ and $Y$ and an inverse linear relation between $Y$ and $Z$. $b_{xy}=0.5285$ min. 
$r^2_{xy}=0.5709$ and $r^2_{yz}=0.6555$, thus the regression line of $Y$ on $Z$ explains better $Y$ than the regression line of $Y$ on $X$ since $r^2_{yz}\u0026gt;r^2_{xy}$. Regression line of $Y$ on $Z$: $y=77.4095 - 0.4728z$ and $y(40)=58.4957$. $s_{xz}=-231.7967$ and the regression line of $X$ on $Z$ is $x=54.0622 - 0.2819z$. ","date":1571011200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"84914b3cccbde96cecb28bd06c7b2549","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-10-14/","publishdate":"2019-10-14T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2019-10-14/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 14, 2019\nQuestion 1 The systolic blood pressure (in mmHg) has been measured in two groups of 100 persons of two populations $A$ and $B$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2019-10-14","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: June 18, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require sitting for many hours. A sample of individuals between 40 and 50 years that spent more than 5 hours sitting was drawn. It was recorded whether or not they followed the occupational risk prevention program and the number of spinal injuries after 10 years. 
The results are shown in the table below.\n$$ \\begin{array}{lrrrrrrrrrrrrrrr} \\hline \\mbox{With prevention program} \u0026amp; 1 \u0026amp; 3 \u0026amp; 2 \u0026amp; 4 \u0026amp; 4 \u0026amp; 0 \u0026amp; 2 \u0026amp; 4 \u0026amp; 2 \u0026amp; 2 \u0026amp; 5 \u0026amp; 2 \u0026amp; 3 \u0026amp; 2 \u0026amp; 0 \\newline \\mbox{Without prevention program} \u0026amp; 6 \u0026amp; 3 \u0026amp; 1 \u0026amp; 3 \u0026amp; 7 \u0026amp; 6 \u0026amp; 5 \u0026amp; 5 \u0026amp; 9 \u0026amp; 5 \u0026amp; 5 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; \\newline \\hline \\end{array}$$\nPlot the polygon of cumulative relative frequencies of the total sample.\nAccording to the interquartile range, which sample has more central spread of the spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich sample has a greater relative spread with respect to the mean of the spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich sample has a more normal kurtosis of the number of spinal injuries, the sample of people following the prevention program or the sample of people not following the prevention program?\nWhich number of spinal injuries is relatively greater, 2 injuries of a person following the prevention program or 4 injuries of a person not following the prevention program?\nUse the following sums for the computations:\nWith prevention program: $\\sum x_i=36$ injuries, $\\sum x_i^2=116$ injuries$^2$, $\\sum (x_i-\\bar x)^3=-0.48$ injuries$^3$ and $\\sum (x_i-\\bar x)^4=135.97$ injuries$^4$.\nWithout prevention program: $\\sum y_i=66$ injuries, $\\sum y_i^2=362$ injuries$^2$, $\\sum (y_i-\\bar y)^3=27.92$ injuries$^3$ and $\\sum (y_i-\\bar y)^4=586.9$ injuries$^4$.\nSolution With prevention program: $Q_1=2$ injuries, $Q_3=4$ injuries, $IQR=2$ injuries.\nWithout prevention program: $Q_1=3$ injuries, $Q_3=6$ 
injuries, $IQR=3$ injuries.\nThe sample not following the prevention program has more central spread since the interquartile range is greater.\nWith prevention program: $\\bar x=2.4$ injuries, $s^2=1.9733$ injuries$^2$, $s=1.4048$ injuries and $cv=0.5853$.\nWithout prevention program: $\\bar y=4.7143$ injuries, $s^2=3.6327$ injuries$^2$, $s=1.906$ injuries and $cv=0.4043$.\nThe sample following the prevention program has a greater relative spread with respect to the mean since the coef. of variation is greater.\nWith prevention program: $g_2=-0.6722$.\nWithout prevention program: $g_2=0.1768$.\nThus the sample not following the prevention program has a more normal kurtosis, since the coeff. of kurtosis is closer to 0.\nWith prevention program: $z(2)=-0.2847$.\nWithout prevention program: $z(4)=-0.3748$.\nThus 4 injuries in the sample not following the prevention program is relatively smaller, since its standard score is smaller.\nQuestion 2 The evolution of the price of a muscle relaxant between 2015 and 2019 is shown in the table below.\n$$ \\begin{array}{lrrrrr} \\hline \\mbox{Year} \u0026amp; 2015 \u0026amp; 2016 \u0026amp; 2017 \u0026amp; 2018 \u0026amp; 2019 \\newline \\mbox{Price (€)} \u0026amp; 1.40 \u0026amp; 1.60 \u0026amp; 1.92 \u0026amp; 2.30 \u0026amp; 2.91 \\newline \\hline \\end{array}$$\nWhich regression model is better to predict the price, the linear or the exponential?\nUse the best of the two previous models to predict the price in 2020.\nSolution $\\bar x=2017$ years, $s_x^2=2$ years$^2$.\n$\\bar y=2.026$ €, $s_y^2=0.2882$ €$^2$.\n$\\overline{\\log(y)}=0.672$ log(€), $s_{\\log(y)}^2=0.0673$ log(€)$^2$.\n$s_{xy}=0.744$ years$\\cdot$€, $s_{x\\log(y)}=0.3653$ years$\\cdot\\log(€)$\nLinear coef. determination: $r^2=0.9603$ Exponential coef. determination: $r^2=0.9909$\nThus the exponential regression model is better to predict the price since the coef. of determination is greater. 
Exponential regression model: $y=e^{-367.6861+0.1826x}$.\nPrediction: $y(2020)=3.3867$ €. Question 3 In a linear regression study between two variables $X$ and $Y$ we know $\\bar x = 3$, $s_x^2=2$, $s_y^2=10.8$ and the regression line of $Y$ on $X$ is $y=90.9-2.3x$.\nCompute the mean of $Y$.\nCompute and interpret the linear correlation coefficient.\nSolution $\\bar y = 84$. $r=-0.9898$. Probability and Random Variables Question 4 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require sitting for many hours. A sample of 500 individuals between 40 and 50 years that spent more than 5 hours sitting was drawn. Half of the individuals followed the prevention program (treatment group) and the other half not (control group). After 5 years it was observed that 12 individuals suffered spinal injuries in the group following the prevention program while 32 individuals suffered spinal injuries in the other group. In the following 5 years it was observed that 21 individuals suffered spinal injuries in the group following the prevention program while 48 individuals suffered spinal injuries in the other group.\nCompute the cumulative incidence of spinal injuries in the total sample after 5 years and after 10 years.\nCompute the absolute risk of suffering spinal injuries in 10 years in the treatment and control groups.\nCompute the relative risk of suffering spinal injuries in 10 years in the treatment group compared to the control group. Interpret it.\nCompute the odds ratio of suffering spinal injuries in 10 years in the treatment group compared to the control group. Interpret it.\nWhich statistic, the relative risk or the odds ratio, is more suitable in this study? Justify the answer.\nSolution Let $D$ be the event of suffering spinal injuries.\nCumulative incidence after 5 years: $R(D)=0.088$. Cumulative incidence after 10 years: $R(D)=0.226$.\nRisk in the treatment group: $R_T(D)=0.132$. 
Risk in the control group: $R_C(D)=0.32$.\n$RR(D)=0.4125$. Thus, the risk of suffering spinal injuries is less than half when following the prevention program.\n$OR(D)=0.3232$. Thus, the odds of suffering spinal injuries are less than one third when following the prevention program.\nSince the study is prospective and we can estimate the prevalence of $D$, both statistics are suitable, but relative risk is easier to interpret.\nQuestion 5 The table below shows the results of a study to evaluate the usefulness of a reactive strip to diagnose a urinary infection.\n$$ \\begin{array}{ccc} \\hline \\mbox{Outcome} \u0026amp; \\mbox{Infection} \u0026amp; \\mbox{No infection}\\newline \\mbox{Positive} \u0026amp; 60 \u0026amp; 80\\newline \\mbox{Negative} \u0026amp; 10 \u0026amp; 200\\newline \\hline \\end{array} $$\nCompute the sensitivity and the specificity of the test.\nCompute the positive and the negative predictive values.\nIs this test better to confirm or to rule out the infection?\nIf another study has determined that the true prevalence of the infection is 2%, how does this affect the predictive values?\nSolution Let $D$ be the event corresponding to suffering the urinary infection and $+$ and $-$ the events corresponding to getting a positive and a negative outcome in the test respectively.\nSensitivity = $0.8571$ and Specificity = $0.7143$.\n$PPV=0.4286$ and $NPV=0.9524$. Since $PPV\u0026lt;NPV$, the test is better to rule out the infection.\n$PPV=0.0577$ and $NPV=0.9959$. 
The positive predictive value decreases a lot while the negative predictive value increases a little bit.\nQuestion 6 The time required to recover from an injury follows a normal distribution with variance 64 days$^2$.\nIt is also known that 10% of people with this injury require more than 80 days to recover.\nWhat is the expected time required to recover from the injury?\nWhat percentage of individuals will require between 60 and 75 days to recover?\nIf we draw a random sample of 12 individuals with this injury, what is the probability of having between 9 and 11 individuals, both included, requiring less than 80 days to recover?\nIf we draw a random sample of 500 individuals with this injury, what is the probability of having less than 4 requiring a time above the 99th percentile to recover?\nSolution Let $X$ be the time required to recover from the injury. Then $X\\sim N(\\mu, 8)$.\n$\\mu=69.7476$ days.\n$P(60\u0026lt;X\u0026lt;75) = 0.6327$.\nLet $Y$ be the number of individuals with the injury requiring less than 80 days to recover in a sample of 12. Then $Y\\sim B(12, 0.9)$ and $P(9\\leq Y\\leq 11)=0.6919$.\nLet $Z$ be the number of individuals with the injury requiring a time above the 99th percentile to recover in a sample of 500. 
Then $Z\\sim B(500, 0.01)\\approx P(5)$ and $P(Z\u0026lt;4)=P(Z\\leq 3)=0.265$.\n","date":1560816000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1647070234,"objectID":"c221fea84ca6e9626829bc11271943dc","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-06-18/","publishdate":"2019-06-18T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-06-18/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 18, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effectiveness of an occupational risk prevention program in jobs that require to be sit a lot of hours.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-06-18","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 27, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effect of smoking during the pregnancy in the weight of newborns. 
The table below shows the daily number of cigarettes smoked by mothers ($X$) and the weight of the newborn (all of them are males) ($Y$).\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Daily num cigarettes} \u0026amp; 10.00 \u0026amp; 14.00 \u0026amp; 8.00 \u0026amp; 11.00 \u0026amp; 7.00 \u0026amp; 6.00 \\newline \\mbox{Weight (kg)} \u0026amp; 2.55 \u0026amp; 2.44 \u0026amp; 2.68 \u0026amp; 2.65 \u0026amp; 2.71 \u0026amp; 2.85 \\newline \\hline \\end{array} $$\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Daily num cigarettes} \u0026amp; 2.00 \u0026amp; 5.00 \u0026amp; 9.00 \u0026amp; 9.00 \u0026amp; 4.00 \u0026amp; 6.00 \\newline \\mbox{Weight (kg)} \u0026amp; 3.45 \u0026amp; 2.93 \u0026amp; 2.67 \u0026amp; 2.59 \u0026amp; 3.02 \u0026amp; 2.72 \\newline \\hline \\end{array} $$\nGive the equation of the regression line of the weight of newborns on the daily number of cigarettes and interpret the slope.\nWhich regression model is better to predict the weight of newborns, the logarithmic or the exponential?\nUse the best of the two previous regression models to predict the weight of a newborn whose mother smokes 12 cigarettes a day. Is this prediction reliable?\nUse the following sums for the computations:\n$\\sum x_i=91$ cigarettes, $\\sum \\log(x_i)=23.0317$ $\\log(\\mbox{cigarettes})$, $\\sum y_j=33.26$ kg, $\\sum \\log(y_j)=12.1857$ $\\log(\\mbox{kg})$,\n$\\sum x_i^2=809$ cigarettes$^2$, $\\sum \\log(x_i)^2=47.196$ $\\log(\\mbox{cigarettes})^2$, $\\sum y_j^2=92.9708$ kg$^2$, $\\sum \\log(y_j)^2=12.4665$ $\\log(\\mbox{kg})^2$,\n$\\sum x_iy_j=243.61$ cigarettes$\\cdot$kg, $\\sum x_i\\log(y_j)=89.3984$ cigarettes$\\cdot\\log(\\mbox{kg})$, $\\sum \\log(x_i)y_j=62.3428$ $\\log(\\mbox{cigarettes})$kg, $\\sum \\log(x_i)\\log(y_j)=22.8753$ $\\log(\\mbox{cigarettes})\\log(\\mbox{kg})$.\nSolution $\\bar x=7.5833$ cigarettes, $s_x^2=9.9097$ cigarettes$^2$. $\\bar y=2.7717$ kg, $s_y^2=0.0654$ kg$^2$. 
$s_{xy}=-0.7176$ cigarettes$\\cdot$kg. Regression line: $y=-0.0724x + 3.3208$. The slope of the regression line is $b_{yx}=-0.0724$. That means that the weight of the newborn decreases by 0.0724 kg for each additional daily cigarette smoked by the mother.\n$\\overline{\\log(x)}=1.9193$ log(cigarettes), $s_{\\log(x)}^2=0.2492$ log(cigarettes)$^2$. $\\overline{\\log(y)}=1.0155$ log(kg), $s_{\\log(y)}^2=0.0077$ log(kg)$^2$. $s_{x\\log(y)}=-0.2508$ cigarettes$\\cdot$log(kg), $s_{\\log(x)y}=-0.1245$ log(cigarettes)$\\cdot$kg. Logarithmic coef. determination: $r^2=0.9499$ Exponential coef. determination: $r^2=0.8268$ Therefore, the logarithmic model fits the data better and is better to predict the weight.\nLogarithmic regression model: $y=3.7301-0.4994\\log(x)$. Prediction: $y(12)=2.4892$ kg. The coefficient of determination is high but the sample size is small, so the prediction is not entirely reliable.\nQuestion 2 The table below summarizes the time that it took the runners to reach the finish in a long-distance race in Madrid:\n$$ \\begin{array}{lr} \\mbox{Time (min)} \u0026amp; \\mbox{Num runners}\\newline \\hline (30,35] \u0026amp; 15\\newline (35,40] \u0026amp; 35\\newline (40,45] \u0026amp; 40\\newline (45,50] \u0026amp; 10\\newline \\hline \\end{array}$$\nIn another race in Paris, the mean of time was 40 minutes, the standard deviation 5 minutes and the coefficient of skewness $0.75$.\nWhat percentage of runners took less than 42 minutes to reach the finish in Madrid?\nCompute and interpret the interquartile range of the time for Madrid race.\nIn which race is the mean of the time more representative?\nIn which race does the time have a more symmetric distribution?\nIn which race is a time of 39 minutes to reach the finish relatively smaller?\nUse the following sums for the computations: $\\sum x_i=3975$ min, $\\sum x_i^2=159875$ min$^2$, $\\sum (x_i-\\bar x)^3=-628.12$ min$^3$ and $\\sum (x_i-\\bar x)^4=80701.95$ min$^4$.\nSolution $F(42)=0.66$, thus approximately $66\\%$ of runners 
finished before 42 minutes.\n$Q_1=36.4286$ min, $Q_3=43.125$ min and $IQR=6.6964$ min. The central 50% of times fall in a range of $6.6964$ minutes.\nMadrid statistics: $\\bar x=39.75$ min, $s^2=18.6875$ min$^2$, $s=4.3229$ min and $cv=0.1088$. Paris statistics: $cv=0.125$. Thus, the mean of time in Madrid is a little bit more representative since the coef. of variation is smaller.\n$g_1=-0.0778$, that is closer to 0 than the distribution of times in Paris, thus the distribution of times in Madrid is more symmetric.\nThe standard score of the Madrid sample is $z(39)=-0.1735$ and the standard score of the Paris one $z(39)=-0.2$, thus a time of 39 min is relatively smaller in the sample of Paris.\nProbability and Random Variables Question 1 It has been observed that the concentration of a metabolite in urine can be used as a diagnostic test for a disease. The concentration (in mg/dl) in healthy individuals follows a normal distribution with mean 90 and standard deviation 8, while in sick individuals follows a normal distribution with mean 120 and standard deviation 10.\nIf the cut-off point is set at 105 mg/dl (positive above and negative below), what is the sensitivity and the specificity of the test?\nIf the cut-off point is set at 105 mg/dl and we assume a prevalence of 10%, what is the probability of a correct diagnostic?\nIf we want a sensitivity of 95%, where must we set the cut-off point? What would the specificity of the test be?\nSolution Let $X$ and $Y$ be the distributions of the concentration of metabolite in healthy and sick individuals respectively.\nSensitivity: $P(+|D) = P(Y\u0026gt;105) = 0.9332$. Specificity: $P(-|\\overline D) = P(X\u0026lt;105) = 0.9696$.\n$P(\\mbox{correct diagnostic}) = P(D\\cap +) + P(\\overline D \\cap -) = 0.966$.\nCut-off point $103.5515$ mg/dl. 
Specificity: $P(-|\\overline D) = P(X\u0026lt;103.5515) = 0.9549$.\nQuestion 2 Let $A$ and $B$ be two events of a random experiment, such that $A$ is three times as likely as $B$, $P(A\\cup B)=0.8$ and $P(A\\cap B)=0.2$.\nCompute $P(A)$ and $P(B)$.\nCompute $P(A-B)$ and $P(B-A)$.\nCompute $P(\\bar A \\cup \\bar B)$ and $P(\\bar A \\cap \\bar B)$.\nCompute $P(A|B)$ and $P(B|A)$.\nAre $A$ and $B$ independent?\nSolution $P(A) = 0.75$ and $P(B) = 0.25$.\n$P(A-B) = 0.55$ and $P(B-A) = 0.05$.\n$P(\\bar A \\cup \\bar B) = 0.8$ and $P(\\bar A \\cap \\bar B) = 0.2$.\n$P(A|B) = 0.8$ and $P(B|A) = 0.2667$.\nNo, they are dependent since $P(A|B)\\neq P(A)$.\nQuestion 3 The employees of a courier company send an average of $246.2$ messages in a period of 12 hours. It is also known that the mean of messages sent by males is $256.2$ and by females is $237.4$ in the same period.\nCompute the probability that a random person of the company sends 5 messages in a period of half an hour.\nIf we draw randomly 10 women of this company, what is the probability that at least 3 of them send more than one message in a period of one hour?\nIf we draw randomly 100 men of this company, what is the probability that none of them sends less than 2 messages in a period of a quarter of an hour?\nSolution Let $X$ be the number of messages sent in half an hour. Then $X\\sim P(10.2583)$ and $P(X=5)=0.0332$.\nLet $Y$ be the number of women in a sample of 10 that sent more than 1 message in 1 hour. Then $Y\\sim B(10, 1)$ and $P(Y\\geq 3)=1$.\nLet $Z$ be the number of men in a sample of 100 that sent less than 2 messages in a quarter of hour. 
Then $Z\\sim B(100, 0.0305)$ and $P(Z=0)=0.0166$.\n","date":1558915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1646900374,"objectID":"5ab4f4415cc715de5fb5e8c1aae2eeaf","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-05-27/","publishdate":"2019-05-27T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-05-27/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 27, 2019\nDescriptive Statistics and Regression Question 1 A study tries to determine the effect of smoking during the pregnancy in the weight of newborns. The table below shows the daily number of cigarretes smoked by mothers ($X$) and the weight of the newborn (all of them are males) ($Y$).","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-05-27","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: March 26, 2019\nQuestion 1 The time required by a drug $A$ to be effective has been measured (in minutes) in a sample of 150 patients. The table below summarize the results.\n$$ \\begin{array}{lr} \\mbox{Response time} \u0026amp; \\mbox{Patients} \\newline \\hline (0,5] \u0026amp; 5 \\newline (5,10] \u0026amp; 15 \\newline (10,15] \u0026amp; 32 \\newline (15,20] \u0026amp; 36 \\newline (20,30] \u0026amp; 42 \\newline (30,60] \u0026amp; 20 \\newline \\hline \\end{array} $$\nAre there outliers in the sample? Justify the answer.\nWhat is the minimum time for the 20% of patients with highest response time?\nWhat is the average response time? 
Is the mean representative?\nCan we assume that the sample comes from a normal population?\nIf we take another sample of patients with mean 18 min and standard deviation 15 min, in which group is a response time of 25 min relatively greater?\nUse the following sums for the computations: $\\sum x_i=3105$ min, $\\sum x_i^2=83650$ min$^2$, $\\sum (x_i-\\bar x)^3=206851.65$ min$^3$ and $\\sum (x_i-\\bar x)^4=8140374.96$ min$^4$.\nSolution $Q_1=12.7344$ min, $Q_3=25.8333$ min, $IQR=13.099$ min, $f_1=-6.9141$ min and $f_2=45.4818$ min. Therefore there are outliers in the sample since the upper limit of the last interval is above the upper fence. $P_{80}=27.619$ min. $\\bar x=20.7$ min, $s^2=129.1767$ min$^2$, $s=11.3656$ min and $cv=0.5491$. The mean is moderately representative since $cv\\approx 0.5$. $g_1=0.9393$ and $g_2=0.2523$. Since $g_1$ and $g_2$ are between -2 and 2, we can assume that the sample comes from a normal (bell-shaped) population. The standard score of the first sample is $z(25)=0.3783$ and the standard score of the second one is $z(25)=0.4667$, thus a time of 25 min is relatively greater in the second sample. Question 2 In a regression study about the relation between two variables $X$ and $Y$ we got $\\bar x=7$ and $r^2=0.9$. If the equation of the regression line of $Y$ on $X$ is $y-x=1$, compute\nThe mean of $Y$.\nThe equation of the regression line of $X$ on $Y$.\nWhat value does this regression model predict for $x=6$? And for $y=10$?\nSolution $\\bar y=8$. Regression line of $X$ on $Y$: $x=0.9y-0.2$. $y(6)=7$ and $x(10)=8.8$. 
Question 3 In a tennis club the age ($X$) and the height ($Y$) of the ten players forming the female youth team have been measured.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Age (years)} \u0026amp; 9 \u0026amp; 10 \u0026amp; 11 \u0026amp; 12 \u0026amp; 13 \u0026amp; 14 \u0026amp; 15 \u0026amp; 16 \u0026amp; 17 \u0026amp; 18 \\newline \\mbox{Height (cm)} \u0026amp; 128 \u0026amp; 144 \u0026amp; 148 \u0026amp; 154 \u0026amp; 158 \u0026amp; 161 \u0026amp; 165 \u0026amp; 164 \u0026amp; 166 \u0026amp; 167 \\newline \\hline \\end{array} $$\nPlot the scatter plot (Height on Age).\nWhich regression model best fits these data, the linear or the logarithmic?\nWhat is the expected height of a player 12.5 years old according to the best of the two previous models?\nUse the following sums for the computations:\n$\\sum x_i=135$ years, $\\sum \\log(x_i)=25.7908$ $\\log(\\mbox{years})$, $\\sum y_j=1555$ cm, $\\sum \\log(y_j)=50.4358$ $\\log(\\mbox{cm})$,\n$\\sum x_i^2=1905$ years$^2$, $\\sum \\log(x_i)^2=67.0001$ $\\log(\\mbox{years})^2$, $\\sum y_j^2=243191$ cm$^2$, $\\sum \\log(y_j)^2=254.4404$ $\\log(\\mbox{cm})^2$,\n$\\sum x_iy_j=21303$ years$\\cdot$cm, $\\sum x_i\\log(y_j)=682.9473$ years$\\cdot\\log(\\mbox{cm})$, $\\sum \\log(x_i)y_j=4035.0697$ $\\log(\\mbox{years})$cm, $\\sum \\log(x_i)\\log(y_j)=130.2422$ $\\log(\\mbox{years})\\log(\\mbox{cm})$.\nSolution 2. $\\bar x=13.5$ years, $s_x^2=8.25$ years$^2$, $\\overline{\\log(x)}=2.5791$ log(years), $s_{\\log(x)}^2=0.0483$ log(years)$^2$.\n$\\bar y=155.5$ cm, $s_y^2=138.85$ cm$^2$. $s_{xy}=31.05$ years$\\cdot$cm, $s_{\\log(x)y}=2.4594$ log(years)cm. Linear coef. determination: $r^2=0.8416$ Logarithmic coef. determination: $r^2=0.9013$ Therefore, both models fit pretty well, but the logarithmic model fits a little bit better. 3. Logarithmic regression model: $y=24.2639+50.8848\\log(x)$. 
Prediction: $y(12.5)=152.785$ cm.\n","date":1553558400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a26227c769d9eb5dda80d4c6cd4b9b77","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-03-26/","publishdate":"2019-03-26T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2019-03-26/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: March 26, 2019\nQuestion 1 The time required by a drug $A$ to be effective has been measured (in minutes) in a sample of 150 patients. The table below summarize the results.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2019-03-26","type":"book"},{"authors":["Alfredo Sánchez Alberca, María Luisa Sánchez Rodríguez, Manuel Camacho Sampelayo, José Miguel Camacho Sampelayo, José Javier García Medina Alfonso Parra Blesa"],"categories":[],"content":"","date":1546300800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"b92a4a34decc05a181d033ea99c9c0da","permalink":"/en/publication/clasificacion-2019/","publishdate":"2020-09-16T21:26:03.60199Z","relpermalink":"/en/publication/clasificacion-2019/","section":"publication","summary":"","tags":[],"title":"Clasificación por estadios clínico evolutivos del glaucoma primario de ángulo abierto (GPAA) usando valores normalizados obtenidos mediante tomografía de coherencia óptica","type":"publication"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Dec 17, 2018\nQuestion 1 An organism metabolizes alcohol at a rate half of the present amount per minute. 
If initially there is no alcohol and we start to introduce alcohol in the organism at a constant rate of 2 ml/min, how much alcohol will there be in the organism after 5 minutes?\nSolution Let $a$ be the alcohol in the organism and $t$ the time.\nDifferential equation: $a\u0026rsquo;=2-a/2$.\nSolution: $a(t)=4-4e^{-t/2}$.\n$a(5)=3.6717$ ml. Question 2 The amount $y$ of bacteria of type $B$ (in thousands) in a culture is related to the amount $x$ of bacteria of type $A$ (also in thousands) according to the function $y=f(x)$. Knowing that the equation $x^2y^3-6x^3y^2+2xy=1$ is satisfied in this culture and that $f(1/2)=2$, study if $f$ could have a local maximum at $x=1/2$.\nSolution Implicit derivative: $y\u0026rsquo;= \\dfrac{-2xy^3+18x^2y^2+2y}{3x^2y^2-12x^3y+2x}$.\n$y\u0026rsquo;(1/2)=6\\neq 0$, so $f$ has no local maximum at $x=1/2$. Question 3 A capsule has a pyramidal shape whose base is a rectangle of sides $a=3$ cm, $b=4$ cm, and height $h=6$ cm.\nHow must the dimensions of the capsule change to increase the volume the most? What would be the rate of change of the volume if we changed the dimensions in such a way? If we start to change the dimensions of the capsule such that the largest side of the rectangle decreases half of the increase of the smaller side, and the height increases the double of the increase of the smaller side, what will the rate of change of the volume be? Remark: The volume of a pyramid is $1/3$ of the base area times the height.\nSolution $\\nabla V(3,4,6)=(8,6,4)$ and the volume will increase $|\\nabla V(3,4,6)|=10.7703$ cm$^3$/s if we change the dimensions of the capsule following this direction. Directional derivative of $V$ in $(3,4,6)$ along the vector $\\mathbf{u}=(1,-1/2,2)$: $V\u0026rsquo;_{\\mathbf{u}}(3,4,6)=5.6737$ cm$^3$/s. 
Question 4 The yield of a crop $y$ depends on the concentrations of nitrogen $n$ and phosphorus $p$ according to the function $$y(n,p)=npe^{-(n+p)}.$$ Compute the amount of $n$ and $p$ that maximizes the yield of the crop.\nSolution $n=1$ and $p=1$. ","date":1545004800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"9165fcc76313b9241b8413b9fe25ad1a","permalink":"/en/teaching/calculus/exams/pharmacy-2018-12-17/","publishdate":"2018-12-17T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2018-12-17/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Dec 17, 2018\nQuestion 1 An organism metabolizes alcohol at a rate half of the present amount per minute. If initially there is no alcohol and we start to introduce alcohol in the organism at a constant rate of 2 ml/min, how much alcohol will there be in the organism after 5 minutes?","tags":["Exam"],"title":"Pharmacy exam 2018-12-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: December 17, 2018\nQuestion 1 The chart below represents the cumulative distribution of the number of daily defective drugs produced by a machine in a sample of 40 days.\nConstruct the frequency table of the number of defective drugs. Draw the box and whiskers plot of the number of defective drugs. Study the symmetry of the distribution of the number of defective drugs. If the number of defective drugs produced by a second machine follows the equation $y=3x+2$, where $x$ and $y$ are the number of defective drugs with the first and the second machines respectively, in which machine is more representative the mean of the number of defective drugs? Which number of defective drugs is relatively smaller, 3 drugs in the first machine or 9 in the second one? 
Solution $$\\begin{array}{|c|r|r|r|r|} \\hline \\mbox{Defective drugs} \u0026amp; n_i \u0026amp; f_i \u0026amp; N_i \u0026amp; F_i\\newline \\hline 0 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 1 \u0026amp; 0.025\\newline 1 \u0026amp; 3 \u0026amp; 0.075 \u0026amp; 4 \u0026amp; 0.100\\newline 2 \u0026amp; 6 \u0026amp; 0.150 \u0026amp; 10 \u0026amp; 0.250\\newline 3 \u0026amp; 7 \u0026amp; 0.175 \u0026amp; 17 \u0026amp; 0.425\\newline 4 \u0026amp; 8 \u0026amp; 0.200 \u0026amp; 25 \u0026amp; 0.625\\newline 5 \u0026amp; 6 \u0026amp; 0.150 \u0026amp; 31 \u0026amp; 0.775\\newline 6 \u0026amp; 5 \u0026amp; 0.125 \u0026amp; 36 \u0026amp; 0.900\\newline 7 \u0026amp; 2 \u0026amp; 0.050 \u0026amp; 38 \u0026amp; 0.950\\newline 8 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 39 \u0026amp; 0.975\\newline 9 \u0026amp; 1 \u0026amp; 0.025 \u0026amp; 40 \u0026amp; 1.000\\newline \\hline \\end{array} $$ $\\bar x=3.975$ drugs, $s_x=1.9936$ drugs and $g_1=0.3184$. Thus the distribution is a little bit right-skewed. $cv_x=0.5015$, $\\bar y=13.925$ drugs, $s_y=5.9808$ drugs and $cv_y=0.4295$. Thus, the mean of $y$ is more representative than the mean of $x$ since its coef. of variation is smaller. $z_x=-0.4891$ and $z_y=-0.8235$, therefore 9 defective drugs in the $y$ machine is relatively smaller. Question 2 A pharmaceutical laboratory produces two models of blood pressure monitor, one for the arm and the other for the wrist. 
To compare the accuracy of both blood pressure monitors, a quality control has been conducted with a sample of 20 patients, getting the following results:\n$\\sum x_i=265.4$ mmHg, $\\sum y_i=262.5$ mmHg , $\\sum z_i=262.4$ mmHg,\n$\\sum x_i^2=3701.14$ mmHg$^2$, $\\sum y_i^2=3629.41$ mmHg$^2$, $\\sum z_i^2=3615.38$ mmHg$^2$,\n$\\sum x_iy_j=3658.28$ mmHg$^2$, $\\sum x_iz_j=3655.95$ mmHg$^2$, $\\sum y_jz_j=3613.97$ mmHg$^2$.\nWhere $X$ is the blood pressure with the arm monitor, $Y$ with the wrist monitor and $Z$ the real blood pressure.\nWhich blood pressure monitor predicts better the real blood pressure with a linear regression model? If a patient has a real blood pressure of $13.5$ mmHg, what is the expected blood pressure given by the arm blood pressure monitor? Solution Blood pressure with the arm monitor: $\\bar x=13.27$ mmHg, $s^2_x=8.9641$ mmHg².\nBlood pressure with the wrist monitor: $\\bar y=13.125$ mmHg, $s^2_y=9.2049$ mmHg².\nReal blood pressure: $\\bar z=13.12$ mmHg, $s^2_z=8.6346$ mmHg². $s_{xz}=8.6951$ mmHg², $s_{yz}=8.4985$ mmHg², $r^2_{xz}=0.9768$ and $r^2_{yz}=0.9087$.\nThus, the arm monitor predicts better the real pressure with a linear regression model since its linear coef. of determination is greater. Regression line of $X$ on $Z$: $x=0.0581+1.007z$.\nPrediction: $x(13.5)=13.6527$ mmHg. Question 3 The regression line of $Y$ on $X$ is $y=1.2x-0.6$.\nWhich of the following lines can not be the regression line of $X$ on $Y$. Justify the answer. $x=0.9y-0.6$ $x=-0.7y+0.4$ $x=0.8y-0.7$ $x=-0.6y-0.5$ $x=0.4y-0.6$ $x=-0.5y+0.9$ Considering only the ones that can be the regression line of $X$ on $Y$, which one will give better predictions? Justify the answer. Solution (b), (d) and (f) are not possible because the slope is negative, and (a) is not possible because the coef. of determination is greater than 1. (c) gives better predictions because its coef. of determination is greater. 
Question 4 In an epidemiological study a sample of 400 persons with breast cancer was drawn and another sample of 1200 persons without breast cancer. In the sample of persons with breast cancer there were 180 smokers, while in the sample of persons without breast cancer there were 1140 non-smokers.\nCompute the relative risk of developing cancer smoking and interpret it. Compute the odds ratio of developing cancer smoking and interpret it. Solution Let $C$ be the event of having cancer.\n$RR(C)=4.6364$. That means that the probability of having cancer smoking is $4.6364$ times higher than non-smoking. $OR(C)=15.5455$. As it is greater than 1, there is a direct association between smoking and having cancer. The odds of having cancer smoking are more than 15 times greater than non-smoking. Question 5 We want to develop a diagnostic test to rule out a disease when the outcome of the test is negative (negative predictive value) with a probability of at least 90%. It is known that the prevalence of the disease in the population is 15% and the sensitivity of the test is set to 80%.\nWhat must be the minimum specificity of the test? Using the previous specificity, compute the probability of a correct diagnostic. If we apply the same test two times to the same patient with negative outcomes, what is the probability of ruling out the disease? Solution Let $D$ be the event of having the disease and $+$ and $-$ the events of getting a positive and a negative outcome in the diagnostic test respectively.\nMinimum specificity $P(-|\\overline{D})=0.3176$. $P(TP) + P(TN) = P(D\\cap +) + P(\\overline{D}\\cap -) = 0.12+0.27 = 0.39$. $P(\\overline{D}| -_1\\cap -_2)=0.9346$. Question 6 It is known that in a city one out of 20 persons, on average, has blood type $AB$.\nIf we draw randomly 200 blood donors, what is the probability of having at least 5 with blood type $AB$? If we draw randomly 10 blood donors, what is the probability of having more than 8 with blood type different of $AB$? 
Solution Let $X$ be the number of donors with blood type $AB$ in a sample of 200 blood donors. Then $X\\sim B(200,1/20)\\approx P(10)$, and $P(X\\geq 5)=0.9707$. Let $Y$ be the number of donors with no blood type $AB$ in a sample of 10 blood donors. Then $Y\\sim B(10,19/20)$, and $P(Y\u0026gt;8)=0.9139$. Question 7 In a course there are 150 females and 80 males. It is known that the distribution of scores of females and males are normal with the same standard deviation. It is also known that there are 120 females and 56 males with a score greater than 5, and 36 males with a score between 5 and 7.\nCompute the means and standard deviations of the distributions of scores of females and males. How many females will have a score between 4.5 and 8? Above what score will be 10% of females? Solution Let $X$ be score of a random male in the course and $Y$ the score of a random female in the course. Then $X\\sim N(\\mu_x,\\sigma)$ and $Y\\sim N(\\mu_y,\\sigma)$.\n$\\mu_x=5.87$, $\\mu_y=6.41$ and $\\sigma=1.68$. $P(4.5\\leq Y\\leq 8) = 0.7018$, that is, $105.27$ females. $P_{90}=8.8$. 
","date":1545004800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1615158565,"objectID":"e9e0cdf3b80324768c6aa76bd9d50ebc","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-12-17/","publishdate":"2018-12-17T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-12-17/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: December 17, 2018\nQuestion 1 The chart below represents the cumulative distribution of the number of daily defective drugs produced by a machine in a sample of 40 days.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-12-17","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 19, 2018\nQuestion 1 In a population that is exposed to two viruses strains $A$ and $B$ it is known that 2% of persons are immune only to virus $A$ and 4% are immune only to virus $B$. On the other hand it is known tha 91% of the population would be infected by some of the two viruses.\nWhat is the probability that a person is immune to the two viruses? What is the probability that a person immune to virus $A$ is infected by virus $B$? Are dependent the events of being immune to the two viruses? Solution Let $A$ and $B$ the events of being inmune to virus $A$ and $B$ respectively.\n$P(A\\cap B)=0.09$ $P(\\overline B|A)=0.1818$. The events are dependent. Question 2 In a study about the blood pressure the systolic pressure of 2400 males older than 18 was measured. It was observed that 640 had a pressure greater than 14 mmHg and 1450 had between 10 and 14 mmHg. Assuming that the systolic pressure in males older than 18 is normally distributed,\nCompute the mean and the standard deviation. Compute how many males had a systolic pressure between 11 and 13 mmHg. Compute the value of the systolic pressure such that there was 300 males with a systolic pressure above it. 
Solution Let $X$ be the systolic pressure, $X\\sim N(12.5788, 2.2815)$. $P(11\\leq X\\leq 13)=0.3288$ and there are $789.0501$ persons with a systolic pressure between 11 and 13 mmHg. 300 males have a systolic pressure above 15.2 mmHg. Question 3 The average number of people that enters the intensive care unit of a hospital in an 8-hours shift is $1.4$.\nCompute the probability that more than 3 persons enter the ICU in a day. Compute the probability that in a week there is more than one day with less than 3 persons entering the ICU. Solution Let $X$ be the number of persons that enter in the ICU in a day. $X\\sim P(4.2)$ and $P(X\u0026gt;3)=0.6046$. Let $Y$ be the number of days in a week with less than 3 persons entering the ICU. $Y\\sim B(7,0.2102)$ and $P(Y\u0026gt;1)=0.4513$. Question 4 Two hospitals use different tests $A$ and $B$ to detect a streptococcal infection. The tables below show the results of applying these tests in each hospital during the last year.\n$$ \\begin{array}{ccc} \\mbox{First hospital} (A) \u0026amp; \\quad \u0026amp; \\mbox{Second hospital} (B) \\newline \\begin{array}{|l|r|r|} \\hline \u0026amp; \\mbox{Test} + \u0026amp; \\mbox{Test} - \\newline \\hline \\mbox{Infected} \u0026amp; 705 \u0026amp; 65 \\newline \\hline \\mbox{Non infected} \u0026amp; 120 \u0026amp; 4110 \\newline \\hline \\end{array} \u0026amp; \u0026amp; \\begin{array}{|l|r|r|} \\hline \u0026amp; \\mbox{Test} + \u0026amp; \\mbox{Test} - \\newline \\hline \\mbox{Infected} \u0026amp; 1710 \u0026amp; 70 \\newline \\hline \\mbox{Non infected} \u0026amp; 415 \u0026amp; 7805 \\newline \\hline \\end{array} \\end{array} $$\nCompute the probability of a correct diagnostic with test $A$. Compute the positive predictive value of test $A$. Compute the negative predictive value of test $B$. How can these tests be combined to reduce the risk of wrong diagnosis? Solution $P(\\mbox{Correct diagnostic})=0.963$. $PPV_A=0.8545$. $NPV_B=0.9911$. $NPV_A=0.9844$ and $PPV_B=0.8047$. 
Since $B$ has the higher negative predictive value and $A$ the higher positive predictive value, it is better to use test $B$ first to rule out the infection and then apply test $A$ only to individuals with a positive outcome in test $B$, to confirm the infection. ","date":1542585600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"a9d96c57ed3c123a92228bb40a986d97","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-11-19/","publishdate":"2018-11-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-11-19/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 19, 2018\nQuestion 1 In a population that is exposed to two viruses strains $A$ and $B$ it is known that 2% of persons are immune only to virus $A$ and 4% are immune only to virus $B$.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-11-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: October 29, 2018\nQuestion 1 A study about obesity in a city has measured the body mass index (BMI) in a sample. The collected data is shown in the table below.\n$$ \\begin{array}{lr} \\mbox{BMI} \u0026amp; \\mbox{Persons} \\newline \\hline 15-18 \u0026amp; 5 \\newline 18-21 \u0026amp; 62 \\newline 21-24 \u0026amp; 72 \\newline 24-27 \u0026amp; 45 \\newline 27-30 \u0026amp; 12 \\newline 30-33 \u0026amp; 2 \\newline 33-36 \u0026amp; 1 \\newline 36-39 \u0026amp; 1 \\newline \\hline \\end{array} $$\nCompute the percentage of people with a BMI between 19 and 25. Which is the BMI with a 20% of persons above it? Are there outliers in the sample? Give the outliers if there are some. Solution Non interpolating:\n$F(19)\\approx 0.335$ and $F(25)\\approx 0.920$, so the percentage of people between 19 and 25 is 58.5% approximately. $P_{80}\\approx 25.5$. 
$Q_1\\approx 19.5$, $Q_3\\approx 25.5$, $IQR\\approx 6$, $f_1\\approx 10.5$ and $f_2\\approx 34.5$. Thus there is at least one outlier in the interval (36-39). Interpolating: $F(19)=0.1283$ and $F(25)=0.77$, so the percentage of people between 19 and 25 is 64.17%. $P_{80}=25.4$. $Q_1=20.1774$, $Q_3=24.7333$, $IQR=4.5559$, $f_1=13.3435$ and $f_2=31.5671$. Thus there are at least two outliers in the intervals (33-36) and (36-39). Question 2 A gene of a rat species has been modified to help the metabolization of cholesterol in blood. To check the effectiveness of this genetic modification two samples of 20 rats were drawn, ones with the gene modified and the others not, and they were fed with the same diet with different concentrations of palm oil during one month. The following sums summarize the results:\nPalm oil quantity in gr (the same in both samples)\n$\\sum x_i=640.6467$ gr, $\\sum x_i^2=23508.6387$ gr², $\\sum(x_i-\\bar x)^3=-5527.08$ gr³, $\\sum(x_i-\\bar x)^4=792910$ gr⁴\nCholesterol level in blood in mg/dl of non genetically modified rats $\\sum y_j=2945.8545$ mg/dl, $\\sum y_j^2=439517.5975$ (mg/dl)², $\\sum(y_j-\\bar y)^3=604.08$ (mg/dl)³, $\\sum(y_j-\\bar y)^4=3717331.07$ (mg/dl)⁴\n$\\sum x_iy_j=98156.0658$ gr$\\cdot$mg/dl.\nCholesterol level in blood in mg/dl of genetically modified rats\n$\\sum y_j=2126.5899$ mg/dl, $\\sum y_j^2=226824.5373$ (mg/dl)², $\\sum(y_j-\\bar y)^3=-629.4$ (mg/dl)³, $\\sum(y_j-\\bar y)^4=48248.29$ (mg/dl)⁴ $\\sum x_iy_j=69517.3648$ gr$\\cdot$mg/dl.\nIn which sample does the cholesterol have a more representative mean, genetically modified or non modified rats? In which sample is the distribution of cholesterol more skewed? In which sample is the kurtosis of the distribution of cholesterol less normal? Which rat has a relatively higher cholesterol level, a genetically modified rat with a cholesterol level of 130 mg/dl, or a non genetically modified rat with a cholesterol level of 145 mg/dl?
In which sample does the regression line of cholesterol on the palm oil quantity fit better? According to the regression line, what level of cholesterol is expected for a genetically modified rat with a diet of 25 gr of palm oil? And for a non genetically modified rat? What amount of palm oil must be supplied to a non genetically modified rat to have a cholesterol level smaller than 150 mg/dl? Is this prediction reliable? Solution Non genetically modified rats: $\\bar y=147.2927$ mg/dl, $s^2_y=280.7332$ (mg/dl)², $s=16.7551$ mg/dl and $cv_y=0.1138$. Genetically modified rats: $\\bar y=106.3295$ mg/dl, $s^2_y=35.265$ (mg/dl)², $s=5.9384$ mg/dl and $cv_y=0.0558$. Thus, the mean of genetically modified rats is more representative since the coef. of variation is smaller. Non genetically modified rats: $g_1=0.0064$. Genetically modified rats: $g_1=-0.1503$. Thus, the distribution of genetically modified rats is more skewed since the coef. of skewness is further from 0. Non genetically modified rats: $g_2=-0.6416$. Genetically modified rats: $g_2=-1.0602$. Thus, the kurtosis of the distribution of genetically modified rats is less normal since the coef. of kurtosis is further from 0. Non genetically modified rats: $z(145)=-0.1368$. Genetically modified rats: $z(130)=3.986$. Thus, a cholesterol level of 130 mg/dl in genetically modified rats is relatively greater than 145 mg/dl in non genetically modified rats. $\\bar x=32.0323$ gr, $s^2_x=149.3614$ gr². Non genetically modified rats: $s_{xy}=189.6733$ gr$\\cdot$mg/dl and $r^2=0.858$. Genetically modified rats: $s_{xy}=69.8861$ gr$\\cdot$mg/dl and $r^2=0.9273$. Thus, the regression line fits better in genetically modified rats since the coef. of determination is greater. Regression line of $Y$ on $X$ in non genetically modified rats: $y=106.615+1.2699x$. Prediction: $y(25)=138.3624$ Regression line of $Y$ on $X$ in genetically modified rats: $y=91.3416+0.4679x$.
Prediction: $y(25)=103.0391$ Regression line of $X$ on $Y$ in non genetically modified rats: $x=-67.4838+0.6756y$. Prediction: $x(150)=33.8615$. The prediction is very reliable since the coef. of determination is close to 1. Question 3 It is known that the regression line of $Y$ on $X$ has equation $3x+2y-4=0$ and it explains half of the variability of $Y$. According to the linear regression model, how much will $X$ change for each unit that increases $Y$?\nSolution $r^2=0.5$ and $b_{xy}=-\\frac{1}{3}$, so $X$ decreases 1/3 of the increase of $Y$. ","date":1540771200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1609746010,"objectID":"99f2fc30cabfe509c3a18fd72f648889","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-10-29/","publishdate":"2018-10-29T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-10-29/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: October 29, 2018\nQuestion 1 A study about obesity in a city has measured the body mass index (BMI) in a sample. The collected data is shown in the table below.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-10-29","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: May 31, 2018\nQuestion 1 The ages of a sample of patients of a physical therapy clinic are:\n25, 30, 44, 44, 51, 51, 53, 56, 57, 58, 58, 58, 59, 59, 61, 63, 63, 63, 66, 68, 70, 71, 72, 74, 82, 85\nCompute the quartiles.\nDraw the box plot and identify outliers (do not group data into intervals).\nSplit the sample into two groups, patients younger and older than 65. In which group is the mean more representative. 
Justify the answer.\nWhich distribution is less symmetric, the one of patients younger than 65 or the one of patients older?\nWhich age is relatively smaller with respect to its group, 50 years in the group of patients younger than 65 or 72 years in the group of patients older than 65?\nUse the following sums for the computations.\nYounger than 65: $\\sum x_i=953$ years, $\\sum x_i^2=52475$ years$^2$, $\\sum (x_i-\\bar x)^3=-30846.51$ years$^3$ and $\\sum (x_i-\\bar x)^4=939658.83$ years$^4$.\nOlder than 65: $\\sum x_i=588$ years, $\\sum x_i^2=43530$ years$^2$, $\\sum (x_i-\\bar x)^3=1485$ years$^3$ and $\\sum (x_i-\\bar x)^4=26983.5$ years$^4$.\nSolution $Q_1=53$ years, $Q_2=59$ years and $Q_3=68$ years. There are 2 outliers: 25, 30. Let $x$ be the age in patients younger than 65 and $y$ the age in patients older than 65.\n$\\bar x=52.9444$ years, $s_x^2=112.1636$ years$^2$, $s_x=10.5907$ years and $cv_x=0.2$.\n$\\bar y=73.5$ years, $s_y^2=39$ years$^2$, $s_y=6.245$ years and $cv_y=0.085$.\nThe mean is more representative in patients older than 65 since the coefficient of variation is smaller. $g_{1x}=-1.4426$ and $g_{1y}=0.7621$, thus the distribution of ages of people younger than 65 is less symmetric. The standard scores are $z_x(50)=-0.278$ and $z_y(72)=-0.2402$, thus 50 years is relatively smaller in the group of people younger than 65.
Question 2 The table below shows the number of injuries of several teams during a league and the average warm-up time of its players.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Warm-up time} \u0026amp; 15 \u0026amp; 35 \u0026amp; 22 \u0026amp; 28 \u0026amp; 21 \u0026amp; 18 \u0026amp; 25 \u0026amp; 30 \u0026amp; 23 \u0026amp; 20 \\newline \\mbox{Injuries} \u0026amp; 42 \u0026amp; 2 \u0026amp; 16 \u0026amp; 6 \u0026amp; 17 \u0026amp; 29 \u0026amp; 10 \u0026amp; 3 \u0026amp; 12 \u0026amp; 20 \\newline \\hline \\end{array} $$\nDraw the scatter plot.\nWhich regression model is more suitable to predict the number of injuries as a function of the warm-up time, the logarithmic or the exponential? Use that regression model to predict the expected number of injuries for a team whose players warm-up 20 minutes a day.\nWhich regression model is more suitable to predict the warm-up time as a function of the number of injuries, the logarithmic or the exponential? Use that regression model to predict the warm-up time required to have no more than 10 injuries in a league.\nAre these predictions reliable? Which one is more reliable?\nUse the following sums for the computations ($X$ warm-up time and $Y$ number of injuries):\n$\\sum x_i=237$, $\\sum \\log(x_i)=31.3728$, $\\sum y_j=157$, $\\sum \\log(y_j)=24.0775$,\n$\\sum x_i^2=5937$, $\\sum \\log(x_i)^2=98.9906$, $\\sum y_j^2=3843$, $\\sum \\log(y_j)^2=66.3721$,\n$\\sum x_iy_j=3115$, $\\sum x_i\\log(y_j)=519.1907$, $\\sum \\log(x_i)y_j=465.8093$, $\\sum \\log(x_i)\\log(y_j)=73.3995$.\nSolution $\\bar x=23.7$ min, $s_x^2=32.01$ min$^2$. $\\bar \\log(x)=3.1373$ log(min), $s_{\\log(x)}^2=0.0565$ log(min)$^2$. $\\bar y=15.7$ injuries, $s_y^2=137.81$ injuries$^2$. $\\bar \\log(y)=2.4078$ log(injuries), $s_{\\log(y)}^2=0.8399$ log(injuries)$^2$. $s_{x\\log(y)}=-5.1446$, $s_{\\log(x)y}=-2.6744$. Exponential determination coefficient: $r^2=0.9844$. Logarithmic determination coefficient: $r^2=0.9185$.
So the exponential regression model is better to predict the number of injuries as a function of the warm-up time. Exponential regression model: $y=e^{6.2168-0.1607x}$.\nPrediction: $y(20)=20.1341$ injuries.\nThe logarithmic model is better to predict the warm-up time as a function of the number of injuries. Logarithmic regression model: $x=164.1851-47.3292\\log(y)$. Prediction: $x(10)=55.2056$ min.\nBoth predictions are very reliable since the determination coefficient is very high, but the last one is a little less reliable as it is for a value further from the data range.\nQuestion 3 An ultrasonic technique is used to diagnose a disease with a sensitivity of 91% and a specificity of 98%. The prevalence of the disease is 20%.\nIf we apply the technique to an individual and the outcome is positive, what is the probability of having the disease for that individual?\nIf the outcome was negative, what is the probability of not having the disease?\nIs this technique more reliable to confirm or to rule out the disease? Justify the answer.\nCompute the probability of having a correct diagnosis with this technique.\nSolution Let $D$ be the event of having the disease and + and - the events of getting a positive and a negative outcome respectively in the test.\n$PPV=0.9192$. $NPV=0.9776$. It is more reliable to rule out the disease since the NPV is greater than the PPV. $P(D\\cap +)+P(\\overline D\\cap -) = 0.966$.
Question 4 It is known that the femur length of a fetus with 25 weeks of pregnancy follows a normal distribution with mean 44 mm and standard deviation 2 mm.\nCompute the probability that the femur length of a fetus with 25 weeks is greater than 46 mm.\nCompute the probability that the femur length of a fetus with 25 weeks is between 46 and 49 mm.\nCompute an interval $(a,b)$ centered at the mean, such that it contains 80% of the femur lengths of fetuses with 25 weeks.\nSolution Let $X\\sim N(44,2)$ be the femur length of fetuses with 25 weeks of pregnancy.\n$P(X\u0026gt;46)=0.1587$. $P(46\u0026lt;X\u0026lt;49)=0.1524$. The interval centered at $44$ that contains 80% of the femur lengths of fetuses with 25 weeks is $(41.4369,46.5631)$. Question 5 The probability that an injury $A$ is repeated is $4/5$, the probability that another injury $B$ is repeated is $1/2$, and the probability that none of them are repeated is $1/20$. Compute the probability of the following events:\nAt least one injury is repeated.\nOnly injury $B$ is repeated.\nInjury $B$ is repeated if injury $A$ has been repeated.\nInjury $B$ is repeated if injury $A$ has not been repeated.\nSolution $P(A\\cup B)=19/20$. $P(B\\cap\\overline{A})=3/20$. $P(B/A)=7/16$. $P(B/\\overline{A})=3/4$. Question 6 A physical therapy clinic opens 6 hours a day and the average number of patients that arrive to the clinic is 12 a day.\nCompute the probability that more than 4 patients arrive in 1 hour.\nIf the clinic has 4 physiotherapists and each of them can treat one patient per hour, what is the probability that, on a given day, there is some hour in which some patient cannot be attended? How many physiotherapists must be in the clinic to guarantee that this probability is less than 10%?\nSolution Let $X$ be the number of patients that arrive in 1 hour. $X\\sim P(2)$ and $P(X\u0026gt;4)=0.0527$. Let $Y$ be the number of hours in a day in which some patient can not be treated.
$Y\\sim B(6, 0.0527)$ and $P(Y\u0026gt;0)=0.2771$.\nThe clinic requires 5 physiotherapists, since $P(X\u0026gt;5)=0.0166$ and $P(Y\u0026gt;0)=0.0954$, with $Y\\sim B(6, 0.0166)$ now. ","date":1527724800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"c366dd731d9f6a89fe137304002a8cee","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-05-31/","publishdate":"2018-05-31T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-05-31/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 31, 2018\nQuestion 1 The ages of a sample of patients of a physical therapy clinic are:\n25, 30, 44, 44, 51, 51, 53, 56, 57, 58, 58, 58, 59, 59, 61, 63, 63, 63, 66, 68, 70, 71, 72, 74, 82, 85","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2018-05-31","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Degrees: Physiotherapy\nDate: April 9, 2018\nQuestion 1 The chart below describes the distribution of the head arc of rotation (in degrees) in people working with and without computers.\nPlot the ogive of the head arc of rotation for people working with computers. If a person with a head arc of rotation less than or equal to 115 degrees is considered a person with reduced mobility, what percentage of people working with computers has reduced mobility? Which distribution has a more representative mean of the head arc of rotation, people working with computers or people not working with computers? Compute the global mean of the head arc of rotation. Which distribution is more asymmetric, people working with computers or people not working with computers? Which value of the head arc of rotation is relatively less, 150 degrees in people working with computers or 170 in people not working with computers?
Use the following sums for the computations.\nWith computer: $\\sum x_i=3970$ degrees, $\\sum x_i^2=534750$ degrees$^2$, $\\sum (x_i-\\bar x)^3=103662.22$ degrees$^3$ and $\\sum (x_i-\\bar x)^4=7903715.56$ degrees$^4$.\nWithout computers: $\\sum x_i=4230$ degrees, $\\sum x_i^2=645900$ degrees$^2$, $\\sum (x_i-\\bar x)^3=-42359.69$ degrees$^3$ and $\\sum (x_i-\\bar x)^4=4101700.53$ degrees$^4$.\nSolution $F(115)=0.1667 \\rightarrow 16.67%$ of people working with computers have reduced mobility. With computer: $\\bar x=132.3333$ degrees, $s_x^2=312.8889$ degrees², $s_x=17.6887$ degrees and $cv_x=0.1337$ Without computer: $\\bar x=151.0714$ degrees, $s_x^2=245.2806$ degrees², $s_x=15.6614$ degrees and $cv_x=0.1037$ The mean of people working without computer is more representative than the mean of people working with computers since its coefficient of variation is smaller. $\\bar x=141.3793$. With computer $g_1=0.6243$ and without computer $g_1=-0.3938$. Therefore, the distribution of people working with computers is more asymmetric. Standard scores: $z(150)=0.9988$ and $z(170)=1.2086$. Therefore, an arc of rotation of 150 degrees in people working with computers is relatively smaller than an arc of rotation of 170 in people working without computers. Question 2 The concentration of a drug in blood $C$, in mg/dl, depends on time $t$, in hours, according to the following table:\n$$ \\begin{array}{lrrrrrrr} \\hline \\mbox{Time} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 5 \u0026amp; 6 \u0026amp; 7 \u0026amp; 8\\newline \\mbox{Concentration} \u0026amp; 25 \u0026amp; 36 \u0026amp; 48 \u0026amp; 64 \u0026amp; 86 \u0026amp; 114 \u0026amp; 168\\newline \\hline \\end{array} $$\nWhich regression model, the linear or the exponential, is more reliable to predict the concentration of the drug as a function of time? Use the best model to predict the concentration of drug in blood after $4.8$ hours. 
Use the following sums for the computations:\n$\\sum x_i=35$, $\\sum \\log(x_i)=10.6046$, $\\sum y_j=541$, $\\sum \\log(y_j)=29.147$,\n$\\sum x_i^2=203$, $\\sum \\log(x_i)^2=17.5205$, $\\sum y_j^2=56937$, $\\sum \\log(y_j)^2=124.0131$,\n$\\sum x_iy_j=3328$, $\\sum x_i\\log(y_j)=154.3387$, $\\sum \\log(x_i)y_j=951.6961$, $\\sum \\log(x_i)\\log(y_j)=46.0805$.\nSolution Linear model of Concentration on Time: $\\bar x=5$ hours, $s_x^2=4$ hours² . $\\bar y=77.2857$ mg/dl, $s_y^2=2160.7755$ (mg/dl)². $s_{xy}=89$ hours⋅mg/dl.\nLinear coefficient of determination of Concentration on Time $r^2=0.9165$.\nExponential model of Concentration on Time: $\\overline{\\log(y)}=4.1639$ log(mg/dl), $s_{\\log(y)}^2=0.3785$ log(mg/dl)². $s_{x\\log(y)}=1.2291$ hours⋅log(mg/dl).\nExponential coefficient of determination of Concentration on Time $r^2=0.9979$.\nTherefore, the exponential model explains better than the linear one the relation between the concentration and time, since its coefficient of determination is greater.\nExponential model of Concentration on Time: $y=e^{2.6275 + 0.3073x}$. 
$y(4.8)=60.4853$ mg/dl.\n","date":1523232000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"f59a5652dc07feee9c0ee716219e2bb6","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-04-09/","publishdate":"2018-04-09T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2018-04-09/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: April 9, 2018\nQuestion 1 The chart below describes the distribution of the head arc of rotation (in degrees) in people working with and without computers.\nPlot the ogive of the head arc of rotation for people working with computers.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2018-04-09","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 19, 2018\nQuestion 1 Find an equation of the tangent plane to the surface $S: e^xy-zy^2+\\frac{x^4}{z}=-1$ at the point $P=(0,1,2)$. Find the tangent line to the curve obtained by the intersection of $S$ and the plane $z=2$ at the given point $P$. Solution Tangent plane: $x-3y-z+5=0$. Tangent line: $(3t,1+t)$ or $y=\\frac{x}{3}+1$. Question 2 An organism metabolizes (eliminates) alcohol at a rate of three times the amount of alcohol present in the organism per hour. If the organism does not have alcohol at initial time and it starts to get alcohol at a constant rate of 12 cl per hour; how much alcohol will be in the organism after 5 hours? What will be the maximum amount of alcohol in the organism? When will that maximum amount be achieved?\nSolution Let $y$ be the alcohol in the organism and $t$ the time.\nDifferential equation: $y\u0026rsquo;=12-3y$.\nSolution: $y(t)=4-4e^{-3t}$.\n$y(5)=3.99$ cl.\nThe maximum amount of alcohol will be 4 cl and it will be achieved at $t=\\infty$. 
Question 3 Three alleles (alternative versions of a gene) $A$, $B$ and $O$ determine the four blood types $A$ ($AA$ or $AO$), $B$ ($BB$ or $BO$), $O$ ($OO$) and $AB$. The Hardy-Weinberg Law states that the proportion of individuals in a population who carry two different alleles is\n$$ p(x,y,z)=2xy+2xz+2yz $$\nwhere $x$, $y$ and $z$ are the proportions of $A$, $B$ and $O$ in the population. Use the fact that $x+y+z=1$ to compute the maximum value of $p$.\nSolution There is a local maximum at $(\\frac{1}{3},\\frac{1}{3})$ and $f(\\frac{1}{3},\\frac{1}{3})=\\frac{2}{3}$. Question 4 Three substances interact in a chemical process in quantities $x$, $y$ and $z$. At equilibrium, the three quantities are related by the following equation:\n$$ \\ln z - \\frac{x^2y}{z}=-1 $$\nAssume $z$ is an implicit function of $x$ and $y$; compute the variation of $z$ when $x=y=z=1$ and $y$ decreases at the same rate as $x$ increases.\nSolution Directional derivative of $z$ in $(1,1,1)$ along $\\mathbf{v}=(1,-1)$: $z\u0026rsquo;_\\mathbf{v}(1,1,1)=\\frac{1}{2\\sqrt{2}}$. ","date":1516320000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"db8e1450fc62ef751b48ebd4ae3e9902","permalink":"/en/teaching/calculus/exams/pharmacy-2018-01-19/","publishdate":"2018-01-19T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2018-01-19/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 19, 2018\nQuestion 1 Find an equation of the tangent plane to the surface $S: e^xy-zy^2+\\frac{x^4}{z}=-1$ at the point $P=(0,1,2)$. 
Find the tangent line to the curve obtained by the intersection of $S$ and the plane $z=2$ at the given point $P$.","tags":["Exam"],"title":"Pharmacy exam 2018-01-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 19, 2018\nQuestion 1 A study done on a group of senior people to determine the relation between age $X$, and the number of visits to the doctor $Y$, shows the following results:\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Age} \u0026amp; 62 \u0026amp; 65 \u0026amp; 71 \u0026amp; 79 \u0026amp; 83 \u0026amp; 88 \u0026amp; 90 \u0026amp; 95\\newline \\mbox{No. of Visits} \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 6 \u0026amp; 6 \u0026amp; 8 \u0026amp; 10 \u0026amp; 14\\newline \\hline \\end{array} $$\nDo the following:\nEstimate the number of times a 70-year-old patient will go to the doctor, according to a linear regression model. What will be the estimate equal to if you consider an exponential model instead of the linear one? Which of the two estimates is more reliable? A potential model has equation of the type $Y=aX^b$, where $a$ and $b$ are constants to be determined; what transformation should you apply to the variables $X$ and $Y$ to change a potential model into a linear one? Use the following sums for the computations: $\\sum x_i=633$, $\\sum \\log(x_i)=34.8835$, $\\sum y_j=53$, $\\sum \\log(y_j)=13.7827$, $\\sum x_i^2=51109$, $\\sum \\log(x_i)^2=152.28$, $\\sum y_j^2=461$, $\\sum \\log(y_j)^2=26.6206$, $\\sum x_iy_j=4509$, $\\sum x_i\\log(y_j)=1144.0108$, $\\sum \\log(x_i)y_j=235.1289$, $\\sum \\log(x_i)\\log(y_j)=60.7921$.\nSolution Linear model of Visits on Age: $\\bar x=79.125$ years, $s_x^2=127.8594$ years² . $\\bar y=6.625$ visits, $s_y^2=13.7344$ visits². $s_{xy}=39.4219$ years⋅visits. Regression line of Visits on Age: $y=-17.771 + 0.3083x$. 
$y(70) =3.8116$ visits.\n$\\overline{\\log(y)}=1.7228$ log(visits), $s_{\\log(y)}^2=0.3594$ log(visits)². $s_{x\\log(y)}=6.6823$ years⋅log(visits). Exponential model of Visits on Age: $y=e^{-2.4124 + 0.0523x}$. $y(70)=3.4762$ visits.\nLinear coefficient of determination of Visits on Age $r^2=0.885$. Exponential coefficient of determination of Visits on Age $r^2=0.9716$. Thus, the exponential model explains a little bit better the number of visits to the doctor with respect to the age.\nWe must apply the logarithm to both Visits and Age: $\\log(Y)=\\log(aX^b)\\Rightarrow \\log(Y)=\\log(a)+\\log(X^b)=\\log(a)+b\\log(X)=a\u0026rsquo;+b\\log(X)$.\nQuestion 2 The grass pollen concentration in the center of a city in grains/m$^3$ of air, during the last year, is given in the following table:\n$$ \\begin{array}{cr} \\hline \\mbox{Pollen concentration} \u0026amp; \\mbox{Num days}\\newline 0-300 \u0026amp; 51\\newline 300-500 \u0026amp; 60\\newline 500-600 \u0026amp; 79\\newline 600-800 \u0026amp; 91\\newline 800-1000 \u0026amp; 60\\newline 1000-1300 \u0026amp; 24\\newline \\hline \\end{array} $$\nHealth authorities have determined that the level of pollen did not pose a risk for 75% of the days in the year; what is the minimum level of pollen that is considered a health hazard? On days with pollen level between 575 and 860 health authorities issue a warning to citizens; on how many days of the last year were warnings issued? Are there outliers in the above sample? Platanaceae has a pollen cycle similar to grass: if $X$ are the pollen levels of grass, and $Y$ are the levels of the platanaceae, it is known that $Y=0.5X-100$. What will be the average pollen level for platanaceae? Which of the two averages is more representative? Can one say that the level of grass pollen comes from a population that is normally distributed?
Use the following sums for the computations: $\\sum x_i=220400$ grains/m$^3$, $\\sum x_i^2=159575000$ (grains/m$^3$)$^2$, $\\sum (x_i-\\bar x)^3=261917220.867$ (grains/m$^3$)$^3$ and $\\sum (x_i-\\bar x)^4=4872705679772.61$ (grains/m$^3$)$^4$.\nSolution $P_{75}=784.0417$ grains/m³. $F(575)=0.4664$ and $F(860)=0.8192$, so the frequency of days with a warning is $0.3528$ that correspond to $128.77$ days. $Q_1=434.1849$ grains/m³, $Q_3=784.0417$ grains/m³ and $IQR=349.8568$ grains/m³. Fences: $F_1=-90.6001$ grains/m³ and $F_2=1308.8269$ grains/m³. Since all the values fall into the fences there are no outliers. $\\bar x=603.8356$ grains/m³, $s_x^2=72574.3291$ (grains/m³)², $s_x=269.3962$ grains/m³ and $cv_x=0.4461$. $\\bar y=201.9178$ grains/m³, $s_y=134.6981$ grains/m³ and $cv_y=0.6671$. The mean of $X$ is more representative than the mean of $Y$ as $cv_x\u0026lt;cv_y$. $g_1=0.0367$ and $g_2=-0.4654$. As both of them are between -2 and 2, we can assume that the pollen concentrations are normally distributed. Question 3 Pollen level in Madrid in the year 2017 is normally distributed with mean equal to 90 particles per cubic meter. In 42 days of 2017, the level was above 120 particles per cubic meter. Do the following:\nCompute the standard deviation of the pollen level in the year 2017. On how many days the pollen level did not go over 50 particles per cubic meter of air? On 20% of the days the level of pollen was high enough to pose a health risk for allergic people; what is the level of pollen that triggers this high risk situation? Solution Let $X$ be the pollen level in Madrid in 2017. $X\\sim N(90,\\sigma)$.\n$\\sigma=25$ grains/m³. $P(X\\leq 50)=0.0548$ that correspond to $20.0017$ days. $P_{80}=111.0405$ grains/m³. Question 4 A study on two drugs to reduce the cholesterol levels in blood shows that drug $A$ is effective in 75% of the people, and drug $B$ is effective in 85% of the cases.
For 5% of people, neither of the two drugs works.\nCompute the percentage of the population for which only drug $A$ works. Assume that drug $A$ works on a person; what is the probability that drug $B$ will also work in that person? On the other hand, if drug $B$ has not worked for a person, what is the probability that drug $A$ will actually work? Are the effects of the two drugs independent events? Solution $P(A\\cap \\overline B)=0.1$, that is, a $10%$. $P(B|A)=0.8667$. $P(A|\\overline B)=0.6667$. $P(B|A)\\neq P(B)$, thus the events are dependent. Question 5 The weekly average number of births in a hospital is equal to 14.\nCompute the probability that on a given day more than 2 births take place. Compute the probability that during a week there is more than one day without any births. Solution Let $X$ be the number of births in a day. $X\\sim P(2)$. $P(X\u0026gt;2)=0.3233.$ Let $Y$ be the number of days without births in a week. $Y\\sim B(7,0.1353)$. $P(Y\u0026gt;1)=0.2427$. Question 6 A trial to develop a diagnosis test for a disease is tested on 250 people, of which 50 suffer the disease and 200 are healthy. The medical team in charge of the trial wants for the test to have a positive predictive value of $0.7$, and a negative predictive value of $0.9$.\nIn order to get the values given above, how many of the healthy people should get a positive outcome in the test? And how many of the sick people should get a negative outcome in the test? What is the probability that a person with two positive outcomes in the test has the disease? Solution Let $D$ be the event of having the disease.\n$P(+|\\overline{D})=0.0625\\Rightarrow 12.5$ persons. $P(-|D)=0.4165\\Rightarrow 20.825$ persons. $P(D|+\\cap +)=0.9561$.
","date":1516320000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"6176c3ec77be3d921871660423823b51","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-01-19/","publishdate":"2018-01-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2018-01-19/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 19, 2018\nQuestion 1 A study done on a group of senior people to determine the relation between age $X$, and the number of visits to the doctor $Y$, shows the following results:","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2018-01-19","type":"book"},{"authors":["Alfonso; Sánchez-Alberca, Alfredo; Sanchez-Rodríguez, María Luisa; García-Medina, José Javier Parra-Blesa"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"32a70383266fe72dd001692ee75901b5","permalink":"/en/publication/analisis-2018/","publishdate":"2020-09-16T21:26:03.202933Z","relpermalink":"/en/publication/analisis-2018/","section":"publication","summary":"","tags":[],"title":"Análisis epidemiológico evolutivo del daño sectorizado en la papila y retina papilar a través de OCT. 
Nueva clasificación de grados de GCAA.","type":"publication"},{"authors":["Alfonso Parra Blesa y Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"8640a1da22ff6713a8127e31da65afaf","permalink":"/en/publication/analisis-2018-2/","publishdate":"2020-09-16T21:26:03.500896Z","relpermalink":"/en/publication/analisis-2018-2/","section":"publication","summary":"","tags":[],"title":"Análisis estadístico inferencial y descriptivo de las capas retinianas; glaucoma versus no glaucoma","type":"publication"},{"authors":["Alfonso Parra Blesa y Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"5685a40d87b8cfe9d30a2750e895478b","permalink":"/en/publication/clasificacion-2018/","publishdate":"2020-09-16T21:26:03.396092Z","relpermalink":"/en/publication/clasificacion-2018/","section":"publication","summary":"","tags":[],"title":"Clasificación por estadios del glaucoma primario de ángulo abierto usando valores normalizados del anillo BMO","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"51e29658e8323bebe5e43b0869705ba6","permalink":"/en/publication/nueva-2018-2/","publishdate":"2020-09-16T21:26:03.299049Z","relpermalink":"/en/publication/nueva-2018-2/","section":"publication","summary":"","tags":[],"title":"Una nueva taxonomía de colecciones y de funciones de similitud para su comparación","type":"publication"},{"authors":["Alfredo 
Sánchez-Alberca"],"categories":[],"content":"","date":1514764800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"6135f07167be641ec45b3f1793a049e7","permalink":"/en/publication/nueva-2018/","publishdate":"2020-09-16T21:26:01.940216Z","relpermalink":"/en/publication/nueva-2018/","section":"publication","summary":"","tags":[],"title":"Una nueva taxonomía de colecciones y de funciones de similitud para su comparación","type":"publication"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 27, 2017\nQuestion 1 The following diagram shows the NO₂ emissions (𝜇g/m³) in Madrid during the month of October, 2017.\nThe European Standards on Air Quality establish that the average monthly value cannot be over 40 𝜇g/m³ for a healthy environment. Was this requirement met during the month of October? Is the value computed representative of the measurements taken during the month of October? The Local Government of Madrid has set speed limits on those days with emissions measurements over 72 𝜇g/m³; furthermore, there will be additional parking restrictions if the level is over 92 𝜇g/m³. What percentage of days in October had only speed restrictions? According to the October sample shown, can we say that the distribution of the NO₂ emissions in the city of Madrid is normally distributed? Besides the NO₂ level, the Municipal Corporation also checks the level of SO₂, and it has found out that the average level of this substance during October was 2.85 𝜇g/m³, with a standard deviation equal to 0.42 𝜇g/m³. On a day with an NO₂ level of 46, and an SO₂ level of 2.24, which level should be considered higher? The Air Quality Index (AQI) is computed by multiplying the NO₂ level by 0.95, and adding 30 to the result. What was the average AQI in Madrid during the month of October? Is this value more or less representative than the average NO₂ level?
Are there outliers in the NO₂ emissions in October? Justify your answer. Use the following data for your computations: $\\sum x_i=1945$ 𝜇g/m³,$\\sum x_i^2=131575$ (𝜇g/m³)$^2$, $\\sum (x_i-\\bar x)^3=93995.838$ (𝜇g/m³)³ y $\\sum (x_i-\\bar x)^4=7766271.021$ (𝜇g/m³)⁴.\nSolution $\\bar x=62.7419$ 𝜇g/m³, so the requirement was not met. $s^2=307.8044$ (𝜇g/m³)², $s=17.5444$ 𝜇g/m³, $cv=0.2796$. As the coefficient of variation is less than 0.3 there is a low variability and the mean is quite representative. $F(72)=0.7097$ and $F(92)=0.9161$, so the percentage of days with only speed restrictions is $20.64%$. $g_1=0.5615$ and $g_2=-0.3558$. As both of them are between -2 and 2, we can assume that the emissions are normally distributed. NO₂: $z(46)=-0.9543$. SO₂: $z(2.24)=-1.4524$. Thus, the NO₂ emission is relatively higher. Let $y=0.95x+30$ the AQI. $\\bar y=89.6048$, $s_y=16.6671$, $cv=0.186$. As the coeffitient of variation is lower, the AQI mean is more representative. $Q_1=49.5816$ 𝜇g/m³, $Q_3=74.0093$ 𝜇g/m³ and $IQR=24.4277$ 𝜇g/m³. Fences: $F_1=12.94$ 𝜇g/m³ and $F_2=110.65$ 𝜇g/m³. Thus, there are outliers. Question 2 The table below shows the flu incidence rate (per 100,000 people) registered after a number of days from the beginning of the study.\n$$ \\begin{array}{lrrrrrrrr} \\hline \\mbox{Days} \u0026amp; 1 \u0026amp; 5 \u0026amp; 8 \u0026amp; 12 \u0026amp; 20 \u0026amp; 26 \u0026amp; 38 \u0026amp; 44\\newline \\mbox{Flu rate} \u0026amp; 60 \u0026amp; 66 \u0026amp; 71 \u0026amp; 80 \u0026amp; 106 \u0026amp; 132 \u0026amp; 194 \u0026amp; 235\\newline \\hline \\end{array} $$\nEstimate the flu incidence rate 50 days after the beginning of the study with a linear regression model. What is the daily rate of change of the flu incidence rate, according to the linear model computed? Estimate the incidence rate 50 days after the beginning of the study with an exponential regression model? Which of the two estimates is more reliable? Why? 
Use the following data for your computations ($X=$Days and $Y=$Flu rate): $\\sum x_i=154$, $\\sum \\log(x_i)=19.8494$, $\\sum y_j=944$, $\\sum \\log(y_j)=37.2024$, $\\sum x_i^2=4690$, $\\sum \\log(x_i)^2=60.2309$, $\\sum y_j^2=140918$, $\\sum \\log(y_j)^2=174.8363$, $\\sum x_iy_j=25182$, $\\sum \\log(x_i)y_j=2795.2484$, $\\sum x_i\\log(y_j)=772.3504$, $\\sum \\log(x_i)\\log(y_j)=96.1974$.\nSolution Linear model of flu rate on days: $\\bar x=19.25$ days, $s_x^2=215.6875$ days² . $\\bar y=118$ people, $s_y^2=3690.75$ people². $s_{xy}=876.25$ days⋅people. Regression line of flu rate on days: $y=39.7951 + 4.0626x$. $y(50) =242.9247$.\n$4.0626$ persons per day.\n$\\overline{\\log(y)}=4.6503$ log(people), $s_{\\log(y)}^2=0.2293$ log(people)². $s_{x\\log(y)}=7.0255$ days⋅log(people). Exponential model of flu rate on days: $y=e^{4.0233 + 0.0326x}$. $y(50)=284.8357$.\nLinear coefficient of determination of flu rate on days $r^2=0.9645$. Exponential coefficient of determination of flu rate on days $r^2=0.9982$. 
Thus, the exponential model explains the evolution of the flu rate with respect to the number of days slightly better.\n","date":1511740800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"74b1e63b5e7bd2e8817622bd9d83ab3a","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-11-27/","publishdate":"2017-11-27T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-11-27/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 27, 2017\nQuestion 1 The following diagram shows the NO₂ emissions (𝜇g/m³) in Madrid during the month of October, 2017.\nThe European Standards on Air Quality establish that the average monthly value cannot be over 40 𝜇g/m³ for a healthy environment.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2017-11-27","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Nov 6, 2017\nQuestion 1 An adenoma is a benign tumor that usually grows in a spherical shape. Suppose the rate of growth of the radius of a certain adenoma is equal to half the size of the radius per second; compute the rate of growth of the volume of the tumor when the radius is 5mm.\nIf the measurement of the radius has a possible error of $\\pm 0.01$mm, what will be the error in the measurement of the volume?\nNote: The volume of a sphere of radius $r$ is equal to $\\frac{4}{3}\\pi r^3$.\nSolution Rate of growth of the volume: $250\\pi$ mm³/s.\nError in the volume: $\\pi$ mm³. Question 2 The weight of a baby during the first few months of life grows at a rate proportional to the reciprocal of the weight. Suppose a baby\u0026rsquo;s weight was 3.3 kg at birth, and 4.3 kg a month later.\nWhat will be the weight of the baby one year after birth? When will the weight be equal to 8 kg? Is this a good model for the weight of a person throughout his or her whole life? 
Solution Let $t$ be the time and $w(t)$ the weight of the baby at time $t$.\nDifferential equation: $w\u0026rsquo;=\\dfrac{k}{w}$\nParticular solution: $w(t)=\\sqrt{7.6t+10.89}$.\n$w(12)=10.1$ kg. At 7 months. No, because the function is always increasing. Question 3 The function $f(x,y)=ye^{-x^2-\\frac{1}{2}y^2}$ gives the quantity $z=f(x,y)$ of a substance during a chemical process, depending on the quantities $x$ and $y$ of two other substances.\nCompute the maximum value of $z$ assuming that $x\\geq 0$ and $y\\geq 0$. What will be the variation of $z$ at $x=1$ and $y=0$ when $x$ increases twice as much as $y$? Compute the second degree Taylor polynomial of $f$ at the point $(1,0)$. Solution $f$ has a local maximum at $(0,1)$ and the maximum value is $z=f(0,1)=1/\\sqrt{e}$. Directional derivative of $f$ at $(1,0)$ along the direction of $v=(2,1)$: $f\u0026rsquo;_v(1,0)=\\frac{1}{e\\sqrt{5}}$. $P^2_{f,(1,0)}(x,y)=\\displaystyle\\frac{-2xy+3y}{e}$. Question 4 Given $h(t)=(t\\cos(t), \\cos(t), \\ln(t^2+1)),$ compute the tangent line and normal plane to the trajectory determined by $h$ at the point $(0,1,0)$.\nSolution Tangent line: $(t,1,0)$. Normal plane: $x=0$. ","date":1509926400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"7c1f54c779298d9a2f22ac02e09e6e40","permalink":"/en/teaching/calculus/exams/pharmacy-2017-11-06/","publishdate":"2017-11-06T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2017-11-06/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Nov 6, 2017\nQuestion 1 An adenoma is a benign tumor that usually grows in a spherical shape. 
Suppose the rate of growth of the radius of a certain adenoma is equal to half the size of the radius per second; compute the rate of growth of the volume of the tumor when the radius is 5mm.","tags":["Exam"],"title":"Pharmacy exam 2017-11-06","type":"book"},{"authors":null,"categories":null,"content":"Some cheat sheets for Calculus and Statistics are now available. These cheat sheets contain a summary of the main formulas used in Calculus and Statistics.\nThe cheat sheets can be downloaded from the following links:\nCalculus cheat sheets Statistics cheat sheets I would appreciate it if you let me know about any mistakes you find in these sheets.\n","date":1509618110,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"0eaeafaa3a3d783c1ed1b386110fed16","permalink":"/en/post/cheat-sheet/","publishdate":"2017-11-02T10:21:50Z","relpermalink":"/en/post/cheat-sheet/","section":"post","summary":"Some cheat sheets for Calculus and Statistics are now available. These cheat sheets contain a summary of the main formulas used in Calculus and Statistics.\n","tags":null,"title":"Calculus and Statistics cheat sheets","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: June 02, 2017\nQuestion 1 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. For each person in the sample, the average daily time using these devices was measured, along with whether or not the person had a cervical disc herniation (CDH). 
The table below summarizes the results.\n$$ \\begin{array}{crrr} \\hline \\mbox{Time (in min)} \u0026amp; \\mbox{People with CDH} \u0026amp; \\mbox{People without CDH} \u0026amp; \\mbox{Total}\\newline 0-60 \u0026amp; 2 \u0026amp; 32 \u0026amp; 34\\newline 60-120 \u0026amp; 5 \u0026amp; 86 \u0026amp; 91\\newline 120-180\t\u0026amp; 14 \u0026amp; 136 \u0026amp; 150\\newline 180-240\t\u0026amp; 21 \u0026amp; 127 \u0026amp; 148\\newline 240-300\t\u0026amp; 16 \u0026amp; 68 \u0026amp; 84\\newline 300-360\t\u0026amp; 10 \u0026amp; 12 \u0026amp; 22\\newline \\mbox{Total} \u0026amp; 68 \u0026amp; 461 \u0026amp; 529\\newline \\hline \\end{array} $$\nPlot the ogive of the global distribution of time (including people with CDH and without CDH). Plot the box plot of the global distribution of time and interpret it. In which sample is there less relative dispersion with respect to the mean, in people with CDH or in people without CDH? Which distribution is less symmetric, people with CDH or without CDH? Compute the standard score of a person with CDH that uses those devices 200 minutes a day and the same for a person without CDH. Interpret them. Use the following sums for the computations:\nPeople with CDH: $\\sum x_i=14640$, $\\sum x_i^2=3538800$, $\\sum(x_i-\\bar x)^3=-8746878.8927$.\nPeople without CDH: $\\sum x_i=78090$, $\\sum x_i^2=15650100$, $\\sum(x_i-\\bar x)^3=-3234289.0161$.\nSolution 2. People with CDH: $\\bar x=215.2941$ min, $s=75.4296$ min, $cv=0.3504$. People without CDH: $\\bar x=169.3926$ min, $s=72.4865$ min, $cv=0.4279$. Since the coefficient of variation of people with CDH is less than that of people without CDH, there is less relative spread with respect to the mean in the distribution of people with CDH. 
People with CDH: $g_1=-0.2997$.\nPeople without CDH: $g_1=-0.0184$.\nSince the coefficient of skewness of people with CDH is further from zero, that distribution is less symmetric.\nPerson with CDH: $z(200)=-0.2028$.\nPerson without CDH: $z(200)=0.4222$\nThe person with CDH has a value less than the mean but relatively closer to the mean than the person without CDH. Question 2 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. One goal of the research is to determine whether there is some relation between the average daily time using some of those devices and the number of cervical vertigo attacks in the last year. The table below shows the information collected in a sample of 12 persons.\n$$ \\begin{array}{lrrrrrrrrrrrr} \\hline \\mbox{Time (min)} \u0026amp; 344 \u0026amp; 68 \u0026amp; 24 \u0026amp; 178 \u0026amp; 218 \u0026amp; 315 \u0026amp; 262 \u0026amp; 77 \u0026amp; 152 \u0026amp; 186 \u0026amp; 144 \u0026amp; 103\\newline \\mbox{Vertigo attacks} \u0026amp; 42 \u0026amp; 3 \u0026amp; 2 \u0026amp; 6 \u0026amp; 14 \u0026amp; 31 \u0026amp; 22 \u0026amp; 3 \u0026amp; 7 \u0026amp; 9 \u0026amp; 3 \u0026amp; 4\\newline \\hline \\end{array} $$\nWhich regression model is better to predict the number of vertigo attacks given the time using these devices, the linear or the exponential? Justify the answer. Use the best regression model (the exponential or the linear) to predict the number of vertigo attacks expected for a person that uses those devices 200 minutes every day. Which regression model would you use to predict the time using those devices required to have a number of vertigo attacks, the linear, the exponential or the logarithmic? Justify the answer. 
Use the following sums for the computations ($X$=Time and $Y$=Vertigo attacks):\n$\\sum x_i=2071$, $\\sum \\log(x_i)=59.3234$, $\\sum y_j=146$, $\\sum \\log(y_j)=24.2119$,\n$\\sum x_i^2=465587$, $\\sum \\log(x_i)^2=299.5558$, $\\sum y_j^2=3618$, $\\sum \\log(y_j)^2=60.1295$,\n$\\sum x_iy_j=38162$, $\\sum x_i\\log(y_j)=5252.95$, $\\sum \\log(x_i)y_j=800.3072$, $\\sum \\log(x_i)\\log(y_j)=127.0449$.\nSolution Linear regression model of vertigo attacks on time: $\\bar x=172.5833$ min, $s_x^2=9013.9097$ min².\n$\\bar y=12.1667$ attacks, $s_y^2=153.4722$ attacks².\n$s_{xy}=1080.4028$ min⋅attacks.\n$r^2 = 0.8438$.\nExponential regression model of vertigo attacks on time: $\\overline{\\log(y)}=2.0177$ log(attacks), $s_{\\log(y)}^2=0.9398$ log(attacks)². $s_{x\\log(y)}=89.5312$ min⋅log(attacks). $r^2 = 0.9462$.\nTherefore, the exponential regression model is better since its coefficient of determination is higher.\nExponential regression model of vertigo attacks on time: $y=e^{0.3035 + 0.0099x}$.\nNumber of vertigo attacks expected for 200 min using electronic gadgets: $y(200)=9.8747$.\nSince the exponential regression model is better than the linear one to predict the number of vertigo attacks as a function of time using electronic gadgets, to predict the time as a function of the number of vertigo attacks it is better to use the inverse of the exponential regression model, that is, the logarithmic regression model.\nQuestion 3 Cervical radiculopathy occurs in 0.35% of men. The Spurling test is a test to diagnose cervical radiculopathy with a sensitivity of 95% and a specificity of 93%.\nCompute the positive and negative predictive values of the test and interpret them. Is this test a good test as a screening test (to rule out the disease)? Compute the minimum specificity of the test to be able to diagnose the cervical radiculopathy with a positive outcome. Solution $PPV=P(D|+)=0.0455$. $NPV=P(\\overline D|-)=0.9998$. 
It is a good screening test as the post-test probability of not having cervical radiculopathy for a negative outcome is very high. Minimum specificity $P(-|\\overline D)=0.9967$. Question 4 The haematocrit concentration in blood of healthy males follows a normal distribution with unknown mean and standard deviation. However, it is known that the first quartile of haematocrit is 38.5% and the third quartile is 52%.\nCompute the mean and the standard deviation of haematocrit in healthy males. Compute the percentage of healthy males with more than 64% haematocrit. Solution Let $X$ be the haematocrit level in healthy males.\n$\\mu=45.25$ and $\\sigma=10.07$, thus, $X\\sim N(45.25, 10.07)$.\n$P(X\u0026gt;64)=0.0313$, thus, $3.13$% of healthy males. Question 5 It is known that 20% of professional cyclists use Erythropoietin (EPO) to improve their physical performance, and 99% of the cyclists that use EPO also use other forbidden substances to mask the use of EPO.\nIf there are 10 professional cyclists in a team, what is the probability that more than 2 are doped with EPO? If there are 100 professional cyclists doped with EPO in a competition, what is the probability that at least 98 of them have taken some substances to mask the use of EPO? If there are 2000 professional cyclists in a country, what is the probability that at least one of them has taken EPO without masking it? Solution Let $X$ be the number of cyclists doped with EPO in a team with 10 cyclists; then $X\\sim B(10,0.2)$ and $P(X\u0026gt;2)=0.3222$. Let $Y$ be the number of cyclists that have taken some substances to mask the EPO among 100 cyclists doped with EPO; then $Y\\sim B(100,0.99)$ and $P(Y\\geq 98)=0.9206$. Let $Z$ be the number of cyclists that have taken EPO without masking it among 2000 cyclists; then $Z\\sim B(2000,0.002)\\approx P(4)$ and $P(Z\u0026gt;0)=0.9817$. 
Question 6 The probability that an injury $A$ is repeated is 4/5, the probability that another injury $B$ is repeated is 1/2, and the probability that both injuries are repeated is 1/3. Compute the probability of the following events:\nOnly injury $B$ is repeated. At least one injury is repeated. Injury $B$ is repeated if injury $A$ has been repeated. Injury $B$ is repeated if injury $A$ has not been repeated. Are the injuries independent? Solution $P(B\\cap\\overline A)=1/6$. $P(A\\cup B)=29/30$. $P(B\\vert A)=5/12$. $P(B\\vert \\overline A)=5/6$. The injuries are dependent as $P(B|A)\\neq P(B)$. ","date":1496361600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"25d9a4ba8e47ff15b3867e60e0629e2f","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-06-02/","publishdate":"2017-06-02T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-06-02/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: June 02, 2017\nQuestion 1 A study tries to determine the influence of electronic gadgets (mobile phones, tablets, consoles, etc.) on neck disorders. For each person in the sample, the average daily time using these devices was measured, along with whether or not the person had a cervical disc herniation (CDH).","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2017-06-02","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Degrees: Physiotherapy\nDate: May 19, 2017\nProbability and random variables Question 1 The prevalence of sciatica in a population is 3%. Lasegue\u0026rsquo;s test is a neurotension test that is used to diagnose sciatica with a sensitivity of 91% and a specificity of 26%. On the other hand, there is an alternative test with a sensitivity of 80% and a specificity of 90%.\nCompute the positive predictive value for Lasegue\u0026rsquo;s test. 
Assuming that the tests are independent, compute the probability of having a positive outcome in both tests. Compute the probability of getting a wrong diagnosis in Lasegue\u0026rsquo;s test or in the alternative test. Which test is better as a screening test (to rule out sciatica)? Solution $PPV=P(D|+)=0.0366$. It is not a good test to confirm sciatica as the post-test probability of having sciatica for a positive outcome is very low. Let $L⁺$ be the event of having a positive outcome in Lasegue\u0026rsquo;s test and $A⁺$ the event of having a positive outcome in the alternative test: $P(L^+\\cap A^+)=P(L^+)P(A^+)=0.7451\\cdot 0.121 = 0.0902$. Let $WL$ be the event of having a wrong diagnosis with Lasegue\u0026rsquo;s test and $WA$ the event of having a wrong diagnosis with the alternative test: $P(WL\\cup WA)=P(WL)+P(WA)-P(WL\\cap WA)=0.7205+ 0.103-0.7205\\cdot0.103=0.7493$. Lasegue\u0026rsquo;s test: $NPV=P(\\overline D|-)=0.9894$. Alternative test: $NPV=P(\\overline D|-)=0.9932$. Thus, the alternative test is better to rule out sciatica. Question 2 A physiotherapist opens a clinic and uses social networks to advertise it. In particular, he sends a friend request to 20 contacts on Facebook. If the probability that a Facebook user accepts the friend request is 80%, what is the probability that more than 18 accept the friend request? What is the expected number of friend requests accepted?\nSolution Let $X$ be the number of accepted friend requests; then $X\\sim B(20,0.8)$ and $P(X\u0026gt;18)=0.0692$. The expected number of accepted friend requests is $16$. Question 3 According to a study of the Information Society of Spain in 2013, Spaniards check their mobile phone 150 times a day on average. What is the probability that a Spanish person checks the mobile phone more than 2 times an hour?\nSolution Let $X$ be the number of times that a Spanish person checks the phone in an hour; then $X\\sim P(6.25)$ and $P(X\u0026gt;2)=0.9483$. 
Question 4 The cervical rotation in a population follows a normal probability distribution model with mean 58º and standard deviation 6º.\nBetween what values is the cervical rotation of the central 50% of the population? Taking into account the precision of the measurement instrument, a goniometer, a rotation less than 53º is considered a mobility limitation. If we take a random sample of 100 persons from this population, what is the expected number of persons with mobility limitation in the sample? Solution Let $X$ be the cervical rotation; then $X\\sim N(58, 6)$.\n$(Q1,Q3)=(53.9531, 62.0469)$. $P(X\u0026lt;53)=0.2023$ and the expected number of persons with mobility limitation in a sample of 100 persons is $20.2328$. ","date":1495152000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"80f5e75bcf897805a50ba29cbb3590f3","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-05-19/","publishdate":"2017-05-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-05-19/","section":"teaching","summary":"Degrees: Physiotherapy\nDate: May 19, 2017\nProbability and random variables Question 1 The prevalence of sciatica in a population is 3%. 
Lasegue\u0026rsquo;s test is a neurotension test that is used to diagnose sciatica with a sensitivity of 91% and a specificity of 26%.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2017-05-19","type":"book"},{"authors":null,"categories":null,"content":"In this post I offer some tips that can help undergraduate students, especially during their first years, to succeed by making the most of classes and getting the most out of their work.\nI give these pieces of advice from long teaching experience and from my own experience as a student.\nMost of these tips may seem obvious, but many students fail to put them into practice.\nSelf Motivation\nMotivation is the key to success in every project, not only in your academic education but also in your professional life. Without motivation you are likely to give up when you face the first learning difficulties, and there will certainly be some. Therefore, you should bear in mind the reasons that made you start down this path (especially when you have difficulties). That is, try to walk with one eye on the goal and the other on the path.\nA degree requires a lot of effort, but remember that many people, even with fewer capabilities than you, have finished it successfully.\nBe proactive This is the big challenge, because in primary and secondary school students get used to following the steps of the teacher, and it is the teacher who usually drives the learning process. But higher education is not compulsory, and here it is the student who has to take the initiative and drive his or her own learning.\nWhat does it mean to be proactive Explore the subject by yourself. Prepare classes in advance. Try to solve problems by yourself. Expand the information given by the teacher from other sources. Try to apply what you learn to your life or context. Let your curiosity and creativity loose. 
The teacher is your ally Unfortunately, students sometimes think of the teacher as a judge or an enemy. Nothing could be further from the truth, because the main purpose of the teacher is to help you in your learning process. You have to change your mindset and think of your teacher as your ally. So do not hesitate to ask him or her for help any time you need it.\nMake the most of your classes Classroom sessions are not the only way to learn, but they are one of the most effective. Attending class just to sign the attendance list can be counterproductive. So, if you attend a class, forget about other subjects and try to focus all your attention on the topic of the class.\nAlso, you need to know how to make proper use of each type of class. A master class tries to give you a general idea of a topic, highlighting the most important concepts. So, try to retain the key ideas and do not worry about the details.\nA seminar, on the other hand, tries to develop a topic in depth. So it is a good idea to prepare the class in advance by having a look at the topic beforehand. This way you can focus on the most difficult aspects of the topic during the class.\nFinally, in a problem-solving workshop it is important to try to solve the problems beforehand in order to find out the main difficulties or bottlenecks.\nAlso, you should ask your teacher to flip the class.\nWhat is a flipped classroom? The flipped classroom describes a reversal of traditional teaching where students gain first exposure to new material outside of class, usually via reading or lecture videos, and then class time is used to do the harder work of assimilating that knowledge through strategies such as problem-solving, discussion or debates.\nDo not be afraid to ask Related to the previous item, the interaction among the students and between the teacher and the students is essential to take advantage of classes. 
The teacher cannot guess whether you have understood a concept or are lost unless you give him or her some feedback. So, try to overcome your shyness and ask without any fear of ridicule, because most of the time your doubts are going to be the same as those of your classmates, and even if not, remember that the only stupid question is the one that you don\u0026rsquo;t ask.\nFeedback is an essential aspect to ensure solid progress in any learning process.\nTake advantage of tutoring One of the main advantages of a private university like this is the availability and closeness of the teachers. If it is clear that the teacher is your ally, you should not hesitate to use tutorials whenever you have a problem.\nEach student has a personal tutor whose function is to inform and advise him or her and to resolve any academic question that arises, from course registration until the exams. If you do not know who your tutor is, ask at the secretariat and arrange an interview with him or her as soon as possible.\nAny time you have a problem that you do not know how to solve, ask your tutor for help. Even if you do not have any problems, it is good practice to have regular meetings with your tutor to check the course progress.\nOn the other hand, every subject has its own tutoring hours. Those tutorials are usually held at the office of the subject teacher.\nUse tutoring hours for \u0026hellip;\nGetting advice from the teacher about how to approach the subject or study a topic. Reviewing concepts that you do not understand (especially if you missed a class). Getting help to solve difficult problems. Reviewing your tests or exams. But \u0026hellip;\nTutoring hours are not private classes. This means that you have to go to tutorials with a specific doubt or problem, and it requires some previous work on the question. Respect the tutoring schedule. 
Most teachers promote tutorials not only to help students, but also because they are an effective way to interact with students and to get to know each other better. However, teachers usually have other duties in addition to teaching. So, in order not to interrupt their work, try not to go to tutorials outside their schedule. If you cannot go to a tutorial during those hours, ask the teacher for an appointment. Read and write Another key to success is to have good documentation that complements what is seen in class. Many students believe that the only important information to consider is that provided by the teacher. But this is a mistake, because the information provided by the teacher is limited, incomplete, and sometimes wrong (unfortunately, teachers also make mistakes). So, try to complement the class notes with recommended readings, because good documentation can help you not only to contrast what is seen in class with other sources, but also to better understand what has been explained and to expand it with new examples, new applications, etc.\nFor each subject you should have two or three reference books. You have some recommendations in the bibliography of the course guides. Have a look at those books (most of them are available at the university library), but if you do not feel comfortable with any of them, ask the teacher for other books or try to find them by yourself on the Internet, because each student has to find his or her own book!\nOn the other hand, writing about a subject is very important for structuring and settling its main ideas. There is clear evidence that writing about a subject helps you to organize, clarify and fix the ideas in your mind.\nIt is helpful to write summaries and outlines with the key ideas of a class or a topic. 
Another good practice is to make small presentations for every chapter or topic.\nFinally, you should take into account that most assessment tests are written, so writing about the subject is good training for exams.\nWork in groups Group work is important, not only from the point of view of learning, but also because it helps to develop social habits. In most jobs it is usual to work in teams, since some tasks or problems are simplified and easier to solve working cooperatively. Something similar happens with teaching, since collaborative learning is usually faster and better (and more fun).\nIn this way, working in groups has many advantages:\nIt helps you to develop social skills. It reinforces your motivation by making you part of a team. It enriches learning with different points of view. It helps to develop critical reasoning. But working in a group is not easy (especially when somebody does not assume his or her responsibility). Acquiring the skills to work in a group requires time and practice, so the sooner you start the better.\nGet organized Review the progress of the course regularly It is important during the course to set aside some moments to analyze the progress of your work and to review the path travelled and the path ahead. These reviews are intended as a small self-assessment of the learning process: to what extent the objectives set are being met, and whether we are being faithful to the guidelines set by this survival manual. In this way, we can identify the main difficulties in learning and what is failing in order to correct the course in time. In the worst case, if we fail any subject, it is very important to make a final review trying to identify the causes, to learn from mistakes and not repeat them in the future. Do not be discouraged, and remember that even if you have not passed the subject, you have certainly not completely wasted your time. 
Be positive and think about everything you have learned, which will undoubtedly provide you with valuable experience for your life.\nReview these tips regularly Finally, do not forget to review and refresh these tips.\n","date":1493018510,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"7178c1061630403e50e1fc59e98c026d","permalink":"/en/post/survival-guide-for-degrees/","publishdate":"2017-04-24T09:21:50+02:00","relpermalink":"/en/post/survival-guide-for-degrees/","section":"post","summary":"In this post I offer some tips that can help undergraduate students, especially during their first years, to succeed by making the most of classes and getting the most out of their work.\n","tags":null,"title":"Survival guide for degrees","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy","Medicine"],"content":"Degrees: Physiotherapy, Medicine\nDate: March 31, 2017\nQuestion 1 The table below gives the distribution of points obtained by students in a physiotherapy public competition this year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.50 \u0026amp; 638140781.25\\newline (40,80] \u0026amp; 185 \u0026amp; 11100 \u0026amp; 666000 \u0026amp; -361328.13 \u0026amp; 4516601.56\\newline (80,120] \u0026amp; 72 \u0026amp; 7200 \u0026amp; 720000 \u0026amp; 1497375.00 \u0026amp; 41177812.50\\newline (120,160] \u0026amp; 40 \u0026amp; 5600 \u0026amp; 784000 \u0026amp; 12301875.00 \u0026amp; 830376562.50\\newline (160,200] \u0026amp; 19 \u0026amp; 3420 \u0026amp; 615600 \u0026amp; 23603640.63 \u0026amp; 2537391367.19\\newline \\hline \\sum \u0026amp; 400 \u0026amp; 29000 \u0026amp; 2819200 \u0026amp; 24886500.00 \u0026amp; 4051603125.00\\newline \\hline \\end{array} $$\nCompute the interquartile range and 
explain your result. Are there outliers in the sample? The minimum number of points to pass the exam is 150; what percentage of students passed the exam? If the mean score of the previous year's exam was 80 points and the standard deviation was 52 points, in which year is the mean more representative? Justify the answer. According to the values of skewness and kurtosis, can we assume that the sample has been taken from a normally distributed population? Which score is relatively higher, 150 points in this year's exam or 160 in the previous year's exam? Justify the answer. Solution $Q_1=43.48$ points, $Q_3=97.78$ points and $IQR=54.3$ points. Fences: $F_1=-37.97$ points and $F_2=179.23$ points. Thus, there are outliers. $F_{150}=0.925$, so the percentage of students that passed the exam is $7.5%$. This year: $\\bar x=72.5$ points, $s^2=1791.75$ points², $s=42.3291$ points, $cv=0.5838$. Previous year: $\\bar x=80$ points, $s=52$ points, $cv=0.65$. As the coefficient of variation of this year is less than the one of the previous year, there is less relative spread this year and the mean is more representative. $g_1=0.8203$, so the distribution is right-skewed. $g_2=0.1551$, so the distribution is a little bit more peaked than a bell curve (leptokurtic). As $g_1$ and $g_2$ are between -2 and 2 we can assume that the sample has been taken from a normally distributed population. This year standard score: $z(150)=1.83$. Previous year standard score: $z(160)=1.53$. As the standard score of 150 this year is greater than the standard score of 160 the previous year, 150 points this year is relatively higher than 160 points the previous year. Question 2 A study tries to determine the relation between obesity and the response to pain. The obesity is measured as the percentage over the ideal weight ($X$), and the response to pain with a measure of the twinge sensation. 
For a sample of 10 individuals we got the following sums:\n$\\sum x_i=737$, $\\sum y_j=77$, $\\sum x_i^2=55589$, $\\sum y_j^2=799.5$, $\\sum x_iy_j=6056.5$\nCompute the linear regression model of the response to pain on the obesity. What is the change in the response to pain for an increment of one point in the weight? What percentage of the variability of the response to pain is not explained by the linear regression model? Taking into account the parameters of the exponential model given in the table below, give the equation of the exponential model. Which transformation is required to convert this model into a linear one? $$ \\begin{array}{lr} \\hline \\mbox{Coefficient} \u0026amp; \\mbox{Estimation}\\newline \\mbox{Intercept} \u0026amp; -1.772\\newline x \u0026amp; 0.049\\newline \\hline \\end{array} \\qquad \\begin{array}{r} \\hline R^2\\newline 0.72\\newline \\hline \\end{array} $$\nWhat is the expected response to pain for an obesity of 50% according to the linear model? And according to the exponential model? Which prediction is more reliable? Solution Linear model of response to pain on obesity: $\\bar x=73.7$, $s_x^2=127.21$. $\\bar y=7.7$, $s_y^2=20.66$. $s_{xy}=38.16$ Regression line of response to pain on obesity: $y=-14.41+0.3x$. For each increment of one unit in the obesity the response to pain will increase 0.3 units. Linear coefficient of determination: $r^2=0.554$. So, the linear model explains 55.4% of the variability of the response to pain and it does not explain the remaining 44.6%. Exponential regression model: $y=e^{-1.772+0.049x}$. To compute this model you have to apply the logarithm to the dependent variable, that is, the response to pain, and then compute the regression line of the logarithm of the response to pain on obesity. 
Prediction with the linear model: $y(50)=0.59$ Prediction with the exponential model: $y(50)=1.9699$ The prediction with the exponential model is better as the exponential coefficient of determination is greater than the linear one. ","date":1490918400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"3ac241dbb67a86fa4f23770cd0210b3d","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-03-31/","publishdate":"2017-03-31T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2017-03-31/","section":"teaching","summary":"Degrees: Physiotherapy, Medicine\nDate: March 31, 2017\nQuestion 1 The table below gives the distribution of points obtained by students in a physiotherapy public competition this year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2017-03-31","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Jan 10, 2017\nQuestion 1 The rate of growth of certain bacteria population is the square root of the number of bacteria in the population. How much will the population have increased after 1 hour from the beginning of the growth? How long will it take until the population is four times the population at the beginning?\nSolution Naming $x$ to the number of bacteria and $t$ to time, $x(t)=(\\frac{t}{2}+C)^2$.\nThe number of bacteria has increased $\\frac{1}{4}+C$ after 1 hour from the beginning.\nThe number of bacteria is four times the population at the beginning at time $t=2C$. 
Question 2 The temperature of a chemical process depends on the amounts $x$ and $y$ of two substances, according to the function $T(x,y)=4x^3+y^3-3xy$. Determine the local extrema and saddle points of the temperature function (recall that the amounts $x$ and $y$ cannot be negative).\nSolution $T$ has a saddle point at $(0,0)$ and a local minimum at $(\\frac{\\sqrt[3]{4}}{4},\\frac{\\sqrt[3]{2}}{2})$. Question 3 An ecological model explains the number of individuals in a population through the function $$f(x,t)=\\dfrac{e^t}{x},$$ where $t$ is the time and $x$ the number of predators in the area. Give an approximation of the number of individuals at $t=0.1$ and $x=0.9$ using the second order Taylor polynomial of the function at the point $(1,0)$.\nSolution Second order Taylor polynomial of $f$ at point $(1,0)$: $P^2_{f,(1,0)}(x,t)=3-3x+2t+x^2+\\frac{t^2}{2}-xt$.\n$P^2_{f,(1,0)}(0.9,0.1)=1.225$. Question 4 The position of a moving object in space is given by the function $f(t)=(e^{t/2}, \\sin^2(t), \\sqrt[3]{1-t})$.\nCompute the velocity and acceleration vectors at time $t=0$.\nRemark: velocity is the variation of space with respect to time, and acceleration is the variation of velocity with respect to time. Compute an equation of the plane normal to the trajectory at time $t=0$. Solution $f\u0026rsquo;(t)=(\\frac{e^{t/2}}{2},2\\sin t \\cos t, \\frac{-(1-t)^{-2/3}}{3})$ and $f\u0026rsquo;(0)=(\\frac{1}{2},0,-\\frac{1}{3})$. $f\u0026rsquo;\u0026rsquo;(t)=(\\frac{e^{t/2}}{4},2(\\cos^2 t-\\sin^2 t), \\frac{-2(1-t)^{-5/3}}{9})$ and $f\u0026rsquo;\u0026rsquo;(0)=(\\frac{1}{4},2,-\\frac{2}{9})$. Normal plane to the trajectory at time $t=0$: $3x-2z=1$. 
","date":1484006400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"b0db71b6201932c9f355cf651505de25","permalink":"/en/teaching/calculus/exams/pharmacy-2017-01-10/","publishdate":"2017-01-10T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2017-01-10/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Jan 10, 2017\nQuestion 1 The rate of growth of certain bacteria population is the square root of the number of bacteria in the population. How much will the population have increased after 1 hour from the beginning of the growth?","tags":["Exam"],"title":"Pharmacy exam 2016-01-10","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: January 10, 2017\nDescriptive Statistics and Regression Question 1 The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.\n$$ \\begin{array}{cr} \\hline \\mbox{Time} \u0026amp; \\mbox{Patients}\n(0,10] \u0026amp; 22\\newline (10,20] \u0026amp; 43\\newline (20,30] \u0026amp; 33\\newline (30,40] \u0026amp; 12\\newline (40,50] \u0026amp; 6\\newline (50,60] \u0026amp; 4\\newline \\hline \\end{array} $$\nPlot the ogive of the waiting time. Compute the median of the distribution, and explain its meaning. What percentage of patients have waited for longer than 38 minutes? Solution 2. $Me=18.89$ min. 3. 10% of patients have waited for longer than 38 minutes. Question 2 To study fertility in two different populations $A$ and $B$, a sample of each population was taken and the number of pregnancies for each woman was recorded. 
The results of such records are shown below.\n$$ \\begin{array}{ccccccccccccccccc} \\hline A \u0026amp; 2 \u0026amp; 3 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; 2 \u0026amp; 6 \u0026amp; 1 \u0026amp; 5 \u0026amp; 3 \u0026amp; 4 \u0026amp; 4 \u0026amp; 3 \u0026amp; 2 \u0026amp; 5 \u0026amp; 0\\newline B \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 0 \u0026amp; 3 \u0026amp; 0 \u0026amp; 1 \u0026amp; 0 \u0026amp; 2 \u0026amp; 5 \u0026amp; 1 \u0026amp; 1 \u0026amp; 1\\newline \\hline \\end{array} $$\nDraw the box diagram of each sample and compare them. In which of the two samples is the mean more representative? Justify your answer. Compute the skewness coefficient for both samples; which one is more skewed? What is relatively bigger, a case of 5 pregnancies in sample $A$, or a case of 3 pregnancies in sample $B$? Consider the following sums for your computations:\n$\\sum a_i=51$, $\\sum a_i^2=199$, $\\sum (a_i-\\bar a)^3=-11.6016$, $\\sum (a_i-\\bar a)^4=217.9954$,\n$\\sum b_i=20$, $\\sum b_i^2=52$, $\\sum (b_i-\\bar b)^3=49.5$, $\\sum (b_i-\\bar b)^4=220.3125$.\nSolution 2. $\\bar a=3.1875$ pregnancies, $s_a^2=2.2773$ pregnancies², $s_a=1.5091$ pregnancies, $cv_a=0.4734$. $\\bar b=1.25$ pregnancies, $s_b^2=1.6875$ pregnancies², $s_b=1.299$ pregnancies, $cv_b=1.0392$. As the coefficient of variation of $A$ is less than the coefficient of variation of $B$, the mean of population $A$ is more representative than the mean of population $B$. 3. $g_{1,a}=-0.211$ and $g_{1,b}=1.4113$, so the distribution of $B$ is more skewed than the distribution of $A$. 5. $z_a(5)=1.2011$ and $z_b(3)=1.3472$, so 3 pregnancies is relatively bigger in population $B$ than 5 pregnancies in population $A$. Question 3 A study to find the relation between the reduction in cholesterol levels in blood and exercise has been carried out. 
The results are shown in the table below.\n$$ \\begin{array}{lrrrrrrrrrr} \\hline \\mbox{Minutes of exercise} \u0026amp; 96 \u0026amp; 106 \u0026amp; 163 \u0026amp; 207 \u0026amp; 227 \u0026amp; 244 \u0026amp; 261 \u0026amp; 271 \u0026amp; 272 \u0026amp; 301\\newline \\mbox{Cholesterol reduction (mg/dl)} \u0026amp; 4 \u0026amp; 5 \u0026amp; 8 \u0026amp; 13 \u0026amp; 15 \u0026amp; 17 \u0026amp; 22 \u0026amp; 39 \u0026amp; 31 \u0026amp; 45\\newline \\hline \\end{array} $$\nWhich regression model better explains the reduction of cholesterol as a function of the exercise time, the linear or the exponential? Justify the answer. According to the linear regression model, how much will the cholesterol reduction increase when the exercise time is increased by one minute? According to the logarithmic model, how much exercise time is needed to get a reduction of cholesterol of 100 mg/dl? Is this estimation reliable? Justify your answer. Consider the following values for your computations, where $X$=exercise time in minutes, and $Y$=cholesterol reduction: $\\sum x_i=2148$, $\\sum \\log(x_i)=53.0559$, $\\sum y_j=199$, $\\sum \\log(y_j)=27.1766$,\n$\\sum x_i^2=507082$, $\\sum \\log(x_i)^2=282.9578$, $\\sum y_j^2=5779$, $\\sum \\log(y_j)^2=80.035$,\n$\\sum x_iy_j=50750$, $\\sum x_i\\log(y_j)=6359.0468$, $\\sum \\log(x_i)y_j=1097.978$, $\\sum \\log(x_i)\\log(y_j)=147.0682$.\nSolution 1. Linear regression model of cholesterol reduction on exercise time: $\\bar x=214.8$ min, $s_x^2=4569.16$ min². $\\bar y=19.9$ mg/dl, $s_y^2=181.89$ (mg/dl)². $s_{xy}=800.48$ min⋅mg/dl. $r^2 = 0.771$. Exponential regression model of cholesterol reduction on exercise time: $\\overline{\\log(y)}=2.7177$ log(mg/dl), $s_{\\log(y)}^2=0.6178$ log(mg/dl)². $s_{x\\log(y)}=52.1504$ min⋅log(mg/dl). $r^2 = 0.9635$. Therefore, the exponential regression model is better since its coefficient of determination is higher. 2. Regression line of cholesterol reduction on exercise time: $y=-17.7312 + 0.1752x$. 
The cholesterol reduction increases 0.1752 mg/dl when the exercise time is increased by one minute. 3. Logarithmic regression model of exercise time on cholesterol reduction: $x=-14.6075 + 84.4135\\log(y)$. $x(100)=374.131$ min. Although the coefficient of determination is pretty close to 1, the estimation is not reliable since 100 mg/dl is far away from the range of values in the sample. Probability and random variables Question 4 The medical emergency service of a town gets 6 requests per day on average. This service is staffed with three shifts of 8 hours each.\nCompute the probability of getting more than 3 requests in an 8-hour shift. Compute the probability that at least one of the three shifts has no requests. Solution Naming $X$ to the number of requests in an 8-hour shift, $X\\sim P(2)$ and $P(X\u0026gt;3)=0.1429$. Naming $Y$ to the number of shifts with no requests, $Y\\sim B(3,0.1353)$ and $P(Y\u0026gt;0)=0.3535$. Question 5 The prevalence of a certain disease in a population is 10%. A diagnostic test for that disease has a sensitivity of 95% and a specificity of 85%.\nCompute the positive and negative predictive values and explain the result obtained. What is the test more useful for, to detect the disease or to rule it out? What should be the specificity of the test so that the test has a positive predictive value equal to 80%? Solution $PPV=P(D|+)=0.413$ and $NPV=P(\\overline D|-)=0.9935$. The specificity should be $97.37%$. Question 6 In a study of blood pressure on 8000 individuals, it has been recorded that 2254 people show readings of blood pressure above 130 mmHg, and 3126 individuals show readings between 110 and 130 mmHg. Assume that blood pressure is normally distributed.\nCompute the mean and standard deviation (of blood pressure). Readings above 140 mmHg are considered to be a high pressure problem. How many people in the group have such pressure problem? 
A test will flag a blood pressure problem if a patient's pressure reading is in the bottom 5% or in the top 5% of the results for the population. For what values of the blood pressure is an individual in the population considered normal? Solution Naming $X$ to the blood pressure, $X\\sim N(118.723, 19.5221)$. $P(X\u0026gt;140)=0.1379$ and there are $1103.0473$ persons with high pressure. The blood pressure is normal in the interval $(86.612, 150.8341)$. Question 7 Students in a Chemistry class need to take two exams in order to pass the subject. The percentages of students that passed the midterms were 60% for the first exam and 68% for the second. We also have that 80% of the students that passed the first midterm also passed the second midterm. A student from the class is picked randomly.\nCompute the probability that the student has failed both exams. Compute the probability that the student has passed the first exam if we know that she has failed the second exam. Solution Naming $E_1$ to the event of passing the first exam and $E_2$ to the event of passing the second exam:\n$P(\\overline E_1\\cap \\overline E_2)=0.2$. $P(E_1|\\overline E_2)=0.375$. 
","date":1484006400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"248fdb079a6a3e75e0a16c9db7579175","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-01-10/","publishdate":"2017-01-10T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2017-01-10/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: January 10, 2017\nDescriptive Statistics and Regression Question 1 The table below gives the distribution of the waiting time (in minutes) at the emergency room of a set of patients.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2017-01-10","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy, Biotechnology\nDate: November 28, 2016\nQuestion 1 The table below gives the distribution of points obtained by students in the MIR exam last year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.50 \u0026amp; 638140781.25\\newline (40,80] \u0026amp; 185 \u0026amp; 11100 \u0026amp; 666000 \u0026amp; -361328.13 \u0026amp; 4516601.56\\newline (80,120] \u0026amp; 72 \u0026amp; 7200 \u0026amp; 720000 \u0026amp; 1497375.00 \u0026amp; 41177812.50\\newline (120,160] \u0026amp; 40 \u0026amp; 5600 \u0026amp; 784000 \u0026amp; 12301875.00 \u0026amp; 830376562.50\\newline (160,200] \u0026amp; 19 \u0026amp; 3420 \u0026amp; 615600 \u0026amp; 23603640.63 \u0026amp; 2537391367.19\\newline \\hline \\sum \u0026amp; 400 \u0026amp; 29000 \u0026amp; 2819200 \u0026amp; 24886500.00 \u0026amp; 4051603125.00\\newline \\hline \\end{array} $$\nCompute the interquartile range and explain your result. Are there outliers in the sample? 
The minimum number of points to pass the exam is 150; what percentage of students passed the exam? Study the representativity of the mean. According to the values of skewness and kurtosis, can we assume that the sample has been taken from a normally distributed population? Compute the standardized points of a student that got 150 points in the MIR. Solution $Q_1=43.48$ points, $Q_3=97.78$ points and $IQR=54.3$ points. Fences: $F_1=-37.97$ points and $F_2=179.23$ points. Thus, there are outliers. $F_{150}=0.925$, so the percentage of students that passed the exam is $7.5%$. $\\bar x=72.5$ points, $s^2=1791.75$ points², $s=42.3291$ points, $cv=0.5838$. As the coefficient of variation is greater than 0.5, but not by much, there is a moderate variability and the mean is moderately representative. $g_1=0.8203$, so the distribution is right-skewed. $g_2=0.1551$, so the distribution is a little bit more peaked than a bell curve (leptokurtic). As $g_1$ and $g_2$ are between -2 and 2 we can assume that the sample has been taken from a normally distributed population. $z(150)=1.83$. Question 2 The table shows the data of the GDP (Gross Domestic Product) per capita (thousands of euros) and infant mortality (children per thousand) from 1993 till 2000.\nYear GDP Mortality 1993 17 6.0 1994 17 5.6 1995 18 5.2 1996 18 4.9 1997 19 4.6 1998 20 4.3 1999 21 4.1 2000 22 4.0 Estimate the value of the GDP for an infant mortality of 3.8 children per thousand using the linear regression model. Which regression model explains better the GDP as a function of the infant mortality, a linear model or an exponential one? If we assume that the GDP per capita in year 2001 was 23 thousand €, what will be the expected infant mortality, according to the exponential regression model? Consider the linear models of GDP on infant mortality, and infant mortality on GDP; which of the two is more reliable? 
Use the following sums for the computations ($X$=GDP and $Y$=Infant mortality): $\\sum x_i=152$, $\\sum \\log(x_i)=23.5229$, $\\sum y_j=38.7$, $\\sum \\log(y_j)=12.5344$, $\\sum x_i^2=2912$, $\\sum \\log(x_i)^2=69.2305$, $\\sum y_j^2=190.87$, $\\sum \\log(y_j)^2=19.7912$, $\\sum x_iy_j=726.5$, $\\sum x_i\\log(y_j)=236.3256$, $\\sum \\log(x_i)y_j=113.3308$, $\\sum \\log(x_i)\\log(y_j)=36.76$. Solution Linear model of GDP on infant mortality: $\\bar x=19$ 10³€, $s_x^2=3$ 10⁶€². $\\bar y=4.8375$ children per thousand, $s_y^2=0.4573$ (children per thousand)². $s_{xy}=-1.1$ 10³€⋅children per thousand. Regression line of GDP on infant mortality: $x=30.6351 - 2.4052y$. $x(3.8) =21.4954$.\n$\\overline{\\log(x)}=2.9404$ log(10³€), $s_{\\log(x)}^2=0.0081$ log(10³€)². $s_{\\log(x)y}=-0.0577$ log(10³€)⋅children per thousand. Linear coefficient of determination of GDP on infant mortality $r^2=0.8819$. Exponential coefficient of determination of GDP on infant mortality $r^2=0.9002$. Thus, the exponential model explains a little bit better the relation between the GDP and the infant mortality.\n$\\overline{\\log(y)}=1.5668$ log(children per thousand), $s_{\\log(y)}^2=0.019$ log(children per thousand)². $s_{x\\log(y)}=-0.2284$ 10³€⋅log(children per thousand). Exponential model of infant mortality on GDP: $y=e^{3.0135 - 0.0761x}$. $y(23)=3.5332$.\nThe reliability of both models is the same as they have the same coefficient of determination.\nQuestion 3 Consider two variables $X$ and $Y$. Assume that the regression lines of the linear models intersect at the point $(2,3)$, and that, according to the appropriate linear model, the expected value of $Y$ for $x=3$ is $y=1$. How much will $Y$ change, according to the linear model, when $X$ increases by one unit?\nIf the coefficient of linear correlation is $-0.8$, how much will $X$ change, according to the linear model, when $Y$ increases by one unit?\nSolution $\\bar x=2$ and $\\bar y=3$. 
$b_{yx}=-2$, so $Y$ decreases 2 units when $X$ increases by one unit. $b_{xy}=-0.32$, so $X$ decreases 0.32 units when $Y$ increases by one unit. ","date":1480291200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"5b5f6b16255621541eab98497d33e33c","permalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2016-11-28/","publishdate":"2016-11-28T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/pharmacy/pharmacy-2016-11-28/","section":"teaching","summary":"Degrees: Pharmacy, Biotechnology\nDate: November 28, 2016\nQuestion 1 The table below gives the distribution of points obtained by students in the MIR exam last year.\n$$ \\begin{array}{|c|r|r|r|r|r|} \\hline x \u0026amp; n_i \u0026amp; x_in_i \u0026amp; x_i^2n_i \u0026amp; (x_i-\\bar x)^3n_i \u0026amp; (x_i-\\bar x)^4n_i\\newline \\hline (0,40] \u0026amp; 84 \u0026amp; 1680 \u0026amp; 33600 \u0026amp; -12155062.","tags":["Exam","Statistics","Biostatistics"],"title":"Pharmacy exam 2016-11-28","type":"book"},{"authors":null,"categories":["Calculus","Pharmacy","Biotechnology"],"content":"Degrees: Pharmacy and Biotechnology\nDate: Nov 7, 2016\nQuestion 1 The amount $x$ and $y$ in mg of two compounds in a certain chemical reaction are related by the following equation: $$ \\log(\\sqrt{x^2+y^2}) = y. $$\nCompute the equations of the tangent and normal lines to the graph of $y$ as a function of $x$ at the point $(1,0)$. Compute the approximate change of the amount $y$ if $x$ changes by 2 mg, from the same point $(1,0)$. Solution Tangent line: $y=x-1$.\nNormal line: $y=-x+1$. $\\Delta y\\approx 2$ mg. Question 2 The temperature at a point $(x,y,z)$ in three-dimensional space is given by the following function: $$ T(x,y,z)= \\frac{e^{xy}}{z} $$\nSuppose we are positioned at $(1,1,1)$.\nIn which direction will the temperature decrease the fastest? What will be the rate of that decrease? What is the meaning of your result? 
Compute the directional derivative in the direction where $y$ increases twice as much as $x$, and $z$ increases half as much as $x$. What is the meaning of your result? Solution $-\\nabla T(1,1,1)=(-e,-e,e)$. The rate of decrease is $\\sqrt{3}e$. Taking the vector $\\mathbf{u}=(1,2,1/2)$, $T_{\\mathbf{u}}\u0026rsquo;(1,1,1)=5e/\\sqrt{21}$. This means that for each unit in the direction of the vector $(1,2,1/2)$ the temperature will increase $5e/\\sqrt{21}$ units. Question 3 Allometric growth refers to relationships between sizes of different parts of an organism. Suppose $x(t)$ and $y(t)$ are the size of two organs in an organism of age $t$; then the allometric relationship is given by the equation: $$ \\frac{1}{y}\\frac{dy}{dt} = k \\frac{1}{x}\\frac{dx}{dt}, $$ where $k$ is a positive constant.\nCompute the differential equation that explains $y$ as a function of $x$ (that is, take $x$ as the independent variable and $y$ as the dependent one). Solve the equation for $y$. Assume $y$ denotes the mass of a cell, and $x$ its volume, with $k=0.0794$, compute $y$ as a function of $x$ if $x=1000\\ \\mu$m$^3$ at the age at which $y$ is equal to 1 ng. Solution Differential equation: $y\u0026rsquo;=k\\dfrac{y}{x}$.\nGeneral solution: $y=cx^k$. Particular solution: $y=0.5778 x^{0.0794}$. Question 4 Find the local extrema and saddle points of the function $f(x,y)=e^y(y^2-x^2)$.\nSolution $f$ has a saddle point at $(0,0)$ and a local maximum at $(0,-2)$. 
","date":1478476800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ee35cae8cf6a9ad706be7e4918064a0b","permalink":"/en/teaching/calculus/exams/pharmacy-2016-11-07/","publishdate":"2016-11-07T00:00:00Z","relpermalink":"/en/teaching/calculus/exams/pharmacy-2016-11-07/","section":"teaching","summary":"Degrees: Pharmacy and Biotechnology\nDate: Nov 7, 2016\nQuestion 1 The amount $x$ and $y$ in mg of two compounds in a certain chemical reaction are related by the following equation: $$ \\log(\\sqrt{x^2+y^2}) = y.","tags":["Exam"],"title":"Pharmacy exam 2016-11-07","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: June 23, 2016\nQuestion 1 It is believed that the age at which a person finish their first marathon depends on gender. To check it, a sample of 180 marathon runners was drawn. For every runner it was recorded the gender, the age (in years) when they finish the first marathon and if they finish with tendinitis. The data are summarized in the table below.\nMales\u0026nbsp; Females Age Finished With tendinitis \u0026nbsp; Finished Width tendinitis (10,20] 7 2 \u0026nbsp; 3 1 (20,30] 35 12 \u0026nbsp; 22 5 (30,40] 30 6 \u0026nbsp; 29 4 (40,50] 15 2 \u0026nbsp; 22 3 (50,60] 9 1 \u0026nbsp; 3 0 (60,70] 4 0 \u0026nbsp; 1 0 Calculate the average age at which it is finished the first marathon, both of males and females. Which mean is more representative? Justify the answer.\nCalculate the interquartile range of the age for the joint distribution (joining males and females) and interpret it.\nWhat age distribution is more asymmetric, males or females distribution. Justify the answer.\nTaking into account the relative spread in each group, who finished a marathon before, a man that finished his first marathon at the age of 48 or a woman that finished her first marathon at the age of 47? 
Justify the answer.\nUsing frequencies to approximate probabilities, compute the following probabilities:\nProbability that a runner finishes their first marathon with tendinitis. Probability that a man 40 years old or younger finishes his first marathon with tendinitis. Probability that a woman who finishes her first marathon with tendinitis is between 20 and 30 years old. Use the following sums for the calculations: Males: $\\sum n_i = 100$, $\\sum x_i n_i = 3460$, $\\sum x_i^2 n_i= 134700$, $\\sum(x_i-\\bar x)^3 n_i =121987$, $\\sum(x_i-\\bar x)^4 n_i =6480792$ Females: $\\sum n_i = 80$, $\\sum x_i n_i = 2830$, $\\sum x_i^2 n_i= 107800$, $\\sum(x_i-\\bar x)^3 n_i =18346$, $\\sum(x_i-\\bar x)^4 n_i =2175992$\nSolution Males: $\\bar x_m = 34.6$ years, $s_m=12.2409$ years, $cv_m=0.3538$. Females: $\\bar x_f = 35.375$ years, $s_f=9.8035$ years, $cv_f=0.2771$. The mean of females is more representative as the coefficient of variation is lower. $IQR=16.292$ years. The spread of central data is low. Coeff. of skewness of males $g_{1m}=0.6651$ and coeff. of skewness of females $g_{1f}=0.2434$, thus the females distribution of ages is less asymmetric. Standard score for a man of 48 years $z_m(48)=1.0947$ and standard score for a woman of 47 years $z_f(47)=1.1858$, thus the man finished his first marathon relatively earlier. Naming $T$ to the event of finishing the first marathon with tendinitis, $M$ to the event of being male and $F$ to the event of being female, $P(T)=0.2$, $P(T|M\\cap \\mbox{Age}\\leq 40) = 0.2778$, $P(\\mbox{Age}\\in (20,30]|T\\cap F) = 0.3846$. Question 2 A study tries to determine if the number of muscular injuries of professional athletes depends on stress. The study lasted four years and measured the average level of stress and the number of muscular injuries suffered by a group of athletes. 
The collected data is shown in the table below.\nStress ($X$) 2.3 3.8 5.1 1.4 6.9 7.2 3.2 8.3 Injuries ($Y$) 3 6 7 2 6 8 4 8 Calculate the linear regression model of the number of injuries on stress.\nAccording to the most appropriate linear model, what stress level is expected for an athlete that suffered 4 injuries in that period?\nCalculate the logarithmic regression model of the number of injuries on stress.\nWhich regression model is better, the linear or the logarithmic? Justify the answer.\nUse the following sums for the calculations: $\\sum x_i = 38.2$, $\\sum y_j=44$, $\\sum \\log(x_i)=11.3186$, $\\sum \\log(y_j)=12.8664$, $\\sum x_i^2 = 226.28$, $\\sum y_j^2=278$, $\\sum \\log^2(x_i)=18.7028$, $\\sum \\log^2(y_j)=22.4647$, $\\sum x_iy_j = 246.4$, $\\sum x_i\\log(y_j)=69.2607$, $\\sum \\log(x_i)y_j=71.5508$, $\\sum \\log(x_i)\\log(y_j)=20.2895$.\nSolution $\\bar x=4.775$ points, $s_x^2=5.4844$ points$^2$. $\\bar y=5.5$ injuries, $s_y^2=4.5$ injuries$^2$. $s_{xy}=4.5375$ points$\\cdot$injuries. Regression line of injuries on stress: $y=1.5494 + 0.8274x$.\n$x(4)=3.2625$.\n$\\overline{\\log(x)}=1.4148$ log(points), $s_{\\log(x)}^2=0.3361$ log(points)$^2$. $s_{\\log(x)y}=1.1623$ log(points)$\\cdot$injuries. Logarithmic model of injuries on stress: $y=0.6075 + 3.458\\log(x)$.\nLinear coefficient of determination $r^2=0.8342$. Logarithmic coefficient of determination $r^2=0.8932$. Thus, the logarithmic model fits better.\nQuestion 3 A diagnostic test with a sensitivity of 96% and a specificity of 93% is used to determine a disease with a prevalence of 10%.\nWhat are the positive and negative predictive values of the test?\nIf the test is applied to 15 persons, what is the probability of having more than one positive outcome?\nIf the test is applied to 50 persons, what is the probability of having a wrong diagnosis in more than two persons?\nSolution $PPV = P(D\\vert +) = 0.6038$ and $NPV=P(\\bar D\\vert -)=0.9952$. 
Naming $X$ to the number of positive outcomes after applying the test to a sample of 15 persons, $P(X\u0026gt;1)=0.7144$. Naming $Y$ to the number of wrong diagnoses after applying the test to a sample of 50 persons, $P(Y\u0026gt;2)=0.6505$. Question 4 It is known from previous studies that the hours of study of Statistics for students that pass the subject follows a normal distribution with mean 50 hours and standard deviation unknown; while for students that fail the subject follows a normal distribution with mean unknown and standard deviation 10 hours. If 20% of students that pass study more than 70 hours and 30% of students that fail study less than 25 hours,\nCalculate the standard deviation of the hours of study distribution for students that pass and the mean of the distribution for students that fail.\nIf in a given year there are 200 students enrolled in the subject and 150 of them pass, how many of the total students have studied more than 55 hours?\nSolution Naming $H_p$ and $H_f$ to the number of hours of study for students that pass and fail respectively,\n$\\sigma_p=23.7637$ hours and $\\mu_f=30.141$ hours. $62.8244$ students. ","date":1466640000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4f497265697b1b6777e07aff2740f6f8","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-06-23/","publishdate":"2016-06-23T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-06-23/","section":"teaching","summary":"Grade: Physiotherapy\nDate: June 23, 2016\nQuestion 1 It is believed that the age at which a person finish their first marathon depends on gender. To check it, a sample of 180 marathon runners was drawn.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-06-23","type":"book"},{"authors":null,"categories":["rkTeaching","R"],"content":"The 1.3.0 version of the R package rkTeaching for learning Statistics is available to install. 
This version is updated with the 3.2.3 version of R and the 0.6.5 version of RKWard.\nThis is a transitional version towards a major update that will arrive shortly and will incorporate the internationalization of the package.\nTo install it visit the page of rkTeaching.\n","date":1464566400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"c080d61a8a40cf8fea7390fb0d1ca8b9","permalink":"/en/post/rkteaching-version1.3/","publishdate":"2016-05-30T00:00:00Z","relpermalink":"/en/post/rkteaching-version1.3/","section":"post","summary":"The 1.3.0 version of the R package rkTeaching for learning Statistics is available to install. This version is updated with the 3.2.3 version of R and the 0.6.5 version of RKWard.\n","tags":["rkTeaching","RKWard"],"title":"Released version 1.3.0 of the rkTeaching package","type":"post"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: May 19, 2016\nQuestion 1 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on gender, a sample of 390 patients (210 males and 180 females) was drawn and the recovery time was measured for every patient. The table below shows the frequencies of times.\n$$ \\begin{array}{ccc} \\hline \\mbox{Time (days)} \u0026amp; \\mbox{Males} \u0026amp; \\mbox{Females}\\newline 20-30 \u0026amp; 50 \u0026amp; 73\\newline 30-40 \u0026amp; 61 \u0026amp; 42\\newline 40-50 \u0026amp; 26 \u0026amp; 31\\newline 50-60 \u0026amp; 32 \u0026amp; 20\\newline 60-70 \u0026amp; 20 \u0026amp; 12\\newline 70-80 \u0026amp; 11 \u0026amp; 2\\newline 80-90 \u0026amp; 10 \u0026amp; 0\\newline \\hline \\end{array} $$\nCalculate the mean of recovery time for males, females and for the whole sample. Which mean is more representative, that of the recovery time of males or that of females? Justify the answer. 
Which distribution is more symmetric, the distribution of recovery time of males or the one of females? Compare the kurtosis of the recovery time of males and females. Calculate the 80th percentile of the recovery time of males. What percentage of females will have a recovery time greater than 63 days? Use the following sums for the calculations, Males: $\\sum x_in_i = 9290$ days, $\\sum x_i^2n_i=474050$ days$^2$, $\\sum(x_i-\\bar x)^3n_i = 812271.3832$ days$^3$ and $\\sum(x_i-\\bar x)^4n_i = 48895722.3971$ days$^4$. Females: $\\sum x_in_i = 6720$ days, $\\sum x_i^2n_i=282300$ days$^2$, $\\sum(x_i-\\bar x)^3n_i = 347773.3333$ days$^3$ and $\\sum(x_i-\\bar x)^4n_i = 14802393.3333$ days$^4$.\nSolution Males: $\\bar x_m=44.2381$ days, $s^2_m=300.3719$ days$^2$, $s_m=17.3312$ days and $cv_m=0.3918$. Females: $\\bar x_f=37.3333$ days, $s^2_f=174.5556$ days$^2$, $s_f=13.2119$ days and $cv_f=0.3539$. Thus, the mean of females is more representative. $g_{1m}=0.743$ and $g_{1f}=0.8378$. Thus, both distributions are right-skewed but the distribution of males is more symmetric. $g_{2m}=-0.4193$ and $g_{2f}=-0.3011$. Thus, both distributions are platykurtic, but the distribution of males is flatter. $P_{80}=59.7041$ days. $16.68%$. Question 2 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on age, a sample of 8 patients was drawn and the recovery time $Y$ (in days) and ages $X$ (in years) were measured for every patient. The table below shows the results.\nAge (years) Recovery time (days) 32 20 38 25 48 32 51 40 57 55 61 75 68 102 71 130 Calculate the regression line of the recovery time on the age. According to the linear regression model, what is the expected age for a patient with a recovery time of 100 days? Calculate the exponential regression model of the recovery time on age. Which regression model better explains the relation between the recovery time and the age, the exponential or the linear? Justify the answer. 
Use the following sums for the calculations: $\\sum x_i=426$, $\\sum \\log(x_i)=31.5425$, $\\sum y_j=479$, $\\sum \\log(y_j)=31.1866$, $\\sum x_i^2=24008$, $\\sum \\log(x_i)^2=124.909$, $\\sum y_j^2=39603$, $\\sum \\log(y_j)^2=124.7374$, $\\sum x_iy_j=29042$, $\\sum x_i\\log(y_j)=1724.5468$, $\\sum \\log(x_i)y_j=1956.6274$, $\\sum \\log(x_i)\\log(y_j)=124.2263$. Solution Linear model $\\bar x=53.25$ years, $s_x^2=165.4375$ years$^2$. $\\bar y=59.875$ days, $s_y^2=1365.3594$ days$^2$. $s_{xy}=441.9062$ years$\\cdot$days. Regression line of recovery time on age: $y=-82.3631 + 2.6711x$.\n$66.2367$ years.\nExponential model $\\overline{\\log(y)}=3.8983$ log(days), $s_{\\log(y)}^2=0.3953$ log(days)$^2$. $s_{x\\log(y)}=7.9829$ years$\\cdot$log(days). Exponential model of recovery time on age: $y=e^{1.3288 + 0.0483x}$.\nLinear coefficient of determination $r^2=0.8645$. Exponential coefficient of determination $r^2=0.9745$. So the exponential model fits better.\nQuestion 3 In a random sample of 500 people drawn from a population there are 20 persons with an injury $A$, 40 persons with another injury $B$ and 450 persons with none of the injuries. Use relative frequencies to estimate probabilities in the following questions:\nCalculate the probability that a person has both injuries. Calculate the probability that a person has some injury. Calculate the probability that a person has injury $A$ but not $B$. Calculate the probability that a person has injury $A$ if he or she has injury $B$. Calculate the probability that a person has injury $B$ if he or she doesn\u0026rsquo;t have injury $A$. Are the injuries $A$ and $B$ dependent? Solution $P(A\\cap B) = 0.02$. $P(A\\cup B) = 0.1$. $P(A-B) = 0.02$. $P(A|B) = 0.25$. $P(B|\\bar A) = 0.0625$. The injuries are dependent. Question 4 The level of severity $X$ of an injury is classified on a scale from 1 to 5, from low to high severity. 
The probability distribution of $X$ in a population is plotted below.\nCalculate and plot the distribution function. Calculate the following probabilities: $P(X\\leq 2)$, $P(X\u0026gt;3)$, $P(X=4.2)$ and $P(1\u0026lt;X\\leq 4.2)$. Calculate the mean and the standard deviation of $X$. Is the mean representative? If a level of severity of 5 is considered incurable, what is the probability of having some person with an incurable injury in a sample of 10 persons with the injury? If there are 6 persons injured per month on average, what is the probability of having more than 2 persons injured? What is the probability of having more than 1 person injured with an incurable injury? Solution $$F(x) = \\begin{cases} 0 \u0026amp; \\mbox{if } x\u0026lt;1\\newline 0.2 \u0026amp; \\mbox{if } 1\\leq x\u0026lt; 2\\newline 0.6 \u0026amp; \\mbox{if } 2\\leq x\u0026lt; 3\\newline 0.85 \u0026amp; \\mbox{if } 3\\leq x\u0026lt; 4\\newline 0.95 \u0026amp; \\mbox{if } 4\\leq x\u0026lt; 5\\newline 1 \u0026amp; \\mbox{if } x\\geq 5 \\end{cases} $$ $P(X\\leq 2)=0.6$, $P(X\u0026gt;3)=0.15$, $P(X=4.2)=0$, $P(1\u0026lt;X\\leq 4.2)=0.75$\n$\\mu = 2.4$ and $s=1.0677$. The mean is moderately representative because $cv=0.4449$.\nNaming $X$ to the number of persons having an incurable injury in a sample of 10 persons with the injury, $P(X\\geq 1)=0.4013$.\nNaming $Y$ to the number of persons injured in a month, $P(Y\u0026gt;2)=0.938$. Naming $Z$ to the number of persons injured with an incurable injury in a month, $P(Z\u0026gt;1)=0.0369$.\nQuestion 5 A diagnostic test to determine doping of athletes returns a positive outcome when the concentration of a substance in blood is greater than 4 $\\mu$g/ml. 
If the distribution of the substance concentration in doped athletes follows a normal distribution model with mean 4.5 $\\mu$g/ml and standard deviation 0.2 $\\mu$g/ml, and in non-doped athletes follows a normal distribution model with mean 3 $\\mu$g/ml and standard deviation 0.3 $\\mu$g/ml,\nwhat are the sensitivity and specificity of the test? If 10% of the athletes in a competition are doped, what are the predictive values? Solution Naming $D$ to the event of being doped, $X$ to the concentration in doped athletes and $Y$ to the concentration in non-doped athletes,\nSensitivity $P(+\\vert D) = P(X\u0026gt;4)=0.9938$ and specificity $P(-\\vert \\bar D)=P(Y\u0026lt;4)=0.9996$ PPV $P(D\\vert +) = 0.9961$ and NPV $P(\\bar D\\vert -) = 0.9993$ ","date":1463616000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1652131064,"objectID":"3ef3260185c5eb8d184e5901deb0f762","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-19/","publishdate":"2016-05-19T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-19/","section":"teaching","summary":"Grade: Physiotherapy\nDate: May 19, 2016\nQuestion 1 To check if the recovery time from a patellar tendonitis with a physiotherapy treatment depends on gender, a sample of 390 patients (210 males and 180 females) was drawn and the recovery time was measured for every patient.","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-05-19","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics","Physiotherapy"],"content":"Grade: Physiotherapy\nDate: May 13, 2016\nQuestion 1 Of all the anterior cruciate ligament of the knee injuries, the rupture occurs in 20% of cases, and to detect it there are three different tests:\nThe drawer test that analyzes the stability of the tibia. It has a sensitivity of 80% and a specificity of 99%. A radiologic study in 2 planes, that allows ruling out bone avulsion. 
It has a sensitivity of 85% and a specificity of 90%. A magnetic resonance, which is the most appropriate when there is hematoma. It has a sensitivity and a specificity of 98%. Assuming that the tests are independent,\nCompute the predictive values of the drawer test. If an individual has an anterior cruciate ligament injury, what is the probability that the radiologic study and the magnetic resonance return a positive outcome? If an individual has an anterior cruciate ligament injury, what is the probability that the radiologic study or the magnetic resonance give a wrong diagnosis? Solution $PPV_1 = P(D\\vert +_1) = 0.9524$ and $NPV_1=P(\\bar D\\vert -_1)=0.9519$. $P(+_2)=0.25$, $P(+_3)=0.212$ and $P(+_2\\cap +_3)=0.053$. $P(\\mbox{Error}_2)=0.11$, $P(\\mbox{Error}_3)=0.02$ and $P(\\mbox{Error}_2\\cup \\mbox{Error}_3)=0.1278$. Question 2 It is known that 10% of professional soccer players have a cruciate ligament injury during the league. It is also known that the ligament rupture occurs in 20% of cruciate ligament injuries.\nCalculate the probability that in a team with 20 players more than 3 have a cruciate ligament injury during the league. Calculate the probability that in a league with 200 players more than 3 have a ligament rupture. Solution Naming $X$ to the number of players in a team with a cruciate ligament injury, $P(X\u0026gt;3)=0.133$. Naming $Y$ to the number of players in a league with a ligament rupture, $P(Y\u0026gt;3)= 0.5665$. Question 3 In a blood analysis the LDL cholesterol level reference interval for a particular population is $(42,155)$ mg/dl. 
(The reference interval contains 95% of the population and is centered on the mean).\nAssuming that the LDL cholesterol level follows a normal distribution,\nCalculate the mean and the standard deviation of the LDL cholesterol level.\nAccording to the LDL cholesterol level, patients are classified into three categories of infarct risk:\nLDL cholesterol level Infarct risk Less than 100 mg/dl Low Between 100 and 160 mg/dl Medium Greater than 160 mg/dl High Calculate the percentage of people in the population that falls into every category of infarct risk.\nThe probability of having an infarct with a high risk is twice the probability of having infarct with a medium risk, and this is twice the probability of having infarct with a low risk. What is the probability of having infarct in the whole population if the probability of having infarct with a low risk is 0.01?\nSolution Naming $C$ to the LDL cholesterol level,\n$\\mu=98.5$ mg/dl and $\\sigma=28.25$ mg/dl. $P(\\mbox{Low})=P(C\u0026lt;100)=0.5199$, $P(\\mbox{Medium})=P(100\\leq C\\leq 160)=0.4654$ and $P(\\mbox{High})=P(C\u0026gt;160)=0.0146$. Thus, there are 51.99% of persons with low risk, 46.54% of persons with medium risk and 1.46% of persons with high risk. Naming $I$ to the event of having an infarct, $P(I)=0.0151$. 
","date":1463097600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"ac4bdefcce517aeee1316f2f97006ad6","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-13/","publishdate":"2016-05-13T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-05-13/","section":"teaching","summary":"Grade: Physiotherapy\nDate: May 13, 2016\nQuestion 1 Of all the anterior cruciate ligament of the knee injuries, the rupture occurs in 20% of cases, and to detect it there are three different tests:","tags":["Exam","Statistics","Biostatistics"],"title":"Physiotherapy exam 2016-05-13","type":"book"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Grade: Physiotherapy\nDate: April 01, 2016\nQuestion 1 The chart below shows the cumulative frequency distribution the maximum angle of knee deflection after a replacement of the knee cap in a group of patients.\nCalculate the quartiles and interpret them. Are there outliers in the sample? What percentage of patients have a maximum angle of knee deflection of 90 degrees? Solution $Q_1=64$, $Q_2=83.3333$, $Q_3=100$. Fences: $F_1=10$ and $F_2=154$. There are no outliers. $F_{90}=60%$. Question 2 The waiting times in a physiotherapy clinic of a sample of patiens are\n18, 8, 27, 6, 13, 26, 14, 23, 14, 31, 27, 19, 15, 20, 11, 30, 25, 23, 20, 15 Calculate the mean. Is representative? Justify the answer. Calculate the coefficient of skewness and interpret it. Calculate the coefficient of kurtosis and interpret it. Use the following sums for the calculations: $\\sum x_i=385$ min, $\\sum(x_i-\\bar x)^2=983.75$ min$^2$, $\\sum (x_i-\\bar x)^3=-601.125$ min$^3$, $\\sum (x_i-\\bar x)^4=98369.1406$ min$^4$.\nSolution $\\bar x=19.25$ min, $s^2=49.1875$ min$^2$, $s=7.0134$ min, $cv=0.3643$. As the $cv\u0026lt;0.5$ there is a low variability and the mean is representative. $g_1=-0.0871$. The distribution is almost symmetrical. $g_2=-0.9671$. 
The distribution is flatter than a bell curve (platykurtic). Question 3 A study tries to determine whether there is a relation between recovery time $Y$ (in days) of an injury and the age of the person $X$ (in years). For that purpose a sample of 15 persons with the injury was drawn with the following values:\nAge (years) Recovery time (days) 21 20 26 26 30 27 34 32 39 36 45 37 51 38 54 41 59 42 63 45 71 44 76 43 80 45 84 46 88 44 Compute the regression line of the recovery time on the age. How much does the recovery time increase for each year of age? Compute the logarithmic regression model of the recovery time on the age. Which of the previous models better explains the relation between the recovery time and the age? Justify the answer. Use the best of the previous models to predict the recovery time of a person 50 years old. Is the prediction reliable? Use the following sums for the calculations: $\\sum x_i=821$, $\\sum \\log(x_i)=58.7255$, $\\sum y_j=566$, $\\sum \\log(y_j)=54.0702$, $\\sum x_i^2=51703$, $\\sum \\log(x_i)^2=232.7697$, $\\sum y_j^2=22270$, $\\sum \\log(y_j)^2=195.7633$, $\\sum x_iy_j=33256$, $\\sum x_i\\log(y_j)=3026.6478$, $\\sum \\log(x_i)y_j=2265.458$, $\\sum \\log(x_i)\\log(y_j)=213.1763$.\nSolution Linear model $\\bar x=54.7333$ years, $s_x^2=451.1289$ years$^2$. $\\bar y=37.7333$ days, $s_y^2=60.8622$ days$^2$. $s_{xy}=151.7956$ years$\\cdot$days. Regression line of recovery time on age: $y=19.3167 + 0.3365x$. Every year of age the recovery time increases 0.3365 days.\nLogarithmic model $\\overline{\\log(x)}=3.915$ log(years), $s_{\\log(x)}^2=0.1905$ log(years)$^2$. $s_{\\log(x)y}=3.3033$ log(years)$\\cdot$days. Logarithmic model of recovery time on age: $y=-30.1526 + 17.3398\\log(x)$.\nLinear coefficient of determination $r^2=0.8392$. Logarithmic coefficient of determination $r^2=0.9411$. 
So the logarithmic model fits better.\n$y(50)=-30.1526 + 17.3398\\log(50) = 37.6812$.\n","date":1461110400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1616018106,"objectID":"fd2ee0cb4df8d68c89d890403cdbedb8","permalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-04-01/","publishdate":"2016-04-20T00:00:00Z","relpermalink":"/en/teaching/statistics/exams/physiotherapy/physiotherapy-2016-04-01/","section":"teaching","summary":"Grade: Physiotherapy\nDate: April 01, 2016\nQuestion 1 The chart below shows the cumulative frequency distribution the maximum angle of knee deflection after a replacement of the knee cap in a group of patients.","tags":["Exam","Statistics","Biostatistics","Physiotherapy"],"title":"Physiotherapy exam 2016-04-01","type":"book"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1451606400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"2414f1cf796086a6cb4c1e40f2fc1ff0","permalink":"/en/publication/innovacion-2016-2/","publishdate":"2020-09-16T21:26:03.10618Z","relpermalink":"/en/publication/innovacion-2016-2/","section":"publication","summary":"","tags":[],"title":"Innovación en la docencia de Estadística con R y rk.Teaching","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1451606400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"39bad87ab5875006e593268ed2565eaf","permalink":"/en/publication/innovacion-2016/","publishdate":"2020-09-16T21:26:01.841858Z","relpermalink":"/en/publication/innovacion-2016/","section":"publication","summary":"","tags":[],"title":"Innovación en la docencia de Estadística con R y rk.Teaching","type":"publication"},{"authors":null,"categories":null,"content":"I\u0026rsquo;m glad to offer a basic manual of Excel, the famous Microsoft Office spreadsheet. 
This manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. However, the manual also serves for learning a basic management of Excel, no matter the field of application.\nThe version of Excel used in this manual is Excel 2010, but some parts of this manual are also valid for other versions.\nThis is my first manual in English, so there are likely to be some grammatical errors. I apologize for that and I would like to ask you to correct me in the forum below. I hope you enjoy it.\n","date":1441843200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"4b5f7ad46462bd7df6d375917af9d55e","permalink":"/en/post/excel-manual/","publishdate":"2015-09-10T00:00:00Z","relpermalink":"/en/post/excel-manual/","section":"post","summary":"I\u0026rsquo;m glad to offer a basic manual of Excel, the famous Microsoft Office spreadsheet. This manual is intended mainly for students of Economics and Business Administration, and for that reason, most of the examples in this manual are applied to accounting and finance. 
However, the manual also serves for learning a basic management of Excel, no matter the field of application.\n","tags":null,"title":"New Excel manual","type":"post"},{"authors":["Alfredo Sánchez Alberca"],"categories":[],"content":"","date":1420070400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"6fd719886c7cb7d4e40b67df952eb8a7","permalink":"/en/publication/bringing-2015/","publishdate":"2020-09-16T21:26:02.037032Z","relpermalink":"/en/publication/bringing-2015/","section":"publication","summary":"","tags":[],"title":"Bringing R to non-expert users with the package RKTeaching","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"e4bec646020b6be1cad0ffbe8a72bfba","permalink":"/en/publication/bioestadistica-2014/","publishdate":"2020-09-16T21:26:02.230139Z","relpermalink":"/en/publication/bioestadistica-2014/","section":"publication","summary":"","tags":[],"title":"Bioestadística Aplicada con SPSS","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"aa51cd44895df4c2492b5da16f359322","permalink":"/en/publication/towards-2014-1/","publishdate":"2020-09-16T21:26:02.426838Z","relpermalink":"/en/publication/towards-2014-1/","section":"publication","summary":"","tags":[],"title":"Towards a Semanctic Catalog of Similarity Measures","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1388534400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"ef9a9eb14999b77d107482c74d6c3dbd","permalink":"/en/publication/towards-2014/","publishdate":"2020-09-16T21:26:01.646422Z","relpermalink":"/en/publication/towards-2014/","section":"publication","summary":"","tags":[],"title":"Towards a 
Semantic Catalog of Similarity Measures","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1356998400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"5f25873cb050b635b9d7ac567a3583fa","permalink":"/en/publication/rkteaching-2013/","publishdate":"2020-09-16T21:26:02.327055Z","relpermalink":"/en/publication/rkteaching-2013/","section":"publication","summary":"","tags":[],"title":"RKTeaching: a new R package for teaching Statistics .","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1325376000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"eff321037b1ca9bd9606d2292bb57de4","permalink":"/en/publication/rkteaching-2012/","publishdate":"2020-09-16T21:26:02.816682Z","relpermalink":"/en/publication/rkteaching-2012/","section":"publication","summary":"","tags":[],"title":"RKTeaching: Un paquete de R para la enseñanza de la Estadística","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1167609600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"99b05fedcabac3a3c427f3d002dbee36","permalink":"/en/publication/evolution-2007/","publishdate":"2020-09-16T21:26:01.746253Z","relpermalink":"/en/publication/evolution-2007/","section":"publication","summary":"","tags":[],"title":"Evolution of neuroendocrine cell population and peptidergic innervation, assessed by discriminant analysis, during postnatal development of the rat prostate","type":"publication"},{"authors":["Alfredo 
Sánchez-Alberca"],"categories":[],"content":"","date":1104537600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"1978ae6e507b7e05ad3b7c46ff392638","permalink":"/en/publication/amon-2005/","publishdate":"2020-09-16T21:26:02.916684Z","relpermalink":"/en/publication/amon-2005/","section":"publication","summary":"","tags":[],"title":"AMON: A software system for automatic generation of ontology mappings","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1072915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"199ff9472d40e6d884fe9391ee05c343","permalink":"/en/publication/framework-2004/","publishdate":"2020-09-16T21:26:02.625162Z","relpermalink":"/en/publication/framework-2004/","section":"publication","summary":"","tags":[],"title":"Framework for automatic generation of ontology mappings","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1072915200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"e58d1e5c9a8550c9c583ec27c0e329ca","permalink":"/en/publication/herramientas-2004/","publishdate":"2020-09-16T21:26:02.523121Z","relpermalink":"/en/publication/herramientas-2004/","section":"publication","summary":"","tags":[],"title":"Herramientas de trabajo cooperativo","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":1009843200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"3bdc4ec32bc413161b88aaf91a72f877","permalink":"/en/publication/aspectos-2002/","publishdate":"2020-09-16T21:26:03.009331Z","relpermalink":"/en/publication/aspectos-2002/","section":"publication","summary":"","tags":[],"title":"Aspectos técnicos de la comunidad virtual de usuarios FARMATOXI","type":"publication"},{"authors":["G; Rimbau, V; Sanchez-Alberca, A; Reverte, M; Alguacil, L F 
Repetto"],"categories":[],"content":"","date":978307200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"1787620af867a98e635721f0278eca4d","permalink":"/en/publication/farmatoxi-2001/","publishdate":"2020-09-16T21:26:03.700427Z","relpermalink":"/en/publication/farmatoxi-2001/","section":"publication","summary":"","tags":[],"title":"FARMATOXI, a new virtual community of pharmacology and toxicology","type":"publication"},{"authors":["Alfredo Sánchez-Alberca"],"categories":[],"content":"","date":946684800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600293332,"objectID":"d3e2a26e98a746b19fdefca4b7030590","permalink":"/en/publication/farmatoxi-2000/","publishdate":"2020-09-16T21:26:02.719337Z","relpermalink":"/en/publication/farmatoxi-2000/","section":"publication","summary":"","tags":[],"title":"FARMATOXI: Red temática de farmacología y toxicología de RedIris","type":"publication"},{"authors":null,"categories":["Calculus","Derive"],"content":"","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"998b467c38242c175495515600ab7700","permalink":"/en/teaching/derive/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/derive/","section":"teaching","summary":"","tags":["Problems"],"title":"Calculus with Derive","type":"teaching"},{"authors":null,"categories":["Calculus","Geogebra"],"content":"Geogebra GeoGebra is an open source interactive software intended for learning Mathematics in secondary and higher education. 
Below we present you a Calculus manual with Geogebra, focused, mainly, in the analytical resolution of calculus problems in one and several variables with the CAS view (symbolic calculus) of GeoGebra.\n","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"16fb83370cf0bce0cb1290c5f834b6ff","permalink":"/en/teaching/geogebra/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/geogebra/","section":"teaching","summary":"Geogebra GeoGebra is an open source interactive software intended for learning Mathematics in secondary and higher education. Below we present you a Calculus manual with Geogebra, focused, mainly, in the analytical resolution of calculus problems in one and several variables with the CAS view (symbolic calculus) of GeoGebra.","tags":["Problems"],"title":"Calculus with Geogebra","type":"teaching"},{"authors":null,"categories":["Statistics","Biostatistics"],"content":"Statistics formulas Statistics and Probability formulas Excel formulas Standard normal probability distribution table ","date":-62135596800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1600206500,"objectID":"27938b28aaa052fd4512812b0f2e896f","permalink":"/en/teaching/statistics/cheatsheets/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/en/teaching/statistics/cheatsheets/","section":"teaching","summary":"Everything you have to know at a glance","tags":["Cheat sheet"],"title":"Statistics Cheat Sheets","type":"book"}] \ No newline at end of file diff --git a/en/index.xml b/en/index.xml index 3f901055..b195d943 100644 --- a/en/index.xml +++ b/en/index.xml @@ -477,7 +477,7 @@ For each value or category of the variable, a bar is draw to the height of its f <p><strong>Example</strong>. 
The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.</p> -<div id="chart-142863579" class="chart pb-3" style="max-width: 100%; margin: auto;"></div> +<div id="chart-762543189" class="chart pb-3" style="max-width: 100%; margin: auto;"></div> <script> (function() { let a = setInterval( function() { @@ -487,7 +487,7 @@ For each value or category of the variable, a bar is draw to the height of its f clearInterval( a ); Plotly.d3.json("./img/absolute-barchart.json", function(chart) { - Plotly.plot('chart-142863579', chart.data, chart.layout, {responsive: true}); + Plotly.plot('chart-762543189', chart.data, chart.layout, {responsive: true}); }); }, 500 ); })(); diff --git a/en/project/rkteaching/index.html b/en/project/rkteaching/index.html index c90f8456..5f55de0f 100644 --- a/en/project/rkteaching/index.html +++ b/en/project/rkteaching/index.html @@ -280,7 +280,7 @@ - + @@ -311,7 +311,7 @@ ], "datePublished": "2020-09-01T00:00:00Z", - "dateModified": "2022-02-24T07:24:00+01:00", + "dateModified": "2024-10-24T23:25:00+02:00", "author": { "@type": "Person", @@ -737,7 +737,7 @@

rkTeaching

Last updated on - Feb 24, 2022 + Oct 24, 2024 diff --git a/en/sitemap.xml b/en/sitemap.xml index 3ddae04c..27d2d747 100644 --- a/en/sitemap.xml +++ b/en/sitemap.xml @@ -105,7 +105,7 @@ 2020-09-15T23:48:20+02:00 /en/ - 2022-06-13T17:27:00+02:00 + 2024-10-24T23:25:00+02:00 2022-06-13T17:27:00+02:00 /en/categories/ - 2022-06-13T17:27:00+02:00 + 2024-10-24T23:25:00+02:00 2022-06-13T17:27:00+02:00 /en/tags/ - 2022-06-13T17:27:00+02:00 + 2024-10-24T23:25:00+02:00 2021-10-26T23:48:09+02:00 /en/project/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 /en/category/r/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 /en/tag/software/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 2021-01-23T09:43:57+01:00 /en/tag/rkteaching/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 /en/project/rkteaching/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 /en/tag/rkward/ - 2022-02-24T07:24:00+01:00 + 2024-10-24T23:25:00+02:00 Bar chart

Example. The bar chart below shows the absolute frequency distribution of the number of children in the previous sample.

-
+