
We cannot cover mathematical modeling in detail in MA346, because it can take several courses on its own, but you can learn more about regression modeling in particular in MA252 at Bentley. In mathematical modeling and machine learning, we sometimes distinguish between a model and a fit model.

In programming, we almost never name variables with unhelpful names like k and x, because later readers of the code (or even we ourselves, reading it in two months) won’t know what k and x actually do. I’ve done so in the example code below. Compare the two versions: the first one is so much easier to read.

But when you can use NumPy, you should, for the following important reason. There is also a relatively new toolkit called Pingouin; it’s not as popular (yet?), but it has some advantages over the other two.

That linear model is technically a function of three variables; we might write it as $$f(\beta_0,\beta_1,x)$$. Some of the parameters you change very rarely, and others you change all the time. To begin, we code the model as a Python function taking inputs in this order: first $$x$$, then after it all the model parameters $$\beta_0,\beta_1$$, and so on, however many model parameters there happen to be (in this case three). Let’s try guessing $$\beta_0=1,\beta_1=2,\beta_2=3$$. (The rvs function stands for “random variates.”) If you need to bind later ones, then you can do it yourself using a lambda, as in the following example.
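As a sketch of this pattern, the model below uses the logistic-style formula and the guessed parameter values discussed in these notes; the exact code is an illustration, not the notes’ original listing:

```python
import numpy as np

# The model: x comes first, then the parameters beta0, beta1, beta2.
def my_model(x, beta0, beta1, beta2):
    return beta0 / (1 + np.exp(beta1 * (-x + beta2)))

# Binding the parameters with a lambda leaves a function of x alone.
guess = lambda x: my_model(x, 1, 2, 3)  # beta0=1, beta1=2, beta2=3

# Now we can use that on as many x inputs as we like, such as:
guess(np.array([0.0, 3.0, 6.0]))
```

Because `my_model` is written with NumPy, the bound function works on single numbers and on whole arrays alike.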
Example 1: The quadratic formula is almost always written using the letters $$a$$, $$b$$, and $$c$$.

A mathematical modeling course can help you learn how to assess the appropriateness of a given type of line, curve, or more complex model for a given situation.

math.exp(x) is $$e^x$$, so, for example, math.exp(1) computes $$e$$. If we had had to write it in pure Python, we would have used either a loop or a list comprehension, like in the example below.

We will assume you have data stored in a pandas DataFrame, and we will lift out just two columns of the DataFrame, one to be used as our $$x$$ values (independent variable) and the other as our $$y$$ values (dependent variable). Using the language from earlier in this chapter, SciPy will tell us how to bind values to the parameters $$\beta_0,\beta_1,\beta_2$$ of my_model so that the resulting function, which takes just $$x$$ as input, is the one best fit to our data. So model fitting is an example of binding the variables of a function.
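A minimal sketch of asking SciPy to do this, assuming the two columns have already been lifted out of a DataFrame into arrays `xs` and `ys` (the data here is synthetic, generated from known parameter values purely for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# The model: x first, then the parameters, as required by curve_fit.
def my_model(x, beta0, beta1, beta2):
    return beta0 / (1 + np.exp(beta1 * (-x + beta2)))

# Synthetic x and y values standing in for two DataFrame columns.
xs = np.linspace(0, 6, 25)
ys = my_model(xs, 10, 1.5, 3)  # generated from known parameters

# curve_fit starts from our guess and improves it, returning the
# best-fit parameters (and an estimate of their covariance).
betas, covariance = curve_fit(my_model, xs, ys, p0=[1, 2, 3])
```

The returned `betas` array is the binding SciPy found: substituting it into `my_model` yields the fit model, a function of $$x$$ alone.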
A fit model is the specific version of the general model that’s been tailored to suit your data. Curve-fitting is a powerful tool, and it’s easy to misuse it by fitting to your data a model that doesn’t make sense for that data. Obviously, it’s not the equation of a line, so linear regression tools like those covered in the GB213 review notebook won’t be sufficient. You start with your own guess for the parameters, and SciPy will improve it. Rounding to a few decimal places, our model is therefore the following. It fits the data very well, as you can see below.

All the functions in NumPy are vectorized, meaning that they will automatically apply themselves to every element of a NumPy array. Notice that this makes it very easy to compute certain mathematical formulas. There are some other functions useful for data work (like math.dist(), math.comb(), and math.perm()) coming in Python 3.8, but most Python tools (like pandas, NumPy, and SciPy) haven’t yet been updated to work with Python 3.8.

Example 2: Statistics always uses $$\mu$$ for the mean of a population and $$\sigma$$ for its standard deviation. You’ll probably have a particular normal distribution you want to work with, so you’ll choose $$\mu$$ and $$\sigma$$, and then you’ll want to use the function on many different values of $$x$$. In fact, SciPy’s built-in random number generating procedures let you use them either by binding arguments or not, at your preference.
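For instance, here is a sketch of that workflow with a hand-coded normal density (the function name `normal_pdf` and the chosen values $$\mu=10$$, $$\sigma=2$$ are made up for illustration):

```python
import numpy as np

# Density of a normal distribution; mu and sigma are chosen once,
# while x changes all the time.
def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Bind mu and sigma once, leaving a function of x alone.
my_normal = lambda x: normal_pdf(x, 10, 2)

# Because NumPy is vectorized, this works on many x values at once.
my_normal(np.array([8.0, 10.0, 12.0]))
```

Naming the bound parameters mu and sigma (rather than, say, m and s, or the much longer mean and standard_deviation) matches the universal statistical convention, which keeps the code readable.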
You can refer at any time to one of the appendices in these course notes, a review of GB213, but in Python. See also the slides that summarize a portion of this content.

Since NumPy implements tons of mathematical tools, why bother using the ones in Python’s built-in math module? A comprehensive list of NumPy’s math routines appears in the NumPy documentation.

If we wrote code that used mean and standard_deviation for those, it wouldn’t be hard to read, but it wouldn’t be as clear, either. So the first two parameters we choose just once, and the third parameter changes all the time.

$$y=\frac{\beta_0}{1+e^{\beta_1(-x+\beta_2)}}$$

That’s why we need SciPy to find the $$\beta$$s. Here’s how we ask it to do so.

If you had to write a loop to apply a Python function (like lambda x: x**2) to a list of 1000 entries, then the loop would (obviously) run in Python. Using vectorization saves you the work of writing loops.
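To make the contrast concrete, here is a small sketch comparing the pure-Python approaches to the vectorized one, using the hypothetical list of 1000 entries from the text:

```python
import numpy as np

values = list(range(1000))

# Pure Python: an explicit loop...
squares_loop = []
for x in values:
    squares_loop.append(x ** 2)

# ...or a list comprehension; either way, the looping happens in Python.
squares_comp = [x ** 2 for x in values]

# NumPy: vectorized, so the loop runs in fast compiled code instead.
squares_np = np.array(values) ** 2
```

All three produce the same 1000 squares; the NumPy version is both shorter to write and faster to run, which is the reason to prefer NumPy whenever you can use it.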