Introduction
Sage is a computing environment designed particularly for doing mathematics. We are going to run through a few basic tools that should help you do the calculations you have been doing all year on a bigger scale than you would do by hand.
Sage can do much, much more, and if you are interested, I recommend you start with the tutorial.
We start with a classical program nearly all tutorials start with. Think of it as a benediction, or invocation of the muse.
Click in the text box below. Type the following, exactly as you see it:
print('Hello, world.')
Next, click the evaluate link or just hold down Shift and hit Return.
That's pretty much all there is to computing. Tell the computer to do something and it does it.
We begin by using Sage as a simple calculator.
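The calculator cells of the worksheet are not reproduced here. As a minimal sketch of the kind of arithmetic meant, the following uses plain Python syntax, which a Sage cell should also accept. The numbers are made up for illustration.

```python
# Basic arithmetic with made-up numbers.
total = 3 + 4 * 5        # multiplication binds tighter than addition
power = 2 ** 10          # exponentiation
quotient = 17 // 5       # integer (floor) division
remainder = 17 % 5       # remainder after division
print(total, power, quotient, remainder)   # prints: 23 1024 3 2
```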
We can store values as variables (names can be any length, not just one letter) and can use them in other expressions.
Notice in the third line above, it appears we have a nonsensical mathematical expression my = my - 15. In computer language, this statement does not mean equality, but rather is an instruction to set the value of the variable on the left to the expression on the right. In other words, update the value of my by subtracting 15.
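A short sketch of storing and updating a variable follows. The name my comes from the text; the starting value and the name bonus are hypothetical.

```python
# Hypothetical values; only the name `my` and the update by 15 come from the text.
my = 40          # store a value under the name `my`
bonus = my + 2   # use the stored value in another expression
my = my - 15     # update: the new value of `my` is the old value minus 15
print(my, bonus)   # prints: 25 42
```

The right-hand side is evaluated first, using the old value of my, and only then is the result stored back under the same name.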
A variable need not only stand for a number. One of the most critical data types is the list, which is exactly that, an ordered list. We use the square brackets [] and separate each entry by a comma.
We can find the length (i.e. the number of elements) of any list using the len() function. We can also reference individual entries by using their index, or position, within the list. Note: computer scientists tend to start counting at 0, not 1, so if you try the following, you get an error.
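To illustrate length, indexing from 0, and the error from an index that is too large, here is a minimal sketch with a hypothetical list of five values.

```python
# A hypothetical list of five sample values.
heights = [61, 64, 66, 70, 72]

print(len(heights))   # number of entries
print(heights[0])     # first entry sits at index 0
print(heights[4])     # last entry sits at index len(heights) - 1

# Index 5 is one past the end of a five-element list, so Python raises IndexError.
try:
    heights[5]
except IndexError as err:
    message = str(err)
    print("Error:", message)
```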
Sage has a lot of useful built-in functions (actually, 23,763 as of the latest release). Some are exhibited below. You may recognize them.
Note that the standard deviation is the sample standard deviation (i.e. the divisor is n-1). If you want to get population standard deviation, you must pass the argument bias=True.
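The difference between the two divisors can be seen in a small sketch. It uses Python's standard statistics module rather than Sage's built-in functions, so the function names below (stdev, pstdev) are Python's, not Sage's, and the data are made up.

```python
from statistics import mean, median, stdev, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample

avg = mean(data)
mid = median(data)
sample_sd = stdev(data)        # divisor n - 1 (sample standard deviation)
population_sd = pstdev(data)   # divisor n (population standard deviation)

print(avg, mid, sample_sd, population_sd)
```

The sample value is always a little larger than the population value, because the same sum of squared deviations is divided by a smaller number.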
Note that Sage leaves numbers as general expressions. If you want to see a decimal expansion, invoke the N() function.
Sage has several probability models built into it, including the major ones we studied. They are accessed through the class RealDistribution. Though there are many, the relevant ones are as follows:
- Normal model - 'gaussian'
- Student's T Model - 't'
- χ² Model - 'chisquared'
The models have a lot of components and functionality (called methods and attributes). You can get a list by typing the name of the variable followed by a '.' and hitting the Tab key.
Try it with Theo here.
Of immediate use is the plot() function.
We can make fancier pictures by passing more options and combining plots.
Or combining them with other objects.
Of major importance is the cumulative distribution function (CDF), which takes a single argument and returns the area under the probability distribution below that number, i.e. the probability of a random element falling below that number. (You'll recognize its values as those listed on the various tables distributed during the class.) This is accessed through the cum_distribution_function() method. Its inverse, naturally, is included as the cum_distribution_function_inv() method.
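The relationship between a CDF and its inverse can be sketched with the standard Normal model. This example uses Python's statistics.NormalDist as a stand-in, so cdf() and inv_cdf() here are Python's method names, not the Sage methods named above.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard Normal model

p = Z.cdf(1.96)        # area under the curve below z = 1.96
z = Z.inv_cdf(0.975)   # the z-value that has 97.5% of the area below it

print(round(p, 4), round(z, 2))   # prints: 0.975 1.96
```

The two calls undo each other: feeding the output of one into the other returns the starting number, up to rounding.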
Match these values with those on your tables.
Putting It Together
Now that we have this functionality, we can automate many of the statistics tasks we perform by writing our own functions. This is done using the def keyword.
Functions
Before beginning, we must decide some parameters for our function, namely, what its inputs and outputs should be. Let's write a function CI() to compute the confidence interval for a mean. We need:
- Input: data (the values of a sample, in list form), lev (the confidence level, a number from 0 to 1)
- Output: a tuple, that is, a pair of numbers representing the endpoints of the interval
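One possible shape for such a function is sketched below. It is an assumption-laden stand-in, not the worksheet's own code: it takes its critical value from a Normal model in Python's standard library, whereas for a small sample the Student's t model would be the appropriate choice and would give a somewhat wider interval. The data in the usage line are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def CI(data, lev):
    """Return (low, high), a confidence interval for the mean of data.

    Assumption: uses a Normal critical value instead of Student's t.
    """
    n = len(data)
    center = mean(data)
    std_err = stdev(data) / sqrt(n)              # sample SD (divisor n - 1) over root n
    crit = NormalDist().inv_cdf((1 + lev) / 2)   # two-sided critical value
    margin = crit * std_err
    return (center - margin, center + margin)

# Usage with made-up data and a 95% confidence level.
low, high = CI([2, 4, 4, 4, 5, 5, 7, 9], 0.95)
print(low, high)
```

The expression (1 + lev) / 2 converts a two-sided confidence level into the one-sided area the inverse CDF expects; for lev = 0.95 it asks for the value with 97.5% of the area below it.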
Simulations
We can also use the random number generator built into Sage to run simulations of real-world phenomena. Probability models all come with a method get_random_element() which does exactly what it says it does.
Recall the problem with students' heights. We had two models for males and females and wanted to find the probability that a random pairing would place a shorter male with a taller female. As always we start with the models.
We can select a random pair as follows.
Now, let's run this experiment a thousand times to get a thousand data points.
You'll see it does not take long at all to run this "experiment". Let's see what our results were with respect to a taller female.
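The whole experiment can be sketched in plain Python with the standard random module in place of get_random_element(). The means and standard deviations below are hypothetical, chosen only so that the means are 4 inches apart, and the seed is fixed so the run can be repeated.

```python
import random

rng = random.Random(2024)   # fixed seed so the run is repeatable

# Hypothetical Normal models for heights in inches.
MALE_MEAN, MALE_SD = 70.0, 3.0
FEMALE_MEAN, FEMALE_SD = 66.0, 2.5

def random_pair():
    """Draw one (male, female) pair of heights."""
    return (rng.gauss(MALE_MEAN, MALE_SD), rng.gauss(FEMALE_MEAN, FEMALE_SD))

# Run the experiment a thousand times.
pairs = [random_pair() for _ in range(1000)]

# Count the pairs in which the female is the taller one.
taller_female = sum(1 for male, female in pairs if female > male)
proportion = taller_female / 1000
print(proportion)
```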
Now, let's check this against the theoretical probability. For this, we make a new model for the difference in height between males and females. The expected difference is 4 inches. The standard deviation comes from the rule for the difference of two independent random variables: the variances add.
The female is taller exactly when the difference is less than 0. Therefore, we compute the z-score of 0.
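A sketch of that calculation follows, using Python's statistics.NormalDist as a stand-in for the Sage model. The two standard deviations are hypothetical; only the 4-inch expected difference comes from the text.

```python
from math import sqrt
from statistics import NormalDist

MALE_SD, FEMALE_SD = 3.0, 2.5   # hypothetical standard deviations (inches)

# For independent variables the variances add, even when taking a difference.
diff_sd = sqrt(MALE_SD**2 + FEMALE_SD**2)
diff = NormalDist(mu=4, sigma=diff_sd)   # model for (male height - female height)

z = (0 - 4) / diff_sd            # z-score of a difference of 0
p_female_taller = diff.cdf(0)    # probability the difference falls below 0

print(round(z, 2), round(p_female_taller, 3))
```

Looking up the z-score in a standard Normal table gives the same probability as asking the difference model directly for the area below 0.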
This reflects rather closely the experiment above.
Exercises
1. Take one variable collected from your project and enter it here as a list.
2. Now use an appropriate model and compute either of the following:
- an appropriate confidence interval for your data.
- a p-value for your data against some expected or established result.
3. (Challenge) Write a function that will return the five-number summary for any list of data.