Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

All published worksheets from http://sagenb.org

Views: 168771
Image: ubuntu2004

Introduction

Sage is a computing environment designed particularly for doing mathematics. We are going to run through a few basic tools that should help you do the calculations you have been doing all year on a bigger sale than you would do by hand.

Sage can do much, much more, and if you are interested, I recommend you start with the tutorial.

We start with a classical program nearly all tutorials start with. Think of it as an benediction, or invocation of the muse.

Click in the text box below. type the following, exactly as you see it:

print 'Hello, world.'

Next, click the evaluate link or just hold down Shift and hit Return.

That's pretty much all there is to computing. Tell the computer to do something and it does it.

We begin by using sage as a simple calculator.

print 34/17 print 34 * 17^16 print 34/7.0
2 1654480523772673528354 4.85714285714286

We can store values as variables (names can be any length, not just one letter) and can use them in other expressions.

my = 17 print my * 2 my = my - 15 print my * 2
34 4

Notice in the third line above, it appears we have a nonsensical mathematical expression my = my - 15. In computer language, this statement does not mean equality, but rather is an instruction to set the value of the varible on the left to the expression on the right. In other words, update the value of my by subtracting 15.

A variable need not only stand for a number. One of the most critical data types is the list, which is exactly that, an ordered list. We use the square brackets [] and separate each entry by a comma.

jean = [2,4,6,0,1]
print len(jean) print jean[0] print jean[2]
5 2 6

We can find the length (i.e. the number of elements) of any list using the len() function. We can also reference individual entries by using their index, or position, within the list. Note: computer scientists tend to start counting at 0, not 1, so if you try the following, you get an error.

jean[5]
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "_sage_input_7.py", line 4, in <module> exec compile(ur'open("___code___.py","w").write(_support_.preparse_worksheet_cell(base64.b64decode("amVhbls1XQ=="),globals())+"\n"); execfile(os.path.abspath("___code___.py"))' + '\n', '', 'single') File "", line 1, in <module> File "/tmp/tmplSvdAb/___code___.py", line 2, in <module> exec compile(ur'jean[_sage_const_5 ]' + '\n', '', 'single') File "", line 1, in <module> IndexError: list index out of range

Sage has a lot of useful built-in functions (actually, 23,763 as of the latest release. Some are exhibited below. You may recognize them.

jean.sort() print jean print mean(jean) print std(jean)
[0, 1, 2, 4, 6] 13/5 sqrt(29/5)

Note that the standard deviation is the sample standard deviation (i.e. the divisor is n-1). If you want to get population standard deviation, you must pass the argument bias=True.

std(jean,bias=True)
2/5*sqrt(29)

Note that sage leaves numbers as general expressions. If you want to see a decimal expansion, invoke the N() function.

N(std(jean))
2.40831891575846

Sage has several probability models built in to it, including the major ones we studied. They are accessed through the class RealDistribution. Thoiugh there are many, the relevant ones are as follows:

  • Normal model - 'gaussian'
  • Student's T Model - 't'
  • χ2 Model = 'chisquared'
#auto # This is a comment. The system stops reading a line at '#'. Feel free to insult it here. Norman = RealDistribution('gaussian',1) # The second parameter is the std dev. The mean is always 0. Theo = RealDistribution('t',2) # The second parameter is the degrees of freedom. Kai = RealDistribution('chisquared',3) # Here as well, the second parameter is d.f.

The models have a lot of components and functionality (called methods and attributes). You can get a list by typing the name of the variable followed by a '.' and hit the Tab key.

Try it with Theo here.

Of immediate use is the plot() function.

Theo.plot()

We can make fancier pictures by passing more options and combining plots.

p1 = Norman.plot((x,-3,10),fill='axis',fillcolor='yellow') p2 = Theo.plot((x,-3,10),fill='axis',fillcolor='red',linestyle='--') p3 = Kai.plot((x,0,10),fill='axis',fillcolor='blue',linestyle=':') (p1+p2+p3).show()

Or combining them with other objects.

(s1,s2,s3) = (Norman.distribution_function(1),Norman.distribution_function(2),Norman.distribution_function(3)) p = Norman.plot((x,-3.5,4),fill='axis',fillcolor='green') p += line([(-1,s1),(1,s1),(1,s1+.03)], color='black') p += line([(-2,s2),(2,s2),(2,s2+.03)], color='black') p += line([(-3,s3),(3,s3),(3,s3+.03)], color='black') p += text('68%', (1.05,s1+.02), color='red',horizontal_alignment='left',fontsize=15) p += text('95%', (2.05,s2+.02), color='red',horizontal_alignment='left',fontsize=15) p += text('99.7%', (3.05,s3+.02), color='red',horizontal_alignment='left',fontsize=15) p.show()

Of major importance is the cumulative distribution function (CDF). Which take a single argument and returns the area under the probability distribution below that number, i.e. the probability of a random element falling below that number. (You'll recognize its values as those listed on the various tables distributed during the class.) This is accessed through the cum_distribution_function() method. Its inverse, naturally, is included as the cum_distribution_function_inv() method.

Match these values with those on your tables.

print Norman.cum_distribution_function(1) print Norman.cum_distribution_function_inv(.025)
0.841344746069 -1.95996398454

Putting It Together

Now that we have this functionality, we can automate many of the statistics tasks we perform by writing our own functions. This is done using the def keyword.

Functions

Before beginning, we must decide some parameters for our function, namely, what its inputs and outputs should be. Let's write a function CI() to compute the confidence interval for a mean. We need:

  • Input: data (the values of a sample, in list form), lev (the confidence level, a number from 0 to 1)
  • Output: a tuple, that is, a pair of numbers representing the endpoints of the interval
def CI(data,lev): # the def keyword tells the computer we are defining a function n = len(data) # count the sample size. (note all code must be indented properly) y = mean(data) # the sample mean se = std(data)/sqrt(n) # the standard error M = RealDistribution('t',n-1) # use a t model for means ts = M.cum_distribution_function_inv((lev+1)/2) #the critical value return (N(y-ts*se),N(y+ts*se)) # this return the pair of numbers representing the interval
CI([5,6,5,7,7,6,7,8,8,7,8,8,9,6,6,5,6,7,6,7,7,8,6,5,9,5,10,9,6,4,8],.9)
(6.36526659279120, 7.24763663301525)

Simulations

We can also use the random number generatorbuilt into Sage to run simulations of real-world phenomena. Probability models all come with a method get_random_element() which does exactly what it says it does.

Recall the problem with students'´ heights. We had two models for males and females and wanted to find the probability that a random pairing would place a shorter male with a taller female. As always we start with the models.

M = RealDistribution('gaussian',4) # Males had a standard deviation of four inches in height F = RealDistribution('gaussian',3) # while it was 3" for females.

We can select a random pair as follows.

d1 = (66 + F.get_random_element(),70+M.get_random_element()) print d1
(64.3311384647, 69.5198620573)

Now, let's run this experiment a thousand times to get a thousand data points.

pairs = [] # Start with a blank list. for i in range(1000): # This loops the following operation 1000 times. pairs.append((66 + F.get_random_element(),70+M.get_random_element()))

You'll see it does not take long at all to run this "experiment". Let's see what our results were with respect to a taller female.

count = 0 for i in pairs: if (i[0] > i[1]): # 'if' denotes a conditional; the following line will be executed count += 1 # if and only if the condition in parentheses is met. print count print count/1000.
221 0.221000000000000

Now, let's check this against the theoretic probability. For this, we make a new model for the difference in height between males and females. The expected difference μ\mu is 4 inches. The standard deviation σ\sigma come from the difference of the two random variables. σ=σF2+σM2\sigma = \sqrt{\sigma_F^2+\sigma_M^2}

dif = RealDistribution('gaussian',1)

The female is taller exactly when the difference is less than 0. Therefore, we compute the zz-score of 0. z=xμσ=045=.8z = \frac{x-\mu}{\sigma} = \frac{0-4}{5} = -.8

dif.cum_distribution_function(-.8)
0.211855398583

This reflects rather closely the experiment above.

Exercises

1. Take one varaible collected from your project and enter it here as a list.

2. Now use an appropriate model and compute either of the following:

  • an appropriate confidence interval for your data.
  • a PP-value for your data against some expected or established result.

 

3. (Challenge) Write a function that will return the five-number summary (Q0,Q1,,Q4)(Q_0,Q_1,\ldots,Q_4) for any list of data.