CoCalc -- 2017-11-05-144938.ipynb


Kernel: Python 2 (SageMath)

Part 1: Synthesizing original LBA data

Briefly describe your two variables. Is there any relationship between the variables? Do you have reason to believe there is a causal relationship between the variables? Why or why not? Make a prediction about how you think the results of next round data collection will turn out.


Why did you choose this variable? How will you choose which cafes to go to? How do you define a cafe? Why did you choose this?

The variables we have chosen for this assignment are the ratio of laptop users in a cafe and the WiFi speed. We chose these variables to find out whether there is a strong correlation between them. Many factors can affect the ratio of laptop users in cafes; in this assignment we will see whether WiFi speed, as the independent variable, is a strong factor. We chose to go to the Lower Nob Hill neighborhood, which has a mix of offices and apartments. We chose the 10 main cafes that have a free, working internet connection and are considered good places to work or study. We believe there is a causal relationship between the variables: we expect that the higher the WiFi speed, the higher the ratio of laptop users will be, because laptop users rely on the internet to operate most of the applications on their computers. We expect the results of this data collection to be interesting because this neighborhood is neither all apartments nor all offices, and it is not considered to be as wealthy as SoMa or the Financial District. We predict a positive linear correlation between WiFi speed and the ratio of laptop users.
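The predicted positive linear correlation can be measured with Pearson's correlation coefficient r, which is close to +1 for a strong positive linear relationship, close to 0 for no linear relationship, and close to -1 for a strong negative one. The sketch below is an illustrative implementation written for this notebook, not part of the assignment; the function name `pearson_r` is hypothetical, and the sample inputs are toy values, not collected data.

```python
from __future__ import division, print_function
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of each pair's deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of (sum of squared x deviations * sum of squared y deviations)
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs) *
                       sum((y - mean_y) ** 2 for y in ys))
    return cov / spread


# Toy values (not measurements): y rises exactly in step with x, so r is 1.0
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))
```

Applied to this study, `pearson_r(ratios, wifi_speeds)` would need exactly one WiFi reading per ratio, measured at the same cafe. An r near +1 would support the prediction, but even a strong r would not on its own show that WiFi speed causes laptop use.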

The selected area to be measured is Lower Nob Hill, in San Francisco, bounded by Powell St., Bush St., Leavenworth St., and Geary St.

In [1]:

from IPython.display import Image
Image("Screen Shot 2017-11-05 at 14.42.58.png")

Out[1]:

Laptop users ratio data collection:

In this study, you investigate the ratio of laptop users in cafes in San Francisco around noon (12:00-14:00).
The data will be collected through observations in ten different cafes within a ten-block area of the city.
Make sure the observations are collected during the same hours (the control variable) and on weekdays.
In each coffee shop, count the total number of customers and the number of people using a laptop (not including tablets or smartphones).
Then use these counts to calculate the ratio (percentage) of laptop users among the cafe's customers, i.e., what share of the total number of customers they make up.
Finally, add up the ratios and divide the sum by the number of measurements to calculate the average ratio of laptop users.
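The calculation described above can be sketched in Python. The counts are read off the fractions in the notebook's `PUL` list (for example, `4.0/12` means 4 laptop users out of 12 customers); the names `laptop_users`, `customers`, `ratios` and `average_ratio` are introduced here for illustration only.

```python
from __future__ import division, print_function

# Counts taken from the fractions in the notebook's PUL list
laptop_users = [4, 7, 3, 13, 0, 0, 1, 0, 2, 5, 1]
customers = [12, 20, 10, 15, 7, 5, 4, 3, 4, 10, 4]

# Ratio of laptop users in each observation: laptop users / all customers
ratios = [users / total for users, total in zip(laptop_users, customers)]

# Average ratio: the sum of the ratios divided by the number of measurements
average_ratio = sum(ratios) / len(ratios)

print("Average ratio of laptop users: %.3f" % average_ratio)
```

Averaging the per-cafe ratios gives every observation the same weight. Dividing the total number of laptop users by the total number of customers would instead weight busy cafes more heavily, so the two figures generally differ.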

In [35]:

import numpy as np
from numpy import median

# Ratio of laptop users in each observation: laptop users / total customers
PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4]

mean = sum(PUL) / float(len(PUL)) # The mean is the average: the sum of all the values divided by the number of values.

Median = median(PUL) # The library function sorts the values from low to high and takes the middle one. PUL has 11 values (an odd number), so the median is the 6th sorted value; with an even number of values it would average the two middle values.

Range = max(PUL) - min(PUL) # The range is the difference between the highest and the lowest values: we subtract the lowest value from the highest.

STD = np.std(PUL) # By default np.std computes the population standard deviation: subtract the mean from every value, square the differences, divide their sum by the number of values N, and take the square root. For the sample standard deviation (dividing by N - 1), use np.std(PUL, ddof=1).

ALL = [mean, Median, Range, STD]
stats = ["Mean = ", "Median = ", "Range = ", "Standard Deviation = "]

# Print each statistic next to its label
for label, value in zip(stats, ALL):
    print(label + str(value))

Out[35]:

Mean = 0.304545454545
Median = 0.3
Range = 0.866666666667
Standard Deviation = 0.248540274676
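The comments in the cell above describe each formula. As a cross-check, here is a from-scratch sketch of the same statistics in plain Python, written for illustration; the helper name `describe` is hypothetical. It computes both versions of the standard deviation: dividing by N (what `np.std` does by default, which produced the 0.2485 printed above) and dividing by N - 1 (the sample standard deviation, equivalent to `np.std(PUL, ddof=1)`).

```python
from __future__ import division, print_function
import math

PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5, 1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]


def describe(values):
    """Return mean, median, range, population SD and sample SD of a list."""
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]  # odd count: the single middle value
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values
    value_range = ordered[-1] - ordered[0]  # highest minus lowest
    squared_devs = sum((v - mean) ** 2 for v in values)
    population_std = math.sqrt(squared_devs / n)  # same as np.std(values)
    sample_std = math.sqrt(squared_devs / (n - 1))  # same as np.std(values, ddof=1)
    return mean, median, value_range, population_std, sample_std


labels = ["Mean", "Median", "Range", "Population SD", "Sample SD"]
for label, value in zip(labels, describe(PUL)):
    print("%s = %.4f" % (label, value))
```

With only 11 observations the two standard deviations differ noticeably, so it is worth stating which one is reported.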

Data Interpretation: The mean gives us the average of all the observed values. The mean comes out to about 0.305, which means that the cafes observed in the ten-block Lower Nob Hill area have an average laptop-user ratio of about 30.5%. The median gives us the midpoint of the distribution of the observed values; for this set of observations it comes out to 0.3. The mean minus the median is close to zero (about 0.0045), which is small compared with the range (0.867), so the histogram should look roughly symmetrical, perhaps slightly positively (right) skewed, because the mean is a little higher than the median. This means that most of the values sit toward the left, lower side of the histogram, with a tail toward higher values: many of the cafes have a low laptop-user ratio, and a few have a much higher one. The range is very large (0.867 on a scale where 1 is the maximum possible ratio) because the observations run from cafes with no laptop users at all (ratio 0) to one cafe where about 87% of the customers were using laptops.
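The direction of skew can also be checked numerically instead of being read off the mean and median alone. The sketch below, written for illustration, computes the moment-based (Fisher-Pearson) skewness coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the average squared and cubed deviations from the mean; a positive g1 means a longer tail toward higher ratios. The function name `skewness` is hypothetical.

```python
from __future__ import division, print_function

PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5, 1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]


def skewness(values):
    """Moment-based (Fisher-Pearson) skewness coefficient g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


mean = sum(PUL) / len(PUL)
median = sorted(PUL)[len(PUL) // 2]  # PUL has an odd number of values, so this is the middle one
print("mean - median = %.4f" % (mean - median))
print("skewness g1 = %.3f" % skewness(PUL))
```

For this data g1 is positive, which matches the mean (about 0.3045) being slightly above the median (0.3); most of the tail comes from the single cafe at about 0.87.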

In [37]:

import matplotlib.pyplot as plt

# Ratio of laptop users in each observation
PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4]
plt.hist(PUL, bins = 5) # Five equal-width bins from the lowest to the highest ratio
plt.xlabel("Ratio of laptop users")
plt.ylabel("Frequency")
plt.show()

Out[37]:

[Figure: histogram of the ratio of laptop users; x-axis: Ratio of laptop users, y-axis: Frequency]
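Since the histogram image is not reproduced here, its bar heights can be listed with `np.histogram`, which `plt.hist` relies on for its binning: with `bins=5`, both split the interval from the smallest to the largest value into five equal-width bins.

```python
from __future__ import print_function
import numpy as np

PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5, 1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]

# Five equal-width bins from min(PUL) to max(PUL), as in plt.hist(PUL, bins=5)
counts, edges = np.histogram(PUL, bins=5)

# Print each bin's interval and how many observations fall in it
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print("%.3f to %.3f: %d" % (left, right, count))
```

Most observations fall in the three lowest bins, the fourth bin is empty, and a single cafe makes up the rightmost bar.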

2.2

In [38]:

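import matplotlib.pyplot as plt

# Data
WiFi = [3050, 3000, 2800, 2700, 2600, 2400, 2300, 2047, 2000, 1980] # WiFi speed measured in each cafe
PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4] # Ratio of laptop users

# plt.scatter needs exactly one WiFi value per ratio, but PUL has 11 values and WiFi only 10.
# ASSUMPTION: the first 10 ratios belong, in order, to the cafes in the WiFi list; the 11th ratio has no WiFi reading.
n = min(len(PUL), len(WiFi))

# Plot
plt.scatter(PUL[:n], WiFi[:n]) # Searching for correlation between the two variables
plt.title('Ratio of Laptop Users vs. WiFi Speed') # Title of the diagram
plt.xlabel('Ratio of Laptop Users') # Title of the X axis
plt.ylabel('WiFi speed') # Title of the Y axis
plt.xlim(0, 1)
plt.ylim(0, 3500)
plt.show()

# cafenames was never defined, so the cafes are numbered instead
print("Cafe num." + "\t" + "Ratio" + "\t" + "WiFi")
for x in range(n):
    print(str(x + 1) + "\t\t|" + str(PUL[x]) + "\t" + str(WiFi[x]))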

Out[38]:

