Project: LBA2
Kernel: Python 2 (SageMath)

Part 1: Synthesizing original LBA data [#variables]

Briefly describe your two variables. Is there any relationship between the variables? Do you have reason to believe there is a causal relationship between the variables? Why or why not? Make a prediction about how you think the results of next round data collection will turn out.

Why did you choose this variable? How will you choose which cafes to go to? How do you define a cafe? Why did you choose this?

The variables we have chosen for this assignment are the ratio of laptop users in a cafe and the WiFi speed. We chose these variables to find out whether there is a strong correlation between them. Many factors can affect the ratio of laptop users in cafes, and in this assignment we will see whether WiFi speed, as the independent variable, is a strong factor. We chose to go to the Lower Nob Hill neighborhood, which has a mix of offices and apartments. We chose the 10 main cafes with a free, working internet connection that were considered to be good places to work or study. We believe there is a causal relationship between the variables because we expect that the higher the WiFi speed is, the higher the ratio of laptop users will be: laptop users need the internet to operate most of the applications on their computers. We expect the results of this data collection to be interesting because this neighborhood is not all apartments or offices, and it is not considered to be as wealthy as SoMa or the Financial District. We predict a positive linear correlation.

The selected area to be measured is Lower Nob Hill, in San Francisco, bounded by Powell St., Bush St., Leavenworth St., and Geary St.

from IPython.display import Image
Image("Screen Shot 2017-11-05 at 14.42.58.png")
Image in a Jupyter notebook

Laptop users ratio data collection:

  • In this study, you investigate the ratio of laptop users in a cafe in San Francisco at noon (12:00-14:00).

  • The data will be collected through observations conducted in ten different cafes in a ten-block area of the city.

  • Make sure all observations are collected during the same hours (time of day is the control variable) and on weekdays.

  • In this area you should count the number of customers in each of the coffee shops, and the number of people using a laptop (not including tablets or smartphones).

  • Then use the data to calculate the ratio (percentage) of laptop users among the cafe's customers (what share of the total number they make up).

  • Finally, divide the sum of the ratios by the number of measurements to calculate the average ratio of laptop users.
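The steps above can be sketched in a few lines of Python. This is a minimal sketch rather than the notebook's own code: the (laptop users, customers) pairs are the counts that appear as fractions in the notebook's PUL list, while the names observations and laptop_ratio are assumptions made for illustration.

```python
# Each pair is (laptop users, total customers) for one observation.
observations = [(4, 12), (7, 20), (3, 10), (13, 15), (0, 7), (0, 5),
                (1, 4), (0, 3), (2, 4), (5, 10), (1, 4)]


def laptop_ratio(laptops, customers):
    """Return the share of customers using a laptop (hypothetical helper)."""
    if customers <= 0:
        raise ValueError("a cafe with no customers has no defined ratio")
    return laptops / float(customers)


# One ratio per observation, then the average over all observations.
ratios = [laptop_ratio(laptops, customers) for laptops, customers in observations]
average_ratio = sum(ratios) / len(ratios)

print("Average ratio of laptop users: %.3f" % average_ratio)
```

Keeping the raw counts instead of pre-computed fractions makes it easy to check each observation against the field notes.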

import numpy as np
from numpy import median

PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4]

mean = sum(PUL) / float(len(PUL)) # The mean is the average. To calculate it, we divide the sum of all the items of data by the number of items.
Median = median(PUL) # This library function finds the median by sorting the list from low to high and taking the value in the middle. This list has an odd number of values (11), so the median is the single middle value; with an even number of values it would be the average of the two middle ones.
Range = (max(PUL) - min(PUL)) # The range is the difference between the highest and lowest values: we subtract the lowest value from the highest.
STD = np.std(PUL) # Standard deviation: subtract the mean from every value, square the differences, sum them, divide by the length of the list, and take the square root. By default np.std divides by N; np.std(PUL, ddof=1) would divide by N - 1.

ALL = [mean, Median, Range, STD]
stats = ["Mean = ","Median = ", "Range = ", "Standard Deviation = "]
for y in range(4):
    print (str(stats[y])+str(ALL[y]))
Mean = 0.304545454545
Median = 0.3
Range = 0.866666666667
Standard Deviation = 0.248540274676
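The standard deviation can be computed with two different divisors, so it is worth checking which one produced the figure above. The sketch below, which assumes only NumPy, compares np.std with its default ddof=0 (divide by N) and with ddof=1 (divide by N - 1) against the formula written out by hand. The variable names are illustrative.

```python
import math

import numpy as np

PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5,
       1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]

n = len(PUL)
mean = sum(PUL) / float(n)

# Sum of squared deviations from the mean, shared by both forms.
squared_deviations = sum((x - mean) ** 2 for x in PUL)

population_std = math.sqrt(squared_deviations / n)      # divisor N
sample_std = math.sqrt(squared_deviations / (n - 1))    # divisor N - 1

print("By hand, divisor N:     " + str(population_std))
print("np.std(PUL):            " + str(np.std(PUL)))
print("By hand, divisor N - 1: " + str(sample_std))
print("np.std(PUL, ddof=1):    " + str(np.std(PUL, ddof=1)))
```

The default call agrees with the divisor-N form, so the reported 0.2485 is a population standard deviation; the sample form is slightly larger.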

Data Interpretation: The mean gives us the average of all the chosen values. It comes out to be 0.304, which means that, on average, 30.4% of the customers in a cafe in the ten-block area of Lower Nob Hill were using a laptop. The median gives us the midpoint of the sorted values. For this set of observations, the median comes out to be 0.3. The mean minus the median is almost zero: the difference is about 0.0045, which is small compared to the range (0.867). The histogram will therefore look roughly symmetrical, perhaps slightly positively skewed, because the mean is higher than the median. This means that most of the values sit towards the left side of the histogram, at the lower end of the range, with a tail towards the right. In other words, many of the cafes have a low ratio of laptop users. The range is very high for a ratio (0.867, when 1 is the maximum), which means there are both very low values (0%) and one measurement of almost 87%.
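One way to check the direction of the skew numerically, rather than by eye, is to compare the mean with the median and to compute a moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). A positive coefficient indicates a longer right tail, a negative one a longer left tail. This is a minimal sketch assuming NumPy; the variable names are illustrative.

```python
import numpy as np

PUL = np.array([4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5,
                1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4])

mean = PUL.mean()
median = np.median(PUL)

# Central moments of the data (population form, divisor N).
m2 = np.mean((PUL - mean) ** 2)
m3 = np.mean((PUL - mean) ** 3)
skewness = m3 / m2 ** 1.5

print("Mean - median: " + str(mean - median))
print("Skewness coefficient: " + str(skewness))
```

With only eleven observations the coefficient is sensitive to a single extreme value, so it is best read as a rough indication.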

import matplotlib.pyplot as plt
import numpy as np

PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4]

plt.hist(PUL, bins = 5)
plt.xlabel("Ratio of laptop users")
plt.ylabel("Frequency")
#fig = plt.gcf()
<matplotlib.text.Text at 0x7fecd02440d0>
Image in a Jupyter notebook
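The bar heights of the histogram can also be read off as numbers. The sketch below uses np.histogram with the same five equal-width bins over the data range; the names counts and edges are illustrative.

```python
import numpy as np

PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5,
       1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]

# Five equal-width bins spanning min(PUL) to max(PUL).
counts, edges = np.histogram(PUL, bins=5)

for i in range(len(counts)):
    label = "%.3f to %.3f" % (edges[i], edges[i + 1])
    print(label + ": " + str(counts[i]))
```

Printing the counts next to the bin edges shows how many observations fall in the low bins compared with the single high value.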

2.2

import matplotlib.pyplot as plt

# Data
WiFi = [3050,3000,2800,2700,2600,2400,2300, 2047, 2000, 1980 ] # WiFi speed
PUL = [4.0/12,7.0/20,3.0/10,13.0/15,0.0/7,0.0/5,1.0/4,0.0/3,2.0/4,5.0/10,1.0/4] # Ratio of laptop users
# Note: PUL has 11 values but WiFi has only 10.

# Plot
plt.scatter(PUL, WiFi) # Searching for correlation between the two variables. The lists differ in length, so this call raises the ValueError shown below.
plt.title('Ratio of laptop Users VS. Wifi Signal') # Title of the diagram
plt.xlabel('Ratio of Laptop Users') # Title of the X axis.
plt.ylabel('WiFi speed') # Title of the Y axis.
plt.xlim(0, 1)
plt.ylim(0, 3500)
plt.show()

print ("Cafe' num." + "\t" + "Ratio" )
for x in range (10):
    print (str(cafenames[x])+"\t"+"\t""|"+str(PUL[x])) # cafenames is not defined anywhere in this notebook
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-cb790ff62592> in <module>()
      7 
      8 # Plot
----> 9 plt.scatter(PUL, WiFi) # Searching for correlation between the two variables
     10 plt.title('Ratio of laptop Users VS. Wifi Signal') # Title of the diagram
     11 plt.xlabel('Ratio of Laptop Users') # Title of the X axis.

/ext/sage/sage-8.0/local/lib/python2.7/site-packages/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, hold, data, **kwargs)
   3249                          vmin=vmin, vmax=vmax, alpha=alpha,
   3250                          linewidths=linewidths, verts=verts,
-> 3251                          edgecolors=edgecolors, data=data, **kwargs)
   3252     finally:
   3253         ax.hold(washold)

/ext/sage/sage-8.0/local/lib/python2.7/site-packages/matplotlib/__init__.pyc in inner(ax, *args, **kwargs)
   1810                     warnings.warn(msg % (label_namer, func.__name__),
   1811                                   RuntimeWarning, stacklevel=2)
-> 1812             return func(ax, *args, **kwargs)
   1813         pre_doc = inner.__doc__
   1814         if pre_doc is None:

/ext/sage/sage-8.0/local/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   3838         y = np.ma.ravel(y)
   3839         if x.size != y.size:
-> 3840             raise ValueError("x and y must be the same size")
   3841 
   3842         s = np.ma.ravel(s)  # This doesn't have to match x, y in size.

ValueError: x and y must be the same size
Image in a Jupyter notebook
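The ValueError above arises because PUL holds 11 ratios while WiFi holds 10 readings. A hedged sketch of one way to guard against the mismatch and to measure the correlation follows. It assumes, purely for illustration, that the first ten ratios belong to the ten WiFi readings in order; the notebook does not say which value is the extra one, so the resulting coefficient should not be read as a finding. The function names paired and plot_scatter are hypothetical.

```python
import numpy as np

WiFi = [3050, 3000, 2800, 2700, 2600, 2400, 2300, 2047, 2000, 1980]
PUL = [4.0/12, 7.0/20, 3.0/10, 13.0/15, 0.0/7, 0.0/5,
       1.0/4, 0.0/3, 2.0/4, 5.0/10, 1.0/4]


def paired(x, y):
    """Trim both lists to the shorter length (an assumption, see the text)."""
    n = min(len(x), len(y))
    return list(x[:n]), list(y[:n])


ratios, speeds = paired(PUL, WiFi)

# Pearson correlation coefficient between the two paired lists.
r = np.corrcoef(ratios, speeds)[0, 1]
print("Pairs used: " + str(len(ratios)))
print("Pearson r: " + str(r))


def plot_scatter(x, y):
    """Draw the scatter plot; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt
    plt.scatter(x, y)
    plt.title("Ratio of laptop users vs. WiFi speed")
    plt.xlabel("Ratio of laptop users")
    plt.ylabel("WiFi speed")
    plt.xlim(0, 1)
    plt.ylim(0, 3500)
    plt.show()


# To draw the figure, call: plot_scatter(ratios, speeds)
```

A more reliable fix is to go back to the field notes and store each cafe's ratio and WiFi reading together, so the two lists cannot drift out of step.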