CoCalc -- gaussian

GitHub Repository: restrepo/ComputationalMethods
Path: blob/master/material/gaussian_fit.ipynb

Kernel: Python 3 (ipykernel)

Gaussian fit

Based on the data collected with this Google form: https://bit.ly/guesbookpages, which records guesses for the number of pages in a book, we fit the Gaussian

$$f(x)=a\exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]$$

where $a$ is the height of the Gaussian, $\mu$ is the mean (expected value), and $\sigma$ is the standard deviation.
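As a quick check of these definitions, the sketch below evaluates the function at $x=\mu$, where it should equal $a$, and at $x=\mu+\sigma$, where it should drop to $a\,e^{-1/2}$. The parameter values are illustrative assumptions, not fitted results.

```python
import numpy as np

def gaussian(x, a, mu, sigma):
    """Gaussian of height a, centred at mu, with standard deviation sigma."""
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Illustrative values (assumed, not taken from the form data)
a, mu, sigma = 10.0, 700.0, 200.0

peak = gaussian(mu, a, mu, sigma)               # height at the centre: a
one_sigma = gaussian(mu + sigma, a, mu, sigma)  # height one sigma away: a*exp(-1/2)
print(peak, one_sigma / peak)
```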

In [7]:

%pylab inline
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit

def gaussian(x,a,μ,σ):
    # Gaussian of height a, centered at μ, with standard deviation σ
    return a*np.exp(-(x-μ)**2/(2*σ**2))

Out[7]:

Populating the interactive namespace from numpy and matplotlib

In [2]:

df=pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vTu_XE2dAiTcjHTfbaVKt7xEl_GnNeF_VYFsIBi5uM-gqBlBRfNHso-X1z3lxV7IW2f9UYKmZkSOYv-/pub?output=csv')
#Convert to integer
df['Guess']=df['Guess'].str.replace(',','').astype(int)
# Configure binned data
bins=range(0,1500,100)

In [3]:

(df[((df.Guess>=0) & (df.Guess<100))].shape[0],
 df[((df.Guess>=100) & (df.Guess<200))].shape[0],
 df[((df.Guess>=200) & (df.Guess<300))].shape[0],
 df[((df.Guess>=300) & (df.Guess<400))].shape[0],
 df[((df.Guess>=400) & (df.Guess<500))].shape[0])

Out[3]:

(0, 1, 0, 3, 8)
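The same per-bin counts can be obtained in a single call with `numpy.histogram`, which uses half-open bins `[left, right)` except for the last one, which also includes its right edge. A minimal sketch, assuming a made-up `guesses` array chosen only to reproduce the counts above (it is not the form data):

```python
import numpy as np

# Hypothetical guesses, chosen to reproduce the counts shown above
guesses = np.array([150, 320, 350, 399, 400, 410, 420, 450, 460, 470, 480, 499])
edges = np.arange(0, 600, 100)   # bin edges 0, 100, ..., 500

# counts[i] is the number of guesses with edges[i] <= guess < edges[i+1]
counts, _ = np.histogram(guesses, bins=edges)
print(counts)
```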

In [4]:

y,x,p=plt.hist(df['Guess'],bins=bins)

Out[4]:

In [1]:

%pylab inline

Out[1]:

%pylab is deprecated, use %matplotlib inline and import the required libraries.
Populating the interactive namespace from numpy and matplotlib

In [2]:

plt.hist?

Out[2]:

Signature:
plt.hist(
    x,
    bins=None,
    range=None,
    density=False,
    weights=None,
    cumulative=False,
    bottom=None,
    histtype='bar',
    align='mid',
    orientation='vertical',
    rwidth=None,
    log=False,
    color=None,
    label=None,
    stacked=False,
    *,
    data=None,
    **kwargs,
)
Docstring:
Compute and plot a histogram.

This method uses `numpy.histogram` to bin the data in *x* and count the
number of values in each bin, then draws the distribution either as a
`.BarContainer` or `.Polygon`. The *bins*, *range*, *density*, and
*weights* parameters are forwarded to `numpy.histogram`.

If the data has already been binned and counted, use `~.bar` or
`~.stairs` to plot the distribution::

    counts, bins = np.histogram(x)
    plt.stairs(bins, counts)

Alternatively, plot pre-computed bins and counts using ``hist()`` by
treating each bin as a single point with a weight equal to its count::

    plt.hist(bins[:-1], bins, weights=counts)

The data input *x* can be a singular array, a list of datasets of
potentially different lengths ([*x0*, *x1*, ...]), or a 2D ndarray in
which each column is a dataset. Note that the ndarray form is
transposed relative to the list form. If the input is an array, then
the return value is a tuple (*n*, *bins*, *patches*); if the input is a
sequence of arrays, then the return value is a tuple
([*n0*, *n1*, ...], *bins*, [*patches0*, *patches1*, ...]).

Masked arrays are not supported.

Parameters
----------
x : (n,) array or sequence of (n,) arrays
    Input values, this takes either a single array or a sequence of
    arrays which are not required to be of the same length.

bins : int or sequence or str, default: :rc:`hist.bins`
    If *bins* is an integer, it defines the number of equal-width bins
    in the range.

    If *bins* is a sequence, it defines the bin edges, including the
    left edge of the first bin and the right edge of the last bin;
    in this case, bins may be unequally spaced.  All but the last
    (righthand-most) bin is half-open.  In other words, if *bins* is::

        [1, 2, 3, 4]

    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
    the second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which
    *includes* 4.

    If *bins* is a string, it is one of the binning strategies
    supported by `numpy.histogram_bin_edges`: 'auto', 'fd', 'doane',
    'scott', 'stone', 'rice', 'sturges', or 'sqrt'.

range : tuple or None, default: None
    The lower and upper range of the bins. Lower and upper outliers
    are ignored. If not provided, *range* is ``(x.min(), x.max())``.
    Range has no effect if *bins* is a sequence.

    If *bins* is a sequence or *range* is specified, autoscaling
    is based on the specified bin range instead of the
    range of x.

density : bool, default: False
    If ``True``, draw and return a probability density: each bin
    will display the bin's raw count divided by the total number of
    counts *and the bin width*
    (``density = counts / (sum(counts) * np.diff(bins))``),
    so that the area under the histogram integrates to 1
    (``np.sum(density * np.diff(bins)) == 1``).

    If *stacked* is also ``True``, the sum of the histograms is
    normalized to 1.

weights : (n,) array-like or None, default: None
    An array of weights, of the same shape as *x*.  Each value in
    *x* only contributes its associated weight towards the bin count
    (instead of 1).  If *density* is ``True``, the weights are
    normalized, so that the integral of the density over the range
    remains 1.

cumulative : bool or -1, default: False
    If ``True``, then a histogram is computed where each bin gives the
    counts in that bin plus all bins for smaller values. The last bin
    gives the total number of datapoints.

    If *density* is also ``True`` then the histogram is normalized such
    that the last bin equals 1.

    If *cumulative* is a number less than 0 (e.g., -1), the direction
    of accumulation is reversed.  In this case, if *density* is also
    ``True``, then the histogram is normalized such that the first bin
    equals 1.

bottom : array-like, scalar, or None, default: None
    Location of the bottom of each bin, ie. bins are drawn from
    ``bottom`` to ``bottom + hist(x, bins)`` If a scalar, the bottom
    of each bin is shifted by the same amount. If an array, each bin
    is shifted independently and the length of bottom must match the
    number of bins. If None, defaults to 0.

histtype : {'bar', 'barstacked', 'step', 'stepfilled'}, default: 'bar'
    The type of histogram to draw.

    - 'bar' is a traditional bar-type histogram.  If multiple data
      are given the bars are arranged side by side.
    - 'barstacked' is a bar-type histogram where multiple
      data are stacked on top of each other.
    - 'step' generates a lineplot that is by default unfilled.
    - 'stepfilled' generates a lineplot that is by default filled.

align : {'left', 'mid', 'right'}, default: 'mid'
    The horizontal alignment of the histogram bars.

    - 'left': bars are centered on the left bin edges.
    - 'mid': bars are centered between the bin edges.
    - 'right': bars are centered on the right bin edges.

orientation : {'vertical', 'horizontal'}, default: 'vertical'
    If 'horizontal', `~.Axes.barh` will be used for bar-type histograms
    and the *bottom* kwarg will be the left edges.

rwidth : float or None, default: None
    The relative width of the bars as a fraction of the bin width.  If
    ``None``, automatically compute the width.

    Ignored if *histtype* is 'step' or 'stepfilled'.

log : bool, default: False
    If ``True``, the histogram axis will be set to a log scale.

color : color or array-like of colors or None, default: None
    Color or sequence of colors, one per dataset.  Default (``None``)
    uses the standard line color sequence.

label : str or None, default: None
    String, or sequence of strings to match multiple datasets.  Bar
    charts yield multiple patches per dataset, but only the first gets
    the label, so that `~.Axes.legend` will work as expected.

stacked : bool, default: False
    If ``True``, multiple data are stacked on top of each other If
    ``False`` multiple data are arranged side by side if histtype is
    'bar' or on top of each other if histtype is 'step'

Returns
-------
n : array or list of arrays
    The values of the histogram bins. See *density* and *weights* for a
    description of the possible semantics.  If input *x* is an array,
    then this is an array of length *nbins*. If input is a sequence of
    arrays ``[data1, data2, ...]``, then this is a list of arrays with
    the values of the histograms for each of the arrays in the same
    order.  The dtype of the array *n* (or of its element arrays) will
    always be float even if no weighting or normalization is used.

bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin).  Always a single array even when multiple data
    sets are passed in.

patches : `.BarContainer` or list of a single `.Polygon` or list of such objects
    Container of individual artists used to create the histogram
    or list of such containers if there are multiple input datasets.

Other Parameters
----------------
data : indexable object, optional
    If given, the following parameters also accept a string ``s``, which is
    interpreted as ``data[s]`` (unless this raises an exception):

    *x*, *weights*

**kwargs
    `~matplotlib.patches.Patch` properties

See Also
--------
hist2d : 2D histogram with rectangular bins
hexbin : 2D histogram with hexagonal bins

Notes
-----
For large numbers of bins (>1000), plotting can be significantly faster
if *histtype* is set to 'step' or 'stepfilled' rather than 'bar' or
'barstacked'.
File:      /usr/local/lib/python3.9/dist-packages/matplotlib/pyplot.py
Type:      function

In [5]:

Out[5]:

array([   0,  100,  200,  300,  400,  500,  600,  700,  800,  900, 1000,
       1100, 1200, 1300, 1400])

In [6]:

# Plot the histogram and extract the binned data
y,x,p=plt.hist(df['Guess'],bins=bins)
# Use the right edge of each bin as its x value
# (bin centers would shift the fitted μ down by half a bin width, 50 pages)
x=x[1:]
# Show the chosen points
plt.plot(x,y,'k.')

# Gaussian fit
# Initial guess `p0` to start the fit
a=1
μ=500
σ=100

fit=curve_fit(gaussian,x,y,p0=[a,μ,σ])[0]
print('Fitted values are: a={:.1f}, μ={:.1f}, σ={:.1f}'.format(fit[0],fit[1],fit[2]))
x=np.linspace(0,1400)
plt.plot(x,gaussian(x,*fit),lw=3)

plt.grid()
plt.xlabel('x [pages]',size=15)
plt.ylabel('$f(x)$',size=15)
plt.title( r'$f(x)=%.0f\cdot \exp\left[ -{(x-%.0f)^2}/{(2\cdot %.0f^2)} \right]$' %(fit[0],fit[1],fit[2]))
plt.savefig('gaussian.png')

Out[6]:

Fitted values are: a=10.2, μ=726.2, σ=237.7
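`curve_fit` also returns the covariance matrix of the fitted parameters as its second element, which the cell above discards by taking `[0]`. The sketch below shows how that matrix could be turned into one-standard-deviation uncertainties. It uses synthetic data (an assumed Gaussian plus small noise), not the form data, so the numbers it prints are not results of this notebook.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Synthetic binned data (assumed for illustration): a known Gaussian plus noise
rng = np.random.default_rng(0)
x = np.arange(50, 1400, 100)   # bin centers 50, 150, ..., 1350
y = gaussian(x, 10.0, 700.0, 200.0) + rng.normal(0, 0.3, x.size)

# popt holds the best-fit parameters, pcov their covariance matrix
popt, pcov = curve_fit(gaussian, x, y, p0=[1, 500, 100])
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation parameter uncertainties

for name, val, err in zip(['a', 'mu', 'sigma'], popt, perr):
    print(f'{name} = {val:.1f} ± {err:.1f}')
```

Note that `perr[1]` is the uncertainty on the fitted center $\mu$, which is a different quantity from the width $\sigma$ of the distribution of guesses.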

Conclusion

The estimated number of pages is $726\pm 238$, that is, the fitted $\mu\pm\sigma$.

Another way to obtain the bins:

In [ ]:

xx=df.Guess.value_counts(bins=bins)

In [ ]:

df=pd.DataFrame( {'X':xx.index.right,'Y':xx.values} )
df=df.sort_values('X').reset_index(drop=True)

In [ ]:

df
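One detail worth checking with this approach: `value_counts(bins=...)` builds right-closed intervals `(left, right]`, whereas `plt.hist` uses `[left, right)`, so a guess that falls exactly on a bin edge is counted in a different bin. A small sketch with hypothetical guesses (not the form data) makes the difference visible:

```python
import pandas as pd

# Hypothetical guesses, for illustration only; 400 and 500 sit on bin edges
guesses = pd.Series([150, 320, 350, 400, 410, 450, 500])
edges = list(range(0, 700, 100))   # bin edges 0, 100, ..., 600

# sort=False keeps the bins in order instead of sorting by count
counts = guesses.value_counts(bins=edges, sort=False)
binned = pd.DataFrame({'X': counts.index.right, 'Y': counts.values})
print(binned)
```

Here 400 is counted in the bin ending at 400 and 500 in the bin ending at 500, while a histogram with the same edges would place them in the next bin up.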

In [10]:

df=pd.read_csv('/home/restrepo/Downloads/lens-export_Book.csv')

In [11]:

df.to_excel('/home/restrepo/Downloads/lens-export_Book.xlsx',index=False)

In [92]:

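def hola(func):
    # The body of `hola` runs once, when the decorator is applied
    def mundo(x):
        return func(x)  # forward the call to the decorated function
    print('hola'+' '+'mundo')
    return mundo

@hola  # equivalent to foos = hola(foos); prints 'hola mundo' immediately
def foos(x):
    return x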

Out[92]:

hola mundo

In [144]:

def hola(func):
    def function_wrapper(x):
        res = 'hola mundo '+str(func(x))
        return res
    return function_wrapper

@hola
def mundano(n):
    return n+' y despiadado'

@hola
def mundito(n):
    return n+' y floreciente'

In [145]:

mundano('cruel')

Out[145]:

'hola mundo cruel y despiadado'

In [146]:

mundito('brillante')

Out[146]:

'hola mundo brillante y floreciente'
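With the decorator written as above, the decorated functions lose their own metadata: `mundano.__name__` becomes `'function_wrapper'`. A common remedy is `functools.wraps` from the standard library, sketched here on the same example:

```python
import functools

def hola(func):
    @functools.wraps(func)   # copy __name__, __doc__, etc. from func to the wrapper
    def function_wrapper(x):
        return 'hola mundo ' + str(func(x))
    return function_wrapper

@hola
def mundano(n):
    return n + ' y despiadado'

print(mundano('cruel'))    # hola mundo cruel y despiadado
print(mundano.__name__)    # mundano
```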

In [122]:

foo=mundo(hola)

In [126]:

x=foo('mundo')

Out[126]:

Before calling function_wrapper
Before calling hola
hola mundo
After calling hola
hola 2
After calling function_wrapper

In [127]:

Out[127]:

2

In [87]:

foo=hola(foo)

Out[87]:

hola mundo

In [88]:

Out[88]:

hola mundo

In [90]:

foo('hola')

In [ ]:
