Think Bayes
Second Edition
Copyright 2020 Allen B. Downey
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Exercise: This question is inspired by a question that appeared on Reddit.
Suppose you own a mango farm with 1800 trees. Every year, as harvest time approaches, you would like to estimate the total weight of the mangos on the trees.
And suppose you hire a professional mango estimator who can look at a tree and estimate the total number of mangos and their average weight. The estimator is not perfect, but they are pretty accurate on average.
Finally, suppose the estimator inspects 10 trees and reports the following data:
Mangos per tree: 50, 60, 70, 80, 90, 100...
Average weight in each tree: x, y, z
Compute posterior distributions for (1) the total number of mangos on all trees and (2) their total weight.
To get you started, I'll solve two easier problems:
First, suppose you know that the number of mangos in the trees is well described by a normal distribution with known mean and standard deviation. And suppose we compute the average weight of the mangos in each tree; some trees yield bigger mangos than others, but the distribution of averages is well described by a normal distribution with known mean and standard deviation, both in grams.
I'll assign these parameters to variables and make norm objects to represent the distributions.
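As a minimal sketch of that step, the parameters might be assigned as follows. All four numeric values and the variable names are hypothetical placeholders chosen for illustration; they are not the values from the original exercise.

```python
from scipy.stats import norm

# Hypothetical placeholder parameters (NOT the values from the exercise).
mu_count, sigma_count = 100, 15      # mangos per tree
mu_weight, sigma_weight = 200, 20    # grams, average mango weight per tree

# Frozen normal distributions representing the two quantities.
dist_count = norm(mu_count, sigma_count)
dist_weight = norm(mu_weight, sigma_weight)
```

A frozen `norm` object provides methods such as `mean()`, `std()`, and `rvs()`, which is convenient for the simulation steps that follow.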
Based on these parameters, what is the distribution for the total number of mangos on 1800 trees, and for their total weight?
To keep things simple, let's assume there is no correlation between the number of mangos and their size, so we can solve this problem by drawing random values from the two distributions independently.
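One way to sketch this simulation is to draw a count and an average weight for each of the 1800 trees, multiply them to get a per-tree weight, and sum over trees; repeating this many times approximates the distributions of the two totals. The parameter values, the number of simulations, and the variable names below are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(17)
n_trees = 1800
n_sims = 1000   # number of simulated harvests (arbitrary choice)

# Hypothetical placeholder parameters (NOT the values from the exercise).
dist_count = norm(100, 15)    # mangos per tree
dist_weight = norm(200, 20)   # grams, average mango weight per tree

# Independent draws: one row per simulated harvest, one column per tree.
# Note: a normal model yields non-integer counts; that is a simplification.
counts = dist_count.rvs(size=(n_sims, n_trees), random_state=rng)
weights = dist_weight.rvs(size=(n_sims, n_trees), random_state=rng)

# Totals for each simulated harvest.
total_count = counts.sum(axis=1)
total_weight_kg = (counts * weights).sum(axis=1) / 1000

print(total_count.mean(), total_count.std())
print(total_weight_kg.mean(), total_weight_kg.std())
```

Because the draws are independent, the spread of each total is small relative to its mean: with fixed parameters, tree-to-tree variation largely averages out over 1800 trees.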
Next, suppose we know from previous harvests that the parameters we used in the previous step vary from year to year, depending on the ages of the trees, weather, and other factors. And suppose that the distributions of the two means are well modeled by normal distributions, and the distributions of the two standard deviations, under the transformation $t = n s^2 / \sigma^2$, are well modeled by chi-square distributions (where $n$ is the sample size, $s$ is the sample standard deviation, and $\sigma$ is the population standard deviation).
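A rough sketch of this two-level simulation follows. It assumes the transformation has the standard form t = n s^2 / sigma^2, with t following a chi-square distribution with n - 1 degrees of freedom, and it treats sigma as a long-run value and s as the value in a given year. Those modeling choices, the helper `draw_sigma`, and every numeric value are assumptions for illustration, not part of the original exercise.

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(18)
n_trees = 1800
n_years = 500   # number of simulated years (arbitrary choice)
n = 10          # hypothetical sample size used in the transformation

# Hypothetical year-to-year distributions of the two means.
mu_count_dist = norm(100, 5)     # mean mangos per tree
mu_weight_dist = norm(200, 10)   # mean of average weight, in grams

# Hypothetical long-run standard deviations.
sigma_count, sigma_weight = 15, 20

def draw_sigma(sigma, n, size, rng):
    """Draw t ~ chi2(n - 1), then invert t = n * s**2 / sigma**2 to get s."""
    t = chi2(n - 1).rvs(size=size, random_state=rng)
    return sigma * np.sqrt(t / n)

# One set of parameters per simulated year.
mu_counts = mu_count_dist.rvs(size=n_years, random_state=rng)
mu_weights = mu_weight_dist.rvs(size=n_years, random_state=rng)
sigma_counts = draw_sigma(sigma_count, n, n_years, rng)
sigma_weights = draw_sigma(sigma_weight, n, n_years, rng)

# Per-tree draws for each year, using that year's parameters.
shape = (n_years, n_trees)
counts = rng.normal(mu_counts[:, None], sigma_counts[:, None], size=shape)
weights = rng.normal(mu_weights[:, None], sigma_weights[:, None], size=shape)

total_count = counts.sum(axis=1)
total_weight_kg = (counts * weights).sum(axis=1) / 1000

print(total_count.mean(), total_count.std())
print(total_weight_kg.mean(), total_weight_kg.std())
```

In this sketch the totals are much more spread out than they would be with fixed parameters, because uncertainty about a year's mean affects all 1800 trees at once and does not average out.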