Find Expected Value of Continuous Random Variable With Multiple Variables

You are generating the mean and variance of Y = 2X, when you want the mean and variance of the X's themselves. You know the density, but the CDF is more useful for random variate generation than the PDF. For your problem, the density is:

f(x) = 2x for 0 ≤ x ≤ 1

so the CDF is:

F(x) = x^2 for 0 ≤ x ≤ 1

Given that the CDF is an easily invertible function on the range [0,1], you can use inverse transform sampling to generate X values by setting F(X) = U, where U is a Uniform(0,1) random variable, and inverting the relationship to solve for X. For your problem, this yields X = U^(1/2) = sqrt(U).
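Spelled out, that step is just integrating the density to get the CDF and then inverting it (a short worked version in standard notation):

    F(x) = \int_0^x 2t \, dt = x^2, \qquad 0 \le x \le 1

    U = F(X) = X^2 \;\Longrightarrow\; X = F^{-1}(U) = \sqrt{U}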

In other words, you can generate X values with

    import numpy as np

    N = 100_000
    X = np.sqrt(np.random.uniform(size=N))   # inverse transform: X = F^(-1)(U) = sqrt(U)

and then do anything you want with the data, such as calculating the mean and variance, plotting histograms, using the values in simulation models, or whatever.

A histogram will confirm that the generated data have the desired density:

    import matplotlib.pyplot as plt

    plt.hist(X, bins=100, density=True)
    plt.show()

produces the following plot:

    [Figure: histogram of the generated data, plotted as density]

The mean and variance estimates can then be calculated directly from the data:

    print(np.mean(X), np.var(X))
    # => 0.6661509538922444 0.05556962913014367
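For reference, the exact mean and variance follow directly from the density, so the simulated estimates above can be checked against them:

    \mathrm{E}[X] = \int_0^1 x \cdot 2x \, dx = \tfrac{2}{3} \approx 0.6667

    \mathrm{E}[X^2] = \int_0^1 x^2 \cdot 2x \, dx = \tfrac{1}{2}

    \mathrm{Var}(X) = \tfrac{1}{2} - \left(\tfrac{2}{3}\right)^2 = \tfrac{1}{18} \approx 0.0556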

But wait! There's more...


Margin of error

Simulation generates random data, so estimates of mean and variance will be variable across repeated runs. Statisticians use confidence intervals to quantify the magnitude of the uncertainty in statistical estimates. When the sample size is sufficiently large to invoke the central limit theorem, an interval estimate of the mean is calculated as (x-bar ± half-width), where x-bar is the estimate of the mean. For a so-called 95% confidence interval, the half-width is 1.96 * s / sqrt(n) where:

  • s is the estimated standard deviation;
  • n is the number of samples used in the estimates of mean and standard deviation; and
  • 1.96 is a scaling constant derived from the normal distribution and the desired level of confidence.

The half-width is a quantitative measure of the margin of error, a.k.a. precision, of the estimate. Note that as n gets larger, the estimate has a smaller margin of error and becomes more precise, but there are diminishing returns to increasing the sample size due to the square root. Increasing the precision by a factor of 2 would require 4 times the sample size if independent sampling is used.

In Python:

    var = np.var(X)
    print(np.mean(X), var, 1.96 * np.sqrt(var / N))

produces results such as

    0.6666763186360812 0.05511848269208021 0.0014551397290634852

where the third number printed is the confidence interval half-width.
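To see the square-root scaling described above in action, here is a minimal sketch (the helper name half_width and the use of numpy's default_rng are illustrative choices, not part of the original code):

    import numpy as np

    def half_width(n, seed=None):
        """95% CI half-width for the mean of X = sqrt(U), estimated from n samples."""
        rng = np.random.default_rng(seed)
        x = np.sqrt(rng.uniform(size=n))
        return 1.96 * x.std(ddof=1) / np.sqrt(n)

    print(half_width(100_000))   # roughly 0.0015
    print(half_width(400_000))   # roughly half that, despite using 4x the data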

Improving precision

Inverse transform sampling can yield greater precision for a given sample size if we use a clever trick based on fundamental properties of expectation and variance. In intro prob/stats courses you probably were told that Var(X + Y) = Var(X) + Var(Y). The true relationship is actually Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y), where Cov(X,Y) is the covariance between X and Y. If they are independent, the covariance is 0 and the general relationship reduces to the one we learn/teach in intro courses, but if they are not independent the more general equation must be used. Variance is never negative, but covariance can be either positive or negative. Consequently, if X and Y have negative covariance, the variance of their sum is smaller than it would be if they were independent. Negative covariance means that when X is above its mean, Y tends to be below its mean, and vice-versa.
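For reference, here are the identities being used, with the covariance written out (standard definitions, nothing specific to this problem):

    \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)

    \mathrm{Cov}(X, Y) = \mathrm{E}\big[(X - \mathrm{E}[X])\,(Y - \mathrm{E}[Y])\big]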

So how does that help? It helps because we can use the inverse transform, along with a technique known as antithetic variates, to create pairs of random variables which are identically distributed but have negative covariance. If U is a random variable with a Uniform(0,1) distribution, then U' = 1 - U also has a Uniform(0,1) distribution. (In fact, flipping any symmetric distribution about its center produces the same distribution.) As a result, X = F^(-1)(U) and X' = F^(-1)(U') are identically distributed since they're defined by the same CDF, but they have negative covariance because they fall on opposite sides of their shared median and thus strongly tend to fall on opposite sides of their mean. If we average each pair to get A = [F^(-1)(U) + F^(-1)(1 - U)] / 2, the expected value is

E[A] = E[(X + X')/2] = (E[X] + E[X'])/2 = E[X]

since X and X' have the same mean, while the variance is

Var(A) = [Var(X) + Var(X') + 2Cov(X,X')]/4 = [2Var(X) + 2Cov(X,X')]/4 = [Var(X) + Cov(X,X')]/2.

In other words, A is a random variable whose average is an unbiased estimate of the mean of X but which, because Cov(X,X') < 0, has less variance than the average of two independent copies of X (namely Var(X)/2).
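You can verify the negative covariance numerically; a minimal sketch reusing the sqrt inverse transform from above (variable names here are just for illustration):

    import numpy as np

    rng = np.random.default_rng()
    U = rng.uniform(size=100_000)
    X, X_prime = np.sqrt(U), np.sqrt(1.0 - U)   # antithetic pair: same distribution, negatively correlated

    print(np.cov(X, X_prime)[0, 1])    # clearly negative (around -0.05)
    print(np.var((X + X_prime) / 2))   # much smaller than np.var(X) / 2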

To compare antithetic results head-to-head with independent sampling at the same cost, we keep the original sample size but generate half of the data by applying the inverse transform to the U's and the other half by applying it to the antithetic 1 - U's. We then average each pair and compute statistics as before. In Python:

    U = np.random.uniform(size=N // 2)
    antithetic_avg = (np.sqrt(U) + np.sqrt(1.0 - U)) / 2   # average each antithetic pair
    anti_var = np.var(antithetic_avg)
    print(np.mean(antithetic_avg), anti_var, 1.96 * np.sqrt(anti_var / (N / 2)))

which produces results such as

    0.6667222935263972 0.0018911848781598295 0.0003811869837216061

Note that the half-width produced with independent sampling (about 0.00146) is nearly 4 times as large as the half-width produced using antithetic variates (about 0.00038). Since the half-width shrinks only with the square root of the sample size, independent sampling would need roughly 3.8^2 ≈ 15 times as much data to achieve the same precision, i.e., more than an order of magnitude more.
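Putting the two approaches side by side in one self-contained script (a sketch of the same comparison; the exact numbers will vary from run to run):

    import numpy as np

    N = 100_000
    rng = np.random.default_rng()

    # Independent sampling: N draws of X = sqrt(U)
    X = np.sqrt(rng.uniform(size=N))
    hw_indep = 1.96 * np.sqrt(np.var(X) / N)

    # Antithetic sampling: N // 2 pairs, i.e., the same total of N X values
    U = rng.uniform(size=N // 2)
    A = (np.sqrt(U) + np.sqrt(1.0 - U)) / 2
    hw_anti = 1.96 * np.sqrt(np.var(A) / (N // 2))

    print(np.mean(X), hw_indep)     # independent estimate and half-width
    print(np.mean(A), hw_anti)      # antithetic estimate and half-width
    print(hw_indep / hw_anti)       # typically close to 4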

Source: https://stackoverflow.com/questions/71924127/simulating-expectation-of-continuous-random-variable
