Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Wednesday, September 28, 2016

multidimensional confidence intervals

multidimensional confidence intervals


I have numerous tuples (par1,par2), i.e. points in a 2 dimensional parameter space obtained from repeating an experiment multiple times.

I'm looking for a possibility to calculate and visualize confidence ellipses (not sure if thats the correct term for this). Here an example plot that I found in the web to show what I mean:

enter image description here

source: blogspot.ch/2011/07/classification-and-discrimination-with.html

So in principle one has to fit a multivariate normal distribution to a 2D histogram of data points I guess. Can somebody help me with this?

Answer by gcalmettes for multidimensional confidence intervals


I guess what you are looking for is to compute the Confidence Regions.

I don't know much how about it, but as a starting point, I would check the sherpa application for python. At least, in their Scipy 2011 talk, authors mention that you can determine and obtain confidence regions with it (you may need to have a model for your data though).

See the video and corresponding slides of the Sherpa talk.

HTH

Answer by Daniel Velkov for multidimensional confidence intervals


Here is my attempt at getting something like your example. It's not exactly the same because I assume we don't know the number of clusters in advance. My approach is to fit a KDE to the data and then find the contour of the 95th percentile. So we get a confidence contour instead of an ellipse. Here's the code with comments:

import numpy as np  import scipy  import scipy.stats  import matplotlib.pyplot as plt    # generate two normally distributed 2d arrays  x1=np.random.multivariate_normal((100,420),[[120,0],[0,80]],400)  x2=np.random.multivariate_normal((140,340),[[90,0],[0,80]],400)    # join the two distributions together  x=np.vstack([x1,x2])    # fit a KDE to the data  pdf=scipy.stats.kde.gaussian_kde(x.T)    # create a grid over which we can evaluate pdf  q,w=np.meshgrid(range(50,200,10), range(300,500,10))  r=pdf([q.flatten(),w.flatten()])    # sample the pdf and find the value at the 95th percentile  s=scipy.stats.scoreatpercentile(pdf(pdf.resample(1000)), 10)    # reshape back to 2d  r.shape=(20,15)    # plot the contour at the 95th percentile  plt.contour(range(50,200,10), range(300,500,10), r, [s])    # scatter plot the two normal distributions  plt.scatter(x1[:,0],x1[:,1])  plt.scatter(x2[:,0],x2[:,1],c='r')  plt.show()  

The results looks like: confidence contour

Answer by Joe Kington for multidimensional confidence intervals


It sounds like you just want the 2-sigma ellipse of the scatter of points?

If so, consider something like this (From some code for a paper here: https://github.com/joferkington/oost_paper_code/blob/master/error_ellipse.py):

import numpy as np    import matplotlib.pyplot as plt  from matplotlib.patches import Ellipse    def plot_point_cov(points, nstd=2, ax=None, **kwargs):      """      Plots an `nstd` sigma ellipse based on the mean and covariance of a point      "cloud" (points, an Nx2 array).        Parameters      ----------          points : An Nx2 array of the data points.          nstd : The radius of the ellipse in numbers of standard deviations.              Defaults to 2 standard deviations.          ax : The axis that the ellipse will be plotted on. Defaults to the               current axis.          Additional keyword arguments are pass on to the ellipse patch.        Returns      -------          A matplotlib ellipse artist      """      pos = points.mean(axis=0)      cov = np.cov(points, rowvar=False)      return plot_cov_ellipse(cov, pos, nstd, ax, **kwargs)    def plot_cov_ellipse(cov, pos, nstd=2, ax=None, **kwargs):      """      Plots an `nstd` sigma error ellipse based on the specified covariance      matrix (`cov`). Additional keyword arguments are passed on to the       ellipse patch artist.        Parameters      ----------          cov : The 2x2 covariance matrix to base the ellipse on          pos : The location of the center of the ellipse. Expects a 2-element              sequence of [x0, y0].          nstd : The radius of the ellipse in numbers of standard deviations.              Defaults to 2 standard deviations.          ax : The axis that the ellipse will be plotted on. Defaults to the               current axis.          Additional keyword arguments are pass on to the ellipse patch.        Returns      -------          A matplotlib ellipse artist      """      def eigsorted(cov):          vals, vecs = np.linalg.eigh(cov)          order = vals.argsort()[::-1]          return vals[order], vecs[:,order]        if ax is None:          ax = plt.gca()        vals, vecs = eigsorted(cov)      theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))        # Width and height are "full" widths, not radius      width, height = 2 * nstd * np.sqrt(vals)      ellip = Ellipse(xy=pos, width=width, height=height, angle=theta, **kwargs)        ax.add_artist(ellip)      return ellip    if __name__ == '__main__':      #-- Example usage -----------------------      # Generate some random, correlated data      points = np.random.multivariate_normal(              mean=(1,1), cov=[[0.4, 9],[9, 10]], size=1000              )      # Plot the raw points...      x, y = points.T      plt.plot(x, y, 'ro')        # Plot a transparent 3 standard deviation covariance ellipse      plot_point_cov(points, nstd=3, alpha=0.5, color='green')        plt.show()  

enter image description here

Answer by Rodrigo for multidimensional confidence intervals


I slightly modified one of the examples above that plots the error or confidence region contours. Now I think it gives the right contours.

It was giving the wrong contours because it was applying the scoreatpercentile method to the joint dataset (blue + red points) when it should be applied separately to each dataset.

The modified code can be found below:

import numpy  import scipy  import scipy.stats  import matplotlib.pyplot as plt    # generate two normally distributed 2d arrays  x1=numpy.random.multivariate_normal((100,420),[[120,80],[80,80]],400)  x2=numpy.random.multivariate_normal((140,340),[[90,-70],[-70,80]],400)    # fit a KDE to the data  pdf1=scipy.stats.kde.gaussian_kde(x1.T)  pdf2=scipy.stats.kde.gaussian_kde(x2.T)    # create a grid over which we can evaluate pdf  q,w=numpy.meshgrid(range(50,200,10), range(300,500,10))  r1=pdf1([q.flatten(),w.flatten()])  r2=pdf2([q.flatten(),w.flatten()])    # sample the pdf and find the value at the 95th percentile  s1=scipy.stats.scoreatpercentile(pdf1(pdf1.resample(1000)), 5)  s2=scipy.stats.scoreatpercentile(pdf2(pdf2.resample(1000)), 5)    # reshape back to 2d  r1.shape=(20,15)  r2.shape=(20,15)    # plot the contour at the 95th percentile  plt.contour(range(50,200,10), range(300,500,10), r1, [s1],colors='b')  plt.contour(range(50,200,10), range(300,500,10), r2, [s2],colors='r')    # scatter plot the two normal distributions  plt.scatter(x1[:,0],x1[:,1],alpha=0.3)  plt.scatter(x2[:,0],x2[:,1],c='r',alpha=0.3)  

Answer by Syrtis Major for multidimensional confidence intervals


Refer the post How to draw a covariance error ellipse.

Here's the python realization:

import numpy as np  from scipy.stats import norm, chi2    def cov_ellipse(cov, q=None, nsig=None, **kwargs):      """      Parameters      ----------      cov : (2, 2) array          Covariance matrix.      q : float, optional          Confidence level, should be in (0, 1)      nsig : int, optional          Confidence level in unit of standard deviations.           E.g. 1 stands for 68.3% and 2 stands for 95.4%.        Returns      -------      width, height, rotation :           The lengths of two axises and the rotation angle in degree      for the ellipse.      """        if q is not None:          q = np.asarray(q)      elif nsig is not None:          q = 2 * norm.cdf(nsig) - 1      else:          raise ValueError('One of `q` and `nsig` should be specified.')      r2 = chi2.ppf(q, 2)        val, vec = np.linalg.eigh(cov)      width, height = 2 * sqrt(val[:, None] * r2)      rotation = np.degrees(arctan2(*vec[::-1, 0]))        return width, height, rotation  

The meaning of standard deviation is wrong in the answer of Joe Kington. Usually we use 1, 2 sigma for 68%, 95% confidence levels, but the 2 sigma ellipse in his answer does not contain 95% probability of the total distribution. The correct way is using a chi square distribution to esimate the ellipse size as shown in the post.


Fatal error: Call to a member function getElementsByTagName() on a non-object in D:\XAMPP INSTALLASTION\xampp\htdocs\endunpratama9i\www-stackoverflow-info-proses.php on line 72

0 comments:

Post a Comment

Popular Posts

Powered by Blogger.