A Simple Regression Problem

This article is part of a brand new series featuring problems with solutions, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today's problem also has an intriguing mathematical appeal and solution: this allows you to check whether the answer you found using machine learning techniques is correct. The problem is suitable for beginners.

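The problem is as follows. Let X1, X2, X3, and so on be a sequence recursively defined by Xn+1 = Stdev(X1, …, Xn). Here X1, the initial condition, is a positive real number or random variable. Stdev is the population standard deviation (dividing by n rather than n - 1), so the standard deviation of a single value is 0. Thus,

$$X_{n+1} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} X_k^2 - \Big(\frac{1}{n}\sum_{k=1}^{n} X_k\Big)^2}.$$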

It is clear that Xn = An X1, where An is a number that does not depend on X1: multiplying all inputs of a standard deviation by X1 > 0 multiplies the result by X1. So we can assume, without loss of generality, that X1 = 1. For instance, A1 = 1 and A2 = 0. The goal here is to study the behavior of An (for large n) using simple model-fitting techniques. I plotted the first few values of An below. In the figure, the X-axis represents n, and the Y-axis represents An. The question is: how can An be approximated by a simple function of n? Of course, a linear regression will not work. What about a polynomial regression?
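
A quick way to check the claims above (An does not depend on X1, A1 = 1, A2 = 0) is to run the recursion directly. The sketch below assumes, as A2 = 0 implies, that Stdev is the population standard deviation, which is what Python's `statistics.pstdev` computes; the function name `stdev_sequence` is ours, for illustration only.

```python
import statistics

def stdev_sequence(x1: float, n_terms: int) -> list[float]:
    """Return X1, ..., X_{n_terms}, where X_{k+1} = population stdev of X1..Xk."""
    xs = [x1]
    while len(xs) < n_terms:
        # pstdev divides by n (not n - 1), so pstdev([x1]) is 0.0
        xs.append(statistics.pstdev(xs))
    return xs

# A_n is the sequence started at X1 = 1
a = stdev_sequence(1.0, 10)
print(a[:3])

# Starting from X1 = 3.7 only rescales every term by 3.7
x = stdev_sequence(3.7, 10)
print(all(abs(xk - 3.7 * ak) < 1e-12 for xk, ak in zip(x, a)))
```

This direct version recomputes the standard deviation from scratch at every step, which is fine for a handful of terms.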

The first 600 values of An can be found here, as a text file.
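
If the file is not at hand, the values can be regenerated. Rather than recomputing each standard deviation from scratch, the sketch below keeps running sums S1 = A1 + … + An and S2 = A1^2 + … + An^2 and uses the population variance S2/n - (S1/n)^2, so each new term costs O(1). The function name `first_values` is hypothetical; we have not checked the output against the author's file.

```python
import math

def first_values(n_terms: int) -> list[float]:
    """Return A_1, ..., A_{n_terms} (with A_1 = 1) using O(1) work per new term."""
    a = [1.0]
    s1, s2 = 1.0, 1.0  # running sums of A_k and A_k^2 over the terms so far
    while len(a) < n_terms:
        n = len(a)
        # population variance of A_1..A_n; max() guards against tiny negative rounding
        var = max(0.0, s2 / n - (s1 / n) ** 2)
        nxt = math.sqrt(var)
        a.append(nxt)
        s1 += nxt
        s2 += nxt * nxt
    return a

values = first_values(600)
for n in (1, 2, 3, 10, 100, 600):
    print(n, values[n - 1])
```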

Solution

A tool as basic as Excel is good enough to find the solution. However, if you use Excel, the built-in function Stdev includes a correction factor (it divides by n - 1 instead of n) that must be taken care of. But you can simply use the values of An available in my text file mentioned above, to avoid this issue.
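
The correction factor comes from the denominator: Excel's STDEV (and STDEV.S) is the sample standard deviation, dividing by n - 1, while the recursion needs the population version dividing by n (STDEV.P in Excel). The two are related by Stdev_pop = Stdev_sample × sqrt((n - 1)/n). The sketch below checks this with Python's `statistics.stdev` (n - 1) and `statistics.pstdev` (n); the helper name is ours.

```python
import math
import statistics

def sample_to_population(sample_sd: float, n: int) -> float:
    """Convert an (n - 1)-denominator standard deviation into the n-denominator one."""
    return sample_sd * math.sqrt((n - 1) / n)

data = [1.0, 0.0, 0.5, math.sqrt(1 / 6)]   # A_1..A_4
s = statistics.stdev(data)    # divides by n - 1, like Excel's STDEV / STDEV.S
p = statistics.pstdev(data)   # divides by n, like Excel's STDEV.P
print(p, sample_to_population(s, len(data)))  # equal up to rounding
```

With a single value, `statistics.stdev` raises `StatisticsError` (and Excel's STDEV returns #DIV/0!), so the sample version cannot even produce A2.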

If you use Excel, you can try various types of trend lines to approximate the blue curve, and even compute the regression coefficients and the R-squared for each tested model. You will quickly find that the power trend line is by far the best model, that is, An is very well approximated (for large values of n) by An = b n^c. Here n^c stands for n to the power c, and b and c are the regression coefficients. In other words, log An = log b + c log n (approximately).
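
A power trend line is, in effect, a straight-line regression of log An on log n. A minimal Python equivalent with `numpy.polyfit` is sketched below. The fit range n = 100 to 600 is our own choice: small n must be excluded because the model only holds for large n (and log A2 = log 0 is undefined). The generator is repeated so the snippet runs on its own.

```python
import math
import numpy as np

def first_values(n_terms):
    """A_1..A_{n_terms} with A_1 = 1 (population stdev, running sums)."""
    a, s1, s2 = [1.0], 1.0, 1.0
    while len(a) < n_terms:
        n = len(a)
        nxt = math.sqrt(max(0.0, s2 / n - (s1 / n) ** 2))
        a.append(nxt)
        s1 += nxt
        s2 += nxt * nxt
    return np.array(a)

A = first_values(600)
n = np.arange(1, 601)
mask = n >= 100                          # large-n tail only (also skips A_2 = 0)
x, y = np.log(n[mask]), np.log(A[mask])

c_hat, log_b_hat = np.polyfit(x, y, 1)   # fits y ≈ c*x + log(b)
b_hat = math.exp(log_b_hat)

# R-squared of the straight-line fit in log-log coordinates
resid = y - (c_hat * x + log_b_hat)
r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
print(f"b ≈ {b_hat:.4f}, c ≈ {c_hat:.4f}, R^2 ≈ {r2:.6f}")
```

On this range the fitted exponent should come out close to the value of c given in the next paragraph; starting the fit at smaller n pulls it away, since the power law only holds asymptotically.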

What is very interesting is that, using some mathematics, you can actually compute the exact value of c. Indeed, c is the real solution of the equation c^2 = (2c + 1)(c + 1)^2 (see here). This is a polynomial equation of degree 3, so the exact value of c can be computed. Numerically, c ≈ -0.3522011. However, it is very hard to get the exact value of b.
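
One heuristic way to see where the equation comes from (our own sketch, not necessarily the argument behind the link): plug An ≈ b n^c into the squared recursion, approximate the sums by integrals, k^(2c) summed up to n ≈ n^(2c+1)/(2c+1) and k^c summed up to n ≈ n^(c+1)/(c+1) (valid because 2c + 1 > 0), and match the n^(2c) terms. This gives 1 = 1/(2c+1) - 1/(c+1)^2, that is (2c+1)(c+1)^2 = (c+1)^2 - (2c+1) = c^2. Note that b cancels out, which is why this argument says nothing about b. Expanding gives the cubic 2c^3 + 4c^2 + 4c + 1 = 0; its derivative 6c^2 + 8c + 4 is always positive, so there is exactly one real root, found numerically below with `numpy.roots`.

```python
import numpy as np

# c^2 = (2c + 1)(c + 1)^2  <=>  2c^3 + 4c^2 + 4c + 1 = 0
roots = np.roots([2, 4, 4, 1])

# keep the single real root (the other two are complex conjugates)
c = float(next(r.real for r in roots if abs(r.imag) < 1e-9))
print(f"c = {c:.7f}")

# sanity check against the original form of the equation
print(abs(c**2 - (2 * c + 1) * (c + 1) ** 2) < 1e-12)
```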

It would be interesting to plot the residual error for each estimated value of An, and see if it shows some pattern. This could lead to a better approximation: An = b n^c (1 + d/n), with three parameters: b, c (unchanged), and d.
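
A minimal sketch of that idea, assuming the correction term has the form 1 + d/n and keeping c fixed at its theoretical value: dividing by n^c turns the model into An / n^c = b + (b d)/n, which is linear in 1/n, so b and d follow from ordinary least squares. The fit range n ≥ 10 and all variable names are our own choices. Because the constant-only model is nested in the 1/n model, the residual sum of squares can only decrease; the `resid` array is what one would plot against n to look for a remaining pattern.

```python
import math
import numpy as np

def first_values(n_terms):
    """A_1..A_{n_terms} with A_1 = 1 (population stdev, running sums)."""
    a, s1, s2 = [1.0], 1.0, 1.0
    while len(a) < n_terms:
        n = len(a)
        nxt = math.sqrt(max(0.0, s2 / n - (s1 / n) ** 2))
        a.append(nxt)
        s1 += nxt
        s2 += nxt * nxt
    return np.array(a)

A = first_values(600)
n = np.arange(1, 601, dtype=float)
c = -0.3522011                      # numerical root of c^2 = (2c + 1)(c + 1)^2
mask = n >= 10                      # hypothetical choice of fit range
y = A[mask] / n[mask] ** c          # roughly constant (about b) for large n

# Model 1: An = b n^c            ->  y = b
b1 = y.mean()
rss1 = np.sum((y - b1) ** 2)

# Model 2: An = b n^c (1 + d/n)  ->  y = b + e/n, with e = b*d (linear in b, e)
X = np.column_stack([np.ones(int(mask.sum())), 1.0 / n[mask]])
(b2, e), *_ = np.linalg.lstsq(X, y, rcond=None)
d = e / b2
resid = y - X @ np.array([b2, e])   # residuals to plot against n[mask]
rss2 = np.sum(resid ** 2)
print(f"b ≈ {b2:.5f}, d ≈ {d:.4f}, RSS {rss1:.3e} -> {rss2:.3e}")
```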


About the author: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent is also a self-publisher at DataShaping.com, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by Tech Target). He recently opened Paris Restaurant, in Anacortes. You can access Vincent's articles and books here.