with a sample calculation from Dowding and another spreadsheet with a set of sample data. The things to look at are the graphs showing the 50%, 84% and 95% lines for each of the two methods. The 84% lines are close. The 95% ones are miles out.
I have calculated the 84% and 95% confidence lines using two approaches. Both approaches assume the data is lognormally distributed. The 84% lines get better with more data (large N), but the 95% ones are dreadful, and I'm wondering if ONE of the methods is incorrect, or inappropriate for the data. The approaches are:
Approach (1): That shown in C.H. Dowding's book "Construction Vibrations", which states:
"The equation of the 50% line of the attenuation relations is found by fitting the least squares best-fit line to the data. Such a best-fit relationship is found to be linear when the log10 of the PPV is compared to the log10 of the scaled distance...If the data were normally distributed the 84% line would be found by adding one standard deviation of the data to the 50% line, however since the data are lognormally distributed, the 84% line is found by MULTIPLYING the median 50% line value by 10^SD, where SD= the Standard Deviation of the data ABOUT the 50% line. This Power of 10 value is called the Standard Error of the estimate, SE, the estimate being the 50% line value. To find the 84% confidence line, multiply the expected value, which is the median for lognormally distributed data, not the mean) by 10^StDev, where StDev is the Standard Deviation ABOUT the best fit line."
He then quotes Benjamin & Cornell (1970) for this SE calculation, and I found that the equation for the standard deviation is eqn 4.3.26 or 4.3.27 of B&C, i.e.:
SEE = 10^Sigma(LogData), where Sigma(LogData) = SQRT( n * (1 - cc^2) * StdDevLogData^2 / (n - 2) ), n = sample size, cc = correlation coefficient.
Dowding's SEE is not the same as the one you get using Excel's built-in standard error function, so that is another, separate query. I have a copy of Benjamin & Cornell (1970), which Dowding refers to, and while it doesn't make a lot of sense to me, I worked out how he calculated his SEE; it is close to, but not the same as, the SEE that Excel uses. So I'm also curious as to why there's a difference there, although with large N the SEEs tend to converge to the same number in any case. I also don't fully understand MULTIPLYING the 50% line by 10^(1.000*SD) to get the 84% line, or by 10^(1.645*SD) to get the 95% line, but this is borne out by other statistical texts, which note these ratios are correct, and it fits with the cumulative normal distribution. (I always understood ADDING a certain number of standard deviations was correct; for log-transformed data that is exactly what the multiplication does, since adding z*SD in log space is the same as multiplying by 10^(z*SD) in linear space.)
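One plausible source of the B&C-vs-Excel discrepancy is the SD divisor. The sketch below uses made-up log-log data (not the actual spreadsheet) and assumes StdDevLogData in the quoted formula is Excel's STDEV (n-1 divisor). Under that assumption, the quoted B&C form and Excel's STEYX differ by exactly sqrt(n/(n-1)), which would explain both a small difference and convergence at large N.

```python
import math

# Hypothetical log10(scaled distance) and log10(PPV) pairs (assumed values, not real blast data)
x = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
y = [1.9, 1.6, 1.5, 1.1, 1.0, 0.7, 0.5, 0.3]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
cc2 = sxy ** 2 / (sxx * syy)             # squared correlation coefficient
sd_y = math.sqrt(syy / (n - 1))          # Excel STDEV of the log data (n-1 divisor)

# The quoted B&C form, taking StdDevLogData to be the sample SD above:
sigma_bc = math.sqrt(n * (1 - cc2) * sd_y ** 2 / (n - 2))
# Excel's STEYX for the same data:
sigma_excel = math.sqrt((1 - cc2) * syy / (n - 2))

# The ratio is exactly sqrt(n / (n - 1)), which goes to 1 as n grows
print(sigma_bc, sigma_excel, sigma_bc / sigma_excel, math.sqrt(n / (n - 1)))
```

If B&C instead intended a population SD (n divisor), the two formulas coincide exactly, so the answer to the discrepancy may simply be which SD convention was plugged in.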
Approach (2): An ISEE paper entitled "The Reliability of Peak Particle Velocity Analysis Methods" by Hunt, Wetherelt & Powell, which discusses and uses a simple regression approach, i.e.:
(i) PPV95% = 50%Intercept(K) * ScaledDist^b * 10^(Norm95%*SEE), OR (ii) by taking the antilog (10 to the power) of: Log10(PPV95%) = Log10(50%InterceptK) + Slope * Log10(ScaledDist) + Norm95%*SEE.
Both (i) and (ii) give the same result. Norm95% = 1.96, from normal distribution tables, for a 95% confidence INTERVAL. SEE in this instance is NOT the Dowding SE, but rather the standard error of the estimate as shown on the spreadsheet, which is close.
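The equivalence of forms (i) and (ii) is just the antilog of a sum. A minimal sketch, using assumed fitted values (K, b and SEE are placeholders, not the paper's or the spreadsheet's numbers):

```python
import math

# Hypothetical fitted regression values (assumptions, for illustration only)
K = 1140.0     # 50% line intercept
b = -1.6       # slope on the log-log plot
SEE = 0.23     # standard error of the estimate, in log10 units
z95 = 1.96     # Norm95% for a two-sided 95% confidence interval
sd = 25.0      # a scaled distance at which to evaluate the line

# Form (i): work directly in linear space
ppv95_i = K * sd ** b * 10 ** (z95 * SEE)

# Form (ii): work in log space, then take the antilog
log_ppv95 = math.log10(K) + b * math.log10(sd) + z95 * SEE
ppv95_ii = 10 ** log_ppv95

print(ppv95_i, ppv95_ii)  # equal up to floating-point rounding
```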
Theoretically, and please correct me if I'm wrong, the 68% interval (1 SD each way of the mean, i.e. +34%/-34%) should produce an 84% line (50% + 34%), and similarly the 90% INTERVAL "should" produce the 95% LINE (50% + 45%).
Dowding's approach seems to give an upper-bound 95% confidence LINE, derived using a different approach from the other methods, including those discussed briefly in some commercial programs, which appear to use a 95% confidence INTERVAL.
I do understand Approach (2) somewhat better: it simply assumes the lognormal data have a linear regression line on a log plot and adds the standard error to the estimate, as for a normal distribution curve. Taking the logs of both sides of the linear equation lets you find the intercept, then transpose back to the non-log data (although I have now ALSO seen papers which say THAT, the transposing back, is not a good idea either).
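The papers' caution about transposing back is probably the usual back-transformation bias: the antilog of the mean of the logs is the geometric mean (roughly the median for lognormal data), which understates the arithmetic mean of skewed data. A toy sketch with assumed values:

```python
import math

# Toy right-skewed sample (assumed values, standing in for PPV-like data)
data = [1.0, 2.0, 4.0, 10.0, 100.0]

arith_mean = sum(data) / len(data)                        # ordinary mean of the raw data
log_mean = sum(math.log10(v) for v in data) / len(data)   # mean in log space
back_transformed = 10 ** log_mean                         # antilog = geometric mean

# The back-transformed value sits well below the arithmetic mean for skewed data
print(arith_mean, back_transformed)
```

This is why the back-transformed regression line is a MEDIAN (50%) line, not a mean line, which is consistent with Dowding calling the 50% line the median.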
I have been reading an awful lot about statistics for the lognormal distribution. Some people say you have to use Land's method for lognormal data, but others say you need to take great care with it and that it is applicable only over certain ranges. I'd rather not get into OTHER methods or other distributions (or even testing whether this distribution is appropriate for the data, although that DOES make sense), as I will take the authors' word for it that the data follow the lognormal distribution.
The short queries are:
A. Even though the simple-regression 95% confidence interval encompasses 95% of the data, and thus has 2.5% of the data above it, this does not appear to equate to (i.e. the 95% K value is not the same as) the line which Dowding proposes, which is from the CUMULATIVE distribution with 97.5% of the data BELOW the line, i.e. also 2.5% of the data above the line. The 95% lines do not calculate to the SAME intercept (although the 84% lines are CLOSE). Is Dowding's statement that the 95% line = "50%LineValue*1.645*SE" correct?
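The mismatch in query A is consistent with the two methods using different z-values for "95%". A sketch with an assumed SD of the logs (placeholder value, not from the spreadsheet) contrasting the two multipliers applied to the 50% line:

```python
# SD = standard deviation of the log10 residuals about the 50% line (assumed value)
SD = 0.23

# One-sided 95% line (z = 1.645): 95% of data below, 5% above
factor_one_sided = 10 ** (1.645 * SD)
# Upper edge of a two-sided 95% interval (z = 1.96): 97.5% below, 2.5% above
factor_two_sided = 10 ** (1.960 * SD)

# The two-sided multiplier is always larger, so the intercepts cannot match
print(factor_one_sided, factor_two_sided)
```

Whichever convention Dowding intends, a z of 1.645 versus 1.96 in the exponent guarantees different 95% intercepts even from identical data, while the 84% line (z = 1.000 in both conventions) would agree, matching what the spreadsheet graphs show.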
B. Why is there a difference in the SEE calculated using the two methods (Excel vs B&C)?
C. If you KNOW one approach gives you lower charge weights than the other, which one do you use? I guess it depends on which one USBM RI 8027 uses, if that is the baseline for comparison. I haven't yet got hold of the original USBM report, but when I do I might be able to answer that question.