Drowning in Statistics
Jun. 29th, 2006 04:32 pm
I have a confession to make. I nearly flunked Intro to Statistics at MIT (I passed statistical physics, though), and so I dropped the course. I don't know if this was a function of my state of mind, or my own personal blind spot, but ultimately what it means is that I don't know my statistics well at all.
The statistics I did learn to do in lab were all using Matlab, a tool to which I no longer have access.
I'm supposed to be analyzing this data for an article on the educational market. I've got the number of students (male, female, and both) in JK-13 beginning with the year 1997/98 and ending with the year 2003/2004 for all of Canada, and broken down by province and territory.
I need to extrapolate numbers for the next school year (2006/07). But the curves are not exponential or linear, so far as I can see. They are more complex than that.
Excel 2003, the program I have access to, has a data analysis pack, but the help section simply refers me to a long list of textbooks. Eek!
I don't know what I'm going to do/how I'm going to do this. Anyone able to offer advice?
metalana?
You'd think that I could simply get away with saying, "the kids who were in 1st grade in 2003/04 will be in 4th grade in 2006/07." But looking at these curves, the relationships aren't that simple. Kids are held back, or skip grades, or leave the country or public system, or enter from another country or private school. And in high school, it's a real mess, because of the phasing out of OAC (Gr. 13) in Ontario; in 2003, two generations of students "graduated" in the same year. Many of the students destined to graduate in 2003 likely either aimed to graduate early or late so as to not get burned by the double cohort, leading to some very strange looking graphs/statistics.
Gah. *frustration*
no subject
Date: 2006-06-29 09:22 pm (UTC)
I had to take the 2nd semester stats in the spring, and suddenly, one day it hit me! I suddenly "got it" and had an "aha" moment!
What helped in the spring is the Wooldridge book on stats - the intro/easy version. Basically, stats uses a very specific model and tries to crunch parameters and numbers. I was completely missing the model at first!
anyhow, the common software for stats is Stata (my preference), SAS, and Minitab. You can get one of those either on trial or free/cheap.
but it sounds like you have quite a "dirty" data set. bleh. have fun.
specific question
Date: 2006-06-29 09:27 pm (UTC)
what do you mean by extrapolating grades? what are you trying to predict? projections of first-half grades as a predictor of later years' grades?
in general, even if it's not linear, you still want to predict it using a linear model. anything else will be overly complicated. you can also fit a window of the most recent years instead of the whole data set to get a better linear fit. there are ways to measure how well the data fits a linear model (R-squared, for example).
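The windowed linear fit suggested above could be sketched roughly like this in plain Python. This is only an illustration under assumptions: the function names, the window size of 4, and the enrolment numbers in the example are all made up, and the R-squared value only says how well the line fits the window, not how good the forecast is.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in ys explained by the line.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def extrapolate_from_window(years, counts, target_year, window=4):
    """Fit a line to the last `window` points only and extend it to target_year."""
    xs = years[-window:]
    ys = counts[-window:]
    slope, intercept, r_squared = linear_fit(xs, ys)
    return intercept + slope * target_year, r_squared


if __name__ == "__main__":
    # Hypothetical numbers, NOT real enrolment data.
    # Each year is the starting year of the school year (1997 = 1997/98).
    years = [1997, 1998, 1999, 2000, 2001, 2002, 2003]
    counts = [5100, 5150, 5170, 5160, 5120, 5080, 5050]
    estimate, r_squared = extrapolate_from_window(years, counts, 2006)
    print(f"2006/07 estimate: {estimate:.0f} (R^2 over window: {r_squared:.2f})")
```

A low R-squared over the window is a hint that even the recent years are not well described by a straight line, in which case the extrapolated number deserves little trust.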
also, working with discrete numbers - eg. grade level and academic grades - leads to some problems. primarily that you are assuming that the steps are evenly spaced. this is not necessarily true, since one can argue that it's harder to go from a C to a B than from a B to an A (or vice versa).
Re: specific question
Date: 2006-06-29 09:44 pm (UTC)
no subject
Date: 2006-06-29 09:52 pm (UTC)
Extrapolating the curves is unlikely to be accurate, given that you're starting with non-linear data and inadequate extrapolation skills/tools. (And unfortunately I never studied many extrapolation techniques.)
Your best bet is demographics, as described in your last paragraph. Most kids born in year X will go to kindergarten in year X+5 or whatever. But if you want to account for deaths, migration and private school popularity, you're doing a way more precise analysis than journalists have time or money for. Hire a demographer. Or make a rough approximation.
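For the rough-approximation route, one way to fold in kids being held back, skipping, or moving in and out is to measure how many students historically show up in grade g+1 for every student in grade g the year before, and then roll the latest year forward. The sketch below assumes that approach; the function names and the numbers are hypothetical, and it simply holds the entry grade flat because no birth data is available here.

```python
def progression_ratios(history):
    """history[t][g] is the enrolment in grade index g in year index t.

    Returns, for each grade g except the last, the pooled ratio of
    students in grade g+1 in year t+1 to students in grade g in year t.
    """
    n_grades = len(history[0])
    n_steps = len(history) - 1
    ratios = []
    for g in range(n_grades - 1):
        arrived = sum(history[t + 1][g + 1] for t in range(n_steps))
        started = sum(history[t][g] for t in range(n_steps))
        ratios.append(arrived / started)
    return ratios


def project(history, years_ahead):
    """Roll the most recent year forward `years_ahead` times."""
    ratios = progression_ratios(history)
    current = list(history[-1])
    for _ in range(years_ahead):
        # Assumption: the entry grade stays at its latest level.
        next_year = [current[0]]
        for g, ratio in enumerate(ratios):
            next_year.append(current[g] * ratio)
        current = next_year
    return current


if __name__ == "__main__":
    # Hypothetical three-grade example, NOT real enrolment data.
    history = [
        [1000, 980, 990],  # e.g. 2001/02
        [1010, 995, 975],  # e.g. 2002/03
        [1005, 1002, 991], # e.g. 2003/04
    ]
    print(project(history, 3))  # rough 2006/07 estimate per grade
```

Ratios measured across an unusual year, such as the Ontario double cohort, would be distorted, so those grades and years would probably need to be left out or handled separately.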
no subject
Date: 2006-06-30 01:23 pm (UTC)