danaeris: (Default)
[personal profile] danaeris
I have a confession to make. I nearly flunked Intro to Statistics at MIT (I passed statistical physics, though), and so I dropped the course. I don't know if this was a function of my state of mind, or my own personal blind spot, but ultimately what it means is that I don't know my statistics well at all.

The statistics I did learn to do in lab were all using Matlab, a tool to which I no longer have access.

I'm supposed to be analyzing this data for an article on the educational market. I've got the number of students (male, female, and both) in JK-13 beginning with the year 1997/98 and ending with the year 2003/04 for all of Canada, and broken down by province and territory.

I need to extrapolate numbers for the next school year (2006/07). But the curves are not exponential or linear, so far as I can see. They are more complex than that.

Excel 2003, the program I have access to, has a data analysis pack, but the help section simply refers me to a long list of textbooks. Eek!

I don't know what I'm going to do/how I'm going to do this. Anyone able to offer advice? [livejournal.com profile] metalana?

You'd think that I could simply get away with saying, "the kids who were in 1st grade in 2003/04 will be in 4th grade in 2006/07." But looking at these curves, the relationships aren't that simple. Kids are held back, or skip grades, or leave the country or public system, or enter from another country or private school. And in high school, it's a real mess, because of the phasing out of OAC (Gr. 13) in Ontario; in 2003, two generations of students "graduated" in the same year. Many of the students destined to graduate in 2003 likely either aimed to graduate early or late so as to not get burned by the double cohort, leading to some very strange looking graphs/statistics.

Gah. *frustration*

Date: 2006-06-29 09:22 pm (UTC)
From: [identity profile] tunape.livejournal.com
hah! I also flunked stats in the fall semester - but as a grad student! :) it's a weird blind spot that you don't realize you have. What's weird is that I could crunch all the numbers, but just couldn't understand what was going on.

I had to take the 2nd semester stats in the spring, and suddenly, one day it hit me! I suddenly "got it" and had an "aha" moment!

What helped in the spring is the Wooldridge book on stats - the intro/easy version. Basically, stats uses a very specific model and tries to crunch parameters and numbers. I was completely missing the model at first!

anyhow, the common software for stats is Stata (my preference), SAS, and Minitab. You can get one of those either on trial or free/cheap.

but it sounds like you have quite a "dirty" data set. bleh. have fun.

specific question

Date: 2006-06-29 09:27 pm (UTC)
From: [identity profile] tunape.livejournal.com
what is the specific question you are trying to answer?

what do you mean by extrapolating grades? what are you trying to predict? projections of first half grades as a predictor of later years' grades?

in general, even if it's not linear, you still want to predict it using a linear model. anything else will be overly complicated. you can also use a window instead of the whole data set to get a better linear fit. there are ways to measure how well it fits a linear model, or doesn't.
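A minimal sketch of the windowed linear fit described above, in Python, assuming the data is one enrolment count per school year. The function names (`linear_fit`, `extrapolate`) and the sample counts are made up for illustration and are not from the actual data set. R-squared is used here as one common measure of fit; a low value over the chosen window suggests the straight-line projection should not be trusted.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in ys explained by the line.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def extrapolate(years, counts, target_year, window=None):
    """Fit a line to the last `window` points (all points if None)
    and evaluate it at target_year. Returns (prediction, r_squared)."""
    if window is not None:
        years, counts = years[-window:], counts[-window:]
    slope, intercept, r_squared = linear_fit(years, counts)
    return intercept + slope * target_year, r_squared


# HYPOTHETICAL counts, for illustration only (not real enrolment data).
# Each school year is labelled by the calendar year in which it starts.
years = [1997, 1998, 1999, 2000, 2001, 2002, 2003]
counts = [400, 410, 415, 412, 405, 398, 390]

prediction, r_squared = extrapolate(years, counts, 2006, window=4)
print(round(prediction), round(r_squared, 3))
```

Comparing the R-squared for the whole series against the value for the last few years is a quick way to see whether a shorter window actually gives a better linear fit.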

also, working with discrete numbers - e.g. grade level and academic grades - leads to some problems. primarily that you are assuming that the steps are evenly spread out from each other. this is not necessarily always true since one can argue that it's harder to go from a C to a B, than from a B to an A (or vice versa).

Re: specific question

Date: 2006-06-29 09:44 pm (UTC)
From: [identity profile] danaeris.livejournal.com
Ah, no, I mean that I am trying to predict the number of students in a given grade in a given year. Not their grades, but what grade they are IN.

Date: 2006-06-29 09:52 pm (UTC)
From: [identity profile] metalana.livejournal.com
I was going to say "I no longer give stats advice" but in fact your question is about applied math and demographics, so I will opine.

Extrapolating the curves is unlikely to be accurate, given that you're starting with non-linear data and inadequate extrapolation skills/tools. (And unfortunately I never studied many extrapolation techniques.)

Your best bet is demographics, as described in your last paragraph. Most kids born in year X will go to kindergarten in year X+5 or whatever. But if you want to account for deaths, migration and private school popularity, you're doing a way more precise analysis than journalists have time or money for. Hire a demographer. Or make a rough approximation.
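One way the rough approximation could look, sketched in Python under stated assumptions: each cohort is moved up one grade per year and scaled by the average ratio observed between consecutive years, which lumps together holding back, skipping, migration and public/private switching into one net figure. The names `progression_ratios` and `project` and all the numbers are hypothetical. The entry grade is simply held flat at its last observed value, which is an assumption; birth counts lagged by about five years would be a better input if available. Ratios taken from years distorted by the Ontario double cohort would need to be excluded or treated separately.

```python
def progression_ratios(enrol):
    """Average grade-to-next-grade ratio across consecutive years.

    enrol maps a school year (int, the year it starts) to a list of
    counts, one per grade, lowest grade first. Needs at least two
    consecutive years of data.
    """
    years = sorted(enrol)
    n_grades = len(enrol[years[0]])
    ratios = []
    for g in range(n_grades - 1):
        observed = [
            enrol[y2][g + 1] / enrol[y1][g]
            for y1, y2 in zip(years, years[1:])
            if y2 == y1 + 1
        ]
        ratios.append(sum(observed) / len(observed))
    return ratios


def project(enrol, target_year):
    """Roll the latest observed year forward to target_year."""
    ratios = progression_ratios(enrol)
    year = max(enrol)
    current = list(enrol[year])
    while year < target_year:
        # ASSUMPTION: the entry grade stays at its last observed size.
        following = [current[0]]
        for g, ratio in enumerate(ratios):
            following.append(current[g] * ratio)
        current = following
        year += 1
    return current


# HYPOTHETICAL counts for three grades, for illustration only.
enrol = {
    2001: [1000, 980, 990],
    2002: [1010, 985, 975],
    2003: [1005, 1000, 978],
}
print([round(n) for n in project(enrol, 2006)])
```

Because the result leans entirely on a handful of past ratios, it is only as good as the assumption that those ratios stay stable through the target year.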

Date: 2006-06-30 01:23 pm (UTC)
From: [identity profile] danaeris.livejournal.com
Thanks for the reality check. This was very helpful.
