Thursday, October 29, 2009

Measuring the Race to the Bottom

I've written extensively about the Race to the Bottom that is created by aspects of NCLB where states' performance is measured by how well each state meets its own targets. I've also pointed out that individual states participating in this race to the bottom are not particularly keen on having transparent ways to compare their standards with other states.

The National Center for Education Statistics (part of the Department of Education) has found a way to use data from the National Assessment of Educational Progress alongside state accountability reports to actually examine and quantify any Race to the Bottom. In a new report, Mapping State Proficiency Standards Onto NAEP Scales: 2005-2007, it looks at changes from 2005 to 2007 in state scores and how they compare with the national measure. The report covers reading and math in the 4th and 8th grades.

A word about proficiency

The NAEP makes a distinction between a basic level and a proficient level of performance. For the NAEP, proficient means competency over challenging subject matter, not merely grade-level performance. Most (all?) states also make a similar distinction on their state accountability tests. When talking about the TAKS test in Texas, the word proficient is often used to refer to the minimum passing requirement, and the term commended is used to describe the higher level.

In Texas, parents will hear the word proficient used to refer to the minimum standard of passing the TAKS. That is not how the word is used nationally, and it is not how I will use it here. I will try to avoid confusion where I can. But I suspect that the Texan use of the word proficiency is a form of grade inflation, attempting to make families feel that, as in Lake Wobegon, in Texas all children are above average.

Comparison among states

The report compares state standards for the proficiency level (not the basic level). That is, when it comes to Texas, the report looks at the level needed to score a commended TAKS result. It works out what the NAEP cut-off would be for getting a commended result on the 4th and 8th grade math and reading TAKS. It does this for each state for which there was sufficient data. This allows us to compare the proficiency levels from state to state.

In the 2007 data, Texas falls below the national average in its commended levels for 4th and 8th grade math and reading. Texas is fourth from the bottom in 8th grade reading, beating out only North Carolina, Georgia, and Tennessee. (Note that DC, Nebraska, and Utah weren't included in this measure due to insufficient data.) For 8th grade math, Texas is near the middle of the pack. For 4th grade reading and math, Texas falls near the top of the bottom third.

States with higher proficiency (commended) standards have fewer students meeting those standards. There should be no surprise there. This leads to the question of whether it matters at all where states set their proficiency standards. Remember that proficiency standards are higher than the basic standards, which all students are expected to meet. It turns out that states that set themselves higher proficiency standards appear to get better results on the national NAEP exams. Whether the setting of higher standards is the cause of those higher scores is unknown. It should be noted that this relationship is much less pronounced for 8th grade reading, where it is not statistically significant.

Comparison over time

The question to ask with respect to any race to the bottom is whether states are lowering their own standards over time. The rest of the report compares 2005 and 2007 data. Getting the comparisons is mathematically tricky, and so is the statistical inference. The report discusses its techniques in great detail, which I have yet to carefully review.
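As I understand it from the report's summary, the core mapping idea is equipercentile linking: if p percent of a state's students reach the state's commended cut-off, then the NAEP equivalent of that cut-off is the score at the (100 − p)th percentile of those same students' NAEP score distribution. Here is a minimal sketch of that idea in Python; the function name and toy data are my own, and the actual report works with weighted samples and standard errors rather than raw score lists.

```python
def naep_equivalent(pct_meeting_standard, state_naep_scores):
    """Equipercentile sketch: if pct_meeting_standard percent of a
    state's students meet the state cut-off, return the NAEP score at
    the (100 - pct)th percentile of those students' NAEP scores."""
    scores = sorted(state_naep_scores)
    rank = (100 - pct_meeting_standard) / 100 * (len(scores) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(scores) - 1)
    frac = rank - lo
    # Linear interpolation between the two nearest observed scores.
    return scores[lo] + frac * (scores[hi] - scores[lo])

# Toy example: NAEP scores spread uniformly from 200 to 300.
toy_scores = list(range(200, 301))
# If 30% of students meet the state standard, the implied NAEP
# equivalent sits at the 70th percentile of this toy distribution.
print(naep_equivalent(30, toy_scores))
```

The intuition: the easier a state's cut-off, the more students clear it, and the lower the NAEP score at the matching percentile, so lenient standards map to low NAEP equivalents.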

For each of 4th and 8th grade math and reading, they did two kinds of comparisons. The first simply looks at how the NAEP scores corresponding to the commended cut-offs changed from 2005 to 2007. Here, Texas had no real change in 4th or 8th grade reading or 4th grade math (there was a decline in NAEP points, but it was within the margin of error for the analysis). But for 8th grade math there was a statistically meaningful decline of 4.2 points on the NAEP scale.

The report also looked at change in state standards in another way: did a state have a large increase in the number of students reaching the commended (proficient) level from 2005 to 2007 without a comparably large (or any) increase in the number of students improving on the NAEP? Such a gap suggests the state test has become easier relative to the national benchmark.

Using this measure Texas students showed significantly more improvement on the Texas tests than on the national tests in 4th grade reading, 4th grade math, and 8th grade math.

Are the state standards getting easier?

The pattern of change described for Texas can be seen in many states (while other states are going in other directions). But does this mean that states are lowering their standards in a race to the bottom? It certainly could mean that, but I suspect that this is more a consequence of schools getting better at preparing students for the state tests.

Schools are teaching test-taking skills that are geared to the state tests. They are providing hot breakfasts on test days, and they are perfecting their ways of motivating students and families to perform well on these tests. And within the actual teaching of content, there may be an increase in teaching to the test. Much of this effort to improve state test scores will not carry over to the NAEP tests. The state accountability tests are very high stakes for the schools, while the NAEP tests have little direct consequence for students, teachers, or schools.

So schools will be engaging in activities that improve state test performance but do little for NAEP performance. That way, we can explain the reported results without states formally lowering their standards. Of course, if I am right about this, it means that we should be even more skeptical of improvements in state test results. They don't reflect a real increase in learning, but instead improvements in taking the state tests.

Monday, September 28, 2009

Gingrich, Sharpton and Duncan road show: Longer school days

The idea of Newt Gingrich and Al Sharpton going on tour together boggles the mind. (Though I do recall having seen Gordon Liddy and Timothy Leary do a psycho/schizo duet back in the 80s.) But apparently this is serious and includes Arne Duncan, Secretary of Education.

Well, the first stop on the tour is in Philadelphia tomorrow (September 29, 2009); and Duncan, possibly prompted by being surrounded by people who don't hesitate to speak their minds, has advocated for longer school days. As a report in the Philadelphia Inquirer states:

"Six hours a day just doesn't cut it," said Duncan, who comes to town tomorrow to tour two city schools and meet with local education officials. "Our school calendar's based on a 19th century agrarian economy. I'm sure there weren't too many kids in Philadelphia working in their parents' fields this summer."

This simple truth points to one of the most obvious things we can do to improve education in the US. We know that children spend more time in school each year in other OECD countries. And we know that children (particularly poor children) are helped by longer school days and a longer school year. And if I didn't have to work on my homework, I would look up the sources for my assertions here.

As a prospective teacher, it is not in my personal interest to have longer school days and a longer school year. I'd love to come up with an excuse to advocate against these; but I can't. The facts (which I really will try to cite in an update) are clear. When so many ideas for improving education in America have mixed research behind them, it is nice to have something that is so clear cut.

I need to return to my teacher training homework now; so this posting stops here.

Thursday, September 24, 2009

Promising noises from the Secretary of Education

Secretary of Education Arne Duncan was interviewed by the Christian Science Monitor and made some very promising remarks regarding NCLB in my opinion. There was nothing even approximating specifics, but I think that he hit on a key insight:

[Duncan] hopes to essentially turn the law on its head. The Bush administration’s legislation, he says, kept the goals loose but the steps tight. He hopes instead to see a law that keeps the goals tight but the steps loose.

Here Duncan is referring to the fact that NCLB very tightly monitors how each state meets its own (loose) standards. This can lead to what I and others have called a race to the bottom between states, particularly when states work to avoid comparison of their education standards.

Exactly how an overhaul of NCLB will tighten or provide some uniformity of the goals is not something I know. I can imagine a range of mechanisms each with their own advantages and problems.

Set a national curriculum
The problems with this are legion. I won't dwell on them other than to say there is little reason to believe that the federal government would do a better job at this than even the worst of our fifty states.
Provide interstate comparisons to parents
When parents get accountability information about their child's school and their child's test scores, simply have these compared to national norms. If state officials can no longer hide their state's performance from parents, that might be enough to get states to start racing to the top. A difficulty with this is that it may require even more testing of students using a nationally normed test. There may be technical ways to get comparable data that won't involve more testing, but it will take some thinking about. Another difficulty with this approach is that the parental pressure it generates may be insufficient to do the job. Finally, we know that it is parents in the upper middle class who exert the most political pressure, but even in lagging states their children will probably be performing above the national norm.

Some combination of those and other things may be part of what gets proposed. I eagerly await the plan. As for loosening the controls on exactly how states meet the (tighter) goals, I can't even begin to speculate. From a philosophical point of view, Duncan's remarks seem very promising and sensible. Although I have no idea how to achieve this, I am looking forward to a more specific announcement.

Thursday, September 17, 2009

Thinking about assessment

The education literature likes to make a distinction between assessment for learning and assessment of learning. The distinction is, in my view, a necessary insight, but the way it is conceived is both too limiting and prone to confusion. In this rant I am going to present a somewhat richer framework for discussing different types of assessment for different purposes.

Where I'm coming from

As I've mentioned before, I am training to be a high school math teacher, and I am enrolled in what I consider to be an outstanding program through Collin College. I must confess that when I signed up for the program, I, in my arrogance, did not think that I would learn much. I am pleased to report that I was dead wrong. I won't go into why I was wrong, but I will say that I go to bed thinking about the ideas that come up from class discussion and readings and I wake up thinking about them. I remain (very) critical of some of the argumentation and scholarship in the readings, but it is extremely helpful for me to read them. I'm gobbling them up and loving it.

I have been, and remain, highly critical of the kinds of testing and incentive systems that have been set up by NCLB, even though I fully support the goal of holding schools and districts accountable for how well they serve all students, particularly the ones who are at risk of being left behind. Please see my previous posts on the matter (and more to come). NCLB does appear to be reaching that stated goal, but it distorts the educational system as a whole and hinders progress in other important areas. But this essay is about assessment (testing and similar things). Whether you are a critic or supporter of NCLB, you will agree that it has greatly intensified the amount and importance of (standardized) testing in schools.

The Educators' Complaint

The education literature makes a distinction between assessment of learning and assessment for learning. A similar distinction is also called summative assessment and formative assessment. I will not attempt to give a full definition of these here. I don't think the definitions in the literature bear up under close inspection, and the fuller the definition, the less enlightening it is. Instead, here is the rough idea, through examples. Assessment of learning includes things like the TAKS, end-of-term exams, and major examinations that determine a student's grade. Assessment for learning is the ongoing assessment that teachers engage in while teaching: asking questions of the class, seeing what sorts of questions students ask. These are considered for learning because they help the teacher adapt teaching to the particular student.

The problem with our increased emphasis on assessment of learning is that most of that assessment isn't pedagogically useful. Some even argue that it is harmful in and of itself beyond the misdirection of resources (although I have my doubts about that claim). NCLB is a reality (which really does appear to be meeting its narrow, but important, goals), but the concern among educators is that it leads to too much pedagogically useless assessment. I agree, but I think that we are talking about assessment in a far too limiting framework.

Distinguishing distinctions

When we look at assessment, and try to categorize it, I think we need to be looking at two dimensions, instead of the one-dimensional approach of the of-for distinction. We need to ask:

  1. What is the form of the assessment?
  2. What is the purpose of the assessment?

The current discussion seems to assume that all standardized tests (form) serve only to assess what a student has learned and not to adjust teaching (purpose), while all of the less formal (form) assessments are used only to adjust teaching (purpose). Certainly there is a strong connection between form and function, but when looking at assessment it will be useful to look along these two not-quite-independent dimensions.

Three purposes

When it comes to considering the various purposes of assessment I think that it is helpful to consider three separate purposes, not just the two in the existing conceptualization.

  1. Adjusting: to help adjust teaching to the needs of the particular student
  2. Grading: to provide feedback to student and family, to assign grades and work as an incentive
  3. Accounting: to evaluate the teaching of the teacher, school, or district

Accounting is what we see in the testing that follows from NCLB. It is about rating and evaluating schools and districts (and within districts it will be used to evaluate teachers). It is the school administrators who have the most to gain or lose by these test results. And they are typically done at the end of the school year. Although students who fail the test will be intensively tutored so that they will pass a retake, these tests are not used to help students directly.

Grading is typically the assessments that a course grade is based upon. These are presented to parents and students. These become part of a student's record and are intended to indicate how much the student learned. Of course these will also feed back on how a particular student is taught. A teacher can learn from these that a student is not meeting expectations and so can look for ways to help the student. One characteristic of grading assessment is that it (almost) never goes beyond what has been taught in class.

Adjusting is used primarily to help determine how to teach a particular student. These assessments range from everyday queries while teaching, to see whether students are getting it, to the kinds of evaluations used to determine whether a student should be in a gifted and talented program or in special education. The latter typically involve highly formalized exams, but they are used exclusively for determining how best to teach an individual student. Homework may be part of a student's grade (usually to get them to do it), but it is used primarily as a frequent check of whether something needs to be retaught.

Any particular assessment can (and often will) serve multiple purposes. But when looking at any particular assessment it is useful to keep those three purposes in mind.

Form follows function except for when it doesn't

If you've been talking about the differences between similes and metaphors in class, you may ask for examples to help with the learning that day (adjusting). But you may also ask for examples of each on an end-of-term examination (grading). So the same form can be used for different purposes in different contexts. I've praised the MAP testing that PISD does, but I honestly don't know what they use it for. I would hope that they use it to help differentiate teaching (adjusting), but it may be used primarily to track teacher performance (accounting). So here a particular standardized test, administered exactly the same way, could be used for entirely different purposes.

Some forms of assessment really are single purpose. Some, like the Texas TAKS tests, can't be used for much other than accounting, and then only a limited type. The test is designed to distinguish students who have acquired the basic knowledge expected for the grade level from those who have not. It doesn't do a very good job of discriminating among students at the high end or the very low end. It is hard for me to imagine a set of exams more narrowly focused on one purpose.

With understanding come solutions

This understanding of purposes can bring real, practical recommendations. Since the TAKS serves little direct pedagogical purpose other than accounting, we could save a great deal of time and money (which could then go to actually improving education) by sampling. Not every student needs to take the TAKS in every subject. Consider fifth grade TAKS requirements. Students take Reading, Math, and Science. Not counting make-ups and such, that takes three full days for the students to complete. But if the goal is to measure a school's performance, then have one third of the students take Reading, one third Math, and one third Science. Students would be randomly assigned, with neither student nor school staff knowing which student gets which test until test day. All of the tests could then be given on the same day.

I believe that the framework I've introduced above, first separating form from purpose and then distinguishing three separate purposes for assessment, allows for a more useful discussion of assessment than is common. At least it helps me think about these things more carefully, and I hope it does the same for any readers I might have.