Finding 2.1 & 2.2: NCTQ argues that teachers should receive an annual official rating of some kind and that it should occur early enough in the school year to "provide sufficient time for struggling teachers to improve and for administrators to make a final decision about a teacher's continued employment within that school year." There is some merit to these statements - administrators should definitely know what is happening in their classrooms, and teachers should have enough time to improve. I would argue, however, that any good principal does this all the time and does not need a formalized process to ensure high-quality instruction is taking place.
Additionally, NCTQ puts pressure on the teachers' contract, pointing to the distinction between a yearly "rating" and a yearly "evaluation" as a reason why low-quality tenured teachers remain in the classroom. I find this to be a misuse of the term "tenure." A teacher with 3+ years of experience is not guaranteed a job - there is simply a procedure in place that makes dismissal possible within a 3-year cycle of formal observations. Oftentimes, however, the process is difficult to understand, and principals - juggling many other responsibilities with less budget money for assistants and secretaries - may not find the time to complete the procedures.
Finding 2.3: This is the section everyone expects in a discussion of evaluation. NCTQ argues that more test-driven data, using value-added modeling (VAM), be used to evaluate teachers "objectively." As I have mentioned on this blog before, there is considerable debate over whether VAM is a valid and reliable measure of teacher effectiveness. Gary Rubinstein has a six-part analysis of the NYC VAM data that debunks the assumption that a teacher's scores remain stable year-to-year or school-to-school. I doubt the data is much better in Philadelphia.
Finding 2.4: NCTQ finally brings up that multiple measures are important in evaluating teachers. Bill Gates himself has cautioned against relying too heavily on test scores when measuring teacher effectiveness. One of the most important parts of this finding is that outside observers are critical, especially peers who are experts in your subject area when you teach high school. While I was assigned a new-teacher coach my first few years of teaching, they were entirely unaware of the trials of a math classroom, and all they had to say on multiple occasions was "good job." I find it hard to swallow, however, that there is a magic "percentage" that would keep teachers from focusing on the test at the expense of other aspects of their teaching. Next year in Philadelphia we will be evaluated on observations (50%), teacher-specific test scores (15%), building-level test scores (15%), and elective data (20%). That is a lot of testing to worry about.
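To make the arithmetic of those weights concrete, here is a minimal sketch of how a weighted composite rating works. The component names and the 0-100 sample scores are purely illustrative - this is not the district's actual formula, only the 50/15/15/20 weighting quoted above.

```python
# Hypothetical composite-score calculation using the Philadelphia
# weights quoted above. Component names and scores are illustrative.
WEIGHTS = {
    "observations": 0.50,
    "teacher_test_scores": 0.15,
    "building_test_scores": 0.15,
    "elective_data": 0.20,
}

def composite_score(components: dict) -> float:
    """Return the weighted sum of component scores (each on a 0-100 scale)."""
    assert set(components) == set(WEIGHTS), "all four components required"
    return sum(WEIGHTS[name] * score for name, score in components.items())

# Example: a strong observation score can mask weaker test components,
# and vice versa, because observations carry half the weight.
example = {
    "observations": 90,
    "teacher_test_scores": 70,
    "building_test_scores": 60,
    "elective_data": 80,
}
print(composite_score(example))
```

Note that even though observations nominally dominate at 50%, the three test-derived components together make up the other half of the score, which is the point of the complaint above.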
Finding 2.5: NCTQ argues something positive here - citing DC teachers as an example - about focusing on what teachers are evaluated on during formal observations. We currently use a framework with a variety of weights for certain components, and the weighting often seems a bit arbitrary. So much so that while I received a glowing rating this year, I was nearly rated unsatisfactory two years ago. How can I be that different today?
Finding 2.6: NCTQ argues that we should be rated on a scale with more than just Pass/Fail (Satisfactory/Unsatisfactory). I can agree to a certain extent, but I am wary of what that will mean in the future. If some teachers are declared "distinguished" and given performance bonuses, it is entirely possible that high-quality teachers will dumb down their instruction in order to teach to a test and win that award. I am not saying it is bound to happen, but I am worried.
Overall, this section was unsurprising. Most education reformers say similar things nowadays about incorporating "objective" data into evaluations. From my research, these tests are not truly objective, nor are the mathematical frameworks that VAM uses to compare teachers to one another.