new! The Impact of Testing in Maryland (2011)

The objective of this project is to investigate validity evidence based on the consequences of Maryland's two statewide tests, the MSA and HSA, focusing on the impact on three groups: students, teachers, and central administrations.

1. The Impact of Testing in Maryland: Main effects tables
2. The Impact of Testing in Maryland: Interaction tables
3. Principal Component Analysis (PCA) of the Survey Data: Item loadings
4. Full Report of The Impact of Testing in Maryland

new! Using Student Growth Models For Evaluating Teachers and Schools (2012)

new! The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality (2012)

new! PowerPoint Presentation (longer version)
new! Presentation for AERA Division H, April 2012 - Vancouver, Canada

new! A Comparison of VAM Models (2012)

Quality Control Charts in Large-Scale Assessment Programs (2010)

Consideration of Test Score Reporting Based on Cut Scores (2009)

Modeling Growth for Accountability and Program Evaluation: An Introduction for Wisconsin Educators (2009)

This work was funded by a contract between the senior author, AIR, and the Wisconsin State Department of Education. We thank the state for permission to make this paper available.

Multiple Choice Items and Constructed Response Items: Does It Matter? (2008)

Content and Grade Trends in State Assessments and NAEP (2007)

Each state is required by the No Child Left Behind Act to report the percentages of its students who have reached a score level called "proficient" or above at certain grades in the content areas of reading (or a similar construct) and math. Using 2005 data from states' public web sites and the National Assessment of Educational Progress (NAEP), state-to-state differences in these percentages were analyzed, both unconditionally and conditionally on NAEP, for (1) trends across content areas (horizontal moderation), (2) trends across grade levels (vertical moderation), and (3) consistency with NAEP. While there was considerable variation from state to state, especially along an idealistic-realistic dimension, the results generally show that states are relatively consistent in trends across grades and content areas.
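As a rough illustration of the conditional analysis described above, the sketch below regresses state percentages on NAEP percentages and inspects the residuals. All state labels and numbers are invented for illustration; they are not the 2005 data analyzed in the paper.

```python
import numpy as np

# Hypothetical percents proficient for a handful of states (illustrative only).
state = np.array(["A", "B", "C", "D", "E"])
state_pct = np.array([72.0, 55.0, 81.0, 60.0, 68.0])   # state test, one grade/subject
naep_pct = np.array([30.0, 28.0, 35.0, 24.0, 33.0])    # NAEP, same grade/subject

# Unconditional comparison: raw spread in state-reported percents.
print("unconditional spread:", state_pct.max() - state_pct.min())

# Conditional comparison: regress state percents on NAEP percents and look at
# residuals. A large positive residual flags a state whose reported percent
# proficient is high relative to what its NAEP performance would predict
# (the "idealistic" end of the idealistic-realistic dimension).
slope, intercept = np.polyfit(naep_pct, state_pct, 1)
residuals = state_pct - (intercept + slope * naep_pct)
for s, r in zip(state, residuals):
    print(f"state {s}: residual {r:+.1f}")
```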

Universal Design in Educational Assessments (2006)

Consistency in the Decision-Making of HSA and MSA: Identifying Students for Remediation for HSA (2006)

This study examined the consistency in the decision-making of the HSA and MSA, with the purpose of identifying students who are at risk of failing the HSA. The data on which the study was based consisted of HSA and MSA scores collected from four counties in Maryland. The HSA scores were obtained from the 2004 administration in the subject area of English, and the MSA scores came from the 2003 administration in reading.

In this study, existing cut scores were used to dichotomize the MSA scale and re-categorize students so that the resulting categorizations could be compared with the pass/fail determinations of the HSA. Results show that the cut score for passing the HSA was more demanding than the cut score for the "proficient" category on the MSA. They also show that the cut score for the "advanced" category on the MSA was set slightly too low if the purpose is to identify students who are likely to fail the HSA. With regard to which cut score to use to identify students for remediation, it was recommended that students below "proficient" be selected initially, so that wasted remediation resources are minimized.
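The sketch below illustrates the kind of decision-consistency tabulation behind this study, using invented scores and cut values; the operational Maryland cut scores are not reproduced here.

```python
import numpy as np

# Hypothetical paired scores and cuts (illustrative only).
msa = np.array([390, 410, 452, 371, 425, 498, 405, 440])
hsa = np.array([398, 415, 430, 380, 402, 470, 396, 433])
MSA_PROFICIENT_CUT = 400   # hypothetical "proficient" cut on the MSA scale
HSA_PASS_CUT = 412         # hypothetical passing cut on the HSA scale

msa_below = msa < MSA_PROFICIENT_CUT     # flagged for remediation
hsa_fail = hsa < HSA_PASS_CUT            # the outcome we want to anticipate

# 2x2 decision table: how often does the MSA flag agree with the HSA outcome?
agreement = np.mean(msa_below == hsa_fail)
# Students who fail the HSA but were NOT flagged (missed by remediation).
missed = np.mean(hsa_fail & ~msa_below)
# Students flagged who pass anyway (remediation capacity spent unnecessarily).
unneeded = np.mean(msa_below & ~hsa_fail)
print(f"agreement {agreement:.2f}, missed {missed:.2f}, unneeded {unneeded:.2f}")
```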

The Prediction of Performance on the Maryland High School Graduation Exam: Magnitude, Modeling and Reliability of Results (2006)

This research identified potential predictors of at-risk students before they take the Maryland High School Assessment (HSA) English examination. The research was based on data collected for the years 2002-2004 for students in four different school systems. To the extent possible, the study used the same data in each of the four school systems. The systems differ considerably in the nature of their student populations, their size, and whether their setting is more urban or rural.

The analysis of these data was done separately for each county, in a sequential manner: initial calculation of descriptive statistics, followed by ordinary least squares (OLS) regression, and finally multilevel modeling (HLM). The results for the four school systems are compared on three factors: 1) the similarity of the variables that are significantly related to HSA performance (their reliability); 2) the modeling approach (OLS versus HLM) that works best for predicting HSA scores; and 3) the magnitude of the prediction. Several potential indicators of HSA performance are identified and discussed in the paper. These include two measures of reading (performance on MSA Reading and the Scholastic Reading Inventory), poverty, special education status, and English Language Learner status. There was also some evidence that a student's attendance, midterm English scores, and GPA are related to his or her performance on the HSA English 1 exam.
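For readers who want a feel for this kind of sequential analysis, here is a minimal sketch in Python. The data file and column names are hypothetical, and statsmodels' MixedLM stands in for whatever HLM software was actually used in the study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per student; all column names below are invented for illustration:
# hsa_english, msa_read, sri, poverty, sped, ell, school.
df = pd.read_csv("students.csv")  # hypothetical file

# Step 1: descriptive statistics.
print(df[["hsa_english", "msa_read", "sri"]].describe())

# Step 2: ordinary least squares, ignoring the nesting of students in schools.
ols_fit = smf.ols("hsa_english ~ msa_read + sri + poverty + sped + ell",
                  data=df).fit()
print(ols_fit.summary())

# Step 3: a multilevel model with random intercepts for schools.
hlm_fit = smf.mixedlm("hsa_english ~ msa_read + sri + poverty + sped + ell",
                      data=df, groups=df["school"]).fit()
print(hlm_fit.summary())
```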

Growth Scales as an Alternative to Vertical Scales (2006)

Vertical scales are intended to allow longitudinal interpretations of student change over time, but several deficiencies of vertical scales call their use into question. These deficiencies were discussed, growth scales, a criterion-referenced alternative, were described, and some considerations in developing and using growth scales were suggested.

Harford Reading Excellence Program 2001-2002 - Focus: Program Evaluation & Support

The Harford Reading Excellence Act Grant program was a two-year program that strove to improve reading instruction in five target schools in Harford County (MD) Public Schools by implementing research-based instructional approaches through professional development and by providing a literacy-centered summer school program. This report summarized and highlighted student outcomes based on Reading Excellence Act Grant program data as well as data from the Maryland State Department of Education. Contact: Robert Lissitz; Melissa Fein performed the evaluation.

Weighting Components - Focus: High School Assessments

This topic was prompted by the problem of weighting constructed-response and selected-response items in the high school assessment program. A literature-based paper was written by Lawrence Rudner and forwarded to the Psychometric Council. The paper was subsequently accepted for publication in Educational Measurement: Issues and Practice.
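The core issue in such weighting is that the nominal weights assigned by policy and the effective weights realized in the composite diverge when the component score distributions differ. A small sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical section scores: selected-response (SR) has a much larger
# standard deviation than constructed-response (CR).
sr = rng.normal(50.0, 10.0, size=500)
cr = 12.0 + 0.18 * (sr - 50.0) + rng.normal(0.0, 3.0, size=500)

# Nominal policy weights.
w_sr, w_cr = 0.5, 0.5
composite = w_sr * sr + w_cr * cr

def effective_weight(w, x, comp):
    """Share of composite variance contributed by a weighted component."""
    return w * np.cov(x, comp)[0, 1] / np.var(comp, ddof=1)

# Despite the nominal 50/50 split, the higher-variance SR section dominates.
print("effective SR weight:", round(effective_weight(w_sr, sr, composite), 2))
print("effective CR weight:", round(effective_weight(w_cr, cr, composite), 2))
```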

MD Assessment Web Site - Focus: All Assessment Programs

A searchable database for all known literature on Maryland assessment programs was needed. Lawrence Rudner was the lead researcher and used MARCES-funded, ERIC-based personnel for support. They received copies of all papers MSDE had collected over the years and organized the database. See /mdarch/

Evaluating MSPAP - Focus: MSPAP

MSDE had received two evaluative reviews of MSPAP: a content review chaired by Bill Evers and a psychometric review chaired by Ronald Hambleton. Each generated considerable reactions from MSDE and outside individuals, and MSDE needed an evaluative synthesis of all this material, with recommendations about how to proceed to improve assessment in Maryland. After much discussion, MARCES decided to solicit an independent contractor for this purpose. The study was done by Edys Quellmalz at SRI (Stanford). Bill Schafer was the MARCES lead for the project.

SE(PAC) - Focus: MSPAP

There was interest in expanding the methodology for generating standard errors of percents above cut to the district and state levels, but also concern that the theory may not support generalization to those levels of aggregation. A Monte Carlo study was proposed and agreed to. The MARCES contact was Bill Schafer; Yuan Lee performed the study as a subcontractor.
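A minimal sketch of the kind of Monte Carlo study proposed, assuming a simple two-level (students within schools) model; all parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
CUT = 0.0           # cut score on a standardized scale (hypothetical)
N_SCHOOLS = 50      # schools in a hypothetical district
N_STUDENTS = 80     # students per school
N_REPS = 2000       # Monte Carlo replications

district_pacs = []
for _ in range(N_REPS):
    # School means vary (between-school variance); students vary within.
    school_means = rng.normal(0.0, 0.4, size=N_SCHOOLS)
    scores = rng.normal(school_means[:, None], 1.0, size=(N_SCHOOLS, N_STUDENTS))
    district_pacs.append(np.mean(scores > CUT))

# Empirical SE of the district-level percent above cut (PAC). Comparing it to
# the SE implied by treating the district as one simple random sample shows
# how much clustering inflates the true uncertainty at higher aggregations.
pac = np.mean(district_pacs)
print("district PAC:", pac)
print("Monte Carlo SE:", np.std(district_pacs, ddof=1))
print("naive SRS SE:  ", np.sqrt(pac * (1 - pac) / (N_SCHOOLS * N_STUDENTS)))
```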

Evaluating Service Learning - Focus: High School Graduation Requirements

Maryland requires a service-learning project for high school graduation. The requirement was evaluated for its effectiveness in achieving its purpose and for the quality of its implementation. The MARCES contact was Bob Lissitz; Melissa Fein, a MARCES affiliate, performed the study.

Test-Based Methods for Evaluating Teachers - Focus: General

Measurement of teacher quality has obvious potential for policy as well as personnel decisions, and it seems reasonable to base assessment of teacher quality on student outcomes. Since the outcomes deemed most important are measured with educational tests, it is desirable to explore ways to quantify teacher quality with data from them. There are, however, several methodological approaches to this problem. A study to compare them was carried out by Terry Alban on a contract basis; Bob Lissitz was the MARCES contact.
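To give a flavor of what comparing such approaches involves, the sketch below contrasts two simple quantifications on simulated data: mean gain scores versus covariate-adjusted residual means. It is illustrative only and not the method used in Alban's study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated data: 20 teachers, 25 students each, pre- and post-test scores.
n_teachers, n_per = 20, 25
teacher = np.repeat(np.arange(n_teachers), n_per)
true_effect = rng.normal(0, 2, n_teachers)   # unobserved teacher effect
pre = rng.normal(50, 10, n_teachers * n_per)
post = 10 + 0.8 * pre + true_effect[teacher] + rng.normal(0, 5, pre.size)
df = pd.DataFrame({"teacher": teacher, "pre": pre, "post": post})

# Method 1: mean gain score per teacher.
df["gain"] = df["post"] - df["pre"]
gain_est = df.groupby("teacher")["gain"].mean()

# Method 2: covariate adjustment -- regress post on pre, then average the
# residuals within teacher (a bare-bones value-added estimate).
slope, intercept = np.polyfit(df["pre"], df["post"], 1)
df["resid"] = df["post"] - (intercept + slope * df["pre"])
vam_est = df.groupby("teacher")["resid"].mean()

# The two methods can rank teachers differently; that divergence is exactly
# why a comparison of methodological approaches is worth doing.
print("gain vs. VAM estimates:", np.corrcoef(gain_est, vam_est)[0, 1])
print("VAM vs. true effects:  ", np.corrcoef(vam_est, true_effect)[0, 1])
```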

Annual Conference - Focus: General

The annual MARCES conference is held in mid-August. Bob Lissitz is the MARCES contact.

Computer Adaptive Testing - Focus: High School Assessments

It would be desirable to have a diagnostic, computer-adaptive assessment for candidates who are about to take the high school assessments. It would yield a likelihood of passing and identify areas of particular weakness, should they exist. To demonstrate what is feasible, a prototype system was under development. Lawrence Rudner was the MARCES lead and Phill Gagne was assisting.
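The sketch below shows the core loop of such a system under simplifying assumptions (a Rasch item bank, fixed test length, maximum-information item selection); it is not the prototype Rudner and Gagne were building.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical Rasch item bank: 200 items with difficulties spread over the scale.
bank = rng.uniform(-2.5, 2.5, size=200)
true_theta = 0.8              # simulated examinee ability (unknown in practice)

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

theta = 0.0                   # provisional ability estimate
administered, responses = [], []
for _ in range(20):           # fixed-length CAT, for simplicity
    # Maximum-information selection: under the Rasch model, the most
    # informative unused item is the one with difficulty closest to theta.
    unused = [i for i in range(len(bank)) if i not in administered]
    item = min(unused, key=lambda i: abs(bank[i] - theta))
    administered.append(item)
    responses.append(rng.random() < p_correct(true_theta, bank[item]))

    # Re-estimate theta with a few Newton-Raphson steps on the log-likelihood.
    for _ in range(5):
        p = p_correct(theta, bank[administered])
        info = np.sum(p * (1 - p))
        theta += np.sum(np.array(responses) - p) / info
    theta = float(np.clip(theta, -4.0, 4.0))  # keep estimate on scale early on

print("final theta estimate:", round(theta, 2))
print("approximate SE:", round(1.0 / np.sqrt(info), 2))
```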

Confounding with Writing - Focus: MSPAP

Since the MSPAP format was constructed-response, the potential existed for writing to be confounded with other achievement domains as an artifact of the assessment itself. The degree to which the MSPAP scoring process was resistant to such confounding had not been assessed. A study was planned in which writing quality was manipulated for responses with good and with poor content. Bob Lissitz was the MARCES contact and Phill Gagne was assisting.
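One way to analyze such a manipulation is a 2x2 (content quality x writing quality) design, where a significant writing main effect on content scores would signal an artifact. The sketch below runs that analysis on simulated data; it is illustrative only, not the planned study's design.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)

# Simulated 2x2 design: content quality x writing quality, 50 responses per cell.
rows = []
for content in ("poor", "good"):
    for writing in ("poor", "good"):
        base = 2.0 + (1.5 if content == "good" else 0.0)
        # Inject a writing effect on the CONTENT score: this is the confound.
        base += 0.6 if writing == "good" else 0.0
        rows += [{"content": content, "writing": writing,
                  "score": base + rng.normal(0, 0.8)} for _ in range(50)]
df = pd.DataFrame(rows)

# If scoring resists confounding, the writing main effect on content scores
# should be negligible; here it is built in, so the ANOVA should detect it.
fit = ols("score ~ C(content) * C(writing)", data=df).fit()
print(anova_lm(fit))
```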

Process Control for MSPAP - Focus: MSPAP

Each year the Psychometric Council reviewed each phase of the MSPAP analyses. Since eight years of data existed by this time, it seemed reasonable to use process control methods to identify, quantitatively, the areas where the Council should focus particular attention. Bill Schafer was the MARCES contact and Ying Jin was assisting.
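A minimal sketch of a Shewhart-style check on a yearly summary statistic, with invented values; an operational application would estimate the process sigma more carefully (for example, from moving ranges) rather than from the eight values themselves.

```python
import numpy as np

# Eight years of a hypothetical MSPAP summary statistic (e.g., a mean scale
# score for one grade and content area); the values are illustrative only.
years = np.arange(1993, 2001)
stat = np.array([521.4, 523.1, 519.8, 522.5, 524.0, 520.9, 527.6, 522.2])

# Shewhart-style limits: center line at the historical mean, control limits
# at +/- 3 standard deviations of the yearly values.
center = stat.mean()
sigma = stat.std(ddof=1)
lower, upper = center - 3 * sigma, center + 3 * sigma

for y, s in zip(years, stat):
    flag = "REVIEW" if (s < lower or s > upper) else "ok"
    print(f"{y}: {s:6.1f}  {flag}")
```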

Validity of Accommodations - Focus: MSPAP

Tippets, using confirmatory factor analysis for accommodated vs. non-accommodated students, evaluated the comparative internal structure of MSPAP. However, newer insights from the structural equation modeling field had never been applied to her work. Further, whether or not it seems logically appropriate, it may not have been reasonable to report reading scores for students who received the reading accommodation if the validity of MSPAP differed for them. Bill Schafer was the MARCES contact and Mara Freeman was assisting.
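A bare-bones version of the group-wise comparison: fit the same measurement model separately to accommodated and non-accommodated students and compare the estimates, a configural-style check rather than a formal invariance test. The data file, item names, factor structure, and the use of the semopy package are all assumptions for this sketch.

```python
import pandas as pd
import semopy

# Item scores x1..x6 plus an "accommodated" flag (all hypothetical).
df = pd.read_csv("mspap_items.csv")  # hypothetical file

desc = """
reading =~ x1 + x2 + x3
math    =~ x4 + x5 + x6
"""

# Fit the same measurement model in each group and compare the loadings.
for label, group in df.groupby("accommodated"):
    model = semopy.Model(desc)
    model.fit(group)
    print(f"accommodated = {label}")
    print(model.inspect())
```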

Web-Based Technical Manual - Focus: MSPAP

The MSPAP technical manual was recreated and redistributed every year, although much of it was redundant. It seemed possible to use the web to update the manual; perhaps only one manual was needed, with tables and text added only where material was no longer current at a given time. We considered organizing the manual around the AERA/APA/NCME Standards for Educational and Psychological Testing. Bill Schafer was the MARCES contact and Mara Freeman was assisting.

Combining Data for School Performance Indices - Focus: MSPAP

At the time, only MSPAP data and attendance were combined into school performance indices (SPIs) for elementary schools, and there was interest in adding a component for CTBS/5 data. A review of approaches in the various states, with emphasis on several states that illustrate very different approaches, was developed. A briefing paper on the desirability of adding a new data source to the SPI was also developed, and a meeting was held with the State Superintendent where the topic was discussed. Bill Schafer was the MARCES contact and Geoff Wiggins was assisting.
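To make the weighting question concrete, the sketch below computes an index with and without a CTBS component. The weights and school values are invented for illustration and are not Maryland's actual SPI formula.

```python
# Hypothetical school-level inputs to an SPI (all on percent scales).
mspap_satisfactory = 58.0   # percent satisfactory on the MSPAP composite
attendance_rate = 94.5      # average daily attendance
ctbs_on_level = 61.0        # percent at/above level on CTBS/5 (new component)

# One simple way to add a component: renormalize the policy weights so they
# still sum to 1. These weights are invented for illustration.
old_weights = {"mspap": 0.80, "attendance": 0.20}
new_weights = {"mspap": 0.65, "attendance": 0.15, "ctbs": 0.20}

spi_old = (old_weights["mspap"] * mspap_satisfactory
           + old_weights["attendance"] * attendance_rate)
spi_new = (new_weights["mspap"] * mspap_satisfactory
           + new_weights["attendance"] * attendance_rate
           + new_weights["ctbs"] * ctbs_on_level)
print(f"SPI without CTBS: {spi_old:.1f}")
print(f"SPI with CTBS:    {spi_new:.1f}")
```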