Tuesday, May 31, 2011

Data Quality - An Evaluator's Job?

Recently, P. Allison Minugh posted this question on the AEA group on LinkedIn:

I find there isn't much interest in data management, so I am curious: How important is data management to your evaluation studies, and why or why not? 

My response was:

In South Africa, the issue of data management has been consistently handled under what we call the "data quality" or "information quality" specialization fields. It has become increasingly visible at our evaluation conferences, and we are starting to develop a framework for the training and certification of information quality professionals.

Recently there was a Data Quality Conference in Pretoria, and my impression was that Data Management seems to be an IT function in the USA (with a push towards standards like ISO 8000). Here, in South Africa, it is often part of the M&E officer's job. It really is a grassroots concern: how to capture clinical data from paper records, how to make data available across clinics, how to reduce double counting, how to ensure that data collection tools are designed to enhance VRIPT (Validity, Reliability, Integrity, Precision and Timeliness), and how to set up your Data Management System (Collection, Collation and Capturing, Reporting and Use) to ensure optimal quality and use.

Data Quality Assessments and Audits have become increasingly pervasive, especially in the Health Sector (where District Health Information Systems need to produce all kinds of data for reporting on development initiatives, also to major donors like USAID) and the Education Sector (where the Educational Management Information System is used).

Some of my colleagues at FeedbackRA have recently done the "Information Quality Certified Professional" course. More info on this at: http://www.feedbackra.co.za/data-quality-qualifications/

A good book on the topic is "Data Quality Assessment" by Arkady Maydanchik.


Monday, May 30, 2011

Values and Evaluation

The AEA’s Annual Conference (Wednesday, November 2, through Saturday, November 5, 2011 in Anaheim, California) will focus on Values. eVALUation was also the topic of the last SAMEA conference in 2009.

Jennifer Greene says about this theme:

Like culture, evaluation is inherently imbued with values. Our work as evaluators intrinsically involves the process of valuing, as our charge is to make judgments about the “goodness” or the quality, merit or worth of a program. Judgments rest on criteria, which in turn reflect priorities and beliefs about what is most important. At Evaluation 2011, I would like us to take up the challenges of values and valuing in evaluation, particularly the plurality of values represented by different evaluation purposes and audiences, key evaluation questions, and quality criteria. I anticipate that greater attention to and openness in the value dimensions of our work can improve our practice, offer voice to diverse stakeholder interests, and enhance our capacity to make a difference in society.

Last week, as we celebrated Africa Day, I thought a little about what it means to be an African. This was my FB status update for the day:

I dream in a language that grew up on the African continent, my forebears shed blood, sweat and tears to help tame the land that is my home, and the spirit of Ubuntu directs my choices. In the words of Mbeki: “I am an African”.

This made me think about the philosophy of Ubuntu and how it translates into values which affect my dealings as an evaluator. Ubuntu means “I am what I am because of who we all are”.

The Arch, Desmond Tutu, explained it so:

A person with Ubuntu is open and available to others, affirming of others, does not feel threatened that others are able and good, for he or she has a proper self-assurance that comes from knowing that he or she belongs in a greater whole and is diminished when others are humiliated or diminished, when others are tortured or oppressed. Ubuntu speaks particularly about the fact that you can't exist as a human being in isolation. It speaks about our interconnectedness. You can't be human all by yourself, and when you have this quality - Ubuntu - you are known for your generosity.

There is a Zulu saying, “umuntu ngumuntu ngabantu”, which means “a person is a person through (other) persons”. That is very different from “Cogito ergo sum”, or "I think, therefore I am".

I could immediately think of five implications that Ubuntu has for evaluators:

  • You need to be very aware of your role and the role of others as representatives of a bigger collective. Mutual respect is of the utmost importance. This “respect” will affect the way in which you ask questions, and you must interpret people’s answers in this context. Do not be surprised if you have to go to great lengths to get people to provide constructive criticism.  
  • When you share evaluation feedback, affirmation is very important. When you share negative findings, it must never be humiliating for an individual or a group of people.
  • As an evaluator, you are part of the bigger picture. You have an important role to play in a system of interconnected people, organizations and stories. If you try to be the “know-it-all external evaluation specialist” you will hit a wall. Listening and conversing, allowing people to participate in the meaning creation process, is essential.
  • There are many opportunities for “being generous”: If you evaluate a community based organization that takes time to answer your questions and provide you with some of their truly South-African hospitality, you might as well provide something in return. Writing up the evaluation findings in a form that they (not only the donor) can understand and use is one way. Sharing some of your technical knowledge (e.g. how to organize data, where to find a budget template, contact details of other people who work in the same field and could assist) is another way. Sometimes you might even share your evaluation tools and templates with people who did not pay for this “intellectual property”.
  • You have a responsibility to give back. Taking an inexperienced evaluator under your wing or volunteering your time for a good cause shows that you recognize you are where you are because others were willing to share with you. It is not uncommon for people who stay in abject poverty to share the little that they have with each other. Those who have more, probably have a responsibility to share more.

Friday, May 27, 2011

Lessons for Evaluators

This week, a vicious rumour circulated that the speech below was delivered by a mayor of a large South African city. 

Barrie Bramley writes that it is, however, an unedited clip recorded by an actor for a milk advertisement.

Both the vicious rumour and the contents of the clip have some lessons for evaluators:

1. Using big words in your reports and presentations will not hide an incoherent argument
2. Being long-winded bores your audience and delays tea
3. Sources should be double checked ALWAYS!
4. Comments made should be based on checked facts.

Have a lovely week!

Thursday, May 26, 2011

22 Seems to be the Magic Number in Solving Education problems!

In a previous post, I introduced the book by Stuart S. Yeh entitled “The Cost-Effectiveness of 22 Approaches for Raising Student Achievement”.
Now the World Bank has released a report entitled “Making Schools Work – New Evidence on Accountability Reforms” which is based on 22 recent impact evaluations of accountability-focused reforms in 11 developing countries. I wonder why this fascination with the number 22?

In the book, the authors (Barbara Bruns, Deon Filmer and Harry Anthony Patrinos) investigate strategies to address “service delivery failures” where increased spending does not lead to a concomitant change in education output (completion) or outcomes (learning). The idea is that if people in the schooling system are held accountable, things will improve.

This book focuses specifically on
three key strategies to strengthen accountability relationships in school systems—information for accountability, school-based management, and teacher incentives
 and looks into how these can affect school enrolment, completion, and student learning.

Main findings about the three strategies include:

Information for accountability (for example, providing “school report cards”) seems to work, but it isn’t a solution to all the problems. Which information is shared, who it is shared with and how it is shared are important considerations which could help parents, communities and other role players identify where the weaknesses in the system are.

School-based management reforms (e.g. implementing effective school governance and school-based management) are effective, but these “reforms need at least five years to bring about fundamental changes at the school level and about eight years to yield significant changes in test scores”.

Two kinds of teacher incentives have been investigated: contract teachers (where teachers are contracted on condition that they deliver certain results) and pay-for-performance reforms (bonuses for meeting targets). Both seem to be successful, but perverse behaviours (such as gaming, cheating or teaching to the test) are likely to abound and eventually negate the overall success of this strategy.

We’ve seen some progress in this regard in the South African schooling system: School Management and Governance training remains an important component of “whole school” development, and the implementation of the Annual National Assessments (ANA) is likely to evolve into an “information for accountability” initiative. (Also see this article about the ANAs in the local press.) Perhaps it’s time to take the hand of the labour unions and see how incentivising teachers can be implemented?

Wednesday, May 25, 2011

Cohen's d and Effect Size

In my previous posting I explained the idea of significance testing. A statistically significant result does not necessarily mean that the result is practically significant. The “effect size” usually gives an indication of whether something is practically significant.

There are a couple of different ways of calculating an effect size.

  • r (the correlation coefficient) or R² (the coefficient of determination)
  • Eta squared (η²)
  • Cohen’s d

This time, I will focus on Cohen’s d.

If you did a t-test, it’s usually a good idea to calculate Cohen’s d.

Cohen's d is an appropriate effect size for the comparison between two means. It indicates the standardized difference between two means, and expresses this difference in standard deviation units. The formula for calculating d for a paired sample t test (where the standard deviation is that of the paired differences) is:

Cohen’s d = Mean difference / Standard deviation

If you have two separate groups (in other words you conducted an independent sample t test), you use the pooled standard deviation of the two groups instead of the standard deviation.
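As a rough sketch of the two versions of the formula, here is how they could be computed in Python. The function names and the marks are made up for illustration, and the pooled standard deviation uses the usual weighted formula, which is an assumption on my part since the post does not spell it out.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_paired(before, after):
    """Paired samples: mean of the differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)


def pooled_sd(group1, group2):
    """Pooled standard deviation of two independent groups (weighted by n - 1)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d_independent(group1, group2):
    """Independent samples: difference in means divided by the pooled SD."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)


# Made-up marks, purely for illustration
before = [40, 50, 60, 55]
after = [45, 60, 62, 63]
paired_d = cohens_d_paired(before, after)

group_a = [60, 62, 64]
group_b = [50, 52, 54]
independent_d = cohens_d_independent(group_a, group_b)

print(round(paired_d, 3), round(independent_d, 3))
```

With real data you would pass in the learners' actual before and after marks, or the marks of the two separate groups.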

If Cohen’s d is bigger than 1, the difference between the two means is larger than one standard deviation; anything larger than 2 means that the difference is larger than two standard deviations. We seldom get such big effect sizes with the kinds of programmes that I evaluate, so the following rule of thumb applies:

A d value between 0 and 0.3 is a small effect size, a value between 0.3 and 0.6 is a moderate effect size, and a value bigger than 0.6 is a large effect size.

Here is an example:

Kids wrote a grade 12 exam, then completed a programme that provides additional compensatory education, and then rewrote the grade 12 exam. Below is a table that compares the Maths mark prior to the programme with the Maths mark after the programme.

The result is statistically significant (see the last column, where the p value is printed as .000, which is reported as p < .001). The learners' results, on average, improved by about 10 percentage points (the mean difference is indicated in the “mean” column). Usually such a result is indicated as follows:

t (54) = 6.852; p < .001

To calculate Cohen’s d, we divide the mean difference by the standard deviation of the differences.

d = mean difference/ standard deviation = 9.98148 / 10.70442 = 0.932

0.932 is larger than 0.6, so this can be classified as a large difference. In fact it is close to 1, which means that this programme probably helped the learners, on average, to improve their marks by about 1 standard deviation. That is amazing!
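The arithmetic above, together with the rule of thumb from this post, can be sketched in a few lines of Python. The function name is hypothetical, and how to treat a value of exactly 0.3 or 0.6 is my own assumption, since the rule of thumb leaves those boundaries open.

```python
def classify_effect_size(d):
    """Label |d| using the rule of thumb from this post.

    Assumption: exactly 0.3 and exactly 0.6 are both treated as "moderate".
    """
    size = abs(d)
    if size < 0.3:
        return "small"
    if size <= 0.6:
        return "moderate"
    return "large"


# Figures taken from the worked example in the post
mean_difference = 9.98148
sd_of_differences = 10.70442

d = mean_difference / sd_of_differences
label = classify_effect_size(d)
print(round(d, 3), label)  # prints: 0.932 large
```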

Monday, May 23, 2011

Means and p values.

When comparing two groups’ means (or averages), it’s not sufficient to compare only the means, because an average is just one statistic that summarises a whole distribution of scores.

In the picture below, the mean age at which these kids first drank alcohol was around 14. But there are kids who started earlier, and some who started later.

When comparing two means, it is important to determine whether the two distributions differ so much that it is unlikely that they are both from the same bigger population.

If they differ that much, the null hypothesis is rejected. If they don’t, we fail to reject the null hypothesis (which is not the same as proving that the groups are identical).

Notice: although the means differ in B, the overlap in distributions is quite large.

Depending on the scale of the data (nominal, ordinal, interval or ratio), the properties of the distributions (normally distributed or not) and the kind of comparison that’s required (i.e. two independent groups, e.g. boys and girls; or two measures for the same group, e.g. the average for boys before the programme and after the programme), different statistics may be used.

Usually, we do a t test, which yields a t statistic, or an ANOVA, which yields an F statistic, or their non-parametric equivalents: the Mann-Whitney or Kruskal-Wallis test. Because it isn’t easy to know offhand whether a t of 112 is good or bad, these statistics are converted to a p value (probability value), which indicates how probable it is to get a difference at least as large as the one observed if the null hypothesis were true.

If the p value is smaller than 0.05, the null hypothesis is rejected - a difference this large would turn up less than 5% of the time if the two groups really came from the same population.

Just look carefully at that criterion: p values of 0.5 (50%) and 0.06 (6%) are bigger than 0.05, so the null hypothesis is not rejected. A p value of 0.045 (4.5%), or a value that the software prints as .000 (reported as p < 0.001), is smaller than 0.05 and would therefore mean the null hypothesis should be rejected - in other words, the two means differ statistically significantly.

A cut-off of p = 0.05 is conventional, but a p of 0.1 (10%) or 0.01 (1%) is sometimes used as the cut-off criterion (depending on how much risk of Type I and Type II errors you can tolerate).
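The decision rule is simple enough to sketch in Python. The function name and the wording of the returned messages are my own; the cut-off is passed in as a parameter so that 0.1 or 0.01 can be used instead of the conventional 0.05.

```python
def decide(p_value, alpha=0.05):
    """Apply the cut-off: reject the null hypothesis only if p is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# The three p values discussed above, at the conventional 5% cut-off
for p in (0.5, 0.06, 0.045):
    print(p, "->", decide(p))

# The same p value can lead to a different decision with a stricter cut-off
print(0.045, "at alpha = 0.01 ->", decide(0.045, alpha=0.01))
```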

Results like the ones below mean:
t (163) = -2.68, p < .05
The t statistic, with 163 degrees of freedom (165 cases if two independent groups were compared), is -2.68, and is statistically significant at the 5% level.

F (2, 1015) = 111.286, p < .001
The F statistic, with 2 and 1015 degrees of freedom (i.e. three groups and 1018 cases in a one-way ANOVA), is 111.286 and is statistically significant at the 0.1% level.
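The numbers in brackets are degrees of freedom, not case counts. Here is a small sketch of how to work back from them to the number of groups and cases, assuming the standard relationships for a paired t test (df = N - 1), an independent-samples t test (df = N - 2) and a one-way ANOVA (df = k - 1 and N - k). The function names are made up for illustration.

```python
def cases_from_paired_t(df):
    """Paired-samples t test: df = N - 1, where N is the number of pairs."""
    return df + 1


def cases_from_independent_t(df):
    """Independent-samples t test: df = N - 2, where N is the total number of cases."""
    return df + 2


def groups_and_cases_from_f(df_between, df_within):
    """One-way ANOVA: df_between = k - 1 and df_within = N - k."""
    groups = df_between + 1
    cases = df_within + groups
    return groups, cases


print(cases_from_paired_t(54))           # t (54) from a paired test
print(cases_from_independent_t(163))     # t (163) from two independent groups
print(groups_and_cases_from_f(2, 1015))  # F (2, 1015)
```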

The smaller the p value is, the happier you should be – because it means that you will have something interesting to report on!

Friday, May 20, 2011

The Evaluator and Statistics

I had to calculate Cohen's d, Eta Squared and r today, but thought that would be too dreary a topic for a Friday blog post.

Instead, I found some statistics quotes here

Some of my favourites:
Torture numbers, and they'll confess to anything. ~Gregg Easterbrook

He uses statistics as a drunken man uses lampposts - for support rather than for illumination. ~Andrew Lang

Statistics can be made to prove anything - even the truth. ~Author Unknown

Tuesday, May 17, 2011

Evaluation of iPad for Education

Reed College, in Portland, Oregon, reports on their evaluation of the use of the iPad in class here. Reports on the use by students and faculty are available.

Whereas a previous evaluation was very critical of the usefulness of the Kindle DX in the class context, this report seems to support the adoption of tablets in the classroom.

The report particularly commented positively on the legibility of material on the iPad, the usability of the touch screen and the size and weight of the tablet. It was also found to be particularly useful if students wanted to switch between texts in class, and the search ability and navigation within texts was also positively evaluated.

They commented that PDF transferability was somewhat difficult, that the filing system was not optimally user friendly and that the on-screen keyboard of the iPad did not efficiently support more than short comment typing. Some other concerns related to cost factors and accessibility.

Monday, May 16, 2011

Better Evaluation Virtual Writeshop

Irene Guijt posted this on the Pelican listserv today:

Perhaps you have undertaken an evaluation on a program to mitigate climate change effects on rural people living in poverty, or one on capacity development in value chains. Or worked on participatory ways to make sense of evaluation data, or developed simple ways to integrate numbers and stories. We’d like to bring unknown experiences to the global stage for wider use.

Do you have an experience that covers many different aspects of evaluation – design, collection, sensemaking, and reporting? Did you look at different options to develop a context-sensitive approach? And has your evaluation process not yet been shared widely? If your answer is yes to these questions, then our virtual writeshop on evaluation may be of interest.

We will facilitate a virtual writeshop between May and September 2011 that will lead to around 10 focused documents to be shared globally. Participating in the writeshop will give you structured editorial support and peer review to develop a publication for the BetterEvaluation site.

For more information, including how to submit a proposal, go here.

Wednesday, May 11, 2011

22 Approaches for Raising Student Achievement

I’m working my way through a book by Stuart S. Yeh entitled “The Cost-Effectiveness of 22 Approaches for Raising Student Achievement” (also available as an ebook http://ow.ly/4R62s or on paper from www.loot.co.za see: http://ow.ly/4RRll).

The book (based on studies in the United States) concludes that:

The review of cost-effectiveness studies suggests that rapid assessment is more cost effective with regard to student achievement than comprehensive school reform (CSR), cross-age tutoring, computer-assisted instruction, a longer school day, increases in teacher education, teacher experience or teacher salaries, summer school, more rigorous math classes, value-added teacher assessment, class size reduction, a 10% increase in per pupil expenditure, full-day kindergarten, Head Start (preschool), high-standards exit exams, National Board for Professional Teaching Standards (NBPTS) certification, higher teacher licensure test scores, high-quality preschool, an additional school year, voucher programs, or charter schools

I find this interesting, because this makes a very compelling argument for using computer based learning in schools. The first chapter of the book presents a nice theoretical overview that indicates how kids become disheartened if they don’t consistently have mastery experiences. If assessment and teaching can be individualized so that each learner progressively improves compared to their previous performance (rather than a comparison with peers) they are likely to feel that they are in control of their learning, and they would be more likely to stay engaged in the learning process. The chapter states:

A theory of learning may be deduced: Individualization of assessment, task difficulty and performance expectations for each student on a daily basis, in combination with performance feedback, autonomy in task execution and an accelerating standard of performance, ensures that students achieve success and feel successful on a daily basis, fostering student engagement, increased effort, and further improvements in achievement in a virtuous cycle.

Software can automate the provision of corrective feedback and the assigning of content (on a daily basis), and this has been shown to have powerful effects on learning and achievement. Yeh reports that in a study of Math Assessment (involving 1,880 students in grades 2 through 8, 80 classrooms and seven states) they found an effect size of 0.324 standard deviations over a 7-month period. This is a huge increase in learner performance.

Of course we need to remember that this is based on the American school system, which is very different from ours. And note the study is about "Cost Effectiveness". It is not saying that the other strategies are not effective - in their context the other strategies were effective, but at a higher cost than Rapid Assessment.
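One way to get a feel for an effect size of 0.324 standard deviations is to translate it into percentile terms. The sketch below assumes that marks are roughly normally distributed, which is my assumption and not something the book states. Under that assumption, it asks where a learner who would have scored at the average of the comparison group would end up after gaining 0.324 standard deviations.

```python
from statistics import NormalDist

effect_size = 0.324  # in standard deviation units, as reported in the post

# Share of the comparison group scoring below an average learner
# who has gained `effect_size` standard deviations (normality assumed)
percentile = NormalDist().cdf(effect_size) * 100

print(round(percentile, 1))
```

In other words, under the normality assumption the average learner would move from the 50th percentile to somewhere in the low sixties.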

Of course there are a few other assumptions that will have to be checked if a similar intervention is implemented in South Africa:
1) Computer infrastructure, technical support, and teachers’ abilities must be supportive of successful implementation. It is a lot harder to implement an ICT based project than one might imagine at the outset.
2) Rapid assessment cannot replace the role of the teacher – It can help learners improve, but they still have to get quality tuition from a qualified teacher.
3) Learners’ ability to interact with the software must not be blocked by poor reading / language capabilities.

Tuesday, May 10, 2011

GIS Data and Maps to Find South African Health Facilities

Here is a neat resource I came across at the ESI Data Quality Conference held in March.

It contains some basic data and various layers of health facility data in South Africa, and allows easy map sharing. Check it out at:


Monday, May 09, 2011

Some Educational ICT solutions I've come across

VITALmaths: a collection of video clips, accessible from a cellphone, that demonstrate basic maths concepts.

LearnThings: online learning resources for Maths and Science teaching.

Crocodile Clips: a variety of maths games.

Connexions: a repository of online teaching and learning resources.

Master Maths M2 Computer Based Training: proprietary software for self-paced Maths learning.

NovaNet and Success Maker: proprietary software from the USA for self-paced learning, available through a South African distributor.

Inspiring Read about South African Issues