Friday, January 15, 2010

Checking prior results in student retention estimation

Prior results in learning predictors for student retention are shown here.

Those reports offer a variety of numbers. Using the Zhang equations, we can compute the missing measures from the given ones:

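% Given class sizes, precision and recall (pd), derive pf and accuracy.
calc(Pos,Neg,Prec,Recall, Pf,Acc) :-
        % false alarm rate FP/Neg, using FP = TP*(1-Prec)/Prec
        Pf      is Pos/Neg * (1-Prec)/Prec * Recall,
        D       is Recall * Pos,          % true positives
        C       is Pf * Neg,              % false positives
        A       is Neg - C,               % true negatives; equals C*(1/Pf - 1) but avoids dividing by Pf = 0
        Acc     is (A+D)/(Neg + Pos).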
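The `calc/6` rule can be checked against the usual confusion-matrix definitions. Writing TP, FP and TN for true positives, false positives and true negatives (labels introduced here for illustration; the original code uses D, C and A), and assuming Recall = TP/Pos, Prec = TP/(TP+FP) and Pf = FP/Neg, each line of the rule follows:

```latex
\begin{aligned}
TP &= \text{Recall}\cdot Pos && (= D)\\
FP &= TP\cdot\frac{1-\text{Prec}}{\text{Prec}} && (= C)\\
\text{Pf} &= \frac{FP}{Neg} = \frac{Pos}{Neg}\cdot\frac{1-\text{Prec}}{\text{Prec}}\cdot\text{Recall}\\
TN &= Neg - FP = C\left(\frac{1}{\text{Pf}}-1\right) && (= A)\\
\text{Acc} &= \frac{TN+TP}{Neg+Pos} = \frac{A+D}{Neg+Pos}
\end{aligned}
```

The two forms of TN agree because FP/Pf = Neg; the subtraction form stays defined when Pf = 0 (that is, when Prec = 1).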

Then we can write a simulator that tries a range of possible precision and recall values. For example, for the Atwell paper:

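run(atwel,[prec/Prec,neg=Neg,pos=Pos,pf/Pf,pd/Recall,acc/Acc]) :-
        nl,                                            % blank line before the results
        member(Prec,[0.88,0.82,0.73]),                 % precision values to try
        N   = 5990,                                    % total number of examples
        Neg = 4881,
        Pos is N-Neg,
        member(Recall,[0.65,0.7,0.75,0.8,0.85,0.9]),   % recall (pd) values to sweep
        calc(Pos,Neg,Prec,Recall,Pf,Acc).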

When we run this (with prec, pf, pd, and acc shown as rounded percentages), we get the following numbers. Note the suspiciously low false alarm rates:

[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=65, acc=92]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=70, acc=93]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=75, acc=93]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=80, acc=94]
[who=atwel, prec=88, neg=4881, pos=1109, pf=3, pd=85, acc=95]
[who=atwel, prec=88, neg=4881, pos=1109, pf=3, pd=90, acc=96]
[who=atwel, prec=82, neg=4881, pos=1109, pf=3, pd=65, acc=91]
[who=atwel, prec=82, neg=4881, pos=1109, pf=3, pd=70, acc=92]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=75, acc=92]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=80, acc=93]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=85, acc=94]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=90, acc=94]
[who=atwel, prec=73, neg=4881, pos=1109, pf=5, pd=65, acc=89]
[who=atwel, prec=73, neg=4881, pos=1109, pf=6, pd=70, acc=90]
[who=atwel, prec=73, neg=4881, pos=1109, pf=6, pd=75, acc=90]
[who=atwel, prec=73, neg=4881, pos=1109, pf=7, pd=80, acc=91]
[who=atwel, prec=73, neg=4881, pos=1109, pf=7, pd=85, acc=91]
[who=atwel, prec=73, neg=4881, pos=1109, pf=8, pd=90, acc=92]
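As a sanity check, here is the arithmetic behind the first Atwell row (prec = 0.88, pd = 0.65), following the steps of `calc/6` with Pos = 1109 and Neg = 4881. TP, FP and TN are illustrative labels for true positives, false positives and true negatives; values are rounded.

```latex
\begin{aligned}
TP &= 0.65 \times 1109 \approx 720.85\\
FP &= 720.85 \times \frac{0.12}{0.88} \approx 98.30\\
\text{Pf} &= \frac{98.30}{4881} \approx 0.020\\
TN &= 4881 - 98.30 \approx 4782.70\\
\text{Acc} &= \frac{4782.70 + 720.85}{5990} \approx 0.919
\end{aligned}
```

These round to the pf=2 and acc=92 shown in the first row. In the pf formula, the factor Pos/Neg = 1109/4881 ≈ 0.23 is one reason the false alarm rates come out so small for this data.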

We can do the same for the delong results. Here is the query:

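run(delong,[prec/Prec,neg=Neg,pos=Pos,pf/Pf,pd/Recall,acc/Acc]) :-
        nl,                                            % blank line before the results
        Neg is 500,                                    % equal-sized classes
        Pos is 500,
        member(Prec,[0.57,0.58,0.59]),                 % precision values to try
        member(Recall,[0.65,0.7,0.75,0.8,0.85,0.9]),   % recall (pd) values to sweep
        calc(Pos,Neg,Prec,Recall,Pf,Acc).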

And here are the results. Note the very high false alarm rates and mediocre accuracies.

[who=delong, prec=57, neg=500, pos=500, pf=49, pd=65, acc=58]
[who=delong, prec=57, neg=500, pos=500, pf=53, pd=70, acc=59]
[who=delong, prec=57, neg=500, pos=500, pf=57, pd=75, acc=59]
[who=delong, prec=57, neg=500, pos=500, pf=60, pd=80, acc=60]
[who=delong, prec=57, neg=500, pos=500, pf=64, pd=85, acc=60]
[who=delong, prec=57, neg=500, pos=500, pf=68, pd=90, acc=61]
[who=delong, prec=58, neg=500, pos=500, pf=47, pd=65, acc=59]
[who=delong, prec=58, neg=500, pos=500, pf=51, pd=70, acc=60]
[who=delong, prec=58, neg=500, pos=500, pf=54, pd=75, acc=60]
[who=delong, prec=58, neg=500, pos=500, pf=58, pd=80, acc=61]
[who=delong, prec=58, neg=500, pos=500, pf=62, pd=85, acc=62]
[who=delong, prec=58, neg=500, pos=500, pf=65, pd=90, acc=62]
[who=delong, prec=59, neg=500, pos=500, pf=45, pd=65, acc=60]
[who=delong, prec=59, neg=500, pos=500, pf=49, pd=70, acc=61]
[who=delong, prec=59, neg=500, pos=500, pf=52, pd=75, acc=61]
[who=delong, prec=59, neg=500, pos=500, pf=56, pd=80, acc=62]
[who=delong, prec=59, neg=500, pos=500, pf=59, pd=85, acc=63]
[who=delong, prec=59, neg=500, pos=500, pf=63, pd=90, acc=64]
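The same check for the first delong row (prec = 0.57, pd = 0.65, Pos = Neg = 500), again with TP, FP and TN as illustrative labels and rounded values:

```latex
\begin{aligned}
TP &= 0.65 \times 500 = 325\\
FP &= 325 \times \frac{0.43}{0.57} \approx 245.18\\
\text{Pf} &= \frac{245.18}{500} \approx 0.490\\
TN &= 500 - 245.18 \approx 254.82\\
\text{Acc} &= \frac{254.82 + 325}{1000} \approx 0.580
\end{aligned}
```

This matches pf=49 and acc=58 in the first row. With Pos = Neg the Pos/Neg factor is 1, so Pf reduces to (1-Prec)/Prec × Recall, roughly 0.75 × Recall at Prec = 0.57.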
