moj.similarity.measures
Class BinCAS_row

java.lang.Object
  extended by moj.similarity.SimilarityMeasure
      extended by moj.similarity.measures.BinCAS_row

public class BinCAS_row
extends SimilarityMeasure

CA MEASURE OF ASSOCIATION
http://www.psych.cornell.edu/Darlington/crosstab/table3.htm
CA (for "conditional association") is a modification of OGE that in principle can be defined for tables of any size, but in practice is computed easily only for 2x2 tables, so we explain it in those terms. CA deals with the fact that phi and OGE may be low partly because of the difference between row and column marginal totals. The larger this difference, the lower phi and OGE will usually be. CA is independent of the difference in marginal totals, and can be 1 regardless of the size of that difference.

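In a 2x2 table exhibiting some association, two diagonally opposite cells have o > e while the other two have o < e. Call the first pair the major diagonal and the second pair the minor diagonal. Let o-major and e-major be the sums of the observed and expected frequencies on the major diagonal, and let m be the smaller of the two observed frequencies on the minor diagonal. Then
CA = (o-major - e-major)/(o-major - e-major + 2m)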

is the amount by which the observed frequency in the major diagonal exceeds its expected value, expressed as a proportion of the largest value that difference could be given the observed marginal totals. It can be shown that CA = OGE if the column marginal total for each entry in the major diagonal equals the row marginal total for that same entry. The larger the difference between these row and column marginal totals, the more CA exceeds OGE.
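As a minimal sketch of the CA formula for a single 2x2 table, assuming the cells are laid out as [[a, b], [c, d]] and that expected frequencies are the usual (row total x column total)/N values. The class name CaSketch and the method ca are hypothetical names for illustration; this is not the code of BinCAS_row.

```java
final class CaSketch {
    /**
     * CA for the 2x2 table [[a, b], [c, d]] (hypothetical helper, not part of moj.similarity).
     * Returns the strength of association relative to whichever diagonal is the major one.
     */
    static double ca(double a, double b, double c, double d) {
        double n = a + b + c + d;
        if (n <= 0) {
            throw new IllegalArgumentException("table must contain at least one observation");
        }
        // Expected frequencies under independence: row total * column total / N.
        double eA = (a + b) * (a + c) / n;
        double eB = (a + b) * (b + d) / n;
        double eC = (c + d) * (a + c) / n;
        double eD = (c + d) * (b + d) / n;

        // Assume a,d is the major diagonal first; switch to b,c if its excess is negative.
        double excess = (a + d) - (eA + eD);   // o-major - e-major
        double m = Math.min(b, c);             // smaller minor-diagonal frequency
        if (excess < 0) {
            excess = (b + c) - (eB + eC);
            m = Math.min(a, d);
        }
        if (excess == 0) {
            return 0.0;                        // no association (also avoids 0/0)
        }
        return excess / (excess + 2 * m);
    }
}
```

For the table [[30, 10], [20, 40]] the expected major-diagonal sum is 50 against an observed 70, and m = 10, so ca(30, 10, 20, 40) gives 20/(20 + 20) = 0.5. A table with an empty minor-diagonal cell, such as [[10, 0], [5, 20]], gives 1 even though its row and column totals differ.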

We can also give CA a somewhat different interpretation. Define the "extreme pattern" as the set of four cell frequencies observed when the cells in the major diagonal contain their maximum possible frequencies under the current marginals; at least one of the other two cells will then contain zero. Then pick any one of the four cells and compute (observed-null)/(extreme-null), where these values are defined as the values in that cell under the observed, null, and extreme patterns. It turns out that this procedure yields the same value as CA, regardless of what cell you picked. Therefore CA can also be described as the difference between the observed and null frequencies in any cell, expressed as a proportion of the largest possible difference in the same direction, given the observed marginal totals.
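The per-cell interpretation above can be checked numerically. The following sketch assumes a table [[a, b], [c, d]] whose major diagonal is a, d (that is, ad > bc); the class and method names are hypothetical and chosen only for this illustration.

```java
final class ExtremePatternSketch {
    /**
     * Returns (observed - null)/(extreme - null) for the cells a, b, c, d in that order.
     * Assumes a,d is the major diagonal, i.e. a*d > b*c.
     */
    static double[] ratios(double a, double b, double c, double d) {
        if (a * d <= b * c) {
            throw new IllegalArgumentException("sketch assumes a,d is the major diagonal");
        }
        double n = a + b + c + d;
        double[] observed = { a, b, c, d };
        // Null pattern: expected frequencies under independence.
        double[] nul = {
            (a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n
        };
        // Extreme pattern: shift the smaller minor-diagonal count onto the major
        // diagonal, which keeps every marginal total unchanged.
        double m = Math.min(b, c);
        double[] extreme = { a + m, b - m, c - m, d + m };

        double[] out = new double[4];
        for (int i = 0; i < 4; i++) {
            out[i] = (observed[i] - nul[i]) / (extreme[i] - nul[i]);
        }
        return out;
    }
}
```

For [[30, 10], [20, 40]] the null pattern is [[20, 20], [30, 30]] and the extreme pattern is [[40, 0], [10, 50]], so all four ratios equal 0.5, matching the value given by the CA formula.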

Similarity range: [0, 1].
Usage: for binary data.

Version:
2004-Nov-06
Author:
Martin Hassel

Constructor Summary
BinCAS_row()
           
 
Method Summary
 double[][] analyzeMatrix(float[][] data_matrix)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BinCAS_row

public BinCAS_row()
Method Detail

analyzeMatrix

public double[][] analyzeMatrix(float[][] data_matrix)
Specified by:
analyzeMatrix in class SimilarityMeasure
Parameters:
data_matrix - an array of float arrays denoting the rows in the matrix that are to be compared for similarity.
Returns:
an array of double arrays containing the similarities between all rows
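
To illustrate what a row-by-row comparison of this kind could look like, here is a self-contained sketch. It is an assumption-laden illustration, not the implementation of BinCAS_row: the class RowCaSketch and its methods are hypothetical, any non-zero entry is treated as 1, and each pair of rows is summarised as a 2x2 table of co-occurrence counts before CA is applied.

```java
final class RowCaSketch {
    // CA for the 2x2 table [[a, b], [c, d]], reported as a magnitude in [0, 1].
    // Uses the identity (o-major - e-major) = 2(ad - bc)/N for the a,d diagonal.
    static double ca(double a, double b, double c, double d) {
        double n = a + b + c + d;
        if (n == 0) {
            return 0.0;                       // empty rows: treat as no association
        }
        double excess = 2 * (a * d - b * c) / n;
        double m = Math.min(b, c);
        if (excess < 0) {                     // b,c is the major diagonal instead
            excess = -excess;
            m = Math.min(a, d);
        }
        return excess == 0 ? 0.0 : excess / (excess + 2 * m);
    }

    // Hypothetical counterpart to analyzeMatrix: pairwise CA between all rows.
    static double[][] similarities(float[][] rows) {
        int n = rows.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                if (rows[i].length != rows[j].length) {
                    throw new IllegalArgumentException("rows must have equal length");
                }
                int a = 0, b = 0, c = 0, d = 0;
                for (int k = 0; k < rows[i].length; k++) {
                    boolean x = rows[i][k] != 0f;   // assumption: non-zero means "present"
                    boolean y = rows[j][k] != 0f;
                    if (x && y) a++;                // present in both rows
                    else if (x) b++;                // present in row i only
                    else if (y) c++;                // present in row j only
                    else d++;                       // absent from both rows
                }
                sim[i][j] = ca(a, b, c, d);
                sim[j][i] = sim[i][j];              // the measure is symmetric here
            }
        }
        return sim;
    }
}
```

With the rows {1, 1, 0, 0}, {1, 1, 0, 0} and {1, 0, 1, 0}, the first two rows are identical and receive similarity 1, while the first and third rows produce a table with one count in every cell and receive similarity 0.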