moj.similarity.measures
Class BinCAS_row
java.lang.Object
moj.similarity.SimilarityMeasure
moj.similarity.measures.BinCAS_row
public class BinCAS_row
- extends SimilarityMeasure
CA MEASURE OF ASSOCIATION
http://www.psych.cornell.edu/Darlington/crosstab/table3.htm
CA (for "conditional association") is a modification of OGE that in
principle can be defined for tables of any size, but in practice is computed
easily only for 2x2 tables, so we explain it in those terms. CA deals with
the fact that phi and OGE may be low partly because of the difference between
row and column marginal totals. The larger this difference, the lower phi and
OGE will usually be. CA is independent of the difference in marginal totals,
and can be 1 regardless of the size of that difference.
In a 2x2 table exhibiting some association, two diagonally opposite cells
have o > e (observed frequency above expected) while the other two have
o < e. Call the former pair the major diagonal. Then
CA = (o-major - e-major)/(o-major - e-major + 2m)
is the amount by which the observed frequency in the major diagonal
exceeds its expected value, expressed as a proportion of the largest value
that difference could take, given the observed marginal totals. It can be shown
that CA = OGE if the column marginal total for each entry in the major
diagonal equals the row marginal total for that same entry. The larger the
difference between these row and column marginal totals, the more CA exceeds OGE.
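The formula above can be sketched in Java. This is a minimal illustration, not the class's actual code: CaSketch and ca are hypothetical names. It also assumes that o-major and e-major are sums over both major-diagonal cells, and that m (not defined in the text above) is the smaller of the two minor-diagonal frequencies.

```java
/** Hypothetical helper for illustration; not part of moj.similarity. */
final class CaSketch {

    /**
     * CA for the 2x2 table
     *     a b
     *     c d
     * Assumption: m is the smaller minor-diagonal frequency, and o-major and
     * e-major are sums over both cells of the major diagonal.
     */
    static double ca(double a, double b, double c, double d) {
        double n = a + b + c + d;
        if (n == 0.0) {
            return 0.0; // empty table: nothing to measure
        }
        // Expected (null) frequencies of cells a and d from the marginals.
        double eA = (a + b) * (a + c) / n;
        double eD = (c + d) * (b + d) / n;
        // Excess of observed over expected on the a/d diagonal. The b/c
        // diagonal has exactly the opposite excess, since marginals are fixed.
        double excessAD = (a + d) - (eA + eD);
        if (excessAD == 0.0) {
            return 0.0; // observed equals expected: no association
        }
        boolean adIsMajor = excessAD > 0.0;
        double excess = Math.abs(excessAD);
        double m = adIsMajor ? Math.min(b, c) : Math.min(a, d);
        return excess / (excess + 2.0 * m);
    }

    public static void main(String[] args) {
        System.out.println(ca(30, 10, 20, 40)); // prints 0.5
        System.out.println(ca(10, 0, 20, 70));  // prints 1.0
    }
}
```

The second call has very unequal row and column marginals yet still yields 1, because one minor-diagonal cell is empty. Note that under this definition the value does not encode the direction of the association, only its strength.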
We can also give CA a somewhat different interpretation. Define the
"extreme pattern" as the set of four cell frequencies observed when the cells
in the major diagonal contain their maximum possible frequencies under the
current marginals; at least one of the other two cells will then contain zero.
Then pick any one of the four cells and compute (observed-null)/(extreme-null),
where these values are defined as the values in that cell under the observed,
null, and extreme patterns. It turns out that this procedure yields the same
value as CA, regardless of what cell you picked. Therefore CA can also be
described as the difference between the observed and null frequencies in any
cell, expressed as a proportion of the largest possible difference in the
same direction, given the observed marginal totals.
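The per-cell interpretation can be checked numerically. The sketch below is illustrative only: ExtremePatternSketch and cellRatios are hypothetical names. It builds the extreme pattern on the assumption that it is reached by moving m observations (m being the smaller minor-diagonal frequency) out of each minor-diagonal cell and into each major-diagonal cell, which leaves all marginal totals unchanged.

```java
/** Hypothetical helper for illustration; not part of moj.similarity. */
final class ExtremePatternSketch {

    /**
     * Returns (observed - null) / (extreme - null) for cells a, b, c, d of
     *     a b
     *     c d
     * Requires a table that shows some association (a*d != b*c).
     */
    static double[] cellRatios(double a, double b, double c, double d) {
        if (a * d == b * c) {
            throw new IllegalArgumentException("table shows no association");
        }
        double n = a + b + c + d;
        double[] observed = {a, b, c, d};
        // Null pattern: row total * column total / n for each cell.
        double[] expected = {
            (a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n
        };
        // Cell a exceeds its expected value exactly when a*d > b*c.
        boolean adIsMajor = a * d > b * c;
        double m = adIsMajor ? Math.min(b, c) : Math.min(a, d);
        double shift = adIsMajor ? m : -m;
        // Extreme pattern: major-diagonal cells grow by m, minor ones shrink by m.
        double[] extreme = {a + shift, b - shift, c - shift, d + shift};
        double[] ratios = new double[4];
        for (int i = 0; i < 4; i++) {
            ratios[i] = (observed[i] - expected[i]) / (extreme[i] - expected[i]);
        }
        return ratios;
    }

    public static void main(String[] args) {
        // All four ratios are 0.5 for this table.
        System.out.println(java.util.Arrays.toString(cellRatios(30, 10, 20, 40)));
    }
}
```

For the table 30, 10 / 20, 40 the null pattern is 20, 20 / 30, 30 and the extreme pattern is 40, 0 / 10, 50, so every cell gives 10/20 or -10/-20.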
Similarity range: [0, 1].
Usage: for binary data.
- Version:
- 2004-Nov-06
- Author:
- Martin Hassel
BinCAS_row
public BinCAS_row()
analyzeMatrix
public double[][] analyzeMatrix(float[][] data_matrix)
- Specified by:
analyzeMatrix in class SimilarityMeasure
- Parameters:
data_matrix - an array of float arrays denoting the rows in the matrix that are to be compared for similarity.
- Returns:
- an array of double arrays containing the similarities between all rows
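This page does not say how analyzeMatrix derives a 2x2 table from a pair of rows. One plausible reading, offered here purely as an assumption, is that two binary rows are cross-tabulated column by column. The sketch below shows only that step: RowCrossTabSketch and crossTab are hypothetical names, and treating any nonzero value as 1 is likewise an assumption.

```java
/** Hypothetical helper for illustration; not part of moj.similarity. */
final class RowCrossTabSketch {

    /**
     * Cross-tabulates two binary rows of equal length.
     * Returns {a, b, c, d}: a = both nonzero, b = only x nonzero,
     * c = only y nonzero, d = both zero.
     */
    static int[] crossTab(float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("rows differ in length");
        }
        int[] table = new int[4];
        for (int k = 0; k < x.length; k++) {
            boolean inX = x[k] != 0.0f; // assumption: nonzero counts as 1
            boolean inY = y[k] != 0.0f;
            if (inX && inY) {
                table[0]++;
            } else if (inX) {
                table[1]++;
            } else if (inY) {
                table[2]++;
            } else {
                table[3]++;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        float[] x = {1, 1, 0, 0, 1};
        float[] y = {1, 0, 0, 1, 1};
        // prints [2, 1, 1, 1]
        System.out.println(java.util.Arrays.toString(crossTab(x, y)));
    }
}
```

The four counts would then serve as the cell frequencies from which a 2x2 measure such as CA is computed for that pair of rows.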