--- faculty:denton:writingpapers [2017/01/13 23:40] – [ICDM Paper Deadlines] localadmin
+++ faculty:denton:writingpapers [2017/01/13 23:41] (current) – [SSE and ARM on Continuous Multi-dimenstional data (Matt)] localadmin
@@ Line 3: / Line 3: @@
 Discussion on what to put into a paper.  Also some notes on deadlines for the next papers.
-======  Alan's Paper Writing Recommendations ======
+=====  Alan's Paper Writing Recommendations =====
    * Use present tense
@@ Line 9: / Line 9: @@
    * Punctuate equations
-======  Paper Components ======
+=====  Paper Components =====
 Abstract, Introduction, and Conclusion are fairly fixed.  The other components are required but may be arranged as appropriate.
@@ Line 55: / Line 55: @@
    * Equations are the new math in the algorithm section
-======  ICDM Paper Deadlines ======
+=====  ICDM Paper Deadlines =====
 Those submitting to the next conference (ICDM July 5) must meet the following deadlines:
@@ Line 69: / Line 69: @@
 This should motivate you to complete your theory and experiments.  Typically you can expect about two more weeks of writing once you have this preliminary introduction and a good start on respectable results.  Note that Dr. Denton may write the final introduction for first-time students (or for students in general), but you should still work on the introduction yourself for motivation and direction.
-====== LaTex and Writing ======
+===== LaTeX and Writing =====
 Our general standard is <nowiki>LaTeX</nowiki>, but it is not required.  You may use Word if you are more comfortable with it (most writing is just text anyway).  Keep in mind that Dr. Denton tends to contribute some of the formulae for the more substantial equations.  She will still write these by hand or in <nowiki>LaTeX</nowiki>, so you must convert them into your format.
@@ Line 75: / Line 75: @@
 Our current setup is the free <nowiki>MiKTeX</nowiki> distribution together with any editor that offers syntax highlighting and macros (to compile the LaTeX source).  Examples are <nowiki>TextPad</nowiki> with the appropriate style file (you will need to set up the compile macros yourself), <nowiki>WinTex</nowiki> from the CD in the lab, or a plug-in for the Eclipse editor.
+=====  SSE and ARM on Continuous Multi-dimensional data (Matt) =====
+An update on Matt's work.
+A key point is that typical statistics is concerned with the entire data set.  We will instead focus on finding the most useful subsets of the data, using the sum of squared errors (SSE) and Pearson's correlation (phi) as measures.
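A minimal sketch of the two measures, assuming SSE is taken around the subset's own mean and the correlation is the ordinary Pearson coefficient between two numeric columns. The helper names `sse` and `pearson` and the data are hypothetical; the point is only to contrast statistics over the entire data set with the same statistics over one subset.

```python
import math

def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

# Hypothetical data: x and y measured on six transactions
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 1.0, 30.0]

# Statistics over the entire data set ...
print(sse(ys), pearson(xs, ys))

# ... versus a subset (here simply the first four transactions)
sub_x, sub_y = xs[:4], ys[:4]
print(sse(sub_y), pearson(sub_x, sub_y))  # 20.0 1.0
```

The subset is both tighter (lower SSE) and perfectly correlated, which the whole-data statistics hide.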
+Typical association rule mining (ARM) finds itemsets whose support is >= the minimum support.  We are looking for data subsets whose SSE is <= a maximum SSE; more specifically, the subsets with the highest SSE that does not exceed this threshold.
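To make the analogy with minimum support concrete, here is a hypothetical filter (`best_under_threshold` is an illustrative name) that keeps the candidate subsets whose SSE is at or below the threshold and returns the one with the highest SSE. The labels and values are made up; in practice each candidate would be the set of values covered by an attribute range.

```python
def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_under_threshold(candidates, max_sse):
    """Return (label, SSE) of the candidate with the largest SSE <= max_sse, or None."""
    scored = [(sse(vals), label) for label, vals in candidates.items()]
    feasible = [(s, label) for s, label in scored if s <= max_sse]
    if not feasible:
        return None
    best_sse, best_label = max(feasible)
    return best_label, best_sse

# Hypothetical candidate subsets: label -> y values of the covered transactions
candidates = {
    "A(1)": [2.0, 4.0],
    "A(12)": [2.0, 4.0, 6.0, 8.0],
    "A(123)": [2.0, 4.0, 6.0, 8.0, 30.0],
}
print(best_under_threshold(candidates, max_sse=25.0))  # ('A(12)', 20.0)
```

Where ARM keeps everything above a floor, this keeps everything below a ceiling and then prefers the largest survivor.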
+The search has two aspects.  1) Intersection of transactions as the number of attributes considered grows.  For example, the combination of attributes A and B covers no more transactions than either A or B alone.  2) Union of transactions as adjacent values of a single attribute are combined into a range.  For example, A with value 1 versus A with the range 1 to 2 (i.e., A(1) versus A(12)), where A(12) covers every transaction of A(1) and of A(2).
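The two aspects can be seen on a tiny, hypothetical discretized data set, where a condition such as A(12) is taken to mean "attribute A has a value in 1..2". The helper `tids` is illustrative: combining attributes intersects transaction-id sets, while widening a range takes a union of them.

```python
# Hypothetical discretized data: transaction id -> {attribute: bin value}
transactions = {
    0: {"A": 1, "B": 1},
    1: {"A": 1, "B": 2},
    2: {"A": 2, "B": 1},
    3: {"A": 2, "B": 2},
    4: {"A": 3, "B": 1},
}

def tids(attr, lo, hi):
    """Ids of transactions whose value for attr lies in lo..hi (inclusive)."""
    return {t for t, row in transactions.items() if lo <= row[attr] <= hi}

a1 = tids("A", 1, 1)   # A(1)
a12 = tids("A", 1, 2)  # A(12)
b1 = tids("B", 1, 1)   # B(1)

# Aspect 1: adding attribute B intersects the sets, so coverage can only shrink.
a1b1 = a1 & b1
assert a1b1 <= a1 and a1b1 <= b1

# Aspect 2: widening A's range is a union of single-value sets, so coverage can only grow.
assert a12 == tids("A", 1, 1) | tids("A", 2, 2)
assert a1 <= a12

print(sorted(a1), sorted(a12), sorted(a1b1))  # [0, 1] [0, 1, 2, 3] [0]
```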
+The single attribute setup goes like:\\
+''
+\\
+\\
+\\
+\\
+\\
+\\
+''
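A minimal sketch of how the single-attribute ranges could be enumerated, assuming each attribute has been discretized into bins 1..k and that a label such as A(12) means bins 1 through 2, following the notation used in these notes. The function name and the bin count are assumptions for illustration.

```python
def single_attribute_ranges(attr, k):
    """All contiguous ranges over bins 1..k of one attribute, narrowest first.

    Returns (label, lo, hi) tuples; labels such as A(12) concatenate the bin
    numbers, which is only unambiguous for k <= 9.
    """
    ranges = []
    for width in range(1, k + 1):
        for lo in range(1, k - width + 2):
            hi = lo + width - 1
            label = attr + "(" + "".join(str(v) for v in range(lo, hi + 1)) + ")"
            ranges.append((label, lo, hi))
    return ranges

labels = [label for label, _, _ in single_attribute_ranges("A", 3)]
print(labels)  # ['A(1)', 'A(2)', 'A(3)', 'A(12)', 'A(23)', 'A(123)']
```

There are k(k+1)/2 such ranges per attribute, and each wider range covers the union of the transactions of the narrower ranges inside it.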
+Multiple attributes go like:\\
+''A(1)\\
+A(12)\\
+B(1)\\
+C(1)\\
+A(1)B(1)\\
+:\\
+:\\
+''
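A sketch of how the multi-attribute candidates could be generated level by level, assuming each chosen attribute contributes exactly one contiguous range and that bins are single digits so labels like A(12) stay unambiguous. The functions `ranges` and `multi_attribute_sets` are hypothetical; the order roughly follows the listing above (single attributes first, then pairs such as A(1)B(1)).

```python
from itertools import combinations, product

def ranges(attr, k):
    """Labels of all contiguous ranges over bins 1..k (e.g. A(1), A(2), A(12))."""
    return [attr + "(" + "".join(str(v) for v in range(lo, lo + width)) + ")"
            for width in range(1, k + 1)
            for lo in range(1, k - width + 2)]

def multi_attribute_sets(attrs, k):
    """Yield candidate labels level by level: one range per chosen attribute."""
    for size in range(1, len(attrs) + 1):
        for chosen in combinations(attrs, size):
            for combo in product(*(ranges(a, k) for a in chosen)):
                yield "".join(combo)

sets = list(multi_attribute_sets(["A", "B", "C"], 2))
print(sets[:6])   # ['A(1)', 'A(2)', 'A(12)', 'B(1)', 'B(2)', 'B(12)']
print(sets[9])    # A(1)B(1)
print(len(sets))  # 63
```

With r = k(k+1)/2 ranges per attribute, n attributes give (r + 1)^n - 1 candidates (63 here), so exhaustive enumeration grows quickly.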
+So, the property SSE <= max. SSE is upward closed across attribute sets (adding an attribute only removes transactions) and downward closed within a single attribute (narrowing a range only removes transactions).  Since we want the highest SSE that does not exceed the threshold, we can start with the widest range for each attribute and combine different ranges across attributes.  Once an attribute combination drops to or below the threshold, we do not need to look at narrower ranges of its attributes, since those cover a subset of the same transactions and can never have a larger SSE.  For example, if SSE({A(23), B(12)}) <= max. SSE, then we do not need to consider narrowing A to A(2) or A(3).
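One possible reading of this search, sketched under stated assumptions: rows carry attributes discretized into bins 1..K plus a continuous value `y` on which SSE is computed; for each attribute set the search starts from the widest ranges and narrows one end of one range at a time, and a node at or below the threshold is recorded and not expanded. All names and data are hypothetical, and this is not necessarily the exact procedure used in Matt's work.

```python
from itertools import combinations

# Hypothetical rows: attributes A and B discretized into bins 1..K, plus a
# continuous value y on which SSE is measured (an assumption of this sketch).
ROWS = [
    {"A": 1, "B": 1, "y": 2.0},
    {"A": 1, "B": 2, "y": 4.0},
    {"A": 2, "B": 1, "y": 6.0},
    {"A": 2, "B": 3, "y": 8.0},
    {"A": 3, "B": 2, "y": 30.0},
    {"A": 3, "B": 3, "y": 31.0},
]
K = 3

def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def subset_sse(node):
    """SSE of y over the rows that satisfy every (attr, lo, hi) range in node."""
    ys = [r["y"] for r in ROWS if all(lo <= r[a] <= hi for a, lo, hi in node)]
    return sse(ys)

def narrower(node):
    """Nodes obtained by dropping one bin from either end of one attribute's range."""
    for i, (a, lo, hi) in enumerate(node):
        if lo < hi:
            yield node[:i] + ((a, lo + 1, hi),) + node[i + 1:]
            yield node[:i] + ((a, lo, hi - 1),) + node[i + 1:]

def best_for_attributes(attrs, max_sse):
    """(SSE, node) with the largest SSE <= max_sse for a fixed attribute set, or None."""
    start = tuple((a, 1, K) for a in attrs)  # widest range for every attribute
    best, stack, seen = None, [start], {start}
    while stack:
        node = stack.pop()
        s = subset_sse(node)
        if s <= max_sse:
            # Narrower ranges cover a subset of these rows, so their SSE is no
            # larger: record this node and do not expand it.
            if best is None or s > best[0]:
                best = (s, node)
            continue
        for child in narrower(node):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return best

def search(attr_names, max_sse):
    """Best (SSE, node) over every non-empty attribute set, or None."""
    found = []
    for size in range(1, len(attr_names) + 1):
        for attrs in combinations(attr_names, size):
            result = best_for_attributes(attrs, max_sse)
            if result is not None:
                found.append(result)
    return max(found) if found else None

print(search(["A", "B"], max_sse=25.0))
```

Pruning is safe because a narrower range combination covers a subset of the rows, and the SSE of a subset around its own mean never exceeds the SSE of the superset around its mean. The sketch can still reach a dominated node through a different, over-threshold parent; a tighter version would also skip nodes contained in an already-recorded feasible one.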