LINQ, Python, SQL: need advice for TSS, WSS, BSS calculation



  • Hi, I have a table with the output of a multivariate cluster analysis:
    Var1,Var2,Var3,..VarN, k2,k3,k4...kn
    where Var1 to VarN are the variables of study,
    and k2 to kn are the cluster classifications.

    Table Example:
    Var1,Var2,Var3,Var4,k2,k3,k4,k5,k6
    3464.57,2992.33,2688.33,504.79,2,3,2,3,2
    2895.32,3365.35,2824.35,504.86,1,2,3,2,6
    2249.32,3300.19,2382.19,504.92,2,1,4,3,4
    3417.81,3311.04,2426.04,504.97,1,2,2,5,2
    3329.66,3497.14,2467.14,505.03,2,2,1,4,2
    3087.85,3653.53,2296.53,505.09,2,1,2,3,4

    The C++ storage will be defined like this:

    struct record
    {
        QList<double> vars;
        QList<int> cluster;
    };

    QList<record> table;
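For comparison, the same layout can be sketched in Python. This is only an illustration: the `Record` dataclass and the `parse_table` helper are hypothetical names that mirror the C++ struct above, and `SAMPLE` holds the header and first two rows of the example table. It assumes the variable columns start with "Var" and the cluster columns start with "k".

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    # Mirrors the C++ struct: study variables plus one cluster label per k
    vars: List[float] = field(default_factory=list)
    cluster: List[int] = field(default_factory=list)


def parse_table(text: str) -> List[Record]:
    """Split each row into VarN columns (floats) and kN columns (ints)."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    # Assumption: column names tell variables and cluster labels apart
    var_idx = [i for i, h in enumerate(header) if h.startswith("Var")]
    k_idx = [i for i, h in enumerate(header) if h.startswith("k")]
    table = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        table.append(Record(
            vars=[float(row[i]) for i in var_idx],
            cluster=[int(row[i]) for i in k_idx],
        ))
    return table


SAMPLE = """Var1,Var2,Var3,Var4,k2,k3,k4,k5,k6
3464.57,2992.33,2688.33,504.79,2,3,2,3,2
2895.32,3365.35,2824.35,504.86,1,2,3,2,6
"""

table = parse_table(SAMPLE)
print(table[0].vars, table[0].cluster)
```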

    I need to calculate the total, within-group and between-group sums of squares.

    https://en.wikipedia.org/wiki/F-test

    For example, to calculate WSS for Var1 and k2, I need to (in pseudocode):
    get the size of every group:
    count(*) group by(k2),
    calculate the mean of every group:
    sum(Var1) group by(k2), then divide each sum by the corresponding count,
    compute the squared differences:
    pow((xgroup1-xmeangroup1),2)
    and many other operations....
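The steps above can be sketched in plain Python with no database or LINQ layer. This is a minimal illustration, not a definitive implementation: `sums_of_squares` is a hypothetical helper name, and the input lists are the Var1 and k2 columns of the example table.

```python
from collections import defaultdict


def sums_of_squares(values, labels):
    """Return (tss, wss, bss) for one variable and one cluster labelling."""
    # Step 1: size and sum of every group (count(*) and sum(Var) group by k)
    count = defaultdict(int)
    total = defaultdict(float)
    for x, g in zip(values, labels):
        count[g] += 1
        total[g] += x

    # Step 2: mean of every group, plus the overall mean
    group_mean = {g: total[g] / count[g] for g in count}
    grand_mean = sum(values) / len(values)

    # Step 3: squared differences
    wss = sum((x - group_mean[g]) ** 2 for x, g in zip(values, labels))
    bss = sum(count[g] * (group_mean[g] - grand_mean) ** 2 for g in count)
    tss = sum((x - grand_mean) ** 2 for x in values)
    return tss, wss, bss


# Var1 and k2 columns from the example table
var1 = [3464.57, 2895.32, 2249.32, 3417.81, 3329.66, 3087.85]
k2 = [2, 1, 2, 1, 2, 2]

tss, wss, bss = sums_of_squares(var1, k2)
print("TSS:", tss, "WSS:", wss, "BSS:", bss)
```

TSS should equal WSS + BSS up to floating-point rounding, which gives a quick sanity check on the result.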

    Which alternative would be easier and more powerful to code:

    1) Create a MySQL table on the fly and do the operations in SQL.
    2) Use LINQ, but I do not know whether Qt has a QtLinq class.
    3) Try to do it through Python equivalents of the LINQ methods
    (how does the interaction between Qt and Python work? I see that QGIS has many plugins written in Python).

    My app also needs to do many other calculations.

    I hope this is clear.
    Greetings



  • After some time I am answering my own question:
    the solution was written in Python with pandas.

    This link is very useful:
    "Iterating through groups" at http://pandas.pydata.org/pandas-docs/stable/groupby.html
    Also the book "Python for Data Analysis" by Wes McKinney, page 255.

    This video shows how to do the calculation:
    ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy
    https://www.youtube.com/watch?v=j9ZPMlVHJVs

    [code]
    import numpy as np
    import pandas as pd

    def getDFrameFixed2D():
        # Fixed sample: nine Var1 values assigned to three clusters (K2)
        y = np.array([3,2,1,5,3,4,5,6,7])
        k = np.array([1,1,1,2,2,2,3,3,3])
        clusters = pd.DataFrame({'Var1': y, 'K2': k})
        return clusters

    X2D = getDFrameFixed2D()
    MainMean = X2D['Var1'].mean()
    print("Main mean:", MainMean)

    grouped = X2D['Var1'].groupby(X2D['K2'])

    print("-----Iterating Over Groups-------------")
    Wss = 0
    Bss = 0
    for name, group in grouped:
        print("Group key:", name)
        groupmean = group.mean()
        groupss = ((group - groupmean)**2).sum()
        print(" groupmean:", groupmean)
        print(" groupss:", groupss)
        Wss += groupss                                   # within-group sum of squares
        Bss += ((groupmean - MainMean)**2) * len(group)  # between-group sum of squares

    print("----------------------------------")
    print("Wss:", Wss)
    print("Bss:", Bss)
    print("T=B+W:", Bss + Wss)

    # Total sum of squares, computed directly as a check against B+W
    Tss = np.sum((X2D['Var1'] - MainMean)**2)
    print("Tss:", Tss)
    print("----------------------------------")
    [/code]
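The F-test page linked in the question combines these sums into a ratio of mean squares. Here is a minimal sketch, assuming a one-way ANOVA with `n_obs` observations split into `n_groups` groups. `f_ratio` is a hypothetical helper name, and the inputs 24 and 6 are the between-group and within-group sums of squares for the nine-value sample (y, k) above.

```python
def f_ratio(bss, wss, n_obs, n_groups):
    """F = (BSS / (k - 1)) / (WSS / (n - k)) for a one-way ANOVA."""
    df_between = n_groups - 1
    df_within = n_obs - n_groups
    if df_between <= 0 or df_within <= 0:
        raise ValueError("need at least 2 groups and more observations than groups")
    if wss == 0:
        raise ValueError("WSS is zero, so the ratio is undefined")
    return (bss / df_between) / (wss / df_within)


# Sample data above: 9 observations in 3 groups, Bss = 24, Wss = 6
f_value = f_ratio(bss=24.0, wss=6.0, n_obs=9, n_groups=3)
print("F:", f_value)
```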
    
    The next job is to translate this to C++ or to use Python from Qt.
    Greetings
