LINQ, Python, SQL: need advice for TSS, WSS, BSS calculation
-
Hi, I have a table with the output of a multivariate cluster analysis:
Var1,Var2,Var3,..VarN, k2,k3,k4...kn
where Var1 to VarN are the variables under study,
and k2 to kn are the cluster classifications.

Table example:
Var1,Var2,Var3,Var4,k2,k3,k4,k5,k6
3464.57,2992.33,2688.33,504.79,2,3,2,3,2
2895.32,3365.35,2824.35,504.86,1,2,3,2,6
2249.32,3300.19,2382.19,504.92,2,1,4,3,4
3417.81,3311.04,2426.04,504.97,1,2,2,5,2
3329.66,3497.14,2467.14,505.03,2,2,1,4,2
3087.85,3653.53,2296.53,505.09,2,1,2,3,4

The C++ storage will be defined like this:
struct record
{
    QList<double> vars;
    QList<int> cluster;
};
QList<record> table;

I need to calculate the total, within-group, and between-group sums of squares (TSS, WSS, BSS).
https://en.wikipedia.org/wiki/F-test
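For reference, the three quantities can be written out as follows. The notation is an assumption made for this sketch and does not come from the post: $x_i$ is one value of the variable (e.g. Var1), $\bar{x}$ the overall mean, $\bar{x}_g$ the mean of group $g$ (e.g. one value of k2), $n_g$ the size of that group, and $N$ the number of rows.

$$
\begin{aligned}
\mathrm{TSS} &= \sum_{i=1}^{N} (x_i - \bar{x})^2 \\
\mathrm{WSS} &= \sum_{g} \sum_{i \in g} (x_i - \bar{x}_g)^2 \\
\mathrm{BSS} &= \sum_{g} n_g \, (\bar{x}_g - \bar{x})^2 \\
\mathrm{TSS} &= \mathrm{WSS} + \mathrm{BSS}
\end{aligned}
$$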
For example, to calculate WSS for Var1 and k2, I need to do the following (in pseudo-code):
get the size of every group:
count(*) group by(k2),
calculate the mean of every group:
sum(Var1) group by(k2), then divide each sum by the corresponding count,
compute the squared difference between each value and its group mean:
pow((xgroup1-xmeangroup1),2)
and many other operations...

Which alternative would be the easiest and most powerful to code?
1) Create a MySQL table on the fly and do the operations in SQL.
2) Use LINQ, but I do not know if Qt has a QtLinq class.
3) Try to do it through Python equivalents of the LINQ methods
(how do Qt and Python interact? I see that QGIS has many plugins written in Python).

My app also needs to make many other calculations.
I hope this is clear.
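Option 1 and the pseudo-code steps above can be tried out quickly with a temporary SQL table. The sketch below is only an illustration under some assumptions: it uses an in-memory SQLite database through Python's standard `sqlite3` module instead of MySQL, the table name `obs` and the column aliases are hypothetical, and the data are the Var1 and k2 columns of the example table.

```python
import sqlite3

# (Var1, k2) pairs copied from the example table in the question.
rows = [
    (3464.57, 2), (2895.32, 1), (2249.32, 2),
    (3417.81, 1), (3329.66, 2), (3087.85, 2),
]

con = sqlite3.connect(":memory:")  # temporary database, discarded on close
con.execute("CREATE TABLE obs (var1 REAL, k2 INTEGER)")  # 'obs' is a made-up name
con.executemany("INSERT INTO obs VALUES (?, ?)", rows)

# Size and mean of every group: COUNT(*) and AVG(var1) grouped by k2.
group_stats = con.execute(
    "SELECT k2, COUNT(*), AVG(var1) FROM obs GROUP BY k2 ORDER BY k2"
).fetchall()

# WSS: squared deviation of each value from its own group mean, summed.
wss = con.execute(
    """
    SELECT SUM((o.var1 - g.mean1) * (o.var1 - g.mean1))
    FROM obs AS o
    JOIN (SELECT k2, AVG(var1) AS mean1 FROM obs GROUP BY k2) AS g
      ON o.k2 = g.k2
    """
).fetchone()[0]

# TSS: squared deviation of each value from the overall mean, summed.
tss = con.execute(
    """
    SELECT SUM((o.var1 - t.m) * (o.var1 - t.m))
    FROM obs AS o, (SELECT AVG(var1) AS m FROM obs) AS t
    """
).fetchone()[0]

bss = tss - wss  # uses the identity TSS = WSS + BSS

for k, n, mean in group_stats:
    print(f"k2={k}: n={n}, mean={mean:.3f}")
print(f"WSS={wss:.2f}  BSS={bss:.2f}  TSS={tss:.2f}")
con.close()
```

The same queries should carry over to other SQL engines with little change, but that has not been checked here.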
Greetings -
After some time I am answering my own question:
the solution was written in Python with pandas.

This link is very useful:
Iterating through groups: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Also the book "Python for Data Analysis" by Wes McKinney, page 255.

This video shows how to do the calculation:
ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy
https://www.youtube.com/watch?v=j9ZPMlVHJVs

```
import numpy as np
import pandas as pd

def getDFrameFixed2D():
    # Small fixed data set: one variable (Var1) and one clustering column (K2)
    y = np.array([3,2,1,5,3,4,5,6,7])
    k = np.array([1,1,1,2,2,2,3,3,3])
    clusters = pd.DataFrame([[a,b] for a,b in zip(y,k)],columns=['Var1','K2'])
    # print (clusters.head())
    print("shape(0):",clusters.shape[0])
    return clusters

X2D = getDFrameFixed2D()
MainMean = X2D['Var1'].mean()
print("Main mean:",MainMean)

grouped = X2D['Var1'].groupby(X2D['K2'])
print("-----Iterating Over Groups-------------")
Wss = 0
Bss = 0
for name, group in grouped:
    #print(type(name))
    #print(type(group))
    print("Group key:",name)
    groupmean = group.mean()
    # Sum of squared deviations from the group mean
    groupss = ((group-groupmean)**2).sum()
    print(" groupmean:",groupmean)
    print(" groupss:",groupss)
    Wss += groupss
    # Squared distance of the group mean from the main mean, weighted by group size
    Bss += ((groupmean - MainMean)**2)*len(group)

print("----------------------------------")
print("Wss:",Wss)
print("Bss:",Bss)
print("T=B+W:",Bss+Wss)

# Total sum of squares computed directly, as a check against Bss+Wss
Tss = np.sum((X2D['Var1']-MainMean)**2)
print("Tss:",Tss)
print("----------------------------------")
```

The next job is to translate this to C++ or to use Python from Qt.
Greetings
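For the C++ translation mentioned above, a version without pandas may be a useful starting point, since plain loops map fairly directly onto iterating over a `QList<record>`. The following is only a sketch under assumptions: the function name `sum_of_squares` and its parameters are hypothetical, each row is a `(vars, cluster)` pair mirroring the `record` struct, and the table is assumed to be non-empty.

```python
from collections import defaultdict

def sum_of_squares(table, var_idx, cluster_idx):
    """Return (tss, wss, bss) for one variable and one clustering column.

    `table` is a non-empty list of (vars, cluster) pairs, mirroring the
    `record` struct: vars is a list of floats, cluster a list of ints.
    """
    sums = defaultdict(float)   # per-group sum of the chosen variable
    counts = defaultdict(int)   # per-group number of rows
    for vars_, cluster in table:
        key = cluster[cluster_idx]
        sums[key] += vars_[var_idx]
        counts[key] += 1

    n_total = sum(counts.values())
    grand_mean = sum(sums.values()) / n_total
    group_mean = {key: sums[key] / counts[key] for key in sums}

    # Second pass: squared deviations from the overall and the group mean.
    tss = wss = 0.0
    for vars_, cluster in table:
        x = vars_[var_idx]
        tss += (x - grand_mean) ** 2
        wss += (x - group_mean[cluster[cluster_idx]]) ** 2

    # Between-group part: group means against the overall mean, size-weighted.
    bss = sum(counts[k] * (group_mean[k] - grand_mean) ** 2 for k in counts)
    return tss, wss, bss

# Same toy data as the pandas code: one variable, one clustering column.
y = [3, 2, 1, 5, 3, 4, 5, 6, 7]
k = [1, 1, 1, 2, 2, 2, 3, 3, 3]
table = [([float(a)], [b]) for a, b in zip(y, k)]

tss, wss, bss = sum_of_squares(table, var_idx=0, cluster_idx=0)
print("Tss:", tss, "Wss:", wss, "Bss:", bss)
```

Calling the function once per (variable, clustering) pair would cover the whole Var1..VarN by k2..kn table; the two-pass structure keeps it simple at the cost of reading the rows twice.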