LINQ, Python, SQL: need advice for TSS, WSS and BSS calculation
Hi, I have a table with the output of a multivariate cluster analysis:
where Var1 to Varn are the variables of study,
and k2 to kn are the cluster classifications.
The C++ storage will be defined like:
I need to calculate the total, the within-group and the between-group sums of squares (TSS, WSS, BSS).
So, for example, to calculate WSS for Var1 and k2 I need to do the following,
in pseudocode:
get the size of every group:
count(*) group by(k2),
calculate the mean of every group:
sum(Var1) group by(k2), and then divide every one by the previous count.
compute the difference:
and many other operations....
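The steps above could be sketched in pandas roughly as follows. This is only an illustration: the six sample values are made up, and only the column names Var1 and k2 come from the table described above.

```python
import pandas as pd

# Made-up sample data; only the column names follow the post.
df = pd.DataFrame({
    "Var1": [1, 2, 3, 7, 8, 9],
    "k2":   [1, 1, 1, 2, 2, 2],
})

groups = df.groupby("k2")["Var1"]
sizes = groups.count()        # count(*) group by(k2)
means = groups.sum() / sizes  # sum(Var1) group by(k2), divided by the count

# Difference between every observation and the mean of its own group.
diffs = df["Var1"] - df["k2"].map(means)

# WSS is the sum of the squared differences.
wss = float((diffs ** 2).sum())
print("sizes:", sizes.to_dict())
print("means:", means.to_dict())
print("WSS:", wss)
```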
Which alternative would be easier and more powerful to code?
1) Create a MySQL table on the fly and do the operations in SQL.
2) Use LINQ, but I do not know if Qt has a QTLinq class.
3) Try to do it through Python equivalents of the LINQ methods
(how is the interaction between Qt and Python? I see that QGIS has many plugins written in Python).
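To give an idea of how option 1 might look, here is a rough sketch that uses an in-memory SQLite database from Python as a stand-in for the MySQL table. The table name clusters, the sample rows and the column aliases are assumptions made for the example; the queries use only standard aggregates and joins, but they have not been tried on MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (Var1 REAL, k2 INTEGER)")
conn.executemany(
    "INSERT INTO clusters VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (7, 2), (8, 2), (9, 2)],  # made-up rows
)

# WSS: squared distance of every row to the mean of its own group.
wss_sql = """
SELECT SUM((c.Var1 - g.grp_mean) * (c.Var1 - g.grp_mean))
FROM clusters AS c
JOIN (SELECT k2, AVG(Var1) AS grp_mean FROM clusters GROUP BY k2) AS g
  ON c.k2 = g.k2
"""

# BSS: group size times squared distance of the group mean to the overall mean.
bss_sql = """
SELECT SUM(g.n * (g.grp_mean - t.all_mean) * (g.grp_mean - t.all_mean))
FROM (SELECT k2, COUNT(*) AS n, AVG(Var1) AS grp_mean
      FROM clusters GROUP BY k2) AS g
CROSS JOIN (SELECT AVG(Var1) AS all_mean FROM clusters) AS t
"""

wss = conn.execute(wss_sql).fetchone()[0]
bss = conn.execute(bss_sql).fetchone()[0]
print("WSS:", wss, "BSS:", bss, "TSS:", wss + bss)
conn.close()
```

Every extra variable or clustering column needs another pair of queries, so this approach gets verbose quickly.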
Also, my app needs to do many other calculations.
I hope this is clear.
After some time I am answering my own question:
the solution was done in Python with pandas.
This link is very useful:
"Iterating through groups" at: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Also the book "Python for Data Analysis" by Wes McKinney, page 255.
This video shows how to do the calculation:
ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy
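[code]
import numpy as np
import pandas as pd

y = np.array([3, 2, 1, 5, 3, 4, 5, 6, 7])
k = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
clusters = pd.DataFrame({'Var1': y, 'K2': k})
# print(clusters.head()); print("shape:", clusters.shape)
MainMean = clusters['Var1'].mean()  # mean of all observations
grouped = clusters['Var1'].groupby(clusters['K2'])
Wss = 0.0  # within-group sum of squares
Bss = 0.0  # between-group sum of squares
print("-----Iterating Over Groups-------------")
for name, group in grouped:
    groupmean = group.mean()
    groupss = sum((group - groupmean) ** 2)  # squared deviations inside this group
    Wss += groupss
    Bss += ((groupmean - MainMean) ** 2) * len(group)
Tss = np.sum((clusters['Var1'] - MainMean) ** 2)  # total sum of squares
print("Wss:", Wss, "Bss:", Bss, "Tss:", Tss)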
[/code]
The next job is to translate this to C++ or to use Python with Qt.
Greetings
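For several variables and several cluster columns the explicit loop can probably be avoided. The following is a minimal sketch, assuming the data is in a pandas DataFrame: the helper name sum_of_squares and the second clustering column K_alt are made up for illustration, while Var1 and K2 hold the same values as in the code above.

```python
import pandas as pd

def sum_of_squares(df, var, cluster):
    """Return (TSS, WSS, BSS) of column `var` for the grouping in `cluster`."""
    x = df[var]
    overall_mean = x.mean()
    # Mean of the group each observation belongs to, aligned row by row.
    group_mean = x.groupby(df[cluster]).transform("mean")
    tss = float(((x - overall_mean) ** 2).sum())
    wss = float(((x - group_mean) ** 2).sum())
    bss = float(((group_mean - overall_mean) ** 2).sum())
    return tss, wss, bss

df = pd.DataFrame({
    "Var1":  [3, 2, 1, 5, 3, 4, 5, 6, 7],
    "K2":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "K_alt": [1, 1, 1, 1, 2, 2, 2, 2, 2],  # hypothetical second clustering
})

results = {}
for var in ["Var1"]:
    for cluster in ["K2", "K_alt"]:
        results[(var, cluster)] = sum_of_squares(df, var, cluster)

for key, (tss, wss, bss) in results.items():
    print(key, "TSS:", tss, "WSS:", wss, "BSS:", bss)
```

A quick sanity check is that TSS should equal WSS + BSS for every combination.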