[Moved] Help needed for SQL query

alex2202375 · wrote on 6 Jul 2011, 03:14

[EDIT: moved to brainstorm, as it's not Qt related, Volker]

I am not expert on database and meet a problem on SQL query, can anyone help to solve it?

Background:
Have 3 tables A(column a), B(column b), A_B(columns a, b). A stores data A1, A2, A3..., B stores B1, B2, B3, B4..., A_B stores the relationship of A and B such like {A1, B1}, {A1, B2}, {A2, B1}, {A2, B2}, {A2, B4}.

Question:
How to use SQL to select A data that A related B data has some special ones like B1, B2 and does not have some ones like B4, B5.
The result of above should be A1, because A2 has B4.

My solution is :
@SELECT DISTINCT a FROM A_B WHERE b in (B1, B2)
MINUS
SELECT DISTINCT a FROM A_B WHERE b in (B4) @

But this query's performance is not good enough(twice query), does anyone know how to write a SQL with more efficiency?

cincirin · wrote on 6 Jul 2011, 06:46

How about SELECT DISTINCT a FROM A_B WHERE (b IN (B1, B2)) AND (b NOT IN (B4)) ?

alex2202375 · wrote on 6 Jul 2011, 07:14

[quote author="cincirin" date="1309934816"]How about SELECT DISTINCT a FROM A_B WHERE (b IN (B1, B2)) AND (b NOT IN (B4)) ?[/quote]

I don't think this would work, for data columns {A1, B1}, {A1, B4}, the return value will still A1..

lgeyer · wrote on 6 Jul 2011, 07:23

I do not really see how this would work without a second select statement. However, I think that it might not even matter how the statement is built (either using MINUS, EXCEPT, NOT IN, NOT EXISTS or JOIN) as after leaving the query optimizer it will all look the same and it will be heavily DBMS-related which one performs best.

If you really strive for performance you will have to build different versions and profile them.

Be aware the MINUS is DBMS-specific and not portable, DISTINCT ... EXCEPT should be.

alex2202375 · wrote on 6 Jul 2011, 07:34

@Lukas Geyer,
I agree with you, and do you know how to find A which has both B1,B4?

lgeyer · wrote on 6 Jul 2011, 07:48

I still would go with a sub-select @ SELECT a FROM a_b WHERE b = 1 AND a IN (SELECT a FROM a_b WHERE b = 4) @ An intersection would be another possibility @ SELECT a FROM a_b WHERE b = 1 INTERSECT SELECT a FROM a_b WHERE b = 4 @

Performance is still up to the query optimizer and would require profiling to find the "best" (which is probably not the fastest) solution.

alex2202375 · wrote on 6 Jul 2011, 08:10

@Lukas Geyer,
Thanks for your solution, so if there would be more B to be included, there would be more "Select" needed?

lgeyer · wrote on 6 Jul 2011, 08:20

I think so.

loladiro · wrote on 6 Jul 2011, 08:33

Or something like this
@SELECT a FROM (SELECT a, COUNT(DISTINCT b) as c a_b WHERE b IN(1,4) GROUP BY a HAVING c=2)@

alex2202375 · wrote on 6 Jul 2011, 08:57

[quote author="loladiro" date="1309941231"]Or something like this
@SELECT a FROM (SELECT a, COUNT(DISTINCT b) as c a_b WHERE b IN(1,4) GROUP BY a HAVING c=2)@[/quote]

Good, that's what I want.