[Moved] Help needed for SQL query

alex2202375

[EDIT: moved to brainstorm, as it's not Qt related, Volker]

I am not expert on database and meet a problem on SQL query, can anyone help to solve it?

Background:
Have 3 tables A(column a), B(column b), A_B(columns a, b). A stores data A1, A2, A3..., B stores B1, B2, B3, B4..., A_B stores the relationship of A and B such like {A1, B1}, {A1, B2}, {A2, B1}, {A2, B2}, {A2, B4}.

Question:
How to use SQL to select A data that A related B data has some special ones like B1, B2 and does not have some ones like B4, B5.
The result of above should be A1, because A2 has B4.

My solution is :
@SELECT DISTINCT a FROM A_B WHERE b in (B1, B2)
MINUS
SELECT DISTINCT a FROM A_B WHERE b in (B4) @

But this query's performance is not good enough(twice query), does anyone know how to write a SQL with more efficiency?

cincirin

How about SELECT DISTINCT a FROM A_B WHERE (b IN (B1, B2)) AND (b NOT IN (B4)) ?

alex2202375

[quote author="cincirin" date="1309934816"]How about SELECT DISTINCT a FROM A_B WHERE (b IN (B1, B2)) AND (b NOT IN (B4)) ?[/quote]

I don't think this would work, for data columns {A1, B1}, {A1, B4}, the return value will still A1..

lgeyer

I do not really see how this would work without a second select statement. However, I think that it might not even matter how the statement is built (either using MINUS, EXCEPT, NOT IN, NOT EXISTS or JOIN) as after leaving the query optimizer it will all look the same and it will be heavily DBMS-related which one performs best.

If you really strive for performance you will have to build different versions and profile them.

Be aware the MINUS is DBMS-specific and not portable, DISTINCT ... EXCEPT should be.

alex2202375

@Lukas Geyer,
I agree with you, and do you know how to find A which has both B1,B4?

lgeyer

I still would go with a sub-select @ SELECT a FROM a_b WHERE b = 1 AND a IN (SELECT a FROM a_b WHERE b = 4) @ An intersection would be another possibility @ SELECT a FROM a_b WHERE b = 1 INTERSECT SELECT a FROM a_b WHERE b = 4 @

Performance is still up to the query optimizer and would require profiling to find the "best" (which is probably not the fastest) solution.

alex2202375

@Lukas Geyer,
Thanks for your solution, so if there would be more B to be included, there would be more "Select" needed?

lgeyer

I think so.

loladiro

Or something like this
@SELECT a FROM (SELECT a, COUNT(DISTINCT b) as c a_b WHERE b IN(1,4) GROUP BY a HAVING c=2)@

alex2202375

[quote author="loladiro" date="1309941231"]Or something like this
@SELECT a FROM (SELECT a, COUNT(DISTINCT b) as c a_b WHERE b IN(1,4) GROUP BY a HAVING c=2)@[/quote]

Good, that's what I want.