Alternative to group by (doubt)

Dúvidas, dicas e truques de SQL, Select, Update, Delete, cláusulas, operações com joins, Funções em SQLs, etc
  

Poston Mon, 25 Feb 2008 4:05 pm

Given a table, I want to get the value of the campo_A registry key that contains the highest value of campo_B (which can be one or more than one).
So, I wrote the following code: select campo_A from (select campo_A from table order by campo_B desc) where rownum = 1;

An alternative would be to: select campo_A from table where campo_B = (select max (campo_B) from table) But I saw that the first query is more optimized. Question: this can fail or my reasoning is correct?
Ricardo Carmo
Location: Maceió-AL

Poston Mon, 25 Feb 2008 9:28 pm

This depends on a few factors.

In the first query, you are accessing all your table only once, ordering the results and returning the first record.

In the second cosulta, you are reading the same table twice. A to get the biggest date in the table, and another to fetch the records that have this date.


The main factor, under normal conditions, is: * there is an index on your table on campo_b?

If it does not exist:-the first query will take the trouble to read the entire table, sort it, and return the first record to find. In this case, is the best thing you can do ...-the second query will perform a full table scan to find the largest value of campo_b. then will be held a new full scan on the same table, to find the records that have the campo_b equal to the value found (as the campo_b is not indexed, this second full scan will occur without doubts).


Now, if there is an index on the campo_b (and that he is being used by optimizer, of course):-the first query will continue performing the same process. It may be that in the place of a full table scan the optimizer chooses to perform a full scan index on the index (this would depend on the query optimizer uses statistics, table, and columns returned by the inner query must also be part of the index-I'vê read about this several times, but I honestly don't remember until today have seen an execution plan by following this path ...)Although both can be very time-consuming, especially in a table with many records (not forgetting the cost to order the records, after retrieving them).

-The second query, however, would fare much better under these conditions. In this case, the query probably would hold a full index scan (min/max) on the index, to recover the greater value contained. Once retrieved from this value, the index is accessed again for records containing this value may be returned.

This, of course, assuming that the campo_b cannot have repeated values or that the field has few repetitions.

If the campo_b can take a few distinct values, and with many repetitions (say your campo_b is the status field of the customer table with a few million records), an index on the field could take its toll on the performance if compared to the results for the same query in the first situation.


Anyway: it depends, and ... (now at the end of the text I realized that I wrote and write anything I didn't:P) Every case is different, and there is no PartyPoker a simple rule to follow.

A tip is to always analyze the execution plan generated by their queries (for this, a good knowledge of the tables being analyzed is fundamental), comparing it with the plan generated by other ways to write the same query, to find the \"ideal\" query for your case.
rogenaro
Location: Londrina - PR

Rafael O. Genaro

Poston Mon, 25 Feb 2008 10:45 pm

Rafael you of my no matter the gambiarra what matters in the end is the execution plan .... right ... and the basic rules of always greater for smaller indices ... and do join, always avoid while maximo other shit like outher join
ruevers
Location: sp


Poston Tue, 26 Feb 2008 9:07 am

Rafael, thank you very much for your comments, you helped, including making me see about the benefits of index on the campo_b in the second query. On the other hand, I misspoke in question, because what I'd really like to know is: in the first query, I really have the guarantee of getting the record that contains the highest value in the campo_b (since the inner query is of the form ordernada descendant)?

Or will the rownum = 1 can return any of the records returned by the inner query?
Ricardo Carmo
Location: Maceió-AL

Poston Tue, 26 Feb 2008 7:26 pm

Oops, I may have misread the question = P in the first query, as you put the condition \"where rownum = 1\" in an external query to do the ordering, always you will return a single record, which has the highest value registered on campo_b, even if there are two or more records with the largest value in the table.
In the case of records with duplicate values, anyone can be returned by the query depends on the order that the Bank recovered records, if I'm not mistaken.

In the second query, you always have an answer all the records with the largest value of campo_b in the table. If the field is duplicated in the table, both records would be returned.

If the campo_b is the key of the table (without any other field making part of the key) or if there is a constraint only on it, the second query will always return a single record and is equivalent to primieira (otherwise, you can't guarantee anything ...)
rogenaro
Location: Londrina - PR

Rafael O. Genaro


  • See also
    Replies
    Views
    Last Post


Return to SQL

Who is online

Users browsing this forum: Bing [Bot] and 3 guests