You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Please consider to implement aggregate function MEDIAN() which can be very useful when we want to evaluate 'most typical' value from set which does contain several values with big deviation: { 12, 13, 11, 14, 11, 11, 11, 12, 9999, 1, 12,13, 13 } ==> median = 12 (compare with average = 779.46).
Apart from a convenience, I wonder if this could be a substantial speedup - currently I SELECT COUNT (*), then figure out whether I need one or two rows and from where, then SELECT AVG(x) FROM (SELECT x ORDER BY x ROWS a TO b). Any speed improvement for that would be welcome.
I doubt that finding median can be efficiently done using current SQL / PSQL features in FB.
Effective methods known today solve this task using intensive permutations of array elements.
This obviously can not be implmented with normal performance if we will operate with *tables* (cost of disk i/o will kill all such attempts).
There is 'quickselect' method (by Tony Hoare) for selecting Kth largest element and it can be adapted for searching median.
It provides performanse O(n) but it does this only in *average*. If pivot element (that we select for partition) will be largest on each step - we will face O(n^2).
Because of this, one may to consider more effective method developed by Blum-Floyd-Pratt-Rivest-Tarjan
( http://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf )
See in attachment small Python script from this article (if it will occasinally become "404" in the future).
Usage:
python find-median-usin-quickselect_-_blum-floyd-pratt-rivest-tarjan.py
Submitted by: @pavel-zotov
Attachments:
find-median-usin-quickselect_-_blum-floyd-pratt-rivest-tarjan.py.7z
Votes: 3
Please consider to implement aggregate function MEDIAN() which can be very useful when we want to evaluate 'most typical' value from set which does contain several values with big deviation: { 12, 13, 11, 14, 11, 11, 11, 12, 9999, 1, 12,13, 13 } ==> median = 12 (compare with average = 779.46).
Explanation: https://en.wikipedia.org/wiki/Median
Online eval.: http://www.alcula.com/calculators/statistics/median/
PS. Oracle has this function since 10g.
The text was updated successfully, but these errors were encountered: