Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support value distribution histograms [CORE1686] #2111

Open
firebird-automations opened this issue Jan 12, 2008 · 18 comments
Open

Support value distribution histograms [CORE1686] #2111

firebird-automations opened this issue Jan 12, 2008 · 18 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Timo Partanen (partim)

Is duplicated by CORE2381
Relate to CORE4666

Votes: 12

To provide better performance for queries, Firebird should support value distribution histograms. A histogram is a statistical report that shows the frequency of values within steps or ranges of values that fall between a certain minimum and maximum. Generally, histograms improve the performance of queries because the optimizer can use them to estimate the selectivity of conditions better than from the selectivity of indexes.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 13813 ] => Firebird [ 14121 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Suggested by Bill Oliver:

When this is added I would like to see the histogram information available through the monitoring tables, too. Information that might be surfaced might include:

- Column's Most Common Value (MCV)
- The frequency of the most common value, call this Most Common Frequency (MCF)
- Histogram information

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue is duplicated by CORE2381 [ CORE2381 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Philip Williams (unordained)

Is NULL handled as a special case in the histogram?

Can other values be requested to be treated as special, for situations where you know a field is being abused with special values (e.g. the field has a very regular distribution, except for the values -1, 0, and 1, which are each individually special and very different from > 1.)

@firebird-automations
Copy link
Collaborator Author

Commented by: @krilbe

In ref to Philip Williams' comment: I vote strongly for treating null as a special case in the histogram, i.e. always keep a specific frequency/selectivity figure for null, if the indexed column allows null. For a compound index, I'm not quite sure how to handle cases where some columns are null and some are not, but i assume that the all-null case should be treated separately, i.e. like null for a single-column index.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Yes, nulls are to be treated separately.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 Beta 1 [ 10332 ]

Fix Version: 3.0 Alpha 1 [ 10331 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ]

Fix Version: 3.0 Beta 1 [ 10332 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

Link: This issue relate to CORE4666 [ CORE4666 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Atri Sharma (atris)

Hi,

I am a relational database developer, focused on many areas.

I would like to start hacking Firebird,with this issue, if possible.

Please advice.

@firebird-automations
Copy link
Collaborator Author

Commented by: Ann Harrison (awharrison)

Histograms are less useful in database like Firebird that optimize queries
at compile time rather than run time. Knowing the value distribution
in an index is of little use to the optimizer if it doesn't know what value will
be searched. Not that this feature shouldn't be implemented, just don't
expect it to improved performance dramatically in cases like this:

Select c.zipcode
from customers c
inner join orders o on o.customerId = c.customerId
inner join orderItems oi on oi.orderId = o.orderId
where oi.item = ? and c.state = ?

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 4.0 Alpha 1 [ 10731 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ]

Fix Version: 4.0 Alpha 1 [ 10731 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ] =>

@ertankucukoglu
Copy link

Databases I deal with includes both NULLs and EMPTY strings. It is not feasible to convert EMTPY strings to NULLs because the risk that applications start giving errors. I'm read in a recent discussion that histograms might be of help for EMPTY string values.
There are more than several reports about "worse" performance of v3 or v4 versions compared to v2.5 or earlier.
Since histograms are expected to solve some of such problems even not all, I'd like to see them added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants