Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Halve the speed of FbDataReader.GetOrdinal() [DNET214] #222

Closed
firebird-automations opened this issue Feb 21, 2009 · 10 comments
Closed

Halve the speed of FbDataReader.GetOrdinal() [DNET214] #222

firebird-automations opened this issue Feb 21, 2009 · 10 comments

Comments

@firebird-automations
Copy link

Submitted by: Gareth Goslett (gareth)

Attachments:
FbDataReaderTest.cs
FbDataReader.patch

By creating a hashtable on first access to the GetOrdinal method and populating it with the fields and indexes we only need to get the hash code of the requested field name for future lookups.

I did some tests on a table with 10 fields and 1000 records. I read all records 5 times so there were 5 * 1000 * 10 lookups.
My tests were run under NUnit so they are all worse than normal. Results are in milliseconds

CURRENT GetOrdinal
Total ordinal reads: 5000
Result 13
Result 11
Result 11
Result 10
Result 10
Average time 0.011
Total named reads : 5000
Result 33
Result 33
Result 36
Result 32
Average time 0.0268

NEW GetOrdinal
Total ordinal reads: 5000
Result 13
Result 9
Result 10
Result 10
Result 9
Average time 0.0102
Total named reads : 5000
Result 13
Result 16
Result 14
Result 14
Average time 0.0114

Here is the new method:

Hashtable names;
public override int GetOrdinal(string name)
{
this.CheckState();

if \(names == null\)
\{
	names = new Hashtable\(\);
  for \(int i = 0; i < this\.fields\.Count; i\+\+\)
  \{
  	names\.Add\(this\.fields\[i\]\.Alias\.ToLowerInvariant\(\)\.GetHashCode\(\), i\);
  \}
\}

int hash = name\.ToLowerInvariant\(\)\.GetHashCode\(\);

if \(names\.ContainsKey\(hash\) == true\)
\{
	return \(Int32\)names\[hash\];
\}

throw new IndexOutOfRangeException\("Could not find specified column in results\."\);

}

Commits: 902fbea

@firebird-automations
Copy link
Author

Modified by: Gareth Goslett (gareth)

description: By creating a hashtable on first access to the GetOrdinal method and populating it with the fields and indexes we only need to get the hash code of the requested field name for future lookups.

I did some tests on a table with 10 fields and 1000 records. I read all records 5 times so there were 5 * 1000 * 10 lookups.

CURRENT GetOrdinal
Total ordinal reads: 5000
Result 13
Result 11
Result 11
Result 10
Result 10
Average time 0.011
Total named reads : 5000
Result 33
Result 33
Result 36
Result 32
Average time 0.0268

NEW GetOrdinal
Total ordinal reads: 5000
Result 13
Result 9
Result 10
Result 10
Result 9
Average time 0.0102
Total named reads : 5000
Result 13
Result 16
Result 14
Result 14
Average time 0.0114

Here is the new method:

Hashtable names;
public override int GetOrdinal(string name)
{
this.CheckState();

if \(names == null\)
\{
	names = new Hashtable\(\);
  for \(int i = 0; i < this\.fields\.Count; i\+\+\)
  \{
  	names\.Add\(this\.fields\[i\]\.Alias\.ToLowerInvariant\(\)\.GetHashCode\(\), i\);
  \}
\}

int hash = name\.ToLowerInvariant\(\)\.GetHashCode\(\);

if \(names\.ContainsKey\(hash\) == true\)
\{
	return \(Int32\)names\[hash\];
\}

throw new IndexOutOfRangeException\("Could not find specified column in results\."\);

}

=>

By creating a hashtable on first access to the GetOrdinal method and populating it with the fields and indexes we only need to get the hash code of the requested field name for future lookups.

I did some tests on a table with 10 fields and 1000 records. I read all records 5 times so there were 5 * 1000 * 10 lookups.
My tests were run under NUnit so they are all worse than normal. Results are in milliseconds

CURRENT GetOrdinal
Total ordinal reads: 5000
Result 13
Result 11
Result 11
Result 10
Result 10
Average time 0.011
Total named reads : 5000
Result 33
Result 33
Result 36
Result 32
Average time 0.0268

NEW GetOrdinal
Total ordinal reads: 5000
Result 13
Result 9
Result 10
Result 10
Result 9
Average time 0.0102
Total named reads : 5000
Result 13
Result 16
Result 14
Result 14
Average time 0.0114

Here is the new method:

Hashtable names;
public override int GetOrdinal(string name)
{
this.CheckState();

if \(names == null\)
\{
	names = new Hashtable\(\);
  for \(int i = 0; i < this\.fields\.Count; i\+\+\)
  \{
  	names\.Add\(this\.fields\[i\]\.Alias\.ToLowerInvariant\(\)\.GetHashCode\(\), i\);
  \}
\}

int hash = name\.ToLowerInvariant\(\)\.GetHashCode\(\);

if \(names\.ContainsKey\(hash\) == true\)
\{
	return \(Int32\)names\[hash\];
\}

throw new IndexOutOfRangeException\("Could not find specified column in results\."\);

}

summary: Double the speed of FbDataReader.GetOrdinal() => Halve the speed of FbDataReader.GetOrdinal()

@firebird-automations
Copy link
Author

Commented by: Gareth Goslett (gareth)

The unit test which I ran to get the results.
I am creating a generic database access layer and the unit test is specific for that. I am including it here so that you can see if my test was okay.

@firebird-automations
Copy link
Author

Modified by: Gareth Goslett (gareth)

Attachment: FbDataReaderTest.cs [ 11374 ]

@firebird-automations
Copy link
Author

Commented by: @cincuranet

I was thinking little bit about this kind of approach. Some ideas to discuss:
* We should generate these hashes probably not on first access for all fields but when this particular field is accessed first. Sometimes you're reading only one or two rows/columns and the overhead for generating the hashtable (espacially for a lot of columns) will be too big.
* I would rather not use Invariant culture, because it may cause colisions (we have these problems already) and maybe we don't have to use Contains method, because if it's not there, it result will be null and that's easy to understand for us, that the column name is wrong.

@firebird-automations
Copy link
Author

Commented by: Gareth Goslett (gareth)

If we only store the hash at the time of request we will need to lookup the index of the field.
for (int i = 0; i < this.fields.Count; i++)
if (String.Compare(name, this.fields[i].Alias, StringComparison.OrdinalIgnoreCase) == 0)
{
namedIndex.Add(hash, i);
return i;
}
The number of possible string comparisons gets larger with more fields.

I like the generation on first access method but there does not seem to be a performance difference between that method and the cache when needed method.
If someone selects 100 fields but only wants to access the first field, they do not care too much about performance. :)

I changed the hash method to the following:
int hash = name.ToUpper(CultureInfo.CurrentCulture).GetHashCode();
ToUpper() is preferred by FxCop over ToLower(), something about it supporting more precise conversions.
I saw that "CultureInfo.CurrentCulture" is used alot in the client library.

@firebird-automations
Copy link
Author

Commented by: @cincuranet

Don't start with implementation if still discussing how to do it. It's just worthless.

If you access the field, you have to do the lookup anyway right now, so we will only store the value for future usage (next rows). Pregeneration all info will not be a good for small results. And in fact, when you finish reading first complete row, next rows will be cached and if you stop after, you will not notice any performance gap.

> If someone selects 100 fields but only wants to access the first field, they do not care too much about performance. :)
That's not true. You can get some summary info in first row and then reading the rest of data. Or you're reading some data but on some places you're using only ExecuteScalar.

If you wanna to discuss this more, join list.

@firebird-automations
Copy link
Author

Commented by: Gareth Goslett (gareth)

Replacement GetOrdinal() method.
I used String.ToUpper() to normalize the user supplied field name, the FxCop explanation is here: "http://msdn.microsoft.com/en-us/library/bb386042.aspx".

@firebird-automations
Copy link
Author

Modified by: Gareth Goslett (gareth)

Attachment: FbDataReader.patch [ 11380 ]

@firebird-automations
Copy link
Author

Commented by: @cincuranet

I've done some tests and filling my hashtable takes even for 1000 columns only 2ms (on my machine in VPC). That's IMO acceptable, taking into account, that in more than ~10 rows the performance benefit will be seen. It's implemented little bit differently - generics and only one lookup to see whether the key is there for better performance.

I'm storing directly name of column in uppercase, because when measuring the difference between string and int (+lookup time too) key, the results were same (in measurement error) (on 10000 columns). In fact for lookup for the string key was little bit faster.

@firebird-automations
Copy link
Author

Modified by: @cincuranet

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5.0 Beta 2 [ 10340 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants