Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With a little change the reader will load data with GetValues(myValues) 30% faster [DNET1038] #949

Closed
firebird-automations opened this issue Apr 8, 2021 · 10 comments

Comments

@firebird-automations
Copy link

Submitted by: Baldur Fürchau (bfuerchau)

I have loaded the Version 7.10.1.0 because the clr 4.8 is not available on all customers, but the improvement is also for the latest version.
In my little test i use generally the GetValues() from th FbDataReader to process all values the same time.
With the current version i get for a table with 90 columns an average of 8500 rows per second.
The Micorosft SwlDatareader give me for the same table 10000 rows per second.

So I have checked the source for the GetValue()-Method:

	public override int GetValues\(object\[\] values\)
	\{
		CheckState\(\);
		CheckPosition\(\);

		var count = Math\.Min\(\_fields\.Count, values\.Length\);
		for \(var i = 0; i < count; i\+\+\)
		\{
			values\[i\] = \_row\[i\]\.Value; // CheckedGetValue\(x =\> GetValue\(x\), i\);
		\}
		return count;
	\}

Because all checks are done you can directly load the row values.
With this only little change i get now 12000 Rows per Second and this is 20% faster than SQL Server and 30% than current version.

An additional reason to use Firebird.

@firebird-automations
Copy link
Author

Commented by: Baldur Fürchau (bfuerchau)

writing error;-).

@firebird-automations
Copy link
Author

Modified by: Baldur Fürchau (bfuerchau)

environment: Windows 10 20H2, Visual Studio 20H2, I7 2,6GHz, 16 GB Memory, Firebird 3.0 => Windows 10 20H2, Visual Studio 2019, I7 2,6GHz, 16 GB Memory, Firebird 3.0

summary: With a little change the reader will load data with GetValues(myValues) 30% faster => With a little change the reader will load data with GetValues(myValues) 20% faster

@firebird-automations
Copy link
Author

Modified by: Baldur Fürchau (bfuerchau)

description: I have loaded the Version 7.10.1.0 because the clr 4.8 is not available on all customers, but the improvement is also for the latest version.
In my little test i use generally the GetValues() from th FbDataReader to process all values the same time.
With the current version i get for a table with 90 columns an average of 8500 rows per second.
The Micorosft SwlDatareader give me for the same table 10000 rows per second.

So I have checked the source for the GetValue()-Method:

	public override int GetValues\(object\[\] values\)
	\{
		CheckState\(\);
		CheckPosition\(\);

		var count = Math\.Min\(\_fields\.Count, values\.Length\);
		for \(var i = 0; i < count; i\+\+\)
		\{
			values\[i\] = \_row\[i\]\.Value; // CheckedGetValue\(x =\> GetValue\(x\), i\);
		\}
		return count;
	\}

Because all checks are done you can directly load the row values.
With this only little change i get now 12000 Rows per Second and this is 20% faster than SQL Server.

An additional reason to use Firebird.

=>

I have loaded the Version 7.10.1.0 because the clr 4.8 is not available on all customers, but the improvement is also for the latest version.
In my little test i use generally the GetValues() from th FbDataReader to process all values the same time.
With the current version i get for a table with 90 columns an average of 8500 rows per second.
The Micorosft SwlDatareader give me for the same table 10000 rows per second.

So I have checked the source for the GetValue()-Method:

	public override int GetValues\(object\[\] values\)
	\{
		CheckState\(\);
		CheckPosition\(\);

		var count = Math\.Min\(\_fields\.Count, values\.Length\);
		for \(var i = 0; i < count; i\+\+\)
		\{
			values\[i\] = \_row\[i\]\.Value; // CheckedGetValue\(x =\> GetValue\(x\), i\);
		\}
		return count;
	\}

Because all checks are done you can directly load the row values.
With this only little change i get now 12000 Rows per Second and this is 20% faster than SQL Server and 30% than current version.

An additional reason to use Firebird.

summary: With a little change the reader will load data with GetValues(myValues) 20% faster => With a little change the reader will load data with GetValues(myValues) 30% faster

@firebird-automations
Copy link
Author

Modified by: @cincuranet

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Author

Commented by: @cincuranet

But the `IscException` is not handled, isn't it?

@firebird-automations
Copy link
Author

Commented by: Baldur Fürchau (bfuerchau)

Thats right, but you can do try/catch around to change the exception.

By th way:
In SQL-Reader i get an exception if the destination array is shorter than the field count.
I think this is better also to implement.

If you are interested i create my own FbBulkCopy like SQL-Server has.
At the moment i can make 1500 Insert/Second, with 4 parallel connections up to 10.000, depending on fieldcount and sizes.
So when i use all available cpus i can fast load my datawarehouse. I can give you than the solution;-).

@firebird-automations
Copy link
Author

Commented by: Baldur Fürchau (bfuerchau)

Finally i made an additional enhencement:

	public override int GetValues\(object\[\] values\)
	\{
		CheckState\(\);
		CheckPosition\(\);

		var count = \_fields\.Count;
		if \(values\.Length < count\)
			throw new ArgumentException\("Argument too small to hold values"\);

		try
		\{
			for \(var i = 0; i < count; i\+\+\)
			\{
				values\[i\] = \_row\[i\]\.Value;
			\}
			return count;
		\}
		catch \(IscException ex\)
		\{
			throw new FbException\(ex\.Message, ex\);
		\}
	\}

	private void CheckIndex\(int i\)
	\{
		//if \(i < 0 \|\| i \>= FieldCount\)
		//	throw new IndexOutOfRangeException\("Could not find specified column in results\."\);
	\}

1) if the argument too small i send an exception like other readers does. It makes no sence, to get less values as available.
2) Catch for iscExcepption is done only once and not for each column.
3) I commented out the checkindex because the clr send the same exception.

With this changes i get now 15.500 Rows/Second on my machine.

Also if DataTable or DataAdapter fills DataTable-Objects, for every column is checked the index which is always correct. Additional the loadingmechanism gets the GetValues() with check index for each column and does call also IsDBNull for each column where the index checks again.

So this improve also the loading behavier.
It seams for each its "microoptimization", but the functions are called, in my test 100,000 Rows with 92 columns 18,400,000 times.

Try it;-).

@cincuranet
Copy link
Member

Your code is missing the code with ExpectedColumnTypes.

Also I don't think that skipping CheckIndex is a good idea. Yes, same exception will be thrown, but from different place and with less descriptive message.

@cincuranet
Copy link
Member

Quickly looking into npgsql I see that it's doing the same in case the array is smaller.

@cincuranet
Copy link
Member

I'm closing this now, because I believe the proposed changes are not equivalent. But, that doesn't mean I'm not open to suggestions for optimizations, etc., quite the opposite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants