UDFs declared with large varchars take excessive time to execute [CORE5314] #2050
Comments
Modified by: @reevespaul Attachment: test_udf_large_varchars.zip [ 12992 ] |
Commented by: @reevespaul The test SP should produce output similar to the following if a null string is passed:
SQL> select aresult from tmp_sp(null);
ARESULT
Testing f_strpos_huge_un took 63.2850
Execution is slightly longer when an actual string is searched:
SQL> select aresult from tmp_sp('astring with some text in it');
ARESULT
Testing f_strpos_huge_un took 62.7500 |
Modified by: @reevespaul priority: Major [ 3 ] => Minor [ 4 ] |
Commented by: Claudio Valderrama C. (robocop) From what I remember, strings are copied. |
Commented by: Sean Leyne (seanleyne) A running/pre-compiled version of the UDF and/or a db script with the Test SP and showing how the UDF was defined (BY DESCRIPTOR?) would be appreciated. |
Commented by: Paulius Pazera (ppazera) glds_udflib.cpp -- UDF source code |
Modified by: Paulius Pazera (ppazera) Attachment: glds_udflib.dll [ 13022 ] Attachment: glds_udflib.cpp [ 13023 ] Attachment: udfTest.sql [ 13024 ] |
Commented by: Paulius Pazera (ppazera) I attached the source code and a win32 binary for UDF functions that do nothing: each accepts two string parameters (cstring, varchar, and varchar by descriptor) and returns an integer constant. I also attached a stored procedure that demonstrates the performance of UDF calls when passing two string parameters (cstring, varchar, and varchar by descriptor):
ACTION DURATION_IN_SECONDS CYCLES
Just for comparison, I included the built-in 'position' function, which also accepts two strings and returns an integer. |
Commented by: @asfernandes Optimized space for arguments data is 800 bytes. More than that requires an allocation on the heap. There is also a core function * (used not only for UDFs) that clears (why?) the whole varchar descriptor before moving something into it. * src/common/cvt.cpp:
|
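The 800-byte threshold described in the comment above is a small-buffer optimization: small argument blocks live in fixed inline storage, and anything larger falls back to a heap allocation. A minimal sketch of that pattern follows. The class `ArgBuffer` and its members are hypothetical names chosen for illustration and are not taken from the Firebird sources; only the 800-byte figure comes from the comment.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical illustration of a small-buffer optimization.
// This is NOT Firebird's actual class; only the 800-byte threshold
// comes from the thread.
class ArgBuffer {
public:
    static constexpr std::size_t kInlineBytes = 800;

    explicit ArgBuffer(std::size_t size) : size_(size) {
        if (size > kInlineBytes) {
            // Slow path: one heap allocation for every oversized buffer.
            heap_ = std::make_unique<unsigned char[]>(size);
        }
    }

    // Returns the heap block if one was needed, else the inline storage.
    unsigned char* data() { return heap_ ? heap_.get() : inline_; }
    std::size_t size() const { return size_; }
    bool usesHeap() const { return heap_ != nullptr; }

private:
    std::size_t size_;
    unsigned char inline_[kInlineBytes];
    std::unique_ptr<unsigned char[]> heap_;
};
```

Under this sketch, a UDF declared with two 32000-byte string parameters would take the heap path on every call, while one declared with short strings would not, which is at least consistent with the declaration-dependent timings reported here.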
Commented by: @dyemanov I could be missing something, but can't we reserve the function's input message size inside the impure area (instead of using some temporary buffer) and thus avoid its reallocation at runtime? |
Commented by: @asfernandes Yes, could be. |
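The suggestion above amounts to sizing the argument storage once, up front, and reusing it on every call instead of allocating a temporary buffer per call. A rough sketch of that idea, assuming the declared message size is known before the first call; `MessageArea` and `fill` are hypothetical names, and this does not model Firebird's real impure area:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch: storage is reserved once for the declared message
// size and reused for every call, so the per-call path never allocates.
class MessageArea {
public:
    explicit MessageArea(std::size_t messageSize) : storage_(messageSize) {}

    // Copies the call's argument bytes into the reserved storage.
    // Returns false if the data does not fit; never reallocates.
    bool fill(const void* src, std::size_t len) {
        if (len > storage_.size()) {
            return false;
        }
        std::memcpy(storage_.data(), src, len);
        return true;
    }

    const unsigned char* data() const { return storage_.data(); }
    std::size_t capacity() const { return storage_.size(); }

private:
    std::vector<unsigned char> storage_;  // allocated once, in the constructor
};
```

The trade-off in this sketch is memory held for the lifetime of the object in exchange for no allocation on the hot path.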
Modified by: @asfernandes assignee: Adriano dos Santos Fernandes [ asfernandes ] |
Commented by: @asfernandes The times I see in a 3.0 Linux debug build are nowhere near yours, but anyway I'm fixing the per-call allocation in v3 and master. Please test it. |
Modified by: @asfernandes status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 4.0 Alpha 1 [ 10731 ] Fix Version: 3.0.2 [ 10785 ] |
Commented by: Paulius Pazera (ppazera) Right, big string UDF parameter performance in fb3 is much better than in 2.5.6; it looks like the main issue was already fixed in fb3. The latest fix doesn't make a significant difference:
3.0.2.32619 (snapshot with a fix):
ACTION DURATION_IN_SECONDS CYCLES
But the question remains why the built-in 'position' function is still ~6..17 times faster than UDF calls doing nothing. I played a bit more, trying to measure malloc/memcpy/free/ib_util_malloc; here are the results for the 3.0.1 release:
ACTION DURATION_IN_SECONDS CYCLES
It sounds like malloc/memcpy/free might be done in all cases. If so, could it be avoided? Here is the code for the new tests:
extern "C" ISC_LONG EXPORT fn_2mallocmemcpyfree(ISC_LONG *size)
extern "C" char* EXPORT fn_ibutilmallocmemcpy(ISC_LONG *size)
declare external function f_2mallocmemcpyfree
declare external function f_ibutilmallocmemcpy_100
declare external function f_ibutilmallocmemcpy_32000
/*test UDF and declarations, expected 4, 5, 5*/
action='2 mallocmemcpyfree 100';
action='2 mallocmemcpyfree 32000';
action='2 ibutilmallocmemcpy 100/100';
action='2 ibutilmallocmemcpy 100/32000';
action='2 ibutilmallocmemcpy 32000/32000';
|
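The function bodies for the new tests are not shown in the thread. For readers who want to reproduce the idea, here is a guess at what a malloc/memcpy/free probe could look like. It is a sketch under stated assumptions and not the attached code: the name `two_malloc_memcpy_free`, the stand-in type `udf_long` (the real UDF uses `ISC_LONG` from `ibase.h`), and the reading of "2" as two copy rounds per call are all assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for ISC_LONG so the sketch compiles without ibase.h (assumption).
using udf_long = int;

// Hypothetical probe: performs two malloc + memcpy + free rounds of *size
// bytes each and returns the number of rounds completed.
extern "C" udf_long two_malloc_memcpy_free(const udf_long* size) {
    if (size == nullptr || *size <= 0) {
        return 0;
    }
    const std::size_t n = static_cast<std::size_t>(*size);

    // Zero-filled source block that each round copies from.
    void* src = std::calloc(n, 1);
    if (src == nullptr) {
        return 0;
    }

    udf_long rounds = 0;
    for (int i = 0; i < 2; ++i) {
        void* dst = std::malloc(n);
        if (dst == nullptr) {
            break;
        }
        std::memcpy(dst, src, n);
        std::free(dst);
        ++rounds;
    }

    std::free(src);
    return rounds;
}
```

An optimizing compiler may elide an allocation whose contents are never read, so timings from a probe like this are only meaningful if the generated code is checked or optimization is constrained.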
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: No test => Cannot be tested |
Modified by: @pavel-zotov status: Resolved [ 5 ] => Closed [ 6 ] |
Submitted by: @reevespaul
Attachments:
test_udf_large_varchars.zip
glds_udflib.dll
glds_udflib.cpp
udfTest.sql
UDFs declared with large varchars take excessive time to execute
An IBPhoenix client reported the following problem to us.
A single UDF, when declared with a 32K string length, takes 20 times longer to execute than the same UDF declared with a 16K string length.
Likewise, declaring the UDF with a string length of 1K halves the time again.
Here is a quick summary of the times:
f_strpos_huge - 60s
f_strpos_middling - 3s
f_strpos_1024 - 1.5s
In all cases the same actual function was executed. The only difference was in the declaration.
I have a test case which demonstrates the problem.
Commits: 6c2e26c 13fd2f7