There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas T-SQL functions can.
But still, some techniques might be applicable. Definitely worth digging further!
Thank you for the paper - it is well-written and succinct. Karthik, do you think this approach can be applied to Apache Spark as well (given its well-known slowness with UDFs)?
Thank you. Conceptually, the ideas behind Froid follow from relational algebra, so they can be applied to other relational engines as well. However, the details still need to be figured out before making any concrete statement.
If you could share any pointers about UDFs and their performance problems in Spark, I would love to investigate more.
Thank you. Casper is very interesting work, and I am aware of it. Program synthesis offers an alternative approach to such problems, with different trade-offs and characteristics.
The paper includes a brief discussion on synthesis-based techniques, and the reasoning behind Froid's design choices.
Froid is now available as a feature of SQL Server 2019 preview. The feature is called "Scalar UDF Inlining": https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018...
Available to try out for free here: https://www.microsoft.com/en-us/sql-server/sql-server-2019
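For readers unfamiliar with the term, here is a rough illustration of what "inlining" a scalar UDF means. It is only a sketch and not how SQL Server implements the feature: it uses Python's built-in sqlite3 module rather than T-SQL, and the `orders` table, the `discount_price` function and the `run_both` helper are all made up for the example. The first query calls an imperative UDF once per row; the second expresses the same logic as a plain CASE expression inside the query, which is the kind of form an optimizer can reason about.

```python
import sqlite3


def discount_price(price, is_member):
    # Hypothetical imperative scalar UDF: the engine calls it once per row
    # and cannot see inside it.
    if is_member:
        return price * 0.5
    return price


def run_both():
    # Hypothetical table and data, created in memory for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, price REAL, is_member INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 100.0, 1), (2, 40.0, 0), (3, 250.0, 1)],
    )

    # Version 1: the query invokes the opaque UDF for every row.
    conn.create_function("discount_price", 2, discount_price)
    with_udf = conn.execute(
        "SELECT id, discount_price(price, is_member) FROM orders ORDER BY id"
    ).fetchall()

    # Version 2: the same logic written by hand as a relational expression,
    # i.e. the UDF body "inlined" into the calling query.
    inlined = conn.execute(
        "SELECT id, CASE WHEN is_member THEN price * 0.5 ELSE price END "
        "FROM orders ORDER BY id"
    ).fetchall()

    conn.close()
    return with_udf, inlined


if __name__ == "__main__":
    with_udf, inlined = run_both()
    print(with_udf == inlined)
```

Both queries return the same rows. In this sketch the rewrite is done by hand; the point of a technique like Froid is that the engine performs the equivalent transformation automatically.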