Froid: Optimization of Imperative Programs in a Relational Database [pdf]

karthiksr · on Dec 24, 2018

I am a co-author of the Froid paper, and am around if people have any questions/comments/feedback.

Froid is now available as a feature of the SQL Server 2019 preview. The feature is called "Scalar UDF Inlining": https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018...

Available to try out for free here: https://www.microsoft.com/en-us/sql-server/sql-server-2019

RMarcus · on Dec 24, 2018

I've only read part of it, but it seems great so far! I always appreciate the clear, practical approach y'all at the JGL take.

I'm amazed that the implementation was under 1500 LOC! Was that the research prototype or the shipped preview?

Congratulations on the VLDB paper! Hopefully I'll come say "hi" in LA :)

karthiksr · on Dec 24, 2018

Thank you.

The shipped preview has only a bit more than 1500 LOC.

The VLDB paper was already presented in Rio in August this year, but I'll try to come over to LA anyway :)

maslam · on Dec 24, 2018

Karthik, I'm no Spark expert but almost all advice I read is to avoid UDFs if at all possible. Examples below:

- https://medium.com/teads-engineering/spark-performance-tunin...
- https://www.inovex.de/blog/efficient-udafs-with-pyspark/

karthiksr · on Dec 24, 2018

Thank you for those pointers.

There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas T-SQL functions can. Still, some of the techniques might be applicable. Definitely worth digging further!

RMarcus · on Dec 24, 2018

Doh! Guess I should've checked. I didn't make it to Rio this year... Figured I was gonna miss a bunch of good stuff.

maslam · on Dec 24, 2018

Thank you for the paper - it is well-written and succinct. Karthik, do you think this approach can be applied to Apache Spark as well (given its well-known slowness with UDFs)?

karthiksr · on Dec 24, 2018

Thank you. Conceptually, the ideas behind Froid follow from relational algebra, so they can be applied to other relational engines as well. However, the details still need to be figured out before making any concrete statement.

If you could share any pointers about UDFs and their performance problems in Spark, I would love to investigate more.
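
To make the relational-algebra point a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than SQL Server. The `orders` table and the `discounted` function are hypothetical, made up for illustration. It only shows the general idea of rewriting an imperative IF/ELSE body as a CASE expression so the engine sees one set-oriented query; it is not Froid's actual implementation.

```python
import sqlite3

# Hypothetical imperative scalar UDF: a flat 100 off orders above 1000.
def discounted(amount):
    if amount > 1000:
        return amount - 100
    else:
        return amount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 500), (2, 2000), (3, 1000)],
)

# Register the Python function so SQL can call it once per row.
conn.create_function("discounted", 1, discounted)

# Version 1: the query calls the opaque UDF for every row.
udf_rows = conn.execute(
    "SELECT id, discounted(amount) FROM orders ORDER BY id"
).fetchall()

# Version 2: the same logic written inline as a CASE expression,
# so the whole computation is a single relational query.
inlined_rows = conn.execute(
    """
    SELECT id,
           CASE WHEN amount > 1000 THEN amount - 100 ELSE amount END
    FROM orders
    ORDER BY id
    """
).fetchall()

# Both versions should return the same rows.
print(udf_rows == inlined_rows)
```

The second query gives the engine an ordinary expression it can reason about, instead of a function it has to call row by row.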

prince617 · on Dec 25, 2018

You might want to check out this related work: http://casper.uwplse.org

karthiksr · on Dec 25, 2018

Thank you. Casper is very interesting work, and I am aware of it. Program synthesis offers an alternative approach to such problems, with different trade-offs and characteristics.

The paper includes a brief discussion on synthesis-based techniques, and the reasoning behind Froid's design choices.

gigatexal · on Dec 24, 2018

Why does the first example return price as a char? Looking forward to reading the paper fully. I just scanned it.

karthiksr · on Dec 24, 2018

It returns a formatted string that includes the price and the currency code, e.g. "5000 USD".
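
As a rough illustration of why the return type is a string, here is a minimal Python sketch of a function that returns a formatted price. The name `total_price_label` and its parameters are hypothetical and are not taken from the paper.

```python
def total_price_label(price: int, currency: str) -> str:
    """Return the price followed by its currency code, as one string."""
    # The result is text rather than a number because it carries the currency code.
    return f"{price} {currency}"

label = total_price_label(5000, "USD")
print(label)  # prints "5000 USD"
```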