Introduction
I must admit that when I first saw a demo of SELECT AI, I thought it was somewhat of a novelty. However, I changed my mind after completing a proof of concept for a real-world business use case.
This post will introduce SELECT AI, describe how to set it up and demonstrate how SELECT AI performed for real-world business POC.
What is SELECT AI
SELECT AI is a feature of the Oracle Autonomous Database that allows you to converse with data in your database with some help from Generative AI.
💡
SELECT AI allows you to turn a question like "Which customers have the most credit memos?" into a query on your sales database without the end-user needing to know the underlying data model or SQL.
You can use IDE tools like SQL Developer, SQL Developer Web, or SQL Developer for VS Code to enter SQL statements like this:
SELECT AI summarize the sales by region for period JAN-20 showing revenue amount and quantity invoiced;
After running the SQL statement, SELECT AI sends your question and metadata about your schema (more on that later) to your LLM provider, which generates an SQL statement. SELECT AI then runs the returned SQL statement against your tables. e.g.
💡
The appeal of SELECT AI is its ability to operate on your data, and depending on the mode you run it in, you can control if your data is sent to the LLM.
SELECT AI Modes
SELECT AI can be run in the following modes:
runsql
: Generate the SQL statement for a prompt and run the SQL query to return the resulting records.
narrate
: Sends the result of the LLM-generated SQL query run by SELECT AI and sends it back to the LLM to generate a natural language description of that result.
chat
: This essentially allows you to use SELECT AI like Chat GPT. The prompt is sent directly to the LLM, and the response is returned.
showsql
: Displays the SQL statement for a prompt. This allows you to check it's work!
explainsql
: Generates SQL from the prompt and sends the generated SQL to the AI provider, which then produces a natural language explanation of the SQL. This is an excellent way for people to learn SQL with actual data.
If you do not identify a 'mode', then SELECT AI will operate in runsql
mode.
Security
The narrate
mode sends the returned data results to the LLM for additional processing. You should not use this mode if you have concerns about sending your data to a LLM.
It should also be noted that you cannot run PL/SQL statements, DDL statements, or DML statements using the AI
keyword.
General Setup
If you have used other Autonomous PL/SQL APIs like DBMS_VECTOR
or DBMS_CLOUD
, then the setup should look pretty familiar. These steps need to be in place before we perform SELECT AI-specific setups.
Create Network ACL for LLM
I use Open AI in my examples, and my DB schema is WKSP_DEMO.
BEGIN DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE( host => 'api.openai.com', ace => xs$ace_type(privilege_list => xs$name_list('http'), principal_name => 'WKSP_DEMO', principal_type => xs_acl.ptype_db) ); END; /
Grant to Execute on DBMS_CLOUD_AI
I need to grant execute on DBMS_CLOUD_AI
to my DB user WKSP_DEMO
:
grant execute on DBMS_CLOUD_AI to WKSP_DEMO;
Create a Credential for LLM Provider
If you do not already have one, create a credential for your LLM provider. The example below is for Open AI.
BEGIN DBMS_CLOUD.CREATE_CREDENTIAL (credential_name => 'OPENAI_CRED', params => JSON('{"provider": "openai", "access_token": "YOUROPENAICREDENTIAL", "credential_name": "OPENAI_CRED", "url": "https://api.openai.com/v1/chat/completions", "model": "gpt-4o-mini", "temperature": 1.0}')); END;
Business Use Case
A manufacturing company already has a de-normalized sales table for Business Intelligence reporting. The table is updated every evening with the latest sales data. The company wants to provide its executives with a natural language interface to inquire about sales data.
Table Definition
The table is a de-normalized representation of sales summarized by region, customer, and period.
CREATE TABLE SALES_REPORT (OPERATION_NAME VARCHAR2(150), REGION_NAME VARCHAR2(150), SALES_REGION VARCHAR2(150), PLANT_NAME VARCHAR2(50), INVENTORY_ORG_CODE VARCHAR2(3), PERIOD_YEAR NUMBER, QUARTER_NUM NUMBER, PERIOD_NAME VARCHAR2(15), PERIOD_NUM NUMBER, CUSTOMER_NAME VARCHAR2(360), ACCOUNT_NUMBER VARCHAR2(30), ITEM_CODE VARCHAR2(40), ITEM_DESCRIPTION VARCHAR2(240), LINE_UOM_CODE VARCHAR2(3), CURRENCY VARCHAR2(15), CITY VARCHAR2(60), STATE VARCHAR2(60), COUNTRY VARCHAR2(60), TRANSACTION_TYPE VARCHAR2(20), REVENUE_AMOUNT NUMBER, REVENUE_AMOUNT_USD NUMBER, QUANTITY_INVOICED NUMBER, QUANTITY_CREDITED NUMBER, COUNT_LINES NUMBER);
Here are a few of the comments I used for the POC. Consider comments as additional prompts sent to the LLM to help it write more accurate SQL.
COMMENT ON TABLE SALES_REPORT is 'This table stores sales data for a large manufacturing company. In the context of this tabke Sales and Revenue are interchangable.'; COMMENT ON COLUMN SALES_REPORT.OPERATION_NAME IS 'The name of the business operation within the company.'; COMMENT ON COLUMN SALES_REPORT.REGION_NAME IS 'The name of the region where the sale took place.'; COMMENT ON COLUMN SALES_REPORT.SALES_REGION IS 'The name of the sales region where sales reps get credit.'; COMMENT ON COLUMN SALES_REPORT.PLANT_NAME IS 'The name of the plant that produced the material for the sales.'; COMMENT ON COLUMN SALES_REPORT.INVENTORY_ORG_CODE IS 'Alternative code representing the plant.'; COMMENT ON COLUMN SALES_REPORT.TRANSACTION_TYPE IS 'Indicates if this is an Invoice INV or Credit Memo CM'; COMMENT ON COLUMN SALES_REPORT.REVENUE_AMOUNT IS 'The amount of revenue from the sales in the CURRENCY column currency code.'; COMMENT ON COLUMN SALES_REPORT.REVENUE_AMOUNT_USD IS 'The amount of revenue from the sales in the USD currency code.'; COMMENT ON COLUMN SALES_REPORT.QUANTITY_INVOICED IS 'The quantity invoiced in the unit of measure from column LINE_UOM_CODE.'; COMMENT ON COLUMN SALES_REPORT.QUANTITY_CREDITED IS 'The quantity credited if column TRANSACTION_TYPE = CM or Credit Memo.';
Comments are critical if the column's name or the content is unclear. For example, the TRANSACTION_TYPE
column above contains the text CM
to indicate a Credit Memo and INV
to indicate an invoice. Adding a comment to tell the LLM what these values mean will significantly improve the result.
Create a SELECT AI Profile
This step is specific to SELECT AI. This is where you can specify which schemas and tables SELECT AI will consider when it sends Meta Data to the LLM to generate SQL statements. Profiles also allow you to limit the scope of what SELECT AI is allowed to do.
BEGIN DBMS_CLOUD_AI.create_profile( profile_name => 'SALES_OPENAI', attributes => '{"provider": "openai", "comments": "true", "model": "gpt-4o", "temperature": 0.2, "max_tokens": 4096, "conversation": "true", "credential_name": "OPENAI_CRED", "object_list": [{"owner": "WKSP_DEMO", "name": "SALES_REPORT"}]}'); END;
Some of the key options are:
comments
- This option determines if SELECT AI will send table comments to the LLM to help it build the SQL.
conversation
- Indicates if conversation history is enabled for a profile.
credential_name
- This is the credential (see above) used to authenticate against your LLM providers.
object_list
- Array of JSON objects specifying the owner and object names eligible for natural language translation to SQL. To include all objects of a given schema, omit the name
and only specify the owner
field in the JSON object.
Here is a link to all of the possible attributes.
Let's Try it Out
Now that the setups are complete, we can run SELECT AI.
From SQL Developer for VS Code
One key advantage of running SELECT AI from a tool like VS Code is that it can maintain the context of a conversation.
You must select a profile before you can start asking questions:
EXEC DBMS_CLOUD_AI.SET_PROFILE (profile_name => 'SALES_OPENAI');
RUNSQL Example
SELECT AI RUNSQL summarize sales by plant, include quantity and revenue in USD;
Because conversation
is turned on in the profile, we can add to the previous question and retain the context.
SELECT AI RUNSQL include the region in the result;
Hopefully, you are starting to see how powerful SELECT AI is. We are essentially having a conversation with our data!
NARRATE Example
The narrate mode goes further and sends the SQL resultset back to the LLM to perform additional smarts. This results in a natural language response that is easy for executives to consume.
SELECT AI NARRATE show me which plants have the most credit memos;
The result comes back in Markdown format:
The plant with the most credit memos is Austin, with 68 credit memos. Other plants with a high number of credit memos include: - San Antonio: 29 credit memos - Washinton: 27 credit memos - Arizona: 27 credit memos - Minneapolis: 24 credit memos - Kansas City: 23 credit memos - San Diego: 22 credit memos Several other plants have fewer credit memos, ranging from 1 to 16.
I think you will agree that the results are pretty impressive.
SHOWSQL Example
SHOWSQL shows the SQL generated by the LLM in the first step:
SELECT AI SHOWSQL show me which plants have the most credit memos; SELECT "R"."PLANT_NAME" AS "Plant_Name", COUNT("R"."TRANSACTION_TYPE") AS "Credit_Memo_Count" FROM "WKSP_DEMO"."SALES_REPORT" "R" WHERE "R"."TRANSACTION_TYPE" = 'CM' GROUP BY "R"."PLANT_NAME" ORDER BY "Credit_Memo_Count" DESC
EXPLAIN SQL Example
Explain SQL provides a natural language explanation of the generated SQL:
SELECT AI EXPLAINSQL show me which plants have the most credit memos; SELECT "R"."PLANT_NAME" AS "Plant_Name", COUNT("R"."TRANSACTION_TYPE") AS "Credit_Memo_Count" FROM "WKSP_DEMO"."SALES_REPORT" "R" WHERE "R"."TRANSACTION_TYPE" = 'CM' GROUP BY "R"."PLANT_NAME" ORDER BY "Credit_Memo_Count" DESC
Result Part2:
Explanation:
SELECT Clause:
"R"."PLANT_NAME" AS "Plant_Name"
: This selects the plant name from the table and aliases it as "Plant_Name" for readability.
COUNT("R"."TRANSACTION_TYPE") AS "Credit_Memo_Count"
: This counts the number of occurrences where the transaction type is a credit memo ('CM') and aliases it as "Credit_Memo_Count".
FROM Clause:
"WKSP_DEMO"."SALES_REPORT" "R"
: This specifies the table "SALES_REPORT"
from the schema "WKSP_DEMO"
and assigns it an alias "R"
for easier reference in the query.
WHERE Clause:
"R"."TRANSACTION_TYPE" = 'CM'
: This filters the records to include only those where the transaction type is a credit memo.
GROUP BY Clause:
"R"."PLANT_NAME"
: This groups the results by plant name, so the count of credit memos is calculated for each plant.
ORDER BY Clause:
"Credit_Memo_Count" DESC
: This orders the results in descending order based on the count of credit memos, so the plant with the most credit memos appears first.
💡
This is a pretty nice way to learn SQL.
CHAT Example
Finally, we have chat mode. This acts like ChatGPT, and you can ask general questions about the LLM.
SELECT AI CHAT which regions of the us are most suitable for heavy manufactiring locations;
The above SELECT AI prompt will result in a general response from the LLM about the manufacturing location as if you had typed it into ChatGPT.
What About APEX?
If the goal is to allow executives to converse with Sales Data, then APEX is the obvious choice for providing an easy-to-use user interface. To use SELECT AI in a stateless environment like APEX, we must use the PLSQL API DBMS_CLOUD_AI.GENERATE. This API allows you to perform all of the same actions as SELECT AI, except for runsql
.
A call to DBMS_CLOUD_AI.GENERATE from an APEX Page Process could look like this:
BEGIN :P100_RESPONSE := dbms_cloud_ai.generate (prompt => :P100_PROMPT, profile_name => 'SALES_OPENAI', action => :P100_SELECTAI_ACTION); END;
P100_RESPONSE
is the APEX Page Item in which the response will be stored. If you make this a Display Only field with Format = Markdown, the response will be appropriately formatted with headings, bullet points, etc.
P100_PROMPT
is the page item in which the user would enter their prompt.
P100_SELECTAI_ACTION
could be a radio group with the SELECT AI actions, or if you want to simplify for an executive, hard code the action
parameter to 'NARRATE'.
The user interface would look something like this:
What About Tabular Results?
Generating tabular results is trickier because you cannot use the runsql
mode. Instead, you must take the generated SQL and do something like this:
Use DBMS_SQL.PARSE to make sure the SQL is valid and create a cursor.
Use dbms_sql.describe_columns to get the columns for the cursor.
Massage the results into an APEX Collection.
Display the results to the user as a table.
The APEX Team developed a Demo app called CHATDB, which you can download from this Blog Post. It includes an example of the above.
But, Is It Accurate?
AI is all well and good, but we know there is a risk of hallucinations. In other words, sometimes LLMs take liberties with the truth. The good news is that you can at least check the SQL that SELECT AI generates.
Log and Audit the Results
If you have an APEX front end, you can run each request twice: once to capture and log the generated SQL and once to perform the Narrate action. I advise you to log the question, the SQL, parameters (such as model, temperature, etc), and the response. This way, you can perform audits to verify the information that SELECT AI is accurate and fine-tune table and column comments to improve results.
Use the Latest Model
The results will only be as good as the LLM model generating the SQL. Because of this, I would advise using the most advanced LLM model available.
Conclusion
Hopefully, this post provided an good introduction to setting up and getting SELECT AI to work. More importantly, I hope it provided some inspiration to show you how you could benefit from this very powerful feature of the Autonomous Database.
评论
发表评论