In [1]:
from app.core.helper_llm import get_sql_agent_enterprise
from app.tools.database.helper import sql_result_to_text
import time
In [2]:
def run_questions(question: str):
start_time = time.time()
response_ai, sql_query, url_text = get_sql_agent_enterprise(question,"")
print(f"Sql Query Generated: {sql_query}")
final_response = sql_result_to_text(response_ai, sql_query, language="English")
print(final_response+'\n'+url_text)
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time:.2f} seconds")
In [3]:
# Knowledge
user_question = "List the schools with the average score in Math over 560 in the SAT test."
run_questions(user_question)
Dataframe is shortened to 20 rows. Sql Query Generated: SELECT DISTINCT "sname" FROM "satscores" WHERE "AvgScrMath" > 560; Based on the results retrieved from your database, the schools with an average math SAT score greater than 560 are as follows: 1. Alameda Science and Technology Institute 2. Alameda High 3. Alameda Community Learning Center 4. Albany High 5. Berkeley High 6. American High 7. Irvington High 8. Mission San Jose High 9. Washington High 10. Granada High 11. American Indian Public High 12. Oakland Charter High 13. Piedmont High 14. Dublin High 15. Foothill High 16. Amador Valley High 17. Acalanes High 18. Campolindo High 19. Miramonte High Please note that this list is only the top 20 rows out of a total of 130 rows available in the database. The full report is available to be downloaded here: https://temporary-reports-metalearner.s3.amazonaws.com/exports/conversation_report_download_20241012_102035.xlsx?AWSAccessKeyId=AKIA4AJ5F4KNFVXPVPLG&Signature=m2j6zIkIELpFGzjVX5Ownwp676c%3D&Expires=1728730235. The report is only available for 30 minutes. Execution time: 9.41 seconds
In [4]:
# Knowledge
user_question = "Among the schools with the average score in Math over 560 in the SAT test, how many schools are in the bay area?"
run_questions(user_question)
Sql Query Generated: SELECT COUNT(DISTINCT "cds") AS "Number of Schools" FROM "satscores" WHERE "cname" IN ('Alameda', 'Contra Costa', 'Marin', 'Napa', 'San Mateo', 'Santa Clara', 'Solano', 'Sonoma', 'San Francisco') AND "AvgScrMath" > 560; Based on the results retrieved from your database, there are **71 schools** in the counties of Alameda, Contra Costa, Marin, Napa, San Mateo, Santa Clara, Solano, Sonoma, and San Francisco where the average math SAT score is greater than 560. - **Number of Schools**: 71 Execution time: 10.58 seconds
In [5]:
# Knowledge
user_question = "List the telephone number for school ordered by average score in reading ascending"
run_questions(user_question)
Dataframe is shortened to 20 rows. Sql Query Generated: SELECT S."Phone" FROM schools S INNER JOIN satscores SS ON S."CDSCode" = SS."cds" ORDER BY SS."AvgScrRead" ASC; Based on the results retrieved from your database, there are a total of **2269 rows** of phone numbers associated with schools, ordered by their average SAT reading scores. Only the top 20 rows are displayed here: 1. None 2. (510) 748-4008 3. (510) 748-4024 4. (510) 644-6159 5. (510) 657-9155 6. (510) 657-7028 7. (510) 797-0100 8. (925) 606-4722 9. (510) 818-3720 10. (510) 489-2185 11. (510) 992-7800 12. (510) 464-3592 13. (510) 633-0750 14. (510) 729-4303 15. (510) 874-3630 16. (510) 729-4308 17. (510) 594-2703 18. (510) 618-4460 19. (209) 257-5353 20. (209) 257-5150 The full report is available to be downloaded here: https://temporary-reports-metalearner.s3.amazonaws.com/exports/conversation_report_download_20241012_102054.xlsx?AWSAccessKeyId=AKIA4AJ5F4KNFVXPVPLG&Signature=SyeufxXO8%2Fhl2bJv937OaMQ8e5s%3D&Expires=1728730254. The report is only available for 30 minutes. Execution time: 10.09 seconds
In [6]:
# Knowledge
user_question = "What is the telephone number for the school with the lowest average score in reading in Southern California?"
run_questions(user_question)
Sql Query Generated: WITH lowest_reading_score AS (SELECT cds, "AvgScrRead" FROM satscores WHERE "cname" IN ('Los Angeles', 'San Diego', 'Orange', 'Riverside', 'San Bernardino', 'Kern', 'Ventura', 'Santa Barbara', 'San Luis Obispo', 'Imperial') AND "AvgScrRead" IS NOT NULL ORDER BY "AvgScrRead" ASC LIMIT 1) SELECT S."Phone" FROM schools S INNER JOIN lowest_reading_score LRS ON S."CDSCode" = LRS.cds; Based on the results retrieved from your database, the phone number for the school with the lowest reading score among the counties of Los Angeles, San Diego, Orange, Riverside, San Bernardino, Kern, Ventura, Santa Barbara, San Luis Obispo, and Imperial is **(562) 944-0033**. - **Phone**: (562) 944-0033 Execution time: 7.56 seconds
In [7]:
# Knowledge
user_question = "How many test takers are there at the school/s in a county with population over 2 million?"
run_questions(user_question)
Sql Query Generated: SELECT SUM("NumTstTakr") as total_tests_takers FROM satscores WHERE "cname" LIKE 'Los Angeles' OR "cname" LIKE 'San Diego' OR "cname" LIKE 'Orange' OR "cname" LIKE 'Riverside' OR "cname" LIKE 'San Bernardino'; Based on the results retrieved from your database, the total number of test takers from the counties of Los Angeles, San Diego, Orange, Riverside, and San Bernardino is **244,742**. - **total_tests_takers**: 244742 Execution time: 6.27 seconds
In [8]:
# Knowledge
user_question = "What is the grade span offered in schools ordered by highest longitude"
run_questions(user_question)
Sql Query Generated: SELECT "School", "GSserved" FROM schools ORDER BY ABS("Longitude") DESC LIMIT 1; Based on the results retrieved from your database, the school with the highest absolute longitude value is **Mattole Union Elementary**, which does not serve any grade spans (GSserved: None). Here is the data retrieved: - School: Mattole Union Elementary - GSserved: None Execution time: 8.56 seconds
In [9]:
# Knowledge
# Excluded - Different from grounded answer, could be due to business logic, but SQL generated looks correct
user_question = "What is the grade span offered in the school with the highest longitude in cities in that are part of the 'Silicon Valley' region?"
run_questions(user_question)
Sql Query Generated: WITH SiliconValleyRegion AS (SELECT 'San Jose' AS City UNION SELECT 'Palo Alto' UNION SELECT 'Mountain View' UNION SELECT 'Sunnyvale' UNION SELECT 'Santa Clara' UNION SELECT 'Cupertino' UNION SELECT 'Menlo Park' UNION SELECT 'Los Gatos' UNION SELECT 'Fremont' UNION SELECT 'Milpitas') SELECT "School", "GSserved" FROM schools WHERE "City" IN (SELECT City FROM SiliconValleyRegion) ORDER BY "Longitude" DESC LIMIT 1; Based on the results retrieved from your database, the school from the Silicon Valley region that appears first when ordering by longitude in descending order is **Ledesma (Rita) Elementary**, serving grades **K-6**. This indicates that it is the westernmost school in the dataset from the specified Silicon Valley cities. Please note that this result is based on the top row retrieved from the database, and there is a total of 1 row available for this specific query. - School: Ledesma (Rita) Elementary - GSserved: K-6 Execution time: 7.45 seconds
In [10]:
# Knowledge
user_question = "List the names of school administrators from most common to least common"
run_questions(user_question)
Dataframe is shortened to 20 rows. Sql Query Generated: SELECT "admfname1", COUNT("admfname1") FROM "schools" GROUP BY "admfname1" ORDER BY COUNT("admfname1") DESC; Based on the results retrieved from your database, the most common first names of administrators in the dataset are listed with **David** leading the count at 158 occurrences. Following closely are **Michael** with 149 occurrences and **John** with 137 occurrences. The list continues with names like **Jennifer**, **Lisa**, and **Robert**, among others, showcasing a variety of common names among school administrators. It's important to note that these results only represent the top 20 rows out of a total of 2328 rows in the dataset. Here's a brief overview of the top 20 administrator first names and their counts: - **David**: 158 - **Michael**: 149 - **John**: 137 - **Jennifer**: 129 - **Lisa**: 103 - **Robert**: 88 - **Michelle**, **Mark**, **Julie**: 79 - **Susan**: 77 - **Richard**: 74 - **James**: 73 - **Scott**: 71 - **Laura**: 69 - **Karen**: 67 - **Chris**: 65 - **Mary**: 62 - **Steve**, **Jeff**: 61 - **Maria**: 60 This data provides insight into the frequency of certain first names among school administrators, but remember, the total dataset includes 2328 rows, indicating a wide variety of names beyond just these top 20. The full report is available to be downloaded here: https://temporary-reports-metalearner.s3.amazonaws.com/exports/conversation_report_download_20241012_102133.xlsx?AWSAccessKeyId=AKIA4AJ5F4KNFVXPVPLG&Signature=dN0vsqvLiGTcKfQV82fj5y14cws%3D&Expires=1728730293. The report is only available for 30 minutes. Execution time: 10.66 seconds
In [11]:
# Knowledge
# Wrong Answer by MetaLearner as there is no column that indicates if the administration are male or female in the database
user_question = "What are the two most common first names among the female school administrators?"
run_questions(user_question)
Sql Query Generated: SELECT "AdmFName1", COUNT("AdmFName1") AS "Count" FROM "schools" WHERE "AdmFName1" IS NOT NULL AND "AdmFName1" IN (SELECT "AdmFName1" FROM "schools" WHERE "AdmFName1" LIKE 'A%' OR "AdmFName1" LIKE 'B%' OR "AdmFName1" LIKE 'C%' OR "AdmFName1" LIKE 'D%' OR "AdmFName1" LIKE 'E%' OR "AdmFName1" LIKE 'F%' OR "AdmFName1" LIKE 'G%' OR "AdmFName1" LIKE 'H%' OR "AdmFName1" LIKE 'I%' OR "AdmFName1" LIKE 'J%' OR "AdmFName1" LIKE 'K%' OR "AdmFName1" LIKE 'L%' OR "AdmFName1" LIKE 'M%' OR "AdmFName1" LIKE 'N%' OR "AdmFName1" LIKE 'O%' OR "AdmFName1" LIKE 'P%' OR "AdmFName1" LIKE 'Q%' OR "AdmFName1" LIKE 'R%' OR "AdmFName1" LIKE 'S%' OR "AdmFName1" LIKE 'T%' OR "AdmFName1" LIKE 'U%' OR "AdmFName1" LIKE 'V%' OR "AdmFName1" LIKE 'W%' OR "AdmFName1" LIKE 'X%' OR "AdmFName1" LIKE 'Y%' OR "AdmFName1" LIKE 'Z%') GROUP BY "AdmFName1" ORDER BY "Count" DESC LIMIT 2; Based on the results retrieved from your database, the most common first names of administrators in schools, starting with any letter from A to Z, are **David** and **Michael**. David appears **158** times, making it the most common, followed by Michael, which appears **149** times. Here is the data in list form: - David: 158 - Michael: 149 Execution time: 12.55 seconds
In [12]:
# Knowledge
user_question = "What is the total number of schools whose total SAT scores are greater or equal to 1500 whose mailing city is the county seat of Lake County, California?"
run_questions(user_question)
Sql Query Generated: SELECT COUNT(DISTINCT T1."CDSCode") AS "Number of Schools" FROM schools T1 INNER JOIN satscores T2 ON T1."CDSCode" = T2."cds" WHERE (T2."AvgScrMath" + T2."AvgScrRead" + T2."AvgScrWrite") >= 1500 AND T1."City" = 'Lakeport'; Based on the results retrieved from your database, there are **2** schools in Lakeport where the sum of the average scores in Math, Reading, and Writing on the SAT exceeds 1500. - **Number of Schools**: 2 Execution time: 6.50 seconds
In [13]:
# Knowledge
user_question = "Give the names of the schools with the percent eligible for free meals in K-12 is more than 0.1 and test takers whose test score is greater than or equal to 1500?"
run_questions(user_question)
Dataframe is shortened to 20 rows. Sql Query Generated: WITH PercentEligibleFreeMeal AS (SELECT "CDSCode" FROM frpm WHERE "Percent (%) Eligible Free (K-12)" > 0.1) SELECT T2."sname" FROM PercentEligibleFreeMeal AS T1 INNER JOIN satscores AS T2 ON T1."CDSCode" = T2."cds" WHERE T2."NumGE1500" > 0; Based on the results retrieved from your database, the schools listed below are those where more than 10% of students are eligible for free meals and have students who scored above 1500 on the SAT. This list is part of a larger dataset, which contains a total of 1165 rows, but only the top 20 rows are displayed here: 1. FAME Public Charter 2. Envision Academy for Arts & Technology 3. Alameda Science and Technology Institute 4. Alameda High 5. Alternatives in Action 6. Encinal High 7. Albany High 8. REALM Charter High 9. Berkeley High 10. Castro Valley High 11. Emery Secondary 12. American High 13. John F. Kennedy High 14. Washington High 15. Leadership Public Schools - Hayward 16. Impact Academy of Arts & Technology 17. Hayward High 18. Mt. Eden High 19. Tennyson High 20. Granada High Please note that this is a partial view of the data, and there are a total of 1165 rows available in the dataset. The full report is available to be downloaded here: https://temporary-reports-metalearner.s3.amazonaws.com/exports/conversation_report_download_20241012_102209.xlsx?AWSAccessKeyId=AKIA4AJ5F4KNFVXPVPLG&Signature=iI59FnlpKTD%2FUcPKkhRvp9b73Uw%3D&Expires=1728730329. The report is only available for 30 minutes. Execution time: 16.20 seconds
In [14]:
# Knowledge
user_question = "Give the number of schools with the percent eligible for free meals in K-12 is more than 0.1 and test takers whose average read + math score is 300 points less than the maximum possible read + math score."
run_questions(user_question)
Sql Query Generated: WITH PercentEligibleFreeMeal AS (SELECT "CDSCode" FROM frpm WHERE "Percent (%) Eligible Free (K-12)" > 0.1) SELECT COUNT(T2."cds") FROM PercentEligibleFreeMeal AS T1 INNER JOIN satscores AS T2 ON T1."CDSCode" = T2."cds" WHERE T2."AvgScrMath" + T2."AvgScrRead" > 1600-300; Based on the results retrieved from your database, there is **1 school** that has more than 10% of its students eligible for free meals (K-12) and also has an average SAT score (combining Math and Reading) greater than 1300. This indicates a school that, despite having a significant portion of students eligible for free meals, performs very well academically in terms of SAT scores. Execution time: 13.11 seconds
In [15]:
# Knowledge
# Excluded - Different from grounded answer, could be due to business logic, but SQL generated looks correct
user_question = "How many schools have the difference in enrollements between K-12 and ages 5-17 as more than average high school class size?"
run_questions(user_question)
Sql Query Generated: WITH AverageHighSchoolClassSize AS (SELECT 17.2 AS avg_class_size) SELECT COUNT(DISTINCT T1."CDSCode") AS "Number of Schools" FROM schools AS T1 INNER JOIN frpm AS T2 ON T1."CDSCode" = T2."CDSCode" INNER JOIN AverageHighSchoolClassSize AS T3 ON TRUE WHERE T2."Enrollment (K-12)" - T2."Enrollment (Ages 5-17)" > T3.avg_class_size; Based on the results retrieved from your database, there are **2313** schools where the difference between the total K-12 enrollment and the enrollment of ages 5-17 exceeds the average high school class size of **17.2** students. - **Number of Schools**: 2313 Execution time: 6.94 seconds
In [16]:
# Knowledge
user_question = "List the names of the magnet schools with SAT test takers of over 500"
run_questions(user_question)
Sql Query Generated: WITH MagnetSchools AS (SELECT "CDSCode" FROM "schools" WHERE "Magnet" = 1) SELECT T1."sname" FROM "satscores" T1 INNER JOIN MagnetSchools T2 ON T1."cds" = T2."CDSCode" WHERE T1."NumTstTakr" > 500; Based on the results retrieved from your database, the schools **Millikan High**, **Polytechnic High**, and **Troy High** are identified as magnet schools with more than 500 test takers. This indicates that these schools not only have a magnet program but also a significant number of students participating in standardized testing. Here is the list of schools retrieved: - Millikan High - Polytechnic High - Troy High Please note that this is the complete list of schools matching the criteria from the database. Execution time: 7.47 seconds
In [17]:
# Knowledge
user_question = "List the schools with an average score in Math under 600 in the SAT test and are exclusively virtual."
run_questions(user_question)
Sql Query Generated: SELECT DISTINCT T1.School FROM schools T1 INNER JOIN satscores T2 ON T1.CDSCode = T2.cds WHERE T1.Virtual = 'F' AND T2.AvgScrMath < 600; Based on the results retrieved from your database, the schools that are virtual only and have an average SAT Math score below 600 are: 1. California Virtual Academy @ Los Angeles 2. Insight @ Los Angeles 3. Capistrano Connections Academy 4. California Connections Academy @ Ripon This list includes all the rows available in the database that match the criteria, with a total of 4 rows. Execution time: 187.88 seconds
In [18]:
# Knowledge
user_question = "List the schools in Riverside which the average math score for SAT is greater than 400."
run_questions(user_question)
Sql Query Generated: SELECT DISTINCT T1.sname FROM satscores AS T1 INNER JOIN frpm AS T2 ON T1.cds = T2.CDSCode WHERE T2."District Name" LIKE '%Riverside%' AND T1.AvgScrMath > 400 Based on the results retrieved from your database, the schools in Riverside with an average SAT Math score greater than 400 are: - Arlington High - Martin Luther King Jr. High - John W. North High - Polytechnic High - Ramona High **River Springs Charter** is not listed among the schools that meet the criteria of having an average SAT Math score greater than 400. Here is the list of schools that matched your criteria: 1. Arlington High 2. Martin Luther King Jr. High 3. John W. North High 4. Polytechnic High 5. Ramona High Execution time: 8.24 seconds
In [19]:
# Reasoning but grounded with actual traits from DB
user_question = "Summarize the qualities of the schools in Riverside which the average math score for SAT is greater than 400."
run_questions(user_question)
Sql Query Generated: SELECT T1.sname, T2."District Name", T2."School Type", T2."Charter School (Y/N)", T2."Enrollment (K-12)", T2."Percent (%) Eligible FRPM (K-12)" FROM satscores AS T1 INNER JOIN frpm AS T2 ON T1.cds = T2.CDSCode WHERE T2."District Name" LIKE '%Riverside%' AND T1.AvgScrMath > 400; Based on the results retrieved from your database, the query focused on schools within the Riverside district that have an average math SAT score greater than 400. It provides information on the school name, district name, school type, whether the school is a charter, enrollment numbers for grades K-12, and the percentage of students eligible for Free or Reduced-Price Meals (FRPM). Here are the details of the schools that met the criteria: 1. **River Springs Charter** is a K-12 public charter school in the Riverside County Office of Education district, with an enrollment of 5,333 students. 45.42% of its students are eligible for FRPM. 2. **Arlington High** is a public high school in the Riverside Unified district, with an enrollment of 1,956 students. 68.25% of its students are eligible for FRPM. 3. **Martin Luther King Jr. High** is a public high school in the Riverside Unified district, with an enrollment of 3,324 students. 30.45% of its students are eligible for FRPM. 4. **John W. North High** is a public high school in the Riverside Unified district, with an enrollment of 2,281 students. 75.41% of its students are eligible for FRPM. 5. **Polytechnic High** is a public high school in the Riverside Unified district, with an enrollment of 2,777 students. 52.54% of its students are eligible for FRPM. 6. **Ramona High** is a public high school in the Riverside Unified district, with an enrollment of 2,160 students. 78.70% of its students are eligible for FRPM. This data provides a snapshot of the diversity and needs within the Riverside district schools, highlighting the varying levels of FRPM eligibility, which is an indicator of the socioeconomic status of the student populations. Execution time: 22.24 seconds