ROracle, SQL, and other things I forgot

This week marks me reinstalling {ROracle} and looking into accessing our datawarehouse directly from R. Something I had pushed for a quite a while now.

Getting up and running

Installing ROracle on our work laptops is not a “straight forward” excercise. There exists also quite a number of blog posts and messages highlighting that this is generally a pain on other systems as well.

For ECTRL check out the installation guide set-up by Enrico.

In particular, follow the given path and copy the pre-compiled ROracle library from the central drive to your personal user account library folder (something in … dev-library\4.0).

Install {DBI}.

In a nutshell this should do it.

Workflow

To work with PRISME, we need a “connection”: For this we set the database driver and some user and database details.

The following helper function gets us connected.
As recommended elsewhere, never save a password in a script, etc. Thus, we use askForPassword() … you should know the pw :)

connect_to_prisme <- function(.dbname = "porape5", .user = "prutest"){
    drv <- DBI::dbDriver("Oracle")
    con <- DBI::dbConnect(
         drv
        ,dbname   = .dbname,
        ,  user   = .user
        ,password = rstudioapi::askForPassword("Database password")
                     )
    return(con)
}

# store connection session
con <- connect_to_prisme()

Sniffing around

The principle of databasing is “send a query” and “store/process” the result.

The following are some “looking around” things

rs  <- dbSendQuery(con, "select view_name from user_views")
dat <- fetch(rs)
head(dat)

Check out what tables we can access:

rs <- dbSendQuery(con, "SELECT
  table_name, owner
FROM
  all_tables
ORDER BY
  owner, table_name
")
prisme_tbls <- fetch(rs)

Querying the APDF FACT table

rs <- dbSendQuery(con, "SELECT *
                     FROM SWH_FCT. FAC_APDS_FLIGHT_IR691
                   WHERE SRC_airport = 'EDDF'
                   and SRC_DATE_FROM >= '01-JAN-21'
                   and SRC_DATE_FROM < '01-AUG-21' ")
eddf <- fetch(rs)

Construct better queries by using relevant colnames: > names(eddf) [1] “APDS_ID” “AP_C_FLTID” “AP_C_REG” “SK_ADEP_ID”
[5] “BK_ADEP_ID” “ADEP_ICAO” “SK_ADES_ID” “BK_ADES_ID”
[9] “ADES_ICAO” “MVT_TIME_UTC” “CPF_MVT_TIME_UTC” “CFMU_MVT_TIME_UTC”
[13] “AP_MVT_TIME_UTC” “CODA_MVT_TIME_UTC” “BLOCK_TIME_UTC” “AP_BLOCK_TIME_UTC”
[17] “CODA_BLOCK_TIME_UTC” “SCHED_TIME_UTC” “AP_C_FLTRUL” “AP_C_FLTTYP”
[21] “CFMU_FLTTYP” “ARCTYP” “SK_AC_TYPE_ID” “BK_AC_TYPE_ID”
[25] “AP_C_ARCTYP” “AP_S_ARCTYP” “CFMU_ARCTYP” “AC_CLASS”
[29] “AP_C_RWY” “AP_C_STND” “AP_S_RWY” “AP_S_STND”
[33] “AP_C_STATUS” “SRC_AIRPORT” “SRC_PROJECT” “SRC_SYSTEM”
[37] “SRC_PHASE” “SRC_DATE_FROM” “SRC_DATE_TO” “IM_SAMAD_ID”
[41] “IM_LOBT” “IM_ACTUAL_OUT” “IM_EXEC_DATE” “APDS_CREA_LOG_TASK_ID”
[45] “APDS_CONV_LOG_TASK_ID” “APDS_IM_LOG_TASK_ID” “CPF_AIRPORT_LAT” “CPF_AIRPORT_LON”
[49] “C40_CROSS_TIME” “C40_CROSS_LAT” “C40_CROSS_LON” “C40_CROSS_FL”
[53] “C100_CROSS_TIME” “C100_CROSS_LAT” “C100_CROSS_LON” “C100_CROSS_FL”
[57] “C40_TRANSIT_TIME_MIN” “C100_TRANSIT_TIME_MIN” “TAXI_TIME_MIN” “C40_BEARING”
[61] “C100_BEARING” “C40_ARR_IN_FRONT” “C100_ARR_IN_FRONT” “ARR_PRECEDING_1H”
[65] “TF_PRECEDING_1H” “CREA_LOG_TASK_ID” “ARR_THROUGHPUT_20MIN” “DEP_THROUGHPUT_20MIN”
[69] “TF_IN_OUT_WHILE_TAXIOUT” “TF_IN_OUT_WHILE_TAXIOUT_CODA” “CODA_AIRLINE” “CODA_QC”
[73] “C40_CROSS_TIME_CTFM” “C100_CROSS_TIME_CTFM” “C40_CROSS_TIME_CPF” “C100_CROSS_TIME_CPF”
[77] “A40_ENTRY_SECTOR” “A40_UNIMPEDED_TIME” “A40_UNIMPEDED_DIST_KM” “A40_SATURATION_LEVEL”
[81] “A40_IS_UNIMPEDED” “A100_ENTRY_SECTOR” “A100_UNIMPEDED_TIME” “A100_UNIMPEDED_DIST_KM”
[85] “A100_SATURATION_LEVEL” “A100_IS_UNIMPEDED” “TAXI_OUT_UNIMPEDED_TIME” “TAXI_OUT_SATURATION_LEVEL”
[89] “TAXI_OUT_IS_UNIMPEDED” “PREV_BLOCK_TIME_UTC” “CFMU_RTE_LEN_KM” “CFMU_FLIGHT_DURATION_HOUR”
[93] “C40_TRANSIT_DIST_KM” “C100_TRANSIT_DIST_KM” “SVCTYP” “IFPLID”
[97] “DLY_SUB1” “DLY_SUB2” “DLY_SUB3” “DLY_SUB4”
[101] “DLY_SUB5” “DLY1” “DLY2” “DLY3”
[105] “DLY4” “DLY5” “ARR_THR_20MIN_NONTRUNC” “DEP_THR_20MIN_NONTRUNC”
[109] “DLY_TIME1” “DLY_TIME2” “DLY_TIME3” “DLY_TIME4”
[113] “DLY_TIME5” “ASLT_UTC” “DSLT_UTC” “DE_ANTI_ICING”
[117] “MVT_TIME_LT” “CODA_DLY1” “CODA_DLY2” “CODA_DLY3”
[121] “CODA_DLY4” “CODA_DLY5” “CODA_DLY_TIME1” “CODA_DLY_TIME2”
[125] “CODA_DLY_TIME3” “CODA_DLY_TIME4” “CODA_DLY_TIME5” “TF_IN_OUT_WHILE_TAXIIN”
[129] “TF_IN_OUT_WHILE_TAXIIN_CODA”

Some regular stuff I do

load connection with above function

with tbl(con, sql(query)) I can launch a sort of “lazy connection”to PRISME/Oracle database, and work onwards with dplyr queries.

to download the data append %>% collect()

library(ROracle)
library(dplyr)
library(dbplyr)


sql <- "SELECT * FROM SWH_FCT.FAC_AXDS_CANCELLED_FLIGHT"
cncl_db <- tbl(con, sql(sql))

Resources

This article provides an example: https://technology.amis.nl/languages/r/r-and-the-oracle-database-using-dplyr-dbplyr-with-roracle-on-windows-10/

Good luck with making this going!

Avatar
Rainer Koelle
Head Operational ANS Performance

I am a pilot, air traffic controller, and engineer, and work for EUROCONTROL, Performance Review Unit. My research interests revolve around operational air navigation and/or air transportation system performance applying a mix of data science and system analytics. In a side role I am also interested in time-critical decision making applied in aviation security and critical infrastructure protection.

Previous