This example shows how to explore the structure of an Apache Cassandra® database and import data from a Cassandra database table into MATLAB® using a Cassandra database connection with the Apache Cassandra database C++ interface. The Cassandra database stores database tables according to the partition key. The partition key affects how the data is filtered in the database.
In this example, the Cassandra database contains the employees_by_job database table with employee data and the job_id partition key.
Create a Cassandra database connection using the configured data source CassandraDataSource and a blank user name and password. The apacheCassandra function returns conn as a connection object.
datasource = "CassandraDataSource"; username = ""; password = ""; conn = apacheCassandra(datasource,username,password);
View the keyspaces in the Cassandra database using the Cassandra database connection. The returned string array contains the keyspaces. A keyspace contains one or more database tables and defines how the database replicates the data in the tables.
conn.Keyspaces
ans = 6×1 string
"employeedata"
"system"
"system_auth"
"system_distributed"
"system_schema"
"system_traces"
Return the names of the Cassandra database tables in the employeedata keyspace. t is a string array that contains the names of the database tables in the employeedata keyspace.
keyspace = "employeedata";
t = tablenames(conn,keyspace)t = 3×1 string
"employees_by_id"
"employees_by_job"
"employees_by_name"
Return the names of the Cassandra database columns in the employees_by_job database table.
tablename = "employees_by_job";
cols = columninfo(conn,keyspace,tablename);Display the first few names of the Cassandra database columns in the employees_by_job database table.
head(cols)
ans=8×4 table
Name DataType PartitionKey ClusteringColumn
________________ ________ ____________ ________________
"job_id" "text" true ""
"hire_date" "date" false "DESC"
"employee_id" "int" false "ASC"
"commission_pct" "double" false ""
"department_id" "int" false ""
"email" "text" false ""
"first_name" "text" false ""
"last_name" "text" false ""
cols is a table with these variables:
Name — Cassandra database column name
DataType — Cassandra Query Language (CQL) data type of the Cassandra database column
PartitionKey — Partition key indicator
ClusteringColumn — Clustering column indicator
The value in the PartitionKey variable indicates whether the database column is a partition key. The column job_id (job identifier) is a partition key in this database table.
Import data from the employees_by_job database table into MATLAB using the partition key value for shop clerk, SH_CLERK.
keyValue = "SH_CLERK"; results = partitionRead(conn,keyspace,tablename, ... keyValue);
Display the first few rows of the returned employee data.
head(results)
ans=8×13 table
job_id hire_date employee_id commission_pct department_id email first_name last_name manager_id office performance_ratings phone_number salary
building room
__________ ___________ ___________ ______________ _____________ __________ __________ __________ __________ ________________ ___________________ ______________ ______
"SH_CLERK" 03-Feb-2008 183 NaN 50 "GGEONI" "Girard" "Geoni" 120 "West" 121 {[ 2]} "650.507.9879" 2800
"SH_CLERK" 13-Jan-2008 199 NaN 50 "DGRANT" "Douglas" "Grant" 124 "West" 139 {3×1 int32} "650.507.9844" 2600
"SH_CLERK" 19-Dec-2007 191 NaN 50 "RPERKINS" "Randall" "Perkins" 122 "West" 255 {[ 2]} "650.505.4876" 2500
"SH_CLERK" 21-Jun-2007 182 NaN 50 "MSULLIVA" "Martha" "Sullivan" 120 "East" 112 {2×1 int32} "650.507.9878" 2500
"SH_CLERK" 21-Jun-2007 198 NaN 50 "DOCONNEL" "Donald" "OConnell" 124 "East" 125 {3×1 int32} "650.507.9833" 2600
"SH_CLERK" 17-Mar-2007 195 NaN 50 "VJONES" "Vance" "Jones" 123 "West" 236 {5×1 int32} "650.501.4876" 2800
"SH_CLERK" 07-Feb-2007 187 NaN 50 "ACABRIO" "Anthony" "Cabrio" 121 "West" 345 {3×1 int32} "650.509.4876" 3000
"SH_CLERK" 11-Jul-2006 190 NaN 50 "TGATES" "Timothy" "Gates" 122 "East" 297 {3×1 int32} "650.505.3876" 2900
results is a table that contains these variables:
job_id — Job identifier
hire_date — Hire date
employee_id — Employee identifier
commission_pct — Commission percentage
department_id — Department identifier
email — Email address
first_name — First name
last_name — Last name
manager_id — Manager identifier
office — Office location (table that contains two variables for the building and room)
performance_ratings — Performance ratings
phone_number — Phone number
salary — Salary
close(conn)
apacheCassandra | close | columninfo | partitionRead | tablenames