Wednesday, April 27, 2022

AWS Athena's Information_Schema

The information_schema in most database systems is very useful to get information about tables, views, columns that might be available in that db.

Here are the information schema views available. (WARNING: queries against the information_schema views are very slow)


 Schemas:

SELECT

  catalog_name,  schema_name

FROM information_schema.schemata

 Tables:

SELECT

  table_catalog,  table_schema,  table_name,  table_type

FROM information_schema.tables

Columns:

select

  table_catalog,  table_schema,  table_name,  column_name,  data_type,  is_nullable,

  column_default,  comment

from information_schema.columns

order by

  table_catalog,  table_schema,  table_name,  ordinal_position


Retrieved by looking at the PrestoDB test cases: presto/presto-product-tests/src/main/resources/sql-tests/testcases/system at 2ad67dcf000be86ebc5ff7732bbb9994c8e324a8 · prestodb/presto · GitHub 

Pentaho Data Integration - Java Defined Class - Main working sample

 The sample code provided in PDI for "main" doesnt work. Here is the fixed up version:

mypublicnotes/Sample-UserDefinedJavaClass-Main.md at master · rajrao/mypublicnotes (github.com)

Pentaho Data Integration - Stripping characters out of string fields

 Here is a generic method (actually 2) for stripping out characters from all string fields in Pentaho.

mypublicnotes/Sample-Generic-Input-FieldFixerUpper.md at master · rajrao/mypublicnotes (github.com)

These methods are convenient when you are loading a variety of tables or files and you dont know the names of all the string fields ahead of time and so cannot use field based transformations (like Replace in String, etc).