Hands On Lab Page 41
(
key mapped by (ORDER_DAY_KEY),
cf_data:cq_ORGANIZATION_KEY mapped by (ORGANIZATION _KEY),
cf_data:cq_EMPLOYEE_KEY mapped by (EMPLOYEE_KEY),
cf_data:cq_RETAILER_KEY mapped by (RETAILER_KEY),
cf_data:cq_RETAILER_SITE_KEY mapped by (RETAILER_SI TE_KEY),
cf_data:cq_PRODUCT_KEY mapped by (PRODUCT_KEY),
cf_data:cq_PROMOTION_KEY mapped by (PROMOTION_KEY),
cf_data:cq_ORDER_METHOD_KEY mapped by (ORDER_METHOD _KEY),
cf_data:cq_SALES_ORDER_KEY mapped by (SALES_ORDER_K EY),
cf_data:cq_SHIP_DAY_KEY mapped by (SHIP_DAY_KEY),
cf_data:cq_CLOSE_DAY_KEY mapped by (CLOSE_DAY_KEY),
cf_data:cq_QUANTITY mapped by (QUANTITY),
cf_data:cq_UNIT_COST mapped by (UNIT_COST),
cf_data:cq_UNIT_PRICE mapped by (UNIT_PRICE),
cf_data:cq_UNIT_SALE_PRICE mapped by (UNIT_SALE_PRI CE),
cf_data:cq_GROSS_MARGIN mapped by (GROSS_MARGIN),
cf_data:cq_SALE_TOTAL mapped by (SALE_TOTAL),
cf_data:cq_GROSS_PROFIT mapped by (GROSS_PROFIT)
);
2. Load data into this table from your source file. Adjust the file URL specification
as needed to match your environment.
LOAD HADOOP using file url
'sftp://yourID:
[email protected]:22/your -
dir/GOSALESDW.SLS_SALES_FACT.txt'with SOURCE PROPER TIES
('field.delimiter'='\t') INTO TABLE bigsqllab.sls_s ales_fact_nopk;
3. Count the number of rows in your table. Note tha t there are only 440 rows.
select count(*) from bigsqllab.sls_sales_fact_nopk;
4. Consider what just occurred. You loaded a file with 446023 records into your Big
SQL HBase table without error, yet only 440 rows are present in your table. That's
because HBase ensures that each row key is unique. If you put 5 different records with
the same row key into a native HBase table, your HBase table will contain only 1 current
row for that row key. Because you mapped a SQL column with non1unique values to the
HBase row key, HBase essentially updated the information for those rows containing
duplicate row key values.
4.3. Mapping multiple SQL columns to one HBase row key or column
Until now, you've mapped each field from a source file (i.e., each SQL column in the source relational
table) to a single Big SQL HBase column. Although straightforward to implement, this one1to1one
mapping approach has a significant drawback: it consumes considerable disk space. Why? As you
learned earlier, HBase stores full key information (row key, column family name, column name, and
timestamp) along with each cell value. As a result, HBase tables with many columns can consume
considerable disk space.