Oracle In-Memory Base-Level (Part 1: How to set up a test environment with example data)

As everyone is now (hopefully) upgrading their Oracle databases to 19c as the long-term release, most of you gain new functionality. I don't want to list everything that is new between 11.2.0.4 or twelve-something and 19c, but there is one thing that was added to 19c not quietly, but quite surprisingly.

The In-Memory Base-Level was first announced with the Oracle Database 20c beta, which was made available in the Oracle Cloud at the beginning of this year. Later on, Oracle announced that this new feature is included in all Enterprise Edition licenses, starting with version 19.8 (with some exceptions: the feature is not allowed in third-party clouds, so it is not available at AWS or Azure, and support for some operating systems will follow later).

The In-Memory feature itself is nothing new for Enterprise Edition customers who have bought the In-Memory option. It has been available since 12.1, has been enhanced with a lot of new features since then, and has now been released as a base level with some restrictions (such as the 16 GB maximum size of the In-Memory Column Store; more about the features and restrictions will follow in the second part of this In-Memory Base-Level series).

Before we can start to test the In-Memory Base-Level, we need to set up an environment. As a consultant, I can't simply take a dump of an existing customer database: on the one hand it would contain sensitive data, on the other hand it would burst the SSD of my laptop. Therefore I need to set up something where I can test things, tailored to my test case scenarios.

As you may want to do that yourself, I will show you how to set up a simple environment to test the In-Memory Base-Level using Oracle's sample HR schema.

The how-to: 

A) Setup the Example Schemas

1.) Set up a 19.8 database on Linux or Windows with your preferred architecture and configure everything so that you can connect with sqlplus to the database (or pluggable database).
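
For example, a quick connectivity check could look like this (easy connect syntax; host, port and the service name pdb1 are placeholders from my environment, adjust them to yours):

sqlplus system/oracle@//localhost:1521/pdb1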

2.) Download the sample schema creation scripts from GitHub (https://github.com/oracle/db-sample-schemas).

3.) Copy the scripts to a working directory on your test machine, from which you can run them, and extract them.

4.) Unfortunately, the scripts contain "__SUB__CWD__" placeholders for folder names inside the .sql and .dat files. One needs to replace these with the working directory. On Linux this can be done e.g. with the following perl command:

perl -p -i.bak -e 's#__SUB__CWD__#'$(pwd)'#g' *.sql */*.sql */*.dat

If you are on Windows, you can either try to use the perl.exe from your %ORACLE_HOME%/perl/bin and replace $(pwd) in the command with your working directory (but don't kill me if it does not work, I haven't tested that yet), or you can use e.g. Notepad++ (my preferred tool). With Notepad++ you can use "Find in Files", specify your working directory and what you want to replace inside the files of the working directory and its subdirectories. Once the scripts are ready, you can proceed with

5.) the creation of a tablespace for the sample schemas, e.g. "example". If you have set db_create_file_dest, it is as simple as

SQL> create tablespace example;

If you don't know whether this parameter is set, you should check it first:

SQL> show parameter db_create_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string      +DATA
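
If db_create_file_dest is not set, you can either set it to a suitable location or create the tablespace the classic way with an explicit datafile (the paths below are just placeholders for illustration):

SQL> alter system set db_create_file_dest='/u02/oradata' scope=both;

SQL> create tablespace example datafile '/u02/oradata/example01.dbf' size 2g autoextend on;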

6.) I prefer to set this new tablespace as the default, because I do not only run the creation script but also do a lot more, and otherwise things might accidentally end up in the SYSTEM tablespace or somewhere else.

SQL> alter database default tablespace example;
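
You can verify the change afterwards:

SQL> select property_value from database_properties
     where property_name = 'DEFAULT_PERMANENT_TABLESPACE';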

Remark: The script which creates the sample schemas can also create a tablespace with datafiles, but with PDBs this may raise an error; personally, I prefer to create the tablespace beforehand and pass only its name to the script.

7.) Now one can create the sample schemas by running the mksample script (please read the readme.txt about the parameters). In my environment (I set all passwords to "oracle") I run the following:

SQL> @/home/oracle/Downloads/db-sample-schemas-19c/mksample oracle oracle oracle oracle oracle oracle oracle oracle example temp $ORACLE_HOME/demo/schema/log pdb1

After the script has run successfully, the example schemas are installed, but the number of rows in them does not allow testing any In-Memory functionality, as everything is far too small.
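
A quick sanity check after the installation are the row counts of the stock HR sample data:

SQL> select count(*) from hr.employees;    -- 107 rows
SQL> select count(*) from hr.departments;  --  27 rows
SQL> select count(*) from hr.locations;    --  23 rows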

B) Add data into example schemas

Now we can start to add data to the example schemas, either manually or by script. I used the HR schema for my tests, so I begin by adding additional rows to the look-up tables JOBS, LOCATIONS and DEPARTMENTS.

1.) Let's do it with jobs:

SQL> insert into jobs values ('IM_TEST','In-Memory-Tester',14000,240000);
        1 row created.
SQL> insert into jobs values ('GG','Good Guys',14,2400);
        1 row created.
SQL> commit;
        Commit complete.
 
Now we need some more locations (in my sample there are only 23 rows in the table).

2.) Locations:
First we widen LOCATION_ID and adjust the corresponding sequence so that the table can be filled up a bit more.

SQL> alter sequence locations_seq maxvalue 1000000000 increment by 1;
SQL> alter table locations modify location_id number(9);

As LOCATIONS does not have an index on STREET_ADDRESS, we reuse the STREET_ADDRESS values of the already existing rows and put random data into the other columns. Logically this does not make any sense, but for testing purposes that's quite ok.

begin
  -- generate 200,000 additional rows: reuse existing street addresses, randomize the rest
  for i in 1..200000 loop
    begin
      insert into locations (location_id, street_address, postal_code, city, state_province, country_id)
      values (locations_seq.nextval,
              (select street_address from (select * from locations order by dbms_random.random) where rownum = 1),
              trunc(dbms_random.value(1000,999999)),
              (select city from (select * from locations order by dbms_random.random) where rownum = 1),
              '',
              (select country_id from (select * from countries order by dbms_random.random) where rownum = 1));
    exception
      when dup_val_on_index then null;   -- ignore duplicate keys and continue
    end;
  end loop;
end;
/
commit;

One ends up with 200'023 rows, but STREET_ADDRESS has only 23 distinct values and they are not evenly distributed.
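
You can double-check that skew with a quick query:

SQL> select count(*) total_rows, count(distinct street_address) distinct_addresses
     from locations;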
 
3.) Departments
For all these locations, we now need some departments:

SQL> alter sequence departments_seq maxvalue 1000000000 increment by 1;
SQL> alter table departments modify department_id number(9);
SQL> alter table departments modify location_id number(9);
SQL> alter table departments modify department_name varchar(100);

As I want to test the In-Memory functionality with "like XXX%" predicates, I just add more departments with an INSERT INTO ... SELECT and append the sequence value to the DEPARTMENT_NAME.

You can run something like this to also get different LOCATION_IDs and MANAGER_IDs per department:

declare
  i number;
begin
   for i in 1..500 loop
      insert into departments
         select departments_seq.nextval,
                substr(department_name,1,20)||'-'||(to_char(trunc(departments_seq.currval))),
                (select manager_id from (select * from departments order by dbms_random.random) where rownum=1),
                (select location_id from (select * from locations order by dbms_random.random) where rownum=1)
                from departments
                where rownum < 30;
   end loop;
end;
/
commit;

After I have run this (and changed some values in between) a few times, I now have 560'000 different departments in the table.
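
To check whether the generated names are usable for the planned "like XXX%" tests, you can count the rows behind a prefix ('Sales' is just one of the stock department names, your counts will differ):

SQL> select count(*) from departments;
SQL> select count(*) from departments where department_name like 'Sales%';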

4.) And last, we add some more employees. To test the compression of the In-Memory Column Store, I have created around 750'000 rows of meaningless data (I just use the dbms_random package to create character strings). One can run this several times to blow up the database even more than I did (I was limited by the resources of my notebook):

SQL> alter table employees modify department_id number(9);
SQL> alter table employees modify email varchar2(50);

declare
  cursor c1 is select department_id from departments where rownum < 20000;
begin
  for c1_rec in c1 loop
    begin
      insert into employees (employee_id, first_name, last_name, email, hire_date, job_id, salary, department_id)
      values (employees_seq.nextval,
              dbms_random.string('a', trunc(dbms_random.value(5,20))),
              dbms_random.string('a', trunc(dbms_random.value(5,20))),
              dbms_random.string('a', trunc(dbms_random.value(5,20)))||'@mycompany.com',
              trunc(sysdate + dbms_random.value(0,366)),
              'IM_TEST',   -- must be an existing JOB_ID, here the one created in step B) 1.)
              trunc(dbms_random.value(1000,100000)),
              c1_rec.department_id);
    exception
      when dup_val_on_index then null;   -- ignore duplicate keys and continue
    end;
  end loop;
end;
/
commit;
 
Like I said, the data itself is really meaningless, but random strings are good enough for these tests.
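
To get an impression of the generated data, you can peek at a few of the new rows, e.g.:

SQL> select first_name, last_name, email, salary
     from employees
     where job_id = 'IM_TEST' and rownum <= 5;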
 
One of the test scenarios mentioned in the In-Memory documentation is a performance improvement when "selecting only some rows of a table with a higher number of columns".
To allow some tests with that, I extend EMPLOYEES with additional columns and copy the data from the original columns into them (but without creating indexes or constraints) with an update.
 
SQL> alter table employees add (employee_id_1 number(6), first_name_1 varchar2(20),
 last_name_1 varchar2(25), email_1 varchar2(50), phone_number_1 varchar2(20),
 hire_date_1 date, job_id_1 varchar2(10), salary_1 number(8,2), commission_pct_1 number(2,2),
 manager_id_1 number(6), department_id_1 number(9),
 employee_id_2 number(6), first_name_2 varchar2(20), last_name_2 varchar2(25),
 email_2 varchar2(50), phone_number_2 varchar2(20), hire_date_2 date,
 job_id_2 varchar2(10), salary_2 number(8,2), commission_pct_2 number(2,2),
 manager_id_2 number(6), department_id_2 number(9),
 employee_id_3 number(6), first_name_3 varchar2(20), last_name_3 varchar2(25),
 email_3 varchar2(50), phone_number_3 varchar2(20), hire_date_3 date,
 job_id_3 varchar2(10), salary_3 number(8,2), commission_pct_3 number(2,2),
 manager_id_3 number(6), department_id_3 number(9),
 employee_id_4 number(6), first_name_4 varchar2(20), last_name_4 varchar2(25),
 email_4 varchar2(50), phone_number_4 varchar2(20), hire_date_4 date,
 job_id_4 varchar2(10), salary_4 number(8,2), commission_pct_4 number(2,2),
 manager_id_4 number(6), department_id_4 number(9));

SQL> update employees set
        employee_id_1=employee_id, employee_id_2=employee_id, employee_id_3=employee_id, employee_id_4=employee_id,
        first_name_1=first_name, first_name_2=first_name, first_name_3=first_name, first_name_4=first_name,
        last_name_1=last_name, last_name_2=last_name, last_name_3=last_name, last_name_4=last_name,
        email_1=email, email_2=email, email_3=email, email_4=email,
        hire_date_1=hire_date, hire_date_2=hire_date, hire_date_3=hire_date, hire_date_4=hire_date,
        job_id_1=job_id, job_id_2=job_id, job_id_3=job_id, job_id_4=job_id,
        salary_1=salary, salary_2=salary, salary_3=salary, salary_4=salary,
        department_id_1=department_id, department_id_2=department_id, department_id_3=department_id, department_id_4=department_id;

SQL> commit;
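
Just to illustrate the kind of statement this wide table is meant for (the actual In-Memory tests follow in part 2), think of a query that touches only a few of the many columns, e.g.:

SQL> select last_name, salary, department_id from employees where salary > 90000;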

 
That's it. With this data in place we can enable the In-Memory Base-Level and test some SQLs. I will write a second blog post about that soon.
