How to deploy ODA lite models with (active-active) VLAN configuration easily

This blog entry is a description of deploying an Oracle Database Appliances X6-2 S in a VLAN environment, but it is valid for all models of the X6-2 and X7-2 lite versions (X6-2 S, X6-2 M, X6-2 L, X7-2 S, X7-2 M). The release of the ODA software we used to deploy was 12.1.2.12 - the newest version at the time of the installation.

Re-imaging the ODA

The easiest way to start with the deployment (if the delivered machine does not already have the latest version installed) is to just re-image the ODA following the instructions that can be found at the manual for this software version. So just configure the ILOMs and start with re-imaging. You can also use the already installed ODA software and upgrade it to the newest release. Then you do test the different steps to upgrade an ODA, but as there are more steps included it's easier to re-image and the time used for both approaches doesn't differ much.

Preparation

After re-imaging, the VLAN creation can take part. We have taken the fiber bond to configure our VLAN, but one can use this article also to create VLANs using copper (then use btbond1 or btbond2 instead of sfpbond1). We didn't used the copper interface for our deployment, but on ODA X6-2 machines both network interfaces can be used in parallel (on X7-2 you have to decide either using copper OR fibre).. 
To create a VLAN, Oracle kindly provides a script called ODA_vlan_config.sh which can be found at the support.oracle.com note 2216567.1.
At this note, at the manual and even on the ODAs itself one can also find a template of a json file, which is needed to deploy the ODA. VLAN is supported on the ODA light models (since software release 12.1.2.8), but the graphical deployment user interface doesn't work in VLAN environments.
Taken the provided json template file one has to change everything that is needed to deploy the ODA silent for the own environment. We saved the json template as file /tmp/create-appliance-ODA.json before we changed the entries.
The points to change are especially the name of the instance/database, the database edition, if one want to use standard edition (actually you can't mix SE and EE deployments on the same machine), the time zone, all ip-addresses and domain names. Do NOT change the nodeNumber as it is always "0" on the ODA light models S and M. To use the right VLAN for the public interface the nicName must be changed to match the VLAN configuration.

"instance" : {
"name" : "ODA1",
"instanceBaseName" : "PROD",
"dbEdition" : "EE",
"timeZone" : "UTC",
"ntpServers" : ["10.0.3.14"],
"dnsServers" : ["10.0.4.10","10.0.4.11","10.0.4.12"],
"domainName" : "example.com",

 ...
"nodes" : [ {
"nodeNumber" : "0",
"nodeName" : "ODA1",
"network" : [ {
"nicName" : "sfpbond1.200",
"ipAddress" : "10.24.128.243",
"subNetMask" : "255.255.255.0",
"gateway" : "10.24.128.1",
"networkType" : [ "Public" ],
"isDefaultNetwork" : true


One needs to remove the whole ILOM tree from the json file as the ODA deployment otherwise can fail (and we did had the right names / ip addresses put in there but there may be a routing problem between the different VLANs and the management network). Also one needs to remove the Auto Service Request "asr" tree from the json file, if the ASR should be configured later on. If a non existing account is used at the ASR part or the user/password combination is not right, the whole deployment fails and one needs to start over again (undeploying the ODA first and re-start with the deployment). 

If needed, the disk group redundancy can be changed (NORMAL equals mirroring, HIGH equals triple mirroring which is not available with only 2 NVMe cards) and the disk percentage. The documentation tells:

Storage Configuration Options
When Oracle Database Appliance X6-2 is deployed, you can select one of the following configuration options to divide the storage capacity between DATA diskgroup and RECO diskgroup:
  • External: Storage capacity is split between 80% for DATA and 20% for RECO.
  • Internal: Storage capacity is split between 40% for DATA and 60% for RECO.
  • Custom: Storage capacity is configurable from 10% to 90% for DATA and the remainder for RECO.
Therefore, the deployment was done with a 50% split between DATA and RECO as we needed some kind of "internal" for storing the FRA (backups, archivelogs, flashback logs) locally on the ODA but kept some more space for "DATA".

Do not forget to change the entries inside the database tree at the json file. dbType can only be single instance on ODA lite models, but everything else can be set. The admin password is not only used by the database, but is set as the new "master password", e.g. also for the oracle user at linux (everything can be set after the deployment, you don't need to keep a master password for all).
 

Enable dbConsole if you want to use Enterprise Manager Express on 12c! Provide the right database shape you want to use (db shapes documentation). Keep ACFS as dbStorage, so you are able to use e.g. standard backup agents to backup files from the disks. The character set can't be changed later on, so it is also needed to change it before the deployment starts.

"database" : {
"dbName" : "PROD",
"dbVersion" : "12.1.0.2",
"instanceOnly" : false,
"isCdb" : false,
"pdBName" : "pdb1",
"pdbAdminuserName" : "pdbuser",
"adminPassword" : "welcome1",
"dbType" : "SI",
"dbTargetNodeNumber" : "0",
"dbClass" : "OLTP",
"dbShape" : "odb2",
"dbStorage" : "ACFS",
"dbCharacterSet" : {
"characterSet" : "AL32UTF8",
"nlsCharacterset" : "AL16UTF16",
"dbTerritory" : "AMERICA",
"dbLanguage" : "AMERICAN"
},
"dbConsoleEnable" : false
}

 

Creating the VLAN

After finishing the preparation, the creation of the VLAN can be done. In our deployment we have created two different VLANs on the spfbond1.  So we connected as root and provided the vlan, the bondname, the ip address, network mask and the gateway as parameters to the script we have downloaded from the support.oracle.com note.

./ODA_vlan_config.sh add -v 3001 -i sfpbond1 -ip 10.24.30.243 -n 255.255.255.0 -g 10.24.30.1

./ODA_vlan_config.sh add -v 200 -i sfpbond1 -ip 10.24.128.243 -n 255.255.255.0 -g 10.24.128.1

In our case, we needed to change the bond to run active-active instead of active-failover (the default):

vi /etc/sysconfig/network-scripts/ifcfg-sfpbond1
BONDING_OPTS="mode=8.2.3ad miimon=100 lacp_rate=0" 


If one has used configure-firstnet, it may is necessary to remove ip-address, netmask and gateway from the icfg-sfpbond1 file. 

After the configuration has been changed the network service and the dcs agent must be restarted:

service network restart

initctl stop initdcsagent
initctl start initdcsagent 

Best practice is to check  the network configuration afterwards with:
odacli list-networkinterfaces

#/opt/oracle/dcs/bin/odacli list-networkinterfaces
#
#ID Name NIC Type
#---------------------------------------- -------------------- ------------- #----------
#b324f474-627e-4b15-af83-2ed21eda845a btbond1 btbond1 Bond
#595dfda9-67a6-4fc9-96af-afe966c287a7 em1 em1 Physical
#2033de33-99fb-4d53-8235-6ef088db54fe em2 em2 Physical
#62a09165-d9a5-4b8d-8eaa-45022988215b em3 em3 Physical
#40070762-6e87-42b4-8eb2-b8f9aa642849 em4 em4 Physical
#d70bbb83-e7de-462a-8892-42bdb687d6dc sfpbond1.3001 sfpbond1.2001 Bond
#26a0c940-5705-460a-b51d-ab8723d06f35 priv0 priv0 Dummy
#49cf2709-3c75-4c2c-bf53-718757385335 sfpbond1.200 sfpbond1.200 Bond


Deploying the ODA

As the VLANs are now fully functionable, one can start the deployment (still connected as root):
/opt/oracle/dcs/bin/odacli create-appliance -r /tmp/create-appliance-ODA.json

As you can see, the previously created json file is used as an input parameter for the odacli create-appliance statement. After waiting a couple of minutes, the ODA is now deployed with spfbond1.200 as standard network interface.
So one is now able to use both VLANs for different tasks. As the listener is configured by default only with the VLAN 200 (specified at the json file), one needs to configure the second ip-address as listener endpoint manually, if needed (ssh as grid user and run netmgr). So in our case the public interface for the application is the VLAN 200 and the private interface for the data guard and the backup is VLAN 3001. 


Conclusion

Even if the deployment with the GUI is not working for VLAN environments, the ODA lite models using bare metal installation without OVM (so called virtualized platform) can easily be deployed with VLAN configuration and, if you keep in mind to remove the entries from the json files you don't need, it is fast and straightforward and works on from ODA light software release 12.1.2.9.0 (until at least 12.1.2.12, but I think this is also available for later software releases). If needed, configure additional network things (like permanent routes and rule sets in /etc/sysconfig/network-scripts).

DOAG conference review (Nuremberg, 21.-23.11.2017)

Robotron Booth
 
Time flies by - so it's time now to reflect the 3 days at the DOAG conference this year at Nuremberg. It was my second time I have held a presentation at this big event, this time it was about "Flashback Database, Backup and Recovery for Oracle Database 12c Release 2". The presentation was held at 9am at the second day and it was astonishing, the room was filled up in total. 

As the main goal of my presentation was to sensualize the administrators how to avoid traps in the new default “Single-tenant” architecture, some of the spectators may have had problems to divide between my advices for best-practice and “things not working” in database release 12.2. But they can download the presentation in the next days and I think they will see the differences then.

In addition to my own presentation, the DOAG conference was full of news and interesting other slots. I tried to follow topics I like most – tuning for example, and I have seen some really good presentations there (with new ideas for my daily work). In addition, I have met a product manager responsible for the cost-based optimizer and discussed with him the “Autodesk” tuning problem (details can be found at an older post here). I’ll send him the dump and the selects and I am wondering, which outcome I’ll get later (I'll keep you informed).

Another cool presentation was held by a Swiss employee of trivadis, regarding connecting the “Keepass” freeware to the oracle database to avoid using open passwords in database scripts. This could be something for some of my customers, too.

A lot of presentations are now without any live demos. I didone, but the lady that was presenting at my room just before me had a lot of problems with her live demo - she broke everything. I have tested my live demo two evenings before I held the presentation - and the virtualbox with my Oracle Linux crashed. My disk was full and I needed two hours in the late night to get everything running again. Thank god I did this test (because otherwise this were happen at the presentation time) and at the end it worked, but I can  comprehend, why more and more speakers don't do live demos anymore.

My German colleagues had some cloud workshops - unfortunately, cloud does not seem to be a big topic for DOAG participants. There were not so many people which wanted to do some hands on. That was really a pitty.


A funny presentation was the one from Chris Saxon, where he was doing "More SQL magic!". Magic tricks with PL/SQL and SQL options - the small room was really filled up and even if I have known the technology behind each trick - it is really nice to see what is possible within the database and Chris was able to make a show out of it. Our daily work does not seem to be magic or funny all the time, so it was really a great moment to see him. One of the tricks he showed at the DOAG you can find at youtube (https://www.youtube.com/watch?v=zgXstJZETm0) and I think it's worth to follow him (and his blogs) as a developer as well as a dba (http://www.chrissaxon.co.uk/).

Most impressive are presentations which do not have anything to do with our daily work – at least at a first glimpse. One of the keynotes were held by a German Lufthansa pilot trainer and it was a very impressive presentation. He took some aircraft accidents and told in detail why they happened – and why people died or why not, and what the aircraft industry is doing to avoid crashes from a team member perspectives view. At the end, it is all about respect – trust – teaming and avoiding hierarchical levels at the cockpit (for sure they exist) for the time two pilots are working together. The aircraft industry is more or less forced since years to do a lot of the stuff, other industries are talking about now: Knowledge clusters, flat hierarchies, trust, responsibility and respect. All people on a plane are co-workers and all are doing everything to bring “my aircraft” safe home.

In total, the DOAG was again a big success, I think, for the referents as well as for the participants. You can attend dozens of presentations over 3 days – but with a high number of presentations in parallel, you sometimes decide to attend the wrong presentation. I had that only once this time. The DOAG is THE place for the DACH region to learn, to share, to see, to be seen and to meet old friends and new, interesting people!
The next DOAG conference (the 31st) will again be at Nuremberg in November 2018 - so if you understand German, you should attend. One of the main topics there, I think, will be the database release 18. The first release with the new naming convention. I’m looking forward to that…

The "better quality" empty array - or: Why doing nasty things can help avoiding ORA-00600!

Sometimes, Oracle databases are doing "crazy" things - and often, this results in ORA-00600 or ORA-07445 errors. Yes, it's software and there isn't any software in the world without bugs.

This time, we've got an ORA-00600 error, only for  a specific database version (12.1) and only with a specific PSU level. Or at least, we only could reproduce this error only on this database version and PSU level (which does not mean, by experience, it could not happen anywhere else):
ORA-00600 [pmus.c#101] and/or ORA-00600 [pmus.c#103] where raised while importing a special file into our database. 

While my colleagues found at least two (unfixed) bugs in support.oracle.com and opened an SR, I tried to figure out in deep with one of our senior developers, what happens here, how and when we can reproduce this error and maybe, how can we create a workaround???

In the trace file, I have found an interesting thing which signaled the error:

========== FRAME [15] (pmusmal_Trim_All()+821 -> kgeasnmierr()) ==========
defined by frame pointers 0x7fffa4b69fb0  and 0x7fffa4b69f40
CALL TYPE: call   ERROR SIGNALED: yes   COMPONENT: (null)

   
At this point in time, we did now knew what signaled the error inside of our code, only that an internal oracle function tried to Trim_All, but trimming what? 
While testing, we've found out that a combination of IN/OUT parameters together with a loop were responsible for triggering this error. But wrapping the function with additional local variable declaration, the error wasn't raised anymore. Wrapping the function is, unfortunately, ok in a sql session, but not in the pl/sql code of our application. 

So this code raised an error while other, rewritten code, didn't:
DECLARE
  p_ifi_id NUMBER := 12345;
BEGIN
  FOR i IN 1 .. 10 LOOP
    DECLARE
      ld_startzeitstempel DATE;
      lt_val              pkg_zr_def.t_table_of_value;
      lt_sta              pkg_zr_def.t_table_of_stati;

-- TYPE t_table_of_stati is TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER;       lt_sta2             pkg_zr_def.t_table_of_stati;
      lt_sta2_flag        pkg_zr_def.t_table_of_flags;
      lt_zeitstempel      pkg_import_lg.t_table_of_string; -- TYPE t_table_of_string is TABLE OF VARCHAR2(20) INDEX BY BINARY_INTEGER;
 
      lt_dstg             pkg_import_lg.t_table_of_string;
      lc_zpt_id           VARCHAR2(80);
      lc_linie            VARCHAR2(80);
      ld_ghi_von          DATE;
      ld_ghi_bis          DATE;
      ln_kan_id           NUMBER;
      ln_ghi_id           NUMBER;
      ln_zrd_id           NUMBER;
      l_index          PLS_INTEGER;
      l_sta_orig       VARCHAR2(10);
      l_act_state_list VARCHAR2(1000);
    BEGIN
      pkg_import_lg.prc_get_lastprofil

                                      (p_ifi_id      => p_ifi_id,
                                       p_rec_id      => i,
                                       p_putarr_von  => ld_startzeitstempel,
                                       p_val         => lt_val,
                                       p_sta         => lt_sta,
                                       p_sta2        => lt_sta2,
                                       p_sta2_flag   => lt_sta2_flag,
                                       p_zeitstempel => lt_zeitstempel,
                                       p_dstg        => lt_dstg,
                                       p_zpt_id      => lc_zpt_id,
                                       p_linie       => lc_linie,
                                       p_ghi_von     => ld_ghi_von,
                                       p_ghi_bis     => ld_ghi_bis,
                                       p_kan_id      => ln_kan_id,
                                       p_ghi_id      => ln_ghi_id,
                                       p_zrd_id      => ln_zrd_id);
    END;
  END LOOP;
END;


Moving on with the search, I have found a little nice sentence at support.oracle.com:
Bug 6085136 : ORA-00600 [PMUS.C#103] WHEN SUBMITTING ORACLE APPS TIMECARD. 

At the first sight, this is an Oracle Application error that should have nothing to do with our problem, but read by yourself:
"pmus.c is for handling plsql collections.  At the point it fails it seems to be setting a plsql table variable to null. Part of this involves trimming all elements from the collection and it is during the trim it falls over." 

Didn't we have found this pmusmal_Trim_All()+821 at the trace file? Looking for the call stack I've found this: 
----- Abridged Call Stack Trace -----
ksedsts()+244<-kjzdssdmp()+321<-kjzduptcctx()+692<-kjzdpcrshnfy()+347<-kstdmp()+352<-dbkedDefDump()+8728<-ksedmp()+261<-ksfdmp()+89<-dbgexPhaseII()+1845<-dbgexProcessError()+2539<-dbgeExecuteForError()+110<-dbgePostErrorKGE()+1870<-dbkePostKGE_kgsf()+90<-kgeadse()+429
<-kgerinv_internal()+49<-kgerinv()+40<-kgeasnmierr()+150<-pmusmal_Trim_All()+821<-pmucstrmall()+46<-pevm_SETN()+482
----- End of Abridged Call Stack Trace ----- 


PMUCSTRMALL? Doesn't sound this like a function for handling "strings"? And we do have some table of strings, working as an IN OUT parameter to our procedure.  Then we've found the second bug note with a really nice hint for us: Bug 24421301 : QP_MODIFIERS_PUB ORA-600 [PMUS.C#103] CALLING PACKAGE tells us
"From the tracefile ebsmock51_ora_10693_i312469.trc and in conjunction with the plsql code, plsql callstack, I can see that - an attempt was made to (re-)initialize the package globals ( records, collections ) in PACKAGE QP_Modifiers_PUB via BSETN , and that's when the internal error occurred, while processing the null initialization of a certain collection in that package."

And again, this sounded like it was the same as our problem (global initialization of a package, remote declaration of an array,...), and yes, we do only have the problem when we are running through a loop. We don't have it, if we use local variables and different calls to this procedure. 
Time to hand this over to Oracle in our SR, you think? Well, I did had a different view on that (I don't like to be streamlined, you know) - and I thought: There MUST be a usable workaround and I WANT to find it together with our developer


As a summary: It must have to do with arrays of string, it must have to do something with initialization of these arrays while initializing a package global and it only happens, if you loop, not if you call the procedure for the first time.

Next, I'd suggested something really nasty: Why don't do something like "initialize" our pl/sql tables of strings? Or just deleteing the NEW declared tables of strings??? So we've tried to add the deletes and we 've put a dbms_output into the code to find out, if the variables are really empty at runtime. 

Take a look at the following code (changes to the upper code are marked green):
DECLARE
  p_ifi_id NUMBER := 12345;
BEGIN
  FOR i IN 1 .. 10 LOOP
    DECLARE
      ld_startzeitstempel DATE;
      lt_val              pkg_zr_def.t_table_of_value;
      lt_sta              pkg_zr_def.t_table_of_stati;
      lt_sta2             pkg_zr_def.t_table_of_stati;
      lt_sta2_flag        pkg_zr_def.t_table_of_flags;
      lt_zeitstempel      pkg_import_lg.t_table_of_string;
      lt_dstg             pkg_import_lg.t_table_of_string;
      lc_zpt_id           VARCHAR2(80);
      lc_linie            VARCHAR2(80);
      ld_ghi_von          DATE;
      ld_ghi_bis          DATE;
      ln_kan_id           NUMBER;
      ln_ghi_id           NUMBER;
      ln_zrd_id           NUMBER;
      l_index          PLS_INTEGER;
      l_sta_orig       VARCHAR2(10);
      l_act_state_list VARCHAR2(1000);
    BEGIN

      dbms_output.put_line(lt_zeitstempel.count || ' xxx ' || lt_dstg.count);
      lt_zeitstempel.delete;
      lt_dstg.delete;
      pkg_import_lg.prc_get_lastprofil

                                      (p_ifi_id      => p_ifi_id,
                                       p_rec_id      => i,
                                       p_putarr_von  => ld_startzeitstempel,
                                       p_val         => lt_val,
                                       p_sta         => lt_sta,
                                       p_sta2        => lt_sta2,
                                       p_sta2_flag   => lt_sta2_flag,
                                       p_zeitstempel => lt_zeitstempel,
                                       p_dstg        => lt_dstg,
                                       p_zpt_id      => lc_zpt_id,
                                       p_linie       => lc_linie,
                                       p_ghi_von     => ld_ghi_von,
                                       p_ghi_bis     => ld_ghi_bis,
                                       p_kan_id      => ln_kan_id,
                                       p_ghi_id      => ln_ghi_id,
                                       p_zrd_id      => ln_zrd_id);
    END;
  END LOOP;
END;


The output of the dbms_output was 0xxx0 - this is, what it should be. The variable is declared for EVERY loop again, so it's a new, empty, fresh variable that can NEVER hold any values at the time the dbms_ouput is executed! But - with our deletes, the ORA-00600 wasn't raised again!!
So we made a "better quality empty new created" variable out of an "empty new created" variable and that heals the Oracle bug??? Yes... Strange, isn't it??? And this happend only with VARCHAR2(20) and not with VARCHAR2(10) arrays??? Indeed... 

Now we have at least our workaround while my colleagues try to figure out, if patching the database with a newer PSU will fix that problem and they have time to work together with Oracle support to get a solution working for all databases.
Nevertheless, this shows me again - it's worth to try things you can call stupid, dump or nasty if you run in an Oracle database bug.

The only thing that counts at the end is: The customer can run his daily business without being interrupted by an internal Oracle database error.

----- Update December 2017 -----

The problem is now filed as an official bug at support.oracle.com:

Bug 27242226 : ORA-600[PMUS.C#103] WHILE CREATING/EXECUTING PACKAGE IN SINGLE BYTE CHARACTERSET

The bug was invented with 12.1.0.1 (error does not happen in 11.2.0.4) and is still open at the actual version (12.2.0.1) with the latest PSUs.

Update (19.09.2018) - A fix is available, have a look at this new post for details.