Thursday, August 19, 2010

Oracle on OpenSolaris, part 2

If this is going to be a sandbox, it needs more toys. Oracle requires certain packages. The installer still uses Motif. I'm ambivalent about installing a full Gnome desktop, but I'll want X and VNC. And compilers and GNU make.

All actions are performed in the oracle zone as root, until the point where I su to the newly created oracle user. I'll omit the command prompts and responses this time.

Packages

pkg install SUNWxorg-server
pkg install SUNWxorg-client-programs
pkg install SUNWxwopt
pkg install SUNWxwfsw
pkg install SUNWxvfb
pkg install SUNWxorg-graphics-ddx
pkg install SUNWxorg-headers
pkg install FSWxorg-fonts
pkg install SUNWmfrun
pkg install SUNWxvnc

pkg install SUNWman
pkg install SUNWgmake
pkg install SUNWbinutils
pkg install SUNWgcc
pkg install SUNWgzip
pkg install SUNWunzip
pkg install SUNWwget
pkg install SUNWbtool
pkg install SUNWzfs-auto-snapshot
pkg install pkg:/sunstudioexpress
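
As an aside, pkg(1) accepts multiple package names in a single invocation and resolves dependencies once, so the whole list above can be collapsed into one command (same packages, untested in this exact form):

```shell
# One-shot install of everything listed above.
pkg install SUNWxorg-server SUNWxorg-client-programs SUNWxwopt SUNWxwfsw \
    SUNWxvfb SUNWxorg-graphics-ddx SUNWxorg-headers FSWxorg-fonts \
    SUNWmfrun SUNWxvnc SUNWman SUNWgmake SUNWbinutils SUNWgcc SUNWgzip \
    SUNWunzip SUNWwget SUNWbtool SUNWzfs-auto-snapshot pkg:/sunstudioexpress
```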

Where's my manpage index? Grrr. This ought to be a post-install action for SUNWman.

catman -w

Oracle user and group

I suppose these will become reserved IDs when Oracle Corp gets around to releasing Solaris 11. I'm going to skip the oinstall and oper groups.
mkdir -p /export/home

groupadd -g 300 dba
useradd -m -d /export/home/oracle -g dba -u 300 -s /usr/bin/bash oracle
passwd oracle

mkdir /db/oracle
mkdir /db/oraInventory
chown oracle:dba /db/oracle
chown oracle:dba /db/oraInventory
chmod 755 /db/oracle
chmod 755 /db/oraInventory

System Parameter(s)

Prior to Solaris 10, there was some /etc/system tweaking to be done. Each version of Unix has its own set of required system-level tweaks and accompanying reboots for Oracle.

In Solaris 10+, there's only one key parameter to change, and it's handled through projects, a more fine-grained approach to resource management.

The box has 8GB of memory. The default maximum shared memory is 25% of physical memory, or 2GB here. Starting a database configured with 2GB of shared memory will fail with an ORA-27102: out of memory error. I'll up the limit to 4GB.

Get our current project ID
oracle@oracle:~$ id -p
uid=300(oracle) gid=300(dba) projid=100(user.oracle)

We'll need to be root to make these changes for the oracle user. Confirm the current shared memory limit:
root@oracle:~$ prctl -n project.max-shm-memory -i project 100
project: 100: user.oracle
NAME                    PRIVILEGE      VALUE  FLAG  ACTION  RECIPIENT
project.max-shm-memory
                        privileged    2.00GB     -  deny            -
                        system        16.0EB   max  deny            -

Set it to 4GB now. This takes effect immediately for the running project, but it won't survive a reboot.
root@oracle:~$ prctl -n project.max-shm-memory -r -v 4gb -i project 100

Set it to 4GB persistently, so it survives the next reboot:
root@oracle:~$ projadd -U oracle -K "project.max-shm-memory=(priv,4GB,deny)" user.oracle
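
One caveat, hedged since I haven't re-run this from scratch: the id -p output above already reports projid 100 for user.oracle, which suggests the project is already defined in /etc/project, and projadd refuses to create a project that exists. In that case the persistent change is made with projmod instead:

```shell
# Modify the existing user.oracle project rather than creating it.
projmod -s -K "project.max-shm-memory=(priv,4GB,deny)" user.oracle
```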

Confirm the changes
root@oracle:~$ prctl -n project.max-shm-memory -i project 100
project: 100: user.oracle
NAME                    PRIVILEGE      VALUE  FLAG  ACTION  RECIPIENT
project.max-shm-memory
                        privileged    4.00GB     -  deny            -
                        system        16.0EB   max  deny            -

root@oracle:~$ projects -l user.oracle
user.oracle
        projid : 100
        comment: ""
        users  : oracle
        groups : (none)
        attribs: project.max-shm-memory=(priv,4294967296,deny)
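
The attribs value is the same 4GB limit expressed in bytes; a quick shell sanity check:

```shell
# 4 GB in bytes -- matches the attribs value reported by projects -l.
echo $((4 * 1024 * 1024 * 1024))
```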

I've got a note here that a hack is needed to link libcrypto.so on OpenSolaris during the install, but I'm not sure if it applies here.
ln -s /lib/amd64/libcrypto.so /usr/sfw/lib/amd64
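
If that hack does turn out to be needed, a guarded version is safer: it won't clobber an existing file and can be run twice without harm. This is just a sketch; the helper name is mine.

```shell
# Create a symlink only if the source exists and the destination doesn't;
# prints what it did so reruns are obvious.
link_if_missing() {
    src=$1 dst=$2
    if [ -e "$src" ] && [ ! -e "$dst" ]; then
        ln -s "$src" "$dst" && echo "linked $dst"
    else
        echo "skipped $dst"
    fi
}

# usage (the OpenSolaris paths from the note above):
# link_if_missing /lib/amd64/libcrypto.so /usr/sfw/lib/amd64/libcrypto.so
```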

Installation

At this point I set up a VNC session as root so I can use the default Oracle installer. Now the oracle user needs permission to use the display. It's been a very long time since I needed to know this, so I spent a while staring at cryptic error messages before realizing the problem had nothing to do with the Oracle installer. A page of Java stack dump and complaints about permissions on /tmp/.X11-pipe actually mean "run xhost, dummy".
vncserver :1
xhost +local:oracle

Holy cats, that default window manager is ugly!

But it takes me back...late nights on Sun pizzabox workstations in a frigid computer room...spamming full-screen xeyes onto other students' monitors...hey, it's a lot more amusing when nobody in the room has slept in two days.


Away we go.
su - oracle
export DISPLAY=oracle:1.0
cd /junk/database
./runInstaller

The installer will complain because it does not recognize OpenSolaris and can't run any pre-flight checks. That's ok. Say yes. If you're missing a package, it'll probably become apparent later, when the installer invokes make.

I won't screenshot all of the installation screens, but here's a brief summary:

  • I skipped entering an email address. This won't be a supported installation, and in any case, we already have a problem if I have to hear about "critical security issues in my configuration" via email.
  • Unless I'm performing an upgrade, I always choose to install the database software only. I prefer to make sure I have basic functionality - in the past, this was by no means assured and often required a support call to Oracle to work around installation bugs - and run dbca later.
  • The system isn't prepped for RAC so this will be a single instance installation.  Perhaps that will be a different writeup.
  • I chose to install Enterprise, minus OLAP, which I never use. (Table partitioning is still an option with a separate price tag? Seriously?)
  • The Oracle Base directory is going to be /db/oracle and the software location /db/oracle/product/11.2.0.
  • The software will be owned by user oracle, group dba.

Post-Installation

Set up the oracle user's environment in ~/.profile.

export ORACLE_HOME=/db/oracle/product/11.2.0
export ORACLE_OWNER=oracle
export ORACLE_SID=toybox
export ORACLE_UNQNAME=toybox
export NLS_LANG=AMERICAN_AMERICA.UTF8

export PATH=/usr/gnu/bin:/usr/bin:/usr/X11/bin:/usr/sbin:/sbin:${ORACLE_HOME}/bin

export DISPLAY=oracle:1.0

Wednesday, August 18, 2010

Oracle on OpenSolaris, part 1

This is an annotated log of a development installation of Oracle 11g on OpenSolaris x64.

I've made a metric ton of notes on various installation, troubleshooting, and development sessions over the years. Most of them have ended up in text files and email. While there's something to be said for letting this knowledge fade as software changes and techniques become outdated, rather than allowing it to accumulate in a mountain of disorganized cruft, there's also something to be said for delegating the entire issue to Google. :)

It's been a while since I set up this box so I'll start by reviewing the network configuration.

gregory@live:~$ dladm show-phys
LINK        MEDIA      STATE    SPEED  DUPLEX  DEVICE
e1000g0     Ethernet   up       1000   full    e1000g0
e1000g1     Ethernet   unknown  0      half    e1000g1

gregory@live:~$ dladm show-ether
LINK        PTYPE    STATE    AUTO  SPEED-DUPLEX  PAUSE
e1000g0     current  up       yes   1G-f          bi
e1000g1     current  unknown  yes   0M-h          bi

gregory@live:~$ dladm show-vnic
LINK        OVER     SPEED  MACADDRESS        MACADDRTYPE  VID
vnic200     e1000g0  1000   2:8:20:d5:a6:43   random       0
vnic199     e1000g0  1000   2:8:20:ea:8a:76   random       0

Ok. I'll put a new vnic on e1000g0 for the new zone.

gregory@live:~$ pfexec dladm create-vnic -l e1000g0 vnic201

gregory@live:~$ dladm show-vnic
LINK        OVER     SPEED  MACADDRESS        MACADDRTYPE  VID
vnic200     e1000g0  1000   2:8:20:d5:a6:43   random       0
vnic199     e1000g0  1000   2:8:20:ea:8a:76   random       0
vnic201     e1000g0  1000   2:8:20:8b:63:ab   random       0

gregory@live:~$ dladm show-link
LINK        CLASS  MTU   STATE    OVER
e1000g0     phys   1500  up       --
e1000g1     phys   1500  unknown  --
vnic200     vnic   1500  up       e1000g0
vnic199     vnic   1500  up       e1000g0
vnic201     vnic   1500  up       e1000g0

Zone setup

I'll copy the configuration of an existing zone to save a bit of typing. I don't want to clone it, as it's a live zone containing a bunch of Rails instances.

gregory@live:~$ zonecfg -z rorpub export

create -b
set zonepath=/zones/rorpub
set brand=ipkg
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user
set ip-type=exclusive
add net
set physical=vnic200
end
add dataset
set name=rpool/delegates/rorpub
end

I'll just edit that instead.

I recall reading that zones and zfs are now supposed to play nice together, and that zone installation should automagically create a zfs filesystem for the zonepath.

At least on this (snv_111b) release, the "automatic" part must be taken with a grain of salt. What actually happens is that zoneadm install fails because it can't find the filesystem. Then zoneadm uninstall also fails because it can't find /ROOT under the filesystem it already failed to find, leaving you with no obvious way to proceed. This bug is probably fixed in newer builds. I have a boot environment patched to snv_134, but I can't reboot right now.

So I'll make sure the filesystem is created, mounted, and has root-only permissions.

gregory@live:~# pfexec zfs create rpool/zones/oracle
gregory@live:~# pfexec chmod 700 /zones/oracle

Hold on a minute. Typing pfexec for every. single. command. is bollocks. This is just as silly as Windows UAC spam; the extra step quickly becomes rote, making it even less likely that the person at the keyboard will show due care with privileged access. I may as well become root now and drop privileges when I'm done. I'd rather use su than pfexec bash, because my shell prompt is set up to remind me I'm root.

So if you see several pages of commands run as root in the future, don't assume it's because I'm too dense to use pfexec or sudo.

gregory@live:~# su -
Password:
Sun Microsystems Inc. SunOS 5.11 snv_111b November 2008
root@live:~# cd /

root@live:/# zfs set mountpoint=/zones/oracle rpool/zones/oracle

I want to delegate some space to this zone and make sure it has an 8k blocksize. I plan to install Oracle here, and using an 8k Oracle blocksize on top of a 128k ZFS recordsize will cause ZFS to read 16 times as much data as it needs on every random read. This is Bad.
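
The 16x figure is just the ratio of the default ZFS recordsize to the Oracle block size:

```shell
# 128k ZFS record / 8k Oracle block = worst-case read amplification.
echo $((128 / 8))
```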

There's more tuning to be done but this step should be done before any datafiles are created.

Oh, good. Oracle now properly supports ZFS (PDF, white paper, May 2010).  The "tuning is evil" philosophy used within ZFS used to cause serious conflicts with Oracle's philosophy that everything in the OS ought to be tuned around the RDBMS.

root@live:/# zfs create -o mountpoint=none -o recordsize=8k rpool/delegates/oracle
root@live:/# zfs list -o name,recsize,mountpoint,volsize,zoned | grep oracle
rpool/delegates/oracle     8K  none  -  off
rpool/zones/oracle       128K  none  -  off

Zone creation

root@live:/# zonecfg -z oracle
zonecfg:oracle> create -b
zonecfg:oracle> set zonepath=/zones/oracle
zonecfg:oracle> set brand=ipkg
zonecfg:oracle> set autoboot=true
zonecfg:oracle> set limitpriv=default,dtrace_proc,dtrace_user
zonecfg:oracle> set ip-type=exclusive
zonecfg:oracle> add net
zonecfg:oracle:net> set physical=vnic201
zonecfg:oracle:net> end
zonecfg:oracle> add dataset
zonecfg:oracle:dataset> set name=rpool/delegates/oracle
zonecfg:oracle:dataset> end
zonecfg:oracle> commit
zonecfg:oracle> verify
zonecfg:oracle> exit

root@live:/# zoneadm -z oracle install
Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
Image: Preparing at /zones/oracle/root.
Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
Installing: Core System (output follows)
DOWNLOAD                    PKGS       FILES    XFER (MB)
Completed                  20/20   3021/3021  42.55/42.55

PHASE                            ACTIONS
Install Phase                  5747/5747
Installing: Additional Packages (output follows)
DOWNLOAD                    PKGS       FILES    XFER (MB)
Completed                  37/37   5598/5598  32.52/32.52

PHASE                            ACTIONS
Install Phase                  7329/7329

Note: Man pages can be obtained by installing SUNWman
Postinstall: Copying SMF seed repository ... done.
Postinstall: Applying workarounds.
Done: Installation completed in 101.537 seconds.

Next Steps: Boot the zone, then log into the zone console
(zlogin -C) to complete the configuration process

root@live:/# zoneadm -z oracle boot
root@live:/# zoneadm list -cv
  ID NAME        STATUS     PATH               BRAND    IP
   0 global      running    /                  native   shared
   1 rorpub      running    /zones/rorpub      ipkg     excl
   2 oracle      running    /zones/oracle      ipkg     excl
   - barebones   installed  /zones/barebones   ipkg     excl

I should have scripted the system identification to avoid all the network config ESC-2-ing. We'll just pretend I did and move on.

System identification is completed.

oracle console login: root
Password:
Aug 17 20:59:31 oracle login: ROOT LOGIN /dev/console
Sun Microsystems Inc. SunOS 5.11 snv_111b November 2008


edit: Apparently I've been doing everything the hard way. I might have started with zonemgr instead. It didn't hurt to review the quirks of zonecfg in order to clarify why wrapping it in a script is a really good idea.

Now create a couple of filesystems under the delegated filesystem to hold the installation files and the installation.

root@oracle:~# zfs create -o mountpoint=/db rpool/delegates/oracle/db
root@oracle:~# zfs create -o mountpoint=/junk rpool/delegates/oracle/junk

root@oracle:/# zfs list -o name,used,avail,mountpoint,recsize
NAME                          USED  AVAIL  MOUNTPOINT     RECSIZE
rpool                        94.7G   819G  /rpool            128K
rpool/delegates              3.52G   819G  none              128K
rpool/delegates/oracle       2.36G  29.6G  none                8K
rpool/delegates/oracle/db      19K  29.6G  /db                 8K
rpool/delegates/oracle/junk  2.36G  29.6G  /junk               8K
rpool/zones                  2.48G   819G  none              128K
rpool/zones/oracle            248M   819G  /zones/oracle     128K
rpool/zones/oracle/ROOT       248M   819G  legacy            128K
rpool/zones/oracle/ROOT/zbe   248M   819G  legacy            128K

There's the filesystem with the 8k blocksize, as ordered. Time to go back to the global zone.

root@oracle:~# logout
oracle console login: ~.
Password: ~.
[Connection to zone 'oracle' console closed]

root@live:/# zfs set quota=32g rpool/delegates/oracle

root@live:/# zfs list | grep oracle
rpool/delegates/oracle         38K  32.0G   19K  none
rpool/delegates/oracle/db      19K  32.0G   19K  /db
rpool/zones/oracle            246M   821G   22K  /zones/oracle
rpool/zones/oracle/ROOT       246M   821G   19K  legacy
rpool/zones/oracle/ROOT/zbe   246M   821G  246M  legacy

And dump the oracle installation files into the oracle zone from the zip files in the global zone. They were retrieved out here because the global zone is currently the only one with a window manager and a browser. If I wanted to retrieve them from the command line, I'd still have to start a download from a browser where I could accept the license agreement and then transfer a cookie file to a command line tool, which is kind of a pain.

Mounting the Downloads directory with lofs probably would have worked too, but this is easier.
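
For the record, the lofs route would have been a single loopback mount from the global zone into the zone's root (hypothetical, untried here):

```shell
# Loopback-mount the global zone's Downloads directory into the zone.
mount -F lofs /export/home/gregory/Downloads /zones/oracle/root/junk
```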

root@live:/export/home/gregory/Downloads# unzip solaris.x64_11gR2_database_1of2.zip -d /zones/oracle/root/junk
root@live:/export/home/gregory/Downloads# unzip solaris.x64_11gR2_database_2of2.zip -d /zones/oracle/root/junk

I'll have to sort out a DNS issue and install additional packages before proceeding. That'll be in the next post.

Friday, August 6, 2010

Comcast wireless internet service


This is based on a true story. However, I would never consider poking around in sensitive cable company equipment, not even when the lid is left half-cocked and the cable sitting out on my lawn where it could be run over with a mower. Like this picture here, taken in my back yard.


I'm a long-time Comcast customer. When their broadband Internet service works, it's great - fast and reliable. But if something goes wrong, Comcast's support staff and contractors are apparently so compartmentalized that the support conversations can turn surreal.

I recently cancelled my cable TV but asked to keep my broadband Internet service. The customer representative I spoke with was simply astonished that I would stop watching television. "You went cold turkey on the TV?!" We both laughed. She explained that the broadband price goes up if you don't have a bundle, but I was cool with that. She gave me a "customer retention discount". Nice lady.

My bill arrived. Everything looked right. I didn't bother to check whether the cable TV had been turned off; I didn't care one way or the other.

Some time later, my Internet connection went down. That's not so unusual; every service must have maintenance periods. But this was in the middle of a weekday afternoon. It stayed down. I asked Danielle to call Comcast support, as I was feeling pretty grumpy.

Anyone who has ever had a Comcast outage surely knows this routine: You navigate a series of telephone voice response menus. A recorded message may insist that you consider their online support options, even though you've indicated that you're calling about a failure of Internet service. You confirm your phone number. While on hold, you may listen to the same ad for a pay-per-view boxing match once per minute, every minute, even though your phone number is sufficient to let them know you don't have cable TV on your account.

Eventually you speak to a person. The first thing you're asked for when you're connected is the phone number you were required to confirm before they picked up.

The call center script must look a lot like this:
  1. Ask the customer to reboot their cable modem.
  2. Ask the customer to reboot their router.
  3. Ask the customer to reboot their PC.
  4. If the problem isn't fixed, pass the buck and schedule a truck roll.
  5. There is no five. End the call politely but quickly.
What you will not get from most of the daytime phone support staff is technical support. God help you if you try to use networking jargon or make any attempt to help them diagnose the problem. Now, this isn't always true. I've talked to people in the call center who really seemed to know their stuff, and even though it sounded like they were being actively hobbled by whatever helpdesk software they're required to use, they do try to troubleshoot. But those techs mostly come out at night. Mostly.

I had an intermittent outage last spring that took a month and a half and innumerable phone calls to fix. It was a night tech who finally managed to jam "10% packet loss bad, mmkay?" through their escalation firewall and get someone to troubleshoot the neighborhood equipment instead of replacing my cable modem for a 4th time. It helped that he was a gamer. I was playing Age of Conan online at the time. The packet loss was virtually killing us. He understood.

The first phone support tech we talked to for this outage claimed that she could "see the modem" but "not see any traffic going up or down". That's very interesting, considering what I found later. She scheduled a truck roll for two days later.

Two days without net access. Ugh. That's just...ugh.

I thought the first tech's comment was a little odd, so I called back again later to try my hand at Call Center Roulette. The second tech said she couldn't see the modem at all. Which made sense to me, considering that the modem couldn't see Comcast.

I suppose most customers aren't aware that cable modems have an internal web service that'll tell you your cable modem's status. It's no secret, it just isn't advertised. My modem was telling me it had no signal at all. Like there was no cable plugged in. More fool me, I tried to tell the tech that the cable modem had logged the loss of sync that afternoon - yes, there's a log file in there, complete with helpful timestamped error messages - and it had been hunting for a frequency ever since.

There was a long pause. Then she asked me,
"Are there any lights on your cable modem?"



"..."

"...There's a light for 'power'. The light for 'cable' is flashing. Because it has no signal."

"Well-then-we-have-to-send-a-tech-out. To-check-your-modem." (She really did start speedtalking. I was headed off-script; it was time for her to end the call.)

"Oookay. This is a rental modem. If it's a problem with the modem, I can just drive to the local Comcast shop and have them swap it out, right?"

"Yes, you can do that sir."

"So let's say I do that. I swap the modem. I bring the new, Comcast-provided modem home and plug it in. Let's say this new modem can't see Comcast either. What do we do then?"

"... Sir, then we'd have to send a tech out to check your modem."

"I...see."

"Is-there-anything-else-I-can-help-you-with?"

I'm quite positive there isn't. Have a nice evening.

You may already have figured out what happened; by now I was getting suspicious. The next day, I checked all the connections. Upstairs, basement, wall box, and...out back.


aether-net?

The big silver thing on the right is a filter. That's my neighbor's cable. They have broadband Internet but no cable TV. The filter removes the frequencies used for TV while allowing those for data to pass through. Since I was no longer paying for TV, they had the option of putting one of these on my line as well.

That's not what happened.

Note the distinct lack of wire in that connector on the left. That's my line. This is what their phone techs were trying to diagnose by having me repeatedly reboot my PC and networking equipment. The metal cylinder on the end is a tamper-resistant locking cable terminator, which obviously isn't locked. In fact it was barely even screwed in. Why?

Because a Comcast contractor had tramped through my yard without notifying me and, contrary to instructions, put a terminator on the line to prevent me from stealing the cable I'm paying for. And then failed to lock it. Which meant a) their contractor was incompetent, no matter what his instructions were, and b) I could restore my Internet access by simply removing the damn thing.

But they'd already scheduled a truck roll for the next day, and I didn't want to mess with their equipment. I confess I also wanted to see the look on the field tech's face when he opened up the box and saw what was in there.

I called back again, explained that somebody had goofed, and asked what they expected me to do next. The tech seemed perplexed...why had my cable just gone out? The work order was from last month. I explained that he probably shouldn't trust the dates on his screen. During the previous outage, they'd had a tech run a new line from the back yard to the house (which achieved nothing) and scheduled someone else to bury it. Their contractor, unconcerned with the state law regarding underground utilities, dug a trench and planted the cable. The JULIE team came out two days later to spray paint lines and plant warning flags - to prevent the Comcast guy who'd already dug the trench from accidentally hitting a natural gas line.

I was put on hold for a while. When the tech came back, he told me I'd have to wait for tomorrow's truck roll and they'd get everything sorted. I asked if there was any way they could get it over with today, seeing as there was nothing actually wrong with any of the equipment. No, I'd have to wait until tomorrow. He signed off with an eloquent goodbye in a tone of extreme irony, explaining in detail how much Comcast valued their customers.

We weren't done yet. Now that this truck roll was go, it needed to be confirmed. Twice.

The first call was just to confirm that I really would like a tech to come out and fix the line they were now on record as having screwed up themselves. Yes, I'm still at the same phone number and address I was yesterday. No, my house has not suffered a spontaneous relocation to a different zip code.

As I was ready to hang up, he gave me an enthusiastic pitch for their email-based support solutions, presumably an initiative by a different department to reduce the cost of all these truck rolls. I couldn't help laughing.
"But I have no net access. With which to, you know, access email. A Comcast contractor disconnected it. That's why we're having this conversation."
Long pause.
(mildly condescending) "There are other ways to access email."
Well of course there are! I could drive to the public library, I could run out and drop a grand to buy a laptop and then use the WiFi at Starbuck's, or I could...let's see...really stretching for solutions here...call one of your competitors? WTF?

I was still laughing when he hung up.

One more robocall-directed conversation confirmed that I was willing to wait around all afternoon, and no, they couldn't give me an ETA more accurate than some time in a three-hour window (these guys don't have mobile phones and GPS? Really?). The truck came out on schedule. Hallelujah! I could finally stop yelling "INTERNETS!" at Danielle.

Of course this tech had been given absolutely no record of the previous conversations, so he didn't know why he was there. The Comcast helpdesk software must be beastly. No matter. He looked tired. I told him, "Hi. I'll save you some time. Let's head straight out back. Someone mistakenly put a terminator on my line."

He said, "Wonderful."

And removed it.