Thursday, August 19, 2010

Oracle on OpenSolaris, part 2

If this is going to be a sandbox, it needs more toys. Oracle requires certain packages. The installer still uses Motif. I'm ambivalent about installing a full Gnome desktop, but I'll want X and VNC. And compilers and GNU make.

All actions are performed in the oracle zone as root, until the point where I su to the newly created oracle user. I'll omit the command prompts and responses this time.

Packages

pkg install SUNWxorg-server
pkg install SUNWxorg-client-programs
pkg install SUNWxwopt
pkg install SUNWxwfsw
pkg install SUNWxvfb
pkg install SUNWxorg-graphics-ddx
pkg install SUNWxorg-headers
pkg install FSWxorg-fonts
pkg install SUNWmfrun
pkg install SUNWxvnc

pkg install SUNWman
pkg install SUNWgmake
pkg install SUNWbinutils
pkg install SUNWgcc
pkg install SUNWgzip
pkg install SUNWunzip
pkg install SUNWwget
pkg install SUNWbtool
pkg install SUNWzfs-auto-snapshot
pkg install pkg:/sunstudioexpress
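
As an aside, pkg(1) accepts multiple package names in a single invocation and resolves dependencies once, so the whole list above can be collapsed into one command (same packages, untested in this exact form):

```shell
# One-shot install of everything listed above.
pkg install SUNWxorg-server SUNWxorg-client-programs SUNWxwopt SUNWxwfsw \
    SUNWxvfb SUNWxorg-graphics-ddx SUNWxorg-headers FSWxorg-fonts \
    SUNWmfrun SUNWxvnc SUNWman SUNWgmake SUNWbinutils SUNWgcc SUNWgzip \
    SUNWunzip SUNWwget SUNWbtool SUNWzfs-auto-snapshot pkg:/sunstudioexpress
```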

Where's my manpage index? Grrr. This ought to be a post-install action for SUNWman.

catman -w

Oracle user and group

I suppose these will become reserved IDs when Oracle Corp gets around to releasing Solaris 11. I'm going to skip the oinstall and oper groups.
mkdir -p /export/home

groupadd -g 300 dba
useradd -m -d /export/home/oracle -g dba -u 300 -s /usr/bin/bash oracle
passwd oracle

mkdir /db/oracle
mkdir /db/oraInventory
chown oracle:dba /db/oracle
chown oracle:dba /db/oraInventory
chmod 755 /db/oracle
chmod 755 /db/oraInventory

System Parameter(s)

Prior to Solaris 10, there was some /etc/system tweaking to be done. Each version of Unix has its own set of required system-level tweaks and accompanying reboots for Oracle.

In Solaris 10+, there's only one key parameter to change, and it's handled through projects, a more fine-grained approach to resource management.

The box has 8GB of memory. The default maximum shared memory is 25% of physical memory, or 2GB here. Starting a database configured with 2GB of shared memory will fail with an ORA-27102: out of memory error. I'll up the limit to 4GB.

Get our current project ID
oracle@oracle:~$ id -p
uid=300(oracle) gid=300(dba) projid=100(user.oracle)

We'll need to be root to make these changes for the oracle user. Confirm the current shared memory limit:
root@oracle:~$ prctl -n project.max-shm-memory -i project 100
project: 100: user.oracle
NAME                    PRIVILEGE      VALUE  FLAG  ACTION  RECIPIENT
project.max-shm-memory
                        privileged    2.00GB     -  deny            -
                        system        16.0EB   max  deny            -

Set it to 4GB now. This takes effect immediately for the running project, but it won't survive a reboot.
root@oracle:~$ prctl -n project.max-shm-memory -r -v 4gb -i project 100

Set it to 4GB persistently, so it survives the next reboot:
root@oracle:~$ projadd -U oracle -K "project.max-shm-memory=(priv,4GB,deny)" user.oracle
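
One caveat, hedged since I haven't re-run this from scratch: the id -p output above already reports projid 100 for user.oracle, which suggests the project is already defined in /etc/project, and projadd refuses to create a project that exists. In that case the persistent change is made with projmod instead:

```shell
# Modify the existing user.oracle project rather than creating it.
projmod -s -K "project.max-shm-memory=(priv,4GB,deny)" user.oracle
```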

Confirm the changes
root@oracle:~$ prctl -n project.max-shm-memory -i project 100
project: 100: user.oracle
NAME                    PRIVILEGE      VALUE  FLAG  ACTION  RECIPIENT
project.max-shm-memory
                        privileged    4.00GB     -  deny            -
                        system        16.0EB   max  deny            -

root@oracle:~$ projects -l user.oracle
user.oracle
        projid : 100
        comment: ""
        users  : oracle
        groups : (none)
        attribs: project.max-shm-memory=(priv,4294967296,deny)
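
The attribs value is the same 4GB limit expressed in bytes; a quick shell sanity check:

```shell
# 4 GB in bytes -- matches the attribs value reported by projects -l.
echo $((4 * 1024 * 1024 * 1024))
```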

I've got a note here that a hack is needed to link libcrypto.so on OpenSolaris during the install, but I'm not sure if it applies here.
ln -s /lib/amd64/libcrypto.so /usr/sfw/lib/amd64
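
If that hack does turn out to be needed, a guarded version is safer: it won't clobber an existing file and can be run twice without harm. This is just a sketch; the helper name is mine.

```shell
# Create a symlink only if the source exists and the destination doesn't;
# prints what it did so reruns are obvious.
link_if_missing() {
    src=$1 dst=$2
    if [ -e "$src" ] && [ ! -e "$dst" ]; then
        ln -s "$src" "$dst" && echo "linked $dst"
    else
        echo "skipped $dst"
    fi
}

# usage (the OpenSolaris paths from the note above):
# link_if_missing /lib/amd64/libcrypto.so /usr/sfw/lib/amd64/libcrypto.so
```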

Installation

At this point I set up a VNC session as root so I can use the default Oracle installer. Now the oracle user needs permission to use the display. It's been a very long time since I needed to know this, so I spent a while staring at cryptic error messages before realizing the problem had nothing to do with the Oracle installer. A page of Java stack dump and complaints about permissions on /tmp/.X11-pipe actually mean "run xhost, dummy".
vncserver :1
xhost +local:oracle

Holy cats, that default window manager is ugly!

But it takes me back...late nights on Sun pizzabox workstations in a frigid computer room...spamming full-screen xeyes onto other students' monitors...hey, it's a lot more amusing when nobody in the room has slept in two days.


Away we go.
su - oracle
export DISPLAY=oracle:1.0
cd /junk/database
./runInstaller

The installer will complain because it does not recognize OpenSolaris and can't run any pre-flight checks. That's ok. Say yes. If you're missing a package, it'll probably become apparent later, when the installer invokes make.

I won't screenshot all of the installation screens, but here's a brief summary:

  • I skipped entering an email address. This won't be a supported installation, and in any case, we already have a problem if I have to hear about "critical security issues in my configuration" via email.
  • Unless I'm performing an upgrade, I always choose to install the database software only. I prefer to make sure I have basic functionality - in the past, this was by no means assured and often required a support call to Oracle to work around installation bugs - and run dbca later.
  • The system isn't prepped for RAC so this will be a single instance installation.  Perhaps that will be a different writeup.
  • I chose to install Enterprise, minus OLAP, which I never use. (Table partitioning is still an option with a separate price tag? Seriously?)
  • The Oracle Base directory is going to be /db/oracle and the software location /db/oracle/product/11.2.0.
  • The software will be owned by user oracle, group dba.

Post-Installation

Set up the oracle user's environment in ~/.profile.

export ORACLE_HOME=/db/oracle/product/11.2.0
export ORACLE_OWNER=oracle
export ORACLE_SID=toybox
export ORACLE_UNQNAME=toybox
export NLS_LANG=AMERICAN_AMERICA.UTF8

export PATH=/usr/gnu/bin:/usr/bin:/usr/X11/bin:/usr/sbin:/sbin:${ORACLE_HOME}/bin

export DISPLAY=oracle:1.0

Wednesday, August 18, 2010

Oracle on OpenSolaris, part 1

This is an annotated log of a development installation of Oracle 11g on OpenSolaris x64.

I've made a metric ton of notes on various installation, troubleshooting, and development sessions over the years. Most of them have ended up in text files and email. While there's something to be said for letting this knowledge fade as software changes and techniques become outdated, rather than allowing it to accumulate in a mountain of disorganized cruft, there's also something to be said for delegating the entire issue to Google. :)

It's been a while since I set up this box so I'll start by reviewing the network configuration.

gregory@live:~$ dladm show-phys
LINK        MEDIA      STATE    SPEED  DUPLEX  DEVICE
e1000g0     Ethernet   up       1000   full    e1000g0
e1000g1     Ethernet   unknown  0      half    e1000g1

gregory@live:~$ dladm show-ether
LINK        PTYPE    STATE    AUTO  SPEED-DUPLEX  PAUSE
e1000g0     current  up       yes   1G-f          bi
e1000g1     current  unknown  yes   0M-h          bi

gregory@live:~$ dladm show-vnic
LINK        OVER     SPEED  MACADDRESS        MACADDRTYPE  VID
vnic200     e1000g0  1000   2:8:20:d5:a6:43   random       0
vnic199     e1000g0  1000   2:8:20:ea:8a:76   random       0

Ok. I'll put a new vnic on e1000g0 for the new zone.

gregory@live:~$ pfexec dladm create-vnic -l e1000g0 vnic201

gregory@live:~$ dladm show-vnic
LINK        OVER     SPEED  MACADDRESS        MACADDRTYPE  VID
vnic200     e1000g0  1000   2:8:20:d5:a6:43   random       0
vnic199     e1000g0  1000   2:8:20:ea:8a:76   random       0
vnic201     e1000g0  1000   2:8:20:8b:63:ab   random       0

gregory@live:~$ dladm show-link
LINK        CLASS  MTU   STATE    OVER
e1000g0     phys   1500  up       --
e1000g1     phys   1500  unknown  --
vnic200     vnic   1500  up       e1000g0
vnic199     vnic   1500  up       e1000g0
vnic201     vnic   1500  up       e1000g0

Zone setup

I'll copy the configuration of an existing zone to save a bit of typing. I don't want to clone it, as it's a live zone containing a bunch of Rails instances.

gregory@live:~$ zonecfg -z rorpub export

create -b
set zonepath=/zones/rorpub
set brand=ipkg
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user
set ip-type=exclusive
add net
set physical=vnic200
end
add dataset
set name=rpool/delegates/rorpub
end

I'll just edit that instead.

I recall reading that zones and zfs are now supposed to play nice together, and that zone installation should automagically create a zfs filesystem for the zonepath.

At least on this (snv_111b) release, the "automatic" part must be taken with a grain of salt. What actually happens is that zoneadm install fails because it can't find the filesystem. Then zoneadm uninstall also fails because it can't find /ROOT under the filesystem it already failed to find, leaving you with no obvious way to proceed. This bug is probably fixed in newer builds. I have a boot environment patched to snv_134, but I can't reboot right now.

So I'll make sure the filesystem is created, mounted, and has root-only permissions.

gregory@live:~# pfexec zfs create rpool/zones/oracle
gregory@live:~# pfexec chmod 700 /zones/oracle

Hold on a minute. Typing pfexec for every. single. command. is bollocks. This is just as silly as Windows UAC spam; the extra step quickly becomes rote, making it even less likely that the person at the keyboard will show due care with privileged access. I may as well become root now and drop privileges when I'm done. I'd rather use su than pfexec bash, because my shell prompt is set up to remind me I'm root.

So if you see several pages of commands run as root in the future, don't assume it's because I'm too dense to use pfexec or sudo.

gregory@live:~# su -
Password:
Sun Microsystems Inc. SunOS 5.11 snv_111b November 2008
root@live:~# cd /

root@live:/# zfs set mountpoint=/zones/oracle rpool/zones/oracle

I want to delegate some space to this zone and make sure it has an 8k blocksize. I plan to install Oracle here, and using an 8k Oracle blocksize on top of a 128k ZFS recordsize will cause ZFS to read 16 times as much data as it needs on every random read. This is Bad.
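
The 16x figure is just the ratio of the default ZFS recordsize to the Oracle block size:

```shell
# 128k ZFS record / 8k Oracle block = worst-case read amplification.
echo $((128 / 8))
```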

There's more tuning to be done but this step should be done before any datafiles are created.

Oh, good. Oracle now properly supports ZFS (PDF, white paper, May 2010).  The "tuning is evil" philosophy used within ZFS used to cause serious conflicts with Oracle's philosophy that everything in the OS ought to be tuned around the RDBMS.

root@live:/# zfs create -o mountpoint=none -o recordsize=8k rpool/delegates/oracle
root@live:/# zfs list -o name,recsize,mountpoint,volsize,zoned | grep oracle
rpool/delegates/oracle     8K  none  -  off
rpool/zones/oracle       128K  none  -  off

Zone creation

root@live:/# zonecfg -z oracle
zonecfg:oracle> create -b
zonecfg:oracle> set zonepath=/zones/oracle
zonecfg:oracle> set brand=ipkg
zonecfg:oracle> set autoboot=true
zonecfg:oracle> set limitpriv=default,dtrace_proc,dtrace_user
zonecfg:oracle> set ip-type=exclusive
zonecfg:oracle> add net
zonecfg:oracle:net> set physical=vnic201
zonecfg:oracle:net> end
zonecfg:oracle> add dataset
zonecfg:oracle:dataset> set name=rpool/delegates/oracle
zonecfg:oracle:dataset> end
zonecfg:oracle> commit
zonecfg:oracle> verify
zonecfg:oracle> exit

root@live:/# zoneadm -z oracle install
Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/).
Image: Preparing at /zones/oracle/root.
Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
Installing: Core System (output follows)
DOWNLOAD                    PKGS       FILES    XFER (MB)
Completed                  20/20   3021/3021  42.55/42.55

PHASE                            ACTIONS
Install Phase                  5747/5747
Installing: Additional Packages (output follows)
DOWNLOAD                    PKGS       FILES    XFER (MB)
Completed                  37/37   5598/5598  32.52/32.52

PHASE                            ACTIONS
Install Phase                  7329/7329

Note: Man pages can be obtained by installing SUNWman
Postinstall: Copying SMF seed repository ... done.
Postinstall: Applying workarounds.
Done: Installation completed in 101.537 seconds.

Next Steps: Boot the zone, then log into the zone console
(zlogin -C) to complete the configuration process

root@live:/# zoneadm -z oracle boot
root@live:/# zoneadm list -cv
  ID NAME        STATUS     PATH               BRAND    IP
   0 global      running    /                  native   shared
   1 rorpub      running    /zones/rorpub      ipkg     excl
   2 oracle      running    /zones/oracle      ipkg     excl
   - barebones   installed  /zones/barebones   ipkg     excl

I should have scripted the system identification to avoid all the network config ESC-2-ing. We'll just pretend I did and move on.

System identification is completed.

oracle console login: root
Password:
Aug 17 20:59:31 oracle login: ROOT LOGIN /dev/console
Sun Microsystems Inc. SunOS 5.11 snv_111b November 2008


edit: Apparently I've been doing everything the hard way. I might have started with zonemgr instead. It didn't hurt to review the quirks of zonecfg in order to clarify why wrapping it in a script is a really good idea.

Now create a couple of filesystems under the delegated filesystem to hold the installation files and the installation.

root@oracle:~# zfs create -o mountpoint=/db rpool/delegates/oracle/db
root@oracle:~# zfs create -o mountpoint=/junk rpool/delegates/oracle/junk

root@oracle:/# zfs list -o name,used,avail,mountpoint,recsize
NAME                          USED  AVAIL  MOUNTPOINT     RECSIZE
rpool                        94.7G   819G  /rpool            128K
rpool/delegates              3.52G   819G  none              128K
rpool/delegates/oracle       2.36G  29.6G  none                8K
rpool/delegates/oracle/db      19K  29.6G  /db                 8K
rpool/delegates/oracle/junk  2.36G  29.6G  /junk               8K
rpool/zones                  2.48G   819G  none              128K
rpool/zones/oracle            248M   819G  /zones/oracle     128K
rpool/zones/oracle/ROOT       248M   819G  legacy            128K
rpool/zones/oracle/ROOT/zbe   248M   819G  legacy            128K

There's the filesystem with the 8k blocksize, as ordered. Time to go back to the global zone.

root@oracle:~# logout
oracle console login: ~.
Password: ~.
[Connection to zone 'oracle' console closed]

root@live:/# zfs set quota=32g rpool/delegates/oracle

root@live:/# zfs list | grep oracle
rpool/delegates/oracle         38K  32.0G   19K  none
rpool/delegates/oracle/db      19K  32.0G   19K  /db
rpool/zones/oracle            246M   821G   22K  /zones/oracle
rpool/zones/oracle/ROOT       246M   821G   19K  legacy
rpool/zones/oracle/ROOT/zbe   246M   821G  246M  legacy

And dump the oracle installation files into the oracle zone from the zip files in the global zone. They were retrieved out here because the global zone is currently the only one with a window manager and a browser. If I wanted to retrieve them from the command line, I'd still have to start a download from a browser where I could accept the license agreement and then transfer a cookie file to a command line tool, which is kind of a pain.

Mounting the Downloads directory with lofs probably would have worked too, but this is easier.
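
For the record, the lofs route would have been a single loopback mount from the global zone into the zone's root (hypothetical, untried here):

```shell
# Loopback-mount the global zone's Downloads directory into the zone.
mount -F lofs /export/home/gregory/Downloads /zones/oracle/root/junk
```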

root@live:/export/home/gregory/Downloads# unzip solaris.x64_11gR2_database_1of2.zip -d /zones/oracle/root/junk
root@live:/export/home/gregory/Downloads# unzip solaris.x64_11gR2_database_2of2.zip -d /zones/oracle/root/junk

I'll have to sort out a DNS issue and install additional packages before proceeding. That'll be in the next post.

Friday, August 6, 2010

Comcast wireless internet service


This is based on a true story. However, I would never consider poking around in sensitive cable company equipment, not even when the lid is left half-cocked and the cable sitting out on my lawn where it could be run over with a mower. Like this picture here, taken in my back yard.


I'm a long-time Comcast customer. When their broadband Internet service works, it's great - fast and reliable. But if something goes wrong, Comcast's support staff and contractors are apparently so compartmentalized that the support conversations can turn surreal.

I recently cancelled my cable TV but asked to keep my broadband Internet service. The customer representative I spoke with was simply astonished that I would stop watching television. "You went cold turkey on the TV?!" We both laughed. She explained that the broadband price goes up if you don't have a bundle, but I was cool with that. She gave me a "customer retention discount". Nice lady.

My bill arrived. Everything looked right. I didn't bother to check whether the cable TV had been turned off; I didn't care one way or the other.

Some time later, my Internet connection went down. That's not so unusual; every service must have maintenance periods. But this was in the middle of a weekday afternoon. It stayed down. I asked Danielle to call Comcast support, as I was feeling pretty grumpy.

Anyone who has ever had a Comcast outage surely knows this routine: You navigate a series of telephone voice response menus. A recorded message may insist that you consider their online support options, even though you've indicated that you're calling about a failure of Internet service. You confirm your phone number. While on hold, you may listen to the same ad for a pay-per-view boxing match once per minute, every minute, even though your phone number is sufficient to let them know you don't have cable TV on your account.

Eventually you speak to a person. The first thing you're asked for when you're connected is the phone number you were required to confirm before they picked up.

The call center script must look a lot like this:
  1. Ask the customer to reboot their cable modem.
  2. Ask the customer to reboot their router.
  3. Ask the customer to reboot their PC.
  4. If the problem isn't fixed, pass the buck and schedule a truck roll.
  5. There is no five. End the call politely but quickly.
What you will not get from most of the daytime phone support staff is technical support. God help you if you try to use networking jargon or make any attempt to help them diagnose the problem. Now, this isn't always true. I've talked to people in the call center who really seemed to know their stuff, and even though it sounded like they were being actively hobbled by whatever helpdesk software they're required to use, they do try to troubleshoot. But those techs mostly come out at night. Mostly.

I had an intermittent outage last spring that took a month and a half and innumerable phone calls to fix. It was a night tech who finally managed to jam "10% packet loss bad, mmkay?" through their escalation firewall and get someone to troubleshoot the neighborhood equipment instead of replacing my cable modem for a 4th time. It helped that he was a gamer. I was playing Age of Conan online at the time. The packet loss was virtually killing us. He understood.

The first phone support tech we talked to for this outage claimed that she could "see the modem" but "not see any traffic going up or down". That's very interesting, considering what I found later. She scheduled a truck roll for two days later.

Two days without net access. Ugh. That's just...ugh.

I thought the first tech's comment was a little odd, so I called back again later to try my hand at Call Center Roulette. The second tech said she couldn't see the modem at all. Which made sense to me, considering that the modem couldn't see Comcast.

I suppose most customers aren't aware that cable modems have an internal web service that'll tell you your cable modem's status. It's no secret, it just isn't advertised. My modem was telling me it had no signal at all. Like there was no cable plugged in. More fool me, I tried to tell the tech that the cable modem had logged the loss of sync that afternoon - yes, there's a log file in there, complete with helpful timestamped error messages - and it had been hunting for a frequency ever since.

There was a long pause. Then she asked me,
"Are there any lights on your cable modem?"



"..."

"...There's a light for 'power'. The light for 'cable' is flashing. Because it has no signal."

"Well-then-we-have-to-send-a-tech-out. To-check-your-modem." (She really did start speedtalking. I was headed off-script; it was time for her to end the call.)

"Oookay. This is a rental modem. If it's a problem with the modem, I can just drive to the local Comcast shop and have them swap it out, right?"

"Yes, you can do that sir."

"So let's say I do that. I swap the modem. I bring the new, Comcast-provided modem home and plug it in. Let's say this new modem can't see Comcast either. What do we do then?"

"... Sir, then we'd have to send a tech out to check your modem."

"I...see."

"Is-there-anything-else-I-can-help-you-with?"

I'm quite positive there isn't. Have a nice evening.

You may already have figured out what happened; by now I was getting suspicious. The next day, I checked all the connections. Upstairs, basement, wall box, and...out back.


aether-net?

The big silver thing on the right is a filter. That's my neighbor's cable. They have broadband Internet but no cable TV. The filter removes the frequencies used for TV while allowing those for data to pass through. Since I was no longer paying for TV, they had the option of putting one of these on my line as well.

That's not what happened.

Note the distinct lack of wire in that connector on the left. That's my line. This is what their phone techs were trying to diagnose by having me repeatedly reboot my PC and networking equipment. The metal cylinder on the end is a tamper-resistant locking cable terminator, which obviously isn't locked. In fact it was barely even screwed in. Why?

Because a Comcast contractor had tramped through my yard without notifying me and, contrary to instructions, put a terminator on the line to prevent me from stealing the cable I'm paying for. And then failed to lock it. Which meant a) their contractor was incompetent, no matter what his instructions were, and b) I could restore my Internet access by simply removing the damn thing.

But they'd already scheduled a truck roll for the next day, and I didn't want to mess with their equipment. I confess I also wanted to see the look on the field tech's face when he opened up the box and saw what was in there.

I called back again, explained that somebody had goofed, and asked what they expected me to do next. The tech seemed perplexed...why had my cable just gone out? The work order was from last month. I explained that he probably shouldn't trust the dates on his screen. During the previous outage, they'd had a tech run a new line from the back yard to the house (which achieved nothing) and scheduled someone else to bury it. Their contractor, unconcerned with the state law regarding underground utilities, dug a trench and planted the cable. The JULIE team came out two days later to spray paint lines and plant warning flags - to prevent the Comcast guy who'd already dug the trench from accidentally hitting a natural gas line.

I was put on hold for a while. When the tech came back, he told me I'd have to wait for tomorrow's truck roll and they'd get everything sorted. I asked if there was any way they could get it over with today, seeing as there was nothing actually wrong with any of the equipment. No, I'd have to wait until tomorrow. He signed off with an eloquent goodbye in a tone of extreme irony, explaining in detail how much Comcast valued their customers.

We weren't done yet. Now that this truck roll was go, it needed to be confirmed. Twice.

The first call was just to confirm that I really would like a tech to come out and fix the line they were now on record as having screwed up themselves. Yes, I'm still at the same phone number and address I was yesterday. No, my house has not suffered a spontaneous relocation to a different zip code.

As I was ready to hang up, he gave me an enthusiastic pitch for their email-based support solutions, presumably an initiative by a different department to reduce the cost of all these truck rolls. I couldn't help laughing.
"But I have no net access. With which to, you know, access email. A Comcast contractor disconnected it. That's why we're having this conversation."
Long pause.
(mildly condescending) "There are other ways to access email."
Well of course there are! I could drive to the public library, I could run out and drop a grand to buy a laptop and then use the WiFi at Starbuck's, or I could...let's see...really stretching for solutions here...call one of your competitors? WTF?

I was still laughing when he hung up.

One more robocall-directed conversation confirmed that I was willing to wait around all afternoon, and no, they couldn't give me an ETA more accurate than some time in a three-hour window (these guys don't have mobile phones and GPS? Really?). The truck came out on schedule. Hallelujah! I could finally stop yelling "INTERNETS!" at Danielle.

Of course this tech had been given absolutely no record of the previous conversations, so he didn't know why he was there. The Comcast helpdesk software must be beastly. No matter. He looked tired. I told him, "Hi. I'll save you some time. Let's head straight out back. Someone mistakenly put a terminator on my line."

He said, "Wonderful."

And removed it.