Gurjeet [tech]: 2010

Enabling VT-x in BIOS (the tricky bit)

I have been having a problem with my laptop in my office network; it works everywhere else but when inside my office, my Vista BSODs for no good reason. The wireless network driver is always listed on the BSOD screen when this happens. I tried reinstalling the driver, turning off different combinations of system software, work in safe mode, turning off fancy features in BIOS, and even prayed to gods, but nothing worked; BSOD every single time (sometimes as early as the welcome screen, and sometime as much as 4 hours later). So I decided to update my drivers.

Luckily, Lenovo has a utility that will let you update all drivers at once, instead of having to hunt for each one. I used the 'System Update' utility and it promptly showed all the drivers that were updated since I got hold of this machine about an year ago; the updates also included a BIOS update. I happily obliged and installed all the updates.

While playing with BIOS before the updates, I had turned off the VT-x, which I remembered, so I went back and after a reboot I switched it back on. But lo and behold, my VirtualBox would not see that setting enabled and would refuse to start any VM that has more than 1 CPU. (It complained VERR_VMX_MSR_LOCKED_OR_DISABLED)

I read on the VirtualBox forum that switching VT-x off and back on worked for one user, so I tried that trick a few times but to no avail. I was about to rollback my BIOS firmware manually (at the risk of bricking my machine) but just then I read this advice in a VMWare post:

Because the VT-x setting is typically locked at power on, it is necessary to fully power down the system after changing any VT-x options in the BIOS. A simple reboot is not sufficient! After saving your BIOS changes, I recommend that you either switch off the power supply itself or pull the power cord(s) out of the wall and wait ten seconds. For laptop systems, you may have to remove the battery as well, although such extreme measures are rarely necessary.

So the trick was to not restart the laptop, but to turn it off, hold your breath for a few seconds and then start it again.

VT-x and my virtual machines have been working fine since then, but my BSOD situation is yet to be evaluated; I hope it works when I go there in a few days.

SQL: Using SUM() to better EXISTS()'s performance

I have a an IRC logger running on a personal server, and I log all IRC conversations in a Postgres table named 'irclog'. The table was getting huge, I started running out of disk space, so I decided to delete logs of all those channels that I never participated in.

Table "irclog"
  Column   |           Type           |                      Modifiers
-----------+--------------------------+-----------------------------------------------------
 id        | integer                  | not null default nextval('irclog_id_seq'::regclass)
 channel   | character varying(30)    |
 day       | character(10)            |
 nick      | character varying(40)    |
 timestamp | integer                  |
 line      | text                     |
 spam      | boolean                  | default false
 time      | timestamp with time zone | default now()
Indexes:
    "irclog_pkey" PRIMARY KEY, btree (id)
    "time_idx" btree ("time")

I tried a few different queries, and tried to optimize them using indexes (single- and multi-column).

select channel, exists(select 1
                       from irclog as i
                       where i.channel = o.channel
                       and i.nick = o.nick) as I_interacted
from irclog as o
where o.nick = 'gurjeet';

Sometimes I had to drop a previously created index as I was running out of disk.

But because the table was huge for my machine (1.3 GB of table data on 128 MB VPS hosted machine), the queries were taking too long, and I had to cancel them after running at max for 20 minutes. I knew the full table scans take only about 58 seconds, and aggregate queries to check most recent activity in channels took about 60 seconds, so I was targeting to make my "interacted" query run in almost the same time.

After a few failed trials, I finally tried this query:

select channel,
       sum( (nick='gurjeet')::int ) as I_interacted,
       count(*) as channel_rows
from irclog
group by channel
order by 3 asc;

This query ran in the same time as a full table scan, and gave me everything I needed to know; which channels I have never interacted with, and which of those I should focus on deleting to free up the most space.

A DELETE and a VACUUM FULL later, I was back to about 65% disk usage, and table size reduced by 50%.

This goes on to show that indexes are not always the solution to long-running queries, and that SQL is tricky, you need to know the right question to get the right answer.

I guess this is the reason we can devise an 'Index Advisor', but don't have any way of implementing a 'Query Advisor'.

Postgres can: Partial indexes

This is another in the series of Postgres' unique capabilities, which you might not find in other RDBMS products (sometimes not even in commercial products!).

I have been a fan of partial indexes since I came across them about 3 years ago, and to show the utility of this unique feature here's a post from Command Prompt's JD explaining a unique use-case and how Partial Indexes can be used to solve the problem:

http://www.commandprompt.com/blogs/joshua_drake/2010/06/where_bing__t/

The post above does not go into details of how partial indexes solve the problem (JD even mistakenly calls it expression-indexes), and neither will I try to explain it since TFM at postgres.org explains it quite well:

http://www.postgresql.org/docs/current/static/indexes-partial.html

All hail Postgres.