Diggin' technology every day

February 24, 2011

So easy it could be a toy, but it’s not

Filed under: General,Random Thought — Tags: — Nate @ 8:44 pm

I was at a little event thrown for the Vertica column-based database, as well as Tableau Software, a Seattle-based data visualization company. Vertica was recently acquired by HP for an undisclosed sum. I had not heard of Tableau until today.

I went in not really knowing what to expect, have heard good things about Vertica from my friend over there but it’s really not an area I have much expertise in.

I left with my mouth on the floor. I mean holy crap that combination looks wicked. Combining the world’s fastest column based data warehouse with a data visualization tool that is so easy some of my past managers could even run it. I really don’t have words to describe it.

I never really considered Vertica for storing IT-related data, and they brought up a case study with one of their bigger customers – Comcast who sends more than 65,000 events a second into a vertica database (including logs, SNMP traps and other data). Hundreds of terabytes of data with sub second query response times. I don’t know if they use Tableau software’s products or not. But there was a good use case for storing IT data in Vertica.

(from Comcast case study)

The test included a snapshot of their application running on a five-node cluster of inexpensive servers with 4 CPU AMD 2.6 GHz core processors with 64-bit 1 MB cache; 8 GB RAM; and ~750 GBs of usable space in a RAID- 5 configuration.
To stress-test Vertica, the team pushed the average insert rate to 65K samples per second; Vertica delivered millisecond-level performance for several different query types, including search, resolve and accessing two days’ worth of data. CPU usage was about 9%, with a fluctuation of +/- 3%, and disk utilization was 12% with spikes up to 25%.

That configuration could of course easily fit on a single server. How about a 48-core Opteron with 256GB of memory and some 3PAR storage or something? Or maybe a DL385G7 with 24 cores, 192GB memory(24x8GB), and 16x500GB 10k RPM SAS disks with RAID 5  and dual SAS controllers with 1GB of flash-backed cache(1 controller per 8 disks). Maybe throw some Fusion IO in there too?

Now I suspect that there will be additional overhead with trying to feed IT data into a Vertica database since  you probably have to format it in some way.

Another really cool feature of Vertica – all of it’s data is mirrored at least once to another server, nothing special about that right? Well they go one step further, they give you the ability to store the data pre-sorted in two different ways, so mirror #1 may be sorted by one field, and mirror #2 is sorted by another field, maximizing use of every copy of the data, while maintaining data integrity.

Something that Tableu did really well that was cool was you don’t need to know how you want to present your data, you just drag stuff around and it will try to make intelligent decisions on how to represent it. It’s amazingly flexible.

Tableu does something else well, there is no language to learn, you don’t need to know SQL, you don’t need to know custom commands to do things, the guy giving the presentation basically never touched his keyboard. And he published some really kick ass reports to the web in a matter of seconds, fully interactive, users could click on something and drill down really easily and quickly.

This is all with the caveat that I don’t know how complicated it might be to get the data into the database in the first place.

Maybe there are other products out there that are as easy to use and stuff as Tableau I don’t know as it’s not a space I spend much time looking at. But this combination looks incredibly exciting.

Both products have fully functional free evaluation versions available to download on the respective sites.

Vertica licensing is based on the amount of data that is stored (I assume regardless of the number of copies stored but haven’t investigated too much), no per-user, no per-node, no per-cpu licensing. If you want more performance, add more servers or whatever and you don’t pay anything more. Vertica automatically re-balances the cluster as you add more servers.

Tableau is licensed as far as I know on a named-user basis or a per-server basis.

Both products are happily supported in VMware environments.

This blog entry really does not do the presentation justice, I don’t have the words for how cool this stuff was to see in action, there aren’t a lot of products or technologies that I get this excited about, but these has shot to near the top of my list.

Time to throw your Hadoop out the window and go with Vertica.


  1. Nice post!!!!! Obviously!

    Comment by Felipe — February 25, 2011 @ 5:06 pm

  2. My company uses Tableau for visualizing BI data (from an MSSQL back end), I don’t know jack about Tableau but my team is currently thinking about and sort of evaluating Splunk… for kind of the same thing. Reading this makes me think that Tableau might also be an option. Ever seen a Splunk demo? Am I on track here?

    If you end up doing anything further with Tableau, do write about it or drop me a private e-mail.

    Comment by Dan — February 25, 2011 @ 7:27 pm

  3. Thanks for the post !

    I’ve been using Splunk for about 3 years now yeah, I know it pretty well. It is a powerful tool, though it’s report generation abilities don’t seem to compare to Tableau as far as what you can do, how fast you can do it and how easy it is to do it. Splunk is really good in my opinion for viewing raw log data, e.g. if developers need access to production logs Splunk provides a nice user interface to look at logs, but not nearly as powerful for reporting.
    Some splunk users have really done some amazing things with it, I remember one use case where technical support people had access to production logs and when a user called in they could in near real time help the user find the right product they were searching for (I think that was with Blue Nile). I’ve never come across a technical support team that had that level of real time access to log data.
    When I write reports in Splunk for anything more than the most trivial stuff, it really makes my brain hurt, it gets complicated really fast. I’m not a data analysis person at heart that is not my area of expertise, yet with Splunk I feel like I need someone like that full time to leverage it’s potential. Tableau would for the most part eliminate that gap based on what I saw.

    With the caveat of course Splunk makes it really easy to get data into the system that’s what it’s built for. I’m not sure what I would do at this point if I was going to get log data and stuff into something like Vertica for reporting with Tableau.

    I would love to have a merged product, Splunk’s IT integration, with a Vertica back end(for performance) and a Tableau way of generating reports, that would be a dream come true.

    Comment by Nate — February 25, 2011 @ 8:02 pm

  4. I’m learning something new here every post. Thanks to all.

    Has anyone else checked out Ingres VectorWise?

    That one fell off our radar because to say the least, they’re not communicative. I did try. But it looks interesting tech.

    It’s not of much general interest i suppose, but they mention a customer who is the son of Felix Rohatyn, who was a name to conjour with in finance, so i doubt his boy is a slouch at anything.

    all best to everyone,

    – john

    Comment by John (other John) — March 11, 2011 @ 6:43 am

  5. I seriously love your website.. Excellent colors
    & theme. Did you develop this web site yourself? Please reply back as I’m looking to create my own personal website and would love to learn where you got this from or just what the theme is called. Appreciate it!

    Comment by HostGator Reseller Review — April 28, 2013 @ 8:49 pm

  6. […] was first introduced to Tableau (and Vertica) a couple of years ago at a local event in Seattle. Both products really […]

    Pingback by Big pop in Tableau IPO « — May 17, 2013 @ 9:36 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress