Monday 18 November 2013

Where can I download m2eclipse? The drag-to-install button does not work

Open Eclipse and go to

Help -> Install New Software


Check both boxes, then click Next and Finish.


Thursday 7 November 2013

nosql with sql - prestodb is open sourced


Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.


Presto allows querying data where it lives, including Hive, HBase, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is targeted at analysts who expect response times ranging from sub-second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.


Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that together scan over a petabyte per day.

Leading internet companies including Airbnb and Dropbox are using Presto.

"Presto is amazing. Lead engineer Andy Kramolisch got it into production in just a few days. It's an order of magnitude faster than Hive in most our use cases. It reads directly from HDFS, so unlike Redshift, there isn't a lot of ETL before you can use it. It just works."
Christopher Gutierrez, Manager of Online Analytics, Airbnb

"We're really excited about Presto. We're planning on using it to quickly gain insight about the different ways our users use Dropbox, as well as diagnosing problems they encounter along the way. In our tests so far it's been rock solid and extremely fast when applied to some of our most important ad hoc use cases."
Fred Wulff, Software Engineer, Dropbox

Wednesday 6 November 2013

bingiton results

To be honest, I am surprised Google got 5 out of 5, but it was a fair test.

My picks alternated between the left and right sides, so the test does not keep Bing on one side and Google on the other.

Some insight into why I made my choices:

I did not like eBay results coming top.

When I searched for my blog I got no results on Bing! But then it is hosted on Google :)

For MongoDB I preferred results that pointed me towards the MongoDB documentation rather than Stack Overflow answers, which should rightly appear a bit further down the results.

The other searches were just better quality results in my opinion.

Try it yourself:

bingiton results

Tuesday 5 November 2013

MongoDB - Find items where a property in a nested array does not contain a string

Given the following JSON document (below) stored in MongoDB, how can I find elements which meet all of the conditions below?

SearchTerm is equal to a specified string
DisplayURL in the nested Ads collection does not contain a specified string
Position in the nested Ads collection does not contain a specified string

The query to do this is at the bottom of the page. First a bit of background.

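My MongoDB instance contained 3 million of these records. The first time I ran the query, it took around 7-8 seconds to complete. I then created a compound index (both fields have to go in the first argument, because the second argument of ensureIndex is for options):

db.extractedads.ensureIndex({ SearchTerm: 1, "Ads.DisplayURL": 1 })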

It then took less than 0.1 seconds to run.

Here is the json document I am using:

{
  "LocalDay": "2013-11-05T00:00:00",
  "PageIndex": 1,
  "PageId": 961425400,
  "Ads": [
    {
      "Rating": "",
      "DisplayDomain": "",
      "Title": "(ACE) Top 25 at home exercises - American Council On Exercise",
      "ClickURL": "http:\/\/\/acefit\/fitness-programs-article\/2863\/Top-25-At-Home-Exercises\/",
      "Rank": 1,
      "DisplayURL": "",
      "Description2": "",
      "UrlDomain": "",
      "Description1": "",
      "Position": "N"
    },
    {
      "Rating": "",
      "DisplayDomain": "",
      "Title": "The Ultimate Home Workout - Shape",
      "ClickURL": "http:\/\/\/fitness\/workouts\/ultimate-home-workout",
      "Rank": 3,
      "DisplayURL": "",
      "Description2": "",
      "UrlDomain": "",
      "Description1": "",
      "Position": "N"
    }
  ],
  "PagesDeep": 1,
  "DBStoreTime": "2013-11-05T09:56:12.627",
  "HTML": "",
  "LocationId": 2,
  "VariantId": 2,
  "NumNatural": 10,
  "SearchTerm": "exercise at home",
  "BrowseTime": 1516,
  "ServerId": 2454,
  "Blocked": "False",
  "Page1TopAds": "",
  "EngineId": 6,
  "utcstoretime": 1383645873,
  "SearchTermId": 1643017
}

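The working query. The second argument to find is a projection, so each document that matches SearchTerm comes back with only the first Ads element that satisfies both conditions:

db.extractedads.find({
  "SearchTerm" : "some search"
}, {
  "Ads": {
    $elemMatch: {
      Position: { "$regex" : "^((?!somestring).)*$" },
      DisplayURL: { "$regex" : "^((?!somestring).)*$" }
    }
  }
})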

Or in one line so that you can run in the mongodb console:

db.extractedads.find({"SearchTerm" : "somestring" }, {"Ads": { $elemMatch: { Position: { "$regex" : "^((?!somestring).)*$" }, DisplayURL: { "$regex" : "^((?!somestring).)*$" } } } })
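
To see why the regex works without needing a MongoDB instance, here is a rough sketch in plain JavaScript. It is not the server's implementation, only an approximation of what the query does: the pattern places a negative lookahead before every character, so the whole string matches only when the excluded text never appears. The helper names (notContaining, findAds), the _id values and the DisplayURL values in sampleDocs are made up for illustration.

```javascript
// Build the same pattern used in the query: a negative lookahead that is
// checked before every character, so the match fails if `term` occurs anywhere.
function notContaining(term) {
  // Escape regex metacharacters so the term is treated as literal text.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^((?!" + escaped + ").)*$");
}

// Approximate find(filter, { Ads: { $elemMatch: ... } }) on an in-memory array:
// filter on SearchTerm, then keep only the FIRST Ads element where both
// Position and DisplayURL are strings that do not contain `excluded`.
function findAds(docs, searchTerm, excluded) {
  const re = notContaining(excluded);
  const ok = (value) => typeof value === "string" && re.test(value);
  return docs
    .filter((doc) => doc.SearchTerm === searchTerm)
    .map((doc) => {
      const hit = (doc.Ads || []).find((ad) => ok(ad.Position) && ok(ad.DisplayURL));
      // When no element qualifies, the Ads field is left out of the result.
      return hit ? { _id: doc._id, Ads: [hit] } : { _id: doc._id };
    });
}

// Hypothetical sample data (the DisplayURL values are invented).
const sampleDocs = [
  {
    _id: 1,
    SearchTerm: "exercise at home",
    Ads: [
      { Position: "N", DisplayURL: "www.example.com/fitness" },
      { Position: "N", DisplayURL: "www.other.org/workouts" },
    ],
  },
  { _id: 2, SearchTerm: "something else", Ads: [] },
];

console.log(JSON.stringify(findAds(sampleDocs, "exercise at home", "example")));
```

Note that a pattern like this cannot use an index on the nested field to narrow things down, so the speed-up from the index most likely comes from the SearchTerm match.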