Wednesday, June 13, 2012

Hadoop Summit San Jose June 13-14, 2012

Hadoop Summit is taking place in San Jose, California on June 13 and 14. Some sessions are interesting, some not so much.

An observation about the organization - many things are distributed, in the spirit of Hadoop's distributed nature. For example, the one big hall for lunch and the presenters' booths is at one end of the building and the sessions are at the other end, so people have to walk there and back. Another example is lunch: the boxes with sandwiches are on one side of the hall and the soda is on the other...

There are almost no power sockets to plug in your laptop, only a couple of them along the walls.

Several sessions were over capacity. I couldn't get into some of them.

But anyway here are some session notes:



Hadoop session notes


== AWS (Amazon Web Services) big data infrastructure



  • Netflix streams data from S3 directly into MapReduce (w/o HDFS) and back
  • Netflix bumps up from 300 to 400+ nodes over the weekend
  • Netflix has an additional query cluster
  • Cheaper Experimentation = Faster Innovation
  • Logs are stored as JSON in S3
  • Honu is a tool that aggregates logs and makes them available as Hive tables for analysts: https://github.com/jboulon/Honu


Another customer, a climate prediction company:

  • Provision a cluster, send data, run jobs, shut down the cluster.


Case study: Airbnb (find a place to stay) - they moved from RDS to DynamoDB (Amazon's NoSQL database)
and use S3 for data storage



== Unified Big Data Architecture: Integrating Hadoop within an Enterprise Analytical Ecosystem - Aster


Different data:

  • stable schema (structured) - data from RDB's, ... Use Teradata or Hadoop sometimes
  • evolving schema (semi-structured) - web logs, twitter stream, ... Hadoop, Aster for joining with structured data and for SQL+MapReduce
  • no schema (unstructured), PDF files, images,... Hadoop, sometimes Aster for MapReduce Analytics



Aster SQL-H - for business people

  • ANSI SQL on Hadoop data
  • through HCatalog it connects to Hive and HDFS




== Scalding (new Hadoop language from Twitter)


  • it looked to me like a library for Scala and Cascading
  • it can read/write from/to HDFS, DBs, MemCache, etc...
  • the model is similar to Pig and coding style is similar to Cascading
  • you can develop locally without shipping to Hadoop
  • I was actually losing track of whether the guy was talking about Scala, Cascading or Scalding, because of my lack of knowledge of these things
  • Scala is a language for writing, not reading (personal impression)



== Microsoft Big Data



  • Microsoft wants to make sure that Hadoop works well on Azure as well as on Windows
  • On Azure it has a neat UI for administration and data processing
  • It has Hive console to create and manage Hive tables
  • It's all on http://hadooponazure.com



  • Integrating Excel with hadooponazure: you download an ODBC driver for Hive and connect your Excel to Hive data.
  • Then you can query Hive data and pull it into Excel. The Excel document is then uploaded to SharePoint, where you can do all sorts of reporting, pivoting and charting. Once you republish the document to SharePoint, you can schedule it to refresh itself from Hadoop at a certain cadence.


.NET also has a neat way to submit Hive jobs programmatically.

JavaScript can call Hadoop jobs from the "Interactive JavaScript console" in hadooponazure.com. You can query Hive, parse the results into JSON and then graph them.

Hadoop you do? I am fine... -- funny sentence.

Overall: Microsoft did a good job in bringing Hadoop to the less technically prepared people.

== Hadoop and Cloud @ Netflix

  • They recommend movies based on Facebook (user's profile, friends)
  • Everything is personalized
  • 25M+ subscribers
  • 4M/day ratings
  • Searches: 3M/day
  • Plays: 30M/day
They use
  • Hadoop
  • Hive
  • Pig
  • Java
They use a "Markov chains" algorithm.
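
The session notes don't say how Netflix applies Markov chains, so the following is only a rough sketch of the general idea, not their method: a first-order Markov chain counts which title tends to follow which, and suggests the most likely next one. The function names and the viewing histories are invented for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each title is directly followed by another one."""
    counts = defaultdict(Counter)
    for titles in sessions:
        for current, following in zip(titles, titles[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(counts, current):
    """Normalize the raw counts for one title into probabilities."""
    total = sum(counts[current].values())
    if total == 0:
        return {}
    return {title: n / total for title, n in counts[current].items()}


def recommend_next(counts, current):
    """Return the most likely next title, or None if the title was never seen."""
    probs = transition_probabilities(counts, current)
    if not probs:
        return None
    return max(probs, key=probs.get)


# Hypothetical viewing histories, invented for this example.
SESSIONS = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
]

counts = build_transitions(SESSIONS)
print(recommend_next(counts, "A"))  # prints B (it follows A in 2 of 3 histories)
```

A real recommender would of course combine many more signals; this only shows what "Markov chain" means in this context.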


== Sqoop 2



  • It moves data from/to relational and non-relational databases
  • It's much easier to use than Sqoop 1
  • It has an admin UI panel
  • It's now client-server, as opposed to the client-only Sqoop 1
  • It's easier to integrate with Hive and HBase. In fact you can not only move data from databases to HDFS but also move it further into Hive tables or HBase tables
  • It is going to be more secure





Tuesday, February 21, 2012

Could not get Chrome DLL version RelaunchChromeBrowserWithNewCommandLineIfNeeded

This error can occur in many circumstances. In my case I was running chrome.exe from the command line to pack a Chrome extension. But searching the internet shows that this error occurs in many different environments and circumstances. What worked for me:

  1. Go to the Chrome installation directory C:\Users\yourusername\AppData\Local\Google\Chrome\Application
  2. It has the following directory inside: "17.0.963.56" - the name of directory on your computer can be different
  3. Simply add the full paths of both the Chrome application directory and this build directory to your "PATH" environment variable.

Git to Hudson: Please tell me who you are

Here is the error that Git gives Hudson sometimes:

Caused by: hudson.plugins.git.GitException: Error performing command: git.exe tag -a -f -m Hudson Build #4 hudson-seo-plugin-4
Command "git.exe tag -a -f -m Hudson Build #4 hudson-seo-plugin-4" returned status code 128:
*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.


And here is the solution:

  1. Go to your project
  2. Click Configure
  3. Under "Source Code Management" -> Git click on "Advanced" button
  4. Enter required data in "Config user.name Value" and "Config user.email Value"
  5. Click "Save" at the bottom of the page.
  6. Restart your build by clicking "Build Now"
Please note that setting this information at the Hudson (not job) level didn't help me.

Wednesday, February 08, 2012

Photoshop Transparent Background in CS5

I was trying to remove the background from a couple of icons. The internet is full of articles that tell you to use the Magic Wand and then press Delete. I did that, but instead of deleting the background it would show me the following dialog:



and whatever I did - changed Contents or Blending, changed Opacity - nothing helped. I simply could not delete the background!

After an hour of agony I decided to try another tool, the Background Eraser Tool, and while I was opening it I accidentally noticed yet another tool, the Magic Eraser Tool. Select it and click on your background - it's going to work.

Tuesday, October 11, 2011

Grails vs. Play Framework comparison


By Mikhail Gavryuchkov


I like both frameworks: Grails, or Groovy on Rails, and Play framework, which I would call Java on Rails.

I went through both tutorials: Grails Quick Start and Getting Started Book for Grails and Play tutorial.

I developed one application http://managers.internetpolyglot.com/articlemanager in Grails; I haven't developed anything in Play yet. This is the middle of October 2011, which means that if I continue with either of these frameworks I'll add or modify information on this page.

I am a pretty seasoned Java developer with exposure to web technologies and I did my share of Struts, Spring MVC and JSF. I have a website http://www.internetpolyglot.com which is written in AppFuse (thank you Matt Raible for this awesome framework that was the precursor of both Grails and Play) and needs redevelopment very badly. What to choose for the new generation of Internet Polyglot?

As a Java developer I most probably will not go for any of the following - Ruby on Rails, Django (Python), Perl, or even PHP. I want to stay on the JVM. My two most recent projects used Spring MVC, Spring and Hibernate - a big pain in the butt! I am so tired of XML configuration files. The initial start is much faster in Grails and Play. Grails' language is Groovy, not Java, but it's very Java-like. Sometimes it looks a bit creepy to my Java-trained eye, but more often than not it's quite intuitive.

If you google for "Grails vs. Play" you'll find many posts - most of them praise Play as a great replacement for Grails and only a couple of them say that they've hit some serious problems on live projects developed on Play.

Below is the matrix of comparison between Grails and Play framework, based on my personal (no doubt limited and subjective) experience.


Plugins/Modules

Grails: Grails has Grails plugins. They are simply amazing. And the way they are organized on the site is also very good. It's very important for me to see the grades. Unfortunately there are no reviews - it would be helpful to read reviews about different plugins. I used at least two plugins - Spring security plugin and Searchable plugin. Spring security worked like a charm although I had to spend some time on adjusting it to the legacy database tables. The Searchable plugin worked so-so. Whatever I did I couldn't customize the search results (or maybe I didn't spend enough time?). I spent at least one evening browsing the plugins directory and thinking "aha, I'll need this plugin some time in the future. And this one too. And that!".

Play: Play framework has a similar concept to plugins - modules. Some of the functionality is already included in core Play and Play modules are quite the same as plugins. Except - the modules directory doesn't show grades or reviews so it's harder to make a choice - you are on your own, take your own risk when choosing this or that module.

Who wins: Grails


Language


Grails: Groovy. Very much Java-like. You can even write Java code instead of Groovy. But Groovy is more concise, which is good and bad. Good because emmm... it's good. But bad because sometimes it becomes less readable, less intuitive. Groovy is a dynamic scripting language. Again - good because it allows more freedom and bad because many mistakes are not caught while you are writing the code. You will find out whether you made this stupid typo later, when you run the app. It sucks, to be honest. It means that to maintain reasonable reliability your project needs a lot of unit and integration tests.

Play: Java. Some say that Java is a dying language and there are many signs of it. But I still love it. I love that I can use Eclipse and it helps me so very much - autocomplete, navigation from class to class or from method to method, showing javadoc, etc. And yes, syntax errors are found at compile time.

Scala. I don't know this language. Yet. But it seems I'll learn it sooner or later because its adoption is growing very fast. A couple of things (heard from others): the learning curve is steeper but after that you are in coding heaven. Play recently started supporting Scala.




Who wins: Play


IDE integration

Grails: Eclipse - there is a plugin that is part of SpringSource Tool Suite. I tried it and I didn't like it. It highlighted perfectly valid lines of code as syntax errors. Autocomplete was not working well either.

IntelliJ IDEA - muuuuuch better. There was just one hiccup when I was setting up the environment. After that development was much smoother. The downside - it's not free. But to me it's worth it; if I am to go with Grails I will definitely develop in IDEA.

Play: Eclipse works out of the box. Without any plugins or anything else. And works perfectly. Just don't forget to run "play eclipsify" every time you add a module.

Who wins: Play


Search

Grails: Searchable plugin - as I wrote earlier I didn't quite like it. There are a couple more plugins that allow you to make contextual search in your app but anything I tried I didn't like.

Play: I don't know. The tutorial didn't show anything on it. There are a couple of modules that I haven't tried. So I guess it's a tie.

Who wins: Tie


Security

Grails: I used the Spring Security plugin and liked it very much. I had to do some customizations to make it work with legacy database tables, but in the end it works well and provides role-based security.

Play: Mmmmm... Not so good. The Adding authentication tutorial works but provides only admin/non-admin authorization. What about other roles? Most web applications require role-based authorization.

Who wins: Grails
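
To make the difference between admin/non-admin and role-based authorization concrete, here is a minimal, framework-neutral sketch in Python. It is not how the Spring Security plugin or Play's tutorial implement it; the roles and permission names are invented for illustration.

```python
# Hypothetical role-to-permission table; all names are made up for this sketch.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "manage_users"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(user_roles, permission):
    """Return True if any of the user's roles grants the requested permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles
    )


print(is_allowed(["editor"], "write"))          # True
print(is_allowed(["viewer"], "write"))          # False
print(is_allowed(["viewer", "admin"], "manage_users"))  # True
```

With only an admin flag there are just two columns in this table; with roles you can add as many as the application needs.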


Tutorial (Documentation)



Who wins: Tie


Persistence

Grails: GORM (Grails Object-Relational Mapping). I love it. It's so simple, especially when you come from Hibernate, where you have to write every one of your freaking finders.

Play: Very good too. Although the syntax sometimes is not as elegant as Grails' GORM. But it's just a matter of taste.

Who wins: Tie
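
What makes GORM pleasant is that finders such as findByTitle are derived from the property name instead of being written by hand. The sketch below is only a rough Python analogue of that idea, working on an in-memory list rather than a database. The Repository class, the find_by_ prefix and the sample rows are all assumptions made up for this example.

```python
class Repository:
    """In-memory stand-in for a table; rows are plain dictionaries."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so it can
        # synthesize find_by_<field> methods on demand.
        prefix = "find_by_"
        if not name.startswith(prefix):
            raise AttributeError(name)
        field = name[len(prefix):]

        def finder(value):
            return [row for row in self._rows if row.get(field) == value]

        return finder


# Hypothetical sample data.
books = Repository([
    {"title": "Dune", "author": "Herbert"},
    {"title": "Emma", "author": "Austen"},
    {"title": "Persuasion", "author": "Austen"},
])

print(books.find_by_author("Austen"))
print(books.find_by_title("Dune"))
```

Nobody wrote find_by_author or find_by_title; they exist because the rows have those fields, which is the convenience the Grails entry above is praising.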


Scaffolding

Grails: Quick scaffolding of an application that provides a basic CRUD functionality.

Play: Scaffolding is not so bad either. However, out of the box the one-to-many relationship is not provided in the scaffolded app, i.e. there is no way to navigate to a child entity from a parent entity.

Who wins: Grails


Template engine

Grails: I like the syntax of GSP (Grails Server Pages) better. It resembles good ol' JSTL.

Play: Syntax (ironically) is Groovy-based and sometimes it's quite hard to read. But I guess it's a matter of getting used to.

Who wins: Grails


JQuery/Ajax

Grails: I tried it, it works pretty well.

Play: Haven't tried yet. Modules repository doesn't give much.

Who wins: Grails


Servlet API

Grails: Has it.

Play: Multiple posts about Play praise its creators for their boldness in dumping the Servlet API and making it purely stateless. It's hard to say for sure, but does it mean that I won't be able to access my request or session? I googled a bit more and I see that yes, you can access your request and session, don't worry.

Who wins: Tie


SEO


Grails: Oh yes, we need to have SEO-friendly URLs. For example instead of http://www.internetpolyglot.com/lessons/es/en I want to have http://www.internetpolyglot.com/lessons-spanish-english . It seems that Grails has it, although I haven't tried it myself.

Play: There are a couple of Stack Overflow answers: 1 and 2. So I guess it's possible. The only thing - I don't like routes. I guess I need to read more about them but I don't get them right now after completing the tutorial.

Who wins: Tie
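
The SEO-friendly URL from the Grails entry above boils down to a two-way mapping between language codes and readable names. Here is a small framework-neutral sketch of that mapping in Python. It is not Grails URL mapping or Play routes syntax, and the lookup table and function names are hypothetical; only the two example paths come from this post.

```python
# Hypothetical lookup table; only "es" and "en" appear in the example URL.
LANGUAGE_NAMES = {"es": "spanish", "en": "english"}


def seo_path(path):
    """Turn '/lessons/es/en' into '/lessons-spanish-english'."""
    parts = [p for p in path.split("/") if p]
    section, codes = parts[0], parts[1:]
    words = [LANGUAGE_NAMES[code] for code in codes]  # KeyError on unknown codes
    return "/" + "-".join([section] + words)


def internal_path(seo):
    """Reverse mapping: '/lessons-spanish-english' back to '/lessons/es/en'."""
    codes_by_name = {name: code for code, name in LANGUAGE_NAMES.items()}
    section, *words = seo.strip("/").split("-")
    return "/" + "/".join([section] + [codes_by_name[w] for w in words])


print(seo_path("/lessons/es/en"))                 # /lessons-spanish-english
print(internal_path("/lessons-spanish-english"))  # /lessons/es/en
```

In either framework the routing layer would do the second step: accept the friendly URL and hand the language codes to the controller.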


Upgrade

Grails: Upgrade is simple. I had one problem when the Grails folks decided to switch from HSQLDB to H2, but it was a minor hiccup.

Play: Upgrade is not simple. People are afraid of upgrading.

Who wins: Grails


Development cycle

Grails: This is related to IDE support, so Grails is not as good here, although Groovy is more concise. But I don't know Groovy well enough :). Also, every time you change a model class or a controller you see Grails restarting.

Play: It's fast and efficient. The best part - you can develop the whole day without having to restart your server or your server restarting itself.

Who wins: Play


Running in debug mode

Grails: It's very important to me to be able to place a breakpoint in my Eclipse and see the program stop and inspect my variables. I tried it in Grails - it's working but not very reliably, even in IDEA.

Play: http://www.playframework.org/documentation/1.0.1/ide says it should be easy. So let's believe it.

Who wins: Play


Tests

Grails: I didn't like it. That's it. While developing Internet Polyglot Article Manager I knew I had to follow test-driven development and cover my code with tests. But I didn't want to! I was so far from getting the knack of it while reading the book that I kept procrastinating on writing those unit and integration tests. So no, there are no tests written for the Article Manager and I am deeply ashamed of it.

Play: Good old JUnit for unit and integration tests. Absolutely amazing browser/Selenium based test harness. Easy-to-write Selenium view-level tests. I liked it!

Who wins: Play



Production

Grails: A well-thought-out approach to the configuration file. When you build a war file and deploy it to production it automatically picks up the prod mode and connects to your production database.

Play: You need to change your config file before building your app for production. Not good. I can't comment on whether it's good or not that Play app doesn't deploy in a servlet container - it runs standalone. I don't know - Tomcat gives me so many things for free, I am simply used to it. Maybe it's good to run it without Tomcat - time will tell.

Who wins: Grails
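
The Grails approach praised above is essentially "one config file, one block per environment, pick the block at startup". The Python sketch below illustrates that pattern only; it is not the Grails or Play configuration format, and the APP_ENV variable name and the database URLs are made-up placeholders.

```python
import os

# Hypothetical per-environment settings kept side by side in one place.
SETTINGS = {
    "development": {"db_url": "jdbc:h2:mem:devDb", "debug": True},
    "production": {"db_url": "jdbc:mysql://db.example.com/app", "debug": False},
}


def load_settings(environment=None):
    """Pick the settings block for the given (or ambient) environment."""
    if environment is None:
        # APP_ENV is an assumed variable name, defaulting to development.
        environment = os.environ.get("APP_ENV", "development")
    try:
        return SETTINGS[environment]
    except KeyError:
        raise ValueError("unknown environment: " + environment)


print(load_settings("production")["db_url"])
```

The point is that nothing in the file has to be edited before a production build; only the environment name changes.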


Performance

I didn't benchmark it myself. Others did and posted contradictory benchmarks. Some say Grails is faster, some say Play.

Grails: 

Play: 

Who wins: Tie


Web services

Grails: It was insultingly easy to enable a web service on a class method. Different plugins allow it with different levels of ease.

Play: This post shows that it's quite easy too.

Who wins: Tie




Industry Momentum

Google Trends search on grails, play framework:


Grails: Reached its plateau and shows signs of wearing out.

Play: Shows stable growth.

Who wins: Play

Backwards Compatibility

Grails: Fully backward compatible, there is no history of incompatibility

Play: Play 2.0 is NOT backward compatible with 1.2

Who wins: Grails


Results

Grails - 8, Play - 6
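
For anyone who wants to re-check the score, here is the tally as a few lines of Python. The verdicts are copied from the "Who wins" line of each category above; the code only counts them.

```python
from collections import Counter

# One entry per category, taken from the "Who wins" lines in this post.
VERDICTS = {
    "Plugins/Modules": "Grails",
    "Language": "Play",
    "IDE integration": "Play",
    "Search": "Tie",
    "Security": "Grails",
    "Tutorial (Documentation)": "Tie",
    "Persistence": "Tie",
    "Scaffolding": "Grails",
    "Template engine": "Grails",
    "JQuery/Ajax": "Grails",
    "Servlet API": "Tie",
    "SEO": "Tie",
    "Upgrade": "Grails",
    "Development cycle": "Play",
    "Running in debug mode": "Play",
    "Tests": "Play",
    "Production": "Grails",
    "Performance": "Tie",
    "Web services": "Tie",
    "Industry Momentum": "Play",
    "Backwards Compatibility": "Grails",
}

score = Counter(VERDICTS.values())
print(score["Grails"], score["Play"], score["Tie"])  # prints: 8 6 7
```

Every category counts as one point here; weighting them by importance could change the outcome.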

Grails is still winning? How come?

A piece of personal advice to the Play folks - change your modules repository to allow grades and reviews. It will drive community-based development, the kind that makes Grails' plugins so powerful.

===============================================

Update 2012-06-20

I am working with Scala now, doing lots of big data analysis, specifically using Cascading on Hadoop. My experience: Scala is not an easy language. It has a steep learning curve, even for me, a seasoned Java-ist. Java is much easier to start coding with. Yes, it is more verbose, but more readable too.
As a friend of mine once told me, "Scala is for writing, not for reading".
Many people around me whom I asked expressed a similar opinion.

Do you really need Scala to develop business logic and UI?

Play Framework seems to be more and more about Scala - and Java becomes a second-class citizen (you simply cannot provide a best-in-class experience in both worlds, you need to focus on something). At the same time Scala developers are more expensive than Java developers. And in high demand. It simply means that your project will incur a much higher cost and you'll have a harder time finding your developers. So the question remains: do you really need Scala to develop business logic and UI?

I used Grails on another new project and I completely loved it. One thing though - I used IntelliJ IDEA as the IDE. The development is more than just enjoyable - everything simply works and works well. Sometimes I didn't know how to do something in Groovy, so I simply wrote it in Java, which works out of the box. Adding Spring security (one Grails plugin) - one evening, adding user management and self-registration (another Grails plugin) - another evening. Adding login using Facebook or Twitter - yet another evening using another plugin. Try to do the same with Play Framework. A quick google search on "play framework facebook authentication" gave one fbconnect module which supports only Play 1.0! And as usual - I have no idea whether this module is good or not because there is no community grading system for modules.

Also, Play Framework's creators told us that ORM (Object Relational Mapping) is overrated. They are promoting Anorm - and again it's for Scala, not for Java. And, as its name suggests, it's not an ORM. You can happily write your JDBC queries in your code. Oh not again! Not another holy war of Hibernate against JDBC or iBatis!! For me personally ORM is a great performance improvement - both development performance and database querying performance (considering Hibernate first and second level caching). So going back to the world of JDBC - thank you, no thank you.

Let's run the Google Trends search on grails, play framework:

Hmmm.... Isn't Play Framework reaching its plateau?