26 October 2012

Dev Tip: MongoDB basics, best practices, schema design

Hi Folks,

Let's start with an introduction to MongoDB; it covers how to try it yourself, installation on Linux, etc.



Let's look into MongoDB basics with an application development focus:
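
To make the basics concrete, here is a minimal pymongo sketch (an illustration only, assuming a local mongod on the default port and pymongo 3.x installed; the collection and field names are made up):

from pymongo import MongoClient

# Connect to a local mongod (databases and collections are created lazily)
client = MongoClient('mongodb://localhost:27017/')
db = client['blogdb']

# Insert a document and read it back
post_id = db.posts.insert_one({'title': 'Hello MongoDB', 'tags': ['intro', 'nosql']}).inserted_id
print(db.posts.find_one({'_id': post_id}))

# Index the fields you query on - a commonly cited best practice
db.posts.create_index('tags')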




MongoDB best practice guide, lessons learned from experience:



MongoDB Schema Design:
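
To make the embed-versus-reference trade-off concrete, here is a small pymongo sketch (an illustration only; the blog-style collections are made up):

from pymongo import MongoClient

db = MongoClient('mongodb://localhost:27017/')['blogdb']

# Embedding: comments live inside the post document - one read fetches everything;
# good when the embedded list stays small and is always shown with its parent.
db.posts.insert_one({
    'title': 'Schema design',
    'comments': [{'author': 'alice', 'text': 'Nice post'}],
})

# Referencing: comments get their own collection - better when they grow without
# bound or need to be queried on their own.
comment_id = db.comments.insert_one({'post': 'Schema design', 'author': 'bob', 'text': 'Thanks'}).inserted_id
db.posts.update_one({'title': 'Schema design'}, {'$push': {'comment_ids': comment_id}})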


Dev Tip: Cake PHP RAD framework - Basics

Hi folks,

Today let's look into the CakePHP Rapid Application Development framework.




Learn how to build an application with CakePHP:




Dev Tip: Best practices of CSS with statistics

Hi folks, let's look into CSS best practices, starting with some statistics, colors used, etc.




Very useful for web developers, especially those who are planning to start a new venture or online presence.

Dev Tip: PHP Best practice guide simplified!

Hi folks, let's look into PHP best practices.






Dev Tip: Why you need a PHP Framework and How?

Hi guys,

I found a few good docs and presentations on how to pick the best framework for your needs. Every framework (Zend, CakePHP, Yii, etc.) has its own advantages, and choosing wisely for your needs is one of the biggest challenges. They also cover the question of whether you really need to build a new framework at all.




Let's look into some best practices for PHP, along with the enterprise tools and techniques used.


19 October 2012

Cloud Tip: How to scale a website to 1 million+ hits

Yesterday I was working on solutions to scale up our app server (JBoss) to a maximum of 1 million+ hits per day while utilizing minimal resources. I came up with a solution something like this:

(server setup diagram)

The minimal architecture becomes something like this:


Varnish will cache all the requests in memory. So what about the stateful requests?




I came up with a VCL script for this architecture:


VCL script used (this is a sample; the production one has ACLs and other throttling features):

backend default {
    .host = "localhost";  # backend app server runs on the same host as Varnish
    .port = "8080";
}

sub vcl_recv {
  # pass requests that carry a session cookie; strip cookies from everything else so it can be cached
  if (req.http.cookie ~ "JSESSIONID") {
    return (pass);
  } else {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.http.cookie ~ "JSESSIONID" || req.request == "POST") {
    return (deliver);
  } else {
    # remove all other cookies and prevent backend from setting any
    unset beresp.http.set-cookie;
    set beresp.ttl = 600s;
  }
}

sub vcl_deliver {
  # send some handy statistics back, useful for checking cache
  if (obj.hits > 0) {
    set resp.http.X-Cache-Action = "HIT";
    set resp.http.X-Cache-Hits = obj.hits;
  } else {
    set resp.http.X-Cache-Action = "MISS";
  }
}

Don't forget to benchmark with YSlow or the Apache Benchmark (ab) tool. :D

The real speed depends on how much memory you allocate to Varnish... it's up to you!

You can customize this for your requirements. Take the WordPress VCL template as a base reference.
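
Before benchmarking, it helps to confirm that pages are actually served from the cache. A minimal Python sketch (the URL is a placeholder; it relies on the X-Cache-Action and X-Cache-Hits headers set in vcl_deliver above):

import urllib.request

URL = 'http://localhost/'   # replace with a cacheable page served through Varnish

for attempt in (1, 2):
    with urllib.request.urlopen(URL) as resp:
        action = resp.headers.get('X-Cache-Action', 'n/a')
        hits = resp.headers.get('X-Cache-Hits', '0')
        print('request %d: %s (hits=%s)' % (attempt, action, hits))

# Expect a MISS on the first request and a HIT on the second,
# as long as the page carries no JSESSIONID cookie and is not a POST.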

Reference Materials:









Varnish Internals

Cloud Tip: Amazon Cloud Security best practice

General Best Practice



Security Best Practice

Cloud Tips: Amazon Cloud High availability Workshop Slides

Part 1



Part 2


10 October 2012

Architecture Tip: Distributed Task Queue Framework

Celery: Distributed Task Queue


Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).

Tip: Interoperability can be obtained via webhooks (so it can be used with PHP and other languages).

Celery is used in production systems to process millions of tasks a day.


First Steps with Celery

Celery is a task queue with batteries included. It is easy to use so that you can get started without learning the full complexities of the problem it solves. It is designed around best practices so that your product can scale and integrate with other languages, and it comes with the tools and support you need to run such a system in production.
In this tutorial you will learn the absolute basics of using Celery. You will learn about:
  • Choosing and installing a message broker.
  • Installing Celery and creating your first task
  • Starting the worker and calling tasks.
  • Keeping track of tasks as they transition through different states, and inspecting return values.
Celery may seem daunting at first - but don’t worry - this tutorial will get you started in no time. It is deliberately kept simple, so as not to confuse you with advanced features. After you have finished this tutorial it’s a good idea to browse the rest of the documentation, for example the Next Steps tutorial, which will showcase Celery’s capabilities.

Choosing a Broker

Celery requires a solution to send and receive messages, usually this comes in the form of a separate service called a message broker.
There are several choices available, including:

RabbitMQ

RabbitMQ is feature-complete, stable, durable and easy to install. It’s an excellent choice for a production environment. Detailed information about using RabbitMQ with Celery:
Using RabbitMQ
If you are using Ubuntu or Debian install RabbitMQ by executing this command:
$ sudo apt-get install rabbitmq-server
When the command completes the broker is already running in the background, ready to move messages for you: Starting rabbitmq-server: SUCCESS.
And don’t worry if you’re not running Ubuntu or Debian, you can go to this website to find similarly simple installation instructions for other platforms, including Microsoft Windows:
http://www.rabbitmq.com/download.html

Redis

Redis is also feature-complete, but is more susceptible to data loss in the event of abrupt termination or power failures. Detailed information about using Redis:
Using Redis
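As a quick sketch of what this looks like in code (the application object itself is introduced in the Application section further below; connection details assume a local Redis on the default port):

from celery import Celery

# Same app as in the tutorial below, but with Redis as the broker
# (assumes a local Redis server on port 6379, database 0).
celery = Celery('tasks', broker='redis://localhost:6379/0')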

Using a database

Using a database as a message queue is not recommended, but can be sufficient for very small installations. If you’re already using a Django database, for example, using it as your message broker can be convenient while developing, even if you use a more robust system in production.

Other brokers

In addition to the above, there are other transport implementations to choose from; see the Celery documentation for the full list of broker transports.

Installing Celery

Celery is on the Python Package Index (PyPI), so it can be installed with standard Python tools like pip or easy_install:
$ pip install celery

Application

The first thing you need is a Celery instance; this is called the celery application, or just app for short. Since this instance is used as the entry-point for everything you want to do in Celery, like creating tasks and managing workers, it must be possible for other modules to import it.
In this tutorial you will keep everything contained in a single module, but for larger projects you want to create a dedicated module.
Let’s create the file tasks.py:
from celery import Celery

celery = Celery('tasks', broker='amqp://guest@localhost//')

@celery.task
def add(x, y):
    return x + y
The first argument to Celery is the name of the current module; this is needed so that names can be automatically generated. The second argument is the broker keyword argument, which specifies the URL of the message broker you want to use. Here we use RabbitMQ, which is also the default option. See Choosing a Broker above for more choices: e.g. for Redis you can use redis://localhost, or for MongoDB mongodb://localhost.
You defined a single task, called add, which returns the sum of two numbers.

Running the celery worker server

You can now run the worker by executing the program with the worker argument:
$ celery -A tasks worker --loglevel=info
In production you will want to run the worker in the background as a daemon. To do this you need to use the tools provided by your platform, or something like supervisord (see Running the worker as a daemon for more information).
For a complete listing of the command line options available, do:
$  celery worker --help
There are also several other commands available, and help is available too:
$ celery help

Calling the task

To call our task you can use the delay() method.
This is a handy shortcut to the apply_async() method which gives greater control of the task execution (see Calling Tasks):
>>> from tasks import add
>>> add.delay(4, 4)
The task has now been processed by the worker you started earlier, and you can verify that by looking at the workers console output.
Calling a task returns an AsyncResult instance, which can be used to check the state of the task, wait for the task to finish or get its return value (or if the task failed, the exception and traceback). But this isn’t enabled by default, and you have to configure Celery to use a result backend, which is detailed in the next section.
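As mentioned above, apply_async() gives more control than delay(); a small sketch (countdown is one of the standard execution options, and the positional arguments are passed as a tuple):

from tasks import add

# Equivalent to add.delay(4, 4), but with an execution option:
# don't run the task earlier than 10 seconds from now.
result = add.apply_async((4, 4), countdown=10)
print(result.id)   # the task id can be stored and checked later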

Keeping Results

If you want to keep track of the tasks’ states, Celery needs to store or send the states somewhere. There are several built-in result backends to choose from: SQLAlchemy/Django ORM, Memcached, Redis, AMQP (RabbitMQ), and MongoDB – or you can define your own.
For this example you will use the amqp result backend, which sends states as messages. The backend is specified via the backend argument to Celery (or via the CELERY_RESULT_BACKEND setting if you choose to use a configuration module):
celery = Celery('tasks', backend='amqp', broker='amqp://')
or if you want to use Redis as the result backend, but still use RabbitMQ as the message broker (a popular combination):
celery = Celery('tasks', backend='redis://localhost', broker='amqp://')
To read more about result backends please see Result Backends.
Now with the result backend configured, let’s call the task again. This time you’ll hold on to the AsyncResult instance returned when you call a task:
>>> result = add.delay(4, 4)
The ready() method returns whether the task has finished processing or not:
>>> result.ready()
False
You can wait for the result to complete, but this is rarely used since it turns the asynchronous call into a synchronous one:
>>> result.get(timeout=1)
4
In case the task raised an exception, get() will re-raise the exception, but you can override this by specifying the propagate argument:
>>> result.get(propagate=False)
If the task raised an exception you can also gain access to the original traceback:
>>> result.traceback
...
See celery.result for the complete result object reference.

Configuration

Celery, like a consumer appliance, doesn’t need much to be operated. It has an input and an output, where you must connect the input to a broker and maybe the output to a result backend if so wanted. But if you look closely at the back there’s a lid revealing loads of sliders, dials and buttons: this is the configuration.
The default configuration should be good enough for most uses, but there are many things to tweak so Celery works just the way you want it to. Reading about the available options is a good idea to get familiar with what can be configured. You can read about the options in the Configuration and defaults reference.
The configuration can be set on the app directly or by using a dedicated configuration module. As an example you can configure the default serializer used for serializing task payloads by changing the CELERY_TASK_SERIALIZER setting:
celery.conf.CELERY_TASK_SERIALIZER = 'json'
If you are configuring many settings at once you can use update:
celery.conf.update(
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERY_TIMEZONE='Europe/Oslo',
    CELERY_ENABLE_UTC=True,
)
For larger projects, using a dedicated configuration module is useful. In fact, you are discouraged from hard-coding periodic task intervals and task routing options; it is much better to keep these in a centralized location. Especially for libraries, this makes it possible for users to control how they want your tasks to behave. You can also imagine your sysadmin making simple changes to the configuration in the event of system trouble.
You can tell your Celery instance to use a configuration module by calling the config_from_object() method:
celery.config_from_object('celeryconfig')
This module is often called “celeryconfig”, but you can use any module name.
A module named celeryconfig.py must then be available to load from the current directory or on the Python path. It could look like this:
celeryconfig.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'

CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True
To verify that your configuration file works properly, and doesn’t contain any syntax errors, you can try to import it:
$ python -m celeryconfig
For a complete reference of configuration options, see Configuration and defaults.
To demonstrate the power of configuration files, this is how you would route a misbehaving task to a dedicated queue:
celeryconfig.py:
CELERY_ROUTES = {
    'tasks.add': 'low-priority',
}
Or, instead of routing it, you could rate limit the task so that only 10 tasks of this type can be processed in a minute (10/m):
celeryconfig.py:
CELERY_ANNOTATIONS = {
    'tasks.add': {'rate_limit': '10/m'}
}
If you are using RabbitMQ, Redis or MongoDB as the broker then you can also direct the workers to set a new rate limit for the task at runtime:
$ celery control rate_limit tasks.add 10/m
worker.example.com: OK
    new rate limit set successfully


Enterprise Tip: Zivios Enterprise Management System Open Source

Zivios Open source Enterprise Management Architecture


Zivios is a web-based control panel which brings together vital open source technologies needed by medium and large enterprises. At its core, Zivios provides identity management, single sign-on, user, group and computer provisioning, as well as remote management of services.

What is Zivios?

Zivios aims to be a consolidated management portal for providing core infrastructure services using opensource technologies. The long term goals of Zivios are:
  • Identity Management
  • Single Sign-on and Certificate authority
  • Package and Patch Management
  • Service Management
  • Network Monitoring
  • Backup provisioning
  • Core Infrastructure Services (NTP,DNS, etc)
With an infinitely extensible plugin architecture, Zivios extends all the aforementioned fundamentals to an arbitrary number of services. The ideology behind Zivios is that the datastore for services can be (and most likely will be) different. As such, we cannot depend on LDAP or Kerberos to fulfill our identity management needs. Zivios allows the use of plugins which maintain their own independent datastore following CRUD operations.
With an extensible framework, Zivios addresses the needs of complex heterogeneous deployments by providing an open and scalable API for modular development and sanitized consolidation.

Why Zivios?

Opensource software has progressed to the point where it can (feature-wise) compete with many proprietary offerings. However, use of such technologies stay limited to large corporations who have experienced and highly skilled IT staff. Zivios consolidates and simplifies the use and management of complex technologies and builds on them to provide an integrated identity management platform. With an infinitely extensible plugin system, any application can be managed via Zivios.
We feel that managing a fully featured opensource network is currently out of reach of the average administrator. Integration and proper management of servers, services and identity requires intricate knowledge and, unfortunately, has a rather steep learning curve.

Highlighting Key Points of Zivios

  • Zivios allows the administrator to get up to speed with the opensource technologies quickly and without requiring indepth knowledge. The core technologies come pre-integrated so little time is wasted in the redundant task of setting it up correctly for each and every system. Core services are online right after installation. Management is done completely via the web panel, even if some tasks require access to remote machines (Zivios acts on your behalf) using server side agents.
  • Zivios ensures correctness in day-to-day repetitive tasks such as identity management. In loosely coupled systems, Zivios also ensures compliance (imagine forgetting to delete an ex-employee's ERP account, where your ERP system is online and not integrated directly with the deployed directory services).
  • Zivios reduces time to task completion with cascading data changes. Imagine having to reset the password for all users in your finance department. It would be quite laborious, especially if users in the department have a diverse password store.
  • Zivios allows delegated administration to be implemented in a simple manner. Since everything is part of the tree, you can delegate access of any tree object to any user or group. As training and service level knowledge is not required, delegated administration can actually be enforced. In opensource networks, it is common for few (or even singular) resources to have the knowledge required to administer a particular service or server. Delegated administration is impossible in this scenario as all requests will rebound automatically to that particular administrator, making him critical for that task.
  • Zivios enforces customizable organizational workflow. Workflow will be used (in future versions) to allow for the transaction to be deferred until approved.
  • Zivios allows administrators to focus on actual problems and spend time on improving the system rather than maintaining it on a day-to-day basis.

Have a look at the features, architecture and screenshots pages for more details.
Zivios's main interface is built upon a PHP framework; for communication between servers/devices they have implemented a Python-based modular and extensible agent (a simple XML-RPC server) with SSL support for security.
The current version of Zivios is 0.6.x and it is under extensive development; you can communicate with the Zivios developers via IRC or mailing lists.
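To give a feel for the agent approach described above, here is a generic illustration only (not the actual Zivios agent) using Python's standard-library XML-RPC server; the function and port are made up:

from xmlrpc.server import SimpleXMLRPCServer

def service_status(name):
    # A real agent would query the init system / service manager here.
    return {'service': name, 'running': True}

server = SimpleXMLRPCServer(('0.0.0.0', 8649), allow_none=True)  # port is arbitrary
server.register_function(service_status)
server.serve_forever()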
Links

Server Tip: Clone Hard Disk with Basic Linux Utilities

To clone an identical HDD in a Linux system you can use the dd command; dd makes it really easy.
dd if=/dev/hda of=/dev/hdb
The above command will clone hda to hdb (partitions, boot record, etc.), but what if you have to clone a drive that is not attached to the same system?
I found a really interesting way to transfer files over the network simply by using netcat and dd.
  • Boot the machine (where the second drive is attached) with a live CD distro like Ubuntu or Knoppix. Set up networking or let the DHCP server assign an IP automatically. My DHCP assigned this machine the IP 192.168.0.2.
  • Run the following command on this machine:
       
         nc -l -p 12222 | dd of=/dev/hdb 
    (where /dev/hdb is the target drive)
  • Now go to the machine where the drive to be cloned is attached and issue the following command:

         dd if=/dev/hda | nc 192.168.0.2 12222 
    (/dev/hda is the drive to be cloned)
It will take time (usually several hours) depending on the size of the drive. If you are worried about bandwidth, you can pipe through gzip to compress and decompress the streamed data on the source and destination machines respectively; to do this you would run the following commands.
On target machine:
        nc -l -p 12222 | gzip -dfc | dd of=/dev/hdb
And on source machine.
        dd if=/dev/hda | gzip -cf | nc 192.168.0.2 12222
Try dd and nc (Netcat) for simple backups too!

Web Tip: Create your own Open Source Social Networking Site


If you are planning to create your own social networking website, instead of working from scratch, check out Elgg. Elgg is an open source social networking platform.
From their website:

“Elgg is an open, flexible social networking engine, designed to run at the heart of any socially-aware application. Building on Elgg is easy, and because the engine handles common web application and social functionality for you, you can concentrate on developing your idea.”
Elgg can be extended through plugins, which provide ways of adding functionality to Elgg, much like Facebook applications.

Server Tip: MySQL Cluster Setup


Introduction
This HOWTO is designed for a classic setup of two servers behind a load-balancer. The aim is to have true redundancy – either server can be unplugged and yet the site will remain up.
Notes:
You MUST have a third server as a management node, but this can be shut down after the cluster starts. Also note that I do not recommend shutting down the management server (see the extra notes at the bottom of this document for more information). You cannot run a MySQL Cluster with just two servers and have true redundancy.
Although it is possible to set the cluster up on two physical servers, you WILL NOT GET the ability to “kill” one server and have the cluster continue as normal. For this you need a third server running the management node.
We are going to talk about three servers:
node01.example.com 192.168.0.10
node02.example.com 192.168.0.20
node03.example.com 192.168.0.30
Servers node01 and node02 will be the two that end up “clustered”. This would be perfect for two servers behind a load balancer or using round-robin DNS, and is a good replacement for replication. Server node03 needs only minor changes made to it and does NOT require a MySQL install. It can be a low-end machine and can be carrying out other tasks.
Get the software:
For Generally Available (GA), supported versions of the software, download from the official MySQL Cluster downloads page.
Make sure that you select the correct platform – in this case, “Linux – Generic” and then the correct architecture (for LINUX this means x86 32 or 64 bit).
Note: Only use MySQL Server executables (mysqlds) that come with the MySQL Cluster installation.
STAGE1: Installation of Data and SQL nodes on node01 and node02
On each of the machines designated to host data or SQL nodes (in our case node01 and node02), perform the following steps as the system root user:
  1. Create a new mysql user group, and then add a mysql user to this group:
    shell> groupadd mysql
    shell> useradd -g mysql mysql

  2. Change location to the directory containing the downloaded file, unpack the archive, and create a symlink to the mysql directory named mysql. Note that the actual file and directory names vary according to the MySQL Cluster version number.
    shell> cd /var/tmp
    shell> tar -C /usr/local -xzvf mysql-cluster-gpl-7.1.5-linux-i686-glibc23.tar.gz
    shell> ln -s /usr/local/mysql-cluster-gpl-7.1.5-linux-i686-glibc23 /usr/local/mysql
    shell> export PATH=$PATH:/usr/local/mysql/bin
    shell> echo "export PATH=\$PATH:/usr/local/mysql/bin" >> /etc/bash.bashrc
  3. Change location to the mysql directory and run the supplied script for creating the system databases:
    shell> cd mysql
    shell> ./scripts/mysql_install_db --user=mysql
  4. Set the necessary permissions for the MySQL server and data directories:
    shell> chown -R root .
    shell> chown -R mysql data
    shell> chgrp -R mysql .
  5. Copy the MySQL startup script to the appropriate directory, make it executable, and set it to start when the operating system is booted up:
    shell> cp support-files/mysql.server /etc/init.d/mysql
    shell> chmod +x /etc/init.d/mysql
    shell> update-rc.d mysql defaults
STAGE2: Installation of Management node on node03
Installation of the management node does not require the mysqld binary. Only the MySQL Cluster management server (ndb_mgmd) is required; I assume that you have placed mysql-cluster-gpl-7.1.5-linux-i686-glibc23.tar.gz in /var/tmp.
As system root perform the following steps to install ndb_mgmd and ndb_mgm on the Cluster management node host (node03):
  1. Change location to the /var/tmp directory, and extract the ndb_mgm and ndb_mgmd from the archive into a suitable directory such as /usr/local/bin:
    shell> cd /var/tmp
    shell> tar -zxvf mysql-cluster-gpl-7.1.5-linux-i686-glibc23.tar.gz
    shell> cd /usr/local/mysql-cluster-gpl-7.1.5-linux-i686-glibc23
    shell> cp bin/ndb_mgm* /usr/local/bin
  2. Change location to the directory into which you copied the files, and then make both of them executable:
    shell> cd /usr/local/bin
    shell> chmod +x ndb_mgm*
STAGE3: Configuration of Management node
The first step in configuring the management node is to create the directory in which the configuration file can be found and then to create the file itself. For example (running as root):
shell> mkdir /var/lib/mysql-cluster
shell> cd /var/lib/mysql-cluster
shell> vi config.ini
For our setup, the config.ini file should read as follows:
[ndbd default]
NoOfReplicas=2
DataMemory=80M
IndexMemory=18M
[tcp default]
[ndb_mgmd]
hostname=192.168.0.30 # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster # Directory for MGM node log files
[ndbd]
hostname=192.168.0.10 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node’s data files
[ndbd]
hostname=192.168.0.20 # Hostname or IP address
datadir=/usr/local/mysql/data # Directory for this data node’s data files
[mysqld]
hostname=192.168.0.10 # Hostname or IP address
[mysqld]
hostname=192.168.0.20 # Hostname or IP address
STAGE4: Configuration of Data and SQL nodes
For the data and SQL nodes, the configuration goes in /etc/my.cnf. Create or edit the file (running as root):
shell> vi /etc/my.cnf
Note :
We show vi being used here to create the file, but any text editor should work just as well.
For each data node and SQL node in our setup, my.cnf should look like this:
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-locking
ndbcluster # run NDB storage engine
ndb-connectstring=192.168.0.30 # location of management server
[mysql_cluster]
ndb-connectstring=192.168.0.30 # location of management server
Important :
Once you have started a mysqld process with the NDBCLUSTER and ndb-connectstring parameters in the [mysqld] section of the my.cnf file as shown previously, you cannot execute any CREATE TABLE or ALTER TABLE statements without having actually started the cluster. Otherwise, these statements will fail with an error.
STAGE5: Starting the MySQL Cluster
Starting the cluster is not very difficult after it has been configured. Each cluster node process must be started separately, and on the host where it resides. The management node should be started first, followed by the data nodes, and then finally by any SQL nodes:
  1. On the management host(node03), issue the following command from the system shell to start the management node process:
    shell> ndb_mgmd -f /var/lib/mysql-cluster/config.ini --configdir=/var/lib/mysql-cluster
  2. On each of the Data/SQL node hosts, run these commands to start the ndbd and MySQL server processes:
    shell> /usr/local/mysql/bin/ndbd
    shell> /etc/init.d/mysql start
If all has gone well, and the cluster has been set up correctly, the cluster should now be operational. You can test this by invoking the ndb_mgm management node client. The output should look like that shown here:
node03:~# ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> SHOW
Connected to Management Server at: localhost:1186
Cluster Configuration
———————
[ndbd(NDB)] 2 node(s)
id=2 @192.168.0.10 (mysql-5.1.44 ndb-7.1.5, Nodegroup: 0, Master)
id=3 @192.168.0.20 (mysql-5.1.44 ndb-7.1.5, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.30 (mysql-5.1.44 ndb-7.1.5)
[mysqld(API)] 2 node(s)
id=4 @192.168.0.10 (mysql-5.1.44 ndb-7.1.5)
id=5 @192.168.0.20 (mysql-5.1.44 ndb-7.1.5)
STAGE6: Testing the Setup
If you are OK up to here, it is time to test MySQL. On either server, node01 or node02, enter the following commands (note that we have no root password yet):
shell> mysql
create database testdb;
use testdb;
CREATE TABLE cluster_test (i INT) ENGINE=NDBCLUSTER;
INSERT INTO cluster_test (i) VALUES (1);
SELECT * FROM cluster_test;
You should see 1 row returned (with the value 1).
If this works, now go to the other server and run the same SELECT and see what you get. Insert from that host, then go back to the previous host and see if it shows up there too. If it works, then you made it!
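The same check can be scripted. A minimal sketch using mysql-connector-python (an assumption; any MySQL client library would do), querying both SQL nodes to confirm the row is visible on each:

import mysql.connector   # assumes mysql-connector-python is installed

NODES = ['192.168.0.10', '192.168.0.20']   # node01 and node02

for host in NODES:
    # Assumes the root account is allowed to connect from this host
    # (by default it can only connect locally) and has no password yet.
    conn = mysql.connector.connect(host=host, user='root', database='testdb')
    cur = conn.cursor()
    cur.execute('SELECT i FROM cluster_test')
    print(host, cur.fetchall())   # both nodes should print [(1,)]
    cur.close()
    conn.close()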