What is Sonar?
Sonar is an open source platform for continuous inspection of code quality. It was developed with one main objective in mind: to make code quality management accessible to everyone with minimal effort.
Sonar provides code analyzers, reporting tools, defect-hunting modules and a TimeMachine as core functionality.
Sonar covers the seven axes of code quality: architecture and design, duplications, unit tests, complexity, potential bugs, coding rules, and comments.
How does Sonar work?
Sonar has a simple and flexible architecture that consists of three components:
A) A set of source code analyzers that are grouped in a Maven plugin and triggered on demand. The analyzers use configuration stored in the database. Though Sonar relies on Maven to run analyses, it can analyze both Maven and non-Maven projects.
B) A database that stores not only the results of analyses, project and global configuration, but also historical analyses for the TimeMachine. Five database engines are currently supported: Oracle, MySQL, Derby, PostgreSQL and MS SQL Server.
C) A web reporting tool used to display code quality dashboards on projects, hunt for defects, browse the TimeMachine and configure analyses.
What does Sonar provide?
Sonar makes it possible to manage multiple quality profiles in order to adapt the required quality level to the type of project (new project, critical application, technical library, etc.). Managing a profile consists of activating, deactivating and weighting coding rules, defining thresholds on metrics for automatic alerting, and defining project/profile associations.
Sonar has two dashboards that give the big picture, hint at where there might be issues and allow projects to be compared: 1) a consolidated view that shows all projects, and 2) a project dashboard, which is also available at module and package level.
To confirm that what seems to be an issue really is an issue, Sonar offers a set of hunting tools that make it possible to go from the overview down to the smallest details:
A) A drill-down on every displayed measure to see what lies behind it
B) Class clouds to find the classes least covered by unit tests
C) Hotspots, which gather on a single page the files that rank highest and lowest on each metric
D) A multi-entry source viewer (duplications, coverage, violations, test success, etc.) to confirm the findings made with the hunting tools
TimeMachine is used to watch the evolution of a project and to replay its past, especially as it records each version of the project.
Native Apps vs HTML 5 Apps
A native application is an application program that has been developed for use on a particular platform or device; it is therefore dependent on that device or platform.
An HTML 5 app is hosted on the web and runs inside a mobile browser. Unlike apps built specifically for Apple devices, Android devices or any other devices, the developer does not need to build the app separately for each OS.
So which is better: native mobile applications or HTML 5 applications? To decide, let us discuss some of the important factors that impact application development and the end-user experience.
Native Mobile Applications:
1) Better User Experience & Performance:
Native apps are developed for a specific platform; they can easily interact with the various features of the operating system and take advantage of special functionality that is still not achievable using HTML 5.
2) Faster Graphics & Fluid Animations:
Graphics are faster in native apps, plain and simple, because a native app communicates directly with frameworks that are specially designed for its platform. This might not matter much for simple apps, but it is especially important in games and in apps that are highly interactive or that manipulate images and sound.
Along with performance, security is one of the reasons for choosing a native app over an HTML 5 app. Native applications are more secure due to their use of native frameworks and APIs.
HTML 5 Applications:
1) Cross-platform deployment cost:
HTML 5 is known as the common language of the web; it is meant to work seamlessly across mobile platforms and browsers. HTML 5 web apps can be installed from the web as icons on the home screen of virtually any phone.
2) Easy to develop:
It is easy to develop applications using HTML 5, and most developers like writing code in HTML.
3) Searchable contents:
The contents of an HTML 5 app can be crawled and indexed by search engines, and so can be discovered by people searching the web. This increases the discoverability of the app tremendously.
Finally, both native apps and HTML 5 mobile apps are acceptable choices depending on the application's functionality. If the application has limited functionality and the functional domain is not very complicated, HTML 5 applications are the best choice. If you are dealing with complicated domain functionality and want a good user experience and better performance, you need to go with native apps.
Hadoop Core Components: HDFS and MapReduce
This is the second blog in our series on Hadoop. Here, we need to consider the two main pain points with Big Data: how to store such huge volumes of data, and how to process it all.
Hadoop is designed for parallel processing in a distributed environment, so it needs mechanisms that answer these questions. Google published two white papers: the Google File System (GFS) in 2003 and the MapReduce framework in 2004. Doug Cutting read these papers and designed a file system for Hadoop, known as the Hadoop Distributed File System (HDFS), and implemented a MapReduce framework on top of it to process the data. These have become the core components of Hadoop.
Hadoop Distributed File System :
HDFS is a virtual file system which is scalable, runs on commodity hardware and provides high-throughput access to application data. It is the data storage component of Hadoop. It stores its data blocks on top of the native file system and presents a single view of multiple physical disks or file systems. Data is distributed across the nodes; a node is an individual machine in a cluster, and a cluster is a group of nodes. HDFS is designed for applications that need write-once-read-many access: it does not allow modification of data once it is written. Hadoop has a master/slave architecture. The master of HDFS is known as the Namenode and a slave is known as a Datanode.
Namenode :
The Namenode is a daemon which runs on the master node of a Hadoop cluster. There is only one Namenode in a cluster. It contains the metadata of all the files stored on HDFS, which is known as the namespace of HDFS. It maintains two files: the EditLog, which records every change that occurs to the file system metadata (a transaction history), and the FsImage, which stores the entire namespace, the mapping of blocks to files and the file system properties. The FsImage and the EditLog are central data structures of HDFS.
Datanode :
The Datanode is a daemon which runs on the slave machines of a Hadoop cluster. There are a number of Datanodes in a cluster. A Datanode is responsible for serving read/write requests from clients. It also performs block creation, deletion and replication upon instruction from the Namenode, and it periodically sends a Heartbeat message to the Namenode about the blocks it holds. Namenode and Datanode machines typically run a GNU/Linux operating system (OS).
Following are some of the characteristics of HDFS:
1) Data Integrity :
When a file is created in HDFS, it computes a checksum of each block of the file and stores these checksums in a separate hidden file. When a client retrieves file contents, it verifies that the data it received matches the checksums stored in the associated checksum file.
2) Robustness :
The primary objective of HDFS is to store data reliably even in the presence of failures. The three common types of failures are NameNode failures, DataNode failures and network partitions.
3) Cluster Rebalancing :
HDFS supports data rebalancing: it automatically moves data from one Datanode to another if the free space on a Datanode falls below a certain threshold.
4) Accessibility :
HDFS can be accessed from applications in many different ways. Hadoop provides a Java API for applications to use, and an HTTP browser can also be used to browse the files of an HDFS instance through Hadoop's default web interface.
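For example, a minimal sketch of reading and writing a file through the Java API might look like the following (the Namenode address hdfs://namenode:9000 and the path /user/demo/hello.txt are hypothetical placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical Namenode address; on a real cluster this usually
        // comes from core-site.xml (fs.defaultFS).
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Write a file (write-once: HDFS does not allow in-place modification).
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back and print its contents.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```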
5) Re-replication :
Datanodes send heartbeats to the Namenode; if any block is missing, the Namenode marks that block as dead, and the dead block is re-replicated from another Datanode. Re-replication arises when a Datanode becomes unavailable, a replica is corrupted, a hard disk fails, or the replication factor is increased.
MapReduce :
In general, MapReduce is a programming model which allows large data sets to be processed with a parallel, distributed algorithm on a cluster. Hadoop uses this model to process the data stored on HDFS. It splits a task across processes. Generally we send data to the process, but in MapReduce we send the process to the data, which decreases network overhead.
A MapReduce job is the analysis work that we want to run on the data. It is broken down into multiple tasks, because the data is stored on different nodes, and these tasks can run in parallel. A MapReduce program processes data by manipulating (key/value) pairs in the general form
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
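As a hypothetical illustration (a word-count job, not part of the original post), this general form corresponds to the generic type parameters of Hadoop's Mapper and Reducer base classes:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSignatures {
    // map: (K1 = LongWritable byte offset, V1 = Text input line)
    //      -> list(K2 = Text word, V2 = IntWritable count of 1)
    public static class WordMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> { /* map() emits (word, 1) */ }

    // reduce: (K2 = Text word, list(V2 = IntWritable))
    //         -> list(K3 = Text word, V3 = IntWritable total count)
    public static class WordReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> { /* reduce() sums the counts */ }
}
```

The map and reduce bodies for this sketch are shown under the corresponding phases below.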
Following are the phases of a MapReduce job:
1) Map :
In this phase we simultaneously ask our machines to run a computation on their local block of data. As this phase completes, each node stores the result of its computation in temporary local storage; this is called the “intermediate data”. Please note that the output of this phase is written to the local disk, not to HDFS.
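Continuing the hypothetical word-count example, a minimal map task could look like this (the class name TokenizerMapper is illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Runs on each node against its local block of input; the (word, 1) pairs it
// emits are written as intermediate data to the local disk, not to HDFS.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate (K2, V2) pair
        }
    }
}
```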
2) Combine :
Sometimes we want to perform a local reduce before we transfer the results to the reduce tasks. In such scenarios we add a combiner to perform this local reduce; it is a reduce task which runs on the local data. For example, if the job processes a document containing the word “the” 574 times, it is much more efficient to store and shuffle the pair (“the”, 574) once instead of the pair (“the”, 1) multiple times. This processing step is known as combining.
3) Partition :
In this phase the partitioner redirects the results of the mappers to different reducers. When there are multiple reducers, we need some way to determine the appropriate reducer to which a (key/value) pair output by a mapper should be sent.
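By default Hadoop partitions by hashing the key; a hypothetical custom partitioner that routes words to reducers by their first letter might be sketched as:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Decides which reducer receives each intermediate (key, value) pair.
// This illustrative example routes words by their first character.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;
        }
        String word = key.toString();
        char first = word.isEmpty() ? '\0' : Character.toLowerCase(word.charAt(0));
        return first % numReduceTasks;   // char is non-negative, so this is a valid partition index
    }
}
```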
4) Reduce :
The map tasks on the machines have completed and generated their intermediate data. Now we need to gather all of this intermediate data and combine it for further processing, so that we end up with one final result. Reduce tasks run on any of the slave nodes. When a reduce task receives the output from the various mappers, it sorts the incoming data on the key of the (key/value) pairs and groups together all values of the same key.
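The matching reduce task for the hypothetical word-count sketch simply sums the grouped counts for each word:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives all values grouped under one key (one word), sums them and writes
// the final (word, total count) pair to the job output on HDFS.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        result.set(sum);
        context.write(word, result);
    }
}
```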
The master of the MapReduce engine is known as the Jobtracker and a slave is known as a Tasktracker.
The Jobtracker is the coordinator of a MapReduce job and runs on the master node. When a client machine submits a job, the Namenode is first consulted to find out which Datanodes hold the blocks of the file that is the input for the submitted job. The Jobtracker then provides the Tasktrackers running on those nodes with the Java code required to execute the job.
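A hypothetical driver that submits such a word-count job (reusing the mapper and reducer sketched above; input and output paths come from the command line) might look like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);     // local reduce on map output (Combine phase)
        job.setReducerClass(IntSumReducer.class);
        // job.setPartitionerClass(FirstLetterPartitioner.class);  // optional custom partitioner

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The blocks of the input file on HDFS determine where map tasks are scheduled.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Registering the reducer as the combiner implements the local reduce described in the Combine phase above.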
The Tasktracker runs the actual job code on the data blocks of the input file. It also sends heartbeats and task status back to the Jobtracker.
If the node running a map task fails before the map output has been consumed by the reduce task, the Jobtracker will automatically rerun the map task on another node to re-create the map output; that is why Hadoop is known as a self-healing system.
Read More : Apache Hadoop: An Introduction