Tuesday, 22 November 2016

What machine learning algorithms can be suitable for data and images?

In this post I would like to give a brief idea of which algorithms and tools to use on which kinds of data.

This follows on from the previous post, Introduction to machine learning.

Machine learning is used for tasks such as filtering spam, recognizing faces, and powering recommendation engines. It is useful when you have a large data set that you would like to analyse, for example to recognize patterns in it.

Instead of programming every rule explicitly, we can use machine learning to train computers to learn from data, analyse it, and act on it as we specify. These qualities make machine learning powerful these days, and many machine learning libraries, including the ones covered below, are free. We can also run our application on a single machine or at massive scale.

Machine learning libraries are available for whichever language and environment you prefer (Java, C++, Python, etc. on Windows, Linux, etc.).

Here I give a brief introduction to 11 machine learning tools, each of which provides functionality for individual apps.


Scikit-learn for Python:

Scikit-learn is built on top of existing Python packages such as NumPy, SciPy, and matplotlib. The resulting libraries can be used either for interactive “workbench” applications or embedded into other software and reused. It is open source.

Scan the list of things available in scikit-learn and you quickly realize that it includes tools for many of the standard machine-learning tasks (such as clustering, classification, regression, etc.). And since scikit-learn is developed by a large community of developers and machine-learning experts, promising new techniques tend to be included in fairly short order.

It covers machine learning tasks such as classification, clustering, regression, and dimensionality reduction.
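
As a rough illustration of three of these tasks, here is a minimal sketch using scikit-learn's bundled iris dataset. The choice of dataset, the k-nearest-neighbours classifier, and all parameter values are my own assumptions for the example, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Classification: fit on a training split, score on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
classifier = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)

# Clustering: group the same rows without looking at the labels.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project the 4 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)

print(f"test accuracy: {accuracy:.2f}")
print("clusters found:", len(set(cluster_labels)))
print("reduced shape:", X_2d.shape)
```

Every estimator follows the same fit / predict (or fit / transform) pattern, which is why swapping one algorithm for another usually takes only a line or two.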

Shogun for C++, Java, Python, C#, Ruby, R, Lua, Octave, and Matlab:

Shogun is a free, open source toolbox that contains many algorithms and data structures for machine learning problems. Its core is written in C++, and it provides interfaces for Java, Python, C#, Ruby, R, Lua, Octave, and Matlab.

Shogun aims to be fast and easy to work with, and it supports pre-calculated kernels. It was developed with bioinformatics applications in mind.
It is capable of processing huge datasets consisting of up to 10 million samples.

Shogun supports the algorithms below.
1. Support vector machines
2. Dimensionality reduction algorithms, as listed below

  • PCA (principal component analysis)
  • Kernel PCA
  • Locally Linear Embedding
  • Hessian Locally Linear Embedding
  • Local Tangent Space Alignment
  • Linear Local Tangent Space Alignment
  • Kernel Locally Linear Embedding
  • Kernel Local Tangent Space Alignment
  • Multidimensional Scaling
  • Isomap
  • Diffusion Maps
  • Laplacian Eigenmaps

3. Clustering algorithms: k-means and GMM
4. Kernel Ridge Regression, Support Vector Regression
5. HMM (hidden Markov models)
6. K-Nearest Neighbors, for pattern recognition
7. Kernel Perceptrons

What are kernels?

A kernel takes two inputs and returns a measure of how similar they are.

Data --> Features --> learning algorithm

There are different kernels implemented, ranging from kernels for numerical data (such as gaussian or linear kernels) to kernels on special data.

In Shogun, the following kernels are implemented for numeric data:
linear
gaussian
polynomial
sigmoid kernels
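
To make the idea of "two inputs in, one similarity score out" concrete, here is a plain-Python sketch of these four kernels. This is not Shogun's API: the function names and the parameters (`sigma`, `degree`, `offset`, `gamma`) are assumptions chosen for illustration, and Shogun's own parameterization may differ.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # Similarity is just the inner product.
    return dot(x, y)


def gaussian_kernel(x, y, sigma=1.0):
    # Similarity decays with squared distance; identical inputs score 1.0.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, degree=2, offset=1.0):
    # Inner product shifted by an offset and raised to a power.
    return (dot(x, y) + offset) ** degree


def sigmoid_kernel(x, y, gamma=0.1, offset=0.0):
    # Hyperbolic tangent of a scaled inner product.
    return math.tanh(gamma * dot(x, y) + offset)


a = [1.0, 2.0]
b = [3.0, 4.0]
for kernel in (linear_kernel, gaussian_kernel, polynomial_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(a, b))
```

A learning algorithm such as a support vector machine only ever needs these pairwise scores, which is why the same algorithm can be reused with different kernels.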

The supported kernels for special data include:
Spectrum
Weighted Degree
Weighted Degree with Shifts
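
Kernels on special data work the same way, only the inputs are not numeric vectors. As a hedged sketch, the spectrum kernel can be written for strings (for example DNA sequences) by counting every substring of length k and taking the inner product of the counts. The function names and the default `k=3` below are assumptions for illustration, not Shogun's interface.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring (k-mer) that occurs in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has never seen.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


print(spectrum_kernel("ACGTACGT", "ACGTTTTT"))
print(spectrum_kernel("ACGTACGT", "GGGGGGGG"))
```

Two sequences that share many short substrings get a high score, and sequences with no k-mer in common score zero.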

MLlib for HADOOP, SCALA and Python:

MLlib is Apache’s own machine learning library for Spark and Hadoop. It contains a range of common algorithms and useful data types, designed to run at speed and scale.
Highlights of MLlib:

  • It is built on Apache Spark, a fast and general engine for large-scale data processing.
  • According to the Spark project, programs run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
  • Applications can be written quickly in Java, Scala, or Python.

It works with any Hadoop data source. Java is the primary language for working in MLlib, but Python users can connect MLlib with the NumPy library.

Scala users can also write code against MLlib. If setting up a Hadoop cluster is impractical, MLlib can be deployed on top of Spark without Hadoop.

Mahout for HADOOP:

The Mahout framework has long been tied to Hadoop, but many of its algorithms can also run as-is outside Hadoop. Mahout is primarily used to produce scalable machine learning algorithms. These algorithms are useful for stand-alone applications that might eventually be migrated into Hadoop, or for Hadoop projects that could be spun off into their own stand-alone applications.

It implements the machine learning techniques below:
Recommendation
Classification
Clustering
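
To give a feel for the first of these techniques, here is a small plain-Python sketch of user-based recommendation with cosine similarity. It does not use Mahout (which is a Java library), and the user names, item names, and ratings are made-up sample data.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' {item: rating} dictionaries."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[item] * ratings_b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated, weighted by similar users' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical sample data.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 4, "book_c": 5},
    "carol": {"book_d": 5},
}
print(recommend("alice", ratings))
```

A framework like Mahout earns its keep when the ratings table has millions of users and no longer fits this kind of in-memory loop.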

Companies such as Adobe, Facebook, LinkedIn, Foursquare, Twitter, and Yahoo use Mahout internally.

Accord Framework or AForge.net FOR .NET for image processing:

Accord is also a machine learning framework, used for signal processing, and it builds on AForge.net. “Signal processing,” by the way, refers here to a range of machine learning algorithms for images and audio, such as for seamlessly stitching together images or performing face detection.

AForge.net: These algorithms are used for reading video and for implementing functions such as the tracking of moving objects.

Accord: This framework also includes libraries that provide a more conventional gamut of machine learning functions, from neural networks to decision-tree systems.

H2O for BIG DATA, R and JSON APIs:

H2O is an open source math and machine learning engine for big data that brings distribution and parallelism to powerful algorithms while keeping the widely used languages of R and JSON as an API.

It has its own storage platform and provides a user-friendly interface for easy querying.
Users interact with H2O via a GUI that uses standard R statistical analysis syntax while running machine learning algorithms behind the scenes.
One notable feature is its in-memory distributed key-value store, which is designed to let H2O process data faster and at a larger scale than many other predictive analytics solutions.
It also offers fast, interactive, real-time predictive analytics, with applications across many different domains.

The speed and flexibility of H2O allow the user to fit hundreds or thousands of potential models as part of discovering patterns in data.

It has interfaces for Java, Scala, Python and R.

WHO IS USING H2O?

Users of H2O and 0xdata include Netflix, Rushcard, Trulia, and Vendavo for machine learning on their big datasets. 

Cloudera Oryx for HADOOP:

Cloudera's Oryx machine learning project is also designed for Hadoop. It allows machine learning models to be deployed on real-time streamed data, enabling projects like real-time spam filters or recommendation engines.

Oryx is mainly used to build recommendation systems.

GoLearn:
GoLearn was created to address the lack of an all-in-one machine learning library for Go. Its simplicity comes from the way data is loaded and handled in the library, which is patterned after SciPy and R.
Its data structures can also be extended easily.

Weka especially for DATA MINING:

Weka, a product of the University of Waikato, New Zealand, collects a set of Java machine learning algorithms engineered specifically for data mining. This GNU GPLv3-licensed collection has a package system to extend its functionality, with both official and unofficial packages available. Weka even comes with a book to explain both the software and the techniques used, so those looking to get a leg up on both the concepts and the software may want to start there.

While Weka isn’t aimed specifically at Hadoop users, it can be used with Hadoop thanks to a set of wrappers produced for the most recent versions of Weka. Note that it doesn’t yet support Spark, only MapReduce. Clojure users can also leverage Weka, thanks to the Clj-ml library.

CUDA-Convnet FOR Python:

By now most everyone knows how GPUs can crunch certain problems faster than CPUs. But applications don’t automatically take advantage of GPU acceleration; they have to be specifically written to do so. CUDA-Convnet is a machine learning library for neural-network applications, written in C++ to exploit Nvidia’s CUDA GPU processing technology. For those using Python rather than C++, the resulting neural nets can be saved as Python pickled objects and thus accessed from Python.
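
For readers who have not used pickling before, here is a minimal sketch of the save-and-reload round trip with Python's standard `pickle` module. The dictionary of layers and weights is a hypothetical stand-in, not CUDA-Convnet's actual checkpoint format.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained network: layer names mapped to weights.
trained_net = {
    "conv1": {"weights": [[0.1, -0.2], [0.3, 0.4]], "biases": [0.0, 0.1]},
    "fc2": {"weights": [[0.5], [-0.6]], "biases": [0.2]},
}

with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = Path(workdir) / "net.pkl"

    # Save: serialize the object to a binary file.
    with open(checkpoint_path, "wb") as f:
        pickle.dump(trained_net, f)

    # Load: read the file back into an equivalent Python object.
    with open(checkpoint_path, "rb") as f:
        restored_net = pickle.load(f)

print(sorted(restored_net))
```

Only unpickle files that come from a source you trust, because loading a pickle can execute arbitrary code.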

ConvNetJS for JAVASCRIPT:

This framework provides neural network machine learning libraries for use in JavaScript, which facilitates the use of the browser as a data workbench. An NPM version is also available for those using Node.js. The framework is designed to make proper use of JavaScript’s asynchronicity.
