Thursday, February 04, 2016

What is the method of Map and Reduce – How it works? (Part 1)


Hadoop MapReduce is a software framework from the Apache Software Foundation that processes huge datasets in parallel across a cluster of machines. The process has two basic steps: Map and Reduce. In this article, we will see how the Map and Reduce steps work. But first, let us review another important term: the MapReduce job.
A job is the top-level unit of work. A job typically consists of two phases, map and reduce, although the reduce phase is optional. A typical example of a MapReduce job is counting the occurrences of a specific word across thousands of documents. During the map phase, the occurrences of the word in each document are counted; the reduce phase then aggregates the per-document counts into a single total.
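The word-count job described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop code: the function names map_count and reduce_counts and the sample documents are made up for this example, and everything runs on a single machine.

```python
from collections import Counter
from functools import reduce


def map_count(document: str, word: str) -> int:
    """Map step: count how often `word` occurs in one document."""
    return Counter(document.lower().split())[word.lower()]


def reduce_counts(counts) -> int:
    """Reduce step: add the per-document counts into one total."""
    return reduce(lambda total, n: total + n, counts, 0)


# Hypothetical input: three tiny "documents".
documents = [
    "hadoop stores data and hadoop processes data",
    "map and reduce are the two phases",
    "a hadoop job runs map tasks first",
]

per_document = [map_count(doc, "hadoop") for doc in documents]
total = reduce_counts(per_document)
print(per_document, total)  # [2, 0, 1] 3
```

In a real cluster, each call to the map step would run as a separate task close to the data, and the framework would deliver the intermediate counts to the reduce step.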

The Map Phase

In the map phase, the MapReduce framework reads its input data, by default from the Hadoop Distributed File System (HDFS). The input data is divided into chunks, which are processed by several map tasks running in parallel across the Hadoop cluster.
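As a rough, single-machine sketch of this splitting, the Python snippet below divides some input lines into chunks and hands each chunk to its own map task. A thread pool stands in for the cluster here, and the helper names split_into_chunks and map_task, the chunk size and the sample lines are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Divide the input into consecutive chunks of at most `chunk_size` lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk):
    """One map task: count the words in its own chunk only."""
    return sum(len(line.split()) for line in chunk)


# Hypothetical input data.
input_lines = [
    "one two three",
    "four five",
    "six",
    "seven eight nine ten",
    "eleven",
]

chunks = split_into_chunks(input_lines, chunk_size=2)

# Worker threads play the role of cluster nodes in this sketch.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(map_task, chunks))

print(partial_results)  # [5, 5, 1]
```

Each map task sees only its own chunk, which is what allows the tasks to run independently and in parallel.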

Tuesday, February 02, 2016

C#: A Brief History of Microsoft’s Premier Language


C#: Where it all started

C# is the premier language of Microsoft’s famous .NET Framework. With the huge success of Java and its WORA (write once, run anywhere) feature, the big bosses at Microsoft started to think that they needed a programming language that could compete with Java. As a result, C# was developed as a strongly typed and fully object-oriented language. C# was first announced in 2000, and version 1.0 shipped in 2002 as an integral part of Microsoft’s .NET Framework. Before C#, a compiler system called Simple Managed C (SMC) was used for writing the class libraries. C# was one of the many languages designed for the Common Language Infrastructure.

Authors of C#

C# was developed by a team at Microsoft led by Anders Hejlsberg, who had previously worked on Turbo Pascal and Delphi. Development started in January 1999. The language was initially named COOL (C-like Object Oriented Language). Microsoft considered keeping COOL as the final name but dropped it for trademark reasons. By the time Microsoft announced the .NET project at the Professional Developers Conference in July 2000, the language had been renamed C#.

Criticism and Opinions

Sun Microsystems co-founder Bill Joy and Java’s author James Gosling severely criticized C#. They maintained that C# was a downgraded version of Java with the reliability, security and productivity removed. Angelika Langer and Klaus Kreft, authors of a C++ streams book, said that C# and Java are astoundingly similar and that C# lacks innovation: there is hardly anything that C# provides which Java doesn’t. However, at the July 2000 developer conference, Anders Hejlsberg responded that C# is much closer to C++ in its design than to Java.
Currently, C# 6.0 is the latest version of C#; it was released in July 2015.

P.S. BarCode Generator SDK is made using 100% C# code

Thursday, January 28, 2016

What is Hadoop and What it is for?

If you have heard of big data, you have probably heard of Hadoop as well. The two terms are often used together because Hadoop is the primary software framework currently used for distributed storage and distributed processing of huge amounts of data residing on clusters of computers built from commodity hardware.
Companies such as LinkedIn, Facebook, Yahoo and eBay currently use Hadoop to process huge amounts of data and extract useful information from it. Hadoop draws its inspiration from Google’s publications on Bigtable, MapReduce and the Google File System. The fact that Hadoop can run on very simple commodity hardware, such as an Intel-based PC with Linux and a few TB of hard disk, makes it one of the most popular big data processing platforms. The Hadoop framework consists of many tools; HDFS and Hadoop MapReduce are its two core components.
Hadoop Distributed File System (HDFS)
HDFS is the file system used by the Hadoop environment. It looks much like any other file system; the difference is that when a file is stored on HDFS, it is split into blocks and each block is replicated on three servers by default. This replication increases the fault tolerance of the file system.
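To make the idea of splitting and replication concrete, here is a toy Python model. It is not the real HDFS placement logic, which uses much larger blocks and takes racks into account; the node names, the tiny block size and the round-robin placement are assumptions chosen to keep the example readable.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes, replication: int = 3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for HDFS block placement.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical 10-byte file, 4-byte blocks, four servers.
data = b"x" * 10
blocks = split_into_blocks(data, block_size=4)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Because every block lives on three different servers in this model, losing any single server still leaves two copies of each block.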
Hadoop MapReduce
Hadoop MapReduce is a programming model in which a large processing request is split into multiple smaller requests that are sent to several servers and processed in parallel. This lets the work scale with the number of machines and CPUs available in the cluster.
Apart from these two core technologies, the Hadoop ecosystem includes many smaller tools and technologies such as HBase, the Lucene search engine and ZooKeeper, and languages such as Pig and Hive.


Tuesday, January 26, 2016

New software released: PDF Extractor SDK 6.20.2354, PDF Multitool 6.20.2354, PDF Renderer SDK 6.20.2354, PDF To HTML SDK 6.20.2354, PDF Viewer SDK 6.20.0.2354



What's new ByteScout PDF Extractor SDK 6.20.2354:

  • PDF To Text, PDF To CSV, PDF To XML functions improved
  • New Extract Video, Extract Audio examples
  • CSV and XML extractors improved support for tables with empty columns inside
  • new MultimediaExtractor to extract video and audio from PDF
  • new property PageDataCaching
  • new "MemoryCareProcessingOfHugeFiles" example
  • fixed null exception when trying to dispose already disposed pages
  • XLSExtractor: improves fonts support
  • SkipInvisibleText now skips clipped text (which is not visible)
  • text output rendering improved
  • XFDF Extractor: added support for checkboxes
  • Images output improved to support more sub-formats
  • Unicode text handling improved
What's new ByteScout PDF Renderer SDK 6.20.2354:
  • PDF To Image conversion improved
  • PDF reading speed improved
  • improved support for images and text
  • new property PageDataCaching controlling automatic disposing of previously accessed pages
  • added DisposePage()
  • new "MemoryCareProcessingOfHugeFiles" example
  • fixed null exception when trying to dispose already disposed pages
  • Annotations scaling and coordinates improved
  • Rendering: Added command line vbscript example
  • Rendering improves annotations rendering
  • Colors management improved
  • new "Make Thumbnail" example
  • Images output improved to support more sub-formats
What's new ByteScout PDF Viewer 6.20.0.2354:
  • PDF viewer control for WinForms improved
  • PDF reading speed improved
  • improved support for images and text
  • annotations rendering improved
  • Colors management improved
  • Images output improved to support more sub-formats
  • Unicode text handling improved
  • More PDF specific functions and PDF reading speed improvements
  • CSV and XML extractors improved support for tables with empty columns inside
  • new functions: Extract Video and Extract Audio
  • File annotations rendering improved
  • fixed null exception when trying to dispose already disposed pages
  • Annotations scaling and coordinates improved
  • XLSExtractor: improves fonts support
  • Rendering: Added command line vbscript example
  • Rendering improves annotations rendering
  • text output rendering improved
  • XFDF Extractor: added support for checkboxes
  • PDF2HTML improves form controls output
  • Images output improved to support more sub-formats
  • Unicode text handling improved
  • TIFF splitting improved

What's new ByteScout PDF To HTML SDK 6.20.2354:

  • PDF To HTML conversion improved
  • PDF reading speed improved
  • new property PageDataCaching controlling automatic disposing of previously accessed pages
  • implementing page caching type
  • example "Memory-Care Processing Of Huge Docs" renamed to "Reduce Memory Usage"
  • new DisposePage() method to HTMLExtractor
  • SkipInvisibleText now skips clipped text (which is not visible)
  • Rendering improves annotations rendering
  • Colors management improved
  • PDF2HTML improves form controls output
  • Images output improved to support more sub-formats
  • Unicode text handling improved

Tuesday, January 19, 2016

Big Data: What it is & what it is not (Part 2)

Read the 1st part of the article 

What is not big data?

Data that shows none of the three characteristics described in Part 1 isn’t considered big data. For instance, data that is modest in size, is generated at a pace a traditional DBMS can handle, and is well structured and relational isn’t big data. Data that exhibits at least one of the three Vs is considered big data.

How is big data collected?

Although anyone with specialized software and hardware can collect big data, it is primarily collected by researchers for research purposes and by business analysts, who extract useful information from it to support business decisions.

How much data is considered big data?

Numerically speaking, data that amounts to petabytes (1 petabyte = 1024 terabytes) or more is considered big data. However, different analysts define the magnitude of big data differently. Data is also called big data if it cannot be processed by a single machine, or if storing and processing it requires specialized tools. And if you need to hire a team of data scientists just to manipulate your data, consider it big data.
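The size arithmetic behind this rule of thumb is easy to check with a few lines of Python. The one-petabyte threshold below simply mirrors the figure quoted above; it is not an industry standard, and the function names are made up for this sketch.

```python
TERABYTE = 1024 ** 4          # bytes in one terabyte (binary units)
PETABYTE = 1024 * TERABYTE    # 1 petabyte = 1024 terabytes


def to_petabytes(size_in_bytes: int) -> float:
    """Convert a size in bytes to petabytes."""
    return size_in_bytes / PETABYTE


def is_big_by_volume(size_in_bytes: int, threshold: int = PETABYTE) -> bool:
    """Apply the article's rule of thumb: a petabyte or more counts as big."""
    return size_in_bytes >= threshold


# Hypothetical dataset sizes.
small_dataset = 500 * TERABYTE
large_dataset = 2048 * TERABYTE

print(to_petabytes(small_dataset), is_big_by_volume(small_dataset))
print(to_petabytes(large_dataset), is_big_by_volume(large_dataset))
```

Volume is only one of the criteria, so a dataset below the threshold may still qualify as big data on other grounds.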

Friday, January 15, 2016

Big Data: What it is & what it is not (Part 1)

Analysts have been processing data for centuries in order to extract useful information from it. With the advent of modern computers, data processing has been revolutionized.

Database management systems (DBMS) can efficiently store, manipulate, edit and extract useful information from data. However, with the advent of social media, online marketing and the digitalization of manual data entry systems, the amount of data has grown exponentially. It has become so large that existing systems are unable to process it. This is where the term big data comes in.


What is big data?

Industry analyst Doug Laney says that data having the following three characteristics, known as the 3 Vs, is considered big data.


Volume

The data is so large that it can’t be handled by existing relational DBMSs. One example is the data obtained from Twitter posts and other social media platforms.


Velocity

Data is generated so fast that it can’t be processed in real time by traditional systems.


Variety

Data comes in different varieties: structured data as well as unstructured data such as videos, images, email, financial data and stock ticker data.
Read the 2nd part of the article 

Tuesday, January 12, 2016

Differences between VB 6 and VB.NET & Migrating VB 6 code to Microsoft VB.NET

VB 6 is an older application development environment that targets the COM infrastructure, which in turn depends on components of the Windows platform. VB 6 is considered extremely simple because it doesn’t depend on outside components: the functions and features an application needs ship as runtime libraries alongside the VB 6 application code. VB 6 code can be compiled either to interpreted P-code or to native code.
On the other hand, Microsoft VB.NET is a newer-generation language that targets the .NET runtime and is part of Microsoft’s .NET Framework. VB.NET uses components from the .NET Framework, while the .NET runtime is responsible for executing VB.NET code. VB.NET code is first compiled into Microsoft Intermediate Language (MSIL), which is common to all .NET languages such as C# and J#. This also makes VB.NET suitable as a back-end language for ASP.NET applications. Following are some of the basic differences between VB.NET and VB 6.
  1. VB.NET uses the .NET Common Language Runtime, while VB 6 uses the VB runtime.
  2. VB.NET is a strongly typed or type-safe language, while VB 6 is not strongly typed.
  3. VB.NET is always compiled, first to intermediate language and then to native code by the runtime, while VB 6 can also run as interpreted P-code.


Migrating code from VB 6 to VB.NET
The following sites host blogs and articles that explain the process of converting VB 6 programs to VB.NET:
  • MSDN
  • Stack Overflow
  • CodeProject
  • VisualBasic.About.Com