Big Data – SQL query engines & data stores

Apache Hive – Provides structure for data already stored in a distributed system.

Apache Impala – A massively parallel processing (MPP) SQL query engine.

Apache HBase – A distributed, scalable big data store.

Apache Kudu – A distributed data storage engine that makes fast analytics on fast and changing data easy.

A Day in the life of a Big Data Architect


What’s happening?

About Me

Hi, I’m Vj, founder at learn with vj

I am a Big Data Architect, data technology blogger, and content creator, exploring data storage, data processing, and data extraction in the big data world. Expect the latest ideas and thoughts about data: trends, usage, enhancements, industry reviews, and an occasional video. I hope you enjoy!


  • Big Data: Series 1 Part 1 – Introduction to Big Data

    Introduction to Big Data: This video is an attempt to provide a high-level view of Big Data. I hope you enjoy the content. Click the YouTube link or the heading to watch.


  • Data Architecture – Series 1 Part 1 – Data Architecture Introduction

    Data Architecture refers to the art and science of building data structures and the process of building enterprise data structures. This series of videos introduces the elements of data architecture.


  • Apache Ozone: By Cloudera

    An object store for Big Data: “Ozone – Object Storage for Big Data” by Arpit Agarwal, Sr Engineering Manager, Storage Team, Cloudera. This demo is well worth watching if you are looking for on-premises object storage. Disclaimer: this demo is by Cloudera.


  • hdfs : Replication Factor

    Replication factor: How do you change the replication factor of an existing file in HDFS? The reliability of the blocks in the file system is governed by a property called the replication factor. In a production cluster the replication factor is typically set to three, which is also the default, which means […]
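
As a rough illustration of the question above, here is a minimal sketch that builds the `hdfs dfs -setrep` command for an existing file. The helper name `build_setrep_command` and the sample path are hypothetical, and the sketch only assembles the argument list; it does not contact a cluster.

```python
def build_setrep_command(path, replication=3, wait=True):
    """Build the argument list for `hdfs dfs -setrep` (hypothetical helper)."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    command = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w asks the shell to wait until the target replication is reached
        command.append("-w")
    command += [str(replication), path]
    return command


# The path below is an assumption made for illustration only.
print(" ".join(build_setrep_command("/user/vj/sample.txt", replication=2)))
```

On a machine with the Hadoop client installed, the returned list could be passed to `subprocess.run`; that step is left out here so the sketch runs anywhere.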


  • hdfs : finding block size

    Finding Block Size: finding the default block size, and finding the block size using the hdfs-site.xml file.
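
One way to read the block size from hdfs-site.xml is to look up the `dfs.blocksize` property. The sketch below is only an assumption about how that could be done in Python: the XML is a made-up inline sample rather than a real cluster file, `find_property` is a hypothetical helper, and the value is assumed to be a plain number of bytes.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for a real hdfs-site.xml file.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
"""


def find_property(xml_text, key):
    """Return the <value> of the first <property> whose <name> equals key."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None  # property not set in this file


block_size = int(find_property(SAMPLE_HDFS_SITE, "dfs.blocksize"))
print(block_size // (1024 * 1024), "MB")
```

If the property is missing from the file, the cluster falls back to its built-in default, so a `None` result here does not mean the block size is undefined.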


  • Productivity Tips – Time management

    Time management is the most important factor for productivity. Everyone has only 24 hours in a day; some people make that time productive and others less so. In this video, let us look at some tips and tricks for using time in a timely and productive way.


  • Productivity Tips – Mindfulness

    Mindfulness is the psychological process of purposely bringing one’s attention to experiences occurring in the present moment without judgment, which one develops through the practice of meditation and through other training.


  • 5 Python Conditional Statement 4 – If..else..elif

    If..else..elif: code

        if person_age < 4:
            ticket_price = 0
        elif person_age < 18:
            ticket_price = 5
        elif person_age < 65:
            ticket_price = 10
        else:
            ticket_price = 5

    Python does not require an else block at the end of an if-elif chain. Sometimes an else block is useful; sometimes you can avoid using it.
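
To illustrate the point that the final else is optional, here is a minimal sketch of the same chain with the else replaced by an explicit elif. Wrapping it in a function named `ticket_price` is an assumption made for the example, not part of the original post.

```python
def ticket_price(person_age):
    """Same pricing chain, ending in an explicit elif instead of else."""
    if person_age < 4:
        price = 0
    elif person_age < 18:
        price = 5
    elif person_age < 65:
        price = 10
    elif person_age >= 65:
        # Explicit condition: only ages of 65 and above reach this branch
        price = 5
    return price


for age in (3, 10, 30, 70):
    print(age, ticket_price(age))
```

The explicit last condition documents exactly which values the branch is meant for, whereas a bare else would silently catch anything the earlier tests missed.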


  • 5 Python Conditional Statements 3 – Condition operators or symbols

    Operators evaluation:

    1) == (equal to) operator: True when the values match, False when they don’t. String comparison is case-sensitive.

        'CASE' == 'CASE' (matches, so True)
        Name = 'Vijay'
        Name == 'vijay' (False)
        Age = 18
        Age == 18 (True)

    2) != (not equal to) operator: True when the values don’t match, False when they match.

        'CASE' != 'case' (does not match, so True […]
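
The comparisons described above can be checked directly. This is a small sketch using the same sample values; the `checks` dictionary and the `.lower()` comparison are additions made for illustration.

```python
name = 'Vijay'
age = 18

# Each entry maps the expression as text to its evaluated result.
checks = {
    "name == 'vijay'": name == 'vijay',                  # case differs
    "name.lower() == 'vijay'": name.lower() == 'vijay',  # case normalized first
    "name != 'vijay'": name != 'vijay',                  # values differ
    "age == 18": age == 18,
    "age != 18": age != 18,
}

for expression, result in checks.items():
    print(f"{expression:<26} -> {result}")
```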


  • 5 Python Conditional statements 2 – Simple If..else

    Simple If..else statement: code

        cities = ['LA', 'Chicago', 'kansas city', 'columbia', 'Boston']
        for city in cities:
            if city == 'Boston':
                print(city.upper())
            else:
                print(city.title())

    Results:

        La
        Chicago
        Kansas City
        Columbia
        BOSTON

    At the heart of every if statement is an expression that can be evaluated as True or False, called a conditional test. If a conditional test evaluates to True, […]


  • 5 Python Condition Statements 1 – Intro

    Programming often involves examining a set of conditions and deciding which action to take based on those conditions. Python’s if statement allows you to examine the current state of a program and respond appropriately to that state. Programming statements: if…elif…else. If..else statements can be used with variables, lists, arrays, and more.


  • 4 Python List – Pointing to the same List

    Assigning a variable to a list: code

        # BOTH VARIABLES POINTING TO THE SAME LIST
        # Any change made to the list appears in both variables
        # VAR1 = VAR2 – there is no [:] used in this case
        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods

        # APPEND
        print("Append […]
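
The difference between pointing at the same list and copying it can be seen side by side. This is a minimal sketch; the variable names `same_list` and `separate_copy` are made up for the example.

```python
my_favorite_foods = ['pizza', 'rice', 'cake']

same_list = my_favorite_foods         # plain assignment: a second name for one list
separate_copy = my_favorite_foods[:]  # slice: a new list with the same items

my_favorite_foods.append('fruits')

print(same_list is my_favorite_foods)      # both names refer to one object
print(separate_copy is my_favorite_foods)  # the copy is a different object
print(same_list)
print(separate_copy)
```

Only `same_list` sees the appended item, because the slice copy was taken before the append and lives on as an independent list.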


  • 4 Python List – Appending to a list

    Appending: code

        # APPENDING TO LIST
        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods[:]

        # APPEND
        print("Append new food items")
        my_favorite_foods.append('fruits')
        friends_favorite_foods.append('ice cream')

        # PRINT
        print("My favorite foods are:")
        print(my_favorite_foods)
        print("\nMy friend's favorite foods are:")
        print(friends_favorite_foods)

    Results:

        Append new food items
        My favorite foods are:
        ['pizza', 'rice', 'cake', 'fruits']
        My […]


  • 4 Python Lists – Copying Lists

    COPYING LIST: code

        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods[:]
        print("My favorite foods are:")
        print(my_favorite_foods)
        print("\nMy friend's favorite foods are:")
        print(friends_favorite_foods)

    Results:

        Copying list from one to another
        My favorite foods are:
        ['pizza', 'rice', 'cake']

        My friend's favorite foods are:
        ['pizza', 'rice', 'cake']


  • 4 Python List – Slicing the List

    SLICING a LIST: code

        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']
        print("Here are the three players from 2ND PLAYER ONWARDS on my team:")
        print(players[1:4])

        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']
        print("Here are the FIRST THREE players on my team:")
        print(players[0:3])

        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']
        print("Here are the FIRST FOUR players on […]


  • 4 Python LIST – Looping through a slice

    Looping Through a SLICE: code

        print("Looping through SLICE code")
        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']
        print("Here are the first three players on my team:")
        for player in players[:3]:
            print(player.title())

    Results:

        Looping through SLICE code
        Here are the first three players on my team:
        Vijay
        Mohan
        Vasu


  • Traditional DBMS vs Big data

    In traditional relational database management systems, data was often moved to the computational space for processing. In the Big Data space, the computation is instead brought to where the data is located, so processing can happen in near real time. A key feature of these types of real-time notifications is that they enable real-time actions. However, using such a capability would require you to approach your application and your […]


  • IOT: Internet of things

    Big data is generated by machines. It’s everywhere, and there’s a lot of it. If you look at all sources of big data, machine data is the largest. This complex data-sensing capability is what makes devices smart, and the resulting data is called smart data. A smartphone gives you a way to track many things, including your geo-location, and […]


  • Big data: source and value.

    The main sources of Big Data are data generated by machines, people, and organisations. By machine-generated data, we refer to data generated from real-time sensors in industrial machinery or vehicles, logs that track user behaviour online, environmental sensors or personal health trackers, and many other sensor data sources. By human-generated data, we refer […]


  • 6. Introduction to DWBI – Goals of Data Warehouse

    Building a Data Warehouse – Purpose & Goals of Data Warehouse: Focus on fundamentals before delving into the details of modeling and implementation. Listen to the business and understand its goals and the current pain points in reporting. The data warehouse must make the organizational information readily available and […]


  • 5. Introduction to DW&BI – ETL Extract Transform and Load

    ETL – Extract, Transform, and Load: ETL is a process combined with technology. Using this process, data is extracted from all sources, transformed according to the user requirements for reporting, and finally loaded into the data warehouse. ETL is […]
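
The three steps can be sketched in a few lines. This is a toy illustration under stated assumptions, not the approach taken in the post: the source is a made-up inline CSV, the transformation rules are invented, and an in-memory SQLite table named `fact_orders` stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Made-up source data standing in for an operational system export.
SOURCE_CSV = """order_id,amount,country
1,10.50,us
2,7.25,in
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and standardize country codes for reporting."""
    return [
        (int(row["order_id"]), float(row["amount"]), row["country"].upper())
        for row in rows
    ]


def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT country, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each stage in its own function mirrors the process described above and makes it easy to swap the source or the target independently.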


  • 4.Introduction to DW & BI – DWBI Cycle

    DWBI Cycle – Data Warehousing and Business Intelligence: Real Business Intelligence passes through a data warehouse and uses the stored data to produce information, knowledge, and plans that lead to effective business actions. Business Intelligence is therefore a combination of process, technology, and tools: it encompasses the data warehouse, tools, and visual knowledge management. Among all, […]


  • 3.Introduction to DW&BI – Kimball or Inmon Model

    Data warehouse – Kimball or Inmon model. The following information provides an understanding of the basics of two different approaches to data warehouse modeling. History of Data Warehouse: In 1990 Inmon wrote a book, “Building the Data Warehouse.” Inmon defines an architecture for the collection of disparate sources into a detailed, time-variant data store (The […]


  • 2.Introduction to DW&BI – Why DWBI?

    Why DWBI: Executives and managers running a business are challenged to find new avenues to improve the business and its economic conditions. The challenge is to do more with less and to make better decisions in a competitive industry. A DWBI environment provides access to actionable data so they can act in a short time. The impressive reputation of DWBI in the […]


  • 1.Introduction to DW & BI – What is Data Warehouse & Business Intelligence

    Introduction to Data Warehouse and Business Intelligence – What is a Data Warehouse? Defining a data warehouse is very simple. A data warehouse is a data repository where all relevant enterprise data is stored, and which serves as a single source for enterprise reports, analysis, and presentation through ad-hoc reports, canned reports, portals, and dashboards. What is […]


  • SSDB – Part 11. ETL – Redefining Nullable Values for Enhanced Reporting

    Redefining Nullable Values for Enhanced Reporting: click to continue reading…


  • SSDB – Part 10. ETL – Combining Data from Multiple Tables

    Combining Data from Multiple Tables: click to continue reading…


  • SSDB – Part 9. ETL – Reformatting Data to Make it More Readable

    Reformatting Data to Make it More Readable: click to continue reading…


  • SSDB – Part 8. ETL – Converting Data types for consistency

    Converting Data Types for Consistency: read or view the video…


  • TSQL – Part 1(d) – Working with NULLS

    NULL is used to indicate an unknown or missing value. NULL is not equivalent to zero or an empty string. Arithmetic or string […] Click to read…


  • TSQL – Part 1(c) – Data Types

    Transact-SQL supports a wide range of data types, which can be broadly categorized as exact numeric, approximate numeric, character, date/time, binary, and other. Click to read…


  • TSQL – Part 1(b) – SELECT statement

    Use the SELECT statement to retrieve a rowset of data from tables and views in a database. Click to read…


  • TSQL – Part 1(a) – Introduction to Transact SQL

    Transact-SQL is an essential skill for database professionals and developers working with Microsoft SQL Server. Click to read…


  • SSDB – Part 4. ETL – Slowly Changing Dimension SCD Type 2 and 3

    Extract, Transform, and Load using SQL Server database stored procedures.

