Featured

Big Data – SQL query engines & data stores

Apache Hive – projects structure onto data already held in distributed storage, so it can be queried with SQL.

Apache Impala – a massively parallel processing (MPP) SQL query engine.

Apache HBase – the Hadoop database: a distributed, scalable, big data store.

Apache Kudu – a distributed data storage engine that makes fast analytics on fast and changing data easy.

A Day in the life of a Big Data Architect

About Me

Hi, I’m Vj, founder at learn with vj.

I am a Big Data Architect, data technology blogger, and content creator, exploring data storage, data processing, and data extraction in the big data world. Expect the latest ideas and thoughts about data: data trends, usage, enhancements, industry reviews, and the occasional video. I hope you enjoy it!

LIFESTYLE

  • Big Data: Series 1 Part 1 – Introduction to Big Data

    Introduction to Big Data: this video provides a high-level view of Big Data. I hope you enjoy the content.

  • Data Architecture – Series 1 Part 1 – Data Architecture Introduction

    Data Architecture refers to the art and science of building data structures and the process of building enterprise data structures. This video series introduces the elements of data architecture.

  • Apache Ozone: By Cloudera

    Ozone – Object Storage for Big Data, presented by Arpit Agarwal, Sr. Engineering Manager, Storage Team, Cloudera. This demo is worth watching if you are looking for on-premises object storage. Disclaimer: this demo is by Cloudera.

  • hdfs : Replication Factor

    Replication factor: how do you change the replication factor of an existing file in HDFS? The reliability of the file system's blocks is governed by a property called the replication factor. In a production cluster the replication factor is typically three, which is also the default value; this means […]
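
    One standard way to change the replication factor of a file that already exists is the `hdfs dfs -setrep` shell command (the cluster-wide default for new files comes from the `dfs.replication` property). Below is a minimal Python sketch that builds and runs that command. The helper names and the path `/data/sample.txt` are hypothetical, and actually running it assumes a configured Hadoop client on the PATH.

```python
import shlex
import subprocess


def build_setrep_command(path, replication, wait=True):
    """Build an `hdfs dfs -setrep` command for an existing file or directory."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w waits until the blocks actually reach the new replication
        # factor, which can take a while for large files.
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


def set_replication(path, replication, wait=True):
    """Run the command; needs the Hadoop `hdfs` client on PATH."""
    cmd = build_setrep_command(path, replication, wait)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path: lower one file from the default of 3 replicas to 2.
    print(shlex.join(build_setrep_command("/data/sample.txt", 2)))
```

    Keeping command construction separate from execution makes the command easy to inspect or log before anything touches the cluster.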

  • hdfs : finding block size

    Finding the block size: how to find the default block size, and how to find the block size from the hdfs-site.xml file.
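
    On a live cluster, `hdfs getconf -confKey dfs.blocksize` prints the configured value. The sketch below instead reads `dfs.blocksize` straight from hdfs-site.xml text, accepting either plain bytes or the k/m/g/t/p/e size suffixes Hadoop allows for this property. The 128 MB fallback assumes the Hadoop 2.x+ default; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed fallback: the Hadoop 2.x+ default for dfs.blocksize (128 MB).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024

# Binary size suffixes accepted for dfs.blocksize (case-insensitive).
_SUFFIXES = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5, "e": 6}


def parse_size(value):
    """Convert '134217728', '128m', '1G', ... into a number of bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * 1024 ** _SUFFIXES[value[-1]]
    return int(value)


def block_size_from_xml(xml_text):
    """Return dfs.blocksize in bytes from hdfs-site.xml content."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.blocksize":
            return parse_size(prop.findtext("value", ""))
    return DEFAULT_BLOCK_SIZE  # property not overridden in this file


sample = """<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>256m</value>
  </property>
</configuration>"""

print(block_size_from_xml(sample))  # 268435456
```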

  • Productivity Tips – Time management

    Time management is the most important factor in productivity. Everyone has only 24 hours in a day; some people use that time productively and others less so. In this video, let's look at some tips and tricks for making use of time in a timely and productive way.

  • Productivity Tips – Mindfulness

    Mindfulness is the psychological process of purposely bringing one’s attention to experiences occurring in the present moment without judgment, which one develops through the practice of meditation and through other training.

  • 5 Python Conditional Statement 4 – If..else..elif

    If..else..elif: code

        if person_age < 4:
            ticket_price = 0
        elif person_age < 18:
            ticket_price = 5
        elif person_age < 65:
            ticket_price = 10
        else:
            ticket_price = 5

    Python does not require an else block at the end of an if-elif chain. Sometimes an else block is useful; sometimes you can avoid using one.
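
    To illustrate leaving out the else block, here is the same price chain wrapped in a small function (the name `ticket_price_for` is hypothetical) that ends with an explicit `elif` for the 65-and-over case. This makes the last condition visible instead of letting `else` catch everything else.

```python
def ticket_price_for(person_age):
    """Return the ticket price for an age using the tiers from the post."""
    if person_age < 4:
        return 0
    elif person_age < 18:
        return 5
    elif person_age < 65:
        return 10
    elif person_age >= 65:  # explicit final test instead of a catch-all else
        return 5


for age in (2, 12, 40, 70):
    print(age, ticket_price_for(age))
```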

  • 5 Python Conditional Statements 3 – Condition operators or symbols

    Evaluating operators:

    1) == (equal to): True when the values match, False when they don't. 'CASE' == 'CASE' is True. With Name = 'Vijay', Name == 'vijay' is False, because string comparison is case-sensitive. With Age = 18, Age == 18 is True.

    2) != (not equal to): True when the values don't match, False when they do. 'CASE' != 'case' is True […]
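
    A short, runnable check of the comparisons described above, plus the case-insensitive idiom of lower-casing before comparing. The variable names mirror the post's examples.

```python
name = 'Vijay'
age = 18

# == is True only when both sides are equal; string comparison is case-sensitive.
print(name == 'vijay')          # False
print(name.lower() == 'vijay')  # True: normalise case before comparing
print(age == 18)                # True

# != is the opposite of ==: True when the values differ.
print('CASE' != 'case')         # True
print(age != 18)                # False

# The ordering operators used in the if..elif example work the same way.
print(age < 18, age <= 18, age > 65, age >= 65)  # False True False False
```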

  • 5 Python Conditional statements 2 – Simple If..else

    Simple if..else statement: code

        cities = ['LA', 'Chicago', 'kansas city', 'columbia', 'Boston']
        for city in cities:
            if city == 'Boston':
                print(city.upper())
            else:
                print(city.title())

    Results:

        La
        Chicago
        Kansas City
        Columbia
        BOSTON

    At the heart of every if statement is an expression that evaluates to True or False; it is called a conditional test. If a conditional test evaluates to True, […]
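
    The test `city == 'Boston'` above only matches that exact spelling. A small variant, assuming you want 'boston' or 'BOSTON' to match as well, lower-cases each name inside the conditional test. The function name and the `highlight` parameter are hypothetical.

```python
def format_cities(cities, highlight='boston'):
    """Upper-case the highlighted city and title-case the rest.

    The conditional test lower-cases each name first, so 'Boston',
    'boston' and 'BOSTON' all match.
    """
    formatted = []
    for city in cities:
        if city.lower() == highlight:
            formatted.append(city.upper())
        else:
            formatted.append(city.title())
    return formatted


cities = ['LA', 'Chicago', 'kansas city', 'columbia', 'boston']
for name in format_cities(cities):
    print(name)
```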

  • 5 Python Condition Statements 1 – Intro

    Programming often involves examining a set of conditions and deciding which action to take based on those conditions. Python’s if statement allows you to examine the current state of a program and respond appropriately to that state. Programming statements: if…elif…else. If..else statements can be used with variables, lists, arrays, and more.
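
    As a sketch of responding to a program's current state, the hypothetical `check_order` function below uses an if statement on a list (an empty list counts as False) and the `in` membership test against another list. The menu and order items are made up for illustration.

```python
def check_order(order, menu):
    """Return a message for each item, or a single message if the order is empty."""
    if not order:  # an empty list evaluates to False in a conditional test
        return ["Your order is empty."]
    messages = []
    for item in order:
        if item in menu:  # membership test against a list
            messages.append(f"Adding {item}.")
        else:
            messages.append(f"Sorry, we don't have {item}.")
    return messages


menu = ['pizza', 'rice', 'cake']
for message in check_order(['pizza', 'tacos'], menu):
    print(message)
```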

  • 4 Python List – Pointing to the same List

    Assigning a variable to a list: code. Both variables point to the same list, so any change made to the list through one variable appears in the other as well. The assignment is simply var1 = var2; no [:] slice is used in this case.

        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods
        # APPEND
        print("Append […]
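
    A quick way to confirm that plain assignment shares one list: the `is` operator checks object identity, and appending through either name changes what both names see.

```python
my_favorite_foods = ['pizza', 'rice', 'cake']
friends_favorite_foods = my_favorite_foods  # no [:], so no copy is made

# Both names refer to one list object.
print(friends_favorite_foods is my_favorite_foods)  # True

my_favorite_foods.append('fruits')
friends_favorite_foods.append('ice cream')

# Both appends went to the same list, so both names show both items.
print(my_favorite_foods)       # ['pizza', 'rice', 'cake', 'fruits', 'ice cream']
print(friends_favorite_foods)  # ['pizza', 'rice', 'cake', 'fruits', 'ice cream']
```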

  • 4 Python List – Appending to a list

    Appending: code

        # APPENDING TO A LIST
        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods[:]

        # APPEND
        print("Append new food items")
        my_favorite_foods.append('fruits')
        friends_favorite_foods.append('ice cream')

        # PRINT
        print("My favorite foods are:")
        print(my_favorite_foods)
        print("\nMy friend's favorite foods are:")
        print(friends_favorite_foods)

    Results:

        Copying list from one to another
        Append new food items
        My favorite foods are:
        ['pizza', 'rice', 'cake', 'fruits']
        My […]
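
    Because `[:]` builds a new list, each append above lands in only one of the two lists. This sketch makes that explicit with an identity check and shows the final contents of both lists.

```python
my_favorite_foods = ['pizza', 'rice', 'cake']
friends_favorite_foods = my_favorite_foods[:]  # [:] builds a new list

print(friends_favorite_foods is my_favorite_foods)  # False: two separate lists

my_favorite_foods.append('fruits')
friends_favorite_foods.append('ice cream')

# Each list received only its own new item.
print(my_favorite_foods)       # ['pizza', 'rice', 'cake', 'fruits']
print(friends_favorite_foods)  # ['pizza', 'rice', 'cake', 'ice cream']
```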

  • 4 Python Lists – Copying Lists

    Copying a list: code

        print("Copying list from one to another")
        my_favorite_foods = ['pizza', 'rice', 'cake']
        friends_favorite_foods = my_favorite_foods[:]
        print("My favorite foods are:")
        print(my_favorite_foods)
        print("\nMy friend's favorite foods are:")
        print(friends_favorite_foods)

    Results:

        Copying list from one to another
        My favorite foods are:
        ['pizza', 'rice', 'cake']

        My friend's favorite foods are:
        ['pizza', 'rice', 'cake']
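
    The `[:]` slice is one of several equivalent ways to copy a list. All of them are shallow copies: a list nested inside the list is still shared. The sketch below shows this and uses `copy.deepcopy` from the standard library when nested lists must be independent too. The `meals` data is made up for illustration.

```python
import copy

original = ['pizza', 'rice', 'cake']

# Three equivalent ways to make a new, independent top-level list.
by_slice = original[:]
by_method = original.copy()
by_constructor = list(original)

for duplicate in (by_slice, by_method, by_constructor):
    assert duplicate == original and duplicate is not original

# All three are shallow copies: nested objects are still shared.
meals = [['pizza', 'rice'], ['cake']]
shallow = meals[:]
shallow[0].append('salad')
print(meals[0])  # ['pizza', 'rice', 'salad']: the inner list is shared

deep = copy.deepcopy(meals)
deep[0].append('soup')
print(meals[0])  # ['pizza', 'rice', 'salad']: unchanged by the deep copy
```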

  • 4 Python List – Slicing the List

    Slicing a list: code

        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']

        print("Here are the three players from 2ND PLAYER ONWARDS on my team:")
        print(players[1:4])

        print("Here are the FIRST THREE players on my team:")
        print(players[0:3])

        print("Here are the FIRST FOUR players on […]
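
    A compact tour of slice rules, using the same player list: the stop index is exclusive, either bound can be omitted, negative indices count from the end, and a third value sets the step.

```python
players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']

# The stop index is exclusive: [1:4] returns indices 1, 2 and 3.
print(players[1:4])   # ['Mohan', 'Vasu', 'Sai']
# Omitting the start begins at index 0; omitting the stop runs to the end.
print(players[:3])    # ['Vijay', 'Mohan', 'Vasu']
print(players[2:])    # ['Vasu', 'Sai', 'Chandika', 'Shreya']
# Negative indices count from the end of the list.
print(players[-3:])   # ['Sai', 'Chandika', 'Shreya']
# A third value sets the step: every second player.
print(players[::2])   # ['Vijay', 'Vasu', 'Chandika']
# Slicing never raises IndexError; out-of-range bounds are clipped.
print(players[4:10])  # ['Chandika', 'Shreya']
```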

  • 4 Python LIST – Looping through a slice

    Looping through a slice: code

        print("Looping through SLICE code")
        players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']
        print("Here are the first three players on my team:")
        for player in players[:3]:
            print(player.title())

    Results:

        Looping through SLICE code
        Here are the first three players on my team:
        Vijay
        Mohan
        Vasu
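
    If you also want a position number while looping over the slice, `enumerate` pairs each item with a counter; `start=1` makes the numbering begin at 1.

```python
players = ['Vijay', 'Mohan', 'Vasu', 'Sai', 'Chandika', 'Shreya']

# enumerate() pairs each player in the slice with a position number;
# start=1 makes the numbering human-friendly.
lines = []
for position, player in enumerate(players[:3], start=1):
    lines.append(f"{position}. {player.title()}")

print("\n".join(lines))
```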

  • Traditional DBMS vs Big data

    In traditional relational database management systems, data was often moved to where the computation ran. In big data systems, the computation is instead brought to where the data is located, which avoids shipping large volumes of data over the network and makes near-real-time processing practical. A key feature of these types of real-time notifications is that they enable real-time actions. However, using such a capability would require you to approach your application and your […]
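
    To make "bringing the computation to the data" concrete, here is a toy, hypothetical sketch of locality-aware task placement. It is not how any particular scheduler (such as Hadoop's) is implemented. Each block lists the nodes holding a replica, and a task is placed on one of those nodes when one is free; otherwise it falls back to a node that must read the block remotely. Block ids and node names are made up.

```python
def assign_tasks(block_locations, free_nodes):
    """Assign one task per block, preferring a node that already stores it.

    block_locations maps a block id to the nodes holding a replica;
    free_nodes lists nodes with spare capacity (one task each here).
    Returns {block_id: (node, is_local)}.
    """
    available = list(free_nodes)
    plan = {}
    for block, replicas in block_locations.items():
        if not available:
            break  # no capacity left in this toy model
        local = [node for node in replicas if node in available]
        # Ship the computation to the data when possible; otherwise fall
        # back to any free node, which must then read the block remotely.
        node = local[0] if local else available[0]
        available.remove(node)
        plan[block] = (node, bool(local))
    return plan


blocks = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-b", "node-c", "node-d"],
    "blk_3": ["node-e", "node-f", "node-g"],
}
plan = assign_tasks(blocks, ["node-a", "node-b", "node-c"])
for block, (node, is_local) in plan.items():
    print(block, "->", node, "(local)" if is_local else "(remote read)")
```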

  • IOT: Internet of things

    Big data generated by machines: it’s everywhere, and there’s a lot of it. Of all the sources of big data, machine data is the largest. This ability to sense complex data is what makes devices smart, and the data they produce is called smart data. A smartphone gives you a way to track many things, including your geo-location, and […]

  • Big data: source and value.

    The main sources of Big Data are data generated by machines, by people, and by organisations. By machine-generated data, we mean data from real-time sensors in industrial machinery or vehicles, logs that track user behaviour online, environmental sensors, personal health trackers, and many other sensor data sources. By human-generated data, we refer […]

  • 6. Introduction to DWBI – Goals of Data Warehouse

    Building a data warehouse. Purpose and goals of a data warehouse: focus on the fundamentals before delving into the details of modeling and implementation. Listen to the business to understand its goals and its current pain points in reporting on business needs. The data warehouse must make the organization's information readily available and […]

  • 5. Introduction to DW&BI – ETL Extract Transform and Load

    ETL – Extract, Transform, and Load: ETL is a process combined with technology. In this process, data is extracted from all sources, transformed according to the users' reporting requirements, and finally loaded into the data warehouse. ETL is […]
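
    The three ETL steps can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: an in-memory CSV string stands in for the source system, an in-memory SQLite database stands in for the data warehouse, and the table, columns, and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a file,
# an API, or an operational database.
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2023-01-05
2,BOB,80,2023-01-06
3,carol,,2023-01-06
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert types, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # example reporting rule: skip incomplete orders
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

    Keeping each step in its own function mirrors the E, T, and L stages, so each can be tested or replaced independently.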

  • 4.Introduction to DW & BI – DWBI Cycle

    DWBI cycle. Data warehousing and business intelligence: real business intelligence passes through a data warehouse and uses the stored data to produce the information, knowledge, and plans that drive effective business actions. Business intelligence is therefore a combination of process, technology, and tools: it encompasses the data warehouse, the tools, and visual knowledge management. Among all, […]

  • 3.Introduction to DW&BI – Kimball or Inmon Model

    Data warehouse: Kimball or Inmon model. The following provides an understanding of the basics of two different approaches to data warehouse modeling. History of the data warehouse: in 1990, Inmon wrote the book "Building the Data Warehouse." Inmon defines an architecture for collecting disparate sources into a detailed, time-variant data store (The […]

  • 2.Introduction to DW&BI – Why DWBI?

    Why DWBI? Executives and managers are challenged to find new avenues to improve business and economic conditions. The challenge is to do more with less and to make better decisions in a competitive industry. A DWBI environment provides access to actionable data so the business can act quickly. The impressive reputation of DWBI in the […]

  • 1.Introduction to DW & BI – What is Data Warehouse & Business Intelligence

    Introduction to data warehouse and business intelligence. What is a data warehouse? Defining a data warehouse is simple: a data warehouse is a data repository where all relevant enterprise data is stored, serving as a single source for enterprise reporting, analysis, and presentation through ad-hoc reports, canned reports, portals, and dashboards. What is […]

  • SSDB – Part 11. ETL – Redefining Nullable Values for Enhanced Reporting

    This part covers redefining nullable values for enhanced reporting.
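
    A common way to redefine NULLs for reporting is `COALESCE`, which returns its first non-NULL argument (T-SQL supports `COALESCE` as well as `ISNULL`). This sketch uses Python's built-in SQLite as a stand-in for SQL Server; the `customers` table, its columns, and the 'Unknown' and 0 defaults are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, region TEXT, discount REAL);
    INSERT INTO customers VALUES
        (1, 'Alice', 'West', 0.10),
        (2, 'Bob',   NULL,   NULL);
""")

# COALESCE returns its first non-NULL argument, so missing values get a
# report-friendly default instead of an empty cell.
rows = conn.execute("""
    SELECT name,
           COALESCE(region, 'Unknown') AS region,
           COALESCE(discount, 0)       AS discount
    FROM customers
    ORDER BY id
""").fetchall()
print(rows)  # [('Alice', 'West', 0.1), ('Bob', 'Unknown', 0)]
```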

  • SSDB – Part 10. ETL – Combining Data from Multiple Tables

    This part covers combining data from multiple tables.
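
    Combining tables is usually done with joins. The sketch below, again using SQLite through Python as a stand-in for SQL Server, contrasts an INNER JOIN (only matching rows) with a LEFT JOIN (every row from the left table, with NULLs where there is no match). The `products` and `sales` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, qty INTEGER);
    INSERT INTO products VALUES (1, 'Laptop'), (2, 'Monitor'), (3, 'Keyboard');
    INSERT INTO sales VALUES (10, 1, 2), (11, 2, 1), (12, 1, 1);
""")

# INNER JOIN keeps only products that have at least one sale.
inner = conn.execute("""
    SELECT p.product_name, SUM(s.qty)
    FROM products AS p
    JOIN sales AS s ON s.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# LEFT JOIN keeps every product; unmatched sales columns are NULL, and the
# SUM of only NULLs is NULL (None in Python).
left = conn.execute("""
    SELECT p.product_name, SUM(s.qty)
    FROM products AS p
    LEFT JOIN sales AS s ON s.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(inner)  # [('Laptop', 3), ('Monitor', 1)]
print(left)   # [('Keyboard', None), ('Laptop', 3), ('Monitor', 1)]
```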

  • SSDB – Part 9. ETL – Reformatting Data to Make it More Readable

    This part covers reformatting data to make it more readable.

  • SSDB – Part 8. ETL – Converting Data types for consistency

    This part covers converting data types for consistency, with an accompanying video.
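
    Source extracts often arrive with every column as text, and `CAST` gives each column one consistent type before loading (T-SQL also offers `CONVERT`). The sketch uses SQLite through Python as a stand-in, with a hypothetical `staging_orders` table. One caveat: SQLite's `CAST` is lenient (for example, `CAST('abc' AS INTEGER)` returns 0), whereas SQL Server raises a conversion error, so validate input before relying on the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id TEXT, amount TEXT);
    INSERT INTO staging_orders VALUES ('1', '120.50'), ('2', '80');
""")

# Every staging column is text; CAST gives each column one consistent type
# before the rows are loaded into the target table.
rows = conn.execute("""
    SELECT CAST(order_id AS INTEGER), CAST(amount AS REAL)
    FROM staging_orders
    ORDER BY order_id
""").fetchall()
print(rows)  # [(1, 120.5), (2, 80.0)]
```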

  • TSQL – Part 1(d) – Working with NULLS

    NULL is used to indicate an unknown or missing value. NULL is not equivalent to zero or an empty string. Arithmetic or string concatenation operations involving one or more NULL operands return NULL.
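
    These NULL rules can be checked with Python's built-in SQLite, used here as a stand-in for SQL Server (SQLite concatenates with `||`; T-SQL uses `+`, which also yields NULL with a NULL operand under the default `CONCAT_NULL_YIELDS_NULL ON` setting). SQLite returns 1/0 for true/false and `None` for NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
    SELECT NULL + 1,          -- arithmetic with NULL yields NULL
           'abc' || NULL,     -- string concatenation with NULL yields NULL
           NULL = NULL,       -- comparing with NULL is unknown (NULL), not true
           NULL IS NULL,      -- IS NULL is the correct test
           0 IS NULL,         -- zero is a value, not NULL
           '' IS NULL         -- so is the empty string
"""
result = conn.execute(query).fetchone()
print(result)  # (None, None, None, 1, 0, 0)
```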

  • TSQL – Part 1(c) – Data Types

    Transact-SQL supports a wide range of data types, which can be broadly categorized as exact numeric, approximate numeric, character, date/time, binary, and other.

  • TSQL – Part 1(b) – SELECT statement

    Use the SELECT statement to retrieve a rowset of data from tables and views in a database.
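
    A small SELECT sketch, using SQLite through Python as a stand-in for SQL Server. It picks columns, filters rows with WHERE, sorts with ORDER BY, and queries a view with the same syntax as a table. The `employees` table and `it_staff` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Asha', 'Sales', 52000),
        (2, 'Ravi', 'IT', 61000),
        (3, 'Meena', 'IT', 58000);
    CREATE VIEW it_staff AS SELECT id, name, salary FROM employees WHERE dept = 'IT';
""")

# SELECT returns a rowset: choose columns, filter rows, and sort the result.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary DESC",
    (55000,),
).fetchall()
print(rows)  # [('Ravi', 61000), ('Meena', 58000)]

# Views are queried with exactly the same SELECT syntax as tables.
view_rows = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()
print(view_rows)  # [('Meena',), ('Ravi',)]
```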

  • TSQL – Part 1(a) – Introduction to Transact SQL

    Transact-SQL is an essential skill for database professionals and developers working with Microsoft SQL Server.

  • SSDB – Part 4. ETL – Slowly Changing Dimension SCD Type 2 and 3

    Extract, transform, and load (ETL) using a SQL Server stored procedure.
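
    The core idea of SCD Type 2 is to keep history: when a tracked attribute changes, the current dimension row is expired (end date set, current flag cleared) and a new current row is inserted. (Type 3 instead keeps the previous value in an extra column.) The sketch below shows that Type 2 logic in Python with SQLite rather than as a SQL Server stored procedure; the `dim_customer` table, its columns, and the dates are hypothetical.

```python
import sqlite3


def scd2_upsert(conn, customer_id, city, load_date):
    """Apply one source record to a Type 2 slowly changing dimension."""
    current = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return  # attribute unchanged: nothing to do
    if current is not None:
        # Expire the current version instead of overwriting it (keeps history).
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (load_date, customer_id),
        )
    # Insert the new version as the current row.
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, load_date),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (surrogate_key INTEGER PRIMARY KEY, customer_id INTEGER, "
    "city TEXT, start_date TEXT, end_date TEXT, is_current INTEGER)"
)
scd2_upsert(conn, 101, "Chicago", "2023-01-01")
scd2_upsert(conn, 101, "Chicago", "2023-02-01")  # no change, no new row
scd2_upsert(conn, 101, "Boston", "2023-03-01")   # change: expire + insert
history = conn.execute("SELECT * FROM dim_customer ORDER BY surrogate_key").fetchall()
for row in history:
    print(row)
# (1, 101, 'Chicago', '2023-01-01', '2023-03-01', 0)
# (2, 101, 'Boston', '2023-03-01', None, 1)
```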
