Unit 2
Unit
2
Data warehouse Process and
technology(DWM)
Warehousing Strategy
A
warehouse strategy is a plan that outlines how you will manage your warehouse
for optimal efficiency. It’s important to note that a warehouse plan is not the
same thing as a strategy. A well-crafted warehouse strategy focuses on how
well your warehouse operates within given benchmarks (e.g., storage
density, fulfilment speed, inventory accuracy, etc.)
- Outline
your data stack: including ETL process, data pipeline tools, and data science pipeline
- Define data lifecycle
requirements
- Determine actions you expect
to take on the data
- Sketch out dashboards you
expect to build with the data
- Build your data analytics team
- Explore data warehouse
solutions
Efficiently providing
systematic, contextual data to the business intelligence tool of an
organization, the data warehouses can find out more practical business
strategies.
- Business User: Business
users or customers need a data warehouse to look at summarized data from
the past.Since these people are coming from a non-technical background
also, the data may be represented to them in an uncomplicated way.
- Maintains consistency: Data warehouses are programmed in such a way that they can be
applied in a regular formatto all collected data from differentsources,
which makes it effortlessfor company decision-makers to analyse and share
data insights By standardizing thedata, the risk of errorin interpretation
is also reduced and improves overall accuracy.
- Store historical data: DataWarehouses are also used to store historical data that
means, the time variabledata from the past and this input can be used for
various purposes.
- Make strategic decisions: Data warehouses contribute to making better strategic
decisions. Some businessstrategies may be depending upon the data stored
within the datawarehouses.
- High response time: Data warehouse has got to be prepared for somewhat sudden masses
and type of queries thatdemands a major degree of flexibility and fast
latency.
Data Warehousing Process
There are five notable stages of data warehousing, starting with
defining the business goals and how they are measured.
1.Establishing Business Objectives
Every business has objectives, and the first step in the data
warehousing process is clearly determining those business objectives.example,
delivering the best telecommunication services in the industry. Every
department should have objectives that contribute to the larger company goal.
2.Collecting And Analysing Information
The next stage will collect and analyse the information to put
into the data warehouse. Each business department will create a summary and
analytical reports that include information about various business processes.
3.Developing A Data Framework
storing data in the warehouse needs to be done in an organized and
efficient manner. The best way to do that is to develop a conceptual data
model. A
data framework will ensure that the
data you hold in the warehouse is complete. The framework can take a long time
to construct, but it makes the data warehousing process relatively easy.
4.Moving The Data
Every business has databases
and backups, which will be the primary
source of information. You will have to ensure the data is complete from the
source or program it for completion before transfer.
The key is to ensure any reports generated from the data warehouse
match the reports from the data source.
5.Plan Implementation
The final stage will be implementing the plan. Once you move the
data and successfully track its importation into the data warehouse, you must
ensure that the data fits well into the predetermined structure or framework.
If it is a large project, it will include developing phases and schedules for
data delivery.
Warehouse Planning and Implementation
The warehouse planning is to minimise high warehousing costs which
arise, for example, from low-capacity utilisation, outdated warehouse
technology or complex organisation. A further aim can be to increase the degree
of automation. Implementing a data warehouse involves several critical steps,
each contributing to the overall success of the data architect’s project.
1. Requirement Gathering and Analysis
- Identifying the data sources and the type of
data to be collected.
- Determining the business goals that the data
warehouse should support.
- Engaging stakeholders, including business
users, data engineers, and database administrators, to gather requirements
2. Designing the Data Warehouse
Architecture
A well-planned data warehouse
architecture is crucial for
efficient data storage and efficient data retrieval.
- Data Modeling:
Designing the schema, which defines how data is organized within the
warehouse. Common models include star schema, snowflake schema, and data
vault modeling.
- ETL Processes:
Planning how data will be extracted from source systems, transformed to
fit the warehouse schema, and loaded into the warehouse.
- Data Storage Solutions:
Selecting appropriate technologies for data storage, such as relational
databases or big data platforms.
3.Data Cleansing and Validation
·
Data Cleansing: Identifying and correcting errors or
inconsistencies in the data.
·
Data Validation: Ensuring that the data meets the predefined
quality criteria and business rules
4. Implementing Data Security
and Compliance
Data security and compliance are
paramount in any such data warehousing system
or project.
·
Data Encryption: Protecting data at rest and in transit through
encryption.
·
Access Controls: Implementing role-based access to restrict who
can access or modify data.
·
Compliance: Ensuring that the data warehouse complies
with relevant regulations and standards.
5.Testing and Quality Assurance
·
User Acceptance Testing (UAT): Engaging end-users
to test the system and ensure it meets their needs.
·
Quality Assurance (QA): Testing the system for data accuracy,
query performance, and security.
6.Deployment and Maintenance
The data
warehouse has passed all tests, it is ready for deployment. Ongoing maintenance
is crucial to ensure all the data the system remains reliable and efficient.
·
Monitoring: Continuously monitoring the data warehouse for
performance issues or data inconsistencies.
·
Upgrades and Scalability: Updating the system to handle increased
data volumes or new business requirements.
The client/server
computing model and data warehousing
The client/server computing model plays
a significant role in the architecture of data warehousing systems, providing a
framework for distributing processing tasks between clients (end-users) and
servers (data warehouse infrastructure). Here’s how the client/server model
relates to data warehousing:
Computing Model
In the client/server computing model,
computing tasks are divided between clients and servers:
1.
Client:
·
Refers to the end-user devices, such as desktop
computers, laptops, tablets, or smartphones.
·
Executes user applications and interfaces with
the user.
·
Requests data and services from servers.
2.
Server:
·
Refers to powerful computers or server clusters
that provide services to clients.
·
Handles data storage, processing, and
management tasks.
·
Responds to client requests by providing data
or executing operations.
Role in Data Warehousing
In the context of data warehousing, the
client/server model is used to facilitate access to and utilization of the data
warehouse by end-users:
1.
Client-
2.
Side Applications:
·
Business intelligence (BI) tools, reporting
applications, and analytical software installed on client devices.
·
Allows users to interact with the data
warehouse, query data, generate reports, and visualize insights.
3.
Server-Side Components:
·
Data warehouse servers and associated
infrastructure.
·
Stores and manages large volumes of structured
and sometimes unstructured data.
·
Executes complex data processing tasks,
including ETL (Extract, Transform, Load), data aggregation, and query
optimization.
Interaction in Data Warehousing
The client/server model
facilitates the interaction between end-users and the data warehouse:
1.
Data Retrieval:
·
Clients send queries to the data warehouse
servers to retrieve specific datasets or perform analyses.
·
Servers process these queries, accessing the
required data from storage, and returning the results to the clients.
2.
Data Presentation:
·
Servers provide processed data to clients in a
format suitable for analysis or visualization.
·
Clients use BI tools or reporting applications
to present the data to end-users in the form of charts, graphs, tables, or
dashboards.
3.
Data Manipulation:
·
Clients can manipulate and analyse data locally
using client-side applications.
·
Servers may also provide services for advanced
data processing, such as machine learning or predictive analytics, depending on
the architecture.
Benefits
The client/server
computing model offers several benefits in the context of data warehousing:
1.
Scalability:
·
Enables scaling of both client and server
components independently based on demand.
·
Additional clients can be added without
impacting server performance, and vice versa.
2.
Centralized Management:
·
Centralizes data storage and management on
servers, simplifying administration and ensuring data consistency.
3.
Resource Utilization:
·
Distributes processing tasks between clients
and servers, optimizing resource utilization and performance.
4.
Flexibility:
·
Allows users to access data warehouse resources
from various client devices and locations, providing flexibility in data access
and analysis.
Distributed DBMS implementations in data warehousing
A distributed database is essentially a database that is dispersed
across numerous sites, i.e., on various computers or over a network of
computers, and is not restricted to a single system. A distributed database
system is spread across several locations with distinct physical components.
Types:
1. Homogeneous Database: A homogeneous database
stores data uniformly across all locations. All sites utilize the same
operating system, database management system, and data structures. They are
therefore simple to handle.
2. Heterogeneous Database: With a heterogeneous
distributed database, many locations may employ various software and schema,
which may cause issues with queries and transactions. They could even employ separate database data
models. Translations are therefore necessary for communication across various
sites.
Data may be stored on several places in two ways using distributed
data storage:
- Replication - With
this strategy, every aspect of the connection is redundantly kept at two
or more locations. It is a completely redundant database if the entire
database is accessible from every location. Systems preserve copies of the
data as a result of replication. This has advantages since it makes more
data accessible at many locations. Moreover, query requests can now be
handled in parallel.
- Fragmentation - In
this method, the relationships are broken up into smaller pieces and each
fragment is kept in the many locations where it is needed. To ensure there
is no data loss, the pieces must be created in a way that allows for the
reconstruction of the original relation. As fragmentation doesn't result
in duplicate data, consistency is not a concern.
Uses for distributed databases
- The corporate management information system
makes use of it.
- Multimedia apps utilize it.
- Used in hotel chains, military command systems,
etc.
- The production control system also makes use of
it
Characteristics of distributed databases
Distributed databases are logically connected to one another when
they are part of a collection, and they frequently form a single logical
database. Data is physically stored across several sites and is separately
handled in distributed databases.
- Place unrelated
- Spread-out query processing
- The administration of distributed transactions
- Independent of hardware
- Network independent of operating systems
- Transparency of transactions
- DBMS unrelated
Designing Data Warehouse Schemas
Designing an effective data warehouse schema involves
understanding business processes, defining dimension and fact tables, and
ensuring data integrity.
Steps in Schema Design
- Identify Business Processes:
Determine the key processes and measures to be analysed.
- Define Dimension Tables:
Create tables for descriptive attributes.
- Create Fact Table:
Centralize measures and link to dimension tables.
- Normalize Data:
For snowflake schema, further break down dimension tables to reduce
redundancy.
- Ensure Data Integrity:
Use primary and foreign keys to maintain relationships.
Benefits of Data Warehouse Schemas
- Efficient Querying:
Well-designed schemas enhance query performance.
- Data Integration:
Facilitates combining data from multiple sources.
- Reduced Redundancy:
Normalization in snowflake schema lowers data duplication.
- Enhanced Data Analysis:
Supports complex queries and detailed analysis.
Challenges in Data Warehouse Schemas
- Complexity: Designing
and maintaining complex schemas can be challenging.
- Storage Space: Snowflake
and fact constellation schemas may require more storage.
- Query Performance:
Multiple joins in snowflake schema can impact performance.
Best Practices for Data Warehouse Schema Design
- Understand Business Needs:
Tailor the schema to business requirements.
- Balance Normalization:
Find a balance between reducing redundancy and maintaining performance.
- Ensure Scalability:
Design schemas that can grow with the business.
- Optimize Query Performance:
Use indexing and partitioning to enhance performance.
Comments
Post a Comment