Code: T-19                                         Subject: DATA WAREHOUSING AND DATA MINING

Time: 3 Hours                                         December 2005                                         Max. Marks: 100

 

NOTE: There are 9 Questions in all.

·      Question 1 is compulsory and carries 20 marks. Answer to Q. 1. must be written in the space provided for it in the answer book supplied and nowhere else.

·      Out of the remaining EIGHT Questions answer any FIVE Questions. Each question carries 16 marks.

·      Any required data not explicitly given, may be suitably assumed and stated.

 

Q.1       Choose the correct or best alternative in the following:                                         (2x10)

       

a.       Groupware tools are examples of 

 

                   (A)  DSS                                             (B)  MIS

(C)    TPS                                            (D)  ERP

       

b.      Why isn’t the data in operational systems appropriate for business analysis?

 

(A)    It is not consistent.

(B)    It is too detailed.

(C)    It is not optimised for decision support applications and tools.

(D)    All of the above.

            

             c.   Knowledge management, data warehousing and decision support

                  

(A)    Are three interrelated information organization, manipulation, delivery and presentation disciplines

(B)    Are unrelated components of Information Systems

(C)    Were invented by IBM

(D)    Are vendor marketing programs with little commercial relevance

 

             d.   A data mart is

 

(A)    A small data warehouse.              (B) A departmental data warehouse.

(C)  A simple data warehouse.            (D)  All of the above.        

 

             e.   A Central Data Warehouse (CDW) is:

                  

(A)     always a relational database         

(B)     the data mart for the large enterprise

(C)     needed for data mining                

(D)    exposed to describe how dependent data marts achieve consistency


             f.    Explicit knowledge is:

 

(A)     Created by and exists within employees.        

(B)     Stored and publicly available.

(C)     Learned only from prior failures.  

(D)    Obtained only from prior successes.

       

             g.   Data mining includes:

 

(A)     Analysing large volumes of data to discover interesting associations or patterns.          

(B)     Querying a large data warehouse to uncover undiscovered facts.

(C)     Very complex SQL query operations.

(D)    Slicing and dicing until you uncover interesting details.

 

             h.   Query and reporting tools are most appropriate for:

 

(A)    Controlled predictable query environments     

(B)    Ad hoc reporting requirements

(C)    Complex multifaceted business query applications      

(D)    Discovery mode applications

 

             i.    Which of the following is the best example of a specific multidimensional query?

 

(A)   How many programmers worked more than 2000 hours last year?

(B)   Who are the customers in the northeast?

(C) What is the profit for baby goods, by store, by month?

(D) Why are our best customers shopping at the competition?

 

             j.    A star schema is

 

(A)     A de-normalised arrangement of dimensions and facts along with related 

       measures

(B) Same as snowflake schema

(C) A complex relational join              

(D) A normalized arrangement of dimensions and facts along with related  

       measures

 

 

Answer any FIVE Questions out of EIGHT Questions.

Each question carries 16 marks.

 

  Q.2     a.   Define and differentiate between

(i)                  Executive information system & decision support system.

(ii)                Primitive Data and derived data.                                         (4x2=8)

 

             b.   What is the role of DSS in decision-making?                                                       (4)

 

             c.   Explain the Data Warehouse back end tools that are used to populate and refresh data.                  (4)

 

  Q.3     a.   How are organizations using the information from data warehouses?                    (6)

       

             b.   What is a measure? Give an example to demonstrate the computation of a measure by relational aggregation operations.                                          (6)                                                             

       

             c.   Metadata is critical to the success of a Data Warehouse. Explain from the point of view of creating and maintaining data.                                                    (4)

 

  Q.4     a.   What is a concept hierarchy? Define Roll-up and Drill-down. Show how these operations use the concept hierarchy.                                            (8)

            

             b.   State why, for the integration of multiple heterogeneous information sources, many companies in industry prefer the update-driven approach (which constructs and uses data warehouses) to the query-driven approach (which applies wrappers and integrators). Describe situations where the query-driven approach is preferable to the update-driven approach.                                                                  (8)

       

  Q.5     a.   Explain the following:

(i)                  ROLAP

(ii)                MOLAP                                                                      (4.5x2 = 9)

 

             b.   List any four sources of External Data. (4)

 

             c.   Give examples of unstructured and structured data.                                             (3)

 

  Q.6     a.   What is noise? How can it be removed by the binning method? Also, explain any four methods for handling missing values.                                     (8)

 

             b.   Explain the five different strategies for data reduction.                                         (5)

 

             c.   Discuss the issues to consider during data integration.                                          (3)

                                                                

  Q.7     a.   Define classification. How is prediction different from classification?                    (7)   

                

             b.   Write the basic algorithm for inducing a decision tree from training samples.                        (9)

 

  Q.8     a.   How does the tree pruning process work?                                                          (4)

 

             b.   Explain the migration plan in building the data warehouse.                                   (9)

 

             c.   It is crucial for data warehouse systems to support highly efficient data cube computation techniques, access methods, and query processing techniques in an architecture environment. Comment on this statement.                                                           (3)

 

  Q.9     a.   Explain each of the following OLAP operations with an example:                  (5x2=10)

(i)                  Pivot.

(ii)                Slice and Dice.

 

              b.  Differentiate between characterisation and clustering.                                         (6)