BIG DATA MANAGEMENT

 

 

 

 

SCHOOL OF MATHEMATICAL AND COMPUTER SCIENCES

 

 

Department of Computer Science

 

 

 

 

F21BD

 

 


 

 

 

 

 

 

Duration: Two Hours

 

 

 

 

 

 

 

 

 

ANSWER THREE QUESTIONS

 

 

 

 

 

Q1

(a)  (i)  In terms of a NoSQL database system, what does it mean for the system to be CP?

(2 marks)

(ii)  Can a distributed data management system be both CP and AP? Explain your answer.

(2 marks)

(iii)  Give an example of an AP system and explain why a company might choose to be AP rather than CP.

  • marks)

(iv)  Some NoSQL database systems use MVCC. What does it mean and why is it used?

  • marks)

 

(b)  Big Data is sometimes described using the 5 Vs, where the 5th V is VALUE. What are the other 4 Vs and what do they mean?

  • marks)

 

(c)  (i)  Explain Consistent Hashing and why it is used with regard to NoSQL database systems.

  • marks)

 

(ii)  What is a Bloom filter, and how and why does the Cassandra NoSQL database system use Bloom filters?

 

  • marks)

Q2

(a)  What is BSON and how is it related to NoSQL database systems?

  • marks)

 

(b)  What is referential integrity, and is it supported by MongoDB?
    • marks)

 

(c)  Look at the following MongoDB command:

db.d1.find({"person.lastname": "Smith"}).count()

 

(i)  What type of object is d1?

(1 mark)

(ii)  What data type would be returned from running this statement, assuming matching records were found?

(1 mark)

(iii)  What does the dot notation syntax tell you about the document schema?

(2 marks)

(iv)  You have a database which includes names and addresses, but have noticed a spelling error: all entries with the lastname of “Jnes” should be “Jones”. Write the MongoDB command to make this change.

(3 marks)
   
(d)  Denormalise these RDBMS entities for storage in MongoDB, with Employee as an embedded document of Project, showing your answer as a suitable JSON schema.

 

CREATE TABLE Employee
(
  empID int(5) NOT NULL PRIMARY KEY,
  firstName varchar(30),
  lastName varchar(30),
  email varchar(30)
);

CREATE TABLE Project
(
  pID int(8) NOT NULL PRIMARY KEY,
  pName varchar(50),
  startDate date,
  endDate date,
  managedBy int(5),
  FOREIGN KEY (managedBy) REFERENCES Employee(empID)
);

 

(6 marks)

 

(e)  MongoDB is considered a schema-less database system. Explain what this means and why JSON schemas may nevertheless be included.

(3 marks)

Q3

(a)  Define the term “Data Integration” and explain the three components required to implement a data integration system.

(4 marks)

 

(b)  (i)  Describe the RDF data model and state the permissible values for each part.

(4 marks)

 

(ii)  Explain how RDF supports data integration.

(2 marks)

 

(c)  Consider the following RDF:

 

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://www.macs.hw.ac.uk/geo#>

ex:Country a rdfs:Class .
ex:City a rdfs:Class .
ex:CapitalCity rdfs:subClassOf ex:City .
ex:cityIn a rdf:Property ;
  rdfs:domain ex:City ;
  rdfs:range ex:Country .
ex:capitalOf a rdf:Property ;
  rdfs:domain ex:CapitalCity ;
  rdfs:range ex:Country .
ex:population a rdf:Property ;
  rdfs:range xsd:integer .

ex:London rdfs:label "London" ;
  ex:capitalOf ex:UK ;
  ex:cityIn ex:UK ;
  ex:population 14040163 .
ex:UK rdfs:label "UK" ;
  ex:population 65648100 .

 

(i)  What triple(s) can we infer about ex:London and ex:UK?
    • marks)

 

(ii)  Assume the above is part of a larger dataset. Write a SPARQL query to find the labels of all capital cities.

  • marks)

(iii)  Write a SPARQL query to calculate the average city population by country.

  • marks)

Q4

(a)  Explain the difference between an OWL ontology and a Shape Expression (ShEx) Schema.

(4 marks)

(b)  Provide an RDF graph that conforms to the following Shape Expression.

(The minimally satisfying graph is sufficient. You can provide the graph as abstract syntax or Turtle.)

 

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hwu: <http://www.macs.hw.ac.uk/ns/>

hwu:StudentShape {
  rdf:type [hwu:Student] ;
  foaf:name xsd:string ;
  foaf:mbox IRI? ;
  hwu:studies @hwu:CourseShape {1,8}
}

hwu:CourseShape {
  rdf:type [hwu:Course] ;
  rdfs:label xsd:string
}

 

(6 marks)

 


(c)  Assume that the following entities are defined in an OWL ontology about HWU staff and courses:

Classes: Academic, Course, Campus, Dubai, Malaysia, Orkney
Object Properties: teaches, locatedAt

Dubai, Malaysia and Orkney are disjoint subclasses of Campus.
teaches has the domain Academic and the range Course.
locatedAt has the domain Academic and the range Campus.

 

(i)  Use Manchester Syntax to define classes for UniversityStaff and AdministrativeStaff such that academics and administrative staff are types of university staff, and no member of staff can be both an academic and an administrator.
    • marks)

 

(ii)  Use Manchester Syntax to define the class OrkneyAcademic. Such an academic is based in Orkney and teaches at least one course.
    • marks)

(d)  Describe two of the services performed by a standard Description Logic reasoner.

(2 marks)

 

 

          

                    END OF PAPER