Chapter 5 - Securing Your AWS Resources


AWS IAM (Identity and Access Management)


  • Management dashboard (https://console.aws.amazon.com/iam)
  • It connects an account to all the administration tools for managing that account's security.
  • Protect the root user by locking it down (the root user is the email address and password used to create the AWS account in the first place).
  • For day-to-day administrative work, create USERS and GROUPS rather than using the root user.
Protecting the Root User
  • The root user has full permissions over all AWS resources and over authorizing expenses.
  • Active use of the root user is a significant security risk: if it is compromised in an attack, data can be stolen and resources used by others, generating huge bills.
  • Protect the root user with 1) a strong password, 2) multi-factor authentication (MFA), and 3) an IAM user used in place of the root user.
Authentication
  • Three ways: 1) Management Console, 2) programmatic access, 3) Command-Line Interface (CLI)
  • “Who you claim to be”
  • Management Console – provide a user ID and password.
  • Programmatic or CLI access – configure access keys.
Passwords
  • Complex passwords – uppercase, lowercase, numbers, symbols.
  • Use randomly generated strings.
  • Never reuse passwords across multiple accounts.
  • Efficient approach – use a password manager like LastPass, Dashlane, or KeePass2.
  • As the root user, create a password policy that applies to all accounts.
  • MFA adds a second layer of security.
  • MFA works with an associated physical device – a Universal 2nd Factor (U2F) or MFA-compliant device such as a YubiKey, or a smartphone with an authenticator app.
  • Sign in with Step 1: password -> Step 2: a short-lived six-digit number from your MFA device.
Access Keys
  • Programmatic and Command-line access to many AWS resources is authenticated by access keys (without the option of MFA).
  • Generate a new set of keys: AWS Management Console -> Security Credentials page -> Create New Access Key button -> download the new pair.
  • The Security Credentials page lists all keys and lets you deactivate, activate, or delete existing keys.
  • For the root user, take the time to delete all access keys associated with the root account.
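A minimal boto3 sketch of managing access keys programmatically, assuming the AWS SDK for Python is installed and credentials are already configured; the user name and key ID are hypothetical:

```python
import boto3

iam = boto3.client("iam")

# List access keys for a given IAM user (omit UserName to target the caller).
keys = iam.list_access_keys(UserName="admin-user")
for key in keys["AccessKeyMetadata"]:
    print(key["AccessKeyId"], key["Status"], key["CreateDate"])

# Deactivate a key without deleting it, so it can be re-enabled if needed.
iam.update_access_key(UserName="admin-user",
                      AccessKeyId="AKIAEXAMPLEKEYID",
                      Status="Inactive")
```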
Secure Shell Key Pairs
  • Remote login sessions are encrypted with the Secure Shell (SSH) protocol.
  • When that “network” is the internet, your data is at greater risk, which is why key pairs are used to secure the sessions.
Users, Groups, and Roles
  • “Principle of least privilege” -> assign users only the privileges they need, and only when required.
IAM User
  • Create a primary admin user to replace root access -> attach the AdministratorAccess policy.
IAM Groups
  • Using groups to administer the permissions of multiple users in batches is much more efficient.
  • New employee -> create a new user -> attach to an appropriate group -> automatic inheritance of the group's policies (see the sketch below).
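A minimal boto3 sketch of that workflow; the group, policy, and user names are hypothetical:

```python
import boto3

iam = boto3.client("iam")

# Create the group once and attach a managed policy to it.
iam.create_group(GroupName="Developers")
iam.attach_group_policy(GroupName="Developers",
                        PolicyArn="arn:aws:iam::aws:policy/PowerUserAccess")

# For each new employee: create the user and add them to the group,
# so the group's policies are inherited automatically.
iam.create_user(UserName="new.employee")
iam.add_user_to_group(GroupName="Developers", UserName="new.employee")
```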
IAM Roles
  • Important for application services rather than users.
  • “Trusted entity”
  • The trusted entity can be an AWS resource (like EC2) or a third-party federated identity provider (like Google); a role-creation sketch follows.
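A minimal boto3 sketch of creating a role whose trusted entity is EC2; the role name and attached policy are only examples:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy naming EC2 as the trusted entity allowed to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(RoleName="AppServerRole",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Grant the role read-only access to S3 through an AWS managed policy.
iam.attach_role_policy(RoleName="AppServerRole",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess")
```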

Providing Federated Access


  • Integrate Security Assertion Markup Language 2.0 (SAML) or Microsoft’s Active Directory into your infrastructure.
  • SSO (Single Sign-on)
  • AWS Directory Service for Microsoft Active Directory(AWS Microsoft AD)

Credential Report


  • Download Report button -> CSV listing each IAM user, whether MFA is enabled, when they last logged in, and any active access keys.
  • Monitor accounts for security holes.
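A minimal boto3 sketch of pulling the credential report programmatically (assuming configured credentials; in practice you may need to wait briefly after requesting generation):

```python
import boto3

iam = boto3.client("iam")

iam.generate_credential_report()          # ask IAM to (re)generate the report
report = iam.get_credential_report()      # may need a short wait until it is ready
print(report["Content"].decode("utf-8"))  # CSV text with one row per IAM user
```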

Encryption


  • Encryption keys -> AWS Key Management Service (AWS KMS)
  • KMS will apply encryption using a customer master key (CMK)
  • Encrypt any data managed by an AWS service, including RDS, DynamoDB, EBS volumes attached to EC2 instances
  • S3 encryption -> during or after bucket creation.
  • Two ways: 1) S3-managed server-side encryption keys (SSE-S3) or 2) KMS-managed keys (SSE-KMS); see the sketch after this list.
  • S3 buckets -> server-side encryption
  • Data in transit from your local infrastructure -> client-side encryption
  • Before uploading to S3 -> encrypt with a KMS-managed customer master key or a client-side master key.
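A minimal boto3 sketch of the two server-side options when uploading an object; the bucket, keys, and KMS alias are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 manages the encryption keys (AES-256).
s3.put_object(Bucket="my-example-bucket", Key="report.csv",
              Body=b"example data", ServerSideEncryption="AES256")

# SSE-KMS: objects are encrypted under a KMS-managed customer master key.
s3.put_object(Bucket="my-example-bucket", Key="report-kms.csv",
              Body=b"example data", ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-cmk")
```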

Regulatory Compliance (AWS Artifact)


  • Artifacts - the service home page is a set of links to documents describing various regulatory standards and how AWS meets them. Each of those documents is referred to by Amazon as an Artifact.
  • U.S. government’s Federal Risk and Authorization Management Program (FedRAMP), the Government of Canada (GC) Partner Package, and the Australian Prudential Regulation Authority (APRA) “Management of Security Risk in Information and Information Technology”
  • PCI DSS Attestation of Compliance (AOC) and Responsibility Summary for handling credit card transaction data.
  • Service Organization Controls (SOC) 1, 2, and 3 audits of AWS infrastructure.

Chapter 1 - The Cloud

Explore job roles in the world of data

3 Key Roles

  • Database Administrators manage databases, assigning permissions to users, storing backup copies of data, and restoring data in case of failures.
  • Data Engineers are vital in working with data, applying data cleaning routines, identifying business rules, and turning data into useful information.
  • Data Analysts explore and analyze data to create visualizations and charts to enable organizations to make informed decisions

Azure Database Administrator

  • The design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions
  • Responsible for the overall availability, consistent performance, and optimization of database solutions.
  • Work with stakeholders to implement policies, tools, and processes for backup and recovery plans to recover following a natural disaster or human-made error.
  • Managing the security of data, granting privileges over the data, and granting or denying access to users as appropriate.
Tasks and Responsibilities
  • Installing and upgrading the database server and application tools.
  • Allocating system storage and planning storage requirements for the database system.
  • Modifying the database structure, as necessary, from information given by application developers.
  • Enrolling users and maintaining system security.
  • Ensuring compliance with database vendor license agreement.
  • Controlling and monitoring user access to the database.
  • Monitoring and optimizing the performance of the database.
  • Planning for backup and recovery of database information.
  • Maintaining archived data.
  • Backing up and restoring databases.
  • Contacting database vendor for technical support.
  • Generating various reports by querying from database as per need.
  • Managing and monitoring data replication.
Tools
Azure Data Studio
  • Graphical user interface for managing many different database systems.
  • Connection ->SQL Server databases, Azure SQL Database, PostgreSQL, Azure SQL Data Warehouse, and SQL Server Big Data Clusters
  • Extensible tool -> 1. Download and install extensions from third-party developers that connect to other systems 2. Provide wizards that help to automate many administrative tasks.
SQL Server Management Studio
  • A graphical interface, enabling you to query data, perform general database administration tasks, and generate scripts for automating database maintenance and support operations.
  • Generate Transact-SQL scripts -> DBA the ability to schedule and automate many common tasks.
  • Transact-SQL is a set of programming extensions from Microsoft that adds several features to the Structured Query Language (SQL), including transaction control, exception and error handling, row processing, and declared variables.
Azure Portal
  • Database services in Azure.
  • SQL Server on cloud
  • Configuration tasks -> Increasing the database size, creating a new database, and deleting an existing database

Azure Data Engineer

  • Design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads.
  • Data platform technologies - relational and nonrelational databases, file stores, and data streams.
  • Ensuring privacy of data
  • Management and monitoring of data stores and data pipelines to ensure that data loads perform as expected.
Tasks and Responsibilities
  • Developing, constructing, testing, and maintaining databases and data structures.
  • Aligning the data architecture with business requirements.
  • Data acquisition.
  • Developing processes for creating and retrieving information from data sets.
  • Using programming languages and tools to examine the data.
  • Identifying ways to improve data reliability, efficiency, and quality.
  • Conducting research for industry and business questions.
  • Deploying sophisticated analytics programs, machine learning, and statistical methods.
  • Preparing data for predictive and prescriptive modeling.
  • Using data to discover tasks that can be automated.
Tools
  • A relational database management system -> SQL
  • SQL to create databases, tables, indexes, views, and the other objects required by the database.
  • SQL Server Management Studio -> create and query tables visually, but you can also create your own SQL scripts manually.
  • sqlcmd utility (Command Line) to connect to Microsoft SQL Server and Azure SQL Database, and run ad-hoc queries and commands.
  • For a SQL Server professional, the primary data manipulation tool might be Transact-SQL.
  • Azure Databricks -> a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers three environments for developing data intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.
  • Azure HDInsight to generate and test predictive models.
    • Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.
  • Azure Data Factory is a managed cloud service that’s built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
    • It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale.
    • Create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
    • Build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
    • Publish your transformed data to data stores such as Azure Synapse Analytics for business intelligence (BI) applications to consume.
    • Raw data can be organized into meaningful data stores and data lakes for better business decisions.
    • How does it work?
    • Data Factory contains a series of interconnected systems that provide a complete end-to-end platform for data engineers.
    • Connect and collect -> 1. The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services. 2. Move the data as needed to a centralized location for subsequent processing. 3. Use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis. 4. Collect data in Azure Data Lake Storage and transform the data later by using an Azure Data Lake Analytics compute service. 5. Collect data in Azure Blob storage and transform it later by using an Azure HDInsight Hadoop cluster.
    • Transform and enrich -> 1. Data is present in a centralized data store in the cloud, process or transform the collected data by using ADF mapping data flows. 2. Data flows enable data engineers to build and maintain data transformation graphs that execute on Spark without needing to understand Spark clusters or Spark programming. 3. ADF supports external activities for executing your transformations on compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning.
    • CI/CD and publish -> 1. Data pipelines using Azure DevOps and GitHub, allowing you to incrementally develop and deliver your ETL processes before publishing the finished product. 2. After the raw data has been refined into a business-ready consumable form, load the data into Azure Data Warehouse, Azure SQL Database, Azure CosmosDB, or whichever analytics engine your business users can point to from their business intelligence tools.
    • Monitor -> Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Azure Monitor logs, and health panels on the Azure portal.
  • Top-level concepts
    • Pipelines -> A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task.
    • Data Flows -> Create and manage graphs of data transformation logic that you can use to transform any-sized data. You can build up a reusable library of data transformation routines and execute those processes in a scaled-out manner from your ADF pipelines. Data Factory will execute your logic on a Spark cluster that spins up and spins down when you need it.
    • Activities -> Activities represent a processing step in a pipeline. Data Factory supports three types of activities: data movement activities, data transformation activities, and control activities.
    • Datasets -> Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
    • Linked services -> Connection strings, which define the connection information that’s needed for Data Factory to connect to external resources.
    • Linked services are used for two purposes in Data Factory: 1) to represent a data store, such as a SQL Server database, Oracle database, file share, or Azure Blob storage account (for a list of supported data stores, see the copy activity article); 2) to represent a compute resource that can host the execution of an activity; for example, the HDInsight Hive activity runs on an HDInsight Hadoop cluster (for a list of transformation activities and supported compute environments, see the transform data article).

    • Integration Runtimes -> An activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked Services.
  • The non-relational field -> Azure Cosmos DB.
  • To manipulate and query the Cosmos DB data -> HiveQL, R, or Python.

Data Analyst

  • Maximize the value of their data assets.
  • Designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations.
Tasks and Responsibilities
  • Making large or complex data more accessible, understandable, and usable.
  • Creating charts and graphs, histograms, geographical maps, and other visual models that help to explain the meaning of large volumes of data, and isolate areas of interest.
  • Transforming, improving, and integrating data from many sources, depending on the business requirements.
  • Combining the data result sets across multiple sources. For example, combining sales data and weather data provides a useful insight into how weather influenced sales of certain products such as ice creams.
  • Finding hidden patterns using data.
  • Delivering information in a useful and appealing way to users by creating rich graphical dashboards and reports.
Common data visualization tools
  • Microsoft Office Apps such as Microsoft Excel for creating rich visual reports.
  • Microsoft Power BI, a powerful visualization platform, to create rich, graphical dashboards and reports over data that can vary dynamically.
  • Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights.
  • Your data might be held somewhere local such as an Excel spreadsheet, or in a collection of cloud-based and on-premises databases, or some other set of data sources.
  • Power BI -> lets you easily connect to your data sources, discover what’s important in that data, and share your findings with others in the organization.


Explore Core Data Concepts

Topics

  • Identify how data is defined and stored
  • Identify types of data and data storage
  • Describe and differentiate batch and streaming data

Identify how data is defined and stored


What is data?

Data is a collection of facts such as numbers, descriptions, and observations used in decision making.

Classification of data :

  • Structured,
  • Semi-structured, or
  • Unstructured.

Structured data

  1. Tabular data that is represented by rows and columns in a database.
  2. Databases that hold tables in this form are called relational databases (the mathematical term relation refers to an organized set of data held as a table). Note: each row in a table has the same set of columns.

Semi-structured data

JSON
  • Information that doesn’t reside in a relational database but still has some structure to it. Example: JavaScript Object Notation (JSON).
    Key-Value Pair:
  • A key-value database stores associative arrays.
  • A key-value database stores data as a single collection without structure or relation (a small sketch follows).
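A small Python sketch (with a made-up customer record) contrasting the same data as a structured row, a semi-structured JSON document, and a key-value entry:

```python
import json

# Structured: a fixed set of columns, like one row of a relational table.
row = ("C001", "Ana Diaz", "Chicago")

# Semi-structured: a JSON document that carries its own, flexible structure.
document = json.dumps({
    "customer_id": "C001",
    "name": "Ana Diaz",
    "addresses": [{"city": "Chicago"}, {"city": "Madrid"}],
})

# Key-value: the key identifies the entry; the value is an opaque blob to the store.
key_value_store = {"customer:C001": document}

print(row)
print(key_value_store["customer:C001"])
```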

    Unstructured data

    Graph Database
  • A graph contains nodes (information about objects), and edges (information about the relationships between objects).

Data defined, stored, and accessed in cloud computing

  • Structured data -> relational database such as SQL Server or Azure SQL Database
  • Unstructured data -> Azure Blob Storage (blob: binary large object)
  • Semi-structured data -> Azure Cosmos DB

    Levels of Access

  • Read-Only
  • Read/Write
  • Owner - Full Privilege

    Scenario

  • Data Analyst/Data Engineer ->Owner
  • Sales people -> Read-Only

    Data Processing Solutions

    Transactional System (OLTP - Online Transactional Processing)
  • It records transactions (small, discrete units of work).
  • Splitting tables out into separate groups of columns like this is called normalization.
  • Advantage: normalization can enable a transactional system to cache much of the information required to perform transactions in memory, and speeds throughput.
  • Disadvantage: normalized tables frequently need the data held across several tables joined back together again, which can make the data harder to examine for business users (analysts).
    Analytical System
  • Capturing raw data, and using it to generate insights for decision-making.
    Tasks:

    Data ingestion: the process of capturing the raw data. The repository could be a file store, a document database, or even a relational database.
    Data transformation: filter out anomalies, apply required transformations (for example, reformatting dates or addresses), and perform aggregations such as sums and other KPIs (Key Performance Indicators).
    Data querying: query the data to analyze it.
    Data visualization:

  • Visualizing the data can often be useful as a tool for examining data.
  • Charts such as bar charts, line charts, and pie charts, results plotted on geographical maps, and visuals illustrating how data changes over time.
  • Microsoft Power BI - rich graphical representation of data.

Identify types of data and data storage


The characteristics of relational and non-relational data

Describe the characteristics of relational and non-relational data

Relational Database

  • Simple table structure with Rows and Columns.
  • Normalization: Data is split into a large number of narrow, well-defined tables (a narrow table is a table with few columns), with references from one table to another.
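A minimal sketch of what normalization looks like in practice, using the same psycopg2 style as the Postgres section later in these notes; the salesdb database and both tables are hypothetical:

```python
import psycopg2

# Hypothetical connection; substitute your own host, database, and credentials.
con = psycopg2.connect("host=127.0.0.1 dbname=salesdb user=postgres password=secret")
cur = con.cursor()
con.set_session(autocommit=True)

# Customer details live once, in a narrow, well-defined table...
cur.execute("""CREATE TABLE IF NOT EXISTS customer (
                   customer_id   INT PRIMARY KEY,
                   customer_name VARCHAR(128),
                   city          VARCHAR(64));""")

# ...and each order references the customer by key instead of repeating those columns.
cur.execute("""CREATE TABLE IF NOT EXISTS customer_order (
                   order_id    INT PRIMARY KEY,
                   customer_id INT REFERENCES customer(customer_id),
                   order_date  DATE);""")

con.close()
```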

Non-Relational database

  • Store data in a format that more closely matches the original structure.
  • Example : Document DB
  • Advantage: Ease of retrieving information ex: from one document.
  • Disadvantages: 1. Duplication (for example, the same address of two different people is stored separately). 2. Increased storage and more complex maintenance.

Describe transactional workloads

Relational Database workload
  • Handles transaction processing.
  • ACID (Atomicity, Consistency, Isolation and Durability)
    1. Atomicity: guarantees that each transaction is treated as a single unit, which either succeeds completely, or fails completely(power failures, errors, and crashes).
    2. Consistency: should never lose or create data in a manner that can’t be accounted for.
    3. Isolation: Ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
    4. Durability: Guarantees that once a transaction has been committed, it will remain committed even if there’s a system failure such as a power outage or crash.
  • Locks: 1. Relational consistency and isolation are implemented by applying locks to data when it is updated. 2. The lock is only released when the transaction commits or rolls back. 3. Disadvantage: extensive locking can lead to poor performance while applications wait for locks to be released. (A psycopg2 transaction sketch follows this list.)
    Distributed Databases:
  • A distributed database is a database in which data is stored across different physical locations.
  • It may be held in multiple computers located in the same physical location (for example, a datacenter), or may be dispersed over a network of interconnected computers.
  • Takes more time to update.
  • locks may be retained for a very long time, especially if there’s a network failure between databases at a critical point in time. To counter this problem, many distributed database management systems relax the strict isolation requirements of transactions and implement “eventual consistency.”
  • Eventual consistency: 1. Definition: as an application writes data, each change is recorded by one server and then propagated to the other servers in the distributed database system asynchronously. 2. Advantage: helps to minimize latency, but it can lead to temporary inconsistencies in the data. 3. Used when the application doesn’t require any ordering guarantees; examples include counts of shares, likes, or non-threaded comments in a social media system.
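A minimal psycopg2 sketch of the atomicity described above, assuming the musicdb artist and album tables created in the Postgres section later in these notes (credentials and IDs are made up). With autocommit off, both inserts succeed together or are rolled back together:

```python
import psycopg2

# Hypothetical credentials; reuse your own connection details.
con = psycopg2.connect("host=127.0.0.1 dbname=musicdb user=postgres password=secret")
cur = con.cursor()  # autocommit is off by default, so both statements join one transaction

try:
    cur.execute("insert into artist (artist_id, artist_name) values (100, 'Example Artist')")
    cur.execute("insert into album (artist_id, album_id, album_name) values (100, 1, 'Example Album')")
    con.commit()      # both rows become durable together
except psycopg2.Error as e:
    con.rollback()    # on any failure, neither row is stored
    print(e)
finally:
    con.close()
```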

Describe analytical workloads

  • Read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels.
  • Data analysis and decision making.
  • Generated by aggregating the facts presented by the raw data into summaries, trends, and other kinds of “Business information.”
  • Snapshot of the data at a given point in time, or a series of snapshots(provides bigger picture).
  • Example : Monthly Reports

Describe the difference between batch and streaming data


  • Data processing : the conversion of raw data to meaningful information through a process.
  • Types of data ingestion:
    1. Processing data as it arrives is called streaming.
    2. Buffering and processing the data in groups is called batch processing.

Understand batch processing

  • Newly arriving data elements are collected into a group.
  • The whole group is then processed at a future time as a batch.
  • Ways of group processing:
    1. scheduled time interval (for example, every hour)
    2. it could be triggered when a certain amount of data has arrived, or as the result of some other event.
  • Advantages of batch processing include:
    1. Large volumes of data can be processed at a convenient time.
    2. Scheduled to run at a time when computers or systems might be idle, such as overnight, or during off-peak hours.
  • Disadvantages of batch processing include:
    1. The time delay between ingesting the data and getting the results.
    2. All of a batch job’s input data must be ready before the batch can be processed, so it must be carefully checked beforehand; problems with the data, errors, and program crashes during a batch job bring the whole process to a halt, and the input data (for example, dates with typographical errors) must be checked again before the job can be rerun.

Understand streaming and real-time data

  • Each new piece of data is processed when it arrives.
  • Streaming handles data in real time.
  • No waiting until the next batch processing interval
  • Most scenarios where new, dynamic data is generated on a continual basis.
  • Examples: financial institutions (stock market feeds), online gaming, real-estate websites.
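A small, library-free Python sketch contrasting the two ingestion styles on a handful of made-up readings:

```python
# Hypothetical values standing in for incoming data points.
readings = [3, 5, 8, 2, 7]

# Batch: buffer the whole group, then process it later (for example, on a schedule).
batch = list(readings)
print("batch average:", sum(batch) / len(batch))

# Streaming: process each element as it arrives, keeping only running state.
count, total = 0, 0
for value in readings:            # imagine these arriving one at a time in real time
    count += 1
    total += value
    print("running average so far:", total / count)
```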

Understand differences between batch and streaming data

| Differences | Batch | Streaming |
| --- | --- | --- |
| Data scope | Processes all the data in the dataset | The most recent data received, or data within a rolling time window (the last 30 seconds, for example) |
| Data size | Handles large datasets efficiently | Individual records or micro-batches consisting of a few records |
| Performance | Latency of a few hours | Latency on the order of seconds or milliseconds |
| Analysis | Performing complex analytics | Simple response functions, aggregates, or calculations such as rolling averages |


Postgres DDL DML Statements

Import psycopg2

import psycopg2

Data Definition Language(DDL) Commands

CREATE  – Create a database or objects such as tables, indexes, functions, views, stored procedures, and triggers.
DROP    – Delete objects from the database.
ALTER   - Alter the structure of the database.
TRUNCATE– Remove all records from a table (spaces allocated for the records are also removed.)
COMMENT – Add comments to the data dictionary.
RENAME  – Rename an object existing in the database.

Create Tables in Music DB

The database is from http://tahaghoghi.com/LearningMySQL/downloads.php

Exported all the tables from MySQL to CSV files

from IPython.core.display import Image, display
PATH = "./"
Image(filename = PATH + "MusicDB.jpg", width=500, height=500)

ER Diagram

Create Database

Please provide your postgres username and password

con=psycopg2.connect("host=127.0.0.1 user= password=")
cur=con.cursor()
con.set_session(autocommit=True)
name_db="musicdb"
sqlCreatedb="create database " +name_db+";"
cur.execute(sqlCreatedb)

Show Database

sqlShowdb="select datname from pg_database;"
cur.execute(sqlShowdb)
print(cur.fetchall())
[('postgres',), ('student',), ('template1',), ('template0',), ('udacity',), ('musicdb',)]

Always close the existing connection before opening a new database connection

try: 
    con.close()
except psycopg2.Error as e:
    print(e)

Connect with the Musicdb

con=psycopg2.connect("host=127.0.0.1 dbname=musicdb user=postgres password=110484")
cur=con.cursor()

Create Tables for MusicDB

con.set_session(autocommit=True)
sqlCreateArtist="create table if not exists artist(artist_id INT NOT NULL DEFAULT 0,\
                artist_name VARCHAR(128) DEFAULT NULL,\
                PRIMARY KEY (artist_id));"
try:
    cur.execute(sqlCreateArtist)
except psycopg2.Error as e:
    print("Could not create table artist")
    print(e)
sqlCreatealbum="create table if not exists album(artist_id INT NOT NULL DEFAULT 0, \
    album_id INT NOT NULL DEFAULT 0,album_name VARCHAR(128) DEFAULT NULL,\
    PRIMARY KEY (artist_id,album_id),\
    FOREIGN KEY (artist_id) REFERENCES artist(artist_id));"
try:
    cur.execute(sqlCreatealbum)
except psycopg2.Error as e:
    print("Could not create table album")
    print(e)
cur.execute("Select * from album")
print(cur.fetchall())
[]
cur.execute("Select * from artist")
print(cur.fetchall())
[]
sqlCreatetrack="CREATE TABLE IF NOT EXISTS track (\
    track_id INT NOT NULL DEFAULT 0,\
    track_name VARCHAR(128) DEFAULT NULL,\
    artist_id INT NOT NULL DEFAULT 0,\
    album_id INT NOT NULL DEFAULT 0,\
    time DECIMAL(5,2) DEFAULT NULL,\
    PRIMARY KEY (artist_id,album_id,track_id),\
    FOREIGN KEY (artist_id) REFERENCES artist(artist_id),\
    FOREIGN KEY (artist_id,album_id) REFERENCES album(artist_id,album_id)\
)"
try:
    cur.execute(sqlCreatetrack)
except psycopg2.Error as e:
    print("Could not create table track")
    print(e)
cur.execute("select * from track")
print(cur.fetchall())
[]
sqlCreateplayed="CREATE TABLE IF NOT EXISTS played (\
    artist_id INT NOT NULL DEFAULT 0,\
    album_id INT NOT NULL DEFAULT 0,\
    track_id INT NOT NULL DEFAULT 0,\
    played TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,\
    PRIMARY KEY (artist_id,album_id,track_id,played),\
    FOREIGN KEY (artist_id) REFERENCES artist(artist_id),\
    FOREIGN KEY (artist_id,album_id) REFERENCES album(artist_id,album_id),\
    FOREIGN KEY (artist_id,album_id,track_id) REFERENCES track(artist_id,album_id,track_id))"
try:
    cur.execute(sqlCreateplayed)
except psycopg2.Error as e:
    print("Could not create table played")
    print(e)
cur.execute("select * from played;")
print(cur.fetchall())
[]
# Load the artist rows exported from MySQL into the new table
f = open(r'C:\Users\tskir\Documents\SQL\Oracle Emp\PostgreSQL\musicDBCSV\artist.csv', 'r')
next(f)  # skip the CSV header row
cur.copy_from(f, 'artist', sep=',')

f.close()
cur.execute("select * from artist")
print(cur.fetchall())
[(1, '"New Order"'), (2, '"Nick Cave & The Bad Seeds"'), (3, '"Miles Davis"'), (4, '"The Rolling Stones"'), (5, '"The Stone Roses"'), (6, '"Kylie Minogue"')]
f=open(r'C:\Users\tskir\Documents\SQL\Oracle Emp\PostgreSQL\musicDBCSV\album.csv','r')
next(f)
cur.copy_from(f,'album',sep=",")
f.close()
cur.execute("select * from album")
print(cur.fetchall())
[(1, 1, 'Retro - John McCready FAN'), (1, 2, 'Substance (Disc 2)'), (1, 3, 'Retro - Miranda Sawyer POP'), (1, 4, 'Retro - New Order / Bobby Gillespie LIVE'), (1, 5, '"Power, Corruption and Lies"'), (1, 6, 'Substance 1987 (Disc 1)'), (1, 7, 'Brotherhood'), (2, 1, 'Let Love In'), (3, 1, 'Live Around The World'), (3, 2, 'In A Silent Way'), (4, 1, 'Exile On Main Street'), (5, 1, 'Second Coming'), (6, 1, 'Light Years')]
import csv
f=open(r'C:\Users\tskir\Documents\SQL\Oracle Emp\PostgreSQL\musicDBCSV\track.csv','r')
next(f)
cur.copy_from(f,'track',sep=",")
f.close()
cur.execute("select * from track")
print(cur.fetchall())
[(0, 'Elegia', 1, 1, Decimal('4.93')), (1, 'In A Lonely Place', 1, 1, Decimal('6.26')), (2, 'Procession', 1, 1, Decimal('4.47')), (3, 'Your Silent Face', 1, 1, Decimal('5.99')), (4, 'Sunrise', 1, 1, Decimal('6.01')), (5, "Let's Go", 1, 1, Decimal('3.90')), (6, 'Broken Promise', 1, 1, Decimal('3.76')), (7, 'Dreams Never End', 1, 1, Decimal('3.20')), (8, 'Cries And Whispers', 1, 1, Decimal('3.42')), (9, 'All Day Long', 1, 1, Decimal('5.18')), (10, 'Sooner Than You Think', 1, 1, Decimal('5.21')), (11, 'Leave Me Alone', 1, 1, Decimal('4.67')), (12, 'Lonesome Tonight', 1, 1, Decimal('5.19')), (13, 'Every Little Counts', 1, 1, Decimal('4.47')), (14, 'Run Wild', 1, 1, Decimal('3.95')), (0, 'In A Lonely Place', 1, 2, Decimal('6.30')), (1, 'Procession', 1, 2, Decimal('4.46')), (2, 'Mesh', 1, 2, Decimal('3.44')), (3, 'Hurt', 1, 2, Decimal('6.98')), (4, 'The Beach', 1, 2, Decimal('7.32')), (5, 'Confusion', 1, 2, Decimal('7.64')), (6, 'Lonesome Tonight', 1, 2, Decimal('5.20')), (7, 'Murder', 1, 2, Decimal('3.93')), (8, 'Thieves Like Us', 1, 2, Decimal('6.95')), (9, 'Kiss Of Death', 1, 2, Decimal('7.05')), (10, 'Shame Of The Nation', 1, 2, Decimal('7.91')), (11, '1963', 1, 2, Decimal('5.63')), (0, 'Fine Time', 1, 3, Decimal('4.71')), (1, 'Temptation', 1, 3, Decimal('8.71')), (2, 'True Faith', 1, 3, Decimal('5.88')), (3, 'The Perfect Kiss', 1, 3, Decimal('4.83')), (4, 'Ceremony', 1, 3, Decimal('4.41')), (5, 'Regret', 1, 3, Decimal('4.14')), (6, 'Crystal', 1, 3, Decimal('6.83')), (7, 'Bizarre Love Triangle', 1, 3, Decimal('4.35')), (8, 'Confusion', 1, 3, Decimal('8.22')), (9, 'Round And Round', 1, 3, Decimal('4.52')), (10, 'Blue Monday', 1, 3, Decimal('7.48')), (11, 'Brutal', 1, 3, Decimal('4.83')), (12, 'Slow Jam', 1, 3, Decimal('4.88')), (13, 'Everyone Everywhere', 1, 3, Decimal('4.43')), (0, '"Ceremony [Studio 54, Barcelona 7/7/84]"', 1, 4, Decimal('4.82')), (1, '"Procession [Polytechnic of Central London, London 6/12/85]"', 1, 4, Decimal('3.57')), (2, '"Everything\'s Gone Green [Tolworth Recreation Centre, London 12/3/85]"', 1, 4, Decimal('5.25')), (3, 'In A Lonely Place [Glastonbury Festival 20/6/81]', 1, 4, Decimal('5.55')), (4, '"Age Of Consent [Spectrum Arena, Warrington 1/3/86]"', 1, 4, Decimal('5.04')), (5, 'Elegia [Glastonbury Festival 19/6/87]', 1, 4, Decimal('4.77')), (6, 'The Perfect Kiss [Glastonbury Festival 19/6/87]', 1, 4, Decimal('9.73')), (7, '"Fine Time [Popular Creek Music Theatre, Chicago 30/6/89]"', 1, 4, Decimal('5.04')), (8, '"World [Starplex Amphitheatre, Dallas 21/7/93]"', 1, 4, Decimal('4.81')), (9, 'Regret [Reading Festival 29/8/93]', 1, 4, Decimal('4.03')), (10, 'As It Is When It Was [Reading Festival 29/8/93]', 1, 4, Decimal('3.80')), (11, '"Intermission By Alan Wise [Olympia, Paris 12/11/01]"', 1, 4, Decimal('1.34')), (12, '"Crystal [Big Day Out, Gold Coast 20/1/02]"', 1, 4, Decimal('6.86')), (13, '"Turn My Way [Olympia, Liverpool 18/7/01]"', 1, 4, Decimal('4.96')), (14, '"Temptation [Big Day Out, Gold Coast 20/1/02]"', 1, 4, Decimal('7.79')), (0, 'Age Of Consent', 1, 5, Decimal('5.26')), (1, 'We All Stand', 1, 5, Decimal('5.24')), (2, 'The Village', 1, 5, Decimal('4.62')), (3, '5 8 6', 1, 5, Decimal('7.52')), (4, 'Your Silent Face', 1, 5, Decimal('6.00')), (5, 'Ultraviolence', 1, 5, Decimal('4.87')), (6, 'Ecstasy', 1, 5, Decimal('4.42')), (7, 'Leave Me Alone', 1, 5, Decimal('4.69')), (0, 'Ceremony', 1, 6, Decimal('4.42')), (1, "Everything's Gone Green", 1, 6, Decimal('5.51')), (2, 'Temptation', 1, 6, Decimal('6.99')), (3, 'Blue Monday', 1, 6, Decimal('7.49')), (4, 
'Confusion', 1, 6, Decimal('4.72')), (5, 'Thieves Like Us', 1, 6, Decimal('6.61')), (6, 'Perfect Kiss', 1, 6, Decimal('8.04')), (7, 'Subculture', 1, 6, Decimal('4.80')), (8, 'Shellshock', 1, 6, Decimal('6.48')), (9, 'State of the Nation', 1, 6, Decimal('6.54')), (10, 'Bizarre Love Triangle', 1, 6, Decimal('6.74')), (11, 'True Faith', 1, 6, Decimal('5.93')), (0, 'State of the Nation', 1, 7, Decimal('6.56')), (1, 'Every Little Counts', 1, 7, Decimal('4.48')), (2, 'Angel Dust', 1, 7, Decimal('3.73')), (3, 'All Day Long', 1, 7, Decimal('5.21')), (4, 'Bizarre Love Triangle', 1, 7, Decimal('4.37')), (5, 'Way of Life', 1, 7, Decimal('4.11')), (6, 'Broken Promise', 1, 7, Decimal('3.80')), (7, 'As It Is When It Was', 1, 7, Decimal('3.77')), (8, 'Weirdo', 1, 7, Decimal('3.89')), (9, 'Paradise', 1, 7, Decimal('3.86')), (0, 'Do You Love Me?', 2, 1, Decimal('5.95')), (1, "Nobody's Baby Now", 2, 1, Decimal('3.87')), (2, 'Loverman', 2, 1, Decimal('6.37')), (3, 'Jangling Jack', 2, 1, Decimal('2.78')), (4, 'Red Right Hand', 2, 1, Decimal('6.18')), (5, 'I Let Love In', 2, 1, Decimal('4.25')), (6, 'Thirsty Dog', 2, 1, Decimal('3.81')), (7, "Ain't Gonna Rain Anymore", 2, 1, Decimal('3.77')), (8, 'Lay Me Low', 2, 1, Decimal('5.15')), (9, 'Do You Love Me? (Part Two)', 2, 1, Decimal('6.23')), (0, 'In A Silent Way', 3, 1, Decimal('1.81')), (1, 'Intruder', 3, 1, Decimal('4.87')), (2, 'New Blues', 3, 1, Decimal('5.58')), (3, 'Human Nature', 3, 1, Decimal('12.80')), (4, 'Mr. Pastorius', 3, 1, Decimal('3.54')), (5, 'Amandla', 3, 1, Decimal('5.87')), (6, 'Wrinkle', 3, 1, Decimal('7.28')), (7, 'Tutu', 3, 1, Decimal('8.89')), (8, 'Full Nelson', 3, 1, Decimal('2.81')), (9, 'Time After Time', 3, 1, Decimal('9.98')), (10, 'Hannibal', 3, 1, Decimal('7.37')), (0, 'Shhh/Peaceful', 3, 2, Decimal('16.67')), (1, "In A Silent Way/It's About That Time", 3, 2, Decimal('16.67')), (0, 'Rocks Off', 4, 1, Decimal('4.54')), (1, 'Rip This Joint', 4, 1, Decimal('2.38')), (2, 'Shake Your Hips', 4, 1, Decimal('3.00')), (3, 'Casino Boogie', 4, 1, Decimal('3.57')), (4, 'Tumbling Dice', 4, 1, Decimal('3.79')), (5, 'Sweet Virginia', 4, 1, Decimal('4.44')), (6, 'Torn & Frayed', 4, 1, Decimal('4.30')), (7, 'Sweet Black Angel', 4, 1, Decimal('2.97')), (8, 'Loving Cup', 4, 1, Decimal('4.43')), (9, 'Happy', 4, 1, Decimal('3.08')), (10, 'Turd On The Run', 4, 1, Decimal('2.64')), (11, 'Ventilator Blues', 4, 1, Decimal('3.40')), (12, 'I Just Want To See His Face', 4, 1, Decimal('2.90')), (13, 'Let It Loose', 4, 1, Decimal('5.31')), (14, 'All Down The Line', 4, 1, Decimal('3.84')), (15, 'Stop Breaking Down', 4, 1, Decimal('4.57')), (16, 'Shine A Light', 4, 1, Decimal('4.28')), (17, 'Soul Survivor', 4, 1, Decimal('3.82')), (0, 'Breaking Into Heaven', 5, 1, Decimal('11.37')), (1, 'Driving South', 5, 1, Decimal('5.17')), (2, 'Ten Storey Love Song', 5, 1, Decimal('4.50')), (3, 'Daybreak', 5, 1, Decimal('6.56')), (4, 'Your Star Will Shine', 5, 1, Decimal('2.99')), (5, 'Straight To The Man', 5, 1, Decimal('3.26')), (6, 'Begging You', 5, 1, Decimal('4.94')), (7, 'Tightrope', 5, 1, Decimal('4.45')), (8, 'Good Times', 5, 1, Decimal('5.67')), (9, 'Tears', 5, 1, Decimal('6.84')), (10, 'How Do You Sleep', 5, 1, Decimal('4.99')), (11, 'Love Spreads', 5, 1, Decimal('5.79')), (12, 'Untitled', 5, 1, Decimal('6.43')), (0, 'Spinning Around', 6, 1, Decimal('3.46')), (1, 'On A Night Like This', 6, 1, Decimal('3.55')), (2, 'So Now Goodbye', 6, 1, Decimal('3.62')), (3, 'Disco Down', 6, 1, Decimal('3.96')), (4, 'Loveboat', 6, 1, Decimal('4.18')), (5, 'Koocachoo', 6, 1, 
Decimal('4.00')), (6, 'Your Disco Needs You', 6, 1, Decimal('3.56')), (7, 'Please Stay', 6, 1, Decimal('4.14')), (8, 'Bittersweet Goodbye', 6, 1, Decimal('3.72')), (9, 'Butterfly', 6, 1, Decimal('4.16')), (10, 'Under The Influence Of Love', 6, 1, Decimal('3.40')), (11, "I'm So High", 6, 1, Decimal('3.55')), (12, 'Kids', 6, 1, Decimal('4.34'))]
import csv
f=open(r'C:\Users\tskir\Documents\SQL\Oracle Emp\PostgreSQL\musicDBCSV\played.csv','r')
next(f)
cur.copy_from(f,'played',sep=",")
f.close()

cur.execute("select * from played")
print(cur.fetchall())
[(1, 3, 0, datetime.datetime(2006, 8, 14, 10, 21)), (1, 3, 1, datetime.datetime(2006, 8, 14, 10, 25)), (1, 3, 2, datetime.datetime(2006, 8, 14, 10, 30)), (1, 3, 3, datetime.datetime(2006, 8, 14, 10, 36)), (1, 3, 4, datetime.datetime(2006, 8, 14, 10, 41)), (1, 3, 5, datetime.datetime(2006, 8, 14, 10, 43)), (1, 3, 6, datetime.datetime(2006, 8, 14, 10, 47)), (1, 3, 7, datetime.datetime(2006, 8, 14, 10, 54)), (3, 1, 0, datetime.datetime(2006, 8, 15, 14, 0)), (3, 1, 1, datetime.datetime(2006, 8, 15, 14, 26)), (3, 1, 2, datetime.datetime(2006, 8, 15, 14, 33))]

Data Manipulation Language(DML)

SELECT  – query data in the database
INSERT  – insert data into a table
UPDATE  – update data in a table
DELETE  – delete data from a table
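For completeness, a short sketch of the other DML statements against the same musicdb connection; artist_id 999 is made up so the row can be removed again:

```python
# INSERT a new row, UPDATE it, then DELETE it again (autocommit is already
# enabled on this connection, so each statement takes effect immediately).
cur.execute("insert into artist (artist_id, artist_name) values (999, 'Test Artist')")
cur.execute("update artist set artist_name = 'Renamed Artist' where artist_id = 999")
cur.execute("delete from artist where artist_id = 999")
```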

cur.execute("select count(*) from artist")
print(cur.fetchall())
[(6,)]
cur.execute("select count(*) from album")
print(cur.fetchall())
[(13,)]
cur.execute("select count(*) from played")
print(cur.fetchall())
[(11,)]
cur.execute("select count(*) from artist")
print(cur.fetchall())
[(6,)]

Close all Connections

cur.close()
con.close()