In this assignment, you will design the tables to hold the data in the CSVs, import the CSVs into a SQL database, and answer questions about the data. In other words, you will perform data modeling, data engineering, and data analysis.
Note: You may hear the term "Data Modeling" used in place of "Data Engineering," but the two terms mean the same thing. "Data Engineering" is the more modern wording.
Inspect the CSVs and sketch out an ERD of the tables. Feel free to use a tool like http://www.quickdatabasediagrams.com.
Use the information you have to create a table schema for each of the six CSV files. Remember to specify data types, primary keys, foreign keys, and other constraints.
For each primary key, check whether the column is unique. If it is not, create a composite key, which combines two or more columns into a primary key that uniquely identifies a row.
Be sure to create the tables in the correct order to handle foreign keys.
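The schema steps above can be sketched as follows. This is a minimal illustration, not the expected solution: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and every table and column name (departments, employees, dept_emp, emp_no, dept_no) is an assumption rather than something taken from the CSVs.

```python
import sqlite3

# In-memory SQLite stands in for the assignment's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent tables come first so that the child table can reference them.
# All table and column names here are assumed for illustration.
conn.executescript("""
CREATE TABLE departments (
    dept_no   TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_no     INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE dept_emp (
    emp_no  INTEGER NOT NULL REFERENCES employees (emp_no),
    dept_no TEXT    NOT NULL REFERENCES departments (dept_no),
    PRIMARY KEY (emp_no, dept_no)  -- composite key: neither column is unique alone
);
""")

conn.execute("INSERT INTO departments VALUES ('d001', 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 'Byron')")
conn.execute("INSERT INTO dept_emp VALUES (1, 'd001')")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("INSERT INTO dept_emp VALUES (1, 'd001')")  # repeats the composite key
orphan_ok = try_insert("INSERT INTO dept_emp VALUES (2, 'd001')")     # employee 2 does not exist
print(duplicate_ok, orphan_ok)
```

Both inserts at the end are rejected: the first repeats an existing composite key, and the second points at an employee that is not in the parent table. Creating dept_emp before its parent tables would be the ordering mistake that the step above warns about.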
Import each CSV file into the corresponding SQL table. Note: be sure to import the data in the same order in which the tables were created, and account for the headers when importing to avoid errors.
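One way to account for the header row during import is sketched below. It is only an illustration under stated assumptions: the CSV content is a made-up two-row sample held in a string, the column names dept_no and dept_name are hypothetical, and SQLite again stands in for PostgreSQL.

```python
import csv
import io
import sqlite3

# A tiny stand-in for one CSV file; the column names are assumed.
csv_text = "dept_no,dept_name\nd001,Marketing\nd002,Finance\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE departments (dept_no TEXT PRIMARY KEY, dept_name TEXT NOT NULL)"
)

# DictReader consumes the header row and uses it as keys, so the header
# is never inserted as if it were a data row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(record["dept_no"], record["dept_name"]) for record in reader]
conn.executemany("INSERT INTO departments VALUES (?, ?)", rows)
conn.commit()

loaded = conn.execute(
    "SELECT dept_no, dept_name FROM departments ORDER BY dept_no"
).fetchall()
print(loaded)
```

When loading directly in PostgreSQL, the COPY command's HEADER option for CSV input serves the same purpose of skipping the first line.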
Once you have a complete database, do the following:
List the following details of each employee: employee number, last name, first name, sex, and salary.
List first name, last name, and hire date for employees who were hired in 1986.
List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name.
List the department of each employee with the following information: employee number, last name, first name, and department name.
List first name, last name, and sex for employees whose first name is "Hercules" and whose last name begins with "B."
List all employees in the Sales department, including their employee number, last name, first name, and department name.
List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name.
In descending order, list the frequency count of employee last names, i.e., how many employees share each last name.
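Three of the questions above (hire year, name pattern, and last-name frequency) can be sketched against a toy table. The rows, the table name employees, and its column names are all invented for illustration, and the queries run in SQLite through Python rather than in PostgreSQL, so treat them as a starting point and not a finished answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employees (
    emp_no INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, sex TEXT, hire_date TEXT
)""")
# Made-up sample rows; dates are ISO strings, which sort chronologically.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hercules", "Baer", "M", "1986-03-14"),
        (2, "Hercules", "Smith", "M", "1990-07-01"),
        (3, "Mary", "Baer", "F", "1986-11-30"),
        (4, "Anna", "Smith", "F", "1985-12-31"),
        (5, "Omar", "Baer", "M", "1992-01-15"),
    ],
)

# Employees hired in 1986: a half-open date range avoids off-by-one errors.
hired_1986 = conn.execute("""
    SELECT first_name, last_name, hire_date FROM employees
    WHERE hire_date >= '1986-01-01' AND hire_date < '1987-01-01'
    ORDER BY emp_no
""").fetchall()

# First name "Hercules" and last name beginning with "B".
hercules_b = conn.execute("""
    SELECT first_name, last_name, sex FROM employees
    WHERE first_name = 'Hercules' AND last_name LIKE 'B%'
""").fetchall()

# Frequency count of last names, most common first.
name_counts = conn.execute("""
    SELECT last_name, COUNT(*) AS frequency FROM employees
    GROUP BY last_name
    ORDER BY frequency DESC, last_name
""").fetchall()

print(hired_1986, hercules_b, name_counts, sep="\n")
```

The questions that mention departments follow the same pattern but need joins across the employee, department, and linking tables.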
As you examine the data, you are overcome with a creeping suspicion that the dataset is fake. You surmise that your boss handed you spurious data in order to test the data engineering skills of a new employee. To confirm your hunch, you decide to take the following steps to generate a visualization of the data, with which you will confront your boss:
Import the SQL database into Pandas. (Yes, you could read the CSVs directly in Pandas, but you are, after all, trying to prove your technical mettle.) This step may require some research. Feel free to use the code below to get started. Be sure to make any necessary modifications for your username, password, host, port, and database name:
    from sqlalchemy import create_engine
    engine = create_engine('postgresql://localhost:5432/<your_db_name>')
    connection = engine.connect()
Consult the SQLAlchemy documentation for more information.
If you use a password, do not upload it to your GitHub repository. See https://www.youtube.com/watch?v=2uaTPmNvH0I and https://help.github.com/en/github/using-git/ignoring-files for more information.
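One common way to keep a password out of the repository is to read it from environment variables when building the connection URL. The sketch below assumes hypothetical variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) and a hypothetical helper build_db_url; none of these come from the assignment.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables (names are assumed)."""
    user = env.get("DB_USER", "postgres")
    password = env.get("DB_PASSWORD", "")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]  # required: there is no sensible default database name
    # quote_plus escapes characters such as '@' or ':' that would otherwise break the URL.
    credentials = user if not password else f"{user}:{quote_plus(password)}"
    return f"postgresql://{credentials}@{host}:{port}/{name}"


# A plain dict stands in for the real environment so the example is repeatable.
url = build_db_url(
    {"DB_USER": "alice", "DB_PASSWORD": "p@ss:word", "DB_NAME": "employees_db"}
)
print(url)
```

The resulting string can be passed to create_engine in place of a hard-coded URL, so the password lives only in the local environment and never in a committed file.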
Create a histogram to visualize the most common salary ranges for employees.
Create a bar chart of average salary by title.
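The aggregation behind both charts can be sketched with the standard library alone: binning salaries into fixed-width ranges for the histogram and averaging salary per title for the bar chart. The sample rows, the 10,000 bin width, and the variable names are assumptions chosen for illustration; a plotting library would normally draw the final figures.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical (title, salary) rows, as a join of titles and salaries might return.
rows = [
    ("Engineer", 42000), ("Engineer", 48000), ("Staff", 40000),
    ("Staff", 61000), ("Manager", 75000), ("Manager", 65000),
]

BIN_WIDTH = 10_000  # assumed width of each salary range

# Histogram data: count salaries per range, keyed by the range's lower bound.
bin_counts = Counter((salary // BIN_WIDTH) * BIN_WIDTH for _, salary in rows)

# Bar chart data: mean salary per title.
by_title = defaultdict(list)
for title, salary in rows:
    by_title[title].append(salary)
mean_by_title = {title: mean(salaries) for title, salaries in by_title.items()}

# A text rendering of the histogram, one '#' per employee in the range.
for low in sorted(bin_counts):
    print(f"{low:>6}-{low + BIN_WIDTH - 1}: {'#' * bin_counts[low]}")
```

If the averages across titles look implausible, for example senior titles earning less than junior ones, that is the kind of pattern that would support the suspicion that the data is fake.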
Evidence in hand, you march into your boss's office and present the visualization. With a sly grin, your boss thanks you for your work. On your way out of the office, you hear the words, "Search your ID number." You look down at your badge to see that your employee ID number is 499942.
Create an image file of your ERD.
Create a .sql file of your table schemata.
Create a .sql file of your queries.
(Optional) Create a Jupyter Notebook of the bonus analysis.