SpaceX Falcon 9 First Stage Landing Prediction
Assignment: Exploring and Preparing Data
Estimated time needed: 70 minutes
In this assignment, we will predict if the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers cost upward of 165 million dollars each. Much of the savings comes from the fact that SpaceX can reuse the first stage.
In this lab, you will perform Exploratory Data Analysis and Feature Engineering.
[Figure: a Falcon 9 first stage landing successfully]
[Figure: several examples of unsuccessful landings]
Most unsuccessful landings are planned: SpaceX performs a controlled landing in the ocean.
Objectives
Perform Exploratory Data Analysis and Feature Engineering using Pandas and Matplotlib
Exploratory Data Analysis
Preparing Data Feature Engineering
Import Libraries and Define Auxiliary Functions
We will import the following libraries for the lab.
Set the style for the Seaborn plots used in the later stages of this notebook.
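The import cell itself is not shown here. A minimal sketch of what it might contain, based only on the libraries this lab names (Pandas, Matplotlib, Seaborn), is given below. The choice of the "whitegrid" style is an assumption, not something the lab specifies.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# "whitegrid" is one of Seaborn's built-in styles; picking it here is an assumption.
sns.set_style("whitegrid")
```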
Exploratory Data Analysis
First, let's read the SpaceX dataset into a Pandas dataframe and print its summary.
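The location of the dataset is not given in this text, so the sketch below reads a few made-up rows from an in-memory CSV instead. The column names (FlightNumber, Date, PayloadMass, Orbit, LaunchSite, Class) are assumptions based on the variables mentioned in this lab, and the values are illustrative only, not real launch records.

```python
import io
import pandas as pd

# Hypothetical stand-in for the course CSV: made-up rows, assumed column names.
csv_text = """FlightNumber,Date,PayloadMass,Orbit,LaunchSite,Class
1,2012-01-01,500.0,LEO,CCAFS LC-40,0
2,2013-01-01,3000.0,GTO,CCAFS LC-40,1
3,2014-01-01,2500.0,ISS,KSC LC-39A,0
4,2015-01-01,9000.0,LEO,VAFB SLC 4E,1
"""

# With the real data, pass the CSV path or URL to read_csv instead of StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows
df.info()         # column types and non-null counts
```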
Next, let's see how the FlightNumber (indicating the continuous launch attempts) and Payload variables affect the launch outcome.
We can plot FlightNumber vs. PayloadMass and overlay the outcome of the launch. We see that as the flight number increases, the first stage is more likely to land successfully. The payload mass also appears to matter: the more massive the payload, the less likely the first stage is to return.
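A plot of this kind could be drawn roughly as follows. This is a sketch on made-up rows: the column names FlightNumber, PayloadMass, and Class are assumed, and the outcome is assumed to be encoded as 1 for a successful landing and 0 otherwise.

```python
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4, 5, 6],
    "PayloadMass": [500.0, 3000.0, 5500.0, 2000.0, 9000.0, 4000.0],
    "Class": [0, 0, 1, 0, 1, 1],
})

# catplot draws a strip plot by default; hue colors each point by outcome.
g = sns.catplot(data=df, x="FlightNumber", y="PayloadMass", hue="Class", aspect=3)
g.set_axis_labels("Flight Number", "Payload Mass")
```

In a notebook the figure renders automatically; in a script, call matplotlib.pyplot.show() afterwards.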
We see that different launch sites have different success rates. CCAFS LC-40 has a success rate of 60%, while KSC LC-39A and VAFB SLC 4E each have a success rate of 77%.
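Per-site success rates like these can be computed as the mean of a 0/1 outcome column within each site. The sketch below uses made-up rows, so its numbers will not match the 60% and 77% quoted above; the column names LaunchSite and Class are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only; not real launch records.
df = pd.DataFrame({
    "LaunchSite": ["CCAFS LC-40", "CCAFS LC-40", "KSC LC-39A",
                   "KSC LC-39A", "VAFB SLC 4E", "VAFB SLC 4E"],
    "Class": [1, 0, 1, 1, 0, 1],
})

# The mean of a 0/1 column is the fraction of successes in each group.
site_success = df.groupby("LaunchSite")["Class"].mean()
print(site_success)
```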
Next, let's drill down to each site and visualize its detailed launch records.
TASK 1: Visualize the relationship between Flight Number and Launch Site
Use the function catplot to plot FlightNumber vs LaunchSite: set the parameter x to FlightNumber, the parameter y to LaunchSite, and the parameter hue to 'class'.
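One possible shape for this call is sketched below on made-up rows. The task text writes the hue column as 'class'; the sketch assumes the dataframe column is named Class, so adjust the name to match your data.

```python
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; column names are assumptions.
df = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4, 5, 6],
    "LaunchSite": ["CCAFS LC-40", "CCAFS LC-40", "VAFB SLC 4E",
                   "KSC LC-39A", "KSC LC-39A", "CCAFS LC-40"],
    "Class": [0, 0, 1, 1, 1, 1],
})

# Numeric x and categorical y give one horizontal strip of points per site.
g = sns.catplot(data=df, x="FlightNumber", y="LaunchSite", hue="Class", aspect=3)
g.set_axis_labels("Flight Number", "Launch Site")
```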
Now try to explain the patterns you found in the Flight Number vs. Launch Site scatter point plots.
TASK 2: Visualize the relationship between Payload and Launch Site
We also want to observe if there is any relationship between launch sites and their payload mass.
Now, if you observe the Payload vs. Launch Site scatter point chart, you will find that for the VAFB-SLC launch site there are no rockets launched with a heavy payload mass (greater than 10000).
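The chart and the observation can be sketched with plain Matplotlib plus a Pandas check of the maximum payload per site. The rows below are made up and were chosen to mirror the observation above; they are not real launch records, and the column names are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration only.
df = pd.DataFrame({
    "LaunchSite": ["CCAFS LC-40", "CCAFS LC-40", "KSC LC-39A",
                   "KSC LC-39A", "VAFB SLC 4E", "VAFB SLC 4E"],
    "PayloadMass": [3000.0, 15000.0, 5000.0, 12000.0, 1000.0, 9000.0],
    "Class": [0, 1, 1, 1, 0, 1],
})

fig, ax = plt.subplots(figsize=(8, 3))
# One scatter call per outcome so each outcome gets its own color and legend entry.
for outcome, group in df.groupby("Class"):
    ax.scatter(group["PayloadMass"], group["LaunchSite"], label=f"Class {outcome}")
ax.set_xlabel("Payload Mass")
ax.set_ylabel("Launch Site")
ax.legend()

# Numerical check: the heaviest payload launched from each site.
max_payload = df.groupby("LaunchSite")["PayloadMass"].max()
print(max_payload)
```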
TASK 3: Visualize the relationship between success rate and orbit type
Next, we want to visually check if there is any relationship between success rate and orbit type.
Let's create a bar chart for the success rate of each orbit.
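One way to build such a chart is to group by orbit, take the mean of the outcome column, and plot the result as bars. The sketch below uses made-up rows and assumes the columns are named Orbit and Class.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Orbit": ["LEO", "LEO", "GTO", "GTO", "ISS", "ISS"],
    "Class": [1, 1, 0, 1, 1, 0],
})

# Success rate per orbit, highest first.
orbit_success = df.groupby("Orbit")["Class"].mean().sort_values(ascending=False)

fig, ax = plt.subplots()
orbit_success.plot(kind="bar", ax=ax)
ax.set_xlabel("Orbit")
ax.set_ylabel("Success rate")
```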
Analyze the plotted bar chart and try to find which orbits have a high success rate.
TASK 4: Visualize the relationship between FlightNumber and Orbit type
For each orbit, we want to see if there is any relationship between FlightNumber and Orbit type.
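A sketch of this plot using Seaborn's axes-level stripplot (the same kind of plot catplot draws by default) is shown below. The rows are made up, and the column names FlightNumber, Orbit, and Class are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows for illustration only.
df = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4, 5, 6],
    "Orbit": ["LEO", "GTO", "LEO", "GTO", "ISS", "LEO"],
    "Class": [0, 1, 0, 0, 1, 1],
})

fig, ax = plt.subplots(figsize=(8, 3))
# One horizontal strip per orbit, points colored by landing outcome.
sns.stripplot(data=df, x="FlightNumber", y="Orbit", hue="Class", ax=ax)
ax.set_xlabel("Flight Number")
ax.set_ylabel("Orbit")
```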
You should see that in the LEO orbit, success appears related to the number of flights; on the other hand, there seems to be no relationship between flight number and success in the GTO orbit.
TASK 5: Visualize the relationship between Payload and Orbit type
Similarly, we can plot the Payload vs. Orbit scatter point charts to reveal the relationship between Payload and Orbit type.
With heavy payloads, the successful (positive) landing rate is higher for Polar, LEO, and ISS orbits.
However, for GTO we cannot distinguish this well, as both positive landings and negative landings (unsuccessful missions) are present.
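As a rough sketch, the same idea with Seaborn's scatterplot might look like this. The rows are made up and the column names PayloadMass, Orbit, and Class are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up rows for illustration only.
df = pd.DataFrame({
    "PayloadMass": [500.0, 3000.0, 5500.0, 2000.0, 9000.0, 4000.0],
    "Orbit": ["LEO", "GTO", "ISS", "GTO", "LEO", "ISS"],
    "Class": [0, 1, 1, 0, 1, 1],
})

fig, ax = plt.subplots(figsize=(8, 3))
# Payload on the x axis, orbit on the y axis, points colored by outcome.
sns.scatterplot(data=df, x="PayloadMass", y="Orbit", hue="Class", ax=ax)
ax.set_xlabel("Payload Mass")
ax.set_ylabel("Orbit")
```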
TASK 6: Visualize the launch success yearly trend
You can plot a line chart with the x axis set to Year and the y axis set to the average success rate to get the average launch success trend.
The function will help you get the year from the date:
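The helper function itself is not shown in this text. A minimal sketch of such a helper, followed by the yearly trend plot, is given below. It assumes dates are stored as 'YYYY-MM-DD' strings in a column named Date; the function name extract_year and the rows are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

def extract_year(dates):
    """Return the year part of each 'YYYY-MM-DD' string (hypothetical helper)."""
    return [d.split("-")[0] for d in dates]

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Date": ["2012-01-01", "2013-01-01", "2013-06-01", "2014-01-01"],
    "Class": [0, 0, 1, 1],
})

df["Year"] = extract_year(df["Date"])
# Average success rate per year.
yearly = df.groupby("Year")["Class"].mean()

fig, ax = plt.subplots()
ax.plot(yearly.index, yearly.values, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Average success rate")
```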
You can observe that the success rate kept increasing from 2013 to 2020.
Feature Engineering
By now, you should have some preliminary insights into how each important variable affects the success rate. We will select the features that will be used for success prediction in a future module.
TASK 7: Create dummy variables for categorical columns
Use the function get_dummies and the features dataframe to apply one-hot encoding to the columns Orbits, LaunchSite, LandingPad, and Serial. Assign the value to the variable features_one_hot and display the results using the method head. Your result dataframe must include all features, including the encoded ones.
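A sketch of this step with pandas.get_dummies is shown below on a made-up features dataframe. The task text names the orbit column Orbits; this sketch assumes it is called Orbit, so adjust the list of columns to match your data. All values, including the pad and serial labels, are hypothetical.

```python
import pandas as pd

# Made-up feature rows for illustration only.
features = pd.DataFrame({
    "FlightNumber": [1, 2, 3],
    "PayloadMass": [500.0, 3000.0, 5500.0],
    "Orbit": ["LEO", "GTO", "ISS"],
    "LaunchSite": ["CCAFS LC-40", "KSC LC-39A", "CCAFS LC-40"],
    "LandingPad": ["Pad A", "Pad B", "Pad A"],
    "Serial": ["B0001", "B0002", "B0003"],
})

# Each listed column is replaced by one indicator column per distinct value;
# columns that are not listed are kept unchanged.
features_one_hot = pd.get_dummies(
    features, columns=["Orbit", "LaunchSite", "LandingPad", "Serial"]
)
print(features_one_hot.head())
```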
TASK 8: Cast all numeric columns to float64
Now that our features_one_hot dataframe only contains numbers, cast the entire dataframe to variable type float64.
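The cast can be done with DataFrame.astype. The sketch below uses a small made-up dataframe with integer, float, and boolean columns to show that every column ends up as float64.

```python
import pandas as pd

# Made-up encoded features for illustration only.
features_one_hot = pd.DataFrame({
    "FlightNumber": [1, 2],
    "PayloadMass": [500.0, 3000.0],
    "Orbit_LEO": [True, False],
    "Orbit_GTO": [False, True],
})

# astype converts every column; booleans become 1.0 and 0.0.
features_one_hot = features_one_hot.astype("float64")
print(features_one_hot.dtypes)
```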
We can now export it to a CSV for the next section, but to make the answers consistent, in the next lab we will provide data in a pre-selected date range.
Authors
Joseph Santarcangelo has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.
Nayef Abou Tayoun is a Data Scientist at IBM and is pursuing a Master of Management in Artificial Intelligence degree at Queen's University.
Change Log
Date (YYYY-MM-DD) | Version | Changed By | Change Description |
---|---|---|---|
2021-10-12 | 1.1 | Lakshmi Holla | Modified markdown |
2020-09-20 | 1.0 | Joseph | Modified Multiple Areas |
2020-11-10 | 1.1 | Nayef | updating the input data |
Copyright © 2020 IBM Corporation. All rights reserved.