TDM 10100: Project 8 — 2023
Motivation: Functions are an important part of writing efficient code.
Functions allow us to repeat and reuse code. If you find yourself using the same set of coding steps over and over, a function may be a good way to reduce your lines of code!
Context: We’ve been learning about and using functions these last few weeks.
To learn how to write your own functions we need to learn some of the terminology and components.
Scope: R, functions
Dataset(s)
We will use the same dataset(s) as last week:
- /anvil/projects/tdm/data/icecream/combined/products.csv
- /anvil/projects/tdm/data/icecream/combined/reviews.csv
Please choose 3 cores when launching the JupyterLab for this project.
Please remember to run library(data.table) before calling fread. Otherwise you will see this error: Error in fread: could not find function "fread"
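As a minimal sketch of loading the two files listed above: the variable names products and reviews match the ones used later in Question 4, while the path variables and the guard around the reads are our own additions, so that the cell also runs quietly on a machine where the Anvil files or the data.table package are not available.

```r
# Paths copied from the Dataset(s) list above
products_path <- "/anvil/projects/tdm/data/icecream/combined/products.csv"
reviews_path  <- "/anvil/projects/tdm/data/icecream/combined/reviews.csv"

# Only attempt the read when data.table is installed and the files exist
# (the guard is an assumption for illustration; on Anvil it should pass)
if (requireNamespace("data.table", quietly = TRUE) &&
    file.exists(products_path) && file.exists(reviews_path)) {
  library(data.table)  # without this line, fread is not found
  products <- fread(products_path)
  reviews  <- fread(reviews_path)
  print(dim(products))  # number of rows and columns
  print(dim(reviews))
}
```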
We will see how to write our own function, so that we can make a repetitive operation easier, by turning it into a single command.
We need to take care to name the function something concise but meaningful, so that other users can understand what the function does.
Function parameters can also be called formal arguments.
A function contains multiple interrelated statements. We can "call" the function, which means that we run all of the statements in the function. Functions can be built-in or can be created by the user (user-defined). Some examples of built-in functions are:
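As a small illustration, sum, mean, and length all ship with base R, so we can call them without defining anything. The vector toy_scores and the user-defined function score_range below are hypothetical names that we made up for this sketch.

```r
# A made-up vector of ratings, used only for this example
toy_scores <- c(4, 5, 3, 5)

# Built-in functions: already defined by R, we only call them
total_score   <- sum(toy_scores)     # 17
average_score <- mean(toy_scores)    # 4.25
how_many      <- length(toy_scores)  # 4

# User-defined function: we write the statements ourselves
score_range <- function(x) {
  max(x) - min(x)
}

# Calling it runs the statements inside the braces
score_range(toy_scores)  # 2
```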
Syntax of a function
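One way to picture the pieces of a function definition is the sketch below. The function fahrenheit_to_celsius and its parameter temp_f are hypothetical and are not part of the project; they are only there to show where the name, the formal arguments, the body, and the returned value go.

```r
# General shape:
#   function_name <- function(formal_arguments) {
#     statements
#     return(value)
#   }

# Hypothetical example: convert a temperature from Fahrenheit to Celsius
fahrenheit_to_celsius <- function(temp_f) {
  temp_c <- (temp_f - 32) * 5 / 9   # body: the statements that do the work
  return(temp_c)                    # value handed back to the caller
}

# Calling the function with an actual argument
fahrenheit_to_celsius(212)  # 100

# The parameters (formal arguments) can be inspected with formals()
names(formals(fahrenheit_to_celsius))  # "temp_f"
```

A concise, meaningful name such as fahrenheit_to_celsius tells other users what the function does without reading its body.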
Questions
Question 1 (2 pts)
To gain better insights into our data, let’s make two simple plots. The following are two examples. You can create your own plots.
- In Project 07, you found the different ingredients for the first record in the products data frame. We can now get all of the ingredients from the products data frame and find the top 10 most frequently used ingredients. Then we can create a bar chart for the distribution of the number of times that each ingredient appears.
- A line plot to visualize the distribution of the reviews of the products.
- What information are you gaining from these graphs?
What information are you gaining from these graphs?
The table function can be useful to get the distribution of the number of times that each ingredient appears.
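Here is a rough sketch of how table can count ingredients. It uses a tiny made-up vector, toy_ingredients, instead of the real products data, and it assumes that each record stores its ingredients as one comma-separated string; check how the real column looks before reusing the idea.

```r
# Made-up ingredient strings, one per product (not the real data)
toy_ingredients <- c("MILK, CREAM, SUGAR",
                     "CREAM, SUGAR, SALT",
                     "MILK, SUGAR, GUAR GUM")

# Split every string on commas, flatten to one vector, trim the spaces
all_ingredients <- trimws(unlist(strsplit(toy_ingredients, ",", fixed = TRUE)))

# Count how many times each ingredient appears
ingredient_counts <- table(all_ingredients)

# Sort from most to least frequent and keep (at most) the top 10
top_ingredients <- head(sort(ingredient_counts, decreasing = TRUE), 10)
top_ingredients
```

In this toy example SUGAR comes first because it appears in all three strings.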
This is a good website for bar plot examples: www.statmethods.net/graphs/bar.html
This is a good website for line plot examples: www.sthda.com/english/wiki/line-plots-r-base-graphs
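For orientation, a bar plot and a line plot in base R might look like the sketch below. The counts in toy_counts and the ratings in toy_stars are invented for illustration, and reading "the distribution of the reviews" as the number of reviews per star value is our assumption.

```r
# Invented ingredient counts (stand-in for the real top-10 table)
toy_counts <- c(SUGAR = 3, CREAM = 2, MILK = 2, SALT = 1)

# Bar plot: one bar per ingredient; las = 2 rotates the axis labels
barplot(toy_counts, las = 2,
        main = "Toy ingredient counts",
        ylab = "Number of appearances")

# Invented star ratings (stand-in for a ratings column in the reviews data)
toy_stars <- c(5, 4, 5, 3, 5, 4, 1, 5)
star_counts <- table(toy_stars)

# Line plot: star value on the x-axis, number of reviews on the y-axis
plot(as.numeric(names(star_counts)), as.vector(star_counts),
     type = "l",
     main = "Toy distribution of review stars",
     xlab = "Stars", ylab = "Number of reviews")
```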
Making a dotchart for Question 1 is helpful and insightful, as demonstrated in the video. However, we also want you to see how to make a bar plot and a line plot. Do not worry about the names of the ingredients too much. If only a few names of ingredients appear on the x-axis for Question 1, that is OK with us. We just want to show the distribution (in other words, the numbers) of times that items appear. We are less concerned about the item names themselves.
Question 2 (1 pt)
Now that you have a basic understanding of how to write a function, let's practice by applying that knowledge to our dataset.
Here are the pieces of a function that we will use on this dataset. Its inputs are the products data frame, the reviews data frame, and a product rating. Put the pieces in the correct order.
* merge_results <- merge(products_df, reviews_df, by="key")
* }
* function(products_df, reviews_df, myrating)
* return(products_reviews_results)
* {
* products_reviews_results <- merge_results[merge_results$rating >= myrating, ]
* products_reviews_by_rating <-
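If merge or the row filter in the pieces above is unfamiliar, the sketch below shows each of them on two tiny invented data frames. The names toy_products, toy_reviews, and the stars column are hypothetical; only the shared column name key is taken from the pieces above. The steps are deliberately not wrapped in a function, since assembling the function is the exercise.

```r
# Two invented data frames that share a "key" column
toy_products <- data.frame(key  = c("a1", "b2", "c3"),
                           name = c("Vanilla", "Mint Chip", "Coffee"))
toy_reviews  <- data.frame(key   = c("a1", "a1", "b2", "c3"),
                           stars = c(5, 3, 4, 2))

# merge() pairs up rows from both data frames that have the same key
toy_merged <- merge(toy_products, toy_reviews, by = "key")
nrow(toy_merged)  # 4: one row per review, with the product name attached

# A logical condition inside [ , ] keeps only the rows where it is TRUE
toy_high <- toy_merged[toy_merged$stars >= 4, ]
nrow(toy_high)    # 2
```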
Question 3 (1 pt)
Take the above function and add comments explaining what the function does at each step.
Question 4 (2 pts)
my_selection <- products_reviews_by_rating(products, reviews, 4.5)
Use the code above to answer the following question. We want you to use the data frame my_selection when solving Question 4. (Do not use the full products data frame for Question 4.)
- How many products are there (altogether) that have a rating of at least 4.5? (This is supposed to be simple: You can just find the number of rows of the data frame my_selection.)
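Counting rows takes a single call to nrow. The data frame toy_selection below is invented and only stands in for my_selection.

```r
# Invented stand-in for my_selection
toy_selection <- data.frame(key    = c("a1", "b2", "c3"),
                            rating = c(4.8, 4.5, 4.9))

# nrow() returns the number of rows; dim() returns rows and columns together
nrow(toy_selection)  # 3
dim(toy_selection)   # 3 2
```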
The function merged two data sets, products and reviews. Both of them have an
Question 5 (2 pts)
For Question 5, go back to the full products data frame. (In other words, do not limit yourself to my_selection any more.) When you are constructing your function in part a, it should be helpful to review the videos from Question 1.
- Now create a function that takes 1 ingredient as the input, and finds the number of products that contain that ingredient.
- Use your function to determine how many products contain SALT as an ingredient.
(Note: If you test the function with "GUAR GUM", for instance, you will see that there are 85 products with "GUAR GUM" as an ingredient, as we learned in the previous project.)
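One possible shape for such a function is sketched below on an invented data frame. The names toy_products and count_products_with are hypothetical, the ingredients column is assumed to hold one comma-separated string per product, and the optional second argument df is our own convenience so that the same function can later be pointed at a different data frame.

```r
# Invented products with comma-separated ingredient strings (not the real data)
toy_products <- data.frame(
  name        = c("Vanilla", "Mint Chip", "Salted Caramel"),
  ingredients = c("MILK, CREAM, SUGAR, GUAR GUM",
                  "CREAM, SUGAR, SALT",
                  "MILK, SUGAR, SALT, GUAR GUM"),
  stringsAsFactors = FALSE
)

# Count the products whose ingredient string contains the given text
count_products_with <- function(ingredient, df = toy_products) {
  # grepl() gives TRUE/FALSE per product; fixed = TRUE matches the text literally
  contains_it <- grepl(ingredient, df$ingredients, fixed = TRUE)
  # sum() of a logical vector counts the TRUE values
  return(sum(contains_it))
}

count_products_with("SALT")      # 2
count_products_with("GUAR GUM")  # 2
```

Note that this is plain substring matching, so a search for SALT would also count a product whose only match is a longer ingredient name such as SEA SALT. Whether that is acceptable depends on how you want to define "contains".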
Project 08 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project08.ipynb
- R code and comments for the assignment
  - firstname-lastname-project08.R
- Submit files through Gradescope
Please make sure to double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, we recommend downloading your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.