Learning Objectives:
- Select appropriate libraries or existing code segments to use in creating new programs
## What are Libraries?
In Python, a library is a collection of pre-written code, functions, and modules that extend the language's capabilities. Libraries are designed to be reused, so developers can perform common tasks without having to write the code from scratch. They are essential for simplifying and accelerating the development process, as they provide a wide range of tools and functions for various purposes.
Here are some key points about Python libraries:
- Modules: Libraries in Python consist of modules, which are individual Python files containing functions, classes, and variables related to a specific set of tasks or a particular domain. You can import these modules into your own Python code to access their functionality.
- Standard Library: Python comes with a comprehensive standard library that includes modules for various tasks, such as working with files, networking, data processing, and more. These modules are readily available and do not require installation (see the short sketch after this list).
- Third-party Libraries: In addition to the standard library, there is a vast ecosystem of third-party libraries created by the Python community. These libraries cover a wide range of domains, including web development, data analysis, machine learning, game development, and more. Some popular third-party libraries include NumPy, Pandas, Matplotlib, TensorFlow, Django, Flask, and many others.
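As a quick illustration of the standard-library point above, here is a minimal sketch using the json module (chosen arbitrarily for illustration). It ships with Python, so no installation is needed; importing is covered in more detail in the next section.
```python
# The json module is part of Python's standard library, so it needs no installation.
import json

data = {"name": "Python", "year": 1991}
text = json.dumps(data)      # convert a dictionary to a JSON string
print(text)
print(json.loads(text))      # and convert the string back into a dictionary
```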
## How Do We Get Libraries into Our Code and Working?
To get a library into our code, we use the import statement followed by the name of the library we want to import.
Let's start simply:
```python
# In this code cell, we are importing the math library, which allows us to do math operations,
# and the random library, which lets us generate pseudorandom numbers and choices.
import math
import random

# We use the libraries by first calling them by their name, then using one of their methods.
# For example:
num = 64
print(math.sqrt(num))

numList = [1, 2, 3, 4, 5, 6]
print(random.choice(numList))

# Here, 'math' and 'random' are the names of the libraries, and 'sqrt' and 'choice' are the names of the methods.
```
```python
# You can also import specific names directly with 'from ... import',
# which lets you call them without the library prefix.
from math import sqrt
from random import *

num = 64
print(sqrt(num))

numList = [1, 2, 3, 4, 5, 6]
print(choice(numList))
```
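A third common style is to give a library a shorter alias with import ... as. This is just a small sketch; the alias names below are conventions, not requirements, and later code in this notebook uses the same idea with import numpy as np and import pandas as pd.
```python
# 'import ... as' gives the library a shorter alias.
import math as m
import random as rnd

print(m.sqrt(64))                 # same as math.sqrt(64)
print(rnd.choice([1, 2, 3, 4]))   # same as random.choice([1, 2, 3, 4])
```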
Import your own library from a list of provided libraries, and use one of its methods. This can be something very bare bones, such as printing the time, getting a random number from a list, or doing something after sleeping a certain amount of time.
```python
# math library examples: sqrt(num), factorial(num), pow(num, exponent)
# random library examples: choice(list), randrange(lowest, highest, step) chooses numbers in multiples of step
# datetime library example: datetime.now()
# time library example: sleep(seconds)
import math
import random

num = 16
print(math.sqrt(num))

numList = [211, 131, 3134, 4, 645, 6636]
print(random.choice(numList))
```
4.0
3134
## Documentation
Documentation in Python libraries refers to the written information and explanations provided to help users understand how to use the library and its classes, functions, and modules. It serves as a comprehensive guide that describes the library's functionality and usage, and it often includes code examples. Documentation is typically created by the library developers and is an essential component of a well-maintained library.
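You can also read a library's built-in documentation without leaving Python. Here is a minimal sketch using the math module: docstrings are the documentation strings attached to functions, and help() prints them in a readable form.
```python
import math

# __doc__ holds the docstring: a short description written by the library's developers.
print(math.sqrt.__doc__)

# help() prints the full documentation for a function or module.
help(math.factorial)
```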
Examples of Documentation: an introductory section explaining the purpose of the library, a section on how to install the library, basic usage examples, etc.
```python
calcAverage(grades)
'''
You know the name of the procedure and the parameters, but...
You probably wouldn't be able to use this procedure with confidence because you don't know
exactly what it does (maybe you can guess that it finds the average, but you wouldn't know
whether it uses the mean, mode, or median). You would also need more information on the parameters.
'''
```
## Libraries and APIs
- A file that contains procedures that can be used in a program is called a library.
- An Application Program Interface (API) provides specifications for how the procedures in a library behave and can be used. APIs define the methods and functions that are available for developers to interact with a library. They specify how to make requests, provide inputs, and receive outputs, creating a clear and consistent way to use library features.

Which libraries will be very important to us?
- Requests - Simplifies working with HTTP servers, including requesting data from them and receiving the responses
- Pillow - Simplifies image processing
- Pandas - Simplifies data analysis & manipulation
- Numpy - Vastly speeds up array operations, up to 50 times faster than regular Python lists
- Scikit-Learn - Implements machine learning models and statistical modelling
- Tensorflow - Builds and trains machine learning models, with tools for data automation, model tracking, performance monitoring, and model retraining
- Matplotlib - Creates static, animated, and interactive visualizations in Python
DON'T FORGET TO INSTALL ALL OF THESE (pip install "library")
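Matplotlib is the one library in the list above that doesn't appear again later in this notebook, so here is a minimal sketch of what it does. The data values are made up purely for illustration.
```python
import matplotlib.pyplot as plt

# A minimal line plot: x values against y values (made-up data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

plt.plot(x, y, marker="o")
plt.title("Example Matplotlib plot")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```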
## Popcorn Hack #2
Using the requests library and the ? module (since we should already be using this in our backend), GET a request from the API at "https://api.github.com".
```python
import requests

# GET a request using the requests library. Remember to put your API link in quotes!
# If you get something along the lines of <Response [200]>, then you succeeded.
requests.get("https://api.github.com/events")
```
<Response [200]>
## Scikit-Learn and Numpy
This code uses NumPy to create an array and Scikit-Learn to analyze the data. It creates a linear regression that describes the relationship between the x and y arrays, which represent the independent and dependent variables. In simpler terms, it creates a line of best fit between the two data sets, just like you would in something like Desmos.
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate some example data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Feature (independent variable)
y = np.array([2, 4, 5, 4, 5])                 # Target (dependent variable)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Evaluate the model by calculating the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)

# Print the model coefficient and MSE. The coefficient is the slope of the regression line;
# MSE measures how well the model is performing (the closer to 0, the better).
print("Model Coefficients:", model.coef_)
print("Mean Squared Error:", mse)

intercept = model.intercept_
slope = model.coef_[0]
print(f"Linear Regression Equation: y = {intercept} + {slope} * X")
```
## Requests
- The requests module allows you to send HTTP requests using Python.
- To install it, type pip install requests in your terminal.
## Syntax
- requests.methodname(params)
```python
import requests

# Not functional code; syntax example only.
# This raises a connection error unless a server is actually running at this address.
x = requests.get('http://127.0.0.1:9008/')
print(x.text)
```
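Once you have a response object back from requests, you usually want to check whether the request worked and read the data it returned. This is just a sketch against the same GitHub API used above; the exact JSON contents depend on the endpoint.
```python
import requests

response = requests.get("https://api.github.com")

# status_code tells you whether the request succeeded (200 means OK).
print(response.status_code)

# Many APIs return JSON, which .json() converts into Python dictionaries and lists.
if response.status_code == 200:
    data = response.json()
    print(type(data))
```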
## Homework Hack
1) Create a program that makes a data table organizing the average values (mean) from a data set that has at least 5 values per category, using 2 libraries.
2) Create a Python script that downloads images from a website using the requests library, processes them with the Pillow library, and then performs data analysis with the Pandas library.
```python
# Homework 1
import pandas as pd
import numpy as np

# Creating a sample dataset
data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    'OuncesWaterDrank': [43, 65, 50, 55, 68, 62, 65]
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Calculating the mean for each category
means = df.groupby('Day')['OuncesWaterDrank'].mean().reset_index()

# Keep the Day column produced by groupby (it sorts the days alphabetically),
# and rename the value column so the table clearly shows the mean per day
result = means.rename(columns={'OuncesWaterDrank': 'MeanOuncesWaterDrank'})
print(result)

# NumPy is the second library: use it for the overall mean across all days
print("Overall mean:", np.mean(data['OuncesWaterDrank']))
```
```python
# Homework 2
import requests
from PIL import Image
import pandas as pd

image_urls = [
    'https://imugr.com/a/uoKSBO9',
    'https://thumbs.dreamstime.com/b/sunrise-over-beach-cancun-beautiful-mexico-40065727.jpg',
    'https://images.pexels.com/photos/1974521/pexels-photo-1974521.jpeg?cs=srgb&dl=pexels-julia-kuzenkov-1974521.jpg&fm=jpg'
]

image_sizes = []
for i, url in enumerate(image_urls):
    # Download the image bytes with requests and save them to a file
    response = requests.get(url)
    filename = f'image_{i}.jpg'
    with open(filename, 'wb') as f:
        f.write(response.content)

    # Process the file with Pillow; some URLs may not point to an actual image file,
    # so record None if Pillow can't open the download
    try:
        with Image.open(filename) as img:
            image_sizes.append(img.size)
    except OSError:
        image_sizes.append(None)

# Analyze the results with Pandas
df = pd.DataFrame({'Image_ID': list(range(len(image_urls))), 'Size': image_sizes})
print("data analysis")
print(df)
```